1. 04 4月, 2016 1 次提交
    • Y
      btrfs: Reset IO error counters before start of device replacing · 7ccefb98
      Yauhen Kharuzhy 提交于
      If device replace entry was found on disk at mounting and its num_write_errors
      stats counter has non-NULL value, then replace operation will never be
      finished and -EIO error will be reported by btrfs_scrub_dev() because
      this counter is never reset.
      
       # mount -o degraded /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       # btrfs replace status /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       Started on 25.Mar 07:28:00, canceled on 25.Mar 07:28:01 at 0.0%, 40 write errs, 0 uncorr. read errs
       # btrfs replace start -B 4 /dev/sdg /media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/
       ERROR: ioctl(DEV_REPLACE_START) failed on "/media/a4fb5c0a-21c5-4fe7-8d0e-fdd87d5f71ee/": Input/output error, no error
      
      Reset num_write_errors and num_uncorrectable_read_errors counters in the
      dev_replace structure before start of replacing.
      Signed-off-by: NYauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7ccefb98
  2. 14 3月, 2016 1 次提交
  3. 23 2月, 2016 1 次提交
    • L
      Btrfs: fix lockdep deadlock warning due to dev_replace · 73beece9
      Liu Bo 提交于
      Xfstests btrfs/011 complains about a deadlock warning,
      
      [ 1226.649039] =========================================================
      [ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
      [ 1226.649039] 4.1.0+ #270 Not tainted
      [ 1226.649039] ---------------------------------------------------------
      [ 1226.652955] kswapd0/46 just changed the state of lock:
      [ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      [ 1226.652955]
      other info that might help us debug this:
      [ 1226.652955] Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
      
      [ 1226.652955]  Possible interrupt unsafe locking scenario:
      
      [ 1226.652955]        CPU0                    CPU1
      [ 1226.652955]        ----                    ----
      [ 1226.652955]   lock(&fs_info->dev_replace.lock);
      [ 1226.652955]                                local_irq_disable();
      [ 1226.652955]                                lock(&delayed_node->mutex);
      [ 1226.652955]                                lock(&found->groups_sem);
      [ 1226.652955]   <Interrupt>
      [ 1226.652955]     lock(&delayed_node->mutex);
      [ 1226.652955]
       *** DEADLOCK ***
      
      Commit 084b6e7c ("btrfs: Fix a lockdep warning when running xfstest.") tried
      to fix a similar one that has the exactly same warning, but with that, we still
      run to this.
      
      The above lock chain comes from
      btrfs_commit_transaction
        ->btrfs_run_delayed_items
          ...
          ->__btrfs_update_delayed_inode
            ...
            ->__btrfs_cow_block
               ...
               ->find_free_extent
                  ->cache_block_group
                    ->load_free_space_cache
                      ->btrfs_readpages
                        ->submit_one_bio
                          ...
                          ->__btrfs_map_block
                            ->btrfs_dev_replace_lock
      
      However, with high memory pressure, tasks which hold dev_replace.lock can
      be interrupted by kswapd and then kswapd is intended to release memory occupied
      by superblock, inodes and dentries, where we may call evict_inode, and it comes
      to
      
      [ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
      [ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
      
      delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads
      to a ABBA deadlock.
      
      To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but
      things are simpler here since we only needs read's spinlock to blocking lock.
      
      With this, btrfs/011 no more produces warnings in dmesg.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      73beece9
  4. 11 2月, 2016 1 次提交
    • D
      btrfs: scrub: use GFP_KERNEL on the submission path · 58c4e173
      David Sterba 提交于
      Scrub is not on the critical writeback path we don't need to use
      GFP_NOFS for all allocations. The failures are handled and stats passed
      back to userspace.
      
      Let's use GFP_KERNEL on the paths where everything is ok, ie. setup the
      global structures and the IO submission paths.
      
      Functions that do the repair and fixups still use GFP_NOFS as we might
      want to skip any other filesystem activity if we encounter an error.
      This could turn out to be unnecessary, but requires more review compared
      to the easy cases in this patch.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      58c4e173
  5. 16 1月, 2016 1 次提交
  6. 11 10月, 2015 1 次提交
  7. 08 10月, 2015 1 次提交
  8. 02 10月, 2015 1 次提交
    • L
      Btrfs: move kobj stuff out of dev_replace lock range · 73416dab
      Liu Bo 提交于
      To avoid deadlock described in commit 084b6e7c ("btrfs: Fix a
      lockdep warning when running xfstest."), we should move kobj stuff out
      of dev_replace lock range.
      
        "It is because the btrfs_kobj_{add/rm}_device() will call memory
        allocation with GFP_KERNEL,
        which may flush fs page cache to free space, waiting for it self to do
        the commit, causing the deadlock.
      
        To solve the problem, move btrfs_kobj_{add/rm}_device() out of the
        dev_replace lock range, also involing split the
        btrfs_rm_dev_replace_srcdev() function into remove and free parts.
      
        Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace
        lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are
        called out of the lock range."
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      [added lockup description]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      73416dab
  9. 01 10月, 2015 1 次提交
  10. 29 9月, 2015 2 次提交
  11. 01 9月, 2015 1 次提交
  12. 23 7月, 2015 1 次提交
  13. 19 6月, 2015 1 次提交
  14. 27 5月, 2015 2 次提交
  15. 04 3月, 2015 1 次提交
  16. 21 2月, 2015 1 次提交
  17. 22 1月, 2015 2 次提交
  18. 03 12月, 2014 2 次提交
  19. 25 11月, 2014 1 次提交
    • Q
      btrfs: Fix a lockdep warning when running xfstest. · 084b6e7c
      Qu Wenruo 提交于
      The following lockdep warning is triggered during xfstests:
      
      [ 1702.980872] =========================================================
      [ 1702.981181] [ INFO: possible irq lock inversion dependency detected ]
      [ 1702.981482] 3.18.0-rc1 #27 Not tainted
      [ 1702.981781] ---------------------------------------------------------
      [ 1702.982095] kswapd0/77 just changed the state of lock:
      [ 1702.982415]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa03b0b51>] __btrfs_release_delayed_node+0x41/0x1f0 [btrfs]
      [ 1702.982794] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [ 1702.983160]  (&fs_info->dev_replace.lock){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      [ 1702.984675]
      other info that might help us debug this:
      [ 1702.985524] Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
      
      [ 1702.986799]  Possible interrupt unsafe locking scenario:
      
      [ 1702.987681]        CPU0                    CPU1
      [ 1702.988137]        ----                    ----
      [ 1702.988598]   lock(&fs_info->dev_replace.lock);
      [ 1702.989069]                                local_irq_disable();
      [ 1702.989534]                                lock(&delayed_node->mutex);
      [ 1702.990038]                                lock(&found->groups_sem);
      [ 1702.990494]   <Interrupt>
      [ 1702.990938]     lock(&delayed_node->mutex);
      [ 1702.991407]
       *** DEADLOCK ***
      
      It is because the btrfs_kobj_{add/rm}_device() will call memory
      allocation with GFP_KERNEL,
      which may flush fs page cache to free space, waiting for it self to do
      the commit, causing the deadlock.
      
      To solve the problem, move btrfs_kobj_{add/rm}_device() out of the
      dev_replace lock range, also involing split the
      btrfs_rm_dev_replace_srcdev() function into remove and free parts.
      
      Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace
      lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are
      called out of the lock range.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      084b6e7c
  20. 21 11月, 2014 1 次提交
    • E
      Btrfs: return failure if btrfs_dev_replace_finishing() failed · 2fc9f6ba
      Eryu Guan 提交于
      device replace could fail due to another running scrub process or any
      other errors btrfs_scrub_dev() may hit, but this failure doesn't get
      returned to userspace.
      
      The following steps could reproduce this issue
      
      	mkfs -t btrfs -f /dev/sdb1 /dev/sdb2
      	mount /dev/sdb1 /mnt/btrfs
      	while true; do btrfs scrub start -B /mnt/btrfs >/dev/null 2>&1; done &
      	btrfs replace start -Bf /dev/sdb2 /dev/sdb3 /mnt/btrfs
      	# if this replace succeeded, do the following and repeat until
      	# you see this log in dmesg
      	# BTRFS: btrfs_scrub_dev(/dev/sdb2, 2, /dev/sdb3) failed -115
      	#btrfs replace start -Bf /dev/sdb3 /dev/sdb2 /mnt/btrfs
      
      	# once you see the error log in dmesg, check return value of
      	# replace
      	echo $?
      
      Introduce a new dev replace result
      
      BTRFS_IOCTL_DEV_REPLACE_RESULT_SCRUB_INPROGRESS
      
      to catch -EINPROGRESS explicitly and return other errors directly to
      userspace.
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2fc9f6ba
  21. 18 9月, 2014 11 次提交
    • M
      Btrfs: make the logic of source device removing more clear · 82372bc8
      Miao Xie 提交于
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      82372bc8
    • M
      Btrfs: fix use-after-free problem of the device during device replace · 67a2c45e
      Miao Xie 提交于
      The problem is:
      	Task0(device scan task)		Task1(device replace task)
      	scan_one_device()
      	mutex_lock(&uuid_mutex)
      	device = find_device()
      					mutex_lock(&device_list_mutex)
      					lock_chunk()
      					rm_and_free_source_device
      					unlock_chunk()
      					mutex_unlock(&device_list_mutex)
      	check device
      
      Destroying the target device if device replace fails also has the same problem.
      
      We fix this problem by locking uuid_mutex during destroying source device or
      target device, just like the device remove operation.
      
      It is a temporary solution, we can fix this problem and make the code more
      clear by atomic counter in the future.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      67a2c45e
    • M
      Btrfs: Fix misuse of chunk mutex · 2196d6e8
      Miao Xie 提交于
      There were several problems about chunk mutex usage:
      - Lock chunk mutex when updating metadata. It would cause the nested
        deadlock because updating metadata might need allocate new chunks
        that need acquire chunk mutex. We remove chunk mutex at this case,
        because b-tree lock and other lock mechanism can help us.
      - ABBA deadlock occured between device_list_mutex and chunk_mutex.
        When we update device status, we must acquire device_list_mutex at the
        beginning, and then we might get chunk_mutex during the device status
        update because we need allocate new chunks for metadata COW. But at
        most place, we acquire chunk_mutex at first and then acquire device list
        mutex. We need change the lock order.
      - Some place we needn't acquire chunk_mutex. For example we needn't get
        chunk_mutex when we free a empty seed fs_devices structure.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2196d6e8
    • M
      Btrfs: fix unprotected device's variants on 32bits machine · 7cc8e58d
      Miao Xie 提交于
      ->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk
      lock when we change them, but sometimes we read them without any lock,
      and we might get unexpected value. We fix this problem like inode's
      i_size.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7cc8e58d
    • M
      Btrfs: fix wrong device bytes_used in the super block · ce7213c7
      Miao Xie 提交于
      device->bytes_used will be changed when allocating a new chunk, and
      disk_total_size will be changed if resizing is successful.
      Meanwhile, the on-disk super blocks of the previous transaction
      might not be updated. Considering the consistency of the metadata
      in the previous transaction, We should use the size in the previous
      transaction to check if the super block is beyond the boundary
      of the device.
      
      Though it is not big problem because we don't use it now, but anyway
      it is better that we make it be consistent with the common metadata,
      maybe we will use it in the future.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ce7213c7
    • M
      Btrfs: fix wrong disk size when writing super blocks · 935e5cc9
      Miao Xie 提交于
      total_size will be changed when resizing a device, and disk_total_size
      will be changed if resizing is successful. Meanwhile, the on-disk super
      blocks of the previous transaction might not be updated. Considering
      the consistency of the metadata in the previous transaction, We should
      use the size in the previous transaction to check if the super block is
      beyond the boundary of the device. Fix it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      935e5cc9
    • M
      Btrfs: fix unprotected assignment of the target device · 1c43366d
      Miao Xie 提交于
      We didn't protect the assignment of the target device, it might cause the
      problem that the super block update was skipped because we might find wrong
      size of the target device during the assignment. Fix it by moving the
      assignment sentences into the initialization function of the target device.
      And there is another merit that we can check if the target device is suitable
      more early.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1c43366d
    • M
    • Q
      btrfs: Fix a deadlock in btrfs_dev_replace_finishing() · 12b894cb
      Qu Wenruo 提交于
      btrfs-transacion:5657
      [stack snip]
      btrfs_bio_map()
          btrfs_bio_counter_inc_blocked()
              percpu_counter_inc(&fs_info->bio_counter)  ###bio_counter > 0(A)
              __btrfs_bio_map()
                  btrfs_dev_replace_lock()
                      mutex_lock(dev_replace->lock)	   ###wait mutex(B)
      
      btrfs:32612
      [stack snip]
      btrfs_dev_replace_start()
          btrfs_dev_replace_lock()
      	mutex_lock(dev_replace->lock)		   ###hold mutex(B)
          btrfs_dev_replace_finishing()
              btrfs_rm_dev_replace_blocked()
                  wait until percpu_counter_sum == 0	   ###wait on bio_counter(A)
      
      This bug can be triggered quite easily by the following test script:
      http://pastebin.com/MQmb37Cy
      
      This patch will fix the ABBA problem by calling
      btrfs_dev_replace_unlock() before btrfs_rm_dev_replace_blocked().
      
      The consistency of btrfs devices list and their superblocks is protected
      by device_list_mutex, not btrfs_dev_replace_lock/unlock().
      So it is safe the move btrfs_dev_replace_unlock() before
      btrfs_rm_dev_replace_blocked().
      Reported-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NChris Mason <clm@fb.com>
      12b894cb
    • A
      btrfs: fix typo in the log message · de4c296f
      Anand Jain 提交于
      there is no matching open parenthesis for the closing parenthesis
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      de4c296f
    • A
      btrfs: fix rw_devices miss match after seed replace · 63dd86fa
      Anand Jain 提交于
      reproducer:
          reproducer:
          mount /dev/sdb /btrfs
          btrfs dev add /dev/sdc /btrfs
          btrfs rep start -B /dev/sdb /dev/sdd /btrfs
          umount /btrfs
      
      WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()
      
      which is
      
              WARN_ON(fs_devices->rw_devices);
      
         The problem here is that we did not add one to the rw_devices when
         we replace the seed device with a writable device.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      63dd86fa
  22. 29 6月, 2014 1 次提交
  23. 10 6月, 2014 1 次提交
  24. 11 3月, 2014 3 次提交
    • M
      Btrfs: don't flush all delalloc inodes when we doesn't get s_umount lock · 6c255e67
      Miao Xie 提交于
      We needn't flush all delalloc inodes when we doesn't get s_umount lock,
      or we would make the tasks wait for a long time.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      6c255e67
    • M
      Btrfs: fix use-after-free in the finishing procedure of the device replace · c404e0dc
      Miao Xie 提交于
      During device replace test, we hit a null pointer deference (It was very easy
      to reproduce it by running xfstests' btrfs/011 on the devices with the virtio
      scsi driver). There were two bugs that caused this problem:
      - We might allocate new chunks on the replaced device after we updated
        the mapping tree. And we forgot to replace the source device in those
        mapping of the new chunks.
      - We might get the mapping information which including the source device
        before the mapping information update. And then submit the bio which was
        based on that mapping information after we freed the source device.
      
      For the first bug, we can fix it by doing mapping tree update and source
      device remove in the same context of the chunk mutex. The chunk mutex is
      used to protect the allocable device list, the above method can avoid
      the new chunk allocation, and after we remove the source device, all
      the new chunks will be allocated on the new device. So it can fix
      the first bug.
      
      For the second bug, we need make sure all flighting bios are finished and
      no new bios are produced during we are removing the source device. To fix
      this problem, we introduced a global @bio_counter, we not only inc/dec
      @bio_counter outsize of map_blocks, but also inc it before submitting bio
      and dec @bio_counter when ending bios.
      
      Since Raid56 is a little different and device replace dosen't support raid56
      yet, it is not addressed in the patch and I add comments to make sure we will
      fix it in the future.
      Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      c404e0dc
    • M
      Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace · 391cd9df
      Miao Xie 提交于
      the alloc list of the filesystem is protected by ->chunk_mutex, we need
      get that mutex when we insert the new device into the list.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      391cd9df