1. 08 6月, 2016 3 次提交
  2. 30 5月, 2016 2 次提交
    • F
      Btrfs: fix race between device replace and chunk allocation · 22ab04e8
      Filipe Manana 提交于
      While iterating and copying extents from the source device, the device
      replace code keeps adjusting a left cursor that is used to make sure that
      once we finish processing a device extent, any future writes to extents
      from the corresponding block group will get into both the source and
      target devices. This left cursor is also used for resuming the device
      replace operation at mount time.
      
      However using this left cursor to decide whether writes go into both
      devices or only the source device is not enough to guarantee we don't
      miss copying extents into the target device. There are two cases where
      the current approach fails. The first one is related to when there are
      holes in the device and they get allocated for new block groups while
      the device replace operation is iterating the device extents (more on
      this explained below). The second one is that when that loop over the
      device extents finishes, we start dellaloc, wait for all ordered extents
      and then commit the current transaction, we might have got new block
      groups allocated that are now using a device extent that has an offset
      greater then or equals to the value of the left cursor, in which case
      writes to extents belonging to these new block groups will get issued
      only to the source device.
      
      For the first case where the current approach of using a left cursor
      fails, consider the source device currently has the following layout:
      
        [ extent bg A ] [ hole, unallocated space ] [extent bg B ]
        3Gb             4Gb                         5Gb
      
      While we are iterating the device extents from the source device using
      the commit root of the device tree, the following happens:
      
              CPU 1                                            CPU 2
      
                            <we are at transaction N>
      
        scrub_enumerate_chunks()
          --> searches the device tree for
              extents belonging to the source
              device using the device tree's
              commit root
          --> 1st iteration finds extent belonging to
              block group A
      
              --> sets block group A to RO mode
                  (btrfs_inc_block_group_ro)
      
              --> sets cursor left to found_key.offset
                  which is 3Gb
      
              --> scrub_chunk() starts
                  copies all allocated extents from
                  block group's A stripe at source
                  device into target device
      
                                                                 btrfs_alloc_chunk()
                                                                   --> allocates device extent
                                                                       in the range [4Gb, 5Gb[
                                                                       from the source device for
                                                                       a new block group C
      
                                                                 extent allocated from block
                                                                 group C for a direct IO,
                                                                 buffered write or btree node/leaf
      
                                                                 extent is written to, perhaps
                                                                 in response to a writepages()
                                                                 call from the VM or directly
                                                                 through direct IO
      
                                                                 the write is made only against
                                                                 the source device and not against
                                                                 the target device because the
                                                                 extent's offset is in the interval
                                                                 [4Gb, 5Gb[ which is larger then
                                                                 the value of cursor_left (3Gb)
      
              --> scrub_chunks() finishes
      
              --> updates left cursor from 3Gb to
                  4Gb
      
              --> btrfs_dec_block_group_ro() sets
                  block group A back to RW mode
      
                                   <we are still at transaction N>
      
          --> 2nd iteration finds extent belonging to
              block group B - it did not find the new
              extent in the range [4Gb, 5Gb[ for block
              group C because we are using the device
              tree's commit root or even because the
              block group's items are not all yet
              inserted in the respective btrees, that is,
              the block group is still attached to some
              transaction handle's new_bgs list and
              btrfs_create_pending_block_groups() was
              not called yet against that transaction
              handle, so the device extent items were
              not yet inserted into the devices tree
      
                                   <we are still at transaction N>
      
              --> so we end not copying anything from the newly
                  allocated device extent from the source device
                  to the target device
      
      So fix this by making __btrfs_map_block() always redirect writes to the
      target device as well, independently of the left cursor's value. With
      this change the left cursor is now used only for the purpose of tracking
      progress and allow a mount operation to resume a device replace.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      22ab04e8
    • F
      Btrfs: fix race between device replace and block group removal · 57ba4cb8
      Filipe Manana 提交于
      When it's finishing, the device replace code iterates all extent maps
      representing block group and for each one that has a stripe that refers
      to the source device, it replaces its device with the target device.
      However when it replaces the source device with the target device it,
      the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
      only after its ID is changed to match the one from the source device.
      This leads to races with the chunk removal code that can temporarly see
      a device with an ID of 0ULL and then attempt to use that ID to remove
      items from the device tree and fail, causing a transaction abort:
      
      [ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) to /dev/sde finished
      [ 9238.594377] ------------[ cut here ]------------
      [ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 btrfs_remove_chunk+0x2e5/0x793 [btrfs]
      [ 9238.594403] BTRFS: Transaction aborted (error 1)
      [ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 evdev sg i2c_core se
      rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod fl
      oppy [last unloaded: btrfs]
      [ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 4.6.0-rc7-btrfs-next-29+ #1
      [ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
      [ 9238.594421]  0000000000000000 ffff88017f1dbc60 ffffffff8126b42c ffff88017f1dbcb0
      [ 9238.594422]  0000000000000000 ffff88017f1dbca0 ffffffff81052b14 00000ad37f1dbd18
      [ 9238.594423]  0000000000000001 ffff88018068a558 ffff88005c4b9c00 ffff880233f60db0
      [ 9238.594424] Call Trace:
      [ 9238.594428]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
      [ 9238.594430]  [<ffffffff81052b14>] __warn+0xc2/0xdd
      [ 9238.594432]  [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
      [ 9238.594434]  [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
      [ 9238.594450]  [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
      [ 9238.594452]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
      [ 9238.594464]  [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 [btrfs]
      [ 9238.594476]  [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
      [ 9238.594489]  [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
      [ 9238.594490]  [<ffffffff8106f403>] kthread+0xd4/0xdc
      [ 9238.594494]  [<ffffffff8149e242>] ret_from_fork+0x22/0x40
      [ 9238.594495]  [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
      [ 9238.594496] ---[ end trace 183efbe50275f059 ]---
      
      The sequence of steps leading to this is like the following:
      
                    CPU 1                                           CPU 2
      
       btrfs_dev_replace_finishing()
      
         at this point
         dev_replace->tgtdev->devid ==
         BTRFS_DEV_REPLACE_DEVID (0ULL)
      
         ...
      
         btrfs_start_transaction()
         btrfs_commit_transaction()
      
                                                           btrfs_delete_unused_bgs()
                                                             btrfs_remove_chunk()
      
                                                               looks up for the extent map
                                                               corresponding to the chunk
      
                                                               lock_chunks() (chunk_mutex)
                                                               check_system_chunk()
                                                               unlock_chunks() (chunk_mutex)
      
         locks fs_info->chunk_mutex
      
         btrfs_dev_replace_update_device_in_mapping_tree()
           --> iterates fs_info->mapping_tree and
               replaces the device in every extent
               map's map->stripes[] with
               dev_replace->tgtdev, which still has
               an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
      
                                                               iterates over all stripes from
                                                               the extent map
      
                                                                 --> calls btrfs_free_dev_extent()
                                                                     passing it the target device
                                                                     that still has an ID of 0ULL
      
                                                                 --> btrfs_free_dev_extent() fails
                                                                   --> aborts current transaction
      
         finishes setting up the target device,
         namely it sets tgtdev->devid to the value
         of srcdev->devid (which is necessarily > 0)
      
         frees the srcdev
      
         unlocks fs_info->chunk_mutex
      
      So fix this by taking the device list mutex while processing the stripes
      for the chunk's extent map. This is similar to the race between device
      replace and block group creation that was fixed by commit 50460e37
      ("Btrfs: fix race when finishing dev replace leading to transaction abort").
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      57ba4cb8
  3. 26 5月, 2016 2 次提交
  4. 21 5月, 2016 1 次提交
  5. 06 5月, 2016 5 次提交
    • A
      btrfs: fix lock dep warning move scratch super outside of chunk_mutex · 48b3b9d4
      Anand Jain 提交于
      Move scratch super outside of the chunk lock to avoid below
      lockdep warning. The better place to scratch super is in
      the function btrfs_rm_dev_replace_free_srcdev() just before
      free_device, which is outside of the chunk lock as well.
      
      To reproduce:
        (fresh boot)
        mkfs.btrfs -f -draid5 -mraid5 /dev/sdc /dev/sdd /dev/sde
        mount /dev/sdc /btrfs
        dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
        (get devmgt from https://github.com/asj/devmgt.git)
        devmgt detach /dev/sde
        dd if=/dev/zero of=/btrfs/tf1 bs=4096 count=100
        sync
        btrfs replace start -Brf 3 /dev/sdf /btrfs <--
        devmgt attach host7
      
      ======================================================
      [ INFO: possible circular locking dependency detected ]
      4.6.0-rc2asj+ #1 Not tainted
      ---------------------------------------------------
      
      btrfs/2174 is trying to acquire lock:
      (sb_writers){.+.+.+}, at:
      [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0
      
      but task is already holding lock:
      (&fs_info->chunk_mutex){+.+.+.}, at:
      [<ffffffffa05c5f55>] btrfs_dev_replace_finishing+0x145/0x980 [btrfs]
      
      which lock already depends on the new lock.
      
      Chain exists of:
      sb_writers --> &fs_devs->device_list_mutex --> &fs_info->chunk_mutex
      Possible unsafe locking scenario:
      CPU0				CPU1
      ----				----
      lock(&fs_info->chunk_mutex);
      				lock(&fs_devs->device_list_mutex);
      				lock(&fs_info->chunk_mutex);
      lock(sb_writers);
      
      *** DEADLOCK ***
      
      -> #0 (sb_writers){.+.+.+}:
      [<ffffffff810e6415>] __lock_acquire+0x1bc5/0x1ee0
      [<ffffffff810e707e>] lock_acquire+0xbe/0x210
      [<ffffffff810df49a>] percpu_down_read+0x4a/0xa0
      [<ffffffff812449b4>] __sb_start_write+0xb4/0xf0
      [<ffffffff81265534>] mnt_want_write+0x24/0x50
      [<ffffffff812508a2>] path_openat+0x952/0x1190
      [<ffffffff81252451>] do_filp_open+0x91/0x100
      [<ffffffff8123f5cc>] file_open_name+0xfc/0x140
      [<ffffffff8123f643>] filp_open+0x33/0x60
      [<ffffffffa0572bb6>] update_dev_time+0x16/0x40 [btrfs]
      [<ffffffffa057f60d>] btrfs_scratch_superblocks+0x5d/0xb0 [btrfs]
      [<ffffffffa057f70e>] btrfs_rm_dev_replace_remove_srcdev+0xae/0xd0 [btrfs]
      [<ffffffffa05c62c5>] btrfs_dev_replace_finishing+0x4b5/0x980 [btrfs]
      [<ffffffffa05c6ae8>] btrfs_dev_replace_start+0x358/0x530 [btrfs]
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      48b3b9d4
    • J
      Btrfs: remove BUG_ON()'s in btrfs_map_block · e042d1ec
      Josef Bacik 提交于
      btrfs_map_block can go horribly wrong in the face of fs corruption, lets agree
      to not be assholes and panic at any possible chance things are all fucked up.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      [ removed type casts ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e042d1ec
    • L
      Btrfs: fix divide error upon chunk's stripe_len · 3d8da678
      Liu Bo 提交于
      The struct 'map_lookup' uses type int for @stripe_len, while
      btrfs_chunk_stripe_len() can return a u64 value, and it may end up with
      @stripe_len being undefined value and it can lead to 'divide error' in
       __btrfs_map_block().
      
      This changes 'map_lookup' to use type u64 for stripe_len, also right now
      we only use BTRFS_STRIPE_LEN for stripe_len, so this adds a valid checker for
      BTRFS_STRIPE_LEN.
      Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
      Reported-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ folded division fix to scrub_raid56_parity ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3d8da678
    • A
      btrfs: fix lock dep warning, move scratch dev out of device_list_mutex and uuid_mutex · 779bf3fe
      Anand Jain 提交于
      When the replace target fails, the target device will be taken
      out of fs device list, scratch + update_dev_time and freed. However
      we could do the scratch  + update_dev_time and free part after the
      device has been taken out of device list, so that we don't have to
      hold the device_list_mutex and uuid_mutex locks.
      
      Reported issue:
      
      [ 5375.718845] ======================================================
      [ 5375.718846] [ INFO: possible circular locking dependency detected ]
      [ 5375.718849] 4.4.5-scst31x-debug-11+ #40 Not tainted
      [ 5375.718849] -------------------------------------------------------
      [ 5375.718851] btrfs-health/4662 is trying to acquire lock:
      [ 5375.718861]  (sb_writers){.+.+.+}, at: [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.718862]
      [ 5375.718862] but task is already holding lock:
      [ 5375.718907]  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
      [ 5375.718907]
      [ 5375.718907] which lock already depends on the new lock.
      [ 5375.718907]
      [ 5375.718908]
      [ 5375.718908] the existing dependency chain (in reverse order) is:
      [ 5375.718911]
      [ 5375.718911] -> #3 (&fs_devs->device_list_mutex){+.+.+.}:
      [ 5375.718917]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718921]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
      [ 5375.718940]        [<ffffffffa0219bf6>] btrfs_show_devname+0x36/0x210 [btrfs]
      [ 5375.718945]        [<ffffffff81267079>] show_vfsmnt+0x49/0x150
      [ 5375.718948]        [<ffffffff81240b07>] m_show+0x17/0x20
      [ 5375.718951]        [<ffffffff81246868>] seq_read+0x2d8/0x3b0
      [ 5375.718955]        [<ffffffff8121df28>] __vfs_read+0x28/0xd0
      [ 5375.718959]        [<ffffffff8121e806>] vfs_read+0x86/0x130
      [ 5375.718962]        [<ffffffff8121f4c9>] SyS_read+0x49/0xa0
      [ 5375.718966]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.718968]
      [ 5375.718968] -> #2 (namespace_sem){+++++.}:
      [ 5375.718971]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718974]        [<ffffffff81635199>] down_write+0x49/0x80
      [ 5375.718977]        [<ffffffff81243593>] lock_mount+0x43/0x1c0
      [ 5375.718979]        [<ffffffff81243c13>] do_add_mount+0x23/0xd0
      [ 5375.718982]        [<ffffffff81244afb>] do_mount+0x27b/0xe30
      [ 5375.718985]        [<ffffffff812459dc>] SyS_mount+0x8c/0xd0
      [ 5375.718988]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.718991]
      [ 5375.718991] -> #1 (&sb->s_type->i_mutex_key#5){+.+.+.}:
      [ 5375.718994]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.718996]        [<ffffffff81633949>] mutex_lock_nested+0x69/0x3c0
      [ 5375.719001]        [<ffffffff8122d608>] path_openat+0x468/0x1360
      [ 5375.719004]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719007]        [<ffffffff8121da7b>] do_sys_open+0x12b/0x210
      [ 5375.719010]        [<ffffffff8121db7e>] SyS_open+0x1e/0x20
      [ 5375.719013]        [<ffffffff81637976>] entry_SYSCALL_64_fastpath+0x16/0x7a
      [ 5375.719015]
      [ 5375.719015] -> #0 (sb_writers){.+.+.+}:
      [ 5375.719018]        [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
      [ 5375.719021]        [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.719026]        [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
      [ 5375.719028]        [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.719031]        [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
      [ 5375.719035]        [<ffffffff8122ded2>] path_openat+0xd32/0x1360
      [ 5375.719037]        [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719040]        [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
      [ 5375.719043]        [<ffffffff8121d923>] filp_open+0x33/0x60
      [ 5375.719073]        [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
      [ 5375.719099]        [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
      [ 5375.719123]        [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
      [ 5375.719150]        [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
      [ 5375.719175]        [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
      [ 5375.719199]        [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
      [ 5375.719222]        [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
      [ 5375.719225]        [<ffffffff810a70df>] kthread+0xef/0x110
      [ 5375.719229]        [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
      [ 5375.719230]
      [ 5375.719230] other info that might help us debug this:
      [ 5375.719230]
      [ 5375.719233] Chain exists of:
      [ 5375.719233]   sb_writers --> namespace_sem --> &fs_devs->device_list_mutex
      [ 5375.719233]
      [ 5375.719234]  Possible unsafe locking scenario:
      [ 5375.719234]
      [ 5375.719234]        CPU0                    CPU1
      [ 5375.719235]        ----                    ----
      [ 5375.719236]   lock(&fs_devs->device_list_mutex);
      [ 5375.719238]                                lock(namespace_sem);
      [ 5375.719239]                                lock(&fs_devs->device_list_mutex);
      [ 5375.719241]   lock(sb_writers);
      [ 5375.719241]
      [ 5375.719241]  *** DEADLOCK ***
      [ 5375.719241]
      [ 5375.719243] 4 locks held by btrfs-health/4662:
      [ 5375.719266]  #0:  (&fs_info->health_mutex){+.+.+.}, at: [<ffffffffa0246303>] health_kthread+0x63/0x490 [btrfs]
      [ 5375.719293]  #1:  (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+.+.}, at: [<ffffffffa02c6611>] btrfs_dev_replace_finishing+0x41/0x990 [btrfs]
      [ 5375.719319]  #2:  (uuid_mutex){+.+.+.}, at: [<ffffffffa0282620>] btrfs_destroy_dev_replace_tgtdev+0x20/0x150 [btrfs]
      [ 5375.719343]  #3:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa028263c>] btrfs_destroy_dev_replace_tgtdev+0x3c/0x150 [btrfs]
      [ 5375.719343]
      [ 5375.719343] stack backtrace:
      [ 5375.719347] CPU: 2 PID: 4662 Comm: btrfs-health Not tainted 4.4.5-scst31x-debug-11+ #40
      [ 5375.719348] Hardware name: Supermicro SYS-6018R-WTRT/X10DRW-iT, BIOS 1.0c 01/07/2015
      [ 5375.719352]  0000000000000000 ffff880856f73880 ffffffff813529e3 ffffffff826182a0
      [ 5375.719354]  ffffffff8260c090 ffff880856f738c0 ffffffff810d667c ffff880856f73930
      [ 5375.719357]  ffff880861f32b40 ffff880861f32b68 0000000000000003 0000000000000004
      [ 5375.719357] Call Trace:
      [ 5375.719363]  [<ffffffff813529e3>] dump_stack+0x85/0xc2
      [ 5375.719366]  [<ffffffff810d667c>] print_circular_bug+0x1ec/0x260
      [ 5375.719369]  [<ffffffff810d97ca>] __lock_acquire+0x17ba/0x1ae0
      [ 5375.719373]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
      [ 5375.719376]  [<ffffffff810da4be>] lock_acquire+0xce/0x1e0
      [ 5375.719378]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
      [ 5375.719383]  [<ffffffff810d3bef>] percpu_down_read+0x4f/0xa0
      [ 5375.719385]  [<ffffffff812214f7>] ? __sb_start_write+0xb7/0xf0
      [ 5375.719387]  [<ffffffff812214f7>] __sb_start_write+0xb7/0xf0
      [ 5375.719389]  [<ffffffff81242eb4>] mnt_want_write+0x24/0x50
      [ 5375.719393]  [<ffffffff8122ded2>] path_openat+0xd32/0x1360
      [ 5375.719415]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
      [ 5375.719418]  [<ffffffff810f606d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
      [ 5375.719420]  [<ffffffff8122f86e>] do_filp_open+0x7e/0xe0
      [ 5375.719423]  [<ffffffff810f615d>] ? rcu_read_lock_sched_held+0x6d/0x80
      [ 5375.719426]  [<ffffffff81201a9b>] ? kmem_cache_alloc+0x26b/0x5d0
      [ 5375.719430]  [<ffffffff8122e7d4>] ? getname_kernel+0x34/0x120
      [ 5375.719433]  [<ffffffff8121d8a4>] file_open_name+0xe4/0x130
      [ 5375.719436]  [<ffffffff8121d923>] filp_open+0x33/0x60
      [ 5375.719462]  [<ffffffffa02776a6>] update_dev_time+0x16/0x40 [btrfs]
      [ 5375.719485]  [<ffffffffa02825be>] btrfs_scratch_superblocks+0x4e/0x90 [btrfs]
      [ 5375.719506]  [<ffffffffa0282665>] btrfs_destroy_dev_replace_tgtdev+0x65/0x150 [btrfs]
      [ 5375.719530]  [<ffffffffa02c6c80>] btrfs_dev_replace_finishing+0x6b0/0x990 [btrfs]
      [ 5375.719554]  [<ffffffffa02c6b23>] ? btrfs_dev_replace_finishing+0x553/0x990 [btrfs]
      [ 5375.719576]  [<ffffffffa02c729e>] btrfs_dev_replace_start+0x33e/0x540 [btrfs]
      [ 5375.719598]  [<ffffffffa02c7f58>] btrfs_auto_replace_start+0xf8/0x140 [btrfs]
      [ 5375.719621]  [<ffffffffa02464e6>] health_kthread+0x246/0x490 [btrfs]
      [ 5375.719641]  [<ffffffffa02463d8>] ? health_kthread+0x138/0x490 [btrfs]
      [ 5375.719661]  [<ffffffffa02462a0>] ? btrfs_congested_fn+0x180/0x180 [btrfs]
      [ 5375.719663]  [<ffffffff810a70df>] kthread+0xef/0x110
      [ 5375.719666]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
      [ 5375.719669]  [<ffffffff81637d2f>] ret_from_fork+0x3f/0x70
      [ 5375.719672]  [<ffffffff810a6ff0>] ? kthread_create_on_node+0x200/0x200
      [ 5375.719697] ------------[ cut here ]------------
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Reported-by: NYauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      779bf3fe
    • A
      btrfs: allow balancing to dup with multi-device · 88be159c
      Austin S. Hemmelgarn 提交于
      Currently, we don't allow the user to try and rebalance to a dup profile
      on a multi-device filesystem.  In most cases, this is a perfectly sensible
      restriction as raid1 uses the same amount of space and provides better
      protection.
      
      However, when reshaping a multi-device filesystem down to a single device
      filesystem, this requires the user to convert metadata and system chunks
      to single profile before deleting devices, and then convert again to dup,
      which leaves a period of time where metadata integrity is reduced.
      
      This patch removes the single-device-only restriction from converting to
      dup profile to remove this potential data integrity reduction.
      Signed-off-by: NAustin S. Hemmelgarn <ahferroin7@gmail.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      88be159c
  6. 04 5月, 2016 2 次提交
  7. 28 4月, 2016 16 次提交
  8. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  9. 14 3月, 2016 1 次提交
  10. 23 2月, 2016 1 次提交
    • L
      Btrfs: fix lockdep deadlock warning due to dev_replace · 73beece9
      Liu Bo 提交于
      Xfstests btrfs/011 complains about a deadlock warning,
      
      [ 1226.649039] =========================================================
      [ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
      [ 1226.649039] 4.1.0+ #270 Not tainted
      [ 1226.649039] ---------------------------------------------------------
      [ 1226.652955] kswapd0/46 just changed the state of lock:
      [ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      [ 1226.652955]
      other info that might help us debug this:
      [ 1226.652955] Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
      
      [ 1226.652955]  Possible interrupt unsafe locking scenario:
      
      [ 1226.652955]        CPU0                    CPU1
      [ 1226.652955]        ----                    ----
      [ 1226.652955]   lock(&fs_info->dev_replace.lock);
      [ 1226.652955]                                local_irq_disable();
      [ 1226.652955]                                lock(&delayed_node->mutex);
      [ 1226.652955]                                lock(&found->groups_sem);
      [ 1226.652955]   <Interrupt>
      [ 1226.652955]     lock(&delayed_node->mutex);
      [ 1226.652955]
       *** DEADLOCK ***
      
      Commit 084b6e7c ("btrfs: Fix a lockdep warning when running xfstest.") tried
      to fix a similar one that has the exactly same warning, but with that, we still
      run to this.
      
      The above lock chain comes from
      btrfs_commit_transaction
        ->btrfs_run_delayed_items
          ...
          ->__btrfs_update_delayed_inode
            ...
            ->__btrfs_cow_block
               ...
               ->find_free_extent
                  ->cache_block_group
                    ->load_free_space_cache
                      ->btrfs_readpages
                        ->submit_one_bio
                          ...
                          ->__btrfs_map_block
                            ->btrfs_dev_replace_lock
      
      However, with high memory pressure, tasks which hold dev_replace.lock can
      be interrupted by kswapd and then kswapd is intended to release memory occupied
      by superblock, inodes and dentries, where we may call evict_inode, and it comes
      to
      
      [ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
      [ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
      
      delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads
      to a ABBA deadlock.
      
      To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but
      things are simpler here since we only needs read's spinlock to blocking lock.
      
      With this, btrfs/011 no more produces warnings in dmesg.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      73beece9
  11. 11 2月, 2016 3 次提交
  12. 30 1月, 2016 1 次提交
  13. 22 1月, 2016 1 次提交
  14. 20 1月, 2016 1 次提交