1. 10 12月, 2020 1 次提交
  2. 08 12月, 2020 1 次提交
  3. 27 7月, 2020 1 次提交
  4. 24 3月, 2020 1 次提交
  5. 20 1月, 2020 1 次提交
    • D
      btrfs: keep track of which extents have been discarded · a7ccb255
      Dennis Zhou 提交于
      Async discard will use the free space cache as backing knowledge for
      which extents to discard. This patch plumbs knowledge about which
      extents need to be discarded into the free space cache from
      unpin_extent_range().
      
      An untrimmed extent can merge with everything as this is a new region.
      Absorbing trimmed extents is a tradeoff to for greater coalescing which
      makes life better for find_free_extent(). Additionally, it seems the
      size of a trim isn't as problematic as the trim io itself.
      
      When reading in the free space cache from disk, if sync is set, mark all
      extents as trimmed. The current code ensures at transaction commit that
      all free space is trimmed when sync is set, so this reflects that.
      Signed-off-by: NDennis Zhou <dennis@kernel.org>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a7ccb255
  6. 16 10月, 2019 1 次提交
    • Q
      btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() · 8702ba93
      Qu Wenruo 提交于
      [Background]
      Btrfs qgroup uses two types of reserved space for METADATA space,
      PERTRANS and PREALLOC.
      
      PERTRANS is metadata space reserved for each transaction started by
      btrfs_start_transaction().
      While PREALLOC is for delalloc, where we reserve space before joining a
      transaction, and finally it will be converted to PERTRANS after the
      writeback is done.
      
      [Inconsistency]
      However there is inconsistency in how we handle PREALLOC metadata space.
      
      The most obvious one is:
      In btrfs_buffered_write():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
      
      We always free qgroup PREALLOC meta space.
      
      While in btrfs_truncate_block():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
      
      We only free qgroup PREALLOC meta space when something went wrong.
      
      [The Correct Behavior]
      The correct behavior should be the one in btrfs_buffered_write(), we
      should always free PREALLOC metadata space.
      
      The reason is, the btrfs_delalloc_* mechanism works by:
      - Reserve metadata first, even it's not necessary
        In btrfs_delalloc_reserve_metadata()
      
      - Free the unused metadata space
        Normally in:
        btrfs_delalloc_release_extents()
        |- btrfs_inode_rsv_release()
           Here we do calculation on whether we should release or not.
      
      E.g. for 64K buffered write, the metadata rsv works like:
      
      /* The first page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=0
      total:		num_bytes=calc_inode_reservations()
      /* The first page caused one outstanding extent, thus needs metadata
         rsv */
      
      /* The 2nd page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed
      /* The 2nd page doesn't cause new outstanding extent, needs no new meta
         rsv, so we free what we have reserved */
      
      /* The 3rd~16th pages */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed (still space for one outstanding extent)
      
      This means, if btrfs_delalloc_release_extents() determines to free some
      space, then those space should be freed NOW.
      So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() other
      than btrfs_qgroup_convert_reserved_meta().
      
      The good news is:
      - The callers are not that hot
        The hottest caller is in btrfs_buffered_write(), which is already
        fixed by commit 336a8bb8 ("btrfs: Fix wrong
        btrfs_delalloc_release_extents parameter"). Thus it's not that
        easy to cause false EDQUOT.
      
      - The trans commit in advance for qgroup would hide the bug
        Since commit f5fef459 ("btrfs: qgroup: Make qgroup async transaction
        commit more aggressive"), when btrfs qgroup metadata free space is slow,
        it will try to commit transaction and free the wrongly converted
        PERTRANS space, so it's not that easy to hit such bug.
      
      [FIX]
      So to fix the problem, remove the @qgroup_free parameter for
      btrfs_delalloc_release_extents(), and always pass true to
      btrfs_inode_rsv_release().
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8702ba93
  7. 09 9月, 2019 6 次提交
    • J
      btrfs: rename the btrfs_calc_*_metadata_size helpers · 2bd36e7b
      Josef Bacik 提交于
      btrfs_calc_trunc_metadata_size differs from trans_metadata_size in that
      it doesn't take into account any splitting at the levels, because
      truncate will never split nodes.  However truncate _and_ changing will
      never split nodes, so rename btrfs_calc_trunc_metadata_size to
      btrfs_calc_metadata_size.  Also btrfs_calc_trans_metadata_size is purely
      for inserting items, so rename this to btrfs_calc_insert_metadata_size.
      Making these clearer will help when I start using them differently in
      upcoming patches.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2bd36e7b
    • F
      Btrfs: wake up inode cache waiters sooner to reduce waiting time · 32e53440
      Filipe Manana 提交于
      If we need to start an inode caching thread, because none currently exists
      on disk, we can wake up all waiters as soon as we mark the range starting
      at root's highest objectid + 1 and ending at BTRFS_LAST_FREE_OBJECTID as
      free, so that they don't need to wait for the caching thread to start and
      do some progress. We follow the same approach within the caching thread,
      since as soon as it finds a free range and marks it as free space in the
      cache, it wakes up all waiters. So improve this by adding such a wakeup
      call after marking that initial range as free space.
      
      Fixes: a47d6b70 ("Btrfs: setup free ino caching in a more asynchronous way")
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      32e53440
    • F
      Btrfs: fix inode cache waiters hanging on path allocation failure · 9d123a35
      Filipe Manana 提交于
      If the caching thread fails to allocate a path, it returns without waking
      up any cache waiters, leaving them hang forever. Fix this by following the
      same approach as when we fail to start the caching thread: print an error
      message, disable inode caching and make the wakers fallback to non-caching
      mode behaviour (calling btrfs_find_free_objectid()).
      
      Fixes: 581bb050 ("Btrfs: Cache free inode numbers in memory")
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9d123a35
    • F
      Btrfs: fix inode cache waiters hanging on failure to start caching thread · a68ebe07
      Filipe Manana 提交于
      If we fail to start the inode caching thread, we print an error message
      and disable the inode cache, however we never wake up any waiters, so they
      hang forever waiting for the caching to finish. Fix this by waking them
      up and have them fallback to a call to btrfs_find_free_objectid().
      
      Fixes: e60efa84 ("Btrfs: avoid triggering bug_on() when we fail to start inode caching task")
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a68ebe07
    • F
      Btrfs: fix inode cache block reserve leak on failure to allocate data space · 29d47d00
      Filipe Manana 提交于
      If we failed to allocate the data extent(s) for the inode space cache, we
      were bailing out without releasing the previously reserved metadata. This
      was triggering the following warnings when unmounting a filesystem:
      
        $ cat -n fs/btrfs/inode.c
        (...)
        9268  void btrfs_destroy_inode(struct inode *inode)
        9269  {
        (...)
        9276          WARN_ON(BTRFS_I(inode)->block_rsv.reserved);
        9277          WARN_ON(BTRFS_I(inode)->block_rsv.size);
        (...)
        9281          WARN_ON(BTRFS_I(inode)->csum_bytes);
        9282          WARN_ON(BTRFS_I(inode)->defrag_bytes);
        (...)
      
      Several fstests test cases triggered this often, such as generic/083,
      generic/102, generic/172, generic/269 and generic/300 at least, producing
      stack traces like the following in dmesg/syslog:
      
        [82039.079546] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9276 btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.081543] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.081912] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.082673] RIP: 0010:btrfs_destroy_inode+0x203/0x270 [btrfs]
        (...)
        [82039.083913] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.084320] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.084736] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.085156] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.085578] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.086000] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.086416] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.086837] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.087253] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.087672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.088089] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.088504] Call Trace:
        [82039.088918]  destroy_inode+0x3b/0x70
        [82039.089340]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.089768]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.090183]  ? wait_for_completion+0x65/0x1a0
        [82039.090607]  close_ctree+0x172/0x370 [btrfs]
        [82039.091021]  generic_shutdown_super+0x6c/0x110
        [82039.091427]  kill_anon_super+0xe/0x30
        [82039.091832]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.092233]  deactivate_locked_super+0x3a/0x70
        [82039.092636]  cleanup_mnt+0x3b/0x80
        [82039.093039]  task_work_run+0x93/0xc0
        [82039.093457]  exit_to_usermode_loop+0xfa/0x100
        [82039.093856]  do_syscall_64+0x162/0x1d0
        [82039.094244]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.094634] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.095876] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.096290] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.096700] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.097110] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.097522] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.097937] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.098350] irq event stamp: 0
        [82039.098750] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.099150] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099545] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.099925] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.100292] ---[ end trace f2521afa616ddccc ]---
        [82039.100707] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9277 btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.103050] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.103428] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.104203] RIP: 0010:btrfs_destroy_inode+0x1ac/0x270 [btrfs]
        (...)
        [82039.105461] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010206
        [82039.105866] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.106270] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.106673] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.107078] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.107487] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.107894] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.108309] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.108723] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.109146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.109567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.109989] Call Trace:
        [82039.110405]  destroy_inode+0x3b/0x70
        [82039.110830]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.111257]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.111675]  ? wait_for_completion+0x65/0x1a0
        [82039.112101]  close_ctree+0x172/0x370 [btrfs]
        [82039.112519]  generic_shutdown_super+0x6c/0x110
        [82039.112988]  kill_anon_super+0xe/0x30
        [82039.113439]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.113861]  deactivate_locked_super+0x3a/0x70
        [82039.114278]  cleanup_mnt+0x3b/0x80
        [82039.114685]  task_work_run+0x93/0xc0
        [82039.115083]  exit_to_usermode_loop+0xfa/0x100
        [82039.115476]  do_syscall_64+0x162/0x1d0
        [82039.115863]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.116254] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.117463] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.117882] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.118330] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.118743] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.119159] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.119574] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.119987] irq event stamp: 0
        [82039.120387] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.120787] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121182] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.121563] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.121933] ---[ end trace f2521afa616ddccd ]---
        [82039.122353] WARNING: CPU: 2 PID: 13167 at fs/btrfs/inode.c:9278 btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.124606] CPU: 2 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.125008] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.125801] RIP: 0010:btrfs_destroy_inode+0x1bc/0x270 [btrfs]
        (...)
        [82039.126998] RSP: 0018:ffffac0b426a7d30 EFLAGS: 00010202
        [82039.127399] RAX: ffff8ddf77691158 RBX: ffff8dde29b34660 RCX: 0000000000000002
        [82039.127803] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8dde29b34660
        [82039.128206] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.128611] R10: ffffac0b426a7c90 R11: ffffffffb9aad768 R12: ffffac0b426a7db0
        [82039.129020] R13: ffff8ddf5fbec0a0 R14: dead000000000100 R15: 0000000000000000
        [82039.129428] FS:  00007f8db96d12c0(0000) GS:ffff8de036b00000(0000) knlGS:0000000000000000
        [82039.129846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.130261] CR2: 0000000001416108 CR3: 00000002315cc001 CR4: 00000000003606e0
        [82039.130684] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.131142] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.131561] Call Trace:
        [82039.131990]  destroy_inode+0x3b/0x70
        [82039.132417]  btrfs_free_fs_root+0x16/0xa0 [btrfs]
        [82039.132844]  btrfs_free_fs_roots+0xd8/0x160 [btrfs]
        [82039.133262]  ? wait_for_completion+0x65/0x1a0
        [82039.133688]  close_ctree+0x172/0x370 [btrfs]
        [82039.134157]  generic_shutdown_super+0x6c/0x110
        [82039.134575]  kill_anon_super+0xe/0x30
        [82039.134997]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.135415]  deactivate_locked_super+0x3a/0x70
        [82039.135832]  cleanup_mnt+0x3b/0x80
        [82039.136239]  task_work_run+0x93/0xc0
        [82039.136637]  exit_to_usermode_loop+0xfa/0x100
        [82039.137029]  do_syscall_64+0x162/0x1d0
        [82039.137418]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.137812] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.139059] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.139475] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.139890] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.140302] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.140719] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.141138] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.141597] irq event stamp: 0
        [82039.142043] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.142443] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.142839] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.143220] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.143588] ---[ end trace f2521afa616ddcce ]---
        [82039.167472] WARNING: CPU: 3 PID: 13167 at fs/btrfs/extent-tree.c:10120 btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.173800] CPU: 3 PID: 13167 Comm: umount Tainted: G        W         5.2.0-rc4-btrfs-next-50 #1
        [82039.174847] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
        [82039.177031] RIP: 0010:btrfs_free_block_groups+0x30d/0x460 [btrfs]
        (...)
        [82039.180397] RSP: 0018:ffffac0b426a7dd8 EFLAGS: 00010206
        [82039.181574] RAX: ffff8de010a1db40 RBX: ffff8de010a1db40 RCX: 0000000000170014
        [82039.182711] RDX: ffff8ddff4380040 RSI: ffff8de010a1da58 RDI: 0000000000000246
        [82039.183817] RBP: ffff8ddf5fbec000 R08: 0000000000000000 R09: 0000000000000000
        [82039.184925] R10: ffff8de036404380 R11: ffffffffb8a5ea00 R12: ffff8de010a1b2b8
        [82039.186090] R13: ffff8de010a1b2b8 R14: 0000000000000000 R15: dead000000000100
        [82039.187208] FS:  00007f8db96d12c0(0000) GS:ffff8de036b80000(0000) knlGS:0000000000000000
        [82039.188345] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [82039.189481] CR2: 00007fb044005170 CR3: 00000002315cc006 CR4: 00000000003606e0
        [82039.190674] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [82039.191829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [82039.192978] Call Trace:
        [82039.194160]  close_ctree+0x19a/0x370 [btrfs]
        [82039.195315]  generic_shutdown_super+0x6c/0x110
        [82039.196486]  kill_anon_super+0xe/0x30
        [82039.197645]  btrfs_kill_super+0x12/0xa0 [btrfs]
        [82039.198696]  deactivate_locked_super+0x3a/0x70
        [82039.199619]  cleanup_mnt+0x3b/0x80
        [82039.200559]  task_work_run+0x93/0xc0
        [82039.201505]  exit_to_usermode_loop+0xfa/0x100
        [82039.202436]  do_syscall_64+0x162/0x1d0
        [82039.203339]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [82039.204091] RIP: 0033:0x7f8db8fbab37
        (...)
        [82039.206360] RSP: 002b:00007ffdce35b468 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        [82039.207132] RAX: 0000000000000000 RBX: 0000560d20b00060 RCX: 00007f8db8fbab37
        [82039.207906] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000560d20b00240
        [82039.208621] RBP: 0000560d20b00240 R08: 0000560d20b00270 R09: 0000000000000015
        [82039.209285] R10: 00000000000006b4 R11: 0000000000000246 R12: 00007f8db94bce64
        [82039.209984] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffdce35b6f0
        [82039.210642] irq event stamp: 0
        [82039.211306] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [82039.211971] hardirqs last disabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.212643] softirqs last  enabled at (0): [<ffffffffb7884ff2>] copy_process.part.33+0x7f2/0x1f00
        [82039.213304] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [82039.213875] ---[ end trace f2521afa616ddccf ]---
      
      Fix this by releasing the reserved metadata on failure to allocate data
      extent(s) for the inode cache.
      
      Fixes: 69fe2d75 ("btrfs: make the delalloc block rsv per inode")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      29d47d00
    • F
      Btrfs: fix hang when loading existing inode cache off disk · 7764d56b
      Filipe Manana 提交于
      If we are able to load an existing inode cache off disk, we set the state
      of the cache to BTRFS_CACHE_FINISHED, but we don't wake up any one waiting
      for the cache to be available. This means that anyone waiting for the
      cache to be available, waiting on the condition that either its state is
      BTRFS_CACHE_FINISHED or its available free space is greather than zero,
      can hang forever.
      
      This could be observed running fstests with MOUNT_OPTIONS="-o inode_cache",
      in particular test case generic/161 triggered it very frequently for me,
      producing a trace like the following:
      
        [63795.739712] BTRFS info (device sdc): enabling inode map caching
        [63795.739714] BTRFS info (device sdc): disk space caching is enabled
        [63795.739716] BTRFS info (device sdc): has skinny extents
        [64036.653886] INFO: task btrfs-transacti:3917 blocked for more than 120 seconds.
        [64036.654079]       Not tainted 5.2.0-rc4-btrfs-next-50 #1
        [64036.654143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [64036.654232] btrfs-transacti D    0  3917      2 0x80004000
        [64036.654239] Call Trace:
        [64036.654258]  ? __schedule+0x3ae/0x7b0
        [64036.654271]  schedule+0x3a/0xb0
        [64036.654325]  btrfs_commit_transaction+0x978/0xae0 [btrfs]
        [64036.654339]  ? remove_wait_queue+0x60/0x60
        [64036.654395]  transaction_kthread+0x146/0x180 [btrfs]
        [64036.654450]  ? btrfs_cleanup_transaction+0x620/0x620 [btrfs]
        [64036.654456]  kthread+0x103/0x140
        [64036.654464]  ? kthread_create_worker_on_cpu+0x70/0x70
        [64036.654476]  ret_from_fork+0x3a/0x50
        [64036.654504] INFO: task xfs_io:3919 blocked for more than 120 seconds.
        [64036.654568]       Not tainted 5.2.0-rc4-btrfs-next-50 #1
        [64036.654617] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [64036.654685] xfs_io          D    0  3919   3633 0x00000000
        [64036.654691] Call Trace:
        [64036.654703]  ? __schedule+0x3ae/0x7b0
        [64036.654716]  schedule+0x3a/0xb0
        [64036.654756]  btrfs_find_free_ino+0xa9/0x120 [btrfs]
        [64036.654764]  ? remove_wait_queue+0x60/0x60
        [64036.654809]  btrfs_create+0x72/0x1f0 [btrfs]
        [64036.654822]  lookup_open+0x6bc/0x790
        [64036.654849]  path_openat+0x3bc/0xc00
        [64036.654854]  ? __lock_acquire+0x331/0x1cb0
        [64036.654869]  do_filp_open+0x99/0x110
        [64036.654884]  ? __alloc_fd+0xee/0x200
        [64036.654895]  ? do_raw_spin_unlock+0x49/0xc0
        [64036.654909]  ? do_sys_open+0x132/0x220
        [64036.654913]  do_sys_open+0x132/0x220
        [64036.654926]  do_syscall_64+0x60/0x1d0
        [64036.654933]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix this by adding a wake_up() call right after setting the cache state to
      BTRFS_CACHE_FINISHED, at start_caching(), when we are able to load the
      cache from disk.
      
      Fixes: 82d5902d ("Btrfs: Support reading/writing on disk free ino cache")
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7764d56b
  8. 04 7月, 2019 1 次提交
  9. 06 8月, 2018 2 次提交
  10. 12 4月, 2018 1 次提交
  11. 31 3月, 2018 1 次提交
    • Q
      btrfs: qgroup: Use separate meta reservation type for delalloc · 43b18595
      Qu Wenruo 提交于
      Before this patch, btrfs qgroup is mixing per-transcation meta rsv with
      preallocated meta rsv, making it quite easy to underflow qgroup meta
      reservation.
      
      Since we have the new qgroup meta rsv types, apply it to delalloc
      reservation.
      
      Now for delalloc, most of its reserved space will use META_PREALLOC qgroup
      rsv type.
      
      And for callers reducing outstanding extent like btrfs_finish_ordered_io(),
      they will convert corresponding META_PREALLOC reservation to
      META_PERTRANS.
      
      This is mainly due to the fact that current qgroup numbers will only be
      updated in btrfs_commit_transaction(), that's to say if we don't keep
      such placeholder reservation, we can exceed qgroup limitation.
      
      And for callers freeing outstanding extent in error handler, we will
      just free META_PREALLOC bytes.
      
      This behavior makes callers of btrfs_qgroup_release_meta() or
      btrfs_qgroup_convert_meta() to be aware of which type they are.
      So in this patch, btrfs_delalloc_release_metadata() and its callers get
      an extra parameter to info qgroup to do correct meta convert/release.
      
      The good news is, even we use the wrong type (convert or free), it won't
      cause obvious bug, as prealloc type is always in good shape, and the
      type only affects how per-trans meta is increased or not.
      
      So the worst case will be at most metadata limitation can be sometimes
      exceeded (no convert at all) or metadata limitation is reached too soon
      (no free at all).
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      43b18595
  12. 02 11月, 2017 1 次提交
    • J
      Btrfs: rework outstanding_extents · 8b62f87b
      Josef Bacik 提交于
      Right now we do a lot of weird hoops around outstanding_extents in order
      to keep the extent count consistent.  This is because we logically
      transfer the outstanding_extent count from the initial reservation
      through the set_delalloc_bits.  This makes it pretty difficult to get a
      handle on how and when we need to mess with outstanding_extents.
      
      Fix this by revamping the rules of how we deal with outstanding_extents.
      Now instead everybody that is holding on to a delalloc extent is
      required to increase the outstanding extents count for itself.  This
      means we'll have something like this
      
      btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
       btrfs_set_extent_delalloc	- outstanding_extents = 2
      btrfs_release_delalloc_extents	- outstanding_extents = 1
      
      for an initial file write.  Now take the append write where we extend an
      existing delalloc range but still under the maximum extent size
      
      btrfs_delalloc_reserve_metadata - outstanding_extents = 2
        btrfs_set_extent_delalloc
          btrfs_set_bit_hook		- outstanding_extents = 3
          btrfs_merge_extent_hook	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extnets = 1
      
      In order to make the ordered extent transition we of course must now
      make ordered extents carry their own outstanding_extent reservation, so
      for cow_file_range we end up with
      
      btrfs_add_ordered_extent	- outstanding_extents = 2
      clear_extent_bit		- outstanding_extents = 1
      btrfs_remove_ordered_extent	- outstanding_extents = 0
      
      This makes all manipulations of outstanding_extents much more explicit.
      Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
      combined with btrfs_release_delalloc_extents, even in the error case, as
      that is the only function that actually modifies the
      outstanding_extents counter.
      
      The drawback to this is now we are much more likely to have transient
      cases where outstanding_extents is much larger than it actually should
      be.  This could happen before as we manipulated the delalloc bits, but
      now it happens basically at every write.  This may put more pressure on
      the ENOSPC flushing code, but I think making this code simpler is worth
      the cost.  I have another change coming to mitigate this side-effect
      somewhat.
      
      I also added trace points for the counter manipulation.  These were used
      by a bpf script I wrote to help track down leak issues.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8b62f87b
  13. 30 6月, 2017 1 次提交
  14. 28 2月, 2017 1 次提交
  15. 17 2月, 2017 1 次提交
  16. 06 12月, 2016 3 次提交
  17. 27 9月, 2016 1 次提交
  18. 25 8月, 2016 1 次提交
    • W
      btrfs: update btrfs_space_info's bytes_may_use timely · 18513091
      Wang Xiaoguang 提交于
      This patch can fix some false ENOSPC errors, below test script can
      reproduce one false ENOSPC error:
      	#!/bin/bash
      	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
      	dev=$(losetup --show -f fs.img)
      	mkfs.btrfs -f -M $dev
      	mkdir /tmp/mntpoint
      	mount $dev /tmp/mntpoint
      	cd /tmp/mntpoint
      	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
      
      Above script will fail for ENOSPC reason, but indeed fs still has free
      space to satisfy this request. Please see call graph:
      btrfs_fallocate()
      |-> btrfs_alloc_data_chunk_ondemand()
      |   bytes_may_use += 64M
      |-> btrfs_prealloc_file_range()
          |-> btrfs_reserve_extent()
              |-> btrfs_add_reserved_bytes()
              |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
              |   change bytes_may_use, and bytes_reserved += 64M. Now
              |   bytes_may_use + bytes_reserved == 128M, which is greater
              |   than btrfs_space_info's total_bytes, false enospc occurs.
              |   Note, the bytes_may_use decrease operation will be done in
              |   end of btrfs_fallocate(), which is too late.
      
      Here is another simple case for buffered write:
                          CPU 1              |              CPU 2
                                             |
      |-> cow_file_range()                   |-> __btrfs_buffered_write()
          |-> btrfs_reserve_extent()         |   |
          |                                  |   |
          |                                  |   |
          |    .....                         |   |-> btrfs_check_data_free_space()
          |                                  |
          |                                  |
          |-> extent_clear_unlock_delalloc() |
      
      In CPU 1, btrfs_reserve_extent()->find_free_extent()->
      btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
      operation will be delayed to be done in extent_clear_unlock_delalloc().
      Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
      btrfs_check_data_free_space() tries to reserve 100MB data space.
      If
      	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
      		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
      		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
      btrfs_check_data_free_space() will try to allcate new data chunk or call
      btrfs_start_delalloc_roots(), or commit current transaction in order to
      reserve some free space, obviously a lot of work. But indeed it's not
      necessary as long as decreasing bytes_may_use timely, we still have
      free space, decreasing 128M from bytes_may_use.
      
      To fix this issue, this patch chooses to update bytes_may_use for both
      data and metadata in btrfs_add_reserved_bytes(). For compress path, real
      extent length may not be equal to file content length, so introduce a
      ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
      btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
      file content length. Then compress path can update bytes_may_use
      correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
      and RESERVE_FREE.
      
      As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
      run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
      PREALLOC, we also need to update bytes_may_use, but can not pass
      EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
      here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
      to update btrfs_space_info's bytes_may_use.
      
      Meanwhile __btrfs_prealloc_file_range() will call
      btrfs_free_reserved_data_space() internally for both sucessful and failed
      path, btrfs_prealloc_file_range()'s callers does not need to call
      btrfs_free_reserved_data_space() any more.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18513091
  19. 26 7月, 2016 2 次提交
  20. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  21. 12 3月, 2016 1 次提交
  22. 16 1月, 2016 1 次提交
    • C
      Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots · f32e48e9
      Chandan Rajendra 提交于
      The following call trace is seen when btrfs/031 test is executed in a loop,
      
      [  158.661848] ------------[ cut here ]------------
      [  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
      [  158.664102] BTRFS: Transaction aborted (error -2)
      [  158.664774] Modules linked in:
      [  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711af #2
      [  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
      [  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
      [  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
      [  158.670769] Call Trace:
      [  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
      [  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
      [  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
      [  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
      [  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
      [  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
      [  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
      [  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
      [  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
      [  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
      [  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
      [  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
      [  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
      [  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
      [  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
      [  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
      [  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
      [  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
      [  158.709508] BTRFS info (device loop0): forced readonly
      [  158.737113] BTRFS info (device loop0): disk space caching is enabled
      [  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
      [  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30
      
      This occurs because,
      
      Mount filesystem
      Create subvol with ID 257
      Unmount filesystem
      Mount filesystem
      Delete subvol with ID 257
        btrfs_drop_snapshot()
          Add root corresponding to subvol 257 into
          btrfs_transaction->dropped_roots list
      Create new subvol (i.e. create_subvol())
        257 is returned as the next free objectid
        btrfs_read_fs_root_no_name()
          Finds the btrfs_root instance corresponding to the old subvol with ID 257
          in btrfs_fs_info->fs_roots_radix.
          Returns error since btrfs_root_item->refs has the value of 0.
      
      To fix the issue the commit initializes tree root's and subvolume root's
      highest_objectid when loading the roots from disk.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f32e48e9
  23. 07 1月, 2016 3 次提交
  24. 22 10月, 2015 2 次提交
  25. 01 7月, 2015 2 次提交
    • F
      Btrfs: fix race between caching kthread and returning inode to inode cache · ae9d8f17
      Filipe Manana 提交于
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up doing freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ae9d8f17
    • F
      Btrfs: use kmem_cache_free when freeing entry in inode cache · c3f4a168
      Filipe Manana 提交于
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. Looking at the kfree() definition at
      mm/slab.c it has the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      c3f4a168
  26. 11 4月, 2015 1 次提交
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
  27. 03 12月, 2014 1 次提交
    • F
      Btrfs: fix race between writing free space cache and trimming · 55507ce3
      Filipe Manana 提交于
      Trimming is completely transactionless, and the way it operates consists
      of hiding free space entries from a block group, perform the trim/discard
      and then make the free space entries visible again.
      Therefore while a free space entry is being trimmed, we can have free space
      cache writing running in parallel (as part of a transaction commit) which
      will miss the free space entry. This means that an unmount (or crash/reboot)
      after that transaction commit and mount again before another transaction
      starts/commits after the discard finishes, we will have some free space
      that won't be used again unless the free space cache is rebuilt. After the
      unmount, fsck (btrfsck, btrfs check) reports the issue like the following
      example:
      
              *** fsck.btrfs output ***
              checking extents
              checking free space cache
              There is no free space entry for 521764864-521781248
              There is no free space entry for 521764864-1103101952
              cache appears valid but isnt 29360128
              Checking filesystem on /dev/sdc
              UUID: b4789e27-4774-4626-98e9-ae8dfbfb0fb5
              found 1235681286 bytes used err is -22
              (...)
      
      Another issue caused by this race is a crash while writing bitmap entries
      to the cache, because while the cache writeout task accesses the bitmaps,
      the trim task can be concurrently modifying the bitmap or worse might
      be freeing the bitmap. The later case results in the following crash:
      
      [55650.804460] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [55650.804835] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop parport_pc parport i2c_piix4 psmouse evdev pcspkr microcode processor i2ccore serio_raw thermal_sys button ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif sr_mod cdrom crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 [last unloaded: btrfs]
      [55650.806169] CPU: 1 PID: 31002 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
      [55650.806493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [55650.806867] task: ffff8800b12f6410 ti: ffff880071538000 task.ti: ffff880071538000
      [55650.807166] RIP: 0010:[<ffffffffa037cf45>]  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.807514] RSP: 0018:ffff88007153bc30  EFLAGS: 00010246
      [55650.807687] RAX: 000000005d1ec000 RBX: ffff8800a665df08 RCX: 0000000000000400
      [55650.807885] RDX: ffff88005d1ec000 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88005d1ec000
      [55650.808017] RBP: ffff88007153bc58 R08: 00000000ddd51536 R09: 00000000000001e0
      [55650.808017] R10: 0000000000000000 R11: 0000000000000037 R12: 6b6b6b6b6b6b6b6b
      [55650.808017] R13: ffff88007153bca8 R14: 6b6b6b6b6b6b6b6b R15: ffff88007153bc98
      [55650.808017] FS:  0000000000000000(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
      [55650.808017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [55650.808017] CR2: 0000000002273b88 CR3: 00000000b18f6000 CR4: 00000000000006e0
      [55650.808017] Stack:
      [55650.808017]  ffff88020e834e00 ffff880172d68db0 0000000000000000 ffff88019257c800
      [55650.808017]  ffff8801d42ea720 ffff88007153bd10 ffffffffa037d2fa ffff880224e99180
      [55650.808017]  ffff8801469a6188 ffff880224e99140 ffff880172d68c50 00000003000000b7
      [55650.808017] Call Trace:
      [55650.808017]  [<ffffffffa037d2fa>] __btrfs_write_out_cache+0x1ea/0x37f [btrfs]
      [55650.808017]  [<ffffffffa037d959>] btrfs_write_out_cache+0xa1/0xd8 [btrfs]
      [55650.808017]  [<ffffffffa033936b>] btrfs_write_dirty_block_groups+0x4b5/0x505 [btrfs]
      [55650.808017]  [<ffffffffa03aa98e>] commit_cowonly_roots+0x15e/0x1f7 [btrfs]
      [55650.808017]  [<ffffffff813eb9c7>] ? _raw_spin_lock+0xe/0x10
      [55650.808017]  [<ffffffffa0346e46>] btrfs_commit_transaction+0x411/0x882 [btrfs]
      [55650.808017]  [<ffffffffa03432a4>] transaction_kthread+0xf2/0x1a4 [btrfs]
      [55650.808017]  [<ffffffffa03431b2>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
      [55650.808017]  [<ffffffff8105966b>] kthread+0xb7/0xbf
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017] Code: 4c 89 ef 8d 70 ff e8 d4 fc ff ff 41 8b 45 34 41 39 45 30 7d 5c 31 f6 4c 89 ef e8 80 f6 ff ff 49 8b 7d 00 4c 89 f6 b9 00 04 00 00 <f3> a5 4c 89 ef 41 8b 45 30 8d 70 ff e8 a3 fc ff ff 41 8b 45 34
      [55650.808017] RIP  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.808017]  RSP <ffff88007153bc30>
      [55650.815725] ---[ end trace 1c032e96b149ff86 ]---
      
      Fix this by serializing both tasks in such a way that cache writeout
      doesn't wait for the trim/discard of free space entries to finish and
      doesn't miss any free space entry.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      55507ce3