1. 08 12月, 2020 10 次提交
    • J
      btrfs: use btrfs_tree_read_lock in btrfs_search_slot · fe596ca3
      Josef Bacik 提交于
      We no longer use recursion, so
      __btrfs_tree_read_lock(BTRFS_NESTING_NORMAL) == btrfs_tree_read_lock.
      Replace this call with the simple helper.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fe596ca3
    • J
      btrfs: merge back btrfs_read_lock_root_node helpers · 1bb96598
      Josef Bacik 提交于
      We no longer have recursive locking and there's no need for separate
      helpers that allowed the transition to rwsem with minimal code changes.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1bb96598
    • J
      btrfs: remove btrfs_path::recurse · 2f5239dc
      Josef Bacik 提交于
      With my async free space cache loading patches ("btrfs: load free space
      cache asynchronously") we no longer have a user of path->recurse and can
      remove it.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2f5239dc
    • J
      btrfs: unlock to current level in btrfs_next_old_leaf · 0e46318d
      Josef Bacik 提交于
      Filipe reported the following lockdep splat
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.10.0-rc2-btrfs-next-71 #1 Not tainted
        ------------------------------------------------------
        find/324157 is trying to acquire lock:
        ffff8ebc48d293a0 (btrfs-tree-01#2/3){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
      
        but task is already holding lock:
        ffff8eb9932c5088 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (btrfs-tree-00){++++}-{3:3}:
      	 lock_acquire+0xd8/0x490
      	 down_write_nested+0x44/0x120
      	 __btrfs_tree_lock+0x27/0x120 [btrfs]
      	 btrfs_search_slot+0x2a3/0xc50 [btrfs]
      	 btrfs_insert_empty_items+0x58/0xa0 [btrfs]
      	 insert_with_overflow+0x44/0x110 [btrfs]
      	 btrfs_insert_xattr_item+0xb8/0x1d0 [btrfs]
      	 btrfs_setxattr+0xd6/0x4c0 [btrfs]
      	 btrfs_setxattr_trans+0x68/0x100 [btrfs]
      	 __vfs_setxattr+0x66/0x80
      	 __vfs_setxattr_noperm+0x70/0x200
      	 vfs_setxattr+0x6b/0x120
      	 setxattr+0x125/0x240
      	 path_setxattr+0xba/0xd0
      	 __x64_sys_setxattr+0x27/0x30
      	 do_syscall_64+0x33/0x80
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (btrfs-tree-01#2/3){++++}-{3:3}:
      	 check_prev_add+0x91/0xc60
      	 __lock_acquire+0x1689/0x3130
      	 lock_acquire+0xd8/0x490
      	 down_read_nested+0x45/0x220
      	 __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
      	 btrfs_next_old_leaf+0x27d/0x580 [btrfs]
      	 btrfs_real_readdir+0x1e3/0x4b0 [btrfs]
      	 iterate_dir+0x170/0x1c0
      	 __x64_sys_getdents64+0x83/0x140
      	 do_syscall_64+0x33/0x80
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(btrfs-tree-00);
      				 lock(btrfs-tree-01#2/3);
      				 lock(btrfs-tree-00);
          lock(btrfs-tree-01#2/3);
      
         *** DEADLOCK ***
      
        5 locks held by find/324157:
         #0: ffff8ebc502c6e00 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x4d/0x60
         #1: ffff8eb97f689980 (&type->i_mutex_dir_key#10){++++}-{3:3}, at: iterate_dir+0x52/0x1c0
         #2: ffff8ebaec00ca58 (btrfs-tree-02#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
         #3: ffff8eb98f986f78 (btrfs-tree-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
         #4: ffff8eb9932c5088 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
      
        stack backtrace:
        CPU: 2 PID: 324157 Comm: find Not tainted 5.10.0-rc2-btrfs-next-71 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         check_noncircular+0xff/0x110
         ? mark_lock.part.0+0x468/0xe90
         check_prev_add+0x91/0xc60
         __lock_acquire+0x1689/0x3130
         ? kvm_clock_read+0x14/0x30
         ? kvm_sched_clock_read+0x5/0x10
         lock_acquire+0xd8/0x490
         ? __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
         down_read_nested+0x45/0x220
         ? __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
         __btrfs_tree_read_lock+0x32/0x1a0 [btrfs]
         btrfs_next_old_leaf+0x27d/0x580 [btrfs]
         btrfs_real_readdir+0x1e3/0x4b0 [btrfs]
         iterate_dir+0x170/0x1c0
         __x64_sys_getdents64+0x83/0x140
         ? filldir+0x1d0/0x1d0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This happens because btrfs_next_old_leaf searches down to our current
      key, and then walks up the path until we can move to the next slot, and
      then reads back down the path so we get the next leaf.
      
      However it doesn't unlock any lower levels until it replaces them with
      the new extent buffer.  This is technically fine, but of course causes
      lockdep to complain, because we could be holding locks on lower levels
      while locking upper levels.
      
      Fix this by dropping all nodes below the level that we use as our new
      starting point before we start reading back down the path.  This also
      allows us to drop the nested/recursive locking magic, because we're no
      longer locking two nodes at the same level anymore.
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0e46318d
    • J
      btrfs: cleanup the locking in btrfs_next_old_leaf · ffeb03cf
      Josef Bacik 提交于
      We are carrying around this next_rw_lock from when we would do spinning
      vs blocking read locks.  Now that we have the rwsem locking we can
      simply use the read lock flag unconditionally and the read lock helpers.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ffeb03cf
    • J
      btrfs: pass root owner to read_tree_block · 1b7ec85e
      Josef Bacik 提交于
      In order to properly set the lockdep class of a newly allocated block we
      need to know the owner of the block.  For non-refcounted trees this is
      straightforward, we always know in advance what tree we're reading from.
      For refcounted trees we don't necessarily know, however all refcounted
      trees share the same lockdep class name, tree-<level>.
      
      Fix all the callers of read_tree_block() to pass in the root objectid
      we're using.  In places like relocation and backref we could probably
      unconditionally use 0, but just in case use the root when we have it,
      otherwise use 0 in the cases we don't have the root as it's going to be
      a refcounted tree anyway.
      
      This is a preparation patch for further changes.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1b7ec85e
    • J
      btrfs: use btrfs_read_node_slot in btrfs_realloc_node · 206983b7
      Josef Bacik 提交于
      We have this open-coded nightmare in btrfs_realloc_node that does
      the same thing that the normal read path does, which is to see if we
      have the eb in memory already, and if not read it, and verify the eb is
      uptodate.  Delete this open coding and simply use btrfs_read_node_slot.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      206983b7
    • J
      btrfs: cleanup extent buffer readahead · bfb484d9
      Josef Bacik 提交于
      We're going to pass around more information when we allocate extent
      buffers, in order to make that cleaner how we do readahead.  Most of the
      callers have the parent node that we're getting our blockptr from, with
      the sole exception of relocation which simply has the bytenr it wants to
      read.
      
      Add a helper that takes the current arguments that we need (bytenr and
      gen), and add another helper for simply reading the slot out of a node.
      In followup patches the helper that takes all the extra arguments will
      be expanded, and the simpler helper won't need to have it's arguments
      adjusted.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bfb484d9
    • J
      btrfs: locking: rip out path->leave_spinning · b9729ce0
      Josef Bacik 提交于
      We no longer distinguish between blocking and spinning, so rip out all
      this code.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b9729ce0
    • J
      btrfs: locking: remove all the blocking helpers · ac5887c8
      Josef Bacik 提交于
      Now that we're using a rw_semaphore we no longer need to indicate if a
      lock is blocking or not, nor do we need to flip the entire path from
      blocking to spinning.  Remove these helpers and all the places they are
      called.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ac5887c8
  2. 07 10月, 2020 16 次提交
    • J
      btrfs: cleanup cow block on error · 572c83ac
      Josef Bacik 提交于
      In fstest btrfs/064 a transaction abort in __btrfs_cow_block could lead
      to a system lockup. It gets stuck trying to write back inodes, and the
      write back thread was trying to lock an extent buffer:
      
        $ cat /proc/2143497/stack
        [<0>] __btrfs_tree_lock+0x108/0x250
        [<0>] lock_extent_buffer_for_io+0x35e/0x3a0
        [<0>] btree_write_cache_pages+0x15a/0x3b0
        [<0>] do_writepages+0x28/0xb0
        [<0>] __writeback_single_inode+0x54/0x5c0
        [<0>] writeback_sb_inodes+0x1e8/0x510
        [<0>] wb_writeback+0xcc/0x440
        [<0>] wb_workfn+0xd7/0x650
        [<0>] process_one_work+0x236/0x560
        [<0>] worker_thread+0x55/0x3c0
        [<0>] kthread+0x13a/0x150
        [<0>] ret_from_fork+0x1f/0x30
      
      This is because we got an error while COWing a block, specifically here
      
              if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) {
                      ret = btrfs_reloc_cow_block(trans, root, buf, cow);
                      if (ret) {
                              btrfs_abort_transaction(trans, ret);
                              return ret;
                      }
              }
      
        [16402.241552] BTRFS: Transaction aborted (error -2)
        [16402.242362] WARNING: CPU: 1 PID: 2563188 at fs/btrfs/ctree.c:1074 __btrfs_cow_block+0x376/0x540
        [16402.249469] CPU: 1 PID: 2563188 Comm: fsstress Not tainted 5.9.0-rc6+ #8
        [16402.249936] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        [16402.250525] RIP: 0010:__btrfs_cow_block+0x376/0x540
        [16402.252417] RSP: 0018:ffff9cca40e578b0 EFLAGS: 00010282
        [16402.252787] RAX: 0000000000000025 RBX: 0000000000000002 RCX: ffff9132bbd19388
        [16402.253278] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9132bbd19380
        [16402.254063] RBP: ffff9132b41a49c0 R08: 0000000000000000 R09: 0000000000000000
        [16402.254887] R10: 0000000000000000 R11: ffff91324758b080 R12: ffff91326ef17ce0
        [16402.255694] R13: ffff91325fc0f000 R14: ffff91326ef176b0 R15: ffff9132815e2000
        [16402.256321] FS:  00007f542c6d7b80(0000) GS:ffff9132bbd00000(0000) knlGS:0000000000000000
        [16402.256973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [16402.257374] CR2: 00007f127b83f250 CR3: 0000000133480002 CR4: 0000000000370ee0
        [16402.257867] Call Trace:
        [16402.258072]  btrfs_cow_block+0x109/0x230
        [16402.258356]  btrfs_search_slot+0x530/0x9d0
        [16402.258655]  btrfs_lookup_file_extent+0x37/0x40
        [16402.259155]  __btrfs_drop_extents+0x13c/0xd60
        [16402.259628]  ? btrfs_block_rsv_migrate+0x4f/0xb0
        [16402.259949]  btrfs_replace_file_extents+0x190/0x820
        [16402.260873]  btrfs_clone+0x9ae/0xc00
        [16402.261139]  btrfs_extent_same_range+0x66/0x90
        [16402.261771]  btrfs_remap_file_range+0x353/0x3b1
        [16402.262333]  vfs_dedupe_file_range_one.part.0+0xd5/0x140
        [16402.262821]  vfs_dedupe_file_range+0x189/0x220
        [16402.263150]  do_vfs_ioctl+0x552/0x700
        [16402.263662]  __x64_sys_ioctl+0x62/0xb0
        [16402.264023]  do_syscall_64+0x33/0x40
        [16402.264364]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [16402.264862] RIP: 0033:0x7f542c7d15cb
        [16402.266901] RSP: 002b:00007ffd35944ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [16402.267627] RAX: ffffffffffffffda RBX: 00000000009d1968 RCX: 00007f542c7d15cb
        [16402.268298] RDX: 00000000009d2490 RSI: 00000000c0189436 RDI: 0000000000000003
        [16402.268958] RBP: 00000000009d2520 R08: 0000000000000036 R09: 00000000009d2e64
        [16402.269726] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
        [16402.270659] R13: 000000000001f000 R14: 00000000009d1970 R15: 00000000009d2e80
        [16402.271498] irq event stamp: 0
        [16402.271846] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        [16402.272497] hardirqs last disabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
        [16402.273343] softirqs last  enabled at (0): [<ffffffff910dbf59>] copy_process+0x6b9/0x1ba0
        [16402.273905] softirqs last disabled at (0): [<0000000000000000>] 0x0
        [16402.274338] ---[ end trace 737874a5a41a8236 ]---
        [16402.274669] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
        [16402.276179] BTRFS info (device dm-9): forced readonly
        [16402.277046] BTRFS: error (device dm-9) in btrfs_replace_file_extents:2723: errno=-2 No such entry
        [16402.278744] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
        [16402.279968] BTRFS: error (device dm-9) in __btrfs_cow_block:1074: errno=-2 No such entry
        [16402.280582] BTRFS info (device dm-9): balance: ended with status: -30
      
      The problem here is that as soon as we allocate the new block it is
      locked and marked dirty in the btree inode.  This means that we could
      attempt to writeback this block and need to lock the extent buffer.
      However we're not unlocking it here and thus we deadlock.
      
      Fix this by unlocking the cow block if we have any errors inside of
      __btrfs_cow_block, and also free it so we do not leak it.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      572c83ac
    • N
      btrfs: improve error message in setup_items_for_insert · 7269ddd2
      Nikolay Borisov 提交于
      Reword and update formats to match variable types.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ update formats ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7269ddd2
    • N
    • N
      btrfs: sink total_data parameter in setup_items_for_insert · fc0d82e1
      Nikolay Borisov 提交于
      That parameter can easily be derived based on the "data_size" and "nr"
      parameters exploit this fact to simply the function's signature. No
      functional changes.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fc0d82e1
    • N
      btrfs: eliminate total_size parameter from setup_items_for_insert · 3dc9dc89
      Nikolay Borisov 提交于
      The value of this argument can be derived from the total_data as it's
      simply the value of the data size + size of btrfs_items being touched.
      Move the parameter calculation inside the function. This results in a
      simpler interface and also a minor size reduction:
      
      ./scripts/bloat-o-meter ctree.original fs/btrfs/ctree.o
      add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-34 (-34)
      Function                                     old     new   delta
      btrfs_duplicate_item                         260     259      -1
      setup_items_for_insert                      1200    1190     -10
      btrfs_insert_empty_items                     177     154     -23
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      3dc9dc89
    • N
      btrfs: re-arrange statements in setup_items_for_insert · fc0716c2
      Nikolay Borisov 提交于
      Rearrange statements calculating the offset of the newly added items so
      that the calculation has to be done only once. No functional change.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fc0716c2
    • J
      btrfs: use BTRFS_NESTED_NEW_ROOT for double splits · ca9d473a
      Josef Bacik 提交于
      I've made this change separate since it requires both of the newly added
      NESTED flags and I didn't want to slip it into one of those changes.
      
      If we do a double split of a node we can end up doing a
      BTRFS_NESTED_SPLIT on level 0, which throws lockdep off because it
      appears as a double lock.  Since we're maxed out on subclasses, use
      BTRFS_NESTED_NEW_ROOT if we had to do a double split.  This is OK
      because we won't have to do a double split if we had to insert a new
      root, and the new root would be at a higher level anyway.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ca9d473a
    • J
      btrfs: introduce BTRFS_NESTING_NEW_ROOT for adding new roots · cf6f34aa
      Josef Bacik 提交于
      The way we add new roots is confusing from a locking perspective for
      lockdep.  We generally have the rule that we lock things in order from
      highest level to lowest, but in the case of adding a new level to the
      tree we actually allocate a new block for the root, which makes the
      locking go in reverse.  A similar issue exists for snapshotting, we cow
      the original root for the root of a new tree, however they're at the
      same level.  Address this by using BTRFS_NESTING_NEW_ROOT for these
      operations.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cf6f34aa
    • J
      btrfs: introduce BTRFS_NESTING_SPLIT for split blocks · 4dff97e6
      Josef Bacik 提交于
      If we are splitting a leaf/node, we could do something like the
      following
      
      lock(leaf)  BTRFS_NESTING_NORMAL
        lock(left) BTRFS_NESTING_LEFT + BTRFS_NESTING_COW
          push from leaf -> left
            reset path to point to left
              split left
                allocate new block, lock block BTRFS_NESTING_SPLIT
      
      at the new block point we need to have a different nesting level,
      because we have already used either BTRFS_NESTING_LEFT or
      BTRFS_NESTING_RIGHT when pushing items from the original leaf into the
      adjacent leaves.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      4dff97e6
    • J
      btrfs: introduce BTRFS_NESTING_LEFT/RIGHT_COW · bf59a5a2
      Josef Bacik 提交于
      For similar reasons as BTRFS_NESTING_COW, we need
      BTRFS_NESTING_LEFT/RIGHT_COW.  The pattern is this
      
      lock leaf -> BTRFS_NESTING_NORMAL
        cow leaf -> BTRFS_NESTING_COW
          split leaf
            lock left -> BTRFS_NESTING_LEFT
              cow left -> BTRFS_NESTING_LEFT_COW
      
      We need this in order to indicate to lockdep that these locks are
      discrete and are being taken in a safe order.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bf59a5a2
    • J
      btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT · bf77467a
      Josef Bacik 提交于
      Our lockdep maps are based on rootid+level, however in some cases we
      will lock adjacent blocks on the same level, namely in searching forward
      or in split/balance.  Because of this lockdep will complain, so we need
      a separate subclass to indicate to lockdep that these are different
      locks.
      
      lock leaf -> BTRFS_NESTING_NORMAL
        cow leaf -> BTRFS_NESTING_COW
          split leaf
             lock left -> BTRFS_NESTING_LEFT
             lock right -> BTRFS_NESTING_RIGHT
      
      The above graph illustrates the need for this new nesting subclass.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bf77467a
    • J
      btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks · 9631e4cc
      Josef Bacik 提交于
      When we COW a block we are holding a lock on the original block, and
      then we lock the new COW block.  Because our lockdep maps are based on
      root + level, this will make lockdep complain.  We need a way to
      indicate a subclass for locking the COW'ed block, so plumb through our
      btrfs_lock_nesting from btrfs_cow_block down to the btrfs_init_buffer,
      and then introduce BTRFS_NESTING_COW to be used for cow'ing blocks.
      
      The reason I've added all this extra infrastructure is because there
      will be need of different nesting classes in follow up patches.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9631e4cc
    • J
      btrfs: add nesting tags to the locking helpers · fd7ba1c1
      Josef Bacik 提交于
      We will need these when we switch to an rwsem, so plumb in the
      infrastructure here to use later on.  I violate the 80 character limit
      some here because it'll be cleaned up later.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      fd7ba1c1
    • J
      btrfs: introduce btrfs_path::recurse · 51899412
      Josef Bacik 提交于
      Our current tree locking stuff allows us to recurse with read locks if
      we're already holding the write lock.  This is necessary for the space
      cache inode, as we could be holding a lock on the root_tree root when we
      need to cache a block group, and thus need to be able to read down the
      root_tree to read in the inode cache.
      
      We can get away with this in our current locking, but we won't be able
      to with a rwsem.  Handle this by purposefully annotating the places
      where we require recursion, so that in the future we can maybe come up
      with a way to avoid the recursion.  In the case of the free space inode,
      this will be superseded by the free space tree.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      51899412
    • Q
      btrfs: ctree: check key order before merging tree blocks · d16c702f
      Qu Wenruo 提交于
      [BUG]
      With a crafted image, btrfs can panic at btrfs_del_csums():
      
        kernel BUG at fs/btrfs/ctree.c:3188!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 0 PID: 1156 Comm: btrfs-transacti Not tainted 5.0.0-rc8+ #9
        RIP: 0010:btrfs_set_item_key_safe+0x16c/0x180
        RSP: 0018:ffff976141257ab8 EFLAGS: 00010202
        RAX: 0000000000000001 RBX: ffff898a6b890930 RCX: 0000000004b70000
        RDX: 0000000000000000 RSI: ffff976141257bae RDI: ffff976141257acf
        RBP: ffff976141257b10 R08: 0000000000001000 R09: ffff9761412579a8
        R10: 0000000000000000 R11: 0000000000000000 R12: ffff976141257abe
        R13: 0000000000000003 R14: ffff898a6a8be578 R15: ffff976141257bae
        FS: 0000000000000000(0000) GS:ffff898a77a00000(0000) knlGS:0000000000000000
        CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f779d9cd624 CR3: 000000022b2b4006 CR4: 00000000000206f0
        Call Trace:
        truncate_one_csum+0xac/0xf0
        btrfs_del_csums+0x24f/0x3a0
        __btrfs_free_extent.isra.72+0x5a7/0xbe0
        __btrfs_run_delayed_refs+0x539/0x1120
        btrfs_run_delayed_refs+0xdb/0x1b0
        btrfs_commit_transaction+0x52/0x950
        ? start_transaction+0x94/0x450
        transaction_kthread+0x163/0x190
        kthread+0x105/0x140
        ? btrfs_cleanup_transaction+0x560/0x560
        ? kthread_destroy_worker+0x50/0x50
        ret_from_fork+0x35/0x40
        Modules linked in:
        ---[ end trace 93bf9db00e6c374e ]---
      
      [CAUSE]
      This crafted image has a tricky key order corruption:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
                ...
                key (EXTENT_CSUM EXTENT_CSUM 73785344) block 29757440 gen 19
                key (EXTENT_CSUM EXTENT_CSUM 77594624) block 29753344 gen 19
                ...
      
        leaf 29757440 items 5 free space 150 generation 19 owner CSUM_TREE
                item 0 key (EXTENT_CSUM EXTENT_CSUM 73785344) itemoff 2323 itemsize 1672
                        range start 73785344 end 75497472 length 1712128
                item 1 key (EXTENT_CSUM EXTENT_CSUM 75497472) itemoff 2319 itemsize 4
                        range start 75497472 end 75501568 length 4096
                item 2 key (EXTENT_CSUM EXTENT_CSUM 75501568) itemoff 579 itemsize 1740
                        range start 75501568 end 77283328 length 1781760
                item 3 key (EXTENT_CSUM EXTENT_CSUM 77283328) itemoff 575 itemsize 4
                        range start 77283328 end 77287424 length 4096
                item 4 key (EXTENT_CSUM EXTENT_CSUM 4120596480) itemoff 275 itemsize 300 <<<
                        range start 4120596480 end 4120903680 length 307200
        leaf 29753344 items 3 free space 1936 generation 19 owner CSUM_TREE
                item 0 key (18446744073457893366 EXTENT_CSUM 77594624) itemoff 2323 itemsize 1672
                        range start 77594624 end 79306752 length 1712128
                ...
      
      Note the item 4 key of leaf 29757440, which is obviously too large, and
      even larger than the first key of the next leaf.
      
      However it still follows the key order in that tree block, thus tree
      checker is unable to detect it at read time, since tree checker can only
      work inside one leaf, thus such complex corruption can't be detected in
      advance.
      
      [FIX]
      The next time to detect such problem is at tree block merge time,
      which is in push_node_left(), balance_node_right(), push_leaf_left() or
      push_leaf_right().
      
      Now we check if the key order of the right-most key of the left node is
      larger than the left-most key of the right node.
      
      By this we don't need to call the full tree-checker, while still keeping
      the key order correct as key order in each node is already checked by
      tree checker thus we only need to check the above two slots.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202833Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      d16c702f
    • R
      btrfs: delete duplicated words + other fixes in comments · 260db43c
      Randy Dunlap 提交于
      Delete repeated words in fs/btrfs/.
      {to, the, a, and old}
      and change "into 2 part" to "into 2 parts".
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      260db43c
  3. 27 8月, 2020 1 次提交
  4. 20 8月, 2020 1 次提交
  5. 27 7月, 2020 1 次提交
    • D
      btrfs: add little-endian optimized key helpers · ce6ef5ab
      David Sterba 提交于
      The CPU and on-disk keys are mapped to two different structures because
      of the endianness. There's an intermediate buffer used to do the
      conversion, but this is not necessary when CPU and on-disk endianness
      match.
      
      Add optimized versions of helpers that take disk_key and use the buffer
      directly for CPU keys or drop the intermediate buffer and conversion.
      
      This saves a lot of stack space accross many functions and removes about
      6K of generated binary code:
      
         text    data     bss     dec     hex filename
      1090439   17468   14912 1122819  112203 pre/btrfs.ko
      1084613   17456   14912 1116981  110b35 post/btrfs.ko
      
      Delta: -5826
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ce6ef5ab
  6. 02 7月, 2020 1 次提交
  7. 28 5月, 2020 2 次提交
  8. 25 5月, 2020 5 次提交
  9. 24 3月, 2020 3 次提交