1. 18 Apr, 2023 8 commits
    • C
      btrfs: raid56: no need for irqsafe locking · 74cc3600
      Christoph Hellwig committed
      These days all the operations that take locks in the raid56.c code are
      run from user context (mostly workqueues).  Drop all the irqsafe locking
      that is not required any more.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      74cc3600
    • J
      btrfs: abort the transaction if we get an error during snapshot drop · 9a93b5a3
      Josef Bacik committed
      We were seeing weird errors when we were testing our btrfs backports
      before we had the incorrect level check fix.  These errors appeared to
      be improper error handling, but error injection testing uncovered that
      the errors were a result of corruption that occurred from improper error
      handling during snapshot delete.
      
      With snapshot delete, if we encounter any errors during walk_down or
      walk_up we simply return an error; we don't abort the transaction.
      This is problematic because we will be dropping references for nodes and
      leaves along the way, and if we fail in the middle we will leave the
      file system corrupt because we don't know where we left off in the drop.
      
      Fix this by making sure we abort if we hit any errors during the walk
      down or walk up operations, as we have no idea what operations could
      have been left half done at this point.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      9a93b5a3
    • J
      btrfs: handle errors in walk_down_tree properly · 4e194384
      Josef Bacik committed
      We can get errors in walk_down_proc as we try and lookup extent info for
      the snapshot dropping to act on.  However if we get an error we simply
      return 1 which indicates we're done with walking down, which will lead
      us to improperly continue with the snapshot drop with the incorrect
      information.  Instead break if we get any error from walk_down_proc or
      do_walk_down, and handle the case of ret == 1 by returning 0, otherwise
      return the ret value that we have.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      4e194384
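      A minimal sketch of the return-value handling described in the commit
      above, written as a user-space model rather than the kernel code. The
      names walk_down_model, step_fn and the sample callbacks are hypothetical;
      each callback is assumed to return a negative errno on failure, 1 when
      walking down is finished, and 0 to continue.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback type standing in for walk_down_proc() and
 * do_walk_down(): <0 is an error, 1 means "done walking down", 0 means
 * "keep going". */
typedef int (*step_fn)(int level);

/* Model of the fixed loop: stop on any non-zero result from proc or any
 * error from descend, map 1 to 0, and pass real errors to the caller. */
int walk_down_model(int level, step_fn proc, step_fn descend)
{
	int ret = 0;

	assert(proc && descend);
	while (level >= 0) {
		ret = proc(level);
		if (ret)		/* error or finished */
			break;
		if (level == 0)		/* reached a leaf */
			break;
		ret = descend(level);
		if (ret < 0)		/* error while stepping down */
			break;
		level--;
	}
	return (ret == 1) ? 0 : ret;
}

/* Sample callbacks used to exercise the model. */
int proc_ok(int level) { (void)level; return 0; }
int proc_done_at_1(int level) { return level == 1 ? 1 : 0; }
int proc_eio_at_1(int level) { return level == 1 ? -EIO : 0; }
int descend_ok(int level) { (void)level; return 0; }
int descend_eio(int level) { (void)level; return -EIO; }
```

      In this model an -EIO from either callback reaches the caller unchanged,
      while a "finished" result of 1 is reported as success.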
    • J
      btrfs: drop root refs properly when orphan cleanup fails · 6989627d
      Josef Bacik committed
      When we mount the file system we do something like this:
      
      	while (1) {
      		lookup fs roots;
      
      		for (i = 0; i < num_roots; i++) {
      			ret = btrfs_orphan_cleanup(roots[i]);
      			if (ret)
      				break;
      			btrfs_put_root(roots[i]);
      		}
      	}
      
      	for (; i < num_roots; i++)
      		btrfs_put_root(roots[i]);
      
      As you can see if we break in that inner loop we just go back to the
      outer loop and lose the fact that we have to drop references on the
      remaining roots we looked up.  Fix this by making an out label and
      jumping to that on error so we don't leak a reference to the roots we
      looked up.
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6989627d
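      One way to picture the "out label" fix from the commit above is the
      user-space model below. It is only a sketch: root_model,
      put_root_model, orphan_cleanup_model, cleanup_batch_model and
      leaked_refs_demo are hypothetical names, and a plain integer stands in
      for the root reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a looked-up root holding one reference. */
struct root_model {
	int refs;
	int fail_cleanup;	/* non-zero: orphan cleanup fails for this root */
};

void put_root_model(struct root_model *r)
{
	assert(r->refs > 0);	/* dropping a reference we do not hold is a bug */
	r->refs--;
}

int orphan_cleanup_model(struct root_model *r)
{
	return r->fail_cleanup ? -EIO : 0;
}

/* Process one batch of roots. On error, jump to the out label so that the
 * failing root and every root after it still get their reference dropped. */
int cleanup_batch_model(struct root_model **roots, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = orphan_cleanup_model(roots[i]);
		if (ret)
			goto out;
		put_root_model(roots[i]);
	}
out:
	for (; i < n; i++)
		put_root_model(roots[i]);
	return ret;
}

/* Run a batch of three roots where root fail_at fails (none if fail_at >= 3)
 * and return how many references are still held afterwards. */
int leaked_refs_demo(size_t fail_at)
{
	struct root_model r[3] = { {1, 0}, {1, 0}, {1, 0} };
	struct root_model *roots[3] = { &r[0], &r[1], &r[2] };

	if (fail_at < 3)
		r[fail_at].fail_cleanup = 1;
	(void)cleanup_batch_model(roots, 3);
	return r[0].refs + r[1].refs + r[2].refs;
}
```

      On success the trailing loop does nothing because i already equals n, so
      the same exit path serves both cases.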
    • J
      btrfs: add missing iputs on orphan cleanup failure · a13bb2c0
      Josef Bacik committed
      We missed a couple of iput()s in the orphan cleanup failure paths, add
      them so we don't get refcount errors. The iput needs to be done in the
      check and not under a common label due to the way the code is
      structured.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a13bb2c0
    • J
      btrfs: handle errors from btrfs_read_node_slot in split · 9cf14029
      Josef Bacik committed
      While investigating a problem with error injection I tripped over
      curious behavior in the node/leaf splitting code.  If we get an EIO when
      trying to read either the left or right leaf/node for splitting we'll
      simply treat the node as if it were full and continue on.  The end
      result of this isn't too bad: we simply end up allocating a block when
      we could have pushed items into the adjacent blocks.
      
      However this does essentially allow us to continue to modify a file
      system that we've gotten errors on, either from a bad disk or csum
      mismatch or other corruption.  This isn't particularly safe, so instead
      handle these btrfs_read_node_slot() usages differently.  We allow you to
      pass in any slot, the idea being that we save some code if the slot
      number is outside of the range of the parent.  This means we treat all
      errors the same, when in reality we only want to ignore -ENOENT.
      
      Fix this by changing how we call btrfs_read_node_slot(), which is to
      only call it for slots we know are valid.  This way if we get an error
      back from reading the block we can properly pass the error up the chain.
      This was validated with the error injection testing I was doing.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      9cf14029
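      The calling convention described in the commit above, checking that a
      slot is valid before reading it so that any error from the read is a
      real one, can be sketched as follows. This is a user-space model under
      stated assumptions: node_model, make_parent, read_slot_model,
      try_left_sibling and try_right_sibling are hypothetical names, and a
      return value of 1 is used here to mean "no sibling in that direction".

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical parent node: nritems children, one of which may fail to read. */
struct node_model {
	int nritems;
	int io_error_slot;	/* slot whose read fails, or -1 for none */
};

struct node_model make_parent(int nritems, int io_error_slot)
{
	struct node_model p = { nritems, io_error_slot };
	return p;
}

/* Reader that is only ever called with a valid slot, so every failure it
 * reports is a genuine read error. */
int read_slot_model(struct node_model parent, int slot)
{
	assert(slot >= 0 && slot < parent.nritems);
	return slot == parent.io_error_slot ? -EIO : 0;
}

/* Returns 1 if there is no left sibling, 0 if it was read, <0 on error. */
int try_left_sibling(struct node_model parent, int slot)
{
	if (slot == 0)
		return 1;	/* nothing to the left: not an error */
	return read_slot_model(parent, slot - 1);
}

/* Returns 1 if there is no right sibling, 0 if it was read, <0 on error. */
int try_right_sibling(struct node_model parent, int slot)
{
	if (slot + 1 >= parent.nritems)
		return 1;	/* nothing to the right: not an error */
	return read_slot_model(parent, slot + 1);
}
```

      Because the range check happens in the caller, a missing sibling and a
      failed read can no longer be confused, and the error can be passed up
      the chain.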
    • J
      btrfs: replace BUG_ON with ASSERT in btrfs_read_node_slot · d4694728
      Josef Bacik committed
      In btrfs_read_node_slot() we have a BUG_ON() that can be converted to an
      ASSERT(), it's from an extent buffer and the level is validated at the
      time it's read from disk.
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d4694728
    • J
      btrfs: use btrfs_handle_fs_error in btrfs_fill_super · 13b98989
      Josef Bacik committed
      While trying to track down a lost EIO problem I hit the following
      assertion while doing my error injection testing
      
        BTRFS warning (device nvme1n1): transaction 1609 (with 180224 dirty metadata bytes) is not committed
        assertion failed: !found, in fs/btrfs/disk-io.c:4456
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/messages.h:169!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        CPU: 0 PID: 1445 Comm: mount Tainted: G        W          6.2.0-rc5+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
        RIP: 0010:btrfs_assertfail.constprop.0+0x18/0x1a
        RSP: 0018:ffffb95fc3b0bc68 EFLAGS: 00010286
        RAX: 0000000000000034 RBX: ffff9941c2ac2000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffb6741f7d RDI: 00000000ffffffff
        RBP: ffff9941c2ac2428 R08: 0000000000000000 R09: ffffb95fc3b0bb38
        R10: 0000000000000003 R11: ffffffffb71438a8 R12: ffff9941c2ac2428
        R13: ffff9941c2ac2450 R14: ffff9941c2ac2450 R15: 000000000002c000
        FS:  00007fcea2d07800(0000) GS:ffff9941fbc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f00cc7c83a8 CR3: 000000010c686000 CR4: 0000000000350ef0
        Call Trace:
         <TASK>
         close_ctree+0x426/0x48f
         btrfs_mount_root.cold+0x7e/0xee
         ? legacy_parse_param+0x2b/0x220
         legacy_get_tree+0x2b/0x50
         vfs_get_tree+0x29/0xc0
         vfs_kern_mount.part.0+0x73/0xb0
         btrfs_mount+0x11d/0x3d0
         ? legacy_parse_param+0x2b/0x220
         legacy_get_tree+0x2b/0x50
         vfs_get_tree+0x29/0xc0
         path_mount+0x438/0xa40
         __x64_sys_mount+0xe9/0x130
         do_syscall_64+0x3e/0x90
         entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      This is because the error injection did an EIO for the root inode lookup
      and we simply jumped to closing the ctree.  However because we didn't
      mark the file system as having an error we skipped all of the broken
      transaction cleanup stuff, and thus triggered this ASSERT().  Fix this
      by calling btrfs_handle_fs_error() in this case so we have the error set
      on the file system.
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      13b98989
  2. 06 Apr, 2023 2 commits
  3. 29 Mar, 2023 1 commit
    • F
      btrfs: ignore fiemap path cache when there are multiple paths for a node · 2280d425
      Filipe Manana committed
      During fiemap, when walking backreferences to determine if a b+tree
      node/leaf is shared, we may find a tree block (leaf or node) for which
      two parents were added to the references ulist. This happens if we get
      for example one direct ref (shared tree block ref) and one indirect ref
      (non-shared tree block ref) for the tree block at the current level,
      which can happen during relocation.
      
      In that case the fiemap path cache can not be used since it's meant for
      a single path, with one tree block at each possible level, so having
      multiple references for a tree block at any level may cause the level
      counter to exceed BTRFS_MAX_LEVEL and eventually trigger the
      warning:
      
         WARN_ON_ONCE(level >= BTRFS_MAX_LEVEL)
      
      at lookup_backref_shared_cache() and at store_backref_shared_cache().
      This is harmless since the code ignores any level >= BTRFS_MAX_LEVEL; the
      warning is there just to catch any unexpected case like the one described
      above. However, a user who sees the warning may find it alarming and
      report it.
      
      So just ignore the path cache once we find a tree block for which there
      is more than one reference, which is the less common case, and update
      the cache with the sharedness check result for all levels below the level
      for which we found multiple references.
      Reported-by: Jarno Pelkonen <jarno.pelkonen@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAKv8qLmDNAGJGCtsevxx_VZ_YOvvs1L83iEJkTgyA4joJertng@mail.gmail.com/
      Fixes: 12a824dc ("btrfs: speedup checking for extent sharedness during fiemap")
      CC: stable@vger.kernel.org # 6.1+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2280d425
  4. 28 Mar, 2023 3 commits
    • F
      btrfs: fix deadlock when aborting transaction during relocation with scrub · 2d82a40a
      Filipe Manana committed
      Before relocating a block group we pause scrub, then do the relocation and
      then unpause scrub. The relocation process requires starting and committing
      a transaction, and if we have a failure in the critical section of the
      transaction commit path (transaction state >= TRANS_STATE_COMMIT_START),
      we will deadlock if there is a paused scrub.
      
      That results in stack traces like the following:
      
        [42.479] BTRFS info (device sdc): relocating block group 53876686848 flags metadata|raid6
        [42.936] BTRFS warning (device sdc): Skipping commit of aborted transaction.
        [42.936] ------------[ cut here ]------------
        [42.936] BTRFS: Transaction aborted (error -28)
        [42.936] WARNING: CPU: 11 PID: 346822 at fs/btrfs/transaction.c:1977 btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
        [42.936] Modules linked in: dm_flakey dm_mod loop btrfs (...)
        [42.936] CPU: 11 PID: 346822 Comm: btrfs Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [42.936] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        [42.936] RIP: 0010:btrfs_commit_transaction+0xcc8/0xeb0 [btrfs]
        [42.936] Code: ff ff 45 8b (...)
        [42.936] RSP: 0018:ffffb58649633b48 EFLAGS: 00010282
        [42.936] RAX: 0000000000000000 RBX: ffff8be6ef4d5bd8 RCX: 0000000000000000
        [42.936] RDX: 0000000000000002 RSI: ffffffffb35e7782 RDI: 00000000ffffffff
        [42.936] RBP: ffff8be6ef4d5c98 R08: 0000000000000000 R09: ffffb586496339e8
        [42.936] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8be6d38c7c00
        [42.936] R13: 00000000ffffffe4 R14: ffff8be6c268c000 R15: ffff8be6ef4d5cf0
        [42.936] FS:  00007f381a82b340(0000) GS:ffff8beddfcc0000(0000) knlGS:0000000000000000
        [42.936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [42.936] CR2: 00007f1e35fb7638 CR3: 0000000117680006 CR4: 0000000000370ee0
        [42.936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [42.936] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [42.936] Call Trace:
        [42.936]  <TASK>
        [42.936]  ? start_transaction+0xcb/0x610 [btrfs]
        [42.936]  prepare_to_relocate+0x111/0x1a0 [btrfs]
        [42.936]  relocate_block_group+0x57/0x5d0 [btrfs]
        [42.936]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
        [42.936]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
        [42.936]  ? __pfx_autoremove_wake_function+0x10/0x10
        [42.936]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
        [42.936]  btrfs_balance+0x8ff/0x11d0 [btrfs]
        [42.936]  ? __kmem_cache_alloc_node+0x14a/0x410
        [42.936]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
        [42.937]  ? mod_objcg_state+0xd2/0x360
        [42.937]  ? refill_obj_stock+0xb0/0x160
        [42.937]  ? seq_release+0x25/0x30
        [42.937]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
        [42.937]  ? percpu_counter_add_batch+0x2e/0xa0
        [42.937]  ? __x64_sys_ioctl+0x88/0xc0
        [42.937]  __x64_sys_ioctl+0x88/0xc0
        [42.937]  do_syscall_64+0x38/0x90
        [42.937]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [42.937] RIP: 0033:0x7f381a6ffe9b
        [42.937] Code: 00 48 89 44 24 (...)
        [42.937] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [42.937] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
        [42.937] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
        [42.937] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
        [42.937] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
        [42.937] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
        [42.937]  </TASK>
        [42.937] ---[ end trace 0000000000000000 ]---
        [42.937] BTRFS: error (device sdc: state A) in cleanup_transaction:1977: errno=-28 No space left
        [59.196] INFO: task btrfs:346772 blocked for more than 120 seconds.
        [59.196]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.196] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.196] task:btrfs           state:D stack:0     pid:346772 ppid:1      flags:0x00004002
        [59.196] Call Trace:
        [59.196]  <TASK>
        [59.196]  __schedule+0x392/0xa70
        [59.196]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
        [59.196]  schedule+0x5d/0xd0
        [59.196]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
        [59.197]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.197]  scrub_pause_off+0x21/0x50 [btrfs]
        [59.197]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
        [59.197]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
        [59.198]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.198]  scrub_stripe+0x20d/0x740 [btrfs]
        [59.198]  scrub_chunk+0xc4/0x130 [btrfs]
        [59.198]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
        [59.198]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.198]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
        [59.199]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
        [59.199]  ? _copy_from_user+0x7b/0x80
        [59.199]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
        [59.199]  ? refill_stock+0x33/0x50
        [59.199]  ? should_failslab+0xa/0x20
        [59.199]  ? kmem_cache_alloc_node+0x151/0x460
        [59.199]  ? alloc_io_context+0x1b/0x80
        [59.199]  ? preempt_count_add+0x70/0xa0
        [59.199]  ? __x64_sys_ioctl+0x88/0xc0
        [59.199]  __x64_sys_ioctl+0x88/0xc0
        [59.199]  do_syscall_64+0x38/0x90
        [59.199]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.199] RIP: 0033:0x7f82ffaffe9b
        [59.199] RSP: 002b:00007f82ff9fcc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.199] RAX: ffffffffffffffda RBX: 000055b191e36310 RCX: 00007f82ffaffe9b
        [59.199] RDX: 000055b191e36310 RSI: 00000000c400941b RDI: 0000000000000003
        [59.199] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
        [59.199] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff9fd640
        [59.199] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
        [59.199]  </TASK>
        [59.199] INFO: task btrfs:346773 blocked for more than 120 seconds.
        [59.200]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.201] task:btrfs           state:D stack:0     pid:346773 ppid:1      flags:0x00004002
        [59.201] Call Trace:
        [59.201]  <TASK>
        [59.201]  __schedule+0x392/0xa70
        [59.201]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
        [59.201]  schedule+0x5d/0xd0
        [59.201]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
        [59.201]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.201]  scrub_pause_off+0x21/0x50 [btrfs]
        [59.202]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
        [59.202]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
        [59.202]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.202]  scrub_stripe+0x20d/0x740 [btrfs]
        [59.202]  scrub_chunk+0xc4/0x130 [btrfs]
        [59.203]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
        [59.203]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.203]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
        [59.203]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
        [59.203]  ? _copy_from_user+0x7b/0x80
        [59.203]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
        [59.204]  ? should_failslab+0xa/0x20
        [59.204]  ? kmem_cache_alloc_node+0x151/0x460
        [59.204]  ? alloc_io_context+0x1b/0x80
        [59.204]  ? preempt_count_add+0x70/0xa0
        [59.204]  ? __x64_sys_ioctl+0x88/0xc0
        [59.204]  __x64_sys_ioctl+0x88/0xc0
        [59.204]  do_syscall_64+0x38/0x90
        [59.204]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.204] RIP: 0033:0x7f82ffaffe9b
        [59.204] RSP: 002b:00007f82ff1fbc50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.204] RAX: ffffffffffffffda RBX: 000055b191e36790 RCX: 00007f82ffaffe9b
        [59.204] RDX: 000055b191e36790 RSI: 00000000c400941b RDI: 0000000000000003
        [59.204] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
        [59.204] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82ff1fc640
        [59.204] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
        [59.204]  </TASK>
        [59.204] INFO: task btrfs:346774 blocked for more than 120 seconds.
        [59.205]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.206] task:btrfs           state:D stack:0     pid:346774 ppid:1      flags:0x00004002
        [59.206] Call Trace:
        [59.206]  <TASK>
        [59.206]  __schedule+0x392/0xa70
        [59.206]  schedule+0x5d/0xd0
        [59.206]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
        [59.206]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.206]  scrub_pause_off+0x21/0x50 [btrfs]
        [59.207]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
        [59.207]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
        [59.207]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.207]  scrub_stripe+0x20d/0x740 [btrfs]
        [59.208]  scrub_chunk+0xc4/0x130 [btrfs]
        [59.208]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
        [59.208]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
        [59.208]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
        [59.208]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
        [59.209]  ? _copy_from_user+0x7b/0x80
        [59.209]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
        [59.209]  ? should_failslab+0xa/0x20
        [59.209]  ? kmem_cache_alloc_node+0x151/0x460
        [59.209]  ? alloc_io_context+0x1b/0x80
        [59.209]  ? preempt_count_add+0x70/0xa0
        [59.209]  ? __x64_sys_ioctl+0x88/0xc0
        [59.209]  __x64_sys_ioctl+0x88/0xc0
        [59.209]  do_syscall_64+0x38/0x90
        [59.209]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.209] RIP: 0033:0x7f82ffaffe9b
        [59.209] RSP: 002b:00007f82fe9fac50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.209] RAX: ffffffffffffffda RBX: 000055b191e36c10 RCX: 00007f82ffaffe9b
        [59.209] RDX: 000055b191e36c10 RSI: 00000000c400941b RDI: 0000000000000003
        [59.209] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
        [59.209] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe9fb640
        [59.209] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
        [59.209]  </TASK>
        [59.209] INFO: task btrfs:346775 blocked for more than 120 seconds.
        [59.210]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.210] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.211] task:btrfs           state:D stack:0     pid:346775 ppid:1      flags:0x00004002
        [59.211] Call Trace:
        [59.211]  <TASK>
        [59.211]  __schedule+0x392/0xa70
        [59.211]  schedule+0x5d/0xd0
        [59.211]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
        [59.211]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.211]  scrub_pause_off+0x21/0x50 [btrfs]
        [59.212]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
        [59.212]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
        [59.212]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.212]  scrub_stripe+0x20d/0x740 [btrfs]
        [59.213]  scrub_chunk+0xc4/0x130 [btrfs]
        [59.213]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
        [59.213]  ? __mutex_unlock_slowpath.isra.0+0x9a/0x120
        [59.213]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
        [59.213]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
        [59.214]  ? _copy_from_user+0x7b/0x80
        [59.214]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
        [59.214]  ? should_failslab+0xa/0x20
        [59.214]  ? kmem_cache_alloc_node+0x151/0x460
        [59.214]  ? alloc_io_context+0x1b/0x80
        [59.214]  ? preempt_count_add+0x70/0xa0
        [59.214]  ? __x64_sys_ioctl+0x88/0xc0
        [59.214]  __x64_sys_ioctl+0x88/0xc0
        [59.214]  do_syscall_64+0x38/0x90
        [59.214]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.214] RIP: 0033:0x7f82ffaffe9b
        [59.214] RSP: 002b:00007f82fe1f9c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.214] RAX: ffffffffffffffda RBX: 000055b191e37090 RCX: 00007f82ffaffe9b
        [59.214] RDX: 000055b191e37090 RSI: 00000000c400941b RDI: 0000000000000003
        [59.214] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
        [59.214] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fe1fa640
        [59.214] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
        [59.214]  </TASK>
        [59.214] INFO: task btrfs:346776 blocked for more than 120 seconds.
        [59.215]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.217] task:btrfs           state:D stack:0     pid:346776 ppid:1      flags:0x00004002
        [59.217] Call Trace:
        [59.217]  <TASK>
        [59.217]  __schedule+0x392/0xa70
        [59.217]  ? __pv_queued_spin_lock_slowpath+0x165/0x370
        [59.217]  schedule+0x5d/0xd0
        [59.217]  __scrub_blocked_if_needed+0x74/0xc0 [btrfs]
        [59.217]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.217]  scrub_pause_off+0x21/0x50 [btrfs]
        [59.217]  scrub_simple_mirror+0x1c7/0x950 [btrfs]
        [59.217]  ? scrub_parity_put+0x1a5/0x1d0 [btrfs]
        [59.218]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.218]  scrub_stripe+0x20d/0x740 [btrfs]
        [59.218]  scrub_chunk+0xc4/0x130 [btrfs]
        [59.218]  scrub_enumerate_chunks+0x3e4/0x7a0 [btrfs]
        [59.219]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.219]  btrfs_scrub_dev+0x236/0x6a0 [btrfs]
        [59.219]  ? btrfs_ioctl+0xd97/0x32c0 [btrfs]
        [59.219]  ? _copy_from_user+0x7b/0x80
        [59.219]  btrfs_ioctl+0xde1/0x32c0 [btrfs]
        [59.219]  ? should_failslab+0xa/0x20
        [59.219]  ? kmem_cache_alloc_node+0x151/0x460
        [59.219]  ? alloc_io_context+0x1b/0x80
        [59.219]  ? preempt_count_add+0x70/0xa0
        [59.219]  ? __x64_sys_ioctl+0x88/0xc0
        [59.219]  __x64_sys_ioctl+0x88/0xc0
        [59.219]  do_syscall_64+0x38/0x90
        [59.219]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.219] RIP: 0033:0x7f82ffaffe9b
        [59.219] RSP: 002b:00007f82fd9f8c50 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.219] RAX: ffffffffffffffda RBX: 000055b191e37510 RCX: 00007f82ffaffe9b
        [59.219] RDX: 000055b191e37510 RSI: 00000000c400941b RDI: 0000000000000003
        [59.219] RBP: 0000000000000000 R08: 00007fff1575016f R09: 0000000000000000
        [59.219] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f82fd9f9640
        [59.219] R13: 000000000000006b R14: 00007f82ffa87580 R15: 0000000000000000
        [59.219]  </TASK>
        [59.219] INFO: task btrfs:346822 blocked for more than 120 seconds.
        [59.220]       Tainted: G        W          6.3.0-rc2-btrfs-next-127+ #1
        [59.221] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [59.222] task:btrfs           state:D stack:0     pid:346822 ppid:1      flags:0x00004002
        [59.222] Call Trace:
        [59.222]  <TASK>
        [59.222]  __schedule+0x392/0xa70
        [59.222]  schedule+0x5d/0xd0
        [59.222]  btrfs_scrub_cancel+0x91/0x100 [btrfs]
        [59.222]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.222]  btrfs_commit_transaction+0x572/0xeb0 [btrfs]
        [59.223]  ? start_transaction+0xcb/0x610 [btrfs]
        [59.223]  prepare_to_relocate+0x111/0x1a0 [btrfs]
        [59.223]  relocate_block_group+0x57/0x5d0 [btrfs]
        [59.223]  ? btrfs_wait_nocow_writers+0x25/0xb0 [btrfs]
        [59.223]  btrfs_relocate_block_group+0x248/0x3c0 [btrfs]
        [59.224]  ? __pfx_autoremove_wake_function+0x10/0x10
        [59.224]  btrfs_relocate_chunk+0x3b/0x150 [btrfs]
        [59.224]  btrfs_balance+0x8ff/0x11d0 [btrfs]
        [59.224]  ? __kmem_cache_alloc_node+0x14a/0x410
        [59.224]  btrfs_ioctl+0x2334/0x32c0 [btrfs]
        [59.225]  ? mod_objcg_state+0xd2/0x360
        [59.225]  ? refill_obj_stock+0xb0/0x160
        [59.225]  ? seq_release+0x25/0x30
        [59.225]  ? __rseq_handle_notify_resume+0x3b5/0x4b0
        [59.225]  ? percpu_counter_add_batch+0x2e/0xa0
        [59.225]  ? __x64_sys_ioctl+0x88/0xc0
        [59.225]  __x64_sys_ioctl+0x88/0xc0
        [59.225]  do_syscall_64+0x38/0x90
        [59.225]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
        [59.225] RIP: 0033:0x7f381a6ffe9b
        [59.225] RSP: 002b:00007ffd45ecf060 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [59.225] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f381a6ffe9b
        [59.225] RDX: 00007ffd45ecf150 RSI: 00000000c4009420 RDI: 0000000000000003
        [59.225] RBP: 0000000000000003 R08: 0000000000000013 R09: 0000000000000000
        [59.225] R10: 00007f381a60c878 R11: 0000000000000246 R12: 00007ffd45ed0423
        [59.225] R13: 00007ffd45ecf150 R14: 0000000000000000 R15: 00007ffd45ecf148
        [59.225]  </TASK>
      
      What happens is the following:
      
      1) A scrub is running, so fs_info->scrubs_running is 1;
      
      2) Task A starts block group relocation, and at btrfs_relocate_chunk() it
         pauses scrub by calling btrfs_scrub_pause(). That increments
         fs_info->scrub_pause_req from 0 to 1 and waits for the scrub task to
         pause (for fs_info->scrubs_paused to be == to fs_info->scrubs_running);
      
      3) The scrub task pauses at scrub_pause_off(), waiting for
         fs_info->scrub_pause_req to decrease to 0;
      
      4) Task A then enters btrfs_relocate_block_group(), and down that call
         chain we start a transaction and then attempt to commit it;
      
      5) When task A calls btrfs_commit_transaction(), it either will do the
         commit itself or wait for some other task that already started the
         commit of the transaction - it doesn't matter which case;
      
      6) The transaction commit enters state TRANS_STATE_COMMIT_START;
      
      7) An error happens during the transaction commit, like -ENOSPC when
         running delayed refs or delayed items for example;
      
      8) This results in calling transaction.c:cleanup_transaction(), where
         we call btrfs_scrub_cancel(), incrementing fs_info->scrub_cancel_req
         from 0 to 1, and blocking this task waiting for fs_info->scrubs_running
         to decrease to 0;
      
      9) From this point on, both the transaction commit and the scrub task
         hang forever:
      
         1) The transaction commit is waiting for fs_info->scrubs_running to
            be decreased to 0;
      
         2) The scrub task is at scrub_pause_off() waiting for
            fs_info->scrub_pause_req to decrease to 0 - so it can not proceed
            to stop the scrub and decrement fs_info->scrubs_running from 1 to 0.
      
         Therefore resulting in a deadlock.
      
      Fix this by having cleanup_transaction(), called if a transaction commit
      fails, not call btrfs_scrub_cancel() if relocation is in progress, and
      having btrfs_relocate_block_group() call btrfs_scrub_cancel() instead if
      the relocation failed and a transaction abort happened.
      
      This was triggered with btrfs/061 from fstests.
      
      Fixes: 55e3a601 ("btrfs: Fix data checksum error cause by replace with io-load.")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2d82a40a
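      The numbered steps in the commit above can be modelled with a few
      counters. The sketch below is a simplified user-space model, not the
      kernel implementation: fs_model and all function names are hypothetical,
      waiting is represented by a boolean result instead of real blocking, and
      it assumes the relocation path issues its cancel only after releasing
      its pause request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the state involved in the deadlock. */
struct fs_model {
	int scrubs_running;
	int scrub_pause_req;
	bool reloc_running;
	bool aborted;
};

/* A paused scrub task can only finish once no pause request is pending. */
bool scrub_can_finish(const struct fs_model *fs)
{
	return fs->scrub_pause_req == 0;
}

/* Cancel waits for scrubs_running to reach 0. Returns false when that wait
 * could never end because the scrub task is stuck in its paused state. */
bool scrub_cancel_model(struct fs_model *fs)
{
	if (fs->scrubs_running > 0 && !scrub_can_finish(fs))
		return false;
	fs->scrubs_running = 0;
	return true;
}

/* Cleanup after a failed commit. With skip_during_reloc set (the fixed
 * behaviour) the cancel is left to the relocation path. */
bool cleanup_transaction_model(struct fs_model *fs, bool skip_during_reloc)
{
	fs->aborted = true;
	if (skip_during_reloc && fs->reloc_running)
		return true;
	return scrub_cancel_model(fs);
}

/* Relocation with a running scrub and a commit that fails part-way.
 * Returns true if every step completes, false if a step would hang. */
bool relocate_with_failed_commit(bool fixed)
{
	struct fs_model fs = { .scrubs_running = 1 };
	bool ok;

	fs.scrub_pause_req++;		/* pause scrub before relocating */
	fs.reloc_running = true;
	ok = cleanup_transaction_model(&fs, fixed);
	if (!ok)
		return false;		/* stuck inside the commit path */
	fs.reloc_running = false;
	fs.scrub_pause_req--;		/* let the scrub task continue */
	if (fs.aborted)
		ok = scrub_cancel_model(&fs);
	assert(!ok || fs.scrubs_running == 0);
	return ok;
}
```

      In this model relocate_with_failed_commit(false) reports a hang inside
      the commit path, while relocate_with_failed_commit(true) completes with
      the scrub stopped.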
    • A
      btrfs: scan device in non-exclusive mode · 50d281fc
      Anand Jain committed
      This fixes mkfs/mount/check failures due to race with systemd-udevd
      scan.
      
      During the device scan initiated by systemd-udevd, other user space
      EXCL operations such as mkfs, mount, or check may get blocked and result
      in a "Device or resource busy" error. This is because the device
      scan process opens the device with the EXCL flag in the kernel.
      
      Two reports were received:
      
       - btrfs/179 test case, where the fsck command failed with the -EBUSY
         error
      
       - LTP pwritev03 test case, where mkfs.vfs failed with
         the -EBUSY error, when mkfs.vfs tried to overwrite old btrfs filesystem
         on the device.
      
      In both cases, fsck and mkfs (respectively) were racing with a
      systemd-udevd device scan, and systemd-udevd won, resulting in the
      -EBUSY error for fsck and mkfs.
      
      Reproducing the problem has been difficult because there is a very
      small window during which these userspace threads can race to
      acquire the exclusive device open. Even on the system where the problem
      was observed, it took anywhere between 10 and 400 iterations to hit it,
      and the chances of reproducing it decrease with debug printk()s.
      
      However, an exclusive device open is unnecessary for the scan process,
      as there are no write operations on the device during scan. Furthermore,
      during the mount process, the superblock is re-read in the below
      function call chain:
      
        btrfs_mount_root
         btrfs_open_devices
          open_fs_devices
           btrfs_open_one_device
             btrfs_get_bdev_and_sb
      
      So, to fix this issue, remove the FMODE_EXCL flag from the scan
      operation, and add a comment.
      
      In the case where mkfs is still writing to the device while a scan is
      running, the btrfs signature has not been written yet, so the scan will
      not recognize the device.
      Reported-by: Sherry Yang <sherry.yang@oracle.com>
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Link: https://lore.kernel.org/oe-lkp/202303170839.fdf23068-oliver.sang@intel.com
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      50d281fc
    • F
      btrfs: fix race between quota disable and quota assign ioctls · 2f1a6be1
      Filipe Manana committed
      The quota assign ioctl can currently run in parallel with a quota disable
      ioctl call. The assign ioctl uses the quota root, while the disable ioctl
      frees that root, and therefore we can have a use-after-free triggered in
      the assign ioctl, leading to a trace like the following when KASAN is
      enabled:
      
        [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
        [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
        [672.724][T736]
        [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
        [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        [672.727][T736] Call Trace:
        [672.728][T736]  <TASK>
        [672.728][T736]  dump_stack_lvl+0xd9/0x150
        [672.725][T736]  print_report+0xc1/0x5e0
        [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
        [672.727][T736]  ? __phys_addr+0xc9/0x150
        [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.722][T736]  kasan_report+0xc0/0xf0
        [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
        [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
        [672.722][T736]  ? split_leaf+0x13d0/0x13d0
        [672.726][T736]  ? rcu_is_watching+0x12/0xb0
        [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
        [672.722][T736]  update_qgroup_status_item+0xf7/0x320
        [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
        [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
        [672.730][T736]  ? spin_bug+0x1d0/0x1d0
        [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
        [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
        [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
        [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
        [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
        [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
        [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
        [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
        [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
        [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
        [672.732][T736]  ? sigprocmask+0xf2/0x340
        [672.737][T736]  ? __fget_files+0x26a/0x480
        [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
        [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.736][T736]  __x64_sys_ioctl+0x198/0x210
        [672.736][T736]  do_syscall_64+0x39/0xb0
        [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.739][T736] RIP: 0033:0x4556ad
        [672.742][T736]  </TASK>
        [672.743][T736]
        [672.748][T736] Allocated by task 27677:
        [672.743][T736]  kasan_save_stack+0x22/0x40
        [672.741][T736]  kasan_set_track+0x25/0x30
        [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
        [672.749][T736]  btrfs_alloc_root+0x48/0x90
        [672.746][T736]  btrfs_create_tree+0x146/0xa20
        [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
        [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
        [672.747][T736]  __x64_sys_ioctl+0x198/0x210
        [672.749][T736]  do_syscall_64+0x39/0xb0
        [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.756][T736]
        [672.757][T736] Freed by task 27677:
        [672.759][T736]  kasan_save_stack+0x22/0x40
        [672.759][T736]  kasan_set_track+0x25/0x30
        [672.756][T736]  kasan_save_free_info+0x2e/0x50
        [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
        [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
        [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
        [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
        [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
        [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
        [672.756][T736]  __x64_sys_ioctl+0x198/0x210
        [672.753][T736]  do_syscall_64+0x39/0xb0
        [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.769][T736]
        [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
        [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
        [672.769][T736] The buggy address is located 520 bytes inside of
        [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
        [672.760][T736]
        [672.764][T736] The buggy address belongs to the physical page:
        [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
        [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
        [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
        [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
        [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
        [672.771][T736] page dumped because: kasan: bad access detected
        [672.778][T736] page_owner tracks the page as allocated
        [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
        [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
        [672.779][T736]  __alloc_pages+0x1cb/0x4a0
        [672.776][T736]  alloc_pages+0x1aa/0x270
        [672.773][T736]  allocate_slab+0x260/0x390
        [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
        [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
        [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
        [672.789][T736]  __kmalloc+0x4e/0x1a0
        [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
        [672.781][T736]  tomoyo_path_perm+0x22f/0x420
        [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
        [672.780][T736]  security_path_unlink+0xdb/0x150
        [672.788][T736]  do_unlinkat+0x377/0x680
        [672.788][T736]  __x64_sys_unlink+0xca/0x110
        [672.789][T736]  do_syscall_64+0x39/0xb0
        [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.784][T736] page last free stack trace:
        [672.787][T736]  free_pcp_prepare+0x4e5/0x920
        [672.787][T736]  free_unref_page+0x1d/0x4e0
        [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
        [672.797][T736]  qlist_free_all+0x6a/0x180
        [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
        [672.797][T736]  __kasan_slab_alloc+0x64/0x90
        [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
        [672.799][T736]  getname_flags.part.0+0x50/0x4e0
        [672.799][T736]  getname_flags+0x9e/0xe0
        [672.792][T736]  vfs_fstatat+0x77/0xb0
        [672.791][T736]  __do_sys_newlstat+0x84/0x100
        [672.798][T736]  do_syscall_64+0x39/0xb0
        [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.790][T736]
        [672.791][T736] Memory state around the buggy address:
        [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]                       ^
        [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
      before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
      call.
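The serialization described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: `struct fs_model`, `quota_enable()`, `quota_disable()` and `quota_assign()` are hypothetical stand-ins, a pthread mutex plays the role of the qgroup ioctl mutex, and the `-ENOTCONN` return value is an assumption of the model, not taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the filesystem state; not the kernel struct. */
struct fs_model {
	pthread_mutex_t qgroup_ioctl_lock; /* serializes all qgroup ioctls */
	int *quota_root;                   /* NULL while quotas are disabled */
};

/* Prepare a model with quotas disabled. */
void fs_model_init(struct fs_model *fs)
{
	assert(fs != NULL);
	pthread_mutex_init(&fs->qgroup_ioctl_lock, NULL);
	fs->quota_root = NULL;
}

/* Model of quota enable: allocate the quota root under the lock. */
int quota_enable(struct fs_model *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	if (!fs->quota_root) {
		fs->quota_root = calloc(1, sizeof(*fs->quota_root));
		if (!fs->quota_root)
			ret = -ENOMEM;
	}
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
	return ret;
}

/* Model of quota disable: free the root while holding the lock. */
void quota_disable(struct fs_model *fs)
{
	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	free(fs->quota_root);
	fs->quota_root = NULL;
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
}

/* Model of quota assign: the root is checked and used only under the lock. */
int quota_assign(struct fs_model *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	if (!fs->quota_root)
		ret = -ENOTCONN;     /* assumed error code: quotas are off */
	else
		(*fs->quota_root)++; /* stands in for btrfs_run_qgroups() */
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
	return ret;
}
```

Because both paths take the same lock, the assign path in this model either sees a valid root or sees NULL; it can never dereference a root that the disable path is freeing concurrently.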
      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2f1a6be1
  5. 16 Mar, 2023 7 commits
  6. 08 Mar, 2023 1 commit
    • F
      btrfs: fix block group item corruption after inserting new block group · 675dfe12
      Filipe Manana committed
      We can often end up inserting a block group item, for a new block group,
      with a wrong value for the used bytes field.
      
      This happens if for the newly allocated block group, in the same transaction
      that created the block group, we have tasks allocating extents from it as
      well as tasks removing extents from it.
      
      For example:
      
      1) Task A creates a metadata block group X;
      
      2) Two extents are allocated from block group X, so its "used" field is
         updated to 32K, and its "commit_used" field remains as 0;
      
      3) Transaction commit starts, by some task B, and it enters
         btrfs_start_dirty_block_groups(). There it tries to update the block
         group item for block group X, which currently has its "used" field with
         a value of 32K. But that fails since the block group item was not yet
         inserted, and so on failure update_block_group_item() sets the
         "commit_used" field of the block group back to 0;
      
      4) The block group item is inserted by task A, when for example
         btrfs_create_pending_block_groups() is called when releasing its
         transaction handle. This results in insert_block_group_item() inserting
         the block group item in the extent tree (or block group tree), with a
         "used" field having a value of 32K, but without updating the
         "commit_used" field in the block group, which remains with value of 0;
      
      5) The two extents are freed from block group X, so its "used" field changes
         from 32K to 0;
      
      6) The transaction commit by task B continues, it enters
         btrfs_write_dirty_block_groups() which calls update_block_group_item()
         for block group X, and there it decides to skip the block group item
         update, because "used" has a value of 0 and "commit_used" has a value
         of 0 too.
      
         As a result, we end up with a block group item having a 32K "used" field but
         no extents allocated from it.
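The six steps above can be replayed with a small user-space model. This is a sketch under stated assumptions, not the kernel code: `struct bg_model`, `update_item()`, `insert_item()` and `replay_steps()` are hypothetical names, and the `with_fix` flag models the remedy of having the item insertion also record "commit_used".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a block group; not the kernel struct. */
struct bg_model {
	uint64_t used;        /* bytes allocated, in-memory counter */
	uint64_t commit_used; /* value last believed written to the item */
	uint64_t item_used;   /* "used" field of the on-disk item */
	bool item_inserted;   /* does the on-disk item exist yet? */
};

/* Models the item update, which is skipped when used == commit_used. */
void update_item(struct bg_model *bg)
{
	uint64_t old_commit_used = bg->commit_used;

	if (bg->used == bg->commit_used)
		return;
	bg->commit_used = bg->used;
	if (!bg->item_inserted) {
		/* Update failed: no item yet, so revert commit_used. */
		bg->commit_used = old_commit_used;
		return;
	}
	bg->item_used = bg->used;
}

/* Models the item insertion; with_fix also records commit_used. */
void insert_item(struct bg_model *bg, bool with_fix)
{
	bg->item_inserted = true;
	bg->item_used = bg->used;
	if (with_fix)
		bg->commit_used = bg->used;
}

/* Replays steps 1 to 6 and returns the final on-disk "used" value. */
uint64_t replay_steps(bool with_fix)
{
	struct bg_model bg = { 0 };      /* step 1: new block group */

	bg.used = 32 * 1024;             /* step 2: two extents allocated */
	update_item(&bg);                /* step 3: fails, item missing */
	assert(bg.commit_used == 0);
	insert_item(&bg, with_fix);      /* step 4: item inserted */
	bg.used = 0;                     /* step 5: both extents freed */
	update_item(&bg);                /* step 6: skipped without the fix */
	return bg.item_used;
}
```

In this model, `replay_steps(false)` leaves the stale 32K value in the item, while `replay_steps(true)` ends with 0, because the final update is no longer skipped.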
      
      When this issue happens, a btrfs check reports an error like this:
      
         [1/7] checking root items
         [2/7] checking extents
         block group [1104150528 1073741824] used 39796736 but extent items used 0
         ERROR: errors found in extent allocation tree or chunk allocation
         (...)
      
      Fix this by making insert_block_group_item() update the block group's
      "commit_used" field.
      
      Fixes: 7248e0ce ("btrfs: skip update of block group item if used bytes are the same")
      CC: stable@vger.kernel.org # 6.2+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      675dfe12
  7. 07 Mar, 2023 6 commits
    • F
      btrfs: fix extent map logging bit not cleared for split maps after dropping range · e4cc1483
      Filipe Manana committed
      At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
      bit on a 'flags' variable that was not initialized. This makes static
      checkers complain about it, so initialize the 'flags' variable before
      clearing the bit.
      
      In practice this has no consequences, because EXTENT_FLAG_LOGGING should
      not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
      the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
      mode too and it always flushes all delalloc.
      
      Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
      flags of the split extent map.
      Reported-by: Dan Carpenter <error27@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
      Fixes: db21370b ("btrfs: drop extent map range more efficiently")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e4cc1483
    • J
      btrfs: fix percent calculation for bg reclaim message · 95cd356c
      Johannes Thumshirn committed
      We have a report that the info message for block-group reclaim is
      crossing the 100% used mark.
      
      This is happening as we were truncating the divisor for the division
      (the block_group->length) to a 32bit value.
      
      Fix this by using div64_u64() to not truncate the divisor.
      
      In the worst case, the truncation can lead to a division by zero error,
      which should be possible to trigger on a 4-disk RAID0 when each device
      is large enough:
      
        $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
        btrfs-progs v6.1
        [...]
        Filesystem size:    40.00GiB
        Block group profiles:
          Data:             RAID0             4.00GiB <<<
          Metadata:         RAID1           256.00MiB
          System:           RAID1             8.00MiB
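The effect of truncating the divisor can be reproduced in user space. The sketch below is illustrative only: `percent_truncated()` and `percent_full()` are hypothetical helpers that model a 32-bit and a 64-bit divisor respectively, and the buggy variant returns -1 where a real division by zero would occur.

```c
#include <assert.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/*
 * Buggy model: the divisor is truncated to 32 bits before dividing.
 * Returns -1 when the truncated divisor is zero, which is where a real
 * division by zero would happen.
 */
int64_t percent_truncated(uint64_t used, uint64_t length)
{
	uint32_t divisor = (uint32_t)length; /* high 32 bits are lost */

	if (divisor == 0)
		return -1;
	return (int64_t)(used * 100 / divisor);
}

/* Fixed model: full 64-bit divisor. The caller must pass length != 0. */
uint64_t percent_full(uint64_t used, uint64_t length)
{
	assert(length != 0);
	return used * 100 / length;
}
```

For a 5GiB block group with 2GiB used, the truncated divisor is 1GiB, so the buggy model reports 200% where the correct value is 40%. A 4GiB block group, like the RAID0 data profile in the output above, truncates to exactly zero.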
      Reported-by: Forza <forza@tnonline.net>
      Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
      Fixes: 5f93e776 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add Qu's note ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      95cd356c
    • N
      btrfs: fix unnecessary increment of read error stat on write error · 98e8d36a
      Naohiro Aota committed
      The current btrfs_log_dev_io_error() increases the read error count even
      if the erroneous IO is a WRITE request. This is because it forgets to use
      "else if", so all failed WRITE requests also count as READ errors, as
      there is (of course) no REQ_RAHEAD bit set.
      
      Fixes: c3a62baf ("btrfs: use chained bios when cloning")
      CC: stable@vger.kernel.org # 6.1+
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      98e8d36a
    • V
      btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode · c06016a0
      void0red committed
      Even if the slot is already read out, we may still need to re-balance
      the tree, so the btrfs_del_item() call can fail and we need to handle
      the error properly.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: void0red <void0red@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c06016a0
    • Q
      btrfs: ioctl: return device fsid from DEV_INFO ioctl · 2943868a
      Qu Wenruo committed
      Currently user space utilizes the dev info ioctl to grab the info of a
      certain devid, including its device uuid.  But the returned info is
      not enough to determine if a device is a seed.
      
      Commit a26d60de ("btrfs: sysfs: add devinfo/fsid to retrieve actual
      fsid from the device") exports the same value in sysfs so this is for
      parity with ioctl.  Add a new member, fsid, into
      btrfs_ioctl_dev_info_args, and populate the member with fsid value.
      
      This should not cause any compatibility problem in any of the following
      combinations:
      
      - Old user space, old kernel
      - Old user space, new kernel
        User space tool won't even check the new member.
      
      - New user space, old kernel
        The kernel won't touch the new member, and user space tool should
        zero out its argument, thus the new member is all zero.
      
        User space tool can then know the kernel doesn't support this fsid
        reporting, and falls back to whatever they can.
      
      - New user space, new kernel
        Works as planned.

        User space finds the fsid member is no longer zero, and trusts its
        value.
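The compatibility rule above, where user space zeroes its argument and treats an all-zero fsid as "not reported", can be sketched as follows. The struct is a hypothetical, trimmed-down stand-in for btrfs_ioctl_dev_info_args, and `dev_info_prepare()` and `dev_info_has_fsid()` are illustrative helper names, not part of any real tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_UUID_SIZE 16

/* Hypothetical, trimmed-down stand-in for btrfs_ioctl_dev_info_args. */
struct dev_info_model {
	uint64_t devid;
	uint8_t uuid[MODEL_UUID_SIZE];
	uint8_t fsid[MODEL_UUID_SIZE]; /* the new member */
};

/* Zero the whole argument before the call so unknown members read as 0. */
void dev_info_prepare(struct dev_info_model *args, uint64_t devid)
{
	assert(args != NULL);
	memset(args, 0, sizeof(*args));
	args->devid = devid;
}

/* After the call: a non-zero fsid means the kernel filled it in. */
bool dev_info_has_fsid(const struct dev_info_model *args)
{
	static const uint8_t zero[MODEL_UUID_SIZE];

	return memcmp(args->fsid, zero, sizeof(zero)) != 0;
}
```

A tool following this pattern would call the prepare helper, issue the ioctl, and fall back to another method whenever the check returns false, which is the case on an old kernel that never touches the new member.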
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2943868a
    • B
      btrfs: fix potential dead lock in size class loading logic · 12148367
      Boris Burkov committed
      As reported by Filipe, there's a potential deadlock caused by
      using btrfs_search_forward on commit_root. The locking there is
      unconditional, even if ->skip_locking and ->search_commit_root are set.
      It's not meant to be used for commit roots, so it always needs to do
      locking.
      
      So if another task is COWing a child node of the same root node and
      then needs to wait for block group caching to complete when trying to
      allocate a metadata extent, it deadlocks.
      
      For example:
      
      [539604.239315] sysrq: Show Blocked State
      [539604.240133] task:kworker/u16:6   state:D stack:0     pid:2119594 ppid:2      flags:0x00004000
      [539604.241613] Workqueue: btrfs-cache btrfs_work_helper [btrfs]
      [539604.242673] Call Trace:
      [539604.243129]  <TASK>
      [539604.243925]  __schedule+0x41d/0xee0
      [539604.244797]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.245399]  ? rwsem_down_read_slowpath+0x185/0x490
      [539604.246111]  schedule+0x5d/0xf0
      [539604.246593]  rwsem_down_read_slowpath+0x2da/0x490
      [539604.247290]  ? rcu_barrier_tasks_trace+0x10/0x20
      [539604.248090]  __down_read_common+0x3d/0x150
      [539604.248702]  down_read_nested+0xc3/0x140
      [539604.249280]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
      [539604.250097]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
      [539604.250915]  btrfs_search_forward+0x59/0x460 [btrfs]
      [539604.251781]  ? btrfs_global_root+0x50/0x70 [btrfs]
      [539604.252476]  caching_thread+0x1be/0x920 [btrfs]
      [539604.253167]  btrfs_work_helper+0xf6/0x400 [btrfs]
      [539604.253848]  process_one_work+0x24f/0x5a0
      [539604.254476]  worker_thread+0x52/0x3b0
      [539604.255166]  ? __pfx_worker_thread+0x10/0x10
      [539604.256047]  kthread+0xf0/0x120
      [539604.256591]  ? __pfx_kthread+0x10/0x10
      [539604.257212]  ret_from_fork+0x29/0x50
      [539604.257822]  </TASK>
      [539604.258233] task:btrfs-transacti state:D stack:0     pid:2236474 ppid:2      flags:0x00004000
      [539604.259802] Call Trace:
      [539604.260243]  <TASK>
      [539604.260615]  __schedule+0x41d/0xee0
      [539604.261205]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.262000]  ? rwsem_down_read_slowpath+0x185/0x490
      [539604.262822]  schedule+0x5d/0xf0
      [539604.263374]  rwsem_down_read_slowpath+0x2da/0x490
      [539604.266228]  ? lock_acquire+0x160/0x310
      [539604.266917]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.267996]  ? lock_contended+0x19e/0x500
      [539604.268720]  __down_read_common+0x3d/0x150
      [539604.269400]  down_read_nested+0xc3/0x140
      [539604.270057]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
      [539604.271129]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
      [539604.272372]  btrfs_search_slot+0x143/0xf70 [btrfs]
      [539604.273295]  update_block_group_item+0x9e/0x190 [btrfs]
      [539604.274282]  btrfs_start_dirty_block_groups+0x1c4/0x4f0 [btrfs]
      [539604.275381]  ? __mutex_unlock_slowpath+0x45/0x280
      [539604.276390]  btrfs_commit_transaction+0xee/0xed0 [btrfs]
      [539604.277391]  ? lock_acquire+0x1a4/0x310
      [539604.278080]  ? start_transaction+0xcb/0x6c0 [btrfs]
      [539604.279099]  transaction_kthread+0x142/0x1c0 [btrfs]
      [539604.279996]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
      [539604.280673]  kthread+0xf0/0x120
      [539604.281050]  ? __pfx_kthread+0x10/0x10
      [539604.281496]  ret_from_fork+0x29/0x50
      [539604.281966]  </TASK>
      [539604.282255] task:fsstress        state:D stack:0     pid:2236483 ppid:1      flags:0x00004006
      [539604.283897] Call Trace:
      [539604.284700]  <TASK>
      [539604.285088]  __schedule+0x41d/0xee0
      [539604.285660]  schedule+0x5d/0xf0
      [539604.286175]  btrfs_wait_block_group_cache_progress+0xf2/0x170 [btrfs]
      [539604.287342]  ? __pfx_autoremove_wake_function+0x10/0x10
      [539604.288450]  find_free_extent+0xd93/0x1750 [btrfs]
      [539604.289256]  ? _raw_spin_unlock+0x29/0x50
      [539604.289911]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
      [539604.290843]  btrfs_reserve_extent+0x147/0x290 [btrfs]
      [539604.291943]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
      [539604.292903]  __btrfs_cow_block+0x138/0x580 [btrfs]
      [539604.293773]  btrfs_cow_block+0x10e/0x240 [btrfs]
      [539604.294595]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
      [539604.295585]  btrfs_update_device+0x71/0x1b0 [btrfs]
      [539604.296459]  btrfs_chunk_alloc_add_chunk_item+0xe0/0x340 [btrfs]
      [539604.297489]  btrfs_chunk_alloc+0x1bf/0x490 [btrfs]
      [539604.298335]  find_free_extent+0x6fa/0x1750 [btrfs]
      [539604.299174]  ? _raw_spin_unlock+0x29/0x50
      [539604.299950]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
      [539604.300918]  btrfs_reserve_extent+0x147/0x290 [btrfs]
      [539604.301797]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
      [539604.303017]  ? lock_release+0x224/0x4a0
      [539604.303855]  __btrfs_cow_block+0x138/0x580 [btrfs]
      [539604.304789]  btrfs_cow_block+0x10e/0x240 [btrfs]
      [539604.305611]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
      [539604.306682]  ? btrfs_global_root+0x50/0x70 [btrfs]
      [539604.308198]  lookup_inline_extent_backref+0x17b/0x7a0 [btrfs]
      [539604.309254]  lookup_extent_backref+0x43/0xd0 [btrfs]
      [539604.310122]  __btrfs_free_extent+0xf8/0x810 [btrfs]
      [539604.310874]  ? lock_release+0x224/0x4a0
      [539604.311724]  ? btrfs_merge_delayed_refs+0x17b/0x1d0 [btrfs]
      [539604.313023]  __btrfs_run_delayed_refs+0x2ba/0x1260 [btrfs]
      [539604.314271]  btrfs_run_delayed_refs+0x8f/0x1c0 [btrfs]
      [539604.315445]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.316706]  btrfs_commit_transaction+0xa2/0xed0 [btrfs]
      [539604.317855]  ? do_raw_spin_unlock+0x4b/0xa0
      [539604.318544]  ? _raw_spin_unlock+0x29/0x50
      [539604.319240]  create_subvol+0x53d/0x6e0 [btrfs]
      [539604.320283]  btrfs_mksubvol+0x4f5/0x590 [btrfs]
      [539604.321220]  __btrfs_ioctl_snap_create+0x11b/0x180 [btrfs]
      [539604.322307]  btrfs_ioctl_snap_create_v2+0xc6/0x150 [btrfs]
      [539604.323295]  btrfs_ioctl+0x9f7/0x33e0 [btrfs]
      [539604.324331]  ? rcu_read_lock_sched_held+0x12/0x70
      [539604.325137]  ? lock_release+0x224/0x4a0
      [539604.325808]  ? __x64_sys_ioctl+0x87/0xc0
      [539604.326467]  __x64_sys_ioctl+0x87/0xc0
      [539604.327109]  do_syscall_64+0x38/0x90
      [539604.327875]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [539604.328792] RIP: 0033:0x7f05a7babaeb
      
      This needs to use regular btrfs_search_slot() with some skip and stop
      logic.
      
      Since we only consider five samples (five search slots), don't bother
      with the complexity of looking for commit_root_sem contention. If
      necessary, it can be added to the load function in between samples.
      Reported-by: Filipe Manana <fdmanana@kernel.org>
      Link: https://lore.kernel.org/linux-btrfs/CAL3q7H7eKMD44Z1+=Kb-1RFMMeZpAm2fwyO59yeBwCcSOU80Pg@mail.gmail.com/
      Fixes: c7eec3d9 ("btrfs: load block group size class when caching")
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      12148367
  8. 02 Mar, 2023 1 commit
  9. 16 Feb, 2023 11 commits