1. 04 2月, 2022 6 次提交
  2. 03 2月, 2022 14 次提交
  3. 02 2月, 2022 9 次提交
  4. 01 2月, 2022 2 次提交
  5. 31 1月, 2022 9 次提交
    • F
      btrfs: skip reserved bytes warning on unmount after log cleanup failure · 40cdc509
      Filipe Manana 提交于
      After the recent changes made by commit c2e39305 ("btrfs: clear
      extent buffer uptodate when we fail to write it") and its followup fix,
      commit 651740a5 ("btrfs: check WRITE_ERR when trying to read an
      extent buffer"), we can now end up not cleaning up space reservations of
      log tree extent buffers after a transaction abort happens, as well as not
      cleaning up still dirty extent buffers.
      
      This happens because if writeback for a log tree extent buffer failed,
      then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
      and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
      when trying to free the log tree with free_log_tree(), which iterates
      over the tree, we can end up getting an -EIO error when trying to read
      a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
      extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
      EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
      immediately as we can not iterate over the entire tree.
      
      In that case we never update the reserved space for an extent buffer in
      the respective block group and space_info object.
      
      When this happens we get the following traces when unmounting the fs:
      
      [174957.284509] BTRFS: error (device dm-0) in cleanup_transaction:1913: errno=-5 IO failure
      [174957.286497] BTRFS: error (device dm-0) in free_log_tree:3420: errno=-5 IO failure
      [174957.399379] ------------[ cut here ]------------
      [174957.402497] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:127 btrfs_put_block_group+0x77/0xb0 [btrfs]
      [174957.407523] Modules linked in: btrfs overlay dm_zero (...)
      [174957.424917] CPU: 2 PID: 3206883 Comm: umount Tainted: G        W         5.16.0-rc5-btrfs-next-109 #1
      [174957.426689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [174957.428716] RIP: 0010:btrfs_put_block_group+0x77/0xb0 [btrfs]
      [174957.429717] Code: 21 48 8b bd (...)
      [174957.432867] RSP: 0018:ffffb70d41cffdd0 EFLAGS: 00010206
      [174957.433632] RAX: 0000000000000001 RBX: ffff8b09c3848000 RCX: ffff8b0758edd1c8
      [174957.434689] RDX: 0000000000000001 RSI: ffffffffc0b467e7 RDI: ffff8b0758edd000
      [174957.436068] RBP: ffff8b0758edd000 R08: 0000000000000000 R09: 0000000000000000
      [174957.437114] R10: 0000000000000246 R11: 0000000000000000 R12: ffff8b09c3848148
      [174957.438140] R13: ffff8b09c3848198 R14: ffff8b0758edd188 R15: dead000000000100
      [174957.439317] FS:  00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
      [174957.440402] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [174957.441164] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
      [174957.442117] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [174957.443076] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [174957.443948] Call Trace:
      [174957.444264]  <TASK>
      [174957.444538]  btrfs_free_block_groups+0x255/0x3c0 [btrfs]
      [174957.445238]  close_ctree+0x301/0x357 [btrfs]
      [174957.445803]  ? call_rcu+0x16c/0x290
      [174957.446250]  generic_shutdown_super+0x74/0x120
      [174957.446832]  kill_anon_super+0x14/0x30
      [174957.447305]  btrfs_kill_super+0x12/0x20 [btrfs]
      [174957.447890]  deactivate_locked_super+0x31/0xa0
      [174957.448440]  cleanup_mnt+0x147/0x1c0
      [174957.448888]  task_work_run+0x5c/0xa0
      [174957.449336]  exit_to_user_mode_prepare+0x1e5/0x1f0
      [174957.449934]  syscall_exit_to_user_mode+0x16/0x40
      [174957.450512]  do_syscall_64+0x48/0xc0
      [174957.450980]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [174957.451605] RIP: 0033:0x7f328fdc4a97
      [174957.452059] Code: 03 0c 00 f7 (...)
      [174957.454320] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [174957.455262] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
      [174957.456131] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
      [174957.457118] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
      [174957.458005] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
      [174957.459113] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
      [174957.460193]  </TASK>
      [174957.460534] irq event stamp: 0
      [174957.461003] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [174957.461947] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
      [174957.463147] softirqs last  enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
      [174957.465116] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [174957.466323] ---[ end trace bc7ee0c490bce3af ]---
      [174957.467282] ------------[ cut here ]------------
      [174957.468184] WARNING: CPU: 2 PID: 3206883 at fs/btrfs/block-group.c:3976 btrfs_free_block_groups+0x330/0x3c0 [btrfs]
      [174957.470066] Modules linked in: btrfs overlay dm_zero (...)
      [174957.483137] CPU: 2 PID: 3206883 Comm: umount Tainted: G        W         5.16.0-rc5-btrfs-next-109 #1
      [174957.484691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [174957.486853] RIP: 0010:btrfs_free_block_groups+0x330/0x3c0 [btrfs]
      [174957.488050] Code: 00 00 00 ad de (...)
      [174957.491479] RSP: 0018:ffffb70d41cffde0 EFLAGS: 00010206
      [174957.492520] RAX: ffff8b08d79310b0 RBX: ffff8b09c3848000 RCX: 0000000000000000
      [174957.493868] RDX: 0000000000000001 RSI: fffff443055ee600 RDI: ffffffffb1131846
      [174957.495183] RBP: ffff8b08d79310b0 R08: 0000000000000000 R09: 0000000000000000
      [174957.496580] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8b08d7931000
      [174957.498027] R13: ffff8b09c38492b0 R14: dead000000000122 R15: dead000000000100
      [174957.499438] FS:  00007f328fb82800(0000) GS:ffff8b0a2d200000(0000) knlGS:0000000000000000
      [174957.500990] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [174957.502117] CR2: 00007fff13563e98 CR3: 0000000404f4e005 CR4: 0000000000370ee0
      [174957.503513] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [174957.504864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [174957.506167] Call Trace:
      [174957.506654]  <TASK>
      [174957.507047]  close_ctree+0x301/0x357 [btrfs]
      [174957.507867]  ? call_rcu+0x16c/0x290
      [174957.508567]  generic_shutdown_super+0x74/0x120
      [174957.509447]  kill_anon_super+0x14/0x30
      [174957.510194]  btrfs_kill_super+0x12/0x20 [btrfs]
      [174957.511123]  deactivate_locked_super+0x31/0xa0
      [174957.511976]  cleanup_mnt+0x147/0x1c0
      [174957.512610]  task_work_run+0x5c/0xa0
      [174957.513309]  exit_to_user_mode_prepare+0x1e5/0x1f0
      [174957.514231]  syscall_exit_to_user_mode+0x16/0x40
      [174957.515069]  do_syscall_64+0x48/0xc0
      [174957.515718]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [174957.516688] RIP: 0033:0x7f328fdc4a97
      [174957.517413] Code: 03 0c 00 f7 d8 (...)
      [174957.521052] RSP: 002b:00007fff13564ec8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
      [174957.522514] RAX: 0000000000000000 RBX: 00007f328feea264 RCX: 00007f328fdc4a97
      [174957.523950] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000560b8ae51dd0
      [174957.525375] RBP: 0000560b8ae51ba0 R08: 0000000000000000 R09: 00007fff13563c40
      [174957.526763] R10: 00007f328fe49fc0 R11: 0000000000000246 R12: 0000000000000000
      [174957.528058] R13: 0000560b8ae51dd0 R14: 0000560b8ae51cb0 R15: 0000000000000000
      [174957.529404]  </TASK>
      [174957.529843] irq event stamp: 0
      [174957.530256] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
      [174957.531061] hardirqs last disabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
      [174957.532075] softirqs last  enabled at (0): [<ffffffffb0e94214>] copy_process+0x934/0x2040
      [174957.533083] softirqs last disabled at (0): [<0000000000000000>] 0x0
      [174957.533865] ---[ end trace bc7ee0c490bce3b0 ]---
      [174957.534452] BTRFS info (device dm-0): space_info 4 has 1070841856 free, is not full
      [174957.535404] BTRFS info (device dm-0): space_info total=1073741824, used=2785280, pinned=0, reserved=49152, may_use=0, readonly=65536 zone_unusable=0
      [174957.537029] BTRFS info (device dm-0): global_block_rsv: size 0 reserved 0
      [174957.537859] BTRFS info (device dm-0): trans_block_rsv: size 0 reserved 0
      [174957.538697] BTRFS info (device dm-0): chunk_block_rsv: size 0 reserved 0
      [174957.539552] BTRFS info (device dm-0): delayed_block_rsv: size 0 reserved 0
      [174957.540403] BTRFS info (device dm-0): delayed_refs_rsv: size 0 reserved 0
      
      This also means that in case we have log tree extent buffers that are
      still dirty, we can end up not cleaning them up in case we find an
      extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
      we have no way for iterating over the rest of the tree.
      
      This issue is very often triggered with test cases generic/475 and
      generic/648 from fstests.
      
      The issue could almost be fixed by iterating over the io tree attached to
      each log root which keeps tracks of the range of allocated extent buffers,
      log_root->dirty_log_pages, however that does not work and has some
      inconveniences:
      
      1) After we sync the log, we clear the range of the extent buffers from
         the io tree, so we can't find them after writeback. We could keep the
         ranges in the io tree, with a separate bit to signal they represent
         extent buffers already written, but that means we need to hold into
         more memory until the transaction commits.
      
         How much more memory is used depends a lot on whether we are able to
         allocate contiguous extent buffers on disk (and how often) for a log
         tree - if we are able to, then a single extent state record can
         represent multiple extent buffers, otherwise we need multiple extent
         state record structures to track each extent buffer.
         In fact, my earlier approach did that:
      
         https://lore.kernel.org/linux-btrfs/3aae7c6728257c7ce2279d6660ee2797e5e34bbd.1641300250.git.fdmanana@suse.com/
      
         However that can cause a very significant negative impact on
         performance, not only due to the extra memory usage but also because
         we get a larger and deeper dirty_log_pages io tree.
         We got a report that, on beefy machines at least, we can get such
         performance drop with fsmark for example:
      
         https://lore.kernel.org/linux-btrfs/20220117082426.GE32491@xsang-OptiPlex-9020/
      
      2) We would be doing it only to deal with an unexpected and exceptional
         case, which is basically failure to read an extent buffer from disk
         due to IO failures. On a healthy system we don't expect transaction
         aborts to happen after all;
      
      3) Instead of relying on iterating the log tree or tracking the ranges
         of extent buffers in the dirty_log_pages io tree, using the radix
         tree that tracks extent buffers (fs_info->buffer_radix) to find all
         log tree extent buffers is not reliable either, because after writeback
         of an extent buffer it can be evicted from memory by the release page
         callback of the btree inode (btree_releasepage()).
      
      Since there's no way to be able to properly cleanup a log tree without
      being able to read its extent buffers from disk and without using more
      memory to track the logical ranges of the allocated extent buffers do
      the following:
      
      1) When we fail to cleanup a log tree, setup a flag that indicates that
         failure;
      
      2) Trigger writeback of all log tree extent buffers that are still dirty,
         and wait for the writeback to complete. This is just to cleanup their
         state, page states, page leaks, etc;
      
      3) When unmounting the fs, ignore if the number of bytes reserved in a
         block group and in a space_info is not 0 if, and only if, we failed to
         cleanup a log tree. Also ignore only for metadata block groups and the
         metadata space_info object.
      
      This is far from a perfect solution, but it serves to silence test
      failures such as those from generic/475 and generic/648. However having
      a non-zero value for the reserved bytes counters on unmount after a
      transaction abort, is not such a terrible thing and it's completely
      harmless, it does not affect the filesystem integrity in any way.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      40cdc509
    • T
      btrfs: fix use of uninitialized variable at rm device ioctl · 37b45995
      Tom Rix 提交于
      Clang static analysis reports this problem
      ioctl.c:3333:8: warning: 3rd function call argument is an
        uninitialized value
          ret = exclop_start_or_cancel_reloc(fs_info,
      
      cancel is only set in one branch of an if-check and is always used.  So
      initialize to false.
      
      Fixes: 1a15eb72 ("btrfs: use btrfs_get_dev_args_from_path in dev removal ioctls")
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NTom Rix <trix@redhat.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      37b45995
    • F
      btrfs: fix use-after-free after failure to create a snapshot · 28b21c55
      Filipe Manana 提交于
      At ioctl.c:create_snapshot(), we allocate a pending snapshot structure and
      then attach it to the transaction's list of pending snapshots. After that
      we call btrfs_commit_transaction(), and if that returns an error we jump
      to 'fail' label, where we kfree() the pending snapshot structure. This can
      result in a later use-after-free of the pending snapshot:
      
      1) We allocated the pending snapshot and added it to the transaction's
         list of pending snapshots;
      
      2) We call btrfs_commit_transaction(), and it fails either at the first
         call to btrfs_run_delayed_refs() or btrfs_start_dirty_block_groups().
         In both cases, we don't abort the transaction and we release our
         transaction handle. We jump to the 'fail' label and free the pending
         snapshot structure. We return with the pending snapshot still in the
         transaction's list;
      
      3) Another task commits the transaction. This time there's no error at
         all, and then during the transaction commit it accesses a pointer
         to the pending snapshot structure that the snapshot creation task
         has already freed, resulting in a user-after-free.
      
      This issue could actually be detected by smatch, which produced the
      following warning:
      
        fs/btrfs/ioctl.c:843 create_snapshot() warn: '&pending_snapshot->list' not removed from list
      
      So fix this by not having the snapshot creation ioctl directly add the
      pending snapshot to the transaction's list. Instead add the pending
      snapshot to the transaction handle, and then at btrfs_commit_transaction()
      we add the snapshot to the list only when we can guarantee that any error
      returned after that point will result in a transaction abort, in which
      case the ioctl code can safely free the pending snapshot and no one can
      access it anymore.
      
      CC: stable@vger.kernel.org # 5.10+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      28b21c55
    • S
      btrfs: tree-checker: check item_size for dev_item · ea1d1ca4
      Su Yue 提交于
      Check item size before accessing the device item to avoid out of bound
      access, similar to inode_item check.
      Signed-off-by: NSu Yue <l@damenly.su>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ea1d1ca4
    • S
      btrfs: tree-checker: check item_size for inode_item · 0c982944
      Su Yue 提交于
      while mounting the crafted image, out-of-bounds access happens:
      
        [350.429619] UBSAN: array-index-out-of-bounds in fs/btrfs/struct-funcs.c:161:1
        [350.429636] index 1048096 is out of range for type 'page *[16]'
        [350.429650] CPU: 0 PID: 9 Comm: kworker/u8:1 Not tainted 5.16.0-rc4 #1
        [350.429652] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
        [350.429653] Workqueue: btrfs-endio-meta btrfs_work_helper [btrfs]
        [350.429772] Call Trace:
        [350.429774]  <TASK>
        [350.429776]  dump_stack_lvl+0x47/0x5c
        [350.429780]  ubsan_epilogue+0x5/0x50
        [350.429786]  __ubsan_handle_out_of_bounds+0x66/0x70
        [350.429791]  btrfs_get_16+0xfd/0x120 [btrfs]
        [350.429832]  check_leaf+0x754/0x1a40 [btrfs]
        [350.429874]  ? filemap_read+0x34a/0x390
        [350.429878]  ? load_balance+0x175/0xfc0
        [350.429881]  validate_extent_buffer+0x244/0x310 [btrfs]
        [350.429911]  btrfs_validate_metadata_buffer+0xf8/0x100 [btrfs]
        [350.429935]  end_bio_extent_readpage+0x3af/0x850 [btrfs]
        [350.429969]  ? newidle_balance+0x259/0x480
        [350.429972]  end_workqueue_fn+0x29/0x40 [btrfs]
        [350.429995]  btrfs_work_helper+0x71/0x330 [btrfs]
        [350.430030]  ? __schedule+0x2fb/0xa40
        [350.430033]  process_one_work+0x1f6/0x400
        [350.430035]  ? process_one_work+0x400/0x400
        [350.430036]  worker_thread+0x2d/0x3d0
        [350.430037]  ? process_one_work+0x400/0x400
        [350.430038]  kthread+0x165/0x190
        [350.430041]  ? set_kthread_struct+0x40/0x40
        [350.430043]  ret_from_fork+0x1f/0x30
        [350.430047]  </TASK>
        [350.430077] BTRFS warning (device loop0): bad eb member start: ptr 0xffe20f4e start 20975616 member offset 4293005178 size 2
      
      check_leaf() is checking the leaf:
      
        corrupt leaf: root=4 block=29396992 slot=1, bad key order, prev (16140901064495857664 1 0) current (1 204 12582912)
        leaf 29396992 items 6 free space 3565 generation 6 owner DEV_TREE
        leaf 29396992 flags 0x1(WRITTEN) backref revision 1
        fs uuid a62e00e8-e94e-4200-8217-12444de93c2e
        chunk uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
      	  item 0 key (16140901064495857664 INODE_ITEM 0) itemoff 3955 itemsize 40
      		  generation 0 transid 0 size 0 nbytes 17592186044416
      		  block group 0 mode 52667 links 33 uid 0 gid 2104132511 rdev 94223634821136
      		  sequence 100305 flags 0x2409000a(none)
      		  atime 0.0 (1970-01-01 08:00:00)
      		  ctime 2973280098083405823.4294967295 (-269783007-01-01 21:37:03)
      		  mtime 18446744071572723616.4026825121 (1902-04-16 12:40:00)
      		  otime 9249929404488876031.4294967295 (622322949-04-16 04:25:58)
      	  item 1 key (1 DEV_EXTENT 12582912) itemoff 3907 itemsize 48
      		  dev extent chunk_tree 3
      		  chunk_objectid 256 chunk_offset 12582912 length 8388608
      		  chunk_tree_uuid cecbd0f7-9ca0-441e-ae9f-f782f9732bd8
      
      The corrupted leaf of device tree has an inode item. The leaf passed
      checksum and others checks in validate_extent_buffer until check_leaf_item().
      Because of the key type BTRFS_INODE_ITEM, check_inode_item() is called even we
      are in the device tree. Since the
      item offset + sizeof(struct btrfs_inode_item) > eb->len, out-of-bounds access
      is triggered.
      
      The item end vs leaf boundary check has been done before
      check_leaf_item(), so fix it by checking item size in check_inode_item()
      before access of the inode item in extent buffer.
      
      Other check functions except check_dev_item() in check_leaf_item()
      have their item size checks.
      The commit for check_dev_item() is followed.
      
      No regression observed during running fstests.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215299
      CC: stable@vger.kernel.org # 5.10+
      CC: Wenqing Liu <wenqingliu0120@gmail.com>
      Signed-off-by: NSu Yue <l@damenly.su>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0c982944
    • S
      btrfs: fix deadlock between quota disable and qgroup rescan worker · e804861b
      Shin'ichiro Kawasaki 提交于
      Quota disable ioctl starts a transaction before waiting for the qgroup
      rescan worker completes. However, this wait can be infinite and results
      in deadlock because of circular dependency among the quota disable
      ioctl, the qgroup rescan worker and the other task with transaction such
      as block group relocation task.
      
      The deadlock happens with the steps following:
      
      1) Task A calls ioctl to disable quota. It starts a transaction and
         waits for qgroup rescan worker completes.
      2) Task B such as block group relocation task starts a transaction and
         joins to the transaction that task A started. Then task B commits to
         the transaction. In this commit, task B waits for a commit by task A.
      3) Task C as the qgroup rescan worker starts its job and starts a
         transaction. In this transaction start, task C waits for completion
         of the transaction that task A started and task B committed.
      
      This deadlock was found with fstests test case btrfs/115 and a zoned
      null_blk device. The test case enables and disables quota, and the
      block group reclaim was triggered during the quota disable by chance.
      The deadlock was also observed by running quota enable and disable in
      parallel with 'btrfs balance' command on regular null_blk devices.
      
      An example report of the deadlock:
      
        [372.469894] INFO: task kworker/u16:6:103 blocked for more than 122 seconds.
        [372.479944]       Not tainted 5.16.0-rc8 #7
        [372.485067] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [372.493898] task:kworker/u16:6   state:D stack:    0 pid:  103 ppid:     2 flags:0x00004000
        [372.503285] Workqueue: btrfs-qgroup-rescan btrfs_work_helper [btrfs]
        [372.510782] Call Trace:
        [372.514092]  <TASK>
        [372.521684]  __schedule+0xb56/0x4850
        [372.530104]  ? io_schedule_timeout+0x190/0x190
        [372.538842]  ? lockdep_hardirqs_on+0x7e/0x100
        [372.547092]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
        [372.555591]  schedule+0xe0/0x270
        [372.561894]  btrfs_commit_transaction+0x18bb/0x2610 [btrfs]
        [372.570506]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
        [372.578875]  ? free_unref_page+0x3f2/0x650
        [372.585484]  ? finish_wait+0x270/0x270
        [372.591594]  ? release_extent_buffer+0x224/0x420 [btrfs]
        [372.599264]  btrfs_qgroup_rescan_worker+0xc13/0x10c0 [btrfs]
        [372.607157]  ? lock_release+0x3a9/0x6d0
        [372.613054]  ? btrfs_qgroup_account_extent+0xda0/0xda0 [btrfs]
        [372.620960]  ? do_raw_spin_lock+0x11e/0x250
        [372.627137]  ? rwlock_bug.part.0+0x90/0x90
        [372.633215]  ? lock_is_held_type+0xe4/0x140
        [372.639404]  btrfs_work_helper+0x1ae/0xa90 [btrfs]
        [372.646268]  process_one_work+0x7e9/0x1320
        [372.652321]  ? lock_release+0x6d0/0x6d0
        [372.658081]  ? pwq_dec_nr_in_flight+0x230/0x230
        [372.664513]  ? rwlock_bug.part.0+0x90/0x90
        [372.670529]  worker_thread+0x59e/0xf90
        [372.676172]  ? process_one_work+0x1320/0x1320
        [372.682440]  kthread+0x3b9/0x490
        [372.687550]  ? _raw_spin_unlock_irq+0x24/0x50
        [372.693811]  ? set_kthread_struct+0x100/0x100
        [372.700052]  ret_from_fork+0x22/0x30
        [372.705517]  </TASK>
        [372.709747] INFO: task btrfs-transacti:2347 blocked for more than 123 seconds.
        [372.729827]       Not tainted 5.16.0-rc8 #7
        [372.745907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [372.767106] task:btrfs-transacti state:D stack:    0 pid: 2347 ppid:     2 flags:0x00004000
        [372.787776] Call Trace:
        [372.801652]  <TASK>
        [372.812961]  __schedule+0xb56/0x4850
        [372.830011]  ? io_schedule_timeout+0x190/0x190
        [372.852547]  ? lockdep_hardirqs_on+0x7e/0x100
        [372.871761]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
        [372.886792]  schedule+0xe0/0x270
        [372.901685]  wait_current_trans+0x22c/0x310 [btrfs]
        [372.919743]  ? btrfs_put_transaction+0x3d0/0x3d0 [btrfs]
        [372.938923]  ? finish_wait+0x270/0x270
        [372.959085]  ? join_transaction+0xc75/0xe30 [btrfs]
        [372.977706]  start_transaction+0x938/0x10a0 [btrfs]
        [372.997168]  transaction_kthread+0x19d/0x3c0 [btrfs]
        [373.013021]  ? btrfs_cleanup_transaction.isra.0+0xfc0/0xfc0 [btrfs]
        [373.031678]  kthread+0x3b9/0x490
        [373.047420]  ? _raw_spin_unlock_irq+0x24/0x50
        [373.064645]  ? set_kthread_struct+0x100/0x100
        [373.078571]  ret_from_fork+0x22/0x30
        [373.091197]  </TASK>
        [373.105611] INFO: task btrfs:3145 blocked for more than 123 seconds.
        [373.114147]       Not tainted 5.16.0-rc8 #7
        [373.120401] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [373.130393] task:btrfs           state:D stack:    0 pid: 3145 ppid:  3141 flags:0x00004000
        [373.140998] Call Trace:
        [373.145501]  <TASK>
        [373.149654]  __schedule+0xb56/0x4850
        [373.155306]  ? io_schedule_timeout+0x190/0x190
        [373.161965]  ? lockdep_hardirqs_on+0x7e/0x100
        [373.168469]  ? _raw_spin_unlock_irqrestore+0x3e/0x60
        [373.175468]  schedule+0xe0/0x270
        [373.180814]  wait_for_commit+0x104/0x150 [btrfs]
        [373.187643]  ? test_and_set_bit+0x20/0x20 [btrfs]
        [373.194772]  ? kmem_cache_free+0x124/0x550
        [373.201191]  ? btrfs_put_transaction+0x69/0x3d0 [btrfs]
        [373.208738]  ? finish_wait+0x270/0x270
        [373.214704]  ? __btrfs_end_transaction+0x347/0x7b0 [btrfs]
        [373.222342]  btrfs_commit_transaction+0x44d/0x2610 [btrfs]
        [373.230233]  ? join_transaction+0x255/0xe30 [btrfs]
        [373.237334]  ? btrfs_record_root_in_trans+0x4d/0x170 [btrfs]
        [373.245251]  ? btrfs_apply_pending_changes+0x50/0x50 [btrfs]
        [373.253296]  relocate_block_group+0x105/0xc20 [btrfs]
        [373.260533]  ? mutex_lock_io_nested+0x1270/0x1270
        [373.267516]  ? btrfs_wait_nocow_writers+0x85/0x180 [btrfs]
        [373.275155]  ? merge_reloc_roots+0x710/0x710 [btrfs]
        [373.283602]  ? btrfs_wait_ordered_extents+0xd30/0xd30 [btrfs]
        [373.291934]  ? kmem_cache_free+0x124/0x550
        [373.298180]  btrfs_relocate_block_group+0x35c/0x930 [btrfs]
        [373.306047]  btrfs_relocate_chunk+0x85/0x210 [btrfs]
        [373.313229]  btrfs_balance+0x12f4/0x2d20 [btrfs]
        [373.320227]  ? lock_release+0x3a9/0x6d0
        [373.326206]  ? btrfs_relocate_chunk+0x210/0x210 [btrfs]
        [373.333591]  ? lock_is_held_type+0xe4/0x140
        [373.340031]  ? rcu_read_lock_sched_held+0x3f/0x70
        [373.346910]  btrfs_ioctl_balance+0x548/0x700 [btrfs]
        [373.354207]  btrfs_ioctl+0x7f2/0x71b0 [btrfs]
        [373.360774]  ? lockdep_hardirqs_on_prepare+0x410/0x410
        [373.367957]  ? lockdep_hardirqs_on_prepare+0x410/0x410
        [373.375327]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
        [373.383841]  ? find_held_lock+0x2c/0x110
        [373.389993]  ? lock_release+0x3a9/0x6d0
        [373.395828]  ? mntput_no_expire+0xf7/0xad0
        [373.402083]  ? lock_is_held_type+0xe4/0x140
        [373.408249]  ? vfs_fileattr_set+0x9f0/0x9f0
        [373.414486]  ? selinux_file_ioctl+0x349/0x4e0
        [373.420938]  ? trace_raw_output_lock+0xb4/0xe0
        [373.427442]  ? selinux_inode_getsecctx+0x80/0x80
        [373.434224]  ? lockdep_hardirqs_on+0x7e/0x100
        [373.440660]  ? force_qs_rnp+0x2a0/0x6b0
        [373.446534]  ? lock_is_held_type+0x9b/0x140
        [373.452763]  ? __blkcg_punt_bio_submit+0x1b0/0x1b0
        [373.459732]  ? security_file_ioctl+0x50/0x90
        [373.466089]  __x64_sys_ioctl+0x127/0x190
        [373.472022]  do_syscall_64+0x3b/0x90
        [373.477513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [373.484823] RIP: 0033:0x7f8f4af7e2bb
        [373.490493] RSP: 002b:00007ffcbf936178 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [373.500197] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8f4af7e2bb
        [373.509451] RDX: 00007ffcbf936220 RSI: 00000000c4009420 RDI: 0000000000000003
        [373.518659] RBP: 00007ffcbf93774a R08: 0000000000000013 R09: 00007f8f4b02d4e0
        [373.527872] R10: 00007f8f4ae87740 R11: 0000000000000246 R12: 0000000000000001
        [373.537222] R13: 00007ffcbf936220 R14: 0000000000000000 R15: 0000000000000002
        [373.546506]  </TASK>
        [373.550878] INFO: task btrfs:3146 blocked for more than 123 seconds.
        [373.559383]       Not tainted 5.16.0-rc8 #7
        [373.565748] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        [373.575748] task:btrfs           state:D stack:    0 pid: 3146 ppid:  2168 flags:0x00000000
        [373.586314] Call Trace:
        [373.590846]  <TASK>
        [373.595121]  __schedule+0xb56/0x4850
        [373.600901]  ? __lock_acquire+0x23db/0x5030
        [373.607176]  ? io_schedule_timeout+0x190/0x190
        [373.613954]  schedule+0xe0/0x270
        [373.619157]  schedule_timeout+0x168/0x220
        [373.625170]  ? usleep_range_state+0x150/0x150
        [373.631653]  ? mark_held_locks+0x9e/0xe0
        [373.637767]  ? do_raw_spin_lock+0x11e/0x250
        [373.643993]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
        [373.651267]  ? _raw_spin_unlock_irq+0x24/0x50
        [373.657677]  ? lockdep_hardirqs_on+0x7e/0x100
        [373.664103]  wait_for_completion+0x163/0x250
        [373.670437]  ? bit_wait_timeout+0x160/0x160
        [373.676585]  btrfs_quota_disable+0x176/0x9a0 [btrfs]
        [373.683979]  ? btrfs_quota_enable+0x12f0/0x12f0 [btrfs]
        [373.691340]  ? down_write+0xd0/0x130
        [373.696880]  ? down_write_killable+0x150/0x150
        [373.703352]  btrfs_ioctl+0x3945/0x71b0 [btrfs]
        [373.710061]  ? find_held_lock+0x2c/0x110
        [373.716192]  ? lock_release+0x3a9/0x6d0
        [373.722047]  ? __handle_mm_fault+0x23cd/0x3050
        [373.728486]  ? btrfs_ioctl_get_supported_features+0x20/0x20 [btrfs]
        [373.737032]  ? set_pte+0x6a/0x90
        [373.742271]  ? do_raw_spin_unlock+0x55/0x1f0
        [373.748506]  ? lock_is_held_type+0xe4/0x140
        [373.754792]  ? vfs_fileattr_set+0x9f0/0x9f0
        [373.761083]  ? selinux_file_ioctl+0x349/0x4e0
        [373.767521]  ? selinux_inode_getsecctx+0x80/0x80
        [373.774247]  ? __up_read+0x182/0x6e0
        [373.780026]  ? count_memcg_events.constprop.0+0x46/0x60
        [373.787281]  ? up_write+0x460/0x460
        [373.792932]  ? security_file_ioctl+0x50/0x90
        [373.799232]  __x64_sys_ioctl+0x127/0x190
        [373.805237]  do_syscall_64+0x3b/0x90
        [373.810947]  entry_SYSCALL_64_after_hwframe+0x44/0xae
        [373.818102] RIP: 0033:0x7f1383ea02bb
        [373.823847] RSP: 002b:00007fffeb4d71f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
        [373.833641] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1383ea02bb
        [373.842961] RDX: 00007fffeb4d7210 RSI: 00000000c0109428 RDI: 0000000000000003
        [373.852179] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000078
        [373.861408] R10: 00007f1383daec78 R11: 0000000000000202 R12: 00007fffeb4d874a
        [373.870647] R13: 0000000000493099 R14: 0000000000000001 R15: 0000000000000000
        [373.879838]  </TASK>
        [373.884018]
                     Showing all locks held in the system:
        [373.894250] 3 locks held by kworker/4:1/58:
        [373.900356] 1 lock held by khungtaskd/63:
        [373.906333]  #0: ffffffff8945ff60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260
        [373.917307] 3 locks held by kworker/u16:6/103:
        [373.923938]  #0: ffff888127b4f138 ((wq_completion)btrfs-qgroup-rescan){+.+.}-{0:0}, at: process_one_work+0x712/0x1320
        [373.936555]  #1: ffff88810b817dd8 ((work_completion)(&work->normal_work)){+.+.}-{0:0}, at: process_one_work+0x73f/0x1320
        [373.951109]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_qgroup_rescan_worker+0x1f6/0x10c0 [btrfs]
        [373.964027] 2 locks held by less/1803:
        [373.969982]  #0: ffff88813ed56098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x24/0x80
        [373.981295]  #1: ffffc90000b3b2e8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x9e2/0x1060
        [373.992969] 1 lock held by btrfs-transacti/2347:
        [373.999893]  #0: ffff88813d4887a8 (&fs_info->transaction_kthread_mutex){+.+.}-{3:3}, at: transaction_kthread+0xe3/0x3c0 [btrfs]
        [374.015872] 3 locks held by btrfs/3145:
        [374.022298]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl_balance+0xc3/0x700 [btrfs]
        [374.034456]  #1: ffff88813d48a0a0 (&fs_info->reclaim_bgs_lock){+.+.}-{3:3}, at: btrfs_balance+0xfe5/0x2d20 [btrfs]
        [374.047646]  #2: ffff88813d488838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: btrfs_relocate_block_group+0x354/0x930 [btrfs]
        [374.063295] 4 locks held by btrfs/3146:
        [374.069647]  #0: ffff888102dd4460 (sb_writers#18){.+.+}-{0:0}, at: btrfs_ioctl+0x38b1/0x71b0 [btrfs]
        [374.081601]  #1: ffff88813d488bb8 (&fs_info->subvol_sem){+.+.}-{3:3}, at: btrfs_ioctl+0x38fd/0x71b0 [btrfs]
        [374.094283]  #2: ffff888102dd4650 (sb_internal#2){.+.+}-{0:0}, at: btrfs_quota_disable+0xc8/0x9a0 [btrfs]
        [374.106885]  #3: ffff88813d489800 (&fs_info->qgroup_ioctl_lock){+.+.}-{3:3}, at: btrfs_quota_disable+0xd5/0x9a0 [btrfs]
      
        [374.126780] =============================================
      
      To avoid the deadlock, wait for the qgroup rescan worker to complete
      before starting the transaction for the quota disable ioctl. Clear
      BTRFS_FS_QUOTA_ENABLE flag before the wait and the transaction to
      request the worker to complete. On transaction start failure, set the
      BTRFS_FS_QUOTA_ENABLE flag again. These BTRFS_FS_QUOTA_ENABLE flag
      changes can be done safely since the function btrfs_quota_disable is not
      called concurrently because of fs_info->subvol_sem.
      
      Also check the BTRFS_FS_QUOTA_ENABLE flag in qgroup_rescan_init to avoid
      another qgroup rescan worker to start after the previous qgroup worker
      completed.
      
      CC: stable@vger.kernel.org # 5.4+
      Suggested-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NShin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e804861b
    • Q
      btrfs: don't start transaction for scrub if the fs is mounted read-only · 2d192fc4
      Qu Wenruo 提交于
      [BUG]
      The following super simple script would crash btrfs at unmount time, if
      CONFIG_BTRFS_ASSERT() is set.
      
       mkfs.btrfs -f $dev
       mount $dev $mnt
       xfs_io -f -c "pwrite 0 4k" $mnt/file
       umount $mnt
       mount -r ro $dev $mnt
       btrfs scrub start -Br $mnt
       umount $mnt
      
      This will trigger the following ASSERT() introduced by commit
      0a31daa4 ("btrfs: add assertion for empty list of transactions at
      late stage of umount").
      
      That patch is definitely not the cause, it just makes enough noise for
      developers.
      
      [CAUSE]
      We will start transaction for the following call chain during scrub:
      
        scrub_enumerate_chunks()
        |- btrfs_inc_block_group_ro()
           |- btrfs_join_transaction()
      
      However for RO mount, there is no running transaction at all, thus
      btrfs_join_transaction() will start a new transaction.
      
      Furthermore, since it's read-only mount, btrfs_sync_fs() will not call
      btrfs_commit_super() to commit the new but empty transaction.
      
      And leads to the ASSERT().
      
      The bug has been there for a long time. Only the new ASSERT() makes it
      noisy enough to be noticed.
      
      [FIX]
      For read-only scrub on read-only mount, there is no need to start a
      transaction nor to allocate new chunks in btrfs_inc_block_group_ro().
      
      Just do extra read-only mount check in btrfs_inc_block_group_ro(), and
      if it's read-only, skip all chunk allocation and go inc_block_group_ro()
      directly.
      
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2d192fc4
    • D
      xfs: return errors in xfs_fs_sync_fs · 2d86293c
      Darrick J. Wong 提交于
      Now that the VFS will do something with the return values from
      ->sync_fs, make ours pass on error codes.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NChristian Brauner <brauner@kernel.org>
      2d86293c
    • D
      quota: make dquot_quota_sync return errors from ->sync_fs · dd5532a4
      Darrick J. Wong 提交于
      Strangely, dquot_quota_sync ignores the return code from the ->sync_fs
      call, which means that quotacalls like Q_SYNC never see the error.  This
      doesn't seem right, so fix that.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NChristian Brauner <brauner@kernel.org>
      dd5532a4
反馈
建议
客服 返回
顶部