1. 22 Jun, 2021 1 commit
  2. 21 Jun, 2021 1 commit
    • btrfs: always abort the transaction if we abort a trans handle · 5963ffca
      Committed by Josef Bacik
      While stress testing our error handling I noticed that sometimes we
      would still commit the transaction even though we had aborted the
      transaction.
      
      Currently we track if a trans handle has dirtied any metadata, and if it
      hasn't we mark the filesystem as having an error (so no new transactions
      can be started), but we will allow the current transaction to complete
      as we do not mark the transaction itself as having been aborted.
      
      This sounds good in theory, but we were not properly tracking IO errors
      in btrfs_finish_ordered_io, and were thus committing the transaction
      with bogus free space data.  This isn't necessarily a problem per se
      with the free space cache, as the other guards in place would have kept
      us from accepting the free space cache as valid, but it highlights a
      real-world case where we had a bug and could have corrupted the
      filesystem because of it.
      
      This "skip abort on empty trans handle" is nice in theory, but assumes
      we have perfect error handling everywhere, which we clearly do not.
      Also we do not allow further transactions to be started, so all this
      does is save the last transaction that was happening, which doesn't
      necessarily gain us anything other than the potential for real
      corruption.
      
      Remove this particular bit of code.  If we decide we need to abort the
      transaction, then abort the current one and keep us from doing real harm
      to the file system, regardless of whether this specific trans handle
      dirtied anything or not.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
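
The policy change in this commit can be illustrated with a toy user-space model. This is only a sketch: the `toy_*` structures and functions are hypothetical stand-ins, not the kernel's transaction code, and they model nothing beyond the two flags discussed in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct toy_fs {
    bool fs_error;          /* set: no new transactions may be started */
};

struct toy_transaction {
    bool aborted;           /* set: this transaction must not be committed */
};

struct toy_trans_handle {
    struct toy_fs *fs;
    struct toy_transaction *transaction;
    bool dirtied;           /* did this handle dirty any metadata? */
};

/* Old policy: a handle that dirtied nothing only flags the filesystem. */
static void toy_abort_old(struct toy_trans_handle *h)
{
    assert(h != NULL);
    h->fs->fs_error = true;
    if (h->dirtied)
        h->transaction->aborted = true;
}

/* New policy: always abort the running transaction as well. */
static void toy_abort_new(struct toy_trans_handle *h)
{
    assert(h != NULL);
    h->fs->fs_error = true;
    h->transaction->aborted = true;
}

/* A commit may only proceed if the transaction was not aborted. */
static bool toy_can_commit(const struct toy_transaction *t)
{
    return !t->aborted;
}
```

With a handle whose `dirtied` flag is false, `toy_abort_old()` leaves `toy_can_commit()` returning true, which is the window the commit closes; `toy_abort_new()` makes it return false regardless of `dirtied`.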
  3. 21 Apr, 2021 1 commit
    • btrfs: more graceful errors/warnings on 32bit systems when reaching limits · e9306ad4
      Committed by Qu Wenruo
      Btrfs uses an internally mapped u64 address space for all its metadata.
      Due to the page cache limit on 32bit systems, btrfs can't access
      metadata at or beyond (ULONG_MAX + 1) << PAGE_SHIFT. See
      how MAX_LFS_FILESIZE and page::index are defined.  This is 16T for a 4K
      page size and 256T for a 64K page size.
      
      Users can have a filesystem which doesn't have metadata beyond the
      boundary at mount time, but later balance can cause it to create
      metadata beyond the boundary.
      
      Modifying the MM layer just for such a minor use case is unrealistic.
      We can't do more than prevent mounting such a filesystem, or warn
      early while the numbers are still within the limits.
      
      To address this problem, this patch introduces the following checks:
      
      - Mount time rejection
        This will reject any fs which has a metadata chunk at or beyond the
        boundary.
      
      - Mount time early warning
        If there is any metadata chunk beyond 5/8th of the boundary, we do an
        early warning and hope the end user will see it.
      
      - Runtime extent buffer rejection
        If we're going to allocate an extent buffer at or beyond the boundary,
        reject the request with EOVERFLOW.
        This is definitely going to cause problems like a transaction abort,
        but we have no better option.
      
      - Runtime extent buffer early warning
        If an extent buffer beyond 5/8th of the max file size is allocated, do
        an early warning.
      
      Each of the above error/warning messages is printed only once per fs
      to reduce dmesg flooding.
      
      If the mount is rejected, the filesystem will be mountable only on a
      64bit host.
      
      Link: https://lore.kernel.org/linux-btrfs/1783f16d-7a28-80e6-4c32-fdf19b705ed0@gmx.com/
      Reported-by: Erik Jensen <erikjensen@rkjnsn.net>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 02 Mar, 2021 1 commit
    • btrfs: fix spurious free_space_tree remount warning · c55a4319
      Committed by Boris Burkov
      The intended logic of the check is to catch cases where the desired
      free_space_tree setting doesn't match the mounted setting, and the
      remount is anything but ro->rw. However, it makes the mistake of
      checking equality on a masked integer (btrfs_test_opt) against a boolean
      (btrfs_fs_compat_ro).
      
      If you run the reproducer:
        $ mount -o space_cache=v2 dev mnt
        $ mount -o remount,ro mnt
      
      you would expect no warning, because the remount is not attempting to
      change the free space tree setting, but we do see the warning.
      
      To fix this, add explicit bool type casts to the condition.
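
The class of bug is easy to reproduce outside the kernel. The sketch below uses a hypothetical option bit and helper names (`TOY_OPT_FREE_SPACE_TREE`, `toy_test_opt`); it is not the btrfs code, only an illustration of comparing a masked integer with a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bit; the real mask value is not given above. */
#define TOY_OPT_FREE_SPACE_TREE (1UL << 4)

/* Masked-integer test: returns the raw masked bits (0 or 16), not 0/1. */
static unsigned long toy_test_opt(unsigned long mount_opt, unsigned long opt_bit)
{
    assert(opt_bit != 0);       /* a zero mask would make the test meaningless */
    return mount_opt & opt_bit;
}

/* Buggy check: 16 != 1 even though both sides mean "enabled". */
static bool setting_changed_buggy(unsigned long mount_opt, bool compat_ro)
{
    return toy_test_opt(mount_opt, TOY_OPT_FREE_SPACE_TREE) !=
           (unsigned long)compat_ro;
}

/* Fixed check: normalise both sides to bool before comparing. */
static bool setting_changed_fixed(unsigned long mount_opt, bool compat_ro)
{
    return (bool)toy_test_opt(mount_opt, TOY_OPT_FREE_SPACE_TREE) !=
           (bool)compat_ro;
}
```

When the option is set and the on-disk flag is also set, the buggy check reports a change while the fixed one does not, which matches the spurious warning in the reproducer.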
      
      I tested a variety of transitions:
      sudo mount -o space_cache=v2 /dev/vg0/lv0 mnt/lol
      (fst enabled)
      mount -o remount,ro mnt/lol
      (no warning, no fst change)
      sudo mount -o remount,rw,space_cache=v1,clear_cache
      (no warning, ro->rw)
      sudo mount -o remount,rw,space_cache=v2 mnt
      (warning, rw->rw with change)
      sudo mount -o remount,ro mnt
      (no warning, no fst change)
      sudo mount -o remount,rw,space_cache=v2 mnt
      (no warning, no fst change)
      Reported-by: Chris Murphy <lists@colorremedies.com>
      CC: stable@vger.kernel.org # 5.11
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 09 Feb, 2021 2 commits
  6. 18 Dec, 2020 3 commits
    • btrfs: run delayed iputs when remounting RO to avoid leaking them · a8cc263e
      Committed by Filipe Manana
      When remounting RO, after setting the superblock with the RO flag, the
      cleaner task will start sleeping and do nothing, since the call to
      btrfs_need_cleaner_sleep() keeps returning 'true'. However, when the
      cleaner task goes to sleep, the list of delayed iputs may not be empty.
      
      As long as we are in RO mode, the cleaner task will keep sleeping and
      never run the delayed iputs. This means that if a filesystem unmount
      is started, we get into close_ctree() with a non-empty list of delayed
      iputs, and because the filesystem is in RO mode and is not in an error
      state (or a transaction aborted), btrfs_error_commit_super() and
      btrfs_commit_super(), which run the delayed iputs, are never called,
      and later we fail the assertion that checks if the delayed iputs list
      is empty:
      
        assertion failed: list_empty(&fs_info->delayed_iputs), in fs/btrfs/disk-io.c:4049
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3153!
        invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 1 PID: 3780621 Comm: umount Tainted: G             L    5.6.0-rc2-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
        Code: 8b 7b 58 48 85 ff 74 (...)
        RSP: 0018:ffffb748c89bbdf8 EFLAGS: 00010246
        RAX: 0000000000000051 RBX: ffff9608f2584000 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffff91998988 RDI: 00000000ffffffff
        RBP: ffff9608f25870d8 R08: 0000000000000000 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffc0cbc500
        R13: ffffffff92411750 R14: 0000000000000000 R15: ffff9608f2aab250
        FS:  00007fcbfaa66c80(0000) GS:ffff960936c80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fffc2c2dd38 CR3: 0000000235e54002 CR4: 00000000003606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         close_ctree+0x1a2/0x2e6 [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x93/0xc0
         exit_to_usermode_loop+0xf9/0x100
         do_syscall_64+0x20d/0x260
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7fcbfaca6307
        Code: eb 0b 00 f7 d8 64 89 (...)
        RSP: 002b:00007fffc2c2ed68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 0000558203b559b0 RCX: 00007fcbfaca6307
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000558203b55bc0
        RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fffc2c2dad0
        R10: 0000558203b55bf0 R11: 0000000000000246 R12: 0000558203b55bc0
        R13: 00007fcbfadcc204 R14: 0000558203b55aa8 R15: 0000000000000000
        Modules linked in: btrfs dm_flakey dm_log_writes (...)
        ---[ end trace d44d303790049ef6 ]---
      
      So fix this by making the remount RO path run any remaining delayed iputs
      after waiting for the cleaner to become inactive.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
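
The leak and the fix in this commit can be modelled with a counter standing in for the delayed iputs list. This is a toy sketch under stated assumptions: `toy_fs_info` and the `toy_*` functions are hypothetical, and the step of waiting for the cleaner to become inactive is reduced to a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state; not the kernel's fs_info. */
struct toy_fs_info {
    bool read_only;
    size_t delayed_iputs;       /* number of queued delayed iputs */
};

/* One wake-up of the cleaner: while RO it sleeps and drains nothing. */
static void toy_cleaner_iteration(struct toy_fs_info *fs)
{
    assert(fs != NULL);
    if (fs->read_only)
        return;
    fs->delayed_iputs = 0;      /* RW: runs all delayed iputs */
}

/* Remount RO, optionally running the remaining delayed iputs (the fix). */
static void toy_remount_ro(struct toy_fs_info *fs, bool run_delayed_iputs)
{
    assert(fs != NULL);
    fs->read_only = true;
    /* The real path first waits for the cleaner to become inactive. */
    if (run_delayed_iputs)
        fs->delayed_iputs = 0;
}

/* Mirrors the list_empty() assertion checked during unmount. */
static bool toy_unmount_ok(const struct toy_fs_info *fs)
{
    return fs->delayed_iputs == 0;
}
```

Without the extra drain, any iputs queued before the remount stay queued no matter how often the cleaner wakes up, so `toy_unmount_ok()` returns false; with it, the list is empty by the time unmount runs.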
    • btrfs: fix race between RO remount and the cleaner task · a0a1db70
      Committed by Filipe Manana
      When we are remounting a filesystem in RO mode we can race with the cleaner
      task and result in leaking a transaction if the filesystem is unmounted
      shortly after, before the transaction kthread had a chance to commit that
      transaction. That also results in a crash during unmount, due to a
      use-after-free, if hardware acceleration is not available for crc32c.
      
      The following sequence of steps explains how the race happens.
      
      1) The filesystem is mounted in RW mode and the cleaner task is running.
         This means that currently BTRFS_FS_CLEANER_RUNNING is set at
         fs_info->flags;
      
      2) The cleaner task is currently running delayed iputs for example;
      
      3) A filesystem RO remount operation starts;
      
      4) The RO remount task calls btrfs_commit_super(), which commits any
         currently open transaction, and it finishes;
      
      5) At this point the cleaner task is still running and it creates a new
         transaction by doing one of the following things:
      
         * When running the delayed iput() for an inode with a 0 link count,
           in which case at btrfs_evict_inode() we start a transaction through
           the call to evict_refill_and_join(), use it and then release its
           handle through btrfs_end_transaction();
      
         * When deleting a dead root through btrfs_clean_one_deleted_snapshot(),
           a transaction is started at btrfs_drop_snapshot() and then its handle
           is released through a call to btrfs_end_transaction_throttle();
      
         * When the remount task was still running, and before the remount task
           called btrfs_delete_unused_bgs(), the cleaner task also called
           btrfs_delete_unused_bgs() and it picked and removed one block group
           from the list of unused block groups. Before the cleaner task started
           a transaction, through btrfs_start_trans_remove_block_group() at
           btrfs_delete_unused_bgs(), the remount task had already called
           btrfs_commit_super();
      
      6) So at this point the filesystem is in RO mode and we have an open
         transaction that was started by the cleaner task;
      
      7) Shortly after a filesystem unmount operation starts. At close_ctree()
         we stop the transaction kthread before it had a chance to commit the
         transaction, since less than 30 seconds (the default commit interval)
         have elapsed since the last transaction was committed;
      
      8) We end up calling iput() against the btree inode at close_ctree() while
         there is an open transaction, and since that transaction was used to
         update btrees by the cleaner, we have dirty pages in the btree inode
         due to COW operations on metadata extents, and therefore writeback is
         triggered for the btree inode.
      
         So btree_write_cache_pages() is invoked to flush those dirty pages
         during the final iput() on the btree inode. This results in creating a
         bio and submitting it, which makes us end up at
         btrfs_submit_metadata_bio();
      
      9) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
         that calls btrfs_wq_submit_bio(), because check_async_write() returned
         a value of 1. This value of 1 is because we did not have hardware
         acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
         set in fs_info->flags;
      
      10) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
          workqueue at fs_info->workers, which was already freed before by the
          call to btrfs_stop_all_workers() at close_ctree(). This results in an
          invalid memory access due to a use-after-free, leading to a crash.
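
The interleaving in steps 1 to 7 can be replayed deterministically in a toy model. All names are hypothetical, and the "ordered" variant is only an assumed alternative ordering used for contrast; the actual fix is not described in this part of the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy state for replaying the steps above. */
struct toy_fs {
    bool read_only;
    bool cleaner_running;       /* cleaner is in the middle of a work item */
    bool open_transaction;      /* a transaction was started, not committed */
};

static void toy_commit_super(struct toy_fs *fs)
{
    assert(fs != NULL);
    fs->open_transaction = false;
}

/* Step 5: the cleaner finishes its work item, starting a new transaction. */
static void toy_cleaner_finish_work(struct toy_fs *fs)
{
    assert(fs != NULL);
    if (!fs->cleaner_running)
        return;
    fs->open_transaction = true;
    fs->cleaner_running = false;
}

/* Racy ordering (steps 3 to 6): commit first, cleaner work lands after. */
static void toy_remount_ro_racy(struct toy_fs *fs)
{
    toy_commit_super(fs);
    fs->read_only = true;
    toy_cleaner_finish_work(fs);
}

/* Assumed alternative: let the cleaner go inactive, then commit. */
static void toy_remount_ro_ordered(struct toy_fs *fs)
{
    fs->read_only = true;
    toy_cleaner_finish_work(fs);
    toy_commit_super(fs);
}

/*
 * Step 7, simplified: the open transaction is leaked if unmount arrives
 * before the commit interval has elapsed since the last commit.
 */
static bool toy_unmount_leaks(const struct toy_fs *fs,
                              unsigned int elapsed_secs,
                              unsigned int commit_interval_secs)
{
    return fs->open_transaction && elapsed_secs < commit_interval_secs;
}
```

Replaying the racy ordering and unmounting 5 seconds later with the default 30 second interval reports a leaked transaction; the ordered variant does not, because nothing is left open once the remount returns.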
      
      When this happens, before the crash there are several warnings triggered,
      since we have reserved metadata space in a block group, the delayed refs
      reservation, etc:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
        Code: f0 01 00 00 48 39 c2 75 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
        RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
        RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 48 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c6 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Code: 48 83 bb b0 03 00 00 00 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
        RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c7 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Code: ad de 49 be 22 01 00 (...)
        RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
        RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
        RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c8 ]---
        BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
        BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
        BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
      
      And the crash, which only happens when we do not have crc32c hardware
      acceleration, produces the following trace immediately after those
      warnings:
      
        stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
        Code: 54 55 53 48 89 f3 (...)
        RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
        RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
        RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
        R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
        FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
         btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
         submit_one_bio+0x61/0x70 [btrfs]
         btree_write_cache_pages+0x414/0x450 [btrfs]
         ? kobject_put+0x9a/0x1d0
         ? trace_hardirqs_on+0x1b/0xf0
         ? _raw_spin_unlock_irqrestore+0x3c/0x60
         ? free_debug_processing+0x1e1/0x2b0
         do_writepages+0x43/0xe0
         ? lock_acquired+0x199/0x490
         __writeback_single_inode+0x59/0x650
         writeback_single_inode+0xaf/0x120
         write_inode_now+0x94/0xd0
         iput+0x187/0x2b0
         close_ctree+0x2c6/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f3cfebabee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
        RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
        R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        ---[ end trace dd74718fef1ed5cc ]---
      
      Finally, when we remove the btrfs module (rmmod btrfs), there are several
      warnings about objects that were allocated from our slabs but were never
      freed, a consequence of the transaction that was never committed and got
      leaked:
      
        =============================================================================
        BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x0000000050cbdd61 @offset=12104
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              sync_filesystem+0x74/0x90
              generic_shutdown_super+0x22/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x0000000086e9b0ff @offset=12776
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
              commit_cowonly_roots+0x248/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000001a340018 @offset=4408
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_commit_transaction+0x60/0xc40 [btrfs]
              create_subvol+0x56a/0x990 [btrfs]
              btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
              __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
              btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
              btrfs_ioctl+0x1a92/0x36f0 [btrfs]
              __x64_sys_ioctl+0x83/0xb0
              do_syscall_64+0x33/0x80
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x000000002b46292a @offset=13648
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
        INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? __mutex_unlock_slowpath+0x45/0x2a0
         kmem_cache_destroy+0x55/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000004cf95ea8 @offset=6264
        INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
      
      So fix this by making the remount path wait for the cleaner task before
      calling btrfs_commit_super(). The remount path now waits for the bit
      BTRFS_FS_CLEANER_RUNNING to be cleared from fs_info->flags before calling
      btrfs_commit_super() and this ensures the cleaner cannot start a
      transaction after that, because it sleeps when the filesystem is in RO
      mode and we have already flagged the filesystem as RO before waiting for
      BTRFS_FS_CLEANER_RUNNING to be cleared.
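
      The ordering described above can be illustrated with a small userspace
      model. This is only a sketch under stated assumptions, not the kernel
      code: struct toy_fs, the TOY_* bits and all toy_* helpers are
      hypothetical names, and C11 atomics stand in for the kernel's bit
      operations on fs_info->flags and fs_info->fs_state. The real remount
      path sleeps until the bit clears, whereas this model only exposes a
      predicate that says whether the commit may proceed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel uses its own bit numbers. */
#define TOY_STATE_RO        (1UL << 0)  /* lives in fs_state */
#define TOY_CLEANER_RUNNING (1UL << 0)  /* lives in flags */

struct toy_fs {
    atomic_ulong fs_state;  /* stands in for fs_info->fs_state */
    atomic_ulong flags;     /* stands in for fs_info->flags */
};

static void toy_fs_init(struct toy_fs *fs)
{
    atomic_init(&fs->fs_state, 0UL);
    atomic_init(&fs->flags, 0UL);
}

/*
 * Cleaner side: announce that we are running first, then check for RO.
 * Returns true if the cleaner may go on (and possibly start a
 * transaction), false if it backed off because the filesystem is RO.
 */
static bool toy_cleaner_enter(struct toy_fs *fs)
{
    atomic_fetch_or(&fs->flags, TOY_CLEANER_RUNNING);
    if (atomic_load(&fs->fs_state) & TOY_STATE_RO) {
        atomic_fetch_and(&fs->flags, ~TOY_CLEANER_RUNNING);
        return false;
    }
    return true;
}

/* Cleaner side: finished one pass, clear the running bit. */
static void toy_cleaner_exit(struct toy_fs *fs)
{
    unsigned long old = atomic_fetch_and(&fs->flags, ~TOY_CLEANER_RUNNING);

    assert(old & TOY_CLEANER_RUNNING);
    (void)old;
}

/* Remount side, first step: flag the filesystem as RO. */
static void toy_remount_mark_ro(struct toy_fs *fs)
{
    atomic_fetch_or(&fs->fs_state, TOY_STATE_RO);
}

/*
 * Remount side, second step: the final commit is only safe once the
 * cleaner is no longer running.
 */
static bool toy_remount_may_commit(struct toy_fs *fs)
{
    return !(atomic_load(&fs->flags) & TOY_CLEANER_RUNNING);
}
```

      In this model, a cleaner that entered before toy_remount_mark_ro() keeps
      toy_remount_may_commit() false until it calls toy_cleaner_exit(), and
      any later toy_cleaner_enter() backs off because the RO bit is already
      set.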
      
      This also introduces a new flag BTRFS_FS_STATE_RO to be used for
      fs_info->fs_state when the filesystem is in RO mode. This is because we
      were doing the RO check using the flags of the superblock and setting the
      RO mode simply by ORing into the superblock's flags - those operations are
      not atomic and could result in the cleaner not seeing the update from the
      remount task after it clears BTRFS_FS_CLEANER_RUNNING.
      Tested-by: Fabian Vogt <fvogt@suse.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      a0a1db70
    • F
      btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan · cb13eea3
      Filipe Manana committed on
      If we remount a filesystem in RO mode while the qgroup rescan worker is
      running, the worker can still be running after the remount is done, and
      at unmount time we may be left with an open transaction that never gets
      committed. If that happens we get several memory leaks and can crash
      when hardware acceleration is unavailable for crc32c. It can possibly
      lead to other nasty surprises too, due to use-after-free issues.
      
      The following steps explain how the problem happens.
      
      1) We have a filesystem mounted in RW mode and the qgroup rescan worker is
         running;
      
      2) We remount the filesystem in RO mode, and never stop/pause the rescan
         worker, so after the remount the rescan worker is still running. The
         important detail here is that the rescan task is still running after
         the remount operation committed any ongoing transaction through its
         call to btrfs_commit_super();
      
      3) The rescan is still running, and after the remount completed, the
         rescan worker started a transaction, after it finished iterating all
         leaves of the extent tree, to update the qgroup status item in the
         quotas tree. It does not commit the transaction, it only releases its
         handle on the transaction;
      
      4) A filesystem unmount operation starts shortly after;
      
      5) The unmount task, at close_ctree(), stops the transaction kthread,
         which had not had a chance to commit the open transaction since it was
         sleeping and the commit interval (default of 30 seconds) had not yet
         elapsed since the last time it committed a transaction;
      
      6) So after stopping the transaction kthread we still have the transaction
         used to update the qgroup status item open. At close_ctree(), when the
         filesystem is in RO mode and no transaction abort happened (or the
         filesystem is in error mode), we do not expect to have any transaction
         open, so we do not call btrfs_commit_super();
      
      7) We then proceed to destroy the work queues, free the roots and block
         groups, etc. After that we drop the last reference on the btree inode
         by calling iput() on it. Since there are dirty pages for the btree
         inode, corresponding to the COWed extent buffer for the quotas btree,
         btree_write_cache_pages() is invoked to flush those dirty pages. This
         results in creating a bio and submitting it, which makes us end up at
         btrfs_submit_metadata_bio();
      
      8) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
         that calls btrfs_wq_submit_bio(), because check_async_write() returned
         a value of 1. This value of 1 is because we did not have hardware
         acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
         set in fs_info->flags;
      
      9) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
         workqueue at fs_info->workers, which was already freed before by the
         call to btrfs_stop_all_workers() at close_ctree(). This results in an
         invalid memory access due to a use-after-free, leading to a crash.
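
      Steps 6 to 9 can be modelled in a few lines of userspace C. This is a
      sketch under stated assumptions rather than the kernel code: struct
      toy_umount_fs, the enum and every toy_* function are hypothetical, the
      three booleans stand in for much richer kernel state, and
      toy_check_async_write() only models the crc32c condition described in
      step 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified view of the state at unmount time. */
struct toy_umount_fs {
    bool csum_impl_fast;    /* stands in for BTRFS_FS_CSUM_IMPL_FAST */
    bool workers_alive;     /* false once the work queues are destroyed */
    bool open_transaction;  /* leaked transaction with dirty btree pages */
};

enum toy_submit_result {
    TOY_NOTHING_TO_WRITE,   /* no dirty btree pages, nothing is submitted */
    TOY_SUBMIT_INLINE,      /* bio submitted directly by the caller */
    TOY_SUBMIT_QUEUED,      /* bio handed to a live work queue */
    TOY_USE_AFTER_FREE      /* bio handed to an already freed work queue */
};

/* Step 8: returns 1 (go async) when no fast crc32c implementation exists. */
static int toy_check_async_write(const struct toy_umount_fs *fs)
{
    return fs->csum_impl_fast ? 0 : 1;
}

/* Steps 7 to 9: dropping the last btree inode reference flushes dirty pages. */
static enum toy_submit_result toy_final_iput(const struct toy_umount_fs *fs)
{
    assert(fs != NULL);
    if (!fs->open_transaction)
        return TOY_NOTHING_TO_WRITE;
    if (!toy_check_async_write(fs))
        return TOY_SUBMIT_INLINE;
    return fs->workers_alive ? TOY_SUBMIT_QUEUED : TOY_USE_AFTER_FREE;
}

/*
 * Step 6 onwards: the filesystem is RO and not in error, so no commit is
 * attempted; the work queues are destroyed before the final iput().
 */
static enum toy_submit_result toy_close_ctree(struct toy_umount_fs *fs)
{
    fs->workers_alive = false;
    return toy_final_iput(fs);
}
```

      With open_transaction set and csum_impl_fast clear, toy_close_ctree()
      returns TOY_USE_AFTER_FREE, which mirrors the crash in step 9. With
      csum_impl_fast set it returns TOY_SUBMIT_INLINE: the crash is avoided in
      this model, but the transaction is still leaked.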
      
      When this happens, several warnings are triggered before the crash,
      since we still have reserved metadata space in a block group, in the
      delayed refs reservation, etc:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
        Code: f0 01 00 00 48 39 c2 75 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
        RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
        RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 48 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c6 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Code: 48 83 bb b0 03 00 00 00 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
        RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c7 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Code: ad de 49 be 22 01 00 (...)
        RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
        RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
        RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c8 ]---
        BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
        BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
        BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
      
      And the crash, which only happens when we do not have crc32c hardware
      acceleration, produces the following trace immediately after those
      warnings:
      
        stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
        Code: 54 55 53 48 89 f3 (...)
        RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
        RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
        RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
        R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
        FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
         btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
         submit_one_bio+0x61/0x70 [btrfs]
         btree_write_cache_pages+0x414/0x450 [btrfs]
         ? kobject_put+0x9a/0x1d0
         ? trace_hardirqs_on+0x1b/0xf0
         ? _raw_spin_unlock_irqrestore+0x3c/0x60
         ? free_debug_processing+0x1e1/0x2b0
         do_writepages+0x43/0xe0
         ? lock_acquired+0x199/0x490
         __writeback_single_inode+0x59/0x650
         writeback_single_inode+0xaf/0x120
         write_inode_now+0x94/0xd0
         iput+0x187/0x2b0
         close_ctree+0x2c6/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f3cfebabee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
        RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
        R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        ---[ end trace dd74718fef1ed5cc ]---
      
      Finally, when we remove the btrfs module (rmmod btrfs), there are several
      warnings about objects that were allocated from our slabs but were never
      freed, a consequence of the transaction that was never committed and got
      leaked:
      
        =============================================================================
        BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x0000000050cbdd61 @offset=12104
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
      	btrfs_free_tree_block+0x128/0x360 [btrfs]
      	__btrfs_cow_block+0x489/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	sync_filesystem+0x74/0x90
      	generic_shutdown_super+0x22/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x0000000086e9b0ff @offset=12776
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
      	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
      	commit_cowonly_roots+0x248/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000001a340018 @offset=4408
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
      	btrfs_free_tree_block+0x128/0x360 [btrfs]
      	__btrfs_cow_block+0x489/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	btrfs_commit_transaction+0x60/0xc40 [btrfs]
      	create_subvol+0x56a/0x990 [btrfs]
      	btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
      	__btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
      	btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
      	btrfs_ioctl+0x1a92/0x36f0 [btrfs]
      	__x64_sys_ioctl+0x83/0xb0
      	do_syscall_64+0x33/0x80
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x000000002b46292a @offset=13648
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
      	btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? __mutex_unlock_slowpath+0x45/0x2a0
         kmem_cache_destroy+0x55/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000004cf95ea8 @offset=6264
        INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
      	__slab_alloc.isra.0+0x109/0x1c0
      	kmem_cache_alloc+0x7bb/0x830
      	btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
      	alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
      	__btrfs_cow_block+0x12d/0x5f0 [btrfs]
      	btrfs_cow_block+0xf7/0x220 [btrfs]
      	btrfs_search_slot+0x62a/0xc40 [btrfs]
      	btrfs_del_orphan_item+0x65/0xd0 [btrfs]
      	btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
      	open_ctree+0x125a/0x18a0 [btrfs]
      	btrfs_mount_root.cold+0x13/0xed [btrfs]
      	legacy_get_tree+0x30/0x60
      	vfs_get_tree+0x28/0xe0
      	fc_mount+0xe/0x40
      	vfs_kern_mount.part.0+0x71/0x90
      	btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
      	kmem_cache_free+0x34c/0x3c0
      	__btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
      	btrfs_run_delayed_refs+0x81/0x210 [btrfs]
      	commit_cowonly_roots+0xfb/0x300 [btrfs]
      	btrfs_commit_transaction+0x367/0xc40 [btrfs]
      	close_ctree+0x113/0x2fa [btrfs]
      	generic_shutdown_super+0x6c/0x100
      	kill_anon_super+0x14/0x30
      	btrfs_kill_super+0x12/0x20 [btrfs]
      	deactivate_locked_super+0x31/0x70
      	cleanup_mnt+0x100/0x160
      	task_work_run+0x68/0xb0
      	exit_to_user_mode_prepare+0x1bb/0x1c0
      	syscall_exit_to_user_mode+0x4b/0x260
      	entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
      
      Fix this issue by having the remount path stop the qgroup rescan worker
      when we are remounting RO and teach the rescan worker to stop when a
      remount is in progress. If later a remount in RW mode happens, we are
      already resuming the qgroup rescan worker through the call to
      btrfs_qgroup_rescan_resume(), so we do not need to worry about that.
      Tested-by: Fabian Vogt <fvogt@suse.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cb13eea3
  7. 10 Dec, 2020 9 commits
    • B
      btrfs: warn when remount will not change the free space tree · 2838d255
      Boris Burkov authored
      If the remount is ro->ro, rw->ro, or rw->rw, we will not create or
      clear the free space tree. This can be surprising, so print a warning
      to dmesg to make the failure more visible. It is also important to
      ensure that the space cache options (SPACE_CACHE, FREE_SPACE_TREE) are
      consistent, so ensure those are set to properly match the current on
      disk state (which won't be changing).
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      2838d255
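The rule in commit 2838d255 can be read as a small predicate over the old and new read-only state. The sketch below is a hypothetical illustration, not the kernel code: the function names and the `change_requested` parameter are assumptions, and treating ro->rw as the only transition that can change the tree is an inference from the three transitions the message lists.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true when a remount may create or clear the free
 * space tree. Per the commit text, ro->ro, rw->ro and rw->rw never do,
 * which leaves ro->rw as the only candidate (an inference).
 */
static bool remount_may_change_free_space_tree(bool was_ro, bool will_be_ro)
{
	return was_ro && !will_be_ro;
}

/* Warn only when the user asked for a change that cannot take effect. */
static bool should_warn(bool was_ro, bool will_be_ro, bool change_requested)
{
	return change_requested &&
	       !remount_may_change_free_space_tree(was_ro, will_be_ro);
}
```

With this shape, an rw->rw remount that requests a different space cache option would warn, while the same request on a ro->rw remount would not.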
    • B
      btrfs: use superblock state to print space_cache mount option · 04c41559
      Boris Burkov authored
      To make the contents of /proc/mounts better match the actual state of
      the filesystem, base the display of the space cache mount options off
      the contents of the super block rather than the last mount options
      passed in. Since there are many scenarios where the mount will ignore a
      space cache option, simply showing the passed in option is misleading.
      
      For example, if we mount with -o remount,space_cache=v2 on a read-write
      file system without an existing free space tree, we won't build a free
      space tree, but /proc/mounts will read space_cache=v2 (until we mount
      again and it goes away).
      
      cache_generation is set iff space_cache=v1, FREE_SPACE_TREE is set iff
      space_cache=v2, and if neither is the case, we print nospace_cache.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      04c41559
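The display rule stated in commit 04c41559 (v1 iff cache_generation is set, v2 iff FREE_SPACE_TREE is set, otherwise nospace_cache) can be sketched as below. The struct, its field names, and the function are hypothetical stand-ins for the superblock state, and the order in which the two conditions are tested is an assumption, since the message does not say what happens if both hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two superblock facts the commit names. */
struct sb_state {
	uint64_t cache_generation;  /* nonzero iff space cache v1 is in use */
	bool has_free_space_tree;   /* FREE_SPACE_TREE is set */
};

/* Pick the option string from on-disk state, not from the options passed in. */
static const char *space_cache_option(const struct sb_state *sb)
{
	if (sb->cache_generation > 0)
		return "space_cache";
	if (sb->has_free_space_tree)
		return "space_cache=v2";
	return "nospace_cache";
}
```

Because the function never looks at the requested mount options, a remount that asked for v2 but did not build the tree would still be reported according to what is on disk.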
    • B
      btrfs: keep sb cache_generation consistent with space_cache · 94846229
      Boris Burkov authored
      When mounting, btrfs uses the cache_generation in the super block to
      determine if space cache v1 is in use. However, by mounting with
      nospace_cache or space_cache=v2, it is possible to disable space cache
      v1, which does not result in un-setting cache_generation back to 0.
      
      In order to base some logic, like mount option printing in /proc/mounts,
      on the current state of the space cache rather than just the values of
      the mount option, keep the value of cache_generation consistent with the
      status of space cache v1.
      
      We ensure that cache_generation > 0 iff the file system is using
      space_cache v1. This requires committing a transaction on any mount
      which changes whether we are using v1. (v1->nospace_cache, v1->v2,
      nospace_cache->v1, v2->v1).
      
      Since the mechanism for writing out the cache generation is transaction
      commit, but we want some finer grained control over when we un-set it,
      we can't just rely on the SPACE_CACHE mount option, and introduce an
      fs_info flag that mount can use when it wants to unset the generation.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Signed-off-by: David Sterba <dsterba@suse.com>
      94846229
    • B
      btrfs: clear oneshot options on mount and remount · 8cd29088
      Boris Burkov authored
      Some options only apply during mount time and are cleared at the end
      of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also
      fits the bill, and this is a preparation patch for also clearing that
      option.
      
      One subtlety is that the current code only resets USEBACKUPROOT on rw
      mounts, but the option is meaningfully "consumed" by a ro mount, so it
      feels appropriate to clear in that case as well. A subsequent read-write
      remount would not go through open_ctree, which is the only place that
      checks the option, so the change should be benign.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8cd29088
    • B
      btrfs: lift read-write mount setup from mount and remount · 44c0ca21
      Boris Burkov authored
      Mounting rw and remounting from ro to rw naturally share invariants and
      functionality which result in a correctly set up rw filesystem. Luckily,
      there is even a strong unity in the code which implements them. In
      mount's open_ctree, these operations mostly happen after an early return
      for ro file systems, and in remount, they happen in a section devoted to
      remounting ro->rw, after some remount specific validation passes.
      
      However, there are unfortunately a few differences. There are small
      deviations in the order of some of the operations, remount does not
      start orphan cleanup in root_tree or fs_tree, remount does not create
      the free space tree, and remount does not handle "one-shot" mount
      options like clear_cache and uuid tree rescan.
      
      Since we want to add building the free space tree to remount, and also
      to start the same orphan cleanup process on a filesystem mounted as ro
      then remounted rw, we would benefit from unifying the logic between the
      two code paths.
      
      This patch only lifts the existing common functionality, and leaves a
      natural path for fixing the discrepancies.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Boris Burkov <boris@bur.io>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      44c0ca21
    • N
      btrfs: remove inode number cache feature · 5297199a
      Nikolay Borisov authored
      It's been deprecated since commit b547a88e ("btrfs: start
      deprecation of mount option inode_cache") which enumerates the reasons.
      
      A filesystem that uses the feature (mount -o inode_cache) tracks the
      inode numbers in bitmaps; that data stays on the filesystem after this
      patch. The size is roughly 5MiB for 1M inodes [1], which is considered
      small enough to be left there. Removal of the change can be implemented
      in btrfs-progs if needed.
      
      [1] https://lore.kernel.org/linux-btrfs/20201127145836.GZ6430@twin.jikos.cz/
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      5297199a
    • N
      btrfs: disallow space_cache in ZONED mode · 5d1ab66c
      Naohiro Aota authored
      As updates to the space cache v1 are in-place, the space cache cannot be
      located over sequential zones and there is no guarantee that the device
      will have enough conventional zones to store this cache. Resolve this
      problem by completely disabling the space cache v1.  This does not
      introduce any problems with sequential block groups: all the free space
      is located after the allocation pointer and there is no free space before
      the pointer.  There is no need to have such a cache.
      
      Note: we can technically use free-space-tree (space cache v2) in ZONED
      mode. But, since ZONED mode now always allocates extents in a block
      group sequentially regardless of underlying device zone type, it's no
      use to enable and maintain the tree.
      
      For the same reason, NODATACOW is also disabled.
      
      In summary, ZONED will disable:
      
      | Disabled features | Reason                                              |
      |-------------------+-----------------------------------------------------|
      | RAID/DUP          | Cannot handle two zone append writes to different   |
      |                   | zones                                               |
      |-------------------+-----------------------------------------------------|
      | space_cache (v1)  | In-place updating                                   |
      | NODATACOW         | In-place updating                                   |
      |-------------------+-----------------------------------------------------|
      | fallocate         | Reserved extent will be a write hole                |
      |-------------------+-----------------------------------------------------|
      | MIXED_BG          | Allocated metadata region will be write holes for   |
      |                   | data writes                                         |
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5d1ab66c
    • N
      btrfs: check and enable ZONED mode · b70f5097
      Naohiro Aota authored
      Introduce function btrfs_check_zoned_mode() to check if ZONED flag is
      enabled on the file system and if the file system consists of zoned
      devices with equal zone size.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      b70f5097
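The device-side half of the check described in commit b70f5097 (all devices zoned, all with the same zone size) could look roughly like the sketch below. This is not btrfs_check_zoned_mode() itself: `struct dev_info` and `all_zoned_with_equal_size()` are hypothetical names, and how the result is combined with the ZONED flag is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device summary; the kernel keeps this elsewhere. */
struct dev_info {
	bool zoned;
	uint64_t zone_size;  /* in bytes, meaningful only when zoned */
};

/*
 * Return true when every device is zoned and all share one zone size.
 * On success the common size is stored in *zone_size.
 */
static bool all_zoned_with_equal_size(const struct dev_info *devs, size_t n,
				      uint64_t *zone_size)
{
	if (n == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (!devs[i].zoned)
			return false;
		if (devs[i].zone_size != devs[0].zone_size)
			return false;
	}

	*zone_size = devs[0].zone_size;
	return true;
}
```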
    • N
      btrfs: get zone information of zoned block devices · 5b316468
      Naohiro Aota authored
      If a zoned block device is found, get its zone information (number of
      zones and zone size).  To avoid costly run-time zone report
      commands to test the device zone type during block allocation, attach
      the seq_zones bitmap to the device structure to indicate if a zone is
      sequential or accepts random writes. Also attach the empty_zones
      bitmap to indicate if a zone is empty or not.
      
      This patch also introduces the helper function btrfs_dev_is_sequential()
      to test if the zone storing a block is a sequential write required zone
      and btrfs_dev_is_empty_zone() to test if the zone is an empty zone.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5b316468
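The per-device bitmaps from commit 5b316468 let a block position be classified without issuing a zone report. The following is a minimal userspace sketch of that idea, assuming one bit per zone and a zone index of position divided by zone size; the struct layout and the helper names without the btrfs_ prefix are hypothetical, not the kernel definitions.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical zone description attached to a device. */
struct zone_info {
	uint64_t zone_size;          /* bytes per zone */
	uint32_t nr_zones;
	unsigned long *seq_zones;    /* bit set: sequential write required */
	unsigned long *empty_zones;  /* bit set: zone is empty */
};

static bool bitmap_test(const unsigned long *map, uint64_t bit)
{
	return (map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

static void bitmap_set(unsigned long *map, uint64_t bit)
{
	map[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Is the zone that stores byte offset pos a sequential zone? */
static bool dev_is_sequential(const struct zone_info *zi, uint64_t pos)
{
	return bitmap_test(zi->seq_zones, pos / zi->zone_size);
}

/* Is the zone that stores byte offset pos currently empty? */
static bool dev_is_empty_zone(const struct zone_info *zi, uint64_t pos)
{
	return bitmap_test(zi->empty_zones, pos / zi->zone_size);
}
```

The bitmaps would be filled once, when the device is scanned, so each later lookup is a single bit test.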
  8. 08 Dec, 2020 10 commits
  9. 07 Oct, 2020 2 commits
  10. 20 Aug, 2020 1 commit
    • M
      btrfs: reset compression level for lzo on remount · 282dd7d7
      Marcos Paulo de Souza authored
      Currently a user can set mount "-o compress" which will set the
      compression algorithm to zlib, and use the default compress level for
      zlib (3):
      
        relatime,compress=zlib:3,space_cache
      
      If the user remounts the fs using "-o compress=lzo", then the old
      compress_level is used:
      
        relatime,compress=lzo:3,space_cache
      
      But lzo does not expose any tunable compression level. The same happens
      if we set any compress argument with a different level, also with zstd.
      
      Fix this by resetting the compress_level when compress=lzo is
      specified.  With the fix applied, lzo is shown without compress level:
      
        relatime,compress=lzo,space_cache
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      282dd7d7
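The behaviour fixed by commit 282dd7d7 can be modelled in a few lines: when the algorithm becomes lzo, the stored level is cleared so a stale value is never printed. The types and function names below are hypothetical and only mimic the option strings shown in the commit message; they are not the kernel's option parser.

```c
#include <stdio.h>

enum compress_type { COMPRESS_ZLIB, COMPRESS_LZO, COMPRESS_ZSTD };

/* Hypothetical holder for the two values the commit talks about. */
struct compress_opts {
	enum compress_type type;
	unsigned int level;  /* 0 means "no level to show" */
};

/* Apply a (re)mount request; lzo has no tunable level, so reset it. */
static void set_compress(struct compress_opts *o, enum compress_type type,
			 unsigned int level)
{
	o->type = type;
	o->level = (type == COMPRESS_LZO) ? 0 : level;
}

/* Format the option the way the examples in the commit message show it. */
static void format_compress(const struct compress_opts *o, char *buf,
			    size_t len)
{
	static const char *const names[] = { "zlib", "lzo", "zstd" };

	if (o->level)
		snprintf(buf, len, "compress=%s:%u", names[o->type], o->level);
	else
		snprintf(buf, len, "compress=%s", names[o->type]);
}
```

Starting from zlib at level 3 and switching to lzo, the formatted option drops the `:3` suffix instead of carrying it over.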
  11. 11 Aug, 2020 3 commits
    • J
      btrfs: make sure SB_I_VERSION doesn't get unset by remount · faa00889
      Josef Bacik authored
      There's some inconsistency around SB_I_VERSION handling with mount and
      remount.  Since we don't really want it to be off ever, just work around
      this by making sure we don't get the flag cleared on remount.
      
      There's a tiny CPU cost of setting the bit; otherwise all changes to
      i_version also change some of the times (ctime/mtime), so the inode needs
      to be synced. We wouldn't save anything by disabling it.
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add perf impact analysis ]
      Signed-off-by: David Sterba <dsterba@suse.com>
      faa00889
    • J
      btrfs: don't show full path of bind mounts in subvol= · 3ef3959b
      Josef Bacik authored
      Chris Murphy reported a problem where rpm ostree will bind mount a bunch
      of things for whatever voodoo it's doing.  But when it does this
      /proc/mounts shows something like
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo/bar 0 0
      
      Despite subvolid=256 being subvol=/foo.  This is because we're just
      spitting out the dentry of the mount point, which in the case of bind
      mounts is the source path for the mountpoint.  Instead we should spit
      out the path to the actual subvol.  Fix this by looking up the name for
      the subvolid we have mounted.  With this fix the same test looks like
      this
      
        /dev/sda /mnt/test btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
        /dev/sda /mnt/test/baz btrfs rw,relatime,subvolid=256,subvol=/foo 0 0
      Reported-by: Chris Murphy <chris@colorremedies.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3ef3959b
    • D
      btrfs: fix messages after changing compression level by remount · 27942c99
      David Sterba authored
      Reported by Forza on IRC that remounting with compression options does
      not reflect the change in level, or at least it does not appear to do so
      according to the messages:
      
        mount -o compress=zstd:1 /dev/sda /mnt
        mount -o remount,compress=zstd:15 /mnt
      
      does not print the change to the level to syslog:
      
        [   41.366060] BTRFS info (device vda): use zstd compression, level 1
        [   41.368254] BTRFS info (device vda): disk space caching is enabled
        [   41.390429] BTRFS info (device vda): disk space caching is enabled
      
      What really happens is that the message is lost but the level is actually
      changed.
      
      There's another weird output, if compression is reset to 'no':
      
        [   45.413776] BTRFS info (device vda): use no compression, level 4
      
      To fix that, save the previous compression level and print the message
      in that case too, and use a separate message for 'no' compression.
      
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: David Sterba <dsterba@suse.com>
      27942c99
  12. 27 Jul, 2020 6 commits
    • J
      btrfs: open-code remount flag setting in btrfs_remount · 88c4703f
      Johannes Thumshirn 提交于
      When we're (re)mounting a btrfs filesystem we set the
      BTRFS_FS_STATE_REMOUNTING state in fs_info to serialize against async
      reclaim or defrags.
      
      This flag is set in btrfs_remount_prepare() called by btrfs_remount().
      As btrfs_remount_prepare() does nothing but set this flag and
      doesn't have a second caller, we can just open-code the flag setting in
      btrfs_remount().
      
      Do the same for clearing the flag by moving it out of
      btrfs_remount_cleanup() into btrfs_remount() to be symmetrical.
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      88c4703f
    • J
      btrfs: document special case error codes for fs errors · 59131393
      Josef Bacik authored
      We've had some discussions about what to do in certain scenarios for
      error codes, specifically EUCLEAN and EROFS.  Document these near the
      error handling code so it's clear what their intentions are.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      59131393
    • A
      btrfs: don't traverse into the seed devices in show_devname · 4faf55b0
      Anand Jain authored
      ->show_devname currently shows the lowest devid in the list. As the seed
      devices have the lowest devid in the sprouted filesystem, userland
      tools such as findmnt end up seeing the seed device instead of the device
      from the read-writable sprouted filesystem, as shown below.
      
       mount /dev/sda /btrfs
       mount: /btrfs: WARNING: device write-protected, mounted read-only.
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
       btrfs dev add -f /dev/sdb /btrfs
      
       umount /btrfs
       mount /dev/sdb /btrfs
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
      All sprouts from a single seed will show the same seed device and the
      same fsid. That's confusing.
      This is causing problems in our prototype as there isn't any reference
      to the sprout file-system(s) which are being used for actual read and
      write.
      
      This was added in the patch which implemented show_devname in btrfs,
      commit 9c5085c1 ("Btrfs: implement ->show_devname").
      I looked for any particular reason that we need to show the seed
      device, but there isn't any.
      
      So instead, do not traverse through the seed devices, just show the
      lowest devid in the sprouted fsid.
      
      After the patch:
      
       mount /dev/sda /btrfs
       mount: /btrfs: WARNING: device write-protected, mounted read-only.
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sda /btrfs 899f7027-3e46-4626-93e7-7d4c9ad19111
      
       btrfs dev add -f /dev/sdb /btrfs
       mount -o rw,remount /dev/sdb /btrfs
      
       findmnt --output SOURCE,TARGET,UUID /btrfs
       SOURCE   TARGET UUID
       /dev/sdb /btrfs 595ca0e6-b82e-46b5-b9e2-c72a6928be48
      
       mount /dev/sda /btrfs1
       mount: /btrfs1: WARNING: device write-protected, mounted read-only.
      
       btrfs dev add -f /dev/sdc /btrfs1
      
       findmnt --output SOURCE,TARGET,UUID /btrfs1
       SOURCE   TARGET  UUID
       /dev/sdc /btrfs1 ca1dbb7a-8446-4f95-853c-a20f3f82bdbb
      
       cat /proc/self/mounts | grep btrfs
       /dev/sdb /btrfs btrfs rw,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0
       /dev/sdc /btrfs1 btrfs ro,relatime,noacl,space_cache,subvolid=5,subvol=/ 0 0
      Reported-by: Martin K. Petersen <martin.petersen@oracle.com>
      CC: stable@vger.kernel.org # 4.19+
      Tested-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      4faf55b0
    • D
      btrfs: remove deprecated mount option subvolrootid · b90a4ab6
      David Sterba committed
      The option subvolrootid used to be a workaround for mounting subvolumes
      and has been ineffective since 5e2a4b25 ("btrfs: deprecate subvolrootid
      mount option"). We have subvol= that works and we don't need to keep
      the cruft, so let's remove it.
      Signed-off-by: David Sterba <dsterba@suse.com>
      b90a4ab6
    • D
      btrfs: remove deprecated mount option alloc_start · d801e7a3
      David Sterba committed
      The mount option alloc_start has had no effect since 0d0c71b3 ("btrfs:
      obsolete and remove mount option alloc_start"), which has details on
      why it was deprecated. We can remove it.
      Signed-off-by: David Sterba <dsterba@suse.com>
      d801e7a3
    • D
      btrfs: start deprecation of mount option inode_cache · b547a88e
      David Sterba committed
      Estimated time of removal of the functionality is 5.11; the option will
      still be parsed but will have no effect.
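
      "Parsed but without effect" can be pictured with a small option
      parser. This is a hedged Python sketch, not the btrfs mount code: the
      parse_mount_options() function, the KNOWN_OPTIONS set and the warning
      text are hypothetical and chosen only to illustrate the idea.

```python
from typing import List, Tuple

# Hypothetical option tables, for illustration only.
DEPRECATED_NO_EFFECT = {"inode_cache"}
KNOWN_OPTIONS = {"noatime", "compress", "ro", "rw"}


def parse_mount_options(optstr: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated option string into active options and warnings.

    Deprecated options are accepted so that existing setups keep mounting,
    but they are not added to the active list; they only produce a warning.
    """
    active: List[str] = []
    warnings: List[str] = []
    for opt in filter(None, optstr.split(",")):
        if opt in DEPRECATED_NO_EFFECT:
            warnings.append(f"mount option '{opt}' is deprecated and has no effect")
        elif opt in KNOWN_OPTIONS:
            active.append(opt)
        else:
            raise ValueError(f"unrecognized mount option '{opt}'")
    return active, warnings


active, warnings = parse_mount_options("noatime,inode_cache,compress")
```

      The point of the design is that the deprecated option does not become
      a mount failure during the deprecation period; only fully removed
      options would fall through to the error branch.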
      
      Reasons for deprecation and removal:
      
      - very poor naming choice of the mount option: it's supposed to cache
        and reuse the inode _numbers_, but it sounds like some generic cache
        for inodes
      
      - the only known usecase where this option would make sense is on a
        32bit architecture where inode numbers in one subvolume would be
        exhausted due to 32bit inode::i_ino
      
      - the cache is stored on disk, consumes space, needs to be loaded and
        written back
      
      - new inode number allocation is slower due to lookups into the cache
        (compared to a simple increment which is the default)
      
      - uses the free-space-cache code that is going to be deprecated as well
        in the future
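
      The trade-off between the default allocation and the cached one can
      be illustrated with a toy model. This is a sketch in Python under
      stated assumptions, not the btrfs implementation: the class names, the
      churn() helper and the tiny upper limit of 259 are invented for the
      example. The starting value 256 mirrors the first free objectid used
      by btrfs; a real 32bit limit would be far larger.

```python
import heapq


class IncrementAllocator:
    """Default-style allocation: hand out the next number, never reuse."""

    def __init__(self, first: int, limit: int):
        self.next_ino = first
        self.limit = limit

    def alloc(self) -> int:
        if self.next_ino > self.limit:
            raise OverflowError("inode numbers exhausted")
        ino = self.next_ino
        self.next_ino += 1
        return ino

    def free(self, ino: int) -> None:
        # Freed numbers are simply forgotten.
        pass


class ReusingAllocator(IncrementAllocator):
    """inode_cache-style allocation: remember freed numbers and reuse them."""

    def __init__(self, first: int, limit: int):
        super().__init__(first, limit)
        self.free_inos = []  # min-heap of freed numbers (the "cache")

    def alloc(self) -> int:
        # Extra lookup on every allocation; this is the slower path.
        if self.free_inos:
            return heapq.heappop(self.free_inos)
        return super().alloc()

    def free(self, ino: int) -> None:
        heapq.heappush(self.free_inos, ino)


def churn(allocator, rounds: int) -> int:
    """Create and delete one file per round; return the last number used."""
    ino = -1
    for _ in range(rounds):
        ino = allocator.alloc()
        allocator.free(ino)
    return ino


# Toy limit of four usable numbers (256..259) instead of a real 32bit range.
last_reused = churn(ReusingAllocator(256, 259), rounds=10)

try:
    churn(IncrementAllocator(256, 259), rounds=10)
    exhausted = False
except OverflowError:
    exhausted = True
```

      With reuse the same number is handed out again and again, so the toy
      range is never exhausted; with a plain increment the range runs out
      after four allocations. That is the 32bit use case from the list,
      bought at the cost of the extra lookup and the stored cache.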
      
      Known problems:
      
      - since 2011, returning EEXIST when there's not enough space in a page
        to store all checksums, see commit 4b9465cb ("Btrfs: add mount -o
        inode_cache")
      
      Remaining issues:
      
      - if the option was enabled, new inodes were created, and the option
        was then disabled again, the cache is still stored on the devices
        and there's currently no way to remove it
      Signed-off-by: David Sterba <dsterba@suse.com>
      b547a88e