1. 21 6月, 2021 2 次提交
  2. 04 6月, 2021 1 次提交
  3. 21 4月, 2021 2 次提交
  4. 19 4月, 2021 2 次提交
  5. 18 3月, 2021 2 次提交
    • F
      btrfs: fix subvolume/snapshot deletion not triggered on mount · 8d488a8c
      Filipe Manana 提交于
      During the mount procedure we are calling btrfs_orphan_cleanup() against
      the root tree, which will find all orphans items in this tree. When an
      orphan item corresponds to a deleted subvolume/snapshot (instead of an
      inode space cache), it must not delete the orphan item, because that will
      cause btrfs_find_orphan_roots() to not find the orphan item and therefore
      not add the corresponding subvolume root to the list of dead roots, which
      results in the subvolume's tree never being deleted by the cleanup thread.
      
      The same applies to the remount from RO to RW path.
      
      Fix this by making btrfs_find_orphan_roots() run before calling
      btrfs_orphan_cleanup() against the root tree.
      
      A test case for fstests will follow soon.
      Reported-by: NRobbie Ko <robbieko@synology.com>
      Link: https://lore.kernel.org/linux-btrfs/b19f4310-35e0-606e-1eea-2dd84d28c5da@synology.com/
      Fixes: 638331fa ("btrfs: fix transaction leak and crash after cleaning up orphans on RO mount")
      CC: stable@vger.kernel.org # 5.11+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8d488a8c
    • J
      btrfs: initialize device::fs_info always · 820a49da
      Josef Bacik 提交于
      Neal reported a panic trying to use -o rescue=all
      
        BUG: kernel NULL pointer dereference, address: 0000000000000030
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP NOPTI
        CPU: 0 PID: 696 Comm: mount Tainted: G        W         5.12.0-rc2+ #296
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
        RIP: 0010:btrfs_device_init_dev_stats+0x1d/0x200
        RSP: 0018:ffffafaec1483bb8 EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff9a5715bcb298 RCX: 0000000000000070
        RDX: ffff9a5703248000 RSI: ffff9a57052ea150 RDI: ffff9a5715bca400
        RBP: ffff9a57052ea150 R08: 0000000000000070 R09: ffff9a57052ea150
        R10: 000130faf0741c10 R11: 0000000000000000 R12: ffff9a5703700000
        R13: 0000000000000000 R14: ffff9a5715bcb278 R15: ffff9a57052ea150
        FS:  00007f600d122c40(0000) GS:ffff9a577bc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000030 CR3: 0000000112a46005 CR4: 0000000000370ef0
        Call Trace:
         ? btrfs_init_dev_stats+0x1f/0xf0
         ? kmem_cache_alloc+0xef/0x1f0
         btrfs_init_dev_stats+0x5f/0xf0
         open_ctree+0x10cb/0x1720
         btrfs_mount_root.cold+0x12/0xea
         legacy_get_tree+0x27/0x40
         vfs_get_tree+0x25/0xb0
         vfs_kern_mount.part.0+0x71/0xb0
         btrfs_mount+0x10d/0x380
         legacy_get_tree+0x27/0x40
         vfs_get_tree+0x25/0xb0
         path_mount+0x433/0xa00
         __x64_sys_mount+0xe3/0x120
         do_syscall_64+0x33/0x40
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      This happens because when we call btrfs_init_dev_stats we do
      device->fs_info->dev_root.  However device->fs_info isn't initialized
      because we were only calling btrfs_init_devices_late() if we properly
      read the device root.  However we don't actually need the device root to
      init the devices, this function simply assigns the devices their
      ->fs_info pointer properly, so this needs to be done unconditionally
      always so that we can properly dereference device->fs_info in rescue
      cases.
      Reported-by: NNeal Gompa <ngompa13@gmail.com>
      CC: stable@vger.kernel.org # 5.11+
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      820a49da
  6. 12 2月, 2021 1 次提交
    • S
      btrfs: initialize fs_info::csum_size earlier in open_ctree · 83c68bbc
      Su Yue 提交于
      User reported that btrfs-progs misc-tests/028-superblock-recover fails:
      
            [TEST/misc]   028-superblock-recover
        unexpected success: mounted fs with corrupted superblock
        test failed for case 028-superblock-recover
      
      The test case expects that a broken image with bad superblock will be
      rejected to be mounted. However, the test image just passed csum check
      of superblock and was successfully mounted.
      
      Commit 55fc29be ("btrfs: use cached value of fs_info::csum_size
      everywhere") replaces all calls to btrfs_super_csum_size by
      fs_info::csum_size. The calls include the place where fs_info->csum_size
      is not initialized. So btrfs_check_super_csum() passes because memcmp()
      with len 0 always returns 0.
      
      Fix it by caching csum size in btrfs_fs_info::csum_size once we know the
      csum type in superblock is valid in open_ctree().
      
      Link: https://github.com/kdave/btrfs-progs/issues/250
      Fixes: 55fc29be ("btrfs: use cached value of fs_info::csum_size everywhere")
      Signed-off-by: NSu Yue <l@damenly.su>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      83c68bbc
  7. 09 2月, 2021 17 次提交
  8. 08 1月, 2021 1 次提交
  9. 18 12月, 2020 3 次提交
    • F
      btrfs: add assertion for empty list of transactions at late stage of umount · 0a31daa4
      Filipe Manana 提交于
      Add an assertion to close_ctree(), after destroying all the work queues,
      to verify we do not have any transaction still open or committing at that
      at that point. If we have any, it means something is seriously wrong and
      that can cause memory leaks and use-after-free problems. This is motivated
      by the previous patches that fixed bugs where we ended up leaking an open
      transaction after unmounting the filesystem.
      Tested-by: NFabian Vogt <fvogt@suse.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0a31daa4
    • F
      btrfs: fix race between RO remount and the cleaner task · a0a1db70
      Filipe Manana 提交于
      When we are remounting a filesystem in RO mode we can race with the cleaner
      task and result in leaking a transaction if the filesystem is unmounted
      shortly after, before the transaction kthread had a chance to commit that
      transaction. That also results in a crash during unmount, due to a
      use-after-free, if hardware acceleration is not available for crc32c.
      
      The following sequence of steps explains how the race happens.
      
      1) The filesystem is mounted in RW mode and the cleaner task is running.
         This means that currently BTRFS_FS_CLEANER_RUNNING is set at
         fs_info->flags;
      
      2) The cleaner task is currently running delayed iputs for example;
      
      3) A filesystem RO remount operation starts;
      
      4) The RO remount task calls btrfs_commit_super(), which commits any
         currently open transaction, and it finishes;
      
      5) At this point the cleaner task is still running and it creates a new
         transaction by doing one of the following things:
      
         * When running the delayed iput() for an inode with a 0 link count,
           in which case at btrfs_evict_inode() we start a transaction through
           the call to evict_refill_and_join(), use it and then release its
           handle through btrfs_end_transaction();
      
         * When deleting a dead root through btrfs_clean_one_deleted_snapshot(),
           a transaction is started at btrfs_drop_snapshot() and then its handle
           is released through a call to btrfs_end_transaction_throttle();
      
         * When the remount task was still running, and before the remount task
           called btrfs_delete_unused_bgs(), the cleaner task also called
           btrfs_delete_unused_bgs() and it picked and removed one block group
           from the list of unused block groups. Before the cleaner task started
           a transaction, through btrfs_start_trans_remove_block_group() at
           btrfs_delete_unused_bgs(), the remount task had already called
           btrfs_commit_super();
      
      6) So at this point the filesystem is in RO mode and we have an open
         transaction that was started by the cleaner task;
      
      7) Shortly after a filesystem unmount operation starts. At close_ctree()
         we stop the transaction kthread before it had a chance to commit the
         transaction, since less than 30 seconds (the default commit interval)
         have elapsed since the last transaction was committed;
      
      8) We end up calling iput() against the btree inode at close_ctree() while
         there is an open transaction, and since that transaction was used to
         update btrees by the cleaner, we have dirty pages in the btree inode
         due to COW operations on metadata extents, and therefore writeback is
         triggered for the btree inode.
      
         So btree_write_cache_pages() is invoked to flush those dirty pages
         during the final iput() on the btree inode. This results in creating a
         bio and submitting it, which makes us end up at
         btrfs_submit_metadata_bio();
      
      9) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
         that calls btrfs_wq_submit_bio(), because check_async_write() returned
         a value of 1. This value of 1 is because we did not have hardware
         acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
         set in fs_info->flags;
      
      10) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
          workqueue at fs_info->workers, which was already freed before by the
          call to btrfs_stop_all_workers() at close_ctree(). This results in an
          invalid memory access due to a use-after-free, leading to a crash.
      
      When this happens, before the crash there are several warnings triggered,
      since we have reserved metadata space in a block group, the delayed refs
      reservation, etc:
      
        ------------[ cut here ]------------
        WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
        Code: f0 01 00 00 48 39 c2 75 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
        RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
        RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 48 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c6 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
        Code: 48 83 bb b0 03 00 00 00 (...)
        RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
        RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
        RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c7 ]---
        ------------[ cut here ]------------
        WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
        Code: ad de 49 be 22 01 00 (...)
        RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
        RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
        RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
        R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
        FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         close_ctree+0x2ba/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f15ee221ee7
        Code: ff 0b 00 f7 d8 64 89 (...)
        RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
        RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
        R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        ---[ end trace dd74718fef1ed5c8 ]---
        BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
        BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
        BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
        BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
      
      And the crash, which only happens when we do not have crc32c hardware
      acceleration, produces the following trace immediately after those
      warnings:
      
        stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
        Code: 54 55 53 48 89 f3 (...)
        RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
        RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
        RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
        R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
        FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
         btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
         submit_one_bio+0x61/0x70 [btrfs]
         btree_write_cache_pages+0x414/0x450 [btrfs]
         ? kobject_put+0x9a/0x1d0
         ? trace_hardirqs_on+0x1b/0xf0
         ? _raw_spin_unlock_irqrestore+0x3c/0x60
         ? free_debug_processing+0x1e1/0x2b0
         do_writepages+0x43/0xe0
         ? lock_acquired+0x199/0x490
         __writeback_single_inode+0x59/0x650
         writeback_single_inode+0xaf/0x120
         write_inode_now+0x94/0xd0
         iput+0x187/0x2b0
         close_ctree+0x2c6/0x2fa [btrfs]
         generic_shutdown_super+0x6c/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0x100/0x160
         task_work_run+0x68/0xb0
         exit_to_user_mode_prepare+0x1bb/0x1c0
         syscall_exit_to_user_mode+0x4b/0x260
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f3cfebabee7
        Code: ff 0b 00 f7 d8 64 89 01 (...)
        RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
        RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
        RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
        R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
        R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
        Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
        ---[ end trace dd74718fef1ed5cc ]---
      
      Finally when we remove the btrfs module (rmmod btrfs), there are several
      warnings about objects that were allocated from our slabs but were never
      freed, consequence of the transaction that was never committed and got
      leaked:
      
        =============================================================================
        BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x0000000050cbdd61 @offset=12104
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              sync_filesystem+0x74/0x90
              generic_shutdown_super+0x22/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x0000000086e9b0ff @offset=12776
        INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
              commit_cowonly_roots+0x248/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
      
        INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? lock_release+0x20e/0x4c0
         kmem_cache_destroy+0x55/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000001a340018 @offset=4408
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_commit_transaction+0x60/0xc40 [btrfs]
              create_subvol+0x56a/0x990 [btrfs]
              btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
              __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
              btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
              btrfs_ioctl+0x1a92/0x36f0 [btrfs]
              __x64_sys_ioctl+0x83/0xb0
              do_syscall_64+0x33/0x80
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        INFO: Object 0x000000002b46292a @offset=13648
        INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
        INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        =============================================================================
        BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
        INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
        CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         slab_err+0xb7/0xdc
         ? lock_acquired+0x199/0x490
         __kmem_cache_shutdown+0x1ac/0x3c0
         ? __mutex_unlock_slowpath+0x45/0x2a0
         kmem_cache_destroy+0x55/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 f5 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        INFO: Object 0x000000004cf95ea8 @offset=6264
        INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
        INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
        kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
        CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
        Call Trace:
         dump_stack+0x8d/0xb5
         kmem_cache_destroy+0x119/0x120
         exit_btrfs_fs+0xa/0x59 [btrfs]
         __x64_sys_delete_module+0x194/0x260
         ? fpregs_assert_state_consistent+0x1e/0x40
         ? exit_to_user_mode_prepare+0x55/0x1c0
         ? trace_hardirqs_on+0x1b/0xf0
         do_syscall_64+0x33/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f693e305897
        Code: 73 01 c3 48 8b 0d f9 (...)
        RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
        RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
        RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
        RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
        R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
        R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
        BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
      
      So fix this by making the remount path to wait for the cleaner task before
      calling btrfs_commit_super(). The remount path now waits for the bit
      BTRFS_FS_CLEANER_RUNNING to be cleared from fs_info->flags before calling
      btrfs_commit_super() and this ensures the cleaner can not start a
      transaction after that, because it sleeps when the filesystem is in RO
      mode and we have already flagged the filesystem as RO before waiting for
      BTRFS_FS_CLEANER_RUNNING to be cleared.
      
      This also introduces a new flag BTRFS_FS_STATE_RO to be used for
      fs_info->fs_state when the filesystem is in RO mode. This is because we
      were doing the RO check using the flags of the superblock and setting the
      RO mode simply by ORing into the superblock's flags - those operations are
      not atomic and could result in the cleaner not seeing the update from the
      remount task after it clears BTRFS_FS_CLEANER_RUNNING.
      Tested-by: NFabian Vogt <fvogt@suse.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a0a1db70
    • F
      btrfs: fix transaction leak and crash after cleaning up orphans on RO mount · 638331fa
      Filipe Manana 提交于
      When we delete a root (subvolume or snapshot), at the very end of the
      operation, we attempt to remove the root's orphan item from the root tree,
      at btrfs_drop_snapshot(), by calling btrfs_del_orphan_item(). We ignore any
      error from btrfs_del_orphan_item() since it is not a serious problem and
      the next time the filesystem is mounted we remove such stray orphan items
      at btrfs_find_orphan_roots().
      
      However if the filesystem is mounted RO and we have stray orphan items for
      any previously deleted root, we can end up leaking a transaction and other
      data structures when unmounting the filesystem, as well as crashing if we
      do not have hardware acceleration for crc32c available.
      
      The steps that lead to the transaction leak are the following:
      
      1) The filesystem is mounted in RW mode;
      
      2) A subvolume is deleted;
      
      3) When the cleaner kthread runs btrfs_drop_snapshot() to delete the root,
         it gets a failure at btrfs_del_orphan_item(), which is ignored, due to
         an ENOMEM when allocating a path for example. So the orphan item for
         the root remains in the root tree;
      
      4) The filesystem is unmounted;
      
      5) The filesystem is mounted RO (-o ro). During the mount path we call
         btrfs_find_orphan_roots(), which iterates the root tree searching for
         orphan items. It finds the orphan item for our deleted root, and since
         it can not find the root, it starts a transaction to delete the orphan
         item (by calling btrfs_del_orphan_item());
      
      6) The RO mount completes;
      
      7) Before the transaction kthread commits the transaction created for
         deleting the orphan item (i.e. less than 30 seconds elapsed since the
         mount, the default commit interval), a filesystem unmount operation is
         started;
      
      8) At close_ctree(), we stop the transaction kthread, but we still have a
         transaction open with at least one dirty extent buffer, a leaf for the
         tree root which was COWed when deleting the orphan item;
      
      9) We then proceed to destroy the work queues, free the roots and block
         groups, etc. After that we drop the last reference on the btree inode by
         calling iput() on it. Since there are dirty pages for the btree inode,
         corresponding to the COWed extent buffer, btree_write_cache_pages() is
         invoked to flush those dirty pages. This results in creating a bio and
         submitting it, which makes us end up at btrfs_submit_metadata_bio();
      
      10) At btrfs_submit_metadata_bio() we end up at the if-then-else branch
          that calls btrfs_wq_submit_bio(), because check_async_write() returned
          a value of 1. This value of 1 is because we did not have hardware
          acceleration available for crc32c, so BTRFS_FS_CSUM_IMPL_FAST was not
          set in fs_info->flags;
      
      11) Then at btrfs_wq_submit_bio() we call btrfs_queue_work() against the
          workqueue at fs_info->workers, which was already freed before by the
          call to btrfs_stop_all_workers() at close_ctree(). This results in an
          invalid memory access due to a use-after-free, leading to a crash.
      
      When this happens, before the crash there are several warnings triggered,
      since we have reserved metadata space in a block group, the delayed refs
      reservation, etc:
      
       ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 1729896 at fs/btrfs/block-group.c:125 btrfs_put_block_group+0x63/0xa0 [btrfs]
       Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
       CPU: 4 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_put_block_group+0x63/0xa0 [btrfs]
       Code: f0 01 00 00 48 39 c2 75 (...)
       RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
       RAX: 0000000000000001 RBX: ffff947ed73e4000 RCX: ffff947ebc8b29c8
       RDX: 0000000000000001 RSI: ffffffffc0b150a0 RDI: ffff947ebc8b2800
       RBP: ffff947ebc8b2800 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
       R13: ffff947ed73e4160 R14: ffff947ebc8b2988 R15: dead000000000100
       FS:  00007f15edfea840(0000) GS:ffff9481ad600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f37e2893320 CR3: 0000000138f68001 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        btrfs_free_block_groups+0x17f/0x2f0 [btrfs]
        close_ctree+0x2ba/0x2fa [btrfs]
        generic_shutdown_super+0x6c/0x100
        kill_anon_super+0x14/0x30
        btrfs_kill_super+0x12/0x20 [btrfs]
        deactivate_locked_super+0x31/0x70
        cleanup_mnt+0x100/0x160
        task_work_run+0x68/0xb0
        exit_to_user_mode_prepare+0x1bb/0x1c0
        syscall_exit_to_user_mode+0x4b/0x260
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f15ee221ee7
       Code: ff 0b 00 f7 d8 64 89 01 48 (...)
       RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
       RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
       RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
       R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
       R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace dd74718fef1ed5c6 ]---
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-rsv.c:459 btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
       Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
       CPU: 2 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_release_global_block_rsv+0x70/0xc0 [btrfs]
       Code: 48 83 bb b0 03 00 00 00 (...)
       RSP: 0018:ffffb270826bbdd8 EFLAGS: 00010206
       RAX: 000000000033c000 RBX: ffff947ed73e4000 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: ffffffffc0b0d8c1 RDI: 00000000ffffffff
       RBP: ffff947ebc8b7000 R08: 0000000000000001 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ed73e4110
       R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
       FS:  00007f15edfea840(0000) GS:ffff9481aca00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000561a79f76e20 CR3: 0000000138f68006 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        btrfs_free_block_groups+0x24c/0x2f0 [btrfs]
        close_ctree+0x2ba/0x2fa [btrfs]
        generic_shutdown_super+0x6c/0x100
        kill_anon_super+0x14/0x30
        btrfs_kill_super+0x12/0x20 [btrfs]
        deactivate_locked_super+0x31/0x70
        cleanup_mnt+0x100/0x160
        task_work_run+0x68/0xb0
        exit_to_user_mode_prepare+0x1bb/0x1c0
        syscall_exit_to_user_mode+0x4b/0x260
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f15ee221ee7
       Code: ff 0b 00 f7 d8 64 89 01 (...)
       RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
       RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
       RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
       R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
       R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace dd74718fef1ed5c7 ]---
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 1729896 at fs/btrfs/block-group.c:3377 btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
       Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
       CPU: 5 PID: 1729896 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_free_block_groups+0x25d/0x2f0 [btrfs]
       Code: ad de 49 be 22 01 00 (...)
       RSP: 0018:ffffb270826bbde8 EFLAGS: 00010206
       RAX: ffff947ebeae1d08 RBX: ffff947ed73e4000 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: ffff947e9d823ae8 RDI: 0000000000000246
       RBP: ffff947ebeae1d08 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000001 R12: ffff947ebeae1c00
       R13: ffff947ed73e5278 R14: dead000000000122 R15: dead000000000100
       FS:  00007f15edfea840(0000) GS:ffff9481ad200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f1475d98ea8 CR3: 0000000138f68005 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        close_ctree+0x2ba/0x2fa [btrfs]
        generic_shutdown_super+0x6c/0x100
        kill_anon_super+0x14/0x30
        btrfs_kill_super+0x12/0x20 [btrfs]
        deactivate_locked_super+0x31/0x70
        cleanup_mnt+0x100/0x160
        task_work_run+0x68/0xb0
        exit_to_user_mode_prepare+0x1bb/0x1c0
        syscall_exit_to_user_mode+0x4b/0x260
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f15ee221ee7
       Code: ff 0b 00 f7 d8 64 89 (...)
       RSP: 002b:00007ffe9470f0f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       RAX: 0000000000000000 RBX: 00007f15ee347264 RCX: 00007f15ee221ee7
       RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000056169701d000
       RBP: 0000561697018a30 R08: 0000000000000000 R09: 00007f15ee2e2be0
       R10: 000056169701efe0 R11: 0000000000000246 R12: 0000000000000000
       R13: 000056169701d000 R14: 0000561697018b40 R15: 0000561697018c60
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last  enabled at (0): [<ffffffff8bcae560>] copy_process+0x8a0/0x1d70
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace dd74718fef1ed5c8 ]---
       BTRFS info (device sdc): space_info 4 has 268238848 free, is not full
       BTRFS info (device sdc): space_info total=268435456, used=114688, pinned=0, reserved=16384, may_use=0, readonly=65536
       BTRFS info (device sdc): global_block_rsv: size 0 reserved 0
       BTRFS info (device sdc): trans_block_rsv: size 0 reserved 0
       BTRFS info (device sdc): chunk_block_rsv: size 0 reserved 0
       BTRFS info (device sdc): delayed_block_rsv: size 0 reserved 0
       BTRFS info (device sdc): delayed_refs_rsv: size 524288 reserved 0
      
      And the crash, which only happens when we do not have crc32c hardware
      acceleration, produces the following trace immediately after those
      warnings:
      
       stack segment: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
       CPU: 2 PID: 1749129 Comm: umount Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_queue_work+0x36/0x190 [btrfs]
       Code: 54 55 53 48 89 f3 (...)
       RSP: 0018:ffffb27082443ae8 EFLAGS: 00010282
       RAX: 0000000000000004 RBX: ffff94810ee9ad90 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: ffff94810ee9ad90 RDI: ffff947ed8ee75a0
       RBP: a56b6b6b6b6b6b6b R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000007 R11: 0000000000000001 R12: ffff947fa9b435a8
       R13: ffff94810ee9ad90 R14: 0000000000000000 R15: ffff947e93dc0000
       FS:  00007f3cfe974840(0000) GS:ffff9481ac600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f1b42995a70 CR3: 0000000127638003 CR4: 00000000003706e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        btrfs_wq_submit_bio+0xb3/0xd0 [btrfs]
        btrfs_submit_metadata_bio+0x44/0xc0 [btrfs]
        submit_one_bio+0x61/0x70 [btrfs]
        btree_write_cache_pages+0x414/0x450 [btrfs]
        ? kobject_put+0x9a/0x1d0
        ? trace_hardirqs_on+0x1b/0xf0
        ? _raw_spin_unlock_irqrestore+0x3c/0x60
        ? free_debug_processing+0x1e1/0x2b0
        do_writepages+0x43/0xe0
        ? lock_acquired+0x199/0x490
        __writeback_single_inode+0x59/0x650
        writeback_single_inode+0xaf/0x120
        write_inode_now+0x94/0xd0
        iput+0x187/0x2b0
        close_ctree+0x2c6/0x2fa [btrfs]
        generic_shutdown_super+0x6c/0x100
        kill_anon_super+0x14/0x30
        btrfs_kill_super+0x12/0x20 [btrfs]
        deactivate_locked_super+0x31/0x70
        cleanup_mnt+0x100/0x160
        task_work_run+0x68/0xb0
        exit_to_user_mode_prepare+0x1bb/0x1c0
        syscall_exit_to_user_mode+0x4b/0x260
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f3cfebabee7
       Code: ff 0b 00 f7 d8 64 89 01 (...)
       RSP: 002b:00007ffc9c9a05f8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
       RAX: 0000000000000000 RBX: 00007f3cfecd1264 RCX: 00007f3cfebabee7
       RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000562b6b478000
       RBP: 0000562b6b473a30 R08: 0000000000000000 R09: 00007f3cfec6cbe0
       R10: 0000562b6b479fe0 R11: 0000000000000246 R12: 0000000000000000
       R13: 0000562b6b478000 R14: 0000562b6b473b40 R15: 0000562b6b473c60
       Modules linked in: btrfs dm_snapshot dm_thin_pool (...)
       ---[ end trace dd74718fef1ed5cc ]---
      
      Finally when we remove the btrfs module (rmmod btrfs), there are several
      warnings about objects that were allocated from our slabs but were never
      freed, consequence of the transaction that was never committed and got
      leaked:
       =============================================================================
       BUG btrfs_delayed_ref_head (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_ref_head on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0x0000000094c2ae56 objects=24 used=2 fp=0x000000002bfa2521 flags=0x17fffc000010200
       CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        slab_err+0xb7/0xdc
        ? lock_acquired+0x199/0x490
        __kmem_cache_shutdown+0x1ac/0x3c0
        ? lock_release+0x20e/0x4c0
        kmem_cache_destroy+0x55/0x120
        btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 f5 (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       INFO: Object 0x0000000050cbdd61 @offset=12104
       INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1894 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
       INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=4292 cpu=2 pid=1729526
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              sync_filesystem+0x74/0x90
              generic_shutdown_super+0x22/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
       INFO: Object 0x0000000086e9b0ff @offset=12776
       INFO: Allocated in btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs] age=1900 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0xbb/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
       INFO: Freed in __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs] age=3141 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x1117/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_write_dirty_block_groups+0x17d/0x3d0 [btrfs]
              commit_cowonly_roots+0x248/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
       kmem_cache_destroy btrfs_delayed_ref_head: Slab cache still has objects
       CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        kmem_cache_destroy+0x119/0x120
        btrfs_delayed_ref_exit+0x11/0x35 [btrfs]
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 f5 0b (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       =============================================================================
       BUG btrfs_delayed_tree_ref (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_tree_ref on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0x0000000011f78dc0 objects=37 used=2 fp=0x0000000032d55d91 flags=0x17fffc000010200
       CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        slab_err+0xb7/0xdc
        ? lock_acquired+0x199/0x490
        __kmem_cache_shutdown+0x1ac/0x3c0
        ? lock_release+0x20e/0x4c0
        kmem_cache_destroy+0x55/0x120
        btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 f5 (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       INFO: Object 0x000000001a340018 @offset=4408
       INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1917 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_free_tree_block+0x128/0x360 [btrfs]
              __btrfs_cow_block+0x489/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
       INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=4167 cpu=4 pid=1729795
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              btrfs_commit_transaction+0x60/0xc40 [btrfs]
              create_subvol+0x56a/0x990 [btrfs]
              btrfs_mksubvol+0x3fb/0x4a0 [btrfs]
              __btrfs_ioctl_snap_create+0x119/0x1a0 [btrfs]
              btrfs_ioctl_snap_create+0x58/0x80 [btrfs]
              btrfs_ioctl+0x1a92/0x36f0 [btrfs]
              __x64_sys_ioctl+0x83/0xb0
              do_syscall_64+0x33/0x80
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
       INFO: Object 0x000000002b46292a @offset=13648
       INFO: Allocated in btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs] age=1923 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_add_delayed_tree_ref+0x9e/0x480 [btrfs]
              btrfs_alloc_tree_block+0x2bf/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
       INFO: Freed in __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs] age=3164 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0x63d/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
       kmem_cache_destroy btrfs_delayed_tree_ref: Slab cache still has objects
       CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        kmem_cache_destroy+0x119/0x120
        btrfs_delayed_ref_exit+0x1d/0x35 [btrfs]
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 f5 (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       =============================================================================
       BUG btrfs_delayed_extent_op (Tainted: G    B   W        ): Objects remaining in btrfs_delayed_extent_op on __kmem_cache_shutdown()
       -----------------------------------------------------------------------------
      
       INFO: Slab 0x00000000f145ce2f objects=22 used=1 fp=0x00000000af0f92cf flags=0x17fffc000010200
       CPU: 5 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        slab_err+0xb7/0xdc
        ? lock_acquired+0x199/0x490
        __kmem_cache_shutdown+0x1ac/0x3c0
        ? __mutex_unlock_slowpath+0x45/0x2a0
        kmem_cache_destroy+0x55/0x120
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 f5 (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       INFO: Object 0x000000004cf95ea8 @offset=6264
       INFO: Allocated in btrfs_alloc_tree_block+0x1e0/0x360 [btrfs] age=1931 cpu=6 pid=1729873
              __slab_alloc.isra.0+0x109/0x1c0
              kmem_cache_alloc+0x7bb/0x830
              btrfs_alloc_tree_block+0x1e0/0x360 [btrfs]
              alloc_tree_block_no_bg_flush+0x4f/0x60 [btrfs]
              __btrfs_cow_block+0x12d/0x5f0 [btrfs]
              btrfs_cow_block+0xf7/0x220 [btrfs]
              btrfs_search_slot+0x62a/0xc40 [btrfs]
              btrfs_del_orphan_item+0x65/0xd0 [btrfs]
              btrfs_find_orphan_roots+0x1bf/0x200 [btrfs]
              open_ctree+0x125a/0x18a0 [btrfs]
              btrfs_mount_root.cold+0x13/0xed [btrfs]
              legacy_get_tree+0x30/0x60
              vfs_get_tree+0x28/0xe0
              fc_mount+0xe/0x40
              vfs_kern_mount.part.0+0x71/0x90
              btrfs_mount+0x13b/0x3e0 [btrfs]
       INFO: Freed in __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs] age=3173 cpu=6 pid=1729803
              kmem_cache_free+0x34c/0x3c0
              __btrfs_run_delayed_refs+0xabd/0x1290 [btrfs]
              btrfs_run_delayed_refs+0x81/0x210 [btrfs]
              commit_cowonly_roots+0xfb/0x300 [btrfs]
              btrfs_commit_transaction+0x367/0xc40 [btrfs]
              close_ctree+0x113/0x2fa [btrfs]
              generic_shutdown_super+0x6c/0x100
              kill_anon_super+0x14/0x30
              btrfs_kill_super+0x12/0x20 [btrfs]
              deactivate_locked_super+0x31/0x70
              cleanup_mnt+0x100/0x160
              task_work_run+0x68/0xb0
              exit_to_user_mode_prepare+0x1bb/0x1c0
              syscall_exit_to_user_mode+0x4b/0x260
              entry_SYSCALL_64_after_hwframe+0x44/0xa9
       kmem_cache_destroy btrfs_delayed_extent_op: Slab cache still has objects
       CPU: 3 PID: 1729921 Comm: rmmod Tainted: G    B   W         5.10.0-rc4-btrfs-next-73 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Call Trace:
        dump_stack+0x8d/0xb5
        kmem_cache_destroy+0x119/0x120
        exit_btrfs_fs+0xa/0x59 [btrfs]
        __x64_sys_delete_module+0x194/0x260
        ? fpregs_assert_state_consistent+0x1e/0x40
        ? exit_to_user_mode_prepare+0x55/0x1c0
        ? trace_hardirqs_on+0x1b/0xf0
        do_syscall_64+0x33/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f693e305897
       Code: 73 01 c3 48 8b 0d f9 (...)
       RSP: 002b:00007ffcf73eb508 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
       RAX: ffffffffffffffda RBX: 0000559df504f760 RCX: 00007f693e305897
       RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559df504f7c8
       RBP: 00007ffcf73eb568 R08: 0000000000000000 R09: 0000000000000000
       R10: 00007f693e378ac0 R11: 0000000000000206 R12: 00007ffcf73eb740
       R13: 00007ffcf73ec5a6 R14: 0000559df504f2a0 R15: 0000559df504f760
       BTRFS: state leak: start 30408704 end 30425087 state 1 in tree 1 refs 1
      
      So fix this by calling btrfs_find_orphan_roots() in the mount path only if
      we are mounting the filesystem in RW mode. It's pointless to have it called
      for RO mounts anyway, since despite adding any deleted roots to the list of
      dead roots, we will never have the roots deleted until the filesystem is
      remounted in RW mode, as the cleaner kthread does nothing when we are
      mounted in RO - btrfs_need_cleaner_sleep() always returns true and the
      cleaner spends all time sleeping, never cleaning dead roots.
      
      This is accomplished by moving the call to btrfs_find_orphan_roots() from
      open_ctree() to btrfs_start_pre_rw_mount(), which also guarantees that
      if later the filesystem is remounted RW, we populate the list of dead
      roots and have the cleaner task delete the dead roots.
      Tested-by: NFabian Vogt <fvogt@suse.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      638331fa
  10. 10 12月, 2020 9 次提交
    • Q
      btrfs: rename bio_offset of extent_submit_bio_start_t to dio_file_offset · 1941b64b
      Qu Wenruo 提交于
      The parameter bio_offset of extent_submit_bio_start_t is very confusing.
      If it's really bio_offset (offset to bio), then it should be u32.  But
      in fact, it's only utilized by dio read, and that member is used as file
      offset, which must be u64.
      
      Rename it to dio_file_offset since the only user uses it as file offset,
      and add comment for who is using it.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      1941b64b
    • B
      btrfs: fix lockdep warning when creating free space tree · 8a6a87cd
      Boris Burkov 提交于
      A lock dependency loop exists between the root tree lock, the extent tree
      lock, and the free space tree lock.
      
      The root tree lock depends on the free space tree lock because
      btrfs_create_tree holds the new tree's lock while adding it to the root
      tree.
      
      The extent tree lock depends on the root tree lock because during
      umount, we write out space cache v1, which writes inodes in the root
      tree, which results in holding the root tree lock while doing a lookup
      in the extent tree.
      
      Finally, the free space tree depends on the extent tree because
      populate_free_space_tree holds a locked path in the extent tree and then
      does a lookup in the free space tree to add the new item.
      
      The simplest of the three to break is the one during tree creation: we
      unlock the leaf before inserting the tree node into the root tree, which
      fixes the lockdep warning.
      
        [30.480136] ======================================================
        [30.480830] WARNING: possible circular locking dependency detected
        [30.481457] 5.9.0-rc8+ #76 Not tainted
        [30.481897] ------------------------------------------------------
        [30.482500] mount/520 is trying to acquire lock:
        [30.483064] ffff9babebe03908 (btrfs-free-space-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
        [30.484054]
      	      but task is already holding lock:
        [30.484637] ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
        [30.485581]
      	      which lock already depends on the new lock.
      
        [30.486397]
      	      the existing dependency chain (in reverse order) is:
        [30.487205]
      	      -> #2 (btrfs-extent-01#2){++++}-{3:3}:
        [30.487825]        down_read_nested+0x43/0x150
        [30.488306]        __btrfs_tree_read_lock+0x39/0x180
        [30.488868]        __btrfs_read_lock_root_node+0x3a/0x50
        [30.489477]        btrfs_search_slot+0x464/0x9b0
        [30.490009]        check_committed_ref+0x59/0x1d0
        [30.490603]        btrfs_cross_ref_exist+0x65/0xb0
        [30.491108]        run_delalloc_nocow+0x405/0x930
        [30.491651]        btrfs_run_delalloc_range+0x60/0x6b0
        [30.492203]        writepage_delalloc+0xd4/0x150
        [30.492688]        __extent_writepage+0x18d/0x3a0
        [30.493199]        extent_write_cache_pages+0x2af/0x450
        [30.493743]        extent_writepages+0x34/0x70
        [30.494231]        do_writepages+0x31/0xd0
        [30.494642]        __filemap_fdatawrite_range+0xad/0xe0
        [30.495194]        btrfs_fdatawrite_range+0x1b/0x50
        [30.495677]        __btrfs_write_out_cache+0x40d/0x460
        [30.496227]        btrfs_write_out_cache+0x8b/0x110
        [30.496716]        btrfs_start_dirty_block_groups+0x211/0x4e0
        [30.497317]        btrfs_commit_transaction+0xc0/0xba0
        [30.497861]        sync_filesystem+0x71/0x90
        [30.498303]        btrfs_remount+0x81/0x433
        [30.498767]        reconfigure_super+0x9f/0x210
        [30.499261]        path_mount+0x9d1/0xa30
        [30.499722]        do_mount+0x55/0x70
        [30.500158]        __x64_sys_mount+0xc4/0xe0
        [30.500616]        do_syscall_64+0x33/0x40
        [30.501091]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [30.501629]
      	      -> #1 (btrfs-root-00){++++}-{3:3}:
        [30.502241]        down_read_nested+0x43/0x150
        [30.502727]        __btrfs_tree_read_lock+0x39/0x180
        [30.503291]        __btrfs_read_lock_root_node+0x3a/0x50
        [30.503903]        btrfs_search_slot+0x464/0x9b0
        [30.504405]        btrfs_insert_empty_items+0x60/0xa0
        [30.504973]        btrfs_insert_item+0x60/0xd0
        [30.505412]        btrfs_create_tree+0x1b6/0x210
        [30.505913]        btrfs_create_free_space_tree+0x54/0x110
        [30.506460]        btrfs_mount_rw+0x15d/0x20f
        [30.506937]        btrfs_remount+0x356/0x433
        [30.507369]        reconfigure_super+0x9f/0x210
        [30.507868]        path_mount+0x9d1/0xa30
        [30.508264]        do_mount+0x55/0x70
        [30.508668]        __x64_sys_mount+0xc4/0xe0
        [30.509186]        do_syscall_64+0x33/0x40
        [30.509652]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [30.510271]
      	      -> #0 (btrfs-free-space-00){++++}-{3:3}:
        [30.510972]        __lock_acquire+0x11ad/0x1b60
        [30.511432]        lock_acquire+0xa2/0x360
        [30.511917]        down_read_nested+0x43/0x150
        [30.512383]        __btrfs_tree_read_lock+0x39/0x180
        [30.512947]        __btrfs_read_lock_root_node+0x3a/0x50
        [30.513455]        btrfs_search_slot+0x464/0x9b0
        [30.513947]        search_free_space_info+0x45/0x90
        [30.514465]        __add_to_free_space_tree+0x92/0x39d
        [30.515010]        btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
        [30.515639]        btrfs_mount_rw+0x15d/0x20f
        [30.516142]        btrfs_remount+0x356/0x433
        [30.516538]        reconfigure_super+0x9f/0x210
        [30.517065]        path_mount+0x9d1/0xa30
        [30.517438]        do_mount+0x55/0x70
        [30.517824]        __x64_sys_mount+0xc4/0xe0
        [30.518293]        do_syscall_64+0x33/0x40
        [30.518776]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [30.519335]
      	      other info that might help us debug this:
      
        [30.520210] Chain exists of:
      		btrfs-free-space-00 --> btrfs-root-00 --> btrfs-extent-01#2
      
        [30.521407]  Possible unsafe locking scenario:
      
        [30.522037]        CPU0                    CPU1
        [30.522456]        ----                    ----
        [30.522941]   lock(btrfs-extent-01#2);
        [30.523311]                                lock(btrfs-root-00);
        [30.523952]                                lock(btrfs-extent-01#2);
        [30.524620]   lock(btrfs-free-space-00);
        [30.525068]
      	       *** DEADLOCK ***
      
        [30.525669] 5 locks held by mount/520:
        [30.526116]  #0: ffff9babebc520e0 (&type->s_umount_key#37){+.+.}-{3:3}, at: path_mount+0x7ef/0xa30
        [30.527056]  #1: ffff9babebc52640 (sb_internal#2){.+.+}-{0:0}, at: start_transaction+0x3d5/0x5c0
        [30.527960]  #2: ffff9babeae8f2e8 (&cache->free_space_lock#2){+.+.}-{3:3}, at: btrfs_create_free_space_tree.cold.22+0x101/0x45d
        [30.529118]  #3: ffff9babebe24468 (btrfs-extent-01#2){++++}-{3:3}, at: __btrfs_tree_read_lock+0x39/0x180
        [30.530113]  #4: ffff9babebd52eb8 (btrfs-extent-00){++++}-{3:3}, at: btrfs_try_tree_read_lock+0x16/0x100
        [30.531124]
      	      stack backtrace:
        [30.531528] CPU: 0 PID: 520 Comm: mount Not tainted 5.9.0-rc8+ #76
        [30.532166] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module_el8.1.0+248+298dec18 04/01/2014
        [30.533215] Call Trace:
        [30.533452]  dump_stack+0x8d/0xc0
        [30.533797]  check_noncircular+0x13c/0x150
        [30.534233]  __lock_acquire+0x11ad/0x1b60
        [30.534667]  lock_acquire+0xa2/0x360
        [30.535063]  ? __btrfs_tree_read_lock+0x39/0x180
        [30.535525]  down_read_nested+0x43/0x150
        [30.535939]  ? __btrfs_tree_read_lock+0x39/0x180
        [30.536400]  __btrfs_tree_read_lock+0x39/0x180
        [30.536862]  __btrfs_read_lock_root_node+0x3a/0x50
        [30.537304]  btrfs_search_slot+0x464/0x9b0
        [30.537713]  ? trace_hardirqs_on+0x1c/0xf0
        [30.538148]  search_free_space_info+0x45/0x90
        [30.538572]  __add_to_free_space_tree+0x92/0x39d
        [30.539071]  ? printk+0x48/0x4a
        [30.539367]  btrfs_create_free_space_tree.cold.22+0x1ee/0x45d
        [30.539972]  btrfs_mount_rw+0x15d/0x20f
        [30.540350]  btrfs_remount+0x356/0x433
        [30.540773]  ? shrink_dcache_sb+0xd9/0x100
        [30.541203]  reconfigure_super+0x9f/0x210
        [30.541642]  path_mount+0x9d1/0xa30
        [30.542040]  do_mount+0x55/0x70
        [30.542366]  __x64_sys_mount+0xc4/0xe0
        [30.542822]  do_syscall_64+0x33/0x40
        [30.543197]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
        [30.543691] RIP: 0033:0x7f109f7ab93a
        [30.546042] RSP: 002b:00007ffc47c4f858 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        [30.546770] RAX: ffffffffffffffda RBX: 00007f109f8cf264 RCX: 00007f109f7ab93a
        [30.547485] RDX: 0000557e6fc10770 RSI: 0000557e6fc19cf0 RDI: 0000557e6fc19cd0
        [30.548185] RBP: 0000557e6fc10520 R08: 0000557e6fc18e30 R09: 0000557e6fc18cb0
        [30.548911] R10: 0000000000200020 R11: 0000000000000246 R12: 0000000000000000
        [30.549606] R13: 0000557e6fc19cd0 R14: 0000557e6fc10770 R15: 0000557e6fc10520
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8a6a87cd
    • B
      btrfs: keep sb cache_generation consistent with space_cache · 94846229
      Boris Burkov 提交于
      When mounting, btrfs uses the cache_generation in the super block to
      determine if space cache v1 is in use. However, by mounting with
      nospace_cache or space_cache=v2, it is possible to disable space cache
      v1, which does not result in un-setting cache_generation back to 0.
      
      In order to base some logic, like mount option printing in /proc/mounts,
      on the current state of the space cache rather than just the values of
      the mount option, keep the value of cache_generation consistent with the
      status of space cache v1.
      
      We ensure that cache_generation > 0 iff the file system is using
      space_cache v1. This requires committing a transaction on any mount
      which changes whether we are using v1. (v1->nospace_cache, v1->v2,
      nospace_cache->v1, v2->v1).
      
      Since the mechanism for writing out the cache generation is transaction
      commit, but we want some finer grained control over when we un-set it,
      we can't just rely on the SPACE_CACHE mount option, and introduce an
      fs_info flag that mount can use when it wants to unset the generation.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      94846229
    • B
      btrfs: clear free space tree on ro->rw remount · 8b228324
      Boris Burkov 提交于
      A user might want to revert to v1 or nospace_cache on a root filesystem,
      and much like turning on the free space tree, that can only be done
      remounting from ro->rw. Support clearing the free space tree on such
      mounts by moving it into the shared remount logic.
      
      Since the CLEAR_CACHE option sticks around across remounts, this change
      would result in clearing the tree for ever on every remount, which is
      not desirable. To fix that, add CLEAR_CACHE to the oneshot options we
      clear at mount end, which has the other bonus of not cluttering the
      /proc/mounts output with clear_cache.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8b228324
    • B
      btrfs: clear oneshot options on mount and remount · 8cd29088
      Boris Burkov 提交于
      Some options only apply during mount time and are cleared at the end
      of mount. For now, the example is USEBACKUPROOT, but CLEAR_CACHE also
      fits the bill, and this is a preparation patch for also clearing that
      option.
      
      One subtlety is that the current code only resets USEBACKUPROOT on rw
      mounts, but the option is meaningfully "consumed" by a ro mount, so it
      feels appropriate to clear in that case as well. A subsequent read-write
      remount would not go through open_ctree, which is the only place that
      checks the option, so the change should be benign.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8cd29088
    • B
      btrfs: create free space tree on ro->rw remount · 5011139a
      Boris Burkov 提交于
      When a user attempts to remount a btrfs filesystem with
      'mount -o remount,space_cache=v2', that operation silently succeeds.
      Unfortunately, this is misleading, because the remount does not create
      the free space tree. /proc/mounts will incorrectly show space_cache=v2,
      but on the next mount, the file system will revert to the old
      space_cache.
      
      For now, we handle only the easier case, where the existing mount is
      read-only and the new mount is read-write. In that case, we can create
      the free space tree without contending with the block groups changing
      as we go.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5011139a
    • B
      btrfs: start orphan cleanup on ro->rw remount · 8f1c21d7
      Boris Burkov 提交于
      When we mount a rw filesystem, we start the orphan cleanup process in
      tree root and filesystem tree. However, when we remount a ro file system
      rw, we only clean the former. Move the calls to btrfs_orphan_cleanup()
      on tree_root and fs_root to the shared rw mount routine to effectively
      add them on ro->rw remount.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      8f1c21d7
    • B
      btrfs: lift read-write mount setup from mount and remount · 44c0ca21
      Boris Burkov 提交于
      Mounting rw and remounting from ro to rw naturally share invariants and
      functionality which result in a correctly setup rw filesystem. Luckily,
      there is even a strong unity in the code which implements them. In
      mount's open_ctree, these operations mostly happen after an early return
      for ro file systems, and in remount, they happen in a section devoted to
      remounting ro->rw, after some remount specific validation passes.
      
      However, there are unfortunately a few differences. There are small
      deviations in the order of some of the operations, remount does not
      start orphan cleanup in root_tree or fs_tree, remount does not create
      the free space tree, and remount does not handle "one-shot" mount
      options like clear_cache and uuid tree rescan.
      
      Since we want to add building the free space tree to remount, and also
      to start the same orphan cleanup process on a filesystem mounted as ro
      then remounted rw, we would benefit from unifying the logic between the
      two code paths.
      
      This patch only lifts the existing common functionality, and leaves a
      natural path for fixing the discrepancies.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NBoris Burkov <boris@bur.io>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      44c0ca21
    • N
      btrfs: remove inode number cache feature · 5297199a
      Nikolay Borisov 提交于
      It's been deprecated since commit b547a88e ("btrfs: start
      deprecation of mount option inode_cache") which enumerates the reasons.
      
      A filesystem that uses the feature (mount -o inode_cache) tracks the
      inode numbers in bitmaps, that data stay on the filesystem after this
      patch. The size is roughly 5MiB for 1M inodes [1], which is considered
      small enough to be left there. Removal of the change can be implemented
      in btrfs-progs if needed.
      
      [1] https://lore.kernel.org/linux-btrfs/20201127145836.GZ6430@twin.jikos.cz/Signed-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5297199a