1. 27 Oct, 2021 7 commits
  2. 26 Oct, 2021 7 commits
  3. 08 Oct, 2021 8 commits
    • J
      btrfs: fix abort logic in btrfs_replace_file_extents · 4afb912f
      Josef Bacik authored
      Error injection testing uncovered a case where we'd end up with a
      corrupt file system with a missing extent in the middle of a file.  This
      occurs because the if statement to decide if we should abort is wrong.
      
      The only way we would abort in this case is if we got a ret !=
      -EOPNOTSUPP and we called from the file clone code.  However the
      prealloc code uses this path too.  Instead we need to abort if there is
      an error, and the only error we _don't_ abort on is -EOPNOTSUPP and only
      if we came from the clone file code.
      
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      4afb912f
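The corrected decision described above can be modelled in a few lines. This is a userspace Python sketch, not the kernel code: `should_abort`, `should_abort_old` and the `from_clone` flag are hypothetical names, and the `ret != 0` guard in the old check is an assumption about the surrounding error path.

```python
import errno

def should_abort_old(ret: int, from_clone: bool) -> bool:
    """Model of the buggy check as the message describes it: abort only
    when called from the clone code with an error other than -EOPNOTSUPP."""
    return ret != 0 and ret != -errno.EOPNOTSUPP and from_clone

def should_abort(ret: int, from_clone: bool) -> bool:
    """Model of the fixed check: abort on any error, except -EOPNOTSUPP
    when the caller is the file clone code."""
    if ret == 0:
        return False  # success, nothing to abort
    if ret == -errno.EOPNOTSUPP and from_clone:
        return False  # the one tolerated error
    return True
```

Under this model a prealloc caller that hits `-ENOMEM` is missed by the old check and caught by the new one.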
    • F
      btrfs: check for error when looking up inode during dir entry replay · cfd31269
      Filipe Manana authored
      At replay_one_name(), we are treating any error from btrfs_lookup_inode()
      as if the inode does not exist. Fix this by checking for an error and
      returning it to the caller.
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cfd31269
    • F
      btrfs: unify lookup return value when dir entry is missing · 8dcbc261
      Filipe Manana authored
      btrfs_lookup_dir_index_item() and btrfs_lookup_dir_item() look up dir
      entries and both are used during log replay or when updating a log tree
      during an unlink.
      
      However when the dir item does not exist, btrfs_lookup_dir_item() returns
      NULL while btrfs_lookup_dir_index_item() returns ERR_PTR(-ENOENT), and if
      the dir item exists but there is no matching entry for a given name or
      index, both return NULL. This makes the call sites during log replay
      more verbose than necessary and it makes it easy to miss this slight
      difference. Since we don't need to distinguish between those two cases,
      make btrfs_lookup_dir_index_item() always return NULL when there is no
      matching directory entry - either because there isn't any dir entry or
      because there is one but it does not match the given name and index.
      
      Also rename the argument 'objectid' of btrfs_lookup_dir_index_item() to
      'index' since it is supposed to match an index number, and the name
      'objectid' is not very good because it can easily be confused with an
      inode number (like the inode number a dir entry points to).
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8dcbc261
    • F
      btrfs: deal with errors when adding inode reference during log replay · 52db7779
      Filipe Manana authored
      At __inode_add_ref(), we are treating any error returned from
      btrfs_lookup_dir_item() or from btrfs_lookup_dir_index_item() as meaning
      that there is no existing directory entry in the fs/subvolume tree.
      This is not correct since we can get errors such as, for example, -EIO
      when reading extent buffers while searching the fs/subvolume's btree.
      
      So fix that and return the error to the caller when it is not -ENOENT.
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      52db7779
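The pattern in this fix, treating only -ENOENT as "entry missing" and propagating every other error, can be sketched as follows. This is a simplified Python model: the kernel lookups return pointers (NULL or an error pointer), whereas here the result is flattened to `0` or a negative errno, and `classify_lookup` is a hypothetical name.

```python
import errno

def classify_lookup(ret: int):
    """Return (exists, err) for a lookup result modelled as 0 or -errno."""
    if ret == 0:
        return True, 0        # entry found
    if ret == -errno.ENOENT:
        return False, 0       # genuinely missing entry, not a failure
    return False, ret         # e.g. -EIO: hand the error to the caller
```

A caller would check `err` first and only fall back to the "no existing entry" path when `err` is zero.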
    • F
      btrfs: deal with errors when replaying dir entry during log replay · e15ac641
      Filipe Manana authored
      At replay_one_name(), we are treating any error returned from
      btrfs_lookup_dir_item() or from btrfs_lookup_dir_index_item() as meaning
      that there is no existing directory entry in the fs/subvolume tree.
      This is not correct since we can get errors such as, for example, -EIO
      when reading extent buffers while searching the fs/subvolume's btree.
      
      So fix that and return the error to the caller when it is not -ENOENT.
      
      CC: stable@vger.kernel.org # 4.14+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e15ac641
    • F
      btrfs: deal with errors when checking if a dir entry exists during log replay · 77a5b9e3
      Filipe Manana authored
      Currently inode_in_dir() ignores errors returned from
      btrfs_lookup_dir_index_item() and from btrfs_lookup_dir_item(), treating
      any errors as if the directory entry does not exist in the fs/subvolume
      tree, which is obviously not correct, as we can get errors such as -EIO
      when reading extent buffers while searching the fs/subvolume's tree.
      
      Fix that by making inode_in_dir() return the errors and making its only
      caller, add_inode_ref(), deal with returned errors as well.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      77a5b9e3
    • J
      btrfs: update refs for any root except tree log roots · d175209b
      Josef Bacik authored
      I hit a stuck relocation on btrfs/061 during my overnight testing.  This
      turned out to be because we had leftover extent entries in our extent
      root for a data reloc inode that no longer existed.  This happened
      because in btrfs_drop_extents() we only update refs if we have SHAREABLE
      set or we are the tree_root.  This regression was introduced by
      aeb935a4 ("btrfs: don't set SHAREABLE flag for data reloc tree")
      where we stopped setting SHAREABLE for the data reloc tree.
      
      The problem here is we actually do want to update extent references for
      data extents in the data reloc tree. In fact, the only case where we don't
      want to update extent references is when the file extents are in the log
      tree. Update this check to only skip updating references in the case of
      the log tree.
      
      This is relatively rare, because you have to be running scrub at the
      same time, which is what btrfs/061 does.  The data reloc inode has its
      extents pre-allocated, and then we copy the extent into the
      pre-allocated chunks.  We theoretically should never be calling
      btrfs_drop_extents() on a data reloc inode.  The exception of course is
      with scrub, if our pre-allocated extent falls inside of the block group
      we are scrubbing, then the block group will be marked read only and we
      will be forced to cow that extent.  This means we will call
      btrfs_drop_extents() on that range when we COW that file extent.
      
      This isn't really problematic if we do this. The data reloc inode
      requires that our extent lengths match exactly with the extent we are
      copying. Thankfully we validate the extent is correct with
      get_new_location(), so if we happen to COW only part of the extent we
      won't link it in when we do the relocation, so we are safe from any
      other shenanigans that arise because of this interaction with scrub.
      
      Fixes: aeb935a4 ("btrfs: don't set SHAREABLE flag for data reloc tree")
      CC: stable@vger.kernel.org # 5.8+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      d175209b
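The change of condition in this commit can be illustrated with a small Python model. The `Root` class and its three boolean fields are hypothetical stand-ins for the root flags the message mentions, not the kernel's data structures.

```python
from dataclasses import dataclass

@dataclass
class Root:
    shareable: bool = False      # models the SHAREABLE state flag
    is_tree_root: bool = False   # models "we are the tree_root"
    is_log_root: bool = False    # models a tree log root

def update_refs_old(root: Root) -> bool:
    """Old rule: update refs only for SHAREABLE roots or the tree root."""
    return root.shareable or root.is_tree_root

def update_refs_new(root: Root) -> bool:
    """New rule: update refs for every root except tree log roots."""
    return not root.is_log_root
```

With the data reloc tree modelled as `Root()` (no SHAREABLE flag, not the tree root), the old rule skips the reference update and the new rule performs it.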
    • Q
      btrfs: unlock newly allocated extent buffer after error · 19ea40dd
      Qu Wenruo authored
      [BUG]
      There is a bug report that an injected ENOMEM error could leave a tree
      block locked while we return to user-space:
      
        BTRFS info (device loop0): enabling ssd optimizations
        FAULT_INJECTION: forcing a failure.
        name failslab, interval 1, probability 0, space 0, times 0
        CPU: 0 PID: 7579 Comm: syz-executor Not tainted 5.15.0-rc1 #16
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
        rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
        Call Trace:
         __dump_stack lib/dump_stack.c:88 [inline]
         dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:106
         fail_dump lib/fault-inject.c:52 [inline]
         should_fail+0x13c/0x160 lib/fault-inject.c:146
         should_failslab+0x5/0x10 mm/slab_common.c:1328
         slab_pre_alloc_hook.constprop.99+0x4e/0xc0 mm/slab.h:494
         slab_alloc_node mm/slub.c:3120 [inline]
         slab_alloc mm/slub.c:3214 [inline]
         kmem_cache_alloc+0x44/0x280 mm/slub.c:3219
         btrfs_alloc_delayed_extent_op fs/btrfs/delayed-ref.h:299 [inline]
         btrfs_alloc_tree_block+0x38c/0x670 fs/btrfs/extent-tree.c:4833
         __btrfs_cow_block+0x16f/0x7d0 fs/btrfs/ctree.c:415
         btrfs_cow_block+0x12a/0x300 fs/btrfs/ctree.c:570
         btrfs_search_slot+0x6b0/0xee0 fs/btrfs/ctree.c:1768
         btrfs_insert_empty_items+0x80/0xf0 fs/btrfs/ctree.c:3905
         btrfs_new_inode+0x311/0xa60 fs/btrfs/inode.c:6530
         btrfs_create+0x12b/0x270 fs/btrfs/inode.c:6783
         lookup_open+0x660/0x780 fs/namei.c:3282
         open_last_lookups fs/namei.c:3352 [inline]
         path_openat+0x465/0xe20 fs/namei.c:3557
         do_filp_open+0xe3/0x170 fs/namei.c:3588
         do_sys_openat2+0x357/0x4a0 fs/open.c:1200
         do_sys_open+0x87/0xd0 fs/open.c:1216
         do_syscall_x64 arch/x86/entry/common.c:50 [inline]
         do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x46ae99
        Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48
        89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
        01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
        RSP: 002b:00007f46711b9c48 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
        RAX: ffffffffffffffda RBX: 000000000078c0a0 RCX: 000000000046ae99
        RDX: 0000000000000000 RSI: 00000000000000a1 RDI: 0000000020005800
        RBP: 00007f46711b9c80 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000017
        R13: 0000000000000000 R14: 000000000078c0a0 R15: 00007ffc129da6e0
      
        ================================================
        WARNING: lock held when returning to user space!
        5.15.0-rc1 #16 Not tainted
        ------------------------------------------------
        syz-executor/7579 is leaving the kernel with locks still held!
        1 lock held by syz-executor/7579:
         #0: ffff888104b73da8 (btrfs-tree-01/1){+.+.}-{3:3}, at:
        __btrfs_tree_lock+0x2e/0x1a0 fs/btrfs/locking.c:112
      
      [CAUSE]
      In btrfs_alloc_tree_block(), after btrfs_init_new_buffer(), the new
      extent buffer @buf is locked, but if later operations like adding a
      delayed tree ref fail, we just free @buf without unlocking it,
      resulting in the above warning.
      
      [FIX]
      Unlock @buf in out_free_buf: label.
      
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CACkBjsZ9O6Zr0KK1yGn=1rQi6Crh1yeCRdTSBxx9R99L4xdn-Q@mail.gmail.com/
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      19ea40dd
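The [CAUSE] and [FIX] above boil down to releasing a lock on the error path before the object is freed. The sketch below models that in userspace Python with `threading.Lock`; `ExtentBuffer`, `alloc_tree_block` and the `later_step_fails` flag are hypothetical names used only for illustration.

```python
import errno
import threading

class ExtentBuffer:
    """Stand-in for an extent buffer that carries its own lock."""
    def __init__(self):
        self.lock = threading.Lock()

def alloc_tree_block(buf: ExtentBuffer, later_step_fails: bool) -> int:
    # Models the buffer coming back locked after initialisation.
    buf.lock.acquire()
    if later_step_fails:
        # Error path: unlock before giving up the buffer, so the lock
        # is not still held when control returns to the caller.
        buf.lock.release()
        return -errno.ENOMEM
    # Success: the caller receives the buffer still locked.
    return 0
```

Without the `release()` call on the error path, the lock would remain held after the function returns, which is the situation the warning reports.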
  4. 18 Sep, 2021 4 commits
    • Q
      btrfs: prevent __btrfs_dump_space_info() to underflow its free space · 0619b790
      Qu Wenruo authored
      It's not uncommon for __btrfs_dump_space_info() to get called
      in over-commit situations.
      
      In that case free space would underflow as total allocated space is not
      enough to handle all the over-committed space.
      
      Such underflow values can sometimes cause confusion for users who have
      enabled the enospc_debug mount option, and it takes developers some
      seconds to convert the underflow value to a signed result.
      
      Just output the free space as s64 to avoid such problems.
      
      Reported-by: Eli V <eliventer@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAJtFHUSy4zgyhf-4d9T+KdJp9w=UgzC2A0V=VtmaeEpcGgm1-Q@mail.gmail.com/
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      0619b790
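The underflow and its signed reading can be reproduced with plain integer arithmetic. The Python sketch below emulates 64-bit unsigned wrap-around; the helper names and the byte counts are made up for illustration.

```python
U64_MASK = (1 << 64) - 1

def u64_sub(a: int, b: int) -> int:
    """Subtract with 64-bit unsigned wrap-around, as a u64 would."""
    return (a - b) & U64_MASK

def as_s64(value: int) -> int:
    """Reinterpret a 64-bit unsigned value as a signed 64-bit value."""
    return value - (1 << 64) if value & (1 << 63) else value

total_bytes = 100   # hypothetical allocated space
used_bytes = 150    # hypothetical over-committed usage
raw = u64_sub(total_bytes, used_bytes)
print(raw)          # huge unsigned value, hard to read
print(as_s64(raw))  # same bits read as s64: a small negative number
```

Reading the same bits as a signed value is what makes the over-commit immediately visible in the log output.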
    • F
      btrfs: fix mount failure due to past and transient device flush error · 6b225baa
      Filipe Manana authored
      When we get an error flushing one device, during a super block commit, we
      record the error in the device structure, in the field 'last_flush_error'.
      This is used to later check if we should error out the super block commit,
      depending on whether the number of flush errors is greater than or equal
      to the maximum tolerated device failures for a raid profile.
      
      However if we get a transient device flush error, unmount the filesystem
      and later try to mount it, we can fail the mount because we treat that
      past error as critical and consider the device missing. Even if it's
      very likely that the error will happen again, as it's probably due to a
      hardware related problem, there may be cases where the error might not
      happen again. One example is during testing, and a test case like the
      new generic/648 from fstests always triggers this. The test cases
      generic/019 and generic/475 also trigger this scenario, but very
      sporadically.
      
      When this happens we get an error like this:
      
        $ mount /dev/sdc /mnt
        mount: /mnt wrong fs type, bad option, bad superblock on /dev/sdc, missing codepage or helper program, or other error.
      
        $ dmesg
        (...)
        [12918.886926] BTRFS warning (device sdc): chunk 13631488 missing 1 devices, max tolerance is 0 for writable mount
        [12918.888293] BTRFS warning (device sdc): writable mount is not allowed due to too many missing devices
        [12918.890853] BTRFS error (device sdc): open_ctree failed
      
      The failure happens because when btrfs_check_rw_degradable() is called at
      mount time, or at remount from RO to RW time, it sees a non zero value in
      a device's ->last_flush_error attribute, and therefore considers that the
      device is 'missing'.
      
      Fix this by setting a device's ->last_flush_error to zero when we close a
      device, making sure the error is not seen on the next mount attempt. We
      only need to track flush errors during the current mount, so that we never
      commit a super block if such errors happened.
      
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6b225baa
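A toy model of the behaviour described above: a stale flush error makes the device count as missing at the next mount unless it is cleared on close. The `Device` class and both functions are hypothetical simplifications written in Python, not the kernel's implementation.

```python
import errno

class Device:
    def __init__(self):
        self.last_flush_error = 0   # 0 means no flush error recorded

def rw_degradable(devices, max_tolerated: int) -> bool:
    """Model: a device with a recorded flush error counts as missing."""
    missing = sum(1 for d in devices if d.last_flush_error != 0)
    return missing <= max_tolerated

def close_device(dev: Device) -> None:
    """Model of the fix: forget the flush error when the device is closed."""
    dev.last_flush_error = 0

dev = Device()
dev.last_flush_error = -errno.EIO       # transient error during a commit
before_close = rw_degradable([dev], 0)  # stale error blocks a RW mount
close_device(dev)
after_close = rw_degradable([dev], 0)   # error cleared, mount can proceed
```

The tolerance of `0` mirrors the "max tolerance is 0" line in the dmesg output above.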
    • F
      btrfs: fix transaction handle leak after verity rollback failure · acbee9af
      Filipe Manana authored
      During a verity rollback, if we fail to update the inode or delete the
      orphan, we abort the transaction and return without releasing our
      transaction handle. Fix that by releasing the handle.
      
      Fixes: 14605409 ("btrfs: initial fsverity support")
      Fixes: 70524253 ("btrfs: verity metadata orphan items")
      Reviewed-by: Boris Burkov <boris@bur.io>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      acbee9af
    • Q
      btrfs: replace BUG_ON() in btrfs_csum_one_bio() with proper error handling · bbc9a6eb
      Qu Wenruo authored
      There is a BUG_ON() in btrfs_csum_one_bio() to catch code logic errors.
      It has indeed caught several bugs during subpage development.
      But the BUG_ON() itself will bring down the whole system, which is
      overkill.
      
      Replace it with a WARN() and exit gracefully, so that it won't crash the
      whole system while we can still catch the code logic error.
      
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      bbc9a6eb
  5. 07 Sep, 2021 6 commits
    • N
      btrfs: zoned: fix double counting of split ordered extent · f79645df
      Naohiro Aota authored
      btrfs_add_ordered_extent_*() add num_bytes to fs_info->ordered_bytes.
      Then, splitting an ordered extent will call btrfs_add_ordered_extent_*()
      again for split extents, leading to double counting of the region of
      a split extent. These leaked bytes are finally reported at unmount time
      as follows:
      
        BTRFS info (device dm-1): at unmount dio bytes count 364544
      
      Fix the double counting by subtracting split extent's size from
      fs_info->ordered_bytes.
      
      Fixes: d22002fd ("btrfs: zoned: split ordered extent when bio is sent")
      CC: stable@vger.kernel.org # 5.12+
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      f79645df
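The double counting can be followed with simple bookkeeping. The Python sketch below is a model only: `FsInfo`, the helper functions and the byte sizes are hypothetical, and the real counter and split logic in the kernel are more involved.

```python
class FsInfo:
    def __init__(self):
        self.ordered_bytes = 0

def add_ordered_extent(fs: FsInfo, num_bytes: int) -> None:
    fs.ordered_bytes += num_bytes      # every add bumps the counter

def finish_ordered_extent(fs: FsInfo, num_bytes: int) -> None:
    fs.ordered_bytes -= num_bytes      # completion gives the bytes back

def split_ordered_extent(fs: FsInfo, split_bytes: int, fixed: bool) -> None:
    if fixed:
        # Model of the fix: drop the split region from the counter first,
        # because the add below will count it a second time.
        fs.ordered_bytes -= split_bytes
    add_ordered_extent(fs, split_bytes)

def leaked_after_io(total: int, split_bytes: int, fixed: bool) -> int:
    fs = FsInfo()
    add_ordered_extent(fs, total)                    # original extent
    split_ordered_extent(fs, split_bytes, fixed)     # split part of it off
    finish_ordered_extent(fs, total - split_bytes)   # shrunken original
    finish_ordered_extent(fs, split_bytes)           # split extent
    return fs.ordered_bytes
```

In this model `leaked_after_io(1048576, 4096, fixed=False)` leaves the split size behind in the counter, while `fixed=True` brings it back to zero.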
    • A
      btrfs: fix lockdep warning while mounting sprout fs · c1247069
      Anand Jain authored
      The following test case reproduces the lockdep warning.
      
        Test case:
      
        $ mkfs.btrfs -f <dev1>
        $ btrfstune -S 1 <dev1>
        $ mount <dev1> <mnt>
        $ btrfs device add <dev2> <mnt> -f
        $ umount <mnt>
        $ mount <dev2> <mnt>
        $ umount <mnt>
      
      The warning claims a possible ABBA deadlock between the threads
      initiated by [#1] btrfs device add and [#0] the mount.
      
        [ 540.743122] WARNING: possible circular locking dependency detected
        [ 540.743129] 5.11.0-rc7+ #5 Not tainted
        [ 540.743135] ------------------------------------------------------
        [ 540.743142] mount/2515 is trying to acquire lock:
        [ 540.743149] ffffa0c5544c2ce0 (&fs_devs->device_list_mutex){+.+.}-{4:4}, at: clone_fs_devices+0x6d/0x210 [btrfs]
        [ 540.743458] but task is already holding lock:
        [ 540.743461] ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
        [ 540.743541] which lock already depends on the new lock.
        [ 540.743543] the existing dependency chain (in reverse order) is:
      
        [ 540.743546] -> #1 (btrfs-chunk-00){++++}-{4:4}:
        [ 540.743566] down_read_nested+0x48/0x2b0
        [ 540.743585] __btrfs_tree_read_lock+0x32/0x200 [btrfs]
        [ 540.743650] btrfs_read_lock_root_node+0x70/0x200 [btrfs]
        [ 540.743733] btrfs_search_slot+0x6c6/0xe00 [btrfs]
        [ 540.743785] btrfs_update_device+0x83/0x260 [btrfs]
        [ 540.743849] btrfs_finish_chunk_alloc+0x13f/0x660 [btrfs] <--- device_list_mutex
        [ 540.743911] btrfs_create_pending_block_groups+0x18d/0x3f0 [btrfs]
        [ 540.743982] btrfs_commit_transaction+0x86/0x1260 [btrfs]
        [ 540.744037] btrfs_init_new_device+0x1600/0x1dd0 [btrfs]
        [ 540.744101] btrfs_ioctl+0x1c77/0x24c0 [btrfs]
        [ 540.744166] __x64_sys_ioctl+0xe4/0x140
        [ 540.744170] do_syscall_64+0x4b/0x80
        [ 540.744174] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        [ 540.744180] -> #0 (&fs_devs->device_list_mutex){+.+.}-{4:4}:
        [ 540.744184] __lock_acquire+0x155f/0x2360
        [ 540.744188] lock_acquire+0x10b/0x5c0
        [ 540.744190] __mutex_lock+0xb1/0xf80
        [ 540.744193] mutex_lock_nested+0x27/0x30
        [ 540.744196] clone_fs_devices+0x6d/0x210 [btrfs]
        [ 540.744270] btrfs_read_chunk_tree+0x3c7/0xbb0 [btrfs]
        [ 540.744336] open_ctree+0xf6e/0x2074 [btrfs]
        [ 540.744406] btrfs_mount_root.cold.72+0x16/0x127 [btrfs]
        [ 540.744472] legacy_get_tree+0x38/0x90
        [ 540.744475] vfs_get_tree+0x30/0x140
        [ 540.744478] fc_mount+0x16/0x60
        [ 540.744482] vfs_kern_mount+0x91/0x100
        [ 540.744484] btrfs_mount+0x1e6/0x670 [btrfs]
        [ 540.744536] legacy_get_tree+0x38/0x90
        [ 540.744537] vfs_get_tree+0x30/0x140
        [ 540.744539] path_mount+0x8d8/0x1070
        [ 540.744541] do_mount+0x8d/0xc0
        [ 540.744543] __x64_sys_mount+0x125/0x160
        [ 540.744545] do_syscall_64+0x4b/0x80
        [ 540.744547] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        [ 540.744551] other info that might help us debug this:
        [ 540.744552] Possible unsafe locking scenario:
      
        [ 540.744553] CPU0 				CPU1
        [ 540.744554] ---- 				----
        [ 540.744555] lock(btrfs-chunk-00);
        [ 540.744557] 					lock(&fs_devs->device_list_mutex);
        [ 540.744560] 					lock(btrfs-chunk-00);
        [ 540.744562] lock(&fs_devs->device_list_mutex);
        [ 540.744564]
         *** DEADLOCK ***
      
        [ 540.744565] 3 locks held by mount/2515:
        [ 540.744567] #0: ffffa0c56bf7a0e0 (&type->s_umount_key#42/1){+.+.}-{4:4}, at: alloc_super.isra.16+0xdf/0x450
        [ 540.744574] #1: ffffffffc05a9628 (uuid_mutex){+.+.}-{4:4}, at: btrfs_read_chunk_tree+0x63/0xbb0 [btrfs]
        [ 540.744640] #2: ffffa0c54a7932b8 (btrfs-chunk-00){++++}-{4:4}, at: __btrfs_tree_read_lock+0x32/0x200 [btrfs]
        [ 540.744708]
         stack backtrace:
        [ 540.744712] CPU: 2 PID: 2515 Comm: mount Not tainted 5.11.0-rc7+ #5
      
      But the device_list_mutex in clone_fs_devices() is redundant, as
      explained below.  Two threads [1] and [2] (below) could lead to
      clone_fs_devices().
      
        [1]
        open_ctree <== mount sprout fs
         btrfs_read_chunk_tree()
          mutex_lock(&uuid_mutex) <== global lock
          read_one_dev()
           open_seed_devices()
            clone_fs_devices() <== seed fs_devices
             mutex_lock(&orig->device_list_mutex) <== seed fs_devices
      
        [2]
        btrfs_init_new_device() <== sprouting
         mutex_lock(&uuid_mutex); <== global lock
         btrfs_prepare_sprout()
           lockdep_assert_held(&uuid_mutex)
           clone_fs_devices(seed_fs_device) <== seed fs_devices
      
      Both of these threads hold uuid_mutex, which is sufficient to prevent
      the seed device(s) from being freed while we are trying to clone them for
      sprouting [2] or mounting a sprout [1] (as above). A mounted seed device
      cannot be freed, written or replaced because it is read-only. An
      unmounted seed device can be freed by btrfs_free_stale_devices(), but
      that needs uuid_mutex.  So this patch removes the unnecessary
      device_list_mutex in clone_fs_devices() and adds a
      lockdep_assert_held(&uuid_mutex) in clone_fs_devices().
      
      Reported-by: Su Yue <l@damenly.su>
      Tested-by: Su Yue <l@damenly.su>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      c1247069
    • J
      btrfs: delay blkdev_put until after the device remove · 3fa421de
      Josef Bacik authored
      When removing the device we call blkdev_put() on the device once we've
      removed it, and because we have an EXCL open we need to take the
      ->open_mutex on the block device to clean it up.  Unfortunately during
      device remove we are holding the sb writers lock, which results in the
      following lockdep splat:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.14.0-rc2+ #407 Not tainted
      ------------------------------------------------------
      losetup/11595 is trying to acquire lock:
      ffff973ac35dd138 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
      
      but task is already holding lock:
      ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
             __mutex_lock+0x7d/0x750
             lo_open+0x28/0x60 [loop]
             blkdev_get_whole+0x25/0xf0
             blkdev_get_by_dev.part.0+0x168/0x3c0
             blkdev_open+0xd2/0xe0
             do_dentry_open+0x161/0x390
             path_openat+0x3cc/0xa20
             do_filp_open+0x96/0x120
             do_sys_openat2+0x7b/0x130
             __x64_sys_openat+0x46/0x70
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #3 (&disk->open_mutex){+.+.}-{3:3}:
             __mutex_lock+0x7d/0x750
             blkdev_put+0x3a/0x220
             btrfs_rm_device.cold+0x62/0xe5
             btrfs_ioctl+0x2a31/0x2e70
             __x64_sys_ioctl+0x80/0xb0
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #2 (sb_writers#12){.+.+}-{0:0}:
             lo_write_bvec+0xc2/0x240 [loop]
             loop_process_work+0x238/0xd00 [loop]
             process_one_work+0x26b/0x560
             worker_thread+0x55/0x3c0
             kthread+0x140/0x160
             ret_from_fork+0x1f/0x30
      
      -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
             process_one_work+0x245/0x560
             worker_thread+0x55/0x3c0
             kthread+0x140/0x160
             ret_from_fork+0x1f/0x30
      
      -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
             __lock_acquire+0x10ea/0x1d90
             lock_acquire+0xb5/0x2b0
             flush_workqueue+0x91/0x5e0
             drain_workqueue+0xa0/0x110
             destroy_workqueue+0x36/0x250
             __loop_clr_fd+0x9a/0x660 [loop]
             block_ioctl+0x3f/0x50
             __x64_sys_ioctl+0x80/0xb0
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&lo->lo_mutex);
                                     lock(&disk->open_mutex);
                                     lock(&lo->lo_mutex);
        lock((wq_completion)loop0);
      
       *** DEADLOCK ***
      
      1 lock held by losetup/11595:
       #0: ffff973ac9812c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
      
      stack backtrace:
      CPU: 0 PID: 11595 Comm: losetup Not tainted 5.14.0-rc2+ #407
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Call Trace:
       dump_stack_lvl+0x57/0x72
       check_noncircular+0xcf/0xf0
       ? stack_trace_save+0x3b/0x50
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       ? flush_workqueue+0x67/0x5e0
       ? lockdep_init_map_type+0x47/0x220
       flush_workqueue+0x91/0x5e0
       ? flush_workqueue+0x67/0x5e0
       ? verify_cpu+0xf0/0x100
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       ? blkdev_ioctl+0x8d/0x2a0
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7fc21255d4cb
      
      So instead save the bdev and do the put once we've dropped the sb
      writers lock in order to avoid the lockdep recursion.
      
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3fa421de
    • J
      btrfs: update the bdev time directly when closing · 8f96a5bf
      Josef Bacik authored
      We update the ctime/mtime of a block device when we remove it so that
      blkid knows the device changed.  However we do this by re-opening the
      block device and calling filp_update_time.  This is more correct because
      it'll call the inode->i_op->update_time if it exists, but the block dev
      inodes do not do this.  Instead call generic_update_time() on the
      bd_inode in order to avoid the blkdev_open path and get rid of the
      following lockdep splat:
      
      ======================================================
      WARNING: possible circular locking dependency detected
      5.14.0-rc2+ #406 Not tainted
      ------------------------------------------------------
      losetup/11596 is trying to acquire lock:
      ffff939640d2f538 ((wq_completion)loop0){+.+.}-{0:0}, at: flush_workqueue+0x67/0x5e0
      
      but task is already holding lock:
      ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #4 (&lo->lo_mutex){+.+.}-{3:3}:
             __mutex_lock+0x7d/0x750
             lo_open+0x28/0x60 [loop]
             blkdev_get_whole+0x25/0xf0
             blkdev_get_by_dev.part.0+0x168/0x3c0
             blkdev_open+0xd2/0xe0
             do_dentry_open+0x161/0x390
             path_openat+0x3cc/0xa20
             do_filp_open+0x96/0x120
             do_sys_openat2+0x7b/0x130
             __x64_sys_openat+0x46/0x70
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #3 (&disk->open_mutex){+.+.}-{3:3}:
             __mutex_lock+0x7d/0x750
             blkdev_get_by_dev.part.0+0x56/0x3c0
             blkdev_open+0xd2/0xe0
             do_dentry_open+0x161/0x390
             path_openat+0x3cc/0xa20
             do_filp_open+0x96/0x120
             file_open_name+0xc7/0x170
             filp_open+0x2c/0x50
             btrfs_scratch_superblocks.part.0+0x10f/0x170
             btrfs_rm_device.cold+0xe8/0xed
             btrfs_ioctl+0x2a31/0x2e70
             __x64_sys_ioctl+0x80/0xb0
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      -> #2 (sb_writers#12){.+.+}-{0:0}:
             lo_write_bvec+0xc2/0x240 [loop]
             loop_process_work+0x238/0xd00 [loop]
             process_one_work+0x26b/0x560
             worker_thread+0x55/0x3c0
             kthread+0x140/0x160
             ret_from_fork+0x1f/0x30
      
      -> #1 ((work_completion)(&lo->rootcg_work)){+.+.}-{0:0}:
             process_one_work+0x245/0x560
             worker_thread+0x55/0x3c0
             kthread+0x140/0x160
             ret_from_fork+0x1f/0x30
      
      -> #0 ((wq_completion)loop0){+.+.}-{0:0}:
             __lock_acquire+0x10ea/0x1d90
             lock_acquire+0xb5/0x2b0
             flush_workqueue+0x91/0x5e0
             drain_workqueue+0xa0/0x110
             destroy_workqueue+0x36/0x250
             __loop_clr_fd+0x9a/0x660 [loop]
             block_ioctl+0x3f/0x50
             __x64_sys_ioctl+0x80/0xb0
             do_syscall_64+0x38/0x90
             entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)loop0 --> &disk->open_mutex --> &lo->lo_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&lo->lo_mutex);
                                     lock(&disk->open_mutex);
                                     lock(&lo->lo_mutex);
        lock((wq_completion)loop0);
      
       *** DEADLOCK ***
      
      1 lock held by losetup/11596:
       #0: ffff939655510c68 (&lo->lo_mutex){+.+.}-{3:3}, at: __loop_clr_fd+0x41/0x660 [loop]
      
      stack backtrace:
      CPU: 1 PID: 11596 Comm: losetup Not tainted 5.14.0-rc2+ #406
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
      Call Trace:
       dump_stack_lvl+0x57/0x72
       check_noncircular+0xcf/0xf0
       ? stack_trace_save+0x3b/0x50
       __lock_acquire+0x10ea/0x1d90
       lock_acquire+0xb5/0x2b0
       ? flush_workqueue+0x67/0x5e0
       ? lockdep_init_map_type+0x47/0x220
       flush_workqueue+0x91/0x5e0
       ? flush_workqueue+0x67/0x5e0
       ? verify_cpu+0xf0/0x100
       drain_workqueue+0xa0/0x110
       destroy_workqueue+0x36/0x250
       __loop_clr_fd+0x9a/0x660 [loop]
       ? blkdev_ioctl+0x8d/0x2a0
       block_ioctl+0x3f/0x50
       __x64_sys_ioctl+0x80/0xb0
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8f96a5bf
    • K
      btrfs: use correct header for div_u64 in misc.h · cde7417c
      Kari Argillander committed
      misc.h includes asm/do_div.h for div_u64, but div_u64 is actually found
      in math64.h. Including the correct header makes the compiler's job easier
      and prevents build errors in situations where math64.h is not pulled in
      through some other include path.
      Signed-off-by: Kari Argillander <kari.argillander@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      cde7417c
    • A
      btrfs: fix upper limit for max_inline for page size 64K · 6f93e834
      Anand Jain committed
      The mount option max_inline ranges from 0 to the sectorsize (which is
      now equal to page size). But we parse the mount options too early and
      before the actual sectorsize is read from the superblock. So the upper
      limit of max_inline is unaware of the actual sectorsize and is limited
      by the temporary sectorsize 4096, even on a system where the default
      sectorsize is 64K.
      
      Fix this by reading the superblock sectorsize before parsing the mount
      options.
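
The ordering problem described above can be illustrated with a minimal userspace sketch. This is not the kernel code: every `toy_*` name and `TOY_TEMP_SECTORSIZE` are hypothetical stand-ins, and the only behavior taken from the commit message is that max_inline is capped by whichever sectorsize is known at parse time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the temporary sectorsize in effect before the
 * superblock has been read. */
#define TOY_TEMP_SECTORSIZE 4096u

/* Clamp a requested max_inline to whatever sectorsize is known right now. */
uint64_t toy_clamp_max_inline(uint64_t requested, uint32_t known_sectorsize)
{
	assert(known_sectorsize > 0);
	return requested < known_sectorsize ? requested : known_sectorsize;
}

/* Old order: options are parsed first, so only the temporary value is known. */
uint64_t toy_parse_then_read_sb(uint64_t requested, uint32_t sb_sectorsize)
{
	(void)sb_sectorsize; /* not available yet at parse time */
	return toy_clamp_max_inline(requested, TOY_TEMP_SECTORSIZE);
}

/* Fixed order: the superblock sectorsize is read before the options are parsed. */
uint64_t toy_read_sb_then_parse(uint64_t requested, uint32_t sb_sectorsize)
{
	return toy_clamp_max_inline(requested, sb_sectorsize);
}
```

With a 64K sectorsize, a request of 65536 is cut down to 4096 in the old order and kept intact in the fixed order.
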
      Reported-by: Alexander Tsvetkov <alexander.tsvetkov@oracle.com>
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      6f93e834
  6. 25 Aug, 2021 1 commit
  7. 23 Aug, 2021 7 commits
    • D
      btrfs: reset replace target device to allocation state on close · 0d977e0e
      Desmond Cheong Zhi Xi committed
      This crash was observed with a failed assertion on device close:
      
        BTRFS: Transaction aborted (error -28)
        WARNING: CPU: 1 PID: 3902 at fs/btrfs/extent-tree.c:2150 btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
        Modules linked in: btrfs blake2b_generic libcrc32c crc32c_intel xor zstd_decompress zstd_compress xxhash lzo_compress lzo_decompress raid6_pq loop
        CPU: 1 PID: 3902 Comm: kworker/u8:4 Not tainted 5.14.0-rc5-default+ #1532
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
        RIP: 0010:btrfs_run_delayed_refs+0x1d2/0x1e0 [btrfs]
        RSP: 0018:ffffb7a5452d7d80 EFLAGS: 00010282
        RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
        RBP: ffff97834176a378 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff97835195d388
        R13: 0000000005b08000 R14: ffff978385484000 R15: 000000000000016c
        FS:  0000000000000000(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000056190d003fe8 CR3: 000000002a81e005 CR4: 0000000000170ea0
        Call Trace:
         flush_space+0x197/0x2f0 [btrfs]
         btrfs_async_reclaim_metadata_space+0x139/0x300 [btrfs]
         process_one_work+0x262/0x5e0
         worker_thread+0x4c/0x320
         ? process_one_work+0x5e0/0x5e0
         kthread+0x144/0x170
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x1f/0x30
        irq event stamp: 19334989
        hardirqs last  enabled at (19334997): [<ffffffffab0e0c87>] console_unlock+0x2b7/0x400
        hardirqs last disabled at (19335006): [<ffffffffab0e0d0d>] console_unlock+0x33d/0x400
        softirqs last  enabled at (19334900): [<ffffffffaba0030d>] __do_softirq+0x30d/0x574
        softirqs last disabled at (19334893): [<ffffffffab0721ec>] irq_exit_rcu+0x12c/0x140
        ---[ end trace 45939e308e0dd3c7 ]---
        BTRFS: error (device vdd) in btrfs_run_delayed_refs:2150: errno=-28 No space left
        BTRFS info (device vdd): forced readonly
        BTRFS warning (device vdd): failed setting block group ro: -30
        BTRFS info (device vdd): suspending dev_replace for unmount
        assertion failed: !test_bit(BTRFS_DEV_STATE_REPLACE_TGT, &device->dev_state), in fs/btrfs/volumes.c:1150
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3431!
        invalid opcode: 0000 [#1] PREEMPT SMP
        CPU: 1 PID: 3982 Comm: umount Tainted: G        W         5.14.0-rc5-default+ #1532
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
        RIP: 0010:assertfail.constprop.0+0x18/0x1a [btrfs]
        RSP: 0018:ffffb7a5454c7db8 EFLAGS: 00010246
        RAX: 0000000000000068 RBX: ffff978364b91c00 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: ffffffffabee13c4 RDI: 00000000ffffffff
        RBP: ffff9783523a4c00 R08: 0000000000000001 R09: 0000000000000001
        R10: 0000000000000000 R11: 0000000000000001 R12: ffff9783523a4d18
        R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000003
        FS:  00007f61c8f42800(0000) GS:ffff9783bd800000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 000056190cffa810 CR3: 0000000030b96002 CR4: 0000000000170ea0
        Call Trace:
         btrfs_close_one_device.cold+0x11/0x55 [btrfs]
         close_fs_devices+0x44/0xb0 [btrfs]
         btrfs_close_devices+0x48/0x160 [btrfs]
         generic_shutdown_super+0x69/0x100
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x2c/0xa0
         cleanup_mnt+0x144/0x1b0
         task_work_run+0x59/0xa0
         exit_to_user_mode_loop+0xe7/0xf0
         exit_to_user_mode_prepare+0xaf/0xf0
         syscall_exit_to_user_mode+0x19/0x50
         do_syscall_64+0x4a/0x90
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      This happens when close_ctree is called while a dev_replace hasn't
      completed. In close_ctree, we suspend the dev_replace, but keep the
      replace target around so that we can resume the dev_replace procedure
      when we mount the root again. This is the call trace:
      
        close_ctree():
          btrfs_dev_replace_suspend_for_unmount();
          btrfs_close_devices():
            btrfs_close_fs_devices():
              btrfs_close_one_device():
                ASSERT(!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
                       &device->dev_state));
      
      However, since the replace target sticks around, there is a device
      with BTRFS_DEV_STATE_REPLACE_TGT set on close, and we fail the
      assertion in btrfs_close_one_device.
      
      To fix this, if we come across the replace target device when
      closing, we should properly reset it back to allocation state. This
      fix also ensures that if a non-target device has a corrupted state and
      has the BTRFS_DEV_STATE_REPLACE_TGT bit set, the assertion will still
      catch the error.
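
The close-time behavior described above can be modeled with a small userspace sketch. It is a toy, not the kernel change: the `toy_*` names are hypothetical, the state bit is reduced to a boolean, and identifying the replace target by a reserved device id (`TOY_REPLACE_TGT_DEVID`) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved id marking the replace target in this toy model. */
#define TOY_REPLACE_TGT_DEVID 0u

struct toy_device {
	uint64_t devid;
	bool replace_tgt; /* models the BTRFS_DEV_STATE_REPLACE_TGT bit */
};

/* Returns true when the close-time assertion would hold for this device. */
bool toy_close_one_device(struct toy_device *dev)
{
	assert(dev != NULL);

	/* Only the genuine replace target is reset on close. */
	if (dev->devid == TOY_REPLACE_TGT_DEVID)
		dev->replace_tgt = false;

	/* Any other device still carrying the bit is reported as corrupted. */
	return !dev->replace_tgt;
}
```

The reset is deliberately conditional: clearing the flag unconditionally would also hide a non-target device that has the bit set by mistake.
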
      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: b2a61667 ("btrfs: fix rw device counting in __btrfs_free_extra_devids")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      0d977e0e
    • N
      btrfs: zoned: fix ordered extent boundary calculation · 939c7feb
      Naohiro Aota committed
      btrfs_lookup_ordered_extent() is supposed to query the offset in a file
      instead of the logical address. Pass the file offset from
      submit_extent_page() to calc_bio_boundaries().
      
      Also, calc_bio_boundaries() relies on the bio's operation flag, so move
      the call site after setting it.
      
      Fixes: 390ed29b ("btrfs: refactor submit_extent_page() to make bio and its flag tracing easier")
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      939c7feb
    • J
      btrfs: do not do preemptive flushing if the majority is global rsv · 11462397
      Josef Bacik committed
      A common characteristic of the bug report where preemptive flushing was
      going full tilt was the fact that the vast majority of the free metadata
      space was used up by the global reserve.  The hard 90% threshold would
      cover the majority of these cases, but to be even smarter we should take
      into account how much of the outstanding reservations are covered by the
      global block reserve.  If the global block reserve accounts for the vast
      majority of outstanding reservations, skip preemptive flushing, as it
      will likely just cause churn and pain.
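
A rough sketch of the heuristic described above, under stated assumptions: the function name and the `min_other` cutoff are hypothetical, and the commit message does not say how "vast majority" is quantified, so the cutoff is left as a caller-supplied parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether preemptive flushing is worthwhile.
 * outstanding: all outstanding reservations
 * global_rsv:  the part of them held by the global block reserve
 * min_other:   hypothetical cutoff for the non-global part
 */
bool toy_worth_preemptive_flush(uint64_t outstanding, uint64_t global_rsv,
				uint64_t min_other)
{
	/* Everything outstanding belongs to the global reserve: skip. */
	if (global_rsv >= outstanding)
		return false;

	/* Flush only when enough reservations sit outside the global reserve. */
	return outstanding - global_rsv > min_other;
}
```
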
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212185
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      11462397
    • J
      btrfs: reduce the preemptive flushing threshold to 90% · 93c60b17
      Josef Bacik committed
      The preemptive flushing code was added in order to avoid needing to
      synchronously wait for ENOSPC flushing to recover space.  Once we're
      almost full however we can essentially flush constantly.  We were using
      98% as a threshold to determine if we were simply full, however in
      practice this is a really high bar to hit.  For example reports of
      systems running into this problem had around 94% usage and thus
      continued to flush.  Fix this by lowering the threshold to 90%, which is
      a more sane value, especially for smaller file systems.
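
The effect of the threshold change can be checked with a small worked example. This is a toy calculation, not the kernel's accounting: `toy_is_simply_full` and its parameters are hypothetical, and only the 98%, 94% and 90% figures come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when usage has reached the threshold, i.e. the filesystem counts as
 * "simply full" and preemptive flushing should stop. */
bool toy_is_simply_full(uint64_t used, uint64_t total, unsigned int threshold_pct)
{
	assert(threshold_pct <= 100);
	/* Divide first so that total * threshold_pct cannot overflow. */
	return used >= total / 100 * threshold_pct;
}
```

For a space of 1000 units with 940 used (94%), a 98% threshold reports "not full" and flushing continues, while a 90% threshold reports "full" and flushing stops.
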
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212185
      CC: stable@vger.kernel.org # 5.12+
      Fixes: 576fa348 ("btrfs: improve preemptive background space flushing")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      93c60b17
    • M
      btrfs: tree-log: check btrfs_lookup_data_extent return value · 3736127a
      Marcos Paulo de Souza committed
      Function btrfs_lookup_data_extent calls btrfs_search_slot to verify if
      the EXTENT_ITEM exists in the extent tree. btrfs_search_slot can return
      values below zero if an error happened.

      Function replay_one_extent currently checks if the search found
      something (0 returned) and increments the reference, and if not, it
      treats every other result, errors included, as 'not found'.

      Fix the condition by checking if the value was below zero and returning
      early.
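
The three-way return convention described above (negative for an error, 0 for found, positive for not found) can be sketched as follows. The enum and both functions are hypothetical illustrations of the before and after conditions, not the code from replay_one_extent.

```c
#include <assert.h>
#include <errno.h>

enum toy_replay_action {
	TOY_INC_REF,   /* lookup returned 0: extent item found */
	TOY_NOT_FOUND, /* lookup returned > 0: extent item missing */
	TOY_RETURN_ERR /* lookup returned < 0: propagate the error */
};

/* Old condition: anything other than 0 is handled as 'not found'. */
enum toy_replay_action toy_classify_old(int ret)
{
	return ret == 0 ? TOY_INC_REF : TOY_NOT_FOUND;
}

/* Fixed condition: negative values are errors and are returned early. */
enum toy_replay_action toy_classify_fixed(int ret)
{
	if (ret < 0)
		return TOY_RETURN_ERR;
	if (ret == 0)
		return TOY_INC_REF;
	return TOY_NOT_FOUND;
}
```

With an input of `-EIO`, the old condition silently falls into the 'not found' branch, while the fixed one reports the error.
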
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      3736127a
    • F
      btrfs: avoid unnecessarily logging directories that had no changes · 8be2ba2e
      Filipe Manana committed
      There are several cases where, when logging an inode, we need to log its
      parent directories, or log subdirectories when logging a directory.

      There are cases however where we end up logging a directory even if it was
      not changed in the current transaction, with no dentries added or removed
      since the last transaction. While this is harmless from a functional point
      of view, it is a waste of time as it brings no advantage.
      
      One example where this is triggered is the following:
      
        $ mkfs.btrfs -f /dev/sdc
        $ mount /dev/sdc /mnt
      
        $ mkdir /mnt/A
        $ mkdir /mnt/B
        $ mkdir /mnt/C
      
        $ touch /mnt/A/foo
        $ ln /mnt/A/foo /mnt/B/bar
        $ ln /mnt/A/foo /mnt/C/baz
      
        $ sync
      
        $ rm -f /mnt/A/foo
        $ xfs_io -c "fsync" /mnt/B/bar
      
      This last fsync ends up logging directories A, B and C, however we only
      need to log directory A, as B and C were not changed since the last
      transaction commit.
      
      So fix this by changing need_log_inode(), to return false in case the
      given inode is a directory and has a ->last_trans value smaller than the
      current transaction's ID.
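
The added check can be sketched in isolation as below. This is a simplified model: `struct toy_inode` and `toy_need_log_inode` are hypothetical, and any other conditions the real need_log_inode() evaluates are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inode {
	bool is_dir;
	uint64_t last_trans; /* id of the last transaction that modified it */
};

/* Returns false for directories untouched in the current transaction. */
bool toy_need_log_inode(const struct toy_inode *inode, uint64_t cur_transid)
{
	assert(inode != NULL);

	if (inode->is_dir && inode->last_trans < cur_transid)
		return false;

	return true;
}
```

In the example above, directory A was modified by the `rm` in the current transaction and is still logged, while B and C carry an older ->last_trans and are skipped.
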
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      8be2ba2e
    • C
      btrfs: allow idmapped mount · 5b9b26f5
      Christian Brauner committed
      Now that we converted btrfs internally to account for idmapped mounts,
      allow the creation of idmapped mounts by setting the FS_ALLOW_IDMAP
      flag.  We only need to raise this flag on the btrfs_root_fs_type
      filesystem since btrfs_mount_root() is ultimately responsible for
      allocating the superblock and is called into from btrfs_mount()
      associated with btrfs_fs_type.

      The conversion of the btrfs inode operations was straightforward.
      Regarding btrfs-specific ioctls that perform checks based on inode
      permissions, only those have been allowed that are not filesystem-wide
      operations and hence can be reasonably charged against a specific mount.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      5b9b26f5