1. 09 Sep, 2019 8 commits
    • btrfs: Don't assign retval of btrfs_try_tree_write_lock/btrfs_tree_read_lock_atomic · 65e99c43
      Nikolay Borisov committed
      Those functions are simple boolean predicates, so there is no need to
      assign their return values to interim variables. Use them directly as
      predicates. No functional changes.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
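A minimal sketch of the cleanup described in this commit, using a hypothetical `mock_try_lock()` in place of the real btrfs lock helpers (the struct and function names below are assumptions for illustration, not kernel code). It contrasts parking a boolean result in an interim variable with testing the predicate directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer lock; not the kernel type. */
struct mock_lock {
	bool held;
};

/* Returns true if the lock was taken, false if it was already held. */
static bool mock_try_lock(struct mock_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Before: the result is parked in an interim variable first. */
static int lock_or_retry_before(struct mock_lock *l)
{
	int ret;

	ret = mock_try_lock(l);
	if (!ret)
		return -1;	/* caller should retry */
	return 0;
}

/* After: the predicate is used directly in the condition. */
static int lock_or_retry_after(struct mock_lock *l)
{
	if (!mock_try_lock(l))
		return -1;	/* caller should retry */
	return 0;
}
```

Both variants behave the same; the second simply has one variable and one assignment fewer, which matches the "no functional changes" note.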
    • btrfs: create structure to encode checksum type and length · af024ed2
      Johannes Thumshirn committed
      Create a structure to encode the type and length for the known on-disk
      checksums.  This makes it easier to add new checksums later.

      The structure and helpers are moved from ctree.h so they don't occupy
      space in all headers including ctree.h. This saves some space in the
      final object.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
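The idea of a table that encodes checksum type and length can be sketched as follows. This is an illustration only: the `mock_` names are assumptions, and the single entry reflects the 4-byte crc32c checksum; the actual kernel structure may differ in layout and naming.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative type IDs; hypothetical names, not the kernel's. */
enum mock_csum_type {
	MOCK_CSUM_TYPE_CRC32 = 0,
};

/* One table entry per known checksum: on-disk size in bytes and a name. */
struct mock_csum_desc {
	uint16_t size;
	const char *name;
};

static const struct mock_csum_desc mock_csums[] = {
	[MOCK_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
};

#define MOCK_NUM_CSUMS (sizeof(mock_csums) / sizeof(mock_csums[0]))

/* Returns the checksum size in bytes, or 0 for an unknown type. */
static uint16_t mock_csum_size(unsigned int type)
{
	return type < MOCK_NUM_CSUMS ? mock_csums[type].size : 0;
}

/* Returns the checksum name, or NULL for an unknown type. */
static const char *mock_csum_name(unsigned int type)
{
	return type < MOCK_NUM_CSUMS ? mock_csums[type].name : NULL;
}
```

With this layout, supporting another checksum would mean adding one enum value and one table row, rather than touching every place that needs a size or a name.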
    • btrfs: tie extent buffer and it's token together · c82f823c
      David Sterba committed
      Further simplification of the get/set helpers is possible when the token
      is uniquely tied to an extent buffer. A condition and an assignment can
      be avoided.

      The initializations are moved closer to the first use when the extent
      buffer is valid. There's one exception in __push_leaf_left where the
      token is reused.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move functions for tree compare to send.c · 18d0f5c6
      David Sterba committed
      Send is the only user of tree_compare, so we can move it there along
      with the other helpers and definitions.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename and export read_node_slot · 4b231ae4
      David Sterba committed
      Preparatory work for code that will be moved out of ctree and uses this
      function.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix use-after-free when using the tree modification log · efad8a85
      Filipe Manana committed
      At ctree.c:get_old_root(), we are accessing a root's header owner field
      after we have freed the respective extent buffer. This results in a
      use-after-free that can lead to crashes, and when CONFIG_DEBUG_PAGEALLOC
      is set, results in a stack trace like the following:
      
        [ 3876.799331] stack segment: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
        [ 3876.799363] CPU: 0 PID: 15436 Comm: pool Not tainted 5.3.0-rc3-btrfs-next-54 #1
        [ 3876.799385] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 3876.799433] RIP: 0010:btrfs_search_old_slot+0x652/0xd80 [btrfs]
        (...)
        [ 3876.799502] RSP: 0018:ffff9f08c1a2f9f0 EFLAGS: 00010286
        [ 3876.799518] RAX: ffff8dd300000000 RBX: ffff8dd85a7a9348 RCX: 000000038da26000
        [ 3876.799538] RDX: 0000000000000000 RSI: ffffe522ce368980 RDI: 0000000000000246
        [ 3876.799559] RBP: dae1922adadad000 R08: 0000000008020000 R09: ffffe522c0000000
        [ 3876.799579] R10: ffff8dd57fd788c8 R11: 000000007511b030 R12: ffff8dd781ddc000
        [ 3876.799599] R13: ffff8dd9e6240578 R14: ffff8dd6896f7a88 R15: ffff8dd688cf90b8
        [ 3876.799620] FS:  00007f23ddd97700(0000) GS:ffff8dda20200000(0000) knlGS:0000000000000000
        [ 3876.799643] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 3876.799660] CR2: 00007f23d4024000 CR3: 0000000710bb0005 CR4: 00000000003606f0
        [ 3876.799682] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 3876.799703] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 3876.799723] Call Trace:
        [ 3876.799735]  ? do_raw_spin_unlock+0x49/0xc0
        [ 3876.799749]  ? _raw_spin_unlock+0x24/0x30
        [ 3876.799779]  resolve_indirect_refs+0x1eb/0xc80 [btrfs]
        [ 3876.799810]  find_parent_nodes+0x38d/0x1180 [btrfs]
        [ 3876.799841]  btrfs_check_shared+0x11a/0x1d0 [btrfs]
        [ 3876.799870]  ? extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799895]  extent_fiemap+0x598/0x6e0 [btrfs]
        [ 3876.799913]  do_vfs_ioctl+0x45a/0x700
        [ 3876.799926]  ksys_ioctl+0x70/0x80
        [ 3876.799938]  ? trace_hardirqs_off_thunk+0x1a/0x20
        [ 3876.799953]  __x64_sys_ioctl+0x16/0x20
        [ 3876.799965]  do_syscall_64+0x62/0x220
        [ 3876.799977]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 3876.799993] RIP: 0033:0x7f23e0013dd7
        (...)
        [ 3876.800056] RSP: 002b:00007f23ddd96ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 3876.800078] RAX: ffffffffffffffda RBX: 00007f23d80210f8 RCX: 00007f23e0013dd7
        [ 3876.800099] RDX: 00007f23d80210f8 RSI: 00000000c020660b RDI: 0000000000000003
        [ 3876.800626] RBP: 000055fa2a2a2440 R08: 0000000000000000 R09: 00007f23ddd96d7c
        [ 3876.801143] R10: 00007f23d8022000 R11: 0000000000000246 R12: 00007f23ddd96d80
        [ 3876.801662] R13: 00007f23ddd96d78 R14: 00007f23d80210f0 R15: 00007f23ddd96d80
        (...)
        [ 3876.805107] ---[ end trace e53161e179ef04f9 ]---
      
      Fix that by saving the root's header owner field into a local variable
      before freeing the root's extent buffer, and then use that local variable
      when needed.
      
      Fixes: 30b0463a ("Btrfs: fix accessing the root pointer in tree mod log functions")
      CC: stable@vger.kernel.org # 3.10+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
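The fix pattern described in this commit (copy the needed field into a local before the buffer is released) can be sketched in userspace as below. The `mock_eb` type and helpers are hypothetical stand-ins, and `free()` plays the role of dropping the extent buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an extent buffer holding a root's header. */
struct mock_eb {
	uint64_t owner;
};

/* Allocates a buffer and records its owner; returns NULL on failure. */
static struct mock_eb *mock_eb_alloc(uint64_t owner)
{
	struct mock_eb *eb = malloc(sizeof(*eb));

	if (eb)
		eb->owner = owner;
	return eb;
}

/*
 * Reads the owner into a local before releasing the buffer, so nothing
 * dereferences eb after free(). Returns the saved owner.
 */
static uint64_t mock_release_and_get_owner(struct mock_eb *eb)
{
	uint64_t owner = eb->owner;	/* save while eb is still valid */

	free(eb);
	/* The buggy ordering would be: free(eb); return eb->owner; */
	return owner;
}
```

The saved local stays valid no matter what happens to the freed memory, which is why the crash only showed up reliably with a debugging option that unmaps freed pages.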
    • btrfs: make caching_thread use btrfs_find_next_key · 6a9fb468
      Josef Bacik committed
      extent-tree.c has a find_next_key that just walks up the path to find
      the next key, but it is used for both the caching stuff and the snapshot
      delete stuff.  The snapshot deletion stuff is special so it can't really
      use btrfs_find_next_key, but the caching thread stuff can.  We just need
      to fix btrfs_find_next_key to deal with ->skip_locking and then it works
      exactly the same as the private find_next_key helper.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: assert tree mod log lock in __tree_mod_log_insert · 73e82fe4
      David Sterba committed
      The tree is going to be modified, so the lock must be held exclusively.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 30 Apr, 2019 21 commits
    • btrfs: ctree: Dump the leaf before BUG_ON in btrfs_set_item_key_safe · 7c15d410
      Qu Wenruo committed
      We have a long-standing problem with reversed keys that is detected by
      btrfs_set_item_key_safe. It is hard to reproduce, so we'd like to
      capture more information for later analysis.

      Let's dump the leaf content before triggering BUG_ON() so that we have
      some clue about what is going wrong.  The output of tree locks should
      help us debug such problems.
      
      Sample stacktrace:
      
       generic/522             [00:07:05]
       [26946.113381] run fstests generic/522 at 2019-04-16 00:07:05
       [27161.474720] kernel BUG at fs/btrfs/ctree.c:3192!
       [27161.475923] invalid opcode: 0000 [#1] PREEMPT SMP
       [27161.477167] CPU: 0 PID: 15676 Comm: fsx Tainted: G        W         5.1.0-rc5-default+ #562
       [27161.478932] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
       [27161.481099] RIP: 0010:btrfs_set_item_key_safe+0x146/0x1c0 [btrfs]
       [27161.485369] RSP: 0018:ffffb087499e39b0 EFLAGS: 00010286
       [27161.486464] RAX: 00000000ffffffff RBX: ffff941534d80e70 RCX: 0000000000024000
       [27161.487929] RDX: 0000000000013039 RSI: ffffb087499e3aa5 RDI: ffffb087499e39c7
       [27161.489289] RBP: 000000000000000e R08: ffff9414e0f49008 R09: 0000000000001000
       [27161.490807] R10: 0000000000000000 R11: 0000000000000003 R12: ffff9414e0f48e70
       [27161.492305] R13: ffffb087499e3aa5 R14: 0000000000000000 R15: 0000000000071000
       [27161.493845] FS:  00007f8ea58d0b80(0000) GS:ffff94153d400000(0000) knlGS:0000000000000000
       [27161.495608] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [27161.496717] CR2: 00007f8ea57a9000 CR3: 0000000016a33000 CR4: 00000000000006f0
       [27161.498100] Call Trace:
       [27161.498771]  __btrfs_drop_extents+0x6ec/0xdf0 [btrfs]
       [27161.499872]  btrfs_log_changed_extents.isra.26+0x3a2/0x9e0 [btrfs]
       [27161.501114]  btrfs_log_inode+0x7ff/0xdc0 [btrfs]
       [27161.502114]  ? __mutex_unlock_slowpath+0x4b/0x2b0
       [27161.503172]  btrfs_log_inode_parent+0x237/0x9c0 [btrfs]
       [27161.504348]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
       [27161.505374]  btrfs_sync_file+0x1b7/0x480 [btrfs]
       [27161.506371]  __x64_sys_msync+0x180/0x210
       [27161.507208]  do_syscall_64+0x54/0x180
       [27161.507932]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [27161.508839] RIP: 0033:0x7f8ea5aa9c61
       [27161.512616] RSP: 002b:00007ffea2a06498 EFLAGS: 00000246 ORIG_RAX: 000000000000001a
       [27161.514161] RAX: ffffffffffffffda RBX: 000000000002a938 RCX: 00007f8ea5aa9c61
       [27161.515376] RDX: 0000000000000004 RSI: 000000000001c9b2 RDI: 00007f8ea578d000
       [27161.516572] RBP: 000000000001c07a R08: fffffffffffffff8 R09: 000000000002a000
       [27161.517883] R10: 00007f8ea57a99b2 R11: 0000000000000246 R12: 0000000000000938
       [27161.519080] R13: 00007f8ea578d000 R14: 000000000001c9b2 R15: 0000000000000000
       [27161.520281] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop [last unloaded: scsi_debug]
       [27161.522272] ---[ end trace d5afec7ccac6a252 ]---
       [27161.523111] RIP: 0010:btrfs_set_item_key_safe+0x146/0x1c0 [btrfs]
       [27161.527253] RSP: 0018:ffffb087499e39b0 EFLAGS: 00010286
       [27161.528192] RAX: 00000000ffffffff RBX: ffff941534d80e70 RCX: 0000000000024000
       [27161.529392] RDX: 0000000000013039 RSI: ffffb087499e3aa5 RDI: ffffb087499e39c7
       [27161.530607] RBP: 000000000000000e R08: ffff9414e0f49008 R09: 0000000000001000
       [27161.531802] R10: 0000000000000000 R11: 0000000000000003 R12: ffff9414e0f48e70
       [27161.533018] R13: ffffb087499e3aa5 R14: 0000000000000000 R15: 0000000000071000
       [27161.534405] FS:  00007f8ea58d0b80(0000) GS:ffff94153d400000(0000) knlGS:0000000000000000
       [27161.536048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [27161.537210] CR2: 00007f8ea57a9000 CR3: 0000000016a33000 CR4: 00000000000006f0
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • D
    • D
      179d1e6a
    • D
      c7da9597
    • D
      c71dd880
    • D
      78ac4f9e
    • D
      25263cd7
    • btrfs: get fs_info from eb in __push_leaf_left · 8087c193
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in __push_leaf_right · f72f0010
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from trans in copy_for_split · 94f94ad9
      David Sterba committed
      We can read fs_info from the transaction and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from trans in insert_ptr · 6ad3cf6d
      David Sterba committed
      We can read fs_info from the transaction and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from trans in balance_node_right · 55d32ed8
      David Sterba committed
      We can read fs_info from the transaction and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from trans in push_node_left · d30a668f
      David Sterba committed
      We can read fs_info from the transaction and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in btrfs_verify_level_key · e064d5e9
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in read_node_slot · d0d20b0f
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in btrfs_leaf_free_space · e902baac
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in clean_tree_block · 6a884d7d
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in tree_mod_log_eb_copy · ed874f0d
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: get fs_info from eb in leaf_data_end · 8f881e8c
      David Sterba committed
      We can read fs_info from the extent buffer and drop it from the
      parameters.
      Signed-off-by: David Sterba <dsterba@suse.com>
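The series above repeats one refactoring: a function that already receives an object holding a back-pointer to fs_info no longer needs fs_info as a separate parameter. A minimal sketch under assumed names (`mock_fs_info`, `mock_eb`, and the free-space arithmetic are hypothetical simplifications, not the kernel definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical filesystem-wide state. */
struct mock_fs_info {
	uint32_t nodesize;
};

/* Hypothetical extent buffer with a back-pointer to its filesystem. */
struct mock_eb {
	struct mock_fs_info *fs_info;
	uint32_t used;
};

/* Before: fs_info is passed even though eb already points to it. */
static uint32_t mock_free_space_before(struct mock_fs_info *fs_info,
				       const struct mock_eb *eb)
{
	return fs_info->nodesize - eb->used;
}

/* After: the redundant parameter is dropped and read from eb instead. */
static uint32_t mock_free_space_after(const struct mock_eb *eb)
{
	return eb->fs_info->nodesize - eb->used;
}
```

The commits that read fs_info from the transaction handle follow the same shape, with the transaction playing the role of `mock_eb` here. Fewer parameters also removes the chance of a caller passing a mismatched fs_info.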
    • btrfs: use BUG() instead of BUG_ON(1) · 290342f6
      Arnd Bergmann committed
      BUG_ON(1) leads to bogus warnings from clang when
      CONFIG_PROFILE_ANNOTATED_BRANCHES is set:
      
      fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false
            [-Werror,-Wsometimes-uninitialized]
                      BUG_ON(1);
                      ^~~~~~~~~
      include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                         ^~~~~~~~~~~~~~~~~~~
      include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
       #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here
                                   max_chunk_size);
                                   ^~~~~~~~~~~~~~
      include/linux/kernel.h:860:36: note: expanded from macro 'min'
       #define min(x, y)       __careful_cmp(x, y, <)
                                               ^
      include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp'
                      __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                                    ^
      include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once'
                      typeof(y) unique_y = (y);               \
                                            ^
      fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true
                      BUG_ON(1);
                      ^
      include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                     ^
      fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning
              u64 max_chunk_size;
                                ^
                                 = 0
      
      Change it to BUG() so clang can see that this code path can never
      continue.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
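A userspace sketch of why an unconditional, non-returning call helps the compiler here. `MOCK_BUG()` and `MOCK_BUG_ON()` are simplified stand-ins built on `abort()` (the kernel macros are more involved), and the profile names and sizes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins; the real kernel macros are more involved. */
#define MOCK_BUG()		abort()
#define MOCK_BUG_ON(cond)	do { if (cond) MOCK_BUG(); } while (0)

/* Hypothetical allocation profiles with made-up size limits. */
enum mock_profile { MOCK_DATA, MOCK_METADATA, MOCK_SYSTEM };

static uint64_t mock_max_chunk_size(enum mock_profile p)
{
	uint64_t max_chunk_size;

	if (p == MOCK_DATA) {
		max_chunk_size = 10;
	} else if (p == MOCK_METADATA) {
		max_chunk_size = 1;
	} else if (p == MOCK_SYSTEM) {
		max_chunk_size = 2;
	} else {
		/*
		 * abort() is declared noreturn, so the compiler can see that
		 * this branch never reaches the return below with
		 * max_chunk_size unset. MOCK_BUG_ON(1) hides the same call
		 * behind a condition, which an instrumented unlikely() can
		 * make opaque to the analysis.
		 */
		MOCK_BUG();
	}
	return max_chunk_size;
}
```

Calling `mock_max_chunk_size(MOCK_SYSTEM)` returns 2, while any value outside the enum aborts the program instead of returning garbage.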
    • btrfs: Check the first key and level for cached extent buffer · 448de471
      Qu Wenruo committed
      [BUG]
      When reading a file from a fuzzed image, kernel can panic like:
      
        BTRFS warning (device loop0): csum failed root 5 ino 270 off 0 csum 0x98f94189 expected csum 0x00000000 mirror 1
        assertion failed: !memcmp_extent_buffer(b, &disk_key, offsetof(struct btrfs_leaf, items[0].key), sizeof(disk_key)), file: fs/btrfs/ctree.c, line: 2544
        ------------[ cut here ]------------
        kernel BUG at fs/btrfs/ctree.h:3500!
        invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
        RIP: 0010:btrfs_search_slot.cold.24+0x61/0x63 [btrfs]
        Call Trace:
         btrfs_lookup_csum+0x52/0x150 [btrfs]
         __btrfs_lookup_bio_sums+0x209/0x640 [btrfs]
         btrfs_submit_bio_hook+0x103/0x170 [btrfs]
         submit_one_bio+0x59/0x80 [btrfs]
         extent_read_full_page+0x58/0x80 [btrfs]
         generic_file_read_iter+0x2f6/0x9d0
         __vfs_read+0x14d/0x1a0
         vfs_read+0x8d/0x140
         ksys_read+0x52/0xc0
         do_syscall_64+0x60/0x210
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      [CAUSE]
      The fuzzed image has a corrupted leaf whose first key doesn't match its
      parent:
      
        checksum tree key (CSUM_TREE ROOT_ITEM 0)
        node 29741056 level 1 items 14 free 107 generation 19 owner CSUM_TREE
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
        	...
                key (EXTENT_CSUM EXTENT_CSUM 79691776) block 29761536 gen 19
      
        leaf 29761536 items 1 free space 1726 generation 19 owner CSUM_TREE
        leaf 29761536 flags 0x1(WRITTEN) backref revision 1
        fs uuid 3381d111-94a3-4ac7-8f39-611bbbdab7e6
        chunk uuid 9af1c3c7-2af5-488b-8553-530bd515f14c
                item 0 key (EXTENT_CSUM EXTENT_CSUM 8798638964736) itemoff 1751 itemsize 2244
                        range start 8798638964736 end 8798641262592 length 2297856
      
      When reading the above tree block, we have extent_buffer->refs = 2 in
      the context:
      
      - initial one from __alloc_extent_buffer()
        alloc_extent_buffer()
        |- __alloc_extent_buffer()
           |- atomic_set(&eb->refs, 1)
      
      - one being added to fs_info->buffer_radix
        alloc_extent_buffer()
        |- check_buffer_tree_ref()
           |- atomic_inc(&eb->refs)
      
      So even if we call free_extent_buffer() in read_tree_block() or a
      similar situation, we only decrease the refs by 1; the count doesn't
      reach 0, so the buffer won't be freed right away.

      The stale eb and its corrupted content will still be kept cached.
      
      Furthermore, we have several extra cases where we either don't do first
      key check or the check is not proper for all callers:
      
      - scrub
        We just don't have first key in this context.
      
      - shared tree block
        One tree block can be shared by several snapshot/subvolume trees.
        In that case, the first key check for one subvolume doesn't apply to
        another.
      
      So for the above reasons, a corrupted extent buffer can sneak into the
      buffer cache.
      
      [FIX]
      Call verify_level_key in read_block_for_search to do another
      verification. For that purpose the function is exported.
      
      For the reasons above, although we can free a corrupted extent buffer
      from the cache, we still need the check in read_block_for_search(), for
      scrub and shared tree blocks.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202755
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202757
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202759
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202761
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202767
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=202769
      Reported-by: Yoon Jungyeon <jungyeon@gatech.edu>
      CC: stable@vger.kernel.org # 4.19+
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
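The reference counting and the extra first-key check described in this commit can be sketched with simplified types. Everything named `mock_` is hypothetical, the objectid/type values are placeholders, and `-EIO` is an arbitrary error code chosen for the sketch; only the two key offsets are taken from the dump above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified key and buffer types; not the kernel structs. */
struct mock_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

struct mock_eb {
	int refs;
	bool cached;
	struct mock_key first_key;
};

/* Models the two references described above: initial one plus the cache's. */
static void mock_eb_init(struct mock_eb *eb, struct mock_key first_key)
{
	eb->refs = 1;		/* initial reference from allocation */
	eb->cached = true;
	eb->refs++;		/* reference held by the buffer cache */
	eb->first_key = first_key;
}

/* Drops one reference; the buffer leaves the cache only at zero. */
static void mock_eb_put(struct mock_eb *eb)
{
	if (--eb->refs == 0)
		eb->cached = false;
}

/* Re-check meant to run on every lookup, including cache hits. */
static int mock_verify_first_key(const struct mock_eb *eb,
				 const struct mock_key *expected)
{
	if (eb->first_key.objectid != expected->objectid ||
	    eb->first_key.type != expected->type ||
	    eb->first_key.offset != expected->offset)
		return -EIO;	/* arbitrary error code for this sketch */
	return 0;
}
```

After one `mock_eb_put()` the buffer still has one reference and stays cached, so a later lookup can hit it; running the key comparison on that cached buffer is what turns the corruption into an error return rather than an assertion failure.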
  3. 25 Feb, 2019 6 commits
    • Btrfs: remove assertion when searching for a key in a node/leaf · 253002f2
      Filipe Manana committed
      At ctree.c:key_search(), the assertion that verifies that the first key
      on a child extent buffer corresponds to the key at a specific slot in
      the parent has a disadvantage: we effectively hit a BUG_ON() which
      requires rebooting the machine later. It also gives no information about
      which extent buffer is affected, from which root, the expected and found
      keys, etc.

      However, as of commit 581c1760 ("btrfs: Validate child tree block's
      level and first key"), that assertion is not needed, since at the time we
      read an extent buffer from disk we validate that its first key matches the
      key, at the respective slot, in the parent extent buffer. Therefore just
      remove the assertion at key_search().
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: add missing error handling after doing leaf/node binary search · cbca7d59
      Filipe Manana committed
      The function map_private_extent_buffer() can return an -EINVAL error, and
      it is called by generic_bin_search(), which passes the error back. The
      btrfs_bin_search() function in turn calls generic_bin_search() and the
      key_search() function calls btrfs_bin_search(), so both can return the
      -EINVAL error coming from the map_private_extent_buffer() function. Some
      callers of these functions were ignoring that these functions can return
      an error, so fix them to deal with error return values.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
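A sketch of the three-way return convention this commit is concerned with: 0 for found, 1 for not found, and a negative errno for failure. The `mock_` functions are hypothetical; a NULL key array stands in for the mapping failure that yields `-EINVAL` in the real code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Mock search: returns 0 if found, 1 if not found (*slot is then the
 * insert position), or a negative errno on failure. A NULL array stands
 * in for a mapping failure.
 */
static int mock_bin_search(const int *keys, int nr, int key, int *slot)
{
	int low = 0;
	int high = nr;

	if (!keys)
		return -EINVAL;
	while (low < high) {
		int mid = low + (high - low) / 2;

		if (keys[mid] < key) {
			low = mid + 1;
		} else if (keys[mid] > key) {
			high = mid;
		} else {
			*slot = mid;
			return 0;
		}
	}
	*slot = low;
	return 1;
}

/* Caller that checks for errors before interpreting the result. */
static int mock_find_slot(const int *keys, int nr, int key)
{
	int slot;
	int ret = mock_bin_search(keys, nr, key, &slot);

	if (ret < 0)
		return ret;	/* propagate the failure unchanged */
	if (ret > 0)
		return -ENOENT;	/* searched fine, key is absent */
	return slot;
}
```

A caller that only tests `if (ret)` would treat `-EINVAL` like "not found" and go on to use an uninitialized slot, which is the class of mistake the commit fixes.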
    • btrfs: merge btrfs_set_lock_blocking_rw with it's caller · 766ece54
      David Sterba committed
      The last caller that does not have a fixed value of lock is
      btrfs_set_path_blocking, which actually does the same conditional switch
      by the lock type, so we can merge the branches together and remove the
      helper.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: open code now trivial btrfs_set_lock_blocking · 8bead258
      David Sterba committed
      btrfs_set_lock_blocking is now only a simple wrapper around
      btrfs_set_lock_blocking_write. The name does not bring any semantic
      value that could not be inferred from the new function, so there's no
      point keeping it.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: replace btrfs_set_lock_blocking_rw with appropriate helpers · 300aa896
      David Sterba committed
      We can use the right helper where the lock type is a fixed parameter.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: qgroup: Use delayed subtree rescan for balance · f616f5cd
      Qu Wenruo committed
      Before this patch, qgroup code traces the whole subtree of subvolume and
      reloc trees unconditionally.

      This makes qgroup numbers consistent, but it could cause tons of
      unnecessary extent tracing, which causes a lot of overhead.

      However, for the subtree swap done by balance, both subtrees contain
      the same contents and tree structure, so swapping them does not change
      qgroup numbers.

      Only the race window between the subtree swap and the transaction
      commit can cause qgroup numbers to change.

      This patch will delay the qgroup subtree scan until COW happens for the
      subtree root.

      So if there are no other operations on the fs, balance won't cause extra
      qgroup overhead. (best case scenario)
      Depending on the workload, most of the subtree scan can still be
      avoided.

      Only in the worst case scenario will it fall back to the old subtree
      swap overhead. (scan all swapped subtrees)
      
      [[Benchmark]]
      Hardware:
      	VM 4G vRAM, 8 vCPUs,
      	disk is using 'unsafe' cache mode,
      	backing device is SAMSUNG 850 evo SSD.
      	Host has 16G ram.
      
      Mkfs parameter:
      	--nodesize 4K (To bump up tree size)
      
      Initial subvolume contents:
      	4G data copied from /usr and /lib.
      	(With enough regular small files)
      
      Snapshots:
      	16 snapshots of the original subvolume.
      	each snapshot has 3 random files modified.
      
      balance parameter:
      	-m
      
      So the content should be pretty similar to a real world root fs layout.
      
      And after file system population, there is no other activity, so it
      should be the best case scenario.
      
                           | v4.20-rc1            | w/ patchset    | diff
      -----------------------------------------------------------------------
      relocated extents    | 22615                | 22457          | -0.1%
      qgroup dirty extents | 163457               | 121606         | -25.6%
      time (sys)           | 22.884s              | 18.842s        | -17.6%
      time (real)          | 27.724s              | 22.884s        | -17.5%
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
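The diff column of the benchmark table above is a relative change between the two measurement columns. A small helper (the name `pct_change` is an assumption for this sketch) makes the arithmetic explicit and lets each row be rechecked.

```c
#include <assert.h>

/* Relative change from before to after, in percent. */
static double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Applied to the table, `pct_change(163457, 121606)` gives about -25.6, `pct_change(22.884, 18.842)` about -17.7 (listed as -17.6%) and `pct_change(27.724, 22.884)` about -17.5, in line with the listed values. For the first row, `pct_change(22615, 22457)` comes out at about -0.7, so the listed -0.1% looks like a rounding slip or typo in the original message.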
  4. 28 Jan, 2019 1 commit
    • Btrfs: fix deadlock when allocating tree block during leaf/node split · a6279470
      Filipe Manana committed
      When splitting a leaf or node from one of the trees that are modified when
      flushing pending block groups (extent, chunk, device and free space trees),
      we need to allocate a new tree block, which in turn can result in the need
      to allocate a new block group. After allocating the new block group we may
      need to flush new block groups that were previously allocated during the
      course of the current transaction, which is what may cause a deadlock due
      to attempts to write lock twice the same leaf or node, as when splitting
      a leaf or node we are holding a write lock on it and its parent node.
      
      The same type of deadlock can also happen when increasing the tree's
      height, since we are holding a lock on the existing root while allocating
      the tree block to use as the new root node.
      
      An example trace when the deadlock happens during the leaf split path is:
      
        [27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G        W         4.19.16 #1
        [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
        [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
        (...)
        [27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
        [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
        [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
        [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
        [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
        [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
        [27175.303601] FS:  0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
        [27175.304468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
        [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [27175.307940] Call Trace:
        [27175.308802]  btrfs_search_slot+0x779/0x9a0 [btrfs]
        [27175.309669]  ? update_space_info+0xba/0xe0 [btrfs]
        [27175.310534]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
        [27175.311397]  btrfs_insert_item+0x60/0xd0 [btrfs]
        [27175.312253]  btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
        [27175.313116]  do_chunk_alloc+0x25f/0x300 [btrfs]
        [27175.313984]  find_free_extent+0x706/0x10d0 [btrfs]
        [27175.314855]  btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
        [27175.315707]  btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
        [27175.316548]  split_leaf+0x130/0x610 [btrfs]
        [27175.317390]  btrfs_search_slot+0x94d/0x9a0 [btrfs]
        [27175.318235]  btrfs_insert_empty_items+0x67/0xc0 [btrfs]
        [27175.319087]  alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
        [27175.319938]  __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
        [27175.320792]  btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
        [27175.321643]  delayed_ref_async_start+0x81/0x90 [btrfs]
        [27175.322491]  normal_work_helper+0xd0/0x320 [btrfs]
        [27175.323328]  ? move_linked_works+0x6e/0xa0
        [27175.324160]  process_one_work+0x191/0x370
        [27175.324976]  worker_thread+0x4f/0x3b0
        [27175.325763]  kthread+0xf8/0x130
        [27175.326531]  ? rescuer_thread+0x320/0x320
        [27175.327284]  ? kthread_create_worker_on_cpu+0x50/0x50
        [27175.328027]  ret_from_fork+0x35/0x40
        [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---
      
      Fix this by preventing the flushing of new block groups when splitting a
      leaf/node and when inserting a new root node for one of the trees modified
      by the flushing operation, similar to what is done when COWing a node/leaf
      from one of these trees.

      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
      Reported-by: Eli V <eliventer@gmail.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
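One way to picture the fix is a per-transaction flag that suppresses flushing of pending block groups while a tree block is allocated for one of the affected trees. The sketch below is a simplified model under assumed names (`mock_trans`, `can_flush_pending_bgs`, and the helper functions are illustrative, not a copy of the kernel code); a counter stands in for the flush that would re-enter the tree and try to lock the same node again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction handle for this sketch. */
struct mock_trans {
	bool can_flush_pending_bgs;
	int flushes;	/* counts flushes that would re-enter the tree */
};

/* Allocating a block group may flush pending ones, unless suppressed. */
static void mock_alloc_block_group(struct mock_trans *t)
{
	if (t->can_flush_pending_bgs)
		t->flushes++;
}

/* Plain tree block allocation; may end up allocating a block group. */
static void mock_alloc_tree_block(struct mock_trans *t)
{
	mock_alloc_block_group(t);
}

/*
 * Variant for split and new-root paths: when the tree is one of those
 * modified by the flush itself, disable flushing around the allocation
 * and restore the previous setting afterwards.
 */
static void mock_alloc_tree_block_no_bg_flush(struct mock_trans *t,
					      bool tree_is_flush_target)
{
	bool saved = t->can_flush_pending_bgs;

	if (tree_is_flush_target)
		t->can_flush_pending_bgs = false;
	mock_alloc_tree_block(t);
	t->can_flush_pending_bgs = saved;
}
```

Saving and restoring the flag, instead of unconditionally setting it back to true, keeps the helper safe when a caller further up has already disabled flushing.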
  5. 09 Jan, 2019 1 commit
  6. 17 Dec, 2018 3 commits
    • btrfs: Fix typos in comments and strings · 52042d8e
      Andrea Gelmini committed
      The typos accumulate over time, so once in a while they get fixed in
      a large patch.
      Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: send, fix race with transaction commits that create snapshots · be6821f8
      Filipe Manana committed
      If we create a snapshot of a snapshot currently being used by a send
      operation, we can end up with send failing unexpectedly (returning
      -ENOENT error to user space for example). The following diagram shows
      how this happens.
      
                  CPU 1                                   CPU2                                CPU3
      
       btrfs_ioctl_send()
        (...)
                                           create_snapshot()
                                            -> creates snapshot of a
                                               root used by the send
                                               task
                                            btrfs_commit_transaction()
                                             create_pending_snapshot()
        __get_inode_info()
         btrfs_search_slot()
          btrfs_search_slot_get_root()
           down_read commit_root_sem
      
           get reference on eb of the
           commit root
            -> eb with bytenr == X
      
           up_read commit_root_sem
      
                                              btrfs_cow_block(root node)
                                               btrfs_free_tree_block()
                                                -> creates delayed ref to
                                                   free the extent
      
                                             btrfs_run_delayed_refs()
                                              -> runs the delayed ref,
                                                 adds extent to
                                                 fs_info->pinned_extents
      
                                             btrfs_finish_extent_commit()
                                              unpin_extent_range()
                                               -> marks extent as free
                                                  in the free space cache
      
                                            transaction commit finishes
      
                                                                             btrfs_start_transaction()
                                                                              (...)
                                                                              btrfs_cow_block()
                                                                               btrfs_alloc_tree_block()
                                                                                btrfs_reserve_extent()
                                                                                 -> allocates extent at
                                                                                    bytenr == X
                                                                                btrfs_init_new_buffer(bytenr X)
                                                                                 btrfs_find_create_tree_block()
                                                                                  alloc_extent_buffer(bytenr X)
                                                                                   find_extent_buffer(bytenr X)
                                                                                    -> returns existing eb,
                                                                                       which the send task got
      
                                                                              (...)
                                                                               -> modifies content of the
                                                                                  eb with bytenr == X
      
          -> uses an eb that now
             belongs to some other
             tree and no longer matches
             the commit root of the
             snapshot, results will be
             unpredictable
      
      The consequences of this race vary. Searches in the commit root performed
      by the send task can fail unexpectedly (unable to find inode items,
      returning -ENOENT to user space, for example). They can also succeed when
      they should not, because an inode item with the same number was added to
      the tree that reused the metadata extent; in that case send can behave
      incorrectly in the worst case, or just fail later for some reason.
      
      Fix this by performing a copy of the commit root's extent buffer when doing
      a search in the context of a send operation.
      
      CC: stable@vger.kernel.org # 4.4.x: 1fc28d8e: Btrfs: move get root out of btrfs_search_slot to a helper
      CC: stable@vger.kernel.org # 4.4.x: f9ddfd05: Btrfs: remove unused check of skip_locking
      CC: stable@vger.kernel.org # 4.4.x
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      be6821f8
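The fix in the commit above relies on a private copy being immune to later reuse of the shared buffer. The following is a minimal user-space sketch of that property under simplifying assumptions: it is single-threaded, the "lock" is only implied by ordering, and `struct toy_eb`, `toy_clone_eb()`, `toy_reuse_block()` and `toy_clone_survives_reuse()` are hypothetical names, not btrfs APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_EB_SIZE 16

/* Toy stand-in for an extent buffer: a block address plus its contents. */
struct toy_eb {
    unsigned long long bytenr;
    char data[TOY_EB_SIZE];
};

/* Return a private heap copy of src; the caller frees it. NULL on failure. */
struct toy_eb *toy_clone_eb(const struct toy_eb *src)
{
    struct toy_eb *copy;

    assert(src != NULL);
    copy = malloc(sizeof(*copy));
    if (copy != NULL)
        memcpy(copy, src, sizeof(*copy));
    return copy;
}

/* Simulate another task reusing the same block for a different tree. */
void toy_reuse_block(struct toy_eb *shared, const char *new_content)
{
    assert(shared != NULL && new_content != NULL);
    memset(shared->data, 0, sizeof(shared->data));
    strncpy(shared->data, new_content, sizeof(shared->data) - 1);
}

/*
 * Return 1 if a reader that cloned the buffer still sees the old content
 * after the block was reused, 0 if it does not, -1 on allocation failure.
 */
int toy_clone_survives_reuse(void)
{
    struct toy_eb shared = { 4096, "commit root" };
    struct toy_eb *mine = toy_clone_eb(&shared); /* copy taken before reuse */
    int ok;

    if (mine == NULL)
        return -1;
    toy_reuse_block(&shared, "other tree");
    ok = strcmp(mine->data, "commit root") == 0 &&
         strcmp(shared.data, "other tree") == 0;
    free(mine);
    return ok;
}
```

A reader that kept only a reference to `shared` would observe "other tree" after the reuse, which corresponds to the CPU 1 column at the bottom of the diagram. The cost of the copy is one allocation and one memcpy per search, which the sketch accepts in exchange for isolation.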
    • J
      btrfs: catch cow on deleting snapshots · 83354f07
      Josef Bacik committed
      When debugging some weird extent reference bug I suspected that we were
      changing a snapshot while we were deleting it, which could explain my
      bug.  This was indeed what was happening, and this patch helped me
      verify my theory.  It is never correct to modify the snapshot once it's
      being deleted, so mark the root when we are deleting it and make sure we
      complain about it when it happens.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      83354f07
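The "mark the root, then complain on COW" approach from the commit above can be illustrated with a small user-space sketch. It assumes a single state bit on the root and a check at the start of the COW path; `struct toy_root`, `TOY_ROOT_DELETING`, `toy_mark_deleting()` and `toy_cow_block()` are hypothetical names for illustration, and the real patch may differ in where the check sits and how it reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical state bit: the root is in the process of being deleted. */
enum { TOY_ROOT_DELETING = 1u << 0 };

struct toy_root {
    unsigned int state;      /* bitmask of TOY_ROOT_* flags */
    unsigned int complaints; /* how many bad COW attempts were seen */
};

/* Called once when deletion of the snapshot begins. */
void toy_mark_deleting(struct toy_root *root)
{
    assert(root != NULL);
    root->state |= TOY_ROOT_DELETING;
}

/*
 * COW entry point. Returns 1 if it complained because the root is being
 * deleted, 0 otherwise. In this sketch the check is diagnostic only: it
 * records and reports the problem and does not change what follows.
 */
int toy_cow_block(struct toy_root *root)
{
    assert(root != NULL);
    if (root->state & TOY_ROOT_DELETING) {
        root->complaints++;
        fprintf(stderr, "COW on a root that is being deleted\n");
        return 1;
    }
    return 0;
}
```

Keeping the check as a cheap flag test means it can stay enabled permanently, so a future regression of the same kind produces a message at the point of the bad modification rather than a confusing reference-count error later.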