1. 09 2月, 2021 1 次提交
    • J
      btrfs: do not cleanup upper nodes in btrfs_backref_cleanup_node · 7e2a870a
      Josef Bacik 提交于
      Zygo reported the following panic when testing my error handling patches
      for relocation:
      
        kernel BUG at fs/btrfs/backref.c:2545!
        invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 3 PID: 8472 Comm: btrfs Tainted: G        W 14
        Hardware name: QEMU Standard PC (i440FX + PIIX,
      
        Call Trace:
         btrfs_backref_error_cleanup+0x4df/0x530
         build_backref_tree+0x1a5/0x700
         ? _raw_spin_unlock+0x22/0x30
         ? release_extent_buffer+0x225/0x280
         ? free_extent_buffer.part.52+0xd7/0x140
         relocate_tree_blocks+0x2a6/0xb60
         ? kasan_unpoison_shadow+0x35/0x50
         ? do_relocation+0xc10/0xc10
         ? kasan_kmalloc+0x9/0x10
         ? kmem_cache_alloc_trace+0x6a3/0xcb0
         ? free_extent_buffer.part.52+0xd7/0x140
         ? rb_insert_color+0x342/0x360
         ? add_tree_block.isra.36+0x236/0x2b0
         relocate_block_group+0x2eb/0x780
         ? merge_reloc_roots+0x470/0x470
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x18f0
         ? pvclock_clocksource_read+0xeb/0x190
         ? btrfs_relocate_chunk+0x120/0x120
         ? lock_contended+0x620/0x6e0
         ? do_raw_spin_lock+0x1e0/0x1e0
         ? do_raw_spin_unlock+0xa8/0x140
         btrfs_ioctl_balance+0x1f9/0x460
         btrfs_ioctl+0x24c8/0x4380
         ? __kasan_check_read+0x11/0x20
         ? check_chain_key+0x1f4/0x2f0
         ? __asan_loadN+0xf/0x20
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? kvm_sched_clock_read+0x18/0x30
         ? check_chain_key+0x1f4/0x2f0
         ? lock_downgrade+0x3f0/0x3f0
         ? handle_mm_fault+0xad6/0x2150
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags+0x26/0x30
         ? lock_is_held_type+0xc3/0xf0
         ? syscall_enter_from_user_mode+0x1b/0x60
         ? do_syscall_64+0x13/0x80
         ? rcu_read_lock_sched_held+0xa1/0xd0
         ? __kasan_check_read+0x11/0x20
         ? __fget_light+0xae/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurs because of this check
      
        if (RB_EMPTY_NODE(&upper->rb_node))
      	  BUG_ON(!list_empty(&node->upper));
      
      As we are dropping the backref node, if we discover that our upper node
      in the edge we just cleaned up isn't linked into the cache that we are
      now done with this node, thus the BUG_ON().
      
      However this is an erroneous assumption, as we will look up all the
      references for a node first, and then process the pending edges.  All of
      the 'upper' nodes in our pending edges won't be in the cache's rb_tree
      yet, because they haven't been processed.  We could very well have many
      edges still left to cleanup on this node.
      
      The fact is we simply do not need this check, we can just process all of
      the edges only for this node, because below this check we do the
      following
      
        if (list_empty(&upper->lower)) {
      	  list_add_tail(&upper->lower, &cache->leaves);
      	  upper->lowest = 1;
        }
      
      If the upper node truly isn't used yet, then we add it to the
      cache->leaves list to be cleaned up later.  If it is still used then the
      last child node that has it linked into its node will add it to the
      leaves list and then it will be cleaned up.
      
      Fix this problem by dropping this logic altogether.  With this fix I no
      longer see the panic when testing with error injection in the backref
      code.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7e2a870a
  2. 18 1月, 2021 1 次提交
    • J
      btrfs: do not double free backref nodes on error · 49ecc679
      Josef Bacik 提交于
      Zygo reported the following KASAN splat:
      
        BUG: KASAN: use-after-free in btrfs_backref_cleanup_node+0x18a/0x420
        Read of size 8 at addr ffff888112402950 by task btrfs/28836
      
        CPU: 0 PID: 28836 Comm: btrfs Tainted: G        W         5.10.0-e35f27394290-for-next+ #23
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
        Call Trace:
         dump_stack+0xbc/0xf9
         ? btrfs_backref_cleanup_node+0x18a/0x420
         print_address_description.constprop.8+0x21/0x210
         ? record_print_text.cold.34+0x11/0x11
         ? btrfs_backref_cleanup_node+0x18a/0x420
         ? btrfs_backref_cleanup_node+0x18a/0x420
         kasan_report.cold.10+0x20/0x37
         ? btrfs_backref_cleanup_node+0x18a/0x420
         __asan_load8+0x69/0x90
         btrfs_backref_cleanup_node+0x18a/0x420
         btrfs_backref_release_cache+0x83/0x1b0
         relocate_block_group+0x394/0x780
         ? merge_reloc_roots+0x4a0/0x4a0
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         ? check_flags.part.50+0x6c/0x1e0
         ? btrfs_relocate_chunk+0x120/0x120
         ? kmem_cache_alloc_trace+0xa06/0xcb0
         ? _copy_from_user+0x83/0xc0
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         ? __kasan_check_read+0x11/0x20
         ? check_chain_key+0x1f4/0x2f0
         ? __asan_loadN+0xf/0x20
         ? btrfs_ioctl_get_supported_features+0x30/0x30
         ? kvm_sched_clock_read+0x18/0x30
         ? check_chain_key+0x1f4/0x2f0
         ? lock_downgrade+0x3f0/0x3f0
         ? handle_mm_fault+0xad6/0x2150
         ? do_vfs_ioctl+0xfc/0x9d0
         ? ioctl_file_clone+0xe0/0xe0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags.part.50+0x6c/0x1e0
         ? check_flags+0x26/0x30
         ? lock_is_held_type+0xc3/0xf0
         ? syscall_enter_from_user_mode+0x1b/0x60
         ? do_syscall_64+0x13/0x80
         ? rcu_read_lock_sched_held+0xa1/0xd0
         ? __kasan_check_read+0x11/0x20
         ? __fget_light+0xae/0x110
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
        RIP: 0033:0x7f4c4bdfe427
      
        Allocated by task 28836:
         kasan_save_stack+0x21/0x50
         __kasan_kmalloc.constprop.18+0xbe/0xd0
         kasan_kmalloc+0x9/0x10
         kmem_cache_alloc_trace+0x410/0xcb0
         btrfs_backref_alloc_node+0x46/0xf0
         btrfs_backref_add_tree_node+0x60d/0x11d0
         build_backref_tree+0xc5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 28836:
         kasan_save_stack+0x21/0x50
         kasan_set_track+0x20/0x30
         kasan_set_free_info+0x1f/0x30
         __kasan_slab_free+0xf3/0x140
         kasan_slab_free+0xe/0x10
         kfree+0xde/0x200
         btrfs_backref_error_cleanup+0x452/0x530
         build_backref_tree+0x1a5/0x700
         relocate_tree_blocks+0x2be/0xb90
         relocate_block_group+0x2eb/0x780
         btrfs_relocate_block_group+0x26e/0x4c0
         btrfs_relocate_chunk+0x52/0x120
         btrfs_balance+0xe2e/0x1900
         btrfs_ioctl_balance+0x3a7/0x460
         btrfs_ioctl+0x24c8/0x4360
         __x64_sys_ioctl+0xc3/0x100
         do_syscall_64+0x37/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This occurred because we freed our backref node in
      btrfs_backref_error_cleanup(), but then tried to free it again in
      btrfs_backref_release_cache().  This is because
      btrfs_backref_release_cache() will cycle through all of the
      cache->leaves nodes and free them up.  However
      btrfs_backref_error_cleanup() freed the backref node with
      btrfs_backref_free_node(), which simply kfree()d the backref node
      without unlinking it from the cache.  Change this to a
      btrfs_backref_drop_node(), which does the appropriate cleanup and
      removes the node from the cache->leaves list, so when we go to free the
      remaining cache we don't trip over items we've already dropped.
      
      Fixes: 75bfb9af ("Btrfs: cleanup error handling in build_backref_tree")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      49ecc679
  3. 08 12月, 2020 3 次提交
  4. 26 10月, 2020 1 次提交
    • J
      btrfs: add a helper to read the tree_root commit root for backref lookup · 49d11bea
      Josef Bacik 提交于
      I got the following lockdep splat with tree locks converted to rwsem
      patches on btrfs/104:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        5.9.0+ #102 Not tainted
        ------------------------------------------------------
        btrfs-cleaner/903 is trying to acquire lock:
        ffff8e7fab6ffe30 (btrfs-root-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x32/0x170
      
        but task is already holding lock:
        ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&fs_info->commit_root_sem){++++}-{3:3}:
      	 down_read+0x40/0x130
      	 caching_thread+0x53/0x5a0
      	 btrfs_work_helper+0xfa/0x520
      	 process_one_work+0x238/0x540
      	 worker_thread+0x55/0x3c0
      	 kthread+0x13a/0x150
      	 ret_from_fork+0x1f/0x30
      
        -> #2 (&caching_ctl->mutex){+.+.}-{3:3}:
      	 __mutex_lock+0x7e/0x7b0
      	 btrfs_cache_block_group+0x1e0/0x510
      	 find_free_extent+0xb6e/0x12f0
      	 btrfs_reserve_extent+0xb3/0x1b0
      	 btrfs_alloc_tree_block+0xb1/0x330
      	 alloc_tree_block_no_bg_flush+0x4f/0x60
      	 __btrfs_cow_block+0x11d/0x580
      	 btrfs_cow_block+0x10c/0x220
      	 commit_cowonly_roots+0x47/0x2e0
      	 btrfs_commit_transaction+0x595/0xbd0
      	 sync_filesystem+0x74/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0x14/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x36/0xa0
      	 cleanup_mnt+0x12d/0x190
      	 task_work_run+0x5c/0xa0
      	 exit_to_user_mode_prepare+0x1df/0x200
      	 syscall_exit_to_user_mode+0x54/0x280
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #1 (&space_info->groups_sem){++++}-{3:3}:
      	 down_read+0x40/0x130
      	 find_free_extent+0x2ed/0x12f0
      	 btrfs_reserve_extent+0xb3/0x1b0
      	 btrfs_alloc_tree_block+0xb1/0x330
      	 alloc_tree_block_no_bg_flush+0x4f/0x60
      	 __btrfs_cow_block+0x11d/0x580
      	 btrfs_cow_block+0x10c/0x220
      	 commit_cowonly_roots+0x47/0x2e0
      	 btrfs_commit_transaction+0x595/0xbd0
      	 sync_filesystem+0x74/0x90
      	 generic_shutdown_super+0x22/0x100
      	 kill_anon_super+0x14/0x30
      	 btrfs_kill_super+0x12/0x20
      	 deactivate_locked_super+0x36/0xa0
      	 cleanup_mnt+0x12d/0x190
      	 task_work_run+0x5c/0xa0
      	 exit_to_user_mode_prepare+0x1df/0x200
      	 syscall_exit_to_user_mode+0x54/0x280
      	 entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        -> #0 (btrfs-root-00){++++}-{3:3}:
      	 __lock_acquire+0x1167/0x2150
      	 lock_acquire+0xb9/0x3d0
      	 down_read_nested+0x43/0x130
      	 __btrfs_tree_read_lock+0x32/0x170
      	 __btrfs_read_lock_root_node+0x3a/0x50
      	 btrfs_search_slot+0x614/0x9d0
      	 btrfs_find_root+0x35/0x1b0
      	 btrfs_read_tree_root+0x61/0x120
      	 btrfs_get_root_ref+0x14b/0x600
      	 find_parent_nodes+0x3e6/0x1b30
      	 btrfs_find_all_roots_safe+0xb4/0x130
      	 btrfs_find_all_roots+0x60/0x80
      	 btrfs_qgroup_trace_extent_post+0x27/0x40
      	 btrfs_add_delayed_data_ref+0x3fd/0x460
      	 btrfs_free_extent+0x42/0x100
      	 __btrfs_mod_ref+0x1d7/0x2f0
      	 walk_up_proc+0x11c/0x400
      	 walk_up_tree+0xf0/0x180
      	 btrfs_drop_snapshot+0x1c7/0x780
      	 btrfs_clean_one_deleted_snapshot+0xfb/0x110
      	 cleaner_kthread+0xd4/0x140
      	 kthread+0x13a/0x150
      	 ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Chain exists of:
          btrfs-root-00 --> &caching_ctl->mutex --> &fs_info->commit_root_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&fs_info->commit_root_sem);
      				 lock(&caching_ctl->mutex);
      				 lock(&fs_info->commit_root_sem);
          lock(btrfs-root-00);
      
         *** DEADLOCK ***
      
        3 locks held by btrfs-cleaner/903:
         #0: ffff8e7fab628838 (&fs_info->cleaner_mutex){+.+.}-{3:3}, at: cleaner_kthread+0x6e/0x140
         #1: ffff8e7faadac640 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x40b/0x5c0
         #2: ffff8e7fab628a88 (&fs_info->commit_root_sem){++++}-{3:3}, at: btrfs_find_all_roots+0x41/0x80
      
        stack backtrace:
        CPU: 0 PID: 903 Comm: btrfs-cleaner Not tainted 5.9.0+ #102
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014
        Call Trace:
         dump_stack+0x8b/0xb0
         check_noncircular+0xcf/0xf0
         __lock_acquire+0x1167/0x2150
         ? __bfs+0x42/0x210
         lock_acquire+0xb9/0x3d0
         ? __btrfs_tree_read_lock+0x32/0x170
         down_read_nested+0x43/0x130
         ? __btrfs_tree_read_lock+0x32/0x170
         __btrfs_tree_read_lock+0x32/0x170
         __btrfs_read_lock_root_node+0x3a/0x50
         btrfs_search_slot+0x614/0x9d0
         ? find_held_lock+0x2b/0x80
         btrfs_find_root+0x35/0x1b0
         ? do_raw_spin_unlock+0x4b/0xa0
         btrfs_read_tree_root+0x61/0x120
         btrfs_get_root_ref+0x14b/0x600
         find_parent_nodes+0x3e6/0x1b30
         btrfs_find_all_roots_safe+0xb4/0x130
         btrfs_find_all_roots+0x60/0x80
         btrfs_qgroup_trace_extent_post+0x27/0x40
         btrfs_add_delayed_data_ref+0x3fd/0x460
         btrfs_free_extent+0x42/0x100
         __btrfs_mod_ref+0x1d7/0x2f0
         walk_up_proc+0x11c/0x400
         walk_up_tree+0xf0/0x180
         btrfs_drop_snapshot+0x1c7/0x780
         ? btrfs_clean_one_deleted_snapshot+0x73/0x110
         btrfs_clean_one_deleted_snapshot+0xfb/0x110
         cleaner_kthread+0xd4/0x140
         ? btrfs_alloc_root+0x50/0x50
         kthread+0x13a/0x150
         ? kthread_create_worker_on_cpu+0x40/0x40
         ret_from_fork+0x1f/0x30
        BTRFS info (device sdb): disk space caching is enabled
        BTRFS info (device sdb): has skinny extents
      
      This happens because qgroups does a backref lookup when we create a
      delayed ref.  From here it may have to look up a root from an indirect
      ref, which does a normal lookup on the tree_root, which takes the read
      lock on the tree_root nodes.
      
      To fix this we need to add a variant for looking up roots that searches
      the commit root of the tree_root.  Then when we do the backref search
      using the commit root we are sure to not take any locks on the tree_root
      nodes.  This gets rid of the lockdep splat when running btrfs/104.
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      49d11bea
  5. 07 10月, 2020 1 次提交
  6. 11 8月, 2020 1 次提交
  7. 22 7月, 2020 1 次提交
    • F
      btrfs: fix double free on ulist after backref resolution failure · 580c079b
      Filipe Manana 提交于
      At btrfs_find_all_roots_safe() we allocate a ulist and set the **roots
      argument to point to it. However if later we fail due to an error returned
      by find_parent_nodes(), we free that ulist but leave a dangling pointer in
      the **roots argument. Upon receiving the error, a caller of this function
      can attempt to free the same ulist again, resulting in an invalid memory
      access.
      
      One such scenario is during qgroup accounting:
      
      btrfs_qgroup_account_extents()
      
       --> calls btrfs_find_all_roots() passes &new_roots (a stack allocated
           pointer) to btrfs_find_all_roots()
      
         --> btrfs_find_all_roots() just calls btrfs_find_all_roots_safe()
             passing &new_roots to it
      
           --> allocates ulist and assigns its address to **roots (which
               points to new_roots from btrfs_qgroup_account_extents())
      
           --> find_parent_nodes() returns an error, so we free the ulist
               and leave **roots pointing to it after returning
      
       --> btrfs_qgroup_account_extents() sees btrfs_find_all_roots() returned
           an error and jumps to the label 'cleanup', which just tries to
           free again the same ulist
      
      Stack trace example:
      
       ------------[ cut here ]------------
       BTRFS: tree first key check failed
       WARNING: CPU: 1 PID: 1763215 at fs/btrfs/disk-io.c:422 btrfs_verify_level_key+0xe0/0x180 [btrfs]
       Modules linked in: dm_snapshot dm_thin_pool (...)
       CPU: 1 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_verify_level_key+0xe0/0x180 [btrfs]
       Code: 28 5b 5d (...)
       RSP: 0018:ffffb89b473779a0 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff90397759bf08 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff
       RBP: ffff9039a419c000 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: ffffb89b43301000 R12: 000000000000005e
       R13: ffffb89b47377a2e R14: ffffb89b473779af R15: 0000000000000000
       FS:  00007fc47e1e1000(0000) GS:ffff9039ac200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fc47e1df000 CR3: 00000003d9e4e001 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        read_block_for_search+0xf6/0x350 [btrfs]
        btrfs_next_old_leaf+0x242/0x650 [btrfs]
        resolve_indirect_refs+0x7cf/0x9e0 [btrfs]
        find_parent_nodes+0x4ea/0x12c0 [btrfs]
        btrfs_find_all_roots_safe+0xbf/0x130 [btrfs]
        btrfs_qgroup_account_extents+0x9d/0x390 [btrfs]
        btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
        btrfs_sync_file+0x3d4/0x4d0 [btrfs]
        do_fsync+0x38/0x70
        __x64_sys_fdatasync+0x13/0x20
        do_syscall_64+0x5c/0xe0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fc47e2d72e3
       Code: Bad RIP value.
       RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
       RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
       RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
       RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
       R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
       softirqs last  enabled at (0): [<ffffffffb8eb5e85>] copy_process+0x755/0x1eb0
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace 8639237550317b48 ]---
       BTRFS error (device sdc): tree first key mismatch detected, bytenr=62324736 parent_transid=94 key expected=(262,108,1351680) has=(259,108,1921024)
       general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
       CPU: 2 PID: 1763215 Comm: fsstress Tainted: G        W         5.8.0-rc3-btrfs-next-64 #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:ulist_release+0x14/0x60 [btrfs]
       Code: c7 07 00 (...)
       RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
       RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
       RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
       R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
       FS:  00007fc47e1e1000(0000) GS:ffff9039ac600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f8c1c0a51c8 CR3: 00000003d9e4e004 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        ulist_free+0x13/0x20 [btrfs]
        btrfs_qgroup_account_extents+0xf3/0x390 [btrfs]
        btrfs_commit_transaction+0x4f7/0xb20 [btrfs]
        btrfs_sync_file+0x3d4/0x4d0 [btrfs]
        do_fsync+0x38/0x70
        __x64_sys_fdatasync+0x13/0x20
        do_syscall_64+0x5c/0xe0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fc47e2d72e3
       Code: Bad RIP value.
       RSP: 002b:00007fffa32098c8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
       RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc47e2d72e3
       RDX: 00007fffa3209830 RSI: 00007fffa3209830 RDI: 0000000000000003
       RBP: 000000000000072e R08: 0000000000000001 R09: 0000000000000003
       R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000003e8
       R13: 0000000051eb851f R14: 00007fffa3209970 R15: 00005607c4ac8b50
       Modules linked in: dm_snapshot dm_thin_pool (...)
       ---[ end trace 8639237550317b49 ]---
       RIP: 0010:ulist_release+0x14/0x60 [btrfs]
       Code: c7 07 00 (...)
       RSP: 0018:ffffb89b47377d60 EFLAGS: 00010282
       RAX: 6b6b6b6b6b6b6b6b RBX: ffff903959b56b90 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: 0000000000270024 RDI: ffff9036e2adc840
       RBP: ffff9036e2adc848 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff9036e2adc840
       R13: 0000000000000015 R14: ffff9039a419ccf8 R15: ffff90395d605840
       FS:  00007fc47e1e1000(0000) GS:ffff9039ad200000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f6a776f7d40 CR3: 00000003d9e4e002 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fix this by making btrfs_find_all_roots_safe() set *roots to NULL after
      it frees the ulist.
      
      Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      580c079b
  8. 25 5月, 2020 13 次提交
  9. 30 4月, 2020 1 次提交
  10. 24 3月, 2020 10 次提交
    • J
      btrfs: do not resolve backrefs for roots that are being deleted · 39dba873
      Josef Bacik 提交于
      Zygo reported a deadlock where a task was stuck in the inode logical
      resolve code.  The deadlock looks like this
      
        Task 1
        btrfs_ioctl_logical_to_ino
        ->iterate_inodes_from_logical
         ->iterate_extent_inodes
          ->path->search_commit_root isn't set, so a transaction is started
            ->resolve_indirect_ref for a root that's being deleted
      	->search for our key, attempt to lock a node, DEADLOCK
      
        Task 2
        btrfs_drop_snapshot
        ->walk down to a leaf, lock it, walk up, lock node
         ->end transaction
          ->start transaction
            -> wait_cur_trans
      
        Task 3
        btrfs_commit_transaction
        ->wait_event(cur_trans->write_wait, num_writers == 1) DEADLOCK
      
      We are holding a transaction open in btrfs_ioctl_logical_to_ino while we
      try to resolve our references.  btrfs_drop_snapshot() holds onto its
      locks while it stops and starts transaction handles, because it assumes
      nobody is going to touch the root now.  Commit just does what commit
      does, waiting for the writers to finish, blocking any new trans handles
      from starting.
      
      Fix this by making the backref code not try to resolve backrefs of roots
      that are currently being deleted.  This will keep us from walking into a
      snapshot that's currently being deleted.
      
      This problem was harder to hit before because we rarely broke out of the
      snapshot delete halfway through, but with my delayed ref throttling code
      it happened much more often.  However we've always been able to do this,
      so it's not a new problem.
      
      Fixes: 8da6d581 ("Btrfs: added btrfs_find_all_roots()")
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      39dba873
    • J
      btrfs: kill the subvol_srcu · c75e8394
      Josef Bacik 提交于
      Now that we have proper root ref counting everywhere we can kill the
      subvol_srcu.
      
      * removal of fs_info::subvol_srcu reduces size of fs_info by 1176 bytes
      
      * the refcount_t used for the references checks for accidental 0->1
        in cases where the root lifetime would not be properly protected
      
      * there's a leak detector for roots to catch unfreed roots at umount
        time
      
      * SRCU served us well over the years but is was not a proper
        synchronization mechanism for some cases
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c75e8394
    • Q
      btrfs: relocation: Use btrfs_find_all_leafs to locate data extent parent tree leaves · 19b546d7
      Qu Wenruo 提交于
      In relocation, we need to locate all parent tree leaves referring to one
      data extent, thus we have a complex mechanism to iterate throught extent
      tree and subvolume trees to locate the related leaves.
      
      However this is already done in backref.c, we have
      btrfs_find_all_leafs(), which can return a ulist containing all leaves
      referring to that data extent.
      
      Use btrfs_find_all_leafs() to replace find_data_references().
      
      There is a special handling for v1 space cache data extents, where we
      need to delete the v1 space cache data extents, to avoid those data
      extents to hang the data relocation.
      
      In this patch, the special handling is done by re-iterating the root
      tree leaf.  Although it's a little less efficient than the old handling,
      considering we can reuse a lot of code, it should be acceptable.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      19b546d7
    • E
      btrfs: backref, use correct count to resolve normal data refs · b25b0b87
      ethanwu 提交于
      With the following patches:
      
      - btrfs: backref, only collect file extent items matching backref offset
      - btrfs: backref, not adding refs from shared block when resolving normal backref
      - btrfs: backref, only search backref entries from leaves of the same root
      
      we only collect the normal data refs we want, so the imprecise upper
      bound total_refs of that EXTENT_ITEM could now be changed to the count
      of the normal backref entry we want to search.
      
      Background and how the patches fit together:
      
      Btrfs has two types of data backref.
      For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the
      exact block number. Therefore, we need to call resolve_indirect_refs.
      It uses btrfs_search_slot to locate the leaf block. Then
      we need to walk through the leaves to search for the EXTENT_DATA items
      that have disk bytenr matching the extent item (add_all_parents).
      
      When resolving indirect refs, we could take entries that don't
      belong to the backref entry we are searching for right now.
      For that reason when searching backref entry, we always use total
      refs of that EXTENT_ITEM rather than individual count.
      
      For example:
      item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize
        extent refs 24 gen 7302 flags DATA
        shared data backref parent 394985472 count 10 #1
        extent data backref root 257 objectid 260 offset 1048576 count 3 #2
        extent data backref root 256 objectid 260 offset 65536 count 6 #3
        extent data backref root 257 objectid 260 offset 65536 count 5 #4
      
      For example, when searching backref entry #4, we'll use total_refs
      24, a very loose loop ending condition, instead of total_refs = 5.
      
      But using total_refs = 24 is not accurate. Sometimes, we'll never find
      all the refs from specific root.  As a result, the loop keeps on going
      until we reach the end of that inode.
      
      The first 3 patches, handle 3 different types refs we might encounter.
      These refs do not belong to the normal backref we are searching, and
      hence need to be skipped.
      
      This patch changes the total_refs to correct number so that we could
      end loop as soon as we find all the refs we want.
      
      btrfs send uses backref to find possible clone sources, the following
      is a simple test to compare the results with and without this patch:
      
       $ btrfs subvolume create /sub1
       $ for i in `seq 1 163840`; do
           dd if=/dev/zero of=/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct
         done
       $ btrfs subvolume snapshot /sub1 /sub2
       $ for i in `seq 1 163840`; do
           dd if=/dev/zero of=/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct
         done
       $ btrfs subvolume snapshot -r /sub1 /snap1
       $ time btrfs send /snap1 | btrfs receive /volume2
      
      Without this patch:
      
      real 69m48.124s
      user 0m50.199s
      sys  70m15.600s
      
      With this patch:
      
      real    1m59.683s
      user    0m35.421s
      sys     2m42.684s
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Nethanwu <ethanwu@synology.com>
      [ add patchset cover letter with background and numbers ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      b25b0b87
    • E
      btrfs: backref, only search backref entries from leaves of the same root · cfc0eed0
      ethanwu 提交于
      We could have some nodes/leaves in subvolume whose owner are not the
      that subvolume. In this way, when we resolve normal backrefs of that
      subvolume, we should avoid collecting those references from these blocks.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Nethanwu <ethanwu@synology.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cfc0eed0
    • E
      btrfs: backref, don't add refs from shared block when resolving normal backref · ed58f2e6
      ethanwu 提交于
      All references from the block of SHARED_DATA_REF belong to that shared
      block backref.
      
      For example:
      
        item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95
            extent refs 24 gen 7302 flags DATA
            extent data backref root 257 objectid 260 offset 65536 count 5
            extent data backref root 258 objectid 265 offset 0 count 9
            shared data backref parent 394985472 count 10
      
      Block 394985472 might be leaf from root 257, and the item obejctid and
      (file_pos - file_extent_item::offset) in that leaf just happens to be
      260 and 65536 which is equal to the first extent data backref entry.
      
      Before this patch, when we resolve backref:
      
        root 257 objectid 260 offset 65536
      
      we will add those refs in block 394985472 and wrongly treat those as the
      refs we want.
      
      Fix this by checking if the leaf we are processing is shared data
      backref, if so, just skip this leaf.
      
      Shared data refs added into preftrees.direct have all entry value = 0
      (root_id = 0, key = NULL, level = 0) except parent entry.
      
      Other refs from indirect tree will have key value and root id != 0, and
      these values won't be changed when their parent is resolved and added to
      preftrees.direct. Therefore, we could reuse the preftrees.direct and
      search ref with all values = 0 except parent is set to avoid getting
      those resolved refs block.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Nethanwu <ethanwu@synology.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      ed58f2e6
    • E
      btrfs: backref, only collect file extent items matching backref offset · 7ac8b88e
      ethanwu 提交于
      When resolving one backref of type EXTENT_DATA_REF, we collect all
      references that simply reference the EXTENT_ITEM even though their
      (file_pos - file_extent_item::offset) are not the same as the
      btrfs_extent_data_ref::offset we are searching for.
      
      This patch adds additional check so that we only collect references whose
      (file_pos - file_extent_item::offset) == btrfs_extent_data_ref::offset.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Nethanwu <ethanwu@synology.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7ac8b88e
    • J
      btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root · 00246528
      Josef Bacik 提交于
      We are now using these for all roots, rename them to btrfs_put_root()
      and btrfs_grab_root();
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      00246528
    • J
      btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root · bc44d7c4
      Josef Bacik 提交于
      Now that all callers of btrfs_get_fs_root are subsequently calling
      btrfs_grab_fs_root and handling dropping the ref when they are done
      appropriately, go ahead and push btrfs_grab_fs_root up into
      btrfs_get_fs_root.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bc44d7c4
    • J
      btrfs: hold a ref on the root in resolve_indirect_ref · 9326f76f
      Josef Bacik 提交于
      We're looking up a random root, we need to hold a ref on it while we're
      using it.
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      9326f76f
  11. 31 7月, 2019 1 次提交
    • F
      Btrfs: fix deadlock between fiemap and transaction commits · a6d155d2
      Filipe Manana 提交于
      The fiemap handler locks a file range that can have unflushed delalloc,
      and after locking the range, it tries to attach to a running transaction.
      If the running transaction started its commit, that is, it is in state
      TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the
      flushoncommit option or the transaction is creating a snapshot for the
      subvolume that contains the file that fiemap is operating on, we end up
      deadlocking. This happens because fiemap is blocked on the transaction,
      waiting for it to complete, and the transaction is waiting for the flushed
      dealloc to complete, which requires locking the file range that the fiemap
      task already locked. The following stack traces serve as an example of
      when this deadlock happens:
      
        (...)
        [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
        [404571.515956] Call Trace:
        [404571.516360]  ? __schedule+0x3ae/0x7b0
        [404571.516730]  schedule+0x3a/0xb0
        [404571.517104]  lock_extent_bits+0x1ec/0x2a0 [btrfs]
        [404571.517465]  ? remove_wait_queue+0x60/0x60
        [404571.517832]  btrfs_finish_ordered_io+0x292/0x800 [btrfs]
        [404571.518202]  normal_work_helper+0xea/0x530 [btrfs]
        [404571.518566]  process_one_work+0x21e/0x5c0
        [404571.518990]  worker_thread+0x4f/0x3b0
        [404571.519413]  ? process_one_work+0x5c0/0x5c0
        [404571.519829]  kthread+0x103/0x140
        [404571.520191]  ? kthread_create_worker_on_cpu+0x70/0x70
        [404571.520565]  ret_from_fork+0x3a/0x50
        [404571.520915] kworker/u8:6    D    0 31651      2 0x80004000
        [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs]
        (...)
        [404571.537000] fsstress        D    0 13117  13115 0x00004000
        [404571.537263] Call Trace:
        [404571.537524]  ? __schedule+0x3ae/0x7b0
        [404571.537788]  schedule+0x3a/0xb0
        [404571.538066]  wait_current_trans+0xc8/0x100 [btrfs]
        [404571.538349]  ? remove_wait_queue+0x60/0x60
        [404571.538680]  start_transaction+0x33c/0x500 [btrfs]
        [404571.539076]  btrfs_check_shared+0xa3/0x1f0 [btrfs]
        [404571.539513]  ? extent_fiemap+0x2ce/0x650 [btrfs]
        [404571.539866]  extent_fiemap+0x2ce/0x650 [btrfs]
        [404571.540170]  do_vfs_ioctl+0x526/0x6f0
        [404571.540436]  ksys_ioctl+0x70/0x80
        [404571.540734]  __x64_sys_ioctl+0x16/0x20
        [404571.540997]  do_syscall_64+0x60/0x1d0
        [404571.541279]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        (...)
        [404571.543729] btrfs           D    0 14210  14208 0x00004000
        [404571.544023] Call Trace:
        [404571.544275]  ? __schedule+0x3ae/0x7b0
        [404571.544526]  ? wait_for_completion+0x112/0x1a0
        [404571.544795]  schedule+0x3a/0xb0
        [404571.545064]  schedule_timeout+0x1ff/0x390
        [404571.545351]  ? lock_acquire+0xa6/0x190
        [404571.545638]  ? wait_for_completion+0x49/0x1a0
        [404571.545890]  ? wait_for_completion+0x112/0x1a0
        [404571.546228]  wait_for_completion+0x131/0x1a0
        [404571.546503]  ? wake_up_q+0x70/0x70
        [404571.546775]  btrfs_wait_ordered_extents+0x27c/0x400 [btrfs]
        [404571.547159]  btrfs_commit_transaction+0x3b0/0xae0 [btrfs]
        [404571.547449]  ? btrfs_mksubvol+0x4a4/0x640 [btrfs]
        [404571.547703]  ? remove_wait_queue+0x60/0x60
        [404571.547969]  btrfs_mksubvol+0x605/0x640 [btrfs]
        [404571.548226]  ? __sb_start_write+0xd4/0x1c0
        [404571.548512]  ? mnt_want_write_file+0x24/0x50
        [404571.548789]  btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs]
        [404571.549048]  btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs]
        [404571.549307]  btrfs_ioctl+0x133f/0x3150 [btrfs]
        [404571.549549]  ? mem_cgroup_charge_statistics+0x4c/0xd0
        [404571.549792]  ? mem_cgroup_commit_charge+0x84/0x4b0
        [404571.550064]  ? __handle_mm_fault+0xe3e/0x11f0
        [404571.550306]  ? do_raw_spin_unlock+0x49/0xc0
        [404571.550608]  ? _raw_spin_unlock+0x24/0x30
        [404571.550976]  ? __handle_mm_fault+0xedf/0x11f0
        [404571.551319]  ? do_vfs_ioctl+0xa2/0x6f0
        [404571.551659]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
        [404571.552087]  do_vfs_ioctl+0xa2/0x6f0
        [404571.552355]  ksys_ioctl+0x70/0x80
        [404571.552621]  __x64_sys_ioctl+0x16/0x20
        [404571.552864]  do_syscall_64+0x60/0x1d0
        [404571.553104]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        (...)
      
      If we were joining the transaction instead of attaching to it, we would
      not risk a deadlock because a join only blocks if the transaction is in a
      state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc
      flush performed by a transaction is done before it reaches that state,
      when it is in the state TRANS_STATE_COMMIT_START. However a transaction
      join is intended for use cases where we do modify the filesystem, and
      fiemap only needs to peek at delayed references from the current
      transaction in order to determine if extents are shared, and, besides
      that, when there is no current transaction or when it blocks to wait for
      a current committing transaction to complete, it creates a new transaction
      without reserving any space. Such unnecessary transactions, besides doing
      unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary
      rotation of the precious backup roots.
      
      So fix this by adding a new transaction join variant, named join_nostart,
      which behaves like the regular join, but it does not create a transaction
      when none currently exists or after waiting for a committing transaction
      to complete.
      
      Fixes: 03628cdb ("Btrfs: do not start a transaction during fiemap")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      a6d155d2
  12. 01 7月, 2019 1 次提交
  13. 30 4月, 2019 3 次提交
    • F
      Btrfs: do not start a transaction during fiemap · 03628cdb
      Filipe Manana 提交于
      During fiemap, for regular extents (non inline) we need to check if they
      are shared and if they are, set the shared bit. Checking if an extent is
      shared requires checking the delayed references of the currently running
      transaction, since some reference might have not yet hit the extent tree
      and be only in the in-memory delayed references.
      
      However we were using a transaction join for this, which creates a new
      transaction when there is no transaction currently running. That means
      that two more potential failures can happen: creating the transaction and
      committing it. Further, if no write activity is currently happening in the
      system, and fiemap calls keep being done, we end up creating and
      committing transactions that do nothing.
      
      In some extreme cases this can result in the commit of the transaction
      created by fiemap to fail with ENOSPC when updating the root item of a
      subvolume tree because a join does not reserve any space, leading to a
      trace like the following:
      
       heisenberg kernel: ------------[ cut here ]------------
       heisenberg kernel: BTRFS: Transaction aborted (error -28)
       heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
       heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
       heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
       heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
       heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
       heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
       heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
       heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
       heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
       heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
       heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       heisenberg kernel: Call Trace:
       heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
       heisenberg kernel:  ? _cond_resched+0x15/0x30
       heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
       heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
       heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
       heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
       heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
       heisenberg kernel:  kthread+0x112/0x130
       heisenberg kernel:  ? kthread_bind+0x30/0x30
       heisenberg kernel:  ret_from_fork+0x35/0x40
       heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
      
      Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do
      a transaction join to avoid the overhead of creating a new transaction (if
      there is currently no running transaction) and introducing a potential
      point of failure when the new transaction gets committed, instead use a
      transaction attach to grab a handle for the currently running transaction
      if any.
      Reported-by: NChristoph Anton Mitterer <calestyo@scientia.net>
      Link: https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/
      Fixes: afce772e ("btrfs: fix check_shared for fiemap ioctl")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      03628cdb
    • F
      Btrfs: do not start a transaction at iterate_extent_inodes() · bfc61c36
      Filipe Manana 提交于
      When finding out which inodes have references on a particular extent, done
      by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both
      v1 and v2) ioctl and from scrub we use the transaction join API to grab a
      reference on the currently running transaction, since in order to give
      accurate results we need to inspect the delayed references of the currently
      running transaction.
      
      However, if there is currently no running transaction, the join operation
      will create a new transaction. This is inefficient as the transaction will
      eventually be committed, doing unnecessary IO and introducing a potential
      point of failure that will lead to a transaction abort due to -ENOSPC, as
      recently reported [1].
      
      That's because the join, creates the transaction but does not reserve any
      space, so when attempting to update the root item of the root passed to
      btrfs_join_transaction(), during the transaction commit, we can end up
      failling with -ENOSPC. Users of a join operation are supposed to actually
      do some filesystem changes and reserve space by some means, which is not
      the case of iterate_extent_inodes(), it is a read-only operation for all
      contextes from which it is called.
      
      The reported [1] -ENOSPC failure stack trace is the following:
      
       heisenberg kernel: ------------[ cut here ]------------
       heisenberg kernel: BTRFS: Transaction aborted (error -28)
       heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
       heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
       heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
      (...)
       heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
       heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
       heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
       heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
       heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
       heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
       heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
       heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
       heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       heisenberg kernel: Call Trace:
       heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
       heisenberg kernel:  ? _cond_resched+0x15/0x30
       heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
       heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
       heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
       heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
       heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
       heisenberg kernel:  kthread+0x112/0x130
       heisenberg kernel:  ? kthread_bind+0x30/0x30
       heisenberg kernel:  ret_from_fork+0x35/0x40
       heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
      
      So fix that by using the attach API, which does not create a transaction
      when there is currently no running transaction.
      
      [1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/Reported-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      bfc61c36
    • A
      btrfs: use BUG() instead of BUG_ON(1) · 290342f6
      Arnd Bergmann 提交于
      BUG_ON(1) leads to bogus warnings from clang when
      CONFIG_PROFILE_ANNOTATED_BRANCHES is set:
      
      fs/btrfs/volumes.c:5041:3: error: variable 'max_chunk_size' is used uninitialized whenever 'if' condition is false
            [-Werror,-Wsometimes-uninitialized]
                      BUG_ON(1);
                      ^~~~~~~~~
      include/asm-generic/bug.h:61:36: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                         ^~~~~~~~~~~~~~~~~~~
      include/linux/compiler.h:48:23: note: expanded from macro 'unlikely'
       #  define unlikely(x)   (__branch_check__(x, 0, __builtin_constant_p(x)))
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      fs/btrfs/volumes.c:5046:9: note: uninitialized use occurs here
                                   max_chunk_size);
                                   ^~~~~~~~~~~~~~
      include/linux/kernel.h:860:36: note: expanded from macro 'min'
       #define min(x, y)       __careful_cmp(x, y, <)
                                               ^
      include/linux/kernel.h:853:17: note: expanded from macro '__careful_cmp'
                      __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op))
                                    ^
      include/linux/kernel.h:847:25: note: expanded from macro '__cmp_once'
                      typeof(y) unique_y = (y);               \
                                            ^
      fs/btrfs/volumes.c:5041:3: note: remove the 'if' if its condition is always true
                      BUG_ON(1);
                      ^
      include/asm-generic/bug.h:61:32: note: expanded from macro 'BUG_ON'
       #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
                                     ^
      fs/btrfs/volumes.c:4993:20: note: initialize the variable 'max_chunk_size' to silence this warning
              u64 max_chunk_size;
                                ^
                                 = 0
      
      Change it to BUG() so clang can see that this code path can never
      continue.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      290342f6
  14. 25 2月, 2019 2 次提交
    • J
      btrfs: honor path->skip_locking in backref code · 38e3eebf
      Josef Bacik 提交于
      Qgroups will do the old roots lookup at delayed ref time, which could be
      while walking down the extent root while running a delayed ref.  This
      should be fine, except we specifically lock eb's in the backref walking
      code irrespective of path->skip_locking, which deadlocks the system.
      Fix up the backref code to honor path->skip_locking, nobody will be
      modifying the commit_root when we're searching so it's completely safe
      to do.
      
      This happens since fb235dc0 ("btrfs: qgroup: Move half of the qgroup
      accounting time out of commit trans"), kernel may lockup with quota
      enabled.
      
      There is one backref trace triggered by snapshot dropping along with
      write operation in the source subvolume.  The example can be reliably
      reproduced:
      
        btrfs-cleaner   D    0  4062      2 0x80000000
        Call Trace:
         schedule+0x32/0x90
         btrfs_tree_read_lock+0x93/0x130 [btrfs]
         find_parent_nodes+0x29b/0x1170 [btrfs]
         btrfs_find_all_roots_safe+0xa8/0x120 [btrfs]
         btrfs_find_all_roots+0x57/0x70 [btrfs]
         btrfs_qgroup_trace_extent_post+0x37/0x70 [btrfs]
         btrfs_qgroup_trace_leaf_items+0x10b/0x140 [btrfs]
         btrfs_qgroup_trace_subtree+0xc8/0xe0 [btrfs]
         do_walk_down+0x541/0x5e3 [btrfs]
         walk_down_tree+0xab/0xe7 [btrfs]
         btrfs_drop_snapshot+0x356/0x71a [btrfs]
         btrfs_clean_one_deleted_snapshot+0xb8/0xf0 [btrfs]
         cleaner_kthread+0x12b/0x160 [btrfs]
         kthread+0x112/0x130
         ret_from_fork+0x27/0x50
      
      When dropping snapshots with qgroup enabled, we will trigger backref
      walk.
      
      However such backref walk at that timing is pretty dangerous, as if one
      of the parent nodes get WRITE locked by other thread, we could cause a
      dead lock.
      
      For example:
      
                 FS 260     FS 261 (Dropped)
                  node A        node B
                 /      \      /      \
             node C      node D      node E
            /   \         /  \        /     \
        leaf F|leaf G|leaf H|leaf I|leaf J|leaf K
      
      The lock sequence would be:
      
            Thread A (cleaner)             |       Thread B (other writer)
      -----------------------------------------------------------------------
      write_lock(B)                        |
      write_lock(D)                        |
      ^^^ called by walk_down_tree()       |
                                           |       write_lock(A)
                                           |       write_lock(D) << Stall
      read_lock(H) << for backref walk     |
      read_lock(D) << lock owner is        |
                      the same thread A    |
                      so read lock is OK   |
      read_lock(A) << Stall                |
      
      So thread A hold write lock D, and needs read lock A to unlock.
      While thread B holds write lock A, while needs lock D to unlock.
      
      This will cause a deadlock.
      
      This is not only limited to snapshot dropping case.  As the backref
      walk, even only happens on commit trees, is breaking the normal top-down
      locking order, makes it deadlock prone.
      
      Fixes: fb235dc0 ("btrfs: qgroup: Move half of the qgroup accounting time out of commit trans")
      CC: stable@vger.kernel.org # 4.14+
      Reported-and-tested-by: NDavid Sterba <dsterba@suse.com>
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      [ rebase to latest branch and fix lock assert bug in btrfs/007 ]
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      [ copy logs and deadlock analysis from Qu's patch ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      38e3eebf
    • D
      btrfs: replace btrfs_set_lock_blocking_rw with appropriate helpers · 300aa896
      David Sterba 提交于
      We can use the right helper where the lock type is a fixed parameter.
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      300aa896