1. 25 5月, 2020 3 次提交
    • J
      btrfs: improve global reserve stealing logic · 7f9fe614
      Josef Bacik 提交于
      For unlink transactions and block group removal
      btrfs_start_transaction_fallback_global_rsv will first try to start an
      ordinary transaction and if it fails it will fall back to reserving the
      required amount by stealing from the global reserve. This is problematic
      because of all the same reasons we had with previous iterations of the
      ENOSPC handling, thundering herd.  We get a bunch of failures all at
      once, everybody tries to allocate from the global reserve, some win and
      some lose, we get an ENSOPC.
      
      Fix this behavior by introducing BTRFS_RESERVE_FLUSH_ALL_STEAL. It's
      used to mark unlink reservation. To fix this we need to integrate this
      logic into the normal ENOSPC infrastructure.  We still go through all of
      the normal flushing work, and at the moment we begin to fail all the
      tickets we try to satisfy any tickets that are allowed to steal by
      stealing from the global reserve.  If this works we start the flushing
      system over again just like we would with a normal ticket satisfaction.
      This serializes our global reserve stealing, so we don't have the
      thundering herd problem.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Tested-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7f9fe614
    • Q
      btrfs: backref: rename and move should_ignore_root() · 55465730
      Qu Wenruo 提交于
      This function is mostly single purpose to relocation backref cache, but
      since we're moving the main part of backref cache to backref.c, we need
      to export such function.
      
      And to avoid confusion, rename the function to
      btrfs_should_ignore_reloc_root() make the name a little more clear.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      55465730
    • Q
      btrfs: reloc: make reloc root search-specific for relocation backref cache · 2433bea5
      Qu Wenruo 提交于
      find_reloc_root() searches reloc_control::reloc_root_tree to find the
      reloc root.  This behavior is only useful for relocation backref cache.
      
      For the incoming more generic purpose backref cache, we don't care
      about who owns the reloc root, but only care if it's a reloc root.
      
      So this patch makes the following modifications to make the reloc root
      search more specific to relocation backref:
      
      - Add backref_node::is_reloc_root
        This will be an extra indicator for generic purposed backref cache.
        User doesn't need to read root key from backref_node::root to
        determine if it's a reloc root.
        Also for reloc tree root, it's useless and will be queued to useless
        list.
      
      - Add backref_cache::is_reloc
        This will allow backref cache code to do different behavior for
        generic purpose backref cache and relocation backref cache.
      
      - Pass fs_info to find_reloc_root()
      
      - Export find_reloc_root()
        So backref.c can utilize this function.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2433bea5
  2. 24 3月, 2020 20 次提交
  3. 31 1月, 2020 1 次提交
    • F
      Btrfs: fix race between adding and putting tree mod seq elements and nodes · 7227ff4d
      Filipe Manana 提交于
      There is a race between adding and removing elements to the tree mod log
      list and rbtree that can lead to use-after-free problems.
      
      Consider the following example that explains how/why the problems happens:
      
      1) Task A has mod log element with sequence number 200. It currently is
         the only element in the mod log list;
      
      2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
         access the tree mod log. When it enters the function, it initializes
         'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
         before checking if there are other elements in the mod seq list.
         Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
         unlocks the lock 'tree_mod_seq_lock';
      
      3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
         itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
         sequence number of 201;
      
      4) Some other task, name it task C, modifies a btree and because there
         elements in the mod seq list, it adds a tree mod elem to the tree
         mod log rbtree. That node added to the mod log rbtree is assigned
         a sequence number of 202;
      
      5) Task B, which is doing fiemap and resolving indirect back references,
         calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
         calls tree_mod_log_search() - the search returns the mod log node
         from the rbtree with sequence number 202, created by task C;
      
      6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
         the mod log rbtree and finds the node with sequence number 202. Since
         202 is less than the previously computed 'min_seq', (u64)-1, it
         removes the node and frees it;
      
      7) Task B still has a pointer to the node with sequence number 202, and
         it dereferences the pointer itself and through the call to
         __tree_mod_log_rewind(), resulting in a use-after-free problem.
      
      This issue can be triggered sporadically with the test case generic/561
      from fstests, and it happens more frequently with a higher number of
      duperemove processes. When it happens to me, it either freezes the VM or
      it produces a trace like the following before crashing:
      
        [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
        [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
        [ 1245.321307] Code: ....
        [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
        [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
        [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
        [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
        [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
        [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
        [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
        [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
        [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 1245.321706] Call Trace:
        [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
        [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
        [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
        [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
        [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
        [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
        [ 1245.322081]  ksys_ioctl+0x70/0x80
        [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
        [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
        [ 1245.322126]  do_syscall_64+0x5c/0x280
        [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 1245.322155] RIP: 0033:0x7fdee3942dd7
        [ 1245.322177] Code: ....
        [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
        [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
        [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
        [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
        [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
        [ 1245.322423] Modules linked in: ....
        [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---
      
      Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
      sequence number and iterates the rbtree while holding the lock
      'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
      lock, since it is now redundant.
      
      Fixes: bd989ba3 ("Btrfs: add tree modification log functions")
      Fixes: 097b8a7c ("Btrfs: join tree mod log code with the code holding back delayed refs")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7227ff4d
  4. 24 1月, 2020 1 次提交
  5. 20 1月, 2020 15 次提交