1. 25 5月, 2020 15 次提交
    • D
      btrfs: simplify iget helpers · 0202e83f
      David Sterba 提交于
      The inode lookup starting at btrfs_iget takes the full location key,
      while only the objectid is used to match the inode, because the lookup
      happens inside the given root thus the inode number is unique.
      The entire location key is properly set up in btrfs_init_locked_inode.
      
      Simplify the helpers and pass only inode number, renaming it to 'ino'
      instead of 'objectid'. This allows to remove temporary variables key,
      saving some stack space.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      0202e83f
    • Q
      btrfs: don't set SHAREABLE flag for data reloc tree · aeb935a4
      Qu Wenruo 提交于
      SHAREABLE flag is set for subvolumes because users can create snapshot
      for subvolumes, thus sharing tree blocks of them.
      
      But data reloc tree is not exposed to user space, as it's only an
      internal tree for data relocation, thus it doesn't need the full path
      replacement handling at all.
      
      This patch will make data reloc tree a non-shareable tree, and add
      btrfs_fs_info::data_reloc_root for data reloc tree, so relocation code
      can grab it from fs_info directly.
      
      This would slightly improve tree relocation, as now data reloc tree
      can go through regular COW routine to get relocated, without bothering
      the complex tree reloc tree routine.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      aeb935a4
    • Q
      btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE · 92a7cc42
      Qu Wenruo 提交于
      The name BTRFS_ROOT_REF_COWS is not very clear about the meaning.
      
      In fact, that bit can only be set to those trees:
      
      - Subvolume roots
      - Data reloc root
      - Reloc roots for above roots
      
      All other trees won't get this bit set.  So just by the result, it is
      obvious that, roots with this bit set can have tree blocks shared with
      other trees.  Either shared by snapshots, or by reloc roots (an special
      snapshot created by relocation).
      
      This patch will rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE to
      make it easier to understand, and update all comment mentioning
      "reference counted" to follow the rename.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      92a7cc42
    • D
      btrfs: constify extent_buffer in the API functions · 2b48966a
      David Sterba 提交于
      There are many helpers around extent buffers, found in extent_io.h and
      ctree.h. Most of them can be converted to take constified eb as there
      are no changes to the extent buffer structure itself but rather the
      pages.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2b48966a
    • D
      btrfs: preset set/get token with first page and drop condition · 870b388d
      David Sterba 提交于
      All the set/get helpers first check if the token contains a cached
      address. After first use the address is always valid, but the extra
      check is done for each call.
      
      The token initialization can optimistically set it to the first extent
      buffer page, that we know always exists. Then the condition in all
      btrfs_token_*/btrfs_set_token_* can be simplified by removing the
      address check from the condition, but for development the assertion
      still makes sure it's valid.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      870b388d
    • D
      btrfs: drop eb parameter from set/get token helpers · cc4c13d5
      David Sterba 提交于
      Now that all set/get helpers use the eb from the token, we don't need to
      pass it to many btrfs_token_*/btrfs_set_token_* helpers, saving some
      stack space.
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      cc4c13d5
    • F
      btrfs: move the block group freeze/unfreeze helpers into block-group.c · 684b752b
      Filipe Manana 提交于
      The helpers btrfs_freeze_block_group() and btrfs_unfreeze_block_group()
      used to be named btrfs_get_block_group_trimming() and
      btrfs_put_block_group_trimming() respectively.
      
      At the time they were added to free-space-cache.c, by commit e33e17ee
      ("btrfs: add missing discards when unpinning extents with -o discard")
      because all the trimming related functions were in free-space-cache.c.
      
      Now that the helpers were renamed and are used in scrub context as well,
      move them to block-group.c, a much more logical location for them.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      684b752b
    • F
      btrfs: rename member 'trimming' of block group to a more generic name · 6b7304af
      Filipe Manana 提交于
      Back in 2014, commit 04216820 ("Btrfs: fix race between fs trimming
      and block group remove/allocation"), I added the 'trimming' member to the
      block group structure. Its purpose was to prevent races between trimming
      and block group deletion/allocation by pinning the block group in a way
      that prevents its logical address and device extents from being reused
      while trimming is in progress for a block group, so that if another task
      deletes the block group and then another task allocates a new block group
      that gets the same logical address and device extents while the trimming
      task is still in progress.
      
      After the previous fix for scrub (patch "btrfs: fix a race between scrub
      and block group removal/allocation"), scrub now also has the same needs that
      trimming has, so the member name 'trimming' no longer makes sense.
      Since there is already a 'pinned' member in the block group that refers
      to space reservations (pinned bytes), rename the member to 'frozen',
      add a comment on top of it to describe its general purpose and rename
      the helpers to increment and decrement the counter as well, to match
      the new member name.
      
      The next patch in the series will move the helpers into a more suitable
      file (from free-space-cache.c to block-group.c).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6b7304af
    • D
      btrfs: remove more obsolete v0 extent ref declarations · 31344b2f
      David Sterba 提交于
      The extent references v0 have been superseded long time go, there are
      some unused declarations of access helpers. We can safely remove them
      now. The struct btrfs_extent_ref_v0 is not used anywhere, but struct
      btrfs_extent_item_v0 is still part of a backward compatibility check in
      relocation.c and thus not removed.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      31344b2f
    • Y
      btrfs: remove unused function btrfs_dev_extent_chunk_tree_uuid · 943aeb0d
      YueHaibing 提交于
      There's no callers in-tree anymore since
      commit d24ee97b ("btrfs: use new helpers to set uuids in eb")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      943aeb0d
    • O
      btrfs: get rid of endio_repair_workers · 5c047a69
      Omar Sandoval 提交于
      This was originally added in commit 8b110e39 ("Btrfs: implement
      repair function when direct read fails") to avoid a deadlock. In that
      commit, the direct I/O read endio executes on the endio_workers
      workqueue, submits a repair bio, and waits for it to complete. The
      repair bio endio must execute on a different workqueue, otherwise it
      could block on the endio_workers workqueue becoming available, which
      won't happen because the original endio is blocked on the repair bio.
      
      As of the previous commit, the original endio doesn't wait for the
      repair bio, so this separate workqueue is unnecessary.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      5c047a69
    • Q
      btrfs: remove the redundant parameter level in btrfs_bin_search() · e3b83361
      Qu Wenruo 提交于
      All callers pass the eb::level so we can get read it directly inside the
      btrfs_bin_search and key_search.
      
      This is inspired by the work of Marek in U-boot.
      
      CC: Marek Behun <marek.behun@nic.cz>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e3b83361
    • J
      btrfs: improve global reserve stealing logic · 7f9fe614
      Josef Bacik 提交于
      For unlink transactions and block group removal
      btrfs_start_transaction_fallback_global_rsv will first try to start an
      ordinary transaction and if it fails it will fall back to reserving the
      required amount by stealing from the global reserve. This is problematic
      because of all the same reasons we had with previous iterations of the
      ENOSPC handling, thundering herd.  We get a bunch of failures all at
      once, everybody tries to allocate from the global reserve, some win and
      some lose, we get an ENSOPC.
      
      Fix this behavior by introducing BTRFS_RESERVE_FLUSH_ALL_STEAL. It's
      used to mark unlink reservation. To fix this we need to integrate this
      logic into the normal ENOSPC infrastructure.  We still go through all of
      the normal flushing work, and at the moment we begin to fail all the
      tickets we try to satisfy any tickets that are allowed to steal by
      stealing from the global reserve.  If this works we start the flushing
      system over again just like we would with a normal ticket satisfaction.
      This serializes our global reserve stealing, so we don't have the
      thundering herd problem.
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Tested-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7f9fe614
    • Q
      btrfs: backref: rename and move should_ignore_root() · 55465730
      Qu Wenruo 提交于
      This function is mostly single purpose to relocation backref cache, but
      since we're moving the main part of backref cache to backref.c, we need
      to export such function.
      
      And to avoid confusion, rename the function to
      btrfs_should_ignore_reloc_root() make the name a little more clear.
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      55465730
    • Q
      btrfs: reloc: make reloc root search-specific for relocation backref cache · 2433bea5
      Qu Wenruo 提交于
      find_reloc_root() searches reloc_control::reloc_root_tree to find the
      reloc root.  This behavior is only useful for relocation backref cache.
      
      For the incoming more generic purpose backref cache, we don't care
      about who owns the reloc root, but only care if it's a reloc root.
      
      So this patch makes the following modifications to make the reloc root
      search more specific to relocation backref:
      
      - Add backref_node::is_reloc_root
        This will be an extra indicator for generic purposed backref cache.
        User doesn't need to read root key from backref_node::root to
        determine if it's a reloc root.
        Also for reloc tree root, it's useless and will be queued to useless
        list.
      
      - Add backref_cache::is_reloc
        This will allow backref cache code to do different behavior for
        generic purpose backref cache and relocation backref cache.
      
      - Pass fs_info to find_reloc_root()
      
      - Export find_reloc_root()
        So backref.c can utilize this function.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      2433bea5
  2. 24 3月, 2020 20 次提交
  3. 31 1月, 2020 1 次提交
    • F
      Btrfs: fix race between adding and putting tree mod seq elements and nodes · 7227ff4d
      Filipe Manana 提交于
      There is a race between adding and removing elements to the tree mod log
      list and rbtree that can lead to use-after-free problems.
      
      Consider the following example that explains how/why the problems happens:
      
      1) Task A has mod log element with sequence number 200. It currently is
         the only element in the mod log list;
      
      2) Task A calls btrfs_put_tree_mod_seq() because it no longer needs to
         access the tree mod log. When it enters the function, it initializes
         'min_seq' to (u64)-1. Then it acquires the lock 'tree_mod_seq_lock'
         before checking if there are other elements in the mod seq list.
         Since the list it empty, 'min_seq' remains set to (u64)-1. Then it
         unlocks the lock 'tree_mod_seq_lock';
      
      3) Before task A acquires the lock 'tree_mod_log_lock', task B adds
         itself to the mod seq list through btrfs_get_tree_mod_seq() and gets a
         sequence number of 201;
      
      4) Some other task, name it task C, modifies a btree and because there
         elements in the mod seq list, it adds a tree mod elem to the tree
         mod log rbtree. That node added to the mod log rbtree is assigned
         a sequence number of 202;
      
      5) Task B, which is doing fiemap and resolving indirect back references,
         calls btrfs get_old_root(), with 'time_seq' == 201, which in turn
         calls tree_mod_log_search() - the search returns the mod log node
         from the rbtree with sequence number 202, created by task C;
      
      6) Task A now acquires the lock 'tree_mod_log_lock', starts iterating
         the mod log rbtree and finds the node with sequence number 202. Since
         202 is less than the previously computed 'min_seq', (u64)-1, it
         removes the node and frees it;
      
      7) Task B still has a pointer to the node with sequence number 202, and
         it dereferences the pointer itself and through the call to
         __tree_mod_log_rewind(), resulting in a use-after-free problem.
      
      This issue can be triggered sporadically with the test case generic/561
      from fstests, and it happens more frequently with a higher number of
      duperemove processes. When it happens to me, it either freezes the VM or
      it produces a trace like the following before crashing:
      
        [ 1245.321140] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
        [ 1245.321200] CPU: 1 PID: 26997 Comm: pool Not tainted 5.5.0-rc6-btrfs-next-52 #1
        [ 1245.321235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [ 1245.321287] RIP: 0010:rb_next+0x16/0x50
        [ 1245.321307] Code: ....
        [ 1245.321372] RSP: 0018:ffffa151c4d039b0 EFLAGS: 00010202
        [ 1245.321388] RAX: 6b6b6b6b6b6b6b6b RBX: ffff8ae221363c80 RCX: 6b6b6b6b6b6b6b6b
        [ 1245.321409] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8ae221363c80
        [ 1245.321439] RBP: ffff8ae20fcc4688 R08: 0000000000000002 R09: 0000000000000000
        [ 1245.321475] R10: ffff8ae20b120910 R11: 00000000243f8bb1 R12: 0000000000000038
        [ 1245.321506] R13: ffff8ae221363c80 R14: 000000000000075f R15: ffff8ae223f762b8
        [ 1245.321539] FS:  00007fdee1ec7700(0000) GS:ffff8ae236c80000(0000) knlGS:0000000000000000
        [ 1245.321591] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [ 1245.321614] CR2: 00007fded4030c48 CR3: 000000021da16003 CR4: 00000000003606e0
        [ 1245.321642] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [ 1245.321668] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [ 1245.321706] Call Trace:
        [ 1245.321798]  __tree_mod_log_rewind+0xbf/0x280 [btrfs]
        [ 1245.321841]  btrfs_search_old_slot+0x105/0xd00 [btrfs]
        [ 1245.321877]  resolve_indirect_refs+0x1eb/0xc60 [btrfs]
        [ 1245.321912]  find_parent_nodes+0x3dc/0x11b0 [btrfs]
        [ 1245.321947]  btrfs_check_shared+0x115/0x1c0 [btrfs]
        [ 1245.321980]  ? extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322029]  extent_fiemap+0x59d/0x6d0 [btrfs]
        [ 1245.322066]  do_vfs_ioctl+0x45a/0x750
        [ 1245.322081]  ksys_ioctl+0x70/0x80
        [ 1245.322092]  ? trace_hardirqs_off_thunk+0x1a/0x1c
        [ 1245.322113]  __x64_sys_ioctl+0x16/0x20
        [ 1245.322126]  do_syscall_64+0x5c/0x280
        [ 1245.322139]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [ 1245.322155] RIP: 0033:0x7fdee3942dd7
        [ 1245.322177] Code: ....
        [ 1245.322258] RSP: 002b:00007fdee1ec6c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
        [ 1245.322294] RAX: ffffffffffffffda RBX: 00007fded40210d8 RCX: 00007fdee3942dd7
        [ 1245.322314] RDX: 00007fded40210d8 RSI: 00000000c020660b RDI: 0000000000000004
        [ 1245.322337] RBP: 0000562aa89e7510 R08: 0000000000000000 R09: 00007fdee1ec6d44
        [ 1245.322369] R10: 0000000000000073 R11: 0000000000000246 R12: 00007fdee1ec6d48
        [ 1245.322390] R13: 00007fdee1ec6d40 R14: 00007fded40210d0 R15: 00007fdee1ec6d50
        [ 1245.322423] Modules linked in: ....
        [ 1245.323443] ---[ end trace 01de1e9ec5dff3cd ]---
      
      Fix this by ensuring that btrfs_put_tree_mod_seq() computes the minimum
      sequence number and iterates the rbtree while holding the lock
      'tree_mod_log_lock' in write mode. Also get rid of the 'tree_mod_seq_lock'
      lock, since it is now redundant.
      
      Fixes: bd989ba3 ("Btrfs: add tree modification log functions")
      Fixes: 097b8a7c ("Btrfs: join tree mod log code with the code holding back delayed refs")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      7227ff4d
  4. 24 1月, 2020 1 次提交
  5. 20 1月, 2020 3 次提交