1. 27 Jul, 2020: 3 commits
  2. 17 Jun, 2020: 2 commits
    • btrfs: fix race between block group removal and block group creation · ffcb9d44
      Committed by Filipe Manana
      There is a race between block group removal and block group creation
      when the removal is completed by a task running fitrim or scrub. When
      this happens we end up failing the block group creation with an error
      -EEXIST since we attempt to insert a duplicate block group item key
      in the extent tree. That results in a transaction abort.
      
      The race happens like this:
      
      1) Task A is doing a fitrim, and at btrfs_trim_block_group() it freezes
         block group X with btrfs_freeze_block_group() (until very recently
         that was named btrfs_get_block_group_trimming());
      
      2) Task B starts removing block group X, either because it's now unused
         or due to relocation for example. So at btrfs_remove_block_group(),
         while holding the chunk mutex and the block group's lock, it sets
         the 'removed' flag of the block group and it sets the local variable
         'remove_em' to false, because the block group is currently frozen
         (its 'frozen' counter is > 0, until very recently this counter was
         named 'trimming');
      
      3) Task B unlocks the block group and the chunk mutex;
      
      4) Task A is done trimming the block group and unfreezes the block group
         by calling btrfs_unfreeze_block_group() (until very recently this was
         named btrfs_put_block_group_trimming()). In this function we lock the
         block group and set the local variable 'cleanup' to true because we
         were able to decrement the block group's 'frozen' counter down to 0 and
         the flag 'removed' is set in the block group.
      
         Since 'cleanup' is set to true, it locks the chunk mutex and removes
         the extent mapping representing the block group from the mapping tree;
      
      5) Task C allocates a new block group Y and it picks up the logical address
         that block group X had as the logical address for Y, because X was the
         block group with the highest logical address and now the second block
         group with the highest logical address, the last in the fs mapping tree,
         ends at an offset corresponding to block group X's logical address (this
         logical address selection is done at volumes.c:find_next_chunk()).
      
         At this point the new block group Y does not yet have its item added
         to the extent tree (nor the corresponding device extent items and
         chunk item in the device and chunk trees). The new group Y is added to
         the list of pending block groups in the transaction handle;
      
      6) Before task B proceeds to remove the block group item for block
         group X from the extent tree, which has a key matching:
      
         (X logical offset, BTRFS_BLOCK_GROUP_ITEM_KEY, length)
      
         task C while ending its transaction handle calls
         btrfs_create_pending_block_groups(), which finds block group Y and
         tries to insert the block group item for Y into the extent tree, which
         fails with -EEXIST since the logical offset is the same as X's and
         task B hasn't yet deleted the key from the extent tree.
         This failure results in a transaction abort, producing a stack like
         the following:
      
      ------------[ cut here ]------------
       BTRFS: Transaction aborted (error -17)
       WARNING: CPU: 2 PID: 19736 at fs/btrfs/block-group.c:2074 btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
       Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
       CPU: 2 PID: 19736 Comm: fsstress Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
       RIP: 0010:btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
       Code: ff ff ff 48 8b 55 50 f0 48 (...)
       RSP: 0018:ffffa4160a1c7d58 EFLAGS: 00010286
       RAX: 0000000000000000 RBX: ffff961581909d98 RCX: 0000000000000000
       RDX: 0000000000000001 RSI: ffffffffb3d63990 RDI: 0000000000000001
       RBP: ffff9614f3356a58 R08: 0000000000000000 R09: 0000000000000001
       R10: ffff9615b65b0040 R11: 0000000000000000 R12: ffff961581909c10
       R13: ffff9615b0c32000 R14: ffff9614f3356ab0 R15: ffff9614be779000
       FS:  00007f2ce2841e80(0000) GS:ffff9615bae00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000555f18780000 CR3: 0000000131d34005 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        btrfs_start_dirty_block_groups+0x398/0x4e0 [btrfs]
        btrfs_commit_transaction+0xd0/0xc50 [btrfs]
        ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
        ? __ia32_sys_fdatasync+0x20/0x20
        iterate_supers+0xdb/0x180
        ksys_sync+0x60/0xb0
        __ia32_sys_sync+0xa/0x10
        do_syscall_64+0x5c/0x280
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f2ce1d4d5b7
       Code: 83 c4 08 48 3d 01 (...)
       RSP: 002b:00007ffd8b558c58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
       RAX: ffffffffffffffda RBX: 000000000000002c RCX: 00007f2ce1d4d5b7
       RDX: 00000000ffffffff RSI: 00000000186ba07b RDI: 000000000000002c
       RBP: 0000555f17b9e520 R08: 0000000000000012 R09: 000000000000ce00
       R10: 0000000000000078 R11: 0000000000000202 R12: 0000000000000032
       R13: 0000000051eb851f R14: 00007ffd8b558cd0 R15: 0000555f1798ec20
       irq event stamp: 0
       hardirqs last  enabled at (0): [<0000000000000000>] 0x0
       hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
       softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
       softirqs last disabled at (0): [<0000000000000000>] 0x0
       ---[ end trace bd7c03622e0b0a9c ]---
      
      Fix this simply by making btrfs_remove_block_group() remove the block
      group's item from the extent tree before it flags the block group as
      removed. Also delete the block group's entries from the free space tree
      before flagging the block group as removed, to avoid a similar race
      between adding and removing free space tree entries.
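
      The effect of the new ordering can be shown with a deterministic,
      single-threaded toy model. Everything in it is hypothetical (toy_*
      names, an array standing in for the extent tree, a callback standing
      in for task C running inside the race window); it is not btrfs code,
      only a sketch of why deleting the item before setting 'removed'
      closes the window.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the extent tree: start offsets of block group items. */
#define TOY_MAX_ITEMS 16

struct toy_extent_tree {
	uint64_t starts[TOY_MAX_ITEMS];
	int nr;
};

struct toy_block_group {
	uint64_t start;
	bool removed;
};

/* Insert a block group item; -EEXIST on a duplicate key, like the real abort. */
static int toy_insert_item(struct toy_extent_tree *t, uint64_t start)
{
	for (int i = 0; i < t->nr; i++)
		if (t->starts[i] == start)
			return -EEXIST;
	if (t->nr == TOY_MAX_ITEMS)
		return -ENOSPC;
	t->starts[t->nr++] = start;
	return 0;
}

static void toy_delete_item(struct toy_extent_tree *t, uint64_t start)
{
	for (int i = 0; i < t->nr; i++) {
		if (t->starts[i] == start) {
			t->starts[i] = t->starts[--t->nr];
			return;
		}
	}
}

/* Simulates task C: creates block group Y at X's old logical address. */
static int task_c_ret;
static void task_c_creates_y(struct toy_extent_tree *t, uint64_t start)
{
	task_c_ret = toy_insert_item(t, start);
}

typedef void (*toy_interleave_fn)(struct toy_extent_tree *, uint64_t);

/* Old order: once 'removed' is visible, the unfreezing task may drop the
 * mapping, so 'race' can run before X's item is deleted. */
static void toy_remove_old(struct toy_extent_tree *t, struct toy_block_group *bg,
			   toy_interleave_fn race)
{
	bg->removed = true;
	race(t, bg->start);
	toy_delete_item(t, bg->start);
}

/* Fixed order: the item is gone before anyone can observe 'removed'. */
static void toy_remove_fixed(struct toy_extent_tree *t, struct toy_block_group *bg,
			     toy_interleave_fn race)
{
	toy_delete_item(t, bg->start);
	bg->removed = true;
	race(t, bg->start);
}
```

      With the old ordering the toy insert of Y fails with -EEXIST,
      mirroring the transaction abort above; with the fixed ordering it
      succeeds.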
      
      Fixes: 04216820 ("Btrfs: fix race between fs trimming and block group remove/allocation")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix a block group ref counter leak after failure to remove block group · 9fecd132
      Committed by Filipe Manana
      When removing a block group, if we fail to delete the block group's item
      from the extent tree, we jump to the 'out' label and end up decrementing
      the block group's reference count once only (by 1), resulting in a counter
      leak because the block group at that point was already removed from the
      block group cache rbtree - so we have to decrement the reference count
      twice, once for the rbtree and once for our lookup at the start of the
      function.
      
      There is a second bug where if removing the free space tree entries (the
      call to remove_block_group_free_space()) fails we end up jumping to the
      'out_put_group' label but end up decrementing the reference count only
      once, when we should have done it twice, since we have already removed
      the block group from the block group cache rbtree. This happens because
      the reference count decrement for the rbtree reference happens after
      attempting to remove the free space tree entries, which is far away from
      the place where we remove the block group from the rbtree.
      
      To make things less error prone, decrement the reference count for the
      rbtree immediately after removing the block group from it. This also
      eliminates the need for two different exit labels on error, renaming
      'out_put_group' to just 'out' and removing the old 'out'.
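
      A toy model of the reference counting rule described above. The
      toy_* names and the fail_step failure-injection parameter are
      hypothetical; the point is only that the rbtree's reference is
      dropped right where the group leaves the rbtree, so a single 'out'
      label that drops the lookup reference is correct on every path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_bg {
	int refs;        /* 1 held by the block group cache rbtree, +1 per lookup */
	bool in_rbtree;
};

static struct toy_bg *toy_lookup(struct toy_bg *bg)
{
	bg->refs++;
	return bg;
}

static void toy_put(struct toy_bg *bg)
{
	assert(bg->refs > 0);
	bg->refs--;
}

/* fail_step: 0 = success, 1 = fails before unlinking from the rbtree,
 * 2 = free space tree removal fails, 3 = extent tree item deletion fails. */
static int toy_remove_block_group(struct toy_bg *cached, int fail_step)
{
	struct toy_bg *bg = toy_lookup(cached);
	int ret = 0;

	if (fail_step == 1) {
		ret = -ENOMEM;
		goto out;
	}

	bg->in_rbtree = false;
	toy_put(bg);            /* drop the rbtree's reference right away */

	if (fail_step == 2) {
		ret = -EIO;
		goto out;
	}
	if (fail_step == 3) {
		ret = -EIO;
		goto out;
	}
out:
	toy_put(bg);            /* drop the lookup reference on every path */
	return ret;
}
```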
      
      Fixes: f6033c5e ("btrfs: fix block group leak when removing fails")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 25 May, 2020: 10 commits
  4. 23 Apr, 2020: 2 commits
    • btrfs: fix block group leak when removing fails · f6033c5e
      Committed by Xiyu Yang
      btrfs_remove_block_group() invokes btrfs_lookup_block_group(), which
      returns a local reference of the block group that contains the given
      bytenr to "block_group" with increased refcount.
      
      When btrfs_remove_block_group() returns, "block_group" becomes invalid,
      so the refcount should be decreased to keep refcount balanced.
      
      The reference counting issue happens in several exception handling paths
      of btrfs_remove_block_group(). When one of those error scenarios occurs,
      such as btrfs_alloc_path() returning NULL, the function forgets to drop
      the reference taken by btrfs_lookup_block_group(), causing a refcount
      leak.
      
      Fix this issue by jumping to "out_put_group" label and calling
      btrfs_put_block_group() when those error scenarios occur.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix memory leak of transaction when deleting unused block group · 5150bf19
      Committed by Filipe Manana
      When cleaning pinned extents right before deleting an unused block group,
      we check if there's still a previous transaction running and if so we
      increment its reference count before using it for cleaning pinned ranges
      in its pinned extents iotree. However we ended up never decrementing the
      reference count after using the transaction, resulting in a memory leak.
      
      Fix it by decrementing the reference count.
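
      The fix restores the usual get/put pairing. A minimal sketch with a
      hypothetical toy_trans type (not the real struct btrfs_transaction
      or its helpers):

```c
#include <assert.h>
#include <stddef.h>

struct toy_trans {
	int use_count;   /* the transaction list holds one reference */
	int freed;       /* set when the last reference is dropped */
};

static void toy_trans_get(struct toy_trans *t)
{
	t->use_count++;
}

static void toy_trans_put(struct toy_trans *t)
{
	assert(t->use_count > 0);
	if (--t->use_count == 0)
		t->freed = 1;
}

/* Pin the previous transaction while cleaning its pinned ranges, then release it. */
static void toy_clean_pinned(struct toy_trans *prev)
{
	if (prev == NULL)
		return;
	toy_trans_get(prev);
	/* ... clear the block group's range in prev's pinned extents iotree ... */
	toy_trans_put(prev);     /* the missing put: without it use_count never drops */
}
```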
      
      Fixes: fe119a6e ("btrfs: switch to per-transaction pinned extents")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  5. 09 Apr, 2020: 1 commit
    • btrfs: fix reclaim counter leak of space_info objects · d611add4
      Committed by Filipe Manana
      Whenever we add a ticket to a space_info object we increment the object's
      reclaim_size counter with the ticket's bytes, and we decrement it with
      the corresponding amount only when we are able to grant the requested
      space to the ticket. When we are not able to grant the space to a ticket,
      or when the ticket is removed due to a signal (e.g. an application has
      received sigterm from the terminal) we never decrement the counter with
      the corresponding bytes from the ticket. This leak can cause the space
      reclaim code to later do much more work than necessary. So fix it by
      decrementing the counter when those two cases happen as well.
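
      A sketch of the balanced accounting, assuming a hypothetical
      toy_space_info/toy_ticket pair rather than the real btrfs structures:
      every way a ticket leaves the queue (granted, failed, or interrupted
      by a signal) gives its bytes back to reclaim_size.

```c
#include <assert.h>
#include <stdint.h>

enum toy_ticket_end { TOY_GRANTED, TOY_FAILED, TOY_SIGNALED };

struct toy_space_info { uint64_t reclaim_size; };
struct toy_ticket { uint64_t bytes; };

static void toy_add_ticket(struct toy_space_info *si, const struct toy_ticket *t)
{
	si->reclaim_size += t->bytes;
}

/* Every way a ticket can leave the queue must give its bytes back. */
static void toy_remove_ticket(struct toy_space_info *si, const struct toy_ticket *t,
			      enum toy_ticket_end how)
{
	switch (how) {
	case TOY_GRANTED:   /* the only case accounted before the fix */
	case TOY_FAILED:    /* the space could not be granted */
	case TOY_SIGNALED:  /* e.g. the task received SIGTERM */
		assert(si->reclaim_size >= t->bytes);
		si->reclaim_size -= t->bytes;
		break;
	}
}
```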
      
      Fixes: db161806 ("btrfs: account ticket size at add/delete time")
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 24 Mar, 2020: 4 commits
  7. 21 Mar, 2020: 1 commit
  8. 31 Jan, 2020: 2 commits
    • btrfs: take overcommit into account in inc_block_group_ro · a30a3d20
      Committed by Josef Bacik
      inc_block_group_ro does a calculation to see whether we would have
      enough room left over if we marked this block group as read only, in
      order to decide whether it's ok to mark it read only.
      
      The problem is this calculation _only_ works for data, where our used is
      always less than our total.  For metadata we will overcommit, so this
      will almost always fail for metadata.
      
      Fix this by exporting btrfs_can_overcommit, and then see if we have
      enough space to remove the remaining free space in the block group we
      are trying to mark read only.  If we do then we can mark this block
      group as read only.
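
      A sketch of the decision, with loudly hypothetical parts: the toy_*
      types are not btrfs structures, and toy_can_overcommit() uses a
      made-up policy (metadata may borrow up to half of the unallocated
      space); the real btrfs_can_overcommit() applies its own rules. It
      only shows the shape of the fix: try the plain check first, then
      fall back to an overcommit check that data never passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_space_info {
	uint64_t total;        /* bytes in allocated chunks of this type */
	uint64_t used;         /* used + reserved; may exceed total for metadata */
	uint64_t unallocated;  /* device space not yet turned into chunks */
	bool is_data;
};

/* Hypothetical stand-in for btrfs_can_overcommit(). */
static bool toy_can_overcommit(const struct toy_space_info *si, uint64_t bytes)
{
	if (si->is_data)
		return false;  /* data never overcommits */
	return si->used + bytes <= si->total + si->unallocated / 2;
}

/* bg_free: free space left in the block group we want to make read only. */
static bool toy_can_mark_ro(const struct toy_space_info *si, uint64_t bg_free)
{
	if (si->used + bg_free <= si->total)   /* the original, data-only check */
		return true;
	return toy_can_overcommit(si, bg_free);
}
```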
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix force usage in inc_block_group_ro · a7a63acc
      Committed by Josef Bacik
      For some reason we've translated the do_chunk_alloc that goes into
      btrfs_inc_block_group_ro to force in inc_block_group_ro, but these are
      two different things.
      
      force for inc_block_group_ro is used when we are forcing the block group
      read only no matter what, for example when the underlying chunk is
      marked read only.  We need to not do the space check here as this block
      group needs to be read only.
      
      btrfs_inc_block_group_ro() has a do_chunk_alloc flag that indicates that
      we need to pre-allocate a chunk before marking the block group read
      only.  This has nothing to do with forcing, and in fact we _always_ want
      to do the space check in this case, so unconditionally pass false for
      force in this case.
      
      Then fix up inc_block_group_ro to honor force as it's expected and
      documented to do.
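
      The separation of the two flags can be sketched with hypothetical
      toy_* functions (not the real btrfs signatures): force skips the
      space check, while do_chunk_alloc only pre-allocates a chunk and
      then always performs the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CHUNK_SIZE 1024

struct toy_fs {
	uint64_t spare;         /* space left if the block group goes read only */
	int chunks_allocated;
};

/* force: the group must become read only no matter what, so skip the check. */
static int toy_inc_ro(const struct toy_fs *fs, bool force)
{
	if (!force && fs->spare == 0)
		return -ENOSPC;
	return 0;
}

/* do_chunk_alloc only asks for a pre-allocated chunk; it never implies force. */
static int toy_btrfs_inc_ro(struct toy_fs *fs, bool do_chunk_alloc)
{
	if (do_chunk_alloc) {
		fs->chunks_allocated++;
		fs->spare += TOY_CHUNK_SIZE;
	}
	return toy_inc_ro(fs, false);
}
```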
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 24 Jan, 2020: 2 commits
  10. 20 Jan, 2020: 5 commits
    • btrfs: remove unnecessary wrapper get_alloc_profile · ef0a82da
      Committed by Johannes Thumshirn
      btrfs_get_alloc_profile() is a simple wrapper over get_alloc_profile().
      The only difference is btrfs_get_alloc_profile() is visible to other
      functions in btrfs while get_alloc_profile() is static and thus only
      visible to functions in block-group.c.
      
      Let's just fold get_alloc_profile() into btrfs_get_alloc_profile() to
      get rid of the unnecessary second function.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Johannes Thumshirn <jth@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: handle empty block_group removal for async discard · 6e80d4f8
      Committed by Dennis Zhou
      block_group removal is a little tricky. It can race with the extent
      allocator, the cleaner thread, and balancing. The current path is for a
      block_group to be added to the unused_bgs list. Then, when the cleaner
      thread comes around, it starts a transaction and then proceeds with
      removing the block_group. Extents that are pinned are subsequently
      removed from the pinned trees and then eventually a discard is issued
      for the entire block_group.
      
      Async discard introduces another player into the game, the discard
      workqueue. While it has none of the racing issues, the new problem is
      ensuring we don't leave free space untrimmed prior to forgetting the
      block_group.  This is handled by placing fully free block_groups on a
      separate discard queue. This is necessary to maintain discarding order
      as in the future we will slowly trim even fully free block_groups. The
      ordering helps us make progress on the same block_group rather than say
      the last fully freed block_group or needing to search through the fully
      freed block groups at the beginning of a list and insert after.
      
      In the new order of events, a fully freed block group is placed on the
      unused discard queue first. Once it's processed, it will be placed on
      the unused_bgs list and then the original sequence of events will
      happen, just without the final whole block_group discard.
      
      The mount flags can change when processing unused_bgs, so when flipping
      from DISCARD to DISCARD_ASYNC, the unused_bgs must be punted to the
      discard_list to be trimmed. If we flip off DISCARD_ASYNC, we punt
      free block groups on the discard_list to the unused_bg queue which will
      do the final discard for us.
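
      The flow described above can be sketched as a small state machine.
      This is a hypothetical model (toy_* names, an enum instead of real
      lists, a counter of whole-group discards), not the btrfs
      implementation; it shows that a group takes exactly one whole-group
      discard whether it goes through the async queue or falls back to the
      cleaner after the mount options change.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_list { TOY_NONE, TOY_DISCARD_UNUSED, TOY_UNUSED_BGS, TOY_DELETED };

struct toy_bg {
	enum toy_list list;
	bool trimmed;
	int discards;    /* whole-block-group discards issued */
};

/* A block group just became fully free. */
static void toy_mark_unused(struct toy_bg *bg, bool async)
{
	bg->list = async ? TOY_DISCARD_UNUSED : TOY_UNUSED_BGS;
}

/* Discard workqueue: trim it, then hand it to the usual unused_bgs path. */
static void toy_discard_work(struct toy_bg *bg)
{
	if (bg->list != TOY_DISCARD_UNUSED)
		return;
	bg->discards++;
	bg->trimmed = true;
	bg->list = TOY_UNUSED_BGS;
}

/* DISCARD_ASYNC switched off: the cleaner will do the final discard instead. */
static void toy_async_turned_off(struct toy_bg *bg)
{
	if (bg->list == TOY_DISCARD_UNUSED)
		bg->list = TOY_UNUSED_BGS;
}

/* Cleaner thread; 'async' reflects the mount options at this moment. */
static void toy_cleaner(struct toy_bg *bg, bool async)
{
	if (bg->list != TOY_UNUSED_BGS)
		return;
	if (async && !bg->trimmed) {
		bg->list = TOY_DISCARD_UNUSED;   /* flipped to DISCARD_ASYNC: punt */
		return;
	}
	if (!bg->trimmed)
		bg->discards++;                  /* sync mode: final whole-group discard */
	bg->list = TOY_DELETED;
}
```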
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add the beginning of async discard, discard workqueue · b0643e59
      Committed by Dennis Zhou
      When discard is enabled, every time a pinned extent is released back to
      the block_group's free space cache, a discard is issued for the extent.
      This is an overeager approach when it comes to discarding and helping
      the SSD maintain enough free space to prevent severe garbage collection
      situations.
      
      This adds the beginning of async discard. Instead of issuing a discard
      prior to returning it to the free space, it is just marked as untrimmed.
      The block_group is then added to a LRU which then feeds into a workqueue
      to issue discards at a much slower rate. Full discarding of unused block
      groups is still done and will be addressed in a future patch of the
      series.
      
      For now, we don't persist the discard state of extents and bitmaps.
      Therefore, our failure recovery mode will be to consider extents
      untrimmed. This lets us handle failure and unmounting as one and the
      same.
      
      On a number of Facebook webservers, I collected data every minute
      accounting the time we spent in btrfs_finish_extent_commit() (col. 1)
      and in btrfs_commit_transaction() (col. 2). btrfs_finish_extent_commit()
      is where we discard extents synchronously before returning them to the
      free space cache.
      
      discard=sync:
                       p99 total per minute       p99 total per minute
            Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
          ---------------------------------------------------------------
           Drive A  |           434           |          1170
           Drive B  |           880           |          2330
           Drive C  |          2943           |          3920
           Drive D  |          4763           |          5701
      
      discard=async:
                       p99 total per minute       p99 total per minute
            Drive   |   extent_commit() (ms)  |    commit_trans() (ms)
          --------------------------------------------------------------
           Drive A  |           134           |           956
           Drive B  |            64           |          1972
           Drive C  |            59           |          1032
           Drive D  |            62           |          1200
      
      While it's not great that the stats are cumulative over 1 minute, all of
      these servers are running the same workload and the delta between the two
      is substantial. We are spending significantly less time in
      btrfs_finish_extent_commit() which is responsible for discarding.
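
      The size of that delta can be computed directly from the two tables
      (no new data): extent_commit() drops by about 69% (Drive A) to 98.7%
      (Drive D), and commit_trans() by about 15% (Drive B) to 79% (Drive
      D). A small helper doing the arithmetic:

```c
#include <assert.h>

/* p99 totals per minute (ms), copied from the two tables above. */
static const int sync_extent[4]  = { 434, 880, 2943, 4763 };
static const int async_extent[4] = { 134,  64,   59,   62 };
static const int sync_commit[4]  = { 1170, 2330, 3920, 5701 };
static const int async_commit[4] = {  956, 1972, 1032, 1200 };

/* Relative reduction from 'before' to 'after', in percent. */
static double reduction_pct(int before, int after)
{
	return 100.0 * (before - after) / before;
}
```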
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename DISCARD mount option to DISCARD_SYNC · 46b27f50
      Committed by Dennis Zhou
      This series introduces async discard which will use the flag
      DISCARD_ASYNC, so rename the original flag to DISCARD_SYNC as it is
      synchronously done in transaction commit.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: kill min_allocable_bytes in inc_block_group_ro · f8935566
      Committed by Josef Bacik
      This is a relic from a time before we had a proper reservation mechanism
      and you could end up with really full chunks at chunk allocation time.
      This doesn't make sense anymore, so just kill it.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  11. 19 Nov, 2019: 8 commits
    • btrfs: scrub: Don't check free space before marking a block group RO · b12de528
      Committed by Qu Wenruo
      [BUG]
      When running btrfs/072 with only one online CPU, it has a fairly high
      chance of failing:
      
        btrfs/072 12s ... _check_dmesg: something found in dmesg (see xfstests-dev/results//btrfs/072.dmesg)
        - output mismatch (see xfstests-dev/results//btrfs/072.out.bad)
            --- tests/btrfs/072.out     2019-10-22 15:18:14.008965340 +0800
            +++ /xfstests-dev/results//btrfs/072.out.bad      2019-11-14 15:56:45.877152240 +0800
            @@ -1,2 +1,3 @@
             QA output created by 072
             Silence is golden
            +Scrub find errors in "-m dup -d single" test
            ...
      
      And with the following call trace:
      
        BTRFS info (device dm-5): scrub: started on devid 1
        ------------[ cut here ]------------
        BTRFS: Transaction aborted (error -27)
        WARNING: CPU: 0 PID: 55087 at fs/btrfs/block-group.c:1890 btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
        CPU: 0 PID: 55087 Comm: btrfs Tainted: G        W  O      5.4.0-rc1-custom+ #13
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:btrfs_create_pending_block_groups+0x3e6/0x470 [btrfs]
        Call Trace:
         __btrfs_end_transaction+0xdb/0x310 [btrfs]
         btrfs_end_transaction+0x10/0x20 [btrfs]
         btrfs_inc_block_group_ro+0x1c9/0x210 [btrfs]
         scrub_enumerate_chunks+0x264/0x940 [btrfs]
         btrfs_scrub_dev+0x45c/0x8f0 [btrfs]
         btrfs_ioctl+0x31a1/0x3fb0 [btrfs]
         do_vfs_ioctl+0x636/0xaa0
         ksys_ioctl+0x67/0x90
         __x64_sys_ioctl+0x43/0x50
         do_syscall_64+0x79/0xe0
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        ---[ end trace 166c865cec7688e7 ]---
      
      [CAUSE]
      The error number -27 is -EFBIG, returned from the following call chain:
      btrfs_end_transaction()
      |- __btrfs_end_transaction()
         |- btrfs_create_pending_block_groups()
            |- btrfs_finish_chunk_alloc()
               |- btrfs_add_system_chunk()
      
      This happens because we have used up all the space in
      btrfs_super_block::sys_chunk_array.
      
      The root cause is, we have the following bad loop of creating tons of
      system chunks:
      
      1. The only SYSTEM chunk is being scrubbed
         It's very common to have only one SYSTEM chunk.
      2. New SYSTEM bg will be allocated
         btrfs_inc_block_group_ro() checks whether we have enough space
         after marking the current bg RO. If not, it allocates a new chunk.
      3. New SYSTEM bg is still empty, will be reclaimed
         During the reclaim, we will mark it RO again.
      4. That newly allocated empty SYSTEM bg gets scrubbed
         We go back to step 2, as the bg is already marked RO but not yet
         cleaned up.
      
      If the cleaner kthread doesn't get executed fast enough (e.g. only one
      CPU), then we will get more and more empty SYSTEM chunks, using up all
      the space of btrfs_super_block::sys_chunk_array.
      
      [FIX]
      Since scrub/dev-replace doesn't always need to allocate new extents,
      especially chunk tree extents, we don't really need to do chunk
      pre-allocation.
      
      To break the above spiral, introduce a new parameter to
      btrfs_inc_block_group_ro(), @do_chunk_alloc, which indicates whether we
      need extra chunk pre-allocation.
      
      For relocation, we pass @do_chunk_alloc=true, while for scrub, we pass
      @do_chunk_alloc=false.
      This should keep unnecessary empty chunks from popping up for scrub.
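
      The spiral in steps 1-4 and the effect of @do_chunk_alloc can be
      simulated in a few lines. Hypothetical assumptions: a fixed slot count
      (TOY_SYS_ARRAY_SLOTS) stands in for the byte-limited sys_chunk_array,
      and the cleaner never runs in time (the single-CPU case), so empty
      chunks are scrubbed before they can be reclaimed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical capacity; the real limit is the byte size of sys_chunk_array. */
#define TOY_SYS_ARRAY_SLOTS 8

/* Scrub visits every SYSTEM chunk, including ones created while it runs. */
static int toy_scrub_system_chunks(bool do_chunk_alloc, int *nr_chunks)
{
	int scrubbed = 0;

	*nr_chunks = 1;                       /* the only SYSTEM chunk */
	while (scrubbed < *nr_chunks) {
		/* marking the chunk RO before scrubbing it */
		if (do_chunk_alloc) {
			if (*nr_chunks == TOY_SYS_ARRAY_SLOTS)
				return -EFBIG;        /* sys_chunk_array is full */
			(*nr_chunks)++;           /* pre-allocate a new, empty chunk */
		}
		scrubbed++;
	}
	return 0;
}
```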
      
      Also, since there are two parameters for btrfs_inc_block_group_ro(),
      add more comments for it.
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename btrfs_block_group_cache · 32da5386
      Committed by David Sterba
      The type name is misleading, a single entry is named 'cache' while this
      normally means a collection of objects. Rename that everywhere. Also the
      identifier was quite long, making function prototypes harder to format.
      Suggested-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: block-group: Reuse the item key from caller of read_one_block_group() · d49a2ddb
      Committed by Qu Wenruo
      For read_one_block_group(), its only caller has already got the item key
      to search next block group item.
      
      So we can use that key directly instead of doing our own conversion on
      the stack.
      
      Also, since that key used in btrfs_read_block_groups() is vital for
      block group item search, add the 'const' keyword to that parameter to
      prevent read_one_block_group() from modifying it.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: block-group: Refactor btrfs_read_block_groups() · ffb9e0f0
      Committed by Qu Wenruo
      Refactor the work inside the loop of btrfs_read_block_groups() into one
      separate function, read_one_block_group().
      
      This allows read_one_block_group() to be reused for the later BG_TREE feature.
      
      The refactor does the following extra fix:
      - Use btrfs_fs_incompat() to replace open-coded feature check
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: drop incompat bit for raid1c34 after last block group is gone · 9c907446
      Committed by David Sterba
      When there are no raid1c3 or raid1c4 block groups left after balance
      (either convert or with other filters applied), remove the incompat bit.
      This is already done for RAID56, do the same for RAID1C34.
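
      The rule amounts to "clear the bit once no block group uses the
      profile". A hypothetical sketch: the TOY_* flag values and the
      function are illustrative stand-ins, not the real btrfs flags or
      helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical profile bits standing in for the RAID1C3/RAID1C4 flags. */
#define TOY_BG_RAID1C3        (1ULL << 9)
#define TOY_BG_RAID1C4        (1ULL << 10)
#define TOY_INCOMPAT_RAID1C34 (1ULL << 0)

/* Clear the incompat bit once no remaining block group uses RAID1C3/4. */
static uint64_t toy_update_incompat(uint64_t incompat, const uint64_t *bg_flags, int nr)
{
	for (int i = 0; i < nr; i++)
		if (bg_flags[i] & (TOY_BG_RAID1C3 | TOY_BG_RAID1C4))
			return incompat;      /* still in use: keep the bit */
	return incompat & ~TOY_INCOMPAT_RAID1C34;
}
```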
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add dedicated members for start and length of a block group · b3470b5d
      Committed by David Sterba
      The on-disk format of the block group item makes use of the key that
      stores the offset and length. The code also uses that key directly,
      which makes things harder to understand. Moreover, the key is packed,
      so the offset/length is not properly aligned as u64.
      
      Add start (key.objectid) and length (key.offset) members to block group
      and remove the embedded key.  When the item is searched or written, a
      local variable for key is used.
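
      A sketch of why the packed key is awkward and what replaces it. The
      toy_key mirrors the packed (objectid, type, offset) layout of the
      btrfs key, so offset sits at byte 9; the toy_block_group with plain
      start/length members and the helper building a local key are
      illustrative, not the real code. TOY_BLOCK_GROUP_ITEM_KEY mirrors
      the value of BTRFS_BLOCK_GROUP_ITEM_KEY (192).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packed (objectid, type, offset) key: offset is not 8-byte aligned. */
struct toy_key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
} __attribute__((packed));

#define TOY_BLOCK_GROUP_ITEM_KEY 192

struct toy_block_group {
	uint64_t start;    /* formerly key.objectid */
	uint64_t length;   /* formerly key.offset */
};

/* Build the key in a local variable only when searching or writing the item. */
static struct toy_key toy_bg_item_key(const struct toy_block_group *bg)
{
	struct toy_key key = {
		.objectid = bg->start,
		.type = TOY_BLOCK_GROUP_ITEM_KEY,
		.offset = bg->length,
	};
	return key;
}
```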
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename block_group_item on-stack accessors to follow naming · de0dc456
      Committed by David Sterba
      All accessors defined by BTRFS_SETGET_STACK_FUNCS contain _stack_ in the
      name, the block group ones were not following that scheme, so let's
      switch them.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove embedded block_group_cache::item · 3d976388
      Committed by David Sterba
      The members ::used and ::flags are now in the block group cache
      structure. The last one, chunk_objectid, is set to a fixed value and
      otherwise unused. The item is constructed from a local variable before
      write, so we can remove the embedded one from the block group.
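
      Building the item in a local variable right before the write could
      look like the sketch below. Assumptions: toy_* types and the
      "stack"-style setter are hypothetical, the item layout follows the
      (used, chunk_objectid, flags) order, the fixed chunk_objectid is
      taken to be BTRFS_FIRST_CHUNK_TREE_OBJECTID (256), and the
      little-endian conversion done by the real accessors is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk block group item layout: used, chunk_objectid, flags. */
struct toy_bg_item {
	uint64_t used;
	uint64_t chunk_objectid;
	uint64_t flags;
} __attribute__((packed));

#define TOY_FIRST_CHUNK_TREE_OBJECTID 256   /* the fixed chunk_objectid value */

/* In-memory block group: no embedded item any more. */
struct toy_block_group {
	uint64_t used;
	uint64_t flags;
};

/* Hypothetical setter, named like the "stack" accessors. */
static void toy_set_stack_block_group_used(struct toy_bg_item *item, uint64_t v)
{
	item->used = v;
}

/* Fill a local item from the in-memory fields right before writing it. */
static struct toy_bg_item toy_build_item(const struct toy_block_group *bg)
{
	struct toy_bg_item item;

	toy_set_stack_block_group_used(&item, bg->used);
	item.chunk_objectid = TOY_FIRST_CHUNK_TREE_OBJECTID;
	item.flags = bg->flags;
	return item;
}
```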
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>