1. 16 May, 2022 6 commits
    • btrfs: check-integrity: split submit_bio from btrfsic checking · 58ff51f1
      Christoph Hellwig committed
      Require a separate call to the integrity checking helpers from the
      actual bio submission.
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove unnecessary type casts · 0d031dc4
      Yu Zhe committed
      Explicit type casts are not necessary when converting void * to another
      pointer type.
      Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: expand subpage support to any PAGE_SIZE > 4K · 1a42daab
      Qu Wenruo committed
      With the recent change in metadata handling, we can handle metadata in
      the following cases:
      
      - nodesize < PAGE_SIZE and sectorsize < PAGE_SIZE
        Go subpage routine for both metadata and data.
      
      - nodesize < PAGE_SIZE and sectorsize >= PAGE_SIZE
        Invalid case for now. As we require nodesize >= sectorsize.
      
      - nodesize >= PAGE_SIZE and sectorsize < PAGE_SIZE
        Go subpage routine for data, but regular page routine for metadata.
      
      - nodesize >= PAGE_SIZE and sectorsize >= PAGE_SIZE
        Go regular page routine for both metadata and data.
      
      Now we can handle any sectorsize < PAGE_SIZE, plus the existing
      sectorsize == PAGE_SIZE support.
      
      But here we introduce an artificial limit: for any PAGE_SIZE > 4K, we
      only support 4K and PAGE_SIZE as sector size.
      
      The idea here is to reduce the test combinations, and push 4K as the
      default standard in the future.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
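The four nodesize/sectorsize cases listed in the commit above can be sketched as a small decision function. This is a userspace illustration under stated assumptions, not kernel code: the type and function names (`routines`, `pick_routines`, `sectorsize_supported`) are hypothetical, and the page size is passed as a parameter instead of being a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not kernel types. */
enum routine { ROUTINE_REGULAR, ROUTINE_SUBPAGE };

struct routines {
	bool valid;            /* false for the rejected combination */
	enum routine metadata;
	enum routine data;
};

/* Map (nodesize, sectorsize, page_size) onto the four cases in the commit. */
static struct routines pick_routines(uint32_t nodesize, uint32_t sectorsize,
				     uint32_t page_size)
{
	struct routines r = { true, ROUTINE_REGULAR, ROUTINE_REGULAR };

	/* nodesize >= sectorsize is required, so reject anything else. */
	if (nodesize < sectorsize) {
		r.valid = false;
		return r;
	}
	if (nodesize < page_size)
		r.metadata = ROUTINE_SUBPAGE;
	if (sectorsize < page_size)
		r.data = ROUTINE_SUBPAGE;
	return r;
}

/* The artificial limit: only 4K and PAGE_SIZE are accepted as sector size. */
static bool sectorsize_supported(uint32_t sectorsize, uint32_t page_size)
{
	return sectorsize == 4096 || sectorsize == page_size;
}

/* Usage example: 4K sectors with 16K nodes on 64K and on 16K pages. */
void subpage_example(void)
{
	struct routines r = pick_routines(16384, 4096, 65536);

	assert(r.valid && r.metadata == ROUTINE_SUBPAGE &&
	       r.data == ROUTINE_SUBPAGE);

	r = pick_routines(16384, 4096, 16384);
	assert(r.valid && r.metadata == ROUTINE_REGULAR &&
	       r.data == ROUTINE_SUBPAGE);

	assert(sectorsize_supported(4096, 65536));
	assert(!sectorsize_supported(16384, 65536));
}
```

The 16K page case shows the point of the series: metadata takes the regular routine while data still goes through the subpage one.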
    • btrfs: make nodesize >= PAGE_SIZE case to reuse the non-subpage routine · fbca46eb
      Qu Wenruo committed
      The reason we only support 64K page size for subpage is that, for 64K
      page size, any nodesize is guaranteed to fit into one page.

      When other page sizes come along, especially 16K, that restriction
      becomes a real limitation.
      
      To remove such limitation, we allow nodesize >= PAGE_SIZE case to go the
      non-subpage routine.  By this, we can allow 4K sectorsize on 16K page
      size.
      
      This does introduce another, smaller limitation: metadata cannot cross
      a page boundary, which most recent mkfs versions already satisfy.
      
      Another small improvement is, we can avoid the overhead for metadata if
      nodesize >= PAGE_SIZE.
      For 4K sector size and 64K page size/node size, or 4K sector size and
      16K page size/node size, we don't need to allocate extra memory for the
      metadata pages.
      
      Please note that this patch does not yet enable support for other page
      sizes.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: tree-checker: check extent buffer owner against owner rootid · 88c602ab
      Qu Wenruo committed
      Btrfs doesn't check whether the tree block respects the root owner.
      This means that if a tree block is referred to by a parent in the extent
      tree but has an owner of 5, btrfs can still continue reading the tree
      block, as long as it doesn't trigger other sanity checks.

      Normally this is fine, but combined with the empty tree check in
      check_leaf(), if we hit an empty extent tree whose root node has the
      csum tree as owner, such an extent buffer can sneak in.
      
      Shrink the hole by:
      
      - Do extra eb owner check at tree read time
      
      - Make sure the root owner extent buffer exactly matches the root id.
      
      Unfortunately we can't yet completely patch the hole, as several call
      sites can't pass all the info we need:
      
      - For reloc/log trees
        Their owner is key::offset, not key::objectid.
        We need the full root key to do that accurate check.
      
        For now, we just skip the ownership check for those trees.
      
      - For add_data_references() of relocation
        That call site doesn't have any parent/ownership info, as all the
        bytenrs are all from btrfs_find_all_leafs().
      
      - For direct backref items walk
        Direct backref items records the parent bytenr directly, thus unlike
        indirect backref item, we don't do a full tree search.
      
        Thus in that case, we don't have full parent owner to check.
      
      The latter two cases both pass 0 as @owner_root, thus we can
      skip the check if @owner_root is 0.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
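The owner check described in the commit above, including its two skip conditions, might look roughly like the following. This is a simplified sketch and not the kernel function: `check_eb_owner` and the enum are hypothetical names, the reloc/log case is reduced to a boolean, and the exact-match rule is applied to every block, which is an assumption made for brevity. The tree ids used in the example (2 for the extent tree, 5 for the fs tree, 7 for the csum tree) are the well-known btrfs root ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for the sketch. */
enum owner_check { OWNER_OK, OWNER_SKIPPED, OWNER_MISMATCH };

/*
 * eb_owner:        owner recorded in the tree block that was just read
 * owner_root:      root id the caller expects, or 0 if the caller has no
 *                  parent/ownership info (the two call sites named above)
 * is_reloc_or_log: true for reloc/log trees, whose owner lives in
 *                  key::offset and cannot be verified without the full key
 */
static enum owner_check check_eb_owner(uint64_t eb_owner, uint64_t owner_root,
				       bool is_reloc_or_log)
{
	if (owner_root == 0)
		return OWNER_SKIPPED;
	if (is_reloc_or_log)
		return OWNER_SKIPPED;
	if (eb_owner != owner_root)
		return OWNER_MISMATCH;
	return OWNER_OK;
}

/* Usage example: an extent tree (2) block claiming to belong to the csum tree (7). */
void owner_check_example(void)
{
	assert(check_eb_owner(7, 2, false) == OWNER_MISMATCH);
	assert(check_eb_owner(2, 2, false) == OWNER_OK);
	/* No ownership info available: the check is skipped. */
	assert(check_eb_owner(5, 0, false) == OWNER_SKIPPED);
}
```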
    • btrfs: remove trivial wrapper btrfs_read_buffer() · 6a2e9dc4
      Filipe Manana committed
      The function btrfs_read_buffer() is useless, it just calls
      btree_read_extent_buffer_pages() with exactly the same arguments.
      
      So remove it and rename btree_read_extent_buffer_pages() to
      btrfs_read_extent_buffer(), which is a shorter name, has the "btrfs_"
      prefix (since it's used outside disk-io.c) and the name is clear enough
      about what it does.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 06 May, 2022 1 commit
    • btrfs: force v2 space cache usage for subpage mount · 9f73f1ae
      Qu Wenruo committed
      [BUG]
      If a btrfs with 4K sector size and v1 cache enabled has so far only been
      mounted on systems with 4K page size, mounting it on a subpage (64K page
      size) system can cause the following warning about the v1 space cache:
      
       BTRFS error (device dm-1): csum mismatch on free space cache
       BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now
      
      Although not a big deal, as kernel can rebuild it without problem, such
      warning will bother end users, especially if they want to switch the
      same btrfs seamlessly between different page sized systems.
      
      [CAUSE]
      The v1 free space cache still uses a fixed PAGE_SIZE for various bitmap
      values, like BITS_PER_BITMAP.

      Such hard-coded PAGE_SIZE usage causes various mismatches, from v1
      cache size to checksum.
      
      Thus kernel will always reject v1 cache with a different PAGE_SIZE with
      csum mismatch.
      
      [FIX]
      Although we should fix v1 cache, it's already going to be marked
      deprecated soon.
      
      And we have v2 cache based on metadata (which is already fully subpage
      compatible), and it is superior to v1 cache in almost every respect.
      
      So just force subpage mount to use v2 cache on mount.
      Reported-by: Matt Corallo <blnxfsl@bluematt.me>
      CC: stable@vger.kernel.org # 5.15+
      Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
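To make the [CAUSE] section of the commit above concrete, the following sketch shows how a bitmap sized from the page size changes meaning between machines. It assumes that one v1 bitmap holds PAGE_SIZE * 8 bits with one bit per sector; the helper names are hypothetical, and the sketch only does the arithmetic rather than reproducing the kernel's cache format.

```c
#include <assert.h>
#include <stdint.h>

/* Bits held by one bitmap, assuming the bitmap occupies exactly one page. */
static uint64_t bits_per_bitmap(uint64_t page_size)
{
	return page_size * 8;
}

/* Bytes of a block group that one bitmap describes, one bit per sector. */
static uint64_t bytes_per_bitmap(uint64_t page_size, uint64_t sectorsize)
{
	return bits_per_bitmap(page_size) * sectorsize;
}

/* Usage example: the same 4K-sector filesystem on 4K and on 64K pages. */
void bitmap_example(void)
{
	/* 4K pages: 32768 bits, covering 128 MiB per bitmap. */
	assert(bits_per_bitmap(4096) == 32768);
	assert(bytes_per_bitmap(4096, 4096) == 134217728ULL);

	/* 64K pages: 524288 bits, covering 2 GiB per bitmap. */
	assert(bits_per_bitmap(65536) == 524288);
	assert(bytes_per_bitmap(65536, 4096) == 2147483648ULL);
}
```

Under that assumption, a cache written with one layout cannot be interpreted with the other, which is consistent with the checksum mismatch in the log above.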
  3. 21 Apr, 2022 1 commit
    • btrfs: zoned: use dedicated lock for data relocation · 5f0addf7
      Naohiro Aota committed
      Currently, we use btrfs_inode_{lock,unlock}() to grant an exclusive
      writeback of the relocation data inode in
      btrfs_zoned_data_reloc_{lock,unlock}(). However, that can cause a deadlock
      in the following path.
      
      Thread A takes btrfs_inode_lock() and waits for metadata reservation by,
      e.g., waiting for writeback:
      
      prealloc_file_extent_cluster()
        - btrfs_inode_lock(&inode->vfs_inode, 0);
        - btrfs_prealloc_file_range()
        ...
          - btrfs_replace_file_extents()
            - btrfs_start_transaction
            ...
              - btrfs_reserve_metadata_bytes()
      
      Thread B (e.g., doing writeback work) needs to wait for the inode lock to
      continue the writeback process:
      
      do_writepages
        - btrfs_writepages
          - extent_writepages
            - btrfs_zoned_data_reloc_lock(BTRFS_I(inode));
              - btrfs_inode_lock()
      
      The deadlock is caused by relying on the vfs_inode's lock. By using it, we
      introduced unnecessary exclusion of writeback and
      btrfs_prealloc_file_range(). Also, the lock at this point is useless as we
      don't have any dirty pages in the inode yet.
      
      Introduce fs_info->zoned_data_reloc_io_lock and use it for the exclusive
      writeback.
      
      Fixes: 35156d85 ("btrfs: zoned: only allow one process to add pages to a relocation inode")
      CC: stable@vger.kernel.org # 5.16.x: 869f4cdc: btrfs: zoned: encapsulate inode locking for zoned relocation
      CC: stable@vger.kernel.org # 5.16.x
      CC: stable@vger.kernel.org # 5.17
      Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 06 Apr, 2022 1 commit
  5. 15 Mar, 2022 2 commits
  6. 14 Mar, 2022 7 commits
  7. 02 Mar, 2022 1 commit
    • btrfs: do not start relocation until in progress drops are done · b4be6aef
      Josef Bacik committed
      We hit a bug with a recovering relocation on mount for one of our file
      systems in production.  I reproduced this locally by injecting errors
      into snapshot delete with balance running at the same time.  This
      presented as an error while looking up an extent item
      
        WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
        CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
        RIP: 0010:lookup_inline_extent_backref+0x647/0x680
        RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
        RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
        RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
        R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
        R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
        FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
        Call Trace:
         <TASK>
         insert_inline_extent_backref+0x46/0xd0
         __btrfs_inc_extent_ref.isra.0+0x5f/0x200
         ? btrfs_merge_delayed_refs+0x164/0x190
         __btrfs_run_delayed_refs+0x561/0xfa0
         ? btrfs_search_slot+0x7b4/0xb30
         ? btrfs_update_root+0x1a9/0x2c0
         btrfs_run_delayed_refs+0x73/0x1f0
         ? btrfs_update_root+0x1a9/0x2c0
         btrfs_commit_transaction+0x50/0xa50
         ? btrfs_update_reloc_root+0x122/0x220
         prepare_to_merge+0x29f/0x320
         relocate_block_group+0x2b8/0x550
         btrfs_relocate_block_group+0x1a6/0x350
         btrfs_relocate_chunk+0x27/0xe0
         btrfs_balance+0x777/0xe60
         balance_kthread+0x35/0x50
         ? btrfs_balance+0xe60/0xe60
         kthread+0x16b/0x190
         ? set_kthread_struct+0x40/0x40
         ret_from_fork+0x22/0x30
         </TASK>
      
      Normally snapshot deletion and relocation are excluded from running at
      the same time by the fs_info->cleaner_mutex.  However if we had a
      pending balance waiting to get the ->cleaner_mutex, and a snapshot
      deletion was running, and then the box crashed, we would come up in a
      state where we have a half deleted snapshot.
      
      Again, in the normal case the snapshot deletion needs to complete before
      relocation can start, but in this case relocation could very well start
      before the snapshot deletion completes, as we simply add the root to the
      dead roots list and wait for the next time the cleaner runs to clean up
      the snapshot.
      
      Fix this by setting a bit on the fs_info if we have any DEAD_ROOT's that
      had a pending drop_progress key.  If they do then we know we were in the
      middle of the drop operation and set a flag on the fs_info.  Then
      balance can wait until this flag is cleared to start up again.
      
      If there are DEAD_ROOT's that don't have a drop_progress set then we're
      safe to start balance right away as we'll be properly protected by the
      cleaner_mutex.
      
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
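The fix described in the commit above (flag the filesystem when a dead root has a pending drop_progress, and hold balance back until the flag clears) can be modelled in a few lines. This is a sketch under assumptions: all struct and function names are hypothetical, the pending-drop counter is an illustrative way to know when to clear the flag, and the real code makes balance wait rather than poll a predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a dead root and for the fs-wide state. */
struct dead_root_sketch {
	bool has_drop_progress;	/* deletion was interrupted midway */
};

struct fs_info_sketch {
	size_t pending_drops;	/* interrupted drops not yet finished */
	bool unfinished_drops;	/* the flag balance has to respect */
};

/* At mount: set the flag if any dead root was in the middle of a drop. */
static void scan_dead_roots(struct fs_info_sketch *fs,
			    const struct dead_root_sketch *roots, size_t n)
{
	fs->pending_drops = 0;
	for (size_t i = 0; i < n; i++) {
		if (roots[i].has_drop_progress)
			fs->pending_drops++;
	}
	fs->unfinished_drops = fs->pending_drops > 0;
}

/* Called when the cleaner finishes one interrupted drop. */
static void finish_drop(struct fs_info_sketch *fs)
{
	assert(fs->pending_drops > 0);
	if (--fs->pending_drops == 0)
		fs->unfinished_drops = false;
}

/* Balance may only start once no interrupted drop is left. */
static bool balance_may_start(const struct fs_info_sketch *fs)
{
	return !fs->unfinished_drops;
}

/* Usage example: one half-deleted snapshot blocks balance until cleaned. */
void drop_example(void)
{
	struct dead_root_sketch roots[] = { { false }, { true } };
	struct fs_info_sketch fs;

	scan_dead_roots(&fs, roots, 2);
	assert(!balance_may_start(&fs));
	finish_drop(&fs);
	assert(balance_may_start(&fs));
}
```

Dead roots without drop_progress never set the flag, matching the commit's note that those remain protected by the cleaner_mutex alone.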
  8. 02 Feb, 2022 2 commits
  9. 07 Jan, 2022 3 commits
    • btrfs: output more debug messages for uncommitted transaction · 36c86a9e
      Qu Wenruo committed
      Print extra information about how many dirty bytes an uncommitted
      transaction has at the end of mount.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove reada infrastructure · f26c9238
      Qu Wenruo committed
      Currently there is only one user for btrfs metadata readahead, and
      that's scrub.
      
      But even for the single user, it's not providing the correct
      functionality it needs, as scrub needs reada for commit root, which
      current readahead can't provide. (Although it's pretty easy to add such
      a feature).
      
      Despite this, there are some extra problems related to metadata
      readahead:
      
      - Duplicated feature with btrfs_path::reada
      
      - Partly duplicated feature of btrfs_fs_info::buffer_radix
        Btrfs already caches its metadata in buffer_radix, while readahead
        tries to read the tree block no matter if it's already cached.
      
      - Poor layer separation
        Metadata readahead works kinda at device level.
        This is definitely not the correct layer it should be, since metadata
        is at btrfs logical address space, it should not bother device at all.
      
        This gives bugs extra chances to sneak in and adds
        unnecessary complexity.
      
      - Dead code
        In the very beginning of scrub.c we have #undef DEBUG, rendering all
        the debug related code useless and unable to test.
      
      Thus here I propose to remove the metadata readahead mechanism
      completely.
      
      [BENCHMARK]
      There is a full benchmark for the scrub performance difference using the
      old btrfs_reada_add() and btrfs_path::reada.
      
      For the worst case (no dirty metadata, slow HDD), there could be a 5%
      performance drop for scrub.
      For other cases (even SATA SSD), there is no distinguishable performance
      difference.
      
      The number is reported scrub speed, in MiB/s.
      The resolution is limited by the reported duration, which only has a
      resolution of 1 second.
      
      	Old		New		Diff
      SSD	455.3		466.332		+2.42%
      HDD	103.927 	98.012		-5.69%
      
      Comprehensive test methodology is in the cover letter of the patch.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make send work with concurrent block group relocation · d96b3424
      Filipe Manana committed
      We don't allow send and balance/relocation to run in parallel in order
      to prevent send failing or silently producing some bad stream. This is
      because while send is using an extent (especially metadata) or about to
      read a metadata extent and expecting it belongs to a specific parent
      node, relocation can run, the transaction used for the relocation is
      committed and the extent gets reallocated while send is still using the
      extent, so it ends up with a different content than expected. This can
      result in just failing to read a metadata extent due to failure of the
      validation checks (parent transid, level, etc), failure to find a
      backreference for a data extent, and other unexpected failures. Besides
      reallocation, there's also a similar problem of an extent getting
      discarded when it's unpinned after the transaction used for block group
      relocation is committed.
      
      The restriction between balance and send was added in commit 9e967495
      ("Btrfs: prevent send failures and crashes due to concurrent relocation"),
      kernel 5.3, while the more general restriction between send and relocation
      was added in commit 1cea5cf0 ("btrfs: ensure relocation never runs
      while we have send operations running"), kernel 5.14.
      
      Both send and relocation can be very long running operations. Relocation
      because it has to do a lot of IO and expensive backreference lookups in
      case there are many snapshots, and send due to read IO when operating on
      very large trees. This makes it inconvenient for users and tools to deal
      with scheduling both operations.
      
      For zoned filesystems we also have automatic block group relocation, so
      send can fail with -EAGAIN when users least expect it or send can end up
      delaying the block group relocation for too long. In the future we might
      also get the automatic block group relocation for non zoned filesystems.
      
      This change makes it possible for send and relocation to run in parallel.
      This is achieved the following way:
      
      1) For all tree searches, send acquires a read lock on the commit root
         semaphore;
      
      2) After each tree search, and before releasing the commit root semaphore,
         the leaf is cloned and placed in the search path (struct btrfs_path);
      
      3) After releasing the commit root semaphore, the changed_cb() callback
         is invoked, which operates on the leaf and writes commands to the pipe
         (or file in case send/receive is not used with a pipe). It's important
         here to not hold a lock on the commit root semaphore, because if we did
         we could deadlock when sending and receiving to the same filesystem
         using a pipe - the send task blocks on the pipe because it's full, the
         receive task, which is the only consumer of the pipe, triggers a
         transaction commit when attempting to create a subvolume or reserve
         space for a write operation for example, but the transaction commit
         blocks trying to write lock the commit root semaphore, resulting in a
         deadlock;
      
      4) Before moving to the next key, or advancing to the next change in case
         of an incremental send, check if a transaction used for relocation was
         committed (or is about to finish its commit). If so, release the search
         path(s) and restart the search, to where we were before, so that we
         don't operate on stale extent buffers. The search restarts are always
         possible because both the send and parent roots are RO, and no one can
         add, remove or update keys (change their offset) in RO trees - the
         only exception is deduplication, but that is still not allowed to run
         in parallel with send;
      
      5) Periodically check if there is contention on the commit root semaphore,
         which means there is a transaction commit trying to write lock it, and
         release the semaphore and reschedule if there is contention, so as to
         avoid causing any significant delays to transaction commits.
      
      This leaves some room for optimizations for send to have fewer path
      releases and less re-searching of the trees when there's relocation
      running, but for now it's kept simple as it performs quite well (on very
      large trees with resulting send streams in the order of a few hundred
      gigabytes).
      
      Test case btrfs/187, from fstests, stresses relocation, send and
      deduplication attempting to run in parallel, but without verifying if send
      succeeds and if it produces correct streams. A new test case will be added
      that exercises relocation happening in parallel with send and then checks
      that send succeeds and the resulting streams are correct.
      
      A final note is that for now this still leaves the mutual exclusion
      between send operations and deduplication on files belonging to a root
      used by send operations. A solution for that will be slightly more complex
      but it will eventually be built on top of this change.
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
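Step 4 of the commit above (restart the search when a relocation transaction has committed, relying on the trees being read-only) can be illustrated with a toy walk over a sorted key array. Everything here is a simplification and an assumption: the names are hypothetical, the tree is a flat array, the relocation transaction id observed at each step is supplied by the caller, and the commit root semaphore and leaf cloning of steps 1 to 3 are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the first slot whose key is >= key (linear scan for brevity). */
static size_t search_slot(const uint64_t *keys, size_t n, uint64_t key)
{
	size_t slot = 0;

	while (slot < n && keys[slot] < key)
		slot++;
	return slot;
}

struct walk_result {
	size_t processed;	/* keys handed to the callback stand-in */
	size_t restarts;	/* searches redone after a relocation commit */
};

/*
 * reloc_trans_seen[i] is the id of the last relocation transaction as
 * observed right after key i was processed; last_reloc_trans is the id
 * recorded when the walk started.
 */
static struct walk_result send_walk(const uint64_t *keys, size_t n,
				    const uint64_t *reloc_trans_seen,
				    uint64_t last_reloc_trans)
{
	struct walk_result res = { 0, 0 };
	size_t slot = 0;

	while (slot < n) {
		uint64_t cur = keys[slot];

		res.processed++;	/* stand-in for changed_cb() */

		/* Before moving on, check whether relocation committed. */
		if (reloc_trans_seen[slot] != last_reloc_trans) {
			last_reloc_trans = reloc_trans_seen[slot];
			res.restarts++;
			/* The tree is read-only, so the key is still there. */
			slot = search_slot(keys, n, cur);
			assert(slot < n && keys[slot] == cur);
		}
		slot++;
	}
	return res;
}

/* Usage example: one relocation commit happens while key 30 is in use. */
void send_restart_example(void)
{
	const uint64_t keys[] = { 10, 20, 30, 40 };
	const uint64_t seen[] = { 5, 5, 7, 7 };
	struct walk_result res = send_walk(keys, 4, seen, 5);

	assert(res.processed == 4);
	assert(res.restarts == 1);
}
```

The assertion inside the loop encodes the property the commit relies on: because nothing can add, remove or update keys in a read-only tree, a restarted search lands on the same key.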
  10. 03 Jan, 2022 12 commits
  11. 14 Dec, 2021 1 commit
    • btrfs: fix double free of anon_dev after failure to create subvolume · 33fab972
      Filipe Manana committed
      When creating a subvolume, at create_subvol(), we allocate an anonymous
      device and later call btrfs_get_new_fs_root(), which in turn just calls
      btrfs_get_root_ref(). There we call btrfs_init_fs_root() which assigns
      the anonymous device to the root, but if after that call there's an error,
      when we jump to 'fail' label, we call btrfs_put_root(), which frees the
      anonymous device and then returns an error that is propagated back to
      create_subvol(). Then create_subvol() frees the anonymous device again.
      
      When this happens, if the anonymous device was not reallocated after
      the first time it was freed with btrfs_put_root(), we get a kernel
      message like the following:
      
        (...)
        [13950.282466] BTRFS: error (device dm-0) in create_subvol:663: errno=-5 IO failure
        [13950.283027] ida_free called for id=65 which is not allocated.
        [13950.285974] BTRFS info (device dm-0): forced readonly
        (...)
      
      If the anonymous device gets reallocated by another btrfs filesystem
      or any other kernel subsystem, then bad things can happen.
      
      So fix this by setting the root's anonymous device to 0 at
      btrfs_get_root_ref(), before we call btrfs_put_root(), if an error
      happened.
      
      Fixes: 2dfb1e43 ("btrfs: preallocate anon block device at first phase of snapshot creation")
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
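The double free described in the commit above, and the effect of clearing the root's anonymous device before dropping the root, can be reproduced with a toy id allocator. This is a sketch, not the kernel code: the pool, the structs and the `*_failing` functions are hypothetical stand-ins for the anonymous device allocator, the root, btrfs_get_root_ref() and create_subvol(), and the failure is simulated unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 8

/* Toy id allocator standing in for the anonymous device allocator. */
struct id_pool {
	bool allocated[POOL_SIZE];
	int double_frees;	/* frees of an id that was not allocated */
};

/* Return the lowest free id >= 1, or 0 if the pool is exhausted. */
static int id_alloc(struct id_pool *pool)
{
	for (int id = 1; id < POOL_SIZE; id++) {
		if (!pool->allocated[id]) {
			pool->allocated[id] = true;
			return id;
		}
	}
	return 0;
}

static void id_free(struct id_pool *pool, int id)
{
	assert(id > 0 && id < POOL_SIZE);
	if (!pool->allocated[id]) {
		pool->double_frees++;	/* the "not allocated" warning case */
		return;
	}
	pool->allocated[id] = false;
}

struct root_sketch {
	int anon_dev;	/* 0 means the root owns no id */
};

/* Dropping the root frees its id, if it owns one. */
static void put_root(struct id_pool *pool, struct root_sketch *root)
{
	if (root->anon_dev)
		id_free(pool, root->anon_dev);
	root->anon_dev = 0;
}

/* Models a failure after the id was already assigned to the root. */
static int get_root_ref_failing(struct id_pool *pool, int anon_dev,
				bool with_fix)
{
	struct root_sketch root = { anon_dev };

	if (with_fix)
		root.anon_dev = 0;	/* leave ownership with the caller */
	put_root(pool, &root);
	return -5;			/* stand-in for -EIO */
}

/* Models the caller, which frees the id itself on error. */
static int create_subvol_failing(struct id_pool *pool, bool with_fix)
{
	int anon_dev = id_alloc(pool);
	int ret = get_root_ref_failing(pool, anon_dev, with_fix);

	if (ret < 0)
		id_free(pool, anon_dev);
	return ret;
}

/* Usage example: the same failure with and without the fix. */
void anon_dev_example(void)
{
	struct id_pool before = { { false }, 0 };
	struct id_pool after = { { false }, 0 };

	create_subvol_failing(&before, false);
	create_subvol_failing(&after, true);
	assert(before.double_frees == 1);
	assert(after.double_frees == 0);
}
```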
  12. 16 Nov, 2021 1 commit
    • btrfs: check-integrity: fix a warning on write caching disabled disk · a91cf0ff
      Wang Yugui committed
      When a disk has write caching disabled, we skip submission of a bio with
      flush and sync requests before writing the superblock, since it's not
      needed. However when the integrity checker is enabled, this results in
      reports that there are metadata blocks referred by a superblock that
      were not properly flushed. So don't skip the bio submission only when
      the integrity checker is enabled for the sake of simplicity, since this
      is a debug tool and not meant for use in non-debug builds.
      
      fstests btrfs/220 triggers a check-integrity warning like the following
      when CONFIG_BTRFS_FS_CHECK_INTEGRITY=y and the disk has WCE=0.
      
        btrfs: attempt to write superblock which references block M @5242880 (sdb2/5242880/0) which is not flushed out of disk's write cache (block flush_gen=1, dev->flush_gen=0)!
        ------------[ cut here ]------------
        WARNING: CPU: 28 PID: 843680 at fs/btrfs/check-integrity.c:2196 btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
        CPU: 28 PID: 843680 Comm: umount Not tainted 5.15.0-0.rc5.39.el8.x86_64 #1
        Hardware name: Dell Inc. Precision T7610/0NK70N, BIOS A18 09/11/2019
        RIP: 0010:btrfsic_process_written_superblock+0x22a/0x2a0 [btrfs]
        RSP: 0018:ffffb642afb47940 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
        RDX: 00000000ffffffff RSI: ffff8b722fc97d00 RDI: ffff8b722fc97d00
        RBP: ffff8b5601c00000 R08: 0000000000000000 R09: c0000000ffff7fff
        R10: 0000000000000001 R11: ffffb642afb476f8 R12: ffffffffffffffff
        R13: ffffb642afb47974 R14: ffff8b5499254c00 R15: 0000000000000003
        FS:  00007f00a06d4080(0000) GS:ffff8b722fc80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fff5cff5ff0 CR3: 00000001c0c2a006 CR4: 00000000001706e0
        Call Trace:
         btrfsic_process_written_block+0x2f7/0x850 [btrfs]
         __btrfsic_submit_bio.part.19+0x310/0x330 [btrfs]
         ? bio_associate_blkg_from_css+0xa4/0x2c0
         btrfsic_submit_bio+0x18/0x30 [btrfs]
         write_dev_supers+0x81/0x2a0 [btrfs]
         ? find_get_pages_range_tag+0x219/0x280
         ? pagevec_lookup_range_tag+0x24/0x30
         ? __filemap_fdatawait_range+0x6d/0xf0
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         ? find_first_extent_bit+0x9b/0x160 [btrfs]
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         write_all_supers+0x1b3/0xa70 [btrfs]
         ? __raw_callee_save___native_queued_spin_unlock+0x11/0x1e
         btrfs_commit_transaction+0x59d/0xac0 [btrfs]
         close_ctree+0x11d/0x339 [btrfs]
         generic_shutdown_super+0x71/0x110
         kill_anon_super+0x14/0x30
         btrfs_kill_super+0x12/0x20 [btrfs]
         deactivate_locked_super+0x31/0x70
         cleanup_mnt+0xb8/0x140
         task_work_run+0x6d/0xb0
         exit_to_user_mode_prepare+0x1f0/0x200
         syscall_exit_to_user_mode+0x12/0x30
         do_syscall_64+0x46/0x80
         entry_SYSCALL_64_after_hwframe+0x44/0xae
        RIP: 0033:0x7f009f711dfb
        RSP: 002b:00007fff5cff7928 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
        RAX: 0000000000000000 RBX: 000055b68c6c9970 RCX: 00007f009f711dfb
        RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000055b68c6c9b50
        RBP: 0000000000000000 R08: 000055b68c6ca900 R09: 00007f009f795580
        R10: 0000000000000000 R11: 0000000000000246 R12: 000055b68c6c9b50
        R13: 00007f00a04bf184 R14: 0000000000000000 R15: 00000000ffffffff
        ---[ end trace 2c4b82abcef9eec4 ]---
        S-65536(sdb2/65536/1)
         -->
        M-1064960(sdb2/1064960/1)
      Reviewed-by: Filipe Manana <fdmanana@gmail.com>
      Signed-off-by: Wang Yugui <wangyugui@e16-tech.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
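The rule stated in the commit above reduces to a two-input decision: the flush bio may be skipped only when the disk has write caching disabled and the integrity checker is not in use. The sketch below expresses that as a runtime predicate with a hypothetical name; the commit ties the checker to a build option (CONFIG_BTRFS_FS_CHECK_INTEGRITY), so treating it as a boolean argument is an assumption made to keep the example runnable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the flush bio has to be submitted before the superblock
 * is written. With the integrity checker enabled the flush is always
 * submitted, so the checker sees the blocks as flushed.
 */
static bool should_submit_flush(bool write_cache_enabled,
				bool integrity_checker_enabled)
{
	if (integrity_checker_enabled)
		return true;
	return write_cache_enabled;
}

/* Usage example: a WCE=0 disk with and without the checker. */
void flush_example(void)
{
	assert(!should_submit_flush(false, false));
	assert(should_submit_flush(false, true));
}
```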
  13. 29 Oct, 2021 1 commit
    • btrfs: call btrfs_check_rw_degradable only if there is a missing device · 5c78a5e7
      Anand Jain committed
      In open_ctree() in btrfs_check_rw_degradable() [1], we check each block
      group individually if at least the minimum number of devices is available
      for that profile. If all the devices are available, then we don't have to
      check degradable.
      
      [1]
      open_ctree()
      ::
      3559 if (!sb_rdonly(sb) && !btrfs_check_rw_degradable(fs_info, NULL)) {
      
      Also before calling btrfs_check_rw_degradable() in open_ctree() at the
      line number shown below [2] we call btrfs_read_chunk_tree() and down to
      add_missing_dev() to record number of missing devices.
      
      [2]
      open_ctree()
      ::
      3454         ret = btrfs_read_chunk_tree(fs_info);
      
      btrfs_read_chunk_tree()
        read_one_chunk() / read_one_dev()
          add_missing_dev()
      
      So, check if there is any missing device before btrfs_check_rw_degradable()
      in open_ctree().
      
      Also, with this the mount command could save ~16ms [3] in the most
      common case, that is, when no device is missing.
      
      [3]
       1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();
      
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
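The short-circuit introduced by the commit above can be sketched as follows: the per-block-group degradable check only runs for a read-write mount with at least one missing device. The struct, its fields and both function names are hypothetical, and the expensive check is replaced by a stub that counts how often it is called.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mount-time state for the sketch. */
struct fs_sketch {
	bool read_only;			/* mounted read-only */
	unsigned int missing_devices;	/* recorded while reading the chunk tree */
	bool degradable;		/* result the expensive check would give */
	unsigned int degradable_checks;	/* how often the expensive check ran */
};

/* Stub for the expensive per-block-group check. */
static bool check_rw_degradable(struct fs_sketch *fs)
{
	fs->degradable_checks++;
	return fs->degradable;
}

/* Returns true if the mount may continue. */
static bool mount_may_proceed(struct fs_sketch *fs)
{
	if (fs->read_only)
		return true;
	/* All devices present: nothing to check. */
	if (fs->missing_devices == 0)
		return true;
	return check_rw_degradable(fs);
}

/* Usage example: the common case never runs the expensive check. */
void degradable_example(void)
{
	struct fs_sketch healthy = { false, 0, true, 0 };
	struct fs_sketch broken = { false, 1, false, 0 };

	assert(mount_may_proceed(&healthy));
	assert(healthy.degradable_checks == 0);

	assert(!mount_may_proceed(&broken));
	assert(broken.degradable_checks == 1);
}
```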
  14. 27 Oct, 2021 1 commit