1. 23 Feb 2021, 6 commits
    • btrfs: fix race between writes to swap files and scrub · 195a49ea
      Committed by Filipe Manana
      When we activate a swap file, at btrfs_swap_activate(), we acquire
      the exclusive operation lock to prevent the physical location of
      the swap file extents from being changed by operations such as
      balance and device replace/resize/remove. There we also call
      can_nocow_extent() which, among other things, checks if the block
      group of a swap file extent is currently RO, and if it is we
      cannot use the extent, since a write into it would result in
      COWing the extent.
      
      However, we have no protection against a scrub operation running
      after we activate the swap file, which can result in the swap file
      extents being COWed while the scrub is running and operating on
      the respective block group, because scrub turns a block group RO
      before processing it and back to RW mode afterwards. That means an
      attempt to write into a swap file extent while scrub is processing
      the respective block group will result in COWing the extent,
      changing its physical location on disk.
      
      Fix this by making sure that block groups with extents used by
      active swap files cannot be turned RO, therefore making it
      impossible for a scrub to turn them RO. When a scrub finds a block
      group that cannot be turned RO due to the existence of extents
      used by swap files, it proceeds to the next block group and logs a
      warning message mentioning that the block group was skipped due to
      active swap files - this is the same approach we currently use for
      balance.
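
      The core of the fix can be modelled as a per-block-group counter
      of swap-file extents, checked against the RO flag under one lock.
      Below is a minimal user-space sketch of that scheme; the struct
      layout and the names (swap_extents, bg_set_ro, ...) are
      illustrative assumptions, not the exact kernel API:

          #include <pthread.h>
          #include <stdbool.h>

          struct block_group {
              pthread_mutex_t lock;  /* stands in for the kernel spinlock */
              bool ro;               /* group is read-only (scrub, balance) */
              int swap_extents;      /* extents used by active swap files */
          };

          /* Swap activation: fails if the group is already RO. */
          static bool bg_inc_swap_extents(struct block_group *bg)
          {
              bool ret = true;

              pthread_mutex_lock(&bg->lock);
              if (bg->ro)
                  ret = false;       /* a write here would COW the extent */
              else
                  bg->swap_extents++;
              pthread_mutex_unlock(&bg->lock);
              return ret;
          }

          /* Scrub/balance: fails if a swap file pins the group. */
          static bool bg_set_ro(struct block_group *bg)
          {
              bool ret = true;

              pthread_mutex_lock(&bg->lock);
              if (bg->swap_extents > 0)
                  ret = false;       /* skip group and log a warning */
              else
                  bg->ro = true;
              pthread_mutex_unlock(&bg->lock);
              return ret;
          }

      Because both helpers serialize on the same lock, a group can never
      become RO while swap extents are registered, and vice versa.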
      
      Fixes: ed46ff3d ("Btrfs: support swap files")
      CC: stable@vger.kernel.org # 5.4+
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: avoid checking for RO block group twice during nocow writeback · 20903032
      Committed by Filipe Manana
      During the nocow writeback path, we currently iterate the rbtree
      of block groups twice: once to check if the target block group is
      RO, with the call to btrfs_extent_readonly(), and once again to
      get a nocow reference on the block group, with a call to
      btrfs_inc_nocow_writers().

      Since btrfs_inc_nocow_writers() already returns false when the
      target block group is RO, remove the call to
      btrfs_extent_readonly(). Not only do we avoid searching the block
      group rbtree twice, it also helps reduce contention on the lock
      that protects it (especially since it is a spin lock and not a
      read-write lock). That may make a noticeable difference on very
      large filesystems with thousands of allocated block groups.
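
      As a rough sketch of the resulting pattern (the types and the
      linear search below are illustrative stand-ins for the kernel's
      rbtree and spinlock), the RO check is folded into the single
      lookup that takes the nocow reference:

          #include <pthread.h>
          #include <stdbool.h>
          #include <stdint.h>
          #include <stddef.h>

          struct block_group {
              uint64_t start, len;
              bool ro;
              int nocow_writers;
          };

          struct bg_tree {
              pthread_mutex_t lock;       /* the contended lock */
              struct block_group *groups;
              size_t nr;
          };

          /* One search under one lock acquisition; RO groups fail here,
           * so no separate readonly lookup is needed beforehand. */
          static bool inc_nocow_writers(struct bg_tree *t, uint64_t bytenr)
          {
              bool ret = false;
              size_t i;

              pthread_mutex_lock(&t->lock);
              for (i = 0; i < t->nr; i++) {  /* stand-in for rbtree walk */
                  struct block_group *bg = &t->groups[i];

                  if (bytenr >= bg->start && bytenr < bg->start + bg->len) {
                      if (!bg->ro) {
                          bg->nocow_writers++;
                          ret = true;
                      }
                      break;
                  }
              }
              pthread_mutex_unlock(&t->lock);
              return ret;
          }
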
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix race between extent freeing/allocation when using bitmaps · 3c179165
      Committed by Nikolay Borisov
      During allocation the allocator will try to allocate an extent
      using cluster policy. Once the current cluster is exhausted it
      will remove the entry under btrfs_free_cluster::lock and
      subsequently acquire btrfs_free_space_ctl::tree_lock to dispose of
      the already-deleted entry and adjust
      btrfs_free_space_ctl::total_bitmaps. This poses a problem because
      there exists a race between removing the entry under one lock and
      doing the necessary accounting while holding a different lock,
      since extent freeing only uses the second lock. This can result in
      the following situation:
      
      T1:                                  T2:
      btrfs_alloc_from_cluster             insert_into_bitmap <holds tree_lock>
       if (entry->bytes == 0)               if (block_group && !list_empty(
          rb_erase(entry)                       &block_group->cluster_list)) {

       spin_unlock(&cluster->lock);
         (total_bitmaps is still 4)          spin_lock(&cluster->lock);
                                             <doesn't find entry in
                                              cluster->root>
       spin_lock(&ctl->tree_lock);           <goes to new_bitmap label,
       <blocked since T2 holds tree_lock>     adds a new entry and calls
                                              add_new_bitmap>
                                              recalculate_thresholds
                                              <crashes, due to total_bitmaps
                                               becoming 5 and triggering an
                                               ASSERT>
      
      To fix this, ensure that once depleted, the cluster entry is
      deleted while both the cluster lock and the tree lock are held in
      the allocator (T1). This ensures that even if there is a race with
      a concurrent insert_into_bitmap call, it will correctly find the
      entry in the cluster and add the new space to it.
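
      A minimal sketch of the corrected ordering (pthread mutexes and
      the helper names stand in for the kernel primitives; this is an
      illustration, not the exact patch):

          #include <pthread.h>

          struct free_space_ctl {
              pthread_mutex_t tree_lock;  /* protects entries and counters */
              int total_bitmaps;
          };

          struct free_cluster {
              pthread_mutex_t lock;
          };

          struct fs_entry {
              unsigned long bytes;
              int is_bitmap;
          };

          static void remove_entry(struct free_cluster *c, struct fs_entry *e)
          {
              (void)c; (void)e;           /* rb_erase() in the real code */
          }

          static void alloc_from_cluster(struct free_space_ctl *ctl,
                                         struct free_cluster *cluster,
                                         struct fs_entry *entry)
          {
              /* ... the allocation consumed the last bytes of @entry ... */
              if (entry->bytes == 0) {
                  /* Take both locks before removing the entry, so the
                   * removal and the total_bitmaps update appear as one
                   * atomic step to a concurrent insert_into_bitmap(). */
                  pthread_mutex_lock(&ctl->tree_lock);
                  pthread_mutex_lock(&cluster->lock);
                  remove_entry(cluster, entry);
                  if (entry->is_bitmap)
                      ctl->total_bitmaps--;
                  pthread_mutex_unlock(&cluster->lock);
                  pthread_mutex_unlock(&ctl->tree_lock);
              }
          }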
      
      CC: <stable@vger.kernel.org> # 4.4+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make check_compressed_csum() to be subpage compatible · 04d4ba4c
      Committed by Qu Wenruo
      Currently check_compressed_csum() completely relies on sectorsize ==
      PAGE_SIZE to do checksum verification for compressed extents.
      
      To make it subpage compatible, this patch will:
      - Do extra calculation for the csum range
        Since there are multiple sectors inside a page, we need to hash
        only the range we want, not the full page anymore.
      
      - Do sector-by-sector hash inside the page
      
      With this patch and the previous conversion of
      btrfs_submit_compressed_read(), we can now read subpage compressed
      extents properly and do proper csum verification.
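
      A hedged sketch of the per-sector hashing (PAGE_SIZE, the toy
      hash, and the function names are illustrative; the real code uses
      the filesystem's configured checksum):

          #include <stdint.h>

          #define PAGE_SIZE 4096u

          /* toy stand-in hash; real code uses the fs checksum (e.g. crc32c) */
          static uint32_t csum(const uint8_t *data, uint32_t len)
          {
              uint32_t h = 2166136261u;

              while (len--)
                  h = (h ^ *data++) * 16777619u;
              return h;
          }

          /* Verify each sector of @page independently against @expected[]. */
          static int check_page_csums(const uint8_t *page, uint32_t sectorsize,
                                      const uint32_t *expected)
          {
              uint32_t off, i = 0;

              for (off = 0; off < PAGE_SIZE; off += sectorsize, i++) {
                  /* hash only sectorsize bytes, not the whole page */
                  if (csum(page + off, sectorsize) != expected[i])
                      return -1;      /* mismatch in sector i */
              }
              return 0;
          }
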
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make btrfs_submit_compressed_read() subpage compatible · be6a1361
      Committed by Qu Wenruo
      For compressed read, we always submit page reads using the page
      size. This doesn't work well with subpage, as for subpage one page
      can contain several sectors. Such a submission will read a range
      beyond what we want and cause problems.

      Thankfully, to make it subpage compatible, we only need to change
      how the last page of the compressed extent is read.
      
      Instead of always adding a full page to the compressed read bio,
      if we're at the last page, calculate the size using the compressed
      length, so that we only add part of the range into the compressed
      read bio.

      While we are here, also change the PAGE_SIZE used in
      lookup_extent_mapping() to sectorsize. This modification won't
      cause any functional change, as lookup_extent_mapping() can handle
      the case where the search range is larger than the found extent
      range.
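
      The last-page sizing reduces to a simple clamp; a minimal sketch
      (names are illustrative):

          #include <stdint.h>

          #define PAGE_SIZE 4096u

          /* Bytes of the current page that belong to the compressed
           * extent: a full page everywhere except, possibly, the last. */
          static uint64_t bytes_to_add(uint64_t cur_offset,
                                       uint64_t compressed_len)
          {
              uint64_t remaining = compressed_len - cur_offset;

              return remaining < PAGE_SIZE ? remaining : PAGE_SIZE;
          }
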
      Reviewed-by: Anand Jain <anand.jain@oracle.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix raid6 qstripe kmap · d70cef0d
      Committed by Ira Weiny
      When a qstripe is required, an extra page is allocated and mapped.
      There were three problems:
      
      1) There is no corresponding call of kunmap() for the qstripe page.
      2) There is no reason to map the qstripe page more than once if the
         number of bits set in rbio->dbitmap is greater than one.
      3) There is no reason to map the parity page and unmap it each time
         through the loop.
      
      The page memory can continue to be reused with a single mapping on
      each iteration by raid6_call.gen_syndrome() without remapping, so
      map the page for the duration of the loop.

      Similarly, improve the algorithm by mapping the parity page just
      once.
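
      A sketch of the resulting mapping discipline (map_page/unmap_page
      are toy stand-ins for kmap()/kunmap(); the syndrome call and the
      bitmap walk are elided to keep the example self-contained):

          #include <stddef.h>

          struct page;

          static void *map_page(struct page *p) { return (void *)p; }
          static void unmap_page(struct page *p) { (void)p; }
          static void gen_syndrome(void *p, void *q) { (void)p; (void)q; }

          static void scrub_syndrome(struct page *p_page,
                                     struct page *q_page, size_t nsectors)
          {
              void *p = map_page(p_page);                  /* map P once */
              void *q = q_page ? map_page(q_page) : NULL;  /* map Q once */
              size_t i;

              for (i = 0; i < nsectors; i++) {
                  /* the same mappings are reused on every iteration */
                  gen_syndrome(p, q);
              }

              if (q)
                  unmap_page(q_page);    /* the previously missing kunmap */
              unmap_page(p_page);
          }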
      
      Fixes: 5a6ac9ea ("Btrfs, raid56: support parity scrub on raid56")
      CC: stable@vger.kernel.org # 4.4.x: c17af965: btrfs: raid56: simplify tracking of Q stripe presence
      CC: stable@vger.kernel.org # 4.4.x
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 09 Feb 2021, 34 commits