1. 24 Mar 2020: 11 commits
  2. 24 Jan 2020: 1 commit
    • Btrfs: make deduplication with range including the last block work · 831d2fa2
      Committed by Filipe Manana
      Since btrfs was migrated to use the generic VFS helpers for clone and
      deduplication, it stopped allowing for the last block of a file to be
      deduplicated when the source file size is not sector size aligned (when
      eof is somewhere in the middle of the last block). There are two reasons
      for that:
      
      1) The generic code always rounds the range's length for deduplication
         down to a multiple of the block size. This means we end up never
         deduplicating the last block when eof is not block size aligned,
         even in the safe case where the destination range's end offset matches
         the destination file's size. That rounding down is done in
         generic_remap_check_len();
      
      2) Because of that, the btrfs specific code no longer expects any
         non-aligned range lengths for deduplication and therefore does not
         work if such a non-aligned length is given.
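      A minimal sketch of the length decision described in item 1, under a
      simplified model: the helper names are hypothetical, only the rounding
      rule is modelled (the REMAP_FILE_CAN_SHORTEN and error paths of the real
      generic_remap_check_len() are left out), and the "after" variant assumes
      the VFS patch keeps the partial eof block whenever the destination range
      does not end before the destination's size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the rounding done in generic_remap_check_len().
 * Hypothetical helpers; only the length decision is modelled.
 * 'blocksize' must be a power of two.
 */

/* Before the VFS fix: deduplication always drops a partial eof block. */
static uint64_t remap_len_before_fix(uint64_t len, uint64_t pos_out,
                                     uint64_t dst_size, uint64_t blocksize,
                                     bool dedup)
{
    uint64_t blkmask = blocksize - 1;

    if ((len & blkmask) == 0)
        return len;                 /* already block aligned */
    if (dedup || pos_out + len < dst_size)
        return len & ~blkmask;      /* round down to a block multiple */
    return len;
}

/* After the VFS fix (assumed condition): the partial eof block is kept
 * when the destination range ends at the destination's eof. */
static uint64_t remap_len_after_fix(uint64_t len, uint64_t pos_out,
                                    uint64_t dst_size, uint64_t blocksize)
{
    uint64_t blkmask = blocksize - 1;

    if ((len & blkmask) == 0)
        return len;
    if (pos_out + len < dst_size)
        return len & ~blkmask;      /* destination continues past the range */
    return len;
}
```

      With 4096-byte blocks, a 10000-byte range deduplicated onto the whole of
      a 10000-byte destination comes out as 8192 bytes in the first variant and
      10000 bytes in the second; with a 20000-byte destination both variants
      round down to 8192.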
      
      This patch addresses the second part. It depends on a patch that
      fixes generic_remap_check_len() in the VFS, which was submitted earlier
      and has the following subject:
      
        "fs: allow deduplication of eof block into the end of the destination file"
      
      These two patches address reports from users that started seeing lower
      deduplication rates due to the last block never being deduplicated when
      the file size is not aligned to the filesystem's block size.
      
      Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/
      CC: stable@vger.kernel.org # 5.1+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 20 Jan 2020: 1 commit
  4. 17 Jan 2020: 1 commit
  5. 13 Dec 2019: 2 commits
    • btrfs: abort transaction after failed inode updates in create_subvol · c7e54b51
      Committed by Josef Bacik
      We can just abort the transaction here; in fact, we already do that for
      every other failure in this function except these two cases.
      
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: fix hole extent items with a zero size after range cloning · 147271e3
      Committed by Filipe Manana
      Normally when cloning a file range if we find an implicit hole at the end
      of the range we assume it is because the NO_HOLES feature is enabled.
      However that is not always the case. One well known case [1] is when we
      have a power failure after mixing buffered and direct IO writes against
      the same file.
      
      In such cases we need to punch a hole in the destination file, and if
      the NO_HOLES feature is not enabled, we need to insert explicit file
      extent items to represent the hole. After commit 690a5dbf
      ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
      extents"), we started to insert file extent items representing the hole
      with an item size of 0, which is invalid and should be 53 bytes (the size
      of a btrfs_file_extent_item structure), resulting in all sorts of
      corruptions and invalid memory accesses. This is detected by the tree
      checker when we attempt to write a leaf to disk.
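      The 53-byte figure is the packed size of struct btrfs_file_extent_item.
      The sketch below is a userspace mirror of that on-disk layout, written
      under stated assumptions: the kernel declares the fields as little-endian
      __le64/__le16 types, while plain fixed-width types and the GCC/Clang
      packed attribute are used here so the compiler can check the arithmetic
      8 + 8 + 1 + 1 + 2 + 1 + 8 + 8 + 8 + 8 = 53. The struct name is
      hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mirror (hypothetical name) of the field order of
 * struct btrfs_file_extent_item. Packed, so sizeof() is the plain sum
 * of the field sizes, with no padding.
 */
struct file_extent_item_layout {
    uint64_t generation;
    uint64_t ram_bytes;
    uint8_t  compression;
    uint8_t  encryption;
    uint16_t other_encoding;
    uint8_t  type;
    /* For a regular extent; an explicit hole is a regular extent
     * whose disk_bytenr is 0. */
    uint64_t disk_bytenr;
    uint64_t disk_num_bytes;
    uint64_t offset;
    uint64_t num_bytes;
} __attribute__((packed));

_Static_assert(sizeof(struct file_extent_item_layout) == 53,
               "a regular file extent item occupies 53 bytes");
```

      An item with a size of 0 therefore cannot hold even the fields that
      describe the hole, which is why the tree checker rejects the leaf.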
      
      The problem can be sporadically triggered by test case generic/561 from
      fstests. That test case does not exercise power failure and creates a new
      filesystem when it starts, so it does not use a filesystem created by any
      previous test that tests power failure. However the test does both
      buffered and direct IO writes (through fsstress) and it's precisely that
      which is creating the implicit holes in files. That happens even before
      the commit mentioned earlier. I need to investigate why we get those
      implicit holes to check if there is a real problem or not. For now this
      change fixes the regression of introducing file extent items with an item
      size of 0 bytes.
      
      Fix the issue by calling btrfs_punch_hole_range() without passing a
      btrfs_clone_extent_info structure, which ensures file extent items are
      inserted to represent the hole with a correct item size. We were passing
      a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
      which was causing the insertion of file extent items with an item size
      of 0.
      
      [1] https://www.spinics.net/lists/linux-btrfs/msg75350.html

      Reported-by: David Sterba <dsterba@suse.com>
      Fixes: 690a5dbf ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 19 Nov 2019: 3 commits
  7. 18 Nov 2019: 4 commits
  8. 05 Nov 2019: 1 commit
    • btrfs: un-deprecate ioctls START_SYNC and WAIT_SYNC · a5009d3a
      Committed by David Sterba
      The two ioctls START_SYNC and WAIT_SYNC were mistakenly marked as
      deprecated and scheduled for removal, but we actually do use them for
      'btrfs subvolume delete -C/-c'. What ebc87351 should have deprecated
      is just the async flag for subvolume creation.

      The deprecation was added in this development cycle; remove it
      until it is actually time.
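      For reference, a userspace sketch of the START_SYNC/WAIT_SYNC pair,
      roughly the sequence 'btrfs subvolume delete -c' needs in order to wait
      for a transaction commit after deleting. The ioctl names come from
      <linux/btrfs.h>; the helper btrfs_commit_and_wait() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

/*
 * Hypothetical helper: start a commit of the current transaction on the
 * filesystem containing 'path' and wait until it is on disk.
 * Returns 0 on success or a negative errno (e.g. -ENOTTY if 'path' is
 * not on btrfs).
 */
static int btrfs_commit_and_wait(const char *path)
{
    uint64_t transid = 0;
    int ret = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    /* START_SYNC kicks off a commit and reports its transaction id. */
    if (ioctl(fd, BTRFS_IOC_START_SYNC, &transid) < 0)
        ret = -errno;
    /* WAIT_SYNC blocks until that transaction has been committed. */
    else if (ioctl(fd, BTRFS_IOC_WAIT_SYNC, &transid) < 0)
        ret = -errno;
    close(fd);
    return ret;
}
```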
      
      Fixes: ebc87351 ("btrfs: Deprecate BTRFS_SUBVOL_CREATE_ASYNC flag")
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 16 Oct 2019: 1 commit
    • btrfs: qgroup: Always free PREALLOC META reserve in btrfs_delalloc_release_extents() · 8702ba93
      Committed by Qu Wenruo
      [Background]
      Btrfs qgroup uses two types of reserved space for METADATA space,
      PERTRANS and PREALLOC.
      
      PERTRANS is metadata space reserved for each transaction started by
      btrfs_start_transaction().
      PREALLOC, on the other hand, is for delalloc, where we reserve space
      before joining a transaction; it is finally converted to PERTRANS
      after writeback is done.
      
      [Inconsistency]
      However, there is an inconsistency in how we handle PREALLOC metadata space.
      
      The most obvious one is:
      In btrfs_buffered_write():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), reserve_bytes, true);
      
      We always free qgroup PREALLOC meta space.
      
      While in btrfs_truncate_block():
      	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, (ret != 0));
      
      We only free qgroup PREALLOC meta space when something went wrong.
      
      [The Correct Behavior]
      The correct behavior is the one in btrfs_buffered_write(): we
      should always free PREALLOC metadata space.
      
      The reason is that the btrfs_delalloc_* mechanism works by:
      - Reserving metadata first, even if it's not necessary
        In btrfs_delalloc_reserve_metadata()

      - Freeing the unused metadata space
        Normally in:
        btrfs_delalloc_release_extents()
        |- btrfs_inode_rsv_release()
           Here we calculate whether we should release or not.
      
      E.g. for a 64K buffered write, the metadata rsv works like this:
      
      /* The first page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=0
      total:		num_bytes=calc_inode_reservations()
      /* The first page caused one outstanding extent, thus needs metadata
         rsv */
      
      /* The 2nd page */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed
      /* The 2nd page doesn't cause new outstanding extent, needs no new meta
         rsv, so we free what we have reserved */
      
      /* The 3rd~16th pages */
      reserve_meta:	num_bytes=calc_inode_reservations()
      free_meta:	num_bytes=calc_inode_reservations()
      total:		not changed (still space for one outstanding extent)
      
      This means that if btrfs_delalloc_release_extents() decides to free some
      space, then that space should be freed NOW.
      So for qgroup, we should call btrfs_qgroup_free_meta_prealloc() rather
      than btrfs_qgroup_convert_reserved_meta().
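      The 64K walkthrough can be replayed with a toy model. Everything in it
      is hypothetical (names, fields, byte counts): each of 16 pages reserves
      the same amount of PREALLOC space, pages 2 to 16 release theirs again,
      and the release either frees the space at once or converts it to
      PERTRANS, where it stays charged until a transaction commit.

```c
#include <stdbool.h>

/* Toy model of qgroup metadata accounting; not btrfs code. */
struct qgroup_meta {
    long prealloc;   /* PREALLOC: delalloc reservation */
    long pertrans;   /* PERTRANS: only released at transaction commit */
};

static void reserve_meta(struct qgroup_meta *q, long bytes)
{
    q->prealloc += bytes;
}

/* qgroup_free == true: give the space back now (the correct behavior).
 * qgroup_free == false: convert it to PERTRANS instead, so it remains
 * charged until the transaction commits. */
static void release_meta(struct qgroup_meta *q, long bytes, bool qgroup_free)
{
    q->prealloc -= bytes;
    if (!qgroup_free)
        q->pertrans += bytes;
}

/* Buffered write of 'pages' pages; returns the bytes still charged. */
static long charged_after_write(int pages, long per_page, bool qgroup_free)
{
    struct qgroup_meta q = { 0, 0 };

    for (int i = 0; i < pages; i++) {
        reserve_meta(&q, per_page);
        /* Only the first page creates an outstanding extent; later pages
         * need no new reservation, so theirs is released right away. */
        if (i > 0)
            release_meta(&q, per_page, qgroup_free);
    }
    return q.prealloc + q.pertrans;
}
```

      With 16 pages of 100 bytes each, freeing leaves 100 bytes charged while
      converting leaves 1600, the kind of inflated usage that can surface as
      a false EDQUOT. The model ignores the later conversion of the surviving
      reservation to PERTRANS after writeback.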
      
      The good news is:
      - The callers are not that hot
        The hottest caller is in btrfs_buffered_write(), which is already
        fixed by commit 336a8bb8 ("btrfs: Fix wrong
        btrfs_delalloc_release_extents parameter"). Thus it's not that
        easy to cause false EDQUOT.
      
      - The trans commit in advance for qgroup would hide the bug
        Since commit f5fef459 ("btrfs: qgroup: Make qgroup async transaction
        commit more aggressive"), when btrfs qgroup metadata free space is low,
        it will try to commit the transaction and free the wrongly converted
        PERTRANS space, so it's not that easy to hit such a bug.
      
      [FIX]
      So to fix the problem, remove the @qgroup_free parameter for
      btrfs_delalloc_release_extents(), and always pass true to
      btrfs_inode_rsv_release().
      Reported-by: Filipe Manana <fdmanana@suse.com>
      Fixes: 43b18595 ("btrfs: qgroup: Use separate meta reservation type for delalloc")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  10. 09 Sep 2019: 7 commits
  11. 04 Jul 2019: 1 commit
  12. 02 Jul 2019: 1 commit
  13. 01 Jul 2019: 3 commits
    • vfs: create a generic checking function for FS_IOC_FSSETXATTR · 7b0e492e
      Committed by Darrick J. Wong
      Create a generic checking function for the incoming FS_IOC_FSSETXATTR
      fsxattr values so that we can standardize some of the implementation
      behaviors.
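      The values the generic FS_IOC_FSSETXATTR check inspects arrive from
      userspace as a struct fsxattr, normally obtained by reading the current
      attributes, modifying them, and writing them back. A sketch of the read
      half, using FS_IOC_FSGETXATTR and struct fsxattr from <linux/fs.h>; the
      helper get_fsxattr() is hypothetical, and not every filesystem
      implements the ioctl.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: read the extended attributes (fsx_xflags,
 * fsx_extsize, fsx_projid, ...) of 'path' into *fa.
 * Returns 0 or a negative errno (e.g. -ENOTTY if unsupported).
 */
static int get_fsxattr(const char *path, struct fsxattr *fa)
{
    int ret = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;
    if (ioctl(fd, FS_IOC_FSGETXATTR, fa) < 0)
        ret = -errno;
    close(fd);
    return ret;
}
```

      A caller would change fields in *fa and pass it to FS_IOC_FSSETXATTR;
      the generic check then compares the incoming structure with the
      inode's current attributes.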
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • vfs: create a generic checking and prep function for FS_IOC_SETFLAGS · 5aca2842
      Committed by Darrick J. Wong
      Create a generic function to check incoming FS_IOC_SETFLAGS flag values
      and later prepare the inode for updates so that we can standardize the
      implementations that follow ext4's flag values.
      
      Note that the efivarfs implementation no longer fails a no-op SETFLAGS
      without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.
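      The capability rule behind the efivarfs note can be sketched as a pure
      function: only a change to the APPEND or IMMUTABLE bits needs
      CAP_LINUX_IMMUTABLE, so a no-op SETFLAGS passes without it. The flag
      constants come from <linux/fs.h>; setflags_check() and its
      has_cap_immutable parameter (standing in for a
      capable(CAP_LINUX_IMMUTABLE) call) are hypothetical, and the generic
      helper's inode preparation step is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <linux/fs.h>   /* FS_APPEND_FL, FS_IMMUTABLE_FL */

/*
 * Hypothetical sketch of the permission check applied to an incoming
 * FS_IOC_SETFLAGS value. XOR isolates the bits that actually change.
 */
static int setflags_check(unsigned int oldflags, unsigned int flags,
                          bool has_cap_immutable)
{
    if (((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) &&
        !has_cap_immutable)
        return -EPERM;
    return 0;
}
```

      Re-setting FS_IMMUTABLE_FL on an already immutable inode changes no
      bits and succeeds without the capability, while newly setting
      FS_APPEND_FL fails with -EPERM unless the caller holds it.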
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: David Sterba <dsterba@suse.com>
      Reviewed-by: Bob Peterson <rpeterso@redhat.com>
    • btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation · a94d1d0c
      Committed by Qu Wenruo
      
      [BUG]
      The following script can cause unexpected fsync failure:
      
        #!/bin/bash
      
        dev=/dev/test/test
        mnt=/mnt/btrfs
      
        mkfs.btrfs -f $dev -b 512M > /dev/null
        mount $dev $mnt -o nospace_cache
      
        # Prealloc one extent
        xfs_io -f -c "falloc 8k 64m" $mnt/file1
        # Fill the remaining data space
        xfs_io -f -c "pwrite 0 -b 4k 512M" $mnt/padding
        sync
      
        # Write into the prealloc extent
        xfs_io -c "pwrite 1m 16m" $mnt/file1
      
        # Reflink then fsync, fsync would fail due to ENOSPC
        xfs_io -c "reflink $mnt/file1 8k 0 4k" -c "fsync" $mnt/file1
        umount $dev
      
      The fsync fails with ENOSPC, and the last page of the buffered write is
      lost.
      
      [CAUSE]
      This is caused by:
      - Btrfs' back references only have extent level granularity
        So a write into a shared extent must be COWed even if only part of
        the extent is shared.
      
      So for above script we have:
      - fallocate
        Create a preallocated extent where we can do NOCOW write.
      
      - fill all the remaining data and unallocated space
      
      - buffered write into preallocated space
        As we don't have enough space available for data and the extent is
        not shared (yet), we fall back to NOCOW mode.

      - reflink
        Now part of the large preallocated extent is shared, so later writes
        into that extent must be COWed.

      - fsync triggers writeback
        But now the extent is shared and therefore we must fall back to COW
        mode, which fails with ENOSPC since there's not enough space to
        allocate data extents.
      
      [WORKAROUND]
      The workaround is to ensure that any buffered write in the related
      extents (not just the reflink source range) gets flushed before
      reflink/dedupe, so that the NOCOW writes that happened before the
      reflink succeed.

      The workaround is expensive; we could do better by only flushing
      NOCOW ranges, but that needs extra accounting for NOCOW ranges.
      For now, fix the possible data loss first.
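      A toy model of the [CAUSE] and [WORKAROUND] sections (not btrfs code;
      every name is hypothetical): dirty data in a preallocated extent is
      written in place while the extent is unshared, but needs fresh data
      space once a reflink has shared it. Flushing before the reflink is
      modelled as an unconditional writeback of all dirty data.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy state for the script above; not btrfs code. */
struct model {
    bool extent_shared;     /* set once any part of the extent is reflinked */
    long free_data_space;   /* unallocated data space, in bytes */
    long dirty_bytes;       /* buffered data not yet written back */
};

/* Write back all dirty data; returns 0 or -ENOSPC. */
static int writeback(struct model *m)
{
    if (m->dirty_bytes == 0)
        return 0;
    if (m->extent_shared) {
        /* COW: every dirty byte needs newly allocated data space. */
        if (m->free_data_space < m->dirty_bytes)
            return -ENOSPC;
        m->free_data_space -= m->dirty_bytes;
    }
    /* NOCOW (unshared): overwrite the preallocated extent in place. */
    m->dirty_bytes = 0;
    return 0;
}

/* Reflink part of the extent; 'flush_first' models the workaround. */
static int reflink(struct model *m, bool flush_first)
{
    if (flush_first) {
        int ret = writeback(m);

        if (ret)
            return ret;
    }
    m->extent_shared = true;
    return 0;
}

/* The script's sequence: 16 MiB dirtied with no free data space left,
 * then reflink, then fsync. */
static int run_script(bool flush_first)
{
    struct model m = { .extent_shared = false,
                       .free_data_space = 0,
                       .dirty_bytes = 16L << 20 };
    int ret = reflink(&m, flush_first);

    return ret ? ret : writeback(&m);   /* fsync */
}
```

      run_script(false) fails with -ENOSPC like the fsync in the script,
      while run_script(true) succeeds because the dirty data was written
      NOCOW before the extent became shared.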
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  14. 20 Jun 2019: 1 commit
  15. 17 Jun 2019: 1 commit
    • Btrfs: fix failure to persist compression property xattr deletion on fsync · 3763771c
      Committed by Filipe Manana
      After the recent series of cleanups in the properties and xattrs modules
      that landed in the 5.2 merge window, we ended up with a regression where
      after deleting the compression xattr property through the setflags ioctl,
      we don't set the BTRFS_INODE_COPY_EVERYTHING flag in the inode anymore.
      As a consequence, if the inode was fsync'ed when it had the compression
      property set, then after deleting the compression property through the
      setflags ioctl and fsync'ing the inode again, the log will still contain
      the compression xattr. The inode did not have that bit set, so the fsync
      did not delete all xattrs from the log and copy all xattrs from the
      subvolume tree to the log tree.
      
      This regression happens because that series of cleanups made
      btrfs_set_prop() call the old function do_setxattr() (which is now
      named btrfs_setxattr()), and not the old version of btrfs_setxattr(),
      which is now called btrfs_setxattr_trans().
      
      Fix this by setting the BTRFS_INODE_COPY_EVERYTHING bit in the current
      btrfs_setxattr() function and remove it from everywhere else, including
      its setup at btrfs_ioctl_setflags(). This is cleaner, avoids similar
      regressions in the future, and centralizes the setup of the bit. After
      all, this bit only needs to be set up in the xattrs module, since it
      is an implementation detail of xattrs.
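      A toy model of why setting BTRFS_INODE_COPY_EVERYTHING in a single place
      closes the gap. All names are hypothetical: model_setxattr() stands in
      for the one xattr-setting function that now sets the bit, whichever
      caller (property code or setflags ioctl) reaches it, and model_fsync()
      refreshes the log copy only when the bit is set, otherwise leaving the
      log's previous copy in place.

```c
#include <stdbool.h>
#include <string.h>

/* Toy inode: one xattr slot in the subvolume tree and one in the log. */
struct inode_model {
    char subvol_xattr[32];   /* "" means no compression xattr */
    char log_xattr[32];      /* copy made by the last full fsync */
    bool copy_everything;    /* models BTRFS_INODE_COPY_EVERYTHING */
};

/* Every xattr change goes through here, so the bit is always set. */
static void model_setxattr(struct inode_model *in, const char *value)
{
    strncpy(in->subvol_xattr, value, sizeof(in->subvol_xattr) - 1);
    in->subvol_xattr[sizeof(in->subvol_xattr) - 1] = '\0';
    in->copy_everything = true;
}

static void model_fsync(struct inode_model *in)
{
    if (in->copy_everything) {
        /* Drop the logged xattrs and copy them all from the subvolume. */
        memcpy(in->log_xattr, in->subvol_xattr, sizeof(in->log_xattr));
        in->copy_everything = false;
    }
    /* Without the bit, the log keeps whatever it had before. */
}
```

      Setting "zlib", fsyncing, deleting the value (setting ""), and fsyncing
      again leaves an empty log copy; if the deletion path skipped setting the
      bit, the log would still hold "zlib", which is the reported regression.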
      
      Fixes: 04e6863b ("btrfs: split btrfs_setxattr calls regarding transaction")
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  16. 30 Apr 2019: 1 commit