1. 22 Jan 2018: 19 commits
  2. 07 Dec 2017: 1 commit
    • btrfs: Fix quota reservation leak on preallocated files · b430b775
      Authored by Justin Maggard
      Commit c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      changed the behavior of __btrfs_buffered_write() so that it first tries
      to get a data space reservation, and then skips the relatively expensive
      nocow check if the reservation succeeded.
      
      If we have quotas enabled, the data space reservation also includes a
      quota reservation.  But in the rewrite case, the space has already been
      accounted for in qgroups.  So btrfs_check_data_free_space() increases
      the quota reservation, but it never gets decreased when the data
      actually gets written and overwrites the pre-existing data.  So we're
      left with both the qgroup usage and the qgroup reservation accounting
      for the same space.
      
      This commit adds the missing btrfs_qgroup_free_data() call in the case
      of BTRFS_ORDERED_PREALLOC extents.
      
      Fixes: c6887cd1 ("Btrfs: don't do nocow check unless we have to")
      Signed-off-by: Justin Maggard <jmaggard@netgear.com>
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  3. 16 Nov 2017: 1 commit
    • Btrfs: fix reported number of inode blocks after buffered append writes · e3b8a485
      Authored by Filipe Manana
      The patch from commit a7e3b975 ("Btrfs: fix reported number of inode
      blocks") introduced a regression where if we do a buffered write starting
      at position equal to or greater than the file's size and then stat(2) the
      file before writeback is triggered, the number of used blocks does not
      change (unless there's a prealloc/unwritten extent). Example:
      
        $ xfs_io -f -c "pwrite -S 0xab 0 64K" foobar
        $ du -h foobar
        0	foobar
        $ sync
        $ du -h foobar
        64K	foobar
      
      The first version of that patch didn't have this regression; the second
      version, which was the one committed, was changed only to address a
      performance regression detected by the Intel test robots using fs_mark.
      
      This fixes the regression by setting the new delalloc bit in the range,
      and doing it at btrfs_dirty_pages() while setting the regular delalloc
      bit as well, so that we set both bits at once, avoiding navigation of
      the inode's io tree twice. Doing it at btrfs_dirty_pages() is also the
      most meaningful place, as we should set the new delalloc bit whenever
      we set the regular delalloc bit, which happens only if we copied bytes
      into the pages at __btrfs_buffered_write().
      
      This was making some of LTP's du tests fail, which can be quickly run
      using a command line like the following:
      
        $ ./runltp -q -p -l /ltp.log -f commands -s du -d /mnt
      
      Fixes: a7e3b975 ("Btrfs: fix reported number of inode blocks")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  4. 15 Nov 2017: 2 commits
  5. 02 Nov 2017: 5 commits
    • btrfs: move btrfs_truncate_block out of trans handle · ddfae63c
      Authored by Josef Bacik
      Since we do a delalloc reserve in btrfs_truncate_block we can deadlock
      with freeze.  If somebody else is trying to allocate metadata for this
      inode and it gets stuck in start_delalloc_inodes because of freeze we
      will deadlock.  Be safe and move this outside of a trans handle.  This
      also has a side-effect of making sure that we're not leaving stale data
      behind in the other_encoding or encryption case.  Not an issue now since
      nobody uses it, but it would be a problem in the future.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: make the delalloc block rsv per inode · 69fe2d75
      Authored by Josef Bacik
      The way we handle delalloc metadata reservations has gotten
      progressively more complicated over the years.  There is so much cruft
      and weirdness around keeping the reserved count and outstanding counters
      consistent and handling the error cases that it's impossible to
      understand.
      
      Fix this by making the delalloc block rsv per-inode.  This way we can
      calculate the actual size of the outstanding metadata reservations every
      time we make a change, and then reserve the delta based on that amount.
      This greatly simplifies the code everywhere, and makes the error
      handling in btrfs_delalloc_reserve_metadata far less terrifying.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: rework outstanding_extents · 8b62f87b
      Authored by Josef Bacik
      Right now we do a lot of weird hoops around outstanding_extents in order
      to keep the extent count consistent.  This is because we logically
      transfer the outstanding_extent count from the initial reservation
      through the set_delalloc_bits.  This makes it pretty difficult to get a
      handle on how and when we need to mess with outstanding_extents.
      
      Fix this by revamping the rules of how we deal with outstanding_extents.
      Now instead everybody that is holding on to a delalloc extent is
      required to increase the outstanding extents count for itself.  This
      means we'll have something like this
      
      btrfs_delalloc_reserve_metadata	- outstanding_extents = 1
       btrfs_set_extent_delalloc	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      for an initial file write.  Now take the append write where we extend an
      existing delalloc range but still under the maximum extent size
      
      btrfs_delalloc_reserve_metadata - outstanding_extents = 2
        btrfs_set_extent_delalloc
          btrfs_set_bit_hook		- outstanding_extents = 3
          btrfs_merge_extent_hook	- outstanding_extents = 2
      btrfs_delalloc_release_extents	- outstanding_extents = 1
      
      In order to make the ordered extent transition we of course must now
      make ordered extents carry their own outstanding_extent reservation, so
      for cow_file_range we end up with
      
      btrfs_add_ordered_extent	- outstanding_extents = 2
      clear_extent_bit		- outstanding_extents = 1
      btrfs_remove_ordered_extent	- outstanding_extents = 0
      
      This makes all manipulations of outstanding_extents much more explicit.
      Every successful call to btrfs_delalloc_reserve_metadata _must_ now be
      combined with btrfs_delalloc_release_extents, even in the error case, as
      that is the only function that actually modifies the
      outstanding_extents counter.
      
      The drawback to this is now we are much more likely to have transient
      cases where outstanding_extents is much larger than it actually should
      be.  This could happen before as we manipulated the delalloc bits, but
      now it happens basically at every write.  This may put more pressure on
      the ENOSPC flushing code, but I think making this code simpler is worth
      the cost.  I have another change coming to mitigate this side-effect
      somewhat.
      
      I also added trace points for the counter manipulation.  These were used
      by a bpf script I wrote to help track down leak issues.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add a flag to iterate_inodes_from_logical to find all extent refs for uncompressed extents · c995ab3c
      Authored by Zygo Blaxell
      The LOGICAL_INO ioctl provides a backward mapping from extent bytenr and
      offset (encoded as a single logical address) to a list of extent refs.
      LOGICAL_INO complements TREE_SEARCH, which provides the forward mapping
      (extent ref -> extent bytenr and offset, or logical address).  These are
      useful capabilities for programs that manipulate extents and extent
      references from userspace (e.g. dedup and defrag utilities).
      
      When the extents are uncompressed (and not encrypted and not other),
      check_extent_in_eb performs filtering of the extent refs to remove any
      extent refs which do not contain the same extent offset as the 'logical'
      parameter's extent offset.  This prevents LOGICAL_INO from returning
      references to more than a single block.
      
      To find the set of extent references to an uncompressed extent from [a, b),
      userspace has to run a loop like this pseudocode:
      
      	for (i = a; i < b; ++i)
      		extent_ref_set += LOGICAL_INO(i);
      
      At each iteration of the loop (up to 32768 iterations for a 128M extent),
      data we are interested in is collected in the kernel, then deleted by
      the filter in check_extent_in_eb.
      
      When the extents are compressed (or encrypted or other), the 'logical'
      parameter must be an extent bytenr (the 'a' parameter in the loop).
      No filtering by extent offset is done (or possible?) so the result is
      the complete set of extent refs for the entire extent.  This removes
      the need for the loop, since we get all the extent refs in one call.
      
      Add an 'ignore_offset' argument to iterate_inodes_from_logical,
      [...several levels of function call graph...], and check_extent_in_eb, so
      that we can disable the extent offset filtering for uncompressed extents.
      This flag can be set by an improved version of the LOGICAL_INO ioctl to
      get either behavior as desired.
      
      There is no functional change in this patch.  The new flag is always
      false.
      Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor coding style fixes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: allow to set compression level for zlib · f51d2b59
      Authored by David Sterba
      Preliminary support for setting the compression level for zlib; the
      following works:
      
      $ mount -o compress=zlib                # default
      $ mount -o compress=zlib0               # same
      $ mount -o compress=zlib9               # level 9, slower sync, less data
      $ mount -o compress=zlib1               # level 1, faster sync, more data
      $ mount -o remount,compress=zlib3       # level set by remount
      
      The compress-force option works the same as compress.  The level is
      visible in the same format in /proc/mounts. Setting the level via a
      file property does not work yet.
      
      Required patch: "btrfs: prepare for extensions in compression options"
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 30 Oct 2017: 8 commits
  7. 26 Sep 2017: 3 commits
    • Btrfs: fix unexpected result when dio reading corrupted blocks · 99c4e3b9
      Authored by Liu Bo
      commit 4246a0b6 ("block: add a bi_error field to struct bio")
      changed the logic of how dio read endio reports errors.
      
      For a single stripe dio read, %bio->bi_status reflects the error before
      the checksum is verified, and currently we update it when the data block
      matches its checksum, while in the mismatching case %bio->bi_status is
      not updated to reflect that.
      
      When some blocks in a file have been corrupted on disk, reading such a
      file ends up with
      
      1) checksum errors being reported in the kernel log
      2) read(2) returning successfully with some content being 0x01.
      
      In order to fix it, we need to report its checksum mismatch error to
      the upper layer (dio layer in this case) as well.
      
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reported-by: Goffredo Baroncelli <kreijack@inwind.it>
      Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: finish ordered extent cleaning if no progress is found · 67c003f9
      Authored by Naohiro Aota
      __endio_write_update_ordered() repeats the search until it reaches the
      end of the specified range. This works well with the direct IO path,
      because before the function is called it is ensured that ordered
      extents fill the whole range. That is not the case, however, when it is
      called from run_delalloc_range(): an error can occur in the middle of
      the loop in e.g. run_delalloc_nocow(), so that part of the range is not
      covered by any ordered extent. When cleaning up such an "incomplete"
      range, __endio_write_update_ordered() gets stuck at an offset where
      there are no ordered extents.
      
      Since the ordered extents are created from head to tail, we can stop
      the search if there is no offset progress.
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: clear ordered flag on cleaning up ordered extents · 63d71450
      Authored by Naohiro Aota
      Commit 52427260 ("btrfs: Handle delalloc error correctly to avoid
      ordered extent hang") introduced btrfs_cleanup_ordered_extents() to cleanup
      submitted ordered extents. However, it does not clear the ordered bit
      (Private2) of corresponding pages. Thus, the following BUG occurs from
      free_pages_check_bad() (on btrfs/125 with nospace_cache).
      
      BUG: Bad page state in process btrfs  pfn:3fa787
      page:ffffdf2acfe9e1c0 count:0 mapcount:0 mapping:          (null) index:0xd
      flags: 0x8000000000002008(uptodate|private_2)
      raw: 8000000000002008 0000000000000000 000000000000000d 00000000ffffffff
      raw: ffffdf2acf5c1b20 ffffb443802238b0 0000000000000000 0000000000000000
      page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      bad because of flags: 0x2000(private_2)
      
      This patch clears the flag in the same way as other places that call
      btrfs_dec_test_ordered_pending(), for every page in the specified range.
      
      Fixes: 52427260 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
      Cc: <stable@vger.kernel.org> # 4.12
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  8. 24 Aug 2017: 1 commit