1. 06 Dec, 2022 6 commits
  2. 29 Sep, 2022 1 commit
  3. 26 Sep, 2022 3 commits
  4. 25 Jul, 2022 3 commits
  5. 14 Mar, 2022 2 commits
  6. 07 Sep, 2021 1 commit
  7. 23 Aug, 2021 1 commit
  8. 22 Jul, 2021 1 commit
  9. 21 Jun, 2021 4 commits
    • btrfs: make page Ordered bit to be subpage compatible · b945a463
      Committed by Qu Wenruo
      This involves the following modifications:
      
      - Ordered extent creation
        This is done in process_one_page(), now PAGE_SET_ORDERED will call
        subpage helper to do the work.
      
      - endio functions
        This is done in btrfs_mark_ordered_io_finished().
      
      - btrfs_invalidatepage()
      
      - btrfs_cleanup_ordered_extents()
        Use the subpage page helper, and add an extra branch to exit if the
        locked page has covered the full range.
      
      Now the usage of page Ordered flag for ordered extent accounting is fully
      subpage compatible.
      
      Tested-by: Ritesh Harjani <riteshh@linux.ibm.com> # [ppc64]
      Tested-by: Anand Jain <anand.jain@oracle.com> # [aarch64]
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
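The commit message above does not describe the internals of the subpage helper, so the following is only a minimal user-space sketch of the general idea, assuming a per-sector bitmap alongside the page-level Ordered flag. The names `toy_page`, `toy_set_ordered()`, `toy_clear_ordered()` and `toy_test_ordered()`, as well as the 64K page and 4K sector sizes, are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE  65536u /* assumption: 64K page */
#define TOY_SECTORSIZE 4096u  /* assumption: 4K sector, so 16 sectors per page */

/* Hypothetical stand-in for a page plus its subpage Ordered bitmap. */
struct toy_page {
	uint16_t ordered_bitmap; /* one bit per sector */
	bool page_ordered;       /* page-level Ordered flag */
};

/* Build the bitmap mask for the sector-aligned range [off, off + len). */
static uint16_t toy_range_mask(uint32_t off, uint32_t len)
{
	uint32_t first = off / TOY_SECTORSIZE;
	uint32_t nr = len / TOY_SECTORSIZE;

	assert(off % TOY_SECTORSIZE == 0 && len % TOY_SECTORSIZE == 0);
	assert(nr > 0 && off + len <= TOY_PAGE_SIZE);
	return (uint16_t)(((1u << nr) - 1u) << first);
}

/* Mark the sectors in the range as Ordered; the page flag follows. */
static void toy_set_ordered(struct toy_page *page, uint32_t off, uint32_t len)
{
	page->ordered_bitmap |= toy_range_mask(off, len);
	page->page_ordered = true;
}

/* Clear the sectors; drop the page flag only when no sector remains set. */
static void toy_clear_ordered(struct toy_page *page, uint32_t off, uint32_t len)
{
	page->ordered_bitmap &= (uint16_t)~toy_range_mask(off, len);
	if (page->ordered_bitmap == 0)
		page->page_ordered = false;
}

/* True only if every sector in the range is still marked Ordered. */
static bool toy_test_ordered(const struct toy_page *page, uint32_t off,
			     uint32_t len)
{
	uint16_t mask = toy_range_mask(off, len);

	return (page->ordered_bitmap & mask) == mask;
}
```

In this model, clearing one sector of a two-sector ordered range leaves the page-level flag set until the second sector is cleared as well, which is the property a single page-wide bit cannot express.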
    • btrfs: rename PagePrivate2 to PageOrdered inside btrfs · f57ad937
      Committed by Qu Wenruo
      Inside btrfs we use Private2 page status to indicate we have an ordered
      extent with pending IO for the sector.
      
      But the page status name, Private2, tells us nothing about the bit
      itself, so this patch renames it to Ordered.
      It also adds a comment about the bit, so a reader who is still
      uncertain about the page Ordered status can find the explanation
      easily.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: introduce btrfs_lookup_first_ordered_range() · c095f333
      Committed by Qu Wenruo
      Although we already have btrfs_lookup_first_ordered_extent() and
      btrfs_lookup_ordered_extent(), both have their own limitations:

      - btrfs_lookup_ordered_extent() can't do an extra range check

        It's only designed to look up any ordered extent before a certain bytenr.
      
      - btrfs_lookup_first_ordered_extent() may not return the first ordered
        extent in the range
      
        It doesn't ensure the first ordered extent is returned.
        The existing callers are only interested in exhausting all ordered
        extents in a range, so the order is not important to them.

      For the upcoming btrfs_invalidatepage() refactoring, we need a way to
      properly iterate over all ordered extents of a range in bytenr order.

      So this patch introduces a new function,
      btrfs_lookup_first_ordered_range(), which looks up ordered extents with
      bytenr order awareness and an extra range check.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
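The lookup semantics described above (return the first ordered extent, by offset, that overlaps a given range) can be illustrated with a small user-space sketch. This is not the kernel implementation: the struct `toy_oe`, the flat array, and the function name `toy_lookup_first_in_range()` are hypothetical, and the kernel stores ordered extents in a per-inode rb-tree rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ordered extent covering [start, start + len). */
struct toy_oe {
	uint64_t start;
	uint64_t len;
};

/*
 * Return the entry with the lowest start that overlaps
 * [start, start + len), or NULL if nothing overlaps.
 * The array does not need to be sorted.
 */
static const struct toy_oe *toy_lookup_first_in_range(const struct toy_oe *oes,
						      size_t nr, uint64_t start,
						      uint64_t len)
{
	const struct toy_oe *first = NULL;
	uint64_t end = start + len;

	assert(oes != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		uint64_t oe_end = oes[i].start + oes[i].len;

		/* Range check: skip entries that do not overlap the query. */
		if (oes[i].start >= end || oe_end <= start)
			continue;
		/* Order awareness: keep the lowest starting offset seen so far. */
		if (first == NULL || oes[i].start < first->start)
			first = &oes[i];
	}
	return first;
}
```

A caller that wants to walk a range in order could call this repeatedly, advancing `start` past the end of each returned entry until the function returns NULL.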
    • btrfs: refactor how we finish ordered extent io for endio functions · e65f152e
      Committed by Qu Wenruo
      Btrfs has two endio functions to mark a certain IO range as finished for
      ordered extents:

      - __endio_write_update_ordered()
        This is for direct IO.

      - btrfs_writepage_endio_finish_ordered()
        This is for buffered IO.

      However, they take different routes to handle ordered extent IO:
      
      - Whether to iterate through all ordered extents
        __endio_write_update_ordered() will but
        btrfs_writepage_endio_finish_ordered() will not.
      
        In fact, iterating through all ordered extents will benefit later
        subpage support, while for current PAGE_SIZE == sectorsize requirement
        this behavior makes no difference.
      
      - Whether to update page Private2 flag
        __endio_write_update_ordered() will not update the page Private2 flag,
        because for iomap direct IO the page may not even be mapped.
        btrfs_writepage_endio_finish_ordered(), in contrast, will clear
        Private2 to prevent double accounting against btrfs_invalidatepage().

      Those differences are pretty subtle, and the ordered extent iteration
      code in the callers makes the code much harder to read.
      
      So this patch will introduce a new function,
      btrfs_mark_ordered_io_finished(), to do the heavy lifting:
      
      - Iterate through all ordered extents in the range
      - Do the ordered extent accounting
      - Queue the work for finished ordered extent
      
      This function has two new features:

      - Proper underflow detection and recovery
        The old underflow detection only detects the problem, then
        continues.
        It gives no proper info such as root/inode/ordered extent, and it is
        not noisy enough to be caught by fstests.

        Furthermore, when underflow happens, the ordered extent will never
        finish.

        The new error detection resets bytes_left to 0, emits a proper
        kernel warning, and outputs extra info including root, ino, ordered
        extent range, and the underflow value.

      - Prevent double accounting based on Private2 flag
        Now if we find a range without the Private2 flag, we skip to the next
        range, as that means someone else has already finished the accounting
        of the ordered extent.

        This makes no difference for current code, but will be a critical part
        of the upcoming subpage support, as we can call
        btrfs_mark_ordered_io_finished() for multiple sectors if they are
        beyond inode size.
        Thus such double accounting prevention is a key feature for subpage.
      
      Now both endio functions only need to call that new function.
      
      And since the only caller of btrfs_dec_test_first_ordered_pending() is
      removed, also remove btrfs_dec_test_first_ordered_pending() completely.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
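The two features described above (underflow recovery and skipping ranges that were already accounted) can be modelled in user space. The sketch below is an illustration under stated assumptions, not the kernel code: `toy_ordered_extent`, `toy_dec_bytes_left()` and `toy_mark_io_finished()` are hypothetical names, and a plain `bool` array plays the role of the per-sector Private2 flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_NR_SECTORS 16 /* hypothetical: sectors tracked by this toy extent */

/* Hypothetical stand-in for an ordered extent and its pending sectors. */
struct toy_ordered_extent {
	uint64_t bytes_left;          /* bytes whose IO has not finished yet */
	bool pending[TOY_NR_SECTORS]; /* plays the role of the Private2 flag */
};

/*
 * Account @len bytes as finished and return true once nothing is left.
 * On underflow, warn and reset bytes_left to 0 so the extent can finish.
 */
static bool toy_dec_bytes_left(struct toy_ordered_extent *oe, uint64_t len)
{
	if (len > oe->bytes_left) {
		fprintf(stderr, "underflow: bytes_left=%llu to_dec=%llu\n",
			(unsigned long long)oe->bytes_left,
			(unsigned long long)len);
		oe->bytes_left = 0;
	} else {
		oe->bytes_left -= len;
	}
	return oe->bytes_left == 0;
}

/*
 * Finish IO for sectors [first, first + nr). A sector whose flag is
 * already clear was accounted by someone else and is skipped.
 */
static bool toy_mark_io_finished(struct toy_ordered_extent *oe,
				 unsigned int first, unsigned int nr,
				 uint64_t sectorsize)
{
	for (unsigned int i = first; i < first + nr && i < TOY_NR_SECTORS; i++) {
		if (!oe->pending[i])
			continue;
		oe->pending[i] = false;
		toy_dec_bytes_left(oe, sectorsize);
	}
	return oe->bytes_left == 0;
}
```

Calling `toy_mark_io_finished()` twice on the same sector decrements `bytes_left` only once, which is the double accounting prevention the message refers to.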
  10. 29 Apr, 2021 1 commit
    • btrfs: zoned: fix silent data loss after failure splitting ordered extent · adbd914d
      Committed by Filipe Manana
      On a zoned filesystem, sometimes we need to split an ordered extent into 3
      different ordered extents. The original ordered extent is shortened, at
      the front and at the rear, and we create two other new ordered extents to
      represent the trimmed parts of the original ordered extent.
      
      After adjusting the original ordered extent, we create an ordered extent
      to represent the pre-range, and that may fail with ENOMEM for example.
      After that we always try to create the ordered extent for the post-range,
      and if that happens to succeed we end up returning success to the caller
      as we overwrite the 'ret' variable which contained the previous error.
      
      This means we end up with a file range for which there is no ordered
      extent, which results in the range never getting a new file extent item
      pointing to the new data location. And since the split operation did
      not return an error, writeback does not fail and the inode's mapping is
      not flagged with an error, resulting in a subsequent fsync not reporting
      an error either.
      
      It's possibly very unlikely to have the creation of the post-range ordered
      extent succeed after the creation of the pre-range ordered extent failed,
      but it's not impossible.
      
      So fix this by making sure we only create the post-range ordered extent
      if there was no error creating the ordered extent for the pre-range.
      
      Fixes: d22002fd ("btrfs: zoned: split ordered extent when bio is sent")
      CC: stable@vger.kernel.org # 5.12+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
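The bug pattern described above, where a later success overwrites an earlier error stored in `ret`, can be shown in a few lines. This is a simplified sketch, not the kernel function: `toy_create_extent()`, `toy_split_buggy()` and `toy_split_fixed()` are hypothetical, and the `bool` parameters only simulate whether each allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for creating one ordered extent; may fail. */
static int toy_create_extent(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Buggy pattern: the post-range result overwrites a pre-range error. */
static int toy_split_buggy(bool pre_fails, bool post_fails)
{
	int ret;

	ret = toy_create_extent(pre_fails);
	ret = toy_create_extent(post_fails); /* the earlier error is lost here */
	return ret;
}

/* Fixed pattern: only create the post-range if the pre-range succeeded. */
static int toy_split_fixed(bool pre_fails, bool post_fails)
{
	int ret;

	ret = toy_create_extent(pre_fails);
	if (ret == 0)
		ret = toy_create_extent(post_fails);
	return ret;
}
```

With a failing pre-range and a succeeding post-range, the buggy variant reports success while the fixed variant returns `-ENOMEM`, so the caller can fail writeback.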
  11. 19 Apr, 2021 1 commit
  12. 09 Feb, 2021 6 commits
    • btrfs: zoned: use ZONE_APPEND write for zoned mode · d8e3fb10
      Committed by Naohiro Aota
      Enable zone append writing for zoned mode. When using zone append, a
      bio is issued to the start of a target zone and the device decides to
      place it inside the zone. Upon completion the device reports the actual
      written position back to the host.
      
      Three parts are necessary to enable zone append mode. First, modify the
      bio to use REQ_OP_ZONE_APPEND in btrfs_submit_bio_hook() and adjust the
      bi_sector to point to the beginning of the zone.
      
      Second, record the returned physical address (and disk/partno) to the
      ordered extent in end_bio_extent_writepage() after the bio has been
      completed. We cannot resolve the physical address to the logical address
      because we can neither take locks nor allocate a buffer in this end_bio
      context. So, we need to record the physical address to resolve it later
      in btrfs_finish_ordered_io().
      
      And finally, rewrite the logical addresses of the extent mapping and
      checksum data according to the physical address using btrfs_rmap_block.
      If the returned address matches the originally allocated address, we can
      skip this rewriting process.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: save irq flags when looking up an ordered extent · 24533f6a
      Committed by Johannes Thumshirn
      A following patch will add another caller of
      btrfs_lookup_ordered_extent(), but from a bio's endio context.
      
      btrfs_lookup_ordered_extent() uses spin_lock_irq() which unconditionally
      disables interrupts. Change this to spin_lock_irqsave() so interrupts
      aren't disabled and re-enabled unconditionally.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
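Why the unconditional variant is a problem in a context that already runs with interrupts off can be shown with a toy user-space model. Everything here is a simulation: a single `bool` stands in for the CPU's interrupt-enable state, and the `toy_*` functions are hypothetical helpers that only mimic the save/restore behaviour of the kernel primitives (whose real interfaces take a lock and a flags variable).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU's interrupt-enable state. */
static bool toy_irqs_enabled = true;

/* Models spin_lock_irq(): disables interrupts unconditionally. */
static void toy_lock_irq(void)
{
	toy_irqs_enabled = false;
}

/* Models spin_unlock_irq(): enables interrupts unconditionally. */
static void toy_unlock_irq(void)
{
	toy_irqs_enabled = true;
}

/* Models spin_lock_irqsave(): remembers the old state, then disables. */
static bool toy_lock_irqsave(void)
{
	bool flags = toy_irqs_enabled;

	toy_irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): puts back whatever state was saved. */
static void toy_unlock_irqrestore(bool flags)
{
	toy_irqs_enabled = flags;
}
```

If the caller enters with interrupts already disabled, the `toy_lock_irq()`/`toy_unlock_irq()` pair leaves them enabled on exit, while the save/restore pair leaves the state exactly as it found it.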
    • btrfs: zoned: split ordered extent when bio is sent · d22002fd
      Committed by Naohiro Aota
      For a zone append write, the device decides the location the data is being
      written to. Therefore we cannot ensure that two bios are written
      consecutively on the device. In order to ensure that an ordered extent
      maps to a contiguous region on disk, we need to maintain a "one bio ==
      one ordered extent" rule.
      
      Implement splitting of an ordered extent and extent map on bio submission
      to adhere to the rule.
      
      extract_ordered_extent() hooks into btrfs_submit_data_bio() and splits the
      corresponding ordered extent so that the ordered extent's region fits into
      one bio and the corresponding device limits.
      
      Several sanity checks need to be done in extract_ordered_extent(), e.g.

      - We cannot split an ordered extent that has already been end_bio'd,
        because we cannot divide ordered->bytes_left among the split ones
      - We do not expect a compressed ordered extent
      - We should not have a checksum list because we omit the list splitting.
        Since the function is called before btrfs_wq_submit_bio() or
        btrfs_csum_one_bio(), this should always be ensured.
      
      We also need to split an extent map by creating a new one. If not,
      unpin_extent_cache() complains about the difference between the start of
      the extent map and the file's logical offset.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
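The "one bio == one ordered extent" rule implies that the part of an ordered extent before the bio and the part after it must be carved off. A minimal sketch of that arithmetic follows; it is an assumption-laden illustration, not extract_ordered_extent() itself. The struct `toy_split` and the function `toy_compute_split()` are hypothetical, and all ranges are plain byte offsets.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical result: bytes to trim before and after the bio's range. */
struct toy_split {
	uint64_t pre;
	uint64_t post;
};

/*
 * Given an ordered extent [oe_start, oe_start + oe_len) and a bio range
 * [bio_start, bio_start + bio_len) inside it, compute how much must be
 * split off at the front and at the rear so that the remaining ordered
 * extent matches the bio exactly.
 */
static int toy_compute_split(uint64_t oe_start, uint64_t oe_len,
			     uint64_t bio_start, uint64_t bio_len,
			     struct toy_split *out)
{
	uint64_t oe_end = oe_start + oe_len;
	uint64_t bio_end = bio_start + bio_len;

	/* Sanity check: the bio must be non-empty and inside the extent. */
	if (bio_len == 0 || bio_start < oe_start || bio_end > oe_end)
		return -EINVAL;

	out->pre = bio_start - oe_start;
	out->post = oe_end - bio_end;
	return 0;
}
```

When both `pre` and `post` come out as zero, the bio already covers the whole ordered extent and no split is needed.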
    • btrfs: track ordered bytes instead of just dio ordered bytes · 5deb17e1
      Committed by Josef Bacik
      We track dio_bytes because the shrink delalloc code needs to know if we
      have more DIO in flight than we have normal buffered IO.  The reason for
      this is because we can't "flush" DIO, we have to just wait on the
      ordered extents to finish.
      
      However this is true of all ordered extents.  If we have more ordered
      space outstanding than dirty pages we should be waiting on ordered
      extents.  We already are ok on this front technically, because we always
      do a FLUSH_DELALLOC_WAIT loop, but I want to use the ordered counter in
      the preemptive flushing code as well, so change this to count all
      ordered bytes instead of just DIO ordered bytes.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rework the order of btrfs_ordered_extent::flags · 3c198fe0
      Committed by Qu Wenruo
      [BUG]
      There is a long-standing bug in the last parameter of
      btrfs_add_ordered_extent(), dating back to commit 771ed689 ("Btrfs:
      Optimize compressed writeback and reads") from 2008.
      
      In that ancient commit btrfs_add_ordered_extent() expects the @type
      parameter to be one of the following:
      
      - BTRFS_ORDERED_REGULAR
      - BTRFS_ORDERED_NOCOW
      - BTRFS_ORDERED_PREALLOC
      - BTRFS_ORDERED_COMPRESSED
      
      But we pass 0 in cow_file_range(), which means BTRFS_ORDERED_IO_DONE.
      
      Ironically, an extra check in __btrfs_add_ordered_extent() won't set the
      bit if we see (type == IO_DONE || type == IO_COMPLETE), which avoids any
      obvious bug.

      But this still leaves regular COW ordered extents with no bit to
      indicate their type in various trace events, rendering the REGULAR bit
      useless.

      [FIX]
      Change the following aspects to avoid such problems:

      - Reorder btrfs_ordered_extent::flags
        Now the type bits go first (REGULAR/NOCOW/PREALLOC/COMPRESSED), then
        the DIRECT bit, and finally extra status bits like
        IO_DONE/COMPLETE/IOERR.
      
      - Add extra ASSERT() for btrfs_add_ordered_extent_*()
      
      - Remove @type parameter for btrfs_add_ordered_extent_compress()
        As the only valid @type here is BTRFS_ORDERED_COMPRESSED.
      
      - Remove the unnecessary special check for IO_DONE/COMPLETE in
        __btrfs_add_ordered_extent()
        That check only existed to make the code work; with the extra
        ASSERT(), only a limited set of values can be passed in.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
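The effect of the reordering can be sketched with a small enum. The bit names below are hypothetical `TOY_` counterparts that follow the ordering described in the message (type bits, then DIRECT, then status bits); the exact kernel definitions and the helpers `toy_is_valid_type()` and `toy_init_flags()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers following the ordering described above. */
enum {
	/* Type bits first: exactly one is expected per ordered extent. */
	TOY_ORDERED_REGULAR,
	TOY_ORDERED_NOCOW,
	TOY_ORDERED_PREALLOC,
	TOY_ORDERED_COMPRESSED,
	/* Then the direct IO bit. */
	TOY_ORDERED_DIRECT,
	/* Finally the status bits. */
	TOY_ORDERED_IO_DONE,
	TOY_ORDERED_COMPLETE,
	TOY_ORDERED_IOERR,
};

/* Only the four type bits are acceptable as a @type argument. */
static bool toy_is_valid_type(int type)
{
	return type >= TOY_ORDERED_REGULAR && type <= TOY_ORDERED_COMPRESSED;
}

/* Build the initial flags word; a status bit passed as @type trips the assert. */
static unsigned long toy_init_flags(int type, bool direct)
{
	unsigned long flags;

	assert(toy_is_valid_type(type));
	flags = 1UL << type;
	if (direct)
		flags |= 1UL << TOY_ORDERED_DIRECT;
	return flags;
}
```

With this layout, a caller that passes 0 gets the REGULAR type bit set instead of silently naming a status bit, so trace output would carry a type for regular COW extents.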
    • btrfs: refactor btrfs_dec_test_* functions for ordered extents · 58f74b22
      Committed by Qu Wenruo
      The refactoring involves the following modifications:
      
      - Return bool instead of int
      
      - Parameter update for @cached of btrfs_dec_test_first_ordered_pending()
        For btrfs_dec_test_first_ordered_pending(), @cached is only used to
        return the finished ordered extent.
        Rename it to @finished_ret.
      
      - Comment updates
      
        * Change one stale comment
          Which still refers to btrfs_dec_test_ordered_pending(), but the
          context is calling btrfs_dec_test_first_ordered_pending().
        * Follow the common comment style for both functions
          Add more detailed descriptions for parameters and the return value
        * Move the reason why test_and_set_bit() is used into the call sites
      
      - Change how the return value is calculated
        The most anti-human part of the return value is:
      
          if (...)
              ret = 1;
          ...
          return ret == 0;
      
        This means that when we set ret to 1, the function returns 0.
        Change the local variable name to @finished, and return its value
        directly.
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
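The readability point about the return value can be made concrete with two equivalent toy functions. These are hypothetical illustrations, not the kernel functions: `toy_dec_test_old()` mimics the inverted `ret == 0` style quoted above, and `toy_dec_test_new()` uses a `bool` named `finished`. Both assume the caller never subtracts more than `*bytes_left`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old style: ret becomes 1 when NOT finished, then is inverted on return. */
static int toy_dec_test_old(uint64_t *bytes_left, uint64_t len)
{
	int ret = 0;

	assert(len <= *bytes_left);
	*bytes_left -= len;
	if (*bytes_left)
		ret = 1;
	return ret == 0;
}

/* New style: the variable says what it means and is returned directly. */
static bool toy_dec_test_new(uint64_t *bytes_left, uint64_t len)
{
	bool finished = false;

	assert(len <= *bytes_left);
	*bytes_left -= len;
	if (*bytes_left == 0)
		finished = true;
	return finished;
}
```

Both functions report the same result for the same input; the difference is only in how easily a reader can tell what a true return value means.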
  13. 10 Dec, 2020 1 commit
    • btrfs: remove btrfs_find_ordered_sum call from btrfs_lookup_bio_sums · 9e46458a
      Committed by Qu Wenruo
      The function btrfs_lookup_bio_sums() is only called for read bios,
      while btrfs_find_ordered_sum() searches ordered extent sums, which
      only exist in the write path.
      
      This means to read a page we either:
      
      - Submit read bio if it's not uptodate
        This means we only need to search csum tree for checksums.
      
      - The page is already uptodate
        It can be marked uptodate by a previous read, or by being marked
        dirty, as we always mark a dirty page uptodate.
        In that case, we don't need to submit a read bio at all, and thus
        don't need to search for any checksums.
      
      Remove the btrfs_find_ordered_sum() call in btrfs_lookup_bio_sums().
      And since btrfs_lookup_bio_sums() is the only caller for
      btrfs_find_ordered_sum(), also remove the implementation.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  14. 08 Dec, 2020 4 commits
  15. 07 Oct, 2020 5 commits