1. 05 Dec, 2021 5 commits
  2. 24 Aug, 2021 1 commit
    • D
      xfs: only set IOMAP_F_SHARED when providing a srcmap to a write · 72a048c1
      Darrick J. Wong committed
      While prototyping a free space defragmentation tool, I observed an
      unexpected IO error that can be reproduced with the following
      sequence of commands:
      
      # xfs_io -f -c "pwrite -S 0x58 -b 10m 0 10m" file1
      # cp --reflink=always file1 file2
      # punch-alternating -o 1 file2
      # xfs_io -c "funshare 0 10m" file2
      fallocate: Input/output error
      
      I then scraped this (abbreviated) stack trace from dmesg:
      
      WARNING: CPU: 0 PID: 30788 at fs/iomap/buffered-io.c:577 iomap_write_begin+0x376/0x450
      CPU: 0 PID: 30788 Comm: xfs_io Not tainted 5.14.0-rc6-xfsx #rc6 5ef57b62a900814b3e4d885c755e9014541c8732
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:iomap_write_begin+0x376/0x450
      RSP: 0018:ffffc90000c0fc20 EFLAGS: 00010297
      RAX: 0000000000000001 RBX: ffffc90000c0fd10 RCX: 0000000000001000
      RDX: ffffc90000c0fc54 RSI: 000000000000000c RDI: 000000000000000c
      RBP: ffff888005d5dbd8 R08: 0000000000102000 R09: ffffc90000c0fc50
      R10: 0000000000b00000 R11: 0000000000101000 R12: ffffea0000336c40
      R13: 0000000000001000 R14: ffffc90000c0fd10 R15: 0000000000101000
      FS:  00007f4b8f62fe40(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000056361c554108 CR3: 000000000524e004 CR4: 00000000001706f0
      Call Trace:
       iomap_unshare_actor+0x95/0x140
       iomap_apply+0xfa/0x300
       iomap_file_unshare+0x44/0x60
       xfs_reflink_unshare+0x50/0x140 [xfs 61947ea9b3a73e79d747dbc1b90205e7987e4195]
       xfs_file_fallocate+0x27c/0x610 [xfs 61947ea9b3a73e79d747dbc1b90205e7987e4195]
       vfs_fallocate+0x133/0x330
       __x64_sys_fallocate+0x3e/0x70
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f4b8f79140a
      
      Looking at the iomap tracepoints, I saw this:
      
      iomap_iter:           dev 8:64 ino 0x100 pos 0 length 0 flags WRITE|0x80 (0x81) ops xfs_buffered_write_iomap_ops caller iomap_file_unshare
      iomap_iter_dstmap:    dev 8:64 ino 0x100 bdev 8:64 addr -1 offset 0 length 131072 type DELALLOC flags SHARED
      iomap_iter_srcmap:    dev 8:64 ino 0x100 bdev 8:64 addr 147456 offset 0 length 4096 type MAPPED flags
      iomap_iter:           dev 8:64 ino 0x100 pos 0 length 4096 flags WRITE|0x80 (0x81) ops xfs_buffered_write_iomap_ops caller iomap_file_unshare
      iomap_iter_dstmap:    dev 8:64 ino 0x100 bdev 8:64 addr -1 offset 4096 length 4096 type DELALLOC flags SHARED
      console:              WARNING: CPU: 0 PID: 30788 at fs/iomap/buffered-io.c:577 iomap_write_begin+0x376/0x450
      
      The first time funshare calls ->iomap_begin, xfs sees that the first
      block is shared and creates a 128k delalloc reservation in the COW fork.
      The delalloc reservation is returned as dstmap, and the shared block is
      returned as srcmap.  So far so good.
      
      funshare calls ->iomap_begin to try the second block.  This time there's
      no srcmap (punch-alternating punched it out!) but we still have the
      delalloc reservation in the COW fork.  Therefore, we again return the
      reservation as dstmap and the hole as srcmap.  iomap_unshare_iter
      incorrectly tries to unshare the hole, which __iomap_write_begin rejects
      because shared regions must be fully written and therefore cannot
      require zeroing.
      
      Therefore, change the buffered write iomap_begin function not to set
      IOMAP_F_SHARED when there isn't a source mapping to read from for the
      unsharing.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
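The rule described in the commit above can be modelled in a few lines of C. This is a minimal sketch, not the XFS code: the names `sketch_map_type`, `SKETCH_F_SHARED` and `sketch_cow_flags` are hypothetical stand-ins, and the flag value is arbitrary. It only shows the decision: report a mapping as shared when there is a real source block to read from, and never for a hole.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch; not the kernel's definitions. */
enum sketch_map_type { SKETCH_HOLE, SKETCH_MAPPED };

#define SKETCH_F_SHARED 0x4u    /* arbitrary value chosen for the sketch */

/*
 * Decide which flags to report for a destination mapping backed by a
 * COW-fork reservation. "Shared" is only claimed when the source mapping
 * is a real block; a hole has nothing to unshare.
 */
unsigned int sketch_cow_flags(bool cow_reservation_found,
                              enum sketch_map_type src_type)
{
    unsigned int flags = 0;

    if (cow_reservation_found && src_type == SKETCH_MAPPED)
        flags |= SKETCH_F_SHARED;
    return flags;
}
```

In the reproducer, the first block corresponds to `sketch_cow_flags(true, SKETCH_MAPPED)` and the punched second block to `sketch_cow_flags(true, SKETCH_HOLE)`, which in this model no longer reports the shared flag.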
  3. 20 Aug, 2021 2 commits
  4. 27 May, 2021 1 commit
    • G
      xfs: Fix fall-through warnings for Clang · 53004ee7
      Gustavo A. R. Silva committed
      In preparation for enabling -Wimplicit-fallthrough for Clang, fix
      the following warnings by replacing /* fall through */ comments,
      and their variants, with the new pseudo-keyword macro fallthrough:
      
      fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      
      Notice that Clang doesn't recognize /* fall through */ comments as
      implicit fall-through markings, so in order to globally enable
      -Wimplicit-fallthrough for Clang, these comments need to be
      replaced with fallthrough; in the whole codebase.
      
      Link: https://github.com/KSPP/linux/issues/115
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
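As an illustration of the replacement described above, here is a small standalone sketch. The kernel supplies `fallthrough` in its own compiler-attribute headers; the local definition below is an assumption made only so the example compiles outside the kernel tree, and `sketch_count_steps` is a hypothetical function.

```c
/*
 * Local definition so the sketch builds standalone. In the kernel the
 * macro comes from the compiler-attribute headers instead.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical example: each level also performs the steps of lower levels. */
int sketch_count_steps(int level)
{
    int steps = 0;

    switch (level) {
    case 2:
        steps++;
        fallthrough;    /* was: fall through comment */
    case 1:
        steps++;
        fallthrough;
    case 0:
        steps++;
        break;
    default:
        steps = -1;
        break;
    }
    return steps;
}
```

With the attribute in place, a compiler that warns about implicit fall-through can tell intentional cases from a forgotten `break`.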
  5. 16 Apr, 2021 2 commits
  6. 08 Apr, 2021 2 commits
  7. 11 Feb, 2021 1 commit
    • B
      xfs: restore shutdown check in mapped write fault path · e4826691
      Brian Foster committed
      XFS triggers an iomap warning in the write fault path due to a
      !PageUptodate() page if a write fault happens to occur on a page
      that recently failed writeback. The iomap writeback error handling
      code can clear the Uptodate flag if no portion of the page is
      submitted for I/O. This is reproduced by fstest generic/019, which
      combines various forms of I/O with simulated disk failures that
      inevitably lead to filesystem shutdown (which then unconditionally
      fails page writeback).
      
      This is a regression introduced by commit f150b423 ("xfs: split
      the iomap ops for buffered vs direct writes") due to the removal of
      a shutdown check and explicit error return in the ->iomap_begin()
      path used by the write fault path. The explicit error return
      historically translated to a SIGBUS, but now carries on with iomap
      processing where it complains about the unexpected state. Restore
      the shutdown check to xfs_buffered_write_iomap_begin() to restore
      historical behavior.
      
      Fixes: f150b423 ("xfs: split the iomap ops for buffered vs direct writes")
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
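The restored behaviour amounts to an early error return before any mapping work is done. A minimal sketch, assuming a hypothetical `sketch_mount` structure with a `forced_shutdown` flag (the real code checks the XFS mount state, not this struct):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the filesystem mount state. */
struct sketch_mount {
    bool forced_shutdown;
};

/*
 * Model of a buffered-write mapping entry point: bail out with -EIO as
 * soon as the filesystem is known to be shut down, so the caller never
 * continues with a page in an unexpected state.
 */
int sketch_buffered_write_begin(const struct sketch_mount *mp)
{
    if (mp->forced_shutdown)
        return -EIO;

    /* ... mapping lookup would happen here ... */
    return 0;
}
```

According to the commit message, an explicit error on this path historically surfaced to the faulting process as a SIGBUS.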
  8. 04 Feb, 2021 6 commits
  9. 02 Feb, 2021 1 commit
    • D
      xfs: reduce exclusive locking on unaligned dio · ed1128c2
      Dave Chinner committed
      Attempt shared locking for unaligned DIO, but only if the
      underlying extent is already allocated and in written state. On
      failure, retry with the existing exclusive locking.
      
      Test case is fio randrw of 512 byte IOs using AIO and an iodepth of
      32 IOs.
      
      Vanilla:
      
        READ: bw=4560KiB/s (4670kB/s), 4560KiB/s-4560KiB/s (4670kB/s-4670kB/s), io=134MiB (140MB), run=30001-30001msec
        WRITE: bw=4567KiB/s (4676kB/s), 4567KiB/s-4567KiB/s (4676kB/s-4676kB/s), io=134MiB (140MB), run=30001-30001msec
      
      Patched:
         READ: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1127MiB (1182MB), run=30002-30002msec
        WRITE: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1128MiB (1183MB), run=30002-30002msec
      
      That's an improvement from ~18k IOPS to ~150k IOPS, which is
      about the IOPS limit of the VM block device setup I'm testing on.
      
      4kB block IO comparison:
      
         READ: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8868MiB (9299MB), run=30002-30002msec
        WRITE: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8878MiB (9309MB), run=30002-30002msec
      
      Which is ~150k IOPS, same as what the test gets for sub-block
      AIO+DIO writes with this patch.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [hch: rebased, split unaligned from nowait]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
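The IOPS figures quoted in the commit above follow from dividing the reported fio bandwidth by the IO size and adding the read and write sides. A small sketch of that arithmetic (`sketch_iops` is a hypothetical helper, not part of fio or XFS):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a bandwidth in bytes per second to whole IOs per second. */
uint64_t sketch_iops(double bw_bytes_per_sec, uint64_t io_size_bytes)
{
    assert(io_size_bytes > 0);
    return (uint64_t)(bw_bytes_per_sec / (double)io_size_bytes);
}
```

For the vanilla run, `sketch_iops(4560.0 * 1024, 512)` and `sketch_iops(4567.0 * 1024, 512)` give 9120 and 9134, together about 18k IOPS. The patched run at 37.6 MiB/s in each direction works out to roughly 77k each, about 154k in total, and the 4kB comparison at 296 MiB/s gives 75776 each, about 152k in total.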
  10. 23 Jan, 2021 2 commits
  11. 20 Nov, 2020 1 commit
    • D
      xfs: don't allow NOWAIT DIO across extent boundaries · 883a790a
      Dave Chinner committed
      Jens has reported a situation where partial direct IOs can be issued
      and completed yet still return -EAGAIN. We don't want this to report
      a short IO as we want XFS to complete user DIO entirely or not at
      all.
      
      This partial IO situation can occur on a write IO that is split
      across an allocated extent and a hole, and the second mapping is
      returning EAGAIN because allocation would be required.
      
      The trivial reproducer:
      
      $ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
      wrote 4096/4096 bytes at offset 0
      4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
      pwrite: Resource temporarily unavailable
      $
      
      The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
      the first 4kB write:
      
       xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
       iomap_apply:          dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
       xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
       xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
       xfs_iomap_found:      dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
       iomap_apply_dstmap:   dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
      
      Here the first iomap loop has mapped the first 4kB of the file and
      issued the IO, and we enter the second iomap_apply loop:
      
       iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
       xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
       xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
      
      And we exit with -EAGAIN because we hit the allocate case trying
      to map the second 4kB block.
      
      Then IO completes on the first 4kB and the original IO context
      completes and unlocks the inode, returning -EAGAIN to userspace:
      
       xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
       xfs_iunlock:          dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
      
      There are other vectors to the same problem when we re-enter the
      mapping code if we have to make multiple mappings under NOWAIT
      conditions. e.g. failing trylocks, COW extents being found,
      allocation being required, and so on.
      
      Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
      to go ahead if the mapping we retrieve for the IO spans an entire
      allocated extent. This avoids the possibility of subsequent mappings
      to complete the IO from triggering NOWAIT semantics by any means as
      NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
      Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
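The "spans an entire allocated extent" condition can be sketched as a simple range check. This is a simplified model under stated assumptions: `sketch_extent`, `sketch_extent_spans` and `sketch_nowait_check` are hypothetical names, offsets are in filesystem blocks, and the IO covers the half-open block range [first, end).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a single file extent mapping. */
struct sketch_extent {
    uint64_t start_block;   /* first file block covered */
    uint64_t block_count;   /* number of blocks covered */
    bool allocated;         /* false for a hole */
};

/* True if one allocated extent covers every block in [first, end). */
bool sketch_extent_spans(const struct sketch_extent *ext,
                         uint64_t first, uint64_t end)
{
    if (!ext->allocated)
        return false;
    if (ext->start_block > first)
        return false;
    if (ext->start_block + ext->block_count < end)
        return false;
    return true;
}

/*
 * A NOWAIT IO is refused up front with -EAGAIN unless a single mapping
 * covers it, so no part of the IO is issued before the failure.
 */
int sketch_nowait_check(const struct sketch_extent *ext,
                        uint64_t first, uint64_t end, bool nowait)
{
    if (nowait && !sketch_extent_spans(ext, first, end))
        return -EAGAIN;
    return 0;
}
```

In the reproducer, the file has one allocated 4kB block and the NOWAIT write covers two blocks, so this model rejects the whole IO before anything is submitted instead of after the first 4kB.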
  12. 05 Aug, 2020 1 commit
  13. 29 Jul, 2020 3 commits
  14. 27 May, 2020 3 commits
  15. 20 May, 2020 2 commits
  16. 21 Jan, 2020 1 commit
  17. 14 Nov, 2019 1 commit
  18. 12 Nov, 2019 1 commit
  19. 11 Nov, 2019 1 commit
    • D
      xfs: refactor "does this fork map blocks" predicate · 2fe4f928
      Darrick J. Wong committed
      Replace the open-coded checks for whether or not an inode fork maps
      blocks with a macro that will implant the code for us.  This helps us
      declutter the bmap code a bit.
      
      Note that I had to use a macro instead of a static inline function
      because of C header dependency problems between xfs_inode.h and
      xfs_inode_fork.h.
      
      Conversion was performed with the following Coccinelle script:
      
      @@
      expression ip, w;
      @@
      
      - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE
      + xfs_ifork_has_extents(ip, w)
      
      @@
      expression ip, w;
      @@
      
      - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE
      + !xfs_ifork_has_extents(ip, w)
      
      @@
      expression ip, w;
      @@
      
      - XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_BTREE || XFS_IFORK_FORMAT(ip, w) == XFS_DINODE_FMT_EXTENTS
      + xfs_ifork_has_extents(ip, w)
      
      @@
      expression ip, w;
      @@
      
      - XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_BTREE && XFS_IFORK_FORMAT(ip, w) != XFS_DINODE_FMT_EXTENTS
      + !xfs_ifork_has_extents(ip, w)
      
      @@
      expression ip, w;
      @@
      
      - (xfs_ifork_has_extents(ip, w))
      + xfs_ifork_has_extents(ip, w)
      
      @@
      expression ip, w;
      @@
      
      - (!xfs_ifork_has_extents(ip, w))
      + !xfs_ifork_has_extents(ip, w)
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
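The shape of such a predicate macro can be shown with a standalone model. Everything prefixed `sketch_` or `SKETCH_` below is a hypothetical stand-in for the XFS types and macros named in the Coccinelle script; only the structure of the predicate (format is either "extents" or "btree") is taken from the commit message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the on-disk fork formats and fork selector. */
enum sketch_fork_fmt {
    SKETCH_FMT_DEV,
    SKETCH_FMT_LOCAL,
    SKETCH_FMT_EXTENTS,
    SKETCH_FMT_BTREE
};
enum sketch_whichfork { SKETCH_DATA_FORK, SKETCH_ATTR_FORK };

struct sketch_inode {
    enum sketch_fork_fmt data_fmt;
    enum sketch_fork_fmt attr_fmt;
};

/* Look up the format of the selected fork. */
#define SKETCH_IFORK_FORMAT(ip, w) \
    ((w) == SKETCH_DATA_FORK ? (ip)->data_fmt : (ip)->attr_fmt)

/* One predicate replaces the open-coded two-way format comparison. */
#define sketch_ifork_has_extents(ip, w) \
    (SKETCH_IFORK_FORMAT((ip), (w)) == SKETCH_FMT_EXTENTS || \
     SKETCH_IFORK_FORMAT((ip), (w)) == SKETCH_FMT_BTREE)

/* Example caller: only forks that map blocks can be walked extent by extent. */
bool sketch_can_walk_extents(const struct sketch_inode *ip,
                             enum sketch_whichfork w)
{
    return sketch_ifork_has_extents(ip, w);
}
```

Because the macro expands to a fully parenthesized expression, callers can negate it directly, which is why the last two rules of the script strip the redundant outer parentheses.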
  20. 04 Nov, 2019 3 commits