1. 27 Dec, 2021 4 commits
    • xfs: update superblock counters correctly for !lazysbcount · 20560d6e
      Dave Chinner committed
      mainline-inclusion
      from mainline-v5.12-rc4
      commit 6543990a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6543990a168acf366f4b6174d7bd46ba15a8a2a6
      
      -------------------------------------------------
      
      Keep the mount superblock counters up to date for !lazysbcount
      filesystems so that when we log the superblock they do not need
      updating in any way because they are already correct.
      
      This was found via the reproducer that Zorro reported:
      1. mkfs.xfs -f -l lazy-count=0 -m crc=0 $dev
      2. mount $dev $mnt
      3. fsstress -d $mnt -p 100 -n 1000 (may need more or less I/O load)
      4. umount $mnt
      5. xfs_repair -n $dev
      With this patch applied, I have seen no problems.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reported-by: Zorro Lang <zlang@redhat.com>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      20560d6e
    • xfs: don't check agf_btreeblks on pre-lazysbcount filesystems · f183f494
      Darrick J. Wong committed
      mainline-inclusion
      from mainline-v5.12-rc4
      commit e6c01077
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6c01077ec2d28fe8b6e0bc79eddea8d788f6ea3
      
      -------------------------------------------------
      
      The AGF free space btree block counter wasn't added until the
      lazysbcount feature was added to XFS midway through the life of the V4
      format, so ignore the field when checking.  Online AGF repair requires
      rmapbt, so it doesn't need the feature check.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      f183f494
    • xfs: remove obsolete AGF counter debugging · dc0c7ae0
      Darrick J. Wong committed
      mainline-inclusion
      from mainline-v5.12-rc4
      commit 1aec7c3d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1aec7c3d05670b92b7339b19999009a93808efb9
      
      -------------------------------------------------
      
      In commit f8f2835a we changed the behavior of XFS to use EFIs to
      remove blocks from an overfilled AGFL because there were complaints
      about transaction overruns that stemmed from trying to free multiple
      blocks in a single transaction.
      
      Unfortunately, that commit missed a subtlety in the debug-mode
      transaction accounting when a realtime volume is attached.  If a
      realtime file undergoes a data fork mapping change such that realtime
      extents are allocated (or freed) in the same transaction that a data
      device block is also allocated (or freed), we can trip a debugging
      assertion.  This can happen (for example) if a realtime extent is
      allocated and it is necessary to reshape the bmbt to hold the new
      mapping.
      
      When we go to allocate a bmbt block from an AG, the first thing the data
      device block allocator does is ensure that the freelist is the proper
      length.  If the freelist is too long, it will trim the freelist to the
      proper length.
      
      In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to
      record the decrement in the AG free list count.  Prior to f8f28 we would
      put the free block back in the free space btrees in the same
      transaction, which calls xfs_trans_agblocks_delta() to record the
      increment in the AG free block count.  Since AGFL blocks are included in
      the global free block count (fdblocks), there is no corresponding
      fdblocks update, so the AGFL free satisfies the following condition in
      xfs_trans_apply_sb_deltas:
      
      	/*
      	 * Check that superblock mods match the mods made to AGF counters.
      	 */
      	ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) ==
      	       (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta +
      		tp->t_ag_btree_delta));
      
      The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is
      the number of blocks that were allocated.
      
      After commit f8f28 we defer the block freeing to the next chained
      transaction, which means that the calls to xfs_trans_agflist_delta and
      xfs_trans_agblocks_delta occur in separate transactions.  The (first)
      transaction that shortens the free list trips on the comparison, which
      has now become:
      
      (X + 0) == ((X) + -1 + 0)
      
      because we haven't freed the AGFL block yet; we've only logged an
      intention to free it.  When the second transaction (the deferred free)
      commits, it will evaluate the expression as:
      
      (0 + 0) == (1 + 0 + 0)
      
      and trip over that in turn.
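The arithmetic above can be replayed in a small standalone C model. This is only a sketch: `struct trans_deltas` and the helper names are hypothetical stand-ins for the `tp->t_*_delta` fields in the quoted ASSERT, and the values are the ones given in the text, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-transaction delta fields in the ASSERT. */
struct trans_deltas {
	int64_t fdblocks;     /* models t_fdblocks_delta */
	int64_t res_fdblocks; /* models t_res_fdblocks_delta */
	int64_t ag_freeblks;  /* models t_ag_freeblks_delta */
	int64_t ag_flist;     /* models t_ag_flist_delta */
	int64_t ag_btree;     /* models t_ag_btree_delta */
};

/* Evaluates the same condition as the debugging assertion. */
static bool sb_deltas_match(const struct trans_deltas *d)
{
	return (d->fdblocks + d->res_fdblocks) ==
	       (d->ag_freeblks + d->ag_flist + d->ag_btree);
}

/* Old behavior: the AGFL trim and the btree free share one transaction. */
static struct trans_deltas trim_same_trans(int64_t x)
{
	struct trans_deltas d = {
		.fdblocks = x, .ag_freeblks = x + 1, .ag_flist = -1,
	};
	return d;
}

/* New behavior, first transaction: the AGFL shrinks, the free is deferred. */
static struct trans_deltas trim_first_trans(int64_t x)
{
	struct trans_deltas d = {
		.fdblocks = x, .ag_freeblks = x, .ag_flist = -1,
	};
	return d;
}

/* New behavior, second transaction: only the deferred free happens. */
static struct trans_deltas deferred_free_trans(void)
{
	struct trans_deltas d = { .ag_freeblks = 1 };
	return d;
}
```

With any value of X, `sb_deltas_match()` holds for `trim_same_trans()` and fails for both `trim_first_trans()` and `deferred_free_trans()`, which matches the two tripped comparisons described above.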
      
      At this point, the astute reader may note that the two commits tagged by
      this patch have been in the kernel for a long time but haven't generated
      any bug reports.  How is it that the author became aware of this bug?
      
      This originally surfaced as an intermittent failure when I was testing
      realtime rmap, but a different bug report by Zorro Lang reveals the same
      assertion occurring on !lazysbcount filesystems.
      
      The common factor to both reports (and why this problem wasn't
      previously reported) becomes apparent if we consider when
      xfs_trans_apply_sb_deltas is called by __xfs_trans_commit():
      
      	if (tp->t_flags & XFS_TRANS_SB_DIRTY)
      		xfs_trans_apply_sb_deltas(tp);
      
      With a modern lazysbcount filesystem, transactions update only the
      percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence
      xfs_trans_apply_sb_deltas is rarely called.
      
      However, updates to the count of free realtime extents are not part of
      lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or
      removing data fork mappings to realtime files; similarly,
      XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems.
      
      Dave mentioned in response to an earlier version of this patch:
      
      "IIUC, what you are saying is that this debug code is simply not
      exercised in normal testing and hasn't been for the past decade?  And it
      still won't be exercised on anything other than realtime device testing?
      
      "...it was debugging code from 1994 that was largely turned into dead
      code when lazysbcounters were introduced in 2007. Hence I'm not sure it
      holds any value anymore."
      
      This debugging code isn't especially helpful - you can modify the
      flcount on one AG and the freeblks of another AG, and it won't trigger.
      Add the fact that nobody noticed for a decade, and let's just get rid of
      it (and start testing realtime :P).
      
      This bug was found by running generic/051 on either a V4 filesystem
      lacking lazysbcount; or a V5 filesystem with a realtime volume.
      
      Cc: bfoster@redhat.com, zlang@redhat.com
      Fixes: f8f2835a ("xfs: defer agfl block frees when dfops is available")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      dc0c7ae0
    • xfs: drop submit side trans alloc for append ioends · b906e741
      Brian Foster committed
      mainline-inclusion
      from mainline-v5.12-rc4
      commit 7cd3099f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7cd3099f4925d7c15887d1940ebd65acd66100f5
      
      -------------------------------------------------
      
      Per-inode ioend completion batching has a log reservation deadlock
      vector between preallocated append transactions and transactions
      that are acquired at completion time for other purposes (i.e.,
      unwritten extent conversion or COW fork remaps). For example, if the
      ioend completion workqueue task executes on a batch of ioends that
      are sorted such that an append ioend sits at the tail, it's possible
      for the outstanding append transaction reservation to block
      allocation of transactions required to process preceding ioends in
      the list.
      
      Append ioend completion is historically the common path for on-disk
      inode size updates. While file extending writes may have completed
      sometime earlier, the on-disk inode size is only updated after
      successful writeback completion. These transactions are preallocated
      serially from writeback context to mitigate concurrency and
      associated log reservation pressure across completions processed by
      multi-threaded workqueue tasks.
      
      However, now that delalloc blocks unconditionally map to unwritten
      extents at physical block allocation time, size updates via append
      ioends are relatively rare. This means that inode size updates most
      commonly occur as part of the preexisting completion time
      transaction to convert unwritten extents. As a result, there is no
      longer a strong need to preallocate size update transactions.
      
      Remove the preallocation of inode size update transactions to avoid
      the ioend completion processing log reservation deadlock. Instead,
      continue to send all potential size extending ioends to workqueue
      context for completion and allocate the transaction from that
      context. This ensures that no outstanding log reservation is owned
      by the ioend completion worker task when it begins to process
      ioends.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: Lihong Kou <koulihong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      b906e741
  2. 15 Nov, 2021 1 commit
  3. 03 Jun, 2021 1 commit
  4. 09 Apr, 2021 1 commit
  5. 20 Nov, 2020 2 commits
    • xfs: revert "xfs: fix rmap key and record comparison functions" · eb840907
      Darrick J. Wong committed
      This reverts commit 6ff646b2.
      
      Your maintainer committed a major braino in the rmap code by adding the
      attr fork, bmbt, and unwritten extent usage bits into rmap record key
      comparisons.  While XFS uses the usage bits *in the rmap records* for
      cross-referencing metadata in xfs_scrub and xfs_repair, it only needs
      the owner and offset information to distinguish between reverse mappings
      of the same physical extent into the data fork of a file at multiple
      offsets.  The other bits are not important for key comparisons for index
      lookups, and never have been.
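A key comparison of the kind described here, ordering by physical block, then owner, then file offset with the usage bits masked out, might look like the sketch below. The struct, the function names, and the exact bit positions are illustrative assumptions rather than the kernel's definitions.

```c
#include <stdint.h>

/* Assumed layout: usage bits live in the top bits of the 64-bit offset. */
#define SKETCH_OFF_ATTR_FORK (1ULL << 63)
#define SKETCH_OFF_BMBT      (1ULL << 62)
#define SKETCH_OFF_UNWRITTEN (1ULL << 61)
#define SKETCH_OFF_MASK      ((1ULL << 54) - 1)

/* Hypothetical in-memory form of an rmap btree key. */
struct sketch_rmap_key {
	uint32_t startblock; /* physical block */
	uint64_t owner;      /* owning inode or metadata owner */
	uint64_t offset;     /* file offset plus usage bits */
};

/* Three-way comparison that cannot overflow. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}

/* Compare keys on (physical block, owner, offset), ignoring usage bits. */
static int sketch_rmap_key_cmp(const struct sketch_rmap_key *a,
			       const struct sketch_rmap_key *b)
{
	int c = cmp_u64(a->startblock, b->startblock);

	if (c)
		return c;
	c = cmp_u64(a->owner, b->owner);
	if (c)
		return c;
	return cmp_u64(a->offset & SKETCH_OFF_MASK,
		       b->offset & SKETCH_OFF_MASK);
}
```

Two keys that differ only in a usage bit compare as equal under this scheme, which is the behavior the revert restores for index lookups.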
      
      Eric Sandeen reports that this causes regressions in generic/299, so
      undo this patch before it does more damage.
      Reported-by: Eric Sandeen <sandeen@sandeen.net>
      Fixes: 6ff646b2 ("xfs: fix rmap key and record comparison functions")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      eb840907
    • xfs: don't allow NOWAIT DIO across extent boundaries · 883a790a
      Dave Chinner committed
      Jens has reported a situation where partial direct IOs can be issued
      and completed yet still return -EAGAIN. We don't want this to report
      a short IO as we want XFS to complete user DIO entirely or not at
      all.
      
      This partial IO situation can occur on a write IO that is split
      across an allocated extent and a hole, and the second mapping is
      returning EAGAIN because allocation would be required.
      
      The trivial reproducer:
      
      $ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo
      wrote 4096/4096 bytes at offset 0
      4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec)
      pwrite: Resource temporarily unavailable
      $
      
      The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done
      the first 4kB write:
      
       xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000
       iomap_apply:          dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
       xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
       xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
       xfs_iomap_found:      dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1
       iomap_apply_dstmap:   dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
      
      Here the first iomap loop has mapped the first 4kB of the file and
      issued the IO, and we enter the second iomap_apply loop:
      
       iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor
       xfs_ilock_nowait:     dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap
       xfs_iunlock:          dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
      
      And we exit with -EAGAIN because we hit the allocate case trying
      to map the second 4kB block.
      
      Then IO completes on the first 4kB and the original IO context
      completes and unlocks the inode, returning -EAGAIN to userspace:
      
       xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096
       xfs_iunlock:          dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
      
      There are other vectors to the same problem when we re-enter the
      mapping code if we have to make multiple mappings under NOWAIT
      conditions, e.g. failing trylocks, COW extents being found,
      allocation being required, and so on.
      
      Avoid all these potential problems by only allowing IOMAP_NOWAIT IO
      to go ahead if the mapping we retrieve for the IO spans an entire
      allocated extent. This avoids the possibility of subsequent mappings
      to complete the IO from triggering NOWAIT semantics by any means as
      NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
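The all-or-nothing rule can be illustrated with a standalone check that runs once, before any IO is issued. This is a simplified model under stated assumptions: `struct sketch_extent` and both function names are hypothetical, offsets are plain byte values, and overflow of `off + len` is not handled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of the first (and only) mapping lookup. */
struct sketch_extent {
	uint64_t start;     /* first byte covered by the mapping */
	uint64_t len;       /* number of bytes covered */
	bool     allocated; /* false for a hole or delalloc */
};

/* True if one allocated mapping covers the whole request. */
static bool sketch_map_spans_range(const struct sketch_extent *m,
				   uint64_t off, uint64_t len)
{
	return m->allocated &&
	       m->start <= off &&
	       off + len <= m->start + m->len;
}

/*
 * Decide up front whether a direct write may proceed.  A NOWAIT write
 * that would need a second trip through the mapping code is refused
 * before any part of it is issued, so no partial IO can occur.
 */
static int sketch_nowait_write_check(const struct sketch_extent *m,
				     uint64_t off, uint64_t len, bool nowait)
{
	if (nowait && !sketch_map_spans_range(m, off, len))
		return -EAGAIN;
	return 0;
}
```

In the reproducer above, the first mapping covers only bytes 0 to 4095 of an 8192-byte request, so under this model the NOWAIT write is rejected with -EAGAIN before the first 4kB is written.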
      Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      883a790a
  6. 19 Nov, 2020 6 commits
  7. 12 Nov, 2020 1 commit
  8. 11 Nov, 2020 4 commits
    • xfs: fix brainos in the refcount scrubber's rmap fragment processor · 54e9b09e
      Darrick J. Wong committed
      Fix some serious WTF in the reference count scrubber's rmap fragment
      processing.  The code comment says that this loop is supposed to move
      all fragment records starting at or before bno onto the worklist, but
      there's no obvious reason why nr (the number of items added) should
      increment starting from 1, and breaking the loop when we've added the
      target number seems dubious since we could have more rmap fragments that
      should have been added to the worklist.
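The intended loop behavior, as stated by the code comment, is roughly the following. This is a sketch over plain arrays with hypothetical names; the scrubber's own list handling is different, and the only point shown is that the count starts at zero and the loop visits every fragment rather than stopping at a target number.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rmap fragment: a physical extent. */
struct sketch_frag {
	uint32_t startblock;
	uint32_t blockcount;
};

/*
 * Move every fragment that starts at or before bno from frags[] onto
 * worklist[], keeping the rest in frags[] in their original order.
 * worklist[] must have room for *nfrags entries.
 * Returns the number of fragments moved.
 */
static size_t sketch_move_frags(struct sketch_frag *frags, size_t *nfrags,
				struct sketch_frag *worklist, uint32_t bno)
{
	size_t moved = 0; /* counts from zero, not one */
	size_t kept = 0;

	for (size_t i = 0; i < *nfrags; i++) {
		if (frags[i].startblock <= bno)
			worklist[moved++] = frags[i];
		else
			frags[kept++] = frags[i];
	}
	*nfrags = kept;
	return moved;
}
```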
      
      This seems to manifest in xfs/411 when adding one to the refcount field.
      
      Fixes: dbde19da ("xfs: cross-reference the rmapbt data with the refcountbt")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      54e9b09e
    • xfs: fix rmap key and record comparison functions · 6ff646b2
      Darrick J. Wong committed
      Keys for extent interval records in the reverse mapping btree are
      supposed to be computed as follows:
      
      (physical block, owner, fork, is_btree, is_unwritten, offset)
      
      This provides users the ability to look up a reverse mapping from a bmbt
      record -- start with the physical block; then if there are multiple
      records for the same block, move on to the owner; then the inode fork
      type; and so on to the file offset.
      
      However, the key comparison functions incorrectly remove the
      fork/btree/unwritten information that's encoded in the on-disk offset.
      This means that lookup comparisons are only done with:
      
      (physical block, owner, offset)
      
      This means that queries can return incorrect results.  On consistent
      filesystems this hasn't been an issue because blocks are never shared
      between forks or with bmbt blocks; and are never unwritten.  However,
      this bug means that online repair cannot always detect corruption in the
      key information in internal rmapbt nodes.
      
      Found by fuzzing keys[1].attrfork = ones on xfs/371.
      
      Fixes: 4b8ed677 ("xfs: add rmap btree operations")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      6ff646b2
    • xfs: set the unwritten bit in rmap lookup flags in xchk_bmap_get_rmapextents · 5dda3897
      Darrick J. Wong committed
      When the bmbt scrubber is looking up rmap extents, we need to set the
      extent flags from the bmbt record fully.  This will matter once we fix
      the rmap btree comparison functions to check those flags correctly.
      
      Fixes: d852657c ("xfs: cross-reference reverse-mapping btree")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      5dda3897
    • xfs: fix flags argument to rmap lookup when converting shared file rmaps · ea843989
      Darrick J. Wong committed
      Pass the same oldext argument (which contains the existing rmapping's
      unwritten state) to xfs_rmap_lookup_le_range at the start of
      xfs_rmap_convert_shared.  At this point in the code, flags is zero,
      which means that we perform lookups using the wrong key.
      
      Fixes: 3f165b33 ("xfs: convert unwritten status of reverse mappings for shared files")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      ea843989
  9. 05 Nov, 2020 5 commits
    • xfs: only flush the unshared range in xfs_reflink_unshare · 46afb062
      Darrick J. Wong committed
      There's no reason to flush an entire file when we're unsharing part of
      a file.  Therefore, only initiate writeback on the selected range.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      46afb062
    • xfs: fix scrub flagging rtinherit even if there is no rt device · c1f6b1ac
      Darrick J. Wong committed
      The kernel has always allowed directories to have the rtinherit flag
      set, even if there is no rt device, so this check is wrong.
      
      Fixes: 80e4e126 ("xfs: scrub inodes")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      c1f6b1ac
    • xfs: fix missing CoW blocks writeback conversion retry · c2f09217
      Darrick J. Wong committed
      In commit 7588cbee, we tried to fix a race stemming from the lack of
      coordination between higher level code that wants to allocate and remap
      CoW fork extents into the data fork.  Christoph cites as examples the
      always_cow mode, and a directio write completion racing with writeback.
      
      According to the comments before the goto retry, we want to restart the
      lookup to catch the extent in the data fork, but we don't actually reset
      whichfork or cow_fsb, which means the second try executes using stale
      information.  Up until now I think we've gotten lucky that either
      there's something left in the CoW fork to cause cow_fsb to be reset, or
      the data/cow fork sequence numbers have advanced enough to force a
      fresh lookup from the data fork.  However, if we reach the retry with an
      empty stable CoW fork and a stable data fork, neither of those things
      happens.  The retry foolishly re-calls xfs_convert_blocks on the CoW
      fork which fails again.  This time, we toss the write.
      
      I've recently been working on extending reflink to the realtime device.
      When the realtime extent size is larger than a single block, we have to
      force the page cache to CoW the entire rt extent if a write (or
      fallocate) is not aligned with the rt extent size.  The strategy I've
      chosen to deal with this is derived from Dave's blocksize > pagesize
      series: dirtying around the write range, and ensuring that writeback
      always starts mapping on an rt extent boundary.  This has brought this
      race front and center, since generic/522 blows up immediately.
      
      However, I'm pretty sure this is a bug outright, independent of that.
      
      Fixes: 7588cbee ("xfs: retry COW fork delalloc conversion when no extent was found")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      c2f09217
    • iomap: support partial page discard on writeback block mapping failure · 763e4cdc
      Brian Foster committed
      iomap writeback mapping failure only calls into ->discard_page() if
      the current page has not been added to the ioend. Accordingly, the
      XFS callback assumes a full page discard and invalidation. This is
      problematic for sub-page block size filesystems where some portion
      of a page might have been mapped successfully before a failure to
      map a delalloc block occurs. ->discard_page() is not called in that
      error scenario and the bio is explicitly failed by iomap via the
      error return from ->prepare_ioend(). As a result, the filesystem
      leaks delalloc blocks and corrupts the filesystem block counters.
      
      Since XFS is the only user of ->discard_page(), tweak the semantics
      to invoke the callback unconditionally on mapping errors and provide
      the file offset that failed to map. Update xfs_discard_page() to
      discard the corresponding portion of the file and pass the range
      along to iomap_invalidatepage(). The latter already properly handles
      both full and sub-page scenarios by not changing any iomap or page
      state on sub-page invalidations.
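The range handed to the discard path can be derived from the failing file offset alone. The sketch below assumes a 4096-byte page, a block size that divides the page size, and a block-aligned failing offset; `struct sketch_discard` and `sketch_discard_from()` are hypothetical names, not the kernel's helpers.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this example */

/* Hypothetical description of what to discard after a mapping failure. */
struct sketch_discard {
	uint32_t page_off;    /* offset within the page where discard starts */
	uint32_t len;         /* bytes from page_off to the end of the page */
	uint64_t start_block; /* first filesystem block to punch out */
	uint32_t nr_blocks;   /* number of blocks to punch out */
};

/*
 * Given the file offset that failed to map, describe the tail of the
 * page that must be discarded.  Blocks before fileoff were mapped
 * successfully and are left alone.
 */
static struct sketch_discard sketch_discard_from(uint64_t fileoff,
						 uint32_t blocksize)
{
	struct sketch_discard d;

	d.page_off = (uint32_t)(fileoff % SKETCH_PAGE_SIZE);
	d.len = SKETCH_PAGE_SIZE - d.page_off;
	d.start_block = fileoff / blocksize;
	d.nr_blocks = d.len / blocksize;
	return d;
}
```

With 1024-byte blocks and a failure halfway into a page, only the last two blocks of the page are discarded. With a failure at the start of the page, the result degenerates to the old full-page discard.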
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      763e4cdc
    • xfs: flush new eof page on truncate to avoid post-eof corruption · 869ae85d
      Brian Foster committed
      It is possible to expose non-zeroed post-EOF data in XFS if the new
      EOF page is dirty, backed by an unwritten block and the truncate
      happens to race with writeback. iomap_truncate_page() will not zero
      the post-EOF portion of the page if the underlying block is
      unwritten. The subsequent call to truncate_setsize() will, but
      doesn't dirty the page. Therefore, if writeback happens to complete
      after iomap_truncate_page() (so it still sees the unwritten block)
      but before truncate_setsize(), the cached page becomes inconsistent
      with the on-disk block. A mapped read after the associated page is
      reclaimed or invalidated exposes non-zero post-EOF data.
      
      For example, consider the following sequence when run on a kernel
      modified to explicitly flush the new EOF page within the race
      window:
      
      $ xfs_io -fc "falloc 0 4k" -c fsync /mnt/file
      $ xfs_io -c "pwrite 0 4k" -c "truncate 1k" /mnt/file
        ...
      $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
      00000400:  00 00 00 00 00 00 00 00  ........
      $ umount /mnt/; mount <dev> /mnt/
      $ xfs_io -c "mmap 0 4k" -c "mread -v 1k 8" /mnt/file
      00000400:  cd cd cd cd cd cd cd cd  ........
      
      Update xfs_setattr_size() to explicitly flush the new EOF page prior
      to the page truncate to ensure iomap has the latest state of the
      underlying block.
      
      Fixes: 68a9f5e7 ("xfs: implement iomap based buffered write path")
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      869ae85d
  10. 29 Oct, 2020 1 commit
  11. 26 Oct, 2020 1 commit
  12. 22 Oct, 2020 2 commits
    • xfs: cancel intents immediately if process_intents fails · 2e76f188
      Darrick J. Wong committed
      If processing recovered log intent items fails, we need to cancel all
      the unprocessed recovered items immediately so that a subsequent AIL
      push in the bail out path won't get wedged on the pinned intent items
      that didn't get processed.
      
      This can happen if the log contains (1) an intent that gets and releases
      an inode, (2) an intent that cannot be recovered successfully, and (3)
      some third intent item.  When recovery of (2) fails, we leave (3) pinned
      in memory.  Inode reclamation is called in the error-out path of
      xfs_mountfs before xfs_log_cancel_mount.  Reclamation calls
      xfs_ail_push_all_sync, which gets stuck waiting for (3).
      
      Therefore, call xlog_recover_cancel_intents if _process_intents fails.
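The control flow being fixed can be modeled in a few lines. This is a toy sketch with hypothetical types and function names; it only shows that when processing stops at a failing item, the remaining pending items are cancelled at once instead of being left pinned.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_state { SKETCH_PENDING, SKETCH_DONE, SKETCH_CANCELLED };

/* Hypothetical recovered intent item. */
struct sketch_intent {
	bool              recoverable; /* false models a failing recovery */
	enum sketch_state state;
};

/* Process items in order and stop at the first failure. */
static int sketch_process_intents(struct sketch_intent *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (items[i].state != SKETCH_PENDING)
			continue;
		if (!items[i].recoverable)
			return -1;
		items[i].state = SKETCH_DONE;
	}
	return 0;
}

/* Release every item that was never processed. */
static void sketch_cancel_intents(struct sketch_intent *items, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (items[i].state == SKETCH_PENDING)
			items[i].state = SKETCH_CANCELLED;
	}
}

/* On failure, cancel immediately so nothing later waits on pinned items. */
static int sketch_finish_recovery(struct sketch_intent *items, size_t n)
{
	int error = sketch_process_intents(items, n);

	if (error)
		sketch_cancel_intents(items, n);
	return error;
}
```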
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      2e76f188
    • xfs: fix fallocate functions when rtextsize is larger than 1 · 25219dbf
      Darrick J. Wong committed
      In commit fe341eb1, I forgot that xfs_free_file_space isn't strictly
      a "remove mapped blocks" function.  It is actually a function to zero
      file space by punching out the middle and writing zeroes to the
      unaligned ends of the specified range.  Therefore, putting a rtextsize
      alignment check in that function is wrong because that breaks unaligned
      ZERO_RANGE on the realtime volume.
      
      Furthermore, xfs_file_fallocate already has alignment checks for the
      functions that require the file range to be aligned to the size of a
      fundamental allocation unit (which is 1 FSB on the data volume and 1 rt
      extent on the realtime volume).  Create a new helper to check fallocate
      arguments against the realtime allocation unit size, fix the fallocate
      frontend to use it, fix free_file_space to delete the correct range, and
      remove a now redundant check from insert_file_space.
      
      NOTE: The realtime extent size is not required to be a power of two!
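The note about non-power-of-two sizes matters because the usual bitmask alignment test silently gives wrong answers in that case. The sketch below contrasts the two approaches; the function names are hypothetical and this is not the helper added by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Alignment test that works for any nonzero allocation unit,
 * including sizes that are not a power of two.
 */
static bool sketch_range_aligned(uint64_t pos, uint64_t len,
				 uint64_t alloc_unit)
{
	return pos % alloc_unit == 0 && len % alloc_unit == 0;
}

/*
 * Mask-based test, shown only for contrast.  It is correct only when
 * alloc_unit is a power of two.
 */
static bool sketch_range_aligned_mask(uint64_t pos, uint64_t len,
				      uint64_t alloc_unit)
{
	uint64_t mask = alloc_unit - 1;

	return ((pos | len) & mask) == 0;
}
```

For example, with an allocation unit of 12288 bytes (three 4096-byte blocks), the mask version rejects the aligned range at position 12288 and accepts the unaligned position 16384, while the remainder version gets both right.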
      
      Fixes: fe341eb1 ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      25219dbf
  13. 17 Oct, 2020 2 commits
    • xfs: fix Kconfig asking about XFS_SUPPORT_V4 when XFS_FS=n · 89464554
      Darrick J. Wong committed
      Pavel Machek complained that the question about supporting deprecated
      XFS v4 comes up even when XFS is disabled.  This clearly makes no sense,
      so fix Kconfig.
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      89464554
    • xfs: fix high key handling in the rt allocator's query_range function · d88850bd
      Darrick J. Wong committed
      Fix some off-by-one errors in xfs_rtalloc_query_range.  The highest key
      in the realtime bitmap is always one less than the number of rt extents,
      which means that the key clamp at the start of the function is wrong.
      The 4th argument to xfs_rtfind_forw is the highest rt extent that we
      want to probe, which means that passing 1 less than the high key is
      wrong.  Finally, drop the rem variable that controls the loop because we
      can compare the iteration point (rtstart) against the high key directly.
      
      The sordid history of this function is that the original commit (fb3c3)
      incorrectly passed (high_rec->ar_startblock - 1) as the 'limit' parameter
      to xfs_rtfind_forw.  This was wrong because the "high key" is supposed
      to be the largest key for which the caller wants result rows, not the
      key for the first row that could possibly be outside the range that the
      caller wants to see.
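The inclusive high-key convention can be shown with a toy range query over an array of free/used flags. This is a sketch under assumptions: the array stands in for the realtime bitmap and `sketch_count_free()` is a hypothetical function, not the kernel's query_range implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Count free rt extents whose index lies in the inclusive range
 * [low, high].  The highest valid key is rextents - 1, so the clamp
 * uses that value, and the loop compares the iteration point against
 * the high key directly instead of tracking a remaining count.
 */
static uint64_t sketch_count_free(const bool *is_free, uint64_t rextents,
				  uint64_t low, uint64_t high)
{
	uint64_t nfree = 0;

	if (rextents == 0 || low > high || low >= rextents)
		return 0;
	if (high > rextents - 1)
		high = rextents - 1;

	for (uint64_t rtstart = low; rtstart <= high; rtstart++) {
		if (is_free[rtstart])
			nfree++;
	}
	return nfree;
}
```

Because the high key is inclusive, a query for the single last extent still returns it. Passing one less than the high key, as the original code did, would skip that final extent.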
      
      A subsequent attempt (8ad56) to strengthen the parameter checking added
      incorrect clamping of the parameters to the number of rt blocks in the
      system (despite the bitmap functions all taking units of rt extents) to
      avoid querying ranges past the end of rt bitmap file but failed to fix
      the incorrect _rtfind_forw parameter.  The original _rtfind_forw
      parameter error then survived the conversion of the startblock and
      blockcount fields to rt extents (a0e5c), and the most recent off-by-one
      fix (a3a37) thought it was patching a problem when the end of the rt
      volume is not in use, but none of these fixes actually solved the
      original problem that the author was confused about the "limit" argument
      to xfs_rtfind_forw.
      
      Sadly, all four of these patches were written by this author and even
      his own usage of this function and rt testing were inadequate to get
      this fixed quickly.
      
      Original-problem: fb3c3de2 ("xfs: add a couple of queries to iterate free extents in the rtbitmap")
      Not-fixed-by: 8ad560d2 ("xfs: strengthen rtalloc query range checks")
      Not-fixed-by: a0e5c435 ("xfs: fix xfs_rtalloc_rec units")
      Fixes: a3a374bf ("xfs: fix off-by-one error in xfs_rtalloc_query_range")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      d88850bd
  14. 13 Oct, 2020 3 commits
    • xfs: annotate grabbing the realtime bitmap/summary locks in growfs · ace74e79
      Darrick J. Wong committed
      Use XFS_ILOCK_RT{BITMAP,SUM} to annotate grabbing the rt bitmap and
      summary locks when we grow the realtime volume, just like we do most
      everywhere else.  This shuts up lockdep warnings about grabbing the
      ILOCK class of locks recursively:
      
      ============================================
      WARNING: possible recursive locking detected
      5.9.0-rc4-djw #rc4 Tainted: G           O
      --------------------------------------------
      xfs_growfs/4841 is trying to acquire lock:
      ffff888035acc230 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]
      
      but task is already holding lock:
      ffff888035acedb0 (&xfs_nondir_ilock_class){++++}-{3:3}, at: xfs_ilock+0xac/0x1a0 [xfs]
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&xfs_nondir_ilock_class);
        lock(&xfs_nondir_ilock_class);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      ace74e79
    • xfs: make xfs_growfs_rt update secondary superblocks · 7249c95a
      Darrick J. Wong committed
      When we call growfs on the data device, we update the secondary
      superblocks to reflect the updated filesystem geometry.  We need to do
      this for growfs on the realtime volume too, because a future xfs_repair
      run could try to fix the filesystem using a backup superblock.
      
      This was observed by the online superblock scrubbers while running
      xfs/233.  One can also trigger this by growing an rt volume, cycling the
      mount, and creating new rt files.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      7249c95a
    • xfs: fix realtime bitmap/summary file truncation when growing rt volume · f4c32e87
      Darrick J. Wong committed
      The realtime bitmap and summary files are regular files that are hidden
      away from the directory tree.  Since they're regular files, inode
      inactivation will try to purge what it thinks are speculative
      preallocations beyond the incore size of the file.  Unfortunately,
      xfs_growfs_rt forgets to update the incore size when it resizes the
      inodes, with the result that inactivating the rt inodes at unmount time
      will cause their contents to be truncated.
      
      Fix this by updating the incore size when we change the ondisk size as
      part of updating the superblock.  Note that we don't do this when we're
      allocating blocks to the rt inodes because we actually want those blocks
      to get purged if the growfs fails.
      
      This fixes corruption complaints from the online rtsummary checker when
      running xfs/233.  Since that test requires rmap, one can also trigger
      this by growing an rt volume, cycling the mount, and creating rt files.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      f4c32e87
  15. 07 Oct, 2020 6 commits