1. 20 8月, 2021 3 次提交
    • D
      xfs: kill xfs_sb_version_has_v3inode() · cf28e17c
      Dave Chinner 提交于
      All callers to xfs_dinode_good_version() and XFS_DINODE_SIZE() in
      both the kernel and userspace have a xfs_mount structure available
      which means they can use mount features checks instead looking
      directly are the superblock.
      
      Convert these functions to take a mount and use a xfs_has_v3inodes()
      check and move it out of the libxfs/xfs_format.h file as it really
      doesn't have anything to do with the definition of the on-disk
      format.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      cf28e17c
    • D
      xfs: convert xfs_sb_version_has checks to use mount features · ebd9027d
      Dave Chinner 提交于
      This is a conversion of the remaining xfs_sb_version_has..(sbp)
      checks to use xfs_has_..(mp) feature checks.
      
      This was largely done with a vim replacement macro that did:
      
      :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>
      
      A couple of other variants were also used, and the rest touched up
      by hand.
      
      $ size -t fs/xfs/built-in.a
      	   text    data     bss     dec     hex filename
      before	1127533  311352     484 1439369  15f689 (TOTALS)
      after	1125360  311352     484 1437196  15ee0c (TOTALS)
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      ebd9027d
    • D
      xfs: replace xfs_sb_version checks with feature flag checks · 38c26bfd
      Dave Chinner 提交于
      Convert the xfs_sb_version_hasfoo() to checks against
      mp->m_features. Checks of the superblock itself during disk
      operations (e.g. in the read/write verifiers and the to/from disk
      formatters) are not converted - they operate purely on the
      superblock state. Everything else should use the mount features.
      
      Large parts of this conversion were done with sed with commands like
      this:
      
      for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
      	sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
      done
      
      With manual cleanups for things like "xfs_has_extflgbit" and other
      little inconsistencies in naming.
      
      The result is ia lot less typing to check features and an XFS binary
      size reduced by a bit over 3kB:
      
      $ size -t fs/xfs/built-in.a
      	text	   data	    bss	    dec	    hex	filenam
      before	1130866  311352     484 1442702  16038e (TOTALS)
      after	1127727  311352     484 1439563  15f74b (TOTALS)
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      38c26bfd
  2. 16 7月, 2021 1 次提交
    • D
      xfs: correct the narrative around misaligned rtinherit/extszinherit dirs · 83193e5e
      Darrick J. Wong 提交于
      While auditing the realtime growfs code, I realized that the GROWFSRT
      ioctl (and by extension xfs_growfs) has always allowed sysadmins to
      change the realtime extent size when adding a realtime section to the
      filesystem.  Since we also have always allowed sysadmins to set
      RTINHERIT and EXTSZINHERIT on directories even if there is no realtime
      device, this invalidates the premise laid out in the comments added in
      commit 603f000b.
      
      In other words, this is not a case of inadequate metadata validation.
      This is a case of nearly forgotten (and apparently untested) but
      supported functionality.  Update the comments to reflect what we've
      learned, and remove the log message about correcting the misalignment.
      
      Fixes: 603f000b ("xfs: validate extsz hints against rt extent size when rtinherit is set")
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      83193e5e
  3. 02 6月, 2021 1 次提交
  4. 25 5月, 2021 2 次提交
    • D
      xfs: validate extsz hints against rt extent size when rtinherit is set · 603f000b
      Darrick J. Wong 提交于
      The RTINHERIT bit can be set on a directory so that newly created
      regular files will have the REALTIME bit set to store their data on the
      realtime volume.  If an extent size hint (and EXTSZINHERIT) are set on
      the directory, the hint will also be copied into the new file.
      
      As pointed out in previous patches, for realtime files we require the
      extent size hint be an integer multiple of the realtime extent, but we
      don't perform the same validation on a directory with both RTINHERIT and
      EXTSZINHERIT set, even though the only use-case of that combination is
      to propagate extent size hints into new realtime files.  This leads to
      inode corruption errors when the bad values are propagated.
      
      Because there may be existing filesystems with such a configuration, we
      cannot simply amend the inode verifier to trip on these directories and
      call it a day because that will cause previously "working" filesystems
      to start throwing errors abruptly.  Note that it's valid to have
      directories with rtinherit set even if there is no realtime volume, in
      which case the problem does not manifest because rtinherit is ignored if
      there's no realtime device; and it's possible that someone set the flag,
      crashed, repaired the filesystem (which clears the hint on the realtime
      file) and continued.
      
      Therefore, mitigate this issue in several ways: First, if we try to
      write out an inode with both rtinherit/extszinherit set and an unaligned
      extent size hint, turn off the hint to correct the error.  Second, if
      someone tries to misconfigure a directory via the fssetxattr ioctl, fail
      the ioctl.  Third, reverify both extent size hint values when we
      propagate heritable inode attributes from parent to child, to prevent
      misconfigurations from spreading.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      603f000b
    • D
      xfs: standardize extent size hint validation · 6b69e485
      Darrick J. Wong 提交于
      While chasing a bug involving invalid extent size hints being propagated
      into newly created realtime files, I noticed that the xfs_ioctl_setattr
      checks for the extent size hints weren't the same as the ones now
      encoded in libxfs and used for validation in repair and mkfs.
      
      Because the checks in libxfs are more stringent than the ones in the
      ioctl, it's possible for a live system to set inode flags that
      immediately result in corruption warnings.  Specifically, it's possible
      to set an extent size hint on an rtinherit directory without checking if
      the hint is aligned to the realtime extent size, which makes no sense
      since that combination is used only to seed new realtime files.
      
      Replace the open-coded and inadequate checks with the libxfs verifier
      versions and update the code comments a bit.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      6b69e485
  5. 08 4月, 2021 13 次提交
  6. 10 12月, 2020 1 次提交
  7. 16 9月, 2020 3 次提交
  8. 07 7月, 2020 3 次提交
    • D
      xfs: remove xfs_inobp_check() · e2705b03
      Dave Chinner 提交于
      This debug code is called on every xfs_iflush() call, which then
      checks every inode in the buffer for non-zero unlinked list field.
      Hence it checks every inode in the cluster buffer every time a
      single inode on that cluster it flushed. This is resulting in:
      
      -   38.91%     5.33%  [kernel]  [k] xfs_iflush
         - 17.70% xfs_iflush
            - 9.93% xfs_inobp_check
                 4.36% xfs_buf_offset
      
      10% of the CPU time spent flushing inodes is repeatedly checking
      unlinked fields in the buffer. We don't need to do this.
      
      The other place we call xfs_inobp_check() is
      xfs_iunlink_update_dinode(), and this is after we've done this
      assert for the agino we are about to write into that inode:
      
      	ASSERT(xfs_verify_agino_or_null(mp, agno, next_agino));
      
      which means we've already checked that the agino we are about to
      write is not 0 on debug kernels. The inode buffer verifiers do
      everything else we need, so let's just remove this debug code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e2705b03
    • D
      xfs: pin inode backing buffer to the inode log item · 298f7bec
      Dave Chinner 提交于
      When we dirty an inode, we are going to have to write it disk at
      some point in the near future. This requires the inode cluster
      backing buffer to be present in memory. Unfortunately, under severe
      memory pressure we can reclaim the inode backing buffer while the
      inode is dirty in memory, resulting in stalling the AIL pushing
      because it has to do a read-modify-write cycle on the cluster
      buffer.
      
      When we have no memory available, the read of the cluster buffer
      blocks the AIL pushing process, and this causes all sorts of issues
      for memory reclaim as it requires inode writeback to make forwards
      progress. Allocating a cluster buffer causes more memory pressure,
      and results in more cluster buffers to be reclaimed, resulting in
      more RMW cycles to be done in the AIL context and everything then
      backs up on AIL progress. Only the synchronous inode cluster
      writeback in the the inode reclaim code provides some level of
      forwards progress guarantees that prevent OOM-killer rampages in
      this situation.
      
      Fix this by pinning the inode backing buffer to the inode log item
      when the inode is first dirtied (i.e. in xfs_trans_log_inode()).
      This may mean the first modification of an inode that has been held
      in cache for a long time may block on a cluster buffer read, but
      we can do that in transaction context and block safely until the
      buffer has been allocated and read.
      
      Once we have the cluster buffer, the inode log item takes a
      reference to it, pinning it in memory, and attaches it to the log
      item for future reference. This means we can always grab the cluster
      buffer from the inode log item when we need it.
      
      When the inode is finally cleaned and removed from the AIL, we can
      drop the reference the inode log item holds on the cluster buffer.
      Once all inodes on the cluster buffer are clean, the cluster buffer
      will be unpinned and it will be available for memory reclaim to
      reclaim again.
      
      This avoids the issues with needing to do RMW cycles in the AIL
      pushing context, and hence allows complete non-blocking inode
      flushing to be performed by the AIL pushing context.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      298f7bec
    • K
      xfs: Couple of typo fixes in comments · 06734e3c
      Keyur Patel 提交于
      ./xfs/libxfs/xfs_inode_buf.c:56: unnecssary ==> unnecessary
      ./xfs/libxfs/xfs_inode_buf.c:59: behavour ==> behaviour
      ./xfs/libxfs/xfs_inode_buf.c:206: unitialized ==> uninitialized
      Signed-off-by: NKeyur Patel <iamkeyur96@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      06734e3c
  9. 20 5月, 2020 10 次提交
  10. 07 5月, 2020 2 次提交
  11. 19 3月, 2020 1 次提交