1. 12 Feb 2019, 19 commits
  2. 04 Feb 2019, 3 commits
    • xfs: set buffer ops when repair probes for btree type · add46b3b
      Committed by Darrick J. Wong
      In xrep_findroot_block, we work out the btree type and correctness of a
      given block by calling different btree verifiers on root block
      candidates.  However, we leave b_ops NULL while ->verify_read
      validates the block, which means that if the verifier calls
      xfs_buf_verifier_error it'll crash on the NULL b_ops.  Fix this by
      setting b_ops before calling the verifier and unsetting it if the
      verifier fails.
      
      Furthermore, improve the documentation around xfs_buf_ensure_ops, which
      is the function that is responsible for cleaning up the b_ops state of
      buffers that go through xrep_findroot_block but don't match anything.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
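A minimal sketch of the ordering fix, using hypothetical, heavily simplified stand-ins (`struct buf`, `struct buf_ops`, `buf_verifier_error()`, `verify_a()`, `verify_b()`, `probe_ops()`) rather than the kernel's real xfs_buf types. As the message describes, the error-reporting path reads `b_ops`, so the candidate ops table is attached before the verifier runs and detached again when the probe fails.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for xfs_buf and xfs_buf_ops. */
struct buf;

struct buf_ops {
	const char	*name;
	void		(*verify_read)(struct buf *bp);
};

struct buf {
	const struct buf_ops	*b_ops;
	int			b_error;
	unsigned int		magic;	/* hypothetical on-disk magic number */
};

/* Reads bp->b_ops->name, so calling this with a NULL b_ops would crash. */
static void buf_verifier_error(struct buf *bp, int error)
{
	bp->b_error = error;
	fprintf(stderr, "%s verifier rejected block\n", bp->b_ops->name);
}

/* Hypothetical verifiers for two btree types, keyed on a magic number. */
static void verify_a(struct buf *bp)
{
	if (bp->magic != 0xAAAAu)
		buf_verifier_error(bp, -1);
}

static void verify_b(struct buf *bp)
{
	if (bp->magic != 0xBBBBu)
		buf_verifier_error(bp, -1);
}

static const struct buf_ops ops_a = { "btree-a", verify_a };
static const struct buf_ops ops_b = { "btree-b", verify_b };

/* Probe one candidate btree type: attach ops first, detach them on failure. */
static int probe_ops(struct buf *bp, const struct buf_ops *ops)
{
	bp->b_error = 0;
	bp->b_ops = ops;		/* set before the verifier can report an error */
	ops->verify_read(bp);
	if (bp->b_error) {
		bp->b_ops = NULL;	/* not this type; leave the buffer unclaimed */
		return 0;
	}
	return 1;
}
```

A buffer that matches none of the candidates ends with `b_ops == NULL`, which is the state the message says xfs_buf_ensure_ops later cleans up.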
    • xfs: end sync buffer I/O properly on shutdown error · 465fa17f
      Committed by Brian Foster
      As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
      queue submission"), the delwri submission code uses sync buffer I/O
      for sync delwri I/O. Instead of waiting on async I/O to unlock the
      buffer, it uses the underlying sync I/O completion mechanism.
      
      If delwri buffer submission fails due to a shutdown scenario, an
      error is set on the buffer and buffer completion never occurs. This
      can cause xfs_buf_delwri_submit() to deadlock waiting on a
      completion event.
      
      We could check the error state before waiting on such buffers, but
      that doesn't serialize against the case of an error set via a racing
      I/O completion. Instead, invoke I/O completion in the shutdown case
      regardless of buffer I/O type.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
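To show why a missing completion deadlocks a sync waiter, here is a hypothetical user-space model built on POSIX threads primitives. `struct completion`, `buf_ioend()`, `buf_submit()` and `buf_submit_wait()` are illustrative names, not the XFS code, and the `-EIO` error value is an assumption of the sketch. The waiter sleeps until `complete()` runs, so the shutdown branch must run I/O completion even though no I/O was issued.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a kernel completion. */
struct completion {
	pthread_mutex_t	lock;
	pthread_cond_t	cond;
	bool		done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)	/* sleeps forever if complete() never runs */
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct buf {
	int			b_error;
	struct completion	b_iowait;
};

static void buf_ioend(struct buf *bp)
{
	complete(&bp->b_iowait);
}

/* Submission: on shutdown, fail the buffer and still run I/O completion. */
static void buf_submit(struct buf *bp, bool shutdown)
{
	if (shutdown) {
		bp->b_error = -EIO;
		buf_ioend(bp);	/* complete regardless of sync/async I/O type */
		return;
	}
	/* Real I/O would be issued here; this model completes it immediately. */
	buf_ioend(bp);
}

/* Sync submission: submit, wait for completion, return the buffer error. */
static int buf_submit_wait(struct buf *bp, bool shutdown)
{
	init_completion(&bp->b_iowait);
	bp->b_error = 0;
	buf_submit(bp, shutdown);
	wait_for_completion(&bp->b_iowait);
	return bp->b_error;
}
```

Deleting the `buf_ioend(bp)` call in the shutdown branch makes `buf_submit_wait(bp, true)` hang, which mirrors the xfs_buf_delwri_submit() deadlock described above.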
    • xfs: eof trim writeback mapping as soon as it is cached · aa6ee4ab
      Committed by Brian Foster
      The cached writeback mapping is EOF trimmed to try and avoid races
      between post-eof block management and writeback that result in
      sending cached data to a stale location. The cached mapping is
      currently trimmed on the validation check, which leaves a race
      window between the time the mapping is cached and when it is trimmed
      against the current inode size.
      
      For example, if a new mapping is cached by delalloc conversion on a
      blocksize == page size fs, we could cycle various locks, perform
      memory allocations, etc. in the writeback codepath before the
      associated mapping is eventually trimmed to i_size. This leaves
      enough time for a post-eof truncate and file append before the
      cached mapping is trimmed. The former event essentially invalidates
      a range of the cached mapping and the latter bumps the inode size
      such that the trim on the next writepage event won't trim all of the
      invalid blocks. fstests generic/464 reproduces this scenario
      occasionally, causing lost writeback and a stale delalloc blocks
      warning on inode inactivation.
      
      To work around this problem, trim the cached writeback mapping as
      soon as it is cached in addition to on subsequent validation checks.
      This is a minor tweak to tighten the race window as much as possible
      until a proper invalidation mechanism is available.
      
      Fixes: 40214d12 ("xfs: trim writepage mapping to within eof")
      Cc: <stable@vger.kernel.org> # v4.14+
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
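The trim itself is simple arithmetic: convert i_size to a block count, rounding up, and clip the mapping at that block. Below is a minimal sketch with hypothetical names (`struct wb_map`, `trim_map_to_eof()`, `cache_map()`) that does not reproduce the kernel's writeback structures. `cache_map()` shows the change in miniature: the trim happens when the mapping is stored, not later.

```c
#include <stdint.h>

/* Hypothetical cached writeback mapping, in filesystem blocks. */
struct wb_map {
	uint64_t	offset_fsb;
	uint64_t	count_fsb;
};

/* Clip a mapping so it covers no block that starts at or beyond EOF. */
static void trim_map_to_eof(struct wb_map *map, uint64_t isize, uint32_t blocksize)
{
	uint64_t eof_fsb = (isize + blocksize - 1) / blocksize;	/* round up */

	if (map->offset_fsb >= eof_fsb)
		map->count_fsb = 0;
	else if (map->offset_fsb + map->count_fsb > eof_fsb)
		map->count_fsb = eof_fsb - map->offset_fsb;
}

/* Trim at the moment the mapping is cached. */
static void cache_map(struct wb_map *cached, struct wb_map new_map,
		      uint64_t isize, uint32_t blocksize)
{
	*cached = new_map;
	trim_map_to_eof(cached, isize, blocksize);
}
```

For example, with 4096-byte blocks and i_size 10000, an 8-block mapping at offset 0 is trimmed to 3 blocks when cached. If the trim were deferred until after an append grew i_size to 40960, all 8 blocks would survive, including any range a racing post-EOF truncate had already invalidated.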
  3. 30 Dec 2018, 2 commits
  4. 22 Dec 2018, 1 commit
  5. 20 Dec 2018, 6 commits
  6. 19 Dec 2018, 3 commits
    • xfs: Fix x32 ioctls when cmd numbers differ from ia32. · a9d25bde
      Committed by Nick Bowler
      Several ioctl structs change size between native 32-bit (ia32) and x32
      applications, because x32 follows the native 64-bit (amd64) integer
      alignment rules and uses 64-bit time_t.  In these instances, the ioctl
      number changes so userspace simply gets -ENOTTY.  This scenario can be
      handled by simply adding more cases.
      
      Looking at the different ioctls implemented here:
      
      - All the ones marked 'No size or alignment issue on any arch' should
        presumably be fine.
      
      - All the ones under BROKEN_X86_ALIGNMENT are different under integer
        alignment rules.  Since x32 matches amd64 here, we just need both
        sets of cases handled.
      
      - XFS_IOC_SWAPEXT has both integer alignment differences and time_t
        differences.  Since x32 matches amd64 here, we need to add a case
        which calls the native implementation.
      
      - The remaining ioctls have neither 64-bit integers nor time_t, so
        x32 matches ia32 here and no change is required at this level.  The
        bulkstat ioctl implementations have some pointer chasing which is
        handled separately.
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
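The ioctl number changes because `_IOWR` encodes `sizeof` of the argument type. The sketch below uses a hypothetical argument struct (`struct demo_arg`), hypothetical command numbers (`'X'`, 200) and a hypothetical `demo_compat_dispatch()`. It emulates the i386 layout with a packed, 4-byte-aligned copy, and assumes a 64-bit host where `int64_t` is 8-byte aligned, as on amd64 and x32. A compat handler that lists both numbers accepts either caller instead of returning -ENOTTY.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical argument struct with a 64-bit member after a 32-bit one. */
struct demo_arg {
	int32_t	flags;
	int64_t	offset;		/* 8-byte aligned on amd64 and x32: 16-byte struct */
};

/* Same fields with i386 rules: 64-bit members only 4-byte aligned. */
struct demo_arg_ia32 {
	int32_t	flags;
	int64_t	offset;
} __attribute__((packed, aligned(4)));	/* 12 bytes */

/* _IOWR encodes sizeof(type), so the two layouts give different numbers. */
#define DEMO_IOC_NATIVE	_IOWR('X', 200, struct demo_arg)
#define DEMO_IOC_IA32	_IOWR('X', 200, struct demo_arg_ia32)

/* A compat handler that accepts both numbers instead of failing with ENOTTY. */
static const char *demo_compat_dispatch(unsigned long cmd)
{
	switch (cmd) {
	case DEMO_IOC_IA32:
		return "ia32 layout";
	case DEMO_IOC_NATIVE:
		return "native layout (amd64 or x32)";
	default:
		return "ENOTTY";
	}
}
```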
    • xfs: Fix bulkstat compat ioctls on x32 userspace. · 7ca860e3
      Committed by Nick Bowler
      The bulkstat family of ioctls are problematic on x32, because there is
      a mixup of native 32-bit and 64-bit conventions.  The xfs_fsop_bulkreq
      struct contains pointers and 32-bit integers so that matches the native
      32-bit layout, and that means the ioctl implementation goes into the
      regular compat path on x32.
      
      However, the 'ubuffer' member of that struct in turn refers to either
      struct xfs_inogrp or xfs_bstat (or an array of these).  On x32, those
      structures match the native 64-bit layout.  The compat implementation
      writes out the 32-bit version of these structures.  This is not the
      expected format for x32 userspace, causing problems.
      
      Fortunately the functions which actually output these xfs_inogrp and
      xfs_bstat structures have an easy way to select which output format
      is required, so we just need a little tweak to select the right format
      on x32.
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
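The "easy way to select which output format is required" can be pictured as choosing a formatter function pointer. In this sketch the record layout is modeled on struct xfs_inogrp (fields renamed). `inogrp_fmt_native()`, `inogrp_fmt_compat()`, `pick_inogrp_fmt()` and the `x32_caller` flag are hypothetical and stand in for however the kernel detects an x32 caller. The point is that x32 arrives through the compat ioctl path yet must receive native-layout records.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modeled on struct xfs_inogrp: 24 bytes with native 64-bit alignment. */
struct inogrp {
	uint64_t	startino;
	int32_t		alloccount;
	uint64_t	allocmask;
};

/* Same fields with i386 alignment (the 32-bit compat layout): 20 bytes. */
struct inogrp_compat {
	uint64_t	startino;
	int32_t		alloccount;
	uint64_t	allocmask;
} __attribute__((packed, aligned(4)));

/* A formatter writes one record into the user buffer and returns its size. */
typedef size_t (*inogrp_fmt_t)(void *ubuf, const struct inogrp *src);

static size_t inogrp_fmt_native(void *ubuf, const struct inogrp *src)
{
	memcpy(ubuf, src, sizeof(*src));
	return sizeof(*src);
}

static size_t inogrp_fmt_compat(void *ubuf, const struct inogrp *src)
{
	struct inogrp_compat out = {
		.startino = src->startino,
		.alloccount = src->alloccount,
		.allocmask = src->allocmask,
	};

	memcpy(ubuf, &out, sizeof(out));
	return sizeof(out);
}

/*
 * x32 callers come through the compat ioctl path (the request struct
 * matches the 32-bit layout) but expect native records in ubuffer.
 */
static inogrp_fmt_t pick_inogrp_fmt(bool compat_ioctl, bool x32_caller)
{
	return (compat_ioctl && !x32_caller) ? inogrp_fmt_compat : inogrp_fmt_native;
}
```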
    • xfs: Align compat attrlist_by_handle with native implementation. · c456d644
      Committed by Nick Bowler
      While inspecting the ioctl implementations, I noticed that the compat
      implementation of XFS_IOC_ATTRLIST_BY_HANDLE does not do exactly the
      same thing as the native implementation.  Specifically, the "cursor"
      does not appear to be written out to userspace on the compat path,
      like it is on the native path.
      
      This adjusts the compat implementation to copy out the cursor just
      like the native implementation does.  The attrlist cursor does not
      require any special compat handling.  This fixes xfstests xfs/269
      on both IA-32 and x32 userspace, when running on an amd64 kernel.
      Signed-off-by: Nick Bowler <nbowler@draconx.ca>
      Fixes: 0facef7f ("xfs: in _attrlist_by_handle, copy the cursor back to userspace")
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  7. 14 Dec 2018, 1 commit
  8. 13 Dec 2018, 5 commits
    • xfs: cache minimum realtime summary level · 355e3532
      Committed by Omar Sandoval
      The realtime summary is a two-dimensional array on disk, effectively:
      
      u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]
      
      rsum[log][bbno] is the number of extents of size 2**log which start in
      bitmap block bbno.
      
      xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
      rsum[log][bbno] != 0 for any log level. However, the summary array is
      stored in row-major order (i.e., like an array in C), so these
      entries are not adjacent but are spread across the entire summary
      file. In the worst case (a full bitmap block), xfs_rtany_summary() has
      to check every level.
      
      This means that on a moderately-used realtime device, an allocation will
      waste a lot of time finding, reading, and releasing buffers for the
      realtime summary. In particular, one of our storage services (which runs
      on servers with 8 very slow CPUs and fifteen 8 TB XFS realtime filesystems)
      spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
      xfs_trans_brelse() called from xfs_rtany_summary().
      
      One solution would be to also store the summary with the dimensions
      swapped. However, this would require a disk format change to a very old
      component of XFS.
      
      Instead, we can cache the minimum size level that contains any extents. We do
      so lazily; rather than guaranteeing that the cache contains the precise
      minimum, it always contains a loose lower bound which we tighten when we
      read or update a summary block. This only uses a few kilobytes of memory
      and is already serialized via the realtime bitmap and summary inode
      locks, so the cost is minimal. With this change, the same workload only
      spends 0.2% of its CPU cycles in the realtime allocator.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
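A small in-memory sketch of the layout and the lazy lower bound described above. The sizes `NLEVELS` and `NBLOCKS`, the arrays `rsum` and `rsum_cache`, and the functions `rtany_summary()` and `modify_summary()` are hypothetical stand-ins, not the kernel's buffer-based code. Entry `rsum[log][bbno]` sits at index `log * NBLOCKS + bbno`, so the levels for one bitmap block are `NBLOCKS` entries apart. The cache lets the scan skip levels already known to be empty.

```c
#include <stdbool.h>
#include <stdint.h>

#define NLEVELS	4	/* hypothetical: log2(number of rt extents) + 1 */
#define NBLOCKS	3	/* hypothetical: number of bitmap blocks */

/* Row-major like the on-disk summary: level 0 for every bbno, then level 1... */
static uint32_t rsum[NLEVELS * NBLOCKS];

/* Per bitmap block: no level below this value holds any extent. */
static uint8_t rsum_cache[NBLOCKS];

static uint32_t *rsum_entry(int log, int bbno)
{
	return &rsum[log * NBLOCKS + bbno];
}

/* Is there a free extent at any level in [low, high] starting in block bbno? */
static bool rtany_summary(int low, int high, int bbno)
{
	int start = low > rsum_cache[bbno] ? low : rsum_cache[bbno];
	int log;

	for (log = start; log <= high; log++)
		if (*rsum_entry(log, bbno) != 0)
			break;

	/* Levels [start, log) were empty; tighten only if start is the bound. */
	if (start == rsum_cache[bbno] && log > start)
		rsum_cache[bbno] = (uint8_t)log;
	return log <= high;
}

/* Adjust a summary count and keep the cached value a valid lower bound. */
static void modify_summary(int log, int bbno, int delta)
{
	uint32_t *e = rsum_entry(log, bbno);

	*e += (uint32_t)delta;
	if (*e == 0 && log == rsum_cache[bbno])
		rsum_cache[bbno]++;			/* level at the bound emptied */
	else if (*e != 0 && log < rsum_cache[bbno])
		rsum_cache[bbno] = (uint8_t)log;	/* new extent below the bound */
}
```

Tightening only when the scan began at the cached bound keeps the value a true lower bound. A caller that asks only about higher levels learns nothing about the levels it skipped.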
    • xfs: count inode blocks correctly in inobt scrub · 2c2d9d3a
      Committed by Darrick J. Wong
      A big block filesystem might require more than one inobt record to cover
      all the inodes in the block.  In these cases it is not correct to round
      the irec count up to the nearest block because this causes us to
      overestimate the number of inode blocks we expect to find.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
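The overestimate is easy to see with numbers. A 64 KiB block holding 256-byte inodes has room for 256 inodes, so four 64-inode records share a single block. Rounding each record up to a whole block counts four blocks, while summing inodes first gives one. The helper names below are hypothetical. They illustrate the arithmetic only, not the scrub code, and they ignore sparse inode records.

```c
/* One inobt record tracks a 64-inode chunk (XFS_INODES_PER_CHUNK). */
#define INODES_PER_CHUNK	64u

/* Overestimate: round each record up to whole blocks, then sum. */
static unsigned long blocks_rounding_each_record(unsigned long nrecs,
						 unsigned int inodes_per_block)
{
	unsigned long per_rec = (INODES_PER_CHUNK + inodes_per_block - 1) /
				inodes_per_block;

	return nrecs * per_rec;
}

/* Sum the inodes first, then convert to blocks once. */
static unsigned long blocks_from_inode_total(unsigned long nrecs,
					     unsigned int inodes_per_block)
{
	unsigned long inodes = nrecs * INODES_PER_CHUNK;

	return (inodes + inodes_per_block - 1) / inodes_per_block;
}
```

On a common 4 KiB block, 512-byte inode filesystem (8 inodes per block) both methods agree, which is why the problem only shows up on big-block filesystems.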
    • xfs: precalculate cluster alignment in inodes and blocks · c1b4a321
      Committed by Darrick J. Wong
      Store the inode cluster alignment information in units of inodes and
      blocks in the mount data so that we don't have to keep recalculating
      them.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: precalculate inodes and blocks per inode cluster · 83dcdb44
      Committed by Darrick J. Wong
      Store the number of inodes and blocks per inode cluster in the mount
      data so that we don't have to keep recalculating them.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
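A rough sketch of computing such values once at mount time and caching them in the mount structure. `struct mount_sketch` and `compute_cluster_geometry()` are hypothetical. The derivation is an assumption modeled on XFS geometry, not quoted from the commit: a cluster covers at least one block, and otherwise spans the cluster size divided by the block size, with the inode count being the block count shifted by log2(inodes per block).

```c
#include <stdint.h>

/* Hypothetical mount structure; field names are illustrative only. */
struct mount_sketch {
	uint32_t	blocksize;		/* bytes per fs block */
	uint32_t	blocklog;		/* log2(blocksize) */
	uint32_t	inopblog;		/* log2(inodes per block) */
	uint32_t	inode_cluster_size;	/* bytes per inode cluster */
	/* Cached at mount time instead of recomputed on every use: */
	uint32_t	blocks_per_cluster;
	uint32_t	inodes_per_cluster;
};

static void compute_cluster_geometry(struct mount_sketch *mp)
{
	/* A cluster never covers less than one block. */
	if (mp->blocksize >= mp->inode_cluster_size)
		mp->blocks_per_cluster = 1;
	else
		mp->blocks_per_cluster = mp->inode_cluster_size >> mp->blocklog;
	mp->inodes_per_cluster = mp->blocks_per_cluster << mp->inopblog;
}
```

With 4 KiB blocks, 512-byte inodes and an 8 KiB cluster this yields 2 blocks and 16 inodes per cluster. With 64 KiB blocks the cluster collapses to a single block holding 128 inodes.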
    • xfs: add a block to inode count converter · 43004b2a
      Committed by Darrick J. Wong
      Add new helpers to convert units of fs blocks into inodes, and AG blocks
      into AG inodes, respectively.  Convert all the open-coded conversions
      and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate.  The
      OFFBNO_TO_AGINO macro is retained for xfs_repair.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
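The message does not name the new helpers, so the names below (`fsb_to_ino()`, `agb_to_agino()`, `offbno_to_agino()`) are hypothetical. The sketch assumes the usual XFS encoding in which the low log2(inodes per block) bits of an inode number select the inode within its block. Converting blocks to inodes is then a left shift. The block-plus-offset form is modeled on the retained XFS_OFFBNO_TO_AGINO macro, and with an offset of 0 it equals the plain conversion.

```c
#include <stdint.h>

/* Blocks to inodes: shift by log2(inodes per block). */
static inline uint64_t fsb_to_ino(uint64_t fsbno, unsigned int inopblog)
{
	return fsbno << inopblog;
}

/* AG-relative blocks to AG-relative inodes, same shift. */
static inline uint32_t agb_to_agino(uint32_t agbno, unsigned int inopblog)
{
	return agbno << inopblog;
}

/* General block-plus-offset form; offset 0 reduces to agb_to_agino(). */
static inline uint32_t offbno_to_agino(uint32_t agbno, uint32_t offset,
				       unsigned int inopblog)
{
	return (agbno << inopblog) | offset;
}
```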