1. 30 4月, 2019 3 次提交
    • D
      xfs: add online scrub for superblock counters · 75efa57d
      Darrick J. Wong 提交于
      Teach online scrub how to check the filesystem summary counters.  We use
      the incore delalloc block counter along with the incore AG headers to
      compute expected values for fdblocks, icount, and ifree, and then check
      that the percpu counter is within a certain threshold of the expected
      value.  This is done to avoid having to freeze or otherwise lock the
      filesystem, which means that we're only checking that the counters are
      fairly close, not that they're exactly correct.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      75efa57d
    • C
      xfs: don't parse the mtpt mount option · 94079285
      Christoph Hellwig 提交于
      The text isn't really any more useful than the default unknown option
      handling.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      94079285
    • D
      xfs: always rejoin held resources during defer roll · 710d707d
      Darrick J. Wong 提交于
      During testing of xfs/141 on a V4 filesystem, I observed some
      inconsistent behavior with regards to resources that are held (i.e.
      remain locked) across a defer roll.  The transaction roll always gives
      the defer roll function a new transaction, even if committing the old
      transaction fails.  However, the defer roll function only rejoins the
      held resources if the transaction commit succeedied.  This means that
      callers of defer roll have to figure out whether the held resources are
      attached to the transaction being passed back.
      
      Worse yet, if the defer roll was part of a defer finish call, we have a
      third possibility: the defer finish could pass back a dirty transaction
      with dirty held resources and an error code.
      
      The only sane way to handle all of these scenarios is to require that
      the code that held the resource either cancel the transaction before
      unlocking and releasing the resources, or use functions that detach
      resources from a transaction properly (e.g.  xfs_trans_brelse) if they
      need to drop the reference before committing or cancelling the
      transaction.
      
      In order to make this so, change the defer roll code to join held
      resources to the new transaction unconditionally and fix all the bhold
      callers to release the held buffers correctly.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      710d707d
  2. 27 4月, 2019 6 次提交
  3. 23 4月, 2019 7 次提交
  4. 17 4月, 2019 8 次提交
    • D
      xfs: merge adjacent io completions of the same type · 3994fc48
      Darrick J. Wong 提交于
      It's possible for pagecache writeback to split up a large amount of work
      into smaller pieces for throttling purposes or to reduce the amount of
      time a writeback operation is pending.  Whatever the reason, XFS can end
      up with a bunch of IO completions that call for the same operation to be
      performed on a contiguous extent mapping.  Since mappings are extent
      based in XFS, we'd prefer to run fewer transactions when we can.
      
      When we're processing an ioend on the list of io completions, check to
      see if the next items on the list are both adjacent and of the same
      type.  If so, we can merge the completions to reduce transaction
      overhead.
      
      On fast storage this doesn't seem to make much of a difference in
      performance, though the number of transactions for an overnight xfstests
      run seems to drop by ~5%.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      3994fc48
    • D
      xfs: remove unused m_data_workqueue · 28408243
      Darrick J. Wong 提交于
      Now that we're no longer using m_data_workqueue, remove it.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      28408243
    • D
      xfs: implement per-inode writeback completion queues · cb357bf3
      Darrick J. Wong 提交于
      When scheduling writeback of dirty file data in the page cache, XFS uses
      IO completion workqueue items to ensure that filesystem metadata only
      updates after the write completes successfully.  This is essential for
      converting unwritten extents to real extents at the right time and
      performing COW remappings.
      
      Unfortunately, XFS queues each IO completion work item to an unbounded
      workqueue, which means that the kernel can spawn dozens of threads to
      try to handle the items quickly.  These threads need to take the ILOCK
      to update file metadata, which results in heavy ILOCK contention if a
      large number of the work items target a single file, which is
      inefficient.
      
      Worse yet, the writeback completion threads get stuck waiting for the
      ILOCK while holding transaction reservations, which can use up all
      available log reservation space.  When that happens, metadata updates to
      other parts of the filesystem grind to a halt, even if the filesystem
      could otherwise have handled it.
      
      Even worse, if one of the things grinding to a halt happens to be a
      thread in the middle of a defer-ops finish holding the same ILOCK and
      trying to obtain more log reservation having exhausted the permanent
      reservation, we now have an ABBA deadlock - writeback completion has a
      transaction reserved and wants the ILOCK, and someone else has the ILOCK
      and wants a transaction reservation.
      
      Therefore, we create a per-inode writeback io completion queue + work
      item.  When writeback finishes, it can add the ioend to the per-inode
      queue and let the single worker item process that queue.  This
      dramatically cuts down on the number of kworkers and ILOCK contention in
      the system, and seems to have eliminated an occasional deadlock I was
      seeing while running generic/476.
      
      Testing with a program that simulates a heavy random-write workload to a
      single file demonstrates that the number of kworkers drops from
      approximately 120 threads per file to 1, without dramatically changing
      write bandwidth or pagecache access latency.
      
      Note that we leave the xfs-conv workqueue's max_active alone because we
      still want to be able to run ioend processing for as many inodes as the
      system can handle.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      cb357bf3
    • D
      xfs: scrub should only cross-reference with healthy btrees · 4fb7951f
      Darrick J. Wong 提交于
      Skip cross-referencing with a btree if the health report tells us that
      it's known to be bad.  This should reduce the dmesg spew considerably.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4fb7951f
    • D
      xfs: scrub/repair should update filesystem metadata health · 4860a05d
      Darrick J. Wong 提交于
      Now that we have the ability to track sick metadata in-core, make scrub
      and repair update those health assessments after doing work.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      4860a05d
    • D
      xfs: hoist the already_fixed variable to the scrub context · 160b5a78
      Darrick J. Wong 提交于
      Now that we no longer memset the scrub context, we can move the
      already_fixed variable into the scrub context's state flags instead of
      passing around pointers to separate stack variables.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      160b5a78
    • D
      xfs: collapse scrub bool state flags into a single unsigned int · f8c2a225
      Darrick J. Wong 提交于
      Combine all the boolean state flags in struct xfs_scrub into a single
      unsigned int, because we're going to be adding more state flags soon.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      f8c2a225
    • D
      xfs: refactor scrub context initialization · 9d71e155
      Darrick J. Wong 提交于
      It's a little silly how the memset in scrub context initialization
      forces us to declare stack variables to preserve context variables
      across a retry.  Since the teardown functions already null out most of
      the ephemeral state (buffer pointers, btree cursors, etc.), just skip
      the memset and move the initialization as needed.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      9d71e155
  5. 15 4月, 2019 14 次提交
  6. 12 4月, 2019 2 次提交
    • J
      block: fix the return errno for direct IO · a89afe58
      Jason Yan 提交于
      If the last bio returned is not dio->bio, the status of the bio will
      not assigned to dio->bio if it is error. This will cause the whole IO
      status wrong.
      
          ksoftirqd/21-117   [021] ..s.  4017.966090:   8,0    C   N 4883648 [0]
                <idle>-0     [018] ..s.  4017.970888:   8,0    C  WS 4924800 + 1024 [0]
                <idle>-0     [018] ..s.  4017.970909:   8,0    D  WS 4935424 + 1024 [<idle>]
                <idle>-0     [018] ..s.  4017.970924:   8,0    D  WS 4936448 + 321 [<idle>]
          ksoftirqd/21-117   [021] ..s.  4017.995033:   8,0    C   R 4883648 + 336 [65475]
          ksoftirqd/21-117   [021] d.s.  4018.001988: myprobe1: (blkdev_bio_end_io+0x0/0x168) bi_status=7
          ksoftirqd/21-117   [021] d.s.  4018.001992: myprobe: (aio_complete_rw+0x0/0x148) x0=0xffff802f2595ad80 res=0x12a000 res2=0x0
      
      We always have to assign bio->bi_status to dio->bio.bi_status because we
      will only check dio->bio.bi_status when we return the whole IO to
      the upper layer.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Cc: stable@vger.kernel.org
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a89afe58
    • O
      NFSv4.1 fix incorrect return value in copy_file_range · 0769663b
      Olga Kornievskaia 提交于
      According to the NFSv4.2 spec if the input and output file is the
      same file, operation should fail with EINVAL. However, linux
      copy_file_range() system call has no such restrictions. Therefore,
      in such case let's return EOPNOTSUPP and allow VFS to fallback
      to doing do_splice_direct(). Also when copy_file_range is called
      on an NFSv4.0 or 4.1 mount (ie., a server that doesn't support
      COPY functionality), we also need to return EOPNOTSUPP and
      fallback to a regular copy.
      
      Fixes xfstest generic/075, generic/091, generic/112, generic/263
      for all NFSv4.x versions.
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      0769663b