1. 24 Oct, 2021 1 commit
    • A
      iomap: Add done_before argument to iomap_dio_rw · 4fdccaa0
      Andreas Gruenbacher committed
      Add a done_before argument to iomap_dio_rw that indicates how much of
      the request has already been transferred.  When the request succeeds, we
      report that done_before additional bytes were transferred.  This is
      useful for finishing a request asynchronously when part of the request
      has already been completed synchronously.
      
      We'll use that to allow iomap_dio_rw to be used with page faults
      disabled: when a page fault occurs while submitting a request, we
      synchronously complete the part of the request that has already been
      submitted.  The caller can then take care of the page fault and call
      iomap_dio_rw again for the rest of the request, passing in the number of
      bytes already transferred.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      4fdccaa0
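The retry pattern described in this commit message can be sketched in user space. This is only an illustration under stated assumptions: `transfer_sim`, `transfer_all` and the `fault_after` parameter are hypothetical names that stand in for iomap_dio_rw, its caller, and a page fault that interrupts submission. None of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a transfer routine that accepts done_before.
 * It copies at most fault_after bytes, as if a page fault stopped the
 * submission early, and reports the bytes copied by this call plus the
 * bytes the caller says were already transferred.
 */
static size_t transfer_sim(char *dst, const char *src, size_t len,
                           size_t fault_after, size_t done_before)
{
    size_t n = len < fault_after ? len : fault_after;

    memcpy(dst, src, n);
    return n + done_before;
}

/*
 * Hypothetical caller: after each short transfer it would handle the
 * fault, then call again for the rest, passing in the running total.
 */
static size_t transfer_all(char *dst, const char *src, size_t len,
                           size_t fault_after)
{
    size_t done = 0;

    assert(fault_after > 0); /* otherwise no progress is possible */
    while (done < len)
        done = transfer_sim(dst + done, src + done, len - done,
                            fault_after, done);
    return done;
}
```

Because each call returns a cumulative count, the value from the final call is already the total for the whole request, so the caller does not need a separate accumulator.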
  2. 20 Aug, 2021 3 commits
  3. 13 Jul, 2021 1 commit
  4. 22 Jun, 2021 2 commits
  5. 09 Jun, 2021 2 commits
  6. 03 Jun, 2021 1 commit
    • D
      xfs: don't take a spinlock unconditionally in the DIO fastpath · 977ec4dd
      Dave Chinner committed
      Because this happens at high thread counts on high IOPS devices
      doing mixed read/write AIO-DIO to a single file at about a million
      iops:
      
         64.09%     0.21%  [kernel]            [k] io_submit_one
         - 63.87% io_submit_one
            - 44.33% aio_write
               - 42.70% xfs_file_write_iter
                  - 41.32% xfs_file_dio_write_aligned
                     - 25.51% xfs_file_write_checks
                        - 21.60% _raw_spin_lock
                           - 21.59% do_raw_spin_lock
                              - 19.70% __pv_queued_spin_lock_slowpath
      
      This also happens on the IO completion path:
      
         22.89%     0.69%  [kernel]            [k] xfs_dio_write_end_io
         - 22.49% xfs_dio_write_end_io
            - 21.79% _raw_spin_lock
               - 20.97% do_raw_spin_lock
                  - 20.10% __pv_queued_spin_lock_slowpath
      
      IOWs, fio is burning ~14 whole CPUs on this spin lock.
      
      So, do an unlocked check against inode size first, then if we are
      at/beyond EOF, take the spinlock and recheck. This makes the
      spinlock disappear from the overwrite fastpath.
      
      I'd like to report that fixing this makes things go faster. It
      doesn't - it just exposes the XFS_ILOCK as the next severe
      contention point doing extent mapping lookups, and that now burns
      all the 14 CPUs this spinlock was burning.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      977ec4dd
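The "unlocked check first, then lock and recheck" idea from this commit message can be sketched with C11 atomics. This is a minimal user-space illustration, not the XFS code: `struct file_sim`, `update_size` and the `lock_taken` counter are hypothetical, and an `atomic_flag` spinlock stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: a size guarded by a spinlock. */
struct file_sim {
    atomic_flag lock;   /* simple test-and-set spinlock */
    _Atomic long size;  /* current end of file */
    int lock_taken;     /* counts slow-path entries, for illustration */
};

static void file_sim_init(struct file_sim *f, long size)
{
    atomic_flag_clear(&f->lock);
    atomic_init(&f->size, size);
    f->lock_taken = 0;
}

/* Returns true if the write ending at 'end' extended the file size. */
static bool update_size(struct file_sim *f, long end)
{
    bool extended = false;

    /* Fast path: an overwrite inside EOF never touches the lock. */
    if (end <= atomic_load(&f->size))
        return false;

    /* Slow path: take the lock, then recheck, since size may have moved. */
    while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire))
        ;
    f->lock_taken++;
    if (end > atomic_load(&f->size)) {
        atomic_store(&f->size, end);
        extended = true;
    }
    atomic_flag_clear_explicit(&f->lock, memory_order_release);
    return extended;
}
```

The recheck under the lock is what keeps the fast path safe: a stale unlocked read can only send a caller into the slow path unnecessarily, it cannot cause a size update to be applied twice or skipped.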
  7. 27 May, 2021 1 commit
    • G
      xfs: Fix fall-through warnings for Clang · 53004ee7
      Gustavo A. R. Silva committed
      In preparation to enable -Wimplicit-fallthrough for Clang, fix
      the following warnings by replacing /* fall through */ comments,
      and its variants, with the new pseudo-keyword macro fallthrough:
      
      fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      
      Notice that Clang doesn't recognize /* fall through */ comments as
      implicit fall-through markings, so in order to globally enable
      -Wimplicit-fallthrough for Clang, these comments need to be
      replaced with fallthrough; in the whole codebase.
      
      Link: https://github.com/KSPP/linux/issues/115
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      53004ee7
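For readers unfamiliar with the pseudo-keyword, the following sketch shows how a `fallthrough` macro can be defined and used outside the kernel. The macro definition here is modelled on the kernel's approach but is written for this example, and `count_steps` is a hypothetical function.

```c
#include <assert.h>

/*
 * Use the compiler's statement attribute when it is available, so that
 * -Wimplicit-fallthrough sees an explicit annotation. Otherwise fall
 * back to a no-op statement.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: each stage also performs all later stages. */
static int count_steps(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        fallthrough;
    case 1:
        steps++;
        fallthrough;
    case 2:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

Unlike a comment, the macro expands to a real statement, which is why a compiler that ignores comments can still recognise it.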
  8. 08 Apr, 2021 4 commits
  9. 04 Feb, 2021 4 commits
  10. 02 Feb, 2021 8 commits
  11. 24 Jan, 2021 2 commits
  12. 23 Jan, 2021 3 commits
  13. 20 Jan, 2021 1 commit
  14. 22 Oct, 2020 1 commit
    • D
      xfs: fix fallocate functions when rtextsize is larger than 1 · 25219dbf
      Darrick J. Wong committed
      In commit fe341eb1, I forgot that xfs_free_file_space isn't strictly
      a "remove mapped blocks" function.  It is actually a function to zero
      file space by punching out the middle and writing zeroes to the
      unaligned ends of the specified range.  Therefore, putting a rtextsize
      alignment check in that function is wrong because that breaks unaligned
      ZERO_RANGE on the realtime volume.
      
      Furthermore, xfs_file_fallocate already has alignment checks for the
      functions that require the file range to be aligned to the size of a
      fundamental allocation unit (which is 1 FSB on the data volume and 1 rt
      extent on the realtime volume).  Create a new helper to check fallocate
      arguments against the realtime allocation unit size, fix the fallocate
      frontend to use it, fix free_file_space to delete the correct range, and
      remove a now redundant check from insert_file_space.
      
      NOTE: The realtime extent size is not required to be a power of two!
      
      Fixes: fe341eb1 ("xfs: ensure that fpunch, fcollapse, and finsert operations are aligned to rt extent size")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      25219dbf
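The note that the realtime extent size need not be a power of two matters for how alignment is tested. The sketch below is an assumption-laden illustration, not the helper added by the commit: `is_range_aligned` and `mask_aligned_pow2_only` are hypothetical names, and sizes are plain byte counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a range is aligned when both its start and its
 * length are multiples of the allocation unit. A remainder test works
 * for any unit size, including non-power-of-two ones.
 */
static bool is_range_aligned(uint64_t pos, uint64_t len, uint64_t unit)
{
    assert(unit > 0);
    return pos % unit == 0 && len % unit == 0;
}

/*
 * The common bitmask shortcut, shown for contrast. It is only valid
 * when unit is a power of two and gives wrong answers otherwise.
 */
static bool mask_aligned_pow2_only(uint64_t pos, uint64_t unit)
{
    return (pos & (unit - 1)) == 0;
}
```

With a unit of 12288 bytes (three 4096-byte blocks), the mask test rejects offset 12288, which is aligned, and accepts offset 16384, which is not. The remainder test gets both right.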
  15. 16 Sep, 2020 1 commit
  16. 06 Sep, 2020 1 commit
    • M
      xfs: don't update mtime on COW faults · b17164e2
      Mikulas Patocka committed
      When running in dax mode, if the user maps a page with MAP_PRIVATE and
      PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
      when the user hits a COW fault.
      
      This breaks building of the Linux kernel.  How to reproduce:
      
       1. extract the Linux kernel tree on dax-mounted xfs filesystem
       2. run make clean
       3. run make -j12
       4. run make -j12
      
      at step 4, make would incorrectly rebuild the whole kernel (although it
      was already built in step 3).
      
      The reason for the breakage is that almost all object files depend on
      objtool.  When we run objtool, it takes COW page faults on its .data
      section, and these faults will incorrectly update the timestamp of the
      objtool binary.  The updated timestamp causes make to rebuild the whole
      tree.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b17164e2
  17. 06 Aug, 2020 2 commits
  18. 07 Jul, 2020 2 commits
    • D
      xfs: add an inode item lock · 1319ebef
      Dave Chinner committed
      The inode log item is kind of special in that it can be aggregating
      new changes in memory at the same time existing changes are
      being written back to disk. This means there are fields in the log
      item that are accessed concurrently from contexts that don't share
      any locking at all.
      
      e.g. updating ili_last_fields occurs under the ILOCK_EXCL and flush
      lock at flush time, and under the flush lock at IO completion time,
      while the field is read under the ILOCK_EXCL when the inode is
      logged.  Hence there is no actual serialisation between reading the
      field during logging of the inode in transactions vs clearing the
      field in IO completion.
      
      We currently get away with this by the fact that we are only
      clearing fields in IO completion, and nothing bad happens if we
      accidentally log more of the inode than we actually modify. Worst
      case is we consume a tiny bit more memory and log bandwidth.
      
      However, if we want to do more complex state manipulations on the
      log item that requires updates at all three of these potential
      locations, we need to have some mechanism of serialising those
      operations. To do this, introduce a spinlock into the log item to
      serialise internal state.
      
      This could be done via the xfs_inode i_flags_lock, but this then
      leads to potential lock inversion issues where inode flag updates
      need to occur inside locks that best nest inside the inode log item
      locks (e.g. marking inodes stale during inode cluster freeing).
      Using a separate spinlock avoids these sorts of problems and
      simplifies future code.
      
      This does not touch the use of ili_fields in the item formatting
      code - that is entirely protected by the ILOCK_EXCL at this point in
      time, so it remains untouched.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      1319ebef
    • D
      xfs: use MMAPLOCK around filemap_map_pages() · cd647d56
      Dave Chinner committed
      The page faultaround path ->map_pages is implemented in XFS via
      filemap_map_pages(). This function checks that pages found in page
      cache lookups have not raced with truncate based invalidation by
      checking page->mapping is correct and page->index is within EOF.
      
      However, we've known for a long time that this is not sufficient to
      protect against races with invalidations done by operations that do
      not change EOF. e.g. hole punching and other fallocate() based
      direct extent manipulations. The way we protect against these
      races is we wrap the page fault operations in an XFS_MMAPLOCK_SHARED
      lock so they serialise against fallocate and truncate before calling
      into the filemap function that processes the fault.
      
      Do the same for XFS's ->map_pages implementation to close this
      potential data corruption issue.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      cd647d56