1. 18 Jun, 2021 4 commits
  2. 10 Jun, 2021 2 commits
  3. 09 Jun, 2021 13 commits
    •
      Merge tag 'rename-eofblocks-5.14_2021-06-08' of... · 68b2c8bc
      Darrick J. Wong committed
      Merge tag 'rename-eofblocks-5.14_2021-06-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2
      
      xfs: rename struct xfs_eofblocks
      
      In the old days, struct xfs_eofblocks was an optional parameter to the
      speculative post-EOF allocation garbage collector to narrow the scope of
      a scan to files fitting specific criteria.  Nowadays it is used for all
      other kinds of inode cache walks (reclaim, quotaoff, inactivation), so
      the name is no longer fitting.  Change the flag namespace and rename the
      structure to something more appropriate for what it does.
      
      v2: separate the inode cache walk flag namespace from eofblocks
      
      * tag 'rename-eofblocks-5.14_2021-06-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: rename struct xfs_eofblocks to xfs_icwalk
        xfs: change the prefix of XFS_EOF_FLAGS_* to XFS_ICWALK_FLAG_
    •
      Merge tag 'fix-inode-health-reports-5.14_2021-06-08' of... · 295abff2
      Darrick J. Wong committed
      Merge tag 'fix-inode-health-reports-5.14_2021-06-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2
      
      xfs: preserve inode health reports for longer
      
      This is a quick series to make sure that inode sickness reports stick
      around in memory for some amount of time.
      
      v2: rebase to 5.13-rc4
      v3: require explicit request to reclaim sick inodes, drop weird icache
          miss interaction with DONTCACHE
      
      * tag 'fix-inode-health-reports-5.14_2021-06-08' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: selectively keep sick inodes in memory
        xfs: drop IDONTCACHE on inodes when we mark them sick
        xfs: only reset incore inode health state flags when reclaiming an inode
    •
      xfs: rename struct xfs_eofblocks to xfs_icwalk · b26b2bf1
      Darrick J. Wong committed
      The xfs_eofblocks structure is no longer well-named -- nowadays it
      provides optional filtering criteria to any walk of the incore inode
      cache.  Only one of the cache walk goals has anything to do with
      clearing of speculative post-EOF preallocations, so change the name to
      be more appropriate.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    •
      xfs: selectively keep sick inodes in memory · 9492750a
      Darrick J. Wong committed
      It's important that the filesystem retain its memory of sick inodes for
      a little while after problems are found so that reports can be collected
      about what was wrong.  Don't let inode reclamation free sick inodes
      unless we're unmounting or the fs already went down.
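
      The reclaim rule described above can be sketched as a small predicate.
      This is a userspace illustration under stated assumptions, not the
      kernel code: struct demo_mount, struct demo_inode and
      demo_can_reclaim() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the mount state that reclaim consults. */
struct demo_mount {
	bool unmounting;	/* filesystem is being unmounted */
	bool shut_down;		/* filesystem already went down */
};

/* Hypothetical stand-in for an incore inode. */
struct demo_inode {
	unsigned int sick;	/* nonzero: health problems were recorded */
};

/*
 * Decide whether reclaim may free this inode.  Healthy inodes are always
 * eligible.  Sick inodes are kept so that their health reports can still
 * be collected, unless the filesystem is unmounting or has shut down.
 */
static bool demo_can_reclaim(const struct demo_mount *mp,
			     const struct demo_inode *ip)
{
	if (ip->sick == 0)
		return true;
	return mp->unmounting || mp->shut_down;
}
```

      With a live mount, a sick inode is skipped and a healthy one is not;
      once the mount is unmounting or shut down, both become reclaimable.
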
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    •
      xfs: change the prefix of XFS_EOF_FLAGS_* to XFS_ICWALK_FLAG_ · 2d53f66b
      Darrick J. Wong committed
      In preparation for renaming struct xfs_eofblocks to struct xfs_icwalk,
      change the prefix of the existing XFS_EOF_FLAGS_* flags to
      XFS_ICWALK_FLAG_ and convert all the existing users.  This adds a degree
      of interface separation between the ioctl definitions and the incore
      parameters.  Since FLAGS_UNION is only used in xfs_icache.c, move it
      there as a private flag.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    •
      xfs: drop IDONTCACHE on inodes when we mark them sick · 7975e465
      Darrick J. Wong committed
      When we decide to mark an inode sick, clear the DONTCACHE flag so that
      the incore inode will be kept around until memory pressure forces it out
      of memory.  This increases the chances that the sick status will be
      caught by someone compiling a health report later on.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    •
      xfs: only reset incore inode health state flags when reclaiming an inode · 255794c7
      Darrick J. Wong committed
      While running some fuzz tests on inode metadata, I noticed that the
      filesystem health report (as provided by xfs_spaceman) failed to report
      the file corruption even when spaceman was run immediately after running
      xfs_scrub to detect the corruption.  That isn't the intended behavior;
      one ought to be able to run scrub to detect errors in the ondisk
      metadata and be able to access those reports for some time after the
      scrub.
      
      After running the same sequence through an instrumented kernel, I
      discovered the reason why -- scrub igets the file, scans it, marks it
      sick, and ireleases the inode.  When the VFS lets go of the incore
      inode, it moves to RECLAIMABLE state.  If spaceman igets the incore
      inode before it moves to RECLAIM state, iget reinitializes the VFS
      state, clears the sick and checked masks, and hands back the inode.  At
      this point, the caller has the exact same incore inode, but with all the
      health state erased.
      
      In other words, we're erasing the incore inode's health state flags when
      we've decided NOT to sever the link between the incore inode and the
      ondisk inode.  This is wrong, so we need to remove the lines that zero
      the fields from xfs_iget_cache_hit.
      
      As a precaution, we add the same lines into xfs_reclaim_inode just after
      we sever the link between incore and ondisk inode.  Strictly speaking
      this isn't necessary because once an inode has gone through reclaim it
      must go through xfs_inode_alloc (which also zeroes the state) and
      xfs_iget is careful to check for mismatches between the inode it pulls
      out of the radix tree and the one it wants.
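
      The behavior change can be illustrated with a toy model of the incore
      inode life cycle. This is a sketch under stated assumptions, not the
      kernel implementation: every demo_* name and field below is
      hypothetical, and the real cache-hit and reclaim paths do far more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified incore inode. */
struct demo_inode {
	unsigned long ino;	/* ondisk inode this object is linked to */
	bool reclaimable;	/* VFS has let go of the inode */
	unsigned int sick;	/* health problems found */
	unsigned int checked;	/* metadata that has been checked */
};

/* Scrub found a problem: record it in both masks. */
static void demo_mark_sick(struct demo_inode *ip, unsigned int mask)
{
	ip->sick |= mask;
	ip->checked |= mask;
}

/* The VFS released the inode; it now waits for reclaim. */
static void demo_release(struct demo_inode *ip)
{
	ip->reclaimable = true;
}

/*
 * Cache hit: the same incore inode is handed back.  The link to the
 * ondisk inode is intact, so only the VFS-side state is reinitialized
 * and the health masks are left alone.
 */
static struct demo_inode *demo_iget_cache_hit(struct demo_inode *ip,
					      unsigned long ino)
{
	if (ip->ino != ino)
		return NULL;	/* mismatch: not the inode we wanted */
	ip->reclaimable = false;
	return ip;
}

/* Reclaim: the link is severed, so the health state is reset here. */
static void demo_reclaim(struct demo_inode *ip)
{
	ip->ino = 0;
	ip->sick = 0;
	ip->checked = 0;
	ip->reclaimable = false;
}
```

      In this model, marking an inode sick, releasing it, and then hitting
      it in the cache returns the same object with its sick mask intact;
      only demo_reclaim() clears the masks.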
      
      Fixes: 6772c1f1 ("xfs: track metadata health status")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    •
      Merge tag 'inode-walk-cleanups-5.14_2021-06-03' of... · ffc18582
      Darrick J. Wong committed
      Merge tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2
      
      xfs: clean up incore inode walk functions
      
      This ambitious series aims to clean up redundant inode walk code in
      xfs_icache.c, hide implementation details of the quotaoff dquot release
      code, and eliminate indirect function calls from incore inode walks.
      
      The first thing it does is to move all the code that quotaoff calls to
      release dquots from all incore inodes into xfs_icache.c.  Next, it
      separates the goal of an inode walk from the actual radix tree tags that
      may or may not be involved and drops the kludgy XFS_ICI_NO_TAG thing.
      Finally, we split the speculative preallocation (blockgc) and quotaoff
      dquot release code paths into separate functions so that we can keep the
      implementations cohesive.
      
      Christoph suggested last cycle that we 'simply' change quotaoff not to
      allow deactivating quota entirely, but as these cleanups are to enable
      one major change in behavior (deferred inode inactivation) I do not want
      to add a second behavior change (quotaoff) as a dependency.
      
      To be blunt: Additional cleanups are not in scope for this series.
      
      Next, I made two observations about incore inode radix tree walks --
      since there's a 1:1 mapping between the walk goal and the per-inode
      processing function passed in, we can use the goal to make a direct call
      to the processing function.  Furthermore, the only caller to supply a
      nonzero iter_flags argument is quotaoff, and there's only one INEW flag.
      
      From that observation, I concluded that it's quite possible to remove
      two parameters from the xfs_inode_walk* function signatures -- the
      iter_flags, and the execute function pointer.  The middle of the series
      moves the INEW functionality into the one piece (quotaoff) that wants
      it, and removes the indirect calls.
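
      The idea of replacing the execute function pointer with a direct call
      selected by the walk goal can be sketched as follows. This is an
      illustration only: the enum, struct, and demo_* functions are
      hypothetical and do not mirror the real xfs_icache.c signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical walk goals; each maps 1:1 to a processing function. */
enum demo_walk_goal {
	DEMO_GOAL_BLOCKGC,
	DEMO_GOAL_RECLAIM,
};

/* Hypothetical inode that just counts how often it was processed. */
struct demo_inode {
	int blockgc_runs;
	int reclaim_runs;
};

static int demo_blockgc(struct demo_inode *ip)
{
	ip->blockgc_runs++;
	return 0;
}

static int demo_reclaim(struct demo_inode *ip)
{
	ip->reclaim_runs++;
	return 0;
}

/* The goal picks the processing function with a direct call. */
static int demo_process_inode(enum demo_walk_goal goal, struct demo_inode *ip)
{
	switch (goal) {
	case DEMO_GOAL_BLOCKGC:
		return demo_blockgc(ip);
	case DEMO_GOAL_RECLAIM:
		return demo_reclaim(ip);
	}
	return -1;	/* unknown goal */
}

/* The walk takes only the goal: no callback and no iter_flags. */
static int demo_walk(enum demo_walk_goal goal, struct demo_inode *inodes,
		     size_t count)
{
	for (size_t i = 0; i < count; i++) {
		int error = demo_process_inode(goal, &inodes[i]);

		if (error)
			return error;
	}
	return 0;
}
```

      Because the callee is named in the switch, the compiler sees a direct
      call rather than an indirect one through a pointer.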
      
      The final observation is that the inode reclaim walk loop is now almost
      the same as xfs_inode_walk, so it's silly to maintain two copies.  Merge
      the reclaim loop code into xfs_inode_walk.
      
      Lastly, refactor the per-ag radix tagging functions since there's
      duplicated code that can be consolidated.
      
      This series is a prerequisite for the next two patchsets, since deferred
      inode inactivation will add another inode radix tree tag and iterator
      function to xfs_inode_walk.
      
      v2: walk the vfs inode list when running quotaoff instead of the radix
          tree, then rework the (now completely internal) inode walk function
          to take the tag as the main parameter.
      v3: merge the reclaim loop into xfs_inode_walk, then consolidate the
          radix tree tagging functions
      v4: rebase to 5.13-rc4
      v5: combine with the quotaoff patchset, reorder functions to minimize
          forward declarations, split inode walk goals from radix tree tags
          to reduce conceptual confusion
      v6: start moving the inode cache code towards the xfs_icwalk prefix
      
      * tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: refactor per-AG inode tagging functions
        xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag
        xfs: pass struct xfs_eofblocks to the inode scan callback
        xfs: fix radix tree tag signs
        xfs: make the icwalk processing functions clean up the grab state
        xfs: clean up inode state flag tests in xfs_blockgc_igrab
        xfs: remove indirect calls from xfs_inode_walk{,_ag}
        xfs: remove iter_flags parameter from xfs_inode_walk_*
        xfs: move xfs_inew_wait call into xfs_dqrele_inode
        xfs: separate the dqrele_all inode grab logic from xfs_inode_walk_ag_grab
        xfs: pass the goal of the incore inode walk to xfs_inode_walk()
        xfs: rename xfs_inode_walk functions to xfs_icwalk
        xfs: move the inode walk functions further down
        xfs: detach inode dquots at the end of inactivation
        xfs: move the quotaoff dqrele inode walk into xfs_icache.c
      
      [djwong: added variable names to function declarations while fixing
      merge conflicts]
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    •
      Merge tag 'assorted-fixes-5.14-1_2021-06-03' of... · 8b943d21
      Darrick J. Wong committed
      Merge tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2
      
      xfs: assorted fixes for 5.14, part 1
      
      This branch contains the first round of various small fixes for 5.14.
      
      * tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: don't take a spinlock unconditionally in the DIO fastpath
        xfs: mark xfs_bmap_set_attrforkoff static
        xfs: Remove redundant assignment to busy
        xfs: sort variable alphabetically to avoid repeated declaration
    •
      Merge tag 'unit-conversion-cleanups-5.14_2021-06-03' of... · f52edf6c
      Darrick J. Wong committed
      Merge tag 'unit-conversion-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2
      
      xfs: various unit conversions
      
      Crafting the realtime file extent size hint fixes revealed various
      opportunities to clean up unit conversions, so now that gets its own
      series.
      
      * tag 'unit-conversion-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux:
        xfs: remove unnecessary shifts
        xfs: clean up open-coded fs block unit conversions
    •
      xfs: drop the AGI being passed to xfs_check_agi_freecount · 9ba0889e
      Dave Chinner committed
      From: Dave Chinner <dchinner@redhat.com>
      
      Stephen Rothwell reported this compiler warning from linux-next:
      
      fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt':
      fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi' [-Wunused-variable]
       2032 |  struct xfs_agi   *agi = agbp->b_addr;
      
      Which is fallout from agno -> perag conversions that were done in
      this function. xfs_check_agi_freecount() is the only user of "agi"
      in xfs_difree_finobt() now, and it only uses the agi to get the
      current free inode count. We hold that in the perag structure, so
      there's no need to directly reference the raw AGI to get this
      information.
      
      The btree cursor being passed to xfs_check_agi_freecount() has a
      reference to the perag being operated on, so use that directly in
      xfs_check_agi_freecount() rather than passing an AGI.
      
      Fixes: 7b13c515 ("xfs: use perag for ialloc btree cursors")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    •
      Merge tag 'xfs-perag-conv-tag' of... · c3eabd36
      Darrick J. Wong committed
      Merge tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2
      
      xfs: initial agnumber -> perag conversions for shrink
      
      If we want to use active references to the perag to be able to gate
      shrink removing AGs and hence perags safely, we've got a fair bit of
      work to do to actually use perags in all the places we need to.
      
      There's a lot of code that iterates ag numbers and then
      looks up perags from that, often multiple times for the same perag
      in the one operation. If we want to use reference counted perags for
      access control, then we need to convert all these uses to perag
      iterators, not agno iterators.
      
      [Patches 1-4]
      
      The first step of this is consolidating all the perag management -
      init, free, get, put, etc into a common location. This is spread all
      over the place right now, so move it all into libxfs/xfs_ag.[ch].
      This does expose kernel only bits of the perag to libxfs and hence
      userspace, so the structures and code are rearranged to minimise the
      number of ifdefs that need to be added to the userspace codebase.
      The perag iterator in xfs_icache.c is promoted to a first class API
      and expanded to the needs of the code as required.
      
      [Patches 5-10]
      
      These are the first basic perag iterator conversions and changes to
      pass the perag down the stack from those iterators where
      appropriate. A lot of this is obvious, simple changes, though in
      some places we stop passing the perag down the stack because the
      code enters into an as yet unconverted subsystem that still uses raw
      AGs.
      
      [Patches 11-16]
      
      These replace the agno passed in the btree cursor for per-ag btree
      operations with a perag that is passed to the cursor init function.
      The cursor takes its own reference to the perag, and the reference
      is dropped when the cursor is deleted. Hence we get reference
      coverage for the entire time the cursor is active, even if the code
      that initialised the cursor drops its reference before the cursor
      or any of its children (duplicates) have been deleted.
      
      The first patch adds the perag infrastructure for the cursor, the
      next four patches convert a btree cursor at a time, and the last
      removes the agno from the cursor once it is unused.
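
      The reference lifetime described for patches 11-16 can be sketched in
      userspace C. This is a minimal model under stated assumptions: the
      demo_* names are hypothetical, and a plain int stands in for the
      atomic reference count a kernel object would use.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-AG structure with a simple reference count. */
struct demo_perag {
	int refcount;	/* not atomic: single-threaded illustration only */
};

static struct demo_perag *demo_perag_get(struct demo_perag *pag)
{
	pag->refcount++;
	return pag;
}

static void demo_perag_put(struct demo_perag *pag)
{
	assert(pag->refcount > 0);
	pag->refcount--;
}

/* Hypothetical btree cursor that pins the perag it operates on. */
struct demo_cursor {
	struct demo_perag *pag;
};

/* Cursor init takes the cursor's own reference to the perag. */
static struct demo_cursor *demo_cursor_init(struct demo_perag *pag)
{
	struct demo_cursor *cur = malloc(sizeof(*cur));

	if (!cur)
		return NULL;
	cur->pag = demo_perag_get(pag);
	return cur;
}

/* A duplicate cursor takes another reference of its own. */
static struct demo_cursor *demo_cursor_dup(const struct demo_cursor *cur)
{
	return demo_cursor_init(cur->pag);
}

/* Deleting a cursor drops the reference that cursor holds. */
static void demo_cursor_delete(struct demo_cursor *cur)
{
	demo_perag_put(cur->pag);
	free(cur);
}
```

      If the caller drops its own reference while a cursor or a duplicate is
      still alive, the count stays above zero until the last cursor is
      deleted.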
      
      [Patches 17-21]
      
      These patches are a demonstration of the simplifications and
      cleanups that come from plumbing the perag through interfaces that
      select and then operate on a specific AG. In this case the inode
      allocation algorithm does up to three walks across all AGs before it
      either allocates an inode or fails. Two of these walks are purely
      just to select the AG, and even then it doesn't guarantee inode
      allocation success so there's a third walk if the selected AG
      allocation fails.
      
      These patches collapse the selection and allocation into a single
      loop, simplify the error handling because xfs_dir_ialloc() always
      returns ENOSPC if no AG was selected for inode allocation or we fail
      to allocate an inode in any AG, get rid of the xfs_dir_ialloc()
      wrapper, convert inode allocation to run entirely from a single
      perag instance, and then factor xfs_dialloc() into a much, much
      simpler loop which is easy to understand.
      
      Hence we end up with the same inode allocation logic, but it only
      needs two complete iterations at worst, makes AG selection and
      allocation atomic w.r.t. shrink and chops out over 100 lines of
      code from this hot code path.
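
      Collapsing AG selection and allocation into one loop might look
      roughly like the sketch below. It is a deliberately simplified model:
      struct demo_ag, demo_ag_alloc() and demo_dialloc() are hypothetical,
      and it makes a single pass over the AGs, whereas the text above says
      the real algorithm may need two complete iterations at worst.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocation group with a free inode counter. */
struct demo_ag {
	int free_inodes;
};

/* Try to allocate one inode from a single AG. */
static int demo_ag_alloc(struct demo_ag *ag)
{
	if (ag->free_inodes == 0)
		return -ENOSPC;
	ag->free_inodes--;
	return 0;
}

/*
 * Walk the AGs once, starting at 'start' and wrapping around.  Selection
 * and allocation happen in the same iteration, so there is no separate
 * "pick an AG" pass that can later turn out to be wrong.  Returns 0 and
 * sets *agno_out on success, or -ENOSPC if no AG could allocate.
 */
static int demo_dialloc(struct demo_ag *ags, int agcount, int start,
			int *agno_out)
{
	for (int i = 0; i < agcount; i++) {
		int agno = (start + i) % agcount;

		if (demo_ag_alloc(&ags[agno]) == 0) {
			*agno_out = agno;
			return 0;
		}
	}
	return -ENOSPC;
}
```

      The single error return mirrors the simplification described above:
      callers see either success or ENOSPC, whichever AG was tried.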
      
      [Patch 22]
      
      Converts the unlink path to pass perags through it.
      
      There's more conversion work to be done, but this patchset gets
      through a large chunk of it in one hit. Most of the iterators are
      converted, so once this is solidified we can move on to converting
      these to active references for being able to free perags while the
      fs is still active.
      
      * tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits)
        xfs: remove xfs_perag_t
        xfs: use perag through unlink processing
        xfs: clean up and simplify xfs_dialloc()
        xfs: inode allocation can use a single perag instance
        xfs: get rid of xfs_dir_ialloc()
        xfs: collapse AG selection for inode allocation
        xfs: simplify xfs_dialloc_select_ag() return values
        xfs: remove agno from btree cursor
        xfs: use perag for ialloc btree cursors
        xfs: convert allocbt cursors to use perags
        xfs: convert refcount btree cursor to use perags
        xfs: convert rmap btree cursor to using a perag
        xfs: add a perag to the btree cursor
        xfs: pass perags around in fsmap data dev functions
        xfs: push perags through the ag reservation callouts
        xfs: pass perags through to the busy extent code
        xfs: convert secondary superblock walk to use perags
        xfs: convert xfs_iwalk to use perag references
        xfs: convert raw ag walks to use for_each_perag
        xfs: make for_each_perag... a first class citizen
        ...
    •
      Merge tag 'xfs-buf-bulk-alloc-tag' of... · ebf2e337
      Darrick J. Wong committed
      Merge tag 'xfs-buf-bulk-alloc-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2
      
      xfs: buffer cache bulk page allocation
      
      This patchset makes use of the new bulk page allocation interface to
      reduce the overhead of allocating large numbers of pages in a
      loop.
      
      The first two patches are refactoring buffer memory allocation and
      converting the uncached buffer path to use the same page allocation
      path, followed by converting the page allocation path to use bulk
      allocation.
      
      The rest of the patches are then consolidation of the page
      allocation and freeing code to simplify the code and remove a chunk
      of unnecessary abstraction. This is largely based on a series of
      changes made by Christoph Hellwig.
      
      * tag 'xfs-buf-bulk-alloc-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
        xfs: merge xfs_buf_allocate_memory
        xfs: cleanup error handling in xfs_buf_get_map
        xfs: get rid of xb_to_gfp()
        xfs: simplify the b_page_count calculation
        xfs: remove ->b_offset handling for page backed buffers
        xfs: move page freeing into _xfs_buf_free_pages()
        xfs: merge _xfs_buf_get_pages()
        xfs: use alloc_pages_bulk_array() for buffers
        xfs: use xfs_buf_alloc_pages for uncached buffers
        xfs: split up xfs_buf_allocate_memory
  4. 07 Jun, 2021 5 commits
  5. 04 Jun, 2021 15 commits
  6. 03 Jun, 2021 1 commit
    •
      xfs: don't take a spinlock unconditionally in the DIO fastpath · 977ec4dd
      Dave Chinner committed
      Because this happens at high thread counts on high IOPS devices
      doing mixed read/write AIO-DIO to a single file at about a million
      iops:
      
         64.09%     0.21%  [kernel]            [k] io_submit_one
         - 63.87% io_submit_one
            - 44.33% aio_write
               - 42.70% xfs_file_write_iter
                  - 41.32% xfs_file_dio_write_aligned
                     - 25.51% xfs_file_write_checks
                        - 21.60% _raw_spin_lock
                           - 21.59% do_raw_spin_lock
                              - 19.70% __pv_queued_spin_lock_slowpath
      
      This also happens on the IO completion path:
      
         22.89%     0.69%  [kernel]            [k] xfs_dio_write_end_io
         - 22.49% xfs_dio_write_end_io
            - 21.79% _raw_spin_lock
               - 20.97% do_raw_spin_lock
                  - 20.10% __pv_queued_spin_lock_slowpath
      
      IOWs, fio is burning ~14 whole CPUs on this spin lock.
      
      So, do an unlocked check against inode size first, then if we are
      at/beyond EOF, take the spinlock and recheck. This makes the
      spinlock disappear from the overwrite fastpath.
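
      The "check unlocked, then lock and recheck" pattern can be sketched in
      portable C11. This is an illustration under stated assumptions, not
      the XFS code: struct demo_file, the atomic_flag spinlock, and the
      slow_path_count field are all hypothetical and exist only for the
      example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical file object with a size protected by a tiny spinlock. */
struct demo_file {
	atomic_flag lock;
	_Atomic long long size;
	int slow_path_count;	/* how often the lock was taken (demo only) */
};

static void demo_file_init(struct demo_file *f, long long size)
{
	atomic_flag_clear(&f->lock);
	atomic_init(&f->size, size);
	f->slow_path_count = 0;
}

static void demo_spin_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void demo_spin_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/*
 * Record that a write ended at offset 'end'.  Returns true if the file
 * size was extended.  Writes that end within the current size never
 * touch the lock.
 */
static bool demo_write_update_size(struct demo_file *f, long long end)
{
	bool extended = false;

	/* Fast path: unlocked check against the current size. */
	if (end <= atomic_load(&f->size))
		return false;

	/* Slow path: take the lock and recheck before updating. */
	demo_spin_lock(&f->lock);
	f->slow_path_count++;
	if (end > atomic_load(&f->size)) {
		atomic_store(&f->size, end);
		extended = true;
	}
	demo_spin_unlock(&f->lock);
	return extended;
}
```

      The recheck under the lock matters because another writer may have
      extended the size between the unlocked check and the lock
      acquisition.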
      
      I'd like to report that fixing this makes things go faster. It
      doesn't - it just exposes the XFS_ILOCK as the next severe
      contention point doing extent mapping lookups, and that now burns
      all the 14 CPUs this spinlock was burning.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>