1. 13 Feb, 2023 9 commits
  2. 11 Feb, 2023 1 commit
    • D
      xfs: prefer free inodes at ENOSPC over chunk allocation · f08f984c
      Dave Chinner authored
      When an XFS filesystem has free inodes in chunks already allocated
      on disk, it will still allocate new inode chunks if the target AG
      has no free inodes in it. Normally, this is a good idea as it
      preserves locality of all the inodes in a given directory.
      
      However, at ENOSPC this can lead to using the last few remaining
      free filesystem blocks to allocate a new chunk when there are many,
      many free inodes that could be allocated without consuming free
      space. This results in speeding up the consumption of the last few
      blocks and inode create operations then returning ENOSPC when there
      are free inodes available because we don't have enough blocks left in the
      filesystem for directory creation reservations to proceed.
      
      Hence when we are near ENOSPC, we should be attempting to preserve
      the remaining blocks for directory block allocation rather than
      using them for unnecessary inode chunk creation.
      
      This particular behaviour is exposed by xfs/294, when it drives to
      ENOSPC on empty file creation whilst there are still thousands of
      free inodes available for allocation in other AGs in the filesystem.
      
      Hence, when we are within 1% of ENOSPC, change the inode allocation
      behaviour to prefer to use existing free inodes over allocating new
      inode chunks, even though it results in poorer locality of the data
      set. It is more important for the allocations to be space efficient
      near ENOSPC than to have optimal locality for performance, so let's
      modify the inode AG selection code to reflect that fact.
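
The policy described above can be sketched as follows. This is a simplified illustration, not the kernel's code: `struct fs_space`, `fs_low_space()` and `may_alloc_inode_chunk()` are hypothetical names, and the 1% threshold is computed directly rather than taken from any precomputed kernel low-space table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of filesystem-wide block counters. */
struct fs_space {
	uint64_t total_blocks;	/* size of the filesystem in blocks */
	uint64_t free_blocks;	/* blocks currently free */
};

/* True when fewer than 1% of the filesystem's blocks remain free. */
static bool fs_low_space(const struct fs_space *fs)
{
	return fs->free_blocks < fs->total_blocks / 100;
}

/*
 * Decide whether inode allocation may create a new inode chunk.
 * Near ENOSPC, existing free inodes anywhere in the filesystem are
 * preferred, so the remaining blocks stay available for directory
 * block allocation.
 */
static bool may_alloc_inode_chunk(const struct fs_space *fs,
				  uint64_t free_inodes_fs_wide)
{
	if (fs_low_space(fs) && free_inodes_fs_wide > 0)
		return false;
	return true;
}
```

With 1000 total blocks and 5 free, this sketch refuses chunk allocation while any free inode exists, trading locality for space efficiency.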
      
      This allows generic/294 to not only pass with this allocator rework
      patchset, but to increase the number of post-ENOSPC empty inode
      allocations from ~600 to ~9080 before we hit ENOSPC on the
      directory create transaction reservation.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
  3. 18 Nov, 2022 1 commit
  4. 12 Oct, 2022 2 commits
    • J
      treewide: use get_random_u32() when possible · a251c17a
      Jason A. Donenfeld authored
      The prandom_u32() function has been a deprecated inline wrapper around
      get_random_u32() for several releases now, and compiles down to the
      exact same code. Replace the deprecated wrapper with a direct call to
      the real function. The same also applies to get_random_int(), which is
      just a wrapper around get_random_u32(). This was done as a basic find
      and replace.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
      Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Acked-by: Helge Deller <deller@gmx.de> # for parisc
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • J
      treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Jason A. Donenfeld authored
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
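
The third pattern in the `@basic@` rule matches a multiply-and-shift reduction, which is one way to bound a 32-bit random value without a division. A minimal sketch of that reduction, assuming the caller already has a uniformly distributed 32-bit value (`bounded_u32()` is a hypothetical helper name, not a kernel function):

```c
#include <stdint.h>

/*
 * Map a uniformly distributed 32-bit value into [0, ceil) using a
 * 64-bit multiply and a shift instead of a modulo operation.
 * A ceil of 0 always yields 0.
 */
static uint32_t bounded_u32(uint32_t rand32, uint32_t ceil)
{
	return (uint32_t)(((uint64_t)ceil * rand32) >> 32);
}
```

For example, with `ceil` set to 10, the smallest input maps to 0 and the largest 32-bit input maps to 9, so the result never reaches `ceil`.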
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  5. 07 Jul, 2022 7 commits
  6. 21 Apr, 2022 1 commit
  7. 11 Apr, 2022 1 commit
  8. 23 Oct, 2021 1 commit
  9. 20 Oct, 2021 1 commit
  10. 20 Aug, 2021 5 commits
  11. 19 Aug, 2021 1 commit
  12. 10 Aug, 2021 1 commit
  13. 16 Jul, 2021 1 commit
    • D
      xfs: check for sparse inode clusters that cross new EOAG when shrinking · da062d16
      Darrick J. Wong authored
      While running xfs/168, I noticed occasional write verifier shutdowns
      involving inodes at the very end of the filesystem.  Existing inode
      btree validation code checks that all inode clusters are fully contained
      within the filesystem.
      
      However, due to inadequate checking in the fs shrink code, it's possible
      that there could be a sparse inode cluster at the end of the filesystem
      where the upper inodes of the cluster are marked as holes and the
      corresponding blocks are free.  In this case, the last blocks in the AG
      are listed in the bnobt.  This enables the shrink to proceed but results
      in a filesystem that trips the inode verifiers.  Fix this by disallowing
      the shrink.
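
The condition being rejected can be illustrated with a small range check. This is a sketch under stated assumptions, not the kernel's implementation: `cluster_crosses_new_eoag()` and its parameters are hypothetical, and block numbers are taken to be AG-relative. The sketch deliberately ignores which inodes in the cluster are holes, because the point of the fix is that free blocks under sparse holes do not make the shrink safe.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if an inode cluster starting at start_block and spanning
 * nblocks straddles the proposed new end of the AG. Such a shrink
 * should be refused even when the blocks past the new end are free.
 */
static bool cluster_crosses_new_eoag(uint64_t start_block, uint64_t nblocks,
				     uint64_t new_ag_blocks)
{
	return start_block < new_ag_blocks &&
	       start_block + nblocks > new_ag_blocks;
}
```

A cluster that ends exactly at the new end of the AG is fully contained and does not trigger the check.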
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
  14. 18 Jun, 2021 1 commit
  15. 09 Jun, 2021 1 commit
    • D
      xfs: drop the AGI being passed to xfs_check_agi_freecount · 9ba0889e
      Dave Chinner authored
      From: Dave Chinner <dchinner@redhat.com>
      
      Stephen Rothwell reported this compiler warning from linux-next:
      
      fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt':
      fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi' [-Wunused-variable]
       2032 |  struct xfs_agi   *agi = agbp->b_addr;
      
      Which is fallout from agno -> perag conversions that were done in
      this function. xfs_check_agi_freecount() is the only user of "agi"
      in xfs_difree_finobt() now, and it only uses the agi to get the
      current free inode count. We hold that in the perag structure, so
      there's no need to directly reference the raw AGI to get this
      information.
      
      The btree cursor being passed to xfs_check_agi_freecount() has a
      reference to the perag being operated on, so use that directly in
      xfs_check_agi_freecount() rather than passing an AGI.
      
      Fixes: 7b13c515 ("xfs: use perag for ialloc btree cursors")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  16. 02 Jun, 2021 6 commits