1. 13 Feb, 2023 1 commit
  2. 11 Feb, 2023 2 commits
    • xfs: t_firstblock is tracking AGs not blocks · 692b6cdd
      Committed by Dave Chinner
      The tp->t_firstblock field is now really tracking the highest AG we
      have locked, not the block number of the highest allocation we've
      made. Its purpose is to prevent AGF locking deadlocks, so rename it
      to "highest AG" and simplify the implementation to just track the
      agno rather than a fsbno.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      692b6cdd
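
The commit message describes tracking only the highest AG number locked so far. A minimal Python sketch of that bookkeeping follows; it is a toy model, not the kernel implementation. The names `AgTracker`, `highest_agno`, `note_agf_locked`, and `may_block_on` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgTracker:
    """Toy stand-in for per-transaction state (hypothetical, not kernel code)."""

    # None means no AGF has been locked in this transaction yet.
    highest_agno: Optional[int] = None

    def note_agf_locked(self, agno: int) -> None:
        """Remember the highest AG number whose AGF we hold."""
        if self.highest_agno is None or agno > self.highest_agno:
            self.highest_agno = agno

    def may_block_on(self, agno: int) -> bool:
        """A blocking AGF lock is only safe at or above the highest AG held."""
        return self.highest_agno is None or agno >= self.highest_agno


tp = AgTracker()
tp.note_agf_locked(3)
tp.note_agf_locked(1)   # a lower AG does not lower the tracked value
```

Storing a plain AG number instead of a filesystem block number means the ordering check is a single integer comparison, which is the simplification the commit message refers to.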
    • xfs: fix low space alloc deadlock · 1dd0510f
      Committed by Dave Chinner
      I've recently encountered an ABBA deadlock with g/476. The upcoming
      changes seem to make this much easier to hit, but the underlying
      problem is a pre-existing one.
      
      Essentially, if we select an AG for allocation, then lock the AGF
      and then fail to allocate for some reason (e.g. minimum length
      requirements cannot be satisfied), then we drop out of the
      allocation with the AGF still locked.
      
      The caller then modifies the allocation constraints - usually
      loosening them up - and tries again. This can result in trying to
      access AGFs that are lower than the AGF we already have locked from
      the failed attempt, e.g. the failed attempt skipped several AGs
      before failing, so we have locked an AG higher than the start AG.
      Retrying the allocation from the start AG then causes us to violate
      AGF lock ordering and this can lead to deadlocks.
      
      The deadlock exists even if allocation succeeds - we can do
      followup allocations in the same transaction for BMBT blocks that
      aren't guaranteed to be in the same AG as the original, and can move
      into higher AGs. Hence we really need to move the tp->t_firstblock
      tracking down into xfs_alloc_vextent() where it can be set when we
      exit with a locked AG.
      
      xfs_alloc_vextent() can also check there if the requested
      allocation falls within the allowed range of AGs set by
      tp->t_firstblock. If we can't allocate within the range set, we have
      to fail the allocation. If we are allowed to do non-blocking AGF
      locking, we can ignore the AG locking order limitations as we can
      use try-locks for the first iteration over the requested AG range.
      
      This invalidates a set of post allocation asserts that check that
      the allocation is always above tp->t_firstblock if it is set.
      Because we can use try-locks to avoid the deadlock in some
      circumstances, having a pre-existing locked AGF doesn't always
      prevent allocation from lower order AGFs. Hence those ASSERTs need
      to be removed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      1dd0510f
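
The locking rule described in this commit message can be sketched in Python as follows. This is a simplified toy model under stated assumptions, not the logic of `xfs_alloc_vextent()`: AGF locks are modelled as `threading.Lock` objects, and `ToyTransaction`, `lock_ag_for_alloc`, `candidates`, and `allow_trylock` are hypothetical names. AGs below the highest one already held are never waited on; they are either try-locked or skipped, and the request fails if no candidate can be locked.

```python
import threading
from typing import List, Optional, Sequence, Set


class ToyTransaction:
    """Hypothetical stand-in for a transaction holding AGF locks."""

    def __init__(self) -> None:
        self.held: Set[int] = set()
        self.highest_agno: Optional[int] = None


def lock_ag_for_alloc(tp: ToyTransaction,
                      agf_locks: List[threading.Lock],
                      candidates: Sequence[int],
                      allow_trylock: bool) -> Optional[int]:
    """Lock the first usable AG in `candidates`; return its number or None."""
    for agno in candidates:
        if agno in tp.held:
            return agno                      # already locked by this transaction
        below = tp.highest_agno is not None and agno < tp.highest_agno
        if below:
            # Blocking here would violate AGF lock ordering (ABBA risk),
            # so only a non-blocking attempt is permitted.
            if not (allow_trylock and agf_locks[agno].acquire(blocking=False)):
                continue
        else:
            agf_locks[agno].acquire()        # in order, so waiting is safe
        tp.held.add(agno)
        if not below:
            tp.highest_agno = agno
        return agno
    return None                              # nothing in the allowed range
```

For example, after a first attempt locks AG 2, a retry over candidates `[0, 1]` returns `None` when try-locks are not allowed, and returns `0` when they are allowed and AG 0 is free. The second case is why a post-allocation assertion that the result always lies above the tracked AG would no longer hold.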
  3. 06 Feb, 2023 1 commit
  4. 18 Nov, 2022 1 commit
  5. 31 Oct, 2022 1 commit
  6. 12 Oct, 2022 1 commit
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Committed by Jason A. Donenfeld
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81895a65
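
To illustrate the arithmetic behind the rewrite rules in this script, here is a small Python sketch. It assumes the bounded value is obtained with the multiply-and-shift form that the third `@basic@` pattern matches, `(u64)(E) * get_random_u32() >> 32`. The helper names `bounded_by_multiply_shift`, `is_pow2_minus_one`, and `mask_to_bound` are hypothetical; the mask test mirrors the `value & (value + 1)` check in the `add_one` rule.

```python
def bounded_by_multiply_shift(rand32: int, bound: int) -> int:
    """Map a 32-bit value into [0, bound) without a division or modulo."""
    assert 0 <= rand32 < 2**32
    return (rand32 * bound) >> 32


def is_pow2_minus_one(mask: int) -> bool:
    """True for masks such as 0x7 or 0xff, i.e. 2**k - 1."""
    return mask & (mask + 1) == 0


def mask_to_bound(mask: int) -> int:
    """Turn `x & mask` into the equivalent exclusive upper bound mask + 1."""
    if not is_pow2_minus_one(mask):
        raise ValueError("mask is not a power of two minus one")
    return mask + 1


# `r & 0x3f` and a bounded draw with bound 0x40 cover the same range 0..63.
bound = mask_to_bound(0x3f)
```

Note that `r & mask` keeps the low bits of `r` while the multiply-and-shift form effectively uses the high bits, so the two give different values for the same `r` even though both cover the same range.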
  7. 23 Jul, 2022 1 commit
  8. 07 Jul, 2022 7 commits
  9. 21 Apr, 2022 1 commit
  10. 11 Apr, 2022 1 commit
  11. 22 Mar, 2022 1 commit
  12. 23 Oct, 2021 4 commits
  13. 20 Oct, 2021 3 commits
    • xfs: compute absolute maximum nlevels for each btree type · 0ed5f735
      Committed by Darrick J. Wong
      Add code for all five btree types so that we can compute the absolute
      maximum possible btree height for each btree type.  This is a setup for
      the next patch, which makes every btree type have its own cursor cache.
      
      The functions are exported so that we can have xfs_db report the
      absolute maximum btree heights for each btree type, rather than making
      everyone run their own ad-hoc computations.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      0ed5f735
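
A worst-case btree height can be estimated by assuming every block holds only the minimum number of records and counting how many levels are needed until a single root block remains. The Python sketch below shows that calculation under those assumptions. It is not the kernel's code, and `max_btree_levels` and its parameters are hypothetical names.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def max_btree_levels(min_leaf_records: int,
                     min_node_keys: int,
                     max_records: int) -> int:
    """Tallest possible tree when every block is minimally filled."""
    if min_leaf_records < 1 or min_node_keys < 2:
        raise ValueError("need >= 1 record per leaf and >= 2 keys per node")
    blocks = ceil_div(max_records, min_leaf_records)   # leaf level
    height = 1
    while blocks > 1:
        blocks = ceil_div(blocks, min_node_keys)       # one level up
        height += 1
    return height


# 64 records, 4 per leaf and 4 keys per node: 16 leaves -> 4 nodes -> 1 root.
levels = max_btree_levels(4, 4, 64)
```

Because the per-block minimums differ between btree types, running the same calculation with each type's own limits yields a separate maximum height per type.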
    • xfs: rename m_ag_maxlevels to m_allocbt_maxlevels · 7cb3efb4
      Committed by Darrick J. Wong
      Years ago when XFS was thought to be much simpler, we introduced
      m_ag_maxlevels to specify the maximum btree height of per-AG btrees for
      a given filesystem mount.  Then we observed that inode btrees don't
      actually have the same height and split that off; and now we have rmap
      and refcount btrees with much different geometries and separate
      maxlevels variables.
      
      The 'ag' part of the name doesn't make much sense anymore, so rename
      this to m_alloc_maxlevels to reinforce that this is the maximum height
      of the *free space* btrees.  This sets us up for the next patch, which
      will add a variable to track the maximum height of all AG btrees.
      
      (Also take the opportunity to improve adjacent comments and fix minor
      style problems.)
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7cb3efb4
    • xfs: prepare xfs_btree_cur for dynamic cursor heights · 6ca444cf
      Committed by Darrick J. Wong
      Split out the btree level information into a separate struct and put it
      at the end of the cursor structure as a VLA.  Files with huge data forks
      (and in the future, the realtime rmap btree) will require the ability to
      support many more levels than a per-AG btree cursor, which means that
      we're going to create per-btree type cursor caches to conserve memory
      for the more common case.
      
      Note that a subsequent patch actually introduces dynamic cursor heights.
      This one merely rearranges the structure to prepare for that.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      6ca444cf
  14. 15 Oct, 2021 1 commit
  15. 20 Aug, 2021 4 commits
  16. 19 Aug, 2021 2 commits
  17. 03 Jun, 2021 1 commit
  18. 02 Jun, 2021 7 commits