1. 16 Jun, 2022 1 commit
    • D
      xfs: fix TOCTOU race involving the new logged xattrs control knob · f4288f01
      Darrick J. Wong committed
      I found a race involving the larp control knob, aka the debugging knob
      that lets developers enable logging of extended attribute updates:
      
      Thread 1			Thread 2
      
      echo 0 > /sys/fs/xfs/debug/larp
      				setxattr(REPLACE)
      				xfs_has_larp (returns false)
      				xfs_attr_set
      
      echo 1 > /sys/fs/xfs/debug/larp
      
      				xfs_attr_defer_replace
      				xfs_attr_init_replace_state
      				xfs_has_larp (returns true)
      				xfs_attr_init_remove_state
      
      				<oops, wrong DAS state!>
      
      This isn't a particularly severe problem right now because xattr logging
      is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
      what they're doing.
      
      However, the eventual intent is that callers should be able to ask for
      the assistance of the log in persisting xattr updates.  This capability
      might not be required for /all/ callers, which means that dynamic
      control must work correctly.  Once an xattr update has decided whether
      or not to use logged xattrs, it needs to stay in that mode until the end
      of the operation regardless of what subsequent parallel operations might
      do.
      
      Therefore, it is an error to continue sampling xfs_globals.larp once
      xfs_attr_change has made a decision about larp, and it was not correct
      for me to have told Allison that ->create_intent functions can sample
      the global log incompat feature bitfield to decide to elide a log item.
      
      Instead, create a new op flag for the xfs_da_args structure, and convert
      all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
      the attr update state machine to look for the operations flag.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      f4288f01
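
A minimal sketch of the fix described in f4288f01, written as a small Python model rather than kernel code. It assumes the new op flag is called `XFS_DA_OP_LOGGED` with an arbitrary bit value (the commit message only says "a new op flag"); `Knob`, `DaArgs`, `attr_change` and `op_is_logged` are hypothetical stand-ins for the global knob, `xfs_da_args`, and the state machine callers.

```python
from dataclasses import dataclass

# Assumed name and bit value; the commit message does not spell them out.
XFS_DA_OP_LOGGED = 1 << 0


class Knob:
    """Stand-in for the global larp debug knob."""
    larp = False


@dataclass
class DaArgs:
    """Stand-in for the per-operation xfs_da_args structure."""
    op_flags: int = 0


def attr_change(args: DaArgs, knob: Knob) -> None:
    # Sample the global knob exactly once, when the operation starts.
    if knob.larp:
        args.op_flags |= XFS_DA_OP_LOGGED


def op_is_logged(args: DaArgs) -> bool:
    # Later stages consult the per-operation flag, never the global knob.
    return bool(args.op_flags & XFS_DA_OP_LOGGED)


knob = Knob()
args = DaArgs()

knob.larp = False          # echo 0 > /sys/fs/xfs/debug/larp
attr_change(args, knob)    # the operation decides: not logged
knob.larp = True           # echo 1 > /sys/fs/xfs/debug/larp races in

racy_view = knob.larp              # what re-sampling the knob would see
stable_view = op_is_logged(args)   # what the op flag recorded at the start
print(racy_view, stable_view)
```

Re-sampling the knob changes its answer mid-operation, while the flag stored in the args keeps the decision made at the start, which is the property the commit asks for.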
  2. 22 May, 2022 1 commit
  3. 12 May, 2022 2 commits
    • D
      xfs: ATTR_REPLACE algorithm with LARP enabled needs rework · fdaf1bb3
      Dave Chinner committed
      We can't use the same algorithm for replacing an existing attribute
      when logging attributes. The existing algorithm is essentially:
      
      1. create new attr w/ INCOMPLETE
      2. atomically flip INCOMPLETE flags between old + new attribute
      3. remove old attr which is marked w/ INCOMPLETE
      
      This algorithm guarantees that we see either the old or new
      attribute, and if we fail after the atomic flag flip, we don't have
      to recover the removal of the old attr because we never see
      INCOMPLETE attributes in lookups.
      
      For logged attributes, however, this does not work. The logged
      attribute intents do not track the work that has been done as the
      transaction rolls, and hence the only recovery mechanism we have is
      "run the replace operation from scratch".
      
      This is further exacerbated by the attempt to avoid needing the
      INCOMPLETE flag to create an atomic swap. This means we can create
      a second active attribute of the same name before we remove the
      original. If we fail at any point after the create but before the
      removal has completed, we end up with duplicate attributes in
      the attr btree and recovery only tries to replace one of them.
      
      There are several other failure modes where we can leave partially
      allocated remote attributes that expose stale data, partially free
      remote attributes that enable UAF based stale data exposure, etc.
      
      To fix this, we need a different algorithm for replace operations
      when LARP is enabled. Luckily, it's not that complex if we take the
      right first step. That is, the first thing we log is the attri
      intent with the new name/value pair and mark the old attr as
      INCOMPLETE in the same transaction.
      
      From there, we then remove the old attr and keep relogging the
      new name/value in the intent, such that we always know that we have
      to create the new attr in recovery. Once the old attr is removed,
      we then run a normal ATTR_CREATE operation relogging the intent as
      we go. If the new attr is local, then it gets created in a single
      atomic transaction that also logs the final intent done. If the new
      attr is remote, then we set INCOMPLETE on the new attr while we
      allocate and set the remote value, and then we clear the INCOMPLETE
      flag in the last transaction that logs the final intent done.
      
      If we fail at any point in this algorithm, log recovery will always
      see the same state on disk: the new name/value in the intent, and
      either an INCOMPLETE attr or no attr in the attr btree. If we find
      an INCOMPLETE attr, we run the full replace starting with removing
      the INCOMPLETE attr. If we don't find it, then we simply create the
      new attr.
      
      Notably, recovery of a failed create that has an INCOMPLETE flag set
      is now the same - we start with the lookup of the INCOMPLETE attr,
      and if that exists then we do the full replace recovery process,
      otherwise we just create the new attr.
      
      Hence changing the way we do the replace operation when LARP is
      enabled allows us to use the same log recovery algorithm for both
      the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
      algorithm we use for runtime ATTR_REPLACE operations (except for the
      step setting up the initial conditions).
      
      The result is that:
      
      - ATTR_CREATE uses the same algorithm regardless of whether LARP is
        enabled or not
      - ATTR_REPLACE with larp=0 is identical to the old algorithm
      - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
        from the larp=0 code and then runs the unmodified ATTR_CREATE
        code.
      - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
        it uses at runtime.
      
      Because the state machine is now quite clean, changing the algorithm
      is really just a case of changing the initial state and how the
      states link together for the ATTR_REPLACE case. Hence it's not a
      huge amount of code for what is a fairly substantial rework
      of the attr logging and recovery algorithm....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fdaf1bb3
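
The recovery argument in fdaf1bb3 can be checked with a toy model. The sketch below is an illustration only, not the kernel state machine: it assumes a local (non-remote) attr, treats each step as one atomic transaction, and uses hypothetical names (`disk`, `replace_transactions`, `recover`, `crash_after`). It crashes the replace after every possible number of committed transactions and then runs recovery.

```python
def lookup(disk, name):
    """Return the visible value; INCOMPLETE attrs are never seen by lookups."""
    entry = disk["attrs"].get(name)
    if entry is None or entry["incomplete"]:
        return None
    return entry["value"]


def replace_transactions(name, new_value):
    """The atomic steps of a logged replace of a local attr, in order."""
    def log_intent_and_mark_old(disk):
        # First step: intent with the new name/value, old attr INCOMPLETE.
        disk["intent"] = (name, new_value)
        disk["attrs"][name]["incomplete"] = True

    def remove_old(disk):
        # The intent stays logged (relogged) while the old attr is removed.
        del disk["attrs"][name]

    def create_new_and_log_done(disk):
        # Local attr: created in the same transaction as the intent done.
        disk["attrs"][name] = {"value": new_value, "incomplete": False}
        disk["intent"] = None

    return [log_intent_and_mark_old, remove_old, create_new_and_log_done]


def recover(disk):
    """Log recovery: remove an INCOMPLETE attr if found, then create."""
    if disk["intent"] is None:
        return
    name, new_value = disk["intent"]
    entry = disk["attrs"].get(name)
    if entry is not None and entry["incomplete"]:
        del disk["attrs"][name]
    disk["attrs"][name] = {"value": new_value, "incomplete": False}
    disk["intent"] = None


def crash_after(committed):
    """Commit only the first `committed` transactions, then recover."""
    disk = {
        "attrs": {"user.a": {"value": b"old", "incomplete": False}},
        "intent": None,
    }
    for txn in replace_transactions("user.a", b"new")[:committed]:
        txn(disk)
    recover(disk)
    return disk


results = [lookup(crash_after(k), "user.a") for k in range(4)]
print(results)
```

In this model a crash before the first transaction leaves the old value in place, and a crash at any later point ends with exactly one attr holding the new value, matching the "either old or new" guarantee the message describes.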
    • D
      xfs: use XFS_DA_OP flags in deferred attr ops · e7f358de
      Dave Chinner committed
      We currently store the high level attr operation in
      args->attr_flags. This field contains what the VFS is telling us to
      do, but doesn't necessarily match what we are doing in the low level
      modification state machine. e.g. XATTR_REPLACE implies both
      XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
      remove and adding a new attr.
      
      However, deep in the individual state machine operations, we check
      errors against these high level VFS op flags, not the low level
      XFS_DA_OP flags. Indeed, we don't even have a low level flag for
      a REMOVE operation, so the only way we know we are doing a remove
      is the complete absence of XATTR_REPLACE, XATTR_CREATE,
      XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
      flags in these fields, this is a pain to check if we need to.
      
      As the XFS_DA_OP flags are only needed once the deferred operations
      are set up, set these flags appropriately when we set the initial
      operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
      it easy to know that we are doing a remove operation.
      
      With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
      in low level lookup operations, and manipulate the low level flags
      according to the low level context that is operating. e.g. log
      recovery does not have a VFS xattr operation state to copy into
      args->attr_flags, and the low level state machine ops we do for
      recovery do not match the high level VFS operations that were in
      progress when the system failed...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      e7f358de
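
As a rough illustration of the idea in e7f358de, the sketch below sets low level operation flags once, when the initial state is chosen, so that later code can test a single bit. The bit values, the `DaOp` class, and the helper names are hypothetical. The replace mapping follows the message (both ADDNAME and RENAME); the create mapping is an assumption.

```python
from enum import IntFlag


class DaOp(IntFlag):
    # Placeholder bit values, not the kernel's definitions.
    ADDNAME = 1
    RENAME = 2
    REMOVE = 4


def initial_op_flags(operation: str) -> DaOp:
    """Pick the low level flags when the deferred operation is set up."""
    if operation == "create":
        return DaOp.ADDNAME                # assumed mapping
    if operation == "replace":
        return DaOp.ADDNAME | DaOp.RENAME  # remove old + add new
    if operation == "remove":
        return DaOp.REMOVE                 # the newly introduced flag
    raise ValueError(f"unknown operation: {operation}")


def is_remove(flags: DaOp) -> bool:
    # One explicit bit instead of inferring a remove from absent flags.
    return bool(flags & DaOp.REMOVE)


for op in ("create", "replace", "remove"):
    print(op, is_remove(initial_op_flags(op)))
```

A recovery path that has no VFS operation to copy from can call the same kind of helper directly, which is the motivation the message gives for keeping the low level flags separate.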
  4. 21 Apr, 2022 1 commit
  5. 13 Apr, 2022 1 commit
    • C
      xfs: Directory's data fork extent counter can never overflow · 83a21c18
      Chandan Babu R committed
      The maximum file size that can be represented by the data fork extent counter
      in the worst case occurs when all extents are 1 block in length and each block
      is 1KB in size.
      
      With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing maximum extent count and with
      1KB sized blocks, a file can reach up to,
      (2^31) * 1KB = 2TB
      
      This is much larger than the theoretical maximum size of a directory
      i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
      
      Since a directory's inode can never overflow its data fork extent counter,
      this commit removes all the overflow checks associated with
      it. xfs_dinode_verify() now performs a rough check to verify if a directory's
      data fork is larger than 96GB.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
      83a21c18
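
The arithmetic in 83a21c18 can be reproduced in a few lines. This is a worked check only: the extent count is taken as 2^31 as written in the message, and `XFS_DIR2_SPACE_SIZE` is assumed to be 32 GiB, which is what the quoted ~96GB total implies. Binary units are used throughout.

```python
KIB = 1 << 10
GIB = 1 << 30
TIB = 1 << 40

# Worst case for the data fork: every extent is a single 1KB block.
max_extents = 2 ** 31
block_size = 1 * KIB
max_file_bytes = max_extents * block_size

# Assumed value: one directory address space segment of 32 GiB.
xfs_dir2_space_size = 32 * GIB
max_dir_bytes = 3 * xfs_dir2_space_size

print(max_file_bytes // TIB, "TiB reachable by the extent counter")
print(max_dir_bytes // GIB, "GiB maximum directory size")
print(max_file_bytes // max_dir_bytes, "x headroom (rounded down)")
```

Because the directory limit is reached long before the extent counter limit, the overflow checks for directory data forks can never trigger.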
  6. 23 Oct, 2021 2 commits
  7. 29 Jul, 2020 1 commit
  8. 14 May, 2020 1 commit
  9. 03 Mar, 2020 5 commits
  10. 10 Jan, 2020 1 commit
  11. 23 Nov, 2019 5 commits
  12. 14 Nov, 2019 1 commit
  13. 11 Nov, 2019 12 commits
  14. 31 Aug, 2019 1 commit
    • D
      xfs: allocate xattr buffer on demand · ddbca70c
      Dave Chinner committed
      When doing file lookups and checking for permissions, we end up in
      xfs_get_acl() to see if there are any ACLs on the inode. This
      requires an xattr lookup, and to do that we have to supply a buffer
      large enough to hold a maximum sized xattr.
      
      On workloads where we are accessing a wide range of cache cold files
      under memory pressure (e.g. NFS fileservers) we end up spending a
      lot of time allocating the buffer. The buffer is 64k in length, so
      is a contiguous multi-page allocation, and if that then fails we
      fall back to vmalloc(). Hence the allocation here is /expensive/
      when we are looking up hundreds of thousands of files a second.
      
      Initial numbers from a bpf trace show average time in xfs_get_acl()
      is ~32us, with ~19us of that in the memory allocation. Note these
      are average times, so they are going to be affected by the worst
      case allocations more than the common fast case...
      
      To avoid this, we could just do a "null" lookup to see if the ACL
      xattr exists and then only do the allocation if it exists. This,
      however, optimises the path for the "no ACL present" case at the
      expense of the "acl present" case. i.e. we can halve the time in
      xfs_get_acl() for the no acl case (i.e. down to ~10-15us), but that
      then increases the ACL case by 30% (i.e. up to 40-45us).
      
      To solve this and speed up both cases, drive the xattr buffer
      allocation into the attribute code once we know what the actual
      xattr length is. For the no-xattr case, we avoid the allocation
      completely, speeding up that case. For the common ACL case, we'll
      end up with a fast heap allocation (because it'll be smaller than a
      page), and only for the rarer "we have a remote xattr" will we have
      a multi-page allocation occur. Hence the common ACL case will be
      much faster, too.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      ddbca70c
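
The difference between the two allocation strategies in ddbca70c can be sketched with a toy attribute store. This is a model only: `AttrStore`, its methods, and the 28-byte ACL value are hypothetical, and the byte counter stands in for the cost of the real allocation.

```python
XATTR_SIZE_MAX = 64 * 1024  # the 64k maximum mentioned in the message


class AttrStore:
    """Toy xattr store that counts how many buffer bytes it allocates."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)
        self.bytes_allocated = 0

    def get_preallocated(self, name):
        # Old behaviour: allocate a maximum sized buffer before the lookup.
        buf = bytearray(XATTR_SIZE_MAX)
        self.bytes_allocated += len(buf)
        value = self.attrs.get(name)
        if value is None:
            return None
        buf[:len(value)] = value
        return bytes(buf[:len(value)])

    def get_on_demand(self, name):
        # New behaviour: look up first, allocate only the actual length.
        value = self.attrs.get(name)
        if value is None:
            return None  # no xattr: no allocation at all
        buf = bytearray(len(value))
        self.bytes_allocated += len(buf)
        buf[:] = value
        return bytes(buf)


acl = bytes(28)  # hypothetical small ACL value
old_style = AttrStore({"acl": acl})
new_style = AttrStore({"acl": acl})

old_style.get_preallocated("missing")
new_style.get_on_demand("missing")
print(old_style.bytes_allocated, new_style.bytes_allocated)
```

In the on-demand version the missing-attr path allocates nothing and the small-ACL path allocates only the value's length, which mirrors the two fast paths the message describes.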
  15. 03 Aug, 2018 1 commit
  16. 12 Jul, 2018 2 commits
  17. 07 Jun, 2018 1 commit
    • D
      xfs: convert to SPDX license tags · 0b61f8a4
      Dave Chinner committed
      Remove the verbose license text from XFS files and replace them
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0b61f8a4
  18. 20 Jun, 2017 1 commit
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong committed
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      c8ce540d
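
To see what the sed script in c8ce540d does to individual lines, the rules can be replayed in order with Python's `re` module. This is an approximate port for illustration, assuming `\t` in the script means a literal tab; the sample input lines are made up and `convert` is a hypothetical helper.

```python
import re

# The seven substitution rules from the commit message, in order.
SUBSTITUTIONS = [
    (r"typedef\t__uint8_t", "typedef __uint8_t\t"),
    (r"typedef\t__uint", "typedef __uint"),
    (r"typedef\t__int([0-9]*)_t", "typedef int\\1_t\t"),
    (r"__uint8_t\t", "__uint8_t\t\t"),
    (r"__uint", "uint"),
    (r"__int([0-9]*)_t\t", "__int\\1_t\t\t"),
    (r"__int", "int"),
]

# The final rule deletes typedefs that now define a system type.
DELETE = re.compile(r"^typedef.*int[0-9]*_t;$")


def convert(line):
    """Apply the rules to one line; return None if the line is deleted."""
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return None if DELETE.search(line) else line


samples = [
    "\t__uint32_t\tcount;",
    "\t__int64_t\toffset;",
    "typedef\t__uint32_t\txfs_agblock_t;",
    "typedef __uint16_t uint16_t;",
]
for sample in samples:
    print(repr(sample), "->", repr(convert(sample)))
```

The extra tab added for `__int64_t` (and for `__uint8_t`) appears to compensate for the type name becoming two characters shorter and dropping below a tab stop, which is the whitespace fix the message mentions.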