1. 08 4月, 2021 15 次提交
  2. 26 3月, 2021 3 次提交
    • B
      xfs: Rudimentary spelling fix · 0145225e
      Bhaskar Chowdhury 提交于
      s/sytemcall/syscall/
      Signed-off-by: NBhaskar Chowdhury <unixbhaskar@gmail.com>
      Acked-by: NRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      0145225e
    • D
      xfs: initialise attr fork on inode create · e6a688c3
      Dave Chinner 提交于
      When we allocate a new inode, we often need to add an attribute to
      the inode as part of the create. This can happen as a result of
      needing to add default ACLs or security labels before the inode is
      made visible to userspace.
      
      This is highly inefficient right now. We do the create transaction
      to allocate the inode, then we do an "add attr fork" transaction to
      modify the just created empty inode to set the inode fork offset to
      allow attributes to be stored, then we go and do the attribute
      creation.
      
      This means 3 transactions instead of 1 to allocate an inode, and
      this greatly increases the load on the CIL commit code, resulting in
      excessive contention on the CIL spin locks and performance
      degradation:
      
       18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
        3.57%  [kernel]                [k] do_raw_spin_lock
        2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
        2.48%  [kernel]                [k] memcpy
        2.34%  [kernel]                [k] xfs_log_commit_cil
      
      The typical profile resulting from running fsmark on a selinux enabled
      filesytem is adds this overhead to the create path:
      
        - 15.30% xfs_init_security
           - 15.23% security_inode_init_security
      	- 13.05% xfs_initxattrs
      	   - 12.94% xfs_attr_set
      	      - 6.75% xfs_bmap_add_attrfork
      		 - 5.51% xfs_trans_commit
      		    - 5.48% __xfs_trans_commit
      		       - 5.35% xfs_log_commit_cil
      			  - 3.86% _raw_spin_lock
      			     - do_raw_spin_lock
      				  __pv_queued_spin_lock_slowpath
      		 - 0.70% xfs_trans_alloc
      		      0.52% xfs_trans_reserve
      	      - 5.41% xfs_attr_set_args
      		 - 5.39% xfs_attr_set_shortform.constprop.0
      		    - 4.46% xfs_trans_commit
      		       - 4.46% __xfs_trans_commit
      			  - 4.33% xfs_log_commit_cil
      			     - 2.74% _raw_spin_lock
      				- do_raw_spin_lock
      				     __pv_queued_spin_lock_slowpath
      			       0.60% xfs_inode_item_format
      		      0.90% xfs_attr_try_sf_addname
      	- 1.99% selinux_inode_init_security
      	   - 1.02% security_sid_to_context_force
      	      - 1.00% security_sid_to_context_core
      		 - 0.92% sidtab_entry_to_string
      		    - 0.90% sidtab_sid2str_get
      			 0.59% sidtab_sid2str_put.part.0
      	   - 0.82% selinux_determine_inode_label
      	      - 0.77% security_transition_sid
      		   0.70% security_compute_sid.part.0
      
      And fsmark creation rate performance drops by ~25%. The key point to
      note here is that half the additional overhead comes from adding the
      attribute fork to the newly created inode. That's crazy, considering
      we can do this same thing at inode create time with a couple of
      lines of code and no extra overhead.
      
      So, if we know we are going to add an attribute immediately after
      creating the inode, let's just initialise the attribute fork inside
      the create transaction and chop that whole chunk of code out of
      the create fast path. This completely removes the performance
      drop caused by enabling SELinux, and the profile looks like:
      
           - 8.99% xfs_init_security
               - 9.00% security_inode_init_security
                  - 6.43% xfs_initxattrs
                     - 6.37% xfs_attr_set
                        - 5.45% xfs_attr_set_args
                           - 5.42% xfs_attr_set_shortform.constprop.0
                              - 4.51% xfs_trans_commit
                                 - 4.54% __xfs_trans_commit
                                    - 4.59% xfs_log_commit_cil
                                       - 2.67% _raw_spin_lock
                                          - 3.28% do_raw_spin_lock
                                               3.08% __pv_queued_spin_lock_slowpath
                                         0.66% xfs_inode_item_format
                              - 0.90% xfs_attr_try_sf_addname
                        - 0.60% xfs_trans_alloc
                  - 2.35% selinux_inode_init_security
                     - 1.25% security_sid_to_context_force
                        - 1.21% security_sid_to_context_core
                           - 1.19% sidtab_entry_to_string
                              - 1.20% sidtab_sid2str_get
                                 - 0.86% sidtab_sid2str_put.part.0
                                    - 0.62% _raw_spin_lock_irqsave
                                       - 0.77% do_raw_spin_lock
                                            __pv_queued_spin_lock_slowpath
                     - 0.84% selinux_determine_inode_label
                        - 0.83% security_transition_sid
                             0.86% security_compute_sid.part.0
      
      Which indicates the XFS overhead of creating the selinux xattr has
      been halved. This doesn't fix the CIL lock contention problem, just
      means it's not a limiting factor for this workload. Lock contention
      in the security subsystems is going to be an issue soon, though...
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [djwong: fix compilation error when CONFIG_SECURITY=n]
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NGao Xiang <hsiangkao@redhat.com>
      e6a688c3
    • D
      xfs: prevent metadata files from being inactivated · 383e32b0
      Darrick J. Wong 提交于
      Files containing metadata (quota records, rt bitmap and summary info)
      are fully managed by the filesystem, which means that all resource
      cleanup must be explicit, not automatic.  This means that they should
      never be subjected automatic to post-eof truncation, nor should they be
      freed automatically even if the link count drops to zero.
      
      In other words, xfs_inactive() should leave these files alone.  Add the
      necessary predicate functions to make this happen.  This adds a second
      layer of prevention for the kinds of fs corruption that was fixed by
      commit f4c32e87.  If we ever decide to support removing metadata
      files, we should make all those metadata updates explicit.
      
      Rearrange the order of #includes to fix compiler errors, since
      xfs_mount.h is supposed to be included before xfs_inode.h
      
      Followup-to: f4c32e87 ("xfs: fix realtime bitmap/summary file truncation when growing rt volume")
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      383e32b0
  3. 10 3月, 2021 1 次提交
  4. 04 2月, 2021 2 次提交
  5. 24 1月, 2021 1 次提交
  6. 23 1月, 2021 4 次提交
    • C
      xfs: fix up non-directory creation in SGID directories · 01ea173e
      Christoph Hellwig 提交于
      XFS always inherits the SGID bit if it is set on the parent inode, while
      the generic inode_init_owner does not do this in a few cases where it can
      create a possible security problem, see commit 0fa3ecd8
      ("Fix up non-directory creation in SGID directories") for details.
      
      Switch XFS to use the generic helper for the normal path to fix this,
      just keeping the simple field inheritance open coded for the case of the
      non-sgid case with the bsdgrpid mount option.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      01ea173e
    • C
      xfs: Check for extent overflow when renaming dir entries · 02092a2f
      Chandan Babu R 提交于
      A rename operation is essentially a directory entry remove operation
      from the perspective of parent directory (i.e. src_dp) of rename's
      source. Hence the only place where we check for extent count overflow
      for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real()
      returns -ENOSPC when it detects a possible extent count overflow and in
      response, the higher layers of directory handling code do the following:
      1. Data/Free blocks: XFS lets these blocks linger until a future remove
         operation removes them.
      2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf
         space and unmaps the last block.
      
      For target_dp, there are two cases depending on whether the destination
      directory entry exists or not.
      
      When destination directory entry does not exist (i.e. target_ip ==
      NULL), extent count overflow check is performed only when transaction
      has a non-zero sized space reservation associated with it.  With a
      zero-sized space reservation, XFS allows a rename operation to continue
      only when the directory has sufficient free space in its data/leaf/free
      space blocks to hold the new entry.
      
      When destination directory entry exists (i.e. target_ip != NULL), all
      we need to do is change the inode number associated with the already
      existing entry. Hence there is no need to perform an extent count
      overflow check.
      Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      02092a2f
    • C
      xfs: Check for extent overflow when adding dir entries · f5d92749
      Chandan Babu R 提交于
      Directory entry addition can cause the following,
      1. Data block can be added/removed.
         A new extent can cause extent count to increase by 1.
      2. Free disk block can be added/removed.
         Same behaviour as described above for Data block.
      3. Dabtree blocks.
         XFS_DA_NODE_MAXDEPTH blocks can be added. Each of these
         can be new extents. Hence extent count can increase by
         XFS_DA_NODE_MAXDEPTH.
      Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f5d92749
    • D
      xfs: fix an ABBA deadlock in xfs_rename · 6da1b4b1
      Darrick J. Wong 提交于
      When overlayfs is running on top of xfs and the user unlinks a file in
      the overlay, overlayfs will create a whiteout inode and ask xfs to
      "rename" the whiteout file atop the one being unlinked.  If the file
      being unlinked loses its one nlink, we then have to put the inode on the
      unlinked list.
      
      This requires us to grab the AGI buffer of the whiteout inode to take it
      off the unlinked list (which is where whiteouts are created) and to grab
      the AGI buffer of the file being deleted.  If the whiteout was created
      in a higher numbered AG than the file being deleted, we'll lock the AGIs
      in the wrong order and deadlock.
      
      Therefore, grab all the AGI locks we think we'll need ahead of time, and
      in order of increasing AG number per the locking rules.
      Reported-by: Nwenli xie <wlxie7296@gmail.com>
      Fixes: 93597ae8 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      6da1b4b1
  7. 13 12月, 2020 4 次提交
  8. 10 12月, 2020 2 次提交
  9. 22 9月, 2020 1 次提交
  10. 16 9月, 2020 4 次提交
  11. 07 9月, 2020 1 次提交
    • D
      xfs: xfs_iflock is no longer a completion · 718ecc50
      Dave Chinner 提交于
      With the recent rework of the inode cluster flushing, we no longer
      ever wait on the the inode flush "lock". It was never a lock in the
      first place, just a completion to allow callers to wait for inode IO
      to complete. We now never wait for flush completion as all inode
      flushing is non-blocking. Hence we can get rid of all the iflock
      infrastructure and instead just set and check a state flag.
      
      Rename the XFS_IFLOCK flag to XFS_IFLUSHING, convert all the
      xfs_iflock_nowait() test-and-set operations on that flag, and
      replace all the xfs_ifunlock() calls to clear operations.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      718ecc50
  12. 05 8月, 2020 1 次提交
  13. 14 7月, 2020 1 次提交
    • G
      xfs: get rid of unnecessary xfs_perag_{get,put} pairs · 92a00544
      Gao Xiang 提交于
      In the course of some operations, we look up the perag from
      the mount multiple times to get or change perag information.
      These are often very short pieces of code, so while the
      lookup cost is generally low, the cost of the lookup is far
      higher than the cost of the operation we are doing on the
      perag.
      
      Since we changed buffers to hold references to the perag
      they are cached in, many modification contexts already hold
      active references to the perag that are held across these
      operations. This is especially true for any operation that
      is serialised by an allocation group header buffer.
      
      In these cases, we can just use the buffer's reference to
      the perag to avoid needing to do lookups to access the
      perag. This means that many operations don't need to do
      perag lookups at all to access the perag because they've
      already looked up objects that own persistent references
      and hence can use that reference instead.
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Signed-off-by: NGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      92a00544