1. 16 Apr 2021 (5 commits)
2. 10 Apr 2021 (1 commit)
3. 08 Apr 2021 (19 commits)
4. 26 Mar 2021 (8 commits)
    • xfs: add error injection for per-AG resv failure · 2b92faed
      Committed by Gao Xiang
      Per-AG reservation failure after fixing up freespace is hard to
      test effectively, so add an error injection path directly so that
      we can observe that this error handling path works as expected.
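
      For context: XFS error injection works by testing an errortag at
      the site of interest. A minimal sketch of how such a hook could
      look (the tag name and hook site here are assumptions, not quoted
      from the patch):

        /* Sketch only: arm-able failure point in the per-AG resv path. */
        if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_AG_RESV_FAIL)) {
                error = -ENOSPC;        /* simulate reservation failure */
                goto out;
        }
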
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: introduce xfs_ag_shrink_space() · 46141dc8
      Committed by Gao Xiang
      This patch introduces a helper to shrink unused space in the last AG
      by fixing up the freespace btree.
      
      Also make sure that the per-AG reservation works under the new AG
      size. If the per-AG reservation or extent allocation fails, roll
      the transaction so that the new transaction can be cancelled
      without any side effects.
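
      A sketch of the helper's shape based on this description (the
      exact signature and flow in the patch may differ):

        /*
         * Sketch: shrink the last AG by "delta" blocks.  Allocate the
         * tail of the AG out of the freespace btrees, re-establish the
         * per-AG reservation against the smaller size, and on failure
         * roll *tpp so the caller can cancel without side effects.
         */
        int
        xfs_ag_shrink_space(
                struct xfs_mount        *mp,
                struct xfs_trans        **tpp,
                xfs_agnumber_t          agno,
                xfs_extlen_t            delta);
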
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: reduce debug overhead of dir leaf/node checks · 1fea323f
      Committed by Dave Chinner
      On debug kernels, we call xfs_dir3_leaf_check_int() multiple times
      on every directory modification. The robust hash ordering checks
      it does on every entry in the leaf on every call result in massive
      CPU overhead, which slows down debug kernels by a large amount.
      
      We use xfs_dir3_leaf_check_int() for the verifiers as well, so we
      can't just gut the function to reduce overhead. What we can do,
      however, is reduce the work it does when it is called from the
      debug interfaces, keeping only the high-level checks in place and
      leaving the robust validation to the verifiers. This means the
      debug checks will catch gross errors, but subtle bugs might not be
      caught until a verifier is run.
      
      It is easy enough to restore the existing debug behaviour if a
      developer needs it (just change a call parameter in the debug
      code, as sketched below), but otherwise the overhead makes testing
      large directory block sizes on debug kernels very slow.
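
      One plausible form of that switch is a boolean on the shared check
      function; the parameter name and call sites below are illustrative,
      not quoted from the patch:

        /* Sketch: debug hooks skip the per-entry hash ordering scan. */
        xfs_failaddr_t
        xfs_dir3_leaf_check_int(
                struct xfs_mount                *mp,
                struct xfs_dir3_icleaf_hdr      *hdr,
                struct xfs_dir2_leaf            *leaf,
                bool                            expensive_checking);

        fa = xfs_dir3_leaf_check_int(mp, &hdr, leaf, false); /* debug */
        fa = xfs_dir3_leaf_check_int(mp, &hdr, leaf, true);  /* verifier */
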
      
      Profile at an unlink rate of ~80k files/s on a 64k block size
      filesystem before the patch:
      
        40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
        10.98%  [kernel]  [k] __xfs_dir3_data_check
         8.10%  [kernel]  [k] xfs_verify_dir_ino
         4.42%  [kernel]  [k] memcpy
         2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
         1.52%  [kernel]  [k] do_raw_spin_lock
      
      The profile after the patch, at an unlink rate of ~125k files/s
      (a +50% improvement), shows that the leaf verification debug
      overhead has largely dropped out:
      
        16.53%  [kernel]  [k] __xfs_dir3_data_check
        12.53%  [kernel]  [k] xfs_verify_dir_ino
         7.97%  [kernel]  [k] memcpy
         3.36%  [kernel]  [k] xfs_dir2_data_get_ftype
         2.86%  [kernel]  [k] __pv_queued_spin_lock_slowpath
      
      Create shows a similar change in profile and a +25% improvement in
      performance.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: No need for inode number error injection in __xfs_dir3_data_check · 39d3c0b5
      Committed by Dave Chinner
      We call xfs_dir_ino_validate() for every dir entry in a directory
      when doing validity checking of the directory. It calls
      xfs_verify_dir_ino(), then emits a corruption report if the inode
      number is bad, or runs error injection if it is good. This is
      extremely costly:
      
        43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
        10.28%  [kernel]  [k] __xfs_dir3_data_check
         6.61%  [kernel]  [k] xfs_verify_dir_ino
         4.16%  [kernel]  [k] xfs_errortag_test
         4.00%  [kernel]  [k] memcpy
         3.48%  [kernel]  [k] xfs_dir_ino_validate
      
      7% of the CPU usage in this directory traversal workload is
      xfs_dir_ino_validate() doing absolutely nothing.
      
      We don't need error injection to simulate bad inode numbers in the
      directory structure, because we can do that by fuzzing the
      structure on disk.
      
      And we don't need a corruption report, because
      __xfs_dir3_data_check() will emit one if the inode number is bad.
      
      So just call xfs_verify_dir_ino() directly here, and get rid of all
      this unnecessary overhead:
      
        40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
        10.98%  [kernel]  [k] __xfs_dir3_data_check
         8.10%  [kernel]  [k] xfs_verify_dir_ino
         4.42%  [kernel]  [k] memcpy
         2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
         1.52%  [kernel]  [k] do_raw_spin_lock
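
      As a hedged sketch, the per-entry substitution in
      __xfs_dir3_data_check() amounts to:

        /* Before (sketch): validate + corruption report + injection */
        if (xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)))
                return __this_address;

        /* After (sketch): plain range check; the caller reports bad */
        if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
                return __this_address;
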
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: type verification is expensive · ec08c14b
      Committed by Dave Chinner
      From a concurrent rm -rf workload:
      
        41.04%  [kernel]  [k] xfs_dir3_leaf_check_int
         9.85%  [kernel]  [k] __xfs_dir3_data_check
         5.60%  [kernel]  [k] xfs_verify_ino
         5.32%  [kernel]  [k] xfs_agino_range
         4.21%  [kernel]  [k] memcpy
         3.06%  [kernel]  [k] xfs_errortag_test
         2.57%  [kernel]  [k] xfs_dir_ino_validate
         1.66%  [kernel]  [k] xfs_dir2_data_get_ftype
         1.17%  [kernel]  [k] do_raw_spin_lock
         1.11%  [kernel]  [k] xfs_verify_dir_ino
         0.84%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
         0.83%  [kernel]  [k] xfs_buf_find
         0.64%  [kernel]  [k] xfs_log_commit_cil
      
      There's an awful lot of overhead in just range checking inode
      numbers there, even though each individual check is not a lot of
      code. In total, a bit over 14.5% of the CPU time is spent
      validating inode numbers.
      
      The problem is that these are deeply nested global-scope functions,
      so the overhead here is all in function call marshalling.
      
         text    data     bss     dec     hex  filename
         2077       0       0    2077     81d  fs/xfs/libxfs/xfs_types.o.orig
         2197       0       0    2197     895  fs/xfs/libxfs/xfs_types.o
      
      There's a small increase in binary size by inlining all the local
      nested calls in the verifier functions, but the same workload now
      profiles as:
      
        40.69%  [kernel]  [k] xfs_dir3_leaf_check_int
        10.52%  [kernel]  [k] __xfs_dir3_data_check
         6.68%  [kernel]  [k] xfs_verify_dir_ino
         4.22%  [kernel]  [k] xfs_errortag_test
         4.15%  [kernel]  [k] memcpy
         3.53%  [kernel]  [k] xfs_dir_ino_validate
         1.87%  [kernel]  [k] xfs_dir2_data_get_ftype
         1.37%  [kernel]  [k] do_raw_spin_lock
         0.98%  [kernel]  [k] xfs_buf_find
         0.94%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
         0.73%  [kernel]  [k] xfs_log_commit_cil
      
      Now we only spend just over 10% of the time validating inode
      numbers for the same workload. Hence a few "inline" keywords are
      enough to reduce the validation overhead by 30%...
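
      The kind of change involved, sketched on one of the helpers (the
      body shown is an assumption about its rough shape, not the patch):

        /* Sketch: "inline" lets the compiler flatten the range checks. */
        inline bool
        xfs_verify_agino(
                struct xfs_mount        *mp,
                xfs_agnumber_t          agno,
                xfs_agino_t             agino)
        {
                xfs_agino_t             first, last;

                xfs_agino_range(mp, agno, &first, &last);
                return agino >= first && agino <= last;
        }
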
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: initialise attr fork on inode create · e6a688c3
      Committed by Dave Chinner
      When we allocate a new inode, we often need to add an attribute to
      the inode as part of the create. This can happen as a result of
      needing to add default ACLs or security labels before the inode is
      made visible to userspace.
      
      This is highly inefficient right now. We do the create transaction
      to allocate the inode, then we do an "add attr fork" transaction
      to modify the just-created empty inode to set the inode fork
      offset so that attributes can be stored, and then we go and do the
      attribute creation.
      
      This means 3 transactions instead of 1 to allocate an inode, and
      this greatly increases the load on the CIL commit code, resulting in
      excessive contention on the CIL spin locks and performance
      degradation:
      
       18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
        3.57%  [kernel]                [k] do_raw_spin_lock
        2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
        2.48%  [kernel]                [k] memcpy
        2.34%  [kernel]                [k] xfs_log_commit_cil
      
      The typical profile resulting from running fsmark on a
      selinux-enabled filesystem shows this overhead added to the create
      path:
      
        - 15.30% xfs_init_security
           - 15.23% security_inode_init_security
              - 13.05% xfs_initxattrs
                 - 12.94% xfs_attr_set
                    - 6.75% xfs_bmap_add_attrfork
                       - 5.51% xfs_trans_commit
                          - 5.48% __xfs_trans_commit
                             - 5.35% xfs_log_commit_cil
                                - 3.86% _raw_spin_lock
                                   - do_raw_spin_lock
                                        __pv_queued_spin_lock_slowpath
                       - 0.70% xfs_trans_alloc
                            0.52% xfs_trans_reserve
                    - 5.41% xfs_attr_set_args
                       - 5.39% xfs_attr_set_shortform.constprop.0
                          - 4.46% xfs_trans_commit
                             - 4.46% __xfs_trans_commit
                                - 4.33% xfs_log_commit_cil
                                   - 2.74% _raw_spin_lock
                                      - do_raw_spin_lock
                                           __pv_queued_spin_lock_slowpath
                                     0.60% xfs_inode_item_format
                            0.90% xfs_attr_try_sf_addname
              - 1.99% selinux_inode_init_security
                 - 1.02% security_sid_to_context_force
                    - 1.00% security_sid_to_context_core
                       - 0.92% sidtab_entry_to_string
                          - 0.90% sidtab_sid2str_get
                               0.59% sidtab_sid2str_put.part.0
                 - 0.82% selinux_determine_inode_label
                    - 0.77% security_transition_sid
                         0.70% security_compute_sid.part.0
      
      And fsmark creation rate performance drops by ~25%. The key point to
      note here is that half the additional overhead comes from adding the
      attribute fork to the newly created inode. That's crazy, considering
      we can do this same thing at inode create time with a couple of
      lines of code and no extra overhead.
      
      So, if we know we are going to add an attribute immediately after
      creating the inode, let's just initialise the attribute fork inside
      the create transaction and chop that whole chunk of code out of
      the create fast path. This completely removes the performance
      drop caused by enabling SELinux, and the profile looks like:
      
           - 8.99% xfs_init_security
               - 9.00% security_inode_init_security
                  - 6.43% xfs_initxattrs
                     - 6.37% xfs_attr_set
                        - 5.45% xfs_attr_set_args
                           - 5.42% xfs_attr_set_shortform.constprop.0
                              - 4.51% xfs_trans_commit
                                 - 4.54% __xfs_trans_commit
                                    - 4.59% xfs_log_commit_cil
                                       - 2.67% _raw_spin_lock
                                          - 3.28% do_raw_spin_lock
                                               3.08% __pv_queued_spin_lock_slowpath
                                         0.66% xfs_inode_item_format
                              - 0.90% xfs_attr_try_sf_addname
                        - 0.60% xfs_trans_alloc
                  - 2.35% selinux_inode_init_security
                     - 1.25% security_sid_to_context_force
                        - 1.21% security_sid_to_context_core
                           - 1.19% sidtab_entry_to_string
                              - 1.20% sidtab_sid2str_get
                                 - 0.86% sidtab_sid2str_put.part.0
                                    - 0.62% _raw_spin_lock_irqsave
                                       - 0.77% do_raw_spin_lock
                                            __pv_queued_spin_lock_slowpath
                     - 0.84% selinux_determine_inode_label
                        - 0.83% security_transition_sid
                             0.86% security_compute_sid.part.0
      
      This indicates that the XFS overhead of creating the selinux xattr
      has been halved. It doesn't fix the CIL lock contention problem,
      it just means that contention is no longer a limiting factor for
      this workload. Lock contention in the security subsystems is going
      to be an issue soon, though...
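
      Conceptually, the create-time initialisation is just a couple of
      lines inside inode allocation; a sketch under assumed names (an
      "init_xattrs" flag plumbed in from the create path):

        /* Sketch: set up the attr fork during create rather than via a
         * separate "add attr fork" transaction afterwards. */
        if (init_xattrs && xfs_sb_version_hasattr(&mp->m_sb)) {
                ip->i_d.di_forkoff = xfs_default_attroffset(ip) >> 3;
                ip->i_afp = kmem_cache_zalloc(xfs_ifork_zone,
                                GFP_KERNEL | __GFP_NOFAIL);
                ip->i_afp->if_format = XFS_DINODE_FMT_EXTENTS;
        }
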
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      [djwong: fix compilation error when CONFIG_SECURITY=n]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
    • xfs: prevent metadata files from being inactivated · 383e32b0
      Committed by Darrick J. Wong
      Files containing metadata (quota records, rt bitmap and summary info)
      are fully managed by the filesystem, which means that all resource
      cleanup must be explicit, not automatic.  This means that they should
      never be subjected automatic to post-eof truncation, nor should they be
      freed automatically even if the link count drops to zero.
      
      In other words, xfs_inactive() should leave these files alone.  Add the
      necessary predicate functions to make this happen.  This adds a second
      layer of prevention for the kinds of fs corruption that was fixed by
      commit f4c32e87.  If we ever decide to support removing metadata
      files, we should make all those metadata updates explicit.
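
      A sketch of such a predicate, assuming the rt bitmap/summary and
      quota inode pointers of this era; xfs_inactive() would then bail
      out early whenever it returns true:

        /* Sketch: is this one of the fs-managed metadata inodes? */
        static inline bool
        xfs_is_metadata_inode(struct xfs_inode *ip)
        {
                struct xfs_mount        *mp = ip->i_mount;

                return ip == mp->m_rbmip || ip == mp->m_rsumip ||
                       xfs_is_quota_inode(&mp->m_sb, ip->i_ino);
        }
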
      
      Rearrange the order of #includes to fix compiler errors, since
      xfs_mount.h is supposed to be included before xfs_inode.h.
      
      Followup-to: f4c32e87 ("xfs: fix realtime bitmap/summary file truncation when growing rt volume")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: validate ag btree levels using the precomputed values · 973975b7
      Committed by Darrick J. Wong
      Use the AG btree height limits that we precomputed into the xfs_mount to
      validate the AG headers instead of using XFS_BTREE_MAXLEVELS.
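
      A sketch of the kind of check in the AGF verifier, with the field
      names assumed from this era of the codebase:

        /* Sketch: bound btree heights by the precomputed per-fs limit
         * rather than the global XFS_BTREE_MAXLEVELS constant. */
        if (be32_to_cpu(agf->agf_levels[XFS_BTNUM_BNOi]) > mp->m_ag_maxlevels ||
            be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]) > mp->m_ag_maxlevels)
                return __this_address;
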
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
5. 26 Feb 2021 (1 commit)
6. 12 Feb 2021 (1 commit)
    • xfs: consider shutdown in bmapbt cursor delete assert · 1cd738b1
      Committed by Brian Foster
      The assert in xfs_btree_del_cursor() checks that the bmapbt block
      allocation field has been handled correctly before the cursor is
      freed. This field is used for accurate calculation of indirect block
      reservation requirements (for delayed allocations), for example.
      generic/019 reproduces a scenario where this assert fails because
      the filesystem has shut down in the middle of a bmbt record
      insertion. This occurs after a bmbt block has been allocated via the
      cursor but before the higher level bmap function (i.e.
      xfs_bmap_add_extent_hole_real()) completes and resets the field.
      
      Update the assert to accommodate this transient state if the
      filesystem has shut down. While here, clean up the indentation and
      comments in the function.
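
      The updated assertion plausibly takes this shape (a sketch; field
      names assumed from this era of the codebase):

        /* Sketch: tolerate leftover bmbt allocations after a shutdown. */
        ASSERT(cur->bc_btnum != XFS_BTNUM_BMAP ||
               cur->bc_ino.allocated == 0 ||
               XFS_FORCED_SHUTDOWN(cur->bc_mp));
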
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
7. 04 Feb 2021 (5 commits)