1. 13 Jul 2022, 1 commit
  2. 13 Apr 2022, 1 commit
    • xfs: Directory's data fork extent counter can never overflow · 83a21c18
      Committed by Chandan Babu R
      The worst case for the maximum file size representable by the data
      fork extent counter occurs when all extents are 1 block in length
      and each block is 1KB in size.
      
      With XFS_MAX_EXTCNT_DATA_FORK_SMALL representing the maximum extent
      count and with 1KB sized blocks, a file can reach up to:
      (2^31) * 1KB = 2TB
      
      This is much larger than the theoretical maximum size of a directory,
      i.e. XFS_DIR2_SPACE_SIZE * 3 = ~96GB.
      
      Since a directory's inode can never overflow its data fork extent
      counter, this commit removes all the overflow checks associated with
      it. xfs_dinode_verify() now performs a rough check that a directory's
      data fork does not exceed 96GB.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
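      The bounds quoted above are easy to sanity-check. The sketch below is
      illustrative userspace C, not kernel code; the constants restate the
      figures from the message (2^31 extents, 1KB worst-case blocks, 32GB per
      XFS_DIR2_SPACE_SIZE region) rather than being read from kernel headers.

          #include <stdio.h>
          #include <stdint.h>

          int main(void)
          {
                  uint64_t max_extents = 1ULL << 31;  /* XFS_MAX_EXTCNT_DATA_FORK_SMALL */
                  uint64_t min_blksz   = 1ULL << 10;  /* 1KB blocks: the worst case */
                  uint64_t dir2_space  = 32ULL << 30; /* XFS_DIR2_SPACE_SIZE = 32GB */

                  /* Worst-case data fork span: every extent is one 1KB block. */
                  printf("max data fork span: %lluTB\n",
                         (unsigned long long)((max_extents * min_blksz) >> 40));

                  /* A directory uses at most 3 such spaces (data, leaf, free). */
                  printf("max directory size: %lluGB\n",
                         (unsigned long long)((3 * dir2_space) >> 30));
                  return 0;
          }

      Running it prints 2TB and 96GB: the directory bound sits comfortably
      inside what the data fork extent counter can represent.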
  3. 22 Dec 2021, 1 commit
    • xfs: don't expose internal symlink metadata buffers to the vfs · 7b7820b8
      Committed by Darrick J. Wong
      Ian Kent reported that for inline symlinks, it's possible for
      vfs_readlink to hang on to the target buffer returned by
      xfs_vn_get_link_inline long after it's been freed by xfs inode reclaim.
      This is a layering violation -- we should never expose XFS internals to
      the VFS.
      
      When the symlink has a remote target, we allocate a separate buffer,
      copy the internal information, and let the VFS manage the new buffer's
      lifetime.  Let's adapt the inline code paths to do this too.  It's
      less efficient, but fixes the layering violation and avoids the need to
      adapt the if_data lifetime to rcu rules.  Clearly I don't care about
      readlink benchmarks.
      
      As a side note, this fixes the minor locking violation where we can
      access the inode data fork without taking any locks; proper locking (and
      eliminating the possibility of having to switch inode_operations on a
      live inode) is essential to online repair coordinating repairs
      correctly.
      Reported-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
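      The shape of the fix is the usual ->get_link() copy-out pattern: hand
      the VFS a heap copy whose lifetime it controls instead of a pointer
      into the inode fork. The sketch below is a hedged illustration of that
      pattern, not the verbatim patch; the function name is made up and the
      locking around the fork access is elided.

          /*
           * Hedged sketch: duplicate the inline target into a buffer the
           * VFS owns, and let set_delayed_call() arrange for the VFS to
           * free it once the path walk is done with the link.
           */
          static const char *
          example_get_link(struct dentry *dentry, struct inode *inode,
                           struct delayed_call *done)
          {
                  char *link;

                  if (!dentry)
                          return ERR_PTR(-ECHILD); /* no RCU walk in this sketch */

                  /* Copy under whatever lock protects the data fork. */
                  link = kmemdup(XFS_I(inode)->i_df.if_u1.if_data,
                                 inode->i_size + 1, GFP_KERNEL);
                  if (!link)
                          return ERR_PTR(-ENOMEM);

                  set_delayed_call(done, kfree_link, link); /* VFS frees the copy */
                  return link;
          }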
  4. 05 Dec 2021, 1 commit
  5. 20 Aug 2021, 3 commits
  6. 02 Jun 2021, 1 commit
  7. 16 Apr 2021, 2 commits
  8. 08 Apr 2021, 2 commits
  9. 26 Mar 2021, 1 commit
    • xfs: initialise attr fork on inode create · e6a688c3
      Committed by Dave Chinner
      When we allocate a new inode, we often need to add an attribute to
      the inode as part of the create. This can happen as a result of
      needing to add default ACLs or security labels before the inode is
      made visible to userspace.
      
      This is highly inefficient right now. We do the create transaction
      to allocate the inode, then we do an "add attr fork" transaction to
      modify the just created empty inode to set the inode fork offset to
      allow attributes to be stored, then we go and do the attribute
      creation.
      
      This means 3 transactions instead of 1 to allocate an inode, and
      this greatly increases the load on the CIL commit code, resulting in
      excessive contention on the CIL spin locks and performance
      degradation:
      
       18.99%  [kernel]                [k] __pv_queued_spin_lock_slowpath
        3.57%  [kernel]                [k] do_raw_spin_lock
        2.51%  [kernel]                [k] __raw_callee_save___pv_queued_spin_unlock
        2.48%  [kernel]                [k] memcpy
        2.34%  [kernel]                [k] xfs_log_commit_cil
      
      A typical profile from running fsmark on a selinux-enabled filesystem
      shows this overhead added to the create path:
      
        - 15.30% xfs_init_security
           - 15.23% security_inode_init_security
      	- 13.05% xfs_initxattrs
      	   - 12.94% xfs_attr_set
      	      - 6.75% xfs_bmap_add_attrfork
      		 - 5.51% xfs_trans_commit
      		    - 5.48% __xfs_trans_commit
      		       - 5.35% xfs_log_commit_cil
      			  - 3.86% _raw_spin_lock
      			     - do_raw_spin_lock
      				  __pv_queued_spin_lock_slowpath
      		 - 0.70% xfs_trans_alloc
      		      0.52% xfs_trans_reserve
      	      - 5.41% xfs_attr_set_args
      		 - 5.39% xfs_attr_set_shortform.constprop.0
      		    - 4.46% xfs_trans_commit
      		       - 4.46% __xfs_trans_commit
      			  - 4.33% xfs_log_commit_cil
      			     - 2.74% _raw_spin_lock
      				- do_raw_spin_lock
      				     __pv_queued_spin_lock_slowpath
      			       0.60% xfs_inode_item_format
      		      0.90% xfs_attr_try_sf_addname
      	- 1.99% selinux_inode_init_security
      	   - 1.02% security_sid_to_context_force
      	      - 1.00% security_sid_to_context_core
      		 - 0.92% sidtab_entry_to_string
      		    - 0.90% sidtab_sid2str_get
      			 0.59% sidtab_sid2str_put.part.0
      	   - 0.82% selinux_determine_inode_label
      	      - 0.77% security_transition_sid
      		   0.70% security_compute_sid.part.0
      
      And fsmark creation rate performance drops by ~25%. The key point to
      note here is that half the additional overhead comes from adding the
      attribute fork to the newly created inode. That's crazy, considering
      we can do this same thing at inode create time with a couple of
      lines of code and no extra overhead.
      
      So, if we know we are going to add an attribute immediately after
      creating the inode, let's just initialise the attribute fork inside
      the create transaction and chop that whole chunk of code out of
      the create fast path. This completely removes the performance
      drop caused by enabling SELinux, and the profile looks like:
      
           - 8.99% xfs_init_security
               - 9.00% security_inode_init_security
                  - 6.43% xfs_initxattrs
                     - 6.37% xfs_attr_set
                        - 5.45% xfs_attr_set_args
                           - 5.42% xfs_attr_set_shortform.constprop.0
                              - 4.51% xfs_trans_commit
                                 - 4.54% __xfs_trans_commit
                                    - 4.59% xfs_log_commit_cil
                                       - 2.67% _raw_spin_lock
                                          - 3.28% do_raw_spin_lock
                                               3.08% __pv_queued_spin_lock_slowpath
                                         0.66% xfs_inode_item_format
                              - 0.90% xfs_attr_try_sf_addname
                        - 0.60% xfs_trans_alloc
                  - 2.35% selinux_inode_init_security
                     - 1.25% security_sid_to_context_force
                        - 1.21% security_sid_to_context_core
                           - 1.19% sidtab_entry_to_string
                              - 1.20% sidtab_sid2str_get
                                 - 0.86% sidtab_sid2str_put.part.0
                                    - 0.62% _raw_spin_lock_irqsave
                                       - 0.77% do_raw_spin_lock
                                            __pv_queued_spin_lock_slowpath
                     - 0.84% selinux_determine_inode_label
                        - 0.83% security_transition_sid
                             0.86% security_compute_sid.part.0
      
      Which indicates the XFS overhead of creating the selinux xattr has
      been halved. This doesn't fix the CIL lock contention problem, just
      means it's not a limiting factor for this workload. Lock contention
      in the security subsystems is going to be an issue soon, though...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      [djwong: fix compilation error when CONFIG_SECURITY=n]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
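      The create-time half of the change is small enough to sketch. With a
      flag (here called init_xattrs; the field and helper names below are
      approximate, not the verbatim patch) threaded in from callers that
      know a security xattr or default ACL will follow, the inode allocation
      path can set up the attr fork itself:

          /*
           * Hedged sketch inside the inode allocation path: if the caller
           * will set an xattr immediately, carve out the attr fork in the
           * create transaction instead of paying for a separate
           * "add attr fork" transaction later.
           */
          if (init_xattrs && xfs_sb_version_hasattr(&mp->m_sb)) {
                  /* Place the fork boundary at the default attr offset... */
                  ip->i_d.di_forkoff = xfs_default_attroffset(ip) >> 3;
                  /* ...and attach an empty extent-format in-core attr fork. */
                  ip->i_afp = kmem_cache_zalloc(xfs_ifork_zone,
                                                GFP_NOFS | __GFP_NOFAIL);
                  ip->i_afp->if_format = XFS_DINODE_FMT_EXTENTS;
          }

      Those few lines in the create transaction replace the whole
      xfs_bmap_add_attrfork() transaction visible in the first profile.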
  10. 23 Mar 2021, 1 commit
    • fs: document and rename fsid helpers · a65e58e7
      Committed by Christian Brauner
      Vivek pointed out that the fs{g,u}id_into_mnt() naming scheme can be
      misleading as it could be understood as implying they do the exact same
      thing as i_{g,u}id_into_mnt(). The original motivation for this naming
      scheme was to signal to callers that the helpers will always take care
      to map the k{g,u}id such that the ownership is expressed in terms of the
      mnt_userns.
      Get rid of the confusion by renaming those helpers to something more
      sensible. Al suggested mapped_fs{g,u}id() which seems a really good fit.
      Usually filesystems don't need to bother with these helpers directly,
      only in some cases where they allocate objects that carry {g,u}ids
      which are either filesystem specific (e.g. xfs quota objects) or don't
      have a clean set of helpers as inodes do.
      
      Link: https://lore.kernel.org/r/20210320122623.599086-3-christian.brauner@ubuntu.com
      Inspired-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
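      For the rare case described above where a filesystem stashes an fsid
      in one of its own objects, usage of the renamed helper looks like the
      hedged sketch below; the struct and function are hypothetical, and
      only mapped_fsuid() itself comes from the patch.

          /* Hypothetical quota-like record owned by the filesystem. */
          struct example_quota_rec {
                  kuid_t  owner;
          };

          static void
          example_quota_set_owner(struct user_namespace *mnt_userns,
                                  struct example_quota_rec *rec)
          {
                  /*
                   * Express the caller's fsuid in the mount's idmapping
                   * before persisting it -- exactly the guarantee the
                   * mapped_fs{g,u}id() names are meant to signal.
                   */
                  rec->owner = mapped_fsuid(mnt_userns);
          }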
  11. 10 Mar 2021, 1 commit
  12. 04 Feb 2021, 2 commits
  13. 24 Jan 2021, 1 commit
  14. 23 Jan 2021, 2 commits
  15. 17 Dec 2020, 1 commit
  16. 20 May 2020, 2 commits
  17. 05 May 2020, 1 commit
  18. 31 Mar 2020, 1 commit
  19. 19 Mar 2020, 1 commit
  20. 03 Mar 2020, 1 commit
  21. 27 Jan 2020, 3 commits
  22. 08 Nov 2019, 1 commit
  23. 29 Jun 2019, 1 commit
  24. 13 Dec 2018, 1 commit
    • xfs: zero length symlinks are not valid · 43feeea8
      Committed by Dave Chinner
      A log recovery failure has been reproduced where a symlink inode has
      a zero length in extent form. It was caused by a shutdown during a
      combined fsstress+fsmark workload.
      
      The underlying problem is the issue in xfs_inactive_symlink(): the
      inode is unlocked between the symlink inactivation/truncation and
      the inode being freed. This opens a window for the inode to be
      written to disk before xfs_ifree() removes it from the unlinked
      list, marks it free in the inobt and zeros the mode.
      
      For shortform inodes, the fix is simple. xfs_ifree() clears the data
      fork state, so there's no need to do it in xfs_inactive_symlink().
      This means the shortform fork verifier will not see a zero length
      data fork (as it mirrors the inode size through to xfs_ifree()), and
      hence if the inode gets written back and the fork verifiers are run
      they will still see a fork that matches the on-disk inode size.
      
      For extent form (remote) symlinks, it is a little more tricky. Here
      we explicitly set the inode size to zero, so the above race can lead
      to zero length symlinks on disk. Because the inode is unlinked at
      this point (i.e. on the unlinked list) and unreferenced, it can
      never be seen again by a user. Hence when we set the inode size to
      zero, also change the type to S_IFREG. xfs_ifree() expects S_IFREG
      inodes to be of zero length, and so this avoids all the problems of
      zero length symlinks ever hitting the disk. It also avoids the
      problem of needing to handle zero length symlink inodes in log
      recovery to replay the extent free intents and the remaining
      deferops to free the extents the symlink used.
      
      Also add a couple of asserts to warn us if zero length symlinks end
      up in either the symlink create or inactivation paths.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
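      The remote-symlink half of the fix boils down to retyping the inode
      once its size has been zeroed; a hedged sketch of that step in
      xfs_inactive_symlink() (not the verbatim patch):

          /*
           * The inode is unlinked and unreferenced here, so nobody can see
           * it as a symlink again.  Retype it so that, if it races to disk
           * before xfs_ifree(), the verifiers see a zero-length S_IFREG
           * inode -- which xfs_ifree() expects -- instead of an invalid
           * zero-length symlink.
           */
          ip->i_d.di_size = 0;
          VFS_I(ip)->i_mode = (VFS_I(ip)->i_mode & ~S_IFMT) | S_IFREG;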
  25. 03 Aug 2018, 1 commit
  26. 27 Jul 2018, 2 commits
  27. 12 Jul 2018, 4 commits