1. 31 10月, 2013 3 次提交
    • D
      xfs: prevent stack overflows from page cache allocation · ad22c7a0
      Dave Chinner 提交于
      Page cache allocation doesn't always go through ->begin_write and
      hence we don't always get the opportunity to set the allocation
      context to GFP_NOFS. Failing to do this means we open up the direct
      relcaim stack to recurse into the filesystem and consume a
      significant amount of stack.
      
      On RHEL6.4 kernels we are seeing ra_submit() and
      generic_file_splice_read() from an nfsd context recursing into the
      filesystem via the inode cache shrinker and evicting inodes. This is
      causing truncation to be run (e.g EOF block freeing) and causing
      bmap btree block merges and free space btree block splits to occur.
      These btree manipulations are occurring with the call chain already
      30 functions deep and hence there is not enough stack space to
      complete such operations.
      
      To avoid these specific overruns, we need to prevent the page cache
      allocation from recursing via direct reclaim. We can do that because
      the allocation functions take the allocation context from that which
      is stored in the mapping for the inode. We don't set that right now,
      so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
      GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
      when we initialise an inode, set the mapping gfp mask appropriately.
      
      This makes the use of AOP_FLAG_NOFS redundant from other parts of
      the XFS IO path, so get rid of it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ad22c7a0
    • D
      xfs: vectorise DA btree operations · 4bceb18f
      Dave Chinner 提交于
      The remaining non-vectorised code for the directory structure is the
      node format blocks. This is shared with the attribute tree, and so
      is slightly more complex to vectorise.
      
      Introduce a "non-directory" directory ops structure that is attached
      to all non-directory inodes so that attribute operations can be
      vectorised for all inodes.
      
      Once we do this, we can vectorise all the da btree operations.
      Because this patch adds more infrastructure than it removes the
      binary size does not decrease:
      
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
       792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
       789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
       789005   96802    1096  886903   d8997 fs/xfs/xfs.o.p4
       789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
       789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bceb18f
    • D
      xfs: abstract the differences in dir2/dir3 via an ops vector · 32c5483a
      Dave Chinner 提交于
      Lots of the dir code now goes through switches to determine what is
      the correct on-disk format to parse. It generally involves a
      "xfs_sbversion_hasfoo" check, deferencing the superblock version and
      feature fields and hence touching several cache lines per operation
      in the process. Some operations do multiple checks because they nest
      conditional operations and they don't pass the information in a
      direct fashion between each other.
      
      Hence, add an ops vector to the xfs_inode structure that is
      configured when the inode is initialised to point to all the correct
      decode and encoding operations.  This will significantly reduce the
      branchiness and cacheline footprint of the directory object decoding
      and encoding.
      
      This is the first patch in a series of conversion patches. It will
      introduce the ops structure, the setup of it and add the first
      operation to the vector. Subsequent patches will convert directory
      ops one at a time to keep the changes simple and obvious.
      
      Just this patch shows the benefit of such an approach on code size.
      Just converting the two shortform dir operations as this patch does
      decreases the built binary size by ~1500 bytes:
      
      $ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
      $
      
      That's a significant decrease in the instruction cache footprint of
      the directory code for such a simple change, and indicates that this
      approach is definitely worth pursuing further.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      32c5483a
  2. 24 10月, 2013 4 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  3. 22 10月, 2013 1 次提交
  4. 22 8月, 2013 1 次提交
    • D
      xfs: Add read-only support for dirent filetype field · 0cb97766
      Dave Chinner 提交于
      Add support for the file type field in directory entries so that
      readdir can return the type of the inode the dirent points to to
      userspace without first having to read the inode off disk.
      
      The encoding of the type field is a single byte that is added to the
      end of the directory entry name length. For all intents and
      purposes, it appends a "hidden" byte to the name field which
      contains the type information. As the directory entry is already of
      dynamic size, helpers are already required to access and decode the
      direct entry structures.
      
      Hence the relevent extraction and iteration helpers are updated to
      understand the hidden byte.  Helpers for reading and writing the
      filetype field from the directory entries are also added. Only the
      read helpers are used by this patch.  It also adds all the code
      necessary to read the type information out of the dirents on disk.
      
      Further we add the superblock feature bit and helpers to indicate
      that we understand the on-disk format change. This is not a
      compatible change - existing kernels cannot read the new format
      successfully - so an incompatible feature flag is added. We don't
      yet allow filesystems to mount with this flag yet - that will be
      added once write support is added.
      
      Finally, the code to take the type from the VFS, convert it to an
      XFS on-disk type and put it into the xfs_name structures passed
      around is added, but the directory code does not use this field yet.
      That will be in the next patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0cb97766
  5. 16 8月, 2013 1 次提交
  6. 13 8月, 2013 5 次提交
  7. 11 7月, 2013 1 次提交
    • C
      xfs: Add pquota fields where gquota is used. · 92f8ff73
      Chandra Seetharaman 提交于
      Add project quota changes to all the places where group quota field
      is used:
         * add separate project quota members into various structures
         * split project quota and group quotas so that instead of overriding
           the group quota members incore, the new project quota members are
           used instead
         * get rid of usage of the OQUOTA flag incore, in favor of separate
           group and project quota flags.
         * add a project dquot argument to various functions.
      
      Not using the pquotino field from superblock yet.
      Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      92f8ff73
  8. 10 7月, 2013 1 次提交
    • C
      xfs: fix sgid inheritance for subdirectories inheriting default acls [V3] · 42c49d7f
      Carlos Maiolino 提交于
      XFS removes sgid bits of subdirectories under a directory containing a default
      acl.
      
      When a default acl is set, it implies xfs to call xfs_setattr_nonsize() in its
      code path. Such function is shared among mkdir and chmod system calls, and
      does some checks unneeded by mkdir (calling inode_change_ok()). Such checks
      remove sgid bit from the inode after it has been granted.
      
      With this patch, we extend the meaning of XFS_ATTR_NOACL flag to avoid these
      checks when acls are being inherited (thanks hch).
      
      Also, xfs_setattr_mode, doesn't need to re-check for group id and capabilities
      permissions, this only implies in another try to remove sgid bit from the
      directories. Such check is already done either on inode_change_ok() or
      xfs_setattr_nonsize().
      
      Changelog:
      
      V2: Extends the meaning of XFS_ATTR_NOACL instead of wrap the tests into another
          function
      
      V3: Remove S_ISDIR check in xfs_setattr_nonsize() from the patch
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      42c49d7f
  9. 20 6月, 2013 1 次提交
  10. 31 5月, 2013 2 次提交
  11. 15 11月, 2012 1 次提交
    • D
      xfs: remove xfs_flush_pages · 4bc1ea6b
      Dave Chinner 提交于
      It is a complex wrapper around VFS functions, but there are VFS
      functions that provide exactly the same functionality. Call the VFS
      functions directly and remove the unnecessary indirection and
      complexity.
      
      We don't need to care about clearing the XFS_ITRUNCATED flag, as
      that is done during .writepages. Hence is cleared by the VFS
      writeback path if there is anything to write back during the flush.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAndrew Dahl <adahl@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4bc1ea6b
  12. 09 11月, 2012 1 次提交
    • B
      xfs: add EOFBLOCKS inode tagging/untagging · 27b52867
      Brian Foster 提交于
      Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with
      speculatively preallocated blocks beyond EOF. An inode is tagged
      when speculative preallocation occurs and untagged either via
      truncate down or when post-EOF blocks are freed via release or
      reclaim.
      
      The tag management is intentionally not aggressive to prefer
      simplicity over the complexity of handling all the corner cases
      under which post-EOF blocks could be freed (i.e., forward
      truncation, fallocate, write error conditions, etc.). This means
      that a tagged inode may or may not have post-EOF blocks after a
      period of time. The tag is eventually cleared when the inode is
      released or reclaimed.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      27b52867
  13. 22 7月, 2012 1 次提交
  14. 14 7月, 2012 2 次提交
  15. 15 5月, 2012 5 次提交
  16. 15 3月, 2012 1 次提交
  17. 14 3月, 2012 1 次提交
  18. 18 1月, 2012 1 次提交
    • C
      xfs: remove the i_size field in struct xfs_inode · ce7ae151
      Christoph Hellwig 提交于
      There is no fundamental need to keep an in-memory inode size copy in the XFS
      inode.  We already have the on-disk value in the dinode, and the separate
      in-memory copy that we need for regular files only in the XFS inode.
      
      Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the
      VFS inode i_size field for regular files.  Switch code that was directly
      accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where
      we are limited to regular files direct access of the VFS inode i_size field.
      
      This also allows dropping some fairly complicated code in the write path
      which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size
      that is getting updated inside ->write_end.
      
      Note that we do not bother resetting the VFS i_size when truncating a file
      that gets freed to zero as there is no point in doing so because the VFS inode
      is no longer in use at this point.  Just relax the assert in xfs_ifree to
      only check the on-disk size instead.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ce7ae151
  19. 14 1月, 2012 1 次提交
    • C
      xfs: remove xfs_itruncate_data · 673e8e59
      Christoph Hellwig 提交于
      This wrapper isn't overly useful, not to say rather confusing.
      
      Around the call to xfs_itruncate_extents it does:
      
       - add tracing
       - add a few asserts in debug builds
       - conditionally update the inode size in two places
       - log the inode
      
      Both the tracing and the inode logging can be moved to xfs_itruncate_extents
      as they are useful for the attribute fork as well - in fact the attr code
      already does an equivalent xfs_trans_log_inode call just after calling
      xfs_itruncate_extents.  The conditional size updates are a mess, and there
      was no reason to do them in two places anyway, as the first one was
      conditional on the inode having extents - but without extents we
      xfs_itruncate_extents would be a no-op and the placement wouldn't matter
      anyway.  Instead move the size assignments and the asserts that make sense
      to the callers that want it.
      
      As a side effect of this clean up xfs_setattr_size by introducing variables
      for the old and new inode size, and moving the size updates into a common
      place.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      673e8e59
  20. 04 1月, 2012 4 次提交
  21. 02 11月, 2011 1 次提交
  22. 12 10月, 2011 1 次提交
    • C
      xfs: simplify xfs_trans_ijoin* again · ddc3415a
      Christoph Hellwig 提交于
      There is no reason to keep a reference to the inode even if we unlock
      it during transaction commit because we never drop a reference between
      the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
      back into xfs_trans_ijoin - the third argument decides if an unlock
      is needed now.
      
      I'm actually starting to wonder if allowing inodes to be unlocked
      at transaction commit really is worth the effort.  The only real
      benefit is that they can be unlocked earlier when commiting a
      synchronous transactions, but that could be solved by doing the
      log force manually after the unlock, too.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      
      ddc3415a