  1. 04 Jun, 2015 1 commit
    • xfs: remove the flags argument to xfs_trans_cancel · 4906e215
      Christoph Hellwig committed
      xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
      XFS_TRANS_ABORT.  Both of them are a direct product of the transaction
      state, and can be deduced:
      
       - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
         and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
       - any transaction with a permanent log reservation needs
         XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
         XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
         log reservation is invalid.
      
      So just remove the flags argument and do the right thing.
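The derivation described above can be sketched in a few lines of C. This is a minimal model, not the kernel code: `struct demo_trans`, the `DEMO_TRANS_*` values and `demo_cancel_flags()` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two removed flags; values are illustrative. */
#define DEMO_TRANS_RELEASE_LOG_RES 0x1u
#define DEMO_TRANS_ABORT           0x2u

/* Hypothetical, heavily simplified transaction state. */
struct demo_trans {
    bool dirty;         /* the transaction has modified something */
    bool perm_log_res;  /* the transaction holds a permanent log reservation */
};

/* Compute what callers previously had to pass by hand. */
static unsigned int demo_cancel_flags(const struct demo_trans *tp)
{
    unsigned int flags = 0;

    if (tp->dirty)
        flags |= DEMO_TRANS_ABORT;
    if (tp->perm_log_res)
        flags |= DEMO_TRANS_RELEASE_LOG_RES;
    return flags;
}
```

Because both values follow from the state alone, a cancel routine can compute them internally and callers no longer need to get them right.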
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  2. 16 Apr, 2015 1 commit
  3. 13 Apr, 2015 1 commit
  4. 25 Mar, 2015 1 commit
    • xfs: add RENAME_WHITEOUT support · 7dcf5c3e
      Dave Chinner committed
      Whiteouts are used by overlayfs - it has a crazy convention that a
      whiteout is a character device inode with a major:minor of 0:0.
      Because it's not documented anywhere, here's an example of what
      RENAME_WHITEOUT does on ext4:
      
      # echo foo > /mnt/scratch/foo
      # echo bar > /mnt/scratch/bar
      # ls -l /mnt/scratch
      total 24
      -rw-r--r-- 1 root root     4 Feb 11 20:22 bar
      -rw-r--r-- 1 root root     4 Feb 11 20:22 foo
      drwx------ 2 root root 16384 Feb 11 20:18 lost+found
      # src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar
      # ls -l /mnt/scratch
      total 20
      -rw-r--r-- 1 root root     4 Feb 11 20:22 bar
      c--------- 1 root root  0, 0 Feb 11 20:23 foo
      drwx------ 2 root root 16384 Feb 11 20:18 lost+found
      # cat /mnt/scratch/bar
      foo
      #
      
      In XFS rename terms, the operation that has been done is that source
      (foo) has been moved to the target (bar), which is like a normal
      rename operation, but rather than the source being removed, it has
      been replaced with a whiteout.
      
      We can't allocate whiteout inodes within the rename transaction due
      to allocation being a multi-commit transaction: rename needs to
      be a single, atomic commit. Hence we have several options here, from
      most efficient to least efficient:
      
          - use DT_WHT in the target dirent and do no whiteout inode
            allocation.  The main issue with this approach is that we need
            hooks in lookup to create a virtual chardev inode to present
            to userspace and in places where we might need to modify the
            dirent e.g. unlink.  Overlayfs also needs to be taught about
            DT_WHT. Most invasive change, lowest overhead.
      
          - create a special whiteout inode in the root directory (e.g. a
            ".wino" dirent) and then hardlink every new whiteout to it.
            This means we only need to create a single whiteout inode, and
            rename simply creates a hardlink to it. We can use DT_WHT for
            these, though using DT_CHR means we won't have to modify
            overlayfs, nor anything in userspace. Downside is we have to
            look up the whiteout inode on every operation and create it if
            it doesn't exist.
      
          - copy ext4: create a special whiteout chardev inode for every
            whiteout.  This is more complex than the above options because
            of the lack of atomicity between inode creation and the rename
            operation, requiring us to create a tmpfile inode and then
            link it into the directory structure during the rename. At
            least with a tmpfile inode, a crash between the create and
            rename doesn't leave unreferenced inodes or directory
            pollution around.
      
      By far the simplest thing to do in the short term is to copy ext4.
      It is the most inefficient way of supporting whiteouts, but as
      an initial implementation we can simply reuse existing functions and
      add a small amount of extra code to the rename operation.
      
      When we get full whiteout support in the VFS (via the dentry cache)
      we can then look to supporting the DT_WHT method outlined as the first
      method of supporting whiteouts. But until then, we'll stick with
      what overlayfs expects us to be: dumb and stupid.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
  5. 23 Feb, 2015 4 commits
    • xfs: inodes are new until the dentry cache is set up · 58c90473
      Dave Chinner committed
      Al Viro noticed a generic set of issues to do with filehandle lookup
      racing with dentry cache setup. They involve a filehandle lookup
      occurring while an inode is being created and the filehandle lookup
      racing with the dentry creation for the real file. This can lead to
      multiple dentries for the one path being instantiated. There are a
      host of other issues around this same set of paths.
      
      The underlying cause is that file handle lookup only waits on inode
      cache instantiation rather than full dentry cache instantiation. XFS
      is mostly immune to the problems discovered due to its own internal
      inode cache, but there are a couple of corner cases where races can
      happen.
      
      We currently clear the XFS_INEW flag when the inode is fully set up
      after insertion into the cache. Newly allocated inodes are inserted
      locked and so aren't usable until the allocation transaction
      commits. This, however, occurs before the dentry and security
      information is fully initialised and hence the inode is unlocked and
      available for lookups to find too early.
      
      To solve the problem, only clear the XFS_INEW flag for newly created
      inodes once the dentry is fully instantiated. This means lookups
      will retry until the XFS_INEW flag is removed from the inode and
      hence avoids the race conditions in question.
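The ordering described in the paragraph above can be modelled in plain C. This is only a sketch under simplifying assumptions: `struct demo_new_inode`, `DEMO_INEW`, `demo_lookup()` and `demo_finish_setup()` are hypothetical names, and returning `-EAGAIN` stands in for "the lookup must retry".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_INEW 0x1u  /* hypothetical "inode is still new" flag */

/* Hypothetical, simplified inode state. */
struct demo_new_inode {
    unsigned int flags;
    bool dentry_ready;  /* dentry and security setup completed */
};

/* Lookup side: refuse inodes that are still marked new; the caller retries. */
static int demo_lookup(const struct demo_new_inode *ip)
{
    if (ip->flags & DEMO_INEW)
        return -EAGAIN;
    return 0;
}

/* Create side: drop the flag only after the dentry is fully instantiated. */
static void demo_finish_setup(struct demo_new_inode *ip)
{
    ip->dentry_ready = true;
    ip->flags &= ~DEMO_INEW;
}
```

The point of the model is that clearing the flag is the last step of creation, so a concurrent lookup can never observe an inode whose dentry is only half set up.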
      
      This also means that xfs_create(), xfs_create_tmpfile() and
      xfs_symlink() need to finish the setup of the inode in their error
      paths if we had allocated the inode but failed later in the creation
      process. xfs_symlink(), in particular, needed a lot of help to make
      its error handling match that of xfs_create().
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: ensure truncate forces zeroed blocks to disk · 5885ebda
      Dave Chinner committed
      A new fsync vs power fail test in xfstests indicated that XFS can
      have unreliable data consistency when doing extending truncates that
      require block zeroing. The blocks beyond EOF get zeroed in memory,
      but we never force those changes to disk before we run the
      transaction that extends the file size and exposes those blocks to
      userspace. This can result in the blocks not being correctly zeroed
      after a crash.
      
      Because in-memory behaviour is correct, tools like fsx don't pick up
      any coherency problems - it's not until the filesystem is shut down
      or the system crashes after writing the truncate transaction to the
      journal but before the zeroed data in the page cache is flushed that
      the issue is exposed.
      
      Fix this by also flushing the dirty data in the memory region between
      the old size and new size when we've found blocks that need zeroing
      in the truncate process.
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      cc: <stable@vger.kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: xfs_setattr_size no longer races with page faults · 0f9160b4
      Dave Chinner committed
      Now that truncate locks out new page faults, we no longer need to do
      special writeback hacks in truncate to work around potential races
      between page faults, page cache truncation and file size updates to
      ensure we get write page faults for extending truncates on sub-page
      block size filesystems. Hence we can remove the code in
      xfs_setattr_size() that handles this and update the comments around
      the code that handles page cache truncate and size updates to
      reflect the new reality.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: take i_mmap_lock on extent manipulation operations · e8e9ad42
      Dave Chinner committed
      Now that we have the i_mmap_lock being held across the page fault IO
      path, we add extent manipulation operation exclusion by adding
      the lock to the paths that directly modify extent maps. This
      includes truncate, hole punching and other fallocate based
      operations. The operations will now take both the i_iolock and the
      i_mmaplock in exclusive mode, thereby ensuring that all IO and page
      faults block without holding any page locks while the extent
      manipulation is in progress.
      
      This gives us the lock order during truncate of i_iolock ->
      i_mmaplock -> page_lock -> i_lock, hence providing the same
      lock order as the iolock provides in the normal IO path without
      involving the mmap_sem.
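One way to reason about the lock order stated above is to give each lock a rank and check that any acquisition sequence is strictly increasing. The sketch below is purely illustrative: the `demo_lock` enum and `demo_order_ok()` are hypothetical and are not how the kernel enforces ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ranks follow the order given in the commit message (outermost first). */
enum demo_lock {
    DEMO_IOLOCK,
    DEMO_MMAPLOCK,
    DEMO_PAGE_LOCK,
    DEMO_ILOCK
};

/* A sequence is valid if every lock ranks strictly after the previous one. */
static bool demo_order_ok(const enum demo_lock *seq, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (seq[i] <= seq[i - 1])
            return false;
    }
    return true;
}
```

A path may skip locks (a page fault in this model would start at `DEMO_MMAPLOCK`), but it may never take an earlier-ranked lock while holding a later one.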
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  6. 16 Feb, 2015 2 commits
  7. 24 Dec, 2014 2 commits
  8. 04 Dec, 2014 1 commit
  9. 28 Nov, 2014 3 commits
  10. 23 Sep, 2014 1 commit
    • xfs: flush entire last page of old EOF on truncate up · 2ebff7bb
      Dave Chinner committed
      On a sub-page sized filesystem, truncating a mapped region down
      leaves us in a world of hurt. We truncate the pagecache, zeroing the
      newly unused tail, then punch blocks out from under the page. If we
      then truncate the file back up immediately, we expose that unmapped
      hole to a dirty page mapped into the user application, and that's
      where it all goes wrong.
      
      In truncating the page cache, we avoid unmapping the tail page of
      the cache because it still contains valid data. The problem is that
      it also contains a hole after the truncate, but nobody told the mm
      subsystem that. Therefore, if the page is dirty before the truncate,
      we'll never get a .page_mkwrite callout after we extend the file and
      the application writes data into the hole on the page.  Hence when
      we come to writing that region of the page, it has no blocks and no
      delayed allocation reservation and hence we toss the data away.
      
      This patch adds code to the truncate up case to solve it, by
      ensuring the partial page at the old EOF is always cleaned after we
      do any zeroing and move the EOF upwards. We can't actually serialise
      the page writeback and truncate against page faults (yes, that
      problem AGAIN) so this is really just a best effort and assumes it
      is extremely unlikely that someone is concurrently writing to the
      page at the EOF while extending the file.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  11. 04 Aug, 2014 1 commit
  12. 25 Jun, 2014 1 commit
    • xfs: global error sign conversion · 2451337d
      Dave Chinner committed
      Convert all the errors in the core XFS code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
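The difference between the two conventions can be shown with a small sketch. All function names here are hypothetical; the only real identifier is `EINVAL` from `<errno.h>`.

```c
#include <assert.h>
#include <errno.h>

/* Old style (hypothetical core helper): positive errno on failure. */
static int demo_core_old(long len)
{
    return len < 0 ? EINVAL : 0;
}

/* Old style interface layer: had to flip the sign for the rest of the kernel. */
static int demo_iface_old(long len)
{
    return -demo_core_old(len);
}

/* New style: the core returns a negative errno directly... */
static int demo_core_new(long len)
{
    return len < 0 ? -EINVAL : 0;
}

/* ...so the interface layer simply passes the value through. */
static int demo_iface_new(long len)
{
    return demo_core_new(len);
}
```

Both interface functions return the same value to their callers; the conversion only removes the sign flip at the boundary and the risk of forgetting it.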
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  13. 22 Jun, 2014 1 commit
  14. 15 May, 2014 2 commits
  15. 07 May, 2014 1 commit
    • xfs: truncate_setsize should be outside transactions · 49abc3a8
      Dave Chinner committed
      truncate_setsize() removes pages from the page cache, and hence
      requires page locks to be held. It is not valid to lock a page cache
      page inside a transaction context as we can hold page locks when we
      reserve space for a transaction. If we do, then we expose an ABBA
      deadlock between log space reservation and page locks.
      
      That is, both the write path and writeback lock a page, then start a
      transaction for block allocation, which means they can block waiting
      for a log reservation with the page lock held. If we hold a log
      reservation and then do something that locks a page (e.g.
      truncate_setsize in xfs_setattr_size) then that page lock can block
      on the page locked and waiting for a log reservation. If the
      transaction that is waiting for the page lock is the only active
      transaction in the system that can free log space via a commit,
      then writeback will never make progress and so log space will never
      free up.
      
      This issue with xfs_setattr_size() was introduced back in 2010 by
      commit fa9b227e ("xfs: new truncate sequence") which moved the page
      cache truncate from outside the transaction context (what was
      xfs_itruncate_data()) to inside the transaction context as a call to
      truncate_setsize().
      
      The reason truncate_setsize() was located in this place was
      that we shouldn't change the file size until after we are in
      the transaction context and the operation will either succeed or
      shut down the filesystem on failure. However, block_truncate_page()
      already modifies the file contents before we enter the transaction
      context, so we can't really fulfill this guarantee in any way. Hence
      we may as well ensure that on success or failure, the in-memory
      inode and data is truncated away and that the application cleans up
      the mess appropriately.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  16. 06 May, 2014 1 commit
    • xfs: initialize default acls for ->tmpfile() · d540e43b
      Brian Foster committed
      The current tmpfile handler does not initialize default ACLs. Doing so
      within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
      which is already used as a common create handler.
      
      xfs_vn_mknod() does not currently have a mechanism to determine whether
      to link the file into the namespace. Therefore, further abstract
      xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
      parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
      on the dentry when called via ->tmpfile().
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  17. 17 Apr, 2014 1 commit
    • xfs: fix tmpfile/selinux deadlock and initialize security · 330033d6
      Brian Foster committed
      xfstests generic/004 reproduces an ilock deadlock using the tmpfile
      interface when selinux is enabled. This occurs because
      xfs_create_tmpfile() takes the ilock and then calls d_tmpfile(). The
      latter eventually calls into xfs_xattr_get() which attempts to get the
      lock again. E.g.:
      
      xfs_io          D ffffffff81c134c0  4096  3561   3560 0x00000080
      ffff8801176a1a68 0000000000000046 ffff8800b401b540 ffff8801176a1fd8
      00000000001d5800 00000000001d5800 ffff8800b401b540 ffff8800b401b540
      ffff8800b73a6bd0 fffffffeffffffff ffff8800b73a6bd8 ffff8800b5ddb480
      Call Trace:
      [<ffffffff8177f969>] schedule+0x29/0x70
      [<ffffffff81783a65>] rwsem_down_read_failed+0xc5/0x120
      [<ffffffffa05aa97f>] ? xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffff813b3434>] call_rwsem_down_read_failed+0x14/0x30
      [<ffffffff810ed179>] ? down_read_nested+0x89/0xa0
      [<ffffffffa05aa7f2>] ? xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa7f2>] xfs_ilock+0x122/0x250 [xfs]
      [<ffffffffa05aa97f>] xfs_ilock_attr_map_shared+0x1f/0x50 [xfs]
      [<ffffffffa05701d0>] xfs_attr_get+0x90/0xe0 [xfs]
      [<ffffffffa0565e07>] xfs_xattr_get+0x37/0x50 [xfs]
      [<ffffffff8124842f>] generic_getxattr+0x4f/0x70
      [<ffffffff8133fd9e>] inode_doinit_with_dentry+0x1ae/0x650
      [<ffffffff81340e0c>] selinux_d_instantiate+0x1c/0x20
      [<ffffffff813351bb>] security_d_instantiate+0x1b/0x30
      [<ffffffff81237db0>] d_instantiate+0x50/0x70
      [<ffffffff81237e85>] d_tmpfile+0xb5/0xc0
      [<ffffffffa05add02>] xfs_create_tmpfile+0x362/0x410 [xfs]
      [<ffffffffa0559ac8>] xfs_vn_tmpfile+0x18/0x20 [xfs]
      [<ffffffff81230388>] path_openat+0x228/0x6a0
      [<ffffffff810230f9>] ? sched_clock+0x9/0x10
      [<ffffffff8105a427>] ? kvm_clock_read+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8123101a>] do_filp_open+0x3a/0x90
      [<ffffffff817845e7>] ? _raw_spin_unlock+0x27/0x40
      [<ffffffff8124054f>] ? __alloc_fd+0xaf/0x1f0
      [<ffffffff8121e3ce>] do_sys_open+0x12e/0x210
      [<ffffffff8121e4ce>] SyS_open+0x1e/0x20
      [<ffffffff8178eda9>] system_call_fastpath+0x16/0x1b
      
      xfs_vn_tmpfile() also fails to initialize security on the newly created
      inode.
      
      Pull the d_tmpfile() call up into xfs_vn_tmpfile() after the transaction
      has been committed and the inode unlocked. Also, initialize security on
      the inode based on the parent directory provided via the tmpfile call.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  18. 27 Feb, 2014 1 commit
  19. 10 Feb, 2014 1 commit
  20. 26 Jan, 2014 1 commit
  21. 25 Jan, 2014 1 commit
  22. 07 Jan, 2014 1 commit
    • xfs: add O_TMPFILE support · 99b6436b
      Zhi Yong Wu committed
      Add two functions xfs_create_tmpfile() and xfs_vn_tmpfile()
      to support O_TMPFILE file creation.
      
      In contrast to xfs_create(), xfs_create_tmpfile() uses a different
      log reservation from regular file creation because there is no
      directory modification, and it doesn't check whether an entry can be
      added to the directory. The quota reservation is still required, and
      finally its inode is added to the unlinked list.
      
      xfs_vn_tmpfile() adds an O_TMPFILE method to the VFS interface and
      directly invokes xfs_create_tmpfile().
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  23. 17 Dec, 2013 1 commit
    • xfs: fix assertion failure at xfs_setattr_nonsize · 5c227278
      Jie Liu committed
      On a CRC-enabled v5 superblock, changing a file's ownership can
      trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
      project quota are enabled, i.e.:
      
      [  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
      [  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
      [  305.383939] Call Trace:
      [  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
      [  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
      [  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
      [  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
      [  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
      [  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
      [  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
      
      This fix adjusts the assertion to check whether the superblock
      supports both quota inodes.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit 5a01dd54)
  24. 10 Dec, 2013 1 commit
    • xfs: fix assertion failure at xfs_setattr_nonsize · 5a01dd54
      Jie Liu committed
      On a CRC-enabled v5 superblock, changing a file's ownership can
      trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
      project quota are enabled, i.e.:
      
      [  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
      [  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
      [  305.383939] Call Trace:
      [  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
      [  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
      [  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
      [  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
      [  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
      [  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
      [  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
      
      This fix adjusts the assertion to check whether the superblock
      supports both quota inodes.
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  25. 07 Dec, 2013 2 commits
  26. 31 Oct, 2013 3 commits
    • xfs: prevent stack overflows from page cache allocation · ad22c7a0
      Dave Chinner committed
      Page cache allocation doesn't always go through ->begin_write and
      hence we don't always get the opportunity to set the allocation
      context to GFP_NOFS. Failing to do this means we open up the direct
      reclaim stack to recurse into the filesystem and consume a
      significant amount of stack.
      
      On RHEL6.4 kernels we are seeing ra_submit() and
      generic_file_splice_read() from an nfsd context recursing into the
      filesystem via the inode cache shrinker and evicting inodes. This is
      causing truncation to be run (e.g. EOF block freeing) and causing
      bmap btree block merges and free space btree block splits to occur.
      These btree manipulations are occurring with the call chain already
      30 functions deep and hence there is not enough stack space to
      complete such operations.
      
      To avoid these specific overruns, we need to prevent the page cache
      allocation from recursing via direct reclaim. We can do that because
      the allocation functions take the allocation context from that which
      is stored in the mapping for the inode. We don't set that right now,
      so the default is GFP_HIGHUSER_MOVABLE, which is effectively a
      GFP_KERNEL context. We need it to be the equivalent of GFP_NOFS, so
      when we initialise an inode, set the mapping gfp mask appropriately.
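The idea of fixing the allocation context once, at inode initialisation, amounts to clearing one bit in a per-mapping mask. The sketch below uses hypothetical bit values and names (`DEMO_GFP_*`, `struct demo_mapping`); the real kernel GFP bits and structures differ.

```c
#include <assert.h>

/* Hypothetical bit values; these are not the kernel's GFP definitions. */
#define DEMO_GFP_IO   0x1u
#define DEMO_GFP_FS   0x2u  /* allocation may recurse into the filesystem */
#define DEMO_GFP_WAIT 0x4u
#define DEMO_GFP_KERNEL_LIKE (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Hypothetical stand-in for the per-inode page cache mapping. */
struct demo_mapping {
    unsigned int gfp_mask;
};

/* At inode init: keep the default context but forbid filesystem recursion. */
static void demo_init_mapping(struct demo_mapping *m)
{
    m->gfp_mask = DEMO_GFP_KERNEL_LIKE & ~DEMO_GFP_FS;
}

/* What a reclaim path would consult before calling back into the filesystem. */
static int demo_may_enter_fs(const struct demo_mapping *m)
{
    return (m->gfp_mask & DEMO_GFP_FS) != 0;
}
```

Because every page cache allocation for the inode takes its context from the mapping, setting the mask once covers paths that never pass through a per-call flag.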
      
      This makes the use of AOP_FLAG_NOFS redundant from other parts of
      the XFS IO path, so get rid of it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: vectorise DA btree operations · 4bceb18f
      Dave Chinner committed
      The remaining non-vectorised code for the directory structure is the
      node format blocks. This is shared with the attribute tree, and so
      is slightly more complex to vectorise.
      
      Introduce a "non-directory" directory ops structure that is attached
      to all non-directory inodes so that attribute operations can be
      vectorised for all inodes.
      
      Once we do this, we can vectorise all the da btree operations.
      Because this patch adds more infrastructure than it removes, the
      binary size does not decrease:
      
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
       792350   96802    1096  890248   d9588 fs/xfs/xfs.o.p2
       789293   96802    1096  887191   d8997 fs/xfs/xfs.o.p3
       789005   96802    1096  886903   d8877 fs/xfs/xfs.o.p4
       789061   96802    1096  886959   d88af fs/xfs/xfs.o.p5
       789733   96802    1096  887631   d8b4f fs/xfs/xfs.o.p6
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: abstract the differences in dir2/dir3 via an ops vector · 32c5483a
      Dave Chinner committed
      Lots of the dir code now goes through switches to determine what is
      the correct on-disk format to parse. It generally involves a
      "xfs_sbversion_hasfoo" check, dereferencing the superblock version and
      feature fields and hence touching several cache lines per operation
      in the process. Some operations do multiple checks because they nest
      conditional operations and they don't pass the information in a
      direct fashion between each other.
      
      Hence, add an ops vector to the xfs_inode structure that is
      configured when the inode is initialised to point to all the correct
      decode and encoding operations.  This will significantly reduce the
      branchiness and cacheline footprint of the directory object decoding
      and encoding.
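The ops vector approach can be sketched as a table of function pointers chosen once at inode setup. Everything below is a hypothetical model: the `demo_*` names and the header sizes are illustrative and are not taken from the XFS sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops vector with a single format-dependent operation. */
struct demo_dir_ops {
    size_t (*hdr_size)(void);
};

/* Illustrative header sizes for the two on-disk formats. */
static size_t demo_v2_hdr_size(void) { return 16; }
static size_t demo_v3_hdr_size(void) { return 64; }

static const struct demo_dir_ops demo_v2_ops = { .hdr_size = demo_v2_hdr_size };
static const struct demo_dir_ops demo_v3_ops = { .hdr_size = demo_v3_hdr_size };

/* Hypothetical, simplified inode carrying a pointer to its ops vector. */
struct demo_dir_inode {
    const struct demo_dir_ops *d_ops;
};

/* The format check happens once, when the inode is initialised. */
static void demo_inode_init(struct demo_dir_inode *ip, bool has_crc)
{
    ip->d_ops = has_crc ? &demo_v3_ops : &demo_v2_ops;
}

/* Hot path: no version branch, just an indirect call through the vector. */
static size_t demo_data_offset(const struct demo_dir_inode *ip)
{
    return ip->d_ops->hdr_size();
}
```

Callers such as `demo_data_offset()` never look at the superblock again, which is the source of the reduced branching and cache line footprint the message describes.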
      
      This is the first patch in a series of conversion patches. It will
      introduce the ops structure, the setup of it and add the first
      operation to the vector. Subsequent patches will convert directory
      ops one at a time to keep the changes simple and obvious.
      
      Just this patch shows the benefit of such an approach on code size.
      Just converting the two shortform dir operations as this patch does
      decreases the built binary size by ~1500 bytes:
      
      $ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1
         text    data     bss     dec     hex filename
       794490   96802    1096  892388   d9de4 fs/xfs/xfs.o.orig
       792986   96802    1096  890884   d9804 fs/xfs/xfs.o.p1
      $
      
      That's a significant decrease in the instruction cache footprint of
      the directory code for such a simple change, and indicates that this
      approach is definitely worth pursuing further.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  27. 24 Oct, 2013 3 commits
    • xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner committed
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in its definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      This enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: decouple log and transaction headers · 239880ef
      Dave Chinner committed
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner committed
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>