Commits · 4acdaf27ebe2034c342f3be57ef49aed1ad885ef · openeuler / raspberrypi-kernel

04 Jan, 2012 5 commits

Committed by Al Viro on Jul 26, 2011

vfs_create() ignores everything outside of the 16-bit subset of its
mode argument; switching it to umode_t is obviously equivalent,
and it's the only caller of the method.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4acdaf27
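
A minimal userspace sketch of why the type switch is equivalent, assuming the callee masks the mode with the permission bits and ORs in the regular-file type bit before using it. The `sketch_`/`SKETCH_` names and the constant values are stand-ins introduced here for illustration; this is not the kernel code.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel definitions (assumed values). */
typedef unsigned short sketch_umode_t;	/* 16-bit mode type */
#define SKETCH_S_IALLUGO 07777		/* permission, setuid, setgid, sticky bits */
#define SKETCH_S_IFREG   0100000	/* regular-file type bit */

/* Mode as seen by the callee when the argument is a plain int. */
static unsigned int mode_via_int(int mode)
{
	return ((unsigned int)mode & SKETCH_S_IALLUGO) | SKETCH_S_IFREG;
}

/* Mode as seen by the callee when the argument is already 16 bits wide. */
static unsigned int mode_via_umode(sketch_umode_t mode)
{
	return (mode & SKETCH_S_IALLUGO) | SKETCH_S_IFREG;
}
```

Under these assumptions, `mode_via_int(0x12340644)` and `mode_via_umode((sketch_umode_t)0x12340644)` produce the same value, because the bits above the low 16 never survive the mask.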

switch vfs_mkdir() and ->mkdir() to umode_t · 18bb1db3

Committed by Al Viro on Jul 26, 2011

vfs_mkdir() gets an int, but immediately drops everything that might not
fit into umode_t, and that's the only caller of ->mkdir()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18bb1db3

vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05

Committed by Al Viro on Dec 12, 2011

Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever in keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6b520e05

vfs: mnt_drop_write_file() · 2a79f17e

Committed by Al Viro on Dec 09, 2011

New helper (a wrapper around mnt_drop_write()) to be used in pair with
mnt_want_write_file().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2a79f17e

switch a bunch of places to mnt_want_write_file() · a561be71

Committed by Al Viro on Nov 23, 2011

It's both faster (in the case when the file has been opened for write) and cleaner.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a561be71

24 Dec, 2011 2 commits

xfs: log all dirty inodes in xfs_fs_sync_fs · be4f1ac8

Committed by Christoph Hellwig on Dec 20, 2011

Since Linux 2.6.36 the writeback code has introduced various measures for
livelock prevention during sync().  Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.

The older_than_this checks are now more strictly enforced since

    writeback: avoid livelocking WB_SYNC_ALL writeback

by only calling into __writeback_inodes_sb and thus only sampling the
current cut-off time once.  But on slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut-off time for
the blocking pass already happened.  I have not reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.

Fix this by iterating over all XFS inodes in ->sync_fs and logging all that
are dirty.  This might log inodes that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is, it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

be4f1ac8

xfs: log the inode in ->write_inode calls for kupdate · 0b8fd303

Committed by Christoph Hellwig on Dec 18, 2011

If the writeback code writes back an inode because it has expired we currently
use the non-blocking ->write_inode path.  This means any inode that is pinned
is skipped.  With delayed logging and a workload that has very little log
traffic otherwise it is very likely that an inode that gets constantly
written to is always pinned, and thus we keep refusing to write it.  The VM
writeback code at that point redirties it and doesn't try to write it again
for another 30 seconds.  This means under certain scenarios time-based
metadata writeback never happens.

Fix this by calling into xfs_log_inode for kupdate in addition to data
integrity syncs, and thus transfer the inode to the log ASAP.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

0b8fd303

07 Dec, 2011 2 commits

xfs: fix the logspace waiting algorithm · 9f9c19ec

Committed by Christoph Hellwig on Nov 28, 2011

Apply the scheme used in log_regrant_write_log_space to wake up any other
threads waiting for log space before the newly added one to
log_regrant_write_log_space as well, and factor the code into readable
helpers.  For each of the queues we add two helpers:

 - one to try to wake up all waiting threads.  This helper will also be
   usable by xfs_log_move_tail once we remove the current opportunistic
   wakeups in it.
 - one to sleep on t_wait until enough log space is available, loosely
   modelled after Linux waitqueues.

And use them to reimplement the guts of log_regrant_write_log_space and
log_regrant_write_log_space.  These two functions now use one and the same
algorithm for waiting on log space instead of the subtly different ones before,
with an option to completely unify them in the near future.

Also move the filesystem shutdown handling to the common caller given
that we had to touch it anyway.

Based on hard debugging and an earlier patch from
Chandra Seetharaman <sekharan@us.ibm.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com>
Tested-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

9f9c19ec

xfs: fix nfs export of 64-bit inode numbers on 32-bit kernels · c29f7d45

Committed by Christoph Hellwig on Nov 30, 2011

The i_ino field in the VFS inode is of type unsigned long and thus can't
hold the full 64-bit inode number on 32-bit kernels.  We have the full
inode number in the XFS inode, so use that one for nfs exports.  Note
that I've also switched the 32-bit file handle types to it, just to make
the code more consistent and copy & paste errors less likely to happen.
Reported-by: Guoquan Yang <ygq51@hotmail.com>
Reported-by: Hank Peng <pengxihan@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

c29f7d45
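
The truncation described above can be illustrated with a small userspace sketch. It models the 32-bit `unsigned long` with `uint32_t` and keeps the full number in a separate field; all `sketch_` names and both structures are hypothetical stand-ins, not the XFS or VFS definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a VFS inode on a 32-bit kernel: i_ino is 32 bits. */
struct sketch_vfs_inode {
	uint32_t i_ino;
};

/* Hypothetical model of the filesystem inode, which keeps the full number. */
struct sketch_fs_inode {
	struct sketch_vfs_inode vfs;
	uint64_t full_ino;
};

static void sketch_set_ino(struct sketch_fs_inode *ip, uint64_t ino)
{
	ip->full_ino = ino;
	ip->vfs.i_ino = (uint32_t)ino;	/* high 32 bits are lost here */
}

/* Build the inode number for a file handle from the untruncated field. */
static uint64_t sketch_fh_ino(const struct sketch_fs_inode *ip)
{
	return ip->full_ino;
}
```

With an inode number such as `0x100000080`, the narrow field ends up holding only `0x80`, while `sketch_fh_ino()` still returns the full value.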

03 Dec, 2011 1 commit

xfs: fix allocation length overflow in xfs_bmapi_write() · a99ebf43

Committed by Dave Chinner on Dec 01, 2011

When testing the new xfstests --large-fs option that does very large
file preallocations, this assert was tripped deep in
xfs_alloc_vextent():

XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239

The allocation was trying to allocate a zero length extent because
the lower 32 bits of the allocation length were zero. The remaining
length of the allocation to be done was an exact multiple of 2^32 -
the first case I saw was at 496TB remaining to be allocated.

This turns out to be an overflow when converting the allocation
length (a 64 bit quantity) into the extent length to allocate (a 32
bit quantity), and it requires the length to be allocated to be an exact
multiple of 2^32 blocks to trip the assert.

Fix it by limiting the extent length to allocate to MAXEXTLEN.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

a99ebf43
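
A hedged sketch of the overflow and of the clamping fix. Assuming 4 KiB filesystem blocks (an assumption, the message does not state the block size), 496 TiB is 31 × 2^32 blocks, so its low 32 bits are all zero. `SKETCH_MAXEXTLEN` mirrors a 21-bit extent length limit and, like both function names, is introduced here for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: extent lengths limited to 21 bits. */
#define SKETCH_MAXEXTLEN ((uint32_t)0x001fffff)

/* Buggy conversion: plain truncation of a 64-bit block count. */
static uint32_t extlen_truncating(uint64_t len)
{
	return (uint32_t)len;
}

/* Fixed conversion: clamp before narrowing, so a large request never becomes zero. */
static uint32_t extlen_clamped(uint64_t len)
{
	return len > SKETCH_MAXEXTLEN ? SKETCH_MAXEXTLEN : (uint32_t)len;
}
```

For a remaining length of `31ULL << 32` blocks, the truncating version yields 0 (the zero-length extent that tripped the assert), while the clamped version yields the maximum extent length and leaves the rest for later iterations.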

30 Nov, 2011 2 commits

xfs: fix attr2 vs large data fork assert · 4c393a60

Committed by Christoph Hellwig on Nov 19, 2011

With Dmitry's fsstress updates I've seen very reproducible crashes in
xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
the attributes would not fit inline into the inode after removing an
attribute.  It turns out that we were operating on an inode with lots
of delalloc extents, and thus an if_bytes value for the data fork that
is larger than the biggest possible on-disk storage for it, which utterly
confuses the code near the end of xfs_attr_shortform_bytesfit.

Fix this by always allowing the current attribute fork, like we already
do for the attr1 format, given that delalloc conversion will take care
of moving either the data or attribute area out of line if it doesn't
fit at that point - or making the point moot by merging extents at this
point.

Also document the function better, and clean up some loose bits.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

4c393a60

xfs: force buffer writeback before blocking on the ilock in inode reclaim · 4dd2cb4a

Committed by Christoph Hellwig on Nov 29, 2011

If we are doing synchronous inode reclaim we block the VM from making
progress in memory reclaim.  So if we encounter a flush locked inode,
promote it in the delwri list and wake up xfsbufd to write it out now.
Without this we can get hangs of up to 30 seconds during workloads hitting
synchronous inode reclaim.

The scheme is copied from what we do for dquot reclaims.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Ben Myers <bpm@sgi.com>

4dd2cb4a

29 Nov, 2011 1 commit

xfs: validate acl count · fa8b18ed

Committed by Christoph Hellwig on Nov 20, 2011

This prevents in-memory corruption and possible panics if the on-disk
ACL is badly corrupted.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

fa8b18ed

16 Nov, 2011 1 commit

xfs: use doalloc flag in xfs_qm_dqattach_one() · db3e74b5

Committed by Mitsuo Hayasaka on Nov 10, 2011

The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
whether a new area to handle quota information will be allocated
if needed. Originally, it was passed to xfs_qm_dqget(), but it was
removed by the following commit (probably by mistake):

	commit 8e9b6e7f
	Author: Christoph Hellwig <hch@lst.de>
	Date:   Sun Feb 8 21:51:42 2009 +0100

	xfs: remove the unused XFS_QMOPT_DQLOCK flag

As a result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
never allocates the new area even if it is needed.

This patch passes the doalloc arg to xfs_qm_dqget() in
xfs_qm_dqattach_one() to fix this problem.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>

db3e74b5

09 Nov, 2011 3 commits

xfs: fix force shutdown handling in xfs_end_io · 810627d9

Committed by Christoph Hellwig on Nov 08, 2011

Ensure ioend->io_error gets propagated back to e.g. AIO completions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>

810627d9

xfs: constify xfs_item_ops · 272e42b2

Committed by Christoph Hellwig on Oct 28, 2011

The log item ops aren't necessarily the biggest exploit vector, but marking
them const is easy enough.  Also remove the unused xfs_item_ops_t typedef
while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>

272e42b2

xfs: Fix possible memory corruption in xfs_readlink · b52a360b

Committed by Carlos Maiolino on Nov 07, 2011

Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also removes the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.

Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
 - Changed type of "pathlen" to be xfs_fsize_t, to match that of
   ip->i_d.di_size
 - Added checking for a negative pathlen to the too-long pathlen
   test, and generalized the message that gets reported in that case
   to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

b52a360b
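
The length check described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `SKETCH_MAXPATHLEN`, `SKETCH_EFSCORRUPTED` and `sketch_readlink()` are hypothetical names with assumed values, and a signed 64-bit integer stands in for the on-disk size type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXPATHLEN   1024	/* assumed limit, in bytes */
#define SKETCH_EFSCORRUPTED 117		/* hypothetical stand-in error code */

/*
 * Copy a symlink target of on-disk size pathlen into link, which must
 * hold at least SKETCH_MAXPATHLEN + 1 bytes.
 * Returns 0 on success or SKETCH_EFSCORRUPTED for an invalid length.
 */
static int sketch_readlink(const char *ondisk, int64_t pathlen, char *link)
{
	/* Reject both negative and oversized lengths before touching memory. */
	if (pathlen < 0 || pathlen > SKETCH_MAXPATHLEN)
		return SKETCH_EFSCORRUPTED;

	memcpy(link, ondisk, (size_t)pathlen);
	link[pathlen] = '\0';
	return 0;
}
```

Because the length is kept in a signed 64-bit type, a corrupted negative size is caught by the same test as a too-long one instead of being converted into a huge unsigned copy length.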

02 Nov, 2011 1 commit

filesystems: add set_nlink() · bfe86848

Committed by Miklos Szeredi on Oct 28, 2011

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>

bfe86848

01 Nov, 2011 2 commits

treewide: use __printf not __attribute__((format(printf,...))) · b9075fa9

Committed by Joe Perches on Oct 31, 2011

Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.

Done via script and a little typing.

$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
  grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
  xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'

[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b9075fa9
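
A small userspace sketch of the shorthand this commit standardizes on. The macro is defined locally here so the example builds outside the kernel tree with GCC or Clang; `sketch_log()` is a hypothetical helper invented for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Shorthand for the compiler's printf format checking attribute. */
#define __printf(a, b) __attribute__((format(printf, a, b)))

/*
 * Hypothetical logging helper: the format string is argument 3 and the
 * variadic arguments start at argument 4.
 */
static __printf(3, 4)
int sketch_log(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the annotation in place, a mismatched call such as passing a string for a `%d` conversion should draw a format warning when `-Wformat` is enabled, which is the verification the commit message refers to.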

xfs: warn if direct reclaim tries to writeback pages · 94054fa3

Committed by Mel Gorman on Oct 31, 2011

Direct reclaim should never writeback pages.  For now, handle the
situation and warn about it.  Ultimately, this will be a BUG_ON.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alex Elder <aelder@sgi.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

94054fa3

19 Oct, 2011 2 commits

xfs: add AIL pushing tracepoints · 9e4c109a

Committed by Christoph Hellwig on Oct 11, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

9e4c109a

xfs: put in missed fix for merge problem · 2900b339

Committed by Alex Elder on Oct 18, 2011

I intended to do this as part of fixing the conflict from
the merge with Linus' tree, but evidently it didn't get included in
the commit.
Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

2900b339

12 Oct, 2011 18 commits

xfs: do not flush data workqueues in xfs_flush_buftarg · 5a93a064

Committed by Christoph Hellwig on Oct 10, 2011

When we call xfs_flush_buftarg (generally from sync or umount) it is
already too late to flush the data workqueues, as I/O completion is signalled
for them and we are thus already done with the data we would flush here.

There are places where flushing them might be useful, but the current
sync interface doesn't give us that opportunity.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

5a93a064

xfs: remove XFS_bflush · a9add83e

Committed by Christoph Hellwig on Oct 10, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

a9add83e

xfs: remove xfs_buf_target_name · 02b102df

Committed by Christoph Hellwig on Oct 10, 2011

The calling convention that returns a pointer to a static buffer is
fairly nasty, so just open-code it in the only caller that is left.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

02b102df

xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks · b38505b0

Committed by Christoph Hellwig on Oct 10, 2011

Use xfs_ioerror_alert instead of open-coding a very similar error
message.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

b38505b0

xfs: clean up xfs_ioerror_alert · 901796af

Committed by Christoph Hellwig on Oct 10, 2011

Instead of passing the block number and mount structure explicitly,
get them off the bp and make the argument order more natural.

Also move it to xfs_buf.c and stop printing the device name, given
that we already get the fs name as part of xfs_alert and we know
what device it operates on because of the caller that gets printed.
Finally, rename it to xfs_buf_ioerror_alert and pass __func__ as
argument where it makes sense.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

901796af

xfs: clean up buffer allocation · 4347b9d7

Committed by Christoph Hellwig on Oct 10, 2011

Change _xfs_buf_initialize to allocate the buffer directly and rename it to
xfs_buf_alloc now that it is the only buffer allocation routine.  Also remove
the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

4347b9d7

xfs: remove buffers from the delwri list in xfs_buf_stale · af5c4bee

Committed by Christoph Hellwig on Oct 10, 2011

For each call to xfs_buf_stale we call xfs_buf_delwri_dequeue either
directly before or after it, or are guaranteed by the surrounding
conditionals that we are never called on delwri buffers.  Simplify
this situation by moving the call to xfs_buf_delwri_dequeue into
xfs_buf_stale.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

af5c4bee

xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE · c867cb61

Committed by Christoph Hellwig on Oct 10, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c867cb61

xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF · 38f23232

Committed by Christoph Hellwig on Oct 10, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

38f23232

xfs: remove XFS_BUF_FINISH_IOWAIT · 5fde0326

Committed by Christoph Hellwig on Oct 10, 2011

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

5fde0326

xfs: remove xfs_get_buftarg_list · b17b8334

Committed by Christoph Hellwig on Oct 10, 2011

The code is unused and under a config option that doesn't exist, so remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

b17b8334

xfs: fix buffer flushing during unmount · 87c7bec7

Committed by Christoph Hellwig on Sep 14, 2011

The code to flush buffers in the umount code is a bit iffy: we first
flush all delwri buffers out, but then might be able to queue up a
new one when logging the sb counts.  On a normal shutdown that one
would get flushed out when doing the synchronous superblock write in
xfs_unmountfs_writesb, but we skip that one if the filesystem has
been shut down.

Fix this by moving the delwri list flushing until just before unmounting
the log, and while we're at it also remove the superfluous delwri list
and buffer lru flushing for the rt and log device that can never have
cached or delwri buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Tested-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

87c7bec7

xfs: optimize fsync on directories · 1da2f2db

Committed by Christoph Hellwig on Oct 02, 2011

Directories are only updated transactionally, which means fsync only
needs to flush the log if the inode is currently dirty, but does not
need to bother with checking for dirty data or non-transactional updates,
and most importantly doesn't have to flush disk caches except as part of a
transaction commit.

While the first two optimizations can't easily be measured, the
latter actually makes a difference when doing lots of fsyncs that do
not actually have to commit the inode, e.g. because an earlier fsync
already pushed the log far enough.

The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
for the prototype, but I'm not sure creating a common helper for the
two is worth it given how simple the functions are.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

1da2f2db

xfs: reduce the number of log forces from tail pushing · 670ce93f

Committed by Dave Chinner on Sep 30, 2011

The AIL push code will issue a log force on every single push loop
that it exits and has encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start. Hence
we only need to force the log once per walk from the start of the
AIL to the target LSN.

This results in numbers like this:

	xs_push_ail_flush.....         1456
	xs_log_force.........          1485

For an 8-way 50M inode create workload - almost all the log forces
are coming from the AIL pushing code.

Reduce the number of log forces by only forcing the log if the
previous walk found pinned buffers. This reduces the numbers to:

	xs_push_ail_flush.....          665
	xs_log_force.........           682

For the same test.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

670ce93f

xfs: Don't allocate new buffers on every call to _xfs_buf_find · 3815832a

Committed by Dave Chinner on Sep 30, 2011

Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
~1 million cache hit lookups to ~3000 buffer creates. That's almost
3 orders of magnitude more cache hits than misses, so optimising for
cache hits is quite important. In the cache hit case, we do not need
the new buffer that is allocated in case of a cache miss, so we are
effectively hitting the allocator for no good reason for the vast
majority of calls to _xfs_buf_find. 8-way create workloads are
showing similar cache hit/miss ratios.

The result is profiles that look like this:

     samples  pcnt function                        DSO
     _______ _____ _______________________________ _________________

     1036.00 10.0% _xfs_buf_find                   [kernel.kallsyms]
      582.00  5.6% kmem_cache_alloc                [kernel.kallsyms]
      519.00  5.0% __memcpy                        [kernel.kallsyms]
      468.00  4.5% __ticket_spin_lock              [kernel.kallsyms]
      388.00  3.7% kmem_cache_free                 [kernel.kallsyms]
      331.00  3.2% xfs_log_commit_cil              [kernel.kallsyms]


Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.

To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that it is ready for insert if we get another
cache miss after allocation. This minimises lock hold time and
avoids unnecessary allocator churn. The resulting profiles look
like:

     samples  pcnt function                    DSO
     _______ _____ ___________________________ _________________

     8111.00  9.1% _xfs_buf_find               [kernel.kallsyms]
     4380.00  4.9% __memcpy                    [kernel.kallsyms]
     4341.00  4.8% __ticket_spin_lock          [kernel.kallsyms]
     3401.00  3.8% kmem_cache_alloc            [kernel.kallsyms]
     2856.00  3.2% xfs_log_commit_cil          [kernel.kallsyms]
     2625.00  2.9% __kmalloc                   [kernel.kallsyms]
     2380.00  2.7% kfree                       [kernel.kallsyms]
     2016.00  2.3% kmem_cache_free             [kernel.kallsyms]

Showing a significant reduction in time spent doing allocation and
freeing from slabs (kmem_cache_alloc and kmem_cache_free).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

3815832a
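
The lookup-first pattern described in this commit can be sketched in userspace as below. This is a simplified illustration, not the XFS code: a singly linked list stands in for the buffer cache rbtree, locking is only indicated in comments, and every `sketch_` name plus the `nr_allocs` counter is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cached buffer, keyed by block number, kept on a simple list. */
struct sketch_buf {
	unsigned long		blkno;
	struct sketch_buf	*next;
};

static struct sketch_buf *cache_head;	/* stand-in for the buffer cache rbtree */
static unsigned long nr_allocs;		/* counts allocator calls */

/*
 * Look up blkno.  If it is missing and new_bp is non-NULL, insert new_bp.
 * In a real cache this is the part that would run under the tree lock.
 */
static struct sketch_buf *sketch_find(unsigned long blkno,
				      struct sketch_buf *new_bp)
{
	struct sketch_buf *bp;

	for (bp = cache_head; bp; bp = bp->next)
		if (bp->blkno == blkno)
			return bp;		/* cache hit */
	if (new_bp) {
		new_bp->next = cache_head;
		cache_head = new_bp;
	}
	return new_bp;				/* NULL on a plain miss */
}

static struct sketch_buf *sketch_get(unsigned long blkno)
{
	struct sketch_buf *bp, *new_bp;

	bp = sketch_find(blkno, NULL);		/* fast path: no allocation */
	if (bp)
		return bp;

	new_bp = malloc(sizeof(*new_bp));	/* allocate only after a miss */
	if (!new_bp)
		return NULL;
	nr_allocs++;
	new_bp->blkno = blkno;			/* initialise outside the lock */
	new_bp->next = NULL;

	bp = sketch_find(blkno, new_bp);	/* re-check, insert if still missing */
	if (bp != new_bp)
		free(new_bp);			/* another inserter won the race */
	return bp;
}
```

Repeated `sketch_get()` calls for the same block number return the cached buffer without touching the allocator, so `nr_allocs` grows only with the number of distinct blocks requested.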

xfs: simplify xfs_trans_ijoin* again · ddc3415a

Committed by Christoph Hellwig on Sep 19, 2011

There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.

I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort.  The only real
benefit is that they can be unlocked earlier when committing a
synchronous transaction, but that could be solved by doing the
log force manually after the unlock, too.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

ddc3415a

xfs: unlock the inode before log force in xfs_change_file_space · 23bb0be1

Committed by Christoph Hellwig on Sep 18, 2011

Let the transaction commit unlock the inode before it potentially causes
a synchronous log force.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

23bb0be1

xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata · 8292d88c

Committed by Christoph Hellwig on Sep 18, 2011

Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

8292d88c