提交 · 670ce93fef93bba8c8a422a79747385bec8e846a · openeuler / raspberrypi-kernel

12 10月, 2011 40 次提交

xfs: reduce the number of log forces from tail pushing · 670ce93f

由 Dave Chinner 提交于 9月 30, 2011

The AIL push code will issue a log force on ever single push loop
that it exits and has encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start. Hence
we only need to force the log once per walk from the start of the
AIL to the target LSN.

This results in numbers like this:

	xs_push_ail_flush.....         1456
	xs_log_force.........          1485

For an 8-way 50M inode create workload - almost all the log forces
are coming from the AIL pushing code.

Reduce the number of log forces by only forcing the log if the
previous walk found pinned buffers. This reduces the numbers to:

	xs_push_ail_flush.....          665
	xs_log_force.........           682

For the same test.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

670ce93f

xfs: Don't allocate new buffers on every call to _xfs_buf_find · 3815832a

由 Dave Chinner 提交于 9月 30, 2011

Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
~1 million cache hit lookups to ~3000 buffer creates. That's almost
3 orders of magnitude more cahce hits than misses, so optimising for
cache hits is quite important. In the cache hit case, we do not need
to allocate a new buffer in case of a cache miss, so we are
effectively hitting the allocator for no good reason for vast the
majority of calls to _xfs_buf_find. 8-way create workloads are
showing similar cache hit/miss ratios.

The result is profiles that look like this:

     samples  pcnt function                        DSO
     _______ _____ _______________________________ _________________

     1036.00 10.0% _xfs_buf_find                   [kernel.kallsyms]
      582.00  5.6% kmem_cache_alloc                [kernel.kallsyms]
      519.00  5.0% __memcpy                        [kernel.kallsyms]
      468.00  4.5% __ticket_spin_lock              [kernel.kallsyms]
      388.00  3.7% kmem_cache_free                 [kernel.kallsyms]
      331.00  3.2% xfs_log_commit_cil              [kernel.kallsyms]


Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.

To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that is it ready for insert if we get another
cache miss after allocation. This minimises lock hold time and
avoids unnecessary allocator churn. The resulting profiles look
like:

     samples  pcnt function                    DSO
     _______ _____ ___________________________ _________________

     8111.00  9.1% _xfs_buf_find               [kernel.kallsyms]
     4380.00  4.9% __memcpy                    [kernel.kallsyms]
     4341.00  4.8% __ticket_spin_lock          [kernel.kallsyms]
     3401.00  3.8% kmem_cache_alloc            [kernel.kallsyms]
     2856.00  3.2% xfs_log_commit_cil          [kernel.kallsyms]
     2625.00  2.9% __kmalloc                   [kernel.kallsyms]
     2380.00  2.7% kfree                       [kernel.kallsyms]
     2016.00  2.3% kmem_cache_free             [kernel.kallsyms]

Showing a significant reduction in time spent doing allocation and
freeing from slabs (kmem_cache_alloc and kmem_cache_free).
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

3815832a

xfs: simplify xfs_trans_ijoin* again · ddc3415a

由 Christoph Hellwig 提交于 9月 19, 2011

There is no reason to keep a reference to the inode even if we unlock
it during transaction commit because we never drop a reference between
the ijoin and commit.  Also use this fact to merge xfs_trans_ijoin_ref
back into xfs_trans_ijoin - the third argument decides if an unlock
is needed now.

I'm actually starting to wonder if allowing inodes to be unlocked
at transaction commit really is worth the effort.  The only real
benefit is that they can be unlocked earlier when commiting a
synchronous transactions, but that could be solved by doing the
log force manually after the unlock, too.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

ddc3415a

xfs: unlock the inode before log force in xfs_change_file_space · 23bb0be1

由 Christoph Hellwig 提交于 9月 18, 2011

Let the transaction commit unlock the inode before it potentially causes
a synchronous log force.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

23bb0be1

xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata · 8292d88c

由 Christoph Hellwig 提交于 9月 18, 2011

Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

8292d88c

xfs: unlock the inode before log force in xfs_fsync · b1037058

由 Christoph Hellwig 提交于 9月 19, 2011

Only read the LSN we need to push to with the ilock held, and then release
it before we do the log force to improve concurrency.

This also removes the only direct caller of _xfs_trans_commit, thus
allowing it to be merged into the plain xfs_trans_commit again.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b1037058

xfs: XFS_TRANS_SWAPEXT is not a valid flag for xfs_trans_commit · 815cb216

由 Christoph Hellwig 提交于 9月 26, 2011

XFS_TRANS_SWAPEXT is a transaction type, not a flag for xfs_trans_commit, so
don't pass it in xfs_swap_extents.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

815cb216

xfs: fix possible overflow in xfs_ioc_trim() · c029a50d

由 Lukas Czerner 提交于 9月 21, 2011

In xfs_ioc_trim it is possible that computing the last allocation group
to discard might overflow for big start & len values, because the result
might be bigger then xfs_agnumber_t which is 32 bit long. Fix this by not
allowing the start and end block of the range to be beyond the end of the
file system.

Note that if the start is beyond the end of the file system we have to
return -EINVAL, but in the "end" case we have to truncate it to the fs
size.

Also introduce "end" variable, rather than using start+len which which
might be more confusing to get right as this bug shows.
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c029a50d

xfs: cleanup xfs_bmap.h · d952e2f8

由 Christoph Hellwig 提交于 9月 18, 2011

Convert all function prototypes to the short form used elsewhere,
and remove duplicates of comments already placed at the function
body.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

d952e2f8

xfs: dont ignore error code from xfs_bmbt_update · b0eab14e

由 Christoph Hellwig 提交于 9月 18, 2011

Fix a case in xfs_bmap_add_extent_unwritten_real where we aren't
passing the returned error on.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b0eab14e

xfs: pass bmalloca to xfs_bmap_add_extent_hole_real · c6534249

由 Christoph Hellwig 提交于 9月 18, 2011

All the parameters passed to xfs_bmap_add_extent_hole_real() are in
the xfs_bmalloca structure now. Just pass the bmalloca parameter to
the function instead of 8 separate parameters.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c6534249

xfs: pass bmalloca to xfs_bmap_add_extent_delay_real · 572a4cf0

由 Christoph Hellwig 提交于 9月 18, 2011

All the parameters passed to xfs_bmap_add_extent_delay_real() are in
the xfs_bmalloca structure now. Just pass the bmalloca parameter to
the function instead of 8 separate parameters.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

572a4cf0

xfs: move logflags into bmalloca · c315c90b

由 Christoph Hellwig 提交于 9月 18, 2011

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c315c90b

xfs: move lastx and nallocs into bmalloca · e0c3da5d

由 Dave Chinner 提交于 9月 18, 2011

Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

e0c3da5d

xfs: move btree cursor into bmalloca · 29c8d17a

由 Dave Chinner 提交于 9月 18, 2011

Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

29c8d17a

xfs: do not keep local copies of allocation ranges in xfs_bmapi_allocate · 963c30cf

由 Dave Chinner 提交于 9月 18, 2011

Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

963c30cf

xfs: rename allocation range fields in struct xfs_bmalloca · 3a75667e

由 Dave Chinner 提交于 9月 18, 2011

Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

3a75667e

xfs: move firstblock and bmap freelist cursor into bmalloca structure · 0937e0fd

由 Dave Chinner 提交于 9月 18, 2011

Rather than passing the firstblock and freelist structure around,
embed it into the bmalloca structure and remove it from the function
parameters.

This also enables the minleft parameter to be set only once in
xfs_bmapi_write(), and the freelist cursor directly queried in
xfs_bmapi_allocate to clear it when the lowspace algorithm is
activated.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

0937e0fd

xfs: move extent records into bmalloca structure · baf41a52

由 Dave Chinner 提交于 9月 18, 2011

Rather that putting extent records on the stack and then pointing to
them in the bmalloca structure which is in the same stack frame, put
the extent records directly in the bmalloca structure. This reduces
the number of args that need to be passed around.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

baf41a52

xfs: pass bmalloca structure to xfs_bmap_isaeof · 1b16447b

由 Dave Chinner 提交于 9月 18, 2011

All the variables xfs_bmap_isaeof() is passed are contained within
the xfs_bmalloca structure. Pass that instead.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

1b16447b

xfs: remove xfs_bmap_add_extent · a5bd606b

由 Christoph Hellwig 提交于 9月 18, 2011

There is no real need to the xfs_bmap_add_extent, as the callers
know what kind of extents they need to it.  Removing it means
duplicating the extents to btree conversion logic in three places,
but overall it's still much simpler code and quite a bit less code.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

a5bd606b

xfs: introduce xfs_bmap_last_extent · 27a3f8f2

由 Christoph Hellwig 提交于 9月 18, 2011

Add a common helper for finding the last extent in a file.

Largely based on a patch from Dave Chinner.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

27a3f8f2

xfs: rename xfs_bmapi to xfs_bmapi_write · c0dc7828

由 Dave Chinner 提交于 9月 18, 2011

Now that all the read-only users of xfs_bmapi have been converted to
use xfs_bmapi_read(), we can remove all the read-only handling cases
from xfs_bmapi().

Once this is done, rename xfs_bmapi to xfs_bmapi_write to reflect
the fact it is for allocation only. This enables us to kill the
XFS_BMAPI_WRITE flag as well.

Also clean up xfs_bmapi_write to the style used in the newly added
xfs_bmapi_read/delay functions.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c0dc7828

xfs: factor unwritten extent map manipulations out of xfs_bmapi · b447fe5a

由 Dave Chinner 提交于 9月 18, 2011

To further improve the readability of xfs_bmapi(), factor the
unwritten extent conversion out into a separate function. This
removes large block of logic from the xfs_bmapi() code loop and
makes it easier to see the operational logic flow for xfs_bmapi().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b447fe5a

xfs: factor extent allocation out of xfs_bmapi · 7e47a4ef

由 Dave Chinner 提交于 9月 18, 2011

To further improve the readability of xfs_bmapi(), factor the extent
allocation out into a separate function. This removes a large block
of logic from the xfs_bmapi() code loop and makes it easier to see
the operational logic flow for xfs_bmapi().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

7e47a4ef

xfs: do not use xfs_bmap_add_extent for adding delalloc extents · 1fd044d9

由 Christoph Hellwig 提交于 9月 18, 2011

We can just call xfs_bmap_add_extent_hole_delay directly to add a
delayed allocated regions to the extent tree, instead of going
through all the complexities of xfs_bmap_add_extent that aren't
needed for this simple case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

1fd044d9

xfs: introduce xfs_bmapi_delay() · 4403280a

由 Christoph Hellwig 提交于 9月 18, 2011

Delalloc reservations are much simpler than allocations, so give
them a separate bmapi-level interface.  Using the previously added
xfs_bmapi_reserve_delalloc we get a function that is only minimally
more complicated than xfs_bmapi_read, which is far from the complexity
in xfs_bmapi.  Also remove the XFS_BMAPI_DELAY code after switching
over the only user to xfs_bmapi_delay.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

4403280a

xfs: factor delalloc reservations out of xfs_bmapi · b64dfe4e

由 Christoph Hellwig 提交于 9月 18, 2011

Move the reservation of delayed allocations, and addition of delalloc
regions to the extent trees into a new helper function.  For now
this adds some twisted goto logic to xfs_bmapi, but that will be
cleaned up in the following patches.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b64dfe4e

xfs: remove xfs_bmapi_single() · 5b777ad5

由 Dave Chinner 提交于 9月 18, 2011

Now we have xfs_bmapi_read, there is no need for xfs_bmapi_single().
Change the remaining caller over and kill the function.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5b777ad5

xfs: introduce xfs_bmapi_read() · 5c8ed202

由 Dave Chinner 提交于 9月 18, 2011

xfs_bmapi() currently handles both extent map reading and
allocation. As a result, the code is littered with "if (wr)"
branches to conditionally do allocation operations if required.
This makes the code much harder to follow and causes significant
indent issues with the code.

Given that read mapping is much simpler than allocation, we can
split out read mapping from xfs_bmapi() and reuse the logic that
we have already factored out do do all the hard work of handling the
extent map manipulations. The results in a much simpler function for
the common extent read operations, and will allow the allocation
code to be simplified in another commit.

Once xfs_bmapi_read() is implemented, convert all the callers of
xfs_bmapi() that are only reading extents to use the new function.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5c8ed202

xfs: factor extent map manipulations out of xfs_bmapi · aef9a895

由 Dave Chinner 提交于 9月 18, 2011

To further improve the readability of xfs_bmapi(), factor the pure
extent map manipulations out into separate functions. This removes
large blocks of logic from the xfs_bmapi() code loop and makes it
easier to see the operational logic flow for xfs_bmapi().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

aef9a895

xfs: remove the nextents variable in xfs_bmapi · ecee76ba

由 Christoph Hellwig 提交于 9月 18, 2011

Instead of using a local variable that needs to updated when we modify
the extent map just check ifp->if_bytes directly where we use it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

ecee76ba

xfs: remove impossible to read code in xfs_bmap_add_extent_delay_real · b9b984d7

由 Christoph Hellwig 提交于 9月 18, 2011

We already have the worst case blocks reserved, so xfs_icsb_modify_counters
won't fail in xfs_bmap_add_extent_delay_real.  In fact we've had an assert
to catch this case since day and it never triggered.  So remove the code
to try smaller reservations, and just return the error for that case in
addition to keeping the assert.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b9b984d7

xfs: remove the first extent special case in xfs_bmap_add_extent · e7455e02

由 Christoph Hellwig 提交于 9月 18, 2011

Both xfs_bmap_add_extent_hole_delay and xfs_bmap_add_extent_hole_real
already contain code to handle the case where there is no extent to
merge with, which is effectively the same as the code duplicated here.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

e7455e02

xfs: Return -EIO when xfs_vn_getattr() failed · ed32201e

由 Mitsuo Hayasaka 提交于 9月 17, 2011

An attribute of inode can be fetched via xfs_vn_getattr() in XFS.
Currently it returns EIO, not negative value, when it failed.  As a
result, the system call returns not negative value even though an
error occured. The stat(2), ls and mv commands cannot handle this
error and do not work correctly.

This patch fixes this bug, and returns -EIO, not EIO when an error
is detected in xfs_vn_getattr().
Signed-off-by: NMitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

ed32201e

xfs: Fix the incorrect comment in the header of _xfs_buf_find · eabbaf11

由 Chandra Seetharaman 提交于 9月 08, 2011

Fix the incorrect comment in the header of the function
_xfs_buf_find().
Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

eabbaf11

xfs: Check the return value of xfs_trans_get_buf() · 2a30f36d

由 Chandra Seetharaman 提交于 9月 20, 2011

Check the return value of xfs_trans_get_buf() and fail
appropriately.
Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

2a30f36d

xfs: Check the return value of xfs_buf_get() · b522950f

由 Chandra Seetharaman 提交于 9月 07, 2011

Check the return value of xfs_buf_get() and fail appropriately.
Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b522950f

xfs: improve ioend error handling · 04f658ee

由 Christoph Hellwig 提交于 8月 24, 2011

Return unwritten extent conversion errors to aio_complete.

Skip both unwritten extent conversion and size updates if we had an
I/O error or the filesystem has been shut down.

Return -EIO to the aio/buffer completion handlers in case of a
forced shutdown.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

04f658ee

xfs: avoid direct I/O write vs buffered I/O race · c58cb165

由 Christoph Hellwig 提交于 8月 27, 2011

Currently a buffered reader or writer can add pages to the pagecache
while we are waiting for the iolock in xfs_file_dio_aio_write.  Prevent
this by re-checking mapping->nrpages after we got the iolock, and if
nessecary upgrade the lock to exclusive mode.  To simplify this a bit
only take the ilock inside of xfs_file_aio_write_checks.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c58cb165