Commits · 29c8d17a8938be88e36b93522753f3519aefd05d · openeuler / Kernel

12 Oct, 2011 · 40 commits

xfs: move btree cursor into bmalloca · 29c8d17a

Authored by Dave Chinner on Sep 18, 2011

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

29c8d17a

xfs: do not keep local copies of allocation ranges in xfs_bmapi_allocate · 963c30cf

Authored by Dave Chinner on Sep 18, 2011

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

963c30cf

xfs: rename allocation range fields in struct xfs_bmalloca · 3a75667e

Authored by Dave Chinner on Sep 18, 2011

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

3a75667e

xfs: move firstblock and bmap freelist cursor into bmalloca structure · 0937e0fd

Authored by Dave Chinner on Sep 18, 2011

Rather than passing the firstblock and freelist structure around,
embed them into the bmalloca structure and remove them from the function
parameters.

This also enables the minleft parameter to be set only once in
xfs_bmapi_write(), and the freelist cursor to be queried directly in
xfs_bmapi_allocate to clear it when the lowspace algorithm is
activated.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

0937e0fd

xfs: move extent records into bmalloca structure · baf41a52

Authored by Dave Chinner on Sep 18, 2011

Rather than putting extent records on the stack and then pointing to
them in the bmalloca structure, which is in the same stack frame, put
the extent records directly in the bmalloca structure. This reduces
the number of args that need to be passed around.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

baf41a52
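
Note on baf41a52: the change from pointing at stack records to embedding them can be pictured with a minimal sketch. All names here (`struct ext_rec`, `struct alloc_args_ptr`, `struct alloc_args_embedded`, `gap_blocks`) are hypothetical stand-ins for illustration, not the real XFS types or fields.

```c
#include <stdint.h>

/* Hypothetical stand-in for an extent record; not the real XFS type. */
struct ext_rec {
	uint64_t startoff;	/* file offset of the extent, in blocks */
	uint64_t blockcount;	/* length of the extent, in blocks */
};

/* Before: the args structure points at records that live elsewhere
 * in the caller's stack frame. */
struct alloc_args_ptr {
	struct ext_rec *got;
	struct ext_rec *prev;
};

/* After: the records live inside the args structure itself. */
struct alloc_args_embedded {
	struct ext_rec got;
	struct ext_rec prev;
};

/* A helper needs only the one structure to reach both records:
 * number of blocks between the end of prev and the start of got. */
static uint64_t gap_blocks(const struct alloc_args_embedded *a)
{
	return a->got.startoff - (a->prev.startoff + a->prev.blockcount);
}
```

With the embedded layout, a caller fills in one structure and passes a single pointer down the call chain.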

xfs: pass bmalloca structure to xfs_bmap_isaeof · 1b16447b

Authored by Dave Chinner on Sep 18, 2011

All the variables xfs_bmap_isaeof() is passed are contained within
the xfs_bmalloca structure. Pass that instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

1b16447b

xfs: remove xfs_bmap_add_extent · a5bd606b

Authored by Christoph Hellwig on Sep 18, 2011

There is no real need for xfs_bmap_add_extent, as the callers
know what kind of extents they need to add.  Removing it means
duplicating the extents to btree conversion logic in three places,
but overall it's still much simpler code and quite a bit less code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

a5bd606b

xfs: introduce xfs_bmap_last_extent · 27a3f8f2

Authored by Christoph Hellwig on Sep 18, 2011

Add a common helper for finding the last extent in a file.

Largely based on a patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

27a3f8f2

xfs: rename xfs_bmapi to xfs_bmapi_write · c0dc7828

Authored by Dave Chinner on Sep 18, 2011

Now that all the read-only users of xfs_bmapi have been converted to
use xfs_bmapi_read(), we can remove all the read-only handling cases
from xfs_bmapi().

Once this is done, rename xfs_bmapi to xfs_bmapi_write to reflect
the fact that it is for allocation only. This enables us to kill the
XFS_BMAPI_WRITE flag as well.

Also clean up xfs_bmapi_write to the style used in the newly added
xfs_bmapi_read/delay functions.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

c0dc7828

xfs: factor unwritten extent map manipulations out of xfs_bmapi · b447fe5a

Authored by Dave Chinner on Sep 18, 2011

To further improve the readability of xfs_bmapi(), factor the
unwritten extent conversion out into a separate function. This
removes a large block of logic from the xfs_bmapi() code loop and
makes it easier to see the operational logic flow for xfs_bmapi().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

b447fe5a

xfs: factor extent allocation out of xfs_bmapi · 7e47a4ef

Authored by Dave Chinner on Sep 18, 2011

To further improve the readability of xfs_bmapi(), factor the extent
allocation out into a separate function. This removes a large block
of logic from the xfs_bmapi() code loop and makes it easier to see
the operational logic flow for xfs_bmapi().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

7e47a4ef

xfs: do not use xfs_bmap_add_extent for adding delalloc extents · 1fd044d9

Authored by Christoph Hellwig on Sep 18, 2011

We can just call xfs_bmap_add_extent_hole_delay directly to add
delayed allocated regions to the extent tree, instead of going
through all the complexities of xfs_bmap_add_extent that aren't
needed for this simple case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

1fd044d9

xfs: introduce xfs_bmapi_delay() · 4403280a

Authored by Christoph Hellwig on Sep 18, 2011

Delalloc reservations are much simpler than allocations, so give
them a separate bmapi-level interface.  Using the previously added
xfs_bmapi_reserve_delalloc we get a function that is only minimally
more complicated than xfs_bmapi_read, which is far from the complexity
in xfs_bmapi.  Also remove the XFS_BMAPI_DELAY code after switching
over the only user to xfs_bmapi_delay.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

4403280a

xfs: factor delalloc reservations out of xfs_bmapi · b64dfe4e

Authored by Christoph Hellwig on Sep 18, 2011

Move the reservation of delayed allocations, and the addition of delalloc
regions to the extent trees, into a new helper function.  For now
this adds some twisted goto logic to xfs_bmapi, but that will be
cleaned up in the following patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

b64dfe4e

xfs: remove xfs_bmapi_single() · 5b777ad5

Authored by Dave Chinner on Sep 18, 2011

Now that we have xfs_bmapi_read, there is no need for xfs_bmapi_single().
Change the remaining caller over and kill the function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

5b777ad5

xfs: introduce xfs_bmapi_read() · 5c8ed202

Authored by Dave Chinner on Sep 18, 2011

xfs_bmapi() currently handles both extent map reading and
allocation. As a result, the code is littered with "if (wr)"
branches to conditionally do allocation operations if required.
This makes the code much harder to follow and causes significant
indent issues with the code.

Given that read mapping is much simpler than allocation, we can
split out read mapping from xfs_bmapi() and reuse the logic that
we have already factored out to do all the hard work of handling the
extent map manipulations. This results in a much simpler function for
the common extent read operations, and will allow the allocation
code to be simplified in another commit.

Once xfs_bmapi_read() is implemented, convert all the callers of
xfs_bmapi() that are only reading extents to use the new function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

5c8ed202

xfs: factor extent map manipulations out of xfs_bmapi · aef9a895

Authored by Dave Chinner on Sep 18, 2011

To further improve the readability of xfs_bmapi(), factor the pure
extent map manipulations out into separate functions. This removes
large blocks of logic from the xfs_bmapi() code loop and makes it
easier to see the operational logic flow for xfs_bmapi().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

aef9a895

xfs: remove the nextents variable in xfs_bmapi · ecee76ba

Authored by Christoph Hellwig on Sep 18, 2011

Instead of using a local variable that needs to be updated when we modify
the extent map, just check ifp->if_bytes directly where we use it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

ecee76ba
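
Note on ecee76ba: the idea of deriving a count from the byte size on demand, rather than caching it in a local that can go stale, might look like the sketch below. The structures and the helper `demo_fork_nextents` are hypothetical; the assumption that the count equals the byte size divided by a fixed record size is made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size extent record (two 64-bit words). */
struct demo_ext_rec {
	uint64_t l0;
	uint64_t l1;
};

/* Hypothetical in-core fork: tracks only the byte size of its
 * extent array. */
struct demo_fork {
	size_t if_bytes;
};

/* Derive the extent count where it is needed; there is no cached
 * copy to keep in sync when the extent map changes. */
static size_t demo_fork_nextents(const struct demo_fork *ifp)
{
	return ifp->if_bytes / sizeof(struct demo_ext_rec);
}
```

Because the count is recomputed at each use, inserting or removing an extent only has to update `if_bytes`.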

xfs: remove impossible to read code in xfs_bmap_add_extent_delay_real · b9b984d7

Authored by Christoph Hellwig on Sep 18, 2011

We already have the worst case blocks reserved, so xfs_icsb_modify_counters
won't fail in xfs_bmap_add_extent_delay_real.  In fact we've had an assert
to catch this case since day one and it never triggered.  So remove the code
to try smaller reservations, and just return the error for that case in
addition to keeping the assert.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

b9b984d7

xfs: remove the first extent special case in xfs_bmap_add_extent · e7455e02

Authored by Christoph Hellwig on Sep 18, 2011

Both xfs_bmap_add_extent_hole_delay and xfs_bmap_add_extent_hole_real
already contain code to handle the case where there is no extent to
merge with, which is effectively the same as the code duplicated here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

e7455e02

xfs: Return -EIO when xfs_vn_getattr() failed · ed32201e

Authored by Mitsuo Hayasaka on Sep 17, 2011

The attributes of an inode can be fetched via xfs_vn_getattr() in XFS.
Currently it returns EIO, not a negative value, when it fails.  As a
result, the system call returns a non-negative value even though an
error occurred. The stat(2), ls and mv commands cannot handle this
error and do not work correctly.

This patch fixes this bug, and returns -EIO, not EIO, when an error
is detected in xfs_vn_getattr().

Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

ed32201e
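
Note on ed32201e: the sign matters because callers conventionally treat only negative return values as errors. The sketch below illustrates that convention in plain C; `demo_getattr`, its `fs_is_shut_down` parameter, and `demo_is_error` are hypothetical names, and the failure condition is an assumption rather than the actual check in xfs_vn_getattr().

```c
#include <errno.h>

/* Hypothetical getattr-style helper following the convention of
 * returning 0 on success and a negative errno value on failure. */
static int demo_getattr(int fs_is_shut_down)
{
	if (fs_is_shut_down)
		return -EIO;	/* a positive EIO would not look like an error */
	return 0;
}

/* A caller that only recognises negative values as errors. */
static int demo_is_error(int ret)
{
	return ret < 0;
}
```

A positive `EIO` passes straight through `demo_is_error()` as if the call had succeeded, which is the failure mode the commit message describes.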

xfs: Fix the incorrect comment in the header of _xfs_buf_find · eabbaf11

Authored by Chandra Seetharaman on Sep 08, 2011

Fix the incorrect comment in the header of the function
_xfs_buf_find().

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

eabbaf11

xfs: Check the return value of xfs_trans_get_buf() · 2a30f36d

Authored by Chandra Seetharaman on Sep 20, 2011

Check the return value of xfs_trans_get_buf() and fail
appropriately.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

2a30f36d

xfs: Check the return value of xfs_buf_get() · b522950f

Authored by Chandra Seetharaman on Sep 07, 2011

Check the return value of xfs_buf_get() and fail appropriately.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

b522950f
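
Note on 2a30f36d and b522950f: both entries describe the same defensive pattern, which can be sketched in userspace C as follows. `demo_buf_get` and `demo_init_block` are hypothetical, and returning -ENOMEM is an assumption about what "fail appropriately" could mean, not a statement of what the patches return.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_blk_buf {
	char data[64];
};

/* Hypothetical buffer getter that can fail and return NULL. */
static struct demo_blk_buf *demo_buf_get(int simulate_failure)
{
	if (simulate_failure)
		return NULL;
	return calloc(1, sizeof(struct demo_blk_buf));
}

/* Check the getter's result before touching the buffer, and report
 * the failure to the caller instead of dereferencing NULL. */
static int demo_init_block(int simulate_failure)
{
	struct demo_blk_buf *bp = demo_buf_get(simulate_failure);

	if (!bp)
		return -ENOMEM;	/* assumed error code for this sketch */
	bp->data[0] = 1;
	free(bp);
	return 0;
}
```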

xfs: improve ioend error handling · 04f658ee

Authored by Christoph Hellwig on Aug 24, 2011

Return unwritten extent conversion errors to aio_complete.

Skip both unwritten extent conversion and size updates if we had an
I/O error or the filesystem has been shut down.

Return -EIO to the aio/buffer completion handlers in case of a
forced shutdown.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

04f658ee

xfs: avoid direct I/O write vs buffered I/O race · c58cb165

Authored by Christoph Hellwig on Aug 27, 2011

Currently a buffered reader or writer can add pages to the pagecache
while we are waiting for the iolock in xfs_file_dio_aio_write.  Prevent
this by re-checking mapping->nrpages after we got the iolock, and if
necessary upgrade the lock to exclusive mode.  To simplify this a bit,
only take the ilock inside of xfs_file_aio_write_checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c58cb165

xfs: avoid synchronous transactions when deleting attr blocks · 859f57ca

Authored by Christoph Hellwig on Aug 27, 2011

Currently xfs_attr_inactive causes a synchronous transaction if we are
removing a file that has any extents allocated to the attribute fork, and
thus makes XFS extremely slow at removing files with out-of-line extended
attributes. The code looks like a relic from the days before the busy
extent list, but with the busy extent list we avoid reusing data and attr
extents that have been freed but not committed yet, so this code is just
as superfluous as the synchronous transactions for data blocks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

859f57ca

xfs: remove i_iocount · 4a06fd26

Authored by Christoph Hellwig on Aug 23, 2011

We now have an i_dio_count field and surrounding infrastructure to wait
for direct I/O completion instead of i_iocount, and we have never needed
the iocount waits for buffered I/O given that we only set the page uptodate
after finishing all required work.  Thus remove i_iocount, and replace
the actually needed waits with calls to inode_dio_wait.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

4a06fd26

xfs: wait for I/O completion when writing out pages in xfs_setattr_size · 2b3ffd7e

Authored by Christoph Hellwig on Aug 23, 2011

The current code relies on the xfs_ioend_wait call later on to make sure
all I/O actually has completed.  The xfs_ioend_wait call will go away soon,
so prepare for that by using the waiting filemap function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

2b3ffd7e

xfs: reduce ioend latency · fc0063c4

Authored by Christoph Hellwig on Aug 23, 2011

There is no reason to queue up ioends for processing in user context
unless we actually need it.  Just complete ioends that do not convert
unwritten extents or need a size update from the end_io context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

fc0063c4
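
Note on fc0063c4: the decision described above reduces to a simple predicate. The sketch below uses a hypothetical `struct demo_ioend` with two flags; the real ioend structure and the way it detects these conditions are not shown in this log and are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical, simplified I/O completion descriptor. */
struct demo_ioend {
	bool unwritten;		/* covers unwritten extents needing conversion */
	bool needs_size_update;	/* the I/O extends the file size */
};

/* True when the ioend can be completed directly from the end_io
 * context; otherwise it has to be queued for deferred processing. */
static bool demo_ioend_complete_inline(const struct demo_ioend *io)
{
	return !io->unwritten && !io->needs_size_update;
}
```

Only ioends for which the predicate is false pay the latency of being queued.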

xfs: defer AIO/DIO completions · c859cdd1

Authored by Christoph Hellwig on Aug 23, 2011

We really shouldn't complete AIO or DIO requests until we have finished
the unwritten extent conversion and size update.  This means fsync never
has to pick up any ioends as all work has been completed when signalling
I/O completion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c859cdd1

xfs: remove dead ENODEV handling in xfs_destroy_ioend · 398d25ef

Authored by Christoph Hellwig on Aug 23, 2011

No driver returns ENODEV from its bio completion handler, nor has this
ever been documented.  Remove the dead code dealing with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

398d25ef

xfs: use the "delwri" terminology consistently · c4e1c098

Authored by Christoph Hellwig on Aug 23, 2011

And also remove the strange local lock and delwri list pointers in a few
functions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c4e1c098

xfs: let xfs_bwrite callers handle the xfs_buf_relse · c2b006c1

Authored by Christoph Hellwig on Aug 23, 2011

Remove the xfs_buf_relse from xfs_bwrite and let the caller handle it to
mirror the delwri and read paths.

Also remove the mount pointer passed to xfs_bwrite, which is superfluous now
that we have a mount pointer in the buftarg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

c2b006c1

xfs: call xfs_buf_delwri_queue directly · 61551f1e

Authored by Christoph Hellwig on Aug 23, 2011

Unify the ways we add buffers to the delwri queue by always calling
xfs_buf_delwri_queue directly.  The xfs_bdwrite function is removed and
open-coded in its callers, and the two places setting XBF_DELWRI while a
buffer is locked and expecting xfs_buf_unlock to pick it up are converted
to call xfs_buf_delwri_queue directly, too.  Also replace the
XFS_BUF_UNDELAYWRITE macro with direct calls to xfs_buf_delwri_dequeue
to make the explicit queuing/dequeuing more obvious.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

61551f1e

xfs: move more delwri setup into xfs_buf_delwri_queue · 5a8ee6ba

Authored by Christoph Hellwig on Aug 23, 2011

Do not transfer a reference held by the caller to the buffer on the list,
or decrement it in xfs_buf_delwri_queue, but instead grab a new reference
if needed, and let the caller drop its own reference.  Also move setting
of the XBF_DELWRI and XBF_ASYNC flags into xfs_buf_delwri_queue, and
only do it if needed.  Note that for now xfs_buf_unlock already has
XBF_DELWRI, but that will change in the following patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

5a8ee6ba
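
Note on 5a8ee6ba: the reference-ownership rule described above can be sketched as below. This is a single-threaded illustration with a plain integer count; `struct demo_dw_buf` and `demo_delwri_queue` are hypothetical, and the real code's locking, list handling, and atomic counters are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical buffer with a hold count and two state flags. */
struct demo_dw_buf {
	int hold;	/* reference count */
	bool delwri;	/* already on the delayed-write queue */
	bool async;	/* will be written asynchronously */
};

/* Queue a buffer for delayed write. The queue takes its own
 * reference, and only when the buffer is newly queued; the
 * caller's reference is neither consumed nor dropped here. */
static void demo_delwri_queue(struct demo_dw_buf *bp)
{
	if (bp->delwri)
		return;		/* already queued: flags and count unchanged */
	bp->delwri = true;
	bp->async = true;
	bp->hold++;		/* the queue's own reference */
}
```

After the call, the caller still owns exactly the reference it came in with and drops it itself.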

xfs: remove the unlock argument to xfs_buf_delwri_queue · 527cfdf1

Authored by Christoph Hellwig on Aug 23, 2011

We can just unlock the buffer in the caller, and the decrement of b_hold
would also be needed in the !unlock case; we just never hit that case
currently given that the caller handles it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

527cfdf1

xfs: remove delwri buffer handling from xfs_buf_iorequest · 375ec69d

Authored by Christoph Hellwig on Aug 23, 2011

We cannot ever reach xfs_buf_iorequest for a buffer with XBF_DELWRI set,
given that all write handlers make sure that the buffer is removed from
the delwri queue beforehand, and we never do reads with the XBF_DELWRI flag
set (which the code would not handle correctly anyway).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

375ec69d

xfs: don't serialise adjacent concurrent direct IO appending writes · 7271d243

Authored by Dave Chinner on Aug 25, 2011

For append write workloads, extending the file requires a certain
amount of exclusive locking to be done up front to ensure sanity in
things like ensuring that we've zeroed any allocated regions
between the old EOF and the start of the new IO.

For single threads, this typically isn't a problem, and for large
IOs we don't serialise enough for it to be a problem for two
threads on really fast block devices. However for smaller IO and
larger thread counts we have a problem.

Take 4 concurrent sequential, single block sized and aligned IOs.
After the first IO is submitted but before it completes, we end up
with this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^       ^
      |       |
      |       |
      |       |
      |       \- ip->i_new_size
      \- ip->i_size

And the IO is done without exclusive locking because offset <=
ip->i_size. When we submit IO 2, we see offset > ip->i_size, and
grab the IO lock exclusive, because there is a chance we need to do
EOF zeroing. However, there is already an IO in progress that avoids
the need for EOF zeroing because offset <= ip->i_new_size. Hence we
could avoid holding the IO lock exclusive for this. After
submission of the second IO, we'd end up in this state:

        IO 1    IO 2    IO 3    IO 4
      +-------+-------+-------+-------+
      ^               ^
      |               |
      |               |
      |               |
      |               \- ip->i_new_size
      \- ip->i_size

There is no need to grab the i_mutex or the IO lock in exclusive
mode if we don't need to invalidate the page cache. Taking these
locks on every direct IO effectively serialises them, as taking the IO
lock in exclusive mode has to wait for all shared holders to drop
the lock. That only happens when IO is complete, so effectively it
prevents dispatch of concurrent direct IO writes to the same inode.

And so you can see that for the third concurrent IO, we'd avoid
exclusive locking for the same reason we avoided the exclusive lock
for the second IO.

Fixing this is a bit more complex than that, because we need to hold
a write-submission local value of ip->i_new_size so that clearing
the value is only done if no other thread has updated it before our
IO completes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

7271d243
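
Note on 7271d243: the locking decision walked through in the diagrams can be condensed into one predicate. The sketch below is a simplification built only from the two comparisons stated in the commit message; `struct demo_append_inode` and `demo_needs_excl_iolock` are hypothetical names, and the bookkeeping for updating and clearing i_new_size is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode carrying only the two sizes from the diagrams. */
struct demo_append_inode {
	uint64_t i_size;	/* current end of file */
	uint64_t i_new_size;	/* end of file once in-flight IO completes */
};

/* Exclusive locking is needed only if the write starts beyond both
 * the current EOF and the EOF that in-flight IO will establish,
 * because only then might EOF zeroing be required. */
static bool demo_needs_excl_iolock(const struct demo_append_inode *ip,
				   uint64_t offset)
{
	if (offset <= ip->i_size)
		return false;	/* not beyond EOF */
	if (offset <= ip->i_new_size)
		return false;	/* an in-flight IO already covers the gap */
	return true;
}
```

Replaying the diagrams with a 4096-byte block: IO 2 at offset 4096 sees i_new_size at 4096 and stays shared, IO 3 at 8192 sees i_new_size at 8192 and stays shared, while a write at 12288 issued at that point would still need the exclusive lock.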

xfs: don't serialise direct IO reads on page cache checks · 0c38a251

Authored by Dave Chinner on Aug 25, 2011

There is no need to grab the i_mutex or the IO lock in exclusive
mode if we don't need to invalidate the page cache. Taking these
locks on every direct IO effectively serialises them, as taking the IO
lock in exclusive mode has to wait for all shared holders to drop
the lock. That only happens when IO is complete, so effectively it
prevents dispatch of concurrent direct IO reads to the same inode.

Fix this by taking the IO lock shared to check the page cache state,
and only then drop it and take the IO lock exclusively if there is
work to be done. Hence for the normal direct IO case, no exclusive
locking will occur.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Joern Engel <joern@logfs.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

0c38a251
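
Note on 0c38a251: the "check shared, upgrade only if needed" sequence can be traced with a toy lock. The lock below only records its mode and counts exclusive acquisitions; it provides no real mutual exclusion, and every name in it (`demo_iolock`, `demo_dio_read_lock`, `nrpages` as a plain field) is a hypothetical stand-in. Which mode is held for the read itself after the slow path is left out of the sketch.

```c
enum demo_lock_mode { DEMO_UNLOCKED, DEMO_SHARED, DEMO_EXCL };

/* Toy lock: tracks the current mode and how often exclusive mode
 * was taken. Not a real synchronisation primitive. */
struct demo_iolock {
	enum demo_lock_mode mode;
	int excl_count;
};

static void demo_lock(struct demo_iolock *l, enum demo_lock_mode mode)
{
	l->mode = mode;
	if (mode == DEMO_EXCL)
		l->excl_count++;
}

static void demo_unlock(struct demo_iolock *l)
{
	l->mode = DEMO_UNLOCKED;
}

struct demo_dio_inode {
	struct demo_iolock iolock;
	unsigned long nrpages;	/* cached pages for this inode */
};

/* Take the lock shared to look at the page cache; drop it and go
 * exclusive only when there are cached pages to deal with. */
static void demo_dio_read_lock(struct demo_dio_inode *ip)
{
	demo_lock(&ip->iolock, DEMO_SHARED);
	if (ip->nrpages == 0)
		return;		/* common case: stay shared */

	demo_unlock(&ip->iolock);
	demo_lock(&ip->iolock, DEMO_EXCL);
	ip->nrpages = 0;	/* stands in for flushing and invalidating */
}
```

In the common case with an empty page cache, `excl_count` stays at zero, which is the property the commit message is after.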

openeuler / Kernel · synced successfully about 1 year ago