Commits · 7d6a7bde52e449f21a0e86a7a4955b4e08a49d69 · openanolis / cloud-kernel

26 Jan, 2010 2 commits

xfs: Use delay write promotion for dquot flushing · 7d6a7bde

Authored by Dave Chinner on Jan 26, 2010

xfs_qm_dqflock_pushbuf_wait() uses much the same trick that item
pushing used to use to flush out delayed write dquot buffers. Change
it to use the new promotion method rather than an async flush.

Also, xfs_qm_dqflock_pushbuf_wait() can return without the flush lock
held, yet the callers make the assumption that after this call the
flush lock is held. Always return with the flush lock held.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

7d6a7bde

xfs: Sort delayed write buffers before dispatch · 089716aa

Authored by Dave Chinner on Jan 26, 2010

Currently when the xfsbufd writes delayed write buffers, it pushes
them to disk in the order they come off the delayed write list. If
there are lots of buffers spread widely over the disk, this results
in overwhelming the elevator sort queues in the block layer and we
end up losing the possibility of merging adjacent buffers to minimise
the number of IOs.

Use the new generic list_sort function to sort the delwri dispatch
queue before issue to ensure that the buffers are pushed in the most
friendly order possible to the lower layers.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

089716aa
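As an illustration of the idea in the commit above, here is a minimal userspace sketch that sorts a dispatch queue by disk address before issue. It is not the kernel code: the commit uses the kernel's list_sort on a linked list, whereas this sketch swaps in qsort on an array, and `struct delwri_buf`, `delwri_cmp` and `delwri_sort` are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a delayed write buffer; only the disk address matters here. */
struct delwri_buf {
    uint64_t blkno;   /* starting disk block of the buffer */
};

/* qsort comparator: ascending disk block order. */
static int delwri_cmp(const void *a, const void *b)
{
    const struct delwri_buf *x = a;
    const struct delwri_buf *y = b;

    /* Compare explicitly rather than subtracting, so 64-bit values cannot overflow an int. */
    if (x->blkno < y->blkno)
        return -1;
    if (x->blkno > y->blkno)
        return 1;
    return 0;
}

/* Sort the dispatch queue so that buffers close together on disk are issued back to back. */
static void delwri_sort(struct delwri_buf *queue, size_t n)
{
    qsort(queue, n, sizeof(queue[0]), delwri_cmp);
}
```

Once the queue is sorted, buffers at neighbouring block numbers sit next to each other, which gives the lower layers the best chance of merging them.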

02 Feb, 2010 1 commit

xfs: Don't issue buffer IO direct from AIL push V2 · d808f617

Authored by Dave Chinner on Feb 02, 2010

All buffers logged into the AIL are marked as delayed write.
When the AIL needs to push the buffer out, it issues an async write of the
buffer. This means that IO patterns are dependent on the order of
buffers in the AIL.

Instead of flushing the buffer, promote the buffer in the delayed
write list so that the next time the xfsbufd is run the buffer will
be flushed by the xfsbufd. Return the state to the xfsaild that the
buffer was promoted so that the xfsaild knows that it needs to cause
the xfsbufd to run to flush the buffers that were promoted.

Using the xfsbufd for issuing the IO allows us to dispatch all
buffer IO from the one queue. This means that we can make much more
enlightened decisions on what order to flush buffers to disk as
we don't have multiple places issuing IO. Optimisations to xfsbufd
will be in a future patch.

Version 2
- kill XFS_ITEM_FLUSHING as it is now unused.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

d808f617
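The promote-instead-of-write flow described in the commit above can be modelled roughly as follows. This is a toy model under stated assumptions, not the XFS implementation: `struct dbuf`, the `promoted` flag, the `PUSH_*` result codes and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes reported back to the AIL pusher. */
enum push_result { PUSH_SUCCESS, PUSH_PUSHBUF };

/* Hypothetical delayed write buffer. */
struct dbuf {
    bool queued;      /* sitting on the delayed write list */
    bool promoted;    /* asked to go out on the next flusher run */
    long queue_time;  /* tick at which the buffer was queued */
};

/* The flusher writes a queued buffer once it is old enough or has been promoted. */
static bool dbuf_is_due(const struct dbuf *bp, long now, long age_limit)
{
    if (!bp->queued)
        return false;
    return bp->promoted || now - bp->queue_time >= age_limit;
}

/* Pushing never issues IO itself; it only promotes and reports that it did so. */
static enum push_result ail_push_buf(struct dbuf *bp)
{
    if (!bp->queued)
        return PUSH_SUCCESS;
    bp->promoted = true;
    return PUSH_PUSHBUF;
}

/* Returns true when at least one buffer was promoted, i.e. the flusher must be woken. */
static bool ail_push_all(struct dbuf *bufs, size_t n)
{
    bool need_flusher = false;

    for (size_t i = 0; i < n; i++)
        if (ail_push_buf(&bufs[i]) == PUSH_PUSHBUF)
            need_flusher = true;
    return need_flusher;
}
```

The point of the design is that all IO is issued from one place (the flusher), so ordering decisions can be made there.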

06 Feb, 2010 2 commits

xfs: Use delayed write for inodes rather than async V2 · c854363e

Authored by Dave Chinner on Feb 06, 2010

We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them. Not only that, there are also blocking and non-blocking
asynchronous inode flushes, depending on where the flush comes from.

This patch completely removes asynchronous inode writeback. It
removes all the strange writeback modes and replaces them with
either a synchronous flush or a non-blocking delayed write flush.
That is, inode flushes will only issue IO directly if they are
synchronous, and background flushing may do nothing if the operation
would block (e.g. on a pinned inode or buffer lock).

Delayed write flushes will now result in the inode buffer sitting in
the delwri queue of the buffer cache to be flushed by either an AIL
push or by the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators. We will also
get adjacent inode cluster buffer IO merging for free when a later
patch in the series allows sorting of the delayed write buffers
before dispatch.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
This writeback path is currently non-optimal, but the next patch
in the series will fix that problem.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers. As a result, the
inode reclaim code has been rewritten so that it no longer relies on
the ambiguous return values of xfs_iflush() to determine whether it
is safe to reclaim an inode.

Portions of this patch are derived from patches by Christoph
Hellwig.

Version 2:
- cleanup reclaim code as suggested by Christoph
- log background reclaim inode flush errors
- just pass sync flags to xfs_iflush
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

c854363e

xfs: Make inode reclaim states explicit · 777df5af

Authored by Dave Chinner on Feb 06, 2010

A.K.A.: don't rely on xfs_iflush() return value in reclaim

We have gradually been moving checks out of the reclaim code because
they are duplicated in xfs_iflush(). We've had a history of problems
in this area, and many of them stem from the overloading of the
return values from xfs_iflush() and interaction with inode flush
locking to determine if the inode is safe to reclaim.

With the desire to move to delayed write flushing of inodes and
non-blocking inode tree reclaim walks, the overloading of the
return value of xfs_iflush makes it very difficult to determine
the correct thing to do next.

This patch explicitly re-adds the checks to the inode reclaim code,
removing the reliance on the return value of xfs_iflush() to
determine what to do next. It also means that we can clearly
document all the inode states that reclaim must handle and hence
we can easily see that we handled all the necessary cases.

This also removes the need for the xfs_inode_clean() check in
xfs_iflush() as all callers now check this first (safely).
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

777df5af
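To make the "explicit states" idea from the two commits above concrete, here is a small sketch of a reclaim decision that checks the inode state itself instead of interpreting a flush return value. The three-way outcome, the state fields and all names are assumptions for illustration; the real reclaim code handles more states than this.

```c
#include <stdbool.h>

/* Hypothetical outcomes of looking at one inode during reclaim. */
enum reclaim_action {
    RECLAIM_NOW,          /* safe to reclaim immediately */
    RECLAIM_SKIP,         /* leave it reclaimable and revisit later */
    RECLAIM_FLUSH_FIRST   /* write it back synchronously, then reclaim */
};

/* Hypothetical, heavily simplified inode state. */
struct inode_state {
    bool bad_or_stale;    /* contents no longer need to reach disk */
    bool dirty;           /* has changes not yet written back */
};

static enum reclaim_action reclaim_decide(const struct inode_state *s, bool sync)
{
    /* Nothing worth writing: reclaim regardless of the dirty flag. */
    if (s->bad_or_stale)
        return RECLAIM_NOW;

    /* Clean inodes can always be reclaimed. */
    if (!s->dirty)
        return RECLAIM_NOW;

    /* Dirty: only a synchronous caller may flush; background reclaim skips. */
    return sync ? RECLAIM_FLUSH_FIRST : RECLAIM_SKIP;
}
```

Because each state is tested explicitly, it is easy to see which cases are covered, which is the documentation benefit the commit message describes.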

09 Feb, 2010 1 commit

xfs: more reserved blocks fixups · d5db0f97

Authored by Eric Sandeen on Feb 05, 2010

This mangles the reserved blocks counts a little more.

1) add a helper function for the default reserved count
2) add helper functions to save/restore counts on ro/rw
3) save/restore reserved blocks on freeze/thaw
4) disallow changing reserved count while readonly

V2: changed field name to match Dave's changes
Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Alex Elder <aelder@sgi.com>

d5db0f97

26 Jan, 2010 2 commits

xfs: turn off sign warnings · 388f1f0c

Authored by Dave Chinner on Jan 26, 2010

Because they cause warnings in static inline functions conditionally
compiled into XFS from the VFS (e.g. fsnotify).
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

388f1f0c

xfs: don't hold onto reserved blocks on remount,ro · cbe132a8

Authored by Dave Chinner on Jan 26, 2010

If we hold onto reserved blocks when doing a remount,ro we end
up writing the blocks used count to disk that includes the reserved
blocks. Reserved blocks are not actually used, so this results in
the values in the superblock being incorrect.

Hence if we run xfs_check or xfs_repair -n while the filesystem is
mounted remount,ro we end up with an inconsistent filesystem being
reported. Also, running xfs_copy on the remount,ro filesystem will
result in an inconsistent image being generated.

To fix this, unreserve the blocks when doing the remount,ro, and
reserve them again on remount,rw. This way a remount,ro filesystem
will appear consistent on disk to all utilities.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

cbe132a8
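The reserved-block handling described in the commits above (give the pool back on remount,ro, take it again on remount,rw, and refuse changes while read-only) might look roughly like this in a simplified model. `struct mount_model` and every function name here are hypothetical; the real code works on the XFS mount structure and its superblock counters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the counters involved. */
struct mount_model {
    uint64_t free_blocks;     /* free block count that ends up on disk */
    uint64_t reserved;        /* blocks currently held in the reserve pool */
    uint64_t saved_reserved;  /* pool size remembered while read-only */
    bool read_only;
};

/* Move up to 'want' blocks from the free count into the reserve pool. */
static void take_reserve(struct mount_model *mp, uint64_t want)
{
    if (want > mp->free_blocks)
        want = mp->free_blocks;
    mp->free_blocks -= want;
    mp->reserved = want;
}

/* remount,ro: hand the pool back so the on-disk free count is accurate. */
static void remount_ro(struct mount_model *mp)
{
    mp->saved_reserved = mp->reserved;
    mp->free_blocks += mp->reserved;
    mp->reserved = 0;
    mp->read_only = true;
}

/* remount,rw: take the same reservation again. */
static void remount_rw(struct mount_model *mp)
{
    mp->read_only = false;
    take_reserve(mp, mp->saved_reserved);
    mp->saved_reserved = 0;
}

/* Change the pool size; returns -1 while the filesystem is read-only. */
static int set_reserved(struct mount_model *mp, uint64_t want)
{
    if (mp->read_only)
        return -1;
    mp->free_blocks += mp->reserved;  /* return the old pool first */
    mp->reserved = 0;
    take_reserve(mp, want);
    return 0;
}
```

In this model the free count seen while read-only includes the blocks that were in the pool, matching the commit's statement that reserved blocks are not actually used.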

22 Jan, 2010 8 commits

xfs: quota limit statvfs available blocks · 9b00f307

Authored by Christoph Hellwig on Jan 21, 2010

A "df" run on an NFS client of an exported XFS file system reports
the wrong information for "available" blocks.  When a block quota is
enforced, the amount reported as free is limited by the quota, but
the amount reported available is not (and should be).
Reported-by: Guk-Bong, Kwon <gbkwon@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

9b00f307
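A minimal sketch of the behaviour this fix describes: when a block quota is tighter than the filesystem size, both the free and the available counts are clamped to what the quota still allows. The struct and function names are hypothetical, and the fields only mirror the statvfs members of the same name.

```c
#include <stdint.h>

/* Hypothetical subset of the statvfs fields touched by the quota clamp. */
struct statvfs_model {
    uint64_t f_blocks;  /* total blocks reported */
    uint64_t f_bfree;   /* free blocks */
    uint64_t f_bavail;  /* blocks available to the caller */
};

/* Clamp the reported sizes to a block quota; limit == 0 means no quota is enforced. */
static void apply_block_quota(struct statvfs_model *st, uint64_t limit, uint64_t used)
{
    uint64_t remaining;

    /* No limit, or the limit is not tighter than the filesystem itself. */
    if (limit == 0 || limit >= st->f_blocks)
        return;

    remaining = (limit > used) ? limit - used : 0;
    st->f_blocks = limit;
    st->f_bfree = remaining;
    st->f_bavail = remaining;  /* the value the bug report found unclamped */
}
```

With a limit of 100 blocks and 30 in use, a "df" on this model reports 70 blocks both free and available.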

xfs: replace KM_LARGE with explicit vmalloc use · bdfb0430

Authored by Christoph Hellwig on Jan 20, 2010

We use the KM_LARGE flag to make kmem_alloc and friends use vmalloc
if necessary.  As we only need this for a few boot/mount time
allocations just switch to explicit vmalloc calls there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

bdfb0430

xfs: cleanup up xfs_log_force calling conventions · a14a348b

Authored by Christoph Hellwig on Jan 19, 2010

Remove the XFS_LOG_FORCE argument which was always set, and the
XFS_LOG_URGE define, which was never used.

Split xfs_log_force into two helpers - xfs_log_force which forces
the whole log, and xfs_log_force_lsn which forces up to the
specified LSN.  The underlying implementations already were entirely
separate, as were the users.

Also re-indent the new _xfs_log_force/_xfs_log_force_lsn which
previously had a weird coding style.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

a14a348b

xfs: kill XLOG_VEC_SET_TYPE · 4139b3b3

Authored by Christoph Hellwig on Jan 19, 2010

This macro only obfuscates the log item type assignments, so kill it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

4139b3b3

xfs: remove duplicate buffer flags · 0cadda1c

Authored by Christoph Hellwig on Jan 19, 2010

Currently we define aliases for the buffer flags in various
namespaces, which only adds confusion.  Remove all but the XBF_
flags to clean this up a bit.

Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer
uses, but I'll clean that up later.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

0cadda1c

xfs: implement quota warnings via netlink · a210c1aa

Authored by Christoph Hellwig on Jan 17, 2010

Wire up quota_send_warning to send quota warnings over netlink.
This is used by various desktops to show user quota warnings.

Tested by running the quota_nld daemon while running the xfstest
quota tests and observing the warnings.  I'll see how I can get a
more formal testcase for it written.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

a210c1aa

xfs: clean up error handling in xfs_trans_dqresv · 4d1f88d7

Authored by Christoph Hellwig on Jan 13, 2010

Move the error code selection after the goto label and fold the
xfs_quota_error helper into it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

4d1f88d7

xfs: kill XFS_QMOPT_ASYNC · 512dd1ab

Authored by Christoph Hellwig on Jan 13, 2010

The option is unused and one of the few remaining users of
xfs_bawrite, so let's get rid of it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Alex Elder <aelder@sgi.com>

512dd1ab

20 Jan, 2010 10 commits

xfs: rearrange xfs_mod_sb() to avoid array subscript warning · 587aa0fe

Authored by Dave Chinner on Jan 20, 2010

gcc warns of an array subscript out of bounds in xfs_mod_sb().
The code is written in such a way that if the array subscript is
out of bounds, then it will assert fail. Rearrange the code to
avoid the bounds check warning.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

587aa0fe

xfs: suppress spurious uninitialised var warning in xfs_bmapi() · f0a0eaa8

Authored by Dave Chinner on Jan 20, 2010

Initialise the xfs_bmalloca_t structure to zero to avoid uninitialised
variable warnings. This is done by zeroing the arg structure rather than
using the uninitialised_var() trick so we know for certain that the
structure is correctly initialised as xfs_bmapi is a very complex
function and it is difficult to prove warnings are spurious.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

f0a0eaa8

xfs: make compile warn about char sign mismatches again · 58c75cfb

Authored by Dave Chinner on Jan 20, 2010

The -fno-unsigned-char directive has no effect anymore as the
XFS build is clean. However, the kernel build hides pointer sign
differences so turn that back on so that we can clean up all the
mismatches prior to a userspace code resync.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

58c75cfb

xfs: clean up sign warnings in dir2 code · 4a24cb71

Authored by Dave Chinner on Jan 20, 2010

We are now consistently using unsigned char strings for names
so fix up the remaining warnings in the dir2 code to complete
the cleanup.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

4a24cb71

xfs: convert attr to use unsigned names · a9273ca5

Authored by Dave Chinner on Jan 20, 2010

To be consistent with the directory code, the attr code should use
unsigned names. Convert the names from the vfs at the highest level
to unsigned, and ensure they are consistently used as unsigned down
to disk.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

a9273ca5

xfs: xfs_buf_iomove() doesn't care about signedness · b9c48649

Authored by Dave Chinner on Jan 20, 2010

xfs_buf_iomove() uses xfs_caddr_t as its parameter types, but it doesn't
care about the signedness of the variables as it is just copying the
data. Change the prototype to use void * so that we don't get sign
warnings at call sites.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

b9c48649

xfs: make xfs_dir_cilookup_result use unsigned char · a3380ae3

Authored by Dave Chinner on Jan 20, 2010

For consistency with the rest of the code.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

a3380ae3

xfs: convert dirnameops to unsigned char names · 2bc75421

Authored by Dave Chinner on Jan 20, 2010

To be consistent across the codebase, convert the dirnameops to pass
the directory names by unsigned char strings.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

2bc75421

xfs: convert DM ops to use unsigned char names · 046ea753

Authored by Dave Chinner on Jan 20, 2010

dmops uses a signed char for its namespace event. To be consistent
with the rest of the code, convert them to unsigned char for the
namespace string.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

046ea753

xfs: directory names are unsigned · e2bcd936

Authored by Dave Chinner on Jan 20, 2010

Convert the struct xfs_name to use unsigned chars for the name
strings to match both what is stored on disk (__uint8_t) and what
the VFS expects (unsigned char).
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

e2bcd936
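Why the signedness of name bytes matters can be seen with a small example. The hash below is a toy rotate-and-xor function written for this illustration (it is not claimed to be the XFS name hash), and both function names are hypothetical. A byte above 0x7f is sign-extended when read through a signed char, so the two variants disagree for non-ASCII names.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy hash over unsigned bytes: rotate left by 7, then xor in the next byte. */
static uint32_t name_hash_unsigned(const unsigned char *name, size_t len)
{
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++)
        hash = name[i] ^ ((hash << 7) | (hash >> 25));
    return hash;
}

/* Same loop over signed bytes: negative values are sign-extended to 32 bits. */
static uint32_t name_hash_signed(const signed char *name, size_t len)
{
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++)
        hash = (uint32_t)name[i] ^ ((hash << 7) | (hash >> 25));
    return hash;
}
```

For the single byte 0xE9 the unsigned variant yields 0xE9, while the signed variant (where the same bit pattern is the value -23) yields 0xFFFFFFE9. Plain ASCII names hash identically in both, which is why such mismatches can go unnoticed.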

16 Jan, 2010 14 commits

xfs: move more buffer helpers into xfs_buf.c · 4e23471a

Authored by Christoph Hellwig on Jan 13, 2010

Move xfsbdstrat and xfs_bdstrat_cb from xfs_lrw.c and xfs_bioerror
and xfs_bioerror_relse from xfs_rw.c into xfs_buf.c.  This also
means xfs_bioerror and xfs_bioerror_relse can be marked static now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

4e23471a

xfs: clean up xfs_bwrite · 64e0bc7d

Authored by Christoph Hellwig on Jan 13, 2010

Fold XFS_bwrite into its only caller, xfs_bwrite, and move it into
xfs_buf.c instead of leaving it as a fairly large inline function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

64e0bc7d

xfs: clean up log buffer writes · 873ff550

Authored by Christoph Hellwig on Jan 13, 2010

Don't bother using XFS_bwrite as it doesn't provide much code for
our use case.  Instead opencode it and fold xlog_bdstrat_cb into the
new xlog_bdstrat helper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

873ff550

xfs: embed the pagb_list array in the perag structure · e57336ff

Authored by Dave Chinner on Jan 11, 2010

Now that the perag structure is allocated memory rather than held in
an array, we don't need to have the busy extent array external to
the structure. Embed it into the perag structure to avoid needing an
extra allocation when setting up.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

e57336ff

xfs: handle ENOMEM correctly during initialisation of perag structures · 8b26c582

Authored by Dave Chinner on Jan 11, 2010

Add proper error handling in case an error occurs while initializing
new perag structures for a mount point.  The mount structure is
restored to its previous state by deleting and freeing any perag
structures added during the call.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

8b26c582
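The unwind-on-failure pattern described in the commit above can be sketched like this: if an allocation fails partway through, everything added during the call is freed and the table is left as it was. All names are hypothetical, and the injectable allocator exists only so the failure path can be exercised in userspace.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-AG structure. */
struct perag_model {
    unsigned int agno;
};

/* Allocator hook; lets a caller simulate running out of memory. */
typedef void *(*alloc_fn)(size_t size);

/* Fault-injection allocator: fails once allocs_left reaches 0; negative means never fail. */
static int allocs_left = -1;

static void *limited_alloc(size_t size)
{
    if (allocs_left == 0)
        return NULL;
    if (allocs_left > 0)
        allocs_left--;
    return malloc(size);
}

/*
 * Populate table[old_count .. new_count-1]. On failure, free everything added
 * by this call so the table looks exactly as it did on entry, and return -1.
 */
static int init_perags(struct perag_model **table, unsigned int old_count,
                       unsigned int new_count, alloc_fn alloc)
{
    unsigned int agno;

    for (agno = old_count; agno < new_count; agno++) {
        struct perag_model *pag = alloc(sizeof(*pag));

        if (!pag)
            goto out_unwind;
        pag->agno = agno;
        table[agno] = pag;
    }
    return 0;

out_unwind:
    /* agno is the index that failed; walk back over the entries added so far. */
    while (agno > old_count) {
        agno--;
        free(table[agno]);
        table[agno] = NULL;
    }
    return -1;
}
```

Entries that existed before the call (indexes below `old_count`) are never touched by the unwind path.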

xfs: Kill filestreams cache flush · b657fc82

Authored by Dave Chinner on Jan 11, 2010

The filestreams cache flush is not needed in the sync code as it
does not affect data writeback, and it is now not used by the growfs
code, either, so kill it.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

b657fc82

xfs: Add trace points for per-ag refcount debugging. · 0fa800fb

Authored by Dave Chinner on Jan 11, 2010

Uninline xfs_perag_{get,put} so that tracepoints can be inserted
into them to speed debugging of reference count problems.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

0fa800fb

xfs: Reference count per-ag structures · aed3bb90

Authored by Dave Chinner on Jan 11, 2010

Reference count the per-ag structures to ensure that we keep get/put
pairs balanced. Assert that the reference counts are zero at unmount
time to catch leaks. In future, reference counts will enable us to
safely remove perag structures by allowing us to detect when they
are no longer in use.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

aed3bb90
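A minimal sketch of balanced get/put reference counting as described above, using C11 atomics in userspace. The struct and function names are hypothetical stand-ins, not the kernel's xfs_perag_get()/xfs_perag_put() implementations.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical per-AG structure carrying only its reference count. */
struct perag_ref {
    atomic_int ref;
};

/* Take a reference and hand the structure back to the caller. */
static struct perag_ref *perag_get(struct perag_ref *pag)
{
    atomic_fetch_add(&pag->ref, 1);
    return pag;
}

/* Drop a reference; a put without a matching get trips the assertion. */
static void perag_put(struct perag_ref *pag)
{
    int old = atomic_fetch_sub(&pag->ref, 1);

    assert(old > 0);
    (void)old;  /* keeps builds with NDEBUG free of unused-variable warnings */
}

/* Current count; an unmount path would expect this to be zero. */
static int perag_refs(struct perag_ref *pag)
{
    return atomic_load(&pag->ref);
}
```

Checking that `perag_refs()` is zero at teardown is the userspace analogue of the unmount-time assertion the commit adds to catch leaked references.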

xfs: Replace per-ag array with a radix tree · 1c1c6ebc

Authored by Dave Chinner on Jan 11, 2010

The use of an array for the per-ag structures requires reallocation
of the array when growing the filesystem. This requires locking
access to the array to avoid use after free situations, and the
locking is difficult to get right. To avoid needing to reallocate an
array, change the per-ag structures to an allocated object per ag
and index them using a tree structure.

The AGs are always densely indexed (hence the use of an array), but
the number supported is 2^32 and lookups tend to be random and hence
indexing needs to scale. A simple choice is a radix tree - it works
well with this sort of index. This change also removes another
large contiguous allocation from the mount/growfs path in XFS.

The growing process now needs to change to only initialise the new
AGs required for the extra space, and as such only needs to
exclusively lock the tree for inserts. The rest of the code only
needs to lock the tree while doing lookups, and hence this will
remove all the deadlocks that currently occur on the m_perag_lock as
it is now an innermost lock. The lock is also changed to a spinlock
from a read/write lock as the hold time is now extremely short.

To complete the picture, the per-ag structures will need to be
reference counted to ensure that we don't free/modify them while
they are still in use. This will be done in a subsequent patch.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

1c1c6ebc
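To show why a tree index avoids the reallocation problem described above, here is a deliberately tiny fixed-height radix tree keyed by AG number. It is a sketch under stated assumptions: it is not the kernel's radix tree API, it has only two levels of 64 slots (so at most 4096 keys), it does no locking, and all `rt_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define RT_BITS   6u
#define RT_SLOTS  (1u << RT_BITS)        /* 64 slots per node */
#define RT_MASK   (RT_SLOTS - 1u)
#define RT_MAX    (RT_SLOTS * RT_SLOTS)  /* 4096 keys in this toy tree */

struct rt_leaf {
    void *items[RT_SLOTS];
};

struct rt_root {
    struct rt_leaf *leaves[RT_SLOTS];
};

/* Insert an item; leaves are allocated on demand and existing entries never move. */
static int rt_insert(struct rt_root *root, uint32_t agno, void *item)
{
    struct rt_leaf **slot;

    if (agno >= RT_MAX)
        return -1;
    slot = &root->leaves[agno >> RT_BITS];
    if (!*slot) {
        *slot = calloc(1, sizeof(**slot));
        if (!*slot)
            return -1;
    }
    (*slot)->items[agno & RT_MASK] = item;
    return 0;
}

/* Look up an item: the high bits select the leaf, the low bits select the slot. */
static void *rt_lookup(const struct rt_root *root, uint32_t agno)
{
    const struct rt_leaf *leaf;

    if (agno >= RT_MAX)
        return NULL;
    leaf = root->leaves[agno >> RT_BITS];
    return leaf ? leaf->items[agno & RT_MASK] : NULL;
}

/* Free the leaves; the items themselves belong to the caller. */
static void rt_destroy(struct rt_root *root)
{
    for (uint32_t i = 0; i < RT_SLOTS; i++) {
        free(root->leaves[i]);
        root->leaves[i] = NULL;
    }
}
```

Growing the index only adds new leaves, so pointers obtained from earlier lookups stay valid. That is the property that removes the need to reallocate, and lock around, one large array.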

xfs: convert remaining direct references to m_perag · 44b56e0a

Authored by Dave Chinner on Jan 11, 2010

Convert the remaining direct lookups of the per ag structures to use
get/put accesses. Ensure that the loops across AGs and prior users
of the interface balance gets and puts correctly.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

44b56e0a

xfs: Convert filestreams code to use per-ag get/put routines · 4196ac08

Authored by Dave Chinner on Jan 11, 2010

Use xfs_perag_get() and xfs_perag_put() in the filestreams code.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

4196ac08

xfs: Don't directly reference m_perag in allocation code · a862e0fd

Authored by Dave Chinner on Jan 11, 2010

Start abstracting the perag references so that the indexing of the
structures is not directly coded into all the places that use the
perag structures. This will allow us to separate the use of the
perag structure and the way it is indexed and hence avoid the known
deadlocks related to growing a busy filesystem.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

a862e0fd

xfs: rename xfs_get_perag · 5017e97d

Authored by Dave Chinner on Jan 11, 2010

xfs_get_perag is really getting the perag that an inode belongs to
based on its inode number. Convert the use of this function to just
get the perag from a provided ag number.  Use this new function to
obtain the per-ag structure when traversing the per AG inode trees
for sync and reclaim.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

5017e97d

xfs: Don't wake xfsbufd when idle · c9c12971

Authored by Dave Chinner on Jan 11, 2010

The xfsbufd wakes every xfsbufd_centisecs (once per second by
default) for each filesystem even when the filesystem is idle.  If
the xfsbufd has nothing to do, put it into a long term sleep and
only wake it up when there is work pending (i.e. dirty buffers to
flush soon). This will make laptop power misers happy.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>

c9c12971
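The idle behaviour described in the commit above comes down to how the flusher picks its next sleep. A minimal sketch, assuming a hypothetical helper name and using LONG_MAX as a stand-in for "sleep until explicitly woken":

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for an indefinite sleep that ends only on an explicit wakeup. */
#define SLEEP_FOREVER LONG_MAX

/*
 * Pick the flusher's next sleep: with nothing queued there is no reason to
 * wake periodically, so sleep until new work arrives; otherwise use the
 * regular flush interval.
 */
static long flusher_next_timeout(bool queue_empty, long interval_ticks)
{
    return queue_empty ? SLEEP_FOREVER : interval_ticks;
}
```

For this to work, whoever queues the first dirty buffer has to wake the flusher; otherwise an idle flusher would never notice the new work.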

openanolis / cloud-kernel: synced successfully almost 2 years ago
