1. 15 May 2012 (2 commits)
    • xfs: on-stack delayed write buffer lists · 43ff2122
      Committed by Christoph Hellwig
      Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
      and write back the buffers per-process instead of by waking up xfsbufd.
      
      This is now easily doable given that we have very few places left that write
      delwri buffers:
      
       - log recovery:
      	Only done at mount time, and already forcing out the buffers
      	synchronously using xfs_flush_buftarg
      
       - quotacheck:
      	Same story.
      
       - dquot reclaim:
      	Writes out dirty dquots on the LRU under memory pressure.  We might
      	want to look into doing more of this via the xfsaild, but it's already
      	better than the synchronous inode reclaim that writes each
      	buffer synchronously.
      
       - xfsaild:
      	This is the main beneficiary of the change.  By keeping a local list
      	of buffers to write we reduce the latency of writing out buffers, and
      	more importantly we can remove all the delwri list promotions which
      	were hitting the buffer cache hard under sustained metadata loads.
      
      The implementation is very straightforward - xfs_buf_delwri_queue now gets
      a new list_head pointer that it adds the delwri buffers to, and all callers
      need to eventually submit the list using xfs_buf_delwri_submit or
      xfs_buf_delwri_submit_nowait.  Buffers that are already on a delwri list are
      skipped in xfs_buf_delwri_queue, on the assumption that they will be
      submitted from that other list.  The biggest change required to pass down
      the buffer list was to the AIL pushing code. Now that we operate on buffers,
      the trylock, push and pushbuf log item methods are merged into a single push
      routine, which tries to lock the item and, if possible, adds the buffer that
      needs writeback to the buffer list.  This leads to much simpler code than
      the previous split, but requires the individual IOP_PUSH instances to unlock
      and reacquire the AIL lock around calls to blocking routines.
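      
      A hedged sketch of the resulting calling pattern (the function names
      come from the description above; the exact signatures are assumptions):
      
      	LIST_HEAD(buffer_list);		/* local on-stack delwri list */
      
      	/* queue each buffer that needs writeback onto the local list */
      	xfs_buf_delwri_queue(bp, &buffer_list);
      
      	/* later, submit (and here, wait for) the whole list at once */
      	error = xfs_buf_delwri_submit(&buffer_list);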
      
      Given that the xfsaild threads now also handle writing out buffers, the
      conditions for log forcing and the sleep times needed some small changes.
      The most important one is that we consider an AIL busy as long as we still
      have buffers to push, and the other one is that we increment the pushed LSN
      for buffers that are currently being flushed, but still count them towards
      the stuck items for restart purposes.  Without this we could hammer on
      stuck items without ever forcing the log, and make no progress under heavy
      random delete workloads on fast flash storage devices.
      
      [ Dave Chinner:
      	- rebase on previous patches.
      	- improved comments for XBF_DELWRI_Q handling
      	- fix XBF_ASYNC handling in queue submission (test 106 failure)
      	- rename delwri submit function buffer list parameters for clarity
      	- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: do not write the buffer from xfs_iflush · 4c46819a
      Committed by Christoph Hellwig
      Instead of writing the buffer directly from inside xfs_iflush, return it to
      the caller and let the caller decide what to do with the buffer.  Also
      remove the pincount check in xfs_iflush, which all non-blocking callers
      already implement, and the now unused flags parameter.
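      
      A minimal sketch of the new calling convention, assuming a synchronous
      caller (the xfs_bwrite/xfs_buf_relse pairing is an illustrative choice,
      not mandated by the change):
      
      	struct xfs_buf	*bp;
      	int		error;
      
      	error = xfs_iflush(ip, &bp);	/* flush inode to its buffer */
      	if (!error) {
      		error = xfs_bwrite(bp);	/* caller decides how to write */
      		xfs_buf_relse(bp);
      	}
      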
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  2. 14 March 2012 (4 commits)
  3. 23 February 2012 (1 commit)
  4. 18 January 2012 (3 commits)
  5. 09 December 2011 (1 commit)
  6. 09 November 2011 (1 commit)
  7. 12 October 2011 (2 commits)
  8. 13 July 2011 (1 commit)
  9. 08 July 2011 (1 commit)
  10. 07 July 2011 (1 commit)
    • xfs: unpin stale inodes directly in IOP_COMMITTED · 1316d4da
      Committed by Dave Chinner
      When inodes are marked stale in a transaction, they are treated
      specially when the inode log item is being inserted into the AIL.
      It tries to avoid moving the log item forward in the AIL due to a
      race condition with writing the underlying buffer back to disk.
      This was "fixed" in commit de25c181 ("xfs: avoid moving stale inodes
      in the AIL").
      
      To avoid moving the item forward, we return an LSN smaller than the
      commit_lsn of the completing transaction, thereby trying to trick
      the commit code into not moving the inode forward at all. I'm not
      sure this ever worked as intended - it assumes the inode is already
      in the AIL, but I don't think the returned LSN would have been small
      enough to prevent moving the inode. It appears that the reason it
      worked is that the lower LSN of the inodes meant they were inserted
      into the AIL and flushed before the inode buffer (which was moved to
      the commit_lsn of the transaction).
      
      The big problem is that with delayed logging, returning the
      different LSN means insertion takes the slow, non-bulk path.  Worse
      yet, insertion is to a position -before- the commit_lsn, so it
      is doing an AIL traversal on every insertion, and has to walk over
      all the items that have already been inserted into the AIL. It's
      expensive.
      
      To compound the matter further, with delayed logging inodes are
      likely to go from clean to stale in a single checkpoint, which means
      they aren't even in the AIL at all when we come across them at AIL
      insertion time. Hence these were all getting inserted into the AIL
      when they simply do not need to be, as inodes marked XFS_ISTALE are
      never written back.
      
      Transactional/recovery integrity is maintained in this case by the
      other items in the unlink transaction that were modified (e.g. the
      AGI btree blocks) and committed in the same checkpoint.
      
      So to fix this, simply unpin the stale inodes directly in
      xfs_inode_item_committed() and return -1 to indicate that the AIL
      insertion code does not need to do any further processing of these
      inodes.
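      
      A minimal sketch of the fix as described (the log-item boilerplate and
      the INODE_ITEM container macro follow the usual XFS idiom; treat the
      details as assumptions):
      
      	STATIC xfs_lsn_t
      	xfs_inode_item_committed(
      		struct xfs_log_item	*lip,
      		xfs_lsn_t		lsn)
      	{
      		struct xfs_inode_log_item *iip = INODE_ITEM(lip);
      		struct xfs_inode	*ip = iip->ili_inode;
      
      		if (xfs_iflags_test(ip, XFS_ISTALE)) {
      			xfs_inode_item_unpin(lip, 0);
      			return (xfs_lsn_t)-1;	/* skip AIL insertion */
      		}
      		return lsn;
      	}
      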
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  11. 29 April 2011 (1 commit)
  12. 08 April 2011 (1 commit)
    • xfs: fix extent format buffer allocation size · e828776a
      Committed by Dave Chinner
      When formatting an inode item, we have to allocate a separate buffer
      to hold extents when there are delayed allocation extents on the
      inode and it is in extent format. The allocation size is derived
      from the in-core data fork representation, which accounts for
      delayed allocation extents, while the on-disk representation does
      not contain any delalloc extents.
      
      As a result of this mismatch, the allocated buffer can be far larger
      than needed to hold the real extent list which, due to the fact the
      inode is in extent format, is limited to the size of the literal
      area of the inode. However, we can have thousands of delalloc
      extents, resulting in an allocation size orders of magnitude larger
      than is needed to hold all the real extents.
      
      Fix this by limiting the size of the buffer being allocated to the
      size of the literal area of the inodes in the filesystem (i.e. the
      maximum size an inode fork can grow to).
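      
      A hedged sketch of the sizing logic (XFS_IFORK_DSIZE is the standard
      data-fork literal-area macro; the exact expression is an assumption):
      
      	int	size = ip->i_df.if_bytes;	/* may include delalloc */
      
      	/* cap at what can fit in the inode literal area */
      	if (size > XFS_IFORK_DSIZE(ip))
      		size = XFS_IFORK_DSIZE(ip);
      	ext_buffer = kmem_alloc(size, KM_SLEEP);
      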
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
  13. 26 March 2011 (1 commit)
    • xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04
      Committed by Dave Chinner
      There is an ABBA deadlock between synchronous inode flushing in
      xfs_reclaim_inode and xfs_ifree_cluster. xfs_ifree_cluster locks the
      buffer, then takes inode ilocks, whilst synchronous reclaim takes
      the ilock followed by the buffer lock in xfs_iflush().
      
      To avoid this deadlock, separate the inode cluster buffer locking
      semantics from the synchronous inode flush semantics, allowing
      callers to attempt to lock the buffer but still issue synchronous IO
      if they can get the buffer. This requires xfs_iflush() calls that
      currently use non-blocking semantics to pass SYNC_TRYLOCK rather
      than 0 as the flags parameter.
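      
      A hedged sketch of the reclaim-side pattern this enables (the EAGAIN
      convention for a failed buffer trylock is an assumption based on the
      description):
      
      	error = xfs_iflush(ip, SYNC_TRYLOCK);
      	if (error == EAGAIN) {
      		/* no buffer lock: drop the ilock and let reclaim
      		 * restart the attempt on this inode later */
      		xfs_iunlock(ip, XFS_ILOCK_EXCL);
      	}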
      
      This allows xfs_reclaim_inode to avoid the deadlock on the buffer
      lock and detect the failure so that it can drop the inode ilock and
      restart the reclaim attempt on the inode. That in turn allows
      xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
      release it, and hence defuses the deadlock situation. It also has the
      pleasant side effect of avoiding IO in xfs_reclaim_inode when it
      next tries to reclaim the inode, as it is now marked stale.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
  14. 20 December 2010 (1 commit)
    • xfs: remove all the inodes on a buffer from the AIL in bulk · 30136832
      Committed by Dave Chinner
      When inode buffer IO completes, usually all of the inodes are removed from
      the AIL. This involves processing them one at a time and taking the AIL
      lock once for every inode. When all CPUs are processing inode IO
      completions, this causes excessive amounts of contention on the AIL lock.
      
      Instead, change the way we process inode IO completion in the buffer
      IO done callback. Allow the inode IO done callback to walk the list
      of IO done callbacks, pull all the inodes off the buffer in one
      go, and then process them as a batch.
      
      Once all the inodes for removal are collected, take the AIL lock
      once and do a bulk removal operation to minimise traffic on the AIL
      lock.
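      
      A hedged sketch of the batch (the bulk-delete helper follows the
      description; the batch size and the gathering step are made up for
      illustration):
      
      	struct xfs_log_item	*log_items[64];	/* hypothetical batch size */
      	int			count = 0;
      
      	/* walk the buffer's IO done callbacks, pulling each flushed
      	 * inode log item into log_items[count++] ... */
      
      	/* ... then take the AIL lock once for the whole batch */
      	spin_lock(&ailp->xa_lock);
      	xfs_trans_ail_delete_bulk(ailp, log_items, count); /* assumed to drop xa_lock */
      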
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  15. 01 December 2010 (1 commit)
    • xfs: avoid moving stale inodes in the AIL · de25c181
      Committed by Dave Chinner
      When an inode has been marked stale because the cluster is being
      freed, we don't want to (re-)insert this inode into the AIL. There
      is a race condition where the cluster buffer may be unpinned before
      the inode is inserted into the AIL during transaction committed
      processing. If the buffer is unpinned before the inode item has been
      committed and inserted, then it is possible for the buffer to be
      released and hence process the stale inode callbacks before the inode
      is inserted into the AIL.
      
      In this case, we then insert a clean, stale inode into the AIL which
      will never get removed by an IO completion. It will, however, get
      reclaimed and that triggers an assert in xfs_inode_free()
      complaining about freeing an inode still in the AIL.
      
      This race can be avoided by not moving stale inodes forward in the AIL
      during transaction commit completion processing. This closes the
      race condition by ensuring we never insert clean stale inodes into
      the AIL. It is safe to do this because a dirty stale inode, by
      definition, must already be in the AIL.
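      
      A minimal sketch of the check this implies in the inode item's
      committed handler (returning any LSN below the commit_lsn leaves the
      item where it is; the exact form is an assumption):
      
      	/* stale inodes are never written back; don't move them forward */
      	if (xfs_iflags_test(ip, XFS_ISTALE))
      		return lsn - 1;
      	return lsn;
      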
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  16. 19 October 2010 (1 commit)
    • xfs: don't use vfs writeback for pure metadata modifications · dcd79a14
      Committed by Dave Chinner
      Under heavy multi-way parallel create workloads, the VFS struggles
      to write back all the inodes that have been changed in age order.
      The bdi flusher thread becomes CPU bound, spending 85% of its time
      in the VFS code, mostly traversing the superblock dirty inode list
      to separate out dirty inodes old enough to flush.
      
      We already keep an index of all metadata changes in age order - in
      the AIL - and continued log pressure will do age ordered writeback
      without any extra overhead at all. If there is no pressure on the
      log, the xfssyncd will periodically write back metadata in ascending
      disk address offset order, so it will be very efficient.
      
      Hence we can stop marking VFS inodes dirty during transaction commit
      or when changing timestamps during transactions. This restricts the
      inodes on the superblock dirty list to those containing data or
      unlogged metadata changes.
      
      However, the timestamp changes are slightly more complex than this -
      there are a couple of places that do unlogged updates of the
      timestamps, and the VFS needs to be informed of these. Hence add a
      new function xfs_trans_ichgtime() for transactional changes,
      and leave xfs_ichgtime() for the non-transactional changes.
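      
      A hedged usage sketch (the XFS_ICHGTIME_* flag names follow the
      existing xfs_ichgtime() convention):
      
      	/* logged timestamp update, inside a transaction */
      	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
      
      	/* unlogged update: stays on the VFS-visible path */
      	xfs_ichgtime(ip, XFS_ICHGTIME_MOD);
      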
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  17. 27 July 2010 (10 commits)
  18. 19 May 2010 (3 commits)
  19. 02 March 2010 (1 commit)
  20. 02 February 2010 (1 commit)
    • xfs: Don't issue buffer IO direct from AIL push V2 · d808f617
      Committed by Dave Chinner
      All buffers logged into the AIL are marked as delayed write.
      When the AIL needs to push the buffer out, it issues an async write of the
      buffer. This means that IO patterns are dependent on the order of
      buffers in the AIL.
      
      Instead of flushing the buffer, promote the buffer in the delayed
      write list so that the next time the xfsbufd is run the buffer will
      be flushed by the xfsbufd. Return state to the xfsaild indicating
      that the buffer was promoted, so that the xfsaild knows it needs to
      make the xfsbufd run to flush the buffers that were promoted.
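      
      A hedged sketch of the push-side change (xfs_buf_delwri_promote is
      named after the behaviour described; how the xfsbufd is woken is left
      as a comment because that mechanism is an assumption):
      
      	/* from the log item push path, instead of issuing IO: */
      	xfs_buf_delwri_promote(bp);	/* move to head of delwri queue */
      	/* the xfsaild then makes the xfsbufd run to do the actual IO */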
      
      Using the xfsbufd for issuing the IO allows us to dispatch all
      buffer IO from the one queue. This means that we can make much more
      enlightened decisions on what order to flush buffers to disk as
      we don't have multiple places issuing IO. Optimisations to xfsbufd
      will be in a future patch.
      
      Version 2
      - kill XFS_ITEM_FLUSHING as it is now unused.
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  21. 06 February 2010 (1 commit)
    • xfs: Use delayed write for inodes rather than async V2 · c854363e
      Committed by Dave Chinner
      We currently do background inode flush asynchronously, resulting in
      inodes being written in whatever order the background writeback
      issues them. Not only that, there are also blocking and non-blocking
      asynchronous inode flushes, depending on where the flush comes from.
      
      This patch completely removes asynchronous inode writeback. It
      removes all the strange writeback modes and replaces them with
      either a synchronous flush or a non-blocking delayed write flush.
      That is, inode flushes will only issue IO directly if they are
      synchronous, and background flushing may do nothing if the operation
      would block (e.g. on a pinned inode or buffer lock).
      
      Delayed write flushes will now result in the inode buffer sitting in
      the delwri queue of the buffer cache to be flushed by either an AIL
      push or by the xfsbufd timing out the buffer. This will allow
      accumulation of dirty inode buffers in memory and allow optimisation
      of inode cluster writeback at the xfsbufd level where we have much
      greater queue depths than the block layer elevators. We will also
      get adjacent inode cluster buffer IO merging for free when a later
      patch in the series allows sorting of the delayed write buffers
      before dispatch.
      
      This effectively means that any inode that is written back by
      background writeback will be seen as flush locked during AIL
      pushing, and will result in the buffers being pushed from there.
      This writeback path is currently non-optimal, but the next patch
      in the series will fix that problem.
      
      A side effect of this delayed write mechanism is that background
      inode reclaim will no longer directly flush inodes, nor can it wait
      on the flush lock. The result is that inode reclaim must leave the
      inode in the reclaimable state until it is clean. Hence attempts to
      reclaim a dirty inode in the background will simply skip the inode
      until it is clean and this allows other mechanisms (i.e. xfsbufd) to
      do more optimal writeback of the dirty buffers. As a result, the
      inode reclaim code has been rewritten so that it no longer relies on
      the ambiguous return values of xfs_iflush() to determine whether it
      is safe to reclaim an inode.
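      
      A hedged sketch of the rewritten reclaim decision (the flag and helper
      names follow the XFS code of that era; the exact flow is an
      assumption):
      
      	/* a dirty inode whose flush lock we cannot get stays reclaimable */
      	if (!xfs_iflock_nowait(ip)) {
      		if (!(sync_mode & SYNC_WAIT))
      			goto out;	/* skip now, revisit when clean */
      		xfs_iflock(ip);		/* synchronous reclaim waits */
      	}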
      
      Portions of this patch are derived from patches by Christoph
      Hellwig.
      
      Version 2:
      - cleanup reclaim code as suggested by Christoph
      - log background reclaim inode flush errors
      - just pass sync flags to xfs_iflush
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  22. 22 January 2010 (1 commit)
    • xfs: cleanup up xfs_log_force calling conventions · a14a348b
      Committed by Christoph Hellwig
      Remove the XFS_LOG_FORCE argument which was always set, and the
      XFS_LOG_URGE define, which was never used.
      
      Split xfs_log_force into two helpers - xfs_log_force, which forces
      the whole log, and xfs_log_force_lsn, which forces up to the
      specified LSN.  The underlying implementations already were entirely
      separate, as were the users.
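      
      A hedged usage sketch of the split interface (the XFS_LOG_SYNC flag
      for a synchronous force is an assumption; passing 0 would request an
      asynchronous force):
      
      	/* force the entire log to disk */
      	xfs_log_force(mp, XFS_LOG_SYNC);
      
      	/* force the log only up to a specific LSN */
      	xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC);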
      
      Also re-indent the new _xfs_log_force/_xfs_log_force_lsn, which
      previously had a weird coding style.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>