提交 · 1bfd8d04190c615bb8d1d98188dead0c09702208 · openeuler / Kernel

26 3月, 2011 1 次提交

xfs: introduce inode cluster buffer trylocks for xfs_iflush · 1bfd8d04

由 Dave Chinner 提交于 3月 26, 2011

There is an ABBA deadlock between synchronous inode flushing in
xfs_reclaim_inode and xfs_icluster_free. xfs_icluster_free locks the
buffer, then takes inode ilocks, whilst synchronous reclaim takes
the ilock followed by the buffer lock in xfs_iflush().

To avoid this deadlock, separate the inode cluster buffer locking
semantics from the synchronous inode flush semantics, allowing
callers to attempt to lock the buffer but still issue synchronous IO
if it can get the buffer. This requires xfs_iflush() calls that
currently use non-blocking semantics to pass SYNC_TRYLOCK rather
than 0 as the flags parameter.

This allows xfs_reclaim_inode to avoid the deadlock on the buffer
lock and detect the failure so that it can drop the inode ilock and
restart the reclaim attempt on the inode. This allows
xfs_ifree_cluster to obtain the inode lock, mark the inode stale and
release it and hence defuse the deadlock situation. It also has the
pleasant side effect of avoiding IO in xfs_reclaim_inode when it
tries to next reclaim the inode as it is now marked stale.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

1bfd8d04

07 3月, 2011 1 次提交

xfs: Convert linux-2.6/ files to new logging interface · 4f10700a

由 Dave Chinner 提交于 3月 07, 2011

Convert the files in fs/xfs/linux-2.6/ to use the new xfs_<level>
logging format that replaces the old Irix inherited cmn_err()
interfaces. While there, also convert naked printk calls to use the
relevant xfs logging function to standardise output format.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

4f10700a

12 1月, 2011 1 次提交

xfs: ensure log covering transactions are synchronous · c58efdb4

由 Dave Chinner 提交于 1月 04, 2011

To ensure the log is covered and the filesystem idles correctly, we
need to ensure that dummy transactions hit the disk and do not stay
pinned in memory.  If the superblock is pinned in memory, it can't
be flushed so the log covering cannot make progress. The result is
dependent on timing - more oftent han not we continue to issues a
log covering transaction every 36s rather than idling after ~90s.

Fix this by making the log covering transaction synchronous. To
avoid additional log force from xfssyncd, make the log covering
transaction take the place of the existing log force in the xfssyncd
background sync process.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c58efdb4

16 12月, 2010 1 次提交

xfs: convert pag_ici_lock to a spin lock · 1a427ab0

由 Dave Chinner 提交于 12月 16, 2010

now that we are using RCU protection for the inode cache lookups,
the lock is only needed on the modification side. Hence it is not
necessary for the lock to be a rwlock as there are no read side
holders anymore. Convert it to a spin lock to reflect it's exclusive
nature.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

1a427ab0

17 12月, 2010 1 次提交

xfs: convert inode cache lookups to use RCU locking · 1a3e8f3d

由 Dave Chinner 提交于 12月 17, 2010

With delayed logging greatly increasing the sustained parallelism of inode
operations, the inode cache locking is showing significant read vs write
contention when inode reclaim runs at the same time as lookups. There is
also a lot more write lock acquistions than there are read locks (4:1 ratio)
so the read locking is not really buying us much in the way of parallelism.

To avoid the read vs write contention, change the cache to use RCU locking on
the read side. To avoid needing to RCU free every single inode, use the built
in slab RCU freeing mechanism. This requires us to be able to detect lookups of
freed inodes, so enѕure that ever freed inode has an inode number of zero and
the XFS_IRECLAIM flag set. We already check the XFS_IRECLAIM flag in cache hit
lookup path, but also add a check for a zero inode number as well.

We canthen convert all the read locking lockups to use RCU read side locking
and hence remove all read side locking.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

1a3e8f3d

11 11月, 2010 1 次提交

xfs: fix per-ag reference counting in inode reclaim tree walking · f83282a8

由 Dave Chinner 提交于 11月 08, 2010

The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

f83282a8

19 10月, 2010 6 次提交

xfs: serialise inode reclaim within an AG · 69b491c2

由 Dave Chinner 提交于 9月 27, 2010

Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention an
slowdowns occur.

Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if it can't get the lock, and if
we can't get any AG, then start blocking on locks.

To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the satart of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

69b491c2

xfs: batch inode reclaim lookup · e3a20c0b

由 Dave Chinner 提交于 9月 24, 2010

Batch and optimise the per-ag inode lookup for reclaim to minimise
scanning overhead. This involves gang lookups on the radix trees to
get multiple inodes during each tree walk, and tighter validation of
what inodes can be reclaimed without blocking befor we take any
locks.

This is based on ideas suggested in a proof-of-concept patch
posted by Nick Piggin.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e3a20c0b

xfs: implement batched inode lookups for AG walking · 78ae5256

由 Dave Chinner 提交于 9月 28, 2010

With the reclaim code separated from the generic walking code, it is
simple to implement batched lookups for the generic walk code.
Separate out the inode validation from the execute operations and
modify the tree lookups to get a batch of inodes at a time.

Reclaim operations will be optimised separately.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

78ae5256

xfs: split out inode walk inode grabbing · e13de955

由 Dave Chinner 提交于 9月 28, 2010

When doing read side inode cache walks, the code to validate and
grab an inode is common to all callers. Split it out of the execute
callbacks in preparation for batching lookups. Similarly, split out
the inode reference dropping from the execute callbacks into the
main lookup look to be symmetric with the grab.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e13de955

xfs: split inode AG walking into separate code for reclaim · 65d0f205

由 Dave Chinner 提交于 9月 24, 2010

The reclaim walk requires different locking and has a slightly
different walk algorithm, so separate it out so that it can be
optimised separately.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

65d0f205

xfs: lockless per-ag lookups · e176579e

由 Dave Chinner 提交于 9月 22, 2010

When we start taking a reference to the per-ag for every cached
buffer in the system, kernel lockstat profiling on an 8-way create
workload shows the mp->m_perag_lock has higher acquisition rates
than the inode lock and has significantly more contention. That is,
it becomes the highest contended lock in the system.

The perag lookup is trivial to convert to lock-less RCU lookups
because perag structures never go away. Hence the only thing we need
to protect against is tree structure changes during a grow. This can
be done simply by replacing the locking in xfs_perag_get() with RCU
read locking. This removes the mp->m_perag_lock completely from this
path.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e176579e

07 10月, 2010 1 次提交

xfs: properly account for reclaimed inodes · 081003ff

由 Johannes Weiner 提交于 10月 01, 2010

When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.

When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree.  Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.

Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.

The counters on the other hand signal an increasing amount of slab
objects to reclaim.  Since "70e60ce7 xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.

But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.

Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

081003ff

24 8月, 2010 1 次提交

xfs: dummy transactions should not dirty VFS state · 1a387d3b

由 Dave Chinner 提交于 8月 24, 2010

When we  need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.

As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.

To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this occurs, make sure
that dummy transactions log an unchanging field in the superblock as
it's state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

1a387d3b

27 7月, 2010 5 次提交

xfs: simplify and remove xfs_ireclaim · 2f11feab

由 Dave Chinner 提交于 7月 20, 2010

xfs_ireclaim has to get and put te pag structure because it is only
called with the inode to reclaim. The one caller of this function
already has a reference on the pag and a pointer to is, so move the
radix tree delete to the caller and remove xfs_ireclaim completely.
This avoids a xfs_perag_get/put on every inode being reclaimed.

The overhead was noticed in a bug report at:

https://bugzilla.kernel.org/show_bug.cgi?id=16348Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

2f11feab

xfs: remove explicit xfs_sync_data/xfs_sync_attr calls on umount · 64c86149

由 Christoph Hellwig 提交于 6月 24, 2010

On the final put of a superblock the VFS already calls sync_filesystem
for us to write out all data and wait for it.  No need to start another
asynchronous writeback inside ->put_super.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

64c86149

xfs: simplify inode to transaction joining · 898621d5

由 Christoph Hellwig 提交于 6月 24, 2010

Currently we need to either call IHOLD or xfs_trans_ihold on an inode when
joining it to a transaction via xfs_trans_ijoin.

This patches instead makes xfs_trans_ijoin usable on it's own by doing
an implicity xfs_trans_ihold, which also allows us to drop the third
argument. For the case where we want to hold a reference on the inode
a xfs_trans_ijoin_ref wrapper is added which does the IHOLD and marks
the inode for needing an xfs_iput. In addition to the cleaner interface
to the caller this also simplifies the implementation.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

898621d5

C
xfs: remove unneeded #include statements · 3400777f
由 Christoph Hellwig 提交于 6月 23, 2010
```
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>
```
3400777f

xfs: drop dmapi hooks · 288699fe

由 Christoph Hellwig 提交于 6月 23, 2010

Dmapi support was never merged upstream, but we still have a lot of hooks
bloating XFS for it, all over the fast pathes of the filesystem.

This patch drops over 700 lines of dmapi overhead.  If we'll ever get HSM
support in mainline at least the namespace events can be done much saner
in the VFS instead of the individual filesystem, so it's not like this
is much help for future work.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

288699fe

20 7月, 2010 2 次提交

xfs: track AGs with reclaimable inodes in per-ag radix tree · 16fd5367

由 Dave Chinner 提交于 7月 20, 2010

https://bugzilla.kernel.org/show_bug.cgi?id=16348

When the filesystem grows to a large number of allocation groups,
the summing of recalimable inodes gets expensive. In many cases,
most AGs won't have any reclaimable inodes and so we are wasting CPU
time aggregating over these AGs. This is particularly important for
the inode shrinker that gets called frequently under memory
pressure.

To avoid the overhead, track AGs with reclaimable inodes in the
per-ag radix tree so that we can find all the AGs with reclaimable
inodes via a simple gang tag lookup. This involves setting the tag
when the first reclaimable inode is tracked in the AG, and removing
the tag when the last reclaimable inode is removed from the tree.
Then the summation process becomes a loop walking the radix tree
summing AGs with the reclaim tag set.

This significantly reduces the overhead of scanning - a 6400 AG
filesystea now only uses about 25% of a cpu in kswapd while slab
reclaim progresses instead of being permanently stuck at 100% CPU
and making little progress. Clean filesystems filesystems will see
no overhead and the overhead only increases linearly with the number
of dirty AGs.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

16fd5367

xfs: convert inode shrinker to per-filesystem contexts · 70e60ce7

由 Dave Chinner 提交于 7月 20, 2010

Now the shrinker passes us a context, wire up a shrinker context per
filesystem. This allows us to remove the global mount list and the
locking problems that introduced. It also means that a shrinker call
does not need to traverse clean filesystems before finding a
filesystem with reclaimable inodes.  This significantly reduces
scanning overhead when lots of filesystems are present.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

70e60ce7

19 7月, 2010 1 次提交

mm: add context argument to shrinker callback · 7f8275d0

由 Dave Chinner 提交于 7月 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

7f8275d0

29 5月, 2010 1 次提交

xfs: fix access to upper inodes without inode64 · fb3b504a

由 Christoph Hellwig 提交于 5月 28, 2010

If a filesystem is mounted without the inode64 mount option we
should still be able to access inodes not fitting into 32 bits, just
not created new ones.  For this to work we need to make sure the
inode cache radix tree is initialized for all allocation groups, not
just those we plan to allocate inodes from.  This patch makes sure
we initialize the inode cache radix tree for all allocation groups,
and also cleans xfs_initialize_perag up a bit to separate the
inode32 logical from the general perag structure setup.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

fb3b504a

19 5月, 2010 3 次提交

xfs: enforce synchronous writes in xfs_bwrite · 8c38366f

由 Christoph Hellwig 提交于 3月 12, 2010

xfs_bwrite is used with the intention of synchronously writing out
buffers, but currently it does not actually clear the async flag if
that's left from previous writes but instead implements async
behaviour if it finds it.  Remove the code handling asynchronous
writes as we've got rid of those entirely outside of the log and
delwri buffers, and make sure that we clear the async and read flags
before writing the buffer.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

8c38366f

xfs: remove periodic superblock writeback · df308bcf

由 Christoph Hellwig 提交于 3月 12, 2010

All modifications to the superblock are done transactional through
xfs_trans_log_buf, so there is no reason to initiate periodic
asynchronous writeback.  This only removes the superblock from the
delwri list and will lead to sub-optimal I/O scheduling.

Cut down xfs_sync_fsdata now that it's only used for synchronous
superblock writes and move the log coverage checks into the two
callers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

df308bcf

xfs: add blockdev name to kthreads · e2a07812

由 Jan Engelhardt 提交于 3月 23, 2010

This allows to see in `ps` and similar tools which kthreads are
allotted to which block device/filesystem, similar to what jbd2
does. As the process name is a fixed 16-char array, no extra
space is needed in tasks.

  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
  197 ?        S      0:00  \_ [jbd2/sda2-8]
  198 ?        S      0:00  \_ [ext4-dio-unwrit]
  204 ?        S      0:00  \_ [flush-8:0]
 2647 ?        S      0:00  \_ [xfs_mru_cache]
 2648 ?        S      0:00  \_ [xfslogd/0]
 2649 ?        S      0:00  \_ [xfsdatad/0]
 2650 ?        S      0:00  \_ [xfsconvertd/0]
 2651 ?        S      0:00  \_ [xfsbufd/ram0]
 2652 ?        S      0:00  \_ [xfsaild/ram0]
 2653 ?        S      0:00  \_ [xfssyncd/ram0]
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>

e2a07812

30 4月, 2010 1 次提交

xfs: add a shrinker to background inode reclaim · 9bf729c0

由 Dave Chinner 提交于 4月 29, 2010

On low memory boxes or those with highmem, kernel can OOM before the
background reclaims inodes via xfssyncd. Add a shrinker to run inode
reclaim so that it inode reclaim is expedited when memory is low.

This is more complex than it needs to be because the VM folk don't
want a context added to the shrinker infrastructure. Hence we need
to add a global list of XFS mount structures so the shrinker can
traverse them.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

9bf729c0

17 4月, 2010 1 次提交

xfs: don't warn on EAGAIN in inode reclaim · f1d486a3

由 Dave Chinner 提交于 4月 13, 2010

Any inode reclaim flush that returns EAGAIN will result in the inode
reclaim being attempted again later. There is no need to issue a
warning into the logs about this situation.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

f1d486a3

06 3月, 2010 1 次提交

xfs: check for more work before sleeping in xfssyncd · 20f6b2c7

由 Dave Chinner 提交于 3月 04, 2010

xfssyncd processes a queue of work by detaching the queue and
then iterating over all the work items. It then sleeps for a
time period or until new work comes in. If new work is queued
while xfssyncd is actively processing the detached work queue,
it will not process that new work until after a sleep timeout
or the next work event queued wakes it.

Fix this by checking the work queue again before going to sleep.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

20f6b2c7

02 3月, 2010 1 次提交

xfs: fix locking for inode cache radix tree tag updates · f1f724e4

由 Christoph Hellwig 提交于 3月 01, 2010

The radix-tree code requires it's users to serialize tag updates
against other updates to the tree.  While XFS protects tag updates
against each other it does not serialize them against updates of the
tree contents, which can lead to tag corruption.  Fix the inode
cache to always take pag_ici_lock in exclusive mode when updating
radix tree tags.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NPatrick Schreurs <patrick@news-service.com>
Tested-by: NPatrick Schreurs <patrick@news-service.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

f1f724e4

06 2月, 2010 2 次提交

xfs: Use delayed write for inodes rather than async V2 · c854363e

由 Dave Chinner 提交于 2月 06, 2010

We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them. Not only that, there are also blocking and non-blocking
asynchronous inode flushes, depending on where the flush comes from.

This patch completely removes asynchronous inode writeback. It
removes all the strange writeback modes and replaces them with
either a synchronous flush or a non-blocking delayed write flush.
That is, inode flushes will only issue IO directly if they are
synchronous, and background flushing may do nothing if the operation
would block (e.g. on a pinned inode or buffer lock).

Delayed write flushes will now result in the inode buffer sitting in
the delwri queue of the buffer cache to be flushed by either an AIL
push or by the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators. We will also
get adjacent inode cluster buffer IO merging for free when a later
patch in the series allows sorting of the delayed write buffers
before dispatch.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
This writeback path is currently non-optimal, but the next patch
in the series will fix that problem.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers. As a result, the
inode reclaim code has been rewritten so that it no longer relies on
the ambiguous return values of xfs_iflush() to determine whether it
is safe to reclaim an inode.

Portions of this patch are derived from patches by Christoph
Hellwig.

Version 2:
- cleanup reclaim code as suggested by Christoph
- log background reclaim inode flush errors
- just pass sync flags to xfs_iflush
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

c854363e

xfs: Make inode reclaim states explicit · 777df5af

由 Dave Chinner 提交于 2月 06, 2010

A.K.A.: don't rely on xfs_iflush() return value in reclaim

We have gradually been moving checks out of the reclaim code because
they are duplicated in xfs_iflush(). We've had a history of problems
in this area, and many of them stem from the overloading of the
return values from xfs_iflush() and interaction with inode flush
locking to determine if the inode is safe to reclaim.

With the desire to move to delayed write flushing of inodes and
non-blocking inode tree reclaim walks, the overloading of the
return value of xfs_iflush makes it very difficult to determine
the correct thing to do next.

This patch explicitly re-adds the checks to the inode reclaim code,
removing the reliance on the return value of xfs_iflush() to
determine what to do next. It also means that we can clearly
document all the inode states that reclaim must handle and hence
we can easily see that we handled all the necessary cases.

This also removes the need for the xfs_inode_clean() check in
xfs_iflush() as all callers now check this first (safely).
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

777df5af

22 1月, 2010 2 次提交

xfs: cleanup up xfs_log_force calling conventions · a14a348b

由 Christoph Hellwig 提交于 1月 19, 2010

Remove the XFS_LOG_FORCE argument which was always set, and the
XFS_LOG_URGE define, which was never used.

Split xfs_log_force into a two helpers - xfs_log_force which forces
the whole log, and xfs_log_force_lsn which forces up to the
specified LSN.  The underlying implementations already were entirely
separate, as were the users.

Also re-indent the new _xfs_log_force/_xfs_log_force which
previously had a weird coding style.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

a14a348b

xfs: remove duplicate buffer flags · 0cadda1c

由 Christoph Hellwig 提交于 1月 19, 2010

Currently we define aliases for the buffer flags in various
namespaces, which only adds confusion.  Remove all but the XBF_
flags to clean this up a bit.

Note that we still abuse XFS_B_ASYNC/XBF_ASYNC for some non-buffer
uses, but I'll clean that up later.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

0cadda1c

16 1月, 2010 5 次提交

xfs: Kill filestreams cache flush · b657fc82

由 Dave Chinner 提交于 1月 11, 2010

The filestreams cache flush is not needed in the sync code as it
does not affect data writeback, and it is now not used by the growfs
code, either, so kill it.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b657fc82

xfs: rename xfs_get_perag · 5017e97d

由 Dave Chinner 提交于 1月 11, 2010

xfs_get_perag is really getting the perag that an inode belongs to
based on it's inode number. Convert the use of this function to just
get the perag from a provided ag number.  Use this new function to
obtain the per-ag structure when traversing the per AG inode trees
for sync and reclaim.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5017e97d

xfs: make several more functions static · 5d77c0dc

由 Eric Sandeen 提交于 11月 19, 2009

Just minor housekeeping, a lot more functions can be trivially made
static; others could if we reordered things a bit...
Signed-off-by: NEric Sandeen <sandeen@sandeen.net>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5d77c0dc

xfs: Avoid inodes in reclaim when flushing from inode cache · 018027be

由 Dave Chinner 提交于 1月 10, 2010

The reclaim code will handle flushing of dirty inodes before reclaim
occurs, so avoid them when determining whether an inode is a
candidate for flushing to disk when walking the radix trees.  This
is based on a test patch from Christoph Hellwig.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

018027be

xfs: reclaim inodes under a write lock · c8e20be0

由 Dave Chinner 提交于 1月 10, 2010

Make the inode tree reclaim walk exclusive to avoid races with
concurrent sync walkers and lookups. This is a version of a patch
posted by Christoph Hellwig that avoids all the code duplication.
Signed-off-by: NDave Chinner <david@fromorbit.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c8e20be0

15 12月, 2009 1 次提交

xfs: event tracing support · 0b1b213f

由 Christoph Hellwig 提交于 12月 14, 2009

Convert the old xfs tracing support that could only be used with the
out of tree kdb and xfsidbg patches to use the generic event tracer.

To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
all xfs trace channels by:

   echo 1 > /sys/kernel/debug/tracing/events/xfs/enable

or alternatively enable single events by just doing the same in one
event subdirectory, e.g.

   echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable

or set more complex filters, etc. In Documentation/trace/events.txt
all this is desctribed in more detail.  To reads the events do a

   cat /sys/kernel/debug/tracing/trace

Compared to the last posting this patch converts the tracing mostly to
the one tracepoint per callsite model that other users of the new
tracing facility also employ.  This allows a very fine-grained control
of the tracing, a cleaner output of the traces and also enables the
perf tool to use each tracepoint as a virtual performance counter,
     allowing us to e.g. count how often certain workloads git various
     spots in XFS.  Take a look at

    http://lwn.net/Articles/346470/

for some examples.

Also the btree tracing isn't included at all yet, as it will require
additional core tracing features not in mainline yet, I plan to
deliver it later.

And the really nice thing about this patch is that it actually removes
many lines of code while adding this nice functionality:

 fs/xfs/Makefile                |    8
 fs/xfs/linux-2.6/xfs_acl.c     |    1
 fs/xfs/linux-2.6/xfs_aops.c    |   52 -
 fs/xfs/linux-2.6/xfs_aops.h    |    2
 fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
 fs/xfs/linux-2.6/xfs_buf.h     |   33
 fs/xfs/linux-2.6/xfs_fs_subr.c |    3
 fs/xfs/linux-2.6/xfs_ioctl.c   |    1
 fs/xfs/linux-2.6/xfs_ioctl32.c |    1
 fs/xfs/linux-2.6/xfs_iops.c    |    1
 fs/xfs/linux-2.6/xfs_linux.h   |    1
 fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
 fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
 fs/xfs/linux-2.6/xfs_super.c   |  104 ---
 fs/xfs/linux-2.6/xfs_super.h   |    7
 fs/xfs/linux-2.6/xfs_sync.c    |    1
 fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
 fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/linux-2.6/xfs_vnode.h   |    4
 fs/xfs/quota/xfs_dquot.c       |  110 ---
 fs/xfs/quota/xfs_dquot.h       |   21
 fs/xfs/quota/xfs_qm.c          |   40 -
 fs/xfs/quota/xfs_qm_syscalls.c |    4
 fs/xfs/support/ktrace.c        |  323 ---------
 fs/xfs/support/ktrace.h        |   85 --
 fs/xfs/xfs.h                   |   16
 fs/xfs/xfs_ag.h                |   14
 fs/xfs/xfs_alloc.c             |  230 +-----
 fs/xfs/xfs_alloc.h             |   27
 fs/xfs/xfs_alloc_btree.c       |    1
 fs/xfs/xfs_attr.c              |  107 ---
 fs/xfs/xfs_attr.h              |   10
 fs/xfs/xfs_attr_leaf.c         |   14
 fs/xfs/xfs_attr_sf.h           |   40 -
 fs/xfs/xfs_bmap.c              |  507 +++------------
 fs/xfs/xfs_bmap.h              |   49 -
 fs/xfs/xfs_bmap_btree.c        |    6
 fs/xfs/xfs_btree.c             |    5
 fs/xfs/xfs_btree_trace.h       |   17
 fs/xfs/xfs_buf_item.c          |   87 --
 fs/xfs/xfs_buf_item.h          |   20
 fs/xfs/xfs_da_btree.c          |    3
 fs/xfs/xfs_da_btree.h          |    7
 fs/xfs/xfs_dfrag.c             |    2
 fs/xfs/xfs_dir2.c              |    8
 fs/xfs/xfs_dir2_block.c        |   20
 fs/xfs/xfs_dir2_leaf.c         |   21
 fs/xfs/xfs_dir2_node.c         |   27
 fs/xfs/xfs_dir2_sf.c           |   26
 fs/xfs/xfs_dir2_trace.c        |  216 ------
 fs/xfs/xfs_dir2_trace.h        |   72 --
 fs/xfs/xfs_filestream.c        |    8
 fs/xfs/xfs_fsops.c             |    2
 fs/xfs/xfs_iget.c              |  111 ---
 fs/xfs/xfs_inode.c             |   67 --
 fs/xfs/xfs_inode.h             |   76 --
 fs/xfs/xfs_inode_item.c        |    5
 fs/xfs/xfs_iomap.c             |   85 --
 fs/xfs/xfs_iomap.h             |    8
 fs/xfs/xfs_log.c               |  181 +----
 fs/xfs/xfs_log_priv.h          |   20
 fs/xfs/xfs_log_recover.c       |    1
 fs/xfs/xfs_mount.c             |    2
 fs/xfs/xfs_quota.h             |    8
 fs/xfs/xfs_rename.c            |    1
 fs/xfs/xfs_rtalloc.c           |    1
 fs/xfs/xfs_rw.c                |    3
 fs/xfs/xfs_trans.h             |   47 +
 fs/xfs/xfs_trans_buf.c         |   62 -
 fs/xfs/xfs_vnodeops.c          |    8
 70 files changed, 2151 insertions(+), 2592 deletions(-)
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

0b1b213f

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功