提交 · c6f6cd0608b1826ee1797cf57a808416e4bdb806 · openanolis / cloud-kernel

11 11月, 2010 5 次提交

由 Christoph Hellwig 提交于 11月 06, 2010

XFS does not need it's inodes to actuall be hashed in the VFS inode
cache, but we require the inode to be marked hashed for the
writeback code to work.

Insted of using insert_inode_hash, which requires a second
inode_lock roundtrip after the partial merge of the inode
scalability patches in 2.6.37-rc simply use the new hlist_add_fake
helper to mark it hashed without requiring a lock or touching a
global cache line.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c6f6cd06

xfs: move delayed write buffer trace · bfe27419

由 Dave Chinner 提交于 11月 08, 2010

The delayed write buffer split trace currently issues a trace for
every buffer it scans. These buffers are not necessarily queued for
delayed write. Indeed, when buffers are pinned, there can be
thousands of traces of buffers that aren't actually queued for
delayed write and the ones that are are lost in the noise. Move the
trace point to record only buffers that are split out for IO to be
issued on.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

bfe27419

xfs: fix per-ag reference counting in inode reclaim tree walking · f83282a8

由 Dave Chinner 提交于 11月 08, 2010

The walk fails to decrement the per-ag reference count when the
non-blocking walk fails to obtain the per-ag reclaim lock, leading
to an assert failure on debug kernels when unmounting a filesystem.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

f83282a8

xfs: xfs_ioctl: fix information leak to userland · 6762b938

由 Kulikov Vasiliy 提交于 10月 30, 2010

al_hreq is copied from userland. If al_hreq.buflen is not properly aligned
then xfs_attr_list will ignore the last bytes of kbuf. These bytes are
unitialized. It leads to leaking of contents of kernel stack memory.
Signed-off-by: NVasiliy Kulikov <segooon@gmail.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

6762b938

xfs: remove experimental tag from the delaylog option · 5d0af85c

由 Christoph Hellwig 提交于 10月 28, 2010

We promised to do this for 2.6.37, and the code looks stable enough to
keep that promise.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

5d0af85c

29 10月, 2010 1 次提交

new helper: mount_bdev() · 152a0836

由 Al Viro 提交于 7月 25, 2010

... and switch of the obvious get_sb_bdev() users to ->mount()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

152a0836

27 10月, 2010 1 次提交

writeback: remove nonblocking/encountered_congestion references · 1b430bee

由 Wu Fengguang 提交于 10月 26, 2010

This removes more dead code that was somehow missed by commit 0d99519e
(writeback: remove unused nonblocking and congestion checks).  There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.

The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion.  The latter will lead to more seeky IO.

The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.

We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
   that would skip some dirty inodes on congestion and page out others, which
   is unfair in terms of LRU age.

Inspired by Christoph Hellwig. Thanks!
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b430bee

26 10月, 2010 4 次提交

fs: do not assign default i_ino in new_inode · 85fe4025

由 Christoph Hellwig 提交于 10月 23, 2010

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

85fe4025

new helper: ihold() · 7de9c6ee

由 Al Viro 提交于 10月 23, 2010

Clones an existing reference to inode; caller must already hold one.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7de9c6ee

fs: remove inode_add_to_list/__inode_add_to_list · 646ec461

由 Christoph Hellwig 提交于 10月 23, 2010

Split up inode_add_to_list/__inode_add_to_list.  Locking for the two
lists will be split soon so these helpers really don't buy us much
anymore.

The __ prefixes for the sb list helpers will go away soon, but until
inode_lock is gone we'll need them to distinguish between the locked
and unlocked variants.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

646ec461

fs: kill block_prepare_write · ebdec241

由 Christoph Hellwig 提交于 10月 06, 2010

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions.  Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ebdec241

19 10月, 2010 23 次提交

xfs: semaphore cleanup · a731cd11

由 Thomas Gleixner 提交于 9月 07, 2010

Get rid of init_MUTEX[_LOCKED]() and use sema_init() instead.

(Ported to current XFS code by <aelder@sgi.com>.)
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

a731cd11

xfs: Extend project quotas to support 32bit project ids · 6743099c

由 Arkadiusz Mi?kiewicz 提交于 9月 26, 2010

This patch adds support for 32bit project quota identifiers.

On disk format is backward compatible with 16bit projid numbers. projid
on disk is now kept in two 16bit values - di_projid_lo (which holds the
same position as old 16bit projid value) and new di_projid_hi (takes
existing padding) and converts from/to 32bit value on the fly.

xfs_admin (for existing fs), mkfs.xfs (for new fs) needs to be used
to enable PROJID32BIT support.
Signed-off-by: NArkadiusz Miśkiewicz <arekm@maven.pl>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

6743099c

xfs: remove xfs_buf wrappers · 1a1a3e97

由 Christoph Hellwig 提交于 10月 06, 2010

Stop having two different names for many buffer functions and use
the more descriptive xfs_buf_* names directly.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

1a1a3e97

xfs: remove xfs_cred.h · 6c77b0ea

由 Christoph Hellwig 提交于 10月 06, 2010

We're not actually passing around credentials inside XFS for a while
now, so remove all xfs_cred.h with it's cred_t typedef and all
instances of it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

6c77b0ea

xfs: remove xfs_globals.h · 78a4b096

由 Christoph Hellwig 提交于 10月 06, 2010

This header only provides one extern that isn't actually declared
anywhere, and shadowed by a macro.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

78a4b096

xfs: remove xfs_version.h · 668332e5

由 Christoph Hellwig 提交于 10月 06, 2010

It used to have a place when it contained an automatically generated
CVS version, but these days it's entirely superflous.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

668332e5

xfs: remove XFS_MOUNT_NO_PERCPU_SB · 61ba35de

由 Christoph Hellwig 提交于 9月 30, 2010

Fail the mount if we can't allocate memory for the per-CPU counters.
This is consistent with how we handle everything else in the mount
path and makes the superblock counter modification a lot simpler.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

61ba35de

xfs: pack xfs_buf structure more tightly · 50f59e8e

由 Dave Chinner 提交于 9月 24, 2010

pahole reports the struct xfs_buf has quite a few holes in it, so
packing the structure better will reduce the size of it by 16 bytes.
Also, move all the fields used in cache lookups into the first
cacheline.

Before on x86_64:

        /* size: 320, cachelines: 5 */
	/* sum members: 298, holes: 6, sum holes: 22 */

After on x86_64:

        /* size: 304, cachelines: 5 */
	/* padding: 6 */
	/* last cacheline: 48 bytes */
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

50f59e8e

xfs: convert buffer cache hash to rbtree · 74f75a0c

由 Dave Chinner 提交于 9月 24, 2010

The buffer cache hash is showing typical hash scalability problems.
In large scale testing the number of cached items growing far larger
than the hash can efficiently handle. Hence we need to move to a
self-scaling cache indexing mechanism.

I have selected rbtrees for indexing becuse they can have O(log n)
search scalability, and insert and remove cost is not excessive,
even on large trees. Hence we should be able to cache large numbers
of buffers without incurring the excessive cache miss search
penalties that the hash is imposing on us.

To ensure we still have parallel access to the cache, we need
multiple trees. Rather than hashing the buffers by disk address to
select a tree, it seems more sensible to separate trees by typical
access patterns. Most operations use buffers from within a single AG
at a time, so rather than searching lots of different lists,
separate the buffer indexes out into per-AG rbtrees. This means that
searches during metadata operation have a much higher chance of
hitting cache resident nodes, and that updates of the tree are less
likely to disturb trees being accessed on other CPUs doing
independent operations.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

74f75a0c

xfs: serialise inode reclaim within an AG · 69b491c2

由 Dave Chinner 提交于 9月 27, 2010

Memory reclaim via shrinkers has a terrible habit of having N+M
concurrent shrinker executions (N = num CPUs, M = num kswapds) all
trying to shrink the same cache. When the cache they are all working
on is protected by a single spinlock, massive contention an
slowdowns occur.

Wrap the per-ag inode caches with a reclaim mutex to serialise
reclaim access to the AG. This will block concurrent reclaim in each
AG but still allow reclaim to scan multiple AGs concurrently. Allow
shrinkers to move on to the next AG if it can't get the lock, and if
we can't get any AG, then start blocking on locks.

To prevent reclaimers from continually scanning the same inodes in
each AG, add a cursor that tracks where the last reclaim got up to
and start from that point on the next reclaim. This should avoid
only ever scanning a small number of inodes at the satart of each AG
and not making progress. If we have a non-shrinker based reclaim
pass, ignore the cursor and reset it to zero once we are done.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

69b491c2

xfs: batch inode reclaim lookup · e3a20c0b

由 Dave Chinner 提交于 9月 24, 2010

Batch and optimise the per-ag inode lookup for reclaim to minimise
scanning overhead. This involves gang lookups on the radix trees to
get multiple inodes during each tree walk, and tighter validation of
what inodes can be reclaimed without blocking befor we take any
locks.

This is based on ideas suggested in a proof-of-concept patch
posted by Nick Piggin.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e3a20c0b

xfs: implement batched inode lookups for AG walking · 78ae5256

由 Dave Chinner 提交于 9月 28, 2010

With the reclaim code separated from the generic walking code, it is
simple to implement batched lookups for the generic walk code.
Separate out the inode validation from the execute operations and
modify the tree lookups to get a batch of inodes at a time.

Reclaim operations will be optimised separately.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

78ae5256

xfs: split out inode walk inode grabbing · e13de955

由 Dave Chinner 提交于 9月 28, 2010

When doing read side inode cache walks, the code to validate and
grab an inode is common to all callers. Split it out of the execute
callbacks in preparation for batching lookups. Similarly, split out
the inode reference dropping from the execute callbacks into the
main lookup look to be symmetric with the grab.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e13de955

xfs: split inode AG walking into separate code for reclaim · 65d0f205

由 Dave Chinner 提交于 9月 24, 2010

The reclaim walk requires different locking and has a slightly
different walk algorithm, so separate it out so that it can be
optimised separately.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

65d0f205

xfs: remove buftarg hash for external devices · 69d6cc76

由 Dave Chinner 提交于 9月 22, 2010

For RT and external log devices, we never use hashed buffers on them
now.  Remove the buftarg hash tables that are set up for them.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

69d6cc76

xfs: kill XBF_FS_MANAGED buffers · 26af6552

由 Dave Chinner 提交于 9月 22, 2010

Filesystem level managed buffers are buffers that have their
lifecycle controlled by the filesystem layer, not the buffer cache.
We currently cache these buffers, which makes cleanup and cache
walking somewhat troublesome. Convert the fs managed buffers to
uncached buffers obtained by via xfs_buf_get_uncached(), and remove
the XBF_FS_MANAGED special cases from the buffer cache.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

26af6552

xfs: store xfs_mount in the buftarg instead of in the xfs_buf · ebad861b

由 Dave Chinner 提交于 9月 22, 2010

Each buffer contains both a buftarg pointer and a mount pointer. If
we add a mount pointer into the buftarg, we can avoid needing the
b_mount field in every buffer and grab it from the buftarg when
needed instead. This shrinks the xfs_buf by 8 bytes.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

ebad861b

xfs: introduced uncached buffer read primitve · 5adc94c2

由 Dave Chinner 提交于 9月 24, 2010

To avoid the need to use cached buffers for single-shot or buffers
cached at the filesystem level, introduce a new buffer read
primitive that bypasses the cache an reads directly from disk.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

5adc94c2

xfs: rename xfs_buf_get_nodaddr to be more appropriate · 686865f7

由 Dave Chinner 提交于 9月 24, 2010

xfs_buf_get_nodaddr() is really used to allocate a buffer that is
uncached. While it is not directly assigned a disk address, the fact
that they are not cached is a more important distinction. With the
upcoming uncached buffer read primitive, we should be consistent
with this disctinction.

While there, make page allocation in xfs_buf_get_nodaddr() safe
against memory reclaim re-entrancy into the filesystem by allowing
a flags parameter to be passed.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

686865f7

xfs: don't use vfs writeback for pure metadata modifications · dcd79a14

由 Dave Chinner 提交于 9月 28, 2010

Under heavy multi-way parallel create workloads, the VFS struggles
to write back all the inodes that have been changed in age order.
The bdi flusher thread becomes CPU bound, spending 85% of it's time
in the VFS code, mostly traversing the superblock dirty inode list
to separate dirty inodes old enough to flush.

We already keep an index of all metadata changes in age order - in
the AIL - and continued log pressure will do age ordered writeback
without any extra overhead at all. If there is no pressure on the
log, the xfssyncd will periodically write back metadata in ascending
disk address offset order so will be very efficient.

Hence we can stop marking VFS inodes dirty during transaction commit
or when changing timestamps during transactions. This will keep the
inodes in the superblock dirty list to those containing data or
unlogged metadata changes.

However, the timstamp changes are slightly more complex than this -
there are a couple of places that do unlogged updates of the
timestamps, and the VFS need to be informed of these. Hence add a
new function xfs_trans_ichgtime() for transactional changes,
and leave xfs_ichgtime() for the non-transactional changes.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

dcd79a14

xfs: lockless per-ag lookups · e176579e

由 Dave Chinner 提交于 9月 22, 2010

When we start taking a reference to the per-ag for every cached
buffer in the system, kernel lockstat profiling on an 8-way create
workload shows the mp->m_perag_lock has higher acquisition rates
than the inode lock and has significantly more contention. That is,
it becomes the highest contended lock in the system.

The perag lookup is trivial to convert to lock-less RCU lookups
because perag structures never go away. Hence the only thing we need
to protect against is tree structure changes during a grow. This can
be done simply by replacing the locking in xfs_perag_get() with RCU
read locking. This removes the mp->m_perag_lock completely from this
path.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e176579e

xfs: Introduce XFS_IOC_ZERO_RANGE · 44722352

由 Dave Chinner 提交于 8月 24, 2010

XFS_IOC_ZERO_RANGE is the equivalent of an atomic XFS_IOC_UNRESVSP/
XFS_IOC_RESVSP call pair. It enabled ranges of written data to be
turned into zeroes without requiring IO or having to free and
reallocate the extents in the range given as would occur if we had
to punch and then preallocate them separately.  This enables
applications to zero parts of files very quickly without changing
the layout of the files in any way.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

44722352

xfs: use range primitives for xfs page cache operations · 3ae4c9de

由 Dave Chinner 提交于 8月 24, 2010

While XFS passes ranges to operate on from the core code, the
functions being called ignore the either the entire range or the end
of the range. This is historical because when the function were
written linux didn't have the necessary range operations. Update the
functions to use the correct operations.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

3ae4c9de

11 10月, 2010 1 次提交

workqueue: add and use WQ_MEM_RECLAIM flag · 6370a6ad

由 Tejun Heo 提交于 10月 11, 2010

Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
WQ_RESCUER as internal and replace all external WQ_RESCUER usages to
WQ_MEM_RECLAIM.

This makes the API users express the intent of the workqueue instead
of indicating the internal mechanism used to guarantee forward
progress.  This is also to make it cleaner to add more semantics to
WQ_MEM_RECLAIM.  For example, if deemed necessary, memory reclaim
workqueues can be made highpri.

This patch doesn't introduce any functional change.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>

6370a6ad

07 10月, 2010 1 次提交

xfs: properly account for reclaimed inodes · 081003ff

由 Johannes Weiner 提交于 10月 01, 2010

When marking an inode reclaimable, a per-AG counter is increased, the
inode is tagged reclaimable in its per-AG tree, and, when this is the
first reclaimable inode in the AG, the AG entry in the per-mount tree
is also tagged.

When an inode is finally reclaimed, however, it is only deleted from
the per-AG tree.  Neither the counter is decreased, nor is the parent
tree's AG entry untagged properly.

Since the tags in the per-mount tree are not cleared, the inode
shrinker iterates over all AGs that have had reclaimable inodes at one
point in time.

The counters on the other hand signal an increasing amount of slab
objects to reclaim.  Since "70e60ce7 xfs: convert inode shrinker to
per-filesystem context" this is not a real issue anymore because the
shrinker bails out after one iteration.

But the problem was observable on a machine running v2.6.34, where the
reclaimable work increased and each process going into direct reclaim
eventually got stuck on the xfs inode shrinking path, trying to scan
several million objects.

Fix this by properly unwinding the reclaimable-state tracking of an
inode when it is reclaimed.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: stable@kernel.org
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

081003ff

17 9月, 2010 1 次提交

block: remove BLKDEV_IFL_WAIT · dd3932ed

由 Christoph Hellwig 提交于 9月 16, 2010

All the blkdev_issue_* helpers can only sanely be used for synchronous
caller. To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout. For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

dd3932ed

10 9月, 2010 3 次提交

xfs: log IO completion workqueue is a high priority queue · 51749e47

由 Dave Chinner 提交于 9月 08, 2010

The workqueue implementation in 2.6.36-rcX has changed, resulting
in the workqueues no longer having dedicated threads for work
processing. This has caused severe livelocks under heavy parallel
create workloads because the log IO completions have been getting
held up behind metadata IO completions.  Hence log commits would
stall, memory allocation would stall because pages could not be
cleaned, and lock contention on the AIL during inode IO completion
processing was being seen to slow everything down even further.

By making the log Io completion workqueue a high priority workqueue,
they are queued ahead of all data/metadata IO completions and
processed before the data/metadata completions. Hence the log never
gets stalled, and operations needed to clean memory can continue as
quickly as possible. This avoids the livelock conditions and allos
the system to keep running under heavy load as per normal.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

51749e47

xfs: prevent reading uninitialized stack memory · a122eb2f

由 Dan Rosenberg 提交于 9月 06, 2010

The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
bytes of uninitialized stack memory, because the fsxattr struct
declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
the 12-byte fsx_pad member before copying it back to the user.  This
patch takes care of it.
Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

a122eb2f

xfs: replace barriers with explicit flush / FUA usage · 80f6c29d

由 Christoph Hellwig 提交于 8月 18, 2010

Switch to the WRITE_FLUSH_FUA flag for log writes and remove the EOPNOTSUPP
detection for barriers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

80f6c29d

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功