1. 08 Aug 2014, 4 commits
    • take fs_pin stuff to fs/* · efb170c2
      Al Viro authored
      Add a new field to fs_pin - kill(pin).  That's what umount and r/o remount
      will be calling for all pins attached to the vfsmount and superblock,
      respectively.  It is called after bumping the refcount, so the pin won't go
      away under us; dropping the refcount is the responsibility of the instance.
      All the generic stuff is moved to fs/fs_pin.c; the next step will rip all
      knowledge of kernel/acct.c out of fs/super.c and fs/namespace.c.  After
      that - death to mnt_pin(); it was intended to be usable as a generic
      mechanism for code that wants to attach objects to a vfsmount, so that they
      would not make the sucker busy and would get killed on umount.  We never
      got it right; it remained acct.c-specific all along.  Now it's very close
      to being killable.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      efb170c2
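      A minimal userspace C sketch of the shape this change gives fs_pin: a
      kill() callback that the generic umount / r/o-remount path invokes for each
      pin after taking a reference, leaving the refcount drop to the instance.
      The struct layout and function names below are simplified illustrations,
      not the actual kernel definitions.

        #include <stdio.h>

        /* Simplified stand-in for the kernel's fs_pin. */
        struct fs_pin {
                int count;                     /* refcount; dropped by the instance */
                void (*kill)(struct fs_pin *); /* called by umount / r/o remount    */
        };

        /* What the generic code does for each pin attached to a vfsmount or
         * superblock: bump the refcount so the pin cannot vanish underneath us,
         * then call ->kill(); the instance drops the reference itself. */
        static void pin_kill(struct fs_pin *p)
        {
                p->count++;
                p->kill(p);
        }

        static void acct_pin_kill(struct fs_pin *p)
        {
                /* instance-specific teardown (e.g. closing the acct file) ... */
                p->count--;     /* ... then drop the reference taken above */
                printf("pin killed, count=%d\n", p->count);
        }

        int main(void)
        {
                struct fs_pin pin = { .count = 1, .kill = acct_pin_kill };
                pin_kill(&pin);
                return 0;
        }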
    • drop ->s_umount around acct_auto_close() · 0aec09d0
      Al Viro authored
      just repeat the frozen check after regaining it, and check that sb
      is still alive.  If several threads hit acct_auto_close() at the
      same time, acct_auto_close() will survive that just fine.  And we
      really don't want to play with writes and closing the file with
      ->s_umount held exclusive - it's a deadlock country.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0aec09d0
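      A userspace analogue (using pthreads) of the locking pattern described
      above: drop the lock before work that could deadlock against it, then
      retake it and re-validate the state that was relied on.  All names here are
      invented for illustration; the kernel code uses the ->s_umount rw_semaphore
      and the superblock's frozen/alive state.

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        static pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;
        static bool sb_alive  = true;
        static bool sb_frozen = false;

        static void remount_ro_path(void)
        {
                pthread_rwlock_wrlock(&s_umount);

                /* Writing to and closing a file with the lock held exclusively
                 * is deadlock country, so drop it around that work ... */
                pthread_rwlock_unlock(&s_umount);
                /* acct_auto_close()-like work happens here, unlocked */
                pthread_rwlock_wrlock(&s_umount);

                /* ... and repeat the checks after regaining the lock, since
                 * the world may have changed while it was dropped. */
                if (!sb_alive || sb_frozen) {
                        pthread_rwlock_unlock(&s_umount);
                        return;
                }
                /* continue with the r/o remount under the lock */
                pthread_rwlock_unlock(&s_umount);
        }

        int main(void)
        {
                remount_ro_path();
                puts("done");
                return 0;
        }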
    • acct: get rid of acct_list · 215752fc
      Al Viro authored
      Put these suckers on per-vfsmount and per-superblock lists instead.
      Note: right now it's still acct_lock for everything, but that's
      going to change.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      215752fc
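      A simplified userspace sketch of the data-structure change: instead of one
      global acct_list, each accounting entry is linked onto the vfsmount and the
      superblock it belongs to, so umount and r/o remount can find it without a
      global walk.  Struct and field names are illustrative; the kernel uses
      hlist_head/hlist_node in the real struct mount and struct super_block.

        #include <stdio.h>

        struct acct_entry {
                struct acct_entry *next_on_mnt; /* link on the vfsmount's list   */
                struct acct_entry *next_on_sb;  /* link on the superblock's list */
                const char *filename;
        };

        struct mount_stub      { struct acct_entry *acct_list; };
        struct superblock_stub { struct acct_entry *acct_list; };

        static void acct_attach(struct acct_entry *a,
                                struct mount_stub *mnt,
                                struct superblock_stub *sb)
        {
                a->next_on_mnt = mnt->acct_list;  mnt->acct_list = a;
                a->next_on_sb  = sb->acct_list;   sb->acct_list  = a;
        }

        int main(void)
        {
                struct mount_stub      mnt = { 0 };
                struct superblock_stub sb  = { 0 };
                struct acct_entry a = { .filename = "/var/log/pacct" };

                acct_attach(&a, &mnt, &sb);
                printf("on mnt: %s\n", mnt.acct_list->filename);
                return 0;
        }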
    • acct: switch to __kernel_write() · ed44724b
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ed44724b
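      For context, a kernel-style sketch (not compilable on its own; the
      acct-specific names are guesses rather than the real code) of what the
      switch looks like: __kernel_write() writes a kernel buffer to a file
      without the old set_fs(KERNEL_DS) dance around ->write().

        #include <linux/fs.h>

        /* illustrative record type; the real code writes accounting records */
        struct acct_record { char bytes[64]; };

        static void write_acct_record(struct file *file, struct acct_record *rec)
        {
                loff_t pos = file->f_pos;

                /* before: set_fs(KERNEL_DS) + ->write() with a kernel pointer;
                 * after: __kernel_write() handles the kernel buffer directly */
                if (__kernel_write(file, (const char *)rec, sizeof(*rec), &pos)
                    == sizeof(*rec))
                        file->f_pos = pos;
        }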
  2. 01 Aug 2014, 2 commits
  3. 30 Jul 2014, 1 commit
  4. 24 Jul 2014, 6 commits
  5. 23 Jul 2014, 1 commit
    • NFSD: Fix crash encoding lock reply on 32-bit · f98bac5a
      Kinglong Mee authored
      Commit 8c7424cf ("nfsd4: don't try to encode conflicting owner if low
      on space") forgot to free conf->data in nfsd4_encode_lockt, and to free
      it before assigning conf->data to NULL in nfsd4_encode_lock_denied,
      causing a leak.
      
      Worse, kfree() can be called on an uninitialized pointer in the case of
      a successful lock (or one that fails for a reason other than a conflict).
      
      (Note that lock->lk_denied.ld_owner.data appears as though it should be
      zero here, until you notice that it's one arm of a union, the other arm
      of which is written to in the successful case by the
      
      	memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
      	                                sizeof(stateid_t));
      
      in nfsd4_lock().  In the 32-bit case this overwrites ld_owner.data.)
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Fixes: 8c7424cf "nfsd4: don't try to encode conflicting owner if low on space"
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      f98bac5a
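      A small userspace C illustration of the union aliasing described above: on
      a 32-bit layout the successful-lock memcpy() overlaps the pointer arm of
      the union, so freeing that pointer unconditionally passes garbage to the
      allocator.  The structure layout and names below are simplified, not the
      real nfsd definitions.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        typedef struct { unsigned char bytes[16]; } stateid_t;

        struct lock_denied { unsigned int len; char *data; };

        struct nfsd4_lock {
                union {
                        stateid_t          lk_resp_stateid; /* written on success  */
                        struct lock_denied lk_denied;       /* written on conflict */
                } u;
        };

        int main(void)
        {
                struct nfsd4_lock lock;         /* deliberately not zeroed */
                stateid_t sid = { { 0xab } };

                /* Successful lock: the stateid arm is filled in, which (on a
                 * 32-bit layout) overlaps and clobbers lk_denied.data ... */
                memcpy(&lock.u.lk_resp_stateid, &sid, sizeof(stateid_t));

                /* ... so freeing lk_denied.data here would hand garbage to free().
                 * The fix is to free data only on the conflict path and to reset
                 * it to NULL after freeing, so later code sees a safe value. */
                lock.u.lk_denied.data = NULL;
                free(lock.u.lk_denied.data);    /* free(NULL) is a no-op */
                puts("ok");
                return 0;
        }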
  6. 22 Jul 2014, 2 commits
  7. 20 Jul 2014, 2 commits
    • btrfs: test for valid bdev before kobj removal in btrfs_rm_device · 0bfaa9c5
      Eric Sandeen authored
      Commit 99994cde ("btrfs: dev delete should remove sysfs entry") added a
      btrfs_kobj_rm_device() call, which dereferences device->bdev...
      right after we check whether device->bdev might be NULL.
      
      I don't honestly know if it's possible to have a NULL device->bdev
      here, but assuming that it is (given the test), we need to move
      the kobject removal to be under that test.
      
      (Coverity spotted this)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      0bfaa9c5
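      A userspace sketch of the ordering fix: the sysfs/kobject removal
      dereferences device->bdev, so it must sit under the same NULL test as the
      rest of the bdev-dependent teardown.  Types and function names here are
      stand-ins, not the real btrfs code.

        #include <stdio.h>
        #include <stddef.h>

        struct bdev_stub   { int id; };
        struct device_stub { struct bdev_stub *bdev; };

        static void kobj_rm_device(struct device_stub *dev)
        {
                /* dereferences dev->bdev internally */
                printf("removing sysfs entry for bdev %d\n", dev->bdev->id);
        }

        static void rm_device(struct device_stub *dev)
        {
                /* Calling kobj_rm_device() unconditionally before this test
                 * would dereference a NULL bdev; keep it under the check. */
                if (dev->bdev) {
                        kobj_rm_device(dev);
                        /* other bdev-dependent teardown */
                }
        }

        int main(void)
        {
                struct device_stub no_bdev = { .bdev = NULL };
                rm_device(&no_bdev);    /* safe with the check in place */
                puts("ok");
                return 0;
        }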
    • Btrfs: fix abnormal long waiting in fsync · 98ce2ded
      Liu Bo authored
      xfstests generic/127 detected this problem.
      
      Since commit 7fc34a62, fsync only flushes data within the passed range.
      This is the cause of the above problem: btrfs's fsync has a stage called
      'sync log' which waits for all the ordered extents it has recorded to
      finish.
      
      In xfstests/generic/127, with mixed operations such as truncate, fallocate,
      punch hole, and mapwrite, we get some pre-allocated extents; mapwrite will
      mmap and then msync.  I found that msync waits for quite a long time
      (about 20s in my case).  With ftrace, it turns out that the earlier
      fallocate calls btrfs_wait_ordered_range() to flush dirty pages, but since
      the range of dirty pages may be larger than the range
      btrfs_wait_ordered_range() covers, some ordered extents are created without
      their corresponding pages getting flushed.  They are left in memory until
      we fsync, which runs into the 'sync log' stage, and fsync just waits for
      the system writeback thread to flush those pages and finish the ordered
      extents, so the latency is inevitable.
      
      This adds a flush similar to btrfs_start_ordered_extent() in
      btrfs_wait_logged_extents() to fix that.
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      98ce2ded
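      A kernel-style sketch of the idea (not the actual btrfs code; the struct
      and most names are simplified stand-ins): before waiting for a logged
      ordered extent to finish, explicitly start writeback for its file range,
      the way btrfs_start_ordered_extent() does, instead of waiting for the
      background flusher to get around to it.

        #include <linux/fs.h>

        struct ordered_extent_stub {
                struct inode *inode;
                u64 file_offset;
                u64 len;
        };

        static void wait_logged_extent(struct ordered_extent_stub *oe)
        {
                /* kick writeback for exactly the dirty range backing this
                 * extent, so we are not at the mercy of background writeback */
                filemap_fdatawrite_range(oe->inode->i_mapping,
                                         oe->file_offset,
                                         oe->file_offset + oe->len - 1);

                /* ... then wait for the ordered extent to complete (omitted) */
        }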
  8. 18 Jul 2014, 10 commits
  9. 16 Jul 2014, 1 commit
  10. 15 Jul 2014, 4 commits
    • xfs: null unused quota inodes when quota is on · 03e01349
      Dave Chinner authored
      When quota is on, it is expected that unused quota inodes have a
      value of NULLFSINO.  The changes to support a separate project quota
      in 3.12 broke this rule for filesystems that do not have a project
      quota inode enabled, as the code now refuses to write the group quota
      inode if neither group nor project quotas are enabled.  This regression
      was introduced by commit d892d586 ("xfs: Start using pquotaino from the
      superblock").
      
      In this case, we should be writing NULLFSINO rather than nothing to
      ensure that we leave the group quota inode in a valid state while
      quotas are enabled.
      
      Failure to do so doesn't cause a current kernel to break - the
      separate project quota inode support introduced translation code to
      always treat a zero inode as NULLFSINO.  This was introduced by commit
      01026297 ("xfs: Initialize all quota inodes to be NULLFSINO"), which is
      also in 3.12, but older kernels do not do this and hence taking a
      filesystem back to an older kernel can result in quotas failing
      initialisation at mount time.  When that happens, we see this in
      dmesg:
      
      [ 1649.215390] XFS (sdb): Mounting Filesystem
      [ 1649.316894] XFS (sdb): Failed to initialize disk quotas.
      [ 1649.316902] XFS (sdb): Ending clean mount
      
      By ensuring that we write NULLFSINO to quota inodes that aren't
      active, we avoid this problem. We have to be really careful when
      determining if the quota inodes are active or not, because we don't
      want to write a NULLFSINO if the quota inodes are active and we
      simply aren't updating them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      03e01349
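      A simplified userspace sketch of the rule described above (field names and
      the helper are illustrative, not the xfs_qm code): when quotas are enabled
      on the filesystem but a particular quota inode is not in use, the on-disk
      field is written as NULLFSINO rather than skipped, so older kernels still
      see a valid value at mount time.  The real code additionally has to be
      careful not to overwrite an active inode it simply isn't updating.

        #include <stdint.h>
        #include <stdio.h>

        #define NULLFSINO ((uint64_t)-1)   /* XFS's "no inode" sentinel */

        struct sb_stub { uint64_t sb_gquotino; };

        static void sync_gquotino(struct sb_stub *sb, int gquota_active,
                                  uint64_t gquota_ino)
        {
                if (gquota_active)
                        sb->sb_gquotino = gquota_ino;  /* in use: real inode   */
                else
                        sb->sb_gquotino = NULLFSINO;   /* unused: the sentinel */
        }

        int main(void)
        {
                struct sb_stub sb = { 0 };

                sync_gquotino(&sb, 0, 0);
                printf("sb_gquotino = %#llx\n", (unsigned long long)sb.sb_gquotino);
                return 0;
        }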
    • xfs: refine the allocation stack switch · cf11da9c
      Dave Chinner authored
      The allocation stack switch at xfs_bmapi_allocate() has served its
      purpose, but is no longer a sufficient solution to the stack usage
      problem we have in the XFS allocation path.
      
      Whilst the kernel stack size is now 16k, that is not a valid reason
      for undoing all our "keep stack usage down" modifications. What it
      does allow us to do is have the freedom to refine and perfect the
      modifications knowing that if we get it wrong it won't blow up in
      our faces - we have a safety net now.
      
      This is important because we still have the issue of older kernels
      having smaller stacks; they are still supported and are
      demonstrating a wide range of different stack overflows.  Red Hat
      has several open bugs for allocation based stack overflows from
      directory modifications and direct IO block allocation and these
      problems still need to be solved. If we can solve them upstream,
      then distros won't need to bake their own unique solutions.
      
      To that end, I've observed that every allocation based stack
      overflow report has had a specific characteristic - it has happened
      during or directly after a bmap btree block split. That event
      requires a new block to be allocated to the tree, and so we
      effectively stack one allocation stack on top of another, and that's
      when we get into trouble.
      
      A further observation is that bmap btree block splits are much rarer
      than writeback allocation - over a range of different workloads I've
      observed the ratio of bmap btree inserts to splits ranges from 100:1
      (xfstests run) to 10000:1 (local VM image server with sparse files
      that range in the hundreds of thousands to millions of extents).
      Either way, bmap btree split events are much, much rarer than
      allocation events.
      
      Finally, we have to move the kswapd state to the allocation workqueue
      work when allocation is done on behalf of kswapd. This is proving to
      cause significant perturbation in performance under memory pressure
      and appears to be generating allocation deadlock warnings under some
      workloads, so avoiding the use of a workqueue for the majority of
      kswapd writeback allocation will minimise the impact of such
      behaviour.
      
      Hence it makes sense to move the stack switch to xfs_btree_split()
      and only do it for bmap btree splits. Stack switches during
      allocation will be much rarer, so there won't be significant
      performance overhead caused by switching stacks.  The worst-case
      stack from all allocation paths will be split, not just writeback.
      And the majority of memory allocations will be done in the correct
      context (e.g. kswapd) without causing additional latency, and so we
      simplify the memory reclaim interactions between processes,
      workqueues and kswapd.
      
      The worst stack I've been able to generate with this patch in place
      is 5600 bytes deep. It's very revealing because we exit XFS at:
      
      37)     1768      64   kmem_cache_alloc+0x13b/0x170
      
      about 1800 bytes of stack consumed, and the remaining 3800 bytes
      (and 36 functions) is memory reclaim, swap and the IO stack. And
      this occurs in the inode allocation from an open(O_CREAT) syscall,
      not writeback.
      
      The amount of stack being used is much less than I've previously been
      able to generate - fs_mark testing has been able to generate stack
      usage of around 7k without too much trouble; with this patch it's
      only just getting to 5.5k. This is primarily because the metadata
      allocation paths (e.g. directory blocks) are no longer causing
      double splits on the same stack, and hence now stack tracing is
      showing swapping being the worst stack consumer rather than XFS.
      
      Performance of fs_mark inode create workloads is unchanged.
      Performance of fs_mark async fsync workloads is consistently good
      with context switches reduced by around 150,000/s (30%).
      Performance of dbench, streaming IO and postmark is unchanged.
      Allocation deadlock warnings have not been seen on the workloads
      that generated them since adding this patch.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      cf11da9c
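      A kernel-style sketch of the stack-switch pattern this commit moves into
      xfs_btree_split(): run the deep allocation step on a workqueue worker
      (which starts on a fresh stack) and wait for it to complete, rather than
      recursing further on the caller's already-deep stack.  Everything except
      the standard workqueue/completion APIs is an illustrative name, not the
      actual XFS code.

        #include <linux/workqueue.h>
        #include <linux/completion.h>

        struct split_args {
                struct work_struct work;
                struct completion  done;
                int                result;
                /* the real code carries the btree cursor and friends here */
        };

        static void split_worker(struct work_struct *work)
        {
                struct split_args *args =
                        container_of(work, struct split_args, work);

                /* the deep allocation path runs here, on the worker's stack */
                args->result = 0;
                complete(&args->done);
        }

        static int split_with_fresh_stack(struct workqueue_struct *wq)
        {
                struct split_args args;

                INIT_WORK_ONSTACK(&args.work, split_worker);
                init_completion(&args.done);
                queue_work(wq, &args.work);
                wait_for_completion(&args.done);
                destroy_work_on_stack(&args.work);
                return args.result;
        }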
    • Revert "xfs: block allocation work needs to be kswapd aware" · aa182e64
      Dave Chinner authored
      This reverts commit 1f6d6482.
      
      This commit resulted in regressions in performance in low
      memory situations where kswapd was doing writeback of delayed
      allocation blocks. It resulted in significant parallelism of the
      kswapd work, and with the special kswapd flags it meant that hundreds of
      active allocations could dip into kswapd-specific memory reserves and
      avoid being throttled.  This caused a large amount of performance
      variation, as well as random OOM-killer invocations that didn't
      previously exist.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      aa182e64
    • aio: protect reqs_available updates from changes in interrupt handlers · 263782c1
      Benjamin LaHaise authored
      As of commit f8567a38 it is now possible to
      have put_reqs_available() called from irq context.  While put_reqs_available()
      operates on per-cpu data, it did not protect itself from interrupts on the
      same CPU.  This led to aio_complete() corrupting the available io requests
      count when run under heavy O_DIRECT workloads, as reported by Robert Elliott.
      Fix this by disabling irqs around the per-cpu batch updates of reqs_available.
      
      Many thanks to Robert and folks for testing and tracking this down.
      Reported-by: Robert Elliott <Elliott@hp.com>
      Tested-by: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org>
      Cc: stable@vger.kernel.org
      263782c1
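      A kernel-style sketch of the fix described above (struct and function
      names are simplified stand-ins for the ones in fs/aio.c): since the
      per-CPU batch counter can now also be touched from interrupt context,
      the process-context update has to run with interrupts disabled on the
      local CPU.

        #include <linux/percpu.h>
        #include <linux/irqflags.h>

        struct kioctx_cpu_stub {
                unsigned reqs_available;
        };

        static DEFINE_PER_CPU(struct kioctx_cpu_stub, kioctx_cpu);

        static void put_reqs_available_stub(unsigned nr)
        {
                struct kioctx_cpu_stub *kcpu;
                unsigned long flags;

                local_irq_save(flags);      /* keep aio_complete() off this CPU  */
                kcpu = this_cpu_ptr(&kioctx_cpu);
                kcpu->reqs_available += nr; /* now safe against irq interleaving */
                local_irq_restore(flags);
        }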
  11. 14 Jul 2014, 3 commits
  12. 13 Jul 2014, 4 commits