Commits · a1bd120d13e586ea1c424048fd2c8420a442852a · openanolis / cloud-kernel

May 22, 2010 (2 commits)

vfs: Add inode uid,gid,mode init helper · a1bd120d

Committed by Dmitry Monakhov on Mar 04, 2010

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs: inode.c use atomic_inc_return in __iget · 2e147f1e

Committed by Richard Kennedy on May 14, 2010

Using atomic_inc_return in __iget(struct inode *inode) makes the intent
of this code clearer and generates less code on processors that have
this operation.

On x86_64 this patch reduces the text size of inode.o by 12 bytes.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>

----
patch against 2.6.34-rc7
compiled & tested on x86_64 AMD X2

I've been running with this patch applied for several weeks with no
obvious problems.
regards
Richard
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
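
Below is a minimal user-space sketch of the change. It uses C11 `<stdatomic.h>` rather than the kernel's `atomic_t` API. `atomic_fetch_add()` returns the old value, so adding 1 gives what `atomic_inc_return()` would return. The struct and function names are hypothetical. The list bookkeeping that the real `__iget()` does under `inode_lock` is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct inode: only the reference count. */
struct sketch_inode {
	atomic_int i_count;
};

/*
 * Take a reference and report whether it was the first one.
 * Instead of an increment followed by a separate read of the counter,
 * one fetch-and-add both bumps the count and yields the new value,
 * which is what atomic_inc_return() provides.
 */
static bool sketch_iget_is_first_ref(struct sketch_inode *inode)
{
	int new_count = atomic_fetch_add(&inode->i_count, 1) + 1;

	/* Only the 0 -> 1 transition needs the extra list bookkeeping. */
	return new_count == 1;
}
```

A single fetch-and-add replaces the separate increment and re-read. The commit reports that this saves 12 bytes of text in inode.o on x86_64.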

Apr 12, 2010 (1 commit)

security: remove dead hook inode_delete · 9d5ed77d

Committed by Eric Paris on Apr 07, 2010

Unused hook.  Remove.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>

Mar 05, 2010 (2 commits)

dquot: move dquot initialization responsibility into the filesystem · 907f4554

Committed by Christoph Hellwig on Mar 03, 2010

Currently various places in the VFS call vfs_dq_init directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the initialization.  For most metadata operations
this is a straightforward move into the methods, but for truncate and
open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case
because open already takes care of it for ftruncate and open(O_TRUNC) - the
new code causes an additional vfs_dq_init for those, which is harmless.

For open the initialization is moved from do_filp_open into the open method,
which means it happens slightly earlier now, and only for regular files.
The latter is fine because we don't need to initialize it for operations
on special files, and we already do it as part of the namespace operations
for directories.

Add a dquot_file_open helper that filesystems that support generic quotas
can use to fill in ->open.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
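
This is a rough model of where quota setup ends up after this change. The filesystem's `->open` method runs the generic open checks and then initializes quota itself. All names here are hypothetical: `sketch_quota_file_open()`, `SKETCH_FMODE_WRITE` and the structs. Restricting the initialization to files opened for writing is an assumption about how a helper like `dquot_file_open` would behave. The commit text only says the work moves into `->open` for regular files.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_FMODE_WRITE 0x2u   /* hypothetical "opened for writing" bit */

struct sketch_inode {
	bool quota_initialized;
	bool too_large;            /* stands in for any generic open failure */
};

struct sketch_file {
	unsigned int f_mode;
};

/* Hypothetical generic checks, in the spirit of generic_file_open(). */
static int sketch_generic_open(struct sketch_inode *inode, struct sketch_file *file)
{
	(void)file;
	return inode->too_large ? -EOVERFLOW : 0;
}

/* Hypothetical quota setup that the VFS used to trigger by itself. */
static void sketch_quota_init(struct sketch_inode *inode)
{
	inode->quota_initialized = true;
}

/* A filesystem's ->open: quota initialization now happens here. */
static int sketch_quota_file_open(struct sketch_inode *inode, struct sketch_file *file)
{
	int error = sketch_generic_open(inode, file);

	/* Assumption: only writers can allocate blocks, so only they need quota. */
	if (!error && (file->f_mode & SKETCH_FMODE_WRITE))
		sketch_quota_init(inode);
	return error;
}
```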

dquot: move dquot drop responsibility into the filesystem · 257ba15c

Committed by Christoph Hellwig on Mar 03, 2010

Currently clear_inode calls vfs_dq_drop directly.  This means
we tie the quota code into the VFS.  Get rid of that and make the
filesystem responsible for the drop inside the ->clear_inode
superblock operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>

Dec 18, 2009 (1 commit)

kill I_LOCK · eaff8079

Committed by Christoph Hellwig on Dec 17, 2009

After I_SYNC was split from I_LOCK the leftover is always used together with
I_NEW and thus superfluous.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Oct 25, 2009 (1 commit)

LSM: imbed ima calls in the security hooks · 6c21a7fb

Committed by Mimi Zohar on Oct 22, 2009

Based on discussions on LKML and LSM, where there are consecutive
security_ and ima_ calls in the vfs layer, move the ima_ calls to
the existing security_ hooks.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

Sep 24, 2009 (4 commits)

vfs: optimize touch_time() too · ce06e0b2

Committed by Andi Kleen on Sep 18, 2009

Do a similar optimization as earlier for touch_atime.  Getting the lock in
mnt_want_write is relatively costly, so try all avenues to avoid it first.

This patch is careful to still only update inode fields inside the lock
region.

This didn't show up in benchmarks, but it's easy enough to do.

[akpm@linux-foundation.org: fix typo in comment]
[hugh.dickins@tiscali.co.uk: fix inverted test of mnt_want_write_file()]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: optimization for touch_atime() · b12536c2

Committed by Andi Kleen on Sep 18, 2009

Some benchmark testing shows touch_atime to be high up in profile logs for
IO intensive workloads.  Most likely that's due to the lock in
mnt_want_write().  Unfortunately touch_atime first takes the lock, and
then does all the other tests that could avoid atime updates (like noatime
or relatime).

Do it the other way round -- first try to avoid the update and only then
if that didn't succeed take the lock.  That works because none of the
atime avoidance tests rely on locking.

This also eliminates a goto.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Valerie Aurora <vaurora@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
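
Here is a simplified sketch of the reordering. Every cheap, lock-free test that can rule out an atime update runs first. The costly step (the lock inside `mnt_want_write()` in the kernel) is reached only when an update will really happen. The flags, the counter that stands in for the lock, and all names are hypothetical. The noatime/relatime logic is collapsed into booleans.

```c
#include <stdbool.h>

/* Hypothetical per-call state; the kernel reads these from the inode and mount. */
struct sketch_atime_ctx {
	bool noatime;          /* atime updates disabled for this inode or mount */
	bool relatime_skip;    /* relatime decided an update is unnecessary */
	bool atime_current;    /* stored atime already equals the current time */
	int  lock_taken;       /* how often the costly write-access step ran */
	int  updates;          /* how often atime was actually written */
};

/* Stand-in for the expensive mnt_want_write()/mnt_drop_write() pair. */
static void sketch_take_write_access(struct sketch_atime_ctx *c)
{
	c->lock_taken++;
}

static void sketch_touch_atime(struct sketch_atime_ctx *c)
{
	/* Cheap tests first: none of them needs the lock. */
	if (c->noatime || c->relatime_skip || c->atime_current)
		return;

	/* Only now pay for write access, then update the timestamp. */
	sketch_take_write_access(c);
	c->updates++;
}
```

None of the avoidance tests need the lock, so moving them ahead of it changes only the cost, not the outcome.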

vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it · 22fe4042

Committed by Jan Kara on Sep 18, 2009

Hugetlbfs needs to do special things instead of truncate_inode_pages().
Currently, it copies generic_forget_inode() except for the
truncate_inode_pages() call, which is asking for trouble (the code there
isn't trivial).  So create a separate function generic_detach_inode()
which does all the list magic done in generic_forget_inode() and call
it from hugetlbfs_forget_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs/inode.c: add dev-id and inode number for debugging in init_special_inode() · af0d9ae8

Committed by Manish Katiyar on Sep 18, 2009

Add device-id and inode number for better debugging.  This was suggested
by Andreas in one of the threads
http://article.gmane.org/gmane.comp.file-systems.ext4/12062 .

"If anyone has a chance, fixing this error message to be not-useless would
be good...  Including the device name and the inode number would help
track down the source of the problem."
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
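
This is a small sketch of the kind of diagnostic this commit asks for. When the mode is not a character device, block device, FIFO or socket, the message names the device and the inode number as well as the bogus mode. The helper name is hypothetical. The exact wording is an assumption modeled on the quoted request, not copied from the patch. The mode-type constants are the usual Linux values, spelled out so the sketch does not depend on `<sys/stat.h>` feature macros.

```c
#include <stdio.h>

/* Linux file-type bits of i_mode, spelled out for a self-contained sketch. */
#define SKETCH_S_IFMT   0170000u
#define SKETCH_S_IFSOCK 0140000u
#define SKETCH_S_IFBLK  0060000u
#define SKETCH_S_IFCHR  0020000u
#define SKETCH_S_IFIFO  0010000u

/*
 * Hypothetical helper: returns 0 for a valid special-file mode. Otherwise
 * it writes a diagnostic that identifies the device and inode number and
 * returns the message length reported by snprintf().
 */
static int sketch_special_inode_msg(char *buf, size_t len, unsigned int mode,
				    const char *dev_id, unsigned long ino)
{
	switch (mode & SKETCH_S_IFMT) {
	case SKETCH_S_IFCHR:
	case SKETCH_S_IFBLK:
	case SKETCH_S_IFIFO:
	case SKETCH_S_IFSOCK:
		return 0;   /* a real special file: nothing to report */
	default:
		return snprintf(buf, len,
				"init_special_inode: bogus i_mode (%o) for inode %s:%lu\n",
				mode, dev_id, ino);
	}
}
```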

Sep 23, 2009 (1 commit)

fs: turn iprune_mutex into rwsem · 88e0fbc4

Committed by Nick Piggin on Sep 22, 2009

We have had a report of bad memory allocation latency during DVD-RAM (UDF)
writing.  This is causing the user's desktop session to become unusable.

Jan tracked the cause of this down to UDF inode reclaim blocking:

gnome-screens D ffff810006d1d598     0 20686      1
 ffff810006d1d508 0000000000000082 ffff810037db6718 0000000000000800
 ffff810006d1d488 ffffffff807e4280 ffffffff807e4280 ffff810006d1a580
 ffff8100bccbc140 ffff810006d1a8c0 0000000006d1d4e8 ffff810006d1a8c0
Call Trace:
 [<ffffffff804477f3>] io_schedule+0x63/0xa5
 [<ffffffff802c2587>] sync_buffer+0x3b/0x3f
 [<ffffffff80447d2a>] __wait_on_bit+0x47/0x79
 [<ffffffff80447dc6>] out_of_line_wait_on_bit+0x6a/0x77
 [<ffffffff802c24f6>] __wait_on_buffer+0x1f/0x21
 [<ffffffff802c442a>] __bread+0x70/0x86
 [<ffffffff88de9ec7>] :udf:udf_tread+0x38/0x3a
 [<ffffffff88de0fcf>] :udf:udf_update_inode+0x4d/0x68c
 [<ffffffff88de26e1>] :udf:udf_write_inode+0x1d/0x2b
 [<ffffffff802bcf85>] __writeback_single_inode+0x1c0/0x394
 [<ffffffff802bd205>] write_inode_now+0x7d/0xc4
 [<ffffffff88de2e76>] :udf:udf_clear_inode+0x3d/0x53
 [<ffffffff802b39ae>] clear_inode+0xc2/0x11b
 [<ffffffff802b3ab1>] dispose_list+0x5b/0x102
 [<ffffffff802b3d35>] shrink_icache_memory+0x1dd/0x213
 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
 [<ffffffff802951fa>] alloc_page_vma+0x176/0x189
 [<ffffffff802822d8>] __do_fault+0x10c/0x417
 [<ffffffff80284232>] handle_mm_fault+0x466/0x940
 [<ffffffff8044b922>] do_page_fault+0x676/0xabf

This blocks with iprune_mutex held, which then blocks other reclaimers:

X             D ffff81009d47c400     0 17285  14831
 ffff8100844f3728 0000000000000086 0000000000000000 ffff81000000e288
 ffff81000000da00 ffffffff807e4280 ffffffff807e4280 ffff81009d47c400
 ffffffff805ff890 ffff81009d47c740 00000000844f3808 ffff81009d47c740
Call Trace:
 [<ffffffff80447f8c>] __mutex_lock_slowpath+0x72/0xa9
 [<ffffffff80447e1a>] mutex_lock+0x1e/0x22
 [<ffffffff802b3ba1>] shrink_icache_memory+0x49/0x213
 [<ffffffff8027ede3>] shrink_slab+0xe3/0x158
 [<ffffffff8027fbab>] try_to_free_pages+0x177/0x232
 [<ffffffff8027a578>] __alloc_pages+0x1fa/0x392
 [<ffffffff8029507f>] alloc_pages_current+0xd1/0xd6
 [<ffffffff80279ac0>] __get_free_pages+0xe/0x4d
 [<ffffffff802ae1b7>] __pollwait+0x5e/0xdf
 [<ffffffff8860f2b4>] :nvidia:nv_kern_poll+0x2e/0x73
 [<ffffffff802ad949>] do_select+0x308/0x506
 [<ffffffff802adced>] core_sys_select+0x1a6/0x254
 [<ffffffff802ae0b7>] sys_select+0xb5/0x157

Now I think the main problem is having the filesystem block (and do IO) in
inode reclaim.  The problem is that this doesn't get accounted well and
penalizes a random allocator with a big latency spike caused by work
generated from elsewhere.

I think the best idea would be to avoid this.  By design if possible, or
by deferring the hard work to an asynchronous context.  If the latter,
then the fs would probably want to throttle creation of new work with
queue size of the deferred work, but let's not get into those details.

Anyway, the other obvious thing we looked at is the iprune_mutex which is
causing the cascading blocking.  We could turn this into an rwsem to
improve concurrency.  It is unreasonable to totally ban all potentially
slow or blocking operations in inode reclaim, so I think this is a cheap
way to get a small improvement.

This doesn't solve the whole problem of course.  The process doing inode
reclaim will still take the latency hit, and concurrent processes may end
up contending on filesystem locks.  So fs developers should keep these
problems in mind.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
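
Below is a user-space sketch of the locking change, with a POSIX `pthread_rwlock_t` standing in for the kernel's rw_semaphore. It assumes the following split:

- The icache pruning path takes the lock shared, so one pruner stuck in filesystem I/O no longer blocks other reclaimers.
- A path that must exclude all pruners, such as invalidating a superblock's inodes, takes it exclusive.

The function names are hypothetical. The exclusive side uses a try-lock only so the example can probe the lock without blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the rwsem that replaces iprune_mutex. */
static pthread_rwlock_t sketch_iprune_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Pruning path: shared access, so concurrent reclaimers do not queue up. */
static void sketch_prune_begin(void)
{
	pthread_rwlock_rdlock(&sketch_iprune_lock);
}

static void sketch_prune_end(void)
{
	pthread_rwlock_unlock(&sketch_iprune_lock);
}

/* Exclusive path: succeeds only when no pruner holds the lock. */
static bool sketch_exclusive_try_begin(void)
{
	return pthread_rwlock_trywrlock(&sketch_iprune_lock) == 0;
}

static void sketch_exclusive_end(void)
{
	pthread_rwlock_unlock(&sketch_iprune_lock);
}
```

As the commit notes, this only removes the cascading wait on the lock. The pruner that blocks inside the filesystem still takes the latency hit.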

Sep 22, 2009 (2 commits)

const: mark remaining inode_operations as const · 6e1d5dcc

Committed by Alexey Dobriyan on Sep 21, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs: make sure data stored into inode is properly seen before unlocking new inode · 580be083

Committed by Jan Kara on Sep 21, 2009

In theory it could happen that on one CPU we initialize a new inode but
the clearing of I_NEW | I_LOCK gets reordered before some of the
initialization.  Thus on another CPU we could return a not fully up-to-date
inode from iget_locked().

This seems to fix a corruption issue on ext3 mounted over NFS.

[akpm@linux-foundation.org: add some commentary]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
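
Here is a C11 sketch of the ordering requirement. The initializing CPU's stores to the inode must become visible before the store that clears the "new" flag. A reader that sees the flag cleared must then also see those stores. A release store paired with an acquire load expresses that. The struct, flag value and function names are hypothetical. Our understanding is that the kernel fix instead inserts a full memory barrier (`smp_mb()`) in `unlock_new_inode()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_I_NEW 0x1u   /* hypothetical flag value */

struct sketch_inode {
	unsigned long i_size;   /* ordinary field filled in during setup */
	atomic_uint   i_state;  /* flag word that publishes the inode */
};

/* Writer: finish initialization, then clear I_NEW with release ordering. */
static void sketch_unlock_new_inode(struct sketch_inode *inode, unsigned long size)
{
	inode->i_size = size;
	/* Release: the i_size store above cannot be reordered after this. */
	atomic_fetch_and_explicit(&inode->i_state, ~SKETCH_I_NEW, memory_order_release);
}

/* Reader: trust the other fields only once I_NEW is seen clear (acquire). */
static bool sketch_inode_ready(struct sketch_inode *inode, unsigned long *size_out)
{
	if (atomic_load_explicit(&inode->i_state, memory_order_acquire) & SKETCH_I_NEW)
		return false;
	*size_out = inode->i_size;
	return true;
}
```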

Sep 16, 2009 (1 commit)

fs: remove bdev->bd_inode_backing_dev_info · 2c96ce9f

Committed by Jens Axboe on Sep 15, 2009

It has been unused since it was introduced in:

commit 520808bf20e90fdbdb320264ba7dd5cf9d47dcac
Author: Andrew Morton <akpm@osdl.org>
Date:   Fri May 21 00:46:17 2004 -0700

    [PATCH] block device layer: separate backing_dev_info infrastructure

So let's just kill it.
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Aug 08, 2009 (2 commits)

vfs: add __destroy_inode · 2e00c97e

Committed by Christoph Hellwig on Aug 07, 2009

When we want to tear down an inode that lost the add to the cache race
in XFS we must not call into ->destroy_inode because that would delete
the inode that won the race from the inode cache radix tree.

This patch provides the __destroy_inode helper needed to fix this,
the actual fix will be in the next patch. As XFS was the only reason
destroy_inode was exported we shift the export to the new __destroy_inode.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>

vfs: fix inode_init_always calling convention · 54e34621

Committed by Christoph Hellwig on Aug 07, 2009

Currently inode_init_always calls into ->destroy_inode if the additional
initialization fails. That's not only counter-intuitive because
inode_init_always did not allocate the inode structure, but in case of
XFS it's actively harmful, as ->destroy_inode might delete the inode from
a radix tree it has never been added to. This in turn might end up
deleting the inode for the same inum that has been instantiated by
another process and cause lots of subtle problems.

Also in the case of re-initializing a reclaimable inode in XFS it would
free an inode we still want to keep alive.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
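
Below is a minimal sketch of the calling convention argued for here. The initializer only initializes. On failure it returns an error and leaves freeing to whoever allocated the object. `sketch_inode_init_always()`, `sketch_alloc_inode()` and the failure switch are hypothetical stand-ins for the kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_inode {
	int i_security_ready;
};

/* Initializer: on failure it reports an error but does NOT free the inode. */
static int sketch_inode_init_always(struct sketch_inode *inode, bool fail_security)
{
	if (fail_security)
		return -ENOMEM;   /* e.g. a security-module allocation failed */
	inode->i_security_ready = 1;
	return 0;
}

/* Allocator: it owns the memory, so it is the one that frees it on error. */
static struct sketch_inode *sketch_alloc_inode(bool fail_security)
{
	struct sketch_inode *inode = malloc(sizeof(*inode));

	if (!inode)
		return NULL;
	if (sketch_inode_init_always(inode, fail_security)) {
		free(inode);      /* plain free, not a filesystem ->destroy_inode() */
		return NULL;
	}
	return inode;
}
```

The last test case mirrors the XFS re-initialization concern. The caller owns an object that the initializer must not free, and under this convention a failed init leaves it alone.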

Jun 24, 2009 (1 commit)

add caching of ACLs in struct inode · f19d4a8f

Committed by Al Viro on Jun 08, 2009

No helpers, no conversions yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Jun 23, 2009 (1 commit)

vfs: Set special lockdep map for dirs only if not set by fs · 9a7aa12f

Committed by Jan Kara on Jun 04, 2009

Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different locking
rules than standard directories. For a filesystem, setting the
lockdep map is naturally done when the inode is read, but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.

Acked-by: peterz@infradead.org
CC: mingo@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

Jun 13, 2009 (1 commit)

trivial: fs/inode: Fix typo in file_update_time nanodoc · 2eadfc0e

Committed by Wolfram Sang on Apr 02, 2009

The advertised flag for not updating the time was wrong.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Jun 12, 2009 (3 commits)

fs: introduce mnt_clone_write · 96029c4e

Committed by npiggin@suse.de on Apr 26, 2009

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
 avg = 462.286
 std = 5.46106

After:
 avg = 453.12
 std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
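
Here is a simplified model of the fast path. When a file is already open for writing, its mount is known to hold a write reference. Taking another one can therefore skip the expensive checks. The structs, counters and `sketch_*` names are hypothetical. The real `mnt_want_write()` does considerably more work than the single read-only test shown here. Following the [AV: ...] note, the file-based helper takes only the file and derives the mount from it.

```c
#include <errno.h>
#include <stdbool.h>

struct sketch_mount {
	bool readonly;
	int  writers;       /* outstanding write references */
	int  slow_checks;   /* how often the expensive path ran */
};

struct sketch_file {
	struct sketch_mount *mnt;
	bool open_for_write;
};

/* Expensive path: must rule out a read-only mount before granting access. */
static int sketch_mnt_want_write(struct sketch_mount *mnt)
{
	mnt->slow_checks++;
	if (mnt->readonly)
		return -EROFS;
	mnt->writers++;
	return 0;
}

/* Cheap path: the caller guarantees an existing write reference on mnt. */
static int sketch_mnt_clone_write(struct sketch_mount *mnt)
{
	mnt->writers++;
	return 0;
}

/* Takes the file alone and derives the mount from it. */
static int sketch_mnt_want_write_file(struct sketch_file *file)
{
	if (file->open_for_write)
		return sketch_mnt_clone_write(file->mnt);
	return sketch_mnt_want_write(file->mnt);
}
```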

fsnotify: handle filesystem unmounts with fsnotify marks · 164bc619

Committed by Eric Paris on May 21, 2009

When an fs is unmounted with an fsnotify mark entry attached to one of its
inodes we need to destroy that mark entry and we also (like inotify) send
an unmount event.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49

Committed by Eric Paris on May 21, 2009

This patch creates a way for fsnotify groups to attach marks to inodes.
These marks have little meaning to the generic fsnotify infrastructure
and thus their meaning should be interpreted by the group that attached
them to the inode's list.

dnotify and inotify will make use of these markings to indicate which
inodes are of interest to their respective groups.  But this implementation
has the useful property that in the future other listeners could actually
use the marks for the exact opposite reason, aka to indicate which inodes
they had NO interest in.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

Jun 07, 2009 (1 commit)

integrity: fix IMA inode leak · f07502da

Committed by Hugh Dickins on Jun 06, 2009

CONFIG_IMA=y inode activity leaks iint_cache and radix_tree_node objects
until the system runs out of memory. Nowhere is calling ima_inode_free()
a.k.a. ima_iint_delete(). Fix that by calling it from destroy_inode().
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Jun 06, 2009 (1 commit)

ext3/4 with synchronous writes gets wedged by Postfix · 72a43d63

Committed by Al Viro on May 13, 2009

OK, that's probably the easiest way to do that, as much as I don't like it...
Since iget() et al. will not accept I_FREEING (they will wait for it to go
away and restart), and since we'd better have serialization between new/free
on fs data structures anyway, we can afford simply skipping I_FREEING
et al. in insert_inode_locked().

We do that from new_inode, so it won't race with free_inode in any interesting
ways and it won't race with iget (of any origin; nfsd or in case of fs
corruption a lookup) since both still will wait for I_LOCK.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Jan Kara <jack@suse.cz>
Tested-by: David Watson <dbwatson@ukfsn.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
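
Below is a toy model of `insert_inode_locked()` as described here and in commit 261bca86 further down this log:

1. The new inode is marked new+locked.
2. An existing inode with the same number that is already being freed is skipped.
3. A live inode with the same number makes the insert fail.
4. Otherwise the new inode is added.

The fixed-size table (a single superblock is assumed), the flag values, the error codes and all names are hypothetical. The real helper also waits on a matching inode and retries if that inode ends up unhashed; this sketch omits that.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_I_NEW       0x1u   /* hypothetical flag values */
#define SKETCH_I_LOCK      0x2u
#define SKETCH_I_FREEING   0x4u
#define SKETCH_TABLE_SIZE  8

struct sketch_inode {
	unsigned long i_ino;
	unsigned int  i_state;
};

/* A tiny stand-in for the inode hash; NULL slots are empty. */
static struct sketch_inode *sketch_icache[SKETCH_TABLE_SIZE];

static int sketch_insert_inode_locked(struct sketch_inode *inode)
{
	size_t free_slot = SKETCH_TABLE_SIZE;

	/* Per the commit text: new+locked whether we succeed or fail. */
	inode->i_state |= SKETCH_I_NEW | SKETCH_I_LOCK;

	for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
		struct sketch_inode *old = sketch_icache[i];

		if (!old) {
			if (free_slot == SKETCH_TABLE_SIZE)
				free_slot = i;   /* remember the first empty slot */
			continue;
		}
		if (old->i_ino != inode->i_ino)
			continue;
		/* An inode already being freed is simply skipped. */
		if (old->i_state & SKETCH_I_FREEING)
			continue;
		return -EBUSY;   /* a live inode with this number is cached */
	}
	if (free_slot == SKETCH_TABLE_SIZE)
		return -ENOSPC;  /* the toy table is full */

	sketch_icache[free_slot] = inode;
	return 0;
}
```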

May 09, 2009 (1 commit)

Make checkpatch.pl shut up on fs/inode.c · 6b3304b5

Committed by Manish Katiyar on Mar 31, 2009

Code Quality According To Mingo(tm) has been vastly improved,
no code has been damaged^Wchanged^Wdamaged.

[commit message rewritten -- AV]
Signed-off-by: Manish Katiyar <mkatiyar@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Apr 15, 2009 (1 commit)

splice: add helpers for locking pipe inode · 61e0d47c

Committed by Miklos Szeredi on Apr 14, 2009

There are lots of sequences like this, especially in splice code:

	if (pipe->inode)
		mutex_lock(&pipe->inode->i_mutex);
	/* do something */
	if (pipe->inode)
		mutex_unlock(&pipe->inode->i_mutex);

so introduce helpers which do the conditional locking and unlocking.
Also replace the inode_double_lock() call with a pipe_double_lock()
helper to avoid spreading the use of this functionality beyond the
pipe code.

This patch is just a cleanup, and should cause no behavioral changes.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
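
The conditional-locking helpers the commit describes can be sketched in user space as below, with a pthread mutex standing in for `i_mutex`. The struct layout and the names `sketch_pipe_lock()`/`sketch_pipe_unlock()` are hypothetical. The point is that the `if (pipe->inode)` test now lives in one place instead of at every call site.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

struct sketch_inode {
	pthread_mutex_t i_mutex;
};

struct sketch_pipe {
	struct sketch_inode *inode;   /* may be NULL, as in the quoted code */
};

/* Lock the pipe's inode mutex only if the pipe has an inode. */
static void sketch_pipe_lock(struct sketch_pipe *pipe)
{
	if (pipe->inode)
		pthread_mutex_lock(&pipe->inode->i_mutex);
}

/* Matching unlock, guarded by the same condition. */
static void sketch_pipe_unlock(struct sketch_pipe *pipe)
{
	if (pipe->inode)
		pthread_mutex_unlock(&pipe->inode->i_mutex);
}
```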

Mar 28, 2009 (1 commit)

fs: avoid I_NEW inodes · aabb8fdb

Committed by Nick Piggin on Mar 11, 2009

To be on the safe side, it should be less fragile to exclude I_NEW inodes
from inode list scans by default (unless there is an important reason to
have them).

Normally they will get excluded (e.g. by zero refcount or writecount, etc.),
however it is a bit fragile for list walkers to know exactly what parts of
the inode state is set up and valid to test when in I_NEW.  So along these
lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
checks with them too -- this shouldn't be a problem should it?)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
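
Here is a minimal sketch of a list walker that follows the advice above. It rejects inodes that are still being set up (I_NEW) or torn down (I_FREEING) before it interprets any other state. The array standing in for the inode list, the flag values and the function name are hypothetical.

```c
#include <stddef.h>

#define SKETCH_I_NEW     0x1u   /* hypothetical flag values */
#define SKETCH_I_FREEING 0x2u
#define SKETCH_I_DIRTY   0x4u

struct sketch_inode {
	unsigned int i_state;
};

/*
 * Count dirty inodes that a writeback-style walker would act on.
 * The I_NEW/I_FREEING test comes first, so the walker never interprets
 * other state of a half-built or dying inode.
 */
static size_t sketch_count_dirty(const struct sketch_inode *inodes, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].i_state & (SKETCH_I_NEW | SKETCH_I_FREEING))
			continue;
		if (inodes[i].i_state & SKETCH_I_DIRTY)
			count++;
	}
	return count;
}
```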

Mar 27, 2009 (1 commit)

Allow relatime to update atime once a day · 11ff6f05

Committed by Matthew Garrett on Mar 26, 2009

Allow atime to be updated once per day even with relatime. This lets
utilities like tmpreaper (which delete files based on last access time)
continue working, making relatime a plausible default for distributions.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Valerie Aurora Henson <vaurora@redhat.com>
Acked-by: Alan Cox <alan@redhat.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
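
The relatime rule with the new once-a-day clause can be written as a pure function of the timestamps. This follows our understanding of the kernel's `relatime_need_update()`. It uses plain `time_t` seconds and a hypothetical name, so treat it as a sketch rather than the kernel code.

```c
#include <stdbool.h>
#include <time.h>

#define SKETCH_ATIME_MAX_AGE (24 * 60 * 60)   /* one day, in seconds */

/* Should a read update atime under relatime? All times are in seconds. */
static bool sketch_relatime_need_update(time_t atime, time_t mtime,
					time_t chtime, time_t now)
{
	/* Classic relatime: the file was modified or changed since the last read. */
	if (mtime >= atime)
		return true;
	if (chtime >= atime)
		return true;
	/* New clause: refresh an atime that is at least a day old. */
	if (now - atime >= SKETCH_ATIME_MAX_AGE)
		return true;
	return false;
}
```

With the third clause, tools such as tmpreaper see atime advance at least daily on files that are read but never modified.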

Mar 26, 2009 (1 commit)

vfs: Use lowercase names of quota functions · 9e3509e2

Committed by Jan Kara on Jan 26, 2009

Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Alexander Viro <viro@zeniv.linux.org.uk>

Mar 13, 2009 (1 commit)

fs: new inode i_state corruption fix · 7ef0d737

Committed by Nick Piggin on Mar 12, 2009

There was a report of a data corruption
http://lkml.org/lkml/2008/11/14/121.  There is a script included to
reproduce the problem.

During testing, I encountered a number of strange things with ext3, so I
tried ext2 to attempt to reduce complexity of the problem.  I found that
fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
cleared, even though instrumentation showed that unlock_new_inode had
already been called for that inode.  This points to a memory scribble or a
synchronisation problem.

i_state of I_NEW inodes is not protected by inode_lock because other
processes are not supposed to touch them until I_LOCK (and I_NEW) is
cleared.  Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
i_state revealed that generic_sync_sb_inodes is picking up new inodes from
the inode lists and passing them to __writeback_single_inode without
waiting for I_NEW.  Subsequently modifying i_state causes corruption.  In
my case it would look like this:

CPU0                            CPU1
unlock_new_inode()              __sync_single_inode()
 reg <- inode->i_state
 reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
 reg -> inode->i_state          reg -> reg | I_SYNC
                                reg -> inode->i_state

Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

The fix for this is, rather than waiting for I_NEW inodes, to just skip over them:
inodes concurrently being created are not subject to data integrity
operations, and should not significantly contribute to dirty memory
either.

After this change, I'm unable to reproduce any of the added warnings or
hangs after ~1 hour of running.  Previously, the new warnings would start
immediately and hang would happen in under 5 minutes.

I'm also testing on ext3 now, and so far no problems there either.  I
don't know whether this fixes the problem reported above, but it fixes a
real problem for me.

Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net>
Reported-by: Adrian Hunter <ext-adrian.hunter@nokia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
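
The CPU0/CPU1 interleaving in the table above can be replayed deterministically on one thread. Two local variables stand in for the CPUs' registers. The replay shows how CPU1's store puts I_LOCK|I_NEW back, because it is computed from a value loaded before CPU0's store landed. The flag values and the function name are hypothetical.

```c
#define SKETCH_I_NEW   0x1u   /* hypothetical flag values */
#define SKETCH_I_LOCK  0x2u
#define SKETCH_I_SYNC  0x4u

/* Replays the interleaving from the table, step by step, on one thread. */
static unsigned int sketch_replay_lost_update(unsigned int i_state)
{
	unsigned int cpu0_reg, cpu1_reg;

	cpu0_reg = i_state;                          /* CPU0: load */
	cpu0_reg &= ~(SKETCH_I_LOCK | SKETCH_I_NEW); /* CPU0: clear flags */
	cpu1_reg = i_state;                          /* CPU1: load, soon stale */
	i_state = cpu0_reg;                          /* CPU0: store */
	cpu1_reg |= SKETCH_I_SYNC;                   /* CPU1: set I_SYNC */
	i_state = cpu1_reg;                          /* CPU1: store overwrites CPU0 */
	return i_state;
}
```

The commit does not make this update atomic. Instead, it avoids the race by having the writeback walk skip I_NEW inodes altogether.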

Feb 06, 2009 (1 commit)

integrity: IMA hooks · 6146f0d5

Committed by Mimi Zohar on Feb 04, 2009

This patch replaces the generic integrity hooks, for which IMA registered
itself, with IMA integrity hooks in the appropriate places directly
in the fs directory.
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>

Jan 10, 2009 (1 commit)

partial revert of asynchronous inode delete · b32714ba

Committed by Arjan van de Ven on Jan 09, 2009

Let the core of this one bake in -next as well, but leave
some of the infrastructure in place.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

Jan 08, 2009 (1 commit)

async: make the final inode deletion an asynchronous event · efaee192

Committed by Arjan van de Ven on Jan 06, 2009

This makes "rm -rf" on a (names cached) kernel tree go from
11.6 to 8.6 seconds on an ext3 filesystem
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

Jan 07, 2009 (2 commits)

fs/inode: fix kernel-doc notation · 0bc02f3f

Committed by Randy Dunlap on Jan 06, 2009

Fix kernel-doc notation:

Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'sb'
Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'inode'
Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'sb'
Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'inode'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: remove GFP_HIGHUSER_PAGECACHE · 3c1d4378

Committed by Hugh Dickins on Jan 06, 2009

GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
that harder to track down: remove it, and its out-of-work brothers
GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

Since we're making that improvement to hotremove_migrate_alloc(), I think
we can now also remove one of the "o"s from its comment.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Jan 06, 2009 (1 commit)

zero i_uid/i_gid on inode allocation · 56ff5efa

Committed by Al Viro on Dec 09, 2008

... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Jan 01, 2009 (1 commit)

nfsd/create race fixes, infrastructure · 261bca86

Committed by Al Viro on Dec 30, 2008

new helpers - insert_inode_locked() and insert_inode_locked4().
Hash new inode, making sure that there's no such inode in icache
already.  If there is and it does not end up unhashed (as would
happen if we have nfsd trying to resolve a bogus fhandle), fail.
Otherwise insert our inode into hash and succeed.

In either case have i_state set to new+locked; cleanup ends up
being simpler with such calling conventions.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Nov 10, 2008 (1 commit)

fs: xfs needs inode_wait to be exported · d44dab8d

Committed by Stephen Rothwell on Nov 10, 2008

Since wait_on_inode() references it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>

Oct 30, 2008 (1 commit)

Inode: export symbol destroy_inode · 087e3b04

Committed by Christoph Hellwig on Oct 30, 2008

To make sure we free the security data, inodes need to be freed using
the proper VFS helper (which we also need to export for this). We mark
these inodes bad so we can skip the flush path for them.
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Signed-off-by: David Chinner <david@fromorbit.com>
