Commits · 42cb56ae2ab67390da34906b27bedc3f2ff1393b · xiphi1978 / linux

24 Sep 2009 · 1 commit

vfs: change sb->s_maxbytes to a loff_t · 42cb56ae

Committed by Jeff Layton on Sep 18, 2009

sb->s_maxbytes is supposed to indicate the maximum size of a file that can
exist on the filesystem.  It's declared as an unsigned long long.

Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
that cast s_maxbytes to a signed value.  If it's set too large then this
cast makes it a negative number and generally breaks the comparison.

Change s_maxbytes to be loff_t instead.  That should help eliminate the
temptation to set it too large by making it a signed value.

Also, add a warning for a couple of releases to help catch filesystems that
set s_maxbytes too large.  Eventually we can either convert this to a
BUG() or just remove it in the hope that no one will get it wrong now
that it's a signed value.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
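
A minimal userspace sketch of the signed-cast problem described above (not kernel code): `sketch_loff_t` and `SKETCH_MAX_LFS_FILESIZE` are hypothetical stand-ins for `loff_t` and the 64-bit value of `MAX_LFS_FILESIZE`. Converting an out-of-range unsigned value to a signed type is implementation-defined in C; on the usual two's-complement compilers it wraps to a negative number, which is the breakage the message refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's own definitions. */
typedef int64_t sketch_loff_t;
#define SKETCH_MAX_LFS_FILESIZE INT64_MAX

/* A limit check of the kind the kernel performs against s_maxbytes:
 * does a file ending at 'end' fit within the filesystem limit? */
static int fits_in_fs(sketch_loff_t end, sketch_loff_t maxbytes)
{
	return end <= maxbytes;
}

/* The careless cast: an unsigned limit above INT64_MAX becomes negative
 * on two's-complement targets (implementation-defined in C). */
static sketch_loff_t as_signed(unsigned long long limit)
{
	return (sketch_loff_t)limit;
}

/* Clamp the limit to what a signed loff_t can safely represent. */
static sketch_loff_t clamp_maxbytes(unsigned long long limit)
{
	if (limit > (unsigned long long)SKETCH_MAX_LFS_FILESIZE)
		return SKETCH_MAX_LFS_FILESIZE;
	return (sketch_loff_t)limit;
}
```

With an all-ones limit, `as_signed()` yields -1, so even a 4096-byte file fails the `<=` check, whereas the clamped limit behaves as intended.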

42cb56ae

22 Sep 2009 · 1 commit

const: mark remaining super_operations const · b87221de

Committed by Alexey Dobriyan on Sep 21, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b87221de

16 Sep 2009 · 1 commit

fs: Assign bdi in super_block · 32a88aa1

Committed by Jens Axboe on Sep 16, 2009

We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().

Note that ->s_bdi assignment is required for proper writeback!
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

32a88aa1

11 Sep 2009 · 2 commits

writeback: switch to per-bdi threads for flushing data · 03ba3782

Committed by Jens Axboe on Sep 09, 2009

This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as do random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
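
One rough way to put numbers on "smooth" versus "fluctuates a lot" is to summarise the `bo` column of the two vmstat traces above. This sketch only reuses the figures quoted in the message; the 10000 blocks/s dip threshold is an arbitrary assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* "bo" column (blocks written out per second) from the two vmstat traces
 * above: per-bdi flushing first, then the vanilla pdflush kernel. */
static const long perbdi_bo[] = {
	71024, 60692, 29540, 54876, 92168,
	62924, 58256, 57460, 74748, 53248,
};
static const long vanilla_bo[] = {
	74064, 352, 78164, 78156, 82244, 8,
	102800, 12328, 106940, 8224, 113516, 1640,
};

#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

static long min_of(const long *v, size_t n)
{
	long m = v[0];
	for (size_t i = 1; i < n; i++)
		if (v[i] < m)
			m = v[i];
	return m;
}

static long max_of(const long *v, size_t n)
{
	long m = v[0];
	for (size_t i = 1; i < n; i++)
		if (v[i] > m)
			m = v[i];
	return m;
}

/* Count intervals where writeout nearly stalled.  The threshold is an
 * illustrative choice, not something the commit defines. */
static size_t count_below(const long *v, size_t n, long threshold)
{
	size_t c = 0;
	for (size_t i = 0; i < n; i++)
		if (v[i] < threshold)
			c++;
	return c;
}
```

With that threshold the per-bdi trace has no dips (its `bo` stays between 29540 and 92168), while the vanilla trace has four (352, 8, 8224 and 1640) and swings between 8 and 113516.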

03ba3782

writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2

Committed by Jens Axboe on Sep 02, 2009

This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

66f3b8e2

24 Jun 2009 · 2 commits

... and the same for vfsmount id/mount group id · f21f6220

Committed by Al Viro on Jun 24, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f21f6220

Make allocation of anon devices cheaper · c63e09ec

Committed by Al Viro on Jun 24, 2009

Standard trick - add a new variable (start) such that
for each n < start n is known to be busy.  Allocation can
skip checking everything in [0..start) and if it returns
n, we can set start to n + 1.  Freeing below start sets
start to what we'd just freed.

Of course, it still sucks if we do something like
	free 0
	allocate
	allocate
in a loop - still O(n^2) time.  However, on saner loads it
improves things a lot, and the entire thing is not worth
the trouble of switching to something with better worst-case
behaviour.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
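
The `start` trick can be sketched with a plain array of busy flags. This is a hypothetical userspace model: in the kernel the bookkeeping sits on top of an IDA, and `struct anon_pool`, `NDEV`, `anon_alloc()` and `anon_free()` are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NDEV 64	/* hypothetical pool size */

struct anon_pool {
	bool busy[NDEV];
	int start;	/* invariant: every n < start is busy */
};

/* Scan only [start, NDEV): everything below start is known to be busy. */
static int anon_alloc(struct anon_pool *p)
{
	for (int n = p->start; n < NDEV; n++) {
		if (!p->busy[n]) {
			p->busy[n] = true;
			/* [0, start) was busy by the invariant, [start, n) was
			 * found busy by the scan, and n is busy now. */
			p->start = n + 1;
			return n;
		}
	}
	return -1;	/* pool exhausted */
}

static void anon_free(struct anon_pool *p, int n)
{
	assert(n >= 0 && n < NDEV && p->busy[n]);
	p->busy[n] = false;
	/* Freeing below start opens a hole, so pull start back to it. */
	if (n < p->start)
		p->start = n;
}
```

Replaying "free 0, allocate, allocate": the first allocation returns 0 immediately, but the second has to rescan the busy slots above 0 before finding a free one. With n devices busy, each round of that loop costs O(n), hence the O(n^2) worst case the message accepts.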

c63e09ec

17 Jun 2009 · 1 commit

remove unlock_kernel() left accidentally · b0895513

Committed by J. R. Okajima on Jun 17, 2009

commit 337eb00a
Push BKL down into ->remount_fs()
and
commit 4aa98cf7
Push BKL down into do_remount_sb()

were incorrectly merged.
The former removes one pair of lock/unlock_kernel(), but the latter adds
several unlock_kernel() calls. As a result, a few stray unlock_kernel() calls were left behind.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b0895513

12 Jun 2009 · 17 commits

Push BKL down into ->remount_fs() · 337eb00a

Committed by Alessio Igor Bogani on May 12, 2009

[xfs, btrfs, capifs, shmem don't need BKL, exempt]
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

337eb00a

->write_super lock_super pushdown · ebc1ac16

Committed by Christoph Hellwig on May 11, 2009

Push down lock_super into ->write_super instances and remove it from the
caller.

The following filesystems don't need ->s_lock in ->write_super and are skipped:

 * bfs, nilfs2 - no other uses of s_lock and have internal locks in
	->write_super
 * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
 * reiserfs - no other uses of s_lock and has reiserfs_write_lock (BKL) in
	->write_super
 * xfs - no other uses of s_lock and uses internal lock (buffer lock on
	superblock buffer) to serialize ->write_super.  Also xfs_fs_write_super
	is superfluous and will go away in the next merge window
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
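
The shape of the pushdown can be modelled in userspace. In this hypothetical sketch a boolean stands in for `sb->s_lock`, and `sketch_lock_super()`, `locking_fs_write_super()` and the other names are invented; the only point is that the generic caller stops taking the lock and each `->write_super` instance decides for itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a flag stands in for the s_lock mutex. */
struct sketch_sb {
	bool s_lock_held;
	int writes;
};

static void sketch_lock_super(struct sketch_sb *sb)
{
	assert(!sb->s_lock_held);	/* s_lock is not recursive */
	sb->s_lock_held = true;
}

static void sketch_unlock_super(struct sketch_sb *sb)
{
	assert(sb->s_lock_held);
	sb->s_lock_held = false;
}

/* A filesystem that relies on s_lock now takes it itself... */
static void locking_fs_write_super(struct sketch_sb *sb)
{
	sketch_lock_super(sb);
	sb->writes++;
	sketch_unlock_super(sb);
}

/* ...while one with its own internal locking (like those listed above)
 * simply skips it. */
static void internal_lock_fs_write_super(struct sketch_sb *sb)
{
	sb->writes++;
}

/* After the pushdown, the generic caller no longer wraps the call in
 * lock_super()/unlock_super(). */
static void sketch_sync_super(struct sketch_sb *sb,
			      void (*write_super)(struct sketch_sb *))
{
	write_super(sb);
}
```

If the generic caller still took the lock before calling `locking_fs_write_super()`, the assertion in `sketch_lock_super()` would fire, mirroring the self-deadlock that keeping both lock sites would cause with a non-recursive lock.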

ebc1ac16

Push BKL down into do_remount_sb() · 4aa98cf7

Committed by Al Viro on May 08, 2009

[folded fix from Jiri Slaby]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

4aa98cf7

Push lock_super() into the ->remount_fs() of filesystems that care about it · bbd6851a

Committed by Al Viro on May 06, 2009

Note that since we can't run into contention between remount_fs and write_super
(due to exclusion on s_umount), we have to care only about filesystems that
touch lock_super() on their own.  Out of those ext3, ext4, hpfs, sysv and ufs
do need it; fat doesn't since its ->remount_fs() only accesses assign-once
data (basically, it's "we have no atime on directories and only have atime on
files for vfat; force nodiratime and possibly noatime into *flags").

[folded a build fix from hch]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
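
The FAT case can be sketched as a pure flag computation. `fat_force_atime_flags()` is a hypothetical helper, not FAT's actual `->remount_fs()`; its `isvfat` parameter stands in for the assign-once mount data the message mentions, and `MS_NOATIME`/`MS_NODIRATIME` come from the Linux `<sys/mount.h>` header.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h>	/* MS_NOATIME, MS_NODIRATIME (Linux/glibc) */

/* Hypothetical helper modelled on the described behaviour.  It reads only
 * 'isvfat', which is fixed at mount time (assign-once), so nothing here
 * needs lock_super() for exclusion. */
static void fat_force_atime_flags(unsigned long *flags, bool isvfat)
{
	*flags |= MS_NODIRATIME;	/* directories never have atime */
	if (!isvfat)
		*flags |= MS_NOATIME;	/* only vfat keeps atime on files */
}
```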

bbd6851a

push BKL down into ->put_super · 6cfd0148

Committed by Christoph Hellwig on May 05, 2009

Move BKL into ->put_super from the only caller.  A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment.  Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6cfd0148

No need to do lock_super() for exclusion in generic_shutdown_super() · a9e220f8

Committed by Al Viro on May 05, 2009

We can't run into contention on it.  All other callers of lock_super()
either hold s_umount (and we have it exclusive) or hold an active
reference to superblock in question, which prevents the call of
generic_shutdown_super() while the reference is held.  So we can
replace lock_super(s) with get_fs_excl() in generic_shutdown_super()
(and corresponding change for unlock_super(), of course).

Since ext4 expects s_lock held for its put_super, take lock_super()
into it.  The rest of filesystems do not care at all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a9e220f8

Make sure that all callers of remount hold s_umount exclusive · 443b94ba

Committed by Al Viro on May 05, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

443b94ba

cleanup sync_supers · e5004753

Committed by Christoph Hellwig on May 05, 2009

Merge the write_super helper into sync_super and move the check for
->write_super earlier so that we can avoid grabbing a reference to
a superblock that doesn't have it.

While we're at it also add a little comment documenting sync_supers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e5004753

remove ->write_super call in generic_shutdown_super · 8c85e125

Committed by Christoph Hellwig on Apr 28, 2009

We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform its own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:

 - affs already has identical copy & pasted code at the beginning of
   affs_put_super so no need to do it twice.
 - xfs does the right thing without it and I have changes pending for
   the xfs tree touching this area, so I don't really want conflicts
   here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
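
The per-filesystem change can be sketched roughly as below. `foofs` is the message's own placeholder name; the struct and write counter are a hypothetical userspace stand-in for the superblock, and the `s_dirt` guard is an assumption about how the moved call is conditioned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the superblock fields the pattern touches. */
struct foofs_sb {
	bool s_dirt;		/* superblock needs ->write_super() */
	int on_disk_writes;	/* counts simulated superblock writes */
};

static void foofs_write_super(struct foofs_sb *sb)
{
	sb->on_disk_writes++;
	sb->s_dirt = false;
}

/* generic_shutdown_super() no longer calls ->write_super(), so
 * ->put_super() performs the final write itself when it is needed. */
static void foofs_put_super(struct foofs_sb *sb)
{
	if (sb->s_dirt)
		foofs_write_super(sb);
	/* ... release private filesystem data here ... */
}
```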

8c85e125

vfs: Rename fsync_super() to sync_filesystem() (version 4) · 60b0680f

Committed by Jan Kara on Apr 27, 2009

Rename the function so that it better describes what it really does. Also
remove the unnecessary include of buffer_head.h.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

60b0680f

vfs: Move syncing code from super.c to sync.c (version 4) · c15c54f5

Committed by Jan Kara on Apr 27, 2009

Move sync_filesystems(), __fsync_super(), fsync_super() from
super.c to sync.c where they fit better.

[build fixes folded]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c15c54f5

vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815

Committed by Jan Kara on Apr 27, 2009

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

A nice bonus is that we get complete livelock avoidance, and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5cee5815

vfs: Make __fsync_super() a static function (version 4) · 429479f0

Committed by Jan Kara on Apr 27, 2009

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
an unnecessarily duplicated call to sync_blockdev() and prepares the ground
for the changes to __fsync_super() in the following patches.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

429479f0

vfs: Call ->sync_fs() even if s_dirt is 0 (version 4) · bfe88125

Committed by Jan Kara on Apr 27, 2009

sync_filesystems() has a condition that if wait == 0 and s_dirt == 0, then
->sync_fs() isn't called. This does not really make much sense, since s_dirt is
generally used by a filesystem to mean that ->write_super() needs to be called.
But ->sync_fs() does different things. I even suspect that some filesystems
(btrfs?) set s_dirt just to fool this logic.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bfe88125

vfs: Fix sys_sync() and fsync_super() reliability (version 4) · 5a3e5cb8

Committed by Jan Kara on Apr 27, 2009

So far, do_sync() called:
  sync_inodes(0);
  sync_supers();
  sync_filesystems(0);
  sync_filesystems(1);
  sync_inodes(1);

This ordering makes it kind of hard for filesystems, as sync_inodes(0) need not
submit all the IO (for example it skips inodes with I_SYNC set), so e.g. forcing a
transaction to disk in ->sync_fs() is not really enough. Therefore sys_sync has
not been completely reliable on some filesystems (ext3, ext4, reiserfs, ocfs2
and others are hit by this) when racing e.g. with background writeback. A
similar problem also hits other filesystems (e.g. ext2) because of
write_supers() being called before sync_inodes(1).

Change the ordering of calls in do_sync() - this requires a new function
sync_blockdevs() to preserve the property that block devices are always synced
after write_super() / sync_fs() call.

The same issue is fixed in __fsync_super() function used on umount /
remount read-only.

[AV: build fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
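
The ordering constraints above can be checked mechanically. In this sketch the phase labels are hypothetical, the old order is the one listed in the message, and `new_order` is just one sequence that satisfies the stated constraints; it is an assumption, not necessarily the exact sequence the patch settled on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the phases of a sync pass. */
enum phase {
	INODES_NOWAIT,	/* sync_inodes(0) */
	INODES_WAIT,	/* sync_inodes(1) */
	SUPERS,		/* sync_supers() / ->write_super() */
	FS_NOWAIT,	/* sync_filesystems(0) */
	FS_WAIT,	/* sync_filesystems(1) */
	BLOCKDEVS,	/* sync_blockdevs() */
};

/* The old do_sync() order, as listed in the message. */
static const enum phase old_order[] = {
	INODES_NOWAIT, SUPERS, FS_NOWAIT, FS_WAIT, INODES_WAIT,
};

/* One order meeting the constraints (an assumption, see above). */
static const enum phase new_order[] = {
	INODES_NOWAIT, INODES_WAIT, SUPERS, FS_NOWAIT, FS_WAIT, BLOCKDEVS,
};

static int position(const enum phase *order, size_t n, enum phase p)
{
	for (size_t i = 0; i < n; i++)
		if (order[i] == p)
			return (int)i;
	return -1;	/* phase absent */
}

/* Waited inode writeback must precede ->write_super() and the waited
 * ->sync_fs(), and block devices must be flushed after both. */
static int order_is_reliable(const enum phase *order, size_t n)
{
	int iw = position(order, n, INODES_WAIT);
	int sup = position(order, n, SUPERS);
	int fsw = position(order, n, FS_WAIT);
	int bd = position(order, n, BLOCKDEVS);

	if (iw < 0 || sup < 0 || fsw < 0 || bd < 0)
		return 0;
	return iw < sup && iw < fsw && sup < bd && fsw < bd;
}
```

The old order fails on two counts: `sync_inodes(1)` runs after `write_supers()`, and nothing flushes the block devices after `->sync_fs()`.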

5a3e5cb8

remove s_async_list · 876a9f76

Committed by Christoph Hellwig on Apr 28, 2009

Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline.  Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

876a9f76

fs: move mark_files_ro into file_table.c · 864d7c4c

Committed by npiggin@suse.de on Apr 26, 2009

This function walks the s_files lock, and operates primarily on the
files in a superblock, so it better belongs here (eg. see also
fs_may_remount_ro).

[AV: ... and it shouldn't be static after that move]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

864d7c4c

09 May 2009 · 2 commits

NULL noise in fs/super.c:kill_bdev_super() · ddbaaf30

Committed by H Hartley Sweeten on Apr 29, 2009

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Subrata Modak <subrata@linux.vnet.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ddbaaf30

New helper: deactivate_locked_super() · 74dbbdd7

Committed by Al Viro on May 06, 2009

Does the equivalent of up_write(&s->s_umount); deactivate_super(s);
However, it does not unlock the superblock until it's all over.
As a result, it's safe to use to dispose of a new superblock on ->get_sb()
failure exits - nobody will see the sucker until it's all over.
The equivalent using up_write/deactivate_super is safe for that purpose
if the superblock is either safe to use or has NULL ->s_root when we unlock.
Normally filesystems take the required precautions, but
	a) we do have bugs in that area in some of them.
	b) up_write/deactivate_super sequence is extremely common,
so the helper makes sense anyway.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
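
The difference between the two disposal orders can be modelled with a few flags. This is a hypothetical single-threaded sketch: `struct sketch_sb` and the helpers are invented, a boolean stands in for holding `s_umount` for write, and "exposed" simply records that the superblock became findable while half set up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a superblock on a ->get_sb() failure path. */
struct sketch_sb {
	bool locked;		/* stands in for holding s_umount for write */
	int refs;		/* active references */
	bool has_root;		/* ->s_root already set */
	bool set_up;		/* fully initialised and safe to use */
	bool dead;		/* torn down */
	bool exposed_broken;	/* became visible while unsafe */
};

static void sketch_up_write(struct sketch_sb *s)
{
	s->locked = false;
	/* Once unlocked, other lookups could find and use it. */
	if (!s->dead && s->has_root && !s->set_up)
		s->exposed_broken = true;
}

static void sketch_teardown_if_last(struct sketch_sb *s)
{
	if (--s->refs == 0)
		s->dead = true;
}

/* Old idiom: unlock first, then drop the reference. */
static void sketch_up_write_then_deactivate(struct sketch_sb *s)
{
	sketch_up_write(s);
	sketch_teardown_if_last(s);
}

/* deactivate_locked_super()-style: finish the teardown while still
 * holding the lock, and only then unlock. */
static void sketch_deactivate_locked(struct sketch_sb *s)
{
	assert(s->locked);
	sketch_teardown_if_last(s);
	sketch_up_write(s);
}
```

With `has_root` cleared, the old idiom does not expose anything either, which matches the caveat in the message; the locked variant removes the need to get that detail right on every failure path.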

74dbbdd7

07 Apr 2009 · 1 commit

namespaces: move proc_net_get_sb to a generic fs/super.c helper · 909e6d94

Committed by Serge E. Hallyn on Apr 06, 2009

The mqueuefs filesystem will use this helper as well.  Proc's main get_sb
could also be made to use it, but that will require a bit more rework.
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

909e6d94

03 Apr 2009 · 1 commit

CacheFiles: Export things for CacheFiles · 800a9647

Committed by David Howells on Apr 03, 2009

Export a number of functions for CacheFiles's use.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>

800a9647

28 Mar 2009 · 1 commit

vfs: simple_set_mnt() should return void · a3ec947c

Committed by Sukadev Bhattiprolu on Mar 04, 2009

simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly clean up if
it were to _ever_ fail.  For instance, get_sb_single() and get_sb_nodev()
should:

        up_write(&sb->s_umount);
        deactivate_super(sb);

if simple_set_mnt() fails.

Since simple_set_mnt() never fails, it would be cleaner if it did not
return anything.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a3ec947c

26 Mar 2009 · 2 commits

Get rid of pdflush_operation() in emergency sync and remount · a2a9537a

Committed by Jens Axboe on Mar 17, 2009

Open-code a cheesy approach with keventd. The idea here is that we'll
add some generic delayed work infrastructure, which probably won't be
based on pdflush (or maybe it will, in which case we can just add it
back).

This is in preparation for getting rid of pdflush completely.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

a2a9537a

vfs: Use lowercase names of quota functions · 9e3509e2

Committed by Jan Kara on Jan 26, 2009

Use lowercase names of quota functions instead of old uppercase ones.
Signed-off-by: Jan Kara <jack@suse.cz>
CC: Alexander Viro <viro@zeniv.linux.org.uk>

9e3509e2

13 Mar 2009 · 1 commit

vfs: add missing unlock in sget() · a3cfbb53

Committed by Li Zefan on Mar 12, 2009

In sget(), destroy_super(s) is called with s->s_umount held, which makes
lockdep unhappy.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a3cfbb53

19 Feb 2009 · 1 commit

fs/super.c: add lockdep annotation to s_umount · ada723dc

Committed by Peter Zijlstra on Feb 18, 2009

Li Zefan said:

Thread 1:
  for ((; ;))
  {
      mount -t cpuset xxx /mnt > /dev/null 2>&1
      cat /mnt/cpus > /dev/null 2>&1
      umount /mnt > /dev/null 2>&1
  }

Thread 2:
  for ((; ;))
  {
      mount -t cpuset xxx /mnt > /dev/null 2>&1
      umount /mnt > /dev/null 2>&1
  }

(Note: It is irrelevant which cgroup subsys is used.)

After a while a lockdep warning showed up:

=============================================
[ INFO: possible recursive locking detected ]
2.6.28 #479
---------------------------------------------
mount/13554 is trying to acquire lock:
 (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321

but task is already holding lock:
 (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321

other info that might help us debug this:
1 lock held by mount/13554:
 #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321

stack backtrace:
Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
Call Trace:
 [<c044ad2e>] validate_chain+0x4c6/0xbbd
 [<c044ba9b>] __lock_acquire+0x676/0x700
 [<c044bb82>] lock_acquire+0x5d/0x7a
 [<c049d888>] ? sget+0x5e/0x321
 [<c061b9b8>] down_write+0x34/0x50
 [<c049d888>] ? sget+0x5e/0x321
 [<c049d888>] sget+0x5e/0x321
 [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
 [<c045959f>] ? cgroup_test_super+0x0/0x2f
 [<c045bcea>] cgroup_get_sb+0x98/0x2e7
 [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
 [<c049dfa4>] vfs_kern_mount+0x40/0x7b
 [<c049e02d>] do_kern_mount+0x37/0xbf
 [<c04af4a0>] do_mount+0x5c3/0x61a
 [<c04addd2>] ? copy_mount_options+0x2c/0x111
 [<c04af560>] sys_mount+0x69/0xa0
 [<c0403251>] sysenter_do_call+0x12/0x31

The cause is after alloc_super() and then retry, an old entry in list
fs_supers is found, so grab_super(old) is called, but both functions hold
s_umount lock:

struct super_block *sget(...)
{
	...
retry:
	spin_lock(&sb_lock);
	if (test) {
		list_for_each_entry(old, &type->fs_supers, s_instances) {
			if (!test(old, data))
				continue;
			if (!grab_super(old))  /* 2nd: down_write(&old->s_umount) */
				goto retry;
			if (s)
				destroy_super(s);
			return old;
		}
	}
	if (!s) {
		spin_unlock(&sb_lock);
		s = alloc_super(type);   /* 1st: down_write(&s->s_umount) */
		if (!s)
			return ERR_PTR(-ENOMEM);
		goto retry;
	}
	...
}

It seems like a false positive, and it seems that the VFS, not cgroup, needs
to be fixed.

Peter said:

We can simply put the new s_umount instance in a different lockdep subclass;
lockdep doesn't particularly care about subclass order.

If there's any issue with the callers of sget() assuming the s_umount lock
being of subclass 0, then there is another annotation we can use to fix
that, but let's not bother with that if this is sufficient.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Menage <menage@google.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ada723dc

09 Feb 2009 · 1 commit

async: Rename _special -> _domain for clarity. · 766ccb9e

Committed by Cornelia Huck on Jan 20, 2009

Rename the async_*_special() functions to async_*_domain(), which
describes the purpose of these functions much better.
[Broke up long lines to silence checkpatch]
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

766ccb9e

14 Jan 2009 · 1 commit

[CVE-2009-0029] System call wrappers part 11 · 257ac264

Committed by Heiko Carstens on Jan 14, 2009

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

257ac264

09 Jan 2009 · 1 commit

async: Don't call async_synchronize_full_special() while holding sb_lock · 96777fe7

Committed by Dave Kleikamp on Jan 08, 2009

sync_filesystems() shouldn't be calling async_synchronize_full_special
while holding a spinlock.  The second while loop in that function is the
right place for this anyway.
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Reported-by: Grissiom <chaos.proton@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

96777fe7

08 Jan 2009 · 1 commit

async: make the final inode deletion an asynchronous event · efaee192

Committed by Arjan van de Ven on Jan 06, 2009

This makes "rm -rf" on a (names cached) kernel tree go from
11.6 to 8.6 seconds on an ext3 filesystem.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>

efaee192

03 Jan 2009 · 1 commit

add releasepage hooks to block devices which can be used by file systems · 87d8fe1e

Committed by Theodore Ts'o on Jan 03, 2009

Implement blkdev_releasepage() to release the buffer_heads and pages
after we release private data belonging to a mounted filesystem.

Cc: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
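
The hook dispatch can be sketched in userspace as follows. Everything here is a simplified, hypothetical stand-in: the hook name follows the `bdev_try_to_free_page` member of the kernel's `super_operations`, but the real signatures also carry a `gfp_t` and operate on real pages and buffer_heads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct sketch_page { int id; };
struct sketch_sb;

struct sketch_super_ops {
	/* Optional hook a mounted filesystem may provide. */
	int (*bdev_try_to_free_page)(struct sketch_sb *sb,
				     struct sketch_page *page);
};

struct sketch_sb {
	const struct sketch_super_ops *s_op;
	int hook_calls;
};

/* Generic fallback, standing in for freeing the page's buffer_heads. */
static int sketch_try_to_free_buffers(struct sketch_page *page)
{
	(void)page;
	return 1;
}

/* Block-device releasepage: defer to the mounted filesystem if it
 * registered a hook, otherwise use the generic path. */
static int sketch_blkdev_releasepage(struct sketch_sb *super,
				     struct sketch_page *page)
{
	if (super && super->s_op && super->s_op->bdev_try_to_free_page)
		return super->s_op->bdev_try_to_free_page(super, page);
	return sketch_try_to_free_buffers(page);
}

/* Example filesystem hook: would drop its private references first,
 * then report whether the page may go (here: refuse page 0). */
static int demo_fs_try_to_free(struct sketch_sb *sb, struct sketch_page *page)
{
	sb->hook_calls++;
	return page->id != 0;
}
```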

87d8fe1e

20 Dec 2008 · 1 commit

security: pass mount flags to security_sb_kern_mount() · 12204e24

Committed by James Morris on Dec 19, 2008

Pass mount flags to security_sb_kern_mount(), so security modules
can determine if a mount operation is being performed by the kernel.
Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
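
What passing the flags enables can be sketched with a hypothetical policy hook. `sketch_sb_kern_mount()` and its `label_ok` input are invented for illustration and are not any real module's code; `SKETCH_MS_KERNMOUNT` mirrors the kernel's `MS_KERNMOUNT` bit (1 << 22) but is defined locally so the sketch does not depend on a particular libc header.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local mirror of the kernel's MS_KERNMOUNT bit. */
#define SKETCH_MS_KERNMOUNT (1UL << 22)

/* Hypothetical LSM-style hook: with the mount flags available, the
 * module can tell internal kernel mounts from user-requested ones and
 * apply its normal checks only to the latter. */
static int sketch_sb_kern_mount(unsigned long flags, bool label_ok)
{
	if (flags & SKETCH_MS_KERNMOUNT)
		return 0;			/* kernel-internal mount: allow */
	return label_ok ? 0 : -EACCES;	/* usual policy for user mounts */
}
```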

12204e24