提交 · 7a4dec53897ecd3367efb1e12fe8a4edc47dc0e9 · openanolis / cloud-kernel

10 8月, 2010 2 次提交

Fix sget() race with failing mount · 7a4dec53

由 Al Viro 提交于 8月 09, 2010

If sget() finds a matching superblock being set up, it'll
grab an active reference to it and grab s_umount.  That's
fine - we'll wait for completion of foofs_get_sb() that way.
However, if said foofs_get_sb() fails we'll end up holding
the halfway-created superblock.  deactivate_locked_super()
called by foofs_get_sb() will just unlock the sucker since
we are holding another active reference to it.

What we need is a way to tell if superblock has been successfully
set up.  Unfortunately, neither ->s_root nor the check for
MS_ACTIVE quite fit.  Cheap and easy way, suitable for backport:
new flag set by the (only) caller of ->get_sb().  If that flag
isn't present by the time sget() grabbed s_umount on preexisting
superblock it has found, it's seeing a stillborn and should
just bury it with deactivate_locked_super() (and repeat the search).

Longer term we want to set that flag in ->get_sb() instances (and
check for it to distinguish between "sget() found us a live sb"
and "sget() has allocated an sb, we need to set it up" in there,
instead of checking ->s_root as we do now).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org

7a4dec53

vfs: don't hold s_umount over close_bdev_exclusive() call · 4f331f01

由 Tejun Heo 提交于 7月 20, 2010

Fix an obscure AB-BA deadlock in get_sb_bdev().

When a superblock is mounted more than once get_sb_bdev() calls
close_bdev_exclusive() to drop the extra bdev reference while holding
s_umount.  However, sb->s_umount nests inside bd_mutex during
__invalidate_device() and close_bdev_exclusive() acquires bd_mutex during
blkdev_put(); thus creating an AB-BA deadlock.

This condition doesn't trigger frequently.  For this condition to be
visible to lockdep, the filesystem must occupy the whole device (as
__invalidate_device() only grabs bd_mutex for the whole device), the FS
must be mounted more than once and partition rescan should be issued while
the FS is still mounted.

Fix it by dropping s_umount over close_bdev_exclusive().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NCiprian Docan <docan@eden.rutgers.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4f331f01

30 6月, 2010 1 次提交

fs: fix superblock iteration race · 57439f87

由 npiggin@suse.de 提交于 6月 24, 2010

list_for_each_entry_safe is not suitable to protect against concurrent
modification of the list. 6754af64 introduced a race in sb walking.

list_for_each_entry can use the trick of pinning the current entry in
the list before we drop and retake the lock because it subsequently
follows cur->next. However list_for_each_entry_safe saves n=cur->next
for following before entering the loop body, so when the lock is
dropped, n may be deleted.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57439f87

28 5月, 2010 1 次提交

fs/super: fix kernel-doc warning · 7000d3c4

由 Randy Dunlap 提交于 5月 24, 2010

Fix fs/super.c kernel-doc warning and function notation:
Warning(fs/super.c:957): No description found for parameter 'sb'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7000d3c4

24 5月, 2010 3 次提交

quota: explicitly set ->dq_op and ->s_qcop · 123e9caf

由 Christoph Hellwig 提交于 5月 19, 2010

Only set the quota operation vectors if the filesystem actually supports
quota instead of doing it for all filesystems in alloc_super().

[Jan Kara: Export dquot_operations and vfs_quotactl_ops]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

123e9caf

quota: move unmount handling into the filesystem · e0ccfd95

由 Christoph Hellwig 提交于 5月 19, 2010

Currently the VFS calls into the quotactl interface for unmounting
filesystems.  This means filesystems with their own quota handling
can't easily distinguish between user-space originating quotaoff
and an unount.  Instead move the responsibily of the unmount handling
into the filesystem to be consistent with all other dquot handling.

Note that we do call dquot_disable a lot later now, e.g. after
a sync_filesystem.  But this is fine as the quota code does all its
writes via blockdev's mapping and that is synced even later.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

e0ccfd95

quota: move remount handling into the filesystem · c79d967d

由 Christoph Hellwig 提交于 5月 19, 2010

Currently do_remount_sb calls into the dquot code to tell it about going
from rw to ro and ro to rw.  Move this code into the filesystem to
not depend on the dquot code in the VFS - note ocfs2 already ignores
these calls and handles remount by itself.  This gets rid of overloading
the quotactl calls and allows to unify the VFS and XFS codepaths in
that area later.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>

c79d967d

22 5月, 2010 18 次提交

vfs: add lockdep annotation to s_vfs_rename_key for ecryptfs · 51ee049e

由 Roland Dreier 提交于 4月 27, 2010

 >  =============================================
 >  [ INFO: possible recursive locking detected ]
 >  2.6.31-2-generic #14~rbd3
 >  ---------------------------------------------
 >  firefox-3.5/4162 is trying to acquire lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  but task is already holding lock:
 >   (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >
 >  other info that might help us debug this:
 >  3 locks held by firefox-3.5/4162:
 >   #0:  (&s->s_vfs_rename_mutex){+.+.+.}, at: [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   #1:  (&sb->s_type->i_mutex_key#11/1){+.+.+.}, at: [<ffffffff81139d5a>] lock_rename+0x6a/0xf0
 >   #2:  (&sb->s_type->i_mutex_key#11/2){+.+.+.}, at: [<ffffffff81139d6f>] lock_rename+0x7f/0xf0
 >
 >  stack backtrace:
 >  Pid: 4162, comm: firefox-3.5 Tainted: G         C 2.6.31-2-generic #14~rbd3
 >  Call Trace:
 >   [<ffffffff8108ae74>] print_deadlock_bug+0xf4/0x100
 >   [<ffffffff8108ce26>] validate_chain+0x4c6/0x750
 >   [<ffffffff8108d2e7>] __lock_acquire+0x237/0x430
 >   [<ffffffff8108d585>] lock_acquire+0xa5/0x150
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff815526ad>] __mutex_lock_common+0x4d/0x3d0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff81139d31>] ? lock_rename+0x41/0xf0
 >   [<ffffffff8120eaf9>] ? ecryptfs_rename+0x99/0x170
 >   [<ffffffff81552b36>] mutex_lock_nested+0x46/0x60
 >   [<ffffffff81139d31>] lock_rename+0x41/0xf0
 >   [<ffffffff8120eb2a>] ecryptfs_rename+0xca/0x170
 >   [<ffffffff81139a9e>] vfs_rename_dir+0x13e/0x160
 >   [<ffffffff8113ac7e>] vfs_rename+0xee/0x290
 >   [<ffffffff8113c212>] ? __lookup_hash+0x102/0x160
 >   [<ffffffff8113d512>] sys_renameat+0x252/0x280
 >   [<ffffffff81133eb4>] ? cp_new_stat+0xe4/0x100
 >   [<ffffffff8101316a>] ? sysret_check+0x2e/0x69
 >   [<ffffffff8108c34d>] ? trace_hardirqs_on_caller+0x14d/0x190
 >   [<ffffffff8113d55b>] sys_rename+0x1b/0x20
 >   [<ffffffff81013132>] system_call_fastpath+0x16/0x1b

The trace above is totally reproducible by doing a cross-directory
rename on an ecryptfs directory.

The issue seems to be that sys_renameat() does lock_rename() then calls
into the filesystem; if the filesystem is ecryptfs, then
ecryptfs_rename() again does lock_rename() on the lower filesystem, and
lockdep can't tell that the two s_vfs_rename_mutexes are different.  It
seems an annotation like the following is sufficient to fix this (it
does get rid of the lockdep trace in my simple tests); however I would
like to make sure I'm not misunderstanding the locking, hence the CC
list...
Signed-off-by: NRoland Dreier <rdreier@cisco.com>
Cc: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

51ee049e

Introduce freeze_super and thaw_super for the fsfreeze ioctl · 18e9e510

由 Josef Bacik 提交于 3月 23, 2010

Currently the way we do freezing is by passing sb>s_bdev to freeze_bdev and then
letting it do all the work. But freezing is more of an fs thing, and doesn't
really have much to do with the bdev at all, all the work gets done with the
super. In btrfs we do not populate s_bdev, since we can have multiple bdev's
for one fs and setting s_bdev makes removing devices from a pool kind of tricky.
This means that freezing a btrfs filesystem fails, which causes us to corrupt
with things like tux-on-ice which use the fsfreeze mechanism. So instead of
populating sb->s_bdev with a random bdev in our pool, I've broken the actual fs
freezing stuff into freeze_super and thaw_super. These just take the
super_block that we're freezing and does the appropriate work. It's basically
just copy and pasted from freeze_bdev. I've then converted freeze_bdev over to
use the new super helpers. I've tested this with ext4 and btrfs and verified
everything continues to work the same as before.

The only new gotcha is multiple calls to the fsfreeze ioctl will return EBUSY if
the fs is already frozen. I thought this was a better solution than adding a
freeze counter to the super_block, but if everybody hates this idea I'm open to
suggestions. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

18e9e510

A
Trim includes in fs/super.c · e1e46bf1
由 Al Viro 提交于 3月 23, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e1e46bf1
A
Move grabbing s_umount to callers of grab_super() · d3f21473
由 Al Viro 提交于 3月 23, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
d3f21473
A
Take statfs variants to fs/statfs.c · 7ed1ee61
由 Al Viro 提交于 3月 23, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
7ed1ee61

new helper: iterate_supers() · 01a05b33

由 Al Viro 提交于 3月 23, 2010

... and switch the simple "loop over superblocks and do something"
loops to it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

01a05b33

A
Bury __put_super_and_need_restart() · 35cf7ba0
由 Al Viro 提交于 3月 22, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
35cf7ba0

In get_super() and user_get_super() restarts are unconditional · df40c01a

由 Al Viro 提交于 3月 22, 2010

If superblock had been still alive, we would've returned it...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

df40c01a

A
fix get_active_super()/umount() race · 1494583d
由 Al Viro 提交于 3月 22, 2010
```
This one needs restarts...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
1494583d

fix do_emergency_remount()/umount() races · e7fe0585

由 Al Viro 提交于 3月 22, 2010

need list_for_each_entry_safe() here.  Original didn't even
have restart logics, so if you race with umount() it blew up.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e7fe0585

A
Convert simple loops over superblocks to list_for_each_entry_safe · 6754af64
由 Al Viro 提交于 3月 22, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6754af64

get rid of restarts in sync_filesystems() · 8edd64bd

由 Al Viro 提交于 3月 22, 2010

At the same time we can kill s_need_restart and local mutex in there.
__put_super() made public for a while; will be gone later.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8edd64bd

Leave superblocks on s_list until the end · 551de6f3

由 Al Viro 提交于 3月 22, 2010

We used to remove from s_list and s_instances at the same
time.  So let's *not* do the former and skip superblocks
that have empty s_instances in the loops over s_list.

The next step, of course, will be to get rid of rescan logics
in those loops.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

551de6f3

Saner locking around deactivate_super() · 1712ac8f

由 Al Viro 提交于 3月 22, 2010

Make sure that s_umount is acquired *before* we drop the final
active reference; we still have the fast path (atomic_dec_unless)
and we have gotten rid of the window between the moment when
s_active hits zero and s_umount is acquired.  Which simplifies
the living hell out of grab_super() and inotify pin_to_kill()
stuff.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1712ac8f

get rid of S_BIAS · b20bd1a5

由 Al Viro 提交于 3月 22, 2010

use atomic_inc_not_zero(&sb->s_active) instead of playing games with
checking ->s_count > S_BIAS
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b20bd1a5

A
get rid of open-coded grab_super() in get_active_super() · 389b8be6
由 Al Viro 提交于 3月 21, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
389b8be6
C
remove incorrect comment in do_emergency_remount · a135aa2c
由 Christoph Hellwig 提交于 2月 01, 2010
```
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a135aa2c

writeback: fix problem with !CONFIG_BLOCK compilation · c2c4986e

由 Jens Axboe 提交于 5月 20, 2010

When CONFIG_BLOCK isn't enabled:

mm/page-writeback.c: In function 'laptop_mode_timer_fn':
mm/page-writeback.c:708: error: dereferencing pointer to incomplete type
mm/page-writeback.c:709: error: dereferencing pointer to incomplete type

Fix this by essentially eliminating the laptop sync handlers when
CONFIG_BLOCK isn't set, as most are only used from the block layer code.
The exception is laptop_sync_completion() which is used from sys_sync(),
make that an empty declaration in that case.
Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

c2c4986e

30 4月, 2010 1 次提交

fs: fs/super.c needs to include backing-dev.h for !CONFIG_BLOCK · 5477d0fa

由 Jens Axboe 提交于 4月 29, 2010

When CONFIG_BLOCK is set, it ends up getting backing-dev.h included.
But for !CONFIG_BLOCK, it isn't so lucky. The proper thing to do is
include <linux/backing-dev.h> directly from the file it's used from,
so do that.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5477d0fa

25 4月, 2010 1 次提交

Catch filesystems lacking s_bdi · 5129a469

由 Jörn Engel 提交于 4月 25, 2010

noop_backing_dev_info is used only as a flag to mark filesystems that
don't have any backing store, like tmpfs, procfs, spufs, etc.
Signed-off-by: NJoern Engel <joern@logfs.org>

Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
to the noop_backing_dev_info is not legal and will not result in
them being flushed, but we already catch this condition in
__mark_inode_dirty() when checking for a registered bdi.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5129a469

04 3月, 2010 2 次提交

A
Mirror MS_KERNMOUNT in ->mnt_flags · 8089352a
由 Al Viro 提交于 2月 05, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
8089352a

fs: improve remount,ro vs buffercache coherency · d208bbdd

由 Nick Piggin 提交于 12月 21, 2009

Invalidate sb->s_bdev on remount,ro.

Fixes a problem reported by Jorge Boncompte who is seeing corruption
trying to snapshot a minix filesystem image.  Some filesystems modify
their metadata via a path other than the bdev buffer cache (eg.  they may
use a private linear mapping for their metadata, or implement directories
in pagecache, etc).  Also, file data modifications usually go to the bdev
via their own mappings.

These updates are not coherent with buffercache IO (eg.  via /dev/bdev)
and never have been.  However there could be a reasonable expectation that
after a mount -oremount,ro operation then the buffercache should
subsequently be coherent with previous filesystem modifications.

So invalidate the bdev mappings on a remount,ro operation to provide a
coherency point.

The problem was exposed when we switched the old rd to brd because old rd
didn't really function like a normal block device and updates to rd via
mappings other than the buffercache would still end up going into its
buffercache.  But the same problem has always affected other "normal"
block devices, including loop.

[akpm@linux-foundation.org: repair comment layout]
Reported-by: N"Jorge Boncompte [DTI2]" <jorge@dti2.net>
Tested-by: N"Jorge Boncompte [DTI2]" <jorge@dti2.net>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d208bbdd

24 12月, 2009 1 次提交

vfs: get_sb_single() - do not pass options twice · 9329d1be

由 Kay Sievers 提交于 12月 18, 2009

Filesystem code usually destroys the option buffer while
parsing it. This leads to errors when the same buffer is
passed twice. In case we fill a new superblock do not call
remount.

This is needed to quite a warning that the debugfs code
causes every boot.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

9329d1be

24 9月, 2009 3 次提交

freeze_bdev: grab active reference to frozen superblocks · 4504230a

由 Christoph Hellwig 提交于 8月 03, 2009

Currently we held s_umount while a filesystem is frozen, despite that we
might return to userspace and unlock it from a different process.  Instead
grab an active reference to keep the file system busy and add an explicit
check for frozen filesystems in remount and reject the remount instead
of blocking on s_umount.

Add a new get_active_super helper to super.c for use by freeze_bdev that
grabs an active reference to a superblock from a given block device.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4504230a

freeze_bdev: kill bd_mount_sem · 4fadd7bb

由 Christoph Hellwig 提交于 8月 03, 2009

Now that we have the freeze count there is not much reason for bd_mount_sem
anymore. The actual freeze/thaw operations are serialized using the
bd_fsfreeze_mutex, and the only other place we take bd_mount_sem is
get_sb_bdev which tries to prevent mounting a filesystem while the block
device is frozen. Instead of add a check for bd_fsfreeze_count and
return -EBUSY if a filesystem is frozen. While that is a change in user
visible behaviour a failing mount is much better for this case rather
than having the mount process stuck uninterruptible for a long time.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4fadd7bb

vfs: change sb->s_maxbytes to a loff_t · 42cb56ae

由 Jeff Layton 提交于 9月 18, 2009

sb->s_maxbytes is supposed to indicate the maximum size of a file that can
exist on the filesystem.  It's declared as an unsigned long long.

Even if a filesystem has no inherent limit that prevents it from using
every bit in that unsigned long long, it's still problematic to set it to
anything larger than MAX_LFS_FILESIZE.  There are places in the kernel
that cast s_maxbytes to a signed value.  If it's set too large then this
cast makes it a negative number and generally breaks the comparison.

Change s_maxbytes to be loff_t instead.  That should help eliminate the
temptation to set it too large by making it a signed value.

Also, add a warning for couple of releases to help catch filesystems that
set s_maxbytes too large.  Eventually we can either convert this to a
BUG() or just remove it and in the hope that no one will get it wrong now
that it's a signed value.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Robert Love <rlove@google.com>
Cc: Mandeep Singh Baines <msb@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

42cb56ae

22 9月, 2009 1 次提交

const: mark remaining super_operations const · b87221de

由 Alexey Dobriyan 提交于 9月 21, 2009

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b87221de

16 9月, 2009 1 次提交

fs: Assign bdi in super_block · 32a88aa1

由 Jens Axboe 提交于 9月 16, 2009

We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().

Note that ->s_bdi assignment is required for proper writeback!
Acked-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

32a88aa1

11 9月, 2009 2 次提交

writeback: switch to per-bdi threads for flushing data · 03ba3782

由 Jens Axboe 提交于 9月 09, 2009

This gets rid of pdflush for bdi writeout and kupdated style cleaning.
pdflush writeout suffers from lack of locality and also requires more
threads to handle the same workload, since it has to work in a
non-blocking fashion against each queue. This also introduces lumpy
behaviour and potential request starvation, since pdflush can be starved
for queue access if others are accessing it. A sample ffsb workload that
does random writes to files is about 8% faster here on a simple SATA drive
during the benchmark phase. File layout also seems a LOT more smooth in
vmstat:

r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45

where vanilla tends to fluctuate a lot in the creation phase:

r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54

A 10 disk test with btrfs performs 26% faster with per-bdi flushing. A
SSD based writeback test on XFS performs over 20% better as well, with
the throughput being very stable around 1GB/sec, where pdflush only
manages 750MB/sec and fluctuates wildly while doing so. Random buffered
writes to many files behave a lot better as well, as does random mmap'ed
writes.

A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

03ba3782

writeback: move dirty inodes from super_block to backing_dev_info · 66f3b8e2

由 Jens Axboe 提交于 9月 02, 2009

This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

66f3b8e2

24 6月, 2009 2 次提交

A
... and the same for vfsmount id/mount group id · f21f6220
由 Al Viro 提交于 6月 24, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f21f6220

Make allocation of anon devices cheaper · c63e09ec

由 Al Viro 提交于 6月 24, 2009

Standard trick - add a new variable (start) such that
for each n < start n is known to be busy.  Allocation can
skip checking everything in [0..start) and if it returns
n, we can set start to n + 1.  Freeing below start sets
start to what we'd just freed.

Of course, it still sucks if we do something like
	free 0
	allocate
	allocate
in a loop - still O(n^2) time.  However, on saner loads it
improves the things a lot and the entire thing is not worth
the trouble of switching to something with better worst-case
behaviour.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c63e09ec

17 6月, 2009 1 次提交

remove unlock_kernel() left accidentally · b0895513

由 J. R. Okajima 提交于 6月 17, 2009

commit 337eb00a
Push BKL down into ->remount_fs()
and
commit 4aa98cf7
Push BKL down into do_remount_sb()

were uncorrectly merged.
The former removes one pair of lock/unlock_kernel(), but the latter adds
several unlock_kernel(). Finally a few unlock_kernel() calls left.
Signed-off-by: NJ. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b0895513

openanolis / cloud-kernel 10 个月 前同步成功

openanolis / cloud-kernel
10 个月前同步成功