Commits · 5cee5815d1564bbbd505fea86f4550f1efdb5cd0 · openanolis / cloud-kernel

12 Jun, 2009 29 commits

vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815

Authored by Jan Kara on Apr 27, 2009

It is unnecessarily fragile to have two places (fsync_super() and do_sync())
doing data integrity sync of the filesystem. Alter __fsync_super() to
accommodate needs of both callers and use it. So after this patch
__fsync_super() is the only place where we gather all the calls needed to
properly send all data on a filesystem to disk.

Nice bonus is that we get a complete livelock avoidance and write_supers()
is now only used for periodic writeback of superblocks.

sync_blockdevs() introduced a couple of patches ago is gone now.

[build fixes folded]
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5cee5815

vfs: Make __fsync_super() a static function (version 4) · 429479f0

Authored by Jan Kara on Apr 27, 2009

__fsync_super() does the same thing as fsync_super(). So change the only
caller to use fsync_super() and make __fsync_super() static. This removes
an unnecessarily duplicated call to sync_blockdev() and prepares the ground
for the changes to __fsync_super() in the following patches.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

429479f0

remove s_async_list · 876a9f76

Authored by Christoph Hellwig on Apr 28, 2009

Remove the unused s_async_list in the superblock, a leftover of the
broken async inode deletion code that leaked into mainline.  Having this
in the middle of the sync/unmount path is not helpful for the following
cleanups.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

876a9f76

fs: introduce mnt_clone_write · 96029c4e

Authored by npiggin@suse.de on Apr 26, 2009

This patch speeds up lmbench lat_mmap test by about another 2% after the
first patch.

Before:
 avg = 462.286
 std = 5.46106

After:
 avg = 453.12
 std = 9.58257

(50 runs of each, stddev gives a reasonable confidence)

It does this by introducing mnt_clone_write, which avoids some heavyweight
operations of mnt_want_write if called on a vfsmount which we know already
has a write count; and mnt_want_write_file, which can call mnt_clone_write
if the file is open for write.

After these two patches, mnt_want_write and mnt_drop_write go from 7% on
the profile down to 1.3% (including mnt_clone_write).

[AV: mnt_want_write_file() should take file alone and derive mnt from it;
not only all callers have that form, but that's the only mnt about which
we know that it's already held for write if file is opened for write]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

96029c4e

fs: mnt_want_write speedup · d3ef3d73

Authored by npiggin@suse.de on Apr 26, 2009

This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up
basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it.
A microbenchmark yes, but it exercises some important paths in the mm.

Before:
 avg = 501.9
 std = 14.7773

After:
 avg = 462.286
 std = 5.46106

(50 runs of each, stddev gives a reasonable confidence, but there is quite
a bit of variation there still)

It does this by removing the complex per-cpu locking and counter-cache and
replaces it with a percpu counter in struct vfsmount. This makes the code
much simpler, and avoids spinlocks (although the msync is still pretty
costly, unfortunately). It results in about 900 bytes smaller code too. It
does increase the size of a vfsmount, however.

It should also give a speedup on large systems if CPUs are frequently operating
on different mounts (because the existing scheme has to operate on an atomic in
the struct vfsmount when switching between mounts). But I'm most interested in
the single threaded path performance for the moment.

[AV: minor cleanup]

Cc: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d3ef3d73
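
The per-CPU counter idea described in d3ef3d73 can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `NR_CPUS`, `struct mount_model` and the three helper names are hypothetical and are not the kernel's API, and the real code also has to deal with preemption and memory ordering, which this model ignores.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this model */

/* Userspace model of a mount holding one writer-count slot per CPU. */
struct mount_model {
	long writers[NR_CPUS];
};

/* Fast path: touch only the slot of the CPU we run on, no shared lock. */
static void mnt_inc_writers(struct mount_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->writers[cpu]++;
}

/* A writer may drop its count on a different CPU than it took it on,
 * so an individual slot can become negative. */
static void mnt_dec_writers(struct mount_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->writers[cpu]--;
}

/* Slow path: only the sum over all slots is meaningful. */
static long mnt_count_writers(const struct mount_model *m)
{
	long sum = 0;

	for (size_t i = 0; i < NR_CPUS; i++)
		sum += m->writers[i];
	return sum;
}
```

The design choice this illustrates is the trade-off named in the commit message: taking and dropping a write count becomes cheap, while the structure grows by one slot per CPU and reading the total requires a walk over all slots.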

Move junk from proc_fs.h to fs/proc/internal.h · 3174c21b

Authored by Al Viro on Apr 07, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3174c21b

switch lookup_mnt() · 1c755af4

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1c755af4

switch follow_down() · 9393bd07

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9393bd07

Switch collect_mounts() to struct path · 589ff870

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

589ff870

switch follow_up() to struct path · bab77ebf

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bab77ebf

switch rqst_exp_parent() · e64c390c

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e64c390c

switch rqst_exp_get_by_name() · 91c9fa8f

Authored by Al Viro on Apr 18, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

91c9fa8f

Cache root in nameidata · 2a737871

Authored by Al Viro on Apr 07, 2009

New field: nd->root. When pathname resolution wants to know the root,
check if nd->root.mnt is non-NULL; use nd->root if it is, otherwise
copy current->fs->root there. After path_walk() is finished, we check
if we'd got a cached value in nd->root and drop it. Before calling
path_walk() we should either set nd->root.mnt to NULL *or* copy (and
pin down) some path to nd->root. In the latter case we won't be
looking at current->fs->root at all.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2a737871
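
The caching rule described in 2a737871 (use nd->root if nd->root.mnt is non-NULL, otherwise copy the task's root there, and drop it after the walk) can be sketched in userspace like this. The struct and function names below are hypothetical stand-ins, not the kernel's definitions, and reference counting is reduced to comments.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct path: a (mount, dentry) pair. */
struct path_model {
	const void *mnt;
	const void *dentry;
};

/* Stand-in for struct nameidata: root.mnt == NULL means "nothing cached". */
struct nameidata_model {
	struct path_model root;
};

/* Return the root for this lookup, filling the cache on first use.
 * task_root plays the role of current->fs->root. */
static const struct path_model *nd_get_root(struct nameidata_model *nd,
					    const struct path_model *task_root)
{
	assert(task_root != NULL && task_root->mnt != NULL);
	if (nd->root.mnt == NULL)
		nd->root = *task_root; /* the kernel would also pin the path here */
	return &nd->root;
}

/* After the walk: drop whatever was cached. */
static void nd_put_root(struct nameidata_model *nd)
{
	nd->root.mnt = NULL; /* the kernel would release its reference here */
	nd->root.dentry = NULL;
}
```

A caller that pre-fills `nd.root` before the walk never consults `task_root`, which corresponds to the "latter case" in the commit message.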

reiserfs: allow exposing privroot w/ xattrs enabled · 73422811

Authored by Jeff Mahoney on May 10, 2009

This patch adds an -oexpose_privroot option to allow access to the privroot.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

73422811

fsnotify: move events should indicate the event was on a child · ff52cc21

Authored by Eric Paris on Jun 11, 2009

fsnotify tells its listeners explicitly when an event happened on the given
inode versus on the child of the given inode (see __fsnotify_parent).
However, the semantics of fsnotify_move() are such that we deliver events
to the two parent directories in question (old_dir and new_dir)
directly without using the __fsnotify_parent() call. fsnotify should be
adding FS_EVENT_ON_CHILD for the notifications to these parents.
Signed-off-by: Eric Paris <eparis@redhat.com>

ff52cc21

inotify: reimplement inotify using fsnotify · 63c882a0

Authored by Eric Paris on May 21, 2009

Reimplement inotify_user using fsnotify.  This should be feature for feature
exactly the same as the original inotify_user.  This does not make any changes
to the in kernel inotify feature used by audit.  Those patches (and the eventual
removal of in kernel inotify) will come after the new inotify_user proves to be
working correctly.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

63c882a0

fsnotify: handle filesystem unmounts with fsnotify marks · 164bc619

Authored by Eric Paris on May 21, 2009

When an fs is unmounted with an fsnotify mark entry attached to one of its
inodes we need to destroy that mark entry and we also (like inotify) send
an unmount event.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

164bc619

fsnotify: allow groups to add private data to events · e4aff117

Authored by Eric Paris on May 21, 2009

inotify needs per group information attached to events.  This patch allows
groups to attach private information and implements a callback so that
information can be freed when an event is being destroyed.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

e4aff117

fsnotify: add correlations between events · 47882c6f

Authored by Eric Paris on May 21, 2009

Standard inotify events include a correlation cookie between the two
events of a dentry move operation.  This patch includes the same behaviour
in fsnotify events.  It is needed so that inotify userspace can be
implemented on top of fsnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

47882c6f

fsnotify: include pathnames with entries when possible · 62ffe5df

Authored by Eric Paris on May 21, 2009

When inotify wants to send events to a directory about a child it includes
the name of the original file.  This patch collects that filename and makes
it available for notification.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

62ffe5df

fsnotify: generic notification queue and waitq · a2d8bc6c

Authored by Eric Paris on May 21, 2009

inotify needs to do async notification in which event information is stored
on a queue until the listener is ready to receive it.  This patch
implements a generic notification queue for inotify (and later fanotify) to
store events to be sent at a later time.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

a2d8bc6c

dnotify: reimplement dnotify using fsnotify · 3c5119c0

Authored by Eric Paris on May 21, 2009

Reimplement dnotify using fsnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

3c5119c0

fsnotify: parent event notification · c28f7e56

Authored by Eric Paris on May 21, 2009

inotify and dnotify both use a similar parent notification mechanism.  We
add a generic parent notification mechanism to fsnotify for both of these
to use.  This new mechanism also adds the dentry flag optimization which
exists for inotify to dnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

c28f7e56

fsnotify: add marks to inodes so groups can interpret how to handle those inodes · 3be25f49

Authored by Eric Paris on May 21, 2009

This patch creates a way for fsnotify groups to attach marks to inodes.
These marks have little meaning to the generic fsnotify infrastructure
and thus their meaning should be interpreted by the group that attached
them to the inode's list.

dnotify and inotify  will make use of these markings to indicate which
inodes are of interest to their respective groups.  But this implementation
has the useful property that in the future other listeners could actually
use the marks for the exact opposite reason, aka to indicate which inodes
it had NO interest in.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

3be25f49

fsnotify: unified filesystem notification backend · 90586523

Authored by Eric Paris on May 21, 2009

fsnotify is a backend for filesystem notification. fsnotify does
not provide any userspace interface but does provide the basis
needed for other notification schemes such as dnotify. fsnotify
can be extended to be the backend for inotify or the upcoming
fanotify. fsnotify provides a mechanism for "groups" to register for
some set of filesystem events and to then deliver those events to
those groups for processing.

fsnotify has a number of benefits, the first being actually shrinking the size
of an inode. Before fsnotify, to support both dnotify and inotify an inode had

	unsigned long i_dnotify_mask; /* Directory notify events */
	struct dnotify_struct *i_dnotify; /* for directory notifications */
	struct list_head inotify_watches; /* watches on this inode */
	struct mutex inotify_mutex; /* protects the watches list */

But with fsnotify this same functionality (and more) is done with just

	__u32 i_fsnotify_mask; /* all events for this inode */
	struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */

That's right, inotify, dnotify, and fanotify all in 64 bits. We used that
much space just in inotify_watches alone, before this patch set.

fsnotify object lifetime and locking is MUCH better than what we have today.
inotify locking is incredibly complex. See 8f7b0ba1 as an example of
what's been busted since inception. inotify needs to know internal semantics
of superblock destruction and unmounting to function. The inode pinning and
vfs contortions are horrible.

No fsnotify implementers do allocation under locks. This means things like
f04b30de which (due to an overabundance of caution) changes GFP_KERNEL to
GFP_NOFS can be reverted. There are no longer any allocation rules when using
or implementing your own fsnotify listener.

fsnotify paves the way for fanotify. In brief fanotify is a notification
mechanism that delivers the listener both an 'event' and an open file descriptor
to the object in question. This means that fanotify is pathname agnostic.
Some on lkml may not care for the original companies or users that pushed for
TALPA, but fanotify was designed with flexibility and input for other users in
mind. The readahead group expressed interest in fanotify as it could be used
to profile disk access on boot without breaking the audit system. The desktop
search groups have also expressed interest in fanotify as it solves a number
of the race conditions and problems present with managing inotify when more
than a limited number of specific files are of interest. fanotify can provide
for a userspace access control system which makes it a clean interface for AV
vendors to hook without trying to do binary patching on the syscall table,
LSM, and everywhere else they do their things today. With this patch series
fanotify can be implemented in less than 1200 lines of easy to review code.
Almost all of which is the socket based user interface.

This patch series builds fsnotify to the point that it can implement
dnotify and inotify_user. Patches exist and will be sent soon after
acceptance to finish the in kernel inotify conversion (audit) and implement
fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>

90586523
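
The size argument in 90586523 can be checked with a small userspace model. The `*_model` types below are stand-ins for the kernel's `struct list_head` (two pointers) and `struct hlist_head` (one pointer); the helper names are hypothetical. Note the assumption: the "64 bits" figure in the commit message matches a platform with 32-bit pointers, where a `__u32` plus one pointer adds up to 8 bytes, the same as one two-pointer list head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel list types. */
struct list_head_model {
	void *next, *prev;
};

struct hlist_head_model {
	void *first;
};

/* Bytes used by the two fsnotify inode fields, ignoring padding:
 * a 32-bit event mask plus a single-pointer hlist head. */
static size_t fsnotify_payload_bytes(void)
{
	assert(sizeof(struct hlist_head_model) == sizeof(void *));
	return sizeof(uint32_t) + sizeof(struct hlist_head_model);
}

/* Bytes used by the old inotify_watches field alone. */
static size_t inotify_watches_bytes(void)
{
	return sizeof(struct list_head_model);
}
```

On a machine with 64-bit pointers the same two fields come to 12 bytes before padding, so the exact figure depends on the pointer width.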

x86: remove some alloc_bootmem_cpumask_var calling · 38c7fed2

Authored by Yinghai Lu on May 25, 2009

Now that we set up the slab allocator earlier, we can get rid of some
alloc_bootmem_cpumask_var() calls in boot code.

Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>

38c7fed2

kmemleak: Remove some of the kmemleak false positives · 2e1483c9

Authored by Catalin Marinas on Jun 11, 2009

There are allocations for which the main pointer cannot be found but
they are not memory leaks. This patch fixes some of them. For more
information on false positives, see Documentation/kmemleak.txt.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

2e1483c9

kmemleak: Add the slab memory allocation/freeing hooks · d5cff635

Authored by Catalin Marinas on Jun 11, 2009

This patch adds the callbacks to kmemleak_(alloc|free) functions from
the slab allocator. The patch also adds the SLAB_NOLEAKTRACE flag to
avoid recursive calls to kmemleak when it allocates its own data
structures.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>

d5cff635

kmemleak: Add the base support · 3c7b4e6b

Authored by Catalin Marinas on Jun 11, 2009

This patch adds the base support for the kernel memory leak
detector. It traces the memory allocation/freeing in a way similar to
Boehm's conservative garbage collector, the difference being that
the unreferenced objects are not freed but only shown in
/sys/kernel/debug/kmemleak. Enabling this feature introduces an
overhead to memory allocations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

3c7b4e6b
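
The conservative-scanning idea mentioned in 3c7b4e6b can be illustrated with a toy userspace model. This is a sketch of the general technique only, not kmemleak's implementation: `struct tracked_object`, `scan_block` and `count_unreferenced` are hypothetical names, and the scan covers a single block of root memory rather than following references from object to object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One tracked allocation: its address range and whether a scan hit it. */
struct tracked_object {
	uintptr_t start;
	size_t size;
	bool referenced;
};

/* Treat every word as a potential pointer; mark each object that a word
 * points into. A word that merely looks like a pointer also counts,
 * which is what makes the scan "conservative". */
static void scan_block(const uintptr_t *words, size_t nwords,
		       struct tracked_object *objs, size_t nobjs)
{
	assert(words != NULL && objs != NULL);
	for (size_t i = 0; i < nwords; i++)
		for (size_t j = 0; j < nobjs; j++)
			if (words[i] >= objs[j].start &&
			    words[i] - objs[j].start < objs[j].size)
				objs[j].referenced = true;
}

/* Objects that no scanned word refers to are leak candidates; they are
 * only counted (reported), never freed. */
static size_t count_unreferenced(const struct tracked_object *objs,
				 size_t nobjs)
{
	size_t n = 0;

	for (size_t j = 0; j < nobjs; j++)
		if (!objs[j].referenced)
			n++;
	return n;
}
```

Because any matching word keeps an object "referenced", a scheme like this can miss real leaks; and because some live objects are reachable only through pointers the scan cannot see, it can also report false positives, which is what 2e1483c9 addresses.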

11 Jun, 2009 11 commits

perf_counter: Add counter->id to the throttle event · cca3f454

Authored by Peter Zijlstra on Jun 11, 2009

So as to be able to distinguish between multiple counters.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cca3f454

perf_counter: Better align code · a308444c

Authored by Ingo Molnar on Jun 11, 2009

Whitespace and comment bits. Also update copyrights.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>

a308444c

perf_counter: Rename L2 to LL cache · 8be6e8f3

Authored by Peter Zijlstra on Jun 11, 2009

The top (fastest) and last level (biggest) caches are the most
interesting ones, performance wise.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ Fixed the Nehalem LL table to LLC Reference/Miss events ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8be6e8f3

perf_counter: Standardize event names · f4dbfa8f

Authored by Peter Zijlstra on Jun 11, 2009

Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

f4dbfa8f

perf_counter: Rename enums · 1c432d89

Authored by Peter Zijlstra on Jun 11, 2009

Rename the perf enums to be in the 'perf_' namespace and strictly
enumerate the ABI bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1c432d89

lib: isolate rational fractions helper function · 8759ef32

Authored by Oskar Schirmer on Jun 11, 2009

Provide a helper function to determine optimum numerator
denominator value pairs taking into account restricted
register size. Useful especially with PLL and other clock
configurations.
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8759ef32
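
One common way to find a numerator/denominator pair that fits restricted register widths, as 8759ef32 describes, is to walk the continued-fraction convergents of the target ratio and stop before either value exceeds its limit. The sketch below assumes that approach; `struct fraction` and `best_fraction` are hypothetical names, and the inputs are assumed small enough that the intermediate products do not overflow.

```c
#include <assert.h>

struct fraction {
	unsigned long num, den;
};

/* Return the last continued-fraction convergent of num/den whose
 * numerator and denominator both stay within the given limits. */
static struct fraction best_fraction(unsigned long num, unsigned long den,
				     unsigned long max_num,
				     unsigned long max_den)
{
	unsigned long n0 = 0, d0 = 1; /* previous convergent, starts at 0/1 */
	unsigned long n1 = 1, d1 = 0; /* current convergent, starts at 1/0 */
	struct fraction best;

	assert(den != 0 && max_num >= 1 && max_den >= 1);
	for (;;) {
		unsigned long a, t;

		if (n1 > max_num || d1 > max_den) {
			/* Current convergent no longer fits: step back. */
			n1 = n0;
			d1 = d0;
			break;
		}
		if (den == 0)
			break; /* the ratio was represented exactly */

		/* One Euclid step yields the next partial quotient a. */
		a = num / den;
		t = den;
		den = num % den;
		num = t;

		/* Advance the convergents: next = prev + a * current. */
		t = n0 + a * n1;
		n0 = n1;
		n1 = t;
		t = d0 + a * d1;
		d0 = d1;
		d1 = t;
	}
	best.num = n1;
	best.den = d1;
	return best;
}
```

For example, approximating 314159/100000 with both values limited to 255 (8-bit registers) gives 22/7, and raising both limits to 1000 gives 355/113.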

serial: Added Timberdale UART driver · 34aec591

Authored by Richard Röjfors on Jun 11, 2009

Driver for the UART found in the Timberdale FPGA
Signed-off-by: Richard Röjfors <richard.rojfors.ext@mocean-labs.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

34aec591

serial: add support for the TI AR7 internal UART · 08e0992f

Authored by Florian Fainelli on Jun 11, 2009

This patch adds support for the TI AR7 internal UART.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08e0992f

tty: rewrite the ldisc locking · c65c9bc3

Authored by Alan Cox on Jun 11, 2009

There are several pretty much unfixable races in the old ldisc code, especially
with respect to pty behaviour and also to hangup. It's easier to rewrite the
code than simply try and patch it up.

This patch
- splits the ldisc from the tty (so we will be able to refcount it more cleanly
  later)
- introduces a mutex lock for ldisc changing on an active device
- fixes the complete mess that hangup caused
- implements hopefully correct setldisc/close/hangup locking

There are still some problems around pty pairs that have always been there but
at least it is now possible to understand the code and fix further problems.

This fixes the following known bugs
- hang up can leak ldisc references
- hang up may not call open/close on ldisc in a matched way
- pty/tty pairs can deadlock during an ldisc change
- reading the ldisc proc files can cause every ldisc to be loaded

and probably a few other of the mysterious ldisc race reports.

I'm sure it also adds the odd new one.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c65c9bc3

tty: Extract various bits of ldisc code · e8b70e7d

Authored by Alan Cox on Jun 11, 2009

Before trying to tackle the ldisc bugs the code needs to be a good deal
more readable, so do the simple extractions of routines first.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e8b70e7d

tty: throttling race fix · 38db8979

Authored by Alan Cox on Jun 11, 2009

The tty throttling code can race due to the lock drops. It takes very high
loads but this has been observed and verified by Rob Duncan.

The basic problem is that on an SMP box we can go

	CPU #1				CPU #2
	need to throttle ?
	suppose we should		buffer space cleared
					are we throttled
					yes ? - unthrottle
	call throttle method

This changeset takes the termios lock to protect against this. The termios
lock isn't the initial obvious candidate but many implementations of throttle
methods already need to poke around their own termios structures (and nobody
really locks them against a racing change of flow control).

This does mean that anyone who is setting tty->low_latency = 1 and then
calling tty_flip_buffer_push from their unthrottle method is going to end up
collapsing in a pile of locks. However we've removed all the known bogus
users of low_latency = 1 and such use isn't safe anyway for other reasons so
catching it would be an improvement.
Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

38db8979
