Commits · f8206b925fb0eba3a11839419be118b09105d7b1 · openeuler / Kernel

17 Jan, 2011 1 commit

sanitize vfsmount refcounting changes · f03c6599

Authored by Al Viro on Jan 14, 2011

Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.

Accounting rules for longterm count:
	* 1 for each fs_struct.root.mnt
	* 1 for each fs_struct.pwd.mnt
	* 1 for having non-NULL ->mnt_ns
	* decrement to 0 happens only under vfsmount lock exclusive

That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count.  Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.

For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.

Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc.  And we get to keep
the optimization Nick's variant had bought us...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
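
As a rough illustration of the mntput() fast path described in this message, here is a single-threaded user-space sketch in C. The `sk_` names, the fixed `SK_NR_CPUS`, and the plain integers standing in for the percpu mnt_count and for mnt_longterm are assumptions made for this example. The comments only mark where the kernel would take the vfsmount lock; this is not the code from fs/namespace.c.

```c
#include <assert.h>

#define SK_NR_CPUS 4		/* hypothetical fixed CPU count for this sketch */

enum sk_put_result { SK_PUT_FAST, SK_PUT_SLOW, SK_PUT_FINAL };

struct sk_mount {
	int count[SK_NR_CPUS];	/* stand-in for the percpu mnt_count */
	int longterm;		/* stand-in for mnt_longterm */
};

/* Every reference, longterm or not, contributes to the per-CPU count. */
static void sk_mntget(struct sk_mount *m, int cpu)
{
	assert(cpu >= 0 && cpu < SK_NR_CPUS);
	m->count[cpu]++;
}

/* Mark one already-held reference as longterm (e.g. fs_struct.root.mnt). */
static void sk_make_longterm(struct sk_mount *m)
{
	m->longterm++;
}

/* In the kernel the decrement to 0 happens only under the exclusive lock. */
static void sk_make_shortterm(struct sk_mount *m)
{
	assert(m->longterm > 0);
	m->longterm--;
}

static enum sk_put_result sk_mntput(struct sk_mount *m, int cpu)
{
	int sum = 0;

	assert(cpu >= 0 && cpu < SK_NR_CPUS);
	/* Kernel: vfsmount lock is held shared while checking longterm. */
	if (m->longterm != 0) {
		m->count[cpu]--;	/* cannot be the final reference */
		return SK_PUT_FAST;
	}
	/* Kernel: take the lock exclusive, then decrement and check. */
	m->count[cpu]--;
	for (int i = 0; i < SK_NR_CPUS; i++)
		sum += m->count[i];
	return sum == 0 ? SK_PUT_FINAL : SK_PUT_SLOW;
}
```

In this sketch a put issued while any longterm reference exists never sums the per-CPU counters, which is the common case the message refers to.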

14 Jan, 2011 1 commit

pipe: use event aware wakeups · e462c448

Authored by Davide Libenzi on Jan 12, 2011

Send the events the wakeup refers to, so that epoll, and even the new poll
code in fs/select.c can avoid wakeups if the events do not match the
requested set.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
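
The idea of passing the events along with the wakeup can be sketched in user space as follows. This is a toy model, not the kernel wait-queue API: `struct sk_waiter` and `sk_wake_up_events()` are hypothetical names, and only the `POLLIN`/`POLLOUT` constants from `<poll.h>` are real.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

struct sk_waiter {
	short wanted;	/* events this waiter asked for, e.g. POLLIN */
	int woken;	/* set to 1 once the waiter has been woken */
};

/*
 * Wake only the waiters whose requested set overlaps the reported
 * events. Returns the number of waiters woken.
 */
static size_t sk_wake_up_events(struct sk_waiter *w, size_t n, short events)
{
	size_t woken = 0;

	assert(w != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!(w[i].wanted & events))
			continue;	/* no match: avoid the wakeup */
		w[i].woken = 1;
		woken++;
	}
	return woken;
}
```

A waiter interested only in writability is left asleep when the wakeup reports readability, which is the saving the message describes.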

13 Jan, 2011 1 commit

pass default dentry_operations to mount_pseudo() · c74a1cbb

Authored by Al Viro on Jan 12, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07 Jan, 2011 4 commits

fs: scale mntget/mntput · b3e19d92

Authored by Nick Piggin on Jan 07, 2011

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
and those lookups often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only take the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
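
The third scheme above can be sketched as a single-threaded user-space model. Everything named `sk_` here is hypothetical, the CPU count is fixed for simplicity, and the locking the kernel needs around these operations is left out. The sketch shows that a per-CPU slot may go negative when a reference is taken on one CPU and dropped on another, and that the global sum is only taken once the long refcount is 0.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NR_CPUS 4		/* hypothetical fixed CPU count */

struct sk_distref {
	int delta[SK_NR_CPUS];	/* per-CPU increments minus decrements */
	int longrefs;		/* single counter for long-lived references */
};

/* The short refcount is the sum of all per-CPU differences. */
static int sk_sum(const struct sk_distref *r)
{
	int sum = 0;

	for (int i = 0; i < SK_NR_CPUS; i++)
		sum += r->delta[i];
	return sum;
}

static void sk_get_short(struct sk_distref *r, int cpu)
{
	assert(cpu >= 0 && cpu < SK_NR_CPUS);
	r->delta[cpu]++;
}

/* Returns true only when this put dropped the last reference. */
static bool sk_put_short(struct sk_distref *r, int cpu)
{
	assert(cpu >= 0 && cpu < SK_NR_CPUS);
	r->delta[cpu]--;
	if (r->longrefs != 0)
		return false;	/* skip the expensive global sum */
	return sk_sum(r) == 0;
}

static void sk_get_long(struct sk_distref *r)
{
	r->longrefs++;
}

static bool sk_put_long(struct sk_distref *r)
{
	assert(r->longrefs > 0);
	if (--r->longrefs != 0)
		return false;
	return sk_sum(r) == 0;
}
```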

fs: improve scalability of pseudo filesystems · 4b936885

Authored by Nick Piggin on Jan 07, 2011

Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem to need connected
dentries because by definition they are disconnected.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache reduce branches in lookup path · fb045adb

Authored by Nick Piggin on Jan 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
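
A minimal user-space sketch of the flag-caching idea follows. The structures, flag values and function names (all prefixed `sk_`) are made up for illustration and do not mirror the kernel's definitions. The point is only that a setter records which operations exist, so the lookup path can test a flag bit instead of loading `d_op` and then the operation pointer.

```c
#include <assert.h>
#include <stddef.h>

#define SK_OP_HASH		0x1u	/* hypothetical flag bits */
#define SK_OP_REVALIDATE	0x2u

struct sk_dentry;

struct sk_dentry_ops {
	int (*d_hash)(const struct sk_dentry *);
	int (*d_revalidate)(struct sk_dentry *);
};

struct sk_dentry {
	unsigned int flags;
	const struct sk_dentry_ops *d_op;
};

/* Example operation used to build a table that has only d_hash set. */
static int sk_hash_noop(const struct sk_dentry *d)
{
	(void)d;
	return 0;
}

static const struct sk_dentry_ops sk_hash_only_ops = {
	.d_hash = sk_hash_noop,
};

/* Set the ops table and cache which operations it provides. */
static void sk_set_d_op(struct sk_dentry *d, const struct sk_dentry_ops *op)
{
	assert(d != NULL);
	d->d_op = op;
	d->flags &= ~(SK_OP_HASH | SK_OP_REVALIDATE);
	if (!op)
		return;
	if (op->d_hash)
		d->flags |= SK_OP_HASH;
	if (op->d_revalidate)
		d->flags |= SK_OP_REVALIDATE;
}

/* Lookup-path check: one flag test, no pointer chasing. */
static int sk_needs_revalidate(const struct sk_dentry *d)
{
	return (d->flags & SK_OP_REVALIDATE) != 0;
}
```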

fs: avoid inode RCU freeing for pseudo fs · ff0c7d15

Authored by Nick Piggin on Jan 07, 2011

Pseudo filesystems that don't put inodes on an RCU list or make them reachable
by rcu-walk dentries do not need to RCU free their inodes.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

29 Nov, 2010 2 commits

Un-inline get_pipe_info() helper function · 72083646

Authored by Linus Torvalds on Nov 28, 2010

This avoids some include-file hell, and the function isn't really
important enough to be inlined anyway.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Export 'get_pipe_info()' to other users · c66fb347

Authored by Linus Torvalds on Nov 28, 2010

And in particular, use it in 'pipe_fcntl()'.

The other pipe functions do not need to use the 'careful' version, since
they are only ever called for things that are already known to be pipes.

The normal read/write/ioctl functions are called through the file
operations structures, so if a file isn't a pipe, they'd never get
called.  But pipe_fcntl() is special, and called directly from the
generic fcntl code, and needs to use the same careful function that the
splice code is using.

Cc: Jens Axboe <jaxboe@fusionio.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Oct, 2010 1 commit

convert get_sb_pseudo() users · 51139ada

Authored by Al Viro on Jul 25, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26 Oct, 2010 1 commit

fs: do not assign default i_ino in new_inode · 85fe4025

Authored by Christoph Hellwig on Oct 23, 2010

Instead of always assigning an increasing inode number in new_inode,
move the call to assign it into those callers that actually need it.
For now the set of callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 Oct, 2010 1 commit

pipe: fix failure to return error code on ->confirm() · e5953cbd

Authored by Nicolas Kaiser on Oct 21, 2010

The arguments were transposed; we want to assign the error code to
'ret', which is being returned.
Signed-off-by: Nicolas Kaiser <nikai@nikai.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

11 Jun, 2010 2 commits

pipe: fix check in "set size" fcntl · 6db40cf0

Authored by Miklos Szeredi on Jun 09, 2010

As it stands this check compares the number of pages to the page size.
This makes no sense and makes the fcntl fail in almost any sane case.

Fix it by checking if nr_pages is not zero (it can become zero only if
arg is too big and round_pipe_size() overflows).
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
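
To see how nr_pages can become zero, consider the following user-space sketch. It assumes 4 KiB pages and 32-bit unsigned arithmetic. `sk_round_pipe_size()` is a stand-in written for this example rather than the kernel's round_pipe_size(), and `sk_pipe_size_to_pages()` is a hypothetical wrapper around the corrected check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_PAGE_SHIFT	12u			/* assumes 4 KiB pages */
#define SK_PAGE_SIZE	(UINT32_C(1) << SK_PAGE_SHIFT)

/*
 * Round a byte count up to a power-of-2 number of pages and return the
 * result in bytes. All arithmetic is 32-bit, so a large enough request
 * wraps around to 0.
 */
static uint32_t sk_round_pipe_size(uint32_t size)
{
	uint32_t nr_pages = (size + SK_PAGE_SIZE - 1) >> SK_PAGE_SHIFT;
	uint32_t pow2 = 1;

	while (pow2 < nr_pages)
		pow2 <<= 1;
	assert((pow2 & (pow2 - 1)) == 0);	/* always a power of two */
	return pow2 << SK_PAGE_SHIFT;
}

/* Returns the number of pages to allocate, or -EINVAL on overflow. */
static long sk_pipe_size_to_pages(uint32_t arg)
{
	uint32_t nr_pages = sk_round_pipe_size(arg) >> SK_PAGE_SHIFT;

	if (!nr_pages)		/* zero means the rounding overflowed */
		return -EINVAL;
	return (long)nr_pages;
}
```

Under these assumptions a request of 5000 bytes yields 2 pages, while a request just above 2 GiB rounds up to 2^32 bytes, wraps to 0, and is rejected.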

pipe: fix pipe buffer resizing · 1d862f41

Authored by Miklos Szeredi on Jun 08, 2010

pipe_set_size() needs to copy pipe bufs from the old circular buffer
to the new.

The current code gets this wrong in multiple ways, resulting in an oops.

Test program is available here:
  http://www.kernel.org/pub/linux/kernel/people/mszeredi/piperesize/
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

03 Jun, 2010 3 commits

pipe: change /proc/sys/fs/pipe-max-pages to byte sized interface · ff9da691

Authored by Jens Axboe on Jun 03, 2010

This changes the interface to be based on bytes instead. The API
matches that of F_SETPIPE_SZ in that it rounds up the passed in
size so that the resulting page array is a power-of-2 in size.

The proc file is renamed to /proc/sys/fs/pipe-max-size to
reflect this change.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

pipe: change the privilege required for growing a pipe beyond system max · 419f8367

Authored by Jens Axboe on Jun 03, 2010

Change it to CAP_SYS_RESOURCE, as that more accurately models what
we want to control.
Suggested-by: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

pipe: adjust minimum pipe size to 1 page · 6a6ca57d

Authored by Jens Axboe on Jun 03, 2010

We don't need two pages to guarantee the POSIX requirement
that a write of up to a page size must be atomic to an empty
pipe.
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

28 May, 2010 1 commit

fs: Add missing mutex_unlock · cc967be5

Authored by Julia Lawall on May 26, 2010

Add a mutex_unlock missing on the error path.  At other exits from the
function that return an error flag, the mutex is unlocked, so do the same
here.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* mutex_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* mutex_unlock(E1,...);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26 May, 2010 1 commit

mm: export generic_pipe_buf_*() to modules · 51921cb7

Authored by Miklos Szeredi on May 26, 2010

This is needed by fuse device code which wants to create pipe buffers.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

25 May, 2010 2 commits

pipe: make F_{GET,SET}PIPE_SZ deal with byte sizes · b9598db3

Authored by Jens Axboe on May 24, 2010

Instead of requiring an exact number of pages as the argument and
return value, change the API to deal with number of bytes instead.

This also relaxes the requirement that the passed in size must
result in a power-of-2 page array size. Round up to the nearest
power-of-2 automatically and return the resulting size of the pipe
on success.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
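
From user space, the byte-based interface can be exercised roughly as follows. This is a sketch for Linux with glibc. The two wrapper functions are hypothetical names, and the `#ifndef` fallbacks carry the Linux values of the fcntl commands in case the headers do not expose them.

```c
#define _GNU_SOURCE		/* for F_SETPIPE_SZ / F_GETPIPE_SZ */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Fallbacks with the Linux values if <fcntl.h> did not define them. */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#endif
#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032
#endif

/*
 * Ask for at least `bytes` of pipe capacity. Returns the capacity the
 * kernel actually set (rounded up), or -1 with errno set.
 */
static int sk_set_pipe_size(int fd, int bytes)
{
	assert(bytes > 0);
	return fcntl(fd, F_SETPIPE_SZ, bytes);
}

/* Returns the current pipe capacity in bytes, or -1 with errno set. */
static int sk_get_pipe_size(int fd)
{
	return fcntl(fd, F_GETPIPE_SZ);
}
```

A caller would typically create a pipe with pipe(), request a size such as 5000 bytes, and then use the returned value rather than the requested one, since the kernel rounds the size up.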

pipe: F_SETPIPE_SZ should return -EPERM for non-root · 0191f869

Authored by Jens Axboe on May 24, 2010

If the passed in size is larger than what has been set as the
system wide limit and the user is not root, we want to return
permission denied (not invalid value).
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

22 May, 2010 2 commits

pipe: set lower and upper limit on max pages in the pipe page array · b492e95b

Authored by Jens Axboe on May 19, 2010

We need at least two to guarantee proper POSIX behaviour, so
never allow a smaller limit than that.

Also expose a /proc/sys/fs/pipe-max-pages sysctl file that allows
root to define a sane upper limit. Make it default to 16 times the
default size, which is 16 pages.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

pipe: add support for shrinking and growing pipes · 35f3d14d

Authored by Jens Axboe on May 20, 2010

This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

17 Dec, 2009 3 commits

fs: no games with DCACHE_UNHASHED · a3a065e3

Authored by Nick Piggin on Nov 18, 2009

Filesystems outside the regular namespace do not have to clear DCACHE_UNHASHED
in order to have a working /proc/$pid/fd/XXX. Nothing in proc prevents the
fd link from being used if its dentry is not in the hash.

Also, it does not get put into the dcache hash if DCACHE_UNHASHED is clear;
that depends on the filesystem calling d_add or d_rehash.

So delete the misleading comments and needless code.
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch create_read_pipe() to alloc_file() · d231412d

Authored by Al Viro on Aug 09, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch alloc_file() to passing struct path · 2c48b9c4

Authored by Al Viro on Aug 09, 2009

... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

22 Oct, 2009 1 commit

fs: pipe.c null pointer dereference · ad396024

Authored by Earl Chew on Oct 19, 2009

This patch fixes a null pointer exception in pipe_rdwr_open() which
generates the stack trace:

> Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
>  [<ffffffff802899a5>] pipe_rdwr_open+0x35/0x70
>  [<ffffffff8028125c>] __dentry_open+0x13c/0x230
>  [<ffffffff8028143d>] do_filp_open+0x2d/0x40
>  [<ffffffff802814aa>] do_sys_open+0x5a/0x100
>  [<ffffffff8021faf3>] sysenter_do_call+0x1b/0x67

The failure mode is triggered by an attempt to open an anonymous
pipe via /proc/pid/fd/* as exemplified by this script:

=============================================================
while : ; do
   { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
   PID=$!
   OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
        { read PID REST ; echo $PID; } )
   OUT="${OUT%% *}"
   DELAY=$((RANDOM * 1000 / 32768))
   usleep $((DELAY * 1000 + RANDOM % 1000 ))
   echo n > /proc/$OUT/fd/1                 # Trigger defect
done
=============================================================

Note that the failure window is quite small and I could only
reliably reproduce the defect by inserting a small delay
in pipe_rdwr_open(). For example:

 static int
 pipe_rdwr_open(struct inode *inode, struct file *filp)
 {
       msleep(100);
       mutex_lock(&inode->i_mutex);

Although the defect was observed in pipe_rdwr_open(), I think it
makes sense to replicate the change through all the pipe_*_open()
functions.

The core of the change is to verify that inode->i_pipe has not
been released before attempting to manipulate it. If inode->i_pipe
is no longer present, return ENOENT to indicate so.

The comment about potentially using atomic_t for i_pipe->readers
and i_pipe->writers has also been removed because it is no longer
relevant in this context. The inode->i_mutex lock must be used so
that inode->i_pipe can be dealt with correctly.
Signed-off-by: Earl Chew <earl_chew@agilent.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
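
The core of the change described above can be sketched in user space like this. The `sk_` types are simplified stand-ins for the kernel structures, the locking is only indicated by comments, and the real functions also look at the file's open mode, which is omitted here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sk_pipe {
	int readers;
	int writers;
};

struct sk_inode {
	struct sk_pipe *i_pipe;	/* NULL once the pipe has been released */
};

/* Returns 0 on success, or -ENOENT if the pipe is already gone. */
static int sk_pipe_rdwr_open(struct sk_inode *inode)
{
	int ret = -ENOENT;

	assert(inode != NULL);
	/* Kernel: mutex_lock(&inode->i_mutex) happens here. */
	if (inode->i_pipe) {
		ret = 0;
		inode->i_pipe->readers++;
		inode->i_pipe->writers++;
	}
	/* Kernel: mutex_unlock(&inode->i_mutex) happens here. */
	return ret;
}
```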

23 Jul, 2009 1 commit

lockdep: Fix lockdep annotation for pipe_double_lock() · 023d43c7

Authored by Peter Zijlstra on Jul 21, 2009

The presumed use of the pipe_double_lock() routine is to lock 2 locks in
a deadlock free way by ordering the locks by their address. However it
fails to keep the specified lock classes in order and explicitly
annotates a deadlock.

Rectify this.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
LKML-Reference: <1248163763.15751.11098.camel@twins>
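
The ordering rule can be illustrated with a toy lock that merely records which subclass it was taken with. The `sk_` names and the subclass enum are assumptions for this sketch, not the kernel's lockdep API. What matters is that the lower address always gets the "parent" class and the higher address the "child" class, regardless of argument order.

```c
#include <assert.h>
#include <stdint.h>

enum sk_subclass { SK_NONE = 0, SK_PARENT = 1, SK_CHILD = 2 };

struct sk_lock {
	int held;
	enum sk_subclass subclass;	/* class used for the last acquire */
};

static void sk_lock_nested(struct sk_lock *l, enum sk_subclass sc)
{
	assert(!l->held);
	l->held = 1;
	l->subclass = sc;
}

/* Take both locks in address order, keeping the lock classes in order too. */
static void sk_double_lock(struct sk_lock *a, struct sk_lock *b)
{
	assert(a != b);
	if ((uintptr_t)a < (uintptr_t)b) {
		sk_lock_nested(a, SK_PARENT);
		sk_lock_nested(b, SK_CHILD);
	} else {
		sk_lock_nested(b, SK_PARENT);
		sk_lock_nested(a, SK_CHILD);
	}
}
```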

11 May, 2009 1 commit

splice: implement default splice_read method · 6818173b

Authored by Miklos Szeredi on May 07, 2009

If f_op->splice_read() is not implemented, fall back to a plain read.
Use vfs_readv() to read into previously allocated pages.

This will allow splice and functions using splice, such as the loop
device, to work on all filesystems.  This includes "direct_io" files
in fuse which bypass the page cache.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

15 Apr, 2009 1 commit

splice: add helpers for locking pipe inode · 61e0d47c

Authored by Miklos Szeredi on Apr 14, 2009

There are lots of sequences like this, especially in splice code:

	if (pipe->inode)
		mutex_lock(&pipe->inode->i_mutex);
	/* do something */
	if (pipe->inode)
		mutex_unlock(&pipe->inode->i_mutex);

so introduce helpers which do the conditional locking and unlocking.
Also replace the inode_double_lock() call with a pipe_double_lock()
helper to avoid spreading the use of this functionality beyond the
pipe code.

This patch is just a cleanup, and should cause no behavioral changes.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

28 Mar, 2009 2 commits

constify dentry_operations: rest · 3ba13d17

Authored by Al Viro on Feb 20, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_pipe cleanup: drop its last user in arch/alpha/ · 10f303ae

Authored by Cheng Renquan on Jan 14, 2009

The last user of do_pipe is in arch/alpha/; after replacing it with
do_pipe_flags, do_pipe can be dropped entirely.
Signed-off-by: Cheng Renquan <crquan@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16 Mar, 2009 1 commit

Rationalize fasync return values · 60aa4924

Authored by Jonathan Corbet on Feb 01, 2009

Most fasync implementations do something like:

     return fasync_helper(...);

But fasync_helper() will return a positive value at times - a feature used
in at least one place.  Thus, a number of other drivers do:

     err = fasync_helper(...);
     if (err < 0)
             return err;
     return 0;

In the interests of consistency and more concise code, it makes sense to
map positive return values onto zero where ->fasync() is called.

Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
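
The mapping described above, done once at the place where ->fasync() is called, might look like the following in a user-space model. The function-pointer type and the two sample implementations are hypothetical and exist only so the sketch can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*sk_fasync_fn)(int on);

/* Sample implementation that returns a positive value on success. */
static int sk_fasync_positive(int on)
{
	(void)on;
	return 1;
}

/* Sample implementation that fails. */
static int sk_fasync_failing(int on)
{
	(void)on;
	return -ENOMEM;
}

/* Call site: pass errors through, map any positive value onto zero. */
static int sk_call_fasync(sk_fasync_fn fn, int on)
{
	int err;

	assert(fn != NULL);
	err = fn(on);
	return err < 0 ? err : 0;
}
```

With the mapping at the call site, an implementation can simply return the helper's result without normalizing it.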

13 Mar, 2009 1 commit

pipe_rdwr_fasync: fix the error handling to prevent the leak/crash · e5bc49ba

Authored by Oleg Nesterov on Mar 12, 2009

If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
but leaves the file on ->fasync_readers.

This was always wrong, but since 233e70f4
"saner FASYNC handling on file close" we have the new problem.  Because in
this case setfl() doesn't set FASYNC bit, __fput() will not do
->fasync(0), and we leak fasync_struct with ->fa_file pointing to the
freed file.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 Jan, 2009 3 commits

[CVE-2009-0029] System call wrappers part 33 · 2b664219

Authored by Heiko Carstens on Jan 14, 2009

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

[CVE-2009-0029] System call wrappers part 32 · d4e82042

Authored by Heiko Carstens on Jan 14, 2009

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

[CVE-2009-0029] Remove __attribute__((weak)) from sys_pipe/sys_pipe2 · 1134723e

Authored by Heiko Carstens on Jan 14, 2009

Remove __attribute__((weak)) from the common code sys_pipe implementation.
IA64, ALPHA, SUPERH (32bit) and SPARC (32bit) have their own implementations
with the same name. Just rename them.
For sys_pipe2 there is no architecture specific implementation.

Cc: Richard Henderson <rth@twiddle.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>

05 Jan, 2009 1 commit

sanitize audit_fd_pair() · 157cf649

Authored by Al Viro on Dec 14, 2008

* no allocations
* return void
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

14 Nov, 2008 1 commit

CRED: Wrap task credential accesses in the filesystem subsystem · da9592ed

Authored by David Howells on Nov 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Morris <jmorris@namei.org>

02 Nov, 2008 1 commit

saner FASYNC handling on file close · 233e70f4

Authored by Al Viro on Oct 31, 2008

As it is, all instances of ->release() for files that have ->fasync()
need to remember to evict file from fasync lists; forgetting that
creates a hole and we actually have a bunch that *does* forget.

So let's keep our lives simple - let __fput() check FASYNC in
file->f_flags and call ->fasync() there if it's been set.  And lose that
crap in ->release() instances - leaving it there is still valid, but we
don't have to bother anymore.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
