提交 · e00c54c36ac2024c3a8a37432e2e2698ff849594 · openeuler / raspberrypi-kernel

02 10月, 2009 1 次提交

fs/bio.c: move EXPORT* macros to line after function · a112a71d

由 H Hartley Sweeten 提交于 9月 26, 2009

As mentioned in Documentation/CodingStyle, move EXPORT* macro's
to the line immediately after the closing function brace line.
Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a112a71d

28 9月, 2009 1 次提交

const: mark struct vm_struct_operations · f0f37e2f

由 Alexey Dobriyan 提交于 9月 27, 2009

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f0f37e2f

26 9月, 2009 13 次提交

writeback: pass in super_block to bdi_start_writeback() · a72bfd4d

由 Jens Axboe 提交于 9月 26, 2009

Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dc
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a72bfd4d

cifs: fix locking and list handling code in cifs_open and its helper · 3321b791

由 Jeff Layton 提交于 9月 25, 2009

The patch to remove cifs_init_private introduced a locking imbalance. It
didn't remove the leftover list addition code and the unlocking in that
function. cifs_new_fileinfo does the list addition now, so there should
be no need to do it outside of that function.

pCifsInode will never be NULL, so we don't need to check for that. This
patch also gets rid of the ugly locking and unlocking across function
calls.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Acked-by: NSteve French <sfrench@us.ibm.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

3321b791

writeback: writeback_inodes_sb() should use bdi_start_writeback() · 56a131dc

由 Jens Axboe 提交于 9月 25, 2009

Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

56a131dc

writeback: don't delay inodes redirtied by a fast dirtier · b3af9468

由 Wu Fengguang 提交于 9月 25, 2009

Debug traces show that in per-bdi writeback, the inode under writeback
almost always get redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b3af9468

writeback: make the super_block pinning more efficient · 9ecc2738

由 Jens Axboe 提交于 9月 24, 2009

Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on sb->s_umount sem. Lets instead
cache the inode sb pin state and keep the super_block pinned
for as long as keep writing out inodes from the same
super_block.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

9ecc2738

writeback: don't resort for a single super_block in move_expired_inodes() · cf137307

由 Jens Axboe 提交于 9月 24, 2009

If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

cf137307

writeback: move inodes from one super_block together · 5c03449d

由 Shaohua Li 提交于 9月 24, 2009

__mark_inode_dirty adds inode to wb dirty list in random order. If a disk has
several partitions, writeback might keep spindle moving between partitions.
To reduce the move, better write big chunk of one partition and then move to
another. Inodes from one fs usually are in one partion, so idealy move indoes
from one fs together should reduce spindle move. This patch tries to address
this. Before per-bdi writeback is added, the behavior is write indoes
from one fs first and then another, so the patch restores previous behavior.
The loop in the patch is a bit ugly, should we add a dirty list for each
superblock in bdi_writeback?

Test in a two partition disk with attached fio script shows about 3% ~ 6%
improvement.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

5c03449d

J
writeback: get rid to incorrect references to pdflush in comments · 5b0830cb
由 Jens Axboe 提交于 9月 23, 2009
```
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
```
5b0830cb

writeback: improve readability of the wb_writeback() continue/break logic · 71fd05a8

由 Jens Axboe 提交于 9月 23, 2009

And throw some comments in there, too.
Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

71fd05a8

writeback: cleanup writeback_single_inode() · ae1b7f7d

由 Wu Fengguang 提交于 9月 23, 2009

Make the if-else straight in writeback_single_inode().
No behavior change.

Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

ae1b7f7d

writeback: kupdate writeback shall not stop when more io is possible · 7fbdea32

由 Wu Fengguang 提交于 9月 23, 2009

Fix the kupdate case, which disregards wbc.more_io and stop writeback
prematurely even when there are more inodes to be synced.

wbc.more_io should always be respected.

Also remove the pages_skipped check. It will set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

7fbdea32

writeback: stop background writeback when below background threshold · d3ddec76

由 Wu Fengguang 提交于 9月 23, 2009

Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.
Reported-by: NRichard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

d3ddec76

fs: Fix busyloop in wb_writeback() · a5989bdc

由 Jan Kara 提交于 9月 16, 2009

If all inodes are under writeback (e.g. in case when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until I_SYNC flags of the inode is cleared. Fix the problem by
waiting on I_SYNC flags of an inode on b_more_io list in case we failed to
write anything.
Tested-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a5989bdc

25 9月, 2009 7 次提交

[CIFS] Remove build warning · 15dd4781

由 Steve French 提交于 9月 25, 2009

Acked-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

15dd4781

cifs: fix problems with last two commits · 5d2c0e22

由 Jeff Layton 提交于 9月 24, 2009

Fix problems with commits:

086f68bd
3bc303c2Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

5d2c0e22

S
[CIFS] Fix build break when keys support turned off · 0f59e61c
由 Steve French 提交于 9月 25, 2009
```
Acked-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>
```
0f59e61c

procfs: disable per-task stack usage on NOMMU · c44972f1

由 Andrew Morton 提交于 9月 24, 2009

It needs walk_page_range().
Reported-by: NMichal Simek <monstr@monstr.eu>
Tested-by: NMichal Simek <monstr@monstr.eu>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c44972f1

cifs: eliminate cifs_init_private · 086f68bd

由 Jeff Layton 提交于 9月 21, 2009

...it does the same thing as cifs_fill_fileinfo, but doesn't handle the
flist ordering correctly. Also rename cifs_fill_fileinfo to a more
descriptive name and have it take an open flags arg instead of just a
write_only flag. That makes the logic in the callers a little simpler.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

086f68bd

nfs[23] tcp breakage in mount with binary options · 36dd2fdb

由 Al Viro 提交于 9月 24, 2009

We forget to set nfs_server.protocol in tcp case when old-style binary
options are passed to mount.  The thing remains zero and never validated
afterwards.  As the result, we hit BUG in fs/nfs/client.c:588.

Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data
merged yesterday...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

36dd2fdb

cifs: convert oplock breaks to use slow_work facility (try #4) · 3bc303c2

由 Jeff Layton 提交于 9月 21, 2009

This is the fourth respin of the patch to convert oplock breaks to
use the slow_work facility.

A customer of ours was testing a backport of one of the earlier
patchsets, and hit a "Busy inodes after umount..." problem. An oplock
break job had raced with a umount, and the superblock got torn down and
its memory reused. When the oplock break job tried to dereference the
inode->i_sb, the kernel oopsed.

This patchset has the oplock break job hold an inode and vfsmount
reference until the oplock break completes.  With this, there should be
no need to take a tcon reference (the vfsmount implicitly holds one
already).

Currently, when an oplock break comes in there's a chance that the
oplock break job won't occur if the allocation of the oplock_q_entry
fails. There are also some rather nasty races in the allocation and
handling these structs.

Rather than allocating oplock queue entries when an oplock break comes
in, add a few extra fields to the cifsFileInfo struct. Get rid of the
dedicated cifs_oplock_thread as well and queue the oplock break job to
the slow_work thread pool.

This approach also has the advantage that the oplock break jobs can
potentially run in parallel rather than be serialized like they are
today.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

3bc303c2

24 9月, 2009 18 次提交

task_struct cleanup: move binfmt field to mm_struct · 801460d0

由 Hiroshi Shimamoto 提交于 9月 23, 2009

Because the binfmt is not different between threads in the same process,
it can be moved from task_struct to mm_struct.  And binfmt moudle is
handled per mm_struct instead of task_struct.
Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

801460d0

fs/romfs: correct error-handling code · a21f3c2a

由 Julia Lawall 提交于 9月 23, 2009

romfs_iget returns an ERR_PTR value in an error case instead of NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = romfs_iget(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a21f3c2a

adfs: remove redundant test on unsigned · 3886de93

由 Roel Kluin 提交于 9月 23, 2009

unsigned block cannot be less than 0.
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3886de93

sysctl: remove "struct file *" argument of ->proc_handler · 8d65af78

由 Alexey Dobriyan 提交于 9月 23, 2009

It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8d65af78

fs/char_dev.c: remove useless loop · dd5d81f3

由 Renzo Davoli 提交于 9月 23, 2009

There are two useless lines in fs/char_dev.c.

In register_chrdev there is a loop to change all '/' into '!' in the
kernel object name.
This code is useless as the same substitution is in kobject_set_name_vargs in
lib/kobject.c:
228         /* ewww... some of these buggers have '/' in the name ... */
229         while ((s = strchr(kobj->name, '/')))
230                 s[0] = '!';

kobject_set_name_vargs is called by kobject_set_name.
kobject_set_name is called just above the useless loop.

[hidave.darkstar@gmail.com: fix warning, remove the unused char *s]
Signed-off-by: NRenzo Davoli <renzo@cs.unibo.it>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dd5d81f3

flat: use IS_ERR_VALUE() helper macro · 0b8c78f2

由 Mike Frysinger 提交于 9月 23, 2009

There is a common macro now for testing mixed pointer/errno values, so use
that rather than handling the casts ourself.
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Acked-by: NDavid McCullough <david_mccullough@securecomputing.com>
Acked-by: NGreg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0b8c78f2

fdpic: ignore the loader's PT_GNU_STACK when calculating the stack size · 8e8b63a6

由 David Howells 提交于 9月 23, 2009

Ignore the loader's PT_GNU_STACK when calculating the stack size, and only
consider the executable's PT_GNU_STACK, assuming the executable has one.

Currently the behaviour is to take the largest stack size and use that,
but that means you can't reduce the stack size in the executable.  The
loader's stack size should probably only be used when executing the loader
directly.

WARNING: This patch is slightly dangerous - it may render a system
inoperable if the loader's stack size is larger than that of important
executables, and the system relies unknowingly on this increasing the size
of the stack.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Acked-by: NPaul Mundt <lethal@linux-sh.org>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e8b63a6

elf: clean up fill_note_info() · 0cf062d0

由 Amerigo Wang 提交于 9月 23, 2009

Introduce a helper function elf_note_info_init() to help fill_note_info()
to do initializations, also fix the potential memory leaks.

[akpm@linux-foundation.org: remove NUM_NOTES]
Signed-off-by: NWANG Cong <amwang@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0cf062d0

fcntl: add F_[SG]ETOWN_EX · ba0a6c9f

由 Peter Zijlstra 提交于 9月 23, 2009

In order to direct the SIGIO signal to a particular thread of a
multi-threaded application we cannot, like suggested by the manpage, put a
TID into the regular fcntl(F_SETOWN) call.  It will still be send to the
whole process of which that thread is part.

Since people do want to properly direct SIGIO we introduce F_SETOWN_EX.

The need to direct SIGIO comes from self-monitoring profiling such as with
perf-counters.  Perf-counters uses SIGIO to notify that new sample data is
available.  If the signal is delivered to the same task that generated the
new sample it can augment that data by inspecting the task's user-space
state right after it returns from the kernel.  This is esp.  convenient
for interpreted or virtual machine driven environments.

Both F_SETOWN_EX and F_GETOWN_EX take a pointer to a struct f_owner_ex
as argument:

struct f_owner_ex {
	int   type;
	pid_t pid;
};

Where type is one of F_OWNER_TID, F_OWNER_PID or F_OWNER_GID.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Tested-by: Nstephane eranian <eranian@googlemail.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba0a6c9f

signals: send_sigio: use do_send_sig_info() to avoid check_kill_permission() · 06f1631a

由 Oleg Nesterov 提交于 9月 23, 2009

group_send_sig_info()->check_kill_permission() assumes that current is the
sender and uses current_cred().

This is not true in send_sigio_to_task() case.  From the security pov the
sender is not current, but the task which did fcntl(F_SETOWN), that is why
we have sigio_perm() which uses the right creds to check.

Fortunately, send_sigio() always sends either SEND_SIG_PRIV or
SI_FROMKERNEL() signal, so check_kill_permission() does nothing.  But
still it would be tidier to avoid this bogus security check and save a
couple of cycles.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

06f1631a

exec: fix set_binfmt() vs sys_delete_module() race · 964ee7df

由 Oleg Nesterov 提交于 9月 23, 2009

sys_delete_module() can set MODULE_STATE_GOING after
search_binary_handler() does try_module_get().  In this case
set_binfmt()->try_module_get() fails but since none of the callers
check the returned error, the task will run with the wrong old
->binfmt.

The proper fix should change all ->load_binary() methods, but we can
rely on fact that the caller must hold a reference to binfmt->module
and use __module_get() which never fails.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

964ee7df

exec: allow do_coredump() to wait for user space pipe readers to complete · 61be228a

由 Neil Horman 提交于 9月 23, 2009

Allow core_pattern pipes to wait for user space to complete

One of the things that user space processes like to do is look at metadata
for a crashing process in their /proc/<pid> directory. this is racy
however, since do_coredump in the kernel doesn't wait for the user space
process to complete before it reaps the crashing process. This patch
corrects that. Allowing the kernel to wait for the user space process to
complete before cleaning up the crashing process. This is a bit tricky to
do for a few reasons:

1) The user space process isn't our child, so we can't sys_wait4 on it
2) We need to close the pipe before waiting for the user process to complete,
since the user process may rely on an EOF condition

I've discussed several solutions with Oleg Nesterov off-list about this,
and this is the one we've come up with. We add ourselves as a pipe reader
(to prevent premature cleanup of the pipe_inode_info), and remove
ourselves as a writer (to provide an EOF condition to the writer in user
space), then we iterate until the user space process exits (which we
detect by pipe->readers == 1, hence the > 1 check in the loop). When we
exit the loop, we restore the proper reader/writer values, then we return
and let filp_close in do_coredump clean up the pipe data properly.
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reported-by: NEarl Chew <earl_chew@agilent.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61be228a

exec: let do_coredump() limit the number of concurrent dumps to pipes · a293980c

由 Neil Horman 提交于 9月 23, 2009

Introduce core pipe limiting sysctl.

Since we can dump cores to pipe, rather than directly to the filesystem,
we create a condition in which a user can create a very high load on the
system simply by running bad applications.

If the pipe reader specified in core_pattern is poorly written, we can
have lots of ourstandig resources and processes in the system.

This sysctl introduces an ability to limit that resource consumption.
core_pipe_limit defines how many in-flight dumps may be run in parallel,
dumps beyond this value are skipped and a note is made in the kernel log.
A special value of 0 in core_pipe_limit denotes unlimited core dumps may
be handled (this is the default value).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reported-by: NEarl Chew <earl_chew@agilent.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a293980c

exec: make do_coredump() more resilient to recursive crashes · 725eae32

由 Neil Horman 提交于 9月 23, 2009

Change how we detect recursive dumps.

Currently we have a mechanism by which we try to compare pathnames of the
crashing process to the core_pattern path.  This is broken for a dozen
reasons, and just doesn't work in any sort of robust way.

I'm replacing it with the use of a 0 RLIMIT_CORE value.  Since helper apps
set RLIMIT_CORE to zero, we don't write out core files for any process
with that particular limit set.  It the core_pattern is a pipe, any
non-zero limit is translated to RLIM_INFINITY.

This allows complete dumps to be captured, but prevents infinite recursion
in the event that the core_pattern process itself crashes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Reported-by: NEarl Chew <earl_chew@agilent.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

725eae32

hugetlbfs: do not call user_shm_lock() for MAP_HUGETLB fix · ef1ff6b8

由 From: Mel Gorman 提交于 9月 23, 2009

Commit 6bfde05b ("hugetlbfs: allow the creation of files suitable for
MAP_PRIVATE on the vfs internal mount") altered can_do_hugetlb_shm() to
check if a file is being created for shared memory or mmap().  If this
returns false, we then unconditionally call user_shm_lock() triggering a
warning.  This block should never be entered for MAP_HUGETLB.  This
patch partially reverts the problem and fixes the check.
Signed-off-by: NEric B Munson <ebmunson@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Adam Litke <agl@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef1ff6b8

Btrfs: hash the btree inode during fill_super · c65ddb52

由 Yan Zheng 提交于 9月 24, 2009

The snapshot deletion  patches dropped this line, but the inode
needs to be hashed.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c65ddb52

Btrfs: relocate file extents in clusters · 0257bb82

由 Yan, Zheng 提交于 9月 24, 2009

The extent relocation code copy file extents one by one when
relocating data block group. This is inefficient if file
extents are small. This patch makes the relocation code copy
file extents in clusters. So we can can make better use of
read-ahead.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0257bb82

Btrfs: don't rename file into dummy directory · f679a840

由 Yan, Zheng 提交于 9月 24, 2009

A recent change enforces only one access point to each subvolume. The first
directory entry (the one added when the subvolume/snapshot was created) is
treated as valid access point, all other subvolume links are linked to dummy
empty directories. The dummy directories are temporary inodes that only in
memory, so we can not rename file into them.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f679a840