Commits · db2a144bedd58b3dcf19950c2f476c58c9f39d18 · openeuler / raspberrypi-kernel

07 May, 2013 1 commit

block_device_operations->release() should return void · db2a144b

Authored by Al Viro on May 05, 2013

The value passed is 0 in all but "it can never happen" cases (and those
only in a couple of drivers) *and* it would've been lost on the way
out anyway, even if something tried to pass something meaningful.
Just don't bother.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06 May, 2013 1 commit

hfs: SMP race on directory close() · 1950267e

Authored by Al Viro on May 05, 2013

->open_dir_list needs protection...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

05 May, 2013 11 commits

hostfs: use kmalloc instead of kzalloc · 371fdab1

Authored by James Hogan on Mar 27, 2013

The inode info structure is zeroed at allocation with kzalloc, and then
all but one of the fields (including the largest, vfs_inode) are
initialised explicitly. Switch to using kmalloc and initialise the
remaining field too.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hostfs: move HOSTFS_SUPER_MAGIC to <linux/magic.h> · 2b3b9bb0

Authored by James Hogan on Mar 27, 2013

Move HOSTFS_SUPER_MAGIC to <linux/magic.h> to be with its magical
friends from other file systems.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hostfs: remove "will unlock" comment · 9dcc5e8a

Authored by James Hogan on Mar 27, 2013

A "will unlock" comment was added to hostfs in the following commit,
along with a spinlock:

Commit e9193059 ("hostfs: fix races in
dentry_name() and inode_name()").

But the spinlock was subsequently removed in the following commit:

Commit ec2447c2 ("hostfs: simplify
locking").

Since the comment is no longer applicable, remove it.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: use list_move instead of list_del/list_add · 9ed53b12

Authored by Wei Yongjun on Mar 12, 2013

Using list_move() instead of list_del() + list_add().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
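
As a rough illustration of the substitution this commit describes, here is a minimal userspace sketch of a circular doubly linked list modelled on the kernel's `struct list_head`. The helpers below are simplified re-implementations written for this example, not the kernel's own code (the kernel versions also poison pointers on deletion, which is omitted here).

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty list: the head points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry immediately after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from the list it is currently on. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Single call that replaces the list_del() + list_add() pair. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	list_add(entry, head);
}

/* A list is empty when the head points back at itself. */
static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```

With two heads `a` and `b` and an entry `x` on `a`, calling `list_move(&x, &b)` leaves `a` empty and `x` as the only element of `b`, which is what the two-call sequence did before.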

proc_devtree: Replace include linux/module.h with linux/export.h · 75fc0cf6

Authored by Syam Sidhardhan on Feb 15, 2013

Since it uses only THIS_MODULE macro, include <linux/export.h>
is the right way to go here.
Signed-off-by: Syam Sidhardhan <s.syam@samsung.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

create_mnt_ns: unidiomatic use of list_add() · b1983cd8

Authored by Al Viro on May 04, 2013

while list_add(A, B) and list_add(B, A) are equivalent when both A and B
are guaranteed to be empty, the usual idiom is list_add(what, where),
not the other way round...  Not a bug per se, but only by accident and
it makes RTFS harder for no good reason.
Spotted-by: Rajat Sharma <fs.rajat@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
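
The equivalence claimed above can be checked with a small userspace sketch. The list helpers are simplified re-implementations modelled on the kernel's `struct list_head`, and `ring_of_two()` is a hypothetical helper added only for this illustration.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make an empty list: the node points at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry immediately after head: list_add(what, where). */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* True if a and b form a two-node ring, each being the other's next and prev. */
static int ring_of_two(const struct list_head *a, const struct list_head *b)
{
	return a->next == b && a->prev == b &&
	       b->next == a && b->prev == a;
}
```

When both nodes start out empty, `list_add(&a, &b)` and `list_add(&b, &a)` produce the same two-node ring, so the argument order is invisible. As soon as the first argument is already linked into a list the two calls are no longer interchangeable, which is why the `list_add(what, where)` order is worth keeping.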

fs: remove dentry_lru_prune() · 61572bb1

Authored by Yan, Zheng on Apr 15, 2013

When pruning a dentry, its ancestor dentry can also be pruned. But
the ancestor dentry does not go through dput(), so it does not get
put on the dentry LRU. Hence associating d_prune with removing the
dentry from the LRU is wrong.

The fix is to remove dentry_lru_prune(). Call file system's d_prune()
callback directly when pruning dentries.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Removed unused typedef to avoid "unused local typedef" warnings. · 6b13eb1b

Authored by Han Shen on Apr 12, 2013

Fix warnings about unused local typedefs (reported by gcc 4.8).

Signed-off-by: Han Shen (shenhan@google.com)

Change-Id: I4bccc234f1390daa808d2b309ed112e20c0ac096
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill fs/read_write.h · c0bd14af

Authored by Al Viro on May 04, 2013

fs/compat.c doesn't need it anymore, so let's just move the remaining
contents (two typedefs) into fs/read_write.c
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_coredump(): don't wait for thaw if coredump has already been interrupted · e86d35c3

Authored by Al Viro on May 04, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_mount(): fix a leak introduced in 3.9 ("mount: consolidate permission checks") · 0d5cadb8

Authored by Al Viro on May 04, 2013

Cc: stable@vger.kernel.org
Bisected-by: Michael Leun <lkml20130126@newton.leun.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

02 May, 2013 17 commits

don't bother with deferred freeing of fdtables · ac3e3c5b

Authored by Al Viro on Apr 28, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h · 59d8053f

Authored by David Howells on Apr 11, 2013

Move non-public declarations and definitions from linux/proc_fs.h to
fs/proc/internal.h.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Make the PROC_I() and PDE() macros internal to procfs · c30480b9

Authored by David Howells on Apr 12, 2013

Make the PROC_I() and PDE() macros internal to procfs.  This means making
PDE_DATA() out of line.  This could be made more optimal by storing
PDE()->data into inode->i_private.

Also provide a __PDE_DATA() that is inline and internal to procfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Supply a function to remove a proc entry by PDE · a8ca16ea

Authored by David Howells on Apr 12, 2013

Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer.  This allows us to eliminate all remaining pde->name
accesses outside of procfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Grant Likely <grant.likely@linaro.or>
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

take cgroup_open() and cpuset_open() to fs/proc/base.c · 8d8b97ba

Authored by Al Viro on Apr 19, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() · e42270a1

Authored by David Howells on Apr 12, 2013

Don't access the proc_dir_entry in ReiserFS's r_open(), r_start() r_show()
procfs interface functions.

ReiserFS stores the ->show() method pointer in PDE->data and the super_block
pointer in PDE->parent->data.  This isn't changing.

Currently, ReiserFS passes the PDE pointer into seq_file::private from
r_open() so that r_start() and r_show() can then access it.  Instead, use
seq_open_private() to allocate a two-pointer struct that's passed through
seq_file::private and put the ->show() method and the sb pointers in there.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: reiserfs-devel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
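
The two-pointer private struct described in this commit can be pictured with a userspace analogy. Every type and function name below (`seq_model`, `super_block_model`, `r_private`, `open_model()` and so on) is hypothetical and stands in for the kernel objects only loosely; the real code uses `seq_open_private()` and the seq_file machinery, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for super_block and seq_file. */
struct super_block_model { int id; };
struct seq_model { void *private; int out; };

typedef int (*show_fn)(struct seq_model *m, struct super_block_model *sb);

/* The two pointers that travel together through the private field. */
struct r_private {
	show_fn show;
	struct super_block_model *sb;
};

/* Open: allocate the pair once and hang it off the private pointer. */
static int open_model(struct seq_model *m, show_fn show,
		      struct super_block_model *sb)
{
	struct r_private *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->show = show;
	p->sb = sb;
	m->private = p;
	return 0;
}

/* Show: both pieces of data come from the private struct, nothing else. */
static int show_model(struct seq_model *m)
{
	struct r_private *p = m->private;

	return p->show(m, p->sb);
}

/* Release: free the pair allocated at open time. */
static void release_model(struct seq_model *m)
{
	free(m->private);
	m->private = NULL;
}

/* Example show callback: records the super block id as its "output". */
static int show_id(struct seq_model *m, struct super_block_model *sb)
{
	m->out = sb->id;
	return 0;
}
```

The point of the design is that the later callbacks never look at the directory entry again: everything they need was captured when the file was opened.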

proc: Supply an accessor for getting the data from a PDE's parent · 4a520d27

Authored by David Howells on Apr 12, 2013

Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.

This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required.  The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: Maxim Mikityanskiy <maxtram95@gmail.com>
cc: YAMANE Toshiaki <yamanetoshi@gmail.com>
cc: linux-wireless@vger.kernel.org
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Add proc_mkdir_data() · 270b5ac2

Authored by David Howells on Apr 12, 2013

Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation.  This means no access is then required to the proc_dir_entry
struct to set this.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Neela Syam Kolli <megaraidlinux@lsi.com>
cc: Jerry Chuang <jerry-chuang@realtek.com>
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
cc: linux-wireless@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} · 34db8aaf

Authored by David Howells on Apr 12, 2013

Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h.

Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as
they're internal to procfs.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-arch@vger.kernel.org
cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Move PDE_NET() to fs/proc/proc_net.c · 4abfd029

Authored by David Howells on Apr 12, 2013

Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Split the namespace stuff out into linux/proc_ns.h · 0bb80f24

Authored by David Howells on Apr 12, 2013

Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn <serge.hallyn@ubuntu.com>
cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Move proc_fd() to fs/proc/fd.h · c3bef7bc

Authored by David Howells on Apr 12, 2013

Move proc_fd() to fs/proc/fd.h.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Uninline pid_delete_dentry() · 1dd704b6

Authored by David Howells on Apr 12, 2013

Uninline pid_delete_dentry() as it's only used by three function pointers.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

proc: Supply PDE attribute setting accessor functions · 271a15ea

Authored by David Howells on Apr 12, 2013

Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

xfs: fix da node magic number mismatches · cab09a81

Authored by Dave Chinner on Apr 30, 2013

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: Remote attr validation fixes and optimisations · 946217ba

Authored by Dave Chinner on Apr 30, 2013

- optimise the calculation for the number of blocks in a remote
  xattr.
- check attribute length against MAX_XATTR_SIZE, not MAXPATHLEN
- whitespace fixes
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

jfs: fix a couple races · 73aaa22d

Authored by Dave Kleikamp on May 01, 2013

This patch fixes races uncovered by xfstests testcase 068.

One race is the result of jfs_sync() trying to write a sync point to the
journal after it has been frozen (or possibly in the process). Since
freezing sync's the journal, there is no need to write a sync point so
we simply want to return.

The second involves jfs_write_inode() being called on a deleted inode.
It calls jfs_flush_journal which is held up by the jfs_commit thread
doing the final iput on the same deleted inode, which itself is
waiting for the I_SYNC flag to be cleared. jfs_write_inode need not
do anything when i_nlink is zero, which is the easy fix.
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

01 May, 2013 10 commits

exec: do not abuse ->cred_guard_mutex in threadgroup_lock() · e56fb287

Authored by Oleg Nesterov on Apr 30, 2013

threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable.  This doesn't look nice, the scope of
this lock in do_execve() is huge.

And as Dave pointed out this can lead to deadlock, we have the
following dependencies:

	do_execve:		cred_guard_mutex -> i_mutex
	cgroup_mount:		i_mutex -> cgroup_mutex
	attach_task_by_pid:	cgroup_mutex -> cred_guard_mutex

Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.

Note that de_thread() can't sleep with ->group_rwsem held, this can
obviously deadlock with the exiting leader if the writer is active, so it
does threadgroup_change_end() before schedule().
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
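
The three dependencies listed in this commit form a cycle, which is what makes the deadlock possible. A small userspace sketch can make that concrete by treating each "held -> wanted" pair as a directed edge and searching for a lock that can reach itself. The enum, struct and function names are invented for this illustration and have nothing to do with the kernel's own lock validator.

```c
#include <assert.h>

/* The three locks named in the commit message. */
enum lock { CRED_GUARD_MUTEX, I_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* One ordering edge: `wanted` is taken while `held` is already held. */
struct dep {
	enum lock held, wanted;
};

/* The dependencies quoted in the commit message, in the same order. */
static const struct dep execve_deps[] = {
	{ CRED_GUARD_MUTEX, I_MUTEX },      /* do_execve */
	{ I_MUTEX, CGROUP_MUTEX },          /* cgroup_mount */
	{ CGROUP_MUTEX, CRED_GUARD_MUTEX }, /* attach_task_by_pid */
};

/* Depth-limited search: can `target` be reached from `from` via the edges? */
static int reaches(const struct dep *deps, int n, enum lock from,
		   enum lock target, int depth)
{
	if (depth > NR_LOCKS)
		return 0; /* longer than any simple path, give up */
	for (int i = 0; i < n; i++) {
		if (deps[i].held != from)
			continue;
		if (deps[i].wanted == target)
			return 1;
		if (reaches(deps, n, deps[i].wanted, target, depth + 1))
			return 1;
	}
	return 0;
}

/* The ordering graph can deadlock if some lock can reach itself. */
static int has_cycle(const struct dep *deps, int n)
{
	for (int l = 0; l < NR_LOCKS; l++)
		if (reaches(deps, n, (enum lock)l, (enum lock)l, 0))
			return 1;
	return 0;
}
```

Calling `has_cycle(execve_deps, 3)` reports a cycle, while passing only the first two edges does not. Dropping the third edge is a rough model of the fix, under the assumption that it exists only because threadgroup_lock() used to take ->cred_guard_mutex.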

set_task_comm: kill the pointless memset() + wmb() · 12eaaf30

Authored by Oleg Nesterov on Apr 30, 2013

set_task_comm() does memset() + wmb() before strlcpy().  This buys
nothing and to add to the confusion, the comment is wrong.

- We do not need memset() to be "safe from non-terminating string
  reads", the final char is always zero and we never change it.

- wmb() is paired with nothing, it cannot prevent from printing
  the mixture of the old/new data unless the reader takes the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
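
The first point above, that the final byte stays zero without any memset(), can be illustrated in userspace. This is a sketch under assumptions: `struct task` and `set_comm()` are hypothetical stand-ins for the kernel objects, and `bounded_copy()` is a local helper with strlcpy()-like behaviour, written here because strlcpy() is not available in every C library.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Local helper in the spirit of strlcpy(): copies at most size - 1 bytes
 * and always NUL-terminates. Returns the length of src. */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
	size_t len = strlen(src);

	if (size) {
		size_t n = len < size - 1 ? len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return len;
}

/* Hypothetical stand-in for the comm field of a task. */
struct task {
	char comm[TASK_COMM_LEN];
};

/* No memset() beforehand: the bounded copy never writes a non-zero byte
 * at index TASK_COMM_LEN - 1, so that byte stays zero across updates. */
static void set_comm(struct task *t, const char *name)
{
	bounded_copy(t->comm, name, sizeof(t->comm));
}
```

Starting from a properly terminated buffer, an over-long name is cut to 15 bytes plus the terminator, and a short name leaves stale bytes after its own terminator but still leaves the last byte zero. The sketch says nothing about the second point (concurrent readers seeing a mixture of old and new data), which is a locking question.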

fs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes · 830e0fc9

Authored by David Rientjes on Apr 30, 2013

Currently, a write to a procfs file will return the number of bytes
successfully written.  If the actual string is longer than this, the
remainder of the string will not be written and userspace will
complete the operation by issuing additional write()s.

Hence

	$ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

results in

	$ cat /proc/$$/comm
	pqrs

since the final four bytes were written with a second write() since
TASK_COMM_LEN == 16.  This is obviously an undesired result and not
equivalent to prctl(PR_SET_NAME).  The implementation should not need to
know the definition of TASK_COMM_LEN.

This patch truncates the string to the first TASK_COMM_LEN bytes and
returns the bytes written as the length of the string written so the
second write() is suppressed.

	$ cat /proc/$$/comm
	abcdefghijklmno
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
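
The before and after behaviour described in this commit can be reproduced with a userspace model. This is only a sketch of the write-return-value logic: `comm_write_model()` and `write_all()` are hypothetical names, the global `comm` buffer stands in for the task's name, and nothing here is taken from the kernel's procfs handler.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Stand-in for the task name being written through /proc/<pid>/comm. */
static char comm[TASK_COMM_LEN];

/* Model of the write handler: keeps at most TASK_COMM_LEN - 1 bytes.
 * report_full == 0: return the bytes kept (old behaviour, short write).
 * report_full != 0: return the full count (new behaviour, rest dropped). */
static size_t comm_write_model(const char *buf, size_t count, int report_full)
{
	size_t n = count < TASK_COMM_LEN - 1 ? count : TASK_COMM_LEN - 1;

	memset(comm, 0, sizeof(comm));
	memcpy(comm, buf, n);
	return report_full ? count : n;
}

/* Model of a userspace writer that retries after a short write. */
static void write_all(const char *s, int report_full)
{
	size_t len = strlen(s), done = 0;

	while (done < len)
		done += comm_write_model(s + done, len - done, report_full);
}
```

With the old return value, writing the 19-byte string from the example takes two calls and the second one overwrites the name with the leftover `pqrs`. With the new return value the writer sees the whole string as consumed, so only one call happens and the name is the first 15 bytes.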

coredump: change wait_for_dump_helpers() to use wait_event_interruptible() · dc7ee2aa

Authored by Oleg Nesterov on Apr 30, 2013

wait_for_dump_helpers() calls wake_up/kill_fasync from inside the
wait_event-like loop.  This is not needed and in fact this is not
strictly correct, we can/should do this only once after we change
pipe->writers.  We could even check if it becomes zero.

Change this code to use wait_event_interruptible(), this can also
help to make this wait freezable.

With this patch we check pipe->readers without pipe_lock(), this is
fine.  Once we see pipe->readers == 1 we know that the handler
decremented the counter, this is all we need.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: factor out the setting of PF_DUMPCORE · 079148b9

Authored by Oleg Nesterov on Apr 30, 2013

Cleanup.  Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into
zap_threads() called by do_coredump().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: introduce dump_interrupted() · 528f827e

Authored by Oleg Nesterov on Apr 30, 2013

By discussion with Mandeep.

Change dump_write(), dump_seek() and do_coredump() to check
signal_pending() and abort if it is true.  dump_seek() does this only
before f_op->llseek(), otherwise it relies on dump_write().

We need this change to ensure that the coredump won't delay suspend, and
to ensure it reacts to SIGKILL "quickly enough", a core dump can take a
lot of time.  In particular this can help oom-killer.

We add the new trivial helper, dump_interrupted() to add the comments and
to simplify the potential freezer changes.  Perhaps it will have more
callers.

Ideally it should do try_to_freeze() but then we need the unpleasant
changes in dump_write() and wait_for_dump_helpers().  It is not trivial to
change dump_write() to restart if f_op->write() fails because of
freezing().  We need to handle the short writes, we need to clear
TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change
it to check PF_DUMPCORE).  And if the buggy f_op->write() sets
TIF_SIGPENDING we can not distinguish this case from the race with
freeze_task() + __thaw_task().

So we simply accept the fact that the freezer can truncate a core-dump but
at least you can reliably suspend.  Hopefully we can tolerate this
unlikely case and the necessary complications aren't worth the trouble.
But if we decide to make the coredumping freezable later we can do this on
top of this change.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: sanitize the setting of signal->group_exit_code · acdedd99

Authored by Oleg Nesterov on Apr 30, 2013

Now that the coredumping process can be SIGKILL'ed, the setting of
->group_exit_code in do_coredump() can race with complete_signal() and
SIGKILL or 0x80 can be "lost", or wait(status) can report status ==
SIGKILL | 0x80.

But the main problem is that it is not clear to me what should we do if
binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch
comes as a separate change.

This patch adds 0x80 if ->core_dump() succeeds and the process was not
killed.  But perhaps we can (should?) re-set ->group_exit_code changed by
SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
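
To see what the 0x80 bit means from the parent's side, a userspace sketch can decode a wait status with the standard macros. `struct death` and `decode_status()` are hypothetical names for this example. WCOREDUMP() is a common extension rather than part of POSIX, and building raw status values by hand as `signo | 0x80` assumes the Linux encoding.

```c
#define _DEFAULT_SOURCE /* for WCOREDUMP() on glibc */
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/* How a child died, as seen through wait(). */
struct death {
	int signo;       /* terminating signal, 0 if not killed by a signal */
	int core_dumped; /* 1 if the core-dump flag (0x80 on Linux) was set */
};

/* Decode a status value of the kind returned by wait()/waitpid(). */
static struct death decode_status(int status)
{
	struct death d = { 0, 0 };

	if (WIFSIGNALED(status)) {
		d.signo = WTERMSIG(status);
		d.core_dumped = WCOREDUMP(status) ? 1 : 0;
	}
	return d;
}
```

On Linux, `decode_status(SIGSEGV | 0x80)` reports SIGSEGV with a core dump, and `decode_status(SIGKILL)` reports SIGKILL without one. The combination `SIGKILL | 0x80` decodes as "killed by SIGKILL, core dumped", which is the odd-looking result that the race described above could produce.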

coredump: ensure that SIGKILL always kills the dumping thread · 6cd8f0ac

Authored by Oleg Nesterov on Apr 30, 2013

prepare_signal() blesses SIGKILL sent to the dumping process but this
signal can be "lost" anyway.  The problem is, complete_signal() sees
SIGNAL_GROUP_EXIT and skips the "kill them all" logic.  And even if the
dumping process is single-threaded (so the target is always "correct"),
the group-wide SIGKILL is not recorded in task->pending and thus
__fatal_signal_pending() won't be true.  A multi-threaded case has even
more problems.

And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look
right to me.  This coredumping process is not exiting yet, it can do a lot
of work dumping the core.

With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set
signal->group_exit_task instead.  This makes signal_group_exit() true and
thus this should equally close the races with exit/exec/stop but allows to
kill the dumping thread reliably.

Notes:
	- It is not clear what should we do with ->group_exit_code
	  if the dumper was killed, see the next change.

	- we need more (hopefully straightforward) changes to ensure
	  that SIGKILL actually interrupts the coredump. Basically we
	  need to check __fatal_signal_pending() in dump_write() and
	  dump_seek().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

coredump: only SIGKILL should interrupt the coredumping task · 403bad72

Authored by Oleg Nesterov on Apr 30, 2013

There are 2 well known and ancient problems with coredump/signals, and a
lot of related bug reports:

- do_coredump() clears TIF_SIGPENDING but of course this can't help
  if, say, SIGCHLD comes after that.

  In this case the coredump can fail unexpectedly. See for example
  wait_for_dump_helper()->signal_pending() check but there are other
  reasons.

- At the same time, dumping a huge core on the slow media can take a
  lot of time/resources and there is no way to kill the coredumping
  task reliably. In particular this is not oom_kill-friendly.

This patch tries to fix the 1st problem, and makes the preparation for the
next changes.

We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate
that this process dumps the core.  prepare_signal() checks this flag and
nacks any signal except SIGKILL.

Note that this check tries to be conservative, in the long term we should
probably treat the SIGNAL_GROUP_EXIT case equally but this needs more
discussion.  See marc.info/?l=linux-kernel&m=120508897917439

Notes:
	- recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP.
	  The patch assumes that dump_write/etc paths should never
	  call it, but we can change it as well.

	- There is another source of TIF_SIGPENDING, freezer. This
	  will be addressed separately.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

usermodehelper: split remaining calls to call_usermodehelper_fns() · 907ed132

Authored by Lucas De Marchi on Apr 30, 2013

These are the only users of call_usermodehelper_fns().  This function
suffers from not being able to determine if the cleanup is called.  Even
if in these places the cleanup pointer is NULL, convert them to use the
separate call_usermodehelper_setup() + call_usermodehelper_exec()
functions so we can remove the _fns variant.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>