- 01 5月, 2013 40 次提交
-
-
由 Oleg Nesterov 提交于
threadgroup_lock() takes signal->cred_guard_mutex to ensure that thread_group_leader() is stable. This doesn't look nice, the scope of this lock in do_execve() is huge. And as Dave pointed out this can lead to deadlock, we have the following dependencies: do_execve: cred_guard_mutex -> i_mutex cgroup_mount: i_mutex -> cgroup_mutex attach_task_by_pid: cgroup_mutex -> cred_guard_mutex Change de_thread() to take threadgroup_change_begin() around the switch-the-leader code and change threadgroup_lock() to avoid ->cred_guard_mutex. Note that de_thread() can't sleep with ->group_rwsem held, this can obviously deadlock with the exiting leader if the writer is active, so it does threadgroup_change_end() before schedule(). Reported-by: NDave Jones <davej@redhat.com> Acked-by: NTejun Heo <tj@kernel.org> Acked-by: NLi Zefan <lizefan@huawei.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
set_task_comm() does memset() + wmb() before strlcpy(). This buys nothing and to add to the confusion, the comment is wrong. - We do not need memset() to be "safe from non-terminating string reads", the final char is always zero and we never change it. - wmb() is paired with nothing, it cannot prevent from printing the mixture of the old/new data unless the reader takes the lock. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: John Stultz <johnstul@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Currently, a write to a procfs file will return the number of bytes successfully written. If the actual string is longer than this, the remainder of the string will not be be written and userspace will complete the operation by issuing additional write()s. Hence $ echo -n "abcdefghijklmnopqrs" > /proc/self/comm results in $ cat /proc/$$/comm pqrs since the final four bytes were written with a second write() since TASK_COMM_LEN == 16. This is obviously an undesired result and not equivalent to prctl(PR_SET_NAME). The implementation should not need to know the definition of TASK_COMM_LEN. This patch truncates the string to the first TASK_COMM_LEN bytes and returns the bytes written as the length of the string written so the second write() is suppressed. $ cat /proc/$$/comm abcdefghijklmno Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NJohn Stultz <john.stultz@linaro.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
wait_for_dump_helpers() calls wake_up/kill_fasync from inside the wait_event-like loop. This is not needed and in fact this is not strictly correct, we can/should do this only once after we change pipe->writers. We could even check if it becomes zero. Change this code to use use wait_event_interruptible(), this can also help to make this wait freezable. With this patch we check pipe->readers without pipe_lock(), this is fine. Once we see pipe->readers == 1 we know that the handler decremented the counter, this is all we need. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMandeep Singh Baines <msb@chromium.org> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Cleanup. Every linux_binfmt->core_dump() sets PF_DUMPCORE, move this into zap_threads() called by do_coredump(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMandeep Singh Baines <msb@chromium.org> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
By discussion with Mandeep. Change dump_write(), dump_seek() and do_coredump() to check signal_pending() and abort if it is true. dump_seek() does this only before f_op->llseek(), otherwise it relies on dump_write(). We need this change to ensure that the coredump won't delay suspend, and to ensure it reacts to SIGKILL "quickly enough", a core dump can take a lot of time. In particular this can help oom-killer. We add the new trivial helper, dump_interrupted() to add the comments and to simplify the potential freezer changes. Perhaps it will have more callers. Ideally it should do try_to_freeze() but then we need the unpleasant changes in dump_write() and wait_for_dump_helpers(). It is not trivial to change dump_write() to restart if f_op->write() fails because of freezing(). We need to handle the short writes, we need to clear TIF_SIGPENDING (and we can't rely on recalc_sigpending() unless we change it to check PF_DUMPCORE). And if the buggy f_op->write() sets TIF_SIGPENDING we can not distinguish this case from the race with freeze_task() + __thaw_task(). So we simply accept the fact that the freezer can truncate a core-dump but at least you can reliably suspend. Hopefully we can tolerate this unlikely case and the necessary complications doesn't worth a trouble. But if we decide to make the coredumping freezable later we can do this on top of this change. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NMandeep Singh Baines <msb@chromium.org> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Now that the coredumping process can be SIGKILL'ed, the setting of ->group_exit_code in do_coredump() can race with complete_signal() and SIGKILL or 0x80 can be "lost", or wait(status) can report status == SIGKILL | 0x80. But the main problem is that it is not clear to me what should we do if binfmt->core_dump() succeeds but SIGKILL was sent, that is why this patch comes as a separate change. This patch adds 0x80 if ->core_dump() succeeds and the process was not killed. But perhaps we can (should?) re-set ->group_exit_code changed by SIGKILL back to "siginfo->si_signo |= 0x80" in case when core_dumped == T. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NMandeep Singh Baines <msb@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
prepare_signal() blesses SIGKILL sent to the dumping process but this signal can be "lost" anyway. The problems is, complete_signal() sees SIGNAL_GROUP_EXIT and skips the "kill them all" logic. And even if the dumping process is single-threaded (so the target is always "correct"), the group-wide SIGKILL is not recorded in task->pending and thus __fatal_signal_pending() won't be true. A multi-threaded case has even more problems. And even ignoring all technical details, SIGNAL_GROUP_EXIT doesn't look right to me. This coredumping process is not exiting yet, it can do a lot of work dumping the core. With this patch the dumping process doesn't have SIGNAL_GROUP_EXIT, we set signal->group_exit_task instead. This makes signal_group_exit() true and thus this should equally close the races with exit/exec/stop but allows to kill the dumping thread reliably. Notes: - It is not clear what should we do with ->group_exit_code if the dumper was killed, see the next change. - we need more (hopefully straightforward) changes to ensure that SIGKILL actually interrupts the coredump. Basically we need to check __fatal_signal_pending() in dump_write() and dump_seek(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NMandeep Singh Baines <msb@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
There are 2 well known and ancient problems with coredump/signals, and a lot of related bug reports: - do_coredump() clears TIF_SIGPENDING but of course this can't help if, say, SIGCHLD comes after that. In this case the coredump can fail unexpectedly. See for example wait_for_dump_helper()->signal_pending() check but there are other reasons. - At the same time, dumping a huge core on the slow media can take a lot of time/resources and there is no way to kill the coredumping task reliably. In particular this is not oom_kill-friendly. This patch tries to fix the 1st problem, and makes the preparation for the next changes. We add the new SIGNAL_GROUP_COREDUMP flag set by zap_threads() to indicate that this process dumps the core. prepare_signal() checks this flag and nacks any signal except SIGKILL. Note that this check tries to be conservative, in the long term we should probably treat the SIGNAL_GROUP_EXIT case equally but this needs more discussion. See marc.info/?l=linux-kernel&m=120508897917439 Notes: - recalc_sigpending() doesn't check SIGNAL_GROUP_COREDUMP. The patch assumes that dump_write/etc paths should never call it, but we can change it as well. - There is another source of TIF_SIGPENDING, freezer. This will be addressed separately. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Tested-by: NMandeep Singh Baines <msb@chromium.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
This function suffers from not being able to determine if the cleanup is called in case it returns -ENOMEM. Nobody is using it anymore, so let's remove it. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
These are the only users of call_usermodehelper_fns(). This function suffers from not being able to determine if the cleanup is called. Even if in this places the cleanup pointer is NULL, convert them to use the separate call_usermodehelper_setup() + call_usermodehelper_exec() functions so we can remove the _fns variant. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of calling call_usermodehelper_fns(). In case there's an OOM in this last function the cleanup function may not be called - in this case we would miss a call to key_put(). Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of calling call_usermodehelper_fns(). In case the latter returns -ENOMEM the cleanup function may had not been called - in this case we would not free argv and module_name. Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Cc: Oleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lucas De Marchi 提交于
call_usermodehelper_setup() + call_usermodehelper_exec() need to be called instead of call_usermodehelper_fns() when the cleanup function needs to be called even when an ENOMEM error occurs. In this case using call_usermodehelper_fns() the user can't distinguish if the cleanup function was called or not. [akpm@linux-foundation.org: export call_usermodehelper_setup() to modules] Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Vagin 提交于
* Dump signals from process-wide and per-thread queues with different sizes of buffers. * Check error paths for buffers with restricted permissions. A part of buffer or a whole buffer is for read-only. * Try to get nonexistent signal. Signed-off-by: NAndrew Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Alves <palves@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Vagin 提交于
This patch adds a new ptrace request PTRACE_PEEKSIGINFO. This request is used to retrieve information about pending signals starting with the specified sequence number. Siginfo_t structures are copied from the child into the buffer starting at "data". The argument "addr" is a pointer to struct ptrace_peeksiginfo_args. struct ptrace_peeksiginfo_args { u64 off; /* from which siginfo to start */ u32 flags; s32 nr; /* how may siginfos to take */ }; "nr" has type "s32", because ptrace() returns "long", which has 32 bits on i386 and a negative values is used for errors. Currently here is only one flag PTRACE_PEEKSIGINFO_SHARED for dumping signals from process-wide queue. If this flag is not set, signals are read from a per-thread queue. The request PTRACE_PEEKSIGINFO returns a number of dumped signals. If a signal with the specified sequence number doesn't exist, ptrace returns zero. The request returns an error, if no signal has been dumped. Errors: EINVAL - one or more specified flags are not supported or nr is negative EFAULT - buf or addr is outside your accessible address space. A result siginfo contains a kernel part of si_code which usually striped, but it's required for queuing the same siginfo back during restore of pending signals. This functionality is required for checkpointing pending signals. Pedro Alves suggested using it in "gdb" to peek at pending signals. gdb already uses PTRACE_GETSIGINFO to get the siginfo for the signal which was already dequeued. This functionality allows gdb to look at the pending signals which were not reported yet. The prototype of this code was developed by Oleg Nesterov. Signed-off-by: NAndrew Vagin <avagin@openvz.org> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pedro Alves <palves@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vyacheslav Dubeyko 提交于
Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Khoroshilov 提交于
__hfsplus_ext_write_extent() suppresses errors coming from hfs_brec_find(). The patch implements error code propagation. Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Use a more current logging style. Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt hfsplus now uses "hfsplus: " for all messages. Coalesce formats. Prefix debugging messages too. Signed-off-by: NJoe Perches <joe@perches.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
Use a more current logging style. Rename macro and uses. Add do {} while (0) to macro. Add DBG_ to macro. Add and use hfs_dbg_cont variant where appropriate. Signed-off-by: NJoe Perches <joe@perches.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vyacheslav Dubeyko 提交于
fs/hfsplus/bfind.c: In function 'hfs_find_1st_rec_by_cnid': (1) include/uapi/linux/swab.h:60:2: warning: 'search_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized] (2) include/uapi/linux/swab.h:60:2: warning: 'cur_cnid' may be used uninitialized in this function [-Wmaybe-uninitialized] [akpm@linux-foundation.org: make the workaround more explicit] Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Khoroshilov 提交于
hfs_find_init() may fail with ENOMEM, but there are places, where the returned value is not checked. The consequences can be very unpleasant, e.g. kfree uninitialized pointer and inappropriate mutex unlocking. The patch adds checks for errors in hfs_find_init(). Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vyacheslav Dubeyko 提交于
page->mapping->host cannot be NULL in nilfs_writepage(), so remove the unneeded test. The fixes the smatch warning: "fs/nilfs2/inode.c:211 nilfs_writepage() error: we previously assumed 'inode' could be null (see line 195)". Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vyacheslav Dubeyko 提交于
Change test_bit(PG_locked, &page->flags) to PageLocked(). Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vyacheslav Dubeyko 提交于
nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption The NILFS2 driver remounts itself in RO mode in the case of discovering metadata corruption (for example, discovering a broken bmap). But usually, this takes place when there have been file system operations before remounting in RO mode. Thereby, NILFS2 driver can be in RO mode with presence of dirty pages in modified inodes' address spaces. It results in flush kernel thread's infinite trying to flush dirty pages in RO mode. As a result, it is possible to see such side effects as: (1) flush kernel thread occupies 50% - 99% of CPU time; (2) system can't be shutdowned without manual power switch off. SYMPTOMS: (1) System log contains error message: "Remounting filesystem read-only". (2) The flush kernel thread occupies 50% - 99% of CPU time. (3) The system can't be shutdowned without manual power switch off. REPRODUCTION PATH: (1) Create volume group with name "unencrypted" by means of vgcreate utility. (2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>): ----------------[BEGIN SCRIPT]-------------------- #!/bin/bash VG=unencrypted #apt-get install nilfs-tools darcs lvcreate --size 2G --name ntest $VG mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest mkdir /var/tmp/n mkdir /var/tmp/n/ntest mount /dev/mapper/$VG-ntest /var/tmp/n/ntest mkdir /var/tmp/n/ntest/thedir cd /var/tmp/n/ntest/thedir sleep 2 date darcs init sleep 2 dmesg|tail -n 5 date darcs whatsnew || true date sleep 2 dmesg|tail -n 5 ----------------[END SCRIPT]-------------------- (3) Try to shutdown the system. REPRODUCIBILITY: 100% FIX: This patch implements checking mount state of NILFS2 driver in nilfs_writepage(), nilfs_writepages() and nilfs_mdt_write_page() methods. If it is detected the RO mount state then all dirty pages are simply discarded with warning messages is written in system log. [akpm@linux-foundation.org: fix printk warning] Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com> Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Anthony Doggett <Anthony2486@interfaces.org.uk> Cc: ARAI Shun-ichi <hermes@ceres.dti.ne.jp> Cc: Piotr Szymaniak <szarpaj@grubelek.pl> Cc: Zahid Chowdhury <zahid.chowdhury@starsolutions.com> Cc: Elmer Zhang <freeboy6716@gmail.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Carpenter 提交于
Limit the size of the copy so we don't corrupt memory. Hopefully this can only be called by root, but fixing this makes the static checkers happier. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Masanari Iida <standby24x7@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ben Hutchings 提交于
Move the calls to memcpy_fromio() up into the loop in dmi_scan_machine(), and move the signature checks back down into dmi_decode(). We need to check at 16-byte intervals but keep a 32-byte buffer for an SMBIOS entry, so shift the buffer after each iteration. Merge smbios_present() into dmi_present(), so we look for an SMBIOS signature at the beginning of the given buffer and then for a DMI signature at an offset of 16 bytes. [artem.savkov@gmail.com: use proper buf type in dmi_present()] Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Reported-by: NTim McGrath <tmhikaru@gmail.com> Tested-by: NTim Mcgrath <tmhikaru@gmail.com> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: NArtem Savkov <artem.savkov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jiri Kosina 提交于
The comment I originally added in commit a3defbe5 ("binfmt_elf: fix PIE execution with randomization disabled") is not really 100% accurate -- sysctl is not the only way how PF_RANDOMIZE could be forcibly unset in runtime. Another option of course is direct modification of personality flags (i.e. running through setarch wrapper). Make the comment more explicit and accurate. Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Josh Triplett 提交于
Add a new configuration option CONFIG_BINFMT_SCRIPT to configure support for interpreted scripts starting with "#!"; allow compiling out that support, or building it as a module. Embedded systems running exclusively compiled binaries could leave this support out, and systems that don't need scripts before mounting the root filesystem can build this as a module. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Wong 提交于
It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results in slightly smaller/faster code. Signed-off-by: NEric Wong <normalperson@yhbt.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Wong 提交于
This reduces the amount of code inside the ready list iteration loops for better readability IMHO. Signed-off-by: NEric Wong <normalperson@yhbt.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Wong 提交于
Technically we do not need to hold ep->mtx during ep_free since we are certain there are no other users of ep at that point. However, lockdep complains with a "suspicious rcu_dereference_check() usage!" message; so lock the mutex before ep_remove to silence the warning. Signed-off-by: NEric Wong <normalperson@yhbt.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arve Hjønnevåg <arve@android.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: NeilBrown <neilb@suse.de>, Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Wong 提交于
This prevents wakeup_source destruction when a user hits the item with EPOLL_CTL_MOD while ep_poll_callback is running. Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2" Signed-off-by: NEric Wong <normalperson@yhbt.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Arve Hjønnevåg <arve@android.com> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: NeilBrown <neilb@suse.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Wong 提交于
It is common for epoll users to have thousands of epitems, so saving a cache line on every allocation leads to large memory savings. Since epitem allocations are cache-aligned, reducing sizeof(struct epitem) from 136 bytes to 128 bytes will allow it to squeeze under a cache line boundary on x86_64. Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my x86_64 Core2 Duo (which has 64-byte cache alignment): object_size : 192 => 128 objs_per_slab: 21 => 32 Also, add a BUILD_BUG_ON() to check for future accidental breakage. [akpm@linux-foundation.org: use __packed, for all architectures] Signed-off-by: NEric Wong <normalperson@yhbt.net> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Rothwell 提交于
Andrew Morton noted: akpm3:/usr/src/25> grep SYSCALL kernel/timer.c SYSCALL_DEFINE1(alarm, unsigned int, seconds) SYSCALL_DEFINE0(getpid) SYSCALL_DEFINE0(getppid) SYSCALL_DEFINE0(getuid) SYSCALL_DEFINE0(geteuid) SYSCALL_DEFINE0(getgid) SYSCALL_DEFINE0(getegid) SYSCALL_DEFINE0(gettid) SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info) COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info) Only one of those should be in kernel/timer.c. Who wrote this thing? [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Rothwell 提交于
The only use outside of kernel/timer.c was in kernel/compat.c, so move compat_sys_sysinfo() next to sys_sysinfo() in kernel/timer.c. Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
There is string_unescape_inplace() function which decodes strings in generic way. Let's use it. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
There is kernel function to do the job in generic way. Let's use it. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-