- 07 5月, 2012 3 次提交
-
-
由 Srikar Dronamraju 提交于
Implements trace_event support for uprobes. In its current form it can be used to put probes at a specified offset in a file and dump the required registers when the code flow reaches the probed address. The following example shows how to dump the instruction pointer and %ax a register at the probed text address. Here we are trying to probe zfree in /bin/zsh: # cd /sys/kernel/debug/tracing/ # cat /proc/`pgrep zsh`/maps | grep /bin/zsh | grep r-xp 00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh # objdump -T /bin/zsh | grep -w zfree 0000000000446420 g DF .text 0000000000000012 Base zfree # echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events # cat uprobe_events p:uprobes/p_zsh_0x46420 /bin/zsh:0x0000000000046420 # echo 1 > events/uprobes/enable # sleep 20 # echo 0 > events/uprobes/enable # cat trace # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | zsh-24842 [006] 258544.995456: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79 zsh-24842 [007] 258545.000270: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79 zsh-24842 [002] 258545.043929: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79 zsh-24842 [004] 258547.046129: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79 Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120411103043.GB29437@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Move parts of trace_kprobe.c that can be shared with upcoming trace_uprobe.c. Common code to kernel/trace/trace_probe.h and kernel/trace/trace_probe.c. There are no functional changes. Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120409091144.8343.76218.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
is_delete and is_return can take utmost 2 values and are better of being a boolean than a int. There are no functional changes. Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120409091133.8343.65289.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 4月, 2012 2 次提交
-
-
由 Srikar Dronamraju 提交于
Uprobes has a callback (uprobe_munmap()) in the unmap path to maintain the uprobes count. In the exit path this callback gets called in unlink_file_vma(). However by the time unlink_file_vma() is called, the pages would have been unmapped (in unmap_vmas()) and the task->rss_stat counts accounted (in zap_pte_range()). If the exiting process has probepoints, uprobe_munmap() checks if the breakpoint instruction was around before decrementing the probe count. This results in a file backed page being reread by uprobe_munmap() and hence it does not find the breakpoint. This patch fixes this problem by moving the callback to unmap_single_vma(). Since unmap_single_vma() may not unmap the complete vma, add start and end parameters to uprobe_munmap(). This bug became apparent courtesy of commit c3f0327f ("mm: add rss counters consistency check"). Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120411103527.23245.9835.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Background page replacement logic adds a new anonymous page instead of a file backed (while inserting a breakpoint) / anonymous page (while removing a breakpoint). Hence the uprobes logic should take care to update the task->ss_stat counters accordingly. This bug became apparent courtesy of commit c3f0327f ("mm: add rss counters consistency check"). Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120411103516.23245.2700.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 4月, 2012 1 次提交
-
-
由 Stephen Boyd 提交于
Many users of debugfs copy the implementation of default_open() when they want to support a custom read/write function op. This leads to a proliferation of the default_open() implementation across the entire tree. Now that the common implementation has been consolidated into libfs we can replace all the users of this function with simple_open(). This replacement was done with the following semantic patch: <smpl> @ open @ identifier open_f != simple_open; identifier i, f; @@ -int open_f(struct inode *i, struct file *f) -{ ( -if (i->i_private) -f->private_data = i->i_private; | -f->private_data = i->i_private; ) -return 0; -} @ has_open depends on open @ identifier fops; identifier open.open_f; @@ struct file_operations fops = { ... -.open = open_f, +.open = simple_open, ... }; </smpl> [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 4月, 2012 1 次提交
-
-
由 Kees Cook 提交于
Commit bfdc0b49 adds code to restrict access to dmesg_restrict, however, it incorrectly alters kptr_restrict rather than dmesg_restrict. The original patch from Richard Weinberger (https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as expected, and so the patch seems to have been misapplied. This adds the CAP_SYS_ADMIN check to both dmesg_restrict and kptr_restrict, since both are sensitive. Reported-by: NPhillip Lougher <plougher@redhat.com> Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Acked-by: NRichard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 02 4月, 2012 1 次提交
-
-
由 Paul Gortmaker 提交于
Builds of the MIPS platform ip32_defconfig fails as of commit 0195c002 ("Merge tag 'split-asm_system_h ...") because MIPS xchg() macro uses BUILD_BUG_ON and it was moved in commit b81947c6 ("Disintegrate asm/system.h for MIPS"). The root cause is that the system.h split wasn't tested on a baseline with commit 6c03438e ("kernel.h: doesn't explicitly use bug.h, so don't include it.") Since this file uses BUG code in several other places besides the xchg call, simply make the inclusion explicit. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 3月, 2012 4 次提交
-
-
由 Srikar Dronamraju 提交于
Maintain a per-mm counter: number of uprobes that are inserted on this process address space. This counter can be used at probe hit time to determine if we need a lookup in the uprobes rbtree. Everytime a probe gets inserted successfully, the probe count is incremented and everytime a probe gets removed, the probe count is decremented. The new uprobe_munmap hook ensures the count is correct on a unmap or remap of a region. We expect that once a uprobe_munmap() is called, the vma goes away. So uprobe_unregister() finding a probe to unregister would either mean unmap event hasnt occurred yet or a mmap event on the same executable file occured after a unmap event. Additionally, uprobe_mmap hook now also gets called: a. on every executable vma that is COWed at fork. b. a vma of interest is newly mapped; breakpoint insertion also happens at the required address. On process creation, make sure the probes count in the child is set correctly. Special cases that are taken care include: a. mremap b. VM_DONTCOPY vmas on fork() c. insertion/removal races in the parent during fork(). Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120330182646.10018.85805.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srikar Dronamraju 提交于
Uprobes executes the original instruction at a probed location out of line. For this, we allocate a page (per mm) upon the first uprobe hit, in the process user address space, divide it into slots that are used to store the actual instructions to be singlestepped. These slots are known as xol (execution out of line) slots. Care is taken to ensure that the allocation is in an unmapped area as close to the top of the user address space as possible, with appropriate permission settings to keep selinux like frameworks happy. Upon a uprobe hit, a free slot is acquired, and is released after the singlestep completes. Lots of improvements courtesy suggestions/inputs from Peter and Oleg. [ Folded a fix for build issue on powerpc fixed and reported by Stephen Rothwell. ] Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Anton Arapov <anton@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120330182631.10018.48175.sendpatchset@srdronam.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Srivatsa S. Bhat 提交于
The function for_each_cpu_mask() expects a *pointer* to struct cpumask as its second argument, whereas select_fallback_rq() passes the value itself. And moreover, for_each_cpu_mask() has been marked as obselete in include/linux/cpumask.h. So move to the more appropriate for_each_cpu() variant. Reported-by: NSasha Levin <levinsasha928@gmail.com> Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Dave Jones <davej@redhat.com> Cc: Liu Chuansheng <chuansheng.liu@intel.com> Cc: vapier@gentoo.org Cc: rusty@rustcorp.com.au Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiang Liu 提交于
irq_move_masked_irq() checks the return code of chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is also a valid return code, which is there to avoid a redundant copy of the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid the redundant copy, we also fail to adjust the thread affinity of an eventually threaded interrupt handler. Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return values correctly by checking the valid return values seperately. Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Keping Chen <chenkeping@huawei.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 30 3月, 2012 4 次提交
-
-
由 Tejun Heo 提交于
61d1d219 "cgroup: remove extra calls to find_existing_css_set" made cgroup_task_migrate() return void. An unfortunate side effect was that cgroup_attach_task() was depending on that function's return value to clear its @retval on the success path. On cgroup mounts without any subsystem with ->can_attach() callback, cgroup_attach_task() ended up returning @retval without initializing it on success. For some reason, gcc failed to warn about it and it didn't cause cgroup_attach_task() to return non-zero value in many cases, probably due to difference in register allocation. When the problem materializes, systemd fails to populate /systemd cgroup mount and fails to boot. Fix it by initializing @retval to zero on declaration. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NJiri Kosina <jkosina@suse.cz> LKML-Reference: <alpine.LNX.2.00.1203282354440.25526@pobox.suse.cz> Reviewed-by: NMandeep Singh Baines <msb@chromium.org> Acked-by: NLi Zefan <lizefan@huawei.com>
-
由 Jason Wessel 提交于
There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Cc: stable@vger.kernel.org # >= 2.6.36 Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
The Smatch tool warned that the change from commit b8adde8d (kdb: Avoid using dbg_io_ops until it is initialized) should add another null check later in the kdb_printf(). It is worth noting that the second use of dbg_io_ops->is_console is protected by the KDB_PAGER state variable which would only get set when kdb is fully active and initialized. If we ever encounter changes or defects in the KDB_PAGER state we do not want to crash the kernel in a kdb_printf/printk. CC: Tim Bird <tim.bird@am.sony.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Grant Likely 提交于
The debugfs code is really generic for all platforms. This patch removes the powerpc-specific directory reference and makes it available to all architectures. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 29 3月, 2012 24 次提交
-
-
由 Steffen Klassert 提交于
We don't remove the cpu that went offline from our cpumasks on cpu hotplug. This got lost somewhere along the line, so restore it. This fixes a hang of the padata instance on cpu hotplug. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Steffen Klassert 提交于
We use the active cpumask to determine the superset of cpus to use for parallelization. However, the active cpumask is for internal usage of the scheduler and therefore not the appropriate cpumask for these purposes. So use the online cpumask instead. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Steffen Klassert 提交于
Add a reference to the padata api documentation at Documentation/padata.txt Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Kees Cook 提交于
Notify get_robust_list users that the syscall is going away. Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: kernel-hardening@lists.openwall.com Cc: spender@grsecurity.net Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Kees Cook 提交于
It was possible to extract the robust list head address from a setuid process if it had used set_robust_list(), allowing an ASLR info leak. This changes the permission checks to be the same as those used for similar info that comes out of /proc. Running a setuid program that uses robust futexes would have had: cred->euid != pcred->euid cred->euid == pcred->uid so the old permissions check would allow it. I'm not aware of any setuid programs that use robust futexes, so this is just a preventative measure. (This patch is based on changes from grsecurity.) Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David Howells <dhowells@redhat.com> Cc: Serge E. Hallyn <serge.hallyn@canonical.com> Cc: kernel-hardening@lists.openwall.com Cc: spender@grsecurity.net Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Prarit Bhargava 提交于
We respect node affinity of devices already in the irq descriptor allocation, but we ignore it for the initial interrupt affinity setup, so the interrupt might be routed to a different node. Restrict the default affinity mask to the node on which the irq descriptor is allocated. [ tglx: Massaged changelog ] Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/1332788538-17425-1-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Alexander Gordeev 提交于
The only place irq_finalize_oneshot() is called with force parameter set is the threaded handler error exit path. But IRQTF_RUNTHREAD is dropped at this point and irq_wake_thread() is not going to set it again, since PF_EXITING is set for this thread already. So irq_finalize_oneshot() will drop the threads bit in threads_oneshot anyway and hence the force parameter is superfluous. Signed-off-by: NAlexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120321162234.GP24806@dhcp-26-207.brq.redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Alexander Gordeev 提交于
exit_irq_thread() clears IRQTF_RUNTHREAD flag and drops the thread's bit in desc->threads_oneshot then. The bit must not be set again in between and it does not, since irq_wake_thread() sees PF_EXITING flag first and returns. Due to above the order or checking PF_EXITING and IRQTF_RUNTHREAD flags in irq_wake_thread() is important. This change just makes it more visible in the source code. Signed-off-by: NAlexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120321162212.GO24806@dhcp-26-207.brq.redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Stephen Boyd 提交于
If schedule is called from an interrupt handler __schedule_bug() will call show_regs() with the registers saved during the interrupt handling done in do_IRQ(). This means we'll see the registers and the backtrace for the process that was interrupted and not the full backtrace explaining who called schedule(). This is due to 838225b4 ("sched: use show_regs() to improve __schedule_bug() output", 2007-10-24) which improperly assumed that get_irq_regs() would return the registers for the current stack because it is being called from within an interrupt handler. Simply remove the show_reg() code so that we dump a backtrace for the interrupt handler that called schedule(). [ I ran across this when I was presented with a scheduling while atomic log with a stacktrace pointing at spin_unlock_irqrestore(). It made no sense and I had to guess what interrupt handler could be called and poke around for someone calling schedule() in an interrupt handler. A simple test of putting an msleep() in an interrupt handler works better with this patch because you can actually see the msleep() call in the backtrace. ] Also-reported-by: NChris Metcalf <cmetcalf@tilera.com> Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Cc: Satyam Sharma <satyam@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1332979847-27102-1-git-send-email-sboyd@codeaurora.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rusty Russell 提交于
This has been obsolescent for a while, fix documentation and misc comments. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Daniel Lezcano 提交于
In the case of a child pid namespace, rebooting the system does not really makes sense. When the pid namespace is used in conjunction with the other namespaces in order to create a linux container, the reboot syscall leads to some problems. A container can reboot the host. That can be fixed by dropping the sys_reboot capability but we are unable to correctly to poweroff/ halt/reboot a container and the container stays stuck at the shutdown time with the container's init process waiting indefinitively. After several attempts, no solution from userspace was found to reliabily handle the shutdown from a container. This patch propose to make the init process of the child pid namespace to exit with a signal status set to : SIGINT if the child pid namespace called "halt/poweroff" and SIGHUP if the child pid namespace called "reboot". When the reboot syscall is called and we are not in the initial pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART", and "RESTART2". Otherwise we return EINVAL. Returning EINVAL is also an easy way to check if this feature is supported by the kernel when invoking another 'reboot' option like CAD. By this way the parent process of the child pid namespace knows if it rebooted or not and can take the right decision. Test case: ========== #include <alloca.h> #include <stdio.h> #include <sched.h> #include <unistd.h> #include <signal.h> #include <sys/reboot.h> #include <sys/types.h> #include <sys/wait.h> #include <linux/reboot.h> static int do_reboot(void *arg) { int *cmd = arg; if (reboot(*cmd)) printf("failed to reboot(%d): %m\n", *cmd); } int test_reboot(int cmd, int sig) { long stack_size = 4096; void *stack = alloca(stack_size) + stack_size; int status; pid_t ret; ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd); if (ret < 0) { printf("failed to clone: %m\n"); return -1; } if (wait(&status) < 0) { printf("unexpected wait error: %m\n"); return -1; } if (!WIFSIGNALED(status)) { printf("child process exited but was not signaled\n"); return -1; } if (WTERMSIG(status) != sig) { printf("signal termination is not the one expected\n"); return -1; } return 0; } int main(int argc, char *argv[]) { int status; status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1); if (status >= 0) { printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n"); return 1; } printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n"); return 0; } [akpm@linux-foundation.org: tweak and add comments] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Tested-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
Use bitmap_set() instead of using set_bit() for each bit. This conversion is valid because the bitmap is private in the function call and atomic bitops were unnecessary. This also includes minor change. - Use bitmap_copy() for shorter typing Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhenzhong Duan 提交于
When using crashkernel=2M-256M, the kernel doesn't give any warning. This is misleading sometimes. Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
nommu platforms don't have very interesting swapper_pg_dir pointers and usually just #define them to NULL, meaning that we can't include them in the vmcoreinfo on the kexec crash path. This patch only saves the swapper_pg_dir if we have an MMU. Signed-off-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NSimon Horman <horms@verge.net.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gilad Ben-Yossef 提交于
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and calculates the cpumask of cpus to IPI by calling a function supplied as a parameter in order to determine whether to IPI each specific cpu. The function works around allocation failure of cpumask variable in CONFIG_CPUMASK_OFFSTACK=y by itereating over cpus sending an IPI a time via smp_call_function_single(). The function is useful since it allows to seperate the specific code that decided in each case whether to IPI a specific cpu for a specific request from the common boilerplate code of handling creating the mask, handling failures etc. [akpm@linux-foundation.org: s/gfpflags/gfp_flags/] [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func'] [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment] Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: NMichal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Reviewed-by: N"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gilad Ben-Yossef 提交于
We have lots of infrastructure in place to partition multi-core systems such that we have a group of CPUs that are dedicated to specific task: cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter. Still, kernel code will at times interrupt all CPUs in the system via IPIs for various needs. These IPIs are useful and cannot be avoided altogether, but in certain cases it is possible to interrupt only specific CPUs that have useful work to do and not the entire system. This patch set, inspired by discussions with Peter Zijlstra and Frederic Weisbecker when testing the nohz task patch set, is a first stab at trying to explore doing this by locating the places where such global IPI calls are being made and turning the global IPI into an IPI for a specific group of CPUs. The purpose of the patch set is to get feedback if this is the right way to go for dealing with this issue and indeed, if the issue is even worth dealing with at all. Based on the feedback from this patch set I plan to offer further patches that address similar issue in other code paths. This patch creates an on_each_cpu_mask() and on_each_cpu_cond() infrastructure API (the former derived from existing arch specific versions in Tile and Arm) and uses them to turn several global IPI invocation to per CPU group invocations. Core kernel: on_each_cpu_mask() calls a function on processors specified by cpumask, which may or may not include the local processor. You must not call this function with disabled interrupts or from a hardware interrupt handler or from a bottom half handler. arch/arm: Note that the generic version is a little different then the Arm one: 1. It has the mask as first parameter 2. It calls the function on the calling CPU with interrupts disabled, but this should be OK since the function is called on the other CPUs with interrupts disabled anyway. arch/tile: The API is the same as the tile private one, but the generic version also calls the function on the with interrupts disabled in UP case This is OK since the function is called on the other CPUs with interrupts disabled. Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: NChristoph Lameter <cl@linux.com> Acked-by: NChris Metcalf <cmetcalf@tilera.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: NMichal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 MyungJoo Ham 提交于
The new API, pm_qos_update_request_timeout() is to provide a timeout with pm_qos_update_request. For example, pm_qos_update_request_timeout(req, 100, 1000), means that QoS request on req with value 100 will be active for 1000 microseconds. After 1000 microseconds, the QoS request thru req is reset. If there were another pm_qos_update_request(req, x) during the 1000 us, this new request with value x will override as this is another request on the same req handle. A new request on the same req handle will always override the previous request whether it is the conventional request or it is the new timeout request. Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Acked-by: NMark Gross <markgross@thegnar.org> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Rafael J. Wysocki 提交于
There is a race condition between the freezer and request_firmware() such that if request_firmware() is run on one CPU and freeze_processes() is run on another CPU and usermodehelper_disable() called by it succeeds to grab umhelper_sem for writing before usermodehelper_read_trylock() called from request_firmware() acquires it for reading, the request_firmware() will fail and trigger a WARN_ON() complaining that it was called at a wrong time. However, in fact, it wasn't called at a wrong time and freeze_processes() simply happened to be executed simultaneously. To avoid this race, at least in some cases, modify usermodehelper_read_trylock() so that it doesn't fail if the freezing of tasks has just started and hasn't been completed yet. Instead, during the freezing of tasks, it will try to freeze the task that has called it so that it can wait until user space is thawed without triggering the scary warning. For this purpose, change usermodehelper_disabled so that it can take three different values, UMH_ENABLED (0), UMH_FREEZING and UMH_DISABLED. The first one means that usermode helpers are enabled, the last one means "hard disable" (i.e. the system is not ready for usermode helpers to be used) and the second one is reserved for the freezer. Namely, when freeze_processes() is started, it sets usermodehelper_disabled to UMH_FREEZING which tells usermodehelper_read_trylock() that it shouldn't fail just yet and should call try_to_freeze() if woken up and cannot return immediately. This way all freezable tasks that happen to call request_firmware() right before freeze_processes() is started and lose the race for umhelper_sem with it will be frozen and will sleep until thaw_processes() unsets usermodehelper_disabled. [For the non-freezable callers of request_firmware() the race for umhelper_sem against freeze_processes() is unfortunately unavoidable.] Reported-by: NStephen Boyd <sboyd@codeaurora.org> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org
-
由 Rafael J. Wysocki 提交于
The core suspend/hibernation code calls usermodehelper_disable() to avoid race conditions between the freezer and the starting of usermode helpers and each code path has to do that on its own. However, it is always called right before freeze_processes() and usermodehelper_enable() is always called right after thaw_processes(). For this reason, to avoid code duplication and to make the connection between usermodehelper_disable() and the freezer more visible, make freeze_processes() call it and remove the direct usermodehelper_disable() and usermodehelper_enable() calls from all suspend/hibernation code paths. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org
-
由 Rafael J. Wysocki 提交于
There is no reason to call usermodehelper_disable() before creating memory bitmaps in hibernate() and software_resume(), so call it right before freeze_processes(), in accordance with the other suspend and hibernation code. Consequently, call usermodehelper_enable() right after the thawing of tasks rather than after freeing the memory bitmaps. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org
-
由 Rafael J. Wysocki 提交于
If firmware is requested asynchronously, by calling request_firmware_nowait(), there is no reason to fail the request (and warn the user) when the system is (presumably temporarily) unready to handle it (because user space is not available yet or frozen). For this reason, introduce an alternative routine for read-locking umhelper_sem, usermodehelper_read_lock_wait(), that will wait for usermodehelper_disabled to be unset (possibly with a timeout) and make request_firmware_work_func() use it instead of usermodehelper_read_trylock(). Accordingly, modify request_firmware() so that it uses usermodehelper_read_trylock() to acquire umhelper_sem and remove the code related to that lock from _request_firmware(). Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org
-
由 Rafael J. Wysocki 提交于
Instead of two functions, read_lock_usermodehelper() and usermodehelper_is_disabled(), used in combination, introduce usermodehelper_read_trylock() that will only return with umhelper_sem held if usermodehelper_disabled is unset (and will return -EAGAIN otherwise) and make _request_firmware() use it. Rename read_unlock_usermodehelper() to usermodehelper_read_unlock() to follow the naming convention of the new function. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org
-
由 David Howells 提交于
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command: perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *` Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
asm/system.h is a cause of circular dependency problems because it contains commonly used primitive stuff like barrier definitions and uncommonly used stuff like switch_to() that might require MMU definitions. asm/system.h has been disintegrated by this point on all arches into the following common segments: (1) asm/barrier.h Moved memory barrier definitions here. (2) asm/cmpxchg.h Moved xchg() and cmpxchg() here. #included in asm/atomic.h. (3) asm/bug.h Moved die() and similar here. (4) asm/exec.h Moved arch_align_stack() here. (5) asm/elf.h Moved AT_VECTOR_SIZE_ARCH here. (6) asm/switch_to.h Moved switch_to() here. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-