- 19 Jul, 2014 5 commits
-
-
Committed by Corey Minyard
The code for resizing the trace ring buffers has to run the per-cpu resize on the CPU itself. The code was using preempt_off() and running the code for the current CPU directly, otherwise calling schedule_work_on(). At least on RT this could result in the following:

|BUG: sleeping function called from invalid context at kernel/rtmutex.c:673
|in_atomic(): 1, irqs_disabled(): 0, pid: 607, name: bash
|3 locks held by bash/607:
|CPU: 0 PID: 607 Comm: bash Not tainted 3.12.15-rt25+ #124
|(rt_spin_lock+0x28/0x68)
|(free_hot_cold_page+0x84/0x3b8)
|(free_buffer_page+0x14/0x20)
|(rb_update_pages+0x280/0x338)
|(ring_buffer_resize+0x32c/0x3dc)
|(free_snapshot+0x18/0x38)
|(tracing_set_tracer+0x27c/0x2ac)

probably via

|cd /sys/kernel/debug/tracing/
|echo 1 > events/enable ; sleep 2
|echo 1024 > buffer_size_kb

If we just always use schedule_work_on(), there's no need for the preempt_off(). So do that.

Link: http://lkml.kernel.org/p/1405537633-31518-1-git-send-email-cminyard@mvista.com
Reported-by: Stanislav Meduna <stano@meduna.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
All users of function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST have been removed. We can safely remove them from the kernel.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
function_trace_stop is no longer used to stop function tracing. Remove the check from __ftrace_ops_list_func().

Also, call FTRACE_WARN_ON() instead of setting function_trace_stop if an ops has no func to call.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
When function tracing is being updated, function_trace_stop is set to keep from tracing the updates. This was fine when function tracing was done from stop machine. But it is no longer done that way and this can cause real tracing to be missed.

Remove it.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
All archs now use ftrace_graph_is_dead() to stop function graph tracing. Remove the usage of ftrace_stop() as that is no longer needed.

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 17 Jul, 2014 2 commits
-
-
Committed by Steven Rostedt (Red Hat)
ftrace_stop() is going away as it disables parts of function tracing in a way that affects users that should not be affected. But ftrace_graph_stop() is built on ftrace_stop(). Here's another example of killing all of function tracing because something went wrong with function graph tracing.

Instead of disabling all users of function tracing on function graph error, disable only function graph tracing.

A new function is created called ftrace_graph_is_dead(). This is called in strategic paths to prevent function graph from doing more harm, allowing at least a warning to be printed before the system crashes.

NOTE: ftrace_stop() is still used until all the archs are converted over to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
ftrace_stop() and ftrace_start() were added to the suspend and hibernate process because there was some function within the work flow that caused the system to reboot if it was traced. This function has recently been found (restore_processor_state()). Now there's no reason to disable function tracing while we are going into suspend or hibernate, which means that being able to trace this will help tremendously in debugging any issues with suspend or hibernate.

This also means that the ftrace_stop/start() functions can be removed, which simplifies the function tracing code a bit.

Link: http://lkml.kernel.org/r/1518201.VD9cU33jRU@vostro.rjw.lan
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 16 Jul, 2014 1 commit
-
-
Committed by Steven Rostedt (Red Hat)
Currently if an arch supports function graph tracing, the core code will just assign the function graph trampoline to the function graph addr that gets called.

But as the old method for function graph tracing always calls the function trampoline first and that calls the function graph trampoline, some archs may have the function graph trampoline dependent on operations that were done in the function trampoline. This causes the function graph tracer to break on those archs.

Instead of having the default be to set the function graph ftrace_ops to the function graph trampoline, have it instead just set it to zero, which will keep it from jumping to a trampoline that is not set up to be jumped to directly.

Link: http://lkml.kernel.org/r/53BED155.9040607@nvidia.com
Reported-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Tested-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 02 Jul, 2014 1 commit
-
-
Committed by Steven Rostedt (Red Hat)
Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There are other users today (like kprobes and perf). Reading a trace file should not stop those from happening.

Cc: stable@vger.kernel.org # 3.0+
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 01 Jul, 2014 21 commits
-
-
Committed by Namhyung Kim
It was missing the description of the set_graph_notrace file. Add it.

Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
When there's no entry in set_ftrace_notrace, it'll print nothing, but it's better to print something like the following, as set_graph_notrace does:

  #### no functions disabled ####

Link: http://lkml.kernel.org/p/1402644246-4649-1-git-send-email-namhyung@kernel.org
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
When there's no entry in set_graph_notrace, it'll print the message below:

  #### all functions enabled ####

While this is technically correct, it's better to print the following:

  #### no functions disabled ####

Link: http://lkml.kernel.org/p/1402590233-22321-3-git-send-email-namhyung@kernel.org
Reported-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
The ftrace_graph_notrace option is for specifying a notrace filter for the function graph tracer at boot time. It can be altered after boot using the set_graph_notrace file on debugfs.

Link: http://lkml.kernel.org/p/1402590233-22321-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Fabian Frederick
Convert pr_warning to the standard pr_warn.

Define pr_fmt(fmt) fmt to avoid any future default fmt definition.

Link: http://lkml.kernel.org/p/1402141388-21144-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
When a filter file is open for writing and O_TRUNC is set, there's no need to copy and free the filter entries.

Link: http://lkml.kernel.org/p/1402474014-28655-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
As struct ftrace_page is managed in a singly linked list, the pages should be freed starting from the start page.

Link: http://lkml.kernel.org/p/1402474014-28655-1-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Namhyung Kim
It seems like it's a leftover from commit 4104d326 ("ftrace: Remove global function list and call function directly"). As it isn't updated at all, checking its value is meaningless. Let's get rid of it.

Link: http://lkml.kernel.org/p/1402584972-17824-1-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
There are several locations in the kernel that open code the calculation of the next location in the trace_seq buffer. This is usually done with

  p->buffer + p->len

Instead of having this open coded, supply a helper function in the header to do it for them. This function is called trace_seq_buffer_ptr().

Link: http://lkml.kernel.org/p/20140626220129.452783019@goodmis.org
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
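The helper idea can be illustrated outside the kernel. The following is only a Python model, not the kernel's C code: the `TraceSeq` class, its default size, and the use of a `memoryview` to stand in for a pointer are assumptions made for illustration. It shows a single accessor replacing repeated "buffer plus length" arithmetic at each call site.

```python
class TraceSeq:
    """Hypothetical stand-in for a sequence buffer: storage plus a fill length."""

    def __init__(self, size=4096):
        self.buffer = bytearray(size)
        self.len = 0  # number of bytes already written


def trace_seq_buffer_ptr(s):
    """Return a writable view starting at the next free position.

    This plays the role of the `p->buffer + p->len` expression; callers
    no longer repeat the arithmetic themselves.
    """
    return memoryview(s.buffer)[s.len:]


# Example: append two bytes through the helper, then advance the length.
seq = TraceSeq(size=8)
view = trace_seq_buffer_ptr(seq)
view[:2] = b"hi"
seq.len += 2
```

Keeping the calculation in one place means a later change to the buffer layout only has to touch the helper.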
-
Committed by Fabian Frederick
This fixes the checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required"

Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
trace_seq_reserve() has no users in the kernel, it just wastes space. Remove it.

Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
Currently trace_seq_putmem_hex() can only take as a parameter a pointer to something that is 8 bytes or less, otherwise it will overflow the buffer. This is protected by a macro that encompasses the call to trace_seq_putmem_hex() and has a BUILD_BUG_ON() for the variable before it is passed in. This is not very robust, and if trace_seq_putmem_hex() ever gets used outside that macro it will cause issues.

Instead of only being able to produce a hex output of memory that is for a single word, change it to be more robust and allow any size input.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
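One way to accept input of any size is to walk it in fixed-size chunks so that no single step exceeds the word-sized limit. The sketch below is a Python model of that idea only; the function name `putmem_hex`, the 8-byte chunk size as a parameter, and the space separator between chunks are assumptions for illustration and are not taken from the kernel implementation.

```python
MAX_CHUNK_BYTES = 8  # assumed chunk size: one 64-bit word


def putmem_hex(data, chunk_bytes=MAX_CHUNK_BYTES):
    """Hex-encode `data` of any length, one fixed-size chunk at a time."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    pieces = []
    for start in range(0, len(data), chunk_bytes):
        chunk = data[start:start + chunk_bytes]  # last chunk may be shorter
        pieces.append(chunk.hex())
    return " ".join(pieces)


# Example: ten bytes are emitted as one full chunk and one short chunk.
encoded = putmem_hex(bytes(range(10)))
```

Because each iteration handles at most `chunk_bytes` bytes, the per-step scratch space stays bounded regardless of the total input length.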
-
Committed by Steven Rostedt (Red Hat)
For using trace_seq_*() functions in NMI context, I posted a patch to move it to the lib/ directory. This caused Andrew Morton to take a look at the code. He went through and gave a lot of comments about missing kernel doc, inconsistent types for the save variable, a mix of EXPORT_SYMBOL_GPL() and EXPORT_SYMBOL(), as well as missing EXPORT_SYMBOL*()s. There were a few comments about the way variables were being compared (int vs uint).

All these were good review comments and should be implemented regardless of whether trace_seq.c is moved to lib/ or not.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
The trace_seq_*() functions are a nice utility that allows users to manipulate buffers with printf()-like formats. It has its own trace_seq.h header in include/linux and should be in its own file. Being tied to trace_output.c is rather awkward.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Masami Hiramatsu
Simplify the ftrace_hash_disable/enable path in ftrace_hash_move to harden the process against memory allocation failure.

Link: http://lkml.kernel.org/p/20140617110442.15167.81076.stgit@kbuild-fedora.novalocal
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
The enabled_functions file is used to help debug the dynamic function tracing. Adding what trampolines are attached to files is useful for debugging.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Steven Rostedt (Red Hat)
Function graph tracing is a bit different than the function tracers, as it is processed after either the ftrace_caller or ftrace_regs_caller and we only have one place to modify the jump to ftrace_graph_caller; the jump needs to happen after the restore of registers.

The function graph tracer is dependent on the function tracer, where even if the function graph tracing is going on by itself, the save and restore of registers is still done for function tracing regardless of whether function tracing is happening, before it calls the function graph code.

If there's no function tracing happening, it is possible to just call the function graph tracer directly, and avoid the wasted effort to save and restore regs for function tracing.

This requires adding new flags to the dyn_ftrace records:

  FTRACE_FL_TRAMP
  FTRACE_FL_TRAMP_EN

The first is set if the count for the record is one, and the ftrace_ops associated to that record has its own trampoline. That way the mcount code can call that trampoline directly.

In the future, trampolines can be added to arbitrary ftrace_ops, where you can have two or more ftrace_ops registered to ftrace (like kprobes and perf) and if they are not tracing the same functions, then instead of doing a loop to check all registered ftrace_ops against their hashes, just call the ftrace_ops trampoline directly, which would call the registered ftrace_ops function directly.

Without this patch perf showed:

  0.05% hackbench [kernel.kallsyms] [k] ftrace_caller
  0.05% hackbench [kernel.kallsyms] [k] arch_local_irq_save
  0.05% hackbench [kernel.kallsyms] [k] native_sched_clock
  0.04% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
  0.04% hackbench [kernel.kallsyms] [k] preempt_trace
  0.04% hackbench [kernel.kallsyms] [k] prepare_ftrace_return
  0.04% hackbench [kernel.kallsyms] [k] __this_cpu_preempt_check
  0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller

See that the ftrace_caller took up more time than the ftrace_graph_caller did.

With this patch:

  0.05% hackbench [kernel.kallsyms] [k] __buffer_unlock_commit
  0.04% hackbench [kernel.kallsyms] [k] call_filter_check_discard
  0.04% hackbench [kernel.kallsyms] [k] ftrace_graph_caller
  0.04% hackbench [kernel.kallsyms] [k] sched_clock

The ftrace_caller is nowhere to be found and ftrace_graph_caller still takes up the same percentage.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
The usage of uprobe_buffer_enable() added by dcad1a20 is very wrong:

1. uprobe_buffer_enable() and uprobe_buffer_disable() are not balanced, _enable() should be called only if !enabled.

2. If uprobe_buffer_enable() fails, probe_event_enable() should clear tp.flags and free event_file_link.

3. If uprobe_register() fails, it should do uprobe_buffer_disable().

Link: http://lkml.kernel.org/p/20140627170146.GA18332@redhat.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Fixes: dcad1a20 "tracing/uprobes: Fetch args before reserving a ring buffer"
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
I do not know why dd9fa555 "tracing/uprobes: Move argument fetching to uprobe_dispatcher()" added the UPROBE_HANDLER_REMOVE, but it looks wrong.

OK, perhaps it makes sense to avoid store_trace_args() if the tracee is nacked by uprobe_perf_filter(). But then we should kill the same code in uprobe_perf_func() and unify the TRACE/PROFILE filtering (we need to do this anyway to mix perf/ftrace). Until then this code actually adds a pessimization because uprobe_perf_filter() will be called twice and return T in the likely case.

Link: http://lkml.kernel.org/p/20140627170143.GA18329@redhat.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
Add WARN_ON's into uprobe_unregister() and uprobe_apply() to ensure that nobody tries to play with the dead uprobe/consumer. This helps to catch bugs like the one fixed by the previous patch.

In the longer term we should fix this poorly designed interface. uprobe_register() should return "struct uprobe *" which should be passed to apply/unregister. Plus other semantic changes, see the changelog in commit 41ccba02.

Link: http://lkml.kernel.org/p/20140627170140.GA18322@redhat.com
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
This reverts commit 43fe9891.

This patch is very wrong. Firstly, this change leads to unbalanced uprobe_unregister(). Just for example,

  # perf probe -x /lib/libc.so.6 syscall
  # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
  # perf record -e probe_libc:syscall whatever

after that the uprobe is dead (unregistered) but the user of ftrace/perf can't know this, and it looks as if nobody hits this probe.

This would be easy to fix, but there are other reasons why it is not simple to mix ftrace and perf. If nothing else, they can't share the same ->consumer.filter. This is fixable too, but probably we need to fix the poorly designed uprobe_register() interface first. At least "register" and "apply" should be clearly separated.

Link: http://lkml.kernel.org/p/20140627170136.GA18319@redhat.com
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: "zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org # v3.14
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 30 Jun, 2014 2 commits
-
-
Committed by Steven Rostedt (Red Hat)
The ftrace dynamic record has a flags element that also holds a counter. Instead of hard coding "rec->flags & ~FTRACE_FL_MASK" all over the place, use a macro instead.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
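The pattern of packing a counter and flag bits into one word, and hiding the mask behind a named accessor, can be sketched as follows. This is a Python model only: the bit widths (`COUNT_BITS`, `FLAG_BITS`) and the helper name `rec_count` are assumptions for illustration and do not reflect the kernel's actual layout.

```python
# Assumed layout for illustration: low 28 bits hold the counter,
# the 4 bits above them hold flags.
COUNT_BITS = 28
FLAG_BITS = 4
FL_MASK = ((1 << FLAG_BITS) - 1) << COUNT_BITS


def rec_count(flags):
    """Return only the counter part of a combined flags word.

    Equivalent to writing `flags & ~FL_MASK` at every call site,
    but with the intent stated once, by name.
    """
    return flags & ~FL_MASK


# Example: two flag bits set, counter value 5.
word = (0b0011 << COUNT_BITS) | 5
count = rec_count(word)
```

A named accessor also makes it harder to accidentally write the mask without the complement, which would return the flags instead of the count.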
-
Committed by Steven Rostedt (Red Hat)
When registering a function callback for the function tracer, the ops can specify if it wants to save full regs (like an interrupt would) for each function that it traces, or if it does not care about regs and just wants to have the fastest return possible.

Once an ops has registered a function, if other ops register that function they all will receive the regs too. That's because once it does the work, it does it for everyone.

Now if the ops wanting regs unregisters the function so that there are only ops left that do not care about regs, those ops will still continue getting regs and going through the work for it on that function. This is because the disabling of the rec counter only sees the ops being unregistered, and does not see the ops that are still attached, and does not know if the current ops that are still attached want regs or not. To play it safe, it just keeps regs being processed until no function is registered anymore.

Instead of doing that, check the ops that are still registered for that function and if none want regs for it anymore, then disable the processing of regs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
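The check described in the last paragraph can be modelled in a few lines. This is a Python sketch under stated assumptions, not the kernel code: the `Ops` class, its `save_regs` field, and the function `function_needs_regs` are hypothetical names, and each ops is reduced to a plain set of traced function names.

```python
from dataclasses import dataclass, field


@dataclass
class Ops:
    """Hypothetical model of a registered callback set."""
    name: str
    functions: set = field(default_factory=set)  # names of traced functions
    save_regs: bool = False                      # True if full regs are wanted


def function_needs_regs(func, registered_ops):
    """Return True if any still-registered ops tracing `func` wants regs."""
    return any(ops.save_regs and func in ops.functions
               for ops in registered_ops)


# Example: regs are needed while the regs-wanting ops is attached,
# and can be dropped once only the plain tracer remains.
probe = Ops("probe", {"schedule"}, save_regs=True)
tracer = Ops("tracer", {"schedule", "vfs_write"})
before = function_needs_regs("schedule", [probe, tracer])
after = function_needs_regs("schedule", [tracer])
```

The point of the model is that the decision is made per function by looking at what remains registered, rather than waiting until nothing is registered at all.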
-
- 24 Jun, 2014 5 commits
-
-
Committed by Aaron Tomlin
A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than a predefined period of time, without giving other tasks a chance to run.

Currently, upon detection of this condition by the per-cpu watchdog task, debug information (including a stack trace) is sent to the system log.

On some occasions, we have observed that the "victim" rather than the actual "culprit" (i.e. the owner/holder of the contended resource) is reported to the user. Often this information has proven to be insufficient to assist debugging efforts.

To avoid loss of useful debug information, for architectures which support NMI, this patch makes it possible to improve soft lockup reporting. This is accomplished by issuing an NMI to each cpu to obtain a stack trace.

If NMI is not supported we just revert back to the old method. A sysctl and boot-time parameter is available to toggle this feature.

[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning]
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by David Rientjes
Oleg reports a division by zero error on zero-length write() to the percpu_pagelist_fraction sysctl:

  divide error: 0000 [#1] SMP DEBUG_PAGEALLOC
  CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000
  RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120
  RSP: 0018:ffff8800d87a3e78 EFLAGS: 00010246
  RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010
  RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50
  R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060
  R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800
  FS: 00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000
  CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0
  Call Trace:
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xba/0x1e0
    SyS_write+0x46/0xb0
    tracesys+0xe1/0xe6

However, if the percpu_pagelist_fraction sysctl is set by the user, it is also impossible to restore it to the kernel default since the user cannot write 0 to the sysctl.

This patch allows the user to write 0 to restore the default behavior. It still requires a fraction equal to or larger than 8, however, as stated by the documentation for sanity. If a value in the range [1, 7] is written, the sysctl will return EINVAL.

This successfully solves the divide by zero issue at the same time.

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Drokin <green@linuxhacker.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
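The accepted input ranges described above can be summarised in a small validation routine. This is a Python model of the rule only, not the kernel handler: the function name `parse_pagelist_fraction`, the use of `None` to mean "restore the default", and the rejection of negative values are assumptions made for illustration.

```python
import errno

MIN_PERCPU_PAGELIST_FRACTION = 8  # smallest non-zero value accepted


def parse_pagelist_fraction(value):
    """Validate a value written to the sysctl.

    Returns None when 0 is written (meaning: restore the default),
    returns the value itself when it is 8 or larger, and raises
    OSError(EINVAL) for 1..7. Negative input is also rejected here,
    which is an assumption of this sketch.
    """
    if value == 0:
        return None
    if value < MIN_PERCPU_PAGELIST_FRACTION:
        raise OSError(errno.EINVAL, "fraction must be 0 or at least 8")
    return value


# Example: 0 restores the default, 8 is accepted as a fraction.
restored = parse_pagelist_fraction(0)
accepted = parse_pagelist_fraction(8)
```

Handling 0 as a distinct "default" case before any arithmetic is what removes the division by zero: the divisor path is only reached with a value of at least 8.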
-
Committed by Don Zickus
Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh:

  BUG: sleeping function called from invalid context at mm/slub.c:965
  in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
  3 locks held by init/1:
   #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180
   #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110
   #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80
  Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110

  CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x4e/0x7a
    __might_sleep+0x11d/0x190
    kmem_cache_alloc_trace+0x4e/0x1e0
    perf_event_alloc+0x55/0x440
    perf_event_create_kernel_counter+0x26/0xe0
    watchdog_nmi_enable+0x75/0x140
    update_timers_all_cpus+0x53/0xa0
    proc_dowatchdog+0xe4/0x110
    proc_sys_call_handler+0xb3/0xc0
    proc_sys_write+0x14/0x20
    vfs_write+0xad/0x180
    SyS_write+0x49/0xb0
    system_call_fastpath+0x16/0x1b
  NMI watchdog: disabled (cpu0): hardware events not enabled

What happened is that after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled.

The preemption restriction seemed aggressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Peter Wu <peter@lekensteyn.nl>
Tested-by: Peter Wu <peter@lekensteyn.nl>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Petr Tesarik
To allow filtering of huge pages, makedumpfile must be able to identify them in the dump. This can be done by checking the appropriate page flag, so communicate its value to makedumpfile through the VMCOREINFO interface.

There's only one small catch. Depending on how many page flags are available on a given architecture, this bit can be called PG_head or PG_compound.

I sent a similar patch back in 2012, but Eric Biederman did not like using an #ifdef. So, this time I'm adding a common symbol (PG_head_mask) instead.

See https://lkml.org/lkml/2012/11/28/91 for the previous version.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Srivatsa S. Bhat
There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline.

Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below.

During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline.

          CPU 1                                CPU 2
      (Online CPU)                      (CPU going offline)

  Enter _PREPARE stage                  Enter _PREPARE stage

                                        Enter _DISABLE_IRQ stage

                                      =
  Got a device interrupt, and         | Didn't notice the IPI
  the interrupt handler sent an       | since interrupts were
  IPI to CPU 2 using                  | disabled on this CPU.
  smp_call_function_single_async()    |
                                      =

  Enter _DISABLE_IRQ stage

  Enter _RUN stage                      Enter _RUN stage

                              =
  Busy loop with interrupts   |         Invoke take_cpu_down()
  disabled.                   |         and take CPU 2 offline
                              =

  Enter _EXIT stage                     Enter _EXIT stage

  Re-enable interrupts                  Re-enable interrupts

                                        The pending IPI is noted
                                        immediately, but alas,
                                        the CPU is offline at
                                        this point.

This, of course, makes the smp-call-function IPI handler code running on CPU 2 unhappy and it complains about "receiving an IPI on an offline CPU".

One real example of the scenario on CPU 1 is the block layer's complete-request call-path:

  __blk_complete_request() [interrupt-handler]
    raise_blk_irq()
      smp_call_function_single_async()

However, if we look closely, the block layer does check that the target CPU is online before firing the IPI. So in this case, it is actually the unfortunate ordering/timing of events in the stop-machine phase that leads to receiving IPIs after the target CPU has gone offline.

In reality, getting a late IPI on an offline CPU is not too bad by itself (this can happen even due to hardware latencies in IPI send-receive). It is a bug only if the target CPU really went offline without executing all the callbacks queued on its list. (Note that a CPU is free to execute its pending smp-call-function callbacks in a batch, without waiting for the corresponding IPIs to arrive for each one of those callbacks).

So, fixing this issue can be broken up into two parts:

1. Ensure that a CPU goes offline only after executing all the callbacks queued on it.

2. Modify the warning condition in the smp-call-function IPI handler code such that it warns only if an offline CPU got an IPI *and* that CPU had gone offline with callbacks still pending in its queue.

Achieving part 1 is straightforward: just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage[1], including those callbacks for which the source CPU's IPIs might not have been received on the outgoing CPU yet. Once we do this, an IPI that arrives late on the CPU going offline (either due to the race mentioned above, or due to hardware latencies) will be completely harmless, since the outgoing CPU would have executed all the queued callbacks before going offline.

Overall, this fix (parts 1 and 2 put together) additionally guarantees that we will see a warning only when the *IPI-sender code* is buggy, that is, if it queues the callback _after_ the target CPU has gone offline.

[1]. The CPU_DYING part needs a little more explanation: by the time we execute the CPU_DYING notifier callbacks, the CPU would have already been marked offline. But we want to flush out the pending callbacks at this stage, ignoring the fact that the CPU is offline. So restructure the IPI handler code so that we can bypass the "is-cpu-offline?" check in this particular case. (Of course, the right solution here is to fix CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING notifiers, but this requires a lot of audit to ensure that this change doesn't break any existing code; hence let's go with the solution proposed above until that is done).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Galbraith <mgalbraith@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Sachin Kamat <sachin.kamat@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
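The two-part fix described in this commit can be modelled as a toy state machine. The sketch below is Python, not kernel code, and every name in it (`Cpu`, `queue_callback`, `flush_queue`, `handle_ipi`, `go_offline`) is hypothetical. It assumes a single per-CPU callback queue and records warnings in a list instead of printing them.

```python
from collections import deque


class Cpu:
    """Toy model of one CPU's smp-call-function queue."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.online = True
        self.queue = deque()   # callbacks queued by other CPUs
        self.executed = []     # callbacks that have run, in order
        self.warnings = []

    def queue_callback(self, name):
        self.queue.append(name)

    def flush_queue(self, warn_cpu_offline):
        # Part 2: warn only if the CPU is offline AND work is still pending.
        if warn_cpu_offline and not self.online and self.queue:
            self.warnings.append("IPI on offline CPU with pending callbacks")
        while self.queue:
            self.executed.append(self.queue.popleft())

    def handle_ipi(self):
        self.flush_queue(warn_cpu_offline=True)

    def go_offline(self):
        # Part 1: the CPU is already marked offline when the dying-stage
        # hook runs, so flush everything here with the check bypassed.
        self.online = False
        self.flush_queue(warn_cpu_offline=False)


# Example: a callback queued before offlining runs during go_offline(),
# so the late IPI finds an empty queue and stays silent.
cpu2 = Cpu(2)
cpu2.queue_callback("blk_done")
cpu2.go_offline()
cpu2.handle_ipi()
```

In this model a warning appears only if something is queued after `go_offline()` has returned, which corresponds to the "buggy IPI-sender" case the commit message singles out.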
-
- 21 Jun, 2014 3 commits
-
-
Committed by Oleg Nesterov
syscall_regfunc() ignores the kernel threads because "it has no effect", see cc3b13c1 "Don't trace kernel thread syscalls" which added this check.

However, this means that a user-space task spawned by call_usermodehelper() will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0.

Remove this check. The unnecessary report from the ret_from_fork path mentioned by cc3b13c1 is no longer possible, see commit fb45550d "make sure that kernel_thread() callbacks call do_exit() themselves".

A kernel_thread() callback can only return and take the int_ret_from_sys_call path after do_execve() succeeds, otherwise the kernel will crash. But in this case it is no longer a kernel thread and thus it needs TIF_SYSCALL_TRACEPOINT.

Link: http://lkml.kernel.org/p/20140413185938.GD20668@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
1. Remove _irqsafe from syscall_regfunc/syscall_unregfunc, read_lock(tasklist) doesn't need to disable irqs.

2. Change this code to avoid the deprecated do_each_thread() and use for_each_process_thread() (stolen from the patch from Frederic).

3. Change syscall_regfunc() to check PF_KTHREAD to skip the kernel threads, ->mm != NULL is the common mistake.

   Note: probably this check should be simply removed, needs another patch.

[fweisbec@gmail.com: s/do_each_thread/for_each_process_thread/]
Link: http://lkml.kernel.org/p/20140413185918.GC20668@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Committed by Oleg Nesterov
syscall_regfunc() and syscall_unregfunc() should set/clear TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race with copy_process() and miss the new child which was not added to the process/thread lists yet.

Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT under tasklist.

Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com
Cc: stable@vger.kernel.org # 2.6.33
Fixes: a871bd33 "tracing: Add syscall tracepoints"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-