- 27 8月, 2009 1 次提交
-
-
由 David S. Miller 提交于
Steven Rostedt has suggested that Neil work with the tracing folks, trying to use TRACE_EVENT as the mechanism for implementation. And if that doesn't workout we can investigate other solutions such as that one which was tried here. This reverts the following 2 commits: 5a165657 ("net: skb ftracer - Add config option to enable new ftracer (v3)") 9ec04da7 ("net: skb ftracer - Add actual ftrace code to kernel (v3)") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 8月, 2009 1 次提交
-
-
由 Randy Dunlap 提交于
Fix trace_skb_sources build when CONFIG_NET is not enabled: kernel/built-in.o: In function `probe_skb_dequeue': trace_skb_sources.c:(.text+0xd9152): undefined reference to `init_net' trace_skb_sources.c:(.text+0xd9188): undefined reference to `dev_get_by_index' Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 8月, 2009 2 次提交
-
-
由 Neil Horman 提交于
skb allocation / consumption correlator Add ftracer module to kernel to print out a list that correlates a process id, an skb it read, and the numa nodes on wich the process was running when it was read along with the numa node the skbuff was allocated on. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Makefile | 1 trace.h | 19 ++++++ trace_skb_sources.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 174 insertions(+) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Horman 提交于
skb allocation / consumption corelator - Add config option This patch adds a Kconfig option to enable the addtition of the skb source tracer. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Kconfig | 10 ++++++++++ 1 file changed, 10 insertions(+) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2009 3 次提交
-
-
由 Frederic Weisbecker 提交于
This patch implements the kernel side support for ftrace event record sampling. A new counter sampling attribute is added: PERF_SAMPLE_TP_RECORD which requests ftrace events record sampling. In this case if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint fires, we emit the tracepoint binary record to the perfcounter event buffer, as a sample. Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf record: perf record -f -F 1 -a -e workqueue:workqueue_execution perf report -D 0x21e18 [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........ . 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!...... . 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve . 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1........... . 0040: e0 b1 31 81 ff ff ff ff ....... . 0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33 The raw ftrace binary record starts at offset 0020. Translation: struct trace_entry { type = 0x2b = 43; flags = 1; preempt_count = 2; pid = 0xa = 10; tgid = 0xa = 10; } thread_comm = "events/1" thread_pid = 0xa = 10; func = 0xffffffff8131b1e0 = flush_to_ldisc() What will come next? - Userspace support ('perf trace'), 'flight data recorder' mode for perf trace, etc. - The unconditional copy from the profiling callback brings some costs however if someone wants no such sampling to occur, and needs to be fixed in the future. For that we need to have an instant access to the perf counter attribute. This is a matter of a flag to add in the struct ftrace_event. - Take care of the events recursivity! Don't ever try to record a lock event for example, it seems some locking is used in the profiling fast path and lead to a tracing recursivity. That will be fixed using raw spinlock or recursivity protection. - [...] - Profit! :-) Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Adds possible second part to the assign argument of TP_EVENT(). TP_perf_assign( __perf_count(foo); __perf_addr(bar); ) Which, when specified make the swcounter increment with @foo instead of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address associated with the event) when this triggers a counter overflow. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Stanislaw Gruszka 提交于
When the process exits we don't have to run new cputimer nor use running one (as it not accounts when tsk->exit_state != 0) to get process CPU times. As there is only one thread we can just use CPU times fields from task and signal structs. Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 8月, 2009 5 次提交
-
-
由 Tom Zanussi 提交于
If filter_add_subsystem_pred() fails due to ENOSPC or ENOMEM, the pred doesn't get freed, while as a side effect it does for other errors. Make it so the caller always frees the pred for any error. Signed-off-by: NTom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <1249746593.6453.32.camel@tropicana> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Tom Zanussi 提交于
Dan Carpenter sent me a fix to prevent pred from being used if it couldn't be allocated. I noticed the same problem also existed for the create_pred() case and added a fix for that. Reported-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NTom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <1249746549.6453.29.camel@tropicana> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Eric Dumazet 提交于
While looking at Jens Rosenboom bug report (http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from a dying "ps" program, we found following problem. clone() syscall has special support for TID of created threads. This support includes two features. One (CLONE_CHILD_SETTID) is to set an integer into user memory with the TID value. One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created thread dies. The integer location is a user provided pointer, provided at clone() time. kernel keeps this pointer value into current->clear_child_tid. At execve() time, we should make sure kernel doesnt keep this user provided pointer, as full user memory is replaced by a new one. As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user memory in forked processes. Following sequence could happen: 1) bash (or any program) starts a new process, by a fork() call that glibc maps to a clone( ... CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID ...) syscall 2) When new process starts, its current->clear_child_tid is set to a location that has a meaning only in bash (or initial program) context (&THREAD_SELF->tid) 3) This new process does the execve() syscall to start a new program. current->clear_child_tid is left unchanged (a non NULL value) 4) If this new program creates some threads, and initial thread exits, kernel will attempt to clear the integer pointed by current->clear_child_tid from mm_release() : if (tsk->clear_child_tid && !(tsk->flags & PF_SIGNALED) && atomic_read(&mm->mm_users) > 1) { u32 __user * tidptr = tsk->clear_child_tid; tsk->clear_child_tid = NULL; /* * We don't check the error code - if userspace has * not set up a proper pointer then tough luck. */ << here >> put_user(0, tidptr); sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0); } 5) OR : if new program is not multi-threaded, but spied by /proc/pid users (ps command for example), mm_users > 1, and the exiting program could corrupt 4 bytes in a persistent memory area (shm or memory mapped file) If current->clear_child_tid points to a writeable portion of memory of the new program, kernel happily and silently corrupts 4 bytes of memory, with unexpected effects. Fix is straightforward and should not break any sane program. Reported-by: NJens Rosenboom <jens@mcbone.net> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sonny Rao <sonnyrao@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xiao Guangrong 提交于
Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug, but only if CPUMASK_OFFSTACK=y, which is default to n. The bug was introduced by 8969a5ed ("generic-ipi: remove kmalloc()"). Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Dumazet 提交于
I noticed oprofile memleaked in linux-2.6 current tree, and tracked this ring-buffer leak. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> LKML-Reference: <4A7C06B9.2090302@gmail.com> Cc: stable@kernel.org Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 07 8月, 2009 2 次提交
-
-
由 Li Zefan 提交于
/proc/lock_stat is writable. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <4A7BE7B6.10904@cn.fujitsu.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Brice Goglin reported this crash with per task precise stats: > I finally managed to test the threaded perfcounter statistics (thanks a > lot for implementing it). I am running 2.6.31-rc5 (with the AMD > magny-cours patches but I don't think they matter here). I am trying to > measure local/remote memory accesses per thread during the well-known > stream benchmark. It's compiled with OpenMP using 16 threads on a > quad-socket quad-core barcelona machine. > > Command line is: > /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf record -f -s > -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 ./stream > > It seems to work fine with a single -e <counter> on the command line > while it crashes when there are at least 2 of them. > It seems to work fine without -s as well. A silly copy-paste resulted in a messed up iteration which would cause the OOPS. Reported-by: NBrice Goglin <Brice.Goglin@inria.fr> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: NBrice Goglin <Brice.Goglin@inria.fr> LKML-Reference: <1249574786.32113.550.camel@twins> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 06 8月, 2009 5 次提交
-
-
由 Robert Richter 提交于
When calling rb_buffer_peek() from ring_buffer_consume() and a padding event is returned, the function rb_advance_reader() is called twice. This may lead to missing samples or under high workloads to the warning below. This patch fixes this. If a padding event is returned by rb_buffer_peek() it will be consumed by the calling function now. Also, I simplified some code in ring_buffer_consume(). ------------[ cut here ]------------ WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5() Hardware name: Anaheim Modules linked in: Pid: 29, comm: events/2 Tainted: G W 2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1 Call Trace: [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5 [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11 [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5 [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2 [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165 [<ffffffff8132749b>] sync_buffer+0xa5/0x401 [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165 [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78 [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78 [<ffffffff8104aa30>] worker_thread+0x113/0x1ac [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38 [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac [<ffffffff8104dc9a>] kthread+0x88/0x92 [<ffffffff8100bdba>] child_rip+0xa/0x20 [<ffffffff8104dc12>] ? kthread+0x0/0x92 [<ffffffff8100bdb0>] ? child_rip+0x0/0x20 ---[ end trace f561c0a58fcc89bd ]--- Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@kernel.org> Signed-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Not all tracepoints are created equal, in specific the ftrace tracepoints are created with TRACE_EVENT_FORMAT() which does not generate the needed bits to tie them into perf counters. For those events, don't create the 'id' file and fail ->profile_enable when their ID is specified through other means. Reported-by: NChris Mason <chris.mason@oracle.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1249497664.5890.4.camel@laptop> [ v2: fix build error in the !CONFIG_EVENT_PROFILE case ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Darren Hart 提交于
In the event of a lock steal or owner died, rt_mutex_start_proxy_lock() will give the rt_mutex to the waiting task, but it fails to release the wait_lock. This leads to subsequent deadlocks when other tasks try to acquire the rt_mutex. I also removed a few extra blank lines that really spaced this routine out. I must have been high on the \n when I wrote this originally... Signed-off-by: NDarren Hart <dvhltc@us.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Dinakar Guniguntala <dino@in.ibm.com> Cc: John Stultz <johnstul@linux.vnet.ibm.com> LKML-Reference: <4A79D7F1.4000405@us.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Steven Rostedt 提交于
The commit: commit e0fdace1 Author: David Miller <davem@davemloft.net> Date: Fri Aug 1 01:11:22 2008 -0700 debug_locks: set oops_in_progress if we will log messages. Otherwise lock debugging messages on runqueue locks can deadlock the system due to the wakeups performed by printk(). Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NIngo Molnar <mingo@elte.hu> Will permanently set oops_in_progress on any lockdep failure. When this triggers it will cause any read from the ring buffer to permanently disable the ring buffer (not to mention no locking of printk). This patch removes the check. It keeps the print in NMI which makes sense. This is probably OK, since the ring buffer should not cause something to set oops_in_progress anyway. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
The function ring_buffer_discard_commit inversed the code path of the result of try_to_discard. It should skip incrementing the entry counter if try_to_discard succeeded. But instead, it increments the entry conder if it succeeded to discard, and does not increment it if it fails. The result of this bug is that filtering will make the stat counters incorrect. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 04 8月, 2009 1 次提交
-
-
由 Hiroshi Shimamoto 提交于
Prevent calling do_nanosleep() with clockid CLOCK_MONOTONIC_RAW, it may cause oops, such as NULL pointer dereference. Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: <stable@kernel.org> LKML-Reference: <4A764FF3.50607@ct.jp.nec.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 02 8月, 2009 6 次提交
-
-
由 Gregory Haskins 提交于
Background: Several race conditions in the scheduler have cropped up recently, which Steven and I have tracked down using ftrace. The most recent one turns out to be a race in how the scheduler determines a suitable migration target for RT tasks, introduced recently with commit: commit 68e74568 Date: Tue Nov 25 02:35:13 2008 +1030 sched: convert struct cpupri_vec cpumask_var_t. The original design of cpupri allowed lockless readers to quickly determine a best-estimate target. Races between the pri_active bitmap and the vec->mask were handled in the original code because we would detect and return "0" when this occured. The design was predicated on the *effective* atomicity (*) of caching the result of cpus_and() between the cpus_allowed and the vec->mask. Commit 68e74568 changed the behavior such that vec->mask is accessed multiple times. This introduces a subtle race, the result of which means we can have a result that returns "1", but with an empty bitmap. *) yes, we know cpus_and() is not a locked operator across the entire composite array, but it is implicitly atomic on a per-word basis which is all the design required to work. Implementation: Rather than forgoing the lockless design, or reverting to a stack-based cpumask_t, we simply check for when the race has been encountered and continue processing in the event that the race is hit. This renders the removal race as if the priority bit had been atomically cleared as well, and allows the algorithm to execute correctly. Signed-off-by: NGregory Haskins <ghaskins@novell.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Steven Rostedt <srostedt@redhat.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145728.25226.92769.stgit@dev.haskins.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
The latencytop and sleep accounting code assumes that any scheduler entity represents a task, this is not so. Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
In order to be able to distinguish between no samples due to inactivity and no samples due to task ended, Arjan asked for PERF_EVENT_EXIT events. This is useful to the boot delay instrumentation (bootchart) app. This patch changes the PERF_EVENT_FORK to be emitted on every clone, and adds PERF_EVENT_EXIT to be emitted on task exit, after the task's counters have been closed. This task tracing is controlled through: attr.comm || attr.mmap and through the new attr.task field. Suggested-by: NArjan van de Ven <arjan@linux.intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> [ cleaned up perf_counter.h a bit ] Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Currently the counter value returned by read() is the value of the parent counter, to which child counters are only fed back on child exit. Thus read() can return rather erratic (and meaningless) numbers depending on the state of the child processes. Change this by always iterating the full child hierarchy on read() and sum all counters. Suggested-by: NCorey Ashford <cjashfor@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Linus Torvalds 提交于
The previous commit ("do_sigaltstack: avoid copying 'stack_t' as a structure to user space") fixed a real bug. This one just cleans up the copy from user space to that gcc can generate better code for it (and so that it looks the same as the later copy back to user space). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Ulrich Drepper correctly points out that there is generally padding in the structure on 64-bit hosts, and that copying the structure from kernel to user space can leak information from the kernel stack in those padding bytes. Avoid the whole issue by just copying the three members one by one instead, which also means that the function also can avoid the need for a stack frame. This also happens to match how we copy the new structure from user space, so it all even makes sense. [ The obvious solution of adding a memset() generates horrid code, gcc does really stupid things. ] Reported-by: NUlrich Drepper <drepper@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 7月, 2009 1 次提交
-
-
由 Masami Hiramatsu 提交于
Use kernel_text_address() for checking probe address instead of __kernel_text_address(), because __kernel_text_address() returns true for init functions even after relaseing those functions. That will hit a BUG() in text_poke(). Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 7月, 2009 5 次提交
-
-
由 Mel Gorman 提交于
When profile= is used, a large buffer is allocated early at boot. This can be larger than what the page allocator can provide so it prints a warning. However, the caller is able to handle the situation so this patch suppresses the warning. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
After commit ec64f515 ("cgroup: fix frequent -EBUSY at rmdir"), cgroup's rmdir (especially against memcg) doesn't return -EBUSY by temporary ref counts. That commit expects all refs after pre_destroy() is temporary but...it wasn't. Then, rmdir can wait permanently. This patch tries to fix that and change followings. - set CGRP_WAIT_ON_RMDIR flag before pre_destroy(). - clear CGRP_WAIT_ON_RMDIR flag when the subsys finds racy case. if there are sleeping ones, wakes them up. - rmdir() sleeps only when CGRP_WAIT_ON_RMDIR flag is set. Tested-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: NPaul Menage <menage@google.com> Acked-by: NBalbir Sigh <balbir@linux.vnet.ibm.com> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
The bug was introduced by commit cc31edce ("cgroups: convert tasks file to use a seq_file with shared pid array"). We cache a pid array for all threads that are opening the same "tasks" file, but the pids in the array are always from the namespace of the last process that opened the file, so all other threads will read pids from that namespace instead of their own namespaces. To fix it, we maintain a list of pid arrays, which is keyed by pid_ns. The list will be of length 1 at most time. Reported-by: NPaul Menage <menage@google.com> Idea-by: NPaul Menage <menage@google.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NSerge Hallyn <serue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hidetoshi Seto 提交于
Setting "crashkernel=512M-2G:64M,2G-:128M" does not work but it turns to work if it has a trailing-whitespace, like "crashkernel=512M-2G:64M,2G-:128M ". It was because of a bug in the parser, running over the cmdline. This patch adds a check of the termination. Reported-by: NJin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Tested-by: NJin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Fix a post-2.6.31 regression which was introduced by 2ff05b2b ("oom: move oom_adj value from task_struct to mm_struct"). After moving the oom_adj value from the task struct to the mm_struct, the oom_adj value was no longer properly inherited by child processes. Copying over the oom_adj value at fork time fixes that bug. [kosaki.motohiro@jp.fujitsu.com: test for current->mm before dereferencing it] Signed-off-by: NRik van Riel <riel@redhat.com> Reported-by: NPaul Menage <manage@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 7月, 2009 2 次提交
-
-
由 Lai Jiangshan 提交于
About a half events are missing when we splice_read from trace_pipe. They are unexpectedly consumed because we ignore the TRACE_TYPE_NO_CONSUME return value used by the function graph tracer when it needs to consume the events by itself to walk on the ring buffer. The same problem appears with ftrace_dump() Example of an output before this patch: 1) | ktime_get_real() { 1) 2.846 us | read_hpet(); 1) 4.558 us | } 1) 6.195 us | } After this patch: 0) | ktime_get_real() { 0) | getnstimeofday() { 0) 1.960 us | read_hpet(); 0) 3.597 us | } 0) 5.196 us | } The fix also applies on 2.6.30 Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Lai Jiangshan 提交于
When print_graph_entry() computes a function call entry event, it needs to also check the next entry to guess if it matches the return event of the current function entry. In order to look at this next event, it needs to consume the current entry before going ahead in the ring buffer. However, if the current event that gets consumed is the last one in the ring buffer head page, the ring_buffer may reuse the page for writers. The consumed entry will then become invalid because of possible racy overwriting. Me must then handle this entry by making a copy of it. The fix also applies on 2.6.30 Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 28 7月, 2009 2 次提交
-
-
由 Oleg Nesterov 提交于
Commit 63706172 ("kthreads: rework kthread_stop()") removed the limitation that the thread function mysr not call do_exit() itself, but forgot to update the comment. Since that commit it is OK to use kthread_stop() even if kthread can exit itself. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Frysinger 提交于
The check_modstruct_version() needs to look up the symbol "module_layout" in the kernel, but it does so literally and not by a C identifier. The trouble is that it does not include a symbol prefix for those ports that need it (like the Blackfin and H8300 port). So make sure we tack on the MODULE_SYMBOL_PREFIX define to the front of it. Signed-off-by: NMike Frysinger <vapier@gentoo.org> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 7月, 2009 1 次提交
-
-
由 Thomas Gleixner 提交于
If trace_printk_on_oops is set we lose interesting trace information when the tracer is enabled across oops handling and printing. We want the trace which might give us information _WHY_ we oopsed. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 23 7月, 2009 3 次提交
-
-
由 Steven Rostedt 提交于
The current code will truncate the ftrace files contents if O_APPEND is not set and the file is opened in write mode. This is incorrect. It should only truncate the file if O_TRUNC is set. Otherwise if one of these files is opened by a C program with fopen "r+", it will incorrectly truncate the file. Reported-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt 提交于
Since the trace_printk may use pointers to the format fields in the buffer, they are exported via debugfs/tracing/printk_formats. This is used by utilities that read the ring buffer in binary format. It helps the utilities map the address of the format in the binary buffer to what the printf format looks like. Unfortunately, the way the output code works, it exports the address of the pointer to the format address, and not the format address itself. This makes the file totally useless in trying to figure out what format string a binary address belongs to. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Li Zefan 提交于
Every time we cat a trace_stat file, we leak memory allocated by seq_open(). Also fix memory leak in a failure path in tracing_stat_open(). Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A67D92B.4060704@cn.fujitsu.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-