提交 · fa44063f9ef163c3a4c8d8c0465bb8a056b42035 · openeuler / raspberrypi-kernel

03 7月, 2013 3 次提交

uprobes: Fix return value in error handling path · fa44063f

由 zhangwei(Jovi) 提交于 6月 13, 2013

When wrong argument is passed into uprobe_events it does not return
an error:

[root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events
[root@jovi tracing]#

The proper response is:

[root@jovi tracing]# echo 'p:myprobe /bin/bash' > uprobe_events
-bash: echo: write error: Invalid argument

Link: http://lkml.kernel.org/r/51B964FF.5000106@huawei.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: <srikar@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

fa44063f

tracing: Fix race between deleting buffer and setting events · 2a6c24af

由 Steven Rostedt (Red Hat) 提交于 7月 02, 2013

While analyzing the code, I discovered that there's a potential race between
deleting a trace instance and setting events. There are a few races that can
occur if events are being traced as the buffer is being deleted. Mostly the
problem comes with freeing the descriptor used by the trace event callback.
To prevent problems like this, the events are disabled before the buffer is
deleted. The problem with the current solution is that the event_mutex is let
go between disabling the events and freeing the files, which means that the events
could be enabled again while the freeing takes place.

Cc: stable@vger.kernel.org # 3.10
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2a6c24af

tracing: Add trace_array_get/put() to event handling · 8e2e2fa4

由 Steven Rostedt (Red Hat) 提交于 7月 02, 2013

Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
tried to fix a race between deleting a trace instance and reading contents
of a trace file. But it wasn't good enough. The following could crash the kernel:

 # cd /sys/kernel/debug/tracing/instances
 # ( while :; do mkdir foo; rmdir foo; done ) &
 # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) &

Luckily this can only be done by root user, but it should be fixed regardless.

The problem is that a delete of the file can happen after the write to the event
is opened, but before the enabling happens.

The solution is to make sure the trace_array is available before succeeding in
opening for write, and incerment the ref counter while opened.

Now the instance can be deleted when the events are writing to the buffer,
but the deletion of the instance will disable all events before the instance
is actually deleted.

Cc: stable@vger.kernel.org # 3.10
Reported-by: NAlexander Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8e2e2fa4

02 7月, 2013 13 次提交

tracing: Get trace_array ref counts when accessing trace files · 7b85af63

由 Steven Rostedt (Red Hat) 提交于 7月 01, 2013

When a trace file is opened that may access a trace array, it must
increment its ref count to prevent it from being deleted.

Cc: stable@vger.kernel.org # 3.10
Reported-by: NAlexander Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7b85af63

tracing: Add trace_array_get/put() to handle instance refs better · ff451961

由 Steven Rostedt (Red Hat) 提交于 7月 01, 2013

Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
tried to fix a race between deleting a trace instance and reading contents
of a trace file. But it wasn't good enough. The following could crash the kernel:

 # cd /sys/kernel/debug/tracing/instances
 # ( while :; do mkdir foo; rmdir foo; done ) &
 # ( while :; do cat foo/trace &> /dev/null; done ) &

Luckily this can only be done by root user, but it should be fixed regardless.

The problem is that a delete of the file can happen after the reader starts
to open the file but before it grabs the trace_types_mutex.

The solution is to validate the trace array before using it. If the trace
array does not exist in the list of trace arrays, then it returns -ENODEV.

There's a possibility that a trace_array could be deleted and a new one
created and the open would open its file instead. But that is very minor as
it will just return the data of the new trace array, it may confuse the user
but it will not crash the system. As this can only be done by root anyway,
the race will only occur if root is deleting what its trying to read at
the same time.

Cc: stable@vger.kernel.org # 3.10
Reported-by: NAlexander Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ff451961

tracing: Protect ftrace_trace_arrays list in trace_events.c · a8227415

由 Alexander Z Lam 提交于 7月 01, 2013

There are multiple places where the ftrace_trace_arrays list is accessed in
trace_events.c without the trace_types_lock held.

Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: NAlexander Z Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a8227415

tracing: Make trace_marker use the correct per-instance buffer · 2d71619c

由 Alexander Z Lam 提交于 7月 01, 2013

The trace_marker file was present for each new instance created, but it
added the trace mark to the global trace buffer instead of to
the instance's buffer.

Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.com

Cc: David Sharp <dhsharp@google.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: NAlexander Z Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2d71619c

ftrace: Do not run selftest if command line parameter is set · f1ed7c74

由 Steven Rostedt (Red Hat) 提交于 6月 27, 2013

If the kernel command line ftrace filter parameters are set
(ftrace_filter or ftrace_notrace), force the function self test to
pass, with a warning why it was forced.

If the user adds a filter to the kernel command line, it is assumed
that they know what they are doing, and the self test should just not
run instead of failing (which disables function tracing) or clearing
the filter, as that will probably annoy the user.

If the user wants the selftest to run, the message will tell them why
it did not.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f1ed7c74

tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit() · cf6735a4

由 Oleg Nesterov 提交于 6月 20, 2013

kprobe_perf_func() and kretprobe_perf_func() pass addr=ip to
perf_trace_buf_submit() for no reason.

This sets perf_sample_data->addr for PERF_SAMPLE_ADDR, we already
have perf_sample_data->ip initialized if PERF_SAMPLE_IP.

Link: http://lkml.kernel.org/r/20130620173811.GA13161@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

cf6735a4

tracing: Use flag buffer_disabled for irqsoff tracer · 10246fa3

由 Steven Rostedt (Red Hat) 提交于 7月 01, 2013

If the ring buffer is disabled and the irqsoff tracer records a trace it
will clear out its buffer and lose the data it had previously recorded.

Currently there's a callback when writing to the tracing_of file, but if
tracing is disabled via the function tracer trigger, it will not inform
the irqsoff tracer to stop recording.

By using the "mirror" flag (buffer_disabled) in the trace_array, that keeps
track of the status of the trace_array's buffer, it gives the irqsoff
tracer a fast way to know if it should record a new trace or not.
The flag may be a little behind the real state of the buffer, but it
should not affect the trace too much. It's more important for the irqsoff
tracer to be fast.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

10246fa3

tracing/kprobes: Turn trace_probe->files into list_head · b04d52e3

由 Oleg Nesterov 提交于 6月 20, 2013

I think that "ftrace_event_file *trace_probe[]" complicates the
code for no reason, turn it into list_head to simplify the code.
enable_trace_probe() no longer needs synchronize_sched().

This needs the extra sizeof(list_head) memory for every attached
ftrace_event_file, hopefully not a problem in this case.

Link: http://lkml.kernel.org/r/20130620173814.GA13165@redhat.comAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

b04d52e3

tracing: Fix disabling of soft disable · 3baa5e4c

由 Tom Zanussi 提交于 6月 29, 2013

The comment on the soft disable 'disable' case of
__ftrace_event_enable_disable() states that the soft disable bit
should be cleared in that case, but currently only the soft mode bit
is actually cleared.

This essentially leaves the standard non-soft-enable enable/disable
paths as the only way to clear the soft disable flag, but the soft
disable bit should also be cleared when removing a trigger with '!'.

Also, the SOFT_DISABLED bit should never be set if SOFT_MODE is
cleared.

This fixes the above discrepancies.

Link: http://lkml.kernel.org/r/b9c68dd50bc07019e6c67d3f9b29be4ef1b2badb.1372479499.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3baa5e4c

tracing: Simplify code for showing of soft disabled flag · a4390596

由 Tom Zanussi 提交于 6月 29, 2013

Rather than enumerating each permutation, build the enable state
string up from the combination of states. This also allows for the
simpler addition of more states.

Link: http://lkml.kernel.org/r/9aff5af6dee2f5a40ca30df41c39d5f33e998d7a.1372479499.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a4390596

tracing/kprobes: Kill probe_enable_lock · 3fe3d619

由 Oleg Nesterov 提交于 6月 20, 2013

enable_trace_probe() and disable_trace_probe() should not worry about
serialization, the caller (perf_trace_init or __ftrace_set_clr_event)
holds event_mutex.

They are also called by kprobe_trace_self_tests_init(), but this __init
function can't race with itself or trace_events.c

And note that this code depended on event_mutex even before 41a7dd42
which introduced probe_enable_lock. In fact it assumes that the caller
kprobe_register() can never race with itself. Otherwise, say, tp->flags
manipulations are racy.

Link: http://lkml.kernel.org/r/20130620173809.GA13158@redhat.comAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3fe3d619

tracing/kprobes: Avoid perf_trace_buf_*() if ->perf_events is empty · 288e984e

由 Oleg Nesterov 提交于 6月 20, 2013

perf_trace_buf_prepare() + perf_trace_buf_submit() make no sense
if this task/CPU has no active counters. Change kprobe_perf_func()
and kretprobe_perf_func() to check call->perf_events beforehand
and return if this list is empty.

For example, "perf record -e some_probe -p1". Only /sbin/init will
report, all other threads which hit the same probe will do
perf_trace_buf_prepare/perf_trace_buf_submit just to realize that
nobody wants perf_swevent_event().

Link: http://lkml.kernel.org/r/20130620173806.GA13151@redhat.comAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

288e984e

tracing: Failed to create system directory · 6e94a780

由 Steven Rostedt 提交于 6月 27, 2013

Running the following:

 # cd /sys/kernel/debug/tracing
 # echo p:i do_sys_open > kprobe_events
 # echo p:j schedule >> kprobe_events
 # cat kprobe_events
p:kprobes/i do_sys_open
p:kprobes/j schedule
 # echo p:i do_sys_open >> kprobe_events
 # cat kprobe_events
p:kprobes/j schedule
p:kprobes/i do_sys_open
 # ls /sys/kernel/debug/tracing/events/kprobes/
enable  filter  j

Notice that the 'i' is missing from the kprobes directory.

The console produces:

"Failed to create system directory kprobes"

This is because kprobes passes in a allocated name for the system
and the ftrace event subsystem saves off that name instead of creating
a duplicate for it. But the kprobes may free the system name making
the pointer to it invalid.

This bug was introduced by 92edca07 "tracing: Use direct field, type
and system names" which switched from using kstrdup() on the system name
in favor of just keeping apointer to it, as the internal ftrace event
system names are static and exist for the life of the computer being booted.

Instead of reverting back to duplicating system names again, we can use
core_kernel_data() to determine if the passed in name was allocated or
static. Then use the MSB of the ref_count to be a flag to keep track if
the name was allocated or not. Then we can still save from having to duplicate
strings that will always exist, but still copy the ones that may be freed.

Cc: stable@vger.kernel.org # 3.10
Reported-by: N"zhangwei(Jovi)" <jovi.zhangwei@huawei.com>
Reported-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6e94a780

20 6月, 2013 4 次提交

ftrace: Fix stddev calculation in function profiler · 52d85d76

由 Juri Lelli 提交于 6月 12, 2013

When FUNCTION_GRAPH_TRACER is enabled, ftrace can profile kernel functions
and print basic statistics about them. Unfortunately, running stddev
calculation is wrong. This patch corrects it implementing Welford’s method:

s^2 = 1 / (n * (n-1)) * (n * \Sum (x_i)^2 - (\Sum x_i)^2) .
Link: http://lkml.kernel.org/r/1371031398-24048-1-git-send-email-juri.lelli@gmail.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

52d85d76

tracing/kprobes: Remove unnecessary checking of trace_probe_is_enabled · 195a84d9

由 zhangwei(Jovi) 提交于 6月 14, 2013

Since tp->flags assignment was moved into function enable_trace_probe(),
there is no need to use trace_probe_is_enabled to check flags
in the same function.

Remove the unnecessary checking.

Link: http://lkml.kernel.org/r/51BA7B9E.3040807@huawei.comAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

195a84d9

tracing: Disable tracing on warning · de7edd31

由 Steven Rostedt (Red Hat) 提交于 6月 14, 2013

Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.

This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

de7edd31

tracing: Add binary '&' filter for events · 1a891cf1

由 Steven Rostedt 提交于 6月 12, 2013

There are some cases when filtering on a set flag of a field of a tracepoint
is useful. But currently the only filtering commands for numbered fields
is ==, !=, <, <=, >, >=. This does not help when you just want to trace if
a specific flag is set. For example:

 > # sudo trace-cmd record -e brcmfmac:brcmf_dbg -f 'level & 0x40000'
 > disable all
 > enable brcmfmac:brcmf_dbg
 > path = /sys/kernel/debug/tracing/events/brcmfmac/brcmf_dbg/enable
 > (level & 0x40000)
 > ^
 > parse_error: Invalid operator
 >

When trying to trace brcmf_dbg when level has its 1 << 18 bit set, the
filter fails to perform.

By allowing a binary '&' operation, this gives the user the ability to
test a bit.

Note, a binary '|' is not added, as it doesn't make sense as fields must
be compared to constants (for now), and ORing a constant will always return
true.

Link: http://lkml.kernel.org/r/1371057385.9844.261.camel@gandalf.local.homeSuggested-by: NArend van Spriel <arend@broadcom.com>
Tested-by: NArend van Spriel <arend@broadcom.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1a891cf1

12 6月, 2013 6 次提交

tracing: Do not call kmem_cache_free() on allocation failure · aaf6ac0f

由 Namhyung Kim 提交于 6月 07, 2013

There's no point calling it when _alloc() failed.

Link: http://lkml.kernel.org/r/1370585268-29169-1-git-send-email-namhyung@kernel.orgSigned-off-by: NNamhyung Kim <namhyung@kernel.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

aaf6ac0f

ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched() · 7614c3dc

由 Steven Rostedt 提交于 5月 28, 2013

The function tracer uses preempt_disable/enable_notrace() for
synchronization between reading registered ftrace_ops and unregistering
them.

Most of the ftrace_ops are global permanent structures that do not
require this synchronization. That is, ops may be added and removed from
the hlist but are never freed, and wont hurt if a synchronization is
missed.

But this is not true for dynamically created ftrace_ops or control_ops,
which are used by the perf function tracing.

The problem here is that the function tracer can be used to trace
kernel/user context switches as well as going to and from idle.
Basically, it can be used to trace blind spots of the RCU subsystem.
This means that even though preempt_disable() is done, a
synchronize_sched() will ignore CPUs that haven't made it out of user
space or idle. These can include functions that are being traced just
before entering or exiting the kernel sections.

To implement the RCU synchronization, instead of using
synchronize_sched() the use of schedule_on_each_cpu() is performed. This
means that when a dynamically allocated ftrace_ops, or a control ops is
being unregistered, all CPUs must be touched and execute a ftrace_sync()
stub function via the work queues. This will rip CPUs out from idle or
in dynamic tick mode. This only happens when a user disables perf
function tracing or other dynamically allocated function tracers, but it
allows us to continue to debug RCU and context tracking with function
tracing.

Link: http://lkml.kernel.org/r/1369785676.15552.55.camel@gandalf.local.home

Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7614c3dc

tracing: Fix file mode of free_buffer · 238ae93d

由 Wang YanQing 提交于 5月 26, 2013

Commit 4f271a2a
(tracing: Add a proc file to stop tracing and free buffer)
implement a method to free up ring buffer in kernel memory
in the release code path of free_buffer's fd.

Then we don't need read/write support for free_buffer,
indeed we just have a dummy write fop, and don't implement read fop.

So the 0200 is more reasonable file mode for free_buffer than
the current file mode 0644.

Link: http://lkml.kernel.org/r/20130526085201.GA3183@udknightAcked-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Acked-by: NDavid Sharp <dhsharp@google.com>
Signed-off-by: NWang YanQing <udknight@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

238ae93d

tracing/trivial: Consolidate error return condition · 8092e808

由 Harsh Prateek Bora 提交于 5月 24, 2013

Consolidate the checks for !enabled and !param to return -EINVAL
in event_enable_func().

Link: http://lkml.kernel.org/r/1369380137-12452-1-git-send-email-harsh@linux.vnet.ibm.comSigned-off-by: NHarsh Prateek Bora <harsh@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8092e808

tracing: Add function probe to trigger a ftrace dump of current CPU trace · 90e3c03c

由 Steven Rostedt (Red Hat) 提交于 4月 30, 2013

Add the "cpudump" command to have the current CPU ftrace buffer dumped
to console if a function is hit. This is useful when debugging a
tripple fault, where you have an idea of a function that is called
just before the tripple fault occurs, and can tell ftrace to dump its
content out to the console before it continues.

This differs from the "dump" command as it only dumps the content of
the ring buffer for the currently executing CPU, and does not show
the contents of the other CPUs.

Format is:

  <function>:cpudump

echo 'bad_address:cpudump' > /debug/tracing/set_ftrace_filter

To remove this:

echo '!bad_address:cpudump' > /debug/tracing/set_ftrace_filter
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

90e3c03c

tracing: Add function probe to trigger a ftrace dump to console · ad71d889

由 Steven Rostedt (Red Hat) 提交于 4月 30, 2013

Add the "dump" command to have the ftrace buffer dumped to console if
a function is hit. This is useful when debugging a tripple fault,
where you have an idea of a function that is called just before the
tripple fault occurs, and can tell ftrace to dump its content out
to the console before it continues.

Format is:

  <function>:dump

echo 'bad_address:dump' > /debug/tracing/set_ftrace_filter

To remove this:

echo '!bad_address:dump' > /debug/tracing/set_ftrace_filter
Requested-by: NLuis Claudio R. Goncalves <lclaudio@uudg.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ad71d889

09 6月, 2013 3 次提交

irqdomain: document the simple domain first_irq · 94a63da0

由 Linus Walleij 提交于 6月 06, 2013

The first_irq needs to be zero to get a linear domain and that
comes with special semantics. We want to simplify this going
forward but some documentation never hurts.
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

94a63da0

kernel/irq/irqdomain.c: before use 'irq_data', need check it whether valid. · 275e31b1

由 Chen Gang 提交于 5月 14, 2013

Since irq_data may be NULL, if so, we WARN_ON(), and continue, 'hwirq'
which related with 'irq_data' has to initialize later, or it will cause
issue.
Signed-off-by: NChen Gang <gang.chen@asianux.com>
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

275e31b1

irqdomain: export irq_domain_add_simple · 346dbb79

由 Arnd Bergmann 提交于 4月 25, 2013

All other irq_domain_add_* functions are exported already, and apparently
this one got left out by mistake, which causes build errors for ARM
allmodconfig kernels:

ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-rcar.ko] undefined!
ERROR: "irq_domain_add_simple" [drivers/gpio/gpio-em.ko] undefined!
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NSimon Horman <horms+renesas@verge.net.au>
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

346dbb79

07 6月, 2013 1 次提交

tracing: Use current_uid() for critical time tracing · f17a5194

由 Steven Rostedt (Red Hat) 提交于 5月 30, 2013

The irqsoff tracer records the max time that interrupts are disabled.
There are hooks in the assembly code that calls back into the tracer when
interrupts are disabled or enabled.

When they are enabled, the tracer checks if the amount of time they
were disabled is larger than the previous recorded max interrupts off
time. If it is, it creates a snapshot of the currently running trace
to store where the last largest interrupts off time was held and how
it happened.

During testing, this RCU lockdep dump appeared:

[ 1257.829021] ===============================
[ 1257.829021] [ INFO: suspicious RCU usage. ]
[ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G        W
[ 1257.829021] -------------------------------
[ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1257.829021]
[ 1257.829021] other info that might help us debug this:
[ 1257.829021]
[ 1257.829021]
[ 1257.829021] RCU used illegally from idle CPU!
[ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
[ 1257.829021] RCU used illegally from extended quiescent state!
[ 1257.829021] 2 locks held by trace-cmd/4831:
[ 1257.829021]  #0:  (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
[ 1257.829021]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
[ 1257.829021]
[ 1257.829021] stack backtrace:
[ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G        W    3.10.0-rc1-test+ #171
[ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 1257.829021]  0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
[ 1257.829021]  ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
[ 1257.829021]  ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
[ 1257.829021] Call Trace:
[ 1257.829021]  [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
[ 1257.829021]  [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
[ 1257.829021]  [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
[ 1257.829021]  [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
[ 1257.829021]  [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
[ 1257.829021]  [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
[ 1257.829021]  [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
[ 1257.829021]  [<ffffffff811002b9>] user_enter+0xfd/0x107
[ 1257.829021]  [<ffffffff810029b4>] do_notify_resume+0x92/0x97
[ 1257.829021]  [<ffffffff8154bdca>] int_signal+0x12/0x17

What happened was entering into the user code, the interrupts were enabled
and a max interrupts off was recorded. The trace buffer was saved along with
various information about the task: comm, pid, uid, priority, etc.

The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock()
to retrieve the data, and this happened to happen where RCU is blind (user_enter).

As only the preempt and irqs off tracers can have this happen, and they both
only have the tsk == current, if tsk == current, use current_uid() instead of
task_uid(), as current_uid() does not use RCU as only current can change its uid.

This fixes the RCU suspicious splat.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f17a5194

30 5月, 2013 1 次提交

tracing: Fix bad parameter passed in branch selftest · 0184d50f

由 Steven Rostedt (Red Hat) 提交于 5月 29, 2013

The branch selftest calls trace_test_buffer(), but with the new code
it expects the first parameter to be a pointer to a struct trace_buffer.
All self tests were changed but the branch selftest was missed.

This caused either a crash or failed test when the branch selftest was
enabled.

Link: http://lkml.kernel.org/r/20130529141333.GA24064@localhostReported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0184d50f

29 5月, 2013 4 次提交

ftrace: Use the rcu _notrace variants for rcu_dereference_raw() and friends · 1bb539ca

由 Steven Rostedt 提交于 5月 28, 2013

As rcu_dereference_raw() under RCU debug config options can add quite a
bit of checks, and that tracing uses rcu_dereference_raw(), these checks
happen with the function tracer. The function tracer also happens to trace
these debug checks too. This added overhead can livelock the system.

Have the function tracer use the new RCU _notrace equivalents that do
not do the debug checks for RCU.

Link: http://lkml.kernel.org/r/20130528184209.467603904@goodmis.orgAcked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1bb539ca

cgroup: warn about mismatching options of a new mount of an existing hierarchy · 2a0ff3fb

由 Jeff Liu 提交于 5月 26, 2013

With the new __DEVEL__sane_behavior mount option was introduced,
if the root cgroup is alive with no xattr function, to mount a
new cgroup with xattr will be rejected in terms of design which
just fine.  However, if the root cgroup does not mounted with
__DEVEL__sane_hehavior, to create a new cgroup with xattr option
will succeed although after that the EA function does not works
as expected but will get ENOTSUPP for setting up attributes under
either cgroup. e.g.

setfattr: /cgroup2/test: Operation not supported

Instead of keeping silence in this case, it's better to drop a log
entry in warning level.  That would be helpful to understand the
reason behind the scene from the user's perspective, and this is
essentially an improvement does not break the backward compatibilities.

With this fix, above mount attemption will keep up works as usual but
the following line cound be found at the system log:

[ ...] cgroup: new mount options do not match the existing superblock

tj: minor formatting / message updates.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

2a0ff3fb

timekeeping: Correct run-time detection of persistent_clock. · 0d6bd995

由 Zoran Markovic 提交于 5月 17, 2013

Since commit 31ade306, timekeeping_init()
checks for presence of persistent clock by attempting to read a non-zero
time value. This is an issue on platforms where persistent_clock (instead
is implemented as a free-running counter (instead of an RTC) starting
from zero on each boot and running during suspend. Examples are some ARM
platforms (e.g. PandaBoard).

An attempt to read such a clock during timekeeping_init() may return zero
value and falsely declare persistent clock as missing. Additionally, in
the above case suspend times may be accounted twice (once from
timekeeping_resume() and once from rtc_resume()), resulting in a gradual
drift of system time.

This patch does a run-time correction of the issue by doing the same check
during timekeeping_suspend().

A better long-term solution would have to return error when trying to read
non-existing clock and zero when trying to read an uninitialized clock, but
that would require changing all persistent_clock implementations.

This patch addresses the immediate breakage, for now.

Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Feng Tang <feng.tang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NZoran Markovic <zoran.markovic@linaro.org>
[jstultz: Tweaked commit message and subject]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

0d6bd995

ntp: Remove unused variable flags in __hardpps · aa848233

由 Geert Uytterhoeven 提交于 5月 03, 2013

kernel/time/ntp.c: In function ‘__hardpps’:
kernel/time/ntp.c:877: warning: unused variable ‘flags’

commit a076b214 ("ntp: Remove ntp_lock,
using the timekeeping locks to protect ntp state") removed its users,
but not the actual variable.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

aa848233

28 5月, 2013 2 次提交

ring-buffer: Do not poll non allocated cpu buffers · 6721cb60

由 Steven Rostedt (Red Hat) 提交于 5月 23, 2013

The tracing infrastructure sets up for possible CPUs, but it uses
the ring buffer polling, it is possible to call the ring buffer
polling code with a CPU that hasn't been allocated. This will cause
a kernel oops when it access a ring buffer cpu buffer that is part
of the possible cpus but hasn't been allocated yet as the CPU has never
been online.
Reported-by: NMauro Carvalho Chehab <mchehab@redhat.com>
Tested-by: NMauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

6721cb60

tick: Cure broadcast false positive pending bit warning · 2938d275

由 Thomas Gleixner 提交于 5月 28, 2013

commit 26517f3e (tick: Avoid programming the local cpu timer if
broadcast pending) added a warning if the cpu enters broadcast mode
again while the pending bit is still set. Meelis reported that the
warning triggers. There are two corner cases which have been not
considered:

1) cpuidle calls clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
   twice. That can result in the following scenario

   CPU0                    CPU1
                           cpuidle_idle_call()
                             clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                               set cpu in tick_broadcast_oneshot_mask

   broadcast interrupt
     event expired for cpu1
     set pending bit

                             acpi_idle_enter_simple()
                               clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER)
                                 WARN_ON(pending bit)

  Move the WARN_ON into the section where we enter broadcast mode so
  it wont provide false positives on the second call.

2) safe_halt() enables interrupts, so a broadcast interrupt can be
   delivered befor the broadcast mode is disabled. That sets the
   pending bit for the CPU which receives the broadcast
   interrupt. Though the interrupt is delivered right away from the
   broadcast handler and leaves the pending bit stale.

   Clear the pending bit for the current cpu in the broadcast handler.
Reported-and-tested-by: NMeelis Roos <mroos@linux.ee>
Cc: Len Brown <lenb@kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1305271841130.4220@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

2938d275

25 5月, 2013 1 次提交

auditfilter.c: fix kernel-doc warnings · 387b8b3e

由 Randy Dunlap 提交于 5月 24, 2013

Fix kernel-doc warnings in kernel/auditfilter.c:

Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter'
Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter'
Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter'
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

387b8b3e

24 5月, 2013 1 次提交

cgroup: fix a subtle bug in descendant pre-order walk · 7805d000

由 Tejun Heo 提交于 5月 24, 2013

When cgroup_next_descendant_pre() initiates a walk, it checks whether
the subtree root doesn't have any children and if not returns NULL.
Later code assumes that the subtree isn't empty.  This is broken
because the subtree may become empty inbetween, which can lead to the
traversal escaping the subtree by walking to the sibling of the
subtree root.

There's no reason to have the early exit path.  Remove it along with
the later assumption that the subtree isn't empty.  This simplifies
the code a bit and fixes the subtle bug.

While at it, fix the comment of cgroup_for_each_descendant_pre() which
was incorrectly referring to ->css_offline() instead of
->css_online().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org

7805d000

23 5月, 2013 1 次提交

tracing: Fix crash when ftrace=nop on the kernel command line · ca164318

由 Steven Rostedt (Red Hat) 提交于 5月 23, 2013

If ftrace=<tracer> is on the kernel command line, when that tracer is
registered, it will be initiated by tracing_set_tracer() to execute that
tracer.

The nop tracer is just a stub tracer that is used to have no tracer
enabled. It is assigned at early bootup as it is the default tracer.

But if ftrace=nop is on the kernel command line, the registering of the
nop tracer will call tracing_set_tracer() which will try to execute
the nop tracer. But it expects tr->current_trace to be assigned something
as it usually is assigned to the nop tracer. As it hasn't been assigned
to anything yet, it causes the system to crash.

The simple fix is to move the tr->current_trace = nop before registering
the nop tracer. The functionality is still the same as the nop tracer
doesn't do anything anyway.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ca164318