提交 · e360adbe29241a0194e10e20595360dd7b98a2b3 · openeuler / raspberrypi-kernel

19 10月, 2010 3 次提交

irq_work: Add generic hardirq context callbacks · e360adbe

由 Peter Zijlstra 提交于 10月 14, 2010

Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NKyle McMartin <kyle@mcmartin.ca>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: NHuang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e360adbe

perf_events: Fix transaction recovery in group_sched_in() · 8e5fc1a7

由 Stephane Eranian 提交于 10月 15, 2010

The group_sched_in() function uses a transactional approach to schedule
a group of events. In a group, either all events can be scheduled or
none are. To schedule each event in, the function calls event_sched_in().
In case of error, event_sched_out() is called on each event in the group.

The problem is that event_sched_out() does not completely cancel the
effects of event_sched_in(). Furthermore event_sched_out() changes the
state of the event as if it had run which is not true is this particular
case.

Those inconsistencies impact time tracking fields and may lead to events
in a group not all reporting the same time_enabled and time_running values.
This is demonstrated with the example below:

$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
  11423 baclears (32.85% scaling, ena=829181, run=556827)
   7671 baclears (0.00% scaling, ena=556827, run=556827)

2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
  11705 baclears (57.83% scaling, ena=962822, run=405995)
  11705 baclears (57.83% scaling, ena=962822, run=405995)

Notice that in the first group, the last baclears event does not
report the same timings as its siblings.

This issue comes from the fact that tstamp_stopped is updated
by event_sched_out() as if the event had actually run.

To solve the issue, we must ensure that, in case of error, there is
no change in the event state whatsoever. That means timings must
remain as they were when entering group_sched_in().

To do this we defer updating tstamp_running until we know the
transaction succeeded. Therefore, we have split event_sched_in()
in two parts separating the update to tstamp_running.

Similarly, in case of error, we do not want to update tstamp_stopped.
Therefore, we have split event_sched_out() in two parts separating
the update to tstamp_stopped.

With this patch, we now get the following output:

$ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
  11243 baclears (71.75% scaling, ena=1093330, run=308841)
  11243 baclears (71.75% scaling, ena=1093330, run=308841)

1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
   9253 baclears (0.00% scaling, ena=784489, run=784489)
   9253 baclears (0.00% scaling, ena=784489, run=784489)

Note that the uneven timing between groups is a side effect of
the process spending most of its time sleeping, i.e., not enough
event rotations (but that's a separate issue).
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4cb86b4c.41e9d80a.44e9.3e19@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8e5fc1a7

perf_events: Fix bogus context time tracking · c530ccd9

由 Stephane Eranian 提交于 10月 15, 2010

You can only call update_context_time() when the context
is active, i.e., the thread it is attached to is still running.

However, perf_event_read() can be called even when the context
is inactive, e.g., user read() the counters. The call to
update_context_time() must be conditioned on the status of
the context, otherwise, bogus time_enabled, time_running may
be returned. Here is an example on AMD64. The task program
is an example from libpfm4. The -p prints deltas every 1s.

$ task -p -e cpu_clk_unhalted sleep 5
    2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)

Whereas if you don't read deltas, e.g., no call to perf_event_read() until
the process terminates:

$ task -e cpu_clk_unhalted sleep 5
    2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)

Notice that time_enable, time_running are bogus in the first example
causing bogus scaling.

This patch fixes the problem, by conditionally calling update_context_time()
in perf_event_read().
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c530ccd9

15 10月, 2010 2 次提交

ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT · cf4db259

由 Steven Rostedt 提交于 10月 14, 2010

The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.
Suggested-by: NIngo Molnar <mingo@elte.hu>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

cf4db259

ftrace/x86: Add support for C version of recordmcount · 72441cb1

由 Steven Rostedt 提交于 10月 13, 2010

This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.

After verifying this works, other archs can add:

 HAVE_C_MCOUNT_RECORD

in its Kconfig and it will use the C version of recordmcount
instead of the perl version.

Cc: <linux-arch@vger.kernel.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: linux-kbuild@vger.kernel.org
Cc: John Reiser <jreiser@bitwagon.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

72441cb1

14 10月, 2010 1 次提交

kprobes: Fix selftest to clear flags field for reusing probes · fd02e6f7

由 Masami Hiramatsu 提交于 10月 14, 2010

Fix selftest to clear flags field for reusing probes
because the flags field can be modified by Kprobes.
This also set NULL to kprobe.addr instead of 0.
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20101014031024.4100.50107.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fd02e6f7

13 10月, 2010 1 次提交

tracing: Fix function-graph build warning on 32-bit · 14cae9bd

由 Borislav Petkov 提交于 9月 29, 2010

Fix

kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast

when building 36-rc6 on a 32-bit due to the strict type check failing
in the min() macro.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20100929080823.GA13595@liondog.tnic>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

14cae9bd

11 10月, 2010 1 次提交

perf: New helper function for pmu name · 84c79910

由 Matt Fleming 提交于 10月 03, 2010

Introduce perf_pmu_name() helper function that returns the name of the
pmu. This gives us a generic way to get the name of a pmu regardless of
how an architecture identifies it internally.
Signed-off-by: NMatt Fleming <matt@console-pimps.org>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul Mundt <lethal@linux-sh.org>
Signed-off-by: NRobert Richter <robert.richter@amd.com>

84c79910

06 10月, 2010 1 次提交

modules: Fix module_bug_list list corruption race · 5336377d

由 Linus Torvalds 提交于 10月 05, 2010

With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.

However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling.  That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.

Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.

So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.

Future fixups:
 - move the module list handling code into kernel/module.c where it
   belongs.
 - get rid of 'module_bug_list' and just use the regular list of modules
   (called 'modules' - imagine that) that we already create and maintain
   for other reasons.
Reported-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5336377d

04 10月, 2010 1 次提交

perf_events: Fix invalid pointer when pid is invalid · 540804b5

由 Stephane Eranian 提交于 10月 04, 2010

This patch fixes an error in perf_event_open() when the pid
provided by the user is invalid. find_lively_task_by_vpid()
does not return NULL on error but an error code. Without the
fix the error code was silently passed to find_get_context()
which would eventually cause a invalid pointer dereference.
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
LKML-Reference: <4ca9a5d1.e8e9d80a.3dbb.ffff8f2e@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

540804b5

02 10月, 2010 1 次提交

kfifo: fix scatterlist usage · 399f1e30

由 Ira W. Snyder 提交于 9月 30, 2010

The kfifo_dma family of functions use sg_mark_end() on the last element in
their scatterlist. This forces use of a fresh scatterlist for each DMA
operation, which makes recycling a single scatterlist impossible.

Change the behavior of the kfifo_dma functions to match the usage of the
dma_map_sg function. This means that users must respect the returned
nents value. The sample code is updated to reflect the change.

This bug is trivial to cause: call kfifo_dma_in_prepare() such that it
prepares a scatterlist with a single entry comprising the whole fifo.
This is the case when you map the entirety of a newly created empty fifo.
This causes the setup_sgl() function to mark the first scatterlist entry
as the end of the chain, no matter what comes after it.

Afterwards, add and remove some data from the fifo such that another call
to kfifo_dma_in_prepare() will create two scatterlist entries. It returns
nents=2. However, due to the previous sg_mark_end() call, sg_is_last()
will now return true for the first scatterlist element. This causes the
sample code to print a single scatterlist element when it should print
two.

By removing the call to sg_mark_end(), we make the API as similar as
possible to the DMA mapping API. All users are required to respect the
returned nents.
Signed-off-by: NIra W. Snyder <iws@ovro.caltech.edu>
Cc: Stefani Seibold <stefani@seibold.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

399f1e30

23 9月, 2010 5 次提交

rmap: fix walk during fork · a247c3a9

由 Andrea Arcangeli 提交于 9月 22, 2010

The below bug in fork led to the rmap walk finding the parent huge-pmd
twice instead of just once, because the anon_vma_chain objects of the
child vma still point to the vma->vm_mm of the parent.

The patch fixes it by making the rmap walk accurate during fork.  It's not
a big deal normally but it worth being accurate considering the cost is
the same.
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Acked-by: NJohannes Weiner <jweiner@redhat.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a247c3a9

jump label: Tracepoint support for jump labels · 8f7b50c5

由 Jason Baron 提交于 9月 17, 2010

Make use of the jump label infrastructure for tracepoints.
Signed-off-by: NJason Baron <jbaron@redhat.com>
LKML-Reference: <a9ba2056e2c9cf332c3c300b577463ce66ff23a8.1284733808.git.jbaron@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8f7b50c5

jump label: Add jump_label_text_reserved() to reserve jump points · 4c3ef6d7

由 Jason Baron 提交于 9月 17, 2010

Add a jump_label_text_reserved(void *start, void *end), so that other
pieces of code that want to modify kernel text, can first verify that
jump label has not reserved the instruction.
Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: NJason Baron <jbaron@redhat.com>
LKML-Reference: <06236663a3a7b1c1f13576bb9eccb6d9c17b7bfe.1284733808.git.jbaron@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4c3ef6d7

jump label: Initialize workqueue tracepoints *before* they are registered · e0cf0cd4

由 Jason Baron 提交于 9月 17, 2010

Initialize the workqueue data structures *before* they are registered
so that they are ready for callbacks.
Signed-off-by: NJason Baron <jbaron@redhat.com>
LKML-Reference: <e3a3383fc370ac7086625bebe89d9480d7caf372.1284733808.git.jbaron@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e0cf0cd4

jump label: Base patch for jump label · bf5438fc

由 Jason Baron 提交于 9月 17, 2010

base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
assembly gcc mechanism, we can now branch to labels from an 'asm goto'
statment. This allows us to create a 'no-op' fastpath, which can subsequently
be patched with a jump to the slowpath code. This is useful for code which
might be rarely used, but which we'd like to be able to call, if needed.
Tracepoints are the current usecase that these are being implemented for.
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NJason Baron <jbaron@redhat.com>
LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com>

[ cleaned up some formating ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bf5438fc

21 9月, 2010 2 次提交

perf: Avoid RCU vs preemption assumptions · 41945f6c

由 Peter Zijlstra 提交于 9月 16, 2010

The per-pmu per-cpu context patch converted things from
get_cpu_var() to this_cpu_ptr(), but that only works if
rcu_read_lock() actually disables preemption, and since
there is no such guarantee, we need to fix that.

Use the newly introduced {get,put}_cpu_ptr().
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <20100917093009.308453028@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

41945f6c

sched: Fix nohz balance kick · f6c3f168

由 Suresh Siddha 提交于 9月 13, 2010

There's a situation where the nohz balancer will try to wake itself:

cpu-x is idle which is also ilb_cpu
got a scheduler tick during idle
and the nohz_kick_needed() in trigger_load_balance() checks for
rq_x->nr_running which might not be zero (because of someone waking a
task on this rq etc) and this leads to the situation of the cpu-x
sending a kick to itself.

And this can cause a lockup.

Avoid this by not marking ourself eligible for kicking.
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1284400941.2684.19.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f6c3f168

17 9月, 2010 5 次提交

perf: Undo the per cpu-context timer stuff · e9d2b064

由 Peter Zijlstra 提交于 9月 17, 2010

Revert the timer per cpu-context timers because of unfortunate
nohz interaction. Fixing that would have been somewhat ugly, so
go back to driving things from the regular tick. Provide a
jiffies interval feature for people who want slower rotations.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20100917093009.519845633@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e9d2b064

perf: Fix perf_event_exit_cpu_context() · 917bdd1c

由 Peter Zijlstra 提交于 9月 17, 2010

Use the right cpu-context.. spotted by preempt warning on
hot-unplug
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <20100917093009.461794357@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

917bdd1c

perf: Complete software pmu grouping · b04243ef

由 Peter Zijlstra 提交于 9月 17, 2010

Aside from allowing software events into a !software group,
allow adding !software events to pure software groups.

Once we've moved the software group and attached the first
!software event, the group will no longer be a pure software
group and hence no longer be eligible for movement, at which
point the straight ctx comparison is correct again.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100917093009.410784731@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b04243ef

perf_events: Fix broken event grouping · d14b12d7

由 Stephane Eranian 提交于 9月 17, 2010

Events were not grouped anymore. The reason was that in
perf_event_open(), the field event->group_leader was
initialized before the function looked up the group_fd
to find the event leader. This patch fixes this by
reordering the code correctly.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <20100917093009.360420946@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d14b12d7

hw breakpoints: Fix pid namespace bug · 068e35ee

由 Matt Helsley 提交于 9月 13, 2010

Hardware breakpoints can't be registered within pid namespaces
because tsk->pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker <fweisbec@gmail.com> for doing
the bulk of the work finding this bug.
Reported-by: NRobin Green <greenrd@greenrd.org>
Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: 2.6.33-2.6.35 <stable@kernel.org>
LKML-Reference: <f63454af09fb1915717251570423eb9ddd338340.1284407762.git.matthltc@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

068e35ee

15 9月, 2010 15 次提交

kprobes: Add sparse context annotations · 635c17c2

由 Namhyung Kim 提交于 9月 15, 2010

This removes following warnings when build with C=1

warning: context imbalance in 'kretprobe_hash_lock' - wrong count at exit
warning: context imbalance in 'kretprobe_table_lock' - wrong count at exit
warning: context imbalance in 'kretprobe_hash_unlock' - unexpected unlock
warning: context imbalance in 'kretprobe_table_unlock' - unexpected unlock
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-6-git-send-email-namhyung@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

635c17c2

kprobes: Make functions static · 6376b229

由 Namhyung Kim 提交于 9月 15, 2010

Make following (internal) functions static to make sparse
happier :-)

 * get_optimized_kprobe: only called from static functions
 * kretprobe_table_unlock: _lock function is static
 * kprobes_optinsn_template_holder: never called but holding asm code
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-4-git-send-email-namhyung@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6376b229

kprobes: Verify jprobe entry point · 05662bdb

由 Namhyung Kim 提交于 9月 15, 2010

Verify jprobe's entry point is a function entry point
using kallsyms' offset value.
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-3-git-send-email-namhyung@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

05662bdb

kprobes: Remove redundant address check · edbaadbe

由 Namhyung Kim 提交于 9月 15, 2010

Remove call to kernel_text_address() in register_jprobes()
because it is called right after in register_kprobe().
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-2-git-send-email-namhyung@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

edbaadbe

perf events: Clean up pid passing · 38a81da2

由 Matt Helsley 提交于 9月 13, 2010

The kernel perf event creation path shouldn't use find_task_by_vpid()
because a vpid exists in a specific namespace. find_task_by_vpid() uses
current's pid namespace which isn't always the correct namespace to use
for the vpid in all the places perf_event_create_kernel_counter() (and
thus find_get_context()) is called.

The goal is to clean up pid namespace handling and prevent bugs like:

	https://bugzilla.kernel.org/show_bug.cgi?id=17281

Instead of using pids switch find_get_context() to use task struct
pointers directly. The syscall is responsible for resolving the pid to
a task struct. This moves the pid namespace resolution into the syscall
much like every other syscall that takes pid parameters.
Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robin Green <greenrd@greenrd.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
LKML-Reference: <a134e5e392ab0204961fd1a62c84a222bf5874a9.1284407763.git.matthltc@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

38a81da2

perf events: Split out task search into helper · 2ebd4ffb

由 Matt Helsley 提交于 9月 13, 2010

Split out the code which searches for non-exiting tasks into its own
helper. Creating this helper not only makes the code slightly more
readable it prepares to move the search out of find_get_context() in
a subsequent commit.
Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robin Green <greenrd@greenrd.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
LKML-Reference: <561205417b450b8a4bf7488374541d64b4690431.1284407762.git.matthltc@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2ebd4ffb

hw breakpoints: Fix pid namespace bug · d958077d

由 Matt Helsley 提交于 9月 13, 2010

Hardware breakpoints can't be registered within pid namespaces
because tsk->pid is passed rather than the pid in the current
namespace.

(See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

This is a quick fix demonstrating the problem but is not the
best method of solving the problem since passing pids internally
is not the best way to avoid pid namespace bugs. Subsequent patches
will show a better solution.

Much thanks to Frederic Weisbecker <fweisbec@gmail.com> for doing the
bulk of the work finding this bug.
Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robin Green <greenrd@greenrd.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
LKML-Reference: <f63454af09fb1915717251570423eb9ddd338340.1284407762.git.matthltc@us.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d958077d

watchdog: Avoid kernel crash when disabling watchdog · d9ca07a0

由 Stephane Eranian 提交于 9月 14, 2010

In case you boot with the watchdog disabled, i.e., nowatchdog, then,
if you try to disable it via /proc/sys/kernel/watchdog, you get
a kernel crash. The reason is that you are trying to cancel a hrtimer
which has never been initialized.

This patch fixes this by skipping execution of
watchdog_disable_all_cpus() when the watchdog is marked
disabled from boot.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4c8f7a23.cae9d80a.2c11.0bb4@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d9ca07a0

sched: Fix user time incorrectly accounted as system time on 32-bit · e75e863d

由 Stanislaw Gruszka 提交于 9月 14, 2010

We have 32-bit variable overflow possibility when multiply in
task_times() and thread_group_times() functions. When the
overflow happens then the scaled utime value becomes erroneously
small and the scaled stime becomes i erroneously big.

Reported here:

 https://bugzilla.redhat.com/show_bug.cgi?id=633037
 https://bugzilla.kernel.org/show_bug.cgi?id=16559Reported-by: NMichael Chapman <redhat-bugzilla@very.puzzling.org>
Reported-by: NCiriaco Garcia de Celis <sysman@etherpilot.com>
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: <stable@kernel.org>  # 2.6.32.19+ (partially) and 2.6.33+
LKML-Reference: <20100914143513.GB8415@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e75e863d

tracing: Remove leftover FTRACE_ENABLE/DISABLE_MCOUNT enums · 79e406d7

由 Steven Rostedt 提交于 9月 14, 2010

The enums for FTRACE_ENABLE_MCOUNT and FTRACE_DISABLE_MCOUNT were
used as commands to ftrace_run_update_code(). But these commands
were used by the old nasty ftrace daemon that has long been slain.

This is a clean up patch to remove the references to these enums
and simplify the code a little.
Reported-by: NWu Zhangjin <wuzhangjin@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

79e406d7

tracing: Do not trace in irq when funcgraph-irq option is zero · b304d044

由 Steven Rostedt 提交于 9月 14, 2010

When the function graph tracer funcgraph-irq option is zero, disable
tracing in IRQs. This makes the option have two effects.

1) When reading the trace file, do not display the functions that
   happen in interrupt context (when detected)

2) [*new*] When recording a trace, skip those that are detected
   to be in interrupt by the 'in_irq()' function

Note, in_irq() is updated at irq_enter() and irq_exit(). There are
still functions that are recorded by the function graph tracer that
is in interrupt context but outside the irq_enter/exit() routines.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

b304d044

tracing: Add funcgraph-irq option for function graph tracer. · 2bd16212

由 Jiri Olsa 提交于 9月 07, 2010

It's handy to be able to disable the irq related output
and not to have to jump over each irq related code, when
you have no interrest in it.

The option is by default enabled, so there's no change to
current behaviour. It affects only the final output, so all
the irq related data stay in the ring buffer.
Signed-off-by: NJiri Olsa <jolsa@redhat.com>
LKML-Reference: <20100907145344.GC1912@jolsa.brq.redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2bd16212

compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5

由 H. Peter Anvin 提交于 9月 07, 2010

compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.
Reported-by: NBen Hawkes <hawkes@sota.gen.nz>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NChris Metcalf <cmetcalf@tilera.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NTony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@kernel.org>

c41d68a5

tracing: Fix reading of set_ftrace_filter across lists · 57c072c7

由 Steven Rostedt 提交于 9月 14, 2010

If we do:

 # cd /sys/kernel/debug
 # echo 'do_IRQ:traceon schedule:traceon sys_write:traceon' > \
    set_ftrace_filter
 # cat set_ftrace_filter

We get the following output:

 #### all functions enabled ####
 sys_write:traceon:unlimited
 schedule:traceon:unlimited
 do_IRQ:traceon:unlimited

This outputs two lists. One is the fact that all functions are
currently enabled for function tracing, the other has three probed
functions, which happen to have 'traceon' as their commands.

Currently, when reading the first list (functions enabled) the
seq_file code will receive a "NULL" from the t_next() function
causing it to exit early. This makes "read()" from userspace stop
reading the code at this boarder. Although read is allowed to do this,
some (broken) applications might consider this an end of file and
stop early.

This patch adds the start of the second list to t_next() when it
finishes the first list. It is a simple change and gives the
set_ftrace_filter file nicer reading ability.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

57c072c7

tracing: Keep track of set_ftrace_filter position and allow lseek again · 98c4fd04

由 Steven Rostedt 提交于 9月 10, 2010

This patch keeps track of the index within the elements of
set_ftrace_filter and if the position goes backwards, it nicely
resets and starts from the beginning again.

This allows for lseek and pread to work properly now.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

98c4fd04

14 9月, 2010 2 次提交

tracing: Replace typecasted void pointer in set_ftrace_filter code · 4aeb6967

由 Steven Rostedt 提交于 9月 09, 2010

The set_ftrace_filter uses seq_file and reads from two lists. The
pointer returned by t_next() can either be of type struct dyn_ftrace
or struct ftrace_func_probe. If there is a bug (there was one)
the wrong pointer may be used and the reference can cause an oops.

This patch makes t_next() and friends only return the iterator structure
which now has a pointer of type struct dyn_ftrace and struct
ftrace_func_probe. The t_show() can now test if the pointer is NULL or
not and if the pointer exists, it is guaranteed to be of the correct type.

Now if there's a bug, only wrong data will be shown but not an oops.

Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4aeb6967

tracing: Do not reset *pos in set_ftrace_filter · 2bccfffd

由 Steven Rostedt 提交于 9月 09, 2010

After the filtered functions are read, the probed functions are read
from the hash in set_ftrace_filter. When the hashed probed functions
are read, the *pos passed in is reset. Instead of modifying the pos
given to the read function, just record the pos where the filtered
functions ended and subtract from that.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2bccfffd