提交 · 64ce312618ef0e11d88def80effcefd1b59fdb1e · openeuler / raspberrypi-kernel

29 5月, 2011 9 次提交

perf: De-schedule a task context when removing the last event · 64ce3126

由 Peter Zijlstra 提交于 4月 09, 2011

Since perf_install_in_context() will now install a context when we
add the first event, we can de-schedule the context when the last
event is removed.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192142.090431763@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

64ce3126

perf: Change close() semantics for group events · e03a9a55

由 Peter Zijlstra 提交于 4月 09, 2011

In order to always call list_del_event() on the correct cpu if the
event is part of an active context and avoid having to do two IPIs,
change the close() semantics slightly.

The current perf_event_disable() call would disable a whole group if
the event that's being closed is the group leader, whereas the new
code keeps the group siblings enabled.

People should not rely on this behaviour and I don't think they do,
but in case we find they do, the fix is easy and we have to take the
double IPI cost.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Link: http://lkml.kernel.org/r/20110409192142.038377551@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

e03a9a55

perf: Collect the schedule-in rules in one function · dce5855b

由 Peter Zijlstra 提交于 4月 09, 2011

This was scattered out - refactor it into a single function.
No change in functionality.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.979862055@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

dce5855b

perf: Change and simplify ctx::is_active semantics · db24d33e

由 Peter Zijlstra 提交于 4月 09, 2011

Instead of tracking if a context is active or not, track which events
of the context are active. By making it a bitmask of
EVENT_PINNED|EVENT_FLEXIBLE we can simplify some of the scheduling
routines since it can avoid adding events that are already active.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.930282378@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

db24d33e

perf: Simplify and fix __perf_install_in_context() · 2c29ef0f

由 Peter Zijlstra 提交于 4月 09, 2011

Currently __perf_install_in_context() will try and schedule in the
event irrespective of our event scheduling rules, that is, we try to
schedule CPU-pinned, TASK-pinned, CPU-flexible, TASK-flexible, but
when creating a new event we simply try and schedule it on top of
whatever is already on the PMU, this can lead to errors for pinned
events.

Therefore, simplify things and simply schedule everything out, add the
event to the corresponding context and schedule everything back in.

This also nicely handles the case where with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI can come right in the middle
of schedule, before we managed to call perf_event_task_sched_in().
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.870894224@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

2c29ef0f

perf: Remove task_ctx_sched_in() · 04dc2dbb

由 Peter Zijlstra 提交于 4月 09, 2011

Make task_ctx_sched_*() imply EVENT_ALL, since anything less will not
actually have scheduled the task in/out at all.

Since there's no site that schedules all of a task in (due to the
interleave with flexible cpuctx) we can remove this function.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.817893268@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

04dc2dbb

perf: Optimize event scheduling locking · facc4307

由 Peter Zijlstra 提交于 4月 09, 2011

Currently we only hold one ctx->lock at a time, which results in us
flipping back and forth between cpuctx->ctx.lock and task_ctx->lock.

Avoid this and gain large atomic regions by holding both locks. We
nest the task lock inside the cpu lock, since with task scheduling we
might have to change task ctx while holding the cpu ctx lock.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.769881865@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

facc4307

perf: Clean up 'ctx' reference counting · 9137fb28

由 Peter Zijlstra 提交于 4月 09, 2011

Small cleanup to how we refcount in find_get_context(), this also
allows us to use put_ctx() to free things instead of using kfree().
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.719340481@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

9137fb28

perf: Optimize ctx_sched_out() · 075e0b00

由 Peter Zijlstra 提交于 4月 09, 2011

Oleg noted that ctx_sched_out() disables the PMU even though it might
not actually do something, avoid needless PMU-disabling.
Reported-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110409192141.665385503@chello.nlSigned-off-by: NIngo Molnar <mingo@elte.hu>

075e0b00

28 5月, 2011 1 次提交

perf: Fix SIGIO handling · f506b3dc

由 Peter Zijlstra 提交于 5月 26, 2011

Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So
explicitly push the wakeup (including signals) when requested.
Reported-by: NVince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

f506b3dc

04 5月, 2011 1 次提交

perf events: Clean up definitions and initializers, update copyrights · e7e7ee2e

由 Ingo Molnar 提交于 5月 04, 2011

Fix a few inconsistent style bits that were added over the past few
months.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-yv4hwf9yhnzoada8pcpb3a97@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

e7e7ee2e

03 5月, 2011 1 次提交

perf: Start the restructuring · fae85b7c

由 Borislav Petkov 提交于 10月 26, 2010

mv kernel/perf_event.c -> kernel/events/core.c. From there, all further
sensible splitting can happen. The idea is that due to perf_event.c
becoming pretty sizable and with the advent of the marriage with ftrace,
splitting functionality into its logical parts should help speeding up
the unification and to manage the complexity of the subsystem.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>

fae85b7c

11 4月, 2011 1 次提交

perf_event: Fix cgrp event scheduling bug in perf_enable_on_exec() · e566b76e

由 Stephane Eranian 提交于 4月 06, 2011

There is a bug in perf_event_enable_on_exec() when cgroup events are
active on a CPU: the cgroup events may be scheduled twice causing event
state corruptions which eventually may lead to kernel panics.

The reason is that the function needs to first schedule out the cgroup
events, just like for the per-thread events. The cgroup event are
scheduled back in automatically from the perf_event_context_sched_in()
function.

The patch also adds a WARN_ON_ONCE() is perf_cgroup_switch() to catch any
bogus state.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110406005454.GA1062@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

e566b76e

05 4月, 2011 1 次提交

jump label: Introduce static_branch() interface · d430d3d7

由 Jason Baron 提交于 3月 16, 2011

Introduce:

static __always_inline bool static_branch(struct jump_label_key *key);

instead of the old JUMP_LABEL(key, label) macro.

In this way, jump labels become really easy to use:

Define:

        struct jump_label_key jump_key;

Can be used as:

        if (static_branch(&jump_key))
                do unlikely code

enable/disale via:

        jump_label_inc(&jump_key);
        jump_label_dec(&jump_key);

that's it!

For the jump labels disabled case, the static_branch() becomes an
atomic_read(), and jump_label_inc()/dec() are simply atomic_inc(),
atomic_dec() operations. We show testing results for this change below.

Thanks to H. Peter Anvin for suggesting the 'static_branch()' construct.

Since we now require a 'struct jump_label_key *key', we can store a pointer into
the jump table addresses. In this way, we can enable/disable jump labels, in
basically constant time. This change allows us to completely remove the previous
hashtable scheme. Thanks to Peter Zijlstra for this re-write.

Testing:

I ran a series of 'tbench 20' runs 5 times (with reboots) for 3
configurations, where tracepoints were disabled.

jump label configured in
avg: 815.6

jump label *not* configured in (using atomic reads)
avg: 800.1

jump label *not* configured in (regular reads)
avg: 803.4
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20110316212947.GA8792@redhat.com>
Signed-off-by: NJason Baron <jbaron@redhat.com>
Suggested-by: NH. Peter Anvin <hpa@linux.intel.com>
Tested-by: NDavid Daney <ddaney@caviumnetworks.com>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d430d3d7

31 3月, 2011 2 次提交

perf: Fix task_struct reference leak · fd1edb3a

由 Peter Zijlstra 提交于 3月 28, 2011

sys_perf_event_open() had an imbalance in the number of task refs it
took causing memory leakage

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@kernel.org # .37+
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fd1edb3a

perf: Rebase max unprivileged mlock threshold on top of page size · 20443384

由 Frederic Weisbecker 提交于 3月 31, 2011

Ensure we allow 512 kiB + 1 page for user control without
assuming a 4096 bytes page size.
Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
LKML-Reference: <1301535209-9679-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

20443384

24 3月, 2011 1 次提交

perf: Better fit max unprivileged mlock pages for tools needs · 880f5731

由 Frederic Weisbecker 提交于 3月 23, 2011

The maximum kilobytes of locked memory that an unprivileged user
can reserve is of 512 kB = 128 pages by default, scaled to the
number of onlined CPUs, which fits well with the tools that use
128 data pages by default.

However tools actually use 129 pages, because they need one more
for the user control page. Thus the default mlock threshold is
not sufficient for the default tools needs and we always end up
to evaluate the constant mlock rlimit policy, which doesn't have
this scaling with the number of online CPUs.

Hence, on systems that have more than 16 CPUs, we overlap the
rlimit threshold and fail to mmap:

	$ perf record ls
	Error: failed to mmap with 1 (Operation not permitted)

Just increase the max unprivileged mlock threshold by one page
so that it supports well perf tools even after 16 CPUs.
Reported-by: NHan Pingtian <phan@redhat.com>
Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stable <stable@kernel.org>
LKML-Reference: <1300904979-5508-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

880f5731

23 3月, 2011 1 次提交

perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx() · 68cacd29

由 Stephane Eranian 提交于 3月 23, 2011

This patch solves a stale pointer problem in
update_cgrp_time_from_cpuctx(). The cpuctx->cgrp
was not cleared on all possible event exit paths,
including:

   close()
     perf_release()
       perf_release_kernel()
         list_del_event()

This patch fixes list_del_event() to clear cpuctx->cgrp
when there are no cgroup events left in the context.

[ This second version makes the code compile when
  CONFIG_CGROUP_PERF is not enabled. We unconditionally define
  perf_cpu_context->cgrp. ]
Signed-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: perfmon2-devel@lists.sf.net
Cc: paulus@samba.org
Cc: davem@davemloft.net
LKML-Reference: <20110323150306.GA1580@quad>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

68cacd29

16 3月, 2011 3 次提交

perf: Fix tear-down of inherited group events · 38b435b1

由 Peter Zijlstra 提交于 3月 15, 2011

When destroying inherited events, we need to destroy groups too,
otherwise the event iteration in perf_event_exit_task_context() will
miss group siblings and we leak events with all the consequences.
Reported-and-tested-by: NVince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # .35+
LKML-Reference: <1300196470.2203.61.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

38b435b1

perf: Handle stopped state with tracepoints · a0f7d0f7

由 Frederic Weisbecker 提交于 3月 07, 2011

We toggle the state from start and stop callbacks but actually
don't check it when the event triggers. Do it so that
these callbacks actually work.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1299529629-18280-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a0f7d0f7

perf: Fix the software events state check · 91b2f482

由 Frederic Weisbecker 提交于 3月 07, 2011

Fix the mistakenly inverted check of events state.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@kernel.org>
LKML-Reference: <1299529629-18280-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

91b2f482

04 3月, 2011 5 次提交

perf: Fix cgroup vs jump_label problem · 08309379

由 Peter Zijlstra 提交于 3月 03, 2011

Li Zefan reported that the jump label code sleeps and we're calling it
under a spinlock, *fail* ;-)
Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

08309379

perf cgroup: Clean up perf_cgroup_create() · 1b15d055

由 Li Zefan 提交于 3月 03, 2011

- Use kzalloc() to replace kmalloc() + memset().

- Remove redundant initialization, since alloc_percpu() returns
  zero-filled percpu memory.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4D6F347E.2010806@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1b15d055

perf cgroup: Fix unmatched call to perf_detach_cgroup() · f75e18cb

由 Li Zefan 提交于 3月 03, 2011

In the failure path, we call perf_detach_cgroup(), but we didn't
call perf_get_cgroup() prio to it.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4D6F346E.9070606@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f75e18cb

perf cgroup: Fix leak of file reference count · 3db272c0

由 Li Zefan 提交于 3月 03, 2011

In perf_cgroup_connect(), fput_light() is missing in a failure path.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4D6F3461.6060406@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3db272c0

perf: Fix the missing event initialization when pmu is found in idr · 940c5b29

由 Lin Ming 提交于 2月 27, 2011

Currently, the event is not initialized if pmu is found in idr. This
never causes bug just because now no pmu is associated with the idr
id.
Signed-off-by: NLin Ming <ming.m.lin@intel.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1298812411.2699.9.camel@localhost>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

940c5b29

23 2月, 2011 2 次提交

perf: Simplify task_clock_event_read() · 768a06e2

由 Peter Zijlstra 提交于 2月 22, 2011

There is no point in us having different code paths for nmi and !nmi
here, so remove the !nmi one.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

768a06e2

perf_events: Fix rcu and locking issues with cgroup support · 3f7cce3c

由 Stephane Eranian 提交于 2月 18, 2011

This patches ensures that we do not end up calling
perf_cgroup_from_task() when there is no cgroup event.
This avoids potential RCU and locking issues.

The change in perf_cgroup_set_timestamp() ensures we
check against ctx->nr_cgroups. It also avoids calling
perf_clock() tiwce in a row. It also ensures we do need
to grab ctx->lock before calling the function.

We drop update_cgrp_time() from task_clock_event_read()
because it is not needed. This also avoids having to
deal with perf_cgroup_from_task().

Thanks to Peter Zijlstra for his help on this.
Signed-off-by: NStephane Eranian <eranian@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d5e76b8.815bdf0a.7ac3.774f@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3f7cce3c

16 2月, 2011 4 次提交

perf: Optimize hrtimer events · ba3dd36c

由 Peter Zijlstra 提交于 2月 15, 2011

There is no need to re-initialize the hrtimer every time we start it,
so don't do that (shaves a few cycles). Also, since we know hrtimers
run at a fixed rate (nanoseconds) we can pre-compute the desired
frequency at which they tick. This avoids us having to go through the
whole adaptive frequency feedback logic (shaves another few cycles).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1297448589.5226.47.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ba3dd36c

perf: Optimize throttling code · 163ec435

由 Peter Zijlstra 提交于 2月 16, 2011

By pre-computing the maximum number of samples per tick we can avoid a
multiplication and a conditional since MAX_INTERRUPTS >
max_samples_per_tick.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

163ec435

perf: Add cgroup support · e5d1367f

由 Stephane Eranian 提交于 2月 14, 2011

This kernel patch adds the ability to filter monitoring based on
container groups (cgroups). This is for use in per-cpu mode only.

The cgroup to monitor is passed as a file descriptor in the pid
argument to the syscall. The file descriptor must be opened to
the cgroup name in the cgroup filesystem. For instance, if the
cgroup name is foo and cgroupfs is mounted in /cgroup, then the
file descriptor is opened to /cgroup/foo. Cgroup mode is
activated by passing PERF_FLAG_PID_CGROUP in the flags argument
to the syscall.

For instance to measure in cgroup foo on CPU1 assuming
cgroupfs is mounted under /cgroup:

struct perf_event_attr attr;
int cgroup_fd, fd;

cgroup_fd = open("/cgroup/foo", O_RDONLY);
fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
close(cgroup_fd);
Signed-off-by: NStephane Eranian <eranian@google.com>
[ added perf_cgroup_{exit,attach} ]
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e5d1367f

perf: Fix throttle logic · 4fe757dd

由 Peter Zijlstra 提交于 2月 15, 2011

It was possible to call pmu::start() on an already running event. In
particular this lead so some wreckage as the hrtimer events would
re-initialize active timers.

This was due to throttled events being activated again by scheduling.
Scheduling in a context would add and force start events, resulting in
running events with a possible throttle status. The next tick to hit
that task will then try to unthrottle the event and call ->start() on
an already running event.
Reported-by: NJeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4fe757dd

03 2月, 2011 2 次提交

perf: Fix reading in perf_event_read() · 542e72fc

由 Peter Zijlstra 提交于 1月 26, 2011

It is quite possible for the event to have been disabled between
perf_event_read() sending the IPI and the CPU servicing the IPI and
calling __perf_event_read(), hence revalidate the state.
Reported-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

542e72fc

perf: Cure task_oncpu_function_call() races · fe4b04fa

由 Peter Zijlstra 提交于 2月 02, 2011

Oleg reported that on architectures with
__ARCH_WANT_INTERRUPTS_ON_CTXSW the IPI from
task_oncpu_function_call() can land before perf_event_task_sched_in()
and cause interesting situations for eg. perf_install_in_context().

This patch reworks the task_oncpu_function_call() interface to give a
more usable primitive as well as rework all its users to hopefully be
more obvious as well as remove the races.

While looking at the code I also found a number of races against
perf_event_task_sched_out() which can flip contexts between tasks so
plug those too.
Reported-and-reviewed-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fe4b04fa

28 1月, 2011 1 次提交

perf: Fix alloc_callchain_buffers() · 88d4f0db

由 Eric Dumazet 提交于 1月 25, 2011

Commit 927c7a9e ("perf: Fix race in callchains") introduced
a mismatch in the sizing of struct callchain_cpus_entries.

nr_cpu_ids must be used instead of num_possible_cpus(), or we
might get out of bound memory accesses on some machines.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: Stephane Eranian <eranian@google.com>
CC: stable@kernel.org
LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

88d4f0db

22 1月, 2011 1 次提交

perf: perf_event_exit_task_context: s/rcu_dereference/rcu_dereference_raw/ · 806839b2

由 Oleg Nesterov 提交于 1月 21, 2011

In theory, almost every user of task->child->perf_event_ctxp[]
is wrong. find_get_context() can install the new context at any
moment, we need read_barrier_depends().

dbe08d82 "perf: Fix
find_get_context() vs perf_event_exit_task() race" added
rcu_dereference() into perf_event_exit_task_context() to make
the precedent, but this makes __rcu_dereference_check() unhappy.
Use rcu_dereference_raw() to shut up the warning.
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: acme@redhat.com
Cc: paulus@samba.org
Cc: stern@rowland.harvard.edu
Cc: a.p.zijlstra@chello.nl
Cc: fweisbec@gmail.com
Cc: roland@redhat.com
Cc: prasad@linux.vnet.ibm.com
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
LKML-Reference: <20110121174547.GA8796@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

806839b2

21 1月, 2011 1 次提交

perf: Annotate cpuctx->ctx.mutex to avoid a lockdep splat · 547e9fd7

由 Peter Zijlstra 提交于 1月 19, 2011

Lockdep spotted:

	loop_1b_instruc/1899 is trying to acquire lock:
	 (event_mutex){+.+.+.}, at: [<ffffffff810e1908>] perf_trace_init+0x3b/0x2f7

	but task is already holding lock:
	 (&ctx->mutex){+.+.+.}, at: [<ffffffff810eb45b>] perf_event_init_context+0xc0/0x218

	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #3 (&ctx->mutex){+.+.+.}:
	-> #2 (cpu_hotplug.lock){+.+.+.}:
	-> #1 (module_mutex){+.+...}:
	-> #0 (event_mutex){+.+.+.}:

But because the deadlock would be cpuhotplug (cpu-event) vs fork
(task-event) it cannot, in fact, happen. We can annotate this by giving the
perf_event_context used for the cpuctx a different lock class from those
used by tasks.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

547e9fd7

20 1月, 2011 2 次提交

perf: Fix perf_event_init_task()/perf_event_free_task() interaction · 8550d7cb

由 Oleg Nesterov 提交于 1月 19, 2011

perf_event_init_task() should clear child->perf_event_ctxp[]
before anything else. Otherwise, if
perf_event_init_context(perf_hw_context) fails,
perf_event_free_task() can free perf_event_ctxp[perf_sw_context]
copied from parent->perf_event_ctxp[] by dup_task_struct().

Also move the initialization of perf_event_mutex and
perf_event_list from perf_event_init_context() to
perf_event_init_context().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <20110119182228.GC12183@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8550d7cb

perf: Fix find_get_context() vs perf_event_exit_task() race · dbe08d82

由 Oleg Nesterov 提交于 1月 19, 2011

find_get_context() must not install the new perf_event_context
if the task has already passed perf_event_exit_task().

If nothing else, this means the memory leak. Initially
ctx->refcount == 2, it is supposed that
perf_event_exit_task_context() should participate and do the
necessary put_ctx().

find_lively_task_by_vpid() checks PF_EXITING but this buys
nothing, by the time we call find_get_context() this task can be
already dead. To the point, cmpxchg() can succeed when the task
has already done the last schedule().

Change find_get_context() to populate task->perf_event_ctxp[]
under task->perf_event_mutex, this way we can trust PF_EXITING
because perf_event_exit_task() takes the same mutex.

Also, change perf_event_exit_task_context() to use
rcu_dereference(). Probably this is not strictly needed, but
with or without this change find_get_context() can race with
setup_new_exec()->perf_event_exit_task(), rcu_dereference()
looks better.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
LKML-Reference: <20110119182207.GB12183@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

dbe08d82

19 1月, 2011 1 次提交

perf: Validate cpu early in perf_event_alloc() · 66832eb4

由 Oleg Nesterov 提交于 1月 18, 2011

Starting from perf_event_alloc()->perf_init_event(), the kernel
assumes that event->cpu is either -1 or the valid CPU number.

Change perf_event_alloc() to validate this argument early. This
also means we can remove the similar check in
find_get_context().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: gregkh@suse.de
Cc: stable@kernel.org
LKML-Reference: <20110118161032.GC693@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

66832eb4