- 23 11月, 2015 3 次提交
-
-
由 Peter Zijlstra 提交于
There were still a number of references to my old Red Hat email address in the kernel source. Remove these while keeping the Red Hat copyright notices intact. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch reinforces the lockdep checks performed by perf_cgroup_from_tsk() by passing the perf_event_context whenever possible. It is okay to not hold the RCU read lock when we know we hold the ctx->lock. This patch makes sure this property holds. In some functions, such as perf_cgroup_sched_in(), we do not pass the context because we are sure we are holding the RCU read lock. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: edumazet@google.com Link: http://lkml.kernel.org/r/1447322404-10920-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
The RCU checker detected RCU violation in the cgroup switching routines perf_cgroup_sched_in() and perf_cgroup_sched_out(). We were dereferencing cgroup from task without holding the RCU lock. Fix this by holding the RCU read lock. We move the locking from perf_cgroup_switch() to avoid double locking. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: edumazet@google.com Link: http://lkml.kernel.org/r/1447322404-10920-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 11月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
Arnaldo reported that tracepoint filters seem to misbehave (ie. not apply) on inherited events. The fix is obvious; filters are only set on the actual (parent) event, use the normal pattern of using this parent event for filters. This is safe because each child event has a reference to it. Reported-by: NArnaldo Carvalho de Melo <acme@kernel.org> Tested-by: NArnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang Nan <wangnan0@huawei.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20151102095051.GN17308@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Paul E. McKenney 提交于
The perf_lock_task_context() function disables preemption across its RCU read-side critical section because that critical section acquires a scheduler lock. If there was a preemption during that RCU read-side critical section, the rcu_read_unlock() could attempt to acquire scheduler locks, resulting in deadlock. However, recent optimizations to expedited grace periods mean that IPI handlers that execute during preemptible RCU read-side critical sections can now cause the subsequent rcu_read_unlock() to acquire scheduler locks. Disabling preemption does nothiing to prevent these IPI handlers from executing, so these optimizations introduced a deadlock. In theory, this deadlock could be avoided by pulling all wakeups and printk()s out from rnp->lock critical sections, but in practice this would re-introduce some RCU CPU stall warning bugs. Given that acquiring scheduler locks entails disabling interrupts, these deadlocks can be avoided by disabling interrupts (instead of disabling preemption) across any RCU read-side critical that acquires scheduler locks and holds them across the rcu_read_unlock(). This commit therefore makes this change for perf_lock_task_context(). Reported-by: NDave Jones <davej@codemonkey.org.uk> Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20151104134838.GR29027@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 11月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
This seems to be a mis-reading of how alpha memory ordering works, and is not backed up by the alpha architecture manual. The helper functions don't do anything special on any other architectures, and the arguments that support them being safe on other architectures also argue that they are safe on alpha. Basically, the "control dependency" is between a previous read and a subsequent write that is dependent on the value read. Even if the subsequent write is actually done speculatively, there is no way that such a speculative write could be made visible to other cpu's until it has been committed, which requires validating the speculation. Note that most weakely ordered architectures (very much including alpha) do not guarantee any ordering relationship between two loads that depend on each other on a control dependency: read A if (val == 1) read B because the conditional may be predicted, and the "read B" may be speculatively moved up to before reading the value A. So we require the user to insert a smp_rmb() between the two accesses to be correct: read A; if (A == 1) smp_rmb() read B Alpha is further special in that it can break that ordering even if the *address* of B depends on the read of A, because the cacheline that is read later may be stale unless you have a memory barrier in between the pointer read and the read of the value behind a pointer: read ptr read offset(ptr) whereas all other weakly ordered architectures guarantee that the data dependency (as opposed to just a control dependency) will order the two accesses. As a result, alpha needs a "smp_read_barrier_depends()" in between those two reads for them to be ordered. The coontrol dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()" had was a control dependency to a subsequent *write*, however, and nobody can finalize such a subsequent write without having actually done the read. And were you to write such a value to a "stale" cacheline (the way the unordered reads came to be), that would seem to lose the write entirely. So the things that make alpha able to re-order reads even more aggressively than other weak architectures do not seem to be relevant for a subsequent write. Alpha memory ordering may be strange, but there's no real indication that it is *that* strange. Also, the alpha architecture reference manual very explicitly talks about the definition of "Dependence Constraints" in section 5.6.1.7, where a preceding read dominates a subsequent write. Such a dependence constraint admittedly does not impose a BEFORE (alpha architecture term for globally visible ordering), but it does guarantee that there can be no "causal loop". I don't see how you could avoid such a loop if another cpu could see the stored value and then impact the value of the first read. Put another way: the read and the write could not be seen as being out of order wrt other cpus. So I do not see how these "x_ctrl()" functions can currently be necessary. I may have to eat my words at some point, but in the absense of clear proof that alpha actually needs this, or indeed even an explanation of how alpha could _possibly_ need it, I do not believe these functions are called for. And if it turns out that alpha really _does_ need a barrier for this case, that barrier still should not be "smp_read_barrier_depends()". We'd have to make up some new speciality barrier just for alpha, along with the documentation for why it really is necessary. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul E McKenney <paulmck@us.ibm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 10月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
Instead of WARN_ON in perf_event_output() on unpaded raw samples, pad them automatically. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 10月, 2015 1 次提交
-
-
由 Tejun Heo 提交于
cgroup_exit() is called when a task exits and disassociates the exiting task from its cgroups and half-attach it to the root cgroup. This is unnecessary and undesirable. No controller actually needs an exiting task to be disassociated with non-root cgroups. Both cpu and perf_event controllers update the association to the root cgroup from their exit callbacks just to keep consistent with the cgroup core behavior. Also, this disassociation makes it difficult to track resources held by zombies or determine where the zombies came from. Currently, pids controller is completely broken as it uncharges on exit and zombies always escape the resource restriction. With cgroup association being reset on exit, fixing it is pretty painful. There's no reason to reset cgroup membership on exit. The zombie can be removed from its css_set so that it doesn't show up on "cgroup.procs" and thus can't be migrated or interfere with cgroup removal. It can still pin and point to the css_set so that its cgroup membership is maintained. This patch makes cgroup core keep zombies associated with their cgroups at the time of exit. * Previous patches decoupled populated_cnt tracking from css_set lifetime, so a dying task can be simply unlinked from its css_set while pinning and pointing to the css_set. This keeps css_set association from task side alive while hiding it from "cgroup.procs" and populated_cnt tracking. The css_set reference is dropped when the task_struct is freed. * ->exit() callback no longer needs the css arguments as the associated css never changes once PF_EXITING is set. Removed. * cpu and perf_events controllers no longer need ->exit() callbacks. There's no reason to explicitly switch away on exit. The final schedule out is enough. The callbacks are removed. * On traditional hierarchies, nothing changes. "/proc/PID/cgroup" still reports "/" for all zombies. On the default hierarchy, "/proc/PID/cgroup" keeps reporting the cgroup that the task belonged to at the time of exit. If the cgroup gets removed before the task is reaped, " (deleted)" is appended. v2: Build brekage due to missing dummy cgroup_free() when !CONFIG_CGROUP fixed. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
-
- 28 9月, 2015 1 次提交
-
-
由 Geliang Tang 提交于
Fixes various sparse warnings. Signed-off-by: NGeliang Tang <geliangtang@163.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/70c14234da1bed6e3e67b9c419e2d5e376ab4f32.1443367286.git.geliangtang@163.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 9月, 2015 3 次提交
-
-
由 Peter Zijlstra 提交于
There are two races with the current code: - Another event can join the group and compute a larger header_size concurrently, if the smaller store wins we'll have an incorrect header_size set. - We compute the header_size after the event becomes active, therefore its possible to use the size before its computed. Remedy the first by moving the computation inside the ctx::mutex lock, and the second by placing it _before_ perf_install_in_context(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Vince reported that its possible to overflow the various size fields and get weird stuff if you stick too many events in a group. Put a lid on this by requiring the fixed record size not exceed 16k. This is still a fair amount of events (silly amount really) and leaves plenty room for callchains and stack dwarves while also avoiding overflowing the u16 variables. Reported-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The exclusive_event_installable() stuff only works because its exclusive with the grouping bits. Rework the code such that there is a sane place to error out before we go do things we cannot undo. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 9月, 2015 8 次提交
-
-
由 Sukadev Bhattiprolu 提交于
Define a new PERF_PMU_TXN_READ interface to read a group of counters at once. pmu->start_txn() // Initialize before first event for each event in group pmu->read(event); // Queue each event to be read rc = pmu->commit_txn() // Read/update all queued counters Note that we use this interface with all PMUs. PMUs that implement this interface use the ->read() operation to _queue_ the counters to be read and use ->commit_txn() to actually read all the queued counters at once. PMUs that don't implement PERF_PMU_TXN_READ ignore ->start_txn() and ->commit_txn() and continue to read counters one at a time. Thanks to input from Peter Zijlstra. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-9-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sukadev Bhattiprolu 提交于
When we implement the ability to read several counters at once (using the PERF_PMU_TXN_READ transaction interface), perf_event_read() can fail when the 'group' parameter is true (eg: trying to read too many events at once). For now, have perf_event_read() return an integer. Ignore the return value when the 'group' parameter is false. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-8-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In order to enable the use of perf_event_read(.group = true), we need to invert the sibling-child loop nesting of perf_read_group(). Currently we iterate the child list for each sibling, this precludes using group reads. Flip things around so we iterate each group for each child. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> [ Made the patch compile and things. ] Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-7-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Enable perf_event_read() to update entire groups at once, this will be useful for read transactions. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20150723080435.GE25159@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra (Intel) 提交于
In order to free up the perf_event_read_group() name: s/perf_event_read_\(one\|group\)/perf_read_\1/g s/perf_read_hw/__perf_read/g Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-5-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sukadev Bhattiprolu 提交于
perf_event_read() does two things: - call the PMU to read/update the counter value, and - compute the total count of the event and its children Not all callers need both. perf_event_reset() for instance needs the first piece but doesn't need the second. Similarly, when we implement the ability to read a group of events using the transaction interface, we would need the two pieces done independently. Break up perf_event_read() and have it just read/update the counter and have the callers compute the total count if necessary. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-4-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sukadev Bhattiprolu 提交于
Currently, the PMU interface allows reading only one counter at a time. But some PMUs like the 24x7 counters in Power, support reading several counters at once. To leveage this functionality, extend the transaction interface to support a "transaction type". The first type, PERF_PMU_TXN_ADD, refers to the existing transactions, i.e. used to _schedule_ all the events on the PMU as a group. A second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch, by the 24x7 counters to read several counters at once. Extend the transaction interfaces to the PMU to accept a 'txn_flags' parameter and use this parameter to ignore any transactions that are not of type PERF_PMU_TXN_ADD. Thanks to Peter Zijlstra for his input. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> [peterz: s390 compile fix] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
cgroup_exit() is not called from copy_process() after commit: e8604cb4 ("cgroup: fix spurious lockdep warning in cgroup_exit()") from do_exit(). So this check is useless and the comment is obsolete. Signed-off-by: NKirill Tkhai <ktkhai@odin.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/55E444C8.3020402@odin.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 9月, 2015 1 次提交
-
-
由 Dave Young 提交于
There are two kexec load syscalls, kexec_load another and kexec_file_load. kexec_file_load has been splited as kernel/kexec_file.c. In this patch I split kexec_load syscall code to kernel/kexec.c. And add a new kconfig option KEXEC_CORE, so we can disable kexec_load and use kexec_file_load only, or vice verse. The original requirement is from Ted Ts'o, he want kexec kernel signature being checked with CONFIG_KEXEC_VERIFY_SIG enabled. But kexec-tools use kexec_load syscall can bypass the checking. Vivek Goyal proposed to create a common kconfig option so user can compile in only one syscall for loading kexec kernel. KEXEC/KEXEC_FILE selects KEXEC_CORE so that old config files still work. Because there's general code need CONFIG_KEXEC_CORE, so I updated all the architecture Kconfig with a new option KEXEC_CORE, and let KEXEC selects KEXEC_CORE in arch Kconfig. Also updated general kernel code with to kexec_load syscall. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NDave Young <dyoung@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 8月, 2015 3 次提交
-
-
由 Alexander Shishkin 提交于
A question [1] was raised about the use of page::private in AUX buffer allocations, so let's add a clarification about its intended use. The private field and flag are used by perf's rb_alloc_aux() path to tell the pmu driver the size of each high-order allocation, so that the driver can program those appropriately into its hardware. This only matters for PMUs that don't support hardware scatter tables. Otherwise, every page in the buffer is just a page. This patch adds a comment about the private field to the AUX buffer allocation path. [1] http://marc.info/?l=linux-kernel&m=143803696607968Reported-by: NMathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1438063204-665-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
I ran the perf fuzzer, which triggered some WARN()s which are due to trying to stop/restart an event on the wrong CPU. Use the normal IPI pattern to ensure we run the code on the correct CPU. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: bad7192b ("perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the period") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ben Hutchings 提交于
If rb->aux_refcount is decremented to zero before rb->refcount, __rb_free_aux() may be called twice resulting in a double free of rb->aux_pages. Fix this by adding a check to __rb_free_aux(). Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 57ffc5ca ("perf: Fix AUX buffer refcounting") Link: http://lkml.kernel.org/r/1437953468.12842.17.camel@decadent.org.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 8月, 2015 1 次提交
-
-
由 Kaixu Xia 提交于
This patch add three core perf APIs: - perf_event_attrs(): export the struct perf_event_attr from struct perf_event; - perf_event_get(): get the struct perf_event from the given fd; - perf_event_read_local(): read the events counters active on the current CPU; These APIs are needed when accessing events counters in eBPF programs. The API perf_event_read_local() comes from Peter and I add the corresponding SOB. Signed-off-by: NKaixu Xia <xiakaixu@huawei.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 8月, 2015 1 次提交
-
-
由 Wang Nan 提交于
By copying BPF related operation to uprobe processing path, this patch allow users attach BPF programs to uprobes like what they are already doing on kprobes. After this patch, users are allowed to use PERF_EVENT_IOC_SET_BPF on a uprobe perf event. Which make it possible to profile user space programs and kernel events together using BPF. Because of this patch, CONFIG_BPF_EVENTS should be selected by CONFIG_UPROBE_EVENT to ensure trace_call_bpf() is compiled even if KPROBE_EVENT is not set. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaixu Xia <xiakaixu@huawei.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Zefan Li <lizefan@huawei.com> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1435716878-189507-3-git-send-email-wangnan0@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
- 04 8月, 2015 2 次提交
-
-
由 Alexander Shishkin 提交于
Currently, the PT driver zeroes out the status register every time before starting the event. However, all the writable bits are already taken care of in pt_handle_status() function, except the new PacketByteCnt field, which in new versions of PT contains the number of packet bytes written since the last sync (PSB) packet. Zeroing it out before enabling PT forces a sync packet to be written. This means that, with the existing code, a sync packet (PSB and PSBEND, 18 bytes in total) will be generated every time a PT event is scheduled in. To avoid these unnecessary syncs and save a WRMSR in the fast path, this patch changes the default behavior to not clear PacketByteCnt field, so that the sync packets will be generated with the period specified as "psb_period" attribute config field. This has little impact on the trace data as the other packets that are normally sent within PSB+ (between PSB and PSBEND) have their own generation scenarios which do not depend on the sync packets. One exception where we do need to force PSB like this when tracing starts, so that the decoder has a clear sync point in the trace. For this purpose we aready have hw::itrace_started flag, which we are currently using to output PERF_RECORD_ITRACE_START. This patch moves setting itrace_started from perf core to the pmu::start, where it should still be 0 on the very first run. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1438264104-16189-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Vince reported that the fasync signal stuff doesn't work proper for inherited events. So fix that. Installing fasync allocates memory and sets filp->f_flags |= FASYNC, which upon the demise of the file descriptor ensures the allocation is freed and state is updated. Now for perf, we can have the events stick around for a while after the original FD is dead because of references from child events. So we cannot copy the fasync pointer around. We can however consistently use the parent's fasync, as that will be updated. Reported-and-Tested-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho deMelo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 7月, 2015 12 次提交
-
-
由 Oleg Nesterov 提交于
The xol_free_insn_slot()->waitqueue_active() check is buggy. We need mb() after we set the conditon for wait_event(), or xol_take_insn_slot() can miss the wakeup. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134036.GA4799@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Change xol_add_vma() to use _install_special_mapping(), this way we can name the vma installed by uprobes. Currently it looks like private anonymous mapping, this is confusing and complicates the debugging. With this change /proc/$pid/maps reports "[uprobes]". As a side effect this will cause core dumps to include the XOL vma and I think this is good; this can help to debug the problem if the app crashed because it was probed. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134033.GA4796@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
install_special_mapping(pages) expects that "pages" is the zero- terminated array while xol_add_vma() passes &area->page, this means that special_mapping_fault() can wrongly use the next member in xol_area (vaddr) as "struct page *". Fortunately, this area is not expandable so pgoff != 0 isn't possible (modulo bugs in special_mapping_vmops), but still this does not look good. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Anand <panand@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134031.GA4789@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
The previous change documents that cleanup_return_instances() can't always detect the dead frames, the stack can grow. But there is one special case which imho worth fixing: arch_uretprobe_is_alive() can return true when the stack didn't actually grow, but the next "call" insn uses the already invalidated frame. Test-case: #include <stdio.h> #include <setjmp.h> jmp_buf jmp; int nr = 1024; void func_2(void) { if (--nr == 0) return; longjmp(jmp, 1); } void func_1(void) { setjmp(jmp); func_2(); } int main(void) { func_1(); return 0; } If you ret-probe func_1() and func_2() prepare_uretprobe() hits the MAX_URETPROBE_DEPTH limit and "return" from func_2() is not reported. When we know that the new call is not chained, we can do the more strict check. In this case "sp" points to the new ret-addr, so every frame which uses the same "sp" must be dead. The only complication is that arch_uretprobe_is_alive() needs to know was it chained or not, so we add the new RP_CHECK_CHAIN_CALL enum and change prepare_uretprobe() to pass RP_CHECK_CALL only if !chained. Note: arch_uretprobe_is_alive() could also re-read *sp and check if this word is still trampoline_vaddr. This could obviously improve the logic, but I would like to avoid another copy_from_user() especially in the case when we can't avoid the false "alive == T" positives. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134028.GA4786@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
arch/x86 doesn't care (so far), but as Pratyush Anand pointed out other architectures might want why arch_uretprobe_is_alive() was called and use different checks depending on the context. Add the new argument to distinguish 2 callers. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134026.GA4779@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Change prepare_uretprobe() to flush the !arch_uretprobe_is_alive() return_instance's. This is not needed correctness-wise, but can help to avoid the failure caused by MAX_URETPROBE_DEPTH. Note: in this case arch_uretprobe_is_alive() can be false positive, the stack can grow after longjmp(). Unfortunately, the kernel can't 100% solve this problem, but see the next patch. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134023.GA4776@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Test-case: #include <stdio.h> #include <setjmp.h> jmp_buf jmp; void func_2(void) { longjmp(jmp, 1); } void func_1(void) { if (setjmp(jmp)) return; func_2(); printf("ERR!! I am running on the caller's stack\n"); } int main(void) { func_1(); return 0; } fails if you probe func_1() and func_2() because handle_trampoline() assumes that the probed function should must return and hit the bp installed be prepare_uretprobe(). But in this case func_2() does not return, so when func_1() returns the kernel uses the no longer valid return_instance of func_2(). Change handle_trampoline() to unwind ->return_instances until we know that the next chain is alive or NULL, this ensures that the current chain is the last we need to report and free. Alternatively, every return_instance could use unique trampoline_vaddr, in this case we could use it as a key. And this could solve the problem with sigaltstack() automatically. But this approach needs more changes, and it puts the "hard" limit on MAX_URETPROBE_DEPTH. Plus it can not solve another problem partially fixed by the next patch. Note: this change has no effect on !x86, the arch-agnostic version of arch_uretprobe_is_alive() just returns "true". TODO: as documented by the previous change, arch_uretprobe_is_alive() can be fooled by sigaltstack/etc. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134021.GA4773@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Add the x86 specific version of arch_uretprobe_is_alive() helper. It returns true if the stack frame mangled by prepare_uretprobe() is still on stack. So if it returns false, we know that the probed function has already returned. We add the new return_instance->stack member and change the generic code to initialize it in prepare_uretprobe, but it should be equally useful for other architectures. TODO: this assumes that the probed application can't use multiple stacks (say sigaltstack). We will try to improve this logic later. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134018.GA4766@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Add the new "weak" helper, arch_uretprobe_is_alive(), used by the next patches. It should return true if this return_instance is still valid. The arch agnostic version just always returns true. The patch exports "struct return_instance" for the architectures which want to override this hook. We can also cleanup prepare_uretprobe() if we pass the new return_instance to arch_uretprobe_hijack_return_addr(). Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134016.GA4762@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
No functional changes, preparation. Add the new helper, find_next_ret_chain(), which finds the first !chained entry and returns its ->next. Yes, it is suboptimal. We probably want to turn ->chained into ->start_of_this_chain pointer and avoid another loop. But this needs the boring changes in dup_utask(), so lets do this later. Change the main loop in handle_trampoline() to unwind the stack until ri is equal to the pointer returned by this new helper. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134013.GA4755@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Turn the last pr_warn() in uprobes.c into uprobe_warn(). While at it: - s/kzalloc/kmalloc, we initialize every member of 'ri' - remove the pointless comment above the obvious code Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134010.GA4752@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
1. It doesn't make sense to continue if handle_trampoline() fails, change handle_swbp() to always return after this call. 2. Turn pr_warn() into uprobe_warn(), and change handle_trampoline() to send SIGILL on failure. It is pointless to return to user mode with the corrupted instruction_pointer() which we can't restore. Tested-by: NPratyush Anand <panand@redhat.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NAnton Arapov <arapov@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150721134008.GA4745@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-