- 07 6月, 2015 6 次提交
-
-
由 Yan, Zheng 提交于
Flush the PEBS buffer during context switches if PEBS interrupt threshold is larger than one. This allows perf to supply TID for sample outputs. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-6-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
PEBS always had the capability to log samples to its buffers without an interrupt. Traditionally perf has not used this but always set the PEBS threshold to one. For frequently occurring events (like cycles or branches or load/store) this in term requires using a relatively high sampling period to avoid overloading the system, by only processing PMIs. This in term increases sampling error. For the common cases we still need to use the PMI because the PEBS hardware has various limitations. The biggest one is that it can not supply a callgraph. It also requires setting a fixed period, as the hardware does not support adaptive period. Another issue is that it cannot supply a time stamp and some other options. To supply a TID it requires flushing on context switch. It can however supply the IP, the load/store address, TSX information, registers, and some other things. So we can make PEBS work for some specific cases, basically as long as you can do without a callgraph and can set the period you can use this new PEBS mode. The main benefit is the ability to support much lower sampling period (down to -c 1000) without extensive overhead. One use cases is for example to increase the resolution of the c2c tool. Another is double checking when you suspect the standard sampling has too much sampling error. Some numbers on the overhead, using cycle soak, comparing the elapsed time from "kernbench -M -H" between plain (threshold set to one) and multi (large threshold). The test command for plain: "perf record --time -e cycles:p -c $period -- kernbench -M -H" The test command for multi: "perf record --no-time -e cycles:p -c $period -- kernbench -M -H" ( The only difference of test command between multi and plain is time stamp options. Since time stamp is not supported by large PEBS threshold, it can be used as a flag to indicate if large threshold is enabled during the test. ) period plain(Sec) multi(Sec) Delta 10003 32.7 16.5 16.2 20003 30.2 16.2 14.0 40003 18.6 14.1 4.5 80003 16.8 14.6 2.2 100003 16.9 14.1 2.8 800003 15.4 15.7 -0.3 1000003 15.3 15.2 0.2 2000003 15.3 15.1 0.1 With periods below 100003, plain (threshold one) cause much more overhead. With 10003 sampling period, the Elapsed Time for multi is even 2X faster than plain. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-5-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
When the PEBS interrupt threshold is larger than one record and the machine supports multiple PEBS events, the records of these events are mixed up and we need to demultiplex them. Demuxing the records is hard because the hardware is deficient. The hardware has two issues that, when combined, create impossible scenarios to demux. The first issue is that the 'status' field of the PEBS record is a copy of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a problem let us first describe the regular PEBS cycle: A) the CTRn value reaches 0: - the corresponding bit in GLOBAL_STATUS gets set - we start arming the hardware assist < some unspecified amount of time later -- this could cover multiple events of interest > B) the hardware assist is armed, any next event will trigger it C) a matching event happens: - the hardware assist triggers and generates a PEBS record this includes a copy of GLOBAL_STATUS at this moment - if we auto-reload we (re)set CTRn - we clear the relevant bit in GLOBAL_STATUS Now consider the following chain of events: A0, B0, A1, C0 The event generated for counter 0 will include a status with counter 1 set, even though its not at all related to the record. A similar thing can happen with a !PEBS event if it just happens to overflow at the right moment. The second issue is that the hardware will only emit one record for two or more counters if the event that triggers the assist is 'close'. The 'close' can be several cycles. In some cases even the complete assist, if the event is something that doesn't need retirement. For instance, consider this chain of events: A0, B0, A1, B1, C01 Where C01 is an event that triggers both hardware assists, we will generate but a single record, but again with both counters listed in the status field. This time the record pertains to both events. Note that these two cases are different but undistinguishable with the data as generated. Therefore demuxing records with multiple PEBS bits (we can safely ignore status bits for !PEBS counters) is impossible. Furthermore we cannot emit the record to both events because that might cause a data leak -- the events might not have the same privileges -- so what this patch does is discard such events. The assumption/hope is that such discards will be rare. Here lists some possible ways you may get high discard rate. - when you count the same thing multiple times. But it is not a useful configuration. - you can be unfortunate if you measure with a userspace only PEBS event along with either a kernel or unrestricted PEBS event. Imagine the event triggering and setting the overflow flag right before entering the kernel. Then all kernel side events will end up with multiple bits set. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> [ Changelog improvements. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
Move code that sets up the PEBS sample data to a separate function. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-3-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
When a fixed period is specified, this patch makes perf use the PEBS auto reload mechanism. This makes normal profiling faster, because it avoids one costly MSR write in the PMI handler. However, the reset value will be loaded by hardware assist. There is a small delay compared to the previous non-auto-reload mechanism. The delay time is arbitrary, but very small. The assist cost is 400-800 cycles, assuming common cases with everything cached. The minimum period the patch currently uses is 10000. In that extreme case it can be ~10% if cycles are used. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1430940834-8964-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch enables support for branch sampling filter for indirect jumps (IND_JUMP). It enables LBR IND_JMP filtering where available. There is also software filtering support. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@redhat.com Cc: dsahern@gmail.com Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1431637800-31061-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 5月, 2015 25 次提交
-
-
由 Alexander Shishkin 提交于
There is a 'pt' variable in the outer scope of pt_event_stop() with the same type, we don't really need another one in the inner scope. This patch removes the redundant variable declaration. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1432308626-18845-8-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Initially, we were trying to guard against scenarios where somebody attaches to the system with a hardware debugger while PT is enabled from software and pt_is_running() tries to make sure we handle this better, but the truth is, there is still a race window no matter what and people with hardware debuggers should really know what they are doing anyway. In other words, there is no point in keeping this one around, and it's one RDMSR instructions fewer in the fast path. The case when PT is enabled by the BIOS at boot time is handled in the driver initialization path and doesn't use pt_is_running(). This patch gets rid of it. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1429622177-22843-6-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Currently, the description of pt_buffer_reset_offsets() lacks information about its calling constraints and ordering with regards to other buffer management functions. Add a clarification about when this function has to be called. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1429622177-22843-5-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
The comments in the driver don't make it absolutely clear as to what exactly is the calling order and other possible constraints of buffer management functions. Document constraints and calling order for the buffer configuration functions. While at it, replace a redundant check in pt_buffer_reset_markers() with an explanation why it is not needed. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1429622177-22843-4-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Currently, there's a set-but-not-used variable in setup_topa_index(); this patch gets rid of it. And while at it, fixes a style issue with brackets around a one-line block. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1429622177-22843-2-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Don't bother with taking locks if we're not actually going to do anything. Also, drop the _irqsave(), this is very much only called from IRQ-disabled context. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
!x && y == ! (x || !y) Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
For some obscure reason intel_{start,stop}_scheduling() copy the HT state to an intermediate array. This would make sense if we ever were to make changes to it which we'd have to discard. Except we don't. By the time we call intel_commit_scheduling() we're; as the name implies; committed to them. We'll never back out. A further hint its pointless is that stop_scheduling() unconditionally publishes the state. So the intermediate array is pointless, modify the state in place and kill the extra array. And remove the pointless array initialization: INTEL_EXCL_UNUSED == 0. Note; all is serialized by intel_excl_cntr::lock. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Both intel_commit_scheduling() and intel_get_excl_contraints() test for cntr < 0. The only way that can happen (aside from a bug) is through validate_event(), however that is already captured by the cpuc->is_fake test. So remove these test and simplify the code. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Move the code of intel_commit_scheduling() to the right place, which is in between start() and stop(). No change in functionality. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The intel_commit_scheduling() callback is pointlessly different from the start and stop scheduling callback. Furthermore, the constraint should never be NULL, so remove that test. Even though we'll never get called (because we NULL the callbacks) when !is_ht_workaround_enabled() put that test in. Collapse the (pointless) WARN_ON_ONCE() and bail on !cpuc->excl_cntrs -- this is doubly pointless, because its the same condition as is_ht_workaround_enabled() which was already pointless because the whole method won't ever be called. Furthremore, make all the !excl_cntrs test WARN_ON_ONCE(); they're all pointless, because the above, either the function ({get,put}_excl_constraint) are already predicated on it existing or the is_ht_workaround_enabled() thing is the same test. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
We have two 'struct event_constraint' local variables in intel_get_excl_constraints(): 'cx' and 'c'. Instead of using 'cx' after the dynamic allocation, put all 'cx' inside the dynamic allocation block and use 'c' outside of it. Also use direct assignment to copy the structure; let the compiler figure it out. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Lockdep is very good at finding incorrect IRQ state while locking and is far better at telling us if we hold a lock than the _is_locked() API. It also generates less code for !DEBUG kernels. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
For some obscure reason the current code accounts the current SMT thread's state on the remote thread and reads the remote's state on the local SMT thread. While internally consistent, and 'correct' its pointless confusion we can do without. Flip them the right way around. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Matt Fleming 提交于
Since we write RMID values to MSRs the correct type to use is 'u32' because that clearly articulates we're writing a hardware register value. Fix up all uses of RMID in this code to consistently use the correct data type. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/1432285182-17180-1-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
'closid' (CLass Of Service ID) is used for the Class based Cache Allocation Technology (CAT). Add explicit storage to the per cpu cache for it, so it can be used later with the CAT support (requires to move the per cpu data). While at it: - Rename the structure to intel_pqr_state which reflects the actual purpose of the struct: cache values which go into the PQR MSR - Rename 'cnt' to rmid_usecnt which reflects the actual purpose of the counter. - Document the structure and the struct members. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235150.240899319@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
intel_cqm_event_del() is a 1:1 wrapper for intel_cqm_event_stop(). Remove the useless indirection. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235150.159779847@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
If the usage counter is non-zero there is no point to update the rmid in the PQR MSR. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235150.080844281@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
'struct intel_cqm_state' is a strict per CPU cache of the rmid and the usage counter. It can never be modified from a remote CPU. The three functions which modify the content: intel_cqm_event[start|stop|del] (del maps to stop) are called from the perf core with interrupts disabled which is enough protection for the per CPU state values. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235150.001006529@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
'int' is really not a proper data type for an MSR. Use u32 to make it clear that we are dealing with a 32-bit unsigned hardware value. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235149.919350144@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The CQM code acts like it owns the PQR MSR completely. That's not true because only the lower 10 bits are used for CQM. The upper 32 bits are used for the 'CLass Of Service ID' (CLOSID). Document the abuse. Will be fixed in a later patch. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMatt Fleming <matt.fleming@intel.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Will Auld <will.auld@intel.com> Link: http://lkml.kernel.org/r/20150518235149.823214798@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Don Zickus 提交于
I stumbled upon an AMD box that had the BIOS using a hardware performance counter. Instead of printing out a warning and continuing, it failed and blocked further perf counter usage. Looking through the history, I found this commit: a5ebe0ba ("perf/x86: Check all MSRs before passing hw check") which tweaked the rules for a Xen guest on an almost identical box and now changed the behaviour. Unfortunately the rules were tweaked incorrectly and will always lead to MSR failures even though the MSRs are completely fine. What happens now is in arch/x86/kernel/cpu/perf_event.c::check_hw_exists(): <snip> for (i = 0; i < x86_pmu.num_counters; i++) { reg = x86_pmu_config_addr(i); ret = rdmsrl_safe(reg, &val); if (ret) goto msr_fail; if (val & ARCH_PERFMON_EVENTSEL_ENABLE) { bios_fail = 1; val_fail = val; reg_fail = reg; } } <snip> /* * Read the current value, change it and read it back to see if it * matches, this is needed to detect certain hardware emulators * (qemu/kvm) that don't trap on the MSR access and always return 0s. */ reg = x86_pmu_event_addr(0); ^^^^ if the first perf counter is enabled, then this routine will always fail because the counter is running. :-( if (rdmsrl_safe(reg, &val)) goto msr_fail; val ^= 0xffffUL; ret = wrmsrl_safe(reg, val); ret |= rdmsrl_safe(reg, &val_new); if (ret || val != val_new) goto msr_fail; The above bios_fail used to be a 'goto' which is why it worked in the past. Further, most vendors have migrated to using fixed counters to hide their evilness hence this problem rarely shows up now days except on a few old boxes. I fixed my problem and kept the spirit of the original Xen fix, by recording a safe non-enable register to be used safely for the reading/writing check. Because it is not enabled, this passes on bare metal boxes (like metal), but should continue to throw an msr_fail on Xen guests because the register isn't emulated yet. Now I get a proper bios_fail error message and Xen should still see their msr_fail message (untested). Signed-off-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: george.dunlap@eu.citrix.com Cc: konrad.wilk@oracle.com Link: http://lkml.kernel.org/r/1431976608-56970-1-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Currently, pt_buffer_reset_markers() is a difficult to read knot of arithmetics with a redundant check for multiple-entry TOPA capability, a commented out wakeup marker placement and a logical error wrt to stop marker placement. The latter happens when write head is not page aligned and results in stop marker being placed one page earlier than it actually should. All these problems only affect PT implementations that support multiple-entry TOPA tables (read: proper scatter-gather). For single-entry TOPA implementations, there is no functional impact. This patch deals with all of the above. Tested on both single-entry and multiple-entry TOPA PT implementations. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: hpa@zytor.com Link: http://lkml.kernel.org/r/1432308626-18845-4-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The (SNB/IVB/HSW) HT bug only affects events that can be programmed onto GP counters, therefore we should only limit the number of GP counters that can be used per cpu -- iow we should not constrain the FP counters. Furthermore, we should only enfore such a limit when there are in fact exclusive events being scheduled on either sibling. Reported-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> [ Fixed build fail for the !CONFIG_CPU_SUP_INTEL case. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Commit 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()") violated the rule that 'fake' scheduling; as used for event/group validation; should not change the event state. This went mostly un-noticed because repeated calls of x86_pmu::get_event_constraints() would give the same result. And x86_pmu::put_event_constraints() would mostly not do anything. Commit e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround") made the situation much worse by actually setting the event->hw.constraint value to NULL, so when validation and actual scheduling interact we get NULL ptr derefs. Fix it by removing the constraint pointer from the event and move it back to an array, this time in cpuc instead of on the stack. validate_group() x86_schedule_events() event->hw.constraint = c; # store <context switch> perf_task_event_sched_in() ... x86_schedule_events(); event->hw.constraint = c2; # store ... put_event_constraints(event); # assume failure to schedule intel_put_event_constraints() event->hw.constraint = NULL; <context switch end> c = event->hw.constraint; # read -> NULL if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref This in particular is possible when the event in question is a cpu-wide event and group-leader, where the validate_group() tries to add an event to the group. Reported-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 43b45780 ("perf/x86: Reduce stack usage of x86_schedule_events()") Fixes: e979121b ("perf/x86/intel: Implement cross-HT corruption bug workaround") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 5月, 2015 2 次提交
-
-
由 Stephane Eranian 提交于
This patch enables the uncore Memory Controller (IMC) PMU support for Intel Broadwell-U (Model 61) mobile processors. The IMC PMU enables measuring memory bandwidth. To use with perf: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ sleep 10 Tested-by: NSonny Rao <sonnyrao@chromium.org> Signed-off-by: NStephane Eranian <eranian@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kan.liang@intel.com Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/20150423065642.GA4890@thinkpadSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch enables RAPL counters (energy consumption counters) support for Intel Broadwell-U processors (Model 61): To use: $ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10 Signed-off-by: NStephane Eranian <eranian@google.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jacob.jun.pan@linux.intel.com Cc: kan.liang@intel.com Cc: peterz@infradead.org Cc: sonnyrao@chromium.org Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpadSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 5月, 2015 1 次提交
-
-
由 Kan Liang 提交于
iTLB-load-misses and LLC-load-misses count incorrectly on SLM. There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK should be used to count iTLB-load-misses. This event counts when an instruction (I) page walk is completed or started. Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks. DMND_DATA_RD counts both demand and DCU prefetch data reads. However, LLC-load-misses should only count demand reads. There is no way to not include prefetches with a single counter on SLM. So the LLC-load-misses support should be removed on SLM. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 5月, 2015 1 次提交
-
-
由 Boris Ostrovsky 提交于
Commit 61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") makes AMD processors set SS to __KERNEL_DS in __switch_to() to deal with cases when SS is NULL. This breaks Xen PV guests who do not want to load SS with__KERNEL_DS. Since the problem that the commit is trying to address would have to be fixed in the hypervisor (if it in fact exists under Xen) there is no reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here. This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features op which will clear this flag. (And since this structure is no longer HVM-specific we should do some renaming). Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: NSander Eikelenboom <linux@eikelenboom.it> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 27 4月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: e7d6eefa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 4月, 2015 3 次提交
-
-
由 Sonny Rao 提交于
This keeps all the related PCI IDs together in the driver where they are used. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sonny Rao 提交于
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs This uncore is the same as the Haswell desktop part but uses a different PCI ID. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
The core_pmu does not define cpu_* callbacks, which handles allocation of 'struct cpu_hw_events::shared_regs' data, initialization of debug store and PMU_FL_EXCL_CNTRS counters. While this probably won't happen on bare metal, virtual CPU can define x86_pmu.extra_regs together with PMU version 1 and thus be using core_pmu -> using shared_regs data without it being allocated. That could could leave to following panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40 SNIP [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230 [<ffffffff811a993a>] ? dput+0x9a/0x150 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110 [<ffffffff8119c731>] ? path_put+0x31/0x40 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170 [<ffffffff81014a29>] ? sched_clock+0x9/0x10 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0 [<ffffffff81195ee9>] set_task_comm+0x69/0x80 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0 Adding cpu_(prepare|starting|dying) for core_pmu to have shared_regs data allocated for core_pmu. AFAICS there's no harm to initialize debug store and PMU_FL_EXCL_CNTRS either for core_pmu. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 4月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Dan Carpenter reported that pt_event_add() has buggy error handling logic: it returns 0 instead of -EBUSY when it fails to start a newly added event. Furthermore, the control flow in this function is messy, with cleanup labels mixed with direct returns. Fix the bug and clean up the code by converting it to a straight fast path for the regular non-failing case, plus a clear sequence of cascading goto labels to do all cleanup. NOTE: I materially changed the existing clean up logic in the pt_event_start() failure case to use the direct perf_aux_output_end() path, not pt_event_del(), because perf_aux_output_end() is enough here. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-