- 02 4月, 2015 16 次提交
-
-
由 Stephane Eranian 提交于
perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch modifies the PEBS constraint tables for SNB/IVB/HSW such that corrupting events supporting PEBS activate the HT workaround. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch adds 3 new PMU model specific callbacks during the event scheduling done by x86_schedule_events(). ->start_scheduling(): invoked when entering the schedule routine. ->stop_scheduling(): invoked at the end of the schedule routine ->commit_scheduling(): invoked for each committed event To be used optionally by model-specific code. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
Make the cpuc->kfree_on_online a vector to accommodate more than one entry and add the second entry to be used by a later patch. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
Because it will be used for more than just tracking the presence of extra registers. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Add support for Branch Trace Store (BTS) via kernel perf event infrastructure. The difference with the existing implementation of BTS support is that this one is a separate PMU that exports events' trace buffers to userspace by means of AUX area of the perf buffer, which is zero-copy mapped into userspace. The immediate benefit is that the buffer size can be much bigger, resulting in fewer interrupts and no kernel side copying is involved and little to no trace data loss. Also, kernel code can be traced with this driver. The old way of collecting BTS traces still works. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Add support for Intel Processor Trace (PT) to kernel's perf events. PT is an extension of Intel Architecture that collects information about software execuction such as control flow, execution modes and timings and formats it into highly compressed binary packets. Even being compressed, these packets are generated at hundreds of megabytes per second per core, which makes it impractical to decode them on the fly in the kernel. This driver exports trace data by through AUX space in the perf ring buffer, which is zero-copy mapped into userspace for faster data retrieval. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614392-114498-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Intel PT cannot be used at the same time as LBR or BTS and will cause a general protection fault if they are used together. In order to avoid fixing up GPs in the fast path, instead we disallow creating LBR/BTS events when PT events are present and vice versa. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-12-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Intel Processor Trace is an architecture extension that allows for program flow tracing. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1421237903-181015-11-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Some of the CYCLE_ACTIVITY.* events can only be scheduled on counter 2. Due to a typo Haswell matched those with INTEL_EVENT_CONSTRAINT, which lead to the events never matching as the comparison does not expect anything in the umask too. Fix the typo. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1425925222-32361-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
For supporting Intel LBR branches filtering, Intel LBR sharing logic mechanism is introduced from commit b36817e8 ("perf/x86: Add Intel LBR sharing logic"). It modifies __intel_shared_reg_get_constraints() to config lbr_sel, which is finally used to set LBR_SELECT. However, the intel_shared_regs_constraints() function is called after intel_pebs_constraints(). The PEBS event will return immediately after intel_pebs_constraints(). So it's impossible to filter branches for PEBS events. This patch moves intel_shared_regs_constraints() ahead of intel_pebs_constraints(). We can safely do that because the intel_shared_regs_constraints() function only returns empty constraint if its rejecting the event, otherwise it returns NULL such that we continue calling intel_pebs_constraints() and x86_get_event_constraint(). Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1427467105-9260-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 3月, 2015 6 次提交
-
-
由 Peter Zijlstra 提交于
While thinking on the whole clock discussion it occurred to me we have two distinct uses of time: 1) the tracking of event/ctx/cgroup enabled/running/stopped times which includes the self-monitoring support in struct perf_event_mmap_page. 2) the actual timestamps visible in the data records. And we've been conflating them. The first is all about tracking time deltas, nobody should really care in what time base that happens, its all relative information, as long as its internally consistent it works. The second however is what people are worried about when having to merge their data with external sources. And here we have the discussion on MONOTONIC vs MONOTONIC_RAW etc.. Where MONOTONIC is good for correlating between machines (static offset), MONOTNIC_RAW is required for correlating against a fixed rate hardware clock. This means configurability; now 1) makes that hard because it needs to be internally consistent across groups of unrelated events; which is why we had to have a global perf_clock(). However, for 2) it doesn't really matter, perf itself doesn't care what it writes into the buffer. The below patch makes the distinction between these two cases by adding perf_event_clock() which is used for the second case. It further makes this configurable on a per-event basis, but adds a few sanity checks such that we cannot combine events with different clocks in confusing ways. And since we then have per-event configurability we might as well retain the 'legacy' behaviour as a default. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 David Ahern 提交于
perf_pmu_disable() is called before pmu->add() and perf_pmu_enable() is called afterwards. No need to call these inside of x86_pmu_add() as well. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424281543-67335-1-git-send-email-dsahern@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NJohn Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. This is erratum BDM11 and BDM55: http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/5th-gen-core-family-spec-update.pdf BDM11: When using a period < 100; we may get incorrect PEBS/PMI interrupts and/or an invalid counter state. BDM55: When bit0-5 of the period are !0 we may get redundant PEBS records on overflow. Add a new callback to enforce this, and set it for Broadwell. How does this handle the case when an app requests a specific period with some of the bottom bits set? Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer (by Peterz): IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: NAndi Kleen <ak@linux.intel.com> [ Updated comments and changelog a bit. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Add Broadwell support for Broadwell to perf. The basic support is very similar to Haswell. We use the new cache event list added for Haswell earlier. The only differences are a few bits related to remote nodes. To avoid an extra, mostly identical, table these are patched up in the initialization code. The constraint list has one new event that needs to be handled over Haswell. Includes code and testing from Kan Liang. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Haswell offcore events are quite different from Sandy Bridge. Add a new table to handle Haswell properly. Note that the offcore bits listed in the SDM are not quite correct (this is currently being fixed). An uptodate list of bits is in the patch. The basic setup is similar to Sandy Bridge. The prefetch columns have been removed, as prefetch counting is not very reliable on Haswell. One L1 event that is not in the event list anymore has been also removed. - data reads do not include code reads (comparable to earlier Sandy Bridge tables) - data counts include speculative execution (except L1 write, dtlb, bpu) - remote node access includes both remote memory, remote cache, remote mmio. - prefetches are not included in the counts for consistency (different from Sandy Bridge, which includes prefetches in the remote node) Signed-off-by: NAndi Kleen <ak@linux.intel.com> [ Removed the HSM30 comments; we don't have them for SNB/IVB either. ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1424225886-18652-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 3月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
We currently have a race: if we're preempted during syscall exit, we can fail to process syscall return work that is queued up while we're preempted in ret_from_sys_call after checking ti.flags. Fix it by disabling interrupts before checking ti.flags. Reported-by: NStefan Seyfried <stefan.seyfried@googlemail.com> Reported-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Tejun Heo <tj@kernel.org> Fixes: 96b6352c ("x86_64, entry: Remove the syscall exit audit") Link: http://lkml.kernel.org/r/189320d42b4d671df78c10555976bb10af1ffc75.1427137498.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 3月, 2015 1 次提交
-
-
由 Radim Krčmář 提交于
kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled. We need to do that for irq notifiers. (Like with edge interrupts.) Fix it by skipping EOI broadcast only. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Tested-by: NBandan Das <bsd@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 23 3月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
The only reason CQM had to use a hard-coded pmu type was so it could use cqm_target in hw_perf_event. Do away with the {tp,bp,cqm}_target pointers and provide a non type specific one. This allows us to do away with that silly pmu type as well. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Vince Weaver <vince@deater.net> Cc: acme@kernel.org Cc: acme@redhat.com Cc: hpa@zytor.com Cc: jolsa@redhat.com Cc: kanaka.d.juvva@intel.com Cc: matt.fleming@intel.com Cc: tglx@linutronix.de Cc: torvalds@linux-foundation.org Cc: vikas.shivappa@linux.intel.com Link: http://lkml.kernel.org/r/20150305211019.GU21418@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Matt Fleming 提交于
Someone fat fingered a merge conflict and lost the Makefile hunk. Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <acme@redhat.com> Cc: <hpa@zytor.com> Cc: <jolsa@redhat.com> Cc: <kanaka.d.juvva@intel.com> Cc: <tglx@linutronix.de> Cc: <torvalds@linux-foundation.org> Cc: <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1424976420.15321.35.camel@mfleming-mobl1.ger.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 3月, 2015 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Commit b4b55cda (Refine the way to release PCI IRQ resources) introduced a regression in the PCI IRQ resource management by causing the IRQ resource of a device, established when pci_enabled_device() is called on a fully disabled device, to be released when the driver is unbound from the device, regardless of the enable_cnt. This leads to the situation that an ill-behaved driver can now make a device unusable to subsequent drivers by an imbalance in their use of pci_enable/disable_device(). That is a serious problem for secondary drivers like vfio-pci, which are innocent of the transgressions of the previous driver. Since the solution of this problem is not immediate and requires further discussion, revert commit b4b55cda and the issue it was supposed to address (a bug related to xen-pciback) will be taken care of in a different way going forward. Reported-by: NAlex Williamson <alex.williamson@redhat.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 18 3月, 2015 1 次提交
-
-
由 Radim Krčmář 提交于
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of L0. L1 triple faulted when running L2 guest that required emulation. Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)' in L0's dmesg: WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] () Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when the host doesn't have it enabled. Fixes: 78051e3b ("KVM: nVMX: Disable unrestricted mode if ept=0") Cc: stable@vger.kernel.org Tested-By: NKashyap Chamarthy <kchamart@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 17 3月, 2015 1 次提交
-
-
由 Eugene Shatokhin 提交于
On x86-64, __copy_instruction() always returns 0 (error) if the instruction uses %rip-relative addressing. This is because kernel_insn_init() is called the second time for 'insn' instance in such cases and sets all its fields to 0. Because of this, trying to place a kprobe on such instruction will fail, register_kprobe() will return -EINVAL. This patch fixes the problem. Signed-off-by: NEugene Shatokhin <eugene.shatokhin@rosalab.ru> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Link: http://lkml.kernel.org/r/20150317100918.28349.94654.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 3月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
This reverts commit: f47233c2 ("x86/mm/ASLR: Propagate base load address calculation") The main reason for the revert is that the new boot flag does not work at all currently, and in order to make this work, we need non-trivial changes to the x86 boot code which we didn't manage to get done in time for merging. And even if we did, they would've been too risky so instead of rushing things and break booting 4.1 on boxes left and right, we will be very strict and conservative and will take our time with this to fix and test it properly. Reported-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Baoquan He <bhe@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Junjie Mao <eternal.n08@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/20150316100628.GD22995@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 3月, 2015 5 次提交
-
-
由 Wincy Van 提交于
In commit 3af18d9c ("KVM: nVMX: Prepare for using hardware MSR bitmap"), we are setting MSR_BITMAP in prepare_vmcs02 if we should use hardware. This is not enough since the field will be modified by following vmx_set_efer. Fix this by setting vmx_msr_bitmap_nested in vmx_set_msr_bitmap if vcpu is in guest mode. Signed-off-by: NWincy Van <fanwenyi0529@gmail.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
由 Oleg Nesterov 提交于
drop_fpu() does clear_used_math() and usually this is correct because tsk == current. However switch_fpu_finish()->restore_fpu_checking() is called before __switch_to() updates the "current_task" variable. If it fails, we will wrongly clear the PF_USED_MATH flag of the previous task. So use clear_stopped_child_used_math() instead. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
math_state_restore() assumes it is called with irqs disabled, but this is not true if the caller is __restore_xstate_sig(). This means that if ia32_fxstate == T and __copy_from_user() fails, __restore_xstate_sig() returns with irqs disabled too. This triggers: BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41 dump_stack ___might_sleep ? _raw_spin_unlock_irqrestore __might_sleep down_read ? _raw_spin_unlock_irqrestore print_vma_addr signal_fault sys32_rt_sigreturn Change __restore_xstate_sig() to call set_used_math() unconditionally. This avoids enabling and disabling interrupts in math_state_restore(). If copy_from_user() fails, we can simply do fpu_finit() by hand. [ Note: this is only the first step. math_state_restore() should not check used_math(), it should set this flag. While init_fpu() should simply die. ] Signed-off-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pekka Riikonen <priikone@iki.fi> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephan Mueller 提交于
The kernel crypto API logic requires the caller to provide the length of (ciphertext || authentication tag) as cryptlen for the AEAD decryption operation. Thus, the cipher implementation must calculate the size of the plaintext output itself and cannot simply use cryptlen. The RFC4106 GCM decryption operation tries to overwrite cryptlen memory in req->dst. As the destination buffer for decryption only needs to hold the plaintext memory but cryptlen references the input buffer holding (ciphertext || authentication tag), the assumption of the destination buffer length in RFC4106 GCM operation leads to a too large size. This patch simply uses the already calculated plaintext size. In addition, this patch fixes the offset calculation of the AAD buffer pointer: as mentioned before, cryptlen already includes the size of the tag. Thus, the tag does not need to be added. With the addition, the AAD will be written beyond the already allocated buffer. Note, this fixes a kernel crash that can be triggered from user space via AF_ALG(aead) -- simply use the libkcapi test application from [1] and update it to use rfc4106-gcm-aes. Using [1], the changes were tested using CAVS vectors to demonstrate that the crypto operation still delivers the right results. [1] http://www.chronox.de/libkcapi.html CC: Tadeusz Struk <tadeusz.struk@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NStephan Mueller <smueller@chronox.de> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Petr Matousek 提交于
If data is read from PIC with invalid access size, the return data stays uninitialized even though success is returned. Fix this by always initializing the data. Signed-off-by: NPetr Matousek <pmatouse@redhat.com> Reported-by: NNadav Amit <nadav.amit@gmail.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 12 3月, 2015 2 次提交
-
-
由 Daniel J Blueman 提交于
On NumaChip systems, the physical processor ID assignment wasn't accounting for the number of nodes in AMD multi-module processors, giving an incorrect sibling map: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ff000000 core_siblings_list:24-31 physical_package_id:3 thread_siblings:00000000,30000000 thread_siblings_list:28-29 This fixes it: $ cd /sys/devices/system/cpu/cpu29/topology $ grep . * core_id:5 core_siblings:00000000,ffff0000 core_siblings_list:16-31 physical_package_id:1 thread_siblings:00000000,30000000 thread_siblings_list:28-29 Signed-off-by: NDaniel J Blueman <daniel@numascale.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Steffen Persvold <sp@numascale.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1426135950-10110-1-git-send-email-daniel@numascale.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li, Aubrey 提交于
On a platform in ACPI Hardware-reduced mode, the legacy PIC and PIT may not be initialized even though they may be present in silicon. Touching these legacy components causes unexpected results on the system. On the Bay Trail-T(ASUS-T100) platform, touching these legacy components blocks platform hardware low idle power state(S0ix) during system suspend. So we should bypass them in ACPI hardware reduced mode. Suggested-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NLi Aubrey <aubrey.li@linux.intel.com> Cc: <alan@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Link: http://lkml.kernel.org/r/54FFF81C.20703@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 3月, 2015 1 次提交
-
-
由 Paolo Bonzini 提交于
POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e2105Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 10 3月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
The one in do_debug() is probably harmless, but better safe than sorry. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/d67deaa9df5458363623001f252d1aee3215d014.1425948056.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 3月, 2015 1 次提交
-
-
由 Jiri Slaby 提交于
On gcc5 the kernel does not link: ld: .eh_frame_hdr table[4] FDE at 0000000000000648 overlaps table[5] FDE at 0000000000000670. Because prior GCC versions always emitted NOPs on ALIGN directives, but gcc5 started omitting them. .LSTARTFDEDLSI1 says: /* HACK: The dwarf2 unwind routines will subtract 1 from the return address to get an address in the middle of the presumed call instruction. Since we didn't get here via a call, we need to include the nop before the real start to make up for it. */ .long .LSTART_sigreturn-1-. /* PC-relative start address */ But commit 69d0627a ("x86 vDSO: reorder vdso32 code") from 2.6.25 replaced .org __kernel_vsyscall+32,0x90 by ALIGN right before __kernel_sigreturn. Of course, ALIGN need not generate any NOP in there. Esp. gcc5 collapses vclock_gettime.o and int80.o together with no generated NOPs as "ALIGN". So fix this by adding to that point at least a single NOP and make the function ALIGN possibly with more NOPs then. Kudos for reporting and diagnosing should go to Richard. Reported-by: NRichard Biener <rguenther@suse.de> Signed-off-by: NJiri Slaby <jslaby@suse.cz> Acked-by: NAndy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1425543211-12542-1-git-send-email-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
-