- 31 7月, 2018 1 次提交
-
-
由 Kan Liang 提交于
Masayoshi Mizuma reported that a warning message is shown while a CPU is hot-removed on Broadwell servers: WARNING: CPU: 126 PID: 6 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0x10b/0x150 Call Trace: pci_device_remove+0x42/0xd0 device_release_driver_internal+0x148/0x220 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x57/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x174/0x3a0 worker_thread+0x4c/0x3d0 kthread+0xf8/0x130 This bug was introduced by: commit 15a3e845 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") The index of "QPI Port 2 filter" was hardcode to 2, but this conflicts with the index of "PCU.3" which is "HSWEP_PCI_PCU_3", which equals to 2 as well. To fix the conflict, the hardcoded index needs to be cleaned up: - introduce a new enumerator "BDX_PCI_QPI_PORT2_FILTER" for "QPI Port 2 filter" on Broadwell, - increase UNCORE_EXTRA_PCI_DEV_MAX by one, - clean up the hardcoded index. Debugged-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Suggested-by: NIngo Molnar <mingo@kernel.org> Reported-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Tested-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: msys.mizuma@gmail.com Cc: stable@vger.kernel.org Fixes: 15a3e845 ("perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs") Link: http://lkml.kernel.org/r/1532953688-15008-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 7月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
Vince reported the perf_fuzzer giving various unwinder warnings and Josh reported: > Deja vu. Most of these are related to perf PEBS, similar to the > following issue: > > b8000586 ("perf/x86/intel: Cure bogus unwind from PEBS entries") > > This is basically the ORC version of that. setup_pebs_sample_data() is > assembling a franken-pt_regs which ORC isn't happy about. RIP is > inconsistent with some of the other registers (like RSP and RBP). And where the previous unwinder only needed BP,SP ORC also requires IP. But we cannot spoof IP because then the sample will get displaced, entirely negating the point of PEBS. So cure the whole thing differently by doing the unwind early; this does however require a means to communicate we did the unwind early. We (ab)use an unused sample_type bit for this, which we set on events that fill out the data->callchain before the normal perf_prepare_sample(). Debugged-by: NJosh Poimboeuf <jpoimboe@redhat.com> Reported-by: NVince Weaver <vincent.weaver@maine.edu> Tested-by: NJosh Poimboeuf <jpoimboe@redhat.com> Tested-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 7月, 2018 1 次提交
-
-
由 Hugh Dickins 提交于
Markus reported that BTS is sporadically missing the tail of the trace in the perf_event data buffer: [decode error (1): instruction overflow] shown in GDB; and bisected it to the conversion of debug_store to PTI. A little "optimization" crept into alloc_bts_buffer(), which mistakenly placed bts_interrupt_threshold away from the 24-byte record boundary. Intel SDM Vol 3B 17.4.9 says "This address must point to an offset from the BTS buffer base that is a multiple of the BTS record size." Revert "max" from a byte count to a record count, to calculate the bts_interrupt_threshold correctly: which turns out to fix problem seen. Fixes: c1961a46 ("x86/events/intel/ds: Map debug buffers in cpu_entry_area") Reported-and-tested-by: NMarkus T Metzger <markus.t.metzger@intel.com> Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lkml.kernel.org/r/alpine.LSU.2.11.1807141248290.1614@eggly.anvils
-
- 13 6月, 2018 2 次提交
-
-
由 Kees Cook 提交于
The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
由 Matthew Wilcox 提交于
Need to do a bit of rearranging to make this work. Signed-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 31 5月, 2018 8 次提交
-
-
由 Kan Liang 提交于
The counters in client IMC uncore are free running counters, not fixed counters. It should be corrected. The new infrastructure for free running counter should be applied. Introducing a new type SNB_PCI_UNCORE_IMC_DATA for client IMC free running counters. Keeping the customized event_init() function to be compatible with old event encoding. Clean up other customized event_*() functions. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-8-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Some uncores have customized PMU. For customized PMU, it does not need to customize everything. For example, it only needs to customize init() function for client IMC uncore. Other functions like add()/del()/start()/stop()/read() can use generic code. Expose the uncore_pmu_event_add/del/start/stop() functions. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-7-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
As of Skylake Server, there are a number of free running counters in each IIO Box that collect counts of per-box IO clocks and per-port Input/Output x BW/Utilization. The free running counters cannot be part of the existing IIO BOX, because, quoting from Peter Zijlstra: "This will result in some (probably) unexpected scheduling artifacts. Probably the only way to really cure that is to have the free running counters in their own PMU and not share with the GP counters of this box." So let's add a new PMU for the free running counters, as suggested. The free-running counter is read-only and always active. Counting will be suspended only when the IIO Box is powered down. There are three types of IIO free-running counters on Skylake server, IO CLOCKS counter, BANDWIDTH counters and UTILIZATION counters. IO CLOCKS counter is a clock of IIO box. BANDWIDTH counters are to count inbound(PCIe->CPU)/outbound(CPU->PCIe) bandwidth. UTILIZATION counters are to count input/output utilization. The bit width of the free-running counters is 36-bits. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-6-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There are a number of free running counters introduced for uncore, which provide highly valuable information to a wide array of customers. However, the generic uncore code doesn't support them yet. The free running counters will be specially handled based on their unique attributes: - They are read-only. They cannot be enabled/disabled. - The event and the counter are always 1:1 mapped. It doesn't need to be assigned nor tracked by event_list. - They are always active. It doesn't need to check the availability. - They have different bit width. Also, using inline helpers to replace the check for fixed counter and free running counter. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-5-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There are a number of free running counters introduced for uncore, which provide highly valuable information to a wide array of customers. For example, Skylake Server has IIO free running counters to collect Input/Output x BW/Utilization. There is NO event available on the general purpose counters, that is exactly the same as the free running counters. The generic uncore code needs to be enhanced to support the new counters. In the uncore document, there is no event-code assigned to free running counters. Some events need to be defined to indicate the free running counters. The events are encoded as event-code + umask-code. The event-code for all free running counters is 0xff, which is the same as the fixed counters: - It has not been decided what code will be used for common events on future platforms. 0xff is the only one which will definitely not be used as any common event-code. - Cannot re-use current events on the general purpose counters. Because there is NO event available, that is exactly the same as the free running counters. - Even in the existing codes, the fixed counters for core, that have the same event-code, may count different things. Hence, it should not surprise the users if the free running counters that share the same event-code also count different things. Umask will be used to distinguish the counters. The umask-code is used to distinguish a fixed counter and a free running counter, and different types of free running counters. For fixed counters, the umask-code is 0x0X, where X indicates the index of the fixed counter, which starts from 0. - Compatible with the old event encoding. - Currently, there is only one fixed counter. There are still 15 reserved spaces for extension. For free running counters, the umask-code uses the rest of the space. It would follow the format of 0xXY: - X stands for the type of free running counters, which starts from 1. - Y stands for the index of free running counters of same type, which starts from 0. - The free running counters do different thing. It can be categorized to several types, according to the MSR location, bit width and definition. E.g. there are three types of IIO free running counters on Skylake server to monitor IO CLOCKS, BANDWIDTH and UTILIZATION on different ports. It makes it easy to locate the free running counter of a specific type. - So far, there are at most 8 counters of each type. There are still 8 reserved spaces for extension. Introducing a new index to indicate the free running counters. Only one index is enough for all free running counters. Because the free running counters are always active, and the event and free running counter are always 1:1 mapped, it does not need extra index to indicate the assigned counter. Introducing a new data structure to store free running counters related information for each type. It includes the number of counters, bit width, base address, offset between counters and offset between boxes. Introducing several inline helpers to check index for fixed counter and free running counter, validate free running counter event, and retrieve the free running counter information according to box and event. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-4-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only exception is client IMC uncore, which has been specially handled. For generic code, it is not correct to use >= to check fixed counter. The code quality issue will bring problem when a new counter index is introduced. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-3-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
For Nehalem and Westmere, there is only one fixed counter for W-Box. There is no index which is bigger than UNCORE_PMC_IDX_FIXED. It is not correct to use >= to check fixed counter. The code quality issue will bring problem when new counter index is introduced. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-2-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There are two free-running counters for client IMC uncore. The customized event_init() function hard codes their index to 'UNCORE_PMC_IDX_FIXED' and 'UNCORE_PMC_IDX_FIXED + 1'. To support the index 'UNCORE_PMC_IDX_FIXED + 1', the generic uncore_perf_event_update is obscurely hacked. The code quality issue will bring problems when a new counter index is introduced into the generic code, for example, a new index for free-running counter. Introducing a customized event_read() function for client IMC uncore. The customized function is copied from previous generic uncore_pmu_event_read(). The index 'UNCORE_PMC_IDX_FIXED + 1' will be isolated for client IMC uncore only. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: acme@kernel.org Cc: eranian@google.com Link: http://lkml.kernel.org/r/1525371913-10597-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 5月, 2018 1 次提交
-
-
由 Song Liu 提交于
As Miklos reported and suggested: "This pattern repeats two times in trace_uprobe.c and in kernel/events/core.c as well: ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) goto fail_address_parse; inode = igrab(d_inode(path.dentry)); path_put(&path); And it's wrong. You can only hold a reference to the inode if you have an active ref to the superblock as well (which is normally through path.mnt) or holding s_umount. This way unmounting the containing filesystem while the tracepoint is active will give you the "VFS: Busy inodes after unmount..." message and a crash when the inode is finally put. Solution: store path instead of inode." This patch fixes the issue in kernel/event/core.c. Reviewed-and-tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 375637bc ("perf/core: Introduce address range filtering") Link: http://lkml.kernel.org/r/20180418062907.3210386-2-songliubraving@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 5月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
> arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap) Userspace controls @attr, sanitize cfg (attr->config) before using it to index an array. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 26 4月, 2018 1 次提交
-
-
由 Kan Liang 提交于
The SMM freeze feature was introduced since PerfMon V2. But the current code unconditionally enables the feature for all platforms. It can generate #GP exception, if the related FREEZE_WHILE_SMM bit is set for the machine with PerfMon V1. To disable the feature for PerfMon V1, perf needs to - Remove the freeze_on_smi sysfs entry by moving intel_pmu_attrs to intel_pmu, which is only applied to PerfMon V2 and later. - Check the PerfMon version before flipping the SMM bit when starting CPU Fixes: 6089327f ("perf/x86: Add sysfs entry to freeze counters on SMI") Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: ak@linux.intel.com Cc: eranian@google.com Cc: acme@redhat.com Link: https://lkml.kernel.org/r/1524682637-63219-1-git-send-email-kan.liang@linux.intel.com
-
- 20 4月, 2018 2 次提交
-
-
由 Oskar Senft 提交于
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally despite the fact that there are no SBOXes available. Check the Power Control Unit CAPID4 register to determine the number of available SBOXes on the particular CPU before trying to enable them. If there are none, nullify the SBOX descriptor so it isn't tried to be initialized. Signed-off-by: NOskar Senft <osk@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMark van Dijk <mark@voidzero.net> Reviewed-by: NKan Liang <kan.liang@intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: eranian@google.com Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
-
由 Stephane Eranian 提交于
This reverts commit 3b94a891 ("perf/x86/intel/uncore: Remove SBOX support for Broadwell server") Revert because there exists a proper workaround for Broadwell-EP servers without SBOX now. Note that BDX-DE does not have a SBOX. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKan Liang <kan.liang@intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: ak@linux.intel.com Cc: osk@google.com Cc: mark@voidzero.net Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com
-
- 05 4月, 2018 1 次提交
-
-
由 Stephane Eranian 提交于
This patch removes a redundant store on regs->flags introduced by commit: 71eb9ee9 ("perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs") We were clearing the PERF_EFLAGS_EXACT but it was overwritten by regs->flags = pebs->flags later on. The PERF_EFLAGS_EXACT is a software flag using bit 3 of regs->flags. X86 marks this bit as Reserved. To make sure this bit is zero before we do any IP processing, we clear it explicitly. Patch also removes the following assignment: regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM); Because there is no regs->flags to preserve anymore because set_linear_ip() is not called until later. Signed-off-by: NStephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1522909791-32498-1-git-send-email-eranian@google.com [ Improve capitalization, punctuation and clarity of comments. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 31 3月, 2018 2 次提交
-
-
由 Harry Pan 提交于
Cannon Lake supports C1/C3/C6/C7, PC2/PC3/PC6/PC7/PC8/PC9/PC10 state residency counters, this patch enables those counters. ( The MSR information is based on Intel Software Developers' Manual, Vol. 4, Order No. 335592. ) Tested-by: NPuthikorn Voravootivat <puthik@chromium.org> Signed-off-by: NHarry Pan <harry.pan@intel.com> Reviewed-by: NBenson Leung <bleung@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan.liang@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: gs0622@gmail.com Link: http://lkml.kernel.org/r/20180309121549.630-3-harry.pan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Harry Pan 提交于
This patch enables RAPL counters (energy consumption counters) support for Cannon Lake processors. ( ESU and power domains refer to Intel Software Developers' Manual, Vol. 4, Order No. 335592. ) Usage example: $ perf list $ perf stat -a -e power/energy-cores/,power/energy-pkg/ sleep 10 Tested-by: NPuthikorn Voravootivat <puthik@chromium.org> Signed-off-by: NHarry Pan <harry.pan@intel.com> Reviewed-by: NBenson Leung <bleung@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: colin.king@canonical.com Cc: gs0622@gmail.com Cc: kan.liang@linux.intel.com Link: http://lkml.kernel.org/r/20180309121549.630-2-harry.pan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 3月, 2018 1 次提交
-
-
由 Alexander Shishkin 提交于
This is a cosmetic patch that deals with the address filter structure's ambiguous fields 'filter' and 'range'. The former stands to mean that the filter's *action* should be to filter the traces to its address range if it's set or stop tracing if it's unset. This is confusing and hard on the eyes, so this patch replaces it with 'action' enum. The 'range' field is completely redundant (meaning that the filter is an address range as opposed to a single address trigger), as we can use zero size to mean the same thing. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 3月, 2018 1 次提交
-
-
由 Stephane Eranian 提交于
this patch fix a bug in how the pebs->real_ip is handled in the PEBS handler. real_ip only exists in Haswell and later processor. It is actually the eventing IP, i.e., where the event occurred. As opposed to the pebs->ip which is the PEBS interrupt IP which is always off by one. The problem is that the real_ip just like the IP needs to be fixed up because PEBS does not record all the machine state registers, and in particular the code segement (cs). This is why we have the set_linear_ip() function. The problem was that set_linear_ip() was only used on the pebs->ip and not the pebs->real_ip. We have profiles which ran into invalid callstacks because of this. Here is an example: ..... 0: ffffffffffffff80 recent entry, marker kernel v ..... 1: 000000000040044d <= user address in kernel space! ..... 2: fffffffffffffe00 marker enter user v ..... 3: 000000000040044d ..... 4: 00000000004004b6 oldest entry Debugging output in get_perf_callchain(): [ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0 The problem is that the kernel entry in 1: points to a user level address. How can that be? The reason is that with PEBS sampling the instruction that caused the event to occur and the instruction where the CPU was when the interrupt was posted may be far apart. And sometime during that time window, the privilege level may change. This happens, for instance, when the PEBS sample is taken close to a kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level instruction. But by the time the PMU interrupt fired, the processor had already entered kernel space. This is why the debug output shows a user address with user_mode() false. The problem comes from PEBS not recording the code segment (cs) register. The register is used in x86_64 to determine if executing in kernel vs user space. This is okay because the kernel has a software workaround called set_linear_ip(). But the issue in setup_pebs_sample_data() is that set_linear_ip() is never called on the real_ip value when it is available (Haswell and later) and precise_ip > 1. This patch fixes this problem and eliminates the callchain discrepancy. The patch restructures the code around set_linear_ip() to minimize the number of times the IP has to be set. Signed-off-by: NStephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 3月, 2018 4 次提交
-
-
由 Kan Liang 提交于
The number of CHAs is miscalculated on multi-domain PCI Skylake server systems, resulting in an uncore driver initialization error. Gary Kroening explains: "For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box()." To fix this bug, query the number of CHAs from the CAPID6 register: it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Reported-by: NKroening, Gary <gary.kroening@hpe.com> Tested-by: NKroening, Gary <gary.kroening@hpe.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: abanman@hpe.com Cc: dimitri.sivanich@hpe.com Cc: hpa@zytor.com Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Fixes: cd34cd97 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520967094-13219-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
The 'freerunning PEBS' and 'large PEBS' are the same thing. Both of these names appear in the code and in the API, which causes confusion. Rename 'freerunning PEBS' to 'large PEBS' to unify the code, which eliminates the confusion. No functional change. Reported-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1520865937-22910-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
Adding a filter constraint for Intel Skylake CHA event UNC_CHA_UPI_CREDITS_ACQUIRED (0x38). The event supports core-id/thread-id and link filtering. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1520869294-14176-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dan Carpenter 提交于
We intended to clear the lowest 6 bits but because of a type bug we clear the high 32 bits as well. Andi says that periods are rarely more than U32_MAX so this bug probably doesn't have a huge runtime impact. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/20180317115216.GB4035@mwandaSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 3月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae6 ("perf/core: Remove perf_event::group_entry") Reported-by: NMark Rutland <mark.rutland@arm.com> Suggested-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
-
- 16 3月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
Mark noticed that the change to sibling_list changed some iteration semantics; because previously we used group_list as list entry, sibling events would always have an empty sibling_list. But because we now use sibling_list for both list head and list entry, siblings will report as having siblings. Fix this with a custom for_each_sibling_event() iterator. Fixes: 8343aae6 ("perf/core: Remove perf_event::group_entry") Reported-by: NMark Rutland <mark.rutland@arm.com> Suggested-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: vincent.weaver@maine.edu Cc: alexander.shishkin@linux.intel.com Cc: torvalds@linux-foundation.org Cc: alexey.budankov@linux.intel.com Cc: valery.cherepennikov@intel.com Cc: eranian@google.com Cc: acme@redhat.com Cc: linux-tip-commits@vger.kernel.org Cc: davidcc@google.com Cc: kan.liang@intel.com Cc: Dmitry.Prohorov@intel.com Cc: jolsa@redhat.com Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
-
- 12 3月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
Now that all the grouping is done with RB trees, we no longer need group_entry and can replace the whole thing with sibling_list. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Budankov <alexey.budankov@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Carrillo-Cisneros <davidcc@google.com> Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valery Cherepennikov <valery.cherepennikov@intel.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 3月, 2018 5 次提交
-
-
由 Kan Liang 提交于
Auto-reload events needs to be specially handled in event count read. Auto-reload is only available for intel_pmu. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)") Link: http://lkml.kernel.org/r/1518474035-21006-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
perf/x86/intel/ds: Introduce ->read() function for auto-reload events and flush the PEBS buffer there There is no way to get exact auto-reload times and values which are needed for event updates unless we flush the PEBS buffer. Introduce intel_pmu_auto_reload_read() to drain the PEBS buffer for auto reload event. To prevent races with the hardware, we can only call drain_pebs() when the PMU is disabled. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Link: http://lkml.kernel.org/r/1518474035-21006-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There is a bug when reading event->count with large PEBS enabled. Here is an example: # ./read_count 0x71f0 0x122c0 0x1000000001c54 0x100000001257d 0x200000000bdc5 In fixed period mode, the auto-reload mechanism could be enabled for PEBS events, but the calculation of event->count does not take the auto-reload values into account. Anyone who reads event->count will get the wrong result, e.g x86_pmu_read(). This bug was introduced with the auto-reload mechanism enabled since commit: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible") Introduce intel_pmu_save_and_restart_reload() to calculate the event->count only for auto-reload. Since the counter increments a negative counter value and overflows on the sign switch, giving the interval: [-period, 0] the difference between two consequtive reads is: A) value2 - value1; when no overflows have happened in between, B) (0 - value1) + (value2 - (-period)); when one overflow happened in between, C) (0 - value1) + (n - 1) * (period) + (value2 - (-period)); when @n overflows happened in between. Here A) is the obvious difference, B) is the extension to the discrete interval, where the first term is to the top of the interval and the second term is from the bottom of the next interval and C) the extension to multiple intervals, where the middle term is the whole intervals covered. The equation for all cases is: value2 - value1 + n * period Previously the event->count is updated right before the sample output. But for case A, there is no PEBS record ready. It needs to be specially handled. Remove the auto-reload code from x86_perf_event_set_period() since we'll not longer call that function in this case. Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Fixes: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible") Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
The PMU is disabled in intel_pmu_handle_irq(), but cpuc->enabled is not updated accordingly. This is fine in current usage because no-one checks it - but fix it for future code: for example, the drain_pebs() will be modified to fix an auto-reload bug. Properly save/restore the old PMU state. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: kernel test robot <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/6f44ee84-56f8-79f1-559b-08e371eaeb78@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Large fixed period values could be truncated on Broadwell, for example: perf record -e cycles -c 10000000000 Here the fixed period is 0x2540BE400, but the period which finally applied is 0x540BE400 - which is wrong. The reason is that x86_pmu::limit_period() uses an u32 parameter, so the high 32 bits of 'period' get truncated. This bug was introduced in: commit 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") It's safe to use u64 instead of u32: - Although the 'left' is s64, the value of 'left' must be positive when calling limit_period(). - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch the higher 32 bits. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds") Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com [ Rewrote unacceptably bad changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 3月, 2018 1 次提交
-
-
由 Kan Liang 提交于
There is no event extension (bit 21) for SKX UPI, so use 'event' instead of 'event_ext'. Reported-by: NStephane Eranian <eranian@google.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: cd34cd97 ("perf/x86/intel/uncore: Add Skylake server uncore support") Link: http://lkml.kernel.org/r/1520004150-4855-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 2月, 2018 1 次提交
-
-
由 Jia Zhang 提交于
x86_mask is a confusing name which is hard to associate with the processor's stepping. Additionally, correct an indent issue in lib/cpu.c. Signed-off-by: NJia Zhang <qianyue.zj@alibaba-inc.com> [ Updated it to more recent kernels. ] Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: tony.luck@intel.com Link: http://lkml.kernel.org/r/1514771530-70829-1-git-send-email-qianyue.zj@alibaba-inc.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 1月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
More lockdep gifts, a 5-way lockup race: perf_event_create_kernel_counter() perf_event_alloc() perf_try_init_event() x86_pmu_event_init() __x86_pmu_event_init() x86_reserve_hardware() #0 mutex_lock(&pmc_reserve_mutex); reserve_ds_buffer() #1 get_online_cpus() perf_event_release_kernel() _free_event() hw_perf_event_destroy() x86_release_hardware() #0 mutex_lock(&pmc_reserve_mutex) release_ds_buffer() #1 get_online_cpus() #1 do_cpu_up() perf_event_init_cpu() #2 mutex_lock(&pmus_lock) #3 mutex_lock(&ctx->mutex) sys_perf_event_open() mutex_lock_double() #3 mutex_lock(ctx->mutex) #4 mutex_lock_nested(ctx->mutex, 1); perf_try_init_event() #4 mutex_lock_nested(ctx->mutex, 1) x86_pmu_event_init() intel_pmu_hw_config() x86_add_exclusive() #0 mutex_lock(&pmc_reserve_mutex) Fix it by using ordering constructs instead of locking. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 1月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through the cpu_entry_area but instead uses the memory allocated for the perf AUX buffer. This obviously comes apart when using PTI because then the kernel mapping; which includes that AUX buffer memory; disappears. Fixing this requires to expose a mapping which is visible in all context and that's not trivial. As a quick fix disable this driver when PTI is enabled to prevent malfunction. Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig") Reported-by: NVince Weaver <vincent.weaver@maine.edu> Reported-by: NRobert Święcki <robert@swiecki.net> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: greg@kroah.com Cc: hughd@google.com Cc: luto@amacapital.net Cc: Vince Weaver <vince@deater.net> Cc: torvalds@linux-foundation.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180114102713.GB6166@worktop.programming.kicks-ass.net
-
- 12 1月, 2018 1 次提交
-
-
由 Kan Liang 提交于
Perf-fuzzer triggers non-existent MSR access in RAPL driver on Haswell-EX. Haswell/Broadwell server and client have differnt RAPL events. Since 'commit 7f2236d0 ("perf/x86/rapl: Use Intel family macros for RAPL")', it accidentally assign RAPL client events to server. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linux-kernel@vger.kernel.org Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-