- 01 2月, 2021 2 次提交
-
-
由 Kan Liang 提交于
Add perf core PMU support for the Intel Sapphire Rapids server, which is the successor of the Intel Ice Lake server. The enabling code is based on Ice Lake, but there are several new features introduced. The event encoding is changed and simplified, e.g., the event codes which are below 0x90 are restricted to counters 0-3. The event codes which above 0x90 are likely to have no restrictions. The event constraints, extra_regs(), and hardware cache events table are changed accordingly. A new Precise Distribution (PDist) facility is introduced, which further minimizes the skid when a precise event is programmed on the GP counter 0. Enable the Precise Distribution (PDist) facility with :ppp event. For this facility to work, the period must be initialized with a value larger than 127. Add spr_limit_period() to apply the limit for :ppp event. Two new data source fields, data block & address block, are added in the PEBS Memory Info Record for the load latency event. To enable the feature, - An auxiliary event has to be enabled together with the load latency event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is introduced to indicate the case. A new event, mem-loads-aux, is exposed to sysfs for the user tool. Add a check in hw_config(). If the auxiliary event is not detected, return an unique error -ENODATA. - The union perf_mem_data_src is extended to support the new fields. - Ice Lake and earlier models do not support block information, but the fields may be set by HW on some machines. Add pebs_no_block to explicitly indicate the previous platforms which don't support the new block fields. Accessing the new block fields are ignored on those platforms. A new store Latency facility is introduced, which leverages the PEBS facility where it can provide additional information about sampled stores. The additional information includes the data address, memory auxiliary info (e.g. Data Source, STLB miss) and the latency of the store access. To enable the facility, the new event (0x02cd) has to be programed on the GP counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is introduced to indicate the event. The store_latency_data() is introduced to parse the memory auxiliary info. The layout of access latency field of PEBS Memory Info Record has been changed. Two latency, instruction latency (bit 15:0) and cache access latency (bit 47:32) are recorded. - The cache access latency is similar to previous memory access latency. For loads, the latency starts by the actual cache access until the data is returned by the memory subsystem. For stores, the latency starts when the demand write accesses the L1 data cache and lasts until the cacheline write is completed in the memory subsystem. The cache access latency is stored in low 32bits of the sample type PERF_SAMPLE_WEIGHT_STRUCT. - The instruction latency starts by the dispatch of the load operation for execution and lasts until completion of the instruction it belongs to. Add a new flag PMU_FL_INSTR_LATENCY to indicate the instruction latency support. The instruction latency is stored in the bit 47:32 of the sample type PERF_SAMPLE_WEIGHT_STRUCT. Extends the PERF_METRICS MSR to feature TMA method level 2 metrics. The lower half of the register is the TMA level 1 metrics (legacy). The upper half is also divided into four 8-bit fields for the new level 2 metrics. Expose all eight Topdown metrics events to user space. The full description for the SPR features can be found at Intel Architecture Instruction Set Extensions and Future Features Programming Reference, 319433-041. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
Intel Sapphire Rapids server will introduce 8 metrics events. Intel Ice Lake only supports 4 metrics events. A perf tool user may mistakenly use the unsupported events via RAW format on Ice Lake. The user can still get a value from the unsupported Topdown metrics event once the following Sapphire Rapids enabling patch is applied. To enable the 8 metrics events on Intel Sapphire Rapids, the INTEL_TD_METRIC_MAX has to be updated, which impacts the is_metric_event(). The is_metric_event() is a generic function. On Ice Lake, the newly added SPR metrics events will be mistakenly accepted as metric events on creation. At runtime, the unsupported Topdown metrics events will be updated. Add a variable num_topdown_events in x86_pmu to indicate the available number of the Topdown metrics event on the platform. Apply the number into is_metric_event(). Only the supported Topdown metrics events should be created as metrics events. Apply the num_topdown_events in icl_update_topdown_event() as well. The function can be reused by the following patch. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
-
- 28 1月, 2021 1 次提交
-
-
由 Peter Zijlstra 提交于
Perfmon-v4 counter freezing is fundamentally broken; remove this default disabled code to make sure nobody uses it. The feature is called Freeze-on-PMI in the SDM, and if it would do that, there wouldn't actually be a problem, *however* it does something subtly different. It globally disables the whole PMU when it raises the PMI, not when the PMI hits. This means there's a window between the PMI getting raised and the PMI actually getting served where we loose events and this violates the perf counter independence. That is, a counting event should not result in a different event count when there is a sampling event co-scheduled. This is known to break existing software (RR). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
- 10 11月, 2020 2 次提交
-
-
由 Stephane Eranian 提交于
Starting with Arch Perfmon v5, the anythread filter on generic counters may be deprecated. The current kernel was exporting the any filter without checking. On Icelake, it means you could do cpu/event=0x3c,any/ even though the filter does not exist. This patch corrects the problem by relying on the CPUID 0xa leaf function to determine if anythread is supported or not as described in the Intel SDM Vol3b 18.2.5.1 AnyThread Deprecation section. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201028194247.3160610-1-eranian@google.com
-
由 Peter Zijlstra 提交于
intel_pmu_drain_pebs_*() is typically called from handle_pmi_common(), both have an on-stack struct perf_sample_data, which is *big*. Rewire things so that drain_pebs() can use the one handle_pmi_common() has. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151955.054099690@infradead.org
-
- 29 10月, 2020 1 次提交
-
-
由 Stephane Eranian 提交于
When studying code layout, it is useful to capture the page size of the sampled code address. Add a new sample type for code page size. The new sample type requires collecting the ip. The code page size can be calculated from the NMI-safe perf_get_page_size(). For large PEBS, it's very unlikely that the mapping is gone for the earlier PEBS records. Enable the feature for the large PEBS. The worst case is that page-size '0' is returned. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201001135749.2804-5-kan.liang@linux.intel.com
-
- 06 10月, 2020 2 次提交
-
-
由 Peter Zijlstra 提交于
When a group that has TopDown members is failed to be scheduled, any later TopDown groups will not return valid values. Here is an example. A background perf that occupies all the GP counters and the fixed counter 1. $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles, cycles,cycles}:D" -a A user monitors a TopDown group. It works well, because the fixed counter 3 and the PERF_METRICS are available. $perf stat -x, --topdown -- ./workload retiring,bad speculation,frontend bound,backend bound, 18.0,16.1,40.4,25.5, Then the user tries to monitor a group that has TopDown members. Because of the cycles event, the group is failed to be scheduled. $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound, topdown-fe-bound,topdown-bad-spec,cycles}' -- ./workload <not counted>,,slots,0,0.00,, <not counted>,,topdown-retiring,0,0.00,, <not counted>,,topdown-be-bound,0,0.00,, <not counted>,,topdown-fe-bound,0,0.00,, <not counted>,,topdown-bad-spec,0,0.00,, <not counted>,,cycles,0,0.00,, The user tries to monitor a TopDown group again. It doesn't work anymore. $perf stat -x, --topdown -- ./workload ,,,,, In a txn, cancel_txn() is to truncate the event_list for a canceled group and update the number of events added in this transaction. However, the number of TopDown events added in this transaction is not updated. The kernel will probably fail to add new Topdown events. Fixes: 7b2c05a1 ("perf/x86/intel: Generic support for hardware TopDown metrics") Reported-by: NAndi Kleen <ak@linux.intel.com> Reported-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NKan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20201005082611.GH2628@hirez.programming.kicks-ass.net
-
由 Peter Zijlstra 提交于
Kan reported that n_metric gets corrupted for cancelled transactions; a similar issue exists for n_pair for AMD's Large Increment thing. The problem was confirmed and confirmed fixed by Kim using: sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 & # should succeed: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload # should fail: sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload # previously failed, now succeeds with this patch: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload Fixes: 57388912 ("perf/x86/amd: Add support for Large Increment per Cycle Events") Reported-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NKim Phillips <kim.phillips@amd.com> Link: https://lkml.kernel.org/r/20201005082516.GG2628@hirez.programming.kicks-ass.net
-
- 18 8月, 2020 3 次提交
-
-
由 Kan Liang 提交于
Ice Lake supports the hardware TopDown metrics feature, which can free up the scarce GP counters. Update the event constraints for the metrics events. The metric counters do not exist, which are mapped to a dummy offset. The sharing between multiple users of the same metric without multiplexing is not allowed. Implement set_topdown_event_period for Ice Lake. The values in PERF_METRICS MSR are derived from the fixed counter 3. Both registers should start from zero. Implement update_topdown_event for Ice Lake. The metric is reported by multiplying the metric (fraction) with slots. To maintain accurate measurements, both registers are cleared for each update. The fixed counter 3 should always be cleared before the PERF_METRICS. Implement td_attr for the new metrics events and the new slots fixed counter. Make them visible to the perf user tools. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
Intro ===== The TopDown Microarchitecture Analysis (TMA) Method is a structured analysis methodology to identify critical performance bottlenecks in out-of-order processors. Current perf has supported the method. The method works well, but there is one problem. To collect the TopDown events, several GP counters have to be used. If a user wants to collect other events at the same time, the multiplexing probably be triggered, which impacts the accuracy. To free up the scarce GP counters, the hardware TopDown metrics feature is introduced from Ice Lake. The hardware implements an additional "metrics" register and a new Fixed Counter 3 that measures pipeline "slots". The TopDown events can be calculated from them instead. Events ====== The level 1 TopDown has four metrics. There is no event-code assigned to the TopDown metrics. Four metric events are exported as separate perf events, which map to the internal "metrics" counter register. Those events do not exist in hardware, but can be allocated by the scheduler. For the event mapping, a special 0x00 event code is used, which is reserved for fake events. The metric events start from umask 0x10. When setting up the metric events, they point to the Fixed Counter 3. They have to be specially handled. - Add the update_topdown_event() callback to read the additional metrics MSR and generate the metrics. - Add the set_topdown_event_period() callback to initialize metrics MSR and the fixed counter 3. - Add a variable n_metric_event to track the number of the accepted metrics events. The sharing between multiple users of the same metric without multiplexing is not allowed. - Only enable/disable the fixed counter 3 when there are no other active TopDown events, which avoid the unnecessary writing of the fixed control register. - Disable the PMU when reading the metrics event. The metrics MSR and the fixed counter 3 are read separately. The values may be modified by an NMI. All four metric events don't support sampling. Since they will be handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is introduced to indicate this case. The slots event can support both sampling and counting. For counting, the flag is also applied. For sampling, it will be handled normally as other normal events. Groups ====== The slots event is required in a Topdown group. To avoid reading the METRICS register multiple times, the metrics and slots value can only be updated by slots event in a group. All active slots and metrics events will be updated one time. Therefore, the slots event must be before any metric events in a Topdown group. NMI ====== The METRICS related register may be overflow. The bit 48 of the STATUS register will be set. If so, PERF_METRICS and Fixed counter 3 are required to be reset. The patch also update all active slots and metrics events in the NMI handler. The update_topdown_event() has to read two registers separately. The values may be modified by an NMI. PMU has to be disabled before calling the function. RDPMC ====== RDPMC is temporarily disabled. A later patch will enable it. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
Bit 15 of the PERF_CAPABILITIES MSR indicates that the perf METRICS feature is supported. The perf METRICS is not a PEBS feature. Rename pebs_metrics_available perf_metrics. The bit is not used in the current code. It will be used in a later patch. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200723171117.9918-6-kan.liang@linux.intel.com
-
- 08 7月, 2020 12 次提交
-
-
由 Kan Liang 提交于
Reading LBR registers in a perf NMI handler for a non-PEBS event causes a high overhead because the number of LBR registers is huge. To reduce the overhead, the XSAVES instruction should be used to replace the LBR registers' reading method. The XSAVES buffer used for LBR read has to be per-CPU because the NMI handler invoked the lbr_read(). The existing task_ctx_data buffer cannot be used which is per-task and only be allocated for the LBR call stack mode. A new lbr_xsave pointer is introduced in the cpu_hw_events as an XSAVES buffer for LBR read. The XSAVES buffer should be allocated only when LBR is used by a non-PEBS event on the CPU because the total size of the lbr_xsave is not small (~1.4KB). The XSAVES buffer is allocated when a non-PEBS event is added, but it is lazily released in x86_release_hardware() when perf releases the entire PMU hardware resource, because perf may frequently schedule the event, e.g. high context switch. The lazy release method reduces the overhead of frequently allocate/free the buffer. If the lbr_xsave fails to be allocated, roll back to normal Arch LBR lbr_read(). Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/1593780569-62993-24-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
In the LBR call stack mode, LBR information is used to reconstruct a call stack. To get the complete call stack, perf has to save/restore all LBR registers during a context switch. Due to a large number of the LBR registers, this process causes a high CPU overhead. To reduce the CPU overhead during a context switch, use the XSAVES/XRSTORS instructions. Every XSAVE area must follow a canonical format: the legacy region, an XSAVE header and the extended region. Although the LBR information is only kept in the extended region, a space for the legacy region and XSAVE header is still required. Add a new dedicated structure for LBR XSAVES support. Before enabling XSAVES support, the size of the LBR state has to be sanity checked, because: - the size of the software structure is calculated from the max number of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR. The size of the LBR state is enumerated by the CPUID leaf for XSAVE support of Arch LBR. If the values from the two CPUID leaves are not consistent, it may trigger a buffer overflow. For example, a hypervisor may unconsciously set inconsistent values for the two emulated CPUID. - unlike other state components, the size of an LBR state depends on the max number of LBRs, which may vary from generation to generation. Expose the function xfeature_size() for the sanity check. The LBR XSAVES support will be disabled if the size of the LBR state enumerated by CPUID doesn't match with the size of the software structure. The XSAVE instruction requires 64-byte alignment for state buffers. A new macro is added to reflect the alignment requirement. A 64-byte aligned kmem_cache is created for architecture LBR. Currently, the structure for each state component is maintained in fpu/types.h. The structure for the new LBR state component should be maintained in the same place. Move structure lbr_entry to fpu/types.h as well for broader sharing. Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support, which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR information at the context switch when the call stack mode is enabled. Since the XSAVES/XRSTORS instructions will be eventually invoked, the dedicated functions is named with '_xsaves'/'_xrstors' postfix. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
Last Branch Records (LBR) enables recording of software path history by logging taken branches and other control flows within architectural registers now. Intel CPUs have had model-specific LBR for quite some time, but this evolves them into an architectural feature now. The main improvements of Architectural LBR implemented includes: - Linux kernel can support the LBR features without knowing the model number of the current CPU. - Architectural LBR capabilities can be enumerated by CPUID. The lbr_ctl_map is based on the CPUID Enumeration. - The possible LBR depth can be retrieved from CPUID enumeration. The max value is written to the new MSR_ARCH_LBR_DEPTH as the number of LBR entries. - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs, which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR. - Each LBR record or entry is still comprised of three MSRs, IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_TO_IP. But they become the architectural MSRs. - Architectural LBR is stack-like now. Entry 0 is always the youngest branch, entry 1 the next youngest... The TOS MSR has been removed. The way to enable/disable Architectural LBR is similar to the previous model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused, but some modifications are required, which include: - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural LBR. - When checking the value of the IA32_DEBUGCTL MSR, ignoring the DEBUGCTLMSR_LBR (bit 0) for Architectural LBR, which has no meaning and always return 0. - The FREEZE_LBRS_ON_PMI has to be explicitly set/clear, because MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for Architectural LBR. - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for Architectural LBR. Some Architectural LBR dedicated functions are implemented to reset/read/save/restore LBR. - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR entries, which is a lot faster and can improve the context switch latency. - For read, the branch type information can be retrieved from the MSR_ARCH_LBR_INFO_*. But it's not fully compatible due to OTHER_BRANCH type. The software decoding is still required for the OTHER_BRANCH case. LBR records are stored in the age order as well. Reuse intel_pmu_store_lbr(). Check the CPUID enumeration before accessing the corresponding bits in LBR_INFO. - For save/restore, applying the fast reset (writing ARCH_LBR_DEPTH). Reading 'lbr_from' of entry 0 instead of the TOS MSR to check if the LBR registers are reset in the deep C-state. If 'the deep C-state reset' bit is not set in CPUID enumeration, ignoring the check. XSAVE support for Architectural LBR will be implemented later. The number of LBR entries cannot be hardcoded anymore, which should be retrieved from CPUID enumeration. A new structure x86_perf_task_context_arch_lbr is introduced for Architectural LBR. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The previous model-specific LBR and Architecture LBR (legacy way) use a similar method to save/restore the LBR information, which directly accesses the LBR registers. The codes which read/write a set of LBR registers can be shared between them. Factor out two functions which are used to read/write a set of LBR registers. Add lbr_info into structure x86_pmu, and use it to replace the hardcoded LBR INFO MSR, because the LBR INFO MSR address of the previous model-specific LBR is different from Architecture LBR. The MSR address should be assigned at boot time. For now, only Sky Lake and later platforms have the LBR INFO MSR. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-13-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
Current LBR information in the structure x86_perf_task_context is stored in a different format from the PEBS LBR record and Architecture LBR, which prevents the sharing of the common codes. Use the format of the PEBS LBR record as a unified format. Use a generic name lbr_entry to replace pebs_lbr_entry. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-11-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
An IA32_LBR_CTL is introduced for Architecture LBR to enable and config LBR registers to replace the previous LBR_SELECT. All the related members in struct cpu_hw_events and struct x86_pmu have to be renamed. Some new macros are added to reflect the layout of LBR_CTL. The mapping from PERF_SAMPLE_BRANCH_* to the corresponding bits in LBR_CTL MSR is saved in lbr_ctl_map now, which is not a const value. The value relies on the CPUID enumeration. For the previous model-specific LBR, most of the bits in LBR_SELECT operate in the suppressed mode. For the bits in LBR_CTL, the polarity is inverted. For the previous model-specific LBR format 5 (LBR_FORMAT_INFO), if the NO_CYCLES and NO_FLAGS type are set, the flag LBR_NO_INFO will be set to avoid the unnecessary LBR_INFO MSR read. Although Architecture LBR also has a dedicated LBR_INFO MSR, perf doesn't need to check and set the flag LBR_NO_INFO. For Architecture LBR, XSAVES instruction will be used as the default way to read the LBR MSRs all together. The overhead which the flag tries to avoid doesn't exist anymore. Dropping the flag can save the extra check for the flag in the lbr_read() later, and make the code cleaner. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-10-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The LBR capabilities of Architecture LBR are retrieved from the CPUID enumeration once at boot time. The capabilities have to be saved for future usage. Several new fields are added into structure x86_pmu to indicate the capabilities. The fields will be used in the following patches. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-9-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The type of task_ctx is hardcoded as struct x86_perf_task_context, which doesn't apply for Architecture LBR. For example, Architecture LBR doesn't have the TOS MSR. The number of LBR entries is variable. A new struct will be introduced for Architecture LBR. Perf has to determine the type of task_ctx at run time. The type of task_ctx pointer is changed to 'void *', which will be determined at run time. The generic LBR optimization can be shared between Architecture LBR and model-specific LBR. Both need to access the structure for the generic LBR optimization. A helper task_context_opt() is introduced to retrieve the pointer of the structure at run time. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-7-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
To reduce the overhead of a context switch with LBR enabled, some generic optimizations were introduced, e.g. avoiding restore LBR if no one else touched them. The generic optimizations can also be used by Architecture LBR later. Currently, the fields for the generic optimizations are part of structure x86_perf_task_context, which will be deprecated by Architecture LBR. A new structure should be introduced for the common fields of generic optimization, which can be shared between Architecture LBR and model-specific LBR. Both 'valid_lbrs' and 'tos' are also used by the generic optimizations, but they are not moved into the new structure, because Architecture LBR is stack-like. The 'valid_lbrs' which records the index of the valid LBR is not required anymore. The TOS MSR will be removed. LBR registers may be cleared in the deep Cstate. If so, the generic optimizations should not be applied. Perf has to unconditionally restore the LBR registers. A generic function is required to detect the reset due to the deep Cstate. lbr_is_reset_in_cstate() is introduced. Currently, for the model-specific LBR, the TOS MSR is used to detect the reset. There will be another method introduced for Architecture LBR later. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-6-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The MSRs of Architectural LBR are different from previous model-specific LBR. Perf has to implement different functions to save and restore them. The function pointers for LBR save and restore are introduced. Perf should initialize the corresponding functions at boot time. The generic optimizations, e.g. avoiding restore LBR if no one else touched them, still apply for Architectural LBRs. The related codes are not moved to model-specific functions. Current model-specific LBR functions are set as default. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-5-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The method to read Architectural LBRs is different from previous model-specific LBR. Perf has to implement a different function. A function pointer for LBR read is introduced. Perf should initialize the corresponding function at boot time, and avoid checking lbr_format at run time. The current 64-bit LBR read function is set as default. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-4-git-send-email-kan.liang@linux.intel.com
-
由 Kan Liang 提交于
The method to reset Architectural LBRs is different from previous model-specific LBR. Perf has to implement a different function. A function pointer is introduced for LBR reset. The enum of LBR_FORMAT_* is also moved to perf_event.h. Perf should initialize the corresponding functions at boot time, and avoid checking lbr_format at run time. The current 64-bit LBR reset function is set as default. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1593780569-62993-3-git-send-email-kan.liang@linux.intel.com
-
- 02 7月, 2020 3 次提交
-
-
由 Like Xu 提交于
When a guest wants to use the LBR registers, its hypervisor creates a guest LBR event and let host perf schedules it. The LBR records msrs are accessible to the guest when its guest LBR event is scheduled on by the perf subsystem. Before scheduling this event out, we should avoid host changes on IA32_DEBUGCTLMSR or LBR_SELECT. Otherwise, some unexpected branch operations may interfere with guest behavior, pollute LBR records, and even cause host branches leakage. In addition, the read operation on host is also avoidable. To ensure that guest LBR records are not lost during the context switch, the guest LBR event would enable the callstack mode which could save/restore guest unread LBR records with the help of intel_pmu_lbr_sched_task() naturally. However, the guest LBR_SELECT may changes for its own use and the host LBR event doesn't save/restore it. To ensure that we doesn't lost the guest LBR_SELECT value when the guest LBR event is running, the vlbr_constraint is bound up with a new constraint flag PERF_X86_EVENT_LBR_SELECT. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200514083054.62538-6-like.xu@linux.intel.com
-
由 Like Xu 提交于
The hypervisor may request the perf subsystem to schedule a time window to directly access the LBR records msrs for its own use. Normally, it would create a guest LBR event with callstack mode enabled, which is scheduled along with other ordinary LBR events on the host but in an exclusive way. To avoid wasting a counter for the guest LBR event, the perf tracks its hw->idx via INTEL_PMC_IDX_FIXED_VLBR and assigns it with a fake VLBR counter with the help of new vlbr_constraint. As with the BTS event, there is actually no hardware counter assigned for the guest LBR event. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200514083054.62538-5-like.xu@linux.intel.com
-
由 Wei Wang 提交于
The MSR variable type can be 'unsigned int', which uses less memory than the longer 'unsigned long'. Fix 'struct x86_pmu' for that. The lbr_nr won't be a negative number, so make it 'unsigned int' as well. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200613080958.132489-2-like.xu@linux.intel.com
-
- 01 5月, 2020 1 次提交
-
-
由 CodyYao-oc 提交于
Zhaoxin CPU has provided facilities for monitoring performance via PMU (Performance Monitor Unit), but the functionality is unused so far. Therefore, add support for zhaoxin pmu to make performance related hardware events available. The PMU is mostly an Intel Architectural PerfMon-v2 with a novel errata for the ZXC line. It supports the following events: ----------------------------------------------------------------------------------------------------------------------------------- Event | Event | Umask | Description | Select | | ----------------------------------------------------------------------------------------------------------------------------------- cpu-cycles | 82h | 00h | unhalt core clock instructions | 00h | 00h | number of instructions at retirement. cache-references | 15h | 05h | number of fillq pushs at the current cycle. cache-misses | 1ah | 05h | number of l2 miss pushed by fillq. branch-instructions | 28h | 00h | counts the number of branch instructions retired. branch-misses | 29h | 00h | mispredicted branch instructions at retirement. bus-cycles | 83h | 00h | unhalt bus clock stalled-cycles-frontend | 01h | 01h | Increments each cycle the # of Uops issued by the RAT to RS. stalled-cycles-backend | 0fh | 04h | RS0/1/2/3/45 empty L1-dcache-loads | 68h | 05h | number of retire/commit load. L1-dcache-load-misses | 4bh | 05h | retired load uops whose data source followed an L1 miss. L1-dcache-stores | 69h | 06h | number of retire/commit Store,no LEA L1-dcache-store-misses | 62h | 05h | cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement. L1-icache-loads | 00h | 03h | number of l1i cache access for valid normal fetch,including un-cacheable access. L1-icache-load-misses | 01h | 03h | number of l1i cache miss for valid normal fetch,including un-cacheable miss. L1-icache-prefetches | 0ah | 03h | number of prefetch. L1-icache-prefetch-misses | 0bh | 03h | number of prefetch miss. dTLB-loads | 68h | 05h | number of retire/commit load dTLB-load-misses | 2ch | 05h | number of load operations miss all level tlbs and cause a tablewalk. dTLB-stores | 69h | 06h | number of retire/commit Store,no LEA dTLB-store-misses | 30h | 05h | number of store operations miss all level tlbs and cause a tablewalk. dTLB-prefetches | 64h | 05h | number of hardware pte prefetch requests dispatched out of the prefetch FIFO. dTLB-prefetch-misses | 65h | 05h | number of hardware pte prefetch requests miss the l1d data cache. iTLB-load | 00h | 00h | actually counter instructions. iTLB-load-misses | 34h | 05h | number of code operations miss all level tlbs and cause a tablewalk. ----------------------------------------------------------------------------------------------------------------------------------- Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NCodyYao-oc <CodyYao-oc@zhaoxin.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1586747669-4827-1-git-send-email-CodyYao-oc@zhaoxin.com
-
- 17 1月, 2020 2 次提交
-
-
由 Kim Phillips 提交于
Description of hardware operation --------------------------------- The core AMD PMU has a 4-bit wide per-cycle increment for each performance monitor counter. That works for most events, but now with AMD Family 17h and above processors, some events can occur more than 15 times in a cycle. Those events are called "Large Increment per Cycle" events. In order to count these events, two adjacent h/w PMCs get their count signals merged to form 8 bits per cycle total. In addition, the PERF_CTR count registers are merged to be able to count up to 64 bits. Normally, events like instructions retired, get programmed on a single counter like so: PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count The next counter at MSRs 0xc0010202-3 remains unused, or can be used independently to count something else. When counting Large Increment per Cycle events, such as FLOPs, however, we now have to reserve the next counter and program the PERF_CTL (config) register with the Merge event (0xFFF), like so: PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count The count is widened from the normal 48-bits to 64 bits by having the second counter carry the higher 16 bits of the count in its lower 16 bits of its counter register. The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge event before the even counter, PERF_CTL0. The Large Increment feature is available starting with Family 17h. For more details, search any Family 17h PPR for the "Large Increment per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this version: https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip Description of software operation --------------------------------- The following steps are taken in order to support reserving and enabling the extra counter for Large Increment per Cycle events: 1. In the main x86 scheduler, we reduce the number of available counters by the number of Large Increment per Cycle events being scheduled, tracked by a new cpuc variable 'n_pair' and a new amd_put_event_constraints_f17h(). This improves the counter scheduler success rate. 2. In perf_assign_events(), if a counter is assigned to a Large Increment event, we increment the current counter variable, so the counter used for the Merge event is removed from assignment consideration by upcoming event assignments. 3. In find_counter(), if a counter has been found for the Large Increment event, we set the next counter as used, to prevent other events from using it. 4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e., we add Merge event accounting to the existing used_mask logic. 5. Finally, we add on the programming of Merge event to the neighbouring PMC counters in the counter enable/disable{_all} code paths. Currently, software does not support a single PMU with mixed 48- and 64-bit counting, so Large increment event counts are limited to 48 bits. In set_period, we zero-out the upper 16 bits of the count, so the hardware doesn't copy them to the even counter's higher bits. Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm vaddps instruction executed in a loop 100 million times: perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload> Performance counter stats for '<workload>': 800,000,000 cpu/fp_ret_sse_avx_ops.all/u 300,042,101 cpu/instructions/u Prior to this patch, the reported SSE/AVX FLOPs retired count would be wrong. [peterz: lots of renames and edits to the code] Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Kim Phillips 提交于
AMD Family 17h processors and above gain support for Large Increment per Cycle events. Unfortunately there is no CPUID or equivalent bit that indicates whether the feature exists or not, so we continue to determine eligibility based on a CPU family number comparison. For Large Increment per Cycle events, we add a f17h-and-compatibles get_event_constraints_f17h() that returns an even counter bitmask: Large Increment per Cycle events can only be placed on PMCs 0, 2, and 4 out of the currently available 0-5. The only currently public event that requires this feature to report valid counts is PMCx003 "Retired SSE/AVX Operations". Note that the CPU family logic in amd_core_pmu_init() is changed so as to be able to selectively add initialization for features available in ranges of backward-compatible CPU families. This Large Increment per Cycle feature is expected to be retained in future families. A side-effect of assigning a new get_constraints function for f17h disables calling the old (prior to f15h) amd_get_event_constraints implementation left enabled by commit e40ed154 ("perf/x86: Add perf support for AMD family-17h processors"), which is no longer necessary since those North Bridge event codes are obsoleted. Also fix a spelling mistake whilst in the area (calulating -> calculating). Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com
-
- 28 10月, 2019 2 次提交
-
-
由 Alexey Budankov 提交于
Implement intel_pmu_lbr_swap_task_ctx() method updating counters of the events that requested LBR callstack data on a sample. The counter can be zero for the case when task context belongs to a thread that has just come from a block on a futex and the context contains saved (lbr_stack_state == LBR_VALID) LBR register values. For the values to be restored at LBR registers on the next thread's switch-in event it swaps the counter value with the one that is expected to be non zero at the previous equivalent task perf event context. Swap operation type ensures the previous task perf event context stays consistent with the amount of events that requested LBR callstack data on a sample. Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/261ac742-9022-c3f4-5885-1eae7415b091@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexey Budankov 提交于
Declare swap_task_ctx() methods at the generic and x86 specific pmu types to bridge calls to platform specific PMU code on optimized context switch path between equivalent task perf event contexts. Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/9a0aa84a-f062-9b64-3133-373658550c4b@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 8月, 2019 1 次提交
-
-
由 Alexander Shishkin 提交于
If PEBS declares ability to output its data to Intel PT stream, use the aux_output attribute bit to enable PEBS data output to PT. This requires a PT event to be present and scheduled in the same context. Unlike the DS area, the kernel does not extract PEBS records from the PT stream to generate corresponding records in the perf stream, because that would require real time in-kernel PT decoding, which is not feasible. The PMI, however, can still be used. The output setting is per-CPU, so all PEBS events must be either writing to PT or to the DS area, therefore, in case of conflict, the conflicting event will fail to schedule, allowing the rotation logic to alternate between the PEBS->PT and PEBS->DS events. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: kan.liang@linux.intel.com Link: https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shishkin@linux.intel.com
-
- 25 6月, 2019 2 次提交
-
-
由 Kan Liang 提交于
We don't need pmu->pebs_no_xmm_regs anymore, the capabilities PERF_PMU_CAP_EXTENDED_REGS can be used to check if XMM registers collection is supported. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Use generic macro PERF_REG_EXTENDED_MASK to replace PEBS_XMM_REGS to avoid duplication. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lkml.kernel.org/r/1559081314-9714-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 6月, 2019 4 次提交
-
-
由 Jiri Olsa 提交于
Using the new pmu::update_attrs attribute group for default attributes - freeze_on_smi, allow_tsx_force_abort. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190512155518.21468-10-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
Using the new pmu::update_attrs attribute group for "caps" directory. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190512155518.21468-7-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
Using the new pmu::update_attrs attribute group to create detected events for x86_pmu. Moving the topdown/memory/tsx attributes to separate attribute groups with specific is_visible functions. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190512155518.21468-5-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
Nobody is using that. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190512155518.21468-4-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 5月, 2019 1 次提交
-
-
由 Stephane Eranian 提交于
On Intel Westmere, a cmdline as follows: $ perf record -e cpu/event=0xc4,umask=0x2,name=br_inst_retired.near_call/p .... was failing. Yet the event+ umask support PEBS. It turns out this is due to a bug in the the PEBS event constraint table for westmere. All forms of BR_INST_RETIRED.* support PEBS. Therefore the constraint mask should ignore the umask. The name of the macro INTEL_FLAGS_EVENT_CONSTRAINT() hint that this is the case but it was not. That macros was checking both the event code and event umask. Therefore, it was only matching on 0x00c4. There are code+umask macros, they all have *UEVENT*. This bug fixes the issue by checking only the event code in the mask. Both single and range version are modified. Signed-off-by: NStephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/20190509214556.123493-1-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 4月, 2019 1 次提交
-
-
由 Kan Liang 提交于
Add Icelake core PMU perf code, including constraint tables and the main enable code. Icelake expanded the generic counters to always 8 even with HT on, but a range of events cannot be scheduled on the extra 4 counters. Add new constraint ranges to describe this to the scheduler. The number of constraints that need to be checked is larger now than with earlier CPUs. At some point we may need a new data structure to look them up more efficiently than with linear search. So far it still seems to be acceptable however. Icelake added a new fixed counter SLOTS. Full support for it is added later in the patch series. The cache events table is identical to Skylake. Compare to PEBS instruction event on generic counter, fixed counter 0 has less skid. Force instruction:ppp always in fixed counter 0. Originally-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: acme@kernel.org Cc: jolsa@kernel.org Link: https://lkml.kernel.org/r/20190402194509.2832-9-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-