1. 20 Apr, 2021 12 commits
  2. 16 Apr, 2021 1 commit
  3. 01 Feb, 2021 3 commits
    • perf/x86/intel: Support CPUID 10.ECX to disable fixed counters · 32451614
      Kan Liang committed
      With Architectural Performance Monitoring Version 5, CPUID 10.ECX cpu
      leaf indicates the fixed counter enumeration. This extends the previous
      count to a bitmap which allows disabling even lower fixed counters.
      It could be used by a Hypervisor.
      
      The existing intel_ctrl variable is used to remember the bitmask of the
      counters. All code that reads all counters is fixed to check this extra
      bitmask.
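
A minimal sketch of the bitmap idea described above, not the kernel's actual code. The function names and the `FIXED_CTR_SHIFT` constant are hypothetical, and it assumes the bitmap from CPUID 10.ECX is used as-is on version 5 and later, while older versions fall back to a contiguous count (CPUID 10.EDX[4:0]).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for this sketch: fixed-counter bits start at bit 32. */
#define FIXED_CTR_SHIFT 32

/*
 * Return a bitmap of usable fixed counters.
 * version:   architectural perfmon version
 * ecx:       fixed-counter bitmap from CPUID 10.ECX (used from version 5)
 * num_fixed: legacy contiguous fixed-counter count
 */
static uint64_t fixed_counter_bitmap(unsigned version, uint32_t ecx,
				     unsigned num_fixed)
{
	assert(num_fixed < 32);
	if (version >= 5)
		return ecx;
	return (1ULL << num_fixed) - 1;
}

/* Combine GP and fixed counters into one control bitmask. */
static uint64_t counter_mask(unsigned num_gp, uint64_t fixed_bitmap)
{
	assert(num_gp < 32);
	return ((1ULL << num_gp) - 1) | (fixed_bitmap << FIXED_CTR_SHIFT);
}

/* Code that walks all fixed counters would test this before touching one. */
static int fixed_counter_usable(uint64_t mask, unsigned idx)
{
	return (int)((mask >> (FIXED_CTR_SHIFT + idx)) & 1);
}
```

With a bitmap of 0xd, fixed counters 0, 2 and 3 are usable while counter 1 is skipped, which is the "disable a lower fixed counter" case a hypervisor could create.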

      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-6-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel: Add perf core PMU support for Sapphire Rapids · 61b985e3
      Kan Liang committed
      Add perf core PMU support for the Intel Sapphire Rapids server, which is
      the successor of the Intel Ice Lake server. The enabling code is based
      on Ice Lake, but there are several new features introduced.
      
      The event encoding is changed and simplified, e.g., the event codes
      which are below 0x90 are restricted to counters 0-3. The event codes
      which are above 0x90 are likely to have no restrictions. The event
      constraints, extra_regs(), and hardware cache events table are changed
      accordingly.
      
      A new Precise Distribution (PDist) facility is introduced, which
      further minimizes the skid when a precise event is programmed on the GP
      counter 0. Enable the Precise Distribution (PDist) facility with :ppp
      event. For this facility to work, the period must be initialized with a
      value larger than 127. Add spr_limit_period() to apply the limit for
      :ppp event.
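
The period limit described above can be sketched as follows. This is an illustration under assumptions, not the body of spr_limit_period(): the function name and the `PDIST_MIN_PERIOD` constant are hypothetical, and `precise_ip == 3` is taken to stand for the :ppp modifier.

```c
#include <assert.h>
#include <stdint.h>

/* "Larger than 127" from the text above, so the smallest legal period is 128. */
#define PDIST_MIN_PERIOD 128ULL

/* Raise the remaining period to the minimum when the event asks for :ppp. */
static uint64_t limit_period_sketch(unsigned precise_ip, uint64_t left)
{
	assert(precise_ip <= 3);
	if (precise_ip == 3 && left < PDIST_MIN_PERIOD)
		return PDIST_MIN_PERIOD;
	return left;
}
```

Events with a lower precise level pass through unchanged, so only :ppp users see their very small periods adjusted.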
      
      Two new data source fields, data block & address block, are added in the
      PEBS Memory Info Record for the load latency event. To enable the
      feature,
      - An auxiliary event has to be enabled together with the load latency
        event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is
        introduced to indicate the case. A new event, mem-loads-aux, is
        exposed to sysfs for the user tool.
        Add a check in hw_config(). If the auxiliary event is not detected,
        return a unique error -ENODATA.
      - The union perf_mem_data_src is extended to support the new fields.
      - Ice Lake and earlier models do not support block information, but the
        fields may be set by HW on some machines. Add pebs_no_block to
        explicitly indicate the previous platforms which don't support the new
        block fields. Accessing the new block fields is ignored on those
        platforms.
      
      A new store Latency facility is introduced, which leverages the PEBS
      facility where it can provide additional information about sampled
      stores. The additional information includes the data address, memory
      auxiliary info (e.g. Data Source, STLB miss) and the latency of the
      store access. To enable the facility, the new event (0x02cd) has to be
      programmed on the GP counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is
      introduced to indicate the event. The store_latency_data() is introduced
      to parse the memory auxiliary info.
      
      The layout of access latency field of PEBS Memory Info Record has been
      changed. Two latencies, instruction latency (bits 15:0) and cache access
      latency (bits 47:32), are recorded.
      - The cache access latency is similar to previous memory access latency.
        For loads, the latency runs from the actual cache access until the
        data is returned by the memory subsystem.
        For stores, the latency starts when the demand write accesses the L1
        data cache and lasts until the cacheline write is completed in the
        memory subsystem.
        The cache access latency is stored in the low 32 bits of the sample
        type PERF_SAMPLE_WEIGHT_STRUCT.
      - The instruction latency starts at the dispatch of the load operation
        for execution and lasts until completion of the instruction it belongs
        to.
        Add a new flag PMU_FL_INSTR_LATENCY to indicate the instruction
        latency support. The instruction latency is stored in bits 47:32
        of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
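
The two layouts above swap positions between the PEBS record and the sample weight, which is easy to get wrong. A small sketch of the repacking, using plain bit operations; the helper names are hypothetical and this is not the kernel's parsing code.

```c
#include <assert.h>
#include <stdint.h>

/* PEBS latency field: instruction latency in bits 15:0. */
static uint64_t pebs_instr_latency(uint64_t field)
{
	return field & 0xffff;
}

/* PEBS latency field: cache access latency in bits 47:32. */
static uint64_t pebs_cache_latency(uint64_t field)
{
	return (field >> 32) & 0xffff;
}

/*
 * Repack for the sample weight: cache latency in the low 32 bits,
 * instruction latency in bits 47:32.
 */
static uint64_t pack_weight_sketch(uint64_t field)
{
	uint64_t weight = pebs_cache_latency(field) |
			  (pebs_instr_latency(field) << 32);

	assert((weight >> 48) == 0); /* the top 16 bits stay unused here */
	return weight;
}
```

For a record with instruction latency 0x45 and cache latency 0x123, the packed weight carries 0x123 in the low word and 0x45 in bits 47:32.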
      
      Extend the PERF_METRICS MSR to feature TMA method level 2 metrics. The
      lower half of the register is the TMA level 1 metrics (legacy). The
      upper half is also divided into four 8-bit fields for the new level 2
      metrics. Expose all eight Topdown metrics events to user space.
      
      The full description for the SPR features can be found at Intel
      Architecture Instruction Set Extensions and Future Features
      Programming Reference, 319433-041.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel: Filter unsupported Topdown metrics event · 1ab5f235
      Kan Liang committed
      Intel Sapphire Rapids server will introduce 8 metrics events. Intel
      Ice Lake only supports 4 metrics events. A perf tool user may mistakenly
      use the unsupported events via RAW format on Ice Lake. The user can
      still get a value from the unsupported Topdown metrics event once the
      following Sapphire Rapids enabling patch is applied.
      
      To enable the 8 metrics events on Intel Sapphire Rapids, the
      INTEL_TD_METRIC_MAX has to be updated, which impacts the
      is_metric_event(). The is_metric_event() is a generic function.
      On Ice Lake, the newly added SPR metrics events will be mistakenly
      accepted as metric events on creation. At runtime, the unsupported
      Topdown metrics events will be updated.
      
      Add a variable num_topdown_events in x86_pmu to indicate the available
      number of the Topdown metrics event on the platform. Apply the number
      into is_metric_event(). Only the supported Topdown metrics events
      should be created as metrics events.
      
      Apply the num_topdown_events in icl_update_topdown_event() as well. The
      function can be reused by the following patch.
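
One way to picture the num_topdown_events filter is the range check below. It is a sketch under assumptions: the function name is hypothetical, the config of the first metric event is passed in by the caller rather than hardcoded, and metric events are assumed to use event code 0x00 with consecutive umask values (one umask step is 1 << 8 in the config).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a config as a metric event only if it falls inside the range
 * that the platform actually supports.
 */
static bool is_available_metric(uint64_t config, uint64_t first_metric,
				unsigned num_topdown_events)
{
	uint64_t last_metric;

	assert((first_metric & 0xff) == 0); /* metric events use event code 0x00 */
	if (num_topdown_events == 0)
		return false;

	last_metric = first_metric + ((uint64_t)(num_topdown_events - 1) << 8);
	return (config & 0xff) == 0 &&
	       config >= first_metric && config <= last_metric;
}
```

With four supported events, the fifth metric config is rejected at creation time; the same config is accepted on a platform that reports eight.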

      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
  4. 28 Jan, 2021 1 commit
    • perf/intel: Remove Perfmon-v4 counter_freezing support · 3daa96d6
      Peter Zijlstra committed
      Perfmon-v4 counter freezing is fundamentally broken; remove this default
      disabled code to make sure nobody uses it.
      
      The feature is called Freeze-on-PMI in the SDM, and if it would do that,
      there wouldn't actually be a problem, *however* it does something subtly
      different. It globally disables the whole PMU when it raises the PMI,
      not when the PMI hits.
      
      This means there's a window between the PMI getting raised and the PMI
      actually getting served where we lose events and this violates the
      perf counter independence. That is, a counting event should not result
      in a different event count when there is a sampling event co-scheduled.
      
      This is known to break existing software (RR).

      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
  5. 10 Nov, 2020 2 commits
  6. 29 Oct, 2020 1 commit
  7. 06 Oct, 2020 2 commits
    • perf/x86: Fix n_metric for cancelled txn · 3dbde695
      Peter Zijlstra committed
      When a group that has TopDown members fails to be scheduled, any
      later TopDown groups will not return valid values.
      
      Here is an example.
      
      A background perf that occupies all the GP counters and the fixed
      counter 1.
       $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles,
                       cycles,cycles}:D" -a
      
      A user monitors a TopDown group. It works well, because the fixed
      counter 3 and the PERF_METRICS are available.
       $perf stat -x, --topdown -- ./workload
         retiring,bad speculation,frontend bound,backend bound,
         18.0,16.1,40.4,25.5,
      
      Then the user tries to monitor a group that has TopDown members.
      Because of the cycles event, the group fails to be scheduled.
       $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound,
                           topdown-fe-bound,topdown-bad-spec,cycles}'
                           -- ./workload
          <not counted>,,slots,0,0.00,,
          <not counted>,,topdown-retiring,0,0.00,,
          <not counted>,,topdown-be-bound,0,0.00,,
          <not counted>,,topdown-fe-bound,0,0.00,,
          <not counted>,,topdown-bad-spec,0,0.00,,
          <not counted>,,cycles,0,0.00,,
      
      The user tries to monitor a TopDown group again. It doesn't work anymore.
       $perf stat -x, --topdown -- ./workload
      
          ,,,,,
      
      In a txn, cancel_txn() truncates the event_list for a canceled
      group and updates the number of events added in this transaction.
      However, the number of TopDown events added in this transaction is not
      updated, so the kernel will probably fail to add new TopDown events.
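
The bookkeeping problem can be shown with a toy model. The struct and function names below are hypothetical and the counters are plain integers, not the kernel's per-CPU state; the sketch only illustrates that a cancel has to roll back the metric count together with the event count.

```c
#include <assert.h>

/* Toy per-CPU state: totals plus what the current transaction added. */
struct txn_state {
	int n_events;     /* events currently collected */
	int n_metric;     /* TopDown metric events among them */
	int n_txn;        /* events added by the open transaction */
	int n_txn_metric; /* metric events added by the open transaction */
};

static void txn_start(struct txn_state *s)
{
	s->n_txn = 0;
	s->n_txn_metric = 0;
}

static void txn_add(struct txn_state *s, int is_metric)
{
	s->n_events++;
	s->n_txn++;
	if (is_metric) {
		s->n_metric++;
		s->n_txn_metric++;
	}
}

static void txn_cancel(struct txn_state *s)
{
	assert(s->n_events >= s->n_txn && s->n_metric >= s->n_txn_metric);
	s->n_events -= s->n_txn;
	/* Without this line, stale metric counts block later TopDown groups. */
	s->n_metric -= s->n_txn_metric;
	s->n_txn = 0;
	s->n_txn_metric = 0;
}
```

After a cancelled group with two metric members, both totals return to their values from before the transaction, so a later TopDown group starts from a clean count.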
      
      Fixes: 7b2c05a1 ("perf/x86/intel: Generic support for hardware TopDown metrics")
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Reported-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lkml.kernel.org/r/20201005082611.GH2628@hirez.programming.kicks-ass.net
    • perf/x86: Fix n_pair for cancelled txn · 871a93b0
      Peter Zijlstra committed
      Kan reported that n_metric gets corrupted for cancelled transactions;
      a similar issue exists for n_pair for AMD's Large Increment thing.
      
      The problem was confirmed and confirmed fixed by Kim using:
      
        sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 &
      
        # should succeed:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload
      
        # should fail:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload
      
        # previously failed, now succeeds with this patch:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload
      
      Fixes: 57388912 ("perf/x86/amd: Add support for Large Increment per Cycle Events")
      Reported-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Kim Phillips <kim.phillips@amd.com>
      Link: https://lkml.kernel.org/r/20201005082516.GG2628@hirez.programming.kicks-ass.net
  8. 18 Aug, 2020 3 commits
    • perf/x86/intel: Support TopDown metrics on Ice Lake · 59a854e2
      Kan Liang committed
      Ice Lake supports the hardware TopDown metrics feature, which can free
      up the scarce GP counters.
      
      Update the event constraints for the metrics events. The metric counters
      do not exist in hardware, so they are mapped to a dummy offset. Sharing
      the same metric between multiple users without multiplexing is not allowed.
      
      Implement set_topdown_event_period for Ice Lake. The values in
      PERF_METRICS MSR are derived from the fixed counter 3. Both registers
      should start from zero.
      
      Implement update_topdown_event for Ice Lake. The metric is reported by
      multiplying the metric (fraction) with slots. To maintain accurate
      measurements, both registers are cleared for each update. The fixed
      counter 3 should always be cleared before the PERF_METRICS.
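
The "fraction multiplied by slots" step can be sketched as below. The function name is hypothetical, and two details are assumptions of this sketch rather than statements from the text above: each metric is an 8-bit field of the PERF_METRICS value, and the fraction is scaled against 0xff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert one metric field into a slots count.
 * perf_metrics: raw PERF_METRICS value
 * idx:          which 8-bit field to read (0 is the lowest byte)
 * slots:        value of the slots counter (fixed counter 3)
 */
static uint64_t metric_to_slots(uint64_t perf_metrics, unsigned idx,
				uint64_t slots)
{
	uint64_t frac;

	assert(idx < 8);
	frac = (perf_metrics >> (idx * 8)) & 0xff;
	/* A 48-bit slots value times 0xff still fits in 64 bits. */
	return slots * frac / 0xff;
}
```

A field of 0xff attributes every slot to that metric, and a field of 0x80 attributes roughly half of them.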
      
      Implement td_attr for the new metrics events and the new slots fixed
      counter. Make them visible to the perf user tools.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
    • perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Kan Liang committed
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. Perf already supports the method.
      
      The method works well, but there is one problem. To collect the TopDown
      events, several GP counters have to be used. If a user wants to collect
      other events at the same time, multiplexing will probably be triggered,
      which impacts the accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      is introduced from Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that measures pipeline
      "slots". The TopDown events can be calculated from them instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
      
      When setting up the metric events, they point to the Fixed Counter 3.
      They have to be specially handled.
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of the accepted
        metrics events. The sharing between multiple users of the same metric
        without multiplexing is not allowed.
      - Only enable/disable the fixed counter 3 when there are no other active
        TopDown events, which avoids unnecessary writes to the fixed
        control register.
      - Disable the PMU when reading the metrics event. The metrics MSR and
        the fixed counter 3 are read separately. The values may be modified by
        an NMI.
      
      All four metric events don't support sampling. Since they will be
      handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to indicate this case.
      
      The slots event can support both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it will be handled normally as other normal events.
      
      Groups
      ======
      
      The slots event is required in a Topdown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots value can only be updated by slots event in a group.
      All active slots and metrics events will be updated one time.
      Therefore, the slots event must be before any metric events in a Topdown
      group.
      
      NMI
      ===

      The METRICS related register may overflow, in which case bit 48 of the
      STATUS register is set. If so, PERF_METRICS and Fixed counter 3 are
      required to be reset. The patch also updates all active slots and
      metrics events in the NMI handler.
      
      The update_topdown_event() has to read two registers separately. The
      values may be modified by an NMI. PMU has to be disabled before calling
      the function.
      
      RDPMC
      =====
      
      RDPMC is temporarily disabled. A later patch will enable it.

      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
    • perf/x86/intel: Fix the name of perf METRICS · bbdbde2a
      Kan Liang committed
      Bit 15 of the PERF_CAPABILITIES MSR indicates that the perf METRICS
      feature is supported. The perf METRICS is not a PEBS feature.
      
      Rename pebs_metrics_available to perf_metrics.
      
      The bit is not used in the current code. It will be used in a later
      patch.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-6-kan.liang@linux.intel.com
  9. 08 Jul, 2020 12 commits
    • perf/x86/intel/lbr: Support XSAVES for arch LBR read · c085fb87
      Kan Liang committed
      Reading LBR registers in a perf NMI handler for a non-PEBS event
      causes a high overhead because the number of LBR registers is huge.
      To reduce the overhead, the XSAVES instruction should be used to replace
      the LBR registers' reading method.
      
      The XSAVES buffer used for LBR read has to be per-CPU because the NMI
      handler invokes lbr_read(). The existing task_ctx_data buffer cannot
      be used because it is per-task and is only allocated for the LBR call
      stack mode. A new lbr_xsave pointer is introduced in the cpu_hw_events
      as an XSAVES buffer for LBR read.
      
      The XSAVES buffer should be allocated only when LBR is used by a
      non-PEBS event on the CPU because the total size of the lbr_xsave is
      not small (~1.4KB).
      
      The XSAVES buffer is allocated when a non-PEBS event is added, but it
      is lazily released in x86_release_hardware() when perf releases the
      entire PMU hardware resource, because perf may frequently schedule the
      event, e.g. under a high context switch rate. The lazy release method
      reduces the overhead of frequently allocating and freeing the buffer.
      
      If the lbr_xsave fails to be allocated, fall back to the normal Arch LBR
      lbr_read().

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Link: https://lkml.kernel.org/r/1593780569-62993-24-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch · ce711ea3
      Kan Liang committed
      In the LBR call stack mode, LBR information is used to reconstruct a
      call stack. To get the complete call stack, perf has to save/restore
      all LBR registers during a context switch. Due to a large number of the
      LBR registers, this process causes a high CPU overhead. To reduce the
      CPU overhead during a context switch, use the XSAVES/XRSTORS
      instructions.
      
      Every XSAVE area must follow a canonical format: the legacy region, an
      XSAVE header and the extended region. Although the LBR information is
      only kept in the extended region, a space for the legacy region and
      XSAVE header is still required. Add a new dedicated structure for LBR
      XSAVES support.
      
      Before enabling XSAVES support, the size of the LBR state has to be
      sanity checked, because:
      - the size of the software structure is calculated from the max number
        of the LBR depth, which is enumerated by the CPUID leaf for Arch LBR.
        The size of the LBR state is enumerated by the CPUID leaf for XSAVE
        support of Arch LBR. If the values from the two CPUID leaves are not
        consistent, it may trigger a buffer overflow. For example, a hypervisor
        may inadvertently set inconsistent values for the two emulated CPUID
        leaves.
      - unlike other state components, the size of an LBR state depends on the
        max number of LBRs, which may vary from generation to generation.
      
      Expose the function xfeature_size() for the sanity check.
      The LBR XSAVES support will be disabled if the size of the LBR state
      enumerated by CPUID doesn't match with the size of the software
      structure.
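
The size sanity check can be illustrated with the sketch below. The struct layouts and names are assumptions made for this example (five 64-bit control fields followed by three 64-bit words per LBR entry); only the 512-byte legacy region and the 64-byte XSAVE header are standard XSAVE facts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout: from, to and info words. */
struct lbr_entry_sketch {
	uint64_t from, to, info;
};

/* Assumed LBR state layout in the extended region. */
struct arch_lbr_state_sketch {
	uint64_t lbr_ctl, lbr_depth, ler_from, ler_to, ler_info;
	struct lbr_entry_sketch entries[];
};

#define XSAVE_LEGACY_SIZE 512 /* legacy region */
#define XSAVE_HDR_SIZE     64 /* XSAVE header */

/* Size the software structure needs for nr LBR entries. */
static size_t lbr_state_size(unsigned nr)
{
	assert(nr > 0);
	return sizeof(struct arch_lbr_state_sketch) +
	       nr * sizeof(struct lbr_entry_sketch);
}

/* Whole buffer: legacy region, header, then the LBR state. */
static size_t lbr_xsave_buf_size(unsigned nr)
{
	return XSAVE_LEGACY_SIZE + XSAVE_HDR_SIZE + lbr_state_size(nr);
}

/* Use XSAVES only if the CPUID-reported size matches the software size. */
static int lbr_xsaves_usable(size_t cpuid_state_size, unsigned nr)
{
	return cpuid_state_size == lbr_state_size(nr);
}
```

Under these assumptions a depth of 32 needs 808 bytes of LBR state and a 1384-byte buffer in total. A mismatching CPUID size disables the XSAVES path instead of risking an overflow.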
      
      The XSAVE instruction requires 64-byte alignment for state buffers. A
      new macro is added to reflect the alignment requirement. A 64-byte
      aligned kmem_cache is created for architecture LBR.
      
      Currently, the structure for each state component is maintained in
      fpu/types.h. The structure for the new LBR state component should be
      maintained in the same place. Move structure lbr_entry to fpu/types.h as
      well for broader sharing.
      
      Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support,
      which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR
      information at the context switch when the call stack mode is enabled.
      Since the XSAVES/XRSTORS instructions will be eventually invoked, the
      dedicated functions are named with a '_xsaves'/'_xrstors' suffix.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Support Architectural LBR · 47125db2
      Kan Liang committed
      Last Branch Records (LBR) enable recording of software path history by
      logging taken branches and other control flows within architectural
      registers. Intel CPUs have had model-specific LBR for quite some
      time, but this evolves them into an architectural feature.

      The main improvements of Architectural LBR include:
      - Linux kernel can support the LBR features without knowing the model
        number of the current CPU.
      - Architectural LBR capabilities can be enumerated by CPUID. The
        lbr_ctl_map is based on the CPUID Enumeration.
      - The possible LBR depth can be retrieved from CPUID enumeration. The
        max value is written to the new MSR_ARCH_LBR_DEPTH as the number of
        LBR entries.
      - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs,
        which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
      - Each LBR record or entry is still comprised of three MSRs,
        IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_INFO.
        But they become the architectural MSRs.
      - Architectural LBR is stack-like now. Entry 0 is always the youngest
        branch, entry 1 the next youngest... The TOS MSR has been removed.
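
The difference between the ring-buffer layout and the stack-like layout can be sketched with two index helpers. The names are hypothetical, and the ring-buffer version assumes a power-of-two number of entries so that wrap-around can be done with a mask.

```c
#include <assert.h>

/*
 * Model-specific LBR: a ring buffer where the TOS value points at the
 * youngest entry. The i-th youngest branch sits i slots behind TOS.
 */
static unsigned legacy_lbr_index(unsigned tos, unsigned i, unsigned nr)
{
	assert(nr != 0 && (nr & (nr - 1)) == 0); /* power of two */
	return (tos - i) & (nr - 1);
}

/* Architectural LBR: entry 0 is always the youngest, so no TOS is needed. */
static unsigned arch_lbr_index(unsigned i)
{
	return i;
}
```

With 16 entries and TOS at 2, the fourth youngest branch wraps around to slot 15 in the ring-buffer layout, while the stack-like layout simply reads slot 3.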
      
      The way to enable/disable Architectural LBR is similar to the previous
      model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused, but
      some modifications are required, which include:
      - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural
        LBR.
      - When checking the value of the IA32_DEBUGCTL MSR, ignore the
        DEBUGCTLMSR_LBR (bit 0) for Architectural LBR, which has no meaning
        and always returns 0.
      - The FREEZE_LBRS_ON_PMI has to be explicitly set/cleared, because
        MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for
        Architectural LBR.
      - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for
        Architectural LBR.
      
      Some Architectural LBR dedicated functions are implemented to
      reset/read/save/restore LBR.
      - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR
        entries, which is a lot faster and can improve the context switch
        latency.
      - For read, the branch type information can be retrieved from
        the MSR_ARCH_LBR_INFO_*. But it's not fully compatible due to
        OTHER_BRANCH type. The software decoding is still required for the
        OTHER_BRANCH case.
        LBR records are stored in the age order as well. Reuse
        intel_pmu_store_lbr(). Check the CPUID enumeration before accessing
        the corresponding bits in LBR_INFO.
      - For save/restore, apply the fast reset (writing ARCH_LBR_DEPTH).
        Read 'lbr_from' of entry 0 instead of the TOS MSR to check if the
        LBR registers are reset in the deep C-state. If 'the deep C-state
        reset' bit is not set in CPUID enumeration, ignore the check.
        XSAVE support for Architectural LBR will be implemented later.
      
      The number of LBR entries cannot be hardcoded anymore; it should be
      retrieved from CPUID enumeration. A new structure
      x86_perf_task_context_arch_lbr is introduced for Architectural LBR.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all() · fda1f99f
      Kan Liang committed
      The previous model-specific LBR and Architecture LBR (legacy way) use a
      similar method to save/restore the LBR information, which directly
      accesses the LBR registers. The code which reads/writes a set of LBR
      registers can be shared between them.
      
      Factor out two functions which are used to read/write a set of LBR
      registers.
      
      Add lbr_info into structure x86_pmu, and use it to replace the hardcoded
      LBR INFO MSR, because the LBR INFO MSR address of the previous
      model-specific LBR is different from Architecture LBR. The MSR address
      should be assigned at boot time. For now, only Sky Lake and later
      platforms have the LBR INFO MSR.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-13-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Unify the stored format of LBR information · 5624986d
      Kan Liang committed
      Current LBR information in the structure x86_perf_task_context is stored
      in a different format from the PEBS LBR record and Architecture LBR,
      which prevents the sharing of the common code.
      
      Use the format of the PEBS LBR record as a unified format. Use a generic
      name lbr_entry to replace pebs_lbr_entry.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-11-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Support LBR_CTL · 49d8184f
      Kan Liang committed
      An IA32_LBR_CTL is introduced for Architecture LBR to enable and configure
      LBR registers, replacing the previous LBR_SELECT.
      
      All the related members in struct cpu_hw_events and struct x86_pmu
      have to be renamed.
      
      Some new macros are added to reflect the layout of LBR_CTL.
      
      The mapping from PERF_SAMPLE_BRANCH_* to the corresponding bits in
      LBR_CTL MSR is saved in lbr_ctl_map now, which is not a const value.
      The value relies on the CPUID enumeration.
      
      For the previous model-specific LBR, most of the bits in LBR_SELECT
      operate in the suppressed mode. For the bits in LBR_CTL, the polarity is
      inverted.
      
      For the previous model-specific LBR format 5 (LBR_FORMAT_INFO), if the
      NO_CYCLES and NO_FLAGS type are set, the flag LBR_NO_INFO will be set to
      avoid the unnecessary LBR_INFO MSR read. Although Architecture LBR also
      has a dedicated LBR_INFO MSR, perf doesn't need to check and set the
      flag LBR_NO_INFO. For Architecture LBR, XSAVES instruction will be used
      as the default way to read the LBR MSRs all together. The overhead which
      the flag tries to avoid doesn't exist anymore. Dropping the flag can
      save the extra check for the flag in the lbr_read() later, and make the
      code cleaner.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-10-git-send-email-kan.liang@linux.intel.com
    • perf/x86: Expose CPUID enumeration bits for arch LBR · af6cf129
      Kan Liang committed
      The LBR capabilities of Architecture LBR are retrieved from the CPUID
      enumeration once at boot time. The capabilities have to be saved for
      future usage.
      
      Several new fields are added into structure x86_pmu to indicate the
      capabilities. The fields will be used in the following patches.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-9-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Use dynamic data structure for task_ctx · f42be865
      Kan Liang committed
      The type of task_ctx is hardcoded as struct x86_perf_task_context,
      which doesn't apply for Architecture LBR. For example, Architecture LBR
      doesn't have the TOS MSR. The number of LBR entries is variable. A new
      struct will be introduced for Architecture LBR. Perf has to determine
      the type of task_ctx at run time.
      
      The type of task_ctx pointer is changed to 'void *', which will be
      determined at run time.
      
      The generic LBR optimization can be shared between Architecture LBR and
      model-specific LBR. Both need to access the structure for the generic
      LBR optimization. A helper task_context_opt() is introduced to retrieve
      the pointer of the structure at run time.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-7-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Factor out a new struct for generic optimization · 530bfff6
      Kan Liang committed
      To reduce the overhead of a context switch with LBR enabled, some
      generic optimizations were introduced, e.g. avoiding restore LBR if no
      one else touched them. The generic optimizations can also be used by
      Architecture LBR later. Currently, the fields for the generic
      optimizations are part of structure x86_perf_task_context, which will be
      deprecated by Architecture LBR. A new structure should be introduced
      for the common fields of generic optimization, which can be shared
      between Architecture LBR and model-specific LBR.
      
      Both 'valid_lbrs' and 'tos' are also used by the generic optimizations,
      but they are not moved into the new structure, because Architecture LBR
      is stack-like. The 'valid_lbrs' which records the index of the valid LBR
      is not required anymore. The TOS MSR will be removed.
      
      LBR registers may be cleared in the deep C-state. If so, the generic
      optimizations should not be applied. Perf has to unconditionally
      restore the LBR registers. A generic function is required to detect the
      reset due to the deep C-state. lbr_is_reset_in_cstate() is introduced.
      Currently, for the model-specific LBR, the TOS MSR is used to detect the
      reset. There will be another method introduced for Architecture LBR
      later.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-6-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add the function pointers for LBR save and restore · 799571bf
      Kan Liang committed
      The MSRs of Architectural LBR are different from previous model-specific
      LBR. Perf has to implement different functions to save and restore them.
      
      The function pointers for LBR save and restore are introduced. Perf
      should initialize the corresponding functions at boot time.
      
      The generic optimizations, e.g. avoiding restore LBR if no one else
      touched them, still apply for Architectural LBRs. The related code is
      not moved to model-specific functions.
      
      Current model-specific LBR functions are set as default.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-5-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add a function pointer for LBR read · c301b1d8
      Kan Liang committed
      The method to read Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer for LBR read is introduced. Perf should initialize
      the corresponding function at boot time, and avoid checking lbr_format
      at run time.
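
The boot-time function pointer pattern described above can be sketched as follows. All names here are hypothetical, and the readers just return a tag so the dispatch is observable; the real read functions fill in branch records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tags that let a caller see which reader ran. */
enum lbr_reader { LBR_READ_MODEL_SPECIFIC, LBR_READ_ARCH };

static enum lbr_reader read_model_specific(void)
{
	return LBR_READ_MODEL_SPECIFIC;
}

static enum lbr_reader read_arch(void)
{
	return LBR_READ_ARCH;
}

/* The PMU description carries a pointer instead of a format check. */
struct pmu_ops_sketch {
	enum lbr_reader (*lbr_read)(void);
};

/* The model-specific reader is the default. */
static struct pmu_ops_sketch pmu_ops = { .lbr_read = read_model_specific };

/* Run once at boot: pick the reader that matches the hardware. */
static void pmu_init_sketch(bool has_arch_lbr)
{
	if (has_arch_lbr)
		pmu_ops.lbr_read = read_arch;
}

/* Hot path: no format check, just an indirect call. */
static enum lbr_reader sample_lbr(void)
{
	assert(pmu_ops.lbr_read != NULL);
	return pmu_ops.lbr_read();
}
```

The decision is made once in the init function, so the sampling path does not branch on the LBR format.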
      
      The current 64-bit LBR read function is set as default.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-4-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add a function pointer for LBR reset · 9f354a72
      Kan Liang committed
      The method to reset Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer is introduced for LBR reset. The enum of
      LBR_FORMAT_* is also moved to perf_event.h. Perf should initialize the
      corresponding functions at boot time, and avoid checking lbr_format at
      run time.
      
      The current 64-bit LBR reset function is set as default.

      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-3-git-send-email-kan.liang@linux.intel.com
  10. 02 Jul, 2020 3 commits