1. 20 4月, 2021 10 次提交
  2. 06 3月, 2021 1 次提交
  3. 10 2月, 2021 1 次提交
  4. 01 2月, 2021 4 次提交
    • K
      perf/x86/intel: Support CPUID 10.ECX to disable fixed counters · 32451614
      Kan Liang 提交于
      With Architectural Performance Monitoring Version 5, CPUID 10.ECX cpu
      leaf indicates the fixed counter enumeration. This extends the previous
      count to a bitmap which allows disabling even lower fixed counters.
      It could be used by a Hypervisor.
      
      The existing intel_ctrl variable is used to remember the bitmask of the
      counters. All code that reads all counters is fixed to check this extra
      bitmask.
      Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Originally-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-6-git-send-email-kan.liang@linux.intel.com
      32451614
    • K
      perf/x86/intel: Add perf core PMU support for Sapphire Rapids · 61b985e3
      Kan Liang 提交于
      Add perf core PMU support for the Intel Sapphire Rapids server, which is
      the successor of the Intel Ice Lake server. The enabling code is based
      on Ice Lake, but there are several new features introduced.
      
      The event encoding is changed and simplified, e.g., the event codes
      which are below 0x90 are restricted to counters 0-3. The event codes
      which above 0x90 are likely to have no restrictions. The event
      constraints, extra_regs(), and hardware cache events table are changed
      accordingly.
      
      A new Precise Distribution (PDist) facility is introduced, which
      further minimizes the skid when a precise event is programmed on the GP
      counter 0. Enable the Precise Distribution (PDist) facility with :ppp
      event. For this facility to work, the period must be initialized with a
      value larger than 127. Add spr_limit_period() to apply the limit for
      :ppp event.
      
      Two new data source fields, data block & address block, are added in the
      PEBS Memory Info Record for the load latency event. To enable the
      feature,
      - An auxiliary event has to be enabled together with the load latency
        event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is
        introduced to indicate the case. A new event, mem-loads-aux, is
        exposed to sysfs for the user tool.
        Add a check in hw_config(). If the auxiliary event is not detected,
        return an unique error -ENODATA.
      - The union perf_mem_data_src is extended to support the new fields.
      - Ice Lake and earlier models do not support block information, but the
        fields may be set by HW on some machines. Add pebs_no_block to
        explicitly indicate the previous platforms which don't support the new
        block fields. Accessing the new block fields are ignored on those
        platforms.
      
      A new store Latency facility is introduced, which leverages the PEBS
      facility where it can provide additional information about sampled
      stores. The additional information includes the data address, memory
      auxiliary info (e.g. Data Source, STLB miss) and the latency of the
      store access. To enable the facility, the new event (0x02cd) has to be
      programed on the GP counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is
      introduced to indicate the event. The store_latency_data() is introduced
      to parse the memory auxiliary info.
      
      The layout of access latency field of PEBS Memory Info Record has been
      changed. Two latency, instruction latency (bit 15:0) and cache access
      latency (bit 47:32) are recorded.
      - The cache access latency is similar to previous memory access latency.
        For loads, the latency starts by the actual cache access until the
        data is returned by the memory subsystem.
        For stores, the latency starts when the demand write accesses the L1
        data cache and lasts until the cacheline write is completed in the
        memory subsystem.
        The cache access latency is stored in low 32bits of the sample type
        PERF_SAMPLE_WEIGHT_STRUCT.
      - The instruction latency starts by the dispatch of the load operation
        for execution and lasts until completion of the instruction it belongs
        to.
        Add a new flag PMU_FL_INSTR_LATENCY to indicate the instruction
        latency support. The instruction latency is stored in the bit 47:32
        of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
      
      Extends the PERF_METRICS MSR to feature TMA method level 2 metrics. The
      lower half of the register is the TMA level 1 metrics (legacy). The
      upper half is also divided into four 8-bit fields for the new level 2
      metrics. Expose all eight Topdown metrics events to user space.
      
      The full description for the SPR features can be found at Intel
      Architecture Instruction Set Extensions and Future Features
      Programming Reference, 319433-041.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
      61b985e3
    • K
      perf/x86/intel: Filter unsupported Topdown metrics event · 1ab5f235
      Kan Liang 提交于
      Intel Sapphire Rapids server will introduce 8 metrics events. Intel
      Ice Lake only supports 4 metrics events. A perf tool user may mistakenly
      use the unsupported events via RAW format on Ice Lake. The user can
      still get a value from the unsupported Topdown metrics event once the
      following Sapphire Rapids enabling patch is applied.
      
      To enable the 8 metrics events on Intel Sapphire Rapids, the
      INTEL_TD_METRIC_MAX has to be updated, which impacts the
      is_metric_event(). The is_metric_event() is a generic function.
      On Ice Lake, the newly added SPR metrics events will be mistakenly
      accepted as metric events on creation. At runtime, the unsupported
      Topdown metrics events will be updated.
      
      Add a variable num_topdown_events in x86_pmu to indicate the available
      number of the Topdown metrics event on the platform. Apply the number
      into is_metric_event(). Only the supported Topdown metrics events
      should be created as metrics events.
      
      Apply the num_topdown_events in icl_update_topdown_event() as well. The
      function can be reused by the following patch.
      Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
      1ab5f235
    • K
      perf/x86/intel: Factor out intel_update_topdown_event() · 628d923a
      Kan Liang 提交于
      Similar to Ice Lake, Intel Sapphire Rapids server also supports the
      topdown performance metrics feature. The difference is that Intel
      Sapphire Rapids server extends the PERF_METRICS MSR to feature TMA
      method level two metrics, which will introduce 8 metrics events. Current
      icl_update_topdown_event() only check 4 level one metrics events.
      
      Factor out intel_update_topdown_event() to facilitate the code sharing
      between Ice Lake and Sapphire Rapids.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-3-git-send-email-kan.liang@linux.intel.com
      628d923a
  5. 28 1月, 2021 2 次提交
  6. 10 12月, 2020 2 次提交
  7. 10 11月, 2020 2 次提交
  8. 29 10月, 2020 2 次提交
  9. 03 10月, 2020 1 次提交
  10. 29 9月, 2020 2 次提交
  11. 24 8月, 2020 1 次提交
  12. 18 8月, 2020 5 次提交
    • K
      perf/x86/intel: Support per-thread RDPMC TopDown metrics · 2cb5383b
      Kan Liang 提交于
      Starts from Ice Lake, the TopDown metrics are directly available as
      fixed counters and do not require generic counters. Also, the TopDown
      metrics can be collected per thread. Extend the RDPMC usage to support
      per-thread TopDown metrics.
      
      The RDPMC index of the PERF_METRICS will be output if RDPMC users ask
      for the RDPMC index of the metrics events.
      
      To support per thread RDPMC TopDown, the metrics and slots counters have
      to be saved/restored during the context switching.
      
      The last_period and period_left are not used in the counting mode. Use
      the fields for saved_metric and saved_slots.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-12-kan.liang@linux.intel.com
      2cb5383b
    • K
      perf/x86/intel: Support TopDown metrics on Ice Lake · 59a854e2
      Kan Liang 提交于
      Ice Lake supports the hardware TopDown metrics feature, which can free
      up the scarce GP counters.
      
      Update the event constraints for the metrics events. The metric counters
      do not exist, which are mapped to a dummy offset. The sharing between
      multiple users of the same metric without multiplexing is not allowed.
      
      Implement set_topdown_event_period for Ice Lake. The values in
      PERF_METRICS MSR are derived from the fixed counter 3. Both registers
      should start from zero.
      
      Implement update_topdown_event for Ice Lake. The metric is reported by
      multiplying the metric (fraction) with slots. To maintain accurate
      measurements, both registers are cleared for each update. The fixed
      counter 3 should always be cleared before the PERF_METRICS.
      
      Implement td_attr for the new metrics events and the new slots fixed
      counter. Make them visible to the perf user tools.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
      59a854e2
    • K
      perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Kan Liang 提交于
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. Current perf has supported the method.
      
      The method works well, but there is one problem. To collect the TopDown
      events, several GP counters have to be used. If a user wants to collect
      other events at the same time, the multiplexing probably be triggered,
      which impacts the accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      is introduced from Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that measures pipeline
      "slots". The TopDown events can be calculated from them instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
      
      When setting up the metric events, they point to the Fixed Counter 3.
      They have to be specially handled.
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of the accepted
        metrics events. The sharing between multiple users of the same metric
        without multiplexing is not allowed.
      - Only enable/disable the fixed counter 3 when there are no other active
        TopDown events, which avoid the unnecessary writing of the fixed
        control register.
      - Disable the PMU when reading the metrics event. The metrics MSR and
        the fixed counter 3 are read separately. The values may be modified by
        an NMI.
      
      All four metric events don't support sampling. Since they will be
      handled specially for event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to indicate this case.
      
      The slots event can support both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it will be handled normally as other normal events.
      
      Groups
      ======
      
      The slots event is required in a Topdown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots value can only be updated by slots event in a group.
      All active slots and metrics events will be updated one time.
      Therefore, the slots event must be before any metric events in a Topdown
      group.
      
      NMI
      ======
      
      The METRICS related register may be overflow. The bit 48 of the STATUS
      register will be set. If so, PERF_METRICS and Fixed counter 3 are
      required to be reset. The patch also update all active slots and
      metrics events in the NMI handler.
      
      The update_topdown_event() has to read two registers separately. The
      values may be modified by an NMI. PMU has to be disabled before calling
      the function.
      
      RDPMC
      ======
      
      RDPMC is temporarily disabled. A later patch will enable it.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
      7b2c05a1
    • K
      perf/x86/intel: Use switch in intel_pmu_disable/enable_event · 58da7dbe
      Kan Liang 提交于
      Currently, the if-else is used in the intel_pmu_disable/enable_event to
      check the type of an event. It works well, but with more and more types
      added later, e.g., perf metrics, compared to the switch statement, the
      if-else may impair the readability of the code.
      
      There is no harm to use the switch statement to replace the if-else
      here. Also, some optimizing compilers may compile a switch statement
      into a jump-table which is more efficient than if-else for a large
      number of cases. The performance gain may not be observed for now,
      because the number of cases is only 5, but the benefits may be observed
      with more and more types added in the future.
      
      Use switch to replace the if-else in the intel_pmu_disable/enable_event.
      
      If the idx is invalid, print a warning.
      
      For the case INTEL_PMC_IDX_FIXED_BTS in intel_pmu_disable_event, don't
      need to check the event->attr.precise_ip. Use return for the case.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-7-kan.liang@linux.intel.com
      58da7dbe
    • K
      perf/x86/intel: Name the global status bit in NMI handler · 60a2a271
      Kan Liang 提交于
      Magic numbers are used in the current NMI handler for the global status
      bit. Use a meaningful name to replace the magic numbers to improve the
      readability of the code.
      
      Remove a Tab for all GLOBAL_STATUS_* and INTEL_PMC_IDX_FIXED_BTS macros
      to reduce the length of the line.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-3-kan.liang@linux.intel.com
      60a2a271
  13. 08 7月, 2020 4 次提交
    • K
      perf/x86/intel/lbr: Support Architectural LBR · 47125db2
      Kan Liang 提交于
      Last Branch Records (LBR) enables recording of software path history by
      logging taken branches and other control flows within architectural
      registers now. Intel CPUs have had model-specific LBR for quite some
      time, but this evolves them into an architectural feature now.
      
      The main improvements of Architectural LBR implemented includes:
      - Linux kernel can support the LBR features without knowing the model
        number of the current CPU.
      - Architectural LBR capabilities can be enumerated by CPUID. The
        lbr_ctl_map is based on the CPUID Enumeration.
      - The possible LBR depth can be retrieved from CPUID enumeration. The
        max value is written to the new MSR_ARCH_LBR_DEPTH as the number of
        LBR entries.
      - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs,
        which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
      - Each LBR record or entry is still comprised of three MSRs,
        IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_TO_IP.
        But they become the architectural MSRs.
      - Architectural LBR is stack-like now. Entry 0 is always the youngest
        branch, entry 1 the next youngest... The TOS MSR has been removed.
      
      The way to enable/disable Architectural LBR is similar to the previous
      model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused, but
      some modifications are required, which include:
      - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural
        LBR.
      - When checking the value of the IA32_DEBUGCTL MSR, ignoring the
        DEBUGCTLMSR_LBR (bit 0) for Architectural LBR, which has no meaning
        and always return 0.
      - The FREEZE_LBRS_ON_PMI has to be explicitly set/clear, because
        MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for
        Architectural LBR.
      - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for
        Architectural LBR.
      
      Some Architectural LBR dedicated functions are implemented to
      reset/read/save/restore LBR.
      - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR
        entries, which is a lot faster and can improve the context switch
        latency.
      - For read, the branch type information can be retrieved from
        the MSR_ARCH_LBR_INFO_*. But it's not fully compatible due to
        OTHER_BRANCH type. The software decoding is still required for the
        OTHER_BRANCH case.
        LBR records are stored in the age order as well. Reuse
        intel_pmu_store_lbr(). Check the CPUID enumeration before accessing
        the corresponding bits in LBR_INFO.
      - For save/restore, applying the fast reset (writing ARCH_LBR_DEPTH).
        Reading 'lbr_from' of entry 0 instead of the TOS MSR to check if the
        LBR registers are reset in the deep C-state. If 'the deep C-state
        reset' bit is not set in CPUID enumeration, ignoring the check.
        XSAVE support for Architectural LBR will be implemented later.
      
      The number of LBR entries cannot be hardcoded anymore, which should be
      retrieved from CPUID enumeration. A new structure
      x86_perf_task_context_arch_lbr is introduced for Architectural LBR.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
      47125db2
    • K
      perf/x86/intel/lbr: Add the function pointers for LBR save and restore · 799571bf
      Kan Liang 提交于
      The MSRs of Architectural LBR are different from previous model-specific
      LBR. Perf has to implement different functions to save and restore them.
      
      The function pointers for LBR save and restore are introduced. Perf
      should initialize the corresponding functions at boot time.
      
      The generic optimizations, e.g. avoiding restore LBR if no one else
      touched them, still apply for Architectural LBRs. The related codes are
      not moved to model-specific functions.
      
      Current model-specific LBR functions are set as default.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-5-git-send-email-kan.liang@linux.intel.com
      799571bf
    • K
      perf/x86/intel/lbr: Add a function pointer for LBR read · c301b1d8
      Kan Liang 提交于
      The method to read Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer for LBR read is introduced. Perf should initialize
      the corresponding function at boot time, and avoid checking lbr_format
      at run time.
      
      The current 64-bit LBR read function is set as default.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-4-git-send-email-kan.liang@linux.intel.com
      c301b1d8
    • K
      perf/x86/intel/lbr: Add a function pointer for LBR reset · 9f354a72
      Kan Liang 提交于
      The method to reset Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer is introduced for LBR reset. The enum of
      LBR_FORMAT_* is also moved to perf_event.h. Perf should initialize the
      corresponding functions at boot time, and avoid checking lbr_format at
      run time.
      
      The current 64-bit LBR reset function is set as default.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-3-git-send-email-kan.liang@linux.intel.com
      9f354a72
  14. 02 7月, 2020 3 次提交