1. 06 October 2020 (2 commits)
    • perf/x86: Fix n_metric for cancelled txn · 3dbde695
      Committed by Peter Zijlstra
      When a group that has TopDown members fails to be scheduled, any
      later TopDown groups will not return valid values.
      
      Here is an example.
      
      A background perf that occupies all the GP counters and the fixed
      counter 1.
       $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles,
                       cycles,cycles}:D" -a
      
      A user monitors a TopDown group. It works well, because the fixed
      counter 3 and the PERF_METRICS are available.
       $perf stat -x, --topdown -- ./workload
         retiring,bad speculation,frontend bound,backend bound,
         18.0,16.1,40.4,25.5,
      
      Then the user tries to monitor a group that has TopDown members.
      Because of the cycles event, the group fails to be scheduled.
       $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound,
                           topdown-fe-bound,topdown-bad-spec,cycles}'
                           -- ./workload
          <not counted>,,slots,0,0.00,,
          <not counted>,,topdown-retiring,0,0.00,,
          <not counted>,,topdown-be-bound,0,0.00,,
          <not counted>,,topdown-fe-bound,0,0.00,,
          <not counted>,,topdown-bad-spec,0,0.00,,
          <not counted>,,cycles,0,0.00,,
      
      The user tries to monitor a TopDown group again. It doesn't work anymore.
       $perf stat -x, --topdown -- ./workload
      
          ,,,,,
      
      In a txn, cancel_txn() truncates the event_list for a cancelled group
      and updates the number of events added in this transaction. However,
      the number of TopDown events added in this transaction is not updated,
      so the kernel will likely fail to add new TopDown events later.
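
      A minimal sketch of the idea (the counter name n_txn_metric and the
      exact hook placement are illustrative, not necessarily the upstream
      code):

        /* in x86_pmu_start_txn(): */
        cpuc->n_txn_metric = 0;

        /* whenever a TopDown event is accepted while the txn is open: */
        cpuc->n_txn_metric++;

        /* in x86_pmu_cancel_txn(): undo the TopDown events of this txn too */
        __this_cpu_sub(cpu_hw_events.n_metric,
                       __this_cpu_read(cpu_hw_events.n_txn_metric));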
      
      Fixes: 7b2c05a1 ("perf/x86/intel: Generic support for hardware TopDown metrics")
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Reported-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lkml.kernel.org/r/20201005082611.GH2628@hirez.programming.kicks-ass.net
      3dbde695
    • perf/x86: Fix n_pair for cancelled txn · 871a93b0
      Committed by Peter Zijlstra
      Kan reported that n_metric gets corrupted for cancelled transactions;
      a similar issue exists for n_pair, used for AMD's Large Increment per
      Cycle events.

      The problem, and the fix, were confirmed by Kim using:
      
        sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 &
      
        # should succeed:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload
      
        # should fail:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload
      
        # previously failed, now succeeds with this patch:
        sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload
      
      Fixes: 57388912 ("perf/x86/amd: Add support for Large Increment per Cycle Events")
      Reported-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Kim Phillips <kim.phillips@amd.com>
      Link: https://lkml.kernel.org/r/20201005082516.GG2628@hirez.programming.kicks-ass.net
      871a93b0
  2. 18 August 2020 (3 commits)
    • perf/x86/intel: Support TopDown metrics on Ice Lake · 59a854e2
      Committed by Kan Liang
      Ice Lake supports the hardware TopDown metrics feature, which can free
      up the scarce GP counters.
      
      Update the event constraints for the metrics events. The metric
      counters do not physically exist; they are mapped to a dummy offset.
      Sharing the same metric between multiple users without multiplexing is
      not allowed.
      
      Implement set_topdown_event_period for Ice Lake. The values in the
      PERF_METRICS MSR are derived from fixed counter 3. Both registers
      should start from zero.

      Implement update_topdown_event for Ice Lake. Each metric is reported
      by multiplying the metric (a fraction) by the slots count. To maintain
      accurate measurements, both registers are cleared on each update.
      Fixed counter 3 must always be cleared before PERF_METRICS.
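
      A rough sketch of that update step (helper and variable names are
      illustrative, not the exact upstream implementation):

        static u64 icl_get_metric_count(int idx)
        {
                u64 slots, metrics, fraction;

                rdmsrl(MSR_CORE_PERF_FIXED_CTR3, slots);   /* slots counted so far */
                rdmsrl(MSR_PERF_METRICS, metrics);         /* four 8-bit fractions */

                fraction = (metrics >> (idx * 8)) & 0xff;

                /* reset both registers; fixed counter 3 must be cleared first */
                wrmsrl(MSR_CORE_PERF_FIXED_CTR3, 0);
                wrmsrl(MSR_PERF_METRICS, 0);

                /* metric count = slots * fraction / 0xff */
                return mul_u64_u32_div(slots, fraction, 0xff);
        }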
      
      Implement td_attr for the new metrics events and the new slots fixed
      counter. Make them visible to the perf user tools.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-11-kan.liang@linux.intel.com
      59a854e2
    • perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Committed by Kan Liang
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. Current perf already supports the method.
      
      The method works well, but there is one problem. To collect the
      TopDown events, several GP counters have to be used. If a user wants
      to collect other events at the same time, multiplexing will probably
      be triggered, which impacts the accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      is introduced from Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that measures pipeline
      "slots". The TopDown events can be calculated from them instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
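
      As an illustration (the exact umask values beyond 'starting from 0x10'
      are an assumption here), the exported metric events can be encoded
      roughly as:

        EVENT_ATTR_STR(topdown-retiring, td_retiring, "event=0x00,umask=0x10");
        EVENT_ATTR_STR(topdown-bad-spec, td_bad_spec, "event=0x00,umask=0x11");
        EVENT_ATTR_STR(topdown-fe-bound, td_fe_bound, "event=0x00,umask=0x12");
        EVENT_ATTR_STR(topdown-be-bound, td_be_bound, "event=0x00,umask=0x13");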
      
      When setting up the metric events, they point to the Fixed Counter 3.
      They have to be specially handled.
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of the accepted
        metrics events. The sharing between multiple users of the same metric
        without multiplexing is not allowed.
      - Only enable/disable the fixed counter 3 when there are no other active
        TopDown events, which avoids unnecessary writes to the fixed control
        register.
      - Disable the PMU when reading the metrics event. The metrics MSR and
        the fixed counter 3 are read separately. The values may be modified by
        an NMI.
      
      None of the four metric events supports sampling. Since they will be
      handled specially during event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to indicate this case.

      The slots event can support both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it will be handled like any other normal event.
      
      Groups
      ======
      
      The slots event is required in a TopDown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots values can only be updated by the slots event in a group.
      All active slots and metrics events will be updated at once.
      Therefore, the slots event must come before any metric events in a
      TopDown group.
      
      NMI
      ======
      
      The METRICS-related register may overflow. In that case, bit 48 of the
      STATUS register will be set, and PERF_METRICS and fixed counter 3 have
      to be reset. The patch also updates all active slots and metrics events
      in the NMI handler.
      
      update_topdown_event() has to read the two registers separately. The
      values may be modified by an NMI in between, so the PMU has to be
      disabled before calling the function.
      
      RDPMC
      ======
      
      RDPMC is temporarily disabled. A later patch will enable it.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
      7b2c05a1
    • perf/x86/intel: Fix the name of perf METRICS · bbdbde2a
      Committed by Kan Liang
      Bit 15 of the PERF_CAPABILITIES MSR indicates that the perf METRICS
      feature is supported. The perf METRICS is not a PEBS feature.
      
      Rename pebs_metrics_available to perf_metrics.
      
      The bit is not used in the current code. It will be used in a later
      patch.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-6-kan.liang@linux.intel.com
      bbdbde2a
  3. 08 July 2020 (12 commits)
    • perf/x86/intel/lbr: Support XSAVES for arch LBR read · c085fb87
      Committed by Kan Liang
      Reading LBR registers in a perf NMI handler for a non-PEBS event
      causes a high overhead because the number of LBR registers is huge.
      To reduce the overhead, the XSAVES instruction should be used to replace
      the LBR registers' reading method.
      
      The XSAVES buffer used for the LBR read has to be per-CPU because the
      NMI handler invokes lbr_read(). The existing task_ctx_data buffer
      cannot be used because it is per-task and only allocated for the LBR
      call stack mode. A new lbr_xsave pointer is introduced in cpu_hw_events
      as the XSAVES buffer for LBR read.
      
      The XSAVES buffer should be allocated only when LBR is used by a
      non-PEBS event on the CPU because the total size of the lbr_xsave is
      not small (~1.4KB).
      
      The XSAVES buffer is allocated when a non-PEBS event is added, but it
      is lazily released in x86_release_hardware() when perf releases the
      entire PMU hardware resource, because perf may frequently schedule the
      event, e.g. at a high context switch rate. The lazy release method
      reduces the overhead of frequently allocating/freeing the buffer.
      
      If the lbr_xsave allocation fails, fall back to the normal Arch LBR
      lbr_read().
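
      A simplified sketch of that read path with the fallback (helper names
      are illustrative):

        static void intel_pmu_arch_lbr_read(struct cpu_hw_events *cpuc)
        {
                struct x86_perf_task_context_arch_lbr_xsave *xsave = cpuc->lbr_xsave;

                if (!xsave) {
                        /* allocation failed: read the LBR MSRs one by one */
                        intel_pmu_arch_lbr_read_msrs(cpuc);
                        return;
                }

                /* one XSAVES of the LBR state instead of many rdmsr()s */
                copy_dynamic_supervisor_to_kernel(&xsave->xsave, XFEATURE_MASK_LBR);
                intel_pmu_store_lbr(cpuc, xsave->lbr.entries);
        }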
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Link: https://lkml.kernel.org/r/1593780569-62993-24-git-send-email-kan.liang@linux.intel.com
      c085fb87
    • perf/x86/intel/lbr: Support XSAVES/XRSTORS for LBR context switch · ce711ea3
      Committed by Kan Liang
      In the LBR call stack mode, LBR information is used to reconstruct a
      call stack. To get the complete call stack, perf has to save/restore
      all LBR registers during a context switch. Due to a large number of the
      LBR registers, this process causes a high CPU overhead. To reduce the
      CPU overhead during a context switch, use the XSAVES/XRSTORS
      instructions.
      
      Every XSAVE area must follow a canonical format: the legacy region, an
      XSAVE header and the extended region. Although the LBR information is
      only kept in the extended region, a space for the legacy region and
      XSAVE header is still required. Add a new dedicated structure for LBR
      XSAVES support.
      
      Before enabling XSAVES support, the size of the LBR state has to be
      sanity checked, because:
      - the size of the software structure is calculated from the maximum LBR
      depth, which is enumerated by the CPUID leaf for Arch LBR. The size of
      the LBR state is enumerated by the CPUID leaf for XSAVE support of Arch
      LBR. If the values from the two CPUID leaves are not consistent, it may
      trigger a buffer overflow. For example, a hypervisor may inadvertently
      set inconsistent values for the two emulated CPUID leaves.
      - unlike other state components, the size of an LBR state depends on the
      max number of LBRs, which may vary from generation to generation.
      
      Expose the function xfeature_size() for the sanity check.
      The LBR XSAVES support will be disabled if the size of the LBR state
      enumerated by CPUID doesn't match with the size of the software
      structure.
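
      A sketch of that check (assuming the software layout is a header
      structure followed by the LBR entries; names are illustrative):

        static bool lbr_xsaves_size_ok(void)
        {
                unsigned int sw_size = sizeof(struct arch_lbr_state) +
                                       x86_pmu.lbr_nr * sizeof(struct lbr_entry);

                /* disable XSAVES-based save/restore if CPUID and software disagree */
                return xfeature_size(XFEATURE_LBR) == sw_size;
        }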
      
      The XSAVE instruction requires 64-byte alignment for state buffers. A
      new macro is added to reflect the alignment requirement. A 64-byte
      aligned kmem_cache is created for architecture LBR.
      
      Currently, the structure for each state component is maintained in
      fpu/types.h. The structure for the new LBR state component should be
      maintained in the same place. Move structure lbr_entry to fpu/types.h as
      well for broader sharing.
      
      Add dedicated lbr_save/lbr_restore functions for LBR XSAVES support,
      which invokes the corresponding xstate helpers to XSAVES/XRSTORS LBR
      information at the context switch when the call stack mode is enabled.
      Since the XSAVES/XRSTORS instructions will eventually be invoked, the
      dedicated functions are named with an '_xsaves'/'_xrstors' suffix.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Link: https://lkml.kernel.org/r/1593780569-62993-23-git-send-email-kan.liang@linux.intel.com
      ce711ea3
    • perf/x86/intel/lbr: Support Architectural LBR · 47125db2
      Committed by Kan Liang
      Last Branch Records (LBR) enable recording of software path history by
      logging taken branches and other control flow transfers within
      architectural registers. Intel CPUs have had model-specific LBRs for
      quite some time; this evolves them into an architectural feature.
      
      The main improvements implemented for Architectural LBR include:
      - Linux kernel can support the LBR features without knowing the model
        number of the current CPU.
      - Architectural LBR capabilities can be enumerated by CPUID. The
        lbr_ctl_map is based on the CPUID Enumeration.
      - The possible LBR depth can be retrieved from CPUID enumeration. The
        max value is written to the new MSR_ARCH_LBR_DEPTH as the number of
        LBR entries.
      - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs,
        which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
      - Each LBR record or entry is still comprised of three MSRs,
        IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_INFO.
        But they become architectural MSRs.
      - Architectural LBR is stack-like now. Entry 0 is always the youngest
        branch, entry 1 the next youngest... The TOS MSR has been removed.
      
      The way to enable/disable Architectural LBR is similar to the previous
      model-specific LBR. __intel_pmu_lbr_enable/disable() can be reused, but
      some modifications are required, which include:
      - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural
        LBR.
      - When checking the value of the IA32_DEBUGCTL MSR, ignore the
        DEBUGCTLMSR_LBR (bit 0) for Architectural LBR; it has no meaning
        and always reads 0.
      - The FREEZE_LBRS_ON_PMI bit has to be explicitly set/cleared, because
        MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for
        Architectural LBR.
      - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for
        Architectural LBR.
      
      Some Architectural LBR dedicated functions are implemented to
      reset/read/save/restore LBR.
      - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR
        entries, which is a lot faster and can improve the context switch
        latency.
      - For read, the branch type information can be retrieved from
        the MSR_ARCH_LBR_INFO_*. But it's not fully compatible due to
        OTHER_BRANCH type. The software decoding is still required for the
        OTHER_BRANCH case.
        LBR records are stored in the age order as well. Reuse
        intel_pmu_store_lbr(). Check the CPUID enumeration before accessing
        the corresponding bits in LBR_INFO.
      - For save/restore, apply the fast reset (writing ARCH_LBR_DEPTH).
        Read 'lbr_from' of entry 0 instead of the TOS MSR to check whether the
        LBR registers were reset in a deep C-state. If the 'deep C-state
        reset' bit is not set in the CPUID enumeration, skip the check.
        XSAVE support for Architectural LBR will be implemented later.
      
      The number of LBR entries cannot be hardcoded anymore; it has to be
      retrieved from the CPUID enumeration. A new structure
      x86_perf_task_context_arch_lbr is introduced for Architectural LBR.
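
      For example, the fast reset mentioned above can be sketched as (not
      necessarily the exact upstream code):

        static void intel_pmu_arch_lbr_reset(void)
        {
                /* writing ARCH_LBR_DEPTH clears all Arch LBR entries at once */
                wrmsrl(MSR_ARCH_LBR_DEPTH, x86_pmu.lbr_nr);
        }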
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
      47125db2
    • perf/x86/intel/lbr: Factor out rdlbr_all() and wrlbr_all() · fda1f99f
      Committed by Kan Liang
      The previous model-specific LBR and Architecture LBR (the legacy way)
      use a similar method to save/restore the LBR information, which directly
      accesses the LBR registers. The code which reads/writes a set of LBR
      registers can be shared between them.
      
      Factor out two functions which are used to read/write a set of LBR
      registers.
      
      Add lbr_info into structure x86_pmu, and use it to replace the hardcoded
      LBR INFO MSR, because the LBR INFO MSR address of the previous
      model-specific LBR is different from Architecture LBR. The MSR address
      should be assigned at boot time. For now, only Sky Lake and later
      platforms have the LBR INFO MSR.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-13-git-send-email-kan.liang@linux.intel.com
      fda1f99f
    • perf/x86/intel/lbr: Unify the stored format of LBR information · 5624986d
      Committed by Kan Liang
      Current LBR information in the structure x86_perf_task_context is
      stored in a different format from the PEBS LBR record and Architecture
      LBR, which prevents sharing common code.
      
      Use the format of the PEBS LBR record as a unified format. Use a generic
      name lbr_entry to replace pebs_lbr_entry.
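
      The unified record is essentially three 64-bit fields per branch, along
      the lines of:

        struct lbr_entry {
                u64 from;
                u64 to;
                u64 info;
        };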
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-11-git-send-email-kan.liang@linux.intel.com
      5624986d
    • perf/x86/intel/lbr: Support LBR_CTL · 49d8184f
      Committed by Kan Liang
      A new IA32_LBR_CTL MSR is introduced for Architecture LBR to enable and
      configure the LBR registers, replacing the previous LBR_SELECT.
      
      All the related members in struct cpu_hw_events and struct x86_pmu
      have to be renamed.
      
      Some new macros are added to reflect the layout of LBR_CTL.
      
      The mapping from PERF_SAMPLE_BRANCH_* to the corresponding bits in
      LBR_CTL MSR is saved in lbr_ctl_map now, which is not a const value.
      The value relies on the CPUID enumeration.
      
      For the previous model-specific LBR, most of the bits in LBR_SELECT
      operate in the suppressed mode. For the bits in LBR_CTL, the polarity is
      inverted.
      
      For the previous model-specific LBR format 5 (LBR_FORMAT_INFO), if the
      NO_CYCLES and NO_FLAGS types are set, the flag LBR_NO_INFO will be set
      to avoid the unnecessary LBR_INFO MSR read. Although Architecture LBR
      also has a dedicated LBR_INFO MSR, perf doesn't need to check and set
      the flag LBR_NO_INFO. For Architecture LBR, the XSAVES instruction will
      be used as the default way to read all the LBR MSRs together, so the
      overhead which the flag tries to avoid doesn't exist anymore. Dropping
      the flag saves an extra check in lbr_read() later and makes the code
      cleaner.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-10-git-send-email-kan.liang@linux.intel.com
      49d8184f
    • perf/x86: Expose CPUID enumeration bits for arch LBR · af6cf129
      Committed by Kan Liang
      The LBR capabilities of Architecture LBR are retrieved from the CPUID
      enumeration once at boot time. The capabilities have to be saved for
      future usage.
      
      Several new fields are added into structure x86_pmu to indicate the
      capabilities. The fields will be used in the following patches.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-9-git-send-email-kan.liang@linux.intel.com
      af6cf129
    • perf/x86/intel/lbr: Use dynamic data structure for task_ctx · f42be865
      Committed by Kan Liang
      The type of task_ctx is hardcoded as struct x86_perf_task_context,
      which doesn't apply to Architecture LBR. For example, Architecture LBR
      doesn't have the TOS MSR. The number of LBR entries is variable. A new
      struct will be introduced for Architecture LBR. Perf has to determine
      the type of task_ctx at run time.
      
      The type of task_ctx pointer is changed to 'void *', which will be
      determined at run time.
      
      The generic LBR optimization can be shared between Architecture LBR and
      model-specific LBR. Both need to access the structure for the generic
      LBR optimization. A helper task_context_opt() is introduced to retrieve
      the pointer of the structure at run time.
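
      A sketch of how such a helper can pick the right layout at run time
      (the structure names follow the description above; details may differ
      from the upstream code):

        static inline struct x86_perf_task_context_opt *task_context_opt(void *ctx)
        {
                if (static_cpu_has(X86_FEATURE_ARCH_LBR))
                        return &((struct x86_perf_task_context_arch_lbr *)ctx)->opt;

                return &((struct x86_perf_task_context *)ctx)->opt;
        }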
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-7-git-send-email-kan.liang@linux.intel.com
      f42be865
    • perf/x86/intel/lbr: Factor out a new struct for generic optimization · 530bfff6
      Committed by Kan Liang
      To reduce the overhead of a context switch with LBR enabled, some
      generic optimizations were introduced, e.g. avoiding restoring the LBRs
      if nobody else touched them. The generic optimizations can also be used
      by Architecture LBR later. Currently, the fields for the generic
      optimizations are part of structure x86_perf_task_context, which will
      be deprecated by Architecture LBR. A new structure should be introduced
      for the common fields of the generic optimizations, which can be shared
      between Architecture LBR and model-specific LBR.
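
      A sketch of such a shared structure (field names as used elsewhere in
      this series; treat it as illustrative):

        struct x86_perf_task_context_opt {
                int lbr_callstack_users;   /* events requesting LBR call stacks */
                int lbr_stack_state;       /* e.g. LBR_NONE / LBR_VALID         */
                int log_id;                /* detects reuse of a saved context  */
        };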
      
      Both 'valid_lbrs' and 'tos' are also used by the generic optimizations,
      but they are not moved into the new structure, because Architecture LBR
      is stack-like. The 'valid_lbrs' which records the index of the valid LBR
      is not required anymore. The TOS MSR will be removed.
      
      LBR registers may be cleared in the deep Cstate. If so, the generic
      optimizations should not be applied. Perf has to unconditionally
      restore the LBR registers. A generic function is required to detect the
      reset due to the deep Cstate. lbr_is_reset_in_cstate() is introduced.
      Currently, for the model-specific LBR, the TOS MSR is used to detect the
      reset. There will be another method introduced for Architecture LBR
      later.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-6-git-send-email-kan.liang@linux.intel.com
      530bfff6
    • perf/x86/intel/lbr: Add the function pointers for LBR save and restore · 799571bf
      Committed by Kan Liang
      The MSRs of Architectural LBR are different from previous model-specific
      LBR. Perf has to implement different functions to save and restore them.
      
      The function pointers for LBR save and restore are introduced. Perf
      should initialize the corresponding functions at boot time.
      
      The generic optimizations, e.g. avoiding restoring the LBRs if nobody
      else touched them, still apply to Architectural LBR. The related code
      is not moved into the model-specific functions.
      
      Current model-specific LBR functions are set as default.
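
      A sketch of the hooks and their default (model-specific) initialization:

        /* new members in struct x86_pmu */
        void    (*lbr_save)(void *ctx);
        void    (*lbr_restore)(void *ctx);

        /* defaults; arch-LBR variants are installed when the feature exists */
        x86_pmu.lbr_save    = intel_pmu_lbr_save;
        x86_pmu.lbr_restore = intel_pmu_lbr_restore;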
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-5-git-send-email-kan.liang@linux.intel.com
      799571bf
    • perf/x86/intel/lbr: Add a function pointer for LBR read · c301b1d8
      Committed by Kan Liang
      The method to read Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer for LBR read is introduced. Perf should initialize
      the corresponding function at boot time, and avoid checking lbr_format
      at run time.
      
      The current 64-bit LBR read function is set as default.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-4-git-send-email-kan.liang@linux.intel.com
      c301b1d8
    • perf/x86/intel/lbr: Add a function pointer for LBR reset · 9f354a72
      Committed by Kan Liang
      The method to reset Architectural LBRs is different from previous
      model-specific LBR. Perf has to implement a different function.
      
      A function pointer is introduced for LBR reset. The enum of
      LBR_FORMAT_* is also moved to perf_event.h. Perf should initialize the
      corresponding functions at boot time, and avoid checking lbr_format at
      run time.
      
      The current 64-bit LBR reset function is set as default.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-3-git-send-email-kan.liang@linux.intel.com
      9f354a72
  4. 02 July 2020 (3 commits)
  5. 01 May 2020 (1 commit)
    • x86/perf: Add hardware performance events support for Zhaoxin CPU. · 3a4ac121
      Committed by CodyYao-oc
      Zhaoxin CPUs provide facilities for monitoring performance
      via a PMU (Performance Monitor Unit), but the functionality is unused
      so far. Therefore, add support for the Zhaoxin PMU to make
      performance-related hardware events available.
      
      The PMU is mostly an Intel Architectural PerfMon-v2 with a novel
      erratum for the ZXC line. It supports the following events:
      
        -----------------------------------------------------------------------------------------------------------------------------------
        Event                      | Event  | Umask |          Description
                                   | Select |       |
        -----------------------------------------------------------------------------------------------------------------------------------
        cpu-cycles                 |  82h   |  00h  | unhalt core clock
        instructions               |  00h   |  00h  | number of instructions at retirement.
        cache-references           |  15h   |  05h  | number of fillq pushs at the current cycle.
        cache-misses               |  1ah   |  05h  | number of l2 miss pushed by fillq.
        branch-instructions        |  28h   |  00h  | counts the number of branch instructions retired.
        branch-misses              |  29h   |  00h  | mispredicted branch instructions at retirement.
        bus-cycles                 |  83h   |  00h  | unhalt bus clock
        stalled-cycles-frontend    |  01h   |  01h  | Increments each cycle the # of Uops issued by the RAT to RS.
        stalled-cycles-backend     |  0fh   |  04h  | RS0/1/2/3/45 empty
        L1-dcache-loads            |  68h   |  05h  | number of retire/commit load.
        L1-dcache-load-misses      |  4bh   |  05h  | retired load uops whose data source followed an L1 miss.
        L1-dcache-stores           |  69h   |  06h  | number of retire/commit Store,no LEA
        L1-dcache-store-misses     |  62h   |  05h  | cache lines in M state evicted out of L1D due to Snoop HitM or dirty line replacement.
        L1-icache-loads            |  00h   |  03h  | number of l1i cache access for valid normal fetch,including un-cacheable access.
        L1-icache-load-misses      |  01h   |  03h  | number of l1i cache miss for valid normal fetch,including un-cacheable miss.
        L1-icache-prefetches       |  0ah   |  03h  | number of prefetch.
        L1-icache-prefetch-misses  |  0bh   |  03h  | number of prefetch miss.
        dTLB-loads                 |  68h   |  05h  | number of retire/commit load
        dTLB-load-misses           |  2ch   |  05h  | number of load operations miss all level tlbs and cause a tablewalk.
        dTLB-stores                |  69h   |  06h  | number of retire/commit Store,no LEA
        dTLB-store-misses          |  30h   |  05h  | number of store operations miss all level tlbs and cause a tablewalk.
        dTLB-prefetches            |  64h   |  05h  | number of hardware pte prefetch requests dispatched out of the prefetch FIFO.
        dTLB-prefetch-misses       |  65h   |  05h  | number of hardware pte prefetch requests miss the l1d data cache.
        iTLB-load                  |  00h   |  00h  | actually counter instructions.
        iTLB-load-misses           |  34h   |  05h  | number of code operations miss all level tlbs and cause a tablewalk.
        -----------------------------------------------------------------------------------------------------------------------------------
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: CodyYao-oc <CodyYao-oc@zhaoxin.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1586747669-4827-1-git-send-email-CodyYao-oc@zhaoxin.com
      3a4ac121
  6. 17 January 2020 (2 commits)
    • perf/x86/amd: Add support for Large Increment per Cycle Events · 57388912
      Committed by Kim Phillips
      Description of hardware operation
      ---------------------------------
      
      The core AMD PMU has a 4-bit wide per-cycle increment for each
      performance monitor counter.  That works for most events, but
      now with AMD Family 17h and above processors, some events can
      occur more than 15 times in a cycle.  Those events are called
      "Large Increment per Cycle" events. In order to count these
      events, two adjacent h/w PMCs get their count signals merged
      to form 8 bits per cycle total.  In addition, the PERF_CTR count
      registers are merged to be able to count up to 64 bits.
      
      Normally, events like instructions retired, get programmed on a single
      counter like so:
      
      PERF_CTL0 (MSR 0xc0010200) 0x000000000053ff0c # event 0x0c, umask 0xff
      PERF_CTR0 (MSR 0xc0010201) 0x0000800000000001 # r/w 48-bit count
      
      The next counter at MSRs 0xc0010202-3 remains unused, or can be used
      independently to count something else.
      
      When counting Large Increment per Cycle events, such as FLOPs,
      however, we now have to reserve the next counter and program the
      PERF_CTL (config) register with the Merge event (0xFFF), like so:
      
      PERF_CTL0 (msr 0xc0010200) 0x000000000053ff03 # FLOPs event, umask 0xff
      PERF_CTR0 (msr 0xc0010201) 0x0000800000000001 # rd 64-bit cnt, wr lo 48b
      PERF_CTL1 (msr 0xc0010202) 0x0000000f004000ff # Merge event, enable bit
      PERF_CTR1 (msr 0xc0010203) 0x0000000000000000 # wr hi 16-bits count
      
      The count is widened from the normal 48-bits to 64 bits by having the
      second counter carry the higher 16 bits of the count in its lower 16
      bits of its counter register.
      
      The odd counter, e.g., PERF_CTL1, is programmed with the enabled Merge
      event before the even counter, PERF_CTL0.
      
      The Large Increment feature is available starting with Family 17h.
      For more details, search any Family 17h PPR for the "Large Increment
      per Cycle Events" section, e.g., section 2.1.15.3 on p. 173 in this
      version:
      
      https://www.amd.com/system/files/TechDocs/56176_ppr_Family_17h_Model_71h_B0_pub_Rev_3.06.zip
      
      Description of software operation
      ---------------------------------
      
      The following steps are taken in order to support reserving and
      enabling the extra counter for Large Increment per Cycle events:
      
      1. In the main x86 scheduler, we reduce the number of available
      counters by the number of Large Increment per Cycle events being
      scheduled, tracked by a new cpuc variable 'n_pair' and a new
      amd_put_event_constraints_f17h().  This improves the counter
      scheduler success rate.
      
      2. In perf_assign_events(), if a counter is assigned to a Large
      Increment event, we increment the current counter variable, so the
      counter used for the Merge event is removed from assignment
      consideration by upcoming event assignments.
      
      3. In find_counter(), if a counter has been found for the Large
      Increment event, we set the next counter as used, to prevent other
      events from using it.
      
      4. We perform steps 2 & 3 also in the x86 scheduler fastpath, i.e.,
      we add Merge event accounting to the existing used_mask logic.
      
      5. Finally, we add on the programming of Merge event to the
      neighbouring PMC counters in the counter enable/disable{_all}
      code paths.
      
      Currently, software does not support a single PMU with mixed 48- and
      64-bit counting, so Large increment event counts are limited to 48
      bits.  In set_period, we zero-out the upper 16 bits of the count, so
      the hardware doesn't copy them to the even counter's higher bits.
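
      A sketch of that zero-out in the period-programming path (helper names
      as used in this series; treat the placement as illustrative):

        /* after programming the even counter of a Large Increment pair */
        if (is_counter_pair(&event->hw)) {
                /*
                 * Clear the odd (Merge) counter so stale upper 16 bits are
                 * not folded into the 64-bit paired count by the hardware.
                 */
                wrmsrl(x86_pmu_event_addr(idx + 1), 0);
        }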
      
      Simple invocation example showing counting 8 FLOPs per 256-bit/%ymm
      vaddps instruction executed in a loop 100 million times:
      
      perf stat -e cpu/fp_ret_sse_avx_ops.all/,cpu/instructions/ <workload>
      
       Performance counter stats for '<workload>':
      
             800,000,000      cpu/fp_ret_sse_avx_ops.all/u
             300,042,101      cpu/instructions/u
      
      Prior to this patch, the reported SSE/AVX FLOPs retired count would
      be wrong.
      
      [peterz: lots of renames and edits to the code]
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      57388912
    • perf/x86/amd: Constrain Large Increment per Cycle events · 471af006
      Committed by Kim Phillips
      AMD Family 17h processors and above gain support for Large Increment
      per Cycle events.  Unfortunately there is no CPUID or equivalent bit
      that indicates whether the feature exists or not, so we continue to
      determine eligibility based on a CPU family number comparison.
      
      For Large Increment per Cycle events, we add a f17h-and-compatibles
      get_event_constraints_f17h() that returns an even counter bitmask:
      Large Increment per Cycle events can only be placed on PMCs 0, 2,
      and 4 out of the currently available 0-5.  The only currently
      public event that requires this feature to report valid counts
      is PMCx003 "Retired SSE/AVX Operations".
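
      As a sketch, the even-counter restriction boils down to a counter mask
      with only bits 0, 2 and 4 set (0b010101 == 0x15); the exact constraint
      plumbing in the driver may differ:

        /* only even-numbered PMCs may start a Large Increment pair */
        static struct event_constraint pair_constraint =
                EVENT_CONSTRAINT(0, 0x15, 0);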
      
      Note that the CPU family logic in amd_core_pmu_init() is changed
      so as to be able to selectively add initialization for features
      available in ranges of backward-compatible CPU families.  This
      Large Increment per Cycle feature is expected to be retained
      in future families.
      
      A side-effect of assigning a new get_constraints function for f17h
      disables calling the old (prior to f15h) amd_get_event_constraints
      implementation left enabled by commit e40ed154 ("perf/x86: Add perf
      support for AMD family-17h processors"), which is no longer
      necessary since those North Bridge event codes are obsoleted.
      
      Also fix a spelling mistake whilst in the area (calulating ->
      calculating).
      
      Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20191114183720.19887-2-kim.phillips@amd.com
      471af006
  7. 28 October 2019 (2 commits)
    • perf/x86/intel: Implement LBR callstack context synchronization · 421ca868
      Committed by Alexey Budankov
      Implement the intel_pmu_lbr_swap_task_ctx() method, which updates the
      counters of the events that requested LBR callstack data on a sample.
      
      The counter can be zero when the task context belongs to a thread that
      has just returned from blocking on a futex and the context contains
      saved (lbr_stack_state == LBR_VALID) LBR register values.

      For the values to be restored into the LBR registers on the thread's
      next switch-in event, the method swaps the counter value with the one
      that is expected to be non-zero in the previous equivalent task perf
      event context.
      
      Using a swap operation ensures that the previous task perf event
      context stays consistent with the number of events that requested LBR
      callstack data on a sample.
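
      A simplified sketch of the swap (not the exact upstream implementation):

        static void intel_pmu_lbr_swap_task_ctx(struct perf_event_context *prev,
                                                struct perf_event_context *next)
        {
                struct x86_perf_task_context *prev_data = prev->task_ctx_data;
                struct x86_perf_task_context *next_data = next->task_ctx_data;

                if (!prev_data || !next_data)
                        return;

                /* keep the user counts consistent with the saved LBR state */
                swap(prev_data->lbr_callstack_users, next_data->lbr_callstack_users);
        }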
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/261ac742-9022-c3f4-5885-1eae7415b091@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      421ca868
    • perf/core, perf/x86: Introduce swap_task_ctx() method at 'struct pmu' · fc1adfe3
      Committed by Alexey Budankov
      Declare swap_task_ctx() methods at the generic and x86 specific
      pmu types to bridge calls to platform specific PMU code on optimized
      context switch path between equivalent task perf event contexts.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/9a0aa84a-f062-9b64-3133-373658550c4b@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fc1adfe3
  8. 28 August 2019 (1 commit)
  9. 25 June 2019 (2 commits)
  10. 03 June 2019 (4 commits)
  11. 10 May 2019 (1 commit)
    • perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking · 6b89d4c1
      Committed by Stephane Eranian
      On Intel Westmere, a cmdline as follows:
      
        $ perf record -e cpu/event=0xc4,umask=0x2,name=br_inst_retired.near_call/p ....
      
      was failing. Yet this event + umask combination supports PEBS.
      
      It turns out this is due to a bug in the PEBS event constraint table for
      Westmere. All forms of BR_INST_RETIRED.* support PEBS, therefore the
      constraint mask should ignore the umask. The name of the macro
      INTEL_FLAGS_EVENT_CONSTRAINT() hints that this is the case, but it was
      not. That macro was checking both the event code and the event umask,
      and therefore it was only matching on 0x00c4. The code+umask macros all
      have *UEVENT* in their names.
      
      This patch fixes the issue by checking only the event code in the mask.
      Both the single and range versions are modified.
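
      The fix amounts to masking only the event-code field instead of
      code+umask; roughly (shown as an assumed before/after, not a verbatim
      diff):

        /* before: matched event code AND umask, so only 0x00c4 hit the entry */
        #define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
                EVENT_CONSTRAINT(c, n, INTEL_ARCH_EVENT_MASK|X86_ALL_EVENT_FLAGS)

        /* after: match on the event code only */
        #define INTEL_FLAGS_EVENT_CONSTRAINT(c, n) \
                EVENT_CONSTRAINT(c, n, ARCH_PERFMON_EVENTSEL_EVENT|X86_ALL_EVENT_FLAGS)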
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/20190509214556.123493-1-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6b89d4c1
  12. 16 April 2019 (7 commits)
    • perf/x86/intel: Add Icelake support · 60176089
      Committed by Kan Liang
      Add Icelake core PMU perf code, including constraint tables and the main
      enable code.
      
      Icelake expanded the generic counters to always 8 even with HT on, but a
      range of events cannot be scheduled on the extra 4 counters.
      Add new constraint ranges to describe this to the scheduler.
      The number of constraints that need to be checked is larger now than
      with earlier CPUs.
      At some point we may need a new data structure to look them up more
      efficiently than with linear search. So far it still seems to be
      acceptable however.
      
      Icelake added a new fixed counter SLOTS. Full support for it is added
      later in the patch series.
      
      The cache events table is identical to Skylake.
      
      Compared to the PEBS instruction event on a generic counter, fixed
      counter 0 has less skid. Force instruction:ppp to always use fixed
      counter 0.
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-9-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      60176089
    • perf/x86: Support constraint ranges · 63b79f6e
      Committed by Peter Zijlstra
      Icelake extended the general counters to 8, even when SMT is enabled.
      However only a (large) subset of the events can be used on all 8
      counters.
      
      The events that can or cannot be used on all counters are organized
      in ranges.
      
      A lot of scheduler constraints are required to handle all this.
      
      To avoid blowing up the tables add event code ranges to the constraint
      tables, and a new inline function to match them.
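
      One way such a range match can be expressed (a sketch consistent with
      the description; the exact helper may differ):

        static inline bool constraint_match(struct event_constraint *c, u64 ecode)
        {
                /* a zero-sized range keeps the old exact-match behaviour */
                return ((ecode & c->cmask) - c->code) <= (u64)c->size;
        }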
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # developer hat on
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # maintainer hat on
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-8-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      63b79f6e
    • perf/x86/lbr: Avoid reading the LBRs when adaptive PEBS handles them · d3617b98
      Committed by Andi Kleen
      With adaptive PEBS the CPU can directly supply the LBR information,
      so we don't need to read it again. But the LBRs still need to be
      enabled. Add a special count to the cpuc that distinguishes these
      two cases, and avoid reading the LBRs unnecessarily when PEBS is
      active.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-7-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d3617b98
    • perf/x86/intel: Support adaptive PEBS v4 · c22497f5
      Committed by Kan Liang
      Adaptive PEBS is a new way to report PEBS sampling information. Instead
      of a fixed-size record for all PEBS events, it allows configuring the
      PEBS record to only include the information needed. Events can then opt
      in to use such an extended record, or stay with a basic record which
      only contains the IP.
      
      The major new feature is to support LBRs in PEBS record.
      Besides normal LBR, this allows (much faster) large PEBS, while still
      supporting callstacks through callstack LBR. So essentially a lot of
      profiling can now be done without frequent interrupts, dropping the
      overhead significantly.
      
      The main requirement still is to use a period, and not use frequency
      mode, because frequency mode requires reevaluating the frequency on each
      overflow.
      
      The floating point state (XMM) is also supported, which allows efficient
      profiling of FP function arguments.
      
      Introduce a specific drain function to handle the variable-length
      records. Use a new callback to parse the new record format, and also
      handle the STATUS field now being at a different offset.

      Add code to set up the configuration register. Since there is only a
      single register, all events either get the full superset of everything
      requested, or only the basic record.
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
      [ Renamed GPRS => GP. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c22497f5
    • perf/x86: Support outputting XMM registers · 878068ea
      Committed by Kan Liang
      Starting from Icelake, XMM registers can be collected in the PEBS
      record. But the current code only outputs the pt_regs.
      
      Add a new struct x86_perf_regs for both pt_regs and xmm_regs. The
      xmm_regs will be used later to keep a pointer to the PEBS record which
      has the XMM information.
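
      A sketch of the new structure:

        struct x86_perf_regs {
                struct pt_regs  regs;
                u64             *xmm_regs;  /* points at XMM data in the PEBS record */
        };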
      
      XMM registers are 128 bit. To simplify the code, they are handled like
      two different registers, which means setting two bits in the register
      bitmap. This also allows sampling only the lower 64 bits of an XMM
      register.

      The index of the XMM registers starts from 32. There are 16 XMM
      registers, so all the reserved space for regs is used. Remove
      REG_RESERVED.
      
      Add PERF_REG_X86_XMM_MAX, which stands for the max number of all x86
      regs including both GPRs and XMM.
      
      Add REG_NOSUPPORT for 32bit to exclude unsupported registers.
      
      Previous platforms cannot collect XMM information in the PEBS record.
      Add pebs_no_xmm_regs to indicate the unsupported platforms.

      The common code still validates the supported registers. However, it
      cannot check model-specific registers, e.g. XMM. Add an extra check in
      x86_pmu_hw_config() to reject invalid configs of regs_user and regs_intr.
      The regs_user never supports XMM collection.
      The regs_intr only supports XMM collection when sampling a PEBS event on
      Icelake and later platforms.
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-3-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      878068ea
    • perf/x86/intel: Force resched when TFA sysctl is modified · f447e4eb
      Committed by Stephane Eranian
      This patch guarantees to the sysadmin that when TFA is disabled, no PMU
      event is using PMC3 by the time the echo command returns. Vice versa,
      when TFA is enabled, the PMU can use PMC3 immediately (to eliminate
      possible multiplexing).
      
        $ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
           1.000123979    125,768,725,208      branches
           1.000562520    125,631,000,456      branches
           1.000942898    125,487,114,291      branches
           1.001333316    125,323,363,620      branches
           2.004721306    125,514,968,546      branches
           2.005114560    125,511,110,861      branches
           2.005482722    125,510,132,724      branches
           2.005851245    125,508,967,086      branches
           3.006323475    125,166,570,648      branches
           3.006709247    125,165,650,056      branches
           3.007086605    125,164,639,142      branches
           3.007459298    125,164,402,912      branches
           4.007922698    125,045,577,140      branches
           4.008310775    125,046,804,324      branches
           4.008670814    125,048,265,111      branches
           4.009039251    125,048,677,611      branches
           5.009503373    125,122,240,217      branches
           5.009897067    125,122,450,517      branches
      
      Then on another connection, sysadmin does:
      
        $ echo  1 >/sys/devices/cpu/allow_tsx_force_abort
      
      Then perf stat adjusts the events immediately:
      
           5.010286029    125,121,393,483      branches
           5.010646308    125,120,556,786      branches
           6.011113588    124,963,351,832      branches
           6.011510331    124,964,267,566      branches
           6.011889913    124,964,829,130      branches
           6.012262996    124,965,841,156      branches
           7.012708299    124,419,832,234      branches [79.69%]
           7.012847908    124,416,363,853      branches [79.73%]
           7.013225462    124,400,723,712      branches [79.73%]
           7.013598191    124,376,154,434      branches [79.70%]
           8.014089834    124,250,862,693      branches [74.98%]
           8.014481363    124,267,539,139      branches [74.94%]
           8.014856006    124,259,519,786      branches [74.98%]
           8.014980848    124,225,457,969      branches [75.04%]
           9.015464576    124,204,235,423      branches [75.03%]
           9.015858587    124,204,988,490      branches [75.04%]
           9.016243680    124,220,092,486      branches [74.99%]
           9.016620104    124,231,260,146      branches [74.94%]
      
      And vice versa if the sysadmin does:
      
        $ echo  0 >/sys/devices/cpu/allow_tsx_force_abort
      
      Events are again spread over the 4 counters:
      
          10.017096277    124,276,230,565      branches [74.96%]
          10.017237209    124,228,062,171      branches [75.03%]
          10.017478637    124,178,780,626      branches [75.03%]
          10.017853402    124,198,316,177      branches [75.03%]
          11.018334423    124,602,418,933      branches [85.40%]
          11.018722584    124,602,921,320      branches [85.42%]
          11.019095621    124,603,956,093      branches [85.42%]
          11.019467742    124,595,273,783      branches [85.42%]
          12.019945736    125,110,114,864      branches
          12.020330764    125,109,334,472      branches
          12.020688740    125,109,818,865      branches
          12.021054020    125,108,594,014      branches
          13.021516774    125,109,164,018      branches
          13.021903640    125,108,794,510      branches
          13.022270770    125,107,756,978      branches
          13.022630819    125,109,380,471      branches
          14.023114989    125,133,140,817      branches
          14.023501880    125,133,785,858      branches
          14.023868339    125,133,852,700      branches
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Cc: nelson.dsouza@intel.com
      Cc: tonyj@suse.com
      Link: https://lkml.kernel.org/r/20190408173252.37932-3-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f447e4eb
    • perf/x86: Fix incorrect PEBS_REGS · 9d5dcc93
      Committed by Kan Liang
      PEBS_REGS is used as a mask of the registers supported for large PEBS.
      However, the mask cannot filter sample_regs_user/sample_regs_intr
      correctly.
      
      (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which
      is only the index.
      
      Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general
      purpose registers.
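
      So the corrected mask is built from bit positions, along the lines of:

        /* sketch: each supported GPR contributes its bit, not its index */
        #define PEBS_GP_REGS                   \
                ((1ULL << PERF_REG_X86_IP)   | \
                 (1ULL << PERF_REG_X86_AX)   | \
                 (1ULL << PERF_REG_X86_BX)   | \
                 (1ULL << PERF_REG_X86_CX)   | \
                 (1ULL << PERF_REG_X86_DX)   | \
                 (1ULL << PERF_REG_X86_SP)   | \
                 (1ULL << PERF_REG_X86_BP))  /* ...plus the remaining GPRs */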
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Fixes: 2fe1bc1f ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
      Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com
      [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9d5dcc93