1. 18 Jan 2022 (2 commits)
    • perf/x86/intel/lbr: Support LBR format V7 · 1ac7fd81
      Committed by Peter Zijlstra (Intel)
      The Goldmont Plus and Tremont have LBR format V7. V7 has LBR_INFO,
      which is the same as LBR format V5, but V7 doesn't support TSX.
      
      Without the patch, the associated misprediction and cycles information
      in LBR_INFO may be lost on a Goldmont Plus platform.
      For Tremont, the patch only impacts the non-PEBS events. Because of the
      adaptive PEBS, the LBR_INFO is always processed for a PEBS event.
      
      Currently, two different ways are used to check the LBR capabilities,
      which makes the code complex and confusing.
      For the LBR format V4 and earlier, the global static lbr_desc array is
      used to store the flags for the LBR capabilities in each LBR format.
      For LBR format V5 and V6, the current code checks the version number
      for the LBR capabilities.
      
      There are common LBR capabilities among LBR format versions. Several
      flags for these capabilities are introduced into struct x86_pmu.
      The flags, which can be shared among LBR formats, are used to check
      the LBR capabilities. Add intel_pmu_lbr_init() to set the flags
      accordingly at boot time, as sketched below.
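
      A minimal sketch of the flag-based scheme, with illustrative bit-field
      names (the patch's exact names may differ):

        /* Shared LBR capability flags in struct x86_pmu, set once at boot. */
        struct x86_pmu {
                /* ... existing members ... */
                unsigned int    lbr_has_info:1; /* LBR_INFO MSRs are valid    */
                unsigned int    lbr_has_tsx:1;  /* in_tx/abort bits are valid */
        };

        void __init intel_pmu_lbr_init(void)
        {
                switch (x86_pmu.intel_cap.lbr_format) {
                case LBR_FORMAT_INFO:   /* v5: LBR_INFO with TSX */
                        x86_pmu.lbr_has_info = 1;
                        x86_pmu.lbr_has_tsx  = 1;
                        break;
                case LBR_FORMAT_INFO2:  /* v7: LBR_INFO without TSX */
                        x86_pmu.lbr_has_info = 1;
                        break;
                /* ... other formats set their shared flags here ... */
                }
        }
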
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Kan Liang <kan.liang@linux.intel.com>
      Link: https://lkml.kernel.org/r/1641315077-96661-1-git-send-email-peterz@infradead.org
      1ac7fd81
    • perf/x86/intel: Add a quirk for the calculation of the number of counters on Alder Lake · 7fa981ca
      Committed by Kan Liang
      On some Alder Lake machines with all E-cores disabled in the BIOS, the
      warning below may be triggered.
      
      [ 2.010766] hw perf events fixed 5 > max(4), clipping!
      
      The current perf code relies on CPUID leaf 0xA and leaf 7.EDX[15] to
      calculate the number of counters, based on the assumptions below.
      
      For a hybrid configuration, leaf 7.EDX[15] (X86_FEATURE_HYBRID_CPU)
      is set, and leaf 0xA only enumerates the common counters. Linux perf
      has to manually add the extra GP counters and fixed counters for
      P-cores. For a non-hybrid configuration, X86_FEATURE_HYBRID_CPU should
      not be set, and leaf 0xA enumerates all counters.
      
      However, that's not the case when all E-cores are disabled in the BIOS.
      Although there are only P-cores in the system, leaf 7.EDX[15]
      (X86_FEATURE_HYBRID_CPU) is still set, while leaf 0xA is updated to
      enumerate all counters of the P-cores. The inconsistency triggers the
      warning.
      
      Several software ways were considered to handle the inconsistency.
      - Drop the leaf 0xA and leaf 7.EDX[15] CPUID enumeration support and
        hardcode the number of counters. This may be a problem for
        virtualization: a hypervisor could no longer control the number of
        counters in a Linux guest by changing the guest CPUID enumeration.
      - Find another CPUID bit that is also updated when E-cores are
        disabled. This may be a problem in a virtualization environment too,
        because a hypervisor may disable the feature/CPUID bit.
      - The P-cores have a maximum of 8 GP counters and 4 fixed counters on
        ADL, so the maximum number can be used to detect the case. This
        solution, sketched below, is implemented in this patch.
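
      A minimal sketch of the quirk, with hypothetical helper and constant
      names (the actual change lives in the ADL setup path of
      intel_pmu_init()):

        /* P-core hardware maximums on ADL, per the changelog. */
        #define ADL_PCORE_MAX_GP        8
        #define ADL_PCORE_MAX_FIXED     4

        /* Clamp the CPUID-derived counts: with E-cores BIOS-disabled,
         * leaf 0xA already enumerates all P-core counters, and adding the
         * hybrid extras on top would exceed the hardware limits. */
        static void adl_counter_quirk(int *num_gp, int *num_fixed)
        {
                if (*num_gp > ADL_PCORE_MAX_GP)
                        *num_gp = ADL_PCORE_MAX_GP;
                if (*num_fixed > ADL_PCORE_MAX_FIXED)
                        *num_fixed = ADL_PCORE_MAX_FIXED;
        }
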
      
      Fixes: ee72a94e ("perf/x86/intel: Fix fixed counter check warning for some Alder Lake")
      Reported-by: Damjan Marion (damarion) <damarion@cisco.com>
      Reported-by: Chan Edison <edison_chan_gz@hotmail.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Damjan Marion (damarion) <damarion@cisco.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1641925238-149288-1-git-send-email-kan.liang@linux.intel.com
      7fa981ca
  2. 17 Nov 2021 (4 commits)
    • perf: Add wrappers for invoking guest callbacks · 1c343051
      Committed by Sean Christopherson
      Add helpers for the guest callbacks to prepare for burying the callbacks
      behind a Kconfig (it's a lot easier to provide a few stubs than to #ifdef
      piles of code), and also to prepare for converting the callbacks to
      static_call().  perf_instruction_pointer() in particular will have subtle
      semantics with static_call(), as the "no callbacks" case will return 0 if
      the callbacks are unregistered between querying guest state and getting
      the IP.  Implement the change now to avoid a functional change when adding
      static_call() support, and because the new helper needs to return
      _something_ in this case.
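
      A sketch of the wrapper shape described above; treat the exact helper
      names and the RCU accessor as assumptions based on this changelog and
      the rest of the series:

        static inline unsigned int perf_guest_state(void)
        {
                struct perf_guest_info_callbacks *cbs = rcu_dereference(perf_guest_cbs);

                return cbs ? cbs->state() : 0;  /* 0 == not in guest */
        }

        static inline unsigned long perf_guest_get_ip(void)
        {
                struct perf_guest_info_callbacks *cbs = rcu_dereference(perf_guest_cbs);

                /* Returns 0 if the callbacks were unregistered between the
                 * state query and this call, the subtle case noted above. */
                return cbs ? cbs->get_ip() : 0;
        }
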
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20211111020738.2512932-8-seanjc@google.com
      1c343051
    • perf/core: Rework guest callbacks to prepare for static_call support · b9f5621c
      Committed by Like Xu
      To prepare for using static_calls to optimize perf's guest callbacks,
      replace ->is_in_guest and ->is_user_mode with a new multiplexed hook
      ->state, tweak ->handle_intel_pt_intr to play nice with being called when
      there is no active guest, and drop "guest" from ->get_guest_ip.
      
      Return '0' from ->state and ->handle_intel_pt_intr to indicate "not in
      guest" so that DEFINE_STATIC_CALL_RET0 can be used to define the static
      calls, i.e. no callback == !guest.
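
      The reworked callback table, sketched from the changelog (the state
      flag names follow the upstream series but should be treated as
      assumptions here):

        #define PERF_GUEST_ACTIVE       0x01    /* a guest is running    */
        #define PERF_GUEST_USER         0x02    /* guest is in user mode */

        struct perf_guest_info_callbacks {
                unsigned int  (*state)(void);   /* 0 == not in guest  */
                unsigned long (*get_ip)(void);  /* was ->get_guest_ip */
                unsigned int  (*handle_intel_pt_intr)(void);
        };
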
      
      [sean: extracted from static_call patch, fixed get_ip() bug, wrote changelog]
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Link: https://lore.kernel.org/r/20211111020738.2512932-7-seanjc@google.com
      b9f5621c
    • perf: Protect perf_guest_cbs with RCU · ff083a2d
      Committed by Sean Christopherson
      Protect perf_guest_cbs with RCU to fix multiple possible errors.  Luckily,
      all paths that read perf_guest_cbs already require RCU protection, e.g. to
      protect the callback chains, so only the direct perf_guest_cbs touchpoints
      need to be modified.
      
      Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
      perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
      Fixed via the READ_ONCE() in rcu_dereference().
      
      Bug #2 is that on weakly-ordered architectures, updates to the callbacks
      themselves are not guaranteed to be visible before the pointer is made
      visible to readers.  Fixed by the smp_store_release() in
      rcu_assign_pointer() when the new pointer is non-NULL.
      
      Bug #3 is that, because the callbacks are global, it's possible for
      readers to run in parallel with an unregister, and thus a module
      implementing the callbacks can be unloaded while readers are in flight,
      resulting in a use-after-free.  Fixed by a synchronize_rcu() call when
      unregistering callbacks.
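
      An abbreviated sketch of the fixed register/unregister pair (the real
      patch adds sanity checks around these calls):

        void perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
        {
                /* The smp_store_release() inside rcu_assign_pointer() orders
                 * the callback stores before the pointer publish (bug #2). */
                rcu_assign_pointer(perf_guest_cbs, cbs);
        }

        void perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
        {
                rcu_assign_pointer(perf_guest_cbs, NULL);
                /* Wait out in-flight readers so the module implementing the
                 * callbacks can be safely unloaded (bug #3). */
                synchronize_rcu();
        }
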
      
      Bug #1 escaped notice because it's extremely unlikely a compiler will
      reload perf_guest_cbs in this sequence.  perf_guest_cbs does get reloaded
      for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
      guard all but guarantees the consumer will win the race; e.g. to nullify
      perf_guest_cbs, KVM has to completely exit the guest and tear down all
      VMs before KVM starts its module unload / unregister sequence.  This
      also makes it all but impossible to encounter bug #3.
      
      Bug #2 has not been a problem because all architectures that register
      callbacks are strongly ordered and/or have a static set of callbacks.
      
      But with help, unloading kvm_intel can trigger bug #1, e.g. wrapping
      perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
      kvm_intel module load/unload leads to:
      
        BUG: kernel NULL pointer dereference, address: 0000000000000000
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD 0 P4D 0
        Oops: 0000 [#1] PREEMPT SMP
        CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:perf_misc_flags+0x1c/0x70
        Call Trace:
         perf_prepare_sample+0x53/0x6b0
         perf_event_output_forward+0x67/0x160
         __perf_event_overflow+0x52/0xf0
         handle_pmi_common+0x207/0x300
         intel_pmu_handle_irq+0xcf/0x410
         perf_event_nmi_handler+0x28/0x50
         nmi_handle+0xc7/0x260
         default_do_nmi+0x6b/0x170
         exc_nmi+0x103/0x130
         asm_exc_nmi+0x76/0xbf
      
      Fixes: 39447b38 ("perf: Enhance perf to allow for guest statistic collection from host")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
      ff083a2d
    • x86/perf: Fix snapshot_branch_stack warning in VM · f3fd84a3
      Committed by Song Liu
      When running in a VM, intel_pmu_snapshot_branch_stack triggers a WRMSR
      warning like:
      
       [ ] unchecked MSR access error: WRMSR to 0x3f1 (tried to write 0x0000000000000000) at rIP: 0xffffffff81011a5b (intel_pmu_snapshot_branch_stack+0x3b/0xd0)
      
      This can be triggered with BPF selftests:
      
        tools/testing/selftests/bpf/test_progs -t get_branch_snapshot
      
      This warning is caused by __intel_pmu_pebs_disable_all() in the VM.
      Since it is not necessary to disable PEBS for LBR, remove it from
      intel_pmu_snapshot_branch_stack and intel_pmu_snapshot_arch_branch_stack.
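
      A sketch of the fixed function, abbreviated, with signatures recalled
      approximately rather than quoted from the patch:

        static int intel_pmu_snapshot_branch_stack(struct perf_branch_entry *entries,
                                                   unsigned int cnt)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);

                /* Previously intel_pmu_disable_all(), which also wrote
                 * MSR_IA32_PEBS_ENABLE (0x3f1) and tripped the WRMSR warning
                 * in a VM; PEBS need not be touched just to read the LBR. */
                __intel_pmu_disable_all(false);
                intel_pmu_lbr_read();
                cnt = min_t(unsigned int, cnt, x86_pmu.lbr_nr);
                memcpy(entries, cpuc->lbr_entries,
                       sizeof(struct perf_branch_entry) * cnt);
                __intel_pmu_enable_all(0, true);

                return cnt;
        }
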
      
      Fixes: c22ac2a3 ("perf: Enable branch record for software events")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Like Xu <likexu@tencent.com>
      Link: https://lore.kernel.org/r/20211112054510.2667030-1-songliubraving@fb.com
      f3fd84a3
  3. 11 Nov 2021 (1 commit)
  4. 30 Oct 2021 (1 commit)
  5. 15 Oct 2021 (1 commit)
  6. 01 Oct 2021 (1 commit)
  7. 14 Sep 2021 (1 commit)
  8. 26 Aug 2021 (1 commit)
  9. 06 Aug 2021 (1 commit)
    • perf/x86/intel: Apply mid ACK for small core · acade637
      Committed by Kan Liang
      The warning below may occasionally be triggered on an ADL machine when
      all of these conditions occur:
      
       - Two perf record commands run one after another. Both record a PEBS event.
       - Both run on small cores.
       - They have different adaptive PEBS configurations (PEBS_DATA_CFG).
      
        [ ] WARNING: CPU: 4 PID: 9874 at arch/x86/events/intel/ds.c:1743 setup_pebs_adaptive_sample_data+0x55e/0x5b0
        [ ] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
        [ ] Call Trace:
        [ ]  <NMI>
        [ ]  intel_pmu_drain_pebs_icl+0x48b/0x810
        [ ]  perf_event_nmi_handler+0x41/0x80
        [ ]  </NMI>
        [ ]  __perf_event_task_sched_in+0x2c2/0x3a0
      
      Unlike the big core, the small core requires the ACK right before
      re-enabling counters in the NMI handler; otherwise a stale PEBS record
      may be dumped into the later NMI handler, which triggers the warning.
      
      Add a new mid_ack flag to track the case. Add all PMI handler bits in
      the struct x86_hybrid_pmu to track the bits for different types of
      PMUs.  Apply mid ACK for the small cores on an Alder Lake machine.

      The existing hybrid() macro has a compile error when taking the address
      of a bit-field variable. Add a new macro, hybrid_bit(), to get the
      bit-field value of a given PMU, as sketched below.
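
      A sketch of the new accessor, close to the upstream macro: it reads the
      bit-field by value, falling back to the global x86_pmu when the system
      is not hybrid:

        #define hybrid_bit(_pmu, _field)                        \
        ({                                                      \
                bool __Fp = x86_pmu._field;                     \
                                                                \
                if (is_hybrid() && (_pmu))                      \
                        __Fp = hybrid_pmu(_pmu)->_field;        \
                                                                \
                __Fp;                                           \
        })

        /* e.g. in the PMI handler:
         * if (hybrid_bit(cpuc->pmu, mid_ack)) ack before re-enabling. */
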
      
      Fixes: f83d2f91 ("perf/x86/intel: Add Alder Lake Hybrid support")
      Reported-by: Ammy Yi <ammy.yi@intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Tested-by: Ammy Yi <ammy.yi@intel.com>
      Link: https://lkml.kernel.org/r/1627997128-57891-1-git-send-email-kan.liang@linux.intel.com
      acade637
  10. 24 Jun 2021 (3 commits)
  11. 15 Jun 2021 (1 commit)
  12. 18 May 2021 (1 commit)
  13. 22 Apr 2021 (1 commit)
  14. 20 Apr 2021 (13 commits)
  15. 22 Mar 2021 (1 commit)
    • x86: Fix various typos in comments, take #2 · 163b0991
      Committed by Ingo Molnar
      Fix another ~42 single-word typos in arch/x86/ code comments that were
      missed in the first pass, in particular in .S files.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
      163b0991
  16. 18 Mar 2021 (1 commit)
    • x86: Fix various typos in comments · d9f6e12f
      Committed by Ingo Molnar
      Fix ~144 single-word typos in arch/x86/ code comments.
      
      Doing this in a single commit should reduce the churn.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: linux-kernel@vger.kernel.org
      d9f6e12f
  17. 17 Mar 2021 (1 commit)
  18. 06 Mar 2021 (1 commit)
  19. 10 Feb 2021 (1 commit)
  20. 01 Feb 2021 (3 commits)
    • perf/x86/intel: Support CPUID 10.ECX to disable fixed counters · 32451614
      Committed by Kan Liang
      With Architectural Performance Monitoring Version 5, the CPUID 10.ECX
      leaf indicates the fixed counter enumeration. This extends the previous
      count to a bitmap, which allows disabling even the lower fixed counters.
      It could be used by a hypervisor.
      
      The existing intel_ctrl variable is used to remember the bitmask of the
      counters. All code that reads all counters is fixed to check this extra
      bitmask.
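
      A self-contained userspace probe of the enumeration (uses the
      compiler-provided <cpuid.h>); the kernel-side change combines this
      bitmap with the EDX count in the same spirit:

        #include <cpuid.h>
        #include <stdio.h>

        int main(void)
        {
                unsigned int eax, ebx, ecx, edx;

                __get_cpuid_count(0xA, 0, &eax, &ebx, &ecx, &edx);
                /* Version >= 5: ECX is a bitmap of available fixed counters;
                 * earlier versions only provide a count in EDX[4:0]. */
                printf("PMU version:        %u\n", eax & 0xff);
                printf("fixed counter mask: %#x\n", ecx);
                printf("fixed counter cnt:  %u\n", edx & 0x1f);
                return 0;
        }
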
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Originally-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-6-git-send-email-kan.liang@linux.intel.com
      32451614
    • perf/x86/intel: Add perf core PMU support for Sapphire Rapids · 61b985e3
      Committed by Kan Liang
      Add perf core PMU support for the Intel Sapphire Rapids server, which is
      the successor of the Intel Ice Lake server. The enabling code is based
      on Ice Lake, but there are several new features introduced.
      
      The event encoding is changed and simplified; e.g., the event codes
      below 0x90 are restricted to counters 0-3, while the event codes above
      0x90 are likely to have no restrictions. The event constraints,
      extra_regs(), and hardware cache events table are changed accordingly.
      
      A new Precise Distribution (PDist) facility is introduced, which
      further minimizes the skid when a precise event is programmed on GP
      counter 0. Enable the Precise Distribution (PDist) facility with the
      :ppp event. For this facility to work, the period must be initialized
      with a value larger than 127. Add spr_limit_period() to apply the limit
      for the :ppp event, as sketched below.
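
      A sketch of the limiter's shape, per the changelog (the exact upstream
      signature may differ):

        static u64 spr_limit_period(struct perf_event *event, u64 left)
        {
                /* PDist requires an initial period > 127 on GP counter 0;
                 * only :ppp (precise_ip == 3) events request PDist. */
                if (event->attr.precise_ip == 3)
                        return max_t(u64, left, 128);

                return left;
        }
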
      
      Two new data source fields, data block & address block, are added in the
      PEBS Memory Info Record for the load latency event. To enable the
      feature:
      - An auxiliary event has to be enabled together with the load latency
        event on Sapphire Rapids. A new flag PMU_FL_MEM_LOADS_AUX is
        introduced to indicate the case. A new event, mem-loads-aux, is
        exposed to sysfs for the user tool.
        Add a check in hw_config(). If the auxiliary event is not detected,
        return a unique error, -ENODATA.
      - The union perf_mem_data_src is extended to support the new fields.
      - Ice Lake and earlier models do not support block information, but the
        fields may be set by HW on some machines. Add pebs_no_block to
        explicitly indicate the previous platforms which don't support the new
        block fields. Accesses to the new block fields are ignored on those
        platforms.
      
      A new Store Latency facility is introduced, which leverages the PEBS
      facility to provide additional information about sampled stores. The
      additional information includes the data address, memory auxiliary info
      (e.g. data source, STLB miss) and the latency of the store access. To
      enable the facility, the new event (0x02cd) has to be programmed on GP
      counter 0. A new flag PERF_X86_EVENT_PEBS_STLAT is introduced to
      indicate the event. store_latency_data() is introduced to parse the
      memory auxiliary info.
      
      The layout of the access latency field of the PEBS Memory Info Record
      has been changed. Two latencies are now recorded: instruction latency
      (bits 15:0) and cache access latency (bits 47:32).
      - The cache access latency is similar to the previous memory access
        latency. For loads, the latency spans from the actual cache access
        until the data is returned by the memory subsystem.
        For stores, the latency spans from when the demand write accesses the
        L1 data cache until the cacheline write is completed in the memory
        subsystem.
        The cache access latency is stored in the low 32 bits of the sample
        type PERF_SAMPLE_WEIGHT_STRUCT.
      - The instruction latency spans from the dispatch of the load operation
        for execution until completion of the instruction it belongs to.
        Add a new flag PMU_FL_INSTR_LATENCY to indicate the instruction
        latency support. The instruction latency is stored in bits 47:32
        of the sample type PERF_SAMPLE_WEIGHT_STRUCT.
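
      A self-contained sketch of decoding the two latencies from
      PERF_SAMPLE_WEIGHT_STRUCT, mirroring the little-endian layout of
      union perf_sample_weight in the perf UAPI:

        #include <stdint.h>
        #include <stdio.h>

        union sample_weight {
                uint64_t full;
                struct {
                        uint32_t var1_dw;   /* cache access latency (bits 31:0) */
                        uint16_t var2_w;    /* instruction latency (bits 47:32) */
                        uint16_t var3_w;    /* reserved on SPR */
                };
        };

        int main(void)
        {
                /* Example: instruction latency 54, cache access latency 21. */
                union sample_weight w = { .full = (54ULL << 32) | 21 };

                printf("cache access latency: %u cycles\n", w.var1_dw);
                printf("instruction latency:  %u cycles\n", w.var2_w);
                return 0;
        }
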
      
      The PERF_METRICS MSR is extended to feature TMA method level 2 metrics.
      The lower half of the register is the TMA level 1 metrics (legacy). The
      upper half is also divided into four 8-bit fields for the new level 2
      metrics. Expose all eight Topdown metrics events to user space.
      
      The full description of the SPR features can be found in the Intel
      Architecture Instruction Set Extensions and Future Features
      Programming Reference, 319433-041.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-5-git-send-email-kan.liang@linux.intel.com
      61b985e3
    • perf/x86/intel: Filter unsupported Topdown metrics event · 1ab5f235
      Committed by Kan Liang
      The Intel Sapphire Rapids server will introduce 8 metrics events, while
      Intel Ice Lake only supports 4. A perf tool user may mistakenly use the
      unsupported events via the RAW format on Ice Lake. The user can still
      get a value from an unsupported Topdown metrics event once the
      following Sapphire Rapids enabling patch is applied.
      
      To enable the 8 metrics events on Intel Sapphire Rapids,
      INTEL_TD_METRIC_MAX has to be updated, which impacts is_metric_event().
      Since is_metric_event() is a generic function, on Ice Lake the newly
      added SPR metrics events would be mistakenly accepted as metric events
      on creation. At runtime, the unsupported Topdown metrics events would
      then be updated.
      
      Add a variable, num_topdown_events, in x86_pmu to indicate the number
      of Topdown metrics events available on the platform. Apply the number
      in is_metric_event() so that only the supported Topdown metrics events
      are created as metric events (see the sketch below).
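
      A sketch of the bounded check; the 0x100 stride between Topdown metric
      pseudo event codes follows the existing encoding, but treat the exact
      macro arithmetic as an assumption:

        static inline bool is_metric_event(struct perf_event *event)
        {
                u64 config = event->attr.config & INTEL_ARCH_EVENT_MASK;

                /* Metric events occupy consecutive pseudo codes starting at
                 * INTEL_TD_METRIC_RETIRING; accept only as many as the
                 * platform enumerates. */
                return config >= INTEL_TD_METRIC_RETIRING &&
                       config <  INTEL_TD_METRIC_RETIRING +
                                 x86_pmu.num_topdown_events * 0x100;
        }
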
      
      Apply num_topdown_events in icl_update_topdown_event() as well; the
      function can be reused by the following patch.
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1611873611-156687-4-git-send-email-kan.liang@linux.intel.com
      1ab5f235