1. 18 Aug 2020, 3 commits
    • perf/x86/intel: Generic support for hardware TopDown metrics · 7b2c05a1
      Kan Liang committed
      Intro
      =====
      
      The TopDown Microarchitecture Analysis (TMA) Method is a structured
      analysis methodology to identify critical performance bottlenecks in
      out-of-order processors. The perf tool already supports the method.
      
      The method works well, but there is one problem: collecting the TopDown
      events requires several GP counters. If a user wants to collect other
      events at the same time, multiplexing is likely to be triggered, which
      hurts accuracy.
      
      To free up the scarce GP counters, the hardware TopDown metrics feature
      was introduced with Ice Lake. The hardware implements an additional
      "metrics" register and a new Fixed Counter 3 that counts pipeline
      "slots". The TopDown events can be calculated from these instead.
      
      Events
      ======
      
      The level 1 TopDown has four metrics. There is no event-code assigned to
      the TopDown metrics. Four metric events are exported as separate perf
      events, which map to the internal "metrics" counter register. Those
      events do not exist in hardware, but can be allocated by the scheduler.
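      To illustrate how a metric event's count could be derived from the slots
      count and the metrics register, here is a minimal sketch. It assumes a
      register layout that this commit message does not spell out: each
      level-1 metric is an 8-bit fraction of 0xff, with Retiring in bits 7:0,
      Bad Speculation in 15:8, Frontend Bound in 23:16 and Backend Bound in
      31:24. The enum and topdown_metric_slots() are hypothetical names.

```c
#include <stdint.h>

/* Assumed byte position of each level-1 metric in the metrics register. */
enum topdown_metric {
	TD_RETIRING = 0,
	TD_BAD_SPEC = 1,
	TD_FE_BOUND = 2,
	TD_BE_BOUND = 3,
};

/*
 * Hypothetical helper: scale one 8-bit metric fraction by the slots count.
 *   slots_in_metric = (field / 0xff) * slots
 * Multiplying before dividing keeps precision; since field <= 0xff, the
 * product cannot overflow while slots < 2^56.
 */
static uint64_t topdown_metric_slots(uint64_t metrics, uint64_t slots,
				     enum topdown_metric m)
{
	uint64_t field = (metrics >> (8 * (unsigned int)m)) & 0xff;

	return slots * field / 0xff;
}
```

      Because the four fractions are assumed to sum to 0xff, the four results
      add up to roughly the slots value, up to rounding.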
      
      For the event mapping, a special 0x00 event code is used, which is
      reserved for fake events. The metric events start from umask 0x10.
      
      Metric events are set up to point to Fixed Counter 3, so they need
      special handling:
      - Add the update_topdown_event() callback to read the additional metrics
        MSR and generate the metrics.
      - Add the set_topdown_event_period() callback to initialize metrics MSR
        and the fixed counter 3.
      - Add a variable n_metric_event to track the number of accepted
        metric events. Sharing the same metric among multiple users without
        multiplexing is not allowed.
      - Only enable/disable Fixed Counter 3 when there are no other active
        TopDown events, which avoids unnecessary writes to the fixed
        control register.
      - Disable the PMU when reading a metrics event. The metrics MSR and
        Fixed Counter 3 are read separately, and an NMI could modify the
        values in between.
      
      None of the four metric events supports sampling. Since they need
      special handling during event update, a flag PERF_X86_EVENT_TOPDOWN is
      introduced to mark them.
      
      The slots event supports both sampling and counting.
      For counting, the flag is also applied.
      For sampling, it is handled like any other event.
      
      Groups
      ======
      
      The slots event is required in a TopDown group.
      To avoid reading the METRICS register multiple times, the metrics and
      slots values can only be updated by the slots event in a group, and
      all active slots and metrics events are updated at once.
      Therefore, the slots event must come before any metric events in a
      TopDown group.
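      The ordering rule can be expressed as a simple scan over the group
      members. This is a sketch, not the kernel's validation code: the
      td_kind classification and topdown_group_ok() are hypothetical, and a
      group with no metric events is simply accepted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the members of a perf event group. */
enum td_kind { TD_OTHER, TD_SLOTS, TD_METRIC };

/*
 * Return true if every metric event in the group is preceded by the slots
 * event. A metric event with no slots event before it fails the check.
 */
static bool topdown_group_ok(const enum td_kind *members, size_t n)
{
	bool seen_slots = false;

	for (size_t i = 0; i < n; i++) {
		if (members[i] == TD_SLOTS)
			seen_slots = true;
		else if (members[i] == TD_METRIC && !seen_slots)
			return false;	/* metric before (or without) slots */
	}
	return true;
}
```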
      
      NMI
      ======
      
      The METRICS-related registers may overflow, in which case bit 48 of the
      STATUS register is set and PERF_METRICS and Fixed Counter 3 must be
      reset. The patch also updates all active slots and metrics events in
      the NMI handler.
      
      update_topdown_event() has to read two registers separately, and an NMI
      may modify the values in between, so the PMU must be disabled before
      calling the function.
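      The overflow handling described above could look roughly like the
      sketch below. Plain struct fields stand in for the real MSRs, and the
      struct and function names are hypothetical; only bit 48 comes from the
      text. The sketch assumes the PMU is already disabled, as the text
      requires before calling update_topdown_event().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 48 of the global status register signals a metrics overflow. */
#define GLOBAL_STATUS_PERF_METRICS_OVF_BIT 48

/* Hypothetical stand-ins for the real per-CPU registers and event state. */
struct td_state {
	uint64_t global_status;	/* global overflow status */
	uint64_t perf_metrics;	/* metrics register */
	uint64_t fixed_ctr3;	/* Fixed Counter 3 (slots) */
	unsigned int updates;	/* times the TopDown events were updated */
};

/* Fold the current register values into the active events (stubbed). */
static void update_topdown_events(struct td_state *s)
{
	s->updates++;
}

/* Returns true if a metrics overflow was found and handled. */
static bool handle_metrics_overflow(struct td_state *s)
{
	uint64_t mask = 1ULL << GLOBAL_STATUS_PERF_METRICS_OVF_BIT;

	if (!(s->global_status & mask))
		return false;

	s->global_status &= ~mask;	/* acknowledge the overflow */
	update_topdown_events(s);	/* save counts before they are lost */
	s->perf_metrics = 0;		/* both registers must be reset */
	s->fixed_ctr3 = 0;
	return true;
}
```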
      
      RDPMC
      ======
      
      RDPMC is temporarily disabled. A later patch will enable it.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-9-kan.liang@linux.intel.com
    • perf/x86/intel: Use switch in intel_pmu_disable/enable_event · 58da7dbe
      Kan Liang committed
      Currently, intel_pmu_disable/enable_event uses an if-else chain to
      check the type of an event. It works well, but as more types are added
      later, e.g., perf metrics, an if-else chain becomes harder to read than
      a switch statement.
      
      There is no harm in using a switch statement instead of the if-else
      here. Also, some optimizing compilers may compile a switch statement
      into a jump table, which is more efficient than if-else for a large
      number of cases. No performance gain is expected yet, because there are
      only 5 cases, but the benefit may show up as more types are added in
      the future.
      
      Use switch to replace the if-else in the intel_pmu_disable/enable_event.
      
      If the idx is invalid, print a warning.
      
      For the INTEL_PMC_IDX_FIXED_BTS case in intel_pmu_disable_event, there
      is no need to check event->attr.precise_ip; just return.
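      The resulting dispatch might look like the sketch below. The index
      values are illustrative stand-ins, not the kernel's definitions, and
      the case ranges use the GNU C "case low ... high:" extension (supported
      by GCC and Clang) that kernel code commonly relies on.

```c
#include <stdio.h>

/* Illustrative index layout only; the kernel defines its own values. */
#define IDX_FIXED	32	/* first fixed counter */
#define IDX_FIXED_BTS	47	/* pseudo index used for BTS */

enum disable_action { ACT_GP, ACT_FIXED, ACT_BTS, ACT_INVALID };

/* Pick the disable path for a counter index with a single switch. */
static enum disable_action disable_path(int idx)
{
	switch (idx) {
	case 0 ... IDX_FIXED - 1:		/* general-purpose counters */
		return ACT_GP;
	case IDX_FIXED ... IDX_FIXED_BTS - 1:	/* fixed counters */
		return ACT_FIXED;
	case IDX_FIXED_BTS:			/* no precise_ip check needed */
		return ACT_BTS;
	default:				/* invalid index: warn */
		fprintf(stderr, "invalid counter index %d\n", idx);
		return ACT_INVALID;
	}
}
```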
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-7-kan.liang@linux.intel.com
    • perf/x86/intel: Name the global status bit in NMI handler · 60a2a271
      Kan Liang committed
      The current NMI handler uses magic numbers for the global status bits.
      Replace them with meaningful names to improve the readability of the
      code.
      
      Remove one tab from all GLOBAL_STATUS_* and INTEL_PMC_IDX_FIXED_BTS
      macro definitions to shorten the lines.
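      The idea of replacing magic bit numbers with names can be sketched as
      follows. The bit values (55 for the Intel PT ToPA PMI, 58 for frozen
      LBRs, 62 for the DS buffer overflow) are my reading of the kernel
      headers and the Intel SDM, not something this message states; treat
      them and the helper names as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Named global status bits; the values are assumptions (see above). */
#define GLOBAL_STATUS_TRACE_TOPAPMI_BIT	55
#define GLOBAL_STATUS_LBRS_FROZEN_BIT	58
#define GLOBAL_STATUS_BUFFER_OVF_BIT	62

/* Test one bit of the status word and clear it, returning its old value. */
static bool status_test_and_clear(int bit, uint64_t *status)
{
	uint64_t mask = 1ULL << bit;
	bool was_set = (*status & mask) != 0;

	*status &= ~mask;
	return was_set;
}

/* Before: status_test_and_clear(62, status); now the bit is named. */
static bool ds_buffer_overflowed(uint64_t *status)
{
	return status_test_and_clear(GLOBAL_STATUS_BUFFER_OVF_BIT, status);
}
```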
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200723171117.9918-3-kan.liang@linux.intel.com
  2. 08 Jul 2020, 4 commits
    • perf/x86/intel/lbr: Support Architectural LBR · 47125db2
      Kan Liang committed
      Last Branch Records (LBR) enable recording of software path history by
      logging taken branches and other control-flow transfers in
      architectural registers. Intel CPUs have had model-specific LBR for
      quite some time; this evolves it into an architectural feature.
      
      The main improvements implemented by Architectural LBR include:
      - Linux kernel can support the LBR features without knowing the model
        number of the current CPU.
      - Architectural LBR capabilities can be enumerated by CPUID. The
        lbr_ctl_map is based on the CPUID Enumeration.
      - The possible LBR depth can be retrieved from CPUID enumeration. The
        max value is written to the new MSR_ARCH_LBR_DEPTH as the number of
        LBR entries.
      - A new IA32_LBR_CTL MSR is introduced to enable and configure LBRs,
        which replaces the IA32_DEBUGCTL[bit 0] and the LBR_SELECT MSR.
      - Each LBR record or entry still consists of three MSRs,
        IA32_LBR_x_FROM_IP, IA32_LBR_x_TO_IP and IA32_LBR_x_INFO,
        but they are now architectural MSRs.
      - Architectural LBR is now stack-like. Entry 0 is always the youngest
        branch, entry 1 the next youngest, and so on. The TOS MSR has been
        removed.
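      The difference between the old ring-buffer layout and the new
      stack-like layout comes down to how the n-th youngest record is
      located. A minimal sketch, assuming a snapshot of the 'from' addresses
      has already been read and, for the legacy case, that the number of
      entries is a power of two; the struct and function names are
      hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the LBR 'from' addresses. */
struct lbr_snapshot {
	const uint64_t *from;
	size_t nr;	/* number of LBR entries */
	size_t tos;	/* top of stack; only meaningful for legacy LBR */
};

/*
 * Legacy (model-specific) LBR is a ring buffer: the youngest record sits at
 * TOS and older ones wrap backwards. Requires nr to be a power of two.
 */
static uint64_t legacy_nth_youngest(const struct lbr_snapshot *s, size_t n)
{
	return s->from[(s->tos - n) & (s->nr - 1)];
}

/* Architectural LBR is stack-like: entry n is the n-th youngest branch. */
static uint64_t arch_nth_youngest(const struct lbr_snapshot *s, size_t n)
{
	return s->from[n];
}
```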
      
      Architectural LBR is enabled and disabled much like model-specific LBR.
      __intel_pmu_lbr_enable/disable() can be reused, with some
      modifications:
      - MSR_ARCH_LBR_CTL is used to enable and configure the Architectural
        LBR.
      - When checking the value of the IA32_DEBUGCTL MSR, ignore
        DEBUGCTLMSR_LBR (bit 0) for Architectural LBR; it has no meaning
        and always reads as 0.
      - FREEZE_LBRS_ON_PMI has to be explicitly set/cleared, because
        MSR_IA32_DEBUGCTLMSR is not touched in __intel_pmu_lbr_disable() for
        Architectural LBR.
      - Only MSR_ARCH_LBR_CTL is cleared in __intel_pmu_lbr_disable() for
        Architectural LBR.
      
      Dedicated Architectural LBR functions are implemented to
      reset/read/save/restore LBR.
      - For reset, writing to the ARCH_LBR_DEPTH MSR clears all Arch LBR
        entries, which is much faster and can improve context-switch
        latency.
      - For read, the branch type information can be retrieved from
        MSR_ARCH_LBR_INFO_*, but it is not fully compatible because of the
        OTHER_BRANCH type, so software decoding is still required for the
        OTHER_BRANCH case.
        LBR records are stored in age order as well, so
        intel_pmu_store_lbr() is reused. Check the CPUID enumeration before
        accessing the corresponding bits in LBR_INFO.
      - For save/restore, apply the fast reset (writing ARCH_LBR_DEPTH).
        Read 'lbr_from' of entry 0 instead of the TOS MSR to check whether
        the LBR registers were reset in a deep C-state. If the 'deep C-state
        reset' bit is not set in CPUID enumeration, skip the check.
        XSAVE support for Architectural LBR will be implemented later.
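      The deep C-state check from the save/restore item reduces to a small
      decision. A minimal sketch; arch_lbr_reset_in_cstate() is a
      hypothetical name, and its two inputs stand in for the CPUID
      'deep C-state reset' bit and a read of entry 0's 'from' address.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the LBR registers were wiped in a deep C-state, so the
 * saved records must be restored. With no TOS MSR, entry 0 is checked
 * instead: an all-zero 'from' address is taken to mean the stack was reset.
 */
static bool arch_lbr_reset_in_cstate(bool deep_c_reset_enumerated,
				     uint64_t entry0_from)
{
	if (!deep_c_reset_enumerated)
		return false;	/* CPUID reports no such reset: skip the check */
	return entry0_from == 0;
}
```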
      
      The number of LBR entries can no longer be hardcoded; it must be
      retrieved from CPUID enumeration. A new structure,
      x86_perf_task_context_arch_lbr, is introduced for Architectural LBR.
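      The depth enumeration can be pictured as a bitmap of supported depths.
      A minimal sketch, assuming the encoding of the Architectural LBR CPUID
      leaf (0x1C) as I understand the Intel SDM: an 8-bit mask in which bit n
      means a depth of 8 * (n + 1) entries is supported. The largest
      supported value is what the text describes writing to
      MSR_ARCH_LBR_DEPTH. arch_lbr_max_depth() is a hypothetical name.

```c
#include <stdint.h>

/*
 * Assumed encoding: bit n of the 8-bit depth mask means an LBR depth of
 * 8 * (n + 1) entries is supported. Return the largest supported depth.
 */
static unsigned int arch_lbr_max_depth(uint8_t depth_mask)
{
	unsigned int depth = 0;

	for (unsigned int n = 0; n < 8; n++) {
		if (depth_mask & (1u << n))
			depth = 8 * (n + 1);
	}
	return depth;	/* 0: no supported depth reported */
}
```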
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-15-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add the function pointers for LBR save and restore · 799571bf
      Kan Liang committed
      The MSRs of Architectural LBR differ from those of the previous
      model-specific LBR, so perf has to implement different functions to
      save and restore them.
      
      The function pointers for LBR save and restore are introduced. Perf
      should initialize the corresponding functions at boot time.
      
      The generic optimizations, e.g. avoiding restoring the LBRs if no one
      else touched them, still apply to Architectural LBR. The related code
      is not moved into the model-specific functions.
      
      Current model-specific LBR functions are set as default.
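      The boot-time selection pattern can be sketched as an ops table with
      model-specific defaults that is overridden once when Architectural LBR
      is detected. All names here are hypothetical; the shared "skip the
      restore if untouched" check stays outside the table, matching the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table; names are illustrative, not the kernel's. */
struct lbr_ops {
	void (*save)(void *ctx);
	void (*restore)(void *ctx);
};

static int last_restore;	/* 1 = model-specific, 2 = arch (demo only) */

static void model_lbr_save(void *ctx)    { (void)ctx; }
static void model_lbr_restore(void *ctx) { (void)ctx; last_restore = 1; }
static void arch_lbr_save(void *ctx)     { (void)ctx; }
static void arch_lbr_restore(void *ctx)  { (void)ctx; last_restore = 2; }

/* The current model-specific functions are the defaults. */
static struct lbr_ops lbr_ops = {
	.save    = model_lbr_save,
	.restore = model_lbr_restore,
};

/* Chosen once at boot, so the hot path never re-checks the LBR flavor. */
static void lbr_ops_init(bool has_arch_lbr)
{
	if (has_arch_lbr) {
		lbr_ops.save    = arch_lbr_save;
		lbr_ops.restore = arch_lbr_restore;
	}
}

/* Generic logic stays shared; only the MSR access goes through the table. */
static void lbr_sched_in(void *ctx, bool untouched_since_save)
{
	if (untouched_since_save)
		return;	/* generic optimization: nothing to restore */
	lbr_ops.restore(ctx);
}
```

      The same shape would apply to the read and reset pointers introduced
      by the following two commits.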
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-5-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add a function pointer for LBR read · c301b1d8
      Kan Liang committed
      The method to read Architectural LBRs differs from that of the previous
      model-specific LBR, so perf has to implement a different function.
      
      A function pointer for LBR read is introduced. Perf should initialize
      the corresponding function at boot time, and avoid checking lbr_format
      at run time.
      
      The current 64-bit LBR read function is set as default.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-4-git-send-email-kan.liang@linux.intel.com
    • perf/x86/intel/lbr: Add a function pointer for LBR reset · 9f354a72
      Kan Liang committed
      The method to reset Architectural LBRs differs from that of the previous
      model-specific LBR, so perf has to implement a different function.
      
      A function pointer is introduced for LBR reset. The enum of
      LBR_FORMAT_* is also moved to perf_event.h. Perf should initialize the
      corresponding functions at boot time, and avoid checking lbr_format at
      run time.
      
      The current 64-bit LBR reset function is set as default.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/1593780569-62993-3-git-send-email-kan.liang@linux.intel.com
  3. 02 Jul 2020, 3 commits
  4. 20 May 2020, 1 commit
  5. 11 Feb 2020, 2 commits
  6. 15 Nov 2019, 1 commit
    • x86: retpolines: eliminate retpoline from msr event handlers · 74c504a6
      Andrea Arcangeli committed
      It's enough to check the value and issue the direct call.
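      The technique, comparing the value first and calling the hot handlers
      directly so that only the remaining cases pay for an indirect call
      (and its retpoline), can be sketched like this. The handler table, the
      exit reasons and every function name are hypothetical; the message
      does not show the code it changes.

```c
/* Hypothetical handler type and handlers standing in for MSR exit handlers. */
typedef int (*exit_handler_t)(int arg);

static int handle_rdmsr(int arg) { return arg + 1; }
static int handle_wrmsr(int arg) { return arg + 2; }
static int handle_other(int arg) { return -arg; }

enum { EXIT_RDMSR, EXIT_WRMSR, EXIT_OTHER, NR_EXITS };

static const exit_handler_t handlers[NR_EXITS] = {
	[EXIT_RDMSR] = handle_rdmsr,
	[EXIT_WRMSR] = handle_wrmsr,
	[EXIT_OTHER] = handle_other,
};

static int dispatch(int reason, int arg)
{
	/* Hot cases: check the value and issue a direct call. */
	if (reason == EXIT_WRMSR)
		return handle_wrmsr(arg);
	if (reason == EXIT_RDMSR)
		return handle_rdmsr(arg);

	/* Everything else still goes through the indirect call. */
	return handlers[reason](arg);
}
```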
      
      After this commit is applied, here are the most common retpolines
      executed under a high resolution timer workload in the guest on a VMX
      host:
      
      [..]
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 267
      @[]: 2256
      @[
          trace_retpoline+1
          __trace_retpoline+30
          __x86_indirect_thunk_rax+33
          __kvm_wait_lapic_expire+284
          vmx_vcpu_run.part.97+1091
          vcpu_enter_guest+377
          kvm_arch_vcpu_ioctl_run+261
          kvm_vcpu_ioctl+559
          do_vfs_ioctl+164
          ksys_ioctl+96
          __x64_sys_ioctl+22
          do_syscall_64+89
          entry_SYSCALL_64_after_hwframe+68
      ]: 2390
      @[]: 33410
      
      @total: 315707
      
      Note that the highest hit above is __delay, so it is probably not worth
      optimizing even if it were more frequent than 2k hits per second.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 28 Oct 2019, 1 commit
  8. 18 Oct 2019, 1 commit
    • perf_event: Add support for LSM and SELinux checks · da97e184
      Joel Fernandes (Google) committed
      In current mainline, the degree of access to the perf_event_open(2)
      system call depends on the perf_event_paranoid sysctl.  This has a
      number of limitations:
      
      1. The sysctl is only a single value. Many types of access are
         controlled by that single value, making the control very limited
         and coarse-grained.
      2. The sysctl is global, so changing it gives all processes access to
         perf_event_open(2), opening the door to security issues.
      
      This patch adds LSM and SELinux access checking which will be used in
      Android to access perf_event_open(2) for the purposes of attaching BPF
      programs to tracepoints, perf profiling and other operations from
      userspace. These operations are intended for production systems.
      
      5 new LSM hooks are added:
      1. perf_event_open: This controls access during the perf_event_open(2)
         syscall itself. The hook is called from all the places that the
         perf_event_paranoid sysctl is checked to keep it consistent with the
         sysctl. The hook gets passed a 'type' argument which controls CPU,
         kernel and tracepoint accesses (in this context, CPU, kernel and
         tracepoint have the same semantics as the perf_event_paranoid sysctl).
         Additionally, I added an 'open' type, which is similar to the
         perf_event_paranoid == 3 patch carried in Android and several other
         distros but rejected in mainline [1] in 2016.
      
      2. perf_event_alloc: This allocates a new security object for the event
         which stores the current SID within the event. It will be useful when
         the perf event's FD is passed through IPC to another process which may
         try to read the FD. Appropriate security checks will limit access.
      
      3. perf_event_free: Called when the event is closed.
      
      4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.
      
      5. perf_event_write: Called from the ioctl(2) syscalls for the event.
      
      [1] https://lwn.net/Articles/696240/
      
      Since Peter suggested LSM hooks in 2016 [1], I am adding his
      Suggested-by tag below.
      
      To use this patch, we set the perf_event_paranoid sysctl to -1 and then
      apply selinux checking as appropriate (default deny everything, and then
      add policy rules to give access to domains that need it). In the future
      we can remove the perf_event_paranoid sysctl altogether.
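      How the sysctl and the hook interact at a check site can be sketched as
      below: the paranoid level is consulted first, then the LSM gets the
      final say for the given access type. The "> 1" threshold for kernel
      access, the enum values and all names are assumptions for
      illustration, not the patch's code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Access types passed to the open hook, mirroring the list above. */
enum perf_sec_type { SEC_OPEN, SEC_CPU, SEC_KERNEL, SEC_TRACEPOINT };

/* Hypothetical policy callback standing in for an LSM such as SELinux. */
typedef int (*perf_open_hook_t)(enum perf_sec_type type);

/* -1 leaves the decision to the LSM, as the text suggests. */
static int perf_event_paranoid = -1;

/* Illustrative kernel-access check: sysctl first, then the LSM hook. */
static int allow_kernel_access(bool privileged, perf_open_hook_t hook)
{
	if (perf_event_paranoid > 1 && !privileged)
		return -EACCES;
	return hook ? hook(SEC_KERNEL) : 0;
}

/* Example policy: deny kernel profiling, allow everything else. */
static int deny_kernel_policy(enum perf_sec_type type)
{
	return type == SEC_KERNEL ? -EACCES : 0;
}
```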
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Co-developed-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: James Morris <jmorris@namei.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: rostedt@goodmis.org
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: jeffv@google.com
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: primiano@google.com
      Cc: Song Liu <songliubraving@fb.com>
      Cc: rsavitski@google.com
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Matthew Garrett <matthewgarrett@google.com>
      Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
  9. 12 Oct 2019, 2 commits
  10. 30 Aug 2019, 1 commit
    • perf/x86/intel: Restrict period on Nehalem · 44d3bbb6
      Josh Hunt committed
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      ...
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      ...
      Call Trace:
      <NMI>
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      perf_event_nmi_handler+0x2e/0x50
      nmi_handle+0x6e/0x120
      default_do_nmi+0x3e/0x100
      do_nmi+0x102/0x160
      end_repeat_nmi+0x16/0x50
      ...
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      </NMI>
      intel_pmu_enable_event+0x1ce/0x1f0
      x86_pmu_start+0x78/0xa0
      x86_pmu_enable+0x252/0x310
      __perf_event_task_sched_in+0x181/0x190
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      finish_task_switch+0x158/0x260
      __schedule+0x2f6/0x840
      ? hrtimer_start_range_ns+0x153/0x210
      schedule+0x32/0x80
      schedule_hrtimeout_range_clock+0x8a/0x100
      ? hrtimer_init+0x120/0x120
      ep_poll+0x2f7/0x3a0
      ? wake_up_q+0x60/0x60
      do_epoll_wait+0xa9/0xc0
      __x64_sys_epoll_wait+0x1a/0x20
      do_syscall_64+0x4e/0x110
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fdeb1e96c03
      ...
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
  11. 28 Aug 2019, 5 commits
  12. 26 Jul 2019, 1 commit
    • perf/x86/intel: Mark expected switch fall-throughs · 7b26b91d
      Gustavo A. R. Silva committed
      In preparation for enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
      arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
         pmem = true;
         ~~~~~^~~~~~
      arch/x86/events/intel/core.c:4960:2: note: here
        case INTEL_FAM6_SKYLAKE_MOBILE:
        ^~~~
      arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
         pmem = true;
         ~~~~~^~~~~~
      arch/x86/events/intel/core.c:5009:2: note: here
        case INTEL_FAM6_ICELAKE_MOBILE:
        ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
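      A marked fall-through looks like the sketch below: the comment tells
      GCC, at -Wimplicit-fallthrough=3, that dropping into the next case is
      intended, mirroring the pmem = true; case pattern in the warnings
      above. The model names and init_model() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model IDs standing in for the INTEL_FAM6_* constants. */
enum cpu_model { MODEL_SERVER, MODEL_MOBILE, MODEL_OTHER };

/* The server model sets an extra flag, then shares the mobile setup path. */
static int init_model(enum cpu_model m, bool *pmem)
{
	*pmem = false;

	switch (m) {
	case MODEL_SERVER:
		*pmem = true;
		/* fall through */
	case MODEL_MOBILE:
		return 1;	/* shared setup */
	default:
		return 0;
	}
}
```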
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
  13. 25 Jul 2019, 3 commits
  14. 13 Jul 2019, 1 commit
    • perf/x86/intel: Fix spurious NMI on fixed counter · e4557c1a
      Kan Liang committed
      If a user first samples a PEBS event on a fixed counter and then samples
      a non-PEBS event on the same fixed counter on Icelake, spurious NMIs are
      triggered. For example:
      
        perf record -e 'cycles:p' -a
        perf record -e 'cycles' -a
      
      The error message for spurious NMI:
      
        [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
        [    +0.000000] Do you have a strange power saving mode enabled?
        [    +0.000000] Dazed and confused, but trying to continue
      
      The bug was introduced by the following commit:
      
        commit 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      
      That commit moved intel_pmu_pebs_disable() after intel_pmu_disable_fixed(),
      which returns immediately, so the related bit of the PEBS_ENABLE MSR is never
      cleared for the fixed counter. When a non-PEBS event then runs on the fixed
      counter, the PEBS_ENABLE bit is still set, which triggers spurious NMIs.
      
      Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
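      The fix can be sketched with plain variables standing in for the fixed
      control and PEBS_ENABLE MSRs. Assumptions not stated in this message:
      each fixed counter owns a 4-bit field in the fixed control register,
      and a fixed counter's PEBS_ENABLE bit sits at the counter's index (32
      and up). All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the fixed control and PEBS_ENABLE MSRs. */
struct pmu_regs {
	uint64_t fixed_ctrl;
	uint64_t pebs_enable;
};

#define FIXED_BASE 32	/* assumed index of the first fixed counter */

/*
 * Disable an event on a fixed counter. The buggy path returned right after
 * clearing the fixed control bits, so the PEBS bit stayed set; the fix also
 * clears the PEBS bit after disabling the fixed counter.
 */
static void disable_fixed_event(struct pmu_regs *r, int idx, int is_pebs)
{
	int fixed = idx - FIXED_BASE;

	r->fixed_ctrl &= ~(0xfULL << (fixed * 4));	/* 4 control bits each */
	if (is_pebs)
		r->pebs_enable &= ~(1ULL << idx);	/* the previously skipped step */
}
```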
      Reported-by: Yi, Ammy <ammy.yi@intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 17 Jun 2019, 3 commits
  16. 03 Jun 2019, 5 commits
  17. 21 May 2019, 1 commit
  18. 14 May 2019, 1 commit
  19. 05 May 2019, 1 commit
    • perf/x86/intel: Fix race in intel_pmu_disable_event() · 6f55967a
      Jiri Olsa committed
      A new race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() of cpuc->active_mask with separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes a panic for PEBS events with callchains enabled:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to make samples from the PEBS drain code,
      but when it's disabled, we go through the NMI path instead,
      where data->callchain does not get allocated and we crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fix this by disabling the event itself before turning off the
      PEBS bit.
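      The essence of the fix is an ordering change, which the sketch below
      records as a trace of steps. It is a hypothetical model, not the
      patched function: stopping the counter first means it can no longer
      overflow and raise a PMI while its PEBS bit is being cleared.

```c
#include <string.h>

/* Records the order of the disable steps (illustration only). */
static char trace[64];

static void step(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

/*
 * Hypothetical disable path in the fixed order: stop the counter first,
 * then clear its PEBS bit. The racy order cleared the PEBS bit first,
 * leaving a window in which an overflow of a PEBS-configured event took
 * the regular NMI path.
 */
static void disable_event_sketch(int is_pebs)
{
	step("disable_counter");
	if (is_pebs)
		step("pebs_disable");
}
```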
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>