1. 12 10月, 2019 2 次提交
  2. 30 8月, 2019 1 次提交
    • J
      perf/x86/intel: Restrict period on Nehalem · 44d3bbb6
      Josh Hunt 提交于
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      ...
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      ...
      Call Trace:
      <NMI>
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      perf_event_nmi_handler+0x2e/0x50
      nmi_handle+0x6e/0x120
      default_do_nmi+0x3e/0x100
      do_nmi+0x102/0x160
      end_repeat_nmi+0x16/0x50
      ...
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      </NMI>
      intel_pmu_enable_event+0x1ce/0x1f0
      x86_pmu_start+0x78/0xa0
      x86_pmu_enable+0x252/0x310
      __perf_event_task_sched_in+0x181/0x190
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      finish_task_switch+0x158/0x260
      __schedule+0x2f6/0x840
      ? hrtimer_start_range_ns+0x153/0x210
      schedule+0x32/0x80
      schedule_hrtimeout_range_clock+0x8a/0x100
      ? hrtimer_init+0x120/0x120
      ep_poll+0x2f7/0x3a0
      ? wake_up_q+0x60/0x60
      do_epoll_wait+0xa9/0xc0
      __x64_sys_epoll_wait+0x1a/0x20
      do_syscall_64+0x4e/0x110
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fdeb1e96c03
      ...
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
      44d3bbb6
  3. 28 8月, 2019 5 次提交
  4. 26 7月, 2019 1 次提交
    • G
      perf/x86/intel: Mark expected switch fall-throughs · 7b26b91d
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      arch/x86/events/intel/core.c: In function ‘intel_pmu_init’:
      arch/x86/events/intel/core.c:4959:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
         pmem = true;
         ~~~~~^~~~~~
      arch/x86/events/intel/core.c:4960:2: note: here
        case INTEL_FAM6_SKYLAKE_MOBILE:
        ^~~~
      arch/x86/events/intel/core.c:5008:8: warning: this statement may fall through [-Wimplicit-fallthrough=]
         pmem = true;
         ~~~~~^~~~~~
      arch/x86/events/intel/core.c:5009:2: note: here
        case INTEL_FAM6_ICELAKE_MOBILE:
        ^~~~
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing efforts to enable
      -Wimplicit-fallthrough.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      7b26b91d
  5. 25 7月, 2019 3 次提交
  6. 13 7月, 2019 1 次提交
    • K
      perf/x86/intel: Fix spurious NMI on fixed counter · e4557c1a
      Kan Liang 提交于
      If a user first sample a PEBS event on a fixed counter, then sample a
      non-PEBS event on the same fixed counter on Icelake, it will trigger
      spurious NMI. For example:
      
        perf record -e 'cycles:p' -a
        perf record -e 'cycles' -a
      
      The error message for spurious NMI:
      
        [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
        [    +0.000000] Do you have a strange power saving mode enabled?
        [    +0.000000] Dazed and confused, but trying to continue
      
      The bug was introduced by the following commit:
      
        commit 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      
      The commit moves the intel_pmu_pebs_disable() after intel_pmu_disable_fixed(),
      which returns immediately.  The related bit of PEBS_ENABLE MSR will never be
      cleared for the fixed counter. Then a non-PEBS event runs on the fixed counter,
      but the bit on PEBS_ENABLE is still set, which triggers spurious NMIs.
      
      Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
      Reported-by: NYi, Ammy <ammy.yi@intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 6f55967a ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4557c1a
  7. 17 6月, 2019 3 次提交
  8. 03 6月, 2019 5 次提交
  9. 21 5月, 2019 1 次提交
  10. 14 5月, 2019 1 次提交
  11. 05 5月, 2019 1 次提交
    • J
      perf/x86/intel: Fix race in intel_pmu_disable_event() · 6f55967a
      Jiri Olsa 提交于
      New race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() of cpuc->active_mask by separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes panic for PEBS events with enabled callchains:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to make samples from PEBS drain code,
      but when it's disabled, we'll go through NMI path instead,
      where data->callchain will not get allocated and we'll crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fixing this by disabling the event itself before setting
      off the PEBS bit.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6f55967a
  12. 01 5月, 2019 1 次提交
  13. 16 4月, 2019 6 次提交
    • K
      perf/x86/intel: Add Tremont core PMU support · 6daeb873
      Kan Liang 提交于
      Add perf core PMU support for Intel Tremont CPU.
      
      The init code is based on Goldmont plus.
      
      The generic purpose counter 0 and fixed counter 0 have less skid.
      Force :ppp events on generic purpose counter 0.
      Force instruction:ppp on generic purpose counter 0 and fixed counter 0.
      
      Updates LLC cache event table and OFFCORE_RESPONSE mask.
      
      Adaptive PEBS, which is already enabled on ICL, is also supported
      on Tremont. No extra code required.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/1554922629-126287-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6daeb873
    • K
      perf/x86/intel: Add Icelake support · 60176089
      Kan Liang 提交于
      Add Icelake core PMU perf code, including constraint tables and the main
      enable code.
      
      Icelake expanded the generic counters to always 8 even with HT on, but a
      range of events cannot be scheduled on the extra 4 counters.
      Add new constraint ranges to describe this to the scheduler.
      The number of constraints that need to be checked is larger now than
      with earlier CPUs.
      At some point we may need a new data structure to look them up more
      efficiently than with linear search. So far it still seems to be
      acceptable however.
      
      Icelake added a new fixed counter SLOTS. Full support for it is added
      later in the patch series.
      
      The cache events table is identical to Skylake.
      
      Compare to PEBS instruction event on generic counter, fixed counter 0
      has less skid. Force instruction:ppp always in fixed counter 0.
      Originally-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-9-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      60176089
    • P
      perf/x86: Support constraint ranges · 63b79f6e
      Peter Zijlstra 提交于
      Icelake extended the general counters to 8, even when SMT is enabled.
      However only a (large) subset of the events can be used on all 8
      counters.
      
      The events that can or cannot be used on all counters are organized
      in ranges.
      
      A lot of scheduler constraints are required to handle all this.
      
      To avoid blowing up the tables add event code ranges to the constraint
      tables, and a new inline function to match them.
      Originally-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # developer hat on
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # maintainer hat on
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-8-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63b79f6e
    • K
      perf/x86/intel: Support adaptive PEBS v4 · c22497f5
      Kan Liang 提交于
      Adaptive PEBS is a new way to report PEBS sampling information. Instead
      of a fixed size record for all PEBS events it allows to configure the
      PEBS record to only include the information needed. Events can then opt
      in to use such an extended record, or stay with a basic record which
      only contains the IP.
      
      The major new feature is to support LBRs in PEBS record.
      Besides normal LBR, this allows (much faster) large PEBS, while still
      supporting callstacks through callstack LBR. So essentially a lot of
      profiling can now be done without frequent interrupts, dropping the
      overhead significantly.
      
      The main requirement still is to use a period, and not use frequency
      mode, because frequency mode requires reevaluating the frequency on each
      overflow.
      
      The floating point state (XMM) is also supported, which allows efficient
      profiling of FP function arguments.
      
      Introduce specific drain function to handle variable length records.
      Use a new callback to parse the new record format, and also handle the
      STATUS field now being at a different offset.
      
      Add code to set up the configuration register. Since there is only a
      single register, all events either get the full super set of all events,
      or only the basic record.
      Originally-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: https://lkml.kernel.org/r/20190402194509.2832-6-kan.liang@linux.intel.com
      [ Renamed GPRS => GP. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c22497f5
    • S
      perf/x86/intel: Force resched when TFA sysctl is modified · f447e4eb
      Stephane Eranian 提交于
      This patch provides guarantee to the sysadmin that when TFA is disabled, no PMU
      event is using PMC3 when the echo command returns. Vice-Versa, when TFA
      is enabled, PMU can use PMC3 immediately (to eliminate possible multiplexing).
      
        $ perf stat -a -I 1000 --no-merge -e branches,branches,branches,branches
           1.000123979    125,768,725,208      branches
           1.000562520    125,631,000,456      branches
           1.000942898    125,487,114,291      branches
           1.001333316    125,323,363,620      branches
           2.004721306    125,514,968,546      branches
           2.005114560    125,511,110,861      branches
           2.005482722    125,510,132,724      branches
           2.005851245    125,508,967,086      branches
           3.006323475    125,166,570,648      branches
           3.006709247    125,165,650,056      branches
           3.007086605    125,164,639,142      branches
           3.007459298    125,164,402,912      branches
           4.007922698    125,045,577,140      branches
           4.008310775    125,046,804,324      branches
           4.008670814    125,048,265,111      branches
           4.009039251    125,048,677,611      branches
           5.009503373    125,122,240,217      branches
           5.009897067    125,122,450,517      branches
      
      Then on another connection, sysadmin does:
      
        $ echo  1 >/sys/devices/cpu/allow_tsx_force_abort
      
      Then perf stat adjusts the events immediately:
      
           5.010286029    125,121,393,483      branches
           5.010646308    125,120,556,786      branches
           6.011113588    124,963,351,832      branches
           6.011510331    124,964,267,566      branches
           6.011889913    124,964,829,130      branches
           6.012262996    124,965,841,156      branches
           7.012708299    124,419,832,234      branches [79.69%]
           7.012847908    124,416,363,853      branches [79.73%]
           7.013225462    124,400,723,712      branches [79.73%]
           7.013598191    124,376,154,434      branches [79.70%]
           8.014089834    124,250,862,693      branches [74.98%]
           8.014481363    124,267,539,139      branches [74.94%]
           8.014856006    124,259,519,786      branches [74.98%]
           8.014980848    124,225,457,969      branches [75.04%]
           9.015464576    124,204,235,423      branches [75.03%]
           9.015858587    124,204,988,490      branches [75.04%]
           9.016243680    124,220,092,486      branches [74.99%]
           9.016620104    124,231,260,146      branches [74.94%]
      
      And vice-versa if the syadmin does:
      
        $ echo  0 >/sys/devices/cpu/allow_tsx_force_abort
      
      Events are again spread over the 4 counters:
      
          10.017096277    124,276,230,565      branches [74.96%]
          10.017237209    124,228,062,171      branches [75.03%]
          10.017478637    124,178,780,626      branches [75.03%]
          10.017853402    124,198,316,177      branches [75.03%]
          11.018334423    124,602,418,933      branches [85.40%]
          11.018722584    124,602,921,320      branches [85.42%]
          11.019095621    124,603,956,093      branches [85.42%]
          11.019467742    124,595,273,783      branches [85.42%]
          12.019945736    125,110,114,864      branches
          12.020330764    125,109,334,472      branches
          12.020688740    125,109,818,865      branches
          12.021054020    125,108,594,014      branches
          13.021516774    125,109,164,018      branches
          13.021903640    125,108,794,510      branches
          13.022270770    125,107,756,978      branches
          13.022630819    125,109,380,471      branches
          14.023114989    125,133,140,817      branches
          14.023501880    125,133,785,858      branches
          14.023868339    125,133,852,700      branches
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Cc: nelson.dsouza@intel.com
      Cc: tonyj@suse.com
      Link: https://lkml.kernel.org/r/20190408173252.37932-3-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f447e4eb
    • K
      perf/x86: Fix incorrect PEBS_REGS · 9d5dcc93
      Kan Liang 提交于
      PEBS_REGS used as mask for the supported registers for large PEBS.
      However, the mask cannot filter the sample_regs_user/sample_regs_intr
      correctly.
      
      (1ULL << PERF_REG_X86_*) should be used to replace PERF_REG_X86_*, which
      is only the index.
      
      Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general
      purpose registers.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Fixes: 2fe1bc1f ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
      Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com
      [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9d5dcc93
  14. 03 4月, 2019 6 次提交
  15. 17 3月, 2019 1 次提交
  16. 15 3月, 2019 1 次提交
  17. 06 3月, 2019 1 次提交
    • P
      perf/x86/intel: Implement support for TSX Force Abort · 400816f6
      Peter Zijlstra (Intel) 提交于
      Skylake (and later) will receive a microcode update to address a TSX
      errata. This microcode will, on execution of a TSX instruction
      (speculative or not) use (clobber) PMC3. This update will also provide
      a new MSR to change this behaviour along with a CPUID bit to enumerate
      the presence of this new MSR.
      
      When the MSR gets set; the microcode will no longer use PMC3 but will
      Force Abort every TSX transaction (upon executing COMMIT).
      
      When TSX Force Abort (TFA) is allowed (default); the MSR gets set when
      PMC3 gets scheduled and cleared when, after scheduling, PMC3 is
      unused.
      
      When TFA is not allowed; clear PMC3 from all constraints such that it
      will not get used.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      400816f6