1. 20 Nov 2019, 10 commits
  2. 21 Sep 2019, 2 commits
    • perf/x86/amd/ibs: Fix sample bias for dispatched micro-ops · 7ec11cad
      Kim Phillips authored
      [ Upstream commit 0f4cd769c410e2285a4e9873a684d90423f03090 ]
      
      When counting dispatched micro-ops with cnt_ctl=1, in order to prevent
      sample bias, IBS hardware preloads the least significant 7 bits of
      current count (IbsOpCurCnt) with random values, such that, after the
      interrupt is handled and counting resumes, the next sample taken
      will be slightly perturbed.
      
      The current count bitfield is in the IBS execution control h/w register,
      alongside the maximum count field.
      
      Currently, the IBS driver writes that register with the maximum count,
      leaving zeroes to fill the current count field, thereby overwriting
      the random bits the hardware preloaded for itself.
      
      Fix the driver to actually retain and carry those random bits from the
      read of the IBS control register, through to its write, instead of
      overwriting the lower current count bits with zeroes.
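
      The fix boils down to one masking change when the counter is re-armed. A
      minimal sketch, assuming mask names along the lines of IBS_OP_MAX_CNT and
      IBS_OP_CUR_CNT_RAND; the helper itself is illustrative, not the driver's
      actual code:

        /* Keep the hardware-preloaded random IbsOpCurCnt bits read back from
         * the control register instead of zeroing them on re-arm. */
        static u64 ibs_op_rearm_config(u64 old_ctl, u64 max_cnt)
        {
                return (old_ctl & IBS_OP_CUR_CNT_RAND) |   /* preserved random bits */
                       (max_cnt & IBS_OP_MAX_CNT);         /* new maximum count     */
        }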
      
      Tested with:
      
      perf record -c 100001 -e ibs_op/cnt_ctl=1/pp -a -C 0 taskset -c 0 <workload>
      
      'perf annotate' output before:
      
       15.70  65:   addsd     %xmm0,%xmm1
       17.30        add       $0x1,%rax
       15.88        cmp       %rdx,%rax
                    je        82
       17.32  72:   test      $0x1,%al
                    jne       7c
        7.52        movapd    %xmm1,%xmm0
        5.90        jmp       65
        8.23  7c:   sqrtsd    %xmm1,%xmm0
       12.15        jmp       65
      
      'perf annotate' output after:
      
       16.63  65:   addsd     %xmm0,%xmm1
       16.82        add       $0x1,%rax
       16.81        cmp       %rdx,%rax
                    je        82
       16.69  72:   test      $0x1,%al
                    jne       7c
        8.30        movapd    %xmm1,%xmm0
        8.13        jmp       65
        8.24  7c:   sqrtsd    %xmm1,%xmm0
        8.39        jmp       65
      
      Tested on Family 15h and 17h machines.
      
      Machines prior to family 10h Rev. C don't have the RDWROPCNT capability,
      and have the IbsOpCurCnt bitfield reserved, so this patch shouldn't
      affect their operation.
      
      It is unknown why commit db98c5fa ("perf/x86: Implement 64-bit
      counter support for IBS") ignored the lower 4 bits of the IbsOpCurCnt
      field; the number of preloaded random bits has always been 7, AFAICT.
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Arnaldo Carvalho de Melo" <acme@kernel.org>
      Cc: <x86@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "Borislav Petkov" <bp@alien8.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: "Namhyung Kim" <namhyung@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20190826195730.30614-1-kim.phillips@amd.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7ec11cad
    • perf/x86/intel: Restrict period on Nehalem · 560857de
      Josh Hunt authored
      [ Upstream commit 44d3bbb6f5e501b873218142fe08cdf62a4ac1f3 ]
      
      We see our Nehalem machines reporting 'perfevents: irq loop stuck!' in
      some cases when using perf:
      
      perfevents: irq loop stuck!
      WARNING: CPU: 0 PID: 3485 at arch/x86/events/intel/core.c:2282 intel_pmu_handle_irq+0x37b/0x530
      ...
      RIP: 0010:intel_pmu_handle_irq+0x37b/0x530
      ...
      Call Trace:
      <NMI>
      ? perf_event_nmi_handler+0x2e/0x50
      ? intel_pmu_save_and_restart+0x50/0x50
      perf_event_nmi_handler+0x2e/0x50
      nmi_handle+0x6e/0x120
      default_do_nmi+0x3e/0x100
      do_nmi+0x102/0x160
      end_repeat_nmi+0x16/0x50
      ...
      ? native_write_msr+0x6/0x20
      ? native_write_msr+0x6/0x20
      </NMI>
      intel_pmu_enable_event+0x1ce/0x1f0
      x86_pmu_start+0x78/0xa0
      x86_pmu_enable+0x252/0x310
      __perf_event_task_sched_in+0x181/0x190
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      ? __switch_to_asm+0x41/0x70
      ? __switch_to_asm+0x35/0x70
      finish_task_switch+0x158/0x260
      __schedule+0x2f6/0x840
      ? hrtimer_start_range_ns+0x153/0x210
      schedule+0x32/0x80
      schedule_hrtimeout_range_clock+0x8a/0x100
      ? hrtimer_init+0x120/0x120
      ep_poll+0x2f7/0x3a0
      ? wake_up_q+0x60/0x60
      do_epoll_wait+0xa9/0xc0
      __x64_sys_epoll_wait+0x1a/0x20
      do_syscall_64+0x4e/0x110
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fdeb1e96c03
      ...
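
      The upstream fix, as I read it, is a small period quirk; a condensed
      sketch (the wiring line is paraphrased, not the literal diff):

        static u64 nhm_limit_period(struct perf_event *event, u64 left)
        {
                /* Don't let tiny periods re-trigger the PMI faster than the
                 * handler can drain it, which is what wedges the irq loop. */
                return max(left, 32ULL);
        }

        /* hooked up from the Nehalem-specific init path, roughly:
         *      x86_pmu.limit_period = nhm_limit_period;               */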
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: acme@kernel.org
      Cc: Josh Hunt <johunt@akamai.com>
      Cc: bpuranda@akamai.com
      Cc: mingo@redhat.com
      Cc: jolsa@redhat.com
      Cc: tglx@linutronix.de
      Cc: namhyung@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Link: https://lkml.kernel.org/r/1566256411-18820-1-git-send-email-johunt@akamai.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      560857de
  3. 26 Jul 2019, 4 commits
    • perf/x86/amd/uncore: Set the thread mask for F17h L3 PMCs · 9854e068
      Kim Phillips authored
      commit 2f217d58a8a086d3399fecce39fb358848e799c4 upstream.
      
      Fill in the L3 performance event select register ThreadMask
      bitfield, to enable per hardware thread accounting.
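
      A condensed sketch of what filling in ThreadMask amounts to; the
      thread-index computation and the AMD64_L3_* names are assumptions about
      the driver, not quoted from it:

        if (l3_mask && is_llc_event(event)) {
                int thread = 2 * (cpu_data(event->cpu).cpu_core_id % 4);

                if (smp_num_siblings > 1)               /* SMT: pick the sibling */
                        thread += cpu_data(event->cpu).apicid & 1;

                hwc->config &= ~AMD64_L3_THREAD_MASK;
                hwc->config |= 1ULL << (AMD64_L3_THREAD_SHIFT + thread);
        }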
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20190628215906.4276-2-kim.phillips@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9854e068
    • perf/x86/amd/uncore: Do not set 'ThreadMask' and 'SliceMask' for non-L3 PMCs · 82c46f7b
      Kim Phillips authored
      commit 16f4641166b10e199f0d7b68c2c5f004fef0bda3 upstream.
      
      The following commit:
      
        d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      
      enables L3 PMC events for all threads and slices by writing 1's in
      'ChL3PmcCfg' (L3 PMC PERF_CTL) register fields.
      
      Those bitfields overlap with high order event select bits in the Data
      Fabric PMC control register, however.
      
      So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/),
      the two highest order bits get inadvertently set, changing the counter
      select to events that don't exist, and for which no counts are read.
      
      This patch changes the logic to write the L3 masks only when dealing
      with L3 PMC counters.
      
      AMD Family 16h and below Northbridge (NB) counters were not affected.
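
      In code terms the change is essentially a guard, roughly like the sketch
      below (condensed; the l3_mask/is_llc_event() names are assumptions):

        /* Only L3 (llc) counters get the slice/thread mask bits; Data Fabric
         * counters keep their full event-select field untouched. */
        if (l3_mask && is_llc_event(event))
                hwc->config |= AMD64_L3_SLICE_MASK | AMD64_L3_THREAD_MASK;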
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Gary Hook <Gary.Hook@amd.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liska <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: d7cbbe49 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
      Link: https://lkml.kernel.org/r/20190628215906.4276-1-kim.phillips@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      82c46f7b
    • perf/x86/intel: Fix spurious NMI on fixed counter · a847a522
      Kan Liang authored
      commit e4557c1a46b0d32746bd309e1941914b5a6912b4 upstream.
      
      If a user first samples a PEBS event on a fixed counter and then samples
      a non-PEBS event on the same fixed counter on Icelake, it triggers a
      spurious NMI. For example:
      
        perf record -e 'cycles:p' -a
        perf record -e 'cycles' -a
      
      The error message for spurious NMI:
      
        [June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
        [    +0.000000] Do you have a strange power saving mode enabled?
        [    +0.000000] Dazed and confused, but trying to continue
      
      The bug was introduced by the following commit:
      
        commit 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      
      The commit moves intel_pmu_pebs_disable() to after intel_pmu_disable_fixed();
      for fixed counters the function returns right after intel_pmu_disable_fixed(),
      so the related bit in the PEBS_ENABLE MSR is never cleared. When a non-PEBS
      event later runs on the same fixed counter, the PEBS_ENABLE bit is still set,
      which triggers spurious NMIs.
      
      Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
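
      A condensed sketch of the resulting disable path (the real function also
      handles LBR/BTS and interrupt-enable bits, omitted here):

        static void intel_pmu_disable_event(struct perf_event *event)
        {
                struct hw_perf_event *hwc = &event->hw;

                if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL))
                        intel_pmu_disable_fixed(hwc);
                else
                        x86_pmu_disable_event(event);

                /* Clear the PEBS_ENABLE bit for fixed counters too, so a later
                 * non-PEBS user of the same counter doesn't inherit it. */
                if (unlikely(event->attr.precise_ip))
                        intel_pmu_pebs_disable(event);
        }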
      Reported-by: Yi, Ammy <ammy.yi@intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
      Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a847a522
    • perf/x86/intel/uncore: Handle invalid event coding for free-running counter · dd0260fd
      Kan Liang authored
      [ Upstream commit 543ac280b3576c0009e8c0fcd4d6bfc9978d7bd0 ]
      
      Counting with an invalid event encoding on a free-running counter can
      cause an oops, e.g. uncore_iio_free_running_0/event=1/.
      
      The current code only validates events that use the free-running event
      format, event=0xff,umask=0xXY. Events in other formats are never checked
      for a PMU that has free-running counters.
      
      Add a generic hw_config() to check and reject invalid event encodings
      for free-running PMUs.
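
      Conceptually the new hook looks something like this (a sketch; the helper
      and struct names are assumptions based on the existing uncore code):

        static int uncore_freerunning_hw_config(struct intel_uncore_box *box,
                                                struct perf_event *event)
        {
                /* Only event=0xff plus a known umask selects a free-running
                 * counter; anything else must be rejected up front. */
                if (!is_freerunning_event(event))
                        return -EINVAL;

                event->hw.config = event->attr.config;
                return 0;
        }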
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: eranian@google.com
      Fixes: 0f519f03 ("perf/x86/intel/uncore: Support IIO free-running counters on SKX")
      Link: https://lkml.kernel.org/r/1556672028-119221-2-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dd0260fd
  4. 22 Jun 2019, 1 commit
    • perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints · 5a9c29cc
      Stephane Eranian authored
      [ Upstream commit 23e3983a466cd540ffdd2bbc6e0c51e31934f941 ]
      
      This patch fixes a bug revealed by the following commit:
      
        6b89d4c1ae85 ("perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking")
      
      That patch modified INTEL_FLAGS_EVENT_CONSTRAINT() to only look at the event code
      when matching a constraint. If code+umask were needed, then the
      INTEL_FLAGS_UEVENT_CONSTRAINT() macro was needed instead.
      This broke some of the constraints for PEBS events.
      
      Several of them, including the ones used for cycles:p, cycles:pp and
      cycles:ppp, fell into that category and caused the event to be rejected
      in PEBS mode. In other words, on some platforms a command line such as:
      
        $ perf top -e cycles:pp
      
      would fail with -EINVAL.
      
      This patch fixes this bug by properly using INTEL_FLAGS_UEVENT_CONSTRAINT()
      when needed in the PEBS constraint tables.
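
      To make the distinction concrete (the codes below are made up for
      illustration, not the actual table entries):

        /* matches on event code only - too loose once 6b89d4c1ae85 tightened
         * the mask, so a constraint meant for a specific code+umask misfires: */
        INTEL_FLAGS_EVENT_CONSTRAINT(0xc2, 0xf),

        /* matches on event code AND umask - what PEBS entries that encode a
         * specific umask (such as the cycles:p/pp/ppp ones) must use: */
        INTEL_FLAGS_UEVENT_CONSTRAINT(0x01c2, 0xf),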
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/20190521005246.423-1-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5a9c29cc
  5. 15 Jun 2019, 1 commit
  6. 31 May 2019, 3 commits
  7. 26 May 2019, 1 commit
    • perf/x86/intel: Fix race in intel_pmu_disable_event() · a0b1dde1
      Jiri Olsa authored
      [ Upstream commit 6f55967ad9d9752813e36de6d5fdbd19741adfc7 ]
      
      A new race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() of cpuc->active_mask with separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes panic for PEBS events with enabled callchains:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to take samples from the PEBS drain code,
      but when it's disabled, we go through the NMI path instead,
      where data->callchain does not get allocated and we crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fix this by disabling the event itself before clearing
      the PEBS bit.
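
      In other words, the ordering inside intel_pmu_disable_event() becomes
      (condensed sketch, not the literal diff):

        x86_pmu_disable_event(event);           /* stop the counter first    */

        if (unlikely(event->attr.precise_ip))
                intel_pmu_pebs_disable(event);  /* ...then drop the PEBS bit */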
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3feca3f ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      a0b1dde1
  8. 15 May 2019, 1 commit
    • x86/cpu: Sanitize FAM6_ATOM naming · 1f1bc822
      Peter Zijlstra authored
      commit f2c4db1bd80720cd8cb2a5aa220d9bc9f374f04e upstream
      
      Going primarily by:
      
        https://en.wikipedia.org/wiki/List_of_Intel_Atom_microprocessors
      
      with additional information gleaned from other related pages; notably:
      
       - Bonnell shrink was called Saltwell
       - Moorefield is the Merrifield refresh, which makes it Airmont
      
      The general naming scheme is: FAM6_ATOM_UARCH_SOCTYPE
      
        for i in `git grep -l FAM6_ATOM` ; do
      	sed -i  -e 's/ATOM_PINEVIEW/ATOM_BONNELL/g'		\
      		-e 's/ATOM_LINCROFT/ATOM_BONNELL_MID/'		\
      		-e 's/ATOM_PENWELL/ATOM_SALTWELL_MID/g'		\
      		-e 's/ATOM_CLOVERVIEW/ATOM_SALTWELL_TABLET/g'	\
      		-e 's/ATOM_CEDARVIEW/ATOM_SALTWELL/g'		\
      		-e 's/ATOM_SILVERMONT1/ATOM_SILVERMONT/g'	\
      		-e 's/ATOM_SILVERMONT2/ATOM_SILVERMONT_X/g'	\
      		-e 's/ATOM_MERRIFIELD/ATOM_SILVERMONT_MID/g'	\
      		-e 's/ATOM_MOOREFIELD/ATOM_AIRMONT_MID/g'	\
      		-e 's/ATOM_DENVERTON/ATOM_GOLDMONT_X/g'		\
      		-e 's/ATOM_GEMINI_LAKE/ATOM_GOLDMONT_PLUS/g' ${i}
        done
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: dave.hansen@linux.intel.com
      Cc: len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f1bc822
  9. 10 May 2019, 2 commits
  10. 08 May 2019, 1 commit
    • perf/x86/amd: Update generic hardware cache events for Family 17h · 3f8497cf
      Kim Phillips authored
      commit 0e3b74e26280f2cf8753717a950b97d424da6046 upstream.
      
      Add a new amd_hw_cache_event_ids_f17h assignment structure set
      for AMD families 17h and above, since a lot has changed.  Specifically:
      
      L1 Data Cache
      
      The data cache access counter remains the same on Family 17h.
      
      For DC misses, PMCx041's definition changes with Family 17h,
      so instead we use the L2 cache accesses from L1 data cache
      misses counter (PMCx060,umask=0xc8).
      
      For DC hardware prefetch events, Family 17h breaks compatibility
      for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware
      Prefetch DC Fills."
      
      L1 Instruction Cache
      
      PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward
      compatible on Family 17h.
      
      For prefetches, we remove the erroneous PMCx04B assignment which
      counts how many software data cache prefetch load instructions were
      dispatched.
      
      LL - Last Level Cache
      
      Removing PMCs 7D, 7E, and 7F assignments, as they do not exist
      on Family 17h, where the last level cache is L3.  L3 counters
      can be accessed using the existing AMD Uncore driver.
      
      Data TLB
      
      On Intel machines, data TLB accesses ("dTLB-loads") are assigned
      to counters that count load/store instructions retired.  This
      is inconsistent with instruction TLB accesses, where Intel
      implementations report iTLB misses that hit in the STLB.
      
      Ideally, dTLB-loads would count higher level dTLB misses that hit
      in lower level TLBs, and dTLB-load-misses would report those
      that also missed in those lower-level TLBs, therefore causing
      a page table walk.  That would be consistent with instruction
      TLB operation, remove the redundancy between dTLB-loads and
      L1-dcache-loads, and prevent perf from producing artificially
      low percentage ratios, i.e. the "0.01%" below:
      
              42,550,869      L1-dcache-loads
              41,591,860      dTLB-loads
                   4,802      dTLB-load-misses          #    0.01% of all dTLB cache hits
               7,283,682      L1-dcache-stores
               7,912,392      dTLB-stores
                     310      dTLB-store-misses
      
      On AMD Families prior to 17h, the "Data Cache Accesses" counter is
      used, which is slightly better than load/store instructions retired,
      but still counts in terms of individual load/store operations
      instead of TLB operations.
      
      So, for AMD Families 17h and higher, this patch assigns "dTLB-loads"
      to a counter for L1 dTLB misses that hit in the L2 dTLB, and
      "dTLB-load-misses" to a counter for L1 DTLB misses that caused
      L2 DTLB misses and therefore also caused page table walks.  This
      results in a much more accurate view of data TLB performance:
      
              60,961,781      L1-dcache-loads
                   4,601      dTLB-loads
                     963      dTLB-load-misses          #   20.93% of all dTLB cache hits
      
      Note that for all AMD families, data loads and stores are combined
      in a single accesses counter, so no 'L1-dcache-stores' are reported
      separately, and stores are counted with loads in 'L1-dcache-loads'.
      
      Also note that the "% of all dTLB cache hits" string is misleading
      because (a) "dTLB cache": although TLBs can be considered caches for
      page tables, in this context, it can be misinterpreted as data cache
      hits because the figures are similar (at least on Intel), and (b) not
      all those loads (technically accesses) technically "hit" at that
      hardware level.  "% of all dTLB accesses" would be more clear/accurate.
      
      Instruction TLB
      
      On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the
      STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in
      the STLB and completed a page table walk.
      
      For AMD Family 17h and above, for 'iTLB-loads' we replace the
      erroneous instruction cache fetches counter with PMCx084
      "L1 ITLB Miss, L2 ITLB Hit".
      
      For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss,
      L2 ITLB Miss", but set a 0xff umask because without it the event
      does not get counted.
      
      Branch Predictor (BPU)
      
      PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families.
      
      Node Level Events
      
      Family 17h does not have a PMCx0e9 counter, and corresponding counters
      have not been made available publicly, so for now, we mark them as
      unsupported for Families 17h and above.
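
      A condensed sketch of the new table, showing only the encodings named
      above (low byte = event select, next byte = umask; umasks not quoted
      above are left at zero, and the real table has many more entries):

        /* C(x) expands to PERF_COUNT_HW_CACHE_##x, as in the existing tables */
        static const u64 amd_hw_cache_event_ids_f17h
                                [PERF_COUNT_HW_CACHE_MAX]
                                [PERF_COUNT_HW_CACHE_OP_MAX]
                                [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
         [C(L1D)] = {
                [C(OP_READ)] = {
                        [C(RESULT_ACCESS)] = 0x0040, /* Data Cache Accesses (unchanged)       */
                        [C(RESULT_MISS)]   = 0xc860, /* L2$ access from DC miss: PMCx060/0xc8 */
                },
                [C(OP_PREFETCH)] = {
                        [C(RESULT_ACCESS)] = 0x005a, /* Hardware Prefetch DC Fills: PMCx05a   */
                },
         },
         [C(ITLB)] = {
                [C(OP_READ)] = {
                        [C(RESULT_ACCESS)] = 0x0084, /* L1 ITLB Miss, L2 ITLB Hit: PMCx084    */
                        [C(RESULT_MISS)]   = 0xff85, /* L1 ITLB Miss, L2 ITLB Miss: PMCx085/ff */
                },
         },
        };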
      
      Reference:
      
        "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh"
        Released 7/17/2018, Publication #56255, Revision 3.03:
        https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf
      
      [ mingo: tidied up the line breaks. ]
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-perf-users@vger.kernel.org
      Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      3f8497cf
  11. 02 May 2019, 1 commit
  12. 27 Apr 2019, 2 commits
    • perf/x86: Fix incorrect PEBS_REGS · 90e17512
      Kan Liang authored
      commit 9d5dcc93a6ddfc78124f006ccd3637ce070ef2fc upstream.
      
      PEBS_REGS is used as the mask of registers supported for large PEBS.
      However, the mask cannot filter sample_regs_user/sample_regs_intr
      correctly.
      
      (1ULL << PERF_REG_X86_*) should be used instead of PERF_REG_X86_*, which
      is only the register index.
      
      Rename PEBS_REGS to PEBS_GP_REGS, because the mask is only for general
      purpose registers.
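
      An abbreviated illustration of the difference (only three registers
      shown; the real mask lists all general purpose registers):

        /* before: ORing raw enum indices yields a meaningless small number */
        #define PEBS_REGS      (PERF_REG_X86_AX | PERF_REG_X86_BX | PERF_REG_X86_CX)

        /* after: each supported register contributes exactly one mask bit */
        #define PEBS_GP_REGS                    \
                ((1ULL << PERF_REG_X86_AX) |    \
                 (1ULL << PERF_REG_X86_BX) |    \
                 (1ULL << PERF_REG_X86_CX))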
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Fixes: 2fe1bc1f ("perf/x86: Enable free running PEBS for REGS_USER/INTR")
      Link: https://lkml.kernel.org/r/20190402194509.2832-2-kan.liang@linux.intel.com
      [ Renamed it to PEBS_GP_REGS - as 'GPRS' is used elsewhere ;-) ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      90e17512
    • perf/x86/amd: Add event map for AMD Family 17h · f45829e6
      Kim Phillips authored
      commit 3fe3331bb285700ab2253dbb07f8e478fcea2f1b upstream.
      
      Family 17h differs from prior families by:
      
       - Does not support an L2 cache miss event
       - It has re-enumerated PMC counters for:
         - L2 cache references
         - front & back end stalled cycles
      
      So we add a new amd_f17h_perfmon_event_map[] so that the generic
      perf event names will resolve to the correct h/w events on
      family 17h and above processors.
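
      A sketch of the resulting map; the cycle/instruction/branch encodings are
      the long-standing AMD ones, while the re-enumerated L2 and stall
      encodings below are my reading of the PPR referenced next and should be
      verified against it (there is deliberately no CACHE_MISSES entry):

        static const u64 amd_f17h_perfmon_event_map[PERF_COUNT_HW_MAX] = {
                [PERF_COUNT_HW_CPU_CYCLES]              = 0x0076,
                [PERF_COUNT_HW_INSTRUCTIONS]            = 0x00c0,
                [PERF_COUNT_HW_CACHE_REFERENCES]        = 0xff60, /* L2 cache requests */
                [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = 0x00c2,
                [PERF_COUNT_HW_BRANCH_MISSES]           = 0x00c3,
                [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x0287, /* IC fetch stall    */
                [PERF_COUNT_HW_STALLED_CYCLES_BACKEND]  = 0x0187, /* dispatch stall    */
        };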
      
      Reference sections 2.1.13.3.3 (stalls) and 2.1.13.3.6 (L2):
      
        https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
      Signed-off-by: Kim Phillips <kim.phillips@amd.com>
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Liška <mliska@suse.cz>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: e40ed154 ("perf/x86: Add perf support for AMD family-17h processors")
      [ Improved the formatting a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f45829e6
  13. 17 Apr 2019, 3 commits
    • x86/perf/amd: Remove need to check "running" bit in NMI handler · 05626465
      Lendacky, Thomas authored
      commit 3966c3feca3fd10b2935caa0b4a08c7dd59469e5 upstream.
      
      Spurious interrupt support was added to perf in the following commit, almost
      a decade ago:
      
        63e6be6d ("perf, x86: Catch spurious interrupts after disabling counters")
      
      The two previous patches (resolving the race condition when disabling a
      PMC and NMI latency mitigation) allow for the removal of this older
      spurious interrupt support.
      
      Currently in x86_pmu_stop(), the bit for the PMC in the active_mask bitmap
      is cleared before disabling the PMC, which sets up a race condition. This
      race condition was mitigated by introducing the running bitmap. That race
      condition can be eliminated by first disabling the PMC, waiting for PMC
      reset on overflow and then clearing the bit for the PMC in the active_mask
      bitmap. The NMI handler will not re-enable a disabled counter.
      
      If x86_pmu_stop() is called from the perf NMI handler, the NMI latency
      mitigation support will guard against any unhandled NMI messages.
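
      A condensed sketch of the reordered x86_pmu_stop() core (the real
      function also updates the event count and state flags):

        if (test_bit(hwc->idx, cpuc->active_mask)) {
                x86_pmu.disable(event);                    /* stop the PMC first...   */
                __clear_bit(hwc->idx, cpuc->active_mask);  /* ...then retire the slot */
                cpuc->events[hwc->idx] = NULL;
        }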
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05626465
    • x86/perf/amd: Resolve NMI latency issues for active PMCs · 23d39b0a
      Lendacky, Thomas authored
      commit 6d3edaae16c6c7d238360f2841212c2b26774d5e upstream.
      
      On AMD processors, the detection of an overflowed PMC counter in the NMI
      handler relies on the current value of the PMC. So, for example, to check
      for overflow on a 48-bit counter, bit 47 is checked to see if it is 1 (not
      overflowed) or 0 (overflowed).
      
      When the perf NMI handler executes it does not know in advance which PMC
      counters have overflowed. As such, the NMI handler will process all active
      PMC counters that have overflowed. NMI latency in newer AMD processors can
      result in multiple overflowed PMC counters being processed in one NMI and
      then a subsequent NMI, that does not appear to be a back-to-back NMI, not
      finding any PMC counters that have overflowed. This may appear to be an
      unhandled NMI resulting in either a panic or a series of messages,
      depending on how the kernel was configured.
      
      To mitigate this issue, add an AMD handle_irq callback function,
      amd_pmu_handle_irq(), that will invoke the common x86_pmu_handle_irq()
      function and upon return perform some additional processing that will
      indicate if the NMI has been handled or would have been handled had an
      earlier NMI not handled the overflowed PMC. Using a per-CPU variable, a
      minimum value of the number of active PMCs or 2 will be set whenever a
      PMC is active. This is used to indicate the possible number of NMIs that
      can still occur. The value of 2 is used for when an NMI does not arrive
      at the LAPIC in time to be collapsed into an already pending NMI. Each
      time the function is called without having handled an overflowed counter,
      the per-CPU value is checked. If the value is non-zero, it is decremented
      and the NMI indicates that it handled the NMI. If the value is zero, then
      the NMI indicates that it did not handle the NMI.
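
      A condensed sketch of the callback described above (the per-CPU variable
      name and the active-PMC count are written from the description, not
      quoted from the driver):

        static int amd_pmu_handle_irq(struct pt_regs *regs)
        {
                struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
                int active = bitmap_weight(cpuc->active_mask, X86_PMC_IDX_MAX);
                int handled = x86_pmu_handle_irq(regs);

                if (handled) {
                        /* allow up to min(active, 2) follow-on NMIs to be absorbed */
                        this_cpu_write(perf_nmi_counter, min(active, 2));
                        return handled;
                }

                if (!this_cpu_read(perf_nmi_counter))
                        return NMI_DONE;        /* genuinely not ours */

                this_cpu_dec(perf_nmi_counter);
                return NMI_HANDLED;             /* late NMI for work already done */
        }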
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      23d39b0a
    • x86/perf/amd: Resolve race condition when disabling PMC · e5a791b4
      Lendacky, Thomas authored
      commit 914123fa39042e651d79eaf86bbf63a1b938dddf upstream.
      
      On AMD processors, the detection of an overflowed counter in the NMI
      handler relies on the current value of the counter. So, for example, to
      check for overflow on a 48 bit counter, bit 47 is checked to see if it
      is 1 (not overflowed) or 0 (overflowed).
      
      There is currently a race condition present when disabling and then
      updating the PMC. Increased NMI latency in newer AMD processors makes this
      race condition more pronounced. If the counter value has overflowed, it is
      possible to update the PMC value before the NMI handler can run. The
      updated PMC value is not an overflowed value, so when the perf NMI handler
      does run, it will not find an overflowed counter. This may appear as an
      unknown NMI resulting in either a panic or a series of messages, depending
      on how the kernel is configured.
      
      To eliminate this race condition, the PMC value must be checked after
      disabling the counter. Add an AMD function, amd_pmu_disable_all(), that
      will wait for the NMI handler to reset any active and overflowed counter
      after calling x86_pmu_disable_all().
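
      A condensed sketch of the added wait (the loop bound and delay are
      assumptions; the overflow test mirrors the top-bit check described
      above):

        static void amd_pmu_wait_on_overflow(int idx)
        {
                unsigned int i;
                u64 counter;

                /* After disabling, give a pending NMI a chance to observe and
                 * reset a counter that had already overflowed (top bit == 0). */
                for (i = 0; i < OVERFLOW_WAIT_COUNT; i++) {
                        rdmsrl(x86_pmu_event_addr(idx), counter);
                        if (counter & (1ULL << (x86_pmu.cntval_bits - 1)))
                                break;          /* reset by the handler, or never overflowed */

                        udelay(1);              /* can't sleep: may be in IRQ context */
                }
        }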
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # 4.14.x-
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/Message-ID:
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e5a791b4
  14. 06 Apr 2019, 1 commit
  15. 24 Mar 2019, 1 commit
    • perf/x86/intel/uncore: Fix client IMC events return huge result · 30cedf18
      Kan Liang authored
      commit 8041ffd36f42d8521d66dd1e236feb58cecd68bc upstream.
      
      The client IMC bandwidth events currently return very large values:
      
        $ perf stat -e uncore_imc/data_reads/ -e uncore_imc/data_writes/ -I 10000 -a
      
        10.000117222 34,788.76 MiB uncore_imc/data_reads/
        10.000117222 8.26 MiB uncore_imc/data_writes/
        20.000374584 34,842.89 MiB uncore_imc/data_reads/
        20.000374584 10.45 MiB uncore_imc/data_writes/
        30.000633299 37,965.29 MiB uncore_imc/data_reads/
        30.000633299 323.62 MiB uncore_imc/data_writes/
        40.000891548 41,012.88 MiB uncore_imc/data_reads/
        40.000891548 6.98 MiB uncore_imc/data_writes/
        50.001142480 1,125,899,906,621,494.75 MiB uncore_imc/data_reads/
        50.001142480 6.97 MiB uncore_imc/data_writes/
      
      The client IMC events are freerunning counters. They still use the
      old event encoding format (0x1 for data_read and 0x2 for data write).
      The counter bit width is calculated by common code, which assumes that
      the standard encoding format is used for the free-running counters, so
      incorrect bit-width information is calculated.
      
      The patch intends to convert the old client IMC event encoding to the
      standard encoding format.
      
      The common code currently uses event->attr.config, which is copied
      directly from user space, and we should not implicitly modify it for a
      converted event. So event->hw.config is used instead of
      event->attr.config in the common code.
      
      For client IMC events, the event->attr.config is used to calculate a
      converted event with standard encoding format in the custom
      event_init(). The converted event is stored in event->hw.config.
      For other events of freerunning counters, they already use the standard
      encoding format. The same value as event->attr.config is assigned to
      event->hw.config in common event_init().
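
      Conceptually the custom event_init() does something like the sketch
      below; the FR_ENC_* values are placeholders, not the driver's real
      encodings:

        /* legacy client IMC encodings, as described above */
        #define SNB_IMC_DATA_READS      0x01
        #define SNB_IMC_DATA_WRITES     0x02

        switch (event->attr.config) {
        case SNB_IMC_DATA_READS:
                event->hw.config = FR_ENC_DATA_READS;   /* standard free-running encoding */
                break;
        case SNB_IMC_DATA_WRITES:
                event->hw.config = FR_ENC_DATA_WRITES;
                break;
        default:
                return -EINVAL;
        }
        /* event->attr.config itself is left exactly as user space supplied it */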
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Tested-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org # v4.18+
      Fixes: 9aae1780 ("perf/x86/intel/uncore: Clean up client IMC uncore")
      Link: https://lkml.kernel.org/r/20190227165729.1861-1-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30cedf18
  16. 19 Mar 2019, 3 commits
  17. 14 Mar 2019, 3 commits