1. 09 3月, 2018 1 次提交
    • K
      perf/x86/intel: Fix large period handling on Broadwell CPUs · f605cfca
      Kan Liang 提交于
      Large fixed period values could be truncated on Broadwell, for example:
      
        perf record -e cycles -c 10000000000
      
      Here the fixed period is 0x2540BE400, but the period which finally applied is
      0x540BE400 - which is wrong.
      
      The reason is that x86_pmu::limit_period() uses an u32 parameter, so the
      high 32 bits of 'period' get truncated.
      
      This bug was introduced in:
      
        commit 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
      
      It's safe to use u64 instead of u32:
      
       - Although the 'left' is s64, the value of 'left' must be positive when
         calling limit_period().
      
       - bdw_limit_period() only modifies the lowest 6 bits, it doesn't touch
         the higher 32 bits.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 294fe0f5 ("perf/x86/intel: Add INST_RETIRED.ALL workarounds")
      Link: http://lkml.kernel.org/r/1519926894-3520-1-git-send-email-kan.liang@linux.intel.com
      [ Rewrote unacceptably bad changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f605cfca
  2. 06 2月, 2018 1 次提交
  3. 24 12月, 2017 2 次提交
    • H
      x86/events/intel/ds: Map debug buffers in cpu_entry_area · c1961a46
      Hugh Dickins 提交于
      The BTS and PEBS buffers both have their virtual addresses programmed into
      the hardware.  This means that any access to them is performed via the page
      tables.  The times that the hardware accesses these are entirely dependent
      on how the performance monitoring hardware events are set up.  In other
      words, there is no way for the kernel to tell when the hardware might
      access these buffers.
      
      To avoid perf crashes, place 'debug_store' allocate pages and map them into
      the cpu_entry_area.
      
      The PEBS fixup buffer does not need this treatment.
      
      [ tglx: Got rid of the kaiser_add_mapping() complication ]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c1961a46
    • T
      x86/cpu_entry_area: Add debugstore entries to cpu_entry_area · 10043e02
      Thomas Gleixner 提交于
      The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual
      addresses which must be visible in any execution context.
      
      So it is required to make these mappings visible to user space when kernel
      page table isolation is active.
      
      Provide enough room for the buffer mappings in the cpu_entry_area so the
      buffers are available in the user space visible page tables.
      
      At the point where the kernel side entry area is populated there is no
      buffer available yet, but the kernel PMD must be populated. To achieve this
      set the entries for these buffers to non present.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: daniel.gruss@iaik.tugraz.at
      Cc: hughd@google.com
      Cc: keescook@google.com
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      10043e02
  4. 17 12月, 2017 1 次提交
  5. 29 9月, 2017 1 次提交
  6. 29 8月, 2017 1 次提交
    • K
      perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Kan Liang 提交于
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for physical address.
      
      perf already has a facility to collect data virtual address. This patch
      introduces a function to convert the virtual address to physical address.
      The function is quite generic and can be extended to any architecture as
      long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to convert
         the virtual addresses to physical address.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk the
         pages tables for user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or
      privileged user.
      Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc7ce9c7
  7. 25 8月, 2017 3 次提交
    • A
      perf/x86: Export some PMU attributes in caps/ directory · b00233b5
      Andi Kleen 提交于
      It can be difficult to figure out for user programs what features
      the x86 CPU PMU driver actually supports. Currently it requires
      grepping in dmesg, but dmesg is not always available.
      
      This adds a caps directory to /sys/bus/event_source/devices/cpu/,
      similar to the caps already used on intel_pt, which can be used to
      discover the available capabilities cleanly.
      
      Three capabilities are defined:
      
       - pmu_name:	Underlying CPU name known to the driver
       - max_precise:	Max precise level supported
       - branches:	Known depth of LBR.
      
      Example:
      
        % grep . /sys/bus/event_source/devices/cpu/caps/*
        /sys/bus/event_source/devices/cpu/caps/branches:32
        /sys/bus/event_source/devices/cpu/caps/max_precise:3
        /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b00233b5
    • A
      perf/x86: Fix data source decoding for Skylake · 6ae5fa61
      Andi Kleen 提交于
      Skylake changed the encoding of the PEBS data source field.
      Some combinations are not available anymore, but some new cases
      e.g. for L4 cache hit are added.
      
      Fix up the conversion table for Skylake, similar as had been done
      for Nehalem.
      
      On Skylake server the encoding for L4 actually means persistent
      memory. Handle this case too.
      
      To properly describe it in the abstracted perf format I had to add
      some new fields. Since a hit can have only one level add a new
      field that is an enumeration, not a bit field to describe
      the level. It can describe any level. Some numbers are also
      used to describe PMEM and LFB.
      
      Also add a new generic remote flag that can be combined with
      the generic level to signify a remote cache.
      
      And there is an extension field for the snoop indication to handle
      the Forward state.
      
      I didn't add a generic flag for hops because it's not needed
      for Skylake.
      
      I changed the existing encodings for older CPUs to also fill in the
      new level and remote fields.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-3-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6ae5fa61
    • A
      perf/x86: Move Nehalem PEBS code to flag · 95298355
      Andi Kleen 提交于
      Minor cleanup: use an explicit x86_pmu flag to handle the
      missing Lock / TLB information on Nehalem, instead of always
      checking the model number for each PEBS sample.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/20170816222156.19953-2-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      95298355
  8. 18 7月, 2017 1 次提交
    • K
      perf/x86/intel: Add Goldmont Plus CPU PMU support · dd0b06b5
      Kan Liang 提交于
      Add perf core PMU support for Intel Goldmont Plus CPU cores:
      
       - The init code is based on Goldmont.
       - There is a new cache event list, based on the Goldmont cache event
         list.
       - All four general-purpose performance counters support PEBS.
       - The first general-purpose performance counter is for reduced skid
         PEBS mechanism. Using :ppp to indicate the event which want to do
         reduced skid PEBS.
       - Goldmont Plus has 4-wide pipeline for Topdown
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20170712134423.17766-1-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dd0b06b5
  9. 23 5月, 2017 1 次提交
    • K
      perf/x86: Add sysfs entry to freeze counters on SMI · 6089327f
      Kan Liang 提交于
      Currently, the SMIs are visible to all performance counters, because
      many users want to measure everything including SMIs. But in some
      cases, the SMI cycles should not be counted - for example, to calculate
      the cost of an SMI itself. So a knob is needed.
      
      When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
      counters will be effected. There is no way to do per-counter freeze
      on SMI. So it should not use the per-event interface (e.g. ioctl or
      event attribute) to set FREEZE_WHILE_SMM bit.
      
      Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
      bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
      while in SMM.
      
      Value has to be 0 or 1. It will be applied to all processors.
      
      Also serialize the entire setting so we don't get multiple concurrent
      threads trying to update to different values.
      Signed-off-by: NKan Liang <Kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: bp@alien8.de
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6089327f
  10. 14 4月, 2017 1 次提交
    • K
      perf/x86: Fix spurious NMI with PEBS Load Latency event · fd583ad1
      Kan Liang 提交于
      Spurious NMIs will be observed with the following command:
      
        while :; do
          perf record -bae "cpu/umask=0x01,event=0xcd,ldlat=0x80/pp"
                        -e "cpu/umask=0x03,event=0x0/"
                        -e "cpu/umask=0x02,event=0x0/"
                        -e cycles,branches,cache-misses
                        -e cache-references -- sleep 10
        done
      
      The bug was introduced by commit:
      
        8077eca0 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
      
      That commit clears the status bits for the counters used for PEBS
      events, by masking the whole 64 bits pebs_enabled. However, only the
      low 32 bits of both status and pebs_enabled are reserved for PEBS-able
      counters.
      
      For status bits 32-34 are fixed counter overflow bits. For
      pebs_enabled bits 32-34 are for PEBS Load Latency.
      
      In the test case, the PEBS Load Latency event and fixed counter event
      could overflow at the same time. The fixed counter overflow bit will
      be cleared by mistake. Once it is cleared, the fixed counter overflow
      never be processed, which finally trigger spurious NMI.
      
      Correct the PEBS enabled mask by ignoring the non-PEBS bits.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 8077eca0 ("perf/x86/pebs: Add workaround for broken OVFL status on HSW+")
      Link: http://lkml.kernel.org/r/1491333246-3965-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fd583ad1
  11. 11 12月, 2016 1 次提交
  12. 22 11月, 2016 1 次提交
    • P
      perf/x86/intel: Cure bogus unwind from PEBS entries · b8000586
      Peter Zijlstra 提交于
      Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
      unwinds sometimes do 'weird' things. In particular, we seemed to be
      ending up unwinding from random places on the NMI stack.
      
      While it was somewhat expected that the event record BP,SP would not
      match the interrupt BP,SP in that the interrupt is strictly later than
      the record event, it was overlooked that it could be on an already
      overwritten stack.
      
      Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
      when we need stack unwinds.
      
      Note that its still possible the unwind doesn't full match the actual
      event, as its entirely possible to have done an (I)RET between record
      and interrupt, but on average it should still point in the general
      direction of where the event came from. Also, it's the best we can do,
      considering.
      
      The particular scenario that triggered the bogus NMI stack unwind was
      a PEBS event with very short period, upon enabling the event at the
      tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
      triggers a record (while still on the NMI stack) which in turn
      triggers the next PMI. This then causes back-to-back NMIs and we'll
      try and unwind the stack-frame from the last NMI, which obviously is
      now overwritten by our own.
      Analyzed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk>
      Cc: dvyukov@google.com <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: ca037701 ("perf, x86: Add PEBS infrastructure")
      Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b8000586
  13. 10 8月, 2016 3 次提交
    • P
      perf/x86/intel: Clean up LBR state tracking · 3e2c1a67
      Peter Zijlstra 提交于
      The lbr_context logic confused me; it appears to me to try and do the
      same thing the pmu::sched_task() callback does now, but limited to
      per-task events.
      
      So rip it out. Afaict this should also improve performance, because I
      think the current code can end up doing lbr_reset() twice, once from
      the pmu::add() and then again from pmu::sched_task(), and MSR writes
      (all 3*16 of them) are expensive!!
      
      While thinking through the cases that need the reset it occured to me
      the first install of an event in an active context needs to reset the
      LBR (who knows what crap is in there), but detecting this case is
      somewhat hard.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3e2c1a67
    • P
      perf/x86: Ensure perf_sched_cb_{inc,dec}() is only called from pmu::{add,del}() · 68f7082f
      Peter Zijlstra 提交于
      Currently perf_sched_cb_{inc,dec}() are called from
      pmu::{start,stop}(), which has the problem that this can happen from
      NMI context, this is making it hard to optimize perf_pmu_sched_task().
      
      Furthermore, we really only need this accounting on pmu::{add,del}(),
      so doing it from pmu::{start,stop}() is doing more work than we really
      need.
      
      Introduce x86_pmu::{add,del}() and wire up the LBR and PEBS.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      68f7082f
    • P
      perf/x86/intel: Rework the large PEBS setup code · 09e61b4f
      Peter Zijlstra 提交于
      In order to allow optimizing perf_pmu_sched_task() we must ensure
      perf_sched_cb_{inc,dec}() are no longer called from NMI context; this
      means that pmu::{start,stop}() can no longer use them.
      
      Prepare for this by reworking the whole large PEBS setup code.
      
      The current code relied on the cpuc->pebs_enabled state, however since
      that reflects the current active state as per pmu::{start,stop}() we
      can no longer rely on this.
      
      Introduce two counters: cpuc->n_pebs and cpuc->n_large_pebs which
      count the total number of PEBS events and the number of PEBS events
      that have FREERUNNING set, resp.. With this we can tell if the current
      setup requires a single record interrupt threshold or can use a larger
      buffer.
      
      This also improves the code in that it re-enables the large threshold
      once the PEBS event that required single record gets removed.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      09e61b4f
  14. 27 6月, 2016 1 次提交
    • D
      perf/x86/intel: Fix MSR_LAST_BRANCH_FROM_x bug when no TSX · 19fc9ddd
      David Carrillo-Cisneros 提交于
      Intel's SDM states that bits 61:62 in MSR_LAST_BRANCH_FROM_x are the
      TSX flags for formats with LBR_TSX flags (i.e. LBR_FORMAT_EIP_EFLAGS2).
      
      However, when the CPU has TSX support deactivated, bits 61:62 actually
      behave as follows:
      
        - For wrmsr(), bits 61:62 are considered part of the sign extension.
        - When capturing branches, the LBR hw will always clear bits 61:62.
          regardless of the sign extension.
      
      Therefore, if:
      
        1) LBR has TSX format.
        2) CPU has no TSX support enabled.
      
      ... then any value passed to wrmsr() must be sign extended to 63 bits
      and any value from rdmsr() must be converted to have a sign extension
      of 61 bits, ignoring the values at TSX flags.
      
      This bug was masked by the work-around to the Intel's CPU bug:
      BJ94. "LBR May Contain Incorrect Information When Using FREEZE_LBRS_ON_PMI"
      in Document Number: 324643-037US.
      
      The aforementioned work-around uses hw flags to filter out all kernel
      branches, limiting LBR callstack to user level execution only.
      
      Since user addresses are not sign extended, they do not trigger the wrmsr()
      bug in MSR_LAST_BRANCH_FROM_x when saved/restored at context switch.
      
      To verify the hw bug:
      
        $ perf record -b -e cycles sleep 1
        $ rdmsr -p 0 0x680
        0x1fffffffb0b9b0cc
        $ wrmsr -p 0 0x680 0x1fffffffb0b9b0cc
        write(): Input/output error
      
      The quirk for LBR_FROM_ MSRs is required before calls to wrmsrl() and
      after rdmsrl().
      
      This patch introduces it for wrmsrl()'s done for testing LBR support.
      
      Future patch in series adds the quirk for context switch, that would
      be required if LBR callstack is to be enabled for ring 0.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NStephane Eranian <eranian@google.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1466533874-52003-3-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      19fc9ddd
  15. 03 6月, 2016 1 次提交
  16. 05 5月, 2016 1 次提交
  17. 23 4月, 2016 2 次提交
  18. 29 3月, 2016 1 次提交
  19. 25 3月, 2016 1 次提交
  20. 08 3月, 2016 3 次提交
  21. 17 2月, 2016 1 次提交
  22. 06 1月, 2016 2 次提交
    • H
      perf/x86/intel: Add perf core PMU support for Intel Knights Landing · 1e7b9390
      Harish Chegondi 提交于
      Knights Landing core is based on Silvermont core with several differences.
      Like Silvermont, Knights Landing has 8 pairs of LBR MSRs. However, the
      LBR MSRs addresses match those of the Xeon cores' first 8 pairs of LBR MSRs
      Unlike Silvermont, Knights Landing supports hyperthreading. Knights Landing
      offcore response events config register mask is different from that of the
      Silvermont.
      
      This patch was developed based on a patch from Andi Kleen.
      
      For more details, please refer to the public document:
      
        https://software.intel.com/sites/default/files/managed/15/8d/IntelXeonPhi%E2%84%A2x200ProcessorPerformanceMonitoringReferenceManual_Volume1_Registers_v0%206.pdfSigned-off-by: NHarish Chegondi <harish.chegondi@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Harish Chegondi <harish.chegondi@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/d14593c7311f78c93c9cf6b006be843777c5ad5c.1449517401.git.harish.chegondi@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e7b9390
    • A
      perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp · 72469764
      Andi Kleen 提交于
      Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as
      base. The basic mechanism of abusing the inverse cmask to get all
      cycles works the same as before.
      
      PREC_DIST is available on Sandy Bridge or later. It had some problems
      on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
      on Broadwell and Skylake.
      
      PREC_DIST has special support for avoiding shadow effects, which can
      give better results compare to UOPS_RETIRED. The drawback is that
      PREC_DIST can only schedule on counter 1, but that is ok for cycle
      sampling, as there is normally no need to do multiple cycle sampling
      runs in parallel. It is still possible to run perf top in parallel, as
      that doesn't use precise mode. Also of course the multiplexing can
      still allow parallel operation.
      
      :pp stays with the previous event.
      
      Example:
      
      Sample a loop with 10 sqrt with old cycles:pp
      
      	  0.14 │10:   sqrtps %xmm1,%xmm0     <--------------
      	  9.13 │      sqrtps %xmm1,%xmm0
      	 11.58 │      sqrtps %xmm1,%xmm0
      	 11.51 │      sqrtps %xmm1,%xmm0
      	  6.27 │      sqrtps %xmm1,%xmm0
      	 10.38 │      sqrtps %xmm1,%xmm0
      	 12.20 │      sqrtps %xmm1,%xmm0
      	 12.74 │      sqrtps %xmm1,%xmm0
      	  5.40 │      sqrtps %xmm1,%xmm0
      	 10.14 │      sqrtps %xmm1,%xmm0
      	 10.51 │    ↑ jmp    10
      
      We expect all 10 sqrt to get roughly the sample number of samples.
      
      But you can see that the instruction directly after the JMP is
      systematically underestimated in the result, due to sampling shadow
      effects.
      
      With the new PREC_DIST based sampling this problem is gone and all
      instructions show up roughly evenly:
      
      	  9.51 │10:   sqrtps %xmm1,%xmm0
      	 11.74 │      sqrtps %xmm1,%xmm0
      	 11.84 │      sqrtps %xmm1,%xmm0
      	  6.05 │      sqrtps %xmm1,%xmm0
      	 10.46 │      sqrtps %xmm1,%xmm0
      	 12.25 │      sqrtps %xmm1,%xmm0
      	 12.18 │      sqrtps %xmm1,%xmm0
      	  5.26 │      sqrtps %xmm1,%xmm0
      	 10.13 │      sqrtps %xmm1,%xmm0
      	 10.43 │      sqrtps %xmm1,%xmm0
      	  0.16 │    ↑ jmp    10
      
      Even with PREC_DIST there is still sampling skid and the result is not
      completely even, but systematic shadow effects are significantly
      reduced.
      
      The improvements are mainly expected to make a difference in high IPC
      code. With low IPC it should be similar.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      72469764
  23. 06 12月, 2015 2 次提交
  24. 23 11月, 2015 3 次提交
    • A
      perf/x86: Handle multiple umask bits for BDW CYCLE_ACTIVITY.* · b7883a1c
      Andi Kleen 提交于
      The earlier constraint fix for Broadwell CYCLE_ACTIVITY.*
      forced umask 8 to counter 2. For this it used UEVENT,
      to match the complete umask.
      
      The event list for Broadwell has an additional
      STALLS_L1D_PENDIND event that uses umask 8, but also
      sets other bits in the umask.  The earlier strict umask match
      didn't handle this case.
      
      Add a new UBIT_EVENT constraint macro that only matches
      the specified bits in the umask. Then use that macro
      to handle CYCLE_ACTIVITY.* on Broadwell.
      
      The documented event also uses cmask, but there's no
      need to let the event scheduler know about the cmask,
      as the scheduling restriction is only tied to the umask.
      Reported-by: NGrant Ayers <ayers@cs.stanford.edu>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1447719667-9998-1-git-send-email-andi@firstfloor.org
      [ Filled in the missing email address of Grant Ayers - hopefully I got the right one. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b7883a1c
    • P
      treewide: Remove old email address · 90eec103
      Peter Zijlstra 提交于
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      90eec103
    • A
      perf/x86: Fix LBR call stack save/restore · b28ae956
      Andi Kleen 提交于
      This fixes a bug I added in the following commit:
      
        90405aa0 ("perf/x86/intel/lbr: Limit LBR accesses to TOS in callstack mode")
      
      The bug could lead to lost LBR call stacks. When restoring the LBR state
      we need to use the TOS of the previous context, not the current context.
      To do that we need to save/restore the TOS.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/1445366797-30894-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b28ae956
  25. 18 9月, 2015 1 次提交
  26. 13 9月, 2015 2 次提交
    • S
      perf/core: Drop PERF_EVENT_TXN · 8f3e5684
      Sukadev Bhattiprolu 提交于
      We currently use PERF_EVENT_TXN flag to determine if we are in the middle
      of a transaction. If in a transaction, we defer the schedulability checks
      from pmu->add() operation to the pmu->commit() operation.
      
      Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
      we can use the type to determine if we are in a transaction and drop the
      PERF_EVENT_TXN flag.
      
      When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
      becomes unused, so drop that field as well.
      
      This is an extension of the Powerpc patch from Peter Zijlstra to s390,
      Sparc and x86 architectures.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8f3e5684
    • S
      perf/core: Add a 'flags' parameter to the PMU transactional interfaces · fbbe0701
      Sukadev Bhattiprolu 提交于
      Currently, the PMU interface allows reading only one counter at a time.
      But some PMUs like the 24x7 counters in Power, support reading several
      counters at once. To leveage this functionality, extend the transaction
      interface to support a "transaction type".
      
      The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
      i.e. used to _schedule_ all the events on the PMU as a group. A second
      transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
      by the 24x7 counters to read several counters at once.
      
      Extend the transaction interfaces to the PMU to accept a 'txn_flags'
      parameter and use this parameter to ignore any transactions that are
      not of type PERF_PMU_TXN_ADD.
      
      Thanks to Peter Zijlstra for his input.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      [peterz: s390 compile fix]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fbbe0701
  27. 04 8月, 2015 1 次提交