1. 23 12月, 2016 1 次提交
    • S
      perf/x86/pebs: Fix handling of PEBS buffer overflows · daa864b8
      Stephane Eranian 提交于
      This patch solves a race condition between PEBS and the PMU handler.
      
      In case multiple PEBS events are sampled at the same time,
      it is possible to have GLOBAL_STATUS bit 62 set indicating
      PEBS buffer overflow and also seeing at most 3 PEBS counters
      having their bits set in the status register. This is a sign
      that there was at least one PEBS record pending at the time
      of the PMU interrupt. PEBS counters must only be processed
      via the drain_pebs() calls, and not via the regular sample
      processing loop coming after that the function, otherwise
      phony regular samples may be generated in the sampling buffer
      not marked with the EXACT tag.
      
      Another possibility is to have one PEBS event and at least
      one non-PEBS event whic hoverflows while PEBS has armed. In this
      case, bit 62 of GLOBAL_STATUS will not be set, yet the overflow
      status bit for the PEBS counter will be on Skylake.
      
      To avoid this problem, we systematically ignore the PEBS-enabled
      counters from the GLOBAL_STATUS mask and we always process PEBS
      events via drain_pebs().
      
      The problem manifested itself by having non-exact samples when
      sampling only PEBS events, i.e., the PERF_SAMPLE_RECORD would
      not have the EXACT flag set.
      
      Note that this problem is only present on Skylake processor.
      This fix is harmless on older processors.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482395366-8992-1-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      daa864b8
  2. 06 12月, 2016 2 次提交
  3. 22 11月, 2016 2 次提交
    • P
      perf/x86/intel/uncore: Allow only a single PMU/box within an events group · 033ac60c
      Peter Zijlstra 提交于
      Group validation expects all events to be of the same PMU; however
      is_uncore_pmu() is too wide, it matches _all_ uncore events, even
      across PMUs.
      
      This triggers failure when we group different events from different
      uncore PMUs, like:
      
        perf stat -vv -e '{uncore_cbox_0/config=0x0334/,uncore_qpi_0/event=1/}' -a sleep 1
      
      Fix is_uncore_pmu() by only matching events to the box at hand.
      
      Note that generic code; ran after this step; will disallow this
      mixture of PMU events.
      Reported-by: NJiri Olsa <jolsa@redhat.com>
      Tested-by: NJiri Olsa <jolsa@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20161118125354.GQ3117@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      033ac60c
    • P
      perf/x86/intel: Cure bogus unwind from PEBS entries · b8000586
      Peter Zijlstra 提交于
      Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event
      unwinds sometimes do 'weird' things. In particular, we seemed to be
      ending up unwinding from random places on the NMI stack.
      
      While it was somewhat expected that the event record BP,SP would not
      match the interrupt BP,SP in that the interrupt is strictly later than
      the record event, it was overlooked that it could be on an already
      overwritten stack.
      
      Therefore, don't copy the recorded BP,SP over the interrupted BP,SP
      when we need stack unwinds.
      
      Note that its still possible the unwind doesn't full match the actual
      event, as its entirely possible to have done an (I)RET between record
      and interrupt, but on average it should still point in the general
      direction of where the event came from. Also, it's the best we can do,
      considering.
      
      The particular scenario that triggered the bogus NMI stack unwind was
      a PEBS event with very short period, upon enabling the event at the
      tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly
      triggers a record (while still on the NMI stack) which in turn
      triggers the next PMI. This then causes back-to-back NMIs and we'll
      try and unwind the stack-frame from the last NMI, which obviously is
      now overwritten by our own.
      Analyzed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk>
      Cc: dvyukov@google.com <dvyukov@google.com>
      Cc: stable@vger.kernel.org
      Fixes: ca037701 ("perf, x86: Add PEBS infrastructure")
      Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b8000586
  4. 16 11月, 2016 2 次提交
  5. 11 11月, 2016 1 次提交
  6. 28 10月, 2016 1 次提交
    • I
      perf/x86/intel: Honour the CPUID for number of fixed counters in hypervisors · f92b7604
      Imre Palik 提交于
      perf doesn't seem to honour the number of fixed counters specified by CPUID
      leaf 0xa. It always assumes that Intel CPUs have at least 3 fixed counters.
      
      So if some of the fixed counters are masked out by the hypervisor, it still
      tries to check/set them.
      
      This patch makes perf behave nicer when the kernel is running under a
      hypervisor that doesn't expose all the counters.
      
      This patch contains some ideas from Matt Wilson.
      Signed-off-by: NImre Palik <imrep@amazon.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Alexander Kozyrev <alexander.kozyrev@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Artyom Kuanbekov <artyom.kuanbekov@intel.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Wilson <msw@amazon.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1477037939-15605-1-git-send-email-imrep.amz@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f92b7604
  7. 19 10月, 2016 1 次提交
  8. 17 10月, 2016 3 次提交
  9. 16 10月, 2016 1 次提交
    • D
      perf/x86/intel: Remove an inconsistent NULL check · 5c38181c
      Dan Carpenter 提交于
      Smatch complains that we don't check "event->ctx" consistently.  It's
      never NULL so we can just remove the check.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kernel-janitors@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5c38181c
  10. 22 9月, 2016 1 次提交
  11. 20 9月, 2016 2 次提交
  12. 16 9月, 2016 3 次提交
  13. 15 9月, 2016 1 次提交
  14. 10 9月, 2016 6 次提交
  15. 06 9月, 2016 1 次提交
  16. 05 9月, 2016 2 次提交
  17. 18 8月, 2016 2 次提交
  18. 12 8月, 2016 2 次提交
    • K
      perf/x86/intel/uncore: Add enable_box for client MSR uncore · 95f3be79
      Kan Liang 提交于
      There are bug reports about miscounting uncore counters on some
      client machines like Sandybridge, Broadwell and Skylake. It is
      very likely to be observed on idle systems.
      
      This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
      cleared after Package C7, and nothing will be count.
      The related errata (HSD 158) could be found in:
      
        www.intel.com/content/dam/www/public/us/en/documents/specification-updates/4th-gen-core-family-desktop-specification-update.pdf
      
      This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
      in ->enable_box(). The workaround does not cover all cases. It helps for new
      events after returning from C7. But it cannot prevent C7, it will still
      miscount if a counter is already active.
      
      There is no drawback in leaving it enabled, so it does not need
      disable_box() here.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470925874-59943-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      95f3be79
    • K
      perf/x86/intel/uncore: Fix uncore num_counters · 10e9e7bd
      Kan Liang 提交于
      Some uncore boxes' num_counters value for Haswell server and
      Broadwell server are not correct (too large, off by one).
      
      This issue was found by comparing the code with the document. Although
      there is no bug report from users yet, accessing non-existent counters
      is dangerous and the behavior is undefined: it may cause miscounting or
      even crashes.
      
      This patch makes them consistent with the uncore document.
      Reported-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1470925820-59847-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      10e9e7bd
  19. 10 8月, 2016 5 次提交
    • P
      perf/x86/intel: Clean up LBR state tracking · 3e2c1a67
      Peter Zijlstra 提交于
      The lbr_context logic confused me; it appears to me to try and do the
      same thing the pmu::sched_task() callback does now, but limited to
      per-task events.
      
      So rip it out. Afaict this should also improve performance, because I
      think the current code can end up doing lbr_reset() twice, once from
      the pmu::add() and then again from pmu::sched_task(), and MSR writes
      (all 3*16 of them) are expensive!!
      
      While thinking through the cases that need the reset it occured to me
      the first install of an event in an active context needs to reset the
      LBR (who knows what crap is in there), but detecting this case is
      somewhat hard.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3e2c1a67
    • P
      perf/x86/intel: Remove redundant test from intel_pmu_lbr_add() · a5dcff62
      Peter Zijlstra 提交于
      By the time we call pmu::add(), event->ctx must be set, and we
      even already rely on this, so remove that test from
      intel_pmu_lbr_add().
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a5dcff62
    • P
      perf/x86/intel: Eliminate dead code in intel_pmu_lbr_del() · c3a61a2c
      Peter Zijlstra 提交于
      Since pmu::del() is always called under perf_pmu_disable(), the block
      conditional on cpuc->enabled is dead.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c3a61a2c
    • P
      perf/x86: Ensure perf_sched_cb_{inc,dec}() is only called from pmu::{add,del}() · 68f7082f
      Peter Zijlstra 提交于
      Currently perf_sched_cb_{inc,dec}() are called from
      pmu::{start,stop}(), which has the problem that this can happen from
      NMI context, this is making it hard to optimize perf_pmu_sched_task().
      
      Furthermore, we really only need this accounting on pmu::{add,del}(),
      so doing it from pmu::{start,stop}() is doing more work than we really
      need.
      
      Introduce x86_pmu::{add,del}() and wire up the LBR and PEBS.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      68f7082f
    • P
      perf/x86/intel: Rework the large PEBS setup code · 09e61b4f
      Peter Zijlstra 提交于
      In order to allow optimizing perf_pmu_sched_task() we must ensure
      perf_sched_cb_{inc,dec}() are no longer called from NMI context; this
      means that pmu::{start,stop}() can no longer use them.
      
      Prepare for this by reworking the whole large PEBS setup code.
      
      The current code relied on the cpuc->pebs_enabled state, however since
      that reflects the current active state as per pmu::{start,stop}() we
      can no longer rely on this.
      
      Introduce two counters: cpuc->n_pebs and cpuc->n_large_pebs which
      count the total number of PEBS events and the number of PEBS events
      that have FREERUNNING set, resp.. With this we can tell if the current
      setup requires a single record interrupt threshold or can use a larger
      buffer.
      
      This also improves the code in that it re-enables the large threshold
      once the PEBS event that required single record gets removed.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      09e61b4f
  20. 14 7月, 2016 1 次提交
    • P
      x86: Audit and remove any remaining unnecessary uses of module.h · eb008eb6
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  In the case of
      some of these which are modular, we can extend that to also include
      files that are building basic support functionality but not related
      to loading or registering the final module; such files also have
      no need whatsoever for module.h
      
      The advantage in removing such instances is that module.h itself
      sources about 15 other headers; adding significantly to what we feed
      cpp, and it can obscure what headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each instance for the
      presence of either and replace as needed.
      
      In the case of crypto/glue_helper.c we delete a redundant instance
      of MODULE_LICENSE in order to delete module.h -- the license info
      is already present at the top of the file.
      
      The uncore change warrants a mention too; it is uncore.c that uses
      module.h and not uncore.h; hence the relocation done there.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-9-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb008eb6