1. 01 Jul 2013 (4 commits)
  2. 10 Jun 2013 (1 commit)
    • powerpc/perf: Fix deadlock caused by calling printk() in PMU exception · 6772faa1
      Committed by Michael Ellerman
      In commit bc09c219 "Fix finding overflowed PMC in interrupt" we added
      a printk() to the PMU exception handler. Unfortunately that is not safe.
      
      The problem is that the PMU exception may run even when interrupts are
      soft disabled, aka NMI context. We do this so that we can profile parts
      of the kernel that have interrupts soft-disabled.
      
      But by calling printk() from the exception handler, we can potentially
      deadlock in the printk code on logbuf_lock, eg:
      
        [c00000038ba575c0] c000000000081928 .vprintk_emit+0xa8/0x540
        [c00000038ba576a0] c0000000007bcde8 .printk+0x48/0x58
        [c00000038ba57710] c000000000076504 .perf_event_interrupt+0x2d4/0x490
        [c00000038ba57810] c00000000001f6f8 .performance_monitor_exception+0x48/0x60
        [c00000038ba57880] c0000000000032cc performance_monitor_common+0x14c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000007b25d4 ._raw_spin_lock_irq
        +0x64/0xc0
        [c00000038ba57bf0] c00000000007ed90 .devkmsg_read+0xd0/0x5a0
        [c00000038ba57d00] c0000000001c2934 .vfs_read+0xc4/0x1e0
        [c00000038ba57d90] c0000000001c2cd8 .SyS_read+0x58/0xd0
        [c00000038ba57e30] c000000000009d54 syscall_exit+0x0/0x98
        --- Exception: c01 (System Call) at 00001fffffbf6f7c
        SP (3ffff6d4de10) is in userspace
      
      Fix it by making sure we only call printk() when we are not in NMI
      context.
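      
      A minimal sketch of the resulting guard in perf_event_interrupt(), assuming the
      handler derives an nmi flag (e.g. via perf_intr_is_nmi()); the warning text and
      surrounding structure are illustrative, not the verbatim diff:
      
      	static void perf_event_interrupt(struct pt_regs *regs)
      	{
      		int nmi = perf_intr_is_nmi(regs);
      		int found = 0;
      
      		/* ... scan the PMCs, record overflows, set found ... */
      
      		/* printk() takes logbuf_lock; never call it from NMI context */
      		if (!found && !nmi)
      			printk(KERN_WARNING "Can't find PMC that caused IRQ\n");
      	}
      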
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Cc: <stable@vger.kernel.org> # 3.9
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      6772faa1
  3. 01 Jun 2013 (2 commits)
  4. 14 May 2013 (3 commits)
  5. 26 Apr 2013 (6 commits)
  6. 01 Feb 2013 (1 commit)
    • perf/POWER7: Make generic event translations available in sysfs · 1c53a270
      Committed by Sukadev Bhattiprolu
      Make the generic perf events in POWER7 available via sysfs.
      
      	$ ls /sys/bus/event_source/devices/cpu/events
      	branch-instructions
      	branch-misses
      	cache-misses
      	cache-references
      	cpu-cycles
      	instructions
      	stalled-cycles-backend
      	stalled-cycles-frontend
      
      	$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
      	event=0x400f0
      
      This patch is based on commits that implement this functionality on x86.
      Eg:
      	commit a4747393
      	Author: Jiri Olsa <jolsa@redhat.com>
      	Date:   Wed Oct 10 14:53:11 2012 +0200
      
      	    perf/x86: Make hardware event translations available in sysfs
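      
      A rough sketch of the underlying sysfs pattern (helper and variable names here
      are illustrative, not the exact macros from the patch): each generic event
      becomes a read-only attribute whose show routine prints the raw event code, and
      the attributes are grouped under an "events" directory on the PMU device:
      
      	static ssize_t cache_misses_show(struct device *dev,
      			struct device_attribute *attr, char *page)
      	{
      		/* raw code behind the generic cache-misses event (see output above) */
      		return sprintf(page, "event=0x%x\n", 0x400f0);
      	}
      
      	static struct device_attribute cache_misses_attr =
      		__ATTR(cache-misses, 0444, cache_misses_show, NULL);
      
      	static struct attribute *power7_events_attrs[] = {
      		&cache_misses_attr.attr,
      		/* branch-instructions, cpu-cycles, ... */
      		NULL,
      	};
      
      	static struct attribute_group power7_events_group = {
      		.name  = "events",	/* => /sys/bus/event_source/devices/cpu/events/ */
      		.attrs = power7_events_attrs,
      	};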
      
      Changelog: [v2]
      	[Jiri Olsa] Drop EVENT_ID() macro since it is only used once.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anton Blanchard <anton@au1.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: linuxppc-dev@ozlabs.org
      Link: http://lkml.kernel.org/r/20130123062454.GD13720@us.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1c53a270
  7. 29 Jan 2013 (1 commit)
  8. 10 Jan 2013 (2 commits)
    • powerpc/perf: Fix for PMCs not making progress · e13e895f
      Committed by Michael Neuling
      On POWER7 when we have really small counts left before overflow, we can take a
      PMU IRQ, but the PMC gets wound back to just before the overflow.
      
      If the kernel is setting the PMC to a value just before the overflow, we can
      get interrupted again without the PMC making any progress (ie another buggy
      overflow).  In this case, we can end up making no forward progress, with the
      PMC interrupt returning us to the same count over and over.
      
      The below detects when we are making no forward progress (ie. delta = 0) and
      then increases the amount left before the overflow.  This stops us from locking
      up.
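      
      A sketch of the idea in record_and_restart() (variable names follow the usual
      core-book3s.c conventions; treat this as illustrative rather than the exact diff):
      
      	prev  = local64_read(&event->hw.prev_count);
      	delta = check_and_compute_delta(prev, val);
      	local64_add(delta, &event->count);
      
      	/*
      	 * No forward progress: the PMC was rolled back to the same value,
      	 * so nudge the remaining period to move the next counter write a
      	 * little further away from the overflow point.
      	 */
      	left = local64_read(&event->hw.period_left) - delta;
      	if (delta == 0)
      		left++;
      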
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      cc: Paul Mackerras <paulus@samba.org>
      cc: Anton Blanchard <anton@samba.org>
      cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e13e895f
    • powerpc/perf: Fix finding overflowed PMC in interrupt · bc09c219
      Committed by Michael Neuling
      If a PMC is about to overflow on a counter that's on an active perf event
      (ie. less than 256 from the end) and a _different_ PMC overflows just at this
      time (a PMC that's not on an active perf event), we currently mark the event as
      found, but in reality it is not, since it is likely the other PMC that caused the
      IRQ.  Since we mark it as found, the second catch-all for overflows doesn't run,
      and we never reset the overflowing PMC.  Hence we keep hitting that same
      PMC IRQ over and over and don't reset the actual overflowing counter.
      
      This is a rewrite of the perf interrupt handler for book3s to get around this.
      We now check to see if any of the PMCs have actually overflowed (ie >=
      0x80000000).  If yes, record it for active counters and just reset it for
      inactive counters.  If it's not overflowed, then we check to see if it's one of
      the buggy power7 counters and if it is, record it and continue.  If none of the
      PMCs match this, then we make note that we couldn't find the PMC that caused
      the IRQ.
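      
      In outline, the rewritten handler does something like the following (the
      active[]/event[] bookkeeping and the power7 helper name are illustrative):
      
      	for (i = 0; i < ppmu->n_counter; ++i) {
      		unsigned long val = read_pmc(i + 1);
      
      		if (val & 0x80000000) {			/* really overflowed */
      			found = 1;
      			if (active[i])
      				record_and_restart(event[i], val, regs);
      			else
      				write_pmc(i + 1, 0);	/* inactive: just reset it */
      		} else if (pmc_overflow_power7(val)) {
      			/* buggy POWER7 PMC rolled back to just below overflow */
      			found = 1;
      		}
      	}
      	/* if !found, note that the overflowing PMC could not be identified */
      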
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      cc: Paul Mackerras <paulus@samba.org>
      cc: Anton Blanchard <anton@samba.org>
      cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      bc09c219
  9. 18 Oct 2012 (1 commit)
  10. 27 Sep 2012 (1 commit)
  11. 05 Sep 2012 (1 commit)
  12. 24 Aug 2012 (1 commit)
    • powerpc/perf: Use pmc_overflow() to detect rolled back events · 81331211
      Committed by Sukadev Bhattiprolu
      For certain speculative events on Power7, 'perf stat' reports far higher
      event count than 'perf record' for the same event.
      
      As described in the following commit, a performance monitor exception is raised
      even when the performance events are rolled back.
      
              commit 0837e324
              Author: Anton Blanchard <anton@samba.org>
              Date:   Wed Mar 9 14:38:42 2011 +1100
      
      perf_event_interrupt() records an event only when an overflow occurs. But
      this check for overflow is a simple 'if (val < 0)'.
      
      Because the events are rolled back, this check for overflow fails and the
      event is not recorded. perf_event_interrupt() later uses pmc_overflow() to
      detect the overflow and resets the counters and the events are lost completely.
      
      To properly detect the overflow of rolled back events, use pmc_overflow()
      even when recording events.
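      
      The gist of the change in perf_event_interrupt() (a sketch, not the verbatim diff):
      
      	/* before: a rolled back counter never looks overflowed */
      	if ((int)val < 0)
      		record_and_restart(event, val, regs);
      
      	/* after: pmc_overflow() also accounts for the roll back window */
      	if (pmc_overflow(val))
      		record_and_restart(event, val, regs);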
      
      To reproduce:
        $ cat strcpy.c
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>	/* for alarm() */
        int main(void)
        {
                char buf[256];

                alarm(5);	/* exit via SIGALRM after 5 seconds */
                while (1)
                        strcpy(buf, "string1");
        }
      
              $ perf record -e r20014 ./strcpy
              $ perf report -n > report.1
              $ perf stat -e r20014 > report.2
              # Compare report.1 and report.2
      Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      81331211
  13. 10 Jul 2012 (3 commits)
  14. 09 May 2012 (1 commit)
  15. 28 Mar 2012 (1 commit)
    • powerpc/perf: Fix instruction address sampling on 970 and Power4 · 1ce447b9
      Committed by Benjamin Herrenschmidt
      970 and Power4 don't support "continuous sampling" which means that
      when we aren't in marked instruction sampling mode (marked events),
      SIAR isn't updated with the last instruction sampled before the
      perf interrupt. On those processors, we must thus use the exception
      SRR0 value as the sampled instruction pointer.
      
      Those processors also don't support the SIPR and SIHV bits in MMCRA
      which means we need some kind of heuristic to decide if SIAR values
      represent kernel or user addresses.
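      
      A condensed sketch of both ideas (the flag name and the marked-event test are
      named illustratively here, not copied from the patch):
      
      	/* 970/Power4: SIAR is only updated for marked events */
      	if ((ppmu->flags & PPMU_NO_CONT_SAMPLING) && !marked)
      		ip = regs->nip;			/* fall back to the exception SRR0 */
      	else
      		ip = mfspr(SPRN_SIAR);
      
      	/* no SIPR/SIHV bits: infer the privilege level from the address itself */
      	misc = is_kernel_addr(ip) ? PERF_RECORD_MISC_KERNEL
      				  : PERF_RECORD_MISC_USER;
      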
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      1ce447b9
  16. 23 Feb 2012 (1 commit)
  17. 16 Feb 2012 (1 commit)
  18. 21 Dec 2011 (1 commit)
    • perf, arch: Rework perf_event_index() · 35edc2a5
      Committed by Peter Zijlstra
      Put the logic to compute the event index into a per pmu method. This
      is required because the x86 rules are weird and wonderful and don't
      match the capabilities of the current scheme.
      
      AFAIK only powerpc actually has a usable userspace read of the PMCs
      but I'm not at all sure anybody actually used that.
      
      ARM is restored to the default since it currently does not support
      userspace access at all. And all software events are provided with a
      method that reports their index as 0 (disabled).
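      
      Roughly, struct pmu grows a callback and software events get a stub that
      reports index 0 (field and function names as assumed here):
      
      	struct pmu {
      		/* ... */
      		/* index used for userspace reads of the counter (0 = disabled) */
      		int (*event_idx)(struct perf_event *event);
      	};
      
      	static int perf_swevent_event_idx(struct perf_event *event)
      	{
      		return 0;	/* software events: no userspace counter access */
      	}
      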
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Arun Sharma <asharma@fb.com>
      Link: http://lkml.kernel.org/n/tip-dfydxodki16lylkt3gl2j7cw@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      35edc2a5
  19. 19 Jul 2011 (1 commit)
  20. 01 Jul 2011 (1 commit)
    • perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Committed by Peter Zijlstra
      The nmi parameter indicated whether we could do wakeups from the current
      context; if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
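      
      At a typical call site (powerpc's record_and_restart(), for instance) the change
      looks roughly like this:
      
      	/* before */
      	if (perf_event_overflow(event, nmi, &data, regs))
      		power_pmu_stop(event, 0);
      
      	/* after: the nmi argument is gone; deferred wakeups go through irq_work */
      	if (perf_event_overflow(event, &data, regs))
      		power_pmu_stop(event, 0);
      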
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8b0ca17
  21. 18 Apr 2011 (1 commit)
    • powerpc/perf_event: Skip updating kernel counters if register value shrinks · 86c74ab3
      Committed by Eric B Munson
      Because of speculative event roll back, it is possible for some event counters
      to decrease between reads on POWER7.  This causes a problem with the way that
      counters are updated.  Delta values are calculated in a 64 bit value and the
      top 32 bits are masked.  If the register value has decreased, this leaves us
      with a very large positive value added to the kernel counters.  This patch
      protects against this by skipping the update if the delta would be negative.
      This can lead to a lack of precision in the counter values, but from my testing
      the value is typically fewer than 10 samples at a time.
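      
      A sketch of the guarded delta computation (the helper name matches later powerpc
      code, but treat the body as illustrative):
      
      	static u64 check_and_compute_delta(u64 prev, u64 val)
      	{
      		u64 delta = (val - prev) & 0xfffffffful;
      
      		/*
      		 * Speculative roll back can make the new value smaller than
      		 * the old one; a negative delta would wrap to a huge positive
      		 * number, so skip the update instead.
      		 */
      		if (prev > val)
      			delta = 0;
      
      		return delta;
      	}
      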
      Signed-off-by: Eric B Munson <emunson@mgebm.net>
      Cc: stable@kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      86c74ab3
  22. 31 Mar 2011 (1 commit)
  23. 16 Mar 2011 (1 commit)
  24. 17 Jan 2011 (1 commit)
    • powerpc: perf: Fix frequency calculation for overflowing counters · 4bca770e
      Committed by Anton Blanchard
      When profiling a benchmark that is almost 100% userspace, I noticed some wildly
      inaccurate profiles that showed almost all time spent in the kernel.
      
      Closer examination shows we were programming a tiny number of cycles into the
      PMU after each overflow (about ~200 away from the next overflow). This gets us
      stuck in a loop which we eventually break out of by throttling the PMU (there
      are regular throttle/unthrottle events in the log).
      
      It looks like we aren't setting event->hw.last_period to something sane and the
      frequency to period calculations in perf are going haywire.
      
      With the following patch we find the correct period after a few interrupts and
      stay there. I also see no more throttle events.
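      
      The essential change, sketched against the usual period-reload logic in
      record_and_restart() (the surrounding lines are paraphrased, not quoted):
      
      	if (left <= 0) {
      		left += period;
      		if (left <= 0)
      			left = period;
      		record = 1;
      		/* give the frequency-to-period feedback loop a sane base */
      		event->hw.last_period = event->hw.sample_period;
      	}
      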
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      LKML-Reference: <20110117161742.5feb3761@kryten>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4bca770e
  25. 16 Dec 2010 (1 commit)
    • perf: Dynamic pmu types · 2e80a82a
      Committed by Peter Zijlstra
      Extend the perf_pmu_register() interface to allow for named and
      dynamic pmu types.
      
      Because we need to support the existing static types we cannot use
      dynamic types for everything, hence provide a type argument.
      
      If we want to enumerate the PMUs they need a name, provide one.
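      
      The extended interface, roughly (the uncore PMU in the second call is a made-up
      example):
      
      	int perf_pmu_register(struct pmu *pmu, char *name, int type);
      
      	/* existing static type, now also enumerable by name in sysfs */
      	perf_pmu_register(&power_pmu, "cpu", PERF_TYPE_RAW);
      
      	/* a negative type asks the core to allocate a dynamic pmu type id */
      	perf_pmu_register(&my_uncore_pmu, "uncore_foo", -1);
      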
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101117222056.259707703@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2e80a82a
  26. 19 Oct 2010 (1 commit)
    • perf, powerpc: Fix power_pmu_event_init to not use event->ctx · 57fa7214
      Committed by Paul Mackerras
      Commit c3f00c70 ("perf: Separate find_get_context() from event
      initialization") changed the generic perf_event code to call
      perf_event_alloc, which calls the arch-specific event_init code,
      before looking up the context for the new event.  Unfortunately,
      power_pmu_event_init uses event->ctx->task to see whether the
      new event is a per-task event or a system-wide event, and thus
      crashes since event->ctx is NULL at the point where
      power_pmu_event_init gets called.
      
      (The reason it needs to know whether it is a per-task event is
      because there are some hardware events on Power systems which
      only count when the processor is not idle, and there are some
      fixed-function counters which count such events.  For example,
      the "run cycles" event counts cycles when the processor is not
      idle.  If the user asks to count cycles, we can use "run cycles"
      if this is a per-task event, since the processor is running when
      the task is running, by definition.  We can't use "run cycles"
      if the user asks for "cycles" on a system-wide counter.)
      
      Fortunately the information we need is in the
      event->attach_state field, so we just use that instead.
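      
      The change amounts to roughly the following in power_pmu_event_init() (sketch):
      
      	/* before: event->ctx is still NULL here, so this crashes */
      	if (event->ctx->task)
      		flags |= PPMU_ONLY_COUNT_RUN;
      
      	/* after: the attach state already says whether this is per-task */
      	if (event->attach_state & PERF_ATTACH_TASK)
      		flags |= PPMU_ONLY_COUNT_RUN;
      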
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20101019055535.GA10398@drongo>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Reported-by: Alexey Kardashevskiy <aik@au1.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      57fa7214