1. 25 Sep, 2014 1 commit
  2. 28 Jul, 2014 3 commits
    • M
      powerpc/perf: Add per-event excludes on Power8 · 9de5cb0f
      Michael Ellerman authored
      Power8 has a new register (MMCR2), which contains individual freeze bits
      for each counter. This is an improvement on previous chips as it means
      we can have multiple events on the PMU at the same time with different
      exclude_{user,kernel,hv} settings. Previously we had to ensure all
      events on the PMU had the same exclude settings.
      
      The core of the patch is fairly simple. We use the 207S feature flag to
      indicate that the PMU backend supports per-event excludes. If it's set,
      we skip the generic logic that enforces the equality of excludes between
      events. We also use that flag to skip setting the freeze bits in MMCR0,
      because the PMU backend is expected to have handled setting them in MMCR2.
      
      The complication arises with EBB. The FCxP bits in MMCR2 are readable
      and writable by a task using EBB, which means a task using EBB will be
      able to see that we are using MMCR2 for freezing, whereas the old logic,
      which used MMCR0, is not user visible.
      
      The task cannot see or affect exclude_kernel & exclude_hv, so we only
      need to consider exclude_user.
      
      The table below summarises the behaviour both before and after this
      commit is applied:
      
       exclude_user           true  false
       ------------------------------------
              | User visible |  N    N
       Before | Can freeze   |  Y    Y
              | Can unfreeze |  N    Y
       ------------------------------------
              | User visible |  Y    Y
        After | Can freeze   |  Y    Y
              | Can unfreeze |  Y/N  Y
       ------------------------------------
      
      So firstly I assert that the simple visibility of the exclude_user
      setting in MMCR2 is a non-issue. The event belongs to the task, and
      was most likely created by the task. So the exclude_user setting is not
      privileged information in any way.
      
      Secondly, the behaviour in the exclude_user = false case is unchanged.
      This is important as it is the case that is actually useful, ie. the
      event is created with no exclude setting and the task uses MMCR2 to
      implement exclusion manually.
      
      For exclude_user = true there is no meaningful change to freezing the
      event. Previously the task could use MMCR2 to freeze the event, though
      it was already frozen with MMCR0. With the new code the task can use
      MMCR2 to freeze the event, though it was already frozen with MMCR2.
      
      The only real change is when exclude_user = true and the task tries to
      use MMCR2 to unfreeze the event. Previously this had no effect, because
      the event was already frozen in MMCR0. With the new code the task can
      unfreeze the event in MMCR2, but at some indeterminate time in the
      future the kernel will overwrite its setting and refreeze the event.
      
      Therefore my final assertion is that any task using exclude_user = true
      and also fiddling with MMCR2 was deeply confused before this change, and
      remains so after it.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      9de5cb0f
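To make the per-counter freeze bits concrete, here is a minimal sketch of how exclude settings could map to MMCR2 bits. The bit positions (one 9-bit field per counter, PMC1 in the top bits) are an assumption, and `struct excludes` and `mmcr2_freeze_bits()` are hypothetical names for illustration, not the kernel's own. The mapping of exclude_kernel to the supervisor bit is also a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: one 9-bit field per counter, PMC1 in the most significant bits. */
#define MMCR2_FCS(pmc) (1ULL << (63 - (((pmc) - 1) * 9))) /* freeze in supervisor state */
#define MMCR2_FCP(pmc) (1ULL << (62 - (((pmc) - 1) * 9))) /* freeze in problem (user) state */
#define MMCR2_FCH(pmc) (1ULL << (57 - (((pmc) - 1) * 9))) /* freeze in hypervisor state */

/* Hypothetical, simplified stand-in for the exclude bits of one perf event. */
struct excludes {
    bool user;
    bool kernel;
    bool hv;
};

/* Return the MMCR2 freeze bits for one event placed on counter pmc (1..6). */
uint64_t mmcr2_freeze_bits(int pmc, struct excludes ex)
{
    uint64_t bits = 0;

    assert(pmc >= 1 && pmc <= 6);
    if (ex.user)
        bits |= MMCR2_FCP(pmc);
    if (ex.kernel)
        bits |= MMCR2_FCS(pmc); /* simplification: ignores hypervisor-mode hosts */
    if (ex.hv)
        bits |= MMCR2_FCH(pmc);
    return bits;
}
```

Because each counter has its own field, OR-ing the results for several events gives a single MMCR2 value in which every event keeps its own exclude settings.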
    • M
      powerpc/perf: Pass the struct perf_events down to compute_mmcr() · 8abd818f
      Michael Ellerman authored
      To support per-event exclude settings on Power8 we need access to the
      struct perf_events in compute_mmcr().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8abd818f
    • M
      powerpc/perf: Clear all MMCR settings before calling compute_mmcr() · 79a4cb28
      Michael Ellerman authored
      Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a
      risk that we could forget to set one of the values and use whatever
      value was in there previously.
      
      Currently all the implementations are careful to set all the values, but
      it's safer to clear them all before we call compute_mmcr().
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      79a4cb28
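The risk described in this commit can be illustrated with a small sketch. The struct, the array size of four, and the "forgetful" backend are all hypothetical; the source only names `cpuhw->mmcr` and `compute_mmcr()`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-CPU state; the array size is an assumption. */
struct cpu_hw_state {
    uint64_t mmcr[4];
};

/* Hypothetical backend that, by mistake, never writes mmcr[2]. */
void forgetful_compute_mmcr(uint64_t mmcr[4])
{
    mmcr[0] = 0x1;
    mmcr[1] = 0x2;
    mmcr[3] = 0x4;
}

/* Zero the reused array first, so an unset slot reads as 0 instead of a stale value. */
void prepare_mmcr(struct cpu_hw_state *cpuhw)
{
    memset(cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));
    forgetful_compute_mmcr(cpuhw->mmcr);
}
```

Without the `memset()`, the slot the backend forgot would keep whatever the previous call left behind.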
  3. 23 Jul, 2014 1 commit
  4. 11 Jul, 2014 3 commits
    • A
      powerpc/perf: Never program book3s PMCs with values >= 0x80000000 · f5602941
      Anton Blanchard authored
      We are seeing a lot of PMU warnings on POWER8:
      
          Can't find PMC that caused IRQ
      
      Looking closer, the active PMC is 0 at this point and we took a PMU
      exception on the transition from negative to 0. Some versions of POWER8
      have an issue where they edge detect and not level detect PMC overflows.
      
      A number of places program the PMC with (0x80000000 - period_left),
      where period_left can be negative. We can either fix all of these or
      just ensure that period_left is always >= 1.
      
      This patch takes the second option.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      f5602941
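A minimal sketch of the second option, assuming a hypothetical helper `pmc_start_value()` that turns the remaining period into the value written to the PMC. Returning 0 for periods that do not fit in the counter range is also an assumption of this sketch.

```c
#include <stdint.h>

/*
 * Clamp the remaining period to at least 1, then derive the PMC start value.
 * With the clamp, the result is always below 0x80000000, so the counter has
 * to count at least one event before it crosses the overflow threshold.
 */
uint32_t pmc_start_value(int64_t period_left)
{
    if (period_left < 1)
        period_left = 1;
    if (period_left >= 0x80000000LL)
        return 0; /* period does not fit in the counter range: start from 0 */
    return (uint32_t)(0x80000000LL - period_left);
}
```

A negative or zero `period_left` would otherwise yield a start value at or above 0x80000000, which is the case the commit message rules out.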
    • J
      powerpc/perf: Clear MMCR2 when enabling PMU · b50a6c58
      Joel Stanley authored
      On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze
      the PMU counters. Aside from on boot they are then never reset,
      resulting in stuck perf counters for any user in the guest or host.
      
      We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane
      state for perf to use the PMU counters under either the guest or the
      host.
      
      This was manifesting as a bug with ppc64_cpu --frequency:
      
          $ sudo ppc64_cpu --frequency
          WARNING: couldn't run on cpu 0
          WARNING: couldn't run on cpu 8
            ...
          WARNING: couldn't run on cpu 144
          WARNING: couldn't run on cpu 152
          min:    18446744073.710 GHz (cpu -1)
          max:    0.000 GHz (cpu -1)
          avg:    0.000 GHz
      
      The command uses a perf counter to measure CPU cycles over a fixed
      amount of time, in order to approximate the frequency of the machine.
      The counters were returning zero once a guest was started, regardless of
      whether it was still running or had been shut down.
      
      By dumping the value of MMCR2, it was observed that once a guest is
      running MMCR2 is set to 1s, which stops counters from running:
      
          $ sudo sh -c 'echo p > /proc/sysrq-trigger'
          CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
          PMC1:  5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
          PMC5:  1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
          MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
          MMCR2: fffffffffffffc00 EBBHR: 0000000000000000
          EBBRR: 0000000000000000 BESCR: 0000000000000000
          SIAR:  00000000000a51cc SDAR:  c00000000fc40000 SIER:  0000000001000000
      
      This is done unconditionally in book3s_hv_interrupts.S upon entering the
      guest, and the original value is only saved and restored if the host has
      indicated it was using the PMU. This is okay; however, the user of the
      PMU needs to ensure that it is in a defined state when it starts using
      it.
      
      Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b50a6c58
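The MMCR2 value in the dump above can be decoded with a short sketch. It assumes one 9-bit control field per counter with PMC1 in the top bits, a layout that matches the dump (6 counters times 9 bits leaves the low 10 bits clear). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MMCR2_BITS_PER_PMC 9 /* assumption, consistent with the dumped value */
#define MMCR2_NUM_PMCS     6 /* n_counters reported in the dump */

/* Extract the control field for counter pmc (1..6); PMC1 sits in the top bits. */
unsigned int mmcr2_field(uint64_t mmcr2, int pmc)
{
    assert(pmc >= 1 && pmc <= MMCR2_NUM_PMCS);
    return (unsigned int)((mmcr2 >> (64 - MMCR2_BITS_PER_PMC * pmc)) & 0x1ff);
}

/* Count the counters that have at least one control bit set. */
int mmcr2_pmcs_with_bits_set(uint64_t mmcr2)
{
    int n = 0;

    for (int pmc = 1; pmc <= MMCR2_NUM_PMCS; pmc++)
        if (mmcr2_field(mmcr2, pmc) != 0)
            n++;
    return n;
}
```

For the dumped value `fffffffffffffc00`, every one of the six fields has all nine bits set, while a cleared MMCR2 has none.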
    • J
      powerpc/perf: Add PPMU_ARCH_207S define · 4d9690dd
      Joel Stanley authored
      Instead of separate bits for every POWER8 PMU feature, have a single one
      for v2.07 of the architecture.
      
      This saves us adding a MMCR2 define for a future patch.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Joel Stanley <joel@jms.id.au>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      4d9690dd
  5. 24 Mar, 2014 4 commits
  6. 11 Feb, 2014 1 commit
    • A
      powerpc/perf: Configure BHRB filter before enabling PMU interrupts · b4d6c06c
      Anshuman Khandual authored
      Right now the config_bhrb() PMU specific call happens after
      write_mmcr0(), which actually enables the PMU for event counting and
      interrupts. So there is a small window of time where the PMU and BHRB
      run without the required HW branch filter (if any) enabled in BHRB.
      
      This can cause some of the branch samples to be collected through BHRB
      without any filter applied and hence affects the correctness of
      the results. This patch moves the BHRB config function call before
      enabling interrupts.
      
      Here are some data points captured via trace prints which depict how we
      could get PMU interrupts with BHRB filter NOT enabled with a standard
      perf record command line (asking for branch record information as well).
      
          $ perf record -j any_call ls
      
      Before the patch:-
      
          ls-1962  [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
          ls-1962  [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
          ...
      
          All the PMU interrupts before this point did not have the requested
          HW branch filter enabled in the MMCRA.
      
          ls-1962  [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
          ls-1962  [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000
      
      After the patch:-
      
          ls-1850  [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
          ls-1850  [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000
      
          All the PMU interrupts have the requested HW BHRB branch filter
          enabled in MMCRA.
      Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      [mpe: Fixed up whitespace and cleaned up changelog]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      b4d6c06c
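The ordering problem can be modelled in a few lines. This is a toy model, not the kernel code: `struct pmu_model` and the `model_*` and `enable_*` functions are hypothetical, and the two MMCRA constants are simply the values seen in the trace output above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMCRA_BASE        0x40000000000ULL /* MMCRA as traced without the filter */
#define MMCRA_BHRB_FILTER 0x00040000000ULL /* extra bit traced once the filter is set */

/* Hypothetical model of the state involved. */
struct pmu_model {
    uint64_t mmcra;
    bool     enabled;            /* set by the MMCR0 write */
    uint64_t mmcra_when_enabled; /* MMCRA as the first possible interrupt would see it */
};

void model_config_bhrb(struct pmu_model *p)
{
    p->mmcra |= MMCRA_BHRB_FILTER;
}

void model_write_mmcr0(struct pmu_model *p)
{
    p->enabled = true;
    p->mmcra_when_enabled = p->mmcra;
}

/* Order before the patch: the PMU is live before the filter is configured. */
void enable_old(struct pmu_model *p)
{
    model_write_mmcr0(p);
    model_config_bhrb(p);
}

/* Order after the patch: the filter is in place before the PMU is enabled. */
void enable_new(struct pmu_model *p)
{
    model_config_bhrb(p);
    model_write_mmcr0(p);
}
```

In the old order, an interrupt taken right after the MMCR0 write sees `40000000000`; in the new order it sees `40040000000`, matching the two traces.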
  7. 14 Aug, 2013 1 commit
  8. 01 Aug, 2013 1 commit
  9. 24 Jul, 2013 1 commit
  10. 01 Jul, 2013 6 commits
  11. 10 Jun, 2013 1 commit
    • M
      powerpc/perf: Fix deadlock caused by calling printk() in PMU exception · 6772faa1
      Michael Ellerman authored
      In commit bc09c219 "Fix finding overflowed PMC in interrupt" we added
      a printk() to the PMU exception handler. Unfortunately that is not safe.
      
      The problem is that the PMU exception may run even when interrupts are
      soft disabled, aka NMI context. We do this so that we can profile parts
      of the kernel that have interrupts soft-disabled.
      
      But by calling printk() from the exception handler, we can potentially
      deadlock in the printk code on logbuf_lock, eg:
      
        [c00000038ba575c0] c000000000081928 .vprintk_emit+0xa8/0x540
        [c00000038ba576a0] c0000000007bcde8 .printk+0x48/0x58
        [c00000038ba57710] c000000000076504 .perf_event_interrupt+0x2d4/0x490
        [c00000038ba57810] c00000000001f6f8 .performance_monitor_exception+0x48/0x60
        [c00000038ba57880] c0000000000032cc performance_monitor_common+0x14c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000007b25d4 ._raw_spin_lock_irq+0x64/0xc0
        [c00000038ba57bf0] c00000000007ed90 .devkmsg_read+0xd0/0x5a0
        [c00000038ba57d00] c0000000001c2934 .vfs_read+0xc4/0x1e0
        [c00000038ba57d90] c0000000001c2cd8 .SyS_read+0x58/0xd0
        [c00000038ba57e30] c000000000009d54 syscall_exit+0x0/0x98
        --- Exception: c01 (System Call) at 00001fffffbf6f7c
        SP (3ffff6d4de10) is in userspace
      
      Fix it by making sure we only call printk() when we are not in NMI
      context.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Cc: <stable@vger.kernel.org> # 3.9
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      6772faa1
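The shape of the fix is a simple guard. The sketch below is a user-space approximation: `warn_pmc_not_found()` and its `in_nmi` parameter are hypothetical, and `fprintf()` stands in for `printk()`.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Emit the warning only when no PMC was found AND we are not in NMI context,
 * since logging from NMI context can deadlock on the log buffer lock.
 * Returns true if the warning was emitted.
 */
bool warn_pmc_not_found(bool found, bool in_nmi)
{
    if (found || in_nmi)
        return false;
    fprintf(stderr, "Can't find PMC that caused IRQ\n");
    return true;
}
```

The cost of this choice is that the warning is silently dropped when the exception interrupts a soft-disabled region.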
  12. 01 Jun, 2013 2 commits
  13. 14 May, 2013 3 commits
  14. 26 Apr, 2013 6 commits
  15. 01 Feb, 2013 1 commit
    • S
      perf/POWER7: Make generic event translations available in sysfs · 1c53a270
      Sukadev Bhattiprolu authored
      Make the generic perf events in POWER7 available via sysfs.
      
      	$ ls /sys/bus/event_source/devices/cpu/events
      	branch-instructions
      	branch-misses
      	cache-misses
      	cache-references
      	cpu-cycles
      	instructions
      	stalled-cycles-backend
      	stalled-cycles-frontend
      
      	$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
      	event=0x400f0
      
      This patch is based on commits that implement this functionality on x86.
      Eg:
      	commit a4747393
      	Author: Jiri Olsa <jolsa@redhat.com>
      	Date:   Wed Oct 10 14:53:11 2012 +0200
      
      	    perf/x86: Make hardware event translations available in sysfs
      
      Changelog:[v2]
      	[Jiri Olsa] Drop EVENT_ID() macro since it is only used once.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Anton Blanchard <anton@au1.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: linuxppc-dev@ozlabs.org
      Link: http://lkml.kernel.org/r/20130123062454.GD13720@us.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      1c53a270
  16. 29 Jan, 2013 1 commit
  17. 10 Jan, 2013 2 commits
    • M
      powerpc/perf: Fix for PMCs not making progress · e13e895f
      Michael Neuling authored
      On POWER7 when we have really small counts left before overflow, we can take a
      PMU IRQ, but the PMC gets wound back to just before the overflow.
      
      If the kernel is setting the PMC to a value just before the overflow, we can
      get interrupted again without the PMC making any progress (ie another buggy
      overflow).  In this case, we can end up making no forward progress, with the
      PMC interrupt returning us to the same count over and over.
      
      The below detects when we are making no forward progress (ie. delta = 0) and
      then increases the amount left before the overflow.  This stops us from locking
      up.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      cc: Paul Mackerras <paulus@samba.org>
      cc: Anton Blanchard <anton@samba.org>
      cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      e13e895f
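A rough sketch of the no-progress detection follows. `adjust_left()` reflects the rule stated in the commit message (delta = 0 means bump the amount left). The hardware model in `model_delta()` is purely hypothetical and exists only to show that the bump eventually breaks the loop.

```c
#include <stdint.h>

/* If the counter made no progress since the last read, leave more room before overflow. */
int64_t adjust_left(int64_t left, uint64_t delta)
{
    if (delta == 0)
        left++;
    return left;
}

/*
 * Hypothetical hardware model: when left is at or below stuck_below, the
 * interrupt fires but the counter is wound back, so no events are counted.
 */
uint64_t model_delta(int64_t left, int64_t stuck_below)
{
    return left <= stuck_below ? 0 : (uint64_t)left;
}

/* Number of interrupts taken until one makes progress, or -1 if none does. */
int interrupts_until_progress(int64_t left, int64_t stuck_below, int max_irqs)
{
    for (int irq = 1; irq <= max_irqs; irq++) {
        uint64_t delta = model_delta(left, stuck_below);

        if (delta != 0)
            return irq;
        left = adjust_left(left, delta);
    }
    return -1;
}
```

Without the bump in `adjust_left()`, the modelled counter would be reprogrammed to the same value on every interrupt and the loop would never terminate on its own.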
    • M
      powerpc/perf: Fix finding overflowed PMC in interrupt · bc09c219
      Michael Neuling authored
      If a PMC is about to overflow on a counter that's on an active perf event
      (ie. less than 256 from the end) and a _different_ PMC overflows just at this
      time (a PMC that's not on an active perf event), we currently mark the event as
      found, even though it is most likely the other PMC that caused the IRQ.
      Because we mark it as found, the second catch-all for overflows doesn't run,
      and we never reset the overflowing PMC.  Hence we keep hitting that same
      PMC IRQ over and over.
      
      This is a rewrite of the perf interrupt handler for book3s to get around this.
      We now check to see if any of the PMCs have actually overflowed (ie >=
      0x80000000).  If yes, record it for active counters and just reset it for
      inactive counters.  If it's not overflowed, then we check to see if it's one of
      the buggy power7 counters and if it is, record it and continue.  If none of the
      PMCs match this, then we make note that we couldn't find the PMC that caused
      the IRQ.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      cc: Paul Mackerras <paulus@samba.org>
      cc: Anton Blanchard <anton@samba.org>
      cc: Linux PPC dev <linuxppc-dev@ozlabs.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      bc09c219
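The decision logic described above can be sketched per counter. This is a simplification under stated assumptions: the enum, the function names, and the `is_power7` flag are hypothetical, and the "less than 256 from the end" threshold is taken from the commit message rather than from the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* A PMC counts as overflowed once it reaches 0x80000000. */
bool pmc_overflowed(uint32_t val)
{
    return val >= 0x80000000u;
}

/* Not yet overflowed, but less than 256 counts away from it. */
bool pmc_nearly_overflowed(uint32_t val)
{
    return !pmc_overflowed(val) && (0x80000000u - val) < 256;
}

enum pmc_action { PMC_IGNORE, PMC_RECORD, PMC_RESET };

/* Decide what to do with one counter when the PMU interrupt fires. */
enum pmc_action classify_pmc(uint32_t val, bool active, bool is_power7)
{
    if (pmc_overflowed(val))
        return active ? PMC_RECORD : PMC_RESET;
    if (active && is_power7 && pmc_nearly_overflowed(val))
        return PMC_RECORD; /* buggy POWER7 early interrupt */
    return PMC_IGNORE;
}
```

If every counter comes back as `PMC_IGNORE`, the handler is in the "couldn't find the PMC that caused the IRQ" case.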
  18. 18 Oct, 2012 1 commit
  19. 27 Sep, 2012 1 commit