1. 11 Apr, 2017 10 commits
  2. 04 Apr, 2017 1 commit
    • perf: qcom: Add L3 cache PMU driver · 3071f13d
      Committed by Agustin Vega-Frias
      This adds a new dynamic PMU to the Perf Events framework to program
      and control the L3 cache PMUs in some Qualcomm Technologies SOCs.
      
      The driver supports a distributed cache architecture where the overall
      cache for a socket is comprised of multiple slices each with its own PMU.
      Access to each individual PMU is provided even though all CPUs share all
      the slices. User space needs to aggregate the individual counts to provide
      a global picture.
      
      The driver exports formatting and event information to sysfs so it can
      be used by the perf user space tools with the syntaxes:
         perf stat -a -e l3cache_0_0/read-miss/
         perf stat -a -e l3cache_0_0/event=0x21/
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
      [will: fixed sparse issues]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      3071f13d
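
The user-space aggregation described in this commit message could be sketched as follows. This is a minimal illustration, not part of the driver: it assumes the per-PMU counts for one event have already been collected into a dictionary, and it assumes PMU instance names follow an `l3cache_<socket>_<slice>` pattern, which is only inferred from the `l3cache_0_0` example. The function name `aggregate_l3_counts` and the sample numbers are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming scheme: l3cache_<socket>_<slice>, inferred from "l3cache_0_0".
_PMU_NAME = re.compile(r"^l3cache_(\d+)_(\d+)$")


def aggregate_l3_counts(per_pmu_counts):
    """Sum per-slice event counts into one total per socket.

    per_pmu_counts maps a PMU instance name to the count read for one event.
    Names that do not match the assumed pattern are ignored.
    """
    totals = defaultdict(int)
    for name, count in per_pmu_counts.items():
        match = _PMU_NAME.match(name)
        if match is None:
            continue
        socket = int(match.group(1))
        totals[socket] += count
    return dict(totals)


# Hypothetical read-miss counts: three slices on socket 0, one on socket 1.
sample = {
    "l3cache_0_0": 120,
    "l3cache_0_1": 80,
    "l3cache_0_2": 100,
    "l3cache_1_0": 40,
}
print(aggregate_l3_counts(sample))
```

Grouping by the first index keeps the per-socket totals separate; a caller who wants a single system-wide figure can sum the returned values.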
  3. 01 Apr, 2017 3 commits
    • drivers/perf: arm_pmu: split irq request from enable · c09adab0
      Committed by Mark Rutland
      For historical reasons, we lazily request and free interrupts in the
      arm pmu driver. This requires us to refcount use of the pmu (by way of
      counting the active events) in order to request/free interrupts at the
      correct times, which complicates the driver somewhat.
      
      The existing logic is flawed, as it only considers currently online CPUs
      when requesting, freeing, or managing the affinity of interrupts.
      Intervening hotplug events can result in erroneous IRQ affinity, online
      CPUs for which interrupts have not been requested, or offline CPUs whose
      interrupts are still requested.
      
      To fix this, this patch splits the requesting of interrupts from any
      per-cpu management (i.e. per-cpu enable/disable, and configuration of
      cpu affinity). We now request all interrupts up-front at probe time (and
      never free them, since we never unregister PMUs).
      
      The management of affinity, and per-cpu enable/disable now happens in
      our cpu hotplug callback, ensuring it occurs consistently. This means
      that we must now invoke the CPU hotplug callback at boot time in order
      to configure IRQs, and since the callback also resets the PMU hardware,
      we can remove the duplicate reset in the probe path.
      
      This rework renders our event refcounting unnecessary, so this is
      removed.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [will: make armpmu_get_cpu_irq static]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      c09adab0
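
The split between requesting interrupts and managing them per CPU can be illustrated with a small user-space toy model. This is not kernel code and does not reproduce the driver's functions: the class `ToyPmu`, its method names, and the IRQ numbers are all hypothetical, chosen only to show that every interrupt is requested once at probe time while enabling and disabling follows CPU hotplug.

```python
class ToyPmu:
    """Toy model of a PMU with one interrupt per CPU (not kernel code)."""

    def __init__(self, cpu_irqs):
        # cpu_irqs maps each supported CPU to its interrupt number. The
        # affinity is static, so it is recorded once.
        self.cpu_irqs = dict(cpu_irqs)
        self.requested = set()  # IRQs requested from the interrupt layer
        self.enabled = set()    # IRQs currently enabled on an online CPU

    def probe(self, online_cpus):
        # Request every interrupt up front, including those of offline CPUs.
        self.requested = set(self.cpu_irqs.values())
        # Invoke the hotplug callback for CPUs that are already online.
        for cpu in online_cpus:
            self.cpu_starting(cpu)

    def cpu_starting(self, cpu):
        # Per-CPU enable lives in the hotplug callback.
        irq = self.cpu_irqs.get(cpu)
        if irq is not None:
            self.enabled.add(irq)

    def cpu_dying(self, cpu):
        # Per-CPU disable; the IRQ stays requested.
        irq = self.cpu_irqs.get(cpu)
        if irq is not None:
            self.enabled.discard(irq)


pmu = ToyPmu({0: 32, 1: 33, 2: 34, 3: 35})
pmu.probe(online_cpus=[0, 1])
pmu.cpu_starting(2)  # CPU 2 comes online later
pmu.cpu_dying(0)     # CPU 0 goes offline
print(sorted(pmu.requested), sorted(pmu.enabled))
```

In this model no event refcount is needed: the set of requested interrupts never changes after probe, and the enabled set depends only on hotplug transitions.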
    • drivers/perf: arm_pmu: manage interrupts per-cpu · 7ed98e01
      Committed by Mark Rutland
      When requesting or freeing interrupts, we use platform_get_irq() to find
      relevant irqs, backing this up with additional information in an
      optional irq_affinity table.
      
      This means that our irq request and free paths are tied to a
      platform_device, and our request path must jump through a number of
      hoops in order to determine the required affinity of each interrupt.
      
      Given that the affinity must be static, we can compute the affinity once
      up-front at probe time, simplifying the irq request and free paths. By
      recording interrupts in a per-cpu data structure, we simplify a few
      paths, and permit a subsequent rework of the request and free paths.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [will: rename local nr_irqs variable to avoid conflict with global]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      7ed98e01
    • drivers/perf: arm_pmu: rework per-cpu allocation · 2681f018
      Committed by Mark Rutland
      For historical reasons, we allocate per-cpu data associated with a PMU
      rather late, in cpu_pmu_init, after we've parsed whatever hardware
      information we were provided with.
      
      In order to allow us to store some per-cpu data early in the probe
      path, we need to allocate (and initialise) the per-cpu data earlier.
      This patch reworks the way we allocate the pmu and associated per-cpu
      data in order to make that possible.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      [will: make armpmu_{alloc,free} static]
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      2681f018
  4. 02 Mar, 2017 1 commit
  5. 09 Feb, 2017 1 commit
  6. 04 Feb, 2017 1 commit
    • perf: xgene: Include module.h · c0bfc549
      Committed by Stephen Boyd
      I ran into a build error when I disabled CONFIG_ACPI and tried to
      compile this driver:
      
      drivers/perf/xgene_pmu.c:1242:1: warning: data definition has no type or storage class
       MODULE_DEVICE_TABLE(of, xgene_pmu_of_match);
       ^
      drivers/perf/xgene_pmu.c:1242:1: error: type defaults to 'int' in declaration of 'MODULE_DEVICE_TABLE' [-Werror=implicit-int]
      
      Include module.h for the MODULE_DEVICE_TABLE macro that's
      implicitly included through ACPI.
      Tested-by: Tai Nguyen <ttnguyen@apm.com>
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      c0bfc549
  7. 25 Dec, 2016 1 commit
  8. 17 Oct, 2016 1 commit
  9. 17 Sep, 2016 1 commit
  10. 16 Sep, 2016 1 commit
  11. 09 Sep, 2016 3 commits
    • drivers/perf: arm_pmu: expose a cpumask in sysfs · 48538b58
      Committed by Mark Rutland
      In systems with heterogeneous CPUs, there are multiple logical CPU PMUs,
      each of which covers a subset of CPUs in the system. In some cases
      userspace needs to know which CPUs a given logical PMU covers, so we'd
      like to expose a cpumask under sysfs, similar to what is done for uncore
      PMUs.
      
      Unfortunately, prior to commit 00e727bb ("perf stat: Balance
      opening and reading events"), perf stat only correctly handled a cpumask
      holding a single CPU, and only when profiling in system-wide mode. In
      other cases, the presence of a cpumask file could cause perf stat to
      behave erratically.
      
      Thus, exposing a cpumask file would break older perf binaries in cases
      where they would otherwise work.
      
      To avoid this issue while still providing userspace with the information
      it needs, this patch exposes a differently-named file (cpus) under
      sysfs. New tools can look for this and operate correctly, while older
      tools will not be adversely affected by its presence.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      48538b58
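
A newer tool that looks for the `cpus` file could turn its contents into a set of CPU numbers roughly as sketched here. This assumes the file holds a CPU list in the usual kernel range notation (for example `0-3,8`); that format is an assumption, not something this commit message states, and the helper name `parse_cpu_list` is hypothetical.

```python
def parse_cpu_list(text):
    """Parse a CPU list such as "0-3,8" into a set of CPU numbers.

    The range notation is assumed; an empty string yields an empty set.
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(chunk))
    return cpus


# Example with a hypothetical file content for a PMU covering CPUs 0-3 and 8.
print(sorted(parse_cpu_list("0-3,8\n")))
```

A tool can then restrict the CPUs on which it opens events for that PMU to the returned set, and fall back to its previous behaviour when the file is absent.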
    • drivers/perf: arm_pmu: only use common attr_groups · 1589680d
      Committed by Mark Rutland
      Now that the 32-bit and 64-bit perf backends use the common groups
      directly, remove the fallback and no longer allow the groups array to be
      overridden.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      1589680d
    • drivers/perf: arm_pmu: add common attr group fields · 86cdd72a
      Committed by Mark Rutland
      In preparation for adding common attribute groups, add an array of
      attribute group pointers to arm_pmu, which will be used if the
      backend hasn't already set pmu::attr_groups.
      
      Subsequent patches will move backends over to using these, before adding
      common fields.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      86cdd72a
  12. 07 Sep, 2016 1 commit
  13. 03 Sep, 2016 3 commits
  14. 10 Aug, 2016 2 commits
    • drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property · 7f1d642f
      Committed by Marc Zyngier
      Patch 19a469a5 ("drivers/perf: arm-pmu: Handle per-interrupt
      affinity mask") added support for partitioned PPI setups, but
      inadvertently broke setups using SPIs without the "interrupt-affinity"
      property (which is the case for UP platforms).
      
      This patch restores the broken functionality by testing whether the
      interrupt is percpu or not instead of relying on the using_spi flag
      that really means "SPI *and* interrupt-affinity property".
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Fixes: 19a469a5 ("drivers/perf: arm-pmu: Handle per-interrupt affinity mask")
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      7f1d642f
    • drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock · a026bb12
      Committed by Sudeep Holla
      arm_pmu_mutex is never held long and we don't want to sleep while the
      lock is being held as it's executed in the context of hotplug notifiers.
      So it can be converted to a simple spinlock instead.
      
      Without this patch we get the following warning:
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/2
      no locks held by swapper/2/0.
      irq event stamp: 381314
      hardirqs last  enabled at (381313): _raw_spin_unlock_irqrestore+0x7c/0x88
      hardirqs last disabled at (381314): cpu_die+0x28/0x48
      softirqs last  enabled at (381294): _local_bh_enable+0x28/0x50
      softirqs last disabled at (381293): irq_enter+0x58/0x78
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.7.0 #12
      Call trace:
       dump_backtrace+0x0/0x220
       show_stack+0x24/0x30
       dump_stack+0xb4/0xf0
       ___might_sleep+0x1d8/0x1f0
       __might_sleep+0x5c/0x98
       mutex_lock_nested+0x54/0x400
       arm_perf_starting_cpu+0x34/0xb0
       cpuhp_invoke_callback+0x88/0x3d8
       notify_cpu_starting+0x78/0x98
       secondary_start_kernel+0x108/0x1a8
      
      This patch converts the mutex to a spinlock to eliminate the above
      warnings. This constrains pmu->reset to be a non-blocking call, which is
      the case with all the ARM PMU backends.
      
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Fixes: 37b502f1 ("arm/perf: Fix hotplug state machine conversion")
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      a026bb12
  15. 20 Jul, 2016 1 commit
    • arm/perf: Fix hotplug state machine conversion · 37b502f1
      Committed by Sebastian Andrzej Siewior
      Mark Rutland pointed out that this commit is incomplete:
      
        7d88eb69 ("arm/perf: Convert to hotplug state machine")
      
      The problem is that:
      
       > We may have multiple PMUs (e.g. two in big.LITTLE systems), and
       > __oprofile_cpu_pmu only contains one of these. So this conversion is not
       > correct.
       >
       > We were relying on the notifier list implicitly containing a list of
       > those PMUs. It seems like we need an explicit list here.
       >
       > We keep __oprofile_cpu_pmu around for legacy 32-bit users of OProfile
       > (on non-heterogeneous systems), and that's all that the variable should
       > be used for.
      
      Introduce arm_pmu_list to correctly handle multiple PMUs in the system.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-tip-commits@vger.kernel.org
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160719111733.GA22911@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      37b502f1
  16. 15 Jul, 2016 1 commit
  17. 09 Jul, 2016 1 commit
  18. 15 Jun, 2016 1 commit
  19. 03 Jun, 2016 3 commits
  20. 05 May, 2016 1 commit
    • perf/arm: Special-case heterogeneous CPUs · 5101ef20
      Committed by Mark Rutland
      Commit:
      
        26657848 ("perf/core: Verify we have a single perf_hw_context PMU")
      
      forcefully prevents multiple PMUs from sharing perf_hw_context, as this
      generally doesn't make sense. It is a common bug for uncore PMUs to
      use perf_hw_context rather than perf_invalid_context, which this detects.
      
      However, systems exist with heterogeneous CPUs (and hence heterogeneous
      HW PMUs), for which sharing perf_hw_context is necessary, and possible
      in some limited cases.
      
      To make this work we have to perform some gymnastics, as we did in these
      commits:
      
        66eb579e ("perf: allow for PMU-specific event filtering")
        c904e32a ("arm: perf: filter unschedulable events")
      
      To allow those systems to work, we must allow PMUs for heterogeneous
      CPUs to share perf_hw_context, though we must still disallow sharing
      otherwise to detect the common misuse of perf_hw_context.
      
      This patch adds a new PERF_PMU_CAP_HETEROGENEOUS_CPUS for this, updates
      the core logic to account for this, and makes use of it in the arm_pmu
      code that is used for systems with heterogeneous CPUs. Comments are
      added to make the rationale clear and hopefully avoid accidental abuse.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/20160426103346.GA20836@leverpostej
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5101ef20
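
The sharing rule this commit describes can be pictured with a user-space toy model. It is a sketch under stated assumptions rather than the core perf logic: the class `ToyPmuRegistry`, the boolean `heterogeneous_cpus` parameter (standing in for PERF_PMU_CAP_HETEROGENEOUS_CPUS), and the choice to demote an offending PMU to perf_invalid_context are all illustrative, and the real code may differ in detail.

```python
PERF_HW_CONTEXT = "perf_hw_context"
PERF_INVALID_CONTEXT = "perf_invalid_context"


class ToyPmuRegistry:
    """Toy model of the 'single perf_hw_context PMU' check (not kernel code)."""

    def __init__(self):
        self.hw_context_taken = False

    def register(self, context, heterogeneous_cpus=False):
        """Return the context the PMU ends up with after registration."""
        if context != PERF_HW_CONTEXT:
            return context
        if self.hw_context_taken and not heterogeneous_cpus:
            # A second, unflagged user of perf_hw_context is treated as the
            # common uncore-PMU bug and demoted in this toy model.
            return PERF_INVALID_CONTEXT
        self.hw_context_taken = True
        return context


registry = ToyPmuRegistry()
# Two CPU PMUs of a heterogeneous system may share perf_hw_context.
print(registry.register(PERF_HW_CONTEXT, heterogeneous_cpus=True))
print(registry.register(PERF_HW_CONTEXT, heterogeneous_cpus=True))
# An unflagged PMU asking for perf_hw_context afterwards is still caught.
print(registry.register(PERF_HW_CONTEXT))
```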
  21. 21 Apr, 2016 1 commit
    • drivers/perf: arm-pmu: fix RCU usage on pmu resume from low-power · cbcc72e0
      Committed by Lorenzo Pieralisi
      Commit da4e4f18 ("drivers/perf: arm_pmu: implement CPU_PM notifier")
      added code in the arm perf infrastructure that allows the kernel to
      save/restore perf counters whenever the CPU enters a low-power
      state. The kernel saves/restores the counters for each active event
      through the armpmu_{stop/start} ARM pmu API, so that the low-power state
      enter/exit cycle is emulated through pmu start/stop operations for each
      event in use.
      
      However, calling armpmu_start() for each active event on power up
      executes code that requires RCU locking (perf_event_update_userpage())
      to be functional. Since the core may call the CPU_PM notifiers while
      running the idle thread in a quiescent RCU state, this is not allowed,
      as detected through the following splat when the kernel is run with
      CONFIG_PROVE_LOCKING enabled:
      
      [   49.293286]
      [   49.294761] ===============================
      [   49.298895] [ INFO: suspicious RCU usage. ]
      [   49.303031] 4.6.0-rc3+ #421 Not tainted
      [   49.306821] -------------------------------
      [   49.310956] include/linux/rcupdate.h:872 rcu_read_lock() used illegally while idle!
      [   49.318530]
      [   49.318530] other info that might help us debug this:
      [   49.318530]
      [   49.326451]
      [   49.326451] RCU used illegally from idle CPU!
      [   49.326451] rcu_scheduler_active = 1, debug_locks = 0
      [   49.337209] RCU used illegally from extended quiescent state!
      [   49.342892] 2 locks held by swapper/2/0:
      [   49.346768]  #0:  (cpu_pm_notifier_lock){......}, at: [<ffffff8008163c28>] cpu_pm_exit+0x18/0x80
      [   49.355492]  #1:  (rcu_read_lock){......}, at: [<ffffff800816dc38>] perf_event_update_userpage+0x0/0x260
      
      This patch wraps the armpmu_start() call (that indirectly calls
      perf_event_update_userpage()) on CPU_PM notifier power state exit (or
      failed entry) within the RCU_NONIDLE() macro so that the RCU subsystem
      is made aware the calling cpu is not idle from an RCU perspective for
      the armpmu_start() call duration, therefore fixing the issue.
      
      Fixes: da4e4f18 ("drivers/perf: arm_pmu: implement CPU_PM notifier")
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Reported-by: James Morse <james.morse@arm.com>
      Suggested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
      Cc: Kevin Hilman <khilman@baylibre.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      cbcc72e0
  22. 21 Mar, 2016 1 commit