提交 · f1fb60bfde65fe4c4372d480d1b5d57bdba20367 · openanolis / cloud-kernel

05 7月, 2016 5 次提交

powerpc/perf: Export Power9 generic and cache events to sysfs · f1fb60bf

由 Madhavan Srinivasan 提交于 6月 26, 2016

Export the generic hardware and cache perf events for Power9 to sysfs,
so users can determine the PMU event monitored.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f1fb60bf

powerpc/perf: Power9 PMU support · 8c002dbd

由 Madhavan Srinivasan 提交于 6月 26, 2016

This patch adds base enablement for the power9 PMU.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8c002dbd

powerpc/perf: Add power9 event list macros for generic and cache events · 34922527

由 Madhavan Srinivasan 提交于 6月 26, 2016

Add macros for the generic and cache events on Power9
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

34922527

powerpc/perf: factor out power8 pmu functions · 7ffd948f

由 Madhavan Srinivasan 提交于 6月 26, 2016

Factor out some of the power8 pmu functions
to new file "isa207-common.c" to share with
power9 pmu code. Only code movement and no
logic change
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7ffd948f

powerpc/perf: factor out power8 pmu macros and defines · 4d3576b2

由 Madhavan Srinivasan 提交于 6月 26, 2016

Factor out some of the power8 pmu macros to
new a header file to share with power9 pmu code.
Just code movement and no logic change.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4d3576b2

14 6月, 2016 1 次提交

powerpc: Various typo fixes · 027dfac6

由 Michael Ellerman 提交于 6月 01, 2016

Signed-off-by: NAndrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

027dfac6

17 5月, 2016 3 次提交

perf core: Add perf_callchain_store_context() helper · 3e4de4ec

由 Arnaldo Carvalho de Melo 提交于 5月 12, 2016

We need have different helpers to account how many contexts we have in
the sample and for real addresses, so do it now as a prep patch, to
ease review.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3e4de4ec

perf core: Add a 'nr' field to perf_event_callchain_context · 3b1fff08

由 Arnaldo Carvalho de Melo 提交于 5月 10, 2016

We will use it to count how many addresses are in the entry->ip[] array,
excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
return the number of entries specified by the user via the relevant
sysctl, kernel.perf_event_max_contexts, or via the per event
perf_event_attr.sample_max_stack knob.

This way we keep the perf_sample->ip_callchain->nr meaning, that is the
number of entries, be it real addresses or PERF_CONTEXT_ entries, while
honouring the max_stack knobs, i.e. the end result will be max_stack
entries if we have at least that many entries in a given stack trace.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3b1fff08

perf core: Pass max stack as a perf_callchain_entry context · cfbcf468

由 Arnaldo Carvalho de Melo 提交于 4月 28, 2016

This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

cfbcf468

01 5月, 2016 1 次提交

powerpc/mm: Use pte_user() instead of open coding · e7bfc462

由 Aneesh Kumar K.V 提交于 4月 29, 2016

We have a common declaration in pte-common.h Add a book3s specific one
and switch to pte_user() in callchain.c. In a subsequent patch we will
switch _PAGE_USER to _PAGE_PRIVILEGED in the book3s version only.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e7bfc462

27 4月, 2016 2 次提交

perf core: Allow setting up max frame stack depth via sysctl · c5dfd78e

由 Arnaldo Carvalho de Melo 提交于 4月 21, 2016

The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.

And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.

The new file is:

  # cat /proc/sys/kernel/perf_event_max_stack
  127

Chaging it:

  # echo 256 > /proc/sys/kernel/perf_event_max_stack
  # cat /proc/sys/kernel/perf_event_max_stack
  256

But as soon as there is some event using callchains we get:

  # echo 512 > /proc/sys/kernel/perf_event_max_stack
  -bash: echo: write error: Device or resource busy
  #

Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.
Reported-and-Tested-by: NBrendan Gregg <brendan.d.gregg@gmail.com>
Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

c5dfd78e

powerpc/perf: Replace raw event hex values with #defines · 5bcca743

由 Madhavan Srinivasan 提交于 4月 21, 2016

Minor cleanup patch to replace the raw event hex values in
power8-pmu.c with #defines.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5bcca743

21 4月, 2016 1 次提交

powerpc/perf: Add support for sampling interrupt register state · ed4a4ef8

由 Anju T 提交于 2月 20, 2016

The perf infrastructure uses a bit mask to find out valid registers to
display. Define a register mask for supported registers defined in
uapi/asm/perf_regs.h. The bit positions also correspond to register IDs
which is used by perf infrastructure to fetch the register values.
CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state.
Signed-off-by: NAnju T <anju@linux.vnet.ibm.com>
[mpe: Add license, use CONFIG_PPC64, fix 32-bit build]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ed4a4ef8

10 3月, 2016 7 次提交

powerpc/perf: Fix misleading comment in pmao_restore_workaround() · 58bffb5b

由 Madhavan Srinivasan 提交于 3月 04, 2016

The current comment in pmao_restore_workaround() regarding
hard_irq_disable() is wrong. It should say to hard *disable* interrupts
instead of *enable*. Fix it.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

58bffb5b

powerpc/perf/24x7: Eliminate domain suffix in event names · 8f69dc70

由 Sukadev Bhattiprolu 提交于 2月 16, 2016

The Physical Core events of the 24x7 PMU can be monitored across various
domains (physical core, vcpu home core, vcpu home node etc). For each of
these core events, we currently create multiple events in sysfs, one for
each domain the event can be monitored in. These events are distinguished
by their suffixes like __PHYS_CORE, __VCPU_HOME_CORE etc.

Rather than creating multiple such entries, we could let the user specify
make 'domain' index a required parameter and let the user specify a value
for it (like they currently specify the core index).

	$ cat /sys/bus/event_source/devices/hv_24x7/events/HPM_CCYC
	domain=?,offset=0x98,core=?,lpar=0x0

	$ perf stat -C 0 -e hv_24x7/HPM_CCYC,domain=2,core=1/ true

(the 'domain=?' and 'core=?' in sysfs tell perf tool to enforce them as
required parameters).

This simplifies the interface and allows users to identify events by the
name specified in the catalog (User can determine the domain index by
referring to '/sys/bus/event_source/devices/hv_24x7/interface/domains').

Eliminating the event suffix eliminates several functions and simplifies
code.

Note that Physical Chip events can only be monitored in the chip domain
so those events have the domain set to 1 (rather than =?) and users don't
need to specify the domain index for the Chip events.

	$ cat /sys/bus/event_source/devices/hv_24x7/events/PM_XLINK_CYCLES
	domain=1,offset=0x230,chip=?,lpar=0x0

	$ perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES,chip=1/ true
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8f69dc70

powerpc/perf/hv-24x7: Display domain indices in sysfs · d34171e8

由 Sukadev Bhattiprolu 提交于 2月 16, 2016

To help users determine domains, display the domain indices used by the
kernel in sysfs.

	$ cat /sys/bus/event_source/devices/hv_24x7/interface/domains
	1: Physical Chip
	2: Physical Core
	3: VCPU Home Core
	4: VCPU Home Chip
	5: VCPU Home Node
	6: VCPU Remote Node
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d34171e8

powerpc/perf/hv-24x7: Display change in counter values · 2b206ee6

由 Sukadev Bhattiprolu 提交于 1月 25, 2016

For 24x7 counters, perf displays the raw value of the 24x7 counter, which
is a monotonically increasing value.

	perf stat -C 0 -e \
		'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
		sleep 1

 Performance counter stats for 'CPU(s) 0':

     9,105,403,170      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

       0.000425751 seconds time elapsed

In the typical usage of 'perf stat' this counter value is not as useful
as the _change_ in the counter value over the duration of the application.

Have h_24x7_event_init() set the event's prev_count to the raw value of
the 24x7 counter at the time of initialization. When the application
terminates, hv_24x7_event_read() will compute the change in value and
report to the perf tool. Similarly, for the transaction interface, clear
the event count to 0 at the beginning of the transaction.

	perf stat -C 0 -e \
		'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
		sleep 1

 Performance counter stats for 'CPU(s) 0':

           245,758      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/

       1.006366383 seconds time elapsed
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2b206ee6

powerpc/perf/hv-24x7: Fix usage with chip events. · e5a5886d

由 Sukadev Bhattiprolu 提交于 1月 23, 2016

24x7 counters can belong to different domains (core, chip, virtual CPU
etc). For events in the 'chip' domain, sysfs entry currently looks like:

	$ cd /sys/bus/event_source/devices/hv_24x7/events
	$ cat PM_XLINK_CYCLES__PHYS_CHIP
	domain=0x1,offset=0x230,core=?,lpar=0x0

where the required parameter, 'core=?' is specified with perf as:

	perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,core=1/ \
		/bin/true

This is inconsistent in that 'core' is a required parameter for a chip
event.  Instead, have the the sysfs entry display 'chip=?' for chip
events:

	$ cd /sys/bus/event_source/devices/hv_24x7/events
	$ cat PM_XLINK_CYCLES__PHYS_CHIP
	domain=0x1,offset=0x230,chip=?,lpar=0x0

We also need to add a 'chip' entry in the sysfs format directory:

	$ ls /sys/bus/event_source/devices/hv_24x7/format
	chip  core  domain  lpar  offset  vcpu
	^^^^
	(new)

so the perf tool can automatically check usage and format the chip
parameter correctly:

	$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/ \
		/bin/true
	Required parameter 'chip' not specified
	invalid or unsupported event: 'hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/'

	$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/ \
		/bin/true
	hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/: 0 6628908 6628908

	 Performance counter stats for 'CPU(s) 0':

	         0      hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/

	    0.006606970 seconds time elapsed
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e5a5886d

powerpc/perf: Export Power8 generic and cache events to sysfs · e0728b50

由 Sukadev Bhattiprolu 提交于 1月 11, 2016

Power8 supports a large number of events in each susbystem so when a
user runs:

	perf stat -e branch-instructions sleep 1
	perf stat -e L1-dcache-loads sleep 1

it is not clear as to which PMU events were monitored.

Export the generic hardware and cache perf events for Power8 to sysfs,
so users can precisely determine the PMU event monitored by the generic
event.

Eg:
	cat /sys/bus/event_source/devices/cpu/events/branch-instructions
	event=0x10068

	$ cat /sys/bus/event_source/devices/cpu/events/L1-dcache-loads
	event=0x100ee
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e0728b50

powerpc/perf: Remove PME_ prefix for power7 events · d4969e24

由 Sukadev Bhattiprolu 提交于 1月 11, 2016

We used the PME_ prefix earlier to avoid some macro/variable name
collisions.  We have since changed the way we define/use the event
macros so we no longer need the prefix.

By dropping the prefix, we keep the the event macros consistent with
their official names.
Reported-by: NMichael Ellerman <ellerman@au1.ibm.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d4969e24

01 3月, 2016 1 次提交

powerpc: Fix misspellings in comments. · 446957ba

由 Adam Buchbinder 提交于 2月 24, 2016

Signed-off-by: NAdam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

446957ba

16 2月, 2016 1 次提交

powerpc: Make vmalloc_to_phys() public · e9ab1a1c

由 Alexey Kardashevskiy 提交于 2月 15, 2016

This makes vmalloc_to_phys() public as there will be another user
(KVM in-kernel VFIO acceleration) for it soon. As this new user
can be compiled as a module, this exports the symbol.

As a little optimization, this changes the helper to call
vmalloc_to_pfn() instead of vmalloc_to_page() as the size of the
struct page may not be power-of-two aligned which will make gcc use
multiply instructions instead of shifts.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

e9ab1a1c

10 2月, 2016 1 次提交

powerpc/perf/hv-gpci: Increase request buffer size · e4f226b1

由 Sukadev Bhattiprolu 提交于 2月 09, 2016

The GPCI hcall allows for a 4K buffer but we limit the buffer to 1K.
The problem with a 1K buffer is if a request results in returning
more values than can be accomodated in the 1K buffer the request will
fail.

The buffer we are using is currently allocated on the stack and hence
limited in size. Instead use a per-CPU 4K buffer like we do with 24x7
counters (hv-24x7.c).

While here, rename the macro GPCI_MAX_DATA_BYTES to HGPCI_MAX_DATA_BYTES
for consistency with 24x7 counters.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

e4f226b1

28 1月, 2016 1 次提交

powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 · 370f06c8

由 Madhavan Srinivasan 提交于 1月 25, 2016

Commit 7a786832 ("powerpc/perf: Add an explict flag indicating
presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove
the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not
set.

That commit's changelog also mentions that Power8 does not support
MMCRA[SLOT]. However when the Power8 PMU support was merged, it
errnoeously included the PPMU_HAS_SSLOT flag.

So remove PPMU_HAS_SSLOT from the Power8 flags.

mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39
(IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with
the high bits of the Threshold Event Counter Mantissa. I am not aware of
any published events which use the threshold counting mechanism, which
would cause the mantissa bits to be set. So in practice this bug is
unlikely to trigger.

Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

370f06c8

20 10月, 2015 1 次提交

perf/powerpc: Add support for PERF_SAMPLE_BRANCH_CALL · 24f1a79a

由 Stephane Eranian 提交于 10月 13, 2015

The patch catches PERF_SAMPLE_BRANCH_CALL because it is not clear whether
this is actually supported by the hardware.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-4-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

24f1a79a

12 10月, 2015 1 次提交

powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6

由 Aneesh Kumar K.V 提交于 10月 09, 2015

We need to properly identify whether a hugepage is an explicit or
a transparent hugepage in follow_huge_addr(). We used to depend
on hugepage shift argument to do that. But in some case that can
result in wrong results. For ex:

On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
We do prevent reusing the pfn page via the usage of
kick_all_cpus_sync(). But that happens after we updated the pte to 0.
Hence in follow_huge_addr() we can find hugepage shift set, but transparent
huge page check fail for a thp pte.

NOTE: We fixed a variant of this race against thp split in commit
691e95fd
("powerpc/mm/thp: Make page table walk safe against thp split/collapse")

Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
follow_page_mask occasionally.

In the long term, we may want to switch ppc64 64k page size config to
enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

891121e6

13 9月, 2015 3 次提交

perf/core: Drop PERF_EVENT_TXN · 8f3e5684

由 Sukadev Bhattiprolu 提交于 9月 03, 2015

We currently use PERF_EVENT_TXN flag to determine if we are in the middle
of a transaction. If in a transaction, we defer the schedulability checks
from pmu->add() operation to the pmu->commit() operation.

Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
we can use the type to determine if we are in a transaction and drop the
PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to s390,
Sparc and x86 architectures.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8f3e5684

powerpc, perf/powerpc/hv-24x7: Use PMU_TXN_READ interface · 88a48613

由 Sukadev Bhattiprolu 提交于 9月 03, 2015

The 24x7 counters in Powerpc allow monitoring a large number of counters
simultaneously. They also allow reading several counters in a single
HCALL so we can get a more consistent snapshot of the system.

Use the PMU's transaction interface to monitor and read several event
counters at once. The idea is that users can group several 24x7 events
into a single group of events. We use the following logic to submit
the group of events to the PMU and read the values:

	pmu->start_txn()		// Initialize before first event

	for each event in group
		pmu->read(event);	// Queue each event to be read

	pmu->commit_txn()		// Read/update all queuedcounters

The ->commit_txn() also updates the event counts in the respective
perf_event objects.  The perf subsystem can then directly get the
event counts from the perf_event and can avoid submitting a new
->read() request to the PMU.

Thanks to input from Peter Zijlstra.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-10-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

88a48613

perf/core: Add a 'flags' parameter to the PMU transactional interfaces · fbbe0701

由 Sukadev Bhattiprolu 提交于 9月 03, 2015

Currently, the PMU interface allows reading only one counter at a time.
But some PMUs like the 24x7 counters in Power, support reading several
counters at once. To leveage this functionality, extend the transaction
interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
i.e. used to _schedule_ all the events on the PMU as a group. A second
transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags'
parameter and use this parameter to ignore any transactions that are
not of type PERF_PMU_TXN_ADD.

Thanks to Peter Zijlstra for his input.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

fbbe0701

27 7月, 2015 1 次提交

powerpc/perf: Change type of the bhrb_users variable · f0322f7f

由 Anshuman Khandual 提交于 6月 30, 2015

This patch just changes data type of bhrb_users variable from
int to unsigned int because it never contains a negative value.
Reported-by: NDaniel Axtens <dja@axtens.net>
Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f0322f7f

25 7月, 2015 2 次提交

powerpc/perf/hv-24x7: Simplify extracting counter from result buffer · 465345ca

由 Sukadev Bhattiprolu 提交于 7月 14, 2015

Simplify code that extracts a 24x7 counter from the HCALL's result buffer.
Suggested-by: NJoe Perches <joe@perches.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

465345ca

powerpc/perf/hv-24x7: Whitespace - fix parameter alignment · 40386217

由 Sukadev Bhattiprolu 提交于 7月 14, 2015

Fix parameter alignment to be consistent with coding style.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

40386217

08 7月, 2015 1 次提交

powerpc/perf/24x7: Fix lockdep warning · 442053e5

由 Sukadev Bhattiprolu 提交于 7月 07, 2015

The sysfs attributes for the 24x7 counters are dynamically allocated.
Initialize the attributes using sysfs_attr_init() to fix following
warning which occurs when CONFIG_DEBUG_LOCK_VMALLOC=y.

[    0.346249] audit: initializing netlink subsys (disabled)
[    0.346284] audit: type=2000 audit(1436295254.340:1): initialized
[    0.346489] BUG: key c0000000efe90198 not in .data!
[    0.346491] DEBUG_LOCKS_WARN_ON(1)
[    0.346502] ------------[ cut here ]------------
[    0.346504] WARNING: at ../kernel/locking/lockdep.c:3002
[    0.346506] Modules linked in:
Reported-by: NGustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Tested-by: NGustavo Luiz Duarte <gustavold@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

442053e5

02 6月, 2015 1 次提交

powerpc/perf: Fix book3s kernel to userspace backtraces · 72e349f1

由 Anton Blanchard 提交于 5月 26, 2015

When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.

If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().

Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:

0.11%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
       |
       ---_raw_spin_lock_irqsave
          |
          |--52.35%-- 0
          |          |
          |          |--46.39%-- __hrtimer_start_range_ns
          |          |          kvmppc_run_core
          |          |          kvmppc_vcpu_run_hv
          |          |          kvmppc_vcpu_run
          |          |          kvm_arch_vcpu_ioctl_run
          |          |          kvm_vcpu_ioctl
          |          |          do_vfs_ioctl
          |          |          sys_ioctl
          |          |          system_call
          |          |          |
          |          |          |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
          |          |          |          |
          |          |          |           --100.00%-- 0x7e714
          |          |          |                     0x7e714

Notice the bogus _raw_spin_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.

Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:

     0.47%  qemu-system-ppc  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
            |
            ---_raw_spin_lock_irqsave
               |
               |--53.83%-- 0
               |          |
               |          |--44.73%-- hrtimer_try_to_cancel
               |          |          kvmppc_start_thread
               |          |          kvmppc_run_core
               |          |          kvmppc_vcpu_run_hv
               |          |          kvmppc_vcpu_run
               |          |          kvm_arch_vcpu_ioctl_run
               |          |          kvm_vcpu_ioctl
               |          |          do_vfs_ioctl
               |          |          sys_ioctl
               |          |          system_call
               |          |          __ioctl
               |          |          0x7e714
               |          |          0x7e714

Cc: stable@vger.kernel.org
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

72e349f1

17 4月, 2015 1 次提交

powerpc/mm/thp: Make page table walk safe against thp split/collapse · 691e95fd

由 Aneesh Kumar K.V 提交于 3月 30, 2015

We can disable a THP split or a hugepage collapse by disabling irq.
We do send IPI to all the cpus in the early part of split/collapse,
and disabling local irq ensure we don't make progress with
split/collapse. If the THP is getting split we return NULL from
find_linux_pte_or_hugepte(). For all the current callers it should be ok.
We need to be careful if we want to use returned pte_t pointer outside
the irq disabled region. W.r.t to THP split, the pfn remains the same,
but then a hugepage collapse will result in a pfn change. There are
few steps we can take to avoid a hugepage collapse.One way is to take page
reference inside the irq disable region. Other option is to take
mmap_sem so that a parallel collapse will not happen. We can also
disable collapse by taking pmd_lock. Another method used by kvm
subsystem is to check whether we had a mmu_notifer update in between
using mmu_notifier_retry().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

691e95fd

14 4月, 2015 2 次提交

powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH · 9a5cbce4

由 Anton Blanchard 提交于 4月 14, 2015

We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.

Cc: stable@vger.kernel.org
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

9a5cbce4

powerpc/perf/hv-24x7: Fail 24x7 initcall if create_events_from_catalog() fails · 7debc970

由 Li Zhong 提交于 4月 13, 2015

As Michael pointed out, create_events_from_catalog() fails when we
either have:
 - a kernel bug
 - some sort of hypervisor misconfiguration
 - ENOMEM

In all the above cases, we can also fail 24x7 initcall.

For hypervisor errors, EIO is used so there is something reported
in dmesg.
Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7debc970

11 4月, 2015 3 次提交

powerpc/perf/hv-24x7: Add missing put_cpu_var() · b816ce67

由 Sukadev Bhattiprolu 提交于 2月 17, 2015

Add missing put_cpu_var() for 24x7 requests. This went missing in
commit f34b6c72 (3.18-rc3).
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b816ce67

powerpc/perf/hv-24x7: Break up single_24x7_request · aeab199d

由 Sukadev Bhattiprolu 提交于 3月 30, 2015

Break up the function single_24x7_request() into smaller functions.
This would later enable us to "prepare" a multi-event request
buffer and then submit a single hcall for several events.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

aeab199d

powerpc/perf/hv-24x7: Define update_event_count() · 529ce8c9

由 Sukadev Bhattiprolu 提交于 3月 30, 2015

Move the code to update an event count into a new function,
update_event_count().
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

529ce8c9

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功