提交 · 5aa04b3eb6fca63d2e9827be656dcadc26d54e11 · openanolis / cloud-kernel

04 12月, 2017 1 次提交

powerpc/perf: Fix oops when grouping different pmu events · 5aa04b3e

由 Ravi Bangoria 提交于 11月 30, 2017

When user tries to group imc (In-Memory Collections) event with
normal event, (sometime) kernel crashes with following log:

    Faulting instruction address: 0x00000000
    [link register   ] c00000000010ce88 power_check_constraints+0x128/0x980
    ...
    c00000000010e238 power_pmu_event_init+0x268/0x6f0
    c0000000002dc60c perf_try_init_event+0xdc/0x1a0
    c0000000002dce88 perf_event_alloc+0x7b8/0xac0
    c0000000002e92e0 SyS_perf_event_open+0x530/0xda0
    c00000000000b004 system_call+0x38/0xe0

'event_base' field of 'struct hw_perf_event' is used as flags for
normal hw events and used as memory address for imc events. While
grouping these two types of events, collect_events() tries to
interpret imc 'event_base' as a flag, which causes a corruption
resulting in a crash.

Consider only those events which belongs to 'perf_hw_context' in
collect_events().
Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-By: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

5aa04b3e

22 11月, 2017 2 次提交

powerpc/perf: Fix IMC_MAX_PMU macro · 73ce9aec

由 Madhavan Srinivasan 提交于 11月 22, 2017

IMC_MAX_PMU is used for static storage (per_nest_pmu_arr) which holds
nest pmu information. Current value for the macro is 32 based on
the initial number of nest pmu units supported by the nest microcode.
But going forward, microcode could support more nest units. Instead
of static storage, patch to fix the code to dynamically allocate an
array based on the number of nest imc units found in the device tree.

Fixes:8f95faaa ('powerpc/powernv: Detect and create IMC device')
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

73ce9aec

powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id() · f3f1dfd6

由 Michael Ellerman 提交于 10月 16, 2017

init_imc_pmu() uses topology_physical_package_id() to detect the
node id of the processor it is on to get local memory, but that's
wrong, and can lead to crashes. Fix it to use cpu_to_node().

Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support")
Cc: stable@vger.kernel.org # v4.14+
Reported-By: NRob Lippert <rlippert@google.com>
Tested-By: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f3f1dfd6

03 11月, 2017 1 次提交

powerpc/perf: Fix core-imc hotplug callback failure during imc initialization · 7ecb37f6

由 Madhavan Srinivasan 提交于 11月 02, 2017

Call trace observed during boot:

  nest_capp0_imc performance monitor hardware support registered
  nest_capp1_imc performance monitor hardware support registered
  core_imc memory allocation for cpu 56 failed
  Unable to handle kernel paging request for data at address 0xffa400010
  Faulting instruction address: 0xc000000000bf3294
  0:mon> e
  cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0]
      pc: c000000000bf3294: mutex_lock+0x34/0x90
      lr: c000000000bf3288: mutex_lock+0x28/0x90
      sp: c000000ff38ffb50
     msr: 9000000002009033
     dar: ffa400010
   dsisr: 80000
    current = 0xc000000ff383de00
    paca    = 0xc000000007ae0000	 softe: 0	 irq_happened: 0x01
      pid   = 13, comm = cpuhp/0
  Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017
  0:mon> t
  [c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470
  [c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
  [c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0
  [c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0
  [c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0
  [c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0
  [c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74

While registering the cpuhoplug callbacks for core-imc, if we fails
in the cpuhotplug online path for any random core (either because opal call to
initialize the core-imc counters fails or because memory allocation fails for
that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who
successfully returned from cpuhotplug online path.

But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event
context, when core-imc counters are not even initialized. Thus creating the
above stack dump.

Add a check to see if core-imc counters are enabled or not in the cpuhotplug
offline path before migrating the context to handle this failing scenario.

Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7ecb37f6

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

25 10月, 2017 1 次提交

powerpc/perf: Fix IMC allocation routine · 0b167f11

由 Guilherme G. Piccoli 提交于 10月 19, 2017

When setting nr_cpus=1, we observed a crash in IMC code during boot
due to a missing allocation: basically, IMC code is taking the number
of threads into account in imc_mem_init() and if we manually set
nr_cpus for a value that is not multiple of the number of threads per
core, an integer division in that function will discard the decimal
portion, leading IMC to not allocate one mem_info struct. This causes
a NULL pointer dereference later, on is_core_imc_mem_inited().

This patch just rounds that division up, fixing the bug.
Signed-off-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0b167f11

22 10月, 2017 1 次提交

powerpc/perf/hv-24x7: Fix incorrect comparison in memord · 05c14c03

由 Michael Ellerman 提交于 10月 09, 2017

In the hv-24x7 code there is a function memord() which tries to
implement a sort function return -1, 0, 1. However one of the
conditions is incorrect, such that it can never be true, because we
will have already returned.

I don't believe there is a bug in practice though, because the
comparisons are an optimisation prior to calling memcmp().

Fix it by swapping the second comparision, so it can be true.
Reported-by: NDavid Binderman <dcb314@hotmail.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

05c14c03

13 10月, 2017 1 次提交

powerpc/perf: Fix IMC initialization crash · 0d8ba162

由 Anju T Sudhakar 提交于 10月 13, 2017

Panic observed with latest firmware, and upstream kernel:

 NIP init_imc_pmu+0x8c/0xcf0
 LR  init_imc_pmu+0x2f8/0xcf0
 Call Trace:
   init_imc_pmu+0x2c8/0xcf0 (unreliable)
   opal_imc_counters_probe+0x300/0x400
   platform_drv_probe+0x64/0x110
   driver_probe_device+0x3d8/0x580
   __driver_attach+0x14c/0x1a0
   bus_for_each_dev+0x8c/0xf0
   driver_attach+0x34/0x50
   bus_add_driver+0x298/0x350
   driver_register+0x9c/0x180
   __platform_driver_register+0x5c/0x70
   opal_imc_driver_init+0x2c/0x40
   do_one_initcall+0x64/0x1d0
   kernel_init_freeable+0x280/0x374
   kernel_init+0x24/0x160
   ret_from_kernel_thread+0x5c/0x74

While registering nest imc at init, cpu-hotplug callback
nest_pmu_cpumask_init() makes an OPAL call to stop the engine. And if
the OPAL call fails, imc_common_cpuhp_mem_free() is invoked to cleanup
memory and cpuhotplug setup.

But when cleaning up the attribute group, we are dereferencing the
attribute element array without checking whether the backing element
is not NULL. This causes the kernel panic.

Add a check for the backing element prior to dereferencing the
attribute element, to handle the failing case gracefully.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: NPridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[mpe: Trim change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0d8ba162

12 10月, 2017 2 次提交

powerpc/perf: Add ___GFP_NOWARN flag to alloc_pages_node() · cd4f2b30

由 Anju T Sudhakar 提交于 10月 11, 2017

Stack trace output during a stress test:
 [    4.310049] Freeing initrd memory: 22592K
[    4.310646] rtas_flash: no firmware flash support
[    4.313341] cpuhp/64: page allocation failure: order:0, mode:0x14480c0(GFP_KERNEL|__GFP_ZERO|__GFP_THISNODE), nodemask=(null)
[    4.313465] cpuhp/64 cpuset=/ mems_allowed=0
[    4.313521] CPU: 64 PID: 392 Comm: cpuhp/64 Not tainted 4.11.0-39.el7a.ppc64le #1
[    4.313588] Call Trace:
[    4.313622] [c000000f1fb1b8e0] [c000000000c09388] dump_stack+0xb0/0xf0 (unreliable)
[    4.313694] [c000000f1fb1b920] [c00000000030ef6c] warn_alloc+0x12c/0x1c0
[    4.313753] [c000000f1fb1b9c0] [c00000000030ff68] __alloc_pages_nodemask+0xea8/0x1000
[    4.313823] [c000000f1fb1bbb0] [c000000000113a8c] core_imc_mem_init+0xbc/0x1c0
[    4.313892] [c000000f1fb1bc00] [c000000000113cdc] ppc_core_imc_cpu_online+0x14c/0x170
[    4.313962] [c000000f1fb1bc90] [c000000000125758] cpuhp_invoke_callback+0x198/0x5d0
[    4.314031] [c000000f1fb1bd00] [c00000000012782c] cpuhp_thread_fun+0x8c/0x3d0
[    4.314101] [c000000f1fb1bd60] [c0000000001678d0] smpboot_thread_fn+0x290/0x2a0
[    4.314169] [c000000f1fb1bdc0] [c00000000015ee78] kthread+0x168/0x1b0
[    4.314229] [c000000f1fb1be30] [c00000000000b368] ret_from_kernel_thread+0x5c/0x74
[    4.314313] Mem-Info:
[    4.314356] active_anon:0 inactive_anon:0 isolated_anon:0

core_imc_mem_init() at system boot use alloc_pages_node() to get memory
and alloc_pages_node() throws this stack dump when tried to allocate
memory from a node which has no memory behind it. Add a ___GFP_NOWARN
flag in allocation request as a fix.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
Reported-by: NVenkat R.B <venkatb3@in.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

cd4f2b30

powerpc/perf: Fix for core/nest imc call trace on cpuhotplug · 0d923820

由 Anju T Sudhakar 提交于 10月 04, 2017

Nest/core pmu units are enabled only when it is used. A reference count is
maintained for the events which uses the nest/core pmu units. Currently in
*_imc_counters_release function a WARN() is used for notification of any
underflow of ref count.

The case where event ref count hit a negative value is, when perf session is
started, followed by offlining of all cpus in a given core.
i.e. in cpuhotplug offline path ppc_core_imc_cpu_offline() function set the
ref->count to zero, if the current cpu which is about to offline is the last
cpu in a given core and make an OPAL call to disable the engine in that core.
And on perf session termination, perf->destroy (core_imc_counters_release) will
first decrement the ref->count for this core and based on the ref->count value
an opal call is made to disable the core-imc engine.
Now, since cpuhotplug path already clears the ref->count for core and disabled
the engine, perf->destroy() decrementing again at event termination make it
negative which in turn fires the WARN_ON. The same happens for nest units.

Add a check to see if the reference count is alreday zero, before decrementing
the count, so that the ref count will not hit a negative value.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: NSantosh Sivaraj <santosh@fossix.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

0d923820

20 9月, 2017 1 次提交

powerpc/sysrq: Fix oops whem ppmu is not registered · 4917fcb5

由 Ravi Bangoria 提交于 9月 19, 2017

Kernel crashes if power pmu is not registered and user tries to dump
regs with 'echo p > /proc/sysrq-trigger'. Sample log:

  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000000d52f0

  NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230
  LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50
  Call Trace:
   printk+0x38/0x4c (unreliable)
   __handle_sysrq+0xe4/0x270
   write_sysrq_trigger+0x64/0x80
   proc_reg_write+0x80/0xd0
   __vfs_write+0x40/0x200
   vfs_write+0xc8/0x240
   SyS_write+0x60/0x110
   system_call+0x58/0x6c

Fixes: 5f6d0380 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values")
Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

4917fcb5

29 8月, 2017 1 次提交

perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7

由 Kan Liang 提交于 8月 28, 2017

For understanding how the workload maps to memory channels and hardware
behavior, it's very important to collect address maps with physical
addresses. For example, 3D XPoint access can only be found by filtering
the physical address.

Add a new sample type for physical address.

perf already has a facility to collect data virtual address. This patch
introduces a function to convert the virtual address to physical address.
The function is quite generic and can be extended to any architecture as
long as a virtual address is provided.

 - For kernel direct mapping addresses, virt_to_phys is used to convert
   the virtual addresses to physical address.

 - For user virtual addresses, __get_user_pages_fast is used to walk the
   pages tables for user physical address.

 - This does not work for vmalloc addresses right now. These are not
   resolved, but code to do that could be added.

The new sample type requires collecting the virtual address. The
virtual address will not be output unless SAMPLE_ADDR is applied.

For security, the physical address can only be exposed to root or
privileged user.
Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NKan Liang <kan.liang@intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

fc7ce9c7

17 8月, 2017 2 次提交

powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19

由 Aneesh Kumar K.V 提交于 7月 27, 2017

Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.

For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

94171b19

powerpc/perf: Fix usage of nest_imc_refc · 711bd207

由 Madhavan Srinivasan 提交于 8月 16, 2017

nest_imc_refc is a reference count struct, used to track number of
active perf sessions using the nest units.

Currently the code accesses nest_imc_refc using node_id, which is
incorrect, the array is indexed by node number. Meaning in the case of
sparse node ids we index off the end of the array.

Fix it to use get_nest_pmu_ref() which uses the existing per-cpu
variable local_nest_imc_refc.

Fixes: 885dcd70 ('powerpc/perf: Add nest IMC PMU support')
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Tweak change log]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

711bd207

15 8月, 2017 1 次提交

powerpc/perf/imc: Fix nest events on muti socket system · 7efbae90

由 Anju T 提交于 8月 14, 2017

In a multi node system with discontiguous node ids, nest event values
are not showing up properly. eg. lscpu output:

  NUMA node0 CPU(s): 0-15
  NUMA node8 CPU(s): 16-31

Nest event values on such systems can be counted on CPUs <= 15:

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 0-14 -I 1000 sleep 1000
  #           time             counts unit events
       1.000294577    30,17,24,42,880 nest_powerbus0_imc/PM_PB_CYC/

But not on CPUs >= 16:

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
  #           time             counts unit events
       1.000049902    <not supported> nest_powerbus0_imc/PM_PB_CYC/

This is because, when fetching the reference count, the node id (which
may be sparse) is used as the array index, not the node number (which
is 0 based and contiguous).

Fix it by using the node number as the array index.

  $./perf stat -e 'nest_powerbus0_imc/PM_PB_CYC/' -C 16-28 -I 1000 sleep 1000
  #           time             counts unit events
       1.000241961    26,12,35,28,704 nest_powerbus0_imc/PM_PB_CYC/
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
[mpe: Change log tweaks for clarity and brevity]
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7efbae90

14 8月, 2017 1 次提交

powerpc/perf: Fix double unlock in imc_common_cpuhp_mem_free() · b3376dcc

由 Dan Carpenter 提交于 8月 11, 2017

This function is not called with the nest_init_lock held, and it also
unlocks the nest_init_lock immediately below, so it's fairly clear
that this is a typo and should be locking the lock.

Fixes: 885dcd70 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

b3376dcc

10 8月, 2017 4 次提交

powerpc/perf: Cleanup of PM_BR_CMPL vs. PM_BRU_CMPL in Power9 event list · 93fc5ca9

由 Madhavan Srinivasan 提交于 8月 09, 2017

Fixes: 34922527 ("powerpc/perf: Add power9 event list macros for generic and cache events")
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

93fc5ca9

powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list · 91e0bd1e

由 Madhavan Srinivasan 提交于 7月 31, 2017

Add couple of more events (PM_LD_MISS_L1 and PM_BR_2PATH) to
power9 event list and power9_event_alternatives array (these
events can be counted in more than one PMC).
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

91e0bd1e

powerpc/perf: Factor out PPMU_ONLY_COUNT_RUN check code from power8 · 70a7e720

由 Madhavan Srinivasan 提交于 7月 31, 2017

There are some hardware events on Power systems which only count when
the processor is not idle, and there are some fixed-function counters
which count such events. For example, the "run cycles" event counts
cycles when the processor is not idle. If the user asks to count
cycles, we can use "run cycles" if this is a per-task event, since the
processor is running when the task is running, by definition. We can't
use "run cycles" if the user asks for "cycles" on a system-wide
counter.

Currently in power8 this check is done using PPMU_ONLY_COUNT_RUN flag
in power8_get_alternatives() function. Based on the flag, events are
switched if needed. This function should also be enabled in power9, so
factor out the code to isa207_get_alternatives().

Fixes: efe881af ('powerpc/perf: Factor out event_alternative function')
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

70a7e720

powerpc/perf: Update default sdar_mode value for power9 · 7aa345d8

由 Madhavan Srinivasan 提交于 7月 25, 2017

Commit 20dd4c62 ('powerpc/perf: Fix SDAR_MODE value for continous
sampling on Power9') set the default sdar_mode value in MMCRA[SDAR_MODE]
to be used as 0b01 (Update on TLB miss). And this value is set if sdar_mode
from event is zero, or we are in continous sampling mode in power9 dd1.

But it is preferred to have the sdar_mode value for power9 as
0b10 (Update on dcache miss) for better sampling updates instead
of 0b01 (Update on TLB miss).

From Anton:

Using a bandwidth test case with a 1MB footprint, I profiled cycles and
chose TLB updates of the SDAR:

  $ perf record -d -e r000400000000001E:u ./bw2001 1M
                        ^
                        SDAR TLB

  $ perf report -D | grep PERF_RECORD_SAMPLE | sed 's/.*addr: //' | sort -u | wc -l
  4

  I get 4 unique addresses. If I ran with dcache misses:

  $ perf record -d -e r000800000000001E:u ./bw2001 1M
                        ^
                        SDAR dcache miss

  $ perf report -D|grep PERF_RECORD_SAMPLE| sed 's/.*addr: //'|sort -u | wc -l
  5217

I get 5217 unique addresses. No surprises here, but it does show why
TLB misses is the wrong event to default to - we get very little useful
information out of it.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

7aa345d8

25 7月, 2017 3 次提交

powerpc/perf: Add thread IMC PMU support · f74c89bd

由 Anju T Sudhakar 提交于 7月 19, 2017

Add support to register Thread In-Memory Collection PMU counters.
Patch adds thread IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: NHemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f74c89bd

powerpc/perf: Add core IMC PMU support · 39a846db

由 Anju T Sudhakar 提交于 7月 19, 2017

Add support to register Core In-Memory Collection PMU counters.
Patch adds core IMC specific data structures, along with memory
init functions and CPU hotplug support.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: NHemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

39a846db

powerpc/perf: Add nest IMC PMU support · 885dcd70

由 Anju T Sudhakar 提交于 7月 19, 2017

Add support to register Nest In-Memory Collection PMU counters.
Patch adds a new device file called "imc-pmu.c" under powerpc/perf
folder to contain all the device PMU functions.

Device tree parser code added to parse the PMU events information
and create sysfs event attributes for the PMU.

Cpumask attribute added along with Cpu hotplug online/offline functions
specific for nest PMU. A new state "CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE"
added for the cpu hotplug callbacks. Error handle path frees the memory
and unregisters the CPU hotplug callbacks.
Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: NHemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

885dcd70

12 7月, 2017 1 次提交

powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events · 3f0bd8da

由 Anton Blanchard 提交于 6月 19, 2017

Similar to POWER8, POWER9 can count run cycles and run instructions
completed on more than one PMU.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

3f0bd8da

11 7月, 2017 1 次提交

powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9 · 20dd4c62

由 Madhavan Srinivasan 提交于 7月 11, 2017

In case of continous sampling (non-marked), the code currently
sets MMCRA[SDAR_MODE] to 0b01 (Update on TLB miss) for Power9 DD1.

On DD2 and later it copies the sdar_mode value from the event code,
which for most events is 0b00 (No updates).

However we must set a non-zero value for SDAR_MODE when doing
continuous sampling, so honor the event code, unless it's zero, in
which case we use use 0b01 (Update on TLB miss).

Fixes: 78b4416a ("powerpc/perf: Handle sdar_mode for marked event in power9")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

20dd4c62

02 7月, 2017 8 次提交

powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8 · bfaa7834

由 Thiago Jung Bauermann 提交于 6月 29, 2017

On POWER9 SMT8 the 24x7 API returns two result elements for physical core
and virtual CPU events and we need to add their counts to get the final
result.
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bfaa7834

powerpc/perf/hv-24x7: Support v2 of the hypervisor API · 2e6553aa

由 Thiago Jung Bauermann 提交于 6月 29, 2017

POWER9 introduces a new version of the hypervisor API to access the 24x7
perf counters. The new version changed some of the structures used for
requests and results.
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

2e6553aa

powerpc/perf/hv-24x7: Minor improvements · ebd4a5a3

由 Thiago Jung Bauermann 提交于 6月 29, 2017

There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request.

There's also an HV_PERF_DOMAIN_MAX constant, so use it in
h_24x7_event_init. This makes the comment above the check redundant,
so remove it.

In add_event_to_24x7_request, a statement is terminated with a comma
instead of a semicolon. Fix it.

In hv-24x7.h, improve comments in struct hv_24x7_result.
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

ebd4a5a3

powerpc/perf/hv-24x7: Fix return value of hcalls · 38d81846

由 Thiago Jung Bauermann 提交于 6月 29, 2017

The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix
this in the code.

The H_GET_24X7_DATA hcall can return a signed error code, so fix this in
the code. Also, don't truncate it to 32 bit to use as return value for
make_24x7_request. In case of error h_24x7_event_commit_txn passes that
return value to generic code, so it should be a proper errno. The other
caller of make_24x7_request is single_24x7_request, whose callers don't
actually care which error code is returned so they are not affected by this
change.

Finally, h_24x7_get_value doesn't use the error code from
single_24x7_request, so there's no need to store it.
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

38d81846

powerpc-perf/hx-24x7: Don't log failed hcall twice · 62714a14

由 Thiago Jung Bauermann 提交于 6月 29, 2017

make_24x7_request already calls log_24x7_hcall if it fails, so callers
don't have to do it again.

In fact, since the latter is now only called from the former, there's no
need for a separate log_24x7_hcall anymore so remove it.
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

62714a14

powerpc/perf/hv-24x7: Properly iterate through results · 41f577eb

由 Thiago Jung Bauermann 提交于 6月 29, 2017

hv-24x7.h has a comment mentioning that result_buffer->results can't be
indexed as a normal array because it may contain results of variable sizes,
so fix the loop in h_24x7_event_commit_txn to take the variation into
account when iterating through results.

Another problem in that loop is that it sets h24x7hw->events[i] to NULL.
This assumes that only the i'th result maps to the i'th request, but that
is not guaranteed to be true. We need to leave the event in the array so
that we don't dereference a NULL pointer in case more than one result maps
to one request.

We still assume that each result has only one result element, so warn if
that assumption is violated.
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

41f577eb

powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer check · 36c8fb2c

由 Thiago Jung Bauermann 提交于 6月 29, 2017

request_buffer can hold 254 requests, so if it already has that number of
entries we can't add a new one.

Also, define constant to show where the number comes from.

Fixes: e3ee15dc ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()")
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

36c8fb2c

powerpc/perf/hv-24x7: Fix passing of catalog version number · 12bf85a7

由 Thiago Jung Bauermann 提交于 6月 29, 2017

H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from
the first catalog page obtained previously. This is a 64 bit number, but
create_events_from_catalog truncates it to 32-bit.

This worked on POWER8, but POWER9 actually uses the upper bits so the call
fails with H_P3 because the hypervisor doesn't recognize the version.

This patch also adds the hcall return code to the error message, which is
helpful when debugging the problem.

Fixes: 5c5cd7b5 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events")
Reviewed-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

12bf85a7

28 6月, 2017 1 次提交

powerpc/perf: Fix branch event code for power9 · 24bedcb7

由 Madhavan Srinivasan 提交于 6月 25, 2017

Correct "branch" event code of Power9 is "r4d05e". Replace the current
"branch" event code with "r4d05e" and add a hack to use "r10012" as
event code for Power9 DD1.

Fixes: d89f473f ("powerpc/perf: Fix PM_BRU_CMPL event code for power9")
Reported-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

24bedcb7

16 6月, 2017 1 次提交

powerpc/perf: Fix oops when kthread execs user process · bf05fc25

由 Ravi Bangoria 提交于 6月 15, 2017

When a kthread calls call_usermodehelper() the steps are:
  1. allocate current->mm
  2. load_elf_binary()
  3. populate current->thread.regs

While doing this, interrupts are not disabled. If there is a perf
interrupt in the middle of this process (i.e. step 1 has completed
but not yet reached to step 3) and if perf tries to read userspace
regs, kernel oops with following log:

  Unable to handle kernel paging request for data at address 0x00000000
  Faulting instruction address: 0xc0000000000da0fc
  ...
  Call Trace:
  perf_output_sample_regs+0x6c/0xd0
  perf_output_sample+0x4e4/0x830
  perf_event_output_forward+0x64/0x90
  __perf_event_overflow+0x8c/0x1e0
  record_and_restart+0x220/0x5c0
  perf_event_interrupt+0x2d8/0x4d0
  performance_monitor_exception+0x54/0x70
  performance_monitor_common+0x158/0x160
  --- interrupt: f01 at avtab_search_node+0x150/0x1a0
      LR = avtab_search_node+0x100/0x1a0
  ...
  load_elf_binary+0x6e8/0x15a0
  search_binary_handler+0xe8/0x290
  do_execveat_common.isra.14+0x5f4/0x840
  call_usermodehelper_exec_async+0x170/0x210
  ret_from_kernel_thread+0x5c/0x7c

Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace
pt_regs are not set.

Fixes: ed4a4ef8 ("powerpc/perf: Add support for sampling interrupt register state")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: NRavi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

bf05fc25

06 6月, 2017 1 次提交

powerpc/perf: Fix Power9 test_adder fields · 8c218578

由 Madhavan Srinivasan 提交于 5月 26, 2017

Commit 8d911904 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1')
was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable
the use of PMC5 using raw event code. But instead of updating the
power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the
power9_pmu structure. Fix it.

Fixes: 8d911904 ("powerpc/perf: Add restrictions to PMC5 in power9 DD1")
Reported-by: NShriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Tested-by: NShriya <shriyak@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

8c218578

19 4月, 2017 4 次提交

powerpc/perf: Add Power8 mem_access event to sysfs · f2080b9a

由 Madhavan Srinivasan 提交于 4月 11, 2017

Patch add "mem_access" event to sysfs. This as-is not a raw event
supported by Power8 pmu. Instead, it is formed based on
raw event encoding specificed in isa207-common.h.

Primary PMU event used here is PM_MRK_INST_CMPL.
This event tracks only the completed marked instructions.

Random sampling mode (MMCRA[SM]) with Random Instruction
Sampling (RIS) is enabled to mark type of instructions.

With Random sampling in RLS mode with PM_MRK_INST_CMPL event,
the LDST /DATA_SRC fields in SIER identifies the memory
hierarchy level (eg: L1, L2 etc) statisfied a data-cache
miss for a marked instruction.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

f2080b9a

powerpc/perf: Support to export SIERs bit in Power9 · d148c94c

由 Madhavan Srinivasan 提交于 4月 11, 2017

Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

d148c94c

powerpc/perf: Support to export SIERs bit in Power8 · 453ce7a9

由 Madhavan Srinivasan 提交于 4月 11, 2017

Patch to export SIER bits to userspace via
perf_mem_data_src and perf_sample_data struct.
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

453ce7a9

powerpc/perf: Support to export MMCRA[TEC*] field to userspace · 170a315f

由 Madhavan Srinivasan 提交于 4月 11, 2017

Threshold feature when used with MMCRA [Threshold Event Counter Event],
MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update
MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event
Counter Multiplier] with the corresponding threshold event count values.
Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of
struct perf_sample_data.
Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

170a315f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功