- 04 December 2017, 1 commit
-
-
Committed by Ravi Bangoria

When a user tries to group an imc (In-Memory Collections) event with a normal event, the kernel (sometimes) crashes with the following log:

  Faulting instruction address: 0x00000000
  [link register   ] c00000000010ce88 power_check_constraints+0x128/0x980
  ...
  c00000000010e238 power_pmu_event_init+0x268/0x6f0
  c0000000002dc60c perf_try_init_event+0xdc/0x1a0
  c0000000002dce88 perf_event_alloc+0x7b8/0xac0
  c0000000002e92e0 SyS_perf_event_open+0x530/0xda0
  c00000000000b004 system_call+0x38/0xe0

The 'event_base' field of 'struct hw_perf_event' is used as flags for normal hw events and as a memory address for imc events. While grouping these two types of events, collect_events() tries to interpret the imc 'event_base' as a flag, which causes a corruption resulting in a crash. Consider only those events which belong to 'perf_hw_context' in collect_events().

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
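A minimal sketch of the guard described above, as it would sit in collect_events() in core-book3s.c (the surrounding bookkeeping is illustrative, not the exact upstream diff):

  /*
   * Only look at events that belong to the core PMU's context
   * (perf_hw_context).  Events owned by other PMUs, e.g. imc, overload
   * hw.event_base with a memory address rather than flag bits, so
   * treating it as flags here corrupts the constraint data.
   */
  if (group->pmu->task_ctx_nr == perf_hw_context) {
          if (n >= max_count)
                  return -1;
          ctrs[n] = group;
          flags[n] = group->hw.event_base;
          events[n++] = group->hw.config;
  }

The same check is applied when walking the group's sibling events.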
-
- 20 September 2017, 1 commit
-
-
Committed by Ravi Bangoria

The kernel crashes if the power PMU is not registered and a user tries to dump regs with 'echo p > /proc/sysrq-trigger'. Sample log:

  Unable to handle kernel paging request for data at address 0x00000008
  Faulting instruction address: 0xc0000000000d52f0
  NIP [c0000000000d52f0] perf_event_print_debug+0x10/0x230
  LR [c00000000058a938] sysrq_handle_showregs+0x38/0x50
  Call Trace:
    printk+0x38/0x4c (unreliable)
    __handle_sysrq+0xe4/0x270
    write_sysrq_trigger+0x64/0x80
    proc_reg_write+0x80/0xd0
    __vfs_write+0x40/0x200
    vfs_write+0xc8/0x240
    SyS_write+0x60/0x110
    system_call+0x58/0x6c

Fixes: 5f6d0380 ("powerpc/perf: Define perf_event_print_debug() to print PMU register values")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
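The shape of the fix is an early-out in perf_event_print_debug() before any 'ppmu' dereference; a sketch (the exact message text is an assumption):

  void perf_event_print_debug(void)
  {
          unsigned long flags;

          /* Nothing to dump if no PMU backend has registered yet. */
          if (!ppmu) {
                  pr_info("Performance monitor hardware not registered.\n");
                  return;
          }

          local_irq_save(flags);
          /* ... dump PMCs, MMCR0/1/A, SIAR, SDAR, SIER ... */
          local_irq_restore(flags);
  }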
-
- 29 August 2017, 1 commit
-
-
Committed by Kan Liang

For understanding how the workload maps to memory channels and hardware behavior, it's very important to collect address maps with physical addresses. For example, 3D XPoint access can only be found by filtering the physical address.

Add a new sample type for the physical address.

perf already has a facility to collect the data virtual address. This patch introduces a function to convert the virtual address to a physical address. The function is quite generic and can be extended to any architecture as long as a virtual address is provided.

- For kernel direct mapping addresses, virt_to_phys() is used to convert the virtual addresses to physical addresses.
- For user virtual addresses, __get_user_pages_fast() is used to walk the page tables for the user physical address.
- This does not work for vmalloc addresses right now. These are not resolved, but code to do that could be added.

The new sample type requires collecting the virtual address. The virtual address will not be output unless SAMPLE_ADDR is applied. For security, the physical address can only be exposed to root or a privileged user.

Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: acme@kernel.org
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
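The conversion helper described above looks roughly like the sketch below (a hedged reconstruction of the approach, not a verbatim copy of the patch):

  static u64 perf_virt_to_phys(u64 virt)
  {
          u64 phys_addr = 0;
          struct page *p = NULL;

          if (!virt)
                  return 0;

          if (virt >= TASK_SIZE) {
                  /* Kernel direct-mapped address: translate directly. */
                  if (virt_addr_valid((void *)(uintptr_t)virt) &&
                      !(virt >= VMALLOC_START && virt < VMALLOC_END))
                          phys_addr = (u64)virt_to_phys((void *)(uintptr_t)virt);
          } else {
                  /* User address: walk the page tables without sleeping. */
                  if (__get_user_pages_fast(virt, 1, 0, &p) == 1)
                          phys_addr = page_to_phys(p) + virt % PAGE_SIZE;

                  if (p)
                          put_page(p);
          }

          return phys_addr;
  }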
-
- 19 April 2017, 2 commits
-
-
Committed by Madhavan Srinivasan

When the threshold feature is used with MMCRA[Threshold Event Counter Event], MMCRA[Threshold Start Event] and MMCRA[Threshold End Event], the hardware updates MMCRA[Threshold Event Counter Exponent] and MMCRA[Threshold Event Counter Multiplier] with the corresponding threshold event count values. Patch to export MMCRA[TECX/TECM] to userspace in the 'weight' field of struct perf_sample_data.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
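The exported weight is essentially the threshold counter decoded from its multiplier/exponent encoding; a sketch of the idea (the field-extraction macro names are illustrative):

  /*
   * MMCRA holds the threshold event count as a multiplier (TECM) and an
   * exponent (TECX); the sampled 'weight' is multiplier << (2 * exponent),
   * i.e. multiplier * 4^exponent.
   */
  static void get_mem_weight(u64 *weight)
  {
          u64 mmcra    = mfspr(SPRN_MMCRA);
          u64 exp      = MMCRA_THR_CTR_EXP(mmcra);     /* TECX */
          u64 mantissa = MMCRA_THR_CTR_MANT(mmcra);    /* TECM */

          *weight = mantissa << (2 * exp);
  }

The sampling path then stores this value into perf_sample_data.weight when PERF_SAMPLE_WEIGHT is requested.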
-
Committed by Madhavan Srinivasan

The LDST field and DATA_SRC in SIER identify the memory hierarchy level (eg: L1, L2 etc) from which a data-cache miss for a marked instruction was satisfied. Use the 'perf_mem_data_src' object to export this hierarchy level to user space.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 09 March 2017, 1 commit
-
-
Committed by Madhavan Srinivasan

Power9 DD1 does not set the PPMU_HAS_SIER flag, so sdsync in perf_get_data_addr() defaults to MMCRA_SDSYNC, which is wrong. Since the Power9 MMCRA does not have an SDSYNC bit, the patch adds the PPMU_NO_SIAR flag to the check and sets sdsync to MMCRA_SAMPLE_ENABLE.

Fixes: 27593d72 ("powerpc/perf: Use MSR to report privilege level on P9 DD1")
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
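For context, the address-selection logic in perf_get_data_addr() ends up roughly like this (a sketch; the PPMU_NO_SIAR case is the newly added branch):

  static inline void perf_get_data_addr(struct pt_regs *regs, u64 *addrp)
  {
          unsigned long mmcra = regs->dsisr;
          bool sdar_valid;

          if (ppmu->flags & PPMU_HAS_SIER) {
                  sdar_valid = regs->dar & SIER_SDAR_VALID;
          } else {
                  unsigned long sdsync;

                  if (ppmu->flags & PPMU_SIAR_VALID)
                          sdsync = MMCRA_SDAR_VALID;
                  else if (ppmu->flags & PPMU_ALT_SIPR)
                          sdsync = POWER7P_MMCRA_SDAR_VALID;
                  else if (ppmu->flags & PPMU_NO_SIAR)
                          sdsync = MMCRA_SAMPLE_ENABLE;  /* P9 DD1: no SDSYNC bit */
                  else
                          sdsync = MMCRA_SDSYNC;

                  sdar_valid = mmcra & sdsync;
          }

          if (!(mmcra & MMCRA_SAMPLE_ENABLE) || sdar_valid)
                  *addrp = mfspr(SPRN_SDAR);
  }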
-
- 17 February 2017, 2 commits
-
-
Committed by Madhavan Srinivasan

Cleanup to use the is_kernel_addr() macro.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Madhavan Srinivasan

Since PM_INST_DISP includes speculative instructions, the dispatch count can vary considerably depending on the workload. Hence, as an alternative for counting completed instructions, program the PM_INST_DISP event into the MMCR* registers but use the Instruction Counter register value.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 18 January 2017, 1 commit
-
-
Committed by Madhavan Srinivasan

SIER and SIAR are not updated correctly for some samples, so force the use of the MSR and regs->nip instead for the misc_flag updates. This is done by adding a new ppmu flag and updating the use_siar logic in perf_read_regs() to use it, and dropping the PPMU_HAS_SIER flag.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Rename flag to PPMU_NO_SIAR, and also drop PPMU_HAS_SIER]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
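A sketch of how the use_siar decision in perf_read_regs() incorporates the new flag (the ordering of the other checks is illustrative):

  static void perf_read_regs(struct pt_regs *regs)
  {
          unsigned long mmcra = mfspr(SPRN_MMCRA);
          int marked = mmcra & MMCRA_SAMPLE_ENABLE;
          int use_siar;

          regs->dsisr = mmcra;

          if (TRAP(regs) != 0xf00)
                  use_siar = 0;           /* not a PMU exception */
          else if (ppmu->flags & PPMU_NO_SIAR)
                  use_siar = 0;           /* SIAR/SIER unreliable: use MSR/nip */
          else if (marked)
                  use_siar = 1;
          else if (ppmu->flags & PPMU_NO_CONT_SAMPLING)
                  use_siar = 0;
          else if (!(ppmu->flags & PPMU_NO_SIPR) && regs_sipr(regs))
                  use_siar = 0;
          else
                  use_siar = 1;

          regs->result = use_siar;
  }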
-
- 25 December 2016, 1 commit
-
-
Committed by Thomas Gleixner

When the state names got added, a script was used to add the extra argument to the calls. The script basically converted the state constant to a string, but the cleanup to convert these strings into meaningful ones did not happen. Replace all the useless strings with 'subsys/xxx/yyy:state' strings which are used in all the other places already.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 13 September 2016, 1 commit
-
-
Committed by Daniel Axtens

Squash a bunch of sparse warnings by making things static.

Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 14 July 2016, 1 commit
-
-
Committed by Thomas Gleixner

Install the callbacks via the state machine and let the core invoke the callbacks on the already online CPUs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160713153334.345786236@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
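The hotplug conversion replaces the old CPU notifier with a cpuhp state callback; a hedged sketch of what registration looks like for this PMU (the state name string and reset details are assumptions):

  static int power_pmu_prepare_cpu(unsigned int cpu)
  {
          struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);

          /* Reset the per-cpu state the PMU expects on a fresh CPU. */
          if (ppmu) {
                  memset(cpuhw, 0, sizeof(*cpuhw));
                  cpuhw->mmcr[0] = MMCR0_FC;
          }
          return 0;
  }

  int register_power_pmu(struct power_pmu *pmu)
  {
          /* ... existing setup ... */
          perf_pmu_register(&power_pmu, "cpu", PERF_TYPE_RAW);
          cpuhp_setup_state(CPUHP_PERF_POWER, "perf/powerpc:prepare",
                            power_pmu_prepare_cpu, NULL);
          return 0;
  }

The core then invokes the prepare callback for CPUs that are already online at registration time.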
-
- 14 June 2016, 1 commit
-
-
Committed by Michael Ellerman

Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 10 March 2016, 1 commit
-
-
Committed by Madhavan Srinivasan

The current comment in pmao_restore_workaround() regarding hard_irq_disable() is wrong. It should say to hard *disable* interrupts instead of *enable*. Fix it.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 13 September 2015, 2 commits
-
-
Committed by Sukadev Bhattiprolu

We currently use the PERF_EVENT_TXN flag to determine if we are in the middle of a transaction. If in a transaction, we defer the schedulability checks from the pmu->add() operation to the pmu->commit() operation. Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ) we can use the type to determine if we are in a transaction and drop the PERF_EVENT_TXN flag.

When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures becomes unused, so drop that field as well.

This is an extension of the Powerpc patch from Peter Zijlstra to the s390, Sparc and x86 architectures.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Sukadev Bhattiprolu

Currently, the PMU interface allows reading only one counter at a time. But some PMUs, like the 24x7 counters in Power, support reading several counters at once. To leverage this functionality, extend the transaction interface to support a "transaction type".

The first type, PERF_PMU_TXN_ADD, refers to the existing transactions, i.e. used to _schedule_ all the events on the PMU as a group. A second transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch, by the 24x7 counters to read several counters at once.

Extend the transaction interfaces to the PMU to accept a 'txn_flags' parameter and use this parameter to ignore any transactions that are not of type PERF_PMU_TXN_ADD.

Thanks to Peter Zijlstra for his input.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[peterz: s390 compile fix]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
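A sketch of how a PMU backend honours the new txn_flags parameter (the powerpc-specific field names are illustrative):

  static void power_pmu_start_txn(struct pmu *pmu, unsigned int txn_flags)
  {
          struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

          WARN_ON_ONCE(cpuhw->txn_flags);         /* txn already in flight */

          cpuhw->txn_flags = txn_flags;
          if (txn_flags & ~PERF_PMU_TXN_ADD)
                  return;         /* e.g. PERF_PMU_TXN_READ: nothing to do here */

          perf_pmu_disable(pmu);
          cpuhw->n_txn_start = cpuhw->n_events;
  }

  static int power_pmu_commit_txn(struct pmu *pmu)
  {
          struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
          unsigned int txn_flags = cpuhw->txn_flags;

          cpuhw->txn_flags = 0;
          if (txn_flags & ~PERF_PMU_TXN_ADD)
                  return 0;       /* only ADD transactions need scheduling checks */

          /* ... run the deferred schedulability checks, then ... */
          perf_pmu_enable(pmu);
          return 0;
  }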
-
- 27 July 2015, 1 commit
-
-
Committed by Anshuman Khandual

This patch just changes the data type of the bhrb_users variable from int to unsigned int, because it never contains a negative value.

Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 02 June 2015, 1 commit
-
-
Committed by Anton Blanchard

When we take a PMU exception or a software event we call perf_read_regs(). This overloads regs->result with a boolean that describes whether we should use the sampled instruction address register (SIAR) or the regs.

If the exception is in the kernel, we start with the kernel regs and backtrace through the kernel stack. At this point we switch to the userspace regs and backtrace the user stack with perf_callchain_user().

Unfortunately these regs have not got the perf_read_regs() treatment, so regs->result could be anything. If it is non-zero, perf_instruction_pointer() decides to use the SIAR, and we get issues like this:

  0.11%  qemu-system-ppc  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
         |
         ---_raw_spin_lock_irqsave
            |
            |--52.35%-- 0
            |     |
            |     |--46.39%-- __hrtimer_start_range_ns
            |     |           kvmppc_run_core
            |     |           kvmppc_vcpu_run_hv
            |     |           kvmppc_vcpu_run
            |     |           kvm_arch_vcpu_ioctl_run
            |     |           kvm_vcpu_ioctl
            |     |           do_vfs_ioctl
            |     |           sys_ioctl
            |     |           system_call
            |     |           |
            |     |           |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
            |     |           |           |
            |     |           |           --100.00%-- 0x7e714
            |     |           |                       0x7e714

Notice the bogus _raw_spin_irqsave when we transition from kernel (system_call) to userspace (0x7e714). We inserted what was in the SIAR.

Add a check in regs_use_siar() to check that the regs in question are from a PMU exception. With this fix the backtrace makes sense:

  0.47%  qemu-system-ppc  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
         |
         ---_raw_spin_lock_irqsave
            |
            |--53.83%-- 0
            |     |
            |     |--44.73%-- hrtimer_try_to_cancel
            |     |           kvmppc_start_thread
            |     |           kvmppc_run_core
            |     |           kvmppc_vcpu_run_hv
            |     |           kvmppc_vcpu_run
            |     |           kvm_arch_vcpu_ioctl_run
            |     |           kvm_vcpu_ioctl
            |     |           do_vfs_ioctl
            |     |           sys_ioctl
            |     |           system_call
            |     |           __ioctl
            |     |           0x7e714
            |     |           0x7e714

Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
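The check is a one-liner in regs_use_siar(); a sketch of its shape:

  static inline int regs_use_siar(struct pt_regs *regs)
  {
          /*
           * Only regs that came through perf_read_regs() -- i.e. from a
           * performance monitor exception (TRAP 0xf00) -- have regs->result
           * set up as the "use SIAR" boolean.  For any other regs, e.g. a
           * syscall frame found while walking the user stack, regs->result
           * is unrelated data, so don't trust it.
           */
          return ((TRAP(regs) == 0xf00) && (regs->result & 1));
  }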
-
- 27 March 2015, 1 commit
-
-
Committed by Jan Stancek

One path in power_pmu_event_init() calls get_cpu_var(), but is missing the matching call to put_cpu_var(), which causes a preemption imbalance and a crash in user space:

  Page fault in user mode with in_atomic() = 1 mm = c000001fefa5a280
  NIP = 3fff9bf2cae0  MSR = 900000014280f032
  Oops: Weird page fault, sig: 11 [#23]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in: <snip>
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D 4.0.0-rc5+ #1
  task: c000001fe82c9200 ti: c000001fe835c000 task.ti: c000001fe835c000
  NIP: 00003fff9bf2cae0 LR: 00003fff9bee4898 CTR: 00003fff9bf2cae0
  REGS: c000001fe835fea0 TRAP: 0401  Tainted: G      D (4.0.0-rc5+)
  MSR: 900000014280f032 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI> CR: 22000028 XER: 00000000
  CFAR: 00003fff9bee4894 SOFTE: 1
  GPR00: 00003fff9bee494c 00003fffe01c2ee0 00003fff9c084410 0000000010020068
  GPR04: 0000000000000000 0000000000000002 0000000000000008 0000000000000001
  GPR08: 0000000000000001 00003fff9c074a30 00003fff9bf2cae0 00003fff9bf2cd70
  GPR12: 0000000052000022 00003fff9c10b700
  NIP [00003fff9bf2cae0] 0x3fff9bf2cae0
  LR [00003fff9bee4898] 0x3fff9bee4898
  Call Trace:
  ---[ end trace 5d3d952b5d4185d4 ]---
  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
  in_atomic(): 1, irqs_disabled(): 0, pid: 10285, name: a.out
  INFO: lockdep is turned off.
  CPU: 43 PID: 10285 Comm: a.out Tainted: G      D 4.0.0-rc5+ #1
  Call Trace:
  [c000001fe835f990] [c00000000089c014] .dump_stack+0x98/0xd4 (unreliable)
  [c000001fe835fa10] [c0000000000e4138] .___might_sleep+0x1d8/0x2e0
  [c000001fe835faa0] [c000000000888da8] .down_read+0x38/0x110
  [c000001fe835fb30] [c0000000000bf2f4] .exit_signals+0x24/0x160
  [c000001fe835fbc0] [c0000000000abde0] .do_exit+0xd0/0xe70
  [c000001fe835fcb0] [c00000000001f4c4] .die+0x304/0x450
  [c000001fe835fd60] [c00000000088e1f4] .do_page_fault+0x2d4/0x900
  [c000001fe835fe30] [c000000000008664] handle_page_fault+0x10/0x30
  note: a.out[10285] exited with preempt_count 1

Reproducer:

  #include <stdio.h>
  #include <unistd.h>
  #include <syscall.h>
  #include <sys/types.h>
  #include <sys/stat.h>
  #include <linux/perf_event.h>
  #include <linux/hw_breakpoint.h>

  static struct perf_event_attr event = {
          .type = PERF_TYPE_RAW,
          .size = sizeof(struct perf_event_attr),
          .sample_type = PERF_SAMPLE_BRANCH_STACK,
          .branch_sample_type = PERF_SAMPLE_BRANCH_ANY_RETURN,
  };

  int main()
  {
          syscall(__NR_perf_event_open, &event, 0, -1, -1, 0);
  }

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
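The fix is to drop the per-cpu reference on the early-exit path too; a sketch of the relevant part of power_pmu_event_init() (the surrounding BHRB check is illustrative):

  cpuhw = &get_cpu_var(cpu_hw_events);
  err = power_check_constraints(cpuhw, events, cflags, n + 1);

  if (has_branch_stack(event)) {
          cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
                          event->attr.branch_sample_type);

          if (cpuhw->bhrb_filter == -1) {
                  put_cpu_var(cpu_hw_events);   /* was missing on this path */
                  return -EOPNOTSUPP;
          }
  }

  put_cpu_var(cpu_hw_events);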
-
- 19 February 2015, 1 commit
-
-
Committed by Peter Zijlstra

The recent LBR rework for x86 left a stray flush_branch_stack() user in the PowerPC code; fix that up.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 November 2014, 1 commit
-
-
Committed by Christoph Lameter

This still has not been merged and now powerpc is the only arch that does not have this change. Sorry about missing linuxppc-dev before.

V2->V2 - Fix up to work against 3.18-rc1

__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset.

Other use cases are for storing and retrieving data from the current processor's percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

  #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables.

This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and fewer registers are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed, so the macro is removed too.

The patch set includes passes over all arches as well. Once these operations are used throughout, specialized macros can be defined in non-x86 arches as well in order to optimize per cpu access, e.g. by using a global register that may be set to the per cpu base.

Transformations done to __get_cpu_var():

1. Determine the address of the percpu instance of the current processor.

     DEFINE_PER_CPU(int, y);
     int *x = &__get_cpu_var(y);

   Converts to:

     int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

     DEFINE_PER_CPU(int, y[20]);
     int *x = __get_cpu_var(y);

   Converts to:

     int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu variable.

     DEFINE_PER_CPU(int, y);
     int x = __get_cpu_var(y);

   Converts to:

     int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct.

     DEFINE_PER_CPU(struct mystruct, y);
     struct mystruct x = __get_cpu_var(y);

   Converts to:

     memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable.

     DEFINE_PER_CPU(int, y);
     __get_cpu_var(y) = x;

   Converts to:

     __this_cpu_write(y, x);

6. Increment/decrement etc. of a per cpu variable.

     DEFINE_PER_CPU(int, y);
     __get_cpu_var(y)++;

   Converts to:

     __this_cpu_inc(y);

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
[mpe: Fix build errors caused by set/or_softirq_pending(), and rework assignment in __set_breakpoint() to use memcpy().]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 25 September 2014, 1 commit
-
-
Committed by Anton Blanchard

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 27 August 2014, 2 commits
-
-
Committed by Tejun Heo

This reverts commit 5828f666 due to a build failure after merging with pending powerpc changes.

Link: http://lkml.kernel.org/g/20140827142243.6277eaff@canb.auug.org.au
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Christoph Lameter

__get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset.

Other use cases are for storing and retrieving data from the current processor's percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment.

__get_cpu_var() is defined as:

  #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables.

This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and fewer registers are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed, so the macro is removed too.

The patch set includes passes over all arches as well. Once these operations are used throughout, specialized macros can be defined in non-x86 arches as well in order to optimize per cpu access, e.g. by using a global register that may be set to the per cpu base.

Transformations done to __get_cpu_var():

1. Determine the address of the percpu instance of the current processor.

     DEFINE_PER_CPU(int, y);
     int *x = &__get_cpu_var(y);

   Converts to:

     int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

     DEFINE_PER_CPU(int, y[20]);
     int *x = __get_cpu_var(y);

   Converts to:

     int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processor's instance of a per cpu variable.

     DEFINE_PER_CPU(int, y);
     int x = __get_cpu_var(y);

   Converts to:

     int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct.

     DEFINE_PER_CPU(struct mystruct, y);
     struct mystruct x = __get_cpu_var(y);

   Converts to:

     memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable.

     DEFINE_PER_CPU(int, y);
     __get_cpu_var(y) = x;

   Converts to:

     __this_cpu_write(y, x);

6. Increment/decrement etc. of a per cpu variable.

     DEFINE_PER_CPU(int, y);
     __get_cpu_var(y)++;

   Converts to:

     __this_cpu_inc(y);

tj: Folded a fix patch. http://lkml.kernel.org/g/alpine.DEB.2.11.1408172143020.9652@gentwo.org

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 28 July 2014, 3 commits
-
-
Committed by Michael Ellerman

Power8 has a new register (MMCR2), which contains individual freeze bits for each counter. This is an improvement on previous chips as it means we can have multiple events on the PMU at the same time with different exclude_{user,kernel,hv} settings. Previously we had to ensure all events on the PMU had the same exclude settings.

The core of the patch is fairly simple. We use the 207S feature flag to indicate that the PMU backend supports per-event excludes; if it's set we skip the generic logic that enforces the equality of excludes between events. We also use that flag to skip setting the freeze bits in MMCR0; the PMU backend is expected to have handled setting them in MMCR2.

The complication arises with EBB. The FCxP bits in MMCR2 are accessible R/W to a task using EBB. Which means a task using EBB will be able to see that we are using MMCR2 for freezing, whereas the old logic which used MMCR0 is not user visible.

The task can not see or affect exclude_kernel & exclude_hv, so we only need to consider exclude_user.

The table below summarises the behaviour both before and after this commit is applied:

                           exclude_user
                         true       false
        ------------------------------------
        | User visible  |  N          N
 Before | Can freeze    |  Y          Y
        | Can unfreeze  |  N          Y
        ------------------------------------
        | User visible  |  Y          Y
 After  | Can freeze    |  Y          Y
        | Can unfreeze  |  Y/N        Y
        ------------------------------------

So firstly I assert that the simple visibility of the exclude_user setting in MMCR2 is a non-issue. The event belongs to the task, and was most likely created by the task. So the exclude_user setting is not privileged information in any way.

Secondly, the behaviour in the exclude_user = false case is unchanged. This is important as it is the case that is actually useful, ie. the event is created with no exclude setting and the task uses MMCR2 to implement exclusion manually.

For exclude_user = true there is no meaningful change to freezing the event. Previously the task could use MMCR2 to freeze the event, though it was already frozen with MMCR0. With the new code the task can use MMCR2 to freeze the event, though it was already frozen with MMCR2.

The only real change is when exclude_user = true and the task tries to use MMCR2 to unfreeze the event. Previously this had no effect, because the event was already frozen in MMCR0. With the new code the task can unfreeze the event in MMCR2, but at some indeterminate time in the future the kernel will overwrite its setting and refreeze the event.

Therefore my final assertion is that any task using exclude_user = true and also fiddling with MMCR2 was deeply confused before this change, and remains so after it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman

To support per-event exclude settings on Power8 we need access to the struct perf_events in compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman

Because we reuse cpuhw->mmcr on each call to compute_mmcr() there's a risk that we could forget to set one of the values and use whatever value was in there previously. Currently all the implementations are careful to set all the values, but it's safer to clear them all before we call compute_mmcr().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
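A minimal sketch of the hardening in power_pmu_enable(), just before the backend is asked to recompute the register values (the error handling shown is illustrative):

  /* Don't let stale MMCR bits from a previous event mix leak through. */
  memset(&cpuhw->mmcr, 0, sizeof(cpuhw->mmcr));

  if (ppmu->compute_mmcr(cpuhw->events, cpuhw->n_events, hwc_index,
                         cpuhw->mmcr, cpuhw->event)) {
          /* shouldn't ever get here */
          printk(KERN_ERR "oops compute_mmcr failed\n");
          goto out;
  }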
-
- 23 July 2014, 1 commit
-
-
Committed by Michael Ellerman

In the recent commit b50a6c58 "Clear MMCR2 when enabling PMU", I screwed up the handling of MMCR2 for tasks using EBB. We must make sure we set MMCR2 *before* ebb_switch_in(), otherwise we overwrite the value of MMCR2 that userspace may have written. That potentially breaks a task that uses EBB and manually uses MMCR2 for event freezing.

Fixes: b50a6c58 ("powerpc/perf: Clear MMCR2 when enabling PMU")
Cc: stable@vger.kernel.org
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
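The ordering constraint in power_pmu_enable(), sketched (the mmcr[] index used for MMCR2 is an assumption):

  /* Write the kernel's MMCR2 value first ... */
  if (ppmu->flags & PPMU_ARCH_207S)
          mtspr(SPRN_MMCR2, cpuhw->mmcr[3]);

  /*
   * ... then let ebb_switch_in() load any user-controlled PMU state,
   * including the MMCR2 value an EBB task may have written, so we
   * don't clobber it afterwards.
   */
  mmcr0 = ebb_switch_in(ebb, cpuhw);

  mb();
  write_mmcr0(cpuhw, mmcr0);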
-
- 11 July 2014, 3 commits
-
-
Committed by Anton Blanchard

We are seeing a lot of PMU warnings on POWER8:

  Can't find PMC that caused IRQ

Looking closer, the active PMC is 0 at this point and we took a PMU exception on the transition from negative to 0.

Some versions of POWER8 have an issue where they edge detect and not level detect PMC overflows. A number of places program the PMC with (0x80000000 - period_left), where period_left can be negative. We can either fix all of these or just ensure that period_left is always >= 1. This patch takes the second option.

Cc: <stable@vger.kernel.org>
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Joel Stanley

On POWER8 when switching to a KVM guest we set bits in MMCR2 to freeze the PMU counters. Aside from on boot, they are then never reset, resulting in stuck perf counters for any user in the guest or host.

We now set MMCR2 to 0 whenever enabling the PMU, which provides a sane state for perf to use the PMU counters under either the guest or the host.

This was manifesting as a bug with ppc64_cpu --frequency:

  $ sudo ppc64_cpu --frequency
  WARNING: couldn't run on cpu 0
  WARNING: couldn't run on cpu 8
    ...
  WARNING: couldn't run on cpu 144
  WARNING: couldn't run on cpu 152
  min:    18446744073.710 GHz (cpu -1)
  max:    0.000 GHz (cpu -1)
  avg:    0.000 GHz

The command uses a perf counter to measure CPU cycles over a fixed amount of time, in order to approximate the frequency of the machine. The counters were returning zero once a guest was started, regardless of whether it was still running or had been shut down.

By dumping the value of MMCR2, it was observed that once a guest is running MMCR2 is set to 1s - which stops counters from running:

  $ sudo sh -c 'echo p > /proc/sysrq-trigger'
  CPU: 0 PMU registers, ppmu = POWER8 n_counters = 6
  PMC1:  5b635e38 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
  PMC5:  1bf5a646 PMC6: 5793d378 PMC7: deadbeef PMC8: deadbeef
  MMCR0: 0000000080000000 MMCR1: 000000001e000000 MMCRA: 0000040000000000
  MMCR2: fffffffffffffc00 EBBHR: 0000000000000000 EBBRR: 0000000000000000
  BESCR: 0000000000000000 SIAR:  00000000000a51cc SDAR:  c00000000fc40000
  SIER:  0000000001000000

This is done unconditionally in book3s_hv_interrupts.S upon entering the guest, and the original value is only save/restored if the host has indicated it was using the PMU. This is okay, however the user of the PMU needs to ensure that it is in a defined state when it starts using it.

Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
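The change itself is small; a sketch of the reset in power_pmu_enable():

  /*
   * Give MMCR2 a sane (nothing frozen) value whenever the PMU is
   * (re-)enabled, instead of inheriting whatever freeze bits KVM
   * left behind when it last ran a guest.
   */
  if (ppmu->flags & PPMU_ARCH_207S)
          mtspr(SPRN_MMCR2, 0);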
-
Committed by Joel Stanley

Instead of separate bits for every POWER8 PMU feature, have a single one for v2.07 of the architecture. This saves us adding an MMCR2 define for a future patch.

Cc: stable@vger.kernel.org
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 March 2014, 4 commits
-
-
Committed by Michael Ellerman

The previous commit added constraint and register handling to allow processes using EBB (Event Based Branches) to request access to the BHRB (Branch History Rolling Buffer).

With that in place we can allow processes using EBB to access the BHRB. This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We must also clear BHRBA when we are disabling.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman

Although we already block EBB events which request sampling using sample_period, technically it's possible for an event to set sample_type but not sample_period. Nothing terrible will happen if an EBB event does specify sample_type, but it signals a major confusion on the part of userspace, and so we do them the favor of rejecting it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Michael Ellerman

Some power8 revisions have a hardware bug where we can lose a PMU exception. This commit adds a workaround to detect the bad condition and rectify the situation. See the comment in the commit for a full description.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Committed by Anshuman Khandual

Currently the sysrq ShowRegs command does not print any PMU registers, as we have an empty definition for perf_event_print_debug(). This patch defines perf_event_print_debug() to print various PMU registers. Example output:

  CPU: 0 PMU registers, ppmu = POWER7 n_counters = 6
  PMC1:  00000000 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
  PMC5:  00000000 PMC6: 00000000 PMC7: deadbeef PMC8: deadbeef
  MMCR0: 0000000080000000 MMCR1: 0000000000000000 MMCRA: 0f00000001000000
  SIAR:  0000000000000000 SDAR:  0000000000000000 SIER:  0000000000000000

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fix 32 bit build and rework formatting for compactness]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 February 2014, 1 commit
-
-
Committed by Anshuman Khandual

Right now the config_bhrb() PMU-specific call happens after write_mmcr0(), which actually enables the PMU for event counting and interrupts. So there is a small window of time where the PMU and BHRB run without the required HW branch filter (if any) enabled in BHRB. This can cause some of the branch samples to be collected through BHRB without any filter applied and hence affects the correctness of the results. This patch moves the BHRB config function call before enabling interrupts.

Here are some data points captured via trace prints which depict how we could get PMU interrupts with the BHRB filter NOT enabled, with a standard perf record command line (asking for branch record information as well):

  $ perf record -j any_call ls

Before the patch:

  ls-1962 [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
  ls-1962 [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
  ...

All the PMU interrupts before this point did not have the requested HW branch filter enabled in the MMCRA.

  ls-1962 [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
  ls-1962 [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000

After the patch:

  ls-1850 [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
  ls-1850 [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000

All the PMU interrupts have the requested HW BHRB branch filter enabled in MMCRA.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fixed up whitespace and cleaned up changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
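The reordering at the end of power_pmu_enable(), sketched (the guard around the config_bhrb() call is illustrative):

  /* Program the BHRB hardware filter first ... */
  if (ppmu->config_bhrb)
          ppmu->config_bhrb(cpuhw->bhrb_filter);

  /* ... and only then re-enable counting and PMU interrupts. */
  write_mmcr0(cpuhw, mmcr0);

  /* Enable instruction sampling if necessary */
  if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
          mb();
          mtspr(SPRN_MMCRA, cpuhw->mmcr[2]);
  }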
-
- 14 August 2013, 1 commit
-
-
Committed by Anton Blanchard

Address some of the trivial sparse warnings in arch/powerpc.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 01 August 2013, 1 commit
-
-
Committed by Michael Ellerman

We use bit 63 of the event code for userspace to request that the event be counted using EBB (Event Based Branches). Export this value, making it part of the API - though only on processors that support EBB.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 July 2013, 1 commit
-
-
Committed by Anshuman Khandual

When the task moves around the system, the corresponding cpuhw per-cpu structure should be populated with the BHRB filter request value, so that the PMU can be configured appropriately during the next call into power_pmu_enable().

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
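One plausible shape of the change, sketched against the BHRB enable path (the reset-on-context-change handling and exact field names are illustrative):

  static void power_pmu_bhrb_enable(struct perf_event *event)
  {
          struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);

          if (!ppmu->bhrb_nr)
                  return;

          /* Clear BHRB if we changed task context to avoid data leaks */
          if (event->ctx->task && cpuhw->bhrb_context != event->ctx) {
                  power_pmu_bhrb_reset();
                  cpuhw->bhrb_context = event->ctx;
          }
          cpuhw->bhrb_users++;

          /*
           * Carry the event's filter request along with the task, so the
           * next power_pmu_enable() on this CPU programs it into the
           * hardware via config_bhrb().
           */
          cpuhw->bhrb_filter = ppmu->bhrb_filter_map(event->attr.branch_sample_type);
  }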
-
- 01 July 2013, 1 commit
-
-
Committed by Michael Ellerman

Add support for EBB (Event Based Branches) on 64-bit book3s. See the included documentation for more details.

EBBs are a feature which allows the hardware to branch directly to a specified user space address when a PMU event overflows. This can be used by programs for self-monitoring with no kernel involvement in the inner loop.

Most of the logic is in the generic book3s code, primarily to avoid a proliferation of PMU callbacks.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
-