- 05 7月, 2021 1 次提交
-
-
由 Thomas Richter 提交于
Commit cf6acb8b ("s390/cpumf: Add support for complete counter set extraction") allows access to the CPU Measurement Counter Facility via character device /dev/hwctr. The access was exclusive via this device or via perf_event_open() system call. Only one path at a time was permitted. The CPU Measurement Counter Facility device driver blocked access to other processes. This patch removes this restriction and allows concurrent access to the CPU Measurement Counter Facility from multiple processes at the same time via perf_event_open() SVC and via /dev/hwctr device. The access via /dev/hwctr device is still exclusive, only one process is allowed to access this device. This patch - moves the /dev/hwctr device access from file perf_cpum_cf_diag.c. to file perf_cpum_cf.c. - use only one trace buffer .../s390dbf/cpum_cf. - remove cfset_csd structure and includes its members it into the structure cpu_cf_events. This results in one data structure and simplifies the access. - rework function familiy ctr_set_enable, ctr_set_disable, ctr_set_start and ctr_set_stop which operate on a counter set number. Now they operate on a counter set bit mask. - move CF_DIAG event functionality to file perf_cpum_cf.c. It now contains the complete functionality of the CPU Measurement Counter Facility: - Performance measurement support for counters using perf stat. - Support for complete counter set extraction with device /dev/hwctr. - Support for counter set extraction event CF_DIAG attached to samples using perf record. - removes file perf_cpum_cf_diag.c Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Reviewed-by: NSumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 30 6月, 2021 1 次提交
-
-
由 Mel Gorman 提交于
The PCP high watermark is based on the number of online CPUs so the watermarks must be adjusted during CPU hotplug. At the time of hot-remove, the number of online CPUs is already adjusted but during hot-add, a delta needs to be applied to update PCP to the correct value. After this patch is applied, the high watermarks are adjusted correctly. # grep high: /proc/zoneinfo | tail -1 high: 649 # echo 0 > /sys/devices/system/cpu/cpu4/online # grep high: /proc/zoneinfo | tail -1 high: 664 # echo 1 > /sys/devices/system/cpu/cpu4/online # grep high: /proc/zoneinfo | tail -1 high: 649 Link: https://lkml.kernel.org/r/20210525080119.5455-4-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 4月, 2021 1 次提交
-
-
由 Tom Zanussi 提交于
Implement the IDXD performance monitor capability (named 'perfmon' in the DSA (Data Streaming Accelerator) spec [1]), which supports the collection of information about key events occurring during DSA and IAX (Intel Analytics Accelerator) device execution, to assist in performance tuning and debugging. The idxd perfmon support is implemented as part of the IDXD driver and interfaces with the Linux perf framework. It has several features in common with the existing uncore pmu support: - it does not support sampling - does not support per-thread counting However it also has some unique features not present in the core and uncore support: - all general-purpose counters are identical, thus no event constraints - operation is always system-wide While the core perf subsystem assumes that all counters are by default per-cpu, the uncore pmus are socket-scoped and use a cpu mask to restrict counting to one cpu from each socket. IDXD counters use a similar strategy but expand the scope even further; since IDXD counters are system-wide and can be read from any cpu, the IDXD perf driver picks a single cpu to do the work (with cpu hotplug notifiers to choose a different cpu if the chosen one is taken off-line). More specifically, the perf userspace tool by default opens a counter for each cpu for an event. However, if it finds a cpumask file associated with the pmu under sysfs, as is the case with the uncore pmus, it will open counters only on the cpus specified by the cpumask. Since perfmon only needs to open a single counter per event for a given IDXD device, the perfmon driver will create a sysfs cpumask file for the device and insert the first cpu of the system into it. When a user uses perf to open an event, perf will open a single counter on the cpu specified by the cpu mask. This amounts to the default system-wide rather than per-cpu counting mentioned previously for perfmon pmu events. In order to keep the cpu mask up-to-date, the driver implements cpu hotplug support for multiple devices, as IDXD usually enumerates and registers more than one idxd device. The perfmon driver implements basic perfmon hardware capability discovery and configuration, and is initialized by the IDXD driver's probe function. During initialization, the driver retrieves the total number of supported performance counters, the pmu ID, and the device type from idxd device, and registers itself under the Linux perf framework. The perf userspace tool can be used to monitor single or multiple events depending on the given configuration, as well as event groups, which are also supported by the perfmon driver. The user configures events using the perf tool command-line interface by specifying the event and corresponding event category, along with an optional set of filters that can be used to restrict counting to specific work queues, traffic classes, page and transfer sizes, and engines (See [1] for specifics). With the configuration specified by the user, the perf tool issues a system call passing that information to the kernel, which uses it to initialize the specified event(s). The event(s) are opened and started, and following termination of the perf command, they're stopped. At that point, the perfmon driver will read the latest count for the event(s), calculate the difference between the latest counter values and previously tracked counter values, and display the final incremental count as the event count for the cycle. An overflow handler registered on the IDXD irq path is used to account for counter overflows, which are signaled by an overflow interrupt. Below are a couple of examples of perf usage for monitoring DSA events. The following monitors all events in the 'engine' category. Becuuse no filters are specified, this captures all engine events for the workload, which in this case is 19 iterations of the work generated by the kernel dmatest module. Details describing the events can be found in Appendix D of [1], Performance Monitoring Events, but briefly they are: event 0x1: total input data processed, in 32-byte units event 0x2: total data written, in 32-byte units event 0x4: number of work descriptors that read the source event 0x8: number of work descriptors that write the destination event 0x10: number of work descriptors dispatched from batch descriptors event 0x20: number of work descriptors dispatched from work queues # perf stat -e dsa0/event=0x1,event_category=0x1/, dsa0/event=0x2,event_category=0x1/, dsa0/event=0x4,event_category=0x1/, dsa0/event=0x8,event_category=0x1/, dsa0/event=0x10,event_category=0x1/, dsa0/event=0x20,event_category=0x1/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 5,332 dsa0/event=0x1,event_category=0x1/ 5,327 dsa0/event=0x2,event_category=0x1/ 19 dsa0/event=0x4,event_category=0x1/ 19 dsa0/event=0x8,event_category=0x1/ 0 dsa0/event=0x10,event_category=0x1/ 19 dsa0/event=0x20,event_category=0x1/ 21.977436186 seconds time elapsed The command below illustrates filter usage with a simple example. It specifies that MEM_MOVE operations should be counted for the DSA device dsa0 (event 0x8 corresponds to the EV_MEM_MOVE event - Number of Memory Move Descriptors, which is part of event category 0x3 - Operations. The detailed category and event IDs are available in Appendix D, Performance Monitoring Events, of [1]). In addition to the event and event category, a number of filters are also specified (the detailed filter values are available in Chapter 6.4 (Filter Support) of [1]), which will restrict counting to only those events that meet all of the filter criteria. In this case, the filters specify that only MEM_MOVE operations that are serviced by work queue wq0 and specifically engine number engine0 and traffic class tc0 having sizes between 0 and 4k and page size of between 0 and 1G result in a counter hit; anything else will be filtered out and not appear in the final count. Note that filters are optional - any filter not specified is assumed to be all ones and will pass anything. # perf stat -e dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ modprobe dmatest channel=dma0chan0 timeout=2000 iterations=19 run=1 wait=1 Performance counter stats for 'system wide': 19 dsa0/filter_wq=0x1,filter_tc=0x1,filter_sz=0x7, filter_eng=0x1,event=0x8,event_category=0x3/ 21.865914091 seconds time elapsed The output above reflects that the unspecified workload resulted in the counting of 19 MEM_MOVE operation events that met the filter criteria. [1]: https://software.intel.com/content/www/us/en/develop/download/intel-data-streaming-accelerator-preliminary-architecture-specification.html [ Based on work originally by Jing Lin. ] Reviewed-by: NDave Jiang <dave.jiang@intel.com> Reviewed-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NTom Zanussi <tom.zanussi@linux.intel.com> Link: https://lore.kernel.org/r/0c5080a7d541904c4ad42b848c76a1ce056ddac7.1619276133.git.zanussi@kernel.orgSigned-off-by: NVinod Koul <vkoul@kernel.org>
-
- 08 4月, 2021 2 次提交
-
-
由 Tony Lindgren 提交于
There is a timer wrap issue on dra7 for the ARM architected timer. In a typical clock configuration the timer fails to wrap after 388 days. To work around the issue, we need to use timer-ti-dm percpu timers instead. Let's configure dmtimer3 and 4 as percpu timers by default, and warn about the issue if the dtb is not configured properly. Let's do this as a single patch so it can be backported to v5.8 and later kernels easily. Note that this patch depends on earlier timer-ti-dm systimer posted mode fixes, and a preparatory clockevent patch "clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue". For more information, please see the errata for "AM572x Sitara Processors Silicon Revisions 1.1, 2.0": https://www.ti.com/lit/er/sprz429m/sprz429m.pdf The concept is based on earlier reference patches done by Tero Kristo and Keerthy. Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <kristo@kernel.org> Signed-off-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210323074326.28302-3-tony@atomide.com
-
由 Hector Martin 提交于
This is the root interrupt controller used on Apple ARM SoCs such as the M1. This irqchip driver performs multiple functions: * Handles both IRQs and FIQs * Drives the AIC peripheral itself (which handles IRQs) * Dispatches FIQs to downstream hard-wired clients (currently the ARM timer). * Implements a virtual IPI multiplexer to funnel multiple Linux IPIs into a single hardware IPI Reviewed-by: NMarc Zyngier <maz@kernel.org> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NHector Martin <marcan@marcan.st>
-
- 07 4月, 2021 2 次提交
-
-
由 John Garry 提交于
Now that the core code handles flushing per-IOVA domain CPU rcaches, remove the handling here. Reviewed-by: NLu Baolu <baolu.lu@linux.intel.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1616675401-151997-3-git-send-email-john.garry@huawei.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
由 John Garry 提交于
Like the Intel IOMMU driver already does, flush the per-IOVA domain CPU rcache when a CPU goes offline - there's no point in keeping it. Reviewed-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1616675401-151997-2-git-send-email-john.garry@huawei.comSigned-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 25 3月, 2021 2 次提交
-
-
由 Shaokun Zhang 提交于
On HiSilicon Hip09 platform, there is a PA (Protocol Adapter) module on each chip SICL (Super I/O Cluster) which incorporates three Hydra interface and facilitates the cache coherency between the dies on the chip. While PA uncore PMU model is the same as other Hip09 PMU modules and many PMU events are supported. Let's support the PMU driver using the HiSilicon uncore PMU framework. PA PMU supports the following filter functions: * tracetag_en: allows user to count events according to tt_req or tt_core set in L3C PMU. It's the same as other PMUs. * srcid_cmd & srcid_msk: allows user to filter statistics that come from specific CCL/ICL by configuration source ID. * tgtid_cmd & tgtid_msk: it is the similar function to srcid_cmd & srcid_msk. Both are used to check where the data comes from or go to. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: NJohn Garry <john.garry@huawei.com> Co-developed-by: NQi Liu <liuqi115@huawei.com> Signed-off-by: NQi Liu <liuqi115@huawei.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1615186237-22263-9-git-send-email-zhangshaokun@hisilicon.comSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Shaokun Zhang 提交于
HiSilicon's Hip09 is comprised by multi-dies that can be connected by SLLC module (Skyros Link Layer Controller), its has separate PMU registers which the driver can program it freely and interrupt is supported to handle counter overflow. Let's support its driver under the framework of HiSilicon uncore PMU driver. SLLC PMU supports the following filter functions: * tracetag_en: allows user to count data according to tt_req or tt_core set in L3C PMU. * srcid_cmd & srcid_msk: allows user to filter statistics that come from specific CCL/ICL by configuration source ID. * tgtid_hi & tgtid_lo: it also supports event statistics that these operations will go to the CCL/ICL by configuration target ID or target ID range. It's the same as source ID with 11-bit width in the SoC. More introduction is added in documentation: Documentation/admin-guide/perf/hisi-pmu.rst Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: John Garry <john.garry@huawei.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: NJohn Garry <john.garry@huawei.com> Co-developed-by: NQi Liu <liuqi115@huawei.com> Signed-off-by: NQi Liu <liuqi115@huawei.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1615186237-22263-8-git-send-email-zhangshaokun@hisilicon.comSigned-off-by: NWill Deacon <will@kernel.org>
-
- 24 2月, 2021 1 次提交
-
-
由 Thomas Richter 提交于
Add support to the CPU Measurement counter facility device driver to extract complete counter sets per CPU and per counter set from user space. This includes a new device named /dev/hwctr and support for the device driver functions open, close and ioctl. Other functions are not supported. The ioctl command supports 3 subcommands: S390_HWCTR_START: enables counter sets on a list of CPUs. S390_HWCTR_STOP: disables counter sets on a list of CPUs. S390_HWCTR_READ: reads counter sets on a list of CPUs. The ioctl(..., S390_HWCTR_READ, ...) is the only subcommand which returns data. It requires member data_bytes to be positive and indicates the maximum amount of data available to store counter set data. The other ioctl() subcommands do not use this member and it should be set to zero. The S390_HWCTR_READ subcommand returns the following data: The cpuset data is flattened using the following scheme, stored in member data: 0x0 0x8 0xc 0x10 0x10 0x18 0x20 0x28 0xU-1 +---------+-----+---------+-----+---------+-----+-----+------+------+ | no_cpus | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +---------+-----+---------+-----+---------+-----+-----+------+------+ 0xU 0xU+4 0xU+8 0xU+10 0xV-1 +-----+---------+-----+-----+------+------+ | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+-----+------+------+ 0xV 0xV+4 0xV+8 0xV+c +-----+---------+-----+---------+-----+-----+------+------+ | cpu | no_sets | set | no_cnts | cv1 | cv2 | .... | cv_n | +-----+---------+-----+---------+-----+-----+------+------+ U and V denote arbitrary hexadezimal addresses. The first integer represents the number of CPUs data was extracted from. This is followed by CPU number and number of counter sets extracted. Both are two integer values. This is followed by the set identifer and number of counters extracted. Both are two integer values. This is followed by the counter values, each element is eight bytes in size. The S390_HWCTR_READ ioctl subcommand is also limited to one call per minute. This ensures that an application does not read out the counter sets too often and reduces the overall CPU performance. The complete counter set extraction is an expensive operation. Reviewed-by: NSumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NHeiko Carstens <hca@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 12 1月, 2021 1 次提交
-
-
由 Guo Ren 提交于
Current perf init will failed with: [ 1.452433] csky-pmu: probe of soc:pmu failed with error -16 This patch fix it up with adding CPUHP_AP_PERF_CSKY_ONLINE in cpuhotplug.h. Signed-off-by: NGuo Ren <guoren@linux.alibaba.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
-
- 23 12月, 2020 1 次提交
-
-
由 Daniel Lezcano 提交于
With the powercap dtpm controller, we are able to plug devices with power limitation features in the tree. The following patch introduces the CPU power limitation based on the energy model and the performance states. The power limitation is done at the performance domain level. If some CPUs are unplugged, the corresponding power will be subtracted from the performance domain total power. It is up to the platform to initialize the dtpm tree and add the CPU. Here is an example to create a simple tree with one root node called "pkg" and the CPU's performance domains. static int dtpm_register_pkg(struct dtpm_descr *descr) { struct dtpm *pkg; int ret; pkg = dtpm_alloc(NULL); if (!pkg) return -ENOMEM; ret = dtpm_register(descr->name, pkg, descr->parent); if (ret) return ret; return dtpm_register_cpu(pkg); } static struct dtpm_descr descr = { .name = "pkg", .init = dtpm_register_pkg, }; DTPM_DECLARE(descr); Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: NLukasz Luba <lukasz.luba@arm.com> Tested-by: NLukasz Luba <lukasz.luba@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 11 11月, 2020 1 次提交
-
-
由 Thomas Gleixner 提交于
With the new mechanism which kicks tasks off the outgoing CPU at the end of schedule() the situation on an outgoing CPU right before the stopper thread brings it down completely is: - All user tasks and all unbound kernel threads have either been migrated away or are not running and the next wakeup will move them to a online CPU. - All per CPU kernel threads, except cpu hotplug thread and the stopper thread have either been unbound or parked by the responsible CPU hotplug callback. That means that at the last step before the stopper thread is invoked the cpu hotplug thread is the last legitimate running task on the outgoing CPU. Add a final wait step right before the stopper thread is kicked which ensures that any still running tasks on the way to park or on the way to kick themself of the CPU are either sleeping or gone. This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If sched_cpu_dying() detects that there is still another running task aside of the stopper thread then it will explode with the appropriate fireworks. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NValentin Schneider <valentin.schneider@arm.com> Reviewed-by: NDaniel Bristot de Oliveira <bristot@redhat.com> Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
-
- 07 10月, 2020 1 次提交
-
-
由 Kajol Jain 提交于
Patch here adds cpu hotplug functions to hv_gpci pmu. A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_GPCI_ONLINE" enum is added. The online callback function updates the cpumask only if its empty. As the primary intention of adding hotplug support is to designate a CPU to make HCALL to collect the counter data. The offline function test and clear corresponding cpu in a cpumask and update cpumask to any other active cpu. Signed-off-by: NKajol Jain <kjain@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201003074943.338618-4-kjain@linux.ibm.com
-
- 01 10月, 2020 1 次提交
-
-
由 Zqiang 提交于
If a CPU is offlined the debug objects per CPU pool is not cleaned up. If the CPU is never onlined again then the objects in the pool are wasted. Add a CPU hotplug callback which is invoked after the CPU is dead to free the pool. [ tglx: Massaged changelog and added comment about remote access safety ] Signed-off-by: NZqiang <qiang.zhang@windriver.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Link: https://lore.kernel.org/r/20200908062709.11441-1-qiang.zhang@windriver.com
-
- 18 9月, 2020 1 次提交
-
-
由 Andrew Jones 提交于
Steal time initialization requires mapping a memory region which invokes a memory allocation. Doing this at CPU starting time results in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled: BUG: sleeping function called from invalid context at mm/slab.h:498 in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1 Call trace: dump_backtrace+0x0/0x208 show_stack+0x1c/0x28 dump_stack+0xc4/0x11c ___might_sleep+0xf8/0x130 __might_sleep+0x58/0x90 slab_pre_alloc_hook.constprop.101+0xd0/0x118 kmem_cache_alloc_node_trace+0x84/0x270 __get_vm_area_node+0x88/0x210 get_vm_area_caller+0x38/0x40 __ioremap_caller+0x70/0xf8 ioremap_cache+0x78/0xb0 memremap+0x9c/0x1a8 init_stolen_time_cpu+0x54/0xf0 cpuhp_invoke_callback+0xa8/0x720 notify_cpu_starting+0xc8/0xd8 secondary_start_kernel+0x114/0x180 CPU1: Booted secondary processor 0x0000000001 [0x431f0a11] However we don't need to initialize steal time at CPU starting time. We can simply wait until CPU online time, just sacrificing a bit of accuracy by returning zero for steal time until we know better. While at it, add __init to the functions that are only called by pv_time_init() which is __init. Signed-off-by: NAndrew Jones <drjones@redhat.com> Fixes: e0685fa2 ("arm64: Retrieve stolen time as paravirtualized guest") Cc: stable@vger.kernel.org Reviewed-by: NSteven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 21 8月, 2020 1 次提交
-
-
由 Anup Patel 提交于
We add a separate CLINT timer driver for Linux RISC-V M-mode (i.e. RISC-V NoMMU kernel). The CLINT MMIO device provides three things: 1. 64bit free running counter register 2. 64bit per-CPU time compare registers 3. 32bit per-CPU inter-processor interrupt registers Unlike other timer devices, CLINT provides IPI registers along with timer registers. To use CLINT IPI registers, the CLINT timer driver provides IPI related callbacks to arch/riscv. Signed-off-by: NAnup Patel <anup.patel@wdc.com> Tested-by: NEmil Renner Berhing <kernel@esmil.dk> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: NAtish Patra <atish.patra@wdc.com> Reviewed-by: NPalmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
- 16 7月, 2020 1 次提交
-
-
由 Kajol Jain 提交于
Patch here adds cpu hotplug functions to hv_24x7 pmu. A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum is added. The online callback function updates the cpumask only if its empty. As the primary intention of adding hotplug support is to designate a CPU to make HCALL to collect the counter data. The offline function test and clear corresponding cpu in a cpumask and update cpumask to any other active cpu. Signed-off-by: NKajol Jain <kjain@linux.ibm.com> Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200709051836.723765-2-kjain@linux.ibm.com
-
- 10 6月, 2020 1 次提交
-
-
由 Anup Patel 提交于
The RISC-V per-HART local interrupt controller manages software interrupts, timer interrupts, external interrupts (which are routed via the platform level interrupt controller) and other per-HART local interrupts. We add a driver for the RISC-V local interrupt controller, which eventually replaces the RISC-V architecture code, allowing for a better split between arch code and drivers. The driver is compliant with RISC-V Hart-Level Interrupt Controller DT bindings located at: Documentation/devicetree/bindings/interrupt-controller/riscv,cpu-intc.txt Co-developed-by: NPalmer Dabbelt <palmer@dabbelt.com> Signed-off-by: NAnup Patel <anup.patel@wdc.com> [Palmer: Cleaned up warnings] Signed-off-by: NPalmer Dabbelt <palmer@dabbelt.com>
-
- 30 5月, 2020 1 次提交
-
-
由 Ming Lei 提交于
Most of blk-mq drivers depend on managed IRQ's auto-affinity to setup up queue mapping. Thomas mentioned the following point[1]: "That was the constraint of managed interrupts from the very beginning: The driver/subsystem has to quiesce the interrupt line and the associated queue _before_ it gets shutdown in CPU unplug and not fiddle with it until it's restarted by the core when the CPU is plugged in again." However, current blk-mq implementation doesn't quiesce hw queue before the last CPU in the hctx is shutdown. Even worse, CPUHP_BLK_MQ_DEAD is a cpuhp state handled after the CPU is down, so there isn't any chance to quiesce the hctx before shutting down the CPU. Add new CPUHP_AP_BLK_MQ_ONLINE state to stop allocating from blk-mq hctxs where the last CPU goes away, and wait for completion of in-flight requests. This guarantees that there is no inflight I/O before shutting down the managed IRQ. Add a BLK_MQ_F_STACKING and set it for dm-rq and loop, so we don't need to wait for completion of in-flight requests from these drivers to avoid a potential dead-lock. It is safe to do this for stacking drivers as those do not use interrupts at all and their I/O completions are triggered by underlying devices I/O completion. [1] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [hch: different retry mechanism, merged two patches, minor cleanups] Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NDaniel Wagner <dwagner@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 5月, 2020 1 次提交
-
-
由 Mike Leach 提交于
Adds registration of CPU start and stop functions to CPU hotplug mechanisms - for any CPU bound CTI. Sets CTI powered flag according to state. Will enable CTI on CPU start if there are existing enable requests. Signed-off-by: NMike Leach <mike.leach@linaro.org> Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200518180242.7916-23-mathieu.poirier@linaro.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 16 3月, 2020 1 次提交
-
-
由 Atish Patra 提交于
Currently, PLIC threshold is only initialized once in the beginning. However, threshold can be set to disabled if a CPU is marked offline with CPU hotplug feature. This will not allow to change the irq affinity to a CPU that just came online. Add PLIC specific CPU hotplug callbacks and enable the threshold when a CPU comes online. Take this opportunity to move the external interrupt enable code from trap init to PLIC driver as well. On cpu offline path, the driver performs the exact opposite operations i.e. disable the interrupt and the threshold. Signed-off-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Reviewed-by: NAnup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20200302231146.15530-2-atish.patra@wdc.com
-
- 02 1月, 2020 1 次提交
-
-
由 Ulf Hansson 提交于
When the hierarchical CPU topology is used and when a CPU is put offline, that CPU prevents its PM domain from being powered off, which is because genpd observes the corresponding attached device as being active from a runtime PM point of view. Furthermore, any potential master PM domains are also prevented from being powered off. To address this limitation, let's add add a new CPU hotplug state (CPUHP_AP_CPU_PM_STARTING) and register up/down callbacks for it, which allows us to deal with runtime PM accordingly. Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Acked-by: NRafael J. Wysocki <rafael@kernel.org>
-
- 11 12月, 2019 1 次提交
-
-
由 Daniel Jordan 提交于
Configuring an instance's parallel mask without any online CPUs... echo 2 > /sys/kernel/pcrypt/pencrypt/parallel_cpumask echo 0 > /sys/devices/system/cpu/cpu1/online ...makes tcrypt mode=215 crash like this: divide error: 0000 [#1] SMP PTI CPU: 4 PID: 283 Comm: modprobe Not tainted 5.4.0-rc8-padata-doc-v2+ #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191013_105130-anatol 04/01/2014 RIP: 0010:padata_do_parallel+0x114/0x300 Call Trace: pcrypt_aead_encrypt+0xc0/0xd0 [pcrypt] crypto_aead_encrypt+0x1f/0x30 do_mult_aead_op+0x4e/0xdf [tcrypt] test_mb_aead_speed.constprop.0.cold+0x226/0x564 [tcrypt] do_test+0x28c2/0x4d49 [tcrypt] tcrypt_mod_init+0x55/0x1000 [tcrypt] ... cpumask_weight() in padata_cpu_hash() returns 0 because the mask has no CPUs. The problem is __padata_remove_cpu() checks for valid masks too early and so doesn't mark the instance PADATA_INVALID as expected, which would have made padata_do_parallel() return error before doing the division. Fix by introducing a second padata CPU hotplug state before CPUHP_BRINGUP_CPU so that __padata_remove_cpu() sees the online mask without @cpu. No need for the second argument to padata_replace() since @cpu is now already missing from the online mask. Fixes: 33e54450 ("padata: Handle empty padata cpumasks") Signed-off-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 15 11月, 2019 1 次提交
-
-
由 Michael Kelley 提交于
Hyper-V has historically initialized stimer-based clockevents late in the process of onlining a CPU because clockevents depend on stimer interrupts. In the original Hyper-V design, stimer interrupts generate a VMbus message, so the VMbus machinery must be running first, and VMbus can't be initialized until relatively late. On x86/64, LAPIC timer based clockevents are used during early initialization before VMbus and stimer-based clockevents are ready, and again during CPU offlining after the stimer clockevents have been shut down. Unfortunately, this design creates problems when offlining CPUs for hibernation or other purposes. stimer-based clockevents are shut down relatively early in the offlining process, so clockevents_unbind_device() must be used to fallback to the LAPIC-based clockevents for the remainder of the offlining process. Furthermore, the late initialization and early shutdown of stimer-based clockevents doesn't work well on ARM64 since there is no other timer like the LAPIC to fallback to. So CPU onlining and offlining doesn't work properly. Fix this by recognizing that stimer Direct Mode is the normal path for newer versions of Hyper-V on x86/64, and the only path on other architectures. With stimer Direct Mode, stimer interrupts don't require any VMbus machinery. stimer clockevents can be initialized and shut down consistent with how it is done for other clockevent devices. While the old VMbus-based stimer interrupts must still be supported for backward compatibility on x86, that mode of operation can be treated as legacy. So add a new Hyper-V stimer entry in the CPU hotplug state list, and use that new state when in Direct Mode. Update the Hyper-V clocksource driver to allocate and initialize stimer clockevents earlier during boot. Update Hyper-V initialization and the VMbus driver to use this new design. As a result, the LAPIC timer is no longer used during boot or CPU onlining/offlining and clockevents_unbind_device() is not called. But retain the old design as a legacy implementation for older versions of Hyper-V that don't support Direct Mode. Signed-off-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NDexuan Cui <decui@microsoft.com> Reviewed-by: NDexuan Cui <decui@microsoft.com> Link: https://lkml.kernel.org/r/1573607467-9456-1-git-send-email-mikelley@microsoft.com
-
- 22 10月, 2019 1 次提交
-
-
由 Steven Price 提交于
Enable paravirtualization features when running under a hypervisor supporting the PV_TIME_ST hypercall. For each (v)CPU, we ask the hypervisor for the location of a shared page which the hypervisor will use to report stolen time to us. We set pv_time_ops to the stolen time function which simply reads the stolen value from the shared page for a VCPU. We guarantee single-copy atomicity using READ_ONCE which means we can also read the stolen time for another VCPU than the currently running one while it is potentially being updated by the hypervisor. Signed-off-by: NSteven Price <steven.price@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 04 7月, 2019 1 次提交
-
-
由 James Morse 提交于
The cacheinfo structures are alloced/freed by cpu online/offline callbacks. Originally these were only used by sysfs to expose the cache topology to user space. Without any in-kernel dependencies CPUHP_AP_ONLINE_DYN was an appropriate choice. resctrl has started using these structures to identify CPUs that share a cache. It updates its 'domain' structures from cpu online/offline callbacks. These depend on the cacheinfo structures (resctrl_online_cpu()->domain_add_cpu()->get_cache_id()-> get_cpu_cacheinfo()). These also run as CPUHP_AP_ONLINE_DYN. Now that there is an in-kernel dependency, move the cacheinfo work earlier so we know its done before resctrl's CPUHP_AP_ONLINE_DYN work runs. Fixes: 2264d9c7 ("x86/intel_rdt: Build structures for each resource based on cache topology") Cc: <stable@vger.kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: NJames Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20190624173656.202407-1-james.morse@arm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 6月, 2019 1 次提交
-
-
由 Marek Szyprowski 提交于
Exynos SoCs based on CA7/CA15 have 2 timer interfaces: custom Exynos MCT (Multi Core Timer) and standard ARM Architected Timers. There are use cases, where both timer interfaces are used simultanously. One of such examples is using Exynos MCT for the main system timer and ARM Architected Timers for the KVM and virtualized guests (KVM requires arch timers). Exynos Multi-Core Timer driver (exynos_mct) must be however started before ARM Architected Timers (arch_timer), because they both share some common hardware blocks (global system counter) and turning on MCT is needed to get ARM Architected Timer working properly. To ensure selecting Exynos MCT as the main system timer, increase MCT timer rating. To ensure proper starting order of both timers during suspend/resume cycle, increase MCT hotplug priority over ARM Archictected Timers. Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: NKrzysztof Kozlowski <krzk@kernel.org> Reviewed-by: NChanwoo Choi <cw00.choi@samsung.com> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
-
- 15 6月, 2019 1 次提交
-
-
由 Borislav Petkov 提交于
Adric Blake reported the following warning during suspend-resume: Enabling non-boot CPUs ... x86: Booting SMP configuration: smpboot: Booting Node 0 Processor 1 APIC 0x2 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \ at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20) Call Trace: intel_set_tfa intel_pmu_cpu_starting ? x86_pmu_dead_cpu x86_pmu_starting_cpu cpuhp_invoke_callback ? _raw_spin_lock_irqsave notify_cpu_starting start_secondary secondary_startup_64 microcode: sig=0x806ea, pf=0x80, revision=0x96 microcode: updated to revision 0xb4, date = 2019-04-01 CPU1 is up The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated by microcode. The log above shows that the microcode loader callback happens after the PMU restoration, leading to the conjecture that because the microcode hasn't been updated yet, that MSR is not present yet, leading to the #GP. Add a microcode loader-specific hotplug vector which comes before the PERF vectors and thus executes earlier and makes sure the MSR is present. Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort") Reported-by: NAdric Blake <promarbler14@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: x86@kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637
-
- 03 5月, 2019 1 次提交
-
-
由 Anju T Sudhakar 提交于
Patch detects trace-imc events, does memory initilizations for each online cpu, and registers cpuhotplug call-backs. Signed-off-by: NAnju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 4月, 2019 1 次提交
-
-
由 Rafael J. Wysocki 提交于
The current handling of MSR_IA32_ENERGY_PERF_BIAS in the kernel is problematic, because it may cause changes made by user space to that MSR (with the help of the x86_energy_perf_policy tool, for example) to be lost every time a CPU goes offline and then back online as well as during system-wide power management transitions into sleep states and back into the working state. The first problem is that if the current EPB value for a CPU going online is 0 ('performance'), the kernel will change it to 6 ('normal') regardless of whether or not this is the first bring-up of that CPU. That also happens during system-wide resume from sleep states (including, but not limited to, hibernation). However, the EPB may have been adjusted by user space this way and the kernel should not blindly override that setting. The second problem is that if the platform firmware resets the EPB values for any CPUs during system-wide resume from a sleep state, the kernel will not restore their previous EPB values that may have been set by user space before the preceding system-wide suspend transition. Again, that behavior may at least be confusing from the user space perspective. In order to address these issues, rework the handling of MSR_IA32_ENERGY_PERF_BIAS so that the EPB value is saved on CPU offline and restored on CPU online as well as (for the boot CPU) during the syscore stages of system-wide suspend and resume transitions, respectively. However, retain the policy by which the EPB is set to 6 ('normal') on the first bring-up of each CPU if its initial value is 0, based on the observation that 0 may mean 'not initialized' just as well as 'performance' in that case. While at it, move the MSR_IA32_ENERGY_PERF_BIAS handling code into a separate file and document it in Documentation/admin-guide. Fixes: abe48b10 (x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS) Fixes: b51ef52d (x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume) Reported-by: NThomas Renninger <trenn@suse.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NHannes Reinecke <hare@suse.com> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: NThomas Gleixner <tglx@linutronix.de>
-
- 23 2月, 2019 1 次提交
-
-
由 Joseph Lo 提交于
Add support for the Tegra210 timer that runs at oscillator clock (TMR10-TMR13). We need these timers to work as clock event device and to replace the ARMv8 architected timer due to it can't survive across the power cycle of the CPU core or CPUPORESET signal. So it can't be a wake-up source when CPU suspends in power down state. Also convert the original driver to use timer-of API. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NJoseph Lo <josephl@nvidia.com> Acked-by: NThierry Reding <treding@nvidia.com> Acked-by: NJon Hunter <jonathanh@nvidia.com> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
-
- 06 12月, 2018 1 次提交
-
-
由 Kulkarni, Ganapatrao 提交于
This patch adds a perf driver for the PMU UNCORE devices DDR4 Memory Controller(DMC) and Level 3 Cache(L3C). Each PMU supports up to 4 counters. All counters lack overflow interrupt and are sampled periodically. Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NGanapatrao Kulkarni <ganapatrao.kulkarni@cavium.com> [will: consistent enum cpuhp_state naming] Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 22 11月, 2018 1 次提交
-
-
由 Hoan Tran 提交于
If the CPU assigned to the xgene PMU is taken offline, then subsequent perf invocations on the PMU will fail: # echo 0 > /sys/devices/system/cpu/cpu0/online # perf stat -a -e l3c0/cycle-count/,l3c0/write/ sleep 1 Error: The sys_perf_event_open() syscall returned with 19 (No such device) for event (l3c0/cycle-count/). /bin/dmesg may provide additional information. No CONFIG_PERF_EVENTS=y kernel support configured? This patch implements a hotplug notifier in the xgene PMU driver so that the PMU context is migrated to another online CPU should its assigned CPU disappear. Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NHoan Tran <hoan.tran@amperecomputing.com> [will: Made naming of new cpuhp_state enum entry consistent] Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 03 11月, 2018 1 次提交
-
-
由 Guo Ren 提交于
The driver is for C-SKY SMP timer. It only supports oneshot event and 32bit overflow for clocksource. Per cpu core has one timer and all timers share one clock-counter-input from the same clocksource. This use mfcr&mtcr instructions to access the regs. Signed-off-by: NGuo Ren <ren_guo@c-sky.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
-
- 13 8月, 2018 1 次提交
-
-
由 Palmer Dabbelt 提交于
The RISC-V ISA defines a per-hart real-time clock and timer, which is present on all systems. The clock is accessed via the 'rdtime' pseudo-instruction (which reads a CSR), and the timer is set via an SBI call. Contains various improvements from Atish Patra <atish.patra@wdc.com>. Signed-off-by: NDmitriy Cherkasov <dmitriy@oss-tech.org> Signed-off-by: NPalmer Dabbelt <palmer@dabbelt.com> [hch: remove dead code, add SPDX tags, used riscv_of_processor_hart(), minor cleanups, merged hotplug cpu support and other improvements from Atish] Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAtish Patra <atish.patra@wdc.com> Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
-
- 03 7月, 2018 1 次提交
-
-
由 Peter Zijlstra 提交于
Oleg suggested to replace the "watchdog/%u" threads with cpu_stop_work. That removes one thread per CPU while at the same time fixes softlockup vs SCHED_DEADLINE. But more importantly, it does away with the single smpboot_update_cpumask_percpu_thread() user, which allows cleanups/shrinkage of the smpboot interface. Suggested-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 6月, 2018 1 次提交
-
-
由 Lucas Stach 提交于
The current call site in boot_secondary is causing sleep in invalid context warnings, as this part of the code is running with interrrupts disabled and some of the calls into the clock framework might sleep on a mutex. Convert the secondary CPU clock sync to a hotplug state, which allows to call it from a sleepable context. Signed-off-by: NLucas Stach <l.stach@pengutronix.de> Signed-off-by: NGregory CLEMENT <gregory.clement@bootlin.com>
-
- 16 3月, 2018 1 次提交
-
-
由 Arnd Bergmann 提交于
The Analog Devices Blackfin port was added in 2007 and was rather active for a while, but all work on it has come to a standstill over time, as Analog have changed their product line-up. Aaron Wu confirmed that the architecture port is no longer relevant, and multiple people suggested removing blackfin independently because of some of its oddities like a non-working SMP port, and the amount of duplication between the chip variants, which cause extra work when doing cross-architecture changes. Link: https://docs.blackfin.uclinux.org/Acked-by: NAaron Wu <Aaron.Wu@analog.com> Acked-by: NBryan Wu <cooloney@gmail.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Mike Frysinger <vapier@chromium.org> Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
- 23 2月, 2018 1 次提交
-
-
由 James Hogan 提交于
Now that arch/metag/ has been removed, remove the metag generic per-thread timer driver. It is of no value without the architecture code. Signed-off-by: NJames Hogan <jhogan@kernel.org> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-metag@vger.kernel.org
-