- 05 Jun, 2018: 1 commit

Committed by Colin Ian King
Trivial fix to spelling mistake in hmi_error_types text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 04 Jun, 2018: 1 commit

Committed by Haren Myneni
NX can set the 3rd bit in the CR register for XER[SO] (Summary Overflow), which is not related to the paste request. The current paste function returns failure for a successful request when this bit is set, so mask this bit and check the proper return status.

Fixes: 2392c8c8 ("powerpc/powernv/vas: Define copy/paste interfaces")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
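The masking can be sketched in userspace C, for illustration only. Several details here are our assumptions rather than anything taken from the patch: the `sketch_*` names, the `0xE` mask, and the idea that a successful paste leaves only CR0[EQ] (0x2) set.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not the kernel's copy/paste helpers.
 * Assumption: CR0 is the top 4 bits of the 32-bit condition register,
 * laid out as LT (0x8), GT (0x4), EQ (0x2), SO (0x1).
 */
#define SKETCH_CR0_SHIFT 28
#define SKETCH_CR0_EQ    0x2u /* assumed "paste accepted" indication */
#define SKETCH_CR0_MASK  0xEu /* keep LT/GT/EQ, drop the XER[SO] copy */

/* Extract the CR0 field with the SO bit masked off. */
static unsigned int sketch_cr0(uint32_t cr)
{
	return (cr >> SKETCH_CR0_SHIFT) & SKETCH_CR0_MASK;
}

/* 0 on success, -EINVAL otherwise, regardless of the SO bit. */
static int sketch_paste_status(uint32_t cr)
{
	return sketch_cr0(cr) == SKETCH_CR0_EQ ? 0 : -EINVAL;
}
```

With the mask, a CR0 value of EQ|SO (0x3) is still reported as success. With an unmasked 0xF field, the same value would compare unequal to 0x2, and the request would be reported as failed.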

- 03 Jun, 2018: 8 commits

Committed by Michal Suchanek
Check what firmware told us and enable/disable the barrier_nospec as appropriate. We err on the side of enabling the barrier, as it's a no-op on older systems; see the comment for more detail.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Anju T Sudhakar
Since thread-imc internally uses the core-imc hardware infrastructure and depends on it, having thread-imc in the kernel in the absence of core-imc serves no purpose. This patch disables thread-imc if core-imc is not registered.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Anju T Sudhakar
When any of the IMC (In-Memory Collection counter) devices fails to initialize, imc_common_mem_free() frees a set of memory. In doing so, the pmu_ptr pointer is also freed, but pmu_ptr is then used in a subsequent function (imc_common_cpuhp_mem_free()), which is wrong. This patch reorders the code to avoid such access. It also frees the memory that is dynamically allocated during IMC initialization, wherever required.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Arnd Bergmann
Looking through the remaining users of the deprecated mktime() function, I found the powerpc RTC handlers, which use it in place of rtc_tm_to_time64(). To clean this up, I'm changing the read_persistent_clock() function over to the read_persistent_clock64() variant, and changing all the platform-specific handlers along with it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Alastair D'Silva
The function removes the process element from the NPU cache.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin
Using irq_work for processing OPAL event interrupts is not necessary. irq_work is typically used to schedule work from NMI context; a softirq may be more appropriate. However, OPAL events are not particularly performance or latency critical, so they can all be invoked by kopald. This patch removes the irq_work queueing and instead wakes up kopald when there is an event to be processed. kopald processes interrupts individually, enabling irqs and calling cond_resched between each one to minimise latencies. Event handlers themselves should still use threaded handlers, workqueues, etc. as necessary to avoid high interrupts-off latencies within any single interrupt.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin
Although it is often possible to recover a CPU that was interrupted from OPAL with a system reset NMI, it's undesirable to interrupt CPUs in OPAL, for a few reasons. Firstly, the dump/debug code itself needs to call firmware, so it could hang on a lock or possibly corrupt a per-cpu data structure if it or another CPU was interrupted from OPAL. Secondly, the kexec crash dump code will not return from interrupt to unwind the OPAL call. To avoid this problem in normal conditions, call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to another CPU, which waits for that CPU to leave firmware (or times out). Firmware bugs may still result in a timeout and interrupting OPAL, but that is the best option (it stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Alexey Kardashevskiy
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free set to pnv_ioda2_table_free(), which calls pnv_pci_ioda2_table_free_pages(). Since iommu_tce_table_put() calls it_ops::free when the last reference to the table is released, the explicit call to pnv_pci_ioda2_table_free_pages() is not needed, so let's remove it. This should fix a double free in the case of PCI hotplug, as pnv_pci_ioda2_table_free_pages() resets neither iommu_table::it_base nor ::it_size. This was not exposed by SRIOV, as it uses a different code path via pnv_pcibios_sriov_disable(). IODA1 does not initialize it_ops::free, so it does not have this issue.

Fixes: c5f7700b ("powerpc/powernv: Dynamically release PE")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 28 May, 2018: 1 commit

Committed by Akshay Adiga
Init all present CPUs for deep states instead of all possible CPUs. Init fails if a possible CPU is guarded, which results in only non-deep states being available for cpuidle/hotplug. As Stewart says, this means that for single-threaded workloads, if you guard out a CPU core you'll not get WoF (Workload Optimised Frequency), which means that performance goes down when you wouldn't expect it to.

Fixes: 77b54e9f ("powernv/powerpc: Add winkle support for offline cpus")
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 24 May, 2018: 1 commit

Committed by Simon Guo
This patch adds some macros for CR0/TEXASR bits so that the PR KVM TM logic (tbegin./treclaim./tabort.) can make use of them later.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 22 May, 2018: 1 commit

Committed by Nicholas Piggin
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by preventing store forwarding between privilege domains, by inserting a barrier in kernel entry and exit paths. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must generally be inserted before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege; both HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilege levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 21 May, 2018: 1 commit

Committed by Shilpasri G Bhat
This patch adds support for reading 64-bit sensor values. This method is used to read energy sensors and counters, which are of type u64.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 18 May, 2018: 2 commits

Committed by Michael Ellerman
This allows us to squash some sparse warnings and also avoids having to do explicit endian conversions in the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>

Committed by Michael Ellerman
This allows us to squash some sparse warnings and also avoids having to do explicit endian conversions in the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>

- 17 May, 2018: 2 commits

Committed by Nicholas Piggin
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special-case the delay to avoid sleeping in an invalid context.

Fixes: 3b807033 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops")
Cc: stable@vger.kernel.org # v3.2
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin
A kernel crash in process context that calls emergency_restart from panic will end up calling opal_event_shutdown with interrupts disabled but not in interrupt context. This causes a sleeping function to be called, which gives the following warning with sysrq+c:

  Rebooting in 10 seconds..
  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:238
  in_atomic(): 0, irqs_disabled(): 1, pid: 7669, name: bash
  CPU: 20 PID: 7669 Comm: bash Tainted: G D W 4.17.0-rc5+ #3
  Call Trace:
  dump_stack+0xb0/0xf4 (unreliable)
  ___might_sleep+0x174/0x1a0
  mutex_lock+0x38/0xb0
  __free_irq+0x68/0x460
  free_irq+0x70/0xc0
  opal_event_shutdown+0xb4/0xf0
  opal_shutdown+0x24/0xa0
  pnv_shutdown+0x28/0x40
  machine_shutdown+0x44/0x60
  machine_restart+0x28/0x80
  emergency_restart+0x30/0x50
  panic+0x2a0/0x328
  oops_end+0x1ec/0x1f0
  bad_page_fault+0xe8/0x154
  handle_page_fault+0x34/0x38
  --- interrupt: 300 at sysrq_handle_crash+0x44/0x60
      LR = __handle_sysrq+0xfc/0x260
  flag_spec.62335+0x12b844/0x1e8db4 (unreliable)
  __handle_sysrq+0xfc/0x260
  write_sysrq_trigger+0xa8/0xb0
  proc_reg_write+0xac/0x110
  __vfs_write+0x6c/0x240
  vfs_write+0xd0/0x240
  ksys_write+0x6c/0x110

Fixes: 9f0fd049 ("powerpc/powernv: Add a virtual irqchip for opal events")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 14 May, 2018: 2 commits

Committed by Alexey Kardashevskiy
At the moment we assume that IODA2 and newer PHBs can always do 4K/64K/16M IOMMU pages. However, this is not the case for POWER9, and skiboot now advertises the supported sizes via the device tree, so we use that instead of hard coding the mask.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Michael Ellerman
Currently memtrace doesn't build if NUMA=n:

  In function ‘memtrace_alloc_node’:
  arch/powerpc/platforms/powernv/memtrace.c:134:6: error: the address of ‘contig_page_data’ will always evaluate as ‘true’
    if (!NODE_DATA(nid) || !node_spanned_pages(nid))
        ^

This is because for NUMA=n, NODE_DATA(nid) points to an always-allocated structure, contig_page_data. But even in the NUMA=y case, memtrace_alloc_node() is only called for online nodes, and we should always have a NODE_DATA() allocated for an online node. So remove the (hopefully) overly paranoid check, which also means we can build when NUMA=n.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 07 May, 2018: 1 commit

Committed by Balbir Singh
This commit was a stop-gap to prevent crashes on hotunplug, caused by the mismatch between the 1G mappings used for the linear mapping and the memory block size. Those issues are now resolved because we split the linear mapping at hotunplug time if necessary, as implemented in commit 4dd5f8a9 ("powerpc/mm/radix: Split linear mapping on hot-unplug").

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Tested-by: Rashmica Gupta <rashmica.g@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 25 Apr, 2018: 1 commit

Committed by Nicholas Piggin
The OPAL RTC driver does not sleep when it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies; up to 50 seconds have been observed here when the RTC stops responding (a BMC reboot can do it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps.

Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 24 Apr, 2018: 4 commits

Committed by Alistair Popple
The NPU has a limited number of address translation shootdown (ATSD) registers, and the GPU has limited bandwidth to process ATSDs. This can result in contention of ATSD registers, leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than to individually flush each address in the range. This patch will result in ranges greater than 2MB being converted from 32+ ATSDs into a single ATSD, which will flush the TLB for the given PID on each GPU.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
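Here is a rough sketch of the heuristic for a single GPU. It assumes 64K pages, so that a 2MB range corresponds to the 32 ATSDs mentioned in the message. The names and constants are illustrative, not the driver's.

```c
#include <stdbool.h>

/* Hypothetical constants; 64K pages are an assumption of this sketch. */
#define SKETCH_PAGE_SIZE      (64UL * 1024)
#define SKETCH_ATSD_THRESHOLD (2UL * 1024 * 1024)

/* True if one whole-PID flush should replace per-address ATSDs. */
static bool sketch_use_pid_flush(unsigned long start, unsigned long end)
{
	return end - start > SKETCH_ATSD_THRESHOLD;
}

/* Number of ATSD register writes needed to invalidate [start, end). */
static unsigned long sketch_atsd_count(unsigned long start, unsigned long end)
{
	unsigned long addr, n = 0;

	if (sketch_use_pid_flush(start, end))
		return 1; /* a single ATSD flushes the TLB for the whole PID */

	for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE)
		n++; /* one ATSD per page in the range */
	return n;
}
```

A range of exactly 2MB still takes 32 per-page ATSDs. Anything larger collapses to one PID flush.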

Committed by Alistair Popple
There is a single NPU context per set of callback parameters. Callers should be prevented from overwriting existing callback values, so instead return an error if different parameters are passed.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
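A simplified sketch of the rule follows. A hypothetical context structure with a plain reference count stands in for the real NPU context. No locking is modeled.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of a shared NPU context. */
struct sketch_npu_context {
	void (*release_cb)(void *priv);
	void *priv;
	int refcount;
};

/* Example callback used by callers of the sketch. */
static void sketch_noop_release(void *priv)
{
	(void)priv;
}

/*
 * Attach a caller. Later callers must pass the same callback and
 * private data; otherwise return -EINVAL instead of silently
 * overwriting what the first caller registered.
 */
static int sketch_npu_context_get(struct sketch_npu_context *ctx,
				  void (*cb)(void *), void *priv)
{
	if (ctx->refcount > 0) {
		if (ctx->release_cb != cb || ctx->priv != priv)
			return -EINVAL;
		ctx->refcount++;
		return 0;
	}
	ctx->release_cb = cb;
	ctx->priv = priv;
	ctx->refcount = 1;
	return 0;
}
```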

Committed by Alistair Popple
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe, as it is protected by the requirement that mmap_sem be held in write mode. However, pnv_npu2_destroy_context() does not require mmap_sem to be held, and it is not safe to call it concurrently with an initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However, the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As NPU context creation/destruction is not a performance-critical path and the critical section is not large, a single spinlock is used for simplicity.

Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com>
Tested-by: Mark Hairgrove <mhairgrove@nvidia.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Balbir Singh
Don't do this via custom code. Instead, now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size.

Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
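To see why the stride matters, the sketch below counts how many cache lines a flush loop touches for a given stride. The function name is hypothetical. The 32K L1D size and 128-byte line size in the note after it are assumptions chosen for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical model: count the flush operations a loop issues when it
 * walks [0, range) with the given stride. A correct flush must visit
 * every cache line, so the stride has to be the L1D line size.
 */
static size_t sketch_lines_flushed(size_t range, size_t stride)
{
	size_t off, n = 0;

	for (off = 0; off < range; off += stride)
		n++; /* one per-line flush per iteration */
	return n;
}
```

Take a 64K range. A 128-byte stride touches 512 lines. A 32K stride (the cache size) touches only 2, which leaves almost every line unflushed.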

- 11 Apr, 2018: 1 commit

Committed by Nicholas Piggin
The OPAL NVRAM driver does not sleep when it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies and various lockup errors to trigger (again, a BMC reboot can cause it). Fix this by converting it to the standard form OPAL_BUSY loop that sleeps.

Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Depends-on: 34dd25de ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
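The sleeping retry loop can be sketched roughly as a userspace stand-in. Several parts are assumptions of this sketch:

- The firmware call is a stub.
- The return-code values are placeholders.
- `nanosleep()` replaces the kernel's sleep.
- The 10 ms delay stands in for the standard delay named in the Depends-on commit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Placeholder return codes; the real values come from the OPAL API headers. */
#define SK_OPAL_SUCCESS     0
#define SK_OPAL_BUSY        (-2)
#define SK_OPAL_BUSY_EVENT  (-12)
#define SK_BUSY_DELAY_MS    10 /* assumed standard retry delay */

/* Hypothetical firmware stub: reports busy for the first few calls. */
static int sk_busy_calls_left = 3;
static int sk_events_polled;

static int sk_firmware_call(void)
{
	if (sk_busy_calls_left > 0) {
		sk_busy_calls_left--;
		return (sk_busy_calls_left % 2) ? SK_OPAL_BUSY_EVENT : SK_OPAL_BUSY;
	}
	return SK_OPAL_SUCCESS;
}

static void sk_poll_events(void)
{
	sk_events_polled++; /* stand-in for letting firmware process events */
}

static void sk_msleep(unsigned int ms)
{
	struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Retry while firmware is busy, sleeping between attempts instead of spinning. */
static int sk_call_with_retry(void)
{
	int rc = SK_OPAL_BUSY;

	while (rc == SK_OPAL_BUSY || rc == SK_OPAL_BUSY_EVENT) {
		rc = sk_firmware_call();
		if (rc == SK_OPAL_BUSY_EVENT) {
			sk_msleep(SK_BUSY_DELAY_MS);
			sk_poll_events();
		} else if (rc == SK_OPAL_BUSY) {
			sk_msleep(SK_BUSY_DELAY_MS);
		}
	}
	return rc;
}
```

The design point is that each busy result yields the CPU for a fixed delay rather than retrying immediately. This bounds the scheduling latency the driver can cause while the firmware is unresponsive.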

- 07 Apr, 2018: 1 commit

Committed by Oliver O'Halloran
Scan the device tree for nodes compatible with nvdimm-bus, and create a platform device for each of them.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

- 03 Apr, 2018: 2 commits

Committed by Nicholas Piggin
Currently powernv reboot and shutdown requests just leave secondaries to do their own thing. This is undesirable because they can trigger any number of watchdogs while waiting for reboot. Also, we don't know what else they might be doing: they might be causing trouble, trampling memory, etc. The OPAL scheduled flash update code already ran into watchdog problems due to flashing taking a long time. That was fixed with 2196c6f1 ("powerpc/powernv: Return secondary CPUs to firmware before FW update"), which returns secondaries to OPAL. It's been found that regular reboots can take over 10 seconds, which can result in the hard lockup watchdog firing:

  reboot: Restarting system
  [ 360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

This patch removes the special case for flash update, and calls smp_send_stop in all cases before calling reboot/shutdown. smp_send_stop could return CPUs to OPAL. The main reason not to is that the request could come from an NMI that interrupts OPAL code, so re-entry to OPAL can cause a number of problems. Putting secondaries into simple spin loops improves the chances of a successful reboot.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Nicholas Piggin
The PSSCR value is not stored to PACA_REQ_PSSCR if the CPU does not have the XER[SO] bug. Fix this by storing it up front, outside the workaround code. The initial test is not required because this is a slow path. The workaround is made to depend on CONFIG_KVM_BOOK3S_HV_POSSIBLE, to match pnv_power9_force_smt4_catch() where it is used. Drop the comment on pnv_power9_force_smt4_catch() as it's no longer true.

Fixes: 7672691a ("powerpc/powernv: Provide a way to force a core into SMT4 mode")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 31 Mar, 2018: 1 commit

Committed by Nicholas Piggin
Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the CPU hotplug code. Move the KVM secondary state manipulation code to the offline case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

- 30 Mar, 2018: 2 commits

Committed by Nicholas Piggin
opal_nvram_write currently just assumes success if it encounters an error other than OPAL_BUSY or OPAL_BUSY_EVENT. Have it return -EIO on other errors instead.

Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
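Here is a minimal sketch of the error handling once the busy-retry loop has finished. It uses placeholder return codes and a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder codes for illustration; the real set lives in the OPAL API headers. */
#define SK_OPAL_SUCCESS   0
#define SK_OPAL_HARDWARE  (-6)

/*
 * Hypothetical tail of an NVRAM write: any final status other than
 * success is reported as -EIO instead of being treated as a
 * successful write of `count` bytes.
 */
static long sk_nvram_write_result(int rc, size_t count)
{
	if (rc != SK_OPAL_SUCCESS)
		return -EIO;
	return (long)count;
}
```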

Committed by Nicholas Piggin
Change the paca array into an array of pointers to pacas, and allocate pacas individually. This allows flexibility in where the PACAs are allocated; future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot, rather than allocating all possible ones up front and then freeing the unused ones. This adds slightly more overhead (one additional indirection) for cross-CPU paca references, but those aren't too common.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
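Below is a toy model of the pointer-array layout. A hypothetical one-field `sk_paca` stands in for the real structure. Only present CPUs get a paca up front; the remaining slots stay NULL and could be filled later.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for the real paca. */
struct sk_paca {
	int cpu;
};

/* One pointer slot per possible CPU; each paca is a separate allocation. */
static struct sk_paca **sk_paca_ptrs;

static void sk_free_pacas(int nr_possible)
{
	int cpu;

	if (!sk_paca_ptrs)
		return;
	for (cpu = 0; cpu < nr_possible; cpu++)
		free(sk_paca_ptrs[cpu]); /* free(NULL) is a no-op */
	free(sk_paca_ptrs);
	sk_paca_ptrs = NULL;
}

/* Allocate pacas for the first nr_present CPUs (nr_present <= nr_possible). */
static int sk_allocate_pacas(int nr_possible, int nr_present)
{
	int cpu;

	sk_paca_ptrs = calloc((size_t)nr_possible, sizeof(*sk_paca_ptrs));
	if (!sk_paca_ptrs)
		return -1;

	for (cpu = 0; cpu < nr_present; cpu++) {
		sk_paca_ptrs[cpu] = malloc(sizeof(*sk_paca_ptrs[cpu]));
		if (!sk_paca_ptrs[cpu]) {
			sk_free_pacas(nr_possible);
			return -1;
		}
		sk_paca_ptrs[cpu]->cpu = cpu;
	}
	return 0;
}
```

In this model, a cross-CPU access becomes `sk_paca_ptrs[cpu]->cpu`. That is one pointer load more than indexing a flat array of structures.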

- 27 Mar, 2018: 4 commits

Committed by Sam Bobroff
Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
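Here is a sketch of such a helper, with hypothetical flag names and values.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
#define SK_STATE_MMIO_ACTIVE (1 << 0)
#define SK_STATE_DMA_ACTIVE  (1 << 1)

/*
 * "Fully active" means both bits are set. Open-coding this as
 * (state & (MMIO | DMA)) would be wrong, because that expression is
 * non-zero when only one of the two bits is set.
 */
static bool sk_state_active(int state)
{
	return (state & SK_STATE_MMIO_ACTIVE) && (state & SK_STATE_DMA_ACTIVE);
}
```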

Committed by Alexey Kardashevskiy
GPUs and the corresponding NVLink bridges get different PEs, as they have separate translation validation entries (TVEs). We put these PEs into the same IOMMU group so they cannot be passed through separately. So the iommu_table_group_ops::set_window/unset_window for GPUs set tables to the NPU PEs as well. This means that the iommu_table's list of attached PEs (iommu_table_group_link) has both GPU and NPU PEs linked. This list is used for TCE cache invalidation.

The problem is that an NPU PE has just a single TVE and can be programmed to point to 32bit or 64bit windows, while a GPU PE has two (as any other PCI device has). So we end up having a 32bit iommu_table struct linked to both PEs, even though only the 64bit TCE table cache can be invalidated on NPU. A relatively recent skiboot detects this and prints errors.

This changes the GPU's iommu_table_group_ops::set_window/unset_window to make sure that the NPU PE is only linked to the table actually used by the hardware. If there are two tables used by an IOMMU group, the NPU PE will use the last programmed one. With the current use scenarios, that is expected to be a 64bit one.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Michael Ellerman
Now that we have the security flags, we can significantly simplify the code in pnv_setup_rfi_flush(), because we can use the flags instead of checking device tree properties, and because the security flags have pessimistic defaults.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Committed by Michael Ellerman
Now that we have feature flags for security-related things, set or clear them based on what we see in the device tree provided by firmware.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
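Here is a sketch of the "pessimistic defaults" idea, with hypothetical flag names loosely modeled on these two messages. Every flag starts set, firmware information may only clear flags, and the flush decision reads nothing but the flags. The exact combination in `sk_rfi_flush_enabled()` is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration. */
#define SK_FTR_FAVOUR_SECURITY (1u << 0)
#define SK_FTR_L1D_FLUSH_PR    (1u << 1)
#define SK_FTR_L1D_FLUSH_HV    (1u << 2)

/* Pessimistic default: assume every mitigation is needed. */
static uint32_t sk_security_flags = SK_FTR_FAVOUR_SECURITY |
				    SK_FTR_L1D_FLUSH_PR |
				    SK_FTR_L1D_FLUSH_HV;

/* Called when firmware (e.g. a device tree property) says a flag is not needed. */
static void sk_clear_flag(uint32_t flag)
{
	sk_security_flags &= ~flag;
}

static bool sk_flag(uint32_t flag)
{
	return (sk_security_flags & flag) != 0;
}

/* Decide whether to enable the flush purely from the flags. */
static bool sk_rfi_flush_enabled(void)
{
	return sk_flag(SK_FTR_FAVOUR_SECURITY) &&
	       (sk_flag(SK_FTR_L1D_FLUSH_PR) || sk_flag(SK_FTR_L1D_FLUSH_HV));
}
```

Because the defaults are pessimistic, missing firmware information leaves the mitigation enabled rather than silently disabling it.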

- 23 Mar, 2018: 1 commit

Committed by Paul Mackerras
POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround, which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal.

This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue.

To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following:

1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently).
2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5.
3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2.
4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now.

This relies on the following properties:

- Once dont_stop is non-zero, requested_psscr can't go from zero to non-zero, except transiently (and without the thread doing stop).
- requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur.
- Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration.
- Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2.

This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state.

Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system.

In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after pnv_power9_force_smt4_catch() returns, until the pnv_power9_force_smt4_release() function is called. That function undoes the effect of step 1 above and allows the other threads to go into a stop state.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
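The wake-up condition from the list of properties can be modeled as a small counting check. This is a hypothetical userspace sketch of the arithmetic only. It does no synchronisation, sends no doorbells, and treats a zero requested_psscr as "awake", as the message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count threads whose requested_psscr is 0, i.e. awake in this model. */
static int sk_awake_threads(const uint64_t *requested_psscr, int threads_per_core)
{
	int t, awake = 0;

	for (t = 0; t < threads_per_core; t++)
		if (requested_psscr[t] == 0)
			awake++;
	return awake;
}

/*
 * SMT modes are powers of 2, so once more than half of the threads
 * (threads_per_core / 2 + 1, i.e. 3 of 4) are awake, the core cannot
 * be in SMT2 or ST mode and must be in SMT4.
 */
static bool sk_core_in_smt4(int awake, int threads_per_core)
{
	return awake >= threads_per_core / 2 + 1;
}
```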

- 20 Mar, 2018: 1 commit

Committed by Markus Elfring
It's slightly less error prone to use sizeof(*foo) rather than specifying the type.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[mpe: Consolidate into one patch, rewrite change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
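Here is a small illustration of the idiom, using a hypothetical structure.

```c
#include <stdlib.h>

/* Hypothetical structure used only to illustrate the idiom. */
struct sk_window {
	int id;
	char name[32];
};

static struct sk_window *sk_window_alloc(int id)
{
	/*
	 * sizeof(*w) follows the pointer's type automatically: if w's type
	 * changes later, this line stays correct, whereas
	 * sizeof(struct sk_window) would have to be updated by hand.
	 */
	struct sk_window *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	w->id = id;
	return w;
}
```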

- 14 Mar, 2018: 1 commit

Committed by Sukadev Bhattiprolu
Add a couple of trace points in the VAS driver.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Add SPDX tag to new header]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>