- 24 May 2018, 2 commits
-
-
Committed by Sandipan Das
This adds support for bpf-to-bpf function calls in the powerpc64 JIT compiler. The JIT compiler converts the bpf call instructions to native branch instructions. After a round of the usual passes, the start addresses of the JITed images for the callee functions are known. Finally, to fix up the branch target addresses, we need to perform an extra pass. Because of the address range in which JITed images are allocated on powerpc64, the offsets of the start addresses of these images from __bpf_call_base are as large as 64 bits. So, for a function call, we cannot use the imm field of the instruction to determine the callee's address. Instead, we use the alternative method of getting it from the list of function addresses in the auxiliary data of the caller, using the off field as an index. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
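A minimal sketch of that lookup (illustrative, not the exact JIT source): the callee's JITed address is read from the caller's sub-program list in its auxiliary data, indexed by the off field.

```c
#include <linux/bpf.h>
#include <linux/filter.h>

/* During the extra pass, resolve a bpf-to-bpf call target: insn->off
 * indexes the list of sub-programs attached to the caller's aux data. */
static u64 bpf_callee_addr(const struct bpf_prog *prog,
			   const struct bpf_insn *insn)
{
	return (u64)(unsigned long)prog->aux->func[insn->off]->bpf_func;
}
```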
-
Committed by Sandipan Das
For multi-function programs, loading the address of a callee function into a register requires emitting instructions whose count varies from one to five depending on the nature of the address. Since we come to know the callee's address only before the extra pass, the number of instructions required to load this address may vary from what was previously generated. This can make the JITed image grow or shrink. To avoid this, we should generate a constant five-instruction sequence when loading function addresses, by padding the optimized load sequence with NOPs. Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
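A hedged sketch of the padding, in the powerpc JIT's macro style (PPC_LI64, PPC_NOP, the codegen_context and b2p register map are taken as given; details may differ from the real source):

```c
/* Emit the optimized immediate load (one to five instructions), then
 * pad with nops so a function address always occupies five slots and
 * the image size stays stable across JIT passes. */
static void bpf_jit_emit_func_addr(u32 *image, struct codegen_context *ctx,
				   u64 func)
{
	unsigned int i, ctx_idx = ctx->idx;

	PPC_LI64(b2p[TMP_REG_2], func);		/* 1..5 instructions */

	for (i = ctx->idx - ctx_idx; i < 5; i++)
		PPC_NOP();
}
```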
-
- 22 May 2018, 1 commit
-
-
Committed by Nicholas Piggin
On some CPUs we can prevent a vulnerability related to store-to-load forwarding by inserting a barrier in kernel entry and exit paths, preventing store forwarding between privilege domains. This is known to be the case on at least Power7, Power8 and Power9 powerpc CPUs. Barriers must generally be inserted before the first load after moving to a higher privilege, and after the last store before moving to a lower privilege; both HV and PR privilege transitions must be protected. Barriers are added as patch sections, with all kernel/hypervisor entry points patched, and the exit points to lower privilege levels patched similarly to the RFI flush patching. Firmware advertisement is not implemented yet, so CPU flush types are hard coded. Thanks to Michal Suchánek for bug fixes and review. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michal Suchánek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
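A rough sketch of the boot-time patching, assuming powerpc's patch_instruction() helper and a fixup table of relative offsets to nop slots at the entry points; the section and function names below are illustrative:

```c
#include <asm/code-patching.h>

extern long __start___stf_entry_barrier_fixup[];
extern long __stop___stf_entry_barrier_fixup[];

/* Rewrite each patchable nop slot with the barrier sequence chosen
 * for the CPU (hard coded until firmware advertisement exists). */
void __init patch_stf_entry_barriers(const unsigned int *barrier, int nr)
{
	long *fixup;
	int i;

	for (fixup = __start___stf_entry_barrier_fixup;
	     fixup < __stop___stf_entry_barrier_fixup; fixup++) {
		/* each entry stores the slot address relative to itself */
		unsigned int *dest = (unsigned int *)((long)fixup + *fixup);

		for (i = 0; i < nr; i++)
			patch_instruction(dest + i, barrier[i]);
	}
}
```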
-
- 18 May 2018, 1 commit
-
-
Committed by Michael Neuling
Clear the PCR (Processor Compatibility Register) on boot to ensure we are not running in a compatibility mode. We've seen this cause problems when a crash (and kdump) occurs while running compat-mode guests. The kdump kernel then runs with the PCR set and causes problems. The symptom in the kdump kernel (also seen in petitboot after fast-reboot) is early userspace programs taking SIGILLs on newer instructions (seen in libc). Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: stable@vger.kernel.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
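A minimal sketch, assuming the usual SPR accessors; the actual change lives in the CPU setup path:

```c
#include <asm/reg.h>

/* Run early on boot CPUs so a kdump kernel never inherits a
 * compatibility mode programmed by the crashed kernel. */
static void __init reset_pcr(void)
{
	mtspr(SPRN_PCR, 0);	/* native mode, no compat */
}
```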
-
- 17 May 2018, 7 commits
-
-
Committed by Nicholas Piggin
Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special-case the delay to avoid sleeping in invalid context. Fixes: 3b807033 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
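A sketch of the special-cased delay inside the standard OPAL_BUSY loop; the 10ms delay constant is an assumption here:

```c
#define OPAL_BUSY_DELAY_MS	10

static ssize_t opal_nvram_write(char *buf, size_t count, loff_t *index)
{
	s64 rc = OPAL_BUSY;
	loff_t off = *index;

	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
		rc = opal_write_nvram(__pa(buf), count, off);
		if (rc == OPAL_BUSY_EVENT) {
			/* can't sleep on the crash path with irqs off */
			if (in_interrupt() || irqs_disabled())
				mdelay(OPAL_BUSY_DELAY_MS);
			else
				msleep(OPAL_BUSY_DELAY_MS);
			opal_poll_events(NULL);
		} else if (rc == OPAL_BUSY) {
			if (in_interrupt() || irqs_disabled())
				mdelay(OPAL_BUSY_DELAY_MS);
			else
				msleep(OPAL_BUSY_DELAY_MS);
		}
	}

	if (rc != OPAL_SUCCESS)
		return -EIO;

	*index += count;
	return count;
}
```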
-
Committed by Nicholas Piggin
This requires further changes to the linker script to KEEP some tables and to wildcard compiler-generated sections into the right place. This includes ppc32 modifications from Christophe Leroy. When compiling powernv_defconfig with this option, the resulting kernel is almost 400kB smaller (and still boots):
    text     data     bss       dec   filename
11827621  4810490  1341080  17979191  vmlinux
11752437  4598858  1338776  17690071  vmlinux.dcde
Mathieu's numbers for a custom Mac Mini G4 config show almost 200kB saved, though it also had some increase in vmlinux size for as-yet unknown reasons:
    text     data     bss       dec   filename
 7461457  2475122  1428064  11364643  vmlinux
 7386425  2364370  1425432  11176227  vmlinux.dcde
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr> [8xx] Tested-by: Mathieu Malaterre <malat@debian.org> [32-bit powermac] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Paul Mackerras
A radix guest can execute tlbie instructions to invalidate TLB entries. After a tlbie or a group of tlbies, it must then do the architected sequence eieio; tlbsync; ptesync to ensure that the TLB invalidation has been processed by all CPUs in the system before it can rely on no CPU using any translation that it just invalidated. In fact it is the ptesync which does the actual synchronization in this sequence, and hardware has a requirement that the ptesync must be executed on the same CPU thread as the tlbies which it is expected to order. Thus, if a vCPU gets moved from one physical CPU to another after it has done some tlbies but before it can get to do the ptesync, the ptesync will not have the desired effect when it is executed on the second physical CPU. To fix this, we do a ptesync in the exit path for radix guests. If there are any pending tlbies, this will wait for them to complete. If there aren't, then ptesync will just do the same as sync. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
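As a sketch, the exit-path synchronization is a single instruction on the hardware thread the guest ran on (the helper name is made up; the real change is in the KVM exit assembly):

```c
/* Waits for any tlbies this hardware thread issued while in the
 * guest; degrades to a plain sync when none are pending. */
static inline void kvmhv_radix_exit_ptesync(void)
{
	asm volatile("ptesync" : : : "memory");
}
```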
-
Committed by Benjamin Herrenschmidt
When a vcpu priority (CPPR) is set to a lower value (masking more interrupts), we stop processing interrupts already in the queue for the priorities that have now been masked. If those interrupts were previously re-routed to a different CPU, they might remain stuck until the CPU that still has them in its queue processes them. In the case of guest CPU unplug, that might be never. To address this without adding overhead to the normal interrupt processing path, this changes H_CPPR handling so that when such a priority change occurs, we scan the interrupt queue for that vCPU, and for any interrupt in there that has been re-routed, we replace it with a dummy and force a re-trigger. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Committed by Nicholas Piggin
The current partition table unmap code clears the _PAGE_PRESENT bit out of the pte, which leaves pud_huge/pmd_huge true and does not clear pud_present/pmd_present. This can confuse subsequent page faults and possibly lead to the guest looping, doing continual hypervisor page faults. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
-
Committed by Nicholas Piggin
The standard eieio; tlbsync; ptesync sequence must follow tlbie to ensure it is ordered with respect to subsequent operations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
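The architected ordering sequence as inline asm, a minimal sketch (the real change is in the KVM TLB management assembly):

```c
static inline void tlbie_order(void)
{
	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
}
```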
-
Committed by Paul Mackerras
Currently, the HV KVM guest entry/exit code adds the timebase offset from the vcore struct to the timebase on guest entry, and subtracts it on guest exit. That is fine, except that it is possible for userspace to change the offset using the SET_ONE_REG interface while the vcore is running, as there is only one timebase offset per vcore but potentially multiple vCPUs in the vcore. If that were to happen, KVM would subtract a different offset on guest exit from the one it had added on guest entry, leading to the timebase being out of sync between cores in the host, which then leads to bad things happening such as hangs and spurious watchdog timeouts. To fix this, we add a new field 'tb_offset_applied' to the vcore struct which stores the offset that is currently applied to the timebase. This value is set from the vcore tb_offset field on guest entry, and is what is subtracted from the timebase on guest exit. Since it is zero when the timebase offset is not applied, we can simplify the logic in kvmhv_start_timing and kvmhv_accumulate_time. In addition, we had secondary threads reading the timebase while running concurrently with code on the primary thread which would eventually add or subtract the timebase offset from the timebase. This occurred while saving or restoring the DEC register value on the secondary threads. Although no specific incorrect behaviour has been observed, this is a race which should be fixed. To fix it, we move the DEC-saving code to just before we call kvmhv_commence_exit, and the DEC-restoring code to after the point where we have waited for the primary thread to switch the MMU context and add the timebase offset. That way we are sure that the timebase contains the guest timebase value in both cases. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
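A hedged sketch of the bookkeeping; the real logic lives in the guest entry/exit assembly, and the struct and helpers below are illustrative only:

```c
#include <linux/types.h>

struct vcore_tb_state {
	u64 tb_offset;		/* requested via SET_ONE_REG, may change */
	u64 tb_offset_applied;	/* what is actually in the timebase now */
};

static void guest_entry_tb(struct vcore_tb_state *vc)
{
	vc->tb_offset_applied = vc->tb_offset;
	/* timebase += vc->tb_offset (SPRN_TBU40 write in the real code) */
}

static void guest_exit_tb(struct vcore_tb_state *vc)
{
	/* subtract exactly what was added, not the current tb_offset,
	 * which userspace may have changed while the vcore ran */
	/* timebase -= vc->tb_offset_applied */
	vc->tb_offset_applied = 0;
}
```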
-
- 16 May 2018, 1 commit
-
-
Committed by Christoph Hellwig
Add variants of proc_create{,_data} that directly take a seq_file show callback, drastically reducing the boilerplate code in the callers. All trivial callers are converted over. Signed-off-by: Christoph Hellwig <hch@lst.de>
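With the new helpers (proc_create_single{,_data}), a trivial proc file shrinks to its show callback; "powerpc_foo" below is a made-up example name:

```c
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int foo_show(struct seq_file *m, void *v)
{
	seq_puts(m, "example\n");
	return 0;
}

static int __init foo_proc_init(void)
{
	/* replaces a file_operations struct plus an open() wrapper */
	proc_create_single("powerpc_foo", 0444, NULL, foo_show);
	return 0;
}
```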
-
- 14 May 2018, 2 commits
-
-
Committed by Bryant G. Ly
This driver is a logical device which provides an interface between the hypervisor and a management partition, functioning as a message-passing interface. The management partition is intended to provide an alternative to HMC-based system management. VMC enables the management LPAR to provide basic logical partition functions:
- Logical Partition Configuration
- Boot, start, and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management
This driver is to be used for the POWER Virtual Management Channel Virtual Adapter on the PowerPC platform. It provides a character device which allows for both request/response and async message support through the /dev/ibmvmc node. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com> Reviewed-by: Adam Reznechek <adreznec@linux.vnet.ibm.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Taylor Jakobson <tjakobs@us.ibm.com> Tested-by: Brad Warrum <bwarrum@us.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Frederic Weisbecker
Remove the ad-hoc implementation; the generic code now allows us not to reinvent the wheel. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Rich Felker <dalias@libc.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/1525786706-22846-9-git-send-email-frederic@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 09 May 2018, 6 commits
-
-
Committed by Christoph Hellwig
This way we have one central definition of it, and users can select it as needed. The new option is not user visible, which is the behavior it had in most architectures, with a few notable exceptions:
- On x86_64 and mips/loongson3 it used to be user selectable, but defaulted to y. It is now unconditional, which seems like the right thing for 64-bit architectures without guaranteed availability of IOMMUs.
- On powerpc the symbol is user selectable and defaults to n, but many boards select it. This change assumes no working setup required a manual selection, but if that turns out to be wrong we'll have to add another select statement or two for the respective boards.
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Christoph Hellwig
Define this symbol if the architecture either uses 64-bit pointers or has PHYS_ADDR_T_64BIT set. This covers 95% of the old arch magic. We only need an additional select for Xen on ARM (why, anyway?), and we now always set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing instead of only doing it when highmem is set. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: James Hogan <jhogan@kernel.org>
-
Committed by Christoph Hellwig
Instead, select PHYS_ADDR_T_64BIT directly for 32-bit architectures that need a 64-bit phys_addr_t type. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: James Hogan <jhogan@kernel.org>
-
Committed by Christoph Hellwig
This way we have one central definition of it, and users can select it as needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
-
Committed by Christoph Hellwig
This way we have one central definition of it, and users can select it as needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
-
Committed by Christoph Hellwig
This avoids selecting IOMMU_HELPER just for this function. And we only use it once or twice in normal builds, so this is often even a size reduction. Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 08 May 2018, 3 commits
-
-
Committed by Christoph Hellwig
There is no arch-specific code required for dma-debug, so there is no need to opt into the support either. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
-
Committed by Christoph Hellwig
Most mainstream architectures are using 65536 entries, so let's stick to that. If someone is really desperate to override it, that can still be done through <asm/dma-mapping.h>, but I'd rather see a really good rationale for that. dma_debug_init is now called as a core_initcall, which for many architectures means much earlier, and provides dma-debug functionality earlier in the boot process. This should be safe as it only relies on the memory allocator already being available. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
-
Committed by Michael Ellerman
The build is failing with CONFIG_NUMA=n and some compiler versions:
arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu': hotplug-cpu.c:(.text+0x12c): undefined reference to `timed_topology_update'
arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_cpu_remove': hotplug-cpu.c:(.text+0x400): undefined reference to `timed_topology_update'
Fix it by moving the empty version of timed_topology_update() into the existing #ifdef block, which has the right guard of SPLPAR && NUMA. Fixes: cee5405d ("powerpc/hotplug: Improve responsiveness of hotplug change") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
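The shape of the fix, sketched from the description above:

```c
#if defined(CONFIG_PPC_SPLPAR) && defined(CONFIG_NUMA)
extern int timed_topology_update(int nsecs);
#else
/* the stub must sit under the same guard as the real declaration,
 * or CONFIG_NUMA=n builds reference an undefined symbol */
static inline int timed_topology_update(int nsecs)
{
	return 0;
}
#endif
```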
-
- 07 May 2018, 4 commits
-
-
Committed by Naveen N. Rao
Some syscall entry functions on powerpc are prefixed with ppc_/ppc32_/ppc64_ rather than the usual sys_/__se_sys prefix. fork(), clone() and swapcontext() are some examples of syscalls with such entry points. We need to match against these names when initializing ftrace syscall tracing. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Naveen N. Rao
On powerpc64 ABIv1, we are enabling syscall tracing for only ~20 syscalls. This is due to commit e145242e ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention"), which changed the syscall entry wrapper prefix from "SyS" to "__se_sys". Update the logic for ABIv1 to not just skip the initial dot, but also the "__se_sys" prefix. Fixes: e145242e ("syscalls/core, syscalls/x86: Clean up syscall stub naming convention") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
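A sketch of the resulting matching hook covering both fixes above, in the style of powerpc's asm/ftrace.h (on ABIv1 the leading dot is stripped from the symbol before this comparison; exact details may differ from the real source):

```c
#include <linux/string.h>

static inline bool arch_syscall_match_sym_name(const char *sym,
					       const char *name)
{
	return !strcmp(sym, name) ||
	       /* __se_sys_read matches sys_read */
	       (!strncmp(sym, "__se_sys", 8) && !strcmp(sym + 5, name)) ||
	       /* ppc_fork, ppc32_swapcontext, ppc64_personality, ... */
	       (!strncmp(sym, "ppc_", 4) && !strcmp(sym + 4, name + 4)) ||
	       (!strncmp(sym, "ppc32_", 6) && !strcmp(sym + 6, name + 4)) ||
	       (!strncmp(sym, "ppc64_", 6) && !strcmp(sym + 6, name + 4));
}
```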
-
Committed by Michael Ellerman
In commit 4e26bc4a ("powerpc/64: Rename soft_enabled to irq_soft_mask") we renamed paca->soft_enabled. But then in commit 8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not virtualized") we added it back. Oops. This happened because the two patches were in flight at the same time and were rebased against each other multiple times, and we missed it in review. Fixes: 8e0b634b ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Christoph Hellwig
This was used by the ide, scsi and networking code in the past to determine if they should bounce payloads. Now that DMA mapping always has to support DMA to all physical memory (thanks to swiotlb for non-IOMMU systems), there is no need for this crude hack any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Palmer Dabbelt <palmer@sifive.com> (for riscv) Reviewed-by: Jens Axboe <axboe@kernel.dk>
-
- 04 May 2018, 1 commit
-
-
Committed by Daniel Borkmann
Since LD_ABS/LD_IND instructions are now removed from the core and reimplemented through a combination of inlined BPF instructions and a slow-path helper, we can get rid of the complexity from the ppc64 JIT. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Sandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 27 April 2018, 2 commits
-
-
Committed by Laurentiu Tudor
Add the missing "altivec unavailable" interrupt injection helper, fixing the linker error below:
arch/powerpc/kvm/emulate_loadstore.o: In function `kvmppc_check_altivec_disabled': arch/powerpc/kvm/emulate_loadstore.c: undefined reference to `.kvmppc_core_queue_vec_unavail'
Fixes: 09f98496 ("KVM: PPC: Book3S: Add MMIO emulation for VMX instructions") Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
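The missing helper is a one-liner following the existing kvmppc_core_queue_* pattern; a sketch consistent with the description above:

```c
void kvmppc_core_queue_vec_unavail(struct kvm_vcpu *vcpu)
{
	kvmppc_book3s_queue_irqprio(vcpu, BOOK3S_INTERRUPT_ALTIVEC);
}
```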
-
Committed by Nicholas Piggin
smp_send_stop can lock up the IPI path for any subsequent calls, because the receiving CPUs spin in their handler function. This started becoming a problem with the addition of an smp_send_stop call in the reboot path, because panics can reboot after doing their own smp_send_stop. The NMI IPI variant was fixed with ac61c115 ("powerpc: Fix smp_send_stop NMI IPI handling"), which leaves the smp_call_function variant. This is fixed by having smp_send_stop only ever do the smp_call_function once. This is a bit less robust than the NMI IPI fix, because any other call to smp_call_function after smp_send_stop could deadlock, but that has always been the case, and it has not been a problem before. Fixes: f2748bdf ("powerpc/powernv: Always stop secondaries before reboot/shutdown") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
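A minimal sketch of the guard, assuming the existing stop_this_cpu() handler:

```c
void smp_send_stop(void)
{
	static bool stopped;

	/* receiving CPUs never leave their handler, so issuing the
	 * IPI a second time would spin forever waiting for them */
	if (stopped)
		return;
	stopped = true;

	smp_call_function(stop_this_cpu, NULL, 0);
}
```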
-
- 25 April 2018, 5 commits
-
-
Committed by Eric W. Biederman
Using an si_code of 0 that aliases with SI_USER is clearly the wrong thing to do, and causes problems in interesting ways. For use in unknown_exception, the recently defined TRAP_UNK is semantically a perfect fit. For use in RunModeException, it looks like something more specific than TRAP_UNK could be used, but no one has bothered to find a better fit than the broken si_code of 0 in all these years, and I don't see an obvious better fit, so switching RunModeException to return TRAP_UNK is clearly an improvement. Recent history suggests no one actually cares about crazy corner cases of kernel behavior like this, so I don't expect any regressions from changing this. However, if something does happen, this change is easy to revert. Though I wonder if SIGKILL might not be a better fit. Cc: Paul Mackerras <paulus@samba.org> Cc: Kumar Gala <kumar.gala@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx") Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Committed by Eric W. Biederman
Using an si_code of 0 that aliases with SI_USER is clearly the wrong thing to do, and causes problems in interesting ways. The newly defined FPE_FLTUNK semantically appears to fit the bill, so use it instead. Cc: Paul Mackerras <paulus@samba.org> Cc: Kumar Gala <kumar.gala@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Fixes: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx") Fixes: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
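Schematically, both of the fixes above replace a zero si_code with an explicit "unknown" code; the call sites below are illustrative, not the exact ones in traps.c:

```c
void unknown_exception_example(struct pt_regs *regs)
{
	/* was: _exception(SIGTRAP, regs, 0, 0); (0 aliases SI_USER) */
	_exception(SIGTRAP, regs, TRAP_UNK, 0);
}

void float_exception_example(struct pt_regs *regs)
{
	/* was: _exception(SIGFPE, regs, 0, regs->nip); */
	_exception(SIGFPE, regs, FPE_FLTUNK, regs->nip);
}
```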
-
Committed by Eric W. Biederman
Call clear_siginfo to ensure every stack-allocated siginfo is properly initialized before being passed to the signal sending functions. Note: it is not safe to depend on C initializers to initialize struct siginfo on the stack, because C is allowed to skip holes when initializing a structure. The initialization of struct siginfo in tracehook_report_syscall_exit was moved from the helper user_single_step_siginfo into tracehook_report_syscall_exit itself, to make it clear that the local variable siginfo gets fully initialized. In a few cases the scope of struct siginfo has been reduced, to make it clear that the siginfo is not used on other paths in the function in which it is declared. Instances of using memset to initialize siginfo have been replaced with calls to clear_siginfo for clarity. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
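The pattern, sketched; the function name is hypothetical, and the key point is zeroing the whole struct before filling it:

```c
#include <linux/sched/signal.h>

static void send_example_sigfpe(struct task_struct *tsk, void __user *addr)
{
	struct siginfo info;

	clear_siginfo(&info);	/* zeroes padding a C initializer may skip */
	info.si_signo = SIGFPE;
	info.si_code  = FPE_FLTUNK;
	info.si_addr  = addr;
	force_sig_info(SIGFPE, &info, tsk);
}
```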
-
Committed by Nicholas Piggin
The NMI IPI handler for a receiving CPU increments nmi_ipi_busy_count over the handler function call, which causes later smp_send_nmi_ipi() callers to spin until the call is finished. The stop_this_cpu() function never returns, so the busy count is never decremented, which can cause the system to hang in some cases. For example, panic() will call smp_send_stop() early on, which calls stop_this_cpu() on other CPUs; then later in the reboot path, pnv_restart() will call smp_send_stop() again, which hangs. Fix this by adding a special case to the stop_this_cpu() handler to decrement the busy count, because it will never return. Now that the NMI and non-NMI versions of stop_this_cpu() are different, split them out into separate functions rather than doing #ifdef tricks to share the body between the two functions. Fixes: 6bed3237 ("powerpc: use NMI IPI for smp_send_stop") Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Split out the functions, tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
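A hedged sketch of the split-out NMI variant; the locking helper names follow arch/powerpc/kernel/smp.c conventions but are assumptions here:

```c
static void nmi_stop_this_cpu(struct pt_regs *regs)
{
	/* This handler never returns, so mark the NMI IPI done now;
	 * otherwise every later smp_send_nmi_ipi() caller spins forever. */
	nmi_ipi_lock();
	nmi_ipi_busy_count--;
	nmi_ipi_unlock();

	spin_begin();
	while (1)
		spin_cpu_relax();
}
```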
-
Committed by Nicholas Piggin
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or OPAL_BUSY_EVENT from firmware, which causes large scheduling latencies; up to 50 seconds have been observed here when the RTC stops responding (a BMC reboot can do it). Fix this by converting it to the standard form of OPAL_BUSY loop that sleeps. Fixes: 628daa8d ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks") Cc: stable@vger.kernel.org # v3.2+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 24 April 2018, 5 commits
-
-
Committed by Mahesh Salgaonkar
The current code extracts the physical address for UE errors and then hooks it up into the memory failure infrastructure. On successful extraction of the physical address it wrongly sets "handled = 1", which means the UE error has been recovered. Since the MCE handler gets handled = 1 as the return value, it assumes the error has been recovered and goes back to the same NIP. This causes the MCE interrupt to fire again and again in a loop, leading to a hard lockup. Also, initialize phys_addr to ULONG_MAX so that we don't end up queuing an undesired page to hwpoison. Without this patch we see:
Severe Machine check interrupt [Recovered] NIP: [000000001002588c] PID: 7109 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffd2755940 Physical address: 000020181a080000
...
Memory failure: 0x20181a08: recovery action for dirty LRU page: Recovered
Memory failure: 0x20181a08: already hardware poisoned
...
Watchdog CPU:38 Hard LOCKUP
After this patch we see:
Severe Machine check interrupt [Not recovered] NIP: [00007fffaae585f4] PID: 7168 Comm: find Initiator: CPU Error type: UE [Load/Store] Effective address: 00007fffaafe28ac Physical address: 00002017c0bd0000
find[7168]: unhandled signal 7 at 00007fffaae585f4 nip 00007fffaae585f4 lr 00007fffaae585e0 code 4
Memory failure: 0x2017c0bd: recovery action for dirty LRU page: Recovered
Fixes: 01eaac2b ("powerpc/mce: Hookup ierror (instruction) UE errors") Fixes: ba41e1e1 ("powerpc/mce: Hookup derror (load/store) UE errors") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Alistair Popple
The NPU has a limited number of address translation shootdown (ATSD) registers, and the GPU has limited bandwidth to process ATSDs. This can result in contention for the ATSD registers, leading to soft lockups on some threads, particularly when invalidating a large address range in pnv_npu2_mn_invalidate_range(). At some threshold it becomes more efficient to flush the entire GPU TLB for the given MM context (PID) than to individually flush each address in the range. This patch converts ranges greater than 2MB from 32+ ATSDs into a single ATSD which flushes the TLB for the given PID on each GPU. Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Balbir Singh <bsingharora@gmail.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
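The heuristic, sketched with illustrative names (only the 2MB threshold comes from the description above):

```c
#define ATSD_THRESHOLD	(2 * 1024 * 1024)	/* 2MB */

static void npu_invalidate_range(struct npu_context *ctx,
				 unsigned long start, unsigned long end)
{
	unsigned long addr;

	if (end - start > ATSD_THRESHOLD) {
		mmio_invalidate_pid(ctx);	/* one flush-all ATSD per GPU */
	} else {
		for (addr = start; addr < end; addr += PAGE_SIZE)
			mmio_invalidate_va(ctx, addr);	/* per-address ATSD */
	}
}
```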
-
Committed by Alistair Popple
There is a single npu context per set of callback parameters. Callers should be prevented from overwriting existing callback values, so instead return an error if different parameters are passed. Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Alistair Popple
The pnv_npu2_init_context() and pnv_npu2_destroy_context() functions are used to allocate/free contexts to allow address translation and shootdown by the NPU on a particular GPU. Context initialisation is implicitly safe as it is protected by the requirement that mmap_sem be held in write mode; however, pnv_npu2_destroy_context() does not require mmap_sem to be held, and it is not safe to call it concurrently with an initialisation for a different GPU. It was assumed the driver would ensure destruction was not called concurrently with initialisation. However, the driver may be simplified by allowing concurrent initialisation and destruction for different GPUs. As npu context creation/destruction is not a performance-critical path and the critical section is not large, a single spinlock is used for simplicity. Fixes: 1ab66d1f ("powerpc/powernv: Introduce address translation services for Nvlink2") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Mark Hairgrove <mhairgrove@nvidia.com> Tested-by: Mark Hairgrove <mhairgrove@nvidia.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Balbir Singh
Don't do this via custom code; now that we have support in the arch hotplug/hotunplug code, rely on those routines to do the right thing. The existing flush doesn't work because it uses ppc64_caches.l1d.size instead of ppc64_caches.l1d.line_size. Fixes: 9d5171a8 ("powerpc/powernv: Enable removal of memory for in memory tracing") Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
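The underlying bug, sketched: stepping by the cache size touches one line per cache-size bytes and leaves the rest dirty; the line size is the correct stride (the helper name is illustrative):

```c
#include <asm/cache.h>

static void flush_dcache_range_sketch(unsigned long start, unsigned long end)
{
	unsigned long addr;

	/* stride by l1d.line_size, not l1d.size */
	for (addr = start; addr < end; addr += ppc64_caches.l1d.line_size)
		asm volatile("dcbf 0,%0" : : "r" (addr) : "memory");
	asm volatile("sync" : : : "memory");
}
```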
-