- 01 4月, 2020 14 次提交
-
-
由 Nicholas Piggin 提交于
This allows more code to be moved out of unrelocated regions. The system call KVMTEST is changed to be open-coded and remain in the tramp area to avoid having to move it to entry_64.S. The custom nature of the system call entry code means the hcall case can be made more streamlined than regular interrupt handlers. mpe: Incorporate fix from Nick: Moving KVM test to the common entry code missed the case of HMI and MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to switch to virt mode). This means a MCE or HMI exception that is taken while KVM is running a guest context will not be switched out of that context, and KVM won't be notified. Found by running sigfuz in guest with patched host on POWER9 DD2.3, which causes some TM related HMI interrupts (which are expected and supposed to be handled by KVM). This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add the KVM test. This makes them look a little more like other handlers that all use __GEN_COMMON_ENTRY. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
As well as moving code out of the unrelocated vectors, this allows the masked handlers to be moved to common code, and allows the soft_nmi handler to be generated more like a regular handler. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-12-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
The real mode interrupt entry points currently use rfid to branch to the common handler in virtual mode. This is a significant amount of code, and forces other code (notably the KVM test) to live in the real mode handler. In the interest of minimising the amount of code that runs unrelocated move the switch to virt mode into the common code, and do it with mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST performance of real-mode interrupt handlers is not a big concern these days). This requires CTR to always be saved (real-mode needs to reach 0xc...) but that's not a huge impact these days. It could be optimized away in future. mpe: Incorporate fix from Nick: It's possible for interrupts to be replayed when TM is enabled and suspended, for example rt_sigreturn, where the mtmsrd MSR_KERNEL in the real-mode entry point to the common handler causes a TM Bad Thing exception (due to attempting to clear suspended). The fix for this is to have replay interrupts go to the _virt entry point and skip the mtmsrd, which matches what happens before this patch. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-11-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Rather than using DAR=2 to select the i-side registers, add an explicit option. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-10-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-9-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-8-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-7-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Aside from label names and BUG line numbers, the generated code change is an additional HMI KVM handler added for the "late" KVM handler, because early and late HMI generation is achieved by defining two different interrupt types. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-6-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
These don't provide a large amount of code sharing. Removing them makes code easier to shuffle around. For example, some of the common instructions will be moved into the common code gen macro. No generated code change. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-5-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
No generated code change. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-4-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
No generated code change. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-3-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
The code generation macro arguments are difficult to read, and defaults can't easily be used. This introduces a block where parameters can be set for interrupt handler code generation by the subsequent macros, and adds the first generation macro for interrupt entry. One interrupt handler is converted to the new macros to demonstrate the change, the rest will be coverted all at once. No generated code change. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200225173541.1549955-2-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
Before: WARNING: CPU: 0 PID: 494 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 494 Comm: a Tainted: G W NIP: c00000000001ed2c LR: c000000000d13190 CTR: c00000000003f910 REGS: c0000001fffd3870 TRAP: 0700 Tainted: G W MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000aeb12c c0000001fffd3b00 c0000000012ba300 0000000000000000 GPR04: 0000000000000000 0000000000000000 000000010bd207c8 6b00696e74657272 GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 000000010bd207bc GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000d13190] _raw_spin_unlock_irqrestore+0x50/0xc0 Call Trace: Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 After: WARNING: CPU: 0 PID: 499 at arch/powerpc/kernel/irq.c:343 CPU: 0 PID: 499 Comm: a Not tainted NIP: c00000000001ed2c LR: c000000000d13210 CTR: c00000000003f980 REGS: c0000001fffd3870 TRAP: 0700 Not tainted MSR: 8000000000021003 <SF,ME,RI,LE> CR: 28000488 XER: 00000000 CFAR: c00000000001ec90 IRQMASK: 0 GPR00: c000000000aeb1ac c0000001fffd3b00 c0000000012ba300 0000000000000000 GPR04: 0000000000000000 0000000000000000 00000001347607c8 6b00696e74657272 GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000 GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 00000001347607bc GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50 NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100 LR [c000000000d13210] _raw_spin_unlock_irqrestore+0x50/0xc0 Call Trace: [c0000001fffd3b20] [c000000000aeb1ac] of_find_property+0x6c/0x90 [c0000001fffd3b70] [c000000000aeb1f0] of_get_property+0x20/0x40 [c0000001fffd3b90] [c000000000042cdc] rtas_token+0x3c/0x70 [c0000001fffd3bb0] [c0000000000dc318] fwnmi_release_errinfo+0x28/0x70 [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540 [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70 [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0 --- interrupt: 200 at 0x1347607c8 LR = 0x7fffafbd8328 Instruction dump: 60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000 60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000 Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200325104144.158362-1-npiggin@gmail.com
-
由 Michael Ellerman 提交于
In restore_tm_sigcontexts() we take the trap value directly from the user sigcontext with no checking: err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]); This means we can be in the kernel with an arbitrary regs->trap value. Although that's not immediately problematic, there is a risk we could trigger one of the uses of CHECK_FULL_REGS(): #define CHECK_FULL_REGS(regs) BUG_ON(regs->trap & 1) It can also cause us to unnecessarily save non-volatile GPRs again in save_nvgprs(), which shouldn't be problematic but is still wrong. It's also possible it could trick the syscall restart machinery, which relies on regs->trap not being == 0xc00 (see 9a81c16b ("powerpc: fix double syscall restarts")), though I haven't been able to make that happen. Finally it doesn't match the behaviour of the non-TM case, in restore_sigcontext() which zeroes regs->trap. So change restore_tm_sigcontexts() to zero regs->trap. This was discovered while testing Nick's upcoming rewrite of the syscall entry path. In that series the call to save_nvgprs() prior to signal handling (do_notify_resume()) is removed, which leaves the low-bit of regs->trap uncleared which can then trigger the FULL_REGS() WARNs in setup_tm_sigcontexts(). Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au
-
- 27 3月, 2020 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
As per ISA an isync is only needed on instruction cache block invalidate. Remove the same from dcache invalidate. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200320103242.229223-1-aneesh.kumar@linux.ibm.com
-
由 Fangrui Song 提交于
.globl sets the symbol binding to STB_GLOBAL while .weak sets the binding to STB_WEAK. GNU as let .weak override .globl since binutils-gdb 5ca547dc2399a0a5d9f20626d4bf5547c3ccfddd (1996). Clang integrated assembler let the last win but it may error in the future. Since it is a convention that only one binding directive is used, just delete .globl. Fixes: ee9d21b3 ("powerpc/boot: Ensure _zimage_start is a weak symbol") Signed-off-by: NFangrui Song <maskray@google.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200325164257.170229-1-maskray@google.com
-
由 Ganesh Goudar 提交于
memcpy_mcsafe has been implemented for power machines which is used by pmem infrastructure, so that an UE encountered during memcpy from pmem devices would not result in panic instead a right error code is returned. The implementation expects machine check handler to ignore the event and set nip to continue the execution from fixup code. Appropriate changes are already made to powernv machine check handler, make similar changes to pseries machine check handler to ignore the the event and set nip to continue execution at the fixup entry if we hit UE at an instruction with a fixup entry. while we are at it, have a common function which searches the exception table entry and updates nip with fixup address, and any future common changes can be made in this function that are valid for both architectures. powernv changes are made by commit 895e3dce ("powerpc/mce: Handle UE event for memcpy_mcsafe") Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: NSantosh S <santosh@fossix.org> Signed-off-by: NGanesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com
-
- 26 3月, 2020 8 次提交
-
-
由 Michael Ellerman 提交于
We can avoid the #ifdef by using IS_ENABLED() in the existing condition check. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313112020.28235-2-mpe@ellerman.id.au
-
由 Michael Ellerman 提交于
We don't need the NULL check of np, the result is the same because the OF helpers cope with NULL, of_node_to_nid(NULL) == NUMA_NO_NODE (-1). Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313112020.28235-1-mpe@ellerman.id.au
-
由 Douglas Miller 提交于
The reason debuggers add an ASCII dump to other types of memory dumps is to give the user visual reference points in the case that ASCII strings are adjacent to other structures or element. For example, when examining the task_struct structure one can look for the comm[] string and use it to locate other important elements. ASCII strings do not have endianess, they exist in memory in the same order regardless of CPU endianess. ASCII strings are, by definition, human readable and so should be presented in a human readable format. For these reasons, the supplemental ASCII dump does not re-order the strings from memory to match the endianess of the corresponding 16, 32, or 64 bit words. That would make the ASCII dump much less useful. Signed-off-by: NDouglas Miller <dougmill@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1488205694-13337-1-git-send-email-dougmill@linux.vnet.ibm.com
-
由 Cédric Le Goater 提交于
As does XMON, the debugfs file /sys/kernel/debug/powerpc/xive exposes the XIVE internal state of the machine CPUs and interrupts. Available on the PowerNV and sPAPR platforms. Signed-off-by: NCédric Le Goater <clg@kaod.org> [mpe: Make the debugfs file 0400] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-5-clg@kaod.org
-
由 Cédric Le Goater 提交于
Some firmwares or hypervisors can advertise different source characteristics. Track their value under XMON. What we are mostly interested in is the StoreEOI flag. Signed-off-by: NCédric Le Goater <clg@kaod.org> Reviewed-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-4-clg@kaod.org
-
由 Cédric Le Goater 提交于
The PowerNV platform has multiple IRQ chips and the xmon command dumping the state of the XIVE interrupt should only operate on the XIVE IRQ chip. Fixes: 5896163f ("powerpc/xmon: Improve output of XIVE interrupts") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: NCédric Le Goater <clg@kaod.org> Reviewed-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-3-clg@kaod.org
-
由 Cédric Le Goater 提交于
When a CPU is brought up, an IPI number is allocated and recorded under the XIVE CPU structure. Invalid IPI numbers are tracked with interrupt number 0x0. On the PowerNV platform, the interrupt number space starts at 0x10 and this works fine. However, on the sPAPR platform, it is possible to allocate the interrupt number 0x0 and this raises an issue when CPU 0 is unplugged. The XIVE spapr driver tracks allocated interrupt numbers in a bitmask and it is not correctly updated when interrupt number 0x0 is freed. It stays allocated and it is then impossible to reallocate. Fix by using the XIVE_BAD_IRQ value instead of zero on both platforms. Reported-by: NDavid Gibson <david@gibson.dropbear.id.au> Fixes: eac1e731 ("powerpc/xive: guest exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: NCédric Le Goater <clg@kaod.org> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Tested-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306150143.5551-2-clg@kaod.org
-
由 Nick Desaulniers 提交于
Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Suggested-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> [mpe: Drop changes to a/p/boot which doesn't use linux headers] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190812215052.71840-10-ndesaulniers@google.com
-
- 25 3月, 2020 15 次提交
-
-
由 Fabiano Rosas 提交于
The if statement that this comment referred to was removed in commit 11fdb309 ("powerpc/prom_init: Remove support for OPAL v2"). Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200324182912.1048906-1-farosas@linux.ibm.com
-
由 Christophe Leroy 提交于
When a program check exception happens while MMU translation is disabled, following Oops happens in kprobe_handler() in the following code: } else if (*addr != BREAKPOINT_INSTRUCTION) { BUG: Unable to handle kernel data access on read at 0x0000e268 Faulting instruction address: 0xc000ec34 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=16K PREEMPT CMPC885 Modules linked in: CPU: 0 PID: 429 Comm: cat Not tainted 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267 NIP: c000ec34 LR: c000ecd8 CTR: c019cab8 REGS: ca4d3b58 TRAP: 0300 Not tainted (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a) MSR: 00001032 <ME,IR,DR,RI> CR: 2a4d3c52 XER: 00000000 DAR: 0000e268 DSISR: c0000000 GPR00: c000b09c ca4d3c10 c66d0620 00000000 ca4d3c60 00000000 00009032 00000000 GPR08: 00020000 00000000 c087de44 c000afe0 c66d0ad0 100d3dd6 fffffff3 00000000 GPR16: 00000000 00000041 00000000 ca4d3d70 00000000 00000000 0000416d 00000000 GPR24: 00000004 c53b6128 00000000 0000e268 00000000 c07c0000 c07bb6fc ca4d3c60 NIP [c000ec34] kprobe_handler+0x128/0x290 LR [c000ecd8] kprobe_handler+0x1cc/0x290 Call Trace: [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4 --- interrupt: 700 at 0xe268 Instruction dump: 913e0008 81220000 38600001 3929ffff 91220000 80010024 bb410008 7c0803a6 38210020 4e800020 38600000 4e800020 <813b0000> 6d2a7fe0 2f8a0008 419e0154 ---[ end trace 5b9152d4cdadd06d ]--- kprobe is not prepared to handle events in real mode and functions running in real mode should have been blacklisted, so kprobe_handler() can safely bail out telling 'this trap is not mine' for any trap that happened while in real-mode. If the trap happened with MSR_IR or MSR_DR cleared, return 0 immediately. Reported-by: NLarry Finger <Larry.Finger@lwfinger.net> Fixes: 6cc89bad ("powerpc/kprobes: Invoke handlers directly") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org> Reviewed-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/424331e2006e7291a1bfe40e7f3fa58825f565e1.1582054578.git.christophe.leroy@c-s.fr
-
由 Nathan Chancellor 提交于
When building ppc64 defconfig, Clang errors (trimmed for brevity): arch/powerpc/platforms/maple/setup.c:365:1: error: attribute declaration must precede definition [-Werror,-Wignored-attributes] machine_device_initcall(maple, maple_cpc925_edac_setup); ^ machine_device_initcall expands to __define_machine_initcall, which in turn has the macro machine_is used in it, which declares mach_##name with an __attribute__((weak)). define_machine actually defines mach_##name, which in this file happens before the declaration, hence the warning. To fix this, move define_machine after machine_device_initcall so that the declaration occurs before the definition, which matches how machine_device_initcall and define_machine work throughout arch/powerpc. While we're here, remove some spaces before tabs. Fixes: 8f101a05 ("edac: cpc925 MC platform device setup") Reported-by: NNick Desaulniers <ndesaulniers@google.com> Suggested-by: NIlie Halip <ilie.halip@gmail.com> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200323222729.15365-1-natechancellor@gmail.com
-
由 Nicholas Piggin 提交于
Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200320152436.1468651-1-npiggin@gmail.com
-
由 Oliver O'Halloran 提交于
With the EEH early probe now being pseries specific there's no need for eeh_ops->probe() to take a pci_dn. Instead, we can make it take a pci_dev and use the probe function to map a pci_dev to an eeh_dev. This allows the platform to implement it's own method for finding (or creating) an eeh_dev for a given pci_dev which also removes a use of pci_dn in generic EEH code. This patch also renames eeh_device_add_late() to eeh_device_probe(). This better reflects what it does does and removes the last vestiges of the early/late EEH probe split. Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-6-oohall@gmail.com
-
由 Oliver O'Halloran 提交于
The eeh_ops->probe() function is called from two different contexts: 1. On pseries, where we set EEH_PROBE_MODE_DEVTREE, it's called in eeh_add_device_early() which is supposed to run before we create a pci_dev. 2. On PowerNV, where we set EEH_PROBE_MODE_DEV, it's called in eeh_device_add_late() which is supposed to run *after* the pci_dev is created. The "early" probe is required because PAPR requires that we perform an RTAS call to enable EEH support on a device before we start interacting with it via config space or MMIO. This requirement doesn't exist on PowerNV and shoehorning two completely separate initialisation paths into a common interface just results in a convoluted code everywhere. Additionally the early probe requires the probe function to take an pci_dn rather than a pci_dev argument. We'd like to make pci_dn a pseries specific data structure since there's no real requirement for them on PowerNV. To help both goals move the early probe into the pseries containment zone so the platform depedence is more explicit. Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-5-oohall@gmail.com
-
由 Oliver O'Halloran 提交于
This check for a missing PHB has existing in various forms since the initial PPC64 port was upstreamed in 2002. The idea seems to be that we need to guard against creating pci-specific data structures for the non-pci children of a PCI device tree node (e.g. USB devices). However, we only create pci_dn structures for DT nodes that correspond to PCI devices so there's not much point in doing this check in the eeh_probe path. Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-4-oohall@gmail.com
-
由 Oliver O'Halloran 提交于
The pci hotplug helper (pci_hp_add_devices()) calls eeh_add_device_tree_early() to scan the device-tree for new PCI devices and do the early EEH probe before the device is scanned. This early probe is a no-op in a lot of cases because: a) The early init is only required to satisfy a PAPR requirement that EEH be configured before we start doing config accesses. On PowerNV it is a no-op. b) It's a no-op for devices that have already had their eeh_dev initialised. There are four callers of pci_hp_add_devices(): 1. arch/powerpc/kernel/eeh_driver.c Here the hotplug helper is called when re-scanning pci_devs that were removed during an EEH recovery pass. The EEH stat for each removed device (the eeh_dev) is retained across a recovery pass so the early init is a no-op in this case. 2. drivers/pci/hotplug/pnv_php.c This is also a no-op since the PowerNV hotplug driver is, suprisingly, PowerNV specific. 3. drivers/pci/hotplug/rpaphp_core.c 4. drivers/pci/hotplug/rpaphp_pci.c In these two cases new devices have been hotplugged and FW has provided new DT nodes for each. These are the only two cases where the EEH we might have new PCI device nodes in the DT so these are the only two cases where the early EEH probe needs to be done. We can move the calls to eeh_add_device_tree_early() to the locations where it's needed and remove it from the generic path. This is preparation for making the early EEH probe pseries specific. Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-3-oohall@gmail.com
-
由 Oliver O'Halloran 提交于
On pseries and PowerNV pcibios_bus_add_device() calls eeh_add_device_late() so there's no need to do a separate tree traversal to bind the eeh_dev and pci_dev together setting up the PHB at boot. As a result we can remove eeh_add_device_tree_late(). Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-2-oohall@gmail.com
-
由 Oliver O'Halloran 提交于
Move creating the EEH specific sysfs files into eeh_add_device_late() rather than being open-coded all over the place. Calling the function is generally done immediately after calling eeh_add_device_late() anyway. This is also a correctness fix since currently the sysfs files will be added even if the EEH probe happens to fail. Similarly, on pseries we currently add the sysfs files before calling eeh_add_device_late(). This is flat-out broken since the sysfs files require the pci_dev->dev.archdata.edev pointer to be set, and that is done in eeh_add_device_late(). Reviewed-by: NSam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200306073904.4737-1-oohall@gmail.com
-
由 Michael Ellerman 提交于
The previous commit reduced the amount of code that is run before we setup a paca. However there are still a few remaining functions that run with no paca, or worse, with an arbitrary value in r13 that will be used as a paca pointer. In particular the stack protector canary is stored in the paca, so if stack protector is activated for any of these functions we will read the stack canary from wherever r13 points. If r13 happens to point outside of memory we will get a machine check / checkstop. For example if we modify initialise_paca() to trigger stack protection, and then boot in the mambo simulator with r13 poisoned in skiboot before calling the kernel: DEBUG: 19952232: (19952232): INSTRUCTION: PC=0xC0000000191FC1E8: [0x3C4C006D]: addis r2,r12,0x6D [fetch] DEBUG: 19952236: (19952236): INSTRUCTION: PC=0xC00000001807EAD8: [0x7D8802A6]: mflr r12 [fetch] FATAL ERROR: 19952276: (19952276): Check Stop for 0:0: Machine Check with ME bit of MSR off DEBUG: 19952276: (19952276): INSTRUCTION: PC=0xC0000000191FCA7C: [0xE90D0CF8]: ld r8,0xCF8(r13) [Instruction Failed] INFO: 19952276: (19952277): ** Execution stopped: Mambo Error, Machine Check Stop, ** systemsim % bt pc: 0xC0000000191FCA7C initialise_paca+0x54 lr: 0xC0000000191FC22C early_setup+0x44 stack:0x00000000198CBED0 0x0 +0x0 stack:0x00000000198CBF00 0xC0000000191FC22C early_setup+0x44 stack:0x00000000198CBF90 0x1801C968 +0x1801C968 So annotate the relevant functions to ensure stack protection is never enabled for them. Fixes: 06ec27ae ("powerpc/64: add stack protector support") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200320032116.1024773-2-mpe@ellerman.id.au
-
由 Daniel Axtens 提交于
Currently we set up the paca after parsing the device tree for CPU features. Prior to that, r13 contains random data, which means there is random data in r13 while we're running the generic dt parsing code. This random data varies depending on whether we boot through a vmlinux or a zImage: for the vmlinux case it's usually around zero, but for zImages we see random values like 912a72603d420015. This is poor practice, and can also lead to difficult-to-debug crashes. For example, when kcov is enabled, the kcov instrumentation attempts to read preempt_count out of the current task, which goes via the paca. This then crashes in the zImage case. Similarly stack protector can cause crashes if r13 is bogus, by reading from the stack canary in the paca. To resolve this: - move the paca setup to before the CPU feature parsing. - because we no longer have access to CPU feature flags in paca setup, change the HV feature test in the paca setup path to consider the actual value of the MSR rather than the CPU feature. Translations get switched on once we leave early_setup, so I think we'd already catch any other cases where the paca or task aren't set up. Boot tested on a P9 guest and host. Fixes: fb0b0a73 ("powerpc: Enable kcov") Fixes: 06ec27ae ("powerpc/64: add stack protector support") Cc: stable@vger.kernel.org # v4.20+ Reviewed-by: NAndrew Donnellan <ajd@linux.ibm.com> Suggested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NDaniel Axtens <dja@axtens.net> [mpe: Reword comments & change log a bit to mention stack protector] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200320032116.1024773-1-mpe@ellerman.id.au
-
由 Aneesh Kumar K.V 提交于
H_PAGE_THP_HUGE is used to differentiate between a THP hugepage and hugetlb hugepage entries. The difference is WRT how we handle hash fault on these address. THP address enables MPSS in segments. We want to manage devmap hugepage entries similar to THP pt entries. Hence use H_PAGE_THP_HUGE for devmap huge PTE entries. With current code while handling hash PTE fault, we do set is_thp = true when finding devmap PTE huge PTE entries. Current code also does the below sequence we setting up huge devmap entries. entry = pmd_mkhuge(pfn_t_pmd(pfn, prot)); if (pfn_t_devmap(pfn)) entry = pmd_mkdevmap(entry); In that case we would find both H_PAGE_THP_HUGE and PAGE_DEVMAP set for huge devmap PTE entries. This results in false positive error like below. kernel BUG at /home/kvaneesh/src/linux/mm/memory.c:4321! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 56 PID: 67996 Comm: t_mmap_dio Not tainted 5.6.0-rc4-59640-g371c804dedbc #128 .... NIP [c00000000044c9e4] __follow_pte_pmd+0x264/0x900 LR [c0000000005d45f8] dax_writeback_one+0x1a8/0x740 Call Trace: str_spec.74809+0x22ffb4/0x2d116c (unreliable) dax_writeback_one+0x1a8/0x740 dax_writeback_mapping_range+0x26c/0x700 ext4_dax_writepages+0x150/0x5a0 do_writepages+0x68/0x180 __filemap_fdatawrite_range+0x138/0x180 file_write_and_wait_range+0xa4/0x110 ext4_sync_file+0x370/0x6e0 vfs_fsync_range+0x70/0xf0 sys_msync+0x220/0x2e0 system_call+0x5c/0x68 This is because our pmd_trans_huge check doesn't exclude _PAGE_DEVMAP. To make this all consistent, update pmd_mkdevmap to set H_PAGE_THP_HUGE and pmd_trans_huge check now excludes _PAGE_DEVMAP correctly. Fixes: ebd31197 ("powerpc/mm: Add devmap support for ppc64") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200313094842.351830-1-aneesh.kumar@linux.ibm.com
-
由 Christophe Leroy 提交于
Reorder Linux PTE bits to (almost) match Hash PTE bits. RW Kernel : PP = 00 RO Kernel : PP = 00 RW User : PP = 01 RO User : PP = 11 So naturally, we should have _PAGE_USER = 0x001 _PAGE_RW = 0x002 Today 0x001 and 0x002 and _PAGE_PRESENT and _PAGE_HASHPTE which both are software only bits. Switch _PAGE_USER and _PAGE_PRESET Switch _PAGE_RW and _PAGE_HASHPTE This allows to remove a few insns. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c4d6c18a7f8d9d3b899bc492f55fbc40ef38896a.1583861325.git.christophe.leroy@c-s.fr
-
由 Christophe Leroy 提交于
At the moment kasan_remap_early_shadow_ro() does nothing, because k_end is 0 and k_cur < 0 is always true. Change the test to k_cur != k_end, as done in kasan_init_shadow_page_tables() Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Fixes: cbd18991 ("powerpc/mm: Fix an Oops in kasan_mmu_init()") Cc: stable@vger.kernel.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/4e7b56865e01569058914c991143f5961b5d4719.1583507333.git.christophe.leroy@c-s.fr
-