- 02 12月, 2020 1 次提交
-
-
由 Nicholas Piggin 提交于
This can be hit by an HPT guest running on an HPT host and bring down the host, so it's quite important to fix. Fixes: 7290f3b3 ("powerpc/64s/powernv: machine check dump SLB contents") Cc: stable@vger.kernel.org # v5.4+ Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Acked-by: NMahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-2-npiggin@gmail.com
-
- 01 12月, 2020 1 次提交
-
-
由 Greg Kurz 提交于
Commit 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") updated kvmppc_xive_vcpu_id_valid() in a way that allows userspace to trigger an assertion in skiboot and crash the host: [ 696.186248988,3] XIVE[ IC 08 ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0 [ 696.186314757,0] Assert fail: hw/xive.c:2370:0 [ 696.186342458,0] Aborting! xive-kvCPU 0043 Backtrace: S: 0000000031e2b8f0 R: 0000000030013840 .backtrace+0x48 S: 0000000031e2b990 R: 000000003001b2d0 ._abort+0x4c S: 0000000031e2ba10 R: 000000003001b34c .assert_fail+0x34 S: 0000000031e2ba90 R: 0000000030058984 .xive_eq_for_target.part.20+0xb0 S: 0000000031e2bb40 R: 0000000030059fdc .xive_setup_silent_gather+0x2c S: 0000000031e2bc20 R: 000000003005a334 .opal_xive_set_vp_info+0x124 S: 0000000031e2bd20 R: 00000000300051a4 opal_entry+0x134 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 --- XIVE maintains the interrupt context state of non-dispatched vCPUs in an internal VP structure. We allocate a bunch of those on startup to accommodate all possible vCPUs. Each VP has an id, that we derive from the vCPU id for efficiency: static inline u32 kvmppc_xive_vp(struct kvmppc_xive *xive, u32 server) { return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server); } The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was limitting the number of concurrent VMs because the VP space is limited on the HW. Since most of the time, VMs run with a lot less vCPUs, commit 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") gave the possibility for userspace to tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS attribute. The check in kvmppc_pack_vcpu_id() was changed from cpu < KVM_MAX_VCPUS * xive->kvm->arch.emul_smt_mode to cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode The previous check was based on the fact that the VP block had KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the size of the VP block, but kvmppc_pack_vcpu_id() has nothing to do with it and it certainly doesn't ensure that the packed vCPU ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might thus return true when the VM was configured with a non-standard VSMT mode, even if the packed vCPU id is higher than what we expect. We end up using an unallocated VP id, which confuses OPAL. The assert in OPAL is probably abusive and should be converted to a regular error that the kernel can handle, but we shouldn't really use broken VP ids in the first place. Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed vCPU id is below xive->nr_servers, which is explicitly what we want. Fixes: 062cfab7 ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/160673876747.695514.1809676603724514920.stgit@bahia.lan
-
- 27 11月, 2020 1 次提交
-
-
由 Srikar Dronamraju 提交于
Commit e75130f2 ("powerpc/numa: Offline memoryless cpuless node 0") offlines node 0 and expects nodes to be subsequently onlined when CPUs or nodes are detected. Commit 6398eaa2 ("powerpc/numa: Prefer node id queried from vphn") skips onlining node 0 when CPUs are associated with node 0. On systems with node 0 having CPUs but no memory, this causes node 0 be marked offline. This causes issues at boot time when trying to set memory node for online CPUs while building the zonelist. 0:mon> t [link register ] c000000000400354 __build_all_zonelists+0x164/0x280 [c00000000161bda0] c0000000016533c8 node_states+0x20/0xa0 (unreliable) [c00000000161bdc0] c000000000400384 __build_all_zonelists+0x194/0x280 [c00000000161be30] c000000001041800 build_all_zonelists_init+0x4c/0x118 [c00000000161be80] c0000000004020d0 build_all_zonelists+0x190/0x1b0 [c00000000161bef0] c000000001003cf8 start_kernel+0x18c/0x6a8 [c00000000161bf90] c00000000000adb4 start_here_common+0x1c/0x3e8 0:mon> r R00 = c000000000400354 R16 = 000000000b57a0e8 R01 = c00000000161bda0 R17 = 000000000b57a6b0 R02 = c00000000161ce00 R18 = 000000000b5afee8 R03 = 0000000000000000 R19 = 000000000b6448a0 R04 = 0000000000000000 R20 = fffffffffffffffd R05 = 0000000000000000 R21 = 0000000001400000 R06 = 0000000000000000 R22 = 000000001ec00000 R07 = 0000000000000001 R23 = c000000001175580 R08 = 0000000000000000 R24 = c000000001651ed8 R09 = c0000000017e84d8 R25 = c000000001652480 R10 = 0000000000000000 R26 = c000000001175584 R11 = c000000c7fac0d10 R27 = c0000000019568d0 R12 = c000000000400180 R28 = 0000000000000000 R13 = c000000002200000 R29 = c00000000164dd78 R14 = 000000000b579f78 R30 = 0000000000000000 R15 = 000000000b57a2b8 R31 = c000000001175584 pc = c000000000400194 local_memory_node+0x24/0x80 cfar= c000000000074334 mcount+0xc/0x10 lr = c000000000400354 __build_all_zonelists+0x164/0x280 msr = 8000000002001033 cr = 44002284 ctr = c000000000400180 xer = 0000000000000001 trap = 380 dar = 0000000000001388 dsisr = c00000000161bc90 0:mon> Fix this by setting node to be online while onlining CPUs that belong to node 0. Fixes: e75130f2 ("powerpc/numa: Offline memoryless cpuless node 0") Fixes: 6398eaa2 ("powerpc/numa: Prefer node id queried from vphn") Reported-by: NMilan Mohanty <milmohan@in.ibm.com> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127053738.10085-1-srikar@linux.vnet.ibm.com
-
- 26 11月, 2020 3 次提交
-
-
由 Nicholas Piggin 提交于
When offlining a CPU, powerpc/64s does not flush TLBs, rather it just leaves the CPU set in mm_cpumasks, so it continues to receive TLBIEs to manage its TLBs. However the exit_flush_lazy_tlbs() function expects that after returning, all CPUs (except self) have flushed TLBs for that mm, in which case TLBIEL can be used for this flush. This breaks for offline CPUs because they don't get the IPI to flush their TLB. This can lead to stale translations. Fix this by clearing the CPU from mm_cpumasks, then flushing all TLBs before going offline. These offlined CPU bits stuck in the cpumask also prevents the cpumask from being trimmed back to local mode, which means continual broadcast IPIs or TLBIEs are needed for TLB flushing. This patch prevents that situation too. A cast of many were involved in working this out, but in particular Milton, Aneesh, Paul made key discoveries. Fixes: 0cef77c7 ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask") Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: NMilton Miller <miltonm@us.ibm.com> Debugged-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Debugged-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-5-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
tlbiel_all() can not be usable in !HVMODE when running hash presently, remove HV privileged flushes when running in guest to make it usable. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-3-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
A typo has the R field of the instruction assigned by lucky dip a la register allocator. Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201126102530.691335-2-npiggin@gmail.com
-
- 24 11月, 2020 1 次提交
-
-
由 Peter Zijlstra 提交于
We call arch_cpu_idle() with RCU disabled, but then use local_irq_{en,dis}able(), which invokes tracing, which relies on RCU. Switch all arch_cpu_idle() implementations to use raw_local_irq_{en,dis}able() and carefully manage the lockdep,rcu,tracing state like we do in entry. (XXX: we really should change arch_cpu_idle() to not return with interrupts enabled) Reported-by: NSven Schnelle <svens@linux.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
-
- 23 11月, 2020 2 次提交
-
-
由 Stephen Rothwell 提交于
Using DECLARE_STATIC_KEY_FALSE needs linux/jump_table.h. Otherwise the build fails with eg: arch/powerpc/include/asm/book3s/64/kup-radix.h:66:1: warning: data definition has no type or storage class 66 | DECLARE_STATIC_KEY_FALSE(uaccess_flush_key); Fixes: 9a32a7e7 ("powerpc/64s: flush L1D after user accesses") Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> [mpe: Massage change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201123184016.693fe464@canb.auug.org.au
-
由 Dan Williams 提交于
The core-mm has a default __weak implementation of phys_to_target_node() to mirror the weak definition of memory_add_physaddr_to_nid(). That symbol is exported for modules. However, while the export in mm/memory_hotplug.c exported the symbol in the configuration cases of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=y ...and: CONFIG_NUMA_KEEP_MEMINFO=n CONFIG_MEMORY_HOTPLUG=y ...it failed to export the symbol in the case of: CONFIG_NUMA_KEEP_MEMINFO=y CONFIG_MEMORY_HOTPLUG=n Not only is that broken, but Christoph points out that the kernel should not be exporting any __weak symbol, which means that memory_add_physaddr_to_nid() example that phys_to_target_node() copied is broken too. Rework the definition of phys_to_target_node() and memory_add_physaddr_to_nid() to not require weak symbols. Move to the common arch override design-pattern of an asm header defining a symbol to replace the default implementation. The only common header that all memory_add_physaddr_to_nid() producing architectures implement is asm/sparsemem.h. In fact, powerpc already defines its memory_add_physaddr_to_nid() helper in sparsemem.h. Double-down on that observation and define phys_to_target_node() where necessary in asm/sparsemem.h. An alternate consideration that was discarded was to put this override in asm/numa.h, but that entangles with the definition of MAX_NUMNODES relative to the inclusion of linux/nodemask.h, and requires powerpc to grow a new header. The dependency on NUMA_KEEP_MEMINFO for DEV_DAX_HMEM_DEVICES is invalid now that the symbol is properly exported / stubbed in all combinations of CONFIG_NUMA_KEEP_MEMINFO and CONFIG_MEMORY_HOTPLUG. [dan.j.williams@intel.com: v4] Link: https://lkml.kernel.org/r/160461461867.1505359.5301571728749534585.stgit@dwillia2-desk3.amr.corp.intel.com [dan.j.williams@intel.com: powerpc: fix create_section_mapping compile warning] Link: https://lkml.kernel.org/r/160558386174.2948926.2740149041249041764.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a035b6bf ("mm/memory_hotplug: introduce default phys_to_target_node() implementation") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NThomas Gleixner <tglx@linutronix.de> Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NRandy Dunlap <rdunlap@infradead.org> Tested-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lkml.kernel.org/r/160447639846.1133764.7044090803980177548.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 11月, 2020 4 次提交
-
-
由 Daniel Axtens 提交于
pseries|pnv_setup_rfi_flush already does the count cache flush setup, and we just added entry and uaccess flushes. So the name is not very accurate any more. In both platforms we then also immediately setup the STF flush. Rename them to _setup_security_mitigations and fold the STF flush in. Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
In kup.h we currently include kup-radix.h for all 64-bit builds, which includes Book3S and Book3E. The latter doesn't make sense, Book3E never uses the Radix MMU. This has worked up until now, but almost by accident, and the recent uaccess flush changes introduced a build breakage on Book3E because of the bad structure of the code. So disentangle things so that we only use kup-radix.h for Book3S. This requires some more stubs in kup.h and fixing an include in syscall_64.c. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache after user accesses. This is part of the fix for CVE-2020-4788. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Nicholas Piggin 提交于
IBM Power9 processors can speculatively operate on data in the L1 cache before it has been completely validated, via a way-prediction mechanism. It is not possible for an attacker to determine the contents of impermissible memory using this method, since these systems implement a combination of hardware and software security measures to prevent scenarios where protected data could be leaked. However these measures don't address the scenario where an attacker induces the operating system to speculatively execute instructions using data that the attacker controls. This can be used for example to speculatively bypass "kernel user access prevention" techniques, as discovered by Anthony Steinhauser of Google's Safeside Project. This is not an attack by itself, but there is a possibility it could be used in conjunction with side-channels or other weaknesses in the privileged code to construct an attack. This issue can be mitigated by flushing the L1 cache between privilege boundaries of concern. This patch flushes the L1 cache on kernel entry. This is part of the fix for CVE-2020-4788. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NDaniel Axtens <dja@axtens.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 18 11月, 2020 1 次提交
-
-
由 Nicholas Piggin 提交于
Commit 2284ffea ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") removed KVM guest tests from interrupts that do not set HV=1, when PR-KVM is not configured. This is wrong for HV-KVM HPT guest MMIO emulation case which attempts to load the faulting instruction word with MSR[DR]=1 and MSR[HV]=1 with the guest MMU context loaded. This can cause host DSI, DSLB interrupts which must test for KVM guest. Restore this and add a comment. Fixes: 2284ffea ("powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201117135617.3521127-1-npiggin@gmail.com
-
- 17 11月, 2020 1 次提交
-
-
由 Michael Ellerman 提交于
Currently a build with CONFIG_E200=y will fail with: Error: invalid switch -me200 Error: unrecognized option -me200 Upstream binutils has never supported an -me200 option. Presumably it was supported at some point by either a fork or Freescale internal binutils. We can't support code that we can't even build test, so drop the addition of -me200 to the build flags, so we can at least build with CONFIG_E200=y. Reported-by: NNémeth Márton <nm127@freemail.hu> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Acked-by: NScott Wood <oss@buserror.net> Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
-
- 16 11月, 2020 3 次提交
-
-
由 Arnd Bergmann 提交于
Stefan Agner reported a bug when using zsram on 32-bit Arm machines with RAM above the 4GB address boundary: Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = a27bd01c [00000000] *pgd=236a0003, *pmd=1ffa64003 Internal error: Oops: 207 [#1] SMP ARM Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1 Hardware name: BCM2711 PC is at zs_map_object+0x94/0x338 LR is at zram_bvec_rw.constprop.0+0x330/0xa64 pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013 sp : e376bbe0 ip : 00000000 fp : c1e2921c r10: 00000002 r9 : c1dda730 r8 : 00000000 r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000 r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 30c5383d Table: 235c2a80 DAC: fffffffd Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6) Stack: (0xe376bbe0 to 0xe376c000) As it turns out, zsram needs to know the maximum memory size, which is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture. The same problem will be hit on all 32-bit architectures that have a physical address space larger than 4GB and happen to not enable sparsemem and include asm/sparsemem.h from asm/pgtable.h. After the initial discussion, I suggested just always defining MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is set, or provoking a build error otherwise. This addresses all configurations that can currently have this runtime bug, but leaves all other configurations unchanged. I looked up the possible number of bits in source code and datasheets, here is what I found: - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never support more than 32 bits, even though supersections in theory allow up to 40 bits as well. - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5 XPA supports up to 60 bits in theory, but 40 bits are more than anyone will ever ship - On PowerPC, there are three different implementations of 36 bit addressing, but 32-bit is used without CONFIG_PTE_64BIT - On RISC-V, the normal page table format can support 34 bit addressing. There is no highmem support on RISC-V, so anything above 2GB is unused, but it might be useful to eventually support CONFIG_ZRAM for high pages. Fixes: 61989a80 ("staging: zsmalloc: zsmalloc memory allocation library") Fixes: 02390b87 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS") Acked-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de> Reviewed-by: NStefan Agner <stefan@agner.ch> Tested-by: NStefan Agner <stefan@agner.ch> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
由 Cédric Le Goater 提交于
When accessing the ESB page of a source interrupt, the fault handler will retrieve the page address from the XIVE interrupt 'xive_irq_data' structure. If the associated KVM XIVE interrupt is not valid, that is not allocated at the HW level for some reason, the fault handler will dereference a NULL pointer leading to the oops below : WARNING: CPU: 40 PID: 59101 at arch/powerpc/kvm/book3s_xive_native.c:259 xive_native_esb_fault+0xe4/0x240 [kvm] CPU: 40 PID: 59101 Comm: qemu-system-ppc Kdump: loaded Tainted: G W --------- - - 4.18.0-240.el8.ppc64le #1 NIP: c00800000e949fac LR: c00000000044b164 CTR: c00800000e949ec8 REGS: c000001f69617840 TRAP: 0700 Tainted: G W --------- - - (4.18.0-240.el8.ppc64le) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 44044282 XER: 00000000 CFAR: c00000000044b160 IRQMASK: 0 GPR00: c00000000044b164 c000001f69617ac0 c00800000e96e000 c000001f69617c10 GPR04: 05faa2b21e000080 0000000000000000 0000000000000005 ffffffffffffffff GPR08: 0000000000000000 0000000000000001 0000000000000000 0000000000000001 GPR12: c00800000e949ec8 c000001ffffd3400 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c000001f5c065160 c000000001c76f90 GPR24: c000001f06f20000 c000001f5c065100 0000000000000008 c000001f0eb98c78 GPR28: c000001dcab40000 c000001dcab403d8 c000001f69617c10 0000000000000011 NIP [c00800000e949fac] xive_native_esb_fault+0xe4/0x240 [kvm] LR [c00000000044b164] __do_fault+0x64/0x220 Call Trace: [c000001f69617ac0] [0000000137a5dc20] 0x137a5dc20 (unreliable) [c000001f69617b50] [c00000000044b164] __do_fault+0x64/0x220 [c000001f69617b90] [c000000000453838] do_fault+0x218/0x930 [c000001f69617bf0] [c000000000456f50] __handle_mm_fault+0x350/0xdf0 [c000001f69617cd0] [c000000000457b1c] handle_mm_fault+0x12c/0x310 [c000001f69617d10] [c00000000007ef44] __do_page_fault+0x264/0xbb0 [c000001f69617df0] [c00000000007f8c8] do_page_fault+0x38/0xd0 [c000001f69617e30] [c00000000000a714] handle_page_fault+0x18/0x38 Instruction dump: 40c2fff0 7c2004ac 2fa90000 409e0118 73e90001 41820080 e8bd0008 7c2004ac 7ca90074 39400000 915c0000 7929d182 <0b090000> 2fa50000 419e0080 e89e0018 ---[ end trace 66c6ff034c53f64f ]--- xive-kvm: xive_native_esb_fault: accessing invalid ESB page for source 8 ! Fix that by checking the validity of the KVM XIVE interrupt structure. Fixes: 6520ca64 ("KVM: PPC: Book3S HV: XIVE: Add a mapping for the source ESB pages") Cc: stable@vger.kernel.org # v5.2+ Reported-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NCédric Le Goater <clg@kaod.org> Tested-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105134713.656160-1-clg@kaod.org
-
由 Nicholas Piggin 提交于
pseries guest kernels have a FWNMI handler for SRESET and MCE NMIs, which is basically the same as the regular handlers for those interrupts. The system reset FWNMI handler did not have a KVM guest test in it, although it probably should have because the guest can itself run guests. Commit 4f50541f ("powerpc/64s/exception: Move all interrupt handlers to new style code gen macros") convert the handler faithfully to avoid a KVM test with a "clever" trick to modify the IKVM_REAL setting to 0 when the fwnmi handler is to be generated (PPC_PSERIES=y). This worked when the KVM test was generated in the interrupt entry handlers, but a later patch moved the KVM test to the common handler, and the common handler macro is expanded below the fwnmi entry. This prevents the KVM test from being generated even for the 0x100 entry point as well. The result is NMI IPIs in the host kernel when a guest is running will use gest registers. This goes particularly badly when an HPT guest is running and the MMU is set to guest mode. Remove this trickery and just generate the test always. Fixes: 9600f261 ("powerpc/64s/exception: Move KVM test to common code") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201114114743.3306283-1-npiggin@gmail.com
-
- 10 11月, 2020 2 次提交
-
-
由 Peter Zijlstra 提交于
struct perf_sample_data lives on-stack, we should be careful about it's size. Furthermore, the pt_regs copy in there is only because x86_64 is a trainwreck, solve it differently. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NSteven Rostedt <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
-
由 Peter Zijlstra 提交于
__perf_output_begin() has an on-stack struct perf_sample_data in the unlikely case it needs to generate a LOST record. However, every call to perf_output_begin() must already have a perf_sample_data on-stack. Reported-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20201030151954.985416146@infradead.org
-
- 08 11月, 2020 1 次提交
-
-
由 Christophe Leroy 提交于
When calling early_hash_table(), the kernel hasn't been yet relocated to its linking address, so data must be addressed with relocation offset. Add relocation offset to write into Hash in early_hash_table(). Fixes: 69a1593a ("powerpc/32s: Setup the early hash table at all time.") Reported-by: NErhard Furtner <erhard_f@mailbox.org> Reported-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Tested-by: NSerge Belyshev <belyshev@depni.sinp.msu.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9e225a856a8b22e0e77587ee22ab7a2f5bca8753.1604740029.git.christophe.leroy@csgroup.eu
-
- 06 11月, 2020 1 次提交
-
-
由 Scott Cheloha 提交于
Add a non-NUMA definition for of_drconf_to_nid_single() to topology.h so we have one even if powerpc/mm/numa.c is not compiled. On a non-NUMA kernel the appropriate node id is always first_online_node. Fixes: 72cdd117 ("pseries/hotplug-memory: hot-add: skip redundant LMB lookup") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NScott Cheloha <cheloha@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201105223040.3612663-1-cheloha@linux.ibm.com
-
- 05 11月, 2020 5 次提交
-
-
由 Christophe Leroy 提交于
When _PAGE_ACCESSED is not set, a minor fault is expected. To do this, TLB miss exception ANDs _PAGE_PRESENT and _PAGE_ACCESSED into the L2 entry valid bit. To simplify the processing and reduce the number of instructions in TLB miss exceptions, manage it as an APG bit and get it next to _PAGE_GUARDED bit to allow a copy in one go. Then declare the corresponding groups as handling all accesses as user accesses. As the PP bits always define user as No Access, it will generate a fault. Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/80f488db230c6b0e7b3b990d72bd94a8a069e93e.1602492856.git.christophe.leroy@csgroup.eu
-
由 Christophe Leroy 提交于
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. This adds at least 3 instructions to the TLB miss exception handlers fast path. Following patch will reduce this overhead. Also update the rotation instruction to the correct number of bits to reflect all changes done to _PAGE_ACCESSED over time. Fixes: d069cb43 ("powerpc/8xx: Don't touch ACCESSED when no SWAP.") Fixes: 5f356497 ("powerpc/8xx: remove unused _PAGE_WRITETHRU") Fixes: e0a8e0d9 ("powerpc/8xx: Handle PAGE_USER via APG bits") Fixes: 5b2753fc ("powerpc/8xx: Implementation of PAGE_EXEC") Fixes: a891c43b ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.") Cc: stable@vger.kernel.org Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.1602492856.git.christophe.leroy@csgroup.eu
-
由 Christophe Leroy 提交于
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 2c74e258 ("powerpc/40x: Rework 40x PTE access and TLB miss") Cc: stable@vger.kernel.org Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b02ca2ed2d3676a096219b48c0f69ec982a75bcf.1602342801.git.christophe.leroy@csgroup.eu
-
由 Christophe Leroy 提交于
The kernel expects pte_young() to work regardless of CONFIG_SWAP. Make sure a minor fault is taken to set _PAGE_ACCESSED when it is not already set, regardless of the selection of CONFIG_SWAP. Fixes: 84de6ab0 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") Cc: stable@vger.kernel.org Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a44367744de54e2315b2f1a8cbbd7f88488072e0.1602342806.git.christophe.leroy@csgroup.eu
-
由 Michael Ellerman 提交于
Andreas reported that commit ee0a49a6 ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") broke CLONE_CHILD_SETTID. Further inspection showed that the put_user() in schedule_tail() was missing entirely, the store not emitted by the compiler. <.schedule_tail>: mflr r0 std r0,16(r1) stdu r1,-112(r1) bl <.finish_task_switch> ld r9,2496(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0x60> ld r3,392(r13) ld r9,1392(r3) cmpdi cr7,r9,0 beq cr7,<.schedule_tail+0x3c> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop bl <.calculate_sigpending> nop addi r1,r1,112 ld r0,16(r1) mtlr r0 blr nop nop nop bl <.__balance_callback> b <.schedule_tail+0x1c> Notice there are no stores other than to the stack. There should be a stw in there for the store to current->set_child_tid. This is only seen with GCC 4.9 era compilers (tested with 4.9.3 and 4.9.4), and only when CONFIG_PPC_KUAP is disabled. When CONFIG_PPC_KUAP=y, the inline asm that's part of the isync() and mtspr() inlined via allow_user_access() seems to be enough to avoid the bug. We already have a macro to work around this (or a similar bug), called asm_volatile_goto which includes an empty asm block to tickle the compiler into generating the right code. So use that. With this applied the code generation looks more like it will work: <.schedule_tail>: mflr r0 std r31,-8(r1) std r0,16(r1) stdu r1,-144(r1) std r3,112(r1) bl <._mcount> nop ld r3,112(r1) bl <.finish_task_switch> ld r9,2624(r3) cmpdi cr7,r9,0 bne cr7,<.schedule_tail+0xa0> ld r3,2408(r13) ld r31,1856(r3) cmpdi cr7,r31,0 beq cr7,<.schedule_tail+0x80> li r4,0 li r5,0 bl <.__task_pid_nr_ns> nop li r9,-1 clrldi r9,r9,12 cmpld cr7,r31,r9 bgt cr7,<.schedule_tail+0x80> lis r9,16 rldicr r9,r9,32,31 subf r9,r31,r9 cmpldi cr7,r9,3 ble cr7,<.schedule_tail+0x80> li r9,0 stw r3,0(r31) <-- stw nop bl <.calculate_sigpending> nop addi r1,r1,144 ld r0,16(r1) ld r31,-8(r1) mtlr r0 blr nop bl <.__balance_callback> b <.schedule_tail+0x30> Fixes: ee0a49a6 ("powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()") Reported-by: NAndreas Schwab <schwab@linux-m68k.org> Tested-by: NAndreas Schwab <schwab@linux-m68k.org> Suggested-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201104111742.672142-1-mpe@ellerman.id.au
-
- 02 11月, 2020 2 次提交
-
-
由 Qian Cai 提交于
The call to rcu_cpu_starting() in start_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows (with CONFIG_PROVE_RCU_LIST=y): WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: dump_stack+0xec/0x144 (unreliable) lockdep_rcu_suspicious+0x128/0x14c __lock_acquire+0x1060/0x1c60 lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 clockevents_register_device+0x74/0x270 register_decrementer_clockevent+0x94/0x110 start_secondary+0x134/0x800 start_secondary_prolog+0x10/0x14 This is avoided by adding a call to rcu_cpu_starting() near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. It's safe to call rcu_cpu_starting() in the arch code as well as later in generic code, as explained by Paul: It uses a per-CPU variable so that RCU pays attention only to the first call to rcu_cpu_starting() if there is more than one of them. This is even intentional, due to there being a generic arch-independent call to rcu_cpu_starting() in notify_cpu_starting(). So multiple calls to rcu_cpu_starting() are fine by design. Fixes: 4d004099 ("lockdep: Fix lockdep recursion") Signed-off-by: NQian Cai <cai@redhat.com> Acked-by: NPaul E. McKenney <paulmck@kernel.org> [mpe: Add Fixes tag, reword slightly & expand change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028182334.13466-1-cai@redhat.com
-
由 Qian Cai 提交于
Lockdep complains that a possible deadlock below in eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQ disabled as well. CPU0 CPU1 ---- ---- lock(&pci_io_addr_cache_root.piar_lock); local_irq_disable(); lock(&tp->lock); lock(&pci_io_addr_cache_root.piar_lock); <Interrupt> lock(&tp->lock); *** DEADLOCK *** lock_acquire+0x140/0x5f0 _raw_spin_lock_irqsave+0x64/0xb0 eeh_addr_cache_insert_dev+0x48/0x390 eeh_probe_device+0xb8/0x1a0 pnv_pcibios_bus_add_device+0x3c/0x80 pcibios_bus_add_device+0x118/0x290 pci_bus_add_device+0x28/0xe0 pci_bus_add_devices+0x54/0xb0 pcibios_init+0xc4/0x124 do_one_initcall+0xac/0x528 kernel_init_freeable+0x35c/0x3fc kernel_init+0x24/0x148 ret_from_kernel_thread+0x5c/0x80 lock_acquire+0x140/0x5f0 _raw_spin_lock+0x4c/0x70 eeh_addr_cache_show+0x38/0x110 seq_read+0x1a0/0x660 vfs_read+0xc8/0x1f0 ksys_read+0x74/0x130 system_call_exception+0xf8/0x1d0 system_call_common+0xe8/0x218 Fixes: 5ca85ae6 ("powerpc/eeh_cache: Add a way to dump the EEH address cache") Signed-off-by: NQian Cai <cai@redhat.com> Reviewed-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028152717.8967-1-cai@redhat.com
-
- 26 10月, 2020 1 次提交
-
-
由 Joe Perches 提交于
Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.plSigned-off-by: NJoe Perches <joe@perches.com> Reviewed-by: NNick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: NMiguel Ojeda <ojeda@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 10月, 2020 4 次提交
-
-
由 Ganesh Goudar 提交于
When an UE or memory error exception is encountered the MCE handler tries to find the pfn using addr_to_pfn() which takes effective address as an argument, later pfn is used to poison the page where memory error occurred, recent rework in this area made addr_to_pfn to run in real mode, which can be fatal as it may try to access memory outside RMO region. Have two helper functions to separate things to be done in real mode and virtual mode without changing any functionality. This also fixes the following error as the use of addr_to_pfn is now moved to virtual mode. Without this change following kernel crash is seen on hitting UE. [ 485.128036] Oops: Kernel access of bad area, sig: 11 [#1] [ 485.128040] LE SMP NR_CPUS=2048 NUMA pSeries [ 485.128047] Modules linked in: [ 485.128067] CPU: 15 PID: 6536 Comm: insmod Kdump: loaded Tainted: G OE 5.7.0 #22 [ 485.128074] NIP: c00000000009b24c LR: c0000000000398d8 CTR: c000000000cd57c0 [ 485.128078] REGS: c000000003f1f970 TRAP: 0300 Tainted: G OE (5.7.0) [ 485.128082] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 28008284 XER: 00000001 [ 485.128088] CFAR: c00000000009b190 DAR: c0000001fab00000 DSISR: 40000000 IRQMASK: 1 [ 485.128088] GPR00: 0000000000000001 c000000003f1fbf0 c000000001634300 0000b0fa01000000 [ 485.128088] GPR04: d000000002220000 0000000000000000 00000000fab00000 0000000000000022 [ 485.128088] GPR08: c0000001fab00000 0000000000000000 c0000001fab00000 c000000003f1fc14 [ 485.128088] GPR12: 0000000000000008 c000000003ff5880 d000000002100008 0000000000000000 [ 485.128088] GPR16: 000000000000ff20 000000000000fff1 000000000000fff2 d0000000021a1100 [ 485.128088] GPR20: d000000002200000 c00000015c893c50 c000000000d49b28 c00000015c893c50 [ 485.128088] GPR24: d0000000021a0d08 c0000000014e5da8 d0000000021a0818 000000000000000a [ 485.128088] GPR28: 0000000000000008 000000000000000a c0000000017e2970 000000000000000a [ 485.128125] NIP [c00000000009b24c] __find_linux_pte+0x11c/0x310 [ 485.128130] LR [c0000000000398d8] addr_to_pfn+0x138/0x170 [ 485.128133] Call Trace: [ 485.128135] Instruction dump: [ 485.128138] 3929ffff 7d4a3378 7c883c36 7d2907b4 794a1564 7d294038 794af082 3900ffff [ 485.128144] 79291f24 790af00e 78e70020 7d095214 <7c69502a> 2fa30000 419e011c 70690040 [ 485.128152] ---[ end trace d34b27e29ae0e340 ]--- Fixes: 9ca766f9 ("powerpc/64s/pseries: machine check convert to use common event code") Signed-off-by: NGanesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200724063946.21378-1-ganeshgr@linux.ibm.com
-
由 Christophe Leroy 提交于
GCC 4.9 sometimes fails to build with "m<>" constraint in inline assembly. CC lib/iov_iter.o In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0, from ./arch/powerpc/include/asm/atomic.h:11, from ./include/linux/atomic.h:7, from ./include/linux/crypto.h:15, from ./include/crypto/hash.h:11, from lib/iov_iter.c:2: lib/iov_iter.c: In function 'iovec_from_user.part.30': ./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints __asm__ __volatile__( \ ^ ./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ ./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap' #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) ^ ./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm' case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \ ^ ./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed' __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \ ^ ./arch/powerpc/include/asm/uaccess.h:100:2: note: in expansion of macro '__get_user_nocheck' __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false) ^ ./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed' #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e) ^ lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user' unsafe_get_user(len, &uiov[i].iov_len, uaccess_end); ^ make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1 Define a UPD_CONSTR macro that is "<>" by default and only "" with GCC prior to GCC 5. Fixes: fcf1f268 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()") Fixes: 2f279eeb ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()") Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Acked-by: NSegher Boessenkool <segher@kernel.crashing.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/212d3bc4a52ca71523759517bb9c61f7e477c46a.1603179582.git.christophe.leroy@csgroup.eu
-
由 Oliver O'Halloran 提交于
In commit 269e5833 ("powerpc/eeh: Delete eeh_pe->config_addr") the following simplification was made: - if (!pe->addr && !pe->config_addr) { + if (!pe->addr) { eeh_stats.no_cfg_addr++; return 0; } This introduced a bug which causes EEH checking to be skipped for devices in PE#0. Before the change above the check would always pass since at least one of the two PE addresses would be non-zero in all circumstances. On PowerNV pe->config_addr would be the BDFN of the first device added to the PE. The zero BDFN is reserved for the PHB's root port, but this is fine since for obscure platform reasons the root port is never assigned to PE#0. Similarly, on pseries pe->addr has always been non-zero for the reasons outlined in commit 42de19d5 ("powerpc/pseries/eeh: Allow zero to be a valid PE configuration address"). We can fix the problem by deleting the block entirely The original purpose of this test was to avoid performing EEH checks on devices that were not on an EEH capable bus. In modern Linux the edev->pe pointer will be NULL for devices that are not on an EEH capable bus. The code block immediately above this one already checks for the edev->pe == NULL case so this test (new and old) is entirely redundant. Ideally we'd delete eeh_stats.no_cfg_addr too since nothing increments it any more. Unfortunately, that information is exposed via /proc/powerpc/eeh which means it's technically ABI. We could make it hard-coded, but that's a change for another patch. Fixes: 269e5833 ("powerpc/eeh: Delete eeh_pe->config_addr") Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201021232554.1434687-1-oohall@gmail.com
-
由 Joe Perches 提交于
This should be const, so make it so. Signed-off-by: NJoe Perches <joe@perches.com> Message-Id: <d130e88dd4c82a12d979da747cc0365c72c3ba15.1601770305.git.joe@perches.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 10月, 2020 2 次提交
-
-
由 Jordan Niethe 提交于
ISA v3.1 removes transactional memory and hence it should not be present in cpu_features or cpu_user_features2. Remove CPU_FTR_TM_COMP from CPU_FTRS_POWER10. Remove PPC_FEATURE2_HTM_COMP and PPC_FEATURE2_HTM_NOSC_COMP from COMMON_USER2_POWER10. Fixes: a3ea40d5 ("powerpc: Add POWER10 architected mode") Signed-off-by: NJordan Niethe <jniethe5@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200827035529.900-1-jniethe5@gmail.com
-
由 Michael Neuling 提交于
__get_user_atomic_128_aligned() stores to kaddr using stvx which is a VMX store instruction, hence kaddr must be 16 byte aligned otherwise the store won't occur as expected. Unfortunately when we call __get_user_atomic_128_aligned() in p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't guaranteed to be 16B aligned. This means that the write to vbuf in __get_user_atomic_128_aligned() has the bottom bits of the address truncated. This results in other local variables being overwritten. Also vbuf will not contain the correct data which results in the userspace emulation being wrong and hence undetected user data corruption. In the past we've been mostly lucky as vbuf has ended up aligned but this is fragile and isn't always true. CONFIG_STACKPROTECTOR in particular can change the stack arrangement enough that our luck runs out. This issue only occurs on POWER9 Nimbus <= DD2.1 bare metal. The fix is to align vbuf to a 16 byte boundary. Fixes: 5080332c ("powerpc/64s: Add workaround for P9 vector CI load issue") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201013043741.743413-1-mikey@neuling.org
-
- 19 10月, 2020 4 次提交
-
-
由 Vasant Hegde 提交于
Even though we use self removing sysfs helper, we still need to make sure we do the final kobject delete conditionally. sysfs_remove_file_self() will handle parallel calls to remove the sysfs attribute file and returns true only in the caller that removed the attribute file. The other parallel callers are returned false. Do the final kobject delete checking the return value of sysfs_remove_file_self(). Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201017164236.264713-1-hegdevasant@linux.vnet.ibm.com
-
由 Vasant Hegde 提交于
Every dump reported by OPAL is exported to userspace through a sysfs interface and notified using kobject_uevent(). The userspace daemon (opal_errd) then reads the dump and acknowledges that the dump is saved safely to disk. Once acknowledged the kernel removes the respective sysfs file entry causing respective resources to be released including kobject. However it's possible the userspace daemon may already be scanning dump entries when a new sysfs dump entry is created by the kernel. User daemon may read this new entry and ack it even before kernel can notify userspace about it through kobject_uevent() call. If that happens then we have a potential race between dump_ack_store->kobject_put() and kobject_uevent which can lead to use-after-free of a kernfs object resulting in a kernel crash. This patch fixes this race by protecting the sysfs file creation/notification by holding a reference count on kobject until we safely send kobject_uevent(). The function create_dump_obj() returns the dump object which if used by caller function will end up in use-after-free problem again. However, the return value of create_dump_obj() function isn't being used today and there is no need as well. Hence change it to return void to make this fix complete. Fixes: c7e64b9c ("powerpc/powernv Platform dump interface") Signed-off-by: NVasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201017164210.264619-1-hegdevasant@linux.vnet.ibm.com
-
由 Srikar Dronamraju 提交于
Qian Cai reported a regression where CPU Hotplug fails with the latest powerpc/next BUG: sleeping function called from invalid context at mm/slab.h:494 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88 no locks held by swapper/88/0. irq event stamp: 18074448 hardirqs last enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110 hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0 do_idle at kernel/sched/idle.c:253 (discriminator 1) softirqs last enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0 softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0 CPU: 88 PID: 0 Comm: swapper/88 Tainted: G W 5.9.0-rc8-next-20201007 #1 Call Trace: [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable) [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310 [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190 [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0 slab_alloc_node at mm/slub.c:2817 (inlined by) __kmalloc_node at mm/slub.c:4013 [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80 kmalloc_node at include/linux/slab.h:577 (inlined by) alloc_cpumask_var_node at lib/cpumask.c:116 [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800 update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267 (inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387 (inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420 [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14 Allocating a temporary mask while performing a CPU Hotplug operation with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable function from a atomic context. Fix this by allocating the temporary mask with GFP_ATOMIC flag. Also instead of having to allocate twice, allocate the mask in the caller so that we only have to allocate once. If the allocation fails, assume the mask to be same as sibling mask, which will make the scheduler to drop this domain for this CPU. Fixes: 70a94089 ("powerpc/smp: Optimize update_coregroup_mask") Fixes: 3ab33d6d ("powerpc/smp: Optimize update_mask_by_l2") Reported-by: NQian Cai <cai@redhat.com> Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201019042716.106234-3-srikar@linux.vnet.ibm.com
-
由 Srikar Dronamraju 提交于
Commit 3ab33d6d ("powerpc/smp: Optimize update_mask_by_l2") introduced submask_fn in update_mask_by_l2 to track the right submask. However commit f6606cfd ("powerpc/smp: Dont assume l2-cache to be superset of sibling") introduced sibling_mask in update_mask_by_l2 to track the same submask. Remove sibling_mask in favour of submask_fn. Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201019042716.106234-2-srikar@linux.vnet.ibm.com
-