- 23 8月, 2019 2 次提交
-
-
由 Suraj Jitindar Singh 提交于
The rmap array in the guest memslot is an array of size number of guest pages, allocated at memslot creation time. Each rmap entry in this array is used to store information about the guest page to which it corresponds. For example for a hpt guest it is used to store a lock bit, rc bits, a present bit and the index of a hpt entry in the guest hpt which maps this page. For a radix guest which is running nested guests it is used to store a pointer to a linked list of nested rmap entries which store the nested guest physical address which maps this guest address and for which there is a pte in the shadow page table. As there are currently two uses for the rmap array, and the potential for this to expand to more in the future, define a type field (being the top 8 bits of the rmap entry) to be used to define the type of the rmap entry which is currently present and define two values for this field for the two current uses of the rmap array. Since the nested case uses the rmap entry to store a pointer, define this type as having the two high bits set as is expected for a pointer. Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering). Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Menzel 提交于
Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging it as an expected fall-through. arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’: arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=] pte->may_write = true; ~~~~~~~~~~~~~~~^~~~~~ arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here case 3: ^~~~ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 16 8月, 2019 4 次提交
-
-
由 Paul Mackerras 提交于
Testing has revealed the existence of a race condition where a XIVE interrupt being shut down can be in one of the XIVE interrupt queues (of which there are up to 8 per CPU, one for each priority) at the point where free_irq() is called. If this happens, can return an interrupt number which has been shut down. This can lead to various symptoms: - irq_to_desc(irq) can be NULL. In this case, no end-of-interrupt function gets called, resulting in the CPU's elevated interrupt priority (numerically lowered CPPR) never gets reset. That then means that the CPU stops processing interrupts, causing device timeouts and other errors in various device drivers. - The irq descriptor or related data structures can be in the process of being freed as the interrupt code is using them. This typically leads to crashes due to bad pointer dereferences. This race is basically what commit 62e04686 ("genirq: Add optional hardware synchronization for shutdown", 2019-06-28) is intended to fix, given a get_irqchip_state() method for the interrupt controller being used. It works by polling the interrupt controller when an interrupt is being freed until the controller says it is not pending. With XIVE, the PQ bits of the interrupt source indicate the state of the interrupt source, and in particular the P bit goes from 0 to 1 at the point where the hardware writes an entry into the interrupt queue that this interrupt is directed towards. Normally, the code will then process the interrupt and do an end-of-interrupt (EOI) operation which will reset PQ to 00 (assuming another interrupt hasn't been generated in the meantime). However, there are situations where the code resets P even though a queue entry exists (for example, by setting PQ to 01, which disables the interrupt source), and also situations where the code leaves P at 1 after removing the queue entry (for example, this is done for escalation interrupts so they cannot fire again until they are explicitly re-enabled). The code already has a 'saved_p' flag for the interrupt source which indicates that a queue entry exists, although it isn't maintained consistently. This patch adds a 'stale_p' flag to indicate that P has been left at 1 after processing a queue entry, and adds code to set and clear saved_p and stale_p as necessary to maintain a consistent indication of whether a queue entry may or may not exist. With this, we can implement xive_get_irqchip_state() by looking at stale_p, saved_p and the ESB PQ bits for the interrupt. There is some additional code to handle escalation interrupts properly; because they are enabled and disabled in KVM assembly code, which does not have access to the xive_irq_data struct for the escalation interrupt. Hence, stale_p may be incorrect when the escalation interrupt is freed in kvmppc_xive_{,native_}cleanup_vcpu(). Fortunately, we can fix it up by looking at vcpu->arch.xive_esc_on, with some careful attention to barriers in order to ensure the correct result if xive_esc_irq() races with kvmppc_xive_cleanup_vcpu(). Finally, this adds code to make noise on the console (pr_crit and WARN_ON(1)) if we find an interrupt queue entry for an interrupt which does not have a descriptor. While this won't catch the race reliably, if it does get triggered it will be an indication that the race is occurring and needs to be debugged. Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813100648.GE9567@blackberry
-
由 Paul Mackerras 提交于
At present, when running a guest on POWER9 using HV KVM but not using an in-kernel interrupt controller (XICS or XIVE), for example if QEMU is run with the kernel_irqchip=off option, the guest entry code goes ahead and tries to load the guest context into the XIVE hardware, even though no context has been set up. To fix this, we check that the "CAM word" is non-zero before pushing it to the hardware. The CAM word is initialized to a non-zero value in kvmppc_xive_connect_vcpu() and kvmppc_xive_native_connect_vcpu(), and is now cleared in kvmppc_xive_{,native_}cleanup_vcpu. Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Reported-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813100100.GC9567@blackberry
-
由 Paul Mackerras 提交于
Escalation interrupts are interrupts sent to the host by the XIVE hardware when it has an interrupt to deliver to a guest VCPU but that VCPU is not running anywhere in the system. Hence we disable the escalation interrupt for the VCPU being run when we enter the guest and re-enable it when the guest does an H_CEDE hypercall indicating it is idle. It is possible that an escalation interrupt gets generated just as we are entering the guest. In that case the escalation interrupt may be using a queue entry in one of the interrupt queues, and that queue entry may not have been processed when the guest exits with an H_CEDE. The existing entry code detects this situation and does not clear the vcpu->arch.xive_esc_on flag as an indication that there is a pending queue entry (if the queue entry gets processed, xive_esc_irq() will clear the flag). There is a comment in the code saying that if the flag is still set on H_CEDE, we have to abort the cede rather than re-enabling the escalation interrupt, lest we end up with two occurrences of the escalation interrupt in the interrupt queue. However, the exit code doesn't do that; it aborts the cede in the sense that vcpu->arch.ceded gets cleared, but it still enables the escalation interrupt by setting the source's PQ bits to 00. Instead we need to set the PQ bits to 10, indicating that an interrupt has been triggered. We also need to avoid setting vcpu->arch.xive_esc_on in this case (i.e. vcpu->arch.xive_esc_on seen to be set on H_CEDE) because xive_esc_irq() will run at some point and clear it, and if we race with that we may end up with an incorrect result (i.e. xive_esc_on set when the escalation interrupt has just been handled). It is extremely unlikely that having two queue entries would cause observable problems; theoretically it could cause queue overflow, but the CPU would have to have thousands of interrupts targetted to it for that to be possible. However, this fix will also make it possible to determine accurately whether there is an unhandled escalation interrupt in the queue, which will be needed by the following patch. Fixes: 9b9b13a6 ("KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190813100349.GD9567@blackberry
-
由 Cédric Le Goater 提交于
When a vCPU is brought done, the XIVE VP (Virtual Processor) is first disabled and then the event notification queues are freed. When freeing the queues, we check for possible escalation interrupts and free them also. But when a XIVE VP is disabled, the underlying XIVE ENDs also are disabled in OPAL. When an END (Event Notification Descriptor) is disabled, its ESB pages (ESn and ESe) are disabled and loads return all 1s. Which means that any access on the ESB page of the escalation interrupt will return invalid values. When an interrupt is freed, the shutdown handler computes a 'saved_p' field from the value returned by a load in xive_do_source_set_mask(). This value is incorrect for escalation interrupts for the reason described above. This has no impact on Linux/KVM today because we don't make use of it but we will introduce in future changes a xive_get_irqchip_state() handler. This handler will use the 'saved_p' field to return the state of an interrupt and 'saved_p' being incorrect, softlockup will occur. Fix the vCPU cleanup sequence by first freeing the escalation interrupts if any, then disable the XIVE VP and last free the queues. Fixes: 90c73795 ("KVM: PPC: Book3S HV: Add a new KVM device for the XIVE native exploitation mode") Fixes: 5af50993 ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190806172538.5087-1-clg@kaod.org
-
- 25 7月, 2019 1 次提交
-
-
由 Masahiro Yamada 提交于
UAPI headers licensed under GPL are supposed to have exception "WITH Linux-syscall-note" so that they can be included into non-GPL user space application code. The exception note is missing in some UAPI headers. Some of them slipped in by the treewide conversion commit b2441318 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license"). Just run: $ git show --oneline b2441318 -- arch/x86/include/uapi/asm/ I believe they are not intentional, and should be fixed too. This patch was generated by the following script: git grep -l --not -e Linux-syscall-note --and -e SPDX-License-Identifier \ -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild | while read file do sed -i -e '/[[:space:]]OR[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \ -e '/[[:space:]]or[[:space:]]/s/\(GPL-[^[:space:]]*\)/(\1 WITH Linux-syscall-note)/g' \ -e '/[[:space:]]OR[[:space:]]/!{/[[:space:]]or[[:space:]]/!s/\(GPL-[^[:space:]]*\)/\1 WITH Linux-syscall-note/g}' $file done After this patch is applied, there are 5 UAPI headers that do not contain "WITH Linux-syscall-note". They are kept untouched since this exception applies only to GPL variants. $ git grep --not -e Linux-syscall-note --and -e SPDX-License-Identifier \ -- :arch/*/include/uapi/asm/*.h :include/uapi/ :^*/Kbuild include/uapi/drm/panfrost_drm.h:/* SPDX-License-Identifier: MIT */ include/uapi/linux/batman_adv.h:/* SPDX-License-Identifier: MIT */ include/uapi/linux/qemu_fw_cfg.h:/* SPDX-License-Identifier: BSD-3-Clause */ include/uapi/linux/vbox_err.h:/* SPDX-License-Identifier: MIT */ include/uapi/linux/virtio_iommu.h:/* SPDX-License-Identifier: BSD-3-Clause */ Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 7月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 7月, 2019 4 次提交
-
-
由 Vaibhav Jain 提交于
In some cases initial bind of scm memory for an lpar can fail if previously it wasn't released using a scm-unbind hcall. This situation can arise due to panic of the previous kernel or forced lpar fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error. To mitigate such cases the patch updates papr_scm_probe() to force a call to drc_pmem_unbind() in case the initial bind of scm memory fails with EBUSY error. In case scm-bind operation again fails after the forced scm-unbind then we follow the existing error path. We also update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp and indicate it as a EBUSY error back to the caller. Suggested-by: N"Oliver O'Halloran" <oohall@gmail.com> Signed-off-by: NVaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-4-vaibhav@linux.ibm.com
-
由 Vaibhav Jain 提交于
The new hcall named H_SCM_UNBIND_ALL has been introduce that can unbind all or specific scm memory assigned to an lpar. This is more efficient than using H_SCM_UNBIND_MEM as currently we don't support partial unbind of scm memory. Hence this patch proposes following changes to drc_pmem_unbind(): * Update drc_pmem_unbind() to replace hcall H_SCM_UNBIND_MEM to H_SCM_UNBIND_ALL. * Update drc_pmem_unbind() to handles cases when PHYP asks the guest kernel to wait for specific amount of time before retrying the hcall via the 'LONG_BUSY' return value. * Ensure appropriate error code is returned back from the function in case of an error. Reviewed-by: NOliver O'Halloran <oohall@gmail.com> Signed-off-by: NVaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-3-vaibhav@linux.ibm.com
-
由 Vaibhav Jain 提交于
Update the hvcalls.h to include op-codes for new hcalls introduce to manage SCM memory. Also update existing hcall definitions to reflect current papr specification for SCM. The removed hcall op-codes H_SCM_MEM_QUERY, H_SCM_BLOCK_CLEAR were transient proposals and there support was never implemented by Power-VM nor they were used anywhere in Linux kernel. Hence we don't expect anyone to be impacted by this change. Signed-off-by: NVaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190629160610.23402-2-vaibhav@linux.ibm.com
-
由 Michael Neuling 提交于
On systems like P9 powernv where we have no TM (or P8 booted with ppc_tm=off), userspace can construct a signal context which still has the MSR TS bits set. The kernel tries to restore this context which results in the following crash: Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033 Oops: Unrecoverable exception, sig: 6 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ff #69 NIP: c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000 REGS: c00000003fffbd70 TRAP: 0700 Not tainted (5.2.0-11045-g7142b497d8) MSR: 8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]> CR: 42004242 XER: 00000000 CFAR: c0000000000022e0 IRQMASK: 0 GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669 GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000 GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420 GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000 GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000 GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728 NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80 LR [00007fffb2d67e48] 0x7fffb2d67e48 Call Trace: Instruction dump: e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00 e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18 The problem is the signal code assumes TM is enabled when CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as with P9 powernv or if `ppc_tm=off` is used on P8. This means any local user can crash the system. Fix the problem by returning a bad stack frame to the user if they try to set the MSR TS bits with sigreturn() on systems where TM is not supported. Found with sigfuz kernel selftest on P9. This fixes CVE-2019-13648. Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9 Reported-by: NPraveen Pandey <Praveen.Pandey@in.ibm.com> Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.org
-
- 19 7月, 2019 4 次提交
-
-
由 Shawn Anastasio 提交于
The refactor of powerpc DMA functions in commit 6666cc17 ("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly changes the way DMA mappings are handled on powerpc. Since this change, all mapped pages are marked as cache-inhibited through the default implementation of arch_dma_mmap_pgprot. This differs from the previous behavior of only marking pages in noncoherent mappings as cache-inhibited and has resulted in sporadic system crashes in certain hardware configurations and workloads (see Bugzilla). This commit restores the previous correct behavior by providing an implementation of arch_dma_mmap_pgprot that only marks pages in noncoherent mappings as cache-inhibited. As this behavior should be universal for all powerpc platforms a new file, dma-generic.c, was created to store it. Fixes: 6666cc17 ("powerpc/dma: remove dma_nommu_mmap_coherent") # NOTE: fixes commit 6666cc17 released in v5.1. # Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ # NOTE: fixes commit 6666cc17 released in v5.1. # Consider a stable tag: # Cc: stable@vger.kernel.org # v5.1+ Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: NShawn Anastasio <shawn@anastas.io> Reviewed-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
-
由 Cédric Le Goater 提交于
The XIVE device structure is now allocated in kvmppc_xive_get_device() and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create() will result in a double free and corrupt the host memory. Fixes: 5422e951 ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: NCédric Le Goater <clg@kaod.org> Tested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6ea6998b-a890-2511-01d1-747d7621eb19@kaod.org
-
由 David Hildenbrand 提交于
walk_memory_range() was once used to iterate over sections. Now, it iterates over memory blocks. Rename the function, fixup the documentation. Also, pass start+size instead of PFNs, which is what most callers already have at hand. (we'll rework link_mem_sections() most probably soon) Follow-up patches will rework, simplify, and move walk_memory_blocks() to drivers/base/memory.c. Note: walk_memory_blocks() only works correctly right now if the start_pfn is aligned to a section start. This is the case right now, but we'll generalize the function in a follow up patch so the semantics match the documentation. [akpm@linux-foundation.org: remove unused variable] Link: http://lkml.kernel.org/r/20190614100114.311-5-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Rashmica Gupta <rashmica.g@gmail.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Arun KS <arunks@codeaurora.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
We want to improve error handling while adding memory by allowing to use arch_remove_memory() and __remove_pages() even if CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like: arch_add_memory() rc = do_something(); if (rc) { arch_remove_memory(); } We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require quite some dependencies for memory offlining. Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Mark Brown <broonie@kernel.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Rob Herring <robh@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "mike.travis@hpe.com" <mike.travis@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Qian Cai <cai@lca.pw> Cc: Mathieu Malaterre <malat@debian.org> Cc: Baoquan He <bhe@redhat.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jun Yao <yaojun8558363@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 7月, 2019 1 次提交
-
-
由 Gautham R. Shenoy 提交于
xive_find_target_in_mask() has the following for(;;) loop which has a bug when @first == cpumask_first(@mask) and condition 1 fails to hold for every CPU in @mask. In this case we loop forever in the for-loop. first = cpu; for (;;) { if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1 return cpu; cpu = cpumask_next(cpu, mask); if (cpu == first) // condition 2 break; if (cpu >= nr_cpu_ids) // condition 3 cpu = cpumask_first(mask); } This is because, when @first == cpumask_first(@mask), we never hit the condition 2 (cpu == first) since prior to this check, we would have executed "cpu = cpumask_next(cpu, mask)" which will set the value of @cpu to a value greater than @first or to nr_cpus_ids. When this is coupled with the fact that condition 1 is not met, we will never exit this loop. This was discovered by the hard-lockup detector while running LTP test concurrently with SMT switch tests. watchdog: CPU 12 detected hard LOCKUP on other CPUs 68 watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago) watchdog: CPU 68 Hard LOCKUP watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago) CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1 NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000 REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le) MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000 CFAR: c0000000006f558c IRQMASK: 1 GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18 GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8 GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8 GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001 GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18 GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0 GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f NIP [c0000000006f5578] find_next_bit+0x38/0x90 LR [c000000000cba9ec] cpumask_next+0x2c/0x50 Call Trace: [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable) [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240 [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0 [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260 [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0 [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0 [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90 [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220 [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x] [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x] [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0 [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50 [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0 [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0 [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70 [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160 [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70 Instruction dump: 78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039 4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040 To fix this, move the check for condition 2 after the check for condition 3, so that we are able to break out of the loop soon after iterating through all the CPUs in the @mask in the problem case. Use do..while() to achieve this. Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Reported-by: NIndira P. Joga <indira.priya@in.ibm.com> Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.com
-
- 17 7月, 2019 7 次提交
-
-
由 Mauro Carvalho Chehab 提交于
Convert docs to ReST and add them to the arch-specific book. The conversion here was trivial, as almost every file there was already using an elegant format close to ReST standard. The changes were mostly to mark literal blocks and add a few missing section title identifiers. One note with regards to "--": on Sphinx, this can't be used to identify a list, as it will format it badly. This can be used, however, to identify a long hyphen - and "---" is an even longer one. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
-
由 Daniel Jordan 提交于
locked_vm accounting is done roughly the same way in five places, so unify them in a helper. Include the helper's caller in the debug print to distinguish between callsites. Error codes stay the same, so user-visible behavior does too. The one exception is that the -EPERM case in tce_account_locked_vm is removed because Alexey has never seen it triggered. [daniel.m.jordan@oracle.com: v3] Link: http://lkml.kernel.org/r/20190529205019.20927-1-daniel.m.jordan@oracle.com [sfr@canb.auug.org.au: fix mm/util.c] Link: http://lkml.kernel.org/r/20190524175045.26897-1-daniel.m.jordan@oracle.comSigned-off-by: NDaniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NAlex Williamson <alex.williamson@redhat.com> Cc: Alan Tull <atull@kernel.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Lameter <cl@linux.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Moritz Fischer <mdf@kernel.org> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Steve Sistare <steven.sistare@oracle.com> Cc: Wu Hao <hao.wu@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robin Murphy 提交于
ARCH_HAS_ZONE_DEVICE is somewhat meaningless in itself, and combined with the long-out-of-date comment can lead to the impression than an architecture may just enable it (since __add_pages() now "comprehends device memory" for itself) and expect things to work. In practice, however, ZONE_DEVICE users have little chance of functioning correctly without __HAVE_ARCH_PTE_DEVMAP, so let's clean that up the same way as ARCH_HAS_PTE_SPECIAL and make it the proper dependency so the real situation is clearer. Link: http://lkml.kernel.org/r/87554aa78478a02a63f2c4cf60a847279ae3eb3b.1558547956.git.robin.murphy@arm.comSigned-off-by: NRobin Murphy <robin.murphy@arm.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Acked-by: NOliver O'Halloran <oohall@gmail.com> Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Two architecture that use arch specific MMAP flags are powerpc and sparc. We still have few flag values common across them and other architectures. Consolidate this in mman-common.h. Also update the comment to indicate where to find HugeTLB specific reserved values Link: http://lkml.kernel.org/r/20190604090950.31417-1-aneesh.kumar@linux.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry V. Levin 提交于
syscall_get_error() is required to be implemented on this architecture in addition to already implemented syscall_get_nr(), syscall_get_arguments(), syscall_get_return_value(), and syscall_get_arch() functions in order to extend the generic ptrace API with PTRACE_GET_SYSCALL_INFO request. Link: http://lkml.kernel.org/r/20190510152824.GE28558@altlinux.orgSigned-off-by: NDmitry V. Levin <ldv@altlinux.org> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Elvira Khabirova <lineprinter@altlinux.org> Cc: Eugene Syromyatnikov <esyr@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greentime Hu <greentime@andestech.com> Cc: Helge Deller <deller@gmx.de> [parisc] Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: James Hogan <jhogan@kernel.org> Cc: kbuild test robot <lkp@intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
Architectures which support kprobes have very similar boilerplate around calling kprobe_fault_handler(). Use a helper function in kprobes.h to unify them, based on the x86 code. This changes the behaviour for other architectures when preemption is enabled. Previously, they would have disabled preemption while calling the kprobe handler. However, preemption would be disabled if this fault was due to a kprobe, so we know the fault was not due to a kprobe handler and can simply return failure. This behaviour was introduced in commit a980c0ef ("x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()") [anshuman.khandual@arm.com: export kprobe_fault_handler()] Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: James Hogan <jhogan@kernel.org> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anshuman Khandual 提交于
Finish up what commit c2febafc ("mm: convert generic code to 5-level paging") started while levelling up P4D huge mapping support at par with PUD and PMD. A new arch call back arch_ioremap_p4d_supported() is added which just maintains status quo (P4D huge map not supported) on x86, arm64 and powerpc. When HAVE_ARCH_HUGE_VMAP is enabled its just a simple check from the arch about the support, hence runtime effects are minimal. Link: http://lkml.kernel.org/r/1561699231-20991-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 7月, 2019 6 次提交
-
-
由 Andrea Arcangeli 提交于
25078dc1 first introduced an off by one error in the ZONE_DMA initialization of PPC_BOOK3E_64=y and since 9739ab7e the off by one applies to PPC32=y too. This simply corrects the off by one and should resolve crashes like below: [ 65.179101] page 0x7fff outside node 0 zone DMA [ 0x0 - 0x7fff ] Unfortunately in various MM places "max" means a non inclusive end of range. free_area_init_nodes max_zone_pfn parameter is one case and MAX_ORDER is another one (unrelated) that comes by memory. Reported-by: NZorro Lang <zlang@redhat.com> Fixes: 25078dc1 ("powerpc: use mm zones more sensibly") Fixes: 9739ab7e ("powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac") Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190625141727.2883-1-aarcange@redhat.com
-
由 Suraj Jitindar Singh 提交于
The Performance Stop Status and Control Register (PSSCR) is used to control the power saving facilities of the processor. This register has various fields, some of which can be modified only in hypervisor state, and others which can be modified in both hypervisor and privileged non-hypervisor state. The bits which can be modified in privileged non-hypervisor state are referred to as guest visible. Currently the L0 hypervisor saves and restores both it's own host value as well as the guest value of the PSSCR when context switching between the hypervisor and guest. However a nested hypervisor running it's own nested guests (as indicated by kvmhv_on_pseries()) doesn't context switch the PSSCR register. That means if a nested (L2) guest modifies the PSSCR then the L1 guest hypervisor will run with that modified value, and if the L1 guest hypervisor modifies the PSSCR and then goes to run the nested (L2) guest again then the L2 PSSCR value will be lost. Fix this by having the (L1) nested hypervisor save and restore both its host and the guest PSSCR value when entering and exiting a nested (L2) guest. Note that only the guest visible parts of the PSSCR are context switched since this is all the L1 nested hypervisor can access, this is fine however as these are the only fields the L0 hypervisor provides guest control of anyway and so all other fields are ignored. This could also have been implemented by adding the PSSCR register to the hv_regs passed to the L0 hypervisor as input to the H_ENTER_NESTED hcall, however this would have meant updating the structure layout and thus required modifications to both the L0 and L1 kernels. Whereas the approach used doesn't require L0 kernel modifications while achieving the same result. Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190703012022.15644-3-sjitindarsingh@gmail.com
-
由 Suraj Jitindar Singh 提交于
The ability to run nested guests under KVM means that a guest can also act as a hypervisor for it's own nested guest. Currently ppc_set_pmu_inuse() assumes that either FW_FEATURE_LPAR is set, indicating a guest environment, and so sets the pmcregs_in_use flag in the lppaca, or that it isn't set, indicating a hypervisor environment, and so sets the pmcregs_in_use flag in the paca. The pmcregs_in_use flag in the lppaca is used to communicate this information to a hypervisor and so must be set in a guest environment. The pmcregs_in_use flag in the paca is used by KVM code to determine whether the host state of the performance monitoring unit (PMU) must be saved and restored when running a guest. Thus when a guest also acts as a hypervisor it must set this bit in both places since it needs to ensure both that the real hypervisor saves it's PMU registers when it runs (requires pmcregs_in_use flag in lppaca), and that it saves it's own PMU registers when running a nested guest (requires pmcregs_in_use flag in paca). Modify ppc_set_pmu_inuse() so that the pmcregs_in_use bit is set in both the lppaca and the paca when a guest (LPAR) is running with the capability of running it's own guests (CONFIG_KVM_BOOK3S_HV_POSSIBLE). Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190703012022.15644-2-sjitindarsingh@gmail.com
-
由 Suraj Jitindar Singh 提交于
The performance monitoring unit (PMU) registers are saved on guest exit when the guest has set the pmcregs_in_use flag in its lppaca, if it exists, or unconditionally if it doesn't. If a nested guest is being run then the hypervisor doesn't, and in most cases can't, know if the PMU registers are in use since it doesn't know the location of the lppaca for the nested guest, although it may have one for its immediate guest. This results in the values of these registers being lost across nested guest entry and exit in the case where the nested guest was making use of the performance monitoring facility while it's nested guest hypervisor wasn't. Further more the hypervisor could interrupt a guest hypervisor between when it has loaded up the PMU registers and it calling H_ENTER_NESTED or between returning from the nested guest to the guest hypervisor and the guest hypervisor reading the PMU registers, in kvmhv_p9_guest_entry(). This means that it isn't sufficient to just save the PMU registers when entering or exiting a nested guest, but that it is necessary to always save the PMU registers whenever a guest is capable of running nested guests to ensure the register values aren't lost in the context switch. Ensure the PMU register values are preserved by always saving their value into the vcpu struct when a guest is capable of running nested guests. This should have minimal performance impact however any impact can be avoided by booting a guest with "-machine pseries,cap-nested-hv=false" on the qemu commandline. Fixes: 95a6432c ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190703012022.15644-1-sjitindarsingh@gmail.com
-
由 Suraj Jitindar Singh 提交于
The virtual real mode addressing (VRMA) mechanism is used when a partition is using HPT (Hash Page Table) translation and performs real mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode effective address bits 0:23 are treated as zero (i.e. the access is aliased to 0) and the access is performed using an implicit 1TB SLB entry. The size of the RMA (Real Memory Area) is communicated to the guest as the size of the first memory region in the device tree. And because of the mechanism described above can be expected to not exceed 1TB. In the event that the host erroneously represents the RMA as being larger than 1TB, guest accesses in real mode to memory addresses above 1TB will be aliased down to below 1TB. This means that a memory access performed in real mode may differ to one performed in virtual mode for the same memory address, which would likely have unintended consequences. To avoid this outcome have the guest explicitly limit the size of the RMA to the current maximum, which is 1TB. This means that even if the first memory block is larger than 1TB, only the first 1TB should be accessed in real mode. Fixes: c610d65c ("powerpc/pseries: lift RTAS limit for hash") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.com
-
由 Christian Brauner 提交于
A while ago Arnd made it possible to give new system calls the same syscall number on all architectures (except alpha). To not break this nice new feature let's mark 435 for clone3 as reserved on all architectures that do not yet implement it. Even if an architecture does not plan to implement it this ensures that new system calls coming after clone3 will have the same number on all architectures. Signed-off-by: NChristian Brauner <christian@brauner.io> Cc: linux-arch@vger.kernel.org Cc: linux-alpha@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-m68k@lists.linux-m68k.org Cc: linux-mips@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: linux-sh@vger.kernel.org Cc: sparclinux@vger.kernel.org Link: https://lore.kernel.org/r/20190714192205.27190-2-christian@brauner.ioReviewed-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NChristian Brauner <christian@brauner.io>
-
- 13 7月, 2019 4 次提交
-
-
由 Christoph Hellwig 提交于
While only powerpc supports the hugepd case, the code is pretty generic and I'd like to keep all GUP internals in one place. Link: http://lkml.kernel.org/r/20190625143715.1689-15-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
We only support the generic GUP now, so rename the config option to be more clear, and always use the mm/Kconfig definition of the symbol and select it from the arch Kconfigs. Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Walmsley 提交于
The RISC-V architecture has a register named the "Supervisor Exception Program Counter", or "sepc". This abbreviation triggers checkpatch.pl's misspelling detector, resulting in noise in the checkpatch output. The risk that this noise could cause more useful warnings to be missed seems to outweigh the harm of an occasional misspelling of "spec". Thus drop the "sepc" entry from the misspelling list. [akpm@linux-foundation.org: fix existing "sepc" instances, per Joe] Link: http://lkml.kernel.org/r/20190518210037.13674-1-paul.walmsley@sifive.comSigned-off-by: NPaul Walmsley <paul.walmsley@sifive.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Architectures like powerpc use different address range to map ioremap and vmalloc range. The memunmap() check used by the nvdimm layer was wrongly using is_vmalloc_addr() to check for ioremap range which fails for ppc64. This result in ppc64 not freeing the ioremap mapping. The side effect of this is an unbind failure during module unload with papr_scm nvdimm driver Link: http://lkml.kernel.org/r/20190701134038.14165-1-aneesh.kumar@linux.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 7月, 2019 1 次提交
-
-
由 Athira Rajeev 提交于
commit 10d91611 ("powerpc/64s: Reimplement book3s idle code in C") reimplemented book3S code to pltform/powernv/idle.c. But when doing so missed to add the per-thread LDBAR update in the core_woken path of the power9_idle_stop(). Patch fixes the same. Fixes: 10d91611 ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: NAthira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190702105836.26695-1-maddy@linux.vnet.ibm.com
-
- 11 7月, 2019 1 次提交
-
-
由 Oliver O'Halloran 提交于
In commit 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") support for using hugepages in the vmalloc and ioremap areas was enabled for radix. Unfortunately this broke EEH MMIO error checking. Detection works by inserting a hook which checks the results of the ioreadXX() set of functions. When a read returns a 0xFFs response we need to check for an error which we do by mapping the (virtual) MMIO address back to a physical address, then mapping physical address to a PCI device via an interval tree. When translating virt -> phys we currently assume the ioremap space is only populated by PAGE_SIZE mappings. If a hugepage mapping is found we emit a WARN_ON(), but otherwise handles the check as though a normal page was found. In pathalogical cases such as copying a buffer containing a lot of 0xFFs from BAR memory this can result in the system not booting because it's too busy printing WARN_ON()s. There's no real reason to assume huge pages can't be present and we're prefectly capable of handling them, so do that. Fixes: 4a7b06c157a2 ("powerpc/eeh: Handle hugepages in ioremap space") Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: NOliver O'Halloran <oohall@gmail.com> Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190710150517.27114-1-oohall@gmail.com
-
- 10 7月, 2019 3 次提交
-
-
由 Masahiro Yamada 提交于
Commit 5e9dcb61 ("powerpc/boot: Expose Kconfig symbols to wrapper") was wrong, but commit e41b93a6 ("powerpc/boot: Fix build failures with -j 1") was also wrong. The correct dependency is: $(obj)/serial.o: $(obj)/autoconf.h However, I do not see the reason why we need to copy autoconf.h to arch/power/boot/. Nor do I see consistency in the way of passing CONFIG options. decompress.c references CONFIG_KERNEL_GZIP and CONFIG_KERNEL_XZ, which are passed via the command line. serial.c includes autoconf.h to reference a couple of CONFIG options, but this is fragile because we often forget to include "autoconf.h" from source files. In fact, it is already broken. ppc_asm.h references CONFIG_PPC_8xx, but utils.S is not given any way to access CONFIG options. So, CONFIG_PPC_8xx is never defined here. Pass $(LINUXINCLUDE) to make sure CONFIG options are accessible from all .c and .S files in arch/powerpc/boot/. I also removed the -traditional flag to make include/linux/kconfig.h work. This flag makes the preprocessor imitate the behavior of the pre-standard C compiler, but I do not understand why it is necessary. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190705100144.28785-2-yamada.masahiro@socionext.com
-
由 Masahiro Yamada 提交于
The next commit will make the way of passing CONFIG options more robust. Unfortunately, it would uncover another hidden issue; without this commit, skiroot_defconfig would be broken like this: | WRAP arch/powerpc/boot/zImage.pseries | arch/powerpc/boot/wrapper.a(decompress.o): In function `bcj_powerpc.isra.10': | decompress.c:(.text+0x720): undefined reference to `get_unaligned_be32' | decompress.c:(.text+0x7a8): undefined reference to `put_unaligned_be32' | make[1]: *** [arch/powerpc/boot/Makefile;383: arch/powerpc/boot/zImage.pseries] Error 1 | make: *** [arch/powerpc/Makefile;295: zImage] Error 2 skiroot_defconfig is the only defconfig that enables CONFIG_KERNEL_XZ for ppc, which has never been correctly built before. I figured out the root cause in lib/decompress_unxz.c: | #ifdef CONFIG_PPC | # define XZ_DEC_POWERPC | #endif CONFIG_PPC is undefined here in the ppc bootwrapper because autoconf.h is not included except by arch/powerpc/boot/serial.c XZ_DEC_POWERPC is not defined, therefore, bcj_powerpc() is not compiled for the bootwrapper. With the next commit passing CONFIG_PPC correctly, we would realize that {get,put}_unaligned_be32 was missing. Unlike the other decompressors, the ppc bootwrapper duplicates all the necessary helpers in arch/powerpc/boot/. The other architectures define __KERNEL__ and pull in helpers for building the decompressors. If ppc bootwrapper had defined __KERNEL__, lib/xz/xz_private.h would have included <asm/unaligned.h>: | #ifdef __KERNEL__ | # include <linux/xz.h> | # include <linux/kernel.h> | # include <asm/unaligned.h> However, doing so would cause tons of definition conflicts since the bootwrapper has duplicated everything. I just added copies of {get,put}_unaligned_be32, following the bootwrapper coding convention. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190705100144.28785-1-yamada.masahiro@socionext.com
-
由 Michael Ellerman 提交于
When CONFIG_PPC_IRQ_SOFT_MASK_DEBUG is enabled (uncommon), we have a series of WARN_ON's in arch_local_irq_restore(). These are "should never happen" conditions, but if they do happen they can flood the console and render the system unusable. So switch them to WARN_ON_ONCE(). Fixes: e2b36d59 ("powerpc/64: Don't trace code that runs with the soft irq mask unreconciled") Fixes: 9b81c021 ("powerpc/64s: make PACA_IRQ_HARD_DIS track MSR[EE] closely") Fixes: 7c0482e3 ("powerpc/irq: Fix another case of lazy IRQ state getting out of sync") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190708061046.7075-1-mpe@ellerman.id.au
-
- 05 7月, 2019 1 次提交
-
-
由 Christophe Leroy 提交于
To increase readability/maintainability, replace hard coded instructions values by symbolic names. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> [mpe: Fix R_PPC64_ENTRY case, the addi reads from r2 not r12] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-