- 28 11月, 2019 4 次提交
-
-
由 Bharata B Rao 提交于
Register the new memslot with UV during plug and unregister the memslot during unplug. In addition, release all the device pages during unplug. Signed-off-by: NBharata B Rao <bharata@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Bharata B Rao 提交于
- After the guest becomes secure, when we handle a page fault of a page belonging to SVM in HV, send that page to UV via UV_PAGE_IN. - Whenever a page is unmapped on the HV side, inform UV via UV_PAGE_INVAL. - Ensure all those routines that walk the secondary page tables of the guest don't do so in case of secure VM. For secure guest, the active secondary page tables are in secure memory and the secondary page tables in HV are freed when guest becomes secure. Signed-off-by: NBharata B Rao <bharata@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Bharata B Rao 提交于
A secure guest will share some of its pages with hypervisor (Eg. virtio bounce buffers etc). Support sharing of pages between hypervisor and ultravisor. Shared page is reachable via both HV and UV side page tables. Once a secure page is converted to shared page, the device page that represents the secure page is unmapped from the HV side page tables. Signed-off-by: NBharata B Rao <bharata@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Bharata B Rao 提交于
A pseries guest can be run as secure guest on Ultravisor-enabled POWER platforms. On such platforms, this driver will be used to manage the movement of guest pages between the normal memory managed by hypervisor (HV) and secure memory managed by Ultravisor (UV). HV is informed about the guest's transition to secure mode via hcalls: H_SVM_INIT_START: Initiate securing a VM H_SVM_INIT_DONE: Conclude securing a VM As part of H_SVM_INIT_START, register all existing memslots with the UV. H_SVM_INIT_DONE call by UV informs HV that transition of the guest to secure mode is complete. These two states (transition to secure mode STARTED and transition to secure mode COMPLETED) are recorded in kvm->arch.secure_guest. Setting these states will cause the assembly code that enters the guest to call the UV_RETURN ucall instead of trying to enter the guest directly. Migration of pages betwen normal and secure memory of secure guest is implemented in H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls. H_SVM_PAGE_IN: Move the content of a normal page to secure page H_SVM_PAGE_OUT: Move the content of a secure page to normal page Private ZONE_DEVICE memory equal to the amount of secure memory available in the platform for running secure guests is created. Whenever a page belonging to the guest becomes secure, a page from this private device memory is used to represent and track that secure page on the HV side. The movement of pages between normal and secure memory is done via migrate_vma_pages() using UV_PAGE_IN and UV_PAGE_OUT ucalls. In order to prevent the device private pages (that correspond to pages of secure guest) from participating in KSM merging, H_SVM_PAGE_IN calls ksm_madvise() under read version of mmap_sem. However ksm_madvise() needs to be under write lock. Hence we call kvmppc_svm_page_in with mmap_sem held for writing, and it then downgrades to a read lock after calling ksm_madvise. [paulus@ozlabs.org - roll in patch "KVM: PPC: Book3S HV: Take write mmap_sem when calling ksm_madvise"] Signed-off-by: NBharata B Rao <bharata@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 21 11月, 2019 2 次提交
-
-
由 Greg Kurz 提交于
We need to check the host page size is big enough to accomodate the EQ. Let's do this before taking a reference on the EQ page to avoid a potential leak if the check fails. Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
The EQ page is allocated by the guest and then passed to the hypervisor with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page before handing it over to the HW. This reference is dropped either when the guest issues the H_INT_RESET hcall or when the KVM device is released. But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times, either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot). In both cases the existing EQ page reference is leaked because we simply overwrite it in the XIVE queue structure without calling put_page(). This is especially visible when the guest memory is backed with huge pages: start a VM up to the guest userspace, either reboot it or unplug a vCPU, quit QEMU. The leak is observed by comparing the value of HugePages_Free in /proc/meminfo before and after the VM is run. Ideally we'd want the XIVE code to handle the EQ page de-allocation at the platform level. This isn't the case right now because the various XIVE drivers have different allocation needs. It could maybe worth introducing hooks for this purpose instead of exposing XIVE internals to the drivers, but this is certainly a huge work to be done later. In the meantime, for easier backport, fix both vCPU unplug and guest reboot leaks by introducing a wrapper around xive_native_configure_queue() that does the necessary cleanup. Reported-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NGreg Kurz <groug@kaod.org> Tested-by: NLijun Pan <ljp@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 22 10月, 2019 13 次提交
-
-
由 Sean Christopherson 提交于
Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Nicholas Piggin 提交于
AIL=2 mode has no known users, so is not well tested or supported. Disallow guests from selecting this mode because it may become deprecated in future versions of the architecture. This policy decision is not left to QEMU because KVM support is required for AIL=2 (when injecting interrupts). Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which can result in the guest receiving interrupts as if LPCR[AIL]=0 contrary to the ISA. In practice, Linux guests cope with this deviation, but it should be fixed. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
This consolidates the HV interrupt delivery logic into one place. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Nicholas Piggin 提交于
Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that userspace can tell how many interrupt servers it needs. If a VM needs less than the current default of KVM_MAX_VCPUS (2048), we can allocate less VPs in OPAL. Combined with a core stride (VSMT) that matches the number of guest threads per core, this may substantially increases the number of VMs that can run concurrently with an in-kernel XIVE device. Since the legacy XIVE KVM device is exposed to userspace through the XICS KVM API, a new attribute group is added to it for this purpose. While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES in the XICS documentation. Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
The XIVE VP is an internal structure which allow the XIVE interrupt controller to maintain the interrupt context state of vCPUs non dispatched on HW threads. When a guest is started, the XIVE KVM device allocates a block of XIVE VPs in OPAL, enough to accommodate the highest possible vCPU id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048). With a guest's core stride of 8 and a threading mode of 1 (QEMU's default), a VM must run at least 256 vCPUs to actually need such a range of VPs. A POWER9 system has a limited XIVE VP space : 512k and KVM is currently wasting this HW resource with large VP allocations, especially since a typical VM likely runs with a lot less vCPUs. Make the size of the VP block configurable. Add an nr_servers field to the XIVE structure and a function to set it for this purpose. Split VP allocation out of the device create function. Since the VP block isn't used before the first vCPU connects to the XIVE KVM device, allocation is now performed by kvmppc_xive_connect_vcpu(). This gives the opportunity to set nr_servers in between: kvmppc_xive_create() / kvmppc_xive_native_create() . . kvmppc_xive_set_nr_servers() . . kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu() The connect_vcpu() functions check that the vCPU id is below nr_servers and if it is the first vCPU they allocate the VP block. This is protected against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers() with the xive->lock mutex. Also, the block is allocated once for the device lifetime: nr_servers should stay constant otherwise connect_vcpu() could generate a boggus VP id and likely crash OPAL. It is thus forbidden to update nr_servers once the block is allocated. If the VP allocation fail, return ENOSPC which seems more appropriate to report the depletion of system wide HW resource than ENOMEM or ENXIO. A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence only need 256 VPs instead of 2048. If the stride is set to match the number of threads per core, this goes further down to 32. This will be exposed to userspace by a subsequent patch. Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
Reduce code duplication by consolidating the checking of vCPU ids and VP ids to a common helper used by both legacy and native XIVE KVM devices. And explain the magic with a comment. Signed-off-by: NGreg Kurz <groug@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
Print out the VP id of each connected vCPU, this allow to see: - the VP block base in which OPAL encodes information that may be useful when debugging - the packed vCPU id which may differ from the raw vCPU id if the latter is >= KVM_MAX_VCPUS (2048) Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Greg Kurz 提交于
If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive set forever since the release method isn't called in this case. Any subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will thus always fail (DoS). This is a problem for QEMU since it destroys and re-creates these devices when the VM is reset: the VM would be restricted to using the much slower emulated XIVE or XICS forever. As an alternative to adding rollback, do not assign kvm->arch.xive before making sure the XIVE VPs are allocated in OPAL. Cc: stable@vger.kernel.org # v5.2 Fixes: 5422e951 ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Leonardo Bras 提交于
Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: NLeonardo Bras <leonardo@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Leonardo Bras 提交于
Reduces the number of calls to get_current() in order to get the value of current->mm by doing it once and storing the value, since it is not supposed to change inside the same process). Signed-off-by: NLeonardo Bras <leonardo@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 21 10月, 2019 1 次提交
-
-
由 Fabiano Rosas 提交于
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 15 10月, 2019 1 次提交
-
-
由 Greg Kurz 提交于
Connecting a vCPU to a XIVE KVM device means establishing a 1:1 association between a vCPU id and the offset (VP id) of a VP structure within a fixed size block of VPs. We currently try to enforce the 1:1 relationship by checking that a vCPU with the same id isn't already connected. This is good but unfortunately not enough because we don't map VP ids to raw vCPU ids but to packed vCPU ids, and the packing function kvmppc_pack_vcpu_id() isn't bijective by design. We got away with it because QEMU passes vCPU ids that fit well in the packing pattern. But nothing prevents userspace to come up with a forged vCPU id resulting in a packed id collision which causes the KVM device to associate two vCPUs to the same VP. This greatly confuses the irq layer and ultimately crashes the kernel, as shown below. Example: a guest with 1 guest thread per core, a core stride of 8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to inject at some point an invalid vCPU id 348, which is the packed version of itself and 2392, we get: genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348) CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 Call Trace: [c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable) [c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0 [c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260 [c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392 ------------[ cut here ]------------ remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348' WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200 Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0 REGS: c000003f7f9934b0 TRAP: 0700 Not tainted (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48228222 XER: 20040000 CFAR: c000000000131a50 IRQMASK: 0 GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057 GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033 GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006 GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48 GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0 NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200 LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 Call Trace: [c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable) [c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150 [c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0 [c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100 [c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80 [c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 Instruction dump: 2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378 3884f080 3863b948 4bbf6925 60000000 <0fe00000> 4bffff7c fba10088 4bbf6e41 ---[ end trace b925b67a74a1d8d1 ]--- BUG: Kernel NULL pointer dereference at 0x00000010 Faulting instruction address: 0xc00800000d44fc04 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G W 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970 REGS: c000003f7f9938e0 TRAP: 0300 Tainted: G W (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24228882 XER: 20040000 CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000 GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8 GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8 GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000 GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90 NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm] LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] Call Trace: [c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable) [c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm] [c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm] [c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320 [c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0 [c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440 [c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74 Instruction dump: 3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c 2f890000 419effb8 4800821d e8410018 <e9230010> e9490008 9b2a0039 7c0004ac ---[ end trace b925b67a74a1d8d2 ]--- Kernel panic - not syncing: Fatal exception This affects both XIVE and XICS-on-XIVE devices since the beginning. Check the VP id instead of the vCPU id when a new vCPU is connected. The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu() is moved after the check to avoid the need for rollback. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 09 10月, 2019 1 次提交
-
-
由 Jordan Niethe 提交于
kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to continue. This happens after resetting the PCR. Before commit 13c7bb3c ("powerpc/64s: Set reserved PCR bits"), r0 would always be 0 before it was stored to kvmppc_vcore->in_guest. However because of this change in the commit: /* Reset PCR */ ld r0, VCORE_PCR(r5) - cmpdi r0, 0 + LOAD_REG_IMMEDIATE(r6, PCR_MASK) + cmpld r0, r6 beq 18f - li r0, 0 - mtspr SPRN_PCR, r0 + mtspr SPRN_PCR, r6 18: /* Signal secondary CPUs to continue */ stb r0,VCORE_IN_GUEST(r5) We are no longer comparing r0 against 0 and loading it with 0 if it contains something else. Hence when we store r0 to kvmppc_vcore->in_guest, it might not be 0. This means that secondary CPUs will not be signalled to continue. Those CPUs get stuck and errors like the following are logged: KVM: CPU 1 seems to be stuck KVM: CPU 2 seems to be stuck KVM: CPU 3 seems to be stuck KVM: CPU 4 seems to be stuck KVM: CPU 5 seems to be stuck KVM: CPU 6 seems to be stuck KVM: CPU 7 seems to be stuck This can be reproduced with: $ for i in `seq 1 7` ; do chcpu -d $i ; done ; $ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \ -M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \ -kernel vmlinux -initrd initrd.cpio.xz Fix by making sure r0 is 0 before storing it to kvmppc_vcore->in_guest. Fixes: 13c7bb3c ("powerpc/64s: Set reserved PCR bits") Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NJordan Niethe <jniethe5@gmail.com> Reviewed-by: NAlistair Popple <alistair@popple.id.au> Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
-
- 01 10月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 9月, 2019 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
-
由 Aneesh Kumar K.V 提交于
Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
-
由 Michael Roth 提交于
On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB of memory running the following guest configs: guest A: - 224GB of memory - 56 VCPUs (sockets=1,cores=28,threads=2), where: VCPUs 0-1 are pinned to CPUs 0-3, VCPUs 2-3 are pinned to CPUs 4-7, ... VCPUs 54-55 are pinned to CPUs 108-111 guest B: - 4GB of memory - 4 VCPUs (sockets=1,cores=4,threads=1) with the following workloads (with KSM and THP enabled in all): guest A: stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M guest B: stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M host: stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M the below soft-lockup traces were observed after an hour or so and persisted until the host was reset (this was found to be reliably reproducible for this configuration, for kernels 4.15, 4.18, 5.0, and 5.3-rc5): [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU [ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941 [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709] [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913] [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331] [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338] [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525] [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU [ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243 [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU [ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714 [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU [ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403 [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds. [ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds. [ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds. [ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds. [ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds. [ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds. [ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds. [ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds. [ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 CPUs 45, 24, and 124 are stuck on spin locks, likely held by CPUs 105 and 31. CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on target CPU 42. For instance: # CPU 105 registers (via xmon) R00 = c00000000020b20c R16 = 00007d1bcd800000 R01 = c00000363eaa7970 R17 = 0000000000000001 R02 = c0000000019b3a00 R18 = 000000000000006b R03 = 000000000000002a R19 = 00007d537d7aecf0 R04 = 000000000000002a R20 = 60000000000000e0 R05 = 000000000000002a R21 = 0801000000000080 R06 = c0002073fb0caa08 R22 = 0000000000000d60 R07 = c0000000019ddd78 R23 = 0000000000000001 R08 = 000000000000002a R24 = c00000000147a700 R09 = 0000000000000001 R25 = c0002073fb0ca908 R10 = c000008ffeb4e660 R26 = 0000000000000000 R11 = c0002073fb0ca900 R27 = c0000000019e2464 R12 = c000000000050790 R28 = c0000000000812b0 R13 = c000207fff623e00 R29 = c0002073fb0ca808 R14 = 00007d1bbee00000 R30 = c0002073fb0ca800 R15 = 00007d1bcd600000 R31 = 0000000000000800 pc = c00000000020b260 smp_call_function_many+0x3d0/0x460 cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460 lr = c00000000020b20c smp_call_function_many+0x37c/0x460 msr = 900000010288b033 cr = 44024824 ctr = c000000000050790 xer = 0000000000000000 trap = 100 CPU 42 is running normally, doing VCPU work: # CPU 42 stack trace (via xmon) [link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv] [c000008ed3343820] c000008ed3343850 (unreliable) [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv] [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv] [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm] [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm] [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70 [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120 [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80 [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70 --- Exception: c00 (System Call) at 00007d545cfd7694 SP (7d53ff7edf50) is in userspace It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION] was set for CPU 42 by at least 1 of the CPUs waiting in smp_call_function_many(), but somehow the corresponding call_single_queue entries were never processed by CPU 42, causing the callers to spin in csd_lock_wait() indefinitely. Nick Piggin suggested something similar to the following sequence as a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE IPI messages seems to be most common, but any mix of CALL_FUNCTION and !CALL_FUNCTION messages could trigger it): CPU X: smp_muxed_ipi_set_message(): X: smp_mb() X: message[RESCHEDULE] = 1 X: doorbell_global_ipi(42): X: kvmppc_set_host_ipi(42, 1) X: ppc_msgsnd_sync()/smp_mb() X: ppc_msgsnd() -> 42 42: doorbell_exception(): // from CPU X 42: ppc_msgsync() 105: smp_muxed_ipi_set_message(): 105: smb_mb() // STORE DEFERRED DUE TO RE-ORDERING --105: message[CALL_FUNCTION] = 1 | 105: doorbell_global_ipi(42): | 105: kvmppc_set_host_ipi(42, 1) | 42: kvmppc_set_host_ipi(42, 0) | 42: smp_ipi_demux_relaxed() | 42: // returns to executing guest | // RE-ORDERED STORE COMPLETES ->105: message[CALL_FUNCTION] = 1 105: ppc_msgsnd_sync()/smp_mb() 105: ppc_msgsnd() -> 42 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored 105: // hangs waiting on 42 to process messages/call_single_queue This can be prevented with an smp_mb() at the beginning of kvmppc_set_host_ipi(), such that stores to message[<type>] (or other state indicated by the host_ipi flag) are ordered vs. the store to to host_ipi. However, doing so might still allow for the following scenario (not yet observed): CPU X: smp_muxed_ipi_set_message(): X: smp_mb() X: message[RESCHEDULE] = 1 X: doorbell_global_ipi(42): X: kvmppc_set_host_ipi(42, 1) X: ppc_msgsnd_sync()/smp_mb() X: ppc_msgsnd() -> 42 42: doorbell_exception(): // from CPU X 42: ppc_msgsync() // STORE DEFERRED DUE TO RE-ORDERING -- 42: kvmppc_set_host_ipi(42, 0) | 42: smp_ipi_demux_relaxed() | 105: smp_muxed_ipi_set_message(): | 105: smb_mb() | 105: message[CALL_FUNCTION] = 1 | 105: doorbell_global_ipi(42): | 105: kvmppc_set_host_ipi(42, 1) | // RE-ORDERED STORE COMPLETES -> 42: kvmppc_set_host_ipi(42, 0) 42: // returns to executing guest 105: ppc_msgsnd_sync()/smp_mb() 105: ppc_msgsnd() -> 42 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored 105: // hangs waiting on 42 to process messages/call_single_queue Fixing this scenario would require an smp_mb() *after* clearing host_ipi flag in kvmppc_set_host_ipi() to order the store vs. subsequent processing of IPI messages. To handle both cases, this patch splits kvmppc_set_host_ipi() into separate set/clear functions, where we execute smp_mb() prior to setting host_ipi flag, and after clearing host_ipi flag. These functions pair with each other to synchronize the sender and receiver sides. With that change in place the above workload ran for 20 hours without triggering any lock-ups. Fixes: 755563bc ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0 Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
-
- 21 9月, 2019 1 次提交
-
-
由 Jordan Niethe 提交于
Currently the reserved bits of the Processor Compatibility Register (PCR) are cleared as per the Programming Note in Section 1.3.3 of version 3.0B of the Power ISA. This causes all new architecture features to be made available when running on newer processors with new architecture features added to the PCR as bits must be set to disable a given feature. For example to disable new features added as part of Version 2.07 of the ISA the corresponding bit in the PCR needs to be set. As new processor features generally require explicit kernel support they should be disabled until such support is implemented. Therefore kernels should set all unknown/reserved bits in the PCR such that any new architecture features which the kernel does not currently know about get disabled. An update is planned to the ISA to clarify that the PCR is an exception to the Programming Note on reserved bits in Section 1.3.3. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NJordan Niethe <jniethe5@gmail.com> Tested-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190917004605.22471-2-alistair@popple.id.au
-
- 05 9月, 2019 3 次提交
-
-
由 Nicholas Piggin 提交于
Introduce two options to control the use of the tlbie instruction. A boot time option which completely disables the kernel using the instruction, this is currently incompatible with HASH MMU, KVM, and coherent accelerators. And a debugfs option can be switched at runtime and avoids using tlbie for invalidating CPU TLBs for normal process and kernel address mappings. Coherent accelerators are still managed with tlbie, as will KVM partition scope translations. Cross-CPU TLB flushing is implemented with IPIs and tlbiel. This is a basic implementation which does not attempt to make any optimisation beyond the tlbie implementation. This is useful for performance testing among other things. For example in certain situations on large systems, using IPIs may be faster than tlbie as they can be directed rather than broadcast. Later we may also take advantage of the IPIs to do more interesting things such as trim the mm cpumask more aggressively. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-7-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
No functional change. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-4-npiggin@gmail.com
-
由 Nicholas Piggin 提交于
There should be no functional changes. - Use calls to existing radix_tlb.c functions in flush_partition. - Rename radix__flush_tlb_lpid to radix__flush_all_lpid and similar, because they flush everything, matching flush_all_mm rather than flush_tlb_mm for the lpid. - Remove some unused radix_tlb.c flush primitives. Signed-off: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190902152931.17840-3-npiggin@gmail.com
-
- 30 8月, 2019 3 次提交
-
-
由 Sukadev Bhattiprolu 提交于
When an SVM makes an hypercall or incurs some other exception, the Ultravisor usually forwards (a.k.a. reflects) the exceptions to the Hypervisor. After processing the exception, Hypervisor uses the UV_RETURN ultracall to return control back to the SVM. The expected register state on entry to this ultracall is: * Non-volatile registers are restored to their original values. * If returning from an hypercall, register R0 contains the return value (unlike other ultracalls) and, registers R4 through R12 contain any output values of the hypercall. * R3 contains the ultracall number, i.e UV_RETURN. * If returning with a synthesized interrupt, R2 contains the synthesized interrupt number. Thanks to input from Paul Mackerras, Ram Pai and Mike Anderson. Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: NClaudio Carvalho <cclaudio@linux.ibm.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190822034838.27876-8-cclaudio@linux.ibm.com
-
由 Alexey Kardashevskiy 提交于
Invalidating a TCE cache entry for each updated TCE is quite expensive. This makes use of the new iommu_table_ops::xchg_no_kill()/tce_kill() callbacks to bring down the time spent in mapping a huge guest DMA window; roughly 20s to 10s for each guest's 100GB of DMA space. Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190829085252.72370-3-aik@ozlabs.ru
-
由 Alexey Kardashevskiy 提交于
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from a guest. Although we verify correctness of TCEs before we do anything with the existing tables, there is a small window when a check in kvmppc_tce_validate might pass and right after that the guest alters the page with TCEs which can cause early exit from the handler and leave srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap) (real mode) locked. This fixes the bug by jumping to the common exit code with an appropriate unlock. Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190826045520.92153-1-aik@ozlabs.ru
-
- 27 8月, 2019 4 次提交
-
-
由 Paul Mackerras 提交于
On POWER9, when userspace reads the value of the DPDES register on a vCPU, it is possible for 0 to be returned although there is a doorbell interrupt pending for the vCPU. This can lead to a doorbell interrupt being lost across migration. If the guest kernel uses doorbell interrupts for IPIs, then it could malfunction because of the lost interrupt. This happens because a newly-generated doorbell interrupt is signalled by setting vcpu->arch.doorbell_request to 1; the DPDES value in vcpu->arch.vcore->dpdes is not updated, because it can only be updated when holding the vcpu mutex, in order to avoid races. To fix this, we OR in vcpu->arch.doorbell_request when reading the DPDES value. Cc: stable@vger.kernel.org # v4.13+ Fixes: 57900694 ("KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9") Signed-off-by: NPaul Mackerras <paulus@ozlabs.org> Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
-
由 Paul Mackerras 提交于
When we are running multiple vcores on the same physical core, they could be from different VMs and so it is possible that one of the VMs could have its arch.mmu_ready flag cleared (for example by a concurrent HPT resize) when we go to run it on a physical core. We currently check the arch.mmu_ready flag for the primary vcore but not the flags for the other vcores that will be run alongside it. This adds that check, and also a check when we select the secondary vcores from the preempted vcores list. Cc: stable@vger.kernel.org # v4.14+ Fixes: 38c53af8 ("KVM: PPC: Book3S HV: Fix exclusion between HPT resizing and other HPT updates") Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Mackerras 提交于
There are some POWER9 machines where the OPAL firmware does not support the OPAL_XIVE_GET_QUEUE_STATE and OPAL_XIVE_SET_QUEUE_STATE calls. The impact of this is that a guest using XIVE natively will not be able to be migrated successfully. On the source side, the get_attr operation on the KVM native device for the KVM_DEV_XIVE_GRP_EQ_CONFIG attribute will fail; on the destination side, the set_attr operation for the same attribute will fail. This adds tests for the existence of the OPAL get/set queue state functions, and if they are not supported, the XIVE-native KVM device is not created and the KVM_CAP_PPC_IRQ_XIVE capability returns false. Userspace can then either provide a software emulation of XIVE, or else tell the guest that it does not have a XIVE controller available to it. Cc: stable@vger.kernel.org # v5.2+ Fixes: 3fab2d10 ("KVM: PPC: Book3S HV: XIVE: Activate XIVE exploitation mode") Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Alexey Kardashevskiy 提交于
H_PUT_TCE_INDIRECT handlers receive a page with up to 512 TCEs from a guest. Although we verify correctness of TCEs before we do anything with the existing tables, there is a small window when a check in kvmppc_tce_validate might pass and right after that the guest alters the page of TCEs, causing an early exit from the handler and leaving srcu_read_lock(&vcpu->kvm->srcu) (virtual mode) or lock_rmap(rmap) (real mode) locked. This fixes the bug by jumping to the common exit code with an appropriate unlock. Cc: stable@vger.kernel.org # v4.11+ Fixes: 121f80ba ("KVM: PPC: VFIO: Add in-kernel acceleration for VFIO") Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 23 8月, 2019 2 次提交
-
-
由 Suraj Jitindar Singh 提交于
The rmap array in the guest memslot is an array of size number of guest pages, allocated at memslot creation time. Each rmap entry in this array is used to store information about the guest page to which it corresponds. For example for a hpt guest it is used to store a lock bit, rc bits, a present bit and the index of a hpt entry in the guest hpt which maps this page. For a radix guest which is running nested guests it is used to store a pointer to a linked list of nested rmap entries which store the nested guest physical address which maps this guest address and for which there is a pte in the shadow page table. As there are currently two uses for the rmap array, and the potential for this to expand to more in the future, define a type field (being the top 8 bits of the rmap entry) to be used to define the type of the rmap entry which is currently present and define two values for this field for the two current uses of the rmap array. Since the nested case uses the rmap entry to store a pointer, define this type as having the two high bits set as is expected for a pointer. Define the hpt entry type as having bit 56 set (bit 7 IBM bit ordering). Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Paul Menzel 提交于
Fix the error below triggered by `-Wimplicit-fallthrough`, by tagging it as an expected fall-through. arch/powerpc/kvm/book3s_32_mmu.c: In function ‘kvmppc_mmu_book3s_32_xlate_pte’: arch/powerpc/kvm/book3s_32_mmu.c:241:21: error: this statement may fall through [-Werror=implicit-fallthrough=] pte->may_write = true; ~~~~~~~~~~~~~~~^~~~~~ arch/powerpc/kvm/book3s_32_mmu.c:242:5: note: here case 3: ^~~~ Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 22 8月, 2019 1 次提交
-
-
由 Sean Christopherson 提交于
KVM implementations that wrap struct kvm_vcpu with a vendor specific struct, e.g. struct vcpu_vmx, must place the vcpu member at offset 0, otherwise the usercopy region intended to encompass struct kvm_vcpu_arch will instead overlap random chunks of the vendor specific struct. E.g. padding a large number of bytes before struct kvm_vcpu triggers a usercopy warn when running with CONFIG_HARDENED_USERCOPY=y. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-