1. 31 Mar 2020: 1 commit
  2. 27 Mar 2020: 1 commit
  3. 26 Mar 2020: 4 commits
  4. 24 Mar 2020: 17 commits
  5. 19 Mar 2020: 8 commits
    • KVM: PPC: Kill kvmppc_ops::mmu_destroy() and kvmppc_mmu_destroy() · 6fef0c6b
      Authored by Greg Kurz
      These are only used by HV KVM and BookE, and in both cases they are
      nops.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Move kvmppc_mmu_init() into PR KVM · 3f1268dd
      Authored by Greg Kurz
      This is only relevant to PR KVM. Make it obvious by moving the
      function declaration to the Book3s header and rename it with
      a _pr suffix.
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S PR: Fix kernel crash with PR KVM · b2fa4f90
      Authored by Greg Kurz
      With PR KVM, shutting down a VM causes the host kernel to crash:
      
      [  314.219284] BUG: Unable to handle kernel data access on read at 0xc00800000176c638
      [  314.219299] Faulting instruction address: 0xc008000000d4ddb0
      cpu 0x0: Vector: 300 (Data Access) at [c00000036da077a0]
          pc: c008000000d4ddb0: kvmppc_mmu_pte_flush_all+0x68/0xd0 [kvm_pr]
          lr: c008000000d4dd94: kvmppc_mmu_pte_flush_all+0x4c/0xd0 [kvm_pr]
          sp: c00000036da07a30
         msr: 900000010280b033
         dar: c00800000176c638
       dsisr: 40000000
        current = 0xc00000036d4c0000
        paca    = 0xc000000001a00000   irqmask: 0x03   irq_happened: 0x01
          pid   = 1992, comm = qemu-system-ppc
      Linux version 5.6.0-master-gku+ (greg@palmb) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #17 SMP Wed Mar 18 13:49:29 CET 2020
      enter ? for help
      [c00000036da07ab0] c008000000d4fbe0 kvmppc_mmu_destroy_pr+0x28/0x60 [kvm_pr]
      [c00000036da07ae0] c0080000009eab8c kvmppc_mmu_destroy+0x34/0x50 [kvm]
      [c00000036da07b00] c0080000009e50c0 kvm_arch_vcpu_destroy+0x108/0x140 [kvm]
      [c00000036da07b30] c0080000009d1b50 kvm_vcpu_destroy+0x28/0x80 [kvm]
      [c00000036da07b60] c0080000009e4434 kvm_arch_destroy_vm+0xbc/0x190 [kvm]
      [c00000036da07ba0] c0080000009d9c2c kvm_put_kvm+0x1d4/0x3f0 [kvm]
      [c00000036da07c00] c0080000009da760 kvm_vm_release+0x38/0x60 [kvm]
      [c00000036da07c30] c000000000420be0 __fput+0xe0/0x310
      [c00000036da07c90] c0000000001747a0 task_work_run+0x150/0x1c0
      [c00000036da07cf0] c00000000014896c do_exit+0x44c/0xd00
      [c00000036da07dc0] c0000000001492f4 do_group_exit+0x64/0xd0
      [c00000036da07e00] c000000000149384 sys_exit_group+0x24/0x30
      [c00000036da07e20] c00000000000b9d0 system_call+0x5c/0x68
      
      This is caused by a use-after-free in kvmppc_mmu_pte_flush_all()
      which dereferences vcpu->arch.book3s which was previously freed by
      kvmppc_core_vcpu_free_pr(). This happens because kvmppc_mmu_destroy()
      is called after kvmppc_core_vcpu_free() since commit ff030fdf
      ("KVM: PPC: Move kvm_vcpu_init() invocation to common code").
      
      The kvmppc_mmu_destroy() helper calls one of the following depending
      on the KVM backend:
      
      - kvmppc_mmu_destroy_hv() which does nothing (Book3s HV)
      
      - kvmppc_mmu_destroy_pr() which undoes the effects of
        kvmppc_mmu_init() (Book3s PR 32-bit)
      
      - kvmppc_mmu_destroy_pr() which undoes the effects of
        kvmppc_mmu_init() (Book3s PR 64-bit)
      
      - kvmppc_mmu_destroy_e500() which does nothing (BookE e500/e500mc)
      
      It turns out that this is only relevant to PR KVM. Both the 32-bit
      and 64-bit backends need vcpu->arch.book3s to be valid when calling
      kvmppc_mmu_destroy_pr(). So instead of calling kvmppc_mmu_destroy()
      from kvm_arch_vcpu_destroy(), call kvmppc_mmu_destroy_pr() at the
      beginning of kvmppc_core_vcpu_free_pr(). This is consistent with
      kvmppc_mmu_init() being the last call in kvmppc_core_vcpu_create_pr().
      
      For the same reason, if kvmppc_core_vcpu_create_pr() returns an
      error then this means that kvmppc_mmu_init() was either not called
      or failed, in which case kvmppc_mmu_destroy() should not be called.
      Drop the line in the error path of kvm_arch_vcpu_create().
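      The ordering fix can be illustrated with a small, self-contained sketch.
      The struct and helper names below are hypothetical stand-ins for the real
      KVM structures, not the kernel code: freeing vcpu->arch.book3s before
      the MMU teardown is the use-after-free, so the teardown must run first.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the PR KVM vcpu state. */
struct vcpu {
    void *book3s;        /* freed by the vcpu-free path */
    int   mmu_destroyed; /* set by the MMU teardown     */
};

/* Models kvmppc_mmu_destroy_pr(): it must see a valid book3s. */
static int mmu_destroy_pr(struct vcpu *v)
{
    if (!v->book3s)
        return -1;  /* this is where the use-after-free crashed */
    v->mmu_destroyed = 1;
    return 0;
}

/* Models the fixed kvmppc_core_vcpu_free_pr(): MMU teardown moved
 * to the beginning, before the backing state is freed. */
static int vcpu_free_pr(struct vcpu *v)
{
    int ret = mmu_destroy_pr(v);
    free(v->book3s);
    v->book3s = NULL;
    return ret;
}
```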
      
      Fixes: ff030fdf ("KVM: PPC: Move kvm_vcpu_init() invocation to common code")
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Use fallthrough; · 8fc6ba0a
      Authored by Joe Perches
      Convert the various uses of fallthrough comments to fallthrough;
      
      Done via script
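      The conversion is mechanical: a "fall through" comment becomes the
      fallthrough pseudo-keyword. A minimal sketch, with the kernel's macro
      approximated locally so the snippet compiles outside the kernel tree:

```c
/* Approximation of the kernel's fallthrough macro for use outside
 * the kernel tree. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define fallthrough __attribute__((fallthrough))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)
#endif

static int classify(int op)
{
    int flags = 0;

    switch (op) {
    case 0:
        flags |= 1;
        fallthrough;    /* previously a "fall through" comment */
    case 1:
        flags |= 2;
        break;
    default:
        flags = -1;
    }
    return flags;
}
```

      Unlike a comment, the attribute lets the compiler warn when a case
      falls through without being annotated.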
      Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe.com/
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Fix H_CEDE return code for nested guests · 1f50cc17
      Authored by Michael Roth
      The h_cede_tm kvm-unit-test currently fails when run inside an L1 guest
      via the guest/nested hypervisor.
      
        ./run-tests.sh -v
        ...
        TESTNAME=h_cede_tm TIMEOUT=90s ACCEL= ./powerpc/run powerpc/tm.elf -smp 2,threads=2 -machine cap-htm=on -append "h_cede_tm"
        FAIL h_cede_tm (2 tests, 1 unexpected failures)
      
      While the test relates to transactional memory instructions, the actual
      failure is due to the return code of the H_CEDE hypercall, which is
      reported as 224 instead of 0. This happens even when no TM instructions
      are issued.
      
      224 is the value placed in r3 to execute a hypercall for H_CEDE, and r3
      is where the caller expects the return code to be placed upon return.
      
      In the case of a guest running under a nested hypervisor, issuing
      H_CEDE causes a return from H_ENTER_NESTED. In this case H_CEDE is
      handled specially and immediately, rather than later in
      kvmppc_pseries_do_hcall() as with most other hcalls, but we forget
      to set the return code for the caller, which is why kvm-unit-test
      sees the 224 return code and reports an error.
      
      Guest kernels generally don't check the return value of H_CEDE, so
      that likely explains why this hasn't caused issues outside of
      kvm-unit-tests so far.
      
      Fix this by setting r3 to 0 after we finish processing the H_CEDE.
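      The fix boils down to a single store into the guest's r3 after the
      special-cased H_CEDE processing. A hedged sketch with mocked register
      accessors (names modeled on, but not identical to, the real
      kvmppc_get_gpr/kvmppc_set_gpr helpers):

```c
#define H_CEDE    0xE0  /* 224: the hcall number, passed in r3 */
#define H_SUCCESS 0

struct vcpu { unsigned long gpr[32]; };

static void set_gpr(struct vcpu *v, int n, unsigned long val) { v->gpr[n] = val; }
static unsigned long get_gpr(struct vcpu *v, int n) { return v->gpr[n]; }

/* Models the nested-guest H_CEDE path. Before the fix, r3 still held
 * the hcall number (224) on return to the caller; the fix overwrites
 * it with the return code. */
static void handle_nested_cede(struct vcpu *v)
{
    /* ...cede processing would happen here... */
    set_gpr(v, 3, H_SUCCESS);  /* the one-line fix */
}
```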
      
      RHBZ: 1778556
      
      Fixes: 4bad7779 ("KVM: PPC: Book3S HV: Handle hypercalls correctly when nested")
      Cc: linuxppc-dev@ozlabs.org
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Treat TM-related invalid form instructions on P9 like the valid ones · 1dff3064
      Authored by Gustavo Romero
      On P9 DD2.2 due to a CPU defect some TM instructions need to be emulated by
      KVM. This is handled at first by the hardware raising a softpatch interrupt
      when certain TM instructions that need KVM assistance are executed in the
      guest. Although some TM instructions are invalid forms per the Power
      ISA, they can raise a softpatch interrupt too. For instance, the
      'tresume.' instruction
      as defined in the ISA must have bit 31 set (1), but an instruction that
      matches 'tresume.' PO and XO opcode fields but has bit 31 not set (0), like
      0x7cfe9ddc, also raises a softpatch interrupt. Similarly for 'treclaim.'
      and 'trechkpt.' instructions with bit 31 = 0, i.e. 0x7c00075c and
      0x7c0007dc, respectively. Hence, if a code like the following is executed
      in the guest it will raise a softpatch interrupt just like a 'tresume.'
      when the TM facility is enabled ('tabort. 0' in the example is used only
      to enable the TM facility):
      
      int main() { asm("tabort. 0; .long 0x7cfe9ddc;"); }
      
      Currently in such a case KVM throws a complete trace like:
      
      [345523.705984] WARNING: CPU: 24 PID: 64413 at arch/powerpc/kvm/book3s_hv_tm.c:211 kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
      [345523.705985] Modules linked in: kvm_hv(E) xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat
      iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ebtable_filter ebtables ip6table_filter
      ip6_tables iptable_filter bridge stp llc sch_fq_codel ipmi_powernv at24 vmx_crypto ipmi_devintf ipmi_msghandler
      ibmpowernv uio_pdrv_genirq kvm opal_prd uio leds_powernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
      libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456
      async_raid6_recov async_memcpy async_pq async_xor async_tx libcrc32c xor raid6_pq raid1 raid0 multipath linear tg3
      crct10dif_vpmsum crc32c_vpmsum ipr [last unloaded: kvm_hv]
      [345523.706030] CPU: 24 PID: 64413 Comm: CPU 0/KVM Tainted: G        W   E     5.5.0+ #1
      [345523.706031] NIP:  c0080000072cb9c0 LR: c0080000072b5e80 CTR: c0080000085c7850
      [345523.706034] REGS: c000000399467680 TRAP: 0700   Tainted: G        W   E      (5.5.0+)
      [345523.706034] MSR:  900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR: 24022428  XER: 00000000
      [345523.706042] CFAR: c0080000072b5e7c IRQMASK: 0
                      GPR00: c0080000072b5e80 c000000399467910 c0080000072db500 c000000375ccc720
                      GPR04: c000000375ccc720 00000003fbec0000 0000a10395dda5a6 0000000000000000
                      GPR08: 000000007cfe9ddc 7cfe9ddc000005dc 7cfe9ddc7c0005dc c0080000072cd530
                      GPR12: c0080000085c7850 c0000003fffeb800 0000000000000001 00007dfb737f0000
                      GPR16: c0002001edcca558 0000000000000000 0000000000000000 0000000000000001
                      GPR20: c000000001b21258 c0002001edcca558 0000000000000018 0000000000000000
                      GPR24: 0000000001000000 ffffffffffffffff 0000000000000001 0000000000001500
                      GPR28: c0002001edcc4278 c00000037dd80000 800000050280f033 c000000375ccc720
      [345523.706062] NIP [c0080000072cb9c0] kvmhv_p9_tm_emulation+0x68/0x620 [kvm_hv]
      [345523.706065] LR [c0080000072b5e80] kvmppc_handle_exit_hv.isra.53+0x3e8/0x798 [kvm_hv]
      [345523.706066] Call Trace:
      [345523.706069] [c000000399467910] [c000000399467940] 0xc000000399467940 (unreliable)
      [345523.706071] [c000000399467950] [c000000399467980] 0xc000000399467980
      [345523.706075] [c0000003994679f0] [c0080000072bd1c4] kvmhv_run_single_vcpu+0xa1c/0xb80 [kvm_hv]
      [345523.706079] [c000000399467ac0] [c0080000072bd8e0] kvmppc_vcpu_run_hv+0x5b8/0xb00 [kvm_hv]
      [345523.706087] [c000000399467b90] [c0080000085c93cc] kvmppc_vcpu_run+0x34/0x48 [kvm]
      [345523.706095] [c000000399467bb0] [c0080000085c582c] kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm]
      [345523.706101] [c000000399467c40] [c0080000085b7498] kvm_vcpu_ioctl+0x3d0/0x7b0 [kvm]
      [345523.706105] [c000000399467db0] [c0000000004adf9c] ksys_ioctl+0x13c/0x170
      [345523.706107] [c000000399467e00] [c0000000004adff8] sys_ioctl+0x28/0x80
      [345523.706111] [c000000399467e20] [c00000000000b278] system_call+0x5c/0x68
      [345523.706112] Instruction dump:
      [345523.706114] 419e0390 7f8a4840 409d0048 6d497c00 2f89075d 419e021c 6d497c00 2f8907dd
      [345523.706119] 419e01c0 6d497c00 2f8905dd 419e00a4 <0fe00000> 38210040 38600000 ebc1fff0
      
      and then treats the executed instruction as a 'nop'.
      
      However the POWER9 User's Manual, in section "4.6.10 Book II Invalid
      Forms", states that for TM instructions bit 31 is in fact ignored, so
      for the TM-related invalid forms ignoring bit 31 and handling them
      like the valid forms is an acceptable way to handle them. POWER8
      behaves the same way too.
      
      This commit changes the handling of the cases here described by treating
      the TM-related invalid forms that can generate a softpatch interrupt
      just like their valid forms (w/ bit 31 = 1) instead of as a 'nop' and by
      gently reporting any other unrecognized case to the host and treating it as
      illegal instruction instead of throwing a trace and treating it as a 'nop'.
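      Since bit 31 (the least significant bit of the 32-bit word, in IBM
      numbering) is ignored by the hardware, the invalid forms can simply be
      canonicalized to their valid counterparts before decoding. A hedged
      sketch using the opcode values quoted above (the helper name is
      illustrative, not KVM's):

```c
#include <stdint.h>

/* Valid TM forms have bit 31 (IBM numbering) set, i.e. the lowest
 * bit of the 32-bit instruction word. */
#define PPC_INST_BIT31 1u

/* Treat an invalid form (bit 31 clear) exactly like its valid
 * counterpart by forcing bit 31 on before the emulation switch. */
static uint32_t canonicalize_tm_inst(uint32_t inst)
{
    return inst | PPC_INST_BIT31;
}
```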
      Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
      Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
      Acked-by: Michael Neuling <mikey@neuling.org>
      Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use RADIX_PTE_INDEX_SIZE in Radix MMU code · afd31356
      Authored by Michael Ellerman
      In kvmppc_unmap_free_pte() in book3s_64_mmu_radix.c, we use the
      non-constant value PTE_INDEX_SIZE to clear a PTE page.
      
      We can instead use the constant RADIX_PTE_INDEX_SIZE, because we know
      this code will only be running when the Radix MMU is active.
      
      Note that we already use RADIX_PTE_INDEX_SIZE for the allocation of
      kvm_pte_cache.
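      The change swaps the variable PTE_INDEX_SIZE for the Radix-specific
      constant when clearing a PTE page. A minimal sketch of the idea; the
      value 9 is an assumption corresponding to the 4K-page radix
      configuration, and the real kernel picks it from the page-size config:

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

/* Assumed value for the 4K-page radix configuration. */
#define RADIX_PTE_INDEX_SIZE 9

typedef uint64_t pte_t;

/* Models clearing a PTE page with the constant, Radix-only index
 * size, as kvmppc_unmap_free_pte() now does. Returns the number of
 * entries cleared. */
static size_t clear_pte_page(pte_t *page)
{
    size_t n = (size_t)1 << RADIX_PTE_INDEX_SIZE;  /* 512 entries */
    memset(page, 0, n * sizeof(pte_t));            /* 4096 bytes  */
    return n;
}
```

      Using a compile-time constant here also lets the compiler emit a
      fixed-size clear instead of a variable-length one.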
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot in HPT page fault handler · cd758a9b
      Authored by Paul Mackerras
      This makes the same changes in the page fault handler for HPT guests
      that commits 31c8b0d0 ("KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot()
      in page fault handler", 2018-03-01), 71d29f43 ("KVM: PPC: Book3S HV:
      Don't use compound_order to determine host mapping size", 2018-09-11)
      and 6579804c ("KVM: PPC: Book3S HV: Avoid crash from THP collapse
      during radix page fault", 2018-10-04) made for the page fault handler
      for radix guests.
      
      In summary, where we used to call get_user_pages_fast() and then do
      special handling for VM_PFNMAP vmas, we now call __get_user_pages_fast()
      and then __gfn_to_pfn_memslot() if that fails, followed by reading the
      Linux PTE to get the host PFN, host page size and mapping attributes.
      
      This also brings in the change from SetPageDirty() to set_page_dirty_lock()
      which was done for the radix page fault handler in commit c3856aeb
      ("KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault
      handler", 2018-02-23).
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  6. 18 Mar 2020: 6 commits
  7. 17 Mar 2020: 3 commits
    • KVM: VMX: access regs array in vmenter.S in its natural order · bb03911f
      Authored by Uros Bizjak
      Registers in "regs" array are indexed as rax/rcx/rdx/.../rsi/rdi/r8/...
      Reorder access to "regs" array in vmenter.S to follow its natural order.
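      For context, the "regs" array is indexed by the x86 hardware register
      encoding, which is the natural order referred to above. A sketch of
      that index order (mirroring, but not copied from, KVM's VCPU_REGS_*
      enum):

```c
/* Index order of the "regs" array, following the x86 register
 * encoding: rax/rcx/rdx/rbx/rsp/rbp/rsi/rdi, then r8..r15. */
enum vcpu_regs {
    REG_RAX = 0, REG_RCX, REG_RDX, REG_RBX,
    REG_RSP,     REG_RBP, REG_RSI, REG_RDI,
    REG_R8,      REG_R9,  REG_R10, REG_R11,
    REG_R12,     REG_R13, REG_R14, REG_R15,
};
```

      Accessing the array in this order in vmenter.S makes the save/restore
      sequence march linearly through memory.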
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: properly handle errors in nested_vmx_handle_enlightened_vmptrld() · b6a0653a
      Authored by Vitaly Kuznetsov
      nested_vmx_handle_enlightened_vmptrld() fails in two cases:
      - when we fail to kvm_vcpu_map() the supplied GPA
      - when revision_id is incorrect.
      Genuine Hyper-V raises #UD in the former case (at least with *some*
      incorrect GPAs) and does VMfailInvalid() in the latter. KVM doesn't
      do anything, so L1 just gets stuck retrying the same faulty VMLAUNCH.
      
      nested_vmx_handle_enlightened_vmptrld() has two call sites:
      nested_vmx_run() and nested_get_vmcs12_pages(). The former can
      reflect the failure to L1 the way genuine Hyper-V does; the latter
      can't do much: the failure there happens after migration when L2 was
      running (and L1 did something weird like wrote to the VP assist page
      from a different vCPU), so just kill L1 with KVM_EXIT_INTERNAL_ERROR.
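      The asymmetry between the two call sites can be sketched as follows.
      The enum values and helper names here are illustrative, not KVM's:

```c
enum evmptrld_status {
    EVMPTRLD_SUCCEEDED,
    EVMPTRLD_FAILED,   /* bad GPA or wrong revision_id */
};

enum action { ACT_NONE, ACT_REFLECT_TO_L1, ACT_KILL_L1 };

/* Failure at nested_vmx_run(): L1 just attempted a bad VMLAUNCH,
 * so the error can be reflected to it (#UD / VMfailInvalid). */
static enum action on_fail_at_run(enum evmptrld_status s)
{
    return s == EVMPTRLD_SUCCEEDED ? ACT_NONE : ACT_REFLECT_TO_L1;
}

/* Failure at nested_get_vmcs12_pages(): happens after migration
 * while L2 was running; nothing sensible is left to do, report
 * an internal error to userspace. */
static enum action on_fail_at_get_pages(enum evmptrld_status s)
{
    return s == EVMPTRLD_SUCCEEDED ? ACT_NONE : ACT_KILL_L1;
}
```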
      Reported-by: Miaohe Lin <linmiaohe@huawei.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      [Squash kbuild autopatch. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: stop abusing need_vmcs12_to_shadow_sync for eVMCS mapping · e942dbf8
      Authored by Vitaly Kuznetsov
      When vmx_set_nested_state() happens, we may not have all the required
      data to map enlightened VMCS: e.g. HV_X64_MSR_VP_ASSIST_PAGE MSR may not
      yet be restored so we need a postponed action. Currently, we (ab)use
      need_vmcs12_to_shadow_sync/nested_sync_vmcs12_to_shadow() for that but
      this is not ideal:
      - We may not need to sync anything if L2 is running
       - It is hard to propagate errors from nested_sync_vmcs12_to_shadow(),
         as we call it from vmx_prepare_switch_to_guest(), which happens just
         before we do VMLAUNCH; the code is not ready to handle errors there.
      
      Move eVMCS mapping to nested_get_vmcs12_pages() and request
      KVM_REQ_GET_VMCS12_PAGES; this seems to be less abusive in nature.
      It would probably be possible to introduce a specialized
      KVM_REQ_EVMCS_MAP, but it is undesirable to propagate eVMCS
      specifics all the way up to x86.c.
      
      Note, we don't need to request KVM_REQ_GET_VMCS12_PAGES from
      vmx_set_nested_state() directly as nested_vmx_enter_non_root_mode() already
      does that. Requesting KVM_REQ_GET_VMCS12_PAGES is done to document the
      (non-obvious) side-effect and to be future proof.
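      The postponed mapping follows KVM's standard request-bit idiom: record
      that work is pending, then perform it on the next guest entry once all
      state has been restored. A minimal sketch (mocked, not the real
      kvm_make_request/kvm_check_request API surface):

```c
#define REQ_GET_VMCS12_PAGES (1u << 0)

struct vcpu {
    unsigned requests;
    int evmcs_mapped;
};

/* Models kvm_make_request(): mark the work as pending. */
static void make_request(struct vcpu *v, unsigned req)
{
    v->requests |= req;
}

/* Models kvm_check_request(): consume the pending bit if set. */
static int check_request(struct vcpu *v, unsigned req)
{
    if (v->requests & req) {
        v->requests &= ~req;
        return 1;
    }
    return 0;
}

/* Runs just before guest entry: map the eVMCS now that all needed
 * state (e.g. the VP assist page MSR) has been restored. */
static void vcpu_enter(struct vcpu *v)
{
    if (check_request(v, REQ_GET_VMCS12_PAGES))
        v->evmcs_mapped = 1;
}
```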
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>