提交 · b9757a4b6f46560cdd65afdb0e44bc06b7e60340 · openeuler / Kernel

28 9月, 2020 3 次提交

KVM: nVMX: Simplify the initialization of nested_vmx_msrs · b9757a4b

由 Chenyi Qiang 提交于 4年前

The nested VMX controls MSRs can be initialized by the global capability
values stored in vmcs_config.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200828085622.8365-6-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9757a4b

KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled · efc83133

由 Chenyi Qiang 提交于 4年前

KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they are not exposed by the
system ioctl KVM_GET_MSR.  Add them to the setup of nested VMX controls MSR.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20200828085622.8365-2-chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

efc83133

KVM: nSVM: Avoid freeing uninitialized pointers in svm_set_nested_state() · d5cd6f34

由 Vitaly Kuznetsov 提交于 4年前

The save and ctl pointers are passed uninitialized to kfree() when
svm_set_nested_state() follows the 'goto out_set_gif' path. While the
issue could've been fixed by initializing these on-stack varialbles to
NULL, it seems preferable to eliminate 'out_set_gif' label completely as
it is not actually a failure path and duplicating a single svm_set_gif()
call doesn't look too bad.

 [ bp: Drop obscure Addresses-Coverity: tag. ]

Fixes: 6ccbd29a ("KVM: SVM: nested: Don't allocate VMCB structures on stack")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reported-by: NJoerg Roedel <jroedel@suse.de>
Reported-by: NColin King <colin.king@canonical.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Acked-by: NJoerg Roedel <jroedel@suse.de>
Tested-by: NTom Lendacky <thomas.lendacky@amd.com>
Link: https://lkml.kernel.org/r/20200914133725.650221-1-vkuznets@redhat.comSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d5cd6f34

22 9月, 2020 5 次提交

Merge tag 'kvm-ppc-next-5.10-1' of... · 2e3df760

由 Paolo Bonzini 提交于 4年前

Merge tag 'kvm-ppc-next-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD

PPC KVM update for 5.10

- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes

2e3df760

Merge branch 'x86-seves-for-paolo' of... · bf3c0e5e

由 Paolo Bonzini 提交于 4年前

Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD

bf3c0e5e

KVM: PPC: Book3S: Fix symbol undeclared warnings · cf59eb13

由 Wang Wensheng 提交于 4年前

Build the kernel with `C=2`:
arch/powerpc/kvm/book3s_hv_nested.c:572:25: warning: symbol
'kvmhv_alloc_nested' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_mmu_radix.c:350:6: warning: symbol
'kvmppc_radix_set_pte_at' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv.c:3568:5: warning: symbol
'kvmhv_p9_guest_entry' was not declared. Should it be static?
arch/powerpc/kvm/book3s_hv_rm_xics.c:767:15: warning: symbol 'eoi_rc'
was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio_hv.c:240:13: warning: symbol
'iommu_tce_kill_rm' was not declared. Should it be static?
arch/powerpc/kvm/book3s_64_vio.c:492:6: warning: symbol
'kvmppc_tce_iommu_do_map' was not declared. Should it be static?
arch/powerpc/kvm/book3s_pr.c:572:6: warning: symbol 'kvmppc_set_pvr_pr'
was not declared. Should it be static?

Those symbols are used only in the files that define them so make them
static to fix the warnings.
Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cf59eb13

KVM: PPC: Book3S: Remove redundant initialization of variable ret · eb173559

由 Jing Xiangfeng 提交于 4年前

The variable ret is being initialized with '-ENOMEM' that is meaningless.
So remove it.
Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: NFabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

eb173559

KVM: PPC: Book3S HV: XIVE: Convert to DEFINE_SHOW_ATTRIBUTE · 45170766

由 Qinglang Miao 提交于 4年前

Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com>
Reviewed-by: NCédric Le Goater <clg@kaod.org>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

45170766

21 9月, 2020 3 次提交

Merge tag 'kvm-s390-master-5.9-1' of... · 32251b07

由 Paolo Bonzini 提交于 4年前

Merge tag 'kvm-s390-master-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master

KVM: s390: add documentation for KVM_CAP_S390_DIAG318

diag318 code was merged in 5.9-rc1, let us add some
missing documentation

32251b07

Merge tag 'kvmarm-fixes-5.9-2' of... · b73815a1

由 Paolo Bonzini 提交于 4年前

Merge tag 'kvmarm-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master

KVM/arm64 fixes for 5.9, take #2

- Fix handling of S1 Page Table Walk permission fault at S2
  on instruction fetch
- Cleanup kvm_vcpu_dabt_iswrite()

b73815a1

Revert "KVM: Check the allocation of pv cpu mask" · 7d1f8691

由 Vitaly Kuznetsov 提交于 4年前

The commit 0f990222 ("KVM: Check the allocation of pv cpu mask") we
have in 5.9-rc5 has two issue:
1) Compilation fails for !CONFIG_SMP, see:
   https://bugzilla.kernel.org/show_bug.cgi?id=209285

2) This commit completely disables PV TLB flush, see
   https://lore.kernel.org/kvm/87y2lrnnyf.fsf@vitty.brq.redhat.com/

The allocation problem is likely a theoretical one, if we don't
have memory that early in boot process we're likely doomed anyway.
Let's solve it properly later.

This reverts commit 0f990222.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7d1f8691

19 9月, 2020 2 次提交

KVM: arm64: Remove S1PTW check from kvm_vcpu_dabt_iswrite() · 620cf45f

由 Marc Zyngier 提交于 4年前

Now that kvm_vcpu_trap_is_write_fault() checks for S1PTW, there
is no need for kvm_vcpu_dabt_iswrite() to do the same thing, as
we already check for this condition on all existing paths.

Drop the check and add a comment instead.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NWill Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200915104218.1284701-3-maz@kernel.org

620cf45f

KVM: arm64: Assume write fault on S1PTW permission fault on instruction fetch · c4ad98e4

由 Marc Zyngier 提交于 4年前

KVM currently assumes that an instruction abort can never be a write.
This is in general true, except when the abort is triggered by
a S1PTW on instruction fetch that tries to update the S1 page tables
(to set AF, for example).

This can happen if the page tables have been paged out and brought
back in without seeing a direct write to them (they are thus marked
read only), and the fault handling code will make the PT executable(!)
instead of writable. The guest gets stuck forever.

In these conditions, the permission fault must be considered as
a write so that the Stage-1 update can take place. This is essentially
the I-side equivalent of the problem fixed by 60e21a0e ("arm64: KVM:
Take S1 walks into account when determining S2 write faults").

Update kvm_is_write_fault() to return true on IABT+S1PTW, and introduce
kvm_vcpu_trap_is_exec_fault() that only return true when no faulting
on a S1 fault. Additionally, kvm_vcpu_dabt_iss1tw() is renamed to
kvm_vcpu_abt_iss1tw(), as the above makes it plain that it isn't
specific to data abort.
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NWill Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200915104218.1284701-2-maz@kernel.org

c4ad98e4

17 9月, 2020 3 次提交

KVM: PPC: Book3S HV: Set LPCR[HDICE] before writing HDEC · 35dfb43c

由 Paul Mackerras 提交于 4年前

POWER8 and POWER9 machines have a hardware deviation where generation
of a hypervisor decrementer exception is suppressed if the HDICE bit
in the LPCR register is 0 at the time when the HDEC register
decrements from 0 to -1.  When entering a guest, KVM first writes the
HDEC register with the time until it wants the CPU to exit the guest,
and then writes the LPCR with the guest value, which includes
HDICE = 1.  If HDEC decrements from 0 to -1 during the interval
between those two events, it is possible that we can enter the guest
with HDEC already negative but no HDEC exception pending, meaning that
no HDEC interrupt will occur while the CPU is in the guest, or at
least not until HDEC wraps around.  Thus it is possible for the CPU to
keep executing in the guest for a long time; up to about 4 seconds on
POWER8, or about 4.46 years on POWER9 (except that the host kernel
hard lockup detector will fire first).

To fix this, we set the LPCR[HDICE] bit before writing HDEC on guest
entry.
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

35dfb43c

KVM: PPC: Book3S HV: Do not allocate HPT for a nested guest · 05e6295d

由 Fabiano Rosas 提交于 4年前

The current nested KVM code does not support HPT guests. This is
informed/enforced in some ways:

- Hosts < P9 will not be able to enable the nested HV feature;

- The nested hypervisor MMU capabilities will not contain
  KVM_CAP_PPC_MMU_HASH_V3;

- QEMU reflects the MMU capabilities in the
  'ibm,arch-vec-5-platform-support' device-tree property;

- The nested guest, at 'prom_parse_mmu_model' ignores the
  'disable_radix' kernel command line option if HPT is not supported;

- The KVM_PPC_CONFIGURE_V3_MMU ioctl will fail if trying to use HPT.

There is, however, still a way to start a HPT guest by using
max-compat-cpu=power8 at the QEMU machine options. This leads to the
guest being set to use hash after QEMU calls the KVM_PPC_ALLOCATE_HTAB
ioctl.

With the guest set to hash, the nested hypervisor goes through the
entry path that has no knowledge of nesting (kvmppc_run_vcpu) and
crashes when it tries to execute an hypervisor-privileged (mtspr
HDEC) instruction at __kvmppc_vcore_entry:

root@L1:~ $ qemu-system-ppc64 -machine pseries,max-cpu-compat=power8 ...

<snip>
[  538.543303] CPU: 83 PID: 25185 Comm: CPU 0/KVM Not tainted 5.9.0-rc4 #1
[  538.543355] NIP:  c00800000753f388 LR: c00800000753f368 CTR: c0000000001e5ec0
[  538.543417] REGS: c0000013e91e33b0 TRAP: 0700   Not tainted  (5.9.0-rc4)
[  538.543470] MSR:  8000000002843033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 22422882  XER: 20040000
[  538.543546] CFAR: c00800000753f4b0 IRQMASK: 3
               GPR00: c0080000075397a0 c0000013e91e3640 c00800000755e600 0000000080000000
               GPR04: 0000000000000000 c0000013eab19800 c000001394de0000 00000043a054db72
               GPR08: 00000000003b1652 0000000000000000 0000000000000000 c0080000075502e0
               GPR12: c0000000001e5ec0 c0000007ffa74200 c0000013eab19800 0000000000000008
               GPR16: 0000000000000000 c00000139676c6c0 c000000001d23948 c0000013e91e38b8
               GPR20: 0000000000000053 0000000000000000 0000000000000001 0000000000000000
               GPR24: 0000000000000001 0000000000000001 0000000000000000 0000000000000001
               GPR28: 0000000000000001 0000000000000053 c0000013eab19800 0000000000000001
[  538.544067] NIP [c00800000753f388] __kvmppc_vcore_entry+0x90/0x104 [kvm_hv]
[  538.544121] LR [c00800000753f368] __kvmppc_vcore_entry+0x70/0x104 [kvm_hv]
[  538.544173] Call Trace:
[  538.544196] [c0000013e91e3640] [c0000013e91e3680] 0xc0000013e91e3680 (unreliable)
[  538.544260] [c0000013e91e3820] [c0080000075397a0] kvmppc_run_core+0xbc8/0x19d0 [kvm_hv]
[  538.544325] [c0000013e91e39e0] [c00800000753d99c] kvmppc_vcpu_run_hv+0x404/0xc00 [kvm_hv]
[  538.544394] [c0000013e91e3ad0] [c0080000072da4fc] kvmppc_vcpu_run+0x34/0x48 [kvm]
[  538.544472] [c0000013e91e3af0] [c0080000072d61b8] kvm_arch_vcpu_ioctl_run+0x310/0x420 [kvm]
[  538.544539] [c0000013e91e3b80] [c0080000072c7450] kvm_vcpu_ioctl+0x298/0x778 [kvm]
[  538.544605] [c0000013e91e3ce0] [c0000000004b8c2c] sys_ioctl+0x1dc/0xc90
[  538.544662] [c0000013e91e3dc0] [c00000000002f9a4] system_call_exception+0xe4/0x1c0
[  538.544726] [c0000013e91e3e20] [c00000000000d140] system_call_common+0xf0/0x27c
[  538.544787] Instruction dump:
[  538.544821] f86d1098 60000000 60000000 48000099 e8ad0fe8 e8c500a0 e9264140 75290002
[  538.544886] 7d1602a6 7cec42a6 40820008 7d0807b4 <7d164ba6> 7d083a14 f90d10a0 480104fd
[  538.544953] ---[ end trace 74423e2b948c2e0c ]---

This patch makes the KVM_PPC_ALLOCATE_HTAB ioctl fail when running in
the nested hypervisor, causing QEMU to abort.
Reported-by: NSatheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

05e6295d

KVM: PPC: Don't return -ENOTSUPP to userspace in ioctls · 4e1b2ab7

由 Greg Kurz 提交于 4年前

ENOTSUPP is a linux only thingy, the value of which is unknown to
userspace, not to be confused with ENOTSUP which linux maps to
EOPNOTSUPP, as permitted by POSIX [1]:

[EOPNOTSUPP]
Operation not supported on socket. The type of socket (address family
or protocol) does not support the requested operation. A conforming
implementation may assign the same values for [EOPNOTSUPP] and [ENOTSUP].

Return -EOPNOTSUPP instead of -ENOTSUPP for the following ioctls:
- KVM_GET_FPU for Book3s and BookE
- KVM_SET_FPU for Book3s and BookE
- KVM_GET_DIRTY_LOG for BookE

This doesn't affect QEMU which doesn't call the KVM_GET_FPU and
KVM_SET_FPU ioctls on POWER anyway since they are not supported,
and _buggily_ ignores anything but -EPERM for KVM_GET_DIRTY_LOG.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.htmlSigned-off-by: NGreg Kurz <groug@kaod.org>
Acked-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

4e1b2ab7

14 9月, 2020 1 次提交

docs: kvm: add documentation for KVM_CAP_S390_DIAG318 · f20d4e92

由 Collin Walling 提交于 4年前

Documentation for the s390 DIAGNOSE 0x318 instruction handling.
Signed-off-by: NCollin Walling <walling@linux.ibm.com>
Reviewed-by: NCornelia Huck <cohuck@redhat.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/kvm/20200625150724.10021-2-walling@linux.ibm.com/
Message-Id: <20200625150724.10021-2-walling@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

f20d4e92

13 9月, 2020 4 次提交

KVM: emulator: more strict rsm checks. · 37f66bbe

由 Maxim Levitsky 提交于 4年前

Don't ignore return values in rsm_load_state_64/32 to avoid
loading invalid state from SMM state area if it was tampered with
by the guest.

This is primarly intended to avoid letting guest set bits in EFER
(like EFER.SVME when nesting is disabled) by manipulating SMM save area.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827171145.374620-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37f66bbe

KVM: nSVM: more strict SMM checks when returning to nested guest · 3ebb5d26

由 Maxim Levitsky 提交于 4年前

* check that guest is 64 bit guest, otherwise the SVM related fields
  in the smm state area are not defined

* If the SMM area indicates that SMM interrupted a running guest,
  check that EFER.SVME which is also saved in this area is set, otherwise
  the guest might have tampered with SMM save area, and so indicate
  emulation failure which should triple fault the guest.

* Check that that guest CPUID supports SVM (due to the same issue as above)
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3ebb5d26

SVM: nSVM: setup nested msr permission bitmap on nested state load · 772b81bb

由 Maxim Levitsky 提交于 4年前

This code was missing and was forcing the L2 run with L1's msr
permission bitmap
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

772b81bb

SVM: nSVM: correctly restore GIF on vmexit from nesting after migration · 9883764a

由 Maxim Levitsky 提交于 4年前

Currently code in svm_set_nested_state copies the current vmcb control
area to L1 control area (hsave->control), under assumption that
it mostly reflects the defaults that kvm choose, and later qemu
overrides  these defaults with L2 state using standard KVM interfaces,
like KVM_SET_REGS.

However nested GIF (which is AMD specific thing) is by default is true,
and it is copied to hsave area as such.

This alone is not a big deal since on VMexit, GIF is always set to false,
regardless of what it was on VM entry.  However in nested_svm_vmexit we
were first were setting GIF to false, but then we overwrite the control
fields with value from the hsave area.  (including the nested GIF field
itself if GIF virtualization is enabled).

Now on normal vm entry this is not a problem, since GIF is usually false
prior to normal vm entry, and this is the value that copied to hsave,
and then restored, but this is not always the case when the nested state
is loaded as explained above.

To fix this issue, move svm_set_gif after we restore the L1 control
state in nested_svm_vmexit, so that even with wrong GIF in the
saved L1 control area, we still clear GIF as the spec says.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9883764a

12 9月, 2020 13 次提交

x86/kvm: don't forget to ACK async PF IRQ · cc17b225

由 Vitaly Kuznetsov 提交于 4年前

Merge commit 26d05b36 ("Merge branch 'kvm-async-pf-int' into HEAD")
tried to adapt the new interrupt based async PF mechanism to the newly
introduced IDTENTRY magic but unfortunately it missed the fact that
DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and
all DEFINE_IDTENTRY_SYSVEC() users have to call it manually.

As the result all multi-CPU KVM guest hang on boot when
KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no
KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism
is currently disabled) but we're about to change that.

Fixes: 26d05b36 ("Merge branch 'kvm-async-pf-int' into HEAD")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200908135350.355053-3-vkuznets@redhat.com>
Tested-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc17b225

x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro · 244081f9

由 Vitaly Kuznetsov 提交于 4年前

DEFINE_IDTENTRY_SYSVEC() already contains irqentry_enter()/
irqentry_exit().
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200908135350.355053-2-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

244081f9

KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit · 99b82a14

由 Wanpeng Li 提交于 4年前

According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
Don't report internal error and freeze guest when event delivery causes
an APIC-access exit, it is handleable and the event will be re-injected
during the next vmentry.
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

99b82a14

KVM: SVM: avoid emulation with stale next_rip · e42c6828

由 Wanpeng Li 提交于 4年前

svm->next_rip is reset in svm_vcpu_run() only after calling
svm_exit_handlers_fastpath(), which will cause SVM's
skip_emulated_instruction() to write a stale RIP.

We can move svm_exit_handlers_fastpath towards the end of
svm_vcpu_run().  To align VMX with SVM, keep svm_complete_interrupts()
close as well.
Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
Cc: Paul K. <kronenpj@kronenpj.dyndns.org>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
[Also move vmcb_mark_all_clean before any possible write to the VMCB.
 - Paolo]

e42c6828

KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN · d831de17

由 Vitaly Kuznetsov 提交于 4年前

Even without in-kernel LAPIC we should allow writing '0' to
MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In
particular, QEMU with 'kernel-irqchip=off' fails to start
a guest with

qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0

Fixes: 9d3c447c ("KVM: X86: Fix async pf caused null-ptr-deref")
Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200911093147.484565-1-vkuznets@redhat.com>
[Actually commit the version proposed by Sean Christopherson. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d831de17

KVM: SVM: Periodically schedule when unregistering regions on destroy · 7be74942

由 David Rientjes 提交于 4年前

There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed.  This can lead to soft lockups.  For example, on a
host running 4.15:

watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
 [<0>] release_pages+0x159/0x3d0
 [<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
 [<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
 [<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
 [<0>] kvm_arch_destroy_vm+0x47/0x200
 [<0>] kvm_put_kvm+0x1a8/0x2f0
 [<0>] kvm_vm_release+0x25/0x30
 [<0>] do_exit+0x335/0xc10
 [<0>] do_group_exit+0x3f/0xa0
 [<0>] get_signal+0x1bc/0x670
 [<0>] do_signal+0x31/0x130

Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.

Periodically schedule if necessary.  This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7be74942

KVM: MIPS: Change the definition of kvm type · 15e9e35c

由 Huacai Chen 提交于 4年前

MIPS defines two kvm types:

 #define KVM_VM_MIPS_TE          0
 #define KVM_VM_MIPS_VZ          1

In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.

I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html

And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html

So I define like this:

 #define KVM_VM_MIPS_AUTO        0
 #define KVM_VM_MIPS_VZ          1
 #define KVM_VM_MIPS_TE          2

Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.
Signed-off-by: NHuacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

15e9e35c

kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed · f6f6195b

由 Lai Jiangshan 提交于 4年前

When kvm_mmu_get_page() gets a page with unsynced children, the spt
pagetable is unsynchronized with the guest pagetable. But the
guest might not issue a "flush" operation on it when the pagetable
entry is changed from zero or other cases. The hypervisor has the
responsibility to synchronize the pagetables.

KVM behaved as above for many years, But commit 8c8560b8
("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes")
inadvertently included a line of code to change it without giving any
reason in the changelog. It is clear that the commit's intention was to
change KVM_REQ_TLB_FLUSH -> KVM_REQ_TLB_FLUSH_CURRENT, so we don't
needlessly flush other contexts; however, one of the hunks changed
a nearby KVM_REQ_MMU_SYNC instead.  This patch changes it back.

Link: https://lore.kernel.org/lkml/20200320212833.3507-26-sean.j.christopherson@intel.com/
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20200902135421.31158-1-jiangshanlai@gmail.com>
fixes: 8c8560b8 ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes")
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f6f6195b

KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control · c6b177a3

由 Chenyi Qiang 提交于 4年前

A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field
in exit_ctls_high.

Fixes: 03a8871a ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control")
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200828085622.8365-5-chenyi.qiang@intel.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c6b177a3

KVM: fix memory leak in kvm_io_bus_unregister_dev() · f6588660

由 Rustam Kovhaev 提交于 4年前

when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them

Fixes: 90db1043 ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707Signed-off-by: NRustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f6588660

KVM: Check the allocation of pv cpu mask · 0f990222

由 Haiwei Li 提交于 4年前

check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when
successful.
Signed-off-by: NHaiwei Li <lihaiwei@tencent.com>
Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0f990222

KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected · 43fea4e4

由 Peter Shier 提交于 4年前

When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
load_pdptrs to read the possibly updated PDPTEs from the guest
physical address referenced by CR3.  It loads them into
vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
vcpu->arch.regs_dirty.

At the subsequent assumed reentry into L2, the mmu will call
vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
if the L2 CRn write intercept always resumes L2.

The resume path calls vmx_check_nested_events which checks for
exceptions, MTF, and expired VMX preemption timers. If
vmx_check_nested_events finds any of these conditions pending it will
reflect the corresponding exit into L1. Live migration at this point
would also cause a missed immediate reentry into L2.

After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty.  When L2 next
resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
values stored at last L2 exit. A repro of this bug showed L2 entering
triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.

When L2 is in PAE paging mode add a call to ept_load_pdptrs before
leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
vcpu->arch.walk_mmu->pdptrs[].

Tested:
kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
Verified that test fails without the fix.

Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
failures.
Signed-off-by: NPeter Shier <pshier@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Message-Id: <20200820230545.2411347-1-pshier@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

43fea4e4

Merge tag 'kvmarm-fixes-5.9-1' of... · 1b67fd08

由 Paolo Bonzini 提交于 4年前

Merge tag 'kvmarm-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for Linux 5.9, take #1

- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
  (dirty logging, for example)
- Fix tracing output of 64bit values

1b67fd08

08 9月, 2020 6 次提交

KVM: SVM: Use __packed shorthand · 976bc5e2

由 Borislav Petkov 提交于 4年前

Use the shorthand to make it more readable.

No functional changes.
Signed-off-by: NBorislav Petkov <bp@alien8.de>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-5-joro@8bytes.org

976bc5e2

KVM: SVM: Add GHCB Accessor functions · 3702c2f4

由 Joerg Roedel 提交于 4年前

Building a correct GHCB for the hypervisor requires setting valid bits
in the GHCB. Simplify that process by providing accessor functions to
set values and to update the valid bitmap and to check the valid bitmap
in KVM.
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-4-joro@8bytes.org

3702c2f4

KVM: SVM: Add GHCB definitions · d07f46f9

由 Tom Lendacky 提交于 4年前

Extend the vmcb_safe_area with SEV-ES fields and add a new
'struct ghcb' which will be used for guest-hypervisor communication.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-3-joro@8bytes.org

d07f46f9

KVM: SVM: nested: Don't allocate VMCB structures on stack · 6ccbd29a

由 Joerg Roedel 提交于 4年前

Do not allocate a vmcb_control_area and a vmcb_save_area on the stack,
as these structures will become larger with future extenstions of
SVM and thus the svm_set_nested_state() function will become a too large
stack frame.
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-2-joro@8bytes.org

6ccbd29a

Merge 'x86/cpu' to pick up dependent bits · c48f46ac

由 Borislav Petkov 提交于 4年前

Pick up work happening in parallel to avoid nasty merge conflicts later.
Signed-off-by: NBorislav Petkov <bp@suse.de>

c48f46ac

B
Merge 'x86/kaslr' to pick up dependent bits · 28b590f4
由 Borislav Petkov 提交于 4年前
```
Signed-off-by: NBorislav Petkov <bp@suse.de>
```
28b590f4

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功