提交 · a8c7804067ebb0479e8652ac86180b33d6ffc788 · openeuler / Kernel

22 8月, 2023 1 次提交

x86/cpu: Restore AMD's DE_CFG MSR after resume · a8c78040

由 Borislav Petkov 提交于 8月 22, 2023

stable inclusion
from stable-v5.10.155
commit 154d744fbefcd13648ff036db2d185319afa74dc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7M5F4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=154d744fbefcd13648ff036db2d185319afa74dc

--------------------------------

commit 2632daeb upstream.

DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.

Unify and correct naming while at it.

Fixes: e4d0e84e ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: NAndrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nsanglipeng <sanglipeng1@jd.com>
Signed-off-by: NYu Liao <liaoyu15@huawei.com>

a8c78040

04 4月, 2023 1 次提交

kvm: initialize all of the kvm_debugregs structure before sending it to userspace · b2923141

由 Greg Kroah-Hartman 提交于 4月 04, 2023

stable inclusion
from stable-v5.10.169
commit 6416c2108ba54d569e4c98d3b62ac78cb12e7107
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6OOP3
CVE: CVE-2023-1513

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6416c2108ba54d569e4c98d3b62ac78cb12e7107

--------------------------------

commit 2c10b614 upstream.

When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some unitialized portions of the kvm_debugregs structure that
could be copied to userspace.  Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.

Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Reported-by: NXingyuan Mo <hdthky0@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
Tested-by: NXingyuan Mo <hdthky0@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

b2923141

08 3月, 2023 1 次提交

KVM: x86: Mitigate the cross-thread return address predictions bug · 3270c3d4

由 Tom Lendacky 提交于 3月 08, 2023

stable inclusion
from stable-v5.15.94
commit 5122e0e44363e3d837592b78bc04222b9d289868
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C
CVE: CVE-2022-27672

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5122e0e44363e3d837592b78bc04222b9d289868

--------------------------------

commit 6f0f2d5e upstream.

By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.

Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

3270c3d4

18 11月, 2022 3 次提交

KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · 13523cdf

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit c72a9b1d0dadfd85d19eaea81f61f7b286c57a31
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c72a9b1d0dadfd85d19eaea81f61f7b286c57a31

--------------------------------

[ Upstream commit c2fe3cd4 ]

Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics.  This will be remedied in a future patch.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

13523cdf

KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) · 6d6f7a5a

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6

--------------------------------

[ Upstream commit 2368048b ]

Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.

Fixes: 890ca9ae ("KVM: Add MCE support")
Fixes: 9ffd986c ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

6d6f7a5a

KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors · e04324bf

由 Lev Kujawski 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit f5385a590df78d7649876a2087646090e867e6eb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f5385a590df78d7649876a2087646090e867e6eb

--------------------------------

[ Upstream commit 0471a7bd ]

Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
MC1_CTL to ignore single-bit ECC data errors.  Single-bit ECC data
errors are always correctable and thus are safe to ignore because they
are informational in nature rather than signaling a loss of data
integrity.

Prior to this patch, these guests would crash upon writing MC1_CTL,
with resultant error messages like the following:

error: kvm run failed Operation not permitted
EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
GDT=     ffff5020 000002cf
IDT=     ffff52f0 000007ff
CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
74 26 00 0f 31 c3
Signed-off-by: NLev Kujawski <lkujaw@member.fsf.org>
Message-Id: <20220521081511.187388-1-lkujaw@member.fsf.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e04324bf

02 11月, 2022 1 次提交

KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() · 4c0ad999

由 Vitaly Kuznetsov 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.132
commit eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3

--------------------------------

[ Upstream commit 8a414f94 ]

'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.

Fixes: a183b638 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

4c0ad999

16 9月, 2022 1 次提交

x86/kvm/vmx: Make noinstr clean · 0988e2d7

由 Peter Zijlstra 提交于 9月 16, 2022

stable inclusion
from stable-v5.10.133
commit 7070bbb66c5303117e4c7651711ea7daae4c64b5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS
CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7070bbb66c5303117e4c7651711ea7daae4c64b5

--------------------------------

commit 742ab6df upstream.

The recent mmio_stale_data fixes broke the noinstr constraints:

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section
  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section

make it all happy again.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0988e2d7

14 9月, 2022 1 次提交

KVM: x86: do not report a vCPU as preempted outside instruction boundaries · 3084ed12

由 Paolo Bonzini 提交于 9月 14, 2022

mainline inclusion
from mainline-v5.19-rc2
commit 6cd88243
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H
CVE: CVE-2022-39189

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd88243c7e03845a450795e134b488fc2afb736

----------------------------------------

If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access.  A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.

To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary.  A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.

It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed.  However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses.  Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

conflict:
	arch/x86/kvm/x86.c
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Nyezengruan <yezengruan@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

3084ed12

30 8月, 2022 1 次提交

KVM: x86: Properly handle APF vs disabled LAPIC situation · 17603121

由 Vitaly Kuznetsov 提交于 8月 30, 2022

stable inclusion
from stable-v5.10.119
commit 74c6e5d584354c6126f1231667f9d8e85d7f536f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6BB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74c6e5d584354c6126f1231667f9d8e85d7f536f

--------------------------------

commit 2f15d027 upstream.

Async PF 'page ready' event may happen when LAPIC is (temporary) disabled.
In particular, Sebastien reports that when Linux kernel is directly booted
by Cloud Hypervisor, LAPIC is 'software disabled' when APF mechanism is
initialized. On initialization KVM tries to inject 'wakeup all' event and
puts the corresponding token to the slot. It is, however, failing to inject
an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled())
so the guest never gets notified and the whole APF mechanism gets stuck.
The same issue is likely to happen if the guest temporary disables LAPIC
and a previously unavailable page becomes available.

Do two things to resolve the issue:
- Avoid dequeuing 'page ready' events from APF queue when LAPIC is
  disabled.
- Trigger an attempt to deliver pending 'page ready' events when LAPIC
  becomes enabled (SPIV or MSR_IA32_APICBASE).
Reported-by: NSebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210422092948.568327-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
[Guoqing: backport to 5.10-stable ]
Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

17603121

19 7月, 2022 1 次提交

KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded · 6d51bada

由 Sean Christopherson 提交于 7月 19, 2022

stable inclusion
from stable-v5.10.112
commit 342454231ee5f2c2782f5510cab2e7a968486fef
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5HL0X

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=342454231ee5f2c2782f5510cab2e7a968486fef

--------------------------------

commit 1d0e8480 upstream.

Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as
-1 is technically undefined behavior when its value is read out by
param_get_bool(), as boolean values are supposed to be '0' or '1'.

Alternatively, KVM could define a custom getter for the param, but the
auto value doesn't depend on the vendor module in any way, and printing
"auto" would be unnecessarily unfriendly to the user.

In addition to fixing the undefined behavior, resolving the auto value
also fixes the scenario where the auto value resolves to N and no vendor
module is loaded.  Previously, -1 would result in Y being printed even
though KVM would ultimately disable the mitigation.

Rename the existing MMU module init/exit helpers to clarify that they're
invoked with respect to the vendor module, and add comments to document
why KVM has two separate "module init" flows.

  =========================================================================
  UBSAN: invalid-load in kernel/params.c:320:33
  load of value 255 is not a valid value for type '_Bool'
  CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x34/0x44
   ubsan_epilogue+0x5/0x40
   __ubsan_handle_load_invalid_value.cold+0x43/0x48
   param_get_bool.cold+0xf/0x14
   param_attr_show+0x55/0x80
   module_attr_show+0x1c/0x30
   sysfs_kf_seq_show+0x93/0xc0
   seq_read_iter+0x11c/0x450
   new_sync_read+0x11b/0x1a0
   vfs_read+0xf0/0x190
   ksys_read+0x5f/0xe0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>
  =========================================================================

Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable@vger.kernel.org
Reported-by: NBruno Goncalves <bgoncalv@redhat.com>
Reported-by: NJan Stancek <jstancek@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220331221359.3912754-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

6d51bada

14 7月, 2022 1 次提交

KVM: x86/emulator: Emulate RDPID only if it is enabled in guest · 5a6fe98b

由 Hou Wenlong 提交于 7月 14, 2022

stable inclusion
from stable-v5.10.111
commit a24479c5e9f44c2a799ca2178b1ba9fe1290706e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a24479c5e9f44c2a799ca2178b1ba9fe1290706e

--------------------------------

[ Upstream commit a836839c ]

When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

5a6fe98b

05 7月, 2022 1 次提交

KVM: x86/speculation: Disable Fill buffer clear within guests · 2536c41e

由 Pawan Gupta 提交于 7月 05, 2022

stable inclusion
from stable-v5.10.123
commit bde15fdcce44956278b4f50680b7363ca126ffb9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS
CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=bde15fdcce44956278b4f50680b7363ca126ffb9

--------------------------------

commit 027bbb88 upstream

The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an
accurate indicator on all CPUs of whether the VERW instruction will
overwrite fill buffers. FB_CLEAR enumeration in
IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not
vulnerable to MDS/TAA, indicating that microcode does overwrite fill
buffers.

Guests running in VMM environments may not be aware of all the
capabilities/vulnerabilities of the host CPU. Specifically, a guest may
apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable
to MDS/TAA even when the physical CPU is not. On CPUs that enumerate
FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill
buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS
during VMENTER and resetting on VMEXIT. For guests that enumerate
FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM
will not use FB_CLEAR_DIS.

Irrespective of guest state, host overwrites CPU buffers before VMENTER
to protect itself from an MMIO capable guest, as part of mitigation for
MMIO Stale Data vulnerabilities.
Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Conflicts:
                arch/x86/kvm/vmx/vmx.h
Signed-off-by: NYipeng Zou <zouyipeng@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2536c41e

28 6月, 2022 1 次提交

KVM: x86: avoid calling x86 emulator without a decoded instruction · 09689aa3

由 Sean Christopherson 提交于 6月 28, 2022

mainline inclusion
from mainline-v5.19-rc1
commit fee060cd
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5BFBW
CVE: CVE-2022-1852

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fee060cd52d69c114b62d1a2948ea9648b5131f9

--------------------------------

Whenever x86_decode_emulated_instruction() detects a breakpoint, it
returns the value that kvm_vcpu_check_breakpoint() writes into its
pass-by-reference second argument.  Unfortunately this is completely
bogus because the expected outcome of x86_decode_emulated_instruction
is an EMULATION_* value.

Then, if kvm_vcpu_check_breakpoint() does "*r = 0" (corresponding to
a KVM_EXIT_DEBUG userspace exit), it is misunderstood as EMULATION_OK
and x86_emulate_instruction() is called without having decoded the
instruction.  This causes various havoc from running with a stale
emulation context.

The fix is to move the call to kvm_vcpu_check_breakpoint() where it was
before commit 4aa2691d ("KVM: x86: Factor out x86 instruction
emulation with decoding") introduced x86_decode_emulated_instruction().
The other caller of the function does not need breakpoint checks,
because it is invoked as part of a vmexit and the processor has already
checked those before executing the instruction that #GP'd.

This fixes CVE-2022-1852.

conficts:
       arch/x86/kvm/x86.c
Reported-by: NQiuhao Li <qiuhao@sysec.org>
Reported-by: NGaoning Pan <pgn@zju.edu.cn>
Reported-by: NYongkang Jia <kangel@zju.edu.cn>
Fixes: 4aa2691d ("KVM: x86: Factor out x86 instruction emulation with decoding")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311032801.3467418-2-seanjc@google.com>
[Rewrote commit message according to Qiuhao's report, since a patch
 already existed to fix the bug. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMa Junhai <majunhai2@huawei.com>
Signed-off-by: NGuan Jing <guanjing6@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

09689aa3

07 6月, 2022 1 次提交

KVM: SEV: add cache flush to solve SEV cache incoherency issues · 4c56ae2f

由 Mingwei Zhang 提交于 6月 07, 2022

mainline inclusion
from mainline-v5.18-rc4
commit 683412cc
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58DFI
CVE: CVE-2022-0171

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=683412ccf61294d727ead4a73d97397396e69a6b

--------------------------------

Flush the CPU caches when memory is reclaimed from an SEV guest (where
reclaim also includes it being unmapped from KVM's memslots).  Due to lack
of coherency for SEV encrypted memory, failure to flush results in silent
data corruption if userspace is malicious/broken and doesn't ensure SEV
guest memory is properly pinned and unpinned.

Cache coherency is not enforced across the VM boundary in SEV (AMD APM
vol.2 Section 15.34.7). Confidential cachelines, generated by confidential
VM guests have to be explicitly flushed on the host side. If a memory page
containing dirty confidential cachelines was released by VM and reallocated
to another user, the cachelines may corrupt the new user at a later time.

KVM takes a shortcut by assuming all confidential memory remain pinned
until the end of VM lifetime. Therefore, KVM does not flush cache at
mmu_notifier invalidation events. Because of this incorrect assumption and
the lack of cache flushing, malicous userspace can crash the host kernel:
creating a malicious VM and continuously allocates/releases unpinned
confidential memory pages when the VM is running.

Add cache flush operations to mmu_notifier operations to ensure that any
physical memory leaving the guest VM get flushed. In particular, hook
mmu_notifier_invalidate_range_start and mmu_notifier_release events and
flush cache accordingly. The hook after releasing the mmu lock to avoid
contention with other vCPUs.

Cc: stable@vger.kernel.org
Suggested-by: NSean Christpherson <seanjc@google.com>
Reported-by: NMingwei Zhang <mizhang@google.com>
Signed-off-by: NMingwei Zhang <mizhang@google.com>
Message-Id: <20220421031407.2516575-4-mizhang@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

conflicts:
    arch/x86/include/asm/kvm-x86-ops.h
    arch/x86/kvm/x86.c
    virt/kvm/kvm_main.c
Signed-off-by: NChen Jiahao <chenjiahao16@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4c56ae2f

10 5月, 2022 2 次提交

KVM: x86: Forcibly leave nested virt when SMM state is toggled · 826b7373

由 Sean Christopherson 提交于 5月 10, 2022

stable inclusion
from stable-v5.10.97
commit 080dbe7e9b86a0392d8dffc00d9971792afc121f
bugzilla: https://gitee.com/openeuler/kernel/issues/I55O0O

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=080dbe7e9b86a0392d8dffc00d9971792afc121f

--------------------------------

commit f7e57078 upstream.

Forcibly leave nested virtualization operation if userspace toggles SMM
state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions
are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
sufficient information to handle all transitions, e.g. SVM wants access
to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
KVM_SET_NESTED_STATE during state restore as the latter disallows putting
the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: stable@vger.kernel.org
Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220125220358.2091737-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Backported-by: NTadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYu Liao <liaoyu15@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

826b7373

KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS · 83e122dd

由 Like Xu 提交于 5月 10, 2022

stable inclusion
from stable-v5.10.96
commit e92cac1dd803aca5bc326ec22bdcd4f56855d7ce
bugzilla: https://gitee.com/openeuler/kernel/issues/I55NWB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e92cac1dd803aca5bc326ec22bdcd4f56855d7ce

--------------------------------

commit 4c282e51 upstream.

Do a runtime CPUID update for a vCPU if MSR_IA32_XSS is written, as the
size in bytes of the XSAVE area is affected by the states enabled in XSS.

Fixes: 20300099 ("kvm: vmx: add MSR logic for XSAVES")
Cc: stable@vger.kernel.org
Signed-off-by: NLike Xu <likexu@tencent.com>
[sean: split out as a separate patch, adjust Fixes tag]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220126172226.2298529-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NYu Liao <liaoyu15@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

83e122dd

27 4月, 2022 1 次提交

x86: KVM: Fixed the bug that WAITmax cannot be updated in real time · 414a578b

由 liangtian 提交于 4月 27, 2022

virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I53PTV?from=project-issue
CVE: NA

-----------------------------------------------------

Since the reset function is in kvm_intel module instead of kvm
module, the attribute weak function in kvm_main.c could not be found, which
would cause st_max in X86 never be refreshed.
The solution is to define the reset function in x86.c under the kvm module.
Signed-off-by: Nliangtian <liangtian13@huawei.com>
Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

414a578b

19 4月, 2022 2 次提交

KVM: x86: remove PMU FIXED_CTR3 from msrs_to_save_all · a0e48d8a

由 Wei Wang 提交于 4月 19, 2022

stable inclusion
from stable-v5.10.93
commit 4c7fb4d519e599bb69581d80fbfc1392cbea5fea
bugzilla: 186204 https://gitee.com/openeuler/kernel/issues/I5311N

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4c7fb4d519e599bb69581d80fbfc1392cbea5fea

--------------------------------

commit 9fb12fe5 upstream.

The fixed counter 3 is used for the Topdown metrics, which hasn't been
enabled for KVM guests. Userspace accessing to it will fail as it's not
included in get_fixed_pmc(). This breaks KVM selftests on ICX+ machines,
which have this counter.

To reproduce it on ICX+ machines, ./state_test reports:
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

==== Test Assertion Failure ====
lib/x86_64/processor.c:1078: r == nmsrs
pid=4564 tid=4564 - Argument list too long
1  0x000000000040b1b9: vcpu_save_state at processor.c:1077
2  0x0000000000402478: main at state_test.c:209 (discriminator 6)
3  0x00007fbe21ed5f92: ?? ??:0
4  0x000000000040264d: _start at ??:?
 Unexpected result from KVM_GET_MSRS, r: 17 (failed MSR was 0x30c)

With this patch, it works well.
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Message-Id: <20211217124934.32893-1-wei.w.wang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Fixes: e2ada66e ("kvm: x86: Add Intel PMU MSRs to msrs_to_save[]")
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a0e48d8a

KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest · 72420054

由 Sean Christopherson 提交于 4月 19, 2022

stable inclusion
from stable-v5.10.93
commit 413b427f5fff5d658c2605ca889d6b13b88efd0c
bugzilla: 186204 https://gitee.com/openeuler/kernel/issues/I5311N

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=413b427f5fff5d658c2605ca889d6b13b88efd0c

--------------------------------

commit f4b027c5 upstream.

Override the Processor Trace (PT) interrupt handler for guest mode if and
only if PT is configured for host+guest mode, i.e. is being used
independently by both host and guest.  If PT is configured for system
mode, the host fully controls PT and must handle all events.

Fixes: 8479e04e ("KVM: x86: Inject PMI for KVM guest")
Reported-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: NArtem Kashkanov <artem.kashkanov@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

72420054

14 1月, 2022 1 次提交

KVM: x86: Drop guest CPUID check for host initiated writes to MSR_IA32_PERF_CAPABILITIES · c8480258

由 Vitaly Kuznetsov 提交于 1月 14, 2022

stable inclusion
from stable-v5.10.88
commit 9439fabfc349c3e8eacfb6b0856d6739de129969
bugzilla: 186058 https://gitee.com/openeuler/kernel/issues/I4QW6A

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9439fabfc349c3e8eacfb6b0856d6739de129969

--------------------------------

[ Upstream commit 1aa2abb3 ]

The ability to write to MSR_IA32_PERF_CAPABILITIES from the host should
not depend on guest visible CPUID entries, even if just to allow
creating/restoring guest MSRs and CPUIDs in any sequence.

Fixes: 27461da3 ("KVM: x86/pmu: Support full width counting")
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20211216165213.338923-3-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c8480258

07 1月, 2022 1 次提交

KVM: VMX: read/write MSR_IA32_DEBUGCTLMSR from GUEST_IA32_DEBUGCTL · 52622a7d

由 Like Xu 提交于 1月 07, 2022

mainline inclusion
from mainline-v5.12-rc1
commit d855066f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d855066f81726155caf766e47eea58ae10b1fd57

-------------------

SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the
svm_get/set_msr, so the x86 common part can be safely moved to VMX.
This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL.

Add vmx_supported_debugctl() to refactor the throwing logic of #GP.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com>
[Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception
 to guest". - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NWangJian <wangjian161@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

52622a7d

30 12月, 2021 3 次提交

KVM: x86: Properly reset MMU context at vCPU RESET/INIT · 1d717e40

由 Sean Christopherson 提交于 12月 30, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 0aa18375
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0aa1837533e5f4be8cc21bbc06314c23ba2c5447

-------------------

Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT.  Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.

This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.

In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c4 ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.

In SVM, commit ebae871a ("kvm: svm: reset mmu on VCPU reset") forced
a MMU reset on every INIT to workaround the flaw in common x86.  Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.

The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.

Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.

Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1d717e40

KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs · d55f6cb5

由 Wanpeng Li 提交于 12月 30, 2021

mainline inclusion
from mainline-v5.12-rc4
commit c2162e13
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2162e13d6e2f43e5001a356196871642de070ba

-------------------

In order to deal with noncoherent DMA, we should execute wbinvd on
all dirty pCPUs when guest wbinvd exits to maintain data consistency.
smp_call_function_many() does not execute the provided function on the
local core, therefore replace it by on_each_cpu_mask().
Reported-by: NNadav Amit <namit@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d55f6cb5

kvm: SMM: fix losing SMI problem · 954d5f52

由 xulei 提交于 12月 30, 2021

virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
-------------------

fix losing SMI problem
Signed-off-by: Nxulei <stone.xulei@huawei.com>
Signed-off-by: NJingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

954d5f52

15 11月, 2021 2 次提交

kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[] · 64962ca1

由 Fares Mehanna 提交于 11月 15, 2021

stable inclusion
from stable-5.10.72
commit ae34f26d4a8487e37ab9cfdca72f447c1ef49aa5
bugzilla: 182982 https://gitee.com/openeuler/kernel/issues/I4I3L1

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ae34f26d4a8487e37ab9cfdca72f447c1ef49aa5

--------------------------------

[ Upstream commit e1fc1553 ]

Intel PMU MSRs is in msrs_to_save_all[], so add AMD PMU MSRs to have a
consistent behavior between Intel and AMD when using KVM_GET_MSRS,
KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST.

We have to add legacy and new MSRs to handle guests running without
X86_FEATURE_PERFCTR_CORE.
Signed-off-by: NFares Mehanna <faresx@amazon.de>
Message-Id: <20210915133951.22389-1-faresx@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

64962ca1

KVM: x86: Handle SRCU initialization failure during page track init · 2b28570d

由 Haimin Zhang 提交于 11月 15, 2021

stable inclusion
from stable-5.10.71
commit deb2949417677649e2413266d7ce8c2ff73952b4
bugzilla: 182981 https://gitee.com/openeuler/kernel/issues/I4I3KD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=deb2949417677649e2413266d7ce8c2ff73952b4

--------------------------------

commit eb7511bf upstream.

Check the return of init_srcu_struct(), which can fail due to OOM, when
initializing the page track mechanism.  Lack of checking leads to a NULL
pointer deref found by a modified syzkaller.
Reported-by: NTCS Robot <tcs_robot@tencent.com>
Signed-off-by: NHaimin Zhang <tcs_kernel@tencent.com>
Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
[Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2b28570d

19 10月, 2021 3 次提交

KVM: x86: Update vCPU's hv_clock before back to guest when tsc_offset is adjusted · 6e0e0a7e

由 Zelin Deng 提交于 10月 19, 2021

stable inclusion
from stable-5.10.65
commit c06e6ff2fcc34476eda29c5c14ca0db962f500cf
bugzilla: 182361 https://gitee.com/openeuler/kernel/issues/I4EH3U

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c06e6ff2fcc34476eda29c5c14ca0db962f500cf

--------------------------------

commit d9130a2d upstream.

When MSR_IA32_TSC_ADJUST is written by guest due to TSC ADJUST feature
especially there's a big tsc warp (like a new vCPU is hot-added into VM
which has been up for a long time), tsc_offset is added by a large value
then go back to guest. This causes system time jump as tsc_timestamp is
not adjusted in the meantime and pvclock monotonic character.
To fix this, just notify kvm to update vCPU's guest time before back to
guest.

Cc: stable@vger.kernel.org
Signed-off-by: NZelin Deng <zelin.deng@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <1619576521-81399-2-git-send-email-zelin.deng@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6e0e0a7e

KVM: X86: Fix warning caused by stale emulation context · 696d0b3f

由 Wanpeng Li 提交于 10月 19, 2021

stable inclusion
from stable-5.10.61
commit 8e100c72b6ded07d5900c20ca5ee5d0c6d9c4293
bugzilla: 177029 https://gitee.com/openeuler/kernel/issues/I4EAXD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8e100c72b6ded07d5900c20ca5ee5d0c6d9c4293

--------------------------------

[ Upstream commit da6393cd ]

Reported by syzkaller:

  WARNING: CPU: 7 PID: 10526 at linux/arch/x86/kvm//x86.c:7621 x86_emulate_instruction+0x41b/0x510 [kvm]
  RIP: 0010:x86_emulate_instruction+0x41b/0x510 [kvm]
  Call Trace:
   kvm_mmu_page_fault+0x126/0x8f0 [kvm]
   vmx_handle_exit+0x11e/0x680 [kvm_intel]
   vcpu_enter_guest+0xd95/0x1b40 [kvm]
   kvm_arch_vcpu_ioctl_run+0x377/0x6a0 [kvm]
   kvm_vcpu_ioctl+0x389/0x630 [kvm]
   __x64_sys_ioctl+0x8e/0xd0
   do_syscall_64+0x3c/0xb0
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Commit 4a1e10d5 ("KVM: x86: handle hardware breakpoints during emulation())
adds hardware breakpoints check before emulation the instruction and parts of
emulation context initialization, actually we don't have the EMULTYPE_NO_DECODE flag
here and the emulation context will not be reused. Commit c8848cee ("KVM: x86:
set ctxt->have_exception in x86_decode_insn()) triggers the warning because it
catches the stale emulation context has #UD, however, it is not during instruction
decoding which should result in EMULATION_FAILED. This patch fixes it by moving
the second part emulation context initialization into init_emulate_ctxt() and
before hardware breakpoints check. The ctxt->ud will be dropped by a follow-up
patch.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134683fdd00000

Reported-by: syzbot+71271244f206d17f6441@syzkaller.appspotmail.com
Fixes: 4a1e10d5 (KVM: x86: handle hardware breakpoints during emulation)
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <1622160097-37633-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

696d0b3f

KVM: x86: Factor out x86 instruction emulation with decoding · 0d7b3af7

由 Wei Huang 提交于 10月 19, 2021

stable inclusion
from stable-5.10.61
commit 47d4c799979996e512730de23a08c1d17dc373db
bugzilla: 177029 https://gitee.com/openeuler/kernel/issues/I4EAXD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47d4c799979996e512730de23a08c1d17dc373db

--------------------------------

[ Upstream commit 4aa2691d ]

Move the instruction decode part out of x86_emulate_instruction() for it
to be used in other places. Also kvm_clear_exception_queue() is moved
inside the if-statement as it doesn't apply when KVM are coming back from
userspace.
Co-developed-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NBandan Das <bsd@redhat.com>
Signed-off-by: NWei Huang <wei.huang2@amd.com>
Message-Id: <20210126081831.570253-2-wei.huang2@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0d7b3af7

15 10月, 2021 4 次提交

KVM: x86: accept userspace interrupt only if no event is injected · bd77045b

由 Paolo Bonzini 提交于 10月 15, 2021

stable inclusion
from stable-5.10.58
commit 309a31127befd0c9809036f3ea500fcf662f2bf3
bugzilla: 176984 https://gitee.com/openeuler/kernel/issues/I4E2P4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=309a31127befd0c9809036f3ea500fcf662f2bf3

--------------------------------

commit fa7a549d upstream.

Once an exception has been injected, any side effects related to
the exception (such as setting CR2 or DR6) have been taked place.
Therefore, once KVM sets the VM-entry interruption information
field or the AMD EVENTINJ field, the next VM-entry must deliver that
exception.

Pending interrupts are processed after injected exceptions, so
in theory it would not be a problem to use KVM_INTERRUPT when
an injected exception is present.  However, DOSEMU is using
run->ready_for_interrupt_injection to detect interrupt windows
and then using KVM_SET_SREGS/KVM_SET_REGS to inject the
interrupt manually.  For this to work, the interrupt window
must be delayed after the completion of the previous event
injection.

Cc: stable@vger.kernel.org
Reported-by: NStas Sergeev <stsp2@yandex.ru>
Tested-by: NStas Sergeev <stsp2@yandex.ru>
Fixes: 71cc849b ("KVM: x86: Fix split-irqchip vs interrupt injection window request")
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bd77045b

KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access · c01e563c

由 Vitaly Kuznetsov 提交于 10月 15, 2021

stable inclusion
from stable-5.10.56
commit 01f3581d4400f6586403409c5c3a3128fbb3e336
bugzilla: 176004 https://gitee.com/openeuler/kernel/issues/I4DYZ4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=01f3581d4400f6586403409c5c3a3128fbb3e336

--------------------------------

[ Upstream commit 0a31df68 ]

MSR_KVM_ASYNC_PF_ACK MSR is part of interrupt based asynchronous page fault
interface and not the original (deprecated) KVM_FEATURE_ASYNC_PF. This is
stated in Documentation/virt/kvm/msr.rst.

Fixes: 66570e96 ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NOliver Upton <oupton@google.com>
Message-Id: <20210722123018.260035-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c01e563c

KVM: x86: determine if an exception has an error code only when injecting it. · daee2e10

由 Maxim Levitsky 提交于 10月 15, 2021

stable inclusion
from stable-5.10.55
commit dee8119eaa9e00f0d1e14f050e1867813a49a1bb
bugzilla: 175636 https://gitee.com/openeuler/kernel/issues/I4DYWD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dee8119eaa9e00f0d1e14f050e1867813a49a1bb

--------------------------------

commit b97f0745 upstream.

A page fault can be queued while vCPU is in real paged mode on AMD, and
AMD manual asks the user to always intercept it
(otherwise result is undefined).
The resulting VM exit, does have an error code.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210225154135.405125-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NZubin Mithra <zsm@chromium.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

daee2e10

KVM: X86: Disable hardware breakpoints unconditionally before kvm_x86->run() · 1981172a

由 Lai Jiangshan 提交于 10月 14, 2021

stable inclusion
from stable-5.10.52
commit 8b2ae2de536cca8ee725add5bccce4abec235880
bugzilla: 175542 https://gitee.com/openeuler/kernel/issues/I4DTKU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8b2ae2de536cca8ee725add5bccce4abec235880

--------------------------------

commit f85d4016 upstream.

When the host is using debug registers but the guest is not using them
nor is the guest in guest-debug state, the kvm code does not reset
the host debug registers before kvm_x86->run().  Rather, it relies on
the hardware vmentry instruction to automatically reset the dr7 registers
which ensures that the host breakpoints do not affect the guest.

This however violates the non-instrumentable nature around VM entry
and exit; for example, when a host breakpoint is set on vcpu->arch.cr2,

Another issue is consistency.  When the guest debug registers are active,
the host breakpoints are reset before kvm_x86->run(). But when the
guest debug registers are inactive, the host breakpoints are delayed to
be disabled.  The host tracing tools may see different results depending
on what the guest is doing.

To fix the problems, we clear %db7 unconditionally before kvm_x86->run()
if the host has set any breakpoints, no matter if the guest is using
them or not.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20210628172632.81029-1-jiangshanlai@gmail.com>
Cc: stable@vger.kernel.org
[Only clear %db7 instead of reloading all debug registers. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

1981172a

13 10月, 2021 1 次提交

KVM: nVMX: Sync all PGDs on nested transition with shadow paging · 436ae6ed

由 Sean Christopherson 提交于 10月 13, 2021

stable inclusion
from stable-5.10.50
commit b2c5af71ce4b6e21f6af9fa292e72fd4662a2490
bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b2c5af71ce4b6e21f6af9fa292e72fd4662a2490

--------------------------------

[ Upstream commit 07ffaf34 ]

Trigger a full TLB flush on behalf of the guest on nested VM-Enter and
VM-Exit when VPID is disabled for L2.  kvm_mmu_new_pgd() syncs only the
current PGD, which can theoretically leave stale, unsync'd entries in a
previous guest PGD, which could be consumed if L2 is allowed to load CR3
with PCID_NOFLUSH=1.

Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can
be utilized for its obvious purpose of emulating a guest TLB flush.

Note, there is no change the actual TLB flush executed by KVM, even
though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT.  When VPID is
disabled for L2, vpid02 is guaranteed to be '0', and thus
nested_get_vpid02() will return the VPID that is shared by L1 and L2.

Generate the request outside of kvm_mmu_new_pgd(), as getting the common
helper to correctly identify which requested is needed is quite painful.
E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as
a TLB flush from the L1 kernel's perspective does not invalidate EPT
mappings.  And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future
simplification by moving the logic into nested_vmx_transition_tlb_flush().

Fixes: 41fab65e ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

436ae6ed

06 7月, 2021 2 次提交

KVM: X86: Fix x86_emulator slab cache leak · f9a6de85

由 Wanpeng Li 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 3a9934d6b8dd8a91d61ed2d0d538fa27cb9192a3
bugzilla: 168323
CVE: NA

--------------------------------

commit dfdc0a71 upstream.

Commit c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
tries to allocate per-vCPU emulation context dynamically, however, the
x86_emulator slab cache is still exiting after the kvm module is unload
as below after destroying the VM and unloading the kvm module.

grep x86_emulator /proc/slabinfo
x86_emulator          36     36   2672   12    8 : tunables    0    0    0 : slabdata      3      3      0

This patch fixes this slab cache leak by destroying the x86_emulator slab cache
when the kvm module is unloaded.

Fixes: c9b8b07c (KVM: x86: Dynamically allocate per-vCPU emulation context)
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1623387573-5969-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

f9a6de85

KVM: x86: Immediately reset the MMU context when the SMM flag is cleared · a557ec4c

由 Sean Christopherson 提交于 6月 30, 2021

stable inclusion
from stable-5.10.46
commit 669a8866e468fd020d34eb00e08cb41d3774b71b
bugzilla: 168323
CVE: NA

--------------------------------

commit 78fcb2c9 upstream.

Immediately reset the MMU context when the vCPU's SMM flag is cleared so
that the SMM flag in the MMU role is always synchronized with the vCPU's
flag.  If RSM fails (which isn't correctly emulated), KVM will bail
without calling post_leave_smm() and leave the MMU in a bad state.

The bad MMU role can lead to a NULL pointer dereference when grabbing a
shadow page's rmap for a page fault as the initial lookups for the gfn
will happen with the vCPU's SMM flag (=0), whereas the rmap lookup will
use the shadow page's SMM flag, which comes from the MMU (=1).  SMM has
an entirely different set of memslots, and so the initial lookup can find
a memslot (SMM=0) and then explode on the rmap memslot lookup (SMM=1).

  general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
  CPU: 1 PID: 8410 Comm: syz-executor382 Not tainted 5.13.0-rc5-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:__gfn_to_rmap arch/x86/kvm/mmu/mmu.c:935 [inline]
  RIP: 0010:gfn_to_rmap+0x2b0/0x4d0 arch/x86/kvm/mmu/mmu.c:947
  Code: <42> 80 3c 20 00 74 08 4c 89 ff e8 f1 79 a9 00 4c 89 fb 4d 8b 37 44
  RSP: 0018:ffffc90000ffef98 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff888015b9f414 RCX: ffff888019669c40
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
  RBP: 0000000000000001 R08: ffffffff811d9cdb R09: ffffed10065a6002
  R10: ffffed10065a6002 R11: 0000000000000000 R12: dffffc0000000000
  R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000000
  FS:  000000000124b300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000028e31000 CR4: 00000000001526e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   rmap_add arch/x86/kvm/mmu/mmu.c:965 [inline]
   mmu_set_spte+0x862/0xe60 arch/x86/kvm/mmu/mmu.c:2604
   __direct_map arch/x86/kvm/mmu/mmu.c:2862 [inline]
   direct_page_fault+0x1f74/0x2b70 arch/x86/kvm/mmu/mmu.c:3769
   kvm_mmu_do_page_fault arch/x86/kvm/mmu.h:124 [inline]
   kvm_mmu_page_fault+0x199/0x1440 arch/x86/kvm/mmu/mmu.c:5065
   vmx_handle_exit+0x26/0x160 arch/x86/kvm/vmx/vmx.c:6122
   vcpu_enter_guest+0x3bdd/0x9630 arch/x86/kvm/x86.c:9428
   vcpu_run+0x416/0xc20 arch/x86/kvm/x86.c:9494
   kvm_arch_vcpu_ioctl_run+0x4e8/0xa40 arch/x86/kvm/x86.c:9722
   kvm_vcpu_ioctl+0x70f/0xbb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3460
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:1069 [inline]
   __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:1055
   do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x440ce9

Cc: stable@vger.kernel.org
Reported-by: syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com
Fixes: 9ec19493 ("KVM: x86: clear SMM flags before loading state while leaving SMM")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210609185619.992058-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a557ec4c

15 6月, 2021 1 次提交

KVM: X86: Fix vCPU preempted state from guest's point of view · d3546be7

由 Wanpeng Li 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit bb2e3adf237cca502e9179f3118f000504fe6fb3
bugzilla: 55093
CVE: NA

--------------------------------

commit 1eff0ada upstream.

Commit 66570e96 (kvm: x86: only provide PV features if enabled in guest's
CPUID) avoids to access pv tlb shootdown host side logic when this pv feature
is not exposed to guest, however, kvm_steal_time.preempted not only leveraged
by pv tlb shootdown logic but also mitigate the lock holder preemption issue.
From guest's point of view, vCPU is always preempted since we lose the reset
of kvm_steal_time.preempted before vmentry if pv tlb shootdown feature is not
exposed. This patch fixes it by clearing kvm_steal_time.preempted before
vmentry.

Fixes: 66570e96 (kvm: x86: only provide PV features if enabled in guest's CPUID)
Reviewed-by: NSean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1621339235-11131-3-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d3546be7

03 6月, 2021 2 次提交

KVM: x86: Defer vtime accounting 'til after IRQ handling · c15ee071

由 Wanpeng Li 提交于 5月 29, 2021

stable inclusion
from stable-5.10.41
commit 514883ebac77ff9939da92e268b24a71c9fe4e05
bugzilla: 51890
CVE: NA

--------------------------------

commit 16045714 upstream.

Defer the call to account guest time until after servicing any IRQ(s)
that happened in the guest or immediately after VM-Exit.  Tick-based
accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
handler runs, and IRQs are blocked throughout the main sequence of
vcpu_enter_guest(), including the call into vendor code to actually
enter and exit the guest.

This fixes a bug where reported guest time remains '0', even when
running an infinite loop in the guest:

  https://bugzilla.kernel.org/show_bug.cgi?id=209831

Fixes: 87fa7f3e ("x86/kvm: Move context tracking where it belongs")
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Co-developed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c15ee071

KVM: VMX: Disable preemption when probing user return MSRs · 17763049

由 Sean Christopherson 提交于 5月 25, 2021

stable inclusion
from stable-5.10.38
commit 31f29749ee970c251b3a7e5b914108425940d089
bugzilla: 51875
CVE: NA

--------------------------------

commit 5104d7ff upstream.

Disable preemption when probing a user return MSR via RDSMR/WRMSR.  If
the MSR holds a different value per logical CPU, the WRMSR could corrupt
the host's value if KVM is preempted between the RDMSR and WRMSR, and
then rescheduled on a different CPU.

Opportunistically land the helper in common x86, SVM will use the helper
in a future commit.

Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: stable@vger.kernel.org
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-6-seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

17763049

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功