提交 · 13fc6196a231afca0021d3e8bbdc8d3bee25bc6b · openeuler / Kernel

22 8月, 2023 1 次提交

x86/cpu: Restore AMD's DE_CFG MSR after resume · d80a51fc

由 Borislav Petkov 提交于 8月 22, 2023

stable inclusion
from stable-v5.10.155
commit 154d744fbefcd13648ff036db2d185319afa74dc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7M5F4

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=154d744fbefcd13648ff036db2d185319afa74dc

--------------------------------

commit 2632daeb upstream.

DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.

Unify and correct naming while at it.

Fixes: e4d0e84e ("x86/cpu/AMD: Make LFENCE a serializing instruction")
Reported-by: NAndrew Cooper <Andrew.Cooper3@citrix.com>
Reported-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Nsanglipeng <sanglipeng1@jd.com>
Signed-off-by: NYu Liao <liaoyu15@huawei.com>

d80a51fc

04 4月, 2023 1 次提交

kvm: initialize all of the kvm_debugregs structure before sending it to userspace · 21495e58

由 Greg Kroah-Hartman 提交于 4月 04, 2023

stable inclusion
from stable-v5.10.169
commit 6416c2108ba54d569e4c98d3b62ac78cb12e7107
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6OOP3
CVE: CVE-2023-1513

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6416c2108ba54d569e4c98d3b62ac78cb12e7107

--------------------------------

commit 2c10b614 upstream.

When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some unitialized portions of the kvm_debugregs structure that
could be copied to userspace.  Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.

Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Reported-by: NXingyuan Mo <hdthky0@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
Tested-by: NXingyuan Mo <hdthky0@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

21495e58

08 3月, 2023 1 次提交

KVM: x86: Mitigate the cross-thread return address predictions bug · 8bba93fa

由 Tom Lendacky 提交于 3月 08, 2023

stable inclusion
from stable-v5.15.94
commit 5122e0e44363e3d837592b78bc04222b9d289868
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6FB6C
CVE: CVE-2022-27672

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5122e0e44363e3d837592b78bc04222b9d289868

--------------------------------

commit 6f0f2d5e upstream.

By default, KVM/SVM will intercept attempts by the guest to transition
out of C0. However, the KVM_CAP_X86_DISABLE_EXITS capability can be used
by a VMM to change this behavior. To mitigate the cross-thread return
address predictions bug (X86_BUG_SMT_RSB), a VMM must not be allowed to
override the default behavior to intercept C0 transitions.

Use a module parameter to control the mitigation on processors that are
vulnerable to X86_BUG_SMT_RSB. If the processor is vulnerable to the
X86_BUG_SMT_RSB bug and the module parameter is set to mitigate the bug,
KVM will not allow the disabling of the HLT, MWAIT and CSTATE exits.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Message-Id: <4019348b5e07148eb4d593380a5f6713b93c9a16.1675956146.git.thomas.lendacky@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

8bba93fa

07 2月, 2023 1 次提交

KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES · 9fce9f29

由 Jim Mattson 提交于 2月 07, 2023

stable inclusion
from stable-v5.10.142
commit eb0c614c426c5837808b924b0525c23b1d1ab164
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6CSFH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb0c614c426c5837808b924b0525c23b1d1ab164

--------------------------------

[ Upstream commit 0204750b ]

KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES
bits. When kvm_get_arch_capabilities() was originally written, there
were only a few bits defined in this MSR, and KVM could virtualize all
of them. However, over the years, several bits have been defined that
KVM cannot just blindly pass through to the guest without additional
work (such as virtualizing an MSR promised by the
IA32_ARCH_CAPABILITES feature bit).

Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off
any other bits that are set in the hardware MSR.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 5b76a3cf ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NVipin Sharma <vipinsh@google.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20220830174947.2182144-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: NZheng Zengkai <zhengzengkai@huawei.com>

9fce9f29

24 11月, 2022 9 次提交

KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer · 431c2fb4

由 Sean Christopherson 提交于 2月 02, 2022

mainline inclusion
from mainline-v5.17-rc3
commit 6e37ec88
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit 6e37ec88 KVM: x86: Use ERR_PTR_USR() to return -EFAULT as a __user pointer.

--------------------------------

Use ERR_PTR_USR() when returning -EFAULT from kvm_get_attr_addr(), sparse
complains about implicitly casting the kernel pointer from ERR_PTR() into
a __user pointer.

>> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression
   (different address spaces) @@     expected void [noderef] __user * @@     got void * @@
   arch/x86/kvm/x86.c:4342:31: sparse:     expected void [noderef] __user *
   arch/x86/kvm/x86.c:4342:31: sparse:     got void *
>> arch/x86/kvm/x86.c:4342:31: sparse: sparse: incorrect type in return expression
   (different address spaces) @@     expected void [noderef] __user * @@     got void * @@
   arch/x86/kvm/x86.c:4342:31: sparse:     expected void [noderef] __user *
   arch/x86/kvm/x86.c:4342:31: sparse:     got void *

No functional change intended.

Fixes: 56f289a8 ("KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220202005157.2545816-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

431c2fb4

KVM: x86: add system attribute to retrieve full set of supported xsave states · 953b514a

由 Paolo Bonzini 提交于 1月 26, 2022

mainline inclusion
from mainline-v5.17-rc2
commit dd6e6312
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit dd6e6312 KVM: x86: add system attribute to retrieve
full set of supported xsave states.

--------------------------------

Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled.  Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use.  In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.

Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

953b514a

KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr · c7cc4d56

由 Sean Christopherson 提交于 1月 27, 2022

mainline inclusion
from mainline-v5.17-rc2
commit 56f289a8
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit 56f289a8 KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr.

--------------------------------

Add a helper to handle converting the u64 userspace address embedded in
struct kvm_device_attr into a userspace pointer, it's all too easy to
forget the intermediate "unsigned long" cast as well as the truncation
check.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

c7cc4d56

kvm: x86: Disable interception for IA32_XFD on demand · 7b32cbb5

由 Kevin Tian 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit b5274b1b
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit b5274b1b kvm: x86: Disable interception for IA32_XFD on demand.

--------------------------------

Always intercepting IA32_XFD causes non-negligible overhead when this
register is updated frequently in the guest.

Disable r/w emulation after intercepting the first WRMSR(IA32_XFD)
with a non-zero value.

Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync
with the software states in fpstate and the per-cpu xfd cache. This
leads to two additional changes accordingly:

  - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring
    software states back in-sync with the MSR, before handle_exit_irqoff()
    is called.

  - Always trap #NM once write interception is disabled for IA32_XFD.
    The #NM exception is rare if the guest doesn't use dynamic
    features. Otherwise, there is at most one exception per guest
    task given a dynamic feature.

p.s. We have confirmed that SDM is being revised to say that
when setting IA32_XFD[18] the AMX register state is not guaranteed
to be preserved. This clarification avoids adding mess for a creative
guest which sets IA32_XFD[18]=1 before saving active AMX state to
its own storage.
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-22-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

7b32cbb5

kvm: x86: Add support for getting/setting expanded xstate buffer · 0d9e57e1

由 Guang Zeng 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit be50b206
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit be50b206 kvm: x86: Add support for getting/setting
expanded xstate buffer.

--------------------------------

With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set
xstate data from/to KVM. This doesn't work when dynamic xfeatures
(e.g. AMX) are exposed to the guest as they require a larger buffer
size.

Introduce a new capability (KVM_CAP_XSAVE2). Userspace VMM gets the
required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2).
KVM_SET_XSAVE is extended to work with both legacy and new capabilities
by doing properly-sized memdup_user() based on the guest fpu container.
KVM_GET_XSAVE is kept for backward-compatible reason. Instead,
KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred
interface for getting xstate buffer (4KB or larger size) from KVM
(Link: https://lkml.org/lkml/2021/12/15/510)

Also, update the api doc with the new KVM_GET_XSAVE2 ioctl.
Signed-off-by: NGuang Zeng <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-19-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

0d9e57e1

kvm: x86: Add XCR0 support for Intel AMX · 05370b7c

由 Jing Liu 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 86aff7a4
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit 86aff7a4 kvm: x86: Add XCR0 support for Intel AMX.

--------------------------------

Two XCR0 bits are defined for AMX to support XSAVE mechanism. Bit 17
is for tilecfg and bit 18 is for tiledata.

The value of XCR0[17:18] is always either 00b or 11b. Also, SDM
recommends that only 64-bit operating systems enable Intel AMX by
setting XCR0[18:17]. 32-bit host kernel never sets the tile bits in
vcpu->arch.guest_supported_xcr0.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-16-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

05370b7c

kvm: x86: Emulate IA32_XFD_ERR for guest · 074c1876

由 Jing Liu 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 548e8365
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit 548e8365 kvm: x86: Emulate IA32_XFD_ERR for guest.

--------------------------------

Emulate read/write to IA32_XFD_ERR MSR.

Only the saved value in the guest_fpu container is touched in the
emulation handler. Actual MSR update is handled right before entering
the guest (with preemption disabled)
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-14-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

074c1876

kvm: x86: Intercept #NM for saving IA32_XFD_ERR · 4a642360

由 Jing Liu 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit ec5be88a
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit ec5be88a kvm: x86: Intercept #NM for saving IA32_XFD_ERR.

--------------------------------

Guest IA32_XFD_ERR is generally modified in two places:

  - Set by CPU when #NM is triggered;
  - Cleared by guest in its #NM handler;

Intercept #NM for the first case when a nonzero value is written
to IA32_XFD. Nonzero indicates that the guest is willing to do
dynamic fpstate expansion for certain xfeatures, thus KVM needs to
manage and virtualize guest XFD_ERR properly. The vcpu exception
bitmap is updated in XFD write emulation according to guest_fpu::xfd.

Save the current XFD_ERR value to the guest_fpu container in the #NM
VM-exit handler. This must be done with interrupt disabled, otherwise
the unsaved MSR value may be clobbered by host activity.

The saving operation is conducted conditionally only when guest_fpu:xfd
includes a non-zero value. Doing so also avoids misread on a platform
which doesn't support XFD but #NM is triggered due to L1 interception.

Queueing #NM to the guest is postponed to handle_exception_nmi(). This
goes through the nested_vmx check so a virtual vmexit is queued instead
when #NM is triggered in L2 but L1 wants to intercept it.

Restore the host value (always ZERO outside of the host #NM
handler) before enabling interrupt.

Restore the guest value from the guest_fpu container right before
entering the guest (with interrupt disabled).
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NKevin Tian <kevin.tian@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-13-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

4a642360

kvm: x86: Add emulation for IA32_XFD · 5910de32

由 Jing Liu 提交于 1月 05, 2022

mainline inclusion
from mainline-v5.17-rc1
commit 820a6ee9
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ
CVE: NA

Intel-SIG: commit 820a6ee9 kvm: x86: Add emulation for IA32_XFD.

--------------------------------

Intel's eXtended Feature Disable (XFD) feature allows the software
to dynamically adjust fpstate buffer size for XSAVE features which
have large state.

Because guest fpstate has been expanded for all possible dynamic
xstates at KVM_SET_CPUID2, emulation of the IA32_XFD MSR is
straightforward. For write just call fpu_update_guest_xfd() to
update the guest fpu container once all the sanity checks are passed.
For read simply return the cached value in the container.
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Signed-off-by: NWei Wang <wei.w.wang@intel.com>
Signed-off-by: NJing Liu <jing2.liu@intel.com>
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Message-Id: <20220105123532.12586-11-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLin Wang <lin.x.wang@intel.com>

5910de32

18 11月, 2022 15 次提交

x86/kvm: Convert FPU handling to a single swap buffer · 38d23e01

由 Thomas Gleixner 提交于 10月 22, 2021

mainline inclusion
from mainline-v5.16-rc1
commit d69c1382
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit d69c1382 x86/kvm: Convert FPU handling to a single swap buffer.

--------------------------------

For the upcoming AMX support it's necessary to do a proper integration with
KVM. Currently KVM allocates two FPU structs which are used for saving the user
state of the vCPU thread and restoring the guest state when entering
vcpu_run() and doing the reverse operation before leaving vcpu_run().

With the new fpstate mechanism this can be reduced to one extra buffer by
swapping the fpstate pointer in current::thread::fpu. This makes the
upcoming support for AMX and XFD simpler because then fpstate information
(features, sizes, xfd) are always consistent and it does not require any
nasty workarounds.

Convert the KVM FPU code over to this new scheme.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211022185313.019454292@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

38d23e01

x86/KVM: Convert to fpstate · 0ba61ece

由 Thomas Gleixner 提交于 10月 13, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 1c57572d
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 1c57572d x86/KVM: Convert to fpstate.

--------------------------------

Convert KVM code to the new register storage mechanism in preparation for
dynamically sized buffers.

No functional change.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211013145322.451439983@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

0ba61ece

x86/fpu: Replace KVMs xstate component clearing · 0c034c63

由 Thomas Gleixner 提交于 10月 13, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 087df48c
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 087df48c x86/fpu: Replace KVMs xstate component clearing.

--------------------------------

In order to prepare for the support of dynamically enabled FPU features,
move the clearing of xstate components to the FPU core code.

No functional change.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211013145322.399567049@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

0c034c63

x86/fpu: Replace KVMs home brewed FPU copy to user · c32bda38

由 Thomas Gleixner 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.16-rc1
commit bf5d0047
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit bf5d0047 x86/fpu: Replace KVMs home brewed FPU copy to user.

--------------------------------

Similar to the copy from user function the FPU core has this already
implemented with all bells and whistles.

Get rid of the duplicated code and use the core functionality.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.244101845@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

c32bda38

x86/fpu: Replace KVMs home brewed FPU copy from user · f099a455

由 Thomas Gleixner 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.16-rc1
commit ea4d6938
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit ea4d6938 x86/fpu: Replace KVMs home brewed FPU copy from user.

--------------------------------

Copying a user space buffer to the memory buffer is already available in
the FPU core. The copy mechanism in KVM lacks sanity checks and needs to
use cpuid() to lookup the offset of each component, while the FPU core has
this information cached.

Make the FPU core variant accessible for KVM and replace the home brewed
mechanism.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.134065207@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

f099a455

x86/fpu: Move KVMs FPU swapping to FPU core · 4aeabe2f

由 Thomas Gleixner 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.16-rc1
commit a0ff0611
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit a0ff0611 x86/fpu: Move KVMs FPU swapping to FPU core.

--------------------------------

Swapping the host/guest FPU is directly fiddling with FPU internals which
requires 5 exports. The upcoming support of dynamically enabled states
would even need more.

Implement a swap function in the FPU core code and export that instead.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: https://lkml.kernel.org/r/20211015011539.076072399@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

4aeabe2f

x86/fpu: Cleanup xstate xcomp_bv initialization · b44cbb8b

由 Thomas Gleixner 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 126fe040
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 126fe040 x86/fpu: Cleanup xstate xcomp_bv initialization.

--------------------------------

No point in having this duplicated all over the place with needlessly
different defines.

Provide a proper initialization function which initializes user buffers
properly and make KVM use it.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.897664678@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

b44cbb8b

x86/pkru: Remove xstate fiddling from write_pkru() · 73a93bbd

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 72a6c08c
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 72a6c08c x86/pkru: Remove xstate fiddling from write_pkru().

--------------------------------

The PKRU value of a task is stored in task->thread.pkru when the task is
scheduled out. PKRU is restored on schedule in from there. So keeping the
XSAVE buffer up to date is a pointless exercise.

Remove the xstate fiddling and cleanup all related functions.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.897372712@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

73a93bbd

x86/pkeys: Move read_pkru() and write_pkru() · 7eda2126

由 Dave Hansen 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 784a4661
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 784a4661 x86/pkeys: Move read_pkru() and write_pkru().

--------------------------------

write_pkru() was originally used just to write to the PKRU register.  It
was mercifully short and sweet and was not out of place in pgtable.h with
some other pkey-related code.

But, later work included a requirement to also modify the task XSAVE
buffer when updating the register.  This really is more related to the
XSAVE architecture than to paging.

Move the read/write_pkru() to asm/pkru.h.  pgtable.h won't miss them.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

7eda2126

x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate() · cd369c86

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 1c61fada
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 1c61fada x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate().

--------------------------------

This is not a copy functionality. It restores the register state from the
supplied kernel buffer.

No functional changes.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121454.716058365@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

cd369c86

x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate() · 10212d26

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit ebe7234b
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit ebe7234b x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate().

--------------------------------

A copy is guaranteed to leave the source intact, which is not the case when
FNSAVE is used as that reinitilizes the registers.

Save does not make such guarantees and it matches what this is about,
i.e. to save the state for a later restore.

Rename it to save_fpregs_to_fpstate().
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121454.508853062@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

10212d26

x86/kvm: Avoid looking up PKRU in XSAVE buffer · 8ccdba41

由 Dave Hansen 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 71ef4533
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 71ef4533 x86/kvm: Avoid looking up PKRU in XSAVE buffer.

--------------------------------

PKRU is being removed from the kernel XSAVE/FPU buffers.  This removal
will probably include warnings for code that look up PKRU in those
buffers.

KVM currently looks up the location of PKRU but doesn't even use the
pointer that it gets back.  Rework the code to avoid calling
get_xsave_addr() except in cases where its result is actually used.

This makes the code more clear and also avoids the inevitable PKRU
warnings.

This is probably a good cleanup and could go upstream idependently
of any PKRU rework.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.541037562@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

8ccdba41

KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · 3c6b4607

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit c72a9b1d0dadfd85d19eaea81f61f7b286c57a31
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c72a9b1d0dadfd85d19eaea81f61f7b286c57a31

--------------------------------

[ Upstream commit c2fe3cd4 ]

Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics.  This will be remedied in a future patch.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

3c6b4607

KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) · e01cb813

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6

--------------------------------

[ Upstream commit 2368048b ]

Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.

Fixes: 890ca9ae ("KVM: Add MCE support")
Fixes: 9ffd986c ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e01cb813

KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors · e503ec39

由 Lev Kujawski 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit f5385a590df78d7649876a2087646090e867e6eb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f5385a590df78d7649876a2087646090e867e6eb

--------------------------------

[ Upstream commit 0471a7bd ]

Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
MC1_CTL to ignore single-bit ECC data errors.  Single-bit ECC data
errors are always correctable and thus are safe to ignore because they
are informational in nature rather than signaling a loss of data
integrity.

Prior to this patch, these guests would crash upon writing MC1_CTL,
with resultant error messages like the following:

error: kvm run failed Operation not permitted
EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
GDT=     ffff5020 000002cf
IDT=     ffff52f0 000007ff
CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
74 26 00 0f 31 c3
Signed-off-by: NLev Kujawski <lkujaw@member.fsf.org>
Message-Id: <20220521081511.187388-1-lkujaw@member.fsf.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e503ec39

03 11月, 2022 6 次提交

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit · f5e2ac5e

由 Chenyi Qiang 提交于 2月 02, 2021

mainline inclusion
from mainline-v5.13-rc2
commit e8ea85fb
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=e8ea85fb

Intel-SIG: commit e8ea85fb ("KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit")

-------------------------------------

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit

Bus lock debug exception introduces a new bit DR6_BUS_LOCK (bit 11 of
DR6) to indicate that bus lock #DB exception is generated. The set/clear
of DR6_BUS_LOCK is similar to the DR6_RTM. The processor clears
DR6_BUS_LOCK when the exception is generated. For all other #DB, the
processor sets this bit to 1. Software #DB handler should set this bit
before returning to the interrupted task.

In VMM, to avoid breaking the CPUs without bus lock #DB exception
support, activate the DR6_BUS_LOCK conditionally in DR6_FIXED_1 bits.
When intercepting the #DB exception caused by bus locks, bit 11 of the
exit qualification is set to identify it. The VMM should emulate the
exception by clearing the bit 11 of the guest DR6.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

f5e2ac5e

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW · 232e5522

由 Chenyi Qiang 提交于 2月 02, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 9a3ecd5e
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=9a3ecd5e

Intel-SIG: commit 9a3ecd5e ("KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW")

-------------------------------------

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW

DR6_INIT contains the 1-reserved bits as well as the bit that is cleared
to 0 when the condition (e.g. RTM) happens. The value can be used to
initialize dr6 and also be the XOR mask between the #DB exit
qualification (or payload) and DR6.

Concerning that DR6_INIT is used as initial value only once, rename it
to DR6_ACTIVE_LOW and apply it in other places, which would make the
incoming changes for bus lock debug exception more simple.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com>
[Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

232e5522

KVM: VMX: Enable Notify VM exit · 0cbdfd9b

由 Tao Xu 提交于 5月 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 2f4073e0
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=2f4073e0

Intel-SIG: commit 2f4073e0 ("KVM: VMX: Enable Notify VM exit")

-------------------------------------

KVM: VMX: Enable Notify VM exit

There are cases that malicious virtual machines can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs.

VMM can enable notify VM exit that a VM exit generated if no event
window occurs in VM non-root mode for a specified amount of time (notify
window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.

VM exit handling:
- Introduce a vcpu state notify_window_exits to records the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NTao Xu <tao3.xu@intel.com>
Co-developed-by: NChenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

0cbdfd9b

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault · af5a4488

由 Chenyi Qiang 提交于 5月 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit ed235117
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=ed235117

Intel-SIG: commit ed235117 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")

-------------------------------------

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault

For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

af5a4488

KVM: VMX: Enable bus lock VM exit · 26bba696

由 Chenyi Qiang 提交于 11月 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit fe6b6bc8
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=fe6b6bc8

Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")

-------------------------------------

KVM: VMX: Enable bus lock VM exit

Virtual Machine can exploit bus locks to degrade the performance of
system. Bus lock can be caused by split locked access to writeback(WB)
memory or by using locks on uncacheable(UC) memory. The bus lock is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores (which must wait for
the bus lock to be released before their memory operations can
complete).

To address the threat, bus lock VM exit is introduced to notify the VMM
when a bus lock was acquired, allowing it to enforce throttling or other
policy based mitigations.

A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
Detection" VM-execution control(bit 30 of Secondary Processor-based VM
execution controls). If delivery of this VM exit was preempted by a
higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
exit reason in vmcs field is set to 1.

In current implementation, the KVM exposes this capability through
KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
(i.e. off and exit) and enable it explicitly (disabled by default). If
bus locks in guest are detected by KVM, exit to user space even when
current exit reason is handled by KVM internally. Set a new field
KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there
is a bus lock detected in guest.

Document for Bus Lock VM exit is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

26bba696

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run · 580aa8e4

由 Chenyi Qiang 提交于 11月 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit 15aad3be
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=15aad3be

Intel-SIG: commit 15aad3be ("KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run")

-------------------------------------

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run

Reset the vcpu->run->flags at the beginning of kvm_arch_vcpu_ioctl_run.
It can avoid every thunk of code that needs to set the flag clear it,
which increases the odds of missing a case and ending up with a flag in
an undefined state.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-3-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

580aa8e4

02 11月, 2022 1 次提交

KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() · 3bad6ccc

由 Vitaly Kuznetsov 提交于 11月 02, 2022

stable inclusion
from stable-v5.10.132
commit eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3

--------------------------------

[ Upstream commit 8a414f94 ]

'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.

Fixes: a183b638 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220708125147.593975-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

3bad6ccc

08 10月, 2022 3 次提交

KVM: VMX: enable IPI virtualization · c52cf141

由 Chao Gao 提交于 4月 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit d588bb9b
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d588bb9be1da6aa750aa64875fe57369db983d8b

Intel-SIG: commit d588bb9b ("KVM: VMX: enable IPI virtualization")

-------------------------------------

KVM: VMX: enable IPI virtualization

With IPI virtualization enabled, the processor emulates writes to
APIC registers that would send IPIs. The processor sets the bit
corresponding to the vector in target vCPU's PIR and may send a
notification (IPI) specified by NDST and NV fields in target vCPU's
Posted-Interrupt Descriptor (PID). It is similar to what IOMMU
engine does when dealing with posted interrupt from devices.

A PID-pointer table is used by the processor to locate the PID of a
vCPU with the vCPU's APIC ID. The table size depends on maximum APIC
ID assigned for current VM session from userspace. Allocating memory
for PID-pointer table is deferred to vCPU creation, because irqchip
mode and VM-scope maximum APIC ID is settled at that point. KVM can
skip PID-pointer table allocation if !irqchip_in_kernel().

Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its
notification vector to wakeup vector. This can ensure that when an IPI
for blocked vCPUs arrives, VMM can get control and wake up blocked
vCPUs. And if a VCPU is preempted, its posted interrupt notification
is suppressed.

Note that IPI virtualization can only virualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs would still cause a
trap-like APIC-write VM-exit and need to be handled by VMM.
Signed-off-by: NChao Gao <chao.gao@intel.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154510.11938-1-guang.zeng@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJason Zeng <jason.zeng@intel.com>

c52cf141

KVM: x86: Allow userspace to set maximum VCPU id for VM · 59461b8c

由 Zeng Guang 提交于 4月 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 35875316
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=35875316384b71d23dc2a45a969732fc8cab16af

Intel-SIG: commit 35875316 ("KVM: x86: Allow userspace to set maximum VCPU id for VM")

-------------------------------------

KVM: x86: Allow userspace to set maximum VCPU id for VM

Introduce new max_vcpu_ids in KVM for x86 architecture. Userspace
can assign maximum possible vcpu id for current VM session using
KVM_CAP_MAX_VCPU_ID of KVM_ENABLE_CAP ioctl().

This is done for x86 only because the sole use case is to guide
memory allocation for PID-pointer table, a structure needed to
enable VMX IPI.

By default, max_vcpu_ids set as KVM_MAX_VCPU_ID.
Suggested-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154444.11888-1-guang.zeng@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJason Zeng <jason.zeng@intel.com>
Signed-off-by: NJason Zeng <jason.zeng@intel.com>

59461b8c

KVM: Move kvm_arch_vcpu_precreate() under kvm->lock · c8477db6

由 Zeng Guang 提交于 4月 19, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 1d5e740d
category: feature
feature: IPI Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1d5e740d518e02cea46325b3d37135bf9c08982a

Intel-SIG: commit 1d5e740d ("KVM: Move kvm_arch_vcpu_precreate() under kvm->lock")

-------------------------------------

KVM: Move kvm_arch_vcpu_precreate() under kvm->lock

kvm_arch_vcpu_precreate() targets to handle arch specific VM resource
to be prepared prior to the actual creation of vCPU. For example, x86
platform may need do per-VM allocation based on max_vcpu_ids at the
first vCPU creation. It probably leads to concurrency control on this
allocation as multiple vCPU creation could happen simultaneously. From
the architectual point of view, it's necessary to execute
kvm_arch_vcpu_precreate() under protect of kvm->lock.

Currently only arm64, x86 and s390 have non-nop implementations at the
stage of vCPU pre-creation. Remove the lock acquiring in s390's design
and make sure all architecture can run kvm_arch_vcpu_precreate() safely
under kvm->lock without recrusive lock issue.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NZeng Guang <guang.zeng@intel.com>
Message-Id: <20220419154409.11842-1-guang.zeng@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJason Zeng <jason.zeng@intel.com>

c8477db6

20 9月, 2022 2 次提交

x86/kvm/vmx: Make noinstr clean · 55b31b25

由 Peter Zijlstra 提交于 9月 20, 2022

stable inclusion
from stable-v5.10.133
commit 7070bbb66c5303117e4c7651711ea7daae4c64b5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS
CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7070bbb66c5303117e4c7651711ea7daae4c64b5

--------------------------------

commit 742ab6df upstream.

The recent mmio_stale_data fixes broke the noinstr constraints:

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section
  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section

make it all happy again.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NLin Yujun <linyujun809@huawei.com>
Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

55b31b25

KVM: x86: do not report a vCPU as preempted outside instruction boundaries · 0b66a631

由 Paolo Bonzini 提交于 9月 20, 2022

mainline inclusion
from mainline-v5.19-rc2
commit 6cd88243
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H
CVE: CVE-2022-39189

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd88243c7e03845a450795e134b488fc2afb736

----------------------------------------

If a vCPU is outside guest mode and is scheduled out, it might be in the
process of making a memory access.  A problem occurs if another vCPU uses
the PV TLB flush feature during the period when the vCPU is scheduled
out, and a virtual address has already been translated but has not yet
been accessed, because this is equivalent to using a stale TLB entry.

To avoid this, only report a vCPU as preempted if sure that the guest
is at an instruction boundary.  A rescheduling request will be delivered
to the host physical CPU as an external interrupt, so for simplicity
consider any vmexit *not* instruction boundary except for external
interrupts.

It would in principle be okay to report the vCPU as preempted also
if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the
vmentry/vmexit overhead unnecessarily, and optimistic spinning is
also unlikely to succeed.  However, leave it for later because right
now kvm_vcpu_check_block() is doing memory accesses.  Even
though the TLB flush issue only applies to virtual memory address,
it's very much preferrable to be conservative.
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

conflict:
	arch/x86/kvm/x86.c
Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0b66a631

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功