提交 · b44cbb8bb5100df1308ecb6bdd1199928e571bbe · openeuler / Kernel

18 11月, 2022 28 次提交

x86/fpu: Cleanup xstate xcomp_bv initialization · b44cbb8b

由 Thomas Gleixner 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.16-rc1
commit 126fe040
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 126fe040 x86/fpu: Cleanup xstate xcomp_bv initialization.

--------------------------------

No point in having this duplicated all over the place with needlessly
different defines.

Provide a proper initialization function which initializes user buffers
properly and make KVM use it.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20211015011538.897664678@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

b44cbb8b

x86/pkru: Remove xstate fiddling from write_pkru() · 73a93bbd

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 72a6c08c
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 72a6c08c x86/pkru: Remove xstate fiddling from write_pkru().

--------------------------------

The PKRU value of a task is stored in task->thread.pkru when the task is
scheduled out. PKRU is restored on schedule in from there. So keeping the
XSAVE buffer up to date is a pointless exercise.

Remove the xstate fiddling and cleanup all related functions.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.897372712@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

73a93bbd

x86/pkeys: Move read_pkru() and write_pkru() · 7eda2126

由 Dave Hansen 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 784a4661
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 784a4661 x86/pkeys: Move read_pkru() and write_pkru().

--------------------------------

write_pkru() was originally used just to write to the PKRU register.  It
was mercifully short and sweet and was not out of place in pgtable.h with
some other pkey-related code.

But, later work included a requirement to also modify the task XSAVE
buffer when updating the register.  This really is more related to the
XSAVE architecture than to paging.

Move the read/write_pkru() to asm/pkru.h.  pgtable.h won't miss them.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121455.102647114@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

7eda2126

x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate() · cd369c86

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 1c61fada
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 1c61fada x86/fpu: Rename copy_kernel_to_fpregs() to restore_fpregs_from_fpstate().

--------------------------------

This is not a copy functionality. It restores the register state from the
supplied kernel buffer.

No functional changes.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121454.716058365@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

cd369c86

x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate() · 10212d26

由 Thomas Gleixner 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit ebe7234b
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit ebe7234b x86/fpu: Rename copy_fpregs_to_fpstate() to save_fpregs_to_fpstate().

--------------------------------

A copy is guaranteed to leave the source intact, which is not the case when
FNSAVE is used as that reinitilizes the registers.

Save does not make such guarantees and it matches what this is about,
i.e. to save the state for a later restore.

Rename it to save_fpregs_to_fpstate().
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121454.508853062@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

10212d26

x86/kvm: Avoid looking up PKRU in XSAVE buffer · 8ccdba41

由 Dave Hansen 提交于 6月 23, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 71ef4533
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC
CVE: NA

Intel-SIG: commit 71ef4533 x86/kvm: Avoid looking up PKRU in XSAVE buffer.

--------------------------------

PKRU is being removed from the kernel XSAVE/FPU buffers.  This removal
will probably include warnings for code that look up PKRU in those
buffers.

KVM currently looks up the location of PKRU but doesn't even use the
pointer that it gets back.  Rework the code to avoid calling
get_xsave_addr() except in cases where its result is actually used.

This makes the code more clear and also avoids the inevitable PKRU
warnings.

This is probably a good cleanup and could go upstream idependently
of any PKRU rework.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121453.541037562@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

8ccdba41

kvm: x86/pmu: Fix the compare function used by the pmu event filter · 4e14dff5

由 Aaron Lewis 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 98b20e1612e69bf91185cf722a96293a136fe894
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=98b20e1612e69bf91185cf722a96293a136fe894

--------------------------------

commit 4ac19ead upstream.

When returning from the compare function the u64 is truncated to an
int.  This results in a loss of the high nybble[1] in the event select
and its sign if that nybble is in use.  Switch from using a result that
can end up being truncated to a result that can only be: 1, 0, -1.

[1] bits 35:32 in the event select register and bits 11:8 in the event
    select.

Fixes: 7ff775ac ("KVM: x86/pmu: Use binary search to check filtered events")
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220517051238.2566934-1-aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4e14dff5

KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast() · 5fddb9fa

由 Vitaly Kuznetsov 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit ac7de8c2ba1292856fdd4a4c0764669b9607cf0a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ac7de8c2ba1292856fdd4a4c0764669b9607cf0a

--------------------------------

commit 00b5f371 upstream

When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF
shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON()
instead of crashing the host.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-3-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NStefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

5fddb9fa

KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq · a5998ae3

由 Vitaly Kuznetsov 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 4c85e207c1b58249ea521670df577324ad69442c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4c85e207c1b58249ea521670df577324ad69442c

--------------------------------

commit 7ec37d1c upstream

When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel() so normally SynIC irqs should never be set. It is,
however,  possible for a misbehaving VMM to write to SYNIC/STIMER MSRs
causing erroneous behavior.

The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NStefan Ghinea <stefan.ghinea@windriver.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a5998ae3

KVM: x86/pmu: Ignore pmu->global_ctrl check if vPMU doesn't support global_ctrl · ce9e3ba2

由 Like Xu 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit b788508a09901b1324ae3afe8aa0897e380422af
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b788508a09901b1324ae3afe8aa0897e380422af

--------------------------------

[ Upstream commit 98defd2e ]

MSR_CORE_PERF_GLOBAL_CTRL is introduced as part of Architecture PMU V2,
as indicated by Intel SDM 19.2.2 and the intel_is_valid_msr() function.

So in the absence of global_ctrl support, all PMCs are enabled as AMD does.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220509102204.62389-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

ce9e3ba2

KVM: VMX: Mark all PERF_GLOBAL_(OVF)_CTRL bits reserved if there's no vPMU · b4e8a0cd

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 6b4addec2f2d68ed774b4b7b6264419b15aee3b0
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6b4addec2f2d68ed774b4b7b6264419b15aee3b0

--------------------------------

[ Upstream commit 93255bf9 ]

Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits
as reserved if there is no guest vPMU.  The nVMX VM-Entry consistency
checks do not check for a valid vPMU prior to consuming the masks via
kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask
to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded, the
actual MSR load will be rejected by intel_is_valid_msr()).

Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

b4e8a0cd

KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter · ad9c539b

由 Like Xu 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 46ec3d8e909429f18b3f859de356e6e182637cbc
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=46ec3d8e909429f18b3f859de356e6e182637cbc

--------------------------------

[ Upstream commit 2c985527 ]

The mask value of fixed counter control register should be dynamic
adjusted with the number of fixed counters. This patch introduces a
variable that includes the reserved bits of fixed counter control
registers. This is a generic code refactoring.
Co-developed-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NLuwei Kang <luwei.kang@intel.com>
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Message-Id: <20220411101946.20262-6-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

ad9c539b

KVM: x86/pmu: Use different raw event masks for AMD and Intel · 81305c9c

由 Jim Mattson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 2ba1feb14363fd311f2cc912b58e5bb0b2b73662
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ba1feb14363fd311f2cc912b58e5bb0b2b73662

--------------------------------

[ Upstream commit 95b065bf ]

The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.

Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: NJim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

81305c9c

KVM: x86/pmu: Use binary search to check filtered events · 6a083241

由 Jim Mattson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 4bbfc055d3a788c3b7278a41b2f7cd9f804502da
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4bbfc055d3a788c3b7278a41b2f7cd9f804502da

--------------------------------

[ Upstream commit 7ff775ac ]

The PMU event filter may contain up to 300 events. Replace the linear
search in reprogram_gp_counter() with a binary search.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220115052431.447232-2-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

6a083241

KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4 · a12d5b21

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit a7d0b21c6b406e441fffbe4239c36ce83447fc52
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a7d0b21c6b406e441fffbe4239c36ce83447fc52

--------------------------------

[ Upstream commit c7d855c2 ]

Inject a #UD if L1 attempts VMXON with a CR0 or CR4 that is disallowed
per the associated nested VMX MSRs' fixed0/1 settings.  KVM cannot rely
on hardware to perform the checks, even for the few checks that have
higher priority than VM-Exit, as (a) KVM may have forced CR0/CR4 bits in
hardware while running the guest, (b) there may incompatible CR0/CR4 bits
that have lower priority than VM-Exit, e.g. CR0.NE, and (c) userspace may
have further restricted the allowed CR0/CR4 values by manipulating the
guest's nested VMX MSRs.

Note, despite a very strong desire to throw shade at Jim, commit
70f3aac9 ("kvm: nVMX: Remove superfluous VMX instruction fault checks")
is not to blame for the buggy behavior (though the comment...).  That
commit only removed the CR0.PE, EFLAGS.VM, and COMPATIBILITY mode checks
(though it did erroneously drop the CPL check, but that has already been
remedied).  KVM may force CR0.PE=1, but will do so only when also
forcing EFLAGS.VM=1 to emulate Real Mode, i.e. hardware will still #UD.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=216033
Fixes: ec378aee ("KVM: nVMX: Implement VMXON and VMXOFF")
Reported-by: NEric Li <ercli@ucdavis.edu>
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

a12d5b21

KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · 3c6b4607

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit c72a9b1d0dadfd85d19eaea81f61f7b286c57a31
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c72a9b1d0dadfd85d19eaea81f61f7b286c57a31

--------------------------------

[ Upstream commit c2fe3cd4 ]

Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
KVM_SET_SREGS would return success while failing to actually set CR4.

Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
in __set_sregs() is not a viable option as KVM has already stuffed a
variety of vCPU state.

Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
inverted semantics.  This will be remedied in a future patch.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

3c6b4607

KVM: SVM: Drop VMXE check from svm_set_cr4() · c237d765

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 2f04a04d06509fbba0a4069dba5d71010139d921
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2f04a04d06509fbba0a4069dba5d71010139d921

--------------------------------

[ Upstream commit 311a0659 ]

Drop svm_set_cr4()'s explicit check CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps.  SVM obviously does not set X86_FEATURE_VMX.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-4-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

c237d765

KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4() · 26296e1c

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit da7f731f2ed5b4a082567967ce74be274aab2daf
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=da7f731f2ed5b4a082567967ce74be274aab2daf

--------------------------------

[ Upstream commit a447e38a ]

Drop vmx_set_cr4()'s explicit check on the 'nested' module param now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, via kvm_cpu_caps.  X86_FEATURE_VMX is set in kvm_cpu_caps
(by vmx_set_cpu_caps()), if and only if 'nested' is true.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

26296e1c

KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4() · be916451

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 8b8b376903b32d3d854f39eeebe018169c920cb6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8b8b376903b32d3d854f39eeebe018169c920cb6

--------------------------------

[ Upstream commit d3a9e414 ]

Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits.  This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.

Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: NStas Sergeev <stsp@users.sourceforge.net>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

be916451

KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) · e01cb813

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e7ccee2f09b06303fb39f8cb19a2c21d388dc4e6

--------------------------------

[ Upstream commit 2368048b ]

Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.

Fixes: 890ca9ae ("KVM: Add MCE support")
Fixes: 9ffd986c ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e01cb813

KVM: set_msr_mce: Permit guests to ignore single-bit ECC errors · e503ec39

由 Lev Kujawski 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit f5385a590df78d7649876a2087646090e867e6eb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f5385a590df78d7649876a2087646090e867e6eb

--------------------------------

[ Upstream commit 0471a7bd ]

Certain guest operating systems (e.g., UNIXWARE) clear bit 0 of
MC1_CTL to ignore single-bit ECC data errors.  Single-bit ECC data
errors are always correctable and thus are safe to ignore because they
are informational in nature rather than signaling a loss of data
integrity.

Prior to this patch, these guests would crash upon writing MC1_CTL,
with resultant error messages like the following:

error: kvm run failed Operation not permitted
EAX=fffffffe EBX=fffffffe ECX=00000404 EDX=ffffffff
ESI=ffffffff EDI=00000001 EBP=fffdaba4 ESP=fffdab20
EIP=c01333a5 EFL=00000246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0100 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0108 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0000 00000000 ffffffff 00c00000
GS =0000 00000000 ffffffff 00c00000
LDT=0118 c1026390 00000047 00008200 DPL=0 LDT
TR =0110 ffff5af0 00000067 00008b00 DPL=0 TSS32-busy
GDT=     ffff5020 000002cf
IDT=     ffff52f0 000007ff
CR0=8001003b CR2=00000000 CR3=0100a000 CR4=00000230
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
EFER=0000000000000000
Code=08 89 01 89 51 04 c3 8b 4c 24 08 8b 01 8b 51 04 8b 4c 24 04 <0f>
30 c3 f7 05 a4 6d ff ff 10 00 00 00 74 03 0f 31 c3 33 c0 33 d2 c3 8d
74 26 00 0f 31 c3
Signed-off-by: NLev Kujawski <lkujaw@member.fsf.org>
Message-Id: <20220521081511.187388-1-lkujaw@member.fsf.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e503ec39

KVM: x86: Tag kvm_mmu_x86_module_init() with __init · 0e9372f1

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 230e369d49976824e7af70d2ac814f0f4584352b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=230e369d49976824e7af70d2ac814f0f4584352b

--------------------------------

commit 982bae43 upstream.

Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists
is to initialize variables when kvm.ko is loaded, i.e. it must never be
called after module initialization.

Fixes: 1d0e8480 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded")
Cc: stable@vger.kernel.org
Reviewed-by: NKai Huang <kai.huang@intel.com>
Tested-by: NMichael Roth <michael.roth@amd.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

0e9372f1

KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP · fc82dff1

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 0dd8ba6670f4e38eab2f2985019030cd88cc4174
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0dd8ba6670f4e38eab2f2985019030cd88cc4174

--------------------------------

commit 26262069 upstream.

When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set
the error code to the selector.  Intel SDM's says nothing about the #GP,
but AMD's APM explicitly states that both LLDT and LTR set the error code
to the selector, not zero.

Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0),
but the KVM code in question is specific to the base from the descriptor.

Fixes: e37a75a1 ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

fc82dff1

KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks · 5adcaca0

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 68ba319b88889b76e0afcb83ee3f9a07d2ae5f1b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=68ba319b88889b76e0afcb83ee3f9a07d2ae5f1b

--------------------------------

commit ec6e4d86 upstream.

Wait to mark the TSS as busy during LTR emulation until after all fault
checks for the LTR have passed.  Specifically, don't mark the TSS busy if
the new TSS base is non-canonical.

Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the
only reason for the early check was to avoid marking a !PRESENT TSS as
busy, i.e. the common !PRESENT is now done before setting the busy bit.

Fixes: e37a75a1 ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

5adcaca0

KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value · d0f8994f

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit b670a585498e8225d2040d28af5f8fa8268faf31
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b670a585498e8225d2040d28af5f8fa8268faf31

--------------------------------

commit f8ae08f9 upstream.

Restrict the nVMX MSRs based on KVM's config, not based on the guest's
current config.  Using the guest's config to audit the new config
prevents userspace from restoring the original config (KVM's config) if
at any point in the past the guest's config was restricted in any way.

Fixes: 62cc6b9d ("KVM: nVMX: support restore of VMX capability MSRs")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-6-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

d0f8994f

KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0 · 61207477

由 Maciej S. Szmigiero 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 8bb683490278005b4caf61e22b0828a04d282e86
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8bb683490278005b4caf61e22b0828a04d282e86

--------------------------------

commit f17c31c4 upstream.

Don't BUG/WARN on interrupt injection due to GIF being cleared,
since it's trivial for userspace to force the situation via
KVM_SET_VCPU_EVENTS (even if having at least a WARN there would be correct
for KVM internally generated injections).

  kernel BUG at arch/x86/kvm/svm/svm.c:3386!
  invalid opcode: 0000 [#1] SMP
  CPU: 15 PID: 926 Comm: smm_test Not tainted 5.17.0-rc3+ #264
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:svm_inject_irq+0xab/0xb0 [kvm_amd]
  Code: <0f> 0b 0f 1f 00 0f 1f 44 00 00 80 3d ac b3 01 00 00 55 48 89 f5 53
  RSP: 0018:ffffc90000b37d88 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff88810a234ac0 RCX: 0000000000000006
  RDX: 0000000000000000 RSI: ffffc90000b37df7 RDI: ffff88810a234ac0
  RBP: ffffc90000b37df7 R08: ffff88810a1fa410 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff888109571000 R14: ffff88810a234ac0 R15: 0000000000000000
  FS:  0000000001821380(0000) GS:ffff88846fdc0000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f74fc550008 CR3: 000000010a6fe000 CR4: 0000000000350ea0
  Call Trace:
   <TASK>
   inject_pending_event+0x2f7/0x4c0 [kvm]
   kvm_arch_vcpu_ioctl_run+0x791/0x17a0 [kvm]
   kvm_vcpu_ioctl+0x26d/0x650 [kvm]
   __x64_sys_ioctl+0x82/0xb0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Fixes: 219b65dc ("KVM: SVM: Improve nested interrupt injection")
Cc: stable@vger.kernel.org
Co-developed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <35426af6e123cbe91ec7ce5132ce72521f02b1b5.1651440202.git.maciej.szmigiero@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

61207477

KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case · f9b8fbb1

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit 860e3343958a03ba8ad78750bdbe5dd1fe35ce02
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=860e3343958a03ba8ad78750bdbe5dd1fe35ce02

--------------------------------

commit 764643a6 upstream.

If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL
irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in
vmcs12.  When restoring nested state, e.g. after migration, without a
nested run pending, prepare_vmcs02() will propagate
nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into
vmcs02.GUEST_IA32_DEBUGCTL.

If userspace restores nested state before MSRs, then loading garbage is a
non-issue as loading DEBUGCTL will also update vmcs02.  But if usersepace
restores MSRs first, then KVM is responsible for propagating L2's value,
which is actually thrown into vmcs01, into vmcs02.

Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
is all kinds of bizarre and ideally would not be supported.  Sadly, some
VMMs do exactly that and rely on KVM to make things work.

Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL
to vmcs02 across RSM may corrupt L2's DEBUGCTL.  But KVM's entire VMX+SMM
emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
"default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.

Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

f9b8fbb1

KVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case · 5b7d0cbd

由 Sean Christopherson 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit ab4805c2638cc1f30697e7e8405c4355a13999ff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ab4805c2638cc1f30697e7e8405c4355a13999ff

--------------------------------

commit fa578398 upstream.

If a nested run isn't pending, snapshot vmcs01.GUEST_BNDCFGS irrespective
of whether or not VM_ENTRY_LOAD_BNDCFGS is set in vmcs12.  When restoring
nested state, e.g. after migration, without a nested run pending,
prepare_vmcs02() will propagate nested.vmcs01_guest_bndcfgs to vmcs02,
i.e. will load garbage/zeros into vmcs02.GUEST_BNDCFGS.

If userspace restores nested state before MSRs, then loading garbage is a
non-issue as loading BNDCFGS will also update vmcs02.  But if usersepace
restores MSRs first, then KVM is responsible for propagating L2's value,
which is actually thrown into vmcs01, into vmcs02.

Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
is all kinds of bizarre and ideally would not be supported.  Sadly, some
VMMs do exactly that and rely on KVM to make things work.

Note, there's still a lurking SMM bug, as propagating vmcs01.GUEST_BNDFGS
to vmcs02 across RSM may corrupt L2's BNDCFGS.  But KVM's entire VMX+SMM
emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
"default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.

Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
Fixes: 62cf9bd8 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable@vger.kernel.org
Cc: Lei Wang <lei4.wang@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

5b7d0cbd

03 11月, 2022 12 次提交

KVM: X86: Expose bus lock debug exception to guest · b9ddddea

由 Paolo Bonzini 提交于 5月 06, 2021

mainline inclusion
from mainline-v5.13-rc2
commit 76ea438b
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=76ea438b

Intel-SIG: commit 76ea438b ("KVM: X86: Expose bus lock debug exception to guest")

-------------------------------------

KVM: X86: Expose bus lock debug exception to guest

Bus lock debug exception is an ability to notify the kernel by an #DB
trap after the instruction acquires a bus lock and is executed when
CPL>0. This allows the kernel to enforce user application throttling or
mitigations.

Existence of bus lock debug exception is enumerated via
CPUID.(EAX=7,ECX=0).ECX[24]. Software can enable these exceptions by
setting bit 2 of the MSR_IA32_DEBUGCTL. Expose the CPUID to guest and
emulate the MSR handling when guest enables it.

Support for this feature was originally developed by Xiaoyao Li and
Chenyi Qiang, but code has since changed enough that this patch has
nothing in common with theirs, except for this commit message.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-4-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

b9ddddea

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit · f5e2ac5e

由 Chenyi Qiang 提交于 2月 02, 2021

mainline inclusion
from mainline-v5.13-rc2
commit e8ea85fb
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=e8ea85fb

Intel-SIG: commit e8ea85fb ("KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit")

-------------------------------------

KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit

Bus lock debug exception introduces a new bit DR6_BUS_LOCK (bit 11 of
DR6) to indicate that bus lock #DB exception is generated. The set/clear
of DR6_BUS_LOCK is similar to the DR6_RTM. The processor clears
DR6_BUS_LOCK when the exception is generated. For all other #DB, the
processor sets this bit to 1. Software #DB handler should set this bit
before returning to the interrupted task.

In VMM, to avoid breaking the CPUs without bus lock #DB exception
support, activate the DR6_BUS_LOCK conditionally in DR6_FIXED_1 bits.
When intercepting the #DB exception caused by bus locks, bit 11 of the
exit qualification is set to identify it. The VMM should emulate the
exception by clearing the bit 11 of the guest DR6.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

f5e2ac5e

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW · 232e5522

由 Chenyi Qiang 提交于 2月 02, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 9a3ecd5e
category: feature
feature: KVM Bus Lock Debug Exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=9a3ecd5e

Intel-SIG: commit 9a3ecd5e ("KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW")

-------------------------------------

KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW

DR6_INIT contains the 1-reserved bits as well as the bit that is cleared
to 0 when the condition (e.g. RTM) happens. The value can be used to
initialize dr6 and also be the XOR mask between the #DB exit
qualification (or payload) and DR6.

Concerning that DR6_INIT is used as initial value only once, rename it
to DR6_ACTIVE_LOW and apply it in other places, which would make the
incoming changes for bus lock debug exception more simple.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com>
[Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

232e5522

KVM: nSVM: set fixed bits by hand · fe26a9fc

由 Paolo Bonzini 提交于 11月 27, 2020

mainline inclusion
from mainline-v5.11-rc1
commit 8cce12b3
category: feature
feature: KVM bus lock debug exception
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=8cce12b3

Intel-SIG: commit 8cce12b3 ("KVM: nSVM: set fixed bits by hand")

-------------------------------------

KVM: nSVM: set fixed bits by hand

SVM generally ignores fixed-1 bits.  Set them manually so that we
do not end up by mistake without those bits set in struct kvm_vcpu;
it is part of userspace API that KVM always returns value with the
bits set.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

fe26a9fc

KVM: VMX: Enable Notify VM exit · 0cbdfd9b

由 Tao Xu 提交于 5月 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit 2f4073e0
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=2f4073e0

Intel-SIG: commit 2f4073e0 ("KVM: VMX: Enable Notify VM exit")

-------------------------------------

KVM: VMX: Enable Notify VM exit

There are cases that malicious virtual machines can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs.

VMM can enable notify VM exit that a VM exit generated if no event
window occurs in VM non-root mode for a specified amount of time (notify
window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.

VM exit handling:
- Introduce a vcpu state notify_window_exits to records the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NTao Xu <tao3.xu@intel.com>
Co-developed-by: NChenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

0cbdfd9b

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault · af5a4488

由 Chenyi Qiang 提交于 5月 24, 2022

mainline inclusion
from mainline-v6.0-rc1
commit ed235117
category: feature
feature: Notify VM exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=ed235117

Intel-SIG: commit ed235117 ("KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault")

-------------------------------------

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault

For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

af5a4488

KVM: VMX: Remove redundant handling of bus lock vmexit · 265cc29f

由 Hao Xiang 提交于 10月 15, 2021

mainline inclusion
from mainline-v5.15-rc7
commit d61863c6
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=d61863c6

Intel-SIG: commit d61863c6 ("KVM: VMX: Remove redundant handling of bus lock vmexit")

-------------------------------------

KVM: VMX: Remove redundant handling of bus lock vmexit

Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK
VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit
could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK.

We can remove redundant handling of bus lock vmexit. Unconditionally Set
exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with
KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit().
Signed-off-by: NHao Xiang <hao.xiang@linux.alibaba.com>
Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

265cc29f

KVM: nVMX: Fix nested bus lock VM exit · 78e6cda7

由 Chenyi Qiang 提交于 9月 14, 2021

mainline inclusion
from mainline-v5.15-rc4
commit 24a996ad
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=24a996ad

Intel-SIG: commit 24a996ad ("KVM: nVMX: Fix nested bus lock VM exit")

-------------------------------------

KVM: nVMX: Fix nested bus lock VM exit

Nested bus lock VM exits are not supported yet. If L2 triggers bus lock
VM exit, it will be directed to L1 VMM, which would cause unexpected
behavior. Therefore, handle L2's bus lock VM exits in L0 directly.

Fixes: fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210914095041.29764-1-chenyi.qiang@intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

78e6cda7

KVM: VMX: Enable bus lock VM exit · 26bba696

由 Chenyi Qiang 提交于 11月 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit fe6b6bc8
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=fe6b6bc8

Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit")

-------------------------------------

KVM: VMX: Enable bus lock VM exit

Virtual Machine can exploit bus locks to degrade the performance of
system. Bus lock can be caused by split locked access to writeback(WB)
memory or by using locks on uncacheable(UC) memory. The bus lock is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores (which must wait for
the bus lock to be released before their memory operations can
complete).

To address the threat, bus lock VM exit is introduced to notify the VMM
when a bus lock was acquired, allowing it to enforce throttling or other
policy based mitigations.

A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
Detection" VM-execution control(bit 30 of Secondary Processor-based VM
execution controls). If delivery of this VM exit was preempted by a
higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
exit reason in vmcs field is set to 1.

In current implementation, the KVM exposes this capability through
KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
(i.e. off and exit) and enable it explicitly (disabled by default). If
bus locks in guest are detected by KVM, exit to user space even when
current exit reason is handled by KVM internally. Set a new field
KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there
is a bus lock detected in guest.

Document for Bus Lock VM exit is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

26bba696

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run · 580aa8e4

由 Chenyi Qiang 提交于 11月 06, 2020

mainline inclusion
from mainline-v5.12-rc1
commit 15aad3be
category: feature
feature: KVM Bus Lock VM Exit
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=15aad3be

Intel-SIG: commit 15aad3be ("KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run")

-------------------------------------

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run

Reset the vcpu->run->flags at the beginning of kvm_arch_vcpu_ioctl_run.
It can avoid every thunk of code that needs to set the flag clear it,
which increases the odds of missing a case and ending up with a flag in
an undefined state.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-3-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

580aa8e4

KVM: Expose AVX_VNNI instruction to guset · b733e0a1

由 Yang Zhong 提交于 1月 05, 2021

mainline inclusion
from mainline-v5.12-rc1
commit 1085a6b5
category: feature
feature: SPR New Instructions Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5O6WB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=1085a6b5

Intel-SIG: commit 1085a6b5 ("KVM: Expose AVX_VNNI instruction to guset")

-------------------------------------

KVM: Expose AVX_VNNI instruction to guset

Expose AVX (VEX-encoded) versions of the Vector Neural Network
Instructions to guest.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 4] AVX_VNNI

The following instructions are available when this feature is
present in the guest.
  1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes
  2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
  3. VPDPWSSD: Multiply and Add Signed Word Integers
  4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation

This instruction is currently documented in the latest "extensions"
manual (ISE). It will appear in the "main" manual (SDM) in the future.
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Message-Id: <20210105004909.42000-3-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

b733e0a1

KVM: x86: Expose AVX512_FP16 for supported CPUID · f723ffab

由 Cathy Zhang 提交于 12月 07, 2020

mainline inclusion
from mainline-v5.11-rc1
commit 2224fc9e
category: feature
feature: SPR New Instructions Virtualization
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5O6WB
CVE: N/A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
commit/?id=2224fc9e

Intel-SIG: commit 2224fc9e ("KVM: x86: Expose AVX512_FP16 for supported CPUID")

-------------------------------------

KVM: x86: Expose AVX512_FP16 for supported CPUID

AVX512_FP16 is supported by Intel processors, like Sapphire Rapids.
It could gain better performance for it's faster compared to FP32
if the precision or magnitude requirements are met. It's availability
is indicated by CPUID.(EAX=7,ECX=0):EDX[bit 23].

Expose it in KVM supported CPUID, then guest could make use of it; no
new registers are used, only new instructions.
Signed-off-by: NCathy Zhang <cathy.zhang@intel.com>
Signed-off-by: NKyung Min Park <kyung.min.park@intel.com>
Acked-by: NDave Hansen <dave.hansen@intel.com>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Message-Id: <20201208033441.28207-3-kyung.min.park@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAichun Shi <aichun.shi@intel.com>

f723ffab

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功