- 05 July 2022, 1 commit

Submitted by Pawan Gupta

stable inclusion from stable-v5.10.123 commit 26f6f231f6a5a79ccc274967939b22602dec76e8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS
CVE: CVE-2022-21123, CVE-2022-21125, CVE-2022-21166
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=26f6f231f6a5a79ccc274967939b22602dec76e8

--------------------------------

commit 8cb861e9 upstream.

Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.

These vulnerabilities are broadly categorized as:

Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction.

Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer.

Shared Buffers Data Read (SBDR): Similar to SBDS, except that the data is directly read into the architectural software-visible state.

An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate them by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest.

On CPUs not affected by MDS and TAA, user applications cannot sample data from CPU fill buffers using MDS or TAA, but a guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate that case with the VERW instruction to clear fill buffers before VMENTER for MMIO-capable guests.

Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
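
For orientation, here is a minimal userspace C sketch of how a three-way switch such as mmio_stale_data={off|full|full,nosmt} can map onto mitigation modes. The enum and function names are invented for illustration; this is not the kernel's actual bootparam code.

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical mirror of the three mitigation states. */
  enum mmio_mitigation {
          MMIO_MITIGATION_OFF,        /* no VERW-based buffer clearing */
          MMIO_MITIGATION_VERW,       /* clear fill buffers with VERW */
          MMIO_MITIGATION_VERW_NOSMT, /* as above, plus disable SMT */
  };

  static enum mmio_mitigation parse_mmio_stale_data(const char *arg)
  {
          if (!strcmp(arg, "off"))
                  return MMIO_MITIGATION_OFF;
          if (!strcmp(arg, "full,nosmt"))
                  return MMIO_MITIGATION_VERW_NOSMT;
          return MMIO_MITIGATION_VERW; /* "full" (and the default) */
  }

  int main(void)
  {
          printf("full,nosmt -> mode %d\n", parse_mmio_stale_data("full,nosmt"));
          return 0;
  }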

- 25 May 2022, 1 commit

Submitted by Like Xu

stable inclusion from stable-v5.10.102 commit 99cd2a043760e4fcf06fe3c67e9885a2d64c986d
bugzilla: https://gitee.com/openeuler/kernel/issues/I567K6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=99cd2a043760e4fcf06fe3c67e9885a2d64c986d

--------------------------------

[ Upstream commit 7c174f30 ]

find_arch_event() returns an "unsigned int" value, which is used by pmc_reprogram_counter() to program a PERF_TYPE_HARDWARE type perf_event. The returned value is actually the kernel-defined generic perf_hw_id, so rename the function to pmc_perf_hw_id() with simpler incoming parameters for better self-explanation.

Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 18 May 2022, 3 commits

Submitted by Sean Christopherson

stable inclusion from stable-v5.10.101 commit 3aa5c8657292e05e6dfa8fe2316951001dab7e3a
bugzilla: https://gitee.com/openeuler/kernel/issues/I5669Z
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3aa5c8657292e05e6dfa8fe2316951001dab7e3a

--------------------------------

[ Upstream commit b9bed78e ]

Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1 while STI or MOVSS blocking is active. Setting the flag is necessary to make VM-Entry consistency checks happy, as VMX has an invariant that if RFLAGS.TF is set and STI/MOVSS blocking is true, then the previous instruction must have been STI or MOV/POP, and therefore a single-step #DB must be pending, since RFLAGS.TF cannot have been set by the previous instruction, i.e. the one-instruction delay after setting RFLAGS.TF must have already expired.

Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately when recording guest state as part of a VM-Exit, but #DB VM-Exits intentionally do not treat the #DB as "guest state", as interception of the #DB effectively makes the #DB host-owned; thus KVM needs to manually set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest.

Note, although this bug can be triggered by guest userspace, doing so requires IOPL=3, and guest userspace running with IOPL=3 has full access to all I/O ports (from the guest's perspective) and can crash/reboot the guest any number of ways. IOPL=3 is required because STI blocking kicks in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF.

MOVSS blocking can be initiated by userspace, but can be coincident with a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking is problematic only for CPL0 (and only if the guest is crazy enough to access a DR in a MOVSS shadow). All other sources of #DB are either suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), are mutually exclusive with MOVSS blocking (T-bit task switch), or are already handled by KVM (ICEBP, a.k.a. INT1).

This bug was originally found by running tests[1] created for XSA-308[2]. Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is presumably why the Xen bug was deemed to be an exploitable DOS from guest userspace. KVM already handles ICEBP by skipping the ICEBP instruction and thus clears MOVSS blocking as a side effect of its "emulation".

[1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html
[2] https://xenbits.xen.org/xsa/advisory-308.html

Reported-by: David Woodhouse <dwmw2@infradead.org>
Reported-by: Alexander Graf <graf@amazon.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220120000624.655815-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
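
A sketch of the invariant being restored. The VMCS field, flag, and helper names below follow the kernel's VMX code, but the function itself is illustrative, not the upstream diff.

  /* Sketch: when re-injecting a #DB while single-stepping (RFLAGS.TF=1),
   * also record the pending single-step trap so VM-Entry consistency
   * checks on STI/MOVSS blocking are satisfied. */
  static void reinject_db_sketch(struct kvm_vcpu *vcpu)
  {
          if (vmx_get_rflags(vcpu) & X86_EFLAGS_TF)
                  vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, DR6_BS);

          kvm_requeue_exception(vcpu, DB_VECTOR);
  }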

Submitted by Vitaly Kuznetsov

stable inclusion from stable-v5.10.101 commit 9efad4cb03658b62514d8f8992525b2912272b7b
bugzilla: https://gitee.com/openeuler/kernel/issues/I5669Z
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9efad4cb03658b62514d8f8992525b2912272b7b

--------------------------------

[ Upstream commit f80ae0ef ]

Similar to the MSR_IA32_VMX_EXIT_CTLS/MSR_IA32_VMX_TRUE_EXIT_CTLS and MSR_IA32_VMX_ENTRY_CTLS/MSR_IA32_VMX_TRUE_ENTRY_CTLS pairs, MSR_IA32_VMX_TRUE_PINBASED_CTLS needs to be filtered the same way MSR_IA32_VMX_PINBASED_CTLS is currently filtered, as guests may rely solely on the 'true' MSR data.

Note, none of the currently existing Windows/Hyper-V versions are known to stumble upon the unfiltered MSR_IA32_VMX_TRUE_PINBASED_CTLS; the change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
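
Illustrative only: the general shape of this kind of filtering, masking eVMCSv1-unsupported bits out of the allowed-1 (high) half of a reported VMX capability MSR. The EVMCS1_UNSUPPORTED_PINCTRL constant name matches the kernel's evmcs.h; the function name is made up.

  /* Sketch: strip pin-based controls that enlightened VMCS v1 cannot
   * express from the allowed-1 (high) half of the capability MSR. */
  static u64 evmcs_filter_pinbased_sketch(u64 msr_val)
  {
          return msr_val & ~((u64)EVMCS1_UNSUPPORTED_PINCTRL << 32);
  }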

Submitted by Vitaly Kuznetsov

stable inclusion from stable-v5.10.101 commit db58a3d978b4c22472125e5e2e4aa554e5330757
bugzilla: https://gitee.com/openeuler/kernel/issues/I5669Z
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=db58a3d978b4c22472125e5e2e4aa554e5330757

--------------------------------

[ Upstream commit 7a601e2c ]

Enlightened VMCS v1 doesn't have the VMX_PREEMPTION_TIMER_VALUE field, and PIN_BASED_VMX_PREEMPTION_TIMER is also filtered out already, so it makes sense to filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER too.

Note, none of the currently existing Windows/Hyper-V versions are known to enable 'save VMX-preemption timer value' when eVMCS is in use; the change is aimed at making the filtering future proof.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220112170134.1904308-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 10 May 2022, 1 commit

Submitted by Sean Christopherson

stable inclusion from stable-v5.10.97 commit 080dbe7e9b86a0392d8dffc00d9971792afc121f
bugzilla: https://gitee.com/openeuler/kernel/issues/I55O0O
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=080dbe7e9b86a0392d8dffc00d9971792afc121f

--------------------------------

commit f7e57078 upstream.

Forcibly leave nested virtualization operation if userspace toggles SMM state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS. If userspace forces the vCPU out of SMM while it's post-VMXON and then injects an SMI, vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't sufficient information to handle all transitions, e.g. SVM wants access to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede KVM_SET_NESTED_STATE during state restore as the latter disallows putting the vCPU into L2 if SMM is active, and disallows tagging the vCPU as being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: stable@vger.kernel.org
Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220358.2091737-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Backported-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 27 April 2022, 2 commits

Submitted by liangtian

virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I53PTV?from=project-issue
CVE: NA

-----------------------------------------------------

Since the reset function is in the kvm_intel module instead of the kvm module, the weak attribute function in kvm_main.c could not be found, which would cause st_max on x86 to never be refreshed. The solution is to define the reset function in x86.c under the kvm module.

Signed-off-by: liangtian <liangtian13@huawei.com>
Reviewed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Marcelo Tosatti

stable inclusion from stable-v5.10.94 commit aa1346113c752783f585d1d08627cfa38aa14e47
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=aa1346113c752783f585d1d08627cfa38aa14e47

--------------------------------

commit 5f02ef74 upstream.

blocked_vcpu_on_cpu_lock is taken from hard interrupt context (pi_wakeup_handler), therefore it cannot sleep. Switch it to a raw spinlock.

Fixes:

  [41297.066254] BUG: scheduling while atomic: CPU 0/KVM/635218/0x00010001
  [41297.066323] Preemption disabled at:
  [41297.066324] [<ffffffff902ee47f>] irq_enter_rcu+0xf/0x60
  [41297.066339] Call Trace:
  [41297.066342]  <IRQ>
  [41297.066346]  dump_stack_lvl+0x34/0x44
  [41297.066353]  ? irq_enter_rcu+0xf/0x60
  [41297.066356]  __schedule_bug.cold+0x7d/0x8b
  [41297.066361]  __schedule+0x439/0x5b0
  [41297.066365]  ? task_blocks_on_rt_mutex.constprop.0.isra.0+0x1b0/0x440
  [41297.066369]  schedule_rtlock+0x1e/0x40
  [41297.066371]  rtlock_slowlock_locked+0xf1/0x260
  [41297.066374]  rt_spin_lock+0x3b/0x60
  [41297.066378]  pi_wakeup_handler+0x31/0x90 [kvm_intel]
  [41297.066388]  sysvec_kvm_posted_intr_wakeup_ipi+0x9d/0xd0
  [41297.066392]  </IRQ>
  [41297.066392]  asm_sysvec_kvm_posted_intr_wakeup_ipi+0x12/0x20
  ...

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
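
The pattern of the fix, as a simplified sketch. The real lock is per-CPU; this version only shows the raw_spinlock_t substitution that keeps the lock non-sleeping on PREEMPT_RT.

  #include <linux/spinlock.h>

  /* A raw_spinlock_t never becomes a sleeping lock on PREEMPT_RT, so it
   * is safe to take from hard-IRQ context such as pi_wakeup_handler(). */
  static DEFINE_RAW_SPINLOCK(blocked_vcpu_lock);

  static void pi_wakeup_sketch(void)
  {
          raw_spin_lock(&blocked_vcpu_lock);
          /* ... walk the blocked-vCPU list and wake any vCPU whose
           * posted-interrupt notification has arrived ... */
          raw_spin_unlock(&blocked_vcpu_lock);
  }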

- 19 April 2022, 1 commit

Submitted by Sean Christopherson

stable inclusion from stable-v5.10.93 commit 413b427f5fff5d658c2605ca889d6b13b88efd0c
bugzilla: 186204, https://gitee.com/openeuler/kernel/issues/I5311N
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=413b427f5fff5d658c2605ca889d6b13b88efd0c

--------------------------------

commit f4b027c5 upstream.

Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events.

Fixes: 8479e04e ("KVM: x86: Inject PMI for KVM guest")
Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Reported-by: Artem Kashkanov <artem.kashkanov@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

- 26 January 2022, 1 commit

Submitted by Sean Christopherson

stable inclusion from stable-v5.10.89 commit 28626e76baf50e6b37d8a92564844d873aa6b51f
bugzilla: 186140, https://gitee.com/openeuler/kernel/issues/I4S8HA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=28626e76baf50e6b37d8a92564844d873aa6b51f

--------------------------------

commit fdba608f upstream.

Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU.

The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g.

  vfio_msihandler()
  |
  |-> eventfd_signal()
      |
      |-> ...
          |-> irqfd_wakeup()
              |-> kvm_arch_set_irq_inatomic()
                  |-> kvm_irq_delivery_to_apic_fast()
                      |-> kvm_apic_set_irq()

This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups.

Fixes: 379a3c8e ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reported-by: Longpeng (Mike) <longpeng2@huawei.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211208015236.1616697-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 14 January 2022, 4 commits

Submitted by Dan Carpenter

stable inclusion from stable-v5.10.84 commit 64ca109bf8758766b10bc80a036745b4bc343dd1
bugzilla: 186030, https://gitee.com/openeuler/kernel/issues/I4QV2F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=64ca109bf8758766b10bc80a036745b4bc343dd1

--------------------------------

[ Upstream commit bfbb307c ]

The error paths in the prepare_vmcs02() function are supposed to set *entry_failure_code, but this path does not. That leads to using an uninitialized variable in the caller.

Fixes: 71f73470 ("KVM: nVMX: Load GUEST_IA32_PERF_GLOBAL_CTRL MSR on VM-Entry")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20211130125337.GB24578@kili>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
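
The fix pattern, sketched against the PERF_GLOBAL_CTRL load path named in the Fixes tag; treat this as illustrative rather than the literal diff.

  /* Sketch: every error return from prepare_vmcs02() must first set
   * *entry_failure_code, or the caller reads an uninitialized value. */
  static int load_perf_global_ctrl_sketch(struct kvm_vcpu *vcpu,
                                          struct vmcs12 *vmcs12,
                                          enum vm_entry_failure_code *entry_failure_code)
  {
          if (kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
                          vmcs12->guest_ia32_perf_global_ctrl)) {
                  *entry_failure_code = ENTRY_FAIL_DEFAULT;
                  return -EINVAL;
          }
          return 0;
  }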

Submitted by Paolo Bonzini

stable inclusion from stable-v5.10.84 commit 5f33887a36824f1e906863460535be5d841a4364
bugzilla: 186030, https://gitee.com/openeuler/kernel/issues/I4QV2F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5f33887a36824f1e906863460535be5d841a4364

--------------------------------

commit 53b7ca1a upstream.

Currently, checks for whether VT-d PI can be used refer to the current status of the feature in the current vCPU, or they more or less pick vCPU 0 in case a specific vCPU is not available. However, these checks do not attempt to synchronize with changes to the IRTE. In particular, there is no path that updates the IRTE when APICv is re-activated on vCPU 0, and there is no path to wake up a CPU that has APICv disabled if the wakeup occurs because of an IRTE that points to a posted interrupt.

To fix this, always go through the VT-d PI path as long as there are assigned devices and APICv is available on both the host and the VM side. Since the relevant condition was copied over three times, take the hint and factor it into a separate function.

Suggested-by: Sean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Message-Id: <20211123004311.2954158-5-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
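
The factored-out condition reads roughly as follows (a sketch based on the commit description; the helper names are the kernel's, but the exact body may differ in the backport):

  /* Sketch: VT-d posted interrupts are usable only with an in-kernel
   * irqchip, APICv enabled, an assigned device, and IRQ-posting-capable
   * interrupt remapping hardware. */
  static bool vmx_can_use_vtd_pi_sketch(struct kvm *kvm)
  {
          return irqchip_in_kernel(kvm) && enable_apicv &&
                 kvm_arch_has_assigned_device(kvm) &&
                 irq_remapping_cap(IRQ_POSTING_CAP);
  }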

Submitted by Sean Christopherson

stable inclusion from stable-v5.10.84 commit 7722e88505226d64d7b2158b470e6945ef759832
bugzilla: 186030, https://gitee.com/openeuler/kernel/issues/I4QV2F
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7722e88505226d64d7b2158b470e6945ef759832

--------------------------------

commit 2b4a5a5d upstream.

Flush the current VPID when handling KVM_REQ_TLB_FLUSH_GUEST instead of always flushing vpid01. Any TLB flush that is triggered when L2 is active is scoped to L2's VPID (if it has one); e.g. if L2 toggles CR4.PGE and L1 doesn't intercept PGE writes, then KVM's emulation of the TLB flush needs to be applied to L2's VPID.

Reported-by: Lai Jiangshan <jiangshanlai+lkml@gmail.com>
Fixes: 07ffaf34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211125014944.536398-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
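
A sketch of the corrected scoping; the helper mirrors the one the upstream patch introduces, but this is an illustration, not the exact diff.

  /* Sketch: flush the VPID the vCPU is currently running with (vpid02
   * while L2 is active), instead of unconditionally flushing vpid01. */
  static u16 vmx_get_current_vpid_sketch(struct kvm_vcpu *vcpu)
  {
          if (is_guest_mode(vcpu))
                  return nested_get_vpid02(vcpu);
          return to_vmx(vcpu)->vpid;
  }

  static void vmx_flush_tlb_guest_sketch(struct kvm_vcpu *vcpu)
  {
          if (enable_vpid)
                  vpid_sync_context(vmx_get_current_vpid_sketch(vcpu));
  }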

Submitted by Maxim Levitsky

stable inclusion from stable-v5.10.82 commit 6b43cf113a382694da8c3c63ab9ef41018b8c49e
bugzilla: 185877, https://gitee.com/openeuler/kernel/issues/I4QU6V
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6b43cf113a382694da8c3c63ab9ef41018b8c49e

--------------------------------

commit af957eeb upstream.

When loading nested state, don't use vcpu->arch.efer to get the L1 host's 64-bit vs. 32-bit state and don't check it for consistency with respect to VM_EXIT_HOST_ADDR_SPACE_SIZE, as register state in the vCPU may be stale when KVM_SET_NESTED_STATE is called---and architecturally does not exist. When restoring L2 state in KVM, the CPU is placed in non-root mode where nested VMX code has no snapshot of L1 host state: VMX (conditionally) loads host state fields on VM-exit, but they need not correspond to the state before entry. A simple case occurs in KVM itself, where the host RIP field points to vmx_vmexit rather than the instruction following vmlaunch/vmresume.

However, for the particular case of L1 being in 32- or 64-bit mode on entry, the exit controls can be treated instead as the source of truth regarding the state of L1 on entry, and can be used to check that vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE matches vmcs12.HOST_EFER if vmcs12.VM_EXIT_LOAD_IA32_EFER is set. The consistency check on CPU EFER vs. vmcs12.VM_EXIT_HOST_ADDR_SPACE_SIZE, instead, happens only on VM-Enter. That's because, again, there's conceptually no "current" L1 EFER to check on KVM_SET_NESTED_STATE.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211115131837.195527-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 07 January 2022, 12 commits

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 67b45af9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=67b45af946ec3148b64e6a3a1ee2ea8f79c5bc07

-------------------

If lbr_desc->event is successfully created, intel_pmu_create_guest_lbr_event() will return 0; otherwise it will return -ENOENT and then jump to the LBR MSRs dummy handling.

Fixes: 1b5ac322 ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com>
[Add "< 0" and PTR_ERR to make the code clearer. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit be635e34
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be635e34c284d08b1da7f93ddd6a2110617d15e7

-------------------

Userspace could enable the guest LBR feature when the exactly supported LBR format value is initialized to MSR_IA32_PERF_CAPABILITIES and the LBR is also compatible with the vPMU version and host CPU model.

The LBR could be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vCPU model is compatible with the host one.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-11-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 9aa4f622
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9aa4f622460f9287e57804dbeb219bfef29f04a1

-------------------

The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to indicate whether a guest LBR event is still needed by the vCPU. If the vCPU no longer accesses LBR-related registers within a scheduling time slice and the LBR enable bit has been unset, the vPMU will treat the guest LBR event as a bland event of a vPMC counter and release it as usual. Also, the pass-through state of the LBR records MSRs is cancelled.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-10-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit e6209a3b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6209a3bef793e8fe29c873a7612023916eaa611

-------------------

The current vPMU only supports Architecture Version 2. According to Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual PMI and KVM emulates clearing the LBR bit (bit 0) in IA32_DEBUGCTL. Also, the guest needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
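
A sketch of the emulated SDM behavior. The MSR bit and VMCS field names are the kernel's; the function itself is illustrative.

  /* Sketch: on a virtual PMI with Freeze_LBR_On_PMI set, freeze the LBR
   * by clearing the LBR-enable bit (bit 0) of the guest's IA32_DEBUGCTL. */
  static void freeze_lbrs_on_pmi_sketch(void)
  {
          u64 debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);

          if (debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI)
                  vmcs_write64(GUEST_IA32_DEBUGCTL,
                               debugctl & ~DEBUGCTLMSR_LBR);
  }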

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 9254beaa
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9254beaafd12e27d48149fab3b16db372bc90ad7

-------------------

When the LBR records MSRs have already been passed through, there is no need to call vmx_update_intercept_for_lbr_msrs() again and again, and vice versa.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-8-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 1b5ac322
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b5ac3226a1aa071135fe0ee5d1055d9e88b717c

-------------------

In addition to DEBUGCTLMSR_LBR, any KVM trap caused by LBR MSRs access will result in the creation of a guest LBR event per vCPU. If the guest LBR event is scheduled on with the corresponding vCPU context, KVM will pass through all LBR records MSRs to the guest.

The LBR callstack mechanism implemented in the host could help save/restore the guest LBR records during the event context switches, which reduces a lot of overhead if we save/restore tens of LBR MSRs (e.g. 32 LBR records entries) in the much more frequent VMX transitions.

To avoid reclaiming LBR resources from any higher-priority event on the host, KVM always checks the existence of the guest LBR event and its state before vm-entry, as late as possible. A negative result cancels the pass-through state, and it also prevents real register accesses and potential data leakage. If the host reclaims the LBR between two checks, the interception state and LBR records can be safely preserved due to native save/restore support from the guest LBR event.

KVM emits a pr_warn() when the LBR hardware is unavailable to the guest LBR event. The administrator is supposed to remind users that the guest result may be inaccurate if someone is using LBR to record the hypervisor on the host side.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-7-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 8e12911b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e12911b243e485f5e4c7c5fbc79cdf185728700

-------------------

When a vCPU sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, the KVM handler creates a guest LBR event which enables the callstack mode and to which no hardware counter is assigned. The host perf schedules and enables this event as usual, but in an exclusive way.

The guest LBR event will be released when the vPMU is reset, but soon the lazy release mechanism will be applied to this event like a vPMC.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-6-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit c6462363
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c646236344e9054cc84cd5a9f763163b9654cf7e

-------------------

Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which tell about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vCPU model is compatible with the host one.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Paolo Bonzini

mainline inclusion from mainline-v5.12-rc1 commit 9c9520ce
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c9520ce883386dc3794c7d60204487ff1db09cb

-------------------

Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which tell about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vCPU model is compatible with the host one.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Paolo Bonzini

mainline inclusion from mainline-v5.12-rc1 commit a7557539
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a755753903a40d982f6dd23d65eb96b248a2577a

-------------------

Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(), the value should not be changed by cpuid(). To ensure that the new value is kept, the default initialization path is moved to intel_pmu_init(). The effective value of the MSR will be 0 if PDCM is clear, however.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit 252e365e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=252e365eb28ddf49eb31cec1a5d99e708c73f57b

-------------------

To make code responsibilities clear, we may reuse and invoke vmx_set_intercept_for_msr() in other VMX-specific files (e.g. pmu_intel.c), so expose it in order to pass through the LBR MSRs later.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Like Xu

mainline inclusion from mainline-v5.12-rc1 commit d855066f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d855066f81726155caf766e47eea58ae10b1fd57

-------------------

SVM already has specific handlers for MSR_IA32_DEBUGCTLMSR in svm_get/set_msr, so the x86 common part can be safely moved to VMX. This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL.

Add vmx_supported_debugctl() to refactor the #GP-throwing logic.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com>
[Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception to guest". - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: WangJian <wangjian161@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
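
A sketch of the resulting #GP logic. vmx_supported_debugctl() is the helper the commit adds; the wrapper function below is invented for illustration and may not match the real call site.

  /* Sketch: a guest WRMSR to MSR_IA32_DEBUGCTLMSR with any bit outside
   * the supported mask fails, and the caller injects #GP. */
  static int vmx_set_debugctl_sketch(struct kvm_vcpu *vcpu, u64 data)
  {
          if (data & ~vmx_supported_debugctl())
                  return 1; /* non-zero => #GP */

          vmcs_write64(GUEST_IA32_DEBUGCTL, data);
          return 0;
  }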

- 06 December 2021, 2 commits

Submitted by Sean Christopherson

stable inclusion from stable-5.10.80 commit 9f9d6d391ff5f8a3a4f6a0547fe8ed78dc4d8f15
bugzilla: 185821, https://gitee.com/openeuler/kernel/issues/I4L7CG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9f9d6d391ff5f8a3a4f6a0547fe8ed78dc4d8f15

--------------------------------

commit 7dfbc624 upstream.

Check the current VMCS controls to determine if an MSR write will be intercepted due to MSR bitmaps being disabled. In the nested VMX case, KVM will disable MSR bitmaps in vmcs02 if they're disabled in vmcs12 or if KVM can't map L1's bitmaps for whatever reason.

Note, the bad behavior is relatively benign in the current code base as KVM sets all bits in vmcs02's MSR bitmap by default, clears bits if and only if L0 KVM also disables interception of an MSR, and only uses the buggy helper for MSR_IA32_SPEC_CTRL. Because KVM explicitly tests WRMSR before disabling interception of MSR_IA32_SPEC_CTRL, the flawed check will only result in KVM reading MSR_IA32_SPEC_CTRL from hardware when it isn't strictly necessary.

Tag the fix for stable in case a future fix wants to use msr_write_intercepted(), in which case a buggy implementation in older kernels could prove subtly problematic.

Fixes: d28b387f ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211109013047.2041518-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
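
The corrected check, sketched. The controls-shadow accessor is the kernel's; the bitmap-test helper name is approximate and should be treated as an assumption.

  /* Sketch: consult the *current* VMCS's execution controls rather than
   * a cached copy; if MSR bitmaps are disabled, every write is
   * intercepted. */
  static bool msr_write_intercepted_sketch(struct vcpu_vmx *vmx, u32 msr)
  {
          if (!(exec_controls_get(vmx) & CPU_BASED_USE_MSR_BITMAPS))
                  return true;

          /* otherwise, test the write-intercept bit in the loaded bitmap
           * (helper name is hypothetical) */
          return vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap, msr);
  }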

Submitted by Sean Christopherson

stable inclusion from stable-5.10.80 commit 2f65b76c444587ca1033a9af1ce60766ac910628
bugzilla: 185821, https://gitee.com/openeuler/kernel/issues/I4L7CG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2f65b76c444587ca1033a9af1ce60766ac910628

--------------------------------

commit ec5a4919 upstream.

Unregister KVM's posted interrupt wakeup handler during unsetup so that a spurious interrupt that arrives after kvm_intel.ko is unloaded doesn't call into freed memory.

Fixes: bf9f6ac8 ("KVM: Update Posted-Interrupts Descriptor when vCPU is blocked")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20211009001107.3936588-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 15 November 2021, 2 commits

Submitted by Paolo Bonzini

stable inclusion from stable-5.10.76 commit 8f042315fcc46b7fc048ba7d7a3e136927e384d4
bugzilla: 182988, https://gitee.com/openeuler/kernel/issues/I4IAHF
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8f042315fcc46b7fc048ba7d7a3e136927e384d4

--------------------------------

commit 3a25dfa6 upstream.

Since commit c300ab9f ("KVM: x86: Replace late check_nested_events() hack with more precise fix") there is no longer the certainty that check_nested_events() tries to inject an external interrupt vmexit to L1 on every call to vcpu_enter_guest. Therefore, even in that case we need to set KVM_REQ_EVENT. This ensures that inject_pending_event() is called, and from there kvm_check_nested_events().

Fixes: c300ab9f ("KVM: x86: Replace late check_nested_events() hack with more precise fix")
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Vitaly Kuznetsov

stable inclusion from stable-5.10.71 commit 3778511dfc5941b0ee99d1040faab5c69e51dfac
bugzilla: 182981, https://gitee.com/openeuler/kernel/issues/I4I3KD
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3778511dfc5941b0ee99d1040faab5c69e51dfac

--------------------------------

commit 8d68bad6 upstream.

Windows Server 2022 with the Hyper-V role enabled failed to boot on KVM when enlightened VMCS is advertised. Debugging revealed there are two exposed secondary controls it is not happy with: SECONDARY_EXEC_ENABLE_VMFUNC and SECONDARY_EXEC_SHADOW_VMCS. These controls are known to be unsupported, as there are no corresponding fields in eVMCSv1 (see the comment above the EVMCS1_UNSUPPORTED_2NDEXEC definition).

Previously, commit 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()") introduced the required filtering mechanism for VMX MSRs but for some reason put only the known-to-be-problematic controls (and not the full EVMCS1_UNSUPPORTED_* lists) there.

Note, Windows Server 2022 seems to have gained some sanity check for VMX MSRs: it doesn't even try to launch a guest when there's something it doesn't like, so the nested_evmcs_check_controls() mechanism can't catch the problem.

Let's be bold this time and, instead of playing whack-a-mole, just filter out all unsupported controls from VMX MSRs.

Fixes: 31de3d25 ("x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210907163530.110066-1-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 19 October 2021, 4 commits

Submitted by Sean Christopherson

stable inclusion from stable-5.10.65 commit c2c7eefc93718a3bdffe031350d23a665aac15aa
bugzilla: 182361, https://gitee.com/openeuler/kernel/issues/I4EH3U
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c2c7eefc93718a3bdffe031350d23a665aac15aa

--------------------------------

commit f7782bb8 upstream.

Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due to the new L2 not having a valid nested.pi_desc.

Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug, as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem.

Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events().

Fixes: 705699a1 ("KVM: nVMX: Enable nested posted interrupt processing")
Fixes: 966eefb8 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable")
Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810144526.2662272-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Maxim Levitsky

stable inclusion from stable-5.10.65 commit bf36224463356526f56d992d3f6d09cbd57e2d2f
bugzilla: 182361, https://gitee.com/openeuler/kernel/issues/I4EH3U
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bf36224463356526f56d992d3f6d09cbd57e2d2f

--------------------------------

commit 81b4b56d upstream.

If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210826095750.1650467-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 95b5a48c ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Sean Christopherson

stable inclusion from stable-5.10.60 commit 433f0b31ebec09bb89c2414eb76571ced670a202
bugzilla: 177018, https://gitee.com/openeuler/kernel/issues/I4EAUG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=433f0b31ebec09bb89c2414eb76571ced670a202

--------------------------------

commit 18712c13 upstream.

Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF in L2 or if the VM-Exit should be forwarded to L1. The current logic fails to account for the case where #PF is intercepted to handle guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into L1. At best, L1 will complain and inject the #PF back into L2. At worst, L1 will eat the unexpected fault and cause L2 to hang on infinite page faults.

Note, while the bug was technically introduced by the commit that added support for the MAXPHYADDR madness, the shame is all on commit a0c13434 ("KVM: VMX: introduce vmx_need_pf_intercept").

Fixes: 1dbf5d68 ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
Cc: stable@vger.kernel.org
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210812045615.3167686-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
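
A sketch of the corrected condition inside the "does L0 want this exit?" logic for exceptions; names follow the 5.10 VMX code, but this is illustrative, not the exact diff.

  /* Sketch: a #PF stays in L0 when L0 intercepts #PF, e.g. for async PF
   * or for guest.MAXPHYADDR < host.MAXPHYADDR emulation; otherwise it is
   * reflected to L1. */
  static bool l0_wants_pf_sketch(struct kvm_vcpu *vcpu, u32 intr_info)
  {
          if (is_page_fault(intr_info))
                  return vcpu->arch.apf.host_apf_flags ||
                         vmx_need_pf_intercept(vcpu);
          return false;
  }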

Submitted by Sean Christopherson

stable inclusion from stable-5.10.60 commit 0ab67e3dfc4d7af799b76b405eba2dc6f36935f3
bugzilla: 177018, https://gitee.com/openeuler/kernel/issues/I4EAUG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0ab67e3dfc4d7af799b76b405eba2dc6f36935f3

--------------------------------

commit 7b9cae02 upstream.

Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to effectively get the controls for the current VMCS, as opposed to using vmx->secondary_exec_controls, which is the cached value of KVM's desired controls for vmcs01 and truly not reflective of any particular VMCS.

While the waitpkg control is not dynamic, i.e. vmcs01 will always hold the same waitpkg configuration as vmx->secondary_exec_controls, the same does not hold true for vmcs02 if the L1 VMM hides the feature from L2. If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL, L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP.

Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210810171952.2758100-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
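
The fixed accessor looks like this, per the description above (shown as a sketch with a suffixed name; the accessor and control-bit names are the kernel's):

  /* Sketch: read the waitpkg control from the currently loaded VMCS via
   * the controls-shadow accessor, not from a cached vmcs01-only value. */
  static inline bool vmx_has_waitpkg_sketch(struct vcpu_vmx *vmx)
  {
          return secondary_exec_controls_get(vmx) &
                 SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
  }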

- 13 October 2021, 4 commits

Submitted by Sean Christopherson

stable inclusion from stable-5.10.50 commit 64d31137b1a6e97137de5edc766aa7cfff19b7ed
bugzilla: 174522, https://gitee.com/openeuler/kernel/issues/I4DNFY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=64d31137b1a6e97137de5edc766aa7cfff19b7ed

--------------------------------

[ Upstream commit 272b0a99 ]

Drop bogus logic that incorrectly clobbers the accessed/dirty enabling status of the nested MMU on an EPTP switch. When nested EPT is enabled, walk_mmu points at L2's _legacy_ page tables, not L1's EPT for L2.

This is likely a benign bug, as mmu->ept_ad is never consumed (since the MMU is not a nested EPT MMU), and stuffing mmu_role.base.ad_disabled will never propagate into future shadow pages since the nested MMU isn't used to map anything, just to walk L2's page tables.

Note, KVM also does a full MMU reload, i.e. the guest_mmu will be recreated using the new EPTP, and thus any change in A/D enabling will be properly recognized in the relevant MMU.

Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Sean Christopherson

stable inclusion from stable-5.10.50 commit bac38bd7c458b17eded5cb6d50446bf7c9e46daa
bugzilla: 174522, https://gitee.com/openeuler/kernel/issues/I4DNFY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bac38bd7c458b17eded5cb6d50446bf7c9e46daa

--------------------------------

[ Upstream commit 0e75225d ]

Use BIT_ULL() instead of an open-coded shift to check whether or not a function is enabled in L1's VMFUNC bitmap. This is a benign bug as KVM supports only bit 0, and will fail VM-Enter if any other bits are set, i.e. bits 63:32 are guaranteed to be zero.

Note, "function" is bounded by hardware as VMFUNC will #UD before taking a VM-Exit if the function is greater than 63.

Before:
  if ((vmcs12->vm_function_control & (1 << function)) == 0)
    0x000000000001a916 <+118>: mov    $0x1,%eax
    0x000000000001a91b <+123>: shl    %cl,%eax
    0x000000000001a91d <+125>: cltq
    0x000000000001a91f <+127>: and    0x128(%rbx),%rax

After:
  if (!(vmcs12->vm_function_control & BIT_ULL(function & 63)))
    0x000000000001a955 <+117>: mov    0x128(%rbx),%rdx
    0x000000000001a95c <+124>: bt     %rax,%rdx

Fixes: 27c42a1b ("KVM: nVMX: Enable VMFUNC for the L1 hypervisor")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Sean Christopherson

stable inclusion from stable-5.10.50 commit b2c5af71ce4b6e21f6af9fa292e72fd4662a2490
bugzilla: 174522, https://gitee.com/openeuler/kernel/issues/I4DNFY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b2c5af71ce4b6e21f6af9fa292e72fd4662a2490

--------------------------------

[ Upstream commit 07ffaf34 ]

Trigger a full TLB flush on behalf of the guest on nested VM-Enter and VM-Exit when VPID is disabled for L2. kvm_mmu_new_pgd() syncs only the current PGD, which can theoretically leave stale, unsync'd entries in a previous guest PGD, which could be consumed if L2 is allowed to load CR3 with PCID_NOFLUSH=1.

Rename KVM_REQ_HV_TLB_FLUSH to KVM_REQ_TLB_FLUSH_GUEST so that it can be utilized for its obvious purpose of emulating a guest TLB flush.

Note, there is no change to the actual TLB flush executed by KVM, even though the fast PGD switch uses KVM_REQ_TLB_FLUSH_CURRENT. When VPID is disabled for L2, vpid02 is guaranteed to be '0', and thus nested_get_vpid02() will return the VPID that is shared by L1 and L2.

Generate the request outside of kvm_mmu_new_pgd(), as getting the common helper to correctly identify which request is needed is quite painful. E.g. using KVM_REQ_TLB_FLUSH_GUEST when nested EPT is in play is wrong as a TLB flush from the L1 kernel's perspective does not invalidate EPT mappings. And, by using KVM_REQ_TLB_FLUSH_GUEST, nVMX can do future simplification by moving the logic into nested_vmx_transition_tlb_flush().

Fixes: 41fab65e ("KVM: nVMX: Skip MMU sync on nested VMX transition when possible")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210609234235.1244004-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Sean Christopherson

stable inclusion from stable-5.10.50 commit 39d0dfab6c3e6ab7052b502fe4ad24a9359a9de6
bugzilla: 174522, https://gitee.com/openeuler/kernel/issues/I4DNFY
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=39d0dfab6c3e6ab7052b502fe4ad24a9359a9de6

--------------------------------

commit b33bb78a upstream.

Mark #ACs that won't be reinjected to the guest as wanted by L0 so that KVM handles split-lock #AC from L2 instead of forwarding the exception to L1. Split-lock #AC isn't yet virtualized, i.e. L1 will treat it like a regular #AC and do the wrong thing, e.g. reinject it into L2.

Fixes: e6f8b6c1 ("KVM: VMX: Extend VMX's #AC interceptor to handle split lock #AC in guest")
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622172244.3561540-1-seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

- 03 June 2021, 2 commits

Submitted by Wanpeng Li

stable inclusion from stable-5.10.41 commit 514883ebac77ff9939da92e268b24a71c9fe4e05
bugzilla: 51890
CVE: NA

--------------------------------

commit 16045714 upstream.

Defer the call to account guest time until after servicing any IRQ(s) that happened in the guest or immediately after VM-Exit. Tick-based accounting of vCPU time relies on PF_VCPU being set when the tick IRQ handler runs, and IRQs are blocked throughout the main sequence of vcpu_enter_guest(), including the call into vendor code to actually enter and exit the guest.

This fixes a bug where reported guest time remains '0', even when running an infinite loop in the guest: https://bugzilla.kernel.org/show_bug.cgi?id=209831

Fixes: 87fa7f3e ("x86/kvm: Move context tracking where it belongs")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210505002735.1684165-4-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Submitted by Sean Christopherson

stable inclusion from stable-5.10.38 commit 31f29749ee970c251b3a7e5b914108425940d089
bugzilla: 51875
CVE: NA

--------------------------------

commit 5104d7ff upstream.

Disable preemption when probing a user return MSR via RDMSR/WRMSR. If the MSR holds a different value per logical CPU, the WRMSR could corrupt the host's value if KVM is preempted between the RDMSR and WRMSR, and then rescheduled on a different CPU.

Opportunistically land the helper in common x86; SVM will use the helper in a future commit.

Fixes: 4be53410 ("KVM: VMX: Initialize vmx->guest_msrs[] right after allocation")
Cc: stable@vger.kernel.org
Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210504171734.1434054-6-seanjc@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
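
The race-free probe pattern, as a sketch under the assumption that the helper is shaped like the commit describes (function name is illustrative):

  #include <linux/preempt.h>
  #include <asm/msr.h>

  /* Sketch: do the read-probe and the restoring write as one
   * non-preemptible section so both MSR accesses hit the same
   * logical CPU. */
  static bool probe_user_return_msr_sketch(u32 msr)
  {
          u64 val;
          int ret;

          preempt_disable();
          ret = rdmsrl_safe(msr, &val);
          if (!ret)
                  ret = wrmsrl_safe(msr, val);
          preempt_enable();

          return !ret;
  }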