1. 08 June 2022, 5 commits
    • 
      KVM: x86: Introduce "struct kvm_caps" to track misc caps/settings · 938c8745
      Committed by Sean Christopherson
      Add kvm_caps to hold a variety of capabilities and defaults that aren't
      handled by kvm_cpu_caps because they aren't CPUID bits, in order to reduce
      the amount of boilerplate code required to add a new feature.  The vast
      majority (all?) of the caps interact with vendor code and are written
      only during initialization, i.e. should be tagged __read_mostly, declared
      extern in x86.h, and exported.
      
      No functional change intended.
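      
      A minimal sketch of such a container (the field names here are
      illustrative assumptions, not the exact upstream layout):
      
        /* x86.h (sketch): misc caps/settings that aren't CPUID bits */
        struct kvm_caps {
                bool has_tsc_control;   /* can the guest's TSC rate be scaled? */
                u32  max_guest_tsc_khz; /* maximum allowed guest TSC rate */
                u64  supported_xcr0;    /* XCR0 bits KVM can expose to guests */
        };
        extern struct kvm_caps kvm_caps;
      
        /* x86.c (sketch): written only during initialization */
        struct kvm_caps kvm_caps __read_mostly;
        EXPORT_SYMBOL_GPL(kvm_caps);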
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220524135624.22988-4-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      938c8745
    • 
      KVM: nSVM: Transparently handle L1 -> L2 NMI re-injection · 159fc6fa
      Committed by Maciej S. Szmigiero
      An NMI that L1 wants to inject into its L2 should be directly re-injected,
      without causing L0 side effects like engaging NMI blocking for L1.
      
      It's also worth noting that in this case it is L1's responsibility
      to track the NMI window status for its L2 guest.
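      
      A hedged sketch of the resulting injection path; the nmi_l1_to_l2 flag
      name is an assumption for illustration (set when the NMI being
      (re)injected originated in vmcb12's EVENTINJ field):
      
        static void svm_inject_nmi(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
      
                svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
      
                /* L1 -> L2 re-injection: L1 tracks the NMI window for its L2,
                 * so don't engage L0-level NMI blocking. */
                if (svm->nmi_l1_to_l2)
                        return;
      
                vcpu->arch.hflags |= HF_NMI_MASK;
                ++vcpu->stat.nmi_injections;
        }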
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <f894d13501cd48157b3069a4b4c7369575ddb60e.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      159fc6fa
    • 
      KVM: SVM: Re-inject INTn instead of retrying the insn on "failure" · 7e5b5ef8
      Committed by Sean Christopherson
      Re-inject INTn software interrupts instead of retrying the instruction if
      the CPU encountered an intercepted exception while vectoring the INTn,
      e.g. if KVM intercepted a #PF when utilizing shadow paging.  Retrying the
      instruction is architecturally wrong, e.g. it will result in a spurious #DB
      if there's a code breakpoint on the INTn, and lack of re-injection also
      breaks nested virtualization, e.g. if L1 injects a software interrupt and
      vectoring the injected interrupt encounters an exception that is
      intercepted by L0 but not L1.
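      
      A sketch of the completion-path change; svm_soft_int_rip_known() is a
      hypothetical helper standing in for the check that the next RIP could
      be recovered for the interrupted INTn:
      
        static void svm_complete_soft_interrupt(struct kvm_vcpu *vcpu, u8 vector)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
      
                /*
                 * Re-inject the INTn rather than retrying it: retrying raises
                 * a spurious #DB on a code breakpoint and loses L1-injected
                 * events that hit an exception intercepted only by L0.
                 */
                if (svm_soft_int_rip_known(svm))
                        kvm_queue_interrupt(vcpu, vector, /*soft=*/true);
                /* else: drop the event and retry the insn as a last resort */
        }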
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <1654ad502f860948e4f2d57b8bd881d67301f785.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e5b5ef8
    • 
      KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction · 6ef88d6e
      Committed by Sean Christopherson
      Re-inject INT3/INTO instead of retrying the instruction if the CPU
      encountered an intercepted exception while vectoring the software
      exception, e.g. if vectoring INT3 encounters a #PF and KVM is using
      shadow paging.  Retrying the instruction is architecturally wrong, e.g.
      will result in a spurious #DB if there's a code breakpoint on the INT3/INTO,
      and lack of re-injection also breaks nested virtualization, e.g. if L1
      injects a software exception and vectoring the injected exception
      encounters an exception that is intercepted by L0 but not L1.
      
      Due to, ahem, deficiencies in the SVM architecture, acquiring the next
      RIP may require flowing through the emulator even if NRIPS is supported,
      as the CPU clears next_rip if the VM-Exit is due to an exception other
      than "exceptions caused by the INT3, INTO, and BOUND instructions".  To
      deal with this, "skip" the instruction to calculate next_rip (if it's
      not already known), and then unwind the RIP write and any side effects
      (RFLAGS updates).
      
      Save the computed next_rip and use it to re-stuff next_rip if injection
      doesn't complete.  This allows KVM to do the right thing if next_rip was
      known prior to injection, e.g. if L1 injects a soft event into L2, and
      there is no backing INTn instruction, e.g. if L1 is injecting an
      arbitrary event.
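      
      A hedged sketch of the "skip, record, unwind" flow; the soft_int_*
      field names follow the description above and may not match the
      upstream identifiers exactly:
      
        static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu)
        {
                struct vcpu_svm *svm = to_svm(vcpu);
                unsigned long old_rip = kvm_rip_read(vcpu);
      
                /* "Skip" the INT3/INTO to compute next_rip... */
                if (!svm_skip_emulated_instruction(vcpu))
                        return -EIO;
      
                /* ...record it so injection can re-stuff next_rip on failure... */
                svm->soft_int_injected = true;
                svm->soft_int_old_rip  = old_rip;
                svm->soft_int_next_rip = kvm_rip_read(vcpu);
                svm->vmcb->control.next_rip = svm->soft_int_next_rip;
      
                /* ...then unwind the RIP write (and any RFLAGS side effects). */
                kvm_rip_write(vcpu, old_rip);
                return 0;
        }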
      
      Note, it's impossible to guarantee architectural correctness given SVM's
      architectural flaws.  E.g. if the guest executes INTn (no KVM injection),
      an exit occurs while vectoring the INTn, and the guest modifies the code
      stream while the exit is being handled, KVM will compute the incorrect
      next_rip due to "skipping" the wrong instruction.  A future enhancement
      to make this less awful would be for KVM to detect that the decoded
      instruction is not the correct INTn and drop the to-be-injected soft
      event (retrying is a lesser evil compared to shoving the wrong RIP on the
      exception stack).
      Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <65cb88deab40bc1649d509194864312a89bbe02e.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6ef88d6e
    • 
      KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02 · 00f08d99
      Committed by Maciej S. Szmigiero
      The next_rip field of a VMCB is *not* an output-only field for a VMRUN.
      The value of this field (instead of the saved guest RIP) is used by the CPU
      for the return address pushed on the stack when injecting a software
      interrupt or an INT3/INTO exception.
      
      Make sure this field gets synced from vmcb12 to vmcb02 when entering L2 or
      loading a nested state and NRIPS is exposed to L1.  If NRIPS is supported
      in hardware but not exposed to L1 (nrips=0 or hidden by userspace), stuff
      vmcb02's next_rip from the new L2 RIP to emulate a !NRIPS CPU (which
      saves RIP on the stack as-is).
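      
      A sketch of the resulting sync in nested_vmcb02_prepare_control()
      (hedged; vmcb12_rip stands for the L2 RIP taken from vmcb12):
      
        if (svm->nrips_enabled)                    /* NRIPS exposed to L1 */
                vmcb02->control.next_rip = svm->nested.ctl.next_rip;
        else if (boot_cpu_has(X86_FEATURE_NRIPS))  /* hw NRIPS, hidden from L1 */
                vmcb02->control.next_rip = vmcb12_rip; /* emulate a !NRIPS CPU */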
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Co-developed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Message-Id: <c2e0a3d78db3ae30530f11d4e9254b452a89f42b.1651440202.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      00f08d99
  2. 07 June 2022, 1 commit
    • 
      KVM: SVM: fix tsc scaling cache logic · 11d39e8c
      Committed by Maxim Levitsky
      SVM uses a per-CPU variable to cache the current value of the TSC scaling
      multiplier MSR on each CPU.
      
      Commit 1ab9287a
      ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      broke this caching logic.
      
      Refactor the code so that all TSC scaling multiplier writes go through
      a single function that checks and updates the cache, as sketched below.
      
      This fixes the following scenario:
      
      1. A CPU runs a guest with some TSC scaling ratio.
      
      2. A new guest with a different TSC scaling ratio starts on this CPU
         and terminates almost immediately.
      
         This ensures that the short-running guest set the TSC scaling ratio
         just once, when it was set via KVM_SET_TSC_KHZ; due to the bug, the
         per-CPU cache is not updated.
      
      3. The original guest continues to run; because the (stale) cache matches
         its ratio, it doesn't restore the MSR to its own value, and thus
         continues to run with the wrong TSC scaling ratio.
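      
      A minimal sketch of the single write path, assuming a per-CPU
      current_tsc_ratio cache variable as described above:
      
        static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier)
        {
                preempt_disable();
                if (multiplier != __this_cpu_read(current_tsc_ratio)) {
                        wrmsrl(MSR_AMD64_TSC_RATIO, multiplier);
                        __this_cpu_write(current_tsc_ratio, multiplier);
                }
                preempt_enable();
        }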
      
      Fixes: 1ab9287a ("KVM: X86: Add vendor callbacks for writing the TSC multiplier")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20220606181149.103072-1-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      11d39e8c
  3. 30 April 2022, 1 commit
  4. 14 April 2022, 1 commit
    • 
      KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 45846661
      Committed by Sean Christopherson
      Remove WARNs that sanity check that KVM never lets a triple fault for L2
      escape and incorrectly end up in L1.  In normal operation, the sanity
      check is perfectly valid, but it incorrectly assumes that it's impossible
      for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
      KVM_RUN (which guarantees kvm_check_nested_state() will see and handle
      the triple fault).
      
      The WARN can currently be triggered if userspace injects a machine check
      while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
      of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
      lost on migration, will make it trivially easy for userspace to trigger
      the WARN.
      
      Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
      tempting, but wrong, especially if/when the request is saved/restored,
      e.g. if userspace restores events (including a triple fault) and then
      restores nested state (which may forcibly leave guest mode).  Ignoring
      the fact that KVM doesn't currently provide the necessary APIs, it's
      userspace's responsibility to manage pending events during save/restore.
      
        ------------[ cut here ]------------
        WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Modules linked in: kvm_intel kvm irqbypass
        CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
        Call Trace:
         <TASK>
         vmx_leave_nested+0x30/0x40 [kvm_intel]
         vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
         kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
         kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
         __x64_sys_ioctl+0x83/0xb0
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
        ---[ end trace 0000000000000000 ]---
      
      Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
      Cc: stable@vger.kernel.org
      Cc: Chenyi Qiang <chenyi.qiang@intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220407002315.78092-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      45846661
  5. 02 April 2022, 7 commits
  6. 25 February 2022, 1 commit
    • 
      KVM: x86/mmu: load new PGD after the shadow MMU is initialized · 3cffc89d
      Committed by Paolo Bonzini
      Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
      shadow_root_level anymore, pull the PGD load after the initialization of
      the shadow MMUs.
      
      Besides being more intuitive, this enables future simplifications
      and optimizations because it is no longer necessary to compute the
      role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
      attempting to use a cached PGD to avoid having to figure out the new role.
      With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing,
      and avoid unloading all the cached roots.
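      
      A hedged sketch of the new ordering, modeled on the
      nested_{vmx,svm}_load_cr3 pattern the message refers to (simplified,
      not the verbatim diff):
      
        vcpu->arch.cr3 = cr3;
        kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3);
      
        /* Initialize the shadow MMU (and compute its role) first... */
        kvm_init_mmu(vcpu);
      
        /* ...then load the PGD, which may now reuse a cached root. */
        if (!nested_ept)
                kvm_mmu_new_pgd(vcpu, cr3);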
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3cffc89d
  7. 11 February 2022, 2 commits
    • 
      KVM: nSVM: Implement Enlightened MSR-Bitmap feature · 66c03a92
      Committed by Vitaly Kuznetsov
      Similar to nVMX commit 502d2bf5 ("KVM: nVMX: Implement Enlightened MSR
      Bitmap feature"), add support for the feature for nSVM (Hyper-V on KVM).
      
      Notable differences from nVMX implementation:
      - As the feature uses SW reserved fields in the VMCB control area, KVM needs
        to make sure it's dealing with a Hyper-V guest (kvm_hv_hypercall_enabled()).
      
      - 'msrpm_base_pa' needs to always be overwritten in
        nested_svm_vmrun_msrpm(), even when the update is skipped.  As an
        optimization, nested_vmcb02_prepare_control() copies it from VMCB01,
        so when the MSR-Bitmap feature for L2 is disabled nothing needs to be done.
      
      - 'struct vmcb_ctrl_area_cached' needs to be extended with the clean
        fields/SW reserved data, and __nested_copy_vmcb_control_to_cache() needs
        to copy it so that nested_svm_vmrun_msrpm() can use it later.
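      
      A hedged sketch of the resulting skip check in nested_svm_vmrun_msrpm();
      the field and bit names follow the nVMX counterpart and are assumptions
      here:
      
        if (!svm->nested.force_msr_bitmap_recalc &&
            kvm_hv_hypercall_enabled(vcpu) &&
            hve->hv_enlightenments_control.msr_bitmap &&
            (svm->nested.ctl.clean & BIT(HV_VMCB_NESTED_ENLIGHTENMENTS)))
                goto set_msrpm_base_pa; /* L2 bitmap unchanged, reuse merged copy */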
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220202095100.129834-5-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      66c03a92
    • 
      KVM: nSVM: Track whether changes in L0 require MSR bitmap for L2 to be rebuilt · 73c25546
      Committed by Vitaly Kuznetsov
      Similar to nVMX commit ed2a4800 ("KVM: nVMX: Track whether changes in
      L0 require MSR bitmap for L2 to be rebuilt"), introduce a flag to keep
      track of whether the MSR bitmap for L2 needs to be rebuilt, due to changes
      in the MSR bitmap for L1 or switching to a different L2 (see the sketch
      below).
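      
      A hedged sketch of the flag's life cycle (the call sites are
      illustrative):
      
        /* L1's MSR bitmap changed, or KVM switched to a different L2:
         * the merged bitmap for L2 is now stale. */
        svm->nested.force_msr_bitmap_recalc = true;
      
        /* nested_svm_vmrun_msrpm(): clear after rebuilding the merged bitmap. */
        svm->nested.force_msr_bitmap_recalc = false;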
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20220202095100.129834-2-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      73c25546
  8. 09 February 2022, 1 commit
  9. 27 January 2022, 1 commit
    • 
      KVM: x86: Forcibly leave nested virt when SMM state is toggled · f7e57078
      Committed by Sean Christopherson
      Forcibly leave nested virtualization operation if userspace toggles SMM
      state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
      forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
      vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
      vmxon=false and smm.vmxon=false, but all other nVMX state allocated.
      
      Don't attempt to gracefully handle the transition as (a) most transitions
      are nonsensical, e.g. forcing SMM while L2 is running, (b) there isn't
      sufficient information to handle all transitions, e.g. SVM wants access
      to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
      KVM_SET_NESTED_STATE during state restore as the latter disallows putting
      the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
      being post-VMXON in SMM if SMM is not active.
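      
      A hedged sketch of the fix in kvm_vcpu_ioctl_x86_set_vcpu_events(),
      reduced to the SMM-toggle case (leave_nested being the vendor-neutral
      "forcibly leave nested mode" hook):
      
        if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
                /* Toggling SMM state forcibly exits nested virtualization. */
                if (!!(vcpu->arch.hflags & HF_SMM_MASK) != events->smi.smm)
                        kvm_x86_ops.nested_ops->leave_nested(vcpu);
        }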
      
      Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
      due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
      just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
      in an architecturally impossible state.
      
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Modules linked in:
        CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
        RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
        Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
        Call Trace:
         <TASK>
         kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
         kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
         kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
         kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
         kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
         kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
         kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
         kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
         __fput+0x286/0x9f0 fs/file_table.c:311
         task_work_run+0xdd/0x1a0 kernel/task_work.c:164
         exit_task_work include/linux/task_work.h:32 [inline]
         do_exit+0xb29/0x2a30 kernel/exit.c:806
         do_group_exit+0xd2/0x2f0 kernel/exit.c:935
         get_signal+0x4b0/0x28c0 kernel/signal.c:2862
         arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
         handle_signal_work kernel/entry/common.c:148 [inline]
         exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
         exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
         __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
         syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
         do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
         entry_SYSCALL_64_after_hwframe+0x44/0xae
         </TASK>
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20220125220358.2091737-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f7e57078
  10. 08 December 2021, 9 commits
  11. 01 October 2021, 2 commits
  12. 30 September 2021, 1 commit
  13. 23 September 2021, 1 commit
  14. 22 September 2021, 1 commit
  15. 16 August 2021, 2 commits
  16. 02 August 2021, 1 commit
    • 
      KVM: nSVM: remove useless kvm_clear_*_queue · db105fab
      Committed by Paolo Bonzini
      For an event to be in injected state when nested_svm_vmrun executes,
      it must have come from exitintinfo when svm_complete_interrupts ran:
      
        vcpu_enter_guest
         static_call(kvm_x86_run) -> svm_vcpu_run
          svm_complete_interrupts
           // now the event went from "exitintinfo" to "injected"
         static_call(kvm_x86_handle_exit) -> handle_exit
          svm_invoke_exit_handler
            vmrun_interception
             nested_svm_vmrun
      
      However, no event could have been in exitintinfo before a VMRUN
      vmexit.  The code in svm.c is a bit more permissive than the one
      in vmx.c:
      
              if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
                  exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
                  exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
                  exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
      
      but in any case, a VMRUN instruction would not even start to execute
      during an attempted event delivery.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      db105fab
  17. 28 July 2021, 1 commit
  18. 26 July 2021, 2 commits