提交 · 45846661d10422ce9e22da21f8277540b29eca22 · openeuler / Kernel

14 4月, 2022 2 次提交

KVM: x86: Drop WARNs that assert a triple fault never "escapes" from L2 · 45846661

由 Sean Christopherson 提交于 4月 07, 2022

Remove WARNs that sanity check that KVM never lets a triple fault for L2
escape and incorrectly end up in L1.  In normal operation, the sanity
check is perfectly valid, but it incorrectly assumes that it's impossible
for userspace to induce KVM_REQ_TRIPLE_FAULT without bouncing through
KVM_RUN (which guarantees kvm_check_nested_state() will see and handle
the triple fault).

The WARN can currently be triggered if userspace injects a machine check
while L2 is active and CR4.MCE=0.  And a future fix to allow save/restore
of KVM_REQ_TRIPLE_FAULT, e.g. so that a synthesized triple fault isn't
lost on migration, will make it trivially easy for userspace to trigger
the WARN.

Clearing KVM_REQ_TRIPLE_FAULT when forcibly leaving guest mode is
tempting, but wrong, especially if/when the request is saved/restored,
e.g. if userspace restores events (including a triple fault) and then
restores nested state (which may forcibly leave guest mode).  Ignoring
the fact that KVM doesn't currently provide the necessary APIs, it's
userspace's responsibility to manage pending events during save/restore.

  ------------[ cut here ]------------
  WARNING: CPU: 7 PID: 1399 at arch/x86/kvm/vmx/nested.c:4522 nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
  Modules linked in: kvm_intel kvm irqbypass
  CPU: 7 PID: 1399 Comm: state_test Not tainted 5.17.0-rc3+ #808
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:nested_vmx_vmexit+0x7fe/0xd90 [kvm_intel]
  Call Trace:
   <TASK>
   vmx_leave_nested+0x30/0x40 [kvm_intel]
   vmx_set_nested_state+0xca/0x3e0 [kvm_intel]
   kvm_arch_vcpu_ioctl+0xf49/0x13e0 [kvm]
   kvm_vcpu_ioctl+0x4b9/0x660 [kvm]
   __x64_sys_ioctl+0x83/0xb0
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>
  ---[ end trace 0000000000000000 ]---

Fixes: cb6a32c2 ("KVM: x86: Handle triple fault in L2 without killing L1")
Cc: stable@vger.kernel.org
Cc: Chenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220407002315.78092-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

45846661

KVM: x86: Move .pmu_ops to kvm_x86_init_ops and tag as __initdata · 34886e79

由 Like Xu 提交于 3月 29, 2022

The pmu_ops should be moved to kvm_x86_init_ops and tagged as __initdata.
That'll save those precious few bytes, and more importantly make
the original ops unreachable, i.e. make it harder to sneak in post-init
modification bugs.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NLike Xu <likexu@tencent.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220329235054.3534728-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

34886e79

12 4月, 2022 1 次提交

KVM: SVM: Do not activate AVIC for SEV-enabled guest · c538dc79

由 Suravee Suthikulpanit 提交于 4月 08, 2022

Since current AVIC implementation cannot support encrypted memory,
inhibit AVIC for SEV-enabled guest.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220408133710.54275-1-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c538dc79

05 4月, 2022 1 次提交

KVM: SEV: Add cond_resched() to loop in sev_clflush_pages() · 00c22013

由 Peter Gonda 提交于 3月 30, 2022

Add resched to avoid warning from sev_clflush_pages() with large number
of pages.
Signed-off-by: NPeter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Message-Id: <20220330164306.2376085-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00c22013

02 4月, 2022 21 次提交

KVM: x86: SVM: allow AVIC to co-exist with a nested guest running · f44509f8

由 Maxim Levitsky 提交于 3月 22, 2022

Inhibit the AVIC of the vCPU that is running nested for the duration of the
nested run, so that all interrupts arriving from both its vCPU siblings
and from KVM are delivered using normal IPIs and cause that vCPU to vmexit.

Note that unlike normal AVIC inhibition, there is no need to
update the AVIC mmio memslot, because the nested guest uses its
own set of paging tables.
That also means that AVIC doesn't need to be inhibited VM wide.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f44509f8

KVM: x86: nSVM: implement nested vGIF · 0b349662

由 Maxim Levitsky 提交于 3月 22, 2022

In case L1 enables vGIF for L2, the L2 cannot affect L1's GIF, regardless
of STGI/CLGI intercepts, and since VM entry enables GIF, this means
that L1's GIF is always 1 while L2 is running.

Thus in this case leave L1's vGIF in vmcb01, while letting L2
control the vGIF thus implementing nested vGIF.

Also allow KVM to toggle L1's GIF during nested entry/exit
by always using vmcb01.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0b349662

KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE · 74fd41ed

由 Maxim Levitsky 提交于 3月 22, 2022

Expose the pause filtering and threshold in the guest CPUID
and support PAUSE filtering when possible:

- If the L0 doesn't intercept PAUSE (cpu_pm=on), then allow L1 to
  have full control over PAUSE filtering.

- if the L1 doesn't intercept PAUSE, use host values and update
  the adaptive count/threshold even when running nested.

- Otherwise always exit to L1; it is not really possible to merge
  the fields correctly.  It is expected that in this case, userspace
  will not enable this feature in the guest CPUID, to avoid having the
  guest update both fields pointlessly.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

74fd41ed

KVM: x86: nSVM: implement nested LBR virtualization · d20c796c

由 Maxim Levitsky 提交于 3月 22, 2022

This was tested with kvm-unit-test that was developed
for this purpose.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-3-mlevitsk@redhat.com>
[Copy all of DEBUGCTL except for reserved bits. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d20c796c

KVM: x86: nSVM: correctly virtualize LBR msrs when L2 is running · 1d5a1b58

由 Maxim Levitsky 提交于 3月 22, 2022

When L2 is running without LBR virtualization, we should ensure
that L1's LBR msrs continue to update as usual.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322174050.241850-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1d5a1b58

KVM: x86: SVM: remove vgif_enabled() · ea91559b

由 Maxim Levitsky 提交于 3月 22, 2022

KVM always uses vgif when allowed, thus there is
no need to query current vmcb for it
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-9-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ea91559b

kvm: x86: SVM: use vmcb* instead of svm->vmcb where it makes sense · db663af4

由 Maxim Levitsky 提交于 3月 22, 2022

This makes the code a bit shorter and cleaner.

No functional change intended.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db663af4

KVM: x86: SVM: use vmcb01 in init_vmcb · 1ee73a33

由 Maxim Levitsky 提交于 3月 22, 2022

Clarify that this function is not used to initialize any part of
the vmcb02.  No functional change intended.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ee73a33

KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW · 249f3249

由 Maxim Levitsky 提交于 3月 01, 2022

Use a dummy unused vmexit reason to mark the 'VM exit' that is happening
when kvm exits to handle SMM, which is not a real VM exit.

This makes it a bit easier to read the KVM trace, and avoids
other potential problems due to a stale vmexit reason in the vmcb.
If SVM_EXIT_SW somehow reaches svm_invoke_exit_handler(), instead,
svm_check_exit_valid() will return false and a WARN will be logged.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301135526.136554-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

249f3249

KVM: x86: SVM: allow to force AVIC to be enabled · edf72123

由 Maxim Levitsky 提交于 3月 01, 2022

Apparently on some systems AVIC is disabled in CPUID but still usable.

Allow the user to override the CPUID if the user is willing to
take the risk.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301143650.143749-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

edf72123

KVM: x86: nSVM: implement nested VMLOAD/VMSAVE · b9f3973a

由 Maxim Levitsky 提交于 3月 01, 2022

This was tested by booting L1,L2,L3 (all Linux) and checking
that no VMLOAD/VMSAVE vmexits happened.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220301143650.143749-4-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9f3973a

KVM: x86: SVM: fix tsc scaling when the host doesn't support it · 88099313

由 Maxim Levitsky 提交于 3月 22, 2022

It was decided that when TSC scaling is not supported,
the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0'
value.

However in this case kvm_max_tsc_scaling_ratio is not set,
which breaks various assumptions.

Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of
host support.  For consistency, do the same for VMX.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88099313

kvm: x86: SVM: remove unused defines · f37b735e

由 Maxim Levitsky 提交于 3月 22, 2022

Remove some unused #defines from svm.c
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f37b735e

KVM: x86: SVM: move tsc ratio definitions to svm.h · bb2aa78e

由 Maxim Levitsky 提交于 3月 22, 2022

Another piece of SVM spec which should be in the header file
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb2aa78e

KVM: x86: SVM: fix avic spec based definitions again · 0dacc3df

由 Maxim Levitsky 提交于 3月 22, 2022

Due to wrong rebase, commit
4a204f78 ("KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255")

moved avic spec #defines back to avic.c.

Move them back, and while at it extend AVIC_DOORBELL_PHYSICAL_ID_MASK to 12
bits as well (it will be used in nested avic)
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0dacc3df

KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs · 9b026073

由 Jim Mattson 提交于 2月 26, 2022

AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.

When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.

This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.

For example,

root@Ubuntu1804:~# perf stat -e r26 -a sleep 1

 Performance counter stats for 'system wide':

                 0      r26

       1.001070977 seconds time elapsed

Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90

Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: NLotus Fenn <lotusf@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLike Xu <likexu@tencent.com>
Reviewed-by: NDavid Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9b026073

KVM: x86: Add wrappers for setting/clearing APICv inhibits · 320af55a

由 Sean Christopherson 提交于 3月 11, 2022

Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well. Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.

For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

320af55a

KVM: x86: Make APICv inhibit reasons an enum and cleanup naming · 7491b7b2

由 Sean Christopherson 提交于 3月 11, 2022

Use an enum for the APICv inhibit reasons, there is no meaning behind
their values and they most definitely are not "unsigned longs".  Rename
the various params to "reason" for consistency and clarity (inhibit may
be confused as a command, i.e. inhibit APICv, instead of the reason that
is getting toggled/checked).

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7491b7b2

KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() · 4a9e7b9e

由 Peter Gonda 提交于 3月 04, 2022

Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove
include from svm_onhpyerv.c which was done only because of lack of
include in svm.h.

Fixes: 883b0a91 ("KVM: SVM: Move Nested SVM Implementation to nested.c")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20220304161032.2270688-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a9e7b9e

KVM: x86/pmu: Use different raw event masks for AMD and Intel · 95b065bf

由 Jim Mattson 提交于 3月 07, 2022

The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.

Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: NJim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

95b065bf

KVM: SVM: fix panic on out-of-bounds guest IRQ · a80ced6e

由 Yi Wang 提交于 3月 09, 2022

As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b0677 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.
Co-developed-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a80ced6e

21 3月, 2022 1 次提交

KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask · bf07be36

由 Maxim Levitsky 提交于 3月 18, 2022

KVM_X86_OP_OPTIONAL_RET0 can only be used with 32-bit return values on 32-bit
systems, because unsigned long is only 32-bits wide there and 64-bit values
are returned in edx:eax.
Reported-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bf07be36

08 3月, 2022 1 次提交

KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255 · 4a204f78

由 Suravee Suthikulpanit 提交于 2月 10, 2022

Expand KVM's mask for the AVIC host physical ID to the full 12 bits defined
by the architecture.  The number of bits consumed by hardware is model
specific, e.g. early CPUs ignored bits 11:8, but there is no way for KVM
to enumerate the "true" size.  So, KVM must allow using all bits, else it
risks rejecting completely legal x2APIC IDs on newer CPUs.

This means KVM relies on hardware to not assign x2APIC IDs that exceed the
"true" width of the field, but presumably hardware is smart enough to tie
the width to the max x2APIC ID.  KVM also relies on hardware to support at
least 8 bits, as the legacy xAPIC ID is writable by software.  But, those
assumptions are unavoidable due to the lack of any way to enumerate the
"true" width.

Cc: stable@vger.kernel.org
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Suggested-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220211000851.185799-1-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a204f78

02 3月, 2022 1 次提交

KVM: SVM: Disable preemption across AVIC load/put during APICv refresh · b652de1e

由 Sean Christopherson 提交于 3月 01, 2022

Disable preemption when loading/putting the AVIC during an APICv refresh.
If the vCPU task is preempted and migrated ot a different pCPU, the
unprotected avic_vcpu_load() could set the wrong pCPU in the physical ID
cache/table.

Pull the necessary code out of avic_vcpu_{,un}blocking() and into a new
helper to reduce the probability of introducing this exact bug a third
time.

Fixes: df7e4827 ("KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC")
Cc: stable@vger.kernel.org
Reported-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b652de1e

01 3月, 2022 3 次提交

KVM: SVM: Exit to userspace on ENOMEM/EFAULT GHCB errors · aa9f5841

由 Sean Christopherson 提交于 2月 25, 2022

Exit to userspace if setup_vmgexit_scratch() fails due to OOM or because
copying data from guest (userspace) memory failed/faulted. The OOM
scenario is clearcut, it's userspace's decision as to whether it should
terminate the guest, free memory, etc...

As for -EFAULT, arguably, any guest issue is a violation of the guest's
contract with userspace, and thus userspace needs to decide how to
proceed. E.g. userspace defines what is RAM vs. MMIO and communicates
that directly to the guest, KVM is not involved in deciding what is/isn't
RAM nor in communicating that information to the guest. If the scratch
GPA doesn't resolve to a memslot, then the guest is not honoring the
memory configuration as defined by userspace.

And if userspace unmaps an hva for whatever reason, then exiting to
userspace with -EFAULT is absolutely the right thing to do. KVM's ABI
currently sucks and doesn't provide enough information to act on the
-EFAULT, but that will hopefully be remedied in the future as there are
multiple use cases, e.g. uffd and virtiofs truncation, that shouldn't
require any work in KVM beyond returning -EFAULT with a small amount of
metadata.

KVM could define its ABI such that failure to access the scratch area is
reflected into the guest, i.e. establish a contract with userspace, but
that's undesirable as it limits KVM's options in the future, e.g. in the
potential uffd case any failure on a uaccess needs to kick out to
userspace. KVM does have several cases where it reflects these errors
into the guest, e.g. kvm_pv_clock_pairing() and Hyper-V emulation, but
KVM would preferably "fix" those instead of propagating the falsehood
that any memory failure is the guest's fault.

Lastly, returning a boolean as an "error" for that a helper that isn't
named accordingly never works out well.

Fixes: ad5b3532 ("KVM: SVM: Do not terminate SEV-ES guests on GHCB validation failure")
Cc: Alper Gun <alpergun@google.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220225205209.3881130-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aa9f5841

KVM: SVM: Don't rewrite guest ICR on AVIC IPI virtualization failure · b51818af

由 Sean Christopherson 提交于 2月 04, 2022

Don't bother rewriting the ICR value into the vAPIC page on an AVIC IPI
virtualization failure, the access is a trap, i.e. the value has already
been written to the vAPIC page.  The one caveat is if hardware left the
BUSY flag set (which appears to happen somewhat arbitrarily), in which
case go through the "nodecode" APIC-write path in order to clear the BUSY
flag.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220204214205.3306634-6-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b51818af

KVM: SVM: Use common kvm_apic_write_nodecode() for AVIC write traps · ed60920e

由 Sean Christopherson 提交于 2月 04, 2022

Use the common kvm_apic_write_nodecode() to handle AVIC/APIC-write traps
instead of open coding the same exact code.  This will allow making the
low level lapic helpers inaccessible outside of lapic.c code.

Opportunistically clean up the params to eliminate a bunch of svm=>vcpu
reflection.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220204214205.3306634-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed60920e

25 2月, 2022 3 次提交

KVM: x86/mmu: load new PGD after the shadow MMU is initialized · 3cffc89d

由 Paolo Bonzini 提交于 2月 04, 2022

Now that __kvm_mmu_new_pgd does not look at the MMU's root_level and
shadow_root_level anymore, pull the PGD load after the initialization of
the shadow MMUs.

Besides being more intuitive, this enables future simplifications
and optimizations because it's not necessary anymore to compute the
role outside kvm_init_mmu.  In particular, kvm_mmu_reset_context was not
attempting to use a cached PGD to avoid having to figure out the new role.
With this change, it could follow what nested_{vmx,svm}_load_cr3 are doing,
and avoid unloading all the cached roots.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3cffc89d

KVM: x86: Provide per VM capability for disabling PMU virtualization · ba7bb663

由 David Dunn 提交于 2月 23, 2022

Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
settings/features to allow userspace to configure PMU virtualization on
a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
to allow disabling PMU virtualization for a VM even when KVM is configured
with enable_pmu=true a module level.

To keep KVM simple, disallow changing VM's PMU configuration after vCPUs
have been created.
Signed-off-by: NDavid Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba7bb663

KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non... · e910a53f

由 Maxim Levitsky 提交于 2月 23, 2022

KVM: x86: nSVM: disallow userspace setting of MSR_AMD64_TSC_RATIO to non default value when tsc scaling disabled

If nested tsc scaling is disabled, MSR_AMD64_TSC_RATIO should
never have non default value.

Due to way nested tsc scaling support was implmented in qemu,
it would set this msr to 0 when nested tsc scaling was disabled.
Ignore that value for now, as it causes no harm.

Fixes: 5228eb96 ("KVM: x86: nSVM: implement nested TSC scaling")
Cc: stable@vger.kernel.org
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220223115649.319134-1-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e910a53f

19 2月, 2022 3 次提交

KVM: x86: allow defining return-0 static calls · 5be2226f

由 Paolo Bonzini 提交于 2月 15, 2022

A few vendor callbacks are only used by VMX, but they return an integer
or bool value.  Introduce KVM_X86_OP_OPTIONAL_RET0 for them: if a func is
NULL in struct kvm_x86_ops, it will be changed to __static_call_return0
when updating static calls.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5be2226f

KVM: x86: make several APIC virtualization callbacks optional · abb6d479

由 Paolo Bonzini 提交于 2月 08, 2022

All their invocations are conditional on vcpu->arch.apicv_active,
meaning that they need not be implemented by vendor code: even
though at the moment both vendors implement APIC virtualization,
all of them can be optional.  In fact SVM does not need many of
them, and their implementation can be deleted now.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

abb6d479

KVM: x86: return 1 unconditionally for availability of KVM_CAP_VAPIC · 8a289785

由 Paolo Bonzini 提交于 2月 15, 2022

The two ioctls used to implement userspace-accelerated TPR,
KVM_TPR_ACCESS_REPORTING and KVM_SET_VAPIC_ADDR, are available
even if hardware-accelerated TPR can be used.  So there is
no reason not to report KVM_CAP_VAPIC.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a289785

18 2月, 2022 1 次提交

KVM: SEV: Allow SEV intra-host migration of VM with mirrors · b2125513

由 Peter Gonda 提交于 2月 11, 2022

For SEV-ES VMs with mirrors to be intra-host migrated they need to be
able to migrate with the mirror. This is due to that fact that all VMSAs
need to be added into the VM with LAUNCH_UPDATE_VMSA before
lAUNCH_FINISH. Allowing migration with mirrors allows users of SEV-ES to
keep the mirror VMs VMSAs during migration.

Adds a list of mirror VMs for the original VM iterate through during its
migration. During the iteration the owner pointers can be updated from
the source to the destination. This fixes the ASID leaking issue which
caused the blocking of migration of VMs with mirrors.
Signed-off-by: NPeter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Message-Id: <20220211193634.3183388-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b2125513

14 2月, 2022 1 次提交

KVM: SVM: Rename AVIC helpers to use "avic" prefix instead of "svm" · db6e7adf

由 Sean Christopherson 提交于 1月 28, 2022

Use "avic" instead of "svm" for SVM's all of APICv hooks and make a few
additional funciton name tweaks so that the AVIC functions conform to
their associated kvm_x86_ops hooks.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220128005208.4008533-19-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db6e7adf

12 2月, 2022 1 次提交

KVM: SVM: fix race between interrupt delivery and AVIC inhibition · 66fa226c

由 Maxim Levitsky 提交于 2月 08, 2022

If svm_deliver_avic_intr is called just after the target vcpu's AVIC got
inhibited, it might read a stale value of vcpu->arch.apicv_active
which can lead to the target vCPU not noticing the interrupt.

To fix this use load-acquire/store-release so that, if the target vCPU
is IN_GUEST_MODE, we're guaranteed to see a previous disabling of the
AVIC.  If AVIC has been disabled in the meanwhile, proceed with the
KVM_REQ_EVENT-based delivery.

Incomplete IPI vmexit has the same races as svm_deliver_avic_intr, and
in fact it can be handled in exactly the same way; the only difference
lies in who has set IRR, whether svm_deliver_interrupt or the processor.
Therefore, svm_complete_interrupt_delivery can be used to fix incomplete
IPI vmexits as well.
Co-developed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

66fa226c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功