提交 · 594a1c271c159c9c5f0ff2d92ebfda469e94e48d · openeuler / Kernel

14 7月, 2022 9 次提交

KVM: selftests: Fix filename reporting in guest asserts · 594a1c27

由 Colton Lewis 提交于 6月 15, 2022

Fix filename reporting in guest asserts by ensuring the GUEST_ASSERT
macro records __FILE__ and substituting REPORT_GUEST_ASSERT for many
repetitive calls to TEST_FAIL.

Previously filename was reported by using __FILE__ directly in the
selftest, wrongly assuming it would always be the same as where the
assertion failed.
Signed-off-by: NColton Lewis <coltonlewis@google.com>
Reported-by: NRicardo Koller <ricarkol@google.com>
Fixes: 4e18bccc
Link: https://lore.kernel.org/r/20220615193116.806312-5-coltonlewis@google.com
[sean: convert more TEST_FAIL => REPORT_GUEST_ASSERT instances]
Signed-off-by: NSean Christopherson <seanjc@google.com>

594a1c27

KVM: selftests: Write REPORT_GUEST_ASSERT macros to pair with GUEST_ASSERT · ddcb57af

由 Colton Lewis 提交于 6月 15, 2022

Write REPORT_GUEST_ASSERT macros to pair with GUEST_ASSERT to abstract
and make consistent all guest assertion reporting. Every report
includes an explanatory string, a filename, and a line number.
Signed-off-by: NColton Lewis <coltonlewis@google.com>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20220615193116.806312-4-coltonlewis@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

ddcb57af

KVM: selftests: Increase UCALL_MAX_ARGS to 7 · fc573fa4

由 Colton Lewis 提交于 6月 15, 2022

Increase UCALL_MAX_ARGS to 7 to allow GUEST_ASSERT_4 to pass 3 builtin
ucall arguments specified in guest_assert_builtin_args plus 4
user-specified arguments.
Signed-off-by: NColton Lewis <coltonlewis@google.com>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20220615193116.806312-3-coltonlewis@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

fc573fa4

KVM: selftests: enumerate GUEST_ASSERT arguments · 8fb2638a

由 Colton Lewis 提交于 6月 15, 2022

Enumerate GUEST_ASSERT arguments to avoid magic indices to ucall.args.
Signed-off-by: NColton Lewis <coltonlewis@google.com>
Reviewed-by: NAndrew Jones <drjones@redhat.com>
Link: https://lore.kernel.org/r/20220615193116.806312-2-coltonlewis@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

8fb2638a

KVM: x86: WARN only once if KVM leaves a dangling userspace I/O request · 0bc27326

由 Sean Christopherson 提交于 7月 11, 2022

Change a WARN_ON() to separate WARN_ON_ONCE() if KVM has an outstanding
PIO or MMIO request without an associated callback, i.e. if KVM queued a
userspace I/O exit but didn't actually exit to userspace before moving
on to something else. Warning on every KVM_RUN risks spamming the kernel
if KVM gets into a bad state. Opportunistically split the WARNs so that
it's easier to triage failures when a WARN fires.

Deliberately do not use KVM_BUG_ON(), i.e. don't kill the VM. While the
WARN is all but guaranteed to fire if and only if there's a KVM bug, a
dangling I/O request does not present a danger to KVM (that flag is truly
truly consumed only in a single emulator path), and any such bug is
unlikely to be fatal to the VM (KVM essentially failed to do something it
shouldn't have tried to do in the first place). In other words, note the
bug, but let the VM keep running.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-4-seanjc@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

0bc27326

KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GP · 26262069

由 Sean Christopherson 提交于 7月 11, 2022

When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set
the error code to the selector. Intel SDM's says nothing about the #GP,
but AMD's APM explicitly states that both LLDT and LTR set the error code
to the selector, not zero.

Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0),
but the KVM code in question is specific to the base from the descriptor.

Fixes: e37a75a1 ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

26262069

KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks · ec6e4d86

由 Sean Christopherson 提交于 7月 11, 2022

Wait to mark the TSS as busy during LTR emulation until after all fault
checks for the LTR have passed. Specifically, don't mark the TSS busy if
the new TSS base is non-canonical.

Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the
only reason for the early check was to avoid marking a !PRESENT TSS as
busy, i.e. the common !PRESENT is now done before setting the busy bit.

Fixes: e37a75a1 ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR")
Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.comSigned-off-by: NSean Christopherson <seanjc@google.com>

ec6e4d86

KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specific · 43bb9e00

由 Sean Christopherson 提交于 7月 11, 2022

Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear
that the quirk only controls the #UD behavior of MONITOR/MWAIT. KVM
doesn't currently enforce fault checks when MONITOR/MWAIT are supported,
but that could change in the future. SVM also has a virtualization hole
in that it checks all faults before intercepts, and so "never faults" is
already a lie when running on SVM.

Fixes: bfbcc81b ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com

43bb9e00

KVM: selftests: Use "a" and "d" to set EAX/EDX for wrmsr_safe() · 14fd95bf

由 Vitaly Kuznetsov 提交于 7月 14, 2022

Do not use GCC's "A" constraint to load EAX:EDX in wrmsr_safe().  Per
GCC's documenation on x86-specific constraints, "A" will not actually
load a 64-bit value into EAX:EDX on x86-64.

  The a and d registers. This class is used for instructions that return
  double word results in the ax:dx register pair. Single word values will
  be allocated either in ax or dx. For example on i386 the following
  implements rdtsc:

  unsigned long long rdtsc (void)
  {
    unsigned long long tick;
    __asm__ __volatile__("rdtsc":"=A"(tick));
    return tick;
  }

  This is not correct on x86-64 as it would allocate tick in either ax or
  dx. You have to use the following variant instead:

  unsigned long long rdtsc (void)
  {
    unsigned int tickl, tickh;
    __asm__ __volatile__("rdtsc":"=a"(tickl),"=d"(tickh));
    return ((unsigned long long)tickh << 32)|tickl;
  }

Because a u64 fits in a single 64-bit register, using "A" for selftests,
which are 64-bit only, results in GCC loading the value into either RAX
or RDX instead of splitting it across EAX:EDX.

E.g.:

  kvm_exit:             reason MSR_WRITE rip 0x402919 info 0 0
  kvm_msr:              msr_write 40000118 = 0x60000000001 (#GP)
...

With "A":

  48 8b 43 08          	mov    0x8(%rbx),%rax
  49 b9 ba da ca ba 0a 	movabs $0xabacadaba,%r9
  00 00 00
  4c 8d 15 07 00 00 00 	lea    0x7(%rip),%r10        # 402f44 <guest_msr+0x34>
  4c 8d 1d 06 00 00 00 	lea    0x6(%rip),%r11        # 402f4a <guest_msr+0x3a>
  0f 30                 wrmsr

With "a"/"d":

  48 8b 53 08             mov    0x8(%rbx),%rdx
  89 d0                   mov    %edx,%eax
  48 c1 ea 20             shr    $0x20,%rdx
  49 b9 ba da ca ba 0a    movabs $0xabacadaba,%r9
  00 00 00
  4c 8d 15 07 00 00 00    lea    0x7(%rip),%r10        # 402fc3 <guest_msr+0xb3>
  4c 8d 1d 06 00 00 00    lea    0x6(%rip),%r11        # 402fc9 <guest_msr+0xb9>
  0f 30                   wrmsr

Fixes: 3b23054c ("KVM: selftests: Add x86-64 support for exception fixup")
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Link: https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints
[sean: use "& -1u", provide GCC blurb and link to documentation]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220714011115.3135828-1-seanjc@google.com

14fd95bf

13 7月, 2022 5 次提交

KVM: selftests: Provide valid inputs for MONITOR/MWAIT regs · b624ae35

由 Sean Christopherson 提交于 7月 11, 2022

Provide valid inputs for RAX, RCX, and RDX when testing whether or not
KVM injects a #UD on MONITOR/MWAIT.  SVM has a virtualization hole and
checks for _all_ faults before checking for intercepts, e.g. MONITOR with
an unsupported RCX will #GP before KVM gets a chance to intercept and
emulate.

Fixes: 2325d4dd ("KVM: selftests: Add MONITOR/MWAIT quirk test")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220711225753.1073989-3-seanjc@google.com

b624ae35

KVM: selftests: Test MONITOR and MWAIT, not just MONITOR for quirk · 874190fd

由 Sean Christopherson 提交于 7月 11, 2022

Fix a copy+paste error in monitor_mwait_test by switching one of the two
"monitor" instructions to an "mwait". The intent of the test is very
much to verify the quirk handles both MONITOR and MWAIT.

Fixes: 2325d4dd ("KVM: selftests: Add MONITOR/MWAIT quirk test")
Reported-by: NYuan Yao <yuan.yao@linux.intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220711225753.1073989-2-seanjc@google.com

874190fd

KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor, again · 79f772b9

由 Sean Christopherson 提交于 6月 14, 2022

Read vcpu->vcpu_idx directly instead of bouncing through the one-line
wrapper, kvm_vcpu_get_idx(), and drop the wrapper.  The wrapper is a
remnant of the original implementation and serves no purpose; remove it
(again) before it gains more users.

kvm_vcpu_get_idx() was removed in the not-too-distant past by commit
4eeef242 ("KVM: x86: Query vcpu->vcpu_idx directly and drop its
accessor"), but was unintentionally re-introduced by commit a54d8066
("KVM: Keep memslots in tree-based structures instead of array-based ones"),
likely due to a rebase goof.  The wrapper then managed to gain users in
KVM's Xen code.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220614225615.3843835-1-seanjc@google.com

79f772b9

KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa() · 6e1d2a3f

由 Hou Wenlong 提交于 7月 01, 2022

The result of gva_to_gpa() is physical address not virtual address,
it is odd that UNMAPPED_GVA macro is used as the result for physical
address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA
macro.

No functional change intended.
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.comSigned-off-by: NSean Christopherson <seanjc@google.com>

6e1d2a3f

KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 · 156b9d76

由 Vitaly Kuznetsov 提交于 7月 12, 2022

Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to
hang upon boot or shortly after when a non-default TSC frequency was
set for L1. The issue is observed on a host where TSC scaling is
supported. The problem appears to be that Windows doesn't use TSC
scaling for its guests, even when the feature is advertised, and KVM
filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from
L1's VMCS. This leads to L2 running with the default frequency (matching
host's) while L1 is running with an altered one.

Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when
it was set for L1. TSC_MULTIPLIER is already correctly computed and
written by prepare_vmcs02().
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Fixes: d041b5ea ("KVM: nVMX: Enable nested TSC scaling")
Cc: stable@vger.kernel.org
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Link: https://lore.kernel.org/r/20220712135009.952805-1-vkuznets@redhat.comSigned-off-by: NSean Christopherson <seanjc@google.com>

156b9d76

09 7月, 2022 7 次提交

KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() · 159e037d

由 Vitaly Kuznetsov 提交于 7月 08, 2022

'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left
uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally
not needed for APIC_DM_REMRD, they're still referenced by
__apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize
the structure to avoid consuming random stack memory.

Fixes: a183b638 ("KVM: x86: make apic_accept_irq tracepoint more generic")
Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220708125147.593975-1-vkuznets@redhat.comSigned-off-by: NSean Christopherson <seanjc@google.com>

159e037d

KVM: x86: Fix handling of APIC LVT updates when userspace changes MCG_CAP · f83894b2

由 Sean Christopherson 提交于 7月 08, 2022

Add a helper to update KVM's in-kernel local APIC in response to MCG_CAP
being changed by userspace to fix multiple bugs.  First and foremost,
KVM needs to check that there's an in-kernel APIC prior to dereferencing
vcpu->arch.apic.  Beyond that, any "new" LVT entries need to be masked,
and the APIC version register needs to be updated as it reports out the
number of LVT entries.

Fixes: 4b903561 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.")
Reported-by: syzbot+8cdad6430c24f396f158@syzkaller.appspotmail.com
Cc: Siddh Raman Pant <code@siddh.me>
Cc: Jue Wang <juew@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>

f83894b2

KVM: x86: Initialize number of APIC LVT entries during APIC creation · 03d84f96

由 Sean Christopherson 提交于 7月 08, 2022

Initialize the number of LVT entries during APIC creation, else the field
will be incorrectly left '0' if userspace never invokes KVM_X86_SETUP_MCE.

Add and use a helper to calculate the number of entries even though
MCG_CMCI_P is not set by default in vcpu->arch.mcg_cap. Relying on that
to always be true is unnecessarily risky, and subtle/confusing as well.

Fixes: 4b903561 ("KVM: x86: Add Corrected Machine Check Interrupt (CMCI) emulation to lapic.")
Reported-by: NXiaoyao Li <xiaoyao.li@intel.com>
Cc: Jue Wang <juew@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>

03d84f96

Merge branch 'kvm-5.20-msr-eperm' · 4a627b0b

由 Sean Christopherson 提交于 7月 08, 2022

Merge a bug fix and cleanups for {g,s}et_msr_mce() using a base that
predates commit 281b5278 ("KVM: x86: Add emulation for
MSR_IA32_MCx_CTL2 MSRs."), which was written with the intention that it
be applied _after_ the bug fix and cleanups. The bug fix in particular
needs to be sent to stable trees; give them a stable hash to use.

4a627b0b

KVM: x86: Add helpers to identify CTL and STATUS MCi MSRs · 54ad60ba

由 Sean Christopherson 提交于 5月 12, 2022

Add helpers to identify CTL (control) and STATUS MCi MSR types instead of
open coding the checks using the offset. Using the offset is perfectly
safe, but unintuitive, as understanding what the code does requires
knowing that the offset calcuation will not affect the lower three bits.

Opportunistically comment the STATUS logic to save readers a trip to
Intel's SDM or AMD's APM to understand the "data != 0" check.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-4-seanjc@google.com

54ad60ba

KVM: x86: Use explicit case-statements for MCx banks in {g,s}et_msr_mce() · f5223a33

由 Sean Christopherson 提交于 5月 12, 2022

Use an explicit case statement to grab the full range of MCx bank MSRs
in {g,s}et_msr_mce(), and manually check only the "end" (the number of
banks configured by userspace may be less than the max).  The "default"
trick works, but is a bit odd now, and will be quite odd if/when support
for accessing MCx_CTL2 MSRs is added, which has near identical logic.

Hoist "offset" to function scope so as to avoid curly braces for the case
statement, and because MCx_CTL2 support will need the same variables.

Opportunstically clean up the comment about allowing bit 10 to be cleared
from bank 4.

No functional change intended.

Cc: Jue Wang <juew@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-3-seanjc@google.com

f5223a33

KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS) · 2368048b

由 Sean Christopherson 提交于 5月 12, 2022

Return '1', not '-1', when handling an illegal WRMSR to a MCi_CTL or
MCi_STATUS MSR.  The behavior of "all zeros' or "all ones" for CTL MSRs
is architectural, as is the "only zeros" behavior for STATUS MSRs.  I.e.
the intent is to inject a #GP, not exit to userspace due to an unhandled
emulation case.  Returning '-1' gets interpreted as -EPERM up the stack
and effecitvely kills the guest.

Fixes: 890ca9ae ("KVM: Add MCE support")
Fixes: 9ffd986c ("KVM: X86: #GP when guest attempts to write MCi_STATUS register w/o 0")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220512222716.4112548-2-seanjc@google.com

2368048b

25 6月, 2022 19 次提交

KVM: x86/mmu: Buffer nested MMU split_desc_cache only by default capacity · b9b71f43

由 Sean Christopherson 提交于 6月 24, 2022

Buffer split_desc_cache, the cache used to allcoate rmap list entries,
only by the default cache capacity (currently 40), not by doubling the
minimum (513). Aliasing L2 GPAs to L1 GPAs is uncommon, thus eager page
splitting is unlikely to need 500+ entries. And because each object is a
non-trivial 128 bytes (see struct pte_list_desc), those extra ~500
entries means KVM is in all likelihood wasting ~64kb of memory per VM.

Link: https://lore.kernel.org/all/YrTDcrsn0%2F+alpzf@google.com
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220624171808.2845941-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b9b71f43

KVM: x86/mmu: Use "unsigned int", not "u32", for SPTEs' @access info · 72ae5822

由 Sean Christopherson 提交于 6月 24, 2022

Use an "unsigned int" for @access parameters instead of a "u32", mostly
to be consistent throughout KVM, but also because "u32" is misleading.
@access can actually squeeze into a u8, i.e. doesn't need 32 bits, but is
as an "unsigned int" because sp->role.access is an unsigned int.

No functional change intended.

Link: https://lore.kernel.org/all/YqyZxEfxXLsHGoZ%2F@google.com
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220624171808.2845941-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

72ae5822

KVM: SEV-ES: reuse advance_sev_es_emulated_ins for OUT too · db209369

由 Paolo Bonzini 提交于 10月 22, 2021

complete_emulator_pio_in() only has to be called by
complete_sev_es_emulated_ins() now; therefore, all that the function does
now is adjust sev_pio_count and sev_pio_data.  Which is the same for
both IN and OUT.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db209369

KVM: x86: de-underscorify __emulator_pio_in · f35cee4a

由 Paolo Bonzini 提交于 10月 22, 2021

Now all callers except emulator_pio_in_emulated are using
__emulator_pio_in/complete_emulator_pio_in explicitly.
Move the "either copy the result or attempt PIO" logic in
emulator_pio_in_emulated, and rename __emulator_pio_in to
just emulator_pio_in.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f35cee4a

KVM: x86: wean fast IN from emulator_pio_in · dc7a4bfd

由 Paolo Bonzini 提交于 10月 22, 2021

Use __emulator_pio_in() directly for fast PIO instead of bouncing through
emulator_pio_in() now that __emulator_pio_in() fills "val" when handling
in-kernel PIO.  vcpu->arch.pio.count is guaranteed to be '0', so this a
pure nop.

emulator_pio_in_emulated is now the last caller of emulator_pio_in.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dc7a4bfd

KVM: x86: wean in-kernel PIO from vcpu->arch.pio* · 0c05e10b

由 Paolo Bonzini 提交于 6月 15, 2022

Make emulator_pio_in_out operate directly on the provided buffer
as long as PIO is handled inside KVM.

For input operations, this means that, in the case of in-kernel
PIO, __emulator_pio_in() does not have to be always followed
by complete_emulator_pio_in().  This affects emulator_pio_in() and
kvm_sev_es_ins(); for the latter, that is why the call moves from
advance_sev_es_emulated_ins() to complete_sev_es_emulated_ins().

For output, it means that vcpu->pio.count is never set unnecessarily
and there is no need to clear it; but also vcpu->pio.size must not
be used in kvm_sev_es_outs(), because it will not be updated for
in-kernel OUT.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0c05e10b

KVM: x86: move all vcpu->arch.pio* setup in emulator_pio_in_out() · 30d583fd

由 Paolo Bonzini 提交于 10月 22, 2021

For now, this is basically an excuse to add back the void* argument to
the function, while removing some knowledge of vcpu->arch.pio* from
its callers.  The WARN that vcpu->arch.pio.count is zero is also
extended to OUT operations.

The vcpu->arch.pio* fields still need to be filled even when the PIO is
handled in-kernel as __emulator_pio_in() is always followed by
complete_emulator_pio_in().  But after fixing that, it will be possible to
to only populate the vcpu->arch.pio* fields on userspace exits.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

30d583fd

KVM: x86: drop PIO from unregistered devices · 35ab3b77

由 Paolo Bonzini 提交于 6月 15, 2022

KVM protects the device list with SRCU, and therefore different calls
to kvm_io_bus_read()/kvm_io_bus_write() can very well see different
incarnations of kvm->buses.  If userspace unregisters a device while
vCPUs are running there is no well-defined result.  This patch applies
a safe fallback by returning early from emulator_pio_in_out().  This
corresponds to returning zeroes from IN, and dropping the writes on
the floor for OUT.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35ab3b77

KVM: x86: inline kernel_pio into its sole caller · 0f87ac23

由 Paolo Bonzini 提交于 10月 22, 2021

The caller of kernel_pio already has arguments for most of what kernel_pio
fishes out of vcpu->arch.pio.  This is the first step towards ensuring that
vcpu->arch.pio.* is only used when exiting to userspace.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0f87ac23

KVM: x86: complete fast IN directly with complete_emulator_pio_in() · 7a6177d6

由 Paolo Bonzini 提交于 6月 15, 2022

Use complete_emulator_pio_in() directly when completing fast PIO, there's
no need to bounce through emulator_pio_in(): the comment about ECX
changing doesn't apply to fast PIO, which isn't used for string I/O.

No functional change intended.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a6177d6

KVM: x86: nSVM: optimize svm_set_x2apic_msr_interception · 091abbf5

由 Maxim Levitsky 提交于 5月 19, 2022

- Avoid toggling the x2apic msr interception if it is already up to date.

- Avoid touching L0 msr bitmap when AVIC is inhibited on entry to
  the guest mode, because in this case the guest usually uses its
  own msr bitmap.

  Later on VM exit, the 1st optimization will allow KVM to skip
  touching the L0 msr bitmap as well.
Reviewed-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220519102709.24125-18-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

091abbf5

KVM: SVM: Add AVIC doorbell tracepoint · 39b6b8c3

由 Suravee Suthikulpanit 提交于 5月 19, 2022

Add a tracepoint to track number of doorbells being sent
to signal a running vCPU to process IRQ after being injected.
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-17-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

39b6b8c3

KVM: SVM: Use target APIC ID to complete x2AVIC IRQs when possible · 8c9e639d

由 Suravee Suthikulpanit 提交于 5月 19, 2022

For x2AVIC, the index from incomplete IPI #vmexit info is invalid
for logical cluster mode. Only ICRH/ICRL values can be used
to determine the IPI destination APIC ID.

Since QEMU defines guest physical APIC ID to be the same as
vCPU ID, it can be used to quickly identify the target vCPU to deliver IPI,
and avoid the overhead from searching through all vCPUs to match the target
vCPU.
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-16-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8c9e639d

KVM: x86: Warning APICv inconsistency only when vcpu APIC mode is valid · f8d8ac21

由 Suravee Suthikulpanit 提交于 5月 19, 2022

When launching a VM with x2APIC and specify more than 255 vCPUs,
the guest kernel can disable x2APIC (e.g. specify nox2apic kernel option).
The VM fallbacks to xAPIC mode, and disable the vCPU ID 255 and greater.

In this case, APICV is deactivated for the disabled vCPUs.
However, the current APICv consistency warning does not account for
this case, which results in a warning.

Therefore, modify warning logic to report only when vCPU APIC mode
is valid.
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-15-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f8d8ac21

KVM: SVM: Introduce hybrid-AVIC mode · 0e311d33

由 Suravee Suthikulpanit 提交于 5月 19, 2022

Currently, AVIC is inhibited when booting a VM w/ x2APIC support.
because AVIC cannot virtualize x2APIC MSR register accesses.
However, the AVIC doorbell can be used to accelerate interrupt
injection into a running vCPU, while all guest accesses to x2APIC MSRs
will be intercepted and emulated by KVM.

With hybrid-AVIC support, the APICV_INHIBIT_REASON_X2APIC is
no longer enforced.
Suggested-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-14-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0e311d33

KVM: SVM: Do not throw warning when calling avic_vcpu_load on a running vcpu · c0caeee6

由 Suravee Suthikulpanit 提交于 5月 19, 2022

Originalliy, this WARN_ON is designed to detect when calling
avic_vcpu_load() on an already running vcpu in AVIC mode (i.e. the AVIC
is_running bit is set).

However, for x2AVIC, the vCPU can switch from xAPIC to x2APIC mode while in
running state, in which the avic_vcpu_load() will be called from
svm_refresh_apicv_exec_ctrl().

Therefore, remove this warning since it is no longer appropriate.
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NPankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-13-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c0caeee6

KVM: SVM: Introduce logic to (de)activate x2AVIC mode · 4d1d7942

由 Suravee Suthikulpanit 提交于 5月 19, 2022

Introduce logic to (de)activate AVIC, which also allows
switching between AVIC to x2AVIC mode at runtime.

When an AVIC-enabled guest switches from APIC to x2APIC mode,
the SVM driver needs to perform the following steps:

1. Set the x2APIC mode bit for AVIC in VMCB along with the maximum
APIC ID support for each mode accodingly.

2. Disable x2APIC MSRs interception in order to allow the hardware
to virtualize x2APIC MSRs accesses.
Reported-by: Nkernel test robot <lkp@intel.com>
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-12-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4d1d7942

KVM: x86: nSVM: always intercept x2apic msrs · 7a8f7c1f

由 Maxim Levitsky 提交于 5月 19, 2022

As a preparation for x2avic, this patch ensures that x2apic msrs
are always intercepted for the nested guest.
Reviewed-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220519102709.24125-11-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a8f7c1f

KVM: SVM: Refresh AVIC configuration when changing APIC mode · 05c4fe8c

由 Suravee Suthikulpanit 提交于 5月 19, 2022

AMD AVIC can support xAPIC and x2APIC virtualization,
which requires changing x2APIC bit VMCB and MSR intercepton
for x2APIC MSRs. Therefore, call avic_refresh_apicv_exec_ctrl()
to refresh configuration accordingly.
Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220519102709.24125-10-suravee.suthikulpanit@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

05c4fe8c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功