1. 11 Aug 2021, 1 commit
    • KVM: VMX: Use current VMCS to query WAITPKG support for MSR emulation · 7b9cae02
      By Sean Christopherson
      Use the secondary_exec_controls_get() accessor in vmx_has_waitpkg() to
      effectively get the controls for the current VMCS, as opposed to using
      vmx->secondary_exec_controls, which is the cached value of KVM's desired
      controls for vmcs01 and truly not reflective of any particular VMCS.
      
      While the waitpkg control is not dynamic, i.e. vmcs01 will always hold
      the same waitpkg configuration as vmx->secondary_exec_controls, the same
      does not hold true for vmcs02 if the L1 VMM hides the feature from L2.
      If L1 hides the feature _and_ does not intercept MSR_IA32_UMWAIT_CONTROL,
      L2 could incorrectly read/write L1's virtual MSR instead of taking a #GP.
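
      For reference, a minimal sketch of the fixed helper, reconstructed from the
      description above (not copied verbatim from the patch):

        static inline bool vmx_has_waitpkg(struct vcpu_vmx *vmx)
        {
                /* read the controls of the currently loaded VMCS, not the
                 * vmcs01-oriented cache in the vcpu_vmx struct */
                return secondary_exec_controls_get(vmx) &
                       SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
        }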
      
      Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210810171952.2758100-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 06 Aug 2021, 5 commits
  3. 05 Aug 2021, 2 commits
    • KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds · d5aaad6f
      By Sean Christopherson
      Take a signed 'long' instead of an 'unsigned long' for the number of
      pages to add/subtract to the total number of pages used by the MMU.  This
      fixes a zero-extension bug on 32-bit kernels that effectively corrupts
      the per-cpu counter used by the shrinker.
      
      Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
      kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
      an unsigned 32-bit value on 32-bit kernels.  As a result, the value used
      to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
      sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to
      4294967295 and effectively corrupts the counter.
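
      A tiny user-space illustration of the type issue (not the kernel code; the
      function names are made up, and the 32-bit behavior reproduces only when
      unsigned long is 32 bits, e.g. when built with -m32):

        #include <stdio.h>
        #include <stdint.h>

        /* buggy flavor: the adjustment travels through an unsigned long, so on
         * a 32-bit build -1 becomes 4294967295 before reaching the s64 counter */
        static void mod_pages_unsigned(int64_t *counter, unsigned long nr)
        {
                *counter += nr;
        }

        /* fixed flavor: a signed long is sign-extended, so -1 stays -1 */
        static void mod_pages_signed(int64_t *counter, long nr)
        {
                *counter += nr;
        }

        int main(void)
        {
                int64_t buggy = 0, fixed = 0;

                mod_pages_unsigned(&buggy, -1);
                mod_pages_signed(&fixed, -1);
                printf("buggy=%lld fixed=%lld\n", (long long)buggy, (long long)fixed);
                return 0;
        }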
      
      This was found by a staggering amount of sheer dumb luck when running
      kvm-unit-tests on a 32-bit KVM build.  The shrinker just happened to kick
      in while running tests and do_shrink_slab() logged an error about trying
      to free a negative number of objects.  The truly lucky part is that the
      kernel just happened to be a slightly stale build, as the shrinker no
      longer yells about negative objects as of commit 18bb473e ("mm:
      vmscan: shrink deferred objects proportional to priority").
      
       vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460
      
      Fixes: bc8a3d89 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210804214609.1096003-1-seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: xen: do not use struct gfn_to_hva_cache · 319afe68
      By Paolo Bonzini
      gfn_to_hva_cache is not thread-safe, so it is usually used only within
      a vCPU (whose code is protected by vcpu->mutex).  The Xen interface
      implementation has such a cache in kvm->arch, but it is not really
      used except to store the location of the shared info page.  Replace
      shinfo_set and shinfo_cache with just the value that is passed via
      KVM_XEN_ATTR_TYPE_SHARED_INFO; the only complication is that the
      initialization value is not zero anymore and therefore kvm_xen_init_vm
      needs to be introduced.
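
      A rough sketch of the resulting layout (field and constant names here are
      illustrative, not necessarily those used in the patch):

        #include <stdint.h>

        typedef uint64_t gfn_t;                   /* guest frame number, as in KVM */

        #define SHINFO_GFN_INVALID ((gfn_t)-1)    /* illustrative "unset" marker */

        struct kvm_xen_sketch {
                gfn_t shinfo_gfn;                 /* replaces shinfo_set + shinfo_cache */
        };

        /* Because the "unset" value is no longer all-zeroes, the VM creation path
         * must initialize it explicitly; hence the new kvm_xen_init_vm() hook. */
        static void kvm_xen_init_vm_sketch(struct kvm_xen_sketch *xen)
        {
                xen->shinfo_gfn = SHINFO_GFN_INVALID;
        }
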
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 04 Aug 2021, 6 commits
    • KVM: SVM: improve the code readability for ASID management · bb2baeb2
      By Mingwei Zhang
      KVM SEV code uses bitmaps to manage ASID states. ASID 0 is always skipped
      because it is never used by a VM. Thus, in the existing code, an ASID value
      and its bitmap position always have an 'offset-by-1' relationship.
      
      SEV and SEV-ES share the ASID space, so KVM uses a dynamic range
      [min_asid, max_asid] to handle SEV and SEV-ES ASIDs separately.
      
      The existing code mixes the ASID value and its bitmap position by using
      the same variable, 'min_asid', for both.
      
      Fix the min_asid usage: ensure that its usage is consistent with its name,
      and allocate an extra slot for ASID 0 so that each ASID value matches its
      bitmap position. Add comments on the ASID bitmap allocation to clarify the
      size change.
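
      An illustrative user-space sketch of the scheme (simplified; not the kernel
      implementation): allocate one extra slot so that index N always corresponds
      to ASID N, with ASID 0 simply left permanently reserved.

        #include <stdbool.h>
        #include <stdlib.h>

        struct asid_pool {
                bool *used;               /* used[asid] is true when allocated */
                unsigned int nr_asids;
        };

        static int asid_pool_init(struct asid_pool *p, unsigned int max_asid)
        {
                p->nr_asids = max_asid + 1;     /* +1 so that index == ASID value */
                p->used = calloc(p->nr_asids, sizeof(*p->used));
                if (!p->used)
                        return -1;
                p->used[0] = true;              /* ASID 0 is never handed out */
                return 0;
        }

        /* min_asid and max_asid are now real ASID values, matching their names */
        static int asid_alloc(struct asid_pool *p, unsigned int min_asid,
                              unsigned int max_asid)
        {
                for (unsigned int asid = min_asid; asid <= max_asid; asid++) {
                        if (!p->used[asid]) {
                                p->used[asid] = true;
                                return (int)asid;
                        }
                }
                return -1;                      /* range exhausted */
        }
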
      Signed-off-by: Mingwei Zhang <mizhang@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Alper Gun <alpergun@google.com>
      Cc: Dionna Glaze <dionnaglaze@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Message-Id: <20210802180903.159381-1-mizhang@google.com>
      [Fix up sev_asid_free to also index by ASID, as suggested by Sean
       Christopherson, and use nr_asids in sev_cpu_init. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Fix off-by-one indexing when nullifying last used SEV VMCB · 179c6c27
      By Sean Christopherson
      Use the raw ASID, not ASID-1, when nullifying the last used VMCB when
      freeing an SEV ASID.  The consumer, pre_sev_run(), indexes the array by
      the raw ASID, thus KVM could get a false negative when checking for a
      different VMCB if KVM manages to reallocate the same ASID+VMCB combo for
      a new VM.
      
      Note, this cannot cause a functional issue _in the current code_, as
      pre_sev_run() also checks which pCPU last did VMRUN for the vCPU, and
      last_vmentry_cpu is initialized to -1 during vCPU creation, i.e. is
      guaranteed to mismatch on the first VMRUN.  However, prior to commit
      8a14fe4f ("kvm: x86: Move last_cpu into kvm_vcpu_arch as
      last_vmentry_cpu"), SVM tracked pCPU on its own and zero-initialized the
      last_cpu variable.  Thus it's theoretically possible that older versions
      of KVM could miss a TLB flush if the first VMRUN is on pCPU0 and the ASID
      and VMCB exactly match those of a prior VM.
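
      In kernel-style pseudocode, the fix in the free path boils down to using
      the same index that pre_sev_run() reads (a sketch based on the description
      above, not the exact diff):

        /* sev_asid_free() sketch: nullify the per-CPU "last used VMCB" slot
         * for this ASID using the raw ASID, matching pre_sev_run(). */
        for_each_possible_cpu(cpu) {
                sd = per_cpu(svm_data, cpu);
                sd->sev_vmcbs[asid] = NULL;     /* was: sd->sev_vmcbs[asid - 1] */
        }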
      
      Fixes: 70cd94e6 ("KVM: SVM: VMRUN should use associated ASID when SEV is enabled")
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of perf interfaces · e79f49c3
      By Like Xu
      Based on our observations, after any VM-exit associated with the vPMU, two
      or more perf interfaces are called for guest counter emulation, such as
      perf_event_{pause, read_value, period}(), and each one locks and unlocks the
      same perf_event_ctx. The calls become even more frequent when the guest uses
      counters in a multiplexed manner.
      
      Holding the lock once and completing all of KVM's request operations within
      the perf context would require a set of impractical new interfaces. Instead,
      we can further optimize the vPMU implementation by avoiding repeated calls
      to these interfaces in the KVM context for at least one pattern:
      
      After perf_event_pause() is called once, the event is disabled and its
      internal count is reset to 0, so there is no need to pause it again or read
      its value. Once the event is paused, its period will not be updated until it
      is next resumed or reprogrammed, so there is also no need to call
      perf_event_period() twice for a non-running counter, given that the
      perf_event for a running counter is never paused.
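
      A simplified, self-contained sketch of the pattern (an illustration of the
      idea, not the kernel code; only the is_paused field comes from the patch
      title):

        #include <stdbool.h>
        #include <stdint.h>

        struct pmc_sketch {
                uint64_t counter;       /* value accumulated so far */
                void *perf_event;       /* stand-in for the real perf_event */
                bool is_paused;         /* new: set on pause, cleared on resume */
        };

        static void pmc_pause_counter(struct pmc_sketch *pmc)
        {
                if (!pmc->perf_event || pmc->is_paused)
                        return;         /* nothing to pause, or already paused */
                /* the real code would call perf_event_pause() here, which
                 * disables the event and folds its count into pmc->counter */
                pmc->is_paused = true;
        }

        static bool pmc_needs_perf_read(const struct pmc_sketch *pmc)
        {
                /* a paused event is already folded into pmc->counter, so the
                 * extra perf_event_read_value()/perf_event_period() calls can
                 * be skipped */
                return pmc->perf_event && !pmc->is_paused;
        }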
      
      Based on this implementation, for the following common usage of sampling 4
      events using perf on a 4u8g (4 vCPU, 8G memory) guest:
      
        echo 0 > /proc/sys/kernel/watchdog
        echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent
        echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate
        echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent
        for i in `seq 1 1 10`
        do
        taskset -c 0 perf record \
        -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \
        /root/br_instr a
        done
      
      the average latency of the guest NMI handler is reduced from
      37646.7 ns to 32929.3 ns (~1.14x speedup) on an Intel ICX server.
      In addition to collecting more samples, no loss of sampling accuracy was
      observed compared to before the optimization.
      Signed-off-by: Like Xu <likexu@tencent.com>
      Message-Id: <20210728120705.6855-1-likexu@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
    • KVM: X86: Optimize zapping rmap · a75b5404
      By Peter Xu
      Using rmap_get_first() and rmap_remove() to zap a huge rmap list can be
      slow.  The easier way is to traverse the rmap list once, collecting the A/D
      bits and freeing the slots along the way.
      
      Provide a pte_list_destroy() and do exactly that.
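
      A kernel-style sketch of such a single-pass destroy helper, reconstructed
      from the description (not the exact patch):

        static void pte_list_destroy_sketch(struct kvm_rmap_head *rmap_head)
        {
                struct pte_list_desc *desc, *next;
                int i;

                if (!rmap_head->val)
                        return;

                if (!(rmap_head->val & 1)) {
                        /* single entry: the value is the spte pointer itself */
                        mmu_spte_clear_track_bits((u64 *)rmap_head->val);
                } else {
                        desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
                        for (; desc; desc = next) {
                                /* clearing the sptes also collects the A/D bits */
                                for (i = 0; i < desc->spte_count; i++)
                                        mmu_spte_clear_track_bits(desc->sptes[i]);
                                next = desc->more;
                                mmu_free_pte_list_desc(desc);
                        }
                }
                rmap_head->val = 0;
        }
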
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210730220605.26377-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Optimize pte_list_desc with per-array counter · 13236e25
      By Peter Xu
      Add a counter field to pte_list_desc to simplify the add/remove/loop
      logic; e.g., we no longer need to loop over the whole array in most cases.
      
      This makes more sense now that the array size has been switched to a larger
      value; otherwise the counter would be a waste.
      
      Initially I wanted to store a tail pointer at the head of the array list so
      we would not need to traverse the list, at least when pushing new entries
      (without the counter we traverse both the list and the array).  However,
      that would need slightly more change for not much benefit; e.g., after we
      grow the number of entries per array, traversing the list is not so
      expensive.
      
      So let's keep it simple and still get as much benefit as we can from just
      these few extra lines of changes (not to mention the code reads more easily
      without looping over arrays).
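
      The shape of the change, roughly (field types and the array size here are
      illustrative; the exact values are those in the patch):

        #include <stdint.h>

        #define PTE_LIST_EXT 15    /* illustrative; see the PTE_LIST_EXT tuning patch below */

        struct pte_list_desc_sketch {
                struct pte_list_desc_sketch *more;   /* next descriptor in the list */
                uint32_t spte_count;                 /* new: number of valid sptes[] */
                uint64_t *sptes[PTE_LIST_EXT];
        };

        /* Appending an spte no longer scans the array for a free slot, and
         * "is this descriptor full?" is a simple compare against the counter. */
        static void pte_list_desc_push(struct pte_list_desc_sketch *desc,
                                       uint64_t *sptep)
        {
                /* caller guarantees desc->spte_count < PTE_LIST_EXT */
                desc->sptes[desc->spte_count++] = sptep;
        }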
      
      Using the same test case of forking 500 children and recycling them
      ("./rmap_fork 500" [1]), this patch further speeds up the total fork time by
      about 4%, for a total of 33% over the vanilla kernel:
      
              Vanilla:      473.90 (+-5.93%)
              3->15 slots:  366.10 (+-4.94%)
              Add counter:  351.00 (+-3.70%)
      
      [1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210730220602.26327-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger · dc1cff96
      By Peter Xu
      Currently each rmap array element contains only 3 entries.  However, for
      EPT=N there can be a lot of guest pages that have tens or even hundreds of
      rmap entries.
      
      The rmap count statistics of a typical 6G guest (even when idle) show this
      distribution:
      
      Rmap_Count:     0       1       2-3     4-7     8-15    16-31   32-63   64-127  128-255 256-511 512-1023
      Level=4K:       3089171 49005   14016   1363    235     212     15      7       0       0       0
      Level=2M:       5951    227     0       0       0       0       0       0       0       0       0
      Level=1G:       32      0       0       0       0       0       0       0       0       0       0
      
      With some more forking, some pages will grow even larger rmap counts.
      
      This patch makes PTE_LIST_EXT bigger so the general EPT=N use case is more
      efficient: we dereference the list less often, and the loops over
      PTE_LIST_EXT arrays become slightly more efficient.  The value is still kept
      small enough to limit waste when an array is not full.
      
      It should not affect EPT=Y, since EPT normally has only zero or one rmap
      entry per page, so no array is even allocated.
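
      Rough arithmetic for the effect on a heavily shared page (e.g. one that has
      accumulated ~128 rmap entries after heavy forking): the number of
      descriptors to walk is roughly the rmap count divided by PTE_LIST_EXT,
      rounded up.

          PTE_LIST_EXT = 3   ->  ceil(128 / 3)  = 43 descriptors
          PTE_LIST_EXT = 15  ->  ceil(128 / 15) =  9 descriptors

      The 3 -> 15 figure matches the "3->15 slots" line in the per-array counter
      entry above.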
      
      With a test case that forks 500 children and recycles them ("./rmap_fork
      500" [1]), this patch speeds up the fork time by about 29%.
      
          Before: 473.90 (+-5.93%)
          After:  366.10 (+-4.94%)
      
      [1] https://github.com/xzpeter/clibs/commit/825436f825453de2ea5aaee4bdb1c92281efe5b3
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20210730220455.26054-6-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 03 Aug 2021, 4 commits
  6. 02 Aug 2021, 22 commits