提交 · ee49a89329711f84601bcb65ac8e8ef54fdac771 · openeuler / Kernel

22 10月, 2021 21 次提交

KVM: x86: Move SVM's APICv sanity check to common x86 · ee49a893

由 Sean Christopherson 提交于 10月 21, 2021

Move SVM's assertion that vCPU's APICv state is consistent with its VM's
state out of svm_vcpu_run() and into x86's common inner run loop.  The
assertion and underlying logic is not unique to SVM, it's just that SVM
has more inhibiting conditions and thus is more likely to run headfirst
into any KVM bugs.

Add relevant comments to document exactly why the update path has unusual
ordering between the update the kick, why said ordering is safe, and also
the basic rules behind the assertion in the run loop.

Cc: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211022004927.1448382-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ee49a893

KVM: x86/mmu: Extract zapping of rmaps for gfn range to separate helper · 21fa3246

由 Sean Christopherson 提交于 10月 21, 2021

Extract the zapping of rmaps, a.k.a. legacy MMU, for a gfn range to a
separate helper to clean up the unholy mess that kvm_zap_gfn_range() has
become.  In addition to deep nesting, the rmaps zapping spreads out the
declaration of several variables and is generally a mess.  Clean up the
mess now so that future work to improve the memslots implementation
doesn't need to deal with it.

Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

21fa3246

KVM: x86/mmu: Drop a redundant remote TLB flush in kvm_zap_gfn_range() · e8be2a5b

由 Sean Christopherson 提交于 10月 21, 2021

Remove an unnecessary remote TLB flush in kvm_zap_gfn_range() now that
said function holds mmu_lock for write for its entire duration. The
flush was added by the now-reverted commit to allow TDP MMU to flush while
holding mmu_lock for read, as the transition from write=>read required
dropping the lock and thus a pending flush needed to be serviced.

Fixes: 5a324c24 ("Revert "KVM: x86/mmu: Allow zap gfn range to operate under the mmu read lock"")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e8be2a5b

KVM: x86/mmu: Drop a redundant, broken remote TLB flush · bc3b3c10

由 Sean Christopherson 提交于 10月 21, 2021

A recent commit to fix the calls to kvm_flush_remote_tlbs_with_address()
in kvm_zap_gfn_range() inadvertantly added yet another flush instead of
fixing the existing flush.  Drop the redundant flush, and fix the params
for the existing flush.

Cc: stable@vger.kernel.org
Fixes: 2822da44 ("KVM: x86/mmu: fix parameters to kvm_flush_remote_tlbs_with_address")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211022010005.1454978-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bc3b3c10

KVM: X86: Don't unload MMU in kvm_vcpu_flush_tlb_guest() · 61b05a9f

由 Lai Jiangshan 提交于 10月 19, 2021

kvm_mmu_unload() destroys all the PGD caches.  Use the lighter
kvm_mmu_sync_roots() and kvm_mmu_sync_prev_roots() instead.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-5-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

61b05a9f

KVM: X86: pair smp_wmb() of mmu_try_to_unsync_pages() with smp_rmb() · 264d3dc1

由 Lai Jiangshan 提交于 10月 19, 2021

The commit 578e1c4d ("kvm: x86: Avoid taking MMU lock
in kvm_mmu_sync_roots if no sync is needed") added smp_wmb() in
mmu_try_to_unsync_pages(), but the corresponding smp_load_acquire() isn't
used on the load of SPTE.W.  smp_load_acquire() orders _subsequent_
loads after sp->is_unsync; it does not order _earlier_ loads before
the load of sp->is_unsync.

This has no functional change; smp_rmb() is a NOP on x86, and no
compiler barrier is required because there is a VMEXIT between the
load of SPTE.W and kvm_mmu_snc_roots.

Cc: Junaid Shahid <junaids@google.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-4-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

264d3dc1

KVM: X86: Cache CR3 in prev_roots when PCID is disabled · 509bfe3d

由 Lai Jiangshan 提交于 10月 19, 2021

The commit 21823fbd ("KVM: x86: Invalidate all PGDs for the
current PCID on MOV CR3 w/ flush") invalidates all PGDs for the specific
PCID and in the case of PCID is disabled, it includes all PGDs in the
prev_roots and the commit made prev_roots totally unused in this case.

Not using prev_roots fixes a problem when CR4.PCIDE is changed 0 -> 1
before the said commit:

	(CR4.PCIDE=0, CR4.PGE=1; CR3=cr3_a; the page for the guest
	 RIP is global; cr3_b is cached in prev_roots)

	modify page tables under cr3_b
		the shadow root of cr3_b is unsync in kvm
	INVPCID single context
		the guest expects the TLB is clean for PCID=0
	change CR4.PCIDE 0 -> 1
	switch to cr3_b with PCID=0,NOFLUSH=1
		No sync in kvm, cr3_b is still unsync in kvm
	jump to the page that was modified in step 1
		shadow page tables point to the wrong page

It is a very unlikely case, but it shows that stale prev_roots can be
a problem after CR4.PCIDE changes from 0 to 1.  However, to fix this
case, the commit disabled caching CR3 in prev_roots altogether when PCID
is disabled.  Not all CPUs have PCID; especially the PCID support
for AMD CPUs is kind of recent.  To restore the prev_roots optimization
for CR4.PCIDE=0, flush the whole MMU (including all prev_roots) when
CR4.PCIDE changes.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

509bfe3d

KVM: X86: Fix tlb flush for tdp in kvm_invalidate_pcid() · e45e9e39

由 Lai Jiangshan 提交于 10月 19, 2021

The KVM doesn't know whether any TLB for a specific pcid is cached in
the CPU when tdp is enabled.  So it is better to flush all the guest
TLB when invalidating any single PCID context.

The case is very rare or even impossible since KVM generally doesn't
intercept CR3 write or INVPCID instructions when tdp is enabled, so the
fix is mostly for the sake of overall robustness.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211019110154.4091-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e45e9e39

KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE · a91a7c70

由 Lai Jiangshan 提交于 9月 19, 2021

X86_CR4_PGE doesn't participate in kvm_mmu_role, so the mmu context
doesn't need to be reset.  It is only required to flush all the guest
tlb.

It is also inconsistent that X86_CR4_PGE is in KVM_MMU_CR4_ROLE_BITS
while kvm_mmu_role doesn't use X86_CR4_PGE.  So X86_CR4_PGE is also
removed from KVM_MMU_CR4_ROLE_BITS.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a91a7c70

KVM: X86: Don't reset mmu context when X86_CR4_PCIDE 1->0 · 55261738

由 Lai Jiangshan 提交于 9月 19, 2021

X86_CR4_PCIDE doesn't participate in kvm_mmu_role, so the mmu context
doesn't need to be reset.  It is only required to flush all the guest
tlb.
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210919024246.89230-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

55261738

KVM: emulate: Comment on difference between RDPMC implementation and manual · 9ae7f6c9

由 Wanpeng Li 提交于 10月 20, 2021

SDM mentioned that, RDPMC:

  IF (((CR4.PCE = 1) or (CPL = 0) or (CR0.PE = 0)) and (ECX indicates a supported counter))
      THEN
          EAX := counter[31:0];
          EDX := ZeroExtend(counter[MSCB:32]);
      ELSE (* ECX is not valid or CR4.PCE is 0 and CPL is 1, 2, or 3 and CR0.PE is 1 *)
          #GP(0);
  FI;

Let's add a comment why CR0.PE isn't tested since it's impossible for CPL to be >0 if
CR0.PE=0.
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1634724836-73721-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9ae7f6c9

KVM: x86: Add vendor name to kvm_x86_ops, use it for error messages · 9dadfc4a

由 Sean Christopherson 提交于 10月 18, 2021

Paul pointed out the error messages when KVM fails to load are unhelpful
in understanding exactly what went wrong if userspace probes the "wrong"
module.

Add a mandatory kvm_x86_ops field to track vendor module names, kvm_intel
and kvm_amd, and use the name for relevant error message when KVM fails
to load so that the user knows which module failed to load.

Opportunistically tweak the "disabled by bios" error message to clarify
that _support_ was disabled, not that the module itself was magically
disabled by BIOS.
Suggested-by: NPaul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20211018183929.897461-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9dadfc4a

kvm: x86: mmu: Make NX huge page recovery period configurable · 4dfe4f40

由 Junaid Shahid 提交于 10月 19, 2021

Currently, the NX huge page recovery thread wakes up every minute and
zaps 1/nx_huge_pages_recovery_ratio of the total number of split NX
huge pages at a time. This is intended to ensure that only a
relatively small number of pages get zapped at a time. But for very
large VMs (or more specifically, VMs with a large number of
executable pages), a period of 1 minute could still result in this
number being too high (unless the ratio is changed significantly,
but that can result in split pages lingering on for too long).

This change makes the period configurable instead of fixing it at
1 minute. Users of large VMs can then adjust the period and/or the
ratio to reduce the number of pages zapped at one time while still
maintaining the same overall duration for cycling through the
entire list. By default, KVM derives a period from the ratio such
that a page will remain on the list for 1 hour on average.
Signed-off-by: NJunaid Shahid <junaids@google.com>
Message-Id: <20211020010627.305925-1-junaids@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4dfe4f40

KVM: vPMU: Fill get_msr MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 · 540c7abe

由 Wanpeng Li 提交于 10月 19, 2021

SDM section 18.2.3 mentioned that:

  "IA32_PERF_GLOBAL_OVF_CTL MSR allows software to clear overflow indicator(s) of
   any general-purpose or fixed-function counters via a single WRMSR."

It is R/W mentioned by SDM, we read this msr on bare-metal during perf testing,
the value is always 0 for ICX/SKX boxes on hands. Let's fill get_msr
MSR_CORE_PERF_GLOBAL_OVF_CTRL w/ 0 as hardware behavior and drop
global_ovf_ctrl variable.
Tested-by: NLike Xu <likexu@tencent.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1634631160-67276-2-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

540c7abe

KVM: x86/mmu: Rename slot_handle_leaf to slot_handle_level_4k · 610265ea

由 David Matlack 提交于 10月 19, 2021

slot_handle_leaf is a misnomer because it only operates on 4K SPTEs
whereas "leaf" is used to describe any valid terminal SPTE (4K or
large page). Rename slot_handle_leaf to slot_handle_level_4k to
avoid confusion.

Making this change makes it more obvious there is a benign discrepency
between the legacy MMU and the TDP MMU when it comes to dirty logging.
The legacy MMU only iterates through 4K SPTEs when zapping for
collapsing and when clearing D-bits. The TDP MMU, on the other hand,
iterates through SPTEs on all levels.

The TDP MMU behavior of zapping SPTEs at all levels is technically
overkill for its current dirty logging implementation, which always
demotes to 4k SPTES, but both the TDP MMU and legacy MMU zap if and only
if the SPTE can be replaced by a larger page, i.e. will not spuriously
zap 2m (or larger) SPTEs. Opportunistically add comments to explain this
discrepency in the code.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20211019162223.3935109-1-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

610265ea

KVM: VMX: RTIT_CTL_BRANCH_EN has no dependency on other CPUID bit · e099f3eb

由 Xiaoyao Li 提交于 8月 27, 2021

Per Intel SDM, RTIT_CTL_BRANCH_EN bit has no dependency on any CPUID
leaf 0x14.
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-5-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e099f3eb

KVM: VMX: Rename pt_desc.addr_range to pt_desc.num_address_ranges · f4d3a902

由 Xiaoyao Li 提交于 8月 27, 2021

To better self explain the meaning of this field and match the
PT_CAP_num_address_ranges constatn.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-4-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f4d3a902

KVM: VMX: Use precomputed vmx->pt_desc.addr_range · ba51d627

由 Xiaoyao Li 提交于 8月 27, 2021

The number of valid PT ADDR MSRs for the guest is precomputed in
vmx->pt_desc.addr_range. Use it instead of calculating again.
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-3-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba51d627

KVM: VMX: Restore host's MSR_IA32_RTIT_CTL when it's not zero · 2e6e0d68

由 Xiaoyao Li 提交于 8月 27, 2021

A minor optimization to WRMSR MSR_IA32_RTIT_CTL when necessary.

Opportunistically refine the comment to call out that KVM requires
VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest.
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20210827070249.924633-2-xiaoyao.li@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2e6e0d68

KVM: x86/mmu: clean up prefetch/prefault/speculative naming · 2839180c

由 Paolo Bonzini 提交于 9月 29, 2021

"prefetch", "prefault" and "speculative" are used throughout KVM to mean
the same thing.  Use a single name, standardizing on "prefetch" which
is already used by various functions such as direct_pte_prefetch,
FNAME(prefetch_gpte), FNAME(pte_prefetch), etc.
Suggested-by: NDavid Matlack <dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2839180c

KVM: cleanup allocation of rmaps and page tracking data · 1e76a3ce

由 David Stevens 提交于 10月 15, 2021

Unify the flags for rmaps and page tracking data, using a
single flag in struct kvm_arch and a single loop to go
over all the address spaces and memslots.  This avoids
code duplication between alloc_all_memslots_rmaps and
kvm_page_track_enable_mmu_write_tracking.
Signed-off-by: NDavid Stevens <stevensd@chromium.org>
[This patch is the delta between David's v2 and v3, with conflicts
 fixed and my own commit message. - Paolo]
Co-developed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1e76a3ce

19 10月, 2021 6 次提交

KVM: x86: Expose TSC offset controls to userspace · 828ca896

由 Oliver Upton 提交于 9月 16, 2021

To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.

Cc: David Matlack <dmatlack@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-8-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

828ca896

KVM: x86: Refactor tsc synchronization code · 58d4277b

由 Oliver Upton 提交于 9月 16, 2021

Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.
Signed-off-by: NOliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-7-oupton@google.com>
[Make sure kvm->arch.cur_tsc_generation and vcpu->arch.this_tsc_generation are
 equal at the end of __kvm_synchronize_tsc, if matched is false. Reported by
 Maxim Levitsky. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

58d4277b

kvm: x86: protect masterclock with a seqcount · 869b4421

由 Paolo Bonzini 提交于 9月 16, 2021

Protect the reference point for kvmclock with a seqcount, so that
kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
updates will also run in parallel and not bounce the kvmclock cacheline.

Of the variables that were protected by pvclock_gtod_sync_lock,
nr_vcpus_matched_tsc is different because it is updated outside
pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
need to keep it protected by a spinlock.  In fact it must now
be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
write-side of a seqcount, is non-preemptible.  Since we already
have tsc_write_lock which is a raw spinlock, we can just use
tsc_write_lock as the lock that protects the write-side of the
seqcount.
Co-developed-by: NOliver Upton <oupton@google.com>
Message-Id: <20210916181538.968978-6-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

869b4421

KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5

由 Oliver Upton 提交于 9月 16, 2021

Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210916181538.968978-5-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c68dc1b5

KVM: x86: avoid warning with -Wbitwise-instead-of-logical · 3d5e7a28

由 Paolo Bonzini 提交于 10月 15, 2021

This is a new warning in clang top-of-tree (will be clang 14):

In file included from arch/x86/kvm/mmu/mmu.c:27:
arch/x86/kvm/mmu/spte.h:318:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
        return __is_bad_mt_xwr(rsvd_check, spte) |
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                 ||
arch/x86/kvm/mmu/spte.h:318:9: note: cast one or both operands to int to silence this warning

The code is fine, but change it anyway to shut up this clever clogs
of a compiler.

Reported-by: torvic9@mailbox.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3d5e7a28

KVM: X86: fix lazy allocation of rmaps · fa13843d

由 Paolo Bonzini 提交于 10月 15, 2021

If allocation of rmaps fails, but some of the pointers have already been written,
those pointers can be cleaned up when the memslot is freed, or even reused later
for another attempt at allocating the rmaps. Therefore there is no need to
WARN, as done for example in memslot_rmap_alloc, but the allocation *must* be
skipped lest KVM will overwrite the previous pointer and will indeed leak memory.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fa13843d

18 10月, 2021 1 次提交

KVM: x86/mmu: kvm_faultin_pfn has to return false if pfh is returned · a7cc099f

由 Andrei Vagin 提交于 10月 15, 2021

This looks like a typo in 8f32d5e5. This change didn't intend to do
any functional changes.

The problem was caught by gVisor tests.

Fixes: 8f32d5e5 ("KVM: x86/mmu: allow kvm_faultin_pfn to return page fault handling code")
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Message-Id: <20211015163221.472508-1-avagin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a7cc099f

15 10月, 2021 1 次提交

KVM: SEV-ES: fix length of string I/O · 019057bd

由 Paolo Bonzini 提交于 10月 12, 2021

The size of the data in the scratch buffer is not divided by the size of
each port I/O operation, so vcpu->arch.pio.count ends up being larger
than it should be by a factor of size.

Cc: stable@vger.kernel.org
Fixes: 7ed9abfe ("KVM: SVM: Support string IO operations for an SEV-ES guest")
Acked-by: NTom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

019057bd

01 10月, 2021 11 次提交

KVM: x86: only allocate gfn_track when necessary · deae4a10

由 David Stevens 提交于 9月 22, 2021

Avoid allocating the gfn_track arrays if nothing needs them. If there
are no external to KVM users of the API (i.e. no GVT-g), then page
tracking is only needed for shadow page tables. This means that when tdp
is enabled and there are no external users, then the gfn_track arrays
can be lazily allocated when the shadow MMU is actually used. This avoid
allocations equal to .05% of guest memory when nested virtualization is
not used, if the kernel is compiled without GVT-g.
Signed-off-by: NDavid Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-3-stevensd@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

deae4a10

KVM: x86: add config for non-kvm users of page tracking · e9d0c0c4

由 David Stevens 提交于 9月 22, 2021

Add a config option that allows kvm to determine whether or not there
are any external users of page tracking.
Signed-off-by: NDavid Stevens <stevensd@chromium.org>
Message-Id: <20210922045859.2011227-2-stevensd@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e9d0c0c4

nSVM: Check for reserved encodings of TLB_CONTROL in nested VMCB · 174a921b

由 Krish Sadhukhan 提交于 9月 20, 2021

According to section "TLB Flush" in APM vol 2,

"Support for TLB_CONTROL commands other than the first two, is
optional and is indicated by CPUID Fn8000_000A_EDX[FlushByAsid].

All encodings of TLB_CONTROL not defined in the APM are reserved."
Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20210920235134.101970-3-krish.sadhukhan@oracle.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

174a921b

kvm: use kvfree() in kvm_arch_free_vm() · 78b497f2

由 Juergen Gross 提交于 9月 03, 2021

By switching from kfree() to kvfree() in kvm_arch_free_vm() Arm64 can
use the common variant. This can be accomplished by adding another
macro __KVM_HAVE_ARCH_VM_FREE, which will be used only by x86 for now.

Further simplification can be achieved by adding __kvm_arch_free_vm()
doing the common part.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJuergen Gross <jgross@suse.com>
Message-Id: <20210903130808.30142-5-jgross@suse.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

78b497f2

KVM: x86: Expose Predictive Store Forwarding Disable · b73a5432

由 Babu Moger 提交于 9月 23, 2021

Predictive Store Forwarding: AMD Zen3 processors feature a new
technology called Predictive Store Forwarding (PSF).

PSF is a hardware-based micro-architectural optimization designed
to improve the performance of code execution by predicting address
dependencies between loads and stores.

How PSF works:

It is very common for a CPU to execute a load instruction to an address
that was recently written by a store. Modern CPUs implement a technique
known as Store-To-Load-Forwarding (STLF) to improve performance in such
cases. With STLF, data from the store is forwarded directly to the load
without having to wait for it to be written to memory. In a typical CPU,
STLF occurs after the address of both the load and store are calculated
and determined to match.

PSF expands on this by speculating on the relationship between loads and
stores without waiting for the address calculation to complete. With PSF,
the CPU learns over time the relationship between loads and stores. If
STLF typically occurs between a particular store and load, the CPU will
remember this.

In typical code, PSF provides a performance benefit by speculating on
the load result and allowing later instructions to begin execution
sooner than they otherwise would be able to.

The details of security analysis of AMD predictive store forwarding is
documented here.
https://www.amd.com/system/files/documents/security-analysis-predictive-store-forwarding.pdf

Predictive Store Forwarding controls:
There are two hardware control bits which influence the PSF feature:
- MSR 48h bit 2 – Speculative Store Bypass (SSBD)
- MSR 48h bit 7 – Predictive Store Forwarding Disable (PSFD)

The PSF feature is disabled if either of these bits are set. These bits
are controllable on a per-thread basis in an SMT system. By default, both
SSBD and PSFD are 0 meaning that the speculation features are enabled.

While the SSBD bit disables PSF and speculative store bypass, PSFD only
disables PSF.

PSFD may be desirable for software which is concerned with the
speculative behavior of PSF but desires a smaller performance impact than
setting SSBD.

Support for PSFD is indicated in CPUID Fn8000_0008 EBX[28].
All processors that support PSF will also support PSFD.

Linux kernel does not have the interface to enable/disable PSFD yet. Plan
here is to expose the PSFD technology to KVM so that the guest kernel can
make use of it if they wish to.
Signed-off-by: NBabu Moger <Babu.Moger@amd.com>
Message-Id: <163244601049.30292.5855870305350227855.stgit@bmoger-ubuntu>
[Keep feature private to KVM, as requested by Borislav Petkov. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b73a5432

KVM: x86/mmu: Avoid memslot lookup in make_spte and mmu_try_to_unsync_pages · 53597858

由 David Matlack 提交于 8月 17, 2021

mmu_try_to_unsync_pages checks if page tracking is active for the given
gfn, which requires knowing the memslot. We can pass down the memslot
via make_spte to avoid this lookup.

The memslot is also handy for make_spte's marking of the gfn as dirty:
we can test whether dirty page tracking is enabled, and if so ensure that
pages are mapped as writable with 4K granularity.  Apart from the warning,
no functional change is intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-7-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53597858

KVM: x86/mmu: Avoid memslot lookup in rmap_add · 8a9f566a

由 David Matlack 提交于 8月 13, 2021

Avoid the memslot lookup in rmap_add, by passing it down from the fault
handling code to mmu_set_spte and then to rmap_add.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20210813203504.2742757-6-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8a9f566a

KVM: MMU: pass struct kvm_page_fault to mmu_set_spte · a12f4381

由 Paolo Bonzini 提交于 8月 17, 2021

mmu_set_spte is called for either PTE prefetching or page faults.  The
three boolean arguments write_fault, speculative and host_writable are
always respectively false/true/true for prefetching and coming from
a struct kvm_page_fault for page faults.

Let mmu_set_spte distinguish these two situation by accepting a
possibly NULL struct kvm_page_fault argument.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a12f4381

KVM: MMU: pass kvm_mmu_page struct to make_spte · 7158bee4

由 Paolo Bonzini 提交于 8月 17, 2021

The level and A/D bit support of the new SPTE can be found in the role,
which is stored in the kvm_mmu_page struct.  This merges two arguments
into one.

For the TDP MMU, the kvm_mmu_page was not used (kvm_tdp_mmu_map does
not use it if the SPTE is already present) so we fetch it just before
calling make_spte.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7158bee4

KVM: MMU: set ad_disabled in TDP MMU role · 87e888ea

由 Paolo Bonzini 提交于 8月 17, 2021

Prepare for removing the ad_disabled argument of make_spte; instead it can
be found in the role of a struct kvm_mmu_page.  First of all, the TDP MMU
must set the role accurately.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

87e888ea

KVM: MMU: remove unnecessary argument to mmu_set_spte · eb5cd7ff

由 Paolo Bonzini 提交于 8月 17, 2021

The level of the new SPTE can be found in the kvm_mmu_page struct; there
is no need to pass it down.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eb5cd7ff

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功