Commits · e64419d991ea212af087d3c57fcabb4d27db03fc · openeuler / Kernel

21 Apr, 2020 10 commits

KVM: x86: Move "flush guest's TLB" logic to separate kvm_x86_ops hook · e64419d9

Committed by Sean Christopherson on Mar 20, 2020

Add a dedicated hook to handle flushing TLB entries on behalf of the
guest, i.e. for a paravirtualized TLB flush, and use it directly instead
of bouncing through kvm_vcpu_flush_tlb().

For VMX, change the effective implementation to never do
INVEPT and flush only the current context, i.e. to always flush via
INVVPID(SINGLE_CONTEXT).  The INVEPT performed by __vmx_flush_tlb() when
@invalidate_gpa=false and enable_vpid=0 is unnecessary, as it will only
flush guest-physical mappings; linear and combined mappings are flushed
by VM-Enter when VPID is disabled, and changes in the guest page tables
do not affect guest-physical mappings.

When EPT and VPID are enabled, doing INVVPID is not required (by Intel's
architecture) to invalidate guest-physical mappings, i.e. TLB entries
that cache guest-physical mappings can live across INVVPID as the
mappings are associated with an EPTP, not a VPID.  The intent of
@invalidate_gpa is to inform vmx_flush_tlb() that it must "invalidate
gpa mappings", i.e. do INVEPT and not simply INVVPID.  Other than nested
VPID handling, which now calls vpid_sync_context() directly, the only
scenario where KVM can safely do INVVPID instead of INVEPT (when EPT is
enabled) is if KVM is flushing TLB entries from the guest's perspective,
i.e. is only required to invalidate linear mappings.

For SVM, flushing TLB entries from the guest's perspective can be done
by flushing the current ASID, as changes to the guest's page tables are
associated only with the current ASID.

Adding a dedicated ->tlb_flush_guest() paves the way toward removing
@invalidate_gpa, which is a potentially dangerous control flag as its
meaning is not exactly crystal clear, even for those who are familiar
with the subtleties of what mappings Intel CPUs are/aren't allowed to
keep across various invalidation scenarios.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-15-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e64419d9

KVM: nVMX: Use vpid_sync_vcpu_addr() to emulate INVVPID with address · bc41d0c4

Committed by Sean Christopherson on Mar 20, 2020

Use vpid_sync_vcpu_addr() to emulate the "individual address" variant of
INVVPID now that said function handles the fallback case of the (host)
CPU not supporting "individual address".

Note, the "vpid == 0" checks in the vpid_sync_*() helpers aren't
actually redundant with the "!operand.vpid" check in handle_invvpid(),
as the vpid passed to vpid_sync_vcpu_addr() is a KVM (host) controlled
value, i.e. vpid02 can be zero even if operand.vpid is non-zero.

No functional change intended.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-14-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bc41d0c4

KVM: VMX: Drop redundant capability checks in low level INVVPID helpers · ca431c0c

Committed by Sean Christopherson on Mar 20, 2020

Remove the INVVPID capabilities checks from vpid_sync_vcpu_single() and
vpid_sync_vcpu_global() now that all callers ensure the INVVPID variant
is supported.  Note, in some cases the guarantee is provided in concert
with hardware_setup(), which enables VPID if and only if at least one of
invvpid_single() or invvpid_global() is supported.

Drop the WARN_ON_ONCE() from vmx_flush_tlb() as vpid_sync_vcpu_single()
will trigger a WARN() on INVVPID failure, i.e. if SINGLE_CONTEXT isn't
supported.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-13-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ca431c0c

KVM: VMX: Handle INVVPID fallback logic in vpid_sync_vcpu_addr() · ab4b3597

Committed by Sean Christopherson on Mar 20, 2020

Directly invoke vpid_sync_context() to do a global INVVPID when the
individual address variant is not supported instead of deferring such
behavior to the caller.  This allows for additional consolidation of
code as the logic is basically identical to the emulation of the
individual address variant in handle_invvpid().

No functional change intended.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-12-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ab4b3597

KVM: VMX: Move vpid_sync_vcpu_addr() down a few lines · 8a8b097c

Committed by Sean Christopherson on Mar 20, 2020

Move vpid_sync_vcpu_addr() below vpid_sync_context() so that it can be
refactored in a future patch to call vpid_sync_context() directly when
the "individual address" INVVPID variant isn't supported.

No functional change intended.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8a8b097c

KVM: VMX: Use vpid_sync_context() directly when possible · 446ace4b

Committed by Sean Christopherson on Mar 20, 2020

Use vpid_sync_context() directly for flows that run if and only if
enable_vpid=1, or more specifically, nested VMX flows that are gated by
vmx->nested.msrs.secondary_ctls_high.SECONDARY_EXEC_ENABLE_VPID being
set, which is allowed if and only if enable_vpid=1.  Because these flows
call __vmx_flush_tlb() with @invalidate_gpa=false, the if-statement that
decides between INVEPT and INVVPID will always go down the INVVPID path,
i.e. call vpid_sync_context() because
"enable_ept && (invalidate_gpa || !enable_vpid)" always evaluates false.

This helps pave the way toward removing @invalidate_gpa and @vpid from
__vmx_flush_tlb() and its callers.
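
The claim that the quoted predicate is always false for these flows can be checked with a small truth-table model. This is a minimal sketch, not kernel code: toy_takes_invept_path() is a hypothetical name, and it only mirrors the boolean expression quoted in the changelog.

```c
#include <stdbool.h>

/*
 * Illustrative model (not kernel code) of the predicate quoted in the
 * changelog.  Returns true when the INVEPT path would be taken and
 * false when the INVVPID path, i.e. vpid_sync_context(), would be taken.
 */
static bool toy_takes_invept_path(bool enable_ept, bool invalidate_gpa,
                                  bool enable_vpid)
{
    return enable_ept && (invalidate_gpa || !enable_vpid);
}
```

With invalidate_gpa=false and enable_vpid=true, the function returns false whether or not EPT is enabled, which is why the converted flows can call vpid_sync_context() directly.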

Opportunistically drop unnecessary brackets in handle_invvpid() around an
affected __vmx_flush_tlb()->vpid_sync_context() conversion.

No functional change intended.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

446ace4b

KVM: VMX: Skip global INVVPID fallback if vpid==0 in vpid_sync_context() · c746b3a4

Committed by Sean Christopherson on Mar 20, 2020

Skip the global INVVPID in the unlikely scenario that vpid==0 and the
SINGLE_CONTEXT variant of INVVPID is unsupported.  If vpid==0, there's
no need to INVVPID as it's impossible to do VM-Enter with VPID enabled
and vmcs.VPID==0, i.e. there can't be any TLB entries for the vCPU with
vpid==0.  The fact that the SINGLE_CONTEXT variant isn't supported is
irrelevant.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c746b3a4
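
The decision order described in c746b3a4 can be sketched as a small user-space model. Everything here is an assumption for illustration: the toy_ names are hypothetical, and the function only reports which INVVPID variant would be chosen instead of executing one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Which INVVPID variant the model would issue (hypothetical enum). */
enum toy_invvpid {
    TOY_INVVPID_NONE,    /* nothing to flush */
    TOY_INVVPID_SINGLE,  /* SINGLE_CONTEXT variant */
    TOY_INVVPID_GLOBAL   /* global fallback */
};

/*
 * Model of the ordering in the changelog: check vpid == 0 first, so the
 * global fallback is skipped even when SINGLE_CONTEXT is unsupported.
 */
static enum toy_invvpid toy_vpid_sync_context(uint16_t vpid,
                                              bool single_supported)
{
    if (vpid == 0)
        return TOY_INVVPID_NONE;
    if (single_supported)
        return TOY_INVVPID_SINGLE;
    return TOY_INVVPID_GLOBAL;
}
```

The point of the ordering is that the vpid==0 check comes before the capability check, so whether SINGLE_CONTEXT is supported never matters for vpid 0.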

KVM: x86: Sync SPTEs when injecting page/EPT fault into L1 · ee1fa209

Committed by Junaid Shahid on Mar 20, 2020

When injecting a page fault or EPT violation/misconfiguration, KVM is
not syncing any shadow PTEs associated with the faulting address,
including those in previous MMUs that are associated with L1's current
EPTP (in a nested EPT scenario), nor is it flushing any hardware TLB
entries.  Fix this by calling kvm_mmu_invalidate_gva(), which does both.

Page faults that are either !PRESENT or RSVD are exempt from the flushing,
as the CPU is not allowed to cache such translations.
Signed-off-by: Junaid Shahid <junaids@google.com>
Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ee1fa209

KVM: x86: cleanup kvm_inject_emulated_page_fault · 0cd665bd

Committed by Paolo Bonzini on Mar 25, 2020

To reconstruct the kvm_mmu to be used for page fault injection, we
can simply use fault->nested_page_fault.  This matches how
fault->nested_page_fault is assigned in the first place by
FNAME(walk_addr_generic).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0cd665bd

KVM: x86: introduce kvm_mmu_invalidate_gva · 5efac074

Committed by Paolo Bonzini on Mar 23, 2020

Wrap the combination of mmu->invlpg and kvm_x86_ops->tlb_flush_gva
into a new function. This function also lets us specify the host PGD to
invalidate and also the MMU, both of which will be useful in fixing and
simplifying kvm_inject_emulated_page_fault.

A nested guest's MMU however has g_context->invlpg == NULL. Instead of
setting it to nonpaging_invlpg, make kvm_mmu_invalidate_gva the only
entry point to mmu->invlpg and make a NULL invlpg pointer equivalent
to nonpaging_invlpg, saving a retpoline.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5efac074
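
The "NULL pointer means no-op" pattern from 5efac074 can be sketched as follows. This is a toy model under stated assumptions: the toy_ types and functions are hypothetical, the hardware TLB flush is represented by a counter, and treating a NULL invlpg as "do nothing" assumes nonpaging_invlpg itself has no work to do.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_mmu {
    /* NULL stands in for the non-paging case: nothing to sync. */
    void (*invlpg)(struct toy_mmu *mmu, uint64_t gva);
    unsigned int invlpg_calls;  /* times the software hook ran */
    unsigned int hw_flushes;    /* stand-in for the hardware flush hook */
};

static void toy_paging_invlpg(struct toy_mmu *mmu, uint64_t gva)
{
    (void)gva;
    mmu->invlpg_calls++;
}

/*
 * Single entry point: callers never call mmu->invlpg themselves, so a
 * NULL pointer costs one predictable branch, not an indirect call.
 */
static void toy_mmu_invalidate_gva(struct toy_mmu *mmu, uint64_t gva)
{
    if (mmu->invlpg)
        mmu->invlpg(mmu, gva);
    mmu->hw_flushes++;
}
```

Because every caller goes through the wrapper, the NULL check lives in one place and no dummy callback has to be installed for the nested guest's MMU.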

16 Apr, 2020 13 commits

KVM: x86: Export kvm_propagate_fault() (as kvm_inject_emulated_page_fault) · 53b3d8e9

Committed by Sean Christopherson on Mar 20, 2020

Export the page fault propagation helper so that VMX can use it to
correctly emulate TLB invalidation on page faults in an upcoming patch.

In the (hopefully) not-too-distant future, SGX virtualization will also
want access to the helper for injecting page faults to the correct level
(L1 vs. L2) when emulating ENCLS instructions.

Rename the function to kvm_inject_emulated_page_fault() to clarify that
it is (a) injecting a fault and (b) only for page faults.  WARN if it's
invoked with an exception other than PF_VECTOR.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-6-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

53b3d8e9

KVM: nVMX: Invalidate all roots when emulating INVVPID without EPT · d6e3f838

Committed by Junaid Shahid on Mar 20, 2020

Free all roots when emulating INVVPID for L1 and EPT is disabled, as
outstanding changes to the page tables managed by L1 need to be
recognized.  Because L1 and L2 share an MMU when EPT is disabled, and
because VPID is not tracked by the MMU role, all roots in the current
MMU (root_mmu) need to be freed, otherwise a future nested VM-Enter or
VM-Exit could do a fast CR3 switch (without a flush/sync) and consume
stale SPTEs.

Fixes: 5c614b35 ("KVM: nVMX: nested VPID emulation")
Signed-off-by: Junaid Shahid <junaids@google.com>
[sean: ported to upstream KVM, reworded the comment and changelog]
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-5-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d6e3f838

KVM: nVMX: Invalidate all EPTP contexts when emulating INVEPT for L1 · f8aa7e39

Committed by Sean Christopherson on Mar 20, 2020

Free all L2 (guest_mmu) roots when emulating INVEPT for L1.  Outstanding
changes to the EPT tables managed by L1 need to be recognized, and
relying on KVM to always flush L2's EPTP context on nested VM-Enter is
dangerous.

Similar to handle_invpcid(), rely on kvm_mmu_free_roots() to do a remote
TLB flush if necessary, e.g. if L1 has never entered L2 then there is
nothing to be done.

Nuking all L2 roots is overkill for the single-context variant, but it's
the safe and easy bet.  A more precise zap mechanism will be added in
the future.  Add a TODO to call out that KVM only needs to invalidate
affected contexts.

Fixes: 14c07ad8 ("x86/kvm/mmu: introduce guest_mmu")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-4-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f8aa7e39

KVM: nVMX: Validate the EPTP when emulating INVEPT(EXTENT_CONTEXT) · eed0030e

Committed by Sean Christopherson on Mar 20, 2020

Signal VM-Fail for the single-context variant of INVEPT if the specified
EPTP is invalid.  Per the INVEPT pseudocode in Intel's SDM, it's subject
to the standard EPT checks:

  If VM entry with the "enable EPT" VM execution control set to 1 would
  fail due to the EPTP value then VMfail(Invalid operand to INVEPT/INVVPID);

Fixes: bfd0a56b ("nEPT: Nested INVEPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

eed0030e

KVM: VMX: Flush all EPTP/VPID contexts on remote TLB flush · e8eff282

Committed by Sean Christopherson on Mar 20, 2020

Flush all EPTP/VPID contexts if a TLB flush _may_ have been triggered by
a remote or deferred TLB flush, i.e. by KVM_REQ_TLB_FLUSH.  Remote TLB
flushes require all contexts to be invalidated, not just the active
contexts, e.g. all mappings in all contexts for a given HVA need to be
invalidated on a mmu_notifier invalidation.  Similarly, the instigator
of the deferred TLB flush may be expecting all contexts to be flushed,
e.g. vmx_vcpu_load_vmcs().

Without nested VMX, flushing only the current EPTP/VPID context isn't
problematic because KVM uses a constant VPID for each vCPU, and
mmu_alloc_direct_roots() all but guarantees KVM will use a single EPTP
for L1.  In the rare case where a different EPTP is created or reused,
KVM (currently) unconditionally flushes the new EPTP context prior to
entering the guest.

With nested VMX, KVM conditionally uses a different VPID for L2, and
unconditionally uses a different EPTP for L2.  Because KVM doesn't
_intentionally_ guarantee L2's EPTP/VPID context is flushed on nested
VM-Enter, it'd be possible for a malicious L1 to attack the host and/or
different VMs by exploiting the lack of flushing for L2.

  1) Launch nested guest from malicious L1.

  2) Nested VM-Enter to L2.

  3) Access target GPA 'g'.  CPU inserts TLB entry tagged with L2's ASID
     mapping 'g' to host PFN 'x'.

  4) Nested VM-Exit to L1.

  5) L1 triggers kernel same-page merging (ksm) by duplicating/zeroing
     the page for PFN 'x'.

  6) Host kernel merges PFN 'x' with PFN 'y', i.e. unmaps PFN 'x' and
     remaps the page to PFN 'y'.  mmu_notifier sends invalidate command,
     KVM flushes TLB only for L1's ASID.

  7) Host kernel reallocates PFN 'x' to some other task/guest.

  8) Nested VM-Enter to L2.  KVM does not invalidate L2's EPTP or VPID.

  9) L2 accesses GPA 'g' and gains read/write access to PFN 'x' via its
     stale TLB entry.

However, current KVM unconditionally flushes L1's EPTP/VPID context on
nested VM-Exit.  But, that behavior is mostly unintentional, KVM doesn't
go out of its way to flush EPTP/VPID on nested VM-Enter/VM-Exit, rather
a TLB flush is guaranteed to occur prior to re-entering L1 due to
__kvm_mmu_new_cr3() always being called with skip_tlb_flush=false.  On
nested VM-Enter, this happens via kvm_init_shadow_ept_mmu() (nested EPT
enabled) or in nested_vmx_load_cr3() (nested EPT disabled).  On nested
VM-Exit it occurs via nested_vmx_load_cr3().

This also fixes a bug where a deferred TLB flush in the context of L2,
with EPT disabled, would flush L1's VPID instead of L2's VPID, as
vmx_flush_tlb() flushes L1's VPID regardless of is_guest_mode().

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Ben Gardon <bgardon@google.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Junaid Shahid <junaids@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: John Haxby <john.haxby@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Fixes: efebf0aa ("KVM: nVMX: Do not flush TLB on L1<->L2 transitions if L1 uses VPID and EPT")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200320212833.3507-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e8eff282

KVM: pass through CPUID(0x80000006) · 43d05de2

Committed by Eric Northup on Apr 14, 2020

Return the host's L2 cache and TLB information for CPUID.0x80000006
instead of zeroing out the entry as part of KVM_GET_SUPPORTED_CPUID.
This allows a userspace VMM to feed KVM_GET_SUPPORTED_CPUID's output
directly into KVM_SET_CPUID2 (without breaking the guest).
Signed-off-by: Eric Northup (Google) <digitaleric@gmail.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Message-Id: <20200415012320.236065-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

43d05de2

KVM: x86: Return updated timer current count register from KVM_GET_LAPIC · 24647e0a

Committed by Peter Shier on Oct 10, 2018

kvm_vcpu_ioctl_get_lapic (implements KVM_GET_LAPIC ioctl) does a bulk copy
of the LAPIC registers but must take into account that the one-shot and
periodic timer current count register is computed upon reads and is not
present in register state. When restoring LAPIC state (e.g. after
migration), restart timers from their current count values at time of
save.

Note: When a one-shot timer expires, the code in arch/x86/kvm/lapic.c does
not zero the value of the LAPIC initial count register (emulating HW
behavior). If no other timer is run and pending prior to a subsequent
KVM_GET_LAPIC call, the returned register set will include the expired
one-shot initial count. On a subsequent KVM_SET_LAPIC call the code will
see a non-zero initial count and start a new one-shot timer using the
expired timer's count. This is a prior existing bug and will be addressed
in a separate patch. Thanks to jmattson@google.com for this find.
Signed-off-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <20181010225653.238911-1-pshier@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

24647e0a

KVM: SVM: Fix __svm_vcpu_run declaration. · 56a87e5d

Committed by Uros Bizjak on Apr 09, 2020

The function returns no value.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 199cd1d7 ("KVM: SVM: Split svm_vcpu_run inline assembly to separate file")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200409114926.1407442-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

56a87e5d

KVM: SVM: Do not setup frame pointer in __svm_vcpu_run · b61f62d4

Committed by Uros Bizjak on Apr 09, 2020

__svm_vcpu_run is a leaf function and does not need
a frame pointer.  %rbp is also destroyed a few instructions
later when guest registers are loaded.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200409120440.1427215-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b61f62d4

KVM: SVM: Fix build error due to missing release_pages() include · b2bce0a5

Committed by Borislav Petkov on Apr 11, 2020

Fix:

  arch/x86/kvm/svm/sev.c: In function ‘sev_pin_memory’:
  arch/x86/kvm/svm/sev.c:360:3: error: implicit declaration of function ‘release_pages’;\
	  did you mean ‘reclaim_pages’? [-Werror=implicit-function-declaration]
    360 |   release_pages(pages, npinned);
        |   ^~~~~~~~~~~~~
        |   reclaim_pages

because svm.c includes pagemap.h but the carved out sev.c needs it too.
Triggered by a randconfig build.

Fixes: eaf78265 ("KVM: SVM: Move SEV code to separate file")
Signed-off-by: Borislav Petkov <bp@suse.de>
Message-Id: <20200411160927.27954-1-bp@alien8.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b2bce0a5

KVM: SVM: Do not mark svm_vcpu_run with STACK_FRAME_NON_STANDARD · b4fd6308

Committed by Uros Bizjak on Apr 14, 2020

svm_vcpu_run does not change stack or frame pointer anymore.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200414113612.104501-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b4fd6308

kvm: nVMX: match comment with return type for nested_vmx_exit_reflected · 69c09755

Committed by Oliver Upton on Apr 14, 2020

nested_vmx_exit_reflected() returns a bool, not int. As such, refer to
the return values as true/false in the comment instead of 1/0.
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20200414221241.134103-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

69c09755

kvm: nVMX: reflect MTF VM-exits if injected by L1 · b045ae90

Committed by Oliver Upton on Apr 14, 2020

According to SDM 26.6.2, it is possible to inject an MTF VM-exit via the
VM-entry interruption-information field regardless of the 'monitor trap
flag' VM-execution control. KVM appropriately copies the VM-entry
interruption-information field from vmcs12 to vmcs02. However, if L1
has not set the 'monitor trap flag' VM-execution control, KVM fails to
reflect the subsequent MTF VM-exit into L1.

Fix this by consulting the VM-entry interruption-information field of
vmcs12 to determine if L1 has injected the MTF VM-exit. If so, reflect
the exit, regardless of the 'monitor trap flag' VM-execution control.
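
The reflection rule described above can be sketched as a pair of predicates. This is an illustrative model, not the kernel's code: the toy_ names are hypothetical, and the bit layout (valid bit 31, interruption type in bits 10:8, type 7 "other event" with vector 0 meaning a pending MTF VM-exit) is stated here as an assumption based on the SDM's description of the VM-entry interruption-information field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the VM-entry interruption-information field. */
#define TOY_INTR_INFO_VALID       (1u << 31)  /* bit 31: field is valid */
#define TOY_INTR_TYPE_OTHER_EVENT (7u << 8)   /* bits 10:8: type 7 */

/* True if vmcs12 asks for an MTF VM-exit: valid, type 7, vector 0. */
static bool toy_l1_injected_mtf(uint32_t vmcs12_entry_intr_info)
{
    return vmcs12_entry_intr_info ==
           (TOY_INTR_INFO_VALID | TOY_INTR_TYPE_OTHER_EVENT);
}

/* Reflect the MTF exit to L1 if L1 injected it or enabled the control. */
static bool toy_reflect_mtf_exit(uint32_t vmcs12_entry_intr_info,
                                 bool l1_mtf_control)
{
    return toy_l1_injected_mtf(vmcs12_entry_intr_info) || l1_mtf_control;
}
```

Before the fix, only the second operand of the final expression was consulted, so an injected MTF VM-exit was dropped when L1 had left the control clear.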

Fixes: 5f3d45e7 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200414224746.240324-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b045ae90

14 Apr, 2020 3 commits

KVM: VMX: Enable machine check support for 32bit targets · fb56baae

Committed by Uros Bizjak on Apr 14, 2020

There is no reason to limit the use of do_machine_check
to 64bit targets. MCE handling works for both target families.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: stable@vger.kernel.org
Fixes: a0861c02 ("KVM: Add VT-x machine check support")
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200414071414.45636-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb56baae

KVM: SVM: move more vmentry code to assembly · f14eec0a

Committed by Paolo Bonzini on Apr 13, 2020

Manipulate IF around vmload/vmsave to remove the confusing usage of
local_irq_enable where interrupts are actually disabled via GIF.
And stuff the RSB immediately without waiting for a RET to avoid
Spectre-v2 attacks.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f14eec0a

KVM: SVM: fix compilation with modular PSP and non-modular KVM · 9ef1530c

Committed by Paolo Bonzini on Apr 13, 2020

Use svm_sev_enabled() in order to cull all calls to PSP code.  Otherwise,
compilation fails with undefined symbols if the PSP device driver is compiled
as a module and KVM is not.
Reported-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9ef1530c

07 Apr, 2020 4 commits

KVM: VMX: fix crash cleanup when KVM wasn't used · dbef2808

Committed by Vitaly Kuznetsov on Apr 01, 2020

If KVM wasn't used at all before we crash, the cleanup procedure fails with:
 BUG: unable to handle page fault for address: ffffffffffffffc8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 23215067 P4D 23215067 PUD 23217067 PMD 0
 Oops: 0000 [#8] SMP PTI
 CPU: 0 PID: 3542 Comm: bash Kdump: loaded Tainted: G      D           5.6.0-rc2+ #823
 RIP: 0010:crash_vmclear_local_loaded_vmcss.cold+0x19/0x51 [kvm_intel]

The root cause is that the loaded_vmcss_on_cpu list is not yet initialized;
we initialize it in hardware_enable(), but this only happens when we start
a VM.

Previously, we had a bitmap of enabled CPUs, and that masked the issue.

Initialize the loaded_vmcss_on_cpu list earlier, right before we assign the
crash_vmclear_loaded_vmcss pointer.  The blocked_vcpu_on_cpu list and
blocked_vcpu_on_cpu_lock are moved along with it for consistency.

Fixes: 31603d4f ("KVM: VMX: Always VMCLEAR in-use VMCSes during crash with kexec support")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200401081348.1345307-1-vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dbef2808

KVM: X86: Filter out the broadcast dest for IPI fastpath · 4064a4c6

Committed by Wanpeng Li on Apr 02, 2020

Besides the destination shorthands, a destination value of 0xffffffff is
also used to broadcast interrupts; filter it out of the single-target IPI
fastpath as well.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1585815626-28370-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4064a4c6
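
The two filters named in 4064a4c6 can be sketched as a predicate over a 64-bit x2APIC ICR value. This is a minimal sketch under assumptions: the toy_ names are hypothetical, the layout (destination shorthand in bits 19:18 of the low word, destination ID in the high 32 bits) is assumed from the x2APIC ICR format, and any other eligibility checks the real fastpath performs are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ICR_SHORTHAND_MASK 0xc0000u     /* bits 19:18, 0 = no shorthand */
#define TOY_X2APIC_BROADCAST   0xffffffffu  /* destination meaning "all" */

/*
 * True if the ICR write names exactly one destination as far as these
 * two filters can tell: no shorthand, and not the broadcast ID.
 */
static bool toy_ipi_is_single_target(uint64_t icr)
{
    uint32_t low = (uint32_t)icr;
    uint32_t dest = (uint32_t)(icr >> 32);

    return (low & TOY_ICR_SHORTHAND_MASK) == 0 &&
           dest != TOY_X2APIC_BROADCAST;
}
```

An ICR value that fails either check would fall back to the regular, slower emulation path in this model.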

KVM: nVMX: don't clear mtf_pending when nested events are blocked · 5c8beb47

Committed by Oliver Upton on Apr 06, 2020

If nested events are blocked, don't clear the mtf_pending flag to avoid
missing later delivery of the MTF VM-exit.

Fixes: 5ef8acbd ("KVM: nVMX: Emulate MTF when performing instruction emulation")
Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20200406201237.178725-1-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5c8beb47

KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter · da7e4232

Committed by Uros Bizjak on Apr 06, 2020

The exception trampoline in the .fixup section is not needed; the exception
handling code can jump directly to the label in the .text section.

Changes since v1:
- Fix commit message.

Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200406202108.74300-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

da7e4232

03 Apr, 2020 10 commits

KVM: SVM: Split svm_vcpu_run inline assembly to separate file · 199cd1d7

Committed by Uros Bizjak on Mar 30, 2020

The compiler (GCC) does not cope well with an inline assembly block that
clobbers all available machine registers in the middle of a function.
This situation occurs in svm_vcpu_run() in kvm/svm.c and results in many
register spills and fills to/from the stack frame.

This patch fixes the issue with the same approach as was done for
VMX some time ago. The big inline assembly is moved to a separate
assembly .S file, taking into account all ABI requirements.

There are two main benefits of the above approach:

* elimination of several register spills and fills to/from stack
frame, and consequently smaller function .text size. The binary size
of svm_vcpu_run is lowered from 2019 to 1626 bytes.

* more efficient access to a register save array. Currently, register
save array is accessed as:

    7b00:    48 8b 98 28 02 00 00     mov    0x228(%rax),%rbx
    7b07:    48 8b 88 18 02 00 00     mov    0x218(%rax),%rcx
    7b0e:    48 8b 90 20 02 00 00     mov    0x220(%rax),%rdx

and when a pointer to the register array is passed as a function argument, one gets:

  12:    48 8b 48 08              mov    0x8(%rax),%rcx
  16:    48 8b 50 10              mov    0x10(%rax),%rdx
  1a:    48 8b 58 18              mov    0x18(%rax),%rbx

As a result, the total size, considering that the new function size is 229
bytes, gets lowered by 164 bytes.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

199cd1d7
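
The 164-byte figure in 199cd1d7 follows from the three sizes quoted in the changelog, assuming the saving is measured as the old function size minus the sum of the two new pieces:

```latex
\[
\underbrace{2019}_{\text{old svm\_vcpu\_run}}
- \bigl(\underbrace{1626}_{\text{new svm\_vcpu\_run}}
+ \underbrace{229}_{\text{new assembly function}}\bigr)
= 2019 - 1855 = 164 \text{ bytes}
\]
```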

KVM: SVM: Move SEV code to separate file · eaf78265

Committed by Joerg Roedel on Mar 24, 2020

Move the SEV specific parts of svm.c into the new sev.c file.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200324094154.32352-5-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

eaf78265

KVM: SVM: Move AVIC code to separate file · ef0f6496

Committed by Joerg Roedel on Mar 31, 2020

Move the AVIC related functions from svm.c to the new avic.c file.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200324094154.32352-4-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ef0f6496

KVM: SVM: Move Nested SVM Implementation to nested.c · 883b0a91

Committed by Joerg Roedel on Mar 24, 2020

Split out the code for the nested SVM implementation and move it to a
separate file.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200324094154.32352-3-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

883b0a91

kVM SVM: Move SVM related files to own sub-directory · 46a010dd

Committed by Joerg Roedel on Mar 24, 2020

Move svm.c and pmu_amd.c into their own arch/x86/kvm/svm/
subdirectory.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200324094154.32352-2-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46a010dd

x86/kvm: fix a missing-prototypes "vmread_error" · 514ccc19

Committed by Qian Cai on Apr 02, 2020

Commit 842f4be9 ("KVM: VMX: Add a trampoline to fix VMREAD error
handling") removed the declaration of vmread_error(), which causes a W=1
build failure with KVM_WERROR=y. Fix it by adding the declaration back.

arch/x86/kvm/vmx/vmx.c:359:17: error: no previous prototype for 'vmread_error' [-Werror=missing-prototypes]
 asmlinkage void vmread_error(unsigned long field, bool fault)
                 ^~~~~~~~~~~~
Signed-off-by: Qian Cai <cai@lca.pw>
Message-Id: <20200402153955.1695-1-cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

514ccc19

mm: allow VM_FAULT_RETRY for multiple times · 4064b982

Committed by Peter Xu on Apr 01, 2020

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was mainly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page.  However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
can now retry the page fault multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  the page fault may be retried, and this
                             is the first try

  - ALLOW_RETRY and TRIED:   the page fault may be retried, and this
                             is not the first try

  - !ALLOW_RETRY and !TRIED: the page fault may not be retried at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code, multiple places take special care of the first
condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first attempt of a page fault by checking both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flags & FAULT_FLAG_TRIED), because now
even the 2nd try will have ALLOW_RETRY set, and then uses that helper in
all existing special paths.  One example is __lock_page_or_retry(): now
we'll drop the mmap_sem only in the first attempt of a page fault and
keep it in follow-up retries, so the old locking behavior is retained.
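
The first-attempt check described above can be sketched in a few lines. This is an illustrative model, not the kernel helper: the function name is hypothetical, and the two flag values are placeholders chosen for the example, not taken from the kernel headers.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
#define TOY_FAULT_FLAG_ALLOW_RETRY 0x04u
#define TOY_FAULT_FLAG_TRIED       0x20u

/*
 * True only for the "ALLOW_RETRY and !TRIED" combination, i.e. the
 * first attempt of a fault that is allowed to retry.
 */
static bool toy_fault_is_first_attempt(unsigned int fault_flags)
{
    return (fault_flags & TOY_FAULT_FLAG_ALLOW_RETRY) &&
           !(fault_flags & TOY_FAULT_FLAG_TRIED);
}
```

Checking ALLOW_RETRY alone is no longer enough once the flag survives a retry, which is why the model tests both bits.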

This will be a nice enhancement for the current code [2] and, at the same
time, supporting material for the future userfaultfd-writeprotect work,
since that work will always need an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users that have requirements similar to userfault
write-protection.

GUP code is not touched yet and will be covered in a follow-up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: introduce FAULT_FLAG_DEFAULT · dde16072

Committed by Peter Xu on Apr 01, 2020

Although there are tons of arch-specific page fault handlers, most of them
still share the same initial value of the page fault flags.  Nearly all
of the page fault handlers allow the fault to be retried, and they also
allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over.  With this, it'll be far easier to
introduce a new fault flag that can be used by all the architectures
instead of touching every arch.
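
As a rough illustration of the idea, the default could be composed from
the two properties named above (retry allowed, responds to SIGKILL).
The exact composition and all bit values below are assumptions, and
initial_fault_flags() is a hypothetical helper that only mimics how an
arch handler might start from the default.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; the authoritative ones are in the kernel headers. */
#define FAULT_FLAG_WRITE        0x01u
#define FAULT_FLAG_ALLOW_RETRY  0x04u
#define FAULT_FLAG_KILLABLE     0x10u
#define FAULT_FLAG_USER         0x40u

/* Assumed composition: the fault may be retried and responds to SIGKILL. */
#define FAULT_FLAG_DEFAULT  (FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE)

/* The shared default must not imply anything about the access type. */
static_assert((FAULT_FLAG_DEFAULT & FAULT_FLAG_WRITE) == 0,
	      "default flags must not mark the fault as a write");

/*
 * Hypothetical helper: start from the shared default, then add the
 * per-fault bits that an arch handler would derive from its own state.
 */
static unsigned int initial_fault_flags(bool user_mode, bool is_write)
{
	unsigned int flags = FAULT_FLAG_DEFAULT;

	if (user_mode)
		flags |= FAULT_FLAG_USER;
	if (is_write)
		flags |= FAULT_FLAG_WRITE;
	return flags;
}
```

With a single definition, adding a new flag for every architecture means
changing one macro rather than each handler's open-coded initial value.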

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86/mm: use helper fault_signal_pending() · 39678191

Committed by Peter Xu on Apr 01, 2020

Let's move the fatal signal check even earlier so that we can directly use
the new fault_signal_pending() in x86 mm code.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-5-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm/vma: make vma_is_foreign() available for general use · 7969f226

Committed by Anshuman Khandual on Apr 01, 2020

The idea of a foreign VMA with respect to the present context is very
generic.  But currently there are two identical definitions for this in
the powerpc and x86 platforms.  Let's consolidate those redundant
definitions while making vma_is_foreign() available for general use
later.  This should not cause any functional change.
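
For readers unfamiliar with the notion, here is a small user-space model
of what "foreign with respect to the present context" could mean.  It is
a sketch under assumptions: the structs are stripped-down stand-ins, the
current context's mm is passed in explicitly instead of being read from
the running task, and vma_is_foreign_sketch() is a hypothetical name,
not the consolidated kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stripped-down stand-ins for the kernel structures of the same names. */
struct mm_struct {
	int id;
};

struct vm_area_struct {
	struct mm_struct *vm_mm;	/* address space that owns this VMA */
};

/*
 * Assumed semantics: a VMA is foreign when the current context has no
 * address space of its own, or when the VMA belongs to a different one.
 */
static bool vma_is_foreign_sketch(const struct mm_struct *current_mm,
				  const struct vm_area_struct *vma)
{
	assert(vma != NULL);

	if (!current_mm)
		return true;
	return current_mm != vma->vm_mm;
}
```

Because nothing in this check is architecture-specific, one shared
definition can serve both powerpc and x86.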

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Link: http://lkml.kernel.org/r/1582782965-3274-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
