1. 10 Jul 2020 (1 commit)
  2. 09 Jul 2020 (6 commits)
  3. 23 Jun 2020 (2 commits)
    • KVM: nVMX: Plumb L2 GPA through to PML emulation · 2dbebf7a
      Sean Christopherson authored
      Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all
      intents and purposes is vmx_write_pml_buffer(), instead of having the
      latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS.  If the dirty bit
      update is the result of KVM emulation (rare for L2), then the GPA in the
      VMCS may be stale and/or hold a completely unrelated GPA.
      
      Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2dbebf7a
    • KVM: LAPIC: ensure APIC map is up to date on concurrent update requests · 44d52717
      Paolo Bonzini authored
      The following race can cause lost map update events:
      
               cpu1                            cpu2
      
                                      apic_map_dirty = true
        ------------------------------------------------------------
                                      kvm_recalculate_apic_map:
                                           pass check
                                               mutex_lock(&kvm->arch.apic_map_lock);
                                               if (!kvm->arch.apic_map_dirty)
                                           and in process of updating map
        -------------------------------------------------------------
          other calls to
             apic_map_dirty = true         might be too late for affected cpu
        -------------------------------------------------------------
                                           apic_map_dirty = false
        -------------------------------------------------------------
          kvm_recalculate_apic_map:
          bail out on
            if (!kvm->arch.apic_map_dirty)
      
      To fix it, record the beginning of an update of the APIC map in
      apic_map_dirty.  If another APIC map change switches apic_map_dirty
      back to DIRTY during the update, kvm_recalculate_apic_map should not
      make it CLEAN, and the other caller will go through the slow path
      (a sketch of this three-state flag follows this entry).
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44d52717
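      A minimal userspace sketch of the three-state flag described above; the
      CLEAN/UPDATE_IN_PROGRESS/DIRTY names follow the commit text, while the
      pthread/stdatomic harness is purely illustrative and not actual KVM code:

          /* Illustrative model of the apic_map_dirty state machine. */
          #include <pthread.h>
          #include <stdatomic.h>

          enum map_state { CLEAN, UPDATE_IN_PROGRESS, DIRTY };

          static _Atomic enum map_state map_dirty = CLEAN;
          static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

          /* Every path that changes the APIC configuration marks the map dirty. */
          static void mark_map_dirty(void)
          {
                  atomic_store(&map_dirty, DIRTY);
          }

          static void recalculate_map(void)
          {
                  enum map_state prev = DIRTY;

                  /* Fast path: nothing to do if nobody marked the map dirty. */
                  if (atomic_load(&map_dirty) == CLEAN)
                          return;

                  pthread_mutex_lock(&map_lock);
                  /* Record that an update has started: DIRTY -> UPDATE_IN_PROGRESS. */
                  if (!atomic_compare_exchange_strong(&map_dirty, &prev,
                                                      UPDATE_IN_PROGRESS) &&
                      prev == CLEAN) {
                          /* Someone else already rebuilt the map; nothing to do. */
                          pthread_mutex_unlock(&map_lock);
                          return;
                  }

                  /* ... rebuild the map here ... */

                  /*
                   * Only mark the map CLEAN if no concurrent change flipped the
                   * flag back to DIRTY while we were rebuilding; otherwise the
                   * next caller goes through the slow path and rebuilds again.
                   */
                  prev = UPDATE_IN_PROGRESS;
                  atomic_compare_exchange_strong(&map_dirty, &prev, CLEAN);
                  pthread_mutex_unlock(&map_lock);
          }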
  4. 12 Jun 2020 (1 commit)
  5. 09 Jun 2020 (1 commit)
  6. 03 Jun 2020 (1 commit)
    • mm: remove the pgprot argument to __vmalloc · 88dca4ca
      Christoph Hellwig authored
      The pgprot argument to __vmalloc is always PAGE_KERNEL now, so remove it
      (a before/after call site is sketched after this entry).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Michael Kelley <mikelley@microsoft.com> [hyperv]
      Acked-by: Gao Xiang <xiang@kernel.org> [erofs]
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Wei Liu <wei.liu@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/20200414131348.444715-22-hch@lst.de
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88dca4ca
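      A before/after view of a typical call site, as implied by the commit; the
      allocation itself is just an example:

          #include <linux/vmalloc.h>

          static void *alloc_example(void)
          {
                  /* Before: callers spelled out the page protections. */
                  /* return __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL); */

                  /* After this commit: the pgprot argument is gone, PAGE_KERNEL is implied. */
                  return __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
          }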
  7. 01 Jun 2020 (8 commits)
    • x86/kvm/hyper-v: Add support for synthetic debugger interface · f97f5a56
      Jon Doron authored
      Add support for the Hyper-V synthetic debugger (syndbg) interface.
      The syndbg interface uses MSRs to emulate a way to send and receive
      packet data.
      
      The debug transport DLL (kdvm/kdnet) detects whether Hyper-V is enabled
      and, if it supports the synthetic debugger interface, attempts to use it
      instead of trying to initialize a network adapter.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Jon Doron <arilou@gmail.com>
      Message-Id: <20200529134543.1127440-4-arilou@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f97f5a56
    • KVM: x86/pmu: Support full width counting · 27461da3
      Like Xu authored
      Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0)
      for GP counters that allows writing the full counter width. Enable this
      range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).
      
      The guest queries CPUID to get the counter width and sign-extends the
      counter values as needed (sketched after this entry). The traditional
      MSRs are always limited to 32 bits, even though the counters are
      internally wider (48 or 57 bits).

      When the new capability is set, use the alternative range, which does
      not have these restrictions. This lowers the overhead of perf stat
      slightly because it needs fewer interrupts to accumulate the counter
      value.
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27461da3
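      The sign extension mentioned above, sketched for a hypothetical 48-bit
      counter (in practice the guest reads the actual width from CPUID leaf 0xA):

          #include <stdint.h>

          /* Sign-extend a raw 'width'-bit counter value to 64 bits, as a guest
           * does after reading a GP counter through the legacy MSRs. */
          static int64_t sign_extend_counter(uint64_t raw, unsigned int width)
          {
                  unsigned int shift = 64 - width;

                  return (int64_t)(raw << shift) >> shift;
          }

          /* Example: with width == 48, 0x0000ffffffffffff extends to -1. */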
    • KVM: x86: acknowledgment mechanism for async pf page ready notifications · 557a961a
      Vitaly Kuznetsov authored
      If two 'page ready' notifications happen back to back, the second one is
      not delivered, and the only mechanism we currently have is the
      kvm_check_async_pf_completion() check in the vcpu_run() loop. The check
      is only performed on the next vmexit, whenever that happens, and in some
      cases it may take a while. With interrupt based 'page ready' notification
      delivery the situation is even worse: unlike exceptions, interrupts are
      not handled immediately, so we must check whether the slot is empty. This
      is slow and unnecessary. Introduce a dedicated MSR_KVM_ASYNC_PF_ACK MSR
      to communicate that the slot is free and the host should check its
      notification queue (a guest-side sketch follows this entry). Mandate its
      use for interrupt based 'page ready' APF event delivery.

      As kvm_check_async_pf_completion() is going away from vcpu_run() we need
      a way to communicate that the vcpu->async_pf.done queue has transitioned
      from empty to non-empty. Introduce kvm_arch_async_page_present_queued()
      and KVM_REQ_APF_READY to do the job.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      557a961a
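      A rough guest-side sketch of the acknowledgment flow described above; the
      MSR index and the write-1-to-ack convention follow the KVM documentation
      of that era, but the handler wiring and the shared-slot access are
      hypothetical simplifications:

          #include <stdint.h>

          #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07   /* illustrative value */

          static inline void wrmsr(uint32_t msr, uint64_t val)
          {
                  asm volatile("wrmsr" : : "c"(msr), "a"((uint32_t)val),
                               "d"((uint32_t)(val >> 32)) : "memory");
          }

          /* Hypothetical 'page ready' interrupt handler in the guest. */
          void async_pf_ready_irq(volatile uint32_t *apf_token)
          {
                  uint32_t token = *apf_token;    /* token from the shared APF slot    */
                  *apf_token = 0;                 /* clear the slot for the next event */

                  /* ... wake whatever was waiting on 'token' ... */
                  (void)token;

                  /* Tell the host the slot is free so it re-checks its queue. */
                  wrmsr(MSR_KVM_ASYNC_PF_ACK, 1);
          }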
    • KVM: x86: interrupt based APF 'page ready' event delivery · 2635b5c4
      Vitaly Kuznetsov authored
      Concerns were expressed around APF delivery via a synthetic #PF
      exception, as in some cases such delivery may collide with a real page
      fault. For 'page ready' notifications we can easily switch to using an
      interrupt instead. Introduce the new MSR_KVM_ASYNC_PF_INT mechanism and
      deprecate the legacy one.

      One notable difference between the two mechanisms is that an interrupt
      may not be handled immediately, so before delivering the next event
      (regardless of its type) we must be sure the guest has read and cleared
      the previous event in the slot.

      While at it, get rid of the 'type 1/type 2' names for APF events in the
      documentation, as they are causing confusion. Use 'page not present'
      and 'page ready' everywhere instead.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2635b5c4
    • KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() · 7c0ade6c
      Vitaly Kuznetsov authored
      An innocent reader of the following x86 KVM code:
      
      bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
      {
              if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                      return true;
      ...
      
      may get very confused: if the APF mechanism is not enabled, why do we
      report that we 'can inject async page present'? In reality, upon
      injection kvm_arch_async_page_present() will check the same condition
      again and, in case APF is disabled, will just drop the item. This is
      fine, as a guest which deliberately disabled APF doesn't expect to get
      any APF notifications.
      
      Rename kvm_arch_can_inject_async_page_present() to
      kvm_arch_can_dequeue_async_page_present() to make it clear what we are
      checking: if the item can be dequeued (meaning either injected or just
      dropped).
      
      On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so
      the rename doesn't matter much.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7c0ade6c
    • KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info · 68fd66f1
      Vitaly Kuznetsov authored
      Currently, the APF mechanism relies on #PF abuse, where the token is
      passed through CR2. If we switch to using interrupts to deliver
      'page ready' notifications we need a different way to pass the data.
      Extend the existing 'struct kvm_vcpu_pv_apf_data' with token information
      for 'page ready' notifications (the resulting layout is sketched after
      this entry).

      While at it, rename 'reason' to 'flags'. This doesn't change the
      semantics, as we only have reasons '1' and '2' and these can be treated
      as bit flags, but KVM_PV_REASON_PAGE_READY is going away with interrupt
      based delivery, which makes the name 'reason' misleading.

      The newly introduced apf_put_user_ready() temporarily puts both flags
      and token information; this will be changed to put only the token when
      we switch to interrupt based notifications.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      68fd66f1
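      The resulting shared structure, approximately as described; field order
      and padding are reconstructed from the commit text, so check
      arch/x86/include/uapi/asm/kvm_para.h for the authoritative layout:

          #include <linux/types.h>

          /* Guest/host shared async page fault data after this change (approximate). */
          struct kvm_vcpu_pv_apf_data {
                  __u32 flags;     /* was 'reason'; 'page not present' bit flags */
                  __u32 token;     /* token for 'page ready' notifications       */
                  __u8  pad[56];
                  __u32 enabled;
          };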
    • KVM: nSVM: remove HF_HIF_MASK · 08245e6d
      Paolo Bonzini authored
      The L1 flags can be found in the save area of svm->nested.hsave; fish
      them from there so that there is one fewer thing to migrate.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      08245e6d
    • KVM: nSVM: remove HF_VINTR_MASK · e9fd761a
      Paolo Bonzini authored
      Now that the int_ctl field is stored in svm->nested.ctl.int_ctl, we can
      use it instead of vcpu->arch.hflags to check whether L2 is running
      in V_INTR_MASKING mode.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e9fd761a
  8. 28 May 2020 (3 commits)
    • KVM: nSVM: inject exceptions via svm_check_nested_events · 7c86663b
      Paolo Bonzini authored
      This allows exceptions injected by the emulator to be properly delivered
      as vmexits.  The code also becomes simpler, because we can just let all
      L0-intercepted exceptions go through the usual path.  In particular, our
      emulation of the VMX #DB exit qualification is very much simplified,
      because the vmexit injection path can use kvm_deliver_exception_payload
      to update DR6.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7c86663b
    • KVM: x86: enable event window in inject_pending_event · c9d40913
      Paolo Bonzini authored
      In case an interrupt arrives after nested.check_events but before the
      call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt
      window even if the interrupt is actually going to be a vmexit.  This is
      useless rather than harmful, but it really complicates reasoning about
      SVM's handling of the VINTR intercept.  We'd like to never bother with
      the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in
      that case there is no interrupt window and we can just exit the nested
      guest whenever we want.
      
      This patch moves the opening of the interrupt window inside
      inject_pending_event.  This consolidates the check for pending
      interrupt/NMI/SMI in one place, and makes KVM's usage of immediate
      exits more consistent, extending it beyond just nested virtualization.
      
      There are two functional changes here.  They only affect corner cases,
      but overall they simplify inject_pending_event.

      - re-injection of still-pending events will also use req_immediate_exit
      instead of interrupt-window intercepts.  This should have no impact on
      performance on Intel, since it simply replaces an interrupt-window or
      NMI-window exit with a preemption-timer exit.  On AMD, which has no
      equivalent of the preemption timer, it may incur some overhead, but an
      actual effect on performance should only be visible in pathological
      cases.
      
      - kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true
      if an interrupt, NMI or SMI is blocked by nested_run_pending.  This
      makes sense because entering the VM will allow it to make progress
      and deliver the event.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c9d40913
    • KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index · cb97c2d6
      Sean Christopherson authored
      Take a u32 for the index in has_emulated_msr() to match hardware, which
      treats MSR indices as unsigned 32-bit values.  Functionally, taking a
      signed int doesn't cause problems with the current code base, but could
      theoretically cause problems with 32-bit KVM, e.g. if the index were
      checked via a less-than statement, which would evaluate incorrectly for
      MSR indices with bit 31 set (see the example after this entry).
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cb97c2d6
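      The signed-comparison pitfall the commit guards against, shown in
      isolation (a hypothetical range check, not actual KVM code):

          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
                  uint32_t msr = 0xc0000080;      /* an MSR index with bit 31 set */

                  /* As a signed 32-bit value the index is negative, so a
                   * less-than range check passes when it should not. */
                  int32_t  signed_idx   = (int32_t)msr;
                  uint32_t unsigned_idx = msr;

                  printf("signed:   idx < 0x2000 -> %d\n", signed_idx < 0x2000);   /* 1, wrong */
                  printf("unsigned: idx < 0x2000 -> %d\n", unsigned_idx < 0x2000); /* 0, right */
                  return 0;
          }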
  9. 20 May 2020 (1 commit)
  10. 16 May 2020 (7 commits)
  11. 14 May 2020 (7 commits)
    • KVM: x86/mmu: Capture TDP level when updating CPUID · e93fd3b3
      Sean Christopherson authored
      Snapshot the TDP level now that it's invariant (SVM) or dependent only
      on host capabilities and guest CPUID (VMX).  This avoids having to call
      kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or
      calculating the page role, and thus avoids the associated retpoline.
      
      Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is
      active is legal, if dodgy.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e93fd3b3
    • KVM: VMX: Add proper cache tracking for CR0 · bd31fe49
      Sean Christopherson authored
      Move CR0 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
      is called multiple times in a single VM-Exit, and more importantly
      eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
      CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
      "decache_cr0_guest_bits".
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bd31fe49
    • KVM: VMX: Add proper cache tracking for CR4 · f98c1e77
      Sean Christopherson authored
      Move CR4 caching into the standard register caching mechanism in order
      to take advantage of the availability checks provided by regs_avail.
      This avoids multiple VMREADs and retpolines (when configured) during
      nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
      on each transition, e.g. when stuffing CR0 and CR3.
      
      As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
      on SVM when reading CR4, and squashes the confusing naming discrepancy
      of "cache_reg" vs. "decache_cr4_guest_bits".  The shared caching pattern
      is sketched after this entry.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f98c1e77
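      Both the CR0 and CR4 entries above rely on the same caching pattern; a
      simplified model follows, where the regs_avail name comes from the commit
      text and everything else (the enum, the hw_read_reg() helper) is
      hypothetical:

          #include <stdint.h>

          enum cached_reg { REG_CR0, REG_CR4, NR_CACHED_REGS };

          struct vcpu_cache {
                  uint64_t regs[NR_CACHED_REGS];
                  uint32_t regs_avail;    /* bitmap: which cached values are valid */
          };

          /* Stand-in for the expensive read (VMREAD on VMX, VMCB field on SVM). */
          uint64_t hw_read_reg(enum cached_reg reg);

          static uint64_t read_cached_reg(struct vcpu_cache *c, enum cached_reg reg)
          {
                  if (!(c->regs_avail & (1u << reg))) {
                          c->regs[reg] = hw_read_reg(reg);   /* slow path, at most once */
                          c->regs_avail |= 1u << reg;
                  }
                  return c->regs[reg];                       /* fast path afterwards */
          }

          /* The bitmap is cleared on every VM-exit so stale values are never used. */
          static void on_vmexit(struct vcpu_cache *c)
          {
                  c->regs_avail = 0;
          }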
    • KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch' · 56ba77a4
      Sean Christopherson authored
      Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
      hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
      when reading L1's effective TSC, which is done at least once on every
      VM-Exit.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      56ba77a4
    • KVM: x86: Replace late check_nested_events() hack with more precise fix · c300ab9f
      Paolo Bonzini authored
      Add an argument to interrupt_allowed and nmi_allowed to check whether
      interrupt injection is blocked.  Use the hook to handle the case where
      an interrupt arrives between check_nested_events() and the injection
      logic.  Drop the retry of check_nested_events() that hack-a-fixed the
      same condition.
      
      Blocking injection is also a bit of a hack, e.g. KVM should do exiting
      and non-exiting interrupt processing in a single pass, but it's a more
      precise hack.  The old comment is also misleading, e.g. KVM_REQ_EVENT is
      purely an optimization, setting it on every run loop (which KVM doesn't
      do) should not affect functionality, only performance.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
      [Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
       asynchronously right now, making the fix generic is easy and removes a
       special case. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c300ab9f
    • KVM: x86: Make return for {interrupt,nmi,smi}_allowed() a bool instead of int · 88c604b6
      Sean Christopherson authored
      Return an actual bool for the kvm_x86_ops {interrupt,nmi,smi}_allowed()
      hooks to better reflect the return semantics, and to avoid creating an
      even bigger mess when the related VMX code is refactored in upcoming
      patches.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      88c604b6
    • KVM: nVMX: Open a window for pending nested VMX preemption timer · d2060bd4
      Sean Christopherson authored
      Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
      use it to effectively open a window for servicing the expired timer.
      Like pending SMIs on VMX, opening a window simply means requesting an
      immediate exit.
      
      This fixes a bug where an expired VMX preemption timer (for L2) will be
      delayed and/or lost if a pending exception is injected into L2.  The
      pending exception is rightly prioritized by vmx_check_nested_events()
      and injected into L2, with the preemption timer left pending.  Because
      no window opened, L2 is free to run uninterrupted.
      
      Fixes: f4124500 ("KVM: nVMX: Fully emulate preemption timer")
      Reported-by: Jim Mattson <jmattson@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Peter Shier <pshier@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200423022550.15113-3-sean.j.christopherson@intel.com>
      [Check it in kvm_vcpu_has_events too, to ensure that the preemption
       timer is serviced promptly even if the vCPU is halted and L1 is not
       intercepting HLT. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d2060bd4
  12. 13 May 2020 (1 commit)
    • KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      Babu Moger authored
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU resource
      isn't. It can be read with XSAVE and written with XRSTOR. So, if we
      don't set the guest PKRU value here (kvm_load_guest_xsave_state), the
      guest can read the host value.

      In the case of kvm_load_host_xsave_state, a guest with CR4.PKE clear
      could potentially use XRSTOR to change the host PKRU value.

      While at it, move the pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection
      keys (the swap logic is sketched after this entry).
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37486135
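      A hedged sketch of the swap described above; the RDPKRU/WRPKRU wrappers
      mirror the x86 instructions, while the conditions KVM actually checks
      (PKU support, CR4.PKE or XCR0.PKRU) are simplified away:

          #include <stdint.h>

          static inline uint32_t rdpkru(void)
          {
                  uint32_t pkru;

                  /* RDPKRU: reads PKRU into EAX, requires ECX = 0, clobbers EDX. */
                  asm volatile(".byte 0x0f,0x01,0xee" : "=a"(pkru) : "c"(0) : "edx");
                  return pkru;
          }

          static inline void wrpkru(uint32_t pkru)
          {
                  /* WRPKRU: writes EAX to PKRU, requires ECX = EDX = 0. */
                  asm volatile(".byte 0x0f,0x01,0xef" : : "a"(pkru), "c"(0), "d"(0));
          }

          /* Around guest entry: remember the host PKRU, install the guest's. */
          void load_guest_pkru(uint32_t *host_pkru, uint32_t guest_pkru)
          {
                  *host_pkru = rdpkru();
                  if (guest_pkru != *host_pkru)
                          wrpkru(guest_pkru);
          }

          /* Around guest exit: remember the guest PKRU, restore the host's. */
          void load_host_pkru(uint32_t host_pkru, uint32_t *guest_pkru)
          {
                  *guest_pkru = rdpkru();
                  if (*guest_pkru != host_pkru)
                          wrpkru(host_pkru);
          }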
  13. 08 May 2020 (1 commit)
    • KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6 · d67668e9
      Paolo Bonzini authored
      There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the
      different handling of DR6 on intercepted #DB exceptions on Intel and AMD.
      
      On Intel, #DB exceptions transmit the DR6 value via the exit qualification
      field of the VMCS, and the exit qualification only contains the description
      of the precise event that caused a vmexit.
      
      On AMD, instead, the DR6 field of the VMCB is filled in as if the #DB
      exception were to be injected into the guest.  This has two effects when
      guest debugging is in use:
      
      * the guest DR6 is clobbered
      
      * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather
      than just the last one that happened (the testcase in the next patch covers
      this issue).
      
      This patch fixes both issues by emulating, so to speak, the Intel
      behavior on AMD processors.  The important observation is that (after
      the previous patches) the VMCB value of DR6 is only ever observable from
      the guest if KVM_DEBUGREG_WONT_EXIT is set.  Therefore we can actually
      set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT
      is clear, which it will be if guest debugging is enabled.
      
      Therefore it is possible to enter the guest with an all-zero DR6,
      reconstruct the #DB payload from the DR6 we get at exit time, and let
      kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6.
      Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT
      is set, but this is harmless.
      
      This may not be the most optimized way to deal with this, but it is
      simple and, being confined within SVM code, it gets rid of the set_dr6
      callback and kvm_update_dr6.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d67668e9