- 11 Mar 2014, 5 commits
-
-
By Paolo Bonzini
Currently, this works even if the bit is not in "min", because the bit is always set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
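As a rough illustration of the "min"/capability interplay this message refers to, here is a minimal user-space sketch of the usual control-adjustment pattern (a hypothetical adjust_vmx_controls, not the kernel's code); the capability value and control bits in main() are made up, and real code reads the MSR with rdmsr():

    /* Required bits ("min") must be allowed-1 by the capability MSR;
     * optional bits ("opt") are kept only if the hardware allows them.
     * Low 32 bits of the MSR: bits that must be 1; high 32 bits: bits
     * that may be 1. */
    #include <stdint.h>
    #include <stdio.h>

    static int adjust_vmx_controls(uint32_t min, uint32_t opt,
                                   uint64_t capability_msr, uint32_t *result)
    {
        uint32_t must_be_one = (uint32_t)capability_msr;
        uint32_t may_be_one  = (uint32_t)(capability_msr >> 32);
        uint32_t ctl = min | opt;

        ctl &= may_be_one;    /* drop optional bits the CPU does not support */
        ctl |= must_be_one;   /* force bits the CPU requires to be set       */

        if ((ctl & min) != min)
            return -1;        /* a required bit is not supported */

        *result = ctl;
        return 0;
    }

    int main(void)
    {
        /* Fabricated capability value and control bits, for illustration only. */
        uint64_t fake_entry_ctls = ((uint64_t)0x0000ffff << 32) | 0x000011ff;
        uint32_t ctl;

        if (adjust_vmx_controls(0x200, 0x8000, fake_entry_ctls, &ctl) == 0)
            printf("VM-entry controls: %#x\n", ctl);
        return 0;
    }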
-
By Jan Kiszka
It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, and with it the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
According to SDM 27.2.3, IDT vectoring information will not be valid on vmexits caused by external NMIs. So we have to avoid creating such scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we have a pending interrupt, because that one would be migrated to L1's IDT vectoring info on nested exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption timer will resolve tick rate errata on older Intel CPUs. The emulation is based on an hrtimer which is started on L2 entry, stopped on L2 exit, and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
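For context, here is a self-contained sketch of the value-to-deadline conversion such an hrtimer-based emulation needs; the rate shift (taken from IA32_VMX_MISC bits 4:0 on real hardware) and the TSC frequency in main() are assumed example values:

    #include <stdint.h>
    #include <stdio.h>

    /* Convert a VMX-preemption timer value into a relative deadline in
     * nanoseconds.  The timer counts down once every (2^rate_shift) TSC
     * cycles, so the remaining time is (value << rate_shift) TSC cycles. */
    static uint64_t preemption_timer_deadline_ns(uint32_t timer_value,
                                                 unsigned int rate_shift,
                                                 uint64_t tsc_khz)
    {
        uint64_t tsc_cycles = (uint64_t)timer_value << rate_shift;

        /* cycles / (cycles per ms) = ms; multiply by 1e6 for ns. */
        return tsc_cycles * 1000000ULL / tsc_khz;
    }

    int main(void)
    {
        /* Assumed: 2.4 GHz TSC and a rate shift of 5. */
        uint64_t ns = preemption_timer_deadline_ns(100000, 5, 2400000);

        printf("arm the hrtimer to fire in %llu ns\n", (unsigned long long)ns);
        return 0;
    }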
-
By Jan Kiszka
Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback's task is to switch from L2 to L1 if needed and to inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
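A user-space sketch of the resulting flow (all structures and helper names here are hypothetical stand-ins for the kvm ones): the callback runs first and may turn a pending event into a nested vmexit; only afterwards is plain injection considered.

    #include <stdbool.h>
    #include <stdio.h>

    struct vcpu {
        bool in_guest_mode;        /* currently running L2 */
        bool l1_intercepts_irqs;   /* L1 set "external-interrupt exiting" */
        bool irq_pending;
    };

    static void check_nested_events(struct vcpu *v)
    {
        if (v->in_guest_mode && v->irq_pending && v->l1_intercepts_irqs) {
            /* Emulate the vmexit: leave L2 and hand the event to L1. */
            v->in_guest_mode = false;
            printf("nested vmexit: EXTERNAL_INTERRUPT reflected to L1\n");
        }
    }

    static bool interrupt_allowed(const struct vcpu *v)
    {
        /* The L2-to-L1 decision already happened above; this only asks
         * whether injection into the current level is possible. */
        return !v->in_guest_mode || !v->l1_intercepts_irqs;
    }

    int main(void)
    {
        struct vcpu v = { .in_guest_mode = true,
                          .l1_intercepts_irqs = true,
                          .irq_pending = true };

        check_nested_events(&v);
        printf("can inject at current level: %d\n", interrupt_allowed(&v));
        return 0;
    }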
-
- 03 Mar 2014, 1 commit
-
-
By Paolo Bonzini
Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: e504c909 Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
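A self-contained sketch of the distinction the message draws; struct vmcs12 is reduced to one field and l1_wants_nm_fault() is a hypothetical helper, not the kvm function:

    #include <stdbool.h>
    #include <stdio.h>

    #define X86_CR0_TS (1UL << 3)

    struct vmcs12 { unsigned long guest_cr0; };

    /* Decide whether an #NM belongs to L1: what matters is the CR0 value
     * L1 actually programmed for L2 (vmcs12->guest_cr0), not the value L2
     * believes it is running with (nested_read_cr0). */
    static bool l1_wants_nm_fault(const struct vmcs12 *vmcs12)
    {
        return vmcs12->guest_cr0 & X86_CR0_TS;
    }

    int main(void)
    {
        struct vmcs12 v = { .guest_cr0 = X86_CR0_TS };

        printf("inject #NM into L1: %d\n", l1_wants_nm_fault(&v));
        return 0;
    }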
-
- 28 Feb 2014, 1 commit
-
-
By Paolo Bonzini
Commit e504c909 (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: e504c909 Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 Feb 2014, 1 commit
-
-
By Liu, Jinsong
KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save. Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic to kvm_get/set_msr(). Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
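A user-space sketch of the two pieces described above: listing the MSR so userspace save/restore picks it up, and wiring it into the get/set dispatch. The MSR number is the architectural one; the surrounding structures are simplified stand-ins for the kvm ones.

    #include <stdint.h>
    #include <stdio.h>

    #define MSR_IA32_BNDCFGS 0x00000d90   /* MPX bounds-configuration MSR */

    struct guest_state { uint64_t bndcfgs; };

    static const uint32_t msrs_to_save[] = {
        /* ... existing entries ... */
        MSR_IA32_BNDCFGS,
    };

    static int guest_set_msr(struct guest_state *g, uint32_t index, uint64_t data)
    {
        switch (index) {
        case MSR_IA32_BNDCFGS:
            g->bndcfgs = data;    /* the real code writes a VMCS field */
            return 0;
        default:
            return -1;            /* unhandled MSR */
        }
    }

    static int guest_get_msr(const struct guest_state *g, uint32_t index,
                             uint64_t *data)
    {
        switch (index) {
        case MSR_IA32_BNDCFGS:
            *data = g->bndcfgs;
            return 0;
        default:
            return -1;
        }
    }

    int main(void)
    {
        struct guest_state g = { 0 };
        uint64_t value = 0;

        guest_set_msr(&g, MSR_IA32_BNDCFGS, 0x1);
        guest_get_msr(&g, msrs_to_save[0], &value);
        printf("BNDCFGS = %#llx\n", (unsigned long long)value);
        return 0;
    }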
-
- 24 Feb 2014, 1 commit
-
-
By Liu, Jinsong
KVM: x86: Intel MPX vmx and msr handling. This patch handles the VMX controls and MSRs for the Intel MPX feature. Signed-off-by: Xudong Hao <xudong.hao@intel.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 27 Jan 2014, 1 commit
-
-
By Jan Kiszka
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> [Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
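A user-space sketch of the checks described (reserved bits plus the mode transitions SDM 10.12.5 forbids); MAXPHYADDR and the starting state are illustrative, and the real code additionally exempts host-initiated writes:

    #include <stdint.h>
    #include <stdio.h>

    #define APIC_BASE_EXTD  (1ULL << 10)   /* x2APIC enable */
    #define APIC_BASE_EN    (1ULL << 11)   /* xAPIC global enable */

    /* Returns 0 if the write is accepted, -1 if it must #GP. */
    static int set_apic_base(uint64_t *apic_base, uint64_t val,
                             unsigned int maxphyaddr)
    {
        uint64_t reserved = (~0ULL << maxphyaddr) | 0x2ff;  /* bits 0-7, 9, >= MAXPHYADDR */
        uint64_t old_mode = *apic_base & (APIC_BASE_EN | APIC_BASE_EXTD);
        uint64_t new_mode = val & (APIC_BASE_EN | APIC_BASE_EXTD);

        if (val & reserved)
            return -1;
        if (new_mode == APIC_BASE_EXTD)
            return -1;   /* x2APIC enable without xAPIC enable */
        if (old_mode == 0 && new_mode == (APIC_BASE_EN | APIC_BASE_EXTD))
            return -1;   /* disabled -> x2APIC without going through xAPIC */
        if (old_mode == (APIC_BASE_EN | APIC_BASE_EXTD) && new_mode == APIC_BASE_EN)
            return -1;   /* x2APIC -> xAPIC without disabling first */

        *apic_base = val;
        return 0;
    }

    int main(void)
    {
        uint64_t base = 0xfee00000ULL | APIC_BASE_EN;   /* xAPIC enabled */

        printf("xAPIC -> x2APIC: %d\n",
               set_apic_base(&base, base | APIC_BASE_EXTD, 36));
        printf("x2APIC -> xAPIC: %d\n",
               set_apic_base(&base, base & ~APIC_BASE_EXTD, 36));
        return 0;
    }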
-
- 17 Jan 2014, 8 commits
-
-
By Jan Kiszka
Set the guest activity state in L1's VMCS according to the VCPU's mp_state. This ensures we report the correct state in case L2 executed HLT, or if we put L2 into HLT state and it was now woken up by an event. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
When we suspend the guest in HLT state, the nested run is no longer pending - we emulated it completely. So only set nested_run_pending after checking the activity state. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
This simplifies the code and also stops issuing warnings about writes to unhandled MSRs when VMX is disabled or the Feature Control MSR is locked - we do handle them all according to the spec. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
Already used by nested SVM for tracing nested vmexits: kvm_nested_vmexit marks exits from L2 to L0, while kvm_nested_vmexit_inject marks vmexits that are reflected to L1. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
Instead of fixing up the vmcs12 after the nested vmexit, pass the key parameters already when calling nested_vmx_vmexit. This will help tracing those vmexits. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
When userspace sets MSR_IA32_FEATURE_CONTROL to 0, make sure we leave root and non-root mode, fully disabling VMX. The register state of the VCPU is undefined after this step, so userspace has to set it to a proper state afterward. This makes it possible to reboot a VM while it is running some hypervisor code. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
According to the SDM, only bits 0-3 of DR6 "may" be cleared by "certain" debug exceptions. So do update them on a #DB exception in KVM, but leave the rest alone, only setting BD and BS in addition to the already set bits in DR6. This also aligns us with kvm_vcpu_check_singlestep. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
In contrast to VMX, SVM does not automatically transfer DR6 into the VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor hook to obtain the current value. And as SVM now picks the DR6 state from its VMCB, we also need a set callback in order to write updates of DR6 back. Fixes a regression of 020df079. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 09 Jan 2014, 2 commits
-
-
By Marcelo Tosatti
After free_loaded_vmcs executes, the "loaded_vmcs" structure is kfreed, and now vmx->loaded_vmcs points to a kfreed area. The subsequent free_nested then attempts to manipulate vmx->loaded_vmcs. Switch the order to avoid the problem. https://bugzilla.redhat.com/show_bug.cgi?id=1047892 Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
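A self-contained sketch of the ordering issue: freeing the loaded VMCS first leaves vmx->loaded_vmcs dangling for the nested teardown that still dereferences it, so the nested state has to be torn down first. Names mirror the message, but the structures and helpers are simplified stand-ins.

    #include <stdio.h>
    #include <stdlib.h>

    struct loaded_vmcs { int cpu; };

    struct vcpu_vmx {
        struct loaded_vmcs *loaded_vmcs;
        struct loaded_vmcs *nested_vmcs;   /* stand-in for the nested state */
    };

    static void free_loaded_vmcs(struct loaded_vmcs *v) { free(v); }

    static void free_nested(struct vcpu_vmx *vmx)
    {
        /* May still look at vmx->loaded_vmcs while unwinding nested state. */
        if (vmx->loaded_vmcs)
            vmx->loaded_vmcs->cpu = -1;
        free(vmx->nested_vmcs);
        vmx->nested_vmcs = NULL;
    }

    static void vmx_free_vcpu(struct vcpu_vmx *vmx)
    {
        free_nested(vmx);                     /* correct order: nested state first */
        free_loaded_vmcs(vmx->loaded_vmcs);   /* ...then the loaded VMCS           */
        vmx->loaded_vmcs = NULL;
    }

    int main(void)
    {
        struct vcpu_vmx vmx = {
            .loaded_vmcs = calloc(1, sizeof(struct loaded_vmcs)),
            .nested_vmcs = calloc(1, sizeof(struct loaded_vmcs)),
        };

        vmx_free_vcpu(&vmx);
        printf("teardown finished without touching freed memory\n");
        return 0;
    }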
-
By Zhihui Zhang
According to Table C-1 of the Intel SDM volume 3C, a VM exit happens on an I/O instruction when the "use I/O bitmaps" VM-execution control was 0 _and_ the "unconditional I/O exiting" VM-execution control was 1. So we can't just check "unconditional I/O exiting" alone. This patch was improved by a suggestion from Jan Kiszka. Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
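A user-space sketch of the corrected predicate: with I/O bitmaps in use the bitmaps decide, and only without them does "unconditional I/O exiting" matter. The control-bit values are the architectural ones; the bitmap lookup is stubbed out.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CPU_BASED_UNCOND_IO_EXITING  (1U << 24)
    #define CPU_BASED_USE_IO_BITMAPS     (1U << 25)

    /* Stub: a real implementation consults the I/O bitmaps in vmcs12. */
    static bool io_bitmap_intercepts_port(uint16_t port) { (void)port; return false; }

    static bool nested_exit_on_io(uint32_t cpu_based_exec_ctrl, uint16_t port)
    {
        if (cpu_based_exec_ctrl & CPU_BASED_USE_IO_BITMAPS)
            return io_bitmap_intercepts_port(port);

        /* Only when bitmaps are not used does this bit decide. */
        return cpu_based_exec_ctrl & CPU_BASED_UNCOND_IO_EXITING;
    }

    int main(void)
    {
        uint32_t ctrl = CPU_BASED_USE_IO_BITMAPS | CPU_BASED_UNCOND_IO_EXITING;

        /* With bitmaps enabled, "unconditional I/O exiting" is ignored. */
        printf("reflect I/O exit to L1: %d\n", nested_exit_on_io(ctrl, 0x3f8));
        return 0;
    }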
-
- 02 Jan 2014, 1 commit
-
-
By Jan Kiszka
Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway in case nested EPT wasn't in use. 2. This aligns VMX with SVM. But 3. is the most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if one guest VCPU manipulates the page of another VCPU in L2, we may be fooled into skipping nested_ept_uninit_mmu_context, leaving the mmu in nested state. That can crash the host later on if nested_ept_get_cr3 is invoked while L1 has already left vmxon and nested.current_vmcs12 has therefore become NULL. Cc: stable@kernel.org Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
- 21 Dec 2013, 1 commit
-
-
By Jan Kiszka
If kvm_get_dr or kvm_set_dr reports that it raised a fault, we must not advance the instruction pointer. Otherwise the exception will hit the wrong instruction. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 Dec 2013, 1 commit
-
-
By Jan Kiszka
It's a pathological case, but still a valid one: if L1 disables APIC virtualization and also allows L2 to directly write to the APIC page, we have to forcibly enable APIC virtualization while in L2 if the in-kernel APIC is in use. This makes it possible to run the direct interrupt test case in the vmx unit tests without x2APIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 Dec 2013, 2 commits
-
-
By Jan Kiszka
We can easily emulate the HLT activity state for L1: if it decides that L2 shall be halted on entry, just invoke the normal emulation of halt after switching to L2. We do not depend on specific host features to provide this, so we can expose the capability unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Gleb Natapov
The VM_(ENTRY|EXIT)_CONTROLS VMCS fields are read/written on each guest entry, but most of the time this can be avoided since the values do not change. Keep a copy of the fields in memory to avoid unnecessary reads from the VMCS. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
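A user-space sketch of the shadow-copy idea: keep the last value written to the field in memory and only issue the (comparatively expensive) VMWRITE when the value actually changes. vmcs_write32() is a stub standing in for the real VMWRITE, and the field encoding is only used as an opaque number here.

    #include <stdint.h>
    #include <stdio.h>

    static unsigned long vmwrites;

    static void vmcs_write32(uint32_t field, uint32_t value)
    {
        (void)field; (void)value;
        vmwrites++;                 /* in reality: a VMWRITE instruction */
    }

    struct ctl_shadow {
        uint32_t field;             /* e.g. VM_ENTRY_CONTROLS */
        uint32_t cached;            /* last value written to the VMCS */
    };

    static void ctl_set(struct ctl_shadow *c, uint32_t val)
    {
        if (c->cached == val)
            return;                 /* nothing changed: skip the VMWRITE */
        c->cached = val;
        vmcs_write32(c->field, val);
    }

    static void ctl_setbit(struct ctl_shadow *c, uint32_t bit)   { ctl_set(c, c->cached | bit); }
    static void ctl_clearbit(struct ctl_shadow *c, uint32_t bit) { ctl_set(c, c->cached & ~bit); }

    int main(void)
    {
        struct ctl_shadow entry_controls = { .field = 0x4012 /* VM_ENTRY_CONTROLS */ };

        ctl_setbit(&entry_controls, 1U << 9);    /* first write goes to the VMCS */
        ctl_setbit(&entry_controls, 1U << 9);    /* unchanged: no VMWRITE        */
        ctl_clearbit(&entry_controls, 1U << 9);

        printf("VMWRITEs issued: %lu (instead of 3)\n", vmwrites);
        return 0;
    }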
-
- 14 Nov 2013, 1 commit
-
-
By Anthoine Bourgeois
If a nested guest takes an #NM fault but its CR0 does not contain the TS flag (because it was already cleared by the guest with L1's aid), then we have to activate the FPU ourselves in L0 and then continue to L2. If the TS flag is set, we fall back to the previous behavior and forward the fault to L1 if it asked for it. Signed-off-by: Anthoine Bourgeois <bourgeois@bertin.fr> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 31 Oct 2013, 3 commits
-
-
By Michael S. Tsirkin
mst can't be blamed for lack of switch entries: the issue is with msrs actually. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Alex Williamson
We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Alex Williamson
Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 Oct 2013, 3 commits
-
-
By Jan Kiszka
If the host supports it, we can and should expose it to the guest as well, just like we already do with PIN_BASED_VIRTUAL_NMIS. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
__vmx_complete_interrupts stored uninjected NMIs in arch.nmi_injected, not arch.nmi_pending. So we actually need to check the former field in vmcs12_save_pending_event. This fixes the eventinj unit test when run in nested KVM. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Jan Kiszka
As long as the hardware provides us with 2MB EPT pages, we can also expose them to the guest because our shadow EPT code already supports this feature. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 11 Oct 2013, 1 commit
-
-
By Arthur Chunqi Li
This patch contains the following two changes: 1. Fix a bug in nested preemption timer support: if a vmexit from L2 to L0 occurs for a reason not emulated by L1, the preemption timer value should be saved across such exits. 2. Add support for the "save VMX-preemption timer value" VM-exit control to nVMX. With this patch, the nested VMX preemption timer feature is fully supported. Signed-off-by: Arthur Chunqi Li <yzt356@gmail.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 10 Oct 2013, 1 commit
-
-
By Gleb Natapov
72f85795 broke shadow on EPT. This patch reverts it and fixes PAE on nEPT (which the reverted commit fixed) in another way. Shadow on EPT is now broken because while L1 builds the shadow page table for L2 (which is PAE while L2 is in real mode), it never loads L2's GUEST_PDPTR[0-3]. They do not need to be loaded because without nested virtualization the hardware does this during guest entry if EPT is disabled, but in our case L0 emulates L2's vmentry while EPT is enabled, so we cannot rely on vmcs12->guest_pdptr[0-3] to contain up-to-date values and need to re-read the PDPTEs from L2 memory. This is what kvm_set_cr3() does, but by clearing the cache bits during L2 vmentry we drop the values that kvm_set_cr3() read from memory. So why does the same code not work for PAE on nEPT? kvm_set_cr3() reads the pdptes into vcpu->arch.walk_mmu->pdptrs[]. walk_mmu points to vcpu->arch.nested_mmu while the nested guest is running, but ept_load_pdptrs() uses vcpu->arch.mmu, which contains incorrect values. Fix that by using walk_mmu in ept_(load|save)_pdptrs. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 Oct 2013, 1 commit
-
-
By Paolo Bonzini
kvm_mmu initialization is mostly filling in function pointers; there is no way for it to fail. Clean up the unused return values. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
-
- 30 Sep 2013, 4 commits
-
-
By Gleb Natapov
If a #PF happens during delivery of an exception into L2 and L1 also does not have the page mapped in its shadow page table, then L0 needs to generate a vmexit to L1 with the original event in IDT_VECTORING_INFO, but the current code combines both exceptions and generates a #DF instead. Fix that by providing an nVMX-specific function to handle page faults during the page table walk that handles this case correctly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Gleb Natapov
All exceptions should be checked for intercept during delivery to L2, but currently we check only #PF. Drop the nested_run_pending check while we are at it, since an exception cannot be injected during vmentry anyway. Signed-off-by: Gleb Natapov <gleb@redhat.com> [Renamed the nested_vmx_check_exception function. - Paolo] Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Gleb Natapov
If an exception causes a vmexit directly, it should not be reported in IDT_VECTORING_INFO during the exit. For that we need to be able to distinguish between an exception that is injected into the nested VM and one that is reinjected because its delivery failed. Fortunately we already have a mechanism to do so for nested SVM, so here we just use the correct function to requeue exceptions and make sure that a reinjected exception is not moved to IDT_VECTORING_INFO during vmexit emulation and not re-checked for interception during delivery. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Gleb Natapov
An EXIT_REASON_VMLAUNCH/EXIT_REASON_VMRESUME exit does not mean that the nested VM will actually run during the next entry. Move the setting of nested_run_pending closer to the vmentry emulation code and move its clearing closer to vmexit to minimize the amount of code that would erroneously run with nested_run_pending set. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 25 Sep 2013, 1 commit
-
-
By Gleb Natapov
Bit 12 is undefined in any of the following cases: - if the "NMI exiting" VM-execution control is 1 and the "virtual NMIs" VM-execution control is 0; - if the VM exit sets the valid bit in the IDT-vectoring information field. Signed-off-by: Gleb Natapov <gleb@redhat.com> [Add parentheses around & within && - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
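Assuming the field in question is bit 12 of the exit qualification ("NMI unblocking due to IRET"), here is a simplified sketch of the guard this implies; the bit positions follow the SDM, and the virtual_nmis flag is a stand-in for the VM-execution control checks quoted above:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define EXIT_QUAL_NMI_UNBLOCK_IRET   (1ULL << 12)
    #define IDT_VECTORING_INFO_VALID     (1U << 31)

    static bool nmi_unblocked_by_iret(uint64_t exit_qualification,
                                      uint32_t idt_vectoring_info,
                                      bool virtual_nmis)
    {
        /* Bit 12 is undefined without "virtual NMIs", or when this exit
         * set the valid bit in the IDT-vectoring information field. */
        if (!virtual_nmis || (idt_vectoring_info & IDT_VECTORING_INFO_VALID))
            return false;

        return exit_qualification & EXIT_QUAL_NMI_UNBLOCK_IRET;
    }

    int main(void)
    {
        printf("%d\n", nmi_unblocked_by_iret(1ULL << 12, 0, true));    /* 1 */
        printf("%d\n", nmi_unblocked_by_iret(1ULL << 12,
                                             IDT_VECTORING_INFO_VALID, true));  /* 0 */
        return 0;
    }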
-