Commits · 61f1dd9099aba56b7e6e3c3c4b9ad13199bba06e · openanolis / cloud-kernel

21 Oct, 2017 2 commits

KVM: VMX: Fix VPID capability detection · 61f1dd90

Authored by Wanpeng Li on Oct 18, 2017

In my setup, EPT is not exposed to L1, but the VPID capability is exposed and
can be observed with the vmxcap tool in L1:
INVVPID supported                        yes
Individual-address INVVPID               yes
Single-context INVVPID                   yes
All-context INVVPID                      yes
Single-context-retaining-globals INVVPID yes

However, the VPID module parameter observed in L1 is always N. The
cpu_has_vmx_invvpid() check in L1 KVM fails because vmx_capability.vpid
is 0: it is not read from the MSR because EPT is not exposed.

VPID can be used to tag linear mappings when EPT is not enabled. However,
the current logic detects the VPID capability only if EPT is enabled. This
patch fixes it.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

61f1dd90
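
A minimal user-space sketch of the detection logic described in the commit above. It is not the kernel code: the struct, the function names, and the SKETCH_ constants are hypothetical, and the capability MSR value is passed in as a plain integer instead of being read from hardware. It assumes the EPT half of the capability word is the low 32 bits and the VPID half is the high 32 bits.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the secondary execution control bits. */
#define SKETCH_ENABLE_EPT  (1u << 1)
#define SKETCH_ENABLE_VPID (1u << 5)

struct sketch_caps {
    uint32_t ept;   /* low half of the capability word */
    uint32_t vpid;  /* high half of the capability word */
};

/* Old behaviour: the capability word is consulted only when EPT is usable,
 * so the VPID half stays 0 whenever EPT is not exposed. */
struct sketch_caps sketch_detect_old(uint32_t ctl2, uint64_t cap_msr)
{
    struct sketch_caps caps = {0, 0};

    if (ctl2 & SKETCH_ENABLE_EPT) {
        caps.ept = (uint32_t)cap_msr;
        caps.vpid = (uint32_t)(cap_msr >> 32);
    }
    return caps;
}

/* Fixed behaviour: read both halves, then clear each half independently
 * when its own control is unavailable. */
struct sketch_caps sketch_detect_fixed(uint32_t ctl2, uint64_t cap_msr)
{
    struct sketch_caps caps;

    caps.ept = (uint32_t)cap_msr;
    caps.vpid = (uint32_t)(cap_msr >> 32);
    if (!(ctl2 & SKETCH_ENABLE_EPT))
        caps.ept = 0;
    if (!(ctl2 & SKETCH_ENABLE_VPID))
        caps.vpid = 0;
    return caps;
}
```

With only the VPID control set, sketch_detect_old() reports no VPID capability while sketch_detect_fixed() keeps it, which mirrors the symptom and the fix in the message.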

KVM: nVMX: Fix EPT switching advertising · 575b3a2c

Authored by Wanpeng Li on Oct 19, 2017

I can use the vmxcap tool to observe "EPTP Switching   yes" even if EPT is not
exposed to L1.

EPT switching is advertised unconditionally since it is emulated. However,
it can be treated as an extended feature of EPT and should not be
advertised if EPT itself is not exposed. This patch fixes it.

Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

575b3a2c
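
The gating described above can be sketched in a few lines. This is an illustration only; the function name and the SKETCH_ constant are hypothetical and do not come from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the EPTP switching VM function bit. */
#define SKETCH_VMFUNC_EPTP_SWITCHING (1ull << 0)

/* Compute the VM function controls advertised to L1. EPTP switching is
 * emulated, but it only makes sense when EPT itself is exposed. */
uint64_t sketch_vmfunc_controls(bool ept_exposed)
{
    uint64_t controls = 0;

    if (ept_exposed)
        controls |= SKETCH_VMFUNC_EPTP_SWITCHING;
    return controls;
}
```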

19 Oct, 2017 1 commit

KVM: SVM: detect opening of SMI window using STGI intercept · cc3d967f

Authored by Ladi Prosek on Oct 17, 2017

Commit 05cade71 ("KVM: nSVM: fix SMI injection in guest mode") made
KVM mask SMI if GIF=0 but it didn't do anything to unmask it when GIF is
enabled.

The issue manifests for me as a significantly longer boot time of Windows
guests when running with SMM-enabled OVMF.

This commit fixes it by intercepting STGI instead of requesting immediate
exit if the reason why SMM was masked is GIF.

Fixes: 05cade71 ("KVM: nSVM: fix SMI injection in guest mode")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

cc3d967f
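
A rough model of the decision described in the commit above, assuming a pending SMI that cannot be injected yet. The types and names are hypothetical and much simpler than KVM's: if the SMI is blocked because GIF is clear, the sketch arms an STGI intercept and waits; otherwise it falls back to requesting an immediate exit.

```c
#include <stdbool.h>

/* Hypothetical model of the state this sketch needs; not KVM's structures. */
struct sketch_vcpu {
    bool gif;            /* global interrupt flag */
    bool stgi_intercept; /* whether STGI currently causes an exit */
};

enum sketch_action {
    SKETCH_INTERCEPT_STGI,    /* wait until the guest executes STGI */
    SKETCH_REQ_IMMEDIATE_EXIT /* SMI is blocked for some other reason */
};

/* Decide how to learn that the SMI window has opened. */
enum sketch_action sketch_enable_smi_window(struct sketch_vcpu *v)
{
    if (!v->gif) {
        v->stgi_intercept = true;
        return SKETCH_INTERCEPT_STGI;
    }
    return SKETCH_REQ_IMMEDIATE_EXIT;
}
```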

12 Oct, 2017 36 commits

KVM: x86: extend usage of RET_MMIO_PF_* constants · 9b8ebbdb

Authored by Paolo Bonzini on Aug 17, 2017

The x86 MMU is full of code that returns 0 and 1 for retry/emulate.  Use
the existing RET_MMIO_PF_RETRY/RET_MMIO_PF_EMULATE enum, renaming it to
drop the MMIO part.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9b8ebbdb
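
To illustrate the readability gain from named return codes over bare 0 and 1, here is a small sketch. The enum and function names are hypothetical, and the numeric values (0 for retry, 1 for emulate) are an assumption based on the order given in the message.

```c
/* Hypothetical named results replacing bare 0/1 return values. */
enum sketch_ret_pf {
    SKETCH_RET_PF_RETRY = 0,  /* let the guest retry the access */
    SKETCH_RET_PF_EMULATE = 1 /* emulate the faulting instruction */
};

/* A fault handler that states its intent through the enum. */
enum sketch_ret_pf sketch_handle_fault(int mapping_installed)
{
    if (mapping_installed)
        return SKETCH_RET_PF_RETRY;
    return SKETCH_RET_PF_EMULATE;
}
```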

KVM: nSVM: fix SMI injection in guest mode · 05cade71

Authored by Ladi Prosek on Oct 11, 2017

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* MMU was confused (walk_mmu was still set to nested_mmu)
* INTERCEPT_SMI was not emulated for L1 (KVM never injected SVM_EXIT_SMI)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
entering SMM in 34.14.1 Default Treatment of SMI Delivery. AMD doesn't seem to
document this but they provide fields in the SMM state-save area to stash the
current state of SVM. What we need to do is basically get out of guest mode for
the duration of SMM. All this is completely transparent to L1, i.e. L1 is not given
control and no L1 observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_svm_vmexit is called, unchanged. Re-entering is performed using
enter_svm_guest_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

05cade71

KVM: nSVM: refactor nested_svm_vmrun · c2634065

Authored by Ladi Prosek on Oct 11, 2017

Analogous to 858e25c0 ("kvm: nVMX: Refactor nested_vmx_run()"), this commit splits
nested_svm_vmrun into two parts. The newly introduced enter_svm_guest_mode modifies the
vcpu state to transition from L1 to L2, while the code left in nested_svm_vmrun handles
the VMRUN instruction.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c2634065

KVM: nVMX: fix SMI injection in guest mode · 72e9cbdb

Authored by Ladi Prosek on Oct 11, 2017

Entering SMM while running in guest mode wasn't working very well because several
pieces of the vcpu state were left set up for nested operation.

Some of the issues observed:

* L1 was getting unexpected VM exits (using L1 interception controls but running
  in SMM execution environment)
* SMM handler couldn't write to vmx_set_cr4 because of incorrect validity checks
  predicated on nested.vmxon
* MMU was confused (walk_mmu was still set to nested_mmu)

Intel SDM actually prescribes the logical processor to "leave VMX operation" upon
entering SMM in 34.14.1 Default Treatment of SMI Delivery. What we need to do is
basically get out of guest mode and set nested.vmxon to false for the duration of
SMM. All this is completely transparent to L1, i.e. L1 is not given control and no
L1 observable state changes.

To avoid code duplication this commit takes advantage of the existing nested
vmexit and run functionality, perhaps at the cost of efficiency. To get out of
guest mode, nested_vmx_vmexit with exit_reason == -1 is called, a trick already
used in vmx_leave_nested. Re-entering is cleaner, using enter_vmx_non_root_mode.

This commit fixes running Windows Server 2016 with Hyper-V enabled in a VM with
OVMF firmware (OVMF_CODE-need-smm.fd).

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72e9cbdb

KVM: nVMX: set IDTR and GDTR limits when loading L1 host state · 21f2d551

Authored by Ladi Prosek on Oct 11, 2017

Intel SDM 27.5.2 Loading Host Segment and Descriptor-Table Registers:

"The GDTR and IDTR limits are each set to FFFFH."

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

21f2d551

KVM: x86: introduce ISA specific smi_allowed callback · 72d7b374

Authored by Ladi Prosek on Oct 11, 2017

Similar to NMI, there may be ISA specific reasons why an SMI cannot be
injected into the guest. This commit adds a new smi_allowed callback to
be implemented in following commits.

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

72d7b374

KVM: x86: introduce ISA specific SMM entry/exit callbacks · 0234bf88

Authored by Ladi Prosek on Oct 11, 2017

Entering and exiting SMM may require ISA specific handling under certain
circumstances. This commit adds two new callbacks with empty implementations.
Actual functionality will be added in following commits.

* pre_enter_smm() is to be called when injecting an SMM, before any
  SMM related vcpu state has been changed
* pre_leave_smm() is to be called when emulating the RSM instruction,
  when the vcpu is in real mode and before any SMM related vcpu state
  has been restored

Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0234bf88
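
The ordering contract of the two callbacks in the commit above can be sketched with a small ops table. Only the callback names pre_enter_smm and pre_leave_smm come from the message; the structs, the surrounding functions, and the recording helpers are hypothetical and exist to make the call order observable.

```c
#include <stddef.h>

/* Hypothetical vcpu model: in_smm is the only "SMM related state" here. */
struct sketch_vcpu {
    int in_smm;
    int seen_on_enter; /* in_smm value observed by pre_enter_smm */
    int seen_on_leave; /* in_smm value observed by pre_leave_smm */
};

struct sketch_ops {
    void (*pre_enter_smm)(struct sketch_vcpu *v);
    void (*pre_leave_smm)(struct sketch_vcpu *v);
};

/* Example callbacks that record the state they were called with. */
void sketch_record_enter(struct sketch_vcpu *v) { v->seen_on_enter = v->in_smm; }
void sketch_record_leave(struct sketch_vcpu *v) { v->seen_on_leave = v->in_smm; }

void sketch_enter_smm(const struct sketch_ops *ops, struct sketch_vcpu *v)
{
    if (ops->pre_enter_smm != NULL)
        ops->pre_enter_smm(v); /* before any SMM state has changed */
    v->in_smm = 1;
}

void sketch_leave_smm(const struct sketch_ops *ops, struct sketch_vcpu *v)
{
    if (ops->pre_leave_smm != NULL)
        ops->pre_leave_smm(v); /* before any SMM state has been restored */
    v->in_smm = 0;
}
```

An ops table with both pointers left NULL corresponds to the empty implementations the commit starts with.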

KVM: SVM: limit kvm_handle_page_fault to #PF handling · d0006530

Authored by Paolo Bonzini on Aug 11, 2017

It has always annoyed me a bit how SVM_EXIT_NPF is handled by
pf_interception.  This is also the only reason behind the
under-documented need_unprotect argument to kvm_handle_page_fault.
Let NPF go straight to kvm_mmu_page_fault, just like VMX
does in handle_ept_violation and handle_ept_misconfig.

Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d0006530

KVM: SVM: unconditionally wake up VCPU on IOMMU interrupt · 1cf53587

Authored by Paolo Bonzini on Oct 10, 2017

Checking the mode is unnecessary, and is done without a memory barrier
separating the LAPIC write from the vcpu->mode read; in addition,
kvm_vcpu_wake_up is already doing a check for waiters on the wait queue
that has the same effect.

In practice it's safe because spin_lock has full-barrier semantics on x86,
but don't be too clever.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1cf53587

arch/x86: remove redundant null checks before kmem_cache_destroy · c1bd743e

Authored by Tim Hansen on Oct 07, 2017

Remove redundant null checks before calling kmem_cache_destroy.

Found with make coccicheck M=arch/x86/kvm on linux-next tag
next-20170929.

Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c1bd743e

KVM: VMX: Don't expose unrestricted_guest is enabled if ept is disabled · 8ad8182e

Authored by Wanpeng Li on Oct 09, 2017

The SDM states:

 "If either the “unrestricted guest” VM-execution control or the “mode-based
  execute control for EPT” VM-execution control is 1, the “enable EPT”
  VM-execution control must also be 1."

However, we can still observe that unrestricted_guest is Y after inserting kvm-intel.ko
with ept=N. The parameter is only corrected later, when a guest is started and
vmx_compute_secondary_exec_control() runs; at that point both the module parameter
and the exec control fields are amended.

This patch fixes it by amending the module parameter immediately during vmcs data setup.

Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8ad8182e

KVM: X86: Processor States following Reset or INIT · a554d207

Authored by Wanpeng Li on Oct 11, 2017

- XCR0 is reset to 1 by RESET but not INIT
- XSS is zeroed by both RESET and INIT
- BNDCFGU, BND0-BND3, BNDCFGS, BNDSTATUS are zeroed by both RESET and INIT

This patch does this according to SDM.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a554d207
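
The three rules listed in the commit above can be sketched as a single reset routine that takes an INIT flag. The struct layout and names are hypothetical; only the register names and the RESET/INIT behaviour come from the message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the registers named in the message. */
struct sketch_cpu_state {
    uint64_t xcr0;
    uint64_t xss;
    uint64_t bndcfgu, bndcfgs, bndstatus;
    uint64_t bnd[4][2]; /* BND0-BND3, lower and upper bound */
};

/* init_event is true for INIT and false for RESET. */
void sketch_vcpu_reset(struct sketch_cpu_state *s, bool init_event)
{
    if (!init_event)
        s->xcr0 = 1; /* RESET only; INIT leaves XCR0 untouched */

    /* Zeroed by both RESET and INIT. */
    s->xss = 0;
    s->bndcfgu = 0;
    s->bndcfgs = 0;
    s->bndstatus = 0;
    memset(s->bnd, 0, sizeof(s->bnd));
}
```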

KVM: x86: thoroughly disarm LAPIC timer around TSC deadline switch · 44275932

Authored by Radim Krčmář on Oct 06, 2017

Our routines look at tscdeadline and period when deciding state of a
timer.  The timer is disarmed when switching between TSC deadline and
other modes, so we should set everything to disarmed state.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

44275932

KVM: x86: really disarm lapic timer when clearing TMICT · 5d74a699

Authored by Radim Krčmář on Oct 06, 2017

The preemption timer only looks at tscdeadline and could inject an already
disarmed timer.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5d74a699

KVM: x86: handle 0 write to TSC_DEADLINE MSR · 86bbc1e6

Authored by Radim Krčmář on Oct 06, 2017

0 should disable the timer, but start_hv_timer will recognize it as an
expired timer instead.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

86bbc1e6
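
The ambiguity described above comes from 0 also being "not later than now". A small classification sketch, with hypothetical names and no relation to the actual start_hv_timer code, shows why the zero case has to be tested first:

```c
#include <stdint.h>

enum sketch_timer_state {
    SKETCH_DISARMED, /* deadline of 0: timer is off */
    SKETCH_EXPIRED,  /* non-zero deadline already reached */
    SKETCH_ARMED     /* deadline still in the future */
};

/* Classify a TSC deadline value against the current TSC. */
enum sketch_timer_state sketch_classify_deadline(uint64_t deadline, uint64_t now)
{
    if (deadline == 0)
        return SKETCH_DISARMED; /* must come before the comparison below */
    if (deadline <= now)
        return SKETCH_EXPIRED;
    return SKETCH_ARMED;
}
```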

kvm, mm: account kvm related kmem slabs to kmemcg · 46bea48a

Authored by Shakeel Butt on Oct 05, 2017

The kvm slabs can consume a significant amount of system memory,
and indeed in our production environment we have observed that
a lot of machines are spending a significant amount of memory that
cannot be left as system memory overhead. Also, allocations
from these slabs can be triggered directly by user space applications
that have access to kvm, and thus a buggy application can leak
such memory. So, these caches should be accounted to kmemcg.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

46bea48a

KVM: VMX: rename RDSEED and RDRAND vmx ctrls to reflect exiting · 736fdf72

Authored by David Hildenbrand on Aug 24, 2017

Let's just name these according to the SDM. This should make it clearer
that they are used to enable exiting and not the feature itself.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

736fdf72

KVM: x86: allow setting identity map addr with no vcpus only · 1af1ac91

Authored by David Hildenbrand on Aug 24, 2017

Changing it afterwards doesn't make too much sense and will only result
in inconsistencies.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1af1ac91

KVM: VMX: cleanup init_rmode_identity_map() · d8a6e365

Authored by David Hildenbrand on Aug 24, 2017

No need for another enable_ept check. kvm->arch.ept_identity_map_addr
only has to be initialized once. Having alloc_identity_pagetable() is
overkill and dropping BUG_ONs is always nice.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

d8a6e365

KVM: nVMX: no need to set ept/vpid caps to 0 · 1c13bffd

Authored by David Hildenbrand on Aug 24, 2017

They are initially 0, so no need to reset them to 0.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

1c13bffd

KVM: nVMX: no need to set vcpu->cpu when switching vmcs · 0ee096d0

Authored by David Hildenbrand on Aug 24, 2017

vcpu->cpu is not cleared when doing a vmx_vcpu_put/load, so this can be
dropped.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0ee096d0

KVM: VMX: drop unnecessary function declarations · 9522ea9e

Authored by David Hildenbrand on Aug 24, 2017

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

9522ea9e

KVM: VMX: require INVEPT GLOBAL for EPT · f5f51586

Authored by David Hildenbrand on Aug 24, 2017

Without this, we won't be able to do any flushes, so let's just require
it. Should be absent in very strange configurations.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

f5f51586

KVM: VMX: call ept_sync_global() with enable_ept only · fdf288bf

Authored by David Hildenbrand on Aug 24, 2017

ept_* functions should only be called with enable_ept being set.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

fdf288bf

KVM: VMX: drop enable_ept check from ept_sync_context() · 0e1252dc

Authored by David Hildenbrand on Aug 24, 2017

This function is only called with enable_ept.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0e1252dc

KVM: x86: no need to inititalize vcpu members to 0 · f2d1da69

Authored by David Hildenbrand on Aug 24, 2017

vmx and svm use zalloc, so this is not necessary.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

f2d1da69

KVM: VMX: vmx_vcpu_setup() cannot fail · 12d79917

Authored by David Hildenbrand on Aug 24, 2017

Make it a void and drop error handling code.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

12d79917

KVM: x86: drop BUG_ON(vcpu->kvm) · 26de7988

Authored by David Hildenbrand on Aug 24, 2017

And also get rid of that superfluous local variable "kvm".

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

26de7988

KVM: x86: mmu: free_page can handle NULL · 87ca74ad

Authored by David Hildenbrand on Aug 24, 2017

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

87ca74ad

KVM: x86: mmu: returning void in a void function is strange · bb606a9b

Authored by David Hildenbrand on Aug 24, 2017

Let's just drop the return.

Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

bb606a9b

KVM: LAPIC: Apply change to TDCR right away to the timer · c301b909

Authored by Wanpeng Li on Oct 06, 2017

The description in the Intel SDM of how the divide configuration
register is used: "The APIC timer frequency will be the processor's bus
clock or core crystal clock frequency divided by the value specified in
the divide configuration register."

Observation of bare metal shows that when the TDCR is changed, the TMCCT
does not change or make a big jump in value, but the rate at which it
counts down changes.

The patch updates the emulation of the APIC timer so that a change to the
divide configuration is reflected in the value of the counter and in
when the next interrupt is triggered.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Fixed some whitespace and added a check for negative delta and running
 timer. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c301b909
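
One way to read the observation above: the count left in TMCCT stays the same across a TDCR write, but each count now takes new_div bus ticks instead of old_div, so the time left until expiry scales by new_div / old_div. The helper below is a sketch of that arithmetic only; its name and parameters are hypothetical and it is not the kernel's implementation.

```c
#include <stdint.h>

/* Rescale the time remaining until expiry after the divisor changes.
 * The remaining count is unchanged; only the duration per count changes.
 * Note: remaining_ns * new_div can overflow for very large inputs; a real
 * implementation would use a wider intermediate. */
uint64_t sketch_rescale_remaining_ns(uint64_t remaining_ns,
                                     uint32_t old_div, uint32_t new_div)
{
    if (old_div == 0)
        return remaining_ns; /* defensive: avoid dividing by zero */
    return remaining_ns * new_div / old_div;
}
```

For example, 1000 ns remaining under a divisor of 2 becomes 4000 ns once the divisor is 8, because the counter now decrements four times more slowly.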

KVM: LAPIC: Keep timer running when switching between one-shot and periodic mode · dedf9c5e

Authored by Wanpeng Li on Oct 05, 2017

If we take the TSC-deadline mode timer out of the picture, the Intel SDM
does not say that the timer is disabled when the timer mode is changed,
either from one-shot to periodic or vice versa.

After this patch, the timer is no longer disarmed on change of mode, so
the counter (TMCCT) keeps counting down.

So what does a write to LVTT change? On bare metal, the change of mode
is probably taken into account only when the counter reaches 0. When this
happens, LVTT is used to figure out if the counter should restart counting
down from TMICT (periodic mode) or stop counting (one-shot mode).

This patch is based on observation of the behavior of the APIC timer on
bare metal, as well as a check that it does not go against the description
written in the Intel SDM.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Fixed rate limiting of periodic timer.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

dedf9c5e
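
A toy model of the behaviour the commit above describes as probable on bare metal: a mode write leaves the running counter alone, and the mode is consulted only when the counter reaches 0. All names are hypothetical, and one call to sketch_tick() stands for one counter decrement.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_mode { SKETCH_ONESHOT, SKETCH_PERIODIC };

struct sketch_timer {
    enum sketch_mode mode;
    uint32_t tmict; /* initial count */
    uint32_t tmcct; /* current count */
    bool running;
};

/* Writing the mode does not disarm the timer or touch the counter. */
void sketch_write_mode(struct sketch_timer *t, enum sketch_mode m)
{
    t->mode = m;
}

/* Advance the counter by one step; the mode matters only at zero. */
void sketch_tick(struct sketch_timer *t)
{
    if (!t->running)
        return;
    if (t->tmcct > 0)
        t->tmcct--;
    if (t->tmcct == 0) {
        if (t->mode == SKETCH_PERIODIC && t->tmict != 0)
            t->tmcct = t->tmict; /* periodic: restart from TMICT */
        else
            t->running = false;  /* one-shot: stop counting */
    }
}
```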

KVM: LAPIC: Introduce limit_periodic_timer_frequency · ccbfa1d3

Authored by Wanpeng Li on Oct 05, 2017

Extract the logic that limits the LAPIC periodic timer frequency into a new
function, which will be used by later patches.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

ccbfa1d3

KVM: LAPIC: Fix lapic timer mode transition · c69518c8

Authored by Wanpeng Li on Oct 05, 2017

SDM 10.5.4.1 TSC-Deadline Mode states that "Transitioning between TSC-Deadline
mode and other timer modes also disarms the timer". So the APIC Timer Initial Count
Register for one-shot/periodic mode should be reset. This patch does so.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Removed unnecessary definition of APIC_LVT_TIMER_MASK.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

c69518c8

KVM: VMX: Don't expose PLE enable if there is no hardware support · 0f107682

Authored by Wanpeng Li on Sep 28, 2017

KVM doesn't expose the PLE capability to the L1 hypervisor. However,
ple_window still shows the default value on the L1 hypervisor. This patch
fixes it by clearing all the PLE related module parameters if there is
no PLE capability.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

0f107682

KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit · 8eb3f87d

Authored by Haozhong Zhang on Oct 10, 2017

When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
guest CR4. Before this CR4 loading, the guest CR4 refers to L2
CR4. Because these two CR4s belong to different guest levels, we
should use vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
is used to handle guest writes to its CR4, checks the guest change to
CR4 and may fail if the change is invalid.

The failure may cause trouble. Consider we start
  a L1 guest with non-zero L1 PCID in use,
     (i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
and
  a L2 guest with L2 PCID disabled,
     (i.e. L2 CR4.PCIDE == 0)
and following events may happen:

1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
   into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
   of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
   vcpu->arch.cr4) is left to the value of L2 CR4.

2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
   kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
   because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
   CR3.PCID != 0, L0 KVM will inject GP to L1 guest.

Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1")
Cc: qemu-stable@nongnu.org
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

8eb3f87d
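
The PCID check that trips in step 1 above can be modelled in a few lines. This is a simplified sketch, not the kvm_set_cr4() code: the function name and SKETCH_ constants are hypothetical, and only the rule "PCIDE may not go from 0 to 1 while CR3.PCID is non-zero" is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR4_PCIDE     (1ull << 17)
#define SKETCH_CR3_PCID_MASK 0xFFFull

/* Returns true if a guest-initiated CR4 write would be rejected.
 * old_cr4 is the value the hypervisor currently believes the guest has. */
bool sketch_guest_cr4_write_faults(uint64_t old_cr4, uint64_t new_cr4,
                                   uint64_t cr3)
{
    bool enabling_pcid = (new_cr4 & SKETCH_CR4_PCIDE) &&
                         !(old_cr4 & SKETCH_CR4_PCIDE);

    return enabling_pcid && (cr3 & SKETCH_CR3_PCID_MASK) != 0;
}
```

When the old value is L2's CR4 (PCIDE clear), the new value is L1's CR4 (PCIDE set), and CR3 carries a non-zero PCID, the check sees what looks like an illegal guest transition, even though the hypervisor is only restoring L1 state. That is why a setter without guest-write validation is the appropriate choice on this path.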

10 Oct, 2017 1 commit

KVM: MMU: always terminate page walks at level 1 · 829ee279

Authored by Ladi Prosek on Oct 05, 2017

is_last_gpte() is not equivalent to the pseudo-code given in commit
6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
value of last_nonleaf_level may override the result even if level == 1.

It is critical for is_last_gpte() to return true on level == 1 to
terminate page walks. Otherwise memory corruption may occur as level
is used as an index to various data structures throughout the page
walking code.  Even though the actual bug would be wherever the MMU is
initialized (as in the previous patch), be defensive and ensure here
that is_last_gpte() returns the correct value.

This patch is also enough to fix CVE-2017-12188.

Fixes: 6bb69c9b
Cc: stable@vger.kernel.org
Cc: Andy Honig <ahonig@google.com>
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
[Panic if walk_addr_generic gets an incorrect level; this is a serious
 bug and it's not worth a WARN_ON where the recovery path might hide
 further exploitable issues; suggested by Andrew Honig. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

829ee279
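
A readable sketch of the property the commit above insists on: level 1 always terminates the walk, whatever last_nonleaf_level holds. This is not the kernel's is_last_gpte(), which is written differently; the function name and the SKETCH_ constant are hypothetical, and bit 7 is assumed to be the page-size bit of the guest PTE.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PT_PAGE_SIZE_MASK (1ull << 7)

/* Returns true if the walk must stop at this entry. */
bool sketch_is_last_gpte(unsigned level, uint64_t gpte,
                         unsigned last_nonleaf_level)
{
    /* Defensive rule: level 1 is always a leaf, even if
     * last_nonleaf_level was initialized incorrectly. */
    if (level == 1)
        return true;

    /* Above level 1, an entry is a leaf only if large pages are
     * possible at this level and the page-size bit is set. */
    return level < last_nonleaf_level &&
           (gpte & SKETCH_PT_PAGE_SIZE_MASK) != 0;
}
```

Because level is later used as an array index during the walk, guaranteeing termination at level 1 keeps it from dropping to 0 regardless of how the MMU was set up.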

openanolis / cloud-kernel · synced successfully more than 1 year ago
