Commits · 296f047502f1b3ddfd63adbc192624ce80740081 · openeuler / Kernel

30 Jul, 2014 1 commit

KVM: vmx: remove duplicate vmx_mpx_supported() prototype · 296f0475

Committed by Chris J Arges on Jul 29, 2014

Remove a prototype which was added by both 93c4adc7 and 36be0b9d.
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

296f0475

24 Jul, 2014 2 commits

Replace NR_VMX_MSR with its definition · 03916db9

Committed by Paolo Bonzini on Jul 24, 2014

Using ARRAY_SIZE directly makes it easier to read the code.  While touching
the code, replace the division by a multiplication in the recently added
BUILD_BUG_ON.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

03916db9
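
The idea in the entry above can be sketched in userspace C. This is only an illustration, not the kernel code: `ARRAY_SIZE` is re-created locally, `_Static_assert` stands in for `BUILD_BUG_ON`, and the MSR list, entry struct, and save-area size are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical MSR index list, for illustration only. */
static const uint32_t example_msr_index[] = { 0xc0000084, 0xc0000082, 0xc0000083 };

/* Hypothetical per-MSR save slot. */
struct example_msr_entry {
    uint32_t index;
    uint64_t data;
};

/* Hypothetical size of the area that holds the save slots. */
#define EXAMPLE_SAVE_AREA_BYTES 4096

/*
 * Multiplication form of the compile-time check: the number of entries
 * times the entry size must fit in the save area.
 */
_Static_assert(ARRAY_SIZE(example_msr_index) * sizeof(struct example_msr_entry)
                   <= EXAMPLE_SAVE_AREA_BYTES,
               "MSR list overruns the save area");

/* Using ARRAY_SIZE directly instead of a separate NR_* constant. */
static size_t example_nr_msrs(void)
{
    return ARRAY_SIZE(example_msr_index);
}
```

If the list grows past what the area can hold, the build fails instead of the overrun showing up at run time.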

KVM: x86: Assertions to check no overrun in MSR lists · 0123be42

Committed by Nadav Amit on Jul 24, 2014

Currently there is no check whether the shared MSRs list overruns the allocated
size, which can result in bugs. In addition there is no check that vmx->guest_msrs
has sufficient space to accommodate all the VMX MSRs.  This patch adds the
assertions.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0123be42

21 Jul, 2014 3 commits

KVM: x86: DR6/7.RTM cannot be written · 6f43ed01

Committed by Nadav Amit on Jul 15, 2014

Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6f43ed01
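
A minimal sketch of the DR6 part of the entry above, assuming DR6.RTM is bit 16 and that the fixed-to-1 and writable masks follow the architectural DR6 layout. The macro values and function names here are illustrative assumptions, not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DR6 layout: B0-B3 (bits 0-3), BD/BS/BT (13-15), RTM (16). */
#define EXAMPLE_DR6_RTM      (1u << 16)
#define EXAMPLE_DR6_FIXED_1  0xfffe0ff0u  /* bits that always read as 1 */
#define EXAMPLE_DR6_VOLATILE 0x0001e00fu  /* bits software may change   */

/* Without RTM support, DR6.RTM behaves like another fixed-to-1 bit. */
static uint32_t example_dr6_fixed(bool guest_has_rtm)
{
    uint32_t fixed = EXAMPLE_DR6_FIXED_1;

    if (!guest_has_rtm)
        fixed |= EXAMPLE_DR6_RTM;
    return fixed;
}

/* Value DR6 would hold after the guest writes val. */
static uint32_t example_write_dr6(uint32_t val, bool guest_has_rtm)
{
    return (val & EXAMPLE_DR6_VOLATILE) | example_dr6_fixed(guest_has_rtm);
}
```

With RTM available, writing 0 leaves bit 16 clear; without it, the bit stays set regardless of what is written.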

KVM: nVMX: clean up nested_release_vmcs12 and code around it · 9a2a05b9

Committed by Paolo Bonzini on Jul 17, 2014

Make nested_release_vmcs12 idempotent.
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9a2a05b9

KVM: nVMX: fix lifetime issues for vmcs02 · 4fa7734c

Committed by Paolo Bonzini on Jul 17, 2014

free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs.  However, this is not
available anymore after commit 26a865f4 (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).

Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4fa7734c

17 Jul, 2014 1 commit

KVM: nVMX: Fix virtual interrupt delivery injection · 963fee16

Committed by Wanpeng Li on Jul 17, 2014

This patch fixes the bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331.
After the patch http://www.spinics.net/lists/kvm/msg105230.html is applied, there is
some progress and L2 can boot up, however, slowly. The original idea for this
vid injection fix is from "Zhang, Yang Z" <yang.z.zhang@intel.com>.

An interrupt delivered by vid should be injected to L1 by L0 if the current guest is
L1, or should be injected to L2 by L0 through the old injection path if L1 does not
have the External-interrupt exiting bit set. The current logic doesn't consider these
cases. This patch fixes it by delivering the vid interrupt to L1 if the current guest
is L1, or to L2 through the old injection path if L1 doesn't have the
External-interrupt exiting bit set.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

963fee16

11 Jul, 2014 2 commits

KVM: x86: return all bits from get_interrupt_shadow · 37ccdcbe

Committed by Paolo Bonzini on May 20, 2014

For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

37ccdcbe

KVM: vmx: speed up emulation of invalid guest state · 98eb2f8b

Committed by Paolo Bonzini on Mar 27, 2014

About 25% of the time spent in emulation of invalid guest state
is wasted in checking whether emulation is required for the next
instruction.  However, this almost never changes except when a
segment register (or TR or LDTR) changes, or when there is a mode
transition (i.e. CR0 changes).

In fact, vmx_set_segment and vmx_set_cr0 already modify
vmx->emulation_required (except that the former for some reason
uses |= instead of just an assignment).  So there is no need to
call guest_state_valid in the emulation loop.

Emulation performance test results indicate 1650-2600 cycles
for common instructions, versus 2300-3200 before this patch on
a Sandy Bridge Xeon.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

98eb2f8b

19 Jun, 2014 9 commits

KVM: vmx: vmx instructions handling does not consider cs.l · 27e6fb5d

Committed by Nadav Amit on Jun 18, 2014

VMX instructions use 32-bit operands in 32-bit mode, and 64-bit operands in
64-bit mode. The current implementation is broken since it does not use the
register operands correctly, and always uses 64-bit for reads and writes.
Moreover, write to memory in vmwrite only considers long-mode, so it ignores
cs.l. This patch fixes this behavior.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

27e6fb5d

KVM: vmx: handle_cr ignores 32/64-bit mode · 1e32c079

Committed by Nadav Amit on Jun 18, 2014

In 32-bit mode only bits [31:0] of the CR should be used for setting the CR
value.  Otherwise, the host may incorrectly assume the value is invalid if bits
[63:32] are not zero.  Moreover, the CR is currently being read twice when CR8
is used.  Last, nested mov-cr exiting is modified to handle the CR value
correctly as well.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1e32c079
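
The truncation rule described in the entry above can be illustrated with a small helper. The function name and the `long_mode` flag are hypothetical; the real handler derives the mode from vCPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value to use for a MOV-to-CR source operand: outside 64-bit mode only
 * bits [31:0] of the register are meaningful, so the upper half is dropped.
 */
static uint64_t example_cr_operand(uint64_t reg, bool long_mode)
{
    return long_mode ? reg : (uint32_t)reg;
}
```

Stale upper bits in the 64-bit register image then no longer make a valid 32-bit CR value look invalid.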

KVM: x86: check DR6/7 high-bits are clear only on long-mode · 5777392e

Committed by Nadav Amit on Jun 18, 2014

When the guest sets DR6 and DR7, KVM asserts the high 32-bits are clear, and
otherwise injects a #GP exception. This exception should be injected only
if running in long-mode.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5777392e
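
As a rough sketch of the rule in the entry above (names are hypothetical, and the real code takes the mode from the vCPU rather than a parameter):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when a write of val to DR6 or DR7 should raise #GP:
 * the high 32 bits only matter when the guest runs in long mode.
 */
static bool example_dr_write_faults(uint64_t val, bool long_mode)
{
    return long_mode && (val >> 32) != 0;
}
```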

KVM: nVMX: Fix returned value of MSR_IA32_VMX_VMCS_ENUM · 5381417f

Committed by Jan Kiszka on Jun 16, 2014

Many real CPUs get this wrong as well, but ours is totally off: bits 9:1
define the highest index value.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5381417f
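
To make the "bits 9:1" layout concrete, here is a small encode/decode sketch. The helper names are hypothetical, and the index used in any call is up to the caller; this does not state which value the patch actually reports.

```c
#include <stdint.h>

/* Place a highest-index value into bits 9:1 of the enumeration MSR. */
static uint64_t example_vmcs_enum_encode(uint32_t highest_index)
{
    return (uint64_t)(highest_index & 0x1ffu) << 1;
}

/* Recover the highest-index value from bits 9:1. */
static uint32_t example_vmcs_enum_decode(uint64_t msr)
{
    return (uint32_t)((msr >> 1) & 0x1ffu);
}
```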

KVM: nVMX: Allow to disable VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS · 2996fca0

Committed by Jan Kiszka on Jun 16, 2014

Allow L1 to "leak" its debug controls into L2, i.e. permit cleared
VM_{ENTRY_LOAD,EXIT_SAVE}_DEBUG_CONTROLS. This requires manually
transferring the state of DR7 and IA32_DEBUGCTLMSR from L1 into L2, as both
run on different VMCS.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2996fca0

KVM: nVMX: Fix returned value of MSR_IA32_VMX_PROCBASED_CTLS · 560b7ee1

Committed by Jan Kiszka on Jun 16, 2014

SDM says bits 1, 4-6, 8, 13-16, and 26 have to be set.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

560b7ee1
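
The bit list in the entry above can be turned into a mask mechanically. The macro and function names below are made up for the example; only the bit positions come from the commit message.

```c
#include <stdint.h>

#define EXAMPLE_BIT(n)       (1u << (n))
/* Mask with bits lo..hi (inclusive) set. */
#define EXAMPLE_BITS(lo, hi) (((1u << ((hi) - (lo) + 1)) - 1u) << (lo))

/* Bits 1, 4-6, 8, 13-16 and 26, as listed in the commit message. */
static uint32_t example_procbased_always_on(void)
{
    return EXAMPLE_BIT(1) | EXAMPLE_BITS(4, 6) | EXAMPLE_BIT(8) |
           EXAMPLE_BITS(13, 16) | EXAMPLE_BIT(26);
}
```

Evaluating the function gives 0x0401e172.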

KVM: nVMX: Allow to disable CR3 access interception · 3dcdf3ec

Committed by Jan Kiszka on Jun 16, 2014

We already have this control enabled by exposing a broken
MSR_IA32_VMX_PROCBASED_CTLS value. This will properly advertise our
capability once the value is fixed by clearing the right bits in
MSR_IA32_VMX_TRUE_PROCBASED_CTLS. We also have to make sure to test the
right value on L2 entry.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3dcdf3ec

KVM: nVMX: Advertise support for MSR_IA32_VMX_TRUE_*_CTLS · 3dbcd8da

Committed by Jan Kiszka on Jun 16, 2014

We already implemented them but failed to advertise them. Currently they
all return the identical values to the capability MSRs they are
augmenting. So there is no change in exposed features yet.

While at it, drop related comments that are partially incorrect and
redundant anyway.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3dbcd8da

arch/x86/kvm/vmx.c: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE · bc39c4db

Committed by Fabian Frederick on Jun 14, 2014

Use the mm.h definition.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

bc39c4db
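
For readers unfamiliar with the two macros named in the entry above, here is a userspace approximation. The page size is assumed to be 4 KiB and the macros are re-created locally with an `EXAMPLE_` prefix; the kernel's own definitions live in its headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the example. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/* True when x is a multiple of a, where a is a power of two. */
#define EXAMPLE_IS_ALIGNED(x, a)   (((x) & ((a) - 1)) == 0)

/* Shorthand for the common "aligned to a page" test. */
#define EXAMPLE_PAGE_ALIGNED(addr) \
    EXAMPLE_IS_ALIGNED((uint64_t)(addr), EXAMPLE_PAGE_SIZE)

static bool example_page_aligned(uint64_t addr)
{
    return EXAMPLE_PAGE_ALIGNED(addr);
}
```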

22 May, 2014 2 commits

KVM: vmx: DR7 masking on task switch emulation is wrong · 1f854112

Committed by Nadav Amit on May 19, 2014

The DR7 masking which is done on task switch emulation should be in hex format
(clearing the local breakpoints enable bits 0,2,4 and 6).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1f854112
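
The difference between the hex and decimal masks is easy to see in isolation. The sketch below assumes the faulty form used the decimal literal 55, which is what "should be in hex format" suggests; function names are hypothetical.

```c
#include <stdint.h>

/* Intended behavior: clear the local enable bits L0-L3 (bits 0, 2, 4, 6). */
static uint32_t example_clear_local_enables(uint32_t dr7)
{
    return dr7 & ~0x55u;
}

/* Assumed buggy form: decimal 55 is 0x37, i.e. bits 0, 1, 2, 4 and 5. */
static uint32_t example_decimal_mask(uint32_t dr7)
{
    return dr7 & ~55u;
}
```

With all eight enable bits set (0xff), the hex mask leaves only the global enables (0xaa), while the decimal mask leaves 0xc8, clearing two global enables and keeping L3 set.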

KVM: x86: get CPL from SS.DPL · ae9fedc7

Committed by Paolo Bonzini on May 14, 2014

CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but forces CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
  the above case with SYSRET is not broken further, and the way
  to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
  forcing it to 0 if we are emulating real mode via vm86 mode;
  in vm86 mode all DPLs have to be 3, but real mode does allow
  privileged instructions).  It also removes the CPL cache,
  which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ae9fedc7
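
A minimal sketch of the VMX side of the entry above, assuming the DPL occupies bits 6:5 of the segment access-rights value. The function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* DPL field of a segment access-rights value (assumed at bits 6:5). */
static int example_ar_dpl(uint32_t ar)
{
    return (int)((ar >> 5) & 3u);
}

/*
 * CPL as derived from SS.DPL. When real mode is emulated through vm86
 * mode, every DPL is 3, yet real mode permits privileged instructions,
 * so the CPL is forced to 0 in that case.
 */
static int example_get_cpl(uint32_t ss_ar, bool emulating_real_mode)
{
    if (emulating_real_mode)
        return 0;
    return example_ar_dpl(ss_ar);
}
```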

08 May, 2014 1 commit

kvm: x86: emulate monitor and mwait instructions as nop · 87c00572

Committed by Gabriel L. Somlo on May 07, 2014

Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.

Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.
Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

87c00572

07 May, 2014 6 commits

KVM: vmx: handle_dr does not handle RSP correctly · a4ab9d0c

Committed by Nadav Amit on May 07, 2014

The RSP register is not automatically cached, causing mov DR instruction with
RSP to fail. Instead the regular register accessing interface should be used.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a4ab9d0c

KVM: vmx: disable APIC virtualization in nested guests · 696dfd95

Committed by Paolo Bonzini on May 07, 2014

While running a nested guest, we should disable APIC virtualization
controls (virtualized APIC register accesses, virtual interrupt
delivery and posted interrupts), because we do not expose them to
the nested guest.
Reported-by: Hu Yaohui <loki2441@gmail.com>
Suggested-by: Abel Gordon <abel@stratoscale.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

696dfd95

KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr · 4291b588

Committed by Bandan Das on May 06, 2014

Some checks are common to all, and moreover,
according to the spec, the check for whether any bits
beyond the physical address width are set is also
applicable to all of them.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4291b588

KVM: nVMX: fail on invalid vmclear/vmptrld pointer · 96ec1463

Committed by Bandan Das on May 06, 2014

The spec mandates that if the vmptrld or vmclear
address is equal to the vmxon region pointer, the
instruction should fail with error "VMPTRLD with
VMXON pointer" or "VMCLEAR with VMXON pointer".
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

96ec1463
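
The pointer checks described in this entry and in 4291b588 above could be sketched as one helper. This is an illustration under assumptions: a 4 KiB alignment requirement, a caller-supplied physical address width, and a hypothetical status enum; it is not the kernel's nested_vmx_check_vmptr.

```c
#include <stdint.h>

/* Hypothetical result codes for the example. */
enum example_vmptr_status {
    EXAMPLE_VMPTR_OK,
    EXAMPLE_VMPTR_INVALID_ADDRESS,
    EXAMPLE_VMPTR_IS_VMXON_POINTER
};

static enum example_vmptr_status
example_check_vmptr(uint64_t vmptr, uint64_t vmxon_ptr, unsigned int maxphyaddr)
{
    /* The pointer must be page aligned (4 KiB assumed). */
    if (vmptr & 0xfffULL)
        return EXAMPLE_VMPTR_INVALID_ADDRESS;

    /* No bits beyond the physical address width may be set. */
    if (maxphyaddr < 64 && (vmptr >> maxphyaddr) != 0)
        return EXAMPLE_VMPTR_INVALID_ADDRESS;

    /* VMPTRLD/VMCLEAR must not be given the VMXON region pointer. */
    if (vmptr == vmxon_ptr)
        return EXAMPLE_VMPTR_IS_VMXON_POINTER;

    return EXAMPLE_VMPTR_OK;
}
```

Keeping the checks in one place means each emulated instruction only has to map the status to its own error code.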

KVM: nVMX: additional checks on vmxon region · 3573e22c

Committed by Bandan Das on May 06, 2014

Currently, the vmxon region isn't used in the nested case.
However, according to the spec, the vmxon instruction performs
additional sanity checks on this region and the associated
pointer. Modify emulated vmxon to better adhere to the spec
requirements.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3573e22c

KVM: nVMX: rearrange get_vmx_mem_address · 19677e32

Committed by Bandan Das on May 06, 2014

Our common function for vmptr checks (in 2/4) needs to fetch
the memory address.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

19677e32

28 Apr, 2014 1 commit

KVM: x86: Check for host supported fields in shadow vmcs · fe2b201b

Committed by Bandan Das on Apr 21, 2014

We track shadow vmcs fields through two static lists,
one for read only and another for r/w fields. However, with
addition of new vmcs fields, not all fields may be supported on
all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on
unsupported hosts will result in a vmwrite error. For example, commit
36be0b9d introduced GUEST_BNDCFGS, which is not supported
by all processors. Filter out host unsupported fields before
letting guests use shadow vmcs
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fe2b201b
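
One way to picture the filtering step in the entry above is an in-place compaction of a field list against a "host supports this field" predicate. Everything here is a hypothetical sketch: the function names, the predicate, and the use of 0x2812 as the encoding of GUEST_BNDCFGS are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed encoding of GUEST_BNDCFGS, used only by the example predicate. */
#define EXAMPLE_GUEST_BNDCFGS 0x2812ul

/* Hypothetical predicate modelling a host without MPX support. */
static bool example_host_supports_field(unsigned long field)
{
    return field != EXAMPLE_GUEST_BNDCFGS;
}

/*
 * Keep only the supported fields, preserving order, and return the new
 * number of entries. Entries past the returned count are left untouched.
 */
static size_t example_filter_fields(unsigned long *fields, size_t n,
                                    bool (*supported)(unsigned long))
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (supported(fields[i]))
            fields[kept++] = fields[i];
    }
    return kept;
}
```

Running the filter once at setup time means the later copy loops never touch a field the host would reject.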

23 Apr, 2014 4 commits

KVM: nVMX: Advertise support for interrupt acknowledgement · e0ba1a6f

Committed by Bandan Das on Apr 19, 2014

Some Type 1 hypervisors such as XEN won't enable VMX without it present.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e0ba1a6f

KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to · 77b0f5d6

Committed by Bandan Das on Apr 19, 2014

This feature emulates the "Acknowledge interrupt on exit" behavior.
We can safely emulate it for L1 to run L2 even if L0 itself has it
disabled (to run L1).
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

77b0f5d6

KVM: nVMX: Don't advertise single context invalidation for invept · 4b855078

Committed by Bandan Das on Apr 19, 2014

For single context invalidation, we fall through to global
invalidation in handle_invept() except for one case - when
the operand supplied by L1 is different from what we have in
vmcs12. However, typically hypervisors will only call invept
for the currently loaded eptp, so the condition will
never be true.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

4b855078

KVM: VMX: Advance rip to after an ICEBP instruction · fd2a445a

Committed by Huw Davies on Apr 16, 2014

When entering an exception after an ICEBP, the saved instruction
pointer should point to after the instruction.

This fixes the bug here: https://bugs.launchpad.net/qemu/+bug/1119686
Signed-off-by: Huw Davies <huw@codeweavers.com>
Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

fd2a445a

18 Apr, 2014 1 commit

KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1

Committed by Michael S. Tsirkin on Mar 31, 2014

With KVM, MMIO is much slower than PIO, due to the need to
do page walk and emulation. But with EPT, it does not have to be: we
know the address from the VMCS so if the address is unique, we can look
up the eventfd directly, bypassing emulation.

Unfortunately, this only works if userspace does not need to match on
access length and data.  The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
    - minimize overhead for old userspace that does not use eventfd with length = 0
    - minimize disruption in other code (since we don't know the length,
      devices on the MMIO bus only get a valid address in write, this
      way we don't need to touch all devices to teach them to handle
      an invalid length)

At the moment, this optimization only has effect for EPT on x86.

It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.

With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measurable slowdown to non-eventfd MMIO.

Making MMIO faster is important for the upcoming virtio 1.0 which
includes an MMIO signalling capability.

The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
pre-review and suggestions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

68c3b4d1

15 Apr, 2014 1 commit

KVM: Disable SMAP for guests in EPT realmode and EPT unpaging mode · e1e746b3

Committed by Feng Wu on Apr 01, 2014

SMAP is disabled if CPU is in non-paging mode in hardware.
However KVM always uses paging mode to emulate guest non-paging
mode with TDP. To emulate this behavior, SMAP needs to be
manually disabled when guest switches to non-paging mode.
Signed-off-by: Feng Wu <feng.wu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

e1e746b3
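
The rule in the entry above can be sketched as a function that derives the CR4 value loaded into hardware from the guest's CR0 and CR4. The bit positions (CR0.PG at bit 31, CR4.SMAP at bit 21) follow the architectural layout; the function name is hypothetical and the real code handles more bits than this.

```c
#include <stdint.h>

#define EXAMPLE_CR0_PG   (1ULL << 31)
#define EXAMPLE_CR4_SMAP (1ULL << 21)

/*
 * The guest believes paging is off, but the host still runs it with
 * paging enabled under TDP. Real hardware ignores SMAP without paging,
 * so the bit is masked out of the value given to the CPU.
 */
static uint64_t example_hw_cr4(uint64_t guest_cr0, uint64_t guest_cr4)
{
    uint64_t hw_cr4 = guest_cr4;

    if (!(guest_cr0 & EXAMPLE_CR0_PG))
        hw_cr4 &= ~EXAMPLE_CR4_SMAP;
    return hw_cr4;
}
```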

17 Mar, 2014 2 commits

KVM: x86: handle missing MPX in nested virtualization · 93c4adc7

Committed by Paolo Bonzini on Mar 05, 2014

When doing nested virtualization, we may be able to read BNDCFGS but
still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard
writes to the field with vmx_mpx_supported(), and similarly hide the
MSR from userspace if the processor does not support the field.

We could work around this with the generic MSR save/load machinery,
but there is only a limited number of MSR save/load slots and it is
not really worthwhile to waste one for a scenario that should not
happen except in the nested virtualization case.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

93c4adc7

KVM: x86: Add nested virtualization support for MPX · 36be0b9d

Committed by Paolo Bonzini on Feb 24, 2014

This is simple to do: the "host" BNDCFGS is either 0 or the guest value.
However, both controls have to be present. We cannot provide MPX if
we only have one of the "load BNDCFGS" or "clear BNDCFGS" controls.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

36be0b9d
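
The "both controls have to be present" condition amounts to a two-bit test. In this sketch the control bit values (bit 23 of the VM-exit controls for "clear BNDCFGS", bit 16 of the VM-entry controls for "load BNDCFGS") are assumptions, and the function takes the two control words as plain parameters rather than reading them from a VMCS configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed control bit positions, for illustration. */
#define EXAMPLE_VM_EXIT_CLEAR_BNDCFGS 0x00800000u
#define EXAMPLE_VM_ENTRY_LOAD_BNDCFGS 0x00010000u

/* MPX can be offered only when both controls are available. */
static bool example_mpx_supported(uint32_t exit_ctls, uint32_t entry_ctls)
{
    return (exit_ctls & EXAMPLE_VM_EXIT_CLEAR_BNDCFGS) &&
           (entry_ctls & EXAMPLE_VM_ENTRY_LOAD_BNDCFGS);
}
```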

11 Mar, 2014 4 commits

KVM: nVMX: Allow nested guests to run with dirty debug registers · d16c293e

Committed by Paolo Bonzini on Feb 21, 2014

When preparing the VMCS02, the CPU-based execution controls are computed
by vmx_exec_control.  Turn off DR access exits there, too, if the
KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d16c293e

KVM: vmx: Allow the guest to run with dirty debug registers · 81908bf4

Committed by Paolo Bonzini on Feb 21, 2014

When not running in guest-debug mode (i.e. the guest controls the debug
registers), having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit. We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

81908bf4

KVM: vmx: we do rely on loading DR7 on entry · c845f9c6

Committed by Paolo Bonzini on Feb 21, 2014

Currently, this works even if the bit is not in "min", because the bit is always
set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and
to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c845f9c6

KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f

Committed by Jan Kiszka on Mar 07, 2014

It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, and with it the need to return an error code
from both enable_irq_window and enable_nmi_window.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c9a7953f

openeuler / Kernel synced successfully more than 1 year ago
