提交 · 668f198f40d1cc89c2330c6ad56f3b397b05a0bc · openeuler / raspberrypi-kernel

10 3月, 2015 1 次提交

KVM: SVM: use kvm_register_write()/read() · 668f198f

由 David Kaplan 提交于 2月 20, 2015

KVM has nice wrappers to access the register values, clean up a few places
that should use them but currently do not.
Signed-off-by: NDavid Kaplan <david.kaplan@amd.com>
[forward port and testing]
Signed-off-by: NJoel Schopp <joel.schopp@amd.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

668f198f

03 3月, 2015 1 次提交

KVM: SVM: fix interrupt injection (apic->isr_count always 0) · f563db4b

由 Radim Krčmář 提交于 2月 27, 2015

In commit b4eef9b3, we started to use hwapic_isr_update() != NULL
instead of kvm_apic_vid_enabled(vcpu->kvm). This didn't work because
SVM had it defined and "apicv" path in apic_{set,clear}_isr() does not
change apic->isr_count, because it should always be 1. The initial
value of apic->isr_count was based on kvm_apic_vid_enabled(vcpu->kvm),
which is always 0 for SVM, so KVM could have injected interrupts when it
shouldn't.

Fix it by implicitly setting SVM's hwapic_isr_update to NULL and make the
initial isr_count depend on hwapic_isr_update() for good measure.

Fixes: b4eef9b3 ("kvm: x86: vmx: NULL out hwapic_isr_update() in case of !enable_apicv")
Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

f563db4b

04 2月, 2015 1 次提交

x86: Store a per-cpu shadow copy of CR4 · 1e02ce4c

由 Andy Lutomirski 提交于 10月 24, 2014

Context switches and TLB flushes can change individual bits of CR4.
CR4 reads take several cycles, so store a shadow copy of CR4 in a
per-cpu variable.

To avoid wasting a cache line, I added the CR4 shadow to
cpu_tlbstate, which is already touched in switch_mm.  The heaviest
users of the cr4 shadow will be switch_mm and __switch_to_xtra, and
__switch_to_xtra is called shortly after switch_mm during context
switch, so the cacheline is likely to be hot.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Vince Weaver <vince@deater.net>
Cc: "hillf.zj" <hillf.zj@alibaba-inc.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

1e02ce4c

09 1月, 2015 1 次提交

KVM: x86: mmu: remove argument to kvm_init_shadow_mmu and kvm_init_shadow_ept_mmu · ad896af0

由 Paolo Bonzini 提交于 10月 02, 2013

The initialization function in mmu.c can always use walk_mmu, which
is known to be vcpu->arch.mmu.  Only init_kvm_nested_mmu is used to
initialize vcpu->arch.nested_mmu.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ad896af0

05 12月, 2014 1 次提交

kvm: x86: Add kvm_x86_ops hook that enables XSAVES for guest · 55412b2e

由 Wanpeng Li 提交于 12月 02, 2014

Expose the XSAVES feature to the guest if the kvm_x86_ops say it is
available.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

55412b2e

13 11月, 2014 1 次提交

kvm: svm: move WARN_ON in svm_adjust_tsc_offset · d913b904

由 Chris J Arges 提交于 11月 12, 2014

When running the tsc_adjust kvm-unit-test on an AMD processor with the
IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be
triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc
is called; however it may trigger unnecessary warnings.

This patch moves the WARN_ON to trigger only if __scale_tsc will actually be
called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common
s64 since this can have signed values.
Signed-off-by: NChris J Arges <chris.j.arges@canonical.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d913b904

03 11月, 2014 1 次提交

KVM: vmx: Unavailable DR4/5 is checked before CPL · 16f8a6f9

由 Nadav Amit 提交于 10月 03, 2014

If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD
should be generated even if CPL>0. This is according to Intel SDM Table 6-2:
"Priority Among Simultaneous Exceptions and Interrupts".

Note, that this may happen on the first DR access, even if the host does not
sets debug breakpoints. Obviously, it occurs when the host debugs the guest.

This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr.
The emulator already checks DR4/5 availability in check_dr_read. Nested
virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject
exceptions to the guest.

As for SVM, the patch follows the previous logic as much as possible. Anyhow,
it appears the DR interception code might be buggy - even if the DR access
may cause an exception, the instruction is skipped.
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

16f8a6f9

24 10月, 2014 2 次提交

kvm: x86: don't kill guest on unknown exit reason · 2bc19dc3

由 Michael S. Tsirkin 提交于 9月 18, 2014

KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application.  Let's not kill the guest: WARN
and inject #UD instead.

Cc: stable@vger.kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2bc19dc3

KVM: x86: Check non-canonical addresses upon WRMSR · 854e8bb1

由 Nadav Amit 提交于 9月 16, 2014

Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is
written to certain MSRs. The behavior is "almost" identical for AMD and Intel
(ignoring MSRs that are not implemented in either architecture since they would
anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if
non-canonical address is written on Intel but not on AMD (which ignores the top
32-bits).

Accordingly, this patch injects a #GP on the MSRs which behave identically on
Intel and AMD. To eliminate the differences between the architecutres, the
value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to
canonical value before writing instead of injecting a #GP.

Some references from Intel and AMD manuals:

According to Intel SDM description of WRMSR instruction #GP is expected on
WRMSR "If the source register contains a non-canonical address and ECX
specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE,
IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP."

According to AMD manual instruction manual:
LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the
LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical
form, a general-protection exception (#GP) occurs."
IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the
base field must be in canonical form or a #GP fault will occur."
IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must
be in canonical form."

This patch fixes CVE-2014-3610.

Cc: stable@vger.kernel.org
Signed-off-by: NNadav Amit <namit@cs.technion.ac.il>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

854e8bb1

11 9月, 2014 1 次提交

kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address. · 73a6d941

由 Tang Chen 提交于 9月 11, 2014

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the address of
apic access page. So use this macro.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Reviewed-by: NGleb Natapov <gleb@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73a6d941

03 9月, 2014 1 次提交

KVM: nSVM: propagate the NPF EXITINFO to the guest · 5e352519

由 Paolo Bonzini 提交于 9月 02, 2014

This is similar to what the EPT code does with the exit qualification.
This allows the guest to see a valid value for bits 33:32.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5e352519

29 8月, 2014 2 次提交

KVM: remove garbage arg to *hardware_{en,dis}able · 13a34e06

由 Radim Krčmář 提交于 8月 28, 2014

In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.
Signed-off-by: NRadim KrÄmÃ¡Å™ <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

13a34e06

KVM: x86: fix some sparse warnings · 48d89b92

由 Paolo Bonzini 提交于 8月 26, 2014

Sparse reports the following easily fixed warnings:

arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer
arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static?
arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer
arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static?

arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static?

Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

48d89b92

27 8月, 2014 1 次提交

x86: Replace __get_cpu_var uses · 89cbc767

由 Christoph Lameter 提交于 8月 17, 2014

__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	__this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	__this_cpu_inc(y)

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

89cbc767

22 8月, 2014 1 次提交

KVM: x86: introduce sched_in to kvm_x86_ops · ae97a3b8

由 Radim Krčmář 提交于 8月 21, 2014

sched_in preempt notifier is available for x86, allow its use in
specific virtualization technlogies as well.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae97a3b8

19 8月, 2014 1 次提交

KVM: x86: drop fpu_activate hook · 4473b570

由 Wanpeng Li 提交于 8月 18, 2014

The only user of the fpu_activate hook was dropped in commit
2d04a05b (KVM: x86 emulator: emulate CLTS internally, 2011-04-20).
vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for
Intel CLTS), but never from common code; hence, there's no need for
a hook.
Reviewed-by: NYang Zhang <yang.z.zhang@intel.com>
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4473b570

11 7月, 2014 2 次提交

KVM: x86: return all bits from get_interrupt_shadow · 37ccdcbe

由 Paolo Bonzini 提交于 5月 20, 2014

For the next patch we will need to know the full state of the
interrupt shadow; we will then set KVM_REQ_EVENT when one bit
is cleared.

However, right now get_interrupt_shadow only returns the one
corresponding to the emulated instruction, or an unconditional
0 if the emulated instruction does not have an interrupt shadow.
This is confusing and does not allow us to check for cleared
bits as mentioned above.

Clean the callback up, and modify toggle_interruptibility to
match the comment above the call.  As a small result, the
call to set_interrupt_shadow will be skipped in the common
case where int_shadow == 0 && mask == 0.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37ccdcbe

KVM: Synthesize G bit for all segments. · 80112c89

由 Jim Mattson 提交于 7月 08, 2014

We have noticed that qemu-kvm hangs early in the BIOS when runnning nested
under some versions of VMware ESXi.

The problem we believe is because KVM assumes that the platform preserves
the 'G' but for any segment register. The SVM specification itemizes the
segment attribute bits that are observed by the CPU, but the (G)ranularity bit
is not one of the bits itemized, for any segment. Though current AMD CPUs keep
track of the (G)ranularity bit for all segment registers other than CS, the
specification does not require it. VMware's virtual CPU may not track the
(G)ranularity bit for any segment register.

Since kvm already synthesizes the (G)ranularity bit for the CS segment. It
should do so for all segments. The patch below does that, and helps get rid of
the hangs. Patch applies on top of Linus' tree.
Signed-off-by: NJim Mattson <jmattson@vmware.com>
Signed-off-by: NAlok N Kataria <akataria@vmware.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

80112c89

10 7月, 2014 4 次提交

KVM: nSVM: Set correct port for IOIO interception evaluation · 6cbc5f5a

由 Jan Kiszka 提交于 6月 30, 2014

Obtaining the port number from DX is bogus as a) there are immediate
port accesses and b) user space may have changed the register content
while processing the PIO access. Forward the correct value from the
instruction emulator instead.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6cbc5f5a

KVM: nSVM: Fix IOIO size reported on emulation · 6493f157

由 Jan Kiszka 提交于 6月 30, 2014

The access size of an in/ins is reported in dst_bytes, and that of
out/outs in src_bytes.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6493f157

KVM: nSVM: Fix IOIO bitmap evaluation · 9bf41833

由 Jan Kiszka 提交于 6月 30, 2014

First, kvm_read_guest returns 0 on success. And then we need to take the
access size into account when testing the bitmap: intercept if any of
bits corresponding to the access is set.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9bf41833

KVM: nSVM: Do not report CLTS via SVM_EXIT_WRITE_CR0 to L1 · 62baf44c

由 Jan Kiszka 提交于 6月 29, 2014

CLTS only changes TS which is not monitored by selected CR0
interception. So skip any attempt to translate WRITE_CR0 to
CR0_SEL_WRITE for this instruction.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

62baf44c

30 6月, 2014 1 次提交

KVM: SVM: Fix CPL export via SS.DPL · 33b458d2

由 Jan Kiszka 提交于 6月 29, 2014

We import the CPL via SS.DPL since ae9fedc7. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

33b458d2

22 5月, 2014 1 次提交

KVM: x86: get CPL from SS.DPL · ae9fedc7

由 Paolo Bonzini 提交于 5月 14, 2014

CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS.  And CS.DPL is also not equal
to the CPL for conforming code segments.

However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).

So this patch:

- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back

- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions).  It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.

This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ae9fedc7

08 5月, 2014 1 次提交

kvm: x86: emulate monitor and mwait instructions as nop · 87c00572

由 Gabriel L. Somlo 提交于 5月 07, 2014

Treat monitor and mwait instructions as nop, which is architecturally
correct (but inefficient) behavior. We do this to prevent misbehaving
guests (e.g. OS X <= 10.7) from crashing after they fail to check for
monitor/mwait availability via cpuid.

Since mwait-based idle loops relying on these nop-emulated instructions
would keep the host CPU pegged at 100%, do NOT advertise their presence
via cpuid, to prevent compliant guests from using them inadvertently.
Signed-off-by: NGabriel L. Somlo <somlo@cmu.edu>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

87c00572

17 3月, 2014 1 次提交

KVM: x86: handle missing MPX in nested virtualization · 93c4adc7

由 Paolo Bonzini 提交于 3月 05, 2014

When doing nested virtualization, we may be able to read BNDCFGS but
still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard
writes to the field with vmx_mpx_supported(), and similarly hide the
MSR from userspace if the processor does not support the field.

We could work around this with the generic MSR save/load machinery,
but there is only a limited number of MSR save/load slots and it is
not really worthwhile to waste one for a scenario that should not
happen except in the nested virtualization case.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

93c4adc7

13 3月, 2014 1 次提交

KVM: SVM: fix cr8 intercept window · 596f3142

由 Radim Krčmář 提交于 3月 11, 2014

We always disable cr8 intercept in its handler, but only re-enable it
if handling KVM_REQ_EVENT, so there can be a window where we do not
intercept cr8 writes, which allows an interrupt to disrupt a higher
priority task.

Fix this by disabling intercepts in the same function that re-enables
them when needed. This fixes BSOD in Windows 2008.

Cc: <stable@vger.kernel.org>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Reviewed-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

596f3142

11 3月, 2014 3 次提交

KVM: svm: Allow the guest to run with dirty debug registers · facb0139

由 Paolo Bonzini 提交于 2月 21, 2014

When not running in guest-debug mode (i.e. the guest controls the debug
registers, having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit. We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

facb0139

KVM: svm: set/clear all DR intercepts in one swoop · 5315c716

由 Paolo Bonzini 提交于 3月 03, 2014

Unlike other intercepts, debug register intercepts will be modified
in hot paths if the guest OS is bad or otherwise gets tricked into
doing so.

Avoid calling recalc_intercepts 16 times for debug registers.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5315c716

KVM: x86: Remove return code from enable_irq/nmi_window · c9a7953f

由 Jan Kiszka 提交于 3月 07, 2014

It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9a7953f

18 2月, 2014 1 次提交

KVM: SVM: fix NMI window after iret · f303b4ce

由 Radim Krčmář 提交于 1月 17, 2014

We should open NMI window right after an iret, but SVM exits before it.
We wanted to single step using the trap flag and then open it.
(or we could emulate the iret instead)
We don't do it since commit 3842d135 (likely), because the iret exit
handler does not request an event, so NMI window remains closed until
the next exit.

Fix this by making KVM_REQ_EVENT request in the iret handler.
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f303b4ce

17 1月, 2014 1 次提交

KVM: SVM: Fix reading of DR6 · 73aaf249

由 Jan Kiszka 提交于 1月 04, 2014

In contrast to VMX, SVM dose not automatically transfer DR6 into the
VCPU's arch.dr6. So if we face a DR6 read, we must consult a new vendor
hook to obtain the current value. And as SVM now picks the DR6 state
from its VMCB, we also need a set callback in order to write updates of
DR6 back.

Fixes a regression of 020df079.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

73aaf249

03 10月, 2013 1 次提交

KVM: mmu: change useless int return types to void · 8a3c1a33

由 Paolo Bonzini 提交于 10月 02, 2013

kvm_mmu initialization is mostly filling in function pointers, there is
no way for it to fail.  Clean up unused return values.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

8a3c1a33

27 6月, 2013 1 次提交

kvm: Add a tracepoint write_tsc_offset · 489223ed

由 Yoshihiro YUNOMAE 提交于 6月 12, 2013

Add a tracepoint write_tsc_offset for tracing TSC offset change.
We want to merge ftrace's trace data of guest OSs and the host OS using
TSC for timestamp in chronological order. We need "TSC offset" values for
each guest when merge those because the TSC value on a guest is always the
host TSC plus guest's TSC offset. If we get the TSC offset values, we can
calculate the host TSC value for each guest events from the TSC offset and
the event TSC value. The host TSC values of the guest events are used when we
want to merge trace data of guests and the host in chronological order.
(Note: the trace_clock of both the host and the guest must be set x86-tsc in
this case)

This tracepoint also records vcpu_id which can be used to merge trace data for
SMP guests. A merge tool will read TSC offset for each vcpu, then the tool
converts guest TSC values to host TSC values for each vcpu.

TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots.
The latter function is executed when kvm clock is updated. Only host can read
TSC offset value from VMCS, so a host needs to output TSC offset value
when TSC offset is changed.

Since the TSC offset is not often changed, it could be overwritten by other
frequent events while tracing. To avoid that, I recommend to use a special
instance for getting this event:

1. set a instance before booting a guest
 # cd /sys/kernel/debug/tracing/instances
 # mkdir tsc_offset
 # cd tsc_offset
 # echo x86-tsc > trace_clock
 # echo 1 > events/kvm/kvm_write_tsc_offset/enable

2. boot a guest
Signed-off-by: NYoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

489223ed

03 5月, 2013 1 次提交

KVM: x86: Account for failing enable_irq_window for NMI window request · 03b28f81

由 Jan Kiszka 提交于 4月 29, 2013

With VMX, enable_irq_window can now return -EBUSY, in which case an
immediate exit shall be requested before entering the guest. Account for
this also in enable_nmi_window which uses enable_irq_window in absence
of vnmi support, e.g.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

03b28f81

28 4月, 2013 2 次提交

KVM: x86: Rework request for immediate exit · 730dca42

由 Jan Kiszka 提交于 4月 28, 2013

The VMX implementation of enable_irq_window raised
KVM_REQ_IMMEDIATE_EXIT after we checked it in vcpu_enter_guest. This
caused infinite loops on vmentry. Fix it by letting enable_irq_window
signal the need for an immediate exit via its return value and drop
KVM_REQ_IMMEDIATE_EXIT.

This issue only affects nested VMX scenarios.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

730dca42

kvm, svm: Fix typo in printk message · 6614c7d0

由 Borislav Petkov 提交于 4月 26, 2013

It is "exit_int_info". It is actually EXITINTINFO in the official docs
but we don't like screaming docs.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

6614c7d0

17 4月, 2013 2 次提交

KVM: VMX: Add the deliver posted interrupt algorithm · a20ed54d

由 Yang Zhang 提交于 4月 11, 2013

Only deliver the posted interrupt when target vcpu is running
and there is no previous interrupt pending in pir.
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a20ed54d

KVM: VMX: Enable acknowledge interupt on vmexit · a547c6db

由 Yang Zhang 提交于 4月 11, 2013

The "acknowledge interrupt on exit" feature controls processor behavior
for external interrupt acknowledgement. When this control is set, the
processor acknowledges the interrupt controller to acquire the
interrupt vector on VM exit.

After enabling this feature, an interrupt which arrived when target cpu is
running in vmx non-root mode will be handled by vmx handler instead of handler
in idt. Currently, vmx handler only fakes an interrupt stack and jump to idt
table to let real handler to handle it. Further, we will recognize the interrupt
and only delivery the interrupt which not belong to current vcpu through idt table.
The interrupt which belonged to current vcpu will be handled inside vmx handler.
This will reduce the interrupt handle cost of KVM.

Also, interrupt enable logic is changed if this feature is turnning on:
Before this patch, hypervior call local_irq_enable() to enable it directly.
Now IF bit is set on interrupt stack frame, and will be enabled on a return from
interrupt handler if exterrupt interrupt exists. If no external interrupt, still
call local_irq_enable() to enable it.

Refer to Intel SDM volum 3, chapter 33.2.
Signed-off-by: NYang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a547c6db

03 4月, 2013 1 次提交

x86, cpu: Convert AMD Erratum 383 · e6ee94d5

由 Borislav Petkov 提交于 3月 20, 2013

Convert the AMD erratum 383 testing code to the bug infrastructure. This
allows keeping the AMD-specific erratum testing machinery private to
amd.c and not export symbols to modules needlessly.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1363788448-31325-6-git-send-email-bp@alien8.deSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

e6ee94d5