Commits · 0f6c0a740b7d3e1f3697395922d674000f83d060 · openeuler / Kernel

31 Jul, 2014 1 commit

KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · 0f6c0a74

Committed by Paolo Bonzini on Jul 30, 2014

Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked. However, this can cause a bug that manifests
as an interrupt storm inside the guest. Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work. What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
Bit 59 is not set in it because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.

My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR. The simplest fix is to ignore the masking state; we would
rather have an unnecessary exit than a missed IRQ ACK, and anyway
IOAPIC interrupts are not as performance-sensitive as, for example, MSIs.
Alex tested this patch and it fixed his bug.

[Thanks to Alex for his precise description of the problem
and initial debugging effort. A lot of the text above is
based on emails exchanged with him.]
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

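The effect of ignoring the masking state can be sketched in user-space C. This is only an illustration under stated assumptions: `struct rte`, `compute_eoi_exit_bitmap` and `bitmap_test` are hypothetical names rather than KVM's data structures or functions, and the real code applies further conditions that the message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified redirection table entry (not the KVM layout). */
struct rte {
    uint8_t vector;
    bool    masked;
};

/*
 * Build a 256-bit EOI exit bitmap, one bit per vector.
 * honor_mask = true  : masked entries are skipped (behaviour before the fix).
 * honor_mask = false : every listed vector is included (behaviour described
 *                      by the fix).
 */
static void compute_eoi_exit_bitmap(const struct rte *table, int n,
                                    bool honor_mask, uint64_t bitmap[4])
{
    memset(bitmap, 0, 4 * sizeof(uint64_t));
    for (int i = 0; i < n; i++) {
        if (honor_mask && table[i].masked)
            continue;
        bitmap[table[i].vector / 64] |= UINT64_C(1) << (table[i].vector % 64);
    }
}

/* Return true if the bit for `vector` is set in the bitmap. */
static bool bitmap_test(const uint64_t bitmap[4], uint8_t vector)
{
    return (bitmap[vector / 64] >> (vector % 64)) & 1;
}
```

With a single masked entry for vector 59, the bit stays clear when the mask is honored and is set when the mask is ignored, which matches the symptom and the fix described in the message.
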
30 Jul, 2014 1 commit

KVM: vmx: remove duplicate vmx_mpx_supported() prototype · 296f0475

Committed by Chris J Arges on Jul 29, 2014

Remove a prototype which was added by both 93c4adc7 and 36be0b9d.
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

25 Jul, 2014 2 commits

x86/kvm: Resolve shadow warning from min macro · b55a8144

Committed by Mark Rustad on Jul 25, 2014

Resolve a shadow warning generated in W=2 builds by the nested
use of the min macro by instead using the min3 macro for the
minimum of 3 values.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

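Why nesting a min macro shadows variables can be seen in a small sketch. `MIN2` and `MIN3` below are hypothetical user-space macros written with GNU C statement expressions (GCC or Clang assumed); the kernel's own `min` and `min3` definitions differ in detail.

```c
/*
 * Two-value minimum with temporaries. Nesting it, as in
 * MIN2(x, MIN2(y, z)), declares an inner `_a` and `_b` inside the scope
 * of the outer ones, which is what a shadow warning complains about.
 */
#define MIN2(a, b) ({                 \
    __typeof__(a) _a = (a);           \
    __typeof__(b) _b = (b);           \
    _a < _b ? _a : _b; })

/*
 * Three-value minimum with its own, distinct temporaries, so no nesting
 * and therefore no shadowing is needed.
 */
#define MIN3(a, b, c) ({              \
    __typeof__(a) _m1 = (a);          \
    __typeof__(b) _m2 = (b);          \
    __typeof__(c) _m3 = (c);          \
    _m1 < _m2 ? (_m1 < _m3 ? _m1 : _m3) \
              : (_m2 < _m3 ? _m2 : _m3); })
```

Both forms compute the same value; the three-argument form simply avoids redeclaring the same temporary names in a nested scope.
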
kvm: Resolve missing-field-initializers warnings · 25f97ff4

Committed by Mark Rustad on Jul 25, 2014

Resolve missing-field-initializers warnings seen in W=2 kernel
builds by having macros generate more elaborated initializers.
That is enough to silence the warnings.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

24 Jul, 2014 4 commits

Replace NR_VMX_MSR with its definition · 03916db9

Committed by Paolo Bonzini on Jul 24, 2014

Using ARRAY_SIZE directly makes it easier to read the code.  While touching
the code, replace the division by a multiplication in the recently added
BUILD_BUG_ON.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Assertions to check no overrun in MSR lists · 0123be42

Committed by Nadav Amit on Jul 24, 2014

Currently there is no check whether the shared MSR list overruns the allocated
size, which can result in bugs. In addition, there is no check that
vmx->guest_msrs has sufficient space to accommodate all the VMX MSRs. This
patch adds the assertions.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

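The kind of compile-time check described in this entry and in the NR_VMX_MSR entry above it can be sketched with C11 `_Static_assert`. Everything named here (`msr_index`, `struct msr_entry`, `guest_msrs`, `MAX_GUEST_MSRS`, the MSR numbers) is a hypothetical stand-in, and the exact expression used in the kernel's `BUILD_BUG_ON` is not shown in the messages.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical list of MSR numbers that must all fit in guest storage. */
static const uint32_t msr_index[] = { 0x174, 0x175, 0x176 };

struct msr_entry {
    uint32_t index;
    uint64_t data;
};

#define MAX_GUEST_MSRS 8
static struct msr_entry guest_msrs[MAX_GUEST_MSRS];

/* Division form: compare element counts. */
_Static_assert(ARRAY_SIZE(msr_index) <=
               sizeof(guest_msrs) / sizeof(struct msr_entry),
               "MSR list overruns guest_msrs");

/* Equivalent multiplication form: compare byte sizes instead. */
_Static_assert(ARRAY_SIZE(msr_index) * sizeof(struct msr_entry) <=
               sizeof(guest_msrs),
               "MSR list overruns guest_msrs");

/* Number of MSRs in the list, taken directly from the array. */
static size_t msr_count(void)
{
    return ARRAY_SIZE(msr_index);
}

/* Number of entries the guest storage can hold. */
static size_t guest_msr_capacity(void)
{
    return ARRAY_SIZE(guest_msrs);
}
```

If the list grows past the storage size, the build fails instead of the overrun surfacing at run time.
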
KVM: x86: set rflags.rf during fault injection · d6e8c854

Committed by Nadav Amit on Jul 24, 2014

x86 does not automatically set rflags.rf during event injection. This patch
does a partial job, setting rflags.rf upon fault injection. It does not handle
the setting of RF upon interrupt injection on rep-string instructions.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

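A minimal sketch of the rule described above, assuming a hypothetical `enum event_class` and helper `rflags_for_injection` (neither is a KVM name). The only architectural fact used is that RF is bit 16 of RFLAGS.

```c
#include <stdint.h>

#define X86_EFLAGS_RF (UINT64_C(1) << 16)  /* Resume Flag, bit 16 of RFLAGS */

/* Hypothetical classification of the event being injected. */
enum event_class { EVENT_FAULT, EVENT_TRAP, EVENT_INTERRUPT };

/*
 * Return the RFLAGS value to use when injecting an event.
 * Only faults set RF here; interrupts during rep-string instructions
 * are deliberately left out, as in the message.
 */
static uint64_t rflags_for_injection(uint64_t rflags, enum event_class cls)
{
    if (cls == EVENT_FAULT)
        rflags |= X86_EFLAGS_RF;
    return rflags;
}
```
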
KVM: x86: Setting rflags.rf during rep-string emulation · b9a1ecb9

Committed by Nadav Amit on Jul 24, 2014

This patch updates RF for rep-string emulation. The flag is set upon the first
iteration, and cleared after the last (if emulated). It is intended to make
sure that if a trap (in future data/io #DB emulation) or interrupt is delivered
to the guest during the rep-string instruction, RF will be set correctly. RF
affects whether an instruction breakpoint in the guest is masked.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

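The set-on-first, clear-after-last behaviour can be sketched as a small simulation. `emulate_rep` is a hypothetical helper, not emulator code; `seen[i]` records the RFLAGS value that an interrupt arriving right after iteration `i` would observe.

```c
#include <stdint.h>

#define X86_EFLAGS_RF (UINT64_C(1) << 16)  /* Resume Flag, bit 16 of RFLAGS */

/*
 * Simulate `count` iterations of a rep-string instruction.
 * RF is set from the first iteration on and cleared once the last
 * iteration has completed. Returns the final RFLAGS value.
 */
static uint64_t emulate_rep(uint64_t rflags, unsigned count, uint64_t *seen)
{
    for (unsigned i = 0; i < count; i++) {
        rflags |= X86_EFLAGS_RF;          /* instruction is in progress */
        if (i + 1 == count)
            rflags &= ~X86_EFLAGS_RF;     /* instruction has completed */
        seen[i] = rflags;
    }
    return rflags;
}
```
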
22 Jul, 2014 1 commit

Merge tag 'kvm-s390-20140721' of... · c756ad03

Committed by Paolo Bonzini on Jul 22, 2014

Merge tag 'kvm-s390-20140721' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

Bugfixes
--------
- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP

Cleanups
--------
- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakeup processing

21 Jul, 2014 19 commits

KVM: x86: DR6/7.RTM cannot be written · 6f43ed01

Committed by Nadav Amit on Jul 15, 2014

Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

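The fixed-bit behaviour described above can be sketched as follows. Only the RTM bits are modelled (bit 16 of DR6 and bit 11 of DR7); `dr6_write`, `dr7_write` and the `guest_has_rtm` parameter are hypothetical names, and the other reserved bits of both registers are ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define DR6_RTM (UINT64_C(1) << 16)
#define DR7_RTM (UINT64_C(1) << 11)

/* Value stored on a DR6 write: without RTM support the bit is fixed to 1. */
static uint64_t dr6_write(uint64_t val, bool guest_has_rtm)
{
    if (!guest_has_rtm)
        val |= DR6_RTM;
    return val;
}

/* Value stored on a DR7 write: without RTM support the bit is fixed to 0. */
static uint64_t dr7_write(uint64_t val, bool guest_has_rtm)
{
    if (!guest_has_rtm)
        val &= ~DR7_RTM;
    return val;
}
```
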
KVM: nVMX: clean up nested_release_vmcs12 and code around it · 9a2a05b9

Committed by Paolo Bonzini on Jul 17, 2014

Make nested_release_vmcs12 idempotent.
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: nVMX: fix lifetime issues for vmcs02 · 4fa7734c

Committed by Paolo Bonzini on Jul 17, 2014

free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs.  However, this is not
available anymore after commit 26a865f4 (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).

Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.
Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Defining missing x86 vectors · c9cdd085

Committed by Nadav Amit on Jul 21, 2014

Defining XE, XM and VE vector numbers.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: emulator injects #DB when RFLAGS.RF is set · 4161a569

Committed by Nadav Amit on Jul 17, 2014

If RFLAGS.RF is set, then no #DB should occur on instruction breakpoints.
However, the KVM emulator injects #DB regardless of RFLAGS.RF. This patch fixes
this behavior. KVM, however, still appears not to update RFLAGS.RF correctly,
regardless of this patch.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

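The rule stated in this entry reduces to a single condition. The helper name `should_inject_insn_db` and its `breakpoint_hit` parameter are hypothetical and only illustrate the check.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_EFLAGS_RF (UINT64_C(1) << 16)  /* Resume Flag, bit 16 of RFLAGS */

/*
 * An instruction breakpoint raises #DB only while RF is clear; a set RF
 * suppresses it for the instruction being executed.
 */
static bool should_inject_insn_db(uint64_t rflags, bool breakpoint_hit)
{
    return breakpoint_hit && !(rflags & X86_EFLAGS_RF);
}
```
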
KVM: x86: Cleanup of rflags.rf cleaning · 6c6cb69b

Committed by Nadav Amit on Jul 21, 2014

RFLAGS.RF was cleared in several functions (e.g., syscall) in the x86 emulator.
Now that we clear it before the execution of an instruction in the emulator, we
can remove the specific cleanup of RFLAGS.RF.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Clear rflags.rf on emulated instructions · 4467c3f1

Committed by Nadav Amit on Jul 21, 2014

When an instruction is emulated RFLAGS.RF should be cleared. KVM previously did
not do so. This patch clears RFLAGS.RF after interception is done. If a fault
occurs during the instruction, RFLAGS.RF will be set by a previous patch. This
patch does not handle the case of traps/interrupts during rep-strings. Traps
are only expected to occur on debug watchpoints, and those are anyhow not
handled by the emulator.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: popf emulation should not change RF · 163b135e

Committed by Nadav Amit on Jul 21, 2014

RFLAGS.RF is always zero after popf. Therefore, popf should not update RF, since
emulating popf, just like any other instruction, should clear RFLAGS.RF anyway.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Clearing rflags.rf upon skipped emulated instruction · bb663c7a

Committed by Nadav Amit on Jul 21, 2014

When skipping an emulated instruction, rflags.rf should be cleared as it would
be on a real x86 CPU.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-s390-20140715' of... · ec10b727

Committed by Paolo Bonzini on Jul 21, 2014

Merge tag 'kvm-s390-20140715' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make
the cpu state settable by user space.

This is necessary to avoid races in s390 SIGP/reset handling which
happen because some SIGPs are handled in QEMU, while others are
handled in the kernel. Together with the busy conditions returned by
SIGP, races happen especially in areas like starting and stopping
CPUs. (For example, there is a program, 'cpuplugd', that runs on
several s390 distros and does automatic onlining and offlining of
CPUs.)

As soon as the MPSTATE interface is used, user space takes complete
control of the cpu states. Otherwise the kernel will use the old way.

Therefore, the new kernel continues to work fine with old QEMUs.

KVM: s390: add ipte to trace event decoding · e59d120f

Committed by Christian Borntraeger on Jul 16, 2014

IPTE intercept can happen, let's decode that.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>

KVM: s390: advertise KVM_CAP_S390_IRQCHIP · 78599d90

Committed by Cornelia Huck on Jul 15, 2014

We should advertise all capabilities, including those that can
be enabled.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: document KVM_CAP_S390_IRQCHIP · 8a366a4b

Committed by Cornelia Huck on Jun 27, 2014

Let's document that this is a capability that may be enabled per-vm.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: document target of capability enablement · 0907c855

Committed by Cornelia Huck on Jun 27, 2014

Capabilities can be enabled on a vcpu or (since recently) on a vm. Document
this and note for the existing capabilities whether they are per-vcpu or
per-vm.
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: remove the tasklet used by the hrtimer · ea74c0ea

Committed by David Hildenbrand on May 16, 2014

We can get rid of the tasklet used for waking up a VCPU in the hrtimer
code and wake up the VCPU directly.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: move vcpu wakeup code to a central point · 0e9c85a5

Committed by David Hildenbrand on May 16, 2014

Let's move the vcpu wakeup code to a central point.

We should set the vcpu->preempted flag only if the target is actually sleeping
and before the real wakeup happens. Otherwise the preempted flag might be set,
when not necessary. This may result in immediate reschedules after schedule()
in some scenarios.

The wakeup code doesn't require the local_int.lock to be held.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: remove _bh locking from start_stop_lock · 433b9ee4

Committed by David Hildenbrand on May 06, 2014

The start_stop_lock is no longer acquired when in atomic context, therefore we
can convert it into an ordinary spin_lock.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: remove _bh locking from local_int.lock · 4ae3c081

Committed by David Hildenbrand on May 16, 2014

local_int.lock is not used in a bottom-half handler anymore, therefore we can
turn it into an ordinary spin_lock at all occurrences.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block · 0759d068

Committed by David Hildenbrand on May 13, 2014

This patch cleans up the code in handle_wait by reusing the common code
function kvm_vcpu_block.

signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are
sufficient for checking if we need to wake-up that VCPU. kvm_vcpu_block
uses these functions, so no checks are lost.

The flag "timer_due" can be removed - kvm_cpu_has_pending_timer() tests whether
the timer is pending, thus the vcpu is correctly woken up.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

17 Jul, 2014 1 commit

KVM: nVMX: Fix virtual interrupt delivery injection · 963fee16

Committed by Wanpeng Li on Jul 17, 2014

This patch fixes the bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331.
After the patch http://www.spinics.net/lists/kvm/msg105230.html was applied, there
was some progress and L2 could boot up, albeit slowly. The original idea for this
vid injection fix is from "Zhang, Yang Z" <yang.z.zhang@intel.com>.

An interrupt delivered by vid should be injected into L1 by L0 if the vCPU is
currently in L1, or injected into L2 by L0 through the old injection path if L1
does not have the External-interrupt exiting bit set. The current logic doesn't
consider these cases. This patch fixes it by delivering the interrupt through vid
to L1 if the vCPU is in L1, or to L2 through the old injection path if L1 doesn't
have the External-interrupt exiting bit set.
Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

11 Jul, 2014 11 commits

KVM: x86: Emulator support for #UD on CPL>0 · 68efa764

Committed by Nadav Amit on Jun 18, 2014

Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in user mode. This is in contrast to the regular privileged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilege level assertions take place before
interception, a flag has been added.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

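The distinction between the two exception types can be sketched with a pair of decode flags. `FLAG_PRIV`, `FLAG_PRIV_UD` and `privilege_check` are hypothetical names, not the emulator's actual flags; the vector numbers 6 (#UD) and 13 (#GP) are architectural.

```c
#define UD_VECTOR     6     /* invalid opcode */
#define GP_VECTOR     13    /* general protection */
#define NO_EXCEPTION  (-1)

/* Hypothetical per-instruction decode flags. */
#define FLAG_PRIV     (1u << 0)  /* instruction requires CPL 0 */
#define FLAG_PRIV_UD  (1u << 1)  /* raise #UD rather than #GP at CPL > 0 */

/*
 * Return the exception vector to raise for a privilege violation,
 * or NO_EXCEPTION if the instruction may run at this CPL.
 */
static int privilege_check(unsigned flags, int cpl)
{
    if (!(flags & FLAG_PRIV) || cpl == 0)
        return NO_EXCEPTION;
    return (flags & FLAG_PRIV_UD) ? UD_VECTOR : GP_VECTOR;
}
```
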
KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode · 10e38fc7

Committed by Nadav Amit on Jun 18, 2014

Certain instructions, such as monitor and xsave, do not support big real mode
and cause a #GP exception if the effective address of any accessed byte is
not within [0, 0xffff]. This patch introduces a flag to mark these
instructions, including the necessary checks.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

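The range check described above can be sketched as a small predicate. `within_64k` is a hypothetical helper that reports whether every byte of an access of `size` bytes starting at effective address `ea` lies in [0, 0xffff]; a caller would raise #GP when it returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if all bytes ea .. ea + size - 1 fall within [0, 0xffff].
 * Written to avoid overflow: the last byte offset (size - 1) is compared
 * against the room left below the 64 KiB limit.
 */
static bool within_64k(uint64_t ea, uint32_t size)
{
    return size != 0 && ea <= 0xffff && size - 1 <= 0xffff - ea;
}
```

An access of two bytes at 0xfffe passes, while the same access at 0xffff fails because its second byte would sit at 0x10000.
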
KVM: x86: use kvm_read_guest_page for emulator accesses · 44583cba

Committed by Paolo Bonzini on May 13, 2014

Emulator accesses are always done a page at a time, either by the emulator
itself (for fetches) or because we need to query the MMU for address
translations. Speed up these accesses by using kvm_read_guest_page
and, in the case of fetches, by inlining kvm_read_guest_virt_helper and
dropping the loop around kvm_read_guest_page.

This final tweak saves 30-100 more clock cycles (4-10%), bringing the
count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on
a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series
and 925-1700 after the first two low-hanging fruit changes.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: ensure emulator fetches do not span multiple pages · 719d5a9b

Committed by Paolo Bonzini on Jun 19, 2014

When the CS base is not page-aligned, the linear address of the code could
get close to the page boundary (e.g. 0x...ffe) even if the EIP value is
not. So we need to first linearize the address, and only then compute
the number of valid bytes that can be fetched.

This happens relatively often when executing real mode code.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

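The order of operations matters here, which a short sketch makes concrete. `fetchable_bytes` is a hypothetical helper; it assumes 4 KiB pages and the 15-byte x86 instruction length limit, and it simply adds the CS base to EIP instead of modelling full segmentation.

```c
#include <stdint.h>

#define PAGE_SIZE       4096u
#define MAX_INSN_BYTES  15u

/*
 * Linearize first, then count how many bytes can be fetched without
 * crossing into the next page, capped at the maximum instruction length.
 */
static unsigned fetchable_bytes(uint64_t cs_base, uint64_t eip)
{
    uint64_t linear = cs_base + eip;
    unsigned to_page_end = PAGE_SIZE - (unsigned)(linear & (PAGE_SIZE - 1));

    return to_page_end < MAX_INSN_BYTES ? to_page_end : MAX_INSN_BYTES;
}
```

With a CS base of 0xffe and a page-aligned EIP of 0x1000, the linear address is 0x1ffe, so only two bytes remain in the page even though EIP alone suggests a full page is available.
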
KVM: emulate: put pointers in the fetch_cache · 17052f16

Committed by Paolo Bonzini on May 06, 2014

This simplifies the code a bit, especially the overflow checks.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: avoid per-byte copying in instruction fetches · 9506d57d

Committed by Paolo Bonzini on May 06, 2014

We do not need a memory copying loop anymore in insn_fetch; we
can use a byte-aligned pointer to access instruction fields directly
from the fetch_cache. This eliminates 50-150 cycles (corresponding to
a 5-10% improvement in performance) from each instruction.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: avoid repeated calls to do_insn_fetch_bytes · 5cfc7e0f

Committed by Paolo Bonzini on May 06, 2014

do_insn_fetch_bytes will only be called once in a given insn_fetch and
insn_fetch_arr, because in fact it will only be called at most twice
for any instruction and the first call is explicit in x86_decode_insn.
This observation lets us hoist the call out of the memory copying loop.
It does not buy performance, because most fetches are one byte long
anyway, but it prepares for the next patch.

The overflow check is tricky, but correct. Because do_insn_fetch_bytes
has already been called once, we know that fc->end is at least 15. So
it is okay to subtract the number of bytes we want to read.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: speed up do_insn_fetch · 285ca9e9

Committed by Paolo Bonzini on May 06, 2014

Hoist the common case up from do_insn_fetch_byte to do_insn_fetch,
and prime the fetch_cache in x86_decode_insn.  This helps the compiler
and the branch predictor a bit, but above all it lays the
groundwork for further changes in the next few patches.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: do not initialize memopp · 41061cdb

Committed by Bandan Das on Apr 16, 2014

rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instruction (4-6%).
Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: rework seg_override · 573e80fe

Committed by Bandan Das on Apr 16, 2014

x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: emulate: clean up initializations in init_decode_cache · c44b4c6a

Committed by Bandan Das on Apr 16, 2014

A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Optimize
placement of fields in x86_emulate_ctxt.
Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

openeuler / Kernel synced successfully more than 1 year ago