Commits · 07975ad3b30579ca27d880491ad992326b930c63 · openeuler / raspberrypi-kernel

Apr 24, 2012 · 1 commit

KVM: Introduce direct MSI message injection for in-kernel irqchips · 07975ad3

Submitted by Jan Kiszka on Mar 29, 2012

Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unhandy if the MSI messages are generated "on the fly" by user space;
IRQ routes are also a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

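The direct path added here is the KVM_SIGNAL_MSI ioctl on the VM file descriptor, which takes a struct kvm_msi (address_lo, address_hi, data, flags, pad). The sketch below covers only the user-space step of composing such a message for an x86 guest. It makes several assumptions. It uses a local copy of the struct layout so that it builds without kernel headers. make_x86_msi() is a hypothetical helper. It assumes the standard x86 MSI format: address window 0xFEE00000, destination APIC ID in bits 19:12, vector in the low byte of data, fixed delivery, and edge triggering.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the struct kvm_msi layout from <linux/kvm.h> (assumed
 * here so the sketch builds without kernel headers). */
struct kvm_msi_sketch {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t flags;
	uint8_t  pad[16];
};

#define X86_MSI_ADDR_BASE  0xFEE00000u /* x86 MSI address window */
#define X86_MSI_DEST_SHIFT 12          /* destination APIC ID in bits 19:12 */

/* Hypothetical helper: fixed-delivery, edge-triggered MSI aimed at one
 * physical APIC ID. */
static struct kvm_msi_sketch make_x86_msi(uint8_t dest_apic_id, uint8_t vector)
{
	struct kvm_msi_sketch msi;

	/* Vectors 0-15 are illegal for APIC delivery. */
	assert(vector >= 16);
	memset(&msi, 0, sizeof(msi));   /* zero flags and padding */
	msi.address_lo = X86_MSI_ADDR_BASE |
			 ((uint32_t)dest_apic_id << X86_MSI_DEST_SHIFT);
	msi.address_hi = 0;             /* no extended destination bits */
	msi.data = vector;              /* delivery mode 0, edge trigger */
	return msi;
}
```

With the real headers, the filled struct would be handed to ioctl(vm_fd, KVM_SIGNAL_MSI, &msi) on a VM that has an in-kernel irqchip. No IRQ route has to be allocated first. That call is not exercised here.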

Apr 20, 2012 · 1 commit

KVM: Fix page-crossing MMIO · f78146b0

Submitted by Avi Kivity on Apr 18, 2012

MMIOs that are split across a page boundary are currently broken: the
code does not expect to be aborted by the exit to userspace for the
first MMIO fragment.

This patch fixes the problem by generalizing the current code for handling
16-byte MMIOs to handle a number of "fragments", and changes the MMIO
code to create those fragments.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

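The fragment idea can be shown by splitting one guest access at page boundaries. This is a simplified sketch with hypothetical names. It assumes 4 KiB pages. It also leaves out the 16-byte MMIO case that the patch generalizes from. In KVM, each fragment may need its own exit to userspace before the next one is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4 KiB guest pages */

/* Hypothetical record for one piece of an MMIO access that does not
 * cross a page boundary. */
struct mmio_fragment_sketch {
	uint64_t gpa;
	unsigned int len;
};

/* Split an access of 'len' bytes at guest physical address 'gpa' into
 * per-page fragments; returns how many were written to 'out'. */
static size_t split_mmio(uint64_t gpa, unsigned int len,
			 struct mmio_fragment_sketch *out, size_t max)
{
	size_t n = 0;

	while (len > 0) {
		/* Bytes left before the end of the current page. */
		unsigned int room = SKETCH_PAGE_SIZE -
				    (unsigned int)(gpa & (SKETCH_PAGE_SIZE - 1));
		unsigned int chunk = len < room ? len : room;

		assert(n < max);
		out[n].gpa = gpa;
		out[n].len = chunk;
		n++;
		gpa += chunk;
		len -= chunk;
	}
	return n;
}
```

For example, an 8-byte access at 0xFFC becomes two 4-byte fragments, at 0xFFC and 0x1000. Under this model, the emulator completes them one after another, and an exit to userspace may occur between them.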

Apr 19, 2012 · 2 commits

KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context · 2225fd56

Submitted by Avi Kivity on Apr 18, 2012

kvm_set_shared_msr() may not be called in preemptible context,
but vmx_set_msr() does so:

  BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
  caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
  Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
  Call Trace:
   [<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
   [<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
   [<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
   ...

Making kvm_set_shared_msr() work in preemptible context is cleaner, but
it's used in the fast path.  Making two variants is overkill, so
this patch just disables preemption around the call.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: use page table level macro · f71fa31f

Submitted by Davidlohr Bueso on Apr 18, 2012

It's much cleaner to use PT_PAGE_TABLE_LEVEL than its numeric value.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Apr 17, 2012 · 7 commits

KVM: dont clear TMR on EOI · a0c9a822

Submitted by Michael S. Tsirkin on Apr 11, 2012

Intel spec says that TMR needs to be set/cleared
when IRR is set, but KVM also clears it on EOI.

I did some tests on a real (AMD-based) system,
and I see the same TMR values both before
and after EOI, so I think it's a minor bug in KVM.

This patch fixes TMR to be set/cleared on IRR set
only as per spec.

And now that we don't clear TMR, we can save
an atomic read of TMR on EOI that's not propagated
to ioapic, by checking whether ioapic needs
a specific vector first and calculating
the mode afterwards.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: implement MMX MOVQ (opcodes 0f 6f, 0f 7f) · e5971755

Submitted by Avi Kivity on Apr 09, 2012

Needed by some framebuffer drivers. See

https://bugzilla.kernel.org/show_bug.cgi?id=42779
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: MMX support · cbe2c9d3

Submitted by Avi Kivity on Apr 09, 2012

General support for the MMX instruction set.  Special care is taken
to trap pending x87 exceptions so that they are properly reflected
to the guest.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: implement movntps · 3e114eb4

Submitted by Avi Kivity on Apr 09, 2012

Used to write to framebuffers (by at least Icaros).
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: emulate movdqa · 49597d81

Submitted by Stefan Hajnoczi on Apr 09, 2012

An Ubuntu 9.10 Karmic Koala guest is unable to boot or install due to
missing movdqa emulation:

kvm_exit: reason EXCEPTION_NMI rip 0x7fef3e025a7b info 7fef3e799000 80000b0e
kvm_page_fault: address 7fef3e799000 error_code f
kvm_emulate_insn: 0:7fef3e025a7b: 66 0f 7f 07 (prot64)

movdqa %xmm0,(%rdi)

[avi: mark it explicitly aligned]
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86 emulator: add support for vector alignment · 1c11b376

Submitted by Avi Kivity on Apr 09, 2012

x86 defines three classes of vector instructions: explicitly
aligned (#GP(0) if unaligned), explicitly unaligned, and default
(which depends on the encoding: AVX is unaligned, SSE is aligned).

Add support for marking an instruction as explicitly aligned or
unaligned, and mark MOVDQU as unaligned.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

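The three alignment classes can be modeled as a small decision function. This is a hypothetical sketch, not the emulator's code. The enum and function names are invented. It applies the rule only as stated in the commit message: explicit classes win, and otherwise AVX encodings are unaligned while SSE encodings are aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The three alignment classes described above. */
enum vec_align_class {
	VEC_ALIGNED,    /* explicitly aligned, e.g. MOVDQA: #GP(0) if unaligned */
	VEC_UNALIGNED,  /* explicitly unaligned, e.g. MOVDQU */
	VEC_DEFAULT,    /* depends on the encoding */
};

/* Returns true if a 'size'-byte vector access at 'addr' must raise #GP(0). */
static bool vec_access_faults(enum vec_align_class cls, bool avx_encoding,
			      uint64_t addr, unsigned int size)
{
	bool must_align;

	assert(size != 0 && (size & (size - 1)) == 0); /* power of two */
	switch (cls) {
	case VEC_ALIGNED:
		must_align = true;
		break;
	case VEC_UNALIGNED:
		must_align = false;
		break;
	default:
		/* Default class: AVX encodings are unaligned, SSE aligned. */
		must_align = !avx_encoding;
		break;
	}
	return must_align && (addr & (size - 1)) != 0;
}
```

A 16-byte access at 0x1008 therefore faults for an explicitly aligned instruction and for a default-class SSE encoding. It does not fault for MOVDQU or for a default-class AVX encoding.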

KVM: SVM: Auto-load on CPUs with SVM · ae759544

Submitted by Josh Triplett on Mar 28, 2012

Enable x86 feature-based autoloading for the kvm-amd module on CPUs
with X86_FEATURE_SVM.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Apr 10, 2012 · 1 commit

KVM: PMU emulation: GLOBAL_CTRL MSR should be enabled on reset · f19a0c2c

Submitted by Gleb Natapov on Apr 09, 2012

On reset, all PMU counters should be enabled in the GLOBAL_CTRL MSR.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

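In IA32_PERF_GLOBAL_CTRL, bit n enables general-purpose counter n, and bit 32 + n enables fixed counter n. Assuming that layout, the "all counters enabled" reset value can be computed as shown below. The function name is hypothetical. The counter counts are plain parameters here rather than values taken from CPUID.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PMC_IDX_FIXED 32  /* fixed-counter enable bits start here */

/* Reset value of GLOBAL_CTRL with every counter enabled:
 * bits 0..nr_gp-1 for general-purpose counters and
 * bits 32..32+nr_fixed-1 for fixed counters. */
static uint64_t global_ctrl_reset_value(unsigned int nr_gp, unsigned int nr_fixed)
{
	assert(nr_gp <= 32 && nr_fixed <= 32);
	return ((1ULL << nr_gp) - 1) |
	       (((1ULL << nr_fixed) - 1) << SKETCH_PMC_IDX_FIXED);
}
```

For example, with 2 general-purpose and 3 fixed counters the value is 0x0000000700000003.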

Apr 08, 2012 · 11 commits

KVM: MMU: Improve iteration through sptes from rmap · 1e3f42f0

Submitted by Takuya Yoshikawa on Mar 21, 2012

Iteration using rmap_next(), whose actual body is pte_list_next(), is
inefficient: every time we call it we start from checking whether rmap
holds a single spte or points to a descriptor which links more sptes.

In the case of shadow paging, this quadratic total iteration cost is a
problem.  Even for two dimensional paging, with EPT/NPT on, in which we
almost always have a single mapping, the extra checks at the end of the
iteration should be eliminated.

This patch fixes this by introducing rmap_iterator which keeps the
iteration context for the next search.  Furthermore the implementation
of rmap_next() is split into two functions, rmap_get_first() and
rmap_get_next(), to avoid repeatedly checking whether the rmap being
iterated on has only one spte.

Although there seemed to be only a slight change for EPT/NPT, the actual
improvement was significant: we observed that GET_DIRTY_LOG for 1GB
dirty memory became 15% faster than before.  This is probably because
branches in the new code are easier to predict.

Note: we just remove pte_list_next() because we can think of parent_ptes
as a reverse mapping.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

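The iterator idea can be illustrated with a user-space model. In this model, an rmap entry is either a single spte pointer or a tagged pointer to a chain of descriptors. The encoding (low bit set means "descriptor chain") and all names are assumptions made for illustration, loosely mirroring rmap_get_first()/rmap_get_next(). The point the sketch shows is that the encoding is decoded once in the "first" call. After that, each step only advances saved state instead of re-examining the rmap from the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PTE_LIST_EXT 3

typedef uint64_t spte_t;

/* A chunk of the reverse-map list: a few spte pointers plus a link. */
struct pte_list_desc_sketch {
	spte_t *sptes[SKETCH_PTE_LIST_EXT];
	struct pte_list_desc_sketch *more;
};

/* Iteration state kept between calls, the idea behind rmap_iterator. */
struct rmap_iter_sketch {
	struct pte_list_desc_sketch *desc;  /* NULL: no further sptes */
	int pos;
};

/* Assumed encoding: 0 = empty, low bit clear = one spte pointer,
 * low bit set = pointer to a descriptor chain. Decoded only here. */
static spte_t *rmap_get_first_sketch(uintptr_t rmap, struct rmap_iter_sketch *it)
{
	it->desc = NULL;
	it->pos = 0;
	if (!rmap)
		return NULL;
	if (!(rmap & 1))
		return (spte_t *)rmap;   /* single spte: nothing else to visit */
	it->desc = (struct pte_list_desc_sketch *)(rmap & ~(uintptr_t)1);
	return it->desc->sptes[0];
}

static spte_t *rmap_get_next_sketch(struct rmap_iter_sketch *it)
{
	if (!it->desc)
		return NULL;
	/* Next slot in the current descriptor, if it is in use. */
	if (it->pos + 1 < SKETCH_PTE_LIST_EXT && it->desc->sptes[it->pos + 1])
		return it->desc->sptes[++it->pos];
	/* Otherwise continue with the next descriptor in the chain. */
	it->desc = it->desc->more;
	it->pos = 0;
	return it->desc ? it->desc->sptes[0] : NULL;
}
```

A caller loops with `for (s = rmap_get_first_sketch(rmap, &it); s; s = rmap_get_next_sketch(&it))`. Each step does constant work, so a full walk is linear in the number of sptes.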

KVM: MMU: Make pte_list_desc fit cache lines well · 220f773a

Submitted by Takuya Yoshikawa on Mar 21, 2012

We have PTE_LIST_EXT + 1 pointers in this structure, and these 40/20
bytes (64-bit/32-bit) do not fit cache lines well. Furthermore, some allocators may
use 64/32-byte objects for the pte_list_desc cache.

This patch solves this problem by changing PTE_LIST_EXT from 4 to 3.

For shadow paging, the new size is still large enough to hold both the
kernel and process mappings for usual anonymous pages. For file
mappings, there may be a slight change in the cache usage.

Note: with EPT/NPT we almost always have a single spte in each reverse
mapping and we will not see any change by this.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

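The size arithmetic behind the change can be checked at compile time. The struct below is a stand-in modeled on the description (PTE_LIST_EXT spte pointers plus one link pointer), not the kernel definition. With PTE_LIST_EXT = 3 it holds four pointers: 32 bytes on 64-bit and 16 bytes on 32-bit. That divides a 64-byte cache line evenly, whereas the old five-pointer layout took 40/20 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTE_LIST_EXT 3  /* new value from the commit (was 4) */

/* Stand-in modeled on the description: PTE_LIST_EXT spte pointers plus
 * a link to the next descriptor. */
struct pte_list_desc_sketch {
	uint64_t *sptes[SKETCH_PTE_LIST_EXT];
	struct pte_list_desc_sketch *more;
};

/* PTE_LIST_EXT + 1 pointers: 32 bytes on 64-bit, 16 bytes on 32-bit.
 * With the old value of 4 this would be 40 / 20 bytes. */
static_assert(sizeof(struct pte_list_desc_sketch) ==
	      (SKETCH_PTE_LIST_EXT + 1) * sizeof(void *),
	      "descriptor should be exactly PTE_LIST_EXT + 1 pointers");
static_assert(64 % sizeof(struct pte_list_desc_sketch) == 0,
	      "descriptors should tile a 64-byte cache line");
```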

KVM: x86: add paging gcc optimization · c36fc04e

Submitted by Davidlohr Bueso on Mar 08, 2012

Since most guests will have paging enabled for memory management, add a likely()
hint around CR0.PG checks.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

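In the kernel, likely() is a wrapper around GCC's __builtin_expect(). The sketch below is a minimal user-space rendering of a CR0.PG check written with it. is_paging_sketch() is a hypothetical stand-in loosely modeled on KVM's is_paging() helper. The hint only affects how the compiler lays out the branch; it never changes the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the kernel's likely(): a GCC/Clang branch hint. */
#define likely(x) __builtin_expect(!!(x), 1)

#define X86_CR0_PG (1UL << 31)  /* CR0.PG: paging enabled */

/* Hypothetical stand-in for a CR0.PG check. */
static bool is_paging_sketch(unsigned long cr0)
{
	/* Most guests run with paging on, so mark that case as the common one. */
	return likely(cr0 & X86_CR0_PG);
}
```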

KVM: VMX: Auto-load on CPUs with VMX · e9bda3b3

Submitted by Josh Triplett on Mar 20, 2012

Enable x86 feature-based autoloading for the kvm-intel module on CPUs
with X86_FEATURE_VMX.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Switch to srcu-less get_dirty_log() · 60c34612

Submitted by Takuya Yoshikawa on Mar 03, 2012

We have seen some problems with the current implementation of
get_dirty_log(), which uses synchronize_srcu_expedited() for updating
dirty bitmaps; e.g. it sometimes causes latency on the order of
milliseconds when we use VGA displays.

Furthermore the recent discussion on the following thread
    "srcu: Implement call_srcu()"
    http://lkml.org/lkml/2012/1/31/211
also motivated us to implement get_dirty_log() without SRCU.

This patch achieves this goal without sacrificing the performance of
both VGA and live migration: in practice the new code is much faster
than the old one unless we have too many dirty pages.

Implementation:

The key part of the implementation is the use of xchg() operation for
clearing dirty bits atomically.  Since this allows us to update only
BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap
until every dirty bit is cleared again for the next call.

Although some people may worry about the problem of using the atomic
memory instruction many times on the concurrently accessible bitmap,
it is usually accessed with mmu_lock held and we rarely see concurrent
accesses: so what we need to care about is the pure xchg() overheads.

Another point to note is that we do not use for_each_set_bit() to check
which ones in each BITS_PER_LONG pages are actually dirty.  Instead we
simply use __ffs() in a loop.  This is much faster than repeatedly calling
find_next_bit().

Performance:

The dirty-log-perf unit test showed nice improvements, several times faster
than before, except for some extreme cases; in such cases, dirty page
information is retrieved much faster than userspace can process it.

For real workloads, both VGA and live migration, we have observed clear
improvements: when the guest was reading a file during live migration,
we originally saw a few ms of latency, but with the new method the
latency was less than 200us.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

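The core loop described under "Implementation" can be modeled in user space with C11 atomics. Each bitmap word is swapped to zero with one atomic exchange, and then only that word's set bits are visited. This is a sketch, not the KVM code. atomic_exchange() stands in for xchg(), and __builtin_ctzl() (a GCC/Clang builtin) stands in for __ffs(). Recording page indices stands in for write-protecting those pages. mmu_lock is not modeled, and all names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Clear the dirty bitmap word by word with an atomic exchange and record
 * each dirty page index. 'snapshot' must be zeroed by the caller; it
 * receives the old words, as the copy returned to userspace would.
 * Returns the number of dirty pages found. */
static size_t harvest_dirty(_Atomic unsigned long *bitmap, size_t nwords,
			    unsigned long *snapshot,
			    size_t *pages, size_t max_pages)
{
	size_t count = 0;

	for (size_t i = 0; i < nwords; i++) {
		unsigned long mask;

		/* Skip clean words without a locked instruction. */
		if (!atomic_load_explicit(&bitmap[i], memory_order_relaxed))
			continue;
		mask = atomic_exchange(&bitmap[i], 0UL);
		snapshot[i] = mask;

		/* Visit set bits directly; no find_next_bit()-style search. */
		while (mask) {
			assert(count < max_pages);
			pages[count++] = i * SKETCH_BITS_PER_LONG +
					 (size_t)__builtin_ctzl(mask);
			mask &= mask - 1;   /* clear the lowest set bit */
		}
	}
	return count;
}
```

Because a word is read and cleared in one atomic step, a bit that gets set concurrently either lands in this snapshot or stays in the bitmap for the next call. It is never lost.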

KVM: Avoid checking huge page mappings in get_dirty_log() · 5dc99b23

Submitted by Takuya Yoshikawa on Mar 01, 2012

Huge page mappings are dropped when we enable dirty logging, and we never
create new ones until we stop the logging.

For this we introduce a new function which can be used to write protect
a range of PT level pages: although we do not need to care about a range
of pages at this point, the following patch will need this feature to
optimize the write protection of many pages.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: Split the main body of rmap_write_protect() off from others · a0ed4607

Submitted by Takuya Yoshikawa on Mar 01, 2012

We will use this in the following patch to implement another function
which needs to write protect pages using the rmap information.

Note that there is a small change in debug printing for large pages:
we do not differentiate them from others to avoid duplicating code.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRL · 1c0b28c2

Submitted by Eric B Munson on Mar 10, 2012

Now that we have a flag that will tell the guest it was suspended, create an
interface for that communication using a KVM ioctl.
Signed-off-by: Eric B Munson <emunson@mgebm.net>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Factor out kvm_vcpu_kick to arch-generic code · b6d33834

Submitted by Christoffer Dall on Mar 08, 2012

The kvm_vcpu_kick function performs roughly the same functionality on
almost all architectures, so we shouldn't have separate copies.

PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
structure, and to accommodate this special need, a
__KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
kvm_arch_vcpu_wq have been defined. For all other architectures this
is a generic inline that just returns &vcpu->wq.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: count all irq windows exit · 675acb75

Submitted by Jason Wang on Mar 08, 2012

Also count IRQ window exits taken on the fast path.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: expose Intel cpu new features (HLE, RTM) to guest · 83c52915

Submitted by Liu, Jinsong on Feb 28, 2012

Intel recently released two new features, HLE and RTM.
Refer to http://software.intel.com/file/41417.
This patch exposes them to the guest.
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

Apr 06, 2012 · 2 commits

KVM: VMX: vmx_set_cr0 expects kvm->srcu locked · 7a4f5ad0

Submitted by Marcelo Tosatti on Mar 27, 2012

vmx_set_cr0 is called from vcpu run context, so it expects
kvm->srcu to be held (for setting up the real-mode TSS).
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PMU: Fix integer constant is too large warning in kvm_pmu_set_msr() · fea52953

Submitted by Sasikantha babu on Mar 21, 2012

Signed-off-by: Sasikantha babu <sasikanth.v19@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

Mar 20, 2012 · 2 commits

x86: remove the second argument of k[un]map_atomic() · 8fd75e12

Submitted by Cong Wang on Nov 25, 2011

Acked-by: Avi Kivity <avi@redhat.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Cong Wang <amwang@redhat.com>

KVM: x86: fix kvm_write_tsc() TSC matching thinko · 02626b6a

Submitted by Marcelo Tosatti on Mar 08, 2012

kvm_write_tsc() converts from guest TSC to microseconds, not nanoseconds
as intended. The result is that the window for matching is 1000 seconds,
not 1 second.

Microsecond precision is enough for checking whether the TSC write delta
is within the heuristic values, so use it instead of nanoseconds.

Noted by Avi Kivity.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

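The size of the error is easy to see. One second is 1,000,000 us but 1,000,000,000 ns. A microsecond value compared against a nanosecond-sized limit therefore lets deltas of up to 1000 seconds count as a match. The sketch below does the conversion in microseconds and compares against one second expressed in microseconds. It makes these assumptions: the helper names are hypothetical, tsc_khz is the guest TSC frequency in kHz, and only non-negative deltas are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_USEC_PER_SEC 1000000ULL

/* The TSC ticks tsc_khz * 1000 times per second, i.e. tsc_khz / 1000
 * times per microsecond, so usec = cycles * 1000 / tsc_khz. */
static uint64_t tsc_cycles_to_usec(uint64_t cycles, uint32_t tsc_khz)
{
	assert(tsc_khz != 0);
	assert(cycles <= UINT64_MAX / 1000);  /* keep cycles * 1000 in range */
	return cycles * 1000 / tsc_khz;
}

/* Hypothetical matching heuristic: writes less than one second apart. */
static bool tsc_write_within_window(uint64_t delta_cycles, uint32_t tsc_khz)
{
	/* Compare microseconds with microseconds; a 1e9 (nanosecond-sized)
	 * limit here would widen the window to 1000 seconds. */
	return tsc_cycles_to_usec(delta_cycles, tsc_khz) < SKETCH_USEC_PER_SEC;
}
```

At 2 GHz (tsc_khz = 2000000), a delta of 2,000,000,000 cycles is exactly 1,000,000 us, so it falls just outside the one-second window.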

Mar 08, 2012 · 13 commits

KVM: nVMX: Fix erroneous exception bitmap check · 95871901

Submitted by Nadav Har'El on Mar 06, 2012

The code which checks whether to inject a pagefault to L1 or L2 (in
nested VMX) was incorrect in how it checked the PF_VECTOR bit.
Thanks to Dan Carpenter for spotting this.
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

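The commit message does not show the faulty expression. This sketch assumes a common form of the mistake: masking the exception bitmap with the vector number itself instead of with the bit that represents it. Only the correct test is implemented. The function name is hypothetical. PF_VECTOR is the x86 page-fault vector, 14.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_VECTOR 14  /* page-fault (#PF) exception vector */

static_assert(PF_VECTOR < 32, "the VMX exception bitmap is 32 bits wide");

/* Does L1 ask to intercept #PF? Test bit PF_VECTOR, not the value 14
 * itself: 'bitmap & PF_VECTOR' would look at bits 1-3 instead. */
static bool l1_intercepts_pf(uint32_t exception_bitmap)
{
	return (exception_bitmap & (1u << PF_VECTOR)) != 0;
}
```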
KVM: Ignore the writes to MSR_K7_HWCR(3) · a223c313

Submitted by Nicolae Mogoreanu on Feb 21, 2012

When CPUID Fn8000_0001_EAX reports 0x00100f22, a Windows 7 x64 guest
tries to set bit 3 in MSRC001_0015 in nt!KiDisableCacheErrataSource
and fails. This patch ignores that write and allows things to move
on without having to fake the CPUID value.
Signed-off-by: Nicolae Mogoreanu <mogoreanu@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask · 4d6931c3

Submitted by Davidlohr Bueso on Mar 05, 2012

The reset_rsvds_bits_mask() function can use the guest walker's root level
number instead of using a separate 'level' variable.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PMU: add proper support for fixed counter 2 · 62079d8a

Submitted by Gleb Natapov on Feb 26, 2012

Currently, PMU emulation emulates fixed counter 2 as the bus cycles
architectural counter, but since commit 9c1497ea perf has a
pseudo-encoding for it. Use it.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PMU: Fix raw event check · fac33683

Submitted by Gleb Natapov on Feb 26, 2012

If eventsel has EDGE, INV or CMASK set, we should create a raw counter for
it, but the check is done on the wrong variable. Fix it.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

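The intended test is a mask check on the eventsel value itself. The bit positions below follow the Intel SDM layout of IA32_PERFEVTSELx: EDGE is bit 18, INV is bit 23, and CMASK is bits 31:24. The macro and function names are hypothetical. The commit text does not say which variable was checked by mistake, so the sketch shows only the corrected check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_PERFEVTSELx fields, per the Intel SDM. */
#define SKETCH_EVENTSEL_EDGE  (1ULL << 18)     /* edge detect */
#define SKETCH_EVENTSEL_INV   (1ULL << 23)     /* invert counter mask */
#define SKETCH_EVENTSEL_CMASK (0xFFULL << 24)  /* counter mask, bits 31:24 */

/* Per the commit text: any of these modifiers in the guest-written
 * eventsel value means a raw counter should be created. */
static bool eventsel_needs_raw(uint64_t eventsel)
{
	return (eventsel & (SKETCH_EVENTSEL_EDGE | SKETCH_EVENTSEL_INV |
			    SKETCH_EVENTSEL_CMASK)) != 0;
}
```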

KVM: PMU: warn when pin control is set in eventsel msr · a7b9d2cc

Submitted by Gleb Natapov on Feb 26, 2012

Print a warning once if the pin control bit is set in the eventsel MSR,
since emulation does not support it yet.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: Fix delayed load of shared MSRs · 9ee73970

Submitted by Avi Kivity on Mar 06, 2012

Shared MSRs (MSR_*STAR and related) are stored in both vmx->guest_msrs
and in the CPU registers, but vmx_set_msr() only updated memory. Prior
to 46199f33, this didn't matter, since we called vmx_load_host_state(),
which scheduled a vmx_save_host_state(), which re-synchronized the CPU
state, but now we don't, so the CPU state will not be synchronized until
the next exit to host userspace.  This mostly affects nested vmx workloads,
which play with these MSRs a lot.

Fix by loading the MSR eagerly.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Allow host IRQ sharing for assigned PCI 2.3 devices · 07700a94

Submitted by Jan Kiszka on Feb 28, 2012

PCI 2.3 allows IRQ sources to be disabled generically at the device level. This
enables us to share legacy IRQs of such devices with other host devices
when passing them to a guest.

The new IRQ sharing feature introduced here is optional; user space has
to request it explicitly. Moreover, user space can inform us about its
view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the
interrupt and signaling it if the guest masked it via the virtualized
PCI config space.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Ensure all vcpus are consistent with in-kernel irqchip settings · 3e515705

Submitted by Avi Kivity on Mar 05, 2012

If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long-winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Allow PM/VM86 switch during task switch · 4cee4798

Submitted by Kevin Wolf on Feb 08, 2012

Task switches can switch between Protected Mode and VM86. The current
mode must be updated during the task switch emulation so that the new
segment selectors are interpreted correctly.

In order to let privilege checks succeed, rflags needs to be updated in
the vcpu struct as this causes a CPL update.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fix CPL updates · ea5e97e8

Submitted by Kevin Wolf on Feb 08, 2012

Keep CPL at 0 in real mode and at 3 in VM86. In protected/long mode, use
RPL rather than DPL of the code segment.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

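The CPL rule in this commit fits in a few lines. This is a hypothetical sketch, not SVM code. The mode enum is invented, and protected and long mode are folded into one case. In those modes the CPL is taken from the RPL field (low two bits) of the CS selector rather than from the code segment's DPL.

```c
#include <assert.h>
#include <stdint.h>

enum cpu_mode_sketch {
	CPU_MODE_REAL,
	CPU_MODE_VM86,
	CPU_MODE_PROTECTED,  /* protected or long mode */
};

/* CPL is 0 in real mode and 3 in VM86; otherwise it is the RPL of
 * the CS selector, not the DPL of the code segment. */
static unsigned int guest_cpl(enum cpu_mode_sketch mode, uint16_t cs_selector)
{
	switch (mode) {
	case CPU_MODE_REAL:
		return 0;
	case CPU_MODE_VM86:
		return 3;
	default:
		return cs_selector & 3;
	}
}
```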
KVM: x86 emulator: VM86 segments must have DPL 3 · 66b0ab8f

Submitted by Kevin Wolf on Feb 08, 2012

Setting the segment DPL to 0 for at least the VM86 code segment makes
the VM entry fail on VMX.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Fix task switch privilege checks · 7f3d35fd

Submitted by Kevin Wolf on Feb 08, 2012

Currently, all task switches check privileges against the DPL of the
TSS. This is only correct for jmp/call to a TSS. If a task gate is used,
the DPL of this task gate is used for the check instead. Exceptions,
external interrupts and iret shouldn't perform any check.

[avi: kill kvm-kmod remnants]
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

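The rule in this commit message can be written as a small decision function. This is a hypothetical sketch, not the emulator's code, and the reason names are invented. It assumes the usual x86 condition: both the CPL and the selector's RPL must be numerically no greater than the DPL being checked. For a direct jmp/call that DPL is the TSS descriptor's; for a task gate it is the gate's.

```c
#include <assert.h>
#include <stdbool.h>

enum ts_reason_sketch {
	TS_JMP_CALL_TSS,   /* jmp/call directly to a TSS descriptor */
	TS_JMP_CALL_GATE,  /* jmp/call through a task gate */
	TS_INTERRUPT,      /* exception or external interrupt */
	TS_IRET,
};

/* Returns true if the task switch is allowed. 'dpl' is the DPL of the
 * descriptor actually referenced: the TSS for a direct jmp/call, or the
 * task gate when a gate is used. */
static bool task_switch_allowed(enum ts_reason_sketch reason,
				unsigned int cpl, unsigned int selector_rpl,
				unsigned int dpl)
{
	assert(cpl <= 3 && selector_rpl <= 3 && dpl <= 3);
	switch (reason) {
	case TS_JMP_CALL_TSS:
	case TS_JMP_CALL_GATE:
		/* Neither CPL nor RPL may exceed the referenced DPL. */
		return cpl <= dpl && selector_rpl <= dpl;
	default:
		return true;   /* exceptions, interrupts and iret: no check */
	}
}
```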