Commits · bf998156d24bcb127318ad5bf531ac3bdfcd6449 · openeuler / raspberrypi-kernel

01 Aug, 2010 1 commit

KVM: Avoid killing userspace through guest SRAO MCE on unmapped pages · bf998156

Authored by Huang Ying on May 31, 2010

In the common case, a guest SRAO MCE causes the corresponding poisoned
page to be unmapped and SIGBUS to be sent to QEMU-KVM, which then relays
the MCE to the guest OS.

But it has been reported that if the poisoned page is accessed in the
guest after unmapping and before the MCE is relayed to the guest OS,
userspace is killed.

The reason is as follows. Because the poisoned page has been unmapped,
a guest access causes a guest exit and kvm_mmu_page_fault is called.
kvm_mmu_page_fault cannot get the poisoned page for the fault address,
so kernel and user space MMIO processing are tried in turn. During user
MMIO processing the poisoned page is accessed again, and userspace is
then killed by force_sig_info.

To fix the bug, kvm_mmu_page_fault sends a HWPOISON signal to QEMU-KVM
and does not try kernel or user space MMIO processing for a poisoned
page.

[xiao: fix warning introduced by avi]
Reported-by: Max Asbock <masbock@linux.vnet.ibm.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19 May, 2010 1 commit

KVM: Let vcpu structure alignment be determined at runtime · 0ee75bea

Authored by Avi Kivity on Apr 28, 2010

vmx and svm vcpus have different contents and therefore may have different
alignment requirements. Let each specify its required alignment.
Signed-off-by: Avi Kivity <avi@redhat.com>

17 May, 2010 3 commits

KVM: Get rid of dead function gva_to_page() · 2a059bf4

Authored by Gui Jianfeng on Apr 16, 2010

Nobody uses gva_to_page() anymore, so get rid of it.
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: use the correct RCU API for PROVE_RCU=y · 90d83dc3

Authored by Lai Jiangshan on Apr 19, 2010

The RCU/SRCU APIs have already changed for proving RCU usage.

I got the following dmesg when PROVE_RCU=y because we used the incorrect API.
This patch converts rcu_dereference() to srcu_dereference() or related APIs.

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/8550:
 #0:  (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm]
 #1:  (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm]

stack backtrace:
Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27
Call Trace:
 [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3
 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm]
 [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm]
 [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm]
 [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm]
 [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel]
 [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm]
 [<ffffffff810a8692>] ? unlock_page+0x27/0x2c
 [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1
 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm]
 [<ffffffff81060cfa>] ? up_read+0x23/0x3d
 [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6
 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db
 [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241
 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116
 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a
 [<ffffffff810021db>] system_call_fastpath+0x16/0x1b
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: limit the number of pages per memory slot · 660c22c4

Authored by Takuya Yoshikawa on Apr 13, 2010

This patch limits the number of pages per memory slot so that we no
longer need to take extra care over integer type issues.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

20 Apr, 2010 2 commits

KVM: Increase NR_IOBUS_DEVS limit to 200 · e80e2a60

Authored by Sridhar Samudrala on Mar 30, 2010

This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using the vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix the handling of dirty bitmaps to avoid overflows · 87bf6e7d

Authored by Takuya Yoshikawa on Apr 12, 2010

An int is not large enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

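The overflow described in the commit above can be illustrated with a small sketch. It is not the kernel code: the helper names `bitmap_bytes` and `to_int32` are hypothetical, and `BITS_PER_LONG = 64` assumes a 64-bit host. It computes the bitmap size with wide arithmetic and then shows what a 32-bit signed `int` would hold.

```python
BITS_PER_LONG = 64  # assumption: 64-bit host


def bitmap_bytes(npages: int) -> int:
    """Bytes needed for one dirty bit per page, rounded up to whole longs."""
    longs = (npages + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)


def to_int32(value: int) -> int:
    """Wrap a value the way a 32-bit signed C int would store it."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    for npages in (1, 65, 2**34, 2**35):
        wide = bitmap_bytes(npages)
        print(npages, wide, to_int32(wide))
```

For small slots both values agree. Once the byte count reaches 2**31 the 32-bit value turns negative or wraps to zero, which is the kind of truncation a wide wrapper function avoids.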

01 Mar, 2010 13 commits

KVM: Convert kvm->requests_lock to raw_spinlock_t · 70e335e1

Authored by Avi Kivity on Feb 18, 2010

The code relies on kvm->requests_lock inhibiting preemption.

Noted by Jan Kiszka.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Introduce kvm_host_page_size · 8f0b1ab6

Authored by Joerg Roedel on Jan 28, 2010

This patch introduces a generic function to find out the
host page size for a given gfn. This function is needed by
the kvm iommu code. This patch also simplifies the x86
host_mapping_level function.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: enable PCI multiple-segments for pass-through device · ab9f4ecb

Authored by Zhai, Edwin on Jan 29, 2010

Enable an optional parameter (default 0), the PCI segment (or domain), in
addition to BDF when assigning a PCI device to a guest.
Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Lazify fpu activation and deactivation · 02daab21

Authored by Avi Kivity on Dec 30, 2009

Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.

We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: convert slots_lock to a mutex · 79fac95e

Authored by Marcelo Tosatti on Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: switch vcpu context to use SRCU · f656ce01

Authored by Marcelo Tosatti on Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: convert io_bus to SRCU · e93f8a0f

Authored by Marcelo Tosatti on Dec 23, 2009

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: x86: switch kvm_set_memory_alias to SRCU update · a983fb23

Authored by Marcelo Tosatti on Dec 23, 2009

Using a similar two-step procedure as for memslots.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update · bc6678a3

Authored by Marcelo Tosatti on Dec 23, 2009

Use two steps for memslot deletion: mark the slot invalid (which stops
instantiation of new shadow pages for that slot, but allows destruction),
then instantiate the new empty slot.

Also simplifies kvm_handle_hva locking.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: use gfn_to_pfn_memslot in kvm_iommu_map_pages · 3ad26d81

Authored by Marcelo Tosatti on Dec 23, 2009

So it's possible to iommu map a memslot before making it visible to
kvm.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: introduce gfn_to_pfn_memslot · 506f0d6f

Authored by Marcelo Tosatti on Dec 23, 2009

Which takes a memslot pointer instead of using kvm->memslots.

To be used by the SRCU conversion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: split kvm_arch_set_memory_region into prepare and commit · f7784b8e

Authored by Marcelo Tosatti on Dec 23, 2009

Required for the SRCU conversion later.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: modify memslots layout in struct kvm · 46a26bf5

Authored by Marcelo Tosatti on Dec 23, 2009

Have a pointer to an allocated region inside struct kvm.

[alex: fix ppc book 3s]
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

03 Dec, 2009 7 commits

KVM: introduce kvm_vcpu_on_spin · d255f4f2

Authored by Zhai, Edwin on Oct 09, 2009

Introduce kvm_vcpu_on_spin, to be used by VMX/SVM to yield processing
once the cpu detects pause-based looping.
Signed-off-by: "Zhai, Edwin" <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Activate Virtualization On Demand · 10474ae8

Authored by Alexander Graf on Sep 15, 2009

X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces
on-demand activation of virtualization. This means that virtualization
is instead enabled on creation of the first virtual machine and
disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

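The on-demand scheme in the commit above amounts to reference counting. The following is a minimal sketch of that idea only, not the kernel implementation: the class `Hypervisor` and its attributes are hypothetical names, and a boolean flag stands in for enabling the hardware extensions.

```python
class Hypervisor:
    """Toy model: virtualization is on only while at least one VM exists."""

    def __init__(self):
        self.usage_count = 0           # number of live VMs
        self.hardware_enabled = False  # stand-in for the real hardware state

    def create_vm(self):
        self.usage_count += 1
        if self.usage_count == 1:
            # First VM: this is where virtualization would be switched on.
            self.hardware_enabled = True

    def destroy_vm(self):
        if self.usage_count == 0:
            raise RuntimeError("no VM to destroy")
        self.usage_count -= 1
        if self.usage_count == 0:
            # Last VM gone: switch virtualization off again.
            self.hardware_enabled = False


if __name__ == "__main__":
    hv = Hypervisor()
    print(hv.hardware_enabled)  # module "loaded", nothing enabled yet
    hv.create_vm()
    print(hv.hardware_enabled)
    hv.destroy_vm()
    print(hv.hardware_enabled)
```

Loading the module corresponds to constructing the object, which leaves the hardware untouched. That is why other hypervisors stay usable until a VM is actually created.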

KVM: Move assigned device code to own file · bfd99ff5

Authored by Avi Kivity on Aug 26, 2009

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move irq ack notifier list to arch independent code · 136bdfee

Authored by Gleb Natapov on Aug 24, 2009

Mask irq notifier list is already there.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Maintain back mapping from irqchip/pin to gsi · 3e71f88b

Authored by Gleb Natapov on Aug 24, 2009

Maintain back mapping from irqchip/pin to gsi to speed up
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Change irq routing table to use gsi indexed array · 46e624b9

Authored by Gleb Natapov on Aug 24, 2009

Use a gsi-indexed array instead of scanning all entries on each interrupt
injection.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

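The difference between scanning and indexing can be sketched as follows. This illustrates the data-structure idea only; the function names and the `(gsi, chip, pin)` tuple layout are assumptions made for the example, not the kernel's routing structures.

```python
def lookup_by_scan(routes, gsi):
    """Scanning approach: walk every routing entry and keep the matches."""
    return [(chip, pin) for (g, chip, pin) in routes if g == gsi]


def build_gsi_index(routes, nr_gsis):
    """Build a table whose slot i lists the (chip, pin) targets of GSI i."""
    table = [[] for _ in range(nr_gsis)]
    for gsi, chip, pin in routes:
        table[gsi].append((chip, pin))
    return table


def lookup_by_index(table, gsi):
    """Indexed approach: one array access, independent of the table size."""
    return table[gsi] if 0 <= gsi < len(table) else []


if __name__ == "__main__":
    routes = [(0, "pic", 0), (0, "ioapic", 2), (5, "ioapic", 5)]
    table = build_gsi_index(routes, nr_gsis=8)
    print(lookup_by_scan(routes, 0))
    print(lookup_by_index(table, 0))
```

The index is built once when the routing table changes. Each injection then pays for a single array access rather than a walk over all entries.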

KVM: Move irq sharing information to irqchip level · 1a6e4a8c

Authored by Gleb Natapov on Aug 24, 2009

This removes the assumption that the maximum number of GSIs is smaller
than the number of pins. Sharing is tracked at the pin level, not the GSI
level.

[avi: no PIC on ia64]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

19 Sep, 2009 1 commit

tracing: Remove markers · fc537766

Authored by Christoph Hellwig on Sep 17, 2009

Now that the last users of markers have migrated to the event
tracer we can kill off the (now orphan) support code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090917173527.GA1699@lst.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

10 Sep, 2009 12 commits

KVM: Reduce runnability interface with arch support code · a1b37100

Authored by Gleb Natapov on Jul 09, 2009

Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from the
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Move kvm_cpu_get_interrupt() declaration to x86 code · 0b71785d

Authored by Gleb Natapov on Jul 09, 2009

It is implemented only by x86.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: add ioeventfd support · d34e6b17

Authored by Gregory Haskins on Jul 07, 2009

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asynchronously and
return as quickly as possible.  All the synchronous infrastructure to ensure
proper data-dependencies are met in the normal IO case is just unnecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and peripheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:      153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

these are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

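The figures quoted in the commit above relate through rtt = 1 / iops. The sketch below reproduces only that arithmetic for the measured rows and the qemu-pio extrapolation; the function names are hypothetical, and it makes no attempt to re-derive the ioeventfd-hc estimate.

```python
def rtt_us(iops: float) -> float:
    """Round-trip time in microseconds for a given operations-per-second rate."""
    return 1e6 / iops


def iops_from_rtt(rtt: float) -> float:
    """Operations per second for a given round-trip time in microseconds."""
    return 1e6 / rtt


if __name__ == "__main__":
    # Measured rows from the commit message.
    for name, iops in (("qemu-mmio", 110000),
                       ("ioeventfd-mmio", 200100),
                       ("ioeventfd-pio", 367300)):
        print(f"{name}: {rtt_us(iops):.2f}us rtt")

    # qemu-pio extrapolation: qemu-mmio rtt minus the 2.56us MMIO overhead.
    print(int(iops_from_rtt(9.09 - 2.56)), "iops (qemu-pio estimate)")

    # Saving from skipping the userspace hop on the MMIO path.
    print(f"{9.09 - 5.00:.2f}us saved")
```

The last line is the basis for the "about 4us" conclusion: the qemu-mmio and ioeventfd-mmio round trips differ by roughly 4.09us.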

KVM: make io_bus interface more robust · 090b7aff

Authored by Gregory Haskins on Jul 07, 2009

Today kvm_io_bus_register_dev() returns void and will internally BUG_ON
if it fails.  We want to create dynamic MMIO/PIO entries driven from
userspace later in the series, so we need to enhance the code to be more
robust with the following changes:

   1) Add a return value to the registration function
   2) Fix up all the callsites to check the return code, handle any
      failures, and percolate the error up to the caller.
   3) Add an unregister function that collapses holes in the array
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

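The three changes listed in the commit above can be sketched with a toy bus. This is an illustration under assumptions, not the kernel code: the class `IoBus`, its method names, the `-ENOSPC` return value and the "move the last entry into the hole" strategy are all choices made for the example.

```python
import errno


class IoBus:
    """Toy fixed-capacity device list with fallible registration."""

    def __init__(self, max_devs: int):
        self.max_devs = max_devs
        self.devs = []

    def register_dev(self, dev) -> int:
        """Return 0 on success or a negative errno instead of crashing."""
        if len(self.devs) >= self.max_devs:
            return -errno.ENOSPC
        self.devs.append(dev)
        return 0

    def unregister_dev(self, dev) -> None:
        """Remove dev and close the gap so the array stays dense."""
        for i, candidate in enumerate(self.devs):
            if candidate is dev:
                # Move the last device into the freed slot, then shrink.
                self.devs[i] = self.devs[-1]
                self.devs.pop()
                return


if __name__ == "__main__":
    bus = IoBus(max_devs=2)
    a, b, c = object(), object(), object()
    print(bus.register_dev(a), bus.register_dev(b), bus.register_dev(c))
    bus.unregister_dev(a)
    print(bus.register_dev(c))
```

Callers check the return code and pass the error upwards, as item 2 of the list describes. After an unregister, the freed slot is available to the next registration.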

KVM: remove in_range from io devices · bda9020e

Authored by Michael S. Tsirkin on Jun 29, 2009

This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write
functions. in_range now becomes unused so it is removed from device ops in
favor of read/write callbacks performing range checks internally.

This allows aliasing (mostly for in-kernel virtio), as well as better error
handling by making it possible to pass errors up to userspace.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: convert bus to slots_lock · 6c474694

Authored by Michael S. Tsirkin on Jun 29, 2009

Use slots_lock to protect the device list on the bus.  slots_lock is already
taken for read everywhere, so we only need to take it for write when
registering devices.  This is in preparation for removing in_range and
kvm->lock around it.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: use vcpu_id instead of bsp_vcpu pointer in kvm_vcpu_is_bsp · d3efc8ef

Authored by Marcelo Tosatti on Jun 17, 2009

Change kvm_vcpu_is_bsp to use vcpu_id instead of the bsp_vcpu pointer, which
is only initialized at the end of kvm_vm_ioctl_create_vcpu.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove old KVMTRACE support code · 2023a29c

Authored by Marcelo Tosatti on Jun 18, 2009

Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Prepare memslot data structures for multiple hugepage sizes · ec04b260

Authored by Joerg Roedel on Jun 19, 2009

[avi: fix build on non-x86]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: VMX: conditionally disable 2M pages · 54dee993

Authored by Marcelo Tosatti on Jun 11, 2009

Disable usage of 2M pages if VMX_EPT_2MB_PAGE_BIT (bit 16) is clear
in MSR_IA32_VMX_EPT_VPID_CAP and EPT is enabled.

[avi: s/largepages_disabled/largepages_enabled/ to avoid negative logic]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

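The condition in the commit above is a single bit test. A minimal sketch follows, assuming the capability MSR value is passed in as a plain integer; the function name `largepages_enabled` echoes the variable named in the commit note but is otherwise a hypothetical helper.

```python
VMX_EPT_2MB_PAGE_BIT = 1 << 16  # bit 16 of MSR_IA32_VMX_EPT_VPID_CAP


def largepages_enabled(ept_enabled: bool, ept_vpid_cap: int) -> bool:
    """2M pages stay enabled unless EPT is on and the 2MB bit is clear."""
    if ept_enabled and not (ept_vpid_cap & VMX_EPT_2MB_PAGE_BIT):
        return False
    return True


if __name__ == "__main__":
    print(largepages_enabled(True, 0))                     # EPT on, bit clear
    print(largepages_enabled(True, VMX_EPT_2MB_PAGE_BIT))  # EPT on, bit set
    print(largepages_enabled(False, 0))                    # EPT off
```

Expressing the result as "enabled" rather than "disabled" follows the bracketed note in the commit, which renamed the flag to avoid negative logic.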

KVM: Use macro to iterate over vcpus. · 988a2cae

Authored by Gleb Natapov on Jun 09, 2009

[christian: remove unused variables on s390]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Break dependency between vcpu index in vcpus array and vcpu_id. · 73880c80

Authored by Gleb Natapov on Jun 09, 2009

Archs are free to use vcpu_id as they see fit. For x86 it is used as
the vcpu's apic id. A new ioctl is added to configure the boot vcpu id,
which was assumed to be 0 until now.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>