Commits · c1a7b32a14138f908df52d7c53b5ce3415ec6b50 · openeuler / raspberrypi-kernel

05 Jun, 2012 2 commits

KVM: Avoid wasting pages for small lpage_info arrays · c1a7b32a

Authored by Takuya Yoshikawa on May 20, 2012

lpage_info is created for each large level even when the memory slot is
not for RAM. This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().

To make things worse, there is an increasing number of devices which
would result in more pages being wasted this way.

This patch mitigates this problem by using kvm_kvzalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

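A rough userspace sketch of the size-based choice this message describes, assuming kvm_kvzalloc() serves requests of up to one page from a slab-style allocator and only larger requests from vmalloc(). The names, the 4096-byte page size and the threshold are illustrative assumptions, not taken from the patch:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

enum sketch_backend { SKETCH_SLAB, SKETCH_VMALLOC };

/* Pages consumed when a request is rounded up to whole pages,
 * which is what a page-granular allocator such as vmalloc() does. */
static size_t sketch_pages_if_page_granular(size_t bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Assumed policy: small requests go to the slab-style allocator,
 * requests larger than one page go to the vmalloc-style allocator. */
static enum sketch_backend sketch_pick_backend(size_t bytes)
{
    return bytes > SKETCH_PAGE_SIZE ? SKETCH_VMALLOC : SKETCH_SLAB;
}
```

Under these assumptions a 64-byte lpage_info array would cost a full page if it were always rounded up, whereas the slab path only takes what it needs.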

KVM: Separate out dirty_bitmap allocation code as kvm_kvzalloc() · 92eca8fa

Authored by Takuya Yoshikawa on May 20, 2012

Will be used for lpage_info allocation later.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

01 May, 2012 1 commit

KVM: s390: Implement the directed yield (diag 9c) hypervisor call for KVM · 41628d33

Authored by Konstantin Weitz on Apr 25, 2012

This patch implements the directed yield hypercall found on other
System z hypervisors. It delegates execution time to the virtual cpu
specified in the instruction's parameter.

Useful to avoid long spinlock waits in the guest.

Christian Borntraeger: moved common code in virt/kvm/
Signed-off-by: Konstantin Weitz <WEITZKON@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

24 Apr, 2012 1 commit

KVM: Introduce direct MSI message injection for in-kernel irqchips · 07975ad3

Authored by Jan Kiszka on Mar 29, 2012

Currently, MSI messages can only be injected to in-kernel irqchips by
defining a corresponding IRQ route for each message. This is not only
unwieldy if the MSI messages are generated "on the fly" by user space;
IRQ routes are also a limited resource that user space has to manage
carefully.

By providing a direct injection path, we can both avoid using up limited
resources and simplify the necessary steps for user land.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

12 Apr, 2012 1 commit

KVM: unmap pages from the iommu when slots are removed · 32f6daad

Authored by Alex Williamson on Apr 11, 2012

We've been adding new mappings, but not destroying old mappings.
This can lead to a page leak as pages are pinned using
get_user_pages, but only unpinned with put_page if they still
exist in the memslots list on vm shutdown.  A memslot that is
destroyed while an iommu domain is enabled for the guest will
therefore result in an elevated page reference count that is
never cleared.

Additionally, without this fix, the iommu is only programmed
with the first translation for a gpa.  This can result in
peer-to-peer errors if a mapping is destroyed and replaced by a
new mapping at the same gpa as the iommu will still be pointing
to the original, pinned memory address.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

08 Apr, 2012 4 commits

KVM: Remove unused dirty_bitmap_head and nr_dirty_pages · 93474b25

Authored by Takuya Yoshikawa on Mar 01, 2012

Now that we do neither double buffering nor heuristic selection of the
write protection method these are not needed anymore.

Note: some drivers have their own implementation of set_bit_le() and
making it generic needs a bit of work; so we use test_and_set_bit_le()
and will later replace it with generic set_bit_le().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix kvm_vcpu_kick build failure on S390 · 8c84780d

Authored by Marcelo Tosatti on Mar 14, 2012

S390's kvm_vcpu_stat does not contain halt_wakeup member.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Factor out kvm_vcpu_kick to arch-generic code · b6d33834

Authored by Christoffer Dall on Mar 08, 2012

The kvm_vcpu_kick function performs roughly the same functionality on
almost all architectures, so we shouldn't have separate copies.

PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch
structure and to accommodate this special need a
__KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function
kvm_arch_vcpu_wq have been defined. For all other architectures this
is a generic inline that just returns &vcpu->wq.
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: resize kvm_io_range array dynamically · a1300716

Authored by Amos Kong on Mar 09, 2012

This patch allows the kvm_io_range array to be resized dynamically.
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

08 Mar, 2012 7 commits

KVM: use correct tlbs dirty type in cmpxchg · bec87d6e

Authored by Alex Shi on Mar 04, 2012

Using 'int' type is not suitable for a 'long' object. So, correct it.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Ensure all vcpus are consistent with in-kernel irqchip settings · 3e515705

Authored by Avi Kivity on Mar 05, 2012

If some vcpus are created before KVM_CREATE_IRQCHIP, then
irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading
to potential NULL pointer dereferences.

Fix by:
- ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called
- ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP

This is somewhat long winded because vcpu->arch.apic is created without
kvm->lock held.

Based on earlier patch by Michael Ellerman.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: mmu_notifier: Flush TLBs before releasing mmu_lock · 565f3be2

Authored by Takuya Yoshikawa on Feb 10, 2012

Other threads may process the same page in that small window and skip
TLB flush and then return before these functions do flush.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Introduce kvm_memory_slot::arch and move lpage_info into it · db3fe4eb

Authored by Takuya Yoshikawa on Feb 08, 2012

Some members of kvm_memory_slot are not used by every architecture.

This patch is the first step to make this difference clear by
introducing kvm_memory_slot::arch;  lpage_info is moved into it.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Simplify ifndef conditional usage in __kvm_set_memory_region() · 189a2f7b

Authored by Takuya Yoshikawa on Feb 08, 2012

Narrow down the controlled text inside the conditional so that it will
include lpage_info and rmap stuff only.

For this we change the way we check whether the slot is being created
from "if (npages && !new.rmap)" to "if (npages && !old.npages)".

We also stop checking if lpage_info is NULL when we create lpage_info
because we do it from inside the slot creation code block.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Split lpage_info creation out from __kvm_set_memory_region() · a64f273a

Authored by Takuya Yoshikawa on Feb 08, 2012

This makes it easy to make lpage_info architecture specific.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Introduce gfn_to_index() which returns the index for a given level · fb03cb6f

Authored by Takuya Yoshikawa on Feb 08, 2012

This patch cleans up the code and removes the "(void)level;" warning
suppressor.

Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every
level uniformly later.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

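As an illustration of what a per-level index helper such as gfn_to_index() computes, here is a minimal standalone sketch. It assumes an x86-style layout with 9 address bits per paging level and level 1 meaning 4 KiB pages; the sketch_ names and the shift formula are assumptions made for this example, not the kernel's definitions:

```c
#include <stdint.h>

typedef uint64_t gfn_t;  /* guest frame number */

/* Assumed shift per level: level 1 -> 0, level 2 -> 9, level 3 -> 18. */
static unsigned sketch_hpage_gfn_shift(int level)
{
    return (unsigned)(level - 1) * 9;
}

/* Index of gfn within a slot starting at base_gfn, counted in units of
 * the page size that belongs to the given level. */
static gfn_t sketch_gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
{
    return (gfn >> sketch_hpage_gfn_shift(level)) -
           (base_gfn >> sketch_hpage_gfn_shift(level));
}
```

Because level 1 yields a shift of zero, the same expression also covers the smallest page size, which matches the note about treating every level uniformly.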

05 Mar, 2012 4 commits

KVM: Move gfn_to_memslot() to kvm_host.h · 9d4cba7f

Authored by Paul Mackerras on Jan 12, 2012

This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to
kvm_host.h to reduce the code duplication caused by the need for
non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call
gfn_to_memslot() in real mode.

Rather than putting gfn_to_memslot() itself in a header, which would
lead to increased code size, this puts __gfn_to_memslot() in a header.
Then, the non-modular uses of gfn_to_memslot() are changed to call
__gfn_to_memslot() instead.  This way there is only one place in the
source code that needs to be changed should the gfn_to_memslot()
implementation need to be modified.

On powerpc, the Book3S HV style of KVM has code that is called from
real mode which needs to call gfn_to_memslot() and thus needs this.
(Module code is allocated in the vmalloc region, which can't be
accessed in real mode.)

With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add barriers to allow mmu_notifier_retry to be used locklessly · a355aa54

Authored by Paul Mackerras on Dec 12, 2011

This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an
smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give
the correct answer when called without kvm->mmu_lock being held.
PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than
a single global spinlock in order to improve the scalability of updates
to the guest MMU hashed page table, and so needs this.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

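The barrier pairing described in this message can be sketched in portable C11. This is only an approximation under stated assumptions: the struct and function names are hypothetical, atomic_thread_fence() stands in for smp_wmb()/smp_rmb(), and the ordering (bump the sequence, write barrier, then drop the in-flight count) reflects my reading of the description rather than the kernel source:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_notifier {
    atomic_ulong seq;    /* bumped each time an invalidation completes */
    atomic_long  count;  /* invalidations currently in flight */
};

static void sketch_invalidate_start(struct sketch_notifier *n)
{
    atomic_fetch_add_explicit(&n->count, 1, memory_order_relaxed);
}

static void sketch_invalidate_end(struct sketch_notifier *n)
{
    atomic_fetch_add_explicit(&n->seq, 1, memory_order_relaxed);
    /* Stand-in for smp_wmb(): publish the new seq before count drops. */
    atomic_thread_fence(memory_order_release);
    atomic_fetch_sub_explicit(&n->count, 1, memory_order_relaxed);
}

/* Returns true if work based on the snapshot seen_seq must be retried. */
static bool sketch_retry(struct sketch_notifier *n, unsigned long seen_seq)
{
    if (atomic_load_explicit(&n->count, memory_order_relaxed) != 0)
        return true;
    /* Stand-in for smp_rmb(): read seq only after count was seen as 0. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&n->seq, memory_order_relaxed) != seen_seq;
}
```

A lockless reader would snapshot seq, do its lookup, and call sketch_retry() before committing the result; seeing count == 0 then guarantees that it also sees the sequence bump of any invalidation that has already finished.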

KVM: s390: ucontrol: export SIE control block to user · 5b1c1493

Authored by Carsten Otte on Jan 04, 2012

This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault is introduced for all
architectures. It allows architecture-specific pages to be mapped.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: s390: add parameter for KVM_CREATE_VM · e08b9637

Authored by Carsten Otte on Jan 04, 2012

This patch introduces a new config option for user controlled kernel
virtual machines. It introduces a parameter to KVM_CREATE_VM that
allows setting bits that alter the capabilities of the newly created
virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
This requires CAP_SYS_ADMIN privileges and creates a user controlled
virtual machine on s390 architectures.
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

01 Feb, 2012 1 commit

KVM: Fix __set_bit() race in mark_page_dirty() during dirty logging · 50e92b3c

Authored by Takuya Yoshikawa on Jan 04, 2012

It is possible that the __set_bit() in mark_page_dirty() is called
simultaneously on the same region of memory, which may result in only
one bit being set, because some callers do not take mmu_lock before
mark_page_dirty().

This problem is hard to reproduce because when we reach mark_page_dirty()
beginning from, e.g., tdp_page_fault(), mmu_lock is being held during
__direct_map():  making kvm-unit-tests' dirty log api test write to two
pages concurrently was not useful for this reason.

So we have confirmed that there can actually be a race condition by
checking if some callers really reach there without holding mmu_lock
using spin_is_locked():  probably they were from kvm_write_guest_page().

To fix this race, this patch changes the bit operation to the atomic
version:  note that nr_dirty_pages also suffers from the race but we do
not need exactly correct numbers for now.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

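To illustrate why the atomic variant closes the race, here is a small userspace sketch using C11 atomics. The helper names are hypothetical and the sketch ignores the little-endian bit ordering used by the kernel's dirty bitmap helpers. A plain read-modify-write (load, OR, store) can lose one of two concurrent updates to the same word, while a single atomic OR cannot:

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr in the bitmap. Concurrent callers touching the
 * same word cannot overwrite each other's bits. */
static void sketch_set_bit_atomic(size_t nr, _Atomic unsigned long *bitmap)
{
    unsigned long mask = 1UL << (nr % SKETCH_BITS_PER_LONG);

    atomic_fetch_or(&bitmap[nr / SKETCH_BITS_PER_LONG], mask);
}

/* Read back a single bit. */
static int sketch_test_bit(size_t nr, _Atomic unsigned long *bitmap)
{
    unsigned long word = atomic_load(&bitmap[nr / SKETCH_BITS_PER_LONG]);

    return (int)((word >> (nr % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

The cost is an atomic instruction on every call, which the message accepts in exchange for correctness when callers do not hold mmu_lock.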

27 Dec, 2011 12 commits

KVM: ensure that debugfs entries have been created · 4f69b680

Authored by Hamo on Dec 15, 2011

By checking the return value from kvm_init_debug, we
can ensure that the entries under debugfs for KVM have
been created correctly.
Signed-off-by: Yang Bai <hamo.by@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: drop bsp_vcpu pointer from kvm struct · d546cb40

Authored by Gleb Natapov on Dec 15, 2011

Drop bsp_vcpu pointer from kvm struct since its only use is incorrect
anyway.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: Use memdup_user instead of kmalloc/copy_from_user · ff5c2c03

Authored by Sasha Levin on Dec 04, 2011

Switch to using memdup_user when possible. This makes the code
smaller and more compact, and prevents errors.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use kmemdup() instead of kmalloc/memcpy · cdfca7b3

Authored by Sasha Levin on Dec 04, 2011

Switch to kmemdup() in two places to shorten the code and avoid possible bugs.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

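For readers unfamiliar with the pattern being replaced, a userspace stand-in for a kmemdup()-style helper might look like the sketch below. sketch_memdup() is a hypothetical name built on malloc(), not the kernel function, which takes an additional allocation-flags argument:

```c
#include <stdlib.h>
#include <string.h>

/* Allocate len bytes and copy src into them. Returns NULL on allocation
 * failure; the caller owns the result and must free() it. */
static void *sketch_memdup(const void *src, size_t len)
{
    void *copy = malloc(len);

    if (copy)
        memcpy(copy, src, len);
    return copy;
}
```

Folding the allocate-then-copy pair into one call removes the chance of the two sizes drifting apart or of the failure check being forgotten at individual call sites.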

KVM: introduce a table to map slot id to index in memslots array · f85e2cb5

Authored by Xiao Guangrong on Nov 24, 2011

Getting the dirty log is a frequent operation when framebuffer-based
displays are used (for example, Xwindow), so we introduce a mapping table
to speed up id_to_memslot().
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

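A minimal sketch of the mapping-table idea, assuming slot ids form a dense range 0..N-1 and the slot array itself may be reordered (for example by sorting). All names and the slot count are illustrative, not the kernel's:

```c
#include <stddef.h>

#define SKETCH_NUM_SLOTS 8  /* assumed slot count for the example */

struct sketch_memslot {
    int id;                 /* stable slot id chosen by user space */
    unsigned long npages;
};

struct sketch_memslots {
    struct sketch_memslot slots[SKETCH_NUM_SLOTS];  /* may be reordered */
    int id_to_index[SKETCH_NUM_SLOTS];              /* slot id -> array index */
};

/* Rebuild the table after the slot array has been reordered. */
static void sketch_rebuild_index(struct sketch_memslots *s)
{
    for (int i = 0; i < SKETCH_NUM_SLOTS; i++)
        s->id_to_index[s->slots[i].id] = i;
}

/* Constant-time lookup by id instead of scanning the array. */
static struct sketch_memslot *sketch_id_to_memslot(struct sketch_memslots *s,
                                                   int id)
{
    return &s->slots[s->id_to_index[id]];
}
```

The table has to be rebuilt whenever the array order changes, which is cheap compared with a scan on every dirty-log request.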

KVM: sort memslots by its size and use line search · bf3e05bc

Authored by Xiao Guangrong on Nov 24, 2011

Sort memslots based on their size and use linear search to find them, so that
the larger memslots are matched sooner.

The idea is from Avi.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce id_to_memslot function · 28a37544

Authored by Xiao Guangrong on Nov 24, 2011

Introduce id_to_memslot to get a memslot by slot id.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce kvm_for_each_memslot macro · be6ba0f0

Authored by Xiao Guangrong on Nov 24, 2011

Introduce kvm_for_each_memslot to walk all valid memslots.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce update_memslots function · be593d62

Authored by Xiao Guangrong on Nov 24, 2011

Introduce update_memslots to apply a slot update to
kvm->memslots.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: introduce KVM_MEM_SLOTS_NUM macro · 93a5cef0

Authored by Xiao Guangrong on Nov 24, 2011

Introduce the KVM_MEM_SLOTS_NUM macro to replace
KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Count the number of dirty pages for dirty logging · 7850ac54

Authored by Takuya Yoshikawa on Nov 14, 2011

Needed for the next patch which uses this number to decide how to write
protect a slot.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Use kmemdup rather than duplicating its implementation · 6da64fdb

Authored by Thomas Meyer on Nov 08, 2011

Use kmemdup rather than duplicating its implementation

The semantic patch that makes this change is available
in scripts/coccinelle/api/memdup.cocci.

More information about semantic patching is available at
http://coccinelle.lip6.fr/
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

26 Sep, 2011 1 commit

KVM: Intelligent device lookup on I/O bus · 743eeb0b

Authored by Sasha Levin on Jul 27, 2011

Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
is to call the read or write callback for each device registered
on the bus until we find a device which handles it.

Since the number of devices on a bus can be significant due to ioeventfds
and coalesced MMIO zones, this leads to a lot of overhead on each IO
operation.

Instead of registering devices, we now register ranges which point to
a device. Lookup is done using an efficient bsearch instead of a linear
search.

Performance test was conducted by comparing exit count per second with
200 ioeventfds created on one byte and the guest is trying to access a
different byte continuously (triggering usermode exits).
Before the patch the guest has achieved 259k exits per second, after the
patch the guest does 274k exits per second.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

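The range-based lookup can be sketched with the standard C bsearch(). This is a simplified illustration: it assumes the ranges are sorted by start address and do not overlap, and the struct layout and sketch_ names are invented for the example rather than taken from the patch:

```c
#include <stdint.h>
#include <stdlib.h>

struct sketch_io_range {
    uint64_t addr;   /* first address covered by the range */
    uint64_t len;    /* number of bytes covered */
    int      dev_id; /* stand-in for a pointer to the device */
};

/* bsearch() comparator: the key is the address being looked up. Returns 0
 * when the address falls inside the range. */
static int sketch_range_cmp(const void *key, const void *elem)
{
    uint64_t addr = *(const uint64_t *)key;
    const struct sketch_io_range *r = elem;

    if (addr < r->addr)
        return -1;
    if (addr >= r->addr + r->len)
        return 1;
    return 0;
}

/* Find the range containing addr, or NULL if no range covers it. */
static const struct sketch_io_range *
sketch_find_range(const struct sketch_io_range *ranges, size_t n, uint64_t addr)
{
    return bsearch(&addr, ranges, n, sizeof(*ranges), sketch_range_cmp);
}
```

With a few hundred registered ranges, this replaces hundreds of callback invocations per access with a handful of comparisons, which is consistent with the direction of the exit-rate numbers quoted above.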

24 Jul, 2011 2 commits

KVM: MMU: mmio page fault support · ce88decf

Authored by Xiao Guangrong on Jul 12, 2011

The idea is from Avi:

| We could cache the result of a miss in an spte by using a reserved bit, and
| checking the page fault error code (or seeing if we get an ept violation or
| ept misconfiguration), so if we get repeated mmio on a page, we don't need to
| search the slot list/tree.
| (https://lkml.org/lkml/2011/2/22/221)

When the page fault is caused by mmio, we cache the info in the shadow page
table, and also set the reserved bits in the shadow page table, so if the mmio
is caused again, we can quickly identify it and emulate it directly.

Searching for an mmio gfn in memslots is heavy since we need to walk all
memslots; this feature reduces that cost and also avoids walking the guest
page table for soft mmu.

[jan: fix operator precedence issue]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: MMU: filter out the mmio pfn from the fault pfn · fce92dce

Authored by Xiao Guangrong on Jul 12, 2011

If the page fault is caused by mmio, the gfn cannot be found in memslots, and
'bad_pfn' is returned on the gfn_to_hva path, so we can use 'bad_pfn' to identify
the mmio page fault.
And, to clarify the meaning of mmio pfn, we return fault page instead of bad
page when the gfn is not allowed to prefetch.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

12 Jul, 2011 4 commits

KVM: introduce kvm_read_guest_cached · e03b644f

Authored by Gleb Natapov on Jul 11, 2011

Introduce kvm_read_guest_cached() function in addition to write one we
already have.

[ by glauber: export function signature in kvm header ]
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Glauber Costa <glommer@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Eric Munson <emunson@mgebm.net>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK · 1dda606c

Authored by Alexander Graf on Jun 08, 2011

KVM has an ioctl to define which signal mask should be used while running
inside VCPU_RUN. At least for big endian systems, this mask is different
on 32-bit and 64-bit systems (though the size is identical).

Add a compat wrapper that converts the mask to whatever the kernel accepts,
allowing 32-bit kvm user space to set signal masks.

This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
32-bit user land.
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Clean up error handling during VCPU creation · d780592b

Authored by Jan Kiszka on May 23, 2011

So far kvm_arch_vcpu_setup is responsible for freeing the vcpu struct if
it fails. Move this confusing responsibility back into the hands of
kvm_vm_ioctl_create_vcpu. Only kvm_arch_vcpu_setup of x86 is affected,
all other archs cannot fail.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: use __copy_to_user/__clear_user to write guest page · 8b0cedff

Authored by Xiao Guangrong on May 15, 2011

Simply use __copy_to_user/__clear_user to write guest page since we have
already verified the user address when the memslot is set.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
