1. 06 Nov, 2013 (1 commit)
  2. 05 Nov, 2013 (1 commit)
    • KVM: IOMMU: hva align mapping page size · 27ef63c7
      Greg Edwards authored
      When determining the page size we could use to map with the IOMMU, the
      page size should also be aligned with the hva, not just the gfn.  The
      gfn may not reflect the real alignment within the hugetlbfs file.
      
      Most of the time, this works fine.  However, if the hugetlbfs file is
      backed by non-contiguous huge pages, a memslot spanning multiple huge
      pages starts at an unaligned offset within the hugetlbfs file, and the
      gfn is aligned with respect to the huge page size, then
      kvm_host_page_size() will return the huge page size and we will use
      that to map with the IOMMU.
      
      When we later unpin that same memslot, the IOMMU returns the unmap size
      as the huge page size, and we happily unpin that many pfns in
      monotonically increasing order, not realizing we are spanning
      non-contiguous huge pages, and so we partially unpin the wrong huge page.
      
      Ensure the IOMMU mapping page size is aligned with the hva corresponding
      to the gfn, which does reflect the alignment within the hugetlbfs file.
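The alignment rule above can be sketched as a toy model (not the kernel source; the constants and addresses below are illustrative assumptions):

```python
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
HPAGE_SIZE = 2 << 20  # assume 2 MiB huge pages

def iommu_map_size(gfn, hva, host_page_size):
    """Halve the candidate mapping size until BOTH the guest-physical
    address and the hva are aligned to it (a sketch of the fix)."""
    size = host_page_size
    while (gfn << PAGE_SHIFT) & (size - 1):
        size >>= 1
    while hva & (size - 1):
        size >>= 1
    return size

# gfn is huge-page aligned, but the hva sits at a 4 KiB offset within
# the hugetlbfs file, so only a base-page mapping is safe:
assert iommu_map_size(0x80000, 0x7f0000201000, HPAGE_SIZE) == PAGE_SIZE
# when both gpa and hva are huge-page aligned, the full size is used:
assert iommu_map_size(0x80000, 0x7f0000200000, HPAGE_SIZE) == HPAGE_SIZE
```

Checking the hva catches exactly the case described above, where the gfn alone looks aligned but the backing file offset is not.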
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Edwards <gedwards@ddn.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  3. 31 Oct, 2013 (3 commits)
  4. 30 Oct, 2013 (1 commit)
  5. 28 Oct, 2013 (1 commit)
    • KVM: Mapping IOMMU pages after updating memslot · e0230e13
      Yang Zhang authored
      In kvm_iommu_map_pages(), we need to know the page size by calling
      kvm_host_page_size(), which checks whether the target slot is valid
      before returning the right page size.
      Currently, we map the IOMMU pages when creating a new slot, but we
      call kvm_iommu_map_pages() while the new slot is still being
      prepared. At that point the new slot is not yet visible to the
      domain, so we cannot get the right page size from
      kvm_host_page_size(), and this breaks the IOMMU super page logic.
      The solution is to map the IOMMU pages after we insert the new slot
      into the domain.
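The ordering problem can be modelled with a minimal sketch (hypothetical names, not the kernel source):

```python
BASE = 4096
HUGE = 2 << 20  # assume 2 MiB huge pages

class Domain:
    """Toy model of memslot visibility (a sketch, not kernel code)."""
    def __init__(self):
        self.visible = {}  # slots already installed in the domain

    def host_page_size(self, slot):
        # Like kvm_host_page_size(): a slot that is still being
        # prepared is not visible, so only the base size is reported.
        return self.visible.get(slot, BASE)

    def iommu_map(self, slot):
        # the mapping uses whatever page size the lookup can see
        return self.host_page_size(slot)

    def install(self, slot, page_size):
        self.visible[slot] = page_size

d = Domain()
# old order: map while the slot is still being prepared
assert d.iommu_map("new") == BASE   # super page logic is broken
# fixed order: install the slot first, then map
d.install("new", HUGE)
assert d.iommu_map("new") == HUGE   # superpages are used
```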
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Tested-by: Patrick Lu <patrick.lu@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 17 Oct, 2013 (2 commits)
  7. 15 Oct, 2013 (1 commit)
  8. 03 Oct, 2013 (2 commits)
  9. 30 Sep, 2013 (3 commits)
    • KVM: Convert kvm_lock back to non-raw spinlock · 2f303b74
      Paolo Bonzini authored
      In commit e935b837 ("KVM: Convert kvm_lock to raw_spinlock"),
      the kvm_lock was made a raw lock.  However, the kvm mmu_shrink()
      function tries to grab the (non-raw) mmu_lock within the scope of
      the raw locked kvm_lock being held.  This leads to the following:
      
      BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
      in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0
      Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm]
      
      Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt
      Call Trace:
       [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160
       [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50
       [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm]
       [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0
       [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260
       [<ffffffff8111824a>] balance_pgdat+0x54a/0x730
       [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0
       [<ffffffff811185bf>] kswapd+0x18f/0x490
       [<ffffffff81070961>] ? get_parent_ip+0x11/0x50
       [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50
       [<ffffffff81118430>] ? balance_pgdat+0x730/0x730
       [<ffffffff81060d2b>] kthread+0xdb/0xe0
       [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100
       [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10
       [<ffffffff81060c50>] ? __init_kthread_worker+0x
      
      After the previous patch, kvm_lock need not be a raw spinlock anymore,
      so change it back.
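The rule the trace above tripped over can be modelled roughly as follows (a sketch of PREEMPT_RT semantics, not kernel code): a raw spinlock keeps preemption disabled, while a normal spinlock becomes a sleeping lock, and sleeping with preemption disabled is invalid.

```python
preempt_count = 0  # toy stand-in for the kernel's preempt count

def raw_spin_lock():
    global preempt_count
    preempt_count += 1

def raw_spin_unlock():
    global preempt_count
    preempt_count -= 1

def rt_spin_lock():
    # the __might_sleep() check from the trace above
    if preempt_count > 0:
        raise RuntimeError("sleeping function called from invalid context")

# old code: mmu_shrink() took the non-raw mmu_lock under the raw kvm_lock
raw_spin_lock()
try:
    rt_spin_lock()
    bug = False
except RuntimeError:
    bug = True
raw_spin_unlock()
assert bug

# after the fix, kvm_lock is itself a sleeping lock, so no preemption
# is disabled when mmu_lock is taken:
rt_spin_lock()  # no error
```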
      Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: kvm@vger.kernel.org
      Cc: gleb@redhat.com
      Cc: jan.kiszka@siemens.com
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: protect kvm_usage_count with its own spinlock · 4a937f96
      Paolo Bonzini authored
      The VM list need not be protected by a raw spinlock.  Separate the
      two so that kvm_lock can be made non-raw.
      
      Cc: kvm@vger.kernel.org
      Cc: gleb@redhat.com
      Cc: jan.kiszka@siemens.com
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: cleanup (physical) CPU hotplug · 4fa92fb2
      Paolo Bonzini authored
      Remove the useless argument, and do not do anything if there are no
      VMs running at the time of the hotplug.
      
      Cc: kvm@vger.kernel.org
      Cc: gleb@redhat.com
      Cc: jan.kiszka@siemens.com
      Reviewed-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 25 Sep, 2013 (1 commit)
  11. 17 Sep, 2013 (2 commits)
  12. 04 Sep, 2013 (1 commit)
  13. 30 Aug, 2013 (3 commits)
  14. 28 Aug, 2013 (1 commit)
  15. 27 Aug, 2013 (1 commit)
    • kvm: optimize away THP checks in kvm_is_mmio_pfn() · 11feeb49
      Andrea Arcangeli authored
      The checks on PG_reserved in the page structure on head and tail pages
      aren't necessary because split_huge_page wouldn't transfer the
      PG_reserved bit from head to tail anyway.
      
      This was a forward-looking check for the case where PageReserved is
      set on a driver-owned page mapped into userland with something like
      remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
      possible right now). It was meant to be very safe, but it is
      overkill, as it is unlikely split_huge_page could ever run without
      the driver noticing and tearing down the hugepage itself.
      
      If a future driver really wants to map a reserved hugepage in
      userland using a huge pmd, it should simply mark all subpages
      reserved too to keep KVM safe. Of course, such a hypothetical
      driver would have to tear down the huge pmd itself and split the
      hugepage itself, instead of relying on split_huge_page, but that
      sounds very reasonable, especially considering that split_huge_page
      would not currently transfer the reserved bit anyway.
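The argument above can be sketched as a toy model (not kernel code; the set of flags inherited on split is an illustrative assumption, chosen only to show that PG_reserved is not among them):

```python
def split_huge_page(head_flags, n_tails):
    """Toy split: tail pages inherit some flags from the head page,
    but PG_reserved is never transferred."""
    inherited = head_flags & {"dirty", "referenced", "active"}
    return [set(inherited) for _ in range(n_tails)]

head_flags = {"reserved", "dirty"}
tails = split_huge_page(head_flags, 511)
# no tail page ever ends up with PG_reserved set, so checking it on
# tail pages in kvm_is_mmio_pfn() can never catch anything:
assert all("reserved" not in t for t in tails)
assert all("dirty" in t for t in tails)
```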
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  16. 26 Aug, 2013 (1 commit)
  17. 29 Jul, 2013 (1 commit)
  18. 18 Jul, 2013 (2 commits)
  19. 27 Jun, 2013 (2 commits)
  20. 04 Jun, 2013 (1 commit)
    • kvm: exclude ioeventfd from counting kvm_io_range limit · 6ea34c9b
      Amos Kong authored
      We can easily reach the 1000-device limit by starting a VM with a
      few hundred I/O devices (multifunction=on). The hard-coded limit
      has already been raised three times (6 ~ 200 ~ 300 ~ 1000).
      
      In userspace, the maximum file descriptor count already limits the
      number of ioeventfds. But kvm_io_bus devices are also used for the
      PIT, PIC, IOAPIC, and coalesced MMIO, and those cannot be limited
      by the file descriptor maximum.
      
      Currently only ioeventfds consume a large number of kvm_io_bus
      devices, so simply exclude them from the kvm_io_range limit.
      
      This also fixes an indentation issue in kvm_host.h.
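The new accounting can be sketched as follows (a toy model, not the kernel source; the device representation is an assumption for illustration):

```python
NR_IOBUS_DEVS = 1000  # the hard-coded limit mentioned above

def can_register(bus, is_ioeventfd):
    """Only non-ioeventfd devices count toward the hard limit, since
    ioeventfds are already bounded by the file descriptor limit."""
    non_ioeventfd = sum(1 for dev in bus if not dev["is_ioeventfd"])
    if is_ioeventfd:
        return True
    return non_ioeventfd < NR_IOBUS_DEVS

# a bus full of ioeventfds no longer blocks registration:
bus = [{"is_ioeventfd": True} for _ in range(5000)]
assert can_register(bus, is_ioeventfd=True)
assert can_register(bus, is_ioeventfd=False)
# the hard limit still applies to the other device types:
bus += [{"is_ioeventfd": False} for _ in range(1000)]
assert not can_register(bus, is_ioeventfd=False)
```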
      Signed-off-by: Amos Kong <akong@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  21. 19 May, 2013 (1 commit)
    • ARM: KVM: move GIC/timer code to a common location · 7275acdf
      Marc Zyngier authored
      As KVM/arm64 is looming on the horizon, it makes sense to move some
      of the common code to a single location in order to reduce duplication.
      
      The code could live anywhere. Actually, most of KVM is already built
      with a bunch of ugly ../../.. hacks in the various Makefiles, so we're
      not exactly talking about style here. But maybe it is time to start
      moving into a less ugly direction.
      
      The include files must be in a "public" location, as they are accessed
      from non-KVM files (arch/arm/kernel/asm-offsets.c).
      
      For this purpose, introduce two new locations:
      - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in
        virt/kvm, so this could be seen as a (very ugly) precedent.
      - include/kvm/  : there is already an include/xen, and while the
        intent is slightly different, this seems as good a location as
        any
      
      Eventually, we should probably have independent Makefiles at every
      level (just like everywhere else in the kernel), but this is just
      the first step.
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  22. 14 May, 2013 (1 commit)
  23. 12 May, 2013 (1 commit)
  24. 09 May, 2013 (1 commit)
  25. 08 May, 2013 (1 commit)
    • KVM: Fix kvm_irqfd_init initialization · 7dac16c3
      Asias He authored
      In commit a0f155e9 ("KVM: Initialize irqfd from kvm_init()"), when
      kvm_init() is called a second time (e.g. for both kvm-amd.ko and
      kvm-intel.ko), kvm_arch_init() fails with -EEXIST and
      kvm_irqfd_exit() is then called on the error-handling path, leaving
      the kvm_irqfd subsystem uninitialized.
      
      This patch fixes the following:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in: vhost_net
      CPU 6
      Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
      RIP: 0010:[<ffffffff81c0721e>]  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      RSP: 0018:ffff880221721cc8  EFLAGS: 00010046
      RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
      RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
      RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
      R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
      R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
      FS:  00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
      Stack:
       ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
       0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
       ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
      Call Trace:
       [<ffffffff810ac5c5>] __queue_work+0x45/0x2d0
       [<ffffffff810ac8fe>] queue_work_on+0x8e/0xa0
       [<ffffffff810ac949>] queue_work+0x19/0x20
       [<ffffffff81009b6b>] irqfd_deactivate+0x4b/0x60
       [<ffffffff8100a69d>] kvm_irqfd+0x39d/0x580
       [<ffffffff81007a27>] kvm_vm_ioctl+0x207/0x5b0
       [<ffffffff810c9545>] ? update_curr+0xf5/0x180
       [<ffffffff811b66e8>] do_vfs_ioctl+0x98/0x550
       [<ffffffff810c1f5e>] ? finish_task_switch+0x4e/0xe0
       [<ffffffff81c054aa>] ? __schedule+0x2ea/0x710
       [<ffffffff811b6bf7>] sys_ioctl+0x57/0x90
       [<ffffffff8140ae9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
       [<ffffffff81c0f602>] system_call_fastpath+0x16/0x1b
      Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
      RIP  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      RSP <ffff880221721cc8>
      CR2: 0000000000000000
      ---[ end trace 13fb1e4b6e5ab21f ]---
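The double-init bug can be modelled with a minimal sketch (hypothetical names modelled on the description above, not the kernel source):

```python
EEXIST = 17
state = {"irqfd_ready": False, "arch_owner": None}

def kvm_irqfd_init():
    state["irqfd_ready"] = True

def kvm_irqfd_exit():
    state["irqfd_ready"] = False

def kvm_arch_init(owner):
    # only one arch module (kvm-amd or kvm-intel) may register
    if state["arch_owner"] is not None:
        return -EEXIST
    state["arch_owner"] = owner
    return 0

def kvm_init_buggy(owner):
    kvm_irqfd_init()
    r = kvm_arch_init(owner)
    if r != 0:
        kvm_irqfd_exit()  # bug: also tears down the first caller's state
    return r

assert kvm_init_buggy("kvm-amd") == 0
assert kvm_init_buggy("kvm-intel") == -EEXIST
# irqfd is now broken for kvm-amd too, hence the NULL dereference when
# an irqfd is later queued:
assert state["irqfd_ready"] is False
```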
      Signed-off-by: Asias He <asias@redhat.com>
      Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  26. 05 May, 2013 (1 commit)
  27. 02 May, 2013 (1 commit)
    • KVM: PPC: Book3S: Add API for in-kernel XICS emulation · 5975a2e0
      Paul Mackerras authored
      This adds the API for userspace to instantiate an XICS device in a VM
      and connect VCPUs to it.  The API consists of a new device type for
      the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
      functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
      which is used to assert and deassert interrupt inputs of the XICS.
      
      The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
      Each attribute within this group corresponds to the state of one
      interrupt source.  The attribute number is the same as the interrupt
      source number.
      
      This does not support irq routing or irqfd yet.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  28. 27 Apr, 2013 (2 commits)