Commits · 56f89f3629ffd1a21d38c3d0bea23deac0e284ce · openeuler / raspberrypi-kernel

05 Aug, 2014 1 commit

KVM: Don't keep reference to irq routing table in irqfd struct · 56f89f36

Authored by Paul Mackerras on Jun 30, 2014

This makes the irqfd code keep a copy of the irq routing table entry
for each irqfd, rather than a reference to the copy in the actual
irq routing table maintained in kvm/virt/irqchip.c.  This will enable
us to change the routing table structure in future, or even not have a
routing table at all on some platforms.

The synchronization that was previously achieved using srcu_dereference
on the read side is now achieved using a seqcount_t structure.  That
ensures that we don't get a halfway-updated copy of the structure if
we read it while another thread is updating it.

We still use srcu_read_lock/unlock around the read side so that when
changing the routing table we can be sure that after calling
synchronize_srcu, nothing will be using the old routing.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Tested-by: Eric Auger <eric.auger@linaro.org>
Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
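
The sequence-counter read side described above can be sketched in userspace C. This is only an illustration: C11 atomics stand in for the kernel's seqcount_t, a single writer is assumed, and the struct and function names are hypothetical rather than taken from the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-irqfd copy of a routing entry. */
struct route_entry {
    uint32_t gsi;
    uint32_t type;
};

struct cached_route {
    atomic_uint seq;        /* even: stable, odd: update in progress */
    _Atomic uint32_t gsi;
    _Atomic uint32_t type;
};

/* Writer side; assumes callers serialise writers among themselves. */
static void route_write(struct cached_route *c, struct route_entry e)
{
    unsigned s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->gsi, e.gsi, memory_order_relaxed);
    atomic_store_explicit(&c->type, e.type, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader side; retries until it sees the same even count before and after. */
static struct route_entry route_read(struct cached_route *c)
{
    struct route_entry e;
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
        e.gsi = atomic_load_explicit(&c->gsi, memory_order_relaxed);
        e.type = atomic_load_explicit(&c->type, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);

    return e;
}
```

A reader never returns a half-updated copy: it either sees the old entry or the new one, and loops while an update is in flight.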

31 Jul, 2014 2 commits

KVM: arm64: GICv3: mandate page-aligned GICV region · fb3ec679

Authored by Marc Zyngier on Jul 31, 2014

Just like GICv2 was fixed in 63afbe7a
(kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform),
mandate that the GICV region be aligned on a page boundary and that
its size be a multiple of the page size.

This prevents a guest from being able to poke at regions where we
have no idea what is sitting there.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>

KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · 0f6c0a74

Authored by Paolo Bonzini on Jul 30, 2014

Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked. However, this can cause a bug that manifests
as an interrupt storm inside the guest. Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then worked. What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
The bitmap does not have bit 59 set because the RTE was masked, so the
IOAPIC never sees the EOI and the interrupt continues to fire in the guest.

My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR. The simplest fix is to ignore the masking state; we would
rather have an unnecessary exit than a missed IRQ ACK, and anyway
IOAPIC interrupts are not as performance-sensitive as, for example, MSIs.
Alex tested this patch and it fixed his bug.

[Thanks to Alex for his precise description of the problem
and initial debugging effort. A lot of the text above is
based on emails exchanged with him.]
Reported-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
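
The effect of ignoring the mask bit can be sketched as follows. This is a simplified model and not the kernel code: the struct, the function name and the honor_mask switch are hypothetical, and the real scan considers more than the two fields shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of an IOAPIC redirection table entry. */
struct rte_sketch {
    uint8_t vector;
    bool masked;
};

/*
 * Build a 256-bit EOI exit bitmap from the redirection table.
 * honor_mask = true models the old behaviour (masked entries skipped),
 * honor_mask = false models the behaviour after the fix.
 */
static void build_eoi_exit_bitmap(const struct rte_sketch *table, int n,
                                  uint64_t bitmap[4], bool honor_mask)
{
    for (int i = 0; i < 4; i++)
        bitmap[i] = 0;

    for (int i = 0; i < n; i++) {
        if (honor_mask && table[i].masked)
            continue;
        bitmap[table[i].vector / 64] |= 1ULL << (table[i].vector % 64);
    }
}
```

With an entry for vector 59 that happens to be masked when the bitmap is recomputed, the old behaviour leaves bit 59 clear, while the fixed behaviour sets it so the EOI still causes an exit.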

30 Jul, 2014 1 commit

kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform · 63afbe7a

Authored by Will Deacon on Jul 25, 2014

If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.

As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.

SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.

This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joel Schopp <joel.schopp@amd.com>
Cc: Don Dutile <ddutile@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Joel Schopp <joel.schopp@amd.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
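
The address arithmetic in the example above, and the kind of check the probe performs, can be sketched like this. The helper names are hypothetical and the 4k GICV size is taken from the SBSA remark in the message; the actual probe code may be structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Base of the page that contains addr (page_size must be a power of two). */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* True only if both the base and the size are multiples of the page size. */
static bool gicv_region_ok(uint64_t base, uint64_t size, uint64_t page_size)
{
    return (base & (page_size - 1)) == 0 && (size & (page_size - 1)) == 0;
}
```

For GICV at 0x2c02f000 on a 64k-page host, page_base() gives 0x2c020000, so a stage-2 mapping of that page also exposes 0x2c020000 - 0x2c02efff to the guest, and gicv_region_ok() rejects the region.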

28 Jul, 2014 2 commits

KVM: Allow KVM_CHECK_EXTENSION on the vm fd · 92b591a4

Authored by Alexander Graf on Jul 14, 2014

The KVM_CHECK_EXTENSION is only available on the kvm fd today. Unfortunately
on PPC some of the capabilities change depending on the way a VM was created.

So instead we need a way to expose capabilities as VM ioctl, so that we can
see which VM type we're using (HV or PR). To enable this, add the
KVM_CHECK_EXTENSION ioctl to our vm ioctl portfolio.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: Rename and add argument to check_extension · 784aa3d7

Authored by Alexander Graf on Jul 14, 2014

In preparation to make the check_extension function available to VM scope
we add a struct kvm * argument to the function header and rename the function
accordingly. It will still be called from the /dev/kvm fd, but with a NULL
argument for struct kvm *.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

25 Jul, 2014 1 commit

kvm: Resolve missing-field-initializers warnings · 25f97ff4

Authored by Mark Rustad on Jul 25, 2014

Resolve missing-field-initializers warnings seen in W=2 kernel
builds by having macros generate more elaborated initializers.
That is enough to silence the warnings.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

11 Jul, 2014 16 commits

ARM64: KVM: fix vgic_bitmap_get_reg function for BE 64bit case · 9662fb48

Authored by Victor Kamensky on Jun 12, 2014

Fix the vgic_bitmap_get_reg function to return the 'right' word address
of an 'unsigned long' bitmap value in the case of a BE 64-bit image.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
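
Why the word address differs on a big-endian 64-bit image can be shown with a small host-side sketch. The helpers below are hypothetical and detect endianness at run time; the kernel would decide this at build time, and this is not the patch's code.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * On a big-endian host with 64-bit longs, the low 32 bits of each
 * 'unsigned long' live in the second 32-bit word, so the index of the
 * 32-bit word has to be flipped within each pair.
 */
static size_t swizzle_word_index(size_t word)
{
    if (host_is_big_endian() && sizeof(unsigned long) == 8)
        return word ^ 1;
    return word;
}

/* Read the 32-bit word that holds bits [32*word, 32*word + 31]. */
static uint32_t bitmap_read_word(const unsigned long *bitmap, size_t word)
{
    uint32_t val;

    memcpy(&val, (const unsigned char *)bitmap + 4 * swizzle_word_index(word),
           sizeof(val));
    return val;
}

static void bitmap_set_bit(unsigned long *bitmap, size_t bit)
{
    bitmap[bit / BITS_PER_ULONG] |= 1UL << (bit % BITS_PER_ULONG);
}
```

With the swizzle in place, bit 40 set through the 'unsigned long' view shows up as bit 8 of 32-bit word 1 on every combination of endianness and long size.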

ARM: KVM: vgic mmio should hold data as LE bytes array in BE case · 1c9f0471

Authored by Victor Kamensky on Jun 12, 2014

According to recent clarifications of the meaning of the mmio.data
array, the array should hold bytes as they would appear in memory.
The vgic is a little-endian device, but with a BE image the kernel side
that emulates the vgic holds data in BE form. So we need to byteswap
cpu<->le32 vgic registers when we read/write them from/to mmio.data[].

The change has no effect in the LE case because the cpu already runs in le32.
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
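
A portable way to keep a 32-bit register as little-endian bytes in a data array, independent of host endianness, might look like the sketch below. The helper names are hypothetical; the kernel uses its own cpu_to_le32/le32_to_cpu conversions for this.

```c
#include <stdint.h>

/* Store a CPU-order register value as little-endian bytes. */
static void mmio_store_le32(uint8_t data[4], uint32_t reg)
{
    data[0] = (uint8_t)(reg & 0xff);
    data[1] = (uint8_t)((reg >> 8) & 0xff);
    data[2] = (uint8_t)((reg >> 16) & 0xff);
    data[3] = (uint8_t)((reg >> 24) & 0xff);
}

/* Load little-endian bytes back into a CPU-order register value. */
static uint32_t mmio_load_le32(const uint8_t data[4])
{
    return (uint32_t)data[0] |
           ((uint32_t)data[1] << 8) |
           ((uint32_t)data[2] << 16) |
           ((uint32_t)data[3] << 24);
}
```

On a little-endian host these helpers amount to a plain copy, which matches the note that the change has no effect in the LE case.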

arm64: KVM: vgic: enable GICv2 emulation on top on GICv3 hardware · 67b2abfe

Authored by Marc Zyngier on Jul 09, 2013

Add the last missing bits that enable GICv2 emulation on top of
GICv3 hardware.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: add the GICv3 backend · b2fb1c0d

Authored by Marc Zyngier on Jul 12, 2013

Introduce the support code for emulating a GICv2 on top of GICv3
hardware.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

arm64: KVM: split GICv2 world switch from hyp code · 1a9b1305

Authored by Marc Zyngier on Jun 21, 2013

Move the GICv2 world switch code into its own file, and add the
necessary indirection to the arm64 switch code.

Also introduce a new type field to the vgic_params structure.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: revisit implementation of irqchip_in_kernel · f982cf4e

Authored by Marc Zyngier on May 15, 2014

So far, irqchip_in_kernel() was implemented by testing the value of
vctrl_base, which worked fine with GICv2.

With GICv3, this field is useless, as we're using system registers
instead of a memory-mapped interface. To solve this, add a boolean
flag indicating whether we're using a vgic or not.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: split GICv2 backend from the main vgic code · 8f186d52

Authored by Marc Zyngier on Feb 04, 2014

Brutally hack the innocent vgic code, and move the GICv2 specific code
to its own file, using vgic_ops and vgic_params as a way to pass
information between the two blocks.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: introduce vgic_params structure · ca85f623

Authored by Marc Zyngier on Jun 18, 2013

Move all the data specific to a given GIC implementation into its own
little structure.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: introduce vgic_enable · da8dafd1

Authored by Marc Zyngier on Jun 04, 2013

Move the code dealing with enabling the VGIC on to vgic_ops.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: abstract VMCR access · beee38b9

Authored by Marc Zyngier on Feb 04, 2014

Instead of directly messing with the GICH_VMCR bits for the CPU
interface save/restore code, add accessors that encode/decode the
entire set of registers exposed by VMCR.

Not the most efficient thing, but given that this code is only used
by the save/restore code, performance is far from being critical.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: move underflow handling to vgic_ops · 909d9b50

Authored by Marc Zyngier on Jun 04, 2013

Move the code dealing with LR underflow handling to its own functions,
and make them accessible through vgic_ops.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: abstract MISR decoding · 495dd859

Authored by Marc Zyngier on Jun 04, 2013

Instead of directly dealing with the GICH_MISR bits, move the code to
its own function and use a couple of public flags to represent the
actual state.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: abstract EISR bitmap access · 8d6a0313

Authored by Marc Zyngier on Jun 04, 2013

Move the GICH_EISR access to its own function.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: abstract access to the ELRSR bitmap · 69bb2c9f

Authored by Marc Zyngier on Jun 04, 2013

Move the GICH_ELRSR access to its own functions, and add them to
the vgic_ops structure.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

KVM: ARM: vgic: introduce vgic_ops and LR manipulation primitives · 8d5c6b06

Authored by Marc Zyngier on Jun 03, 2013

In order to split the various register manipulation from the main vgic
code, introduce a vgic_ops structure, and start by abstracting the
LR manipulation code with a couple of accessors.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
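
The general shape of an ops table with list register (LR) accessors can be sketched as below. Everything here is illustrative: the struct names, the number of list registers and the exact field packing are assumptions for the example and are not copied from the patch.

```c
#include <stdint.h>

#define NR_LR 4 /* hypothetical number of list registers */

/* Hardware-independent description of one list register. */
struct lr_desc {
    uint16_t irq;
    uint8_t source;
    uint8_t state;
};

/* Hypothetical per-vcpu state holding raw GICv2-style LR values. */
struct vcpu_sketch {
    uint32_t v2_lr[NR_LR];
};

/* The main code only talks to the hardware through this table. */
struct vgic_ops_sketch {
    struct lr_desc (*get_lr)(const struct vcpu_sketch *vcpu, int lr);
    void (*set_lr)(struct vcpu_sketch *vcpu, int lr, struct lr_desc desc);
};

/* GICv2-flavoured backend: irq in bits 9:0, source 12:10, state 29:28. */
static struct lr_desc v2_get_lr(const struct vcpu_sketch *vcpu, int lr)
{
    uint32_t val = vcpu->v2_lr[lr];
    struct lr_desc d = {
        .irq = (uint16_t)(val & 0x3ff),
        .source = (uint8_t)((val >> 10) & 0x7),
        .state = (uint8_t)((val >> 28) & 0x3),
    };

    return d;
}

static void v2_set_lr(struct vcpu_sketch *vcpu, int lr, struct lr_desc d)
{
    vcpu->v2_lr[lr] = (uint32_t)(d.irq & 0x3ff) |
                      ((uint32_t)(d.source & 0x7) << 10) |
                      ((uint32_t)(d.state & 0x3) << 28);
}

static const struct vgic_ops_sketch v2_ops = {
    .get_lr = v2_get_lr,
    .set_lr = v2_set_lr,
};
```

A second backend would supply its own get_lr/set_lr pair with a different register encoding, and the callers would not need to change.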

KVM: arm/arm64: vgic: move GICv2 registers to their own structure · eede821d

Authored by Marc Zyngier on May 30, 2013

In order to make way for the GICv3 registers, move the v2-specific
registers to their own structure.
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

05 Jun, 2014 1 commit

sched: Fix signedness bug in yield_to() · fa93384f

Authored by Dan Carpenter on May 23, 2014

yield_to() is supposed to return -ESRCH if there is no task to
yield to, but because the type is bool that is the same as returning
true.

The only place I see which cares is kvm_vcpu_on_spin().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/20140523102042.GA7267@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
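
The signedness problem is easy to reproduce in isolation. The two functions below are hypothetical stand-ins and not the scheduler code: they only show what happens to -ESRCH when the return type is bool versus int.

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: any non-zero value, including -ESRCH, converts to true. */
static bool yield_to_as_bool(void)
{
    return -ESRCH;
}

/* Fixed shape: the negative error code survives the return. */
static int yield_to_as_int(void)
{
    return -ESRCH;
}
```

A caller that tests the result for a negative value can never see the error through the bool version, because the value it receives is 1.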

03 Jun, 2014 1 commit

KVM: add missing cleanup_srcu_struct · 820b3fcd

Authored by Paolo Bonzini on Jun 03, 2014

Reported-by: hrg <hrgstephen@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

05 May, 2014 1 commit

kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd

Authored by Christian Borntraeger on Jan 16, 2014

When starting lots of dataplane devices the bootup takes very long on
Christian's s390 with irqfd patches. With larger setups he is even
able to trigger some timeouts in some components. Turns out that the
KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
when having multiple CPUs. This is caused by the synchronize_rcu and
the HZ=100 of s390. By changing the code to use a private srcu we can
speed things up. This patch reduces the boot time till mounting root
from 8 to 2 seconds on my s390 guest with 100 disks.

Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
uses rcu_dereference_raw rather than rcu_dereference, and write-sides
do not do rcu lockdep at all).

Note that we're hardly relying on the "sleepable" part of srcu. We just
want SRCU's faster detection of grace periods.

Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
and RR. The difference between results "before" and "after" the patch
has mean -0.2% and standard deviation 0.6%. Using a paired t-test on the
data points says that there is a 2.5% probability that the patch is the
cause of the performance difference (rather than a random fluctuation).

(Restricting the t-test to RR, which is the most likely to be affected,
changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
probability that the numbers actually say something about the patch.
The probability increases mostly because there are fewer data points).

Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

29 Apr, 2014 1 commit

KVM: ARM: vgic: Fix the overlap check action about setting the GICD & GICC base address. · 30c21170

Authored by Haibin Wang on Apr 29, 2014

Currently the check below in vgic_ioaddr_overlap always returns early,
because the vgic dist base and vgic cpu base are still UNDEF
after initialization. The following code returns 0 every time:

	if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
		return 0;

So the corresponding base address must be set before invoking
vgic_ioaddr_overlap.
Signed-off-by: Haibin Wang <wanghaibin.wang@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
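
The ordering issue can be sketched as "assign first, then check, then undo on failure". The names, region sizes and error value below are assumptions made for the example; the patch itself may restore a different value on failure.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_UNDEF UINT64_MAX
#define DIST_SIZE  0x1000ULL /* assumed distributor region size */
#define CPU_SIZE   0x2000ULL /* assumed CPU interface region size */

struct vgic_addrs {
    uint64_t dist_base;
    uint64_t cpu_base;
};

/* Mirrors the early return: nothing to compare while a base is unset. */
static bool addrs_overlap(const struct vgic_addrs *a)
{
    if (a->dist_base == ADDR_UNDEF || a->cpu_base == ADDR_UNDEF)
        return false;
    return a->dist_base < a->cpu_base + CPU_SIZE &&
           a->cpu_base < a->dist_base + DIST_SIZE;
}

/*
 * Store the new base before running the overlap check so the check
 * sees it; roll back to the previous value if the regions collide.
 */
static int set_base(struct vgic_addrs *a, uint64_t *slot, uint64_t addr)
{
    uint64_t old = *slot;

    *slot = addr;
    if (addrs_overlap(a)) {
        *slot = old;
        return -1;
    }
    return 0;
}
```

If the check ran before the assignment, the second base would still be UNDEF at that point and the early return would hide every overlap.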

28 Apr, 2014 6 commits

KVM: async_pf: change async_pf_execute() to use get_user_pages(tsk => NULL) · e9545b9f

Authored by Oleg Nesterov on Apr 28, 2014

async_pf_execute() passes tsk == current to gup(); this doesn't
hurt but is unnecessary and misleading. "tsk" is only used to account
the number of faults, and current is the random workqueue thread.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: async_pf: kill the unnecessary use_mm/unuse_mm async_pf_execute() · d72d946d

Authored by Oleg Nesterov on Apr 21, 2014

async_pf_execute() has no reason to adopt apf->mm; gup(current, mm)
should work just fine even if current has another or NULL ->mm.

Recently kvm_async_page_present_sync() was added inside the "use_mm"
section, but it seems that it doesn't need current->mm either.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: arm/arm64: vgic: fix GICD_ICFGR register accesses · f2ae85b2

Authored by Andre Przywara on Apr 11, 2014

Since KVM internally represents the ICFGR registers by stuffing two
of them into one word, the offset for accessing the internal
representation and the one for the MMIO based access are different.
So keep the original offset around, but adjust the internal array
offset by one bit.
Reported-by: Haibin Wang <wanghaibin.wang@huawei.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
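
The offset relationship can be illustrated with a small model. It assumes one configuration bit per interrupt in the packed internal form and two bits per interrupt in the MMIO view, with the packed bit landing in the upper bit of each pair; the function names and that bit placement are assumptions for the sketch, not a copy of the kernel code.

```c
#include <stdint.h>

/* Expand 16 packed bits into a 32-bit MMIO value, 2 bits per interrupt. */
static uint32_t cfg_expand(uint16_t packed)
{
    uint32_t res = 0;

    for (int i = 0; i < 16; i++)
        res |= (uint32_t)((packed >> i) & 1u) << (2 * i + 1);
    return res;
}

/*
 * Read the ICFGR register at byte offset mmio_offset from packed storage.
 * Two MMIO registers share one packed word, so the internal byte offset
 * is mmio_offset >> 1 (word index mmio_offset >> 3), while the original
 * offset still selects which half of the word to use.
 */
static uint32_t icfgr_read(const uint32_t *packed, uint32_t mmio_offset)
{
    uint32_t word = packed[mmio_offset >> 3];
    uint16_t half = (mmio_offset & 4) ? (uint16_t)(word >> 16)
                                      : (uint16_t)(word & 0xffff);

    return cfg_expand(half);
}
```

This is why the original offset has to be kept around: it is needed for the half selection even after the array offset has been shifted.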

KVM: async_pf: mm->mm_users can not pin apf->mm · 41c22f62

Authored by Oleg Nesterov on Apr 21, 2014

get_user_pages(mm) is simply wrong if mm->mm_users == 0 and exit_mmap/etc
was already called (or is in progress), mm->mm_count can only pin mm->pgd
and mm_struct itself.

Change kvm_setup_async_pf/async_pf_execute to inc/dec mm->mm_users.

kvm_create_vm/kvm_destroy_vm play with ->mm_count too but this case looks
fine at first glance, it seems that this ->mm is only used to verify that
current->mm == kvm->mm.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: ARM: vgic: Fix sgi dispatch problem · 91021a6c

Authored by Haibin Wang on Apr 10, 2014

When dispatching an SGI with mode == 0, the vcpu of the VM should send
the SGI only to the cpus in the target_cpus list.
So a "break" must be added to the case 0 branch.

Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Haibin Wang <wanghaibin.wang@huawei.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
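
The effect of the missing "break" can be modelled with a small switch. The function and its with_fix parameter are hypothetical and exist only to show both behaviours side by side; the meanings of modes 1 and 2 (all but self, self only) are assumptions for the example.

```c
#include <stdint.h>

/*
 * Compute the set of target cpus for an SGI.
 * mode 0: use the explicit target list
 * mode 1: all cpus except the sender (assumed for this sketch)
 * mode 2: only the sender (assumed for this sketch)
 */
static uint8_t sgi_targets(int mode, uint8_t target_list, int self,
                           int nr_cpus, int with_fix)
{
    uint8_t target_cpus = target_list;

    switch (mode) {
    case 0:
        if (with_fix)
            break;
        /* fall through: models the bug, the list gets overwritten */
    case 1:
        target_cpus = (uint8_t)(((1u << nr_cpus) - 1u) & ~(1u << self));
        break;
    case 2:
        target_cpus = (uint8_t)(1u << self);
        break;
    }
    return target_cpus;
}
```

Without the break, a mode 0 request that names a single cpu ends up being delivered to every cpu except the sender.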

kvm: Use pci_enable_msix_exact() instead of pci_enable_msix() · e8e249d7

Authored by Alexander Gordeev on Feb 21, 2014

As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

24 Apr, 2014 1 commit

Revert "KVM: Simplify kvm->tlbs_dirty handling" · a086f6a1

Authored by Xiao Guangrong on Apr 17, 2014

This reverts commit 5befdc38.

The revert is needed because a later patch will allow flushing the TLB
outside of mmu-lock.
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

22 Apr, 2014 1 commit

KVM: s390: Add proper dirty bitmap support to S390 kvm. · 15f36ebd

Authored by Jason J. Herne on Aug 02, 2012

Replace the kvm_s390_sync_dirty_log() stub with code to construct the KVM
dirty_bitmap from S390 memory change bits.  Also add code to properly clear
the dirty_bitmap size when clearing the bitmap.
Signed-off-by: Jason J. Herne <jjherne@us.ibm.com>
CC: Dominik Dingel <dingel@linux.vnet.ibm.com>
[Dominik Dingel: use gmap_test_and_clear_dirty, locking fixes]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>

18 Apr, 2014 2 commits

KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1

Authored by Michael S. Tsirkin on Mar 31, 2014

With KVM, MMIO is much slower than PIO, due to the need to
do page walk and emulation. But with EPT, it does not have to be: we
know the address from the VMCS so if the address is unique, we can look
up the eventfd directly, bypassing emulation.

Unfortunately, this only works if userspace does not need to match on
access length and data.  The implementation adds a separate FAST_MMIO
bus internally. This serves two purposes:
    - minimize overhead for old userspace that does not use eventfd with length = 0
    - minimize disruption in other code (since we don't know the length,
      devices on the MMIO bus only get a valid address in write, this
      way we don't need to touch all devices to teach them to handle
      an invalid length)

At the moment, this optimization only has effect for EPT on x86.

It will be possible to speed up MMIO for NPT and MMU using the same
idea in the future.

With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
I was unable to detect any measurable slowdown to non-eventfd MMIO.

Making MMIO faster is important for the upcoming virtio 1.0 which
includes an MMIO signalling capability.

The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
pre-review and suggestions.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: support any-length wildcard ioeventfd · f848a5a8

Authored by Michael S. Tsirkin on Mar 31, 2014

It is sometimes beneficial to ignore IO size, and only match on address.
In hindsight this would have been a better default than matching length
when KVM_IOEVENTFD_FLAG_DATAMATCH is not set. In particular, this kind
of access can be optimized on VMX: there is no need to do page lookups.
This can currently be done with many ioeventfds but in a suboptimal way.

However we can't change kernel/userspace ABI without risk of breaking
some applications.
Use len = 0 to mean "ignore length for matching" in a more optimal way.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
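
The matching rule with a zero length as wildcard can be sketched as follows. The struct and function are hypothetical simplifications; the in-kernel matcher handles more cases, such as comparing data at different access widths.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced description of a registered ioeventfd. */
struct ioeventfd_sketch {
    uint64_t addr;
    uint32_t length;        /* 0 means "ignore length for matching" */
    bool datamatch_enabled;
    uint64_t datamatch;
};

/* Decide whether a write of len bytes of val at addr hits the ioeventfd. */
static bool ioeventfd_matches(const struct ioeventfd_sketch *p,
                              uint64_t addr, uint32_t len, uint64_t val)
{
    if (addr != p->addr)
        return false;
    if (p->length == 0)
        return true;        /* wildcard: address alone decides */
    if (len != p->length)
        return false;
    if (!p->datamatch_enabled)
        return true;
    return val == p->datamatch;
}
```

Existing registrations with a non-zero length keep their old behaviour, which is how the change avoids altering the kernel/userspace ABI for current users.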

08 Apr, 2014 1 commit

arm, kvm: fix double lock on cpu_add_remove_lock · 553f809e

Authored by Ming Lei on Apr 07, 2014

Commit 8146875d (arm, kvm: Fix CPU hotplug callback registration)
holds the lock before calling these two functions:

	kvm_vgic_hyp_init()
	kvm_timer_hyp_init()

Both functions call register_cpu_notifier() to register a cpu
notifier, which causes a double lock on cpu_add_remove_lock.

Since both functions are only called inside kvm_arch_init() with
cpu_add_remove_lock held, simply use __register_cpu_notifier()
to fix the problem.

Fixes: 8146875d (arm, kvm: Fix CPU hotplug callback registration)
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

04 Apr, 2014 2 commits

KVM: ioapic: try to recover if pending_eoi goes out of range · 4009b249

Authored by Paolo Bonzini on Mar 28, 2014

The RTC tracking code tracks the cardinality of rtc_status.dest_map
into rtc_status.pending_eoi.  It has some WARN_ONs that trigger if
pending_eoi ever becomes negative; however, these do not do anything
to recover, and bad things will happen soon after they trigger.

When the next RTC interrupt is triggered, rtc_check_coalesced() will
return false, but ioapic_service will find pending_eoi != 0 and
do a BUG_ON.  To avoid this, should pending_eoi ever be nonzero,
call kvm_rtc_eoi_tracking_restore_all to recompute a correct
dest_map and pending_eoi.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155) · 5678de3f

Authored by Paolo Bonzini on Mar 28, 2014

QE reported that they got the BUG_ON in ioapic_service to trigger.
I cannot reproduce it, but there are two reasons why this could happen.

The less likely but also easiest one is when kvm_irq_delivery_to_apic
does not deliver to any APIC and returns -1.

Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
function is never reached. However, you can target the similar loop in
kvm_irq_delivery_to_apic_fast; just program a zero logical destination
address into the IOAPIC, or an out-of-range physical destination address.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
