提交 · f3bff6318fa0f54956b02ed451d9b120441006ea · openanolis / cloud-kernel

07 4月, 2013 1 次提交

kvm: fix MMIO/PIO collision misdetection · 05e07f9b

由 Michael S. Tsirkin 提交于 4月 04, 2013

PIO and MMIO are separate address spaces, but
ioeventfd registration code mistakenly detected
two eventfds as duplicate if they use the same address,
even if one is PIO and another one MMIO.
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

05e07f9b

06 3月, 2013 2 次提交

KVM: ioeventfd for virtio-ccw devices. · 2b83451b

由 Cornelia Huck 提交于 2月 28, 2013

Enhance KVM_IOEVENTFD with a new flag that allows to attach to virtio-ccw
devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS.
Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

2b83451b

KVM: Initialize irqfd from kvm_init(). · a0f155e9

由 Cornelia Huck 提交于 2月 28, 2013

Currently, eventfd introduces module_init/module_exit functions
to initialize/cleanup the irqfd workqueue. This only works, however,
if no other module_init/module_exit functions are built into the
same module.

Let's just move the initialization and cleanup to kvm_init and kvm_exit.
This way, it is also clearer where kvm startup may fail.
Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

a0f155e9

28 2月, 2013 1 次提交

hlist: drop the node parameter from iterators · b67bfe0d

由 Sasha Levin 提交于 2月 27, 2013

I'm not sure why, but the hlist for each entry iterators were conceived

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small amount of places were using the 'node' parameter, this
 was modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch which is mostly the work of Peter Senna Tschudin is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b67bfe0d

11 12月, 2012 1 次提交

kvm: Fix irqfd resampler list walk · 49f8a1a5

由 Alex Williamson 提交于 12月 06, 2012

Typo for the next pointer means we're walking random data here.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

49f8a1a5

06 12月, 2012 1 次提交

KVM: Distangle eventfd code from irqchip · 914daba8

由 Alexander Graf 提交于 10月 09, 2012

The current eventfd code assumes that when we have eventfd, we also have
irqfd for in-kernel interrupt delivery. This is not necessarily true. On
PPC we don't have an in-kernel irqchip yet, but we can still support easily
support eventfd.
Signed-off-by: NAlexander Graf <agraf@suse.de>

914daba8

23 9月, 2012 1 次提交

KVM: Add resampling irqfds for level triggered interrupts · 7a84428a

由 Alex Williamson 提交于 9月 21, 2012

To emulate level triggered interrupts, add a resample option to
KVM_IRQFD.  When specified, a new resamplefd is provided that notifies
the user when the irqchip has been resampled by the VM.  This may, for
instance, indicate an EOI.  Also in this mode, posting of an interrupt
through an irqfd only asserts the interrupt.  On resampling, the
interrupt is automatically de-asserted prior to user notification.
This enables level triggered interrupts to be posted and re-enabled
from vfio with no userspace intervention.

All resampling irqfds can make use of a single irq source ID, so we
reserve a new one for this interface.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7a84428a

21 8月, 2012 1 次提交

workqueue: deprecate flush[_delayed]_work_sync() · 43829731

由 Tejun Heo 提交于 8月 20, 2012

flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>

43829731

03 7月, 2012 2 次提交

KVM: Sanitize KVM_IRQFD flags · 326cf033

由 Alex Williamson 提交于 6月 29, 2012

We only know of one so far.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

326cf033

KVM: Pass kvm_irqfd to functions · d4db2935

由 Alex Williamson 提交于 6月 29, 2012

Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

d4db2935

26 9月, 2011 1 次提交

KVM: Intelligent device lookup on I/O bus · 743eeb0b

由 Sasha Levin 提交于 7月 27, 2011

Currently the method of dealing with an IO operation on a bus (PIO/MMIO)
is to call the read or write callback for each device registered
on the bus until we find a device which handles it.

Since the number of devices on a bus can be significant due to ioeventfds
and coalesced MMIO zones, this leads to a lot of overhead on each IO
operation.

Instead of registering devices, we now register ranges which points to
a device. Lookup is done using an efficient bsearch instead of a linear
search.

Performance test was conducted by comparing exit count per second with
200 ioeventfds created on one byte and the guest is trying to access a
different byte continuously (triggering usermode exits).
Before the patch the guest has achieved 259k exits per second, after the
patch the guest does 274k exits per second.

Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

743eeb0b

06 4月, 2011 1 次提交

KVM: fix crash on irqfd deassign · 9e02fb96

由 Michael S. Tsirkin 提交于 3月 17, 2011

irqfd in kvm used flush_work incorrectly: it assumed that work scheduled
previously can't run after flush_work, but since kvm uses a non-reentrant
workqueue (by means of schedule_work) we need flush_work_sync to get that
guarantee.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reported-by: NJean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Tested-by: NJean-Philippe Menil <jean-philippe.menil@univ-nantes.fr>
Signed-off-by: NAvi Kivity <avi@redhat.com>

9e02fb96

31 3月, 2011 1 次提交

Fix common misspellings · 25985edc

由 Lucas De Marchi 提交于 3月 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: NLucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

18 3月, 2011 1 次提交

KVM: improve comment on rcu use in irqfd_deassign · c8ce057e

由 Michael S. Tsirkin 提交于 3月 06, 2011

The RCU use in kvm_irqfd_deassign is tricky: we have rcu_assign_pointer
but no synchronize_rcu: synchronize_rcu is done by kvm_irq_routing_update
which we share a spinlock with.

Fix up a comment in an attempt to make this clearer.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

c8ce057e

12 1月, 2011 1 次提交

KVM: fast-path msi injection with irqfd · bd2b53b2

由 Michael S. Tsirkin 提交于 11月 18, 2010

Store irq routing table pointer in the irqfd object,
and use that to inject MSI directly without bouncing out to
a kernel thread.

While we touch this structure, rearrange irqfd fields to make fastpath
better packed for better cache utilization.

This also adds some comments about locking rules and rcu usage in code.

Some notes on the design:
- Use pointer into the rt instead of copying an entry,
  to make it possible to use rcu, thus side-stepping
  locking complexities.  We also save some memory this way.
- Old workqueue code is still used for level irqs.
  I don't think we DTRT with level anyway, however,
  it seems easier to keep the code around as
  it has been thought through and debugged, and fix level later than
  rip out and re-instate it later.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
Acked-by: NGregory Haskins <ghaskins@novell.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

bd2b53b2

23 9月, 2010 1 次提交

KVM: fix irqfd assign/deassign race · 6bbfb265

由 Michael S. Tsirkin 提交于 9月 19, 2010

I think I see the following (theoretical) race:

During irqfd assign, we drop irqfds lock before we
schedule inject work. Therefore, deassign running
on another CPU could cause shutdown and flush to run
before inject, causing user after free in inject.

A simple fix it to schedule inject under the lock.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NGregory Haskins <ghaskins@novell.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

6bbfb265

01 8月, 2010 1 次提交
- A
  KVM: Update Red Hat copyrights · 221d059d
  由 Avi Kivity 提交于 5月 23, 2010
```
Signed-off-by: NAvi Kivity <avi@redhat.com>
```
  221d059d
30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

01 3月, 2010 3 次提交
- M
  KVM: do not store wqh in irqfd · 8b97fb0f
  由 Michael S. Tsirkin 提交于 1月 13, 2010
```
wqh is unused, so we do not need to store it in irqfd anymore
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>
```
  8b97fb0f
- M
  KVM: convert slots_lock to a mutex · 79fac95e
  由 Marcelo Tosatti 提交于 12月 23, 2009
```
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
```
  79fac95e
- M
  KVM: convert io_bus to SRCU · e93f8a0f
  由 Marcelo Tosatti 提交于 12月 23, 2009
```
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
```
  e93f8a0f
25 1月, 2010 2 次提交

KVM: fix spurious interrupt with irqfd · b6a114d2

由 Michael S. Tsirkin 提交于 1月 13, 2010

kvm didn't clear irqfd counter on deassign, as a result we could get a
spurious interrupt when irqfd is assigned back. this leads to poor
performance and, in theory, guest crash.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

b6a114d2

KVM: only allow one gsi per fd · f1d1c309

由 Michael S. Tsirkin 提交于 1月 13, 2010

Looks like repeatedly binding same fd to multiple gsi's with irqfd can
use up a ton of kernel memory for irqfd structures.

A simple fix is to allow each fd to only trigger one gsi: triggering a
storm of interrupts in guest is likely useless anyway, and we can do it
by binding a single gsi to many interrupts if we really want to.

Cc: stable@kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NAcked-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

f1d1c309

03 12月, 2009 1 次提交

KVM: Drop kvm->irq_lock lock from irq injection path · 680b3648

由 Gleb Natapov 提交于 8月 24, 2009

The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

680b3648

10 9月, 2009 4 次提交

KVM: correct error-handling code · 6223011f

由 Julia Lawall 提交于 7月 28, 2009

This code is not executed before file has been initialized to the result of
calling eventfd_fget.  This function returns an ERR_PTR value in an error
case instead of NULL.  Thus the test that file is not NULL is always true.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = eventfd_fget(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NAvi Kivity <avi@redhat.com>

6223011f

KVM: add ioeventfd support · d34e6b17

由 Gregory Haskins 提交于 7月 07, 2009

ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible.  All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:      153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

these are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

d34e6b17

KVM: switch irq injection/acking data structures to irq_lock · fa40a821

由 Marcelo Tosatti 提交于 6月 04, 2009

Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                               CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()                -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()               mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

[gleb: fix ia64 path]
Reported-by: NAlex Williamson <alex.williamson@hp.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

fa40a821

KVM: irqfd · 721eecbf

由 Gregory Haskins 提交于 5月 20, 2009

KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure. This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism: Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.
Signed-off-by: NGregory Haskins <ghaskins@novell.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

721eecbf

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功