提交 · 7ced6c98c7ab7a1f6743931e28671b833af79b1e · openanolis / cloud-kernel

11 4月, 2018 1 次提交

vhost: Fix vhost_copy_to_user() · 7ced6c98

由 Eric Auger 提交于 4月 11, 2018

vhost_copy_to_user is used to copy vring used elements to userspace.
We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.

Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache")
Signed-off-by: NEric Auger <eric.auger@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7ced6c98

30 3月, 2018 1 次提交

vhost: validate log when IOTLB is enabled · d65026c6

由 Jason Wang 提交于 3月 29, 2018

Vq log_base is the userspace address of bitmap which has nothing to do
with IOTLB. So it needs to be validated unconditionally otherwise we
may try use 0 as log_base which may lead to pin pages that will lead
unexpected result (e.g trigger BUG_ON() in set_bit_to_user()).

Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d65026c6

28 3月, 2018 1 次提交

vhost: correctly remove wait queue during poll failure · dc6455a7

由 Jason Wang 提交于 3月 27, 2018

We tried to remove vq poll from wait queue, but do not check whether
or not it was in a list before. This will lead double free. Fixing
this by switching to use vhost_poll_stop() which zeros poll->wqh after
removing poll from waitqueue to make sure it won't be freed twice.

Cc: Darren Kenny <darren.kenny@oracle.com>
Reported-by: syzbot+c0272972b01b872e604a@syzkaller.appspotmail.com
Fixes: 2b8b328b ("vhost_net: handle polling errors when setting backend")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NDarren Kenny <darren.kenny@oracle.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc6455a7

20 3月, 2018 1 次提交

vhost: fix vhost ioctl signature to build with clang · 26b36604

由 Sonny Rao 提交于 3月 14, 2018

Clang is particularly anal about signed vs unsigned comparisons and
doesn't like the fact that some ioctl numbers set the MSB, so we get
this error when trying to build vhost on aarch64:

drivers/vhost/vhost.c:1400:7: error: overflow converting case value to
 switch condition type (3221794578 to 18446744072636378898)
 [-Werror, -Wswitch]
        case VHOST_GET_VRING_BASE:

3221794578 is 0xC008AF12 in hex
18446744072636378898 is 0xFFFFFFFFC008AF12 in hex

Fix this by using unsigned ints in the function signature for
vhost_vring_ioctl().
Signed-off-by: NSonny Rao <sonnyrao@chromium.org>
Reviewed-by: NDarren Kenny <darren.kenny@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

26b36604

12 2月, 2018 1 次提交

vfs: do bulk POLL* -> EPOLL* replacement · a9a08845

由 Linus Torvalds 提交于 2月 11, 2018

This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a9a08845

01 2月, 2018 5 次提交

vhost: don't hold onto file pointer for VHOST_SET_LOG_FD · d25cc43c

由 Eric Biggers 提交于 1月 06, 2018

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well.  So get
rid of vhost_dev->log_file.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NJason Wang <jasowang@redhat.com>

d25cc43c

vhost: don't hold onto file pointer for VHOST_SET_VRING_ERR · 09f332a5

由 Eric Biggers 提交于 1月 06, 2018

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well.  So get
rid of vhost_virtqueue->error.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NJason Wang <jasowang@redhat.com>

09f332a5

vhost: don't hold onto file pointer for VHOST_SET_VRING_CALL · e050c7d9

由 Eric Biggers 提交于 1月 06, 2018

We already hold a reference to the eventfd_ctx, which is sufficient;
there's no need to hold a reference to the struct file as well.  So get
rid of vhost_virtqueue->call.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NJason Wang <jasowang@redhat.com>

e050c7d9

夷

vhost: remove unused lock check flag in vhost_dev_cleanup() · f6f93f75

由夷则(Caspar) 提交于 12月 25, 2017

In commit ea5d4046 ("vhost: fix release path lockdep checks"),
Michael added a flag to check whether we should hold a lock in
vhost_dev_cleanup(), however, in commit 47283bef ("vhost: move
memory pointer to VQs"), RCU operations have been replaced by
mutex, we can remove the no-longer-used `locked' parameter now.
Signed-off-by: NCaspar Zhang <jinli.zjl@alibaba-inc.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

f6f93f75

vhost: Remove the unused variable. · ac964d7a

由 Tonghao Zhang 提交于 1月 09, 2018

The patch (7235acdb) changed the way of the work
flushing in which the queued seq, done seq, and the
flushing are not used anymore. Then remove them now.

Fixes: 7235acdb ("vhost: simplify work flushing")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

ac964d7a

25 1月, 2018 2 次提交

vhost: do not try to access device IOTLB when not initialized · 6f3180af

由 Jason Wang 提交于 1月 23, 2018

The code will try to access dev->iotlb when processing
VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead
to NULL pointer dereference. Fixes this by check dev->iotlb before.

Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f3180af

vhost: use mutex_lock_nested() in vhost_dev_lock_vqs() · e9cb4239

由 Jason Wang 提交于 1月 23, 2018

We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to
hold mutexes of all virtqueues. This may confuse lockdep to report a
possible deadlock because of trying to hold locks belong to same
class. Switch to use mutex_lock_nested() to avoid false positive.

Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e9cb4239

06 12月, 2017 1 次提交

drivers/vhost: Remove now-redundant read_barrier_depends() · 3a5db0b1

由 Paul E. McKenney 提交于 11月 27, 2017

Because READ_ONCE() now implies read_barrier_depends(), the
read_barrier_depends() in next_desc() is now redundant.  This commit
therefore removes it and the related comments.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: <kvm@vger.kernel.org>
Cc: <virtualization@lists.linux-foundation.org>
Cc: <netdev@vger.kernel.org>

3a5db0b1

29 11月, 2017 1 次提交
- A
  the rest of drivers/*: annotate ->poll() instances · afc9a42b
  由 Al Viro 提交于 7月 03, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  afc9a42b
28 11月, 2017 3 次提交

vhost: annotate vhost_poll · 58e3b602

由 Al Viro 提交于 7月 03, 2017

its ->mask is POLL... bitmap
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

58e3b602

annotate poll-related wait keys · 3ad6f93e

由 Al Viro 提交于 7月 03, 2017

__poll_t is also used as wait key in some waitqueues.
Verify that wait_..._poll() gets __poll_t as key and
provide a helper for wakeup functions to get back to
that __poll_t value.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3ad6f93e

A
anntotate the places where ->poll() return values go · e6c8adca
由 Al Viro 提交于 7月 03, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e6c8adca

15 11月, 2017 1 次提交

vhost: fix end of range for access_ok · ca2c5b33

由 Michael S. Tsirkin 提交于 8月 21, 2017

During access_ok checks, addr increases as we iterate over the data
structure, thus addr + len - 1 will point beyond the end of region we
are translating.  Harmless since we then verify that the region covers
addr, but let's not waste cpu cycles.
Reported-by: NKoichiro Den <den@klaipeden.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NKoichiro Den <den@klaipeden.com>

ca2c5b33

09 9月, 2017 1 次提交

lib/interval_tree: fast overlap detection · f808c13f

由 Davidlohr Bueso 提交于 9月 08, 2017

Allow interval trees to quickly check for overlaps to avoid unnecesary
tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require using a
'rb_root_cached' such that we can have the leftmost node easily
available.  While most users will make use of this feature, those with
special functions (in addition to the generic insert, delete, search
calls) will avoid using the cached option as they can do funky things
with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
  Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: NDoug Ledford <dledford@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f808c13f

30 7月, 2017 1 次提交

Revert "vhost: cache used event for better performance" · 8d65843c

由 Jason Wang 提交于 7月 27, 2017

This reverts commit 809ecb9b. Since it
was reported to break vhost_net. We want to cache used event and use
it to check for notification. The assumption was that guest won't move
the event idx back, but this could happen in fact when 16 bit index
wraps around after 64K entries.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d65843c

20 6月, 2017 1 次提交

sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9

由 Ingo Molnar 提交于 6月 20, 2017

Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

ac6424b9

09 5月, 2017 1 次提交

mm: support __GFP_REPEAT in kvmalloc_node for >32kB · 6c5ab651

由 Michal Hocko 提交于 5月 08, 2017

vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp.
vhost_vsock because it would really like to prefer kmalloc to the
vmalloc fallback - see 23cc5a99 ("vhost-net: extend device
allocation to vmalloc") for more context.  Michael Tsirkin has also
noted:

 "__GFP_REPEAT overhead is during allocation time. Using vmalloc means
  all accesses are slowed down. Allocation is not on data path, accesses
  are."

The similar applies to other vhost_kvzalloc users.

Let's teach kvmalloc_node to handle __GFP_REPEAT properly.  There are
two things to be careful about.  First we should prevent from the OOM
killer and so have to involve __GFP_NORETRY by default and secondly
override __GFP_REPEAT for !costly order requests as the __GFP_REPEAT is
ignored for !costly orders.

Supporting __GFP_REPEAT like semantic for !costly request is possible it
would require changes in the page allocator.  This is out of scope of
this patch.

This patch shouldn't introduce any functional change.

Link: http://lkml.kernel.org/r/20170306103032.2540-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6c5ab651

02 3月, 2017 3 次提交

sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1

由 Ingo Molnar 提交于 2月 02, 2017

sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>

Fix up affected files that include this signal functionality via sched.h.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

174cd4b1

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> · 6e84f315

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

6e84f315

vhost: introduce O(1) vq metadata cache · f8894913

由 Jason Wang 提交于 2月 28, 2017

When device IOTLB is enabled, all address translations were stored in
interval tree. O(lgN) searching time could be slow for virtqueue
metadata (avail, used and descriptors) since they were accessed much
often than other addresses. So this patch introduces an O(1) array
which points to the interval tree nodes that store the translations of
vq metadata. Those array were update during vq IOTLB prefetching and
were reset during each invalidation and tlb update. Each time we want
to access vq metadata, this small array were queried before interval
tree. This would be sufficient for static mappings but not dynamic
mappings, we could do optimizations on top.

Test were done with l2fwd in guest (2M hugepage):

   noiommu  | before        | after
tx 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%)
rx 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

f8894913

28 2月, 2017 1 次提交

vhost: try avoiding avail index access when getting descriptor · e3b56cdd

由 Jason Wang 提交于 2月 07, 2017

If last avail idx is not equal to cached avail idx, we're sure there's
still available buffers in the virtqueue so there's no need to re-read
avail idx. So let's skip this to avoid unnecessary userspace memory
access and memory barrier. Pktgen test show about 3% improvement on rx
pps.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

e3b56cdd

04 2月, 2017 1 次提交

vhost: fix initialization for vq->is_le · cda8bba0

由 Halil Pasic 提交于 1月 30, 2017

Currently, under certain circumstances vhost_init_is_le does just a part
of the initialization job, and depends on vhost_reset_is_le being called
too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
when vq->private_data is NULL. This is not only counter intuitive, but
also real a problem because it breaks vhost_net. The bug was introduced to
vhost_net with commit 2751c988 ("vhost: cross-endian support for
legacy devices"). The symptom is corruption of the vq's used.idx field
(virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
shutdown on a vq with pending descriptors.

Let us make sure the outcome of vhost_init_is_le never depend on the state
it is actually supposed to initialize, and fix virtio_net by removing the
reset from vhost_vq_init_access.

With the above, there is no reason for vhost_reset_is_le to do just half
of the job. Let us make vhost_reset_is_le reinitialize is_le.
Signed-off-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: NMichael A. Tebolt <miket@us.ibm.com>
Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: commit 2751c988 ("vhost: cross-endian support for legacy devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NGreg Kurz <groug@kaod.org>
Tested-by: NMichael A. Tebolt <miket@us.ibm.com>

cda8bba0

19 1月, 2017 1 次提交

vhost: better detection of available buffers · 275bf960

由 Jason Wang 提交于 1月 18, 2017

This patch tries to do several tweaks on vhost_vq_avail_empty() for a
better performance:

- check cached avail index first which could avoid userspace memory access.
- using unlikely() for the failure of userspace access
- check vq->last_avail_idx instead of cached avail index as the last
  step.

This patch is need for batching supports which needs to peek whether
or not there's still available buffers in the ring.
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

275bf960

16 12月, 2016 1 次提交

vhost: cache used event for better performance · 809ecb9b

由 Jason Wang 提交于 12月 12, 2016

When event index was enabled, we need to fetch used event from
userspace memory each time. This userspace fetch (with memory
barrier) could be saved sometime when 1) caching used event and 2)
if used event is ahead of new and old to new updating does not cross
it, we're sure there's no need to notify guest.

This will be useful for heavy tx load e.g guest pktgen test with Linux
driver shows ~3.5% improvement.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

809ecb9b

15 12月, 2016 2 次提交

vhost: add missing __user annotations · 72952cc0

由 Michael S. Tsirkin 提交于 12月 06, 2016

Several vhost functions were missing __user annotations
on pointers, causing sparse warnings. Fix this up.

sparse also warns about vhost_process_iotlb_msg which
is local and should be static. Fix that up as well.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

72952cc0

vhost: make interval tree static inline · 2f952c01

由 Michael S. Tsirkin 提交于 12月 06, 2016

vhost_umem_interval_tree is only used locally within vhost.c, mark it
static. As some functions generated go unused, this triggers warnings
unless we also mark it inline.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

2f952c01

09 12月, 2016 1 次提交

vhost: remove unnecessary smp_mb from vhost_work_queue · 635abf01

由 Peng Tao 提交于 12月 07, 2016

test_and_set_bit() already implies a memory barrier.
Signed-off-by: NPeng Tao <bergwolf@gmail.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

635abf01

06 12月, 2016 1 次提交

[iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8

由 Al Viro 提交于 11月 01, 2016

copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et.al., advancing iterator only in case of successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cbbd26b8

02 8月, 2016 6 次提交

vhost: detect 32 bit integer wrap around · ec33d031

由 Michael S. Tsirkin 提交于 8月 01, 2016

Detect and fail early if long wrap around is triggered.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

ec33d031

vhost: new device IOTLB API · 6b1e6cc7

由 Jason Wang 提交于 6月 23, 2016

This patch tries to implement an device IOTLB for vhost. This could be
used with userspace(qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.

The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.

When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
addresses set by ioctl are treated as iova instead of virtual address and
the accessing can only be done through IOTLB instead of direct userspace
memory access. Before each round or vq processing, all vq metadata is
prefetched in device IOTLB to make sure no translation fault happens
during vq processing.

In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.
Signed-off-by: NJason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>

6b1e6cc7

vhost: convert pre sorted vhost memory array to interval tree · a9709d68

由 Jason Wang 提交于 6月 23, 2016

Current pre-sorted memory region array has some limitations for future
device IOTLB conversion:

1) need extra work for adding and removing a single region, and it's
   expected to be slow because of sorting or memory re-allocation.
2) need extra work of removing a large range which may intersect
   several regions with different size.
3) need trick for a replacement policy like LRU

To overcome the above shortcomings, this patch convert it to interval
tree which can easily address the above issue with almost no extra
work.

The patch could be used for:

- Extend the current API and only let the userspace to send diffs of
  memory table.
- Simplify Device IOTLB implementation.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

a9709d68

vhost: introduce vhost memory accessors · bfe2bc51

由 Jason Wang 提交于 6月 23, 2016

This patch introduces vhost memory accessors which were just wrappers
for userspace address access helpers. This is a requirement for vhost
device iotlb implementation which will add iotlb translations in those
accessors.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

bfe2bc51

vhost: lockless enqueuing · 04b96e55

由 Jason Wang 提交于 4月 25, 2016

We use spinlock to synchronize the work list now which may cause
unnecessary contentions. So this patch switch to use llist to remove
this contention. Pktgen tests shows about 5% improvement:

Before:
~1300000 pps
After:
~1370000 pps
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

04b96e55

vhost: simplify work flushing · 7235acdb

由 Jason Wang 提交于 4月 25, 2016

We used to implement the work flushing through tracking queued seq,
done seq, and the number of flushing. This patch simplify this by just
implement work flushing through another kind of vhost work with
completion. This will be used by lockless enqueuing patch.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

7235acdb

11 3月, 2016 1 次提交

vhost_net: basic polling support · 03088137

由 Jason Wang 提交于 3月 04, 2016

This patch tries to poll for new added tx buffer or socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling were specified through a new kind of vring ioctl.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

03088137

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功