Commits · e3b56cdd4351f0e227d4d847eeadff4c82aef1b9 · openanolis / cloud-kernel

28 Feb, 2017 1 commit

vhost: try avoiding avail index access when getting descriptor · e3b56cdd

Authored by Jason Wang on Feb 07, 2017

If last avail idx is not equal to cached avail idx, we're sure there are
still available buffers in the virtqueue so there's no need to re-read
avail idx. So let's skip this to avoid unnecessary userspace memory
access and memory barrier. Pktgen tests show about 3% improvement on rx
pps.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
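
A minimal sketch of the idea in C, under stated assumptions: `vq_sketch`, `vq_has_buffer` and the `guest_reads` counter are hypothetical names, and a plain pointer stands in for the avail index that vhost reads from guest memory. This is not the kernel code, and the memory barrier mentioned in the commit message is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a vhost virtqueue. */
struct vq_sketch {
    uint16_t last_avail_idx;      /* next available entry the host will consume */
    uint16_t avail_idx;           /* cached copy of the guest's avail index */
    const uint16_t *guest_avail;  /* stand-in for the index in guest memory */
    unsigned int guest_reads;     /* counts the expensive re-reads */
};

/* Returns true if at least one buffer is available. The guest index is
 * re-read only when the cached value says the ring is drained. */
bool vq_has_buffer(struct vq_sketch *vq)
{
    if (vq->last_avail_idx == vq->avail_idx) {
        vq->avail_idx = *vq->guest_avail;
        vq->guest_reads++;
        if (vq->last_avail_idx == vq->avail_idx)
            return false;
    }
    return true;
}
```

With two buffers published by the guest, only the first call and the call that finds the ring drained touch the guest index; the call in between is answered from the cached value.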

04 Feb, 2017 1 commit

vhost: fix initialization for vq->is_le · cda8bba0

Authored by Halil Pasic on Jan 30, 2017

Currently, under certain circumstances vhost_init_is_le does just a part
of the initialization job, and depends on vhost_reset_is_le being called
too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
when vq->private_data is NULL. This is not only counterintuitive, but
also a real problem because it breaks vhost_net. The bug was introduced to
vhost_net with commit 2751c988 ("vhost: cross-endian support for
legacy devices"). The symptom is corruption of the vq's used.idx field
(virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
shutdown on a vq with pending descriptors.

Let us make sure the outcome of vhost_init_is_le never depends on the state
it is actually supposed to initialize, and fix virtio_net by removing the
reset from vhost_vq_init_access.

With the above, there is no reason for vhost_reset_is_le to do just half
of the job. Let us make vhost_reset_is_le reinitialize is_le.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Michael A. Tebolt <miket@us.ibm.com>
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Fixes: commit 2751c988 ("vhost: cross-endian support for legacy devices")
Cc: <stable@vger.kernel.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Michael A. Tebolt <miket@us.ibm.com>

16 Dec, 2016 1 commit

vhost: cache used event for better performance · 809ecb9b

Authored by Jason Wang on Dec 12, 2016

When event index is enabled, we need to fetch the used event from
userspace memory each time. This userspace fetch (with memory
barrier) can sometimes be avoided by 1) caching the used event and 2)
checking it: if the used event is ahead of new and the old-to-new update
does not cross it, we're sure there's no need to notify the guest.

This will be useful for heavy tx load, e.g. a guest pktgen test with the
Linux driver shows ~3.5% improvement.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
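
A small sketch of the two conditions described in the message. `need_event` mirrors the arithmetic of `vring_need_event()` from the virtio ring header; `can_skip_notify` and its parameter names are illustrative assumptions, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when event_idx lies in the window [old_idx, new_idx), modulo 2^16. */
bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Hypothetical helper: decide from the cached used event alone whether the
 * guest notification can be skipped without re-reading guest memory. */
bool can_skip_notify(uint16_t cached_event, uint16_t old_idx,
                     uint16_t new_idx, uint16_t ring_size)
{
    /* 1) the cached used event is ahead of new */
    bool ahead_of_new = need_event(cached_event,
                                   (uint16_t)(new_idx + ring_size), new_idx);
    /* 2) the old-to-new update did not cross the cached used event */
    bool crossed = need_event(cached_event, new_idx, old_idx);

    return ahead_of_new && !crossed;
}
```

When `can_skip_notify` returns false, the caller would still have to fetch the current used event from the guest before deciding; the cache only short-circuits the common case.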

15 Dec, 2016 2 commits

vhost: add missing __user annotations · 72952cc0

Authored by Michael S. Tsirkin on Dec 06, 2016

Several vhost functions were missing __user annotations
on pointers, causing sparse warnings. Fix this up.

sparse also warns about vhost_process_iotlb_msg which
is local and should be static. Fix that up as well.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: make interval tree static inline · 2f952c01

Authored by Michael S. Tsirkin on Dec 06, 2016

vhost_umem_interval_tree is only used locally within vhost.c, mark it
static. As some functions generated go unused, this triggers warnings
unless we also mark it inline.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

09 Dec, 2016 1 commit

vhost: remove unnecessary smp_mb from vhost_work_queue · 635abf01

Authored by Peng Tao on Dec 07, 2016

test_and_set_bit() already implies a memory barrier.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
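
A userspace analogue of the point above, written with C11 atomics rather than the kernel primitive. `mark_queued` and `WORK_QUEUED_BIT` are hypothetical names; the sketch only illustrates that a fully ordered atomic read-modify-write does not need a separate fence in front of it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORK_QUEUED_BIT 1UL  /* hypothetical flag bit */

/* Returns true if this caller set the bit first and should queue the work. */
bool mark_queued(atomic_ulong *flags)
{
    /* atomic_fetch_or() is sequentially consistent by default, so this
     * analogue needs no extra fence before the read-modify-write. */
    unsigned long old = atomic_fetch_or(flags, WORK_QUEUED_BIT);

    return !(old & WORK_QUEUED_BIT);
}
```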

06 Dec, 2016 1 commit

[iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8

Authored by Al Viro on Nov 01, 2016

copy_from_iter_full(), copy_from_iter_full_nocache() and
csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
et al., advancing the iterator only in case of a successful full copy
and returning whether it had been successful or not.

Convert some obvious users.  *NOTE* - do not blindly assume that
something is a good candidate for those unless you are sure that
not advancing iov_iter in failure case is the right thing in
this case.  Anything that does short read/short write kind of
stuff (or is in a loop, etc.) is unlikely to be a good one.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
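
To illustrate the difference in semantics, here is a sketch over a flat buffer. `byte_iter`, `iter_copy` and `iter_copy_full` are hypothetical stand-ins, not the kernel's `iov_iter` API; the point is only that the "full" flavour leaves the iterator untouched when it cannot copy everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an iterator over a flat buffer. */
struct byte_iter {
    const unsigned char *data;
    size_t remaining;
};

/* Partial-copy flavour: copies what it can and always advances. */
size_t iter_copy(struct byte_iter *it, void *dst, size_t len)
{
    size_t n = len < it->remaining ? len : it->remaining;

    memcpy(dst, it->data, n);
    it->data += n;
    it->remaining -= n;
    return n;
}

/* "Full" flavour: all or nothing; the iterator is unchanged on failure. */
bool iter_copy_full(struct byte_iter *it, void *dst, size_t len)
{
    if (len > it->remaining)
        return false;
    memcpy(dst, it->data, len);
    it->data += len;
    it->remaining -= len;
    return true;
}
```

A caller that handles short reads in a loop depends on the advancing behaviour of the first flavour, which is why the message warns against converting such callers blindly.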

02 Aug, 2016 6 commits

vhost: detect 32 bit integer wrap around · ec33d031

Authored by Michael S. Tsirkin on Aug 01, 2016

Detect and fail early if long wrap around is triggered.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
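
One common way to detect this kind of wrap before it happens, shown as a sketch; `add_wraps` is a hypothetical helper and not necessarily the check this commit adds.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if base + offset cannot be represented in an unsigned long.
 * The comparison is done before adding, so nothing actually overflows. */
bool add_wraps(unsigned long base, uint64_t offset)
{
    return offset > (uint64_t)(ULONG_MAX - base);
}
```

On a 32 bit host `unsigned long` is 32 bits wide while the offset stays 64 bits wide, which is the case the commit title refers to.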

vhost: new device IOTLB API · 6b1e6cc7

Authored by Jason Wang on Jun 23, 2016

This patch tries to implement a device IOTLB for vhost. This could be
used with a userspace (qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.

The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.

When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
addresses set by ioctl are treated as iova instead of virtual address and
the accessing can only be done through IOTLB instead of direct userspace
memory access. Before each round of vq processing, all vq metadata is
prefetched in device IOTLB to make sure no translation fault happens
during vq processing.

In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
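
The translation step can be sketched as follows. `iotlb_entry` and `iotlb_translate` are hypothetical names, and the lookup scans an array linearly for brevity, whereas the commit describes an interval tree. A miss is where vhost would report the fault to userspace and wait for an update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One cached translation: [iova, iova + size) maps to uaddr. */
struct iotlb_entry {
    uint64_t iova;
    uint64_t size;
    uint64_t uaddr;
};

/* Returns true and fills *uaddr on a hit; returns false on an IOTLB miss. */
bool iotlb_translate(const struct iotlb_entry *tlb, size_t n,
                     uint64_t iova, uint64_t *uaddr)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= tlb[i].iova && iova - tlb[i].iova < tlb[i].size) {
            *uaddr = tlb[i].uaddr + (iova - tlb[i].iova);
            return true;
        }
    }
    return false;
}
```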

vhost: convert pre sorted vhost memory array to interval tree · a9709d68

Authored by Jason Wang on Jun 23, 2016

Current pre-sorted memory region array has some limitations for future
device IOTLB conversion:

1) need extra work for adding and removing a single region, and it's
   expected to be slow because of sorting or memory re-allocation.
2) need extra work of removing a large range which may intersect
   several regions with different size.
3) need a trick for a replacement policy like LRU

To overcome the above shortcomings, this patch converts it to an interval
tree, which can easily address the above issues with almost no extra
work.

The patch could be used for:

- Extend the current API and only let userspace send diffs of the
  memory table.
- Simplify Device IOTLB implementation.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: introduce vhost memory accessors · bfe2bc51

Authored by Jason Wang on Jun 23, 2016

This patch introduces vhost memory accessors which are just wrappers
for userspace address access helpers. This is a requirement for vhost
device iotlb implementation which will add iotlb translations in those
accessors.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: lockless enqueuing · 04b96e55

Authored by Jason Wang on Apr 25, 2016

We use a spinlock to synchronize the work list now, which may cause
unnecessary contention. So this patch switches to using an llist to remove
this contention. Pktgen tests show about 5% improvement:

Before:
~1300000 pps
After:
~1370000 pps
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
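
A userspace sketch of lock-free enqueuing with C11 atomics, in the spirit of the kernel's llist. The names `work_node`, `work_list`, `work_list_add` and `work_list_del_all` are hypothetical, and this is an analogue rather than the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct work_node {
    struct work_node *next;
    int id;
};

struct work_list {
    _Atomic(struct work_node *) first;
};

/* Lock-free push. Returns true if the list was empty before the push,
 * which a producer could use as a hint that the worker needs waking. */
bool work_list_add(struct work_list *list, struct work_node *node)
{
    struct work_node *old = atomic_load(&list->first);

    do {
        node->next = old;  /* old is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak(&list->first, &old, node));

    return old == NULL;
}

/* Consumer side: detach the whole list with a single atomic exchange. */
struct work_node *work_list_del_all(struct work_list *list)
{
    return atomic_exchange(&list->first, (struct work_node *)NULL);
}
```

Producers never block each other here; the consumer receives the nodes in reverse insertion order and would have to reverse the chain if ordering matters.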

vhost: simplify work flushing · 7235acdb

Authored by Jason Wang on Apr 25, 2016

We used to implement the work flushing through tracking queued seq,
done seq, and the number of flushes in progress. This patch simplifies this
by just implementing work flushing through another kind of vhost work with
completion. This will be used by the lockless enqueuing patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

11 Mar, 2016 3 commits

vhost_net: basic polling support · 03088137

Authored by Jason Wang on Mar 04, 2016

This patch tries to poll for newly added tx buffers or the socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling is specified through a new kind of vring ioctl.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: introduce vhost_vq_avail_empty() · d4a60603

Authored by Jason Wang on Mar 04, 2016

This patch introduces a helper which will return true if we're sure
that the available ring is empty for a specific vq. When we're not
sure, e.g. on vq access failure, return false instead. This could be used
for busy polling code to exit the busy loop.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: introduce vhost_has_work() · 526d3e7f

Authored by Jason Wang on Mar 04, 2016

This patch introduces a helper which can give a hint for whether or not
there's work queued in the work list. This could be used for busy
polling code to exit the busy loop.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

02 Mar, 2016 3 commits

vhost: rename vhost_init_used() · 80f7d030

Authored by Greg Kurz on Feb 16, 2016

Looking at how callers use this, maybe we should just rename init_used
to vhost_vq_init_access. The _used suffix was a hint that we
access the vq used ring. But maybe what callers care about is
that it must be called after access_ok.

Also, this function manipulates the vq->is_le field which isn't related
to the vq used ring.

This patch simply renames vhost_init_used() to vhost_vq_init_access() as
suggested by Michael.

No behaviour change.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: rename cross-endian helpers · c5072037

Authored by Greg Kurz on Feb 16, 2016

The default use case for vhost is when the host and the vring have the
same endianness (default native endianness). But there are cases where
they differ and vhost should byteswap when accessing the vring.

The first case is when the host is big endian and the vring belongs to
a virtio 1.0 device, which is always little endian.

This is covered by the vq->is_le field. This field is initialized when
userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
stops.

We already have a vhost_init_is_le() helper, but the reset operation is
open-coded as follows:

	vq->is_le = virtio_legacy_is_little_endian();

It isn't clear that we are resetting vq->is_le here.

This patch moves the code to a helper with a more explicit name.

The other case where we may have to byteswap is when the architecture can
switch endianness at runtime (bi-endian). If endianness differs in the host
and in the guest, then legacy devices need to be used in cross-endian mode.

This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
introduces a vq->user_be field. Userspace may enable cross-endian mode
by calling the SET_VRING_ENDIAN ioctl before the device is started. The
cross-endian mode is disabled when the device is stopped.

The current names of the helpers that manipulate vq->user_be are unclear.

This patch renames those helpers to clearly show that this is cross-endian
stuff and with explicit enable/disable semantics.

No behaviour change.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: fix error path in vhost_init_used() · e1f33be9

Authored by Greg Kurz on Feb 16, 2016

We don't want side effects. If something fails, we roll back vq->is_le to
its previous value.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

07 Dec, 2015 2 commits

vhost: replace % with & on data path · 5fba13b5

Authored by Michael S. Tsirkin on Nov 29, 2015

We know vring num is a power of 2, so use &
to mask the high bits.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
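
The equivalence this relies on, as a short sketch; `ring_slot` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Maps a free-running index to a ring slot. Equivalent to idx % num,
 * but valid only when num is a power of two. */
uint16_t ring_slot(uint16_t idx, uint16_t num)
{
    return idx & (num - 1);
}
```

For a ring of 16 entries, index 17 lands in slot 1 with either operator; the mask form avoids a division on the data path.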

vhost: relax log address alignment · d5424838

Authored by Michael S. Tsirkin on Nov 16, 2015

commit 5d9a07b0 ("vhost: relax used
address alignment") fixed the alignment for the used virtual address,
but not for the physical address used for logging.

That's a mistake: alignment should clearly be the same for virtual and
physical addresses.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

27 Jul, 2015 2 commits

vhost: fix error handling for memory region alloc · 1e099473

Authored by Igor Mammedov on Jul 15, 2015

Callers of vhost_kvzalloc() expect the same behaviour on
allocation error as from kmalloc/vmalloc, i.e. a NULL return
value. So just return the value returned by vzalloc() instead of
returning ERR_PTR(-ENOMEM).

Fixes: 4de7255f ("vhost: extend memory regions allocation to vmalloc")
Spotted-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: actually track log eventfd file · 7932c0bd

Authored by Marc-André Lureau on Jul 17, 2015

While reviewing vhost log code, I found out that log_file is never
set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet).

Cc: stable@vger.kernel.org
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

14 Jul, 2015 2 commits

vhost: add max_mem_regions module parameter · c9ce42f7

Authored by Igor Mammedov on Jul 02, 2015

It became possible to use a larger number of memory
slots, which are used by memory hotplug for
registering hotplugged memory.
However, QEMU crashes if it's used with more than ~60
pc-dimm devices and vhost-net enabled, since the host kernel
in module vhost-net refuses to accept more than 64
memory regions.

Allow tweaking the limit via the max_mem_regions module parameter,
with the default value set to 64 slots.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: extend memory regions allocation to vmalloc · 4de7255f

Authored by Igor Mammedov on Jul 01, 2015

With a large number of memory regions we could end up with
high order allocations, and kmalloc could fail if the
host is under memory pressure.
Considering that the memory regions array is used on the hot path,
try harder to allocate using kmalloc and, if it fails, resort
to vmalloc.
It's still better than just failing vhost_set_memory() and
causing a guest crash when new memory is hotplugged
to the guest.

I'll still look at a QEMU side solution to reduce the number of
memory regions it feeds to vhost to make things even better,
but it doesn't hurt for the kernel to behave smarter and not
crash older QEMUs, which could use a large number of memory
regions.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

01 Jul, 2015 1 commit

vhost: use binary search instead of linear in find_region() · bcfeacab

Authored by Igor Mammedov on Jun 16, 2015

For default region layouts performance stays the same
as linear search i.e. it takes around 210ns average for
translate_desc() that inlines find_region().

But it scales better with a larger number of regions:
235ns for binary search (BS) vs 300ns for linear search (LS) with 55 memory regions,
and it will be about the same values when the allowed number
of slots is increased to 509 like it has been done in kvm.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
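
A sketch of a binary search over memory regions. The structure, the name `find_region_sketch`, and the assumption that regions are sorted by ascending start address without overlap are all illustrative; the kernel's own ordering and types may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    uint64_t guest_phys_addr;
    uint64_t memory_size;
};

/* Assumes regions are sorted by ascending guest_phys_addr and do not
 * overlap. Returns the region containing addr, or NULL if none does. */
const struct mem_region *find_region_sketch(const struct mem_region *regions,
                                            size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct mem_region *r = &regions[mid];

        if (addr < r->guest_phys_addr)
            hi = mid;                       /* addr is below this region */
        else if (addr - r->guest_phys_addr >= r->memory_size)
            lo = mid + 1;                   /* addr is above this region */
        else
            return r;                       /* addr falls inside */
    }
    return NULL;
}
```

The lookup cost grows with the logarithm of the region count, which fits the message's observation that the gain shows up only once the number of regions gets large.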

01 Jun, 2015 1 commit

vhost: cross-endian support for legacy devices · 2751c988

Authored by Greg Kurz on Apr 24, 2015

This patch brings cross-endian support to vhost when used to implement
legacy virtio devices. Since it is a relatively rare situation, the
feature availability is controlled by a kernel config option (not set
by default).

The vq->is_le boolean field is added to cache the endianness to be
used for ring accesses. It defaults to native endian, as expected
by legacy virtio devices. When the ring gets active, we force little
endian if the device is modern. When the ring is deactivated, we
revert to the native endian default.

If cross-endian was compiled in, a vq->user_be boolean field is added
so that userspace may request a specific endianness. This field is
used to override the default when activating the ring of a legacy
device. It has no effect on modern devices.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
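
The decision described above can be summarised in a small sketch. `ring_is_le` and its parameters are hypothetical; in particular, the separate `user_set` flag is an assumption made for readability and is not how the kernel stores the userspace preference.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether ring accesses use little endian.
 * modern:   the device negotiated virtio 1.0 (always little endian)
 * user_set: userspace requested a specific endianness (cross-endian mode)
 * user_be:  the requested endianness is big endian
 * host_le:  the host's native endianness is little endian */
bool ring_is_le(bool modern, bool user_set, bool user_be, bool host_le)
{
    if (modern)
        return true;        /* the userspace override has no effect here */
    if (user_set)
        return !user_be;    /* legacy device with an explicit override */
    return host_le;         /* legacy default: native endian */
}
```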

04 Feb, 2015 1 commit

vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec() · aad9a1ce

Authored by Al Viro on Dec 10, 2014

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

29 Dec, 2014 1 commit

vhost: relax used address alignment · 5d9a07b0

Authored by Michael S. Tsirkin on Dec 21, 2014

virtio 1.0 only requires used address to be 4 byte aligned,
vhost required 8 bytes (size of vring_used_elem).
Fix up vhost to match that.

Additionally, while vhost correctly requires 8 byte
alignment for log, it's unconnected to used ring:
it's a consequence that log has u64 entries.
Tweak code to make that clearer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

09 Dec, 2014 2 commits

vhost: virtio 1.0 endian-ness support · 3b1bbe89

Authored by Michael S. Tsirkin on Oct 24, 2014

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: switch to __get/__put_user exclusively · 64f7f051

Authored by Michael S. Tsirkin on Dec 01, 2014

Most places in vhost can use __get/__put_user rather than
get/put_user since addresses are pre-validated.
This should be good for performance, but this also
will help make code sparse-clean: get/put_user macros
don't play well with __virtioXX bitwise tags.
Switch get/put_user to the __ variants everywhere in vhost.
There's one exception - for consistency switch that
as well, and add an explicit access_ok check.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

09 Jun, 2014 3 commits

vhost: move memory pointer to VQs · 47283bef

Authored by Michael S. Tsirkin on Jun 05, 2014

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc
    vhost: replace rcu with mutex
replaced rcu sync for memory accesses with VQ mutex lock/unlock.
This is correct since all accesses are under VQ mutex, but incomplete:
we still do useless rcu lock/unlock operations, someone might copy this
code into some other context where this won't be right.
This use of RCU is also non standard and hard to understand.
Let's copy the pointer to each VQ structure, this way
the access rules become straight-forward, and there's
no need for RCU anymore.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: move acked_features to VQs · ea16c514

Authored by Michael S. Tsirkin on Jun 05, 2014

Refactor code to make sure features are only accessed
under VQ mutex. This makes everything simpler, no need
for RCU here anymore.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: replace rcu with mutex · 98f9ca0a

Authored by Michael S. Tsirkin on May 28, 2014

All memory accesses are done under some VQ mutex.
So lock/unlock all VQs is a faster equivalent of synchronize_rcu()
for memory access changes.
Some guests cause a lot of these changes, so it's helpful
to make them faster.
Reported-by: "Gonglei (Arei)" <arei.gonglei@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

07 Dec, 2013 1 commit

vhost: remove the dead branch · 59566b6e

Authored by Zhi Yong Wu on Dec 07, 2013

Since vhost_dev_init() always returns 0, some branches are never run
and therefore need to be removed.
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Sep, 2013 1 commit

vhost: wake up worker outside spin_lock · ac9fde24

Authored by Qin Chuanyu on Jun 07, 2013

The wake_up_process() call is enclosed by spin_lock/unlock in
vhost_work_queue,
but it could be done outside the spin_lock.
I have tested it with kernel 3.0.27 and guest suse11-sp2 using iperf;
the numbers are as below.
                  original                 modified
thread_num  tp(Gbps)   vhost(%)  |  tp(Gbps)     vhost(%)
1           9.59        28.82    |   9.59        27.49
8           9.61        32.92    |   9.62        26.77
64          9.58        46.48    |   9.55        38.99
256         9.6         63.7     |   9.6         52.59
Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

04 Sep, 2013 1 commit

vhost: switch to use vhost_add_used_n() · c49e4e57

Authored by Jason Wang on Sep 02, 2013

Let vhost_add_used() use vhost_add_used_n() to reduce code
duplication. To avoid the overhead brought by __copy_to_user(), we will use
put_user() when only one used entry needs to be added.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
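
The refactoring pattern, sketched on a toy ring: the single-entry function becomes a thin wrapper around the batch function. The structures and `RING_SIZE` are hypothetical, the function names only echo the kernel's, and the put_user() fast path for a single entry mentioned in the message is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical ring size; must be a power of two */

struct used_elem {
    uint32_t id;
    uint32_t len;
};

struct used_ring {
    struct used_elem ring[RING_SIZE];
    uint16_t idx;  /* free-running index of the next slot to fill */
};

/* Batch variant: append count entries, then publish the new index once. */
void add_used_n(struct used_ring *u, const struct used_elem *heads,
                unsigned int count)
{
    for (unsigned int i = 0; i < count; i++)
        u->ring[(u->idx + i) & (RING_SIZE - 1)] = heads[i];
    u->idx = (uint16_t)(u->idx + count);
}

/* Single-entry variant expressed through the batch variant. */
void add_used(struct used_ring *u, uint32_t head, uint32_t len)
{
    struct used_elem e = { head, len };

    add_used_n(u, &e, 1);
}
```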

21 Aug, 2013 1 commit

vhost: Include linux/uio.h instead of linux/socket.h · 35596b27

Authored by Asias He on Aug 19, 2013

memcpy_fromiovec is moved from net/core/iovec.c to lib/iovec.c.
linux/uio.h provides the declaration for memcpy_fromiovec.

Include linux/uio.h instead of linux/socket.h for it.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Jul, 2013 2 commits

vhost: Make vhost a separate module · 6ac1afbf

Authored by Asias He on May 06, 2013

Currently, vhost-net and vhost-scsi are sharing the vhost core code.
However, vhost-scsi shares the code by including the vhost.c file
directly.

Making vhost a separate module makes it easier to share code with
other vhost devices.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: Simplify dev->vqs[i] access · 6d5e6aa8

Authored by Asias He on May 06, 2013

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
