提交 · 6cdf89b1ca803b2d2d097466516431b1fc5bf985 · openanolis / cloud-kernel

16 11月, 2016 1 次提交

locking/core: Remove cpu_relax_lowlatency() users · f2f09a4c

由 Christian Borntraeger 提交于 10月 25, 2016

With the s390 special case of a yielding cpu_relax() implementation gone,
we can now remove all users of cpu_relax_lowlatency() and replace them
with cpu_relax().
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-5-git-send-email-borntraeger@de.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f2f09a4c

02 8月, 2016 2 次提交

vhost: new device IOTLB API · 6b1e6cc7

由 Jason Wang 提交于 6月 23, 2016

This patch tries to implement an device IOTLB for vhost. This could be
used with userspace(qemu) implementation of DMA remapping
to emulate an IOMMU for the guest.

The idea is simple, cache the translation in a software device IOTLB
(which is implemented as an interval tree) in vhost and use vhost_net
file descriptor for reporting IOTLB miss and IOTLB
update/invalidation. When vhost meets an IOTLB miss, the fault
address, size and access can be read from the file. After userspace
finishes the translation, it writes the translated address to the
vhost_net file to update the device IOTLB.

When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
addresses set by ioctl are treated as iova instead of virtual address and
the accessing can only be done through IOTLB instead of direct userspace
memory access. Before each round or vq processing, all vq metadata is
prefetched in device IOTLB to make sure no translation fault happens
during vq processing.

In most cases, virtqueues are contiguous even in virtual address space.
The IOTLB translation for virtqueue itself may make it a little
slower. We might add fast path cache on top of this patch.
Signed-off-by: NJason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
[mst: fix build warnings ]
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
[ weiyj.lk: missing unlock on error ]
Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>

6b1e6cc7

vhost: convert pre sorted vhost memory array to interval tree · a9709d68

由 Jason Wang 提交于 6月 23, 2016

Current pre-sorted memory region array has some limitations for future
device IOTLB conversion:

1) need extra work for adding and removing a single region, and it's
   expected to be slow because of sorting or memory re-allocation.
2) need extra work of removing a large range which may intersect
   several regions with different size.
3) need trick for a replacement policy like LRU

To overcome the above shortcomings, this patch convert it to interval
tree which can easily address the above issue with almost no extra
work.

The patch could be used for:

- Extend the current API and only let the userspace to send diffs of
  memory table.
- Simplify Device IOTLB implementation.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

a9709d68

01 7月, 2016 1 次提交

tun: switch to use skb array for tx · 1576d986

由 Jason Wang 提交于 6月 30, 2016

We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.

This patch tries to address this by:

- switch from sk_receive_queue to a skb_array, and resize it when
  tx_queue_len was changed.
- introduce a new proto_ops peek_len which was used for peeking the
  skb length.
- implement a tun version of peek_len for vhost_net to use and convert
  vhost_net to use peek_len if possible.

Pktgen test shows about 15.3% improvement on guest receiving pps for small
buffers:

Before: ~1300000pps
After : ~1500000pps
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1576d986

08 6月, 2016 1 次提交

vhost_net: stop polling socket during rx processing · 8241a1e4

由 Jason Wang 提交于 6月 01, 2016

We don't stop rx polling socket during rx processing, this will lead
unnecessary wakeups from under layer net devices (E.g
sock_def_readable() form tun). Rx will be slowed down in this
way. This patch avoids this by stop polling socket during rx
processing. A small drawback is that this introduces some overheads in
light load case because of the extra start/stop polling, but single
netperf TCP_RR does not notice any change. In a super heavy load case,
e.g using pktgen to inject packet to guest, we get about ~8.8%
improvement on pps:

before: ~1240000 pkt/s
after:  ~1350000 pkt/s
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8241a1e4

11 3月, 2016 1 次提交

vhost_net: basic polling support · 03088137

由 Jason Wang 提交于 3月 04, 2016

This patch tries to poll for new added tx buffer or socket receive
queue for a while at the end of tx/rx processing. The maximum time
spent on polling were specified through a new kind of vring ioctl.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

03088137

02 3月, 2016 1 次提交

vhost: rename vhost_init_used() · 80f7d030

由 Greg Kurz 提交于 2月 16, 2016

Looking at how callers use this, maybe we should just rename init_used
to vhost_vq_init_access. The _used suffix was a hint that we
access the vq used ring. But maybe what callers care about is
that it must be called after access_ok.

Also, this function manipulates the vq->is_le field which isn't related
to the vq used ring.

This patch simply renames vhost_init_used() to vhost_vq_init_access() as
suggested by Michael.

No behaviour change.
Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

80f7d030

16 9月, 2015 1 次提交

vhost: move features to core · 4e9fa50c

由 Michael S. Tsirkin 提交于 9月 09, 2015

virtio 1 and any layout are core features, move them
there. This fixes vhost test.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

4e9fa50c

12 4月, 2015 1 次提交

new helper: msg_data_left() · 01e97e65

由 Al Viro 提交于 12月 15, 2014

convert open-coded instances
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

01e97e65

03 3月, 2015 1 次提交

net: Remove iocb argument from sendmsg and recvmsg · 1b784140

由 Ying Xue 提交于 3月 02, 2015

After TIPC doesn't depend on iocb argument in its internal
implementations of sendmsg() and recvmsg() hooks defined in proto
structure, no any user is using iocb argument in them at all now.
Then we can drop the redundant iocb argument completely from kinds of
implementations of both sendmsg() and recvmsg() in the entire
networking stack.

Cc: Christoph Hellwig <hch@lst.de>
Suggested-by: NAl Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b784140

28 2月, 2015 2 次提交

vhost: drop hard-coded num_buffers size · 0d79a493

由 Michael S. Tsirkin 提交于 2月 25, 2015

The 2 that we use for copy_to_iter comes from sizeof(u16),
it used to be that way before the iov iter update.
Fix it up, making it obvious the size of stack access
is right.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d79a493

vhost: cleanup iterator update logic · 4c5a8442

由 Michael S. Tsirkin 提交于 2月 25, 2015

Recent iterator-related changes in vhost made it
harder to follow the logic fixing up the header.
In fact, the fixup always happens at the same
offset: sizeof(virtio_net_hdr): sometimes the
fixup iterator is updated by copy_to_iter,
sometimes-by iov_iter_advance.

Rearrange code to make this obvious.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c5a8442

16 2月, 2015 1 次提交

vhost_net: fix wrong iter offset when setting number of buffers · 0960b641

由 Jason Wang 提交于 2月 15, 2015

In commit ba7438ae ("vhost: don't bother copying iovecs in
handle_rx(), kill memcpy_toiovecend()"), we advance iov iter fixup
sizeof(struct virtio_net_hdr) bytes and fill the number of buffers
after doing the socket recvmsg(). This work well but was broken after
commit 6e03f896 ("Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") which tries
to advance sizeof(struct virtio_net_hdr_mrg_rxbuf). It will fill the
number of buffers at the wrong place. This patch fixes this.

Fixes 6e03f896
("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0960b641

05 2月, 2015 1 次提交

vhost/net: fix up num_buffers endian-ness · 5201aa49

由 Michael S. Tsirkin 提交于 2月 03, 2015

In virtio 1.0 mode, when mergeable buffers are enabled on a big-endian
host, num_buffers wasn't byte-swapped correctly, so large incoming
packets got corrupted.

To fix, fill it in within hdr - this also makes sure it gets
the correct type.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5201aa49

04 2月, 2015 2 次提交

vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend() · ba7438ae

由 Al Viro 提交于 12月 10, 2014

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ba7438ae

vhost: don't bother with copying iovec in handle_tx() · 98a527aa

由 Al Viro 提交于 12月 10, 2014

just advance the msg.msg_iter and be done with that.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

98a527aa

14 1月, 2015 1 次提交

net: rename vlan_tx_* helpers since "tx" is misleading there · df8a39de

由 Jiri Pirko 提交于 1月 13, 2015

The same macros are used for rx as well. So rename it.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df8a39de

07 1月, 2015 1 次提交

vhost/net: length miscalculation · 99975cc6

由 Michael S. Tsirkin 提交于 1月 07, 2015

commit 8b38694a
    vhost/net: virtio 1.0 byte swap
had this chunk:
-       heads[headcount - 1].len += datalen;
+       heads[headcount - 1].len = cpu_to_vhost32(vq, len - datalen);

This adds datalen with the wrong sign, causing guest panics.

Fixes: 8b38694aReported-by: NAlex Williamson <alex.williamson@redhat.com>
Suggested-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

99975cc6

10 12月, 2014 1 次提交

put iov_iter into msghdr · c0371da6

由 Al Viro 提交于 11月 24, 2014

Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c0371da6

09 12月, 2014 4 次提交

M
vhost/net: enable virtio 1.0 · 41e3e421
由 Michael S. Tsirkin 提交于 10月 24, 2014
```
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
```
41e3e421

vhost/net: larger header for virtio 1.0 · e4fca7d6

由 Michael S. Tsirkin 提交于 10月 24, 2014

Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>

e4fca7d6

vhost/net: virtio 1.0 byte swap · 8b38694a

由 Michael S. Tsirkin 提交于 10月 24, 2014

I had to add an explicit tag to suppress compiler warning:
gcc isn't smart enough to notice that
len is always initialized since function is called with size > 0.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>

8b38694a

vhost/net: force len for TX to host endian · bf995734

由 Michael S. Tsirkin 提交于 10月 24, 2014

vhost/net keeps a copy of the used ring in host memory but (ab)uses
the length field for internal house-keeping. This works because the
length in the used ring for tx is always 0. In order to suppress sparse
warnings, we force native endianness here.
Note that these values are never exposed to guests.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NJason Wang <jasowang@redhat.com>

bf995734

23 6月, 2014 1 次提交

vhost-net: don't open-code kvfree · d04257b0

由 Romain Francoise 提交于 6月 12, 2014

Commit 23cc5a99 ("vhost-net: extend device allocation to vmalloc")
added another open-coded version of kvfree (which is available since
v3.15-rc5), nuke it.
Signed-off-by: NRomain Francoise <romain@orebokech.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

d04257b0

09 6月, 2014 3 次提交

vhost: move memory pointer to VQs · 47283bef

由 Michael S. Tsirkin 提交于 6月 05, 2014

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc
    vhost: replace rcu with mutex
replaced rcu sync for memory accesses with VQ mutex locl/unlock.
This is correct since all accesses are under VQ mutex, but incomplete:
we still do useless rcu lock/unlock operations, someone might copy this
code into some other context where this won't be right.
This use of RCU is also non standard and hard to understand.
Let's copy the pointer to each VQ structure, this way
the access rules become straight-forward, and there's
no need for RCU anymore.
Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

47283bef

vhost: move acked_features to VQs · ea16c514

由 Michael S. Tsirkin 提交于 6月 05, 2014

Refactor code to make sure features are only accessed
under VQ mutex. This makes everything simpler, no need
for RCU here anymore.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

ea16c514

vhost-net: extend device allocation to vmalloc · 23cc5a99

由 Michael S. Tsirkin 提交于 1月 23, 2013

Michael Mueller provided a patch to reduce the size of
vhost-net structure as some allocations could fail under
memory pressure/fragmentation. We are still left with
high order allocations though.

This patch is handling the problem at the core level, allowing
vhost structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

People are still looking at cleaner ways to handle the problem
at the API level, probably passing in multiple iovecs.
This hack seems consistent with approaches
taken since then by drivers/vhost/scsi.c and net/core/dev.c

Based on patch by Romain Francoise.

Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: NRomain Francoise <romain@orebokech.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>

23cc5a99

02 4月, 2014 1 次提交
- A
  vhost: don't open-code sockfd_put() · 09aaacf0
  由 Al Viro 提交于 3月 05, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  09aaacf0
29 3月, 2014 2 次提交

vhost: validate vhost_get_vq_desc return value · a39ee449

由 Michael S. Tsirkin 提交于 3月 27, 2014

vhost fails to validate negative error code
from vhost_get_vq_desc causing
a crash: we are using -EFAULT which is 0xfffffff2
as vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014ad
    vhost-net: mergeable buffers support

CVE-2014-0055
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a39ee449

vhost: fix total length when packets are too short · d8316f39

由 Michael S. Tsirkin 提交于 3月 27, 2014

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order for make recvmsg
truncate the packet and then handle_rx would
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg - which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8316f39

14 2月, 2014 2 次提交

vhost: fix a theoretical race in device cleanup · b0c057ca

由 Michael S. Tsirkin 提交于 2月 13, 2014

vhost_zerocopy_callback accesses VQ right after it drops a ubuf
reference.  In theory, this could race with device removal which waits
on the ubuf kref, and crash on use after free.

Do all accesses within rcu read side critical section, and synchronize
on release.

Since callbacks are always invoked from bh, synchronize_rcu_bh seems
enough and will help release complete a bit faster.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0c057ca

vhost: fix ref cnt checking deadlock · 0ad8b480

由 Michael S. Tsirkin 提交于 2月 13, 2014

vhost checked the counter within the refcnt before decrementing.  It
really wanted to know that it is the one that has the last reference, as
a way to batch freeing resources a bit more efficiently.

Note: we only let refcount go to 0 on device release.

This works well but we now access the ref counter twice so there's a
race: all users might see a high count and decide to defer freeing
resources.
In the end no one initiates freeing resources until the last reference
is gone (which is on VM shotdown so might happen after a looooong time).

Let's do what we probably should have done straight away:
switch from kref to plain atomic, documenting the
semantics, return the refcount value atomically after decrement,
then use that to avoid the deadlock.
Reported-by: NQin Chuanyu <qinchuanyu@huawei.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ad8b480

07 12月, 2013 1 次提交

vhost: remove the dead branch · 59566b6e

由 Zhi Yong Wu 提交于 12月 07, 2013

Since vhost_dev_init() forever return 0, some branches are never run,
therefore need to be removed.
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59566b6e

04 9月, 2013 5 次提交

vhost_net: correctly limit the max pending buffers · f7c6be40

由 Jason Wang 提交于 9月 02, 2013

As Michael point out, We used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was one done when there's no
new buffers submitted from guest. Guest can easily exceeds the limitation by
keeping sending packets.

So this patch moves the check into main loop. Tests shows about 5%-10%
improvement on per cpu throughput for guest tx.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f7c6be40

vhost_net: poll vhost queue after marking DMA is done · 19c73b3e

由 Jason Wang 提交于 9月 02, 2013

We used to poll vhost queue before making DMA is done, this is racy if vhost
thread were waked up before marking DMA is done which can result the signal to
be missed. Fix this by always polling the vhost thread before DMA is done.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19c73b3e

vhost_net: determine whether or not to use zerocopy at one time · ce21a029

由 Jason Wang 提交于 9月 02, 2013

Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and rollback this choice
later. This could be avoided by determining zerocopy once by checking all
conditions at one time before.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce21a029

vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() · c92112ae

由 Jason Wang 提交于 9月 02, 2013

We tend to batch the used adding and signaling in vhost_zerocopy_callback()
which may result more than 100 used buffers to be updated in
vhost_zerocopy_signal_used() in some cases. So switch to use
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). Which means much less times of used index
updating and memory barriers.

2% performance improvement were seen on netperf TCP_RR test.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92112ae

vhost_net: make vhost_zerocopy_signal_used() return void · 094afe7d

由 Jason Wang 提交于 9月 02, 2013

None of its caller use its return value, so let it return void.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

094afe7d

11 7月, 2013 2 次提交

vhost: Remove custom vhost rcu usage · 22fa90c7

由 Asias He 提交于 5月 07, 2013

Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.
Signed-off-by: NAsias He <asias@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

22fa90c7

A
vhost-net: Always access vq->private_data under vq mutex · 2e26af79
由 Asias He 提交于 5月 07, 2013
```
Signed-off-by: NAsias He <asias@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
```
2e26af79

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功