Commits · d838df2e5dcbb6ed4d82854869e9a30f9aeef6da · openeuler / Kernel

23 Jun, 2014 1 commit

vhost-net: don't open-code kvfree · d04257b0

Authored by Romain Francoise on Jun 12, 2014

Commit 23cc5a99 ("vhost-net: extend device allocation to vmalloc")
added another open-coded version of kvfree (which has been available since
v3.15-rc5); nuke it.
Signed-off-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

d04257b0

09 Jun, 2014 3 commits

vhost: move memory pointer to VQs · 47283bef

Authored by Michael S. Tsirkin on Jun 05, 2014

commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc
    vhost: replace rcu with mutex
replaced rcu sync for memory accesses with VQ mutex lock/unlock.
This is correct since all accesses are under VQ mutex, but incomplete:
we still do useless rcu lock/unlock operations, and someone might copy this
code into some other context where this won't be right.
This use of RCU is also non-standard and hard to understand.
Let's copy the pointer to each VQ structure; this way
the access rules become straightforward, and there's
no need for RCU anymore.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

47283bef

vhost: move acked_features to VQs · ea16c514

Authored by Michael S. Tsirkin on Jun 05, 2014

Refactor code to make sure features are only accessed
under VQ mutex. This makes everything simpler: there is no need
for RCU here anymore.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

ea16c514

vhost-net: extend device allocation to vmalloc · 23cc5a99

Authored by Michael S. Tsirkin on Jan 23, 2013

Michael Mueller provided a patch to reduce the size of
vhost-net structure as some allocations could fail under
memory pressure/fragmentation. We are still left with
high order allocations though.

This patch is handling the problem at the core level, allowing
vhost structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.

People are still looking at cleaner ways to handle the problem
at the API level, probably passing in multiple iovecs.
This hack seems consistent with approaches
taken since then by drivers/vhost/scsi.c and net/core/dev.c

Based on patch by Romain Francoise.

Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Romain Francoise <romain@orebokech.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>

23cc5a99

02 Apr, 2014 1 commit

vhost: don't open-code sockfd_put() · 09aaacf0

Authored by Al Viro on Mar 05, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09aaacf0

29 Mar, 2014 2 commits

vhost: validate vhost_get_vq_desc return value · a39ee449

Authored by Michael S. Tsirkin on Mar 27, 2014

vhost fails to validate a negative error code
from vhost_get_vq_desc, causing
a crash: we are using -EFAULT, which is 0xfffffff2,
as the vector size, which exceeds the allocated size.

The code in question was introduced in commit
8dd014ad
    vhost-net: mergeable buffers support

CVE-2014-0055
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a39ee449
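
The failure mode described in this message can be reproduced in miniature in user space. The sketch below is not the vhost code: `get_desc_stub`, `as_unsigned_size`, and `checked_head` are hypothetical names, and the stub only imitates a function that returns either an index or a negative errno value. It shows why the sign check has to happen before the result is used as a size.

```c
#include <errno.h>

/* Hypothetical stand-in for a descriptor lookup: returns a descriptor
 * index on success or a negative errno value on failure. */
static int get_desc_stub(int fail)
{
    return fail ? -EFAULT : 3;
}

/* What an unchecked caller effectively does: the signed result is
 * reinterpreted as an unsigned size. */
static unsigned int as_unsigned_size(int ret)
{
    return (unsigned int)ret;
}

/* Checked variant: propagate the error before the value is ever
 * treated as an index or size. *head is written only on success. */
static int checked_head(int fail, unsigned int *head)
{
    int ret = get_desc_stub(fail);

    if (ret < 0)
        return ret;
    *head = (unsigned int)ret;
    return 0;
}
```

On Linux, where `EFAULT` is 14 and `unsigned int` is 32 bits wide, `as_unsigned_size(-EFAULT)` yields the 0xfffffff2 value quoted in the message.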

vhost: fix total length when packets are too short · d8316f39

Authored by Michael S. Tsirkin on Mar 27, 2014

When mergeable buffers are disabled, and the
incoming packet is too large for the rx buffer,
get_rx_bufs returns success.

This was intentional in order to make recvmsg
truncate the packet; handle_rx would then
detect err != sock_len and drop it.

Unfortunately we pass the original sock_len to
recvmsg, which means we use parts of iov not fully
validated.

Fix this up by detecting this overrun and doing packet drop
immediately.

CVE-2014-0077
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d8316f39
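
As a rough illustration of "detect the overrun and drop immediately", the sketch below gathers receive buffers until the packet length is covered and reports failure when the permitted buffers are not enough. It is a user-space sketch under stated assumptions, not the kernel's get_rx_bufs: the function name `gather_rx_bufs`, its parameters, and the `-1` return convention are all hypothetical.

```c
#include <stddef.h>

/* Gather at most max_bufs of the nbufs available buffers to hold
 * datalen bytes. Returns the number of buffers used, or -1 if the
 * packet does not fit (the caller should then drop the packet rather
 * than hand the full length to the receive call). */
static int gather_rx_bufs(const size_t *buf_len, int nbufs, int max_bufs,
                          size_t datalen)
{
    size_t covered = 0;
    int used = 0;

    while (covered < datalen && used < nbufs && used < max_bufs)
        covered += buf_len[used++];

    if (covered < datalen)
        return -1;  /* overrun detected */
    return used;
}
```

With `max_bufs` set to 1 (loosely modelling mergeable buffers being disabled), a packet larger than the single buffer is rejected up front instead of being truncated later.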

14 Feb, 2014 2 commits

vhost: fix a theoretical race in device cleanup · b0c057ca

Authored by Michael S. Tsirkin on Feb 13, 2014

vhost_zerocopy_callback accesses VQ right after it drops a ubuf
reference.  In theory, this could race with device removal, which waits
on the ubuf kref, and crash on use after free.

Do all accesses within an rcu read-side critical section, and synchronize
on release.

Since callbacks are always invoked from bh, synchronize_rcu_bh seems
enough and will help release complete a bit faster.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0c057ca

vhost: fix ref cnt checking deadlock · 0ad8b480

Authored by Michael S. Tsirkin on Feb 13, 2014

vhost checked the counter within the refcnt before decrementing.  It
really wanted to know that it is the one that has the last reference, as
a way to batch freeing resources a bit more efficiently.

Note: we only let refcount go to 0 on device release.

This works well but we now access the ref counter twice so there's a
race: all users might see a high count and decide to defer freeing
resources.
In the end no one initiates freeing resources until the last reference
is gone (which is on VM shutdown so might happen after a looooong time).

Let's do what we probably should have done straight away:
switch from kref to plain atomic, documenting the
semantics, return the refcount value atomically after decrement,
then use that to avoid the deadlock.
Reported-by: Qin Chuanyu <qinchuanyu@huawei.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ad8b480
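
The idea of "return the refcount value atomically after decrement" can be sketched with C11 atomics. This is a user-space analogue only, assuming hypothetical names (`struct ubuf_ref`, `ubuf_ref_put`); the kernel uses its own atomic API rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical user-space analogue of the ubuf reference counter. */
struct ubuf_ref {
    atomic_int refcount;
};

/* Drop one reference and return the counter value after the decrement.
 * atomic_fetch_sub() returns the value held before the subtraction, so
 * subtracting one gives the post-decrement value. The decrement and the
 * value handed back come from a single atomic operation, so no two
 * callers can both act on the same stale reading. */
static int ubuf_ref_put(struct ubuf_ref *r)
{
    return atomic_fetch_sub(&r->refcount, 1) - 1;
}
```

A caller can then decide what to do from the returned value alone, instead of reading the counter once to decide and a second time to decrement.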

07 Dec, 2013 1 commit

vhost: remove the dead branch · 59566b6e

Authored by Zhi Yong Wu on Dec 07, 2013

Since vhost_dev_init() always returns 0, some branches are never run
and therefore need to be removed.
Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

59566b6e

04 Sep, 2013 5 commits

vhost_net: correctly limit the max pending buffers · f7c6be40

Authored by Jason Wang on Sep 02, 2013

As Michael pointed out, we used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was only done when there were no
new buffers submitted from the guest. The guest can easily exceed the limit by
continuing to send packets.

So this patch moves the check into the main loop. Tests show about 5%-10%
improvement on per-CPU throughput for guest tx.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7c6be40

vhost_net: poll vhost queue after marking DMA is done · 19c73b3e

Authored by Jason Wang on Sep 02, 2013

We used to poll the vhost queue before marking DMA as done. This is racy if the vhost
thread is woken up before DMA is marked done, which can result in the signal
being missed. Fix this by always polling the vhost queue after marking DMA as done.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19c73b3e

vhost_net: determine whether or not to use zerocopy at one time · ce21a029

Authored by Jason Wang on Sep 02, 2013

Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and roll back this choice
later. This can be avoided by deciding on zerocopy once, checking all
conditions together up front.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce21a029

vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() · c92112ae

Authored by Jason Wang on Sep 02, 2013

We tend to batch the used adding and signaling in vhost_zerocopy_callback(),
which may result in more than 100 used buffers being updated in
vhost_zerocopy_signal_used() in some cases. So switch to
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). This means far fewer used index
updates and memory barriers.

A 2% performance improvement was seen on the netperf TCP_RR test.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c92112ae

vhost_net: make vhost_zerocopy_signal_used() return void · 094afe7d

Authored by Jason Wang on Sep 02, 2013

None of its callers uses its return value, so let it return void.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

094afe7d

11 Jul, 2013 2 commits

vhost: Remove custom vhost rcu usage · 22fa90c7

Authored by Asias He on May 07, 2013

Now, vq->private_data is always accessed under vq mutex. No need to play
the vhost rcu trick.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

22fa90c7

vhost-net: Always access vq->private_data under vq mutex · 2e26af79

Authored by Asias He on May 07, 2013

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2e26af79

10 Jul, 2013 1 commit

vhost-net: fix use-after-free in vhost_net_flush · dd7633ec

Authored by Michael S. Tsirkin on Jul 07, 2013

vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free its argument.
Thus since commit 1280c27f
    "vhost-net: flush outstanding DMAs on memory change"
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, which results
in a use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait;
add a new API for callers that want to free ubufs.
Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dd7633ec

07 Jul, 2013 2 commits

vhost: Make local function static · 0a1febf7

Authored by Asias He on Jun 05, 2013

$ make C=1 M=drivers/vhost

drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static?
drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static?
drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static?
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

0a1febf7

vhost-net: fix use-after-free in vhost_net_flush · c38e39c3

Authored by Michael S. Tsirkin on Jun 25, 2013

vhost_net_ubuf_put_and_wait has a confusing name:
it will actually also free its argument.
Thus since commit 1280c27f
    "vhost-net: flush outstanding DMAs on memory change"
vhost_net_flush tries to use the argument after passing it
to vhost_net_ubuf_put_and_wait, which results
in a use after free.
To fix, don't free the argument in vhost_net_ubuf_put_and_wait;
add a new API for callers that want to free ubufs.
Acked-by: Asias He <asias@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

c38e39c3

11 Jun, 2013 3 commits

vhost: fix ubuf_info cleanup · 288cfe78

Authored by Michael S. Tsirkin on Jun 06, 2013

vhost_net_clear_ubuf_info didn't clear ubuf_info
after kfree, which could trigger a double free.
Fix this and simplify this code to make it more robust: make sure
ubuf info is always freed through vhost_net_clear_ubuf_info.
Reported-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

288cfe78
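
The "clear the pointer after freeing it" pattern from this message is easy to show in user space. The sketch below is an analogue using `malloc`/`free` rather than the kernel allocator, and `struct vq_state` and `clear_ubuf_info` are hypothetical names chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical, simplified per-queue state. */
struct vq_state {
    void *ubuf_info;
};

/* Free the buffer and clear the pointer. Because free(NULL) is defined
 * to do nothing, calling this twice is harmless; without the NULL
 * assignment the second call would free the same block again. */
static void clear_ubuf_info(struct vq_state *vq)
{
    free(vq->ubuf_info);
    vq->ubuf_info = NULL;
}
```

Routing every release through one helper like this keeps the "freed implies NULL" invariant in a single place.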

vhost: check owner before we overwrite ubuf_info · 05c05351

Authored by Michael S. Tsirkin on Jun 06, 2013

If the device has an owner, we shouldn't touch ubuf_info
since it might be in use.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05c05351

vhost_net: clear msg.control for non-zerocopy case during tx · 4364d5f9

Authored by Jason Wang on Jun 05, 2013

When we decide not to use zero-copy, msg.control should be set to NULL; otherwise
macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs
wrongly.

The bug was introduced by commit cedb9bdc
(vhost-net: skip head management if no outstanding).

This solves the following warnings:

WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]()
Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun]
CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566
Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011
ffffffffa0198323 ffff88007c9ebd08 ffffffff81796b73 ffff88007c9ebd48
ffffffff8103d66b 000000007b773e20 ffff8800779f0000 ffff8800779f43f0
ffff8800779f8418 000000000000015c 0000000000000062 ffff88007c9ebd58
Call Trace:
[<ffffffff81796b73>] dump_stack+0x19/0x1e
[<ffffffff8103d66b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8103d6b5>] warn_slowpath_null+0x15/0x20
[<ffffffffa0197627>] handle_tx+0x477/0x4b0 [vhost_net]
[<ffffffffa0197690>] handle_tx_kick+0x10/0x20 [vhost_net]
[<ffffffffa019541e>] vhost_worker+0xfe/0x1a0 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net]
[<ffffffff81061f46>] kthread+0xc6/0xd0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff817a1aec>] ret_from_fork+0x7c/0xb0
[<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4364d5f9

06 May, 2013 3 commits

vhost-net: Cleanup vhost_ubuf and vhost_zcopy · fe729a57

Authored by Asias He on May 06, 2013

- Rename vhost_ubuf to vhost_net_ubuf
- Rename vhost_zcopy_mask to vhost_net_zcopy_mask
- Make funcs static
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

fe729a57

vhost: Move VHOST_NET_FEATURES to net.c · 8570a6e7

Authored by Asias He on May 06, 2013

vhost.h should not depend on device-specific macros like
VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

8570a6e7

vhost-net: Free ubuf when vhost_dev_set_owner fails · b1ad8496

Authored by Asias He on May 06, 2013

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

b1ad8496

01 May, 2013 4 commits

vhost: fix error handling in RESET_OWNER ioctl · 150b9e51

Authored by Michael S. Tsirkin on Apr 28, 2013

RESET_OWNER ioctl would leave the fd in a bad state if
memory allocation failed: device is stopped
but owner is not reset. Make state changes
after allocating memory, such that a failed
ioctl has no effect.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

150b9e51

vhost: move per-vq net specific fields out to net · 81f95a55

Authored by Michael S. Tsirkin on Apr 28, 2013

This will remove the need for vhost scsi to pull
in virtio-net.h.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

81f95a55

vhost: move vhost-net zerocopy fields to net.c · 2839400f

Authored by Asias He on Apr 27, 2013

On top of 'vhost: Allow device specific fields per vq', we can move
device-specific fields from the vhost virt queue to the device virt queue.
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

2839400f

vhost: Allow device specific fields per vq · 3ab2e420

Authored by Asias He on Apr 27, 2013

This is useful for any device that wants device-specific fields per vq.
For example, tcm_vhost wants a per-vq field to track requests which are
in flight on the vq. Also, on top of this we can add patches to move
things like ubufs from vhost.h out to net.c.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

3ab2e420

12 Apr, 2013 1 commit

vhost_net: remove tx polling state · 70181d51

Authored by Jason Wang on Apr 10, 2013

After commit 2b8b328b (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. So this patch removes it and makes the code simpler.

This patch also removes all the tx starting/stopping code in the tx path according
to Michael's suggestion.

Netperf test shows almost the same result in stream test, but gets improvements
on TCP_RR tests (both zerocopy or copy) especially on low load cases.

Tested between multiqueue kvm guest and external host with two direct
connected 82599s.

zerocopy disabled:

sessions|transaction rates|normalize|
before/after/+improvements
1 | 9510.24/11727.29/+23.3%    | 693.54/887.68/+28.0%   |
25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% |
50| 277634.64/291905.76/+5%    | 3118.36/3230.11/+3.6%  |

zerocopy enabled:

sessions|transaction rates|normalize|
before/after/+improvements
1 | 7318.33/11929.76/+63.0%    | 521.86/843.30/+61.6%   |
25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% |
50| 272181.02/294347.04/+8.1%  | 3071.56/3257.85/+6.1%  |
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

70181d51

18 Mar, 2013 1 commit

vhost/net: fix heads usage of ubuf_info · 46aa92d1

Authored by Michael S. Tsirkin on Mar 17, 2013

The ubuf info allocator uses the guest-controlled head as an index,
so a malicious guest could put the same head entry in the ring twice,
and we will get two callbacks on the same value.
To fix, use upend_idx, which is guaranteed to be unique.
Reported-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

46aa92d1
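
To illustrate why a host-owned index avoids the aliasing problem, here is a small user-space sketch. Everything in it is an assumption made for illustration: `RING_SIZE`, `struct slot`, `struct tx_state`, and `track_head` are hypothetical, and the real driver's bookkeeping differs. The point is only that the slot is chosen by a counter the host advances, while the guest-supplied head is stored as data.

```c
#define RING_SIZE 8  /* hypothetical; stands in for the real ring size */

struct slot {
    unsigned int desc;  /* guest-supplied head, stored as a value only */
    int in_use;
};

struct tx_state {
    struct slot slots[RING_SIZE];
    unsigned int upend_idx;  /* host-owned producer index */
};

/* Record a guest-supplied head in the next host-chosen slot.
 * Returns the slot index, or -1 if that slot is still outstanding. */
static int track_head(struct tx_state *tx, unsigned int guest_head)
{
    unsigned int idx = tx->upend_idx;

    if (tx->slots[idx].in_use)
        return -1;
    tx->slots[idx].desc = guest_head;
    tx->slots[idx].in_use = 1;
    tx->upend_idx = (idx + 1) % RING_SIZE;
    return (int)idx;
}
```

Even if the guest submits the same head twice, each submission lands in a distinct slot, so two completions never refer to the same tracking entry.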

30 Jan, 2013 2 commits

vhost_net: handle polling errors when setting backend · 2b8b328b

Authored by Jason Wang on Jan 28, 2013

Currently, polling errors are ignored, which can lead to the following issues:

- vhost removes itself unconditionally from the waitqueue when stopping the poll;
  this may crash the kernel since the previous attempt at starting may have failed to
  add itself to the waitqueue
- userspace may think the backend was successfully set even when the polling
  failed.

Solve this by:

- checking poll->wqh before trying to remove from the waitqueue
- reporting polling errors in vhost_poll_start() and tx_poll_start(); the return value
  will be checked and returned when userspace wants to set the backend

After this fix, there could still be a polling failure after the backend is set; it
will be addressed by the next patch.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b8b328b

vhost_net: correct error handling in vhost_net_set_backend() · 692a998b

Authored by Jason Wang on Jan 28, 2013

Currently, when vhost_init_used() fails, the sock refcnt and ubufs are
leaked. Correct this by calling vhost_init_used() before assigning ubufs and
restoring the oldsock when it fails.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

692a998b

06 Dec, 2012 4 commits

vhost-net: enable zerocopy tx by default · f9611c43

Authored by Michael S. Tsirkin on Dec 06, 2012

Zero copy TX has been around for a while now.
We seem to be down to eliminating theoretical bugs
and performance tuning at this point:
it's probably time to enable it by default so that
most users get the benefit.

Keep the flag around meanwhile so users can experiment
with disabling this if they experience regressions.
I expect that we will remove it in the future.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

f9611c43

vhost-net: skip head management if no outstanding · cedb9bdc

Authored by Michael S. Tsirkin on Dec 06, 2012

For short packets zerocopy mode adds overhead
of managing heads which isn't necessary: we
could simply update the used ring directly,
same as with zerocopy disabled.

Things seem to run a bit faster if we detect
and bypass head management when zcopy isn't used.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

cedb9bdc

vhost-net: flush outstanding DMAs on memory change · 1280c27f

Authored by Michael S. Tsirkin on Dec 04, 2012

When the memory map changes, we need to flush outstanding
DMAs as they might in theory reference old memory addresses.
To do this, simply stop initiating new DMAs
and wait for the ubufs ref count to drop to 0.
Afterwards reset the count back to 1.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

1280c27f

vhost: avoid backend flush on vring ops · 935cdee7

Authored by Michael S. Tsirkin on Dec 06, 2012

vring changes already do a flush internally where appropriate, so we do
not need a second flush.

It's currently not very expensive but a follow-up patch makes flush more
heavy-weight, so remove the extra flush here to avoid regressing
performance if call or kick fds are changed on the data path.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

935cdee7

04 Dec, 2012 1 commit

vhost-net: initialize zcopy packet counters · 64e9a9b8

Authored by Michael S. Tsirkin on Dec 03, 2012

These packet counters are used to drive the zerocopy
selection heuristic, so nothing too bad happens if they are off a bit,
and they are also reset once in a while.
But it's cleaner to clear them when the backend is set so that
we start in a known state.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

64e9a9b8

03 Nov, 2012 1 commit

vhost-net: reduce vq polling on tx zerocopy · 24eb21a1

Authored by Michael S. Tsirkin on Nov 01, 2012

It seems that to avoid deadlocks it is enough to poll vq before
we are going to use the last buffer.  This is faster than
c70aa540.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24eb21a1
