- 23 6月, 2014 1 次提交
-
-
由 Romain Francoise 提交于
Commit 23cc5a99 ("vhost-net: extend device allocation to vmalloc") added another open-coded version of kvfree (which is available since v3.15-rc5), nuke it. Signed-off-by: NRomain Francoise <romain@orebokech.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 09 6月, 2014 3 次提交
-
-
由 Michael S. Tsirkin 提交于
commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc vhost: replace rcu with mutex replaced rcu sync for memory accesses with VQ mutex locl/unlock. This is correct since all accesses are under VQ mutex, but incomplete: we still do useless rcu lock/unlock operations, someone might copy this code into some other context where this won't be right. This use of RCU is also non standard and hard to understand. Let's copy the pointer to each VQ structure, this way the access rules become straight-forward, and there's no need for RCU anymore. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
Refactor code to make sure features are only accessed under VQ mutex. This makes everything simpler, no need for RCU here anymore. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
Michael Mueller provided a patch to reduce the size of vhost-net structure as some allocations could fail under memory pressure/fragmentation. We are still left with high order allocations though. This patch is handling the problem at the core level, allowing vhost structures to use vmalloc() if kmalloc() failed. As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT to kzalloc() flags to do this fallback only when really needed. People are still looking at cleaner ways to handle the problem at the API level, probably passing in multiple iovecs. This hack seems consistent with approaches taken since then by drivers/vhost/scsi.c and net/core/dev.c Based on patch by Romain Francoise. Cc: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: NRomain Francoise <romain@orebokech.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 02 4月, 2014 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 3月, 2014 2 次提交
-
-
由 Michael S. Tsirkin 提交于
vhost fails to validate negative error code from vhost_get_vq_desc causing a crash: we are using -EFAULT which is 0xfffffff2 as vector size, which exceeds the allocated size. The code in question was introduced in commit 8dd014ad vhost-net: mergeable buffers support CVE-2014-0055 Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
When mergeable buffers are disabled, and the incoming packet is too large for the rx buffer, get_rx_bufs returns success. This was intentional in order for make recvmsg truncate the packet and then handle_rx would detect err != sock_len and drop it. Unfortunately we pass the original sock_len to recvmsg - which means we use parts of iov not fully validated. Fix this up by detecting this overrun and doing packet drop immediately. CVE-2014-0077 Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 2月, 2014 2 次提交
-
-
由 Michael S. Tsirkin 提交于
vhost_zerocopy_callback accesses VQ right after it drops a ubuf reference. In theory, this could race with device removal which waits on the ubuf kref, and crash on use after free. Do all accesses within rcu read side critical section, and synchronize on release. Since callbacks are always invoked from bh, synchronize_rcu_bh seems enough and will help release complete a bit faster. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
vhost checked the counter within the refcnt before decrementing. It really wanted to know that it is the one that has the last reference, as a way to batch freeing resources a bit more efficiently. Note: we only let refcount go to 0 on device release. This works well but we now access the ref counter twice so there's a race: all users might see a high count and decide to defer freeing resources. In the end no one initiates freeing resources until the last reference is gone (which is on VM shotdown so might happen after a looooong time). Let's do what we probably should have done straight away: switch from kref to plain atomic, documenting the semantics, return the refcount value atomically after decrement, then use that to avoid the deadlock. Reported-by: NQin Chuanyu <qinchuanyu@huawei.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2013 1 次提交
-
-
由 Zhi Yong Wu 提交于
Since vhost_dev_init() forever return 0, some branches are never run, therefore need to be removed. Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 9月, 2013 5 次提交
-
-
由 Jason Wang 提交于
As Michael point out, We used to limit the max pending DMAs to get better cache utilization. But it was not done correctly since it was one done when there's no new buffers submitted from guest. Guest can easily exceeds the limitation by keeping sending packets. So this patch moves the check into main loop. Tests shows about 5%-10% improvement on per cpu throughput for guest tx. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We used to poll vhost queue before making DMA is done, this is racy if vhost thread were waked up before marking DMA is done which can result the signal to be missed. Fix this by always polling the vhost thread before DMA is done. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if upend_idx != done_idx we still set zcopy_used to true and rollback this choice later. This could be avoided by determining zerocopy once by checking all conditions at one time before. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We tend to batch the used adding and signaling in vhost_zerocopy_callback() which may result more than 100 used buffers to be updated in vhost_zerocopy_signal_used() in some cases. So switch to use vhost_add_used_and_signal_n() to avoid multiple calls to vhost_add_used_and_signal(). Which means much less times of used index updating and memory barriers. 2% performance improvement were seen on netperf TCP_RR test. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
None of its caller use its return value, so let it return void. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 7月, 2013 2 次提交
-
-
由 Asias He 提交于
Now, vq->private_data is always accessed under vq mutex. No need to play the vhost rcu trick. Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Asias He 提交于
Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 10 7月, 2013 1 次提交
-
-
由 Michael S. Tsirkin 提交于
vhost_net_ubuf_put_and_wait has a confusing name: it will actually also free it's argument. Thus since commit 1280c27f "vhost-net: flush outstanding DMAs on memory change" vhost_net_flush tries to use the argument after passing it to vhost_net_ubuf_put_and_wait, this results in use after free. To fix, don't free the argument in vhost_net_ubuf_put_and_wait, add an new API for callers that want to free ubufs. Acked-by: NAsias He <asias@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 7月, 2013 2 次提交
-
-
由 Asias He 提交于
$ make C=1 M=drivers/vhost drivers/vhost/net.c:168:5: warning: symbol 'vhost_net_set_ubuf_info' was not declared. Should it be static? drivers/vhost/net.c:194:6: warning: symbol 'vhost_net_vq_reset' was not declared. Should it be static? drivers/vhost/scsi.c:219:6: warning: symbol 'tcm_vhost_done_inflight' was not declared. Should it be static? Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
vhost_net_ubuf_put_and_wait has a confusing name: it will actually also free it's argument. Thus since commit 1280c27f "vhost-net: flush outstanding DMAs on memory change" vhost_net_flush tries to use the argument after passing it to vhost_net_ubuf_put_and_wait, this results in use after free. To fix, don't free the argument in vhost_net_ubuf_put_and_wait, add an new API for callers that want to free ubufs. Acked-by: NAsias He <asias@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 11 6月, 2013 3 次提交
-
-
由 Michael S. Tsirkin 提交于
vhost_net_clear_ubuf_info didn't clear ubuf_info after kfree, this could trigger double free. Fix this and simplify this code to make it more robust: make sure ubuf info is always freed through vhost_net_clear_ubuf_info. Reported-by: NTommi Rantala <tt.rantala@gmail.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michael S. Tsirkin 提交于
If device has an owner, we shouldn't touch ubuf_info since it might be in use. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
When we decide not use zero-copy, msg.control should be set to NULL otherwise macvtap/tap may set zerocopy callbacks which may decrease the kref of ubufs wrongly. Bug were introduced by commit cedb9bdc (vhost-net: skip head management if no outstanding). This solves the following warnings: WARNING: at include/linux/kref.h:47 handle_tx+0x477/0x4b0 [vhost_net]() Modules linked in: vhost_net macvtap macvlan tun nfsd exportfs bridge stp llc openvswitch kvm_amd kvm bnx2 megaraid_sas [last unloaded: tun] CPU: 5 PID: 8670 Comm: vhost-8668 Not tainted 3.10.0-rc2+ #1566 Hardware name: Dell Inc. PowerEdge R715/00XHKG, BIOS 1.5.2 04/19/2011 ffffffffa0198323 ffff88007c9ebd08 ffffffff81796b73 ffff88007c9ebd48 ffffffff8103d66b 000000007b773e20 ffff8800779f0000 ffff8800779f43f0 ffff8800779f8418 000000000000015c 0000000000000062 ffff88007c9ebd58 Call Trace: [<ffffffff81796b73>] dump_stack+0x19/0x1e [<ffffffff8103d66b>] warn_slowpath_common+0x6b/0xa0 [<ffffffff8103d6b5>] warn_slowpath_null+0x15/0x20 [<ffffffffa0197627>] handle_tx+0x477/0x4b0 [vhost_net] [<ffffffffa0197690>] handle_tx_kick+0x10/0x20 [vhost_net] [<ffffffffa019541e>] vhost_worker+0xfe/0x1a0 [vhost_net] [<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [<ffffffffa0195320>] ? vhost_attach_cgroups_work+0x30/0x30 [vhost_net] [<ffffffff81061f46>] kthread+0xc6/0xd0 [<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff817a1aec>] ret_from_fork+0x7c/0xb0 [<ffffffff81061e80>] ? kthread_freezable_should_stop+0x70/0x70 Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 5月, 2013 3 次提交
-
-
由 Asias He 提交于
- Rename vhost_ubuf to vhost_net_ubuf - Rename vhost_zcopy_mask to vhost_net_zcopy_mask - Make funcs static Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Asias He 提交于
vhost.h should not depend on device specific marcos like VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF. Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Asias He 提交于
Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 01 5月, 2013 4 次提交
-
-
由 Michael S. Tsirkin 提交于
RESET_OWNER ioctl would leave the fd in a bad state if memory allocation failed: device is stopped but owner is not reset. Make state changes after allocating memory, such that a failed ioctl has no effect. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
This will remove the need for vhost scsi to pull in virtio-net.h. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Asias He 提交于
On top of 'vhost: Allow device specific fields per vq', we can move device specific fields to device virt queue from vhost virt queue. Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Asias He 提交于
This is useful for any device who wants device specific fields per vq. For example, tcm_vhost wants a per vq field to track requests which are in flight on the vq. Also, on top of this we can add patches to move things like ubufs from vhost.h out to net.c. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NAsias He <asias@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 12 4月, 2013 1 次提交
-
-
由 Jason Wang 提交于
After commit 2b8b328b (vhost_net: handle polling errors when setting backend), we in fact track the polling state through poll->wqh, so there's no need to duplicate the work with an extra vhost_net_polling_state. So this patch removes this and make the code simpler. This patch also removes the all tx starting/stopping code in tx path according to Michael's suggestion. Netperf test shows almost the same result in stream test, but gets improvements on TCP_RR tests (both zerocopy or copy) especially on low load cases. Tested between multiqueue kvm guest and external host with two direct connected 82599s. zerocopy disabled: sessions|transaction rates|normalize| before/after/+improvements 1 | 9510.24/11727.29/+23.3% | 693.54/887.68/+28.0% | 25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% | 50| 277634.64/291905.76/+5% | 3118.36/3230.11/+3.6% | zerocopy enabled: sessions|transaction rates|normalize| before/after/+improvements 1 | 7318.33/11929.76/+63.0% | 521.86/843.30/+61.6% | 25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% | 50| 272181.02/294347.04/+8.1% | 3071.56/3257.85/+6.1% | Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2013 1 次提交
-
-
由 Michael S. Tsirkin 提交于
ubuf info allocator uses guest controlled head as an index, so a malicious guest could put the same head entry in the ring twice, and we will get two callbacks on the same value. To fix use upend_idx which is guaranteed to be unique. Reported-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Cc: stable@kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 1月, 2013 2 次提交
-
-
由 Jason Wang 提交于
Currently, the polling errors were ignored, which can lead following issues: - vhost remove itself unconditionally from waitqueue when stopping the poll, this may crash the kernel since the previous attempt of starting may fail to add itself to the waitqueue - userspace may think the backend were successfully set even when the polling failed. Solve this by: - check poll->wqh before trying to remove from waitqueue - report polling errors in vhost_poll_start(), tx_poll_start(), the return value will be checked and returned when userspace want to set the backend After this fix, there still could be a polling failure after backend is set, it will addressed by the next patch. Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
Currently, when vhost_init_used() fails the sock refcnt and ubufs were leaked. Correct this by calling vhost_init_used() before assign ubufs and restore the oldsock when it fails. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2012 4 次提交
-
-
由 Michael S. Tsirkin 提交于
Zero copy TX has been around for a while now. We seem to be down to eliminating theoretical bugs and performance tuning at this point: it's probably time to enable it by default so that most users get the benefit. Keep the flag around meanwhile so users can experiment with disabling this if they experience regressions. I expect that we will remove it in the future. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
For short packets zerocopy mode adds overhead of managing heads which isn't necessary: we could simly update used ring directly same as with zerocopy disabled. Things seem to run a bit faster if we detect and bypass head management when zcopy isn't used. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
When memory map changes, we need to flush outstanding DMAs as they might in theory reference old memory addresses. To do this simply stop initiating new DMAs and wait for ubufs ref count to drop to 0. Afterwards reset the count back to 1. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
vring changes already do a flush internally where appropriate, so we do not need a second flush. It's currently not very expensive but a follow-up patch makes flush more heavy-weight, so remove the extra flush here to avoid regressing performance if call or kick fds are changed on data path. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 04 12月, 2012 1 次提交
-
-
由 Michael S. Tsirkin 提交于
These packet counters are used to drive the zercopy selection heuristic so nothing too bad happens if they are off a bit - and they are also reset once in a while. But it's cleaner to clear them when backend is set so that we start in a known state. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 11月, 2012 1 次提交
-
-
由 Michael S. Tsirkin 提交于
It seems that to avoid deadlocks it is enough to poll vq before we are going to use the last buffer. This is faster than c70aa540. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-