- 12 6月, 2018 1 次提交
-
-
由 Michael S. Tsirkin 提交于
struct vhost_msg within struct vhost_msg_node is copied to userspace. Unfortunately it turns out on 64 bit systems vhost_msg has padding after type which gcc doesn't initialize, leaking 4 uninitialized bytes to userspace. This padding also unfortunately means 32 bit users of this interface are broken on a 64 bit kernel which will need to be fixed separately. Fixes: CVE-2018-1118 Cc: stable@vger.kernel.org Reported-by: NKevin Easton <kevin@guarana.org> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reported-by: syzbot+87cfa083e727a224754b@syzkaller.appspotmail.com Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 31 5月, 2018 1 次提交
-
-
由 Jason Wang 提交于
After commit e2b3b35e ("vhost_net: batch used ring update in rx"), we tend to batch updating used heads. But it doesn't flush batched heads before trying to do busy polling, this will cause vhost to wait for guest TX which waits for the used RX. Fixing by flush batched heads before busy loop. 1 byte TCP_RR performance recovers from 13107.83 to 50402.65. Fixes: e2b3b35e ("vhost_net: batch used ring update in rx") Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 5月, 2018 1 次提交
-
-
由 Jason Wang 提交于
DaeRyong Jeong reports a race between vhost_dev_cleanup() and vhost_process_iotlb_msg(): Thread interleaving: CPU0 (vhost_process_iotlb_msg) CPU1 (vhost_dev_cleanup) (In the case of both VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE) ===== ===== vhost_umem_clean(dev->iotlb); if (!dev->iotlb) { ret = -EFAULT; break; } dev->iotlb = NULL; The reason is we don't synchronize between them, fixing by protecting vhost_process_iotlb_msg() with dev mutex. Reported-by: NDaeRyong Jeong <threeearcat@gmail.com> Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 4月, 2018 3 次提交
-
-
由 Stefan Hajnoczi 提交于
Currently vhost *_access_ok() functions return int. This is error-prone because there are two popular conventions: 1. 0 means failure, 1 means success 2. -errno means failure, 0 means success Although vhost mostly uses #1, it does not do so consistently. umem_access_ok() uses #2. This patch changes the return type from int to bool so that false means failure and true means success. This eliminates a potential source of errors. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stefan Hajnoczi 提交于
Commit d65026c6 ("vhost: validate log when IOTLB is enabled") introduced a regression. The logic was originally: if (vq->iotlb) return 1; return A && B; After the patch the short-circuit logic for A was inverted: if (A || vq->iotlb) return A; return B; This patch fixes the regression by rewriting the checks in the obvious way, no longer returning A when vq->iotlb is non-NULL (which is hard to understand). Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Auger 提交于
vhost_copy_to_user is used to copy vring used elements to userspace. We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC. Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache") Signed-off-by: NEric Auger <eric.auger@redhat.com> Acked-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 4月, 2018 1 次提交
-
-
由 haibinzhang(张海斌) 提交于
handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy polling udp packets with small length(e.g. 1byte udp payload), because setting VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length. Ping-Latencies shown below were tested between two Virtual Machines using netperf (UDP_STREAM, len=1), and then another machine pinged the client: vq size=256 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 3.319 18.489 57.303 64 1.643 2.021 2.552 128 1.825 2.600 3.224 256 1.997 2.710 4.295 512 1.860 3.171 4.631 1024 2.002 4.173 9.056 2048 2.257 5.650 9.688 4096 2.093 8.508 15.943 vq size=512 Packet-Weight Ping-Latencies(millisecond) min avg max Origin 6.537 29.177 66.245 64 2.798 3.614 4.403 128 2.861 3.820 4.775 256 3.008 4.018 4.807 512 3.254 4.523 5.824 1024 3.079 5.335 7.747 2048 3.944 8.201 12.762 4096 4.158 11.057 19.985 Seems pretty consistent, a small dip at 2 VQ sizes. Ring size is a hint from device about a burst size it can tolerate. Based on benchmarks, set the weight to 2 * vq size. To evaluate this change, another tests were done using netperf(RR, TX) between two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was tweaked through qemu. Results shown below does not show obvious changes. vq size=256 TCP_RR vq size=512 TCP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -7%/ -2% 1/ 1/ 0%/ -2% 1/ 4/ +1%/ 0% 1/ 4/ +1%/ 0% 1/ 8/ +1%/ -2% 1/ 8/ 0%/ +1% 64/ 1/ -6%/ 0% 64/ 1/ +7%/ +3% 64/ 4/ 0%/ +2% 64/ 4/ -1%/ +1% 64/ 8/ 0%/ 0% 64/ 8/ -1%/ -2% 256/ 1/ -3%/ -4% 256/ 1/ -4%/ -2% 256/ 4/ +3%/ +4% 256/ 4/ +1%/ +2% 256/ 8/ +2%/ 0% 256/ 8/ +1%/ -1% vq size=256 UDP_RR vq size=512 UDP_RR size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 1/ 1/ -5%/ +1% 1/ 1/ -3%/ -2% 1/ 4/ +4%/ +1% 1/ 4/ -2%/ +2% 1/ 8/ -1%/ -1% 1/ 8/ -1%/ 0% 64/ 1/ -2%/ -3% 64/ 1/ +1%/ +1% 64/ 4/ -5%/ -1% 64/ 4/ +2%/ 0% 64/ 8/ 0%/ -1% 64/ 8/ -2%/ +1% 256/ 1/ +7%/ +1% 256/ 1/ -7%/ 0% 256/ 4/ +1%/ +1% 256/ 4/ -3%/ -4% 256/ 8/ +2%/ +2% 256/ 8/ +1%/ +1% vq size=256 TCP_STREAM vq size=512 TCP_STREAM size/sessions/+thu%/+normalize% size/sessions/+thu%/+normalize% 64/ 1/ 0%/ -3% 64/ 1/ 0%/ 0% 64/ 4/ +3%/ -1% 64/ 4/ -2%/ +4% 64/ 8/ +9%/ -4% 64/ 8/ -1%/ +2% 256/ 1/ +1%/ -4% 256/ 1/ +1%/ +1% 256/ 4/ -1%/ -1% 256/ 4/ -3%/ 0% 256/ 8/ +7%/ +5% 256/ 8/ -3%/ 0% 512/ 1/ +1%/ 0% 512/ 1/ -1%/ -1% 512/ 4/ +1%/ -1% 512/ 4/ 0%/ 0% 512/ 8/ +7%/ -5% 512/ 8/ +6%/ -1% 1024/ 1/ 0%/ -1% 1024/ 1/ 0%/ +1% 1024/ 4/ +3%/ 0% 1024/ 4/ +1%/ 0% 1024/ 8/ +8%/ +5% 1024/ 8/ -1%/ 0% 2048/ 1/ +2%/ +2% 2048/ 1/ -1%/ 0% 2048/ 4/ +1%/ 0% 2048/ 4/ 0%/ -1% 2048/ 8/ -2%/ 0% 2048/ 8/ 5%/ -1% 4096/ 1/ -2%/ 0% 4096/ 1/ -2%/ 0% 4096/ 4/ +2%/ 0% 4096/ 4/ 0%/ 0% 4096/ 8/ +9%/ -2% 4096/ 8/ -5%/ -1% Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NHaibin Zhang <haibinzhang@tencent.com> Signed-off-by: NYunfang Tai <yunfangtai@tencent.com> Signed-off-by: NLidong Chen <lidongchen@tencent.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 3月, 2018 1 次提交
-
-
由 Jason Wang 提交于
Vq log_base is the userspace address of bitmap which has nothing to do with IOTLB. So it needs to be validated unconditionally otherwise we may try use 0 as log_base which may lead to pin pages that will lead unexpected result (e.g trigger BUG_ON() in set_bit_to_user()). Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Reported-by: syzbot+6304bf97ef436580fede@syzkaller.appspotmail.com Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 3月, 2018 1 次提交
-
-
由 Jason Wang 提交于
We tried to remove vq poll from wait queue, but do not check whether or not it was in a list before. This will lead double free. Fixing this by switching to use vhost_poll_stop() which zeros poll->wqh after removing poll from waitqueue to make sure it won't be freed twice. Cc: Darren Kenny <darren.kenny@oracle.com> Reported-by: syzbot+c0272972b01b872e604a@syzkaller.appspotmail.com Fixes: 2b8b328b ("vhost_net: handle polling errors when setting backend") Signed-off-by: NJason Wang <jasowang@redhat.com> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 3月, 2018 1 次提交
-
-
由 Jason Wang 提交于
We try to hold TX virtqueue mutex in vhost_net_rx_peek_head_len() after RX virtqueue mutex is held in handle_rx(). This requires an appropriate lock nesting notation to calm down deadlock detector. Fixes: 03088137 ("vhost_net: basic polling support") Reported-by: syzbot+7f073540b1384a614e09@syzkaller.appspotmail.com Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2018 2 次提交
-
-
由 Sonny Rao 提交于
This will allow usage of vsock from 32-bit binaries on a 64-bit kernel. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Sonny Rao 提交于
Clang is particularly anal about signed vs unsigned comparisons and doesn't like the fact that some ioctl numbers set the MSB, so we get this error when trying to build vhost on aarch64: drivers/vhost/vhost.c:1400:7: error: overflow converting case value to switch condition type (3221794578 to 18446744072636378898) [-Werror, -Wswitch] case VHOST_GET_VRING_BASE: 3221794578 is 0xC008AF12 in hex 18446744072636378898 is 0xFFFFFFFFC008AF12 in hex Fix this by using unsigned ints in the function signature for vhost_vring_ioctl(). Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Reviewed-by: NDarren Kenny <darren.kenny@oracle.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 10 3月, 2018 4 次提交
-
-
由 Jason Wang 提交于
After commit fc72d1d5 ("tuntap: XDP transmission"), we can actually queueing XDP pointers in the pointer ring, so we should examine the pointer type before freeing the pointer. Fixes: fc72d1d5 ("tuntap: XDP transmission") Reported-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We get pointer ring from the exported sock, this means we should keep rx_ring and vq->private synced during both vq stop and backend set, otherwise we may see stale rx_ring. Fixes: c67df11f ("vhost_net: try batch dequing from skb array") Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Potapenko 提交于
KMSAN reported a use of uninit memory in vhost_net_buf_unproduce() while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring: ================================================================== BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vho et.c:170 CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170 vhost_net_stop_vq drivers/vhost/net.c:974 [inline] vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982 vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015 __fput+0x49f/0xa00 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop arch/x86/entry/common.c:166 [inline] prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196 syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265 do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292 ... origin: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213 kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392 kmalloc_large_node_hook mm/slub.c:1366 [inline] kmalloc_large_node mm/slub.c:3808 [inline] __kmalloc_node+0x100e/0x1290 mm/slub.c:3818 kmalloc_node include/linux/slab.h:554 [inline] kvmalloc_node+0x1a5/0x2e0 mm/util.c:419 kvmalloc include/linux/mm.h:541 [inline] vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921 misc_open+0x7b5/0x8b0 drivers/char/misc.c:154 chrdev_open+0xc28/0xd90 fs/char_dev.c:417 do_dentry_open+0xccb/0x1430 fs/open.c:752 vfs_open+0x272/0x2e0 fs/open.c:866 do_last fs/namei.c:3378 [inline] path_openat+0x49ad/0x6580 fs/namei.c:3519 do_filp_open+0x267/0x640 fs/namei.c:3553 do_sys_open+0x6ad/0x9c0 fs/open.c:1059 SYSC_openat+0xc7/0xe0 fs/open.c:1086 SyS_openat+0x63/0x90 fs/open.c:1080 do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287 ================================================================== Fixes: c67df11f ("vhost_net: try batch dequing from skb array") Signed-off-by: NAlexander Potapenko <glider@google.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vaibhav Murkute 提交于
Fixed a coding style issue. Signed-off-by: NVaibhav Murkute <vaibhavmurkute88@gmail.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2018 1 次提交
-
-
由 Denys Vlasenko 提交于
Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 2月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2018 5 次提交
-
-
由 Eric Biggers 提交于
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_dev->log_file. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NJason Wang <jasowang@redhat.com>
-
由 Eric Biggers 提交于
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->error. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NJason Wang <jasowang@redhat.com>
-
由 Eric Biggers 提交于
We already hold a reference to the eventfd_ctx, which is sufficient; there's no need to hold a reference to the struct file as well. So get rid of vhost_virtqueue->call. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NJason Wang <jasowang@redhat.com>
-
由 夷则(Caspar) 提交于
In commit ea5d4046 ("vhost: fix release path lockdep checks"), Michael added a flag to check whether we should hold a lock in vhost_dev_cleanup(), however, in commit 47283bef ("vhost: move memory pointer to VQs"), RCU operations have been replaced by mutex, we can remove the no-longer-used `locked' parameter now. Signed-off-by: NCaspar Zhang <jinli.zjl@alibaba-inc.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Tonghao Zhang 提交于
The patch (7235acdb) changed the way of the work flushing in which the queued seq, done seq, and the flushing are not used anymore. Then remove them now. Fixes: 7235acdb ("vhost: simplify work flushing") Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 31 1月, 2018 1 次提交
-
-
由 Markus Elfring 提交于
Replace the specification of four data structures by pointer dereferences as the parameter for the operator "sizeof" to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 30 1月, 2018 1 次提交
-
-
由 Jason Wang 提交于
We don't stop device before reset owner, this means we could try to serve any virtqueue kick before reset dev->worker. This will result a warn since the work was pending at llist during owner resetting. Fix this by stopping device during owner reset. Reported-by: syzbot+eb17c6162478cc50632c@syzkaller.appspotmail.com Fixes: 3a4d5c94 ("vhost_net: a kernel-level virtio server") Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 1月, 2018 2 次提交
-
-
由 Jason Wang 提交于
The code will try to access dev->iotlb when processing VHOST_IOTLB_INVALIDATE even if it was not initialized which may lead to NULL pointer dereference. Fixes this by check dev->iotlb before. Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
We used to call mutex_lock() in vhost_dev_lock_vqs() which tries to hold mutexes of all virtqueues. This may confuse lockdep to report a possible deadlock because of trying to hold locks belong to same class. Switch to use mutex_lock_nested() to avoid false positive. Fixes: 6b1e6cc7 ("vhost: new device IOTLB API") Reported-by: syzbot+dbb7c1161485e61b0241@syzkaller.appspotmail.com Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 1月, 2018 1 次提交
-
-
由 Jason Wang 提交于
This patch tries to batched used ring update during RX. This is pretty fit for the case when guest is much faster (e.g dpdk based backend). In this case, used ring is almost empty: - we may get serious cache line misses/contending on both used ring and used idx. - at most 1 packet could be dequeued at one time, batching in guest does not make much effect. Update used ring in a batch can help since guest won't access the used ring until used idx was advanced for several descriptors and since we advance used ring for every N packets, guest will only need to access used idx for every N packet since it can cache the used idx. To have a better interaction for both batch dequeuing and dpdk batching, VHOST_RX_BATCH was used as the maximum number of descriptors that could be batched. Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU E5-2630 connected back to back through ixgbe. Traffic were generated on one remote ixgbe through MoonGen and measure the RX pps through testpmd in guest when do xdp_redirect_map from local ixgbe to tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31% improvement). One possible concern for this is the implications for TCP (especially latency sensitive workload). Result[1] does not show obvious changes for most of the netperf test (RR, TX, and RX). And we do get some improvements for RX on some specific size. Guest RX: size/sessions/+thu%/+normalize% 64/ 1/ +2%/ +2% 64/ 2/ +2%/ -1% 64/ 4/ +1%/ +1% 64/ 8/ 0%/ 0% 256/ 1/ +6%/ -3% 256/ 2/ -3%/ +2% 256/ 4/ +11%/ +11% 256/ 8/ 0%/ 0% 512/ 1/ +4%/ 0% 512/ 2/ +2%/ +2% 512/ 4/ 0%/ -1% 512/ 8/ -8%/ -8% 1024/ 1/ -7%/ -17% 1024/ 2/ -8%/ -7% 1024/ 4/ +1%/ 0% 1024/ 8/ 0%/ 0% 2048/ 1/ +30%/ +14% 2048/ 2/ +46%/ +40% 2048/ 4/ 0%/ 0% 2048/ 8/ 0%/ 0% 4096/ 1/ +23%/ +22% 4096/ 2/ +26%/ +23% 4096/ 4/ 0%/ +1% 4096/ 8/ 0%/ 0% 16384/ 1/ -2%/ -3% 16384/ 2/ +1%/ -4% 16384/ 4/ -1%/ -3% 16384/ 8/ 0%/ -1% 65535/ 1/ +15%/ +7% 65535/ 2/ +4%/ +7% 65535/ 4/ 0%/ +1% 65535/ 8/ 0%/ 0% TCP_RR: size/sessions/+thu%/+normalize% 1/ 1/ 0%/ +1% 1/ 25/ +2%/ +1% 1/ 50/ +4%/ +1% 64/ 1/ 0%/ -4% 64/ 25/ +2%/ +1% 64/ 50/ 0%/ -1% 256/ 1/ 0%/ 0% 256/ 25/ 0%/ 0% 256/ 50/ +4%/ +2% Guest TX: size/sessions/+thu%/+normalize% 64/ 1/ +4%/ -2% 64/ 2/ -6%/ -5% 64/ 4/ +3%/ +6% 64/ 8/ 0%/ +3% 256/ 1/ +15%/ +16% 256/ 2/ +11%/ +12% 256/ 4/ +1%/ 0% 256/ 8/ +5%/ +5% 512/ 1/ -1%/ -6% 512/ 2/ 0%/ -8% 512/ 4/ -2%/ +4% 512/ 8/ +6%/ +9% 1024/ 1/ +3%/ +1% 1024/ 2/ +3%/ +9% 1024/ 4/ 0%/ +7% 1024/ 8/ 0%/ +7% 2048/ 1/ +8%/ +2% 2048/ 2/ +3%/ -1% 2048/ 4/ -1%/ +11% 2048/ 8/ +3%/ +9% 4096/ 1/ +8%/ +8% 4096/ 2/ 0%/ -7% 4096/ 4/ +4%/ +4% 4096/ 8/ +2%/ +5% 16384/ 1/ -3%/ +1% 16384/ 2/ -1%/ -12% 16384/ 4/ -1%/ +5% 16384/ 8/ 0%/ +1% 65535/ 1/ 0%/ -3% 65535/ 2/ +5%/ +16% 65535/ 4/ +1%/ +2% 65535/ 8/ +1%/ -1% Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 1月, 2018 2 次提交
-
-
由 Jason Wang 提交于
This patch implements XDP transmission for TAP. Since we can't create new queues for TAP during XDP set, exist ptr_ring was reused for queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG (0x1UL) was encoded into lowest bit of xpd_buff pointer during ptr_ring_produce, and was decoded during consuming. XDP metadata was stored in the headroom of the packet which should work in most of cases since driver usually reserve enough headroom. Very minor changes were done for vhost_net: it just need to peek the length depends on the type of pointer. Tests were done on two Intel E5-2630 2.40GHz machines connected back to back through two 82599ES. Traffic were generated/received through MoonGen/testpmd(rxonly). It reports ~20% improvements when xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps to 3.05Mpps) Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason Wang 提交于
This patch switches to use ptr_ring instead of skb_array. This will be used to enqueue different types of pointers by encoding type into lower bits. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2017 1 次提交
-
-
由 Paul E. McKenney 提交于
Because READ_ONCE() now implies read_barrier_depends(), the read_barrier_depends() in next_desc() is now redundant. This commit therefore removes it and the related comments. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: <kvm@vger.kernel.org> Cc: <virtualization@lists.linux-foundation.org> Cc: <netdev@vger.kernel.org>
-
- 03 12月, 2017 1 次提交
-
-
由 Wei Xu 提交于
Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: NWei Xu <wexu@redhat.com> Reported-by: NMatthew Rosato <mjrosato@linux.vnet.ibm.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 11月, 2017 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 11月, 2017 3 次提交
-
-
由 Al Viro 提交于
its ->mask is POLL... bitmap Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
__poll_t is also used as wait key in some waitqueues. Verify that wait_..._poll() gets __poll_t as key and provide a helper for wakeup functions to get back to that __poll_t value. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 11月, 2017 4 次提交
-
-
由 Jason Wang 提交于
We always poll tx for socket, this is sub optimal since this will slightly increase the waitqueue traversing time and more important, vhost could not benefit from commit 9e641bdc ("net-tun: restructure tun_do_read for better sleep/wakeup efficiency") even if we've stopped rx polling during handle_rx(), tx poll were still left in the waitqueue. Pktgen from a remote host to VM over mlx4 on two 2.00GHz Xeon E5-2650 shows 11.7% improvements on rx PPS. (from 1.28Mpps to 1.44Mpps) Cc: Wei Xu <wexu@redhat.com> Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stefan Hajnoczi 提交于
The vhost_vsock->guest_cid field is uninitialized when /dev/vhost-vsock is opened until the VHOST_VSOCK_SET_GUEST_CID ioctl is called. kvmalloc(..., GFP_KERNEL | __GFP_RETRY_MAYFAIL) does not zero memory. All other vhost_vsock fields are initialized explicitly so just initialize this field too. Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
During access_ok checks, addr increases as we iterate over the data structure, thus addr + len - 1 will point beyond the end of region we are translating. Harmless since we then verify that the region covers addr, but let's not waste cpu cycles. Reported-by: NKoichiro Den <den@klaipeden.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NKoichiro Den <den@klaipeden.com>
-
由 Byungchul Park 提交于
The following patch changed the behavior which originally did safe iteration. Make it safe as it was. 12bdcbd5 vhost/scsi: Don't reinvent the wheel but use existing llist API Signed-off-by: NByungchul Park <byungchul.park@lge.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-