1. 11 Apr 2019, 1 commit
  2. 09 Mar 2019, 1 commit
  3. 07 Mar 2019, 1 commit
    • vhost: silence an unused-variable warning · cfdbb4ed
      Authored by Arnd Bergmann
      On some architectures, the MMU can be disabled, leading to access_ok()
      becoming an empty macro that does not evaluate its size argument,
      which in turn produces an unused-variable warning:
      
      drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
              size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
      
      Mark the variable as __maybe_unused to shut up that warning.
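
      A minimal sketch of the annotation (the enclosing function here is
      invented; only the variable and its initializer mirror the warning
      above):

          /* On !MMU configs access_ok() collapses to a constant and never
           * evaluates 's', so tell the compiler the variable may
           * legitimately go unused. */
          static bool vq_event_ok(struct vhost_virtqueue *vq, void __user *p)
          {
                  size_t s __maybe_unused =
                          vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

                  return access_ok(p, s);
          }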
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  4. 20 Feb 2019, 1 commit
  5. 05 Feb 2019, 1 commit
  6. 29 Jan 2019, 1 commit
    • vhost: fix OOB in get_rx_bufs() · b46a0bf7
      Authored by Jason Wang
      Since batched used ring updating was introduced in commit e2b3b35e
      ("vhost_net: batch used ring update in rx"), we tend to batch heads in
      vq->heads for more than one packet. But the quota passed to
      get_rx_bufs() was not correctly limited, which can result in an OOB
      write to vq->heads.
      
              headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                          vhost_len, &in, vq_log, &log,
                          likely(mergeable) ? UIO_MAXIOV : 1);
      
      UIO_MAXIOV was still used, which is wrong since we may already have
      batched heads in vq->heads. This causes an OOB write if the next
      buffer needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH))
      heads after we've batched 64 (VHOST_NET_BATCH) heads:
      
      =============================================================================
      BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
      -----------------------------------------------------------------------------
      
      INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
      INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
          kmem_cache_alloc_trace+0xbb/0x140
          alloc_pd+0x22/0x60
          gen8_ppgtt_create+0x11d/0x5f0
          i915_ppgtt_create+0x16/0x80
          i915_gem_create_context+0x248/0x390
          i915_gem_context_create_ioctl+0x4b/0xe0
          drm_ioctl_kernel+0xa5/0xf0
          drm_ioctl+0x2ed/0x3a0
          do_vfs_ioctl+0x9f/0x620
          ksys_ioctl+0x6b/0x80
          __x64_sys_ioctl+0x11/0x20
          do_syscall_64+0x43/0xf0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
      INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
      
      Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
      vhost-net. This is done by passing the limit through
      vhost_dev_init(), so that set_owner can allocate the iov arrays in a
      per-device manner.
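
      A sketch of that plumbing (signatures condensed; treat field and
      constant names other than UIO_MAXIOV and VHOST_NET_BATCH as
      illustrative):

          /* vhost core: record a per-device iov limit instead of assuming
           * UIO_MAXIOV everywhere. */
          void vhost_dev_init(struct vhost_dev *dev,
                              struct vhost_virtqueue **vqs,
                              int nvqs, int iov_limit)
          {
                  dev->iov_limit = iov_limit;
                  /* ... existing initialization ... */
          }

          /* set_owner then sizes vq->heads from the per-device limit: */
          vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
                                    GFP_KERNEL);

          /* vhost-net reserves room for a full batch on top of UIO_MAXIOV: */
          vhost_dev_init(&n->dev, vqs, VHOST_NET_VQ_MAX,
                         UIO_MAXIOV + VHOST_NET_BATCH);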
      
      This fixes CVE-2018-16880.
      
      Fixes: e2b3b35e ("vhost_net: batch used ring update in rx")
      Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 18 Jan 2019, 1 commit
    • vhost: log dirty page correctly · cc5e7107
      Authored by Jason Wang
      The vhost dirty page logging API is designed to sync through GPA, but
      we try to log the GIOVA when device IOTLB is enabled. This is wrong
      and may lead to missing data after migration.
      
      To solve this issue, when logging with device IOTLB enabled, we will:
      
      1) reuse the device IOTLB's GIOVA->HVA translation result to get the
         HVA: for writable descriptors, take the HVA from the iovec; for
         used ring updates, translate the GIOVA to an HVA directly
      2) traverse the GPA->HVA mapping to find the possible GPAs and log
         through GPA. Note that this reverse mapping is not guaranteed to
         be unique, so we must log each possible GPA in this case.
      
      This fixes the failure of scp to the guest during migration. In
      -next, we will probably support passing GIOVA->GPA mappings instead
      of GIOVA->HVA.
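
      A sketch of step 2's reverse walk (condensed from the patch;
      vhost_umem_node fields follow vhost's userspace memory map, and
      log_write() stands in for the existing dirty-bitmap helper):

          /* An HVA may be aliased by several GPA ranges; log them all. */
          static int log_write_hva(struct vhost_virtqueue *vq, u64 hva, u64 len)
          {
                  struct vhost_umem *umem = vq->umem;
                  struct vhost_umem_node *u;

                  list_for_each_entry(u, &umem->umem_list, link) {
                          if (u->userspace_addr < hva + len &&
                              u->userspace_addr + u->size > hva) {
                                  u64 gpa = u->start + (hva - u->userspace_addr);

                                  log_write(vq->log_base, gpa, len);
                          }
                  }

                  return 0;
          }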
      
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Reported-by: Jintack Lim <jintack@cs.columbia.edu>
      Cc: Jintack Lim <jintack@cs.columbia.edu>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 15 Jan 2019, 2 commits
  9. 12 Jan 2019, 1 commit
    • vhost/vsock: fix vhost vsock cid hashing inconsistent · 7fbe078c
      Authored by Zha Bin
      The vsock core only supports a 32-bit CID, but the virtio-vsock spec
      defines the CID (dst_cid and src_cid) as u64, with the upper 32 bits
      reserved as zero. This inconsistency causes a bug in the vhost vsock
      driver. The scenario is:
      
        1. A hash table (vhost_vsock_hash) is used to map a CID to a vsock
        object, and hash_min() is used to compute the hash key. hash_min()
        is defined as:
        (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
        That means the hash algorithm depends on the size of the macro
        argument 'val'.
        2. In function vhost_vsock_set_cid(), a 64-bit CID is passed to
        hash_min() to compute the hash key when inserting a vsock object
        into the hash table.
        3. In function vhost_vsock_get(), a 32-bit CID is passed to
        hash_min() to compute the hash key when looking up the vsock for a
        CID.
      
      Because of the differing sizes of the CID, hash_min() returns
      different hash keys, so the lookup of the vsock object for a CID
      fails.
      
      To fix this bug, we keep the CID as u64 in the IOCTLs and virtio
      message headers, but explicitly convert u64 to u32 when dealing with
      the hash table and the vsock core.
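
      The size dependence and the fix, sketched (table and field names
      follow the driver; the CID value is illustrative):

          /* hash_min() dispatches on sizeof(val): a u64 key takes the
           * hash_long() path while a u32 key takes hash_32(), so the same
           * CID can land in different buckets depending on argument type. */
          u64 cid64 = 3;                /* as carried in IOCTLs / headers */
          u32 cid32 = (u32)cid64;       /* as used by the vsock core      */

          hash_min(cid64, 8);           /* != hash_min(cid32, 8) in general */

          /* Fix: truncate once, explicitly, before touching the table.
           * The spec reserves the upper 32 bits as zero, so no data is
           * lost. */
          vsock->guest_cid = (u32)cid64;
          hash_add_rcu(vhost_vsock_hash, &vsock->hash, vsock->guest_cid);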
      
      Fixes: 834e772c ("vhost/vsock: fix use-after-free in network stack callers")
      Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex
      Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
      Reviewed-by: Liu Jiang <gerry@linux.alibaba.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 04 Jan 2019, 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
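
      The mechanical rewrite, at a made-up call site:

          /* before: direction flag plus range */
          if (!access_ok(VERIFY_WRITE, buf, len))
                  return -EFAULT;

          /* after: just the range; the check was never direction-specific */
          if (!access_ok(buf, len))
                  return -EFAULT;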
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 20 Dec 2018, 2 commits
  12. 13 Dec 2018, 3 commits
  13. 07 Dec 2018, 2 commits
    • vhost/vsock: fix use-after-free in network stack callers · 834e772c
      Authored by Stefan Hajnoczi
      If the network stack calls .send_pkt()/.cancel_pkt() during .release(),
      a struct vhost_vsock use-after-free is possible.  This occurs because
      .release() does not wait for other CPUs to stop using struct
      vhost_vsock.
      
      Switch to an RCU-enabled hashtable (indexed by guest CID) so that
      .release() can wait for other CPUs by calling synchronize_rcu().  This
      also eliminates vhost_vsock_lock acquisition in the data path so it
      could have a positive effect on performance.
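
      The resulting pattern, sketched (condensed; vhost_vsock_hash and the
      hash member follow the patch, the rest of the driver is elided):

          /* Data path: RCU read-side lookup by guest CID, with no
           * vhost_vsock_lock acquisition.  Callers hold rcu_read_lock(). */
          static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
          {
                  struct vhost_vsock *vsock;

                  hash_for_each_possible_rcu(vhost_vsock_hash, vsock, hash,
                                             guest_cid)
                          if (vsock->guest_cid == guest_cid)
                                  return vsock;

                  return NULL;
          }

          /* .release(): unpublish, then wait out all readers before
           * freeing. */
          hash_del_rcu(&vsock->hash);
          synchronize_rcu();  /* no CPU still dereferences vsock after this */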
      
      This is CVE-2018-14625 "kernel: use-after-free Read in vhost_transport_send_pkt".
      
      Cc: stable@vger.kernel.org
      Reported-and-tested-by: syzbot+bd391451452fb0b93039@syzkaller.appspotmail.com
      Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
      Reported-by: syzbot+d5a0a170c5069658b141@syzkaller.appspotmail.com
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
    • vhost/vsock: fix reset orphans race with close timeout · c38f57da
      Authored by Stefan Hajnoczi
      If a local process has closed a connected socket and hasn't received a
      RST packet yet, then the socket remains in the table until a timeout
      expires.
      
      When a vhost_vsock instance is released with the timeout still pending,
      the socket is never freed because vhost_vsock has already set the
      SOCK_DONE flag.
      
      Check if the close timer is pending and let it close the socket.  This
      prevents the race which can leak sockets.
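
      The check, sketched (condensed; close_work_scheduled is assumed to be
      the vsock core's delayed-close flag, and the reset tail below follows
      the pre-existing code):

          /* In vhost_vsock_reset_orphans(): if the close timeout is
           * already queued, return and let its callback finish the close
           * instead of marking SOCK_DONE here, which would leave the
           * socket stranded. */
          if (vsk->close_work_scheduled)
                  return;

          sock_set_flag(sk, SOCK_DONE);
          vsk->peer_shutdown = SHUTDOWN_MASK;
          sk->sk_err = ECONNRESET;
          sk->sk_error_report(sk);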
      Reported-by: Maximilian Riemensberger <riemensberger@cadami.net>
      Cc: Graham Whaley <graham.whaley@gmail.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  14. 04 Dec 2018, 1 commit
  15. 29 Nov 2018, 2 commits
  16. 28 Nov 2018, 1 commit
  17. 18 Nov 2018, 1 commit
    • vhost_net: mitigate page reference counting during page frag refill · e4dab1e6
      Authored by Jason Wang
      We do a get_page() for each packet, which involves an atomic
      operation. This patch mitigates the per-packet atomic operation by
      maintaining a reference bias which is initially USHRT_MAX. Each time
      a page is taken, instead of calling get_page() we decrement the
      bias, and when it's time to use a new page we return the remaining
      bias in one go through __page_frag_cache_drain().
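
      The bias dance, sketched (assuming a page_frag-backed allocator as
      in vhost_net; refcnt_bias is the field the patch adds):

          /* Refill: take USHRT_MAX page references with a single atomic
           * op, then hand them out via a plain counter. */
          page_ref_add(pfrag->page, USHRT_MAX - 1);
          net->refcnt_bias = USHRT_MAX;

          /* Per packet: consume one pre-taken reference, no atomics. */
          net->refcnt_bias--;

          /* Retiring the page: return whatever was never handed out. */
          __page_frag_cache_drain(pfrag->page, net->refcnt_bias);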
      
      Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6%
      improvement.
      
      Before: 4.63Mpps
      After:  4.71Mpps
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 01 Nov 2018, 1 commit
  19. 25 Oct 2018, 4 commits
  20. 08 Oct 2018, 1 commit
  21. 27 Sep 2018, 4 commits
  22. 22 Sep 2018, 1 commit
  23. 14 Sep 2018, 2 commits
    • vhost_net: batch submitting XDP buffers to underlayer sockets · 0a0be13b
      Authored by Jason Wang
      This patch implements XDP batching for vhost_net. The idea is to
      first try the userspace copy and build the XDP buff directly in
      vhost. Instead of submitting each packet immediately, vhost_net
      batches them in an array and submits every 64 (VHOST_NET_BATCH)
      packets to the underlying socket through the msg_control of
      sendmsg().

      When XDP is enabled on the TUN/TAP, TUN/TAP can process XDP inside a
      loop without caring about GUP, and thus can do batched map flushing.
      When XDP is not enabled or not supported, the underlying socket
      needs to build an skb and pass it to the network core. The batched
      packet submission allows us to do batching like
      netif_receive_skb_list() in the future.

      This saves lots of indirect calls for better cache utilization. For
      cases where we can't do batching, e.g. when sndbuf is limited or the
      packet size is too large, we fall back to the usual
      one-packet-per-sendmsg() path.
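
      The flush point, sketched (condensed; tun_msg_ctl is the control
      type introduced by the companion patch below, the nvq fields are
      illustrative, and mainline also carries the batch size in the
      control block):

          /* Hand a full batch to the socket in one sendmsg() call. */
          static void vhost_tx_batch(struct vhost_net_virtqueue *nvq,
                                     struct socket *sock, struct msghdr *msg)
          {
                  struct tun_msg_ctl ctl = {
                          .type = TUN_MSG_PTR,
                          .ptr  = nvq->xdp,   /* array of built XDP buffs */
                  };

                  if (nvq->batched_xdp == 0)
                          return;

                  msg->msg_control = &ctl;
                  sock->ops->sendmsg(sock, msg, 0);
                  nvq->batched_xdp = 0;
          }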
      
      Doing testpmd on various setups gives us:
      
      Test                /+pps%
      XDP_DROP on TAP     /+44.8%
      XDP_REDIRECT on TAP /+29%
      macvtap (skb)       /+26%
      
      Netperf tests shows obvious improvements for small packet transmission:
      
      size/session/+thu%/+normalize%
         64/     1/   +2%/    0%
         64/     2/   +3%/   +1%
         64/     4/   +7%/   +5%
         64/     8/   +8%/   +6%
        256/     1/   +3%/    0%
        256/     2/  +10%/   +7%
        256/     4/  +26%/  +22%
        256/     8/  +27%/  +23%
        512/     1/   +3%/   +2%
        512/     2/  +19%/  +14%
        512/     4/  +43%/  +40%
        512/     8/  +45%/  +41%
       1024/     1/   +4%/    0%
       1024/     2/  +27%/  +21%
       1024/     4/  +38%/  +73%
       1024/     8/  +15%/  +24%
       2048/     1/  +10%/   +7%
       2048/     2/  +16%/  +12%
       2048/     4/    0%/   +2%
       2048/     8/    0%/   +2%
       4096/     1/  +36%/  +60%
       4096/     2/  -11%/  -26%
       4096/     4/    0%/  +14%
       4096/     8/    0%/   +4%
      16384/     1/   -1%/   +5%
      16384/     2/    0%/   +2%
      16384/     4/    0%/   -3%
      16384/     8/    0%/   +4%
      65535/     1/    0%/  +10%
      65535/     2/    0%/   +8%
      65535/     4/    0%/   +1%
      65535/     8/    0%/   +3%
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: switch to new type of msg_control · fe8dd45b
      Authored by Jason Wang
      This patch introduces a new tun/tap-specific msg_control:
      
      #define TUN_MSG_UBUF 1
      #define TUN_MSG_PTR  2
      struct tun_msg_ctl {
             int type;
             void *ptr;
      };
      
      This allows us to pass different kinds of msg_control through
      sendmsg(). The first supported type is ubuf (TUN_MSG_UBUF), which
      will be used by the existing vhost_net zerocopy code. The second is
      an XDP buff (TUN_MSG_PTR), which allows vhost_net to pass XDP buffs
      to TUN. This will be used to implement accepting an array of XDP
      buffs from vhost_net in the following patches.
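
      Receiver-side dispatch, sketched (how tun_sendmsg() might interpret
      the control block; the handler names here are invented):

          struct tun_msg_ctl *ctl = m->msg_control;

          if (ctl) {
                  switch (ctl->type) {
                  case TUN_MSG_UBUF:
                          /* zerocopy completion context, as before */
                          handle_ubuf(ctl->ptr);      /* hypothetical */
                          break;
                  case TUN_MSG_PTR:
                          /* array of XDP buffs batched by vhost_net */
                          handle_xdp_batch(ctl->ptr); /* hypothetical */
                          break;
                  }
          }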
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 26 Aug 2018, 1 commit
  25. 22 Aug 2018, 2 commits
  26. 09 Aug 2018, 1 commit