提交 · 098eadce3c622c07b328d0a43dda379b38cf7c5e · openeuler / Kernel

18 6月, 2019 1 次提交

vhost_net: disable zerocopy by default · 098eadce

由 Jason Wang 提交于 6月 17, 2019

Vhost_net was known to suffer from HOL[1] issues which is not easy to
fix. Several downstream disable the feature by default. What's more,
the datapath was split and datacopy path got the support of batching
and XDP support recently which makes it faster than zerocopy part for
small packets transmission.

It looks to me that disable zerocopy by default is more
appropriate. It cold be enabled by default again in the future if we
fix the above issues.

[1] https://patchwork.kernel.org/patch/3787671/Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

098eadce

27 5月, 2019 4 次提交

vhost: scsi: add weight support · c1ea02f1

由 Jason Wang 提交于 5月 17, 2019

This patch will check the weight and exit the loop if we exceeds the
weight. This is useful for preventing scsi kthread from hogging cpu
which is guest triggerable.

This addresses CVE-2019-3900.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: 057cbf49 ("tcm_vhost: Initial merge for vhost level target fabric driver")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

c1ea02f1

vhost: vsock: add weight support · e79b431f

由 Jason Wang 提交于 5月 17, 2019

This patch will check the weight and exit the loop if we exceeds the
weight. This is useful for preventing vsock kthread from hogging cpu
which is guest triggerable. The weight can help to avoid starving the
request from on direction while another direction is being processed.

The value of weight is picked from vhost-net.

This addresses CVE-2019-3900.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Fixes: 433fc58e ("VSOCK: Introduce vhost_vsock.ko")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

e79b431f

vhost_net: fix possible infinite loop · e2412c07

由 Jason Wang 提交于 5月 17, 2019

When the rx buffer is too small for a packet, we will discard the vq
descriptor and retry it for the next packet:

while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk,
					      &busyloop_intr))) {
...
	/* On overrun, truncate and discard */
	if (unlikely(headcount > UIO_MAXIOV)) {
		iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
		err = sock->ops->recvmsg(sock, &msg,
					 1, MSG_DONTWAIT | MSG_TRUNC);
		pr_debug("Discarded rx packet: len %zd\n", sock_len);
		continue;
	}
...
}

This makes it possible to trigger a infinite while..continue loop
through the co-opreation of two VMs like:

1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the
   vhost process as much as possible e.g using indirect descriptors or
   other.
2) Malicious VM2 generate packets to VM1 as fast as possible

Fixing this by checking against weight at the end of RX and TX
loop. This also eliminate other similar cases when:

- userspace is consuming the packets in the meanwhile
- theoretical TOCTOU attack if guest moving avail index back and forth
  to hit the continue after vhost find guest just add new buffers

This addresses CVE-2019-3900.

Fixes: d8316f39 ("vhost: fix total length when packets are too short")
Fixes: 3a4d5c94 ("vhost_net: a kernel-level virtio server")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

e2412c07

vhost: introduce vhost_exceeds_weight() · e82b9b07

由 Jason Wang 提交于 5月 17, 2019

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

e82b9b07

21 5月, 2019 2 次提交

treewide: Add SPDX license identifier - Makefile/Kconfig · ec8f24b7

由 Thomas Gleixner 提交于 5月 19, 2019

Add SPDX license identifiers to all Make/Kconfig files which:

 - Have no license information of any form

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ec8f24b7

treewide: Add SPDX license identifier for more missed files · 09c434b8

由 Thomas Gleixner 提交于 5月 19, 2019

Add SPDX license identifiers to all files which:

 - Have no license information of any form

 - Have MODULE_LICENCE("GPL*") inside which was used in the initial
   scan/conversion to ignore the file

These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:

  GPL-2.0-only
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

09c434b8

15 5月, 2019 1 次提交

mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b

由 Ira Weiny 提交于 5月 13, 2019

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73b0140b

13 5月, 2019 1 次提交

vhost-scsi: remove incorrect memory barrier · 889e31e7

由 Paolo Bonzini 提交于 4月 16, 2019

At this point, vs_tpg is not public at all; tv_tpg_vhost_count
is accessed under tpg->tv_tpg_mutex; tpg->vhost_scsi is
accessed under vhost_scsi_mutex.  Therefor there are no atomic
operations involved at all here, just remove the barrier.
Reported-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

889e31e7

11 4月, 2019 1 次提交

vhost: reject zero size iova range · 813dbeb6

由 Jason Wang 提交于 4月 09, 2019

We used to accept zero size iova range which will lead a infinite loop
in translate_desc(). Fixing this by failing the request in this case.

Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

813dbeb6

09 3月, 2019 1 次提交

vhost: silence an unused-variable warning · 81bf7bbe

由 Arnd Bergmann 提交于 3月 06, 2019

On some architectures, the MMU can be disabled, leading to access_ok()
becoming an empty macro that does not evaluate its size argument,
which in turn produces an unused-variable warning:

drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
        size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

Mark the variable as __maybe_unused to shut up that warning.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81bf7bbe

07 3月, 2019 1 次提交

vhost: silence an unused-variable warning · cfdbb4ed

由 Arnd Bergmann 提交于 3月 06, 2019

On some architectures, the MMU can be disabled, leading to access_ok()
becoming an empty macro that does not evaluate its size argument,
which in turn produces an unused-variable warning:

drivers/vhost/vhost.c:1191:9: error: unused variable 's' [-Werror,-Wunused-variable]
size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;

Mark the variable as __maybe_unused to shut up that warning.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

cfdbb4ed

20 2月, 2019 1 次提交

vhost: correctly check the return value of translate_desc() in log_used() · 816db766

由 Jason Wang 提交于 2月 19, 2019

When fail, translate_desc() returns negative value, otherwise the
number of iovs. So we should fail when the return value is negative
instead of a blindly check against zero.

Detected by CoverityScan, CID# 1442593:  Control flow issues  (DEADCODE)

Fixes: cc5e7107 ("vhost: log dirty page correctly")
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Reported-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

816db766

05 2月, 2019 1 次提交

scsi: target/core: Remove the write_pending_status() callback function · f80d2f08

由 Bart Van Assche 提交于 1月 25, 2019

Due to the patch that makes TMF handling synchronous the
write_pending_status() callback function is no longer called.  Hence remove
it.
Acked-by: NFelipe Balbi <balbi@ti.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NAndy Grover <agrover@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NBryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Saurav Kashyap <saurav.kashyap@qlogic.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

f80d2f08

29 1月, 2019 1 次提交

vhost: fix OOB in get_rx_bufs() · b46a0bf7

由 Jason Wang 提交于 1月 28, 2019

After batched used ring updating was introduced in commit e2b3b35e
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.

        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                    vhost_len, &in, vq_log, &log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35e ("vhost_net: batch used ring update in rx")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b46a0bf7

18 1月, 2019 1 次提交

vhost: log dirty page correctly · cc5e7107

由 Jason Wang 提交于 1月 16, 2019

Vhost dirty page logging API is designed to sync through GPA. But we
try to log GIOVA when device IOTLB is enabled. This is wrong and may
lead to missing data after migration.

To solve this issue, when logging with device IOTLB enabled, we will:

1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
   get HVA, for writable descriptor, get HVA through iovec. For used
   ring update, translate its GIOVA to HVA
2) traverse the GPA->HVA mapping to get the possible GPA and log
   through GPA. Pay attention this reverse mapping is not guaranteed
   to be unique, so we should log each possible GPA in this case.

This fix the failure of scp to guest during migration. In -next, we
will probably support passing GIOVA->GPA instead of GIOVA->HVA.

Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Reported-by: NJintack Lim <jintack@cs.columbia.edu>
Cc: Jintack Lim <jintack@cs.columbia.edu>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc5e7107

15 1月, 2019 2 次提交

vhost/scsi: Use copy_to_iter() to send control queue response · 8e5dadfe

由 Bijan Mottahedeh 提交于 12月 03, 2018

Uses copy_to_iter() instead of __copy_to_user() in order to ensure we
support arbitrary layouts and an input buffer split across iov entries.

Fixes: 0d02dbd6 ("vhost/scsi: Respond to control queue operations")
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

8e5dadfe

vhost: return EINVAL if iovecs size does not match the message size · 74ad7419

由 Pavel Tikhomirov 提交于 12月 13, 2018

We've failed to copy and process vhost_iotlb_msg so let userspace at
least know about it. For instance before these patch the code below runs
without any error:

int main()
{
  struct vhost_msg msg;
  struct iovec iov;
  int fd;

  fd = open("/dev/vhost-net", O_RDWR);
  if (fd == -1) {
    perror("open");
    return 1;
  }

  iov.iov_base = &msg;
  iov.iov_len = sizeof(msg)-4;

  if (writev(fd, &iov,1) == -1) {
    perror("writev");
    return 1;
  }

  return 0;
}
Signed-off-by: NPavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

74ad7419

12 1月, 2019 1 次提交

vhost/vsock: fix vhost vsock cid hashing inconsistent · 7fbe078c

由 Zha Bin 提交于 1月 08, 2019

The vsock core only supports 32bit CID, but the Virtio-vsock spec define
CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as
zero. This inconsistency causes one bug in vhost vsock driver. The
scenarios is:

  0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock
  object. And hash_min() is used to compute the hash key. hash_min() is
  defined as:
  (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
  That means the hash algorithm has dependency on the size of macro
  argument 'val'.
  0. In function vhost_vsock_set_cid(), a 64bit CID is passed to
  hash_min() to compute the hash key when inserting a vsock object into
  the hash table.
  0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
  to compute the hash key when looking up a vsock for an CID.

Because the different size of the CID, hash_min() returns different hash
key, thus fails to look up the vsock object for an CID.

To fix this bug, we keep CID as u64 in the IOCTLs and virtio message
headers, but explicitly convert u64 to u32 when deal with the hash table
and vsock core.

Fixes: 834e772c ("vhost/vsock: fix use-after-free in network stack callers")
Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.texSigned-off-by: NZha Bin <zhabin@linux.alibaba.com>
Reviewed-by: NLiu Jiang <gerry@linux.alibaba.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7fbe078c

04 1月, 2019 1 次提交

Remove 'type' argument from access_ok() function · 96d4f267

由 Linus Torvalds 提交于 1月 03, 2019

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
of the user address range verification function since we got rid of the
old racy i386-only code to walk page tables by hand.

It existed because the original 80386 would not honor the write protect
bit when in kernel mode, so you had to do COW by hand before doing any
user access.  But we haven't supported that in a long time, and these
days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range
checking resulted this patch, because there is no way we're going to
move the old VERIFY_xyz interface to that model.  And it's best done at
the end of the merge window when I've done most of my merges, so let's
just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for
the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.

There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.

 - the iter_iov code had magical hardcoded knowledge of the actual
   values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
   really used it)

 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch.

I tried to fix up all architectures, did fairly extensive grepping for
access_ok() uses, and the changes are trivial, but I may have missed
something.  Any missed conversion should be trivially fixable, though.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

96d4f267

20 12月, 2018 2 次提交

vhost: correct the related warning message · a691ffb4

由 wangyan 提交于 12月 13, 2018

Fixes: 'commit d588cf8f ("target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem")'
       'commit cbbd26b8 ("[iov_iter] new primitives - copy_from_iter_full() and friends")'
Signed-off-by: NYan Wang <wangyan122@huawei.com>
Reviewed-by: NJun Piao <piaojun@huawei.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

a691ffb4

vhost/vsock: switch to a mutex for vhost_vsock_hash · 6db3d8dc

由 Stefan Hajnoczi 提交于 11月 05, 2018

Now that there are no more data path users of vhost_vsock_lock, it can
be turned into a mutex.  It's only used by .release() and in the
.ioctl() path.

Depends-on: <20181105103547.22018-1-stefanha@redhat.com>
Suggested-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

6db3d8dc

13 12月, 2018 3 次提交

Revert "net: vhost: lock the vqs one by one" · 86a07da3

由 Jason Wang 提交于 12月 13, 2018

This reverts commit 78139c94. We don't
protect device IOTLB with vq mutex, which will lead e.g use after free
for device IOTLB entries. And since we've switched to use
mutex_trylock() in previous patch, it's safe to revert it without
having deadlock.

Fixes: commit 78139c94 ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86a07da3

vhost_net: switch to use mutex_trylock() in vhost_net_busy_poll() · 476e8ba7

由 Jason Wang 提交于 12月 13, 2018

We used to hold the mutex of paired virtqueue in
vhost_net_busy_poll(). But this will results an inconsistent lock
order which may cause deadlock if we try to bring back the protection
of device IOTLB with vq mutex that requires to hold mutex of all
virtqueues at the same time.

Fix this simply by switching to use mutex_trylock(), when fail just
skip the busy polling. This can happen when device IOTLB is under
updating which should be rare.

Fixes: commit 78139c94 ("net: vhost: lock the vqs one by one")
Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

476e8ba7

vhost: make sure used idx is seen before log in vhost_add_used_n() · 841df922

由 Jason Wang 提交于 12月 13, 2018

We miss a write barrier that guarantees used idx is updated and seen
before log. This will let userspace sync and copy used ring before
used idx is update. Fix this by adding a barrier before log_write().

Fixes: 8dd014ad ("vhost-net: mergeable buffers support")
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

841df922

07 12月, 2018 2 次提交

vhost/vsock: fix use-after-free in network stack callers · 834e772c

由 Stefan Hajnoczi 提交于 11月 05, 2018

If the network stack calls .send_pkt()/.cancel_pkt() during .release(),
a struct vhost_vsock use-after-free is possible.  This occurs because
.release() does not wait for other CPUs to stop using struct
vhost_vsock.

Switch to an RCU-enabled hashtable (indexed by guest CID) so that
.release() can wait for other CPUs by calling synchronize_rcu().  This
also eliminates vhost_vsock_lock acquisition in the data path so it
could have a positive effect on performance.

This is CVE-2018-14625 "kernel: use-after-free Read in vhost_transport_send_pkt".

Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+bd391451452fb0b93039@syzkaller.appspotmail.com
Reported-by: syzbot+e3e074963495f92a89ed@syzkaller.appspotmail.com
Reported-by: syzbot+d5a0a170c5069658b141@syzkaller.appspotmail.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

834e772c

vhost/vsock: fix reset orphans race with close timeout · c38f57da

由 Stefan Hajnoczi 提交于 12月 06, 2018

If a local process has closed a connected socket and hasn't received a
RST packet yet, then the socket remains in the table until a timeout
expires.

When a vhost_vsock instance is released with the timeout still pending,
the socket is never freed because vhost_vsock has already set the
SOCK_DONE flag.

Check if the close timer is pending and let it close the socket.  This
prevents the race which can leak sockets.
Reported-by: NMaximilian Riemensberger <riemensberger@cadami.net>
Cc: Graham Whaley <graham.whaley@gmail.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

c38f57da

04 12月, 2018 1 次提交

vhost: fix IOTLB locking · e3f78718

由 Jean-Philippe Brucker 提交于 11月 30, 2018

Commit 78139c94 ("net: vhost: lock the vqs one by one") moved the vq
lock to improve scalability, but introduced a possible deadlock in
vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding
the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the
spinlock is taken while holding vq->mutex.

Since calling vhost_poll_queue() doesn't require any lock, avoid the
deadlock by not taking vq->mutex.

Fixes: 78139c94 ("net: vhost: lock the vqs one by one")
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3f78718

29 11月, 2018 2 次提交

scsi: target: replace fabric_ops.name with fabric_alias · 59a206b4

由 David Disseldorp 提交于 11月 23, 2018

iscsi_target_mod is the only LIO fabric where fabric_ops.name differs from
the fabric_ops.fabric_name string.  fabric_ops.name is used when matching
target/$fabric ConfigFS create paths, so rename it .fabric_alias and
fallback to target/$fabric vs .fabric_name comparison if .fabric_alias
isn't initialised.  iscsi_target_mod is the only fabric module to set
.fabric_alias . All other fabric modules rely on .fabric_name matching and
can drop the duplicate string.
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

59a206b4

scsi: target: drop unnecessary get_fabric_name() accessor from fabric_ops · 30c7ca93

由 David Disseldorp 提交于 11月 23, 2018

All fabrics return a const string. In all cases *except* iSCSI the
get_fabric_name() string matches fabric_ops.name.

Both fabric_ops.get_fabric_name() and fabric_ops.name are user-facing, with
the former being used for PR/ALUA state and the latter for ConfigFS
(config/target/$name), so we unfortunately need to keep both strings around
for now.  Replace the useless .get_fabric_name() accessor function with a
const string fabric_name member variable.
Signed-off-by: NDavid Disseldorp <ddiss@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

30c7ca93

28 11月, 2018 1 次提交

drivers/vhost: Replace synchronize_rcu_bh() with synchronize_rcu() · d05faa5f

由 Paul E. McKenney 提交于 11月 05, 2018

Now that synchronize_rcu() waits for bh-disable regions of code as well
as RCU read-side critical sections, synchronize_rcu_bh() can be replaced
by synchronize_rcu().  This commit therefore makes this change.
Signed-off-by: NPaul E. McKenney <paulmck@linux.ibm.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: <kvm@vger.kernel.org>
Cc: <virtualization@lists.linux-foundation.org>
Cc: <netdev@vger.kernel.org>

d05faa5f

18 11月, 2018 1 次提交

vhost_net: mitigate page reference counting during page frag refill · e4dab1e6

由 Jason Wang 提交于 11月 15, 2018

We do a get_page() which involves a atomic operation. This patch tries
to mitigate a per packet atomic operation by maintaining a reference
bias which is initially USHRT_MAX. Each time a page is got, instead of
calling get_page() we decrease the bias and when we find it's time to
use a new page we will decrease the bias at one time through
__page_cache_drain_cache().

Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6%
improvement.

Before: 4.63Mpps
After:  4.71Mpps
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4dab1e6

01 11月, 2018 1 次提交

vhost: Fix Spectre V1 vulnerability · ff002269

由 Jason Wang 提交于 10月 30, 2018

The idx in vhost_vring_ioctl() was controlled by userspace, hence a
potential exploitation of the Spectre variant 1 vulnerability.

Fixing this by sanitizing idx before using it to index d->vqs.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff002269

25 10月, 2018 4 次提交

vhost/scsi: Use common handling code in request queue handler · 09d75832

由 Bijan Mottahedeh 提交于 9月 17, 2018

Change the request queue handler to use common handling routines same
as the control queue handler.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

09d75832

vhost/scsi: Extract common handling code from control queue handler · 3f8ca2e1

由 Bijan Mottahedeh 提交于 9月 17, 2018

Prepare to change the request queue handler to use common handling
routines.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

3f8ca2e1

vhost/scsi: Respond to control queue operations · 0d02dbd6

由 Bijan Mottahedeh 提交于 9月 17, 2018

The vhost-scsi driver currently does not handle any control queue
operations. In particular, vhost_scsi_ctl_handle_kick, merely prints out
a debug message but does nothing else. This can cause guest VMs to hang.

As part of SCSI recovery from an error, e.g., an I/O timeout, the SCSI
midlayer attempts to abort the failed operation. The SCSI virtio driver
translates the abort to a SCSI TMF request that gets put on the control
queue (virtscsi_abort -> virtscsi_tmf). The SCSI virtio driver then
waits indefinitely for this request to be completed, but it never will
because vhost-scsi never responds to that request.

To avoid a hang, always respond to control queue operations; explicitly
reject TMF requests, and return a no-op response to event requests.
Signed-off-by: NBijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

0d02dbd6

vhost/scsi: truncate T10 PI iov_iter to prot_bytes · 4542d623

由 Greg Edwards 提交于 8月 22, 2018

Commands with protection information included were not truncating the
protection iov_iter to the number of protection bytes in the command.
This resulted in vhost_scsi mis-calculating the size of the protection
SGL in vhost_scsi_calc_sgls(), and including both the protection and
data SG entries in the protection SGL.

Fixes: 09b13fa8 ("vhost/scsi: Add ANY_LAYOUT support in vhost_scsi_handle_vq")
Signed-off-by: NGreg Edwards <gedwards@ddn.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Fixes: 09b13fa8
Cc: stable@vger.kernel.org
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>

4542d623

08 10月, 2018 1 次提交

net: vhost: remove bad code line · abf1a08f

由 Tonghao Zhang 提交于 10月 07, 2018

Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

abf1a08f

27 9月, 2018 2 次提交

net: vhost: add rx busy polling in tx path · 441abde4

由 Tonghao Zhang 提交于 9月 25, 2018

This patch improves the guest receive performance.
On the handle_tx side, we poll the sock receive queue at the
same time. handle_rx do that in the same way.

We set the poll-us=100us and use the netperf to test throughput
and mean latency. When running the tests, the vhost-net kthread
of that VM, is alway 100% CPU. The commands are shown as below.

Rx performance is greatly improved by this patch. There is not
notable performance change on tx with this series though. This
patch is useful for bi-directional traffic.

netperf -H IP -t TCP_STREAM -l 20 -- -O "THROUGHPUT, THROUGHPUT_UNITS, MEAN_LATENCY"

Topology:
[Host] ->linux bridge -> tap vhost-net ->[Guest]

TCP_STREAM:
* Without the patch:  19842.95 Mbps, 6.50 us mean latency
* With the patch:     37598.20 Mbps, 3.43 us mean latency
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

441abde4

net: vhost: factor out busy polling logic to vhost_net_busy_poll() · dc151282

由 Tonghao Zhang 提交于 9月 25, 2018

Factor out generic busy polling logic and will be
used for in tx path in the next patch. And with the patch,
qemu can set differently the busyloop_timeout for rx queue.

To avoid duplicate codes, introduce the helper functions:
* sock_has_rx_data(changed from sk_has_rx_data)
* vhost_net_busy_poll_try_queue
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc151282

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功