提交 · 6bcf34224ac1e94103797fd68b9836061762f2b2 · openeuler / Kernel

16 11月, 2020 1 次提交

vhost: add helper to check if a vq has been setup · 6bcf3422

由 Mike Christie 提交于 11月 09, 2020

This adds a helper check if a vq has been setup. The next patches
will use this when we move the vhost scsi cmd preallocation from per
session to per vq. In the per vq case, we only want to allocate cmds
for vqs that have actually been setup and not for all the possible
vqs.
Signed-off-by: NMike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@oracle.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

6bcf3422

21 10月, 2020 2 次提交

vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call · 86e182fe

由 Zhu Lingshan 提交于 9月 09, 2020

This commit removed unnecessary spin_locks in vhost_vring_call
and related operations. Because we manipulate irq offloading
contents in vhost_vdpa ioctl code path which is already
protected by dev mutex and vq mutex.
Signed-off-by: NZhu Lingshan <lingshan.zhu@intel.com>
Link: https://lore.kernel.org/r/20200909065234.3313-1-lingshan.zhu@intel.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

86e182fe

vhost: reduce stack usage in log_used · 5e5e8736

由 Li Wang 提交于 9月 15, 2020

Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/vhost/vhost.c: In function log_used:
drivers/vhost/vhost.c:1906:1:
warning: the frame size of 1040 bytes is larger than 1024 bytes
Signed-off-by: NLi Wang <li.wang@windriver.com>
Link: https://lore.kernel.org/r/1600106889-25013-1-git-send-email-li.wang@windriver.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

5e5e8736

04 10月, 2020 3 次提交

vhost: Don't call log_access_ok() when using IOTLB · ab512251

由 Greg Kurz 提交于 10月 03, 2020

When the IOTLB device is enabled, the log_guest_addr that is passed by
userspace to the VHOST_SET_VRING_ADDR ioctl, and which is then written
to vq->log_addr, is a GIOVA. All writes to this address are translated
by log_user() to writes to an HVA, and then ultimately logged through
the corresponding GPAs in log_write_hva(). No logging will ever occur
with vq->log_addr in this case. It is thus wrong to pass vq->log_addr
and log_guest_addr to log_access_vq() which assumes they are actual
GPAs.

Introduce a new vq_log_used_access_ok() helper that only checks accesses
to the log for the used structure when there isn't an IOTLB device around.
Signed-off-by: NGreg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171933385.284610.10189082586063280867.stgit@bahia.lanSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

ab512251

vhost: Use vhost_get_used_size() in vhost_vring_set_addr() · 71878fa4

由 Greg Kurz 提交于 10月 03, 2020

The open-coded computation of the used size doesn't take the event
into account when the VIRTIO_RING_F_EVENT_IDX feature is present.
Fix that by using vhost_get_used_size().

Fixes: 8ea8cf89 ("vhost: support event index")
Cc: stable@vger.kernel.org
Signed-off-by: NGreg Kurz <groug@kaod.org>
Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lanSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

71878fa4

vhost: Don't call access_ok() when using IOTLB · 0210a8db

由 Greg Kurz 提交于 10月 03, 2020

When the IOTLB device is enabled, the vring addresses we get
from userspace are GIOVAs. It is thus wrong to pass them down
to access_ok() which only takes HVAs.

Access validation is done at prefetch time with IOTLB. Teach
vq_access_ok() about that by moving the (vq->iotlb) check
from vhost_vq_access_ok() to vq_access_ok(). This prevents
vhost_vring_set_addr() to fail when verifying the accesses.
No behavior change for vhost_vq_access_ok().

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Cc: jasowang@redhat.com
CC: stable@vger.kernel.org # 4.14+
Signed-off-by: NGreg Kurz <groug@kaod.org>
Acked-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lanSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

0210a8db

02 9月, 2020 1 次提交

vhost: fix typo in error message · ae6961de

由 Yunsheng Lin 提交于 9月 01, 2020

"enable" should be "disable" when the function name is
vhost_disable_notify(), which does the disabling work.
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae6961de

06 8月, 2020 1 次提交

vhost: generialize backend features setting/getting · 460f7ce1

由 Jason Wang 提交于 8月 04, 2020

Move the backend features setting/getting from net.c to vhost.c to be
reused by vhost-vdpa.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200804162048.22587-3-eli@mellanox.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

460f7ce1

05 8月, 2020 2 次提交

vhost: introduce vhost_vring_call · 265a0ad8

由 Zhu Lingshan 提交于 7月 31, 2020

This commit introduces struct vhost_vring_call which replaced
raw struct eventfd_ctx *call_ctx in struct vhost_virtqueue.
Besides eventfd_ctx, it contains a spin lock and an
irq_bypass_producer in its structure.
Signed-off-by: NZhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200731065533.4144-2-lingshan.zhu@intel.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

265a0ad8

vhost: Use flex_array_size() helper in copy_from_user() · bf11d71a

由 Gustavo A. R. Silva 提交于 7月 31, 2020

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing with a flexible array member.
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20200731130956.GA30525@embeddedorSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

bf11d71a

21 7月, 2020 1 次提交

vhost: Remove redundant use of read_barrier_depends() barrier · 71c0b9a6

由 Will Deacon 提交于 10月 30, 2019

Since commit 76ebbe78 ("locking/barriers: Add implicit
smp_read_barrier_depends() to READ_ONCE()"), there is no need to use
smp_read_barrier_depends() outside of the Alpha architecture code.

Unfortunately, there is precisely _one_ user in the vhost code, and
there isn't an obvious READ_ONCE() access making the barrier
redundant. However, on closer inspection (thanks, Jason), it appears
that vring synchronisation between the producer and consumer occurs via
the 'avail_idx' field, which is followed up by an rmb() in
vhost_get_vq_desc(), making the read_barrier_depends() redundant on
Alpha.

Jason says:

  | I'm also confused about the barrier here, basically in driver side
  | we did:
  |
  | 1) allocate pages
  | 2) store pages in indirect->addr
  | 3) smp_wmb()
  | 4) increase the avail idx (somehow a tail pointer of vring)
  |
  | in vhost we did:
  |
  | 1) read avail idx
  | 2) smp_rmb()
  | 3) read indirect->addr
  | 4) read from indirect->addr
  |
  | It looks to me even the data dependency barrier is not necessary
  | since we have rmb() which is sufficient for us to the correct
  | indirect->addr and driver are not expected to do any writing to
  | indirect->addr after avail idx is increased

Remove the redundant barrier invocation.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: NJason Wang <jasowang@redhat.com>
Acked-by: NPaul E. McKenney <paulmck@kernel.org>
Signed-off-by: NWill Deacon <will@kernel.org>

71c0b9a6

11 6月, 2020 3 次提交

kernel: set USER_DS in kthread_use_mm · 37c54f9b

由 Christoph Hellwig 提交于 6月 10, 2020

Some architectures like arm64 and s390 require USER_DS to be set for
kernel threads to access user address space, which is the whole purpose of
kthread_use_mm, but other like x86 don't.  That has lead to a huge mess
where some callers are fixed up once they are tested on said
architectures, while others linger around and yet other like io_uring try
to do "clever" optimizations for what usually is just a trivial asignment
to a member in the thread_struct for most architectures.

Make kthread_use_mm set USER_DS, and kthread_unuse_mm restore to the
previous value instead.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Tested-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-7-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37c54f9b

kernel: better document the use_mm/unuse_mm API contract · f5678e7f

由 Christoph Hellwig 提交于 6月 10, 2020

Switch the function documentation to kerneldoc comments, and add
WARN_ON_ONCE asserts that the calling thread is a kernel thread and does
not have ->mm set (or has ->mm set in the case of unuse_mm).

Also give the functions a kthread_ prefix to better document the use case.

[hch@lst.de: fix a comment typo, cover the newly merged use_mm/unuse_mm caller in vfio]
  Link: http://lkml.kernel.org/r/20200416053158.586887-3-hch@lst.de
[sfr@canb.auug.org.au: powerpc/vas: fix up for {un}use_mm() rename]
  Link: http://lkml.kernel.org/r/20200422163935.5aa93ba5@canb.auug.org.auSigned-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Tested-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [usb]
Acked-by: NHaren Myneni <haren@linux.ibm.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Link: http://lkml.kernel.org/r/20200404094101.672954-6-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5678e7f

kernel: move use_mm/unuse_mm to kthread.c · 9bf5b9eb

由 Christoph Hellwig 提交于 6月 10, 2020

Patch series "improve use_mm / unuse_mm", v2.

This series improves the use_mm / unuse_mm interface by better documenting
the assumptions, and my taking the set_fs manipulations spread over the
callers into the core API.

This patch (of 3):

Use the proper API instead.

Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de

These helpers are only for use with kernel threads, and I will tie them
more into the kthread infrastructure going forward.  Also move the
prototypes to kthread.h - mmu_context.h was a little weird to start with
as it otherwise contains very low-level MM bits.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Tested-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NJens Axboe <axboe@kernel.dk>
Acked-by: NFelix Kuehling <Felix.Kuehling@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: Zhi Wang <zhi.a.wang@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: http://lkml.kernel.org/r/20200404094101.672954-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200416053158.586887-1-hch@lst.de
Link: http://lkml.kernel.org/r/20200404094101.672954-5-hch@lst.deSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bf5b9eb

09 6月, 2020 1 次提交

vhost: convert get_user_pages() --> pin_user_pages() · 690623e1

由 John Hubbard 提交于 6月 07, 2020

This code was using get_user_pages*(), in approximately a "Case 5"
scenario (accessing the data within a page), using the categorization
from [1].  That means that it's time to convert the get_user_pages*() +
put_page() calls to pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small part
of fixing a long-standing disconnect between pinning pages, and file
systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/Signed-off-by: NJohn Hubbard <jhubbard@nvidia.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200529234309.484480-3-jhubbard@nvidia.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

690623e1

07 6月, 2020 1 次提交

vhost: replace -1 with VHOST_FILE_UNBIND in ioctls · e0136c16

由 Zhu Lingshan 提交于 6月 05, 2020

This commit replaces -1 with VHOST_FILE_UNBIND in ioctls since
we have added such a macro in the uapi header for vdpa_host.
Signed-off-by: NZhu Lingshan <lingshan.zhu@intel.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/1591352835-22441-5-git-send-email-lingshan.zhu@intel.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

e0136c16

05 6月, 2020 3 次提交

vhost: (cosmetic) remove a superfluous variable initialisation · 002ef18e

由 Guennadi Liakhovetski 提交于 5月 27, 2020

Even the compiler is able to figure out that in this case the
initialisation is superfluous.
Signed-off-by: NGuennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Link: https://lore.kernel.org/r/20200527180541.5570-3-guennadi.liakhovetski@linux.intel.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

002ef18e

vhost: use mmgrab() instead of mmget() for non worker device · 5ce995f3

由 Jason Wang 提交于 5月 29, 2020

For the device that doesn't use vhost worker and use_mm(), mmget() is
too heavy weight and it may brings troubles for implementing mmap()
support for vDPA device.

This is because, an reference to the address space was held via
mm_get() in vhost_dev_set_owner() and an reference to the file was
held in mmap(). This means when process exits, the mm can not be
released thus we can not release the file.

This patch tries to use mmgrab() instead of mmget(), which allows the
address space to be destroy in process exit without releasing the mm
structure itself. This is sufficient for vDPA device which pin user
pages and does not depend on the address space to work.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-3-jasowang@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

5ce995f3

vhost: allow device that does not depend on vhost worker · 01fcb1cb

由 Jason Wang 提交于 5月 29, 2020

vDPA device currently relays the eventfd via vhost worker. This is
inefficient due the latency of wakeup and scheduling, so this patch
tries to introduce a use_worker attribute for the vhost device. When
use_worker is not set with vhost_dev_init(), vhost won't try to
allocate a worker thread and the vhost_poll will be processed directly
in the wakeup function.

This help for vDPA since it reduces the latency caused by vhost worker.

In my testing, it saves 0.2 ms in pings between VMs on a mutual host.
Signed-off-by: NZhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200529080303.15449-2-jasowang@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

01fcb1cb

02 6月, 2020 1 次提交

virtio: force spec specified alignment on types · a865e420

由 Michael S. Tsirkin 提交于 4月 06, 2020

The ring element addresses are passed between components with different
alignments assumptions. Thus, if guest/userspace selects a pointer and
host then gets and dereferences it, we might need to decrease the
compiler-selected alignment to prevent compiler on the host from
assuming pointer is aligned.

This actually triggers on ARM with -mabi=apcs-gnu - which is a
deprecated configuration, but it seems safer to handle this
generally.

Note that userspace that allocates the memory is actually OK and does
not need to be fixed, but userspace that gets it from guest or another
process does need to be fixed. The later doesn't generally talk to the
kernel so while it might be buggy it's not talking to the kernel in the
buggy way - it's just using the header in the buggy way - so fixing
header and asking userspace to recompile is the best we can do.

I verified that the produced kernel binary on x86 is exactly identical
before and after the change.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

a865e420

15 5月, 2020 1 次提交

vhost: missing __user tags · 1b0be99f

由 Michael S. Tsirkin 提交于 5月 15, 2020

sparse warns about converting void * to void __user *. This is not new
but only got noticed now that vhost is built on more systems.
This is just a question of __user tags missing in a couple of places,
so fix it up.

Fixes: f8894913 ("vhost: introduce O(1) vq metadata cache")
Reported-by: Nkbuild test robot <lkp@intel.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

1b0be99f

02 4月, 2020 2 次提交

vhost: factor out IOTLB · 0bbe3066

由 Jason Wang 提交于 3月 26, 2020

This patch factors out IOTLB into a dedicated module in order to be
reused by other modules like vringh. User may choose to enable the
automatic retiring by specifying VHOST_IOTLB_FLAG_RETIRE flag to fit
for the case of vhost device IOTLB implementation.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-4-jasowang@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

0bbe3066

vhost: allow per device message handler · 792a4f2e

由 Jason Wang 提交于 3月 26, 2020

This patch allow device to register its own message handler during
vhost_dev_init(). vDPA device will use it to implement its own DMA
mapping logic.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200326140125.19794-3-jasowang@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

792a4f2e

05 12月, 2019 1 次提交

vhost, kcov: collect coverage from vhost_worker · 8f6a7f96

由 Andrey Konovalov 提交于 12月 04, 2019

Add kcov_remote_start()/kcov_remote_stop() annotations to the
vhost_worker() function, which is responsible for processing vhost
works.

Since vhost_worker() threads are spawned per vhost device instance the
common kcov handle is used for kcov_remote_start()/stop() annotations
(see Documentation/dev-tools/kcov.rst for details).  As the result kcov
can now be used to collect coverage from vhost worker threads.

Link: http://lkml.kernel.org/r/e49d5d154e5da6c9ada521d2b7ce10a49ce9f98b.1572366574.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander Potapenko <glider@google.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8f6a7f96

15 9月, 2019 1 次提交

Revert "vhost: block speculation of translated descriptors" · 0d4a3f2a

由 Michael S. Tsirkin 提交于 9月 14, 2019

This reverts commit a89db445.

I was hasty to include this patch, and it breaks the build on 32 bit.
Defence in depth is good but let's do it properly.

Cc: stable@vger.kernel.org
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

0d4a3f2a

12 9月, 2019 2 次提交

vhost: make sure log_num < in_num · 060423bf

由 yongduan 提交于 9月 11, 2019

The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: NLidong Chen <lidongchen@tencent.com>
Signed-off-by: Nruippan <ruippan@tencent.com>
Signed-off-by: Nyongduan <yongduan@tencent.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NTyler Hicks <tyhicks@canonical.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

060423bf

vhost: block speculation of translated descriptors · a89db445

由 Michael S. Tsirkin 提交于 9月 08, 2019

iovec addresses coming from vhost are assumed to be
pre-validated, but in fact can be speculated to a value
out of range.

Userspace address are later validated with array_index_nospec so we can
be sure kernel info does not leak through these addresses, but vhost
must also not leak userspace info outside the allowed memory table to
guests.

Following the defence in depth principle, make sure
the address is not validated out of node range.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Cc: stable@vger.kernel.org
Acked-by: NJason Wang <jasowang@redhat.com>
Tested-by: NJason Wang <jasowang@redhat.com>

a89db445

04 9月, 2019 2 次提交

Revert "vhost: access vq metadata through kernel virtual address" · 3d2c7d37

由 Michael S. Tsirkin 提交于 8月 10, 2019

This reverts commit 7f466032 ("vhost: access vq metadata through
kernel virtual address").  The commit caused a bunch of issues, and
while commit 73f628ec ("vhost: disable metadata prefetch
optimization") disabled the optimization it's not nice to keep lots of
dead code around.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

3d2c7d37

vhost: Remove unnecessary variable · 896fc242

由 Yunsheng Lin 提交于 8月 20, 2019

It is unnecessary to use ret variable to return the error
code, just return the error code directly.
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

896fc242

19 6月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 482 · 7a338472

由 Thomas Gleixner 提交于 6月 04, 2019

Based on 1 normalized pattern(s):

  this work is licensed under the terms of the gnu gpl version 2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 48 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NEnrico Weigelt <info@metux.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081204.624030236@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7a338472

09 6月, 2019 1 次提交

docs: fix broken documentation links · cb1aaebe

由 Mauro Carvalho Chehab 提交于 6月 07, 2019

Mostly due to x86 and acpi conversion, several documentation
links are still pointing to the old file. Fix them.
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Reviewed-by: NWolfram Sang <wsa@the-dreams.de>
Reviewed-by: NSven Van Asbroeck <TheSven73@gmail.com>
Reviewed-by: NBhupesh Sharma <bhsharma@redhat.com>
Acked-by: NMark Brown <broonie@kernel.org>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

cb1aaebe

06 6月, 2019 6 次提交

vhost: access vq metadata through kernel virtual address · 7f466032

由 Jason Wang 提交于 5月 24, 2019

It was noticed that the copy_to/from_user() friends that was used to
access virtqueue metdata tends to be very expensive for dataplane
implementation like vhost since it involves lots of software checks,
speculation barriers, hardware feature toggling (e.g SMAP). The
extra cost will be more obvious when transferring small packets since
the time spent on metadata accessing become more significant.

This patch tries to eliminate those overheads by accessing them
through direct mapping of those pages. Invalidation callbacks is
implemented for co-operation with general VM management (swap, KSM,
THP or NUMA balancing). We will try to get the direct mapping of vq
metadata before each round of packet processing if it doesn't
exist. If we fail, we will simplely fallback to copy_to/from_user()
friends.

This invalidation and direct mapping access are synchronized through
spinlock and RCU. All matedata accessing through direct map is
protected by RCU, and the setup or invalidation are done under
spinlock.

This method might does not work for high mem page which requires
temporary mapping so we just fallback to normal
copy_to/from_user() and may not for arch that has virtual tagged cache
since extra cache flushing is needed to eliminate the alias. This will
result complex logic and bad performance. For those archs, this patch
simply go for copy_to/from_user() friends. This is done by ruling out
kernel mapping codes through ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE.

Note that this is only done when device IOTLB is not enabled. We
could use similar method to optimize IOTLB in the future.

Tests shows at most about 23% improvement on TX PPS when using
virtio-user + vhost_net + xdp1 + TAP on 2.6GHz Broadwell:

        SMAP on | SMAP off
Before: 5.2Mpps | 7.1Mpps
After:  6.4Mpps | 8.2Mpps

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David Miller <davem@davemloft.net>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-parisc@vger.kernel.org
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

7f466032

vhost: factor out setting vring addr and num · feebcaea

由 Jason Wang 提交于 5月 24, 2019

Factoring vring address and num setting which needs special care for
accelerating vq metadata accessing.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

feebcaea

vhost: introduce helpers to get the size of metadata area · 4942e825

由 Jason Wang 提交于 5月 24, 2019

To avoid code duplication since it will be used by kernel VA prefetching.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

4942e825

vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch() · 9b5e830b

由 Jason Wang 提交于 5月 24, 2019

Rename the function to be more accurate since it actually tries to
prefetch vq metadata address in IOTLB. And this will be used by
following patch to prefetch metadata virtual addresses.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

9b5e830b

vhost: fine grain userspace memory accessors · 7b5d753e

由 Jason Wang 提交于 5月 24, 2019

This is used to hide the metadata address from virtqueue helpers. This
will allow to implement a vmap based fast accessing to metadata.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

7b5d753e

vhost: generalize adding used elem · 1ab5d138

由 Jason Wang 提交于 5月 24, 2019

Use one generic vhost_copy_to_user() instead of two dedicated
accessor. This will simplify the conversion to fine grain
accessors. About 2% improvement of PPS were seen during vitio-user
txonly test.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

1ab5d138

27 5月, 2019 1 次提交

vhost: introduce vhost_exceeds_weight() · e82b9b07

由 Jason Wang 提交于 5月 17, 2019

We used to have vhost_exceeds_weight() for vhost-net to:

- prevent vhost kthread from hogging the cpu
- balance the time spent between TX and RX

This function could be useful for vsock and scsi as well. So move it
to vhost.c. Device must specify a weight which counts the number of
requests, or it can also specific a byte_weight which counts the
number of bytes that has been processed.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

e82b9b07

15 5月, 2019 1 次提交

mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b

由 Ira Weiny 提交于 5月 13, 2019

To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73b0140b

11 4月, 2019 1 次提交

vhost: reject zero size iova range · 813dbeb6

由 Jason Wang 提交于 4月 09, 2019

We used to accept zero size iova range which will lead a infinite loop
in translate_desc(). Fixing this by failing the request in this case.

Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

813dbeb6

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功