1. 28 2月, 2017 1 次提交
  2. 04 2月, 2017 1 次提交
    • H
      vhost: fix initialization for vq->is_le · cda8bba0
      Halil Pasic 提交于
      Currently, under certain circumstances vhost_init_is_le does just a part
      of the initialization job, and depends on vhost_reset_is_le being called
      too. For this reason vhost_vq_init_access used to call vhost_reset_is_le
      when vq->private_data is NULL. This is not only counter intuitive, but
      also real a problem because it breaks vhost_net. The bug was introduced to
      vhost_net with commit 2751c988 ("vhost: cross-endian support for
      legacy devices"). The symptom is corruption of the vq's used.idx field
      (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost
      shutdown on a vq with pending descriptors.
      
      Let us make sure the outcome of vhost_init_is_le never depend on the state
      it is actually supposed to initialize, and fix virtio_net by removing the
      reset from vhost_vq_init_access.
      
      With the above, there is no reason for vhost_reset_is_le to do just half
      of the job. Let us make vhost_reset_is_le reinitialize is_le.
      Signed-off-by: NHalil Pasic <pasic@linux.vnet.ibm.com>
      Reported-by: NMichael A. Tebolt <miket@us.ibm.com>
      Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com>
      Fixes: commit 2751c988 ("vhost: cross-endian support for legacy devices")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NGreg Kurz <groug@kaod.org>
      Tested-by: NMichael A. Tebolt <miket@us.ibm.com>
      cda8bba0
  3. 16 12月, 2016 1 次提交
    • J
      vhost: cache used event for better performance · 809ecb9b
      Jason Wang 提交于
      When event index was enabled, we need to fetch used event from
      userspace memory each time. This userspace fetch (with memory
      barrier) could be saved sometime when 1) caching used event and 2)
      if used event is ahead of new and old to new updating does not cross
      it, we're sure there's no need to notify guest.
      
      This will be useful for heavy tx load e.g guest pktgen test with Linux
      driver shows ~3.5% improvement.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      809ecb9b
  4. 15 12月, 2016 2 次提交
  5. 09 12月, 2016 1 次提交
  6. 06 12月, 2016 1 次提交
    • A
      [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Al Viro 提交于
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et.al., advancing iterator only in case of successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing iov_iter in failure case is the right thing in
      this case.  Anything that does short read/short write kind of
      stuff (or is in a loop, etc.) is unlikely to be a good one.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
  7. 02 8月, 2016 6 次提交
    • M
      vhost: detect 32 bit integer wrap around · ec33d031
      Michael S. Tsirkin 提交于
      Detect and fail early if long wrap around is triggered.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      ec33d031
    • J
      vhost: new device IOTLB API · 6b1e6cc7
      Jason Wang 提交于
      This patch tries to implement an device IOTLB for vhost. This could be
      used with userspace(qemu) implementation of DMA remapping
      to emulate an IOMMU for the guest.
      
      The idea is simple, cache the translation in a software device IOTLB
      (which is implemented as an interval tree) in vhost and use vhost_net
      file descriptor for reporting IOTLB miss and IOTLB
      update/invalidation. When vhost meets an IOTLB miss, the fault
      address, size and access can be read from the file. After userspace
      finishes the translation, it writes the translated address to the
      vhost_net file to update the device IOTLB.
      
      When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq
      addresses set by ioctl are treated as iova instead of virtual address and
      the accessing can only be done through IOTLB instead of direct userspace
      memory access. Before each round or vq processing, all vq metadata is
      prefetched in device IOTLB to make sure no translation fault happens
      during vq processing.
      
      In most cases, virtqueues are contiguous even in virtual address space.
      The IOTLB translation for virtqueue itself may make it a little
      slower. We might add fast path cache on top of this patch.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ]
      [mst: fix build warnings ]
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      [ weiyj.lk: missing unlock on error ]
      Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
      6b1e6cc7
    • J
      vhost: convert pre sorted vhost memory array to interval tree · a9709d68
      Jason Wang 提交于
      Current pre-sorted memory region array has some limitations for future
      device IOTLB conversion:
      
      1) need extra work for adding and removing a single region, and it's
         expected to be slow because of sorting or memory re-allocation.
      2) need extra work of removing a large range which may intersect
         several regions with different size.
      3) need trick for a replacement policy like LRU
      
      To overcome the above shortcomings, this patch convert it to interval
      tree which can easily address the above issue with almost no extra
      work.
      
      The patch could be used for:
      
      - Extend the current API and only let the userspace to send diffs of
        memory table.
      - Simplify Device IOTLB implementation.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      a9709d68
    • J
      vhost: introduce vhost memory accessors · bfe2bc51
      Jason Wang 提交于
      This patch introduces vhost memory accessors which were just wrappers
      for userspace address access helpers. This is a requirement for vhost
      device iotlb implementation which will add iotlb translations in those
      accessors.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      bfe2bc51
    • J
      vhost: lockless enqueuing · 04b96e55
      Jason Wang 提交于
      We use spinlock to synchronize the work list now which may cause
      unnecessary contentions. So this patch switch to use llist to remove
      this contention. Pktgen tests shows about 5% improvement:
      
      Before:
      ~1300000 pps
      After:
      ~1370000 pps
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      04b96e55
    • J
      vhost: simplify work flushing · 7235acdb
      Jason Wang 提交于
      We used to implement the work flushing through tracking queued seq,
      done seq, and the number of flushing. This patch simplify this by just
      implement work flushing through another kind of vhost work with
      completion. This will be used by lockless enqueuing patch.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      7235acdb
  8. 11 3月, 2016 3 次提交
  9. 02 3月, 2016 3 次提交
    • G
      vhost: rename vhost_init_used() · 80f7d030
      Greg Kurz 提交于
      Looking at how callers use this, maybe we should just rename init_used
      to vhost_vq_init_access. The _used suffix was a hint that we
      access the vq used ring. But maybe what callers care about is
      that it must be called after access_ok.
      
      Also, this function manipulates the vq->is_le field which isn't related
      to the vq used ring.
      
      This patch simply renames vhost_init_used() to vhost_vq_init_access() as
      suggested by Michael.
      
      No behaviour change.
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      80f7d030
    • G
      vhost: rename cross-endian helpers · c5072037
      Greg Kurz 提交于
      The default use case for vhost is when the host and the vring have the
      same endianness (default native endianness). But there are cases where
      they differ and vhost should byteswap when accessing the vring.
      
      The first case is when the host is big endian and the vring belongs to
      a virtio 1.0 device, which is always little endian.
      
      This is covered by the vq->is_le field. This field is initialized when
      userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device
      stops.
      
      We already have a vhost_init_is_le() helper, but the reset operation is
      opencoded as follows:
      
      	vq->is_le = virtio_legacy_is_little_endian();
      
      It isn't clear that we are resetting vq->is_le here.
      
      This patch moves the code to a helper with a more explicit name.
      
      The other case where we may have to byteswap is when the architecture can
      switch endianness at runtime (bi-endian). If endianness differs in the host
      and in the guest, then legacy devices need to be used in cross-endian mode.
      
      This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which
      introduces a vq->user_be field. Userspace may enable cross-endian mode
      by calling the SET_VRING_ENDIAN ioctl before the device is started. The
      cross-endian mode is disabled when the device is stopped.
      
      The current names of the helpers that manipulate vq->user_be are unclear.
      
      This patch renames those helpers to clearly show that this is cross-endian
      stuff and with explicit enable/disable semantics.
      
      No behaviour change.
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      c5072037
    • G
      vhost: fix error path in vhost_init_used() · e1f33be9
      Greg Kurz 提交于
      We don't want side effects. If something fails, we rollback vq->is_le to
      its previous value.
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      e1f33be9
  10. 07 12月, 2015 2 次提交
  11. 27 7月, 2015 2 次提交
  12. 14 7月, 2015 2 次提交
    • I
      vhost: add max_mem_regions module parameter · c9ce42f7
      Igor Mammedov 提交于
      it became possible to use a bigger amount of memory
      slots, which is used by memory hotplug for
      registering hotplugged memory.
      However QEMU crashes if it's used with more than ~60
      pc-dimm devices and vhost-net enabled since host kernel
      in module vhost-net refuses to accept more than 64
      memory regions.
      
      Allow to tweak limit via max_mem_regions module paramemter
      with default value set to 64 slots.
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      c9ce42f7
    • I
      vhost: extend memory regions allocation to vmalloc · 4de7255f
      Igor Mammedov 提交于
      with large number of memory regions we could end up with
      high order allocations and kmalloc could fail if
      host is under memory pressure.
      Considering that memory regions array is used on hot path
      try harder to allocate using kmalloc and if it fails resort
      to vmalloc.
      It's still better than just failing vhost_set_memory() and
      causing guest crash due to it when a new memory hotplugged
      to guest.
      
      I'll still look at QEMU side solution to reduce amount of
      memory regions it feeds to vhost to make things even better,
      but it doesn't hurt for kernel to behave smarter and don't
      crash older QEMU's which could use large amount of memory
      regions.
      Signed-off-by: NIgor Mammedov <imammedo@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      4de7255f
  13. 01 7月, 2015 1 次提交
  14. 01 6月, 2015 1 次提交
    • G
      vhost: cross-endian support for legacy devices · 2751c988
      Greg Kurz 提交于
      This patch brings cross-endian support to vhost when used to implement
      legacy virtio devices. Since it is a relatively rare situation, the
      feature availability is controlled by a kernel config option (not set
      by default).
      
      The vq->is_le boolean field is added to cache the endianness to be
      used for ring accesses. It defaults to native endian, as expected
      by legacy virtio devices. When the ring gets active, we force little
      endian if the device is modern. When the ring is deactivated, we
      revert to the native endian default.
      
      If cross-endian was compiled in, a vq->user_be boolean field is added
      so that userspace may request a specific endianness. This field is
      used to override the default when activating the ring of a legacy
      device. It has no effect on modern devices.
      Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      2751c988
  15. 04 2月, 2015 1 次提交
  16. 29 12月, 2014 1 次提交
    • M
      vhost: relax used address alignment · 5d9a07b0
      Michael S. Tsirkin 提交于
      virtio 1.0 only requires used address to be 4 byte aligned,
      vhost required 8 bytes (size of vring_used_elem).
      Fix up vhost to match that.
      
      Additionally, while vhost correctly requires 8 byte
      alignment for log, it's unconnected to used ring:
      it's a consequence that log has u64 entries.
      Tweak code to make that clearer.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      5d9a07b0
  17. 09 12月, 2014 2 次提交
  18. 09 6月, 2014 3 次提交
  19. 07 12月, 2013 1 次提交
  20. 17 9月, 2013 1 次提交
    • Q
      vhost: wake up worker outside spin_lock · ac9fde24
      Qin Chuanyu 提交于
      the wake_up_process func is included by spin_lock/unlock in
      vhost_work_queue,
      but it could be done outside the spin_lock.
      I have test it with kernel 3.0.27 and guest suse11-sp2 using iperf,
      the num as below.
                        original                 modified
      thread_num  tp(Gbps)   vhost(%)  |  tp(Gbps)     vhost(%)
      1           9.59        28.82    |   9.59        27.49
      8           9.61        32.92    |   9.62        26.77
      64          9.58        46.48    |   9.55        38.99
      256         9.6         63.7     |   9.6         52.59
      Signed-off-by: NChuanyu Qin <qinchuanyu@huawei.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      ac9fde24
  21. 04 9月, 2013 1 次提交
  22. 21 8月, 2013 1 次提交
  23. 07 7月, 2013 2 次提交