- 28 11月, 2017 3 次提交
-
-
由 Al Viro 提交于
its ->mask is POLL... bitmap Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
__poll_t is also used as wait key in some waitqueues. Verify that wait_..._poll() gets __poll_t as key and provide a helper for wakeup functions to get back to that __poll_t value. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 11月, 2017 1 次提交
-
-
由 Michael S. Tsirkin 提交于
During access_ok checks, addr increases as we iterate over the data structure, thus addr + len - 1 will point beyond the end of region we are translating. Harmless since we then verify that the region covers addr, but let's not waste cpu cycles. Reported-by: NKoichiro Den <den@klaipeden.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Acked-by: NKoichiro Den <den@klaipeden.com>
-
- 09 9月, 2017 1 次提交
-
-
由 Davidlohr Bueso 提交于
Allow interval trees to quickly check for overlaps to avoid unnecesary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using a 'rb_root_cached' such that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option as they can do funky things with insertions -- for example, vma_interval_tree_insert_after(). [jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()] Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Acked-by: NChristian König <christian.koenig@amd.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDoug Ledford <dledford@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Cc: David Airlie <airlied@linux.ie> Cc: Jason Wang <jasowang@redhat.com> Cc: Christian Benvenuti <benve@cisco.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 7月, 2017 1 次提交
-
-
由 Jason Wang 提交于
This reverts commit 809ecb9b. Since it was reported to break vhost_net. We want to cache used event and use it to check for notification. The assumption was that guest won't move the event idx back, but this could happen in fact when 16 bit index wraps around after 64K entries. Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 6月, 2017 1 次提交
-
-
由 Ingo Molnar 提交于
Rename: wait_queue_t => wait_queue_entry_t 'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 5月, 2017 1 次提交
-
-
由 Michal Hocko 提交于
vhost code uses __GFP_REPEAT when allocating vhost_virtqueue resp. vhost_vsock because it would really like to prefer kmalloc to the vmalloc fallback - see 23cc5a99 ("vhost-net: extend device allocation to vmalloc") for more context. Michael Tsirkin has also noted: "__GFP_REPEAT overhead is during allocation time. Using vmalloc means all accesses are slowed down. Allocation is not on data path, accesses are." The similar applies to other vhost_kvzalloc users. Let's teach kvmalloc_node to handle __GFP_REPEAT properly. There are two things to be careful about. First we should prevent from the OOM killer and so have to involve __GFP_NORETRY by default and secondly override __GFP_REPEAT for !costly order requests as the __GFP_REPEAT is ignored for !costly orders. Supporting __GFP_REPEAT like semantic for !costly request is possible it would require changes in the page allocator. This is out of scope of this patch. This patch shouldn't introduce any functional change. Link: http://lkml.kernel.org/r/20170306103032.2540-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 3月, 2017 3 次提交
-
-
由 Ingo Molnar 提交于
sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/mm.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. The APIs that are going to be moved first are: mm_alloc() __mmdrop() mmdrop() mmdrop_async_fn() mmdrop_async() mmget_not_zero() mmput() mmput_async() get_task_mm() mm_access() mm_release() Include the new header in the files that are going to need it. Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jason Wang 提交于
When device IOTLB is enabled, all address translations were stored in interval tree. O(lgN) searching time could be slow for virtqueue metadata (avail, used and descriptors) since they were accessed much often than other addresses. So this patch introduces an O(1) array which points to the interval tree nodes that store the translations of vq metadata. Those array were update during vq IOTLB prefetching and were reset during each invalidation and tlb update. Each time we want to access vq metadata, this small array were queried before interval tree. This would be sufficient for static mappings but not dynamic mappings, we could do optimizations on top. Test were done with l2fwd in guest (2M hugepage): noiommu | before | after tx 1.32Mpps | 1.06Mpps(82%) | 1.30Mpps(98%) rx 2.33Mpps | 1.46Mpps(63%) | 2.29Mpps(98%) We can almost reach the same performance as noiommu mode. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 28 2月, 2017 1 次提交
-
-
由 Jason Wang 提交于
If last avail idx is not equal to cached avail idx, we're sure there's still available buffers in the virtqueue so there's no need to re-read avail idx. So let's skip this to avoid unnecessary userspace memory access and memory barrier. Pktgen test show about 3% improvement on rx pps. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 04 2月, 2017 1 次提交
-
-
由 Halil Pasic 提交于
Currently, under certain circumstances vhost_init_is_le does just a part of the initialization job, and depends on vhost_reset_is_le being called too. For this reason vhost_vq_init_access used to call vhost_reset_is_le when vq->private_data is NULL. This is not only counter intuitive, but also real a problem because it breaks vhost_net. The bug was introduced to vhost_net with commit 2751c988 ("vhost: cross-endian support for legacy devices"). The symptom is corruption of the vq's used.idx field (virtio) after VHOST_NET_SET_BACKEND was issued as a part of the vhost shutdown on a vq with pending descriptors. Let us make sure the outcome of vhost_init_is_le never depend on the state it is actually supposed to initialize, and fix virtio_net by removing the reset from vhost_vq_init_access. With the above, there is no reason for vhost_reset_is_le to do just half of the job. Let us make vhost_reset_is_le reinitialize is_le. Signed-off-by: NHalil Pasic <pasic@linux.vnet.ibm.com> Reported-by: NMichael A. Tebolt <miket@us.ibm.com> Reported-by: NDr. David Alan Gilbert <dgilbert@redhat.com> Fixes: commit 2751c988 ("vhost: cross-endian support for legacy devices") Cc: <stable@vger.kernel.org> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NGreg Kurz <groug@kaod.org> Tested-by: NMichael A. Tebolt <miket@us.ibm.com>
-
- 19 1月, 2017 1 次提交
-
-
由 Jason Wang 提交于
This patch tries to do several tweaks on vhost_vq_avail_empty() for a better performance: - check cached avail index first which could avoid userspace memory access. - using unlikely() for the failure of userspace access - check vq->last_avail_idx instead of cached avail index as the last step. This patch is need for batching supports which needs to peek whether or not there's still available buffers in the ring. Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 12月, 2016 1 次提交
-
-
由 Jason Wang 提交于
When event index was enabled, we need to fetch used event from userspace memory each time. This userspace fetch (with memory barrier) could be saved sometime when 1) caching used event and 2) if used event is ahead of new and old to new updating does not cross it, we're sure there's no need to notify guest. This will be useful for heavy tx load e.g guest pktgen test with Linux driver shows ~3.5% improvement. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 15 12月, 2016 2 次提交
-
-
由 Michael S. Tsirkin 提交于
Several vhost functions were missing __user annotations on pointers, causing sparse warnings. Fix this up. sparse also warns about vhost_process_iotlb_msg which is local and should be static. Fix that up as well. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
vhost_umem_interval_tree is only used locally within vhost.c, mark it static. As some functions generated go unused, this triggers warnings unless we also mark it inline. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 09 12月, 2016 1 次提交
-
-
由 Peng Tao 提交于
test_and_set_bit() already implies a memory barrier. Signed-off-by: NPeng Tao <bergwolf@gmail.com> Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2016 1 次提交
-
-
由 Al Viro 提交于
copy_from_iter_full(), copy_from_iter_full_nocache() and csum_and_copy_from_iter_full() - counterparts of copy_from_iter() et.al., advancing iterator only in case of successful full copy and returning whether it had been successful or not. Convert some obvious users. *NOTE* - do not blindly assume that something is a good candidate for those unless you are sure that not advancing iov_iter in failure case is the right thing in this case. Anything that does short read/short write kind of stuff (or is in a loop, etc.) is unlikely to be a good one. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 8月, 2016 6 次提交
-
-
由 Michael S. Tsirkin 提交于
Detect and fail early if long wrap around is triggered. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
This patch tries to implement an device IOTLB for vhost. This could be used with userspace(qemu) implementation of DMA remapping to emulate an IOMMU for the guest. The idea is simple, cache the translation in a software device IOTLB (which is implemented as an interval tree) in vhost and use vhost_net file descriptor for reporting IOTLB miss and IOTLB update/invalidation. When vhost meets an IOTLB miss, the fault address, size and access can be read from the file. After userspace finishes the translation, it writes the translated address to the vhost_net file to update the device IOTLB. When device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM all vq addresses set by ioctl are treated as iova instead of virtual address and the accessing can only be done through IOTLB instead of direct userspace memory access. Before each round or vq processing, all vq metadata is prefetched in device IOTLB to make sure no translation fault happens during vq processing. In most cases, virtqueues are contiguous even in virtual address space. The IOTLB translation for virtqueue itself may make it a little slower. We might add fast path cache on top of this patch. Signed-off-by: NJason Wang <jasowang@redhat.com> [mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM ] [mst: fix build warnings ] Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> [ weiyj.lk: missing unlock on error ] Signed-off-by: NWei Yongjun <weiyj.lk@gmail.com>
-
由 Jason Wang 提交于
Current pre-sorted memory region array has some limitations for future device IOTLB conversion: 1) need extra work for adding and removing a single region, and it's expected to be slow because of sorting or memory re-allocation. 2) need extra work of removing a large range which may intersect several regions with different size. 3) need trick for a replacement policy like LRU To overcome the above shortcomings, this patch convert it to interval tree which can easily address the above issue with almost no extra work. The patch could be used for: - Extend the current API and only let the userspace to send diffs of memory table. - Simplify Device IOTLB implementation. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
This patch introduces vhost memory accessors which were just wrappers for userspace address access helpers. This is a requirement for vhost device iotlb implementation which will add iotlb translations in those accessors. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
We use spinlock to synchronize the work list now which may cause unnecessary contentions. So this patch switch to use llist to remove this contention. Pktgen tests shows about 5% improvement: Before: ~1300000 pps After: ~1370000 pps Signed-off-by: NJason Wang <jasowang@redhat.com> Reviewed-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
We used to implement the work flushing through tracking queued seq, done seq, and the number of flushing. This patch simplify this by just implement work flushing through another kind of vhost work with completion. This will be used by lockless enqueuing patch. Signed-off-by: NJason Wang <jasowang@redhat.com> Reviewed-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 11 3月, 2016 3 次提交
-
-
由 Jason Wang 提交于
This patch tries to poll for new added tx buffer or socket receive queue for a while at the end of tx/rx processing. The maximum time spent on polling were specified through a new kind of vring ioctl. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
This patch introduces a helper which will return true if we're sure that the available ring is empty for a specific vq. When we're not sure, e.g vq access failure, return false instead. This could be used for busy polling code to exit the busy loop. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Jason Wang 提交于
This path introduces a helper which can give a hint for whether or not there's a work queued in the work list. This could be used for busy polling code to exit the busy loop. Signed-off-by: NJason Wang <jasowang@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 02 3月, 2016 3 次提交
-
-
由 Greg Kurz 提交于
Looking at how callers use this, maybe we should just rename init_used to vhost_vq_init_access. The _used suffix was a hint that we access the vq used ring. But maybe what callers care about is that it must be called after access_ok. Also, this function manipulates the vq->is_le field which isn't related to the vq used ring. This patch simply renames vhost_init_used() to vhost_vq_init_access() as suggested by Michael. No behaviour change. Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Greg Kurz 提交于
The default use case for vhost is when the host and the vring have the same endianness (default native endianness). But there are cases where they differ and vhost should byteswap when accessing the vring. The first case is when the host is big endian and the vring belongs to a virtio 1.0 device, which is always little endian. This is covered by the vq->is_le field. This field is initialized when userspace calls the VHOST_SET_FEATURES ioctl. It is reset when the device stops. We already have a vhost_init_is_le() helper, but the reset operation is opencoded as follows: vq->is_le = virtio_legacy_is_little_endian(); It isn't clear that we are resetting vq->is_le here. This patch moves the code to a helper with a more explicit name. The other case where we may have to byteswap is when the architecture can switch endianness at runtime (bi-endian). If endianness differs in the host and in the guest, then legacy devices need to be used in cross-endian mode. This mode is available with CONFIG_VHOST_CROSS_ENDIAN_LEGACY=y, which introduces a vq->user_be field. Userspace may enable cross-endian mode by calling the SET_VRING_ENDIAN ioctl before the device is started. The cross-endian mode is disabled when the device is stopped. The current names of the helpers that manipulate vq->user_be are unclear. This patch renames those helpers to clearly show that this is cross-endian stuff and with explicit enable/disable semantics. No behaviour change. Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Greg Kurz 提交于
We don't want side effects. If something fails, we rollback vq->is_le to its previous value. Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 07 12月, 2015 2 次提交
-
-
由 Michael S. Tsirkin 提交于
We know vring num is a power of 2, so use & to mask the high bits. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Michael S. Tsirkin 提交于
commit 5d9a07b0 ("vhost: relax used address alignment") fixed the alignment for the used virtual address, but not for the physical address used for logging. That's a mistake: alignment should clearly be the same for virtual and physical addresses, Cc: stable@vger.kernel.org Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 27 7月, 2015 2 次提交
-
-
由 Igor Mammedov 提交于
callers of vhost_kvzalloc() expect the same behaviour on allocation error as from kmalloc/vmalloc i.e. NULL return value. So just return vzmalloc() returned value instead of returning ERR_PTR(-ENOMEM) Fixes: 4de7255f ("vhost: extend memory regions allocation to vmalloc") Spotted-by: NDan Carpenter <dan.carpenter@oracle.com> Suggested-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Marc-André Lureau 提交于
While reviewing vhost log code, I found out that log_file is never set. Note: I haven't tested the change (QEMU doesn't use LOG_FD yet). Cc: stable@vger.kernel.org Signed-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 14 7月, 2015 2 次提交
-
-
由 Igor Mammedov 提交于
it became possible to use a bigger amount of memory slots, which is used by memory hotplug for registering hotplugged memory. However QEMU crashes if it's used with more than ~60 pc-dimm devices and vhost-net enabled since host kernel in module vhost-net refuses to accept more than 64 memory regions. Allow to tweak limit via max_mem_regions module paramemter with default value set to 64 slots. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
由 Igor Mammedov 提交于
with large number of memory regions we could end up with high order allocations and kmalloc could fail if host is under memory pressure. Considering that memory regions array is used on hot path try harder to allocate using kmalloc and if it fails resort to vmalloc. It's still better than just failing vhost_set_memory() and causing guest crash due to it when a new memory hotplugged to guest. I'll still look at QEMU side solution to reduce amount of memory regions it feeds to vhost to make things even better, but it doesn't hurt for kernel to behave smarter and don't crash older QEMU's which could use large amount of memory regions. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 01 7月, 2015 1 次提交
-
-
由 Igor Mammedov 提交于
For default region layouts performance stays the same as linear search i.e. it takes around 210ns average for translate_desc() that inlines find_region(). But it scales better with larger amount of regions, 235ns BS vs 300ns LS with 55 memory regions and it will be about the same values when allowed number of slots is increased to 509 like it has been done in kvm. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
-
- 01 6月, 2015 1 次提交
-
-
由 Greg Kurz 提交于
This patch brings cross-endian support to vhost when used to implement legacy virtio devices. Since it is a relatively rare situation, the feature availability is controlled by a kernel config option (not set by default). The vq->is_le boolean field is added to cache the endianness to be used for ring accesses. It defaults to native endian, as expected by legacy virtio devices. When the ring gets active, we force little endian if the device is modern. When the ring is deactivated, we revert to the native endian default. If cross-endian was compiled in, a vq->user_be boolean field is added so that userspace may request a specific endianness. This field is used to override the default when activating the ring of a legacy device. It has no effect on modern devices. Signed-off-by: NGreg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
-
- 04 2月, 2015 1 次提交
-
-
由 Al Viro 提交于
Cc: Michael S. Tsirkin <mst@redhat.com> Cc: kvm@vger.kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-