- 30 4月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
The parameter to ZERO_PAGE() was wrong, but since all architectures except for MIPS and s390 ignore it, it wasn't noticed until 0-day reported the build error. Fixes: 67f269b3 ("RDMA/ucontext: Fix regression with disassociate") Cc: stable@vger.kernel.org Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Leon Romanovsky <leonro@mellanox.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 4月, 2019 1 次提交
-
-
由 Jason Gunthorpe 提交于
When this code was consolidated the intention was that the VMA would become backed by anonymous zero pages after the zap_vma_pte - however this very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED bits to transform the VMA into an anonymous VMA. Since the vm_ops was removed this broke. Now userspace gets a SIGBUS if it touches the vma after disassociation. Instead of converting the VMA to anonymous provide a fault handler that puts a zero'd page into the VMA when user-space touches it after disassociation. Cc: stable@vger.kernel.org Suggested-by: NAndrea Arcangeli <aarcange@redhat.com> Fixes: 5f9794dc ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 20 4月, 2019 1 次提交
-
-
由 Andrea Arcangeli 提交于
The core dumping code has always run without holding the mmap_sem for writing, despite that is the only way to ensure that the entire vma layout will not change from under it. Only using some signal serialization on the processes belonging to the mm is not nearly enough. This was pointed out earlier. For example in Hugh's post from Jul 2017: https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils "Not strictly relevant here, but a related note: I was very surprised to discover, only quite recently, how handle_mm_fault() may be called without down_read(mmap_sem) - when core dumping. That seems a misguided optimization to me, which would also be nice to correct" In particular because the growsdown and growsup can move the vm_start/vm_end the various loops the core dump does around the vma will not be consistent if page faults can happen concurrently. Pretty much all users calling mmget_not_zero()/get_task_mm() and then taking the mmap_sem had the potential to introduce unexpected side effects in the core dumping code. Adding mmap_sem for writing around the ->core_dump invocation is a viable long term fix, but it requires removing all copy user and page faults and to replace them with get_dump_page() for all binary formats which is not suitable as a short term fix. For the time being this solution manually covers the places that can confuse the core dump either by altering the vma layout or the vma flags while it runs. Once ->core_dump runs under mmap_sem for writing the function mmget_still_valid() can be dropped. Allowing mmap_sem protected sections to run in parallel with the coredump provides some minor parallelism advantage to the swapoff code (which seems to be safe enough by never mangling any vma field and can keep doing swapins in parallel to the core dumping) and to some other corner case. In order to facilitate the backporting I added "Fixes: 86039bd3" however the side effect of this same race condition in /proc/pid/mem should be reproducible since before 2.6.12-rc2 so I couldn't add any other "Fixes:" because there's no hash beyond the git genesis commit. Because find_extend_vma() is the only location outside of the process context that could modify the "mm" structures under mmap_sem for reading, by adding the mmget_still_valid() check to it, all other cases that take the mmap_sem for reading don't need the new check after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault context also doesn't need the new check, because all tasks under core dumping are frozen. Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com Fixes: 86039bd3 ("userfaultfd: add new syscall to provide memory externalization") Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: NJann Horn <jannh@google.com> Suggested-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPeter Xu <peterx@redhat.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NJann Horn <jannh@google.com> Acked-by: NJason Gunthorpe <jgg@mellanox.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 2月, 2019 1 次提交
-
-
由 Shamir Rabinovitch 提交于
Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows as soon as the flow has ib_uobject. In addition, remove rdma_get_ucontext helper function that is only used by ib_umem_get. Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 31 1月, 2019 1 次提交
-
-
由 Gal Pressman 提交于
Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: NGal Pressman <galpress@amazon.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 30 1月, 2019 1 次提交
-
-
由 Yishai Hadas 提交于
The vma->vm_mm can become impossible to get before rdma_umap_close() is called, in this case we must not try to get an mm that is already undergoing process exit. In this case there is no need to wait for anything as the VMA will be destroyed by another thread soon and is already effectively 'unreachable' by userspace. BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 800000012bc50067 P4D 800000012bc50067 PUD 129db5067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 2050 Comm: bash Tainted: G W OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__rb_erase_color+0xb9/0x280 Code: 84 17 01 00 00 48 3b 68 10 0f 84 15 01 00 00 48 89 58 08 48 89 de 48 89 ef 4c 89 e3 e8 90 84 22 00 e9 60 ff ff ff 48 8b 5d 10 <f6> 03 01 0f 84 9c 00 00 00 48 8b 43 10 48 85 c0 74 09 f6 00 01 0f RSP: 0018:ffffbecfc090bab8 EFLAGS: 00010246 RAX: ffff97616346cf30 RBX: 0000000000000000 RCX: 0000000000000101 RDX: 0000000000000000 RSI: ffff97623b6ca828 RDI: ffff97621ef10828 RBP: ffff97621ef10828 R08: ffff97621ef10828 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff97623b6ca838 R13: ffffffffbb3fef50 R14: ffff97623b6ca828 R15: 0000000000000000 FS: 00007f7a5c31d740(0000) GS:ffff97623bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011255a000 CR4: 00000000000006e0 Call Trace: unlink_file_vma+0x3b/0x50 free_pgtables+0xa1/0x110 exit_mmap+0xca/0x1a0 ? mlx5_ib_dealloc_pd+0x28/0x30 [mlx5_ib] mmput+0x54/0x140 uverbs_user_mmap_disassociate+0xcc/0x160 [ib_uverbs] uverbs_destroy_ufile_hw+0xf7/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Cc: <stable@vger.kernel.org> # 4.19 Fixes: 5f9794dc ("RDMA/ucontext: Add a core API for mmaping driver IO memory") Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 26 1月, 2019 1 次提交
-
-
由 Yishai Hadas 提交于
The async_file might be freed before the disassociation has been ended, causing qp shutdown to use after free on it. Since uverbs_destroy_ufile_hw is not a fence, it returns if a disassociation is ongoing in another thread. It has to be written this way to avoid deadlock. However this means that the ufile FD close cannot destroy anything that may still be used by an active kref, such as the the async_file. To fix that move the kref_put() to be in ib_uverbs_release_file(). BUG: unable to handle kernel paging request at ffffffffba682787 PGD bc80e067 P4D bc80e067 PUD bc80f063 PMD 1313df163 PTE 80000000bc682061 Oops: 0003 [#1] SMP PTI CPU: 1 PID: 32410 Comm: bash Tainted: G OE 4.20.0-rc6+ #3 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_lock_slowpath+0x1b3/0x2a0 Code: 98 83 e2 60 49 89 df 48 8b 04 c5 80 18 72 ba 48 8d ba 80 32 02 00 ba 00 80 00 00 4c 8d 65 14 41 bd 01 00 00 00 48 01 c7 85 d2 <48> 89 2f 48 89 fb 74 14 8b 45 08 85 c0 75 42 84 d2 74 6b f3 90 83 RSP: 0018:ffffc1bbc064fb58 EFLAGS: 00010006 RAX: ffffffffba65f4e7 RBX: ffff9f209c656c00 RCX: 0000000000000001 RDX: 0000000000008000 RSI: 0000000000000000 RDI: ffffffffba682787 RBP: ffff9f217bb23280 R08: 0000000000000001 R09: 0000000000000000 R10: ffff9f209d2c7800 R11: ffffffffffffffe8 R12: ffff9f217bb23294 R13: 0000000000000001 R14: 0000000000000000 R15: ffff9f209c656c00 FS: 00007fac55aad740(0000) GS:ffff9f217bb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffba682787 CR3: 000000012f8e0000 CR4: 00000000000006e0 Call Trace: _raw_spin_lock_irq+0x27/0x30 ib_uverbs_release_uevent+0x1e/0xa0 [ib_uverbs] uverbs_free_qp+0x7e/0x90 [ib_uverbs] destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs] uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0x73/0x90 [ib_uverbs] uverbs_destroy_ufile_hw+0x5d/0x120 [ib_uverbs] ib_uverbs_remove_one+0xea/0x240 [ib_uverbs] ib_unregister_device+0xfb/0x200 [ib_core] mlx5_ib_remove+0x51/0xe0 [mlx5_ib] mlx5_remove_device+0xc1/0xd0 [mlx5_core] mlx5_unregister_device+0x3d/0xb0 [mlx5_core] remove_one+0x2a/0x90 [mlx5_core] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x16d/0x240 unbind_store+0xb2/0x100 kernfs_fop_write+0x102/0x180 __vfs_write+0x36/0x1a0 ? __alloc_fd+0xa9/0x170 ? set_close_on_exec+0x49/0x70 vfs_write+0xad/0x1a0 ksys_write+0x52/0xc0 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fac551aac60 Cc: <stable@vger.kernel.org> # 4.2 Fixes: 036b1063 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 15 1月, 2019 1 次提交
-
-
由 Jason Gunthorpe 提交于
When the ioctl interface for the write commands was introduced it did not mark the core response with UVERBS_ATTR_F_VALID_OUTPUT. This causes rdma-core in userspace to not mark the buffers as written for valgrind. Along the same lines it turns out we have always missed marking the driver data. Fixing both of these makes valgrind work properly with rdma-core and ioctl. Fixes: 4785860e ("RDMA/uverbs: Implement an ioctl that can call write and write_ex handlers") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 11 1月, 2019 1 次提交
-
-
由 Jason Gunthorpe 提交于
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>
-
- 04 1月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2018 1 次提交
-
-
由 Kamal Heib 提交于
Make all the required change to start use the ib_device_ops structure. Signed-off-by: NKamal Heib <kamalheib1@gmail.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 04 12月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
All of the old arguments can be derived from the uverbs_attr_bundle structure, so get rid of the redundant arguments. Most of the prior work has been removing users of the arguments to allow this to be a simple patch. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
This creates a consistent way to access the two core buffers across write and write_ex handlers. Remove the open coded ucore conversion in the write/ex compatibility handlers. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 27 11月, 2018 5 次提交
-
-
由 Jason Gunthorpe 提交于
Now that we have metadata describing the command format the core code can directly compute the udata pointers and all the really ugly ib_uverbs_init_udata() calls can be removed from the handlers. This means all the write() handlers are no longer sensitive to the layout of the command buffer. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The core code needs to compute the udata so we may as well pass it in the uverbs_attr_bundle instead of on the stack. This converts the simple case of write_ex() which already has a core calculation. Also change the write() path to use the attrs for ib_uverbs_init_udata() instead of on the stack. This lets the write to write_ex compatibility path continue to follow the lead of the _ex path. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The size meta-data in the prior patch describes the smallest acceptable buffer for the write() interface. Globally check this in the core code. This is necessary in the case of write() methods that have a driver udata to prevent computing a negative udata buffer length. The return code of -ENOSPC is chosen here as some of the handlers already use this code, however many other handler use EINVAL. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Currently they return the command length, while all other handlers return 0. This makes the write path closer to the write_ex and ioctl path. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 23 11月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
This organizes the write commands into objects and links them to the uverbs_api data structure. The command path is reworked to use uapi instead of its internal structures. The command mask is moved from a runtime check to a registration time check in the uapi. Since the write interface does not have the object ID as part of the command, the radix bins are converted into linear lists to support the lookup. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
We have many cases where parts of the uapi are not supported in a driver, needs a certain protocol, or whatever. It is best to reflect this directly into the struct uverbs_api when it is built so that everything is simply blocked off, and future introspection can report a proper supported list. This is done by adding some additional helpers to the definition list language that disable objects based on a 'supported' call back, and a helper that disables based on a NULL struct ib_device function pointer. Disablement is global. For instance, if a driver disables an object then everything connected to that object is removed, including core methods. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The 'tree' data structure is very hard to build at compile time, and this makes it very limited. The new radix tree based compiler can handle a more complex input language that does not require the compiler to perfectly group everything into a neat tree structure. Instead use a simple list to describe to input, where the list elements can be of various different 'opcodes' instructing the radix compiler what to do. Start out with opcodes chaining to other definition lists and chaining to the existing 'tree' definition. Replace the very top level of the 'object tree' with this list type and get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 17 10月, 2018 1 次提交
-
-
由 Leon Romanovsky 提交于
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 27 9月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
These return the same thing but dev_name is a more conventional use of the kernel API. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
-
- 21 9月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.comSigned-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 20 9月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
The error path has several mistakes - cdev_del should not be called if cdev_device_add fails - We must call put_device on all the goto exit paths as that is what frees the uapi, SRCU and the struct itself. While we are here consolidate all the uvdev_dev init that cannot fail at the top. Fixes: c5c4d92e ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NParav Pandit <parav@mellanox.com>
-
由 Jason Gunthorpe 提交于
This does nothing but indicate if the uverbs_file is in the device's list, use list_del_init instead. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 13 9月, 2018 1 次提交
-
-
由 Steve Wise 提交于
Currently a uverbs completion event queue is flushed of events in ib_uverbs_comp_event_close() with the queue spinlock held and then released. Yet setting ev_queue->is_closed is not set until later in uverbs_hot_unplug_completion_event_file(). In between the time ib_uverbs_comp_event_close() releases the lock and uverbs_hot_unplug_completion_event_file() acquires the lock, a completion event can arrive and be inserted into the event queue by ib_uverbs_comp_handler(). This can cause a "double add" list_add warning or crash depending on the kernel configuration, or a memory leak because the event is never dequeued since the queue is already closed down. So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close(). Cc: stable@vger.kernel.org Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked objects schema") Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 9月, 2018 3 次提交
-
-
由 Parav Pandit 提交于
Instead of explicitly adding device attribute files and handling such error conditions, depend on device core layer to create device attributes files based group pointer NULL terminated array. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Instead of doing two step process to add char device and create underlying device, use cdev_device_add() which does both. Currently a kobject per uverbs_device is created to keep reference to its holding ib_uverbs_device in addition to its underlying device 'dev'. Instead just use uverbs_device->dev to keep a reference to. With this change there is single reference tracker for ib_uverbs_device structure. This allows for subsequent patch to registers group attribute as well using single API cdev_device_add(). Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap. Fixes: 7d96c9b1 ("IB/uverbs: Have the core code create the uverbs_root_spec") Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 13 8月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
Everything now uses the uverbs_uapi data structure. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 11 8月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
-
由 Jason Gunthorpe 提交于
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 02 8月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that all the callbacks are safe to run concurrently with disassociation this test can be eliminated. The ufile core infrastructure becomes entirely self contained and is not sensitive to disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This is a step to get rid of the global check for disassociation. In this model, the ib_dev is not proven to be valid by the core code and cannot be provided to the method. Instead, every method decides if it is able to run after disassociation and obtains the ib_dev using one of three different approaches: - Call srcu_dereference on the udevice's ib_dev. As before, this means the method cannot be called after disassociation begins. (eg alloc ucontext) - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext() - Retrieve the ib_dev from the uobject->object after checking under SRCU if disassociation has started (eg uobj_get) Largely, the code is all ready for this, the main work is to provide a ib_dev after calling uobj_alloc(). The few other places simply use ib_uverbs_get_ucontext() to get the ib_dev. This flexibility will let the next patches allow destroy to operate after disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 26 7月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
We have a parallel unlocked reader and writer with ib_uverbs_get_context() vs everything else, and nothing guarantees this works properly. Audit and fix all of the places that access ucontext to use one of the following locking schemes: - Call ib_uverbs_get_ucontext() under SRCU and check for failure - Access the ucontext through an struct ib_uobject context member while holding a READ or WRITE lock on the uobject. This value cannot be NULL and has no race. - Hold the ucontext_lock and check for ufile->ucontext !NULL This also re-implements ib_uverbs_get_ucontext() in a way that is safe against concurrent ib_uverbs_get_context() and disassociation. As a side effect, every access to ucontext in the commands is via ib_uverbs_get_context() with an error check, or via the uobject, so there is no longer any need for the core code to check ucontext on every command call. These checks are also removed. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
The locking here has always been a bit crazy and spread out, upon some careful analysis we can simplify things. Create a single function uverbs_destroy_ufile_hw() that internally handles all locking. This pulls together pieces of this process that were sprinkled all over the places into one place, and covers them with one lock. This eliminates several duplicate/confusing locks and makes the control flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely simple. Unfortunately we have to keep an extra mutex, ucontext_lock. This lock is logically part of the rwsem and provides the 'down write, fail if write locked, wait if read locked' semantic we require. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call to the type destroy function (aka 'hw' destroy). The main purpose of this lock is to prevent normal add and destroy from running concurrently with uverbs_cleanup_ufile() Since the uobjects list is always manipulated under the 'hw_destroy_rwsem' we can eliminate the uobjects_lock in the cleanup function. This allows converting that lock to a very simple spinlock with a narrow critical section. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 10 7月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
Now that ib_uobject has a ib_uverbs_file we don't need this extra one in ib_ucq_object. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-