- 31 1月, 2019 1 次提交
-
-
由 Gal Pressman 提交于
Drivers that do not provide kernel verbs support should not be used by ib kernel clients at all. In case a device does not implement all mandatory verbs for kverbs usage mark it as a non kverbs provider and prevent its usage for all clients except for uverbs. The device is marked as a non kverbs provider using the 'kverbs_provider' flag which should only be set by the core code. The clients can choose whether kverbs are requested for its usage using the 'no_kverbs_req' flag which is currently set for uverbs only. This patch allows drivers to remove mandatory verbs stubs and simply set the callbacks to NULL. The IB device will be registered as a non-kverbs provider. Note that verbs that are required for the device registration process must be implemented. Signed-off-by: NGal Pressman <galpress@amazon.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 11 1月, 2019 1 次提交
-
-
由 Jason Gunthorpe 提交于
ib_umem_get() can only be called in a method callback, which always has a udata parameter. This allows ib_umem_get() to derive the ucontext pointer directly from the udata without requiring the drivers to find it in some way or another. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>
-
- 04 1月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2018 1 次提交
-
-
由 Kamal Heib 提交于
Make all the required change to start use the ib_device_ops structure. Signed-off-by: NKamal Heib <kamalheib1@gmail.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 04 12月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
All of the old arguments can be derived from the uverbs_attr_bundle structure, so get rid of the redundant arguments. Most of the prior work has been removing users of the arguments to allow this to be a simple patch. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Jason Gunthorpe 提交于
This creates a consistent way to access the two core buffers across write and write_ex handlers. Remove the open coded ucore conversion in the write/ex compatibility handlers. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 27 11月, 2018 5 次提交
-
-
由 Jason Gunthorpe 提交于
Now that we have metadata describing the command format the core code can directly compute the udata pointers and all the really ugly ib_uverbs_init_udata() calls can be removed from the handlers. This means all the write() handlers are no longer sensitive to the layout of the command buffer. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The core code needs to compute the udata so we may as well pass it in the uverbs_attr_bundle instead of on the stack. This converts the simple case of write_ex() which already has a core calculation. Also change the write() path to use the attrs for ib_uverbs_init_udata() instead of on the stack. This lets the write to write_ex compatibility path continue to follow the lead of the _ex path. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The size meta-data in the prior patch describes the smallest acceptable buffer for the write() interface. Globally check this in the core code. This is necessary in the case of write() methods that have a driver udata to prevent computing a negative udata buffer length. The return code of -ENOSPC is chosen here as some of the handlers already use this code, however many other handler use EINVAL. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Currently they return the command length, while all other handlers return 0. This makes the write path closer to the write_ex and ioctl path. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that we can add meta-data to the description of write() methods we need to pass the uverbs_attr_bundle into all write based handlers so future patches can use it as a container for any new data transferred out of the core. This is the first step to bringing the write() and ioctl() methods to a common interface signature. This is a simple search/replace, and we push the attr down into the uobj and other APIs to keep changes minimal. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 23 11月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
This organizes the write commands into objects and links them to the uverbs_api data structure. The command path is reworked to use uapi instead of its internal structures. The command mask is moved from a runtime check to a registration time check in the uapi. Since the write interface does not have the object ID as part of the command, the radix bins are converted into linear lists to support the lookup. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
We have many cases where parts of the uapi are not supported in a driver, needs a certain protocol, or whatever. It is best to reflect this directly into the struct uverbs_api when it is built so that everything is simply blocked off, and future introspection can report a proper supported list. This is done by adding some additional helpers to the definition list language that disable objects based on a 'supported' call back, and a helper that disables based on a NULL struct ib_device function pointer. Disablement is global. For instance, if a driver disables an object then everything connected to that object is removed, including core methods. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The 'tree' data structure is very hard to build at compile time, and this makes it very limited. The new radix tree based compiler can handle a more complex input language that does not require the compiler to perfectly group everything into a neat tree structure. Instead use a simple list to describe to input, where the list elements can be of various different 'opcodes' instructing the radix compiler what to do. Start out with opcodes chaining to other definition lists and chaining to the existing 'tree' definition. Replace the very top level of the 'object tree' with this list type and get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 17 10月, 2018 1 次提交
-
-
由 Leon Romanovsky 提交于
Replace custom code to allocate indexes to generic kernel API. Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 27 9月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
These return the same thing but dev_name is a more conventional use of the kernel API. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
-
- 21 9月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
To support disassociation and PCI hot unplug, we have to track all the VMAs that refer to the device IO memory. When disassociation occurs the VMAs have to be revised to point to the zero page, not the IO memory, to allow the physical HW to be unplugged. The three drivers supporting this implemented three different versions of this algorithm, all leaving something to be desired. This new common implementation has a few differences from the driver versions: - Track all VMAs, including splitting/truncating/etc. Tie the lifetime of the private data allocation to the lifetime of the vma. This avoids any tricks with setting vm_ops which Linus didn't like. (see link) - Support multiple mms, and support properly tracking mmaps triggered by processes other than the one first opening the uverbs fd. This makes fork behavior of disassociation enabled drivers the same as fork support in normal drivers. - Don't use crazy get_task stuff. - Simplify the approach for to racing between vm_ops close and disassociation, fixing the related bugs most of the driver implementations had. Since we are in core code the tracking list can be placed in struct ib_uverbs_ufile, which has a lifetime strictly longer than any VMAs created by mmap on the uverbs FD. Link: https://www.spinics.net/lists/stable/msg248747.html Link: https://lkml.kernel.org/r/CA+55aFxJTV_g46AQPoPXen-UPiqR1HGMZictt7VpC-SMFbm3Cw@mail.gmail.comSigned-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 20 9月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
The error path has several mistakes - cdev_del should not be called if cdev_device_add fails - We must call put_device on all the goto exit paths as that is what frees the uapi, SRCU and the struct itself. While we are here consolidate all the uvdev_dev init that cannot fail at the top. Fixes: c5c4d92e ("RDMA/uverbs: Use cdev_device_add() instead of cdev_add()") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NParav Pandit <parav@mellanox.com>
-
由 Jason Gunthorpe 提交于
This does nothing but indicate if the uverbs_file is in the device's list, use list_del_init instead. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 13 9月, 2018 1 次提交
-
-
由 Steve Wise 提交于
Currently a uverbs completion event queue is flushed of events in ib_uverbs_comp_event_close() with the queue spinlock held and then released. Yet setting ev_queue->is_closed is not set until later in uverbs_hot_unplug_completion_event_file(). In between the time ib_uverbs_comp_event_close() releases the lock and uverbs_hot_unplug_completion_event_file() acquires the lock, a completion event can arrive and be inserted into the event queue by ib_uverbs_comp_handler(). This can cause a "double add" list_add warning or crash depending on the kernel configuration, or a memory leak because the event is never dequeued since the queue is already closed down. So add setting ev_queue->is_closed = 1 to ib_uverbs_comp_event_close(). Cc: stable@vger.kernel.org Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked objects schema") Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 06 9月, 2018 3 次提交
-
-
由 Parav Pandit 提交于
Instead of explicitly adding device attribute files and handling such error conditions, depend on device core layer to create device attributes files based group pointer NULL terminated array. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Instead of doing two step process to add char device and create underlying device, use cdev_device_add() which does both. Currently a kobject per uverbs_device is created to keep reference to its holding ib_uverbs_device in addition to its underlying device 'dev'. Instead just use uverbs_device->dev to keep a reference to. With this change there is single reference tracker for ib_uverbs_device structure. This allows for subsequent patch to registers group attribute as well using single API cdev_device_add(). Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap. Fixes: 7d96c9b1 ("IB/uverbs: Have the core code create the uverbs_root_spec") Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 13 8月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
Everything now uses the uverbs_uapi data structure. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 11 8月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
-
由 Jason Gunthorpe 提交于
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 02 8月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that all the callbacks are safe to run concurrently with disassociation this test can be eliminated. The ufile core infrastructure becomes entirely self contained and is not sensitive to disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This is a step to get rid of the global check for disassociation. In this model, the ib_dev is not proven to be valid by the core code and cannot be provided to the method. Instead, every method decides if it is able to run after disassociation and obtains the ib_dev using one of three different approaches: - Call srcu_dereference on the udevice's ib_dev. As before, this means the method cannot be called after disassociation begins. (eg alloc ucontext) - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext() - Retrieve the ib_dev from the uobject->object after checking under SRCU if disassociation has started (eg uobj_get) Largely, the code is all ready for this, the main work is to provide a ib_dev after calling uobj_alloc(). The few other places simply use ib_uverbs_get_ucontext() to get the ib_dev. This flexibility will let the next patches allow destroy to operate after disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 26 7月, 2018 3 次提交
-
-
由 Jason Gunthorpe 提交于
We have a parallel unlocked reader and writer with ib_uverbs_get_context() vs everything else, and nothing guarantees this works properly. Audit and fix all of the places that access ucontext to use one of the following locking schemes: - Call ib_uverbs_get_ucontext() under SRCU and check for failure - Access the ucontext through an struct ib_uobject context member while holding a READ or WRITE lock on the uobject. This value cannot be NULL and has no race. - Hold the ucontext_lock and check for ufile->ucontext !NULL This also re-implements ib_uverbs_get_ucontext() in a way that is safe against concurrent ib_uverbs_get_context() and disassociation. As a side effect, every access to ucontext in the commands is via ib_uverbs_get_context() with an error check, or via the uobject, so there is no longer any need for the core code to check ucontext on every command call. These checks are also removed. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
The locking here has always been a bit crazy and spread out, upon some careful analysis we can simplify things. Create a single function uverbs_destroy_ufile_hw() that internally handles all locking. This pulls together pieces of this process that were sprinkled all over the places into one place, and covers them with one lock. This eliminates several duplicate/confusing locks and makes the control flow in ib_uverbs_close() and ib_uverbs_free_hw_resources() extremely simple. Unfortunately we have to keep an extra mutex, ucontext_lock. This lock is logically part of the rwsem and provides the 'down write, fail if write locked, wait if read locked' semantic we require. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Rename 'cleanup_rwsem' to 'hw_destroy_rwsem' which is held across any call to the type destroy function (aka 'hw' destroy). The main purpose of this lock is to prevent normal add and destroy from running concurrently with uverbs_cleanup_ufile() Since the uobjects list is always manipulated under the 'hw_destroy_rwsem' we can eliminate the uobjects_lock in the cleanup function. This allows converting that lock to a very simple spinlock with a narrow critical section. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 10 7月, 2018 5 次提交
-
-
由 Jason Gunthorpe 提交于
Now that ib_uobject has a ib_uverbs_file we don't need this extra one in ib_ucq_object. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
The only purpose for this structure was to hold the ib_uobject_file pointer, but now that is part of the standard ib_uobject the structure no longer makes any sense, so get rid of it. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Unnecessary clutter, to indirect through ucontext when the ufile would do. Generally most of the code code should only be working with ufile, except for a few places that touch the driver interface. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The correct handle to refer to the idr/etc is ib_uverbs_file, revise all the core APIs to use this instead. The user API are left as wrappers that automatically convert a ucontext to a ufile for now. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The IDR is part of the ib_ufile so all the machinery to lock it, handle closing and disassociation rightly belongs to the ufile not the ucontext. This changes the lifetime of that data to match the lifetime of the file descriptor which is always strictly longer than the lifetime of the ucontext. We need the entire locking machinery to continue to exist after ucontext destruction to allow us to return the destroy data after a device has been disassociated. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 05 7月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
The specs are required to operate the uverbs file, so they belong inside the ib_uverbs_device, not inside the ib_device. The spec passed in the ib_device is just a communication from the driver and should not be used during runtime. This also changes the lifetime of the spec memory to match the ib_uverbs_device, however at this time the spec_root can still contain driver pointers after disassociation, so it cannot be used if ib_dev is NULL. This is preparation for another series. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 20 6月, 2018 1 次提交
-
-
由 Yishai Hadas 提交于
Drivers that use the IOCTL API may have the ib_uverbs_file and need a way to get the related ib_ucontext from it, this is enabled by this patch. Downstream patches from this series will use it. Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 13 6月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
During disassociation the ucontext will become NULL, however due to how the SRCU locking works the ucontext must only be examined after looking at the ib_dev, which governs the RCU control flow. With the wrong ordering userspace will see EINVAL instead of EIO for a disassociated uverbs FD, which breaks rdma-core. Cc: stable@vger.kernel.org Fixes: 491d5c6a ("RDMA/uverbs: Move uncontext check before SRCU read lock") Reported-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-