- 06 9月, 2018 3 次提交
-
-
由 Parav Pandit 提交于
Instead of doing two step process to add char device and create underlying device, use cdev_device_add() which does both. Currently a kobject per uverbs_device is created to keep reference to its holding ib_uverbs_device in addition to its underlying device 'dev'. Instead just use uverbs_device->dev to keep a reference to. With this change there is single reference tracker for ib_uverbs_device structure. This allows for subsequent patch to registers group attribute as well using single API cdev_device_add(). Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Instead of adding/removing device attribute files, depend on device_add() which considers adding these device files based on NULL terminated attributes group array. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
If ib_uverbs_create_uapi() fails, dev_num should be freed from the bitmap. Fixes: 7d96c9b1 ("IB/uverbs: Have the core code create the uverbs_root_spec") Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 05 9月, 2018 2 次提交
-
-
由 Artemy Kovalyov 提交于
The object lock was supposed to always be released during destroy, but when the destruction retry series was integrated with the destroy series it created a failure path that missed the unlock. Keep with convention, if destroy fails the caller must undo all locking. Fixes: 87ad80ab ("IB/uverbs: Consolidate uobject destruction") Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jann Horn 提交于
The current code grabs the private_data of whatever file descriptor userspace has supplied and implicitly casts it to a `struct ucma_file *`, potentially causing a type confusion. This is probably fine in practice because the pointer is only used for comparisons, it is never actually dereferenced; and even in the comparisons, it is unlikely that a file from another filesystem would have a ->private_data pointer that happens to also be valid in this context. But ->private_data is not always guaranteed to be a valid pointer to an object owned by the file's filesystem; for example, some filesystems just cram numbers in there. Check the type of the supplied file descriptor to be safe, analogous to how other places in the kernel do it. Fixes: 88314e4d ("RDMA/cma: add support for rdma_migrate_id()") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 23 8月, 2018 1 次提交
-
-
由 Michal Hocko 提交于
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: NDavid Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 8月, 2018 5 次提交
-
-
由 Parav Pandit 提交于
Filter functions returns either 0 or 1, therefore better change their return type from int to bool to reflect the same. Additionally some filter functions have suffix of _filter some doesn't. Make all filter function consistent to have __filter suffix to improve code readability. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Update all GID table entries of the netdevice whose MAC address changed. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Currently following issues exist: 1. Default GIDs of the lower (slave) netdevice if the bond netdevice is added. Rather default GID should be of bond master netdevice. 2. Due to this, when failover event occurs FAILOVER event handler attempts to delete the GID of the upper device and tries to add the default GID of the lower device. This is incorrect behavior. To have simple and correct code: (a) Split default GIDs addition out of add_netdev_ips(). This allows easier removal in future if RoCE default GIDs are removed. (b) Add default GIDs of the bond master device by using right filter and callback function. (c) Remove unused function enum_netdev_default_gids(). Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Now that we correctly delete the default GIDs of lower devices during CHANGEUPPER event, add default GIDs of the bonding master device. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
When NETDEV_CHANGEUPPER event occurs, lower device is not yet established as slave of the master, and when upper device is bond device, default GID entries not deleted. Due to this, when bond device is fully configured, default GID entries of bond device cannot be added as default GID entries are occupied by the lower netdevice. This is incorrect. Default GID entries should really be of bond netdevice because in all RoCE GIDs (default or IP), MAC address of the bond device will be used. It is confusing to have default GID of netdevice which is not really used for any purpose. Therefore, as first step, implement (a) filter function which filters if a CHANGEUPPER event netdevice and associated upper device is master device or not. (b) callback function which deletes the default GIDs of lower (event netdevice). Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 15 8月, 2018 2 次提交
-
-
由 Parav Pandit 提交于
Currently bond_delete_netdev_default_gids() is called by two callers. (a) del_netdev_default_ips_join() (b) del_netdev_default_ips() Both above functions changes the argument order while calling bond_delete_netdev_default_gids(). This required silly del_netdev_default_ips() wrapper. Additionally, del_netdev_default_ips() deletes default GIDs not IP based GIDs. del_netdev_default_ips() having _ips suffix is confusing. Therefore, get rid of confusing del_netdev_default_ips() and simplify bond_delete_netdev_default_gids() to follow same argument order as its caller. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Add comment for handling CHANGEUPPER netevent handling. To improve code readability, (a) move cmd definitions to its respective if-else branches, (b) avoid single line structure definitions. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 14 8月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
Even though this interface is marked CONFIG_BROKEN we still expect it to compile, at least until we delete it completely. Also mark INFINIBAND_USER_ACCESS_UCM with COMPILE_TEST so these situations can be detected. Fixes: e7ff98ae ("RDMA/cma: Constify path record, ib_cm_event, listen_id pointers") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 13 8月, 2018 5 次提交
-
-
由 Jason Gunthorpe 提交于
Now that the ioctl path and uobjects are converted to use uverbs_api, it is now safe to remove the disassociation protection from the common ioctl code. This completes the work to make destroy functions continue to work even after device disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Everything now uses the uverbs_uapi data structure. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Convert the ioctl method syscall path to use the uverbs_api data structures. The new uapi structure includes all the same information, just in a different and more optimal way. - Use attr_bkey instead of 2 level radix trees for everything related to attributes. This includes the attribute storage, presence, and detection of missing mandatory attributes. - Avoid iterating over all attribute storage at finish, instead use find_first_bit with the attr_bkey to locate only those attrs that need cleanup. - Organize things to always run, and always rely on, cleanup. This avoids a bunch of tricky error unwind cases. - Locate the method using the radix tree, and locate the attributes using a very efficient incremental radix tree lookup - Use the precomputed destroy_bkey to handle uobject destruction - Use the precomputed allocation sizes and precomputed 'need_stack' to avoid maths in the fast path. This is optimal if userspace does not pass (many) unsupported attributes. Overall this results in much better codegen for the attribute accessors, everything is now stored in bitmaps or linear arrays indexed by attr_bkey. The compiler can compute attr_bkey values at compile time for all method attributes, meaning things like uverbs_attr_is_valid() now compile into single instruction bit tests. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Several handlers need temporary allocations for the life of the method, switch them to use the uverbs_alloc allocator. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
This is similar in spirit to devm, it keeps track of any allocations linked to this method call and ensures they are all freed when the method exits. Further, if there is space in the internal/onstack buffer then the allocator will hand out that memory and avoid an expensive call to kalloc/kfree in the syscall path. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 11 8月, 2018 5 次提交
-
-
由 Jason Gunthorpe 提交于
Memory in the bundle is valuable, do not waste it holding an 8 byte pointer for the rare case of writing to a PTR_OUT. We can compute the pointer by storing a small 1 byte array offset and the base address of the uattr memory in the bundle private memory. This also means we can access the kernel's copy of the ib_uverbs_attr, so drop the copy of flags as well. Since the uattr base should be private bundle information this also de-inlines the already too big uverbs_copy_to inline and moves create_udata into uverbs_ioctl.c so they can see the private struct definition. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
This already existed as the anonymous 'ctx' structure, but this was not really a useful form. Hoist this struct into bundle_priv and rework the internal things to use it instead. Move a bunch of the processing internal state into the priv and reduce the excessive use of function arguments. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Currently the struct uverbs_obj_type stored in the ib_uobject is part of the .rodata segment of the module that defines the object. This is a problem if drivers define new uapi objects as we will be left with a dangling pointer after device disassociation. Switch the uverbs_obj_type for struct uverbs_api_object, which is allocated memory that is part of the uverbs_api and is guaranteed to always exist. Further this moves the 'type_class' into this memory which means access to the IDR/FD function pointers is also guaranteed. Drivers cannot define new types. This makes it safe to continue to use all uobjects, including driver defined ones, after disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This radix tree datastructure is intended to replace the 'hash' structure used today for parsing ioctl methods during system calls. This first commit introduces the structure and builds it from the existing .rodata descriptions. The so-called hash arrangement is actually a 5 level open coded radix tree. This new version uses a 3 level radix tree built using the radix tree library. Overall this is much less code and much easier to build as the radix tree API allows for dynamic modification during the building. There is a small memory penalty to pay for this, but since the radix tree is allocated on a per device basis, a few kb of RAM seems immaterial considering the gained simplicity. The radix tree is similar to the existing tree, but also has a 'attr_bkey' concept, which is a small value'd index for each method attribute. This is used to simplify and improve performance of everything in the next patches. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
-
由 Jason Gunthorpe 提交于
There is no reason for drivers to do this, the core code should take of everything. The drivers will provide their information from rodata to describe their modifications to the core's base uapi specification. The core uses this to build up the runtime uapi for each device. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 10 8月, 2018 1 次提交
-
-
由 Jason Gunthorpe 提交于
This is missing a zeroing of the high bits of flags, and is also not correct for big endian machines. Properly zero extend the 32 bit flags into the 64 bit stack variable. Reported-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Fixes: bccd0622 ("IB/uverbs: Add UVERBS_ATTR_FLAGS_IN to the specs language") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
-
- 08 8月, 2018 1 次提交
-
-
由 Parav Pandit 提交于
sgid_attr is uninitialized on the stack, initialize it to NULL. Fixes: 39839107 ("IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'") Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NYossi Itigin <yosefe@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 02 8月, 2018 11 次提交
-
-
由 Jason Gunthorpe 提交于
The disassociate function was broken by design because it failed all commands. This prevents userspace from calling destroy on a uobject after it has detected a device fatal error and thus reclaiming the resources in userspace is prevented. This fix is now straightforward, when anything destroys a uobject that is not the user the object remains on the IDR with a NULL context and object pointer. All lookup locking modes other than DESTROY will fail. When the user ultimately calls the destroy function it is simply dropped from the IDR while any related information is returned. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that all the callbacks are safe to run concurrently with disassociation this test can be eliminated. The ufile core infrastructure becomes entirely self contained and is not sensitive to disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This does the same as the patch before, except for ioctl. The rules are the same, but for the ioctl methods the core code handles setting up the uobject. - Retrieve the ib_dev from the uobject->context->device. This is safe under ioctl as the core has already done rdma_alloc_begin_uobject and so CREATE calls are entirely protected by the rwsem. - Retrieve the ib_dev from uobject->object - Call ib_uverbs_get_ucontext() Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This is a step to get rid of the global check for disassociation. In this model, the ib_dev is not proven to be valid by the core code and cannot be provided to the method. Instead, every method decides if it is able to run after disassociation and obtains the ib_dev using one of three different approaches: - Call srcu_dereference on the udevice's ib_dev. As before, this means the method cannot be called after disassociation begins. (eg alloc ucontext) - Retrieve the ib_dev from the ucontext, via ib_uverbs_get_ucontext() - Retrieve the ib_dev from the uobject->object after checking under SRCU if disassociation has started (eg uobj_get) Largely, the code is all ready for this, the main work is to provide a ib_dev after calling uobj_alloc(). The few other places simply use ib_uverbs_get_ucontext() to get the ib_dev. This flexibility will let the next patches allow destroy to operate after disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Commands that are reading/writing to objects can test for an ongoing disassociation during their initial call to rdma_lookup_get_uobject. This directly prevents all of these commands from conflicting with an ongoing disassociation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
After all the recent structural changes this is now straightforward, hold the hw_destroy_rwsem across the entire uobject creation. We already take this semaphore on the success path, so holding it a bit longer is not going to change the performance. After this change none of the create callbacks require the disassociate_srcu lock to be correct. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
After all the recent structural changes this is now straightfoward, hoist the hw_destroy_rwsem up out of rdma_destroy_explicit and wrap it around the uobject write lock as well as the destroy. This is necessary as obtaining a write lock concurrently with uverbs_destroy_ufile_hw() will cause malfunction. After this change none of the destroy callbacks require the disassociate_srcu lock to be correct. This requires introducing a new lookup mode, UVERBS_LOOKUP_DESTROY as the IOCTL interface needs to hold an unlocked kref until all command verification is completed. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
This is more readable, and future patches will need a 3rd lookup type. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
There are several flows that can destroy a uobject and each one is minimized and sprinkled throughout the code base, making it difficult to understand and very hard to modify the destroy path. Consolidate all of these into uverbs_destroy_uobject() and call it in all cases where a uobject has to be destroyed. This makes one change to the lifecycle, during any abort (eg when alloc_commit is not called) we always call out to alloc_abort, even if remove_commit needs to be called to delete a HW object. This also renames RDMA_REMOVE_DURING_CLEANUP to RDMA_REMOVE_ABORT to clarify its actual usage and revises some of the comments to reflect what the life cycle is for the type implementation. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
The ridiculous dance with uobj_remove_commit() is not needed, the write path can follow the same flow as ioctl - lock and destroy the HW object then use the data left over in the uobject to form the response to userspace. Two helpers are introduced to make this flow straightforward for the caller. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
The core code will destroy the HW object on behalf of the method, if the method provides an implementation it must simply copy data from the stub uobj into the response. Destroy methods cannot touch the HW object. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 31 7月, 2018 3 次提交
-
-
由 Parav Pandit 提交于
In rdma cm module, functions which are common between IB and iWarp are named with cma_. iWarp specific functions are prefixed with cma_iw. IB specific functions are perfixed with cma_ib. However some functions in request processing path didn't follow cma_ib notion. Prefix them with _ib for better code clarity. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
cma_add_one() initializes the default GID regardless of device type. listen_id is bound to a device and an IP address, its GID type is initialized by cma_acquire_dev(). Therefore a valid default GID type is always available, it is not needed to check port type during cma_acquire_dev(). Initialize gid type of a cm id when the cm_id is created instead of doing conditional checks during cma_acquire_dev() and trying to initialize to 0 during _cma_attach_to_dev(). Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
In various functions rdma_cm_event is zero initialized on stack using memset() while holding lock which is not necessary. Therefore, don't hold the lock while initializing on stack. Signed-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NDaniel Jurgens <danielj@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-