- 27 11月, 2018 2 次提交
-
-
由 Jason Gunthorpe 提交于
If the struct is used with a driver_udata it should have a trailing driver_data flex array to mark it as having udata. In most cases this forces the end of the struct to be aligned to u64 which is needed to make the trailing driver_data naturally aligned. Unfortunately We have a few cases where the base struct is not aligned to 8 bytes, these are marked with a u32 driver_data and userspace will check for alignment issues when it compiles the driver. Also remove the empty ib_uverbs_modify_qp_resp as nothing uses this. pahole says there is no change to any struct sizes by this change. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Colin Ian King 提交于
There is a spelling mistake in the module description text, fix it. Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 23 11月, 2018 13 次提交
-
-
由 Parav Pandit 提交于
When the rdma device is getting removed, get resource info can race with device removal, as below: CPU-0 CPU-1 -------- -------- rdma_nl_rcv_msg() nldev_res_get_cq_dumpit() mutex_lock(device_lock); get device reference mutex_unlock(device_lock); [..] ib_unregister_device() /* Valid reference to * device->dev exists. */ ib_dealloc_device() [..] provider->fill_res_entry(); Even though device object is not freed, fill_res_entry() can get called on device which doesn't have a driver anymore. Kernel core device reference count is not sufficient, as this only keeps the structure valid, and doesn't guarantee the driver is still loaded. Similar race can occur with device renaming and device removal, where device_rename() tries to rename a unregistered device. While this is fine for devices of a class which are not net namespace aware, but it is incorrect for net namespace aware class coming in subsequent series. If a class is net namespace aware, then the below [1] call trace is observed in above situation. Therefore, to avoid the race, keep a reference count and let device unregistration wait until all netlink users drop the reference. [1] Call trace: kernfs: ns required in 'infiniband' for 'mlx5_0' WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last unloaded: devlink] RIP: 0010:kernfs_find_ns+0x104/0x120 Call Trace: kernfs_find_and_get_ns+0x2e/0x50 sysfs_rename_link_ns+0x40/0xb0 device_rename+0xb2/0xf0 ib_device_rename+0xb3/0x100 [ib_core] nldev_set_doit+0x165/0x190 [ib_core] rdma_nl_rcv_msg+0x249/0x250 [ib_core] ? netlink_deliver_tap+0x8f/0x3e0 rdma_nl_rcv+0xd6/0x120 [ib_core] netlink_unicast+0x17c/0x230 netlink_sendmsg+0x2f0/0x3e0 sock_sendmsg+0x30/0x40 __sys_sendto+0xdc/0x160 Fixes: da5c8507 ("RDMA/nldev: add driver-specific resource tracking") Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Currently several rdma_cm module specific functions are declared in core_priv.h file. Now that we have cma_priv.h file specific to rdma_cm kernel module, move them from core_priv.h to cma_priv.h Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
Add annotations to the uverbs_api structure indicating which driver methods are called by the implementation. If the required method is NULL the write API will be not be callable. This effectively duplicates the cmd_mask system, however it does it by expressing invariants required by the core code, not by delegating decision making to the driver. This is another step toward eliminating cmd_mask. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Now that we use struct uverbs_uapi to link the method functions to the dispatcher there is no reason to have them be extern symbols. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
This organizes the write commands into objects and links them to the uverbs_api data structure. The command path is reworked to use uapi instead of its internal structures. The command mask is moved from a runtime check to a registration time check in the uapi. Since the write interface does not have the object ID as part of the command, the radix bins are converted into linear lists to support the lookup. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Bringing all uapi entry points into one place lets us deal with them consistently. For instance the write, write_ex and ioctl paths can be disabled when an API is not supported by the driver. This will replace the uverbs_cmd_table static arrays. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
If we can't destroy the object then we certainly shouldn't allow it be created or used. Remove it from the uverbs_uapi in this case. This also disables methods of other objects that have mandatory object handle inputs - ie REG_DM_MR is now automatically removed if DM objects cannot be created. Typically drivers not supporting an interface will mark all of the supporting functions as NULL, including destroy. This is intended to automatically eliminate entire corner cases in the API that are difficult to test. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
Rely on UAPI_DEF_IS_OBJ_SUPPORTED instead of manipulating the contents of the driver's definition list. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
We have many cases where parts of the uapi are not supported in a driver, needs a certain protocol, or whatever. It is best to reflect this directly into the struct uverbs_api when it is built so that everything is simply blocked off, and future introspection can report a proper supported list. This is done by adding some additional helpers to the definition list language that disable objects based on a 'supported' call back, and a helper that disables based on a NULL struct ib_device function pointer. Disablement is global. For instance, if a driver disables an object then everything connected to that object is removed, including core methods. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The next patch needs another copy of this, provide a simple helper to reduce the coding. uapi_add_get_elm() returns an existing entry or adds a new one. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
The 'tree' data structure is very hard to build at compile time, and this makes it very limited. The new radix tree based compiler can handle a more complex input language that does not require the compiler to perfectly group everything into a neat tree structure. Instead use a simple list to describe to input, where the list elements can be of various different 'opcodes' instructing the radix compiler what to do. Start out with opcodes chaining to other definition lists and chaining to the existing 'tree' definition. Replace the very top level of the 'object tree' with this list type and get rid of struct uverbs_object_tree_def and DECLARE_UVERBS_OBJECT_TREE. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Jason Gunthorpe 提交于
For DM there is no reason not to add the spec for the START_OFFSET, if DM is not supported then ib_dev.alloc_dm is already set to NULL which ensures we do not call the method. For IPSEC, the core code should be setting ib_dev.create_flow_action_esp to NULL to disable it, not relying on wonky manipulation of the specs. Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Ursula Braun 提交于
The mlx4 driver does not trigger an IB_EVENT_PORT_ACTIVE when the RoCE network interface is activated. When SMC determines the RoCE device port to be used, it checks the port states. This patch triggers IB events for NETDEV_UP and NETDEV_DOWN. Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Acked-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 22 11月, 2018 8 次提交
-
-
由 Steve Wise 提交于
Only retry connection setup with MPAv1 if the peer actually aborted the connection upon receiving the MPAv2 start message. This avoids retrying with MPAv1 in the case where the connection was aborted due to retransmit timeouts. Fixes: d2fe99e8 ("RDMA/cxgb4: Add support for MPAv2 Enhanced RDMA Negotiation") Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yuval Shaia 提交于
Since the function always returns 0 make it void. Reported-by: NHåkon Bugge <haakon.bugge@oracle.com> Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Acked-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yue Haibing 提交于
There is no need to have the 'struct se_portal_group *tpg' variable static since new value always be assigned before use. Signed-off-by: NYue Haibing <yuehaibing@huawei.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Parav Pandit 提交于
Structures of ib_verbs.h don't use fields/structures of mm.h, socket.h or scatterlist.h. So remove such header files inclusion. Signed-off-by: NParav Pandit <parav@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Sabyasachi Gupta 提交于
Replaced dma_alloc_coherent + memset with dma_zalloc_coherent Signed-off-by: NSabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Sabyasachi Gupta 提交于
Replaced dma_alloc_coherent + memset with dma_zalloc_coherent Signed-off-by: NSabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Jason Gunthorpe 提交于
From git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux mlx5 updates taken for dependencies on later ODP patches. Conflict resolved by deleting mlx5_ib_get_vector_affinity() * branch 'mlx5-next': (21 commits) net/mlx5: EQ, Make EQE access methods inline {net,IB}/mlx5: Move Page fault EQ and ODP logic to RDMA net/mlx5: EQ, Generic EQ net/mlx5: EQ, Different EQ types net/mlx5: EQ, Privatize eq_table and friends net/mlx5: EQ, irq_info and rmap belong to eq_table net/mlx5: EQ, Create all EQs in one place net/mlx5: EQ, Move all EQ logic to eq.c net/mlx5: EQ, Remove redundant completion EQ list lock net/mlx5: EQ, No need to store eq index as a field net/mlx5: EQ, Remove unused fields and structures net/mlx5: EQ, Use the right place to store/read IRQ affinity hint IB/mlx5: Improve ODP debugging messages net/mlx5: Use multi threaded workqueue for page fault handling net/mlx5: Return success for PAGE_FAULT_RESUME in internal error state IB/mlx5: Lock QP during page fault handling net/mlx5: Enumerate page fault types net/mlx5: Add interface to hold and release core resources net/mlx5: Release resource on error flow net/mlx5: Fix offsets of ifc reserved fields ... Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Artemy Kovalyov 提交于
This is required so the user can set the SL on the DC QP. Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com> Reviewed-by: NYossi Itigin <yosefe@mellanox.com> Reviewed-by: NMajd Dibbiny <majd@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 21 11月, 2018 12 次提交
-
-
由 Saeed Mahameed 提交于
These are one/two liner generic EQ access methods, better have them declared static inline in eq.h. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Use the new generic EQ API to move all ODP RDMA data structures and logic form mlx5 core driver into mlx5_ib driver. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Acked-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Add mlx5_eq_{create/destroy}_generic APIs and EQE access methods, for mlx5 core consumers generic EQs. This API will be used in downstream patch to move page fault (RDMA ODP) EQ logic into mlx5_ib rdma driver, hence it will use a generic EQ. Current mlx5 EQ allocation scheme: On load mlx5 allocates 4 (for async) + #cores (for data completions) MSIX vectors, mlx5 core will assign 3 MSIX vectors for internal async EQs and will use all of the #cores MSIX vectors for completion EQs, (One vector is going to be reserved for a generic EQ). After this patch an external user (e.g mlx5_ib) of mlx5_core can use this new API to create new generic EQs with the reserved msix vector index for that eq. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
In mlx5 we have three types of usages for EQs, 1. Asynchronous EQs, used internally by mlx5 core for a. FW command completions b. FW page requests c. one EQ for all other Asynchronous events 2. Completion EQs, used for CQ completion (we create one per core) 3. *Special type of EQ (page fault) used for RDMA on demand paging (ODP). *The 3rd type shouldn't be special at least in mlx5 core, it is yet another async events EQ with specific use case, it will be removed in the next two patches, and will completely move its logic to mlx5_ib, as it is rdma specific. In this patch we remove use case (eq type) specific fields from struct mlx5_eq into a new eq type specific structures. struct mlx5_eq_async; truct mlx5_eq_comp; struct mlx5_eq_pagefault; Separate between their type specific flows. In the future we will allow users to create there own generic EQs. for now we will allow only one for ODP in next patches. We will introduce event listeners registration API for those who want to receive mlx5 async events. After that mlx5 eq handling will be clean from feature/user specific handling. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Move unnecessary EQ table structures and declaration from the public include/linux/mlx5/driver.h into the private area of mlx5_core and into eq.c/eq.h. Introduce new mlx5 EQ APIs: mlx5_comp_vectors_count(dev); mlx5_comp_irq_get_affinity_mask(dev, vector); And use them from mlx5_ib or mlx5e netdevice instead of direct access to mlx5_core internal structures. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
irq_info and rmap are EQ properties of the driver, and only needed for EQ objects, move them to the eq_table EQs database structure. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Instead of creating the EQ table in three steps at driver load, - allocate irq vectors - allocate async EQs - allocate completion EQs Gather all of the procedures into one function in eq.c and call it from driver load. This will help us reduce the EQ and EQ table private structures visibility to eq.c in downstream refactoring. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Move completion EQs flows from main.c to eq.c, reasons: 1) It is where this logic belongs. 2) It will help centralize the EQ logic in one file for downstream refactoring, and future extensions/updates. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Completion EQs list is only modified on driver load/unload, locking is not required, remove it. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
eq->index is used only for completion EQs and is assigned to be the completion eq index, it is used only when traversing the completion eqs list, and it can be calculated dynamically, thus remove the eq->index field. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Some fields and structures are not referenced nor used by the driver, remove them. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Saeed Mahameed 提交于
Currently the cpu affinity hint mask for completion EQs is stored and read from the wrong place, since reading and storing is done from the same index, there is no actual issue with that, but internal irq_info for completion EQs stars at MLX5_EQ_VEC_COMP_BASE offset in irq_info array, this patch changes the code to use the correct offset to store and read the IRQ affinity hint. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NTariq Toukan <tariqt@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 13 11月, 2018 5 次提交
-
-
由 Moni Shoua 提交于
Add and modify debug messages to ODP related error flows. In that context, return code EAGAIN is considered less severe and print level for it is set debug instead of warn. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Moni Shoua 提交于
Page fault events are processed in a workqueue context. Since each QP can have up to two concurrent unrelated page-faults, one for requester and one for responder, page-fault handling can be done in parallel. Achieve this by changing the workqueue to be multi-threaded. The number of threads is the same as the number of command interface channels to avoid command interface bottlenecks. In addition to multi-threads, change the workqueue flags to give it high priority. Stress benchmark shows that before this change 85% of page faults were waiting in queue 8 seconds or more while after the change 98% of page faults were waiting in queue 64 milliseconds or less. The number of threads was chosen as the number of channels to the command interface. Fixes: d9aaed83 ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Moni Shoua 提交于
When the device is in internal error state, command interface isn't accessible and the driver decides which commands to fail and which to pass. Move the PAGE_FAULT_RESUME command to the pass list in order to avoid redundant failure messages. Fixes: 89d44f0a ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Moni Shoua 提交于
When page fault event for a WQE arrives, the event data contains the resource (e.g. QP) number which will later be used by the page fault handler to retrieve the resource. Meanwhile, another context can destroy the resource and cause use-after-free. To avoid that, take a reference on the resource when handler starts and release it when it ends. Page fault events for RDMA operations don't need to be protected because the driver doesn't need to access the QP in the page fault handler. Fixes: d9aaed83 ("{net,IB}/mlx5: Refactor page fault handling") Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Moni Shoua 提交于
Give meaningful names to type of WQE page faults. Signed-off-by: NMoni Shoua <monis@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-