- 23 8月, 2017 1 次提交
-
-
由 Bharat Potnuri 提交于
Initializing cq_context with ev_queue in create_cq(), leads to NULL pointer dereference in ib_uverbs_comp_handler(), if application doesnot use completion channel. This patch fixes the cq_context initialization. Fixes: 1e7710f3 ("IB/core: Change completion channel to use the reworked") Cc: stable@vger.kernel.org # 4.12 Signed-off-by: NPotnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com> (cherry picked from commit 699a2d5b)
-
- 17 8月, 2017 1 次提交
-
-
由 Maor Gottlieb 提交于
As part of ib_uverbs_remove_one which might be triggered upon reset flow, we trigger IB_EVENT_DEVICE_FATAL event to userspace application. If device was removed after uverbs fd was opened but before ib_uverbs_get_context was called, the event file will be accessed before it was allocated, result in NULL pointer dereference: [ 72.325873] BUG: unable to handle kernel NULL pointer dereference at (null) ... [ 72.325984] IP: _raw_spin_lock_irqsave+0x22/0x40 [ 72.327123] Call Trace: [ 72.327168] ib_uverbs_async_handler.isra.8+0x2e/0x160 [ib_uverbs] [ 72.327216] ? synchronize_srcu_expedited+0x27/0x30 [ 72.327269] ib_uverbs_remove_one+0x120/0x2c0 [ib_uverbs] [ 72.327330] ib_unregister_device+0xd0/0x180 [ib_core] [ 72.327373] mlx5_ib_remove+0x74/0x140 [mlx5_ib] [ 72.327422] mlx5_remove_device+0xfb/0x110 [mlx5_core] [ 72.327466] mlx5_unregister_interface+0x3c/0xa0 [mlx5_core] [ 72.327509] mlx5_ib_cleanup+0x10/0x962 [mlx5_ib] [ 72.327546] SyS_delete_module+0x155/0x230 [ 72.328472] ? exit_to_usermode_loop+0x70/0xa6 [ 72.329370] do_syscall_64+0x54/0xc0 [ 72.330262] entry_SYSCALL64_slow_path+0x25/0x25 Fix it by checking that user context was allocated before trigger the event. Fixes: 036b1063 ('IB/uverbs: Enable device removal when there are active user space applications') Signed-off-by: NMaor Gottlieb <maorg@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 16 8月, 2017 9 次提交
-
-
由 Shiraz Saleem 提交于
ib_unregister_device is not protecting removal of sysfs entries. A call to ib_register_device in that window can result in duplicate sysfs entry warning. Move mutex_unlock to after ib_device_unregister_sysfs to protect against sysfs entry creation. This issue is exposed during driver load/unload stress test. WARNING: CPU: 5 PID: 4445 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5f/0x70 sysfs: cannot create duplicate filename '/class/infiniband/i40iw0' Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Q87M-D2H BIOS F7 01/17/2014 Workqueue: i40e i40e_service_task [i40e] Call Trace: dump_stack+0x67/0x98 __warn+0xcc/0xf0 warn_slowpath_fmt+0x4a/0x50 ? kernfs_path_from_node+0x4b/0x60 sysfs_warn_dup+0x5f/0x70 sysfs_do_create_link_sd.isra.2+0xb7/0xc0 sysfs_create_link+0x20/0x40 device_add+0x28c/0x600 ib_device_register_sysfs+0x58/0x170 [ib_core] ib_register_device+0x325/0x570 [ib_core] ? i40iw_register_rdma_device+0x1f4/0x400 [i40iw] ? kmem_cache_alloc_trace+0x143/0x330 ? __raw_spin_lock_init+0x2d/0x50 i40iw_register_rdma_device+0x2dc/0x400 [i40iw] i40iw_open+0x10a6/0x1950 [i40iw] ? i40iw_open+0xeab/0x1950 [i40iw] ? i40iw_make_cm_node+0x9c0/0x9c0 [i40iw] i40e_client_subtask+0xa4/0x110 [i40e] i40e_service_task+0xc2d/0x1320 [i40e] process_one_work+0x203/0x710 ? process_one_work+0x16f/0x710 worker_thread+0x126/0x4a0 ? trace_hardirqs_on+0xd/0x10 kthread+0x112/0x150 ? process_one_work+0x710/0x710 ? kthread_create_on_node+0x40/0x40 ret_from_fork+0x2e/0x40 ---[ end trace fd11b69e21ea7653 ]--- Couldn't register device i40iw0 with driver model Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: NSindhu Devale <sindhu.devale@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Steve Wise 提交于
Fixes: ee30f7d5 ("iw_cxgb4: Max fastreg depth depends on DSGL support") Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Colin Ian King 提交于
When dmac is NULL, ah is not being freed on the error return path. Fix this by kfree'ing it. Detected by CoverityScan, CID#1452636 ("Resource Leak") Fixes: d8966fcd ("IB/core: Use rdma_ah_attr accessor functions") Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Christopher N Bednarz 提交于
Avoid out of bounds error by utilizing I40IW_MAX_STATS_COUNT instead of I40IW_INVALID_FCN_ID. Signed-off-by: NChristopher N Bednarz <christoper.n.bednarz@intel.com> Signed-off-by: NHenry Orosco <henry.orosco@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Christopher N Bednarz 提交于
Utilize correct alignment variable when allocating DMA memory for CQ0. Signed-off-by: NChristopher N Bednarz <christopher.n.bednarz@intel.com> Signed-off-by: NHenry Orosco <henry.orosco@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Mustafa Ismail 提交于
The typecast of tcp_seq_num incorrectly uses u8. Fix by casting to u32. Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: NHenry Orosco <henry.orosco@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Mustafa Ismail 提交于
Fix incorrect naming of status code and struct. Use inline instead of immediate. Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: NHenry Orosco <henry.orosco@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Chien Tin Tung 提交于
Parsing of commit/query Host Memory Cache Function Private Memory is not skipping over reserved fields and incorrectly assigning those values into object's base/cnt/max_cnt fields. Skip over reserved fields and set correct values. Also correct memory alignment requirement for commit/query FPM buffers. Signed-off-by: NChien Tin Tung <chien.tin.tung@intel.com> Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: NChristopher N Bednarz <christopher.n.bednarz@intel.com> Signed-off-by: NHenry Orosco <henry.orosco@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Bryan Tan 提交于
There is a chance of a race between arming the CQ and receiving completions. By reporting CQ missed events any ULPs should poll again to get the completions. Fixes: 29c8d9eb ("IB: Add vmw_pvrdma driver") Acked-by: NAditya Sarwade <asarwade@vmware.com> Signed-off-by: NBryan Tan <bryantan@vmware.com> Signed-off-by: NAdit Ranadive <aditr@vmware.com> Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 08 8月, 2017 1 次提交
-
-
由 Doug Ledford 提交于
Merge tag 'rdma-rc-2017-07-26' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma into leon-ipoib IPoIB fixes for 4.13 The patchset provides various fixes for IPoIB. It is combination of fixes to various issues discovered during verification along with static checkers cleanup patches. Most of the patches are from pre-git era and hence lack of Fixes lines. There is one exception in this IPoIB group - addition of patch revert: Revert "IB/core: Allow QP state transition from reset to error", but it followed by proper fix to the annoying print, so I thought it is appropriate to include it. Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 05 8月, 2017 5 次提交
-
-
由 Dan Carpenter 提交于
The hns_roce_v1_create_lp_qp() returns NULL on error, not error pointers. Fixes: bfcc681b ("IB/hns: Fix the bug when free mr") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
The extended address vector is the highest bit in be32 variable, but it was compared with the lowest. This patch fixes the endianness of that check and removes already declared define. Fixes: 17d2f88f ("IB/mlx5: Add ODP atomics support") Reviewed-by: NArtemy Kovalyov <artemyko@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Yishai Hadas 提交于
Uverbs device should be cleaned up only when there is no potential usage of. As part of ib_uverbs_remove_one which might be triggered upon reset flow the device reference count is decreased as expected and leave the final cleanup to the FDs that were opened. Current code increases reference count upon opening a new command FD and decreases it upon closing the file. The event FD is opened internally and rely on the command FD by taking on it a reference count. In case that the command FD was closed and just later the event FD we may ensure that the device resources as of srcu are still alive as they are still in use. Fixing the above by moving the reference count decreasing to the place where the command FD is really freed instead of doing that when it was just closed. fixes: 036b1063 ("IB/uverbs: Enable device removal when there are active user space applications") Signed-off-by: NYishai Hadas <yishaih@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Leon Romanovsky 提交于
initialize to zero the response structure to prevent the leakage of "resp.reserved" field. drivers/infiniband/core/uverbs_cmd.c:1178 ib_uverbs_resize_cq() warn: check that 'resp.reserved' doesn't leak information Fixes: 33b9b3ee ("IB: Add userspace support for resizing CQs") Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Parav Pandit 提交于
Currently while resolving IP address to MAC address single delayed work is used for resolving multiple such resolve requests. This singled work is essentially performs two tasks. (a) any retry needed to resolve and (b) it executes the callback function for all completed requests While work is executing callbacks, any new work scheduled on for this workqueue is lost because workqueue has completed looking at all pending requests and now looking at callbacks, but work is still under execution. Any further retry to look at pending requests in process_req() after executing callbacks would lead to similar race condition (may be reduce the probably further but doesn't eliminate it). Retrying to enqueue work that from queue_req() context is not something rest of the kernel modules have followed. Therefore fix in this patch utilizes kernel facility to enqueue multiple work items to a workqueue. This ensures that no such requests gets lost in synchronization. Request list is still maintained so that rdma_cancel_addr() can unlink the request and get the completion with error sooner. Neighbour update event handling continues to be handled in same way as before. Additionally process_req() work entry cancels any pending work for a request that gets completed while processing those requests. Originally ib_addr was ST workqueue, but it became MT work queue with patch of [1]. This patch again makes it similar to ST so that neighbour update events handler work item doesn't race with other work items. In one such below trace, (though on 4.5 based kernel) it can be seen that process_req() never executed the callback, which is likely for an event that was schedule by queue_req() when previous callback was getting executed by workqueue. [<ffffffff816b0dde>] schedule+0x3e/0x90 [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210 [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70 [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr] [<ffffffff816b228f>] wait_for_completion+0x10f/0x170 [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210 [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr] [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr] [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0 [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core] [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm] [<ffffffff81015982>] ? __switch_to+0x212/0x5e0 [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm] [<ffffffff810a14c1>] process_one_work+0x151/0x4b0 [<ffffffff810a1940>] worker_thread+0x120/0x480 [<ffffffff816b074b>] ? __schedule+0x30b/0x890 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0 [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0 [<ffffffff810a6b1e>] kthread+0xce/0xf0 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff816b53a2>] ret_from_fork+0x42/0x70 [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70 INFO: task kworker/u144:1:156520 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/u144:1 D ffff883ffe1d7600 0 156520 2 0x00000080 Workqueue: ib_addr process_req [ib_addr] ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200 ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4 ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde [1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.htmlSigned-off-by: NParav Pandit <parav@mellanox.com> Reviewed-by: NMark Bloch <markb@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 7月, 2017 11 次提交
-
-
由 Erez Shitrit 提交于
Modify QP can fail and it can be acceptable, like when moving from RST to ERR state, all the rest are not acceptable and a message to the log should be printed. The current code prints on all failures and many messages like: "Failed to modify QP to ERROR state" appear, even when supported by the state machine of the QP object. Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Leon Romanovsky 提交于
The commit ebc9ca43 ("IB/core: Allow QP state transition from reset to error") allowed transition from Reset to Error state for the QPs. This behavior doesn't follow the IBTA specification 1.3, which in 10.3.1 QUEUE PAIR AND EE CONTEXT STATES section. The quote from the spec: "An error can be forced from any state, except Reset, with the Modify QP/EE Verb." Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Leon Romanovsky 提交于
There is no need to assign "p" pointer twice. This patch fixes the following smatch warning: drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn: missing break? reassigning 'p->id' Fixes: 839fcaba ("IPoIB: Connected mode experimental support") Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Leon Romanovsky 提交于
Refactor error paths in ipoib_add_port() function. The code flow ensures that the function terminates on every error flow and it makes redundant all "else" cases. The functions are called during the flow are returning "result < 0", in case of error, so there is no need to check it explicitly. Fixes: 58e9cc90 ("IB/IPoIB: Fix bad error flow in ipoib_add_port()") Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
-
由 Feras Daoud 提交于
Add SRIOV VF support to get traffic statistics. Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Alex Vesker 提交于
Update the multicast counter when multicast packets are received and provide this information through ethtool support. Signed-off-by: NAlex Vesker <valex@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Feras Daoud 提交于
Set IPOIB_NEIGH_TBL_FLUSH bit after initializing the neighbor flushed completion, otherwise the garbage collector may signal a completion while it is not initialized yet. Fixes: b63b70d8 ("IPoIB: Use a private hash table for path lookup in xmit path") Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NAlex Vesker <valex@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Alex Vesker 提交于
Don't allow negative values to max_nonsrq_conn_qp. There is no functional impact on a negative value but it is logicically incorrect. Fixes: 68e995a2 ("IPoIB/cm: Add connected mode support for devices without SRQs") Signed-off-by: NAlex Vesker <valex@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Erez Shitrit 提交于
While cleaning neighs and there is a send-only mcast neigh, the driver should wait to finish its join process before trying to remove it. Without this patch, we will see messages like: "ipoib_mcast_leave on an in-flight join" and unexpected results in the join_complete. Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Erez Shitrit 提交于
The work mcast_task can re-queue itself, so instead of doing cancel && flush_workqueue, that still can leave a queued task on the air, use cancel_delayed_work_sync. Also, no need to use lock over the cancel, the original lock was due to bit assignment setting (IPOIB_MCAST_RUN) that is not in use anymore. Signed-off-by: NErez Shitrit <erezsh@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
由 Feras Daoud 提交于
A potential race between light_event and interface restart may attach multicast group to an already attached QP. Scenario: light_event flow goes through ipoib_mcast_dev_flush function, if a context switch occurs before calling ipoib_mcast_remove_list, then we may face a situation where the broadcast of the priv is null and the corresponding QP is not detached yet. If an "interface restart" runs during the previous context switch, the following scenario occurs: When the device goes up, ipoib_ib_dev_up function will be called, it will send a new registration request to the broadcast group and then attach the group to the QP that was not detached before. IPOIB_FLUSH_LIGHT INTERFACE RESTART __ipoib_ib_dev_flush | | | | | | | ipoib_mcast_dev_flush | Move mcast list and broadcast to remove_list | | | | | Context Switch--> | | ipoib_ib_dev_down | | | | | ipoib_ib_dev_up | | | | | ipoib_mcast_join_task | allocate new broadcast | | | | | Attach QP to multicast group | | | | | <--Context Switch ipoib_mcast_leave Detach QP from multicast group Signed-off-by: NFeras Daoud <ferasda@mellanox.com> Signed-off-by: NLeon Romanovsky <leon@kernel.org>
-
- 20 7月, 2017 12 次提交
-
-
由 Ismail, Mustafa 提交于
Initialize the port_num for iWARP in rdma_init_qp_attr. Fixes: 5ecce4c9("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ismail, Mustafa 提交于
The port number is only valid if IB_QP_PORT is set in the mask. So only check port number if it is valid to prevent modify_qp from failing due to an invalid port number. Fixes: 5ecce4c9("Check port number supplied by user verbs cmds") Cc: <stable@vger.kernel.org> # v2.6.14+ Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NMustafa Ismail <mustafa.ismail@intel.com> Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kalderon, Michal 提交于
Once in_dev_get is called to receive in_device pointer, the in_device reference counter is increased, but if there are no ipv4 addresses configured on the net-device the ifa_list will be null, resulting in a flow that doesn't call in_dev_put to decrease the ref_cnt. This was exposed when running RoCE over ipv6 without any ipv4 addresses configured Fixes: commit 8e3867310c90 ("IB/cma: Fix a race condition in iboe_addr_get_sgid()") Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
We might get some bogus error completions in case the target will remotely invalidate the rkey and the HCA will need to retransmit from this buffer. Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Vijay Immanuel 提交于
If we modified the qp to ERROR state, and drained the recieve queue, post_recv must trigger the responder task to complete the drain work request. Cc: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NVijay Immanuel <vijayi@attalasystems.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>-- Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Amrani, Ram 提交于
Wrap ib_copy_to_udata with a function that ensures that the data being copied over to user space isn't longer than the allowed. Fixes: cecbcddf ("qedr: Add support for QP verbs") Fixes: a7efd777 ("qedr: Add support for PD,PKEY and CQ verbs") Fixes: ac1b36e5 ("qedr: Add support for user context verbs") Signed-off-by: NRam Amrani <Ram.Amrani@cavium.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Ganesh Goudar 提交于
Only use the read sge lkey/addr and the remote rkey/addr if the length of the read is not zero. Otherwise the read response might be treated as the RTR read response and not delivered to the application. Or worse Terminator hardware will fail a 0B read if the STAG is 0 even if the read length is 0. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Håkon Bugge 提交于
CM REQs cannot be successfully retried, because a new pv_cm_id is created for each request, without checking if one already exists. By checking if an id exists before creating one, the bug is fixed. This bug can be provoked by running an RDMA CM user-land application, but inserting a five seconds delay before the rdma_accept() call on the passive side. This delay is larger than the default CMA timeout, and triggers a retry from the active side. The retried REQ will use another pv_cm_id (the cm_id on the wire). This confuses the CM protocol and two REJs are sent from the passive side. Here is an excerpt from ibdump running without the patch: 3.285092 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382711 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.382861 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject 7.387644 LID: 4 -> LID: 4 InfiniBand 290 CM: ConnectReject and here is the same with bug fix applied: 3.251010 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 7.349387 LID: 4 -> LID: 4 SDP 290 CM: ConnectRequest(SDP Hello) 8.258443 LID: 4 -> LID: 4 SDP 290 CM: ConnectReply(SDP Hello) 8.259890 LID: 4 -> LID: 4 InfiniBand 290 CM: ReadyToUse Suggested-by: NVenkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: NHåkon Bugge <haakon.bugge@oracle.com> Reported-by: NWei Lin Guay <wei.lin.guay@oracle.com> Tested-by: NWei Lin Guay <wei.lin.guay@oracle.com> Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com> Acked-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Kaike Wan 提交于
Current computation of qp->timeout_jiffies in rvt_modify_qp() will cause overflow due to the fact that the input to the function usecs_to_jiffies is only 32-bit ( unsigned int). Overflow will occur when attr->timeout is equal to or greater than 30. The consequence is unnecessarily excessive retry and thus degradation of the system performance. This patch fixes the problem by limiting the input to 5-bit and calling usecs_to_jiffies() before multiplying the scaling factor. Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: NKaike Wan <kaike.wan@intel.com> Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Matan Barak 提交于
Delete unused variables to prevent sparse warnings. Fixes: db1b5ddd ("IB/core: Rename uverbs event file structure") Fixes: fd3c7904 ("IB/core: Change idr objects to use the new schema") Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
Local ack delay exposed by the driver is 0 which means infinite QP timeout. Reporting the default value to 16 (approx 260ms) Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Selvin Xavier 提交于
While invoking the req_notify_cq hook, ULPs can request whether the CQs have any CQEs pending. If CQEs are pending, drivers can indicate it by returning 1 for req_notify_cq. The stack will poll CQ again till CQ is empty. This patch peeks the CQ for any valid entries and return accordingly. Signed-off-by: NSelvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-