- 12 April 2014, 4 commits
-
-
Committed by Steve Wise

There is a race when moving a QP from RTS->CLOSING where a SQ work request could be posted after the FW receives the RDMA_RI/FINI WR. The SQ work request will never get processed, and should be completed with FLUSHED status. Function c4iw_flush_sq(), however, was dropping the oldest SQ work request when in CLOSING or IDLE states, instead of completing the pending work request. If that oldest pending work request was actually complete and has a CQE in the CQ, then when that CQE is processed in poll_cq, we'll BUG_ON() due to the inconsistent SQ/CQ state. This is a very small timing hole and has only been hit once so far. The fix is two-fold: 1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED status regardless of the QP state. 2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue before moving the state out of RTS. This ensures that the state transition will not happen while another thread is in post_rc_send(), because set_state() and post_rc_send() both acquire the qp spinlock. Also, once we transition the state out of RTS, subsequent calls to post_rc_send() will fail because the "in error" bit is set. I don't think this fully closes the race where the FW can get a FINI followed by a SQ work request being posted (because they are posted to different EQs), but fix #1 will handle the issue by flushing the SQ work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
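A minimal sketch of fix #2, under illustrative names (struct sketch_qp, QP_ERR_BIT, and the function names are not the actual iw_cxgb4 definitions): the "in error" bit is set under the same spinlock the post path takes, before the state leaves RTS, so a concurrent post either completes before the transition or fails afterwards.

#include <linux/spinlock.h>
#include <linux/bitops.h>
#include <linux/errno.h>

enum sketch_state { SKETCH_RTS, SKETCH_CLOSING };

struct sketch_qp {
	spinlock_t lock;
	enum sketch_state state;
	unsigned long flags;
};

#define QP_ERR_BIT 0

static void sketch_set_state_closing(struct sketch_qp *qp)
{
	spin_lock_irq(&qp->lock);
	set_bit(QP_ERR_BIT, &qp->flags);  /* fail new posts first... */
	qp->state = SKETCH_CLOSING;       /* ...then leave RTS */
	spin_unlock_irq(&qp->lock);
}

static int sketch_post_rc_send(struct sketch_qp *qp)
{
	int ret = 0;

	spin_lock_irq(&qp->lock);
	if (test_bit(QP_ERR_BIT, &qp->flags))
		ret = -EINVAL;            /* "in error": reject the WR */
	/* else: build the WR and ring the SQ doorbell here */
	spin_unlock_irq(&qp->lock);
	return ret;
}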
-
Committed by Steve Wise

Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
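A minimal sketch of the ordering rule, with an illustrative CQE layout (the real t4_cqe fields differ): check the gen bit, rmb(), and only then read the rest of the CQE.

#include <asm/barrier.h>

struct sketch_cqe {
	unsigned int header;   /* contains the gen bit */
	unsigned int status;   /* must not be read before rmb() */
};

#define SKETCH_GENBIT(c) ((c)->header & 1)

static int sketch_poll_one(struct sketch_cqe *cqe, unsigned int gen)
{
	if (SKETCH_GENBIT(cqe) != gen)
		return -1;   /* CQE not valid yet; nothing to poll */

	rmb();               /* order the gen-bit read before the field reads */

	return cqe->status;  /* now safe to read the other fields */
}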
-
Committed by Steve Wise

1) Timed-out endpoint processing can be starved. If there are continual CPL messages flowing into the driver, the endpoint timeout processing can be starved. This condition exposed the other bugs below. Solution: in process_work(), call process_timedout_eps() after each CPL is processed.

2) Connection events can be processed even though the endpoint is on the timeout list. If the endpoint is scheduled for timeout processing, then we must ignore MPA Start Requests and Replies. Solution: change stop_ep_timer() to return 1 if the ep has already been queued for timeout processing. All the callers of stop_ep_timer() need to check this and act accordingly. There are just a few cases where the caller needs to do something different if stop_ep_timer() returns 1: in process_mpa_reply(), ignore the reply and let process_timeout() abort the connection; in process_mpa_request(), ignore the request and let process_timeout() abort the connection. It is OK for callers of stop_ep_timer() to abort the connection, since that will leave the state in ABORTING or DEAD, and process_timeout() now ignores timeouts when the ep is in these states.

3) Double insertion on the timeout list. Since the endpoint timers are used for connection setup and teardown, we need to guard against the possibility that an endpoint is already on the timeout list. This is a rare condition, only seen under heavy load and in the presence of the above two bugs. Solution: in ep_timeout(), don't queue the endpoint if it is already on the queue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
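A minimal sketch of the new stop_ep_timer() contract described in item 2, assuming an illustrative endpoint struct (the real c4iw_ep differs): return 1 when the ep is already queued for timeout work, so MPA processing can bail out and let process_timeout() abort the connection.

#include <linux/timer.h>

struct sketch_ep {
	struct timer_list timer;
	int timedout;  /* set once queued for timeout processing */
};

static int sketch_stop_ep_timer(struct sketch_ep *ep)
{
	if (ep->timedout)
		return 1;  /* already on the timeout list; caller must bail */
	del_timer_sync(&ep->timer);
	return 0;
}

static void sketch_process_mpa_reply(struct sketch_ep *ep)
{
	if (sketch_stop_ep_timer(ep))
		return;  /* ignore the reply; process_timeout() aborts */
	/* ... normal MPA reply handling ... */
}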
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fix cast from u64* to integer. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 02 April 2014, 4 commits
-
-
Committed by Steve Wise

Current hardware doesn't correctly support DSGL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

To avoid racing with other threads doing close/flush/whatever, rx_data() should hold the endpoint mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

There is a race between ULP threads doing an accept/reject, and the ingress processing thread handling close/abort for the same connection. The accept/reject path needs to hold the lock to serialize these paths.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 29 March 2014, 1 commit
-
-
Committed by Yann Droneaud

If kmalloc() fails in c4iw_alloc_ucontext(), the function returns without setting an error code in the ret variable: it will return 0 to the caller. This patch sets ret to -ENOMEM in that case.

Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@chelsio.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
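A minimal sketch of the bug class and the fix, with an illustrative function body (not the real c4iw_alloc_ucontext()): ret starts at 0, so the allocation-failure path must assign -ENOMEM or the caller sees success.

#include <linux/slab.h>
#include <linux/errno.h>

static int sketch_alloc_ucontext(void **out)
{
	int ret = 0;
	void *ctx = kmalloc(64, GFP_KERNEL);

	if (!ctx) {
		ret = -ENOMEM;  /* the assignment this patch adds */
		goto err;
	}
	*out = ctx;
	return 0;
err:
	return ret;
}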
-
- 25 March 2014, 5 commits
-
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When processing an MPA Start Request, if the listening endpoint is DEAD, then abort the connection. If the IWCM returns an error, then we must abort the connection and release resources. Also, abort_connection() should not post a CLOSE event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

These are generated by HW in some error cases and need to be silently discarded.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must free the request skb and the saved skb with the TCP header.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
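A minimal sketch of the error path, with illustrative names (the actual send call is represented by a stand-in return value rather than invented): on failure the caller still owns both skbs and must free them to avoid the leak.

#include <linux/skbuff.h>
#include <linux/errno.h>

static int sketch_send_req(struct sk_buff *req_skb, struct sk_buff *tcp_skb)
{
	int ret;

	ret = -EIO;  /* stand-in for a failed cxgb4_ofld_send() call */
	if (ret < 0) {
		kfree_skb(req_skb);  /* the FW request skb */
		kfree_skb(tcp_skb);  /* the saved skb with the TCP header */
		return ret;
	}
	return 0;
}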
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 21 March 2014, 9 commits
-
-
Committed by Steve Wise

We cannot save the mapped length using the RDMA max_page_list_len field of the ib_fast_reg_page_list struct, because the core code uses it. This results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl(). I found this with DMA mapping debugging enabled in the kernel. The fix is to save the length in the c4iw_fr_page_list struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Based on original work from Jay Hernandez <jay@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Always release the neigh entry in rx_pkt(). Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

find_route() must treat loopback as a valid egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Dan Carpenter

There is a four-byte hole at the end of the "uresp" struct, after the ->qid_mask member.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Dan Carpenter

These sizes should be unsigned so that we don't allow negative values and have underflow bugs. These can come from user space, so there may be security implications, but I have not tested this.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
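An illustrative userspace demonstration of the underflow class (not the actual cxgb4 structures): a negative signed length converted to an unsigned type wraps to a huge value, so a later bounds check on the signed variable passes while the copy length overruns.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	int signed_len = -1;            /* user-controlled, signed */
	size_t n = (size_t)signed_len;  /* wraps to SIZE_MAX */

	printf("signed -1 becomes %zu\n", n);

	/* with an unsigned field, the same bits fail a sane bound check */
	unsigned int len = (unsigned int)-1;
	if (len > 4096)
		printf("rejected oversized length %u\n", len);
	return 0;
}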
-
- 15 March 2014, 2 commits
-
-
Committed by Steve Wise

The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and by implementing a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB FIFO.

Design:

cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then, once DB usage is re-enabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB FIFO when re-enabling.

iw_cxgb4: Instead of marking each QP to indicate that DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to set just this single bit to disable all DB writes for all user QPs, versus traversing the idr of all the active QPs. If libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus the new driver still works with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow-controlled QP, the number of writes is accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list, and the number of writes they request is accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq context, we change the DB state to FLOW_CONTROL and begin resuming all the QPs that are on the flow control list. This logic runs until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts are written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64, so 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming too quickly and overflowing the FIFO. Once the flow control list is empty, the DB state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough, all the DB writes get submitted by the kernel using this flow-controlled approach to avoid DB drops. As the load lightens, we resume normal DB writes directly from user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
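A minimal sketch of the FLOW_CONTROL resume loop described above, under illustrative names (DB_FC_RESUME_SIZE matches the stated chunk size of 64; the list and ring_db() helper are not iw_cxgb4's actual symbols): resume 64 QPs per chunk, writing each QP's accumulated doorbell count, and wait for the DB FIFO to drain between chunks.

#include <linux/list.h>
#include <linux/spinlock.h>

#define DB_FC_RESUME_SIZE 64  /* chunk size stated in the commit */

struct sketch_fc_qp {
	struct list_head entry;
	unsigned int db_inc;  /* DB writes accumulated while STOPPED */
};

static void sketch_ring_db(struct sketch_fc_qp *qp)
{
	/* write qp->db_inc to the DB FIFO in a single doorbell */
	qp->db_inc = 0;
}

static void sketch_resume_queues(struct list_head *fc_list, spinlock_t *lock)
{
	struct sketch_fc_qp *qp;
	int n;

	spin_lock_irq(lock);
	while (!list_empty(fc_list)) {
		for (n = 0; n < DB_FC_RESUME_SIZE && !list_empty(fc_list); n++) {
			qp = list_first_entry(fc_list, struct sketch_fc_qp,
					      entry);
			list_del_init(&qp->entry);
			sketch_ring_db(qp);
		}
		spin_unlock_irq(lock);
		/* block here until the LLD reports the DB FIFO drained */
		spin_lock_irq(lock);
	}
	spin_unlock_irq(lock);
}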
-
Committed by Steve Wise

Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 March 2014, 1 commit
-
-
Committed by Yishai Hadas

This patch refactors the IB core umem code and vendor drivers to use a linear (chained) SG table instead of a chunk list. With this change the relevant code becomes clearer: there is no need for nested loops to build and use the umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 14 February 2014, 1 commit
-
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 23 January 2014, 1 commit
-
-
Committed by Paul Bolle

Building mem.o for 32-bit x86 triggers a GCC warning:

    drivers/infiniband/hw/cxgb4/mem.c: In function '_c4iw_write_mem_dma_aligned':
    drivers/infiniband/hw/cxgb4/mem.c:79:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Silence that warning by casting "&wr_wait" to unsigned long before casting it to __be64. That's what _c4iw_write_mem_inline() already does.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
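The cast idiom in miniature (function name illustrative): on 32-bit x86 a pointer is 32 bits wide, so a direct cast to a 64-bit integer type warns; widening through unsigned long first is clean on both 32- and 64-bit builds.

#include <linux/types.h>

static u64 sketch_cookie(void *wr_wait)
{
	/* (u64)wr_wait would warn on 32-bit: the sizes differ */
	return (u64)(unsigned long)wr_wait;
}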
-
- 23 December 2013, 3 commits
-
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kumar Sanghvi

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 December 2013, 1 commit
-
-
Committed by Rashika

This patch marks the function _c4iw_write_mem_dma() as static because it is not used outside this file, which fixes the warning:

    drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 09 November 2013, 1 commit
-
-
Committed by Ben Hutchings

Physical addresses may be wider than virtual addresses (e.g. on i386 with PAE) and must not be formatted with %p. Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Roland Dreier <roland@purestorage.com>
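An illustrative fix pattern (not the exact lines changed): print a phys_addr_t either with %pa, which takes a pointer to the variable, or cast it to unsigned long long for %llx; never hand it to %p.

#include <linux/printk.h>
#include <linux/types.h>

static void sketch_show_paddr(phys_addr_t pa)
{
	pr_debug("paddr %pa\n", &pa);                        /* preferred */
	pr_debug("paddr 0x%llx\n", (unsigned long long)pa);  /* also fine */
}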
-
- 14 August 2013, 7 commits
-
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Lustre uses an advertised max MR size of ~0ULL to indicate that it should use a dma_mr. Hence, advertise the max MR size as ~0ULL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When polling, we do a GTS update if the accumulated cidx_inc equals the CQ depth / 16. However, if the CQ is large enough, CQ depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK, to avoid overflowing the field.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
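A minimal sketch of the widened condition, with an illustrative mask value (the real CIDXINC field width comes from the hardware register definitions, not this constant): update GTS when cidx_inc reaches either threshold.

#define SKETCH_CIDXINC_MASK 0x7ff  /* illustrative field width */

static int sketch_should_update_gts(unsigned int cidx_inc,
				    unsigned int cq_size)
{
	return cidx_inc == cq_size / 16 ||
	       cidx_inc == SKETCH_CIDXINC_MASK;
}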
-
Committed by Steve Wise

accept_cr() failed to set the ARP error handler on a reused skb. This results in a kernel crash if the ARP does indeed time out.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
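A minimal sketch of the wrap-aware count, with illustrative index names: the number of WRs retired by a signaled CQE is the modular distance from the consumer index to the signaled WR's index, inclusive.

static unsigned int sketch_wrs_completed(unsigned int cidx,
					 unsigned int signaled_idx,
					 unsigned int sq_size)
{
	if (signaled_idx >= cidx)
		return signaled_idx - cidx + 1;
	/* the SQ wrapped between cidx and the signaled WR */
	return sq_size - cidx + signaled_idx + 1;
}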
-
Committed by Steve Wise

This patch makes the following fixes in the QP flush logic:

- Correctly flush unsignaled WRs followed by a signaled WR.
- Support flushing a CQ bound to multiple QPs.
- Reset cidx_flush if an active queue starts getting HW CQEs again.
- Mark the WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too, so that post_send/post_recv will start returning the appropriate error synchronously.
- Eat unsignaled read-response CQEs. HW always inserts CQEs, so we must silently discard them if the read work request was unsignaled.
- Handle QP flushes with pending SW CQEs. The flush and out-of-order completion logic has a bug: if out-of-order completions are flushed but not yet polled by the consumer, and the QP is then flushed, we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush WRs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining WRs. This bug only caused a problem in the presence of unsignaled work requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Fixed sparse warning due to htonl/ntohl confusion. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
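A minimal sketch of the last fix in the list above, with illustrative struct members (not the actual t4_wq layout): start flushing at sq.cidx_flush rather than the queue head, so already-flushed WRs are not completed twice.

struct sketch_sq {
	unsigned int cidx_flush;  /* next SW SQ index not yet flushed */
	unsigned int pidx;        /* producer index */
	unsigned int size;
};

static int sketch_flush_sq(struct sketch_sq *sq)
{
	int flushed = 0;

	while (sq->cidx_flush != sq->pidx) {
		/* insert a FLUSHED completion for entry sq->cidx_flush */
		if (++sq->cidx_flush == sq->size)
			sq->cidx_flush = 0;
		flushed++;
	}
	return flushed;
}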
-
Committed by Steve Wise

Move the QP to TERMINATE instead, to allow the peer to get the TERM message. This bug wasn't detectable until newer FW that kicks connections out of RDMA mode as soon as an error is detected; a side effect of that change is that the driver can move the QP out of RTS before the last AE, the one that caused the connection to be kicked out of RDMA mode, is processed. The fix is to always post async errors, even if the QP is out of RTS.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-