提交 · 0d0f738f6a11856a704dcd8fd3a008b200f17625 · openeuler / raspberrypi-kernel

19 2月, 2015 1 次提交

RDMA/cxgb4: Don't hang threads forever waiting on WR replies · 1fc8190d

由 Hariprasad S 提交于 12月 17, 2014

In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after
C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally
dead.  Further, if the device is marked fatally dead, then fail the WR
wait immediately.

Also change the timeout to 60 seconds.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1fc8190d

22 7月, 2014 1 次提交

iw_cxgb4: Don't limit TPTE count to 32KB · 91244bbd

由 Hariprasad Shenai 提交于 7月 21, 2014

Use the size advertised by FW
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91244bbd

16 7月, 2014 3 次提交

cxgb4/iw_cxgb4: work request logging feature · 7730b4c7

由 Hariprasad Shenai 提交于 7月 14, 2014

This commit enhances the iwarp driver to optionally keep a log of rdma
work request timining data for kernel mode QPs.  If iw_cxgb4 module option
c4iw_wr_log is set to non-zero, each work request is tracked and timing
data maintained in a rolling log that is 4096 entries deep by default.
Module option c4iw_wr_log_size_order allows specifing a log2 size to use
instead of the default order of 12 (4096 entries). Both module options
are read-only and must be passed in at module load time to set them. IE:

modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10

The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
Writing anything to this file will clear all the timing data.
Data tracked includes:

- The host time when the work request was posted, just before ringing
the doorbell.  The host time when the completion was polled by the
application.  This is also the time the log entry is created.  The delta
of these two times is the amount of time took processing the work request.

- The qid of the EQ used to post the work request.

- The work request opcode.

- The cqe wr_id field.  For sq completions requests this is the swsqe
index.  For recv completions this is the MSN of the ingress SEND.
This value can be used to match log entries from this log with firmware
flowc event entries.

- The sge timestamp value just before ringing the doorbell when
posting,  the sge timestamp value just after polling the completion,
and CQE.timestamp field from the completion itself.  With these three
timestamps we can track the latency from post to poll, and the amount
of time the completion resided in the CQ before being reaped by the
application.  With debug firmware, the sge timestamp is also logged by
firmware in its flowc history so that we can compute the latency from
posting the work request until the firmware sees it.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7730b4c7

cxgb4/iw_cxgb4: use firmware ord/ird resource limits · 4c2c5763

由 Hariprasad Shenai 提交于 7月 14, 2014

Advertise a larger max read queue depth for qps, and gather the resource limits
from fw and use them to avoid exhaustinq all the resources.

Design:

cxgb4:

Obtain the max_ordird_qp and max_ird_adapter device params from FW
at init time and pass them up to the ULDs when they attach.  If these
parameters are not available, due to older firmware, then hard-code
the values based on the known values for older firmware.
iw_cxgb4:

Fix the c4iw_query_device() to report these correct values based on
adapter parameters.  ibv_query_device() will always return:

max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
max_res_rd_atom = max_ird_adapter

Bump up the per qp max module option to 32, allowing it to be increased
by the user up to the device max of max_ordird_qp.  32 seems to be
sufficient to maximize throughput for streaming read benchmarks.

Fail connection setup if the negotiated IRD exhausts the available
adapter ird resources.  So the driver will track the amount of ird
resource in use and not send an RI_WR/INIT to FW that would reduce the
available ird resources below zero.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c2c5763

iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21

由 Hariprasad Shenai 提交于 7月 14, 2014

Updates iw_cxgb4 to determine the Ingress Padding Boundary from
cxgb4_lld_info, and take subsequent actions.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04e10e21

14 7月, 2014 1 次提交

RDMA/cxgb4: Call iwpm_init() only once · 46c1376d

由 Steve Wise 提交于 6月 20, 2014

We need to only register with the iwpm core once. Currently it is
being done for every adapter, which causes a failure for each adapter
but the first, making multiple adapters unusable.

Fixes: 9eccfe10 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service")
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

46c1376d

11 6月, 2014 2 次提交

iw_cxgb4: don't truncate the recv window size · b408ff28

由 Hariprasad Shenai 提交于 6月 06, 2014

Fixed a bug that shows up with recv window sizes that exceed the size of
the RCV_BUFSIZ field in opt0 (>= 1024K).  If the recv window exceeds
this, then we specify the max possible in opt0, add add the rest in via
a RX_DATA_ACK credits.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b408ff28

RDMA/cxgb4: Add support for iWARP Port Mapper user space service · 9eccfe10

由 Steve Wise 提交于 3月 26, 2014

Based on original work by Vipul Pandya.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

[ Fix htons -> ntohs to make sparse happy.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

9eccfe10

29 4月, 2014 1 次提交

RDMA/cxgb4: Fix endpoint mutex deadlocks · cc18b939

由 Steve Wise 提交于 4月 24, 2014

In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex.  Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

cc18b939

12 4月, 2014 1 次提交

RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices · fa658a98

由 Steve Wise 提交于 4月 09, 2014

Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

[ Fix cast from u64* to integer.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

fa658a98

21 3月, 2014 2 次提交

RDMA/cxgb4: Save the correct map length for fast_reg_page_lists · eda6d1d1

由 Steve Wise 提交于 3月 19, 2014

We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it. This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().

I found this with dma mapping debugging enabled in the kernel. The
fix is to save the length in the c4iw_fr_page_list struct.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

eda6d1d1

RDMA/cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes · ba32de9d

由 Steve Wise 提交于 3月 19, 2014

Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ba32de9d

15 3月, 2014 1 次提交

cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389

由 Steve Wise 提交于 3月 14, 2014

The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler. This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues. Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count. This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps. This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs. If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP. Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED. As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events. Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list. This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example). QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO. Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain. This prevents resuming to
quickly and overflowing the FIFO. Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops. As the load lightens though, we
resume to normal DB writes directly by user applications.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05eb2389

14 8月, 2013 2 次提交

RDMA/cxgb4: Fix QP flush logic · 1cf24dce

由 Steve Wise 提交于 8月 06, 2013

This patch makes following fixes in QP flush logic:

- correctly flushes unsignaled WRs followed by a signaled WR
- supports for flushing a CQ bound to multiple QPs
- resets cidx_flush if a active queue starts getting HW CQEs again
- marks WQ in error when we leave RTS. This was only being done for
  user queues, but we need it for kernel queues too so that
  post_send/post_recv will start returning the appropriate error
  synchronously
- eats unsignaled read resp CQEs. HW always inserts CQEs so we must
  silently discard them if the read work request was unsignaled.
- handles QP flushes with pending SW CQEs. The flush and out of order
  completion logic has a bug where if out of order completions are
  flushed but not yet polled by the consumer and the qp is then
  flushed then we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush wrs that have not already been
  flushed.  Since we already track where in the SQ we've flushed via
  sq.cidx_flush, just start at that point and flush any remaining.
  This bug only caused a problem in the presence of unsignaled work
  requests.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NVipul Pandya <vipul@chelsio.com>

[ Fixed sparse warning due to htonl/ntohl confusion.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1cf24dce

RDMA/cxgb4: Add support for active and passive open connection with IPv6 address · 830662f6

由 Vipul Pandya 提交于 7月 04, 2013

Add new cpl messages, cpl_act_open_req6 and cpl_t5_act_open_req6, for
initiating active open connections.

Use LLD api cxgb4_create_server and cxgb4_create_server6 for
initiating passive open connections. Similarly use cxgb4_remove_server
to remove the passive open connections in place of listen_stop.

Add support for iWARP over VLAN device and enable IPv6 support on VLAN device.

Make use of import_ep in c4iw_reconnect.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>

[ Fix build when IPv6 is disabled and make sure iw_cxgb4 is not built-in
  when ipv6 is a module.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

830662f6

14 3月, 2013 4 次提交

RDMA/cxgb4: Bump tcam_full stat and WR reply timeout · 3b174d94

由 Vipul Pandya 提交于 3月 14, 2013

Always bump the tcam_full stat. Also, bump wr reply timeout to 30 seconds.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b174d94

RDMA/cxgb4: Use DSGLs for fastreg and adapter memory writes for T5. · 42b6a949

由 Vipul Pandya 提交于 3月 14, 2013

It enables direct DMA by HW to memory region PBL arrays and fast register PBL
arrays from host memory, vs the T4 way of passing these arrays in the WR itself.
The result is lower latency for memory registration, and larger PBL array
support for fast register operations.

This patch also updates ULP_TX_MEM_WRITE command fields for T5. Ordering bit of
ULP_TX_MEM_WRITE is at bit position 22 in T5 and at 23 in T4.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42b6a949

RDMA/cxgb4: Add module_params to enable DB FC & Coalescing on T5 · 80ccdd60

由 Vipul Pandya 提交于 3月 14, 2013

Both DB Flow-Control and DB Coalescing are disabled by default on T5
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80ccdd60

RDMA/cxgb4: Add Support for Chelsio T5 adapter · f079af7a

由 Vipul Pandya 提交于 3月 14, 2013

Adds support for Chelsio T5 adapter.
Enables T5's Write Combining feature.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f079af7a

28 2月, 2013 1 次提交

IB/cxgb4: convert to idr_alloc() · e8d4dd60

由 Tejun Heo 提交于 2月 27, 2013

Convert to the much saner new idr interface.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e8d4dd60

22 2月, 2013 1 次提交

IB/core: Add "type 2" memory windows support · 7083e42e

由 Shani Michaeli 提交于 2月 06, 2013

This patch enhances the IB core support for Memory Windows (MWs).

MWs allow an application to have better/flexible control over remote
access to memory.

Two types of MWs are supported, with the second type having two flavors:

    Type 1  - associated with PD only
    Type 2A - associated with QPN only
    Type 2B - associated with PD and QPN

Applications can allocate a MW once, and then repeatedly bind the MW
to different ranges in MRs that are associated to the same PD. Type 1
windows are bound through a verb, while type 2 windows are bound by
posting a work request.

The 32-bit memory key is composed of a 24-bit index and an 8-bit
key. The key is changed with each bind, thus allowing more control
over the peer's use of the memory key.

The changes introduced are the following:

* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a struct that contains the common part of the bind verb struct
  ibv_mw_bind and the bind work request into a single struct.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.

Consumer interface details:

* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
  IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
  for these features.

  Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
  IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B
  memory windows. It can set neither to indicate it doesn't support
  type 2 windows at all.

* modify existing provides and consumers code to the new param of
  ib_alloc_mw and the ib_mw_bind_info structure
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NShani Michaeli <shanim@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7083e42e

15 2月, 2013 2 次提交

RDMA/cxgb4: Fix endpoint timeout race condition · 1ec779cc

由 Vipul Pandya 提交于 1月 07, 2013

The endpoint timeout logic had a race that could cause an endpoint
object to be freed while it was still on the timedout list.  This
can happen if the timer is stopped after it had fired, but before
the timedout thread processed the endpoint timeout.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1ec779cc

RDMA/cxgb4: Keep QP referenced until TID released · 325abead

由 Vipul Pandya 提交于 1月 07, 2013

The driver is currently releasing the last ref on the QP too early.
This can cause bus errors due to HW still fetching WRs from the HW
queue.  The fix is to keep a qp ref until we release the HW TID.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

325abead

20 12月, 2012 2 次提交

RDMA/cxgb4: Fix bug for active and passive LE hash collision path · 793dad94

由 Vipul Pandya 提交于 12月 10, 2012

Retries active opens for INUSE errors.

Logs any active ofld_connect_wr error replies.

Sends ofld_connect_wr on same ctrlq. It needs to go  on the same control txq as
regular CPL active/passive messages.

Retries on active open replies with EADDRINUSE.

Uses active open fw wr only if active filter region is set.

Adds stat for ofld_connect_wr failures.

This patch also adds debugfs file to show endpoints.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

793dad94

RDMA/cxgb4: Fix LE hash collision bug for active open connection · 5be78ee9

由 Vipul Pandya 提交于 12月 10, 2012

It enables establishing active open connection using fw_ofld_connection work
request when cpl_act_open_rpl says TCAM full error which may be because
of LE hash collision. Current support is only for IPv4 active open connections.

Sets ntuple bits in active open requests. For T4 firmware greater than 1.4.10.0
ntuple bits are required to be set.

Adds nocong and enable_ecn module parameter options.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>

[ Move all FW return values to t4fw_api.h.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5be78ee9

19 5月, 2012 6 次提交

RDMA/cxgb4: Add query_qp support · 67bbc055