- 22 Jul 2014, 1 commit
By Hariprasad Shenai
Advertise the actual max limits for things like QP depths, number of QPs, CQs, etc. Clean up the queue allocation for QPs and CQs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Jul 2014, 3 commits
By Hariprasad Shenai
This commit enhances the iWARP driver to optionally keep a log of RDMA work request timing data for kernel mode QPs. If the iw_cxgb4 module option c4iw_wr_log is set to non-zero, each work request is tracked and timing data maintained in a rolling log that is 4096 entries deep by default. The module option c4iw_wr_log_size_order allows specifying a log2 size to use instead of the default order of 12 (4096 entries). Both module options are read-only and must be passed in at module load time, e.g.:

modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10

The timing data is viewable via the iw_cxgb4 debugfs file "wr_log". Writing anything to this file will clear all the timing data. Data tracked includes:

- The host time when the work request was posted, just before ringing the doorbell, and the host time when the completion was polled by the application (which is also the time the log entry is created). The delta of these two times is the time spent processing the work request.
- The qid of the EQ used to post the work request.
- The work request opcode.
- The cqe wr_id field. For SQ completions this is the swsqe index. For recv completions this is the MSN of the ingress SEND. This value can be used to match log entries from this log with firmware flowc event entries.
- The SGE timestamp value just before ringing the doorbell when posting, the SGE timestamp value just after polling the completion, and the CQE.timestamp field from the completion itself. With these three timestamps we can track the latency from post to poll, and the amount of time the completion resided in the CQ before being reaped by the application. With debug firmware, the SGE timestamp is also logged by firmware in its flowc history, so that we can compute the latency from posting the work request until the firmware sees it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
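A minimal sketch of the rolling-log mechanics described above, assuming hypothetical names (wr_log_entry, wr_log_add); the real layout and helpers live in the iw_cxgb4 sources:

#include <linux/types.h>
#include <linux/time.h>
#include <linux/atomic.h>

/* Hypothetical log entry; the fields mirror the data the commit
 * message says is tracked for each kernel-mode work request. */
struct wr_log_entry {
	struct timespec post_host_time;	/* host time just before the doorbell */
	struct timespec poll_host_time;	/* host time when the CQE was polled */
	u64 post_sge_ts;		/* SGE timestamp at post */
	u64 poll_sge_ts;		/* SGE timestamp at poll */
	u64 cqe_sge_ts;			/* CQE.timestamp from the completion */
	u64 wr_id;			/* swsqe index or ingress SEND MSN */
	u32 qid;			/* EQ qid the WR was posted to */
	u8 opcode;			/* work request opcode */
};

/* Append one entry; the index wraps at the power-of-two log size set
 * by c4iw_wr_log_size_order (default order 12, i.e. 4096 entries). */
static void wr_log_add(struct wr_log_entry *log, atomic_t *idx,
		       u32 size_mask, const struct wr_log_entry *e)
{
	log[atomic_inc_return(idx) & size_mask] = *e;
}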
-
By Hariprasad Shenai
With ingress WRITE or READ RESPONSE errors, HW provides the offending stag from the packet. This patch adds logic to log the parsed TPTE in this case. cxgb4 now exports a function to read a TPTE entry from adapter memory.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
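A hedged sketch of how an error handler might use the new export; the exact signature of cxgb4_read_tpte() and the TPTE size in words are assumptions here:

#include <linux/netdevice.h>
#include <linux/printk.h>

static void demo_dump_err_tpte(struct net_device *ndev, u32 stag)
{
	__be32 tpte[8];		/* assumed TPTE size in 32-bit words */

	/* Pull the offending stag's TPTE from adapter memory (assumed
	 * signature) and dump it raw for post-mortem parsing. */
	if (cxgb4_read_tpte(ndev, stag, tpte) == 0)
		pr_err("stag %#x TPTE %*ph\n", stag,
		       (int)sizeof(tpte), tpte);
}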
-
By Hariprasad Shenai
Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jun 2014, 1 commit
By Hariprasad Shenai
Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs, i.e. not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, set up and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs.

Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max number of CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is set up per RX channel, similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs.

Based on original work by Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
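A small sketch of the lookup that design implies, with illustrative names (select_ciq, rdma_ciq_ids):

#include <linux/types.h>

/* Map the consumer-supplied comp vector onto the LLD's CIQ array.
 * num_comp_vectors equals the number of CIQs, so the vector is a
 * direct index; clamp defensively and return the IQ id to attach
 * the new CQ to. */
static u16 select_ciq(const u16 *rdma_ciq_ids, int nciq, int vector)
{
	if (vector < 0 || vector >= nciq)
		vector = 0;
	return rdma_ciq_ids[vector];
}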
-
- 12 Apr 2014, 3 commits
By Steve Wise
The max depth of a fastreg MR depends on whether the device supports DSGL or not. So compute it dynamically based on the device support and the module use_dsgl option.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
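The shape of that computation, as a hedged sketch; the depth constants are placeholders, not the driver's values:

#include <linux/types.h>

/* With DSGL the HW DMA-reads the page list from host memory, so a
 * deeper fastreg page list fits than when the PBL must be carried
 * inline in the WR itself. */
static int max_fastreg_depth(bool dsgl_supported, bool use_dsgl)
{
	const int depth_with_dsgl = 1024;	/* placeholder value */
	const int depth_inline = 128;		/* placeholder value */

	return (dsgl_supported && use_dsgl) ? depth_with_dsgl
					    : depth_inline;
}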
-
By Steve Wise
Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
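The fix in miniature, using an illustrative CQE layout rather than the driver's real one:

#include <linux/types.h>
#include <asm/byteorder.h>
#include <asm/barrier.h>

/* Illustrative CQE with just the fields the ordering issue involves;
 * HW writes the gen (valid) bit last. */
struct demo_cqe {
	__be64 wr_id;
	u8 gen;
};

/* Poll-side read: validate the gen bit first, then rmb() before
 * touching any other CQE field, so a platform that reorders loads
 * cannot hand back stale field contents for a just-validated CQE. */
static int demo_poll_cqe(const struct demo_cqe *cqe, u8 expected_gen,
			 u64 *wr_id)
{
	if (cqe->gen != expected_gen)
		return 0;		/* CQE not yet valid */
	rmb();				/* order gen check before field reads */
	*wr_id = be64_to_cpu(cqe->wr_id);
	return 1;
}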
-
By Steve Wise
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fix cast from u64* to integer. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 15 Mar 2014, 1 commit
By Steve Wise
The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and by implementing a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then once DB usage is reenabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB fifo when reenabling.

iw_cxgb4: Instead of marking each qp to indicate DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to set only this single bit to disable all DB writes for all user QPs vs traversing the idr of all the active QPs. If libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow-controlled QP, the number of writes is accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which happens in a workq context, we change the DB state to FLOW_CONTROL and begin resuming all the QPs that are on the flow control list. This logic runs until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64, so 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming too quickly and overflowing the FIFO. Once the flow control list is empty, the DB state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough, then all the DB writes get submitted by the kernel using this flow-controlled approach to avoid DB drops. As the load lightens, we resume normal DB writes directly by user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
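A hedged sketch of the state machine described above; all type and helper names are illustrative, and the real driver's bookkeeping is richer:

#include <linux/types.h>

enum demo_db_state { DB_NORMAL, DB_STOPPED, DB_FLOW_CONTROL };

struct demo_status_page { u32 db_off; };	/* mapped by every user proc */

struct demo_dev {
	enum demo_db_state db_state;
	struct demo_status_page *status_page;
};

/* Hypothetical helpers, defined elsewhere in this sketch's world. */
extern bool flow_control_list_empty(struct demo_dev *dev);
extern void resume_a_chunk(struct demo_dev *dev, int nqps);
extern void wait_for_db_fifo_to_drain(struct demo_dev *dev);

/* DB FULL upcall: one global status-page bit stops user DB writes
 * for all QPs at once, instead of walking the idr and marking each
 * active QP. */
static void demo_db_full(struct demo_dev *dev)
{
	dev->status_page->db_off = 1;
	dev->db_state = DB_STOPPED;
}

/* DB EMPTY upcall (workq context): resume flow-controlled QPs in
 * chunks of 64, blocking for the DB FIFO to drain between chunks so
 * the refill itself cannot overflow the FIFO. */
#define DEMO_RESUME_CHUNK 64

static void demo_db_empty(struct demo_dev *dev)
{
	dev->db_state = DB_FLOW_CONTROL;
	while (dev->db_state == DB_FLOW_CONTROL &&
	       !flow_control_list_empty(dev)) {
		resume_a_chunk(dev, DEMO_RESUME_CHUNK);
		wait_for_db_fifo_to_drain(dev);
	}
	if (flow_control_list_empty(dev)) {
		dev->status_page->db_off = 0;	/* direct user DB writes OK */
		dev->db_state = DB_NORMAL;
	}
}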
-
- 14 Aug 2013, 3 commits
By Steve Wise
Lustre uses an advertised max MR size of ~0ULL to indicate that it should use a dma_mr. Hence advertise the max MR size as ~0ULL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
By Steve Wise
When polling, we do a GTS update if the accumulated cidx_inc equals the CQ depth / 16. However, if the CQ is large enough, CQ depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK, to avoid overflowing the field.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
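The condition in outline; the mask value below is an assumption, not the adapter's actual field width:

#include <linux/types.h>

#define DEMO_CIDXINC_MASK 0x7ffU	/* assumed CIDXINC field mask */

/* A GTS update is due when the accumulated count reaches depth/16,
 * or when it is about to overflow the CIDXINC field itself. */
static inline int gts_update_due(u32 cidx_inc, u32 cq_depth)
{
	return cidx_inc == cq_depth / 16 ||
	       cidx_inc == DEMO_CIDXINC_MASK;
}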
-
By Steve Wise
This patch makes the following fixes in the QP flush logic:

- Correctly flush unsignaled WRs followed by a signaled WR.
- Support flushing a CQ bound to multiple QPs.
- Reset cidx_flush if an active queue starts getting HW CQEs again.
- Mark the WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously.
- Eat unsignaled read resp CQEs. HW always inserts CQEs, so we must silently discard them if the read work request was unsignaled.
- Handle QP flushes with pending SW CQEs. The flush and out-of-order completion logic has a bug where, if out-of-order completions are flushed but not yet polled by the consumer and the QP is then flushed, we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush WRs that have not already been flushed (see the sketch after this list). Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Fixed sparse warning due to htonl/ntohl confusion. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
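A hedged sketch of the last fix; the structure and helper names are illustrative:

#include <linux/types.h>

struct demo_sq {
	u16 size;		/* number of SQ slots */
	u16 pidx;		/* producer index */
	u16 cidx_flush;		/* how far flushing has already gone */
};

/* Hypothetical helper that inserts a flush completion for slot idx. */
extern void insert_flush_cqe(struct demo_sq *sq, u16 idx);

/* Flush only the WRs not already flushed: resume at cidx_flush
 * instead of the queue head, so unsignaled WRs flushed earlier are
 * not completed a second time. */
static int demo_flush_sq(struct demo_sq *sq)
{
	int flushed = 0;

	while (sq->cidx_flush != sq->pidx) {
		insert_flush_cqe(sq, sq->cidx_flush);
		sq->cidx_flush = (sq->cidx_flush + 1) % sq->size;
		flushed++;
	}
	return flushed;
}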
-
- 14 Mar 2013, 2 commits
By Vipul Pandya
This patch enables direct DMA by HW to memory region PBL arrays and fast register PBL arrays from host memory, vs the T4 way of passing these arrays in the WR itself. The result is lower latency for memory registration, and larger PBL array support for fast register operations. This patch also updates the ULP_TX_MEM_WRITE command fields for T5: the ordering bit of ULP_TX_MEM_WRITE is at bit position 22 in T5 and at 23 in T4.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
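The chip-revision difference in one helper, as a sketch; the real driver uses its register-header macros rather than raw shifts:

#include <linux/types.h>

/* ULP_TX_MEM_WRITE ordering bit: position 23 on T4, 22 on T5. */
static inline u32 ulp_memwrite_order_bit(bool is_t5)
{
	return is_t5 ? (1U << 22) : (1U << 23);
}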
-
By Vipul Pandya
Adds support for the Chelsio T5 adapter. Enables T5's Write Combining feature.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 May 2012, 1 commit
By Vipul Pandya
Add a module option, db_fc_threshold, which is the count of active QPs that triggers automatic DB flow control mode. Automatically transition to/from flow control mode when the active QP count crosses db_fc_threshold. Add more DB debugfs stats. On a DB DROP event from the LLD, recover all the iWARP queues.

Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
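The usual shape of such a knob, sketched; the default value and permissions here are assumptions:

#include <linux/module.h>

static int db_fc_threshold = 2000;		/* assumed default */
module_param(db_fc_threshold, int, 0644);	/* assumed permissions */
MODULE_PARM_DESC(db_fc_threshold,
		 "Active QP count that triggers automatic DB flow control mode");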
-
- 27 Apr 2011, 1 commit
By Nishanth Aravamudan
Commit fe3cc0d9 ("powerpc: Add pgprot_writecombine") in benh's tree exposes the pgprot_writecombine() API to drivers on powerpc. cxgb4 has an open-coded version of the same, so use the common API now that it's available.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Anton Blanchard <anton@samba.org>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
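What the common API buys, in a minimal mmap-path sketch; the function name is illustrative, while pgprot_writecombine() and io_remap_pfn_range() are the stock kernel calls:

#include <linux/mm.h>

/* Map pfn..pfn+len into userspace with write-combining, instead of
 * hand-rolling the pgprot bits per architecture. */
static int demo_mmap_wc(struct vm_area_struct *vma, unsigned long pfn,
			unsigned long len)
{
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
	return io_remap_pfn_range(vma, vma->vm_start, pfn, len,
				  vma->vm_page_prot);
}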
-
- 15 Mar 2011, 1 commit
By Steve Wise
This avoids the CIDX_INC overflow issue with T4A2 when running kernel RDMA applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 29 Sep 2010, 2 commits
By Steve Wise
- Remove dsgl support - doesn't work in T4.
- Wrap the immediate PBL as needed when building it in the WR.
- Adjust the max PBL depth allowed based on ulptx alignment requirements.
- Bump the slots per SQ to 5 to allow up to 128MB fast registers.
- Advertise fastreg support by default.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
By Steve Wise
T4 supports on-chip SQs to reduce latency. This patch adds support for this in iw_cxgb4:

- Manage ocqp memory like other adapter mem resources.
- Allocate user mode SQs from ocqp mem if available.
- Map ocqp mem to the user process using write combining.
- Map the PCIE_MA_SYNC reg to the user process.

Bump the uverbs ABI.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 08 Aug 2010, 1 commit
By Steve Wise
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 22 Jul 2010, 1 commit
By Steve Wise
T4 EQ entries are in multiples of 64 bytes. Currently the RDMA SQ and RQ use fixed-size entries composed of 4 EQ entries for the SQ and 2 EQ entries for the RQ. For optimal latency with small IO, we need to change this so the HW only needs to DMA the EQ entries actually used by a given work request.

Implementation:

- Add a wq_pidx counter to track where we are in the EQ. cidx/pidx are used for the SW SQ/RQ tracking and flow control.
- The variable part of work requests is the SGL. Add new functions to build the SGL and/or immediate data directly in the EQ memory, wrapping when needed.
- Adjust the min burst size for the EQ contexts to 64B.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
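The slot arithmetic this implies, sketched with illustrative names:

#include <linux/kernel.h>
#include <linux/types.h>

#define EQ_ENTRY_SIZE 64	/* T4 EQ entries are 64B each */

/* Number of 64B EQ slots a WR of wr_len bytes actually occupies,
 * instead of a fixed 4 (SQ) or 2 (RQ). */
static inline u16 demo_eq_slots(u32 wr_len)
{
	return DIV_ROUND_UP(wr_len, EQ_ENTRY_SIZE);
}

/* wq_pidx advances by the slot count and wraps at the EQ size. */
static inline u16 demo_advance_wq_pidx(u16 wq_pidx, u16 nslots, u16 eq_size)
{
	return (wq_pidx + nslots) % eq_size;
}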
-
- 07 Jul 2010, 1 commit
By FUJITA Tomonori
This replaces the PCI DMA state API (include/linux/pci-dma.h) with the DMA equivalents, since the PCI DMA state API will be obsolete. No functional change. For further information about the background: http://marc.info/?l=linux-netdev&m=127037540020276&w=2

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
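The mechanical shape of the conversion: the pci_* DMA-state helpers map one-for-one onto the generic dma_* ones. A before/after sketch with an illustrative struct:

#include <linux/dma-mapping.h>

struct demo_buf {
	void *addr;
	DEFINE_DMA_UNMAP_ADDR(mapping);	/* was DECLARE_PCI_UNMAP_ADDR(mapping) */
};

static void demo_save_mapping(struct demo_buf *buf, dma_addr_t dma)
{
	/* was pci_unmap_addr_set(buf, mapping, dma) */
	dma_unmap_addr_set(buf, mapping, dma);
}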
-
- 25 May 2010, 3 commits
By Steve Wise
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
By Steve Wise
- Wrap cq->cqidx_inc based on the CQ size.
- Optimize the t4_arm_cq logic.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
By Steve Wise
1) Save the timestamp flit in the CQ when we consume a CQE.
2) Always compare the saved flit with the previous entry's flit when reading the next CQE entry. If the flits don't compare, then we have overflowed.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
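The two steps in outline; the struct and field names are illustrative:

#include <linux/types.h>

struct demo_cq_sw {
	u64 saved_ts_flit;	/* timestamp flit saved at last consume */
};

/* Step 1: on consuming a CQE, remember its timestamp flit. */
static void demo_consume_cqe(struct demo_cq_sw *cq, u64 cqe_ts_flit)
{
	cq->saved_ts_flit = cqe_ts_flit;
}

/* Step 2: before trusting the next entry, compare the previous
 * entry's flit with the saved one; a mismatch means HW lapped SW,
 * i.e. the CQ has overflowed. */
static int demo_cq_overflowed(const struct demo_cq_sw *cq,
			      u64 prev_entry_ts_flit)
{
	return prev_entry_ts_flit != cq->saved_ts_flit;
}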
-
- 06 May 2010, 1 commit
By Roland Dreier
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-
- 22 Apr 2010, 1 commit
By Steve Wise
Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
-