- 12 April 2014, 4 commits
-
-
Committed by Steve Wise

There is a race when moving a QP from RTS->CLOSING where a SQ work request could be posted after the FW receives the RDMA_RI/FINI WR. The SQ work request will never get processed, and should be completed with FLUSHED status. Function c4iw_flush_sq(), however, was dropping the oldest SQ work request when in CLOSING or IDLE states, instead of completing the pending work request. If that oldest pending work request was actually complete and has a CQE in the CQ, then when that CQE is processed in poll_cq, we'll BUG_ON() due to the inconsistent SQ/CQ state. This is a very small timing hole and has only been hit once so far. The fix is two-fold: 1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED status regardless of the QP state. 2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue before moving the state out of RTS. This ensures that the state transition will not happen while another thread is in post_rc_send(), because set_state() and post_rc_send() both acquire the qp spinlock. Also, once we transition the state out of RTS, subsequent calls to post_rc_send() will fail because the "in error" bit is set. I don't think this fully closes the race where the FW can get a FINI followed by a SQ work request being posted (because they are posted to different EQs), but fix #1 will handle the issue by flushing the SQ work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
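A minimal sketch of fix #2, under illustrative names (struct sketch_qp, QP_ERR_BIT, and the function names are not the actual iw_cxgb4 definitions): the "in error" bit is set under the same spinlock the post path takes, before the state leaves RTS, so a concurrent post either completes before the transition or fails afterwards.

#include <linux/spinlock.h>
#include <linux/bitops.h>
#include <linux/errno.h>

enum sketch_state { SKETCH_RTS, SKETCH_CLOSING };

struct sketch_qp {
	spinlock_t lock;
	enum sketch_state state;
	unsigned long flags;
};

#define QP_ERR_BIT 0

static void sketch_set_state_closing(struct sketch_qp *qp)
{
	spin_lock_irq(&qp->lock);
	set_bit(QP_ERR_BIT, &qp->flags);  /* fail new posts first... */
	qp->state = SKETCH_CLOSING;       /* ...then leave RTS */
	spin_unlock_irq(&qp->lock);
}

static int sketch_post_rc_send(struct sketch_qp *qp)
{
	int ret = 0;

	spin_lock_irq(&qp->lock);
	if (test_bit(QP_ERR_BIT, &qp->flags))
		ret = -EINVAL;            /* "in error": reject the WR */
	/* else: build the WR and ring the SQ doorbell here */
	spin_unlock_irq(&qp->lock);
	return ret;
}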
-
Committed by Steve Wise

Some HW platforms can reorder read operations, so we must rmb() after we see a valid gen bit in a CQE but before we read any other fields from the CQE.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
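A minimal sketch of the ordering rule, with an illustrative CQE layout (the real t4_cqe fields differ): check the gen bit, rmb(), and only then read the rest of the CQE.

#include <asm/barrier.h>

struct sketch_cqe {
	unsigned int header;   /* contains the gen bit */
	unsigned int status;   /* must not be read before rmb() */
};

#define SKETCH_GENBIT(c) ((c)->header & 1)

static int sketch_poll_one(struct sketch_cqe *cqe, unsigned int gen)
{
	if (SKETCH_GENBIT(cqe) != gen)
		return -1;   /* CQE not valid yet; nothing to poll */

	rmb();               /* order the gen-bit read before the field reads */

	return cqe->status;  /* now safe to read the other fields */
}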
-
Committed by Steve Wise

1) Timed-out endpoint processing can be starved. If there are continual CPL messages flowing into the driver, the endpoint timeout processing can be starved. This condition exposed the other bugs below. Solution: in process_work(), call process_timedout_eps() after each CPL is processed.

2) Connection events can be processed even though the endpoint is on the timeout list. If the endpoint is scheduled for timeout processing, then we must ignore MPA Start Requests and Replies. Solution: change stop_ep_timer() to return 1 if the ep has already been queued for timeout processing. All the callers of stop_ep_timer() need to check this and act accordingly. There are just a few cases where the caller needs to do something different if stop_ep_timer() returns 1: in process_mpa_reply(), ignore the reply and let process_timeout() abort the connection; in process_mpa_request(), ignore the request and let process_timeout() abort the connection. It is OK for callers of stop_ep_timer() to abort the connection, since that will leave the state in ABORTING or DEAD, and process_timeout() now ignores timeouts when the ep is in these states.

3) Double insertion on the timeout list. Since the endpoint timers are used for connection setup and teardown, we need to guard against the possibility that an endpoint is already on the timeout list. This is a rare condition, only seen under heavy load and in the presence of the above two bugs. Solution: in ep_timeout(), don't queue the endpoint if it is already on the queue.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
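A minimal sketch of the new stop_ep_timer() contract described in item 2, assuming an illustrative endpoint struct (the real c4iw_ep differs): return 1 when the ep is already queued for timeout work, so MPA processing can bail out and let process_timeout() abort the connection.

#include <linux/timer.h>

struct sketch_ep {
	struct timer_list timer;
	int timedout;  /* set once queued for timeout processing */
};

static int sketch_stop_ep_timer(struct sketch_ep *ep)
{
	if (ep->timedout)
		return 1;  /* already on the timeout list; caller must bail */
	del_timer_sync(&ep->timer);
	return 0;
}

static void sketch_process_mpa_reply(struct sketch_ep *ep)
{
	if (sketch_stop_ep_timer(ep))
		return;  /* ignore the reply; process_timeout() aborts */
	/* ... normal MPA reply handling ... */
}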
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fix cast from u64* to integer. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 02 April 2014, 4 commits
-
-
Committed by Steve Wise

Current hardware doesn't correctly support DSGL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

To avoid racing with other threads doing close/flush/whatever, rx_data() should hold the endpoint mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

There is a race between ULP threads doing an accept/reject, and the ingress processing thread handling close/abort for the same connection. The accept/reject path needs to hold the lock to serialize these paths.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
[ Fold in locking fix found by Dan Carpenter <dan.carpenter@oracle.com>. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 29 March 2014, 1 commit
-
-
Committed by Yann Droneaud

If kmalloc() fails in c4iw_alloc_ucontext(), the function returns without setting an error code in the ret variable: it will return 0 to the caller. This patch sets ret to -ENOMEM in that case.

Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@chelsio.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
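A minimal sketch of the bug class and the fix, with an illustrative function body (not the real c4iw_alloc_ucontext()): ret starts at 0, so the allocation-failure path must assign -ENOMEM or the caller sees success.

#include <linux/slab.h>
#include <linux/errno.h>

static int sketch_alloc_ucontext(void **out)
{
	int ret = 0;
	void *ctx = kmalloc(64, GFP_KERNEL);

	if (!ctx) {
		ret = -ENOMEM;  /* the assignment this patch adds */
		goto err;
	}
	*out = ctx;
	return 0;
err:
	return ret;
}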
-
- 25 March 2014, 5 commits
-
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When processing an MPA Start Request, if the listening endpoint is DEAD, then abort the connection. If the IWCM returns an error, then we must abort the connection and release resources. Also, abort_connection() should not post a CLOSE event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

These are generated by HW in some error cases and need to be silently discarded.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must free the request skb and the saved skb with the TCP header.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
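A minimal sketch of the error path, with illustrative names (the actual send call is represented by a stand-in return value rather than invented): on failure the caller still owns both skbs and must free them to avoid the leak.

#include <linux/skbuff.h>
#include <linux/errno.h>

static int sketch_send_req(struct sk_buff *req_skb, struct sk_buff *tcp_skb)
{
	int ret;

	ret = -EIO;  /* stand-in for a failed cxgb4_ofld_send() call */
	if (ret < 0) {
		kfree_skb(req_skb);  /* the FW request skb */
		kfree_skb(tcp_skb);  /* the saved skb with the TCP header */
		return ret;
	}
	return 0;
}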
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 21 March 2014, 9 commits
-
-
Committed by Steve Wise

We cannot save the mapped length using the RDMA max_page_list_len field of the ib_fast_reg_page_list struct, because the core code uses it. This results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl(). I found this with DMA mapping debugging enabled in the kernel. The fix is to save the length in the c4iw_fr_page_list struct.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Based on original work from Jay Hernandez <jay@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Always release the neigh entry in rx_pkt(). Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

find_route() must treat loopback as a valid egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Dan Carpenter

There is a four-byte hole at the end of the "uresp" struct, after the ->qid_mask member.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Dan Carpenter

These sizes should be unsigned so that we don't allow negative values and have underflow bugs. These can come from user space, so there may be security implications, but I have not tested this.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
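An illustrative userspace demonstration of the underflow class (not the actual cxgb4 structures): a negative signed length converted to an unsigned type wraps to a huge value, so a later bounds check on the signed variable passes while the copy length overruns.

#include <stdio.h>
#include <stddef.h>

int main(void)
{
	int signed_len = -1;            /* user-controlled, signed */
	size_t n = (size_t)signed_len;  /* wraps to SIZE_MAX */

	printf("signed -1 becomes %zu\n", n);

	/* with an unsigned field, the same bits fail a sane bound check */
	unsigned int len = (unsigned int)-1;
	if (len > 4096)
		printf("rejected oversized length %u\n", len);
	return 0;
}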
-
- 15 March 2014, 2 commits
-
-
Committed by Steve Wise

The current logic suffers from a slow response time to disable user DB usage, and also fails to avoid DB FIFO drops under heavy load. This commit fixes these deficiencies and makes the avoidance logic more optimal. This is done by more efficiently notifying the ULDs of potential DB problems, and by implementing a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts the most load on the DB FIFO.

Design:

cxgb4: Direct ULD callback from the DB FULL/DROP interrupt handler. This allows the ULD to stop doing user DB writes as quickly as possible. While user DB usage is disabled, the LLD will accumulate DB write events for its queues. Then, once DB usage is re-enabled, a single DB write is done for each queue with its accumulated write count. This reduces the load put on the DB FIFO when re-enabling.

iw_cxgb4: Instead of marking each QP to indicate that DB writes are disabled, we create a device-global status page that each user process maps. This allows iw_cxgb4 to set just this single bit to disable all DB writes for all user QPs, versus traversing the idr of all the active QPs. If libcxgb4 doesn't support this, then we fall back to the old approach of marking each QP. Thus the new driver still works with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes via the status page and transition the DB state to STOPPED. As user processes see that DB writes are disabled, they call into iw_cxgb4 to submit their DB write events. Since the DB state is STOPPED, the QP trying to write gets enqueued on a new DB "flow control" list. As subsequent DB writes are submitted for this flow-controlled QP, the number of writes is accumulated for each QP on the flow control list. So all the user QPs that are actively ringing the DB get put on this list, and the number of writes they request is accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq context, we change the DB state to FLOW_CONTROL and begin resuming all the QPs that are on the flow control list. This logic runs until the flow control list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall, for example). QPs are removed from this list, and their accumulated DB write counts are written to the DB FIFO. Sets of QPs, called chunks in the code, are removed at one time. The chunk size is 64, so 64 QPs are resumed at a time, and before the next chunk is resumed, the logic waits (blocks) for the DB FIFO to drain. This prevents resuming too quickly and overflowing the FIFO. Once the flow control list is empty, the DB state transitions back to NORMAL and user QPs are again allowed to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough, all the DB writes get submitted by the kernel using this flow-controlled approach to avoid DB drops. As the load lightens, we resume normal DB writes directly from user applications.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
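A minimal sketch of the FLOW_CONTROL resume loop described above, under illustrative names (DB_FC_RESUME_SIZE matches the stated chunk size of 64; the list and ring_db() helper are not iw_cxgb4's actual symbols): resume 64 QPs per chunk, writing each QP's accumulated doorbell count, and wait for the DB FIFO to drain between chunks.

#include <linux/list.h>
#include <linux/spinlock.h>

#define DB_FC_RESUME_SIZE 64  /* chunk size stated in the commit */

struct sketch_fc_qp {
	struct list_head entry;
	unsigned int db_inc;  /* DB writes accumulated while STOPPED */
};

static void sketch_ring_db(struct sketch_fc_qp *qp)
{
	/* write qp->db_inc to the DB FIFO in a single doorbell */
	qp->db_inc = 0;
}

static void sketch_resume_queues(struct list_head *fc_list, spinlock_t *lock)
{
	struct sketch_fc_qp *qp;
	int n;

	spin_lock_irq(lock);
	while (!list_empty(fc_list)) {
		for (n = 0; n < DB_FC_RESUME_SIZE && !list_empty(fc_list); n++) {
			qp = list_first_entry(fc_list, struct sketch_fc_qp,
					      entry);
			list_del_init(&qp->entry);
			sketch_ring_db(qp);
		}
		spin_unlock_irq(lock);
		/* block here until the LLD reports the DB FIFO drained */
		spin_lock_irq(lock);
	}
	spin_unlock_irq(lock);
}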
-
Committed by Steve Wise

Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 March 2014, 1 commit
-
-
Committed by Yishai Hadas

This patch refactors the IB core umem code and vendor drivers to use a linear (chained) SG table instead of a chunk list. With this change the relevant code becomes clearer: there is no need for nested loops to build and use the umem.

Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 14 February 2014, 1 commit
-
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 23 January 2014, 1 commit
-
-
Committed by Paul Bolle

Building mem.o for 32-bit x86 triggers a GCC warning:

    drivers/infiniband/hw/cxgb4/mem.c: In function '_c4iw_write_mem_dma_aligned':
    drivers/infiniband/hw/cxgb4/mem.c:79:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Silence that warning by casting "&wr_wait" to unsigned long before casting it to __be64. That's what _c4iw_write_mem_inline() already does.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
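The cast idiom in miniature (function name illustrative): on 32-bit x86 a pointer is 32 bits wide, so a direct cast to a 64-bit integer type warns; widening through unsigned long first is clean on both 32- and 64-bit builds.

#include <linux/types.h>

static u64 sketch_cookie(void *wr_wait)
{
	/* (u64)wr_wait would warn on 32-bit: the sizes differ */
	return (u64)(unsigned long)wr_wait;
}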
-
- 23 December 2013, 3 commits
-
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kumar Sanghvi

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Kumar Sanghvi

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 December 2013, 1 commit
-
-
Committed by Rashika

This patch marks the function _c4iw_write_mem_dma() as static because it is not used outside this file, which fixes the warning:

    drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
- 09 November 2013, 1 commit
-
-
Committed by Ben Hutchings

Physical addresses may be wider than virtual addresses (e.g. on i386 with PAE) and must not be formatted with %p. Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Roland Dreier <roland@purestorage.com>
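An illustrative fix pattern (not the exact lines changed): print a phys_addr_t either with %pa, which takes a pointer to the variable, or cast it to unsigned long long for %llx; never hand it to %p.

#include <linux/printk.h>
#include <linux/types.h>

static void sketch_show_paddr(phys_addr_t pa)
{
	pr_debug("paddr %pa\n", &pa);                        /* preferred */
	pr_debug("paddr 0x%llx\n", (unsigned long long)pa);  /* also fine */
}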
-
- 14 August 2013, 7 commits
-
-
Committed by Steve Wise

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

Lustre uses an advertised max MR size of ~0ULL to indicate that it should use a dma_mr. Hence, advertise the max MR size as ~0ULL.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When polling, we do a GTS update if the accumulated cidx_inc equals the CQ depth / 16. However, if the CQ is large enough, CQ depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK, to avoid overflowing the field.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
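A minimal sketch of the widened condition, with an illustrative mask value (the real CIDXINC field width comes from the hardware register definitions, not this constant): update GTS when cidx_inc reaches either threshold.

#define SKETCH_CIDXINC_MASK 0x7ff  /* illustrative field width */

static int sketch_should_update_gts(unsigned int cidx_inc,
				    unsigned int cq_size)
{
	return cidx_inc == cq_size / 16 ||
	       cidx_inc == SKETCH_CIDXINC_MASK;
}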
-
Committed by Steve Wise

accept_cr() failed to set the ARP error handler on a reused skb. This results in a kernel crash if the ARP does indeed time out.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-
Committed by Steve Wise

When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
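A minimal sketch of the wrap-aware count, with illustrative index names: the number of WRs retired by a signaled CQE is the modular distance from the consumer index to the signaled WR's index, inclusive.

static unsigned int sketch_wrs_completed(unsigned int cidx,
					 unsigned int signaled_idx,
					 unsigned int sq_size)
{
	if (signaled_idx >= cidx)
		return signaled_idx - cidx + 1;
	/* the SQ wrapped between cidx and the signaled WR */
	return sq_size - cidx + signaled_idx + 1;
}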
-
Committed by Steve Wise

This patch makes the following fixes in the QP flush logic:

- Correctly flush unsignaled WRs followed by a signaled WR.
- Support flushing a CQ bound to multiple QPs.
- Reset cidx_flush if an active queue starts getting HW CQEs again.
- Mark the WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too, so that post_send/post_recv will start returning the appropriate error synchronously.
- Eat unsignaled read-response CQEs. HW always inserts CQEs, so we must silently discard them if the read work request was unsignaled.
- Handle QP flushes with pending SW CQEs. The flush and out-of-order completion logic has a bug: if out-of-order completions are flushed but not yet polled by the consumer, and the QP is then flushed, we end up inserting duplicate completions.
- c4iw_flush_sq() should only flush WRs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining WRs. This bug only caused a problem in the presence of unsignaled work requests.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
[ Fixed sparse warning due to htonl/ntohl confusion. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
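A minimal sketch of the last fix in the list above, with illustrative struct members (not the actual t4_wq layout): start flushing at sq.cidx_flush rather than the queue head, so already-flushed WRs are not completed twice.

struct sketch_sq {
	unsigned int cidx_flush;  /* next SW SQ index not yet flushed */
	unsigned int pidx;        /* producer index */
	unsigned int size;
};

static int sketch_flush_sq(struct sketch_sq *sq)
{
	int flushed = 0;

	while (sq->cidx_flush != sq->pidx) {
		/* insert a FLUSHED completion for entry sq->cidx_flush */
		if (++sq->cidx_flush == sq->size)
			sq->cidx_flush = 0;
		flushed++;
	}
	return flushed;
}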
-
Committed by Steve Wise

Move the QP to TERMINATE instead, to allow the peer to get the TERM message. This bug wasn't detectable until newer FW that kicks connections out of RDMA mode as soon as an error is detected; a side effect of that change is that the driver can move the QP out of RTS before the last AE, the one that caused the connection to be kicked out of RDMA mode, is processed. The fix is to always post async errors, even if the QP is out of RTS.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
-