提交 · b7b37ee0e137c8384c6cb3a37c4621649d5acdf6 · openeuler / Kernel

02 3月, 2017 1 次提交

sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h> · 589ee628

由 Ingo Molnar 提交于 2月 04, 2017

Update code that relied on sched.h including various MM types for them.

This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

589ee628

14 1月, 2017 1 次提交

locking/atomic, kref: Add kref_read() · 2c935bc5

由 Peter Zijlstra 提交于 11月 14, 2016

Since we need to change the implementation, stop exposing internals.

Provide kref_read() to read the current reference count; typically
used for debug messages.

Kills two anti-patterns:

	atomic_read(&kref->refcount)
	kref->refcount.counter
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

2c935bc5

11 1月, 2017 2 次提交

iw_cxgb4: free EQ queue memory on last deref · c12a67fe

由 Steve Wise 提交于 12月 22, 2016

Commit ad61a4c7 ("iw_cxgb4: don't block in destroy_qp awaiting
the last deref") introduced a bug where the RDMA QP EQ queue memory
(and QIDs) are possibly freed before the underlying connection has been
fully shutdown.  The result being a possible DMA read issued by HW after
the queue memory has been unmapped and freed.  This results in possible
WR corruption in the worst case, system bus errors if an IOMMU is in use,
and SGE "bad WR" errors reported in the very least.  The fix is to defer
unmap/free of queue memory and QID resources until the QP struct has
been fully dereferenced.  To do this, the c4iw_ucontext must also be kept
around until the last QP that references it is fully freed.  In addition,
since the last QP deref can happen in an IRQ disabled context, we need
a new workqueue thread to do the final unmap/free of the EQ queue memory.

Fixes: ad61a4c7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Cc: stable@vger.kernel.org
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c12a67fe

iw_cxgb4: refactor sq/rq drain logic · 4fe7c296

由 Steve Wise 提交于 12月 22, 2016

With the addition of the IB/Core drain API, iw_cxgb4 supported drain
by watching the CQs when the QP was out of RTS and signalling "drain
complete" when the last CQE is polled. This, however, doesn't fully
support the drain semantics. Namely, the drain logic is supposed to signal
"drain complete" only when the application has _processed_ the last CQE,
not just removed them from the CQ. Thus a small timing hole exists that
can cause touch after free type bugs in applications using the drain API
(nvmf, iSER, for example). So iw_cxgb4 needs a better solution.

The iWARP Verbs spec mandates that "_at some point_ after the QP is
moved to ERROR", the iWARP driver MUST synchronously fail post_send and
post_recv calls. iw_cxgb4 was currently not allowing any posts once the
QP is in ERROR. This was in part due to the fact that the HW queues for
the QP in ERROR state are disabled at this point, so there wasn't much
else to do but fail the post operation synchronously. This restriction
is what drove the first drain implementation in iw_cxgb4 that has the
above mentioned flaw.

This patch changes iw_cxgb4 to allow post_send and post_recv WRs after
the QP is moved to ERROR state for kernel mode users, thus still adhering
to the Verbs spec for user mode users, but allowing flush WRs for kernel
users. Since the HW queues are disabled, we just synthesize a CQE for
this post, queue it to the SW CQ, and then call the CQ event handler.
This enables proper drain operations for the various storage applications.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fe7c296

17 11月, 2016 1 次提交

iw_cxgb4: invalidate the mr when posting a read_w_inv wr · 5c6b2aaf

由 Steve Wise 提交于 11月 03, 2016

Also, rearrange things a bit to have a common c4iw_invalidate_mr()
function used everywhere that we need to invalidate.

Fixes: 49b53a93 ("iw_cxgb4: add fast-path for small REG_MR operations")
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5c6b2aaf

08 10月, 2016 1 次提交

IB/cxgb4: Move user vendor structures · e44ee2fd

由 Leon Romanovsky 提交于 9月 22, 2016

This patch moves cxgb4 vendor's specific structures to
common UAPI folder which will be visible to all consumers.

These structures are used by user-space library driver
(libcxgb4) and currently manually copied to that library.

This move will allow cross-compile against these files and
simplify introduction of vendor specific data.
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e44ee2fd

16 9月, 2016 1 次提交

libcxgb,iw_cxgb4,cxgbit: add cxgb_compute_wscale() · cc516700

由 Varun Prakash 提交于 9月 13, 2016

Add cxgb_compute_wscale() in libcxgb_cm.h to remove
it's duplicate definitions from cxgb4/cm.c and
cxgbit/cxgbit_cm.c.
Signed-off-by: NVarun Prakash <varun@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc516700

04 9月, 2016 1 次提交

iw_cxgb4: block module unload until all ep resources are released · 37eb816c

由 Steve Wise 提交于 9月 01, 2016

Otherwise an endpoint can be still closing down causing a touch
after free crash.  Also WARN_ON if ulps have failed to destroy
various resources during device removal.

Fixes: ad61a4c7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref")
Reviewed-by: NSagi Grimberg <sagi@grimbrg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NSagi Grimberg <sagi@grimberg.me>

37eb816c

03 8月, 2016 1 次提交

iw_cxgb4: don't block in destroy_qp awaiting the last deref · ad61a4c7

由 Steve Wise 提交于 7月 29, 2016

Blocking in c4iw_destroy_qp() causes a deadlock when apps destroy a qp
or disconnect a cm_id from their cm event handler function.  There is
no need to block here anyway, so just replace the refcnt atomic with a
kref object and free the memory on the last put.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ad61a4c7

23 6月, 2016 3 次提交

RDMA/iw_cxgb4: Low resource fixes for Completion queue · dd6b0241

由 Hariprasad S 提交于 6月 10, 2016

Pre-allocate buffers to deallocate completion queue, so that completion
queue is deallocated during RDMA termination when system is running
out of memory.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

dd6b0241

RDMA/iw_cxgb4: Low resource fixes for Memory registration · 0f8ab0b6

由 Hariprasad S 提交于 6月 10, 2016

Pre-allocate buffers for deregistering memory region and memory window
during RDMA connection close, when system is running out of memory.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0f8ab0b6

RDMA/iw_cxgb4: Low resource fixes for connection manager · 4a740838

由 Hariprasad S 提交于 6月 10, 2016

Pre-allocate buffers for sending various control messages to close
connection, abort connection, etc so that we gracefully handle
connections when system is running out of memory.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4a740838

14 5月, 2016 4 次提交

RDMA/iw_cxgb4: stop_ep_timer() after MPA negotiation · e4b76a2a

由 Hariprasad S 提交于 5月 06, 2016

->Stop the ep timer after MPA negotiation so that the arp failures
during send_mpa_reply/reject will be handled by process_timeout() after
the ep timer expires.
->Added case MPA_REP_SENT in process_timeout().
->For MPA reject, c4iw_ep_disconnect tries to start an already started
timer, which leads to warning message "timer already started".
-> In case of mpa reject stop the timer and call send_mpa_reject().
-> Added new ep flag STOP_MPA_TIMER to tell fw4_ack() to stop the timer
only for send_mpa_reply(), which is set in c4iw_accept_cr().
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e4b76a2a

RDMA/iw_cxgb4: Add few history bits for ep · 9ca6f7cf

由 Hariprasad S 提交于 5月 06, 2016

- add EP_DISC_FAIL history bit
- add QP_REFED/DEREFED history bits
- Add functions to ref/deref the cm_id and add history bit for the same
- add CLOSE_CON_RPL history
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9ca6f7cf

IB/core: Enhance ib_map_mr_sg() · 9aa8b321

由 Bart Van Assche 提交于 5月 12, 2016

The SRP initiator allows to set max_sectors to a value that exceeds
the largest amount of data that can be mapped at once with an mlx4
HCA using fast registration and a page size of 4 KB. Hence modify
ib_map_mr_sg() such that it can map partial sg-elements. If an
sg-element has been mapped partially, let the caller know
which fraction has been mapped by adjusting *sg_offset.
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Tested-by: NLaurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9aa8b321

IB/core: Add passing an offset into the SG to ib_map_mr_sg · ff2ba993

由 Christoph Hellwig 提交于 5月 03, 2016

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ff2ba993

17 3月, 2016 1 次提交

iw_cxgb4: remove port mapper related code · 170003c8

由 Steve Wise 提交于 2月 26, 2016

Now that most of the port mapper code been moved to iwcm, we can remove
it from iw_cxgb4.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

170003c8

02 3月, 2016 1 次提交

IB/core: Add vendor's specific data to alloc mw · b2a239df

由 Matan Barak 提交于 2月 29, 2016

Passing udata to the vendor's driver in order to pass data from the
user-space driver to the kernel-space driver. This data will be
used in downstream patches.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b2a239df

01 3月, 2016 1 次提交

iw_cxgb4: add queue drain functions · 086dc6e3

由 Steve Wise 提交于 2月 17, 2016

Add completion objects, named sq_drained and rq_drained, to the c4iw_qp
struct.  The queue-specific completion object is signaled when the last
CQE is drained from the CQ for that queue.

Add c4iw_drain_sq() to block until qp->rq_drained is completed.

Add c4iw_drain_rq() to block until qp->sq_drained is completed.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

086dc6e3

24 12月, 2015 2 次提交

IB: remove in-kernel support for memory windows · feb7c1e3

由 Christoph Hellwig 提交于 12月 23, 2015

Remove the unused ib_allow_mw and ib_bind_mw functions, remove the
unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw
into the uverbs module.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

feb7c1e3

IB: remove support for phys MRs · b7d3e0a9

由 Christoph Hellwig 提交于 12月 23, 2015

We have stopped using phys MRs in the kernel a while ago, so let's
remove all the cruft used to implement them.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core]
Reviewed-By: Devesh Sharma<devesh.sharma@avagotech.com> [ocrdma]
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b7d3e0a9

29 10月, 2015 2 次提交

iw_cxgb4: Remove old FRWR API · d3cfd002

由 Sagi Grimberg 提交于 10月 13, 2015

No ULP uses it anymore, go ahead and remove it.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d3cfd002

iw_cxgb4: Support the new memory registration API · 8376b86d

由 Sagi Grimberg 提交于 10月 13, 2015

Support the new memory registration API by allocating a
private page list array in c4iw_mr and populate it when
c4iw_map_mr_sg is invoked. Also, support IB_WR_REG_MR
by duplicating build_fastreg just take the needed information
from different places:
- page_size, iova, length (ib_mr)
- page array (c4iw_mr)
- key, access flags (ib_reg_wr)

The IB_WR_FAST_REG_MR handlers will be removed later when
all the ULPs will be converted.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Acked-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8376b86d

31 8月, 2015 1 次提交

iw_cxgb4: Support ib_alloc_mr verb · a2164034

由 Sagi Grimberg 提交于 7月 30, 2015

Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a2164034

13 6月, 2015 1 次提交

IB/core: Change provider's API of create_cq to be extendible · bcf4c1ea

由 Matan Barak 提交于 6月 11, 2015

Add a new ib_cq_init_attr structure which contains the
previous cqe (minimum number of CQ entries) and comp_vector
(completion vector) in addition to a new flags field.
All vendors' create_cq callbacks are changed in order
to work with the new API.

This commit does not change any functionality.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bcf4c1ea

12 6月, 2015 1 次提交

iw_cxgb4: support for bar2 qid densities exceeding the page size · 74217d4c

由 Hariprasad S 提交于 6月 09, 2015

Handle this configuration:

        Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size

Use cxgb4_bar2_sge_qregs() to obtain the proper location within the
bar2 region for a given qid.

Rework the DB and GTS write functions to make use of this bar2 info.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

74217d4c

06 5月, 2015 1 次提交

iw_cxgb4: Remove negative advice dmesg warnings · 179d03bb

由 Hariprasad S 提交于 5月 05, 2015

Remove these log messages in favor of per-endpoint counters as well as
device-global counters that can be inspected via debugfs.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

179d03bb

19 2月, 2015 1 次提交

RDMA/cxgb4: Don't hang threads forever waiting on WR replies · 1fc8190d

由 Hariprasad S 提交于 12月 17, 2014

In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after
C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally
dead.  Further, if the device is marked fatally dead, then fail the WR
wait immediately.

Also change the timeout to 60 seconds.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1fc8190d

22 7月, 2014 1 次提交

iw_cxgb4: Don't limit TPTE count to 32KB · 91244bbd

由 Hariprasad Shenai 提交于 7月 21, 2014

Use the size advertised by FW
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91244bbd

16 7月, 2014 3 次提交

cxgb4/iw_cxgb4: work request logging feature · 7730b4c7

由 Hariprasad Shenai 提交于 7月 14, 2014

This commit enhances the iwarp driver to optionally keep a log of rdma
work request timining data for kernel mode QPs.  If iw_cxgb4 module option
c4iw_wr_log is set to non-zero, each work request is tracked and timing
data maintained in a rolling log that is 4096 entries deep by default.
Module option c4iw_wr_log_size_order allows specifing a log2 size to use
instead of the default order of 12 (4096 entries). Both module options
are read-only and must be passed in at module load time to set them. IE:

modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10

The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
Writing anything to this file will clear all the timing data.
Data tracked includes:

- The host time when the work request was posted, just before ringing
the doorbell.  The host time when the completion was polled by the
application.  This is also the time the log entry is created.  The delta
of these two times is the amount of time took processing the work request.

- The qid of the EQ used to post the work request.

- The work request opcode.

- The cqe wr_id field.  For sq completions requests this is the swsqe
index.  For recv completions this is the MSN of the ingress SEND.
This value can be used to match log entries from this log with firmware
flowc event entries.

- The sge timestamp value just before ringing the doorbell when
posting,  the sge timestamp value just after polling the completion,
and CQE.timestamp field from the completion itself.  With these three
timestamps we can track the latency from post to poll, and the amount
of time the completion resided in the CQ before being reaped by the
application.  With debug firmware, the sge timestamp is also logged by
firmware in its flowc history so that we can compute the latency from
posting the work request until the firmware sees it.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7730b4c7

cxgb4/iw_cxgb4: use firmware ord/ird resource limits · 4c2c5763

由 Hariprasad Shenai 提交于 7月 14, 2014

Advertise a larger max read queue depth for qps, and gather the resource limits
from fw and use them to avoid exhaustinq all the resources.

Design:

cxgb4:

Obtain the max_ordird_qp and max_ird_adapter device params from FW
at init time and pass them up to the ULDs when they attach.  If these
parameters are not available, due to older firmware, then hard-code
the values based on the known values for older firmware.
iw_cxgb4:

Fix the c4iw_query_device() to report these correct values based on
adapter parameters.  ibv_query_device() will always return:

max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
max_res_rd_atom = max_ird_adapter

Bump up the per qp max module option to 32, allowing it to be increased
by the user up to the device max of max_ordird_qp.  32 seems to be
sufficient to maximize throughput for streaming read benchmarks.

Fail connection setup if the negotiated IRD exhausts the available
adapter ird resources.  So the driver will track the amount of ird
resource in use and not send an RI_WR/INIT to FW that would reduce the
available ird resources below zero.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4c2c5763

iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21

由 Hariprasad Shenai 提交于 7月 14, 2014

Updates iw_cxgb4 to determine the Ingress Padding Boundary from
cxgb4_lld_info, and take subsequent actions.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04e10e21

14 7月, 2014 1 次提交

RDMA/cxgb4: Call iwpm_init() only once · 46c1376d

由 Steve Wise 提交于 6月 20, 2014

We need to only register with the iwpm core once. Currently it is
being done for every adapter, which causes a failure for each adapter
but the first, making multiple adapters unusable.

Fixes: 9eccfe10 ("RDMA/cxgb4: Add support for iWARP Port Mapper user space service")
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

46c1376d

11 6月, 2014 2 次提交

iw_cxgb4: don't truncate the recv window size · b408ff28

由 Hariprasad Shenai 提交于 6月 06, 2014

Fixed a bug that shows up with recv window sizes that exceed the size of
the RCV_BUFSIZ field in opt0 (>= 1024K).  If the recv window exceeds
this, then we specify the max possible in opt0, add add the rest in via
a RX_DATA_ACK credits.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b408ff28

RDMA/cxgb4: Add support for iWARP Port Mapper user space service · 9eccfe10

由 Steve Wise 提交于 3月 26, 2014

Based on original work by Vipul Pandya.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

[ Fix htons -> ntohs to make sparse happy.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

9eccfe10

29 4月, 2014 1 次提交

RDMA/cxgb4: Fix endpoint mutex deadlocks · cc18b939

由 Steve Wise 提交于 4月 24, 2014

In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
mutex held, they must be called with internal == 1.  rx_data() and
process_mpa_reply() are not doing this.  This causes a deadlock
because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
!internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
The design was intended to only do the disconnect for !internal calls.

Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
internal == 1, and then disconnect only after releasing the mutex.

Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
internal == 1 and set a new attr flag telling it to send a TERMINATE
message.  Previously this was implied by !internal.

Change process_mpa_reply() to return whether the caller should
disconnect after releasing the endpoint mutex.  Now rx_data() will do
the disconnect in the cases where process_mpa_reply() wants to
disconnect after the TERMINATE is sent.

Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
and to send a TERMINATE message if attrs->send_term is 1.

Change abort_connection() to not aquire the ep mutex for setting the
state, and make all calls to abort_connection() do so with the mutex
held.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

cc18b939

12 4月, 2014 1 次提交

RDMA/cxgb4: Use the BAR2/WC path for kernel QPs and T5 devices · fa658a98

由 Steve Wise 提交于 4月 09, 2014

Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

[ Fix cast from u64* to integer.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

fa658a98

21 3月, 2014 2 次提交

RDMA/cxgb4: Save the correct map length for fast_reg_page_lists · eda6d1d1

由 Steve Wise 提交于 3月 19, 2014

We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it. This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().

I found this with dma mapping debugging enabled in the kernel. The
fix is to save the length in the c4iw_fr_page_list struct.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

eda6d1d1

RDMA/cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes · ba32de9d

由 Steve Wise 提交于 3月 19, 2014

Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ba32de9d

15 3月, 2014 1 次提交

cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389

由 Steve Wise 提交于 3月 14, 2014

The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.

Design:

cxgb4:

Direct ULD callback from the DB FULL/DROP interrupt handler. This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues. Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count. This reduces the
load put on the DB fifo when reenabling.

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps. This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs. If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP. Thus we allow the new driver to work with an older libcxgb4.

When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED. As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events. Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.

When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list. This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example). QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO. Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain. This prevents resuming to
quickly and overflowing the FIFO. Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops. As the load lightens though, we
resume to normal DB writes directly by user applications.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

05eb2389

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功