- 22 Jun 2013, 1 commit
Submitted by Mike Marciniszyn
The current workqueue implementation has the following performance deficiencies on QDR HCAs:

- The CQ callbacks tend to run on the CPUs processing the receive queues.
- The single-threaded queue isn't optimal for multiple HCAs.

This patch adds a dedicated per-HCA bound thread to process CQ callbacks.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
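A minimal sketch of the idea: a per-device kthread, bound to one CPU, drains pending completion callbacks so they no longer run on the receive-queue CPUs. The names (my_hca, my_cq, my_cq_worker) are hypothetical, not the driver's actual code.

/*
 * Sketch of a per-HCA CPU-bound thread that runs CQ completion handlers.
 * Assumes hypothetical my_hca/my_cq structures.
 */
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/wait.h>
#include <linux/list.h>
#include <linux/err.h>

struct my_cq {
	struct list_head node;
	void (*comp_handler)(struct my_cq *cq);
};

struct my_hca {
	struct task_struct *cq_task;	/* dedicated CQ thread, bound to one CPU */
	spinlock_t cq_lock;		/* protects cq_pending */
	struct list_head cq_pending;	/* CQs with outstanding callbacks */
	wait_queue_head_t cq_wait;
};

static int my_cq_worker(void *arg)
{
	struct my_hca *hca = arg;

	while (!kthread_should_stop()) {
		struct my_cq *cq = NULL;

		spin_lock_irq(&hca->cq_lock);
		if (!list_empty(&hca->cq_pending)) {
			cq = list_first_entry(&hca->cq_pending, struct my_cq, node);
			list_del_init(&cq->node);
		}
		spin_unlock_irq(&hca->cq_lock);

		if (cq)
			cq->comp_handler(cq);	/* run the callback off the rx CPUs */
		else
			wait_event_interruptible(hca->cq_wait,
						 !list_empty(&hca->cq_pending) ||
						 kthread_should_stop());
	}
	return 0;
}

static int my_hca_start_cq_thread(struct my_hca *hca, int cpu)
{
	spin_lock_init(&hca->cq_lock);
	INIT_LIST_HEAD(&hca->cq_pending);
	init_waitqueue_head(&hca->cq_wait);

	hca->cq_task = kthread_create(my_cq_worker, hca, "hca_cq/%d", cpu);
	if (IS_ERR(hca->cq_task))
		return PTR_ERR(hca->cq_task);
	kthread_bind(hca->cq_task, cpu);	/* keep the thread on one CPU */
	wake_up_process(hca->cq_task);
	return 0;
}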
- 20 Jul 2012, 1 commit
Submitted by Mike Marciniszyn
Profiling has shown that sdma_lock is proving a bottleneck for performance. The situations include:

- RDMA reads when krcvqs > 1
- post sends from multiple threads

For RDMA reads, the current global qib_wq mechanism runs on all CPUs and contends for the sdma_lock when multiple RDMA read requests are fielded on different CPUs. For post sends, the direct call to qib_do_send() from multiple threads causes the contention.

Since the sdma mechanism is per port, this fix converts the existing workqueue to a per-port single-threaded workqueue to reduce the lock contention in the RDMA read case, and in any other case where the QP is scheduled via the workqueue mechanism from more than one CPU.

For the post send case, this patch modifies the post send code to test for a non-empty sdma engine. If the sdma engine is not idle, the (now single-threaded) workqueue is used to trigger the send engine instead of the direct call to qib_do_send().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
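A minimal sketch of the per-port single-threaded workqueue pattern described above, with hypothetical my_port/my_qp names: an ordered workqueue serializes deferred sends for a port, and the post-send path only defers when the sdma engine is busy.

/*
 * Sketch: one ordered (single thread of execution) workqueue per port;
 * only defer to it when the send engine is not idle.
 */
#include <linux/workqueue.h>
#include <linux/errno.h>

struct my_port {
	struct workqueue_struct *send_wq;	/* one ordered queue per port */
};

struct my_qp {
	struct work_struct send_work;
};

static void my_do_send(struct work_struct *work)
{
	struct my_qp *qp = container_of(work, struct my_qp, send_work);
	/* progress the send engine for this QP; sdma_lock is taken in here */
	(void)qp;
}

static void my_qp_init(struct my_qp *qp)
{
	INIT_WORK(&qp->send_work, my_do_send);
}

static int my_port_init(struct my_port *ppd, int port)
{
	/* ordered => at most one work item for this port runs at a time */
	ppd->send_wq = alloc_ordered_workqueue("my_port%d", WQ_MEM_RECLAIM, port);
	return ppd->send_wq ? 0 : -ENOMEM;
}

/* Post-send path: defer via the workqueue only when the sdma engine is busy. */
static void my_post_send(struct my_port *ppd, struct my_qp *qp, bool sdma_busy)
{
	if (sdma_busy)
		queue_work(ppd->send_wq, &qp->send_work);
	else
		my_do_send(&qp->send_work);	/* engine idle: send directly */
}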
- 18 Jul 2012, 1 commit
Submitted by Mike Marciniszyn
Commit af061a64 ("IB/qib: Use RCU for qpn lookup") introduced sparse warnings. This patch corrects those issues.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- 11 Jul 2012, 1 commit
Submitted by Mike Marciniszyn
Commit 8aac4cc3 ("IB/qib: RCU locking for MR validation") introduced new sparse warnings in qib_keys.c.

Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
- 09 Jul 2012, 2 commits
Submitted by Mike Marciniszyn
Profiling indicates that MR validation locking is expensive. The MR table is largely read-only and is a suitable candidate for RCU locking. The patch uses RCU locking during validation to eliminate one lock/unlock pair during that validation.

Reviewed-by: Mike Heinz <michael.william.heinz@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
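A minimal sketch of validating an MR under RCU instead of a spinlock. The my_lkey_table/my_mregion names and layout are hypothetical; the point is that readers take only an RCU read-side critical section plus a reference, while writers publish or retire entries and wait out readers.

/*
 * Sketch: RCU-protected lkey table lookup with a reference taken on hit.
 */
#include <linux/rcupdate.h>
#include <linux/kref.h>
#include <linux/types.h>

struct my_mregion {
	struct kref refcount;
	u32 lkey;
};

struct my_lkey_table {
	struct my_mregion __rcu **table;	/* indexed by lkey index */
	u32 max;				/* table size */
};

/* Reader path: no spinlock, just rcu_read_lock() around the dereference. */
static struct my_mregion *my_lkey_lookup(struct my_lkey_table *tbl,
					 u32 lkey, u32 index)
{
	struct my_mregion *mr;

	rcu_read_lock();
	mr = rcu_dereference(tbl->table[index]);
	if (mr && mr->lkey == lkey)
		kref_get(&mr->refcount);	/* hold the MR across the I/O */
	else
		mr = NULL;
	rcu_read_unlock();
	return mr;
}

/* Writer path: unpublish the entry, then wait for readers before freeing. */
static void my_lkey_remove(struct my_lkey_table *tbl, u32 index)
{
	struct my_mregion *mr = rcu_dereference_protected(tbl->table[index], 1);

	RCU_INIT_POINTER(tbl->table[index], NULL);
	synchronize_rcu();	/* readers that saw the old pointer are done */
	(void)mr;		/* drop the table's reference on mr here */
}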
Submitted by Mike Marciniszyn
A timing issue can occur where qib_mr_dereg can return -EBUSY if the MR use count is not zero. This can occur if the MR is de-registered while RDMA read response packets are being progressed from the SDMA ring. The suspicion is that the peer sent an RDMA read request whose data has already been copied across to the peer. The peer sees the completion of its request and then tells the responder that the MR is no longer needed. The responder tries to de-register the MR while some responses remaining in the SDMA ring still hold the MR use count.

The code now uses a get/put paradigm to track MR use counts and coordinates with the MR de-registration process using a completion when the count has reached zero. A timeout on the delay is in place to catch other EBUSY issues.

The reference count protocol is as follows:

- The reference returned to the user counts as 1.
- A reference from the lk_table or the qib_ibdev counts as 1.
- Transient I/O operations increase/decrease the count as necessary.

A lot of code duplication has been folded into the new routines init_qib_mregion() and deinit_qib_mregion(). Additionally, explicit initialization of fields to zero is now handled by kzalloc(). Also, the duplicated 'while.*num_sge' code that decrements reference counts has been consolidated in qib_put_ss().

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
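A minimal sketch of the get/put-plus-completion pattern described above, with hypothetical my_mregion/my_mr_* names and an illustrative 5-second timeout; the deregistration path blocks until outstanding I/O drops the last reference.

/*
 * Sketch: MR use count with a completion fired when the count hits zero.
 */
#include <linux/atomic.h>
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/errno.h>

struct my_mregion {
	atomic_t refcount;
	struct completion comp;	/* completed when refcount reaches zero */
};

static void my_mr_init(struct my_mregion *mr)
{
	init_completion(&mr->comp);
	atomic_set(&mr->refcount, 1);	/* the reference returned to the user */
}

static inline void my_mr_get(struct my_mregion *mr)
{
	atomic_inc(&mr->refcount);	/* taken for each transient I/O */
}

static inline void my_mr_put(struct my_mregion *mr)
{
	if (atomic_dec_and_test(&mr->refcount))
		complete(&mr->comp);	/* last user gone: wake the dereg path */
}

static int my_mr_dereg(struct my_mregion *mr)
{
	unsigned long left;

	my_mr_put(mr);	/* drop the user's reference */
	left = wait_for_completion_timeout(&mr->comp, 5 * HZ);
	if (!left)
		return -EBUSY;	/* I/O still holds references after the timeout */
	return 0;
}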
- 15 May 2012, 1 commit
Submitted by Mike Marciniszyn
This patch reorganizes the QP and devdata files to be more cache line aware. qib_qp fields in particular are split into read-mostly, send, and receive fields. qib_devdata fields are split into read-mostly and read/write fields. Testing has shown that bidirectional tests improve by as much as 100% with this patch.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
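A minimal sketch of the cache-line-aware layout idea: group fields by access pattern and start each group on its own cache line so send-side and receive-side CPUs don't false-share. The field names below are illustrative, not the driver's actual layout.

/*
 * Sketch: struct split into read-mostly, send, and receive groups.
 */
#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct my_qp {
	/* Read-mostly fields: written rarely, read on every packet. */
	u32 qpn;
	u32 mtu;

	/* Send-side fields: touched by the posting/SDMA CPUs. */
	spinlock_t s_lock ____cacheline_aligned_in_smp;
	u32 s_head;
	u32 s_tail;

	/* Receive-side fields: touched by the receive-context CPUs. */
	spinlock_t r_lock ____cacheline_aligned_in_smp;
	u32 r_head;
	u32 r_tail;
};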
- 22 Oct 2011, 3 commits
Submitted by Mike Marciniszyn
A new field, timeout_jiffies, is added to qib_qp. It is initialized upon create and modify, and is now used instead of a computation based on qp->timeout.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
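A minimal sketch of the caching idea, assuming the standard InfiniBand timeout encoding of 4.096 us * 2^timeout: decode the exponent into jiffies once at create/modify time so the retry path just reads a field. The my_qp names are hypothetical.

/*
 * Sketch: precompute the retry timeout in jiffies from the IB encoding.
 */
#include <linux/jiffies.h>
#include <linux/types.h>

struct my_qp {
	u8 timeout;			/* IB-encoded exponent */
	unsigned long timeout_jiffies;	/* cached, set at create/modify time */
};

static void my_qp_set_timeout(struct my_qp *qp, u8 ib_timeout)
{
	qp->timeout = ib_timeout;
	/* 4.096 us * 2^timeout, expressed in whole microseconds then jiffies */
	qp->timeout_jiffies = usecs_to_jiffies((4096UL << ib_timeout) / 1000UL);
}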
Submitted by Mike Marciniszyn
The heavyweight spinlock in qib_lookup_qpn() is replaced with RCU. The hash list itself is now accessed via jhash functions instead of mod. The changes should benefit multiple receive contexts on different processors by not contending for the lock just to read the hash structures.

The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in the context. The interrupt handler tests the current packet's QPN against lookaside_qpn if the lookaside_qp pointer is non-NULL. The pointer is NULL'ed when the interrupt handler exits.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
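A minimal sketch of an RCU-protected QPN hash lookup using jhash, with hypothetical my_qp/my_qp_table names and an illustrative table size; readers walk the hash chain under rcu_read_lock() instead of taking a spinlock.

/*
 * Sketch: lock-free (RCU) QPN lookup keyed by jhash instead of a modulo.
 */
#include <linux/jhash.h>
#include <linux/rcupdate.h>
#include <linux/types.h>

#define MY_QP_HASH_SIZE 256

struct my_qp {
	struct my_qp __rcu *next;	/* hash-chain link, RCU protected */
	u32 qpn;
};

struct my_qp_table {
	struct my_qp __rcu *map[MY_QP_HASH_SIZE];
	u32 seed;			/* jhash initval */
};

static inline unsigned int my_qpn_hash(struct my_qp_table *qpt, u32 qpn)
{
	return jhash_1word(qpn, qpt->seed) & (MY_QP_HASH_SIZE - 1);
}

/* Called from the receive path; no spinlock, just an RCU read section. */
static struct my_qp *my_lookup_qpn(struct my_qp_table *qpt, u32 qpn)
{
	struct my_qp *qp;

	rcu_read_lock();
	for (qp = rcu_dereference(qpt->map[my_qpn_hash(qpt, qpn)]); qp;
	     qp = rcu_dereference(qp->next)) {
		if (qp->qpn == qpn)
			break;	/* a real driver takes a reference here */
	}
	rcu_read_unlock();
	return qp;
}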
Submitted by Mike Marciniszyn
Store both the encoded and decoded MTU in the QP structure as a minor optimization for the UC/RC receive routines.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
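A minimal sketch of the idea, with hypothetical field names: keep both the IB-encoded path MTU and its decoded byte count, decoding once at modify-QP time so the receive fast path never converts per packet.

/*
 * Sketch: cache encoded and decoded MTU side by side.
 */
#include <rdma/ib_verbs.h>

struct my_qp {
	enum ib_mtu path_mtu;	/* encoded value (IB_MTU_256 .. IB_MTU_4096) */
	u32 pmtu;		/* decoded byte count, used in the rx fast path */
};

static void my_qp_set_mtu(struct my_qp *qp, enum ib_mtu mtu)
{
	qp->path_mtu = mtu;
	qp->pmtu = ib_mtu_enum_to_int(mtu);	/* decode once, not per packet */
}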
- 17 Jan 2011, 1 commit
Submitted by Tejun Heo
* ib_wq is added, which is used as the common workqueue for InfiniBand instead of the system workqueue. All system workqueue usages, including flush_scheduled_work() callers, are converted to use and flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() is converted to cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
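A minimal sketch of the conversion pattern in a hypothetical driver with one delayed work item; ib_wq here refers to the InfiniBand-wide workqueue this commit introduces in <rdma/ib_verbs.h>, and the before/after comment shows the cancel+flush replacement.

/*
 * Sketch: queue on ib_wq and use cancel_delayed_work_sync() instead of
 * cancel_delayed_work() + flush_scheduled_work().
 */
#include <linux/workqueue.h>
#include <rdma/ib_verbs.h>

static void my_poll_fn(struct work_struct *work)
{
	/* periodic device polling would go here */
}

static DECLARE_DELAYED_WORK(my_poll_work, my_poll_fn);

static void my_queue_polling(void)
{
	/* Queue on ib_wq rather than the system workqueue. */
	queue_delayed_work(ib_wq, &my_poll_work, HZ);
}

static void my_stop_polling(void)
{
	/*
	 * Before: cancel_delayed_work(&my_poll_work); flush_scheduled_work();
	 * After: one call that cancels and waits, independent of the system
	 * workqueue.
	 */
	cancel_delayed_work_sync(&my_poll_work);
}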
- 11 Jan 2011, 2 commits
Submitted by Mike Marciniszyn
The current code loops during rkey/lkey validation to isolate the MR for the RDMA, which is expensive when the current operation is inside a very large memory region.

This fix optimizes the rkey/lkey validation routines for user memory regions and fast memory regions. The MR entry can be isolated by shifts/mods instead of looping. The existing loop is preserved for physical memory regions for now.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
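A minimal sketch of the shift/mod idea under the assumption of fixed-size segments arranged in a two-level map; the names and constants are illustrative. Instead of walking every segment, the segment covering a virtual address is computed directly from the offset.

/*
 * Sketch: O(1) segment lookup by division/modulo instead of a linear scan.
 */
#include <linux/types.h>

#define MY_SEGSZ	 4096UL		/* bytes covered per segment */
#define MY_SEGS_PER_MAP	 16		/* segments per map chunk */

struct my_seg {
	void *vaddr;
	size_t length;
};

struct my_segmap {
	struct my_seg segs[MY_SEGS_PER_MAP];
};

struct my_mregion {
	u64 user_base;			/* user virtual base of the MR */
	struct my_segmap **map;		/* array of map chunks */
};

static struct my_seg *my_find_seg(struct my_mregion *mr, u64 vaddr)
{
	u64 off = vaddr - mr->user_base;
	u64 entry = off / MY_SEGSZ;		/* which segment overall */
	u64 m = entry / MY_SEGS_PER_MAP;	/* which map chunk */
	u64 n = entry % MY_SEGS_PER_MAP;	/* index within the chunk */

	return &mr->map[m]->segs[n];
}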
Submitted by Mike Marciniszyn
The basic idea is that on SusieQ, the difficult part of mapping a QPN to a context is handled by the mapping registers, so the generic QPN allocation doesn't need to worry about chip specifics. For Monty and Linda, there is no mapping table, so qpt->mask (the same as dd->qpn_mask) is used to check whether the QPN-to-context mapping falls within [zero..dd->n_krcv_queues).

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
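A minimal sketch of the mask-based check described above, with hypothetical names; hardware that has QPN mapping registers simply leaves the mask at zero and skips the test.

/*
 * Sketch: validate that a QPN maps into a kernel receive context when the
 * chip has no QPN mapping registers.
 */
#include <linux/types.h>

struct my_qpn_table {
	u32 mask;	/* 0 when the chip maps QPN->context in hardware */
};

static bool my_qpn_maps_to_ctxt(struct my_qpn_table *qpt, u32 qpn,
				u32 n_krcv_queues)
{
	if (!qpt->mask)
		return true;	/* mapping registers handle it */
	return (qpn & qpt->mask) < n_krcv_queues;
}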
- 24 May 2010, 1 commit
Submitted by Ralph Campbell
Add a low-level IB driver for QLogic PCIe adapters.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>