提交 · b3de6cfebc6167761c40947f05f4c817531f37d5 · openanolis / cloud-kernel

15 2月, 2013 10 次提交

RDMA/cxgb4: Insert hwtid in pass_accept_req instead in pass_establish · b3de6cfe

由 Vipul Pandya 提交于 1月 07, 2013

CPL_ABORT_REQ_RSS can come before TCP connection is established.  In
such case peer_abort was trying to remove the hwtid, which was not
inserted.  To avoid this we insert the hwtid when we are sure that we
are surely going to send passive accept request.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b3de6cfe

RDMA/cxgb4: Don't wakeup threads for MPAv2 · 7c0a33d6

由 Vipul Pandya 提交于 1月 07, 2013

Don't wakeup threads blocked in rdma_init/rdma_fini if we are on
MPAv2, and want to retry connection with MPAv1.

Stop ep-timer on getting MPA version mismatch, before doing the
abort_connection - in process_mpa_request.

Take care to stop ep-timer in error paths for process_mpa_request.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7c0a33d6

RDMA/cxgb4: Don't reconnect on abort for mpa_rev 1 · fe7e0a4d

由 Vipul Pandya 提交于 1月 07, 2013

Only reconnect if the endpoint wasn't freed.

peer_abort() should only attempt to reconnect if the endpoint wasn't
freed.  Also remove hwtid from the debugfs idr.

Add missing check for peer2peer in MPAv2 code

Use correct mpa version on reject.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

fe7e0a4d

RDMA/cxgb4: Fix endpoint timeout race condition · 1ec779cc

由 Vipul Pandya 提交于 1月 07, 2013

The endpoint timeout logic had a race that could cause an endpoint
object to be freed while it was still on the timedout list.  This
can happen if the timer is stopped after it had fired, but before
the timedout thread processed the endpoint timeout.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1ec779cc

RDMA/cxgb4: Only log rx_data warnings if cpl status is non-zero · e8e5b927

由 Vipul Pandya 提交于 1月 07, 2013

With newer firmware, we can get streaming data due to connection
errors before the driver moves the QP out of RTS.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

e8e5b927

RDMA/cxgb4: Always log async errors · 04236df2

由 Vipul Pandya 提交于 1月 07, 2013

Log AEs even if the QP isn't in RTS.  It is useful information.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

04236df2

RDMA/cxgb4: Keep QP referenced until TID released · 325abead

由 Vipul Pandya 提交于 1月 07, 2013

The driver is currently releasing the last ref on the QP too early.
This can cause bus errors due to HW still fetching WRs from the HW
queue.  The fix is to keep a qp ref until we release the HW TID.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

325abead

RDMA/cxgb4: Display streaming mode error only if detected in RTS · 1557967b

由 Vipul Pandya 提交于 1月 07, 2013

With later firmware, the chances of getting streaming mode data after
we exit RTS is likely, so we don't need to warn for it.  The only real
case where we don't expect it is when the QP is in RTS.

Move QP to ERROR when streaming mode data received.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1557967b

RDMA/cxgb4: Abort connections when moving to ERROR state · 91e9c071

由 Vipul Pandya 提交于 1月 07, 2013

If a FINI operation fails, then we need to ABORT instead of CLOSE.
Also, if we ABORT due to unexpected STREAMING data, then wake up
anybody blocked in FINI...
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

91e9c071

RDMA/cxgb4: Abort connections that receive unexpected streaming mode data · 55abf8df

由 Vipul Pandya 提交于 1月 07, 2013

This error means the RDMA connection was knocked out of RDMA mode,
probably due to an error on the connection.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

55abf8df

06 2月, 2013 2 次提交

IB/qib: Fix for broken sparse warning fix · d359f354

由 Mike Marciniszyn 提交于 1月 24, 2013

Commit 1fb9fed6 ("IB/qib: Fix QP RCU sparse warning") broke QP
hash list deletion in qp_remove() badly.

This patch restores the former for loop behavior, while still fixing
the sparse warnings.

Cc: <stable@vger.kernel.org>
Reviewed-by: NGary Leshner <gary.s.leshner@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d359f354

IPoIB: Fix crash due to skb double destruct · 7e5a90c2

由 Shlomo Pongratz 提交于 2月 04, 2013

After commit b13912bb ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie

    iperf -c <IP> -P 16 -t 3600

results in a crash.

After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())

The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()."  This leads to running the
skb cleanup routines simultaneously in two different contexts.

The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request.  If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.
Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7e5a90c2

04 1月, 2013 1 次提交

Drivers: infinband: remove __dev* attributes. · 1e6d9abe

由 Greg Kroah-Hartman 提交于 12月 21, 2012

CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
markings need to be removed.

This change removes the use of __devinit, __devexit_p, __devinitdata,
and __devexit from these drivers.

Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.

Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

1e6d9abe

20 12月, 2012 4 次提交

RDMA/cxgb4: Fix bug for active and passive LE hash collision path · 793dad94

由 Vipul Pandya 提交于 12月 10, 2012

Retries active opens for INUSE errors.

Logs any active ofld_connect_wr error replies.

Sends ofld_connect_wr on same ctrlq. It needs to go  on the same control txq as
regular CPL active/passive messages.

Retries on active open replies with EADDRINUSE.

Uses active open fw wr only if active filter region is set.

Adds stat for ofld_connect_wr failures.

This patch also adds debugfs file to show endpoints.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

793dad94

RDMA/cxgb4: Fix LE hash collision bug for passive open connection · 1cab775c

由 Vipul Pandya 提交于 12月 10, 2012

It establishes passive open connection through firmware work request. Passive
open connection will go through this path as now instead of listening server we
create a server filter which will redirect the incoming SYN packet to the
offload queue. After this driver tries to establish the connection using
firmware work request.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1cab775c

RDMA/cxgb4: Fix LE hash collision bug for active open connection · 5be78ee9

由 Vipul Pandya 提交于 12月 10, 2012

It enables establishing active open connection using fw_ofld_connection work
request when cpl_act_open_rpl says TCAM full error which may be because
of LE hash collision. Current support is only for IPv4 active open connections.

Sets ntuple bits in active open requests. For T4 firmware greater than 1.4.10.0
ntuple bits are required to be set.

Adds nocong and enable_ecn module parameter options.
Signed-off-by: NVipul Pandya <vipul@chelsio.com>

[ Move all FW return values to t4fw_api.h.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5be78ee9

IPoIB: Call skb_dst_drop() once skb is enqueued for sending · b13912bb

由 Roland Dreier 提交于 12月 19, 2012

Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently.  It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).

However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet.  This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst.  We've seen this lead
to pathological behavior with respect to route and neighbour GC.  The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().

Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b13912bb

08 12月, 2012 3 次提交

T
RDMA/nes: Fix for crash when registering zero length MR for CQ · 7d9c199a
由 Tatyana Nikolova 提交于 12月 06, 2012
```
Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>
```
7d9c199a

RDMA/nes: Fix for terminate timer crash · 7bfcfa51

由 Tatyana Nikolova 提交于 12月 06, 2012

The terminate timer needs to be initialized just once.
Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7bfcfa51

RDMA/nes: Fix for BUG_ON due to adding already-pending timer · 00ad255d

由 Tatyana Nikolova 提交于 12月 06, 2012

To avoid nes tcp_timer crash for SMP architectures, add_timer is
replaced with mod_timer.
Signed-off-by: NTatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

00ad255d

01 12月, 2012 12 次提交

IB/srp: Allow SRP disconnect through sysfs · dc1bdbd9

由 Bart Van Assche 提交于 9月 16, 2011

Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.

Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

dc1bdbd9

IB/srp: send disconnect request without waiting for CM timewait exit · 55d93898

由 Vu Pham 提交于 11月 26, 2012

Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.
Signed-off-by: NVu Pham <vu@mellanox.com>
Signed-off-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

55d93898

IB/srp: destroy and recreate QP and CQs when reconnecting · 73aa89ed

由 Ishai Rabinovitz 提交于 11月 26, 2012

HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.
Signed-off-by: NIshai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>

[ updated to current code from OFED, cleaned up commit message ]
Signed-off-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

73aa89ed

IB/srp: Eliminate state SRP_TARGET_DEAD · ef6c49d8

由 Bart Van Assche 提交于 12月 26, 2011

Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED.  That allows us to remove the state
SRP_TARGET_DEAD.  Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected.  This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target().  Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ef6c49d8

IB/srp: Introduce the helper function srp_remove_target() · ee12d6a8

由 Bart Van Assche 提交于 12月 25, 2011

Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ee12d6a8

IB/srp: Suppress superfluous error messages · 294c875a

由 Bart Van Assche 提交于 12月 25, 2011

Keep track of the connection state.  Only report QP errors while
connected.  Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

294c875a

IB/srp: Process all error completions · 4f0af697

由 Bart Van Assche 提交于 11月 26, 2012

If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

4f0af697

IB/srp: Introduce srp_handle_qp_err() · 948d1e88

由 Bart Van Assche 提交于 9月 03, 2011

Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

948d1e88

IB/srp: Simplify SCSI error handling · 224db157

由 Bart Van Assche 提交于 10月 24, 2012

Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

224db157

IB/srp: Keep processing commands during host removal · f3718231

由 Bart Van Assche 提交于 4月 19, 2012

Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f3718231

IB/srp: Eliminate state SRP_TARGET_CONNECTING · 09be70a2

由 Bart Van Assche 提交于 3月 17, 2012

Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state.  This allows us
to eliminate the target state SRP_TARGET_CONNECTING.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

09be70a2

IB/srp: Increase block layer timeout · c9b03c1a

由 Bart Van Assche 提交于 9月 03, 2011

Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Acked-by: NDavid Dillow <dillowda@ornl.gov>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c9b03c1a

30 11月, 2012 2 次提交

RDMA/cm: Change return value from find_gid_port() · 63f05be2

由 shefty 提交于 11月 28, 2012

Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:

The patch 3c86aa70: "RDMA/cm: Add RDMA CM support for IBoE
devices" from Oct 13, 2010, leads to the following warning:
net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
	 error: passing non neg 1 to ERR_PTR

This bug would result in a NULL dereference.  svc_rdma_create() is
supposed to return ERR_PTRs or valid pointers, but instead it returns
ERR_PTRs, valid pointers and 1.

The call tree is:

svc_rdma_create()
   => rdma_bind_addr()
      => cma_acquire_dev()
         => find_gid_port()

rdma_bind_addr() should return a valid errno.  Fix this by having
find_gid_port() also return a valid errno.  If we can't find the
specified GID on a given port, return -EADDRNOTAVAIL, rather than
-EAGAIN, to better indicate the error.  We also drop using the
special return value of '1' and instead pass through the error
returned by the underlying verbs call.  On such errors, rather
than aborting the search,  we simply continue to check the next
device/port.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

63f05be2

IB/mlx4: Fix spinlock order to avoid lockdep warnings · ceb7decb

由 Jack Morgenstein 提交于 11月 27, 2012

lockdep warns about taking a hard-irq-unsafe lock (sriov->id_map_lock)
inside a hard-irq-safe lock (sriov->going_down_lock).

Since id_map_lock is never taken in the interrupt context, we can
simply reverse the order of taking the two spinlocks, thus avoiding
the warning and the depencency.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ceb7decb

29 11月, 2012 1 次提交

ib_srpt: Convert TMR path to target_submit_tmr · 3e4f5748

由 Nicholas Bellinger 提交于 11月 28, 2012

This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.

v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>

3e4f5748

28 11月, 2012 1 次提交

ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref · 9474b043

由 Nicholas Bellinger 提交于 11月 27, 2012

This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.

It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.

Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>

9474b043

27 11月, 2012 3 次提交

RDMA/cxgb3: use WARN · 5107c2a3

由 Julia Lawall 提交于 11月 03, 2012

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Acked-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5107c2a3

RDMA/cxgb4: use WARN · 76f267b7

由 Julia Lawall 提交于 11月 03, 2012

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Acked-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

76f267b7

mlx4: 64-byte CQE/EQE support · 08ff3235

由 Or Gerlitz 提交于 10月 21, 2012

ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

08ff3235

22 11月, 2012 1 次提交

RDMA/amsol1100: Fix missing break · c9795bd7

由 Alan Cox 提交于 10月 25, 2012

Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c9795bd7

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功