1. 09 Apr 2009, 1 commit
    • RDMA/cma: Create cm id even when IB port is down · d2ca39f2
      Authored by Yossi Etigin
      When doing rdma_resolve_addr(), if the relevant IB port is down, the
      function fails and the cm_id is not bound to the correct device.
      Therefore, the application does not have a device handle and cannot wait
      for the port to become active.  The function fails because the
      underlying IPoIB interface is not joined to the broadcast group and
      therefore the SA does not have a multicast record to take a Q_Key
      from.
      
      The fix is to use lazy Q_Key resolution - cma_set_qkey() will set
      id_priv->qkey if it was not set, and will be called just before the
      Q_Key is really required.
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d2ca39f2
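      A minimal sketch of the lazy-resolution idea described above, assuming
      cma.c internals of that era (the structure layout and the SA lookup are
      simplified stand-ins, not the literal patch):

              static int cma_set_qkey(struct rdma_id_private *id_priv)
              {
                      struct ib_sa_mcmember_rec rec;
                      int ret = 0;

                      if (id_priv->qkey)      /* already resolved earlier */
                              return 0;

                      switch (id_priv->id.ps) {
                      case RDMA_PS_UDP:
                              id_priv->qkey = RDMA_UDP_QKEY;
                              break;
                      case RDMA_PS_IPOIB:
                              /* only now, when the Q_Key is actually needed,
                               * ask the SA for the broadcast group record */
                              ib_addr_get_mgid(&id_priv->id.route.addr.dev_addr,
                                               &rec.mgid);
                              ret = ib_sa_get_mcmember_rec(id_priv->id.device,
                                                           id_priv->id.port_num,
                                                           &rec.mgid, &rec);
                              if (!ret)
                                      id_priv->qkey = be32_to_cpu(rec.qkey);
                              break;
                      default:
                              break;
                      }
                      return ret;
              }

      Callers in the UD and multicast paths would then invoke cma_set_qkey()
      right before the Q_Key is consumed, instead of during rdma_resolve_addr().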
  2. 02 Apr 2009, 1 commit
  3. 05 Mar 2009, 1 commit
  4. 04 Mar 2009, 2 commits
    • IB/sa_query: Fix AH leak due to update_sm_ah() race · 6b708b3d
      Authored by Jack Morgenstein
      Our testing uncovered a race condition in ib_sa_event():
      
      	spin_lock_irqsave(&port->ah_lock, flags);
      	if (port->sm_ah)
      		kref_put(&port->sm_ah->ref, free_sm_ah);
      	port->sm_ah = NULL;
      	spin_unlock_irqrestore(&port->ah_lock, flags);
      
      	schedule_work(&sa_dev->port[event->element.port_num -
      				    sa_dev->start_port].update_task);
      
      If two events occur back-to-back (e.g., client-reregister and LID
      change), both may pass the spinlock-protected code above before the
      scheduled work updates the port->sm_ah handle.  If the scheduled work
      then ends up running twice, the second run will find a non-NULL
      port->sm_ah and simply overwrite it in update_sm_ah(), resulting in an
      AH leak.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6b708b3d
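      One way to close that leak, roughly the shape of this fix (a simplified
      sketch; the rest of update_sm_ah() is omitted):

              /* in update_sm_ah(), once the new AH has been created: */
              spin_lock_irq(&port->ah_lock);
              if (port->sm_ah)        /* a doubled run may find one installed */
                      kref_put(&port->sm_ah->ref, free_sm_ah);
              port->sm_ah = new_ah;
              spin_unlock_irq(&port->ah_lock);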
    • IB/mad: Fix ib_post_send_mad() returning 0 with no generate send comp · 4780c195
      Authored by Ralph Campbell
      If ib_post_send_mad() returns 0, the API guarantees that there will be
      a callback to send_buf->mad_agent->send_handler() so that the sender
      can call ib_free_send_mad().  If that callback never arrives, the
      ib_mad_send_buf is leaked, the mad_agent reference count never reaches
      zero, and the IB device module cannot be unloaded.  Without this patch,
      that is exactly what happens when process_mad() returns
      (IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED).
      
      If process_mad() returns IB_MAD_RESULT_SUCCESS and no agent is
      registered to receive the MAD being sent, handle_outgoing_dr_smp()
      returns zero, which causes a MAD packet at the end of the directed
      route to be sent on the wire incorrectly; it does not cause a hang,
      though, since the HCA still generates a send completion.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      4780c195
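      For reference, the caller-side contract the fix restores looks roughly
      like this (a hedged consumer sketch, not code from the patch; send_buf
      is assumed to have been built with ib_create_send_mad()):

              ret = ib_post_send_mad(send_buf, NULL);
              if (ret) {
                      /* no completion callback will ever arrive for this
                       * buffer, so the sender must free it itself */
                      ib_free_send_mad(send_buf);
                      return ret;
              }
              /* ret == 0: the buffer is freed later, from
               * send_buf->mad_agent->send_handler() */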
  5. 28 Feb 2009, 2 commits
    • IB/mad: initialize mad_agent_priv before putting on lists · d9620a4c
      Authored by Ralph Campbell
      There is a potential race in ib_register_mad_agent() where the struct
      ib_mad_agent_private is not fully initialized before it is added to
      the list of agents per IB port. This means the ib_mad_agent_private
      could be seen before the refcount, spin locks, and linked lists are
      initialized.  The fix is to initialize the structure earlier.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d9620a4c
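      The general pattern of the fix, sketched with illustrative member names
      (the real ib_mad_agent_private fields may differ slightly):

              /* initialize everything another context might touch ... */
              atomic_set(&mad_agent_priv->refcount, 1);
              init_completion(&mad_agent_priv->comp);
              spin_lock_init(&mad_agent_priv->lock);
              INIT_LIST_HEAD(&mad_agent_priv->send_list);
              INIT_LIST_HEAD(&mad_agent_priv->wait_list);
              INIT_LIST_HEAD(&mad_agent_priv->local_list);

              /* ... and only then publish the agent on the per-port list */
              spin_lock_irqsave(&port_priv->reg_lock, flags);
              list_add_tail(&mad_agent_priv->agent_list, &port_priv->agent_list);
              spin_unlock_irqrestore(&port_priv->reg_lock, flags);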
    • IB/mad: Fix null pointer dereference in local_completions() · 1d9bc6d6
      Authored by Ralph Campbell
      handle_outgoing_dr_smp() can queue a struct ib_mad_local_private
      *local on the mad_agent_priv->local_work work queue with
      local->mad_priv == NULL if device->process_mad() returns
      IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY and
      (!ib_response_mad(&mad_priv->mad.mad) ||
      !mad_agent_priv->agent.recv_handler).
      
      In this case, local_completions() will be called with local->mad_priv
      == NULL.  The code does check for this case and skips calling
      recv_mad_agent->agent.recv_handler(), but since recv == 0,
      kmem_cache_free() ends up being called with a NULL pointer.
      
      Also, since recv isn't reinitialized on each pass through the loop, a
      stale value can cause a memory leak on a later iteration where recv
      should have been zero.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      1d9bc6d6
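      A simplified sketch of the corrected loop shape; the list name and the
      deliver_local_reply() helper are hypothetical, and the real
      local_completions() is more involved:

              int free_mad;

              list_for_each_entry(local, &mad_agent_priv->local_list, completion_list) {
                      free_mad = 0;   /* reset on every iteration, not once */

                      if (local->mad_priv && local->recv_mad_agent) {
                              /* the receive handler consumes (and later
                               * frees) the reply MAD */
                              deliver_local_reply(local);
                      } else if (local->mad_priv) {
                              free_mad = 1;   /* nobody consumed it */
                      }
                      /* local->mad_priv == NULL: nothing to deliver or free */

                      if (free_mad)
                              kmem_cache_free(ib_mad_cache, local->mad_priv);
              }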
  6. 26 Feb 2009, 1 commit
    • IB: Remove sysfs files before unregistering device · 9206dff1
      Authored by Roland Dreier
      Move the ib_device_unregister_sysfs() call from ib_dealloc_device() to
      ib_unregister_device().  The old code allows device unregister to
      proceed even if some sysfs files are open.  This leaves a window where
      userspace can open a file before a device is removed but end up reading
      it after the device is gone, leading to various kernel crashes, either
      because the device data structure has been freed or because the
      low-level driver code is gone after module removal.
      
      By not returning from ib_unregister_device() until after all sysfs
      entries are removed, we make sure that data structures and/or module
      code is not freed until after all sysfs access is done.
      Reported-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      9206dff1
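      The resulting teardown order, seen from a low-level driver's removal
      path (a sketch assuming the post-patch semantics described above):

              /* removes the sysfs files and does not return until any
               * in-flight sysfs access to them has finished */
              ib_unregister_device(ibdev);

              /* only now is it safe to free the device structure and to
               * let the low-level driver module go away */
              ib_dealloc_device(ibdev);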
  7. 18 Jan 2009, 1 commit
  8. 07 Jan 2009, 1 commit
  9. 30 Dec 2008, 1 commit
    • RDMA/addr: Fix build breakage when IPv6 is disabled · 2c4ab624
      Authored by Roland Dreier
      Commit 38617c64 ("RDMA/addr: Add support for translating IPv6
      addresses") broke the build when CONFIG_IPV6=n, because the ib_addr
      module unconditionally attempted to call ipv6_chk_addr() and other
      IPv6 functions that are not defined when IPv6 is disabled.  Fix this
      by building the IPv6 support only when CONFIG_IPV6 is turned on, and
      add a Kconfig dependency to prevent the ib_addr code from being built
      into the kernel when IPv6 is built as a module.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      2c4ab624
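      The compile-time side is the usual guard; a sketch (the stub name
      addr6_resolve() is illustrative, not necessarily what the patch uses):

              #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
              /* real IPv6 path: may call ipv6_chk_addr(), ip6_route_output(), ... */
              #else
              static int addr6_resolve(struct sockaddr_in6 *src_in,
                                       struct sockaddr_in6 *dst_in,
                                       struct rdma_dev_addr *addr)
              {
                      return -EADDRNOTAVAIL;  /* IPv6 not configured */
              }
              #endif

      On the Kconfig side, the usual way to express "not built in while IPv6
      is modular" is a dependency of the (IPV6 || IPV6=n) style.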
  10. 25 Dec 2008, 2 commits
  11. 02 Nov 2008, 1 commit
    • saner FASYNC handling on file close · 233e70f4
      Authored by Al Viro
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict the file from the fasync lists; forgetting
      that creates a hole, and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      233e70f4
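      The check __fput() gains is essentially this (sketch, modulo its exact
      placement in the teardown sequence):

              if (file->f_flags & FASYNC) {
                      /* kick the file off any fasync list it is still on */
                      if (file->f_op && file->f_op->fasync)
                              file->f_op->fasync(-1, file, 0);
              }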
  12. 30 Oct 2008, 1 commit
  13. 29 Oct 2008, 1 commit
  14. 20 Oct 2008, 1 commit
    • x86: sysfs: kill owner field from attribute · 01e8ef11
      Authored by Parag Warudkar
      Tejun's commit 7b595756 made sysfs
      attribute->owner unnecessary.  But the field was left in the structure to
      ease the merge.  It's been over a year since that change and it is now
      time to start killing attribute->owner along with its users - one arch at
      a time!
      
      This patch is attempt #1 to get rid of attribute->owner only for
      CONFIG_X86_64 or CONFIG_X86_32.  We will deal with other arches later on
      as and when possible - avr32 will be the next since that is something I
      can test.  Compile (make allyesconfig / make allmodconfig / custom config)
      and boot tested.
      
      akpm: the idea is that we put the declaration of attribute.owner inside
      `#ifndef CONFIG_X86'.  But that proved to be too ambitious for now because
      new usages kept on turning up in subsystem trees.
      
      [akpm: remove the ifdef for now]
      Signed-off-by: Parag Warudkar <parag.lkml@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01e8ef11
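      In practice this just means code stops setting the field; a
      hypothetical sysfs attribute as an example:

              static struct attribute foo_attr = {
                      .name = "foo",
                      .mode = S_IRUGO,
                      /* .owner = THIS_MODULE, -- no longer set; the field is
                       * on its way out */
              };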
  15. 17 Oct 2008, 1 commit
  16. 15 Oct 2008, 1 commit
  17. 11 Oct 2008, 1 commit
  18. 01 Oct 2008, 1 commit
  19. 21 Sep 2008, 1 commit
  20. 08 Aug 2008, 1 commit
  21. 05 Aug 2008, 1 commit
    • RDMA/cma: Remove padding arrays by using struct sockaddr_storage · 3f446754
      Authored by Roland Dreier
      There are a few places where the RDMA CM code handles IPv6 by doing
      
      	struct sockaddr		addr;
      	u8			pad[sizeof(struct sockaddr_in6) -
      				    sizeof(struct sockaddr)];
      
      This is fragile and ugly; handle this in a better way with just
      
      	struct sockaddr_storage	addr;
      
      [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
        switch to struct sockaddr_storage and get rid of padding arrays in
        struct rdma_addr. ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      3f446754
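      struct sockaddr_storage is defined to be large enough (and suitably
      aligned) for any address family, so callers just cast based on
      sa_family; a small usage sketch:

              struct sockaddr_storage addr;
              __be16 port;

              if (((struct sockaddr *) &addr)->sa_family == AF_INET6)
                      port = ((struct sockaddr_in6 *) &addr)->sin6_port;
              else
                      port = ((struct sockaddr_in *) &addr)->sin_port;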
  22. 25 Jul 2008, 2 commits
  23. 23 Jul 2008, 4 commits
  24. 22 Jul 2008, 2 commits
  25. 15 Jul 2008, 8 commits
    • RDMA/cma: Simplify locking needed for serialization of callbacks · de910bd9
      Authored by Or Gerlitz
      The RDMA CM has some logic in place to make sure that callbacks on a
      given CM ID are delivered to the consumer in a serialized manner.
      Specifically it has code to protect against a device removal racing
      with a running callback function.
      
      This patch simplifies this logic by using a mutex per ID instead of a
      wait queue and atomic variable.  As a result, cma_disable_remove() is
      renamed to the more accurate cma_disable_callback(), and
      cma_enable_remove() can be removed entirely, because it would become
      just a trivial wrapper around mutex_unlock().
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      de910bd9
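      Roughly, the per-ID serialization then looks like this (simplified
      sketch of the renamed helper):

              static int cma_disable_callback(struct rdma_id_private *id_priv,
                                              enum cma_state state)
              {
                      mutex_lock(&id_priv->handler_mutex);
                      if (id_priv->state != state) {
                              mutex_unlock(&id_priv->handler_mutex);
                              return -EINVAL;
                      }
                      return 0;       /* caller runs its callback, then unlocks */
              }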
    • RDMA/addr: Keep pointer to netdevice in struct rdma_dev_addr · 64c5e613
      Authored by Or Gerlitz
      Keep a pointer to the local (src) netdevice in struct rdma_dev_addr,
      and copy it in as part of rdma_copy_addr().  Use rdma_translate_ip()
      in cma_new_conn_id() to reduce some code duplication and also make
      sure the src_dev member gets set.
      
      In a high-availability configuration, the netdevice pointer can be used
      by the RDMA CM to keep RDMA sessions on the same links that the IP
      stack uses in fail-over and route-change cases.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      64c5e613
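      A sketch of the tail of rdma_copy_addr() with the new assignment (the
      surrounding field handling is simplified; src_dev is the point here):

              memcpy(dev_addr->src_dev_addr, dev->dev_addr, dev->addr_len);
              memcpy(dev_addr->broadcast, dev->broadcast, dev->addr_len);
              if (dst_dev_addr)
                      memcpy(dev_addr->dst_dev_addr, dst_dev_addr, dev->addr_len);
              dev_addr->src_dev = dev;        /* new: remember the netdevice */
              return 0;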
    • RDMA/core: Add iWARP protocol statistics attributes in sysfs · 7f624d02
      Authored by Steve Wise
      This patch adds a sysfs attribute group called "proto_stats" under
      /sys/class/infiniband/$device/ and populates this group with protocol
      statistics if they exist for a given device.  Currently, only iWARP
      stats are defined, but the code is designed to allow InfiniBand
      protocol stats if they become available.  These stats are per-device
      and more importantly -not- per port.
      
      Details:
      
      - Add union rdma_protocol_stats in ib_verbs.h.  This union allows
        defining transport-specific stats.  Currently only iwarp stats are
        defined.
      
      - Add struct iw_protocol_stats to define the current set of iwarp
        protocol stats.
      
      - Add new ib_device method called get_proto_stats() to return protocol
        statistics.
      
      - Add logic in core/sysfs.c to create iwarp protocol stats attributes
        if the device is an RNIC and has a get_proto_stats() method.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      7f624d02
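      The shape of the interface, sketched with the names used in the text
      above (the individual counter fields are illustrative):

              struct iw_protocol_stats {
                      u64 tcpInSegs;
                      u64 tcpOutSegs;
                      u64 tcpRetransSegs;
                      /* further MIB-style counters */
              };

              union rdma_protocol_stats {
                      struct ib_protocol_stats ib;
                      struct iw_protocol_stats iw;
              };

              /* new optional ib_device method */
              int (*get_proto_stats)(struct ib_device *device,
                                     union rdma_protocol_stats *stats);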
    • 468f2239
    • IB/core: Reset to error QP state transition is not allowed · e5a5e7d5
      Authored by Ralph Campbell
      I was reviewing the QP state transition diagram in the IB 1.2.1 spec
      and the code for qp_state_table[], and noticed that the code allows a
      QP to be modified from IB_QPS_RESET to IB_QPS_ERR whereas the notes
      for figure 124 (pg 457) specifically say that this transition isn't
      allowed.  This is a clarification from earlier versions of the IB
      spec, which were ambiguous in this area and suggested that the RESET
      to ERR transition was allowed.
      
      Fix up the qp_state_table[] to make RESET->ERR not allowed.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      e5a5e7d5
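      Conceptually the change is one entry in the transition table; a
      simplified sketch (the real table also carries the required/optional
      attribute masks per transition):

              static const struct {
                      int valid;
                      /* required/optional ib_qp_attr_mask fields omitted */
              } qp_state_table[IB_QPS_ERR + 1][IB_QPS_ERR + 1] = {
                      [IB_QPS_RESET] = {
                              [IB_QPS_RESET] = { .valid = 1 },
                              [IB_QPS_INIT]  = { .valid = 1 },
                              /* [IB_QPS_ERR] entry removed: RESET->ERR is no
                               * longer accepted */
                      },
                      /* remaining rows unchanged */
              };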
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Authored by Steve Wise
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      specification mandates all devices implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g. via dma_alloc_coherent).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr() is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         a IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows:
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR)
      
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
      
       - MR is deallocated with ib_dereg_mr()
      
       - page lists dealloced via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers, thus allowing more control over the peer's use of the
      key/STag (i.e. it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      00f7ec36
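      A consumer-side sketch following the steps above, using the API names
      given in this commit message; pd, qp and max_page_list_len are assumed
      to exist, and the exact signatures and ib_send_wr fast_reg fields shown
      are assumptions rather than verified declarations:

              struct ib_mr *mr;
              struct ib_fast_reg_page_list *pl;
              struct ib_send_wr wr, *bad_wr;
              u8 key = 0;

              mr = ib_alloc_mr(pd, max_page_list_len);
              pl = ib_alloc_fast_reg_page_list(pd->device, max_page_list_len);
              /* error handling omitted */

              ib_update_fast_reg_key(mr, ++key);      /* fresh 8-bit key per rebind */

              memset(&wr, 0, sizeof wr);
              wr.opcode = IB_WR_FAST_REG_MR;
              wr.wr.fast_reg.rkey      = mr->rkey;
              wr.wr.fast_reg.page_list = pl;
              /* iova_start, length, page_shift and access flags also go here */

              ib_post_send(qp, &wr, &bad_wr);         /* MR becomes VALID */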
    • RDMA: Remove subversion $Id tags · f3781d2e
      Authored by Roland Dreier
      They don't get updated by git and so they're worse than useless.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f3781d2e
    • IB/sa: Fail requests made while creating new SM AH · 164ba089
      Authored by Moni Shoua
      This patch solves a race that follows an event which causes the SA
      query module to flush its SM address handle (AH).  When the SM AH
      becomes invalid and needs an update, the update is handled by the
      global workqueue.  However, the same event is also handled in the IPoIB
      driver by queuing work on the ipoib_workqueue that does multicast
      joins.  Although the queuing happens in the right order, it goes to two
      different workqueues, so there is no guarantee that the first work
      queued is the first to be executed.
      
      This causes a problem because IPoIB may end up sending a request to
      the old SM, which will take a long time to time out (since the old SM
      is gone); this leads to a much longer than necessary interruption in
      multicast traffic.
      
      The patch sets the SA query module's SM AH to NULL when the event
      occurs, and until update_sm_ah() is done, any request that needs sm_ah
      fails with -EAGAIN return status.
      
      For consumers, the patch doesn't make things worse.  Before the patch,
      MADs are sent to the wrong SM, so the request gets lost.  Consumers can
      be improved to examine the return code and respond to -EAGAIN properly,
      but even without that improvement the situation does not get any worse.
      Signed-off-by: Moni Levy <monil@voltaire.com>
      Signed-off-by: Moni Shoua <monis@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      164ba089
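      The send path then refuses to use a missing AH; roughly (a simplified
      sketch of the check, with port and query from the surrounding code):

              spin_lock_irqsave(&port->ah_lock, flags);
              if (!port->sm_ah) {
                      /* flushed by the event; update_sm_ah() not finished */
                      spin_unlock_irqrestore(&port->ah_lock, flags);
                      return -EAGAIN;
              }
              kref_get(&port->sm_ah->ref);
              query->sm_ah = port->sm_ah;
              spin_unlock_irqrestore(&port->ah_lock, flags);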