提交 · ef70044647b260cb6b7863f392384a06670d0b2a · openanolis / cloud-kernel

14 10月, 2011 10 次提交

IB/cm: Update XRC support based on XRC annex errata · ef700446

由 Sean Hefty 提交于 8月 02, 2011

The XRC annex was updated to have XRC behave more like RD. Specifically,
the XRC TGT QPN moves from the local QPN to local EECN field.  Lookup of
SRQN is done using the REQ/REP protocol.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ef700446

IB/cm: Update protocol to support XRC · d26a360b

由 Sean Hefty 提交于 5月 13, 2011

Update the REQ and REP messages to support XRC connection setup
according to the XRC Annex.  Several existing fields must be set to 0 or
1 when connecting XRC QPs, and a reserved field is changed to an
extended transport type.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d26a360b

RDMA/uverbs: Export XRC TGT QPs to user space · b93f3c18

由 Sean Hefty 提交于 5月 27, 2011

Allow user space to operate on XRC TGT QPs the same way as other types
of QPs, with one notable exception: since XRC TGT QPs may be shared
among multiple processes, the XRC TGT QP is allowed to exist beyond the
lifetime of the creating process.

The process that creates the QP is allowed to destroy it, but if the
process exits without destroying the QP, then the QP will be left bound
to the lifetime of the XRCD.

TGT QPs are not associated with CQs or a PD.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b93f3c18

RDMA/uverbs: Export XRC INI QPs to userspace · 9977f4f6

由 Sean Hefty 提交于 5月 26, 2011

XRC INI QPs are similar to send only RC QPs.  Allow user space to create
INI QPs.  Note that INI QPs do not require receive CQs.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

9977f4f6

RDMA/uverbs: Export XRC SRQs to user space · 8541f8de

由 Sean Hefty 提交于 5月 25, 2011

We require additional information to create XRC SRQs than we can
exchange using the existing create SRQ ABI.  Provide an enhanced create
ABI for extended SRQ types.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
and Roland Dreier <roland@purestorage.com>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

8541f8de

RDMA/uverbs: Export XRC domains to user space · 53d0bd1e

由 Sean Hefty 提交于 5月 24, 2011

Allow user space to create XRC domains. Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

53d0bd1e

RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD · d3d72d90

由 Sean Hefty 提交于 5月 26, 2011

XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d3d72d90

RDMA/core: Add XRC QPs · b42b63cf

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b42b63cf

RDMA/core: Add XRC SRQ type · 418d5130

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

418d5130

RDMA/core: Add SRQ type field · 96104eda

由 Sean Hefty 提交于 5月 23, 2011

Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

96104eda

13 10月, 2011 1 次提交

RDMA/core: Add XRC domain support · 59991f94

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

59991f94

19 7月, 2011 4 次提交

IB/core: Add GID change event · 761d90ed

由 Or Gerlitz 提交于 6月 15, 2011

Add IB GID change event type.  This is needed for IBoE when the HW
driver updates the GID (e.g when new VLANs are added/deleted) table
and the change should be reflected to the IB core cache.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

761d90ed

RDMA/cma: Don't allow IPoIB port space for IBoE · 2efdd6a0

由 Moni Shoua 提交于 7月 12, 2011

This patch fixes a kernel crash in cma_set_qkey().

When the link layer is Ethernet, it is wrong to use IPoIB port space
since no IPoIB interface is available.  Specifically, setting the
Q_Key when port space is RDMA_PS_IPOIB requires MGID calculation and
an SA query, which doesn't make sense over Ethernet.
Signed-off-by: NMoni Shoua <monis@mellanox.co.il>
Acked-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2efdd6a0

RDMA: Allow for NULL .modify_device() and .modify_port() methods · 10e1b54b

由 Bart Van Assche 提交于 6月 18, 2011

These methods don't make sense for iWARP devices, so rather than
forcing them to implement stubs, just return -ENOSYS in the core if
the hardware driver doesn't set .modify_device and/or .modify_port.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

10e1b54b

RDMA/cma: Avoid assigning an IS_ERR value to cm_id pointer in CMA id object · 0c9361fc

由 Jack Morgenstein 提交于 7月 17, 2011

Avoid assigning an IS_ERR value to the cm_id pointer.  This fixes a
few anomalies in the error flow due to confusion about checking for
NULL vs IS_ERR, and eliminates the need to test for the IS_ERR value
every time we wish to determine if the cma_id object has a cm device
associated with it.

Also, eliminate the now-unnecessary procedure cma_has_cm_dev (we can
check directly for the existence of the device pointer -- for a
non-NULL check, makes no difference if it is the iwarp or the ib
pointer).

Finally, make a few code changes here to improve coding consistency.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0c9361fc

18 7月, 2011 1 次提交
- D
  net: Abstract dst->neighbour accesses behind helpers. · 69cce1d1
  由 David S. Miller 提交于 7月 17, 2011
```
dst_{get,set}_neighbour()
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  69cce1d1
05 7月, 2011 1 次提交

RDMA: Check for NULL mode in .devnode methods · b2bc4782

由 Goldwyn Rodrigues 提交于 7月 04, 2011

Commits 71c29bd5 ("IB/uverbs: Add devnode method to set path/mode")
and c3af0980 ("IB: Add devnode methods to cm_class and umad_class")
added devnode methods that set the mode.

However, these methods don't check for a NULL mode, and so we get a
crash when unloading modules because devtmpfs_delete_node() calls
device_get_devnode() with mode == NULL.

Add the missing checks.
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
[ Also fix cm.c.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2bc4782

10 6月, 2011 1 次提交

rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679

由 Greg Rose 提交于 6月 10, 2011

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

c7ac8679

26 5月, 2011 4 次提交

RDMA/cma: Save PID of ID's owner · 83e9502d

由 Nir Muchtar 提交于 1月 13, 2011

Save the PID associated with an RDMA CM ID for reporting via netlink.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

83e9502d

RDMA/cma: Add support for netlink statistics export · 753f618a

由 Nir Muchtar 提交于 1月 03, 2011

Add callbacks and data types for statistics export of all current
devices/ids.  The schema for RDMA CM is a series of netlink messages.
Each one contains an rdma_cm_stat struct.  Additionally, two netlink
attributes are created for the addresses for each message (if
applicable).

Their types used are:
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR (The source address for this ID)
RDMA_NL_RDMA_CM_ATTR_DST_ADDR (The destination address for this ID)
sockaddr_* structs are encapsulated within these attributes.

In other words, every transaction contains a series of messages like:

-------message 1-------
struct rdma_cm_id_stats {
       __u32 qp_num;
       __u32 bound_dev_if;
       __u32 port_space;
       __s32 pid;
       __u8 cm_state;
       __u8 node_type;
       __u8 port_num;
       __u8 reserved;
}
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute - contains the source address
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute - contains the destination address
-------end 1-------
-------message 2-------
struct rdma_cm_id_stats
RDMA_NL_RDMA_CM_ATTR_SRC_ADDR attribute
RDMA_NL_RDMA_CM_ATTR_DST_ADDR attribute
-------end 2-------
Signed-off-by: NNir Muchtar <nirm@voltaire.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

753f618a

RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99

由 Sean Hefty 提交于 4月 01, 2010

The RDMA CM currently infers the QP type from the port space selected
by the user. In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type. For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b26f9b99

RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h> · 550e5ca7

由 Nir Muchtar 提交于 5月 20, 2011

Move cma.c's internal definition of enum cma_state to enum rdma_cm_state
in an exported header so that it can be exported via RDMA netlink.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

550e5ca7

24 5月, 2011 4 次提交

IB: Add devnode methods to cm_class and umad_class · c3af0980

由 Roland Dreier 提交于 5月 23, 2011

We want the ucmX, umadX and issmX device nodes to show up under
/dev/infiniband, and additionally ucmX should have mode 0666.  Add
appropriate devnode methods to their class structs for this.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c3af0980

IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required · c8367c4c

由 Ira Weiny 提交于 5月 19, 2011

We had a script which was looping through the devices returned from
ibstat and attempted to register a SMI agent on an ethernet device.
This caused a kernel panic for IBoE devices that don't have QP0.

Fix this by checking if the QP exists before using it.
Signed-off-by: NIra Weiny <weiny2@llnl.gov>
Acked-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c8367c4c

IB/uverbs: Add devnode method to set path/mode · 71c29bd5

由 Roland Dreier 提交于 5月 23, 2011

We want udev to create a device node under /dev/infiniband with
permission 0666 for uverbsX devices, so add a devnode method to set the
appropriate info.
Signed-off-by: NRoland Dreier <roland@purestorage.com>

71c29bd5

RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node · 04ea2f81

由 Roland Dreier 提交于 5月 23, 2011

We want udev to create a device node under /dev/infiniband with
permission 0666 for rdma_cm, so add that info to our struct miscdevice.
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Acked-by: NSean Hefty <sean.hefty@intel.com>

04ea2f81

21 5月, 2011 2 次提交

RDMA: Add netlink infrastructure · b2cbae2c

由 Roland Dreier 提交于 5月 20, 2011

Add basic RDMA netlink infrastructure that allows for registration of
RDMA clients for which data is to be exported and supplies message
construction callbacks.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>

[ Reorganize a few things, add CONFIG_NET dependency.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b2cbae2c

RDMA: Add error handling to ib_core_init() · fd75c789

由 Nir Muchtar 提交于 5月 20, 2011

Fail RDMA midlayer initialization if sysfs setup fails.
Signed-off-by: NNir Muchtar <nirm@voltaire.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

fd75c789

11 5月, 2011 1 次提交
- D
  infiniband: Remove rt->rt_src usage in addr4_resolve() · 5fc3590c
  由 David S. Miller 提交于 5月 09, 2011
```
Use an explicit flow key and fetch it from there.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  5fc3590c
10 5月, 2011 3 次提交

RDMA/iwcm: Get rid of enum iw_cm_event_status · d0c49bf3

由 Roland Dreier 提交于 5月 09, 2011

The IW_CM_EVENT_STATUS_xxx values were used in only a couple of places;
cma.c uses -Exxx values instead, and so do the amso1100, cxgb3 and cxgb4
drivers -- only nes was using the enum values (with the mild consequence
that all nes connection failures were treated as generic errors rather
than reported as timeouts or rejections).

We can fix this confusion by getting rid of enum iw_cm_event_status and
using a plain int for struct iw_cm_event.status, and converting nes to
use -Exxx as the other iWARP drivers do.

This also gets rid of the warning

drivers/infiniband/core/cma.c: In function 'cma_iw_handler':
drivers/infiniband/core/cma.c:1333:3: warning: case value '4294967185' not in enumerated type 'enum iw_cm_event_status'
drivers/infiniband/core/cma.c:1336:3: warning: case value '4294967186' not in enumerated type 'enum iw_cm_event_status'
drivers/infiniband/core/cma.c:1332:3: warning: case value '4294967192' not in enumerated type 'enum iw_cm_event_status'
Signed-off-by: NRoland Dreier <roland@purestorage.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Reviewed-by: NFaisal Latif <faisal.latif@intel.com>

d0c49bf3

RDMA/cma: Add an ID_REUSEADDR option · a9bb7912

由 Hefty, Sean 提交于 5月 09, 2011

Lustre requires that clients bind to a privileged port number before
connecting to a remote server.  On larger clusters (typically more
than about 1000 nodes), the number of privileged ports is exhausted,
resulting in lustre being unusable.

To handle this, we add support for reusable addresses to the rdma_cm.
This mimics the behavior of the socket option SO_REUSEADDR.  A user
may set an rdma_cm_id to reuse an address before calling
rdma_bind_addr() (explicitly or implicitly).  If set, other
rdma_cm_id's may be bound to the same address, provided that they all
have reuse enabled, and there are no active listens.

If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
will only succeed if there are no other id's bound to that same
address.  The reuse option is exported to user space.  The behavior of
the kernel reuse implementation was verified against that given by
sockets.

This patch is derived from a path by Ira Weiny <weiny2@llnl.gov>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

a9bb7912

RDMA/cma: Fix handling of IPv6 addressing in cma_use_port · 43b752da

由 Hefty, Sean 提交于 5月 09, 2011

cma_use_port() assumes that the sockaddr is an IPv4 address.  Since
IPv6 addressing is supported (and also to support other address
families) make the code more generic in its address handling.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

43b752da

19 3月, 2011 1 次提交

IB/mad: Improve an error message so error code is included · 1eba843d

由 Michael Heinz 提交于 2月 11, 2011

Signed-off-by: NMichael Heinz <michael.heinz@qlogic.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1eba843d

18 3月, 2011 1 次提交

RDMA/addr: Fix return of uninitialized ret value · 1bdd6384

由 Sean Hefty 提交于 3月 17, 2011

Commit b23dd4fe ("ipv4: Make output route lookup return rtable
directly") resulted in leaving ret uninitialized, where it may later
be returned.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1bdd6384

16 3月, 2011 4 次提交

RDMA/cma: Replace global lock in rdma_destroy_id() with id-specific one · a396d43a

由 Sean Hefty 提交于 2月 23, 2011

rdma_destroy_id currently uses the global rdma cm 'lock' to test if an
rdma_cm_id has been bound to a device.  This prevents an active
address resolution callback handler from assigning a device to the
rdma_cm_id after rdma_destroy_id checks for one.

Instead, we can replace the use of the global lock around the check to
the rdma_cm_id device pointer by setting the id state to destroying,
then flushing all active callbacks.  The latter is accomplished by
acquiring and releasing the handler_mutex.  Any active handler will
complete first, and any newly scheduled handlers will find the
rdma_cm_id in an invalid state.

In addition to optimizing the current locking scheme, the use of the
rdma_cm_id mutex is a more intuitive synchronization mechanism than
that of the global lock.  These changes are based on feedback from
Doug Ledford <dledford@redhat.com> while he was trying to debug a
crash in the rdma cm destroy path.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

a396d43a

IB/cm: Cancel pending LAP message when exiting IB_CM_ESTABLISH state · 8d8ac865

由 Sean Hefty 提交于 3月 03, 2011

This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:

	When destroying a cm_id from a context of a work queue and if
	the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
	release the reference of this id that was taken upon the send
	of the LAP message.  Otherwise, if the expected APR message
	gets lost, it is only after a long time that the reference
	will be released, while during that the work handler thread is
	not available to process other things.

It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISH state.  This occurs when
disconnecting - either sending or receiving a DREQ.  It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP.  Add checks and cancel any outstanding LAP
messages in these three cases.

Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni.  When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

8d8ac865

IB/cm: Bump reference count on cm_id before invoking callback · 29963437

由 Sean Hefty 提交于 2月 23, 2011

When processing a SIDR REQ, the ib_cm allocates a new cm_id.  The
refcount of the cm_id is initialized to 1.  However, cm_process_work
will decrement the refcount after invoking all callbacks.  The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.

If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id.  This can lead to a crash when the cm callback thread tries
to access the cm_id.

This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

29963437

RDMA/cma: Fix crash in request handlers · 25ae21a1

由 Sean Hefty 提交于 2月 23, 2011

Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS.  The crash has the following call trace:

    cm_process_work
       cma_req_handler
          cma_disable_callback
          rdma_create_id
             kzalloc
             init_completion
          cma_get_net_info
          cma_save_net_info
          cma_any_addr
             cma_zero_addr
          rdma_translate_ip
             rdma_copy_addr
          cma_acquire_dev
             rdma_addr_get_sgid
             ib_find_cached_gid
             cma_attach_to_dev
          ucma_event_handler
             kzalloc
             ib_copy_ah_attr_to_user
          cma_comp

[ preempted ]

    cma_write
        copy_from_user
        ucma_destroy_id
           copy_from_user
           _ucma_find_context
           ucma_put_ctx
           ucma_free_ctx
              rdma_destroy_id
                 cma_exch
                 cma_cancel_operation
                 rdma_node_get_transport

        rt_mutex_slowunlock
        bad_area_nosemaphore
        oops_enter

They were able to reproduce the crash multiple times with the
following details:

    Crash seems to always happen on the:
            mutex_unlock(&conn_id->handler_mutex);
    as conn_id looks to have been freed during this code path.

An examination of the code shows that a race exists in the request
handlers.  When a new connection request is received, the rdma_cm
allocates a new connection identifier.  This identifier has a single
reference count on it.  If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory.  However, the request
handlers may still be in the process of running.  When control returns
to the request handlers, they can attempt to access the newly created
identifiers.

Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

25ae21a1

13 3月, 2011 2 次提交

D
ipv6: Convert to use flowi6 where applicable. · 4c9483b2
由 David S. Miller 提交于 3月 12, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
4c9483b2

net: Put flowi_* prefix on AF independent members of struct flowi · 1d28f42c

由 David S. Miller 提交于 3月 12, 2011

I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d28f42c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功