提交 · b2853fd6c2d0f383dbdf7427e263eb576a633867 · openeuler / raspberrypi-kernel

02 4月, 2014 1 次提交

IB/core: Don't resolve passive side RoCE L2 address in CMA REQ handler · b2853fd6

由 Moni Shoua 提交于 3月 27, 2014

The code that resolves the passive side source MAC within the rdma_cm
connection request handler was both redundant and buggy, so remove it.

It was redundant since later, when an RC QP is modified to RTR state,
the resolution will take place in the ib_core module.  It was buggy
because this callback also deals with UD SIDR exchange, for which we
incorrectly looked at the REQ member of the CM event and dereferenced
a random value.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b2853fd6

23 1月, 2014 2 次提交

IB/core: Fix unused variable warning · 05633102

由 Or Gerlitz 提交于 11月 03, 2013

Fix the below "make W=1" build warning:

    drivers/infiniband/core/iwcm.c: In function ‘destroy_cm_id’:
    drivers/infiniband/core/iwcm.c:330: warning: variable ‘ret’ set but not used
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

05633102

RDMA/cma: Handle global/non-linklocal IPv6 addresses in cma_check_linklocal() · 5462eddd

由 Somnath Kotur 提交于 11月 16, 2013

If addr is not a linklocal address, the code incorrectly fails to
return and ends up assigning the scope ID to the scope id of the
address, which is wrong.  Fix by checking if it's a link local address
first, and immediately return 0 if not.
Signed-off-by: NSomnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5462eddd

20 1月, 2014 3 次提交

IB/cm: Fix missing unlock on error in cm_init_qp_rtr_attr() · 990acea6

由 Wei Yongjun 提交于 1月 16, 2014

Add the missing unlock before return from function cm_init_qp_rtr_attr()
in the error handling case.

Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

990acea6

IB/core: Make ib_addr a core IB module · 2f85d24e

由 Matan Barak 提交于 1月 16, 2014

IP based addressing introduces the usage of rdma_addr_find_dmac_by_grh()
within ib_core.  Since this function is declared in ib_addr, ib_addr
should be a part of the core INFINIBAND modules, rather than
INFINIBAND_ADDR_TRANS.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2f85d24e

IB/core: Resolve Ethernet L2 addresses when modifying QP · ed4c54e5

由 Or Gerlitz 提交于 12月 12, 2013

Existing user space applications provide only IBoE L3 address
attributes to the kernel when they issue a modify QP modify.  To work
with them and let such apps (plus kernel consumers which don't use the
RDMA-CM) keep working transparently under the IBoE GID IP addressing
changes, add an Eth L2 address resolution helper.
Signed-off-by: NMoni Shoua <monis@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ed4c54e5

19 1月, 2014 2 次提交

IB/cma: IBoE (RoCE) IP-based GID addressing · 7b85627b

由 Moni Shoua 提交于 12月 12, 2013

Currently, the IB core and specifically the RDMA-CM assumes that IBoE
(RoCE) gids encode related Ethernet netdevice interface MAC address
and possibly VLAN id.

Change GIDs to be treated as they encode interface IP address.

Since Ethernet layer 2 address parameters are not longer encoded
within gids, we have to extend the Infiniband address structures (e.g.
ib_ah_attr) with layer 2 address parameters, namely mac and vlan.
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7b85627b

IB/core: Add support for RDMA_NODE_USNIC_UDP · 5db5765e

由 Upinder Malhi 提交于 1月 15, 2014

Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5db5765e

15 1月, 2014 2 次提交

net: replace macros net_random and net_srandom with direct calls to prandom · 63862b5b

由 Aruna-Hewapathirane 提交于 1月 11, 2014

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: NAruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63862b5b

IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be

由 Matan Barak 提交于 12月 12, 2013

This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

dd5f03be

14 1月, 2014 1 次提交

IB/core: Add RDMA_TRANSPORT_USNIC_UDP · 248567f7

由 Upinder Malhi 提交于 1月 09, 2014

Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

248567f7

21 12月, 2013 7 次提交

IB/uverbs: Check access to userspace response buffer in extended command · 6cc3df84

由 Yann Droneaud 提交于 12月 11, 2013

This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...)
to ensure the whole buffer is in userspace memory before using the
pointer in uverbs functions. If the buffer or a subset of it is not
valid, returns -EFAULT to the caller.

This will also catch invalid buffer before the final call to
copy_to_user() which happen late in most uverb functions.

Just like the check in read(2) syscall, it's a sanity check to detect
invalid parameters provided by userspace. This particular check was added
in vfs_read() by Linus Torvalds for v2.6.12 with following commit message:

https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502

Make read/write always do the full "access_ok()" tests.

The actual user copy will do them too, but only for the
range that ends up being actually copied. That hides
bugs when the range has been clamped by file size or other
issues.

Note: there's no need to check input buffer since vfs_write() already does
access_ok(VERIFY_READ, ...) as part of write() syscall.

Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.comSigned-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6cc3df84

IB/uverbs: Check input length in flow steering uverbs · 6bcca3d4

由 Yann Droneaud 提交于 12月 11, 2013

Since ib_copy_from_udata() doesn't check yet the available input data
length before accessing userspace memory, an explicit check of this
length is required to prevent:

- reading past the user provided buffer,
- underflow when subtracting the expected command size from the input
  length.

This will ensure the newly added flow steering uverbs don't try to
process truncated commands.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6bcca3d4

IB/uverbs: Set error code when fail to consume all flow_spec items · 98a37510

由 Yann Droneaud 提交于 12月 11, 2013

If the flow_spec items parsed count does not match the number of items
declared in the flow_attr command, or if not all bytes are used for
flow_spec items (eg. trailing garbage), a log message is reported and
the function leave through the error path. Unfortunately the error
code is currently not set.

This patch set error code to -EINVAL in such cases, so that the error
is reported to userspace instead of silently fail.

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

98a37510

IB/uverbs: Check reserved fields in create_flow · c780d82a

由 Yann Droneaud 提交于 12月 11, 2013

As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c780d82a

IB/uverbs: Check comp_mask in destroy_flow · 2782c2d3

由 Yann Droneaud 提交于 12月 11, 2013

Just like the check added to create_flow in 22878dbc ("IB/core:
Better checking of userspace values for receive flow steering"),
comp_mask must be checked in destroy_flow too.

Since only empty comp_mask is currently supported, any other value
must be rejected.

This check was silently added in a previous patch[1] to move comp_mask
in extended command header, part of previous patchset[2] against
create/destroy_flow uverbs. The idea of moving comp_mask to the header
was discarded for the final patchset[3].

Unfortunately the check added in destroy_flow uverb was not integrated
in the final patchset.

[1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com
[2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com
[3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

Cc: Matan Barak <matanb@mellanox.com>
Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2782c2d3

IB/uverbs: Check reserved field in extended command header · 7efb1b19

由 Yann Droneaud 提交于 12月 11, 2013

As noted by Daniel Vetter in its article "Botching up ioctls"[1]

  "Check *all* unused fields and flags and all the padding for whether
   it's 0, and reject the ioctl if that's not the case.  Otherwise
   your nice plan for future extensions is going right down the
   gutters since someone *will* submit an ioctl struct with random
   stack garbage in the yet unused parts. Which then bakes in the ABI
   that those fields can never be used for anything else but garbage."

It's important to ensure that reserved fields are set to known value,
so that it will be possible to use them latter to extend the ABI.

The same reasonning apply to comp_mask field present in newer uverbs
command: per commit 22878dbc ("IB/core: Better checking of
userspace values for receive flow steering"), unsupported values in
comp_mask are rejected.

[1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7efb1b19

IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA() · a96e4e2f

由 Roland Dreier 提交于 12月 19, 2013

Trying to have a ternary operator to choose between NULL (or 0) and the
real pointer value in invocations leads to an impossible choice between
a sparse error about a literal 0 used as a NULL pointer, and a gcc
warning about "pointer/integer type mismatch in conditional expression."

Rather than clutter the source with more casts, move the ternary
operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it
easier to use and simplifies its callers.
Reported-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

a96e4e2f

17 12月, 2013 1 次提交

IB/core: const'ify inbuf in struct ib_udata · 309243ec

由 Yann Droneaud 提交于 12月 11, 2013

Userspace input buffer is not modified by kernel, so it can be 'const'.

This is also a prerequisite to remove the implicit cast
from INIT_UDATA().

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

309243ec

16 12月, 2013 1 次提交

RDMA/iwcm: Don't touch cm_id after deref in rem_ref · 6b59ba60

由 Steve Wise 提交于 11月 21, 2013

rem_ref() calls iwcm_deref_id(), which will wake up any blockers on
cm_id_priv->destroy_comp if the refcnt hits 0.  That will unblock
someone in iw_destroy_cm_id() which will free the cmid.  If that
happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY,
&cm_id_priv->flags), then the test_bit() will touch freed memory.

The fix is to read the bit first, then deref.  We should never be in
iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a
BUG_ON() to make sure of that.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6b59ba60

18 11月, 2013 6 次提交

IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4

由 Matan Barak 提交于 11月 06, 2013

This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
create_flow/destroy_flow uverbs").  Since the uverbs extensions
functionality was experimental for v3.12, this patch re-enables the
support for them and flow-steering for v3.13.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

69ad5da4

IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2

由 Yann Droneaud 提交于 11月 06, 2013

Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.

According to the commit 400dbc96, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.

But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].

So the following patch is an attempt to a revised extensible command
infrastructure.

This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.

Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.

Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).

So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).

The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.

Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.

Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:

                             legacy      extended

   Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
  Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".

One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.

The proposed scheme will format input (command) and output (response)
buffers this way:

- command:

  legacy header +
  extended header +
  command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.

Note:

The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility).  This was suggested by Roland Dreier in a previous
review[2].  But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.

[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

[3]:
http://marc.info/?i=525C1149.6000701@mellanox.comSigned-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

[ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f21519b2

IB/core: Remove ib_uverbs_flow_spec structure from userspace · 2490f20b

由 Yann Droneaud 提交于 11月 06, 2013

The structure holding any types of flow_spec is of no use to
userspace.  It would be wrong for userspace to do:

  struct ib_uverbs_flow_spec flow_spec;

  flow_spec.type = IB_FLOW_SPEC_TCP;
  flow_spec.size = sizeof(flow_spec);

Instead, userspace should use the dedicated flow_spec structure for
  - Ethernet : struct ib_uverbs_flow_spec_eth,
  - IPv4     : struct ib_uverbs_flow_spec_ipv4,
  - TCP/UDP  : struct ib_uverbs_flow_spec_tcp_udp.

In other words, struct ib_uverbs_flow_spec is a "virtual" data
structure that can only be use by the kernel as an alias to the other.
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.comSigned-off-by: NRoland Dreier <roland@purestorage.com>

2490f20b

IB/core: Make uverbs flow structure use names like verbs ones · b68c9560

由 Yann Droneaud 提交于 11月 06, 2013

This patch adds "flow" prefix to most of data structure added as part
of commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
uverbs") to keep those names in sync with the data structures added in
commit 319a441d ("IB/core: Add receive flow steering support").

It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.comSigned-off-by: NRoland Dreier <roland@purestorage.com>

b68c9560

IB/core: Rename 'flow' structs to match other uverbs structs · d82693da

由 Yann Droneaud 提交于 11月 06, 2013

Commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
uverbs") added public data structures to support receive flow
steering. The new structs are not following the 'uverbs' pattern:
they're lacking the common prefix 'ib_uverbs'.

This patch replaces ib_kern prefix by ib_uverbs.
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.comSigned-off-by: NRoland Dreier <roland@purestorage.com>

d82693da

IB/core: clarify overflow/underflow checks on ib_create/destroy_flow · f8848274

由 Matan Barak 提交于 11月 06, 2013

This patch fixes the following issues:

1. Unneeded checks were removed

2. Removed the fixed size out of flow_attr.size, thus simplifying the checks.

3. Remove a 32bit hole on 64bit systems with strict alignment in
   struct ib_kern_flow_att by adding a reserved field.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f8848274

17 11月, 2013 2 次提交

IB/ucma: Convert use of typedef ctl_table to struct ctl_table · f3a5e3e3

由 Joe Perches 提交于 10月 22, 2013

This typedef is unnecessary and should just be removed.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f3a5e3e3

IB/cm: Convert to using idr_alloc_cyclic() · ab626d1a

由 Zhao Hongjiang 提交于 11月 15, 2013

Commit 3e6628c4 ("idr: introduce idr_alloc_cyclic()") adds a new
idr_alloc_cyclic() routine and converts several of these users to it.
This is just a missed one - add it.
Signed-off-by: NZhao Hongjiang <zhaohongjiang@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ab626d1a

16 11月, 2013 1 次提交

IB/core: Encorce MR access rights rules on kernel consumers · 1c636f80

由 Eli Cohen 提交于 10月 31, 2013

Enforce the rule that when requesting remote write or atomic permissions, local
write must be indicated as well. See IB spec 11.2.8.2.

Spotted by: Hagay Abramovsky <hagaya@mellanox.com>
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1c636f80

12 11月, 2013 2 次提交

RDMA/cma: Remove unused argument and minor dead code · 352b9056

由 Michal Nazarewicz 提交于 11月 10, 2013

The dev variable is never assigned after being initialised.
Signed-off-by: NMichal Nazarewicz <mina86@mina86.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

352b9056

RDMA/ucma: Discard events for IDs not yet claimed by user space · c6b21824

由 Sean Hefty 提交于 11月 01, 2013

Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:

It looks like we are triggering a bug in RDMA CM/UCM interaction.
The bug specifically hits when we have an incoming connection
request and the connecting process dies BEFORE the passive end of
the connection can process the request i.e. it does not call
rdma_get_cm_event() to retrieve the initial connection event. We
were able to triage this further and have some additional
information now.

In the example below when P1 dies after issuing a connect request
as the CM id is being destroyed all outstanding connects (to P2)
are sent a reject message. We see this reject message being
received on the passive end and the appropriate CM ID created for
the initial connection message being retrieved in cm_match_req().
The problem is in the ucma_event_handler() code when this reject
message is delivered to it and the initial connect message itself
HAS NOT been delivered to the client. In fact the client has not
even called rdma_cm_get_event() at this stage so we haven't
allocated a new ctx in ucma_get_event() and updated the new
connection CM_ID to point to the new UCMA context.

This results in the reject message not being dropped in
ucma_event_handler() for the new connection request as the
(if (!ctx->uid)) block is skipped since the ctx it refers to is
the listen CM id context which does have a valid UID associated
with it (I believe the new CMID for the connection initially
uses the listen CMID -> context when it is created in
cma_new_conn_id). Thus the assumption that new events for a
connection can get dropped in ucma_event_handler() is incorrect
IF the initial connect request has not been retrieved in the
first case. We end up getting a CM Reject event on the listen CM
ID and our upper layer code asserts (in fact this event does not
even have the listen_id set as that only gets set up librdmacm
for connect requests).

The solution is to verify that the cm_id being reported in the event
is the same as the cm_id referenced by the ucma context. A mismatch
indicates that the ucma context corresponds to the listen. This fix
was validated by using a modified version of librdmacm that was able
to verify the problem and see that the reject message was indeed
dropped after this patch was applied.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c6b21824

09 11月, 2013 5 次提交

IB/core: Add Cisco usNIC rdma node and transport types · 180771a3

由 Upinder Malhi \(umalhi\) 提交于 9月 10, 2013

This patch adds new rdma node and new rdma transport, and supporting
code used by Cisco's low latency driver called usNIC.  usNIC uses its
own transport, distinct from IB and iWARP.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NJeff Squyres <jsquyres@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

180771a3

IB/netlink: Remove superfluous RDMA_NL_GET_OP() masking · 5476781b

由 Mathias Krause 提交于 9月 30, 2013

'op' is the already RDMA_NL_GET_OP() masked 'type'.  No need to mask it again.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Reviewed-by: NYann Droneaud <ydroneaud@opteya.com>
Acked-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5476781b

IB/core: Pass imm_data from ib_uverbs_send_wr to ib_send_wr correctly · 6b7d103c

由 Latchesar Ionkov 提交于 10月 19, 2013

Currently, we don't copy the immediate data from the userspace struct
to the kernel one when UD messages are being sent.

This patch makes sure that the immediate data is set correctly.
Signed-off-by: NLatchesar Ionkov <lucho@ionkov.net>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

6b7d103c

IB/cma: Check for GID on listening device first · be9130cc

由 Doug Ledford 提交于 9月 24, 2013

As a simple optimization that should speed up the vast majority of
connect attemps on IB devices, when we are searching for the GID of an
incoming connection in the cached GID lists of devices, search the
device that received the incoming connection request first.  If we
don't find it there, then move on to other devices.

This reduces the time to perform 10,000 connections considerably.
Prior to this patch, a bad run of cmtime would look like this:

connect      :    12399.26   12351.10    8609.00    1239.93

With this patch, it looks more like this:

connect      :     5864.86    5799.80    8876.00     586.49
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

be9130cc

IB/cma: Use cached gids · 29f27e84

由 Doug Ledford 提交于 9月 24, 2013

The cma_acquire_dev function was changed by commit 3c86aa70
("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
because multiport devices might have either IB or IBoE formatted gids.
The old function assumed that all ports on the same device used the
same GID format.

However, when it was changed to use find_gid_port(), we inadvertently
lost usage of the GID cache.  This turned out to be a very costly
change.  In our testing, each iteration through each index of the GID
table takes roughly 35us.  When you have multiple devices in a system,
and the GID you are looking for is on one of the later devices, the
code loops through all of the GID indexes on all of the early devices
before it finally succeeds on the target device.  This pathological
search behavior combined with 35us per GID table index retrieval
results in results such as the following from the cmtime application
that's part of the latest librdmacm git repo:

ib1:
step              total ms     max ms     min us  us / conn
create id    :       29.42       0.04       1.00       2.94
bind addr    :   186705.66      19.00   18556.00   18670.57
resolve addr :       41.93       9.68     619.00       4.19
resolve route:      486.93       0.48     101.00      48.69
create qp    :     4021.95       6.18     330.00     402.20
connect      :    68350.39   68588.17   24632.00    6835.04
disconnect   :     1460.43     252.65-1862269.00     146.04
destroy      :       41.16       0.04       2.00       4.12

ib0:
step              total ms     max ms     min us  us / conn
create id    :       28.61       0.68       1.00       2.86
bind addr    :     2178.86       2.95     201.00     217.89
resolve addr :       51.26      16.85     845.00       5.13
resolve route:      620.08       0.43      92.00      62.01
create qp    :     3344.40       6.36     273.00     334.44
connect      :     6435.99    6368.53    7844.00     643.60
disconnect   :     5095.38     321.90     757.00     509.54
destroy      :       37.13       0.02       2.00       3.71

Clearly, both the bind address and connect operations suffer
a huge penalty for being anything other than the default
GID on the first port in the system.

After applying this patch, the numbers now look like this:

ib1:
step              total ms     max ms     min us  us / conn
create id    :       30.15       0.03       1.00       3.01
bind addr    :       80.27       0.04       7.00       8.03
resolve addr :       43.02      13.53     589.00       4.30
resolve route:      482.90       0.45     100.00      48.29
create qp    :     3986.55       5.80     330.00     398.66
connect      :     7141.53    7051.29    5005.00     714.15
disconnect   :     5038.85     193.63     918.00     503.88
destroy      :       37.02       0.04       2.00       3.70

ib0:
step              total ms     max ms     min us  us / conn
create id    :       34.27       0.05       1.00       3.43
bind addr    :       26.45       0.04       1.00       2.64
resolve addr :       38.25      10.54     760.00       3.82
resolve route:      604.79       0.43      97.00      60.48
create qp    :     3314.95       6.34     273.00     331.49
connect      :    12399.26   12351.10    8609.00    1239.93
disconnect   :     5096.76     270.72    1015.00     509.68
destroy      :       37.10       0.03       2.00       3.71

It's worth noting that we still suffer a bit of a penalty on
connect to the wrong device, but the penalty is much less than
it used to be.  Follow on patches deal with this penalty.

Many thanks to Neil Horman for helping to track the source of
slow function that allowed us to track down the fact that
the original patch I mentioned above backed out cache usage
and identify just how much that impacted the system.
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

29f27e84

08 11月, 2013 1 次提交

RDMA/cma: Set IBoE SL (user-priority) by egress map when using vlans · eb072c4b

由 Eyal Perry 提交于 11月 06, 2013

On top of commit 366cddb4 "IB/rdma_cm: TOS <=> UP mapping for IBoE", add
support for case vlan egress map is used.

When the IBoE session is being set over a vlan, inherit the socket priority
to vlan priority mapping which was configured for the vlan device egress map.
Signed-off-by: NEyal Perry <eyalpe@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb072c4b

22 10月, 2013 1 次提交

IB/core: Temporarily disable create_flow/destroy_flow uverbs · 7afbddfa

由 Yann Droneaud 提交于 10月 10, 2013

The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.

So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).

The feature will be enabled after proper cleanup for v3.13.
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

[ Add a Kconfig option to reenable these verbs.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7afbddfa

01 10月, 2013 1 次提交

net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 · 0bbf87d8

由 Eric W. Biederman 提交于 9月 28, 2013

- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
  of the callers.
- Move the initialization of sysctl_local_ports into
   sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
  This patch now successfully passes an allmodconfig build.
  Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: NSamya <samya@twitter.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bbf87d8

03 9月, 2013 1 次提交

IB/core: Better checking of userspace values for receive flow steering · 22878dbc

由 Matan Barak 提交于 9月 01, 2013

  - Don't allow unsupported comp_mask values, user should check
    ibv_query_device to know which features are supported.
  - Add a check in ib_uverbs_create_flow() to verify the size passed
    from the user space.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

22878dbc