提交 · 17cd3a2db825506c3e3bb9548ad20f67e2f8d0e7 · openanolis / cloud-kernel

08 3月, 2014 1 次提交

IB/core: Introduce protected memory regions · 17cd3a2d

由 Sagi Grimberg 提交于 2月 23, 2014

This commit introduces verbs for creating/destoying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
of more) pre-registered memory regions in a specific layout.
The Indirect region may potentialy describe several regions
and some repitition format between them.

Protected Memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
regions creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

17cd3a2d

14 2月, 2014 1 次提交

IB: Report using RoCE IP based gids in port caps · b4a26a27

由 Moni Shoua 提交于 2月 09, 2014

For userspace RoCE UD QPs we need to know the GID format that the
kernel uses, e.g when working over older kernels. For that end, add a
new port capability IB_PORT_IP_BASED_GIDS and report it when query
port is issued.
Signed-off-by: NMoni Shoua <monis@mellanox.co.il>
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b4a26a27

19 1月, 2014 1 次提交

IB/core: Add support for RDMA_NODE_USNIC_UDP · 5db5765e

由 Upinder Malhi 提交于 1月 15, 2014

Add the complementary RDMA_NODE_USNIC_UDP for RDMA_TRANSPORT_USNIC_UDP.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

5db5765e

15 1月, 2014 3 次提交

IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be

由 Matan Barak 提交于 12月 12, 2013

This patch add the support for Ethernet L2 attributes in the
verbs/cm/cma structures.

When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.

Thus, those attributes were added to the following structures:

* ib_ah_attr - added dmac
* ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
* ib_wc - added smac, vlan_id
* ib_sa_path_rec - added smac, dmac, vlan_id
* cm_av - added smac and vlan_id

For the path record structure, extra care was taken to avoid the new
fields when packing it into wire format, so we don't break the IB CM
and SA wire protocol.

On the active side, the CM fills. its internal structures from the
path provided by the ULP.  We add there taking the ETH L2 attributes
and placing them into the CM Address Handle (struct cm_av).

On the passive side, the CM fills its internal structures from the WC
associated with the REQ message.  We add there taking the ETH L2
attributes from the WC.

When the HW driver provides the required ETH L2 attributes in the WC,
they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
code checks for the presence of these flags, and in their absence does
address resolution from the ib_init_ah_from_wc() helper function.

ib_modify_qp_is_ok is also updated to consider the link layer. Some
parameters are mandatory for Ethernet link layer, while they are
irrelevant for IB.  Vendor drivers are modified to support the new
function signature.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

dd5f03be

IB/core: Add support for IB L2 device-managed steering · 240ae00e

由 Matan Barak 提交于 11月 07, 2013

This patch adds preliminary support for IB L2 device-managed steering,
currently exposed only in the kernel.

This flow spec can be used by low-level drivers that need to indicate
the link layer type when creating device-managed flow rules.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

240ae00e

IB/core: Add flow steering support for IPoIB UD traffic · 90f1d1b4

由 Matan Barak 提交于 11月 07, 2013

When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering. This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a similar manner done with Ethernet RAW_PACKET QPs.

The hint is provided through new QP creation flag called NETIF_QP.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

90f1d1b4

14 1月, 2014 1 次提交

IB/core: Add RDMA_TRANSPORT_USNIC_UDP · 248567f7

由 Upinder Malhi 提交于 1月 09, 2014

Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

248567f7

17 12月, 2013 1 次提交

IB/core: const'ify inbuf in struct ib_udata · 309243ec

由 Yann Droneaud 提交于 12月 11, 2013

Userspace input buffer is not modified by kernel, so it can be 'const'.

This is also a prerequisite to remove the implicit cast
from INIT_UDATA().

Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

309243ec

18 11月, 2013 1 次提交

IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2

由 Yann Droneaud 提交于 11月 06, 2013

Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.

According to the commit 400dbc96, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issue commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.

But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].

So the following patch is an attempt to a revised extensible command
infrastructure.

This improved extensible command infrastructure distinguish between
core (eg. legacy)'s command/response buffers from provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.

Having those buffers identified separately make it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response parts: This should make
the extended functions more reliable.

Additionally, instead of relying on command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure rely on
unused bits in command field: on the 32 bits provided by command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).

So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).

The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.

Using high order bits of the commands field ensure that newer
libibverbs on older kernel will properly fail when trying to call
extended commands. On the other hand, older libibverbs on newer kernel
will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.

Additionally the extended header ensure 64bits alignment, while making
all sizes multiple of 8 bytes, extending the maximum buffer size:

                             legacy      extended

   Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
  Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the headers
size are no more taken in account in "in_words".

One of the odds of the current extensible infrastructure, reading
twice the "legacy" command header, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command: memory is read once and
information are not duplicated: it's making clear that's an extended
command scheme and not a different command scheme.

The proposed scheme will format input (command) and output (response)
buffers this way:

- command:

  legacy header +
  extended header +
  command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+

The overall design is to ensure that the extensible infrastructure is
itself extensible while begin more reliable with more input and bound
checking.

Note:

The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility).  This was suggested by Roland Dreier in a previous
review[2].  But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.

[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

[3]:
http://marc.info/?i=525C1149.6000701@mellanox.comSigned-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

[ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f21519b2

16 11月, 2013 1 次提交

IB/core: Encorce MR access rights rules on kernel consumers · 1c636f80

由 Eli Cohen 提交于 10月 31, 2013

Enforce the rule that when requesting remote write or atomic permissions, local
write must be indicated as well. See IB spec 11.2.8.2.

Spotted by: Hagay Abramovsky <hagaya@mellanox.com>
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

1c636f80

09 11月, 2013 1 次提交

IB/core: Add Cisco usNIC rdma node and transport types · 180771a3

由 Upinder Malhi \(umalhi\) 提交于 9月 10, 2013

This patch adds new rdma node and new rdma transport, and supporting
code used by Cisco's low latency driver called usNIC.  usNIC uses its
own transport, distinct from IB and iWARP.
Signed-off-by: NUpinder Malhi <umalhi@cisco.com>
Signed-off-by: NJeff Squyres <jsquyres@cisco.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

180771a3

03 9月, 2013 1 次提交

IB/core: Better checking of userspace values for receive flow steering · 22878dbc

由 Matan Barak 提交于 9月 01, 2013

  - Don't allow unsupported comp_mask values, user should check
    ibv_query_device to know which features are supported.
  - Add a check in ib_uverbs_create_flow() to verify the size passed
    from the user space.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

22878dbc

29 8月, 2013 2 次提交

IB/core: Export ib_create/destroy_flow through uverbs · 436f2ad0

由 Hadar Hen Zion 提交于 8月 14, 2013

Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
support flow steering for user space applications.
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

436f2ad0

IB/core: Add receive flow steering support · 319a441d

由 Hadar Hen Zion 提交于 8月 07, 2013

The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
QPs, which receive plain Ethernet packets, specifically packets that
don't carry any QPN to be matched by the receiving side.  Applications
using these QPs must be provided with a method to program some
steering rule with the HW so packets arriving at the local port can be
routed to them.

This patch adds ib_create_flow(), which allow providing a flow
specification for a QP.  When there's a match between the
specification and a received packet, the packet is forwarded to that
QP, in a the same way one uses ib_attach_multicast() for IB UD
multicast handling.

Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers.  Currently specs for Ethernet, IPv4,
TCP and UDP are defined.  Flow specs are made of values and masks.

The input to ib_create_flow() is a struct ib_flow_attr, which contains
a few mandatory control elements and optional flow specs.

    struct ib_flow_attr {
            enum ib_flow_attr_type type;
            u16      size;
            u16      priority;
            u32      flags;
            u8       num_of_specs;
            u8       port;
            /* Following are the optional layers according to user request
             * struct ib_flow_spec_yyy
             * struct ib_flow_spec_zzz
             */
    };

As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, just with a little API enhancement which defines the newly added spec.

The flow spec structures are defined with TLV (Type-Length-Value)
entries, which allows calling ib_create_flow() with a list of variable
length of optional specs.

For the actual processing of ib_flow_attr the driver uses the number
of specs and the size mandatory fields along with the TLV nature of
the specs.

Steering rules processing order is according to the domain over which
the rule is set and the rule priority.  All rules set by user space
applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains
could be used by future IPoIB RFS and Ethetool flow-steering interface
implementation.  Lower numerical value for the priority field means
higher priority.

The returned value from ib_create_flow() is a struct ib_flow, which
contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow().

Applications that offload TCP/IP traffic can also be written over IB
UD QPs.  The ib_create_flow() / ib_destroy_flow() API is designed to
support UD QPs too.  A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
to denote support for flow steering.

The ib_flow_attr enum type supports usage of flow steering for promiscuous
and sniffer purposes:

    IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification

    IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
        all Ethernet traffic which isn't steered to any QP

    IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast

    IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic

ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.
Signed-off-by: NHadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

319a441d

14 8月, 2013 1 次提交

IB/core: Add locking around event dispatching on XRC target QPs · 73c40c61

由 Yishai Hadas 提交于 8月 01, 2013

Fix a potential race when event occurrs on a target XRC QP and in the
middle of reporting that on its shared qps, one of them is destroyed
by user space application.  Also add note for kernel consumers in
ib_verbs.h that they must not destroy the QP from within the handler.
Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

73c40c61

08 7月, 2013 1 次提交

IB/core: Add reserved values to enums for low-level driver use · 0134f16b

由 Jack Morgenstein 提交于 7月 07, 2013

Continue the approach taken by commit d2b57063 ("IB/core: Reserve
bits in enum ib_qp_create_flags for low-level driver use") and add
reserved entries to the ib_qp_type and ib_wr_opcode enums.  Low-level
drivers can then define macros to use these reserved values, giving
proper names to the macros for readability.  Also add a range of
reserved flags to enum ib_send_flags.

The mlx5 IB driver uses the new additions.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0134f16b

22 2月, 2013 1 次提交

IB/core: Add "type 2" memory windows support · 7083e42e

由 Shani Michaeli 提交于 2月 06, 2013

This patch enhances the IB core support for Memory Windows (MWs).

MWs allow an application to have better/flexible control over remote
access to memory.

Two types of MWs are supported, with the second type having two flavors:

    Type 1  - associated with PD only
    Type 2A - associated with QPN only
    Type 2B - associated with PD and QPN

Applications can allocate a MW once, and then repeatedly bind the MW
to different ranges in MRs that are associated to the same PD. Type 1
windows are bound through a verb, while type 2 windows are bound by
posting a work request.

The 32-bit memory key is composed of a 24-bit index and an 8-bit
key. The key is changed with each bind, thus allowing more control
over the peer's use of the memory key.

The changes introduced are the following:

* add memory window type enum and a corresponding parameter to ib_alloc_mw.
* type 2 memory window bind work request support.
* create a struct that contains the common part of the bind verb struct
  ibv_mw_bind and the bind work request into a single struct.
* add the ib_inc_rkey helper function to advance the tag part of an rkey.

Consumer interface details:

* new device capability flags IB_DEVICE_MEM_WINDOW_TYPE_2A and
  IB_DEVICE_MEM_WINDOW_TYPE_2B are added to indicate device support
  for these features.

  Devices can set either IB_DEVICE_MEM_WINDOW_TYPE_2A or
  IB_DEVICE_MEM_WINDOW_TYPE_2B if it supports type 2A or type 2B
  memory windows. It can set neither to indicate it doesn't support
  type 2 windows at all.

* modify existing provides and consumers code to the new param of
  ib_alloc_mw and the ib_mw_bind_info structure
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NShani Michaeli <shanim@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7083e42e

01 10月, 2012 1 次提交

IB/core: Reserve bits in enum ib_qp_create_flags for low-level driver use · d2b57063

由 Jack Morgenstein 提交于 8月 03, 2012

Reserve bits 26-31 for internal use by low-level drivers. Two such
bits are used in the mlx4_b driver SR-IOV implementation.

These enum additions guarantee that the core layer will never use
these bits, so that low level drivers may safely make use of them.
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d2b57063

09 5月, 2012 2 次提交

IB/core: Add raw packet QP type · c938a616

由 Or Gerlitz 提交于 3月 01, 2012

IB_QPT_RAW_PACKET allows applications to build a complete packet,
including L2 headers, when sending; on the receive side, the HW will
not strip any headers.

This QP type is designed for userspace direct access to Ethernet; for
example by applications that do TCP/IP themselves.  Only processes
with the NET_RAW capability are allowed to create raw packet QPs (the
name "raw packet QP" is supposed to suggest an analogy to AF_PACKET /
SOL_RAW sockets).
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c938a616

IB/core: Use qp->usecnt to track multicast attach/detach · c3bccbfb

由 Or Gerlitz 提交于 4月 29, 2012

Just as we don't allow PDs, CQs, etc. to be destroyed if there are QPs
that are attached to them, don't let a QP be destroyed if there are
multicast group(s) attached to it.  Use the existing usecnt field of
struct ib_qp which was added by commit 0e0ec7e0 ("RDMA/core: Export
ib_open_qp() to share XRC TGT QPs") to track this.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c3bccbfb

09 3月, 2012 1 次提交

IB: Change CQE "csum_ok" field to a bit flag · d927d505

由 Or Gerlitz 提交于 1月 11, 2012

Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding.  However, it will allow to add more fields in the future
without enlarging the struct.  Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.

This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d927d505

06 3月, 2012 1 次提交

IB: Use central enum for speed instead of hard-coded values · 2e96691c

由 Or Gerlitz 提交于 2月 28, 2012

The kernel IB stack uses one enumeration for IB speed, which wasn't
explicitly specified in the verbs header file.  Add that enum, and use
it all over the code.

The IB speed/width notation is also used by iWARP and IBoE HW drivers,
which use the convention of rate = speed * width to advertise their
port link rate.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2e96691c

14 10月, 2011 6 次提交

RDMA/core: Export ib_open_qp() to share XRC TGT QPs · 0e0ec7e0

由 Sean Hefty 提交于 8月 08, 2011

XRC TGT QPs are shared resources among multiple processes.  Since the
creating process may exit, allow other processes which share the same
XRC domain to open an existing QP.  This allows us to transfer
ownership of an XRC TGT QP to another process.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

0e0ec7e0

RDMA/uverbs: Export XRC domains to user space · 53d0bd1e

由 Sean Hefty 提交于 5月 24, 2011

Allow user space to create XRC domains. Because XRCDs are expected to
be shared among multiple processes, we use inodes to identify an XRCD.

Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

53d0bd1e

RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCD · d3d72d90

由 Sean Hefty 提交于 5月 26, 2011

XRC TGT QPs are intended to be shared among multiple users and
processes.  Allow the destruction of an XRC TGT QP to be done explicitly
through ib_destroy_qp() or when the XRCD is destroyed.

To support destroying an XRC TGT QP, we need to track TGT QPs with the
XRCD.  When the XRCD is destroyed, all tracked XRC TGT QPs are also
cleaned up.

To avoid stale reference issues, if a user is holding a reference on a
TGT QP, we increment a reference count on the QP.  The user releases the
reference by calling ib_release_qp.  This releases any access to the QP
from a user above verbs, but allows the QP to continue to exist until
destroyed by the XRCD.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

d3d72d90

RDMA/core: Add XRC QPs · b42b63cf

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC communication is between an initiator (INI) QP and a target (TGT)
QP.  Target QPs are associated with SRQs through an XRCD.  An XRC TGT QP
behaves like a receive-only RD QP.  XRC INI QPs behave similarly to RC
QPs, except that work requests posted to an XRC INI QP must specify the
remote SRQ that is the target of the work request.

We define two new QP types for XRC, to distinguish between INI and TGT
QPs, and update the core layer to support XRC QPs.

This patch is derived from work by Jack Morgenstein
<jackm@dev.mellanox.co.il>
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

b42b63cf

RDMA/core: Add XRC SRQ type · 418d5130

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

XRC defines SRQs that are specifically used by XRC connections.  Expand
the SRQ code to support XRC SRQs.  An XRC SRQ is currently restricted to
only XRC use according to the IB XRC Annex.

Portions of this patch were derived from work by
Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

418d5130

RDMA/core: Add SRQ type field · 96104eda

由 Sean Hefty 提交于 5月 23, 2011

Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

96104eda

13 10月, 2011 1 次提交

RDMA/core: Add XRC domain support · 59991f94

由 Sean Hefty 提交于 5月 23, 2011

XRC ("eXtended reliable connected") is an IB transport that provides
better scalability by allowing senders to specify which shared receive
queue (SRQ) should be used to receive a message, which essentially
allows one transport context (QP connection) to serve multiple
destinations (as long as they share an adapter, of course).

A few new concepts are introduced to support this.  This patch adds:

 - A new device capability flag, IB_DEVICE_XRC, which low-level
   drivers set to indicate that a device supports XRC.
 - A new object type, XRC domains (struct ib_xrcd), and new verbs
   ib_alloc_xrcd()/ib_dealloc_xrcd().  XRCDs are used to limit which
   XRC SRQs an incoming message can target.

This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

59991f94

12 10月, 2011 1 次提交

IB: Add new InfiniBand link speeds · 71eeba16

由 Marcel Apfelbaum 提交于 10月 05, 2011

Introduce support for the following extended speeds:

FDR-10: a Mellanox proprietary link speed which is 10.3125 Gbps with
        64b/66b encoding rather than 8b/10b encoding.
FDR:    IBA extended speed 14.0625 Gbps.
EDR:    IBA extended speed 25.78125 Gbps.
Signed-off-by: NMarcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: NHal Rosenstock <hal@mellanox.com>
Reviewed-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

71eeba16

27 7月, 2011 1 次提交

atomic: use <linux/atomic.h> · 60063497

由 Arun Sharma 提交于 7月 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: NArun Sharma <asharma@fb.com>
Reviewed-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

60063497

19 7月, 2011 1 次提交

IB/core: Add GID change event · 761d90ed

由 Or Gerlitz 提交于 6月 15, 2011

Add IB GID change event type.  This is needed for IBoE when the HW
driver updates the GID (e.g when new VLANs are added/deleted) table
and the change should be reflected to the IB core cache.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

761d90ed

17 1月, 2011 1 次提交

RDMA: Update workqueue usage · f0626710

由 Tejun Heo 提交于 10月 19, 2010

* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

f0626710

28 9月, 2010 1 次提交

IB/core: Add link layer property to ports · a3f5adaf

由 Eli Cohen 提交于 9月 27, 2010

This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET. This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support. For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

a3f5adaf

05 8月, 2010 1 次提交

IB: Rename RAW_ETY to RAW_ETHERTYPE · a2ebf07a

由 Aleksey Senin 提交于 7月 04, 2010

Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
the special QP type easier to understand.

cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.htmlSigned-off-by: NAleksey Senin <alekseys@voltaire.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

a2ebf07a

22 5月, 2010 1 次提交

IB/core: Allow device-specific per-port sysfs files · 9a6edb60

由 Ralph Campbell 提交于 5月 06, 2010

Add a new parameter to ib_register_device() so that low-level device
drivers can pass in a pointer to a callback function that will be
called for each port that is registered in sysfs.  This allows
low-level device drivers to create files in

    /sys/class/infiniband/<hca>/ports/<N>/

without having to poke through the internals of the RDMA sysfs handling.

There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.
Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

9a6edb60

22 4月, 2010 1 次提交

IB/core: Add support for masked atomic operations · 5e80ba8f

由 Vladimir Sokolovsky 提交于 4月 14, 2010

 - Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
   send opcodes that can be used to post "masked atomic compare and
   swap" and "masked atomic fetch and add" work request respectively.
 - Add masked_atomic_cap capability.
 - Add mask fields to atomic struct of ib_send_wr
 - Add new opcodes to ib_wc_opcode

The new operations are described more precisely below:

* Masked Compare and Swap (MskCmpSwap)

The MskCmpSwap atomic operation is an extension to the CmpSwap
operation defined in the IB spec.  MskCmpSwap allows the user to
select a portion of the 64 bit target data for the “compare” check as
well as to restrict the swap to a (possibly different) portion.  The
pseudo code below describes the operation:

| atomic_response = *va
| if (!((compare_add ^ *va) & compare_add_mask)) then
|     *va = (*va & ~(swap_mask)) | (swap & swap_mask)
|
| return atomic_response

The additional operands are carried in the Extended Transport Header.
Atomic response generation and packet format for MskCmpSwap is as for
standard IB Atomic operations.

* Masked Fetch and Add (MFetchAdd)

The MFetchAdd Atomic operation extends the functionality of the
standard IB FetchAdd by allowing the user to split the target into
multiple fields of selectable length. The atomic add is done
independently on each one of this fields. A bit set in the
field_boundary parameter specifies the field boundaries. The pseudo
code below describes the operation:

| bit_adder(ci, b1, b2, *co)
| {
|	value = ci + b1 + b2
|	*co = !!(value & 2)
|
|	return value & 1
| }
|
| #define MASK_IS_SET(mask, attr)      (!!((mask)&(attr)))
| bit_position = 1
| carry = 0
| atomic_response = 0
|
| for i = 0 to 63
| {
|         if ( i != 0 )
|                 bit_position =  bit_position << 1
|
|         bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
|                                 MASK_IS_SET(compare_add, bit_position), &new_carry)
|         if (bit_add_res)
|                 atomic_response |= bit_position
|
|         carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
| }
|
| return atomic_response
Signed-off-by: NVladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

5e80ba8f

25 2月, 2010 1 次提交

IB/core: Pack struct ib_device a little tighter · 17a55f79

由 Alexander Chiang 提交于 2月 02, 2010

A small change to reduce the size of ib_device to 1112 bytes
(from 1128).
Signed-off-by: NAlex Chiang <achiang@hp.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

17a55f79

10 12月, 2009 1 次提交

IB: Clarify the documentation of ib_post_send() · 55464d46

由 Bart Van Assche 提交于 12月 09, 2009

Clarify the behavior of ib_post_send() when a list of work requests is
passed in and an immediate error is returned.
Signed-off-by: NBart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

55464d46

15 2月, 2009 1 次提交

net: replace __constant_{endian} uses in net headers · f3a7c66b

由 Harvey Harrison 提交于 2月 14, 2009

Base versions handle constant folding now.  For headers exposed to
userspace, we must only expose the __ prefixed versions.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3a7c66b

openanolis / cloud-kernel 12 个月 前同步成功

openanolis / cloud-kernel
12 个月前同步成功