Commits · c70285f880e88cb4f73effb722065a182ba5936f · openanolis / cloud-kernel

23 Jun, 2016 3 commits

IB/uverbs: Extend create QP to get RWQ indirection table · c70285f8

Authored by Yishai Hadas on May 23, 2016

User applications that want to spread incoming traffic between several WQs
should create a QP which contains an indirection table.

When such a QP is created, other receive-side parameters are not valid
and should not be given. Its send side is optional and is assumed active
based on the max_send_wr capability value.

Extend create QP to work accordingly.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/uverbs: Introduce RWQ Indirection table · de019a94

Authored by Yishai Hadas on May 23, 2016

User applications that want to spread traffic across several WQs need to
create an indirection table from already created WQs.

Add a uverbs API to create and destroy this table.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
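
As a rough illustration of what an indirection table does, the sketch below models it in plain user-space C: a hash value computed for an incoming packet selects one slot, and the slot names an already created WQ. The struct, the function name, and the power-of-two table size are assumptions made for this example; they are not the uverbs ABI added by the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an RWQ indirection table (not the uverbs ABI). */
struct ind_table {
    uint32_t log_size;        /* table holds 1 << log_size entries; must be < 32 */
    const uint32_t *wq_ids;   /* handles of already created WQs */
};

/* Pick the WQ for a packet from its precomputed hash value. */
static uint32_t ind_table_select_wq(const struct ind_table *t, uint32_t hash)
{
    uint32_t mask = (UINT32_C(1) << t->log_size) - 1;

    return t->wq_ids[hash & mask];
}
```

With four WQs and `log_size = 2`, only the two low bits of the hash decide which WQ receives the packet.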

IB/uverbs: Add WQ support · f213c052

Authored by Yishai Hadas on May 23, 2016

User space applications which use RSS functionality need to create
a work queue object (WQ). The lifetime of such an object is:
 * Create a WQ
 * Modify the WQ from reset to init state.
 * Use the WQ (by downstream patches).
 * Destroy the WQ.

These commands are added to the uverbs API.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
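
The lifetime listed in this commit message can be pictured as a small state machine. The sketch below is only a model: the type and function names are hypothetical, and the only transition it knows is the reset-to-init step the message mentions.

```c
#include <stdbool.h>

/* Hypothetical states; only reset and init are named in the commit message. */
enum wq_state { WQ_RESET, WQ_INIT, WQ_DESTROYED };

struct wq { enum wq_state state; };

/* A freshly created WQ starts in the reset state. */
static void wq_create(struct wq *wq) { wq->state = WQ_RESET; }

/* Allow only the reset -> init transition described above. */
static bool wq_modify_to_init(struct wq *wq)
{
    if (wq->state != WQ_RESET)
        return false;
    wq->state = WQ_INIT;
    return true;
}

/* In this model a WQ can be used only after it has reached init. */
static bool wq_usable(const struct wq *wq) { return wq->state == WQ_INIT; }

static void wq_destroy(struct wq *wq) { wq->state = WQ_DESTROYED; }
```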

26 May, 2016 5 commits

IB/hfi1: Remove write(), use ioctl() for user cmds · 380fb942

Authored by Dennis Dalessandro on May 19, 2016

Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Add ioctl() interface for user commands · 8d970cf9

Authored by Dennis Dalessandro on May 19, 2016

IOCTL is more suited to what user space commands need to do than the
write() interface. Add IOCTL definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow-on patch.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Remove unused user command · ac56f162

Authored by Dennis Dalessandro on May 19, 2016

The HFI1_CMD_SDMA_STATUS_UPD command was never implemented, so it has no
reason to live in the driver. Remove it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Remove EPROM functionality from data device · d0790317

Authored by Dennis Dalessandro on May 19, 2016

Remove EPROM handling from the cdev which is used for user application
data traffic.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB/hfi1: Remove multiple device cdev · 0eb62659

Authored by Dennis Dalessandro on May 19, 2016

hfi1 currently exports a cdev that can be used to target all of the HFIs
in the system. However, there is a problem with this approach in
that the devices could be on different subnets. This is a problem that
user space can figure out and explicitly tell the driver on which device
to create a context.

Remove the multi-purpose cdev leaving a dedicated cdev for each port.
Also remove the striping capability that is dependent upon the user
choosing the multi-purpose cdev. It is now up to user space to determine
how to stripe contexts.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

25 May, 2016 1 commit

IB/netlink: Add a new local service operation · c34d3761

Authored by Mark Bloch on May 19, 2016

This commit adds a new RDMA local service operation:
- IP to GID resolution.

The client request would include the ifindex of the outgoing interface
and would place the destination IP in an attribute (LS_NLA_TYPE_IPV4 or
LS_NLA_TYPE_IPV6).

The local service would answer with a message that has the attribute:
- LS_NLA_TYPE_DGID - The destination GID.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

14 May, 2016 1 commit

IB/core: Add extended device capability flags · 0b24e5ac

Authored by Majd Dibbiny on Apr 17, 2016

Since all the uverbs device_cap_flags are occupied, we need a place to
expose more device capabilities.

This patch adds a new 64 bit device_cap_flags_ex to expose new
device capabilities.

The lower 32 bits will be identical to the original device_cap_flags.
The upper 32 bits will be new capabilities.

Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
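
The split described in this message can be sketched with two small helpers. The function names are made up for this example, and the bit index used for the extended capability is arbitrary; only the "lower 32 bits legacy, upper 32 bits new" layout comes from the commit message.

```c
#include <stdint.h>

/* Legacy 32-bit view: the lower half of the extended flags. */
static uint32_t cap_flags_legacy(uint64_t device_cap_flags_ex)
{
    return (uint32_t)(device_cap_flags_ex & UINT64_C(0xffffffff));
}

/* Test a new capability; bit is 0..31 within the upper 32 bits. */
static int cap_flags_has_ex_bit(uint64_t device_cap_flags_ex, unsigned int bit)
{
    return (int)((device_cap_flags_ex >> (32 + bit)) & 1);
}
```

A consumer that only understands the original field can keep using the value returned by `cap_flags_legacy()` unchanged.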

17 Mar, 2016 1 commit

iwcm: common code for port mapper · b493d91d

Authored by Faisal Latif on Feb 26, 2016

Move port mapper related code from the drivers into common code.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Tatyana E. Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

01 Mar, 2016 4 commits

i40iw: add entry in rdma_netlink · 7a43b598

Authored by Faisal Latif on Jan 20, 2016

Add entry for port mapper services.

Changes since v2:
	moved this patch before being used

Changes since v1:
	moved I40IW as last element

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

staging/hfi1: Enable TID caching feature · 0b091fb3

Authored by Mitko Haralanov on Feb 05, 2016

This commit "flips the switch" on the TID caching feature
implemented in this patch series.

As well as enabling the new feature by tying the new function
with the PSM API, it also cleans up the old unneeded code,
data structure members, and variables.

Due to differences in operation and information, the tracing
functions related to expected receives had to be changed. This
patch includes these changes.

The tracing function changes could not be split into a separate
commit without including both tracing variants at the same time.
This would have caused other complications and ugliness.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

uapi/hfi1_user: Add command and event for TID caching · 955ad36d

Authored by Mitko Haralanov on Feb 05, 2016

TID caching will use a new event to signal userland that cache
invalidation has occurred and needs a matching command code that
will be used to read the invalidated TIDs.

Add the event bit and the new command to the exported header file.

The command is also added to the switch() statement in file_ops.c
for completeness and in preparation for its usage later.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

uapi/hfi1_user: Correct comment for capability bit · 462075a6

Authored by Mitko Haralanov on Feb 05, 2016

The HFI1_CAP_TID_UNMAP comment was incorrectly implying the
opposite of what the capability actually did. Correct this error.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

22 Dec, 2015 1 commit

staging/rdma/hfi1: Adjust EPROM partitions, add EPROM commands · cd371e09

Authored by Dean Luick on Nov 16, 2015

Add a new EPROM partition, adjusting partition placement.

Add EPROM range commands as a superset of the partition
commands.  Remove old partition commands.

Enhance EPROM erase, creating a range function and using the
largest erase (sub) commands when possible.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

16 Nov, 2015 1 commit

staging/rdma/hfi1: Clean up macro indentation · 3bd4dce1

Authored by Mitko Haralanov on Oct 30, 2015

In preparation for implementing Expected TID caching we do some simple clean up
of header file macros.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

27 Oct, 2015 1 commit

staging/rdma/hfi1: Remove QSFP_ENABLED from HFI capability mask · 3c2f85b8

Authored by Easwar Hariharan on Oct 26, 2015

The QSFP interface code has been running without issues and the flag is
never set to off. This patch removes the QSFP_ENABLED bit from HFI1_CAP.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

22 Oct, 2015 1 commit

IB/core: Extend ib_uverbs_create_qp · 6d8a7497

Authored by Eran Ben Elisha on Oct 21, 2015

ib_uverbs_ex_create_qp follows the extension verbs
mechanism. New features (for example, the QP creation flags
field which is added in a downstream patch) could be used
via user-space libraries without breaking the ABI.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

04 Sep, 2015 1 commit

IB/hfi1: Add PSM2 user space header to header_install · 6f876ce4

Authored by Ira Weiny on Sep 02, 2015

When the hfi1 driver was added a user space header file (hfi1_user.h) was added
to be shared between PSM2 and the driver.  However, the file was not added to
the header install.  Add it now.

Fixes: d4ab3470 ("IB/core: Add core header changes needed for OPA")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

31 Aug, 2015 1 commit

IB/netlink: Add defines for local service requests through netlink · 6431eb87

Authored by Kaike Wan on Aug 14, 2015

This patch adds netlink defines for local service client, local service
group, local service operations, and related attributes.

Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: John Fleck <john.fleck@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

29 Aug, 2015 1 commit

IB/core: Add core header changes needed for OPA · d4ab3470

Authored by Dennis Dalessandro on Jul 30, 2015

This patch adds the value of the CNP opcode to the existing list of enumerated
opcodes in ib_pack.h.

Add common OPA header definitions for the driver build:
- opa_port_info.h
- opa_smi.h
- hfi1_user.h

Additionally, ib_mad.h has additional definitions
that are common to IB drivers, including:
- trap support
- cca support

The qib driver has the duplication removed in favor of
those in ib_mad.h.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: John, Jubin <jubin.john@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

13 Jun, 2015 2 commits

IB/core: Add timestamp_mask and hca_core_clock to query_device · 24306dc6

Authored by Matan Barak on Jun 11, 2015

In order to expose timestamps, we need to expose two new attributes in
query_device to be used for CQ completion time-stamping:

timestamp_mask - how many bits are valid in the timestamp, where timestamp
values can be at most 64 bits wide.

hca_core_clock - the timestamp is given in HW cycles; this is the frequency
of the HCA in kHz, necessary in order to convert cycles to seconds.

This is added both to ib_query_device and its respective uverbs counterpart.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
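
To make the conversion concrete, here is a minimal sketch of how a consumer might combine the two attributes. It assumes that timestamp_mask is a bitmask of the valid bits and that hca_core_clock is non-zero and expressed in kHz; the helper names are invented for this example.

```c
#include <stdint.h>

/* Keep only the timestamp bits that the device reports as valid. */
static uint64_t ts_valid_bits(uint64_t raw, uint64_t timestamp_mask)
{
    return raw & timestamp_mask;
}

/* Convert a cycle count to nanoseconds for a clock given in kHz.
 * cycles / (khz * 1000) seconds equals cycles * 1e6 / khz nanoseconds.
 * Splitting into quotient and remainder postpones 64-bit overflow. */
static uint64_t ts_cycles_to_ns(uint64_t cycles, uint64_t hca_core_clock_khz)
{
    uint64_t q = cycles / hca_core_clock_khz;
    uint64_t r = cycles % hca_core_clock_khz;

    return q * 1000000u + (r * 1000000u) / hca_core_clock_khz;
}
```

For a 200 MHz clock (200000 kHz), one cycle corresponds to 5 ns and 200000000 cycles correspond to one second.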

IB/core: Extend ib_uverbs_create_cq · 565197dd

Authored by Matan Barak on Jun 11, 2015

ib_uverbs_ex_create_cq follows the extension verbs
mechanism. New features (for example, the CQ creation flags
field which is added in a downstream patch) could be used
via user-space libraries without breaking the ABI.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

05 May, 2015 1 commit

RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients · 6eec1774

Authored by Tatyana Nikolova on Apr 21, 2015

Add functionality to enable the port mapper on the passive side to provide to its
clients the actual (non-mapped) IP/TCP address information of the connecting peer.

1) Add remote_info_cb() to process the address info of the connecting peer.
   The address info is provided by the user space port mapper service when
   the connection is initiated by the peer.
2) Add a hash list to store the remote address info.
3) Add functionality to add/remove the remote address info.
   After the info has been provided to the port mapper client,
   it is removed from the hash list.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

19 Feb, 2015 2 commits

IB/core: Add on demand paging caps to ib_uverbs_ex_query_device · f4056bfd

Authored by Haggai Eran on Feb 08, 2015

Add on-demand paging capabilities reporting to the extended query device verb.

Yann Droneaud writes:

    Note: as offsetof() is used to retrieve the size of the lower chunk
    of the response, beware that it only works if the upper chunk
    is right after, without any implicit padding. And, as the size of
    the latter chunk is added to the base size, implicit padding at the
    end of the structure is not taken into account. Both points must be
    taken into account when extending the uverbs functionalities.

Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
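
The padding concern in the quoted note can be checked at compile time. The sketch below uses a made-up response layout (the struct and field names are not the real ones from `<rdma/ib_user_verbs.h>`) and assumes a C11 compiler for `_Static_assert`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension chunk appended to a base response. */
struct ext_caps {
    uint64_t general;
    uint32_t per_transport;
    uint32_t reserved;
};

struct query_resp {
    uint32_t comp_mask;
    uint32_t response_length;
    struct ext_caps ext;   /* "upper chunk", must start right after the base */
};

/* Size of the lower chunk, computed the way the note describes. */
#define RESP_BASE_SIZE offsetof(struct query_resp, ext)

/* Both checks fail to compile if implicit padding sneaks in. */
_Static_assert(RESP_BASE_SIZE == 2 * sizeof(uint32_t),
               "padding between base fields and extension");
_Static_assert(RESP_BASE_SIZE + sizeof(struct ext_caps) == sizeof(struct query_resp),
               "tail padding after the extension");

/* Response size obtained by adding the extension to the base size. */
static size_t resp_size_with_ext(void)
{
    return RESP_BASE_SIZE + sizeof(struct ext_caps);
}
```

If a later extension introduced a field with stricter alignment, one of the two assertions would stop the build instead of silently changing the wire layout.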

IB/core: Add support for extended query device caps · 02d1aa7a

Authored by Eli Cohen on Feb 08, 2015

Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy
capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.

Following the discussion about this patch [1], the code now validates
the command's comp_mask is zero, returning -EINVAL for unknown values,
in order to allow extending the verb in the future.

The verb also checks the user-space provided response buffer size and
only fills in capabilities that will fit in the buffer. In an attempt to
follow the spirit of presentation [2] by Tzahi Oved that was presented
during OpenFabrics Alliance International Developer Workshop 2013, the
comp_mask bits will only describe which fields are valid.  Furthermore,
fields that can simply be cleared when they are not supported, do not
require a comp_mask bit at all.  The verb returns a response_length
field containing the actual number of bytes written by the kernel, so
that a newer version running on an older kernel can tell which fields
were actually returned.

[1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
    http://thread.gmane.org/gmane.linux.kernel.api/7889/

[2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
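
The rules in this message (reject a non-zero comp_mask, fill only what fits, report the bytes written in response_length) can be sketched as follows. The response layout, the capability values, and the choice of -ENOSPC for a too-small buffer are assumptions of this example, not the kernel's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical response layout; field names are illustrative only. */
struct resp {
    uint32_t comp_mask;
    uint32_t response_length;
    uint64_t base_caps;
    uint64_t odp_caps;      /* optional, newer field */
};

/* Fill as much of the response as fits in the caller's buffer. */
static int query_device_ex(uint32_t cmd_comp_mask, void *out, size_t out_len)
{
    struct resp r = { 0 };
    size_t len = offsetof(struct resp, odp_caps);   /* mandatory part */

    if (cmd_comp_mask != 0)
        return -EINVAL;     /* unknown request bits are rejected */
    if (out_len < len)
        return -ENOSPC;     /* assumption: buffer too small for the base */

    r.base_caps = 0x1;
    if (out_len >= sizeof(r)) {     /* room for the optional field */
        r.odp_caps = 0x2;
        len = sizeof(r);
    }
    r.response_length = (uint32_t)len;
    memcpy(out, &r, len);
    return 0;
}
```

A caller built against the larger structure can compare response_length with the offset of each optional field to learn whether the kernel actually returned it.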

06 Feb, 2015 1 commit

Revert "IB/core: Add support for extended query device caps" · 43c61165

Authored by Yann Droneaud on Feb 05, 2015

While commit 7e36ef82 ("IB/core: Temporarily disable
ex_query_device uverb") is correct as it makes the extended
QUERY_DEVICE uverb (which came as part of commit 5a77abf9
("IB/core: Add support for extended query device caps") and commit
860f10a7 ("IB/core: Add flags for on demand paging support")) not
available to userspace, it doesn't address the initial issue regarding
ib_copy_to_udata() [1][2].

Additionally, further discussions around this new uverb seem to
conclude that it would require a different data structure than the one
currently described in <rdma/ib_user_verbs.h> [3].

Both of these issues require a revert of the changes, so this patch
partially reverts commit 8cdd312c ("IB/mlx5: Implement the ODP
capability query verb") and commit 860f10a7 ("IB/core: Add flags
for on demand paging support") and fully reverts commit 5a77abf9
("IB/core: Add support for extended query device caps").

[1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps"
    http://mid.gmane.org/1418733236.2779.26.camel@opteya.com

[2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb"
    http://mid.gmane.org/1423067503.3030.83.camel@opteya.com

[3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask"
    http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com

Cc: Eli Cohen <eli@mellanox.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

16 Dec, 2014 2 commits

IB/core: Add flags for on demand paging support · 860f10a7

Authored by Sagi Grimberg on Dec 11, 2014

* Add a configuration option to enable on-demand paging support in
  the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
  later patch, this configuration option will select the MMU_NOTIFIER
  configuration option to enable mmu notifiers.
* Add a flag for on demand paging (ODP) support in the IB device capabilities.
* Add a flag to request ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver
  doesn't support this.
* Change the conditions in which an MR will be writable to explicitly
  specify the access flags.  This is to avoid making an MR writable just
  because it is an ODP MR.
* Add ODP capabilities to the extended query device verb.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Add support for extended query device caps · 5a77abf9

Authored by Eli Cohen on Dec 11, 2014

Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
copy capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

13 Aug, 2014 1 commit

RDMA/uapi: Include socket.h in rdma_user_cm.h · db1044d4

Authored by Doug Ledford on Aug 12, 2014

Commit ee7aed45 ("RDMA/ucma: Support querying for AF_IB addresses")
added struct sockaddr_storage to rdma_user_cm.h without also adding an
include for linux/socket.h to make sure it is defined.  Systemtap
needs the header files to build standalone and cannot rely on other
files to pre-include other headers, so add linux/socket.h to the list
of includes in this file.

Fixes: ee7aed45 ("RDMA/ucma: Support querying for AF_IB addresses")
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

11 Aug, 2014 2 commits

IB/mad: Add user space RMPP support · 1471cb6c

Authored by Ira Weiny on Aug 08, 2014

Using the new registration mechanism, define a flag that indicates the
user wishes to process RMPP messages in user space rather than have
the kernel process them.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/mad: add new ioctl to ABI to support new registration options · 0f29b46d

Authored by Ira Weiny on Aug 08, 2014

Registration options are specified through flags.  Definitions of flags will
be in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

02 Aug, 2014 1 commit

IB/core: Add user MR re-registration support · 7e6edb9b

Authored by Matan Barak on Jul 31, 2014

Memory re-registration is a feature that enables changing the
attributes of a memory region registered by user-space, including PD,
translation (address and length) and access flags.

Add the required support in uverbs and the kernel verbs API.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

11 Jun, 2014 1 commit

RDMA/core: Add support for iWARP Port Mapper user space service · 30dc5e63

Authored by Tatyana Nikolova on Mar 26, 2014

This patch adds iWARP Port Mapper (IWPM) Version 2 support.  The iWARP
Port Mapper implementation is based on the port mapper specification
section in the Sockets Direct Protocol paper -
http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf

Existing iWARP RDMA providers use the same IP address as the native
TCP/IP stack when creating RDMA connections.  They need a mechanism to
claim the TCP ports used for RDMA connections to prevent TCP port
collisions when other host applications use TCP ports.  The iWARP Port
Mapper provides a standard mechanism to accomplish this.  Without this
service it is possible for an RDMA application to bind/listen on the same
port which is already being used by a native TCP host application.  If
that happens, the incoming TCP connection data can be passed to the
RDMA stack with error.

The iWARP Port Mapper solution doesn't contain any changes to the
existing network stack in the kernel space.  All the changes are
contained within the infiniband tree and also in user space.

The iWARP Port Mapper service is implemented as a user space daemon
process.  Source for the IWPM service is located at
http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary

The iWARP driver (port mapper client) sends to the IWPM service the
local IP address and TCP port it has received from the RDMA
application, when starting a connection.  The IWPM service performs a
socket bind from user space to get an available TCP port, called a
mapped port, and communicates it back to the client.  In that sense,
the IWPM service is used to map the TCP port, which the RDMA
application uses to any port available from the host TCP port
space. The mapped ports are used in iWARP RDMA connections to avoid
collisions with native TCP stack which is aware that these ports are
taken. When an RDMA connection using a mapped port is terminated, the
client notifies the IWPM service, which then releases the TCP port.

The message exchange between the IWPM service and the iWARP drivers
(between user space and kernel space) is implemented using netlink
sockets.

1) Netlink interface functions are added: ibnl_unicast() and
   ibnl_multicast() for sending netlink messages to user space

2) The signature of the existing ibnl_put_msg() is changed to be more
   generic

3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
   corresponding to the two iWarp drivers - nes and cxgb4 which use
   the IWPM service

4) Enums are added to enumerate the attributes in the netlink
   messages, which are exchanged between the user space IWPM service
   and the iWARP drivers

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>

[ Fold in range checking fixes and nlh_next removal as suggested by Dan
  Carpenter and Steve Wise.  Fix sparse endianness in hash.  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

18 Nov, 2013 5 commits

IB/core: Re-enable create_flow/destroy_flow uverbs · 69ad5da4

Authored by Matan Barak on Nov 06, 2013

This commit reverts commit 7afbddfa ("IB/core: Temporarily disable
create_flow/destroy_flow uverbs").  Since the uverbs extensions
functionality was experimental for v3.12, this patch re-enables the
support for them and flow-steering for v3.13.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2

Authored by Yann Droneaud on Nov 06, 2013

Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
commands") added an infrastructure for extensible uverbs commands
while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
using this new infrastructure.

According to the commit 400dbc96, the purpose of this
infrastructure is to support passing around provider (eg. hardware)
specific buffers when userspace issues commands to the kernel, so that
it would be possible to extend uverbs (eg. core) buffers independently
from the provider buffers.

But the new kernel command function prototypes were not modified to
take advantage of this extension. This issue was exposed by Roland
Dreier in a previous review[1].

So the following patch is an attempt at a revised extensible command
infrastructure.

This improved extensible command infrastructure distinguishes between
core (eg. legacy)'s command/response buffers and provider
(eg. hardware)'s command/response buffers: each extended command
implementing function is given a struct ib_udata to hold core
(eg. uverbs) input and output buffers, and another struct ib_udata to
hold the hw (eg. provider) input and output buffers.

Having those buffers identified separately makes it easier to increase
one buffer to support extension without having to add some code to
guess the exact size of each command/response part: this should make
the extended functions more reliable.

Additionally, instead of relying on the command identifier being greater
than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure relies on
unused bits in the command field: of the 32 bits provided by the command
field, only 6 bits are really needed to encode the identifier of
commands currently supported by the kernel. (Even using only 6 bits
leaves room for about 23 new commands).

So this patch makes use of some high order bits in command field to
store flags, leaving enough room for more command identifiers than one
will ever need (eg. 256).
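
A minimal sketch of such an encoding is shown here. It keeps the command identifier in the low 8 bits (256 values) and flags in the high 8 bits, as this message describes; the macro names and the value used for the "extended" flag are assumptions of this example rather than the definitions the patch adds.

```c
#include <stdint.h>

/* Illustrative constants: low byte is the command id, high byte is flags. */
#define CMD_COMMAND_MASK  UINT32_C(0x000000ff)
#define CMD_FLAGS_MASK    UINT32_C(0xff000000)
#define CMD_FLAGS_SHIFT   24
#define CMD_FLAG_EXTENDED UINT32_C(0x80)   /* assumed flag value */

/* Build the 32-bit command field from an id and a flags byte. */
static uint32_t cmd_encode(uint32_t id, uint32_t flags)
{
    return (flags << CMD_FLAGS_SHIFT) | (id & CMD_COMMAND_MASK);
}

static uint32_t cmd_id(uint32_t command)
{
    return command & CMD_COMMAND_MASK;
}

static uint32_t cmd_flags(uint32_t command)
{
    return (command & CMD_FLAGS_MASK) >> CMD_FLAGS_SHIFT;
}

/* This sketch treats any bit in the two middle bytes as malformed. */
static int cmd_is_well_formed(uint32_t command)
{
    return (command & ~(CMD_COMMAND_MASK | CMD_FLAGS_MASK)) == 0;
}
```

Because the flag lives in the top byte, an extended command looks like a very large command number to code that only knows the legacy encoding.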

The new flags are used to specify if the command should be processed
as an extended one or a legacy one. While designing the new command
format, care was taken to make usage of flags itself extensible.

Using high order bits of the command field ensures that a newer
libibverbs on an older kernel will properly fail when trying to call
extended commands. On the other hand, an older libibverbs on a newer kernel
will never be able to issue calls to extended commands.

The extended command header includes the optional response pointer so
that output buffer length and output buffer pointer are located
together in the command, allowing proper parameters checking. This
should make implementing functions easier and safer.

Additionally, the extended header ensures 64-bit alignment, while making
all sizes multiples of 8 bytes, extending the maximum buffer size:

                             legacy      extended

   Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
  Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)

For the purpose of doing proper buffer size accounting, the header
sizes are no longer taken into account in "in_words".

One of the oddities of the current extensible infrastructure, reading
the "legacy" command header twice, is fixed by removing the "legacy"
command header from the extended command header: they are processed as
two different parts of the command, memory is read once, and
information is not duplicated. This makes it clear that it is an extended
command scheme and not a different command scheme.

The proposed scheme will format input (command) and output (response)
buffers this way:

- command:

  legacy header +
  extended header +
  command data (core + hw):

    +----------------------------------------+
    | flags     |   00      00    |  command |
    |        in_words    |   out_words       |
    +----------------------------------------+
    |                 response               |
    |                 response               |
    | provider_in_words | provider_out_words |
    |                 padding                |
    +----------------------------------------+
    |                                        |
    .              <uverbs input>            .
    .              (in_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .             <provider input>           .
    .          (provider_in_words * 8)       .
    |                                        |
    +----------------------------------------+

- response, if present:

    +----------------------------------------+
    |                                        |
    .          <uverbs output space>         .
    .             (out_words * 8)            .
    |                                        |
    +----------------------------------------+
    |                                        |
    .         <provider output space>        .
    .         (provider_out_words * 8)       .
    |                                        |
    +----------------------------------------+
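
The size accounting implied by the two diagrams can be written down as a short sketch. The field widths are read off the diagrams above, while the struct names, field names, and helper functions are invented for this example and are not the definitions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Legacy header: command field plus core input/output word counts. */
struct cmd_hdr {
    uint32_t command;
    uint16_t in_words;
    uint16_t out_words;
};

/* Extended header: response pointer, provider word counts, padding. */
struct ex_cmd_hdr {
    uint64_t response;
    uint16_t provider_in_words;
    uint16_t provider_out_words;
    uint32_t reserved;
};

/* Bytes written by user space: both headers plus core and provider input.
 * Word counts are in units of 8 bytes and exclude the headers. */
static size_t ex_cmd_total_len(const struct cmd_hdr *h, const struct ex_cmd_hdr *ex)
{
    return sizeof(*h) + sizeof(*ex) +
           ((size_t)h->in_words + ex->provider_in_words) * 8;
}

/* Bytes of response space: core output followed by provider output. */
static size_t ex_resp_total_len(const struct cmd_hdr *h, const struct ex_cmd_hdr *ex)
{
    return ((size_t)h->out_words + ex->provider_out_words) * 8;
}
```

With both 16-bit counters at their maximum, each half contributes just under 512 KBytes, which matches the extended column of the table earlier in this message.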

The overall design is to ensure that the extensible infrastructure is
itself extensible while being more reliable with more input and bounds
checking.

Note:

The unused field in the extended header would be perfect candidate to
hold the command "comp_mask" (eg. bit field used to handle
compatibility).  This was suggested by Roland Dreier in a previous
review[2].  But "comp_mask" field is likely to be present in the uverb
input and/or provider input, likewise for the response, as noted by
Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
header.

[1]:
http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com

[2]:
http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com

[3]:
http://marc.info/?i=525C1149.6000701@mellanox.com

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com

[ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Remove ib_uverbs_flow_spec structure from userspace · 2490f20b

Authored by Yann Droneaud on Nov 06, 2013

The structure holding any types of flow_spec is of no use to
userspace.  It would be wrong for userspace to do:

  struct ib_uverbs_flow_spec flow_spec;

  flow_spec.type = IB_FLOW_SPEC_TCP;
  flow_spec.size = sizeof(flow_spec);

Instead, userspace should use the dedicated flow_spec structure for
  - Ethernet : struct ib_uverbs_flow_spec_eth,
  - IPv4     : struct ib_uverbs_flow_spec_ipv4,
  - TCP/UDP  : struct ib_uverbs_flow_spec_tcp_udp.

In other words, struct ib_uverbs_flow_spec is a "virtual" data
structure that can only be used by the kernel as an alias to the others.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Use a common header for uverbs flow_specs · 58913efb

Authored by Yann Droneaud on Nov 06, 2013

A common header will allow better checking of flow spec sizes, while
ensuring strict alignment to 64 bits.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
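
One way to picture the size checking that a common header enables is the walk below over a buffer of back-to-back flow specs. The header layout and the function name are hypothetical, chosen only so that the header is 8 bytes and every spec size can be required to be a multiple of 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical common header shared by every flow spec. */
struct flow_spec_hdr {
    uint32_t type;
    uint16_t size;       /* size of the whole spec, header included */
    uint16_t reserved;
};

/* Walk a buffer of consecutive specs and validate each declared size. */
static bool flow_specs_valid(const uint8_t *buf, size_t len, unsigned int *count)
{
    size_t off = 0;

    *count = 0;
    while (off < len) {
        struct flow_spec_hdr hdr;

        if (len - off < sizeof(hdr))
            return false;               /* truncated header */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) ||   /* too small to hold the header */
            hdr.size % 8 != 0 ||        /* breaks 64-bit alignment */
            hdr.size > len - off)       /* runs past the buffer */
            return false;
        off += hdr.size;
        (*count)++;
    }
    return true;
}
```

Because every spec starts with the same header, the walker never needs to know the per-type structures to reject a malformed list.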

IB/core: Make uverbs flow structure use names like verbs ones · b68c9560

Authored by Yann Droneaud on Nov 06, 2013

This patch adds "flow" prefix to most of data structure added as part
of commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow through
uverbs") to keep those names in sync with the data structures added in
commit 319a441d ("IB/core: Add receive flow steering support").

It's just a matter of translating 'ib_flow' to 'ib_uverbs_flow'.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
Signed-off-by: Roland Dreier <roland@purestorage.com>
