提交 · a3f5adaf491490089215f863a61b9422fae902f8 · openeuler / Kernel

28 9月, 2010 1 次提交

IB/core: Add link layer property to ports · a3f5adaf

由 Eli Cohen 提交于 9月 27, 2010

This patch allows ports to have different link layers:
IB_LINK_LAYER_INFINIBAND or IB_LINK_LAYER_ETHERNET. This is required
for adding IBoE (InfiniBand-over-Ethernet, aka RoCE) support. For
devices that do not provide an implementation for querying the link
layer property of a port, we return a default value based on the
transport: RMA_TRANSPORT_IB nodes will return IB_LINK_LAYER_INFINIBAND
and RDMA_TRANSPORT_IWARP nodes will return IB_LINK_LAYER_ETHERNET.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

a3f5adaf

05 8月, 2010 1 次提交

IB: Rename RAW_ETY to RAW_ETHERTYPE · a2ebf07a

由 Aleksey Senin 提交于 7月 04, 2010

Change abbreviated IB_QPT_RAW_ETY to IB_QPT_RAW_ETHERTYPE to make
the special QP type easier to understand.

cf http://www.mail-archive.com/linux-rdma@vger.kernel.org/msg04530.htmlSigned-off-by: NAleksey Senin <alekseys@voltaire.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

a2ebf07a

22 5月, 2010 1 次提交

IB/core: Allow device-specific per-port sysfs files · 9a6edb60

由 Ralph Campbell 提交于 5月 06, 2010

Add a new parameter to ib_register_device() so that low-level device
drivers can pass in a pointer to a callback function that will be
called for each port that is registered in sysfs.  This allows
low-level device drivers to create files in

    /sys/class/infiniband/<hca>/ports/<N>/

without having to poke through the internals of the RDMA sysfs handling.

There is no need for an unregister function since the kobject
reference will go to zero when ib_unregister_device() is called.
Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

9a6edb60

22 4月, 2010 1 次提交

IB/core: Add support for masked atomic operations · 5e80ba8f

由 Vladimir Sokolovsky 提交于 4月 14, 2010

 - Add new IB_WR_MASKED_ATOMIC_CMP_AND_SWP and IB_WR_MASKED_ATOMIC_FETCH_AND_ADD
   send opcodes that can be used to post "masked atomic compare and
   swap" and "masked atomic fetch and add" work request respectively.
 - Add masked_atomic_cap capability.
 - Add mask fields to atomic struct of ib_send_wr
 - Add new opcodes to ib_wc_opcode

The new operations are described more precisely below:

* Masked Compare and Swap (MskCmpSwap)

The MskCmpSwap atomic operation is an extension to the CmpSwap
operation defined in the IB spec.  MskCmpSwap allows the user to
select a portion of the 64 bit target data for the “compare” check as
well as to restrict the swap to a (possibly different) portion.  The
pseudo code below describes the operation:

| atomic_response = *va
| if (!((compare_add ^ *va) & compare_add_mask)) then
|     *va = (*va & ~(swap_mask)) | (swap & swap_mask)
|
| return atomic_response

The additional operands are carried in the Extended Transport Header.
Atomic response generation and packet format for MskCmpSwap is as for
standard IB Atomic operations.

* Masked Fetch and Add (MFetchAdd)

The MFetchAdd Atomic operation extends the functionality of the
standard IB FetchAdd by allowing the user to split the target into
multiple fields of selectable length. The atomic add is done
independently on each one of this fields. A bit set in the
field_boundary parameter specifies the field boundaries. The pseudo
code below describes the operation:

| bit_adder(ci, b1, b2, *co)
| {
|	value = ci + b1 + b2
|	*co = !!(value & 2)
|
|	return value & 1
| }
|
| #define MASK_IS_SET(mask, attr)      (!!((mask)&(attr)))
| bit_position = 1
| carry = 0
| atomic_response = 0
|
| for i = 0 to 63
| {
|         if ( i != 0 )
|                 bit_position =  bit_position << 1
|
|         bit_add_res = bit_adder(carry, MASK_IS_SET(*va, bit_position),
|                                 MASK_IS_SET(compare_add, bit_position), &new_carry)
|         if (bit_add_res)
|                 atomic_response |= bit_position
|
|         carry = ((new_carry) && (!MASK_IS_SET(compare_add_mask, bit_position)))
| }
|
| return atomic_response
Signed-off-by: NVladimir Sokolovsky <vlad@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

5e80ba8f

25 2月, 2010 1 次提交

IB/core: Pack struct ib_device a little tighter · 17a55f79

由 Alexander Chiang 提交于 2月 02, 2010

A small change to reduce the size of ib_device to 1112 bytes
(from 1128).
Signed-off-by: NAlex Chiang <achiang@hp.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

17a55f79

10 12月, 2009 1 次提交

IB: Clarify the documentation of ib_post_send() · 55464d46

由 Bart Van Assche 提交于 12月 09, 2009

Clarify the behavior of ib_post_send() when a list of work requests is
passed in and an immediate error is returned.
Signed-off-by: NBart Van Assche <bart.vanassche@gmail.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

55464d46

15 2月, 2009 1 次提交

net: replace __constant_{endian} uses in net headers · f3a7c66b

由 Harvey Harrison 提交于 2月 14, 2009

Base versions handle constant folding now.  For headers exposed to
userspace, we must only expose the __ prefixed versions.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3a7c66b

27 7月, 2008 1 次提交

dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b

由 FUJITA Tomonori 提交于 7月 25, 2008

Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:

This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).

I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread).  So I
CC'ed this to KVM camp.  Comments are appreciated.

A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
NULL, the system-wide dma_ops pointer is used as before.

If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging).  It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.

The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
device.  Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.

The first patch adds the device argument to dma_mapping_error.  The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.

This patch:

dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations.  So we can't have dma_mapping_ops per device.

Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
argument.

[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8d8bb39b

15 7月, 2008 5 次提交

RDMA/core: Add local DMA L_Key support · 96f15c03

由 Steve Wise 提交于 7月 14, 2008

- Change the IB_DEVICE_ZERO_STAG flag to the transport-neutral name
  IB_DEVICE_LOCAL_DMA_LKEY, which is used by iWARP RNICs to indicate 0
  STag support and IB HCAs to indicate reserved L_Key support.

- Add a u32 local_dma_lkey member to struct ib_device.  Drivers fill
  this in with the appropriate local DMA L_Key (if they support it).

- Fix up the drivers using this flag.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

96f15c03

IB/core: Add support for multicast loopback blocking · 47ee1b9f

由 Ron Livne 提交于 7月 14, 2008

This patch also adds a creation flag for QPs,
IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK, which when set means that
multicast sends from the QP to a group that the QP is attached to will
not be looped back to the QP's receive queue.  This can be used to
save receive resources when a consumer does not want a local copy of
multicast traffic; for example IPoIB must waste CPU time throwing away
such local copies of multicast traffic.

This patch also adds a device capability flag that shows whether a
device supports this feature or not.
Signed-off-by: NRon Livne <ronli@voltaire.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

47ee1b9f

RDMA/core: Add iWARP protocol statistics attributes in sysfs · 7f624d02

由 Steve Wise 提交于 7月 14, 2008

This patch adds a sysfs attribute group called "proto_stats" under
/sys/class/infiniband/$device/ and populates this group with protocol
statistics if they exist for a given device.  Currently, only iWARP
stats are defined, but the code is designed to allow InfiniBand
protocol stats if they become available.  These stats are per-device
and more importantly -not- per port.

Details:

- Add union rdma_protocol_stats in ib_verbs.h.  This union allows
  defining transport-specific stats.  Currently only iwarp stats are
  defined.

- Add struct iw_protocol_stats to define the current set of iwarp
  protocol stats.

- Add new ib_device method called get_proto_stats() to return protocol
  statistics.

- Add logic in core/sysfs.c to create iwarp protocol stats attributes
  if the device is an RNIC and has a get_proto_stats() method.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

7f624d02

RDMA/core: Add memory management extensions support · 00f7ec36

由 Steve Wise 提交于 7月 14, 2008

This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

00f7ec36

RDMA: Improve include file coding style · 4deccd6d

由 Dotan Barak 提交于 7月 14, 2008

Remove subversion $Id lines and improve readability by fixing other
coding style problems pointed out by checkpatch.pl.
Signed-off-by: NDotan Barak <dotanba@gmail.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

4deccd6d

10 6月, 2008 1 次提交

IB/core: Remove IB_DEVICE_SEND_W_INV capability flag · 4c0283fc

由 Roland Dreier 提交于 6月 09, 2008

In 2.6.26, we added some support for send with invalidate work
requests, including a device capability flag to indicate whether a
device supports such requests. However, the support was incomplete:
the completion structure was not extended with a field for the key
contained in incoming send with invalidate requests.

Full support for memory management extensions (send with invalidate,
local invalidate, fast register through a send queue, etc) is planned
for 2.6.27. Since send with invalidate is not very useful by itself,
just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final
release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27,
which makes things simpler for applications, since they will not have
quite as confusing an array of fine-grained bits to check.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

4c0283fc

29 4月, 2008 1 次提交

IB: expand ib_umem_get() prototype · cb9fbc5c

由 Arthur Kepner 提交于 4月 29, 2008

Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
when mapping user-allocated CQs with ib_umem_get().
Signed-off-by: NArthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb9fbc5c

20 4月, 2008 1 次提交

IB: convert struct class_device to struct device · f4e91eb4

由 Tony Jones 提交于 2月 22, 2008

This converts the main ib_device to use struct device instead of struct
class_device as class_device is going away.
Signed-off-by: NTony Jones <tonyj@suse.de>
Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

f4e91eb4

17 4月, 2008 5 次提交

IB/core: Add support for modify CQ · 2dd57162

由 Eli Cohen 提交于 4月 16, 2008

Add support for modifying CQ parameters for controlling event
generation moderation.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

2dd57162

IB/core: Add support for "send with invalidate" work requests · 0f39cf3d

由 Roland Dreier 提交于 4月 16, 2008

Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
"send with invalidate" work request as defined in the iWARP verbs and
the InfiniBand base memory management extensions. Also put "imm_data"
and a new "invalidate_rkey" member in a new "ex" union in struct
ib_send_wr. The invalidate_rkey member can be used to pass in an
R_Key/STag to be invalidated. Add this new union to struct
ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in
ib_uverbs_post_send().

Fix up low-level drivers to deal with the change to struct ib_send_wr,
and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
since that code never does any send with immediate operations.

Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
the iWARP drivers currently in the tree set the bit. The amso1100
driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
if passed in as part of userspace send requests (since it does not
implement kernel bypass work request queueing). Remove the flag from
all existing drivers that set it until we know which ones are OK.

The values chosen for the new flag is not consecutive to avoid clashing
with flags defined in the XRC patches, which are not merged yet but
which are already in use and are likely to be merged soon.

This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

0f39cf3d

IB/core: Add IPoIB UD LSO support · c93570f2

由 Eli Cohen 提交于 4月 16, 2008

LSO (large send offload) allows the networking stack to pass SKBs with
data size larger than the MTU to the IPoIB driver and have the HCA HW
fragment the data to multiple MSS-sized packets.  Add a device
capability flag IB_DEVICE_UD_TSO for devices that can perform TCP
segmentation offload, a new send work request opcode IB_WR_LSO,
header, hlen and mss fields for the work request structure, and a new
IB_WC_LSO completion type.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

c93570f2

IB/core: Add creation flags to struct ib_qp_init_attr · b846f25a

由 Eli Cohen 提交于 4月 16, 2008

Add a create_flags member to struct ib_qp_init_attr that will allow a
kernel verbs consumer to create a pass special flags when creating a QP.
Add a flag value for telling low-level drivers that a QP will be used
for IPoIB UD LSO.  The create_flags member will also be useful for XRC
and ehca low-latency QP support.

Since no create_flags handling is implemented yet, add code to all
low-level drivers to return -EINVAL if create_flags is non-zero.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

b846f25a

IB: Make struct ib_uobject.id a signed int · b3d636b0

由 Roland Dreier 提交于 4月 16, 2008

IDR IDs are signed, so struct ib_uobject.id should be signed.  This
avoids some sparse pointer signedness warnings.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

b3d636b0

09 2月, 2008 2 次提交

IB/core: Remove unused struct ib_device.flags member · 5128bdc9

由 Roland Dreier 提交于 2月 08, 2008

Avoid confusion about what it might mean, since it's never initialized.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

5128bdc9

IB/core: Add IP checksum offload support · e0605d91

由 Eli Cohen 提交于 1月 30, 2008

Add a device capability to show when it can handle checksum offload.
Also add a send flag for inserting checksums and a csum_ok field to
the completion record.
Signed-off-by: NEli Cohen <eli@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

e0605d91

25 1月, 2008 1 次提交

Kobject: change drivers/infiniband to use kobject_init_and_add · 35be0681

由 Greg Kroah-Hartman 提交于 12月 17, 2007

Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <mshefty@ichips.intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

35be0681

02 11月, 2007 1 次提交

cleanup asm/scatterlist.h includes · 87ae9afd

由 Adrian Bunk 提交于 10月 30, 2007

Not architecture specific code should not #include <asm/scatterlist.h>.

This patch therefore either replaces them with
#include <linux/scatterlist.h> or simply removes them if they were
unused.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

87ae9afd

04 8月, 2007 2 次提交

IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c · 92ddc447

由 Dotan Barak 提交于 8月 01, 2007

After moving the definition of struct ib_umem_chunk from ib_verbs.h to
ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK
to stay in ib_verbs.h.  Move the macro to umem.c, the only place where
it is used.
Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

92ddc447

IB: Include <linux/list.h> and <linux/rwsem.h> from <rdma/ib_verbs.h> · bfb3ea12

由 Dotan Barak 提交于 7月 31, 2007

ib_verbs.h uses struct list_head and rw_semaphore, so while the files
<linux/list.h> and <linux/rwsem.h> seem to be pulled in indirectly by
the other header files it includes, the right thing is to include
those files directly.
Signed-off-by: NDotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

bfb3ea12

19 5月, 2007 1 次提交

IB/core: Add helpers for uncached GID and P_Key searches · 5eb620c8

由 Yosef Etigin 提交于 5月 14, 2007

Add ib_find_gid() and ib_find_pkey() functions that use uncached device
queries. The calls might block but the returns are always up-to-date.
Cache P_Key and GID table lengths in core to avoid extra port info queries.
Signed-off-by: NYosef Etigin <yosefe@voltaire.com>
Acked-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

5eb620c8

09 5月, 2007 1 次提交

IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules · f7c6a7b5

由 Roland Dreier 提交于 3月 04, 2007

Export ib_umem_get()/ib_umem_release() and put low-level drivers in
control of when to call ib_umem_get() to pin and DMA map userspace,
rather than always calling it in ib_uverbs_reg_mr() before calling the
low-level driver's reg_user_mr method.

Also move these functions to be in the ib_core module instead of
ib_uverbs, so that driver modules using them do not depend on
ib_uverbs.

This has a number of advantages:
 - It is better design from the standpoint of making generic code a
   library that can be used or overridden by device-specific code as
   the details of specific devices dictate.
 - Drivers that do not need to pin userspace memory regions do not
   need to take the performance hit of calling ib_mem_get().  For
   example, although I have not tried to implement it in this patch,
   the ipath driver should be able to avoid pinning memory and just
   use copy_{to,from}_user() to access userspace memory regions.
 - Buffers that need special mapping treatment can be identified by
   the low-level driver.  For example, it may be possible to solve
   some Altix-specific memory ordering issues with mthca CQs in
   userspace by mapping CQ buffers with extra flags.
 - Drivers that need to pin and DMA map userspace memory for things
   other than memory regions can use ib_umem_get() directly, instead
   of hacks using extra parameters to their reg_phys_mr method.  For
   example, the mlx4 driver that is pending being merged needs to pin
   and DMA map QP and CQ buffers, but it does not need to create a
   memory key for these buffers.  So the cleanest solution is for mlx4
   to call ib_umem_get() in the create_qp and create_cq methods.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

f7c6a7b5

07 5月, 2007 2 次提交

IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727

由 Roland Dreier 提交于 5月 06, 2007

The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

ed23a727

IB: Add CQ comp_vector support · f4fd0b22

由 Michael S. Tsirkin 提交于 5月 03, 2007

Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.
Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

f4fd0b22

05 2月, 2007 2 次提交

IB: Return qp pointer as part of ib_wc · 062dbb69

由 Michael S. Tsirkin 提交于 12月 31, 2006

struct ib_wc currently only includes the local QP number: this matches
the IB spec, but seems mostly useless. The following patch replaces
this with the pointer to qp itself, and updates all low level drivers
and all users.

This has the following advantages:
- Ability to get a per-qp context through wc->qp->qp_context
- Existing drivers already have the qp pointer ready in poll cq, so
  this change actually saves a tiny bit (extra memory read) on data path
  (for ehca it would actually be expensive to find the QP pointer when
  polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
  NULL for ehca)
- Users that need the QP number can still get it through wc->qp->qp_num

Use case:

In IPoIB connected mode code, I have a common CQ shared by multiple
QPs.  To track connection usage, I need a way to get at some per-QP
context upon the completion, and I would like to avoid allocating
context object per work request just to stick a QP pointer into it.
With this code, I can just use wc->qp->qp_context.
Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

062dbb69

IB: Include <linux/kref.h> explicitly in <rdma/ib_verbs.h> · 459d6e2a

由 Michael S. Tsirkin 提交于 2月 04, 2007

<rdma/ib_verbs.h> uses struct kref, so it should include <linux/kref.h>
explicitly to avoid hidden include dependencies.
Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

459d6e2a

16 12月, 2006 1 次提交

IB: Fix ib_dma_alloc_coherent() wrapper · c59a3da1

由 Roland Dreier 提交于 12月 15, 2006

The ib_dma_alloc_coherent() wrapper uses a u64* for the dma_handle
parameter, unlike dma_alloc_coherent, which uses dma_addr_t*.  This
means that we need a temporary variable to handle the case when
ib_dma_alloc_coherent() just falls through directly to
dma_alloc_coherent() on architectures where sizeof u64 != sizeof
dma_addr_t.
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

c59a3da1

14 12月, 2006 1 次提交

[PATCH] ib_verbs: Use explicit if-else statements to avoid errors with do-while macros · d1998ef3

由 Ben Collins 提交于 12月 13, 2006

At least on PPC, the "op ? op : dma" construct causes a compile failure
because the dma_* is a do{}while(0) macro.

This turns all of them into proper if/else to avoid this problem.
Signed-off-by: NBen Collins <bcollins@ubuntu.com>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d1998ef3

13 12月, 2006 1 次提交

IB: Add DMA mapping functions to allow device drivers to interpose · 9b513090

由 Ralph Campbell 提交于 12月 12, 2006

The QLogic InfiniPath HCAs use programmed I/O instead of HW DMA.
This patch allows a verbs device driver to interpose on DMA mapping
function calls in order to avoid relying on bus_to_virt() and
phys_to_virt() to undo the mappings created by dma_map_single(),
dma_map_sg(), etc.
Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

9b513090

23 9月, 2006 2 次提交

RDMA: iWARP Core Changes. · 07ebafba

由 Tom Tucker 提交于 8月 03, 2006

Modifications to the existing rdma header files, core files, drivers,
and ulp files to support iWARP, including:
 - Hook iWARP CM into the build system and use it in rdma_cm.
 - Convert enum ib_node_type to enum rdma_node_type, which includes
   the possibility of RDMA_NODE_RNIC, and update everything for this.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

07ebafba

IB/uverbs: Pass userspace data to modify_srq and modify_qp methods · 9bc57e2d

由 Ralph Campbell 提交于 8月 11, 2006

Pass a struct ib_udata to the low-level driver's ->modify_srq() and
->modify_qp() methods, so that it can get to the device-specific data
passed in by the userspace driver.
Signed-off-by: NRalph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

9bc57e2d

18 6月, 2006 2 次提交

IB/uverbs: Don't serialize with ib_uverbs_idr_mutex · 9ead190b

由 Roland Dreier 提交于 6月 17, 2006

Currently, all userspace verbs operations that call into the kernel
are serialized by ib_uverbs_idr_mutex.  This can be a scalability
issue for some workloads, especially for devices driven by the ipath
driver, which needs to call into the kernel even for datapath
operations.

Fix this by adding reference counts to the userspace objects, and then
converting ib_uverbs_idr_mutex into a spinlock that only protects the
idrs long enough to take a reference on the object being looked up.
Because remove operations may fail, we have to do a slightly funky
two-step deletion, which is described in the comments at the top of
uverbs_cmd.c.

This also still leaves ib_uverbs_idr_lock as a single lock that is
possibly subject to contention.  However, the lock hold time will only
be a single idr operation, so multiple threads should still be able to
make progress, even if ib_uverbs_idr_lock is being ping-ponged.

Surprisingly, these changes even shrink the object code:

add/remove: 23/5 grow/shrink: 4/21 up/down: 633/-693 (-60)
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

9ead190b

IB: Add ib_init_ah_from_wc() · 4e00d694

由 Sean Hefty 提交于 6月 17, 2006

Add a function to initialize address handle attributes from a work
completion.  This functionality is duplicated by both verbs and the CM.
Signed-off-by: NSean Hefty <sean.hefty@intel.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

4e00d694

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功