提交 8700e3e7 编写于 作者: M Moni Shoua 提交者: Doug Ledford

Soft RoCE driver

Soft RoCE (RXE) - The software RoCE driver

ib_rxe implements the RDMA transport and registers to the RDMA core
device as a kernel verbs provider. It also implements the packet IO
layer. On the other hand ib_rxe registers to the Linux netdev stack
as a udp encapsulating protocol, in that case RDMA, for sending and
receiving packets over any Ethernet device.  This yields a RDMA
transport over the UDP/Ethernet network layer forming a RoCEv2
compatible device.

The configuration procedure of the Soft RoCE drivers requires
binding to any existing Ethernet network device. This is done with
/sys interface.

A userspace Soft RoCE library (librxe) provides user applications
the ability to run with Soft RoCE devices.  The use of rxe verbs ins
user space requires the inclusion of librxe as a device specifics
plug-in to libibverbs. librxe is packaged separately.

Architecture:

     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
+---------------------------------------------------------------+
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +-----------------------------------------------------------+
     |                          Application                      |
     +-----------------------------------------------------------+
                            +-----------------------------------+
                            |             libibverbs            |
User                        +-----------------------------------+
                            +----------------+ +----------------+
                            | librxe         | | HW RoCE lib    |
                            +----------------+ +----------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     +--------------+                           +------------+
     | Sockets      |                           | RDMA ULP   |
     +--------------+                           +------------+
     +--------------+                  +---------------------+
     | TCP/IP       |                  | ib_core             |
     +--------------+                  +---------------------+
                             +------------+ +----------------+
Kernel                       | ib_rxe     | | HW RoCE driver |
                             +------------+ +----------------+
     +------------------------------------+
     | NIC driver                         |
     +------------------------------------+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Soft RoCE resources:

[1[ https://github.com/SoftRoCE/librxe-dev librxe - source code in
Github
[2] https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home - Soft RoCE
Wiki page
[3] https://github.com/SoftRoCE/librxe-dev - Soft RoCE userspace library
Signed-off-by: NKamal Heib <kamalh@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NMoni Shoua <monis@mellanox.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>
上级 33688abb
......@@ -7444,6 +7444,15 @@ W: http://www.mellanox.com
Q: http://patchwork.ozlabs.org/project/netdev/list/
F: drivers/net/ethernet/mellanox/mlxsw/
SOFT-ROCE DRIVER (rxe)
M: Moni Shoua <monis@mellanox.com>
L: linux-rdma@vger.kernel.org
S: Supported
W: https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
Q: http://patchwork.kernel.org/project/linux-rdma/list/
F: drivers/infiniband/hw/rxe/
F: include/uapi/rdma/rdma_user_rxe.h
MEMBARRIER SUPPORT
M: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
......
......@@ -84,6 +84,7 @@ source "drivers/infiniband/ulp/iser/Kconfig"
source "drivers/infiniband/ulp/isert/Kconfig"
source "drivers/infiniband/sw/rdmavt/Kconfig"
source "drivers/infiniband/sw/rxe/Kconfig"
source "drivers/infiniband/hw/hfi1/Kconfig"
......
obj-$(CONFIG_INFINIBAND_RDMAVT) += rdmavt/
obj-$(CONFIG_RDMA_RXE) += rxe/
config RDMA_RXE
tristate "Software RDMA over Ethernet (RoCE) driver"
depends on INET && PCI && INFINIBAND
depends on NET_UDP_TUNNEL
---help---
This driver implements the InfiniBand RDMA transport over
the Linux network stack. It enables a system with a
standard Ethernet adapter to interoperate with a RoCE
adapter or with another system running the RXE driver.
Documentation on InfiniBand and RoCE can be downloaded at
www.infinibandta.org and www.openfabrics.org. (See also
siw which is a similar software driver for iWARP.)
The driver is split into two layers, one interfaces with the
Linux RDMA stack and implements a kernel or user space
verbs API. The user space verbs API requires a support
library named librxe which is loaded by the generic user
space verbs API, libibverbs. The other layer interfaces
with the Linux network stack at layer 3.
To configure and work with soft-RoCE driver please use the
following wiki page under "configure Soft-RoCE (RXE)" section:
https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
obj-$(CONFIG_RDMA_RXE) += rdma_rxe.o
rdma_rxe-y := \
rxe.o \
rxe_comp.o \
rxe_req.o \
rxe_resp.o \
rxe_recv.o \
rxe_pool.o \
rxe_queue.o \
rxe_verbs.o \
rxe_av.o \
rxe_srq.o \
rxe_qp.o \
rxe_cq.o \
rxe_mr.o \
rxe_dma.o \
rxe_opcode.o \
rxe_mmap.o \
rxe_icrc.o \
rxe_mcast.o \
rxe_task.o \
rxe_net.o \
rxe_sysfs.o
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
MODULE_AUTHOR("Bob Pearson, Frank Zago, John Groves, Kamal Heib");
MODULE_DESCRIPTION("Soft RDMA transport");
MODULE_LICENSE("Dual BSD/GPL");
MODULE_VERSION("0.2");
/* free resources for all ports on a device */
static void rxe_cleanup_ports(struct rxe_dev *rxe)
{
kfree(rxe->port.pkey_tbl);
rxe->port.pkey_tbl = NULL;
}
/* free resources for a rxe device all objects created for this device must
* have been destroyed
*/
static void rxe_cleanup(struct rxe_dev *rxe)
{
rxe_pool_cleanup(&rxe->uc_pool);
rxe_pool_cleanup(&rxe->pd_pool);
rxe_pool_cleanup(&rxe->ah_pool);
rxe_pool_cleanup(&rxe->srq_pool);
rxe_pool_cleanup(&rxe->qp_pool);
rxe_pool_cleanup(&rxe->cq_pool);
rxe_pool_cleanup(&rxe->mr_pool);
rxe_pool_cleanup(&rxe->mw_pool);
rxe_pool_cleanup(&rxe->mc_grp_pool);
rxe_pool_cleanup(&rxe->mc_elem_pool);
rxe_cleanup_ports(rxe);
}
/* called when all references have been dropped */
void rxe_release(struct kref *kref)
{
struct rxe_dev *rxe = container_of(kref, struct rxe_dev, ref_cnt);
rxe_cleanup(rxe);
ib_dealloc_device(&rxe->ib_dev);
}
void rxe_dev_put(struct rxe_dev *rxe)
{
kref_put(&rxe->ref_cnt, rxe_release);
}
EXPORT_SYMBOL_GPL(rxe_dev_put);
/* initialize rxe device parameters */
static int rxe_init_device_param(struct rxe_dev *rxe)
{
rxe->max_inline_data = RXE_MAX_INLINE_DATA;
rxe->attr.fw_ver = RXE_FW_VER;
rxe->attr.max_mr_size = RXE_MAX_MR_SIZE;
rxe->attr.page_size_cap = RXE_PAGE_SIZE_CAP;
rxe->attr.vendor_id = RXE_VENDOR_ID;
rxe->attr.vendor_part_id = RXE_VENDOR_PART_ID;
rxe->attr.hw_ver = RXE_HW_VER;
rxe->attr.max_qp = RXE_MAX_QP;
rxe->attr.max_qp_wr = RXE_MAX_QP_WR;
rxe->attr.device_cap_flags = RXE_DEVICE_CAP_FLAGS;
rxe->attr.max_sge = RXE_MAX_SGE;
rxe->attr.max_sge_rd = RXE_MAX_SGE_RD;
rxe->attr.max_cq = RXE_MAX_CQ;
rxe->attr.max_cqe = (1 << RXE_MAX_LOG_CQE) - 1;
rxe->attr.max_mr = RXE_MAX_MR;
rxe->attr.max_pd = RXE_MAX_PD;
rxe->attr.max_qp_rd_atom = RXE_MAX_QP_RD_ATOM;
rxe->attr.max_ee_rd_atom = RXE_MAX_EE_RD_ATOM;
rxe->attr.max_res_rd_atom = RXE_MAX_RES_RD_ATOM;
rxe->attr.max_qp_init_rd_atom = RXE_MAX_QP_INIT_RD_ATOM;
rxe->attr.max_ee_init_rd_atom = RXE_MAX_EE_INIT_RD_ATOM;
rxe->attr.atomic_cap = RXE_ATOMIC_CAP;
rxe->attr.max_ee = RXE_MAX_EE;
rxe->attr.max_rdd = RXE_MAX_RDD;
rxe->attr.max_mw = RXE_MAX_MW;
rxe->attr.max_raw_ipv6_qp = RXE_MAX_RAW_IPV6_QP;
rxe->attr.max_raw_ethy_qp = RXE_MAX_RAW_ETHY_QP;
rxe->attr.max_mcast_grp = RXE_MAX_MCAST_GRP;
rxe->attr.max_mcast_qp_attach = RXE_MAX_MCAST_QP_ATTACH;
rxe->attr.max_total_mcast_qp_attach = RXE_MAX_TOT_MCAST_QP_ATTACH;
rxe->attr.max_ah = RXE_MAX_AH;
rxe->attr.max_fmr = RXE_MAX_FMR;
rxe->attr.max_map_per_fmr = RXE_MAX_MAP_PER_FMR;
rxe->attr.max_srq = RXE_MAX_SRQ;
rxe->attr.max_srq_wr = RXE_MAX_SRQ_WR;
rxe->attr.max_srq_sge = RXE_MAX_SRQ_SGE;
rxe->attr.max_fast_reg_page_list_len = RXE_MAX_FMR_PAGE_LIST_LEN;
rxe->attr.max_pkeys = RXE_MAX_PKEYS;
rxe->attr.local_ca_ack_delay = RXE_LOCAL_CA_ACK_DELAY;
rxe->max_ucontext = RXE_MAX_UCONTEXT;
return 0;
}
/* initialize port attributes */
static int rxe_init_port_param(struct rxe_port *port)
{
port->attr.state = RXE_PORT_STATE;
port->attr.max_mtu = RXE_PORT_MAX_MTU;
port->attr.active_mtu = RXE_PORT_ACTIVE_MTU;
port->attr.gid_tbl_len = RXE_PORT_GID_TBL_LEN;
port->attr.port_cap_flags = RXE_PORT_PORT_CAP_FLAGS;
port->attr.max_msg_sz = RXE_PORT_MAX_MSG_SZ;
port->attr.bad_pkey_cntr = RXE_PORT_BAD_PKEY_CNTR;
port->attr.qkey_viol_cntr = RXE_PORT_QKEY_VIOL_CNTR;
port->attr.pkey_tbl_len = RXE_PORT_PKEY_TBL_LEN;
port->attr.lid = RXE_PORT_LID;
port->attr.sm_lid = RXE_PORT_SM_LID;
port->attr.lmc = RXE_PORT_LMC;
port->attr.max_vl_num = RXE_PORT_MAX_VL_NUM;
port->attr.sm_sl = RXE_PORT_SM_SL;
port->attr.subnet_timeout = RXE_PORT_SUBNET_TIMEOUT;
port->attr.init_type_reply = RXE_PORT_INIT_TYPE_REPLY;
port->attr.active_width = RXE_PORT_ACTIVE_WIDTH;
port->attr.active_speed = RXE_PORT_ACTIVE_SPEED;
port->attr.phys_state = RXE_PORT_PHYS_STATE;
port->mtu_cap =
ib_mtu_enum_to_int(RXE_PORT_ACTIVE_MTU);
port->subnet_prefix = cpu_to_be64(RXE_PORT_SUBNET_PREFIX);
return 0;
}
/* initialize port state, note IB convention that HCA ports are always
* numbered from 1
*/
static int rxe_init_ports(struct rxe_dev *rxe)
{
struct rxe_port *port = &rxe->port;
rxe_init_port_param(port);
if (!port->attr.pkey_tbl_len || !port->attr.gid_tbl_len)
return -EINVAL;
port->pkey_tbl = kcalloc(port->attr.pkey_tbl_len,
sizeof(*port->pkey_tbl), GFP_KERNEL);
if (!port->pkey_tbl)
return -ENOMEM;
port->pkey_tbl[0] = 0xffff;
port->port_guid = rxe->ifc_ops->port_guid(rxe);
spin_lock_init(&port->port_lock);
return 0;
}
/* init pools of managed objects */
static int rxe_init_pools(struct rxe_dev *rxe)
{
int err;
err = rxe_pool_init(rxe, &rxe->uc_pool, RXE_TYPE_UC,
rxe->max_ucontext);
if (err)
goto err1;
err = rxe_pool_init(rxe, &rxe->pd_pool, RXE_TYPE_PD,
rxe->attr.max_pd);
if (err)
goto err2;
err = rxe_pool_init(rxe, &rxe->ah_pool, RXE_TYPE_AH,
rxe->attr.max_ah);
if (err)
goto err3;
err = rxe_pool_init(rxe, &rxe->srq_pool, RXE_TYPE_SRQ,
rxe->attr.max_srq);
if (err)
goto err4;
err = rxe_pool_init(rxe, &rxe->qp_pool, RXE_TYPE_QP,
rxe->attr.max_qp);
if (err)
goto err5;
err = rxe_pool_init(rxe, &rxe->cq_pool, RXE_TYPE_CQ,
rxe->attr.max_cq);
if (err)
goto err6;
err = rxe_pool_init(rxe, &rxe->mr_pool, RXE_TYPE_MR,
rxe->attr.max_mr);
if (err)
goto err7;
err = rxe_pool_init(rxe, &rxe->mw_pool, RXE_TYPE_MW,
rxe->attr.max_mw);
if (err)
goto err8;
err = rxe_pool_init(rxe, &rxe->mc_grp_pool, RXE_TYPE_MC_GRP,
rxe->attr.max_mcast_grp);
if (err)
goto err9;
err = rxe_pool_init(rxe, &rxe->mc_elem_pool, RXE_TYPE_MC_ELEM,
rxe->attr.max_total_mcast_qp_attach);
if (err)
goto err10;
return 0;
err10:
rxe_pool_cleanup(&rxe->mc_grp_pool);
err9:
rxe_pool_cleanup(&rxe->mw_pool);
err8:
rxe_pool_cleanup(&rxe->mr_pool);
err7:
rxe_pool_cleanup(&rxe->cq_pool);
err6:
rxe_pool_cleanup(&rxe->qp_pool);
err5:
rxe_pool_cleanup(&rxe->srq_pool);
err4:
rxe_pool_cleanup(&rxe->ah_pool);
err3:
rxe_pool_cleanup(&rxe->pd_pool);
err2:
rxe_pool_cleanup(&rxe->uc_pool);
err1:
return err;
}
/* initialize rxe device state */
static int rxe_init(struct rxe_dev *rxe)
{
int err;
/* init default device parameters */
rxe_init_device_param(rxe);
err = rxe_init_ports(rxe);
if (err)
goto err1;
err = rxe_init_pools(rxe);
if (err)
goto err2;
/* init pending mmap list */
spin_lock_init(&rxe->mmap_offset_lock);
spin_lock_init(&rxe->pending_lock);
INIT_LIST_HEAD(&rxe->pending_mmaps);
INIT_LIST_HEAD(&rxe->list);
mutex_init(&rxe->usdev_lock);
return 0;
err2:
rxe_cleanup_ports(rxe);
err1:
return err;
}
int rxe_set_mtu(struct rxe_dev *rxe, unsigned int ndev_mtu)
{
struct rxe_port *port = &rxe->port;
enum ib_mtu mtu;
mtu = eth_mtu_int_to_enum(ndev_mtu);
/* Make sure that new MTU in range */
mtu = mtu ? min_t(enum ib_mtu, mtu, RXE_PORT_MAX_MTU) : IB_MTU_256;
port->attr.active_mtu = mtu;
port->mtu_cap = ib_mtu_enum_to_int(mtu);
return 0;
}
EXPORT_SYMBOL(rxe_set_mtu);
/* called by ifc layer to create new rxe device.
* The caller should allocate memory for rxe by calling ib_alloc_device.
*/
int rxe_add(struct rxe_dev *rxe, unsigned int mtu)
{
int err;
kref_init(&rxe->ref_cnt);
err = rxe_init(rxe);
if (err)
goto err1;
err = rxe_set_mtu(rxe, mtu);
if (err)
goto err1;
err = rxe_register_device(rxe);
if (err)
goto err1;
return 0;
err1:
rxe_dev_put(rxe);
return err;
}
EXPORT_SYMBOL(rxe_add);
/* called by the ifc layer to remove a device */
void rxe_remove(struct rxe_dev *rxe)
{
rxe_unregister_device(rxe);
rxe_dev_put(rxe);
}
EXPORT_SYMBOL(rxe_remove);
static int __init rxe_module_init(void)
{
int err;
/* initialize slab caches for managed objects */
err = rxe_cache_init();
if (err) {
pr_err("rxe: unable to init object pools\n");
return err;
}
err = rxe_net_init();
if (err) {
pr_err("rxe: unable to init\n");
rxe_cache_exit();
return err;
}
pr_info("rxe: loaded\n");
return 0;
}
static void __exit rxe_module_exit(void)
{
rxe_remove_all();
rxe_net_exit();
rxe_cache_exit();
pr_info("rxe: unloaded\n");
}
module_init(rxe_module_init);
module_exit(rxe_module_exit);
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef RXE_H
#define RXE_H
#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/crc32.h>
#include <rdma/ib_verbs.h>
#include <rdma/ib_user_verbs.h>
#include <rdma/ib_pack.h>
#include <rdma/ib_smi.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_cache.h>
#include <rdma/ib_addr.h>
#include "rxe_net.h"
#include "rxe_opcode.h"
#include "rxe_hdr.h"
#include "rxe_param.h"
#include "rxe_verbs.h"
#define RXE_UVERBS_ABI_VERSION (1)
#define IB_PHYS_STATE_LINK_UP (5)
#define IB_PHYS_STATE_LINK_DOWN (3)
#define RXE_ROCE_V2_SPORT (0xc000)
int rxe_set_mtu(struct rxe_dev *rxe, unsigned int dev_mtu);
int rxe_add(struct rxe_dev *rxe, unsigned int mtu);
void rxe_remove(struct rxe_dev *rxe);
void rxe_remove_all(void);
int rxe_rcv(struct sk_buff *skb);
void rxe_dev_put(struct rxe_dev *rxe);
struct rxe_dev *net_to_rxe(struct net_device *ndev);
struct rxe_dev *get_rxe_by_name(const char* name);
void rxe_port_up(struct rxe_dev *rxe);
void rxe_port_down(struct rxe_dev *rxe);
#endif /* RXE_H */
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr)
{
struct rxe_port *port;
if (attr->port_num != 1) {
pr_info("rxe: invalid port_num = %d\n", attr->port_num);
return -EINVAL;
}
port = &rxe->port;
if (attr->ah_flags & IB_AH_GRH) {
if (attr->grh.sgid_index > port->attr.gid_tbl_len) {
pr_info("rxe: invalid sgid index = %d\n",
attr->grh.sgid_index);
return -EINVAL;
}
}
return 0;
}
int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
struct rxe_av *av, struct ib_ah_attr *attr)
{
memset(av, 0, sizeof(*av));
memcpy(&av->grh, &attr->grh, sizeof(attr->grh));
av->port_num = port_num;
return 0;
}
int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
struct ib_ah_attr *attr)
{
memcpy(&attr->grh, &av->grh, sizeof(av->grh));
attr->port_num = av->port_num;
return 0;
}
int rxe_av_fill_ip_info(struct rxe_dev *rxe,
struct rxe_av *av,
struct ib_ah_attr *attr,
struct ib_gid_attr *sgid_attr,
union ib_gid *sgid)
{
rdma_gid2ip(&av->sgid_addr._sockaddr, sgid);
rdma_gid2ip(&av->dgid_addr._sockaddr, &attr->grh.dgid);
av->network_type = ib_gid_to_network_type(sgid_attr->gid_type, sgid);
return 0;
}
struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt)
{
if (!pkt || !pkt->qp)
return NULL;
if (qp_type(pkt->qp) == IB_QPT_RC || qp_type(pkt->qp) == IB_QPT_UC)
return &pkt->qp->pri_av;
return (pkt->wqe) ? &pkt->wqe->av : NULL;
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <linux/skbuff.h>
#include "rxe.h"
#include "rxe_loc.h"
#include "rxe_queue.h"
#include "rxe_task.h"
enum comp_state {
COMPST_GET_ACK,
COMPST_GET_WQE,
COMPST_COMP_WQE,
COMPST_COMP_ACK,
COMPST_CHECK_PSN,
COMPST_CHECK_ACK,
COMPST_READ,
COMPST_ATOMIC,
COMPST_WRITE_SEND,
COMPST_UPDATE_COMP,
COMPST_ERROR_RETRY,
COMPST_RNR_RETRY,
COMPST_ERROR,
COMPST_EXIT, /* We have an issue, and we want to rerun the completer */
COMPST_DONE, /* The completer finished successflly */
};
static char *comp_state_name[] = {
[COMPST_GET_ACK] = "GET ACK",
[COMPST_GET_WQE] = "GET WQE",
[COMPST_COMP_WQE] = "COMP WQE",
[COMPST_COMP_ACK] = "COMP ACK",
[COMPST_CHECK_PSN] = "CHECK PSN",
[COMPST_CHECK_ACK] = "CHECK ACK",
[COMPST_READ] = "READ",
[COMPST_ATOMIC] = "ATOMIC",
[COMPST_WRITE_SEND] = "WRITE/SEND",
[COMPST_UPDATE_COMP] = "UPDATE COMP",
[COMPST_ERROR_RETRY] = "ERROR RETRY",
[COMPST_RNR_RETRY] = "RNR RETRY",
[COMPST_ERROR] = "ERROR",
[COMPST_EXIT] = "EXIT",
[COMPST_DONE] = "DONE",
};
static unsigned long rnrnak_usec[32] = {
[IB_RNR_TIMER_655_36] = 655360,
[IB_RNR_TIMER_000_01] = 10,
[IB_RNR_TIMER_000_02] = 20,
[IB_RNR_TIMER_000_03] = 30,
[IB_RNR_TIMER_000_04] = 40,
[IB_RNR_TIMER_000_06] = 60,
[IB_RNR_TIMER_000_08] = 80,
[IB_RNR_TIMER_000_12] = 120,
[IB_RNR_TIMER_000_16] = 160,
[IB_RNR_TIMER_000_24] = 240,
[IB_RNR_TIMER_000_32] = 320,
[IB_RNR_TIMER_000_48] = 480,
[IB_RNR_TIMER_000_64] = 640,
[IB_RNR_TIMER_000_96] = 960,
[IB_RNR_TIMER_001_28] = 1280,
[IB_RNR_TIMER_001_92] = 1920,
[IB_RNR_TIMER_002_56] = 2560,
[IB_RNR_TIMER_003_84] = 3840,
[IB_RNR_TIMER_005_12] = 5120,
[IB_RNR_TIMER_007_68] = 7680,
[IB_RNR_TIMER_010_24] = 10240,
[IB_RNR_TIMER_015_36] = 15360,
[IB_RNR_TIMER_020_48] = 20480,
[IB_RNR_TIMER_030_72] = 30720,
[IB_RNR_TIMER_040_96] = 40960,
[IB_RNR_TIMER_061_44] = 61410,
[IB_RNR_TIMER_081_92] = 81920,
[IB_RNR_TIMER_122_88] = 122880,
[IB_RNR_TIMER_163_84] = 163840,
[IB_RNR_TIMER_245_76] = 245760,
[IB_RNR_TIMER_327_68] = 327680,
[IB_RNR_TIMER_491_52] = 491520,
};
static inline unsigned long rnrnak_jiffies(u8 timeout)
{
return max_t(unsigned long,
usecs_to_jiffies(rnrnak_usec[timeout]), 1);
}
static enum ib_wc_opcode wr_to_wc_opcode(enum ib_wr_opcode opcode)
{
switch (opcode) {
case IB_WR_RDMA_WRITE: return IB_WC_RDMA_WRITE;
case IB_WR_RDMA_WRITE_WITH_IMM: return IB_WC_RDMA_WRITE;
case IB_WR_SEND: return IB_WC_SEND;
case IB_WR_SEND_WITH_IMM: return IB_WC_SEND;
case IB_WR_RDMA_READ: return IB_WC_RDMA_READ;
case IB_WR_ATOMIC_CMP_AND_SWP: return IB_WC_COMP_SWAP;
case IB_WR_ATOMIC_FETCH_AND_ADD: return IB_WC_FETCH_ADD;
case IB_WR_LSO: return IB_WC_LSO;
case IB_WR_SEND_WITH_INV: return IB_WC_SEND;
case IB_WR_RDMA_READ_WITH_INV: return IB_WC_RDMA_READ;
case IB_WR_LOCAL_INV: return IB_WC_LOCAL_INV;
case IB_WR_REG_MR: return IB_WC_REG_MR;
default:
return 0xff;
}
}
void retransmit_timer(unsigned long data)
{
struct rxe_qp *qp = (struct rxe_qp *)data;
if (qp->valid) {
qp->comp.timeout = 1;
rxe_run_task(&qp->comp.task, 1);
}
}
void rxe_comp_queue_pkt(struct rxe_dev *rxe, struct rxe_qp *qp,
struct sk_buff *skb)
{
int must_sched;
skb_queue_tail(&qp->resp_pkts, skb);
must_sched = skb_queue_len(&qp->resp_pkts) > 1;
rxe_run_task(&qp->comp.task, must_sched);
}
static inline enum comp_state get_wqe(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe **wqe_p)
{
struct rxe_send_wqe *wqe;
/* we come here whether or not we found a response packet to see if
* there are any posted WQEs
*/
wqe = queue_head(qp->sq.queue);
*wqe_p = wqe;
/* no WQE or requester has not started it yet */
if (!wqe || wqe->state == wqe_state_posted)
return pkt ? COMPST_DONE : COMPST_EXIT;
/* WQE does not require an ack */
if (wqe->state == wqe_state_done)
return COMPST_COMP_WQE;
/* WQE caused an error */
if (wqe->state == wqe_state_error)
return COMPST_ERROR;
/* we have a WQE, if we also have an ack check its PSN */
return pkt ? COMPST_CHECK_PSN : COMPST_EXIT;
}
static inline void reset_retry_counters(struct rxe_qp *qp)
{
qp->comp.retry_cnt = qp->attr.retry_cnt;
qp->comp.rnr_retry = qp->attr.rnr_retry;
}
static inline enum comp_state check_psn(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
s32 diff;
/* check to see if response is past the oldest WQE. if it is, complete
* send/write or error read/atomic
*/
diff = psn_compare(pkt->psn, wqe->last_psn);
if (diff > 0) {
if (wqe->state == wqe_state_pending) {
if (wqe->mask & WR_ATOMIC_OR_READ_MASK)
return COMPST_ERROR_RETRY;
reset_retry_counters(qp);
return COMPST_COMP_WQE;
} else {
return COMPST_DONE;
}
}
/* compare response packet to expected response */
diff = psn_compare(pkt->psn, qp->comp.psn);
if (diff < 0) {
/* response is most likely a retried packet if it matches an
* uncompleted WQE go complete it else ignore it
*/
if (pkt->psn == wqe->last_psn)
return COMPST_COMP_ACK;
else
return COMPST_DONE;
} else if ((diff > 0) && (wqe->mask & WR_ATOMIC_OR_READ_MASK)) {
return COMPST_ERROR_RETRY;
} else {
return COMPST_CHECK_ACK;
}
}
static inline enum comp_state check_ack(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
unsigned int mask = pkt->mask;
u8 syn;
/* Check the sequence only */
switch (qp->comp.opcode) {
case -1:
/* Will catch all *_ONLY cases. */
if (!(mask & RXE_START_MASK))
return COMPST_ERROR;
break;
case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
return COMPST_ERROR;
}
break;
default:
WARN_ON(1);
}
/* Check operation validity. */
switch (pkt->opcode) {
case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
case IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST:
case IB_OPCODE_RC_RDMA_READ_RESPONSE_ONLY:
syn = aeth_syn(pkt);
if ((syn & AETH_TYPE_MASK) != AETH_ACK)
return COMPST_ERROR;
/* Fall through (IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE
* doesn't have an AETH)
*/
case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
if (wqe->wr.opcode != IB_WR_RDMA_READ &&
wqe->wr.opcode != IB_WR_RDMA_READ_WITH_INV) {
return COMPST_ERROR;
}
reset_retry_counters(qp);
return COMPST_READ;
case IB_OPCODE_RC_ATOMIC_ACKNOWLEDGE:
syn = aeth_syn(pkt);
if ((syn & AETH_TYPE_MASK) != AETH_ACK)
return COMPST_ERROR;
if (wqe->wr.opcode != IB_WR_ATOMIC_CMP_AND_SWP &&
wqe->wr.opcode != IB_WR_ATOMIC_FETCH_AND_ADD)
return COMPST_ERROR;
reset_retry_counters(qp);
return COMPST_ATOMIC;
case IB_OPCODE_RC_ACKNOWLEDGE:
syn = aeth_syn(pkt);
switch (syn & AETH_TYPE_MASK) {
case AETH_ACK:
reset_retry_counters(qp);
return COMPST_WRITE_SEND;
case AETH_RNR_NAK:
return COMPST_RNR_RETRY;
case AETH_NAK:
switch (syn) {
case AETH_NAK_PSN_SEQ_ERROR:
/* a nak implicitly acks all packets with psns
* before
*/
if (psn_compare(pkt->psn, qp->comp.psn) > 0) {
qp->comp.psn = pkt->psn;
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
rxe_run_task(&qp->req.task, 1);
}
}
return COMPST_ERROR_RETRY;
case AETH_NAK_INVALID_REQ:
wqe->status = IB_WC_REM_INV_REQ_ERR;
return COMPST_ERROR;
case AETH_NAK_REM_ACC_ERR:
wqe->status = IB_WC_REM_ACCESS_ERR;
return COMPST_ERROR;
case AETH_NAK_REM_OP_ERR:
wqe->status = IB_WC_REM_OP_ERR;
return COMPST_ERROR;
default:
pr_warn("unexpected nak %x\n", syn);
wqe->status = IB_WC_REM_OP_ERR;
return COMPST_ERROR;
}
default:
return COMPST_ERROR;
}
break;
default:
pr_warn("unexpected opcode\n");
}
return COMPST_ERROR;
}
static inline enum comp_state do_read(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
int ret;
ret = copy_data(rxe, qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, payload_addr(pkt),
payload_size(pkt), to_mem_obj, NULL);
if (ret)
return COMPST_ERROR;
if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
return COMPST_COMP_ACK;
else
return COMPST_UPDATE_COMP;
}
static inline enum comp_state do_atomic(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
struct rxe_dev *rxe = to_rdev(qp->ibqp.device);
int ret;
u64 atomic_orig = atmack_orig(pkt);
ret = copy_data(rxe, qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, &atomic_orig,
sizeof(u64), to_mem_obj, NULL);
if (ret)
return COMPST_ERROR;
else
return COMPST_COMP_ACK;
}
static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
struct rxe_cqe *cqe)
{
memset(cqe, 0, sizeof(*cqe));
if (!qp->is_user) {
struct ib_wc *wc = &cqe->ibwc;
wc->wr_id = wqe->wr.wr_id;
wc->status = wqe->status;
wc->opcode = wr_to_wc_opcode(wqe->wr.opcode);
if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
wqe->wr.opcode == IB_WR_SEND_WITH_IMM)
wc->wc_flags = IB_WC_WITH_IMM;
wc->byte_len = wqe->dma.length;
wc->qp = &qp->ibqp;
} else {
struct ib_uverbs_wc *uwc = &cqe->uibwc;
uwc->wr_id = wqe->wr.wr_id;
uwc->status = wqe->status;
uwc->opcode = wr_to_wc_opcode(wqe->wr.opcode);
if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM ||
wqe->wr.opcode == IB_WR_SEND_WITH_IMM)
uwc->wc_flags = IB_WC_WITH_IMM;
uwc->byte_len = wqe->dma.length;
uwc->qp_num = qp->ibqp.qp_num;
}
}
static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
{
struct rxe_cqe cqe;
if ((qp->sq_sig_type == IB_SIGNAL_ALL_WR) ||
(wqe->wr.send_flags & IB_SEND_SIGNALED) ||
(qp->req.state == QP_STATE_ERROR)) {
make_send_cqe(qp, wqe, &cqe);
rxe_cq_post(qp->scq, &cqe, 0);
}
advance_consumer(qp->sq.queue);
/*
* we completed something so let req run again
* if it is trying to fence
*/
if (qp->req.wait_fence) {
qp->req.wait_fence = 0;
rxe_run_task(&qp->req.task, 1);
}
}
static inline enum comp_state complete_ack(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
unsigned long flags;
if (wqe->has_rd_atomic) {
wqe->has_rd_atomic = 0;
atomic_inc(&qp->req.rd_atomic);
if (qp->req.need_rd_atomic) {
qp->comp.timeout_retry = 0;
qp->req.need_rd_atomic = 0;
rxe_run_task(&qp->req.task, 1);
}
}
if (unlikely(qp->req.state == QP_STATE_DRAIN)) {
/* state_lock used by requester & completer */
spin_lock_irqsave(&qp->state_lock, flags);
if ((qp->req.state == QP_STATE_DRAIN) &&
(qp->comp.psn == qp->req.psn)) {
qp->req.state = QP_STATE_DRAINED;
spin_unlock_irqrestore(&qp->state_lock, flags);
if (qp->ibqp.event_handler) {
struct ib_event ev;
ev.device = qp->ibqp.device;
ev.element.qp = &qp->ibqp;
ev.event = IB_EVENT_SQ_DRAINED;
qp->ibqp.event_handler(&ev,
qp->ibqp.qp_context);
}
} else {
spin_unlock_irqrestore(&qp->state_lock, flags);
}
}
do_complete(qp, wqe);
if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
return COMPST_UPDATE_COMP;
else
return COMPST_DONE;
}
static inline enum comp_state complete_wqe(struct rxe_qp *qp,
struct rxe_pkt_info *pkt,
struct rxe_send_wqe *wqe)
{
qp->comp.opcode = -1;
if (pkt) {
if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
qp->comp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
rxe_run_task(&qp->req.task, 1);
}
}
do_complete(qp, wqe);
return COMPST_GET_WQE;
}
int rxe_completer(void *arg)
{
struct rxe_qp *qp = (struct rxe_qp *)arg;
struct rxe_send_wqe *wqe = wqe;
struct sk_buff *skb = NULL;
struct rxe_pkt_info *pkt = NULL;
enum comp_state state;
if (!qp->valid) {
while ((skb = skb_dequeue(&qp->resp_pkts))) {
rxe_drop_ref(qp);
kfree_skb(skb);
}
skb = NULL;
pkt = NULL;
while (queue_head(qp->sq.queue))
advance_consumer(qp->sq.queue);
goto exit;
}
if (qp->req.state == QP_STATE_ERROR) {
while ((skb = skb_dequeue(&qp->resp_pkts))) {
rxe_drop_ref(qp);
kfree_skb(skb);
}
skb = NULL;
pkt = NULL;
while ((wqe = queue_head(qp->sq.queue))) {
wqe->status = IB_WC_WR_FLUSH_ERR;
do_complete(qp, wqe);
}
goto exit;
}
if (qp->req.state == QP_STATE_RESET) {
while ((skb = skb_dequeue(&qp->resp_pkts))) {
rxe_drop_ref(qp);
kfree_skb(skb);
}
skb = NULL;
pkt = NULL;
while (queue_head(qp->sq.queue))
advance_consumer(qp->sq.queue);
goto exit;
}
if (qp->comp.timeout) {
qp->comp.timeout_retry = 1;
qp->comp.timeout = 0;
} else {
qp->comp.timeout_retry = 0;
}
if (qp->req.need_retry)
goto exit;
state = COMPST_GET_ACK;
while (1) {
pr_debug("state = %s\n", comp_state_name[state]);
switch (state) {
case COMPST_GET_ACK:
skb = skb_dequeue(&qp->resp_pkts);
if (skb) {
pkt = SKB_TO_PKT(skb);
qp->comp.timeout_retry = 0;
}
state = COMPST_GET_WQE;
break;
case COMPST_GET_WQE:
state = get_wqe(qp, pkt, &wqe);
break;
case COMPST_CHECK_PSN:
state = check_psn(qp, pkt, wqe);
break;
case COMPST_CHECK_ACK:
state = check_ack(qp, pkt, wqe);
break;
case COMPST_READ:
state = do_read(qp, pkt, wqe);
break;
case COMPST_ATOMIC:
state = do_atomic(qp, pkt, wqe);
break;
case COMPST_WRITE_SEND:
if (wqe->state == wqe_state_pending &&
wqe->last_psn == pkt->psn)
state = COMPST_COMP_ACK;
else
state = COMPST_UPDATE_COMP;
break;
case COMPST_COMP_ACK:
state = complete_ack(qp, pkt, wqe);
break;
case COMPST_COMP_WQE:
state = complete_wqe(qp, pkt, wqe);
break;
case COMPST_UPDATE_COMP:
if (pkt->mask & RXE_END_MASK)
qp->comp.opcode = -1;
else
qp->comp.opcode = pkt->opcode;
if (psn_compare(pkt->psn, qp->comp.psn) >= 0)
qp->comp.psn = (pkt->psn + 1) & BTH_PSN_MASK;
if (qp->req.wait_psn) {
qp->req.wait_psn = 0;
rxe_run_task(&qp->req.task, 1);
}
state = COMPST_DONE;
break;
case COMPST_DONE:
if (pkt) {
rxe_drop_ref(pkt->qp);
kfree_skb(skb);
}
goto done;
case COMPST_EXIT:
if (qp->comp.timeout_retry && wqe) {
state = COMPST_ERROR_RETRY;
break;
}
/* re reset the timeout counter if
* (1) QP is type RC
* (2) the QP is alive
* (3) there is a packet sent by the requester that
* might be acked (we still might get spurious
* timeouts but try to keep them as few as possible)
* (4) the timeout parameter is set
*/
if ((qp_type(qp) == IB_QPT_RC) &&
(qp->req.state == QP_STATE_READY) &&
(psn_compare(qp->req.psn, qp->comp.psn) > 0) &&
qp->qp_timeout_jiffies)
mod_timer(&qp->retrans_timer,
jiffies + qp->qp_timeout_jiffies);
goto exit;
case COMPST_ERROR_RETRY:
/* we come here if the retry timer fired and we did
* not receive a response packet. try to retry the send
* queue if that makes sense and the limits have not
* been exceeded. remember that some timeouts are
* spurious since we do not reset the timer but kick
* it down the road or let it expire
*/
/* there is nothing to retry in this case */
if (!wqe || (wqe->state == wqe_state_posted))
goto exit;
if (qp->comp.retry_cnt > 0) {
if (qp->comp.retry_cnt != 7)
qp->comp.retry_cnt--;
/* no point in retrying if we have already
* seen the last ack that the requester could
* have caused
*/
if (psn_compare(qp->req.psn,
qp->comp.psn) > 0) {
/* tell the requester to retry the
* send send queue next time around
*/
qp->req.need_retry = 1;
rxe_run_task(&qp->req.task, 1);
}
goto exit;
} else {
wqe->status = IB_WC_RETRY_EXC_ERR;
state = COMPST_ERROR;
}
break;
case COMPST_RNR_RETRY:
if (qp->comp.rnr_retry > 0) {
if (qp->comp.rnr_retry != 7)
qp->comp.rnr_retry--;
qp->req.need_retry = 1;
pr_debug("set rnr nak timer\n");
mod_timer(&qp->rnr_nak_timer,
jiffies + rnrnak_jiffies(aeth_syn(pkt)
& ~AETH_TYPE_MASK));
goto exit;
} else {
wqe->status = IB_WC_RNR_RETRY_EXC_ERR;
state = COMPST_ERROR;
}
break;
case COMPST_ERROR:
do_complete(qp, wqe);
rxe_qp_error(qp);
goto exit;
}
}
exit:
/* we come here if we are done with processing and want the task to
* exit from the loop calling us
*/
return -EAGAIN;
done:
/* we come here if we have processed a packet we want the task to call
* us again to see if there is anything else to do
*/
return 0;
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
#include "rxe_queue.h"
int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
int cqe, int comp_vector, struct ib_udata *udata)
{
int count;
if (cqe <= 0) {
pr_warn("cqe(%d) <= 0\n", cqe);
goto err1;
}
if (cqe > rxe->attr.max_cqe) {
pr_warn("cqe(%d) > max_cqe(%d)\n",
cqe, rxe->attr.max_cqe);
goto err1;
}
if (cq) {
count = queue_count(cq->queue);
if (cqe < count) {
pr_warn("cqe(%d) < current # elements in queue (%d)",
cqe, count);
goto err1;
}
}
return 0;
err1:
return -EINVAL;
}
static void rxe_send_complete(unsigned long data)
{
struct rxe_cq *cq = (struct rxe_cq *)data;
cq->ibcq.comp_handler(&cq->ibcq, cq->ibcq.cq_context);
}
int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
int comp_vector, struct ib_ucontext *context,
struct ib_udata *udata)
{
int err;
cq->queue = rxe_queue_init(rxe, &cqe,
sizeof(struct rxe_cqe));
if (!cq->queue) {
pr_warn("unable to create cq\n");
return -ENOMEM;
}
err = do_mmap_info(rxe, udata, false, context, cq->queue->buf,
cq->queue->buf_size, &cq->queue->ip);
if (err) {
kvfree(cq->queue->buf);
kfree(cq->queue);
return err;
}
if (udata)
cq->is_user = 1;
tasklet_init(&cq->comp_task, rxe_send_complete, (unsigned long)cq);
spin_lock_init(&cq->cq_lock);
cq->ibcq.cqe = cqe;
return 0;
}
int rxe_cq_resize_queue(struct rxe_cq *cq, int cqe, struct ib_udata *udata)
{
int err;
err = rxe_queue_resize(cq->queue, (unsigned int *)&cqe,
sizeof(struct rxe_cqe),
cq->queue->ip ? cq->queue->ip->context : NULL,
udata, NULL, &cq->cq_lock);
if (!err)
cq->ibcq.cqe = cqe;
return err;
}
int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited)
{
struct ib_event ev;
unsigned long flags;
spin_lock_irqsave(&cq->cq_lock, flags);
if (unlikely(queue_full(cq->queue))) {
spin_unlock_irqrestore(&cq->cq_lock, flags);
if (cq->ibcq.event_handler) {
ev.device = cq->ibcq.device;
ev.element.cq = &cq->ibcq;
ev.event = IB_EVENT_CQ_ERR;
cq->ibcq.event_handler(&ev, cq->ibcq.cq_context);
}
return -EBUSY;
}
memcpy(producer_addr(cq->queue), cqe, sizeof(*cqe));
/* make sure all changes to the CQ are written before we update the
* producer pointer
*/
smp_wmb();
advance_producer(cq->queue);
spin_unlock_irqrestore(&cq->cq_lock, flags);
if ((cq->notify == IB_CQ_NEXT_COMP) ||
(cq->notify == IB_CQ_SOLICITED && solicited)) {
cq->notify = 0;
tasklet_schedule(&cq->comp_task);
}
return 0;
}
void rxe_cq_cleanup(void *arg)
{
struct rxe_cq *cq = arg;
if (cq->queue)
rxe_queue_cleanup(cq->queue);
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
#define DMA_BAD_ADDER ((u64)0)
static int rxe_mapping_error(struct ib_device *dev, u64 dma_addr)
{
return dma_addr == DMA_BAD_ADDER;
}
static u64 rxe_dma_map_single(struct ib_device *dev,
void *cpu_addr, size_t size,
enum dma_data_direction direction)
{
WARN_ON(!valid_dma_direction(direction));
return (uintptr_t)cpu_addr;
}
static void rxe_dma_unmap_single(struct ib_device *dev,
u64 addr, size_t size,
enum dma_data_direction direction)
{
WARN_ON(!valid_dma_direction(direction));
}
static u64 rxe_dma_map_page(struct ib_device *dev,
struct page *page,
unsigned long offset,
size_t size, enum dma_data_direction direction)
{
u64 addr;
WARN_ON(!valid_dma_direction(direction));
if (offset + size > PAGE_SIZE) {
addr = DMA_BAD_ADDER;
goto done;
}
addr = (uintptr_t)page_address(page);
if (addr)
addr += offset;
done:
return addr;
}
static void rxe_dma_unmap_page(struct ib_device *dev,
u64 addr, size_t size,
enum dma_data_direction direction)
{
WARN_ON(!valid_dma_direction(direction));
}
static int rxe_map_sg(struct ib_device *dev, struct scatterlist *sgl,
int nents, enum dma_data_direction direction)
{
struct scatterlist *sg;
u64 addr;
int i;
int ret = nents;
WARN_ON(!valid_dma_direction(direction));
for_each_sg(sgl, sg, nents, i) {
addr = (uintptr_t)page_address(sg_page(sg));
if (!addr) {
ret = 0;
break;
}
sg->dma_address = addr + sg->offset;
#ifdef CONFIG_NEED_SG_DMA_LENGTH
sg->dma_length = sg->length;
#endif
}
return ret;
}
static void rxe_unmap_sg(struct ib_device *dev,
struct scatterlist *sg, int nents,
enum dma_data_direction direction)
{
WARN_ON(!valid_dma_direction(direction));
}
static void rxe_sync_single_for_cpu(struct ib_device *dev,
u64 addr,
size_t size, enum dma_data_direction dir)
{
}
static void rxe_sync_single_for_device(struct ib_device *dev,
u64 addr,
size_t size, enum dma_data_direction dir)
{
}
static void *rxe_dma_alloc_coherent(struct ib_device *dev, size_t size,
u64 *dma_handle, gfp_t flag)
{
struct page *p;
void *addr = NULL;
p = alloc_pages(flag, get_order(size));
if (p)
addr = page_address(p);
if (dma_handle)
*dma_handle = (uintptr_t)addr;
return addr;
}
static void rxe_dma_free_coherent(struct ib_device *dev, size_t size,
void *cpu_addr, u64 dma_handle)
{
free_pages((unsigned long)cpu_addr, get_order(size));
}
struct ib_dma_mapping_ops rxe_dma_mapping_ops = {
.mapping_error = rxe_mapping_error,
.map_single = rxe_dma_map_single,
.unmap_single = rxe_dma_unmap_single,
.map_page = rxe_dma_map_page,
.unmap_page = rxe_dma_unmap_page,
.map_sg = rxe_map_sg,
.unmap_sg = rxe_unmap_sg,
.sync_single_for_cpu = rxe_sync_single_for_cpu,
.sync_single_for_device = rxe_sync_single_for_device,
.alloc_coherent = rxe_dma_alloc_coherent,
.free_coherent = rxe_dma_free_coherent
};
此差异已折叠。
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
/* Compute a partial ICRC for all the IB transport headers. */
u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt, struct sk_buff *skb)
{
unsigned int bth_offset = 0;
struct iphdr *ip4h = NULL;
struct ipv6hdr *ip6h = NULL;
struct udphdr *udph;
struct rxe_bth *bth;
int crc;
int length;
int hdr_size = sizeof(struct udphdr) +
(skb->protocol == htons(ETH_P_IP) ?
sizeof(struct iphdr) : sizeof(struct ipv6hdr));
/* pseudo header buffer size is calculate using ipv6 header size since
* it is bigger than ipv4
*/
u8 pshdr[sizeof(struct udphdr) +
sizeof(struct ipv6hdr) +
RXE_BTH_BYTES];
/* This seed is the result of computing a CRC with a seed of
* 0xfffffff and 8 bytes of 0xff representing a masked LRH.
*/
crc = 0xdebb20e3;
if (skb->protocol == htons(ETH_P_IP)) { /* IPv4 */
memcpy(pshdr, ip_hdr(skb), hdr_size);
ip4h = (struct iphdr *)pshdr;
udph = (struct udphdr *)(ip4h + 1);
ip4h->ttl = 0xff;
ip4h->check = CSUM_MANGLED_0;
ip4h->tos = 0xff;
} else { /* IPv6 */
memcpy(pshdr, ipv6_hdr(skb), hdr_size);
ip6h = (struct ipv6hdr *)pshdr;
udph = (struct udphdr *)(ip6h + 1);
memset(ip6h->flow_lbl, 0xff, sizeof(ip6h->flow_lbl));
ip6h->priority = 0xf;
ip6h->hop_limit = 0xff;
}
udph->check = CSUM_MANGLED_0;
bth_offset += hdr_size;
memcpy(&pshdr[bth_offset], pkt->hdr, RXE_BTH_BYTES);
bth = (struct rxe_bth *)&pshdr[bth_offset];
/* exclude bth.resv8a */
bth->qpn |= cpu_to_be32(~BTH_QPN_MASK);
length = hdr_size + RXE_BTH_BYTES;
crc = crc32_le(crc, pshdr, length);
/* And finish to compute the CRC on the remainder of the headers. */
crc = crc32_le(crc, pkt->hdr + RXE_BTH_BYTES,
rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES);
return crc;
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef RXE_LOC_H
#define RXE_LOC_H
/* rxe_av.c */
int rxe_av_chk_attr(struct rxe_dev *rxe, struct ib_ah_attr *attr);
int rxe_av_from_attr(struct rxe_dev *rxe, u8 port_num,
struct rxe_av *av, struct ib_ah_attr *attr);
int rxe_av_to_attr(struct rxe_dev *rxe, struct rxe_av *av,
struct ib_ah_attr *attr);
int rxe_av_fill_ip_info(struct rxe_dev *rxe,
struct rxe_av *av,
struct ib_ah_attr *attr,
struct ib_gid_attr *sgid_attr,
union ib_gid *sgid);
struct rxe_av *rxe_get_av(struct rxe_pkt_info *pkt);
/* rxe_cq.c */
int rxe_cq_chk_attr(struct rxe_dev *rxe, struct rxe_cq *cq,
int cqe, int comp_vector, struct ib_udata *udata);
int rxe_cq_from_init(struct rxe_dev *rxe, struct rxe_cq *cq, int cqe,
int comp_vector, struct ib_ucontext *context,
struct ib_udata *udata);
int rxe_cq_resize_queue(struct rxe_cq *cq, int new_cqe, struct ib_udata *udata);
int rxe_cq_post(struct rxe_cq *cq, struct rxe_cqe *cqe, int solicited);
void rxe_cq_cleanup(void *arg);
/* rxe_mcast.c */
int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
struct rxe_mc_grp **grp_p);
int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
struct rxe_mc_grp *grp);
int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
union ib_gid *mgid);
void rxe_drop_all_mcast_groups(struct rxe_qp *qp);
void rxe_mc_cleanup(void *arg);
/* rxe_mmap.c */
struct rxe_mmap_info {
struct list_head pending_mmaps;
struct ib_ucontext *context;
struct kref ref;
void *obj;
struct mminfo info;
};
void rxe_mmap_release(struct kref *ref);
struct rxe_mmap_info *rxe_create_mmap_info(struct rxe_dev *dev,
u32 size,
struct ib_ucontext *context,
void *obj);
int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma);
/* rxe_mr.c */
enum copy_direction {
to_mem_obj,
from_mem_obj,
};
int rxe_mem_init_dma(struct rxe_dev *rxe, struct rxe_pd *pd,
int access, struct rxe_mem *mem);
int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
u64 length, u64 iova, int access, struct ib_udata *udata,
struct rxe_mem *mr);
int rxe_mem_init_fast(struct rxe_dev *rxe, struct rxe_pd *pd,
int max_pages, struct rxe_mem *mem);
int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr,
int length, enum copy_direction dir, u32 *crcp);
int copy_data(struct rxe_dev *rxe, struct rxe_pd *pd, int access,
struct rxe_dma_info *dma, void *addr, int length,
enum copy_direction dir, u32 *crcp);
void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length);
enum lookup_type {
lookup_local,
lookup_remote,
};
struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
enum lookup_type type);
int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length);
int rxe_mem_map_pages(struct rxe_dev *rxe, struct rxe_mem *mem,
u64 *page, int num_pages, u64 iova);
void rxe_mem_cleanup(void *arg);
int advance_dma_data(struct rxe_dma_info *dma, unsigned int length);
/* rxe_qp.c */
int rxe_qp_chk_init(struct rxe_dev *rxe, struct ib_qp_init_attr *init);
int rxe_qp_from_init(struct rxe_dev *rxe, struct rxe_qp *qp, struct rxe_pd *pd,
struct ib_qp_init_attr *init, struct ib_udata *udata,
struct ib_pd *ibpd);
int rxe_qp_to_init(struct rxe_qp *qp, struct ib_qp_init_attr *init);
int rxe_qp_chk_attr(struct rxe_dev *rxe, struct rxe_qp *qp,
struct ib_qp_attr *attr, int mask);
int rxe_qp_from_attr(struct rxe_qp *qp, struct ib_qp_attr *attr,
int mask, struct ib_udata *udata);
int rxe_qp_to_attr(struct rxe_qp *qp, struct ib_qp_attr *attr, int mask);
void rxe_qp_error(struct rxe_qp *qp);
void rxe_qp_destroy(struct rxe_qp *qp);
void rxe_qp_cleanup(void *arg);
static inline int qp_num(struct rxe_qp *qp)
{
return qp->ibqp.qp_num;
}
static inline enum ib_qp_type qp_type(struct rxe_qp *qp)
{
return qp->ibqp.qp_type;
}
static inline enum ib_qp_state qp_state(struct rxe_qp *qp)
{
return qp->attr.qp_state;
}
static inline int qp_mtu(struct rxe_qp *qp)
{
if (qp->ibqp.qp_type == IB_QPT_RC || qp->ibqp.qp_type == IB_QPT_UC)
return qp->attr.path_mtu;
else
return RXE_PORT_MAX_MTU;
}
static inline int rcv_wqe_size(int max_sge)
{
return sizeof(struct rxe_recv_wqe) +
max_sge * sizeof(struct ib_sge);
}
void free_rd_atomic_resource(struct rxe_qp *qp, struct resp_res *res);
static inline void rxe_advance_resp_resource(struct rxe_qp *qp)
{
qp->resp.res_head++;
if (unlikely(qp->resp.res_head == qp->attr.max_rd_atomic))
qp->resp.res_head = 0;
}
void retransmit_timer(unsigned long data);
void rnr_nak_timer(unsigned long data);
void dump_qp(struct rxe_qp *qp);
/* rxe_srq.c */
#define IB_SRQ_INIT_MASK (~IB_SRQ_LIMIT)
int rxe_srq_chk_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
struct ib_srq_attr *attr, enum ib_srq_attr_mask mask);
int rxe_srq_from_init(struct rxe_dev *rxe, struct rxe_srq *srq,
struct ib_srq_init_attr *init,
struct ib_ucontext *context, struct ib_udata *udata);
int rxe_srq_from_attr(struct rxe_dev *rxe, struct rxe_srq *srq,
struct ib_srq_attr *attr, enum ib_srq_attr_mask mask,
struct ib_udata *udata);
extern struct ib_dma_mapping_ops rxe_dma_mapping_ops;
void rxe_release(struct kref *kref);
int rxe_completer(void *arg);
int rxe_requester(void *arg);
int rxe_responder(void *arg);
u32 rxe_icrc_hdr(struct rxe_pkt_info *pkt, struct sk_buff *skb);
void rxe_resp_queue_pkt(struct rxe_dev *rxe,
struct rxe_qp *qp, struct sk_buff *skb);
void rxe_comp_queue_pkt(struct rxe_dev *rxe,
struct rxe_qp *qp, struct sk_buff *skb);
static inline unsigned wr_opcode_mask(int opcode, struct rxe_qp *qp)
{
return rxe_wr_opcode_info[opcode].mask[qp->ibqp.qp_type];
}
static inline int rxe_xmit_packet(struct rxe_dev *rxe, struct rxe_qp *qp,
struct rxe_pkt_info *pkt, struct sk_buff *skb)
{
int err;
int is_request = pkt->mask & RXE_REQ_MASK;
if ((is_request && (qp->req.state != QP_STATE_READY)) ||
(!is_request && (qp->resp.state != QP_STATE_READY))) {
pr_info("Packet dropped. QP is not in ready state\n");
goto drop;
}
if (pkt->mask & RXE_LOOPBACK_MASK) {
memcpy(SKB_TO_PKT(skb), pkt, sizeof(*pkt));
err = rxe->ifc_ops->loopback(skb);
} else {
err = rxe->ifc_ops->send(rxe, pkt, skb);
}
if (err) {
rxe->xmit_errors++;
return err;
}
atomic_inc(&qp->skb_out);
if ((qp_type(qp) != IB_QPT_RC) &&
(pkt->mask & RXE_END_MASK)) {
pkt->wqe->state = wqe_state_done;
rxe_run_task(&qp->comp.task, 1);
}
goto done;
drop:
kfree_skb(skb);
err = 0;
done:
return err;
}
#endif /* RXE_LOC_H */
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
int rxe_mcast_get_grp(struct rxe_dev *rxe, union ib_gid *mgid,
struct rxe_mc_grp **grp_p)
{
int err;
struct rxe_mc_grp *grp;
if (rxe->attr.max_mcast_qp_attach == 0) {
err = -EINVAL;
goto err1;
}
grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
if (grp)
goto done;
grp = rxe_alloc(&rxe->mc_grp_pool);
if (!grp) {
err = -ENOMEM;
goto err1;
}
INIT_LIST_HEAD(&grp->qp_list);
spin_lock_init(&grp->mcg_lock);
grp->rxe = rxe;
rxe_add_key(grp, mgid);
err = rxe->ifc_ops->mcast_add(rxe, mgid);
if (err)
goto err2;
done:
*grp_p = grp;
return 0;
err2:
rxe_drop_ref(grp);
err1:
return err;
}
int rxe_mcast_add_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
struct rxe_mc_grp *grp)
{
int err;
struct rxe_mc_elem *elem;
/* check to see of the qp is already a member of the group */
spin_lock_bh(&qp->grp_lock);
spin_lock_bh(&grp->mcg_lock);
list_for_each_entry(elem, &grp->qp_list, qp_list) {
if (elem->qp == qp) {
err = 0;
goto out;
}
}
if (grp->num_qp >= rxe->attr.max_mcast_qp_attach) {
err = -ENOMEM;
goto out;
}
elem = rxe_alloc(&rxe->mc_elem_pool);
if (!elem) {
err = -ENOMEM;
goto out;
}
/* each qp holds a ref on the grp */
rxe_add_ref(grp);
grp->num_qp++;
elem->qp = qp;
elem->grp = grp;
list_add(&elem->qp_list, &grp->qp_list);
list_add(&elem->grp_list, &qp->grp_list);
err = 0;
out:
spin_unlock_bh(&grp->mcg_lock);
spin_unlock_bh(&qp->grp_lock);
return err;
}
int rxe_mcast_drop_grp_elem(struct rxe_dev *rxe, struct rxe_qp *qp,
union ib_gid *mgid)
{
struct rxe_mc_grp *grp;
struct rxe_mc_elem *elem, *tmp;
grp = rxe_pool_get_key(&rxe->mc_grp_pool, mgid);
if (!grp)
goto err1;
spin_lock_bh(&qp->grp_lock);
spin_lock_bh(&grp->mcg_lock);
list_for_each_entry_safe(elem, tmp, &grp->qp_list, qp_list) {
if (elem->qp == qp) {
list_del(&elem->qp_list);
list_del(&elem->grp_list);
grp->num_qp--;
spin_unlock_bh(&grp->mcg_lock);
spin_unlock_bh(&qp->grp_lock);
rxe_drop_ref(elem);
rxe_drop_ref(grp); /* ref held by QP */
rxe_drop_ref(grp); /* ref from get_key */
return 0;
}
}
spin_unlock_bh(&grp->mcg_lock);
spin_unlock_bh(&qp->grp_lock);
rxe_drop_ref(grp); /* ref from get_key */
err1:
return -EINVAL;
}
void rxe_drop_all_mcast_groups(struct rxe_qp *qp)
{
struct rxe_mc_grp *grp;
struct rxe_mc_elem *elem;
while (1) {
spin_lock_bh(&qp->grp_lock);
if (list_empty(&qp->grp_list)) {
spin_unlock_bh(&qp->grp_lock);
break;
}
elem = list_first_entry(&qp->grp_list, struct rxe_mc_elem,
grp_list);
list_del(&elem->grp_list);
spin_unlock_bh(&qp->grp_lock);
grp = elem->grp;
spin_lock_bh(&grp->mcg_lock);
list_del(&elem->qp_list);
grp->num_qp--;
spin_unlock_bh(&grp->mcg_lock);
rxe_drop_ref(grp);
rxe_drop_ref(elem);
}
}
void rxe_mc_cleanup(void *arg)
{
struct rxe_mc_grp *grp = arg;
struct rxe_dev *rxe = grp->rxe;
rxe_drop_key(grp);
rxe->ifc_ops->mcast_delete(rxe, &grp->mgid);
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <linux/module.h>
#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/errno.h>
#include <asm/pgtable.h>
#include "rxe.h"
#include "rxe_loc.h"
#include "rxe_queue.h"
void rxe_mmap_release(struct kref *ref)
{
struct rxe_mmap_info *ip = container_of(ref,
struct rxe_mmap_info, ref);
struct rxe_dev *rxe = to_rdev(ip->context->device);
spin_lock_bh(&rxe->pending_lock);
if (!list_empty(&ip->pending_mmaps))
list_del(&ip->pending_mmaps);
spin_unlock_bh(&rxe->pending_lock);
vfree(ip->obj); /* buf */
kfree(ip);
}
/*
* open and close keep track of how many times the memory region is mapped,
* to avoid releasing it.
*/
static void rxe_vma_open(struct vm_area_struct *vma)
{
struct rxe_mmap_info *ip = vma->vm_private_data;
kref_get(&ip->ref);
}
static void rxe_vma_close(struct vm_area_struct *vma)
{
struct rxe_mmap_info *ip = vma->vm_private_data;
kref_put(&ip->ref, rxe_mmap_release);
}
static struct vm_operations_struct rxe_vm_ops = {
.open = rxe_vma_open,
.close = rxe_vma_close,
};
/**
* rxe_mmap - create a new mmap region
* @context: the IB user context of the process making the mmap() call
* @vma: the VMA to be initialized
* Return zero if the mmap is OK. Otherwise, return an errno.
*/
int rxe_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
{
struct rxe_dev *rxe = to_rdev(context->device);
unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;
unsigned long size = vma->vm_end - vma->vm_start;
struct rxe_mmap_info *ip, *pp;
int ret;
/*
* Search the device's list of objects waiting for a mmap call.
* Normally, this list is very short since a call to create a
* CQ, QP, or SRQ is soon followed by a call to mmap().
*/
spin_lock_bh(&rxe->pending_lock);
list_for_each_entry_safe(ip, pp, &rxe->pending_mmaps, pending_mmaps) {
if (context != ip->context || (__u64)offset != ip->info.offset)
continue;
/* Don't allow a mmap larger than the object. */
if (size > ip->info.size) {
pr_err("mmap region is larger than the object!\n");
spin_unlock_bh(&rxe->pending_lock);
ret = -EINVAL;
goto done;
}
goto found_it;
}
pr_warn("unable to find pending mmap info\n");
spin_unlock_bh(&rxe->pending_lock);
ret = -EINVAL;
goto done;
found_it:
list_del_init(&ip->pending_mmaps);
spin_unlock_bh(&rxe->pending_lock);
ret = remap_vmalloc_range(vma, ip->obj, 0);
if (ret) {
pr_err("rxe: err %d from remap_vmalloc_range\n", ret);
goto done;
}
vma->vm_ops = &rxe_vm_ops;
vma->vm_private_data = ip;
rxe_vma_open(vma);
done:
return ret;
}
/*
* Allocate information for rxe_mmap
*/
struct rxe_mmap_info *rxe_create_mmap_info(struct rxe_dev *rxe,
u32 size,
struct ib_ucontext *context,
void *obj)
{
struct rxe_mmap_info *ip;
ip = kmalloc(sizeof(*ip), GFP_KERNEL);
if (!ip)
return NULL;
size = PAGE_ALIGN(size);
spin_lock_bh(&rxe->mmap_offset_lock);
if (rxe->mmap_offset == 0)
rxe->mmap_offset = PAGE_SIZE;
ip->info.offset = rxe->mmap_offset;
rxe->mmap_offset += size;
spin_unlock_bh(&rxe->mmap_offset_lock);
INIT_LIST_HEAD(&ip->pending_mmaps);
ip->info.size = size;
ip->context = context;
ip->obj = obj;
kref_init(&ip->ref);
return ip;
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include "rxe.h"
#include "rxe_loc.h"
/*
* lfsr (linear feedback shift register) with period 255
*/
static u8 rxe_get_key(void)
{
static unsigned key = 1;
key = key << 1;
key |= (0 != (key & 0x100)) ^ (0 != (key & 0x10))
^ (0 != (key & 0x80)) ^ (0 != (key & 0x40));
key &= 0xff;
return key;
}
int mem_check_range(struct rxe_mem *mem, u64 iova, size_t length)
{
switch (mem->type) {
case RXE_MEM_TYPE_DMA:
return 0;
case RXE_MEM_TYPE_MR:
case RXE_MEM_TYPE_FMR:
return ((iova < mem->iova) ||
((iova + length) > (mem->iova + mem->length))) ?
-EFAULT : 0;
default:
return -EFAULT;
}
}
#define IB_ACCESS_REMOTE (IB_ACCESS_REMOTE_READ \
| IB_ACCESS_REMOTE_WRITE \
| IB_ACCESS_REMOTE_ATOMIC)
static void rxe_mem_init(int access, struct rxe_mem *mem)
{
u32 lkey = mem->pelem.index << 8 | rxe_get_key();
u32 rkey = (access & IB_ACCESS_REMOTE) ? lkey : 0;
if (mem->pelem.pool->type == RXE_TYPE_MR) {
mem->ibmr.lkey = lkey;
mem->ibmr.rkey = rkey;
}
mem->lkey = lkey;
mem->rkey = rkey;
mem->state = RXE_MEM_STATE_INVALID;
mem->type = RXE_MEM_TYPE_NONE;
mem->map_shift = ilog2(RXE_BUF_PER_MAP);
}
void rxe_mem_cleanup(void *arg)
{
struct rxe_mem *mem = arg;
int i;
if (mem->umem)
ib_umem_release(mem->umem);
if (mem->map) {
for (i = 0; i < mem->num_map; i++)
kfree(mem->map[i]);
kfree(mem->map);
}
}
static int rxe_mem_alloc(struct rxe_dev *rxe, struct rxe_mem *mem, int num_buf)
{
int i;
int num_map;
struct rxe_map **map = mem->map;
num_map = (num_buf + RXE_BUF_PER_MAP - 1) / RXE_BUF_PER_MAP;
mem->map = kmalloc_array(num_map, sizeof(*map), GFP_KERNEL);
if (!mem->map)
goto err1;
for (i = 0; i < num_map; i++) {
mem->map[i] = kmalloc(sizeof(**map), GFP_KERNEL);
if (!mem->map[i])
goto err2;
}
WARN_ON(!is_power_of_2(RXE_BUF_PER_MAP));
mem->map_shift = ilog2(RXE_BUF_PER_MAP);
mem->map_mask = RXE_BUF_PER_MAP - 1;
mem->num_buf = num_buf;
mem->num_map = num_map;
mem->max_buf = num_map * RXE_BUF_PER_MAP;
return 0;
err2:
for (i--; i >= 0; i--)
kfree(mem->map[i]);
kfree(mem->map);
err1:
return -ENOMEM;
}
int rxe_mem_init_dma(struct rxe_dev *rxe, struct rxe_pd *pd,
int access, struct rxe_mem *mem)
{
rxe_mem_init(access, mem);
mem->pd = pd;
mem->access = access;
mem->state = RXE_MEM_STATE_VALID;
mem->type = RXE_MEM_TYPE_DMA;
return 0;
}
int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,
u64 length, u64 iova, int access, struct ib_udata *udata,
struct rxe_mem *mem)
{
int entry;
struct rxe_map **map;
struct rxe_phys_buf *buf = NULL;
struct ib_umem *umem;
struct scatterlist *sg;
int num_buf;
void *vaddr;
int err;
umem = ib_umem_get(pd->ibpd.uobject->context, start, length, access, 0);
if (IS_ERR(umem)) {
pr_warn("err %d from rxe_umem_get\n",
(int)PTR_ERR(umem));
err = -EINVAL;
goto err1;
}
mem->umem = umem;
num_buf = umem->nmap;
rxe_mem_init(access, mem);
err = rxe_mem_alloc(rxe, mem, num_buf);
if (err) {
pr_warn("err %d from rxe_mem_alloc\n", err);
ib_umem_release(umem);
goto err1;
}
WARN_ON(!is_power_of_2(umem->page_size));
mem->page_shift = ilog2(umem->page_size);
mem->page_mask = umem->page_size - 1;
num_buf = 0;
map = mem->map;
if (length > 0) {
buf = map[0]->buf;
for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
vaddr = page_address(sg_page(sg));
if (!vaddr) {
pr_warn("null vaddr\n");
err = -ENOMEM;
goto err1;
}
buf->addr = (uintptr_t)vaddr;
buf->size = umem->page_size;
num_buf++;
buf++;
if (num_buf >= RXE_BUF_PER_MAP) {
map++;
buf = map[0]->buf;
num_buf = 0;
}
}
}
mem->pd = pd;
mem->umem = umem;
mem->access = access;
mem->length = length;
mem->iova = iova;
mem->va = start;
mem->offset = ib_umem_offset(umem);
mem->state = RXE_MEM_STATE_VALID;
mem->type = RXE_MEM_TYPE_MR;
return 0;
err1:
return err;
}
int rxe_mem_init_fast(struct rxe_dev *rxe, struct rxe_pd *pd,
int max_pages, struct rxe_mem *mem)
{
int err;
rxe_mem_init(0, mem);
/* In fastreg, we also set the rkey */
mem->ibmr.rkey = mem->ibmr.lkey;
err = rxe_mem_alloc(rxe, mem, max_pages);
if (err)
goto err1;
mem->pd = pd;
mem->max_buf = max_pages;
mem->state = RXE_MEM_STATE_FREE;
mem->type = RXE_MEM_TYPE_MR;
return 0;
err1:
return err;
}
static void lookup_iova(
struct rxe_mem *mem,
u64 iova,
int *m_out,
int *n_out,
size_t *offset_out)
{
size_t offset = iova - mem->iova + mem->offset;
int map_index;
int buf_index;
u64 length;
if (likely(mem->page_shift)) {
*offset_out = offset & mem->page_mask;
offset >>= mem->page_shift;
*n_out = offset & mem->map_mask;
*m_out = offset >> mem->map_shift;
} else {
map_index = 0;
buf_index = 0;
length = mem->map[map_index]->buf[buf_index].size;
while (offset >= length) {
offset -= length;
buf_index++;
if (buf_index == RXE_BUF_PER_MAP) {
map_index++;
buf_index = 0;
}
length = mem->map[map_index]->buf[buf_index].size;
}
*m_out = map_index;
*n_out = buf_index;
*offset_out = offset;
}
}
void *iova_to_vaddr(struct rxe_mem *mem, u64 iova, int length)
{
size_t offset;
int m, n;
void *addr;
if (mem->state != RXE_MEM_STATE_VALID) {
pr_warn("mem not in valid state\n");
addr = NULL;
goto out;
}
if (!mem->map) {
addr = (void *)(uintptr_t)iova;
goto out;
}
if (mem_check_range(mem, iova, length)) {
pr_warn("range violation\n");
addr = NULL;
goto out;
}
lookup_iova(mem, iova, &m, &n, &offset);
if (offset + length > mem->map[m]->buf[n].size) {
pr_warn("crosses page boundary\n");
addr = NULL;
goto out;
}
addr = (void *)(uintptr_t)mem->map[m]->buf[n].addr + offset;
out:
return addr;
}
/* copy data from a range (vaddr, vaddr+length-1) to or from
* a mem object starting at iova. Compute incremental value of
* crc32 if crcp is not zero. caller must hold a reference to mem
*/
int rxe_mem_copy(struct rxe_mem *mem, u64 iova, void *addr, int length,
enum copy_direction dir, u32 *crcp)
{
int err;
int bytes;
u8 *va;
struct rxe_map **map;
struct rxe_phys_buf *buf;
int m;
int i;
size_t offset;
u32 crc = crcp ? (*crcp) : 0;
if (mem->type == RXE_MEM_TYPE_DMA) {
u8 *src, *dest;
src = (dir == to_mem_obj) ?
addr : ((void *)(uintptr_t)iova);
dest = (dir == to_mem_obj) ?
((void *)(uintptr_t)iova) : addr;
if (crcp)
*crcp = crc32_le(*crcp, src, length);
memcpy(dest, src, length);
return 0;
}
WARN_ON(!mem->map);
err = mem_check_range(mem, iova, length);
if (err) {
err = -EFAULT;
goto err1;
}
lookup_iova(mem, iova, &m, &i, &offset);
map = mem->map + m;
buf = map[0]->buf + i;
while (length > 0) {
u8 *src, *dest;
va = (u8 *)(uintptr_t)buf->addr + offset;
src = (dir == to_mem_obj) ? addr : va;
dest = (dir == to_mem_obj) ? va : addr;
bytes = buf->size - offset;
if (bytes > length)
bytes = length;
if (crcp)
crc = crc32_le(crc, src, bytes);
memcpy(dest, src, bytes);
length -= bytes;
addr += bytes;
offset = 0;
buf++;
i++;
if (i == RXE_BUF_PER_MAP) {
i = 0;
map++;
buf = map[0]->buf;
}
}
if (crcp)
*crcp = crc;
return 0;
err1:
return err;
}
/* copy data in or out of a wqe, i.e. sg list
* under the control of a dma descriptor
*/
int copy_data(
struct rxe_dev *rxe,
struct rxe_pd *pd,
int access,
struct rxe_dma_info *dma,
void *addr,
int length,
enum copy_direction dir,
u32 *crcp)
{
int bytes;
struct rxe_sge *sge = &dma->sge[dma->cur_sge];
int offset = dma->sge_offset;
int resid = dma->resid;
struct rxe_mem *mem = NULL;
u64 iova;
int err;
if (length == 0)
return 0;
if (length > resid) {
err = -EINVAL;
goto err2;
}
if (sge->length && (offset < sge->length)) {
mem = lookup_mem(pd, access, sge->lkey, lookup_local);
if (!mem) {
err = -EINVAL;
goto err1;
}
}
while (length > 0) {
bytes = length;
if (offset >= sge->length) {
if (mem) {
rxe_drop_ref(mem);
mem = NULL;
}
sge++;
dma->cur_sge++;
offset = 0;
if (dma->cur_sge >= dma->num_sge) {
err = -ENOSPC;
goto err2;
}
if (sge->length) {
mem = lookup_mem(pd, access, sge->lkey,
lookup_local);
if (!mem) {
err = -EINVAL;
goto err1;
}
} else {
continue;
}
}
if (bytes > sge->length - offset)
bytes = sge->length - offset;
if (bytes > 0) {
iova = sge->addr + offset;
err = rxe_mem_copy(mem, iova, addr, bytes, dir, crcp);
if (err)
goto err2;
offset += bytes;
resid -= bytes;
length -= bytes;
addr += bytes;
}
}
dma->sge_offset = offset;
dma->resid = resid;
if (mem)
rxe_drop_ref(mem);
return 0;
err2:
if (mem)
rxe_drop_ref(mem);
err1:
return err;
}
int advance_dma_data(struct rxe_dma_info *dma, unsigned int length)
{
struct rxe_sge *sge = &dma->sge[dma->cur_sge];
int offset = dma->sge_offset;
int resid = dma->resid;
while (length) {
unsigned int bytes;
if (offset >= sge->length) {
sge++;
dma->cur_sge++;
offset = 0;
if (dma->cur_sge >= dma->num_sge)
return -ENOSPC;
}
bytes = length;
if (bytes > sge->length - offset)
bytes = sge->length - offset;
offset += bytes;
resid -= bytes;
length -= bytes;
}
dma->sge_offset = offset;
dma->resid = resid;
return 0;
}
/* (1) find the mem (mr or mw) corresponding to lkey/rkey
* depending on lookup_type
* (2) verify that the (qp) pd matches the mem pd
* (3) verify that the mem can support the requested access
* (4) verify that mem state is valid
*/
struct rxe_mem *lookup_mem(struct rxe_pd *pd, int access, u32 key,
enum lookup_type type)
{
struct rxe_mem *mem;
struct rxe_dev *rxe = to_rdev(pd->ibpd.device);
int index = key >> 8;
if (index >= RXE_MIN_MR_INDEX && index <= RXE_MAX_MR_INDEX) {
mem = rxe_pool_get_index(&rxe->mr_pool, index);
if (!mem)
goto err1;
} else {
goto err1;
}
if ((type == lookup_local && mem->lkey != key) ||
(type == lookup_remote && mem->rkey != key))
goto err2;
if (mem->pd != pd)
goto err2;
if (access && !(access & mem->access))
goto err2;
if (mem->state != RXE_MEM_STATE_VALID)
goto err2;
return mem;
err2:
rxe_drop_ref(mem);
err1:
return NULL;
}
int rxe_mem_map_pages(struct rxe_dev *rxe, struct rxe_mem *mem,
u64 *page, int num_pages, u64 iova)
{
int i;
int num_buf;
int err;
struct rxe_map **map;
struct rxe_phys_buf *buf;
int page_size;
if (num_pages > mem->max_buf) {
err = -EINVAL;
goto err1;
}
num_buf = 0;
page_size = 1 << mem->page_shift;
map = mem->map;
buf = map[0]->buf;
for (i = 0; i < num_pages; i++) {
buf->addr = *page++;
buf->size = page_size;
buf++;
num_buf++;
if (num_buf == RXE_BUF_PER_MAP) {
map++;
buf = map[0]->buf;
num_buf = 0;
}
}
mem->iova = iova;
mem->va = iova;
mem->length = num_pages << mem->page_shift;
mem->state = RXE_MEM_STATE_VALID;
return 0;
err1:
return err;
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#include <linux/skbuff.h>
#include <linux/if_arp.h>
#include <linux/netdevice.h>
#include <linux/if.h>
#include <linux/if_vlan.h>
#include <net/udp_tunnel.h>
#include <net/sch_generic.h>
#include <linux/netfilter.h>
#include <rdma/ib_addr.h>
#include "rxe.h"
#include "rxe_net.h"
#include "rxe_loc.h"
static LIST_HEAD(rxe_dev_list);
static spinlock_t dev_list_lock; /* spinlock for device list */
struct rxe_dev *net_to_rxe(struct net_device *ndev)
{
struct rxe_dev *rxe;
struct rxe_dev *found = NULL;
spin_lock_bh(&dev_list_lock);
list_for_each_entry(rxe, &rxe_dev_list, list) {
if (rxe->ndev == ndev) {
found = rxe;
break;
}
}
spin_unlock_bh(&dev_list_lock);
return found;
}
struct rxe_dev *get_rxe_by_name(const char* name)
{
struct rxe_dev *rxe;
struct rxe_dev *found = NULL;
spin_lock_bh(&dev_list_lock);
list_for_each_entry(rxe, &rxe_dev_list, list) {
if (!strcmp(name, rxe->ib_dev.name)) {
found = rxe;
break;
}
}
spin_unlock_bh(&dev_list_lock);
return found;
}
struct rxe_recv_sockets recv_sockets;
static __be64 rxe_mac_to_eui64(struct net_device *ndev)
{
unsigned char *mac_addr = ndev->dev_addr;
__be64 eui64;
unsigned char *dst = (unsigned char *)&eui64;
dst[0] = mac_addr[0] ^ 2;
dst[1] = mac_addr[1];
dst[2] = mac_addr[2];
dst[3] = 0xff;
dst[4] = 0xfe;
dst[5] = mac_addr[3];
dst[6] = mac_addr[4];
dst[7] = mac_addr[5];
return eui64;
}
static __be64 node_guid(struct rxe_dev *rxe)
{
return rxe_mac_to_eui64(rxe->ndev);
}
static __be64 port_guid(struct rxe_dev *rxe)
{
return rxe_mac_to_eui64(rxe->ndev);
}
static struct device *dma_device(struct rxe_dev *rxe)
{
struct net_device *ndev;
ndev = rxe->ndev;
if (ndev->priv_flags & IFF_802_1Q_VLAN)
ndev = vlan_dev_real_dev(ndev);
return ndev->dev.parent;
}
static int mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
{
int err;
unsigned char ll_addr[ETH_ALEN];
ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
err = dev_mc_add(rxe->ndev, ll_addr);
return err;
}
static int mcast_delete(struct rxe_dev *rxe, union ib_gid *mgid)
{
int err;
unsigned char ll_addr[ETH_ALEN];
ipv6_eth_mc_map((struct in6_addr *)mgid->raw, ll_addr);
err = dev_mc_del(rxe->ndev, ll_addr);
return err;
}
static struct dst_entry *rxe_find_route4(struct net_device *ndev,
struct in_addr *saddr,
struct in_addr *daddr)
{
struct rtable *rt;
struct flowi4 fl = { { 0 } };
memset(&fl, 0, sizeof(fl));
fl.flowi4_oif = ndev->ifindex;
memcpy(&fl.saddr, saddr, sizeof(*saddr));
memcpy(&fl.daddr, daddr, sizeof(*daddr));
fl.flowi4_proto = IPPROTO_UDP;
rt = ip_route_output_key(&init_net, &fl);
if (IS_ERR(rt)) {
pr_err_ratelimited("no route to %pI4\n", &daddr->s_addr);
return NULL;
}
return &rt->dst;
}
#if IS_ENABLED(CONFIG_IPV6)
static struct dst_entry *rxe_find_route6(struct net_device *ndev,
struct in6_addr *saddr,
struct in6_addr *daddr)
{
struct dst_entry *ndst;
struct flowi6 fl6 = { { 0 } };
memset(&fl6, 0, sizeof(fl6));
fl6.flowi6_oif = ndev->ifindex;
memcpy(&fl6.saddr, saddr, sizeof(*saddr));
memcpy(&fl6.daddr, daddr, sizeof(*daddr));
fl6.flowi6_proto = IPPROTO_UDP;
if (unlikely(ipv6_stub->ipv6_dst_lookup(sock_net(recv_sockets.sk6->sk),
recv_sockets.sk6->sk, &ndst, &fl6))) {
pr_err_ratelimited("no route to %pI6\n", daddr);
goto put;
}
if (unlikely(ndst->error)) {
pr_err("no route to %pI6\n", daddr);
goto put;
}
return ndst;
put:
dst_release(ndst);
return NULL;
}
#else
static struct dst_entry *rxe_find_route6(struct net_device *ndev,
struct in6_addr *saddr,
struct in6_addr *daddr)
{
return NULL;
}
#endif
static int rxe_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
{
struct udphdr *udph;
struct net_device *ndev = skb->dev;
struct rxe_dev *rxe = net_to_rxe(ndev);
struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
if (!rxe)
goto drop;
if (skb_linearize(skb)) {
pr_err("skb_linearize failed\n");
goto drop;
}
udph = udp_hdr(skb);
pkt->rxe = rxe;
pkt->port_num = 1;
pkt->hdr = (u8 *)(udph + 1);
pkt->mask = RXE_GRH_MASK;
pkt->paylen = be16_to_cpu(udph->len) - sizeof(*udph);
return rxe_rcv(skb);
drop:
kfree_skb(skb);
return 0;
}
static struct socket *rxe_setup_udp_tunnel(struct net *net, __be16 port,
bool ipv6)
{
int err;
struct socket *sock;
struct udp_port_cfg udp_cfg;
struct udp_tunnel_sock_cfg tnl_cfg;
memset(&udp_cfg, 0, sizeof(udp_cfg));
if (ipv6) {
udp_cfg.family = AF_INET6;
udp_cfg.ipv6_v6only = 1;
} else {
udp_cfg.family = AF_INET;
}
udp_cfg.local_udp_port = port;
/* Create UDP socket */
err = udp_sock_create(net, &udp_cfg, &sock);
if (err < 0) {
pr_err("failed to create udp socket. err = %d\n", err);
return ERR_PTR(err);
}
tnl_cfg.sk_user_data = NULL;
tnl_cfg.encap_type = 1;
tnl_cfg.encap_rcv = rxe_udp_encap_recv;
tnl_cfg.encap_destroy = NULL;
/* Setup UDP tunnel */
setup_udp_tunnel_sock(net, sock, &tnl_cfg);
return sock;
}
static void rxe_release_udp_tunnel(struct socket *sk)
{
udp_tunnel_sock_release(sk);
}
static void prepare_udp_hdr(struct sk_buff *skb, __be16 src_port,
__be16 dst_port)
{
struct udphdr *udph;
__skb_push(skb, sizeof(*udph));
skb_reset_transport_header(skb);
udph = udp_hdr(skb);
udph->dest = dst_port;
udph->source = src_port;
udph->len = htons(skb->len);
udph->check = 0;
}
static void prepare_ipv4_hdr(struct dst_entry *dst, struct sk_buff *skb,
__be32 saddr, __be32 daddr, __u8 proto,
__u8 tos, __u8 ttl, __be16 df, bool xnet)
{
struct iphdr *iph;
skb_scrub_packet(skb, xnet);
skb_clear_hash(skb);
skb_dst_set(skb, dst);
memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
skb_push(skb, sizeof(struct iphdr));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
iph->version = IPVERSION;
iph->ihl = sizeof(struct iphdr) >> 2;
iph->frag_off = df;
iph->protocol = proto;
iph->tos = tos;
iph->daddr = daddr;
iph->saddr = saddr;
iph->ttl = ttl;
__ip_select_ident(dev_net(dst->dev), iph,
skb_shinfo(skb)->gso_segs ?: 1);
iph->tot_len = htons(skb->len);
ip_send_check(iph);
}
static void prepare_ipv6_hdr(struct dst_entry *dst, struct sk_buff *skb,
struct in6_addr *saddr, struct in6_addr *daddr,
__u8 proto, __u8 prio, __u8 ttl)
{
struct ipv6hdr *ip6h;
memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED
| IPSKB_REROUTED);
skb_dst_set(skb, dst);
__skb_push(skb, sizeof(*ip6h));
skb_reset_network_header(skb);
ip6h = ipv6_hdr(skb);
ip6_flow_hdr(ip6h, prio, htonl(0));
ip6h->payload_len = htons(skb->len);
ip6h->nexthdr = proto;
ip6h->hop_limit = ttl;
ip6h->daddr = *daddr;
ip6h->saddr = *saddr;
ip6h->payload_len = htons(skb->len - sizeof(*ip6h));
}
static int prepare4(struct rxe_dev *rxe, struct sk_buff *skb, struct rxe_av *av)
{
struct dst_entry *dst;
bool xnet = false;
__be16 df = htons(IP_DF);
struct in_addr *saddr = &av->sgid_addr._sockaddr_in.sin_addr;
struct in_addr *daddr = &av->dgid_addr._sockaddr_in.sin_addr;
struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
dst = rxe_find_route4(rxe->ndev, saddr, daddr);
if (!dst) {
pr_err("Host not reachable\n");
return -EHOSTUNREACH;
}
if (!memcmp(saddr, daddr, sizeof(*daddr)))
pkt->mask |= RXE_LOOPBACK_MASK;
prepare_udp_hdr(skb, htons(RXE_ROCE_V2_SPORT),
htons(ROCE_V2_UDP_DPORT));
prepare_ipv4_hdr(dst, skb, saddr->s_addr, daddr->s_addr, IPPROTO_UDP,
av->grh.traffic_class, av->grh.hop_limit, df, xnet);
return 0;
}
static int prepare6(struct rxe_dev *rxe, struct sk_buff *skb, struct rxe_av *av)
{
struct dst_entry *dst;
struct in6_addr *saddr = &av->sgid_addr._sockaddr_in6.sin6_addr;
struct in6_addr *daddr = &av->dgid_addr._sockaddr_in6.sin6_addr;
struct rxe_pkt_info *pkt = SKB_TO_PKT(skb);
dst = rxe_find_route6(rxe->ndev, saddr, daddr);
if (!dst) {
pr_err("Host not reachable\n");
return -EHOSTUNREACH;
}
if (!memcmp(saddr, daddr, sizeof(*daddr)))
pkt->mask |= RXE_LOOPBACK_MASK;
prepare_udp_hdr(skb, htons(RXE_ROCE_V2_SPORT),
htons(ROCE_V2_UDP_DPORT));
prepare_ipv6_hdr(dst, skb, saddr, daddr, IPPROTO_UDP,
av->grh.traffic_class,
av->grh.hop_limit);
return 0;
}
static int prepare(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
struct sk_buff *skb, u32 *crc)
{
int err = 0;
struct rxe_av *av = rxe_get_av(pkt);
if (av->network_type == RDMA_NETWORK_IPV4)
err = prepare4(rxe, skb, av);
else if (av->network_type == RDMA_NETWORK_IPV6)
err = prepare6(rxe, skb, av);
*crc = rxe_icrc_hdr(pkt, skb);
return err;
}
static void rxe_skb_tx_dtor(struct sk_buff *skb)
{
struct sock *sk = skb->sk;
struct rxe_qp *qp = sk->sk_user_data;
int skb_out = atomic_dec_return(&qp->skb_out);
if (unlikely(qp->need_req_skb &&
skb_out < RXE_INFLIGHT_SKBS_PER_QP_LOW))
rxe_run_task(&qp->req.task, 1);
}
static int send(struct rxe_dev *rxe, struct rxe_pkt_info *pkt,
struct sk_buff *skb)
{
struct sk_buff *nskb;
struct rxe_av *av;
int err;
av = rxe_get_av(pkt);
nskb = skb_clone(skb, GFP_ATOMIC);
if (!nskb)
return -ENOMEM;
nskb->destructor = rxe_skb_tx_dtor;
nskb->sk = pkt->qp->sk->sk;
if (av->network_type == RDMA_NETWORK_IPV4) {
err = ip_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
} else if (av->network_type == RDMA_NETWORK_IPV6) {
err = ip6_local_out(dev_net(skb_dst(skb)->dev), nskb->sk, nskb);
} else {
pr_err("Unknown layer 3 protocol: %d\n", av->network_type);
kfree_skb(nskb);
return -EINVAL;
}
if (unlikely(net_xmit_eval(err))) {
pr_debug("error sending packet: %d\n", err);
return -EAGAIN;
}
kfree_skb(skb);
return 0;
}
static int loopback(struct sk_buff *skb)
{
return rxe_rcv(skb);
}
static inline int addr_same(struct rxe_dev *rxe, struct rxe_av *av)
{
return rxe->port.port_guid == av->grh.dgid.global.interface_id;
}
static struct sk_buff *init_packet(struct rxe_dev *rxe, struct rxe_av *av,
int paylen, struct rxe_pkt_info *pkt)
{
unsigned int hdr_len;
struct sk_buff *skb;
if (av->network_type == RDMA_NETWORK_IPV4)
hdr_len = ETH_HLEN + sizeof(struct udphdr) +
sizeof(struct iphdr);
else
hdr_len = ETH_HLEN + sizeof(struct udphdr) +
sizeof(struct ipv6hdr);
skb = alloc_skb(paylen + hdr_len + LL_RESERVED_SPACE(rxe->ndev),
GFP_ATOMIC);
if (unlikely(!skb))
return NULL;
skb_reserve(skb, hdr_len + LL_RESERVED_SPACE(rxe->ndev));
skb->dev = rxe->ndev;
if (av->network_type == RDMA_NETWORK_IPV4)
skb->protocol = htons(ETH_P_IP);
else
skb->protocol = htons(ETH_P_IPV6);
pkt->rxe = rxe;
pkt->port_num = 1;
pkt->hdr = skb_put(skb, paylen);
pkt->mask |= RXE_GRH_MASK;
memset(pkt->hdr, 0, paylen);
return skb;
}
/*
* this is required by rxe_cfg to match rxe devices in
* /sys/class/infiniband up with their underlying ethernet devices
*/
static char *parent_name(struct rxe_dev *rxe, unsigned int port_num)
{
return rxe->ndev->name;
}
static enum rdma_link_layer link_layer(struct rxe_dev *rxe,
unsigned int port_num)
{
return IB_LINK_LAYER_ETHERNET;
}
static struct rxe_ifc_ops ifc_ops = {
.node_guid = node_guid,
.port_guid = port_guid,
.dma_device = dma_device,
.mcast_add = mcast_add,
.mcast_delete = mcast_delete,
.prepare = prepare,
.send = send,
.loopback = loopback,
.init_packet = init_packet,
.parent_name = parent_name,
.link_layer = link_layer,
};
struct rxe_dev *rxe_net_add(struct net_device *ndev)
{
int err;
struct rxe_dev *rxe = NULL;
rxe = (struct rxe_dev *)ib_alloc_device(sizeof(*rxe));
if (!rxe)
return NULL;
rxe->ifc_ops = &ifc_ops;
rxe->ndev = ndev;
err = rxe_add(rxe, ndev->mtu);
if (err) {
ib_dealloc_device(&rxe->ib_dev);
return NULL;
}
spin_lock_bh(&dev_list_lock);
list_add_tail(&rxe_dev_list, &rxe->list);
spin_unlock_bh(&dev_list_lock);
return rxe;
}
void rxe_remove_all(void)
{
spin_lock_bh(&dev_list_lock);
while (!list_empty(&rxe_dev_list)) {
struct rxe_dev *rxe =
list_first_entry(&rxe_dev_list, struct rxe_dev, list);
list_del(&rxe->list);
spin_unlock_bh(&dev_list_lock);
rxe_remove(rxe);
spin_lock_bh(&dev_list_lock);
}
spin_unlock_bh(&dev_list_lock);
}
EXPORT_SYMBOL(rxe_remove_all);
static void rxe_port_event(struct rxe_dev *rxe,
enum ib_event_type event)
{
struct ib_event ev;
ev.device = &rxe->ib_dev;
ev.element.port_num = 1;
ev.event = event;
ib_dispatch_event(&ev);
}
/* Caller must hold net_info_lock */
void rxe_port_up(struct rxe_dev *rxe)
{
struct rxe_port *port;
port = &rxe->port;
port->attr.state = IB_PORT_ACTIVE;
port->attr.phys_state = IB_PHYS_STATE_LINK_UP;
rxe_port_event(rxe, IB_EVENT_PORT_ACTIVE);
pr_info("rxe: set %s active\n", rxe->ib_dev.name);
return;
}
/* Caller must hold net_info_lock */
void rxe_port_down(struct rxe_dev *rxe)
{
struct rxe_port *port;
port = &rxe->port;
port->attr.state = IB_PORT_DOWN;
port->attr.phys_state = IB_PHYS_STATE_LINK_DOWN;
rxe_port_event(rxe, IB_EVENT_PORT_ERR);
pr_info("rxe: set %s down\n", rxe->ib_dev.name);
return;
}
static int rxe_notify(struct notifier_block *not_blk,
unsigned long event,
void *arg)
{
struct net_device *ndev = netdev_notifier_info_to_dev(arg);
struct rxe_dev *rxe = net_to_rxe(ndev);
if (!rxe)
goto out;
switch (event) {
case NETDEV_UNREGISTER:
list_del(&rxe->list);
rxe_remove(rxe);
break;
case NETDEV_UP:
rxe_port_up(rxe);
break;
case NETDEV_DOWN:
rxe_port_down(rxe);
break;
case NETDEV_CHANGEMTU:
pr_info("rxe: %s changed mtu to %d\n", ndev->name, ndev->mtu);
rxe_set_mtu(rxe, ndev->mtu);
break;
case NETDEV_REBOOT:
case NETDEV_CHANGE:
case NETDEV_GOING_DOWN:
case NETDEV_CHANGEADDR:
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
default:
pr_info("rxe: ignoring netdev event = %ld for %s\n",
event, ndev->name);
break;
}
out:
return NOTIFY_OK;
}
static struct notifier_block rxe_net_notifier = {
.notifier_call = rxe_notify,
};
int rxe_net_init(void)
{
int err;
spin_lock_init(&dev_list_lock);
recv_sockets.sk6 = rxe_setup_udp_tunnel(&init_net,
htons(ROCE_V2_UDP_DPORT), true);
if (IS_ERR(recv_sockets.sk6)) {
recv_sockets.sk6 = NULL;
pr_err("rxe: Failed to create IPv6 UDP tunnel\n");
return -1;
}
recv_sockets.sk4 = rxe_setup_udp_tunnel(&init_net,
htons(ROCE_V2_UDP_DPORT), false);
if (IS_ERR(recv_sockets.sk4)) {
rxe_release_udp_tunnel(recv_sockets.sk6);
recv_sockets.sk4 = NULL;
recv_sockets.sk6 = NULL;
pr_err("rxe: Failed to create IPv4 UDP tunnel\n");
return -1;
}
err = register_netdevice_notifier(&rxe_net_notifier);
if (err) {
rxe_release_udp_tunnel(recv_sockets.sk6);
rxe_release_udp_tunnel(recv_sockets.sk4);
pr_err("rxe: Failed to rigister netdev notifier\n");
}
return err;
}
void rxe_net_exit(void)
{
if (recv_sockets.sk6)
rxe_release_udp_tunnel(recv_sockets.sk6);
if (recv_sockets.sk4)
rxe_release_udp_tunnel(recv_sockets.sk4);
unregister_netdevice_notifier(&rxe_net_notifier);
}
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef RXE_NET_H
#define RXE_NET_H
#include <net/sock.h>
#include <net/if_inet6.h>
#include <linux/module.h>
struct rxe_recv_sockets {
struct socket *sk4;
struct socket *sk6;
};
extern struct rxe_recv_sockets recv_sockets;
struct rxe_dev *rxe_net_add(struct net_device *ndev);
int rxe_net_init(void);
void rxe_net_exit(void);
#endif /* RXE_NET_H */
此差异已折叠。
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
* Copyright (c) 2015 System Fabric Works, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
* General Public License (GPL) Version 2, available from the file
* COPYING in the main directory of this source tree, or the
* OpenIB.org BSD license below:
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
* BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
* ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
* CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/
#ifndef RXE_OPCODE_H
#define RXE_OPCODE_H
/*
* contains header bit mask definitions and header lengths
* declaration of the rxe_opcode_info struct and
* rxe_wr_opcode_info struct
*/
enum rxe_wr_mask {
WR_INLINE_MASK = BIT(0),
WR_ATOMIC_MASK = BIT(1),
WR_SEND_MASK = BIT(2),
WR_READ_MASK = BIT(3),
WR_WRITE_MASK = BIT(4),
WR_LOCAL_MASK = BIT(5),
WR_REG_MASK = BIT(6),
WR_READ_OR_WRITE_MASK = WR_READ_MASK | WR_WRITE_MASK,
WR_READ_WRITE_OR_SEND_MASK = WR_READ_OR_WRITE_MASK | WR_SEND_MASK,
WR_WRITE_OR_SEND_MASK = WR_WRITE_MASK | WR_SEND_MASK,
WR_ATOMIC_OR_READ_MASK = WR_ATOMIC_MASK | WR_READ_MASK,
};
#define WR_MAX_QPT (8)
struct rxe_wr_opcode_info {
char *name;
enum rxe_wr_mask mask[WR_MAX_QPT];
};
extern struct rxe_wr_opcode_info rxe_wr_opcode_info[];
enum rxe_hdr_type {
RXE_LRH,
RXE_GRH,
RXE_BTH,
RXE_RETH,
RXE_AETH,
RXE_ATMETH,
RXE_ATMACK,
RXE_IETH,
RXE_RDETH,
RXE_DETH,
RXE_IMMDT,
RXE_PAYLOAD,
NUM_HDR_TYPES
};
enum rxe_hdr_mask {
RXE_LRH_MASK = BIT(RXE_LRH),
RXE_GRH_MASK = BIT(RXE_GRH),
RXE_BTH_MASK = BIT(RXE_BTH),
RXE_IMMDT_MASK = BIT(RXE_IMMDT),
RXE_RETH_MASK = BIT(RXE_RETH),
RXE_AETH_MASK = BIT(RXE_AETH),
RXE_ATMETH_MASK = BIT(RXE_ATMETH),
RXE_ATMACK_MASK = BIT(RXE_ATMACK),
RXE_IETH_MASK = BIT(RXE_IETH),
RXE_RDETH_MASK = BIT(RXE_RDETH),
RXE_DETH_MASK = BIT(RXE_DETH),
RXE_PAYLOAD_MASK = BIT(RXE_PAYLOAD),
RXE_REQ_MASK = BIT(NUM_HDR_TYPES + 0),
RXE_ACK_MASK = BIT(NUM_HDR_TYPES + 1),
RXE_SEND_MASK = BIT(NUM_HDR_TYPES + 2),
RXE_WRITE_MASK = BIT(NUM_HDR_TYPES + 3),
RXE_READ_MASK = BIT(NUM_HDR_TYPES + 4),
RXE_ATOMIC_MASK = BIT(NUM_HDR_TYPES + 5),
RXE_RWR_MASK = BIT(NUM_HDR_TYPES + 6),
RXE_COMP_MASK = BIT(NUM_HDR_TYPES + 7),
RXE_START_MASK = BIT(NUM_HDR_TYPES + 8),
RXE_MIDDLE_MASK = BIT(NUM_HDR_TYPES + 9),
RXE_END_MASK = BIT(NUM_HDR_TYPES + 10),
RXE_LOOPBACK_MASK = BIT(NUM_HDR_TYPES + 12),
RXE_READ_OR_ATOMIC = (RXE_READ_MASK | RXE_ATOMIC_MASK),
RXE_WRITE_OR_SEND = (RXE_WRITE_MASK | RXE_SEND_MASK),
};
#define OPCODE_NONE (-1)
#define RXE_NUM_OPCODE 256
struct rxe_opcode_info {
char *name;
enum rxe_hdr_mask mask;
int length;
int offset[NUM_HDR_TYPES];
};
extern struct rxe_opcode_info rxe_opcode[RXE_NUM_OPCODE];
#endif /* RXE_OPCODE_H */
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
......@@ -6,3 +6,4 @@ header-y += ib_user_verbs.h
header-y += rdma_netlink.h
header-y += rdma_user_cm.h
header-y += hfi/
header-y += rdma_user_rxe.h
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册