Commit 29c8d9eb  Author: Adit Ranadive  Committer: Doug Ledford

IB: Add vmw_pvrdma driver

This patch series adds a driver for a paravirtual RDMA device. The
device is developed for VMware's Virtual Machines and allows existing RDMA
applications to continue to use the existing Verbs API when deployed in VMs
on ESXi. We recently gave a presentation at the OFA Workshop [1] regarding
this device.

Description and RDMA Support
============================
The virtual device is exposed as a dual-function PCIe device. One part
is a virtual network device (VMXNet3), which provides networking properties
such as MAC and IP addresses to the RDMA part of the device. The networking
properties are used to register the GIDs required by RDMA applications to
communicate.

These patches add support and all the required infrastructure for
letting applications use such a device. We support the mandatory Verbs API as
well as the base memory management extensions (Local Inv, Send with Inv and
Fast Register Work Requests). We currently support both Reliable Connected
and Unreliable Datagram QPs but do not support Shared Receive Queues
(SRQs).

Also, we support the following types of Work Requests:
 o Send/Receive (with or without Immediate Data)
 o RDMA Write (with or without Immediate Data)
 o RDMA Read
 o Local Invalidate
 o Send with Invalidate
 o Fast Register Work Requests

This version only adds support for version 1 of RoCE. We will add RoCEv2
support in a future patch. We do support registration of both MAC-based
and IP-based GIDs. I have also created a git tree for our user-level driver
[2].
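
As an illustration of the two GID flavors mentioned above, here is a minimal
userspace sketch. It assumes the conventional RoCE layouts: a MAC-based GID is
the link-local prefix fe80::/64 followed by a modified EUI-64 built from the
MAC, and an IP-based GID for IPv4 is the IPv4-mapped IPv6 address. The names
`struct gid`, `gid_from_mac` and `gid_from_ipv4` are hypothetical and are not
taken from the driver.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte GID container, the same size as an IPv6 address. */
struct gid {
	uint8_t raw[16];
};

/*
 * MAC-based GID (assumed layout): fe80::/64 prefix plus a modified EUI-64
 * interface ID, i.e. universal/local bit flipped and ff:fe inserted.
 */
static void gid_from_mac(struct gid *gid, const uint8_t mac[6])
{
	assert(gid && mac);
	memset(gid->raw, 0, sizeof(gid->raw));
	gid->raw[0] = 0xfe;
	gid->raw[1] = 0x80;
	gid->raw[8] = mac[0] ^ 0x02;
	gid->raw[9] = mac[1];
	gid->raw[10] = mac[2];
	gid->raw[11] = 0xff;
	gid->raw[12] = 0xfe;
	gid->raw[13] = mac[3];
	gid->raw[14] = mac[4];
	gid->raw[15] = mac[5];
}

/* IP-based GID for IPv4 (assumed layout): IPv4-mapped IPv6, ::ffff:a.b.c.d. */
static void gid_from_ipv4(struct gid *gid, const uint8_t ip[4])
{
	assert(gid && ip);
	memset(gid->raw, 0, sizeof(gid->raw));
	gid->raw[10] = 0xff;
	gid->raw[11] = 0xff;
	memcpy(&gid->raw[12], ip, 4);
}
```

Both helpers only fill in bytes; registering the resulting GID with the
device is a separate step that this sketch does not attempt to model.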

Testing
=======
We have tested this internally on various guest OSes (Red Hat,
CentOS, Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12)
using backported versions of this driver. The tests included several
runs of the performance tests (included with OFED), the Intel MPI PingPong
benchmark on OpenMPI, and krping for FRWRs. Mellanox has been kind enough
to test the backported version of the driver internally on their hardware
using a VMware-provided ESX build. I have also applied and tested this
with Doug's k.o/for-4.9 branch (commit 5603910b). Note that this patch
series should be applied all together. I split out the commits so that
it may be easier to review.

PVRDMA Resources
================
[1] OFA Workshop Presentation -
https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf

[2] Libpvrdma User-level library -
http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: George Zhang <georgezhang@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Parent b1226c7d
......@@ -12928,6 +12928,13 @@ S: Maintained
F: drivers/scsi/vmw_pvscsi.c
F: drivers/scsi/vmw_pvscsi.h
VMWARE PVRDMA DRIVER
M: Adit Ranadive <aditr@vmware.com>
M: VMware PV-Drivers <pv-drivers@vmware.com>
L: linux-rdma@vger.kernel.org
S: Maintained
F: drivers/infiniband/hw/vmw_pvrdma/
VOLTAGE AND CURRENT REGULATOR FRAMEWORK
M: Liam Girdwood <lgirdwood@gmail.com>
M: Mark Brown <broonie@kernel.org>
......
......@@ -73,6 +73,7 @@ source "drivers/infiniband/hw/mlx4/Kconfig"
source "drivers/infiniband/hw/mlx5/Kconfig"
source "drivers/infiniband/hw/nes/Kconfig"
source "drivers/infiniband/hw/ocrdma/Kconfig"
source "drivers/infiniband/hw/vmw_pvrdma/Kconfig"
source "drivers/infiniband/hw/usnic/Kconfig"
source "drivers/infiniband/hw/hns/Kconfig"
......
......@@ -7,6 +7,7 @@ obj-$(CONFIG_MLX4_INFINIBAND) += mlx4/
obj-$(CONFIG_MLX5_INFINIBAND) += mlx5/
obj-$(CONFIG_INFINIBAND_NES) += nes/
obj-$(CONFIG_INFINIBAND_OCRDMA) += ocrdma/
obj-$(CONFIG_INFINIBAND_VMWARE_PVRDMA) += vmw_pvrdma/
obj-$(CONFIG_INFINIBAND_USNIC) += usnic/
obj-$(CONFIG_INFINIBAND_HFI1) += hfi1/
obj-$(CONFIG_INFINIBAND_HNS) += hns/
......
config INFINIBAND_VMWARE_PVRDMA
tristate "VMware Paravirtualized RDMA Driver"
depends on NETDEVICES && ETHERNET && PCI && INET && VMXNET3
---help---
This driver provides low-level support for VMware Paravirtual
RDMA adapter. It interacts with the VMXNet3 driver to provide
Ethernet capabilities.
obj-$(CONFIG_INFINIBAND_VMWARE_PVRDMA) += vmw_pvrdma.o
vmw_pvrdma-y := pvrdma_cmd.o pvrdma_cq.o pvrdma_doorbell.o pvrdma_main.o pvrdma_misc.o pvrdma_mr.o pvrdma_qp.o pvrdma_verbs.o
/*
* Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of EITHER the GNU General Public License
* version 2 as published by the Free Software Foundation or the BSD
* 2-Clause License. This program is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
* WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License version 2 for more details at
* http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
*
* You should have received a copy of the GNU General Public License
* along with this program available in the file COPYING in the main
* directory of this source tree.
*
* The BSD 2-Clause License
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __PVRDMA_H__
#define __PVRDMA_H__
#include <linux/compiler.h>
#include <linux/interrupt.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/pci.h>
#include <linux/semaphore.h>
#include <linux/workqueue.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_verbs.h>
#include <rdma/vmw_pvrdma-abi.h>
#include "pvrdma_ring.h"
#include "pvrdma_dev_api.h"
#include "pvrdma_verbs.h"
/* NOT the same as BIT_MASK(). */
#define PVRDMA_MASK(n) (((n) << 1) - 1)
/*
* VMware PVRDMA PCI device id.
*/
#define PCI_DEVICE_ID_VMWARE_PVRDMA 0x0820
struct pvrdma_dev;
struct pvrdma_page_dir {
dma_addr_t dir_dma;
u64 *dir;
int ntables;
u64 **tables;
u64 npages;
void **pages;
};
struct pvrdma_cq {
struct ib_cq ibcq;
int offset;
spinlock_t cq_lock; /* Poll lock. */
struct pvrdma_uar_map *uar;
struct ib_umem *umem;
struct pvrdma_ring_state *ring_state;
struct pvrdma_page_dir pdir;
u32 cq_handle;
bool is_kernel;
atomic_t refcnt;
wait_queue_head_t wait;
};
struct pvrdma_id_table {
u32 last;
u32 top;
u32 max;
u32 mask;
spinlock_t lock; /* Table lock. */
unsigned long *table;
};
struct pvrdma_uar_map {
unsigned long pfn;
void __iomem *map;
int index;
};
struct pvrdma_uar_table {
struct pvrdma_id_table tbl;
int size;
};
struct pvrdma_ucontext {
struct ib_ucontext ibucontext;
struct pvrdma_dev *dev;
struct pvrdma_uar_map uar;
u64 ctx_handle;
};
struct pvrdma_pd {
struct ib_pd ibpd;
u32 pdn;
u32 pd_handle;
int privileged;
};
struct pvrdma_mr {
u32 mr_handle;
u64 iova;
u64 size;
};
struct pvrdma_user_mr {
struct ib_mr ibmr;
struct ib_umem *umem;
struct pvrdma_mr mmr;
struct pvrdma_page_dir pdir;
u64 *pages;
u32 npages;
u32 max_pages;
u32 page_shift;
};
struct pvrdma_wq {
struct pvrdma_ring *ring;
spinlock_t lock; /* Work queue lock. */
int wqe_cnt;
int wqe_size;
int max_sg;
int offset;
};
struct pvrdma_ah {
struct ib_ah ibah;
struct pvrdma_av av;
};
struct pvrdma_qp {
struct ib_qp ibqp;
u32 qp_handle;
u32 qkey;
struct pvrdma_wq sq;
struct pvrdma_wq rq;
struct ib_umem *rumem;
struct ib_umem *sumem;
struct pvrdma_page_dir pdir;
int npages;
int npages_send;
int npages_recv;
u32 flags;
u8 port;
u8 state;
bool is_kernel;
struct mutex mutex; /* QP state mutex. */
atomic_t refcnt;
wait_queue_head_t wait;
};
struct pvrdma_dev {
/* PCI device-related information. */
struct ib_device ib_dev;
struct pci_dev *pdev;
void __iomem *regs;
struct pvrdma_device_shared_region *dsr; /* Shared region pointer */
dma_addr_t dsrbase; /* Shared region base address */
void *cmd_slot;
void *resp_slot;
unsigned long flags;
struct list_head device_link;
/* Locking and interrupt information. */
spinlock_t cmd_lock; /* Command lock. */
struct semaphore cmd_sema;
struct completion cmd_done;
struct {
enum pvrdma_intr_type type; /* Intr type */
struct msix_entry msix_entry[PVRDMA_MAX_INTERRUPTS];
irq_handler_t handler[PVRDMA_MAX_INTERRUPTS];
u8 enabled[PVRDMA_MAX_INTERRUPTS];
u8 size;
} intr;
/* RDMA-related device information. */
union ib_gid *sgid_tbl;
struct pvrdma_ring_state *async_ring_state;
struct pvrdma_page_dir async_pdir;
struct pvrdma_ring_state *cq_ring_state;
struct pvrdma_page_dir cq_pdir;
struct pvrdma_cq **cq_tbl;
spinlock_t cq_tbl_lock;
struct pvrdma_qp **qp_tbl;
spinlock_t qp_tbl_lock;
struct pvrdma_uar_table uar_table;
struct pvrdma_uar_map driver_uar;
__be64 sys_image_guid;
spinlock_t desc_lock; /* Device modification lock. */
u32 port_cap_mask;
struct mutex port_mutex; /* Port modification mutex. */
bool ib_active;
atomic_t num_qps;
atomic_t num_cqs;
atomic_t num_pds;
atomic_t num_ahs;
/* Network device information. */
struct net_device *netdev;
struct notifier_block nb_netdev;
};
struct pvrdma_netdevice_work {
struct work_struct work;
struct net_device *event_netdev;
unsigned long event;
};
static inline struct pvrdma_dev *to_vdev(struct ib_device *ibdev)
{
return container_of(ibdev, struct pvrdma_dev, ib_dev);
}
static inline struct
pvrdma_ucontext *to_vucontext(struct ib_ucontext *ibucontext)
{
return container_of(ibucontext, struct pvrdma_ucontext, ibucontext);
}
static inline struct pvrdma_pd *to_vpd(struct ib_pd *ibpd)
{
return container_of(ibpd, struct pvrdma_pd, ibpd);
}
static inline struct pvrdma_cq *to_vcq(struct ib_cq *ibcq)
{
return container_of(ibcq, struct pvrdma_cq, ibcq);
}
static inline struct pvrdma_user_mr *to_vmr(struct ib_mr *ibmr)
{
return container_of(ibmr, struct pvrdma_user_mr, ibmr);
}
static inline struct pvrdma_qp *to_vqp(struct ib_qp *ibqp)
{
return container_of(ibqp, struct pvrdma_qp, ibqp);
}
static inline struct pvrdma_ah *to_vah(struct ib_ah *ibah)
{
return container_of(ibah, struct pvrdma_ah, ibah);
}
static inline void pvrdma_write_reg(struct pvrdma_dev *dev, u32 reg, u32 val)
{
writel(cpu_to_le32(val), dev->regs + reg);
}
static inline u32 pvrdma_read_reg(struct pvrdma_dev *dev, u32 reg)
{
return le32_to_cpu(readl(dev->regs + reg));
}
static inline void pvrdma_write_uar_cq(struct pvrdma_dev *dev, u32 val)
{
writel(cpu_to_le32(val), dev->driver_uar.map + PVRDMA_UAR_CQ_OFFSET);
}
static inline void pvrdma_write_uar_qp(struct pvrdma_dev *dev, u32 val)
{
writel(cpu_to_le32(val), dev->driver_uar.map + PVRDMA_UAR_QP_OFFSET);
}
static inline void *pvrdma_page_dir_get_ptr(struct pvrdma_page_dir *pdir,
u64 offset)
{
return pdir->pages[offset / PAGE_SIZE] + (offset % PAGE_SIZE);
}
static inline enum pvrdma_mtu ib_mtu_to_pvrdma(enum ib_mtu mtu)
{
return (enum pvrdma_mtu)mtu;
}
static inline enum ib_mtu pvrdma_mtu_to_ib(enum pvrdma_mtu mtu)
{
return (enum ib_mtu)mtu;
}
static inline enum pvrdma_port_state ib_port_state_to_pvrdma(
enum ib_port_state state)
{
return (enum pvrdma_port_state)state;
}
static inline enum ib_port_state pvrdma_port_state_to_ib(
enum pvrdma_port_state state)
{
return (enum ib_port_state)state;
}
static inline int ib_port_cap_flags_to_pvrdma(int flags)
{
return flags & PVRDMA_MASK(PVRDMA_PORT_CAP_FLAGS_MAX);
}
static inline int pvrdma_port_cap_flags_to_ib(int flags)
{
return flags;
}
static inline enum pvrdma_port_width ib_port_width_to_pvrdma(
enum ib_port_width width)
{
return (enum pvrdma_port_width)width;
}
static inline enum ib_port_width pvrdma_port_width_to_ib(
enum pvrdma_port_width width)
{
return (enum ib_port_width)width;
}
static inline enum pvrdma_port_speed ib_port_speed_to_pvrdma(
enum ib_port_speed speed)
{
return (enum pvrdma_port_speed)speed;
}
static inline enum ib_port_speed pvrdma_port_speed_to_ib(
enum pvrdma_port_speed speed)
{
return (enum ib_port_speed)speed;
}
static inline int pvrdma_qp_attr_mask_to_ib(int attr_mask)
{
return attr_mask;
}
static inline int ib_qp_attr_mask_to_pvrdma(int attr_mask)
{
return attr_mask & PVRDMA_MASK(PVRDMA_QP_ATTR_MASK_MAX);
}
static inline enum pvrdma_mig_state ib_mig_state_to_pvrdma(
enum ib_mig_state state)
{
return (enum pvrdma_mig_state)state;
}
static inline enum ib_mig_state pvrdma_mig_state_to_ib(
enum pvrdma_mig_state state)
{
return (enum ib_mig_state)state;
}
static inline int ib_access_flags_to_pvrdma(int flags)
{
return flags;
}
static inline int pvrdma_access_flags_to_ib(int flags)
{
return flags & PVRDMA_MASK(PVRDMA_ACCESS_FLAGS_MAX);
}
static inline enum pvrdma_qp_type ib_qp_type_to_pvrdma(enum ib_qp_type type)
{
return (enum pvrdma_qp_type)type;
}
static inline enum ib_qp_type pvrdma_qp_type_to_ib(enum pvrdma_qp_type type)
{
return (enum ib_qp_type)type;
}
static inline enum pvrdma_qp_state ib_qp_state_to_pvrdma(enum ib_qp_state state)
{
return (enum pvrdma_qp_state)state;
}
static inline enum ib_qp_state pvrdma_qp_state_to_ib(enum pvrdma_qp_state state)
{
return (enum ib_qp_state)state;
}
static inline enum pvrdma_wr_opcode ib_wr_opcode_to_pvrdma(enum ib_wr_opcode op)
{
return (enum pvrdma_wr_opcode)op;
}
static inline enum ib_wc_status pvrdma_wc_status_to_ib(
enum pvrdma_wc_status status)
{
return (enum ib_wc_status)status;
}
static inline int pvrdma_wc_opcode_to_ib(int opcode)
{
return opcode;
}
static inline int pvrdma_wc_flags_to_ib(int flags)
{
return flags;
}
static inline int ib_send_flags_to_pvrdma(int flags)
{
return flags & PVRDMA_MASK(PVRDMA_SEND_FLAGS_MAX);
}
void pvrdma_qp_cap_to_ib(struct ib_qp_cap *dst,
const struct pvrdma_qp_cap *src);
void ib_qp_cap_to_pvrdma(struct pvrdma_qp_cap *dst,
const struct ib_qp_cap *src);
void pvrdma_gid_to_ib(union ib_gid *dst, const union pvrdma_gid *src);
void ib_gid_to_pvrdma(union pvrdma_gid *dst, const union ib_gid *src);
void pvrdma_global_route_to_ib(struct ib_global_route *dst,
const struct pvrdma_global_route *src);
void ib_global_route_to_pvrdma(struct pvrdma_global_route *dst,
const struct ib_global_route *src);
void pvrdma_ah_attr_to_ib(struct ib_ah_attr *dst,
const struct pvrdma_ah_attr *src);
void ib_ah_attr_to_pvrdma(struct pvrdma_ah_attr *dst,
const struct ib_ah_attr *src);
int pvrdma_uar_table_init(struct pvrdma_dev *dev);
void pvrdma_uar_table_cleanup(struct pvrdma_dev *dev);
int pvrdma_uar_alloc(struct pvrdma_dev *dev, struct pvrdma_uar_map *uar);
void pvrdma_uar_free(struct pvrdma_dev *dev, struct pvrdma_uar_map *uar);
void _pvrdma_flush_cqe(struct pvrdma_qp *qp, struct pvrdma_cq *cq);
int pvrdma_page_dir_init(struct pvrdma_dev *dev, struct pvrdma_page_dir *pdir,
u64 npages, bool alloc_pages);
void pvrdma_page_dir_cleanup(struct pvrdma_dev *dev,
struct pvrdma_page_dir *pdir);
int pvrdma_page_dir_insert_dma(struct pvrdma_page_dir *pdir, u64 idx,
dma_addr_t daddr);
int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir,
struct ib_umem *umem, u64 offset);
dma_addr_t pvrdma_page_dir_get_dma(struct pvrdma_page_dir *pdir, u64 idx);
int pvrdma_page_dir_insert_page_list(struct pvrdma_page_dir *pdir,
u64 *page_list, int num_pages);
int pvrdma_cmd_post(struct pvrdma_dev *dev, union pvrdma_cmd_req *req,
union pvrdma_cmd_resp *rsp, unsigned resp_code);
#endif /* __PVRDMA_H__ */
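
The note on PVRDMA_MASK() in pvrdma.h is easy to misread, so here is a small
userspace sketch of the idea. The kernel's BIT_MASK() takes a bit number and
yields a single bit, whereas PVRDMA_MASK() takes the highest single-bit flag
value and yields a mask covering that bit and every bit below it. The flag
names below are hypothetical and are not the driver's values.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as PVRDMA_MASK(): n is the highest single-bit flag value. */
#define MASK_UP_TO(n) (((n) << 1) - 1)

/* Hypothetical flag set for illustration only. */
enum example_flags {
	EX_FLAG_A = 1 << 0,
	EX_FLAG_B = 1 << 1,
	EX_FLAG_C = 1 << 2,
	EX_FLAGS_MAX = EX_FLAG_C,
};

/* Drop any bits above the highest flag that the other side understands. */
static uint32_t to_device_flags(uint32_t flags)
{
	/* The mask is only contiguous if the max flag is a single bit. */
	assert((EX_FLAGS_MAX & (EX_FLAGS_MAX - 1)) == 0);
	return flags & MASK_UP_TO(EX_FLAGS_MAX);
}
```

This mirrors the pattern used by helpers such as ib_port_cap_flags_to_pvrdma(),
which mask the incoming value before it is handed to the device.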
/*
* Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of EITHER the GNU General Public License
* version 2 as published by the Free Software Foundation or the BSD
* 2-Clause License. This program is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
* WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License version 2 for more details at
* http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
*
* You should have received a copy of the GNU General Public License
* along with this program available in the file COPYING in the main
* directory of this source tree.
*
* The BSD 2-Clause License
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <linux/list.h>
#include "pvrdma.h"
#define PVRDMA_CMD_TIMEOUT 10000 /* ms */
static inline int pvrdma_cmd_recv(struct pvrdma_dev *dev,
union pvrdma_cmd_resp *resp,
unsigned resp_code)
{
int err;
dev_dbg(&dev->pdev->dev, "receive response from device\n");
err = wait_for_completion_interruptible_timeout(&dev->cmd_done,
msecs_to_jiffies(PVRDMA_CMD_TIMEOUT));
if (err == 0 || err == -ERESTARTSYS) {
dev_warn(&dev->pdev->dev,
"completion timeout or interrupted\n");
return -ETIMEDOUT;
}
spin_lock(&dev->cmd_lock);
memcpy(resp, dev->resp_slot, sizeof(*resp));
spin_unlock(&dev->cmd_lock);
if (resp->hdr.ack != resp_code) {
dev_warn(&dev->pdev->dev,
"unknown response %#x expected %#x\n",
resp->hdr.ack, resp_code);
return -EFAULT;
}
return 0;
}
int
pvrdma_cmd_post(struct pvrdma_dev *dev, union pvrdma_cmd_req *req,
union pvrdma_cmd_resp *resp, unsigned resp_code)
{
int err;
dev_dbg(&dev->pdev->dev, "post request to device\n");
/* Serialization */
down(&dev->cmd_sema);
BUILD_BUG_ON(sizeof(union pvrdma_cmd_req) !=
sizeof(struct pvrdma_cmd_modify_qp));
spin_lock(&dev->cmd_lock);
memcpy(dev->cmd_slot, req, sizeof(*req));
spin_unlock(&dev->cmd_lock);
init_completion(&dev->cmd_done);
pvrdma_write_reg(dev, PVRDMA_REG_REQUEST, 0);
/* Make sure the request is written before reading status. */
mb();
err = pvrdma_read_reg(dev, PVRDMA_REG_ERR);
if (err == 0) {
if (resp != NULL)
err = pvrdma_cmd_recv(dev, resp, resp_code);
} else {
dev_warn(&dev->pdev->dev,
"failed to write request error reg: %d\n", err);
err = -EFAULT;
}
up(&dev->cmd_sema);
return err;
}
/*
* Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of EITHER the GNU General Public License
* version 2 as published by the Free Software Foundation or the BSD
* 2-Clause License. This program is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
* WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License version 2 for more details at
* http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
*
* You should have received a copy of the GNU General Public License
* along with this program available in the file COPYING in the main
* directory of this source tree.
*
* The BSD 2-Clause License
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <asm/page.h>
#include <linux/io.h>
#include <linux/wait.h>
#include <rdma/ib_addr.h>
#include <rdma/ib_smi.h>
#include <rdma/ib_user_verbs.h>
#include "pvrdma.h"
/**
* pvrdma_req_notify_cq - request notification for a completion queue
* @ibcq: the completion queue
* @notify_flags: notification flags
*
* @return: 0 for success.
*/
int pvrdma_req_notify_cq(struct ib_cq *ibcq,
enum ib_cq_notify_flags notify_flags)
{
struct pvrdma_dev *dev = to_vdev(ibcq->device);
struct pvrdma_cq *cq = to_vcq(ibcq);
u32 val = cq->cq_handle;
val |= (notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
PVRDMA_UAR_CQ_ARM_SOL : PVRDMA_UAR_CQ_ARM;
pvrdma_write_uar_cq(dev, val);
return 0;
}
/**
* pvrdma_create_cq - create completion queue
* @ibdev: the device
* @attr: completion queue attributes
* @context: user context
* @udata: user data
*
* @return: ib_cq completion queue pointer on success,
* otherwise returns negative errno.
*/
struct ib_cq *pvrdma_create_cq(struct ib_device *ibdev,
const struct ib_cq_init_attr *attr,
struct ib_ucontext *context,
struct ib_udata *udata)
{
int entries = attr->cqe;
struct pvrdma_dev *dev = to_vdev(ibdev);
struct pvrdma_cq *cq;
int ret;
int npages;
unsigned long flags;
union pvrdma_cmd_req req;
union pvrdma_cmd_resp rsp;
struct pvrdma_cmd_create_cq *cmd = &req.create_cq;
struct pvrdma_cmd_create_cq_resp *resp = &rsp.create_cq_resp;
struct pvrdma_create_cq ucmd;
BUILD_BUG_ON(sizeof(struct pvrdma_cqe) != 64);
entries = roundup_pow_of_two(entries);
if (entries < 1 || entries > dev->dsr->caps.max_cqe)
return ERR_PTR(-EINVAL);
if (!atomic_add_unless(&dev->num_cqs, 1, dev->dsr->caps.max_cq))
return ERR_PTR(-ENOMEM);
cq = kzalloc(sizeof(*cq), GFP_KERNEL);
if (!cq) {
atomic_dec(&dev->num_cqs);
return ERR_PTR(-ENOMEM);
}
cq->ibcq.cqe = entries;
if (context) {
if (ib_copy_from_udata(&ucmd, udata, sizeof(ucmd))) {
ret = -EFAULT;
goto err_cq;
}
cq->umem = ib_umem_get(context, ucmd.buf_addr, ucmd.buf_size,
IB_ACCESS_LOCAL_WRITE, 1);
if (IS_ERR(cq->umem)) {
ret = PTR_ERR(cq->umem);
goto err_cq;
}
npages = ib_umem_page_count(cq->umem);
} else {
cq->is_kernel = true;
/* One extra page for shared ring state */
npages = 1 + (entries * sizeof(struct pvrdma_cqe) +
PAGE_SIZE - 1) / PAGE_SIZE;
/* Skip header page. */
cq->offset = PAGE_SIZE;
}
if (npages < 0 || npages > PVRDMA_PAGE_DIR_MAX_PAGES) {
dev_warn(&dev->pdev->dev,
"overflow pages in completion queue\n");
ret = -EINVAL;
goto err_umem;
}
ret = pvrdma_page_dir_init(dev, &cq->pdir, npages, cq->is_kernel);
if (ret) {
dev_warn(&dev->pdev->dev,
"could not allocate page directory\n");
goto err_umem;
}
/* Ring state is always the first page. Set in library for user cq. */
if (cq->is_kernel)
cq->ring_state = cq->pdir.pages[0];
else
pvrdma_page_dir_insert_umem(&cq->pdir, cq->umem, 0);
atomic_set(&cq->refcnt, 1);
init_waitqueue_head(&cq->wait);
spin_lock_init(&cq->cq_lock);
memset(cmd, 0, sizeof(*cmd));
cmd->hdr.cmd = PVRDMA_CMD_CREATE_CQ;
cmd->nchunks = npages;
cmd->ctx_handle = (context) ?
(u64)to_vucontext(context)->ctx_handle : 0;
cmd->cqe = entries;
cmd->pdir_dma = cq->pdir.dir_dma;
ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_CQ_RESP);
if (ret < 0) {
dev_warn(&dev->pdev->dev,
"could not create completion queue, error: %d\n", ret);
goto err_page_dir;
}
cq->ibcq.cqe = resp->cqe;
cq->cq_handle = resp->cq_handle;
spin_lock_irqsave(&dev->cq_tbl_lock, flags);
dev->cq_tbl[cq->cq_handle % dev->dsr->caps.max_cq] = cq;
spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
if (context) {
cq->uar = &(to_vucontext(context)->uar);
/* Copy udata back. */
if (ib_copy_to_udata(udata, &cq->cq_handle, sizeof(__u32))) {
dev_warn(&dev->pdev->dev,
"failed to copy back udata\n");
pvrdma_destroy_cq(&cq->ibcq);
return ERR_PTR(-EINVAL);
}
}
return &cq->ibcq;
err_page_dir:
pvrdma_page_dir_cleanup(dev, &cq->pdir);
err_umem:
if (context)
ib_umem_release(cq->umem);
err_cq:
atomic_dec(&dev->num_cqs);
kfree(cq);
return ERR_PTR(ret);
}
static void pvrdma_free_cq(struct pvrdma_dev *dev, struct pvrdma_cq *cq)
{
atomic_dec(&cq->refcnt);
wait_event(cq->wait, !atomic_read(&cq->refcnt));
if (!cq->is_kernel)
ib_umem_release(cq->umem);
pvrdma_page_dir_cleanup(dev, &cq->pdir);
kfree(cq);
}
/**
* pvrdma_destroy_cq - destroy completion queue
* @cq: the completion queue to destroy.
*
* @return: 0 for success.
*/
int pvrdma_destroy_cq(struct ib_cq *cq)
{
struct pvrdma_cq *vcq = to_vcq(cq);
union pvrdma_cmd_req req;
struct pvrdma_cmd_destroy_cq *cmd = &req.destroy_cq;
struct pvrdma_dev *dev = to_vdev(cq->device);
unsigned long flags;
int ret;
memset(cmd, 0, sizeof(*cmd));
cmd->hdr.cmd = PVRDMA_CMD_DESTROY_CQ;
cmd->cq_handle = vcq->cq_handle;
ret = pvrdma_cmd_post(dev, &req, NULL, 0);
if (ret < 0)
dev_warn(&dev->pdev->dev,
"could not destroy completion queue, error: %d\n",
ret);
/* free cq's resources */
spin_lock_irqsave(&dev->cq_tbl_lock, flags);
dev->cq_tbl[vcq->cq_handle] = NULL;
spin_unlock_irqrestore(&dev->cq_tbl_lock, flags);
pvrdma_free_cq(dev, vcq);
atomic_dec(&dev->num_cqs);
return ret;
}
/**
* pvrdma_modify_cq - modify the CQ moderation parameters
* @cq: the CQ to modify
* @cq_count: number of CQEs that will trigger an event
* @cq_period: max period of time in usec before triggering an event
*
* @return: -EOPNOTSUPP as CQ moderation is not supported.
*/
int pvrdma_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
{
return -EOPNOTSUPP;
}
static inline struct pvrdma_cqe *get_cqe(struct pvrdma_cq *cq, int i)
{
return (struct pvrdma_cqe *)pvrdma_page_dir_get_ptr(
&cq->pdir,
cq->offset +
sizeof(struct pvrdma_cqe) * i);
}
void _pvrdma_flush_cqe(struct pvrdma_qp *qp, struct pvrdma_cq *cq)
{
int head;
int has_data;
if (!cq->is_kernel)
return;
/* Lock held */
has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
cq->ibcq.cqe, &head);
if (unlikely(has_data > 0)) {
int items;
int curr;
int tail = pvrdma_idx(&cq->ring_state->rx.prod_tail,
cq->ibcq.cqe);
struct pvrdma_cqe *cqe;
struct pvrdma_cqe *curr_cqe;
items = (tail > head) ? (tail - head) :
(cq->ibcq.cqe - head + tail);
curr = --tail;
while (items-- > 0) {
if (curr < 0)
curr = cq->ibcq.cqe - 1;
if (tail < 0)
tail = cq->ibcq.cqe - 1;
curr_cqe = get_cqe(cq, curr);
if ((curr_cqe->qp & 0xFFFF) != qp->qp_handle) {
if (curr != tail) {
cqe = get_cqe(cq, tail);
*cqe = *curr_cqe;
}
tail--;
} else {
pvrdma_idx_ring_inc(
&cq->ring_state->rx.cons_head,
cq->ibcq.cqe);
}
curr--;
}
}
}
static int pvrdma_poll_one(struct pvrdma_cq *cq, struct pvrdma_qp **cur_qp,
struct ib_wc *wc)
{
struct pvrdma_dev *dev = to_vdev(cq->ibcq.device);
int has_data;
unsigned int head;
bool tried = false;
struct pvrdma_cqe *cqe;
retry:
has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
cq->ibcq.cqe, &head);
if (has_data == 0) {
if (tried)
return -EAGAIN;
pvrdma_write_uar_cq(dev, cq->cq_handle | PVRDMA_UAR_CQ_POLL);
tried = true;
goto retry;
} else if (has_data == PVRDMA_INVALID_IDX) {
dev_err(&dev->pdev->dev, "CQ ring state invalid\n");
return -EAGAIN;
}
cqe = get_cqe(cq, head);
/* Ensure cqe is valid. */
rmb();
if (dev->qp_tbl[cqe->qp & 0xffff])
*cur_qp = (struct pvrdma_qp *)dev->qp_tbl[cqe->qp & 0xffff];
else
return -EAGAIN;
wc->opcode = pvrdma_wc_opcode_to_ib(cqe->opcode);
wc->status = pvrdma_wc_status_to_ib(cqe->status);
wc->wr_id = cqe->wr_id;
wc->qp = &(*cur_qp)->ibqp;
wc->byte_len = cqe->byte_len;
wc->ex.imm_data = cqe->imm_data;
wc->src_qp = cqe->src_qp;
wc->wc_flags = pvrdma_wc_flags_to_ib(cqe->wc_flags);
wc->pkey_index = cqe->pkey_index;
wc->slid = cqe->slid;
wc->sl = cqe->sl;
wc->dlid_path_bits = cqe->dlid_path_bits;
wc->port_num = cqe->port_num;
wc->vendor_err = 0;
/* Update shared ring state */
pvrdma_idx_ring_inc(&cq->ring_state->rx.cons_head, cq->ibcq.cqe);
return 0;
}
/**
* pvrdma_poll_cq - poll for work completion queue entries
* @ibcq: completion queue
* @num_entries: the maximum number of entries
* @wc: pointer to work completion array
*
* @return: number of polled completion entries
*/
int pvrdma_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
{
struct pvrdma_cq *cq = to_vcq(ibcq);
struct pvrdma_qp *cur_qp = NULL;
unsigned long flags;
int npolled;
if (num_entries < 1 || wc == NULL)
return 0;
spin_lock_irqsave(&cq->cq_lock, flags);
for (npolled = 0; npolled < num_entries; ++npolled) {
if (pvrdma_poll_one(cq, &cur_qp, wc + npolled))
break;
}
spin_unlock_irqrestore(&cq->cq_lock, flags);
/* Ensure we do not return errors from poll_cq */
return npolled;
}
/**
* pvrdma_resize_cq - resize CQ
* @ibcq: the completion queue
* @entries: CQ entries
* @udata: user data
*
* @return: -EOPNOTSUPP as CQ resize is not supported.
*/
int pvrdma_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata)
{
return -EOPNOTSUPP;
}
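
The sizing arithmetic used for kernel CQs in pvrdma_create_cq() and get_cqe()
can be checked in isolation. The sketch below is a userspace approximation and
not driver code: it assumes 4 KiB pages, uses the 64-byte CQE size enforced by
the BUILD_BUG_ON, and replaces the kernel's roundup_pow_of_two() with a simple
loop. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define EX_CQE_SIZE 64u    /* matches the BUILD_BUG_ON in pvrdma_create_cq() */

/* Userspace stand-in for roundup_pow_of_two(); n must be at least 1. */
static uint32_t round_up_pow2(uint32_t n)
{
	uint32_t p = 1;

	assert(n >= 1 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/* Pages for a kernel CQ: one ring-state page plus enough pages for the CQEs. */
static uint32_t kernel_cq_npages(uint32_t entries)
{
	return 1 + (entries * EX_CQE_SIZE + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
}

/* Byte offset of CQE i: skip the ring-state page, then index by CQE size. */
static size_t cqe_offset(uint32_t i)
{
	return (size_t)EX_PAGE_SIZE + (size_t)EX_CQE_SIZE * i;
}
```

For example, a request for 100 entries is rounded up to 128, which needs two
pages of CQEs plus the ring-state page, and CQE 64 starts exactly at the
beginning of the third page.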
/*
* Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of EITHER the GNU General Public License
* version 2 as published by the Free Software Foundation or the BSD
* 2-Clause License. This program is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
* WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License version 2 for more details at
* http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
*
* You should have received a copy of the GNU General Public License
* along with this program available in the file COPYING in the main
* directory of this source tree.
*
* The BSD 2-Clause License
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __PVRDMA_DEV_API_H__
#define __PVRDMA_DEV_API_H__
#include <linux/types.h>
#include "pvrdma_verbs.h"
#define PVRDMA_VERSION 17
#define PVRDMA_BOARD_ID 1
#define PVRDMA_REV_ID 1
/*
* Masks and accessors for page directory, which is a two-level lookup:
* page directory -> page table -> page. Only one directory for now, but we
* could expand that easily. 9 bits for tables, 9 bits for pages, gives one
* gigabyte for memory regions and so forth.
*/
#define PVRDMA_PDIR_SHIFT 18
#define PVRDMA_PTABLE_SHIFT 9
#define PVRDMA_PAGE_DIR_DIR(x) (((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
#define PVRDMA_PAGE_DIR_TABLE(x) (((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
#define PVRDMA_PAGE_DIR_PAGE(x) ((x) & 0x1ff)
#define PVRDMA_PAGE_DIR_MAX_PAGES (1 * 512 * 512)
#define PVRDMA_MAX_FAST_REG_PAGES 128
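To make the two-level lookup arithmetic concrete, here is a small user-space sketch that splits a flat page index into its directory, table and page components and computes the largest region a single directory can describe. The macro values are copied from the header above; `pdir_index`, `pdir_split` and `pdir_max_bytes` are hypothetical helpers, and the 4096-byte page size used in the usage note is an assumption.

```c
#include <stdint.h>

/* Constants copied from the header above. */
#define PVRDMA_PDIR_SHIFT		18
#define PVRDMA_PTABLE_SHIFT		9
#define PVRDMA_PAGE_DIR_DIR(x)		(((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
#define PVRDMA_PAGE_DIR_TABLE(x)	(((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
#define PVRDMA_PAGE_DIR_PAGE(x)		((x) & 0x1ff)
#define PVRDMA_PAGE_DIR_MAX_PAGES	(1 * 512 * 512)

/* Hypothetical container for the three components of a page index. */
struct pdir_index {
	unsigned int dir;
	unsigned int table;
	unsigned int page;
};

/* Split a flat page index into directory, table and page slots. */
static struct pdir_index pdir_split(uint64_t idx)
{
	struct pdir_index out = {
		.dir = (unsigned int)PVRDMA_PAGE_DIR_DIR(idx),
		.table = (unsigned int)PVRDMA_PAGE_DIR_TABLE(idx),
		.page = (unsigned int)PVRDMA_PAGE_DIR_PAGE(idx),
	};
	return out;
}

/* Largest region one directory can describe for a given page size. */
static uint64_t pdir_max_bytes(uint64_t page_size)
{
	return (uint64_t)PVRDMA_PAGE_DIR_MAX_PAGES * page_size;
}
```

For example, page index 1000 falls in table 1 at slot 488, and with 4096-byte pages `pdir_max_bytes(4096)` is 2^30 bytes, which matches the "one gigabyte" figure in the comment.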
/*
* Max MSI-X vectors.
*/
#define PVRDMA_MAX_INTERRUPTS 3
/* Register offsets within PCI resource on BAR1. */
#define PVRDMA_REG_VERSION 0x00 /* R: Version of device. */
#define PVRDMA_REG_DSRLOW 0x04 /* W: Device shared region low PA. */
#define PVRDMA_REG_DSRHIGH 0x08 /* W: Device shared region high PA. */
#define PVRDMA_REG_CTL 0x0c /* W: PVRDMA_DEVICE_CTL */
#define PVRDMA_REG_REQUEST 0x10 /* W: Indicate device request. */
#define PVRDMA_REG_ERR 0x14 /* R: Device error. */
#define PVRDMA_REG_ICR 0x18 /* R: Interrupt cause. */
#define PVRDMA_REG_IMR 0x1c /* R/W: Interrupt mask. */
#define PVRDMA_REG_MACL 0x20 /* R/W: MAC address low. */
#define PVRDMA_REG_MACH 0x24 /* R/W: MAC address high. */
/* Object flags. */
#define PVRDMA_CQ_FLAG_ARMED_SOL BIT(0) /* Armed for solicited-only. */
#define PVRDMA_CQ_FLAG_ARMED BIT(1) /* Armed. */
#define PVRDMA_MR_FLAG_DMA BIT(0) /* DMA region. */
#define PVRDMA_MR_FLAG_FRMR BIT(1) /* Fast reg memory region. */
/*
 * Atomic operation capability (masked versions are extended atomic
 * operations).
 */
#define PVRDMA_ATOMIC_OP_COMP_SWAP BIT(0) /* Compare and swap. */
#define PVRDMA_ATOMIC_OP_FETCH_ADD BIT(1) /* Fetch and add. */
#define PVRDMA_ATOMIC_OP_MASK_COMP_SWAP BIT(2) /* Masked compare and swap. */
#define PVRDMA_ATOMIC_OP_MASK_FETCH_ADD BIT(3) /* Masked fetch and add. */
/*
* Base Memory Management Extension flags to support Fast Reg Memory Regions
* and Fast Reg Work Requests. Each flag represents a verb operation and we
* must support all of them to qualify for the BMME device cap.
*/
#define PVRDMA_BMME_FLAG_LOCAL_INV BIT(0) /* Local Invalidate. */
#define PVRDMA_BMME_FLAG_REMOTE_INV BIT(1) /* Remote Invalidate. */
#define PVRDMA_BMME_FLAG_FAST_REG_WR BIT(2) /* Fast Reg Work Request. */
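The "must support all of them" rule in the comment above amounts to an all-bits-set test on the `bmme_flags` capability byte. The sketch below shows one way to express that check in user space; `BMME_ALL_FLAGS` and `bmme_supported` are hypothetical names, `BIT()` is redefined locally as a stand-in for the kernel macro, and this is not necessarily how the driver itself performs the test.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1u << (n))	/* user-space stand-in for the kernel's BIT() */

/* Flag values copied from the header above. */
#define PVRDMA_BMME_FLAG_LOCAL_INV	BIT(0)
#define PVRDMA_BMME_FLAG_REMOTE_INV	BIT(1)
#define PVRDMA_BMME_FLAG_FAST_REG_WR	BIT(2)

/* Hypothetical mask of every flag required for the BMME capability. */
#define BMME_ALL_FLAGS (PVRDMA_BMME_FLAG_LOCAL_INV | \
			PVRDMA_BMME_FLAG_REMOTE_INV | \
			PVRDMA_BMME_FLAG_FAST_REG_WR)

/* True only when every required BMME verb is advertised. */
static bool bmme_supported(uint8_t bmme_flags)
{
	return (bmme_flags & BMME_ALL_FLAGS) == BMME_ALL_FLAGS;
}
```

Comparing against the full mask, rather than testing for a nonzero result, is what distinguishes "all flags present" from "any flag present".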
/*
* GID types. The interpretation of the gid_types bit field in the device
* capabilities will depend on the device mode. For now, the device only
* supports RoCE as mode, so only the different GID types for RoCE are
* defined.
*/
#define PVRDMA_GID_TYPE_FLAG_ROCE_V1 BIT(0)
#define PVRDMA_GID_TYPE_FLAG_ROCE_V2 BIT(1)
enum pvrdma_pci_resource {
PVRDMA_PCI_RESOURCE_MSIX, /* BAR0: MSI-X, MMIO. */
PVRDMA_PCI_RESOURCE_REG, /* BAR1: Registers, MMIO. */
PVRDMA_PCI_RESOURCE_UAR, /* BAR2: UAR pages, MMIO, 64-bit. */
PVRDMA_PCI_RESOURCE_LAST, /* Last. */
};
enum pvrdma_device_ctl {
PVRDMA_DEVICE_CTL_ACTIVATE, /* Activate device. */
PVRDMA_DEVICE_CTL_QUIESCE, /* Quiesce device. */
PVRDMA_DEVICE_CTL_RESET, /* Reset device. */
};
enum pvrdma_intr_vector {
PVRDMA_INTR_VECTOR_RESPONSE, /* Command response. */
PVRDMA_INTR_VECTOR_ASYNC, /* Async events. */
PVRDMA_INTR_VECTOR_CQ, /* CQ notification. */
/* Additional CQ notification vectors. */
};
enum pvrdma_intr_cause {
PVRDMA_INTR_CAUSE_RESPONSE = (1 << PVRDMA_INTR_VECTOR_RESPONSE),
PVRDMA_INTR_CAUSE_ASYNC = (1 << PVRDMA_INTR_VECTOR_ASYNC),
PVRDMA_INTR_CAUSE_CQ = (1 << PVRDMA_INTR_VECTOR_CQ),
};
enum pvrdma_intr_type {
PVRDMA_INTR_TYPE_INTX, /* Legacy. */
PVRDMA_INTR_TYPE_MSI, /* MSI. */
PVRDMA_INTR_TYPE_MSIX, /* MSI-X. */
};
enum pvrdma_gos_bits {
PVRDMA_GOS_BITS_UNK, /* Unknown. */
PVRDMA_GOS_BITS_32, /* 32-bit. */
PVRDMA_GOS_BITS_64, /* 64-bit. */
};
enum pvrdma_gos_type {
PVRDMA_GOS_TYPE_UNK, /* Unknown. */
PVRDMA_GOS_TYPE_LINUX, /* Linux. */
};
enum pvrdma_device_mode {
PVRDMA_DEVICE_MODE_ROCE, /* RoCE. */
PVRDMA_DEVICE_MODE_IWARP, /* iWarp. */
PVRDMA_DEVICE_MODE_IB, /* InfiniBand. */
};
struct pvrdma_gos_info {
u32 gos_bits:2; /* W: PVRDMA_GOS_BITS_ */
u32 gos_type:4; /* W: PVRDMA_GOS_TYPE_ */
u32 gos_ver:16; /* W: Guest OS version. */
u32 gos_misc:10; /* W: Other. */
u32 pad; /* Pad to 8-byte alignment. */
};
struct pvrdma_device_caps {
u64 fw_ver; /* R: Query device. */
__be64 node_guid;
__be64 sys_image_guid;
u64 max_mr_size;
u64 page_size_cap;
u64 atomic_arg_sizes; /* EX verbs. */
u32 ex_comp_mask; /* EX verbs. */
u32 device_cap_flags2; /* EX verbs. */
u32 max_fa_bit_boundary; /* EX verbs. */
u32 log_max_atomic_inline_arg; /* EX verbs. */
u32 vendor_id;
u32 vendor_part_id;
u32 hw_ver;
u32 max_qp;
u32 max_qp_wr;
u32 device_cap_flags;
u32 max_sge;
u32 max_sge_rd;
u32 max_cq;
u32 max_cqe;
u32 max_mr;
u32 max_pd;
u32 max_qp_rd_atom;
u32 max_ee_rd_atom;
u32 max_res_rd_atom;
u32 max_qp_init_rd_atom;
u32 max_ee_init_rd_atom;
u32 max_ee;
u32 max_rdd;
u32 max_mw;
u32 max_raw_ipv6_qp;
u32 max_raw_ethy_qp;
u32 max_mcast_grp;
u32 max_mcast_qp_attach;
u32 max_total_mcast_qp_attach;
u32 max_ah;
u32 max_fmr;
u32 max_map_per_fmr;
u32 max_srq;
u32 max_srq_wr;
u32 max_srq_sge;
u32 max_uar;
u32 gid_tbl_len;
u16 max_pkeys;
u8 local_ca_ack_delay;
u8 phys_port_cnt;
u8 mode; /* PVRDMA_DEVICE_MODE_ */
u8 atomic_ops; /* PVRDMA_ATOMIC_OP_* bits */
u8 bmme_flags; /* FRWR Mem Mgmt Extensions */
u8 gid_types; /* PVRDMA_GID_TYPE_FLAG_ */
u8 reserved[4];
};
struct pvrdma_ring_page_info {
u32 num_pages; /* Num pages incl. header. */
u32 reserved; /* Reserved. */
u64 pdir_dma; /* Page directory PA. */
};
#pragma pack(push, 1)
struct pvrdma_device_shared_region {
u32 driver_version; /* W: Driver version. */
u32 pad; /* Pad to 8-byte align. */
struct pvrdma_gos_info gos_info; /* W: Guest OS information. */
u64 cmd_slot_dma; /* W: Command slot address. */
u64 resp_slot_dma; /* W: Response slot address. */
struct pvrdma_ring_page_info async_ring_pages;
/* W: Async ring page info. */
struct pvrdma_ring_page_info cq_ring_pages;
/* W: CQ ring page info. */
u32 uar_pfn; /* W: UAR pageframe. */
u32 pad2; /* Pad to 8-byte align. */
struct pvrdma_device_caps caps; /* R: Device capabilities. */
};
#pragma pack(pop)
/* Event types. Currently a 1:1 mapping with enum ib_event. */
enum pvrdma_eqe_type {
PVRDMA_EVENT_CQ_ERR,
PVRDMA_EVENT_QP_FATAL,
PVRDMA_EVENT_QP_REQ_ERR,
PVRDMA_EVENT_QP_ACCESS_ERR,
PVRDMA_EVENT_COMM_EST,
PVRDMA_EVENT_SQ_DRAINED,
PVRDMA_EVENT_PATH_MIG,
PVRDMA_EVENT_PATH_MIG_ERR,
PVRDMA_EVENT_DEVICE_FATAL,
PVRDMA_EVENT_PORT_ACTIVE,
PVRDMA_EVENT_PORT_ERR,
PVRDMA_EVENT_LID_CHANGE,
PVRDMA_EVENT_PKEY_CHANGE,
PVRDMA_EVENT_SM_CHANGE,
PVRDMA_EVENT_SRQ_ERR,
PVRDMA_EVENT_SRQ_LIMIT_REACHED,
PVRDMA_EVENT_QP_LAST_WQE_REACHED,
PVRDMA_EVENT_CLIENT_REREGISTER,
PVRDMA_EVENT_GID_CHANGE,
};
/* Event queue element. */
struct pvrdma_eqe {
u32 type; /* Event type. */
u32 info; /* Handle, other. */
};
/* CQ notification queue element. */
struct pvrdma_cqne {
u32 info; /* Handle */
};
enum {
PVRDMA_CMD_FIRST,
PVRDMA_CMD_QUERY_PORT = PVRDMA_CMD_FIRST,
PVRDMA_CMD_QUERY_PKEY,
PVRDMA_CMD_CREATE_PD,
PVRDMA_CMD_DESTROY_PD,
PVRDMA_CMD_CREATE_MR,
PVRDMA_CMD_DESTROY_MR,
PVRDMA_CMD_CREATE_CQ,
PVRDMA_CMD_RESIZE_CQ,
PVRDMA_CMD_DESTROY_CQ,
PVRDMA_CMD_CREATE_QP,
PVRDMA_CMD_MODIFY_QP,
PVRDMA_CMD_QUERY_QP,
PVRDMA_CMD_DESTROY_QP,
PVRDMA_CMD_CREATE_UC,
PVRDMA_CMD_DESTROY_UC,
PVRDMA_CMD_CREATE_BIND,
PVRDMA_CMD_DESTROY_BIND,
PVRDMA_CMD_MAX,
};
enum {
PVRDMA_CMD_FIRST_RESP = (1 << 31),
PVRDMA_CMD_QUERY_PORT_RESP = PVRDMA_CMD_FIRST_RESP,
PVRDMA_CMD_QUERY_PKEY_RESP,
PVRDMA_CMD_CREATE_PD_RESP,
PVRDMA_CMD_DESTROY_PD_RESP_NOOP,
PVRDMA_CMD_CREATE_MR_RESP,
PVRDMA_CMD_DESTROY_MR_RESP_NOOP,
PVRDMA_CMD_CREATE_CQ_RESP,
PVRDMA_CMD_RESIZE_CQ_RESP,
PVRDMA_CMD_DESTROY_CQ_RESP_NOOP,
PVRDMA_CMD_CREATE_QP_RESP,
PVRDMA_CMD_MODIFY_QP_RESP,
PVRDMA_CMD_QUERY_QP_RESP,
PVRDMA_CMD_DESTROY_QP_RESP,
PVRDMA_CMD_CREATE_UC_RESP,
PVRDMA_CMD_DESTROY_UC_RESP_NOOP,
PVRDMA_CMD_CREATE_BIND_RESP_NOOP,
PVRDMA_CMD_DESTROY_BIND_RESP_NOOP,
PVRDMA_CMD_MAX_RESP,
};
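Because the request and response enums above list the operations in the same order, and the response enum starts at bit 31, each response code equals its request code with bit 31 set. The sketch below illustrates that relationship with hypothetical `TOY_`-prefixed names; it uses an unsigned constant for bit 31 to avoid shifting into the sign bit. The driver code shown elsewhere in this commit passes the expected response code explicitly to `pvrdma_cmd_post`, so these helpers are an illustration only.

```c
#include <stdint.h>

/* First few request codes, in the same order as the enum above. */
enum {
	TOY_CMD_QUERY_PORT = 0,
	TOY_CMD_QUERY_PKEY,
	TOY_CMD_CREATE_PD,
};

/* Bit that marks a code as a response (mirrors PVRDMA_CMD_FIRST_RESP). */
#define TOY_CMD_RESP_BIT (UINT32_C(1) << 31)

/* Response code expected for a given request code. */
static uint32_t toy_expected_ack(uint32_t cmd)
{
	return cmd | TOY_CMD_RESP_BIT;
}

/* Request code that a response code answers. */
static uint32_t toy_ack_to_cmd(uint32_t ack)
{
	return ack & ~TOY_CMD_RESP_BIT;
}

/* Nonzero when the code lies in the response range. */
static int toy_is_resp(uint32_t code)
{
	return (code & TOY_CMD_RESP_BIT) != 0;
}
```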
struct pvrdma_cmd_hdr {
u64 response; /* Key for response lookup. */
u32 cmd; /* PVRDMA_CMD_ */
u32 reserved; /* Reserved. */
};
struct pvrdma_cmd_resp_hdr {
u64 response; /* From cmd hdr. */
u32 ack; /* PVRDMA_CMD_XXX_RESP */
u8 err; /* Error. */
u8 reserved[3]; /* Reserved. */
};
struct pvrdma_cmd_query_port {
struct pvrdma_cmd_hdr hdr;
u8 port_num;
u8 reserved[7];
};
struct pvrdma_cmd_query_port_resp {
struct pvrdma_cmd_resp_hdr hdr;
struct pvrdma_port_attr attrs;
};
struct pvrdma_cmd_query_pkey {
struct pvrdma_cmd_hdr hdr;
u8 port_num;
u8 index;
u8 reserved[6];
};
struct pvrdma_cmd_query_pkey_resp {
struct pvrdma_cmd_resp_hdr hdr;
u16 pkey;
u8 reserved[6];
};
struct pvrdma_cmd_create_uc {
struct pvrdma_cmd_hdr hdr;
u32 pfn; /* UAR page frame number */
u8 reserved[4];
};
struct pvrdma_cmd_create_uc_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 ctx_handle;
u8 reserved[4];
};
struct pvrdma_cmd_destroy_uc {
struct pvrdma_cmd_hdr hdr;
u32 ctx_handle;
u8 reserved[4];
};
struct pvrdma_cmd_create_pd {
struct pvrdma_cmd_hdr hdr;
u32 ctx_handle;
u8 reserved[4];
};
struct pvrdma_cmd_create_pd_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 pd_handle;
u8 reserved[4];
};
struct pvrdma_cmd_destroy_pd {
struct pvrdma_cmd_hdr hdr;
u32 pd_handle;
u8 reserved[4];
};
struct pvrdma_cmd_create_mr {
struct pvrdma_cmd_hdr hdr;
u64 start;
u64 length;
u64 pdir_dma;
u32 pd_handle;
u32 access_flags;
u32 flags;
u32 nchunks;
};
struct pvrdma_cmd_create_mr_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 mr_handle;
u32 lkey;
u32 rkey;
u8 reserved[4];
};
struct pvrdma_cmd_destroy_mr {
struct pvrdma_cmd_hdr hdr;
u32 mr_handle;
u8 reserved[4];
};
struct pvrdma_cmd_create_cq {
struct pvrdma_cmd_hdr hdr;
u64 pdir_dma;
u32 ctx_handle;
u32 cqe;
u32 nchunks;
u8 reserved[4];
};
struct pvrdma_cmd_create_cq_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 cq_handle;
u32 cqe;
};
struct pvrdma_cmd_resize_cq {
struct pvrdma_cmd_hdr hdr;
u32 cq_handle;
u32 cqe;
};
struct pvrdma_cmd_resize_cq_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 cqe;
u8 reserved[4];
};
struct pvrdma_cmd_destroy_cq {
struct pvrdma_cmd_hdr hdr;
u32 cq_handle;
u8 reserved[4];
};
struct pvrdma_cmd_create_qp {
struct pvrdma_cmd_hdr hdr;
u64 pdir_dma;
u32 pd_handle;
u32 send_cq_handle;
u32 recv_cq_handle;
u32 srq_handle;
u32 max_send_wr;
u32 max_recv_wr;
u32 max_send_sge;
u32 max_recv_sge;
u32 max_inline_data;
u32 lkey;
u32 access_flags;
u16 total_chunks;
u16 send_chunks;
u16 max_atomic_arg;
u8 sq_sig_all;
u8 qp_type;
u8 is_srq;
u8 reserved[3];
};
struct pvrdma_cmd_create_qp_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 qpn;
u32 max_send_wr;
u32 max_recv_wr;
u32 max_send_sge;
u32 max_recv_sge;
u32 max_inline_data;
};
struct pvrdma_cmd_modify_qp {
struct pvrdma_cmd_hdr hdr;
u32 qp_handle;
u32 attr_mask;
struct pvrdma_qp_attr attrs;
};
struct pvrdma_cmd_query_qp {
struct pvrdma_cmd_hdr hdr;
u32 qp_handle;
u32 attr_mask;
};
struct pvrdma_cmd_query_qp_resp {
struct pvrdma_cmd_resp_hdr hdr;
struct pvrdma_qp_attr attrs;
};
struct pvrdma_cmd_destroy_qp {
struct pvrdma_cmd_hdr hdr;
u32 qp_handle;
u8 reserved[4];
};
struct pvrdma_cmd_destroy_qp_resp {
struct pvrdma_cmd_resp_hdr hdr;
u32 events_reported;
u8 reserved[4];
};
struct pvrdma_cmd_create_bind {
struct pvrdma_cmd_hdr hdr;
u32 mtu;
u32 vlan;
u32 index;
u8 new_gid[16];
u8 gid_type;
u8 reserved[3];
};
struct pvrdma_cmd_destroy_bind {
struct pvrdma_cmd_hdr hdr;
u32 index;
u8 dest_gid[16];
u8 reserved[4];
};
union pvrdma_cmd_req {
struct pvrdma_cmd_hdr hdr;
struct pvrdma_cmd_query_port query_port;
struct pvrdma_cmd_query_pkey query_pkey;
struct pvrdma_cmd_create_uc create_uc;
struct pvrdma_cmd_destroy_uc destroy_uc;
struct pvrdma_cmd_create_pd create_pd;
struct pvrdma_cmd_destroy_pd destroy_pd;
struct pvrdma_cmd_create_mr create_mr;
struct pvrdma_cmd_destroy_mr destroy_mr;
struct pvrdma_cmd_create_cq create_cq;
struct pvrdma_cmd_resize_cq resize_cq;
struct pvrdma_cmd_destroy_cq destroy_cq;
struct pvrdma_cmd_create_qp create_qp;
struct pvrdma_cmd_modify_qp modify_qp;
struct pvrdma_cmd_query_qp query_qp;
struct pvrdma_cmd_destroy_qp destroy_qp;
struct pvrdma_cmd_create_bind create_bind;
struct pvrdma_cmd_destroy_bind destroy_bind;
};
union pvrdma_cmd_resp {
struct pvrdma_cmd_resp_hdr hdr;
struct pvrdma_cmd_query_port_resp query_port_resp;
struct pvrdma_cmd_query_pkey_resp query_pkey_resp;
struct pvrdma_cmd_create_uc_resp create_uc_resp;
struct pvrdma_cmd_create_pd_resp create_pd_resp;
struct pvrdma_cmd_create_mr_resp create_mr_resp;
struct pvrdma_cmd_create_cq_resp create_cq_resp;
struct pvrdma_cmd_resize_cq_resp resize_cq_resp;
struct pvrdma_cmd_create_qp_resp create_qp_resp;
struct pvrdma_cmd_query_qp_resp query_qp_resp;
struct pvrdma_cmd_destroy_qp_resp destroy_qp_resp;
};
#endif /* __PVRDMA_DEV_API_H__ */
#include <linux/bitmap.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include "pvrdma.h"
int pvrdma_uar_table_init(struct pvrdma_dev *dev)
{
	u32 num = dev->dsr->caps.max_uar;
	u32 mask = num - 1;
	struct pvrdma_id_table *tbl = &dev->uar_table.tbl;

	if (!is_power_of_2(num))
		return -EINVAL;

	tbl->last = 0;
	tbl->top = 0;
	tbl->max = num;
	tbl->mask = mask;
	spin_lock_init(&tbl->lock);
	tbl->table = kcalloc(BITS_TO_LONGS(num), sizeof(long), GFP_KERNEL);
	if (!tbl->table)
		return -ENOMEM;

	/* 0th UAR is taken by the device. */
	set_bit(0, tbl->table);

	return 0;
}

void pvrdma_uar_table_cleanup(struct pvrdma_dev *dev)
{
	struct pvrdma_id_table *tbl = &dev->uar_table.tbl;

	kfree(tbl->table);
}

int pvrdma_uar_alloc(struct pvrdma_dev *dev, struct pvrdma_uar_map *uar)
{
	struct pvrdma_id_table *tbl;
	unsigned long flags;
	u32 obj;

	tbl = &dev->uar_table.tbl;

	spin_lock_irqsave(&tbl->lock, flags);
	obj = find_next_zero_bit(tbl->table, tbl->max, tbl->last);
	if (obj >= tbl->max) {
		tbl->top = (tbl->top + tbl->max) & tbl->mask;
		obj = find_first_zero_bit(tbl->table, tbl->max);
	}

	if (obj >= tbl->max) {
		spin_unlock_irqrestore(&tbl->lock, flags);
		return -ENOMEM;
	}

	set_bit(obj, tbl->table);
	obj |= tbl->top;

	spin_unlock_irqrestore(&tbl->lock, flags);

	uar->index = obj;
	uar->pfn = (pci_resource_start(dev->pdev, PVRDMA_PCI_RESOURCE_UAR) >>
		    PAGE_SHIFT) + uar->index;

	return 0;
}

void pvrdma_uar_free(struct pvrdma_dev *dev, struct pvrdma_uar_map *uar)
{
	struct pvrdma_id_table *tbl = &dev->uar_table.tbl;
	unsigned long flags;
	u32 obj;

	obj = uar->index & (tbl->max - 1);
	spin_lock_irqsave(&tbl->lock, flags);
	clear_bit(obj, tbl->table);
	tbl->last = min(tbl->last, obj);
	tbl->top = (tbl->top + tbl->max) & tbl->mask;
	spin_unlock_irqrestore(&tbl->lock, flags);
}
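The UAR table above is a bitmap ID allocator with slot 0 reserved for the device. A stripped-down user-space sketch of the same idea follows. It is an assumption-laden simplification: `toy_id_table` and its helpers are hypothetical, the table is fixed at 64 IDs so it fits in one word, there is no locking, and the `last`/`top` bookkeeping of the driver is left out. Where the driver returns `-ENOMEM`, the sketch returns -1.

```c
#include <stdint.h>

#define TOY_MAX_IDS 64u	/* assumed table size; fits in one 64-bit word */

/* Hypothetical single-word bitmap: bit i set means ID i is in use. */
struct toy_id_table { uint64_t bits; };

/* Start with ID 0 reserved, as the driver does for the device's UAR. */
static void toy_table_init(struct toy_id_table *t)
{
	t->bits = 1;
}

/* Claim the lowest free ID; return 0 on success, -1 when full. */
static int toy_id_alloc(struct toy_id_table *t, uint32_t *id)
{
	uint32_t i;

	for (i = 0; i < TOY_MAX_IDS; i++) {
		if (!(t->bits & (UINT64_C(1) << i))) {
			t->bits |= UINT64_C(1) << i;
			*id = i;
			return 0;
		}
	}
	return -1;
}

/* Release an ID so that a later allocation can reuse it. */
static void toy_id_free(struct toy_id_table *t, uint32_t id)
{
	if (id < TOY_MAX_IDS)
		t->bits &= ~(UINT64_C(1) << id);
}
```

The linear scan stands in for `find_first_zero_bit`; the driver additionally remembers a starting hint in `tbl->last` so that searches do not always begin at bit 0.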
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/bitmap.h>
#include "pvrdma.h"
int pvrdma_page_dir_init(struct pvrdma_dev *dev, struct pvrdma_page_dir *pdir,
			 u64 npages, bool alloc_pages)
{
	u64 i;

	if (npages > PVRDMA_PAGE_DIR_MAX_PAGES)
		return -EINVAL;

	memset(pdir, 0, sizeof(*pdir));

	pdir->dir = dma_alloc_coherent(&dev->pdev->dev, PAGE_SIZE,
				       &pdir->dir_dma, GFP_KERNEL);
	if (!pdir->dir)
		goto err;

	pdir->ntables = PVRDMA_PAGE_DIR_TABLE(npages - 1) + 1;
	pdir->tables = kcalloc(pdir->ntables, sizeof(*pdir->tables),
			       GFP_KERNEL);
	if (!pdir->tables)
		goto err;

	for (i = 0; i < pdir->ntables; i++) {
		pdir->tables[i] = dma_alloc_coherent(&dev->pdev->dev, PAGE_SIZE,
						(dma_addr_t *)&pdir->dir[i],
						GFP_KERNEL);
		if (!pdir->tables[i])
			goto err;
	}

	pdir->npages = npages;

	if (alloc_pages) {
		pdir->pages = kcalloc(npages, sizeof(*pdir->pages),
				      GFP_KERNEL);
		if (!pdir->pages)
			goto err;

		for (i = 0; i < pdir->npages; i++) {
			dma_addr_t page_dma;

			pdir->pages[i] = dma_alloc_coherent(&dev->pdev->dev,
							    PAGE_SIZE,
							    &page_dma,
							    GFP_KERNEL);
			if (!pdir->pages[i])
				goto err;

			pvrdma_page_dir_insert_dma(pdir, i, page_dma);
		}
	}

	return 0;

err:
	pvrdma_page_dir_cleanup(dev, pdir);

	return -ENOMEM;
}

static u64 *pvrdma_page_dir_table(struct pvrdma_page_dir *pdir, u64 idx)
{
	return pdir->tables[PVRDMA_PAGE_DIR_TABLE(idx)];
}

dma_addr_t pvrdma_page_dir_get_dma(struct pvrdma_page_dir *pdir, u64 idx)
{
	return pvrdma_page_dir_table(pdir, idx)[PVRDMA_PAGE_DIR_PAGE(idx)];
}

static void pvrdma_page_dir_cleanup_pages(struct pvrdma_dev *dev,
					  struct pvrdma_page_dir *pdir)
{
	if (pdir->pages) {
		u64 i;

		for (i = 0; i < pdir->npages && pdir->pages[i]; i++) {
			dma_addr_t page_dma = pvrdma_page_dir_get_dma(pdir, i);

			dma_free_coherent(&dev->pdev->dev, PAGE_SIZE,
					  pdir->pages[i], page_dma);
		}

		kfree(pdir->pages);
	}
}

static void pvrdma_page_dir_cleanup_tables(struct pvrdma_dev *dev,
					   struct pvrdma_page_dir *pdir)
{
	if (pdir->tables) {
		int i;

		pvrdma_page_dir_cleanup_pages(dev, pdir);

		for (i = 0; i < pdir->ntables; i++) {
			u64 *table = pdir->tables[i];

			if (table)
				dma_free_coherent(&dev->pdev->dev, PAGE_SIZE,
						  table, pdir->dir[i]);
		}

		kfree(pdir->tables);
	}
}

void pvrdma_page_dir_cleanup(struct pvrdma_dev *dev,
			     struct pvrdma_page_dir *pdir)
{
	if (pdir->dir) {
		pvrdma_page_dir_cleanup_tables(dev, pdir);
		dma_free_coherent(&dev->pdev->dev, PAGE_SIZE,
				  pdir->dir, pdir->dir_dma);
	}
}

int pvrdma_page_dir_insert_dma(struct pvrdma_page_dir *pdir, u64 idx,
			       dma_addr_t daddr)
{
	u64 *table;

	if (idx >= pdir->npages)
		return -EINVAL;

	table = pvrdma_page_dir_table(pdir, idx);
	table[PVRDMA_PAGE_DIR_PAGE(idx)] = daddr;

	return 0;
}
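The page directory code above builds a two-level table: an array of tables, each holding 512 page addresses, indexed by a flat page number. The user-space sketch below mimics that layout with `calloc` in place of `dma_alloc_coherent`. All `toy_` names are hypothetical, errors are reported as -1 rather than kernel errno values, and, unlike the driver, the sketch also rejects a zero page count.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 512u	/* entries per table, matching the 9-bit split */

/* Hypothetical two-level directory: tables[t][p] holds one page address. */
struct toy_page_dir {
	uint64_t **tables;
	uint64_t ntables;
	uint64_t npages;
};

/* Allocate enough tables to hold npages entries. */
static int toy_pdir_init(struct toy_page_dir *pd, uint64_t npages)
{
	uint64_t i;

	if (npages == 0 || npages > (uint64_t)TOY_ENTRIES * TOY_ENTRIES)
		return -1;

	pd->npages = npages;
	pd->ntables = (npages - 1) / TOY_ENTRIES + 1;
	pd->tables = calloc((size_t)pd->ntables, sizeof(*pd->tables));
	if (!pd->tables)
		return -1;

	for (i = 0; i < pd->ntables; i++) {
		pd->tables[i] = calloc(TOY_ENTRIES, sizeof(uint64_t));
		if (!pd->tables[i]) {
			/* Unwind the tables allocated so far. */
			while (i--)
				free(pd->tables[i]);
			free(pd->tables);
			pd->tables = NULL;
			return -1;
		}
	}
	return 0;
}

/* Free every table, then the table array itself. */
static void toy_pdir_cleanup(struct toy_page_dir *pd)
{
	uint64_t i;

	if (!pd->tables)
		return;
	for (i = 0; i < pd->ntables; i++)
		free(pd->tables[i]);
	free(pd->tables);
	pd->tables = NULL;
}

/* Store an address at flat index idx, with the same bounds check. */
static int toy_pdir_insert(struct toy_page_dir *pd, uint64_t idx, uint64_t addr)
{
	if (idx >= pd->npages)
		return -1;
	pd->tables[idx / TOY_ENTRIES][idx % TOY_ENTRIES] = addr;
	return 0;
}

/* Read back the address stored at flat index idx (caller checks bounds). */
static uint64_t toy_pdir_get(const struct toy_page_dir *pd, uint64_t idx)
{
	return pd->tables[idx / TOY_ENTRIES][idx % TOY_ENTRIES];
}
```

Division and modulo by 512 select the same slots as the shift-and-mask macros in the header, as long as the index stays within a single directory.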
int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir,
				struct ib_umem *umem, u64 offset)
{
	u64 i = offset;
	int j, entry;
	int ret = 0, len = 0;
	struct scatterlist *sg;

	if (offset >= pdir->npages)
		return -EINVAL;

	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {
		len = sg_dma_len(sg) >> PAGE_SHIFT;
		for (j = 0; j < len; j++) {
			dma_addr_t addr = sg_dma_address(sg) +
					  umem->page_size * j;

			ret = pvrdma_page_dir_insert_dma(pdir, i, addr);
			if (ret)
				goto exit;

			i++;
		}
	}

exit:
	return ret;
}

int pvrdma_page_dir_insert_page_list(struct pvrdma_page_dir *pdir,
				     u64 *page_list,
				     int num_pages)
{
	int i;
	int ret;

	if (num_pages > pdir->npages)
		return -EINVAL;

	for (i = 0; i < num_pages; i++) {
		ret = pvrdma_page_dir_insert_dma(pdir, i, page_list[i]);
		if (ret)
			return ret;
	}

	return 0;
}
void pvrdma_qp_cap_to_ib(struct ib_qp_cap *dst, const struct pvrdma_qp_cap *src)
{
dst->max_send_wr = src->max_send_wr;
dst->max_recv_wr = src->max_recv_wr;
dst->max_send_sge = src->max_send_sge;
dst->max_recv_sge = src->max_recv_sge;
dst->max_inline_data = src->max_inline_data;
}
void ib_qp_cap_to_pvrdma(struct pvrdma_qp_cap *dst, const struct ib_qp_cap *src)
{
dst->max_send_wr = src->max_send_wr;
dst->max_recv_wr = src->max_recv_wr;
dst->max_send_sge = src->max_send_sge;
dst->max_recv_sge = src->max_recv_sge;
dst->max_inline_data = src->max_inline_data;
}
void pvrdma_gid_to_ib(union ib_gid *dst, const union pvrdma_gid *src)
{
BUILD_BUG_ON(sizeof(union pvrdma_gid) != sizeof(union ib_gid));
memcpy(dst, src, sizeof(*src));
}
void ib_gid_to_pvrdma(union pvrdma_gid *dst, const union ib_gid *src)
{
BUILD_BUG_ON(sizeof(union pvrdma_gid) != sizeof(union ib_gid));
memcpy(dst, src, sizeof(*src));
}
void pvrdma_global_route_to_ib(struct ib_global_route *dst,
const struct pvrdma_global_route *src)
{
pvrdma_gid_to_ib(&dst->dgid, &src->dgid);
dst->flow_label = src->flow_label;
dst->sgid_index = src->sgid_index;
dst->hop_limit = src->hop_limit;
dst->traffic_class = src->traffic_class;
}
void ib_global_route_to_pvrdma(struct pvrdma_global_route *dst,
const struct ib_global_route *src)
{
ib_gid_to_pvrdma(&dst->dgid, &src->dgid);
dst->flow_label = src->flow_label;
dst->sgid_index = src->sgid_index;
dst->hop_limit = src->hop_limit;
dst->traffic_class = src->traffic_class;
}
void pvrdma_ah_attr_to_ib(struct ib_ah_attr *dst,
const struct pvrdma_ah_attr *src)
{
pvrdma_global_route_to_ib(&dst->grh, &src->grh);
dst->dlid = src->dlid;
dst->sl = src->sl;
dst->src_path_bits = src->src_path_bits;
dst->static_rate = src->static_rate;
dst->ah_flags = src->ah_flags;
dst->port_num = src->port_num;
memcpy(&dst->dmac, &src->dmac, sizeof(dst->dmac));
}
void ib_ah_attr_to_pvrdma(struct pvrdma_ah_attr *dst,
const struct ib_ah_attr *src)
{
ib_global_route_to_pvrdma(&dst->grh, &src->grh);
dst->dlid = src->dlid;
dst->sl = src->sl;
dst->src_path_bits = src->src_path_bits;
dst->static_rate = src->static_rate;
dst->ah_flags = src->ah_flags;
dst->port_num = src->port_num;
memcpy(&dst->dmac, &src->dmac, sizeof(dst->dmac));
}
#include <linux/list.h>
#include <linux/slab.h>
#include "pvrdma.h"
/**
 * pvrdma_get_dma_mr - get a DMA memory region
 * @pd: protection domain
 * @acc: access flags
 *
 * @return: ib_mr pointer on success, otherwise an ERR_PTR()-encoded
 * negative errno.
 */
struct ib_mr *pvrdma_get_dma_mr(struct ib_pd *pd, int acc)
{
	struct pvrdma_dev *dev = to_vdev(pd->device);
	struct pvrdma_user_mr *mr;
	union pvrdma_cmd_req req;
	union pvrdma_cmd_resp rsp;
	struct pvrdma_cmd_create_mr *cmd = &req.create_mr;
	struct pvrdma_cmd_create_mr_resp *resp = &rsp.create_mr_resp;
	int ret;

	/* Support only LOCAL_WRITE flag for DMA MRs */
	if (acc & ~IB_ACCESS_LOCAL_WRITE) {
		dev_warn(&dev->pdev->dev,
			 "unsupported dma mr access flags %#x\n", acc);
		return ERR_PTR(-EOPNOTSUPP);
	}

	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
	if (!mr)
		return ERR_PTR(-ENOMEM);

	memset(cmd, 0, sizeof(*cmd));
	cmd->hdr.cmd = PVRDMA_CMD_CREATE_MR;
	cmd->pd_handle = to_vpd(pd)->pd_handle;
	cmd->access_flags = acc;
	cmd->flags = PVRDMA_MR_FLAG_DMA;

	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_MR_RESP);
	if (ret < 0) {
		dev_warn(&dev->pdev->dev,
			 "could not get DMA mem region, error: %d\n", ret);
		kfree(mr);
		return ERR_PTR(ret);
	}

	mr->mmr.mr_handle = resp->mr_handle;
	mr->ibmr.lkey = resp->lkey;
	mr->ibmr.rkey = resp->rkey;

	return &mr->ibmr;
}
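The function above reports failures by returning `ERR_PTR(-errno)` instead of NULL, so a single pointer return value carries either a valid object or an error code. The kernel's real helpers live in `include/linux/err.h`; the following user-space sketch re-creates the idea under hypothetical `toy_` names so that it can be compiled and tried outside the kernel. It assumes, as the kernel does, that the top 4095 addresses are never valid pointers.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095	/* same bound the kernel uses for MAX_ERRNO */

/* Encode a negative errno value as a pointer. */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Nonzero when the pointer lies in the reserved error range. */
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}
```

A caller checks the result with `toy_is_err()` before dereferencing it, then uses `toy_ptr_err()` to extract the code. Note that NULL is not an error pointer under this scheme.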
/**
 * pvrdma_reg_user_mr - register a userspace memory region
 * @pd: protection domain
 * @start: starting address
 * @length: length of region
 * @virt_addr: I/O virtual address
 * @access_flags: access flags for memory region
 * @udata: user data
 *
 * @return: ib_mr pointer on success, otherwise an ERR_PTR()-encoded
 * negative errno.
 */
struct ib_mr *pvrdma_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
				 u64 virt_addr, int access_flags,
				 struct ib_udata *udata)
{
	struct pvrdma_dev *dev = to_vdev(pd->device);
	struct pvrdma_user_mr *mr = NULL;
	struct ib_umem *umem;
	union pvrdma_cmd_req req;
	union pvrdma_cmd_resp rsp;
	struct pvrdma_cmd_create_mr *cmd = &req.create_mr;
	struct pvrdma_cmd_create_mr_resp *resp = &rsp.create_mr_resp;
	int nchunks;
	int ret;
	int entry;
	struct scatterlist *sg;

	if (length == 0 || length > dev->dsr->caps.max_mr_size) {
		dev_warn(&dev->pdev->dev, "invalid mem region length\n");
		return ERR_PTR(-EINVAL);
	}

	umem = ib_umem_get(pd->uobject->context, start,
			   length, access_flags, 0);
	if (IS_ERR(umem)) {
		dev_warn(&dev->pdev->dev,
			 "could not get umem for mem region\n");
		return ERR_CAST(umem);
	}

	/* Count the pages covered by the DMA-mapped scatterlist. */
	nchunks = 0;
	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry)
		nchunks += sg_dma_len(sg) >> PAGE_SHIFT;

	if (nchunks < 0 || nchunks > PVRDMA_PAGE_DIR_MAX_PAGES) {
		dev_warn(&dev->pdev->dev, "overflow %d pages in mem region\n",
			 nchunks);
		ret = -EINVAL;
		goto err_umem;
	}

	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
	if (!mr) {
		ret = -ENOMEM;
		goto err_umem;
	}

	mr->mmr.iova = virt_addr;
	mr->mmr.size = length;
	mr->umem = umem;

	ret = pvrdma_page_dir_init(dev, &mr->pdir, nchunks, false);
	if (ret) {
		dev_warn(&dev->pdev->dev,
			 "could not allocate page directory\n");
		goto err_umem;
	}

	ret = pvrdma_page_dir_insert_umem(&mr->pdir, mr->umem, 0);
	if (ret)
		goto err_pdir;

	memset(cmd, 0, sizeof(*cmd));
	cmd->hdr.cmd = PVRDMA_CMD_CREATE_MR;
	cmd->start = start;
	cmd->length = length;
	cmd->pd_handle = to_vpd(pd)->pd_handle;
	cmd->access_flags = access_flags;
	cmd->nchunks = nchunks;
	cmd->pdir_dma = mr->pdir.dir_dma;

	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_MR_RESP);
	if (ret < 0) {
		dev_warn(&dev->pdev->dev,
			 "could not register mem region, error: %d\n", ret);
		goto err_pdir;
	}

	mr->mmr.mr_handle = resp->mr_handle;
	mr->ibmr.lkey = resp->lkey;
	mr->ibmr.rkey = resp->rkey;

	return &mr->ibmr;

err_pdir:
	pvrdma_page_dir_cleanup(dev, &mr->pdir);
err_umem:
	ib_umem_release(umem);
	kfree(mr);

	return ERR_PTR(ret);
}
/**
 * pvrdma_alloc_mr - allocate a memory region
 * @pd: protection domain
 * @mr_type: type of memory region
 * @max_num_sg: maximum number of pages
 *
 * @return: ib_mr pointer on success, otherwise an ERR_PTR()-encoded
 * negative errno.
 */
struct ib_mr *pvrdma_alloc_mr(struct ib_pd *pd, enum ib_mr_type mr_type,
			      u32 max_num_sg)
{
	struct pvrdma_dev *dev = to_vdev(pd->device);
	struct pvrdma_user_mr *mr;
	union pvrdma_cmd_req req;
	union pvrdma_cmd_resp rsp;
	struct pvrdma_cmd_create_mr *cmd = &req.create_mr;
	struct pvrdma_cmd_create_mr_resp *resp = &rsp.create_mr_resp;
	int size = max_num_sg * sizeof(u64);
	int ret;

	if (mr_type != IB_MR_TYPE_MEM_REG ||
	    max_num_sg > PVRDMA_MAX_FAST_REG_PAGES)
		return ERR_PTR(-EINVAL);

	mr = kzalloc(sizeof(*mr), GFP_KERNEL);
	if (!mr)
		return ERR_PTR(-ENOMEM);

	mr->pages = kzalloc(size, GFP_KERNEL);
	if (!mr->pages) {
		ret = -ENOMEM;
		goto freemr;
	}

	ret = pvrdma_page_dir_init(dev, &mr->pdir, max_num_sg, false);
	if (ret) {
		dev_warn(&dev->pdev->dev,
			 "failed to allocate page dir for mr\n");
		ret = -ENOMEM;
		goto freepages;
	}

	memset(cmd, 0, sizeof(*cmd));
	cmd->hdr.cmd = PVRDMA_CMD_CREATE_MR;
	cmd->pd_handle = to_vpd(pd)->pd_handle;
	cmd->access_flags = 0;
	cmd->flags = PVRDMA_MR_FLAG_FRMR;
	cmd->nchunks = max_num_sg;

	ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_MR_RESP);
	if (ret < 0) {
		dev_warn(&dev->pdev->dev,
			 "could not create FR mem region, error: %d\n", ret);
		goto freepdir;
	}

	mr->max_pages = max_num_sg;
	mr->mmr.mr_handle = resp->mr_handle;
	mr->ibmr.lkey = resp->lkey;
	mr->ibmr.rkey = resp->rkey;
	mr->page_shift = PAGE_SHIFT;
	mr->umem = NULL;

	return &mr->ibmr;

freepdir:
	pvrdma_page_dir_cleanup(dev, &mr->pdir);
freepages:
	kfree(mr->pages);
freemr:
	kfree(mr);
	return ERR_PTR(ret);
}
/**
* pvrdma_dereg_mr - deregister a memory region
* @ibmr: memory region
*
* @return: always 0; a failed destroy command is only logged.
*/
int pvrdma_dereg_mr(struct ib_mr *ibmr)
{
struct pvrdma_user_mr *mr = to_vmr(ibmr);
struct pvrdma_dev *dev = to_vdev(ibmr->device);
union pvrdma_cmd_req req;
struct pvrdma_cmd_destroy_mr *cmd = &req.destroy_mr;
int ret;
memset(cmd, 0, sizeof(*cmd));
cmd->hdr.cmd = PVRDMA_CMD_DESTROY_MR;
cmd->mr_handle = mr->mmr.mr_handle;
ret = pvrdma_cmd_post(dev, &req, NULL, 0);
if (ret < 0)
dev_warn(&dev->pdev->dev,
"could not deregister mem region, error: %d\n", ret);
pvrdma_page_dir_cleanup(dev, &mr->pdir);
if (mr->umem)
ib_umem_release(mr->umem);
kfree(mr->pages);
kfree(mr);
return 0;
}
/* ib_sg_to_pages() callback: append one page address to the MR page list. */
static int pvrdma_set_page(struct ib_mr *ibmr, u64 addr)
{
struct pvrdma_user_mr *mr = to_vmr(ibmr);
if (mr->npages == mr->max_pages)
return -ENOMEM;
mr->pages[mr->npages++] = addr;
return 0;
}
int pvrdma_map_mr_sg(struct ib_mr *ibmr, struct scatterlist *sg, int sg_nents,
unsigned int *sg_offset)
{
struct pvrdma_user_mr *mr = to_vmr(ibmr);
struct pvrdma_dev *dev = to_vdev(ibmr->device);
int ret;
mr->npages = 0;
ret = ib_sg_to_pages(ibmr, sg, sg_nents, sg_offset, pvrdma_set_page);
if (ret < 0)
dev_warn(&dev->pdev->dev, "could not map sg to pages\n");
return ret;
}
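The page-list pattern used by pvrdma_set_page() and pvrdma_map_mr_sg() above can be modelled in userspace as follows. This is only a sketch: `mr_model`, `model_set_page` and `model_map_range` are hypothetical names, and `model_map_range` is a simplified stand-in that walks one contiguous range. It is not the kernel's ib_sg_to_pages() and does not reproduce its return-value semantics.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields pvrdma_set_page() touches. */
struct mr_model {
	uint64_t *pages;    /* caller-provided page address array */
	uint32_t npages;    /* entries filled so far */
	uint32_t max_pages; /* capacity of pages[] */
};

/* Append one page address; fail once the array is full. */
static int model_set_page(struct mr_model *mr, uint64_t addr)
{
	if (mr->npages == mr->max_pages)
		return -ENOMEM;
	mr->pages[mr->npages++] = addr;
	return 0;
}

/*
 * Simplified stand-in for the scatterlist walk (an assumption of this
 * sketch): reset the list, then record every page that overlaps
 * [start, start + len). page_size must be a power of two.
 * Returns the number of pages recorded, or a negative errno.
 */
static int model_map_range(struct mr_model *mr, uint64_t start,
			   uint64_t len, uint64_t page_size)
{
	uint64_t addr = start & ~(page_size - 1);
	uint64_t end = start + len;
	int mapped = 0;

	mr->npages = 0;
	for (; addr < end; addr += page_size) {
		int ret = model_set_page(mr, addr);

		if (ret)
			return ret;
		mapped++;
	}
	return mapped;
}
```

Resetting `npages` before each walk mirrors the `mr->npages = 0` assignment in pvrdma_map_mr_sg(), so a region can be remapped without reallocating the array.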
/*
* Copyright (c) 2012-2016 VMware, Inc. All rights reserved.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of EITHER the GNU General Public License
* version 2 as published by the Free Software Foundation or the BSD
* 2-Clause License. This program is distributed in the hope that it
* will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
* WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License version 2 for more details at
* http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
*
* You should have received a copy of the GNU General Public License
* along with this program available in the file COPYING in the main
* directory of this source tree.
*
* The BSD 2-Clause License
*
* Redistribution and use in source and binary forms, with or
* without modification, are permitted provided that the following
* conditions are met:
*
* - Redistributions of source code must retain the above
* copyright notice, this list of conditions and the following
* disclaimer.
*
* - Redistributions in binary form must reproduce the above
* copyright notice, this list of conditions and the following
* disclaimer in the documentation and/or other materials
* provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
* INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __PVRDMA_RING_H__
#define __PVRDMA_RING_H__
#include <linux/types.h>
#define PVRDMA_INVALID_IDX (-1) /* Invalid index. */
struct pvrdma_ring {
atomic_t prod_tail; /* Producer tail. */
atomic_t cons_head; /* Consumer head. */
};
struct pvrdma_ring_state {
struct pvrdma_ring tx; /* Tx ring. */
struct pvrdma_ring rx; /* Rx ring. */
};
static inline int pvrdma_idx_valid(__u32 idx, __u32 max_elems)
{
/*
 * Valid iff idx < 2 * max_elems, assuming max_elems is a power of two.
 * The mask test generates fewer instructions than a less-than.
 */
return (idx & ~((max_elems << 1) - 1)) == 0;
}
static inline __s32 pvrdma_idx(atomic_t *var, __u32 max_elems)
{
const unsigned int idx = atomic_read(var);
if (pvrdma_idx_valid(idx, max_elems))
return idx & (max_elems - 1);
return PVRDMA_INVALID_IDX;
}
static inline void pvrdma_idx_ring_inc(atomic_t *var, __u32 max_elems)
{
__u32 idx = atomic_read(var) + 1; /* Increment. */
idx &= (max_elems << 1) - 1; /* Wrap at 2 * max_elems; the top bit is the generation bit. */
atomic_set(var, idx);
}
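/*
 * Returns 1 if the ring has room, 0 if it is full, or PVRDMA_INVALID_IDX
 * if either index is out of range. On success *out_tail is the producer slot.
 */
static inline __s32 pvrdma_idx_ring_has_space(const struct pvrdma_ring *r,
__u32 max_elems, __u32 *out_tail)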
{
const __u32 tail = atomic_read(&r->prod_tail);
const __u32 head = atomic_read(&r->cons_head);
if (pvrdma_idx_valid(tail, max_elems) &&
pvrdma_idx_valid(head, max_elems)) {
*out_tail = tail & (max_elems - 1);
return tail != (head ^ max_elems);
}
return PVRDMA_INVALID_IDX;
}
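/*
 * Returns 1 if the ring holds data, 0 if it is empty, or PVRDMA_INVALID_IDX
 * if either index is out of range. On success *out_head is the consumer slot.
 */
static inline __s32 pvrdma_idx_ring_has_data(const struct pvrdma_ring *r,
__u32 max_elems, __u32 *out_head)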
{
const __u32 tail = atomic_read(&r->prod_tail);
const __u32 head = atomic_read(&r->cons_head);
if (pvrdma_idx_valid(tail, max_elems) &&
pvrdma_idx_valid(head, max_elems)) {
*out_head = head & (max_elems - 1);
return tail != head;
}
return PVRDMA_INVALID_IDX;
}
static inline bool pvrdma_idx_ring_is_valid_idx(const struct pvrdma_ring *r,
__u32 max_elems, __u32 *idx)
{
const __u32 tail = atomic_read(&r->prod_tail);
const __u32 head = atomic_read(&r->cons_head);
if (pvrdma_idx_valid(tail, max_elems) &&
pvrdma_idx_valid(head, max_elems) &&
pvrdma_idx_valid(*idx, max_elems)) {
if (tail > head && (*idx < tail && *idx >= head))
return true;
else if (head > tail && (*idx >= head || *idx < tail))
return true;
}
return false;
}
#endif /* __PVRDMA_RING_H__ */
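The index scheme in the header above keeps one extra "generation" bit above the slot bits, so that a full ring and an empty ring can be told apart without a separate counter. The userspace sketch below illustrates that arithmetic. It assumes max_elems is a power of two, replaces atomic_t with plain uint32_t, and uses hypothetical names (`ring_model`, `idx_inc`, `ring_produce`, and so on) that are not part of the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical single-threaded model of struct pvrdma_ring. */
struct ring_model {
	uint32_t prod_tail;
	uint32_t cons_head;
};

/* An index is valid if it uses only the slot bits plus one generation bit. */
static bool idx_valid(uint32_t idx, uint32_t max_elems)
{
	return (idx & ~((max_elems << 1) - 1)) == 0;
}

/* Advance an index, wrapping at 2 * max_elems. */
static uint32_t idx_inc(uint32_t idx, uint32_t max_elems)
{
	return (idx + 1) & ((max_elems << 1) - 1);
}

/* Slot in the backing array: drop the generation bit. */
static uint32_t ring_slot(uint32_t idx, uint32_t max_elems)
{
	return idx & (max_elems - 1);
}

/* Empty: both indices are identical, generation bit included. */
static bool ring_empty(const struct ring_model *r)
{
	return r->prod_tail == r->cons_head;
}

/* Full: same slot, opposite generation bit. */
static bool ring_full(const struct ring_model *r, uint32_t max_elems)
{
	return r->prod_tail == (r->cons_head ^ max_elems);
}

/* Produce up to n entries; returns how many fit. */
static uint32_t ring_produce(struct ring_model *r, uint32_t max_elems,
			     uint32_t n)
{
	uint32_t done = 0;

	while (done < n && !ring_full(r, max_elems)) {
		r->prod_tail = idx_inc(r->prod_tail, max_elems);
		done++;
	}
	return done;
}

/* Consume up to n entries; returns how many were available. */
static uint32_t ring_consume(struct ring_model *r, uint32_t max_elems,
			     uint32_t n)
{
	uint32_t done = 0;

	while (done < n && !ring_empty(r)) {
		r->cons_head = idx_inc(r->cons_head, max_elems);
		done++;
	}
	return done;
}
```

With max_elems set to 4, indices run from 0 to 7. Producing four entries into an empty ring moves prod_tail from 0 to 4, which maps back to slot 0 but differs from cons_head in the generation bit, so the ring reads as full instead of empty.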
@@ -14,3 +14,4 @@ header-y += mlx5-abi.h
header-y += mthca-abi.h
header-y += nes-abi.h
header-y += ocrdma-abi.h
header-y += vmw_pvrdma-abi.h