Commit ccd5d1b9, authored by Linus Torvalds

Merge tag 'ntb-4.13' of git://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
 "The major change in the series is a rework of the NTB infrastructure
  to allow for IDT hardware to be supported (and resulting fallout from
  that). There are also a few clean-ups, etc.

  New IDT NTB driver and changes to the NTB infrastructure to allow for
  this different kind of NTB HW, some style fixes (per Greg KH
  recommendation), and some ntb_test tweaks"

* tag 'ntb-4.13' of git://github.com/jonmason/ntb:
  ntb_netdev: set the net_device's parent
  ntb: Add error path/handling to Debug FS entry creation
  ntb: Add more debugfs support for ntb_perf testing options
  ntb: Remove debug-fs variables from the context structure
  ntb: Add a module option to control affinity of DMA channels
  NTB: Add IDT 89HPESxNTx PCIe-switches support
  ntb_hw_intel: Style fixes: open code macros that just obfuscate code
  ntb_hw_amd: Style fixes: open code macros that just obfuscate code
  NTB: Add ntb.h comments
  NTB: Add PCIe Gen4 link speed
  NTB: Add new Memory Windows API documentation
  NTB: Add Messaging NTB API
  NTB: Alter Scratchpads API to support multi-ports devices
  NTB: Alter MW API to support multi-ports devices
  NTB: Alter link-state API to support multi-port devices
  NTB: Add indexed ports NTB API
  NTB: Make link-state API being declared first
  NTB: ntb_test: add parameter for doorbell bitmask
  NTB: ntb_test: modprobe on remote host
# NTB Drivers
NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
the separate memory systems of two computers to the same PCI-Express fabric.
Existing NTB hardware supports a common feature set, including scratchpad
registers, doorbell registers, and memory translation windows. Scratchpad
registers are read-and-writable registers that are accessible from either side
of the device, so that peers can exchange a small amount of information at a
fixed address. Doorbell registers provide a way for peers to send interrupt
events. Memory windows allow translated read and write access to the peer
memory.
the separate memory systems of two or more computers to the same PCI-Express
fabric. Existing NTB hardware supports a common feature set: doorbell
registers and memory translation windows, as well as non-common features like
scratchpad and message registers. Scratchpad registers are read-and-writable
registers that are accessible from either side of the device, so that peers can
exchange a small amount of information at a fixed address. Message registers can
be utilized for the same purpose. Additionally, they are provided with
special status bits to make sure the information isn't rewritten by another
peer. Doorbell registers provide a way for peers to send interrupt events.
Memory windows allow translated read and write access to the peer memory.
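As a rough illustration of how peers can exchange a small value at a fixed address, the userspace sketch below models a bank of 32-bit scratchpads and splits a 64-bit memory-window size across two of them, similar in spirit to the `upper_32_bits()`/`lower_32_bits()` writes made by the client drivers. The `spad_bank` type, the `SPAD_SZ_HIGH`/`SPAD_SZ_LOW` indices, and the helper names are hypothetical and are not part of the kernel NTB API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a bank of 32-bit scratchpads visible to both peers. */
#define SPAD_COUNT   16
#define SPAD_SZ_HIGH 0	/* assumed index holding the upper 32 bits */
#define SPAD_SZ_LOW  1	/* assumed index holding the lower 32 bits */

struct spad_bank {
	uint32_t reg[SPAD_COUNT];
};

/* Writer side: split a 64-bit size across two scratchpads. */
static void spad_put_size(struct spad_bank *bank, uint64_t size)
{
	bank->reg[SPAD_SZ_HIGH] = (uint32_t)(size >> 32);
	bank->reg[SPAD_SZ_LOW] = (uint32_t)(size & 0xffffffffu);
}

/* Reader side: reassemble the 64-bit size from the two halves. */
static uint64_t spad_get_size(const struct spad_bank *bank)
{
	return ((uint64_t)bank->reg[SPAD_SZ_HIGH] << 32) |
	       bank->reg[SPAD_SZ_LOW];
}
```

In a real client the two halves would travel through the peer scratchpad write and local scratchpad read calls rather than through a shared array.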
## NTB Core Driver (ntb)
......@@ -26,6 +28,87 @@ as ntb hardware, or hardware drivers, are inserted and removed. The
registration uses the Linux Device framework, so it should feel familiar to
anyone who has written a pci driver.
### NTB Typical client driver implementation
The primary purpose of NTB is to share some piece of memory between at least two
systems. So the NTB device features like Scratchpad/Message registers are
mainly used to perform the proper memory window initialization. Typically
there are two types of memory window interfaces supported by the NTB API:
inbound translation configured on the local ntb port and outbound translation
configured by the peer, on the peer ntb port. The first type is
depicted in the following figure:
Inbound translation:
Memory: Local NTB Port: Peer NTB Port: Peer MMIO:
____________
| dma-mapped |-ntb_mw_set_trans(addr) |
| memory | _v____________ | ______________
| (addr) |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
|------------| |--------------| | |--------------|
So a typical scenario of the first type of memory window initialization looks like this:
1) allocate a memory region, 2) put the translated address into the NTB config,
3) somehow notify the peer device of the performed initialization, 4) the peer device
maps the corresponding outbound memory window so as to have access to the shared
memory region.
The second type of interface, which implies the shared windows being
initialized by a peer device, is depicted in the figure:
Outbound translation:
Memory: Local NTB Port: Peer NTB Port: Peer MMIO:
____________ ______________
| dma-mapped | | | MW base addr |<== memory-mapped IO
| memory | | |--------------|
| (addr) |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
|------------| | |--------------|
A typical scenario of the second type of interface initialization would be:
1) allocate a memory region, 2) somehow deliver a translated address to the peer
device, 3) the peer puts the translated address into the NTB config, 4) the peer device maps
the outbound memory window so as to have access to the shared memory region.
As one can see, the described scenarios can be combined into one portable
algorithm.
Local device:
1) Allocate memory for a shared window
2) Initialize memory window by translated address of the allocated region
(it may fail if local memory window initialization is unsupported)
3) Send the translated address and memory window index to a peer device
Peer device:
1) Initialize the memory window with the retrieved address of the memory
region allocated by the other device (it may fail if peer memory window
initialization is unsupported)
2) Map outbound memory window
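The portable algorithm above can be sketched as a small userspace model. It assumes that a window ends up usable when at least one of the two setup attempts (local inbound or peer-side outbound) succeeds, which is one reading of the steps above rather than a statement about any particular driver. The `mw_model_ops` structure, its callbacks, and `mw_portable_init()` are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware driver's memory-window callbacks. */
struct mw_model_ops {
	int (*set_inbound)(uint64_t addr);	/* NULL: local setup unsupported */
	int (*set_outbound)(uint64_t addr);	/* NULL: peer-side setup unsupported */
};

/* Example callback that accepts any translated address. */
static int mw_model_accept(uint64_t addr)
{
	(void)addr;
	return 0;
}

/* Try one setup callback; a missing callback counts as "unsupported". */
static bool mw_try(int (*setup)(uint64_t), uint64_t addr)
{
	return setup != NULL && setup(addr) == 0;
}

/*
 * Portable flow: the local side attempts inbound setup, the address is then
 * handed to the peer (modelled here as a plain argument), and the peer
 * attempts its own setup. Assumption: either success is sufficient.
 */
static bool mw_portable_init(const struct mw_model_ops *local,
			     const struct mw_model_ops *peer,
			     uint64_t addr)
{
	bool local_ok = mw_try(local->set_inbound, addr);
	bool peer_ok = mw_try(peer->set_outbound, addr);

	return local_ok || peer_ok;
}
```

Both attempts are always made, so the same client code runs unchanged on hardware that supports only one of the two translation types.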
In accordance with this scenario, the NTB Memory Window API can be used as
follows:
Local device:
1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
be allocated for memory windows between local device and peer device
of port with specified index.
2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
shared memory region alignment and size. Then memory can be properly
allocated.
3) Allocate physically contiguous memory region in compliance with
restrictions retrieved in 2).
4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
the memory window with specified index for the defined peer device
(it may fail if local translated address setting is not supported)
5) Send translated base address (usually together with memory window
number) to the peer device using, for instance, scratchpad or message
registers.
Peer device:
1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
received from the other device (related to pidx) for the specified memory
window. It may fail if the retrieved address, for instance, exceeds the
maximum possible address or isn't properly aligned.
2) ntb_peer_mw_get_addr(widx) - retrieve the MMIO address to map the memory
window so as to have access to the shared memory.
It is also worth noting that ntb_mw_count(pidx) should return the
same value as ntb_peer_mw_count() on the peer with port index pidx.
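Step 3 of the local-device list asks for a buffer that complies with the restrictions retrieved in step 2. A minimal userspace sketch of that check follows, assuming the three restrictions are an address alignment, a size granularity, and a maximum size (the `addr_align`, `size_align`, and `size_max` outputs seen in the driver code). The `mw_align` structure and both helpers are hypothetical, and overflow of very large sizes is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modelled restrictions; this is not the kernel's own type. */
struct mw_align {
	uint64_t addr_align;	/* required alignment of the translated address */
	uint64_t size_align;	/* required granularity of the window size */
	uint64_t size_max;	/* largest window the hardware accepts */
};

/* Round size up to a multiple of size_align; return 0 if it cannot fit. */
static uint64_t mw_fit_size(const struct mw_align *a, uint64_t size)
{
	uint64_t rem;

	assert(a->size_align != 0);
	rem = size % a->size_align;
	if (rem != 0)
		size += a->size_align - rem;
	return size <= a->size_max ? size : 0;
}

/* Check that a candidate translated address honours addr_align. */
static bool mw_addr_ok(const struct mw_align *a, uint64_t addr)
{
	assert(a->addr_align != 0);
	return addr % a->addr_align == 0;
}
```

With a 4 KiB address alignment and a size granularity of 1, a 1000-byte request is accepted unchanged, while an address such as 0x2010 is rejected.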
### NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
The primary client for NTB is the Transport client, used in tandem with NTB
......
......@@ -9381,6 +9381,12 @@ F: include/linux/ntb.h
F: include/linux/ntb_transport.h
F: tools/testing/selftests/ntb/
NTB IDT DRIVER
M: Serge Semin <fancer.lancer@gmail.com>
L: linux-ntb@googlegroups.com
S: Supported
F: drivers/ntb/hw/idt/
NTB INTEL DRIVER
M: Jon Mason <jdmason@kudzu.us>
M: Dave Jiang <dave.jiang@intel.com>
......
......@@ -418,6 +418,8 @@ static int ntb_netdev_probe(struct device *client_dev)
if (!ndev)
return -ENOMEM;
SET_NETDEV_DEV(ndev, client_dev);
dev = netdev_priv(ndev);
dev->ndev = ndev;
dev->pdev = pdev;
......
source "drivers/ntb/hw/amd/Kconfig"
source "drivers/ntb/hw/idt/Kconfig"
source "drivers/ntb/hw/intel/Kconfig"
obj-$(CONFIG_NTB_AMD) += amd/
obj-$(CONFIG_NTB_IDT) += idt/
obj-$(CONFIG_NTB_INTEL) += intel/
......@@ -5,6 +5,7 @@
* GPL LICENSE SUMMARY
*
* Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
* Copyright (C) 2016 T-Platforms. All Rights Reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
......@@ -13,6 +14,7 @@
* BSD LICENSE
*
* Copyright (C) 2016 Advanced Micro Devices, Inc. All Rights Reserved.
* Copyright (C) 2016 T-Platforms. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
......@@ -79,40 +81,42 @@ static int ndev_mw_to_bar(struct amd_ntb_dev *ndev, int idx)
return 1 << idx;
}
static int amd_ntb_mw_count(struct ntb_dev *ntb)
static int amd_ntb_mw_count(struct ntb_dev *ntb, int pidx)
{
if (pidx != NTB_DEF_PEER_IDX)
return -EINVAL;
return ntb_ndev(ntb)->mw_count;
}
static int amd_ntb_mw_get_range(struct ntb_dev *ntb, int idx,
phys_addr_t *base,
resource_size_t *size,
resource_size_t *align,
resource_size_t *align_size)
static int amd_ntb_mw_get_align(struct ntb_dev *ntb, int pidx, int idx,
resource_size_t *addr_align,
resource_size_t *size_align,
resource_size_t *size_max)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
int bar;
if (pidx != NTB_DEF_PEER_IDX)
return -EINVAL;
bar = ndev_mw_to_bar(ndev, idx);
if (bar < 0)
return bar;
if (base)
*base = pci_resource_start(ndev->ntb.pdev, bar);
if (size)
*size = pci_resource_len(ndev->ntb.pdev, bar);
if (addr_align)
*addr_align = SZ_4K;
if (align)
*align = SZ_4K;
if (size_align)
*size_align = 1;
if (align_size)
*align_size = 1;
if (size_max)
*size_max = pci_resource_len(ndev->ntb.pdev, bar);
return 0;
}
static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int pidx, int idx,
dma_addr_t addr, resource_size_t size)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
......@@ -122,11 +126,14 @@ static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
u64 base_addr, limit, reg_val;
int bar;
if (pidx != NTB_DEF_PEER_IDX)
return -EINVAL;
bar = ndev_mw_to_bar(ndev, idx);
if (bar < 0)
return bar;
mw_size = pci_resource_len(ndev->ntb.pdev, bar);
mw_size = pci_resource_len(ntb->pdev, bar);
/* make sure the range fits in the usable mw size */
if (size > mw_size)
......@@ -135,7 +142,7 @@ static int amd_ntb_mw_set_trans(struct ntb_dev *ntb, int idx,
mmio = ndev->self_mmio;
peer_mmio = ndev->peer_mmio;
base_addr = pci_resource_start(ndev->ntb.pdev, bar);
base_addr = pci_resource_start(ntb->pdev, bar);
if (bar != 1) {
xlat_reg = AMD_BAR23XLAT_OFFSET + ((bar - 2) << 2);
......@@ -212,7 +219,7 @@ static int amd_link_is_up(struct amd_ntb_dev *ndev)
return 0;
}
static int amd_ntb_link_is_up(struct ntb_dev *ntb,
static u64 amd_ntb_link_is_up(struct ntb_dev *ntb,
enum ntb_speed *speed,
enum ntb_width *width)
{
......@@ -225,7 +232,7 @@ static int amd_ntb_link_is_up(struct ntb_dev *ntb,
if (width)
*width = NTB_LNK_STA_WIDTH(ndev->lnk_sta);
dev_dbg(ndev_dev(ndev), "link is up.\n");
dev_dbg(&ntb->pdev->dev, "link is up.\n");
ret = 1;
} else {
......@@ -234,7 +241,7 @@ static int amd_ntb_link_is_up(struct ntb_dev *ntb,
if (width)
*width = NTB_WIDTH_NONE;
dev_dbg(ndev_dev(ndev), "link is down.\n");
dev_dbg(&ntb->pdev->dev, "link is down.\n");
}
return ret;
......@@ -254,7 +261,7 @@ static int amd_ntb_link_enable(struct ntb_dev *ntb,
if (ndev->ntb.topo == NTB_TOPO_SEC)
return -EINVAL;
dev_dbg(ndev_dev(ndev), "Enabling Link.\n");
dev_dbg(&ntb->pdev->dev, "Enabling Link.\n");
ntb_ctl = readl(mmio + AMD_CNTL_OFFSET);
ntb_ctl |= (PMM_REG_CTL | SMM_REG_CTL);
......@@ -275,7 +282,7 @@ static int amd_ntb_link_disable(struct ntb_dev *ntb)
if (ndev->ntb.topo == NTB_TOPO_SEC)
return -EINVAL;
dev_dbg(ndev_dev(ndev), "Enabling Link.\n");
dev_dbg(&ntb->pdev->dev, "Enabling Link.\n");
ntb_ctl = readl(mmio + AMD_CNTL_OFFSET);
ntb_ctl &= ~(PMM_REG_CTL | SMM_REG_CTL);
......@@ -284,6 +291,31 @@ static int amd_ntb_link_disable(struct ntb_dev *ntb)
return 0;
}
static int amd_ntb_peer_mw_count(struct ntb_dev *ntb)
{
/* The same as for inbound MWs */
return ntb_ndev(ntb)->mw_count;
}
static int amd_ntb_peer_mw_get_addr(struct ntb_dev *ntb, int idx,
phys_addr_t *base, resource_size_t *size)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
int bar;
bar = ndev_mw_to_bar(ndev, idx);
if (bar < 0)
return bar;
if (base)
*base = pci_resource_start(ndev->ntb.pdev, bar);
if (size)
*size = pci_resource_len(ndev->ntb.pdev, bar);
return 0;
}
static u64 amd_ntb_db_valid_mask(struct ntb_dev *ntb)
{
return ntb_ndev(ntb)->db_valid_mask;
......@@ -400,30 +432,30 @@ static int amd_ntb_spad_write(struct ntb_dev *ntb,
return 0;
}
static u32 amd_ntb_peer_spad_read(struct ntb_dev *ntb, int idx)
static u32 amd_ntb_peer_spad_read(struct ntb_dev *ntb, int pidx, int sidx)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
void __iomem *mmio = ndev->self_mmio;
u32 offset;
if (idx < 0 || idx >= ndev->spad_count)
if (sidx < 0 || sidx >= ndev->spad_count)
return -EINVAL;
offset = ndev->peer_spad + (idx << 2);
offset = ndev->peer_spad + (sidx << 2);
return readl(mmio + AMD_SPAD_OFFSET + offset);
}
static int amd_ntb_peer_spad_write(struct ntb_dev *ntb,
int idx, u32 val)
static int amd_ntb_peer_spad_write(struct ntb_dev *ntb, int pidx,
int sidx, u32 val)
{
struct amd_ntb_dev *ndev = ntb_ndev(ntb);
void __iomem *mmio = ndev->self_mmio;
u32 offset;
if (idx < 0 || idx >= ndev->spad_count)
if (sidx < 0 || sidx >= ndev->spad_count)
return -EINVAL;
offset = ndev->peer_spad + (idx << 2);
offset = ndev->peer_spad + (sidx << 2);
writel(val, mmio + AMD_SPAD_OFFSET + offset);
return 0;
......@@ -431,8 +463,10 @@ static int amd_ntb_peer_spad_write(struct ntb_dev *ntb,
static const struct ntb_dev_ops amd_ntb_ops = {
.mw_count = amd_ntb_mw_count,
.mw_get_range = amd_ntb_mw_get_range,
.mw_get_align = amd_ntb_mw_get_align,
.mw_set_trans = amd_ntb_mw_set_trans,
.peer_mw_count = amd_ntb_peer_mw_count,
.peer_mw_get_addr = amd_ntb_peer_mw_get_addr,
.link_is_up = amd_ntb_link_is_up,
.link_enable = amd_ntb_link_enable,
.link_disable = amd_ntb_link_disable,
......@@ -466,18 +500,19 @@ static void amd_ack_smu(struct amd_ntb_dev *ndev, u32 bit)
static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
{
void __iomem *mmio = ndev->self_mmio;
struct device *dev = &ndev->ntb.pdev->dev;
u32 status;
status = readl(mmio + AMD_INTSTAT_OFFSET);
if (!(status & AMD_EVENT_INTMASK))
return;
dev_dbg(ndev_dev(ndev), "status = 0x%x and vec = %d\n", status, vec);
dev_dbg(dev, "status = 0x%x and vec = %d\n", status, vec);
status &= AMD_EVENT_INTMASK;
switch (status) {
case AMD_PEER_FLUSH_EVENT:
dev_info(ndev_dev(ndev), "Flush is done.\n");
dev_info(dev, "Flush is done.\n");
break;
case AMD_PEER_RESET_EVENT:
amd_ack_smu(ndev, AMD_PEER_RESET_EVENT);
......@@ -503,7 +538,7 @@ static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
status = readl(mmio + AMD_PMESTAT_OFFSET);
/* check if this is WAKEUP event */
if (status & 0x1)
dev_info(ndev_dev(ndev), "Wakeup is done.\n");
dev_info(dev, "Wakeup is done.\n");
amd_ack_smu(ndev, AMD_PEER_D0_EVENT);
......@@ -512,14 +547,14 @@ static void amd_handle_event(struct amd_ntb_dev *ndev, int vec)
AMD_LINK_HB_TIMEOUT);
break;
default:
dev_info(ndev_dev(ndev), "event status = 0x%x.\n", status);
dev_info(dev, "event status = 0x%x.\n", status);
break;
}
}
static irqreturn_t ndev_interrupt(struct amd_ntb_dev *ndev, int vec)
{
dev_dbg(ndev_dev(ndev), "vec %d\n", vec);
dev_dbg(&ndev->ntb.pdev->dev, "vec %d\n", vec);
if (vec > (AMD_DB_CNT - 1) || (ndev->msix_vec_count == 1))
amd_handle_event(ndev, vec);
......@@ -541,7 +576,7 @@ static irqreturn_t ndev_irq_isr(int irq, void *dev)
{
struct amd_ntb_dev *ndev = dev;
return ndev_interrupt(ndev, irq - ndev_pdev(ndev)->irq);
return ndev_interrupt(ndev, irq - ndev->ntb.pdev->irq);
}
static int ndev_init_isr(struct amd_ntb_dev *ndev,
......@@ -550,7 +585,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
struct pci_dev *pdev;
int rc, i, msix_count, node;
pdev = ndev_pdev(ndev);
pdev = ndev->ntb.pdev;
node = dev_to_node(&pdev->dev);
......@@ -592,7 +627,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
goto err_msix_request;
}
dev_dbg(ndev_dev(ndev), "Using msix interrupts\n");
dev_dbg(&pdev->dev, "Using msix interrupts\n");
ndev->db_count = msix_min;
ndev->msix_vec_count = msix_max;
return 0;
......@@ -619,7 +654,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
if (rc)
goto err_msi_request;
dev_dbg(ndev_dev(ndev), "Using msi interrupts\n");
dev_dbg(&pdev->dev, "Using msi interrupts\n");
ndev->db_count = 1;
ndev->msix_vec_count = 1;
return 0;
......@@ -636,7 +671,7 @@ static int ndev_init_isr(struct amd_ntb_dev *ndev,
if (rc)
goto err_intx_request;
dev_dbg(ndev_dev(ndev), "Using intx interrupts\n");
dev_dbg(&pdev->dev, "Using intx interrupts\n");
ndev->db_count = 1;
ndev->msix_vec_count = 1;
return 0;
......@@ -651,7 +686,7 @@ static void ndev_deinit_isr(struct amd_ntb_dev *ndev)
void __iomem *mmio = ndev->self_mmio;
int i;
pdev = ndev_pdev(ndev);
pdev = ndev->ntb.pdev;
/* Mask all doorbell interrupts */
ndev->db_mask = ndev->db_valid_mask;
......@@ -777,7 +812,8 @@ static void ndev_init_debugfs(struct amd_ntb_dev *ndev)
ndev->debugfs_info = NULL;
} else {
ndev->debugfs_dir =
debugfs_create_dir(ndev_name(ndev), debugfs_dir);
debugfs_create_dir(pci_name(ndev->ntb.pdev),
debugfs_dir);
if (!ndev->debugfs_dir)
ndev->debugfs_info = NULL;
else
......@@ -812,7 +848,7 @@ static int amd_poll_link(struct amd_ntb_dev *ndev)
reg = readl(mmio + AMD_SIDEINFO_OFFSET);
reg &= NTB_LIN_STA_ACTIVE_BIT;
dev_dbg(ndev_dev(ndev), "%s: reg_val = 0x%x.\n", __func__, reg);
dev_dbg(&ndev->ntb.pdev->dev, "%s: reg_val = 0x%x.\n", __func__, reg);
if (reg == ndev->cntl_sta)
return 0;
......@@ -894,7 +930,8 @@ static int amd_init_ntb(struct amd_ntb_dev *ndev)
break;
default:
dev_err(ndev_dev(ndev), "AMD NTB does not support B2B mode.\n");
dev_err(&ndev->ntb.pdev->dev,
"AMD NTB does not support B2B mode.\n");
return -EINVAL;
}
......@@ -923,10 +960,10 @@ static int amd_init_dev(struct amd_ntb_dev *ndev)
struct pci_dev *pdev;
int rc = 0;
pdev = ndev_pdev(ndev);
pdev = ndev->ntb.pdev;
ndev->ntb.topo = amd_get_topo(ndev);
dev_dbg(ndev_dev(ndev), "AMD NTB topo is %s\n",
dev_dbg(&pdev->dev, "AMD NTB topo is %s\n",
ntb_topo_string(ndev->ntb.topo));
rc = amd_init_ntb(ndev);
......@@ -935,7 +972,7 @@ static int amd_init_dev(struct amd_ntb_dev *ndev)
rc = amd_init_isr(ndev);
if (rc) {
dev_err(ndev_dev(ndev), "fail to init isr.\n");
dev_err(&pdev->dev, "fail to init isr.\n");
return rc;
}
......@@ -973,7 +1010,7 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
if (rc)
goto err_dma_mask;
dev_warn(ndev_dev(ndev), "Cannot DMA highmem\n");
dev_warn(&pdev->dev, "Cannot DMA highmem\n");
}
rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
......@@ -981,7 +1018,7 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
if (rc)
goto err_dma_mask;
dev_warn(ndev_dev(ndev), "Cannot DMA consistent highmem\n");
dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
}
ndev->self_mmio = pci_iomap(pdev, 0, 0);
......@@ -1004,7 +1041,7 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
static void amd_ntb_deinit_pci(struct amd_ntb_dev *ndev)
{
struct pci_dev *pdev = ndev_pdev(ndev);
struct pci_dev *pdev = ndev->ntb.pdev;
pci_iounmap(pdev, ndev->self_mmio);
......
......@@ -211,9 +211,6 @@ struct amd_ntb_dev {
struct dentry *debugfs_info;
};
#define ndev_pdev(ndev) ((ndev)->ntb.pdev)
#define ndev_name(ndev) pci_name(ndev_pdev(ndev))
#define ndev_dev(ndev) (&ndev_pdev(ndev)->dev)
#define ntb_ndev(__ntb) container_of(__ntb, struct amd_ntb_dev, ntb)
#define hb_ndev(__work) container_of(__work, struct amd_ntb_dev, hb_timer.work)
......
config NTB_IDT
tristate "IDT PCIe-switch Non-Transparent Bridge support"
depends on PCI
help
This driver supports NTB of capable IDT PCIe-switches.
Some of the pre-initializations must be made before IDT PCIe-switch
exposes its NT-functions correctly. It should be done by either proper
initialisation of EEPROM connected to master smbus of the switch or
by BIOS using slave-SMBus interface changing corresponding registers
value. Evidently it must be done before PCI bus enumeration is
finished in Linux kernel.
First of all partitions must be activated and properly assigned to all
the ports with NT-functions intended to be activated (see SWPARTxCTL
and SWPORTxCTL registers). Then all NT-function BARs must be enabled
with chosen valid aperture. For memory windows related BARs the
aperture settings shall determine the maximum size of memory windows
accepted by a BAR. Note that BAR0 must map PCI configuration space
registers.
It's worth noting that since a part of this driver relies on the
BAR settings of peer NT-functions, the BAR setups can't be done over
kernel PCI fixups. That's why the alternative pre-initialization
techniques like BIOS using SMBus interface or EEPROM should be
utilized. Additionally if one needs to have temperature sensor
information printed to system log, the corresponding registers must
be initialized within BIOS/EEPROM as well.
If unsure, say N.
obj-$(CONFIG_NTB_IDT) += ntb_hw_idt.o
This diff is collapsed.
This diff is collapsed.
This diff is collapsed.
......@@ -382,9 +382,6 @@ struct intel_ntb_dev {
struct dentry *debugfs_info;
};
#define ndev_pdev(ndev) ((ndev)->ntb.pdev)
#define ndev_name(ndev) pci_name(ndev_pdev(ndev))
#define ndev_dev(ndev) (&ndev_pdev(ndev)->dev)
#define ntb_ndev(__ntb) container_of(__ntb, struct intel_ntb_dev, ntb)
#define hb_ndev(__work) container_of(__work, struct intel_ntb_dev, \
hb_timer.work)
......
......@@ -5,6 +5,7 @@
* GPL LICENSE SUMMARY
*
* Copyright (C) 2015 EMC Corporation. All Rights Reserved.
* Copyright (C) 2016 T-Platforms. All Rights Reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of version 2 of the GNU General Public License as
......@@ -18,6 +19,7 @@
* BSD LICENSE
*
* Copyright (C) 2015 EMC Corporation. All Rights Reserved.
* Copyright (C) 2016 T-Platforms. All Rights Reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
......@@ -191,6 +193,73 @@ void ntb_db_event(struct ntb_dev *ntb, int vector)
}
EXPORT_SYMBOL(ntb_db_event);
void ntb_msg_event(struct ntb_dev *ntb)
{
unsigned long irqflags;
spin_lock_irqsave(&ntb->ctx_lock, irqflags);
{
if (ntb->ctx_ops && ntb->ctx_ops->msg_event)
ntb->ctx_ops->msg_event(ntb->ctx);
}
spin_unlock_irqrestore(&ntb->ctx_lock, irqflags);
}
EXPORT_SYMBOL(ntb_msg_event);
int ntb_default_port_number(struct ntb_dev *ntb)
{
switch (ntb->topo) {
case NTB_TOPO_PRI:
case NTB_TOPO_B2B_USD:
return NTB_PORT_PRI_USD;
case NTB_TOPO_SEC:
case NTB_TOPO_B2B_DSD:
return NTB_PORT_SEC_DSD;
default:
break;
}
return -EINVAL;
}
EXPORT_SYMBOL(ntb_default_port_number);
int ntb_default_peer_port_count(struct ntb_dev *ntb)
{
return NTB_DEF_PEER_CNT;
}
EXPORT_SYMBOL(ntb_default_peer_port_count);
int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
{
if (pidx != NTB_DEF_PEER_IDX)
return -EINVAL;
switch (ntb->topo) {
case NTB_TOPO_PRI:
case NTB_TOPO_B2B_USD:
return NTB_PORT_SEC_DSD;
case NTB_TOPO_SEC:
case NTB_TOPO_B2B_DSD:
return NTB_PORT_PRI_USD;
default:
break;
}
return -EINVAL;
}
EXPORT_SYMBOL(ntb_default_peer_port_number);
int ntb_default_peer_port_idx(struct ntb_dev *ntb, int port)
{
int peer_port = ntb_default_peer_port_number(ntb, NTB_DEF_PEER_IDX);
if (peer_port == -EINVAL || port != peer_port)
return -EINVAL;
return 0;
}
EXPORT_SYMBOL(ntb_default_peer_port_idx);
static int ntb_probe(struct device *dev)
{
struct ntb_dev *ntb;
......
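The default port helpers added above can be modelled in userspace to make the mapping explicit: the primary/upstream side and the secondary/downstream side each get one port number, and the single peer is always the opposite one. This is only a sketch; the `model_*` names and the numeric values chosen for the enums are assumptions of the example, not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>

/* Userspace stand-ins; names and numeric values are assumptions. */
enum model_topo { TOPO_NONE = -1, TOPO_PRI, TOPO_SEC, TOPO_B2B_USD, TOPO_B2B_DSD };
enum model_port { PORT_PRI_USD = 0, PORT_SEC_DSD = 1 };
#define MODEL_DEF_PEER_IDX 0

/* Local port number derived from the topology. */
static int model_port_number(enum model_topo topo)
{
	switch (topo) {
	case TOPO_PRI:
	case TOPO_B2B_USD:
		return PORT_PRI_USD;
	case TOPO_SEC:
	case TOPO_B2B_DSD:
		return PORT_SEC_DSD;
	default:
		return -EINVAL;
	}
}

/* Peer port: the opposite of the local port; only one peer index exists. */
static int model_peer_port_number(enum model_topo topo, int pidx)
{
	int local;

	if (pidx != MODEL_DEF_PEER_IDX)
		return -EINVAL;
	local = model_port_number(topo);
	if (local < 0)
		return local;
	return local == PORT_PRI_USD ? PORT_SEC_DSD : PORT_PRI_USD;
}

/* Map a port number back to its peer index, mirroring the logic in the diff. */
static int model_peer_port_idx(enum model_topo topo, int port)
{
	int peer = model_peer_port_number(topo, MODEL_DEF_PEER_IDX);

	if (peer < 0 || port != peer)
		return -EINVAL;
	return MODEL_DEF_PEER_IDX;
}
```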
......@@ -95,6 +95,9 @@ MODULE_PARM_DESC(use_dma, "Use DMA engine to perform large data copy");
static struct dentry *nt_debugfs_dir;
/* Only two-ports NTB devices are supported */
#define PIDX NTB_DEF_PEER_IDX
struct ntb_queue_entry {
/* ntb_queue list reference */
struct list_head entry;
......@@ -670,7 +673,7 @@ static void ntb_free_mw(struct ntb_transport_ctx *nt, int num_mw)
if (!mw->virt_addr)
return;
ntb_mw_clear_trans(nt->ndev, num_mw);
ntb_mw_clear_trans(nt->ndev, PIDX, num_mw);
dma_free_coherent(&pdev->dev, mw->buff_size,
mw->virt_addr, mw->dma_addr);
mw->xlat_size = 0;
......@@ -727,7 +730,8 @@ static int ntb_set_mw(struct ntb_transport_ctx *nt, int num_mw,
}
/* Notify HW the memory location of the receive buffer */
rc = ntb_mw_set_trans(nt->ndev, num_mw, mw->dma_addr, mw->xlat_size);
rc = ntb_mw_set_trans(nt->ndev, PIDX, num_mw, mw->dma_addr,
mw->xlat_size);
if (rc) {
dev_err(&pdev->dev, "Unable to set mw%d translation", num_mw);
ntb_free_mw(nt, num_mw);
......@@ -858,17 +862,17 @@ static void ntb_transport_link_work(struct work_struct *work)
size = max_mw_size;
spad = MW0_SZ_HIGH + (i * 2);
ntb_peer_spad_write(ndev, spad, upper_32_bits(size));
ntb_peer_spad_write(ndev, PIDX, spad, upper_32_bits(size));
spad = MW0_SZ_LOW + (i * 2);
ntb_peer_spad_write(ndev, spad, lower_32_bits(size));
ntb_peer_spad_write(ndev, PIDX, spad, lower_32_bits(size));
}
ntb_peer_spad_write(ndev, NUM_MWS, nt->mw_count);
ntb_peer_spad_write(ndev, PIDX, NUM_MWS, nt->mw_count);
ntb_peer_spad_write(ndev, NUM_QPS, nt->qp_count);
ntb_peer_spad_write(ndev, PIDX, NUM_QPS, nt->qp_count);
ntb_peer_spad_write(ndev, VERSION, NTB_TRANSPORT_VERSION);
ntb_peer_spad_write(ndev, PIDX, VERSION, NTB_TRANSPORT_VERSION);
/* Query the remote side for its info */
val = ntb_spad_read(ndev, VERSION);
......@@ -944,7 +948,7 @@ static void ntb_qp_link_work(struct work_struct *work)
val = ntb_spad_read(nt->ndev, QP_LINKS);
ntb_peer_spad_write(nt->ndev, QP_LINKS, val | BIT(qp->qp_num));
ntb_peer_spad_write(nt->ndev, PIDX, QP_LINKS, val | BIT(qp->qp_num));
/* query remote spad for qp ready bits */
dev_dbg_ratelimited(&pdev->dev, "Remote QP link status = %x\n", val);
......@@ -1055,7 +1059,12 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
int node;
int rc, i;
mw_count = ntb_mw_count(ndev);
mw_count = ntb_mw_count(ndev, PIDX);
if (!ndev->ops->mw_set_trans) {
dev_err(&ndev->dev, "Inbound MW based NTB API is required\n");
return -EINVAL;
}
if (ntb_db_is_unsafe(ndev))
dev_dbg(&ndev->dev,
......@@ -1064,6 +1073,9 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
dev_dbg(&ndev->dev,
"scratchpad is unsafe, proceed anyway...\n");
if (ntb_peer_port_count(ndev) != NTB_DEF_PEER_CNT)
dev_warn(&ndev->dev, "Multi-port NTB devices unsupported\n");
node = dev_to_node(&ndev->dev);
nt = kzalloc_node(sizeof(*nt), GFP_KERNEL, node);
......@@ -1094,8 +1106,13 @@ static int ntb_transport_probe(struct ntb_client *self, struct ntb_dev *ndev)
for (i = 0; i < mw_count; i++) {
mw = &nt->mw_vec[i];
rc = ntb_mw_get_range(ndev, i, &mw->phys_addr, &mw->phys_size,
&mw->xlat_align, &mw->xlat_align_size);
rc = ntb_mw_get_align(ndev, PIDX, i, &mw->xlat_align,
&mw->xlat_align_size, NULL);
if (rc)
goto err1;
rc = ntb_peer_mw_get_addr(ndev, i, &mw->phys_addr,
&mw->phys_size);
if (rc)
goto err1;
......@@ -2091,8 +2108,7 @@ void ntb_transport_link_down(struct ntb_transport_qp *qp)
val = ntb_spad_read(qp->ndev, QP_LINKS);
ntb_peer_spad_write(qp->ndev, QP_LINKS,
val & ~BIT(qp->qp_num));
ntb_peer_spad_write(qp->ndev, PIDX, QP_LINKS, val & ~BIT(qp->qp_num));
if (qp->link_is_up)
ntb_send_link_down(qp);
......
......@@ -76,6 +76,7 @@
#define DMA_RETRIES 20
#define SZ_4G (1ULL << 32)
#define MAX_SEG_ORDER 20 /* no larger than 1M for kmalloc buffer */
#define PIDX NTB_DEF_PEER_IDX
MODULE_LICENSE(DRIVER_LICENSE);
MODULE_VERSION(DRIVER_VERSION);
......@@ -100,6 +101,10 @@ static bool use_dma; /* default to 0 */
module_param(use_dma, bool, 0644);
MODULE_PARM_DESC(use_dma, "Using DMA engine to measure performance");
static bool on_node = true; /* default to 1 */
module_param(on_node, bool, 0644);
MODULE_PARM_DESC(on_node, "Run threads only on NTB device node (default: true)");
struct perf_mw {
phys_addr_t phys_addr;
resource_size_t phys_size;
......@@ -135,9 +140,6 @@ struct perf_ctx {
bool link_is_up;
struct delayed_work link_work;
wait_queue_head_t link_wq;
struct dentry *debugfs_node_dir;
struct dentry *debugfs_run;
struct dentry *debugfs_threads;
u8 perf_threads;
/* mutex ensures only one set of threads run at once */
struct mutex run_mutex;
......@@ -344,6 +346,10 @@ static int perf_move_data(struct pthr_ctx *pctx, char __iomem *dst, char *src,
static bool perf_dma_filter_fn(struct dma_chan *chan, void *node)
{
/* Is the channel required to be on the same node as the device? */
if (!on_node)
return true;
return dev_to_node(&chan->dev->device) == (int)(unsigned long)node;
}
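The effect of the new `on_node` option can be summarised with a small userspace model: when the option is off, any DMA channel is accepted and no NUMA node is preferred; when it is on, only channels on the device's node pass the filter. The `model_*` helpers and `MODEL_NO_NODE` are hypothetical stand-ins for the kernel code and `NUMA_NO_NODE`.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/* Accept a DMA channel if affinity is not enforced or the nodes match. */
static bool model_chan_ok(bool on_node, int chan_node, int dev_node)
{
	if (!on_node)
		return true;
	return chan_node == dev_node;
}

/* Node hint used for allocations and thread placement. */
static int model_pick_node(bool on_node, int dev_node)
{
	return on_node ? dev_node : MODEL_NO_NODE;
}
```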
......@@ -361,7 +367,7 @@ static int ntb_perf_thread(void *data)
pr_debug("kthread %s starting...\n", current->comm);
node = dev_to_node(&pdev->dev);
node = on_node ? dev_to_node(&pdev->dev) : NUMA_NO_NODE;
if (use_dma && !pctx->dma_chan) {
dma_cap_mask_t dma_mask;
......@@ -454,7 +460,7 @@ static void perf_free_mw(struct perf_ctx *perf)
if (!mw->virt_addr)
return;
ntb_mw_clear_trans(perf->ntb, 0);
ntb_mw_clear_trans(perf->ntb, PIDX, 0);
dma_free_coherent(&pdev->dev, mw->buf_size,
mw->virt_addr, mw->dma_addr);
mw->xlat_size = 0;
......@@ -490,7 +496,7 @@ static int perf_set_mw(struct perf_ctx *perf, resource_size_t size)
mw->buf_size = 0;
}
rc = ntb_mw_set_trans(perf->ntb, 0, mw->dma_addr, mw->xlat_size);
rc = ntb_mw_set_trans(perf->ntb, PIDX, 0, mw->dma_addr, mw->xlat_size);
if (rc) {
dev_err(&perf->ntb->dev, "Unable to set mw0 translation\n");
perf_free_mw(perf);
......@@ -517,9 +523,9 @@ static void perf_link_work(struct work_struct *work)
if (max_mw_size && size > max_mw_size)
size = max_mw_size;
ntb_peer_spad_write(ndev, MW_SZ_HIGH, upper_32_bits(size));
ntb_peer_spad_write(ndev, MW_SZ_LOW, lower_32_bits(size));
ntb_peer_spad_write(ndev, VERSION, PERF_VERSION);
ntb_peer_spad_write(ndev, PIDX, MW_SZ_HIGH, upper_32_bits(size));
ntb_peer_spad_write(ndev, PIDX, MW_SZ_LOW, lower_32_bits(size));
ntb_peer_spad_write(ndev, PIDX, VERSION, PERF_VERSION);
/* now read what peer wrote */
val = ntb_spad_read(ndev, VERSION);
......@@ -561,8 +567,12 @@ static int perf_setup_mw(struct ntb_dev *ntb, struct perf_ctx *perf)
mw = &perf->mw;
rc = ntb_mw_get_range(ntb, 0, &mw->phys_addr, &mw->phys_size,
&mw->xlat_align, &mw->xlat_align_size);
rc = ntb_mw_get_align(ntb, PIDX, 0, &mw->xlat_align,
&mw->xlat_align_size, NULL);
if (rc)
return rc;
rc = ntb_peer_mw_get_addr(ntb, 0, &mw->phys_addr, &mw->phys_size);
if (rc)
return rc;
......@@ -677,7 +687,8 @@ static ssize_t debugfs_run_write(struct file *filp, const char __user *ubuf,
pr_info("Fix run_order to %u\n", run_order);
}
node = dev_to_node(&perf->ntb->pdev->dev);
node = on_node ? dev_to_node(&perf->ntb->pdev->dev)
: NUMA_NO_NODE;
atomic_set(&perf->tdone, 0);
/* launch kernel thread */
......@@ -723,34 +734,71 @@ static const struct file_operations ntb_perf_debugfs_run = {
static int perf_debugfs_setup(struct perf_ctx *perf)
{
struct pci_dev *pdev = perf->ntb->pdev;
struct dentry *debugfs_node_dir;
struct dentry *debugfs_run;
struct dentry *debugfs_threads;
struct dentry *debugfs_seg_order;
struct dentry *debugfs_run_order;
struct dentry *debugfs_use_dma;
struct dentry *debugfs_on_node;
if (!debugfs_initialized())
return -ENODEV;
/* Assumption: only one NTB device in the system */
if (!perf_debugfs_dir) {
perf_debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
if (!perf_debugfs_dir)
return -ENODEV;
}
perf->debugfs_node_dir = debugfs_create_dir(pci_name(pdev),
perf_debugfs_dir);
if (!perf->debugfs_node_dir)
return -ENODEV;
debugfs_node_dir = debugfs_create_dir(pci_name(pdev),
perf_debugfs_dir);
if (!debugfs_node_dir)
goto err;
perf->debugfs_run = debugfs_create_file("run", S_IRUSR | S_IWUSR,
perf->debugfs_node_dir, perf,
&ntb_perf_debugfs_run);
if (!perf->debugfs_run)
return -ENODEV;
debugfs_run = debugfs_create_file("run", S_IRUSR | S_IWUSR,
debugfs_node_dir, perf,
&ntb_perf_debugfs_run);
if (!debugfs_run)
goto err;
perf->debugfs_threads = debugfs_create_u8("threads", S_IRUSR | S_IWUSR,
perf->debugfs_node_dir,
&perf->perf_threads);
if (!perf->debugfs_threads)
return -ENODEV;
debugfs_threads = debugfs_create_u8("threads", S_IRUSR | S_IWUSR,
debugfs_node_dir,
&perf->perf_threads);
if (!debugfs_threads)
goto err;
debugfs_seg_order = debugfs_create_u32("seg_order", 0600,
debugfs_node_dir,
&seg_order);
if (!debugfs_seg_order)
goto err;
debugfs_run_order = debugfs_create_u32("run_order", 0600,
debugfs_node_dir,
&run_order);
if (!debugfs_run_order)
goto err;
debugfs_use_dma = debugfs_create_bool("use_dma", 0600,
debugfs_node_dir,
&use_dma);
if (!debugfs_use_dma)
goto err;
debugfs_on_node = debugfs_create_bool("on_node", 0600,
debugfs_node_dir,
&on_node);
if (!debugfs_on_node)
goto err;
return 0;
err:
debugfs_remove_recursive(perf_debugfs_dir);
perf_debugfs_dir = NULL;
return -ENODEV;
}
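The reworked `perf_debugfs_setup()` funnels every failure to a single `err` label that tears down everything created so far. The userspace sketch below shows the same single-exit cleanup pattern with heap allocations standing in for debugfs entries. All `model_*` names are hypothetical, and `fail_at` exists only to inject a failure for demonstration.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_ENTRIES 4

struct model_dir {
	void *entry[MODEL_ENTRIES];
};

/* Hypothetical creator; fails (returns NULL) when idx == fail_at. */
static void *model_create(int idx, int fail_at)
{
	return idx == fail_at ? NULL : malloc(1);
}

/* Remove everything created so far, in the spirit of a recursive remove. */
static void model_remove_all(struct model_dir *dir)
{
	int i;

	for (i = 0; i < MODEL_ENTRIES; i++) {
		free(dir->entry[i]);	/* free(NULL) is a no-op */
		dir->entry[i] = NULL;
	}
}

/* Returns 0 on success, -1 after undoing any partial setup. */
static int model_setup(struct model_dir *dir, int fail_at)
{
	int i;

	for (i = 0; i < MODEL_ENTRIES; i++) {
		dir->entry[i] = model_create(i, fail_at);
		if (!dir->entry[i])
			goto err;
	}
	return 0;

err:
	model_remove_all(dir);
	return -1;
}
```

Keeping the teardown in one place means a newly added entry needs only one extra `goto err` check rather than its own cleanup path.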
static int perf_probe(struct ntb_client *client, struct ntb_dev *ntb)
......@@ -766,8 +814,15 @@ static int perf_probe(struct ntb_client *client, struct ntb_dev *ntb)
return -EIO;
}
node = dev_to_node(&pdev->dev);
if (!ntb->ops->mw_set_trans) {
dev_err(&ntb->dev, "Need inbound MW based NTB API\n");
return -EINVAL;
}
if (ntb_peer_port_count(ntb) != NTB_DEF_PEER_CNT)
dev_warn(&ntb->dev, "Multi-port NTB devices unsupported\n");
node = on_node ? dev_to_node(&pdev->dev) : NUMA_NO_NODE;
perf = kzalloc_node(sizeof(*perf), GFP_KERNEL, node);
if (!perf) {
rc = -ENOMEM;
......
......@@ -90,6 +90,9 @@ static unsigned long db_init = 0x7;
module_param(db_init, ulong, 0644);
MODULE_PARM_DESC(db_init, "Initial doorbell bits to ring on the peer");
/* Only two-ports NTB devices are supported */
#define PIDX NTB_DEF_PEER_IDX
struct pp_ctx {
struct ntb_dev *ntb;
u64 db_bits;
......@@ -135,7 +138,7 @@ static void pp_ping(unsigned long ctx)
"Ping bits %#llx read %#x write %#x\n",
db_bits, spad_rd, spad_wr);
ntb_peer_spad_write(pp->ntb, 0, spad_wr);
ntb_peer_spad_write(pp->ntb, PIDX, 0, spad_wr);
ntb_peer_db_set(pp->ntb, db_bits);
ntb_db_clear_mask(pp->ntb, db_mask);
......@@ -222,6 +225,12 @@ static int pp_probe(struct ntb_client *client,
}
}
if (ntb_spad_count(ntb) < 1) {
dev_dbg(&ntb->dev, "not enough scratchpads\n");
rc = -EINVAL;
goto err_pp;
}
if (ntb_spad_is_unsafe(ntb)) {
dev_dbg(&ntb->dev, "scratchpad is unsafe\n");
if (!unsafe) {
......@@ -230,6 +239,9 @@ static int pp_probe(struct ntb_client *client,
}
}
if (ntb_peer_port_count(ntb) != NTB_DEF_PEER_CNT)
dev_warn(&ntb->dev, "multi-port NTB is unsupported\n");
pp = kmalloc(sizeof(*pp), GFP_KERNEL);
if (!pp) {
rc = -ENOMEM;
......
This diff is collapsed.
This diff is collapsed.
......@@ -18,6 +18,7 @@ LIST_DEVS=FALSE
DEBUGFS=${DEBUGFS-/sys/kernel/debug}
DB_BITMASK=0x7FFF
PERF_RUN_ORDER=32
MAX_MW_SIZE=0
RUN_DMA_TESTS=
......@@ -38,6 +39,7 @@ function show_help()
echo "be highly recommended."
echo
echo "Options:"
echo " -b BITMASK doorbell clear bitmask for ntb_tool"
echo " -C don't cleanup ntb modules on exit"
echo " -d run dma tests"
echo " -h show this help message"
......@@ -52,8 +54,9 @@ function show_help()
function parse_args()
{
OPTIND=0
while getopts "Cdhlm:r:p:w:" opt; do
while getopts "b:Cdhlm:r:p:w:" opt; do
case "$opt" in
b) DB_BITMASK=${OPTARG} ;;
C) DONT_CLEANUP=1 ;;
d) RUN_DMA_TESTS=1 ;;
h) show_help; exit 0 ;;
......@@ -85,6 +88,10 @@ set -e
function _modprobe()
{
modprobe "$@"
if [[ "$REMOTE_HOST" != "" ]]; then
ssh "$REMOTE_HOST" modprobe "$@"
fi
}
function split_remote()
......@@ -154,7 +161,7 @@ function doorbell_test()
echo "Running db tests on: $(basename $LOC) / $(basename $REM)"
write_file "c 0xFFFFFFFF" "$REM/db"
write_file "c $DB_BITMASK" "$REM/db"
for ((i=1; i <= 8; i++)); do
let DB=$(read_file "$REM/db") || true
......