1. 22 Mar 2016, 2 commits
  2. 02 Mar 2016, 3 commits
  3. 01 Mar 2016, 1 commit
  4. 25 Feb 2016, 2 commits
  5. 22 Jan 2016, 1 commit
  6. 18 Jan 2016, 1 commit
    • net/mlx5_core: Fix trimming down IRQ number · 0b6e26ce
      Authored by Doron Tsur
      With several ConnectX-4 cards installed on a server, one may receive
      an irqn > 255 from the kernel API, which we mistakenly trim to 8 bits.
      
      This causes EQ creation failure with the following stack trace:
      [<ffffffff812a11f4>] dump_stack+0x48/0x64
      [<ffffffff810ace21>] __setup_irq+0x3a1/0x4f0
      [<ffffffff810ad7e0>] request_threaded_irq+0x120/0x180
      [<ffffffffa0923660>] ? mlx5_eq_int+0x450/0x450 [mlx5_core]
      [<ffffffffa0922f64>] mlx5_create_map_eq+0x1e4/0x2b0 [mlx5_core]
      [<ffffffffa091de01>] alloc_comp_eqs+0xb1/0x180 [mlx5_core]
      [<ffffffffa091ea99>] mlx5_dev_init+0x5e9/0x6e0 [mlx5_core]
      [<ffffffffa091ec29>] init_one+0x99/0x1c0 [mlx5_core]
      [<ffffffff812e2afc>] local_pci_probe+0x4c/0xa0
      
      Fix this by changing the irqn type from u8 to unsigned int, to
      support values > 255.
      
      Fixes: 61d0e73e ('net/mlx5_core: Use the the real irqn in eq->irqn')
      Reported-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Doron Tsur <doront@mellanox.com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
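      As a rough sketch of the type change described above (a hedged
      illustration; the struct layout and prototype shapes are inferred
      from the commit text, not copied from the tree):

          struct mlx5_eq {
                  /* ... */
                  unsigned int irqn;  /* was u8, which truncated irqn > 255 */
          };

          /* Out-parameters carrying the IRQ number are widened as well: */
          int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector,
                              int *eqn, unsigned int *irqn); /* was: u8 *irqn */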
  7. 24 Dec 2015, 2 commits
  8. 12 Dec 2015, 1 commit
  9. 04 Dec 2015, 2 commits
    • net/mlx5: Introducing E-Switch and l2 table · 073bb189
      Authored by Saeed Mahameed
      The E-Switch is the software entity that represents and manages
      ConnectX-4 inter-HCA Ethernet L2 switching.
      
      The e-switch has its own virtual ports; each vNIC/VF is connected
      to the device through a vport of an e-switch.
      
      Each e-switch is managed by one vNIC identified by
      HCA_CAP.vport_group_manager (usually it is the PF/vport[0]),
      and its main responsibility is to forward each packet to the
      right vport.
      
      The e-switch needs to manage its own L2 table and FDB table.
      
      The L2 table is a flow table managed by FW; it is needed in
      multi-host (multi-PF) configurations for inter-HCA switching
      between PFs.
      
      The FDB table is a flow table managed entirely by the e-switch
      driver; its main responsibility is to switch packets between the
      internal vports and the uplink vport that belong to the same
      e-switch.
      
      This patch introduces only e-switch L2 table management; FDB
      management will come later, when Ethernet SRIOV/VFs are enabled.
      
      Preparation for Ethernet SRIOV and L2 table management.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
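      To illustrate the forwarding role described above, here is a purely
      conceptual sketch (the names are hypothetical, not the mlx5 driver
      API): resolve the destination vport for a destination MAC via the
      L2 table, falling back to the uplink vport for unknown MACs.

          #include <linux/etherdevice.h>

          struct l2_entry {
                  u8  mac[ETH_ALEN];
                  u16 vport;              /* destination vport for this MAC */
          };

          static u16 esw_l2_lookup(const struct l2_entry *tbl, int n,
                                   const u8 *dmac, u16 uplink_vport)
          {
                  int i;

                  for (i = 0; i < n; i++)
                          if (ether_addr_equal(tbl[i].mac, dmac))
                                  return tbl[i].vport;

                  return uplink_vport;    /* no match: forward to the uplink */
          }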
    • net/mlx5_core: Add base sriov support · fc50db98
      Authored by Eli Cohen
      This patch adds SRIOV base support for mlx5 supported devices. The same
      driver is used for both PFs and VFs; VFs are identified by the driver
      through the flag MLX5_PCI_DEV_IS_VF added to the pci table entries.
      Virtual functions are created as usual by writing a value to the
      sriov_numvfs sysfs file of the PF device. Upon instantiation, the VFs
      will all be probed by the driver on the hypervisor. One can gracefully
      unbind them through /sys/bus/pci/drivers/mlx5_core/unbind.
      
      mlx5_wait_for_vf_pages() was added to ensure that when a VF dies
      without executing proper teardown, the hypervisor driver waits until
      all of the pages that were allocated on the hypervisor to maintain
      its operation are returned.
      
      In order for the VF to be operational, the PF needs to call enable_hca
      for it. This can be done before the VFs are created through a call to
      pci_enable_sriov.
      
      If there are VFs assigned to VMs when the PF driver is unloaded, all
      the VFs will experience a system error, while the PF driver unloads
      cleanly; in this case pci_disable_sriov is not called and the devices
      will still show up in lspci. Once the PF driver is reloaded, it will
      sync the data structures which maintain state on its VFs.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
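      A hedged sketch of the flow this enables: the standard PCI
      .sriov_configure callback is what makes "echo N > sriov_numvfs"
      work. The callback shape below is the generic PCI driver interface;
      the enable_hca step is left as a comment since the commit text does
      not show its in-driver signature.

          #include <linux/pci.h>

          static int mlx5_core_sriov_configure(struct pci_dev *pdev, int num_vfs)
          {
                  int err;

                  if (num_vfs == 0) {
                          pci_disable_sriov(pdev);
                          return 0;
                  }

                  /* The PF calls enable_hca for each VF so it becomes
                   * operational; per the commit text this may be done
                   * before pci_enable_sriov(). */

                  err = pci_enable_sriov(pdev, num_vfs); /* VFs get probed */
                  if (err)
                          return err;

                  return num_vfs;
          }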
  10. 15 Oct 2015, 3 commits
  11. 09 Oct 2015, 2 commits
  12. 25 Sep 2015, 1 commit
    • IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY · c6790aa9
      Authored by Sagi Grimberg
      Commit 96249d70 ("IB/core: Guarantee that a local_dma_lkey
      is available") allows ULPs that make use of the local DMA lkey to
      keep working as before, by allocating a DMA MR with local
      permissions and converting these consumers to use the MR associated
      with the PD rather than device->local_dma_lkey.
      
      ConnectIB has some known issues with memory registration
      using the local_dma_lkey (SEND, RDMA and RECV seem to work OK).
      
      Thus don't expose support for it (remove device->local_dma_lkey
      setting), and take advantage of the above commit such that no regression
      is introduced to working systems.
      
      The local_dma_lkey support will be restored in CX4, depending on a
      FW capability query.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
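      A minimal sketch of what the referenced commit lets a ULP do instead
      (the helper below is our own illustration; pd->local_dma_lkey is the
      PD-level lkey that commit 96249d70 introduces):

          #include <rdma/ib_verbs.h>

          /* Fill a local SGE using the PD's lkey instead of the no longer
           * exposed device->local_dma_lkey. */
          static void fill_local_sge(struct ib_pd *pd, u64 dma_addr, u32 len,
                                     struct ib_sge *sge)
          {
                  sge->addr   = dma_addr;
                  sge->length = len;
                  sge->lkey   = pd->local_dma_lkey;
          }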
  13. 29 Aug 2015, 1 commit
  14. 18 Aug 2015, 2 commits
  15. 07 Aug 2015, 1 commit
  16. 27 Jul 2015, 2 commits
    • net/mlx5e: TX latency optimization to save DMA reads · 88a85f99
      Authored by Achiad Shochat
      A regular TX WQE execution involves two or more DMA reads -
      one to fetch the WQE, and another one per WQE gather entry.
      
      These DMA reads obviously increase the TX latency.
      There are two mlx5 mechanisms to bypass these DMA reads:
      1) Inline WQE
      2) Blue Flame (BF)
      
      An inline WQE contains the whole packet, and thus saves the DMA
      read(s) for the regular WQE's gather entry(ies). Inline WQE support
      was already added in the previous commit.
      
      A BF WQE is written directly to the device I/O mapped memory, thus
      enables saving the DMA read that fetches the WQE.
      
      The BF WQE I/O write must be done at cache line granularity, and
      thus uses the CPU write-combining mechanism.
      A BF WQE I/O write acts also as a TX doorbell for notifying the
      device of new TX WQEs.
      A BF WQE is written to the same I/O mapped address as the regular TX
      doorbell, thus this address is being mapped twice - once by ioremap()
      and once by io_mapping_map_wc().
      
      While both mechanisms reduce the TX latency, they both consume more CPU
      cycles than a regular WQE:
      - A BF WQE must still be written to host memory, in addition to being
        written directly to the device I/O mapped memory.
      - An inline WQE involves copying the SKB data into it.
      
      To handle this tradeoff, we introduce a heuristic algorithm that
      strives to avoid using these two mechanisms when the TX queue is
      back-pressured by the device, and limits their usage rate otherwise.
      
      An inline WQE will always be "Blue Flamed" (written directly to the
      device I/O mapped memory) while a BF WQE may not be inlined (may contain
      gather entries).
      
      Preliminary testing using netperf UDP_RR shows that the latency goes down
      from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
      the same.
      Signed-off-by: Achiad Shochat <achiad@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
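      A conceptual sketch of such a heuristic (the state, names and
      thresholds are hypothetical, not the mlx5e implementation):

          #include <linux/types.h>

          /* Skip inline/BF when the SQ is back-pressured: the device is
           * the bottleneck and extra CPU work buys nothing. Otherwise,
           * inline small packets while a Blue Flame budget remains. */
          struct tx_sq_state {
                  unsigned int free_slots; /* WQE slots free in the SQ */
                  unsigned int max_inline; /* max payload of an inline WQE */
                  unsigned int bf_budget;  /* BF doorbells still allowed */
          };

          static bool use_inline_bf_wqe(const struct tx_sq_state *sq,
                                        unsigned int pkt_len)
          {
                  if (sq->free_slots < 16)        /* back-pressured */
                          return false;

                  return pkt_len <= sq->max_inline && sq->bf_budget > 0;
          }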
    • net/mlx5e: Allocate DMA coherent memory on reader NUMA node · 311c7c71
      Authored by Saeed Mahameed
      Via affinity hints and XPS, each mlx5e channel is assigned a CPU
      core.
      
      Channel DMA-coherent memory that is written by the NIC and read by
      SW (e.g. the CQ buffer) is allocated on the NUMA node of the CPU
      core assigned to the channel.
      
      Channel DMA-coherent memory that is written by SW and read by the
      NIC (e.g. the SQ/RQ buffers) is allocated on the NUMA node of the
      NIC.
      
      The doorbell record (written by SW and read by the NIC) is an
      exception, since SW accesses it more frequently.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
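      A sketch of node-aware coherent allocation in this spirit (the
      helper name is illustrative, and the set_dev_node() trick is an
      assumption about how the steering is done, using only standard
      kernel APIs of the era):

          #include <linux/device.h>
          #include <linux/dma-mapping.h>

          /* Allocate zeroed DMA-coherent memory on a given NUMA node by
           * temporarily steering the device's node, then restoring it. */
          static void *dma_zalloc_coherent_node(struct device *dev, size_t size,
                                                dma_addr_t *handle, int node)
          {
                  int orig_node = dev_to_node(dev);
                  void *cpu_addr;

                  set_dev_node(dev, node);
                  cpu_addr = dma_zalloc_coherent(dev, size, handle, GFP_KERNEL);
                  set_dev_node(dev, orig_node);

                  return cpu_addr;
          }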
  17. 12 Jun 2015, 1 commit
  18. 05 Jun 2015, 6 commits
  19. 02 Jun 2015, 1 commit
  20. 31 May 2015, 5 commits