Commits · 12066d612b22001829fa378ca127956ee72e13c1 · openeuler / Kernel

12 Oct, 2018 5 commits

mlxsw: spectrum: Move L3 protocol and address definitions to global header file · 12066d61

Authored by Ido Schimmel on Oct 11, 2018

The L3 protocol and address definitions are going to be used by the NVE
code, so move them to the global header file from the one private to the
router.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Do not assume notifier information type · 9c73b1d1

Authored by Ido Schimmel on Oct 11, 2018

VxLAN notifications are going to use a different notifier information
type, so cast to the correct type based on the received event.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Check notification relevance based on upper device · 5050f6ae

Authored by Ido Schimmel on Oct 11, 2018

VxLAN FDB updates are sent with the VxLAN device which is not our upper
and will therefore be ignored by current code.

Solve this by checking whether the upper device (bridge) is our upper.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Prepare for VxLAN FDB notifications · ab74c3a1

Authored by Ido Schimmel on Oct 11, 2018

VxLAN FDB notifications need to be handled differently than bridge FDB
notifications, so initialize the work item based on the received
notification and rename the invoked function accordingly.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Remove misuses of private header file · bf341eb8

Authored by Ido Schimmel on Oct 11, 2018

The spectrum_router.h header file is private to the router block and
should only be included by direct consumers of it, such as dpipe and the
multicast routing code.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Oct, 2018 1 commit

mlxsw: pci: Fix a typo · 9e664316

Authored by Nir Dotan on Oct 08, 2018

Signed-off-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Oct, 2018 1 commit

net/mlx4_en: Use minimal rx and tx ring sizes on kdump kernel · 27055454

Authored by Alaa Hleihel on Oct 09, 2018

When memory is limited (on kdump kernel), reduce size of rx and tx rings.
Also reduce the number of rx rings.

Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Oct, 2018 2 commits

mlxsw: spectrum: Delete RIF when VLAN device is removed · c360867e

Authored by Ido Schimmel on Oct 04, 2018

In commit 602b74ed ("mlxsw: spectrum_switchdev: Do not leak RIFs
when removing bridge") I handled the case where RIFs created for VLAN
devices were not properly cleaned up when their real device (a bridge)
was removed.

However, I forgot to handle the case of the VLAN device itself being
removed. Do so now when the VLAN device is being unlinked from its real
device.

Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reported-by: Artem Shvorin <art@qrator.net>
Tested-by: Artem Shvorin <art@qrator.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: pci: Derive event type from event queue number · f3c84a8e

Authored by Nir Dotan on Oct 04, 2018

Due to a hardware issue in Spectrum-2, the field event_type of the event
queue element (EQE) has become reserved. It was used to distinguish between
command interface completion events and completion events.

Use queue number to determine event type, as command interface completion
events are always received on EQ0 and mlxsw driver maps completion events
to EQ1.

Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Nir Dotan <nird@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
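
The queue-number rule described in this commit can be sketched in plain C as follows. This is an illustrative user-space sketch, not the driver's code: the enum and function names are hypothetical, and only the EQ0/EQ1 assignment comes from the commit message.

```c
/* Hypothetical names for illustration; these are not the driver's identifiers. */
enum eq_index {
	EQ_CMD_IFACE = 0,	/* EQ0: command interface completion events */
	EQ_COMPLETION = 1,	/* EQ1: completion events */
};

enum event_type {
	EVENT_CMD_COMPLETION,
	EVENT_COMPLETION,
	EVENT_UNKNOWN,
};

/*
 * Classify an event by the queue it arrived on instead of reading the
 * (now reserved) event_type field of the EQE.
 */
static enum event_type event_type_from_eq(unsigned int eq_num)
{
	switch (eq_num) {
	case EQ_CMD_IFACE:
		return EVENT_CMD_COMPLETION;
	case EQ_COMPLETION:
		return EVENT_COMPLETION;
	default:
		return EVENT_UNKNOWN;
	}
}
```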

04 Oct, 2018 5 commits

net/mlx5: Add Fast teardown support · fcd29ad1

Authored by Feras Daoud on Aug 09, 2018

Today mlx5 devices support two teardown modes:
1- Regular teardown
2- Force teardown

This change introduces the enhanced version of the "Force teardown" that
allows SW to perform teardown in a faster way without the need to reclaim
all the pages.

Fast teardown provides the following advantages:
1- Fix a FW race condition that could cause command timeout
2- Avoid moving to polling mode
3- Close the vport to prevent PCI ACKs from being sent without the data
being scattered to memory

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add new counter for aRFS rule insertion failures · 94563847

Authored by Eran Ben Elisha on Jul 08, 2018

Count aRFS rule insertion failures for ethtool output. In addition, move
the error print to the debug print mechanism, as it could flood dmesg
and reduce system BW dramatically.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add extack messages for TC offload failures · e98bedf5

Authored by Eli Britstein on Aug 15, 2018

Return tc extack messages for failures to user space.
Messages provide reasons for not being able to offload rules to HW.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: E-Switch, Add extack messages to devlink callbacks · 8c98ee77

Authored by Eli Britstein on Aug 05, 2018

Return extack messages for failures in the e-switch devlink callbacks.
Messages provide reasons for not being able to issue the operation.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

devlink: Add extack for eswitch operations · db7ff19e

Authored by Eli Britstein on Aug 15, 2018

Add extack argument to the eswitch related operations.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

02 Oct, 2018 16 commits

net/mlx5: Cache the system image guid · 59c9d35e

Authored by Alaa Hleihel on Sep 05, 2018

The system image guid is a read-only field which is used by the TC
offloads code to determine if two mlx5 devices belong to the same
ASIC while adding flows.

Read this once and save it on the core device rather than querying each
time an offloaded flow is added.

Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
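
The read-once idea in this commit can be illustrated with a small user-space sketch. Everything here is hypothetical: the struct, its fields, and the query helper are stand-ins invented for the example and do not reflect the mlx5 data structures or firmware commands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a core device; not the mlx5 structure. */
struct core_dev {
	uint64_t hw_guid;		/* value the device would report when queried */
	uint64_t sys_image_guid;	/* cached copy of that value */
	bool guid_cached;		/* true once the value has been read */
	unsigned int queries;		/* number of (slow) queries issued */
};

/* Hypothetical slow path standing in for a query to the device. */
static uint64_t query_sys_image_guid(struct core_dev *dev)
{
	dev->queries++;
	return dev->hw_guid;
}

/* Read the GUID once and serve later callers from the cached copy. */
static uint64_t get_sys_image_guid(struct core_dev *dev)
{
	if (!dev->guid_cached) {
		dev->sys_image_guid = query_sys_image_guid(dev);
		dev->guid_cached = true;
	}
	return dev->sys_image_guid;
}

/* Two devices are treated as the same ASIC when their GUIDs match. */
static bool same_asic(struct core_dev *a, struct core_dev *b)
{
	return get_sys_image_guid(a) == get_sys_image_guid(b);
}
```

Because the field is read-only, the cached copy never needs invalidating, so repeated comparisons cost only a memory read after the first query.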

net/mlx5e: Allow reporting of checksum unnecessary · b856df28

Authored by Or Gerlitz on Jul 01, 2018

Currently we practically never report checksum unnecessary, because
for all IP packets we take the checksum complete path.

Enable non-default runs with reporting checksum unnecessary, using
an ethtool private flag. This can be useful for performance evals
and other explorations.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Enable reporting checksum unnecessary also for L3 packets · b820e6fb

Authored by Or Gerlitz on Jul 01, 2018

We can report checksum unnecessary also when the L3 checksum
flag on the cqe is set and there's no L4 header.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add ethtool control of ring params to VF representors · f128f138

Authored by Gavi Teitz on Sep 13, 2018

Added ethtool control to the representors for setting and querying
the ring params.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>

net/mlx5e: Enable multi-queue and RSS for VF representors · 84a09733

Authored by Gavi Teitz on Sep 12, 2018

Increased the number of channels the representors can open to the
number of CPUs. The default number opened remains one.

Used the standard NIC netdev functions to:
* Set RSS params when building the representors' params.
* Setup an indirect TIR and RQT for the representors upon
  initialization.
* Create a TTC flow table for the representors' indirect TIR (when
  creating the TTC table, mlx5e_set_ttc_basic_params() is not called,
  in order to avoid setting the inner_ttc param, which is not needed).

Added ethtool control to the representors for setting and querying
the number of open channels. Additionally, included logic in the
representors' ethtool set channels handler which controls a
representor's vport rx rule, so that if there is one open channel
the rx rule steers traffic to the representor's direct TIR, whereas
if there is more than one channel, the rx rule steers traffic to the
new TTC flow table.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
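
The steering decision made by the set channels handler can be summarized with this small sketch. The enum and function names are hypothetical; only the one-channel versus multi-channel rule is taken from the commit message.

```c
/* Hypothetical destination kinds for a representor's vport rx rule. */
enum rx_rule_dest {
	RX_DEST_DIRECT_TIR,	/* single channel: no RSS spreading needed */
	RX_DEST_TTC_FLOW_TABLE,	/* multiple channels: spread via the TTC table */
};

/* Pick the rx rule destination from the number of open channels. */
static enum rx_rule_dest rep_rx_rule_dest(unsigned int num_channels)
{
	return num_channels > 1 ? RX_DEST_TTC_FLOW_TABLE : RX_DEST_DIRECT_TIR;
}
```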

net/mlx5e: Expose ethtool rss key size / indirection table functions · a5355de8

Authored by Or Gerlitz on Aug 26, 2018

Towards enabling RSS for the vport representors, expose the functions for
querying the rss hash key size and indirection table size via ethtool.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Expose function for building RSS params · 3edc0159

Authored by Gavi Teitz on Aug 19, 2018

Towards enabling RSS for the vport representors, extract the
procedure for building a device's RSS params, and expose the
function.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Provide explicit directive if to create inner indirect tirs · 46dc933c

Authored by Or Gerlitz on Aug 28, 2018

Change the driver functions that deal with creating indirect tirs
to get a flag telling if inner ttc is desired.

A pre-step for enabling rss on the vport representors, where
inner ttc is not needed.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Provide flow dest when creating vport rx rule · c966f7d5

Authored by Gavi Teitz on Aug 17, 2018

Currently the destination for the representor e-switch rx rule is
a TIR number. Towards changing that to potentially be a flow table,
as part of enabling RSS for representors, modify the signature of
the related e-switch API to get a flow destination.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Extract creation of rep's default flow rule · 092297e0

Authored by Gavi Teitz on Aug 19, 2018

Cleaning up the flow of the representors' rx initialization, towards
enabling RSS for the representors.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Enable stateless offloads for VF representor netdevs · dabeb3b0

Authored by Gavi Teitz on Aug 16, 2018

Enabled checksum and TSO offloads for the representors, in
order to increase their performance, which is required to
increase the performance of flows that cannot be offloaded.

Checksum offloads contribute to a general acceleration of all
traffic (to around 150%), whereas the TSO offload contributes
to a prominent acceleration of the representor's TX for traffic
flows with larger than MTU sized packets (to around 200%). This
is the usual case for TCP streams, as the PF, which serves as
the uplink representor, and the VF representors employ GRO before
forwarding the packets to the representor.

GRO was enabled implicitly for the representors beforehand, and
is explicitly enabled here to ensure that the representors preserve
the performance boost it provides (of around 200%) when working in
tandem with the TSO offload by the forwardee, which is the standard
case as both the PF and the VF representors employ HW TSO.

The impact of these changes can be seen in the following
measurements taken on a setup of a VM over a VF, connected
to OVS via the VF representor, to an external host:

Before current changes:
                     TCP Throughput [Gb/s]
External host to VM         ~ 10.5
VM to external host         ~ 23.5

With just checksum offloads enabled:
                     TCP Throughput [Gb/s]
External host to VM         ~ 14.9
VM to external host         ~ 28.5

With the TSO offload also enabled:
                     TCP Throughput [Gb/s]
External host to VM         ~ 30.5

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Change VF representors' RQ type · 749359f4

Authored by Gavi Teitz on Aug 16, 2018

The representors' RQ size was not large enough for them to achieve
high enough performance, and therefore needed to be enlarged, while
incurring a minimal hit to memory usage. To achieve this, the
representors' RQ size was increased, and its type was changed to a
striding RQ if it is supported.

Towards that goal the following changes were made:

* Extracted the sequence for setting the standard netdev's RQ params
  into a function

* Replaced the sequence for setting the representor's RQ params with
  the standard sequence

The impact of this change can be seen in the following measurements
taken on a setup of a VM over a VF, connected to OVS via the VF
representor, to an external host:

Before current change:
                     TCP Throughput [Gb/s]
VM to external host         ~  7.2

With the current change (measured with a striding RQ):
                     TCP Throughput [Gb/s]
VM to external host         ~ 23.5

Each representor now consumes 2 [MB] of memory for its packet
buffers.

Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Ethtool steering, Support masks for l3/l4 filters · 3a95e0cc

Authored by Or Gerlitz on Aug 16, 2018

Allow using partial masks for L3 addresses and L4 ports across
the place.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Set vlan masks for all offloaded TC rules · cee26487

Authored by Jianbo Liu on Aug 25, 2018

In flow steering, if asked to, the hardware matches on the first ethertype
which is not vlan. It's possible to set a rule as follows, which is meant
to match on untagged packet, but will match on a vlan packet:
    tc filter add dev eth0 parent ffff: protocol ip flower ...

To avoid this for packets with single tag, we set vlan masks to tell
hardware to check the tags for every matched packet.

Fixes: 095b6cfd ('net/mlx5e: Add TC vlan match parsing')
Signed-off-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Fix out of bound access when setting vport rate · 11aa5800

Authored by Eran Ben Elisha on Sep 16, 2018

The code that deals with eswitch vport bw guarantee was going beyond the
eswitch vport array limit, fix that.  This was pointed out by the kernel
address sanitizer (KASAN).

The error from KASAN log:
[2018-09-15 15:04:45] BUG: KASAN: slab-out-of-bounds in
mlx5_eswitch_set_vport_rate+0x8c1/0xae0 [mlx5_core]

Fixes: c9497c98 ("net/mlx5: Add support for setting VF min rate")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules · 4d8fcf21

Authored by Alaa Hleihel on Sep 05, 2018

If the peer device was already unbound, then do not attempt to modify
its resources, otherwise we will crash on dereferencing a non-existent
device.

Fixes: 5c65c564 ("net/mlx5e: Support offloading TC NIC hairpin flows")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

24 Sep, 2018 3 commits

mlx5: remove ndo_poll_controller · 9c29bcd1

Authored by Eric Dumazet on Sep 21, 2018

As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

mlx5 uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: remove ndo_poll_controller · a24b66c2

Authored by Eric Dumazet on Sep 21, 2018

As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.

mlx4 uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Make MLXSW_SP1_FWREV_MINOR a hard requirement · 12ba7e10

Authored by Petr Machata on Sep 23, 2018

Up until now, mlxsw tolerated firmware versions that weren't exactly
matching the required version, if the branch number matched. That
allowed the users to test various firmware versions as long as they were
on the right branch.

On the other hand, it made it impossible for mlxsw to put a hard lower
bound on a version that fixes all problems known to date. If a user had
a somewhat older FW version installed, mlxsw would start up just fine,
possibly performing non-optimally as it would use features that trigger
problematic behavior.

Therefore tweak the check to accept any FW version that is:

- on the same branch as the preferred version, and
- the same as or newer than the preferred version.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
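
The tightened check can be sketched as below. This is a minimal illustration, not the mlxsw implementation: the struct layout, the major-number comparison, and the rule that the branch is the minor number divided by 100 are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical firmware revision triple, e.g. 13.1702.6. */
struct fw_rev {
	unsigned int major;
	unsigned int minor;
	unsigned int subminor;
};

/* Assumption for this sketch: the branch is the minor number divided by 100. */
static unsigned int fw_branch(const struct fw_rev *rev)
{
	return rev->minor / 100;
}

/*
 * Accept a running revision only if it is on the same branch as the
 * required revision and is the same as or newer than it.
 */
static bool fw_rev_acceptable(const struct fw_rev *rev,
			      const struct fw_rev *req)
{
	if (rev->major != req->major)	/* assumed: majors must match */
		return false;
	if (fw_branch(rev) != fw_branch(req))
		return false;
	if (rev->minor != req->minor)
		return rev->minor > req->minor;
	return rev->subminor >= req->subminor;
}
```

Under these assumptions, with a made-up required version of 13.1702.6, version 13.1703.0 would be accepted, while 13.1702.5 (older) and 13.1800.0 (different branch) would be rejected.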

22 Sep, 2018 1 commit

net/mlx4: Use cpumask_available for eq->affinity_mask · 8ac1ee6f

Authored by Nathan Chancellor on Sep 21, 2018

Clang warns that the address of a pointer will always evaluate to true
in a boolean context:

drivers/net/ethernet/mellanox/mlx4/eq.c:243:11: warning: address of
array 'eq->affinity_mask' will always evaluate to 'true'
[-Wpointer-bool-conversion]
        if (!eq->affinity_mask || cpumask_empty(eq->affinity_mask))
            ~~~~~^~~~~~~~~~~~~
1 warning generated.

Use cpumask_available, introduced in commit f7e30f01 ("cpumask: Add
helper cpumask_available()"), which does the proper checking and avoids
this warning.

Link: https://github.com/ClangBuiltLinux/linux/issues/86
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
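
Why a plain NULL test is wrong here, and what an availability helper does instead, can be shown with a user-space mock. This is not kernel code: `mask_var_t`, `mask_available()` and the `MASK_OFFSTACK` switch are hypothetical stand-ins modelled on the idea that the mask type is either a pointer or a one-element array depending on configuration.

```c
#include <stdbool.h>
#include <stddef.h>

struct mask {
	unsigned long bits[1];
};

#ifdef MASK_OFFSTACK
typedef struct mask *mask_var_t;	/* pointer: may be NULL if never allocated */
#else
typedef struct mask mask_var_t[1];	/* array: storage always exists */
#endif

/*
 * Report whether the mask has backing storage. In the array configuration
 * the answer is always true, so no address-of-array test is needed.
 */
static inline bool mask_available(mask_var_t m)
{
#ifdef MASK_OFFSTACK
	return m != NULL;
#else
	(void)m;
	return true;
#endif
}

/* Hypothetical stand-in for an event queue that carries an affinity mask. */
struct event_queue {
	mask_var_t affinity_mask;
};

static bool eq_has_affinity_mask(struct event_queue *eq)
{
	/* Replaces a "!eq->affinity_mask" test, which is constant for arrays. */
	return mask_available(eq->affinity_mask);
}
```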

20 Sep, 2018 6 commits

mlxsw: spectrum: Bump required firmware version · f9d5b1d5