提交 · 58a606dba708f114e7f412b5c3024027e8a73e34 · openeuler / Kernel

19 10月, 2021 3 次提交

net/mlx5: Introduce new uplink destination type · 58a606db

由 Maor Gottlieb 提交于 8月 03, 2021

The uplink destination type should be used in rules to steer the
packet to the uplink when the device is in steering based LAG mode.
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

58a606db

net/mlx5: Add support to create match definer · e7e2519e

由 Maor Gottlieb 提交于 7月 06, 2021

Introduce new APIs to create and destroy flow matcher
for given format id.

Flow match definer object is used for defining the fields and
mask used for the hash calculation. User should mask the desired
fields like done in the match criteria.

This object is assigned to flow group of type hash. In this flow
group type, packets lookup is done based on the hash result.

This patch also adds the required bits to create such flow group.
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

e7e2519e

net/mlx5: Introduce port selection namespace · 425a563a

由 Maor Gottlieb 提交于 5月 23, 2021

Add new port selection flow steering namespace. Flow steering rules in
this namespaceare are used to determine the physical port for egress
packets.
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

425a563a

16 10月, 2021 5 次提交

net/mlx5: Use native_port_num as 1st option of device index · 1021d064

由 Rongwei Liu 提交于 10月 12, 2021

Using "native_port_num" can support more NICs.

Fallback to PCIe IDs if "native_port_num" query fails.
Signed-off-by: NRongwei Liu <rongweil@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

1021d064

net/mlx5: Introduce new device index wrapper · 2ec16ddd

由 Rongwei Liu 提交于 9月 16, 2021

Downstream patches.
Signed-off-by: NRongwei Liu <rongweil@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

2ec16ddd

net/mlx5: Disable roce at HCA level · fbfa97b4

由 Shay Drory 提交于 8月 18, 2021

Currently, when a user disables roce via the devlink param, this change
isn't passed down to the device.
If device allows disabling RoCE at device level, make use of it. This
instructs the device to skip memory allocations related to RoCE
functionality which otherwise is done by the device.
Signed-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

fbfa97b4

net/mlx5: Read timeout values from init segment · 5945e1ad

由 Amir Tzin 提交于 10月 07, 2021

Replace hard coded timeouts with values stored in firmware's init
segment. Timeouts are read from init segment during driver load. If init
segment timeouts are not supported then fallback to hard coded defaults
instead. Also move pre initialization timeouts which cannot be read from
firmware to the new mechanism.
Signed-off-by: NAmir Tzin <amirtz@mellanox.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

5945e1ad

net/mlx5: Add layout to support default timeouts register · 4b2c5fa9

由 Amir Tzin 提交于 7月 21, 2021

Add needed structures and defines for DTOR (default timeouts register).
This will be used to get timeouts values from FW instead of hard coded
values in the driver code thus enabling support for slower devices which
need longer timeouts.
Signed-off-by: NAmir Tzin <amirtz@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

4b2c5fa9

13 10月, 2021 1 次提交

net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp · 0bc73ad4

由 Aya Levin 提交于 9月 26, 2021

Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.

Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

0bc73ad4

05 10月, 2021 3 次提交

net/mlx5: Enable single IRQ for PCI Function · f891b7cd

由 Shay Drory 提交于 8月 01, 2021

Prior to this patch the driver requires two IRQs to function properly,
one required IRQ for control and at least one required IRQ for IO.

This requirement can be relaxed to one as the driver now allows
sharing of IRQs, so control and IO EQs can share the same irq.

This is needed for high scale amount of VFs.
Signed-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

f891b7cd

net/mlx5: Shift control IRQ to the last index · 3663ad34

由 Shay Drory 提交于 8月 19, 2021

Control IRQ is the first IRQ vector. This complicates handling of
completion irqs as we need to offset them by one.
in the next patch, there are scenarios where completion and control EQs
will share the same irq. for example: functions with single IRQ. To ease
such scenarios, we shift control IRQ to the end of the irq array.
Signed-off-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3663ad34

net/mlx5: Bridge, mark reg_c1 when pushing VLAN · 5249001d

由 Vlad Buslov 提交于 9月 01, 2021

On ingress VLAN push also assign value 0x7FE to reg_c1 tunnel id+opts
bits (tunnel id 0, which is not a valid tunnel id, and option 0x7FE which
was reserved by one of previous patches in the series). In following patch
the reg value is matched on egress miss to restore the packet to its
original state by removing the VLAN before passing it to the software data
path.
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NPaul Blakey <paulb@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

5249001d

20 8月, 2021 1 次提交

net/mlx5: E-switch, Introduce rate limiting groups API · 1ae258f8

由 Dmytro Linkin 提交于 5月 31, 2021

Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.
Co-developed-by: NVlad Buslov <vladbu@nvidia.com>
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: NHuy Nguyen <huyn@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>

1ae258f8

12 8月, 2021 4 次提交

net/mlx5: Allocate individual capability · 48f02eef

由 Parav Pandit 提交于 7月 13, 2021

Currently mlx5_core_dev contains array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K
bytes of memory which is never used. Due to this mlx5_core_dev structure
size is 270Kbytes odd. This allocation further aligns to next power of 2
to 512Kbytes.

By skipping non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment
(c) 350KB saved in alignment

In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

48f02eef

net/mlx5: Reorganize current and maximal capabilities to be per-type · 5958a6fa

由 Parav Pandit 提交于 7月 13, 2021

In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

5958a6fa

net/mlx5: Delete impossible dev->state checks · 8e792700

由 Leon Romanovsky 提交于 8月 01, 2021

New mlx5_core device structure is allocated through devlink_alloc
with\ kzalloc and that ensures that all fields are equal to zero
and it includes ->state too.

That means that checks of that field in the mlx5_init_one() is
completely redundant, because that function is called only once
in the begging of mlx5_core_dev lifetime.

PCI:
 .probe()
  -> probe_one()
   -> mlx5_init_one()

The recovery flow can't run at that time or before it, because relevant
work initialized later in mlx5_init_once().

Such initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

8e792700

net/mlx5: Fix typo in comments · 39c538d6

由 Cai Huoqing 提交于 7月 30, 2021

Fix typo:
*vectores  ==> vectors
*realeased  ==> released
*erros  ==> errors
*namepsace  ==> namespace
*trafic  ==> traffic
*proccessed  ==> processed
*retore  ==> restore
*Currenlty  ==> Currently
*crated  ==> created
*chane  ==> change
*cannnot  ==> cannot
*usuallly  ==> usually
*failes  ==> fails
*importent  ==> important
*reenabled  ==> re-enabled
*alocation  ==> allocation
*recived  ==> received
*tanslation  ==> translation
Signed-off-by: NCai Huoqing <caihuoqing@baidu.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

39c538d6

11 8月, 2021 1 次提交

vdpa/mlx5: Fix queue type selection logic · 879753c8

由 Eli Cohen 提交于 8月 11, 2021

get_queue_type() comments that splict virtqueue is preferred, however,
the actual logic preferred packed virtqueues. Since firmware has not
supported packed virtqueues we ended up using split virtqueues as was
desired.

Since we do not advertise support for packed virtqueues, we add a check
to verify split virtqueues are indeed supported.

Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: NEli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210811053759.66752-1-elic@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

879753c8

10 8月, 2021 1 次提交

net/mlx5: Synchronize correct IRQ when destroying CQ · 563476ae

由 Shay Drory 提交于 4月 11, 2021

The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.

This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.

As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.

Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

563476ae

06 8月, 2021 4 次提交

net/mlx5: E-Switch, Add event callback for representors · c8e6a9e6

由 Mark Bloch 提交于 8月 03, 2021

This callback will allow to notify representors about relevant events
when in OFFLOADS mode. In downstream patches, this will be used to notify
about PAIR/UNPAIR devcom events.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

c8e6a9e6

{net, RDMA}/mlx5: Extend send to vport rules · 979bf468

由 Mark Bloch 提交于 8月 03, 2021

In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.

While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

979bf468

net/mlx5: Lag, add initial logic for shared FDB · af8c0e25

由 Mark Bloch 提交于 8月 03, 2021

As shared FDB requires changes in two subsystems first expose the needed
core functions so the RDMA side can be changed.

mlx5_lag_is_master(): return true if a given mlx5 device is the lag master.
mlx5_lag_is_shared_fdb(): Returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): Return the peer mdev in lag.

The mentioned functions will be used by downstream patches in order
to add support for shared FDB for the RDMA side.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

af8c0e25

net/mlx5: Return mdev from eswitch · 97a8a8c1

由 Mark Bloch 提交于 8月 03, 2021

Export a function so users can retrieve the mellanox device that manages
the eswitch from the eswitch device.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMark Zhang <markzhang@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

97a8a8c1

03 8月, 2021 1 次提交

net/mlx5: Move TTC logic to fs_ttc · 371cf74e

由 Maor Gottlieb 提交于 7月 02, 2021

Now that TTC logic is not dependent on mlx5e structs, move it to
lib/fs_ttc.c so it could be used other part of the mlx5 driver.
Signed-off-by: NMaor Gottlieb <maorg@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

371cf74e

27 7月, 2021 1 次提交

net/mlx5e: Block LRO if firmware asks for tunneled LRO · 26ab7b38

由 Maxim Mikityanskiy 提交于 4月 23, 2021

This commit does a cleanup in LRO configuration.

LRO is a parameter of an RQ, but its state is changed by modifying a TIR
related to the RQ.

The current status: LRO for tunneled packets is not supported in the
driver, inner TIRs may enable LRO on creation, but LRO status of inner
TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but
as long as the firmware doesn't declare support for tunneled LRO, it
works, because the same RQs are shared between the inner and outer TIRs.

This commit does two fixes:

1. If the firmware has the tunneled LRO capability, LRO is blocked
altogether, because it's not possible to block it for inner TIRs only,
when the same RQs are shared between inner and outer TIRs, and the
driver won't be able to handle tunneled LRO traffic.

2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs,
including inner ones, because all TIRs related to an RQ should agree on
their LRO state.

Fixes: 7b3722fa ("net/mlx5e: Support RSS for GRE tunneled packets")
Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

26ab7b38

25 7月, 2021 1 次提交

IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq · 616d5769

由 Tal Gilboa 提交于 7月 18, 2021

is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().

c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.

Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: NTal Gilboa <talgi@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

616d5769

18 7月, 2021 1 次提交

net/mlx5: Add DCS caps & fields support · 96cd2dd6

由 Lior Nahmanson 提交于 12月 28, 2020

This fields will be needed when adding a support for DCS offload

max_dci_stream_channels - maximum DCI stream channels supported per DCI.
max_dci_errored_streams - maximum DCI error stream channels
supported per DCI before a DCI move to error state.
Signed-off-by: NLior Nahmanson <liorna@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

96cd2dd6

03 7月, 2021 1 次提交

vdpa/mlx5: Support creating resources with uid == 0 · e13cd45d

由 Eli Cohen 提交于 5月 31, 2021

Currently all resources must be created with uid != 0 which is essential
when userspace processes are allocating virtquueue resources. Since this
is a kernel implementation, it is perfectly legal to open resources with
uid == 0.

In case firmware supports, avoid allocating user context.
Signed-off-by: NEli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210531160404.31368-1-elic@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>

e13cd45d

26 6月, 2021 1 次提交

net/mlx5: DR, Add support for flow sampler offload · 1ab6dc35

由 Yevgeny Kliteynik 提交于 4月 19, 2021

Add SW steering support for sFlow / flow sampler action.
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

1ab6dc35

22 6月, 2021 1 次提交

RDMA/mlx5: Refactor get_ts_format functions to simplify code · 9a1ac95a

由 Aharon Landau 提交于 6月 16, 2021

QPC, SQC and RQC timestamp formats and capabilities are always equal
because they represent general hardware support. So instead of code
duplication, let's merge them into general enum and logic.
Signed-off-by: NAharon Landau <aharonl@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

9a1ac95a

17 6月, 2021 1 次提交

net/mlx5e: Don't create devices during unload flow · a5ae8fc9

由 Dmytro Linkin 提交于 5月 14, 2021

Running devlink reload command for port in switchdev mode cause
resources to corrupt: driver can't release allocated EQ and reclaim
memory pages, because "rdma" auxiliary device had add CQs which blocks
EQ from deletion.
Erroneous sequence happens during reload-down phase, and is following:

1. detach device - suspends auxiliary devices which support it, destroys
   others. During this step "eth-rep" and "rdma-rep" are destroyed,
   "eth" - suspended.
2. disable SRIOV - moves device to legacy mode; as part of disablement -
   rescans drivers. This step adds "rdma" auxiliary device.
3. destroy EQ table - <failure>.

Driver shouldn't create any device during unload flows. To handle that
implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset
on device attach. If flag is set do no-op on drivers rescan.

Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: NDmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

a5ae8fc9

15 6月, 2021 2 次提交

net/mlx5: Enlarge interrupt field in CREATE_EQ · 3af26495

由 Shay Drory 提交于 5月 10, 2021

FW is now supporting more than 256 MSI-X per PF (up to 2K).
Hence, enlarge interrupt field in CREATE_EQ to make use of the new
MSI-X's.
Signed-off-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3af26495

net/mlx5: Provide cpumask at EQ creation phase · e4e3f24b

由 Leon Romanovsky 提交于 2月 23, 2021

The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

e4e3f24b

10 6月, 2021 5 次提交

net/mlx5: Bridge, add offload infrastructure · 19e9bfa0

由 Vlad Buslov 提交于 4月 02, 2021

Create new files bridge.{c|h} in en/rep directory that implement bridge
interaction with representor netdevices and handle required
events/notifications, bridge.{c|h} in esw directory that implement all
necessary eswitch offloading infrastructure and works on vport/eswitch
level. Provide new kconfig MLX5_BRIDGE which is automatically selected when
both kernel bridge and mlx5 eswitch configs are enabled.

Provide basic infrastructure for bridge offloads:

- struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure
that encapsulates generic bridge-offloads data (notifier blocks, ingress
flow table/group, etc.) that is created/deleted on enable/disable eswitch
offloads.

- struct mlx5_esw_bridge - per-bridge structure that encapsulates
per-bridge data (reference counter, FDB, egress flow table/group, etc.)
that is created when first eswitch represetor is attached to new bridge and
deleted when last representor is removed from the bridge as a result of
NETDEV_CHANGEUPPER event.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
namespace. The new priority is between tc-miss and slow path priorities.
Priority consist of two levels: the ingress table that is global per
eswitch and matches incoming packets by src_mac/vid and redirects them to
next level (egress table) that is chosen according to ingress port bridge
membership and matches on dst_mac/vid in order to redirect packet to vport
according to the following diagram:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
+--------------------------------------+
|               |                      |
|        +------+                      |
|        |                             |
| +------v--------+   FDB_BR_OFFLOAD   |
| | INGRESS_TABLE |                    |
| +------+---+----+                    |
|        |   |      match              |
|        |   +---------+               |
|        |             |               |    +-------+
|        |     +-------v-------+ match |    |       |
|        |     | EGRESS_TABLE  +------------> vport |
|        |     +-------+-------+       |    |       |
|        |             |               |    +-------+
|        |    miss     |               |
|        +------+------+               |
|               |                      |
+--------------------------------------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

19e9bfa0

net/mlx5: Create TC-miss priority and table · ec3be887

由 Vlad Buslov 提交于 3月 04, 2021

In order to adhere to kernel software datapath model bridge offloads must
come after TC and NF FDBs. Following patches in this series add new FDB
priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload
is implemented with unmanaged tables, its miss path is not automatically
connected to next priority and requires the code to manually connect with
slow table. To keep bridge offloads encapsulated and not mix it with
eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD
and FDB_SLOW_PATH:

          +
          |
+---------v----------+
|                    |
|   FDB_TC_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_FT_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|    FDB_TC_MISS     |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_SLOW_PATH    |
|                    |
+---------+----------+
          |
          v

Initialize the new priority with single default empty managed table and use
the table as TC/NF miss patch instead of slow table. This approach allows
bridge offloads to be created as new FDB namespace priority between
FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any
other modules since miss path of managed TC-miss table is automatically
wired to next priority.
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

ec3be887

net/mlx5: Added new parameters to reformat context · 3f3f05ab

由 Yevgeny Kliteynik 提交于 3月 09, 2021

Adding new reformat context type (INSERT_HEADER) requires adding two new
parameters to reformat context - reformat_param_0 and reformat_param_1.
As defined by HW spec, these parameters have different meaning for
different reformat context type.

The first parameter (reformat_param_0) is not new to HW spec, but it
wasn't used by any of the supported reformats. The second parameter
(reformat_param_1) is new to the HW spec - it was added to allow
supporting INSERT_HEADER.

For NSERT_HEADER, reformat_param_0 indicates the header used to
reference the location of the inserted header, and reformat_param_1
indicates the offset of the inserted header from the reference point
defined by reformat_param_0.
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3f3f05ab

net/mlx5: mlx5_ifc support for header insert/remove · 67133eaa

由 Yevgeny Kliteynik 提交于 3月 09, 2021

Add support for HCA caps 2 that contains capabilities for the new
insert/remove header actions.

Added the required definitions for supporting the new reformat type:
added packet reformat parameters, reformat anchors and definitions
to allow copy/set into the inserted EMD (Embedded MetaData) tag.
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
Reviewed-by: NJianbo Liu <jianbol@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

67133eaa

net/mlx5e: Fix page reclaim for dead peer hairpin · a3e5fd93

由 Dima Chumak 提交于 5月 26, 2021

When adding a hairpin flow, a firmware-side send queue is created for
the peer net device, which claims some host memory pages for its
internal ring buffer. If the peer net device is removed/unbound before
the hairpin flow is deleted, then the send queue is not destroyed which
leads to a stack trace on pci device remove:

[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
[ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
[ 748.002171] ------------[ cut here ]------------
[ 748.001177] FW pages counter is 4 after reclaiming all pages
[ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]                      [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
[ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
[ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
[ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
[ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
[ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
[ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
[ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
[ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
[ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
[ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
[ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 748.001654] Call Trace:
[ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
[ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
[ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
[ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
[ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
[ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
[ 748.001075]  pci_device_remove+0x9f/0x1d0
[ 748.000833]  device_release_driver_internal+0x1e0/0x490
[ 748.001207]  unbind_store+0x19f/0x200
[ 748.000942]  ? sysfs_file_ops+0x170/0x170
[ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
[ 748.000970]  new_sync_write+0x373/0x610
[ 748.001124]  ? new_sync_read+0x600/0x600
[ 748.001057]  ? lock_acquire+0x4d6/0x700
[ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 748.001126]  ? fd_install+0x1c9/0x4d0
[ 748.000951]  vfs_write+0x4d0/0x800
[ 748.000804]  ksys_write+0xf9/0x1d0
[ 748.000868]  ? __x64_sys_read+0xb0/0xb0
[ 748.000811]  ? filp_open+0x50/0x50
[ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 748.001223]  do_syscall_64+0x3f/0x80
[ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 748.001026] RIP: 0033:0x7f58bcfb22f7
[ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
[ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
[ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
[ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
[ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
[ 748.001564] irq event stamp: 0
[ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
[ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
[ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 748.001492] ---[ end trace a6fabd773d1c51ae ]---

Fix by destroying the send queue of a hairpin peer net device that is
being removed/unbound, which returns the allocated ring buffer pages to
the host.

Fixes: 4d8fcf21 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
Signed-off-by: NDima Chumak <dchumak@nvidia.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

a3e5fd93

02 6月, 2021 1 次提交

net/mlx5: DR, Create multi-destination flow table with level less than 64 · 216214c6

由 Yevgeny Kliteynik 提交于 12月 09, 2020

Flow table that contains flow pointing to multiple flow tables or multiple
TIRs must have a level lower than 64. In our case it applies to muli-
destination flow table.
Fix the level of the created table to comply with HW Spec definitions, and
still make sure that its level lower than SW-owned tables, so that it
would be possible to point from the multi-destination FW table to SW
tables.

Fixes: 34583bee ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: NAlex Vesker <valex@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

216214c6

28 5月, 2021 1 次提交

net/mlx5: Move chains ft pool to be used by all firmware steering · 4a98544d

由 Paul Blakey 提交于 3月 08, 2021

Firmware FT pool is per device, but the software tracking of this pool
only services fs_chains users, and if another layer takes a flow table,
the pool will not be updated, and fs_chains will fail creating a flow
table, with no recovery till the flow table is returned.

Move FT pool to be global per device, and stored at the cmd level,
so all layers can use it.
Signed-off-by: NPaul Blakey <paulb@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

4a98544d

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功