Commits · 868a01a27d80a59e719f0c369d1b26b923fc7674 · gsplhtlxg / clone-Linux

26 May, 2018 · 4 commits

net/mlx5e: Introducing new statistics rwlock · 868a01a2

Committed by Shalom Lagziel on Feb 12, 2018

Introduce a new read/write lock that protects statistics gathering from
netdev channel configuration changes.
For example, when channels are being replaced (the number of rings is
increased or decreased), the lock prevents statistics gathering
(ndo_get_stats64) from reading the statistics of inactive channels
(channels that are being closed).

In addition, update the channels' software statistics on the fly when
ndo_get_stats64 is called, and remove that update from the periodic stats work.

Fixes: 9218b44d ("net/mlx5e: Statistics handling refactoring")
Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Use order-0 allocations for all WQ types · 3a2f7033

Committed by Tariq Toukan on Apr 04, 2018

Complete the transition of all WQ types to use fragmented
order-0 coherent memory instead of high-order allocations.

CQ-WQ already uses order-0.
Here we do the same for cyclic and linked-list WQs.

This allows the driver to load cleanly on systems with a highly
fragmented coherent memory.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

No degradation is sensed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: TX, Use actual WQE size for SQ edge fill · 043dc78e

Committed by Tariq Toukan on Mar 21, 2018

We fill the SQ edge with NOPs to avoid WQE wrap-around.
Here, instead of doing that in advance for the maximum possible
WQE size, we do it on demand using the actual WQE size.
We re-order some parts of mlx5e_sq_xmit to finish the calculation
of the WQE size (ds_cnt) before doing any writes to the WQE buffer.

When the SQ work queue is fragmented (introduced in a downstream patch),
dealing with WQE wraps becomes more frequent. This change would drastically
reduce the overhead in this case.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Packet rate of 64B packets, single transmit ring, size 8K.

Before: 14.9 Mpps
After:  15.8 Mpps

Improvement of 6%.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
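
The on-demand edge fill described in this commit can be sketched roughly as follows. This is a simplified model, not the driver's code: `struct sq_model`, `sq_contig_room()` and `sq_reserve()` are hypothetical names, the ring size is assumed to be a power of two counted in basic blocks, and NOPs are only counted rather than written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring model: wq_size basic blocks (a power of two),
 * pc is a free-running producer counter. */
struct sq_model {
    uint16_t wq_size;
    uint16_t pc;
    uint32_t nops;   /* NOP blocks accounted for so far */
};

/* Blocks left between the producer index and the end of the ring. */
static uint16_t sq_contig_room(const struct sq_model *sq)
{
    uint16_t pi = sq->pc & (sq->wq_size - 1);

    return (uint16_t)(sq->wq_size - pi);
}

/* Reserve wqe_bbs blocks for one WQE. The edge is padded with NOPs only
 * when this particular WQE would otherwise wrap. Returns the ring index
 * at which the WQE starts. */
static uint16_t sq_reserve(struct sq_model *sq, uint16_t wqe_bbs)
{
    uint16_t room, pi;

    assert(wqe_bbs > 0 && wqe_bbs <= sq->wq_size);

    room = sq_contig_room(sq);
    if (room < wqe_bbs) {
        sq->nops += room;   /* fill the edge */
        sq->pc += room;     /* producer now sits at index 0 */
    }

    pi = sq->pc & (sq->wq_size - 1);
    sq->pc += wqe_bbs;
    return pi;
}
```

The point of the sketch is that the WQE size is known before anything is written, so padding happens only when the actual WQE does not fit, not whenever the maximum-size WQE would not fit.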

net/mlx5e: Use WQ API functions instead of direct fields access · ddf385e3

Committed by Tariq Toukan on May 02, 2018

Use the WQ API to get the WQ size and to map a counter
to a WQ entry index.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
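
As an illustration of why accessor functions are preferable to touching fields directly, here is a minimal sketch of a cyclic work queue that stores only "size minus one". The type and function names (`demo_wq`, `demo_wq_get_size`, `demo_wq_ctr2ix`) are hypothetical stand-ins, and a power-of-two ring size is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cyclic WQ: callers never read sz_m1 directly. */
struct demo_wq {
    uint16_t sz_m1;   /* ring size minus one, used as an index mask */
};

static void demo_wq_init(struct demo_wq *wq, uint32_t size)
{
    /* The mask trick only works for power-of-two sizes. */
    assert(size > 0 && size <= 65536 && (size & (size - 1)) == 0);
    wq->sz_m1 = (uint16_t)(size - 1);
}

/* Number of entries in the ring. */
static uint32_t demo_wq_get_size(const struct demo_wq *wq)
{
    return (uint32_t)wq->sz_m1 + 1;
}

/* Map a free-running counter to an entry index. */
static uint16_t demo_wq_ctr2ix(const struct demo_wq *wq, uint16_t ctr)
{
    return ctr & wq->sz_m1;
}
```

With the accessors in place, the internal representation can change without touching every caller.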

25 May, 2018 · 1 commit

net/mlx5e: Move port speed code from en_ethtool.c to en/port.c · 2c81bfd5

Committed by Huy Nguyen on Feb 22, 2018

Move the four functions below from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without the ethtool link mode dependency.
  u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
  int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
  u32 mlx5e_port_speed2linkmodes(u32 speed);

Delete the speed field from table mlx5e_build_ptys2ethtool_map. This
table only keeps the mapping between the mlx5e link mode and
ethtool link mode. Add new table mlx5e_link_speed for translation
from mlx5e link mode to actual speed.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
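
The idea of a dedicated link-mode-to-speed table can be sketched as below. The link modes, bit positions and speeds are illustrative assumptions (the real mlx5e_link_speed contents are not shown in this log), and all `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link modes; each mode owns one bit in a protocol mask. */
enum demo_link_mode {
    DEMO_1000BASE_KX,
    DEMO_10GBASE_KR,
    DEMO_10GBASE_CR,
    DEMO_100GBASE_CR4,
    DEMO_LINK_MODES_NUMBER,
};

/* Speed in Mb/s per link mode (illustrative values). */
static const uint32_t demo_link_speed[DEMO_LINK_MODES_NUMBER] = {
    [DEMO_1000BASE_KX]  = 1000,
    [DEMO_10GBASE_KR]   = 10000,
    [DEMO_10GBASE_CR]   = 10000,
    [DEMO_100GBASE_CR4] = 100000,
};

static uint32_t demo_mode_speed(unsigned int mode)
{
    assert(mode < DEMO_LINK_MODES_NUMBER);
    return demo_link_speed[mode];
}

/* Speed of the first link mode set in the operational mask, 0 if none. */
static uint32_t demo_ptys2speed(uint32_t eth_proto_oper)
{
    for (unsigned int i = 0; i < DEMO_LINK_MODES_NUMBER; i++)
        if (eth_proto_oper & (1u << i))
            return demo_mode_speed(i);
    return 0;
}

/* Mask of every link mode that runs at the given speed. */
static uint32_t demo_speed2linkmodes(uint32_t speed)
{
    uint32_t modes = 0;

    for (unsigned int i = 0; i < DEMO_LINK_MODES_NUMBER; i++)
        if (demo_mode_speed(i) == speed)
            modes |= 1u << i;
    return modes;
}
```

Keeping speed in its own table means code that only needs a speed does not have to depend on the ethtool link mode mapping.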

18 May, 2018 · 2 commits

net/mlx5e: Use shared table for offloaded TC eswitch flows · 655dc3d2

Committed by Or Gerlitz on Apr 10, 2018

Currently, each representor netdev uses its own hash table to keep
the mapping from a TC flow (f->cookie) to the driver's offloaded instance.
The table is the one that was originally added for offloading TC NIC
(not eswitch) rules.

This scheme breaks when the core TC code calls us to add the same flow
twice (e.g. under the egdev use case), since we don't spot that and offload
a second flow into the HW with the wrong source vport.

As a pre-step to solving that, we move to a single table that keeps
all offloaded TC eswitch flows. The table is located in the eswitch
uplink representor object.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add ingress/egress indication for offloaded TC flows · 60bd4af8

Committed by Or Gerlitz on Apr 18, 2018

When an e-switch TC rule is offloaded through the egdev (egress
device) mechanism, we treat it as egress; all other cases (NIC
and e-switch) are considered ingress.

This is a preparation step that will allow us to identify "wrong"
stat/del offload calls made by the TC core on egdev based flows and
ignore them.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

15 May, 2018 · 2 commits

net/mlx5e: Use __set_bit for adaptive-moderation bit in RQ state · af5a6c93

Committed by Gal Pressman on Jan 23, 2018

Make the code clearer by replacing the existing code with __set_bit.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
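
For readers unfamiliar with it, __set_bit() is the kernel's non-atomic bit-set helper, appropriate when nothing else can modify the word concurrently. The user-space sketch below mimics its effect; `demo_set_bit()` and `demo_test_bit()` are hypothetical stand-ins, not the kernel implementation, and the open-coded form that this commit replaced is not shown in the log.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Non-atomic: a plain read-modify-write of the word holding bit nr. */
static void demo_set_bit(unsigned int nr, unsigned long *addr)
{
    assert(addr != NULL);
    addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* Returns 1 if bit nr is set, 0 otherwise. */
static int demo_test_bit(unsigned int nr, const unsigned long *addr)
{
    assert(addr != NULL);
    return (int)((addr[nr / DEMO_BITS_PER_LONG] >>
                  (nr % DEMO_BITS_PER_LONG)) & 1UL);
}
```

Naming the operation makes the intent (set one state flag) easier to read than a shift-and-or expression.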

net/mlx5e: Report all channels with min RX WQEs timeout · 1e7477ae

Committed by Eran Ben Elisha on Mar 28, 2018

Report all channels that timed out while posting the minimal number of
RX WQEs, not only the first one. Avoid a busy wait on every channel:
once the check of one RQ times out, poll only once for the remaining RQs.

In addition, add the channel index to the log message when getting the
minimal RX WQEs fails. This info is needed to debug a dysfunctional channel.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

05 May, 2018 · 1 commit

net/mlx5: Cleanup unused field in Work Queue parameters · 6fa242af

Committed by Tariq Toukan on Feb 18, 2018

Remove the 'linear' field from struct mlx5_wq_param.
It is redundant: set but never read.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

01 May, 2018 · 3 commits

net/mlx5e: TLS, Add error statistics · 43585a41

Committed by Ilya Lesokhin on Apr 30, 2018

Add statistics for rare TLS-related errors.
Since the errors are rare, we have a counter per netdev
rather than per SQ.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TLS, Add Innova TLS TX offload data path · bf239741

Committed by Ilya Lesokhin on Apr 30, 2018

Implement the TLS TX offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

A special metadata ethertype is used to pass information to
the hardware.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TLS, Add Innova TLS TX support · c83294b9

Committed by Ilya Lesokhin on Apr 30, 2018

Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the
TLS generic NIC offload infrastructure.
The NETIF_F_HW_TLS_TX capability will be added in the next patch.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Apr, 2018 · 2 commits

net/mlx5e: Enable adaptive-TX moderation · cbce4f44

Committed by Tal Gilboa on Apr 24, 2018

Add support for adaptive TX moderation. This greatly reduces the TX interrupt
rate and increases bandwidth, mostly TCP bandwidth on ARM
architecture (below). There is a slight degradation for single-stream TCP
with very large message sizes (x86). In this case, any moderation on
transmitted packets reduces the bandwidth due to hitting the TCP output limit.
Since this is a synthetic case, the change is still worth doing.

Performance improvement (ConnectX-4Lx 40GbE, ARM)
TCP 64B bandwidth with 1-50 streams increased 6-35%.
TCP 64B bandwidth with 100-500 streams increased 20-70%.

Performance improvement (ConnectX-5 100GbE, x86)
Bandwidth: increased up to 40% (1024B with 10s of streams).
Interrupt rate: reduced up to 50% (1024B with 1000s of streams).

Performance degradation (ConnectX-5 100GbE, x86)
Bandwidth: up to 10% decrease single stream TCP (1MB message size from
51Gb/s to 47Gb/s).
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dim: Rename *_get_profile() functions to *_get_rx_moderation() · 026a807c

Committed by Tal Gilboa on Apr 24, 2018

Preparation for introducing adaptive TX to net DIM.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Apr, 2018 · 2 commits

mlx5: use page_pool for xdp_return_frame call · 60bbf7ee

Committed by Jesper Dangaard Brouer on Apr 17, 2018

This patch shows how it is possible to have both the driver-local page
cache, which uses an elevated refcnt for "catching"/avoiding the case where
SKB put_page returns the page through the page allocator, and at the
same time have pages returned to the page_pool from
ndo_xdp_xmit DMA completion.

The performance improvement for XDP_REDIRECT in this patch is really
good, especially considering that (currently) the xdp_return_frame
API and page_pool_put_page() do per-frame operations of both
rhashtable ID-lookup and locked return into the (page_pool) ptr_ring.
(The plan is to remove these per-frame operations in a follow-up
patchset.)

The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
with xdp_redirect_map (using devmap). The target/maximum
capability of ixgbe is 13Mpps (on this HW setup).

Before this patch for mlx5, XDP redirected frames were returned via
the page allocator.  The single-flow performance was 6Mpps, and when I
started two flows the collective performance dropped to 4Mpps, because we
hit the page allocator lock (further negative scaling occurs).

Two test scenarios need to be covered for the xdp_return_frame API:
DMA-TX completion running on the same CPU, and cross-CPU free/return.
Results were same-CPU=10Mpps and cross-CPU=12Mpps.  This is very
close to our 13Mpps max target.

The reason the max target isn't reached in the cross-CPU test is likely
the RX-ring DMA unmap/map overhead (which doesn't occur in ixgbe to
ixgbe testing).  It is also planned to remove this unnecessary DMA
unmap in a later patchset.

V2: Adjustments requested by Tariq
 - Changed page_pool_create to never return NULL, only
   ERR_PTR, as this simplifies error handling in drivers.
 - Save a branch in mlx5e_page_release
 - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ

V5: Updated patch desc

V8: Adjust for b0cedc84 ("net/mlx5e: Remove rq_headroom field from params")
V9:
 - Adjust for 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
 - Adjust for 73281b78 ("net/mlx5e: Derive Striding RQ size from MTU")
 - Correct handling if page_pool_create fails for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ

V10: Req from Tariq
 - Change pool_size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: register a memory model when XDP is enabled · 84f5e3fb

Committed by Jesper Dangaard Brouer on Apr 17, 2018

Now all the users of ndo_xdp_xmit have been converted to use xdp_return_frame.
This enables a different memory model, thus activating another code path
in the xdp_return_frame API.

V2: Fixed issues pointed out by Tariq.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Apr, 2018 · 1 commit

net/mlx5: Mkey creation command adjustments · cdbd0d2b

Committed by Ariel Levkovich on Apr 05, 2018

This change updates the mlx5 interface for creating an mkey
on the device.

The updates in the command mailbox include widening the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC, which represents the device
memory access type and will be used when registering an MR on allocated
device memory.

All the places that use the old access mode format are adjusted as
well.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

03 Apr, 2018 · 1 commit

net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth · 33523a36

Committed by Tal Gilboa on Mar 30, 2018

Use the new pcie_bandwidth_available() function to calculate the maximum
available bandwidth through the PCI chain instead of computing it ourselves
with mlx5e_get_pci_bw().

This is used to detect when the device is capable of more bandwidth than is
available in the current slot.  The driver may adjust compression settings
accordingly.

Note that pcie_bandwidth_available() accounts for PCIe encoding overhead, so
it is more accurate than mlx5e_get_pci_bw() was.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: remove mlx5e_get_pci_bw() wrapper altogether]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

02 Apr, 2018 · 1 commit

net/mlx5e: Set EQE based as default TX interrupt moderation mode · 48bfc397

Committed by Tal Gilboa on Mar 30, 2018

The default TX moderation mode was mistakenly set to CQE based. The
intention was to add a control ability in order to improve some specific
use-cases. In general, we prefer to use EQE based moderation as it gives
much better numbers for the common cases.

CQE based causes a degradation in the common case since it resets the
moderation timer on CQE generation. This causes an issue when TSO is
well utilized (large TSO sessions). The timer is set to 16us so traffic
of ~64KB TSO sessions per second would mean timer reset (CQE per TSO
session -> long time between CQEs). In this case we quickly reach the
tcp_limit_output_bytes (256KB by default) and cause a halt in TX traffic.

By setting EQE based moderation we make sure the timer expires after
16us regardless of the packet rate.
This fixes degradations of up to 40% in packet rate and up to 23% in bandwidth.

Fixes: 0088cbbc ("net/mlx5e: Enable CQE based moderation on TX CQ")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
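
The difference between the two timer modes can be illustrated with a toy model. It assumes only what the message above states: in CQE based mode every completion restarts the moderation timer, while in EQE based mode the timer expires a fixed period after it was started. The function name and the microsecond timestamps are hypothetical, and frame-count thresholds are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the time (us) at which the moderation timer fires, given sorted
 * completion timestamps. Simplified model, not hardware-accurate. */
static unsigned int demo_irq_time_us(const unsigned int *cqe_us, size_t n,
                                     unsigned int timer_us, bool cqe_based)
{
    unsigned int deadline;

    assert(cqe_us != NULL && n > 0);

    deadline = cqe_us[0] + timer_us;   /* timer starts on the first CQE */
    if (!cqe_based)
        return deadline;               /* EQE based: never pushed back */

    for (size_t i = 1; i < n; i++) {
        if (cqe_us[i] >= deadline)
            break;                     /* timer expired before this CQE */
        deadline = cqe_us[i] + timer_us;   /* CQE based: timer restarts */
    }
    return deadline;
}
```

With completions every 10us and a 16us timer, the CQE based deadline keeps moving forward, which matches the stall described in the commit message.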

31 Mar, 2018 · 7 commits

net/mlx5e: Keep single pre-initialized UMR WQE per RQ · b8a98a4c

Committed by Tariq Toukan on Dec 20, 2017

All UMR WQEs of an RQ share many common fields. We use
pre-initialized structures to save calculations in datapath.
One field (xlt_offset) was the only reason we saved a pre-initialized
copy per WQE index.
Here we remove its initialization (move its calculation to datapath),
and reduce the number of copies to one-per-RQ.

A very small datapath calculation is added. It occurs once per MPWQE
(i.e. once every 256KB), but it reduces memory consumption and gives
better cache utilization.

Performance testing:
Tested packet rate, no degradation observed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Support XDP over Striding RQ · 22f45398

Committed by Tariq Toukan on Feb 07, 2018

Add XDP support over Striding RQ.
Now that linear SKB is supported over Striding RQ,
we can support XDP by setting the stride size to PAGE_SIZE
and the headroom to XDP_PACKET_HEADROOM.

Upon freeing an MPWQE, do not release pages that are still in
XDP transmit; they will be released upon completion.

Striding RQ is capable of a higher packet-rate than
conventional RQ.
A performance gain is expected for all cases that had
a HW packet-rate bottleneck. This is the case whenever
using many flows that distribute to many cores.

Performance testing:
ConnectX-5, 24 rings, default MTU.
CQE compression ON (to reduce completions BW in PCI).

XDP_DROP packet rate:
--------------------------------------------------
| pkt size | XDP rate   | 100GbE linerate | pct% |
--------------------------------------------------
|   64byte | 126.2 Mpps |      148.0 Mpps |  85% |
|  128byte |  80.0 Mpps |       84.8 Mpps |  94% |
|  256byte |  42.7 Mpps |       42.7 Mpps | 100% |
|  512byte |  23.4 Mpps |       23.4 Mpps | 100% |
--------------------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use linear SKB in Striding RQ · 619a8f2a

Committed by Tariq Toukan on Feb 07, 2018

The current Striding RQ HW feature utilizes the RX buffers so that
there is no wasted room between the strides. This maximises
memory utilization.
It also prevents the use of build_skb() (which requires headroom
and tailroom), and requires a memcpy of the packet headers into
the skb linear part.

In this patch, whenever a set of conditions holds, we apply
an RQ configuration that allows combining the use of linear SKB
on top of a Striding RQ.

To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.

We can satisfy 1 and 2 by configuring:
	stride size = MTU + headroom + tailroom.

This is possible only when:
a. (MTU + headroom + tailroom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.

Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data resides contiguously in the linear part, and is not split into
  a small amount (linear part) and a large amount (fragment).
  This saves datapath cycles in driver and improves utilization
  of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding
  assumption of equal-size fragments.

Some implementation details:
HW writes the packets to the beginning of a stride,
i.e. does not keep headroom. To overcome this we make sure we can
extend backwards and use the last bytes of stride i-1.
Extra care is needed for stride 0 as it has no preceding stride.
We make sure headroom bytes are available by shifting the buffer
pointer passed to HW by headroom bytes.

This configuration now becomes default, whenever capable.
Of course, this implies turning LRO off.

Performance testing:
ConnectX-5, single core, single RX ring, default MTU.

UDP packet rate, early drop in TC layer:

--------------------------------------------
| pkt size | before    | after     | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
|  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
|   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------

TCP streams: ~20% gain
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
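
The eligibility check for linear SKB over Striding RQ can be sketched as below, following the stride size formula given in the message (MTU + headroom + tailroom) and the two stated conditions. The struct and function names are hypothetical, the sample headroom and tailroom values are arbitrary, and any rounding the driver applies to the stride size is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration snapshot, all sizes in bytes. */
struct demo_rq_cfg {
    unsigned int mtu;
    unsigned int headroom;
    unsigned int tailroom;
    unsigned int page_size;
    bool hw_lro;
};

/* Stride size needed so that each packet carries its own headroom and
 * tailroom inside the stride. */
static unsigned int demo_stride_size(const struct demo_rq_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->mtu + cfg->headroom + cfg->tailroom;
}

/* Linear SKB is possible only if the stride fits in one page (so a packet
 * never crosses a page boundary) and HW LRO is off. */
static bool demo_linear_skb_possible(const struct demo_rq_cfg *cfg)
{
    return !cfg->hw_lro && demo_stride_size(cfg) <= cfg->page_size;
}
```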

net/mlx5e: Use inline MTTs in UMR WQEs · ea3886ca

Committed by Tariq Toukan on Jul 10, 2017

When modifying the page mapping of a HW memory region
(via a UMR post), post the new values inlined in the WQE
instead of using a data pointer.

This is a micro-optimization: inline UMR WQEs of different
rings scale better in HW.

In addition, this obsoletes a few control flows and helps
delete ~50 LOC.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Derive Striding RQ size from MTU · 73281b78

Committed by Tariq Toukan on Feb 11, 2018

In Striding RQ, each WQE serves multiple packets
(hence the name Multi-Packet WQE, MPWQE).
The size of an MPWQE is constant (currently 256KB).

Upon a ringparam set operation, we calculate the number of
MPWQEs per RQ. To do so, we first need to determine the
number of packets that can reside within a single MPWQE.
In this patch we use the actual MTU size instead of ETH_DATA_LEN
for this calculation.

This implies that a change in MTU might require a change
in Striding RQ ring size.

In addition, this obsoletes some WQEs-to-packets translation
functions and helps delete ~60 LOC.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
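
The dependence of the ring size on the MTU can be illustrated with simple arithmetic. The sketch assumes each packet occupies a fixed number of bytes derived from the MTU (how the driver rounds this is not stated in the message) and that the requested ring size is given in packets; the function names are hypothetical.

```c
#include <assert.h>

/* How many packets fit in one MPWQE of mpwqe_bytes bytes. */
static unsigned int demo_pkts_per_mpwqe(unsigned int mpwqe_bytes,
                                        unsigned int bytes_per_pkt)
{
    assert(bytes_per_pkt > 0 && bytes_per_pkt <= mpwqe_bytes);
    return mpwqe_bytes / bytes_per_pkt;
}

/* How many MPWQEs are needed to hold ring_pkts packets (rounded up). */
static unsigned int demo_num_mpwqes(unsigned int ring_pkts,
                                    unsigned int mpwqe_bytes,
                                    unsigned int bytes_per_pkt)
{
    unsigned int pkts = demo_pkts_per_mpwqe(mpwqe_bytes, bytes_per_pkt);

    return (ring_pkts + pkts - 1) / pkts;
}
```

For a 256KB MPWQE and a 1024-packet ring, 2KB per packet needs 8 MPWQEs, while 16KB per packet needs 64, which is why a change in MTU may require a change in the ring size.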

net/mlx5e: Save MTU in channels params · 472a1e44

Committed by Tariq Toukan on Mar 12, 2018

Knowing the MTU is required for the RQ creation flow.
By design, the channels creation flow is totally isolated
from priv/netdev, and can be completed with access to the
channels params and mdev only.
Adding the MTU to the channels params helps preserve that.
In addition, we save it in the RQ to make its access faster in
datapath checks.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use eq ptr from cq · 7b2117bb

Committed by Saeed Mahameed on Feb 01, 2018

Instead of looking for the EQ of the CQ, remove that redundant code and
use the eq pointer stored in the cq struct.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

28 Mar, 2018 · 9 commits

net/mlx5e: Recover Send Queue (SQ) from error state · db75373c

Committed by Eran Ben Elisha on Dec 26, 2017

An error TX completion (CQE) which arrived on a specific SQ indicates
that this SQ got moved by the hardware to error state, which means all
pending and incoming TX requests are dropped or will be dropped and no
further "Good" CQEs will be generated for that SQ.

Before this patch, error TX completions (CQEs) were not monitored and were
handled as regular CQEs. This caused the SQ to stay in an error state,
making it useless for transmitting new packets.

Mitigation plan:
In case of an error completion, schedule a recovery work which does
the following:
- Mark the TXQ as DRV_XOFF to stop new packets from arriving from the
  stack.
- Have NAPI flush all pending SQ WQEs (via the flush_in_error_en bit) to
  release SW and HW resources (SKB, DMA, etc.) and have the SQ and CQ
  consumer/producer indices synced.
- Modify the SQ state ERR -> RST -> RDY (restart the SQ).
- Reactivate the SQ and reset the SQ cc and pc.

If we identify two consecutive SQ recovery requests less than
500 msecs apart, drop the recovery request to avoid CPU overload, as this
scenario most likely results from a severe repeated bug.

In addition, add an SQ recovery SW counter to monitor successful recoveries.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
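
The recovery throttling and the ERR -> RST -> RDY transition can be sketched as a small state machine. This is a model under stated assumptions, not the driver's code: all `demo_*` names are hypothetical, time is passed in as milliseconds, the 500 ms window is measured from the last accepted recovery, and the flush and TXQ stop/start steps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_RECOVER_MIN_GAP_MS 500

enum demo_sq_state { DEMO_SQ_RDY, DEMO_SQ_ERR, DEMO_SQ_RST };

struct demo_sq {
    enum demo_sq_state state;
    uint16_t pc, cc;            /* producer / consumer counters */
    bool has_recovered;
    uint64_t last_recover_ms;
    unsigned int recover_cnt;   /* successful recoveries */
};

/* Returns true if the SQ was restarted, false if the request was ignored
 * (SQ not in error) or dropped (too soon after the previous recovery). */
static bool demo_sq_recover(struct demo_sq *sq, uint64_t now_ms)
{
    assert(sq != NULL);

    if (sq->state != DEMO_SQ_ERR)
        return false;
    if (sq->has_recovered &&
        now_ms - sq->last_recover_ms < DEMO_RECOVER_MIN_GAP_MS)
        return false;

    sq->state = DEMO_SQ_RST;    /* ERR -> RST */
    sq->pc = 0;
    sq->cc = 0;
    sq->state = DEMO_SQ_RDY;    /* RST -> RDY */

    sq->has_recovered = true;
    sq->last_recover_ms = now_ms;
    sq->recover_cnt++;
    return true;
}
```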

net/mlx5e: Move all TX timeout logic to be under state lock · bfc647d5

Committed by Eran Ben Elisha on Jan 16, 2018

Driver callback for handling TX timeout should access some internal
resources (SQ, CQ) in order to decide if the tx timeout work should be
scheduled.  These resources might be unavailable if channels are closed
in parallel (ifdown for example).

The state lock is the mechanism to protect from such races.
Move all TX timeout logic to be in the work under a state lock.

In addition, move the work from the global WQ to the mlx5e WQ to make sure
this work is flushed when the device is detached.

Also, move the mlx5e_tx_timeout_work code to be next to the TX timeout
NDO for better code locality.

Fixes: 3947ca18 ("net/mlx5e: Implement ndo_tx_timeout callback")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused max inline related code · c4554fbc

Committed by Gal Pressman on Jan 21, 2018

Commit 58d52291 ("net/mlx5e: Support TX packet copy into WQE")
introduced the max inline WQE as an ethtool tunable. One commit later,
that functionality was made dependent on BlueFlame.

Commit 6982ab60 ("net/mlx5e: Xmit, no write combining") removed
BlueFlame support, and with it the max inline WQE.
This patch cleans up the leftovers from the removed feature.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add ethtool priv-flag for Striding RQ · 2ccb0a79

Committed by Tariq Toukan on Feb 07, 2018

Add a private control flag in ethtool to enable/disable the
Striding RQ feature.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Do not reset Receive Queue params on every type change · 2a0f561b

Committed by Tariq Toukan on Feb 18, 2018

Do not implicitly call mlx5e_init_rq_type_params() upon every
change in RQ type. It should be called only on channels creation.

Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove rq_headroom field from params · b0cedc84

Committed by Tariq Toukan on Feb 07, 2018

It can be derived from other params, so calculate it
via the dedicated function when needed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove RQ MPWQE fields from params · f1e4fc9b

Committed by Tariq Toukan on Feb 07, 2018

Introduce functions to calculate them when needed.
They can be derived from other params.
This will simplify the transition between RQ configurations.

In general, any parameter that is not explicitly set
or controlled, but derived from other parameters,
should not have a control-path field of its own, but a
getter function.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Disable Striding RQ when PCI is slower than link · 291f445e

Committed by Tariq Toukan on Feb 11, 2018

We turn the feature off for servers with PCI BW bounded
by a threshold (16G) and lower than MAX LINK BW.
This improves the effectiveness of the CQE compression feature,
which defaults to ON for the same case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Unify slow PCI heuristic · 0608d4db

Committed by Tariq Toukan on Jan 17, 2018

Move the link/PCI speed query and logic into a single function.
Unify the heuristics and use a single PCI threshold (16G) for all.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
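
A unified slow-PCI check along the lines described here might look like the sketch below. It is only a reading of the commit message: the names are hypothetical, bandwidths are assumed to be in Mb/s, "16G" is taken as 16000 Mb/s, and the comparison against the threshold is assumed to be strict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLOW_PCI_THRESHOLD_MBPS 16000u

/* Hypothetical snapshot of the two bandwidths, in Mb/s. */
struct demo_bw {
    uint32_t link_mbps;   /* maximum link speed */
    uint32_t pci_mbps;    /* available PCI bandwidth */
};

/* One heuristic shared by every feature that cares about a slow PCI:
 * PCI is below the threshold and also below the link speed. */
static bool demo_slow_pci(const struct demo_bw *bw)
{
    assert(bw != NULL);
    return bw->pci_mbps < DEMO_SLOW_PCI_THRESHOLD_MBPS &&
           bw->pci_mbps < bw->link_mbps;
}
```

Callers such as the CQE compression default and the Striding RQ default can then share one answer instead of each applying its own threshold.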

27 Mar, 2018 · 4 commits

net/mlx5e: Sync netdev vxlan ports at open · a117f73d

Committed by Shahar Klein on Mar 20, 2018

When mlx5_core is loaded it is expected to sync ports
with all vxlan devices so it can support vxlan encap/decap.
This is done via udp_tunnel_get_rx_info(). Currently this
call is set in mlx5e_nic_enable() and if the netdev is not in
NETREG_REGISTERED state it will not be called.

Normally on load the netdev state is not NETREG_REGISTERED
so udp_tunnel_get_rx_info() will not be called.

Move udp_tunnel_get_rx_info() to mlx5e_open() so that
it is called on the netdev UP event and allows encap/decap.

Fixes: 610e89e0 ("net/mlx5e: Don't sync netdev state when not registered")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Make eswitch support to depend on switchdev · f125376b

Committed by Or Gerlitz on Feb 15, 2018

Add a dependency on switchdev being configured, as any user-space control
plane SW is expected to use the HW switchdev ID to locate the representors
related to the VFs of a certain PF and apply SW/offloaded switching on them.

Fixes: e80541ec ('net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add VLAN offload features to hw_enc_features · 71186172

Committed by Aviv Heller on Aug 17, 2017

We support outer VLAN offload in the driver and HW regardless of whether
an encapsulation is present in the next headers.

Exposing this in hw_enc_features will allow us to offload outer VLANs
in cases where encapsulation protocols like VXLAN and IPsec are used.
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add a helper macro in set features ndo · be0f780b

Committed by Gal Pressman on Jan 11, 2018

Add a new macro to prevent copy-pasting the same code for each new
feature.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
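
The pattern of wrapping per-feature handling in a macro can be sketched as follows. Everything here is a hypothetical illustration (the `demo_*` and `DEMO_*` names, the feature bits and the handlers); the actual macro added by this commit is not shown in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_F_LRO  (1u << 0)
#define DEMO_F_VLAN (1u << 1)

struct demo_dev {
    uint32_t features;        /* currently active features */
    unsigned int lro_calls;
    unsigned int vlan_calls;
};

static int demo_set_lro(struct demo_dev *dev, bool enable)
{
    (void)enable;
    dev->lro_calls++;
    return 0;
}

static int demo_set_vlan(struct demo_dev *dev, bool enable)
{
    (void)enable;
    dev->vlan_calls++;
    return 0;
}

/* Call the handler only if this feature bit actually changes, and update
 * the active set only when the handler succeeds. */
static int demo_handle_feature(struct demo_dev *dev, uint32_t wanted,
                               uint32_t feature,
                               int (*handler)(struct demo_dev *, bool))
{
    bool enable = (wanted & feature) != 0;
    int err;

    if (!((wanted ^ dev->features) & feature))
        return 0;

    err = handler(dev, enable);
    if (err)
        return err;

    if (enable)
        dev->features |= feature;
    else
        dev->features &= ~feature;
    return 0;
}

/* The macro hides the arguments that are identical for every feature. */
#define DEMO_HANDLE_FEATURE(feature, handler) \
    demo_handle_feature(dev, wanted, feature, handler)

static int demo_set_features(struct demo_dev *dev, uint32_t wanted)
{
    int err = 0;

    assert(dev != NULL);
    err |= DEMO_HANDLE_FEATURE(DEMO_F_LRO, demo_set_lro);
    err |= DEMO_HANDLE_FEATURE(DEMO_F_VLAN, demo_set_vlan);
    return err ? -1 : 0;
}
```

Adding a new feature then takes one line in `demo_set_features()` rather than another copy of the changed-bit check and error handling.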