Commits · 43585a41bd894925abe5015edbce475beb3d8c10 · gsplhtlxg / clone-Linux

01 May, 2018 3 commits

net/mlx5e: TLS, Add error statistics · 43585a41

Authored by Ilya Lesokhin on Apr 30, 2018

Add statistics for rare TLS-related errors.
Since the errors are rare, we keep one counter per netdev
rather than per SQ.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TLS, Add Innova TLS TX offload data path · bf239741

Authored by Ilya Lesokhin on Apr 30, 2018

Implement the TLS TX offload data path according to the
requirements of the TLS generic NIC offload infrastructure.

A special metadata ethertype is used to pass information to
the hardware.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TLS, Add Innova TLS TX support · c83294b9

Authored by Ilya Lesokhin on Apr 30, 2018

Add NETIF_F_HW_TLS_TX capability and expose tlsdev_ops to work with the
TLS generic NIC offload infrastructure.
The NETIF_F_HW_TLS_TX capability will be added in the next patch.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Apr, 2018 2 commits

net/mlx5e: Enable adaptive-TX moderation · cbce4f44

Authored by Tal Gilboa on Apr 24, 2018

Add support for adaptive TX moderation. This greatly reduces the TX interrupt
rate and increases bandwidth, mostly for TCP over the ARM architecture
(see below). There is a slight degradation for single-stream TCP with very
large message sizes (x86): with any moderation on transmitted packets, the
bandwidth drops because the TCP output limit is hit.
Since this is a synthetic case, the change is still worth making.

Performance improvement (ConnectX-4Lx 40GbE, ARM)
TCP 64B bandwidth with 1-50 streams increased 6-35%.
TCP 64B bandwidth with 100-500 streams increased 20-70%.

Performance improvement (ConnectX-5 100GbE, x86)
Bandwidth: increased up to 40% (1024B with 10s of streams).
Interrupt rate: reduced up to 50% (1024B with 1000s of streams).

Performance degradation (ConnectX-5 100GbE, x86)
Bandwidth: up to 10% decrease single stream TCP (1MB message size from
51Gb/s to 47Gb/s).
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dim: Rename *_get_profile() functions to *_get_rx_moderation() · 026a807c

Authored by Tal Gilboa on Apr 24, 2018

Preparation for introducing adaptive TX to net DIM.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Apr, 2018 2 commits

mlx5: use page_pool for xdp_return_frame call · 60bbf7ee

Authored by Jesper Dangaard Brouer on Apr 17, 2018

This patch shows how it is possible to have both the driver-local page
cache, which uses an elevated refcnt for "catching"/avoiding the case
where SKB put_page returns the page through the page allocator, and, at
the same time, have pages returned to the page_pool from
ndo_xdp_xmit DMA completion.

The performance improvement for XDP_REDIRECT in this patch is really
good, especially considering that (currently) the xdp_return_frame
API and page_pool_put_page() perform per-frame operations of both
rhashtable ID-lookup and locked return into the (page_pool) ptr_ring.
(The plan is to remove these per-frame operations in a follow-up
patchset.)

The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
with xdp_redirect_map (using devmap). The target/maximum
capability of ixgbe is 13Mpps (on this HW setup).

Before this patch for mlx5, XDP redirected frames were returned via
the page allocator.  The single flow performance was 6Mpps, and if I
started two flows the collective performance dropped to 4Mpps, because we
hit the page allocator lock (further negative scaling occurs).

Two test scenarios need to be covered, for xdp_return_frame API, which
is DMA-TX completion running on same-CPU or cross-CPU free/return.
Results were same-CPU=10Mpps, and cross-CPU=12Mpps.  This is very
close to our 13Mpps max target.

The reason the max target isn't reached in the cross-CPU test is likely
the RX-ring DMA unmap/map overhead (which doesn't occur in ixgbe to
ixgbe testing).  It is also planned to remove this unnecessary DMA
unmap in a later patchset.

V2: Adjustments requested by Tariq
 - Changed page_pool_create return codes to not return NULL, only
   ERR_PTR, as this simplifies err handling in drivers.
 - Save a branch in mlx5e_page_release
 - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ

V5: Updated patch desc

V8: Adjust for b0cedc84 ("net/mlx5e: Remove rq_headroom field from params")
V9:
 - Adjust for 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
 - Adjust for 73281b78 ("net/mlx5e: Derive Striding RQ size from MTU")
 - Correct handling if page_pool_create fails for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ

V10: Req from Tariq
 - Change pool_size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx5: register a memory model when XDP is enabled · 84f5e3fb

Authored by Jesper Dangaard Brouer on Apr 17, 2018

Now all the users of ndo_xdp_xmit have been converted to use xdp_return_frame.
This enables a different memory model, thus activating another code path
in the xdp_return_frame API.

V2: Fixed issues pointed out by Tariq.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

06 Apr, 2018 1 commit

net/mlx5: Mkey creation command adjustments · cdbd0d2b

Authored by Ariel Levkovich on Apr 05, 2018

This change updates the mlx5 interface for creating an mkey
on the device.

The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.

All the places that use the old access mode format are adjusted as
well.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

03 Apr, 2018 1 commit

net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth · 33523a36

Authored by Tal Gilboa on Mar 30, 2018

Use the new pcie_bandwidth_available() function to calculate maximum
available bandwidth through the PCI chain instead of computing it ourselves
with mlx5e_get_pci_bw().

This is used to detect when the device is capable of more bandwidth than is
available in the current slot.  The driver may adjust compression settings
accordingly.

Note that pcie_bandwidth_available() accounts for PCIe encoding overhead, so
it is more accurate than mlx5e_get_pci_bw() was.
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
[bhelgaas: remove mlx5e_get_pci_bw() wrapper altogether]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

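To make the encoding-overhead point concrete, here is a minimal Python sketch of how usable bandwidth along a PCIe chain could be estimated. It is not the kernel's pcie_bandwidth_available() implementation: the function names, the (speed, width) tuple layout and the example topology are assumptions made for illustration. Only the per-generation line encodings (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) are standard PCIe facts.

```python
# Illustrative only: estimate usable PCIe bandwidth along a chain of links.
# All names and the data layout are hypothetical, not kernel APIs.

# Usable fraction of the raw bit rate after line encoding, keyed by GT/s.
ENCODING_EFFICIENCY = {
    2.5: 8 / 10,     # Gen1: 8b/10b
    5.0: 8 / 10,     # Gen2: 8b/10b
    8.0: 128 / 130,  # Gen3: 128b/130b
}


def link_bandwidth_mbps(speed_gts, width):
    """Usable bandwidth of one link in Mb/s (speed in GT/s, width in lanes)."""
    return speed_gts * 1000 * ENCODING_EFFICIENCY[speed_gts] * width


def chain_bandwidth_mbps(links):
    """The slowest link between the device and the root limits the chain."""
    return min(link_bandwidth_mbps(speed, width) for speed, width in links)


# Hypothetical topology: a Gen3 x16 device behind a Gen3 x8 upstream port.
example_chain = [(8.0, 16), (8.0, 8)]
print(round(chain_bandwidth_mbps(example_chain)), "Mb/s usable")
```

Ignoring the encoding overhead would report the Gen3 x8 link as 64000 Mb/s, which overstates what the slot can actually carry.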

02 Apr, 2018 1 commit

net/mlx5e: Set EQE based as default TX interrupt moderation mode · 48bfc397

Authored by Tal Gilboa on Mar 30, 2018

The default TX moderation mode was mistakenly set to CQE based. The
intention was to add a control ability in order to improve some specific
use-cases. In general, we prefer to use EQE based moderation as it gives
much better numbers for the common cases.

CQE based causes a degradation in the common case since it resets the
moderation timer on CQE generation. This causes an issue when TSO is
well utilized (large TSO sessions). The timer is set to 16us so traffic
of ~64KB TSO sessions per second would mean timer reset (CQE per TSO
session -> long time between CQEs). In this case we quickly reach the
tcp_limit_output_bytes (256KB by default) and cause a halt in TX traffic.

By setting EQE based moderation we make sure the timer expires after
16us regardless of the packet rate.
This fixes degradations of up to 40% in packet rate and up to 23% in bandwidth.

Fixes: 0088cbbc ("net/mlx5e: Enable CQE based moderation on TX CQ")
Signed-off-by: Tal Gilboa <talgi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

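The difference between the two moderation modes can be illustrated with a small Python simulation. This is a deliberately simplified model, not the hardware behaviour: it assumes that in "cqe" mode every completion restarts the timer, that in "eqe" mode the timer starts at the first pending completion and is never restarted, and it ignores the packet-count threshold. The function name and the arrival pattern are made up for the example.

```python
# Simplified model of interrupt moderation timers (illustrative assumptions).

def interrupt_times(cqe_times_us, period_us, mode):
    """Return the times (in us) at which the moderation timer fires.

    mode "cqe": the timer restarts on every completion.
    mode "eqe": the timer starts at the first pending completion and is
                not restarted by later completions.
    """
    fired = []
    deadline = None
    for t in sorted(cqe_times_us):
        # Fire a timer that expired before this completion arrived.
        if deadline is not None and t >= deadline:
            fired.append(deadline)
            deadline = None
        # Arm the timer, or re-arm it on every completion in "cqe" mode.
        if deadline is None or mode == "cqe":
            deadline = t + period_us
    if deadline is not None:
        fired.append(deadline)
    return fired


# Hypothetical traffic: one completion every 10us for 100us, 16us timer.
arrivals = list(range(0, 100, 10))
print("cqe:", interrupt_times(arrivals, 16, "cqe"))
print("eqe:", interrupt_times(arrivals, 16, "eqe"))
```

In this model the "cqe" timer keeps being pushed out while completions arrive less than 16us apart, so the stack waits for the burst to end before it sees any completion, whereas the "eqe" timer fires at a steady pace.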

31 Mar, 2018 7 commits

net/mlx5e: Keep single pre-initialized UMR WQE per RQ · b8a98a4c

Authored by Tariq Toukan on Dec 20, 2017

All UMR WQEs of an RQ share many common fields. We use
pre-initialized structures to save calculations in datapath.
One field (xlt_offset) was the only reason we saved a pre-initialized
copy per WQE index.
Here we remove its initialization (move its calculation to datapath),
and reduce the number of copies to one-per-RQ.

This adds a very small datapath calculation that occurs once per MPWQE
(i.e. once every 256KB), but it reduces memory consumption and gives
better cache utilization.

Performance testing:
Tested packet rate, no degradation sensed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Support XDP over Striding RQ · 22f45398

Authored by Tariq Toukan on Feb 07, 2018

Add XDP support over Striding RQ.
Now that linear SKB is supported over Striding RQ,
we can support XDP by setting stride size to PAGE_SIZE
and headroom to XDP_PACKET_HEADROOM.

Upon a MPWQE free, do not release pages that are still in
XDP xmit; they will be released upon completion.

Striding RQ is capable of a higher packet-rate than
conventional RQ.
A performance gain is expected for all cases that had
a HW packet-rate bottleneck. This is the case whenever
using many flows that distribute to many cores.

Performance testing:
ConnectX-5, 24 rings, default MTU.
CQE compression ON (to reduce completions BW in PCI).

XDP_DROP packet rate:
--------------------------------------------------
| pkt size | XDP rate   | 100GbE linerate | pct% |
--------------------------------------------------
|   64byte | 126.2 Mpps |      148.0 Mpps |  85% |
|  128byte |  80.0 Mpps |       84.8 Mpps |  94% |
|  256byte |  42.7 Mpps |       42.7 Mpps | 100% |
|  512byte |  23.4 Mpps |       23.4 Mpps | 100% |
--------------------------------------------------
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use linear SKB in Striding RQ · 619a8f2a

Authored by Tariq Toukan on Feb 07, 2018

The current Striding RQ HW feature packs the RX buffers so that
there is no wasted room between the strides, which maximises
memory utilization.
However, this prevents the use of build_skb() (which requires headroom
and tailroom) and requires a memcpy of the packet headers into
the skb linear part.

In this patch, whenever a set of conditions holds, we apply
an RQ configuration that allows combining the use of linear SKB
on top of a Striding RQ.

To use build_skb() with Striding RQ, the following must hold:
1. packet does not cross a page boundary.
2. there is enough headroom and tailroom surrounding the packet.

We can satisfy 1 and 2 by configuring:
	stride size = MTU + headroom + tailroom.

This is possible only when:
a. (MTU + headroom + tailroom) does not exceed PAGE_SIZE.
b. HW LRO is turned off.

Using linear SKB has many advantages:
- Saves a memcpy of the headers.
- No page-boundary checks in datapath.
- No filler CQEs.
- Significantly smaller CQ.
- SKB data continuously resides in linear part, and not split to
  small amount (linear part) and large amount (fragment).
  This saves datapath cycles in driver and improves utilization
  of SKB fragments in GRO.
- The fragments of a resulting GRO SKB follow the IP forwarding
  assumption of equal-size fragments.

Some implementation details:
HW writes the packets to the beginning of a stride,
i.e. does not keep headroom. To overcome this we make sure we can
extend backwards and use the last bytes of stride i-1.
Extra care is needed for stride 0 as it has no preceding stride.
We make sure headroom bytes are available by shifting the buffer
pointer passed to HW by headroom bytes.

This configuration now becomes default, whenever capable.
Of course, this implies turning LRO off.

Performance testing:
ConnectX-5, single core, single RX ring, default MTU.

UDP packet rate, early drop in TC layer:

--------------------------------------------
| pkt size | before    | after     | ratio |
--------------------------------------------
| 1500byte | 4.65 Mpps | 5.96 Mpps | 1.28x |
|  500byte | 5.23 Mpps | 5.97 Mpps | 1.14x |
|   64byte | 5.94 Mpps | 5.96 Mpps | 1.00x |
--------------------------------------------

TCP streams: ~20% gain
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

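The eligibility rule for the linear-SKB configuration can be sketched in a few lines of Python. This is an illustration of the rule as the commit message states it, read as "the whole stride, headroom and tailroom included, must fit in one page and HW LRO must be off". The function name, the 4096-byte page size and the headroom/tailroom values used below are assumptions, not values taken from the driver.

```python
# Illustrative check for the linear-SKB Striding RQ configuration.
# Names and numeric values are hypothetical.

PAGE_SIZE = 4096  # assumed page size in bytes


def linear_skb_stride(mtu, headroom, tailroom, lro_enabled, page_size=PAGE_SIZE):
    """Return the stride size if linear SKB can be used, otherwise None."""
    stride = mtu + headroom + tailroom
    if lro_enabled:
        return None  # condition (b): HW LRO must be turned off
    if stride > page_size:
        return None  # condition (a): the stride must fit in one page
    return stride


# Hypothetical headroom/tailroom sizes, for illustration only.
print(linear_skb_stride(1500, 64, 320, lro_enabled=False))
print(linear_skb_stride(9000, 64, 320, lro_enabled=False))
```

With these assumed sizes a default-MTU ring qualifies, while a 9000-byte MTU does not and would fall back to the non-linear path.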

net/mlx5e: Use inline MTTs in UMR WQEs · ea3886ca

Authored by Tariq Toukan on Jul 10, 2017

When modifying the page mapping of a HW memory region
(via a UMR post), post the new values inlined in WQE,
instead of using a data pointer.

This is a micro-optimization: inline UMR WQEs of different
rings scale better in HW.

In addition, this obsoletes a few control flows and helps
delete ~50 LOC.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Derive Striding RQ size from MTU · 73281b78

Authored by Tariq Toukan on Feb 11, 2018

In Striding RQ, each WQE serves multiple packets
(hence called Multi-Packet WQE, MPWQE).
The size of a MPWQE is constant (currently 256KB).

Upon a ringparam set operation, we calculate the number of
MPWQEs per RQ. For this, we first need to determine the
number of packets that can reside within a single MPWQE.
In this patch we use the actual MTU size instead of ETH_DATA_LEN
for this calculation.

This implies that a change in MTU might require a change
in Striding RQ ring size.

In addition, this obsoletes some WQEs-to-packets translation
functions and helps delete ~60 LOC.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

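The effect of the MTU on the ring size can be shown with a short Python sketch. Only the 256KB MPWQE size comes from the commit message. The helper names are hypothetical, and rounding the per-packet stride up to a power of two is an assumption made to keep the arithmetic simple; the driver's actual stride calculation may differ.

```python
# Illustrative arithmetic: how many MPWQEs a ring needs for a given MTU.
# Helper names and the power-of-two stride rounding are assumptions.

MPWQE_SIZE = 256 * 1024  # bytes, "currently 256KB" per the commit message


def stride_for_mtu(mtu):
    """Assumed stride: the MTU rounded up to the next power of two."""
    return 1 << (mtu - 1).bit_length()


def packets_per_mpwqe(stride_size):
    """Number of full strides (packets) that fit in one MPWQE."""
    return MPWQE_SIZE // stride_size


def mpwqes_for_ring(ring_packets, mtu):
    """MPWQEs needed to hold ring_packets packets, rounded up."""
    per_wqe = packets_per_mpwqe(stride_for_mtu(mtu))
    return -(-ring_packets // per_wqe)  # ceiling division


for mtu in (1500, 9000):
    print(mtu, packets_per_mpwqe(stride_for_mtu(mtu)), mpwqes_for_ring(1024, mtu))
```

Under these assumptions the same requested ring of 1024 packets needs more MPWQEs at a larger MTU, which is why an MTU change can force a change in Striding RQ ring size.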

net/mlx5e: Save MTU in channels params · 472a1e44

Authored by Tariq Toukan on Mar 12, 2018

Knowing the MTU is required for the RQ creation flow.
By our design, the channels creation flow is totally isolated
from priv/netdev, and can be completed with access to
channels params and mdev.
Adding the MTU to the channels params helps preserve that.
In addition, we save it in the RQ to make its access faster in
datapath checks.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use eq ptr from cq · 7b2117bb

Authored by Saeed Mahameed on Feb 01, 2018

Instead of looking for the EQ of the CQ, remove that redundant code and
use the eq pointer stored in the cq struct.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

28 Mar, 2018 9 commits

net/mlx5e: Recover Send Queue (SQ) from error state · db75373c

Authored by Eran Ben Elisha on Dec 26, 2017

An error TX completion (CQE) arriving on a specific SQ indicates
that the hardware moved this SQ to the error state, which means all
pending and incoming TX requests are dropped or will be dropped and no
further "Good" CQEs will be generated for that SQ.

Before this patch, error TX completions (CQEs) were not monitored and were
handled as regular CQEs. This caused the SQ to stay in an error state,
making it useless for transmitting new packets.

Mitigation plan:
In case of an error completion, schedule a recovery work which would do
the following:
- Mark the TXQ as DRV_XOFF to stop new packets from arriving from the
  stack.
- Have NAPI flush all pending SQ WQEs (via the flush_in_error_en bit) to
  release SW and HW resources (SKB, DMA, etc.) and have the SQ and CQ
  consumer/producer indices synced.
- Modify the SQ state ERR -> RST -> RDY (restart the SQ).
- Reactivate the SQ and reset the SQ cc and pc.

If we identify two consecutive requests for SQ recover in less than
500 msecs, drop the recover request to avoid CPU overload, as this
scenario most likely happened due to a severe repeated bug.

In addition, add SQ recover SW counter to monitor successful recoveries.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

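The recovery steps and the 500 msec throttle described in this commit can be modelled as a small state machine. The Python sketch below is a toy model under stated assumptions, not the driver code: the class, attribute and method names are hypothetical, and the flush is reduced to clearing a list.

```python
# Toy model of SQ error recovery with a minimum interval between attempts.
# All names are hypothetical; only the steps follow the commit message.
import enum


class SQState(enum.Enum):
    RDY = "ready"
    ERR = "error"
    RST = "reset"


RECOVER_MIN_INTERVAL_MS = 500


class SendQueue:
    def __init__(self):
        self.state = SQState.RDY
        self.stopped = False          # models DRV_XOFF on the TXQ
        self.pending = []             # WQEs posted but not yet completed
        self.pc = 0                   # producer counter
        self.cc = 0                   # consumer counter
        self.last_recover_ms = None
        self.recover_count = 0        # models the SQ recover SW counter

    def on_error_cqe(self):
        self.state = SQState.ERR

    def recover(self, now_ms):
        """Try to recover the SQ; return True if a recovery was performed."""
        if self.state is not SQState.ERR:
            return False
        recent = (self.last_recover_ms is not None
                  and now_ms - self.last_recover_ms < RECOVER_MIN_INTERVAL_MS)
        if recent:
            return False              # drop the request to avoid CPU overload
        self.stopped = True           # 1. stop new packets from the stack
        self.pending.clear()          # 2. flush pending WQEs
        self.state = SQState.RST      # 3. ERR -> RST -> RDY
        self.state = SQState.RDY
        self.pc = self.cc = 0         # 4. reset counters and reactivate
        self.stopped = False
        self.last_recover_ms = now_ms
        self.recover_count += 1
        return True
```

A second error arriving within 500 msec of a successful recovery leaves the queue in the error state in this model, matching the commit's reasoning that such a pattern most likely indicates a severe repeated bug.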

net/mlx5e: Move all TX timeout logic to be under state lock · bfc647d5

Authored by Eran Ben Elisha on Jan 16, 2018

The driver callback for handling TX timeout needs to access some internal
resources (SQ, CQ) in order to decide whether the TX timeout work should be
scheduled.  These resources might be unavailable if channels are closed
in parallel (ifdown for example).

The state lock is the mechanism to protect from such races.
Move all TX timeout logic into the work, under the state lock.

In addition, move the work from the global WQ to the mlx5e WQ to make sure
this work is flushed when the device is detached.

Also, move the mlx5e_tx_timeout_work code to be next to the TX timeout
NDO for better code locality.

Fixes: 3947ca18 ("net/mlx5e: Implement ndo_tx_timeout callback")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused max inline related code · c4554fbc

Authored by Gal Pressman on Jan 21, 2018

Commit 58d52291 ("net/mlx5e: Support TX packet copy into WQE")
introduced the max inline WQE as an ethtool tunable. One commit later,
that functionality was made dependent on BlueFlame.

Commit 6982ab60 ("net/mlx5e: Xmit, no write combining") removed
BlueFlame support, and with it the max inline WQE.
This patch cleans up the leftovers from the removed feature.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add ethtool priv-flag for Striding RQ · 2ccb0a79

Authored by Tariq Toukan on Feb 07, 2018

Add a private control flag in ethtool to enable/disable
the Striding RQ feature.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Do not reset Receive Queue params on every type change · 2a0f561b

Authored by Tariq Toukan on Feb 18, 2018

Do not implicitly call mlx5e_init_rq_type_params() upon every
change in RQ type. It should be called only on channels creation.

Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove rq_headroom field from params · b0cedc84

Authored by Tariq Toukan on Feb 07, 2018

It can be derived from other params, so calculate it
via the dedicated function when needed.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove RQ MPWQE fields from params · f1e4fc9b

Authored by Tariq Toukan on Feb 07, 2018

Introduce functions to calculate them when needed.
They can be derived from other params.
This will simplify the transition between RQ configurations.

In general, any parameter that is not explicitly set
or controlled, but derived from other parameters,
should not have a control-path field itself, but a
getter function.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Disable Striding RQ when PCI is slower than link · 291f445e

Authored by Tariq Toukan on Feb 11, 2018

We turn the feature off for servers whose PCI BW is bounded
by a threshold (16G) and is lower than the MAX LINK BW.
This improves the effectiveness of the CQE compression feature,
which defaults to ON for the same case.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

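The slow-PCI rule described here amounts to a two-part comparison. The Python sketch below is one possible reading of it, with assumptions stated up front: bandwidths are expressed in Mb/s, "bounded by" is taken as a strict less-than, and the function and dictionary key names are invented for the example.

```python
# Illustrative reading of the slow-PCI heuristic; names and units are assumed.

SLOW_PCI_THRESHOLD_MBPS = 16000  # the "16G" threshold from the commit message


def is_slow_pci(pci_bw_mbps, link_speed_mbps):
    """PCI counts as slow if it is under the threshold and under the link speed."""
    return (pci_bw_mbps < SLOW_PCI_THRESHOLD_MBPS
            and pci_bw_mbps < link_speed_mbps)


def default_rq_features(pci_bw_mbps, link_speed_mbps):
    """Striding RQ defaults off and CQE compression defaults on when PCI is slow."""
    slow = is_slow_pci(pci_bw_mbps, link_speed_mbps)
    return {"striding_rq": not slow, "cqe_compression": slow}


print(default_rq_features(pci_bw_mbps=8000, link_speed_mbps=25000))
print(default_rq_features(pci_bw_mbps=63000, link_speed_mbps=100000))
```

Tying both defaults to the same predicate is what keeps the two features from working against each other on a bandwidth-starved slot.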

net/mlx5e: Unify slow PCI heuristic · 0608d4db

Authored by Tariq Toukan on Jan 17, 2018

Move the link/PCI speed query and logic into a single function.
Unify the heuristics and use a single PCI threshold (16G) for all.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

27 Mar, 2018 6 commits

net/mlx5e: Sync netdev vxlan ports at open · a117f73d

Authored by Shahar Klein on Mar 20, 2018

When mlx5_core is loaded, it is expected to sync ports
with all vxlan devices so it can support vxlan encap/decap.
This is done via udp_tunnel_get_rx_info(). Currently this
call is made in mlx5e_nic_enable(), and it is skipped if the netdev
is not in the NETREG_REGISTERED state.

Normally on load the netdev state is not NETREG_REGISTERED,
so udp_tunnel_get_rx_info() will not be called.

Move udp_tunnel_get_rx_info() to mlx5e_open() so that
it is called on the netdev UP event, allowing encap/decap.

Fixes: 610e89e0 ("net/mlx5e: Don't sync netdev state when not registered")
Signed-off-by: Shahar Klein <shahark@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Make eswitch support to depend on switchdev · f125376b

Authored by Or Gerlitz on Feb 15, 2018

Add a dependency on switchdev being configured, as any user-space control
plane SW is expected to use the HW switchdev ID to locate the representors
related to VFs of a certain PF and apply SW/offloaded switching on them.

Fixes: e80541ec ("net/mlx5: Add CONFIG_MLX5_ESWITCH Kconfig")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add VLAN offload features to hw_enc_features · 71186172

Authored by Aviv Heller on Aug 17, 2017

We support outer VLAN offload in driver and HW regardless of whether
an encapsulation is present in the next headers.

Exposing this in hw_enc_features will allow us to offload outer VLANs
in cases where encapsulation protocols like VXLAN and IPsec are used.
Signed-off-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add a helper macro in set features ndo · be0f780b

Authored by Gal Pressman on Jan 11, 2018

Add a new macro to prevent copy-pasting the same code for each new
feature.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Make choose LRO timeout function static · 707129dc

Authored by Gal Pressman on Jan 31, 2018

The function is used in en_main.c only, so we can make it static and remove
its declaration from en.h.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add interface down dropped packets statistics · 7cbaf9a3

Authored by Moshe Shemesh on Feb 08, 2018

Add the following packet drop counter:
Rx interface down dropped packets - counts packets which were received
while the ETH interface was down.
This counter will be shown in ethtool as a new counter called
rx_if_down_packets.

The implementation allocates a q_counter for the drop rq, which gets all the
received traffic while the interface is down.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

20 Mar, 2018 1 commit

net/mlx5: Packet pacing enhancement · 05d3ac97

Authored by Bodong Wang on Mar 19, 2018

Add two new parameters: max_burst_sz and typical_pkt_size (both
in bytes) to rate limit configurations.

max_burst_sz: The device will schedule bursts of packets for an
SQ connected to this rate, smaller than or equal to this value.
Value 0x0 indicates packet bursts will be limited to the device
defaults. This field should be used if bursts of packets must be
strictly kept under a certain value.

typical_pkt_size: When the rate limit is intended for a stream of
similar packets, stating the typical packet size can improve the
accuracy of the rate limiter. The expected packet size will be
the same for all SQs associated with the same rate limit index.

The Ethernet driver is updated according to this change, but these two
parameters are kept as 0 for lack of a proper way to get the
configuration from user space, which would require changing the
ndo_set_tx_maxrate interface.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

21 Feb, 2018 2 commits

net/mlx5e: Specify numa node when allocating drop rq · 2f0db879

Authored by Gal Pressman on Jan 25, 2018

When allocating a drop rq, no numa node is explicitly set, which means
allocations are done on node zero. This is not necessarily the nearest
numa node to the HCA, and even worse, might be a memoryless numa
node.

Choose the numa_node given to us by the pci device in order to properly
allocate the coherent dma memory instead of assuming zero is valid.

Fixes: 556dd1b9 ("net/mlx5e: Set drop RQ's necessary parameters only")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Eliminate build warnings on no previous prototype · 9afe9a53

Authored by Or Gerlitz on Jan 01, 2018

Fix these gcc warnings on drivers/net/ethernet/mellanox/mlx5:

[..]/core/lib/clock.c:454:6: warning: no previous prototype for 'mlx5_init_clock' [-Wmissing-prototypes]
[..]/core/lib/clock.c:510:6: warning: no previous prototype for 'mlx5_cleanup_clock' [-Wmissing-prototypes]
[..]/core/en_main.c:3141:5: warning: no previous prototype for 'mlx5e_setup_tc' [-Wmissing-prototypes]
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

26 Jan, 2018 1 commit

mlx5: use tc_cls_can_offload_and_chain0() · 9ab88e83

Authored by Jakub Kicinski on Jan 25, 2018

Make use of tc_cls_can_offload_and_chain0() to set the extack message in case
the ethtool tc offload flag is not set or the chain is unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jan, 2018 4 commits

net/mlx5e: Extend the stats group API to have update_stats() · 19386177

Authored by Kamal Heib on Nov 28, 2017

Extend the stats group API to have an update_stats() callback which
will be used to fetch the hardware or software counters data.
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add per-channel counters infrastructure, use it upon TX timeout · 57d689a8

Authored by Eran Ben Elisha on Dec 19, 2017

Add per-channel counter ch#_eq_rearm to monitor how many lost interrupt
recovery actions happened upon TX timeouts.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Poll event queue upon TX timeout before performing full channels recovery · 7ca560b5

Authored by Eran Ben Elisha on Dec 19, 2017

Up until this patch, on every TX timeout we would try to do channels
recovery. However, in case of a lost interrupt for an EQ, the channel
associated with it cannot be recovered by reopening it, as it would never get
another interrupt on sent/received traffic and would eventually end up with
another TX timeout (restarting the EQ is not part of channel recovery).

This patch adds a mechanism for explicitly polling the EQ in case of a TX
timeout in order to recover from a lost interrupt. If this is not the
case (no pending EQEs), perform a full channels recovery as usual.

Once a lost EQE is recovered, it triggers the NAPI to run and handle all
pending completions. This either frees some budget in BQL (by calling
netdev_tx_completed_queue()) or clears pending TX WQEs and wakes up
the queue. Either action makes the queue ready for
transmit again.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

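The decision flow this commit describes (poll the EQ first, fall back to a full channels recovery only if nothing was pending) can be sketched as follows. This is a toy Python model, not the driver logic: the Channel class, its attributes and handle_tx_timeout() are hypothetical names, and eq_rearm mirrors the per-channel ch#_eq_rearm counter added in the neighbouring commit.

```python
# Toy model of the TX timeout handling flow; all names are hypothetical.

class Channel:
    def __init__(self, pending_eqes=0):
        self.pending_eqes = pending_eqes  # EQEs whose interrupt was lost
        self.eq_rearm = 0                 # lost-interrupt recoveries so far

    def poll_eq(self):
        """Consume any pending EQEs and return how many were found."""
        found = self.pending_eqes
        self.pending_eqes = 0
        return found


def handle_tx_timeout(timed_out_channels):
    """Return True if a full channels recovery is still required."""
    need_full_recovery = False
    for channel in timed_out_channels:
        if channel.poll_eq() > 0:
            # A lost interrupt explains the timeout: NAPI will now run and
            # drain the completions, so reopening the channel is unnecessary.
            channel.eq_rearm += 1
        else:
            # No pending EQEs: the timeout has another cause.
            need_full_recovery = True
    return need_full_recovery


lost_irq = Channel(pending_eqes=3)
print(handle_tx_timeout([lost_irq]), lost_irq.eq_rearm)
```

Polling before reopening matters because, as the message notes, reopening a channel does not restart its EQ, so a lost interrupt would simply lead to another timeout.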

net/mlx5e: Add Event Queue meta data info for TX timeout logs · 3a32b26a

Authored by Eran Ben Elisha on Dec 13, 2017

When a TX timeout occurs, the EQ consumer index and irqn can help in
debugging by exposing the SW state of the EQ. Add them to the log prints
for the relevant EQ only.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
