1. 01 May, 2018 (10 commits)
  2. 30 Apr, 2018 (1 commit)
  3. 24 Apr, 2018 (2 commits)
  4. 20 Apr, 2018 (1 commit)
  5. 19 Apr, 2018 (1 commit)
  6. 18 Apr, 2018 (2 commits)
  7. 17 Apr, 2018 (5 commits)
    • xdp: transition into using xdp_frame for return API · 03993094
      Committed by Jesper Dangaard Brouer
      Changing the xdp_return_frame() API to take a struct xdp_frame as its
      argument seems like a natural choice, but there are some subtle
      performance details here that need extra care, which makes this a
      deliberate choice.
      
      Dereferencing the xdp_frame on a remote CPU during DMA-TX completion
      causes its cache line to transition to the "Shared" state.  Later, when
      the page is reused for RX, this xdp_frame cache line is written, which
      changes the state to "Modified".
      
      This situation already happens (naturally) for virtio_net, tun and
      cpumap, as the xdp_frame pointer is the queued object.  In tun and
      cpumap, the ptr_ring is used for efficiently transferring cache lines
      (with pointers) between CPUs.  Thus, the only option is to dereference
      the xdp_frame.
      
      Only the ixgbe driver had an optimization that avoided dereferencing
      the xdp_frame.  The driver already has a TX-ring queue, which (in the
      case of remote DMA-TX completion) has to be transferred between CPUs
      anyhow.  In this data area we stored a struct xdp_mem_info and a data
      pointer, which allowed us to avoid dereferencing the xdp_frame.
      
      To compensate for this, a prefetchw is used to tell the cache-coherency
      protocol about our access pattern (see the sketch after this entry).
      My benchmarks show that this prefetchw is enough to compensate the
      ixgbe driver.
      
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      V8: Adjust for commit bd658dda ("net/mlx5e: Separate dma base address
      and offset in dma_sync call")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
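      A minimal sketch of the API change and the prefetchw idea described
      above; the completion handler name is hypothetical and the DMA/TX-ring
      details are elided, so this illustrates the pattern rather than the
      actual ixgbe or kernel code.
      ```c
      #include <net/xdp.h>          /* struct xdp_frame, xdp_return_frame() */
      #include <linux/prefetch.h>   /* prefetchw()                          */

      /*
       * Old API (before this commit): xdp_return_frame(void *data,
       *                                                struct xdp_mem_info *mem);
       * New API (this commit):        xdp_return_frame(struct xdp_frame *xdpf);
       *
       * Hypothetical DMA-TX completion handler: dereferencing xdpf here pulls
       * its cache line to this (possibly remote) CPU, so request it in a
       * writable state up front with a write prefetch.
       */
      static void my_driver_xdp_tx_complete(struct xdp_frame *xdpf)
      {
              prefetchw(xdpf);        /* warm the cache line for the coming write   */
              /* ... DMA unmap and TX-ring bookkeeping elided ... */
              xdp_return_frame(xdpf); /* frame returns via its registered mem model */
      }
      ```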
    • mlx5: use page_pool for xdp_return_frame call · 60bbf7ee
      Committed by Jesper Dangaard Brouer
      This patch shows how it is possible to have both the driver-local page
      cache, which uses an elevated refcnt for "catching"/avoiding the case
      where an SKB put_page returns the page through the page allocator, and,
      at the same time, have pages returned to the page_pool from
      ndo_xdp_xmit DMA completion (see the sketch after this entry).
      
      The performance improvement for XDP_REDIRECT in this patch is really
      good, especially considering that (currently) the xdp_return_frame
      API and page_pool_put_page() do per-frame operations of both an
      rhashtable ID lookup and a locked return into the (page_pool) ptr_ring.
      (The plan is to remove these per-frame operations in a follow-up
      patchset.)
      
      The benchmark performed was RX on mlx5 and XDP_REDIRECT out ixgbe,
      with xdp_redirect_map (using devmap).  The target/maximum capability
      of ixgbe is 13Mpps (on this HW setup).
      
      Before this patch for mlx5, XDP-redirected frames were returned via
      the page allocator.  The single-flow performance was 6Mpps, and if I
      started two flows the collective performance dropped to 4Mpps, because
      we hit the page allocator lock (further negative scaling occurs).
      
      Two test scenarios need to be covered for the xdp_return_frame API:
      DMA-TX completion running on the same CPU, and cross-CPU free/return.
      Results were same-CPU=10Mpps and cross-CPU=12Mpps.  This is very
      close to our 13Mpps max target.
      
      The reason the max target isn't reached in the cross-CPU test is
      likely RX-ring DMA unmap/map overhead (which doesn't occur in
      ixgbe-to-ixgbe testing).  It is also planned to remove this
      unnecessary DMA unmap in a later patchset.
      
      V2: Adjustments requested by Tariq
       - Changed page_pool_create return codes to not return NULL, only
         ERR_PTR, as this simplifies error handling in drivers.
       - Save a branch in mlx5e_page_release
       - Correct page_pool size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
      
      V5: Updated patch desc
      
      V8: Adjust for b0cedc84 ("net/mlx5e: Remove rq_headroom field from params")
      V9:
       - Adjust for 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
       - Adjust for 73281b78 ("net/mlx5e: Derive Striding RQ size from MTU")
        - Correct handling if page_pool_create fails for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
      
      V10: Req from Tariq
       - Change pool_size calc for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
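      A hedged sketch of the allocation/return pattern this commit enables;
      the driver-side helper names are illustrative, and the page_pool
      signatures shown are those of this 2018 patchset (they gained extra
      parameters in later kernels).
      ```c
      #include <net/page_pool.h>   /* page_pool_dev_alloc_pages(), page_pool_put_page() */
      #include <net/xdp.h>         /* xdp_return_frame()                                */

      /* RX refill path: allocate (possibly recycled) pages from the pool. */
      static struct page *my_rq_alloc_page(struct page_pool *pool)
      {
              return page_pool_dev_alloc_pages(pool);   /* NULL on failure */
      }

      /* Driver-local release: hand the page back to the pool instead of the
       * page allocator.  allow_direct=true is only safe from the softirq
       * context that owns the pool (e.g. the RX NAPI poll loop).
       */
      static void my_rq_release_page(struct page_pool *pool, struct page *page,
                                     bool napi_ctx)
      {
              page_pool_put_page(pool, page, napi_ctx);
      }

      /* Remote DMA-TX completion of a redirected frame: xdp_return_frame()
       * looks up the originating page_pool (the rhashtable ID lookup noted
       * above) and returns the page through the pool's ptr_ring.
       */
      static void my_xdp_tx_complete(struct xdp_frame *xdpf)
      {
              xdp_return_frame(xdpf);
      }
      ```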
    • page_pool: refurbish version of page_pool code · ff7d6b27
      Committed by Jesper Dangaard Brouer
      The ndo_xdp_xmit API needs a fast page-recycle mechanism for returning
      pages at DMA-TX completion time, one with good cross-CPU performance,
      given that DMA-TX completion can happen on a remote CPU.
      
      Refurbish my page_pool code, which was presented[1] at MM Summit 2016.
      The page_pool code is adapted to not depend on page allocator changes
      and integration into struct page.  The DMA mapping feature is kept,
      even though it will not be activated/used in this patchset.
      
      [1] http://people.netfilter.org/hawk/presentations/MM-summit2016/generic_page_pool_mm_summit2016.pdf
      
      V2: Adjustments requested by Tariq
       - Changed page_pool_create return codes to not return NULL, only
         ERR_PTR, as this simplifies error handling in drivers
         (see the sketch after this entry).
      
      V4: many small improvements and cleanups
      - Add a DOC comment section that can be used by kernel-doc
      - Improve fallback mode to work better with refcnt-based recycling:
        e.g. remove a WARN as pointed out by Tariq;
        e.g. quicker fallback if the ptr_ring is empty
      
      V5: Fixed SPDX license as pointed out by Alexei
      
      V6: Adjustments requested by Eric Dumazet
       - Adjust ____cacheline_aligned_in_smp usage/placement
       - Move rcu_head in struct page_pool
       - Free pages quicker on destroy; minimize resources delayed by an RCU period
       - Remove code for the forward/backward compat ABI interface
      
      V8: Issues found by kbuild test robot
       - Address sparse "should be static" warnings
       - Only compile+link when a driver uses/selects page_pool;
         mlx5 selects CONFIG_PAGE_POOL, although it is first used two patches later
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
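      A minimal sketch of creating a pool with this API, reflecting the V2
      note that page_pool_create() reports failure with ERR_PTR() rather than
      NULL; the pool size and NUMA node are placeholders, and the parameter
      set shown is the one from this patchset.
      ```c
      #include <linux/err.h>            /* IS_ERR(), PTR_ERR()          */
      #include <linux/printk.h>         /* pr_err()                     */
      #include <linux/dma-direction.h>  /* DMA_BIDIRECTIONAL            */
      #include <net/page_pool.h>        /* page_pool_create() et al     */

      static struct page_pool *my_create_pool(struct device *dma_dev, int numa_node)
      {
              struct page_pool_params pp_params = {
                      .flags     = PP_FLAG_DMA_MAP,   /* pool handles DMA mapping    */
                      .order     = 0,                 /* single pages                */
                      .pool_size = 1024,              /* placeholder: ~RX ring size  */
                      .nid       = numa_node,
                      .dev       = dma_dev,
                      .dma_dir   = DMA_BIDIRECTIONAL, /* RX + XDP_TX/REDIRECT        */
              };
              struct page_pool *pool = page_pool_create(&pp_params);

              /* Per the V2 note: failure is reported as ERR_PTR(), never NULL. */
              if (IS_ERR(pool))
                      pr_err("page_pool_create failed: %ld\n", PTR_ERR(pool));
              return pool;
      }
      ```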
    • mlx5: register a memory model when XDP is enabled · 84f5e3fb
      Committed by Jesper Dangaard Brouer
      Now all the users of ndo_xdp_xmit have been converted to use
      xdp_return_frame.  This enables a different memory model, thus
      activating another code path in the xdp_return_frame API (see the
      sketch after this entry).
      
      V2: Fixed issues pointed out by Tariq.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
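      A sketch of the registration step, assuming the RX queue already owns
      an xdp_rxq_info and a page_pool; only the xdp_rxq_info_reg_mem_model()
      call is the real API here, the wrapper name is illustrative.
      ```c
      #include <net/xdp.h>        /* xdp_rxq_info_reg_mem_model(), MEM_TYPE_PAGE_POOL */
      #include <net/page_pool.h>

      /* Tell the XDP core which memory model backs this RX queue, so that
       * xdp_return_frame() takes the page_pool return path for its frames.
       */
      static int my_rq_register_mem_model(struct xdp_rxq_info *xdp_rxq,
                                          struct page_pool *pool)
      {
              return xdp_rxq_info_reg_mem_model(xdp_rxq, MEM_TYPE_PAGE_POOL, pool);
      }
      ```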
    • mlx5: basic XDP_REDIRECT forward support · 5168d732
      Committed by Jesper Dangaard Brouer
      This implements basic XDP redirect support in the mlx5 driver.
      
      Notice that ndo_xdp_xmit() is NOT implemented, because that API needs
      some changes that this patchset is working towards.
      
      The main purpose of this patch is to have different drivers doing
      XDP_REDIRECT, to show how different memory models behave in a
      cross-driver world (a sketch of the generic redirect handling follows
      this entry).
      
      Update (pre-RFCv2, Tariq): Need to DMA-unmap the page before
      xdp_do_redirect, as the return API that would keep it mapped does not
      exist yet.
      
      Update (pre-RFCv3, Saeed): Don't mix XDP_TX and XDP_REDIRECT flushing;
      introduce the xdpsq.db.redirect_flush boolean.
      
      V9: Adjust for commit 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
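      A hedged sketch of the generic driver-side XDP_REDIRECT handling wired
      up here (not the actual mlx5 RX path): run the program, call
      xdp_do_redirect() on the XDP_REDIRECT verdict, note that a flush is
      pending, and flush once at the end of the NAPI poll.  xdp_do_flush_map()
      was the flush helper in this kernel era; later kernels renamed it
      xdp_do_flush().  Helper names are illustrative.
      ```c
      #include <linux/netdevice.h>
      #include <linux/filter.h>   /* bpf_prog_run_xdp(), xdp_do_redirect(), xdp_do_flush_map() */
      #include <net/xdp.h>

      /* Per-packet verdict handling inside the RX poll loop (illustrative). */
      static bool my_handle_xdp_verdict(struct net_device *dev, struct bpf_prog *prog,
                                        struct xdp_buff *xdp, bool *redirect_flush)
      {
              u32 act = bpf_prog_run_xdp(prog, xdp);

              switch (act) {
              case XDP_REDIRECT:
                      /* DMA-unmap the page first (see the pre-RFCv2 note above),
                       * then hand the frame to the redirect infrastructure. */
                      if (!xdp_do_redirect(dev, xdp, prog))
                              *redirect_flush = true;   /* flush once per poll      */
                      return true;                      /* page is no longer ours   */
              default:
                      return false;                     /* XDP_PASS/TX/DROP elided  */
              }
      }

      /* At the end of the NAPI poll: */
      static void my_napi_poll_done(bool redirect_flush)
      {
              if (redirect_flush)
                      xdp_do_flush_map();
      }
      ```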
  8. 09 Apr, 2018 (1 commit)
    • devlink: convert occ_get op to separate registration · fc56be47
      Committed by Jiri Pirko
      This resolves a race during initialization where resources with ops
      were registered before the driver and the structures used by the
      occ_get op were initialized.  So keep the occ_get callbacks registered
      only while all structs are initialized (see the sketch after this
      entry).
      
      The example flows, as implemented in mlxsw:
      1) driver load/asic probe:
         mlxsw_core
            -> mlxsw_sp_resources_register
              -> mlxsw_sp_kvdl_resources_register
                -> devlink_resource_register IDX
         mlxsw_spectrum
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      2) reload triggered by devlink command:
        -> mlxsw_devlink_core_bus_device_reload
          -> mlxsw_sp_fini
            -> mlxsw_sp_kvdl_fini
                -> devlink_resource_occ_get_unregister IDX
          (struct mlxsw_sp *mlxsw_sp is freed at this point; a call to the
           occ_get callback, which uses mlxsw_sp, would cause a use-after-free)
          -> mlxsw_sp_init
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      
      Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
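      A sketch of the separate occ_get registration introduced here, with a
      hypothetical driver wrapper standing in for the mlxsw kvdl code; the
      devlink_resource_occ_get_register()/unregister() calls and the
      u64-returning getter with a priv pointer are the API added by this
      commit, while MY_RESOURCE_ID stands in for the "IDX" in the flows above.
      ```c
      #include <linux/types.h>
      #include <net/devlink.h>

      #define MY_RESOURCE_ID 1            /* placeholder resource index */

      struct my_driver_priv {             /* hypothetical driver state  */
              u64 allocated_entries;
      };

      /* Occupancy getter: must only run while my_driver_priv is alive. */
      static u64 my_occ_get(void *priv)
      {
              struct my_driver_priv *drv = priv;

              return drv->allocated_entries;
      }

      /* Register the getter only after the structures it dereferences are
       * fully initialized (end of step 1 / 2 in the flows above). */
      static void my_parts_init_done(struct devlink *devlink,
                                     struct my_driver_priv *drv)
      {
              devlink_resource_occ_get_register(devlink, MY_RESOURCE_ID,
                                                my_occ_get, drv);
      }

      /* Unregister before freeing the private struct on fini/reload, closing
       * the use-after-free window described in the commit. */
      static void my_parts_fini(struct devlink *devlink)
      {
              devlink_resource_occ_get_unregister(devlink, MY_RESOURCE_ID);
      }
      ```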
  9. 06 Apr, 2018 (2 commits)
  10. 03 Apr, 2018 (3 commits)
  11. 02 Apr, 2018 (2 commits)
  12. 01 Apr, 2018 (10 commits)