Commits · c0b14a0854fab0a0164aabfe49a76aae9216fe97 · openeuler / Kernel

02 May, 2019 14 commits

net/mlx5: E-Switch, Use atomic rep state to serialize state change · 6f4e0219

Bodong Wang committed on Apr 18, 2019

When the rep state was introduced, it was also designed to prevent
duplicate unloading of the same rep. Consider the following two
flows when an eswitch manager is in switchdev mode with n VF reps loaded.

+--------------------------------------+--------------------------------+
| cpu-0                                | cpu-1                          |
| --------                             | --------                       |
| mlx5_ib_remove                       | mlx5_eswitch_disable_sriov     |
|  mlx5_ib_unregister_vport_reps       |  esw_offloads_cleanup          |
|   mlx5_eswitch_unregister_vport_reps |   esw_offloads_unload_all_reps |
|    __unload_reps_all_vport           |    __unload_reps_all_vport     |
+--------------------------------------+--------------------------------+

These two flows will try to unload the same rep. Per the original design,
once one flow unloads the rep, the state moves to REGISTERED, and the
second flow no longer needs to do the unload and bails out. However,
because reads and writes of the state are not atomic, the state is still
LOADED while the first flow is doing the unload, so the second flow is
able to perform the same unload action, and the kernel crashes.

To solve this, the driver should do an atomic test-and-set on the state,
so that only one flow can change the rep state from LOADED to REGISTERED
and proceed to do the actual unloading.

Since the state is changing to an atomic type, all other reads and writes
of it should be atomic as well.

Fixes: f121e0ea (net/mlx5: E-Switch, Add state to eswitch vport representors)
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
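
The atomic test-and-set described in this commit message can be illustrated with a small userspace sketch. It uses C11 `<stdatomic.h>` rather than the kernel's own atomic API, and the names `rep_state`, `struct rep` and `rep_unload` are hypothetical, chosen only to mirror the LOADED/REGISTERED wording of the message; it is not the driver's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state names modelled on the commit text. */
enum rep_state { REP_UNREGISTERED, REP_REGISTERED, REP_LOADED };

struct rep {
    atomic_int state;
    int unload_calls; /* counts how often the real unload work ran */
};

/*
 * Returns true only for the single caller that wins the
 * LOADED -> REGISTERED transition; every other caller bails out.
 */
static bool rep_unload(struct rep *rep)
{
    int expected = REP_LOADED;

    /* Compare and swap in one atomic step: no window between test and set. */
    if (!atomic_compare_exchange_strong(&rep->state, &expected,
                                        REP_REGISTERED))
        return false;

    rep->unload_calls++; /* stands in for the actual unload work */
    return true;
}
```

Because the comparison and the store happen as one operation, a second flow that arrives while the first is still unloading already sees REGISTERED and skips the unload.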


net/mlx5: E-Switch, Fix the check of legal vport · 5d9986a3

Bodong Wang committed on Apr 15, 2019

The check of legal vport is to ensure the vport number falls between
0 and total number of vports. Along with the introduction of uplink
rep, enabled vports are not consecutive any more.
Therefore, rely on the eswitch vport getter function to check if it's
a valid vport.

As the getter function relies on the eswitch, add a check of the vport
group manager and validate the presence of the eswitch structure.
Remove the redundant checks in the function calls.

Since the vport array is allocated once the eswitch is initialized
and is kept alive as long as the eswitch is present, there is no need to
protect it with the state lock.

Fixes: 5ae51620 ("net/mlx5: E-Switch, Assign a different position for uplink rep and vport")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Use getter to access all vport array · 4314ebaa

Bodong Wang committed on Apr 15, 2019

Some functions issue vport commands and access the vport array using
vport_index and vport_num interchangeably, which is OK for VF vports.
However, this creates a potential bug if those vports are not VFs
(e.g., uplink, SF), where vport_index is not equal to vport_num.

Prepare code to access mlx5_vport structure using a getter function.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
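
A rough sketch of why a getter helps when vport numbers and array indexes diverge. Everything here is an assumption made for illustration: the uplink number `VPORT_UPLINK`, the choice to store the uplink in the last array slot, and the names `struct eswitch` and `esw_get_vport` are hypothetical and are not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define VPORT_UPLINK 0xFFFFu /* assumed special vport number for the uplink */

struct vport {
    uint16_t vport_num;
};

struct eswitch {
    struct vport *vports;
    size_t total_vports; /* assumed layout: last slot holds the uplink */
};

/*
 * Translate a vport number into an array slot and validate it.
 * Returns NULL when the eswitch is absent or the number is not legal,
 * so callers never index the array directly.
 */
static struct vport *esw_get_vport(struct eswitch *esw, uint16_t vport_num)
{
    size_t uplink_idx;

    if (!esw || !esw->vports || esw->total_vports == 0)
        return NULL;

    uplink_idx = esw->total_vports - 1;
    if (vport_num == VPORT_UPLINK)
        return &esw->vports[uplink_idx];

    /* Ordinary numbers map 1:1 but must not alias the uplink slot. */
    if (vport_num >= uplink_idx)
        return NULL;

    return &esw->vports[vport_num];
}
```

With a single translation point, the validity check and the index mapping cannot drift apart between call sites.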


net/mlx5: Use available mlx5_vport struct · ee813f31

Parav Pandit committed on Apr 21, 2019

Several functions need to access mlx5_vport and vport_num.
When these functions are called, the caller already has a mlx5_vport*
available.
Hence pass that mlx5_vport pointer.

This is a preparation patch for adding error checks to
mlx5_eswitch_get_vport() and returning an error status.
Passing the pointer reduces the number of places where an error check of
mlx5_eswitch_get_vport() is needed.

While making this change, mlx5_eswitch_query_vport_drop_stats() is
corrected to work on vport instead of vport_idx.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Reuse mlx5_esw_for_each_vf_vport macro in two files · 786ef904

Parav Pandit committed on Apr 21, 2019

Currently mlx5_esw_for_each_vf_vport iterates over mlx5_vport entries in
eswitch.c, while the macro of the same name in eswitch_offloads.c
iterates over vport numbers.

Instead of keeping duplicate macro names, and to avoid confusion and
reuse the same macro in both files, move it to eswitch.h.

To iterate over vport numbers where there is no need to iterate over
mlx5_vport, but only a vport number is needed, rename those macros in
eswitch_offloads.c to mlx5_esw_for_each_vf_num_vport*.

While at it, keep all vport and vport rep iterators together.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Remove unused mlx5_query_nic_vport_vlans · c9bbfb37

Bodong Wang committed on Apr 12, 2019

mlx5_query_nic_vport_vlans() is not used anymore. Hence remove it.
This patch doesn't change any functionality.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: remove meaningless CFLAGS_tracepoint.o · 0bdddcea

Masahiro Yamada committed on Apr 19, 2019

CFLAGS_tracepoint.o specifies CFLAGS for compiling tracepoint.c but
it does not exist under drivers/net/ethernet/mellanox/mlx5/core/.

CFLAGS_tracepoint.o is unused.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Put the common XDP code into a function · 33e10924

Maxim Mikityanskiy committed on Mar 01, 2019

The same code that returns XDP frames and releases pages is used both in
mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Create a function that
cleans up an MPWQE.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: ethtool, Add support for EEPROM high pages query · a708fb7b

Erez Alfasi committed on Mar 21, 2019

Add support for reading additional EEPROM information from high pages.
Information for modules such as SFF-8436 and SFF-8636:
 1) Application select table
 2) User writable EEPROM
 3) Thresholds and alarms
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Return error when trying to insert existing flower filter · 0e1c1a2f

Vlad Buslov committed on Apr 15, 2019

With unlocked TC it is possible to have spurious deletes and inserts of
the same filter. The TC layer needs drivers to always return an error when
flow insertion fails in order to correctly calculate "in_hw_count" for each
filter. Fix mlx5e_configure_flower() to return -EEXIST when TC tries to
insert a filter that is already provisioned to the driver.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
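
The behaviour this commit asks for, reporting a duplicate insert as an error instead of silently succeeding, can be sketched as follows. The fixed-size `struct flow_table` and `flow_insert` are hypothetical stand-ins; the linear scan replaces whatever lookup structure the driver actually uses, and only the -EEXIST return for an already present filter comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_FLOWS 8

/* Hypothetical table keyed by an opaque per-filter cookie. */
struct flow_table {
    unsigned long cookies[MAX_FLOWS];
    size_t count;
};

/* Returns 0 on success, -EEXIST for a duplicate, -ENOSPC when full. */
static int flow_insert(struct flow_table *t, unsigned long cookie)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->cookies[i] == cookie)
            return -EEXIST; /* already provisioned: report it to the caller */

    if (t->count == MAX_FLOWS)
        return -ENOSPC;

    t->cookies[t->count++] = cookie;
    return 0;
}
```

A caller that counts successful offloads then increments its counter once per filter, even if the same insert request arrives twice.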


net/mlx5e: Replace TC VLAN pop with VLAN 0 rewrite in prio tag mode · 0bac1194

Eli Britstein committed on Mar 04, 2019

Current ConnectX HW is unable to perform VLAN pop in the TX path and VLAN
push in the RX path. To work around that limitation, untagged packets are
tagged with VLAN ID 0x000 (priority tag) and pop/push actions are
replaced by VLAN rewrite actions (which are supported by the HW).
Replace the TC VLAN pop action with a VLAN priority tag header rewrite.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: ACLs for priority tag mode · 18486737

Eli Britstein committed on Mar 04, 2019

Current ConnectX HW is unable to perform VLAN pop in TX path and VLAN
push on RX path. As a workaround, untagged packets are tagged with
VID 0x000 allowing pop/push actions to be exchanged with VLAN rewrite
actions.
Use the ingress ACL table, preceding the FDB, to push VLAN 0x000 ID tag
for untagged packets and the egress ACL table, succeeding the FDB, to
pop VLAN 0x000 ID tag.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Turn on HW tunnel offload in all TIRs · 69dad68d

Tariq Toukan committed on Jan 20, 2019

Hardware requires that all TIRs that steer traffic to the same RQ
share an identical tunneled_offload_en value.
For that, the tunneled_offload_en bit should be set/unset (according to
the HW capability) for all TIRs, not only the ones dedicated to
tunneled (inner) traffic.

Fixes: 1b223dd3 ("net/mlx5e: Fix checksum handling for non-stripped vlan packets")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Take common TIR context settings into a function · 7306c274

Tariq Toukan committed on Jan 16, 2019

Many TIR context settings are common to different TIR types;
move them into a common function.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

30 Apr, 2019 8 commits

net/mlx5: Geneve, Add flow table capabilities for Geneve decap with TLV options · b169e64a

Yevgeny Kliteynik committed on Apr 29, 2019

Introduce specification for Geneve decap flow with encapsulation options
and allow creation of rules that are matching on Geneve TLV options.
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Eswitch, enable RoCE loopback traffic · 80f09dfc

Maor Gottlieb committed on Apr 29, 2019

When in switchdev mode, we would like to treat loopback RoCE
traffic (on the eswitch manager) as RDMA and not as regular
Ethernet traffic.
To enable this, we add a flow steering rule that forwards RoCE
loopback traffic to the HW RoCE filter (by adding an allow rule).
In addition, we add a RoCE address at GID index 0, which will be
set in the RoCE loopback packet.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add new miss flow table action · f6f7d6b5

Maor Gottlieb committed on Apr 29, 2019

Flow table supports three types of miss action:
1. Default miss action - go to default miss table according to table.
2. Go to specific table.
3. Switch domain - go to the root table of an alternative steering
   table domain.

New table miss action was added - switch_domain.
The next domain for RDMA_RX namespace is the NIC RX domain.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Add support in RDMA RX steering · d83eb50e

Maor Gottlieb committed on Apr 29, 2019

Add new flow steering namespace - MLX5_FLOW_NAMESPACE_RDMA_RX.
Flow steering rules in this namespace are used to filter
RDMA traffic.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Pass flow steering objects to fs_cmd · ae288a48

Maor Gottlieb committed on Apr 29, 2019

Pass the flow steering objects instead of their attributes
to fs_cmd in order to decrease the number of arguments; in
addition, the objects will be used to update object fields.
Pass the flow steering root namespace instead of the device
so that the fs_cmd layer has the namespace context.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Enable general events on all interfaces · 72c6f524

Aya Levin committed on Apr 29, 2019

Open events of type 'GENERAL' to all types of interfaces. Prior to this
patch, 'GENERAL' events were captured only by Ethernet interfaces. Other
interface types (non-Ethernet) were excluded and couldn't receive
'GENERAL' events.

Fixes: 5d3c537f ("net/mlx5: Handle event of power detection in the PCIE slot")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Separate and generalize dma device from pci device · c42260f1

Vu Pham committed on Apr 29, 2019

The mlx5 Sub-Function (SF) sub device will be introduced in
subsequent patches. It will be created as a mediated device and
belong to the mdev bus. It is necessary to treat DMA operations on
PF, VF and SF in a uniform way, hence reduce the dependency on
the pdev PCI dev struct and work directly out of the newly introduced
'struct device' from the previous patch.

This patch does not change any functionality.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Get rid of storing copy of device name · 27b942fb

Parav Pandit committed on Apr 29, 2019

Currently mlx5 core stores a copy of the PCI device name in the
mlx5_priv structure and uses the pr_warn, pr_err helpers.

Get rid of the copy of this name; instead store the parent device
pointer, which contains the name as well as DMA-specific parameters.
This also allows using the kernel's well-defined dev_warn, dev_err, dev_dbg
device-specific print routines.

This is also a preparation patch for accessing a non-PCI parent device in
the future.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

25 Apr, 2019 1 commit

net/mlx5: Introduce new TIR creation core API · 96780e4f

Ariel Levkovich committed on Mar 31, 2019

Introduce a new TIR creation core API that lets the caller
receive the full command outbox back from the call.

This is a preparation for the next patch, which will
retrieve the TIR ICM address from the command outbox.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

24 Apr, 2019 15 commits

mlxsw: spectrum_router: Prevent ipv6 gateway with v4 route via replace and append · 7973d9e7

David Ahern committed on Apr 23, 2019

mlxsw currently does not support v6 gateways with v4 routes. Commit
19a9d136 ("ipv4: Flag fib_info with a fib_nh using IPv6 gateway")
prevents a route from being added, but nothing stops the replace or
append. Add a catch for them too.
    $ ip  ro add 172.16.2.0/24 via 10.99.1.2
    $ ip  ro replace 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
    $ ip  ro append 172.16.2.0/24 via inet6 fe80::202:ff:fe00:b dev swp1s0
    Error: mlxsw_spectrum: IPv6 gateway with IPv4 route is not supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Use #define for the WQE wait timeout constant · f8ebecf2

Maxim Mikityanskiy committed on Mar 05, 2019

Create a #define for the timeout of mlx5e_wait_for_min_rx_wqes to
clarify the meaning of a magic number.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused rx_page_reuse stat · 03ceda6f

Maxim Mikityanskiy committed on Mar 21, 2019

Remove the no longer used page_reuse stat of RQs.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Take HW interrupt trigger into a function · 63d26b49

Maxim Mikityanskiy committed on Mar 01, 2019

mlx5e_trigger_irq posts a NOP to the ICO SQ just to trigger an IRQ and
enter the NAPI poll on the right CPU according to the affinity. Use it
in mlx5e_activate_rq.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused parameter · 10961c56

Maxim Mikityanskiy committed on Mar 27, 2019

mdev is unused in mlx5e_rx_is_linear_skb.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add an underflow warning comment · b1b187e1

Maxim Mikityanskiy committed on Mar 27, 2019

mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
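
The underflow argument in this commit message can be reproduced with plain unsigned arithmetic. The sketch below is a standalone illustration: the function and parameter names are hypothetical and the values are base-2 logarithms, as in the message (F frames, P packets per WQE, N_min minimum WQEs).

```c
/*
 * Safe form: compare log(F) against log(P) + log(N_min) before
 * subtracting, so the subtraction can never wrap around.
 */
static unsigned int log_rq_size(unsigned int log_frames,
                                unsigned int log_pkts,
                                unsigned int log_min)
{
    if (log_frames < log_pkts + log_min)
        return log_min;

    return log_frames - log_pkts;
}

/*
 * Unsafe form: subtract first. When log_frames < log_pkts the unsigned
 * result wraps to a huge value, which then passes the minimum check.
 */
static unsigned int log_rq_size_unsafe(unsigned int log_frames,
                                       unsigned int log_pkts,
                                       unsigned int log_min)
{
    unsigned int n = log_frames - log_pkts; /* wraps if log_frames < log_pkts */

    return n < log_min ? log_min : n;
}
```

For example, with log(F) = 2, log(P) = 3 and log(N_min) = 2, the safe form falls back to the minimum, while the unsafe form returns a wrapped, very large value.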


net/mlx5e: Move parameter calculation functions to en/params.c · 9a22d5d8

Maxim Mikityanskiy committed on Mar 27, 2019

This commit moves the parameter calculation functions to a separate file
for better modularity and code sharing with future features.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Report mlx5e_xdp_set errors · 74bbaebf

Maxim Mikityanskiy committed on Mar 19, 2019

If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Remove unused parameter · 83b2fd64

Maxim Mikityanskiy committed on Mar 07, 2019

params is unused in mlx5e_init_di_list.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow · c2273219

Shay Agroskin committed on Mar 14, 2019

Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch mitigates the problem by moving some of the workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).

When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) is sent inline within
its WQE TX descriptor (mem-copied) when the hardware TX queue is congested
beyond a pre-defined watermark.

This better utilizes the HW resources (which now perform
one less packet data prefetch) and allows better scalability, at the
expense of CPU usage (the CPU now memcpy's the packet into the WQE).

To load balance between HW and CPU and get the maximum packet rate, we use
watermarks to detect how congested the HW is and move the
workload back and forth between HW and CPU.

Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz

* Tested with hyper-threading disabled

XDP_TX:

|          | before | after   |       |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring   | 12Mpps | 12Mpps  | same  |

XDP_REDIRECT:

** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.

|          | before  | after   |      |
| 32 rings | 64Mpps  | 92Mpps  | +43% |
| 1 ring   | 6.4Mpps | 6.4Mpps | same |

As we can see, the feature significantly improves scaling without
hurting single-ring performance.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: XDP, Add TX MPWQE session counter · 73cab880

Shay Agroskin committed on Feb 25, 2019

This counter tracks how many TX MPWQE sessions are started in XDP SQ
in XDP TX/REDIRECT flow. It counts per-channel and global stats.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush · 15143bf5

Tariq Toukan committed on Mar 10, 2019

The XDP redirect flush indication belongs to the receive queue,
not to its XDP send queue.

For this, use a new bit on rq->flags.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: XDP, Fix shifted flag index in RQ bitmap · f03590f7

Tariq Toukan committed on Mar 10, 2019

Values in enum mlx5e_rq_flag are used as bit indexes.
The intention was to use them with no BIT(i) wrapping.

No functional bug is fixed here, as the same (shifted) flag bit
is used for all set, test, and clear operations.

Fixes: 121e8927 ("net/mlx5e: Refactor RQ XDP_TX indication")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
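
The difference between a bit index and a BIT(i) mask, and why mixing them up was harmless as long as it was done consistently, can be seen in a small sketch. The enum and the `flag_set`, `flag_clear` and `flag_test` helpers are hypothetical userspace stand-ins for helpers that take a bit index; they are not the driver's definitions.

```c
#include <stdbool.h>

/* Hypothetical flags whose enum values are bit indexes (0, 1). */
enum rq_flag { RQ_FLAG_XDP_XMIT, RQ_FLAG_XDP_REDIRECT };

/* Helpers that expect a bit index, not a mask. */
static void flag_set(unsigned long *flags, unsigned int nr)
{
    *flags |= 1UL << nr;
}

static void flag_clear(unsigned long *flags, unsigned int nr)
{
    *flags &= ~(1UL << nr);
}

static bool flag_test(const unsigned long *flags, unsigned int nr)
{
    return (*flags >> nr) & 1UL;
}
```

Passing the index `RQ_FLAG_XDP_REDIRECT` (1) touches bit 1. Passing a mask such as `1u << RQ_FLAG_XDP_REDIRECT` (2) to the same helpers touches bit 2 instead: a shifted position, but still consistent if set, test and clear all receive the same value.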


net/mlx5e: RX, Support multiple outstanding UMR posts · fd9b4be8

Tariq Toukan committed on Feb 27, 2019

The buffer mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per RX MPWQE.

A single MPWQE is capable of serving many incoming packets,
usually more than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.

When an XDP program is loaded, every MPWQE is capable of serving fewer
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as many for the default MTU), giving less latency
room for the UMR completions.

In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.

For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.

This is expected to improve packet rate in high CPU scale.

Performance test:
As expected, huge improvement in large-scale (48 cores).

xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.

Before: Unstable, 7 to 30 Mpps
After:  Stable,   at 70.5 Mpps

No degradation in other tested scenarios.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: pass net_device argument to the eth_get_headlen · c43f1255

Stanislav Fomichev committed on Apr 22, 2019

Update all users of eth_get_headlen to pass the network device, fetch
the network namespace from it and pass it down to the flow dissector.
This commit is a no-op until an administrator inserts a BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

23 Apr, 2019 2 commits

mlxsw: spectrum_buffers: Adjust CPU port shared buffer egress quotas · 7a1ff9f4

Ido Schimmel committed on Apr 22, 2019

Switch the CPU port to use the new dedicated egress pool instead of the
previously used egress pool, which was shared with normal front panel
ports.

Add per-port quotas for the amount of traffic that can be buffered for
the CPU port and also adjust the per-{port, TC} quotas.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_buffers: Allow skipping ingress port quota configuration · 6d28725c

Ido Schimmel committed on Apr 22, 2019

The CPU port is used to transmit traffic that is trapped to the host
CPU. It is therefore irrelevant to define ingress quota for it.

Add a 'skip_ingress' argument to the function tasked with configuring
per-port quotas, so that ingress quotas could be skipped in case the
passed local port is the CPU port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
