Commits · d5949d92c29ce147a9cb9e21fcf8ad7c1ff327b1 · openeuler / Kernel

11 Apr, 2019 · 6 commits

mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2 · d5949d92

Committed by Ido Schimmel on Apr 10, 2019

In Spectrum-1, when a multicast packet is admitted to the shared buffer
it increases the quotas of all the ports and {port, TC} pairs to which it
is forwarded.

The above means that multicast packets are accounted multiple times in
the shared buffer and can therefore cause the associated shared buffer
pool to fill up very quickly.

To work around this issue, commit e83c045e ("mlxsw:
spectrum_buffers: Configure MC pool") added a dedicated multicast pool
in which multicast packets are accounted.

The issue is not present in Spectrum-2, but in order to be backward
compatible with Spectrum-1, its default behavior is to allow a multicast
packet to increase multiple egress quotas instead of one.

Until the new (non-backward compatible) mode is supported, configure a
dedicated multicast pool as in Spectrum-1.

Fixes: fe099bf6 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Do not check VRF MAC address · 972fae68

Committed by Ido Schimmel on Apr 10, 2019

Commit 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
addresses") enabled the driver to veto router interface (RIF) MAC
addresses that it cannot support.

This check should only be performed for interfaces for which the driver
actually configures a RIF. A VRF upper is not one of them, so ignore it.

Without this patch it is not possible to set an IP address on the VRF
device and use it as a loopback.

Fixes: 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue · b442fed1

Committed by Ido Schimmel on Apr 10, 2019

The workqueue is used to periodically update the networking stack about
activity / statistics of various objects such as neighbours and TC
actions.

It should not be called as part of the memory reclaim path, so remove
the WQ_MEM_RECLAIM flag.

Fixes: 3d5479e9 ("mlxsw: core: Remove deprecated create_workqueue")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue · 4af06997

Committed by Ido Schimmel on Apr 10, 2019

The ordered workqueue is used to offload various objects such as routes
and neighbours in the order they are notified.

It should not be called as part of the memory reclaim path, so remove
the WQ_MEM_RECLAIM flag. Keeping the flag can also result in a warning
[1] if a worker tries to flush a non-WQ_MEM_RECLAIM workqueue.

[1]
[97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
[97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
...
[97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
[97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
[97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
...
[97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
[97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
[97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
[97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
[97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
[97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
[97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
[97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
[97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[97703.543115] Call Trace:
[97703.543129] __flush_work+0xbd/0x1e0
[97703.543137] ? __cancel_work_timer+0x136/0x1b0
[97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
[97703.543154] __cancel_work_timer+0x136/0x1b0
[97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
[97703.543184] cancel_work_sync+0x10/0x20
[97703.543191] rhashtable_free_and_destroy+0x23/0x140
[97703.543198] rhashtable_destroy+0xd/0x10
[97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
[97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
[97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
[97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
[97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
[97703.543484] process_one_work+0x1fd/0x400
[97703.543493] worker_thread+0x34/0x410
[97703.543500] kthread+0x121/0x140
[97703.543507] ? process_one_work+0x400/0x400
[97703.543512] ? kthread_park+0x90/0x90
[97703.543523] ret_from_fork+0x35/0x40

Fixes: a3832b31 ("mlxsw: core: Create an ordered workqueue for FIB offload")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Semion Lisyansky <semionl@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue · a8c133b0

Committed by Ido Schimmel on Apr 10, 2019

The EMAD workqueue is used to handle retransmission of EMAD packets that
contain configuration data for the device's firmware.

Given that the workers need to allocate these packets and that the code
is not called as part of the memory reclaim path, remove the
WQ_MEM_RECLAIM flag.

Fixes: d965465b ("mlxsw: core: Fix possible deadlock")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Add MDB entries in prepare phase · d4d0e409

Committed by Ido Schimmel on Apr 10, 2019

The driver cannot guarantee in the prepare phase that it will be able to
write an MDB entry to the device. In case the driver returned success
during the prepare phase, but then failed to add the entry in the commit
phase, a WARNING [1] will be generated by the switchdev core.

Fix this by doing the work in the prepare phase instead.
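
The difference between the two orderings can be sketched in plain C. This is a minimal userspace model, not the driver code: the toy_* names, the capacity-based failure and the two-value phase enum are all assumptions made for illustration.

```c
#include <errno.h>

/* Hypothetical stand-in for the device's MDB table; not driver code. */
struct toy_device {
	int used;
	int capacity;
};

enum toy_phase { TOY_PREPARE, TOY_COMMIT };

/* The step that may fail, here because the toy table is full. */
static int toy_hw_write(struct toy_device *dev)
{
	if (dev->used >= dev->capacity)
		return -ENOSPC;
	dev->used++;
	return 0;
}

/* Old ordering: prepare always succeeds, the fallible write runs in commit. */
static int toy_mdb_add_in_commit(struct toy_device *dev, enum toy_phase phase)
{
	if (phase == TOY_PREPARE)
		return 0;
	return toy_hw_write(dev);
}

/* Ordering described above: the fallible write runs in prepare. */
static int toy_mdb_add_in_prepare(struct toy_device *dev, enum toy_phase phase)
{
	if (phase == TOY_COMMIT)
		return 0;
	return toy_hw_write(dev);
}
```

With the second ordering a failure is reported while the caller can still abort, and the commit phase has nothing left that can fail.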

[1]
[  358.544486] swp12s0: Commit of object (id=2) failed.
[  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
[  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
[  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
[  358.580582] Workqueue: events switchdev_deferred_process_work
[  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
...
[  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
[  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
[  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
[  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
[  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
[  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
[  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
[  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
[  358.683188] Call Trace:
[  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
[  358.691655]  switchdev_deferred_process+0x6b/0xf0
[  358.696907]  switchdev_deferred_process_work+0xa/0x10
[  358.702548]  process_one_work+0x1f5/0x3f0
[  358.707022]  worker_thread+0x28/0x3c0
[  358.711099]  ? process_one_work+0x3f0/0x3f0
[  358.715768]  kthread+0x10d/0x130
[  358.719369]  ? __kthread_create_on_node+0x180/0x180
[  358.724815]  ret_from_fork+0x35/0x40

Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Mar, 2019 · 15 commits

net/mlx5e: Consider tunnel type for encap contexts · 7f1a546e

Committed by Eli Britstein on Mar 18, 2019

The driver allocates an encap context based on the tunnel properties,
and reuses that context for all flows using the same tunnel properties.
Commit df2ef3bf ("net/mlx5e: Add GRE protocol offloading")
introduced another tunnel protocol other than the single VXLAN
previously supported. A flow that uses a tunnel with the same tunnel
properties but with a different tunnel type (GRE vs VXLAN for example)
would mistakenly reuse the previously allocated context, causing the
traffic to be sent with the wrong encapsulation. Fix that by
considering the tunnel type for encap contexts.
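
A small sketch of the lookup problem, assuming a simplified key with made-up fields (remote_ip, tunnel_id). The real driver key and comparison are not shown in this message, so every name below is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical tunnel types and key layout, for illustration only. */
enum toy_tunnel_type { TOY_TUNNEL_VXLAN, TOY_TUNNEL_GRE };

struct toy_encap_key {
	unsigned int remote_ip;
	unsigned int tunnel_id;
	enum toy_tunnel_type type;
};

/* Comparison that ignores the tunnel type: GRE and VXLAN flows collide. */
static bool toy_key_match_no_type(const struct toy_encap_key *a,
				  const struct toy_encap_key *b)
{
	return a->remote_ip == b->remote_ip && a->tunnel_id == b->tunnel_id;
}

/* Comparison that also considers the tunnel type. */
static bool toy_key_match(const struct toy_encap_key *a,
			  const struct toy_encap_key *b)
{
	return toy_key_match_no_type(a, b) && a->type == b->type;
}
```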

Fixes: df2ef3bf ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Update xon formula · e28408e9

Committed by Huy Nguyen on Mar 07, 2019

Set xon = xoff - netdev's max_mtu.
netdev's max_mtu will give enough time for the pause frame to
arrive at the sender.
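
As a worked form of the formula, assuming both values are in bytes. The underflow guard is an addition made for this sketch and is not taken from the commit.

```c
/* xon derived from xoff as described above; the helper name is hypothetical. */
static unsigned int toy_calc_xon(unsigned int xoff, unsigned int max_mtu)
{
	/* Guard added for the sketch: avoid unsigned underflow. */
	if (xoff < max_mtu)
		return 0;
	return xoff - max_mtu;
}
```

For example, an xoff of 20000 bytes and a max_mtu of 9000 bytes give an xon of 11000 bytes.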

Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Update xoff formula · 5ec983e9

Committed by Huy Nguyen on Mar 07, 2019

Set minimum speed in xoff threshold formula to 40Gbps.

Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, fix syndrome (0x678139) when turn on vepa · 36acf63a

Committed by Huy Nguyen on Mar 22, 2019

Make sure the struct mlx5_flow_destination is zeroed before
filling in its fields.
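
The pattern can be sketched with a stand-in structure. The struct toy_destination layout below is an assumption made for illustration; it is not the layout of struct mlx5_flow_destination.

```c
#include <string.h>

/* Hypothetical structure standing in for a destination descriptor. */
struct toy_destination {
	int type;
	unsigned int vport_num;
	unsigned int flags;	/* never set explicitly by the filler below */
};

static void toy_fill_destination(struct toy_destination *dest,
				 unsigned int vport)
{
	/* Zero everything first so untouched fields do not carry stack garbage. */
	memset(dest, 0, sizeof(*dest));
	dest->type = 1;
	dest->vport_num = vport;
}
```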

Fixes: 8da202b2 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Fix esw manager vport indication for more vport commands · eca4a928

Committed by Omri Kahalon on Feb 24, 2019

Traditionally, the PF (Physical Function) which resides on vport 0 was
the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
which resides on vport 0xfffe, was introduced as the E-Switch manager,
the assumption that the E-switch manager is on vport 0 is incorrect.

Since the eswitch code already uses the actual vport value, all we
need is to always set other_vport=1.

Signed-off-by: Omri Kahalon <omrik@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Protect from invalid memory access in offload fdb table · 5c1d260e

Committed by Roi Dayan on Mar 21, 2019

The esw offloads structures share a union with the legacy mode structs.
Reset the offloads struct to zero in init to protect from null
assumptions made by the legacy mode code.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Correctly use the namespace type when allocating pedit action · 84be899f

Committed by Tonghao Zhang on Feb 26, 2019

The capacities of the FDB offloading and NIC offloading tables are
different, so when allocating the pedit actions we should use the
correct namespace type.

Fixes: c500c86b ("net/mlx5e: support for two independent packet edit actions")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: E-Switch, Fix access to invalid memory when toggling esw modes · 8a91ad93

Committed by Roi Dayan on Mar 07, 2019

The esw fdb table has a union of legacy and offloads members.
In a given esw mode we may set some members and never reset them to
null, which is fine on the destroy path since we don't care there.
But when moving from legacy to switchdev a second time, the cleanup flow
of legacy mode decides whether a struct member was in use by checking
that it is not null, so we need to make sure to reset the members to
null when we init legacy mode.
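
A minimal userspace sketch of why shared union storage needs a reset. The struct layout and the member names (vepa_table, counter) are hypothetical and are not the driver's definitions; the sketch also assumes a platform where a null pointer is all-zero bits.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: two modes sharing storage through a union. */
struct toy_fdb {
	union {
		struct { void *vepa_table; } legacy;
		struct { unsigned long counter; } offloads;
	};
};

static void toy_offloads_use(struct toy_fdb *fdb)
{
	/* Writing an offloads member changes the bytes legacy.vepa_table reads. */
	fdb->offloads.counter = 0x1234;
}

static void toy_legacy_init(struct toy_fdb *fdb)
{
	/* Reset the legacy view so stale offloads data is not taken for a pointer. */
	memset(&fdb->legacy, 0, sizeof(fdb->legacy));
}

static int toy_legacy_member_in_use(const struct toy_fdb *fdb)
{
	return fdb->legacy.vepa_table != NULL;
}
```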

Fixes: 8da202b2 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys · dd1b9e09

Committed by Aya Levin on Feb 28, 2019

Allow configuration of legacy link-modes even when extended link-modes
are supported. This requires reading of legacy advertisement even when
extended link-modes are supported. Since legacy and extended
advertisements are mutually exclusive, wait for an empty reply from the
extended advertisement before reading the legacy advertisement.

Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: ethtool, Fix type analysis of advertised link-mode · 8d047bf5

Committed by Aya Levin on Feb 28, 2019

Ethtool option set_link_ksettings allows setting of legacy link-modes
or extended link-modes. Refine the decision of which type of link-modes
is set.

Fixes: 6a897372 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add a lock on tir list · 80a2a902

Committed by Yuval Avnery on Mar 11, 2019

Refresh tirs is looping over a global list of tirs while netdevs are
adding and removing tirs from that list. That is why a lock is
required.

Fixes: 724b2aa1 ("net/mlx5e: TIRs management refactoring")
Signed-off-by: Yuval Avnery <yuvalav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net: mlx5: Add a missing check on idr_find, free buf · 8e949363

Committed by Aditya Pakki on Mar 19, 2019

idr_find() can return a NULL value to 'flow' which is used without a
check. The patch adds a check to avoid potential NULL pointer dereference.

In case of mlx5_fpga_sbu_conn_sendmsg() failure, free buf allocated
using kzalloc.
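
Both fixes follow a common shape, sketched here in userspace C. The toy_* helpers are hypothetical stand-ins (toy_find for a lookup that may return NULL, toy_send for a send that takes ownership of the buffer only on success); they are not the mlx5 functions.

```c
#include <errno.h>
#include <stdlib.h>

struct toy_flow { int id; };

/* Hypothetical lookup that, like idr_find(), returns NULL when nothing matches. */
static struct toy_flow *toy_find(struct toy_flow *table, size_t n, int id)
{
	for (size_t i = 0; i < n; i++)
		if (table[i].id == id)
			return &table[i];
	return NULL;
}

/* Hypothetical send: consumes (frees) the buffer on success only. */
static int toy_send(void *buf, int fail)
{
	if (fail)
		return -EIO;	/* caller keeps ownership */
	free(buf);
	return 0;
}

static int toy_resync(struct toy_flow *table, size_t n, int id, int fail_send)
{
	struct toy_flow *flow;
	void *buf;
	int ret;

	flow = toy_find(table, n, id);
	if (!flow)		/* check before any dereference */
		return -EINVAL;

	buf = calloc(1, 64);
	if (!buf)
		return -ENOMEM;

	ret = toy_send(buf, fail_send);
	if (ret)
		free(buf);	/* failure path: release what we allocated */
	return ret;
}
```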

Fixes: ab412e1d ("net/mlx5: Accel, add TLS rx offload routines")
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Allow IPv4 ttl & IPv6 hop_limit rewrite for all L4 protocols · 8998576b

Committed by Dmytro Linkin on Feb 04, 2019

For some protocols we are not allowing IP header rewrite offload, since
the HW is not capable of properly adjusting the L4 checksum. However, TTL
& HOPLIMIT modification can be done for all IP protocols, because they
are not part of the pseudo header taken into account for checksum.
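
To illustrate the last point, here is a sketch of the one's-complement sum over the IPv4 pseudo header (source address, destination address, protocol, L4 length). The struct and function names are hypothetical and the fields are kept in host byte order for simplicity; the TTL is deliberately not an input.

```c
#include <stdint.h>

/* Minimal set of IPv4 header fields for this sketch (host byte order). */
struct toy_ipv4 {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  protocol;
	uint8_t  ttl;
};

/* One's-complement sum over the IPv4 pseudo header; ttl is never read. */
static uint16_t toy_pseudo_hdr_sum(const struct toy_ipv4 *ip, uint16_t l4_len)
{
	uint32_t sum = 0;

	sum += ip->saddr >> 16;
	sum += ip->saddr & 0xffff;
	sum += ip->daddr >> 16;
	sum += ip->daddr & 0xffff;
	sum += ip->protocol;
	sum += l4_len;

	/* Fold the carries back into 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

Two headers that differ only in ttl produce the same sum, while a changed address does not, which is why address rewrites need checksum adjustment and TTL rewrites do not.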

Fixes: 73867881 ("drivers: net: use flow action infrastructure")
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Fix error handling when refreshing TIRs · bc87a003

Committed by Gavi Teitz on Mar 11, 2019

Previously, a false error would be reported if the TIRs list was
empty, since the err value was initialized to -ENOMEM and was only
updated when a TIR was refreshed. This is resolved by initializing the
err value to zero.
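
The effect of the initial value can be shown with a small loop. The toy_* functions and the integer list are assumptions made for this sketch and do not reproduce the driver's refresh routine.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-item refresh; always succeeds in this sketch. */
static int toy_refresh_one(int tir)
{
	(void)tir;
	return 0;
}

/* Old pattern: err starts as -ENOMEM, so an empty list reports failure. */
static int toy_refresh_all_old(const int *tirs, size_t n)
{
	int err = -ENOMEM;

	for (size_t i = 0; i < n; i++) {
		err = toy_refresh_one(tirs[i]);
		if (err)
			break;
	}
	return err;
}

/* Pattern described above: err starts at zero. */
static int toy_refresh_all(const int *tirs, size_t n)
{
	int err = 0;

	for (size_t i = 0; i < n; i++) {
		err = toy_refresh_one(tirs[i]);
		if (err)
			break;
	}
	return err;
}
```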

Fixes: b676f653 ("net/mlx5e: Refactor refresh TIRs")
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Decrease default mr cache size · e8b26b21

Committed by Artemy Kovalyov on Mar 19, 2019

Delete initialization of high order entries in mr cache to decrease initial
memory footprint. When required, the administrator can populate the
entries with memory keys via the /sys interface.

This approach is very helpful to significantly reduce the per HW function
memory footprint in virtualization environments such as SRIOV.

Fixes: 9603b61d ("mlx5: Move pci device handling from mlx5_ib to mlx5_core")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reported-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

20 Mar, 2019 · 1 commit

mlxsw: core: mlxsw: core: avoid -Wint-in-bool-context warning · 7442c483

Committed by Arnd Bergmann on Mar 18, 2019

A recently added function in mlxsw triggers a harmless compiler warning:

In file included from drivers/net/ethernet/mellanox/mlxsw/core.h:17,
                 from drivers/net/ethernet/mellanox/mlxsw/core_env.c:7:
drivers/net/ethernet/mellanox/mlxsw/core_env.c: In function 'mlxsw_env_module_temp_thresholds_get':
drivers/net/ethernet/mellanox/mlxsw/reg.h:8015:45: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
 #define MLXSW_REG_MTMP_TEMP_TO_MC(val) (val * 125)
                                        ~~~~~^~~~~~
drivers/net/ethernet/mellanox/mlxsw/core_env.c:116:8: note: in expansion of macro 'MLXSW_REG_MTMP_TEMP_TO_MC'
   if (!MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)) {
        ^~~~~~~~~~~~~~~~~~~~~~~~~

The warning is normally disabled, but it would be nice to enable
it to find real bugs, and there are no other known instances at
the moment.

Replace the negation with a zero-comparison, which also matches
the comment above it.
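
A minimal sketch of the replacement form, using a macro of the same shape as the one quoted in the warning. The TOY_ name and the helper function are hypothetical.

```c
/* Same shape as the quoted macro, with the argument parenthesized. */
#define TOY_TEMP_TO_MC(val) ((val) * 125)

static int toy_temp_is_zero(int module_temp)
{
	/* Explicit comparison instead of `!TOY_TEMP_TO_MC(module_temp)`. */
	return TOY_TEMP_TO_MC(module_temp) == 0;
}
```

The comparison yields the same result as the negation, but the multiplication no longer appears directly in a boolean context.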

Fixes: d93c19a1 ("mlxsw: core: Add API for QSFP module temperature thresholds reading")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Mar, 2019 · 2 commits

IB/mlx5: Use mlx5 core to create/destroy a DEVX DCT · c5ae1954

Committed by Yishai Hadas on Mar 06, 2019

To prevent a hardware memory leak when a DEVX DCT object is destroyed
without first calling DRAIN DCT (e.g. under the cleanup flow), its
creation and destruction need to be managed via mlx5 core.

In that case the DRAIN DCT command will be called, and only once it has
completed will the DESTROY DCT command be called.  Otherwise, the
DESTROY DCT may fail and a hardware leak may occur.

As of that change the DRAIN DCT command should no longer be exposed
from DEVX; it is managed internally by the driver to work as expected by
the device specification.

Fixes: 7efce369 ("IB/mlx5: Add obj create and destroy functionality")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

net/mlx5: Fix DCT creation bad flow · f84b66b9

Committed by Yishai Hadas on Mar 06, 2019

In case the DCT creation command has succeeded, a DRAIN must be issued
before calling DESTROY.

In addition, the original code used the wrong parameter for the DESTROY
command, 'in' instead of 'din', which caused another creation attempt
instead of a destroy.

Cc: <stable@vger.kernel.org> # 4.15
Fixes: 57cda166 ("net/mlx5: Add DCT command interface")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

13 Mar, 2019 · 5 commits

net/mlx4_core: Fix qp mtt size calculation · 8511a653

Committed by Jack Morgenstein on Mar 12, 2019

Calculation of qp mtt size (in function mlx4_RST2INIT_wrapper)
ultimately depends on function roundup_pow_of_two.

If the amount of memory required by the QP is less than one page,
roundup_pow_of_two is called with argument zero.  In this case, the
roundup_pow_of_two result is undefined.

Calling roundup_pow_of_two with a zero argument resulted in the
following stack trace:

UBSAN: Undefined behaviour in ./include/linux/log2.h:61:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 4 PID: 26939 Comm: rping Tainted: G OE 4.19.0-rc1
Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
Call Trace:
dump_stack+0x9a/0xeb
ubsan_epilogue+0x9/0x7c
__ubsan_handle_shift_out_of_bounds+0x254/0x29d
? __ubsan_handle_load_invalid_value+0x180/0x180
? debug_show_all_locks+0x310/0x310
? sched_clock+0x5/0x10
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x260
? find_held_lock+0x35/0x1e0
? mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]

Fix this by explicitly testing for zero, and returning one if the
argument is zero (assuming that the next higher power of 2 in this case
should be one).
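
The zero case can be sketched in userspace C. The toy_* helpers below only mirror the shift-based idea behind the round-up; they are not the kernel's log2.h implementation, and the wrapper name is an assumption.

```c
/* Position of the highest set bit plus one; 0 for x == 0. */
static unsigned int toy_fls_long(unsigned long x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Shift-based round-up, valid for n >= 1 only: for n == 0, n - 1 wraps to
 * ULONG_MAX and the shift count would equal the type width, which is
 * undefined behaviour.
 */
static unsigned long toy_roundup_pow_of_two(unsigned long n)
{
	return 1UL << toy_fls_long(n - 1);
}

/* Explicit zero test, as the fix describes: treat 0 as rounding up to 1. */
static unsigned long toy_safe_roundup(unsigned long n)
{
	return n == 0 ? 1 : toy_roundup_pow_of_two(n);
}
```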

Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling · c07d2792

Committed by Jack Morgenstein on Mar 12, 2019

In procedures mlx4_cmd_use_events() and mlx4_cmd_use_polling(), we need to
guarantee that there are no FW commands in progress on the comm channel
(for VFs) or wrapped FW commands (on the PF) when SRIOV is active.

We do this by also taking the slave_cmd_mutex when SRIOV is active.

This is especially important when switching from event to polling, since we
free the command-context array during the switch.  If there are FW commands
in progress (e.g., waiting for a completion event), the completion event
handler will access freed memory.

Since the decision to use comm_wait or comm_poll is taken before grabbing
the event_sem/poll_sem in mlx4_comm_cmd_wait/poll, we must take the
slave_cmd_mutex as well (to guarantee that the decision to use events or
polling and the call to the appropriate cmd function are atomic).

Fixes: a7e1f049 ("net/mlx4_core: Fix deadlock when switching between polling and event fw commands")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Fix reset flow when in command polling mode · e15ce4b8

Committed by Jack Morgenstein on Mar 12, 2019

As part of unloading a device, the driver switches from
FW command event mode to FW command polling mode.

Part of switching over to polling mode is freeing the command context array
memory (unfortunately, currently, without NULLing the command context array
pointer).

The reset flow calls "complete" to complete all outstanding fw commands
(if we are in event mode). The check for event vs. polling mode here
is to test if the command context array pointer is NULL.

If the reset flow is activated after the switch to polling mode, it will
attempt (incorrectly) to complete all the commands in the context array --
because the pointer was not NULLed when the driver switched over to polling
mode.

As a result, we have a use-after-free situation, which results in a
kernel crash.

For example:
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ...
CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
Workqueue: events hv_eject_device_work [pci_hyperv]
task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000
RIP: 0010:[<ffffffff876c4a8e>]  [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
RSP: 0018:ffff8d17354bfa38  EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8
RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0
R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003
FS:  0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0
Call Trace:
 [<ffffffff876c7adc>] complete+0x3c/0x50
 [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core]
 [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core]
 [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core]
 [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core]
 [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0
 [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0
 [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core]
 [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core]
 [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core]
 [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core]
 [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core]
 [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core]

The fix is to set the command context array pointer to NULL after freeing
the array.
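
A userspace sketch of the pattern, assuming a simplified command structure. The struct toy_cmd layout and the function names are hypothetical and do not reproduce the mlx4 code.

```c
#include <stdlib.h>

struct toy_cmd {
	int *context;	/* hypothetical stand-in for the command context array */
	int max_cmds;
};

/* Switch to "event mode": allocate the context array. */
static int toy_use_events(struct toy_cmd *cmd, int n)
{
	cmd->context = calloc((size_t)n, sizeof(*cmd->context));
	if (!cmd->context)
		return -1;
	cmd->max_cmds = n;
	return 0;
}

/* Switch to "polling mode": free the array and clear the pointer. */
static void toy_use_polling(struct toy_cmd *cmd)
{
	free(cmd->context);
	cmd->context = NULL;	/* later NULL checks now see polling mode */
	cmd->max_cmds = 0;
}

/* Returns how many contexts were "completed"; touches the array only in event mode. */
static int toy_wake_completions(struct toy_cmd *cmd)
{
	int woken = 0;

	if (!cmd->context)
		return 0;
	for (int i = 0; i < cmd->max_cmds; i++) {
		cmd->context[i] = 1;
		woken++;
	}
	return woken;
}
```

Without the `cmd->context = NULL` line, the NULL test in toy_wake_completions() would pass after the switch to polling and the loop would write into freed memory.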

Fixes: f5aef5aa ("net/mlx4_core: Activate reset flow upon fatal command cases")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: minimal: Initialize base_mac · 426aa1fc

Committed by Jiri Pirko on Mar 12, 2019

Currently base_mac is not initialized which causes wrong reporting of
zeroed parent_id to userspace. Fix this by initializing base_mac
properly.

Fixes: c100e47c ("mlxsw: minimal: Add ethtool support")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Prevent duplication during QSFP module initialization · 6bab45b4

Committed by Vadim Pasternak on Mar 12, 2019

During thermal initialization, verify whether the QSFP module's entry is
already configured in order to prevent duplication.
Such a scenario can happen when two switch drivers (PCI and I2C based)
coexist and, after boot, a split configuration is applied to some ports
and the I2C based driver is then re-probed.
In that case, after the reboot the same QSFP module associated with the
split will be discovered by the I2C based driver several times, which
will cause a crash.

It can happen, for example, on a system equipped with a BMC (Baseboard
Management Controller) running the I2C based driver, when the following
steps are performed:
- System boot.
- Host side configures port split.
- BMC side is rebooted.

Fixes: 6a79507c ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2019 · 10 commits

net/mlx5: Avoid panic when setting vport rate · 24319258

Committed by Tonghao Zhang on Mar 04, 2019

If we try to set a VF's rate on a VF (not PF) net device, the kernel
will crash. The commands are shown below:

$ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
$ ip link set $MLX_VF0 vf 0 max_tx_rate 2 min_tx_rate 1

If the first patch ("net/mlx5: Avoid panic when setting
vport mac, getting vport config") is not applied, the command:

$ ip link set $MLX_VF0 vf 0 rate 100

can also crash the kernel.

[ 1650.006388] RIP: 0010:mlx5_eswitch_set_vport_rate+0x1f/0x260 [mlx5_core]
[ 1650.007092]  do_setlink+0x982/0xd20
[ 1650.007129]  __rtnl_newlink+0x528/0x7d0
[ 1650.007374]  rtnl_newlink+0x43/0x60
[ 1650.007407]  rtnetlink_rcv_msg+0x2a2/0x320
[ 1650.007484]  netlink_rcv_skb+0xcb/0x100
[ 1650.007519]  netlink_unicast+0x17f/0x230
[ 1650.007554]  netlink_sendmsg+0x2d2/0x3d0
[ 1650.007592]  sock_sendmsg+0x36/0x50
[ 1650.007625]  ___sys_sendmsg+0x280/0x2a0
[ 1650.007963]  __sys_sendmsg+0x58/0xa0
[ 1650.007998]  do_syscall_64+0x5b/0x180
[ 1650.009438]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: c9497c98 ("net/mlx5: Add support for setting VF min rate")
Cc: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5: Avoid panic when setting vport mac, getting vport config · 6e77c413

Committed by Tonghao Zhang on Mar 04, 2019

If we try to set a VF's mac address on a VF (not PF) net device,
the kernel will crash. The commands are shown below:

$ echo 2 > /sys/class/net/$MLX_PF0/device/sriov_numvfs
$ ip link set $MLX_VF0 vf 0 mac 00:11:22:33:44:00

[exception RIP: mlx5_eswitch_set_vport_mac+41]
[ffffb8b7079e3688] do_setlink at ffffffff8f67f85b
[ffffb8b7079e37a8] __rtnl_newlink at ffffffff8f683778
[ffffb8b7079e3b68] rtnl_newlink at ffffffff8f683a63
[ffffb8b7079e3b90] rtnetlink_rcv_msg at ffffffff8f67d812
[ffffb8b7079e3c10] netlink_rcv_skb at ffffffff8f6b88ab
[ffffb8b7079e3c60] netlink_unicast at ffffffff8f6b808f
[ffffb8b7079e3ca0] netlink_sendmsg at ffffffff8f6b8412
[ffffb8b7079e3d18] sock_sendmsg at ffffffff8f6452f6
[ffffb8b7079e3d30] ___sys_sendmsg at ffffffff8f645860
[ffffb8b7079e3eb0] __sys_sendmsg at ffffffff8f647a38
[ffffb8b7079e3f38] do_syscall_64 at ffffffff8f00401b
[ffffb8b7079e3f50] entry_SYSCALL_64_after_hwframe at ffffffff8f80008c

and

[exception RIP: mlx5_eswitch_get_vport_config+12]
[ffffa70607e57678] mlx5e_get_vf_config at ffffffffc03c7f8f [mlx5_core]
[ffffa70607e57688] do_setlink at ffffffffbc67fa59
[ffffa70607e577a8] __rtnl_newlink at ffffffffbc683778
[ffffa70607e57b68] rtnl_newlink at ffffffffbc683a63
[ffffa70607e57b90] rtnetlink_rcv_msg at ffffffffbc67d812
[ffffa70607e57c10] netlink_rcv_skb at ffffffffbc6b88ab
[ffffa70607e57c60] netlink_unicast at ffffffffbc6b808f
[ffffa70607e57ca0] netlink_sendmsg at ffffffffbc6b8412
[ffffa70607e57d18] sock_sendmsg at ffffffffbc6452f6
[ffffa70607e57d30] ___sys_sendmsg at ffffffffbc645860
[ffffa70607e57eb0] __sys_sendmsg at ffffffffbc647a38
[ffffa70607e57f38] do_syscall_64 at ffffffffbc00401b
[ffffa70607e57f50] entry_SYSCALL_64_after_hwframe at ffffffffbc80008c

Fixes: a8d70a05 ("net/mlx5: E-Switch, Disallow vlan/spoofcheck setup if not being esw manager")
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Fix access to non-existing receive queue · c475e11e

Committed by Tariq Toukan on Mar 05, 2019

In case the number of channels is changed while the interface is down,
the RSS indirection table is mistakenly not modified accordingly,
causing access to an out-of-range, non-existing object.

Fix by updating the RSS indirection table also in the early
return flow of interface down.
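
A sketch of why a stale table is dangerous, assuming a round-robin spread of table entries over the channels. The table size, the spreading rule and the toy_* names are assumptions made for illustration and may differ from the driver.

```c
#define TOY_INDIR_SIZE 8

/* Spread table entries round-robin over the current number of channels. */
static void toy_build_indir_table(unsigned int *table,
				  unsigned int num_channels)
{
	unsigned int i;

	if (num_channels == 0)
		return;
	for (i = 0; i < TOY_INDIR_SIZE; i++)
		table[i] = i % num_channels;
}

/* Returns 1 if every entry points at an existing channel, 0 otherwise. */
static int toy_indir_table_valid(const unsigned int *table,
				 unsigned int num_channels)
{
	unsigned int i;

	for (i = 0; i < TOY_INDIR_SIZE; i++)
		if (table[i] >= num_channels)
			return 0;
	return 1;
}
```

A table built for 8 channels becomes invalid once the channel count drops to 4, and is valid again only after it is rebuilt for the new count.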

Fixes: fb35c534 ("net/mlx5e: Fix NULL pointer derefernce in set channels error flow")
Fixes: bbeb53b8 ("net/mlx5e: Move RSS params to a dedicated struct")
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Maria Pasechnik <mariap@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

c475e11e
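
The failure mode described in the commit above can be illustrated with a small user-space sketch. It is not the driver code: `rss_params`, `INDIR_SIZE`, `build_indir_table()`, `set_channels()` and `indir_table_valid()` are hypothetical names, and the round-robin fill is an assumed distribution chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INDIR_SIZE 8 /* hypothetical table size, chosen for illustration */

/* Hypothetical stand-in for per-device RSS state. */
struct rss_params {
	unsigned int num_channels;
	unsigned int indir[INDIR_SIZE];
};

/* Spread the table entries round-robin over the available channels. */
static void build_indir_table(struct rss_params *rss, unsigned int num_channels)
{
	assert(num_channels > 0);
	for (size_t i = 0; i < INDIR_SIZE; i++)
		rss->indir[i] = (unsigned int)(i % num_channels);
}

/*
 * Change the channel count. The table is rebuilt before the
 * "interface is down" early return, so it never refers to a
 * channel that no longer exists.
 */
static void set_channels(struct rss_params *rss, unsigned int count, bool if_up)
{
	rss->num_channels = count;
	build_indir_table(rss, count);
	if (!if_up)
		return; /* early return: there are no channels to reopen */
	/* a real driver would reopen its channels here */
}

/* Return true when every table entry refers to an existing channel. */
static bool indir_table_valid(const struct rss_params *rss)
{
	for (size_t i = 0; i < INDIR_SIZE; i++)
		if (rss->indir[i] >= rss->num_channels)
			return false;
	return true;
}
```

If the `build_indir_table()` call were placed after the early return, shrinking from 8 channels to 2 while the interface is down would leave entries 2..7 in the table, which is the out-of-range access the commit message describes.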

net/mlx5e: IPoIB, Fix RX checksum statistics update · 3d6f3cdf

Committed by Feras Daoud on Jan 14, 2019

Update the RX checksum only if the feature is enabled.

Fixes: 9d6bd752 ("net/mlx5e: IPoIB, RX handler")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

3d6f3cdf

net/mlx5: Remove redundant lag function to get pf num · 6ffb6303

Committed by Roi Dayan on Feb 26, 2019

The function is not being used.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

6ffb6303

net/mlx5e: Properly get the PF number phys port name ndo · 5b33eba9

Committed by Roi Dayan on Feb 26, 2019

Currently, we fail to retrieve the PF number in some cases (e.g.,
single-ported cards, lag capability). This further results in a
call trace issued by the rtnetlink code, since the error value
is not -EOPNOTSUPP. Change the implementation to be independent
of the lag code and to function properly on both two-ported and
single-ported cards.

Call Trace:

[  194.525057] mlx5_core 0000:82:00.0: mlx5_lag_get_pf_num:605:(pid 837): no lag device, can't get pf num
[  194.525804] WARNING: CPU: 7 PID: 837 at net/core/rtnetlink.c:3457 rtmsg_ifinfo_build_skb+0x131/0x160
[  194.529952] CPU: 7 PID: 837 Comm: kworker/7:3 Tainted: G        W  O      5.0.0-rc7+ #3
[  194.531307] Workqueue: events linkwatch_event
[  194.531697] RIP: 0010:rtmsg_ifinfo_build_skb+0x131/0x160
[  194.545007] Call Trace:
[  194.545406]  rtmsg_ifinfo_event.part.29+0x1b/0xb0
[  194.545810]  rtmsg_ifinfo+0x51/0x80
[  194.546209]  netdev_state_change+0xc7/0x110
[  194.546608]  ? dev_valid_name+0x1b0/0x1b0
[  194.547010]  ? __local_bh_enable_ip+0xef/0x1d0
[  194.547411]  ? lockdep_hardirqs_on+0x3ea/0x560
[  194.547811]  ? linkwatch_do_dev+0x9b/0x100
[  194.548207]  linkwatch_do_dev+0x9b/0x100
[  194.548605]  __linkwatch_run_queue+0x244/0x430
[  194.549014]  ? linkwatch_schedule_work+0x100/0x100
[  194.549412]  ? lock_acquire+0x10f/0x2d0
[  194.549816]  linkwatch_event+0x3f/0x50
[  194.550212]  process_one_work+0x7d3/0x1460

Fixes: c12ecc23 ("net/mlx5e: Move to use common phys port names for vport representors")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

5b33eba9
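
The commit message above does not say how the PF number is obtained once the lag code is out of the picture. One lag-independent possibility, sketched here in user-space C, is to read the function number from the PCI `devfn`, whose low three bits hold the function. The helper names and the `pf<N>vf<M>` name format are assumptions made for this illustration, not taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* The function number occupies the low three bits of a PCI devfn. */
static unsigned int pci_func(unsigned int devfn)
{
	return devfn & 0x07;
}

/*
 * Hypothetical helper: write a port name into buf.
 * Needs no lag device, so it behaves the same on single-ported
 * and two-ported cards. Returns 0 on success or -EINVAL when
 * the buffer is too small.
 */
static int get_phys_port_name(unsigned int devfn, unsigned int vf,
			      char *buf, size_t len)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "pf%uvf%u", pci_func(devfn), vf);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;
	return 0;
}
```

For the device in the trace, `0000:82:00.0`, the function number is 0, so this sketch would report PF 0 without consulting any lag state.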

net/mlx5: Consolidate update FTE for all removal changes · 718ce4d6

Committed by Eli Britstein on Jan 08, 2019

With commit a18e879d ("net/mlx5e: Annul encap action ordering
requirement") and a use-case of e-switch remote mirroring, the
incremental/stepped FTE removal process done by the fs core got us to
illegal transient states and FW errors:

SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad
parameter(0x3), syndrome (0x9c2e40)

To avoid that and improve FTE removal performance, aggregate the FTE's
updates that should be applied. Remove the FTE if it is empty, or apply
one FW update command with the aggregated updates.

Fixes: a18e879d ("net/mlx5e: Annul encap action ordering requirement")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

718ce4d6
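
The aggregation idea in the commit above can be sketched as follows. This is a simplified user-space model, not the fs core code: `fte_model`, the `MODIFY_*` bits, `fte_del_rule()` and `fte_commit()` are hypothetical names, and the `fw_cmds` counter merely stands in for firmware commands being issued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits naming which part of an entry changed. */
enum {
	MODIFY_ACTION    = 1u << 0,
	MODIFY_DEST_LIST = 1u << 1,
	MODIFY_COUNTERS  = 1u << 2,
};

/* Hypothetical model of a flow table entry (FTE). */
struct fte_model {
	unsigned int num_rules;    /* rules still attached to the entry */
	unsigned int pending_mask; /* changes not yet sent to firmware */
	unsigned int fw_cmds;      /* firmware commands issued so far */
	bool deleted;
};

/* Record the removal of one rule without issuing a firmware command. */
static void fte_del_rule(struct fte_model *fte, unsigned int change)
{
	assert(fte->num_rules > 0);
	fte->num_rules--;
	fte->pending_mask |= change;
}

/* Apply all pending changes at once: delete if empty, else one update. */
static void fte_commit(struct fte_model *fte)
{
	if (fte->num_rules == 0) {
		fte->deleted = true;
		fte->fw_cmds++; /* a single delete command */
	} else if (fte->pending_mask) {
		fte->fw_cmds++; /* a single update carrying the whole mask */
	}
	fte->pending_mask = 0;
}
```

Removing two rules and then committing costs one command instead of two, and the entry never passes through the intermediate states that a per-rule update would produce.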

net/mlx5: Add a locked flag to node removal functions · 476d61b7

Committed by Eli Britstein on Jan 31, 2019

Add a locked flag to the node removal functions to signal whether the
parent is already locked by the calling function, as a pre-step
towards locking from outside. Currently the flag is always false, so
there is no functional change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

476d61b7

net/mlx5: Add modify FTE helper function · e7aafc8f

Committed by Eli Britstein on Jan 08, 2019

Add a modify FTE helper function and use it when deleting a rule, as a
pre-step towards consolidated FTE modification, with no functional
change.

Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

e7aafc8f

net/mlx5: Fix multiple updates of steering rules in parallel · 6237634d

Committed by Eli Britstein on Jan 31, 2019

There might be a condition where the fte found is not active yet. In
this case we should not use it, but continue to search for another, or
allocate a new one.

Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

6237634d
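
The lookup rule described in the commit above (skip an entry that is not active yet, keep searching, and fall back to allocating a new one) can be sketched like this. It is a simplified user-space model: `fte_slot` and `lookup_active_fte()` are hypothetical names, and a flat array stands in for whatever structure the fs core actually searches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a flow table entry under lookup. */
struct fte_slot {
	unsigned int match_key; /* stands in for the match value */
	bool active;            /* false while another thread is still creating it */
};

/*
 * Return the first active entry matching key, or NULL when none
 * exists, in which case the caller is expected to allocate a new one.
 */
static struct fte_slot *lookup_active_fte(struct fte_slot *entries, size_t n,
					  unsigned int key)
{
	for (size_t i = 0; i < n; i++) {
		if (entries[i].match_key != key)
			continue;
		if (!entries[i].active)
			continue; /* not usable yet: keep searching */
		return &entries[i];
	}
	return NULL;
}
```

A matching but inactive entry is treated the same as a non-matching one, so a concurrent creator is never handed a half-initialized entry.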

07 Mar, 2019 1 commit

net/mlx5: ODP support for XRC transport is not enabled by default in FW · fca22e7e

Committed by Moni Shoua on Feb 25, 2019

ODP support for XRC transport is not enabled by default in FW, so we need
separate ODP checks to enable/disable it.

While at it, rewrite the set of ODP SRQ support capabilities so that each
field is tested separately for clarity. This is not needed for current
FW, but it is better to keep the checks separate.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

fca22e7e

openeuler / Kernel synced successfully about 1 year ago