Commits · 76a8426959a6a70b0d00158d2d7a8661957878c8 · openeuler / Kernel

06 May, 2022 · 15 commits

net: move snowflake callers to netif_napi_add_tx_weight() · 8d602e1a

Committed by Jakub Kicinski on May 04, 2022

Make the drivers with custom tx napi weight call netif_napi_add_tx_weight().
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220504163725.550782-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: switch to netif_napi_add_tx() · 16d083e2

Committed by Jakub Kicinski on May 04, 2022

Switch net callers to the new API not requiring
the NAPI_POLL_WEIGHT argument.
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


jme: remove an unnecessary indirection · fd49f8e6

Committed by Jakub Kicinski on May 04, 2022

Remove a define which looks like an OS abstraction layer
and makes spatch conversions on this driver problematic.

Link: https://lore.kernel.org/r/20220504163939.551231-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: ethernet: Prepare cleanup of powerpc's asm/prom.h · 6bff3ffc

Committed by Christophe Leroy on May 04, 2022

powerpc's asm/prom.h includes some headers that it doesn't
need itself.

In order to clean powerpc's asm/prom.h up in a further step,
first clean all files that include asm/prom.h

Some files don't need asm/prom.h at all. For those ones,
just remove inclusion of asm/prom.h

Some files don't need any of the items provided by asm/prom.h,
but need some of the headers included by asm/prom.h. For those
ones, add the needed headers that are brought by asm/prom.h at
the moment and remove asm/prom.h

Some files really need asm/prom.h but also need some of the
headers included by asm/prom.h. For those ones, leave asm/prom.h
but also add the needed headers so that they can be removed
from asm/prom.h in a later step.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/09a13d592d628de95d30943e59b2170af5b48110.1651663857.git.christophe.leroy@csgroup.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


Revert "Merge branch 'mlxsw-line-card-model'" · c4a67a21

Committed by Jakub Kicinski on May 04, 2022

This reverts commit 5e927a9f, reversing
changes made to cfc1d91a.

The discussion is still ongoing so let's remove the uAPI
until the discussion settles.

Link: https://lore.kernel.org/all/20220425090021.32e9a98f@kernel.org/
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20220504154037.539442-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


ice: remove period on argument description in ice_for_each_vf · 4eaf1797

Committed by Jacob Keller on April 11, 2022

The ice_for_each_vf macros have comments describing the implementation. One
of the arguments has a period on the end, which is not our typical style.
Remove the unnecessary period.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: add a function comment for ice_cfg_mac_antispoof · 71c114e8

Committed by Jacob Keller on April 11, 2022

This function definition was missing a comment describing its
implementation. Add one.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: fix wording in comment for ice_reset_vf · 19c3e1ed

Committed by Jacob Keller on April 11, 2022

The comment explaining ice_reset_vf has an extraneous "the" with the "if
the resets are disabled". Remove it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: remove return value comment for ice_reset_all_vfs · 00be8197

Committed by Jacob Keller on April 11, 2022

Since commit fe99d1c0 ("ice: make ice_reset_all_vfs void"), the
ice_reset_all_vfs function has not returned anything. The function comment
still indicated it did. Fix this.

While here, also add a line to clarify the function resets all VFs at once
in response to hardware resets such as a PF reset.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: always check VF VSI pointer values · baeb705f

Committed by Jacob Keller on April 11, 2022

The ice_get_vf_vsi function can return NULL in some cases, such as if
handling messages during a reset where the VSI is being removed and
recreated.

Several places throughout the driver do not bother to check whether this
VSI pointer is valid. Static analysis tools may report issues because
they detect paths where a potentially NULL pointer could be dereferenced.

Fix this by checking the return value of ice_get_vf_vsi everywhere.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
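
The pattern this commit describes (check a lookup result before dereferencing it) can be sketched in plain userspace C. This is only an illustration: `struct vsi`, `struct vf`, `get_vf_vsi()` and `vf_query_vsi_num()` are hypothetical stand-ins, not the driver's real types or functions, and the choice of -EINVAL as the error is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real ice structures are far larger. */
struct vsi {
    int vsi_num;
};

struct vf {
    struct vsi *vsi;    /* may be NULL while a reset recreates the VSI */
};

/* Mirrors the idea of ice_get_vf_vsi(): the lookup can return NULL. */
static struct vsi *get_vf_vsi(struct vf *vf)
{
    return vf->vsi;
}

/* The caller checks the pointer before use and reports an error otherwise. */
static int vf_query_vsi_num(struct vf *vf, int *vsi_num)
{
    struct vsi *vsi = get_vf_vsi(vf);

    if (!vsi)
        return -EINVAL;    /* assumed error code for this sketch */

    *vsi_num = vsi->vsi_num;
    return 0;
}
```

Returning an error instead of dereferencing keeps the caller's behaviour defined during the window in which the VSI is absent.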


ice: add newline to dev_dbg in ice_vf_fdir_dump_info · 9880d3d6

Committed by Jacob Keller on April 11, 2022

The debug print in ice_vf_fdir_dump_info does not end in a newline. This can
look confusing when reading the kernel log, as the next print will
immediately continue on the same line.

Fix this by adding the forgotten newline.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: get switch id on switchdev devices · 4b889474

Committed by Michal Swiatkowski on April 20, 2022

Switch id should be the same for each netdevice on a driver.
The id must be unique between devices on the same system, but
does not need to be unique between devices on different systems.

The switch id is used to locate ports on a switch and to know if
aggregated ports belong to the same switch.

To meet this requirement, use pci_get_dsn as the switch id value, as
this is a unique value for each device on the same system.

Implementing switch id is needed by automatic tools for Kubernetes.

Set switch id by setting devlink port attributes and calling
devlink_port_attrs_set while creating pf (for uplink) and vf
(for representor) devlink port.

To get switch id (in switchdev mode):
cat /sys/class/net/$PF0/phys_switch_id
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: return ENOSPC when exceeding ICE_MAX_CHAIN_WORDS · bd1ffe8e

Committed by Wojciech Drewek on April 20, 2022

When the number of words exceeds ICE_MAX_CHAIN_WORDS, -ENOSPC
should be returned, not -EINVAL. Do not overwrite this
error code in ice_add_tc_flower_adv_fltr.
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Suggested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: introduce common helper for retrieving VSI by vsi_num · 295819b5

Committed by Maciej Fijalkowski on March 24, 2022

Both ice_idc.c and ice_virtchnl.c carry their own implementation of a
helper function that is looking for a given VSI based on provided
vsi_num. Their functionality is the same, so let's introduce the common
function in ice.h that both of the mentioned sites will use.

This is a strictly cleanup thing, no functionality is changed.
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


ice: use min_t() to make code cleaner in ice_gnss · 187dbc15

Committed by Wan Jiabing on March 21, 2022

Fix the following coccicheck warning:
./drivers/net/ethernet/intel/ice/ice_gnss.c:79:26-27: WARNING opportunity for min()
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
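
As a rough illustration of the kind of rewrite this warning asks for, the sketch below replaces an open-coded ternary with a min_t()-style macro. The macro is a simplified userspace stand-in and, unlike the kernel's, evaluates each argument twice. `clamp_read_len()` and its parameters are hypothetical and are not taken from ice_gnss.c.

```c
#include <stddef.h>

/*
 * Simplified userspace stand-in for the kernel's min_t(). Both operands are
 * cast to the named type before comparing. Unlike the kernel macro, this one
 * evaluates each argument twice, so avoid side effects in the arguments.
 */
#define min_t(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/*
 * Hypothetical helper. The open-coded form would read:
 *     len = avail < max ? avail : max;
 */
static size_t clamp_read_len(size_t avail, size_t max)
{
    return min_t(size_t, avail, max);
}
```

Naming the comparison type explicitly is what separates min_t() from a plain min(): it states which type the comparison happens in when the two operands differ.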


05 May, 2022 · 6 commits

net: sparx5: Add handling of host MDB entries · 1c1ed5a4

Committed by Casper Andersson on May 03, 2022

Handle adding and removing MDB entries for the host.
Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Link: https://lore.kernel.org/r/20220503093922.1630804-1-casper.casan@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


net: mscc: ocelot: don't use magic numbers for OCELOT_POLICER_DISCARD · 91d350d6

Committed by Vladimir Oltean on May 03, 2022

OCELOT_POLICER_DISCARD helps "kill dropped packets dead" since a
PERMIT/DENY mask mode with a port mask of 0 isn't enough to stop the CPU
port from receiving packets removed from the forwarding path.

The hardcoded initialization done for it in ocelot_vcap_init() is
confusing. All we need from it is to have a rate and a burst size of 0.

Reuse qos_policer_conf_set() for that purpose.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: mscc: ocelot: drop port argument from qos_policer_conf_set · 8e90c499

Committed by Vladimir Oltean on May 03, 2022

The "port" argument is used for nothing else except printing on the
error path. Print errors on behalf of the policer index, which is less
confusing anyway.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: mscc: ocelot: use list_for_each_entry in ocelot_vcap_filter_add_to_block · 09fd1e0d

Committed by Vladimir Oltean on May 03, 2022

Unify the code paths for adding to an empty list and to a list with
elements by keeping a "pos" list_head element that indicates where to
insert. Initialize "pos" with the list head itself in case
list_for_each_entry() doesn't iterate over any element.

Note that list_for_each_safe() isn't needed because no element is
removed from the list while iterating.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
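
The "pos defaults to the list head" idea described above can be sketched in userspace C with a minimal circular list. This is not the ocelot code: `struct filter`, `filter_of()` and `filter_insert_sorted()` are hypothetical, the ordering rule (ascending `prio`) is an assumption, and the list helpers are simplified re-implementations rather than the kernel's linux/list.h.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical element type, ordered by ascending prio in this sketch. */
struct filter {
    int prio;
    struct list_head list;
};

#define filter_of(ptr) \
    ((struct filter *)((char *)(ptr) - offsetof(struct filter, list)))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert "new" right before "head" (i.e. at the tail when head is the list). */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
    new->prev = head->prev;
    new->next = head;
    head->prev->next = new;
    head->prev = new;
}

static void filter_insert_sorted(struct filter *f, struct list_head *head)
{
    /* Default insertion point: the head itself, which means "append".
     * This also covers the empty list, where the loop body never runs. */
    struct list_head *pos = head;
    struct list_head *it;

    /* Nothing is removed while walking, so a plain traversal is enough. */
    for (it = head->next; it != head; it = it->next) {
        if (f->prio < filter_of(it)->prio) {
            pos = it;
            break;
        }
    }

    list_add_tail(&f->list, pos);
}
```

With a single insertion call after the loop, the empty-list and non-empty-list cases share one code path.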


net: mscc: ocelot: add to tail of empty list in ocelot_vcap_filter_add_to_block · 3825a0d0

Committed by Vladimir Oltean on May 03, 2022

This makes no functional difference but helps in minimizing the delta
for a future change.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


net: mscc: ocelot: use list_add_tail in ocelot_vcap_filter_add_to_block() · 0a448bba

Committed by Vladimir Oltean on May 03, 2022

list_add(..., pos->prev) and list_add_tail(..., pos) are equivalent, use
the latter form to unify with the case where the list is empty later.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
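
The equivalence claimed in this commit can be checked with a small userspace model of the list helpers. The helpers below are simplified re-implementations that follow the kernel's documented semantics (list_add inserts after the given node, list_add_tail inserts before it). `struct item`, `item_of()` and `insert_two_before_three()` are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

struct item {
    int id;
    struct list_head node;
};

#define item_of(ptr) \
    ((struct item *)((char *)(ptr) - offsetof(struct item, node)))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link "new" between two nodes that are currently adjacent. */
static void list_link(struct list_head *new, struct list_head *prev,
                      struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/* Insert right after "head". */
static void list_add(struct list_head *new, struct list_head *head)
{
    list_link(new, head, head->next);
}

/* Insert right before "head". */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
    list_link(new, head->prev, head);
}

/* Build [1, 3], insert 2 before 3 with the chosen form, copy ids to out. */
static void insert_two_before_three(int use_add_tail, int out[3])
{
    struct list_head head, *pos, *it;
    struct item a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
    int i = 0;

    list_init(&head);
    list_add_tail(&a.node, &head);
    list_add_tail(&c.node, &head);

    pos = &c.node;
    if (use_add_tail)
        list_add_tail(&b.node, pos);    /* insert before pos */
    else
        list_add(&b.node, pos->prev);   /* insert after pos's predecessor */

    for (it = head.next; it != &head; it = it->next)
        out[i++] = item_of(it)->id;
}
```

Both forms link the new node between `pos->prev` and `pos`, so the resulting order is identical. Only the list_add_tail() form also reads naturally when `pos` is the list head of an empty list.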


04 May, 2022 · 19 commits

mlxsw: spectrum_router: Only query neighbour activity when necessary · cff94376

Committed by Ido Schimmel on May 04, 2022

The driver periodically queries the device for activity of neighbour
entries in order to report it to the kernel's neighbour table.

Avoid unnecessary activity query when no neighbours are installed. Use
an atomic variable to track the number of neighbours, as it is read
without any locks.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
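
A minimal userspace sketch of the idea, using C11 atomics in place of the kernel's atomic_t: a counter is updated when neighbours are installed or removed, and the periodic work reads it without a lock to decide whether a query is worthwhile. `struct router` and all function names here are hypothetical and are not the mlxsw implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state; the real driver keeps much more than this. */
struct router {
    atomic_int neigh_count;   /* read locklessly by the periodic work */
    int queries_issued;       /* stands in for the device query */
};

static void neigh_installed(struct router *r)
{
    atomic_fetch_add(&r->neigh_count, 1);
}

static void neigh_removed(struct router *r)
{
    atomic_fetch_sub(&r->neigh_count, 1);
}

/* Periodic work: returns true only when the device was actually queried. */
static bool neigh_activity_poll(struct router *r)
{
    if (atomic_load(&r->neigh_count) == 0)
        return false;         /* nothing installed, skip the query */

    r->queries_issued++;
    return true;
}
```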


mlxsw: spectrum_switchdev: Only query FDB notifications when necessary · b8950003

Committed by Ido Schimmel on May 04, 2022

The driver periodically queries the device for FDB notifications (e.g.,
learned, aged-out) in order to update the bridge driver. These
notifications can only be generated when bridges are offloaded to the
device.

Avoid unnecessary queries by starting to query upon installation of the
first bridge and stop querying upon removal of the last bridge.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
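
The "start on first, stop on last" behaviour can be sketched as a simple count-driven toggle. This is an assumption-laden illustration: `struct fdb_notify`, `bridge_offloaded()` and `bridge_removed()` are hypothetical, and a boolean stands in for scheduling and cancelling the driver's periodic work. The sketch also assumes callers are already serialized, so no locking is shown.

```c
#include <stdbool.h>

/* Hypothetical state: how many bridges are offloaded, and whether the
 * periodic FDB notification query is currently running. */
struct fdb_notify {
    unsigned int bridge_count;
    bool polling;
};

static void bridge_offloaded(struct fdb_notify *n)
{
    /* Going from 0 to 1 bridges: start querying. */
    if (n->bridge_count++ == 0)
        n->polling = true;
}

static void bridge_removed(struct fdb_notify *n)
{
    if (n->bridge_count == 0)
        return;               /* unbalanced call, ignore in this sketch */

    /* Going from 1 to 0 bridges: stop querying. */
    if (--n->bridge_count == 0)
        n->polling = false;
}
```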


mlxsw: spectrum_acl: Do not report activity for multicast routes · d1314096

Committed by Ido Schimmel on May 04, 2022

The driver periodically queries the device for activity of ACL rules in
order to report it to tc upon 'FLOW_CLS_STATS'.

In Spectrum-2 and later ASICs, multicast routes are programmed as ACL
rules, but unlike rules installed by tc, their activity is of no
interest.

Avoid unnecessary activity query for such rules by always reporting them
as inactive.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


mlxsw: Treat LLDP packets as control · 0106668c

Committed by Petr Machata on May 04, 2022

When trapping packets for on-CPU processing, Spectrum machines
differentiate between control and non-control traps. Traffic trapped
through non-control traps is treated as data and kept in shared buffer in
pools 0-4. Traffic trapped through control traps is kept in the dedicated
control buffer 9. The advantage of marking traps as control is that
pressure in the data plane does not prevent the control traffic from being
processed.

When the LLDP trap was introduced, it was marked as a control trap. But
then in commit aed4b572 ("mlxsw: spectrum: PTP: Hook into packet
receive path"), PTP traps were introduced. Because Ethernet-encapsulated
PTP packets look to the Spectrum-1 ASIC as LLDP traffic and are trapped
under the LLDP trap, this trap was reconfigured as non-control, in sync
with the PTP traps.

There is however no requirement that PTP traffic be handled as data.
Besides, the usual encapsulation for PTP traffic is UDP, not bare Ethernet,
and that is in deployments that even need PTP, which is far less common
than LLDP. This is reflected by the default policer, which was not bumped
up to the 19Kpps / 24Kpps that is the expected load of a PTP-enabled
Spectrum-1 switch.

Marking of LLDP trap as non-control was therefore probably misguided. In
this patch, change it back to control.
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


mlxsw: spectrum_dcb: Do not warn about priority changes · b6b58456

Committed by Petr Machata on May 04, 2022

The idea behind the warnings is that the user would get warned in case when
more than one priority is configured for a given DSCP value on a netdevice.

The warning is currently wrong, because dcb_ieee_getapp_mask() returns
the first matching entry, not all of them, and the warning will then claim
that some priority is "current", when in fact it is not.

But more importantly, the warning is misleading in general. Consider the
following commands:

# dcb app flush dev swp19 dscp-prio
# dcb app add dev swp19 dscp-prio 24:3
# dcb app replace dev swp19 dscp-prio 24:2

The last command will issue the following warning:

mlxsw_spectrum3 0000:07:00.0 swp19: Ignoring new priority 2 for DSCP 24 in favor of current value of 3

The reason is that the "replace" command works by first adding the new
value, and then removing all old values. This is the only way to make the
replacement without causing the traffic to be prioritized to whatever the
chip defaults to. The warning is issued in response to adding the new
priority, and then no warning is shown when the old priority is removed.
The upshot is that the canonical way to change traffic prioritization
always produces a warning about ignoring the new priority, but what gets
configured is in fact what the user intended.

An option to just emit a warning every time the prioritization changes,
just to make it clear that it happened, is obviously unsatisfactory.

Therefore, in this patch, remove the warnings.
Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sfc: Copy a subset of mcdi_pcol.h to siena · 6b73f20a

Committed by Martin Habets on May 04, 2022

For Siena we do not need new messages that were defined
for the EF100 architecture. Several debug messages have
also been removed.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


sfc: Disable Siena support · 0c38a5bd

Committed by Martin Habets on May 04, 2022

Disable the build of Siena code until later in this patch series.
Prevent sfc.ko from binding to Siena NICs.

efx_init_sriov/efx_fini_sriov is only used for Siena. Remove calls
to those.
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/mlx5: Fix matching on inner TTC · a042d7f5

Committed by Mark Bloch on April 10, 2022

The cited commits didn't use proper matching on inner TTC. As a result,
distribution of encapsulated packets wasn't symmetric
between the physical ports.

Fixes: 4c71ce50 ("net/mlx5: Support partial TTC rules")
Fixes: 8e25a2bc ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5: Avoid double clear or set of sync reset requested · fc3d3db0

Committed by Moshe Shemesh on April 11, 2022

Double clear of reset requested state can lead to a NULL pointer dereference, as it
will try to delete the timer twice. This can happen, for example, on a
race between abort from FW and pci error or reset. Avoid such a case by using
test_and_clear_bit() to ensure the reset requested state clear flow runs
only once. Similarly, use test_and_set_bit() to ensure the reset
requested state set flow runs only once.

Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
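
The "only one caller performs the transition" guarantee can be sketched in userspace C with C11 atomics standing in for the kernel's test_and_set_bit() and test_and_clear_bit(). Everything named here (`struct reset_state`, `RESET_REQUESTED_BIT`, the helper functions) is hypothetical, and the counters merely stand in for arming and deleting the timer.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RESET_REQUESTED_BIT 0   /* hypothetical bit number */

struct reset_state {
    atomic_ulong flags;
    int timers_armed;     /* stands in for starting the poll timer */
    int timers_deleted;   /* stands in for deleting the poll timer */
};

/* Atomically set the bit and report whether it was already set. */
static bool flag_test_and_set(atomic_ulong *flags, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    return (atomic_fetch_or(flags, mask) & mask) != 0;
}

/* Atomically clear the bit and report whether it was set before. */
static bool flag_test_and_clear(atomic_ulong *flags, unsigned int bit)
{
    unsigned long mask = 1UL << bit;

    return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Returns true only for the caller that actually performed the set. */
static bool set_reset_requested(struct reset_state *s)
{
    if (flag_test_and_set(&s->flags, RESET_REQUESTED_BIT))
        return false;     /* already requested, nothing to do */

    s->timers_armed++;
    return true;
}

/* Returns true only for the caller that actually performed the clear. */
static bool clear_reset_requested(struct reset_state *s)
{
    if (!flag_test_and_clear(&s->flags, RESET_REQUESTED_BIT))
        return false;     /* already cleared by another path */

    s->timers_deleted++;
    return true;
}
```

Because the test and the update happen in one atomic step, two racing paths cannot both observe the bit as set and both run the teardown.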


net/mlx5: Fix deadlock in sync reset flow · cb7786a7

Committed by Moshe Shemesh on April 11, 2022

The sync reset flow can lead to the following deadlock when
poll_sync_reset() is called by timer softirq and waiting on
del_timer_sync() for the same timer. Fix that by moving the part of the
flow that waits for the timer to reset_reload_work.

It fixes the following kernel trace:
RIP: 0010:del_timer_sync+0x32/0x40
...
Call Trace:
 <IRQ>
 mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
 poll_sync_reset.cold+0x36/0x52 [mlx5_core]
 call_timer_fn+0x32/0x130
 __run_timers.part.0+0x180/0x280
 ? tick_sched_handle+0x33/0x60
 ? tick_sched_timer+0x3d/0x80
 ? ktime_get+0x3e/0xa0
 run_timer_softirq+0x2a/0x50
 __do_softirq+0xe1/0x2d6
 ? hrtimer_interrupt+0x136/0x220
 irq_exit+0xae/0xb0
 smp_apic_timer_interrupt+0x7b/0x140
 apic_timer_interrupt+0xf/0x20
 </IRQ>

Fixes: 3c5193a8 ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: Fix trust state reset in reload · b781bff8

Committed by Moshe Tal on February 09, 2022

Setting dscp2prio during the driver reload can leave the dcb ieee app list
non-empty after the reload finishes, which results in a conflict between
the priority trust state reported by the app and the state in the device
register.

Reset the dcb ieee app list on initialization in case it
conflicts with the register status.

Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: Avoid checking offload capability in post_parse action · 0e322efd

Committed by Ariel Levkovich on March 28, 2022

During TC action parsing, the can_offload callback is called
before calling the action's main parsing callback.

Later on, the can_offload callback is called again before handling
the action's post_parse callback, if it exists.

Since the main parsing callback might have changed and set parsing
params for the rule, following can_offload checks might fail because
some parsing params were already set.

Specifically, the ct action main parsing sets the ct param in the
parsing status structure and when the second can_offload for ct action
is called, before handling the ct post parsing, it will return an error
since it checks this ct param to indicate multiple ct actions which are
not supported.

Therefore, the can_offload call is removed from the post parsing
handling to prevent such cases.
This is allowed since the first can_offload call will ensure that the
action can be offloaded and the fact the code reached the post parsing
handling already means that the action can be offloaded.

Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release · b069e14f

Committed by Paul Blakey on March 29, 2022

__mlx5_tc_ct_entry_put() queues release of a tuple related to some ct FT.
If that is the last reference to that tuple, the actual deletion of
the tuple can happen after the FT is already destroyed and freed.

Flush the used workqueue before destroying the ct FT.

Fixes: a2173131 ("net/mlx5e: CT: manage the lifetime of the ct entry object")
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: TC, fix decap fallback to uplink when int port not supported · e3fdc71b

Committed by Ariel Levkovich on April 25, 2022

When resolving the decap route device for a tunnel decap rule,
the result may be an OVS internal port device.

Prior to adding the support for internal port offload, such case
would result in using the uplink as the default decap route device
which allowed devices that can't support internal port offload
to offload this decap rule.

This behavior got broken by adding the internal port offload which
will fail in case the device can't support internal port offload.

To restore the old behavior, use the uplink device as the decap
route as before when internal port offload is not supported.

Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: TC, Fix ct_clear overwriting ct action metadata · 087032ee

Committed by Ariel Levkovich on February 23, 2022

ct_clear action is translated to clearing reg_c metadata
which holds ct state and zone information using mod header
actions.
These actions are allocated during the actions parsing, as
part of the flow attributes main mod header action list.

If ct action exists in the rule, the flow's main mod header
is used only in the post action table rule, after the ct tables
which set the ct info in the reg_c as part of the ct actions.

Therefore, if the original rule has a ct_clear action followed
by a ct action, the ct action reg_c setting will be done first and
will be followed by the ct_clear resetting reg_c and overwriting
the ct info.

Fix this by moving the ct_clear mod header actions allocation from
the ct action parsing stage to the ct action post parsing stage where
it is already known if ct_clear is followed by a ct action.
In such case, we skip the mod header actions allocation for the ct
clear since the ct action will write to reg_c anyway after clearing it.

Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: Lag, Don't skip fib events on current dst · 4a2a664e

Committed by Vlad Buslov on April 18, 2022

The referenced change added a check to skip updating fib when the new fib instance
has the same or lower priority. However, the new fib instance can be an update on
the same dst address as the existing one even though the structure is another
instance that has a different address. Ignoring events on such instances
causes multipath LAG state to not be correctly updated.

Track the 'dst' and 'dst_len' fields of the fib event fib_entry_notifier_info
structure and don't skip events that have the same values of those fields.

Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: Lag, Fix fib_info pointer assignment · a6589155

Committed by Vlad Buslov on April 18, 2022

Referenced change incorrectly sets single path fib_info even when LAG is
not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
that verifies LAG state.

Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


net/mlx5e: Lag, Fix use-after-free in fib event handler · 27b0420f

Committed by Vlad Buslov on April 18, 2022

A recent commit that modified the fib route event handler to handle events
according to their priority introduced a use-after-free[0] in mp->mfi pointer
usage. The pointer now is not just cached in order to be compared to
following fib_info instances, but is also dereferenced to obtain
fib_priority. However, since mlx5 lag code doesn't hold the reference to
fib_info during the whole mp->mfi lifetime, it could be used after the fib_info
instance has already been freed by kernel infrastructure code.

Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
type and cache fib_info priority in dedicated integer. Group
fib_info-related data into dedicated 'fib' structure that will be further
extended by following patches in the series.

[0]:

[  203.588029] ==================================================================
[  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138

[  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
[  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
[  203.600053] Call Trace:
[  203.600608]  <TASK>
[  203.601110]  dump_stack_lvl+0x48/0x5e
[  203.601860]  print_address_description.constprop.0+0x1f/0x160
[  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.605177]  kasan_report.cold+0x83/0xdf
[  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
[  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
[  203.609382]  ? read_word_at_a_time+0xe/0x20
[  203.610463]  ? strscpy+0xa0/0x2a0
[  203.611463]  process_one_work+0x722/0x1270
[  203.612344]  worker_thread+0x540/0x11e0
[  203.613136]  ? rescuer_thread+0xd50/0xd50
[  203.613949]  kthread+0x26e/0x300
[  203.614627]  ? kthread_complete_and_exit+0x20/0x20
[  203.615542]  ret_from_fork+0x1f/0x30
[  203.616273]  </TASK>

[  203.617174] Allocated by task 3746:
[  203.617874]  kasan_save_stack+0x1e/0x40
[  203.618644]  __kasan_kmalloc+0x81/0xa0
[  203.619394]  fib_create_info+0xb41/0x3c50
[  203.620213]  fib_table_insert+0x190/0x1ff0
[  203.621020]  fib_magic.isra.0+0x246/0x2e0
[  203.621803]  fib_add_ifaddr+0x19f/0x670
[  203.622563]  fib_inetaddr_event+0x13f/0x270
[  203.623377]  blocking_notifier_call_chain+0xd4/0x130
[  203.624355]  __inet_insert_ifa+0x641/0xb20
[  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
[  203.626009]  rtnetlink_rcv_msg+0x309/0x880
[  203.626826]  netlink_rcv_skb+0x11d/0x340
[  203.627626]  netlink_unicast+0x4cc/0x790
[  203.628430]  netlink_sendmsg+0x762/0xc00
[  203.629230]  sock_sendmsg+0xb2/0xe0
[  203.629955]  ____sys_sendmsg+0x58a/0x770
[  203.630756]  ___sys_sendmsg+0xd8/0x160
[  203.631523]  __sys_sendmsg+0xb7/0x140
[  203.632294]  do_syscall_64+0x35/0x80
[  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  203.634427] Freed by task 0:
[  203.635063]  kasan_save_stack+0x1e/0x40
[  203.635844]  kasan_set_track+0x21/0x30
[  203.636618]  kasan_set_free_info+0x20/0x30
[  203.637450]  __kasan_slab_free+0xfc/0x140
[  203.638271]  kfree+0x94/0x3b0
[  203.638903]  rcu_core+0x5e4/0x1990
[  203.639640]  __do_softirq+0x1ba/0x5d3

[  203.640828] Last potentially related work creation:
[  203.641785]  kasan_save_stack+0x1e/0x40
[  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
[  203.643478]  call_rcu+0x88/0x9c0
[  203.644178]  fib_release_info+0x539/0x750
[  203.644997]  fib_table_delete+0x659/0xb80
[  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
[  203.646617]  fib_del_ifaddr+0x93f/0x1300
[  203.647415]  fib_inetaddr_event+0x9f/0x270
[  203.648251]  blocking_notifier_call_chain+0xd4/0x130
[  203.649225]  __inet_del_ifa+0x474/0xc10
[  203.650016]  devinet_ioctl+0x781/0x17f0
[  203.650788]  inet_ioctl+0x1ad/0x290
[  203.651533]  sock_do_ioctl+0xce/0x1c0
[  203.652315]  sock_ioctl+0x27b/0x4f0
[  203.653058]  __x64_sys_ioctl+0x124/0x190
[  203.653850]  do_syscall_64+0x35/0x80
[  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae

[  203.666952] The buggy address belongs to the object at ffff888144df2000
                which belongs to the cache kmalloc-256 of size 256
[  203.669250] The buggy address is located 80 bytes inside of
                256-byte region [ffff888144df2000, ffff888144df2100)
[  203.671332] The buggy address belongs to the page:
[  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
[  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
[  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
[  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
[  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
[  203.679928] page dumped because: kasan: bad access detected

[  203.681455] Memory state around the buggy address:
[  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  203.686701]                                                  ^
[  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  203.690620] ==================================================================

Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
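
The fix described in this commit (keep the pointer only as an identity token and cache the one field that is needed) can be sketched in userspace C. This is not the mlx5 code: `struct fib_entry`, `struct lag_fib_cache`, `cache_update()` and `should_skip_event()` are hypothetical, and the skip rule shown (ignore events from a different entry whose priority value is not numerically lower than the cached one) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the object whose lifetime we do not control. */
struct fib_entry {
    uint32_t priority;
};

struct lag_fib_cache {
    const void *mfi;      /* identity only; never dereferenced */
    uint32_t priority;    /* copied while the object was known to be valid */
};

/* Record the entry's identity and copy the field needed later. */
static void cache_update(struct lag_fib_cache *c, const struct fib_entry *fi)
{
    c->mfi = fi;
    c->priority = fi ? fi->priority : 0;
}

/*
 * Decide whether an event for "fi" can be ignored. Only the incoming,
 * currently valid entry is dereferenced; the cached pointer is compared
 * by value and the cached priority is read from our own copy.
 */
static bool should_skip_event(const struct lag_fib_cache *c,
                              const struct fib_entry *fi)
{
    return c->mfi != NULL && c->mfi != fi && fi->priority >= c->priority;
}
```

Typing the cached pointer as `const void *` makes an accidental dereference a compile error, which is the point of the refactoring the message describes.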


net/mlx5e: Fix the calling of update_buffer_lossy() API · c4d963a5

Committed by Mark Zhang on April 06, 2022

The arguments of update_buffer_lossy() are in the wrong order. Fix it.

Fixes: 88b3d5c9 ("net/mlx5e: Fix port buffers cell size value")
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>


openeuler / Kernel · synced successfully more than 1 year ago
