Commits · 453f8305483851c20a41b66719d5acdc945541ca · openeuler / Kernel

09 Feb, 2022 3 commits

i40e: Add a stat tracking new RX page allocations · 453f8305

Committed by Joe Damato on Dec 17, 2021

Add a counter for new page allocations in the i40e RX path. This stat is
accessible with ethtool.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Aggregate and export RX page reuse stat · b3936d27

Committed by Joe Damato on Dec 17, 2021

RX page reuse was already being tracked by the i40e driver per RX ring.
Aggregate the counts and make them accessible via ethtool.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Remove rx page reuse double count · 89bb0983

Committed by Joe Damato on Dec 17, 2021

Page reuse was being tracked from two locations:
  - i40e_reuse_rx_page (via i40e_clean_rx_irq), and
  - i40e_alloc_mapped_page

Remove the double count and only count reuse from i40e_alloc_mapped_page
when the page is about to be reused.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

08 Feb, 2022 2 commits

net: stmmac: optimize locking around PTP clock reads · 642436a1

Committed by Yannick Vignon on Feb 04, 2022

Reading the PTP clock is a simple operation requiring only 3 register
reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
counter-productive: if the 2nd task preempting the 1st has a higher prio
but needs to read time as well, it will require 2 context switches, which
will pretty much always be more costly than just disabling preemption for
the duration of the reads. Moreover, with the code logic recently added
to get_systime(), disabling preemption is not even required anymore:
reads and writes just need to be protected from each other, to prevent a
clock read while the clock is being updated.

Improve the above situation by replacing the PTP spinlock with a rwlock, and
using read_lock for PTP clock reads so simultaneous reads do not block
each other.

Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Link: https://lore.kernel.org/r/20220204135545.2770625-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: typhoon: include <net/vxlan.h> · d1d5bd64

Committed by Eric Dumazet on Feb 07, 2022

We need this to get the vxlan_features_check() definition.

Fixes: d2692eee ("net: typhoon: implement ndo_features_check method")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220208003502.1799728-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

07 Feb, 2022 5 commits

net: hns3: add support for TX push mode · 87a9b2fd

Committed by Yufeng Mo on Feb 07, 2022

For a device that supports the TX push capability, the BD can
be copied directly to the device memory. However, due to hardware
restrictions, the push mode can be used only when there are no
more than two BDs; otherwise, the doorbell mode based on device
memory is used.

Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: factor out redundant RTL8168d PHY config functionality to rtl8168d_1_common() · b845bac8

Committed by Heiner Kallweit on Feb 06, 2022

rtl8168d_2_hw_phy_config() shares quite a lot of functionality with
rtl8168d_1_hw_phy_config(), so let's factor out the common part to a
new function rtl8168d_1_common(). In addition, improve the code a little.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv6 addresses · 463e1ab8

Committed by Danielle Ratson on Feb 06, 2022

Spectrum-2 supports an ACL action SIP_DIP, which allows changing IPv4 and
IPv6 source and destination addresses. Offload suitable mangles to
the IPv6 address change action.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv4 addresses · d7809b62

Committed by Danielle Ratson on Feb 06, 2022

Spectrum-2 supports an ACL action SIP_DIP, which allows changing IPv4 and
IPv6 source and destination addresses. Offload suitable mangles to
the IPv4 address change action.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core_acl_flex_actions: Add SIP_DIP_ACTION · e3541022

Committed by Danielle Ratson on Feb 06, 2022

Add fields related to SIP_DIP_ACTION, which is used for changing SIP
and DIP addresses.

Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Feb, 2022 9 commits

net: typhoon: implement ndo_features_check method · d2692eee

Committed by Eric Dumazet on Feb 04, 2022

Instead of disabling TSO at compile time if MAX_SKB_FRAGS > 32,
implement the ndo_features_check() method for this driver for
more dynamic handling.

If an skb has more than 32 frags and is a GSO packet, force
software segmentation.

Most locally generated packets will use a small number
of fragments anyway.

For forwarding workloads, we can limit gro_max_size at ingress;
we might also implement gro_max_segs if needed.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2692eee
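
The per-packet decision described in the commit above can be modelled in user space roughly as follows. This is only a sketch: `struct pkt_info`, `features_check()` and the `FEATURE_GSO_MASK` bit pattern are hypothetical stand-ins, not the driver's types or the kernel's feature flags. Only the 32-fragment limit comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_HW_FRAGS      32u          /* limit quoted in the commit message */
#define FEATURE_GSO_MASK  0x00ff0000u  /* hypothetical bit mask for GSO features */

/* Hypothetical stand-in for the packet metadata the check needs. */
struct pkt_info {
	unsigned int nr_frags;
	bool is_gso;
};

/*
 * Return the feature set to use for this packet: if a GSO packet has
 * more fragments than the hardware can take, clear the GSO bits so the
 * stack segments it in software instead.
 */
static uint32_t features_check(const struct pkt_info *pkt, uint32_t features)
{
	if (pkt->nr_frags > MAX_HW_FRAGS && pkt->is_gso)
		features &= ~FEATURE_GSO_MASK;
	return features;
}
```

Because the check runs per packet, packets with few fragments keep hardware segmentation, which a compile-time switch could not offer.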

net: sundance: Replace one-element array with non-array object · 5f215513

Committed by Gustavo A. R. Silva on Feb 04, 2022

It seems this one-element array is not actually being used as an
array of variable size, so we can replace it with a non-array object
of type struct desc_frag and refactor the rest of the code a bit.

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

This issue was found with the help of Coccinelle and audited and fixed
manually.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Replace one-element array with flexible-array member · 76ad950c

Committed by Gustavo A. R. Silva on Feb 04, 2022

There is a regular need in the kernel to provide a way to declare having
a dynamically sized set of trailing elements in a structure. Kernel code
should always use “flexible array members”[1] for these cases. The older
style of one-element or zero-length arrays should no longer be used[2].

This helps with the ongoing efforts to globally enable -Warray-bounds
and get us closer to being able to tighten the FORTIFY_SOURCE routines
on memcpy().

This issue was found with the help of Coccinelle and audited and fixed
manually.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

76ad950c
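
To make the difference between the two styles in the commit above concrete, here is a small user-space sketch. The struct names and the `msg_new_alloc()` helper are hypothetical and unrelated to the bnx2x structures. The sketch also skips the overflow checking that kernel code would normally add to the size computation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Deprecated style: a one-element array standing in for a variable-length tail. */
struct msg_old {
	size_t count;
	int items[1];
};

/* Preferred style: a C99 flexible array member. */
struct msg_new {
	size_t count;
	int items[];
};

/*
 * Allocate a msg_new with room for n trailing items (NULL on failure).
 * sizeof(*m) does not include the flexible array, so no "n - 1"
 * correction is needed, unlike with the one-element style.
 */
static struct msg_new *msg_new_alloc(size_t n)
{
	struct msg_new *m = malloc(sizeof(*m) + n * sizeof(m->items[0]));

	if (m)
		m->count = n;
	return m;
}
```

With the flexible array member the compiler knows the array has no fixed bound, so array-bounds diagnostics can treat accesses to `items` differently from accesses to a true one-element array.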

net: mana: Remove unnecessary check of cqe_type in mana_process_rx_cqe() · 68f83135

Committed by Haiyang Zhang on Feb 04, 2022

The switch statement already ensures cqe_type == CQE_RX_OKAY at that
point.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mana: Add handling of CQE_RX_TRUNCATED · e4b76219

Committed by Haiyang Zhang on Feb 04, 2022

The proper way to drop this kind of CQE is advancing the rxq tail
without indicating the packet to the upper network layer.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sparx5: remove phylink_config.pcs_poll usage · 3682e7b8

Committed by Russell King (Oracle) on Feb 04, 2022

Phylink will use PCS polling whenever phylink_config.pcs_poll or the
phylink_pcs poll member is set. As this driver sets both, remove the
former.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Update mdb when enabling/disabling mcast_snooping · add2c844

Committed by Horatiu Vultur on Feb 04, 2022

When multicast snooping is disabled, the mdb entries should be
removed from the HW, but they still need to be kept in memory so they
can be restored when mcast_snooping is enabled again.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED · 47aeea0d

Committed by Horatiu Vultur on Feb 04, 2022

The callback allows enabling/disabling multicast snooping.
When snooping is enabled, all IGMP and MLD frames are redirected to
the CPU, so make sure not to set the skb flag 'offload_fwd_mark'.
The HW will not flood multicast ipv4/ipv6 data frames.
When snooping is disabled, the HW will flood IGMP, MLD and multicast
ipv4/ipv6 frames according to the mcast_flood flag.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lan966x: Update the PGID used by IPV6 data frames · 1c213f05

Committed by Horatiu Vultur on Feb 04, 2022

When multicast snooping is enabled, the forwarding of IPv6 frames
has its own forwarding mask.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Feb, 2022 8 commits

net: lan966x: use .mac_select_pcs() interface · 41414c9b

Committed by Horatiu Vultur on Feb 02, 2022

Convert lan966x to use the mac_select_pcs() interface instead of
phylink_set_pcs().

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220202114949.833075-1-horatiu.vultur@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: ensure PTP time register reads are consistent · 80d46090

Committed by Yannick Vignon on Feb 03, 2022

Even if protected from preemption and interrupts, a small time window
remains in which the 2 register reads could return inconsistent values
each time the "seconds" register changes. This could lead to an error
of about 1 second in the reported time.

Add logic to ensure the "seconds" and "nanoseconds" values are consistent.

Fixes: 92ba6888 ("stmmac: add the support for PTP hw clock driver")
Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

80d46090
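
One common way to obtain a consistent pair from two separate registers, in the spirit of the commit above, is to re-read the "seconds" register after reading "nanoseconds" and retry if it changed. The following user-space sketch illustrates that idea. The accessor struct, the fake clock and all function names are hypothetical, and a real driver would use MMIO reads instead of callbacks.

```c
#include <stdint.h>

/* Hypothetical register accessors; a real driver would use MMIO reads. */
struct clk_regs {
	uint32_t (*read_sec)(void *ctx);
	uint32_t (*read_nsec)(void *ctx);
	void *ctx;
};

/* Return the time in nanoseconds, retrying if "seconds" changed mid-read. */
static uint64_t read_time_ns(const struct clk_regs *r)
{
	uint32_t sec0, sec1, ns;

	sec1 = r->read_sec(r->ctx);
	do {
		sec0 = sec1;
		ns = r->read_nsec(r->ctx);
		sec1 = r->read_sec(r->ctx);
	} while (sec0 != sec1);

	return (uint64_t)sec1 * 1000000000ull + ns;
}

/* Simulated clock: "seconds" rolls over right after its first read. */
struct fake_clock {
	uint32_t sec;
	uint32_t nsec;
	int rollover_pending;
};

static uint32_t fake_read_sec(void *ctx)
{
	struct fake_clock *c = ctx;
	uint32_t s = c->sec;

	if (c->rollover_pending) {	/* emulate the nanoseconds wrap */
		c->rollover_pending = 0;
		c->sec += 1;
		c->nsec = 5;
	}
	return s;
}

static uint32_t fake_read_nsec(void *ctx)
{
	return ((struct fake_clock *)ctx)->nsec;
}
```

With the fake clock starting at 10 s and 999999999 ns, a naive "seconds then nanoseconds" read would combine the old seconds value with the new nanoseconds value and be off by almost a full second. The retry loop notices the changed seconds value and reads again.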

i40e: Fix race condition while adding/deleting MAC/VLAN filters · 53a9e346

Committed by Jedrzej Jagielski on Jan 14, 2022

There was a race condition in access to hw->aq.asq_last_status
while adding and deleting MAC/VLAN filters, causing an
incorrect error status to be printed as ERROR OK instead of
the correct error.

Change calls to i40e_aq_add_macvlan in i40e_aqc_add_filters
and i40e_aq_remove_macvlan in i40e_aqc_del_filters
to _v2 versions that return Admin Queue status on the stack
to avoid race conditions in access to hw->aq.asq_last_status.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Add new version of i40e_aq_add_macvlan function · b3237df9

Committed by Jedrzej Jagielski on Jan 14, 2022

ASQ send command functions return only i40e status codes,
yet some calling functions also need the Admin Queue status
that is stored in hw->aq.asq_last_status. Since the hw object
is stored on the heap, this introduces the possibility of
a race condition in access to hw if the calling function is not
fast enough to read hw->aq.asq_last_status before the next
send ASQ command is executed.

Add a new _v2 version of i40e_aq_add_macvlan that uses the
new _v2 versions of the ASQ send command functions and returns
the Admin Queue status on the stack.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

b3237df9
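
The pattern behind the _v2 functions in the commit above, handing the status back through a caller-owned variable instead of a shared field, can be sketched as follows. Everything here is hypothetical (`struct hw_state`, `send_cmd()`, `send_cmd_v2()`, the status values) and single-threaded. It only illustrates why a caller-local copy cannot be overwritten by a later command.

```c
#include <stddef.h>

/* Hypothetical shared device state, loosely analogous to a "last status" field. */
struct hw_state {
	int last_status;	/* overwritten by every command */
};

/*
 * Old pattern: the status is only left behind in shared state, so the
 * caller must read hw->last_status before any other command runs.
 * simulated_status stands in for what the firmware would report.
 */
static int send_cmd(struct hw_state *hw, int simulated_status)
{
	hw->last_status = simulated_status;
	return simulated_status ? -1 : 0;
}

/* _v2 pattern: also copy the status into a variable owned by the caller. */
static int send_cmd_v2(struct hw_state *hw, int simulated_status,
		       int *aq_status)
{
	int ret = send_cmd(hw, simulated_status);

	if (aq_status)
		*aq_status = simulated_status;
	return ret;
}
```

In real code the copy would be taken while the command lock is still held. The sketch leaves locking out.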

i40e: Add new versions of send ASQ command functions · 74073848

Committed by Jedrzej Jagielski on Jan 14, 2022

ASQ send command functions return only i40e status codes,
yet some calling functions also need the Admin Queue status
that is stored in hw->aq.asq_last_status. Since the hw object
is stored on the heap, this introduces the possibility of
a race condition in access to hw if the calling function is not
fast enough to read hw->aq.asq_last_status before the next
send ASQ command is executed.

Add new versions of the send ASQ command functions that return
the Admin Queue status on the stack to avoid race conditions
in access to hw->aq.asq_last_status.
Add a new _v2 version of i40e_aq_remove_macvlan that uses the
new _v2 versions of the ASQ send command functions and returns
the Admin Queue status on the stack.

Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Add sending commands in atomic context · 59b3d735

Committed by Jedrzej Jagielski on Jan 14, 2022

Change functions:
- i40e_aq_add_macvlan
- i40e_aq_remove_macvlan
- i40e_aq_delete_element
- i40e_aq_add_vsi
- i40e_aq_update_vsi_params
to explicitly use i40e_asq_send_command_atomic(..., true)
instead of i40e_asq_send_command, as they use mutexes and do some
work in an atomic context.
Without this change, setting a VLAN via netdev will fail with a
call trace caused by the bug "BUG: scheduling while atomic".

Signed-off-by: Witold Fijalkowski <witoldx.fijalkowski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Remove unused RX realloc stat · 79f227c4

Committed by Joe Damato on Jan 05, 2022

After commit 1a557afc ("i40e: Refactor receive routine"),
rx_stats.realloc_count is no longer being incremented, so remove it.

The debugfs string was left, but hardcoded to 0. This is intended to
prevent breaking any existing code / scripts that are parsing debugfs
for i40e.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

i40e: Disable hw-tc-offload feature on driver load · 647c65e1

Committed by Mateusz Palczewski on Dec 16, 2021

After the driver is loaded, hw-tc-offload is enabled by default.
Change the behaviour of the driver to disable hw-tc-offload by default, as
this is the expected state. Additionally, since this impacts the ntuple
feature state, change the way the NETIF_F_HW_TC flag is checked.

Signed-off-by: Norbert Zulinski <norbertx.zulinski@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

03 Feb, 2022 2 commits

net: sparx5: do not refer to skb after passing it on · 81eb8b0b

Committed by Steen Hegelund on Feb 02, 2022

Do not try to use any SKB fields after the packet has been passed up in the
receive stack.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Link: https://lore.kernel.org/r/20220202083039.3774851-1-steen.hegelund@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

81eb8b0b
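
The rule in the commit above is an ownership rule: once a packet is handed to the next layer, the sender no longer owns it. One way to respect that, shown here as a user-space sketch with hypothetical types (`struct packet`, `struct rx_stats`) and a `pass_up()` stand-in that simply frees the packet, is to copy the needed fields first. Updating the statistics before the hand-over would work equally well, and the sketch does not claim to mirror the driver's actual change.

```c
#include <stddef.h>
#include <stdlib.h>

struct packet { size_t len; };	/* hypothetical stand-in for an skb */
struct rx_stats { unsigned long packets, bytes; };

/*
 * Stand-in for handing the packet to the stack: ownership transfers,
 * and the packet may already be freed when this returns.
 */
static void pass_up(struct packet *p)
{
	free(p);
}

static void receive(struct packet *p, struct rx_stats *st)
{
	size_t len = p->len;	/* copy what we need while we still own p */

	pass_up(p);		/* p must not be touched after this point */
	st->packets++;
	st->bytes += len;
}
```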

drivers: net: Replace acpi_bus_get_device() · 52dae93f

Committed by Rafael J. Wysocki on Feb 02, 2022

Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().

No intentional functional impact.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/r/11918902.O9o76ZdvQC@kreacher
Link: https://lore.kernel.org/r/11920660.O9o76ZdvQC@kreacher
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

02 Feb, 2022 11 commits

net/mlx5e: Avoid field-overflowing memcpy() · ad518573

Committed by Kees Cook on Jan 24, 2022

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Use flexible arrays instead of zero-element arrays (which look like they
are always overflowing) and split the cross-field memcpy() into two halves
that can be appropriately bounds-checked by the compiler.

We were doing:

	#define ETH_HLEN  14
	#define VLAN_HLEN  4
	...
	#define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
	...
        struct mlx5e_tx_wqe      *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
	...
        struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
        struct mlx5_wqe_data_seg *dseg = wqe->data;
	...
	memcpy(eseg->inline_hdr.start, xdptxd->data, MLX5E_XDP_MIN_INLINE);

The target is wqe->eth.inline_hdr.start (which the compiler sees as being
2 bytes in size), but the copy is 18 bytes, intending to write across start
(really vlan_tci, 2 bytes). The remaining 16 bytes get written into
wqe->data[0], covering byte_count (4 bytes), lkey (4 bytes), and addr
(8 bytes).

struct mlx5e_tx_wqe {
        struct mlx5_wqe_ctrl_seg   ctrl;                 /*     0    16 */
        struct mlx5_wqe_eth_seg    eth;                  /*    16    16 */
        struct mlx5_wqe_data_seg   data[];               /*    32     0 */

        /* size: 32, cachelines: 1, members: 3 */
        /* last cacheline: 32 bytes */
};

struct mlx5_wqe_eth_seg {
        u8                         swp_outer_l4_offset;  /*     0     1 */
        u8                         swp_outer_l3_offset;  /*     1     1 */
        u8                         swp_inner_l4_offset;  /*     2     1 */
        u8                         swp_inner_l3_offset;  /*     3     1 */
        u8                         cs_flags;             /*     4     1 */
        u8                         swp_flags;            /*     5     1 */
        __be16                     mss;                  /*     6     2 */
        __be32                     flow_table_metadata;  /*     8     4 */
        union {
                struct {
                        __be16     sz;                   /*    12     2 */
                        u8         start[2];             /*    14     2 */
                } inline_hdr;                            /*    12     4 */
                struct {
                        __be16     type;                 /*    12     2 */
                        __be16     vlan_tci;             /*    14     2 */
                } insert;                                /*    12     4 */
                __be32             trailer;              /*    12     4 */
        };                                               /*    12     4 */

        /* size: 16, cachelines: 1, members: 9 */
        /* last cacheline: 16 bytes */
};

struct mlx5_wqe_data_seg {
        __be32                     byte_count;           /*     0     4 */
        __be32                     lkey;                 /*     4     4 */
        __be64                     addr;                 /*     8     8 */

        /* size: 16, cachelines: 1, members: 3 */
        /* last cacheline: 16 bytes */
};

So, split the memcpy() so the compiler can reason about the buffer
sizes.

"pahole" shows no size nor member offset changes to struct mlx5e_tx_wqe
nor struct mlx5e_umr_wqe. "objdump -d" shows no meaningful object
code changes (i.e. only source line number induced differences and
optimizations).

Fixes: b5503b99 ("net/mlx5e: XDP TX forwarding support")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

ad518573
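
The split described in the commit above can be illustrated with simplified stand-in structures. This is a sketch only: `struct eth_seg`, `struct data_seg`, `struct wqe` and `copy_inline_hdr()` are hypothetical reductions of the layouts quoted in the commit message. It also uses a single fixed data segment instead of a flexible `data[]` array so that it can live on the stack.

```c
#include <stdint.h>
#include <string.h>

#define HDR_BYTES 18	/* ETH_HLEN + VLAN_HLEN from the commit message */

struct eth_seg {
	uint8_t other[14];
	uint8_t start[2];	/* first 2 inline header bytes */
};

struct data_seg {
	uint8_t bytes[16];
};

struct wqe {
	struct eth_seg eth;
	struct data_seg data;	/* simplified: one segment, not data[] */
};

/*
 * Copy an 18-byte header as two field-sized copies instead of one
 * memcpy() that deliberately runs off the end of eth.start.
 */
static void copy_inline_hdr(struct wqe *w, const uint8_t *hdr)
{
	memcpy(w->eth.start, hdr, sizeof(w->eth.start));
	memcpy(w->data.bytes, hdr + sizeof(w->eth.start),
	       HDR_BYTES - sizeof(w->eth.start));
}
```

Each `memcpy()` now has a destination whose size the compiler can see, so a bounds checker can verify both copies without the bytes written changing.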

net/mlx5e: Use struct_group() for memcpy() region · 6d5c900e

Committed by Kees Cook on Jan 24, 2022

In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
intentionally writing across neighboring fields.

Use struct_group() in struct vlan_ethhdr around members h_dest and
h_source, so they can be referenced together. This will allow memcpy()
and sizeof() to more easily reason about sizes, improve readability,
and avoid future warnings about writing beyond the end of h_dest.

"pahole" shows no size nor member offset changes to struct vlan_ethhdr.
"objdump -d" shows no object code changes.

Fixes: 34802a42 ("net/mlx5e: Do not modify the TX SKB")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

6d5c900e
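
The grouping idea from the commit above can be approximated in plain C11 with an anonymous union of two identically laid out structs, which is roughly what a grouping macro has to produce. The struct below is a hypothetical sketch and not the kernel's `struct vlan_ethhdr`, and the group name `addrs` is chosen here purely for illustration.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

struct vlan_hdr_sketch {
	union {
		struct {	/* members stay individually addressable */
			uint8_t h_dest[ETH_ALEN];
			uint8_t h_source[ETH_ALEN];
		};
		struct {	/* same layout, addressable as one group */
			uint8_t h_dest[ETH_ALEN];
			uint8_t h_source[ETH_ALEN];
		} addrs;
	};
	uint16_t h_vlan_proto;
	uint16_t h_vlan_TCI;
};

/* Copy both MAC addresses with one memcpy() whose size matches its target. */
static void copy_addrs(struct vlan_hdr_sketch *dst, const uint8_t *src)
{
	memcpy(&dst->addrs, src, sizeof(dst->addrs));
}
```

The destination of the copy is now a 12-byte object rather than the 6-byte `h_dest`, so `sizeof()` and bounds checks agree with the intent of the code.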

net/mlx5e: Avoid implicit modify hdr for decap drop rule · 5b209d1a

Committed by Roi Dayan on Feb 01, 2022

Currently the driver adds an implicit modify hdr action for
decap rules on tunnel devices if the port is an ovs port.
This is also done when the action is drop, where the modify
hdr is redundant; the FW also doesn't support it and generates
a syndrome.

kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)

Fix it by adding the implicit modify hdr only for fwd actions.

Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
Fixes: 077cdda7 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic · de47db0c

Committed by Raed Salem on Dec 02, 2021

The IPsec tunnel mode crypto offload software parser (SWP) setting in the
data path currently always sets the inner L4 offset, regardless of the
encapsulated L4 header type and of whether such a header exists in the
first place. This breaks non-TCP/UDP traffic.

Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4
header protocol is TCP/UDP.

While at it fix inner ip protocol read for setting MLX5_ETH_WQE_SWP_INNER_L4_UDP
flag to address the case where the ip header protocol is IPv6.

Fixes: f1267798 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic · 5352859b

Committed by Raed Salem on Dec 02, 2021

IPsec crypto offload always sets the ethernet segment checksum flags with
the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
packets, regardless of the encapsulated L4 header type and even if no such
header exists in the first place. This breaks non-TCP/UDP traffic.

Set the inner L4 checksum flag only when the encapsulated L4 header
protocol is TCP/UDP using software parser swp_inner_l4_offset field as
indication.

Fixes: 5cfb540e ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Maor Dickman <maord@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Don't treat small ceil values as unlimited in HTB offload · 736dfe4e

Committed by Maxim Mikityanskiy on Jan 18, 2022

The hardware spec defines max_average_bw == 0 as "unlimited bandwidth".
max_average_bw is calculated as `ceil / BYTES_IN_MBIT`, which can become
0 when ceil is small, leading to an undesired effect of having no
bandwidth limit.

This commit fixes it by rounding up small values of ceil to 1 Mbit/s.

Fixes: 214baf22 ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

736dfe4e
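
The arithmetic behind the fix above is small enough to show directly. In this sketch the function name is hypothetical, and the value 125000 for `BYTES_IN_MBIT` is an assumption based on 1 Mbit/s being 125000 bytes per second with `ceil` given in bytes per second.

```c
#include <stdint.h>

#define BYTES_IN_MBIT 125000ull	/* assumed: 1 Mbit/s in bytes per second */

/*
 * Convert a ceil rate in bytes per second to Mbit/s. Integer division
 * truncates small rates to 0, which the hardware would read as
 * "unlimited", so clamp the result to at least 1.
 */
static uint32_t ceil_to_max_average_bw(uint64_t ceil_bytes_per_sec)
{
	uint64_t mbit = ceil_bytes_per_sec / BYTES_IN_MBIT;

	return mbit ? (uint32_t)mbit : 1;
}
```

For example, a ceil of 1000 bytes per second divides down to 0 Mbit/s and is clamped to 1, while 250000 bytes per second maps to 2 as before.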

net/mlx5: E-Switch, Fix uninitialized variable modact · d8e5883d

Committed by Maor Dickman on Jan 30, 2022

The variable modact is not initialized before being used in the modify
header allocation command, which can cause the command to fail.

Fix by initializing modact with zeros.

Addresses-Coverity: ("Uninitialized scalar variable")
Fixes: 8f1e0b97 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix handling of wrong devices during bond netevent · ec41332e

Committed by Maor Dickman on Jan 13, 2022

The current implementation of the bond netevent handler only checks
whether the handled netdev is a VF representor; it is missing a check
that the VF representor is on the same phys device as the bond handling
the netevent.

Fix by adding the missing check and optimizing the check of whether
the netdev is a VF representor, so it will not access uninitialized
private data and crash.

BUG: kernel NULL pointer dereference, address: 000000000000036c
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
Workqueue: eth3bond0 bond_mii_monitor [bonding]
RIP: 0010:mlx5e_is_uplink_rep+0xc/0x50 [mlx5_core]
RSP: 0018:ffff88812d69fd60 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8881cf800000 RCX: 0000000000000000
RDX: ffff88812d69fe10 RSI: 000000000000001b RDI: ffff8881cf800880
RBP: ffff8881cf800000 R08: 00000445cabccf2b R09: 0000000000000008
R10: 0000000000000004 R11: 0000000000000008 R12: ffff88812d69fe10
R13: 00000000fffffffe R14: ffff88820c0f9000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88846fb00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000036c CR3: 0000000103d80006 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mlx5e_eswitch_uplink_rep+0x31/0x40 [mlx5_core]
 mlx5e_rep_is_lag_netdev+0x94/0xc0 [mlx5_core]
 mlx5e_rep_esw_bond_netevent+0xeb/0x3d0 [mlx5_core]
 raw_notifier_call_chain+0x41/0x60
 call_netdevice_notifiers_info+0x34/0x80
 netdev_lower_state_changed+0x4e/0xa0
 bond_mii_monitor+0x56b/0x640 [bonding]
 process_one_work+0x1b9/0x390
 worker_thread+0x4d/0x3d0
 ? rescuer_thread+0x350/0x350
 kthread+0x124/0x150
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x1f/0x30

Fixes: 7e51891a ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Maor Dickman <maord@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix broken SKB allocation in HW-GRO · 7957837b

Committed by Khalid Manaa on Jan 26, 2022

If the HW doesn't perform header-data split, it writes the whole
packet into the data buffer in the WQ. In this case the SHAMPO CQE handler
can't use the header entry to build the SKB; instead it should allocate
new memory to build the SKB using the function
mlx5e_skb_from_cqe_mpwrq_nonlinear.

Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

net/mlx5e: Fix wrong calculation of header index in HW_GRO · b8d91145

Committed by Khalid Manaa on Jan 26, 2022

The HW doesn't wrap the CQE.shampo.header_index field according to the
headers buffer size, instead it always increases it until reaching overflow
of u16 size.

Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the
CQE header_index field to find the actual header index in the headers buffer.

Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>

b8d91145
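
Mapping a free-running 16-bit counter onto a ring of entries, as the commit above requires, is typically done with a mask. The sketch below assumes that the number of entries is a power of two. That assumption and the function name are not taken from the commit message.

```c
#include <stdint.h>

/*
 * Map a free-running 16-bit hardware counter onto a buffer with
 * nr_entries slots. Valid only when nr_entries is a power of two.
 */
static uint16_t header_slot(uint16_t hw_index, uint16_t nr_entries)
{
	return hw_index & (uint16_t)(nr_entries - 1);
}
```

With 1024 entries, a counter value of 1029 maps to slot 5, and 65535 maps to slot 1023 just before the counter itself wraps to 0.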

net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion · 880b5176

Committed by Roi Dayan on Jan 24, 2022

When changing mode to switchdev, rep bridge init registered to netdevice
notifier holds the devlink lock and then takes pernet_ops_rwsem.
At that time deleting a netns holds pernet_ops_rwsem and then takes
the devlink lock.

Example sequence is:
$ ip netns add foo
$ devlink dev eswitch set pci/0000:00:08.0 mode switchdev &
$ ip netns del foo

deleting netns trace:

[ 1185.365555]  ? devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.368331]  ? mutex_lock_io_nested+0x13f0/0x13f0
[ 1185.370984]  ? xt_find_table+0x40/0x100
[ 1185.373244]  ? __mutex_lock+0x24a/0x15a0
[ 1185.375494]  ? net_generic+0xa0/0x1c0
[ 1185.376844]  ? wait_for_completion_io+0x280/0x280
[ 1185.377767]  ? devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.378686]  devlink_pernet_pre_exit+0x74/0x1c0
[ 1185.379579]  ? devlink_nl_cmd_get_dumpit+0x3a0/0x3a0
[ 1185.380557]  ? xt_find_table+0xda/0x100
[ 1185.381367]  cleanup_net+0x372/0x8e0

changing mode to switchdev trace:

[ 1185.411267]  down_write+0x13a/0x150
[ 1185.412029]  ? down_write_killable+0x180/0x180
[ 1185.413005]  register_netdevice_notifier+0x1e/0x210
[ 1185.414000]  mlx5e_rep_bridge_init+0x181/0x360 [mlx5_core]
[ 1185.415243]  mlx5e_uplink_rep_enable+0x269/0x480 [mlx5_core]
[ 1185.416464]  ? mlx5e_uplink_rep_disable+0x210/0x210 [mlx5_core]
[ 1185.417749]  mlx5e_attach_netdev+0x232/0x400 [mlx5_core]
[ 1185.418906]  mlx5e_netdev_attach_profile+0x15b/0x1e0 [mlx5_core]
[ 1185.420172]  mlx5e_netdev_change_profile+0x15a/0x1d0 [mlx5_core]
[ 1185.421459]  mlx5e_vport_rep_load+0x557/0x780 [mlx5_core]
[ 1185.422624]  ? mlx5e_stats_grp_vport_rep_num_stats+0x10/0x10 [mlx5_core]
[ 1185.424006]  mlx5_esw_offloads_rep_load+0xdb/0x190 [mlx5_core]
[ 1185.425277]  esw_offloads_enable+0xd74/0x14a0 [mlx5_core]

Fix this by registering rep bridges for the per-net netdev notifier
instead of the global one, which operates on the net namespace without
holding the pernet_ops_rwsem.

Fixes: 19e9bfa0 ("net/mlx5: Bridge, add offload infrastructure")
Signed-off-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
