1. 09 Feb, 2022 3 commits
  2. 08 Feb, 2022 2 commits
  3. 07 Feb, 2022 5 commits
  4. 05 Feb, 2022 9 commits
  5. 04 Feb, 2022 8 commits
  6. 03 Feb, 2022 2 commits
  7. 02 Feb, 2022 11 commits
    • net/mlx5e: Avoid field-overflowing memcpy() · ad518573
      Kees Cook committed
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally writing across neighboring fields.
      
      Use flexible arrays instead of zero-element arrays (which look like they
      are always overflowing) and split the cross-field memcpy() into two halves
      that can be appropriately bounds-checked by the compiler.
      
      We were doing:
      
              #define ETH_HLEN  14
              #define VLAN_HLEN  4
              ...
              #define MLX5E_XDP_MIN_INLINE (ETH_HLEN + VLAN_HLEN)
              ...
              struct mlx5e_tx_wqe      *wqe  = mlx5_wq_cyc_get_wqe(wq, pi);
              ...
              struct mlx5_wqe_eth_seg  *eseg = &wqe->eth;
              struct mlx5_wqe_data_seg *dseg = wqe->data;
              ...
              memcpy(eseg->inline_hdr.start, xdptxd->data, MLX5E_XDP_MIN_INLINE);
      
      The target is wqe->eth.inline_hdr.start (which the compiler sees as being
      2 bytes in size), but the copy is 18 bytes long and intentionally writes
      past start (which overlays vlan_tci, 2 bytes). The remaining 16 bytes are
      written into wqe->data[0], covering byte_count (4 bytes), lkey (4 bytes),
      and addr (8 bytes).
      
      struct mlx5e_tx_wqe {
              struct mlx5_wqe_ctrl_seg   ctrl;                 /*     0    16 */
              struct mlx5_wqe_eth_seg    eth;                  /*    16    16 */
              struct mlx5_wqe_data_seg   data[];               /*    32     0 */
      
              /* size: 32, cachelines: 1, members: 3 */
              /* last cacheline: 32 bytes */
      };
      
      struct mlx5_wqe_eth_seg {
              u8                         swp_outer_l4_offset;  /*     0     1 */
              u8                         swp_outer_l3_offset;  /*     1     1 */
              u8                         swp_inner_l4_offset;  /*     2     1 */
              u8                         swp_inner_l3_offset;  /*     3     1 */
              u8                         cs_flags;             /*     4     1 */
              u8                         swp_flags;            /*     5     1 */
              __be16                     mss;                  /*     6     2 */
              __be32                     flow_table_metadata;  /*     8     4 */
              union {
                      struct {
                              __be16     sz;                   /*    12     2 */
                              u8         start[2];             /*    14     2 */
                      } inline_hdr;                            /*    12     4 */
                      struct {
                              __be16     type;                 /*    12     2 */
                              __be16     vlan_tci;             /*    14     2 */
                      } insert;                                /*    12     4 */
                      __be32             trailer;              /*    12     4 */
              };                                               /*    12     4 */
      
              /* size: 16, cachelines: 1, members: 9 */
              /* last cacheline: 16 bytes */
      };
      
      struct mlx5_wqe_data_seg {
              __be32                     byte_count;           /*     0     4 */
              __be32                     lkey;                 /*     4     4 */
              __be64                     addr;                 /*     8     8 */
      
              /* size: 16, cachelines: 1, members: 3 */
              /* last cacheline: 16 bytes */
      };
      
      So, split the memcpy() so the compiler can reason about the buffer
      sizes.
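
The split described above can be sketched in userspace C. This is only an illustration under stated assumptions: the `demo_*` types are simplified stand-ins for the mlx5 structures shown above (padding replaces the leading fields, and a one-element array replaces the flexible array), and `demo_copy_inline()` is a hypothetical helper, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN   14
#define VLAN_HLEN  4
#define MIN_INLINE (ETH_HLEN + VLAN_HLEN)   /* 18 bytes, like MLX5E_XDP_MIN_INLINE */

/* Simplified stand-in for the eth segment: only the tail matters here. */
struct demo_eth_seg {
    uint8_t  pad[12];
    uint16_t sz;          /* offset 12, 2 bytes */
    uint8_t  start[2];    /* offset 14, 2 bytes */
};

/* Same field sizes as the data segment shown above (4 + 4 + 8 bytes). */
struct demo_data_seg {
    uint32_t byte_count;
    uint32_t lkey;
    uint64_t addr;
};

struct demo_wqe {
    struct demo_eth_seg  eth;       /* 16 bytes */
    struct demo_data_seg data[1];   /* starts right after eth */
};

_Static_assert(offsetof(struct demo_wqe, data) == 16,
               "data must directly follow the eth segment");

/* Copy MIN_INLINE bytes in two halves, each bounded by its own destination. */
static void demo_copy_inline(struct demo_wqe *wqe, const uint8_t *src)
{
    const size_t first = sizeof(wqe->eth.start);   /* 2 bytes */

    /* The remainder must fit exactly into one data segment (16 bytes). */
    assert(MIN_INLINE - first == sizeof(wqe->data[0]));

    memcpy(wqe->eth.start, src, first);
    memcpy(&wqe->data[0], src + first, MIN_INLINE - first);
}
```

Each memcpy() now has a destination whose size the compiler can see (2 bytes and 16 bytes), while the bytes land in the same places as with the single 18-byte copy.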
      
      "pahole" shows no size nor member offset changes to struct mlx5e_tx_wqe
      nor struct mlx5e_umr_wqe. "objdump -d" shows no meaningful object
      code changes (i.e. only source line number induced differences and
      optimizations).
      
      Fixes: b5503b99 ("net/mlx5e: XDP TX forwarding support")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Use struct_group() for memcpy() region · 6d5c900e
      Kees Cook committed
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally writing across neighboring fields.
      
      Use struct_group() in struct vlan_ethhdr around members h_dest and
      h_source, so they can be referenced together. This will allow memcpy()
      and sizeof() to more easily reason about sizes, improve readability,
      and avoid future warnings about writing beyond the end of h_dest.
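
As a rough illustration of the idea, the sketch below imitates struct_group() in userspace C11. The `demo_struct_group()` macro is a simplified imitation (the kernel macro has more variants and attributes), and the group name `addrs`, the `demo_vlan_ethhdr` layout, and `demo_copy_addrs()` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/*
 * Simplified imitation of struct_group(): the members are declared twice
 * inside an anonymous union, once anonymously and once under NAME, so both
 * p->h_dest and p->NAME refer to the same storage.
 */
#define demo_struct_group(NAME, ...)     \
    union {                              \
        struct { __VA_ARGS__ };          \
        struct { __VA_ARGS__ } NAME;     \
    }

struct demo_vlan_ethhdr {
    demo_struct_group(addrs,
        unsigned char h_dest[ETH_ALEN];
        unsigned char h_source[ETH_ALEN];
    );
    uint16_t h_vlan_proto;
    uint16_t h_vlan_TCI;
    uint16_t h_vlan_encapsulated_proto;
};

/* Copy both MAC addresses in one bounded memcpy() through the group. */
static void demo_copy_addrs(struct demo_vlan_ethhdr *vhdr,
                            const unsigned char *eth_addrs)
{
    assert(vhdr && eth_addrs);
    memcpy(&vhdr->addrs, eth_addrs, sizeof(vhdr->addrs));
}
```

The destination of the copy is the 12-byte group rather than the 6-byte h_dest, so sizeof() states the intended length and no write crosses a field boundary from the compiler's point of view.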
      
      "pahole" shows no size nor member offset changes to struct vlan_ethhdr.
      "objdump -d" shows no object code changes.
      
      Fixes: 34802a42 ("net/mlx5e: Do not modify the TX SKB")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Avoid implicit modify hdr for decap drop rule · 5b209d1a
      Roi Dayan committed
      Currently the driver adds an implicit modify hdr action for
      decap rules on tunnel devices if the port is an OVS port.
      This is also done when the action is drop, where the modify
      hdr is redundant. The FW also doesn't support it and generates
      a syndrome:
      
      kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3)
      
      Fix it by adding the implicit modify hdr only for fwd actions.
      
      Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
      Fixes: 077cdda7 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Ariel Levkovich <lariel@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic · de47db0c
      Raed Salem committed
      The IPsec tunnel mode crypto offload software parser (SWP) setting in the
      data path currently always sets the inner L4 offset, regardless of the
      encapsulated L4 header type and of whether such a header exists at all.
      This breaks non-TCP/UDP traffic.
      
      Set the SWP inner L4 offset only when the IPsec tunnel encapsulated L4
      header protocol is TCP/UDP.
      
      While at it, fix the inner IP protocol read used for setting the
      MLX5_ETH_WQE_SWP_INNER_L4_UDP flag, to address the case where the IP
      header is IPv6.
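
A minimal sketch of the rule above, assuming a simplified SWP state: the `demo_swp` struct, `demo_set_inner_swp()`, and the flag bit value are hypothetical and only model the decision, not the driver's actual data path.

```c
#include <assert.h>
#include <stdint.h>

/* IANA protocol numbers. */
#define DEMO_PROTO_ICMP 1
#define DEMO_PROTO_TCP  6
#define DEMO_PROTO_UDP  17

/* Hypothetical bit standing in for MLX5_ETH_WQE_SWP_INNER_L4_UDP. */
#define DEMO_SWP_INNER_L4_UDP 0x1

struct demo_swp {
    uint8_t inner_l3_offset;
    uint8_t inner_l4_offset;   /* 0 means "no inner L4 header reported" */
    uint8_t flags;
};

/* Report an inner L4 offset only when the inner protocol is TCP or UDP. */
static void demo_set_inner_swp(struct demo_swp *swp, uint8_t l3_off,
                               uint8_t l4_off, uint8_t inner_l4_proto)
{
    assert(swp);

    swp->inner_l3_offset = l3_off;
    swp->inner_l4_offset = 0;
    swp->flags = 0;

    if (inner_l4_proto == DEMO_PROTO_UDP) {
        swp->flags |= DEMO_SWP_INNER_L4_UDP;
        swp->inner_l4_offset = l4_off;
    } else if (inner_l4_proto == DEMO_PROTO_TCP) {
        swp->inner_l4_offset = l4_off;
    }
    /* Any other protocol (e.g. ICMP): the inner L4 offset stays unset. */
}
```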
      
      Fixes: f1267798 ("net/mlx5: Fix checksum issue of VXLAN and IPsec crypto offload")
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic · 5352859b
      Raed Salem committed
      IPsec crypto offload always sets the ethernet segment checksum flags with
      the inner L4 header checksum flag enabled for encapsulated IPsec offloaded
      packets, regardless of the encapsulated L4 header type and even if such a
      header doesn't exist at all. This breaks non-TCP/UDP traffic.
      
      Set the inner L4 checksum flag only when the encapsulated L4 header
      protocol is TCP/UDP, using the software parser swp_inner_l4_offset field
      as the indication.
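
The check can be pictured with the short sketch below. It is an assumption-laden illustration: `demo_ipsec_cs_flags()` and the `DEMO_L4_INNER_CSUM` bit value are hypothetical, and the other checksum flags are passed in unchanged as `base_flags`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit standing in for the inner L4 checksum flag. */
#define DEMO_L4_INNER_CSUM 0x20

/*
 * Add the inner L4 checksum flag only when the software parser reported an
 * inner L4 offset, i.e. when the encapsulated header is TCP or UDP.
 */
static uint8_t demo_ipsec_cs_flags(uint8_t base_flags,
                                   uint8_t swp_inner_l4_offset)
{
    assert((base_flags & DEMO_L4_INNER_CSUM) == 0);

    if (swp_inner_l4_offset)
        base_flags |= DEMO_L4_INNER_CSUM;
    return base_flags;
}
```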
      
      Fixes: 5cfb540e ("net/mlx5e: Set IPsec WAs only in IP's non checksum partial case.")
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Don't treat small ceil values as unlimited in HTB offload · 736dfe4e
      Maxim Mikityanskiy committed
      The hardware spec defines max_average_bw == 0 as "unlimited bandwidth".
      max_average_bw is calculated as `ceil / BYTES_IN_MBIT`, which can become
      0 when ceil is small, leading to an undesired effect of having no
      bandwidth limit.
      
      This commit fixes it by rounding up small values of ceil to 1 Mbit/s.
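
The arithmetic can be sketched as follows. The assumptions here are that ceil is given in bytes per second and that BYTES_IN_MBIT is 125000 (1 Mbit = 1,000,000 bits = 125,000 bytes); `demo_htb_max_average_bw()` is a hypothetical helper, not the driver's function.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_IN_MBIT 125000ULL   /* assumed: 1,000,000 bits / 8 */

/*
 * Convert ceil (bytes per second) to Mbit/s, clamping the result to at
 * least 1 so that a small ceil never becomes 0 ("unlimited bandwidth").
 */
static uint32_t demo_htb_max_average_bw(uint64_t ceil_bytes_per_sec)
{
    uint64_t mbit = ceil_bytes_per_sec / BYTES_IN_MBIT;

    if (mbit == 0)
        mbit = 1;
    assert(mbit <= UINT32_MAX);
    return (uint32_t)mbit;
}
```

For example, a ceil of 1000 bytes/s divides down to 0 and is rounded up to 1 Mbit/s, while 250000 bytes/s yields 2 Mbit/s as before.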
      
      Fixes: 214baf22 ("net/mlx5e: Support HTB offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Fix uninitialized variable modact · d8e5883d
      Maor Dickman committed
      The variable modact is not initialized before it is used in the modify
      header allocation command, which can cause the command to fail.
      
      Fix by initializing modact with zeros.
      
      Addresses-Coverity: ("Uninitialized scalar variable")
      Fixes: 8f1e0b97 ("net/mlx5: E-Switch, Mark miss packets with new chain id mapping")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix handling of wrong devices during bond netevent · ec41332e
      Maor Dickman committed
      The current implementation of the bond netevent handler only checks
      whether the handled netdev is a VF representor. It is missing a check
      that the VF representor is on the same physical device as the bond
      handling the netevent.
      
      Fix by adding the missing check and by optimizing the VF representor
      check so that it does not access uninitialized private data and crash.
      
      BUG: kernel NULL pointer dereference, address: 000000000000036c
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP NOPTI
      Workqueue: eth3bond0 bond_mii_monitor [bonding]
      RIP: 0010:mlx5e_is_uplink_rep+0xc/0x50 [mlx5_core]
      RSP: 0018:ffff88812d69fd60 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff8881cf800000 RCX: 0000000000000000
      RDX: ffff88812d69fe10 RSI: 000000000000001b RDI: ffff8881cf800880
      RBP: ffff8881cf800000 R08: 00000445cabccf2b R09: 0000000000000008
      R10: 0000000000000004 R11: 0000000000000008 R12: ffff88812d69fe10
      R13: 00000000fffffffe R14: ffff88820c0f9000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88846fb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000000036c CR3: 0000000103d80006 CR4: 0000000000370ea0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       mlx5e_eswitch_uplink_rep+0x31/0x40 [mlx5_core]
       mlx5e_rep_is_lag_netdev+0x94/0xc0 [mlx5_core]
       mlx5e_rep_esw_bond_netevent+0xeb/0x3d0 [mlx5_core]
       raw_notifier_call_chain+0x41/0x60
       call_netdevice_notifiers_info+0x34/0x80
       netdev_lower_state_changed+0x4e/0xa0
       bond_mii_monitor+0x56b/0x640 [bonding]
       process_one_work+0x1b9/0x390
       worker_thread+0x4d/0x3d0
       ? rescuer_thread+0x350/0x350
       kthread+0x124/0x150
       ? set_kthread_struct+0x40/0x40
       ret_from_fork+0x1f/0x30
      
      Fixes: 7e51891a ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix broken SKB allocation in HW-GRO · 7957837b
      Khalid Manaa committed
      In case the HW doesn't perform header-data split, it writes the whole
      packet into the data buffer in the WQ. In this case the SHAMPO CQE handler
      cannot use the header entry to build the SKB. Instead, it should allocate
      new memory to build the SKB using the function
      mlx5e_skb_from_cqe_mpwrq_nonlinear.
      
      Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
      Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong calculation of header index in HW_GRO · b8d91145
      Khalid Manaa committed
      The HW doesn't wrap the CQE.shampo.header_index field according to the
      headers buffer size. Instead, it keeps incrementing the field until the
      u16 value overflows.
      
      Thus the mlx5e_handle_rx_cqe_mpwrq_shampo handler should mask the
      CQE header_index field to find the actual header index in the headers buffer.
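
A small sketch of the masking step, assuming the number of entries in the headers buffer is a power of two (so that a bit mask is equivalent to a modulo); `demo_header_index()` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map the free-running 16-bit header_index reported in the CQE to a slot
 * in the headers buffer. Assumes hd_per_wq is a non-zero power of two.
 */
static uint16_t demo_header_index(uint16_t cqe_header_index,
                                  uint16_t hd_per_wq)
{
    assert(hd_per_wq != 0 && (hd_per_wq & (hd_per_wq - 1)) == 0);
    return (uint16_t)(cqe_header_index & (hd_per_wq - 1));
}
```

With 1024 entries, a reported index of 1030 maps to slot 6; without the mask it would point past the end of the headers buffer.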
      
      Fixes: f97d5c2a ("net/mlx5e: Add handle SHAMPO cqe support")
      Signed-off-by: Khalid Manaa <khalidm@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, Fix devlink deadlock on net namespace deletion · 880b5176
      Roi Dayan committed
      When changing mode to switchdev, the rep bridge init, which registers a
      netdevice notifier, holds the devlink lock and then takes
      pernet_ops_rwsem. Deleting a netns at the same time holds
      pernet_ops_rwsem and then takes the devlink lock, so the two paths take
      the locks in opposite order and can deadlock.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev eswitch set pci/0000:00:08.0 mode switchdev &
      $ ip netns del foo
      
      deleting netns trace:
      
      [ 1185.365555]  ? devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.368331]  ? mutex_lock_io_nested+0x13f0/0x13f0
      [ 1185.370984]  ? xt_find_table+0x40/0x100
      [ 1185.373244]  ? __mutex_lock+0x24a/0x15a0
      [ 1185.375494]  ? net_generic+0xa0/0x1c0
      [ 1185.376844]  ? wait_for_completion_io+0x280/0x280
      [ 1185.377767]  ? devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.378686]  devlink_pernet_pre_exit+0x74/0x1c0
      [ 1185.379579]  ? devlink_nl_cmd_get_dumpit+0x3a0/0x3a0
      [ 1185.380557]  ? xt_find_table+0xda/0x100
      [ 1185.381367]  cleanup_net+0x372/0x8e0
      
      changing mode to switchdev trace:
      
      [ 1185.411267]  down_write+0x13a/0x150
      [ 1185.412029]  ? down_write_killable+0x180/0x180
      [ 1185.413005]  register_netdevice_notifier+0x1e/0x210
      [ 1185.414000]  mlx5e_rep_bridge_init+0x181/0x360 [mlx5_core]
      [ 1185.415243]  mlx5e_uplink_rep_enable+0x269/0x480 [mlx5_core]
      [ 1185.416464]  ? mlx5e_uplink_rep_disable+0x210/0x210 [mlx5_core]
      [ 1185.417749]  mlx5e_attach_netdev+0x232/0x400 [mlx5_core]
      [ 1185.418906]  mlx5e_netdev_attach_profile+0x15b/0x1e0 [mlx5_core]
      [ 1185.420172]  mlx5e_netdev_change_profile+0x15a/0x1d0 [mlx5_core]
      [ 1185.421459]  mlx5e_vport_rep_load+0x557/0x780 [mlx5_core]
      [ 1185.422624]  ? mlx5e_stats_grp_vport_rep_num_stats+0x10/0x10 [mlx5_core]
      [ 1185.424006]  mlx5_esw_offloads_rep_load+0xdb/0x190 [mlx5_core]
      [ 1185.425277]  esw_offloads_enable+0xd74/0x14a0 [mlx5_core]
      
      Fix this by registering rep bridges with the per-net netdev notifier
      instead of the global one. The per-net notifier operates on the net
      namespace without holding pernet_ops_rwsem.
      
      Fixes: 19e9bfa0 ("net/mlx5: Bridge, add offload infrastructure")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>