1. 06 5月, 2022 15 次提交
  2. 05 5月, 2022 6 次提交
  3. 04 5月, 2022 19 次提交
    • I
      mlxsw: spectrum_router: Only query neighbour activity when necessary · cff94376
      Ido Schimmel 提交于
      The driver periodically queries the device for activity of neighbour
      entries in order to report it to the kernel's neighbour table.
      
      Avoid unnecessary activity query when no neighbours are installed. Use
      an atomic variable to track the number of neighbours, as it is read
      without any locks.
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cff94376
    • I
      mlxsw: spectrum_switchdev: Only query FDB notifications when necessary · b8950003
      Ido Schimmel 提交于
      The driver periodically queries the device for FDB notifications (e.g.,
      learned, aged-out) in order to update the bridge driver. These
      notifications can only be generated when bridges are offloaded to the
      device.
      
      Avoid unnecessary queries by starting to query upon installation of the
      first bridge and stop querying upon removal of the last bridge.
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8950003
    • I
      mlxsw: spectrum_acl: Do not report activity for multicast routes · d1314096
      Ido Schimmel 提交于
      The driver periodically queries the device for activity of ACL rules in
      order to report it to tc upon 'FLOW_CLS_STATS'.
      
      In Spectrum-2 and later ASICs, multicast routes are programmed as ACL
      rules, but unlike rules installed by tc, their activity is of no
      interest.
      
      Avoid unnecessary activity query for such rules by always reporting them
      as inactive.
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Reviewed-by: NPetr Machata <petrm@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1314096
    • P
      mlxsw: Treat LLDP packets as control · 0106668c
      Petr Machata 提交于
      When trapping packets for on-CPU processing, Spectrum machines
      differentiate between control and non-control traps. Traffic trapped
      through non-control traps is treated as data and kept in shared buffer in
      pools 0-4. Traffic trapped through control traps is kept in the dedicated
      control buffer 9. The advantage of marking traps as control is that
      pressure in the data plane does not prevent the control traffic to be
      processed.
      
      When the LLDP trap was introduced, it was marked as a control trap. But
      then in commit aed4b572 ("mlxsw: spectrum: PTP: Hook into packet
      receive path"), PTP traps were introduced. Because Ethernet-encapsulated
      PTP packets look to the Spectrum-1 ASIC as LLDP traffic and are trapped
      under the LLDP trap, this trap was reconfigured as non-control, in sync
      with the PTP traps.
      
      There is however no requirement that PTP traffic be handled as data.
      Besides, the usual encapsulation for PTP traffic is UDP, not bare Ethernet,
      and that is in deployments that even need PTP, which is far less common
      than LLDP. This is reflected by the default policer, which was not bumped
      up to the 19Kpps / 24Kpps that is the expected load of a PTP-enabled
      Spectrum-1 switch.
      
      Marking of LLDP trap as non-control was therefore probably misguided. In
      this patch, change it back to control.
      Reported-by: NMaksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: NPetr Machata <petrm@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0106668c
    • P
      mlxsw: spectrum_dcb: Do not warn about priority changes · b6b58456
      Petr Machata 提交于
      The idea behind the warnings is that the user would get warned in case when
      more than one priority is configured for a given DSCP value on a netdevice.
      
      The warning is currently wrong, because dcb_ieee_getapp_mask() returns
      the first matching entry, not all of them, and the warning will then claim
      that some priority is "current", when in fact it is not.
      
      But more importantly, the warning is misleading in general. Consider the
      following commands:
      
       # dcb app flush dev swp19 dscp-prio
       # dcb app add dev swp19 dscp-prio 24:3
       # dcb app replace dev swp19 dscp-prio 24:2
      
      The last command will issue the following warning:
      
       mlxsw_spectrum3 0000:07:00.0 swp19: Ignoring new priority 2 for DSCP 24 in favor of current value of 3
      
      The reason is that the "replace" command works by first adding the new
      value, and then removing all old values. This is the only way to make the
      replacement without causing the traffic to be prioritized to whatever the
      chip defaults to. The warning is issued in response to adding the new
      priority, and then no warning is shown when the old priority is removed.
      The upshot is that the canonical way to change traffic prioritization
      always produces a warning about ignoring the new priority, but what gets
      configured is in fact what the user intended.
      
      An option to just emit warning every time that the prioritization changes
      just to make it clear that it happened is obviously unsatisfactory.
      
      Therefore, in this patch, remove the warnings.
      Reported-by: NMaksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: NPetr Machata <petrm@nvidia.com>
      Signed-off-by: NIdo Schimmel <idosch@nvidia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6b58456
    • M
      sfc: Copy a subset of mcdi_pcol.h to siena · 6b73f20a
      Martin Habets 提交于
      For Siena we do not need new messages that were defined
      for the EF100 architecture. Several debug messages have
      also been removed.
      Signed-off-by: NMartin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b73f20a
    • M
      sfc: Disable Siena support · 0c38a5bd
      Martin Habets 提交于
      Disable the build of Siena code until later in this patch series.
      Prevent sfc.ko from binding to Siena NICs.
      
      efx_init_sriov/efx_fini_sriov is only used for Siena. Remove calls
      to those.
      Signed-off-by: NMartin Habets <habetsm.xilinx@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c38a5bd
    • M
      net/mlx5: Fix matching on inner TTC · a042d7f5
      Mark Bloch 提交于
      The cited commits didn't use proper matching on inner TTC
      as a result distribution of encapsulated packets wasn't symmetric
      between the physical ports.
      
      Fixes: 4c71ce50 ("net/mlx5: Support partial TTC rules")
      Fixes: 8e25a2bc ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
      Signed-off-by: NMark Bloch <mbloch@nvidia.com>
      Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      a042d7f5
    • M
      net/mlx5: Avoid double clear or set of sync reset requested · fc3d3db0
      Moshe Shemesh 提交于
      Double clear of reset requested state can lead to NULL pointer as it
      will try to delete the timer twice. This can happen for example on a
      race between abort from FW and pci error or reset. Avoid such case using
      test_and_clear_bit() to verify only one time reset requested state clear
      flow. Similarly use test_and_set_bit() to verify only one time reset
      requested state set flow.
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: NMaher Sanalla <msanalla@nvidia.com>
      Reviewed-by: NShay Drory <shayd@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      fc3d3db0
    • M
      net/mlx5: Fix deadlock in sync reset flow · cb7786a7
      Moshe Shemesh 提交于
      The sync reset flow can lead to the following deadlock when
      poll_sync_reset() is called by timer softirq and waiting on
      del_timer_sync() for the same timer. Fix that by moving the part of the
      flow that waits for the timer to reset_reload_work.
      
      It fixes the following kernel Trace:
      RIP: 0010:del_timer_sync+0x32/0x40
      ...
      Call Trace:
       <IRQ>
       mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
       poll_sync_reset.cold+0x36/0x52 [mlx5_core]
       call_timer_fn+0x32/0x130
       __run_timers.part.0+0x180/0x280
       ? tick_sched_handle+0x33/0x60
       ? tick_sched_timer+0x3d/0x80
       ? ktime_get+0x3e/0xa0
       run_timer_softirq+0x2a/0x50
       __do_softirq+0xe1/0x2d6
       ? hrtimer_interrupt+0x136/0x220
       irq_exit+0xae/0xb0
       smp_apic_timer_interrupt+0x7b/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      
      Fixes: 3c5193a8 ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
      Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: NMaher Sanalla <msanalla@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      cb7786a7
    • M
      net/mlx5e: Fix trust state reset in reload · b781bff8
      Moshe Tal 提交于
      Setting dscp2prio during the driver reload can cause dcb ieee app list to
      be not empty after the reload finish and as a result to a conflict between
      the priority trust state reported by the app and the state in the device
      register.
      
      Reset the dcb ieee app list on initialization in case this is
      conflicting with the register status.
      
      Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
      Signed-off-by: NMoshe Tal <moshet@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      b781bff8
    • A
      net/mlx5e: Avoid checking offload capability in post_parse action · 0e322efd
      Ariel Levkovich 提交于
      During TC action parsing, the can_offload callback is called
      before calling the action's main parsing callback.
      
      Later on, the can_offload callback is called again before handling
      the action's post_parse callback if exists.
      
      Since the main parsing callback might have changed and set parsing
      params for the rule, following can_offload checks might fail because
      some parsing params were already set.
      
      Specifically, the ct action main parsing sets the ct param in the
      parsing status structure and when the second can_offload for ct action
      is called, before handling the ct post parsing, it will return an error
      since it checks this ct param to indicate multiple ct actions which are
      not supported.
      
      Therefore, the can_offload call is removed from the post parsing
      handling to prevent such cases.
      This is allowed since the first can_offload call will ensure that the
      action can be offloaded and the fact the code reached the post parsing
      handling already means that the action can be offloaded.
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: NAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: NPaul Blakey <paulb@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      0e322efd
    • P
      net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release · b069e14f
      Paul Blakey 提交于
      __mlx5_tc_ct_entry_put() queues release of tuple related to some ct FT,
      if that is the last reference to that tuple, the actual deletion of
      the tuple can happen after the FT is already destroyed and freed.
      
      Flush the used workqueue before destroying the ct FT.
      
      Fixes: a2173131 ("net/mlx5e: CT: manage the lifetime of the ct entry object")
      Reviewed-by: NOz Shlomo <ozsh@nvidia.com>
      Signed-off-by: NPaul Blakey <paulb@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      b069e14f
    • A
      net/mlx5e: TC, fix decap fallback to uplink when int port not supported · e3fdc71b
      Ariel Levkovich 提交于
      When resolving the decap route device for a tunnel decap rule,
      the result may be an OVS internal port device.
      
      Prior to adding the support for internal port offload, such case
      would result in using the uplink as the default decap route device
      which allowed devices that can't support internal port offload
      to offload this decap rule.
      
      This behavior got broken by adding the internal port offload which
      will fail in case the device can't support internal port offload.
      
      To restore the old behavior, use the uplink device as the decap
      route as before when internal port offload is not supported.
      
      Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
      Signed-off-by: NAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: NMaor Dickman <maord@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      e3fdc71b
    • A
      net/mlx5e: TC, Fix ct_clear overwriting ct action metadata · 087032ee
      Ariel Levkovich 提交于
      ct_clear action is translated to clearing reg_c metadata
      which holds ct state and zone information using mod header
      actions.
      These actions are allocated during the actions parsing, as
      part of the flow attributes main mod header action list.
      
      If ct action exists in the rule, the flow's main mod header
      is used only in the post action table rule, after the ct tables
      which set the ct info in the reg_c as part of the ct actions.
      
      Therefore, if the original rule has a ct_clear action followed
      by a ct action, the ct action reg_c setting will be done first and
      will be followed by the ct_clear resetting reg_c and overwriting
      the ct info.
      
      Fix this by moving the ct_clear mod header actions allocation from
      the ct action parsing stage to the ct action post parsing stage where
      it is already known if ct_clear is followed by a ct action.
      In such case, we skip the mod header actions allocation for the ct
      clear since the ct action will write to reg_c anyway after clearing it.
      
      Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
      Signed-off-by: NAriel Levkovich <lariel@nvidia.com>
      Reviewed-by: NPaul Blakey <paulb@nvidia.com>
      Reviewed-by: NRoi Dayan <roid@nvidia.com>
      Reviewed-by: NMaor Dickman <maord@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      087032ee
    • V
      net/mlx5e: Lag, Don't skip fib events on current dst · 4a2a664e
      Vlad Buslov 提交于
      Referenced change added check to skip updating fib when new fib instance
      has same or lower priority. However, new fib instance can be an update on
      same dst address as existing one even though the structure is another
      instance that has different address. Ignoring events on such instances
      causes multipath LAG state to not be correctly updated.
      
      Track 'dst' and 'dst_len' fields of fib event fib_entry_notifier_info
      structure and don't skip events that have the same value of that fields.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NMaor Dickman <maord@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      4a2a664e
    • V
      net/mlx5e: Lag, Fix fib_info pointer assignment · a6589155
      Vlad Buslov 提交于
      Referenced change incorrectly sets single path fib_info even when LAG is
      not active. Fix it by moving call to mlx5_lag_fib_set() into conditional
      that verifies LAG state.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NMaor Dickman <maord@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      a6589155
    • V
      net/mlx5e: Lag, Fix use-after-free in fib event handler · 27b0420f
      Vlad Buslov 提交于
      Recent commit that modified fib route event handler to handle events
      according to their priority introduced use-after-free[0] in mp->mfi pointer
      usage. The pointer now is not just cached in order to be compared to
      following fib_info instances, but is also dereferenced to obtain
      fib_priority. However, since mlx5 lag code doesn't hold the reference to
      fin_info during whole mp->mfi lifetime, it could be used after fib_info
      instance has already been freed be kernel infrastructure code.
      
      Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
      type and cache fib_info priority in dedicated integer. Group
      fib_info-related data into dedicated 'fib' structure that will be further
      extended by following patches in the series.
      
      [0]:
      
      [  203.588029] ==================================================================
      [  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
      
      [  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
      [  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
      [  203.600053] Call Trace:
      [  203.600608]  <TASK>
      [  203.601110]  dump_stack_lvl+0x48/0x5e
      [  203.601860]  print_address_description.constprop.0+0x1f/0x160
      [  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.605177]  kasan_report.cold+0x83/0xdf
      [  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
      [  203.609382]  ? read_word_at_a_time+0xe/0x20
      [  203.610463]  ? strscpy+0xa0/0x2a0
      [  203.611463]  process_one_work+0x722/0x1270
      [  203.612344]  worker_thread+0x540/0x11e0
      [  203.613136]  ? rescuer_thread+0xd50/0xd50
      [  203.613949]  kthread+0x26e/0x300
      [  203.614627]  ? kthread_complete_and_exit+0x20/0x20
      [  203.615542]  ret_from_fork+0x1f/0x30
      [  203.616273]  </TASK>
      
      [  203.617174] Allocated by task 3746:
      [  203.617874]  kasan_save_stack+0x1e/0x40
      [  203.618644]  __kasan_kmalloc+0x81/0xa0
      [  203.619394]  fib_create_info+0xb41/0x3c50
      [  203.620213]  fib_table_insert+0x190/0x1ff0
      [  203.621020]  fib_magic.isra.0+0x246/0x2e0
      [  203.621803]  fib_add_ifaddr+0x19f/0x670
      [  203.622563]  fib_inetaddr_event+0x13f/0x270
      [  203.623377]  blocking_notifier_call_chain+0xd4/0x130
      [  203.624355]  __inet_insert_ifa+0x641/0xb20
      [  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
      [  203.626009]  rtnetlink_rcv_msg+0x309/0x880
      [  203.626826]  netlink_rcv_skb+0x11d/0x340
      [  203.627626]  netlink_unicast+0x4cc/0x790
      [  203.628430]  netlink_sendmsg+0x762/0xc00
      [  203.629230]  sock_sendmsg+0xb2/0xe0
      [  203.629955]  ____sys_sendmsg+0x58a/0x770
      [  203.630756]  ___sys_sendmsg+0xd8/0x160
      [  203.631523]  __sys_sendmsg+0xb7/0x140
      [  203.632294]  do_syscall_64+0x35/0x80
      [  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.634427] Freed by task 0:
      [  203.635063]  kasan_save_stack+0x1e/0x40
      [  203.635844]  kasan_set_track+0x21/0x30
      [  203.636618]  kasan_set_free_info+0x20/0x30
      [  203.637450]  __kasan_slab_free+0xfc/0x140
      [  203.638271]  kfree+0x94/0x3b0
      [  203.638903]  rcu_core+0x5e4/0x1990
      [  203.639640]  __do_softirq+0x1ba/0x5d3
      
      [  203.640828] Last potentially related work creation:
      [  203.641785]  kasan_save_stack+0x1e/0x40
      [  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
      [  203.643478]  call_rcu+0x88/0x9c0
      [  203.644178]  fib_release_info+0x539/0x750
      [  203.644997]  fib_table_delete+0x659/0xb80
      [  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
      [  203.646617]  fib_del_ifaddr+0x93f/0x1300
      [  203.647415]  fib_inetaddr_event+0x9f/0x270
      [  203.648251]  blocking_notifier_call_chain+0xd4/0x130
      [  203.649225]  __inet_del_ifa+0x474/0xc10
      [  203.650016]  devinet_ioctl+0x781/0x17f0
      [  203.650788]  inet_ioctl+0x1ad/0x290
      [  203.651533]  sock_do_ioctl+0xce/0x1c0
      [  203.652315]  sock_ioctl+0x27b/0x4f0
      [  203.653058]  __x64_sys_ioctl+0x124/0x190
      [  203.653850]  do_syscall_64+0x35/0x80
      [  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.666952] The buggy address belongs to the object at ffff888144df2000
                      which belongs to the cache kmalloc-256 of size 256
      [  203.669250] The buggy address is located 80 bytes inside of
                      256-byte region [ffff888144df2000, ffff888144df2100)
      [  203.671332] The buggy address belongs to the page:
      [  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
      [  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
      [  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
      [  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [  203.679928] page dumped because: kasan: bad access detected
      
      [  203.681455] Memory state around the buggy address:
      [  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.686701]                                                  ^
      [  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.690620] ==================================================================
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: NVlad Buslov <vladbu@nvidia.com>
      Reviewed-by: NMaor Dickman <maord@nvidia.com>
      Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      27b0420f
    • M
      net/mlx5e: Fix the calling of update_buffer_lossy() API · c4d963a5
      Mark Zhang 提交于
      The arguments of update_buffer_lossy() is in a wrong order. Fix it.
      
      Fixes: 88b3d5c9 ("net/mlx5e: Fix port buffers cell size value")
      Signed-off-by: NMark Zhang <markzhang@nvidia.com>
      Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
      Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
      c4d963a5