1. 11 Apr, 2019 6 commits
    • mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2 · d5949d92
      Ido Schimmel committed
      In Spectrum-1, when a multicast packet is admitted to the shared buffer
      it increases the quotas of all the ports and {port, TC} to which it is
      forwarded.
      
      The above means that multicast packets are accounted multiple times in
      the shared buffer and can therefore cause the associated shared buffer
      pool to fill up very quickly.
      
      To work around this issue, commit e83c045e ("mlxsw:
      spectrum_buffers: Configure MC pool") added a dedicated multicast pool
      in which multicast packets are accounted.
      
      The issue is not present in Spectrum-2, but in order to be backward
      compatible with Spectrum-1, its default behavior is to allow a multicast
      packet to increase multiple egress quotas instead of one.
      
      Until the new (non-backward compatible) mode is supported, configure a
      dedicated multicast pool as in Spectrum-1.
      
      Fixes: fe099bf6 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Do not check VRF MAC address · 972fae68
      Ido Schimmel committed
      Commit 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
      addresses") enabled the driver to veto router interface (RIF) MAC
      addresses that it cannot support.
      
      This check should only be performed for interfaces for which the driver
      actually configures a RIF. A VRF upper is not one of them, so ignore it.
      
      Without this patch it is not possible to set an IP address on the VRF
      device and use it as a loopback.
      
      Fixes: 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
      Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue · b442fed1
      Ido Schimmel committed
      The workqueue is used to periodically update the networking stack about
      activity / statistics of various objects such as neighbours and TC
      actions.
      
      It should not be called as part of the memory reclaim path, so remove the
      WQ_MEM_RECLAIM flag.
      
      Fixes: 3d5479e9 ("mlxsw: core: Remove deprecated create_workqueue")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue · 4af06997
      Ido Schimmel committed
      The ordered workqueue is used to offload various objects such as routes
      and neighbours in the order they are notified.
      
      It should not be called as part of the memory reclaim path, so remove the
      WQ_MEM_RECLAIM flag. Keeping the flag can also result in a warning [1] if
      a worker tries to flush a non-WQ_MEM_RECLAIM workqueue.
      
      [1]
      [97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
      [97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
      ...
      [97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      [97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
      [97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
      ...
      [97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
      [97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
      [97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
      [97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
      [97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
      [97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
      [97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
      [97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
      [97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [97703.543115] Call Trace:
      [97703.543129] __flush_work+0xbd/0x1e0
      [97703.543137] ? __cancel_work_timer+0x136/0x1b0
      [97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
      [97703.543154] __cancel_work_timer+0x136/0x1b0
      [97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
      [97703.543184] cancel_work_sync+0x10/0x20
      [97703.543191] rhashtable_free_and_destroy+0x23/0x140
      [97703.543198] rhashtable_destroy+0xd/0x10
      [97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
      [97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
      [97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
      [97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
      [97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
      [97703.543484] process_one_work+0x1fd/0x400
      [97703.543493] worker_thread+0x34/0x410
      [97703.543500] kthread+0x121/0x140
      [97703.543507] ? process_one_work+0x400/0x400
      [97703.543512] ? kthread_park+0x90/0x90
      [97703.543523] ret_from_fork+0x35/0x40
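
      The rule behind this warning can be modelled in user-space C. This is a
      simplified sketch, not the kernel's check_flush_dependency(): the struct,
      the flag value, and the function name are illustrative assumptions. It
      captures only what the warning text states, namely that a worker running
      on a WQ_MEM_RECLAIM workqueue flushes a workqueue that lacks the flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bit; the value is an assumption, not the kernel's. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Hypothetical stand-in for a workqueue: just a name and its flags. */
struct wq_model {
    const char *name;
    unsigned int flags;
};

/*
 * Returns true when a flush would trigger the warning in this model:
 * the flushing worker belongs to a reclaim workqueue, while the
 * workqueue being flushed does not carry the reclaim flag.
 * current_wq may be NULL when the caller is not a workqueue worker.
 */
static bool flush_would_warn(const struct wq_model *current_wq,
                             const struct wq_model *target_wq)
{
    if (target_wq->flags & MODEL_WQ_MEM_RECLAIM)
        return false; /* target can make progress under reclaim */

    return current_wq != NULL &&
           (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

      In this model, a "mlxsw_core_ordered" queue created with the flag warns
      when it flushes "events"; once the flag is dropped, it no longer does.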
      
      Fixes: a3832b31 ("mlxsw: core: Create an ordered workqueue for FIB offload")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Semion Lisyansky <semionl@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue · a8c133b0
      Ido Schimmel committed
      The EMAD workqueue is used to handle retransmission of EMAD packets that
      contain configuration data for the device's firmware.
      
      Given that the workers need to allocate these packets and that the code
      is not called as part of the memory reclaim path, remove the
      WQ_MEM_RECLAIM flag.
      
      Fixes: d965465b ("mlxsw: core: Fix possible deadlock")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_switchdev: Add MDB entries in prepare phase · d4d0e409
      Ido Schimmel committed
      The driver cannot guarantee in the prepare phase that it will be able to
      write an MDB entry to the device. In case the driver returned success
      during the prepare phase, but then failed to add the entry in the commit
      phase, a WARNING [1] will be generated by the switchdev core.
      
      Fix this by doing the work in the prepare phase instead.
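
      The idea can be sketched in user-space C. Everything here is a
      hypothetical model (the device struct, its capacity, and all function
      names are assumptions, not driver code): the fallible device write is
      done in the prepare phase, so the commit phase has nothing left that can
      fail.

```c
#include <stdbool.h>

/* Hypothetical model of a device table with limited capacity. */
struct device {
    int used;
    int capacity;
};

/* Write one entry; can fail, like a write to real hardware. */
static int device_write_entry(struct device *dev)
{
    if (dev->used >= dev->capacity)
        return -1;
    dev->used++;
    return 0;
}

/* Object-add handler, invoked once per phase. */
static int mdb_add(struct device *dev, bool prepare_phase)
{
    if (!prepare_phase)
        return 0; /* commit: the work was already done in prepare */

    /* prepare: a failure here can still be reported to the caller */
    return device_write_entry(dev);
}

/* Hypothetical caller that runs prepare first, then commit. */
static int obj_add(struct device *dev)
{
    int err = mdb_add(dev, true);

    if (err)
        return err; /* vetoed during prepare, commit is never reached */
    return mdb_add(dev, false);
}
```

      With a capacity of one entry, the first obj_add() succeeds and the
      second is rejected during prepare, so no commit-time failure occurs.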
      
      [1]
      [  358.544486] swp12s0: Commit of object (id=2) failed.
      [  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
      [  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
      [  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  358.580582] Workqueue: events switchdev_deferred_process_work
      [  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
      ...
      [  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
      [  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
      [  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
      [  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
      [  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
      [  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
      [  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
      [  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
      [  358.683188] Call Trace:
      [  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
      [  358.691655]  switchdev_deferred_process+0x6b/0xf0
      [  358.696907]  switchdev_deferred_process_work+0xa/0x10
      [  358.702548]  process_one_work+0x1f5/0x3f0
      [  358.707022]  worker_thread+0x28/0x3c0
      [  358.711099]  ? process_one_work+0x3f0/0x3f0
      [  358.715768]  kthread+0x10d/0x130
      [  358.719369]  ? __kthread_create_on_node+0x180/0x180
      [  358.724815]  ret_from_fork+0x35/0x40
      
      Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
      Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 30 Mar, 2019 15 commits
  3. 20 Mar, 2019 1 commit
    • mlxsw: core: mlxsw: core: avoid -Wint-in-bool-context warning · 7442c483
      Arnd Bergmann committed
      A recently added function in mlxsw triggers a harmless compiler warning:
      
      In file included from drivers/net/ethernet/mellanox/mlxsw/core.h:17,
                       from drivers/net/ethernet/mellanox/mlxsw/core_env.c:7:
      drivers/net/ethernet/mellanox/mlxsw/core_env.c: In function 'mlxsw_env_module_temp_thresholds_get':
      drivers/net/ethernet/mellanox/mlxsw/reg.h:8015:45: error: '*' in boolean context, suggest '&&' instead [-Werror=int-in-bool-context]
       #define MLXSW_REG_MTMP_TEMP_TO_MC(val) (val * 125)
                                              ~~~~~^~~~~~
      drivers/net/ethernet/mellanox/mlxsw/core_env.c:116:8: note: in expansion of macro 'MLXSW_REG_MTMP_TEMP_TO_MC'
         if (!MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)) {
              ^~~~~~~~~~~~~~~~~~~~~~~~~
      
      The warning is normally disabled, but it would be nice to enable
      it to find real bugs, and there are no other known instances at
      the moment.
      
      Replace the negation with a zero-comparison, which also matches
      the comment above it.
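
      A minimal sketch of the change, assuming a standalone helper: the macro
      is copied from the warning text above, while module_temp_is_zero() is a
      hypothetical name standing in for the check in
      mlxsw_env_module_temp_thresholds_get.

```c
#include <stdbool.h>

/* Macro as quoted in the compiler warning. */
#define MLXSW_REG_MTMP_TEMP_TO_MC(val) (val * 125)

/* Hypothetical helper: true when the converted temperature is zero. */
static bool module_temp_is_zero(int module_temp)
{
    /*
     * The flagged form was: !MLXSW_REG_MTMP_TEMP_TO_MC(module_temp)
     * The explicit comparison has the same truth value and does not
     * place a multiplication directly in a boolean context.
     */
    return MLXSW_REG_MTMP_TEMP_TO_MC(module_temp) == 0;
}
```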
      
      Fixes: d93c19a1 ("mlxsw: core: Add API for QSFP module temperature thresholds reading")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 18 Mar, 2019 2 commits
  5. 13 Mar, 2019 5 commits
    • net/mlx4_core: Fix qp mtt size calculation · 8511a653
      Jack Morgenstein committed
      Calculation of qp mtt size (in function mlx4_RST2INIT_QP_wrapper)
      ultimately depends on function roundup_pow_of_two.
      
      If the amount of memory required by the QP is less than one page,
      roundup_pow_of_two is called with argument zero.  In this case, the
      roundup_pow_of_two result is undefined.
      
      Calling roundup_pow_of_two with a zero argument resulted in the
      following stack trace:
      
      UBSAN: Undefined behaviour in ./include/linux/log2.h:61:13
      shift exponent 64 is too large for 64-bit type 'long unsigned int'
      CPU: 4 PID: 26939 Comm: rping Tainted: G OE 4.19.0-rc1
      Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
      Call Trace:
      dump_stack+0x9a/0xeb
      ubsan_epilogue+0x9/0x7c
      __ubsan_handle_shift_out_of_bounds+0x254/0x29d
      ? __ubsan_handle_load_invalid_value+0x180/0x180
      ? debug_show_all_locks+0x310/0x310
      ? sched_clock+0x5/0x10
      ? sched_clock+0x5/0x10
      ? sched_clock_cpu+0x18/0x260
      ? find_held_lock+0x35/0x1e0
      ? mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      
      Fix this by explicitly testing for zero, and returning one if the
      argument is zero (assuming that the next higher power of 2 in this case
      should be one).
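
      The zero guard can be sketched in user-space C. Both function names are
      hypothetical; roundup_pow_of_two_u64() is only a stand-in for the kernel
      helper, written as a loop and assumed to receive values no larger than
      2^63.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space stand-in for roundup_pow_of_two(). As with the kernel
 * helper, a zero argument is outside its contract.
 */
static uint64_t roundup_pow_of_two_u64(uint64_t n)
{
    uint64_t p = 1;

    assert(n != 0);
    assert(n <= (UINT64_C(1) << 63)); /* keep the shift below in range */
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical wrapper: test for zero explicitly and return one. */
static uint64_t roundup_pow_of_two_or_one(uint64_t n)
{
    return n ? roundup_pow_of_two_u64(n) : 1;
}
```

      A QP that needs less than one page then yields a size of one instead of
      reaching the undefined shift.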
      
      Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling · c07d2792
      Jack Morgenstein committed
      In procedures mlx4_cmd_use_events() and mlx4_cmd_use_polling(), we need to
      guarantee that there are no FW commands in progress on the comm channel
      (for VFs) or wrapped FW commands (on the PF) when SRIOV is active.
      
      We do this by also taking the slave_cmd_mutex when SRIOV is active.
      
      This is especially important when switching from event to polling, since we
      free the command-context array during the switch.  If there are FW commands
      in progress (e.g., waiting for a completion event), the completion event
      handler will access freed memory.
      
      Since the decision to use comm_wait or comm_poll is taken before grabbing
      the event_sem/poll_sem in mlx4_comm_cmd_wait/poll, we must take the
      slave_cmd_mutex as well (to guarantee that the decision to use events or
      polling and the call to the appropriate cmd function are atomic).
      
      Fixes: a7e1f049 ("net/mlx4_core: Fix deadlock when switching between polling and event fw commands")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Fix reset flow when in command polling mode · e15ce4b8
      Jack Morgenstein committed
      As part of unloading a device, the driver switches from
      FW command event mode to FW command polling mode.
      
      Part of switching over to polling mode is freeing the command context array
      memory (unfortunately, currently, without NULLing the command context array
      pointer).
      
      The reset flow calls "complete" to complete all outstanding fw commands
      (if we are in event mode). The check for event vs. polling mode here
      is to test if the command context array pointer is NULL.
      
      If the reset flow is activated after the switch to polling mode, it will
      attempt (incorrectly) to complete all the commands in the context array --
      because the pointer was not NULLed when the driver switched over to polling
      mode.
      
      As a result, we have a use-after-free situation, which results in a
      kernel crash.
      
      For example:
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      PGD 0
      Oops: 0000 [#1] SMP
      Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ...
      CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
      Workqueue: events hv_eject_device_work [pci_hyperv]
      task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000
      RIP: 0010:[<ffffffff876c4a8e>]  [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      RSP: 0018:ffff8d17354bfa38  EFLAGS: 00010082
      RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8
      RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0
      R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003
      FS:  0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0
      Call Trace:
       [<ffffffff876c7adc>] complete+0x3c/0x50
       [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core]
       [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core]
       [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core]
       [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core]
       [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0
       [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0
       [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core]
       [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core]
       [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core]
       [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core]
       [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core]
       [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core]
      
      The fix is to set the command context array pointer to NULL after freeing
      the array.
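
      The pattern can be sketched in user-space C. The structs and function
      names below are simplified, hypothetical stand-ins for the driver's
      command state, not the mlx4 code itself. The reset flow tells event mode
      from polling mode by testing the pointer, so the pointer has to be
      cleared when the array is freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the command state. */
struct cmd_context {
    int token;
};

struct cmd_state {
    struct cmd_context *context; /* non-NULL only in event mode */
    size_t max_cmds;
};

/* Switch to event mode: allocate the command context array. */
static int cmd_use_events(struct cmd_state *cmd, size_t max_cmds)
{
    cmd->context = calloc(max_cmds, sizeof(*cmd->context));
    if (!cmd->context)
        return -1;
    cmd->max_cmds = max_cmds;
    return 0;
}

/* Switch to polling mode: free the array and clear the pointer. */
static void cmd_use_polling(struct cmd_state *cmd)
{
    free(cmd->context);
    cmd->context = NULL; /* the fix: leave no dangling pointer behind */
    cmd->max_cmds = 0;
}

/* Reset flow: returns how many contexts it would complete. */
static size_t cmd_wake_completions(const struct cmd_state *cmd)
{
    if (!cmd->context)
        return 0; /* polling mode: nothing to complete */
    return cmd->max_cmds;
}
```

      After cmd_use_polling(), the reset flow in this model sees a NULL
      pointer and skips the freed array.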
      
      Fixes: f5aef5aa ("net/mlx4_core: Activate reset flow upon fatal command cases")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: minimal: Initialize base_mac · 426aa1fc
      Jiri Pirko committed
      Currently base_mac is not initialized which causes wrong reporting of
      zeroed parent_id to userspace. Fix this by initializing base_mac
      properly.
      
      Fixes: c100e47c ("mlxsw: minimal: Add ethtool support")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Prevent duplication during QSFP module initialization · 6bab45b4
      Vadim Pasternak committed
      During thermal initialization, verify whether the QSFP module's entry is
      already configured in order to prevent duplication.
      Such a scenario can occur when two switch drivers (PCI based and I2C
      based) coexist, a split configuration is applied to some ports after
      boot, and the I2C based driver is then re-probed.
      In that case, after the reboot, the same QSFP module associated with the
      split is discovered several times by the I2C based driver, which causes
      a crash.
      
      This can happen, for example, on a system equipped with a BMC (Baseboard
      Management Controller) running the I2C based driver, when the following
      steps are performed:
      - System boot.
      - Host side configures port split.
      - BMC side is rebooted.
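
      A minimal sketch of such a guard in user-space C. The table layout,
      MODULE_COUNT, and module_tz_init() are hypothetical names chosen for
      illustration, not the driver's data structures: initialization returns
      early when the module's entry is already configured.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODULE_COUNT 4 /* hypothetical number of QSFP modules */

/* Hypothetical per-module thermal entry. */
struct module_tz {
    bool configured;
    int init_calls; /* counts real initializations, for illustration */
};

/* Initialize the entry for one module, at most once. */
static int module_tz_init(struct module_tz *tbl, size_t module)
{
    if (module >= MODULE_COUNT)
        return -1;

    if (tbl[module].configured)
        return 0; /* same module seen again, e.g. via split ports */

    tbl[module].configured = true;
    tbl[module].init_calls++;
    return 0;
}
```

      Calling module_tz_init() twice for the same module succeeds both times
      but configures the entry only once.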
      
      Fixes: 6a79507c ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
      Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 12 Mar, 2019 10 commits
  7. 07 Mar, 2019 1 commit