1. 27 Nov 2020, 11 commits
  2. 31 Oct 2020, 1 commit
  3. 27 Oct 2020, 1 commit
    •
      RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049
      Committed by Parav Pandit
      When a mlx5 core devlink instance is reloaded in different net namespace,
      its associated IB device is deleted and recreated.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev reload pci/0000:00:08.0 netns foo
      $ ip netns del foo
      
      The mlx5 IB device needs to attach and detach the netdevice through the
      netdev notifier chain during the load and unload sequence. The call
      graph below shows the unload flow.
      
      cleanup_net()
         down_read(&pernet_ops_rwsem); <- first sem acquired
           ops_pre_exit_list()
             pre_exit()
               devlink_pernet_pre_exit()
                 devlink_reload()
                   mlx5_devlink_reload_down()
                     mlx5_unload_one()
                     [...]
                       mlx5_ib_remove()
                         mlx5_ib_unbind_slave_port()
                           mlx5_remove_netdev_notifier()
                             unregister_netdevice_notifier()
                               down_write(&pernet_ops_rwsem); <- recursive lock
      
      Hence, when net namespace is deleted, mlx5 reload results in deadlock.
      
      When the deadlock occurs, the devlink mutex is also held. This not only
      deadlocks the mlx5 device under reload, but also every process that
      attempts to access unrelated devlink devices.
      
      Hence, fix this by having the mlx5 IB driver register a per-net netdev
      notifier instead of a global one; the per-net notifier operates on the
      net namespace without holding the pernet_ops_rwsem.
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  4. 13 Oct 2020, 2 commits
  5. 10 Oct 2020, 2 commits
  6. 03 Oct 2020, 2 commits
    •
      net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b
      Committed by Saeed Mahameed
      If the PCI device is offline, reclaim_pages_cmd() will still try to call
      the FW to release FW pages; in this case cmd_exec() returns a silent
      success without actually calling the FW.
      
      This is wrong and causes page leaks. Instead, the driver should detect
      that the PCI device is offline or the command interface is unavailable
      before trying to access the FW, and manually release the FW pages in
      the driver.
      
      This patch factors out the code that checks FW command interface
      availability and calls it in sensitive places, e.g. reclaim_pages_cmd().
      
      Alternative fixes considered:
       1. Remove MLX5_CMD_OP_MANAGE_PAGES from the mlx5_internal_err_ret_value
          command-success-simulation list.
       2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
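      The guard pattern this commit describes can be sketched in plain
      userspace C. This is an illustration under assumed names (dev_state,
      fw_cmd_available, reclaim_pages are hypothetical, not the real mlx5
      helpers): check FW command-interface availability once, and release the
      pages in the driver when the FW cannot be reached, instead of trusting
      a simulated cmd_exec() success.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      struct dev_state {
          bool pci_online;
          bool cmdif_ready;
          int fw_pages;      /* pages currently owned by the FW */
          int driver_freed;  /* pages manually reclaimed by the driver */
      };

      /* shared availability check, analogous to the helper the patch factors out */
      static bool fw_cmd_available(const struct dev_state *d)
      {
          return d->pci_online && d->cmdif_ready;
      }

      static void reclaim_pages(struct dev_state *d, int npages)
      {
          if (fw_cmd_available(d)) {
              d->fw_pages -= npages;          /* FW actually released the pages */
          } else {
              d->fw_pages -= npages;          /* FW unreachable: free in the driver */
              d->driver_freed += npages;
          }
      }
      ```

      The point is that the unavailable path still accounts for every page,
      so nothing leaks when cmd_exec() would have faked success.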
    •
      net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b
      Committed by Eran Ben Elisha
      Upon command completion timeout, the driver simulates a forced command
      completion. In the rare case where the real interrupt for that command
      arrives simultaneously, it might release the command entry while the
      forced handler is still accessing it.
      
      Fix that by adding an entry refcount to track the current number of
      active handlers. The command entry is released only when this refcount
      drops to zero.
      
      The command refcount is always initialized to one. For callback
      commands, the completion handler is the symmetric flow that decrements
      it; for non-callback commands, it is wait_func().
      
      Before ringing the doorbell, increment the refcount for the real
      completion handler. Once the real completion handler runs, it
      decrements the refcount.
      
      For callback commands, once the delayed timeout work is scheduled,
      increment the refcount. In the callback command completion handler, try
      to cancel the timeout callback; on success, decrement the refcount on
      its behalf, as it will never run.
      
      In addition, consolidate freeing the entry index and freeing the entry
      itself into a single release flow for all command types.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
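      The init-to-one refcount scheme above can be sketched as a minimal
      userspace C example. Names (cmd_entry, cmd_ent_get/put) are
      illustrative, not the real mlx5 symbols: the entry starts at refcount
      one, each potential handler takes an extra reference, and the entry is
      freed only when the last reference is dropped.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      struct cmd_entry {
          atomic_int refcount;
          bool freed;            /* stand-in for the real free, for illustration */
      };

      static void cmd_entry_init(struct cmd_entry *ent)
      {
          atomic_init(&ent->refcount, 1);  /* base reference */
          ent->freed = false;
      }

      static void cmd_ent_get(struct cmd_entry *ent)
      {
          atomic_fetch_add(&ent->refcount, 1);
      }

      static void cmd_ent_put(struct cmd_entry *ent)
      {
          /* fetch_sub returns the previous value: 1 means we dropped the last ref */
          if (atomic_fetch_sub(&ent->refcount, 1) == 1)
              ent->freed = true;  /* last reference: safe to release the entry */
      }
      ```

      With this shape, the forced timeout handler and the real interrupt
      handler can both run to completion; whichever drops the last reference
      performs the release, so neither can free the entry under the other.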
  7. 01 Oct 2020, 1 commit
  8. 18 Sep 2020, 3 commits
  9. 16 Sep 2020, 2 commits
    •
      net/mlx5e: Add CQE compression support for multi-strides packets · b7cf0806
      Committed by Ofer Levi
      Add CQE compression support for completions of packets that span
      multiple strides in a Striding RQ, per the HW capability.
      In our memory model, we use small strides (256B as of today) for the
      non-linear SKB mode. This feature allows CQE compression to work also
      for multi-stride packets; in this case, decompressing the mini CQE
      array uses the stride index provided by the HW as part of the mini CQE.
      Before this feature, compression was possible only for single-stride
      packets, i.e. for packets of up to 256 bytes in non-linear mode, and
      the index was maintained by SW.
      This feature is supported for ConnectX-5 and above.
      
      Feature performance test:
      This was whitebox-tested: we reduced the PCI speed from 125Gb/s to
      62.5Gb/s to overload the PCI bus, and modified the mlx5 driver to drop
      incoming packets before building the SKB, to achieve low CPU
      utilization. The outcome is low CPU utilization with the PCI bus as
      the only bottleneck.
      Test setup:
      Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz, 32 cores
      NIC: ConnectX-6 Dx
      Sender side generates 300-byte packets at full PCI bandwidth.
      Receiver side configuration:
      Single channel, one CPU processing with one ring allocated. CPU
      utilization is ~20% while PCI bandwidth is fully utilized.
      For the generated traffic and an interface MTU of 4500B (to activate
      the non-linear SKB mode), the packet rate improves by about 19%, from
      ~17.6Mpps to ~21Mpps.
      Without this feature, counters show no CQE compression blocks for this
      setup; with the feature, counters show ~20.7Mpps of compressed CQEs in
      ~500K compression blocks.
      Signed-off-by: Ofer Levi <oferle@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
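      For intuition on what "multi-stride" means here, the arithmetic is a
      simple round-up division. This is an illustrative calculation only, not
      driver code, assuming the 256B stride size the message mentions for
      non-linear SKB mode:

      ```c
      #include <assert.h>

      #define STRIDE_SZ 256u  /* stride size cited above for non-linear mode */

      /* number of strides a packet of `len` bytes occupies (DIV_ROUND_UP) */
      static unsigned int strides_for_packet(unsigned int len)
      {
          return (len + STRIDE_SZ - 1) / STRIDE_SZ;
      }
      ```

      So the 300-byte packets of the test above span two strides each, and
      before this feature their completions could not be compressed.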
    •
      net/mlx5: Always use container_of to find mdev pointer from clock struct · fb609b51
      Committed by Eran Ben Elisha
      The clock struct is embedded in struct mlx5_core_dev. The code was
      inconsistent: some places used container_of() and others used
      clock->mdev.
      
      Align the code to use container_of() and remove the clock->mdev
      pointer. While here, fix reverse-xmas-tree coding style.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
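      A minimal userspace sketch of the container_of() pattern this commit
      standardizes on: recover the outer device struct from a pointer to an
      embedded member, removing the need for a back-pointer like clock->mdev.
      Struct and field names here are illustrative, not the real mlx5 ones.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* userspace equivalent of the kernel's container_of() */
      #define container_of(ptr, type, member) \
          ((type *)((char *)(ptr) - offsetof(type, member)))

      struct clock_info {
          long cycles;
      };

      struct core_dev {
          int id;
          struct clock_info clock;  /* embedded member: no back-pointer needed */
      };
      ```

      Given only a `struct clock_info *`, container_of() subtracts the
      member's offset to land back on the enclosing `struct core_dev`, which
      is why the redundant mdev pointer could be deleted.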
  10. 27 Aug 2020, 1 commit
  11. 29 Jul 2020, 1 commit
    •
      net/mlx5e: Modify uplink state on interface up/down · 7d0314b1
      Committed by Ron Diskin
      When setting the PF interface up/down, notify the firmware to update
      the uplink state via MODIFY_VPORT_STATE when the E-Switch is enabled.
      
      This prevents sending traffic out on the uplink port when the PF is
      down, e.g. traffic sent from a VF interface which is still up.
      Currently, on mlx5e_open/close() the driver only sends a PAOS command
      to tell the firmware to set the physical port state up/down; however,
      that is not sufficient. When a VF is in the "auto" state, it follows
      the uplink state, which was not updated on mlx5e_open/close() before
      this patch.
      
      When switchdev mode is enabled and the uplink representor is first
      enabled, set the uplink port state back to its FW default, "AUTO".
      
      Fixes: 63bfd399 ("net/mlx5e: Send PAOS command on interface up/down")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  12. 28 Jul 2020, 1 commit
    •
      net/mlx5: Hold pages RB tree per VF · d6945242
      Committed by Eran Ben Elisha
      Per page-request event, the FW asks the driver to allocate or release
      pages for a single function. The driver maintains an FW pages object
      per function, so there is no need to hold one global page database.
      Instead, hold a page database per function, which improves the
      performance of the release flow in all cases, especially for
      "release all pages".
      
      As the range of function IDs is large and not sequential, use an
      xarray to store the per-function-ID page database, with the function
      ID as the key.
      
      Upon the first page allocation for a function ID, create that
      function's page database. It is released only at pagealloc mechanism
      cleanup.
      
      NIC: ConnectX-4 Lx
      CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
      Test case: 32 VFs, measure page release on one VF as part of FLR
      Before: 0.021 sec
      After:  0.014 sec
      
      The improvement depends on the number of VFs and their memory
      utilization. The measurements above were taken on an idle system.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
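      The per-function layout can be sketched in userspace C. The kernel
      patch uses an xarray keyed by function ID; here a plain pointer table
      stands in for it, and all names (func_pages, give_pages,
      release_all_pages) are hypothetical. The key idea is one small database
      per function, created lazily on first use, so "release all pages" walks
      only that function's entries.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      #define MAX_FUNC_ID 64  /* illustrative bound; real IDs are large and sparse */

      struct func_pages {
          int npages;  /* pages currently given to this function's FW */
      };

      static struct func_pages *func_db[MAX_FUNC_ID];  /* xarray stand-in */

      /* look up the per-function database, creating it on first allocation */
      static struct func_pages *get_func_pages(unsigned int func_id)
      {
          if (!func_db[func_id])
              func_db[func_id] = calloc(1, sizeof(struct func_pages));
          return func_db[func_id];
      }

      static void give_pages(unsigned int func_id, int npages)
      {
          get_func_pages(func_id)->npages += npages;
      }

      /* release touches only one function's database, not a global tree */
      static int release_all_pages(unsigned int func_id)
      {
          struct func_pages *fp = get_func_pages(func_id);
          int released = fp->npages;

          fp->npages = 0;
          return released;
      }
      ```

      This is why the FLR measurement above improves: releasing one VF's
      pages no longer scans entries belonging to the other 31 VFs.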
  13. 27 Jul 2020, 2 commits
  14. 25 Jul 2020, 1 commit
  15. 17 Jul 2020, 2 commits
  16. 16 Jul 2020, 3 commits
  17. 10 Jul 2020, 1 commit
    •
      net/mlx5e: Fix port buffers cell size value · 88b3d5c9
      Committed by Eran Ben Elisha
      The device unit for port buffer sizes, xoff_threshold and
      xon_threshold is cells. Fix a driver bug where the cell unit size was
      hard-coded to 128 bytes; that hard-coded value is wrong on some
      hardware versions.
      
      Make the driver read the cell size from the SBCAM register and
      translate bytes to cell units accordingly.
      
      To enable the fix, this patch exposes the SBCAM (Shared buffer
      capabilities mask) layout and defines.
      
      If SBCAM.cap_cell_size is valid, use it for all bytes-to-cells
      calculations; if not, fall back to 128.
      
      The cell size does not change on the fly for a given device. Instead
      of issuing an SBCAM access-reg command every time a translation is
      needed, cache the value in mlx5e_dcbx as part of
      mlx5e_dcbnl_initialize(), and pass dcbx.port_buff_cell_sz as a
      parameter to every function that needs bytes-to-cells translation.
      
      While fixing the bug, move the MLX5E_BUFFER_CELL_SHIFT macro to
      en_dcbnl.c, as it is only used by that file.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Huy Nguyen <huyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
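      The translation logic above reduces to two small helpers. This is an
      illustrative sketch with hypothetical names (port_buff_cell_sz,
      bytes_to_cells are not the real mlx5 symbols): use the queried
      capability when valid, otherwise fall back to the old 128-byte default,
      and round byte sizes up to whole cells.

      ```c
      #include <assert.h>

      #define FALLBACK_CELL_SZ 128u  /* legacy hard-coded value, kept as fallback */

      /* pick the cached cell size: cap_cell_size == 0 models "not valid" */
      static unsigned int port_buff_cell_sz(unsigned int cap_cell_size)
      {
          return cap_cell_size ? cap_cell_size : FALLBACK_CELL_SZ;
      }

      /* translate a buffer size in bytes to whole cells, rounding up */
      static unsigned int bytes_to_cells(unsigned int bytes, unsigned int cell_sz)
      {
          return (bytes + cell_sz - 1) / cell_sz;
      }
      ```

      Caching the cell size once (as the commit does in mlx5e_dcbx) and
      passing it down keeps the hot paths free of access-reg round trips.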
  18. 09 Jul 2020, 1 commit
  19. 03 Jul 2020, 1 commit
  20. 28 Jun 2020, 1 commit