  1. 23 Jan 2021 (2 commits)
    • net/mlx5: SF, Add auxiliary device support · 90d010b8
      Authored by Parav Pandit
      Introduce an API to add and delete an auxiliary device for an SF.
      Each SF has its own dedicated window in PCI BAR 2.
      
      An SF device is similar to a PCI PF or VF in that it supports multiple
      classes of devices such as net, rdma and vdpa.
      
      The SF device will be added or removed in a subsequent patch, during
      the SF devlink port function state change command.
      
      A subfunction device exposes the user-supplied subfunction number,
      which systemd/udev then uses to derive a deterministic name for its
      netdevice and rdma device.
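
      As a rough illustration, a minimal sketch of the device-registration
      side on top of the kernel's auxiliary bus API (function names and
      error handling here are illustrative assumptions, not this patch's
      exact code):

      #include <linux/auxiliary_bus.h>
      #include <linux/slab.h>

      static void mlx5_sf_dev_release(struct device *dev)
      {
              kfree(container_of(dev, struct auxiliary_device, dev));
      }

      static int mlx5_sf_dev_add(struct device *parent, u32 sf_index)
      {
              struct auxiliary_device *adev;
              int err;

              adev = kzalloc(sizeof(*adev), GFP_KERNEL);
              if (!adev)
                      return -ENOMEM;

              /* auxiliary_device_add() prefixes the registering module's
               * name, so the device appears as "mlx5_core.sf.<id>" */
              adev->name = "sf";
              adev->id = sf_index;
              adev->dev.parent = parent;
              adev->dev.release = mlx5_sf_dev_release;

              err = auxiliary_device_init(adev);
              if (err) {
                      kfree(adev);
                      return err;
              }

              err = auxiliary_device_add(adev);
              if (err)
                      auxiliary_device_uninit(adev); /* release() frees adev */
              return err;
      }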
      
      An mlx5 subfunction auxiliary device example:
      
      $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
      
      $ devlink port show
      pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
      
      $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
      pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      
      $ devlink port show ens2f0npf0sf88
      pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:88:88 state inactive opstate detached
      
      $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active
      
      On activation,
      
      $ ls -l /sys/bus/auxiliary/devices/
      mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
      
      $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
      88
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Introduce vhca state event notifier · f3196bb0
      Authored by Parav Pandit
      vhca state events indicate a change in the state of the vhca that may
      occur due to SF allocation, deallocation, or enabling/disabling of
      the SF HCA.
      
      Introduce a vhca state event handler which will be used by the SF
      devlink port manager and the SF hardware id allocator in subsequent
      patches to act on the event.
      
      This enables a single entity to subscribe to, query, and rearm the
      event for a function.
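
      A minimal sketch of how a subscriber might hook this (the event type
      name and the payload handling are assumptions based on this series):

      #include <linux/mlx5/driver.h>
      #include <linux/notifier.h>

      static int vhca_state_event_cb(struct notifier_block *nb,
                                     unsigned long type, void *data)
      {
              if (type != MLX5_EVENT_TYPE_VHCA_STATE_CHANGE) /* assumed name */
                      return NOTIFY_DONE;

              /* parse the EQE payload for the function id, act on the new
               * vhca state, then re-arm the event for that function */
              return NOTIFY_OK;
      }

      static struct notifier_block vhca_nb = {
              .notifier_call = vhca_state_event_cb,
      };

      static void vhca_events_start(struct mlx5_core_dev *dev)
      {
              mlx5_notifier_register(dev, &vhca_nb);
      }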
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 14 Jan 2021 (1 commit)
  3. 08 Jan 2021 (1 commit)
  4. 18 Dec 2020 (1 commit)
  5. 06 Dec 2020 (2 commits)
  6. 04 Dec 2020 (4 commits)
    • net/mlx5: Register mlx5 devices to auxiliary virtual bus · a925b5e3
      Authored by Leon Romanovsky
      Create auxiliary devices under the new virtual bus. This will replace
      the custom-made mlx5 ->add()/->remove() interfaces; subsequent patches
      will fill in the missing callbacks and remove the old interface logic.
      
      Auxiliary drivers can attach to auxiliary devices only in a 1-to-1
      manner, so a device must be created for every protocol in order for
      the corresponding driver (module) to be able to connect to it.
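
      For context, a hedged sketch of the driver side that binds to such
      devices (the auxiliary bus API is upstream; the mlx5e names and the
      empty probe/remove bodies are assumptions). The listings below show
      the resulting device links:

      #include <linux/auxiliary_bus.h>
      #include <linux/module.h>

      static int mlx5e_probe(struct auxiliary_device *adev,
                             const struct auxiliary_device_id *id)
      {
              /* set up the ethernet instance for adev's parent PCI device */
              return 0;
      }

      static void mlx5e_remove(struct auxiliary_device *adev)
      {
              /* tear down the ethernet instance */
      }

      static const struct auxiliary_device_id mlx5e_id_table[] = {
              { .name = "mlx5_core.eth" }, /* matches mlx5_core.eth.<N> */
              {},
      };
      MODULE_DEVICE_TABLE(auxiliary, mlx5e_id_table);

      static struct auxiliary_driver mlx5e_driver = {
              .name = "eth",
              .probe = mlx5e_probe,
              .remove = mlx5e_remove,
              .id_table = mlx5e_id_table,
      };
      module_auxiliary_driver(mlx5e_driver);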
      
      System with 2 IB and 1 RoCE cards:
      [leonro@vm ~]$ lspci |grep nox
      00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
      00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
      00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
      [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
       mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
       mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
       mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
       mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
       mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
       mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2
      [leonro@vm ~]$ rdma dev
      0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
      1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
      2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457
      
      System with RoCE SR-IOV card with 4 VFs:
      [leonro@vm ~]$ lspci |grep nox
      01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
      01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
       mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
       mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
       mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
       mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
       mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
       mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
       mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
       mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
       mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
       mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
       mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
       mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
       mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
       mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4
      [leonro@vm ~]$ rdma dev
      0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
      1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
      2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
      3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
      4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • vdpa/mlx5: Make hardware definitions visible to all mlx5 devices · 0aae392b
      Authored by Leon Romanovsky
      Move the mlx5_vdpa IFC header file to the general include folder, so
      mlx5_core will be able to reuse it to check whether VDPA is supported
      prior to creating an auxiliary device.
      
      As part of this move, rename the header file to follow the general
      mlx5 naming scheme.
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5_core: Clean driver version and name · 17a7612b
      Authored by Leon Romanovsky
      Remove the exposed driver version, as was done in other drivers, so
      the module version will work correctly by displaying the kernel
      version for which it was compiled.
      
      Also move the mlx5_core module name to a general include, so auxiliary
      drivers will be able to use it as the basis for the names in their
      device ID tables.
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering · d421e466
      Authored by Yevgeny Kliteynik
      The STE format differs between ConnectX-5 and ConnectX-6 DX. Currently,
      on ConnectX-6 DX the SW steering would break at some point when building
      STEs without giving a proper error message. Fix this by checking the STE
      format of the current device when initializing a domain: add mlx5_ifc
      definitions for ConnectX-6 DX SW steering, read the FW capability to get
      the current format version, and check this version when the domain is
      being created.
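
      A hedged sketch of the described check (the capability field and the
      enum values are assumptions):

      /* refuse to create an SW steering domain when the device reports
       * an STE format this driver does not implement */
      static int dr_domain_check_ste_format(struct mlx5_core_dev *mdev)
      {
              u8 fmt = MLX5_CAP_GEN(mdev, steering_format_version);

              if (fmt != MLX5_STEERING_FORMAT_CONNECTX_5)
                      return -EOPNOTSUPP; /* e.g. the ConnectX-6 DX format */
              return 0;
      }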
      
      Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 27 Nov 2020 (11 commits)
  8. 31 Oct 2020 (1 commit)
  9. 27 Oct 2020 (1 commit)
    • RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049
      Authored by Parav Pandit
      When an mlx5 core devlink instance is reloaded in a different net
      namespace, its associated IB device is deleted and recreated.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev reload pci/0000:00:08.0 netns foo
      $ ip netns del foo
      
      The mlx5 IB device needs to attach and detach the netdevice to it
      through the netdev notifier chain during the load and unload sequences.
      Below is a call graph of the unload flow.
      
      cleanup_net()
         down_read(&pernet_ops_rwsem); <- first sem acquired
           ops_pre_exit_list()
             pre_exit()
               devlink_pernet_pre_exit()
                 devlink_reload()
                   mlx5_devlink_reload_down()
                     mlx5_unload_one()
                     [...]
                       mlx5_ib_remove()
                         mlx5_ib_unbind_slave_port()
                           mlx5_remove_netdev_notifier()
                             unregister_netdevice_notifier()
                                down_write(&pernet_ops_rwsem); <- recursive lock
      
      Hence, when the net namespace is deleted, the mlx5 reload results in a
      deadlock.
      
      When the deadlock occurs, the devlink mutex is also held. This not only
      deadlocks the mlx5 device under reload, but also deadlocks all processes
      that attempt to access unrelated devlink devices.
      
      Hence, fix this by having the mlx5 IB driver register a per-net netdev
      notifier instead of a global one; the per-net notifier operates on the
      net namespace without holding the pernet_ops_rwsem.
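
      A minimal sketch of the change in registration style (whether the fix
      uses exactly this per-net variant is an assumption):

      #include <linux/netdevice.h>

      static int mlx5_ib_init_netdev_notifier(struct notifier_block *nb,
                                              struct net *net)
      {
              /* Previously: register_netdevice_notifier(nb); its
               * unregister path takes pernet_ops_rwsem, which recurses
               * under cleanup_net() as shown above. The per-net variant
               * does not touch pernet_ops_rwsem. */
              return register_netdevice_notifier_net(net, nb);
      }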
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  10. 13 Oct 2020 (2 commits)
  11. 10 Oct 2020 (2 commits)
  12. 03 Oct 2020 (2 commits)
    • net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b
      Authored by Saeed Mahameed
      In case the PCI device is offline, reclaim_pages_cmd() will still try
      to call the FW to release FW pages; cmd_exec() in this case will return
      a silent success without actually calling the FW.
      
      This is wrong and will cause page leaks. What we should do is detect
      that the PCI device is offline or that the command interface is
      unavailable before trying to access the FW, and manually release the
      FW pages in the driver.
      
      In this patch we share the code that checks FW command interface
      availability and call it in sensitive places, e.g. reclaim_pages_cmd().
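
      A hedged sketch of the shared check (helper and field names are
      assumptions, not necessarily the patch's exact code):

      #include <linux/pci.h>
      #include <linux/mlx5/driver.h>

      static bool mlx5_cmd_is_down(struct mlx5_core_dev *dev)
      {
              /* PCI channel gone, or command interface not up */
              return pci_channel_offline(dev->pdev) ||
                     dev->cmd.state != MLX5_CMDIF_STATE_UP;
      }

      static int reclaim_pages_guard(struct mlx5_core_dev *dev)
      {
              if (mlx5_cmd_is_down(dev)) {
                      /* skip the FW call; release the FW pages in the
                       * driver instead (see above) */
                      return -ENXIO;
              }
              return 0; /* safe to issue MLX5_CMD_OP_MANAGE_PAGES */
      }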
      
      Alternative fix:
       1. Remove MLX5_CMD_OP_MANAGE_PAGES from the mlx5_internal_err_ret_value
          command success simulation list.
       2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b
      Authored by Eran Ben Elisha
      Upon a command completion timeout, the driver simulates a forced
      command completion. In a rare case where the real interrupt for that
      command arrives simultaneously, it might release the command entry
      while the forced handler is still accessing it.
      
      Fix that by adding an entry refcount to track the current number of
      allowed handlers. The command entry is released only when this
      refcount is decremented to zero.
      
      The command refcount is always initialized to one. For callback
      commands, the command completion handler is the symmetric flow that
      decrements it; for non-callback commands, it is wait_func().
      
      Before ringing the doorbell, increment the refcount on behalf of the
      real completion handler; once the real completion handler runs, it
      decrements it.
      
      For callback commands, once the delayed work is scheduled, increment
      the refcount. In the callback command completion handler, try to
      cancel the timeout callback; on success, decrement the callback
      refcount, as it will never run.
      
      In addition, gather the entry index free and the entry free into one
      flow for the release of all command types.
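
      A hedged sketch of the scheme (struct and helper names are
      assumptions):

      #include <linux/refcount.h>
      #include <linux/mlx5/driver.h>

      static void cmd_free_ent(struct mlx5_cmd_work_ent *ent); /* combined release */

      /* the entry starts at refcount 1: the waiter's (or callback's) ref */
      static void cmd_ent_get(struct mlx5_cmd_work_ent *ent)
      {
              refcount_inc(&ent->refcnt);
      }

      static void cmd_ent_put(struct mlx5_cmd_work_ent *ent)
      {
              if (!refcount_dec_and_test(&ent->refcnt))
                      return;
              /* last reference: free the entry index and the entry
               * itself in one flow, for all command types */
              cmd_free_ent(ent);
      }

      /* before ringing the doorbell, take a ref on behalf of the real
       * completion handler: cmd_ent_get(ent); the handler puts it back */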
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  13. 01 Oct 2020 (1 commit)
  14. 18 Sep 2020 (3 commits)
  15. 16 Sep 2020 (2 commits)
    • net/mlx5e: Add CQE compression support for multi-strides packets · b7cf0806
      Authored by Ofer Levi
      Add CQE compression support for completions of packets that span
      multiple strides in a Striding RQ, per the HW capability.
      In our memory model, we use small strides (256B as of today) for the
      non-linear SKB mode. This feature allows CQE compression to work also
      for multi-stride packets. In this case, decompressing the mini CQE
      array uses the stride index provided by the HW as part of the mini
      CQE. Before this feature, compression was possible only for
      single-stride packets, i.e. for packets of size up to 256 bytes when
      in non-linear mode, and the index was maintained by SW.
      This feature is supported on ConnectX-5 and above.
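
      An illustration of the index selection during decompression (all
      names here are assumptions, not the driver's actual fields):

      #include <linux/types.h>
      #include <asm/byteorder.h>

      struct mini_cqe {
              __be16 stride_idx; /* filled by HW with this feature */
              /* ... remaining mini CQE fields ... */
      };

      /* when expanding a compressed session, use the HW-provided stride
       * index if the capability is on; otherwise fall back to the
       * SW-maintained counter used for single-stride packets */
      static u16 rx_stride_index(const struct mini_cqe *mini,
                                 u16 sw_counter, bool hw_stride_index)
      {
              return hw_stride_index ? be16_to_cpu(mini->stride_idx)
                                     : sw_counter;
      }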
      
      Feature performance test:
      This was whitebox-tested: we reduced the PCI speed from 125Gb/s to
      62.5Gb/s to overload the PCI bus, and manipulated the mlx5 driver to
      drop incoming packets before building the SKB, to achieve low CPU
      utilization.
      The outcome is low CPU utilization with the bottleneck on PCI only.
      Test setup:
      Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz, 32 cores
      NIC: ConnectX-6 DX
      Sender side generates 300 byte packets at full PCI bandwidth.
      Receiver side configuration:
      Single channel, one CPU processing with one ring allocated. CPU
      utilization is ~20% while PCI bandwidth is fully utilized.
      For the generated traffic and interface MTU of 4500B (to activate the
      non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
      to ~21Mpps.
      Without this feature, counters show no CQE compression blocks for
      this setup, while with the feature, counters show ~20.7Mpps of
      compressed CQEs in ~500K compression blocks.
      Signed-off-by: Ofer Levi <oferle@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Always use container_of to find mdev pointer from clock struct · fb609b51
      Authored by Eran Ben Elisha
      The clock struct is part of struct mlx5_core_dev. The code was
      inconsistent: some places used container_of and others used
      clock->mdev.
      
      Align the code to use container_of and remove the clock->mdev pointer.
      While here, fix reverse xmas tree coding style.
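
      The pattern this patch standardizes on, as a short sketch (assuming
      the clock is embedded as the 'clock' member of mlx5_core_dev):

      #include <linux/mlx5/driver.h>

      static struct mlx5_core_dev *clock_to_mdev(struct mlx5_clock *clock)
      {
              return container_of(clock, struct mlx5_core_dev, clock);
      }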
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
  16. 27 Aug 2020 (1 commit)
  17. 29 Jul 2020 (1 commit)
    • net/mlx5e: Modify uplink state on interface up/down · 7d0314b1
      Authored by Ron Diskin
      When setting the PF interface up/down, notify the firmware to update
      the uplink state via MODIFY_VPORT_STATE when the E-Switch is enabled.
      
      This behavior prevents sending traffic out on the uplink port when the
      PF is down, e.g. traffic sent from a VF interface which is still up.
      Currently, when calling mlx5e_open/close(), the driver only sends a
      PAOS command to notify the firmware to set the physical port state to
      up/down; however, this is not sufficient. When a VF is in the "auto"
      state, it follows the uplink state, which was not updated on
      mlx5e_open/close() before this patch.
      
      When switchdev mode is enabled and the uplink representor is first
      enabled, set the uplink port state value back to its FW default,
      "AUTO".
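
      A hedged sketch of the uplink-state update (the vport API shown is
      the upstream one; whether the driver calls exactly this helper, and
      from exactly these points, is an assumption):

      #include <linux/mlx5/driver.h>
      #include <linux/mlx5/vport.h>

      /* called from the PF netdev's open/close paths when the eswitch
       * is active */
      static int set_uplink_state(struct mlx5_core_dev *mdev, bool up)
      {
              return mlx5_modify_vport_admin_state(mdev,
                              MLX5_VPORT_STATE_OP_MOD_UPLINK, 0, 0,
                              up ? MLX5_VPORT_ADMIN_STATE_UP :
                                   MLX5_VPORT_ADMIN_STATE_DOWN);
      }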
      
      Fixes: 63bfd399 ("net/mlx5e: Send PAOS command on interface up/down")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  18. 28 Jul 2020 (1 commit)
    • net/mlx5: Hold pages RB tree per VF · d6945242
      Authored by Eran Ben Elisha
      Per page request event, the FW requests to allocate or release pages
      for a single function. The driver maintains an FW pages object per
      function, so there is no need to hold one global page database.
      Instead, keep a page database per function, which improves the
      performance of the release flow in all cases, especially for
      "release all pages".
      
      As the range of function IDs is large and not sequential, use an
      xarray to store the per-function-ID page database, with the function
      ID as the key.
      
      Upon the first allocation of a page to a function ID, create the
      per-function page database. This database is released only at
      pagealloc mechanism cleanup.
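
      A hedged sketch of the lookup-or-create flow (field and function
      names are assumptions):

      #include <linux/rbtree.h>
      #include <linux/slab.h>
      #include <linux/xarray.h>

      static struct rb_root *page_root_for(struct xarray *page_root_xa,
                                           u32 func_id)
      {
              struct rb_root *root = xa_load(page_root_xa, func_id);

              if (root)
                      return root;

              root = kzalloc(sizeof(*root), GFP_KERNEL);
              if (!root)
                      return NULL;
              *root = RB_ROOT;
              if (xa_insert(page_root_xa, func_id, root, GFP_KERNEL)) {
                      kfree(root); /* raced with another inserter */
                      root = xa_load(page_root_xa, func_id);
              }
              return root;
      }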
      
      NIC: ConnectX-4 Lx
      CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
      Test case: 32 VFs, measure release pages on one VF as part of FLR
      Before: 0.021 Sec
      After:  0.014 Sec
      
      The improvement depends on the number of VFs and their memory
      utilization. The time measurements above were taken on an idle system.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  19. 27 Jul 2020 (1 commit)