1. 27 Nov, 2020 3 commits
  2. 27 Oct, 2020 1 commit
    • RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049
      Committed by Parav Pandit
      When a mlx5 core devlink instance is reloaded in different net namespace,
      its associated IB device is deleted and recreated.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev reload pci/0000:00:08.0 netns foo
      $ ip netns del foo
      
      The mlx5 IB device needs to attach and detach the netdevice to it through the
      netdev notifier chain during the load and unload sequences.  Below is the call
      graph of the unload flow.
      
      cleanup_net()
         down_read(&pernet_ops_rwsem); <- first sem acquired
           ops_pre_exit_list()
             pre_exit()
               devlink_pernet_pre_exit()
                 devlink_reload()
                   mlx5_devlink_reload_down()
                     mlx5_unload_one()
                     [...]
                       mlx5_ib_remove()
                         mlx5_ib_unbind_slave_port()
                           mlx5_remove_netdev_notifier()
                             unregister_netdevice_notifier()
                               down_write(&pernet_ops_rwsem); <- recursive lock
      
      Hence, when the net namespace is deleted, the mlx5 reload results in a deadlock.

      When the deadlock occurs, the devlink mutex is also held. This deadlocks not
      only the mlx5 device under reload, but also every process that attempts to
      access unrelated devlink devices.

      Hence, fix this by having the mlx5 IB driver register a per-net netdev
      notifier instead of the global one. The per-net notifier operates on the net
      namespace without holding the pernet_ops_rwsem.
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
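The recursive acquisition in the call graph above can be illustrated with a small userspace model. This is only a sketch: `toy_rwsem` and its helpers are hypothetical stand-ins, not the kernel's `rw_semaphore` API, and trylock semantics are used so that the refused write request is visible instead of blocking forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore; a model only, not the kernel's rw_semaphore. */
struct toy_rwsem {
    int readers;  /* number of current read holders */
    bool writer;  /* true while a writer holds the lock */
};

/* The read side is granted unless a writer holds the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
    if (sem->writer)
        return false;
    sem->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* The write side needs exclusive access: no readers and no writer. */
static bool toy_down_write_trylock(struct toy_rwsem *sem)
{
    if (sem->writer || sem->readers > 0)
        return false;
    sem->writer = true;
    return true;
}

/*
 * Replays the unload flow in one context: take the read side first (the
 * role of cleanup_net()), then request the write side (the role of the
 * global unregister_netdevice_notifier()). Returns true if the write
 * request could be granted.
 */
static bool toy_nested_write_succeeds(void)
{
    struct toy_rwsem sem = { 0, false };
    bool got_read, got_write;

    got_read = toy_down_read_trylock(&sem);
    assert(got_read);
    got_write = toy_down_write_trylock(&sem); /* refused: we hold the read side */
    toy_up_read(&sem);
    return got_write;
}
```

In the model, `toy_nested_write_succeeds()` returns false because the caller's own read hold blocks the write request. With a blocking `down_write()`, the same situation never makes progress, which is the deadlock the commit describes.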
  3. 10 Oct, 2020 1 commit
  4. 03 Oct, 2020 2 commits
    • net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b
      Committed by Saeed Mahameed
      If PCI is offline, reclaim_pages_cmd() will still try to call the FW to
      release FW pages. In this case cmd_exec() returns a silent success without
      actually calling the FW.

      This is wrong and will cause page leaks. What we should do is detect that
      PCI is offline or that the command interface is unavailable before trying to
      access the FW, and manually release the FW pages in the driver.

      In this patch we share the code that checks for FW command interface
      availability and call it in sensitive places, e.g. reclaim_pages_cmd().

      Alternative fix:
       1. Remove MLX5_CMD_OP_MANAGE_PAGES from mlx5_internal_err_ret_value,
          the command success simulation list.
       2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
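The idea of checking command interface availability before reclaiming pages could look roughly like the sketch below. All names (`toy_dev`, `toy_cmdif_unavailable`, `toy_reclaim_pages`) are hypothetical and the page accounting is reduced to a counter. It is not the driver's code, only an illustration of the decision the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; field names are illustrative only. */
struct toy_dev {
    bool pci_online;        /* false when the PCI channel is offline */
    bool cmdif_up;          /* false when the command interface is unavailable */
    unsigned int fw_pages;  /* pages the driver has handed to the FW */
    unsigned int fw_calls;  /* how many times the FW was really asked */
};

static bool toy_cmdif_unavailable(const struct toy_dev *dev)
{
    return !dev->pci_online || !dev->cmdif_up;
}

/* Reclaims up to npages and returns how many pages were released. */
static unsigned int toy_reclaim_pages(struct toy_dev *dev, unsigned int npages)
{
    unsigned int n;

    assert(dev != 0);
    n = npages < dev->fw_pages ? npages : dev->fw_pages;

    if (toy_cmdif_unavailable(dev)) {
        /* FW unreachable: release the pages in the driver itself
         * instead of trusting a simulated command success. */
        dev->fw_pages -= n;
        return n;
    }

    dev->fw_calls++;        /* normal path: the FW is asked to return pages */
    dev->fw_pages -= n;
    return n;
}
```

The point of the explicit check is that the offline path never depends on the return value of a command that was not actually executed.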
    • net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b
      Committed by Eran Ben Elisha
      Upon command completion timeout, the driver simulates a forced command
      completion. In the rare case where the real interrupt for that command
      arrives simultaneously, it might release the command entry while the forced
      handler is still accessing it.

      Fix that by adding an entry refcount to track the current number of allowed
      handlers. The command entry is released only when this refcount is
      decremented to zero.
      
      Command refcount is always initialized to one. For callback commands,
      command completion handler is the symmetric flow to decrement it. For
      non-callback commands, it is wait_func().
      
      Before ringing the doorbell, increment the refcount for the real completion
      handler. Once the real completion handler is called, it will decrement it.
      
      For callback commands, once the delayed work is scheduled, increment the
      refcount. In the callback command completion handler, we try to cancel
      the timeout callback. If that succeeds, we need to decrement the callback
      refcount, as the timeout callback will never run.

      In addition, combine freeing the entry index and freeing the entry into a
      single release flow for all command types.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
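The refcount rule described above (start at one, take a reference for each additional handler, release on the last put) can be sketched in userspace C11. `toy_cmd_ent` and its helpers are hypothetical names. The kernel driver has its own entry structure and refcount primitives, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical command entry; only the lifetime bookkeeping is modelled. */
struct toy_cmd_ent {
    atomic_int refcnt;
    bool freed;
};

/* The refcount always starts at one, owned by the submitting flow. */
static void toy_ent_init(struct toy_cmd_ent *ent)
{
    atomic_init(&ent->refcnt, 1);
    ent->freed = false;
}

/* Taken for each additional handler that may touch the entry. */
static void toy_ent_get(struct toy_cmd_ent *ent)
{
    atomic_fetch_add(&ent->refcnt, 1);
}

/* Drops one reference; returns true if this call released the entry. */
static bool toy_ent_put(struct toy_cmd_ent *ent)
{
    assert(!ent->freed);
    if (atomic_fetch_sub(&ent->refcnt, 1) != 1)
        return false;       /* another handler still holds a reference */
    ent->freed = true;      /* last reference: safe to release */
    return true;
}
```

A typical sequence in this model is `toy_ent_init()`, then `toy_ent_get()` before ringing the doorbell, then one `toy_ent_put()` from the real completion handler and one from the waiting flow. Whichever put comes last releases the entry, so neither handler can free it while the other is still using it.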
  5. 16 Sep, 2020 1 commit
  6. 28 Jul, 2020 1 commit
    • net/mlx5: Hold pages RB tree per VF · d6945242
      Committed by Eran Ben Elisha
      Per page request event, the FW requests to allocate or release pages for a
      single function. The driver maintains an FW pages object per function, so
      there is no need to hold one global page database. Instead, keep a page
      database per function, which improves the performance of the release flow in
      all cases, especially for "release all pages".
      
      As the range of function IDs is large and not sequential, use xarray to
      store a per function ID page data-base, where the function ID is the key.
      
      Upon first allocation of a page to a function ID, create the page
      data-base per function. This data-base will be released only at pagealloc
      mechanism cleanup.
      
      NIC: ConnectX-4 Lx
      CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
      Test case: 32 VFs, measure release pages on one VF as part of FLR
      Before: 0.021 Sec
      After:  0.014 Sec
      
      The improvement depends on the number of VFs and their memory utilization.
      The time measurements above were taken on an idle system.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  7. 17 Jul, 2020 1 commit
  8. 16 Jul, 2020 2 commits
  9. 10 Jul, 2020 1 commit
    • net/mlx5e: Fix port buffers cell size value · 88b3d5c9
      Committed by Eran Ben Elisha
      The device unit for port buffer size, xoff_threshold and xon_threshold is
      cells. Fix a bug in the driver where the cell unit size was hard-coded to
      128 bytes. This hard-coded value is wrong for some hardware versions.

      The driver now reads the cell size from the SBCAM register and translates
      bytes to cell units accordingly.

      In order to fix the bug, this patch exposes the SBCAM (Shared buffer
      capabilities mask) layout and defines.

      If SBCAM.cap_cell_size is valid, use it for all bytes-to-cells
      calculations. If it is not valid, fall back to 128.

      The cell size does not change on the fly per device. Instead of issuing an
      SBCAM access reg command every time such a translation is needed, cache it in
      mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
      as a param to every function that needs bytes-to-cells translation.
      
      While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
      en_dcbnl.c, as it is only used by that file.
      
      Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Huy Nguyen <huyn@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
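The bytes-to-cells translation with the 128-byte fallback can be sketched as follows. The helper names are hypothetical. Treating a zero capability value as invalid, assuming the capability is already expressed in bytes, and rounding down are assumptions of this sketch, not statements about the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DEFAULT_CELL_SIZE 128u  /* fallback named in the commit message */

/* Picks the cell size once, so it can be cached instead of re-queried. */
static uint32_t toy_port_buff_cell_sz(bool cap_valid, uint32_t cap_cell_size)
{
    /* Assumption: a zero capability value is treated as "not valid". */
    if (cap_valid && cap_cell_size != 0)
        return cap_cell_size;
    return TOY_DEFAULT_CELL_SIZE;
}

/* Translates a size in bytes to whole cells (rounding down, by assumption). */
static uint32_t toy_bytes_to_cells(uint32_t bytes, uint32_t cell_sz)
{
    assert(cell_sz != 0);
    return bytes / cell_sz;
}
```

With the fallback, 4096 bytes map to 32 cells. With a reported cell size of 144 bytes the same buffer maps to 28 cells, which shows why a hard-coded 128 gives wrong thresholds on hardware that uses a different cell size.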
  10. 30 May, 2020 1 commit
  11. 23 May, 2020 3 commits
    • net/mlx5: Avoid processing commands before cmdif is ready · f7936ddd
      Committed by Eran Ben Elisha
      When the driver is reloading during the recovery flow, it cannot accept new
      commands until the command interface is up again. Otherwise we may hit a NULL
      pointer dereference when accessing uninitialized command structures.

      Add a cmdif state to avoid processing commands while the cmdif is not ready.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix a race when moving command interface to events mode · d43b7007
      Committed by Eran Ben Elisha
      After the driver creates (via FW command) an EQ for commands, the driver is
      informed of new command completions by EQE. However, due to a race in the
      update of the driver's internal command mode metadata, some new commands are
      still mishandled by the driver as if it were in polling mode. Such commands
      can get two non-forced completions, leading to access to an already freed
      command entry.

      The CREATE_EQ command, which maps the EQ to the command queue, must be posted
      to the command queue while it is empty, and no other command should be posted.

      Add a SW mechanism so that once the CREATE_EQ command is about to be executed,
      all other commands return an error without being sent to the FW. Allow
      sending other commands only after successfully changing the driver's
      internal command mode metadata.
      We can safely return an error to all other commands while creating the command
      EQ, as any other commands at that point would be sent from the
      user/application during driver load. The application can rerun them later,
      after the driver's load has finished.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
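The gating described above, where only CREATE_EQ may pass while the mode switch is in progress, could be modelled like this. The opcode values, the error code and every `toy_` name are placeholders. Reopening the interface after a failed CREATE_EQ is an assumption of the sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    TOY_OP_ANY       = 0,  /* placeholder meaning "no restriction" */
    TOY_OP_CREATE_EQ = 1,  /* placeholder opcode values */
    TOY_OP_OTHER     = 2,
};
#define TOY_ERR_BLOCKED (-1)

struct toy_cmdif {
    int allowed_opcode;  /* TOY_OP_ANY, or the single opcode let through */
    bool events_mode;    /* false: polling mode, true: events mode */
    int sent_to_fw;      /* commands that actually reached the FW */
};

static int toy_cmd_exec(struct toy_cmdif *c, int opcode)
{
    assert(opcode != TOY_OP_ANY);
    if (c->allowed_opcode != TOY_OP_ANY && c->allowed_opcode != opcode)
        return TOY_ERR_BLOCKED;  /* rejected without touching the FW */
    c->sent_to_fw++;
    return 0;
}

/* Switches to events mode with every other command fenced off meanwhile. */
static int toy_switch_to_events(struct toy_cmdif *c)
{
    int err;

    c->allowed_opcode = TOY_OP_CREATE_EQ;  /* close the gate first */
    err = toy_cmd_exec(c, TOY_OP_CREATE_EQ);
    if (!err)
        c->events_mode = true;             /* update mode metadata */
    c->allowed_opcode = TOY_OP_ANY;        /* reopen only after the update */
    return err;
}
```

A caller whose command is rejected while the gate is closed simply retries later, which matches the commit's argument that failing such commands during driver load is safe.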
    • net/mlx5: Add command entry handling completion · 17d00e83
      Committed by Moshe Shemesh
      When the FW responds to commands very slowly and all command entries in
      use are waiting for completion, there is a race where commands can time
      out before they leave the queue and are handled. A timeout completion on
      an uninitialized command releases the command's buffers before they are
      accessed for initialization, and we then get a NULL pointer exception
      while trying to access them. It may also release the buffers of another
      command, since the timeout completion can happen before an entry index
      is even allocated for this command.
      Add an entry handling completion to avoid this race.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  12. 19 May, 2020 1 commit
  13. 11 May, 2020 1 commit
    • net/mlx5: Replace zero-length array with flexible-array · b6ca09cb
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
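To make the allocation point concrete, here is a minimal userspace sketch of allocating and using a structure with a flexible array member. `struct boo`, `foo_alloc()` and `foo_get()` are illustrative names that are not taken from the patch, and using `stuff` to hold the element count is a choice of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

struct boo {
    int value;               /* hypothetical element type */
};

struct foo {
    int stuff;               /* used here to store the element count */
    struct boo array[];      /* flexible array member, must come last */
};

/* Allocates a struct foo with room for n trailing elements. */
static struct foo *foo_alloc(size_t n)
{
    struct foo *f;
    size_t i;

    if (n > INT_MAX || n > (SIZE_MAX - sizeof(*f)) / sizeof(f->array[0]))
        return NULL;         /* count or size computation would overflow */

    /* sizeof(*f) covers the header only; the array is sized explicitly. */
    f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
    if (!f)
        return NULL;

    f->stuff = (int)n;
    for (i = 0; i < n; i++)
        f->array[i].value = 0;
    return f;
}

/* Bounds-checked read of one element. */
static int foo_get(const struct foo *f, size_t i)
{
    assert(i < (size_t)f->stuff);
    return f->array[i].value;
}
```

The allocation size is written as the header size plus `n` elements, which is the same expression that works with a zero-length array. That is why the conversion leaves dynamic allocations unaffected.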
  14. 02 May, 2020 1 commit
  15. 29 Apr, 2020 2 commits
  16. 19 Apr, 2020 2 commits
  17. 27 Mar, 2020 1 commit
  18. 13 Mar, 2020 2 commits
  19. 05 Mar, 2020 1 commit
    • net/mlx5: Expose raw packet pacing APIs · 1326034b
      Committed by Yishai Hadas
      Expose raw packet pacing APIs to be used by DEVX based applications.
      The existing code was refactored to have a single flow with the new raw
      APIs.
      
      The new raw APIs consider the 'pp_rate_limit_context', uid and
      'dedicated' inputs when looking for an existing entry.

      This raw mode enables future device specification data in the raw
      context without changing the existing logic and code.

      The ability to ask for a dedicated entry gives the application control
      over allocating entries according to its needs.

      A dedicated entry may not be used by any other process. It also lets a
      process spread its resources across different entries, so that different
      hardware resources are used as part of enforcing the rate.

      The counter per entry was changed to u64 to rule out any possibility of
      overflow.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Acked-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
  20. 19 Feb, 2020 1 commit
    • net/mlx5: Add support for resource dump · 12206b17
      Committed by Aya Levin
      On driver load:
      - Initialize resource dump data structure and memory access tools (mkey
        & pd).
      - Read the resource dump's menu which contains the FW segment
        identifier. Each record is identified by the segment name (ASCII).
      
      During the driver's course of life, users (like reporters) may request
      dumps per segment. The user should create a command providing the
      segment identifier (SW enumeration) and command keys. In return, the
      user receives a command context. In order to receive the dump, the user
      should supply the command context and a memory (aligned to a PAGE) on
      which the dump content will be written. Since the dump may be larger
      than the given memory, the user may resubmit the command until it
      receives an end-of-dump indication. It is the user's responsibility to
      destroy the command.
      Signed-off-by: Aya Levin <ayal@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
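The resubmit-until-end-of-dump usage pattern can be sketched in plain C. `toy_dump_cmd`, `toy_dump_next()` and `toy_dump_all()` are hypothetical stand-ins for the command context and the dump call. The real interface, its names and its error handling are not shown in the commit message and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command context returned when the user creates a command. */
struct toy_dump_cmd {
    const unsigned char *segment;  /* content of the requested segment */
    size_t len;                    /* total size of the segment */
    size_t off;                    /* how much has been delivered so far */
};

/*
 * Writes the next chunk into page (at most page_sz bytes), stores the
 * chunk size in *written, and returns true while more data remains.
 */
static bool toy_dump_next(struct toy_dump_cmd *cmd, unsigned char *page,
                          size_t page_sz, size_t *written)
{
    size_t left = cmd->len - cmd->off;
    size_t n = left < page_sz ? left : page_sz;

    assert(page_sz > 0);
    memcpy(page, cmd->segment + cmd->off, n);
    cmd->off += n;
    *written = n;
    return cmd->off < cmd->len;
}

/*
 * Resubmits the command until end-of-dump, appending each chunk to out
 * (which must hold cmd->len bytes). Returns the number of submissions.
 */
static size_t toy_dump_all(struct toy_dump_cmd *cmd, unsigned char *page,
                           size_t page_sz, unsigned char *out)
{
    size_t calls = 0, total = 0, n;
    bool more;

    do {
        more = toy_dump_next(cmd, page, page_sz, &n);
        memcpy(out + total, page, n);
        total += n;
        calls++;
    } while (more);
    return calls;
}
```

In this model a 10-byte segment read through a 4-byte buffer needs three submissions (4, 4 and 2 bytes). The caller stays responsible for the command context afterwards, mirroring the rule that the user must destroy the command.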
  21. 26 Jan, 2020 1 commit
    • IB/mlx5: Return the administrative GUID if exists · 4bbd4923
      Committed by Danit Goldberg
      A user can change the operational GUID (a.k.a. effective GUID) through
      link/infiniband. Therefore it is preferred to return the currently set
      GUID, if it exists, instead of the operational one.

      This way the PF can query which VF GUID will be set in the next bind.  In
      order to align with the MAC address, zero is returned if the administrative
      GUID is not set.
      
      For example, before setting administrative GUID:
       $ ip link show
       ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
       link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
       vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
       spoof checking off, NODE_GUID 00:00:00:00:00:00:00:00, PORT_GUID 00:00:00:00:00:00:00:00, link-state auto, trust off, query_rss off
      
      Then:
      
       $ ip link set ib0 vf 0 node_guid 11:00:af:21:cb:05:11:00
       $ ip link set ib0 vf 0 port_guid 22:11:af:21:cb:05:11:00
      
      After setting administrative GUID:
       $ ip link show
       ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 4092 qdisc mq state UP mode DEFAULT group default qlen 256
       link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
       vf 0     link/infiniband 00:00:00:08:fe:80:00:00:00:00:00:00:52:54:00:c0:fe:12:34:55 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff,
       spoof checking off, NODE_GUID 11:00:af:21:cb:05:11:00, PORT_GUID 22:11:af:21:cb:05:11:00, link-state auto, trust off, query_rss off
      
      Fixes: 9c0015ef ("IB/mlx5: Implement callbacks for getting VFs GUID attributes")
      Link: https://lore.kernel.org/r/20200116120048.12744-1-leon@kernel.org
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  22. 17 Jan, 2020 3 commits
  23. 08 Jan, 2020 1 commit
  24. 12 Nov, 2019 1 commit
  25. 02 Nov, 2019 1 commit
  26. 29 Oct, 2019 1 commit
  27. 02 Sep, 2019 1 commit
  28. 22 Aug, 2019 1 commit
    • net/mlx5: Add HV VHCA infrastructure · 87175120
      Committed by Eran Ben Elisha
      HV VHCA is a layer which provides PF to VF communication channel based on
      HyperV PCI config channel. It implements Mellanox's Inter VHCA control
      communication protocol. The protocol contains control block in order to
      pass messages between the PF and VF drivers, and data blocks in order to
      pass actual data.
      
      The infrastructure is agent based. Each agent is responsible for
      contiguous buffer blocks in the VHCA config space. This infrastructure
      binds agents to their blocks, and those agents can only read/write
      the buffer blocks assigned to them. Each agent provides three
      callbacks (control, invalidate, cleanup). Control will be invoked when
      block-0 is invalidated with a command that concerns this agent. Invalidate
      callback will be invoked if one of the blocks assigned to this agent was
      invalidated. Cleanup will be invoked before the agent is being freed in
      order to clean all of its open resources or deferred works.
      
      Block-0 serves as the control block. All execution commands from the PF
      will be written by the PF over this block. VF will ack on those by
      writing on block-0 as well. Its format is described by struct
      mlx5_hv_vhca_control_block layout.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
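The agent model with its three callbacks can be pictured with the following userspace sketch. The structure layout, the callback signature and the way a block-0 command names its target agent (`cmd_target`) are assumptions made for illustration. The actual layout is the one the commit refers to as struct mlx5_hv_vhca_control_block, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>

struct toy_agent;
typedef void (*toy_agent_cb)(struct toy_agent *agent, int block);

/* Hypothetical agent: owns a contiguous range of data blocks. */
struct toy_agent {
    int first_block;          /* first block owned by this agent */
    int num_blocks;           /* number of contiguous blocks owned */
    toy_agent_cb control;     /* block-0 command that concerns this agent */
    toy_agent_cb invalidate;  /* one of the agent's blocks was invalidated */
    toy_agent_cb cleanup;     /* called before the agent is freed */
    int control_hits, invalidate_hits, cleanup_hits;
};

static void toy_on_control(struct toy_agent *a, int block)
{
    (void)block;
    a->control_hits++;
}

static void toy_on_invalidate(struct toy_agent *a, int block)
{
    (void)block;
    a->invalidate_hits++;
}

static void toy_on_cleanup(struct toy_agent *a, int block)
{
    (void)block;
    a->cleanup_hits++;
}

/* Finds the agent whose block range contains the given block. */
static struct toy_agent *toy_owner(struct toy_agent *agents, size_t n, int block)
{
    size_t i;

    assert(agents != NULL);
    for (i = 0; i < n; i++)
        if (block >= agents[i].first_block &&
            block < agents[i].first_block + agents[i].num_blocks)
            return &agents[i];
    return NULL;
}

/*
 * Dispatches a block invalidation. Block 0 is the control block: the
 * command is routed to the agent it names (cmd_target, an assumption of
 * this sketch). Any other block goes to the agent that owns it.
 */
static int toy_block_invalidated(struct toy_agent *agents, size_t n,
                                 int block, size_t cmd_target)
{
    struct toy_agent *a;

    if (block == 0) {
        if (cmd_target >= n)
            return -1;
        a = &agents[cmd_target];
        a->control(a, block);
        return 0;
    }
    a = toy_owner(agents, n, block);
    if (!a)
        return -1;            /* no agent is bound to this block */
    a->invalidate(a, block);
    return 0;
}
```

Because each agent is looked up only through its own block range, an agent in this model never sees invalidations for blocks assigned to another agent, which is the isolation property the commit message describes.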
  29. 11 Aug, 2019 1 commit