1. 15 2月, 2018 4 次提交
    • Y
      IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Yonatan Cohen 提交于
      The current implementation of create CQ requires contiguous
      memory, such requirement is problematic once the memory is
      fragmented or the system is low in memory, it causes for
      failures in dma_zalloc_coherent().
      
      This patch implements new scheme of fragmented CQ to overcome
      this issue by introducing new type: 'struct mlx5_frag_buf_ctrl'
      to allocate fragmented buffers, rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
      
      It fixes following crashes:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
      Signed-off-by: NYonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      388ca8be
    • S
      net/mlx5: Remove redundant EQ API exports · 3ec5693b
      Saeed Mahameed 提交于
      EQ structure and API is private to mlx5_core driver only, external
      drivers should not have access or the means to manipulate EQ objects.
      
      Remove redundant exports and move API functions out of the linux/mlx5
      include directory into the driver's mlx5_core.h private include file.
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: NGal Pressman <galp@mellanox.com>
      3ec5693b
    • S
      net/mlx5: Move CQ completion and event forwarding logic to eq.c · 3ac7afdb
      Saeed Mahameed 提交于
      Since CQ tree is now per EQ, CQ completion and event forwarding became
      specific implementation of EQ logic, this patch moves that logic to eq.c
      and makes those functions static.
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: NGal Pressman <galp@mellanox.com>
      3ac7afdb
    • S
      net/mlx5: CQ Database per EQ · 02d92f79
      Saeed Mahameed 提交于
      Before this patch the driver had one CQ database protected via one
      spinlock, this spinlock is meant to synchronize between CQ
      adding/removing and CQ IRQ interrupt handling.
      
      On a system with large number of CPUs and on a work load that requires
      lots of interrupts, this global spinlock becomes a very nasty hotspot
      and introduces a contention between the active cores, which will
      significantly hurt performance and becomes a bottleneck that prevents
      seamless cpu scaling.
      
      To solve this we simply move the CQ database and its spinlock to be per
      EQ (IRQ), thus per core.
      
      Tested with:
      system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
      netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M
      
      WITHOUT THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
      Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
      Overhead  Command          Shared Object                 Symbol
      +   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
      +   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit
      
      WITH THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
      Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
      Overhead  Command          Shared Object             Symbol
      +   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
      +    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit
      Tested-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: NGal Pressman <galp@mellanox.com>
      02d92f79
  2. 05 2月, 2018 1 次提交
  3. 19 1月, 2018 1 次提交
  4. 12 1月, 2018 1 次提交
    • S
      net/mlx5: Fix get vector affinity helper function · 05e0cc84
      Saeed Mahameed 提交于
      mlx5_get_vector_affinity used to call pci_irq_get_affinity and after
      reverting the patch that sets the device affinity via PCI_IRQ_AFFINITY
      API, calling pci_irq_get_affinity becomes useless and it breaks RDMA
      mlx5 users.  To fix this, this patch provides an alternative way to
      retrieve IRQ vector affinity using legacy IRQ API, following
      smp_affinity read procfs implementation.
      
      Fixes: 231243c8 ("Revert mlx5: move affinity hints assignments to generic code")
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      05e0cc84
  5. 09 1月, 2018 5 次提交
  6. 29 12月, 2017 1 次提交
    • Y
      IB/mlx5: Extend UAR stuff to support dynamic allocation · 31a78a5a
      Yishai Hadas 提交于
      This patch extends the alloc context flow to be prepared for working
      with dynamic UAR allocations.
      
      Currently upon alloc context there is some fix size of UARs that are
      allocated (named 'static allocation') and there is no option to user
      application to ask for more or control which UAR will be used by which
      QP.
      
      In this patch the driver prepares its data structures to manage both the
      static and the dynamic allocations and let the user driver knows about
      the max value of dynamic blue-flame registers that are allowed.
      
      Downstream patches from this series will enable the dynamic allocation
      and the association as part of QP creation.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      31a78a5a
  7. 22 12月, 2017 1 次提交
  8. 20 12月, 2017 2 次提交
    • M
      net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Moshe Shemesh 提交于
      When mlx5_stop_eqs fails to destroy any of the eqs it returns with an error.
      In such failure flow the function will return without
      releasing all EQs irqs and then pci_free_irq_vectors will fail.
      Fix by only warn on destroy EQ failure and continue to release other
      EQs and their irqs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      d6b2785c
    • S
      Revert "mlx5: move affinity hints assignments to generic code" · 231243c8
      Saeed Mahameed 提交于
      Before the offending commit, mlx5 core did the IRQ affinity itself,
      and it seems that the new generic code have some drawbacks and one
      of them is the lack for user ability to modify irq affinity after
      the initial affinity values got assigned.
      
      The issue is still being discussed and a solution in the new generic code
      is required, until then we need to revert this patch.
      
      This fixes the following issue:
      echo <new affinity> > /proc/irq/<x>/smp_affinity
      fails with  -EIO
      
      This reverts commit a435393a.
      Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
      it is used in mlx5_ib driver.
      
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jes Sorensen <jsorensen@fb.com>
      Reported-by: NJes Sorensen <jsorensen@fb.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      231243c8
  9. 05 11月, 2017 2 次提交
  10. 15 10月, 2017 1 次提交
  11. 28 9月, 2017 1 次提交
  12. 31 8月, 2017 2 次提交
  13. 29 8月, 2017 1 次提交
  14. 25 8月, 2017 1 次提交
  15. 24 8月, 2017 1 次提交
  16. 23 8月, 2017 1 次提交
  17. 09 8月, 2017 2 次提交
  18. 07 8月, 2017 2 次提交
    • E
      net/mlx5: Delay events till ib registration ends · 97834eba
      Erez Shitrit 提交于
      When mlx5_ib registers itself to mlx5_core as an interface, it will
      call mlx5_add_device which will call mlx5_ib interface add callback,
      in case the latter successfully returns, only then mlx5_core will add
      it to the interface list and async events will be forwarded to mlx5_ib.
      Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib
      interface to its devices list, arriving mlx5_core events can be missed
      by the new mlx5_ib registering interface.
      
      In other words:
      thread 1: mlx5_ib: mlx5_register_interface(dev)
      thread 1: mlx5_core: mlx5_add_device(dev)
      thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
      thread 2: mlx5_core_event: **new event arrives, forward to dev_list
      thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
      /* previous event was missed by the new interface.*/
      It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
      but not after.
      
      We fix this race by accumulating the events that come between the
      ib_register_device (inside mlx5_add_device->(dev->add)) till the adding
      to the list completes and fire them to the new registering interface
      after that.
      
      Fixes: f1ee87fe ("net/mlx5: Organize device list API in one place")
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      97834eba
    • S
      net/mlx5: Separate between E-Switch and MPFS · eeb66cdb
      Saeed Mahameed 提交于
      Multi-Physical Function Switch (MPFs) is required for when multi-PF
      configuration is enabled to allow passing user configured unicast MAC
      addresses to the requesting PF.
      
      Before this patch eswitch.c used to manage the HW MPFS l2 table,
      E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's
      contexts update on unicast mac address list changes, to populate the PF's
      MPFS L2 table accordingly.
      
      In downstream patch we would like to allow compiling the driver without
      E-Switch functionalities, for that we move MPFS l2 table logic out
      of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to
      allow compiling out MPFS for those who don't want Multi-PF support.
      
      NIC PF netdevice will now directly update MPFS l2 table via the new MPFS
      API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain
      responsible of updating its MPFS l2 table on behalf of its VFs.
      
      Due to this change we also don't require enabling vport(0) (PF vport)
      unicast mac changes events anymore, for when SRIOV is not enabled.
      Which means E-Switch is now activated only on SRIOV activation, and not
      required otherwise.
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Cc: Jes Sorensen <jsorensen@fb.com>
      Cc: kernel-team@fb.com
      eeb66cdb
  19. 24 7月, 2017 2 次提交
  20. 27 6月, 2017 4 次提交
    • I
      net/mlx5: FPGA, Add SBU infrastructure · a9956d35
      Ilan Tayari 提交于
      Add interface to initialize and interact with Innova FPGA SBU
      connections.
      A client driver may use these functions to set up a high-speed DMA
      connection with its SBU hardware logic, and send/receive messages
      over this connection.
      
      A later patch in this patchset will make use of these functions for
      Innova IPSec offload in mlx5 Ethernet driver.
      
      Add commands to retrieve Innova FPGA SBU capabilities, and to
      read/write Innova FPGA configuration space registers and memory,
      over internal I2C.
      
      At high level, the FPGA configuration space is divided such:
       0x00000000 - 0x007fffff is reserved for the SBU
       0x00800000 - 0xffffffff is reserved for the Shell
      0x400000000 - ...        is DDR memory
      
      A later patchset will add support for accessing FPGA CrSpace and memory
      over a high-speed connection. This is the reason for the ACCESS_TYPE
      enumeration, which currently only supports I2C.
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      a9956d35
    • I
      net/mlx5: Add support for multiple RoCE enable · a6f7d2af
      Ilan Tayari 提交于
      Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
      well.
      Add support for counting number of enables, so that FPGA and IB can work
      in parallel and independently.
      Program the HW to enable RoCE on the first enable call, and program to
      disable RoCE on the last disable call.
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      a6f7d2af
    • I
      net/mlx5: Add reserved-gids support · 52ec462e
      Ilan Tayari 提交于
      Reserved GIDs are entries in the GID table in use by the mlx5_core
      and its submodules (e.g. FPGA, SRIOV, E-Swtich, netdev).
      The entries are reserved at the high indexes of the GID table.
      
      A mlx5 submodule may reserve a certain amount of GIDs for its own use
      during the load sequence by calling mlx5_core_reserve_gids, and must
      also take care to un-reserve these GIDs when it closes.
      Reservation is only allowed during the load sequence and before any
      interfaces (e.g. mlx5_ib or mlx5_en) are up.
      
      After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
      free to allocate entries from the reserved GIDs pool.
      
      Reserve a GID table entry for every supported FPGA QP.
      
      A later patch in the patchset will remove them from being reported to
      IB core.
      Another such patch will make use of these for FPGA QPs in Innova NIC.
      
      Added lib/mlx5.h to serve as a library for mlx5 submodlues, and to
      expose only public mlx5 API, more mlx5 library files will be added in
      future submissions.
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      52ec462e
    • M
      net/mlx5: Cancel delayed recovery work when unloading the driver · 2a0165a0
      Mohamad Haj Yahia 提交于
      Draining the health workqueue will ignore future health works including
      the one that report hardware failure and thus we can't enter error state
      Instead cancel the recovery flow and make sure only recovery flow won't
      be scheduled.
      
      Fixes: 5e44fca5 ('net/mlx5: Only cancel recovery work when cleaning up device')
      Signed-off-by: NMohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      2a0165a0
  21. 22 6月, 2017 1 次提交
  22. 16 6月, 2017 1 次提交
  23. 23 5月, 2017 1 次提交
    • M
      net/mlx5: Avoid using pending command interface slots · 73dd3a48
      Mohamad Haj Yahia 提交于
      Currently when firmware command gets stuck or it takes long time to
      complete, the driver command will get timeout and the command slot is
      freed and can be used for new commands, and if the firmware receive new
      command on the old busy slot its behavior is unexpected and this could
      be harmful.
      To fix this when the driver command gets timeout we return failure,
      but we don't free the command slot and we wait for the firmware to
      explicitly respond to that command.
      Once all the entries are busy we will stop processing new firmware
      commands.
      
      Fixes: 9cba4ebc ('net/mlx5: Fix potential deadlock in command mode change')
      Signed-off-by: NMohamad Haj Yahia <mohamad@mellanox.com>
      Cc: kernel-team@fb.com
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      73dd3a48
  24. 14 5月, 2017 1 次提交
    • I
      net/mlx5: FPGA, Add basic support for Innova · e29341fb
      Ilan Tayari 提交于
      Mellanox Innova is a NIC with ConnectX and an FPGA on the same
      board. The FPGA is a bump-on-the-wire and thus affects operation of
      the mlx5_core driver on the ConnectX ASIC.
      
      Add basic support for Innova in mlx5_core.
      
      This allows using the Innova card as a regular NIC, by detecting
      the FPGA capability bit, and verifying its load state before
      initializing ConnectX interfaces.
      
      Also detect FPGA fatal runtime failures and enter error state if
      they ever happen.
      
      All new FPGA-related logic is placed in its own subdirectory 'fpga',
      which may be built by selecting CONFIG_MLX5_FPGA.
      This prepares for further support of various Innova features in later
      patchsets.
      Additional details about hardware architecture will be provided as
      more features get submitted.
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Reviewed-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      e29341fb