1. 31 May 2015 · 1 commit
    • net/mlx4: Add EQ pool · c66fa19c
      Committed by Matan Barak
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as event-sensitive applications
      were limited to using only the legacy EQs.
      
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when the number of EQs
      is limited, multiple ports may be assigned to the same EQ).
      
      An exception to this rule is the ASYNC EQ which handles various events.
      
      Legacy EQs are completely removed as all EQs could be shared.
      
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      an EQ serving a specific port. The core driver calculates which
      EQ should be assigned to that request.
      
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c66fa19c
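      A minimal sketch of the EQ-pool assignment described above, using made-up
      names and structures rather than the actual mlx4_core code: EQs are assigned
      to ports, and a consumer asking for an EQ serving port N gets one of that
      port's EQs, rotating among them so consumers are spread out.

        /* Illustrative only; not the mlx4_core implementation. */
        #include <linux/bitops.h>
        #include <linux/types.h>

        #define MAX_EQS   32
        #define MAX_PORTS 2

        struct eq_pool {
            int num_eqs;
            unsigned long eq_ports[MAX_EQS];  /* bitmap of ports served by EQ i */
            int next_eq[MAX_PORTS + 1];       /* per-port round-robin cursor */
        };

        /* Pick an EQ that serves @port, rotating among the candidates. */
        static int eq_pool_get(struct eq_pool *pool, int port)
        {
            int i, eq;

            for (i = 0; i < pool->num_eqs; i++) {
                eq = (pool->next_eq[port] + i) % pool->num_eqs;
                if (test_bit(port, &pool->eq_ports[eq])) {
                    pool->next_eq[port] = (eq + 1) % pool->num_eqs;
                    return eq;
                }
            }
            return -1;  /* no EQ is assigned to this port */
        }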
  2. 25 May 2015 · 1 commit
  3. 16 Apr 2015 · 1 commit
  4. 10 Apr 2015 · 1 commit
    • mlx4/mlx5: Use dma_wmb/rmb where appropriate · 12b3375f
      Committed by Alexander Duyck
      This patch should help to improve the performance of mlx4 and mlx5 on a
      number of architectures.  For example, on x86 dma_wmb/rmb equates to
      a barrier() call, as the architecture is already strongly ordered, and on
      PowerPC the call works out to an lwsync, which is significantly less expensive
      than the sync call that was being used for wmb.
      
      I placed the new barriers in any spots that seemed to be trying to
      order memory/memory reads or writes; in any spots that involved
      MMIO I left the existing wmb in place, as the new barriers cannot order
      transactions between coherent and non-coherent memories.
      
      v2: Reduced the replacements to just the spots where I could clearly
          identify the usage pattern.
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Ido Shamay <idos@mellanox.com>
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12b3375f
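      A sketch of the barrier substitution described above (illustrative structure
      and field names, not the mlx4/mlx5 source): dma_wmb() orders writes to
      coherent DMA memory (descriptor body versus its valid bit), while the MMIO
      doorbell write keeps the stronger wmb().

        /* Illustrative sketch; struct and names are made up. */
        #include <asm/barrier.h>
        #include <linux/io.h>
        #include <linux/kernel.h>
        #include <linux/types.h>

        struct tx_desc {
            __le64 addr;
            __le32 len;
            __le32 flags;               /* contains the hardware "valid" bit */
        };

        #define DESC_VALID 0x80000000U

        static void post_tx_desc(struct tx_desc *desc, void __iomem *doorbell,
                                 u64 dma_addr, u32 len)
        {
            desc->addr = cpu_to_le64(dma_addr);
            desc->len  = cpu_to_le32(len);

            /* Coherent memory/memory ordering: body visible before the valid bit. */
            dma_wmb();
            desc->flags = cpu_to_le32(DESC_VALID);

            /* MMIO is involved here, so the full barrier stays, as in the commit. */
            wmb();
            writel(1, doorbell);
        }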
  5. 25 Mar 2015 · 1 commit
    • net/mlx4_core: Fix GEN_EQE accessing uninitialized mutex · bffb023a
      Committed by Jack Morgenstein
      We occasionally see in procedure mlx4_GEN_EQE that the driver tries
      to grab an uninitialized mutex.
      
      This can occur in only one of two ways:
      1. We are trying to generate an async event on an uninitialized slave.
      2. We are trying to generate an async event on an illegal slave number
         ( < 0 or > persist->num_vfs) or an inactive slave.
      
      To deal with #1: move the mutex initialization from the slave-specific init
      sequence in procedure mlx_master_do_cmd to mlx4_multi_func_init() (so that
      the mutex is always initialized for all slaves).
      
      To deal with #2: check in procedure mlx4_GEN_EQE that the slave number
      provided is in the proper range and that the slave is active.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bffb023a
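      A simplified sketch of the two fixes described above (illustrative names,
      not the driver source): initialize the per-slave mutex for every possible
      slave up front, and validate the slave number and state before taking it.

        #include <linux/errno.h>
        #include <linux/mutex.h>
        #include <linux/types.h>

        struct slave_state {
            bool active;
            struct mutex event_eq_mutex;
        };

        struct core_dev {
            int num_vfs;
            struct slave_state slave[64];   /* 0 == PF, 1..num_vfs == VFs */
        };

        static void multi_func_init(struct core_dev *dev)
        {
            int i;

            /* Fix #1: init for all slaves, not only those that complete init. */
            for (i = 0; i <= dev->num_vfs; i++)
                mutex_init(&dev->slave[i].event_eq_mutex);
        }

        static int gen_eqe(struct core_dev *dev, int slave)
        {
            /* Fix #2: reject out-of-range or inactive slaves. */
            if (slave < 0 || slave > dev->num_vfs || !dev->slave[slave].active)
                return -EINVAL;

            mutex_lock(&dev->slave[slave].event_eq_mutex);
            /* ... build and queue the async event EQE for this slave ... */
            mutex_unlock(&dev->slave[slave].event_eq_mutex);
            return 0;
        }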
  6. 28 Jan 2015 · 2 commits
  7. 26 Jan 2015 · 2 commits
    • net/mlx4_core: Enable device recovery flow with SRIOV · 55ad3592
      Committed by Yishai Hadas
      In SRIOV, both the PF and the VF may attempt device recovery whenever they
      assume that the device is not functioning.  When the PF driver resets the
      device, the VF should detect this and attempt to reinitialize itself.
      
      The VF must be able to reset itself under all circumstances, even
      if the PF is not responsive.
      
      The VF shall reset itself in the following cases:
      
      1. Commands are not processed within a reasonable time over the communication channel.
      This is done considering the device state and the correct return code based on
      the command, as was done in native mode; it is implemented in the next patch.
      
      2. The VF driver receives an internal error event reported by the PF on the
      communication channel. This occurs when the PF driver resets the device or
      when VF is out of sync with the PF.
      
      Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
      PF is not responsive.
      
      As the PF and VF may run their reset flows simultaneously, several cases
      are handled:
      - Prevent freeing VF resources upon FLR when the PF is in its unloading stage.
      - Prevent the PF from getting VF commands before it has finished initializing its resources.
      - Upon VF startup, check that the comm channel is online before sending
        commands to the PF, to avoid timing out.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55ad3592
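      A rough sketch of the VF-side decisions described above, with hypothetical
      helpers (vf_comm_send, vf_reset_and_reinit) rather than the driver's real
      functions: the VF resets itself on a comm-channel timeout or an internal
      error reported by the PF, and checks the channel is up before sending.

        #include <linux/errno.h>
        #include <linux/types.h>

        enum vf_event { VF_EVENT_NONE, VF_EVENT_INTERNAL_ERROR };

        struct vf_dev {
            bool comm_channel_online;   /* assumed state flag */
        };

        /* Hypothetical low-level send; returns -ETIMEDOUT if the PF never answers. */
        int vf_comm_send(struct vf_dev *vf, u32 cmd);
        void vf_reset_and_reinit(struct vf_dev *vf);

        static int vf_do_cmd(struct vf_dev *vf, u32 cmd)
        {
            int err;

            if (!vf->comm_channel_online)
                return -EIO;             /* don't send and time out needlessly */

            err = vf_comm_send(vf, cmd);
            if (err == -ETIMEDOUT)
                vf_reset_and_reinit(vf); /* case 1: command not processed in time */
            return err;
        }

        static void vf_handle_event(struct vf_dev *vf, enum vf_event ev)
        {
            if (ev == VF_EVENT_INTERNAL_ERROR)
                vf_reset_and_reinit(vf); /* case 2: PF reported an internal error */
        }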
    • net/mlx4_core: Maintain a persistent memory for mlx4 device · 872bf2fb
      Committed by Yishai Hadas
      Maintain a persistent memory area that should survive the reset flow/PCI error.
      This comes as preparation for the coming series that supports the above flows.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      872bf2fb
  8. 12 Dec 2014 · 1 commit
    • net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Committed by Matan Barak
      Previously, we've fired all our completion callbacks straight from our ISR.
      
      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in an ISR is generally considered wrong;
      it could even lead to a hard lockup of the system. When the hardware
      generates a lot of completion events, the loop over those events could
      run so long that the system watchdog detects a hard lockup.
      
      In order to avoid that, add a new way of invoking completion event
      callbacks. In the interrupt itself, we add the CQs that received a completion
      event to a per-EQ list and schedule a tasklet. In the tasklet context
      we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dca0f42
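      A minimal sketch of the ISR/tasklet split described above (illustrative
      structures and names, not the mlx4 implementation): the interrupt handler
      only queues the CQ on a per-EQ list and schedules the tasklet, and the
      heavier user completion callbacks run later in tasklet context.

        #include <linux/interrupt.h>
        #include <linux/list.h>
        #include <linux/spinlock.h>

        struct demo_cq {
            struct list_head tasklet_entry;      /* INIT_LIST_HEAD() at CQ creation */
            void (*comp)(struct demo_cq *cq);    /* user completion callback */
        };

        struct demo_eq {
            spinlock_t lock;
            struct list_head cq_list;
            /* set up with tasklet_init(&eq->tasklet, eq_tasklet_fn, (unsigned long)eq) */
            struct tasklet_struct tasklet;
        };

        /* Called from the EQ interrupt handler for each completion event. */
        static void eq_queue_cq_completion(struct demo_eq *eq, struct demo_cq *cq)
        {
            unsigned long flags;

            spin_lock_irqsave(&eq->lock, flags);
            if (list_empty(&cq->tasklet_entry))
                list_add_tail(&cq->tasklet_entry, &eq->cq_list);
            spin_unlock_irqrestore(&eq->lock, flags);

            tasklet_schedule(&eq->tasklet);
        }

        /* Tasklet body: invoke the user callbacks outside hard-IRQ context. */
        static void eq_tasklet_fn(unsigned long data)
        {
            struct demo_eq *eq = (struct demo_eq *)data;
            struct demo_cq *cq, *tmp;
            unsigned long flags;
            LIST_HEAD(local);

            spin_lock_irqsave(&eq->lock, flags);
            list_splice_init(&eq->cq_list, &local);
            spin_unlock_irqrestore(&eq->lock, flags);

            list_for_each_entry_safe(cq, tmp, &local, tasklet_entry) {
                list_del_init(&cq->tasklet_entry);
                cq->comp(cq);
            }
        }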
  9. 14 Nov 2014 · 1 commit
  10. 27 Oct 2014 · 1 commit
    • net/mlx4_core: Call synchronize_irq() before freeing EQ buffer · bf1bac5b
      Committed by Eli Cohen
      After moving the EQ ownership to software, effectively destroying it, call
      synchronize_irq() to ensure that any handler routines running on other CPU
      cores finish execution. Only then free the EQ buffer.
      The same thing is done when we destroy a CQ, which is one of the sources
      generating interrupts. In the case of a CQ we want to avoid completion handlers
      running on a CQ that was destroyed. In the case of an EQ we do the same to avoid
      receiving asynchronous events after the EQ has been destroyed and its buffers freed.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf1bac5b
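      A sketch of the teardown ordering described above, with a hypothetical
      hw_destroy_eq() helper standing in for the FW/HW teardown step: destroy
      the EQ, wait for in-flight handlers on other CPUs, and only then free
      the buffer.

        #include <linux/interrupt.h>
        #include <linux/slab.h>

        struct demo_eq {
            unsigned int irq;
            void *buf;
        };

        void hw_destroy_eq(struct demo_eq *eq);   /* assumed FW/HW teardown step */

        static void demo_destroy_eq(struct demo_eq *eq)
        {
            hw_destroy_eq(eq);          /* EQ ownership moved to software */

            /* Wait for handlers already running on other CPUs to finish. */
            synchronize_irq(eq->irq);

            kfree(eq->buf);             /* now it is safe to free the EQ buffer */
            eq->buf = NULL;
        }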
  11. 20 Sep 2014 · 1 commit
  12. 03 Jul 2014 · 1 commit
  13. 02 Jun 2014 · 2 commits
  14. 15 May 2014 · 1 commit
  15. 09 May 2014 · 1 commit
  16. 21 Mar 2014 · 1 commit
    • net/mlx4: Adapt code for N-Port VF · 449fc488
      Committed by Matan Barak
      Adds support for N-Port VFs. This includes:
      1. Adding support in the wrapped FW command
      	In wrapped commands, we need to verify and convert
      	the slave's port into the real physical port.
      	Furthermore, when sending the response back to the slave,
      	a reverse conversion should be made.
      2. Adjusting sqpn for QP1 para-virtualization
      	The slave assumes that sqpn is used for QP1 communication.
      	If the slave is assigned to a port != (first port), we need
      	to adjust the sqpn that will direct its QP1 packets into the
      	correct endpoint.
      3. Adjusting gid[5] to modify the port for raw ethernet
      	In B0 steering, gid[5] contains the port. It needs
      	to be adjusted into the physical port.
      4. Adjusting number of ports in the query / ports caps in the FW commands
      	When a slave queries the hardware, it needs to view only
      	the physical ports it's assigned to.
      5. Adjusting the sched_qp according to the port number
      	The QP port is encoded in the sched_qp, thus in modify_qp we need
      	to encode the correct port in sched_qp.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      449fc488
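      An illustrative sketch of point 1 above (translating a slave's view of a
      port into the real physical port inside a wrapped FW command); the mapping
      table and names are made up, not the driver's data structures.

        #include <linux/errno.h>

        #define MAX_SLAVES      64
        #define MAX_SLAVE_PORTS 2

        struct port_map {
            /* physical port backing each (slave, slave-port) pair; 0 = unassigned */
            int phys_port[MAX_SLAVES][MAX_SLAVE_PORTS + 1];
        };

        /* Validate the slave's port number and convert it for the FW. */
        static int slave_port_to_phys(struct port_map *map, int slave, int slave_port)
        {
            int phys;

            if (slave_port < 1 || slave_port > MAX_SLAVE_PORTS)
                return -EINVAL;
            phys = map->phys_port[slave][slave_port];
            return phys ? phys : -EINVAL;  /* reverse mapping applies to responses */
        }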
  17. 17 Jan 2014 · 1 commit
  18. 10 Dec 2013 · 1 commit
    • mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs · 7c6d74d2
      Committed by Jack Morgenstein
      Commit f4ec9e95 "mlx4_core: Change bitmap allocator to work in round-robin fashion"
      introduced round-robin allocation (via bitmap) for all resources which allocate
      via a bitmap.
      
      Round robin allocation is desirable for mcgs, counters, pd's, UARs, and xrcds.
      These are simply numbers, with no involvement of ICM memory mapping.
      
      Round robin is required for QPs, since we had a problem with immediate
      reuse of a 24-bit QP number (commit f4ec9e95).
      
      However, for other resources which use the bitmap allocator and involve
      mapping ICM memory -- MPTs, CQs, SRQs -- round-robin is not desirable.
      
      What happens in these cases is the following:
      
      ICM memory is allocated and mapped in chunks of 256K.
      
      Since the resource allocation index goes up monotonically, the allocator
      will eventually require mapping a new chunk. Now, chunks are also unmapped
      when their reference count goes back to zero.  Thus, if a single app is
      running and starts/exits frequently we will have the following situation:
      
      When the app starts, a new chunk must be allocated and mapped.
      
      When the app exits, the chunk reference count goes back to zero, and the
      chunk is unmapped and freed. Therefore, the app must pay the cost of allocation
      and mapping of ICM memory each time it runs (although the price is paid only when
      allocating the initial entry in the new chunk).
      
      For apps which allocate MPTs/SRQs/CQs and which operate as described above,
      this presented a performance problem.
      
      We therefore roll back the round-robin allocator modification for MPTs, CQs, SRQs.
      Reported-by: Matthew Finlay <matt@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c6d74d2
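      A sketch contrasting the two allocation policies discussed above (a
      simplified bitmap allocator, not the mlx4 one): with use_rr the search
      starts after the last allocation, so indices climb monotonically and keep
      touching new ICM chunks; without it the search restarts at 0, so freed low
      indices are reused and the already-mapped chunk stays in use.

        #include <linux/bitops.h>
        #include <linux/types.h>

        struct demo_bitmap {
            unsigned long *table;
            u32 max;     /* number of entries */
            u32 last;    /* round-robin cursor */
        };

        static int demo_bitmap_alloc(struct demo_bitmap *bm, bool use_rr)
        {
            u32 start = use_rr ? bm->last : 0;
            u32 obj;

            obj = find_next_zero_bit(bm->table, bm->max, start);
            if (obj >= bm->max && use_rr)    /* wrap around once */
                obj = find_next_zero_bit(bm->table, bm->max, 0);
            if (obj >= bm->max)
                return -1;                   /* bitmap is full */

            set_bit(obj, bm->table);
            if (use_rr)
                bm->last = (obj + 1) % bm->max;
            return obj;
        }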
  19. 08 Nov 2013 · 1 commit
  20. 29 Jul 2013 · 1 commit
  21. 14 Jun 2013 · 1 commit
  22. 25 Apr 2013 · 1 commit
  23. 22 Mar 2013 · 1 commit
  24. 30 Nov 2012 · 1 commit
  25. 27 Nov 2012 · 1 commit
    • mlx4: 64-byte CQE/EQE support · 08ff3235
      Committed by Or Gerlitz
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQEs, the driver will configure the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g. as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour not to be zero any more. In
      case none of these capabilities is activated that value remains zero
      and older guest drivers can run OK.
      
      The SRIOV-related flow is as follows:
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware of the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
      when **needed** i.e. only when the driver does use 64-byte CQEs or
      future device capabilities which must be kept in sync with user space. This
      practice allows working with unmodified libmlx4 on older devices (e.g.
      A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      08ff3235
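      A condensed sketch of the native/PPF side of the flow above; the capability
      bit values and helper are illustrative, but the gating on a default-off
      module parameter matches the commit message: only if the HCA reports
      64-byte CQE/EQE support and the administrator enabled the mode does the
      driver request it at INIT_HCA time.

        #include <linux/module.h>
        #include <linux/types.h>

        static bool enable_64b_cqe_eqe;   /* default off, as described above */
        module_param(enable_64b_cqe_eqe, bool, 0444);
        MODULE_PARM_DESC(enable_64b_cqe_eqe, "Enable 64 byte CQEs/EQEs when supported");

        #define CAP_64B_CQE (1u << 0)     /* illustrative capability bits */
        #define CAP_64B_EQE (1u << 1)

        struct init_hca_params {
            bool cqe_64;
            bool eqe_64;
        };

        static void choose_cqe_eqe_size(u32 dev_caps, struct init_hca_params *p)
        {
            p->cqe_64 = enable_64b_cqe_eqe && (dev_caps & CAP_64B_CQE);
            p->eqe_64 = enable_64b_cqe_eqe && (dev_caps & CAP_64B_EQE);
            /* If either mode is activated, a nonzero pf_context_behaviour is
             * then reported so old guest drivers notice they need an update. */
        }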
  26. 19 Nov 2012 · 1 commit
  27. 26 Oct 2012 · 1 commit
  28. 24 Oct 2012 · 1 commit
  29. 01 Oct 2012 · 2 commits
    • IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e
      Committed by Jack Morgenstein
      1. Allow only master to change node description.
      2. Prevent AH leakage in send mads.
      3. Take device part number from PCI structure, so that guests see the
         VF part number (and not the PF part number).
      4. Place the device revision ID into caps structure at startup.
      5. SET_PORT in update_gids_task needs to go through wrapper on master.
      6. In mlx4_ib_event(), PORT_MGMT_EVENT needs to be handled in a work
         queue on the master, since it propagates events to slaves using
         GEN_EQE.
      7. Do not support FMR on slaves.
      8. Add spinlock to slave_event(), since it is called both in interrupt
         context and in process context (due to 6 above, and also if
         smp_snoop is used).  This fix was found and implemented by Saeed
         Mahameed <saeedm@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      992e8e6e
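      A sketch of point 8 above: a lock taken in both interrupt and process
      context must disable local interrupts, hence spin_lock_irqsave() rather
      than a plain spin_lock(). The structure and names are illustrative.

        #include <linux/spinlock.h>

        struct slave_event_queue {
            spinlock_t lock;
            int head, tail;
            /* ... event slots ... */
        };

        /* May be called from the EQ interrupt handler or from a work queue. */
        static void slave_event(struct slave_event_queue *q /* , event payload */)
        {
            unsigned long flags;

            spin_lock_irqsave(&q->lock, flags);
            /* ... copy the event into the slave's queue, advance head ... */
            spin_unlock_irqrestore(&q->lock, flags);
        }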
    • mlx4_core: Add IB port-state machine and port mgmt event propagation · 993c401e
      Committed by Jack Morgenstein
      For an IB port, a slave should not show port active until that slave
      has a valid alias-guid (provided by the subnet manager).  Therefore
      the port-up event should be passed to a slave only after both the port
      is up, and the slave's alias-guid has been set.
      
      Also, provide the infrastructure for propagating port-management
      events (client-reregister, etc) to slaves.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      993c401e
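      A minimal sketch of the gating rule described above (illustrative state
      flags and a hypothetical propagate_port_event() helper): a slave sees the
      port as active only once the physical port is up and its alias GUID has
      been assigned by the subnet manager.

        #include <linux/types.h>

        struct slave_port_state {
            bool phys_port_up;
            bool alias_guid_set;
            bool reported_up;    /* what the slave currently sees */
        };

        /* Hypothetical helper that forwards a port event to the slave. */
        void propagate_port_event(int slave, int port, bool up);

        static void update_slave_port_state(struct slave_port_state *s,
                                            int slave, int port)
        {
            bool up = s->phys_port_up && s->alias_guid_set;

            if (up != s->reported_up) {
                propagate_port_event(slave, port, up);
                s->reported_up = up;
            }
        }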
  30. 19 Jul 2012 · 1 commit
  31. 11 Jul 2012 · 1 commit
    • mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Committed by Jack Morgenstein
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the fw-generated-event event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accommodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      00f5ce99
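      A sketch of the capability-driven choice described above; the bit positions
      and helper are illustrative, not the driver's actual definitions. When the
      device advertises port management change events, subscribe to them via the
      MAP_EQ async event mask and skip smp_snoop; otherwise keep snooping SMP MADs.

        #include <linux/types.h>

        #define DEV_CAP_FLAG_PORT_MNG_CHG_EV  (1ULL << 59)  /* illustrative bit */
        #define EVENT_TYPE_PORT_MNG_CHG       0x1d          /* illustrative type */

        struct demo_dev {
            u64 dev_cap_flags;
            u64 async_ev_mask;   /* passed to FW via the MAP_EQ command */
        };

        static bool setup_port_mng_change(struct demo_dev *dev)
        {
            if (dev->dev_cap_flags & DEV_CAP_FLAG_PORT_MNG_CHG_EV) {
                /* Ask FW to deliver port management change events. */
                dev->async_ev_mask |= 1ULL << EVENT_TYPE_PORT_MNG_CHG;
                return true;     /* caller should not invoke smp_snoop */
            }
            return false;        /* fall back to smp_snoop-generated events */
        }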
  32. 01 Jun 2012 · 1 commit
  33. 13 Mar 2012 · 1 commit
  34. 22 Feb 2012 · 1 commit
    • mlx4: Replacing pool_lock with mutex · 730c41d5
      Committed by Yevgeny Petrilin
      Under the spinlock we call request_irq(), which allocates memory with GFP_KERNEL.
      When DEBUG_SPINLOCK is enabled, this can cause the following trace:
      
       BUG: spinlock wrong CPU on CPU#2, ethtool/2595
       lock: ffff8801f9cbc2b0, .magic: dead4ead, .owner: ethtool/2595, .owner_cpu: 0
       Pid: 2595, comm: ethtool Not tainted 3.0.18 #2
       Call Trace:
       spin_bug+0xa2/0xf0
       do_raw_spin_unlock+0x71/0xa0
       _raw_spin_unlock+0xe/0x10
       mlx4_assign_eq+0x12b/0x190 [mlx4_core]
       mlx4_en_activate_cq+0x252/0x2d0 [mlx4_en]
       ? mlx4_en_activate_rx_rings+0x227/0x370 [mlx4_en]
       mlx4_en_start_port+0x189/0xb90 [mlx4_en]
       mlx4_en_set_ringparam+0x29a/0x340 [mlx4_en]
       dev_ethtool+0x816/0xb10
       ? dev_get_by_name_rcu+0xa4/0xe0
       dev_ioctl+0x2b5/0x470
       handle_mm_fault+0x1cd/0x2d0
       sock_do_ioctl+0x5d/0x70
       sock_ioctl+0x79/0x2f0
       do_vfs_ioctl+0x8c/0x340
       sys_ioctl+0xa1/0xb0
       system_call_fastpath+0x16/0x1b
      
      Replace the spinlock with a mutex, which is sufficient in this case.
      Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      730c41d5
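      A sketch of the fix described above (simplified, illustrative names):
      request_irq() may sleep because of its GFP_KERNEL allocation, so the
      EQ-assignment path is protected by a mutex rather than a spinlock.

        #include <linux/interrupt.h>
        #include <linux/mutex.h>
        #include <linux/types.h>

        struct eq_table {
            struct mutex lock;      /* was: spinlock_t pool_lock */
            unsigned int irq;
            bool assigned;
        };

        static int assign_eq(struct eq_table *tbl, irq_handler_t handler, void *ctx)
        {
            int err = 0;

            mutex_lock(&tbl->lock);     /* sleeping is allowed under a mutex */
            if (!tbl->assigned) {
                err = request_irq(tbl->irq, handler, 0, "demo-eq", ctx);
                if (!err)
                    tbl->assigned = true;
            }
            mutex_unlock(&tbl->lock);
            return err;
        }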
  35. 14 Feb 2012 · 1 commit
  36. 23 Jan 2012 · 1 commit