1. 17 1月, 2017 1 次提交
  2. 30 10月, 2016 1 次提交
  3. 16 9月, 2016 1 次提交
  4. 17 2月, 2016 1 次提交
    • H
      net/mlx4_core: Set UAR page size to 4KB regardless of system page size · 85743f1e
      Huy Nguyen 提交于
      problem description:
      
      The current code sets UAR page size equal to system page size.
      The ConnectX-3 and ConnectX-3 Pro HWs require minimum 128 UAR pages.
      The mlx4 kernel drivers are not loaded if there is less than 128 UAR pages.
      
      solution:
      
      Always set UAR page to 4KB. This allows more UAR pages if the OS
      has PAGE_SIZE larger than 4KB. For example, PowerPC kernel use 64KB
      system page size, with 4MB uar region, there are 4MB/2/64KB = 32
      uars (half for uar, half for blueflame). This does not meet minimum 128
      UAR pages requirement. With 4KB UAR page, there are 4MB/2/4KB = 512 uars
      which meet the minimum requirement.
      
      Note that only codes in mlx4_core that deal with firmware know that uar
      page size is 4KB. Codes that deal with usr page in cq and qp context
      (mlx4_ib, mlx4_en and part of mlx4_core) still have the same assumption
      that uar page size equals to system page size.
      
      Note that with this implementation, on 64KB system page size kernel, there
      are 16 uars per system page but only one uars is used. The other 15
      uars are ignored because of the above assumption.
      
      Regarding SR-IOV, mlx4_core in hypervisor will set the uar page size
      to 4KB and mlx4_core code in virtual OS will obtain the uar page size from
      firmware.
      
      Regarding backward compatibility in SR-IOV, if hypervisor has this new code,
      the virtual OS must be updated. If hypervisor has old code, and the virtual
      OS has this new code, the new code will be backward compatible with the
      old code. If the uar size is big enough, this new code in VF continues to
      work with 64 KB uar page size (on PowerPc kernel). If the uar size does not
      meet 128 uars requirement, this new code not loaded in VF and print the same
      error message as the old code in Hypervisor.
      Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85743f1e
  5. 07 12月, 2015 1 次提交
  6. 28 10月, 2015 1 次提交
    • C
      net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes · c02b0501
      Carol L Soto 提交于
      When doing memcpy/memset of EQEs, we should use sizeof struct
      mlx4_eqe as the base size and not caps.eqe_size which could be bigger.
      
      If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
      data in the master context.
      
      When using a 64 byte stride, the memcpy copied over 63 bytes to the
      slave_eq structure.  This resulted in copying over the entire eqe of
      interest, including its ownership bit -- and also 31 bytes of garbage
      into the next WQE in the slave EQ -- which did NOT include the ownership
      bit (and therefore had no impact).
      
      However, once the stride is increased to 128, we are overwriting the
      ownership bits of *three* eqes in the slave_eq struct.  This results
      in an incorrect ownership bit for those eqes, which causes the eq to
      seem to be full. The issue therefore surfaced only once 128-byte EQEs
      started being used in SRIOV and (overarchitectures that have 128/256
      byte cache-lines such as PPC) - e.g after commit 77507aa2
      "net/mlx4_core: Enable CQE/EQE stride support".
      
      Fixes: 08ff3235 ('mlx4: 64-byte CQE/EQE support')
      Signed-off-by: NCarol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c02b0501
  7. 09 10月, 2015 1 次提交
  8. 27 7月, 2015 1 次提交
  9. 04 6月, 2015 1 次提交
  10. 31 5月, 2015 2 次提交
    • I
      net/mlx4_core: Move affinity hints to mlx4_core ownership · de161803
      Ido Shamay 提交于
      Now that EQs management is in the sole responsibility of mlx4_core,
      the IRQ affinity hints configuration should be in its hands as well.
      request_irq is called only once by the first consumer (maybe mlx4_ib),
      so mlx4_en passes the affinity mask too late. We also need to request
      vectors according to the cores we want to run on.
      
      mlx4_core distribution of IRQs to cores is straight forward,
      EQ(i)->IRQ will set affinity hint to core i.
      Consumers need to request EQ vectors, according to their cores
      considerations (NUMA).
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de161803
    • M
      net/mlx4: Add EQ pool · c66fa19c
      Matan Barak 提交于
      Previously, mlx4_en allocated EQs and used them exclusively.
      This affected RoCE performance, as applications which are
      events sensitive were limited to use only the legacy EQs.
      
      Change that by introducing an EQ pool. This pool is managed
      by mlx4_core. EQs are assigned to ports (when there are limited
      number of EQs, multiple ports could be assigned to the same EQs).
      
      An exception to this rule is the ASYNC EQ which handles various events.
      
      Legacy EQs are completely removed as all EQs could be shared.
      
      When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
      EQ serving on a specific port. The core driver calculates which
      EQ should be assigned to that request.
      
      Because IRQs are shared between IB and Ethernet modules, their
      names only include the PCI device BDF address.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NIdo Shamay <idos@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c66fa19c
  11. 25 5月, 2015 1 次提交
  12. 16 4月, 2015 1 次提交
  13. 10 4月, 2015 1 次提交
    • A
      mlx4/mlx5: Use dma_wmb/rmb where appropriate · 12b3375f
      Alexander Duyck 提交于
      This patch should help to improve the performance of the mlx4 and mlx5 on a
      number of architectures.  For example, on x86 the dma_wmb/rmb equates out
      to a barrer() call as the architecture is already strong ordered, and on
      PowerPC the call works out to a lwsync which is significantly less expensive
      than the sync call that was being used for wmb.
      
      I placed the new barriers between any spots that seemed to be trying to
      order memory/memory reads or writes, if there are any spots that involved
      MMIO I left the existing wmb in place as the new barriers cannot order
      transactions between coherent and non-coherent memories.
      
      v2: Reduced the replacments to just the spots where I could clearly
          identify the usage pattern.
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Cc: Ido Shamay <idos@mellanox.com>
      Cc: Eli Cohen <eli@mellanox.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12b3375f
  14. 25 3月, 2015 1 次提交
    • J
      net/mlx4_core: Fix GEN_EQE accessing uninitialixed mutex · bffb023a
      Jack Morgenstein 提交于
      We occasionally see in procedure mlx4_GEN_EQE that the driver tries
      to grab an uninitialized mutex.
      
      This can occur in only one of two ways:
      1. We are trying to generate an async event on an uninitialized slave.
      2. We are trying to generate an async event on an illegal slave number
         ( < 0 or > persist->num_vfs) or an inactive slave.
      
      To deal with #1: move the mutex initialization from specific slave init
      sequence in procedure mlx_master_do_cmd to mlx4_multi_func_init() (so that
      the mutex is always initialized for all slaves).
      
      To deal with #2: check in procedure mlx4_GEN_EQE that the slave number
      provided is in the proper range and that the slave is active.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bffb023a
  15. 28 1月, 2015 2 次提交
  16. 26 1月, 2015 2 次提交
    • Y
      net/mlx4_core: Enable device recovery flow with SRIOV · 55ad3592
      Yishai Hadas 提交于
      In SRIOV, both the PF and the VF may attempt device recovery whenever they
      assume that the device is not functioning.  When the PF driver resets the
      device, the VF should detect this and attempt to reinitialize itself.
      
      The VF must be able to reset itself under all circumstances, even
      if the PF is not responsive.
      
      The VF shall reset itself in the following cases:
      
      1. Commands are not processed within reasonable time over the communication channel.
      This is done considering device state and the correct return code based on
      the command as was done in the native mode, done in the next patch.
      
      2. The VF driver receives an internal error event reported by the PF on the
      communication channel. This occurs when the PF driver resets the device or
      when VF is out of sync with the PF.
      
      Add 'VF reset' capability, which allows the VF to reinitialize itself even when the
      PF is not responsive.
      
      As PF and VF may run their reset flow simulantanisly, there are several cases
      that are handled:
      - Prevent freeing VF resources upon FLR, when PF is in its unloading stage.
      - Prevent PF getting VF commands before it has finished initializing its resources.
      - Upon VF startup, check that comm-channel is online before sending
        commands to the PF and getting timed-out.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55ad3592
    • Y
      net/mlx4_core: Maintain a persistent memory for mlx4 device · 872bf2fb
      Yishai Hadas 提交于
      Maintain a persistent memory that should survive reset flow/PCI error.
      This comes as a preparation for coming series to support above flows.
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      872bf2fb
  17. 12 12月, 2014 1 次提交
    • M
      net/mlx4_core: Use tasklet for user-space CQ completion events · 3dca0f42
      Matan Barak 提交于
      Previously, we've fired all our completion callbacks straight from our ISR.
      
      Some of those callbacks were lightweight (for example, mlx4_en's and
      IPoIB napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in ISR is generally considered wrong,
      it could even lead to a hard lockup of the system. Since when a lot
      of completion events are generated by the hardware, the loop over those
      events could be so long, that we'll get into a hard lockup by the system
      watchdog.
      
      In order to avoid that, add a new way of invoking completion events
      callbacks. In the interrupt itself, we add the CQs which receive completion
      event to a per-EQ list and schedule a tasklet. In the tasklet context
      we loop over all the CQs in the list and invoke the user callback.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dca0f42
  18. 14 11月, 2014 1 次提交
  19. 27 10月, 2014 1 次提交
    • E
      net/mlx4_core: Call synchronize_irq() before freeing EQ buffer · bf1bac5b
      Eli Cohen 提交于
      After moving the EQ ownership to software effectively destroying it, call
      synchronize_irq() to ensure that any handler routines running on other CPU
      cores finish execution. Only then free the EQ buffer.
      The same thing is done when we destroy a CQ which is one of the sources
      generating interrupts. In the case of CQ we want to avoid completion handlers
      on a CQ that was destroyed. In the case we do the same to avoid receiving
      asynchronous events after the EQ has been destroyed and its buffers freed.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf1bac5b
  20. 20 9月, 2014 1 次提交
  21. 03 7月, 2014 1 次提交
  22. 02 6月, 2014 2 次提交
  23. 15 5月, 2014 1 次提交
  24. 09 5月, 2014 1 次提交
  25. 21 3月, 2014 1 次提交
    • M
      net/mlx4: Adapt code for N-Port VF · 449fc488
      Matan Barak 提交于
      Adds support for N-Port VFs, this includes:
      1. Adding support in the wrapped FW command
      	In wrapped commands, we need to verify and convert
      	the slave's port into the real physical port.
      	Furthermore, when sending the response back to the slave,
      	a reverse conversion should be made.
      2. Adjusting sqpn for QP1 para-virtualization
      	The slave assumes that sqpn is used for QP1 communication.
      	If the slave is assigned to a port != (first port), we need
      	to adjust the sqpn that will direct its QP1 packets into the
      	correct endpoint.
      3. Adjusting gid[5] to modify the port for raw ethernet
      	In B0 steering, gid[5] contains the port. It needs
      	to be adjusted into the physical port.
      4. Adjusting number of ports in the query / ports caps in the FW commands
      	When a slave queries the hardware, it needs to view only
      	the physical ports it's assigned to.
      5. Adjusting the sched_qp according to the port number
      	The QP port is encoded in the sched_qp, thus in modify_qp we need
      	to encode the correct port in sched_qp.
      Signed-off-by: NMatan Barak <matanb@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      449fc488
  26. 17 1月, 2014 1 次提交
  27. 10 12月, 2013 1 次提交
    • J
      mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs · 7c6d74d2
      Jack Morgenstein 提交于
      Commit f4ec9e95 "mlx4_core: Change bitmap allocator to work in round-robin fashion"
      introduced round-robin allocation (via bitmap) for all resources which allocate
      via a bitmap.
      
      Round robin allocation is desirable for mcgs, counters, pd's, UARs, and xrcds.
      These are simply numbers, with no involvement of ICM memory mapping.
      
      Round robin is required for QPs, since we had a problem with immediate
      reuse of a 24-bit QP number (commit f4ec9e95).
      
      However, for other resources which use the bitmap allocator and involve
      mapping ICM memory -- MPTs, CQs, SRQs -- round-robin is not desirable.
      
      What happens in these cases is the following:
      
      ICM memory is allocated and mapped in chunks of 256K.
      
      Since the resource allocation index goes up monotonically, the allocator
      will eventually require mapping a new chunk. Now, chunks are also unmapped
      when their reference count goes back to zero.  Thus, if a single app is
      running and starts/exits frequently we will have the following situation:
      
      When the app starts, a new chunk must be allocated and mapped.
      
      When the app exits, the chunk reference count goes back to zero, and the
      chunk is unmapped and freed. Therefore, the app must pay the cost of allocation
      and mapping of ICM memory each time it runs (although the price is paid only when
      allocating the initial entry in the new chunk).
      
      For apps which allocate MPTs/SRQs/CQs and which operate as described above,
      this presented a performance problem.
      
      We therefore roll back the round-robin allocator modification for MPTs, CQs, SRQs.
      Reported-by: NMatthew Finlay <matt@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c6d74d2
  28. 08 11月, 2013 1 次提交
  29. 29 7月, 2013 1 次提交
  30. 14 6月, 2013 1 次提交
  31. 25 4月, 2013 1 次提交
  32. 22 3月, 2013 1 次提交
  33. 30 11月, 2012 1 次提交
  34. 27 11月, 2012 1 次提交
    • O
      mlx4: 64-byte CQE/EQE support · 08ff3235
      Or Gerlitz 提交于
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQES the driver will configure the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour not to be zero any more. In
      case none of these capabilities is activated that value remains zero
      and older guest drivers can run OK.
      
      The SRIOV related flow is as follows
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware to the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
      when **needed** i.e. only when the driver does use 64-byte CQEs or
      future device capabilities which must be in sync by user space. This
      practice allows to work with unmodified libmlx4 on older devices (e.g
      A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      08ff3235
  35. 19 11月, 2012 1 次提交
  36. 26 10月, 2012 1 次提交