1. 01 Jan, 2014 (1 commit)
  2. 20 Dec, 2013 (2 commits)
  3. 10 Dec, 2013 (1 commit)
    • mlx4_core: Roll back round robin bitmap allocation commit for CQs, SRQs, and MPTs · 7c6d74d2
      Committed by Jack Morgenstein
      Commit f4ec9e95 ("mlx4_core: Change bitmap allocator to work in round-robin
      fashion") introduced round-robin allocation for all resources which
      allocate via a bitmap.
      
      Round-robin allocation is desirable for MCGs, counters, PDs, UARs, and
      XRCDs.  These are simply numbers, with no ICM memory mapping involved.
      
      Round-robin is required for QPs, since we had a problem with immediate
      reuse of a 24-bit QP number (commit f4ec9e95).
      
      However, for the other resources which use the bitmap allocator and do
      involve mapping ICM memory -- MPTs, CQs, and SRQs -- round-robin is not
      desirable.
      
      What happens in these cases is the following:
      
      ICM memory is allocated and mapped in chunks of 256K.
      
      Since the resource allocation index increases monotonically, the allocator
      will eventually require mapping a new chunk. Chunks are also unmapped when
      their reference count drops back to zero.  Thus, if a single app starts and
      exits frequently, we will have the following situation:
      
      When the app starts, a new chunk must be allocated and mapped.
      
      When the app exits, the chunk reference count goes back to zero, and the
      chunk is unmapped and freed. Therefore, the app must pay the cost of allocation
      and mapping of ICM memory each time it runs (although the price is paid only when
      allocating the initial entry in the new chunk).
      
      For apps which allocate MPTs/SRQs/CQs and which operate as described above,
      this presented a performance problem.
      
      We therefore roll back the round-robin allocator modification for MPTs,
      CQs, and SRQs.
      Reported-by: Matthew Finlay <matt@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c6d74d2
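      To make the failure mode concrete, below is a minimal, self-contained C
      sketch (not the mlx4 source; the toy_* names are invented for
      illustration) contrasting bottom-up allocation with round-robin
      allocation for an app that starts and exits in a loop:

          #include <stdio.h>
          #include <stdbool.h>

          #define NBITS 16

          struct toy_bitmap {
                  bool used[NBITS];
                  int last;               /* round-robin cursor */
          };

          /* Bottom-up: always take the lowest free bit.  A start/exit-looping
           * app keeps getting the same low index back, so the same ICM chunk
           * would stay mapped across runs. */
          static int toy_alloc_bottom_up(struct toy_bitmap *bm)
          {
                  for (int i = 0; i < NBITS; i++)
                          if (!bm->used[i]) {
                                  bm->used[i] = true;
                                  return i;
                          }
                  return -1;
          }

          /* Round-robin: resume the search after the last allocation (the
           * behaviour that is right for QPs but wrong for MPTs/CQs/SRQs). */
          static int toy_alloc_round_robin(struct toy_bitmap *bm)
          {
                  for (int n = 0; n < NBITS; n++) {
                          int i = (bm->last + 1 + n) % NBITS;
                          if (!bm->used[i]) {
                                  bm->used[i] = true;
                                  bm->last = i;
                                  return i;
                          }
                  }
                  return -1;
          }

          int main(void)
          {
                  struct toy_bitmap bm = { .last = -1 };

                  /* Simulate an app that starts, allocates one entry, exits. */
                  for (int run = 0; run < 4; run++) {
                          int i = toy_alloc_round_robin(&bm);
                          printf("run %d -> index %d\n", run, i);
                          bm.used[i] = false;     /* app exits, entry freed */
                  }
                  /* Prints indices 0 1 2 3: the cursor drifts forward, so each
                   * time it crosses a chunk boundary a real driver must map
                   * (and, on exit, unmap) a fresh 256K ICM chunk, whereas
                   * toy_alloc_bottom_up() would return index 0 every run. */
                  return 0;
          }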
  4. 04 Dec, 2013 (1 commit)
  5. 08 Nov, 2013 (1 commit)
  6. 05 Nov, 2013 (1 commit)
    • mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Committed by Jack Morgenstein
      This is step #1 in implementing SR-IOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs,
      MACs, VLANs, and counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity of the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs specified by the
      mlx4_core module parameter "num_vfs"). This 50% represents the
      "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-serve basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and the guarantee is 64
           (to allow the PF to register at least one VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
            We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we initialize the resource quotas
      and adjust the query_device response to report quotas rather than
      resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via QUERY_HCA) so
      that they can continue to use them when handling QPs, CQs, SRQs, and
      MPTs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a0d0a61
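      The 50/50 arithmetic above can be summarized in a small standalone C
      sketch (illustrative only, not the driver's code; the struct and
      function names are invented):

          #include <stdio.h>

          struct quota {
                  int guaranteed;   /* per-function starvation-prevention minimum */
                  int quota;        /* per-function hard maximum */
                  int free_pool;    /* shared, first-come-first-serve */
          };

          /* QP/CQ/SRQ/MPT/MTT case: 50% split equally among all functions,
           * the rest is the free pool, and quota = free pool + guaranteed. */
          static struct quota compute_quota(int real_max, int num_vfs)
          {
                  struct quota q;
                  int num_funcs = num_vfs + 1;    /* all active VFs plus the PF */

                  q.guaranteed = (real_max / 2) / num_funcs;
                  q.free_pool  = real_max - q.guaranteed * num_funcs;
                  q.quota      = q.free_pool + q.guaranteed;
                  return q;
          }

          int main(void)
          {
                  /* e.g., 128K QPs on the HCA with num_vfs=7 */
                  struct quota q = compute_quota(1 << 17, 7);

                  printf("guaranteed=%d free_pool=%d quota=%d\n",
                         q.guaranteed, q.free_pool, q.quota);
                  return 0;
          }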
  7. 18 Oct, 2013 (1 commit)
  8. 02 Aug, 2013 (1 commit)
  9. 29 Jul, 2013 (1 commit)
  10. 26 Jun, 2013 (2 commits)
  11. 24 Jun, 2013 (1 commit)
  12. 18 Jun, 2013 (1 commit)
  13. 05 Jun, 2013 (1 commit)
  14. 25 Apr, 2013 (2 commits)
  15. 08 Mar, 2013 (1 commit)
  16. 26 Feb, 2013 (1 commit)
  17. 08 Feb, 2013 (1 commit)
  18. 06 Feb, 2013 (1 commit)
  19. 05 Feb, 2013 (1 commit)
  20. 01 Feb, 2013 (1 commit)
  21. 28 Jan, 2013 (1 commit)
  22. 19 Jan, 2013 (1 commit)
  23. 20 Dec, 2012 (2 commits)
  24. 08 Dec, 2012 (1 commit)
  25. 04 Dec, 2012 (1 commit)
  26. 27 Nov, 2012 (1 commit)
    • mlx4: 64-byte CQE/EQE support · 08ff3235
      Committed by Or Gerlitz
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQEs, the driver configures the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g., as done in mlx4_core of Linux 3.3..3.6) will notice that
      they need an update to be able to work with the PPF. This is done by
      changing the returned pf_context_behaviour so that it is no longer zero.
      In case none of these capabilities is activated, that value remains zero
      and older guest drivers can run OK.
      
      The SR-IOV-related flow is as follows:
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware of the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 also apply in native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface, so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4, but only
      when **needed**, i.e., only when the driver actually uses 64-byte CQEs
      or future device capabilities with which user space must stay in sync.
      This practice allows unmodified libmlx4 to keep working on older
      devices (e.g., A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      08ff3235
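      A condensed, kernel-style C sketch of the gating logic described above
      (the flag names and bit positions here are illustrative assumptions,
      not necessarily the driver's exact definitions):

          #include <stdbool.h>
          #include <stdint.h>

          #define DEV_CAP_FLAG_64B_EQE     (UINT64_C(1) << 61)  /* assumed bit */
          #define DEV_CAP_FLAG_64B_CQE     (UINT64_C(1) << 62)  /* assumed bit */
          #define PF_CONTEXT_BEHAVIOUR_64B (UINT64_C(1) << 0)   /* non-zero!   */

          static bool enable_64b_cqe_eqe;   /* module parameter, default false */

          /* PPF side: decide at INIT_HCA time whether to enable 64-byte mode.
           * Returning a non-zero pf_context_behaviour makes old (3.3..3.6)
           * guest drivers notice they need an update instead of silently
           * misparsing 64-byte entries. */
          static uint64_t choose_pf_context_behaviour(uint64_t dev_cap_flags)
          {
                  if (enable_64b_cqe_eqe &&
                      (dev_cap_flags & DEV_CAP_FLAG_64B_EQE) &&
                      (dev_cap_flags & DEV_CAP_FLAG_64B_CQE))
                          return PF_CONTEXT_BEHAVIOUR_64B;

                  return 0;  /* legacy 32-byte mode; old guests run unmodified */
          }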
  27. 24 Oct, 2012 (1 commit)
  28. 01 Oct, 2012 (9 commits)
    • mlx4_core: Disable SENSE_PORT for multifunction devices · aadf4f3f
      Committed by Roland Dreier
      In the current driver, the SENSE_PORT firmware command is issued as a
      "wrapped" command, but the command handling code doesn't have a
      wrapper, so it will never do anything other than log an error message.
      The latest ConnectX-3 2.11.500 firmware reports the SENSE_PORT
      capability even in multi-function (SR-IOV) mode, so the driver will
      try to issue the command.
      
      At least until the driver has a proper wrapper for SENSE_PORT, make
      sure we disable the command for multi-function devices.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      aadf4f3f
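      A minimal sketch of the interim workaround, assuming the capability bit
      is simply cleared after QUERY_DEV_CAP (the helper name is an invention
      here; the mlx4 identifiers follow the driver's style):

          /* After QUERY_DEV_CAP: never advertise SENSE_PORT on a
           * multi-function device, so the wrapped command -- which has no
           * wrapper handler yet -- is never issued. */
          static void maybe_disable_sense_port(struct mlx4_dev *dev,
                                               struct mlx4_dev_cap *dev_cap)
          {
                  if (mlx4_is_mfunc(dev))
                          dev_cap->flags &= ~MLX4_DEV_CAP_FLAG_SENSE_SUPPORT;
          }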
    • mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs · ca3e57a5
      Committed by Roland Dreier
      Instead of having a hard-coded "PCI device ID != 0x1003" check (which
      obviously breaks as newer devices with IDs other than 0x1003 become
      available), set a flag in our PCI device table for the older devices
      where we're supposed to force using SENSE_PORT.  This also avoids
      enabling SENSE_PORT for virtual functions by mistake.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      ca3e57a5
    • mlx4_core: Stash PCI ID driver_data in mlx4_priv structure · 839f1243
      Committed by Roland Dreier
      That way we can check flags later on, when we've finished with the
      pci_device_id structure.  Also convert the "is VF" flag to an enum:
      "Never do in the preprocessor what can be done in C."
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      839f1243
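      Taken together, the two SENSE_PORT cleanups above amount to something
      like the following kernel-style sketch (the device IDs, enum values,
      and __mlx4_init_one helper are illustrative assumptions; only the
      driver_data mechanism itself is standard PCI practice):

          enum {
                  MLX4_PCI_DEV_IS_VF              = 1 << 0,
                  MLX4_PCI_DEV_FORCE_SENSE_PORT   = 1 << 1,
          };

          static const struct pci_device_id mlx4_pci_table[] = {
                  /* older ConnectX-1/-2 device: force SENSE_PORT on */
                  { PCI_VDEVICE(MELLANOX, 0x6340), MLX4_PCI_DEV_FORCE_SENSE_PORT },
                  /* ConnectX-3: no flags; trust the reported capability */
                  { PCI_VDEVICE(MELLANOX, 0x1003), 0 },
                  /* ConnectX-3 virtual function: never sense ports */
                  { PCI_VDEVICE(MELLANOX, 0x1004), MLX4_PCI_DEV_IS_VF },
                  { 0, }
          };

          static int mlx4_init_one(struct pci_dev *pdev,
                                   const struct pci_device_id *id)
          {
                  /* Stash id->driver_data in driver-private state so later
                   * code (e.g. the SENSE_PORT decision) no longer needs the
                   * pci_device_id structure. */
                  return __mlx4_init_one(pdev, id->driver_data);
          }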
    • mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem · f3d4c89e
      Committed by Roland Dreier
      On an SR-IOV master device, __mlx4_init_one() calls mlx4_init_hca()
      before mlx4_multi_func_init().  However, for unlucky configurations,
      mlx4_init_hca() might call mlx4_SENSE_PORT() (via mlx4_dev_cap()), and
      that calls mlx4_cmd_imm() with MLX4_CMD_WRAPPED set.
      
      On a multifunction device with MLX4_CMD_WRAPPED, __mlx4_cmd() calls
      into mlx4_slave_cmd(), which immediately tries to do
      
      	down(&priv->cmd.slave_sem);
      
      but priv->cmd.slave_sem isn't initialized until mlx4_multi_func_init()
      (which we haven't called yet).  The next thing it tries to do is access
      priv->mfunc.vhcr, but that hasn't been allocated yet.
      
      Fix this by moving the initialization of slave_sem and vhcr up into
      mlx4_cmd_init(). Also, since slave_sem is really just being used as a
      mutex, convert it into a slave_cmd_mutex.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      f3d4c89e
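      The shape of the fix, condensed into a kernel-style sketch (the struct
      layout is abridged and the function body is illustrative; only the
      initialization-ordering point matters):

          /* Was: struct semaphore slave_sem, initialized only in
           * mlx4_multi_func_init().  Now: a mutex, initialized in
           * mlx4_cmd_init(), which runs before any command can be issued. */
          int mlx4_cmd_init(struct mlx4_dev *dev)
          {
                  struct mlx4_priv *priv = mlx4_priv(dev);

                  mutex_init(&priv->cmd.slave_cmd_mutex);
                  /* ... the vhcr buffer is likewise allocated here, so an
                   * early wrapped command finds it in place ... */
                  return 0;
          }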
    • mlx4_core: Trivial cleanups to driver log messages · 84b1f153
      Committed by Roland Dreier
      Also, put the format strings onto one line.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      84b1f153
    • mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Committed by Jack Morgenstein
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations to
      determine which proxy or tunnel QPs to use.  This change frees the PPF
      from needing to adhere to a specific order when allocating proxy and
      tunnel QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-SRIOV mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy QP number (used by the PPF).
          base_tunnel_sqpn -- the first physical tunnel QP number (used by the PPF).
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      47605df9
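      A sketch of the new bookkeeping (the three field names come from the
      commit text; the surrounding struct is abridged, and the per-port guest
      view is a hypothetical illustration):

          #include <linux/types.h>

          struct mlx4_phys_caps {
                  u32 base_sqpn;        /* non-SRIOV: formerly sqp_start       */
                  u32 base_proxy_sqpn;  /* first physical proxy QP (PPF only)  */
                  u32 base_tunnel_sqpn; /* first physical tunnel QP (PPF only) */
          };

          /* What a guest now receives per port from QUERY_FUNC_CAP and uses
           * directly, with no offset arithmetic: */
          struct guest_port_sqp_info {
                  u32 proxy_qp0, proxy_qp1;
                  u32 tunnel_qp0, tunnel_qp1;
          };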
    • mlx4: Paravirtualize Node Guids for slaves · afa8fd1d
      Committed by Jack Morgenstein
      This is necessary in order to support more than one VF per PF in a VM
      for software that uses the node GUID as a discriminator, such as
      librdmacm.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      afa8fd1d
    • mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Committed by Jack Morgenstein
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Second, initialize the P_Key virtualization tables to their default
      values: the master's virtualized table is 1-1 with the real P_Key
      table, while a guest's virtualized table has P_Key index 0 mapped to
      real P_Key index 0, and all other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_pkey_cache
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      54679e14
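      The default P_Key table mapping described above, as a standalone C
      sketch (the table length matches the text; the array and macro names
      are illustrative):

          #include <stdio.h>

          #define PKEY_TABLE_LEN   128
          #define INVALID_PKEY_IX  127    /* reserved (invalid) P_Key slot */

          static int master_virt2phys[PKEY_TABLE_LEN];
          static int guest_virt2phys[PKEY_TABLE_LEN];

          static void init_pkey_virt_tables(void)
          {
                  for (int i = 0; i < PKEY_TABLE_LEN; i++) {
                          master_virt2phys[i] = i;            /* 1-1 mapping */
                          guest_virt2phys[i]  = INVALID_PKEY_IX;
                  }
                  guest_virt2phys[0] = 0;    /* guests start with only index 0 */
          }

          int main(void)
          {
                  init_pkey_virt_tables();
                  printf("guest ix 0 -> %d, guest ix 5 -> %d\n",
                         guest_virt2phys[0], guest_virt2phys[5]);
                  return 0;
          }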
    • mlx4_core: Add proxy and tunnel QPs to the reserved QP area · e2c76824
      Committed by Jack Morgenstein
      In addition, pass the proxy and tunnel QP numbers to slaves so the
      driver can perform special QP paravirtualization.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      e2c76824