  1. 08 Nov 2013, 1 commit
  2. 05 Nov 2013, 2 commits
    • net/mlx4_core: Fix quota handling in the QUERY_FUNC_CAP wrapper · eb456a68
      Authored by Jack Morgenstein
      In current kernels, the mlx4 driver running on a VM does not
      differentiate between max resource numbers for the HCA and
      max quotas -- it simply takes the quota values passed to it
      as max-resource values.
      
      However, the driver actually requires the VFs to be aware of
      the actual number of resources that the HCA was initialized with,
      for QPs, CQs, SRQs and MPTs.
      
      For QPs, CQs and SRQs, the reason is that in completion handling
      the driver must know which of the 24 bits are the actual resource
      number, and which are "padding" bits.
      
      For MPTs, also, the driver assumes knowledge of the number of MPTs
      in the system.
      
      The previous commit fixes the quota logic on the VM for the quota values
      passed to it by QUERY_FUNC_CAP.
      
      For QPs, CQs, SRQs, and MPTs, it takes the max resource numbers
      from QUERY_HCA (and not QUERY_FUNC_CAP).  The quotas passed
      in QUERY_FUNC_CAP are used to report max resource number values
      in the response to ib_query_device.
      
      However, the Hypervisor driver must consider that VMs
      may be running previous kernels, and compatibility must be preserved.
      
      To resolve the incompatibility with previous kernels running on VMs,
      we deprecated the quota fields in mlx4_QUERY_FUNC_CAP.  In the
      deprecated fields, we pass the max-resource values from INIT_HCA.
      
      The quota fields are moved to a new location, and the current kernel
      driver takes the proper values from that location. There is
      also a new flag in dword 0, bit 28 of the mlx4_QUERY_FUNC_CAP mailbox;
      if this flag is set, the (VM) driver takes the quota values from the
      new location.
      
      VMs running previous kernels will work properly, except that the max resource
      numbers reported in ib_query_device for these resources will be
      too high.  The Hypervisor driver will, however, enforce the quotas
      for these VMs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb456a68
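      
      A minimal C sketch of the flag check described above, under stated
      assumptions: only the "dword 0, bit 28" position comes from the
      commit text; the struct layout, macro, and field names below are
      hypothetical stand-ins for the real mlx4 mailbox definitions.
      
        #include <stdint.h>
        
        #define QUERY_FUNC_CAP_VALID_QUOTAS (1U << 28) /* dword 0, bit 28 */
        
        struct func_cap_view {                 /* hypothetical mailbox view */
            uint32_t dword0;                   /* flags word                */
            uint32_t legacy_qp_field;          /* deprecated: max-resource  */
            uint32_t qp_quota;                 /* new location: real quota  */
        };
        
        static uint32_t read_qp_quota(const struct func_cap_view *mb)
        {
            if (mb->dword0 & QUERY_FUNC_CAP_VALID_QUOTAS)
                return mb->qp_quota;        /* new kernel: true quota     */
            return mb->legacy_qp_field;     /* old kernel: max-resource   */
        }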
    • mlx4: Structures and init/teardown for VF resource quotas · 5a0d0a61
      Authored by Jack Morgenstein
      This is step #1 for implementing SRIOV resource quotas for VFs.
      
      Quotas are implemented per resource type for VFs and the PF, to prevent
      any entity from simply grabbing all the resources for itself and leaving
      the other entities unable to obtain such resources.
      
      Resources which are allocated using quotas: QPs, CQs, SRQs, MPTs, MTTs,
      MAC, VLAN, and Counters.
      
      The quota system works as follows:
      Each entity (VF or PF) is given a max number of a given resource (its quota),
      and a guaranteed minimum number for each resource (starvation prevention).
      
      For QPs, CQs, SRQs, MPTs and MTTs:
      50% of the available quantity for the resource is divided equally among
      the PF and all the active VFs (i.e., the number of VFs in the mlx4_core module
      parameter "num_vfs"). This 50% represents the "guaranteed minimum" pool.
      The other 50% is the "free pool", allocated on a first-come-first-serve basis.
      For each VF/PF, resources are first allocated from its "guaranteed-minimum"
      pool. When that pool is exhausted, the driver attempts to allocate from
      the resource "free-pool".
      
      The quota (i.e., max) for the VFs and the PF is:
        The free-pool amount (50% of the real max) + the guaranteed minimum
      
      For MACs:
        Guarantee 2 MACs per VF/PF per port. As a result, since we have only
        128 MACs per port, reduce the allowable number of VFs from 64 to 63.
        Any remaining MACs are put into a free pool.
      
      For VLANs:
        For the PF, the per-port quota is 128 and the guarantee is 64
        (to allow the PF to register at least one VLAN per VF in VST mode).
        For the VFs, the per-port quota is 64 and the guarantee is 0.
        We assume that VGT VFs are trusted not to abuse the VLAN resource.
      
      For Counters:
        For all functions (PF and VFs), the quota is 128 and the guarantee is 0.
      
      In this patch, we define the needed structures, which are added to the
      resource-tracker struct.  In addition, we do initialization
      for the resource quota, and adjust the query_device response to use quotas
      rather than resource maxima.
      
      As part of the implementation, we introduce a new field in
      mlx4_dev: quotas.  This field holds the resource quotas used
      to report maxima to the upper layers (ib_core, via query_device).
      
      The HCA maxima of these values are passed to the VFs (via
      QUERY_HCA) so that they may continue to use these in handling
      QPs, CQs, SRQs and MPTs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a0d0a61
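      
      The quota arithmetic above can be illustrated with a standalone C
      sketch (the numbers are examples; this is not the resource-tracker
      code itself):
      
        #include <stdio.h>
        
        int main(void)
        {
            int max_qps  = 64 * 1024; /* example HCA maximum for a resource */
            int num_vfs  = 7;         /* mlx4_core "num_vfs" parameter      */
            int entities = num_vfs + 1;             /* all VFs plus the PF  */
        
            /* 50% is split equally as the guaranteed-minimum pool */
            int guaranteed = (max_qps / 2) / entities;
            /* the other 50% is the first-come-first-serve free pool */
            int free_pool  = max_qps / 2;
            /* per-entity quota = free-pool amount + guaranteed minimum */
            int quota      = free_pool + guaranteed;
        
            printf("guaranteed=%d free_pool=%d quota=%d\n",
                   guaranteed, free_pool, quota);
            return 0;
        }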
  3. 18 Oct 2013, 2 commits
  4. 14 Oct 2013, 1 commit
  5. 02 Aug 2013, 1 commit
  6. 29 Jul 2013, 1 commit
  7. 02 Jul 2013, 1 commit
  8. 14 Jun 2013, 1 commit
  9. 05 Jun 2013, 1 commit
  10. 12 May 2013, 1 commit
  11. 27 Apr 2013, 4 commits
  12. 25 Apr 2013, 2 commits
  13. 08 Apr 2013, 2 commits
  14. 08 Mar 2013, 1 commit
  15. 26 Feb 2013, 2 commits
  16. 01 Feb 2013, 2 commits
  17. 20 Dec 2012, 1 commit
    • mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV · 7b8157be
      Authored by Jack Morgenstein
      Separate flow steering capability detection from the decision to activate.
      
      For the master (and for native), detect the flow steering capability
      in mlx4_dev_cap, but activate the appropriate steering type in a new
      function choose_flow_steering() based on detected data.
      
      For VFs, activate flow steering based on what was actually activated
      by the master, where that info is obtained via QUERY_HCA. This fixes
      the current VF detection which is wrongly based on QUERY_DEV_CAP.
      
      Also, for SR-IOV mode, if flow steering may be activated, do so only
      if the max number of QPs per rule is sufficient to satisfy one
      subscription per VF.  If not, fall back to B0 mode. This is needed to
      serve registrations done by L2 network drivers such as mlx4_en and
      IPoIB when the network stack attempts to join multicast groups such
      as all-hosts or the IPoIB broadcast group.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      7b8157be
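      
      A hedged sketch of the selection logic described above; the enum
      values, struct fields, and the exact "one subscription per VF"
      threshold are illustrative assumptions, not the real mlx4 code:
      
        #include <stdbool.h>
        
        enum steering_mode { STEERING_A0, STEERING_B0, STEERING_DMFS };
        
        struct detected_caps {
            bool dmfs_supported;   /* device-managed steering detected  */
            bool b0_supported;     /* B0 (L2 hashing) steering detected */
            int  max_qps_per_rule; /* QPs one steering rule can target  */
        };
        
        static enum steering_mode
        choose_flow_steering(const struct detected_caps *caps, int num_vfs)
        {
            bool sriov = num_vfs > 0;
        
            /* in SR-IOV mode, require room for one subscription per VF
             * (assumed here to mean num_vfs + 1, counting the PF too) */
            if (caps->dmfs_supported &&
                (!sriov || caps->max_qps_per_rule >= num_vfs + 1))
                return STEERING_DMFS;
            if (caps->b0_supported)
                return STEERING_B0;    /* fall back to B0 mode */
            return STEERING_A0;
        }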
  18. 27 Nov 2012, 1 commit
    • mlx4: 64-byte CQE/EQE support · 08ff3235
      Authored by Or Gerlitz
      ConnectX-3 devices can use either 64- or 32-byte completion queue
      entries (CQEs) and event queue entries (EQEs).  Using 64-byte
      EQEs/CQEs performs better because each entry is aligned to a complete
      cacheline.  This patch queries the HCA's capabilities, and if it
      supports 64-byte CQEs and EQEs, the driver will configure the HW to
      work in 64-byte mode.
      
      The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.
      
      Since this mode is global, userspace (libmlx4) must be updated to work
      with the configured CQE size, and guests using SR-IOV virtual
      functions need to know both EQE and CQE size.
      
      In case one of the 64-byte CQE/EQE capabilities is activated, the
      patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
      command (e.g., as done in mlx4_core of Linux 3.3 through 3.6) will
      notice that they need an update to be able to work with the PPF. This
      is done by changing the returned pf_context_behaviour so that it is no
      longer zero. In case none of these capabilities is activated, that
      value remains zero and older guest drivers can run OK.
      
      The SR-IOV related flow is as follows:
      
      1. the PPF does the detection of the new capabilities using
         QUERY_DEV_CAP command.
      
      2. the PPF activates the new capabilities using INIT_HCA.
      
      3. the VF detects if the PPF activated the capabilities using
         QUERY_HCA, and if this is the case activates them for itself too.
      
      Note that the VF detects that it must be aware of the new PF behaviour
      using QUERY_FUNC_CAP.  Steps 1 and 2 also apply in native mode.
      
      User space notification is done through a new field introduced in
      struct mlx4_ib_ucontext which holds device capabilities for which user
      space must take action. This changes the binary interface, so the ABI
      towards libmlx4 exposed through uverbs is bumped from 3 to 4, but only
      when needed, i.e., only when the driver actually uses 64-byte CQEs or
      future device capabilities that user space must be in sync with. This
      practice allows unmodified libmlx4 to keep working on older devices
      (e.g., A0, B0) which don't support 64-byte CQEs.
      
      In order to keep existing systems functional when they update to a
      newer kernel that contains these changes in VF and userspace ABI, a
      module parameter enable_64b_cqe_eqe must be set to enable 64-byte
      mode; the default is currently false.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      08ff3235
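      
      A small sketch of the gating rule above: 64-byte mode is requested
      only when the firmware reports the capability and the module
      parameter is set. The capability bit values are hypothetical
      placeholders, not the real QUERY_DEV_CAP encoding:
      
        #include <stdbool.h>
        #include <stdint.h>
        
        #define CAP_64B_CQE (1U << 0)    /* hypothetical capability bit */
        #define CAP_64B_EQE (1U << 1)    /* hypothetical capability bit */
        
        static bool enable_64b_cqe_eqe;  /* module parameter, default false */
        
        static uint32_t choose_init_hca_flags(uint32_t dev_cap_flags)
        {
            uint32_t flags = 0;
        
            if (enable_64b_cqe_eqe && (dev_cap_flags & CAP_64B_CQE))
                flags |= CAP_64B_CQE;    /* ask INIT_HCA for 64-byte CQEs */
            if (enable_64b_cqe_eqe && (dev_cap_flags & CAP_64B_EQE))
                flags |= CAP_64B_EQE;    /* ask INIT_HCA for 64-byte EQEs */
            return flags;
        }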
  19. 01 Oct 2012, 7 commits
  20. 12 Jul 2012, 2 commits
    • mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them · 6634961c
      Authored by Jack Morgenstein
      To allow easy paravirtualization of P_Key and GID table sizes, keep
      paravirtualized sizes in mlx4_dev->caps, but save the actual physical
      sizes from FW in the struct mlx4_dev->phys_caps.
      
      In addition, in SR-IOV mode, do the following:
      
      1. Reduce reported P_Key table size by 1.
         This is done to reserve the highest P_Key index for internal use,
         for declaring an invalid P_Key in P_Key paravirtualization.
         We require a P_Key index which always contains an invalid P_Key
         value for this purpose (i.e., one which cannot be modified by
         the subnet manager).  The way to do this is to reduce the
         P_Key table size reported to the subnet manager by 1, so that
         it will not attempt to access the P_Key at index #127.
      
      2. Paravirtualize the GID table size to 1. Thus, each guest sees
         only a single GID (at its paravirtualized index 0).
      
      In addition, since we are paravirtualizing the GID table size to 1, we
      add paravirtualization of the master GID event here (i.e., we do not
      do ib_dispatch_event() for the GUID change event on the master, since
      its (only) GUID never changes).
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      6634961c
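      
      The two paravirtualization rules above fit in a few lines of C;
      this is a standalone sketch with illustrative field names, not
      the driver's actual structures:
      
        struct phys_port_caps {
            int pkey_tbl_len;   /* real size from FW, e.g. 128 */
            int gid_tbl_len;    /* real size from FW           */
        };
        
        struct virt_port_caps {
            int pkey_tbl_len;
            int gid_tbl_len;
        };
        
        static struct virt_port_caps
        paravirtualize(const struct phys_port_caps *phys)
        {
            struct virt_port_caps v;
        
            /* hide the top index (e.g. #127), reserved for the invalid P_Key */
            v.pkey_tbl_len = phys->pkey_tbl_len - 1;
            /* each guest sees a single GID, at its paravirtualized index 0 */
            v.gid_tbl_len  = 1;
            return v;
        }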
    • mlx4_core: Allow guests to have IB ports · 105c320f
      Authored by Jack Morgenstein
      Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
      mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
      pass that to the guests (this is done in QUERY_FUNC_CAP and its
      wrapper).
      
      However, we don't activate IB support quite yet -- we leave the error
      return at the start of mlx4_ib_add in the mlx4_ib driver.
      
      In addition, we set the "protected fmr supported" bit to zero in the
      QUERY_FUNC_CAP wrapper.
      
      Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
      checks for the port type (IB or Ethernet).  Previously, this was not
      an issue, since only Ethernet ports were supported.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      105c320f
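      
      A sketch of the wrapper-side port-type branch mentioned above;
      the enum and the flag value are illustrative assumptions:
      
        enum port_type { PORT_TYPE_ETH, PORT_TYPE_IB };
        
        #define FUNC_CAP_FLAG_RDMA (1U << 6) /* hypothetical "rdma-supported" bit */
        
        static unsigned int build_func_cap_flags(enum port_type type)
        {
            unsigned int flags = 0;
        
            if (type == PORT_TYPE_IB)
                flags |= FUNC_CAP_FLAG_RDMA;  /* guest may use an IB port */
            /* the "protected fmr supported" bit is deliberately left zero */
            return flags;
        }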
  21. 11 Jul 2012, 1 commit
    • mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Authored by Jack Morgenstein
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events, smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the fw-generated-event event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accommodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      00f5ce99
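      
      Why "param" had to become unsigned long: the same argument now
      carries either a small scalar or a pointer to the EQE. A hedged
      sketch with stand-in types (not the real mlx4 definitions):
      
        #include <stdint.h>
        
        struct eqe_view { uint8_t type; uint8_t subtype; }; /* stand-in for mlx4_eqe */
        
        enum demo_event { EVENT_PORT_UP, EVENT_PORT_MGMT_CHANGE };
        
        static void dispatch_event(enum demo_event event, unsigned long param)
        {
            if (event == EVENT_PORT_UP) {
                int port = (int)param;                  /* scalar payload  */
                (void)port;
            } else {
                const struct eqe_view *eqe =
                        (const struct eqe_view *)param; /* pointer payload */
                (void)eqe;
            }
        }
        
        /* caller: dispatch_event(EVENT_PORT_MGMT_CHANGE, (unsigned long)eqe); */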
  22. 09 Jul 2012, 1 commit
  23. 08 Jul 2012, 2 commits
    • {NET, IB}/mlx4: Add device managed flow steering firmware API · 0ff1fb65
      Authored by Hadar Hen Zion
      The driver is modified to support three operation modes.
      
      If supported by firmware, use the device managed flow steering
      API, which we call device managed steering mode. Else, if
      the firmware supports the B0 steering mode, use it; and finally,
      if none of the above, use the A0 steering mode.
      
      When the steering mode is device managed, the code is modified
      such that L2 based rules set by the mlx4_en driver for Ethernet
      unicast and multicast, and the IB stack multicast attach calls
      done through the mlx4_ib driver are all routed to use the device
      managed API.
      
      When attaching a rule using the device managed flow steering API,
      the firmware returns a 64 bit registration id, which is to be
      provided during detach.
      
      Currently the firmware is always programmed during HCA initialization
      to use standard L2 hashing. Future work should be done to allow
      configuring the flow-steering hash function with common,
      non-proprietary means.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ff1fb65
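      
      The attach/detach contract described above, as a usage sketch;
      the function names are illustrative, not the real firmware API:
      
        #include <stdint.h>
        
        static uint64_t next_id = 1;  /* the firmware assigns ids in reality */
        
        static int flow_attach(/* rule description elided */ uint64_t *reg_id)
        {
            *reg_id = next_id++; /* 64-bit registration id returned to caller */
            return 0;
        }
        
        static int flow_detach(uint64_t reg_id)
        {
            (void)reg_id;        /* the rule is torn down by its id */
            return 0;
        }
        
        static void example(void)
        {
            uint64_t id;
        
            if (flow_attach(&id) == 0)
                flow_detach(id); /* always detach with the id attach returned */
        }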
    • net/mlx4: Set steering mode according to device capabilities · c96d97f4
      Authored by Hadar Hen Zion
      Instead of checking the firmware supported steering mode in various
      places in the code, add a dedicated field in the mlx4 device capabilities
      structure which is written once during the initialization flow and read
      across the code.
      
      This also sets the grounds for adding new steering modes. Currently two
      modes are supported, and they are named after the ConnectX HW versions
      A0 and B0.
      
      A0 steering uses mac_index, vlan_index and priority to steer traffic
      into pre-defined range of QPs.
      
      B0 steering uses Ethernet L2 hashing rules and is enabled only
      if the firmware supports both unicast and multicast B0 steering.
      
      The current steering modes are relevant for Ethernet traffic only,
      so that InfiniBand steering remains untouched.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c96d97f4
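      
      The "write once during init, read everywhere" pattern described
      above, sketched with illustrative names:
      
        enum steer_mode { STEER_A0, STEER_B0 };
        
        struct dev_caps {
            enum steer_mode steering_mode; /* decided once at init time */
        };
        
        static void set_steering_mode(struct dev_caps *caps,
                                      int fw_uc_steer, int fw_mc_steer)
        {
            /* B0 only if FW supports both unicast and multicast B0 steering */
            caps->steering_mode = (fw_uc_steer && fw_mc_steer) ? STEER_B0
                                                               : STEER_A0;
        }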