1. 01 Oct, 2012 11 commits
    • mlx4: Activate SR-IOV mode for IB · 026149cb
      Committed by Jack Morgenstein
      Remove the error returns for IB ports from mlx4_ib_add,
      mlx4_INIT_PORT_wrapper, and mlx4_CLOSE_PORT_wrapper.
      
      Currently, SR-IOV is supported only for devices for which the
      link layer is IB on all ports; RoCE support will be added later.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e
      Committed by Jack Morgenstein
      1. Allow only master to change node description.
      2. Prevent AH leakage in send mads.
      3. Take device part number from PCI structure, so that guests see the
         VF part number (and not the PF part number).
      4. Place the device revision ID into caps structure at startup.
      5. SET_PORT in update_gids_task needs to go through wrapper on master.
      6. In mlx4_ib_event(), PORT_MGMT_EVENT needs to be handled in a work
         queue on the master, since it propagates events to slaves using
         GEN_EQE.
      7. Do not support FMR on slaves.
      8. Add spinlock to slave_event(), since it is called both in interrupt
         context and in process context (due to 6 above, and also if
         smp_snoop is used).  This fix was found and implemented by Saeed
         Mahameed <saeedm@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode · 980e9001
      Committed by Jack Morgenstein
      Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
      transitions to RTR, or transitions to ERR/RESET respectively.
      
      In SR-IOV mode, however, the master is also paravirtualized.  This in
      turn requires that we not do INIT_PORT until the entire QP0 path (real
      QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
      we should indicate that the port is not active.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • net/mlx4_core: Adjustments to SET_PORT for IB SR-IOV · efcd235d
      Committed by Jack Morgenstein
      1. Slaves may not set the IS_SM capability for the port.
      2. DEV_MGMT may not be set in multifunction mode.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/mlx4: Propagate P_Key and guid change port management events to slaves · 2a4fae14
      Committed by Jack Morgenstein
      P_Key change and guid change events are not of interest to all slaves,
      but only to those slaves which "see" the table slots whose contents
      have changed.
      
      For example, if the guid at port 1, index 5 has changed in the PPF, we
      wish to propagate the gid-change event only to the function which has
      that guid index mapped to its port/guid table (in this case it is
      slave #5). Other functions should not get the event, since the event
      does not affect them.
      
      Similarly with P_Keys -- P_Key change events are forwarded only to
      slaves which have that P_Key index mapped to their virtual P_Key table.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
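The filtering rule in this commit message can be sketched in Python. This is only an illustration: the function name, the dictionary layout, and the one-slot-per-slave example map are assumptions, not mlx4 driver code.

```python
def slaves_to_notify(visible_indices, changed_index):
    """Return the sorted slave ids whose virtual table maps the changed index.

    visible_indices: dict mapping slave id -> set of physical table indices
    (hypothetical layout; the driver keeps its own structures).
    """
    return sorted(slave for slave, indices in visible_indices.items()
                  if changed_index in indices)


# Assumed example map: each slave sees only the GUID slot equal to its own id.
guid_map = {slave: {slave} for slave in range(8)}

# A change at physical index 5 is forwarded to slave 5 only.
affected = slaves_to_notify(guid_map, 5)
```

The same lookup would apply to P_Key changes, with the map built from each slave's virtual P_Key table instead.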
    • mlx4: Add alias_guid mechanism · a0c64a17
      Committed by Jack Morgenstein
      For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
      GUID at index 0 seen by a slave is the actual GUID occupying the GUID
      table at the slave-id index.
      
      The driver, by default, requests at startup time that subnet manager
      populate its entire guid table with GUIDs. These guids are then mapped
      (paravirtualized) to the slaves, and appear for each slave as its GUID
      at index 0.
      
      Until each slave has such a guid, its port status is DOWN.
      
      The guid table is cached to support special QP paravirtualization, and
      event propagation to slaves on guid change (we test to see if the guid
      really changed before propagating an event to the slave).
      
      To support this caching, add capability to __mlx4_ib_query_gid() to
      obtain the network view (i.e., physical view) gid at index X, not just
      the host (paravirtualized) view.
      
      Based on a patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
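A minimal Python model of the two views described in this commit message, assuming the physical GUID table is a plain list and that the slave id doubles as the physical index. `query_guid` and its parameters are hypothetical stand-ins, not the signature of __mlx4_ib_query_gid(), and the GUID values are made up.

```python
def query_guid(phys_guid_table, slave_id, index, network_view=False):
    """Return a GUID in either the network (physical) or host view.

    network_view=True returns the physical entry at `index`.
    Otherwise only index 0 is valid, and it resolves to the physical
    entry stored at the slave-id index.
    """
    if network_view:
        return phys_guid_table[index]
    if index != 0:
        raise IndexError("a paravirtualized function sees only GUID index 0")
    return phys_guid_table[slave_id]


# Made-up GUID values, for illustration only.
phys_table = [0x1000 + i for i in range(4)]

host_guid = query_guid(phys_table, slave_id=2, index=0)
net_guid = query_guid(phys_table, slave_id=2, index=3, network_view=True)
```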
    • mlx4_core: Add IB port-state machine and port mgmt event propagation · 993c401e
      Committed by Jack Morgenstein
      For an IB port, a slave should not show port active until that slave
      has a valid alias-guid (provided by the subnet manager).  Therefore
      the port-up event should be passed to a slave only after both the port
      is up, and the slave's alias-guid has been set.
      
      Also, provide the infrastructure for propagating port-management
      events (client-reregister, etc) to slaves.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
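The port-up condition described in this commit message amounts to a two-input state machine. The sketch below is an assumption-laden model: the class, the event strings, and the "UP"/"DOWN" return values are hypothetical and do not correspond to driver identifiers.

```python
class SlavePortState:
    """Tracks whether a slave should see its IB port as active."""

    def __init__(self):
        self.port_up = False
        self.guid_valid = False

    @property
    def active(self):
        # Active only when the physical port is up AND the alias GUID is set.
        return self.port_up and self.guid_valid

    def handle(self, event):
        """Apply an event; return "UP"/"DOWN" if the slave must be told, else None."""
        was_active = self.active
        if event == "PORT_UP":
            self.port_up = True
        elif event == "PORT_DOWN":
            self.port_up = False
        elif event == "GUID_VALID":
            self.guid_valid = True
        elif event == "GUID_INVALID":
            self.guid_valid = False
        else:
            raise ValueError(f"unknown event: {event}")
        if self.active and not was_active:
            return "UP"
        if was_active and not self.active:
            return "DOWN"
        return None
```

In this model a physical port-up alone forwards nothing; the slave is notified only once the second condition is also met.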
    • mlx4: MAD_IFC paravirtualization · 0a9a0188
      Committed by Jack Morgenstein
      The MAD_IFC firmware command fulfills two functions.
      
      First, it is used in the QP0/QP1 MAD-handling flow to obtain
      information from the FW (for answering queries), and for setting
      variables in the HCA (MAD SET packets).
      
      For this, MAD_IFC should provide the FW (physical) view of the data.
      This is the view that OpenSM needs.  We call this the "network view".
      
      In the second case, MAD_IFC is used by various verbs to obtain data
      regarding the local HCA (e.g., ib_query_device()).  We call this the
      "host view".
      
      This data needs to be paravirtualized.
      
      MAD_IFC therefore needs a wrapper function, and also needs another
      flag indicating whether it should provide the network view (when it is
      called by ib_process_mad in special-qp packet handling), or the host
      view (when it is called while implementing a verb).
      
      There are currently 2 flag parameters in mlx4_MAD_IFC already:
      ignore_bkey and ignore_mkey.  These two parameters are replaced by a
      single "mad_ifc_flags" parameter, with different bits set for each
      flag.  A third flag is added: "network-view/host-view".
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
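Folding several boolean parameters into one flags word, as this commit message describes for "mad_ifc_flags", can be sketched as follows. The flag names and bit values are assumptions chosen for illustration; the commit message does not state the actual constants.

```python
from enum import IntFlag


class MadIfcFlags(IntFlag):
    # Hypothetical names and bit positions.
    IGNORE_MKEY = 1
    IGNORE_BKEY = 2
    NET_VIEW = 4   # set: network (physical) view; clear: host view


def mad_ifc_flags(ignore_mkey, ignore_bkey, network_view):
    """Combine the three booleans into a single flags value."""
    flags = MadIfcFlags(0)
    if ignore_mkey:
        flags |= MadIfcFlags.IGNORE_MKEY
    if ignore_bkey:
        flags |= MadIfcFlags.IGNORE_BKEY
    if network_view:
        flags |= MadIfcFlags.NET_VIEW
    return flags


# Special-QP MAD handling would ask for the network view;
# a verb such as a device query would leave NET_VIEW clear.
special_qp_flags = mad_ifc_flags(True, True, True)
verb_flags = mad_ifc_flags(True, True, False)
```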
    • mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Committed by Jack Morgenstein
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Secondly, initialize the P_Key virtualization table to its default
      values: Master virtualized table is 1-1 with the real P_Key table,
      guest virtualized table has P_Key index 0 mapped to the real P_Key
      index 0, and all the other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
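The default P_Key virtualization tables described in this commit message can be modelled in a few lines of Python. The table length of 128 is an assumption inferred from the reserved index 127, and the function names are hypothetical.

```python
INVALID_PKEY_INDEX = 127  # reserved (invalid) P_Key slot named in the commit message


def default_virt_pkey_table(is_master, table_len=128):
    """Build the default virtual-to-real P_Key index map for one function.

    table_len=128 is an assumed physical table size.
    """
    if is_master:
        # Master: virtual index i maps 1-1 to real index i.
        return list(range(table_len))
    # Guest: index 0 maps to real index 0, everything else to the invalid slot.
    return [0] + [INVALID_PKEY_INDEX] * (table_len - 1)


def real_pkey_index(virt_table, guest_index):
    """Translate the index inserted by the guest into the real P_Key index."""
    return virt_table[guest_index]


master_table = default_virt_pkey_table(is_master=True)
guest_table = default_virt_pkey_table(is_master=False)
```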
    • IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d
      Committed by Jack Morgenstein
      Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
      on the master.
      
      This has two parts.  The first part is to initialize the structures to
      contain the contexts.  This is done at master startup time in
      mlx4_ib_init_sriov().
      
      The second part is to actually create the tunneling resources required
      on the master to support a slave.  This is performed when the master
      detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
      generated when a slave initializes its comm channel).
      
      For the master, there is no such startup event, so it creates its own
      tunneling resources when it starts up.  In addition, the master also
      creates the real special QPs.  The ib_core layer on the master causes
      creation of proxy special QPs, since the master is also
      paravirtualized at the ib_core layer.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • mlx4_core: Add proxy and tunnel QPs to the reserved QP area · e2c76824
      Committed by Jack Morgenstein
      In addition, pass the proxy and tunnel QP numbers to slaves so the
      driver can perform special QP paravirtualization.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  2. 14 Sep, 2012 1 commit
  3. 08 Sep, 2012 5 commits
  4. 16 Aug, 2012 3 commits
  5. 15 Aug, 2012 1 commit
  6. 04 Aug, 2012 3 commits
  7. 26 Jul, 2012 2 commits
  8. 20 Jul, 2012 1 commit
    • mlx4_en: map entire pages to increase throughput · 4cce66cd
      Committed by Thadeu Lima de Souza Cascardo
      In its receive path, mlx4_en driver maps each page chunk that it pushes
      to the hardware and unmaps it when pushing it up the stack. This limits
      throughput to about 3Gbps on a Power7 8-core machine.
      
      One solution is to map the entire allocated page at once. However, this
      requires that we keep track of every page fragment we give to a
      descriptor. We also need to work with the discipline that all fragments will
      be released (in the sense that they will not be reused by the driver
      anymore) in the order they are allocated to the driver.
      
      This requires that we don't reuse any fragments; every single one of
      them must be reallocated. We do that by releasing all the fragments that
      are processed, and only after we have finished processing the
      descriptors do we start the refill.
      
      We also must somehow guarantee that we either refill all fragments in a
      descriptor or none at all, without resorting to giving up a page
      fragment that we would have already given. Otherwise, we would break the
      discipline of only releasing the fragments in the order they were
      allocated.
      
      This has passed page allocation fault injections (restricted to the
      driver by using required-start and required-end) and device hotplug
      while 16 TCP streams were able to deliver more than 9Gbps.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
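The two disciplines described in this commit message (release fragments strictly in allocation order, and refill a descriptor entirely or not at all) can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the class, the integer fragment ids, and the fixed capacity that stands in for page allocation failure. It does not model DMA mapping or the real driver structures.

```python
from collections import deque


class FragmentPool:
    """Toy model of in-order release and all-or-nothing descriptor refill."""

    def __init__(self, capacity):
        self.free = capacity        # fragments that can still be handed out
        self.in_flight = deque()    # fragment ids in allocation order
        self.next_id = 0

    def refill_descriptor(self, nfrags):
        """Give out nfrags fragments, or none at all if they cannot all be had."""
        if self.free < nfrags:
            return None             # check first, so nothing has to be given back
        frags = []
        for _ in range(nfrags):
            frags.append(self.next_id)
            self.in_flight.append(self.next_id)
            self.next_id += 1
        self.free -= nfrags
        return frags

    def release(self, frag):
        """Release a fragment; only the oldest outstanding one is accepted."""
        if not self.in_flight or self.in_flight[0] != frag:
            raise ValueError("fragments must be released in allocation order")
        self.in_flight.popleft()
        self.free += 1
```

Checking availability before handing out the first fragment is what avoids giving up a fragment that was already given, which would otherwise break the in-order release rule.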
  9. 19 Jul, 2012 3 commits
  10. 17 Jul, 2012 2 commits
  11. 12 Jul, 2012 5 commits
    • mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them · 6634961c
      Committed by Jack Morgenstein
      To allow easy paravirtualization of P_Key and GID table sizes, keep
      paravirtualized sizes in mlx4_dev->caps, but save the actual physical
      sizes from FW in struct: mlx4_dev->phys_cap.
      
      In addition, in SR-IOV mode, do the following:
      
      1. Reduce reported P_Key table size by 1.
         This is done to reserve the highest P_Key index for internal use,
         for declaring an invalid P_Key in P_Key paravirtualization.
         We require a P_Key index which always contains an invalid P_Key
         value for this purpose (i.e., one which cannot be modified by
         the subnet manager).  The way to do this is to reduce the
         P_Key table size reported to the subnet manager by 1, so that
         it will not attempt to access the P_Key at index #127.
      
      2. Paravirtualize the GID table size to 1. Thus, each guest sees
         only a single GID (at its paravirtualized index 0).
      
      In addition, since we are paravirtualizing the GID table size to 1, we
      add paravirtualization of the master GID event here (i.e., we do not
      do ib_dispatch_event() for the GUID change event on the master, since
      its (only) GUID never changes).
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
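The split between physical and reported table sizes can be summarised in a short Python sketch. The function and dictionary keys are hypothetical, and the physical sizes passed in the example are assumed values rather than figures from the commit.

```python
def reported_caps(phys_pkey_len, phys_gid_len, sriov):
    """Return the table sizes a function is told about.

    Physical sizes are kept separately; in SR-IOV mode the top P_Key slot
    is reserved as an always-invalid entry and each function sees one GID.
    """
    if not sriov:
        return {"pkey_table_len": phys_pkey_len, "gid_table_len": phys_gid_len}
    return {"pkey_table_len": phys_pkey_len - 1, "gid_table_len": 1}


# Assumed physical sizes of 128 entries each.
native = reported_caps(128, 128, sriov=False)
virtualized = reported_caps(128, 128, sriov=True)
```

With a 128-entry physical table, the reported length of 127 keeps index 127 out of reach of the subnet manager, matching the reserved index named in the commit message.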
    • mlx4_core: Allow guests to have IB ports · 105c320f
      Committed by Jack Morgenstein
      Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
      mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
      pass that to the guests (this is done in QUERY_FUNC_CAP and its
      wrapper).
      
      However, we don't activate IB support quite yet -- we leave the error
      return at the start of mlx4_ib_add in the mlx4_ib driver.
      
      In addition, set "protected fmr supported" bit to zero in the
      QUERY_FUNC_CAP wrapper.
      
      Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
      checks for the port type (IB or Ethernet).  Previously, this was not
      an issue, since only Ethernet ports were supported.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • mlx4_core: Implement mechanism for reserved Q_Keys · 396f2feb
      Committed by Jack Morgenstein
      The SR-IOV special QP tunneling mechanism uses proxy special QPs
      (instead of the real special QPs) for MADs on guests.  These proxy QPs
      send their packets to a "tunnel" QP owned by the master.  The master
      then forwards the MAD (after any required paravirtualization) to the
      real special QP, which sends out the MAD.
      
      For security reasons (i.e., to prevent guests from sending MADs to
      tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
      assigned a unique, reserved, Q_Key.  These Q_Keys are available only
      for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
      with other QPs, it will fail.
      
      This patch introduces a mechanism for reserving a block of 64K Q_Keys
      for proxy/tunneling use.
      
      The patch also introduces two new fields into mlx4_dev: base_sqpn and
      base_tunnel_sqpn.
      
      In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
      are added to the reserved QPN area (so that they will not change).
      There are 8 special QPs per port in the HCA, and each of them is
      assigned both a proxy and a tunnel QP, for each VF and for the PF as
      well in SR-IOV mode.
      
      The QPNs for these QPs are arranged as follows:
       1. The real SQP numbers (8)
       2. The proxy SQPs (8 * (max number of VFs + max number of PFs))
       3. The tunnel SQPs (8 * (max number of VFs + max number of PFs))
      
      To support these QPs, two new fields are added to struct mlx4_dev:
      
        base_sqp:  this is the QP number of the first of the real SQPs
        base_tunnel_sqp: this is the qp number of the first qp in the tunnel
                         sqp region. (On guests, this is the first tunnel
                         sqp of the 8 which are assigned to that guest).
      
      In addition, in SR-IOV mode, sqp_start is the number of the first
      proxy SQP in the proxy SQP region.  (In guests, this is the first
      proxy SQP of the 8 which are assigned to that guest)
      
      Note that in non-SR-IOV mode, there are no proxies and no tunnels.
      In this case, sqp_start is set to sqp_base -- which minimizes code
      changes.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
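The QPN arrangement listed in this commit message (8 real SQPs, then the proxy region, then the tunnel region) can be worked through numerically. The sketch below assumes that each function's block of 8 is laid out contiguously in function-index order inside its region; the commit message does not state that ordering, and all names and the starting QPN are hypothetical.

```python
SQPS_PER_FUNCTION = 8  # from the commit message


def sqp_layout(first_reserved_qpn, num_functions):
    """Return (real_base, proxy_base, tunnel_base) for the reserved SQP area.

    num_functions stands for max VFs + max PFs.
    """
    real_base = first_reserved_qpn
    proxy_base = real_base + SQPS_PER_FUNCTION
    tunnel_base = proxy_base + SQPS_PER_FUNCTION * num_functions
    return real_base, proxy_base, tunnel_base


def function_sqp_bases(layout, func_index):
    """Return (first proxy SQP, first tunnel SQP) for one function.

    Assumes per-function blocks are ordered by function index.
    """
    _, proxy_base, tunnel_base = layout
    offset = SQPS_PER_FUNCTION * func_index
    return proxy_base + offset, tunnel_base + offset


# Made-up starting QPN of 0x40 and three functions in total.
layout = sqp_layout(0x40, 3)
func1_bases = function_sqp_bases(layout, 1)
```

With these made-up inputs the real SQPs start at 64, the proxy region at 72, and the tunnel region at 96, so function 1 would get proxy SQPs from 80 and tunnel SQPs from 104.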
    • net/mlx4_core: Free ICM table in case of error · 240a9207
      Committed by Dotan Barak
      In mlx4_init_icm_table(), free the allocated table if we fail to
      allocate memory for its entries.
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • mlx4_core: Remove double function declarations · f457ce47
      Committed by Dotan Barak
      Spotted four duplicate declarations in icm.h, remove them.
      Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  12. 11 Jul, 2012 2 commits
    • net/mlx4_core: Initialize IB port capabilities for all slaves · 2aca1172
      Committed by Jack Morgenstein
      With IB SR-IOV, each slave has its own separate copy of the port
      capabilities flags.  For example, the master can run a subnet manager
      (which causes the IsSM bit to be set in the master's port
      capabilities) without affecting the port capabilities seen by the
      slaves (the IsSM bit will be seen as cleared in the slaves).
      
      Also add a static inline mlx4_master_func_num() to enhance readability
      of the code.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
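Keeping a separate capability word per function, as this commit message describes, can be modelled like this. The class, the method names, and the bit position used for IsSM are assumptions for illustration only.

```python
IS_SM = 1 << 1  # assumed bit position for the IsSM capability


class PortCaps:
    """One capability-flags word per function (master and slaves) for a port."""

    def __init__(self, num_functions, initial=0):
        self.caps = [initial] * num_functions

    def set_bits(self, func, bits):
        self.caps[func] |= bits

    def clear_bits(self, func, bits):
        self.caps[func] &= ~bits


MASTER = 0  # hypothetical function number for the master

port_caps = PortCaps(num_functions=4)
port_caps.set_bits(MASTER, IS_SM)  # e.g. a subnet manager starts on the master
```

Because each function has its own word, setting IsSM for the master leaves the slaves' copies untouched.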
    • mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Committed by Jack Morgenstein
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events, smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the FW-generated-event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accommodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
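The either-or choice between firmware events and smp_snoop, with both paths sharing one event-generation routine, can be sketched as follows. This is a simplified model under stated assumptions: the function names and the string event labels are hypothetical, and the real driver works on MADs and EQEs rather than lists.

```python
def port_mgmt_events(changes):
    """Common path: turn detected changes into a sorted list of event names."""
    return sorted(set(changes))


def process_set_portinfo(changes, fw_reports_changes):
    """Return (events_from_fw, events_from_snoop) for one incoming SET MAD.

    Exactly one of the two lists is populated, so no event is duplicated.
    """
    if fw_reports_changes:
        # Capability bit set: the FW raises the events; snooping is skipped.
        return port_mgmt_events(changes), []
    # No FW events: the snoop path produces them through the same routine.
    return [], port_mgmt_events(changes)


with_fw = process_set_portinfo(["LID", "PKEY"], fw_reports_changes=True)
without_fw = process_set_portinfo(["GID"], fw_reports_changes=False)
```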
  13. 09 Jul, 2012 1 commit