1. 01 10月, 2012 6 次提交
    • O
      IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV · b9c5d6a6
      Oren Duer 提交于
      MCG paravirtualization support includes:
      - Creating multicast groups by VFs, and keeping accounting of them
      - Leaving multicast groups by VFs
      - Updating SM only with real changes in the overall picture of MCGs status
      - Creation of MGID=0 groups (let SM choose MGID)
      
      Note that the MCG module maintains its own internal MCG object
      reference counts.  The reason for this is that the IB core is used to
      track only the multicast groups joins generated by the PF it runs
      over.  The PF IB core layer is unaware of slaves, so it cannot be used
      to keep track of MCG joins they generate.
      Signed-off-by: NOren Duer <oren@mellanox.co.il>
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b9c5d6a6
    • J
      mlx4: MAD_IFC paravirtualization · 0a9a0188
      Jack Morgenstein 提交于
      The MAD_IFC firmware command fulfills two functions.
      
      First, it is used in the QP0/QP1 MAD-handling flow to obtain
      information from the FW (for answering queries), and for setting
      variables in the HCA (MAD SET packets).
      
      For this, MAD_IFC should provide the FW (physical) view of the data.
      This is the view that OpenSM needs.  We call this the "network view".
      
      In the second case, MAD_IFC is used by various verbs to obtain data
      regarding the local HCA (e.g., ib_query_device()).  We call this the
      "host view".
      
      This data needs to be paravirtualized.
      
      MAD_IFC therefore needs a wrapper function, and also needs another
      flag indicating whether it should provide the network view (when it is
      called by ib_process_mad in special-qp packet handling), or the host
      view (when it is called while implementing a verb).
      
      There are currently 2 flag parameters in mlx4_MAD_IFC already:
      ignore_bkey and ignore_mkey.  These two parameters are replaced by a
      single "mad_ifc_flags" parameter, with different bits set for each
      flag.  A third flag is added: "network-view/host-view".
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      0a9a0188
    • J
      IB/mlx4: SR-IOV multiplex and demultiplex MADs · 37bfc7c1
      Jack Morgenstein 提交于
      Special QPs are paravirtualized.
      
      vHCAs are not given direct access to QP0/1. Rather, these QPs are
      operated by a special context hosted by the PF, which mediates access
      to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
      QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
      PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
      pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
      VL15 and are denied of the well-known QKey.
      
      Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
      the real special QP).
      
      Incoming messages are "multiplexed" (i.e. steered by the PPF to the
      correct VF or to the PF)
      
      QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
      QP0s, but they never receive any SMPs and all SMPs sent are discarded.
      QP1 traffic is allowed for all vHCAs, but special care is required to
      bridge the gap between the host and network views.
      
      Specifically:
      - Transaction IDs are mapped to guarantee uniqueness among vHCAs
      - CM para-virtualization
        o   Incoming requests are steered to the correct vHCA according to the embedded GID
        o   Local communication IDs are mapped to ensure uniqueness among vHCAs
        (see the patch that adds CM paravirtualization.)
      - Multicast para-virtualization
        o   The PF context aggregates membership state from all vHCAs
        o   The SA is contacted only when the aggregate membership changes
        o   If the aggregate does not change, the PF context will provide the
            requesting vHCA with the proper response.
        (see the patch that adds multicast group paravirtualization)
      
      Incoming MADs are steered according to:
      - the DGID If a GRH is present
      - the mapped transaction ID for response MADs
      - the embedded GID in CM requests
      - the remote communication ID in other CM messages
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      37bfc7c1
    • J
      mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Jack Morgenstein 提交于
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Secondly, initialize the P_Key virtualization table to its default
      values: Master virtualized table is 1-1 with the real P_Key table,
      guest virtualized table has P_Key index 0 mapped to the real P_Key
      index 0, and all the other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      54679e14
    • J
      IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d
      Jack Morgenstein 提交于
      Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
      on the master.
      
      This has two parts.  The first part is to initialize the structures to
      contain the contexts.  This is done at master startup time in
      mlx4_ib_init_sriov().
      
      The second part is to actually create the tunneling resources required
      on the master to support a slave.  This is performed the master
      detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
      generated when a slave initializes its comm channel).
      
      For the master, there is no such startup event, so it creates its own
      tunneling resources when it starts up.  In addition, the master also
      creates the real special QPs.  The ib_core layer on the master causes
      creation of proxy special QPs, since the master is also
      paravirtualized at the ib_core layer.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      fc06573d
    • J
      IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support · 1ffeb2eb
      Jack Morgenstein 提交于
      1. Introduce the basic SR-IOV parvirtualization context objects for
         multiplexing and demultiplexing MADs.
      2. Introduce support for the new proxy and tunnel QP types.
      
      This patch introduces the objects required by the master for managing
      QP paravirtualization for guests.
      
      struct mlx4_ib_sriov is created by the master only.
      It is a container for the following:
      
      1. All the info required by the PPF to multiplex and de-multiplex MADs
         (including those from the PF). (struct mlx4_ib_demux_ctx demux)
      2. All the info required to manage alias GUIDs (i.e., the GUID at
         index 0 that each guest perceives.  In fact, this is not the GUID
         which is actually at index 0, but is, in fact, the GUID which is at
         index[<VF number>] in the physical table.
      3. structures which are used to manage CM paravirtualization
      4. structures for managing the real special QPs when running in SR-IOV
         mode.  The real SQPs are controlled by the PPF in this case.  All
         SQPs created and controlled by the ib core layer are proxy SQP.
      
      struct mlx4_ib_demux_ctx contains the information per port needed
      to manage paravirtualization:
      
      1. All multicast paravirt info
      2. All tunnel-qp paravirt info for the port.
      3. GUID-table and GUID-prefix for the port
      4. work queues.
      
      struct mlx4_ib_demux_pv_ctx contains all the info for managing the
      paravirtualized QPs for one slave/port.
      
      struct mlx4_ib_demux_pv_qp contains the info need to run an individual
      QP (either tunnel qp or real SQP).
      
      Note:  We made use of the 2 most significant bits in enum
      mlx4_ib_qp_flags (based on enum ib_qp_create_flags in ib_verbs.h).
      We need these bits in the low-level driver for internal purposes.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      1ffeb2eb
  2. 15 9月, 2012 2 次提交
  3. 08 9月, 2012 1 次提交
  4. 17 8月, 2012 1 次提交
  5. 16 8月, 2012 2 次提交
  6. 11 8月, 2012 2 次提交
    • R
      RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs · d549f55f
      Roland Dreier 提交于
      If CONFIG_VLAN_8021Q is not set, then vlan_dev_real_dev() just goes BUG(),
      so we shouldn't call it unless we're actually dealing with a VLAN netdev.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      d549f55f
    • J
      IB/mlx4: Fix possible deadlock on sm_lock spinlock · df7fba66
      Jack Morgenstein 提交于
      The sm_lock spinlock is taken in the process context by
      mlx4_ib_modify_device, and in the interrupt context by update_sm_ah,
      so we need to take that spinlock with irqsave, and release it with
      irqrestore.
      
      Lockdeps reports this as follows:
      
          [ INFO: inconsistent lock state ]
          3.5.0+ #20 Not tainted
          inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
          swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
          (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          {HARDIRQ-ON-W} state was registered at:
            [<ffffffff810b84a0>] mark_irqflags+0x120/0x190
            [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
            [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
            [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
            [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib]
            [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core]
            [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core]
            [<ffffffff8136a150>] dev_attr_store+0x20/0x30
            [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170
            [<ffffffff8118da38>] vfs_write+0xc8/0x190
            [<ffffffff8118dc01>] sys_write+0x51/0x90
            [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b
      
          ...
          *** DEADLOCK ***
      
          1 lock held by swapper/0/0:
      
          stack backtrace:
          Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20
          Call Trace:
          <IRQ>  [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190
          [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210
          [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280
          [<ffffffff810b8290>] mark_lock+0x150/0x240
          [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190
          [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core]
          [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810c27c3>] ? is_module_address+0x23/0x30
          [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib]
          [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib]
          [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70
          [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core]
          [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core]
      
      Reported by: Or Gerlitz <ogerlitz@mellanox.com>
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      df7fba66
  7. 30 7月, 2012 1 次提交
  8. 28 7月, 2012 1 次提交
  9. 20 7月, 2012 4 次提交
  10. 19 7月, 2012 1 次提交
  11. 18 7月, 2012 1 次提交
  12. 12 7月, 2012 4 次提交
  13. 11 7月, 2012 2 次提交
    • M
      IB/qib: Fix sparse RCU warnings in qib_keys.c · 7e230177
      Mike Marciniszyn 提交于
      Commit 8aac4cc3 ("IB/qib: RCU locking for MR validation") introduced
      new sparse warnings in qib_keys.c.
      Acked-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      7e230177
    • J
      mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Jack Morgenstein 提交于
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the fw-generated-event event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accomodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      00f5ce99
  14. 09 7月, 2012 6 次提交
  15. 08 7月, 2012 2 次提交
    • H
      {NET, IB}/mlx4: Add device managed flow steering firmware API · 0ff1fb65
      Hadar Hen Zion 提交于
      The driver is modified to support three operation modes.
      
      If supported by firmware use the device managed flow steering
      API, that which we call device managed steering mode. Else, if
      the firmware supports the B0 steering mode use it, and finally,
      if none of the above, use the A0 steering mode.
      
      When the steering mode is device managed, the code is modified
      such that L2 based rules set by the mlx4_en driver for Ethernet
      unicast and multicast, and the IB stack multicast attach calls
      done through the mlx4_ib driver are all routed to use the device
      managed API.
      
      When attaching rule using device managed flow steering API,
      the firmware returns a 64 bit registration id, which is to be
      provided during detach.
      
      Currently the firmware is always programmed during HCA initialization
      to use standard L2 hashing. Future work should be done to allow
      configuring the flow-steering hash function with common, non
      proprietary means.
      Signed-off-by: NHadar Hen Zion <hadarh@mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ff1fb65
    • R
      RDMA/ocrdma: Fix assignment of max_srq_sge in device query · d1e09ebf
      Roland Dreier 提交于
      We want to set attr->max_srq_sge to dev->attr.max_srq_sge, not to itself.
      
      This was detected by Coverity (CID 709210).
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      d1e09ebf
  16. 05 7月, 2012 1 次提交
  17. 15 6月, 2012 1 次提交
  18. 12 6月, 2012 2 次提交