1. 16 Apr, 2015 1 commit
  2. 19 Mar, 2015 1 commit
  3. 26 Jan, 2015 1 commit
  4. 11 Aug, 2014 1 commit
  5. 30 May, 2014 1 commit
    • J
      IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses · 97982f5a
      Jack Morgenstein committed
      Currently, SR-IOV VFs are denied QP0 access.  The main reason
      for this decision is security, since Subnet Management Datagrams
      (SMPs) are not restricted by network partitioning and may affect the
      physical network topology.  Moreover, even the SM may be denied access
      from portions of the network by setting management keys unknown to the
      SM.
      
      However, it is desirable to grant SMI access to certain privileged
      VFs, so that certain network management activities may be conducted
      within virtual machines instead of the hypervisor.
      
      This commit does the following:
      
      1. Create QP0 tunnel QPs for all VFs.
      
      2. Discard SMI mads sent-from/received-for non-privileged VFs in the
         hypervisor MAD multiplex/demultiplex logic.  SMI mads from/for
         privileged VFs are allowed to pass.
      
      3. MAD_IFC wrapper changes/fixes.  For non-privileged VFs, only
         host-view MAD_IFC commands are allowed, and only for SMI LID-Routed
         GET mads.  For privileged VFs, there are no restrictions.
      
      This commit does not allow privileged VFs as yet.  To determine if a VF
      is privileged, it calls function mlx4_vf_smi_enabled().  This function
      returns 0 unconditionally for now.
      
      The next two commits allow defining and activating privileged VFs.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      97982f5a
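The filtering rules described in commit 97982f5a can be sketched as below. This is a minimal illustration, not the driver's code: `struct smi_mad`, `vf_smi_enabled()`, `smi_mad_may_pass()` and `mad_ifc_allowed()` are hypothetical names, and the master (PF) is not modeled. The stand-in for `mlx4_vf_smi_enabled()` returns 0 unconditionally, as the commit message states.

```c
#include <stdbool.h>

/* Hypothetical description of an SMI MAD, for illustration only. */
struct smi_mad {
    bool lid_routed;  /* true for LID-routed, false for directed-route */
    bool is_get;      /* true if the method is GET */
    bool host_view;   /* true if the MAD_IFC call asks for the host view */
};

/* Stand-in for mlx4_vf_smi_enabled(): returns 0 for every VF for now. */
static int vf_smi_enabled(int slave, int port)
{
    (void)slave;
    (void)port;
    return 0;
}

/* Mux/demux rule: SMI MADs pass only for privileged VFs. */
static bool smi_mad_may_pass(int slave, int port)
{
    return vf_smi_enabled(slave, port) != 0;
}

/*
 * MAD_IFC wrapper rule: privileged VFs are unrestricted; other VFs may
 * issue only host-view, LID-routed GET MADs.
 */
static bool mad_ifc_allowed(int slave, int port, const struct smi_mad *m)
{
    if (vf_smi_enabled(slave, port))
        return true;
    return m->host_view && m->lid_routed && m->is_get;
}
```

With the stand-in returning 0, every VF is treated as non-privileged, so a SET MAD or a network-view request is rejected by `mad_ifc_allowed()`.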
  6. 21 Mar, 2014 1 commit
    • M
      net/mlx4: Adapt code for N-Port VF · 449fc488
      Matan Barak committed
      Adds support for N-Port VFs. This includes:
      1. Adding support in the wrapped FW command
      	In wrapped commands, we need to verify and convert
      	the slave's port into the real physical port.
      	Furthermore, when sending the response back to the slave,
      	a reverse conversion should be made.
      2. Adjusting sqpn for QP1 para-virtualization
      	The slave assumes that sqpn is used for QP1 communication.
      	If the slave is assigned to a port != (first port), we need
      	to adjust the sqpn that will direct its QP1 packets into the
      	correct endpoint.
      3. Adjusting gid[5] to modify the port for raw ethernet
      	In B0 steering, gid[5] contains the port. It needs
      	to be adjusted into the physical port.
      4. Adjusting number of ports in the query / ports caps in the FW commands
      	When a slave queries the hardware, it needs to view only
      	the physical ports it's assigned to.
      5. Adjusting the sched_qp according to the port number
      	The QP port is encoded in the sched_qp, thus in modify_qp we need
      	to encode the correct port in sched_qp.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      449fc488
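The port conversion in item 1 of commit 449fc488 (slave port to physical port on the way in, and the reverse on the way out) might look roughly like this. The sketch assumes, purely for illustration, that a slave owns a contiguous range of physical ports; `struct slave_ports` and both function names are hypothetical and do not come from the driver.

```c
/* Hypothetical per-slave port assignment: a contiguous physical range. */
struct slave_ports {
    int first_phys_port;  /* first physical port owned by the slave (1-based) */
    int num_ports;        /* number of ports the slave sees */
};

/* Convert a slave-relative port (1-based) to a physical port, or -1. */
static int slave_to_phys_port(const struct slave_ports *sp, int slave_port)
{
    if (slave_port < 1 || slave_port > sp->num_ports)
        return -1;
    return sp->first_phys_port + slave_port - 1;
}

/* Reverse conversion, used when building the response for the slave. */
static int phys_to_slave_port(const struct slave_ports *sp, int phys_port)
{
    int p = phys_port - sp->first_phys_port + 1;

    if (p < 1 || p > sp->num_ports)
        return -1;
    return p;
}
```

A single-port VF assigned to physical port 2 would then see that port as its port 1, and any request naming a port outside its range is rejected.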
  7. 13 Mar, 2014 3 commits
  8. 01 Aug, 2013 1 commit
    • J
      IB/mlx4: Use default pkey when creating tunnel QPs · 3eac103f
      Jack Morgenstein committed
      When creating tunnel QPs for special QP tunneling, look for the
      default pkey in the slave's virtual pkey table.  If it is present, use
      the real pkey index where the default pkey is located.
      
      If the default pkey is not found in the pkey table, use the real pkey
      index which is stored at index 0 in the slave's virtual pkey table
      (this is the current behavior).
      
      This change is required to support cloud computing, where the
      paravirtualized index of the default pkey is moved to index 1 or
      higher.  The pkey at paravirtualized index 0 is used for the default
      IPoIB interface created by the VF.
      
      It's possible for the pkey value at paravirtualized index 0 to be
      invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey
      index 127, which contains pkey = 0).
      
      At some point after the VF probe, the cloud computing interface at the
      hypervisor maps virtual index 0 for the VF to the pkey index
      containing the pkey that IPoIB will use in its operation.  However,
      when the tunnel QP is created, the pkey at the slave's virtual index 0
      is still mapped to the invalid pkey index, so tunnel QP creation
      fails.
      
      This commit causes the hypervisor to search for the default pkey in
      the slave's pkey table -- and this pkey is present in the table (at
      index > 0) at tunnel QP creation time, so that the tunnel QP creation
      will succeed.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      3eac103f
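The lookup described in commit 3eac103f can be sketched as follows. This is an illustration under stated assumptions: the virtual table is modeled as an array mapping virtual index to real index, the default P_Key is matched on its 15-bit base value (0x7FFF, i.e. 0xFFFF with full membership), and `tunnel_qp_pkey_index()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_PKEY_BASE 0x7FFF  /* default P_Key without the membership bit */

/*
 * Return the real P_Key index to use for a slave's tunnel QP.
 * virt_to_phys[v] is the real index behind the slave's virtual index v;
 * phys_pkeys[] holds the P_Key values of the physical table.
 */
static int tunnel_qp_pkey_index(const uint8_t *virt_to_phys, size_t n_virt,
                                const uint16_t *phys_pkeys)
{
    for (size_t v = 0; v < n_virt; v++) {
        uint8_t real = virt_to_phys[v];

        if ((phys_pkeys[real] & 0x7FFF) == DEFAULT_PKEY_BASE)
            return real;
    }
    /* Default P_Key not visible to the slave: fall back to virtual index 0. */
    return virt_to_phys[0];
}
```

In the cloud scenario from the message, virtual index 0 still points at the invalid entry (real index 127) at probe time, but the default P_Key is found at a higher virtual index, so tunnel QP creation gets a valid real index.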
  9. 08 May, 2013 1 commit
  10. 17 Apr, 2013 1 commit
  11. 16 Feb, 2013 1 commit
  12. 19 Oct, 2012 1 commit
    • J
      IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF) · 2c75d2cc
      Jack Morgenstein committed
      In the MAD paravirtualization code, one of the checks performed when
      forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the
      P_Key received in the MAD must be present in the guest's paravirtualized
      P_Key table, and at least one of the (packet P_Key, guest P_Key) must
      be a full-membership P_Key.
      
      However, if everyone involved has only limited membership in the
      default P_Key, then packets sent by full-member remote hosts arrive at
      the PPF but are not passed on to the VFs with the current P_Key check.
      
      Fix this as follows:
      
      1. Don't care if P_Key received over wire is full or not. If it
         successfully passed HW checks on the real QP1, then simply pass it
         to guest regardless of whether the guest has full or limited
         membership in its P_Key table.
      
      2. If the guest (including paravirtualized master) has both full and
         limited P_Key forms in its table, preferentially pass the
         paravirtualized P_Key index of the full P_Key form in the tunnel
         header.
      
      3. In the multicast join flow (mlx4/mcg.c), use the index for the
         default P_Key (wherever it is located) in replies generated from
         within the mcg module (previously, P_Key index 0 was used in all
         cases).
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      2c75d2cc
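Points 1 and 2 of commit 2c75d2cc amount to matching on the 15-bit P_Key base and preferring a full-membership entry when the guest has both forms. A minimal sketch, assuming the guest table is a plain array of 16-bit P_Key values with bit 15 as the membership bit; `guest_pkey_index()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Find the guest (paravirtualized) P_Key index to place in the tunnel
 * header.  Membership of the wire P_Key is ignored; among matching guest
 * entries a full-membership one (bit 15 set) is preferred.
 * Returns -1 if the guest has no entry with the same base value.
 */
static int guest_pkey_index(const uint16_t *guest_pkeys, size_t n,
                            uint16_t wire_pkey)
{
    int limited = -1;

    for (size_t i = 0; i < n; i++) {
        uint16_t base = guest_pkeys[i] & 0x7FFF;

        if (base == 0 || base != (wire_pkey & 0x7FFF))
            continue;            /* invalid entry or different partition */
        if (guest_pkeys[i] & 0x8000)
            return (int)i;       /* full member: best match */
        if (limited < 0)
            limited = (int)i;    /* remember first limited match */
    }
    return limited;
}
```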
  13. 01 Oct, 2012 13 commits
    • J
      IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes · 3806d08c
      Jack Morgenstein committed
      When we have VFs and PFs on the same host, the VFs are activated within
      the mlx4_core module before the mlx4_ib kernel module is loaded.
      
      When the mlx4_ib module initializes the PF (master), it now creates
      MAD paravirtualization contexts for any VFs that are already active.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      3806d08c
    • J
      mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Jack Morgenstein committed
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations
      regarding proxy or tunnel QPs to use.  This change frees the PPF from
      needing to adhere to a specific order in allocating proxy and tunnel
      QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-sriov mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
          base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      47605df9
    • J
      mlx4: Paravirtualize Node Guids for slaves · afa8fd1d
      Jack Morgenstein committed
      This is necessary in order to support > 1 VF/PF in a VM for software
      that uses the node guid as a discriminator, such as librdmacm.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      afa8fd1d
    • J
      IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e
      Jack Morgenstein committed
      1. Allow only master to change node description.
      2. Prevent AH leakage in send mads.
      3. Take device part number from PCI structure, so that guests see the
         VF part number (and not the PF part number).
      4. Place the device revision ID into caps structure at startup.
      5. SET_PORT in update_gids_task needs to go through wrapper on master.
      6. In mlx4_ib_event(), PORT_MGMT_EVENT needs to be handled in a work
         queue on the master, since it propagates events to slaves using
         GEN_EQE.
      7. Do not support FMR on slaves.
      8. Add spinlock to slave_event(), since it is called both in interrupt
         context and in process context (due to 6 above, and also if
         smp_snoop is used).  This fix was found and implemented by Saeed
         Mahameed <saeedm@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      992e8e6e
    • J
      IB/mlx4: Add iov directory in sysfs under the ib device · c1e7e466
      Jack Morgenstein committed
      This directory is added only for the master -- slaves do not have it.
      
      The sysfs iov directory is used to manage and examine the port P_Key
      and guid paravirtualization.
      
      Under iov/ports, the administrator may examine the gid and P_Key tables
      as they are present in the device (and as are seen in the "network
      view" presented to the SM).
      
      Under the iov/<pci slot number> directories, the admin may map the
      index numbers in the physical tables (as under iov/ports) to the
      paravirtualized index numbers that guests see.
      
      For example, if the administrator, for port 1 on guest 2 maps physical
      pkey index 10 to virtual index 1, then that guest, whenever it uses
      its pkey index 1, will actually be using the real pkey index 10.
      
      Based on patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      c1e7e466
    • J
      IB/mlx4: Propagate P_Key and guid change port management events to slaves · 2a4fae14
      Jack Morgenstein committed
      P_Key change and guid change events are not of interest to all slaves,
      but only to those slaves which "see" the table slots whose contents
      have changed.
      
      For example, if the guid at port 1, index 5 has changed in the PPF, we
      wish to propagate the gid-change event only to the function which has
      that guid index mapped to its port/guid table (in this case it is
      slave #5). Other functions should not get the event, since the event
      does not affect them.
      
      Similarly with P_Keys -- P_Key change events are forwarded only to
      slaves which have that P_Key index mapped to their virtual P_Key table.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      2a4fae14
    • J
      mlx4: Add alias_guid mechanism · a0c64a17
      Jack Morgenstein committed
      For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
      GUID at index 0 seen by a slave is the actual GUID occupying the GUID
      table at the slave-id index.
      
      The driver, by default, requests at startup time that the subnet manager
      populate its entire guid table with GUIDs. These guids are then mapped
      (paravirtualized) to the slaves, and appear for each slave as its GUID
      at index 0.
      
      Until each slave has such a guid, its port status is DOWN.
      
      The guid table is cached to support special QP paravirtualization, and
      event propagation to slaves on guid change (we test to see if the guid
      really changed before propagating an event to the slave).
      
      To support this caching, add capability to __mlx4_ib_query_gid() to
      obtain the network view (i.e., physical view) gid at index X, not just
      the host (paravirtualized) view.
      
      Based on a patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      a0c64a17
    • A
      IB/mlx4: Add CM paravirtualization · 3cf69cc8
      Amir Vadai committed
      In CM para-virtualization:
      
      1. Incoming requests are steered to the correct vHCA according to the
         embedded GID.
      2. Communication IDs on outgoing requests are replaced by a globally
         unique ID, generated by the PPF, since there is no synchronization
         of ID generation between guests (and so these IDs are not
         guaranteed to be globally unique).  The guest's comm ID is stored,
         and is returned to the response MAD when it arrives.
      Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      3cf69cc8
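The communication-ID replacement in item 2 of commit 3cf69cc8 can be pictured as a small translation table kept by the PPF. The sketch below is an assumption-laden illustration: the fixed-size array, the counter-based ID generation and all names (`struct cm_id_map`, `cm_map_outgoing()`, `cm_map_incoming()`) are hypothetical and say nothing about how the driver stores or allocates IDs.

```c
#include <stdint.h>

#define MAX_IDS 16

/* Hypothetical PPF-side table of guest CM IDs, keyed by the global ID. */
struct cm_id_map {
    uint32_t next_id;              /* next globally unique ID to hand out */
    struct {
        uint32_t global_id;
        uint32_t guest_id;
        int slave;
        int used;
    } e[MAX_IDS];
};

/* Outgoing request: store the guest's ID and return a unique replacement. */
static int cm_map_outgoing(struct cm_id_map *m, int slave, uint32_t guest_id,
                           uint32_t *global_id)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (!m->e[i].used) {
            m->e[i].used = 1;
            m->e[i].slave = slave;
            m->e[i].guest_id = guest_id;
            m->e[i].global_id = m->next_id++;
            *global_id = m->e[i].global_id;
            return 0;
        }
    }
    return -1;  /* table full */
}

/* Incoming response: recover the owning slave and its original ID. */
static int cm_map_incoming(const struct cm_id_map *m, uint32_t global_id,
                           int *slave, uint32_t *guest_id)
{
    for (int i = 0; i < MAX_IDS; i++) {
        if (m->e[i].used && m->e[i].global_id == global_id) {
            *slave = m->e[i].slave;
            *guest_id = m->e[i].guest_id;
            return 0;
        }
    }
    return -1;  /* unknown ID */
}
```

Two guests that happen to pick the same local ID therefore appear on the wire with different IDs, and each response is routed back to the right guest with its own ID restored.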
    • O
      IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV · b9c5d6a6
      Oren Duer committed
      MCG paravirtualization support includes:
      - Creating multicast groups by VFs, and keeping accounting of them
      - Leaving multicast groups by VFs
      - Updating SM only with real changes in the overall picture of MCGs status
      - Creation of MGID=0 groups (let SM choose MGID)
      
      Note that the MCG module maintains its own internal MCG object
      reference counts.  The reason for this is that the IB core is used to
      track only the multicast groups joins generated by the PF it runs
      over.  The PF IB core layer is unaware of slaves, so it cannot be used
      to keep track of MCG joins they generate.
      Signed-off-by: Oren Duer <oren@mellanox.co.il>
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      b9c5d6a6
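The accounting idea in commit b9c5d6a6 (update the SM only when the overall picture changes) can be reduced to a per-group membership count. This is a simplified sketch: real join state carries more than a boolean per function, and `struct mcg_group`, `mcg_join()` and `mcg_leave()` are hypothetical names.

```c
#include <stdbool.h>

#define MAX_FUNCS 8

/* Hypothetical per-group accounting kept by the PF for all functions. */
struct mcg_group {
    bool member[MAX_FUNCS];  /* which functions (PF/VFs) have joined */
    unsigned int count;      /* number of joined functions */
};

/* Returns true only if the SM must be contacted (first member joined). */
static bool mcg_join(struct mcg_group *g, int func)
{
    if (g->member[func])
        return false;        /* already a member: nothing changes */
    g->member[func] = true;
    return g->count++ == 0;
}

/* Returns true only if the SM must be contacted (last member left). */
static bool mcg_leave(struct mcg_group *g, int func)
{
    if (!g->member[func])
        return false;        /* was not a member: nothing changes */
    g->member[func] = false;
    return --g->count == 0;
}
```

When a call returns false, the PF can answer the requesting function locally without involving the SM.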
    • J
      mlx4: MAD_IFC paravirtualization · 0a9a0188
      Jack Morgenstein committed
      The MAD_IFC firmware command fulfills two functions.
      
      First, it is used in the QP0/QP1 MAD-handling flow to obtain
      information from the FW (for answering queries), and for setting
      variables in the HCA (MAD SET packets).
      
      For this, MAD_IFC should provide the FW (physical) view of the data.
      This is the view that OpenSM needs.  We call this the "network view".
      
      In the second case, MAD_IFC is used by various verbs to obtain data
      regarding the local HCA (e.g., ib_query_device()).  We call this the
      "host view".
      
      This data needs to be paravirtualized.
      
      MAD_IFC therefore needs a wrapper function, and also needs another
      flag indicating whether it should provide the network view (when it is
      called by ib_process_mad in special-qp packet handling), or the host
      view (when it is called while implementing a verb).
      
      There are currently 2 flag parameters in mlx4_MAD_IFC already:
      ignore_bkey and ignore_mkey.  These two parameters are replaced by a
      single "mad_ifc_flags" parameter, with different bits set for each
      flag.  A third flag is added: "network-view/host-view".
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      0a9a0188
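The move from two boolean parameters to one flags word in commit 0a9a0188 follows the usual bit-flag pattern. The sketch below shows that pattern only; the constant names and bit positions (`IFC_IGNORE_MKEY`, `IFC_IGNORE_BKEY`, `IFC_NET_VIEW`) are assumptions for illustration and are not taken from the driver headers.

```c
/* Hypothetical flag bits for a combined "mad_ifc_flags" parameter. */
enum {
    IFC_IGNORE_MKEY = 1 << 0,
    IFC_IGNORE_BKEY = 1 << 1,
    IFC_NET_VIEW    = 1 << 2,  /* set: network view, clear: host view */
};

/* Build the flags word from the former separate parameters. */
static unsigned int mad_ifc_flags(int ignore_mkey, int ignore_bkey,
                                  int net_view)
{
    unsigned int flags = 0;

    if (ignore_mkey)
        flags |= IFC_IGNORE_MKEY;
    if (ignore_bkey)
        flags |= IFC_IGNORE_BKEY;
    if (net_view)
        flags |= IFC_NET_VIEW;
    return flags;
}

/* A wrapper would paravirtualize the result only for host-view calls. */
static int wants_host_view(unsigned int flags)
{
    return (flags & IFC_NET_VIEW) == 0;
}
```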
    • J
      IB/mlx4: SR-IOV multiplex and demultiplex MADs · 37bfc7c1
      Jack Morgenstein committed
      Special QPs are paravirtualized.
      
      vHCAs are not given direct access to QP0/1. Rather, these QPs are
      operated by a special context hosted by the PF, which mediates access
      to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
      QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
      PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
      pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
      VL15 and are denied the well-known QKey.
      
      Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
      the real special QP).
      
      Incoming messages are "multiplexed" (i.e., steered by the PPF to the
      correct VF or to the PF).
      
      QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
      QP0s, but they never receive any SMPs and all SMPs sent are discarded.
      QP1 traffic is allowed for all vHCAs, but special care is required to
      bridge the gap between the host and network views.
      
      Specifically:
      - Transaction IDs are mapped to guarantee uniqueness among vHCAs
      - CM para-virtualization
        o   Incoming requests are steered to the correct vHCA according to the embedded GID
        o   Local communication IDs are mapped to ensure uniqueness among vHCAs
        (see the patch that adds CM paravirtualization.)
      - Multicast para-virtualization
        o   The PF context aggregates membership state from all vHCAs
        o   The SA is contacted only when the aggregate membership changes
        o   If the aggregate does not change, the PF context will provide the
            requesting vHCA with the proper response.
        (see the patch that adds multicast group paravirtualization)
      
      Incoming MADs are steered according to:
      - the DGID if a GRH is present
      - the mapped transaction ID for response MADs
      - the embedded GID in CM requests
      - the remote communication ID in other CM messages
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      37bfc7c1
    • J
      mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Jack Morgenstein committed
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Secondly, initialize the P_Key virtualization table to its default
      values: Master virtualized table is 1-1 with the real P_Key table,
      guest virtualized table has P_Key index 0 mapped to the real P_Key
      index 0, and all the other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      54679e14
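The default P_Key virtualization table from commit 54679e14 can be written out directly. A minimal sketch, assuming a 128-entry table (inferred from the reserved index 127 named in the message); `init_virt_pkey_table()` and the macro names are hypothetical.

```c
#include <stdint.h>

#define PKEY_TABLE_LEN   128  /* assumed length; index 127 is the reserved one */
#define INVALID_PKEY_IDX 127  /* real index that always holds an invalid P_Key */

/*
 * Fill a virtual-to-real P_Key index table with its default values:
 * the master maps 1-1; a guest maps index 0 to real index 0 and every
 * other index to the reserved invalid entry.
 */
static void init_virt_pkey_table(uint8_t table[PKEY_TABLE_LEN], int is_master)
{
    for (int i = 0; i < PKEY_TABLE_LEN; i++) {
        if (is_master || i == 0)
            table[i] = (uint8_t)i;
        else
            table[i] = INVALID_PKEY_IDX;
    }
}
```

An administrator remapping a guest entry later (for example, virtual index 1 to real index 10) would simply overwrite the corresponding slot.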
    • J
      IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d
      Jack Morgenstein committed
      Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
      on the master.
      
      This has two parts.  The first part is to initialize the structures to
      contain the contexts.  This is done at master startup time in
      mlx4_ib_init_sriov().
      
      The second part is to actually create the tunneling resources required
      on the master to support a slave.  This is performed when the master
      detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
      generated when a slave initializes its comm channel).
      
      For the master, there is no such startup event, so it creates its own
      tunneling resources when it starts up.  In addition, the master also
      creates the real special QPs.  The ib_core layer on the master causes
      creation of proxy special QPs, since the master is also
      paravirtualized at the ib_core layer.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      fc06573d
  14. 11 Aug, 2012 1 commit
    • J
      IB/mlx4: Fix possible deadlock on sm_lock spinlock · df7fba66
      Jack Morgenstein committed
      The sm_lock spinlock is taken in the process context by
      mlx4_ib_modify_device, and in the interrupt context by update_sm_ah,
      so we need to take that spinlock with irqsave, and release it with
      irqrestore.
      
      Lockdep reports this as follows:
      
          [ INFO: inconsistent lock state ]
          3.5.0+ #20 Not tainted
          inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
          swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
          (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          {HARDIRQ-ON-W} state was registered at:
            [<ffffffff810b84a0>] mark_irqflags+0x120/0x190
            [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
            [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
            [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
            [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib]
            [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core]
            [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core]
            [<ffffffff8136a150>] dev_attr_store+0x20/0x30
            [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170
            [<ffffffff8118da38>] vfs_write+0xc8/0x190
            [<ffffffff8118dc01>] sys_write+0x51/0x90
            [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b
      
          ...
          *** DEADLOCK ***
      
          1 lock held by swapper/0/0:
      
          stack backtrace:
          Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20
          Call Trace:
          <IRQ>  [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190
          [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210
          [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280
          [<ffffffff810b8290>] mark_lock+0x150/0x240
          [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190
          [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core]
          [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810c27c3>] ? is_module_address+0x23/0x30
          [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib]
          [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib]
          [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70
          [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core]
          [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core]
      
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      df7fba66
  15. 12 Jul, 2012 1 commit
    • J
      mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them · 6634961c
      Jack Morgenstein committed
      To allow easy paravirtualization of P_Key and GID table sizes, keep
      paravirtualized sizes in mlx4_dev->caps, but save the actual physical
      sizes from FW in struct: mlx4_dev->phys_cap.
      
      In addition, in SR-IOV mode, do the following:
      
      1. Reduce reported P_Key table size by 1.
         This is done to reserve the highest P_Key index for internal use,
         for declaring an invalid P_Key in P_Key paravirtualization.
         We require a P_Key index which always contains an invalid P_Key
         value for this purpose (i.e., one which cannot be modified by
         the subnet manager).  The way to do this is to reduce the
         P_Key table size reported to the subnet manager by 1, so that
         it will not attempt to access the P_Key at index #127.
      
      2. Paravirtualize the GID table size to 1. Thus, each guest sees
         only a single GID (at its paravirtualized index 0).
      
      In addition, since we are paravirtualizing the GID table size to 1, we
      add paravirtualization of the master GID event here (i.e., we do not
      do ib_dispatch_event() for the GUID change event on the master, since
      its (only) GUID never changes).
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      6634961c
  16. 11 Jul, 2012 1 commit
    • J
      mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Jack Morgenstein committed
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events, smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the fw-generated-event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accommodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      00f5ce99
  17. 09 Jul, 2012 1 commit
  18. 31 Jan, 2012 1 commit
    • J
      IB/mlx4: pass SMP vendor-specific attribute MADs to firmware · a6f7feae
      Jack Morgenstein committed
      In the current code, vendor-specific MADs (e.g with the FDR-10
      attribute) are silently dropped by the driver, resulting in timeouts
      at the sending side and inability to query/configure the relevant
      feature.  However, the ConnectX firmware is able to handle such MADs.
      For unsupported attributes, the firmware returns a GET_RESPONSE MAD
      containing an error status.
      
      For example, for a FDR-10 node with LID 11:
      
          # ibstat mlx4_0 1
      
          CA: 'mlx4_0'
          Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 40 (FDR10)
          Base lid: 11
          LMC: 0
          SM lid: 24
          Capability mask: 0x02514868
          Port GUID: 0x0002c903002e65d1
          Link layer: InfiniBand
      
      The Extended Port Query (EPI) vendor MAD times out before the patch:
      
          # smpquery MEPI 11 -d
      
          ibwarn: [4196] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
          ibwarn: [4196] _do_madrpc: retry 1 (timeout 1000 ms)
          ibwarn: [4196] _do_madrpc: retry 2 (timeout 1000 ms)
          ibwarn: [4196] _do_madrpc: timeout after 3 retries, 3000 ms
          ibwarn: [4196] mad_rpc: _do_madrpc failed; dport (Lid 11)
          smpquery: iberror: [pid 4196] main: failed: operation EPI: ext port info query failed
      
      EPI query works OK with the patch:
      
          # smpquery MEPI 11 -d
      
          ibwarn: [6548] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
          ibwarn: [6548] mad_rpc: data offs 64 sz 64
          mad data
          0000 0000 0000 0001 0000 0001 0000 0001
          0000 0000 0000 0000 0000 0000 0000 0000
          0000 0000 0000 0000 0000 0000 0000 0000
          0000 0000 0000 0000 0000 0000 0000 0000
          # Ext Port info: Lid 11 port 0
          StateChangeEnable:...............0x00
          LinkSpeedSupported:..............0x01
          LinkSpeedEnabled:................0x01
          LinkSpeedActive:.................0x01
      Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Acked-by: Ira Weiny <weiny2@llnl.gov>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      a6f7feae
  19. 14 Dec, 2011 1 commit
  20. 19 Jul, 2011 1 commit
  21. 11 Jan, 2011 1 commit
  22. 26 Oct, 2010 1 commit
    • E
      IB/mlx4: Add support for IBoE · fa417f7b
      Eli Cohen committed
      Add support for IBoE to mlx4_ib.  The bulk of the code is handling the
      new address vector fields; mlx4 needs the MAC address of a remote node
      to include it in a WQE (for datagrams) or in the QP context (for
      connected QPs).  Address resolution is done by assuming all unicast
      GIDs are link-local IPv6 addresses.
      
      Multicast group attach/detach needs to update the NIC's multicast
      filters; but since attaching a QP to a multicast group can be done
      before the QP is bound to a port, for IBoE we need to keep track of
      all multicast groups that a QP is attached to before it transitions
      from INIT to RTR (since it does not have a port in the INIT state).
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      
      [ Many things cleaned up and otherwise monkeyed with; hope I didn't
        introduce too many bugs.  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      fa417f7b
  23. 30 Mar, 2010 1 commit
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo committed
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  24. 29 Jan, 2009 1 commit
    • M
      IB/mlx4: Fix dispatch of IB_EVENT_LID_CHANGE event · f0f6f346
      Moni Shoua committed
      When snooping a PortInfo MAD, its client_reregister bit is checked.
      If the bit is ON then a CLIENT_REREGISTER event is dispatched,
      otherwise a LID_CHANGE event is dispatched.  This decision logic
      ignores the cases where the MAD changes the LID along with an
      instruction to reregister (so a necessary LID_CHANGE event won't be
      dispatched) or the MAD is neither of these (and an unnecessary
      LID_CHANGE event will be dispatched).
      
      This causes problems at least with IPoIB, which will do a "light"
      flush on reregister, rather than the "heavy" flush required due to a
      LID change.
      
      Fix this by dispatching a CLIENT_REREGISTER event if the
      client_reregister bit is set, but also compare the LID in the MAD to
      the current LID.  If and only if they are not identical then a
      LID_CHANGE event is dispatched.
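
The fixed decision logic can be sketched as two independent checks. The function name `sketch_portinfo_events` and the flag constants are hypothetical and only model the rule stated above; they are not the driver's actual identifiers or event-dispatch API.

```c
#include <stdint.h>

/* Hypothetical event flags for the illustration (not kernel constants). */
#define SKETCH_EV_CLIENT_REREGISTER 0x1u
#define SKETCH_EV_LID_CHANGE        0x2u

/*
 * Decide which events a snooped PortInfo MAD should trigger.
 * The two conditions are evaluated independently, so a MAD that both
 * requests reregistration and changes the LID yields both events, and a
 * MAD that does neither yields none.
 */
unsigned int sketch_portinfo_events(int client_reregister,
                                    uint16_t mad_lid, uint16_t cur_lid)
{
    unsigned int events = 0;

    if (client_reregister)
        events |= SKETCH_EV_CLIENT_REREGISTER;
    if (mad_lid != cur_lid)
        events |= SKETCH_EV_LID_CHANGE;

    return events;
}
```

The previous if/else form could return only one of the two events per MAD, which is why the combined case and the "neither" case were handled incorrectly.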
      Signed-off-by: Moni Shoua <monis@voltaire.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Yossi Etigin <yosefe@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      f0f6f346
  25. 23 Oct, 2008 1 commit
  26. 15 Jul, 2008 1 commit