1. 01 10月, 2012 12 次提交
    • J
      mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations · 47605df9
      Jack Morgenstein 提交于
      Previously, the structure of a guest's proxy QPs followed the
      structure of the PPF special qps (qp0 port 1, qp0 port 2, qp1 port 1,
      qp1 port 2, ...).  The guest then did offset calculations on the
      sqp_base qp number that the PPF passed to it in QUERY_FUNC_CAP().
      
      This is now changed so that the guest does no offset calculations
      regarding proxy or tunnel QPs to use.  This change frees the PPF from
      needing to adhere to a specific order in allocating proxy and tunnel
      QPs.
      
      Now QUERY_FUNC_CAP provides each port individually with its proxy
      qp0, proxy qp1, tunnel qp0, and tunnel qp1 QP numbers, and these are
      used directly where required (with no offset calculations).
      
      To accomplish this change, several fields were added to the phys_caps
      structure for use by the PPF and by non-SR-IOV mode:
      
          base_sqpn -- in non-sriov mode, this was formerly sqp_start.
          base_proxy_sqpn -- the first physical proxy qp number -- used by PPF
          base_tunnel_sqpn -- the first physical tunnel qp number -- used by PPF.
      
      The current code in the PPF still adheres to the previous layout of
      sqps, proxy-sqps and tunnel-sqps.  However, the PPF can change this
      layout without affecting VF or (paravirtualized) PF code.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      47605df9
    • J
      mlx4: Paravirtualize Node Guids for slaves · afa8fd1d
      Jack Morgenstein 提交于
      This is necessary in order to support > 1 VF/PF in a VM for software
      that uses the node guid as a discriminator, such as librdmacm.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      afa8fd1d
    • J
      IB/mlx4: Miscellaneous adjustments for SR-IOV IB support · 992e8e6e
      Jack Morgenstein 提交于
      1. Allow only master to change node description.
      2. Prevent AH leakage in send mads.
      3. Take device part number from PCI structure, so that guests see the
         VF part number (and not the PF part number).
      4. Place the device revision ID into caps structure at startup.
      5. SET_PORT in update_gids_task needs to go through wrapper on master.
      6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
         queue on the master, since it propagates events to slaves using
         GEN_EQE.
      7. Do not support FMR on slaves.
      8. Add spinlock to slave_event(), since it is called both in interrupt
         context and in process context (due to 6 above, and also if
         smp_snoop is used).  This fix was found and implemented by Saeed
         Mahameed <saeedm@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      992e8e6e
    • J
      IB/mlx4: Add iov directory in sysfs under the ib device · c1e7e466
      Jack Morgenstein 提交于
      This directory is added only for the master -- slaves do not have it.
      
      The sysfs iov directory is used to manage and examine the port P_Key
      and guid paravirtualization.
      
      Under iov/ports, the administrator may examine the gid and P_Key tables
      as they are present in the device (and as are seen in the "network
      view" presented to the SM).
      
      Under the iov/<pci slot number> directories, the admin may map the
      index numbers in the physical tables (as under iov/ports) to the
      paravirtualized index numbers that guests see.
      
      For example, if the administrator, for port 1 on guest 2 maps physical
      pkey index 10 to virtual index 1, then that guest, whenever it uses
      its pkey index 1, will actually be using the real pkey index 10.
      
      Based on patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      c1e7e466
    • J
      IB/mlx4: Propagate P_Key and guid change port management events to slaves · 2a4fae14
      Jack Morgenstein 提交于
      P_Key change and guid change events are not of interest to all slaves,
      but only to those slaves which "see" the table slots whose contents
      have change.
      
      For example, if the guid at port 1, index 5 has changed in the PPF, we
      wish to propagate the gid-change event only to the function which has
      that guid index mapped to its port/guid table (in this case it is
      slave #5). Other functions should not get the event, since the event
      does not affect them.
      
      Similarly with P_Keys -- P_Key change events are forwarded only to
      slaves which have that P_Key index mapped to their virtual P_Key table.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      2a4fae14
    • J
      mlx4: Add alias_guid mechanism · a0c64a17
      Jack Morgenstein 提交于
      For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
      GUID at index 0 seen by a slave is the actual GUID occupying the GUID
      table at the slave-id index.
      
      The driver, by default, requests at startup time that subnet manager
      populate its entire guid table with GUIDs. These guids are then mapped
      (paravirtualized) to the slaves, and appear for each slave as its GUID
      at index 0.
      
      Until each slave has such a guid, its port status is DOWN.
      
      The guid table is cached to support special QP paravirtualization, and
      event propagation to slaves on guid change (we test to see if the guid
      really changed before propagating an event to the slave).
      
      To support this caching, add capability to __mlx4_ib_query_gid() to
      obtain the network view (i.e., physical view) gid at index X, not just
      the host (paravirtualized) view.
      
      Based on a patch from Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      a0c64a17
    • A
      IB/mlx4: Add CM paravirtualization · 3cf69cc8
      Amir Vadai 提交于
      In CM para-virtualization:
      
      1. Incoming requests are steered to the correct vHCA according to the
         embedded GID.
      2. Communication IDs on outgoing requests are replaced by a globally
         unique ID, generated by the PPF, since there is no synchronization
         of ID generation between guests (and so these IDs are not
         guaranteed to be globally unique).  The guest's comm ID is stored,
         and is returned to the response MAD when it arrives.
      Signed-off-by: NAmir Vadai <amirv@mellanox.co.il>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      3cf69cc8
    • O
      IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV · b9c5d6a6
      Oren Duer 提交于
      MCG paravirtualization support includes:
      - Creating multicast groups by VFs, and keeping accounting of them
      - Leaving multicast groups by VFs
      - Updating SM only with real changes in the overall picture of MCGs status
      - Creation of MGID=0 groups (let SM choose MGID)
      
      Note that the MCG module maintains its own internal MCG object
      reference counts.  The reason for this is that the IB core is used to
      track only the multicast groups joins generated by the PF it runs
      over.  The PF IB core layer is unaware of slaves, so it cannot be used
      to keep track of MCG joins they generate.
      Signed-off-by: NOren Duer <oren@mellanox.co.il>
      Signed-off-by: NEli Cohen <eli@mellanox.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b9c5d6a6
    • J
      mlx4: MAD_IFC paravirtualization · 0a9a0188
      Jack Morgenstein 提交于
      The MAD_IFC firmware command fulfills two functions.
      
      First, it is used in the QP0/QP1 MAD-handling flow to obtain
      information from the FW (for answering queries), and for setting
      variables in the HCA (MAD SET packets).
      
      For this, MAD_IFC should provide the FW (physical) view of the data.
      This is the view that OpenSM needs.  We call this the "network view".
      
      In the second case, MAD_IFC is used by various verbs to obtain data
      regarding the local HCA (e.g., ib_query_device()).  We call this the
      "host view".
      
      This data needs to be paravirtualized.
      
      MAD_IFC therefore needs a wrapper function, and also needs another
      flag indicating whether it should provide the network view (when it is
      called by ib_process_mad in special-qp packet handling), or the host
      view (when it is called while implementing a verb).
      
      There are currently 2 flag parameters in mlx4_MAD_IFC already:
      ignore_bkey and ignore_mkey.  These two parameters are replaced by a
      single "mad_ifc_flags" parameter, with different bits set for each
      flag.  A third flag is added: "network-view/host-view".
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      0a9a0188
    • J
      IB/mlx4: SR-IOV multiplex and demultiplex MADs · 37bfc7c1
      Jack Morgenstein 提交于
      Special QPs are paravirtualized.
      
      vHCAs are not given direct access to QP0/1. Rather, these QPs are
      operated by a special context hosted by the PF, which mediates access
      to/from vHCAs.  This is done by opening a "tunnel" per vHCA port per
      QP0/1. A tunnel comprises a pair of UD QPs: a "Tunnel QP" in the
      PF-context and a "Proxy QP" in the vHCA.  All vHCA MAD traffic must
      pass through the corresponding tunnel.  vHCA QPs cannot be assigned to
      VL15 and are denied of the well-known QKey.
      
      Outgoing messages are "de-multiplexed" (i.e., directed to the wire via
      the real special QP).
      
      Incoming messages are "multiplexed" (i.e. steered by the PPF to the
      correct VF or to the PF)
      
      QP0 access is restricted to the PF vHCA. VF vHCAs also have (virtual)
      QP0s, but they never receive any SMPs and all SMPs sent are discarded.
      QP1 traffic is allowed for all vHCAs, but special care is required to
      bridge the gap between the host and network views.
      
      Specifically:
      - Transaction IDs are mapped to guarantee uniqueness among vHCAs
      - CM para-virtualization
        o   Incoming requests are steered to the correct vHCA according to the embedded GID
        o   Local communication IDs are mapped to ensure uniqueness among vHCAs
        (see the patch that adds CM paravirtualization.)
      - Multicast para-virtualization
        o   The PF context aggregates membership state from all vHCAs
        o   The SA is contacted only when the aggregate membership changes
        o   If the aggregate does not change, the PF context will provide the
            requesting vHCA with the proper response.
        (see the patch that adds multicast group paravirtualization)
      
      Incoming MADs are steered according to:
      - the DGID If a GRH is present
      - the mapped transaction ID for response MADs
      - the embedded GID in CM requests
      - the remote communication ID in other CM messages
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      37bfc7c1
    • J
      mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop · 54679e14
      Jack Morgenstein 提交于
      This requires:
      
      1. Replacing the paravirtualized P_Key index (inserted by the guest)
         with the real P_Key index.
      
      2. For UD QPs, placing the guest's true source GID index in the
         address path structure mgid field, and setting the ud_force_mgid
         bit so that the mgid is taken from the QP context and not from the
         WQE when posting sends.
      
      3. For UC and RC QPs, placing the guest's true source GID index in the
         address path structure mgid field.
      
      4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
         proxy/tunnel pair.
      
      Since not all the above adjustments occur in all the QP transitions,
      the QP transitions require separate wrapper functions.
      
      Secondly, initialize the P_Key virtualization table to its default
      values: Master virtualized table is 1-1 with the real P_Key table,
      guest virtualized table has P_Key index 0 mapped to the real P_Key
      index 0, and all the other P_Key indices mapped to the reserved
      (invalid) P_Key at index 127.
      
      Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
      and generating events on the master only if a P_Key actually changed.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      54679e14
    • J
      IB/mlx4: Initialize SR-IOV IB support for slaves in master context · fc06573d
      Jack Morgenstein 提交于
      Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
      on the master.
      
      This has two parts.  The first part is to initialize the structures to
      contain the contexts.  This is done at master startup time in
      mlx4_ib_init_sriov().
      
      The second part is to actually create the tunneling resources required
      on the master to support a slave.  This is performed the master
      detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
      generated when a slave initializes its comm channel).
      
      For the master, there is no such startup event, so it creates its own
      tunneling resources when it starts up.  In addition, the master also
      creates the real special QPs.  The ib_core layer on the master causes
      creation of proxy special QPs, since the master is also
      paravirtualized at the ib_core layer.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      fc06573d
  2. 11 8月, 2012 1 次提交
    • J
      IB/mlx4: Fix possible deadlock on sm_lock spinlock · df7fba66
      Jack Morgenstein 提交于
      The sm_lock spinlock is taken in the process context by
      mlx4_ib_modify_device, and in the interrupt context by update_sm_ah,
      so we need to take that spinlock with irqsave, and release it with
      irqrestore.
      
      Lockdeps reports this as follows:
      
          [ INFO: inconsistent lock state ]
          3.5.0+ #20 Not tainted
          inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
          swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
          (&(&ibdev->sm_lock)->rlock){?.+...}, at: [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          {HARDIRQ-ON-W} state was registered at:
            [<ffffffff810b84a0>] mark_irqflags+0x120/0x190
            [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
            [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
            [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
            [<ffffffffa028d563>] mlx4_ib_modify_device+0x63/0x240 [mlx4_ib]
            [<ffffffffa026d1fc>] ib_modify_device+0x1c/0x20 [ib_core]
            [<ffffffffa026c353>] set_node_desc+0x83/0xc0 [ib_core]
            [<ffffffff8136a150>] dev_attr_store+0x20/0x30
            [<ffffffff81201fd6>] sysfs_write_file+0xe6/0x170
            [<ffffffff8118da38>] vfs_write+0xc8/0x190
            [<ffffffff8118dc01>] sys_write+0x51/0x90
            [<ffffffff8155b869>] system_call_fastpath+0x16/0x1b
      
          ...
          *** DEADLOCK ***
      
          1 lock held by swapper/0/0:
      
          stack backtrace:
          Pid: 0, comm: swapper/0 Not tainted 3.5.0+ #20
          Call Trace:
          <IRQ>  [<ffffffff810b7bea>] print_usage_bug+0x18a/0x190
          [<ffffffff810b7370>] ? print_irq_inversion_bug+0x210/0x210
          [<ffffffff810b7fb2>] mark_lock_irq+0xf2/0x280
          [<ffffffff810b8290>] mark_lock+0x150/0x240
          [<ffffffff810b84ef>] mark_irqflags+0x16f/0x190
          [<ffffffff810b9ce7>] __lock_acquire+0x307/0x4c0
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810b9f51>] lock_acquire+0xb1/0x150
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff815523b1>] _raw_spin_lock+0x41/0x50
          [<ffffffffa028af1d>] ? update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffffa026b2fa>] ? ib_create_ah+0x1a/0x40 [ib_core]
          [<ffffffffa028af1d>] update_sm_ah+0xad/0x100 [mlx4_ib]
          [<ffffffff810c27c3>] ? is_module_address+0x23/0x30
          [<ffffffffa028b05b>] handle_port_mgmt_change_event+0xeb/0x150 [mlx4_ib]
          [<ffffffffa028c177>] mlx4_ib_event+0x117/0x160 [mlx4_ib]
          [<ffffffff81552501>] ? _raw_spin_lock_irqsave+0x61/0x70
          [<ffffffffa022718c>] mlx4_dispatch_event+0x6c/0x90 [mlx4_core]
          [<ffffffffa0221b40>] mlx4_eq_int+0x500/0x950 [mlx4_core]
      
      Reported by: Or Gerlitz <ogerlitz@mellanox.com>
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      df7fba66
  3. 12 7月, 2012 1 次提交
    • J
      mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them · 6634961c
      Jack Morgenstein 提交于
      To allow easy paravirtualization of P_Key and GID table sizes, keep
      paravirtualized sizes in mlx4_dev->caps, but save the actual physical
      sizes from FW in struct: mlx4_dev->phys_cap.
      
      In addition, in SR-IOV mode, do the following:
      
      1. Reduce reported P_Key table size by 1.
         This is done to reserve the highest P_Key index for internal use,
         for declaring an invalid P_Key in P_Key paravirtualization.
         We require a P_Key index which always contain an invalid P_Key
         value for this purpose (i.e., one which cannot be modified by
         the subnet manager).  The way to do this is to reduce the
         P_Key table size reported to the subnet manager by 1, so that
         it will not attempt to access the P_Key at index #127.
      
      2. Paravirtualize the GID table size to 1. Thus, each guest sees
         only a single GID (at its paravirtualized index 0).
      
      In addition, since we are paravirtualizing the GID table size to 1, we
      add paravirtualization of the master GID event here (i.e., we do not
      do ib_dispatch_event() for the GUID change event on the master, since
      its (only) GUID never changes).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      6634961c
  4. 11 7月, 2012 1 次提交
    • J
      mlx4: Use port management change event instead of smp_snoop · 00f5ce99
      Jack Morgenstein 提交于
      The port management change event can replace smp_snoop.  If the
      capability bit for this event is set in dev-caps, the event is used
      (by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
      mask in the MAP_EQ fw command).  In this case, when the driver passes
      incoming SMP PORT_INFO SET mads to the FW, the FW generates port
      management change events to signal any changes to the driver.
      
      If the FW generates these events, smp_snoop shouldn't be invoked in
      ib_process_mad(), or duplicate events will occur (once from the
      FW-generated event, and once from smp_snoop).
      
      In the case where the FW does not generate port management change
      events smp_snoop needs to be invoked to create these events.  The flow
      in smp_snoop has been modified to make use of the same procedures as
      in the fw-generated-event event case to generate the port management
      events (LID change, Client-rereg, Pkey change, and/or GID change).
      
      Port management change event handling required changing the
      mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
      (last argument) had to be changed to unsigned long in order to
      accomodate passing the EQE pointer.
      
      We also needed to move the definition of struct mlx4_eqe from
      net/mlx4.h to file device.h -- to make it available to the IB driver,
      to handle port management change events.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      00f5ce99
  5. 09 7月, 2012 1 次提交
  6. 31 1月, 2012 1 次提交
    • J
      IB/mlx4: pass SMP vendor-specific attribute MADs to firmware · a6f7feae
      Jack Morgenstein 提交于
      In the current code, vendor-specific MADs (e.g with the FDR-10
      attribute) are silently dropped by the driver, resulting in timeouts
      at the sending side and inability to query/configure the relevant
      feature.  However, the ConnectX firmware is able to handle such MADs.
      For unsupported attributes, the firmware returns a GET_RESPONSE MAD
      containing an error status.
      
      For example, for a FDR-10 node with LID 11:
      
          # ibstat mlx4_0 1
      
          CA: 'mlx4_0'
          Port 1:
          State: Active
          Physical state: LinkUp
          Rate: 40 (FDR10)
          Base lid: 11
          LMC: 0
          SM lid: 24
          Capability mask: 0x02514868
          Port GUID: 0x0002c903002e65d1
          Link layer: InfiniBand
      
      Extended Port Query (EPI) vendor mad timeouts before the patch:
      
          # smpquery MEPI 11 -d
      
          ibwarn: [4196] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
          ibwarn: [4196] _do_madrpc: retry 1 (timeout 1000 ms)
          ibwarn: [4196] _do_madrpc: retry 2 (timeout 1000 ms)
          ibwarn: [4196] _do_madrpc: timeout after 3 retries, 3000 ms
          ibwarn: [4196] mad_rpc: _do_madrpc failed; dport (Lid 11)
          smpquery: iberror: [pid 4196] main: failed: operation EPI: ext port info query failed
      
      EPI query works OK with the patch:
      
          # smpquery MEPI 11 -d
      
          ibwarn: [6548] smp_query_via: attr 0xff90 mod 0x0 route Lid 11
          ibwarn: [6548] mad_rpc: data offs 64 sz 64
          mad data
          0000 0000 0000 0001 0000 0001 0000 0001
          0000 0000 0000 0000 0000 0000 0000 0000
          0000 0000 0000 0000 0000 0000 0000 0000
          0000 0000 0000 0000 0000 0000 0000 0000
          # Ext Port info: Lid 11 port 0
          StateChangeEnable:...............0x00
          LinkSpeedSupported:..............0x01
          LinkSpeedEnabled:................0x01
          LinkSpeedActive:.................0x01
      Signed-off-by: NJack Morgenstein <jackm@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Acked-by: NIra Weiny <weiny2@llnl.gov>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      a6f7feae
  7. 14 12月, 2011 1 次提交
  8. 19 7月, 2011 1 次提交
  9. 11 1月, 2011 1 次提交
  10. 26 10月, 2010 1 次提交
    • E
      IB/mlx4: Add support for IBoE · fa417f7b
      Eli Cohen 提交于
      Add support for IBoE to mlx4_ib.  The bulk of the code is handling the
      new address vector fields; mlx4 needs the MAC address of a remote node
      to include it in a WQE (for datagrams) or in the QP context (for
      connected QPs).  Address resolution is done by assuming all unicast
      GIDs are either link-local IPv6 addresses.
      
      Multicast group attach/detach needs to update the NIC's multicast
      filters; but since attaching a QP to a multicast group can be done
      before the QP is bound to a port, for IBoE we need to keep track of
      all multicast groups that a QP is attached too before it transitions
      from INIT to RTR (since it does not have a port in the INIT state).
      Signed-off-by: NEli Cohen <eli@mellanox.co.il>
      
      [ Many things cleaned up and otherwise monkeyed with; hope I didn't
        introduce too many bugs.  - Roland ]
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      fa417f7b
  11. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  12. 29 1月, 2009 1 次提交
    • M
      IB/mlx4: Fix dispatch of IB_EVENT_LID_CHANGE event · f0f6f346
      Moni Shoua 提交于
      When snooping a PortInfo MAD, its client_reregister bit is checked.
      If the bit is ON then a CLIENT_REREGISTER event is dispatched,
      otherwise a LID_CHANGE event is dispatched.  This way of decision
      ignores the cases where the MAD changes the LID along with an
      instruction to reregister (so a necessary LID_CHANGE event won't be
      dispatched) or the MAD is neither of these (and an unnecessary
      LID_CHANGE event will be dispatched).
      
      This causes problems at least with IPoIB, which will do a "light"
      flush on reregister, rather than the "heavy" flush required due to a
      LID change.
      
      Fix this by dispatching a CLIENT_REREGISTER event if the
      client_reregister bit is set, but also compare the LID in the MAD to
      the current LID.  If and only if they are not identical then a
      LID_CHANGE event is dispatched.
      Signed-off-by: NMoni Shoua <monis@voltaire.com>
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NYossi Etigin <yosefe@voltaire.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      f0f6f346
  13. 23 10月, 2008 1 次提交
  14. 15 7月, 2008 1 次提交
  15. 17 4月, 2008 1 次提交
  16. 16 8月, 2007 1 次提交
  17. 09 5月, 2007 1 次提交
    • R
      IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters · 225c7b1f
      Roland Dreier 提交于
      Add an InfiniBand driver for Mellanox ConnectX adapters.  Because
      these adapters can also be used as ethernet NICs and Fibre Channel 
      HBAs, the driver is split into two modules: 
       
        mlx4_core: Handles low-level things like device initialization and 
          processing firmware commands.  Also controls resource allocation 
          so that the InfiniBand, ethernet and FC functions can share a 
          device without stepping on each other. 
       
        mlx4_ib: Handles InfiniBand-specific things; plugs into the 
          InfiniBand midlayer. 
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      225c7b1f
  18. 30 11月, 2006 1 次提交
    • R
      IB/mthca: Fix section mismatches · f4f3d0f0
      Roland Dreier 提交于
      Commit b3b30f5e ("IB/mthca: Recover from catastrophic errors")
      introduced some section mismatch breakage, because the error recovery
      code tears down and reinitializes the device, which calls into lots of
      code originally marked __devinit and __devexit from regular .text.
      
      Fix this by getting rid of these now-incorrect section markers.
      
      Reported by Randy Dunlap <randy.dunlap@oracle.com>.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      f4f3d0f0
  19. 23 9月, 2006 1 次提交
  20. 18 6月, 2006 1 次提交
  21. 20 4月, 2006 1 次提交
  22. 11 4月, 2006 1 次提交
    • J
      IB: simplify static rate encoding · bf6a9e31
      Jack Morgenstein 提交于
      Push translation of static rate to HCA format into low-level drivers,
      where it belongs.  For static rate encoding, use encoding of rate
      field from IB standard PathRecord, with addition of value 0, for
      backwards compatibility with current usage.  The changes are:
      
       - Add enum ib_rate to midlayer includes.
       - Get rid of static rate translation in IPoIB; just use static rate
         directly from Path and MulticastGroup records.
       - Update mthca driver to translate absolute static rate into the
         format used by hardware.  This also fixes mthca's static rate
         handling for HCAs that are capable of 4X DDR.
      Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      bf6a9e31
  23. 30 3月, 2006 1 次提交
  24. 21 3月, 2006 1 次提交
  25. 31 10月, 2005 1 次提交
    • T
      [PATCH] fix missing includes · 4e57b681
      Tim Schmielau 提交于
      I recently picked up my older work to remove unnecessary #includes of
      sched.h, starting from a patch by Dave Jones to not include sched.h
      from module.h. This reduces the number of indirect includes of sched.h
      by ~300. Another ~400 pointless direct includes can be removed after
      this disentangling (patch to follow later).
      However, quite a few indirect includes need to be fixed up for this.
      
      In order to feed the patches through -mm with as little disturbance as
      possible, I've split out the fixes I accumulated up to now (complete for
      i386 and x86_64, more archs to follow later) and post them before the real
      patch.  This way this large part of the patch is kept simple with only
      adding #includes, and all hunks are independent of each other.  So if any
      hunk rejects or gets in the way of other patches, just drop it.  My scripts
      will pick it up again in the next round.
      Signed-off-by: NTim Schmielau <tim@physik3.uni-rostock.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4e57b681
  26. 26 10月, 2005 1 次提交
    • S
      [IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped · 34816ad9
      Sean Hefty 提交于
      The MAD layer was violating the DMA API by touching data buffers used
      for sends after the DMA mapping was done.  This causes problems on
      non-cache-coherent architectures, because the device doing DMA won't
      see updates to the payload buffers that exist only in the CPU cache.
      
      Fix this by having all MAD consumers use ib_create_send_mad() to
      allocate their send buffers, and moving the DMA mapping into the MAD
      layer so it can be done just before calling send (and after any
      modifications of the send buffer by the MAD layer).
      
      Tested on a non-cache-coherent PowerPC 440SPe system.
      Signed-off-by: NSean Hefty <sean.hefty@intel.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      34816ad9
  27. 27 8月, 2005 3 次提交