1. 08 May 2008 (7 commits)
  2. 07 May 2008 (2 commits)
    • RDMA/cxgb3: Fix severe limit on userspace memory registration size · 273748cc
      Committed by Roland Dreier
      Currently, iw_cxgb3 is severely limited in the amount of userspace
      memory that can be registered in a single memory region, which
      causes big problems for applications that expect to be able to
      register hundreds of megabytes.
      
      The problem is that the driver uses a single kmalloc()ed buffer to
      hold the physical buffer list (PBL) for the entire memory region
      during registration, which means that 8 bytes of contiguous memory are
      required for each page of memory being registered.  For example, a 64
      MB registration will require 128 KB of contiguous memory with 4 KB
      pages, and it is unlikely that such an allocation will succeed on a busy
      system.
      
      This is purely a driver problem: the temporary page list buffer is not
      needed by the hardware, so we can fix this by writing the PBL to the
      hardware in page-sized chunks rather than all at once.  We do this by
      splitting the memory registration operation up into several steps:
      
       - Allocate PBL space in adapter memory for the full registration
       - Copy PBL to adapter memory in chunks
       - Allocate STag and enable memory region
      
      This also allows several other cleanups to the __cxio_tpt_op()
      interface and related parts of the driver.
      
      This change leaves the reregister memory region and memory window
      operations broken, but they already didn't work due to other
      longstanding bugs, so fixing them will be left to a later patch.
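      A rough sketch of the chunked copy step (the names and the simulated adapter memory are illustrative, not the actual cxio interface):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch: write the physical buffer list (PBL) to (simulated) adapter
 * memory in fixed-size chunks, so registration never needs one large
 * contiguous kernel buffer holding the entire list. */
#define PBL_CHUNK_ENTRIES 512        /* 4 KB chunk / 8 bytes per entry */

static uint64_t adapter_pbl[4096];   /* stand-in for adapter PBL space */

static int write_pbl_chunked(const uint64_t *pages, size_t nentries,
                             size_t pbl_off)
{
        size_t done = 0;

        while (done < nentries) {
                size_t n = nentries - done;

                if (n > PBL_CHUNK_ENTRIES)
                        n = PBL_CHUNK_ENTRIES;
                /* In the driver, this step would be one write of a
                 * page-sized chunk to the already-allocated PBL space. */
                memcpy(&adapter_pbl[pbl_off + done], &pages[done],
                       n * sizeof(uint64_t));
                done += n;
        }
        return 0;
}
```

      Each iteration only touches one chunk's worth of data, which is why the large contiguous allocation disappears.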
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks · 0e991336
      Committed by Roland Dreier
      Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
      chunks.  This limits the largest single allocation that can be done to
      the same size, which means that with 4 KB pages, each of which takes 8
      bytes of PBL memory, the largest memory region that can be allocated
      is 1 GB (256K PBL entries * 4 KB/entry).
      
      Remove this limit by adding all the PBL memory in a single gen_pool
      chunk, if possible.  Add code that falls back to smaller chunks if
      gen_pool_add() fails, which can happen if there is not sufficient
      contiguous lowmem for the internal gen_pool bitmap.
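      The fallback can be sketched like this (gen_pool_add() is stubbed out so the halving strategy is visible; the real pool API takes a struct gen_pool, a physical address, and GFP flags):

```c
#include <stddef.h>

static size_t largest_ok;   /* test knob: biggest chunk the stub accepts */
static int chunks_added;

/* Stub for gen_pool_add(): fails (simulated -ENOMEM) if the chunk is
 * too large, mimicking a bitmap allocation that needs too much
 * contiguous lowmem. */
static int gen_pool_add_stub(size_t start, size_t size)
{
        if (size > largest_ok)
                return -1;
        chunks_added++;
        return 0;
}

/* Try to add [start, start + size) in one chunk; on failure, halve the
 * chunk size and retry, down to min_chunk.  (A real version would also
 * unwind partially-added chunks before retrying.) */
static int add_pbl_memory(size_t start, size_t size, size_t min_chunk)
{
        size_t chunk = size;

        while (chunk >= min_chunk) {
                size_t off;
                int ok = 1;

                chunks_added = 0;
                for (off = 0; off < size; off += chunk) {
                        size_t n = size - off < chunk ? size - off : chunk;

                        if (gen_pool_add_stub(start + off, n)) {
                                ok = 0;
                                break;
                        }
                }
                if (ok)
                        return 0;
                chunk /= 2;     /* fall back to smaller chunks */
        }
        return -1;
}
```

      With one big chunk, the largest single gen_pool allocation is no longer capped at 2 MB of PBL memory.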
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  3. 06 May 2008 (1 commit)
  4. 03 May 2008 (3 commits)
  5. 01 May 2008 (1 commit)
    • IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() · 3ae15e16
      Committed by Roland Dreier
      When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I
      changed things around so that mlx4_ib_alloc_cq_buf() and
      mlx4_ib_free_cq_buf() were used everywhere they could be.  However, I
      screwed up the number of entries passed into mlx4_ib_alloc_cq_buf()
      in a couple places -- the function bumps the number of entries
      internally, so the caller shouldn't add 1 as well.
      
      Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf()
      can cause the cleanup to go off the end of an array and corrupt
      allocator state in interesting ways.
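      A minimal sketch of the contract (hypothetical helpers, not the mlx4 functions): the allocator adds its own spare entry, so callers must pass the same logical size to both alloc and free, with no extra +1:

```c
#include <stdlib.h>

struct cq_buf {
        int *entries;
        int nent;
};

/* Allocates nent + 1 slots internally; callers pass the logical size. */
static int alloc_cq_buf(struct cq_buf *buf, int nent)
{
        buf->nent = nent;
        buf->entries = calloc((size_t)nent + 1, sizeof(int));
        return buf->entries ? 0 : -1;
}

/* nent must be the logical size given to alloc_cq_buf(); the internal
 * +1 is accounted for here.  Passing nent + 1 is exactly the
 * off-by-one that walks past the end of the buffer. */
static void free_cq_buf(struct cq_buf *buf, int nent)
{
        (void)nent;
        free(buf->entries);
        buf->entries = NULL;
}
```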
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  6. 30 April 2008 (11 commits)
    • RDMA/nes: Formatting cleanup · 7495ab68
      Committed by Glenn Streiff
      Various cleanups:
      	- Change // to /* .. */
      	- Place whitespace around binary operators.
      	- Trim down a few long lines.
      	- Some minor alignment formatting for better readability.
      	- Remove some silly tabs.
      Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/nes: Add support for SFP+ PHY · 0e1de5d6
      Committed by Eric Schneider
      This patch enables the iw_nes module for NetEffect RNICs to support
      additional PHYs including SFP+ (referred to as ARGUS in the code).
      Signed-off-by: Eric Schneider <eric.schneider@neteffect.com>
      Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/nes: Use LRO · 37dab411
      Committed by Faisal Latif
      Signed-off-by: Faisal Latif <flatif@neteffect.com>
      Signed-off-by: Glenn Streiff <gstreiff@neteffect.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute · baaad380
      Committed by Roland Dreier
      Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the
      mthca userspace ABI to provide a way for userspace to indicate which
      memory regions need the DMA write barrier attribute.  However, it is
      possible to handle this without breaking existing userspace, by having
      the mthca kernel driver recognize whether it is talking to old or new
      userspace, depending on the size of the register MR structure passed in.
      
      The only potential drawback of this is that it allows old userspace
      (which has a bug with DMA ordering on large SGI Altix systems) to
      continue to run on new kernels, but the advantage of allowing old
      userspace to continue to work on unaffected systems seems to outweigh
      this, and we can print a warning to push people to upgrade their
      userspace.
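      A sketch of the size-based detection (the struct layouts and the flag bit are illustrative, not the actual mthca ABI):

```c
#include <stddef.h>
#include <stdint.h>

/* Old userspace passes the short register-MR command struct; new
 * userspace passes the extended one carrying the DMA-barrier flag.
 * The kernel tells them apart by the size of what was passed in. */
struct reg_mr_cmd_v1 {          /* old ABI */
        uint64_t start;
        uint64_t length;
};

struct reg_mr_cmd_v2 {          /* new ABI: adds an attribute word */
        uint64_t start;
        uint64_t length;
        uint32_t mr_attrs;      /* bit 0: request DMA write barrier */
        uint32_t reserved;
};

/* Returns 1 if the DMA write barrier should be requested. */
static int wants_dma_barrier(const void *udata, size_t inlen)
{
        if (inlen < sizeof(struct reg_mr_cmd_v2))
                return 0;       /* old userspace: no flag available */
        return ((const struct reg_mr_cmd_v2 *)udata)->mr_attrs & 1;
}
```

      Because the old struct is a strict prefix of the new one, neither side has to change for the other to keep working.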
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/mthca: Avoid recycling old FMR R_Keys too soon · 0bfe151c
      Committed by Olaf Kirch
      When an FMR is unmapped, mthca resets the map count to 0, and clears
      the upper part of the R_Key which is used as the sequence counter.
      
      This poses a problem for RDS, which uses ib_fmr_unmap as a fence
      operation.  RDS assumes that after issuing an unmap, the old R_Keys
      will be invalid for a "reasonable" period of time. For instance,
      Oracle processes use shared memory buffers allocated from a pool of
      buffers.  When a process dies, we want to reclaim these buffers -- but
      we must make sure there are no pending RDMA operations to/from those
      buffers.  The only way to achieve that is to issue an unmap and sync
      the TPT.
      
      However, when the sequence count is reset on unmap, there is a high
      likelihood that a new mapping will be given the same R_Key that was
      issued a few milliseconds ago.
      
      To prevent this, don't reset the sequence count when unmapping an FMR.
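      The key layout can be sketched as follows (the 24-bit split is an assumption for illustration; the point is only that the sequence bits keep counting across unmap instead of resetting):

```c
#include <stdint.h>

/* The upper bits of the R_Key act as a sequence counter.  Keeping that
 * counter across unmap means a remapped FMR gets a fresh key, so stale
 * R_Keys from just before the unmap stay invalid. */
#define KEY_SEQ_SHIFT 24

static uint32_t remap_rkey(uint32_t old_rkey)
{
        uint32_t seq = ((old_rkey >> KEY_SEQ_SHIFT) + 1) & 0xff;
        uint32_t base = old_rkey & ((1u << KEY_SEQ_SHIFT) - 1);

        return (seq << KEY_SEQ_SHIFT) | base;   /* never reset to 0 */
}
```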
      Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/ehca: Allocate event queue size depending on max number of CQs and QPs · d227fa72
      Committed by Stefan Roscher
      If a lot of QPs fall into Error state at once and the EQ of the
      respective HCA is too small, it might overrun, causing the eHCA driver
      to stop processing completion events and calling the application's
      completion handlers, effectively causing traffic to stop.
      
      Fix this by limiting available QPs and CQs to a customizable max
      count, and determining EQ size based on these counts and a worst-case
      assumption.
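      The sizing rule can be sketched as follows (the per-object event counts and slack are assumptions for illustration, not the eHCA driver's actual constants):

```c
/* Worst-case EQ sizing: assume every QP and CQ can raise events at
 * once, plus slack for async events, instead of a fixed EQ length that
 * can overrun when many QPs enter the Error state together. */
static unsigned int eq_size_for(unsigned int max_qps, unsigned int max_cqs)
{
        const unsigned int events_per_qp = 2;   /* assumed worst case */
        const unsigned int events_per_cq = 1;
        const unsigned int slack = 64;          /* async/port events */

        return max_qps * events_per_qp + max_cqs * events_per_cq + slack;
}
```

      Capping max_qps and max_cqs to customizable values is what keeps the resulting EQ allocation bounded.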
      Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/ehca: Handle negative return value from ibmebus_request_irq() properly · 7df109d9
      Committed by Hoang-Nam Nguyen
      ehca_create_eq() was assigning a signed return value to an unsigned
      local variable and then checking if the variable was < 0, which meant
      that errors were always ignored.  Fix this by using one variable for
      signed integer return values and another for u64 hcall return values.
      
      Bug originally found by Roel Kluin <12o3l@tiscali.nl>.
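      The bug class in miniature (stubbed helper, not the ehca code itself):

```c
/* A negative errno stored in an unsigned variable can never compare
 * less than zero, so the error path becomes dead code. */
static int request_irq_stub(void) { return -22; /* simulated -EINVAL */ }

/* Buggy: the unsigned variable silently swallows the error and the
 * function reports success. */
static int buggy_check(void)
{
        unsigned long ret = request_irq_stub();

        if (ret < 0)            /* always false for unsigned */
                return -1;
        return 0;
}

/* Fixed: keep the signed return value in a signed variable. */
static int fixed_check(void)
{
        int ret = request_irq_stub();

        if (ret < 0)
                return -1;
        return 0;
}
```

      Separating the signed errno variable from the u64 hcall-status variable is exactly this fix.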
      Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cxgb3: Support peer-2-peer connection setup · f8b0dfd1
      Committed by Steve Wise
      Open MPI, Intel MPI and other applications don't respect the iWARP
      requirement that the client (active) side of the connection send the
      first RDMA message.  This class of application connection setup is
      called peer-to-peer.  Typically once the connection is setup, _both_
      sides want to send data.
      
      This patch enables supporting peer-to-peer over the Chelsio RNIC by
      enforcing this iWARP requirement in the driver itself as part of RDMA
      connection setup.
      
      Connection setup is extended, when the peer2peer module option is 1,
      such that the MPA initiator will send a 0B Read (the RTR) just after
      connection setup.  The MPA responder will suspend SQ processing until
      the RTR message is received and replied to.
      
      In the longer term, this will be handled in a standardized way by
      enhancing the MPA negotiation so peers can indicate whether they
      want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
      should be sent.  This will be done by standardizing a few bits of the
      private data in order to negotiate all this.  However this patch
      enables peer-to-peer applications now and allows most of the required
      firmware and driver changes to be done and tested now.
      
      Design:
      
       - Add a module option, peer2peer, to enable this mode.
      
       - New firmware support for peer-to-peer mode:
      
      	- a new bit in the rdma_init WR to tell it to do peer-2-peer
      	  and what form of RTR message to send or expect.
      
      	- process _all_ preposted recvs before moving the connection
      	  into rdma mode.
      
      	- passive side: defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the RTR is received.
      
      	- active side: expect and process the 0B read WR on offload TX
      	  queue. Defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the 0B read WR is processed from the offload TX queue.
      
       - If peer2peer is set, driver posts 0B read request on offload TX
         queue just after posting the rdma_init WR to the offload TX queue.
      
       - Add CQ poll logic to ignore unsolicited read responses.
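      The active-side ordering can be sketched as follows (the op names and queue model are illustrative only, not the cxgb3 WR format):

```c
enum tx_op { OP_RDMA_INIT, OP_ZERO_BYTE_READ };

/* When peer2peer is set, queue the 0B read (the RTR) right after the
 * rdma_init WR on the offload TX queue, so the active side always
 * sends the first RDMA message, as iWARP requires.  Returns the number
 * of ops queued. */
static int queue_connection_setup(int peer2peer, enum tx_op *txq)
{
        int n = 0;

        txq[n++] = OP_RDMA_INIT;
        if (peer2peer)
                txq[n++] = OP_ZERO_BYTE_READ;   /* the RTR */
        return n;
}
```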
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cxgb3: Set the max_mr_size device attribute correctly · ccaf10d0
      Committed by Steve Wise
      cxgb3 only supports 4GB memory regions.  The Lustre RDMA code uses
      this attribute and currently has to code around our bad setting.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • RDMA/cxgb3: Correctly serialize peer abort path · 989a1780
      Committed by Steve Wise
      Open MPI and other stress testing exposed a few bad bugs in handling
      aborts in the middle of a normal close.  Fix these by:
      
       - serializing abort reply and peer abort processing with disconnect
         processing
      
       - warning (and ignoring) if ep timer is stopped when it wasn't running
      
       - cleaning up disconnect path to correctly deal with aborting and
         dead endpoints
      
       - in iwch_modify_qp(), taking a ref on the ep before releasing the qp
         lock if iwch_ep_disconnect() will be called.  The ref is dropped
         after calling disconnect.
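      The last bullet's pattern in miniature (the refcount and lock are simulated; the point is that the reference is taken before the lock is dropped):

```c
struct endpoint { int refcnt; };

static void get_ep(struct endpoint *ep) { ep->refcnt++; }
static void put_ep(struct endpoint *ep) { ep->refcnt--; }

/* Called with the QP "lock" held (*locked == 1).  Pin the endpoint
 * before unlocking so it cannot be freed in the window between
 * releasing the lock and running disconnect. */
static int disconnect_with_ref(struct endpoint *ep, int *locked)
{
        get_ep(ep);       /* ref taken while still holding the lock */
        *locked = 0;      /* now safe to drop the qp lock */
        /* ...disconnect the endpoint here... */
        put_ep(ep);       /* ref dropped after disconnect returns */
        return 0;
}
```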
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • mlx4_core: Add a way to set the "collapsed" CQ flag · e463c7b1
      Committed by Yevgeny Petrilin
      Extend the mlx4_cq_alloc() API with a way to set the "collapsed" flag
      for the CQ being created.
      Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  7. 29 April 2008 (1 commit)
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Committed by Arthur Kepner
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
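      The shape of the extended call, with stubbed types (the real kernel function also takes the user context and access flags; only the new trailing dmasync parameter is the point here):

```c
#include <stddef.h>

/* Stub types standing in for the kernel's ib_ucontext/ib_umem. */
struct ib_ucontext { int dummy; };
struct ib_umem { int dmasync; };

static struct ib_umem umem_store;

/* Sketch of the expanded prototype: the last argument asks for the DMA
 * write-barrier attribute on the mapping; pass 1 for user CQ buffers. */
static struct ib_umem *ib_umem_get_stub(struct ib_ucontext *ctx,
                                        unsigned long addr, size_t size,
                                        int access, int dmasync)
{
        (void)ctx; (void)addr; (void)size; (void)access;
        umem_store.dmasync = dmasync;
        return &umem_store;
}
```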
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 24 April 2008 (10 commits)
  9. 22 April 2008 (4 commits)