1. 15 Jul 2008, 1 commit
    • RDMA/cxgb3: MEM_MGT_EXTENSIONS support · e7e55829
      Committed by Steve Wise
      - set the IB_DEVICE_MEM_MGT_EXTENSIONS capability bit if the firmware
        supports it.
      - set the max_fast_reg_page_list_len device attribute.
      - add an iwch_alloc_fast_reg_mr function.
      - add iwch_alloc_fastreg_pbl.
      - add iwch_free_fastreg_pbl.
      - adjust the WQ depth for kernel-mode work queues to account for a
        fastreg possibly taking 2 WR slots.
      - add fastreg_mr work request support.
      - add local_inv work request support.
      - add send_with_inv and send_with_se_inv work request support.
      - remove useless duplicate enums/defines for TPT/MW/MR stuff.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
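      A minimal consumer-side sketch of the fast-registration flow this
      commit enables (the verbs API names below come from the 2.6.27-era
      memory management extensions; the page counts, flags, and error
      handling are illustrative assumptions, not taken from this patch):

          struct ib_mr *mr;
          struct ib_fast_reg_page_list *pl;
          struct ib_send_wr wr, *bad_wr;

          /* allocate a fast-reg MR and a page list of up to 128 pages */
          mr = ib_alloc_fast_reg_mr(pd, 128);
          pl = ib_alloc_fast_reg_page_list(pd->device, 128);

          pl->page_list[0] = dma_addr;    /* DMA address of each page */

          memset(&wr, 0, sizeof wr);
          wr.opcode                    = IB_WR_FAST_REG_MR;
          wr.wr.fast_reg.iova_start    = dma_addr;
          wr.wr.fast_reg.page_list     = pl;
          wr.wr.fast_reg.page_list_len = 1;
          wr.wr.fast_reg.page_shift    = PAGE_SHIFT;
          wr.wr.fast_reg.length        = PAGE_SIZE;
          wr.wr.fast_reg.access_flags  = IB_ACCESS_LOCAL_WRITE;
          wr.wr.fast_reg.rkey          = mr->rkey;

          /* this WR is what may take 2 slots on a cxgb3 kernel WQ */
          ib_post_send(qp, &wr, &bad_wr);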
  2. 09 Jul 2008, 1 commit
  3. 17 May 2008, 1 commit
  4. 14 May 2008, 1 commit
  5. 07 May 2008, 2 commits
    • RDMA/cxgb3: Fix severe limit on userspace memory registration size · 273748cc
      Committed by Roland Dreier
      Currently, iw_cxgb3 is severely limited on the amount of userspace
      memory that can be registered in a single memory region, which
      causes big problems for applications that expect to be able to
      register 100s of MB.
      
      The problem is that the driver uses a single kmalloc()ed buffer to
      hold the physical buffer list (PBL) for the entire memory region
      during registration, which means that 8 bytes of contiguous memory are
      required for each page of memory being registered.  For example, a 64
      MB registration will require 128 KB of contiguous memory with 4 KB
      pages, and it is unlikely that such an allocation will succeed on a busy
      system.
      
      This is purely a driver problem: the temporary page list buffer is not
      needed by the hardware, so we can fix this by writing the PBL to the
      hardware in page-sized chunks rather than all at once.  We do this by
      splitting the memory registration operation up into several steps:
      
       - Allocate PBL space in adapter memory for the full registration
       - Copy PBL to adapter memory in chunks
       - Allocate STag and enable memory region
      
      This also allows several other cleanups to the __cxio_tpt_op()
      interface and related parts of the driver.
      
      This change leaves the reregister memory region and memory window
      operations broken, but they were already non-functional due to other
      long-standing bugs, so fixing them is left to a later patch.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
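      A hedged sketch of the chunked PBL write described above (the helper
      name cxio_write_pbl and the page-sized chunking are assumptions for
      illustration; the real driver code may differ):

          /* write the PBL to adapter memory one page-sized chunk at a time */
          static int write_pbl_chunked(struct cxio_rdev *rdev, u32 pbl_addr,
                                       __be64 *pages, u32 npages)
          {
                  u32 per_chunk = PAGE_SIZE / sizeof(__be64);
                  u32 i, count;
                  int err;

                  for (i = 0; i < npages; i += count) {
                          count = min(npages - i, per_chunk);
                          err = cxio_write_pbl(rdev, pages + i,
                                               pbl_addr + i * sizeof(__be64),
                                               count);
                          if (err)
                                  return err; /* STag not yet enabled, safe to fail */
                  }
                  return 0;
          }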
    • RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks · 0e991336
      Committed by Roland Dreier
      Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB
      chunks.  This limits the largest single allocation that can be done to
      the same size, which means that with 4 KB pages, each of which takes 8
      bytes of PBL memory, the largest memory region that can be allocated
      is 1 GB (256K PBL entries * 4 KB/entry).
      
      Remove this limit by adding all the PBL memory in a single gen_pool
      chunk, if possible.  Add code that falls back to smaller chunks if
      gen_pool_add() fails, which can happen if there is not sufficient
      contiguous lowmem for the internal gen_pool bitmap.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
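      A hedged sketch of the fallback logic (identifiers such as pbl_pool,
      pbl_base, pbl_top, and MIN_PBL_CHUNK are illustrative; gen_pool_add()
      is the real lib/genalloc.c API):

          #include <linux/genalloc.h>

          unsigned long start = pbl_base;
          size_t chunk = pbl_top - pbl_base;   /* first try: one chunk for all */

          while (start < pbl_top) {
                  chunk = min_t(size_t, pbl_top - start, chunk);
                  if (gen_pool_add(pbl_pool, start, chunk, -1)) {
                          /* not enough contiguous lowmem for the chunk's
                           * internal bitmap: retry with half the size */
                          if (chunk <= MIN_PBL_CHUNK)
                                  return -ENOMEM;
                          chunk >>= 1;
                  } else {
                          start += chunk;      /* chunk added, advance */
                  }
          }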
  6. 03 May 2008, 3 commits
  7. 30 Apr 2008, 3 commits
    • RDMA/cxgb3: Support peer-2-peer connection setup · f8b0dfd1
      Committed by Steve Wise
      Open MPI, Intel MPI and other applications don't respect the iWARP
      requirement that the client (active) side of the connection send the
      first RDMA message.  This class of application connection setup is
      called peer-to-peer.  Typically once the connection is setup, _both_
      sides want to send data.
      
      This patch enables peer-to-peer support over the Chelsio RNIC by
      enforcing this iWARP requirement in the driver itself as part of RDMA
      connection setup.
      
      Connection setup is extended, when the peer2peer module option is 1,
      such that the MPA initiator will send a 0B Read (the RTR) just after
      connection setup.  The MPA responder will suspend SQ processing until
      the RTR message is received and replied to.
      
      In the longer term, this will be handled in a standardized way by
      enhancing the MPA negotiation so peers can indicate whether they
      want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
      should be sent.  This will be done by standardizing a few bits of the
      private data in order to negotiate all this.  However this patch
      enables peer-to-peer applications now and allows most of the required
      firmware and driver changes to be done and tested now.
      
      Design:
      
       - Add a module option, peer2peer, to enable this mode.
      
       - New firmware support for peer-to-peer mode:
      
      	- a new bit in the rdma_init WR to tell it to do peer-2-peer
      	  and what form of RTR message to send or expect.
      
      	- process _all_ preposted recvs before moving the connection
      	  into rdma mode.
      
      	- passive side: defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the RTR is received.
      
      	- active side: expect and process the 0B read WR on offload TX
      	  queue. Defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the 0B read WR is processed from the offload TX queue.
      
       - If peer2peer is set, driver posts 0B read request on offload TX
         queue just after posting the rdma_init WR to the offload TX queue.
      
       - Add CQ poll logic to ignore unsolicited read responses.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
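      The module option itself follows the standard kernel pattern; a
      sketch (the parameter name comes from the text above, while the type,
      permissions, and description string are assumptions):

          static int peer2peer;   /* 0 = off by default */
          module_param(peer2peer, int, 0644);
          MODULE_PARM_DESC(peer2peer,
                           "Enable peer-to-peer connection setup (default 0)");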
    • RDMA/cxgb3: Set the max_mr_size device attribute correctly · ccaf10d0
      Committed by Steve Wise
      cxgb3 only supports 4GB memory regions.  The Lustre RDMA code uses
      this attribute and currently has to work around our bad setting.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
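      The fix amounts to a one-line attribute change; a sketch (the macro
      name is an assumption, the 4GB limit comes from the text above):

          #define T3_MAX_MR_SIZE 0xffffffffULL   /* cxgb3 limit: 4GB regions */

          props->max_mr_size = T3_MAX_MR_SIZE;   /* was a bogus larger value */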
    • RDMA/cxgb3: Correctly serialize peer abort path · 989a1780
      Committed by Steve Wise
      Open MPI and other stress testing exposed a few bad bugs in handling
      aborts in the middle of a normal close.  Fix these by:
      
       - serializing abort reply and peer abort processing with disconnect
         processing
      
       - warning (and ignoring) if the ep timer is stopped when it wasn't running
      
       - cleaning up disconnect path to correctly deal with aborting and
         dead endpoints
      
       - in iwch_modify_qp(), taking a ref on the ep before releasing the qp
         lock if iwch_ep_disconnect() will be called.  The ref is dropped
         after calling disconnect.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
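      A hedged sketch of the locking pattern in the last bullet above
      (get_ep/put_ep follow iw_cxgb3 naming; the surrounding variables are
      illustrative):

          if (disconnect)
                  get_ep(&ep->com);       /* ref taken before dropping the lock */
          spin_unlock_irqrestore(&qhp->lock, flag);
          if (disconnect) {
                  iwch_ep_disconnect(ep, abort, GFP_KERNEL);
                  put_ep(&ep->com);       /* ref dropped after disconnect */
          }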
  8. 29 Apr 2008, 1 commit
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Committed by Arthur Kepner
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
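      The expanded prototype, with the new final parameter (the signature
      matches the description above; the call site below is an illustrative
      example, not from the patch):

          struct ib_umem *ib_umem_get(struct ib_ucontext *context,
                                      unsigned long addr, size_t size,
                                      int access, int dmasync);

          /* e.g. mapping a user-allocated CQ buffer with dmasync = 1 */
          cq->umem = ib_umem_get(context, cmd.buf_addr, buf_size,
                                 IB_ACCESS_LOCAL_WRITE, 1);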
  9. 20 Apr 2008, 1 commit
  10. 17 Apr 2008, 3 commits
    • IB/core: Add support for "send with invalidate" work requests · 0f39cf3d
      Committed by Roland Dreier
      Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
      "send with invalidate" work request as defined in the iWARP verbs and
      the InfiniBand base memory management extensions.  Also put "imm_data"
      and a new "invalidate_rkey" member in a new "ex" union in struct
      ib_send_wr. The invalidate_rkey member can be used to pass in an
      R_Key/STag to be invalidated.  Add this new union to struct
      ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
      ib_uverbs_post_send().
      
      Fix up low-level drivers to deal with the change to struct ib_send_wr,
      and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
      since that code never does any send with immediate operations.
      
      Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
      the iWARP drivers currently in the tree set the bit.  The amso1100
      driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
      if passed in as part of userspace send requests (since it does not
      implement kernel bypass work request queueing).  Remove the flag from
      all existing drivers that set it until we know which ones are OK.
      
      The value chosen for the new flag is not consecutive with the others,
      to avoid clashing with flags defined in the XRC patches, which are not
      merged yet but which are already in use and are likely to be merged soon.
      
      This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
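      The new union as described above (excerpt of struct ib_send_wr;
      surrounding members elided):

          struct ib_send_wr {
                  /* ... */
                  enum ib_wr_opcode       opcode;
                  int                     send_flags;
                  union {
                          __be32          imm_data;        /* send/write with immediate */
                          u32             invalidate_rkey; /* IB_WR_SEND_WITH_INV */
                  } ex;
                  /* ... */
          };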
    • IB: Replace remaining __FUNCTION__ occurrences with __func__ · 33718363
      Committed by Harvey Harrison
      __FUNCTION__ is gcc-specific, use __func__ instead.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
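      A representative hunk (illustrative, not taken from the actual patch):

          -	printk(KERN_ERR "%s: QP in bad state\n", __FUNCTION__);  /* gcc-ism */
          +	printk(KERN_ERR "%s: QP in bad state\n", __func__);      /* C99 */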
    • RDMA/cxgb3: IDR IDs are signed · edba846a
      Committed by Roland Dreier
      Fix sparse warnings about pointer signedness by using a signed int when
      calling idr_get_new_above().
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
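      A hedged sketch of the fix (the surrounding variables are
      illustrative; idr_get_new_above() really does take a signed int *
      for the new ID):

          int id;     /* was u32, which made sparse warn on the &id cast */
          int ret = idr_get_new_above(idr, handle, 0, &id);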
  11. 29 Mar 2008, 1 commit
  12. 10 Mar 2008, 1 commit
  13. 01 Mar 2008, 1 commit
  14. 26 Feb 2008, 1 commit
  15. 13 Feb 2008, 1 commit
  16. 29 Jan 2008, 2 commits
  17. 26 Jan 2008, 8 commits
  18. 14 Nov 2007, 1 commit
  19. 11 Oct 2007, 1 commit
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Committed by Eric W. Biederman
      This patch makes most of the generic device layer network
      namespace safe.  It makes dev_base_head a network namespace
      variable, along with a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally, ifindex numbers are per namespace, or else
      we will have corner-case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
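      What the lookup-API change looks like at a call site (&init_net is
      the initial namespace; the device name is illustrative):

          struct net_device *dev;

          /* before: dev = dev_get_by_name("eth0"); */
          dev = dev_get_by_name(&init_net, "eth0");
          if (dev) {
                  /* ... use the device ... */
                  dev_put(dev);   /* the lookup took a reference */
          }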
  20. 10 Oct 2007, 1 commit
  21. 31 Aug 2007, 1 commit
  22. 04 Aug 2007, 1 commit
  23. 20 Jul 2007, 1 commit
  24. 18 Jul 2007, 1 commit
  25. 10 Jul 2007, 1 commit