  1. 14 May 2008 (1 commit)
  2. 07 May 2008 (1 commit)
    • RDMA/cxgb3: Fix severe limit on userspace memory registration size · 273748cc
      Committed by Roland Dreier
      Currently, iw_cxgb3 is severely limited in the amount of userspace
      memory that can be registered in a single memory region, which
      causes big problems for applications that expect to be able to
      register hundreds of MB.
      
      The problem is that the driver uses a single kmalloc()ed buffer to
      hold the physical buffer list (PBL) for the entire memory region
      during registration, which means that 8 bytes of contiguous memory are
      required for each page of memory being registered.  For example, a 64
      MB registration will require 128 KB of contiguous memory with 4 KB
      pages, and it is unlikely that such an allocation will succeed on a busy
      system.
      
      This is purely a driver problem: the temporary page list buffer is not
      needed by the hardware, so we can fix this by writing the PBL to the
      hardware in page-sized chunks rather than all at once.  We do this by
      splitting the memory registration operation into several steps
      (sketched in code after the list):
      
       - Allocate PBL space in adapter memory for the full registration
       - Copy PBL to adapter memory in chunks
       - Allocate STag and enable memory region
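
      A minimal sketch of the chunked PBL write, assuming hypothetical
      helpers write_adapter_mem() and next_page_addr() in place of the
      driver's real primitives (which this log does not show):

      	#include <stdint.h>
      	#include <stddef.h>

      	#define PAGE_SIZE          4096
      	#define PBL_CHUNK_ENTRIES  (PAGE_SIZE / sizeof(uint64_t))  /* 512 with 4 KB pages */

      	/* Hypothetical helpers: copy a buffer into adapter memory, and
      	 * walk the pinned user pages one address at a time. */
      	int write_adapter_mem(uint32_t dest, const void *src, size_t len);
      	uint64_t next_page_addr(void *cursor);

      	/* Write the PBL one page-sized chunk at a time, so only a single
      	 * page of contiguous host memory is ever needed for staging. */
      	int write_pbl_in_chunks(uint32_t pbl_addr, void *cursor, size_t npages)
      	{
      		uint64_t chunk[PBL_CHUNK_ENTRIES];
      		size_t done = 0, n, j;
      		int ret;

      		while (done < npages) {
      			n = npages - done;
      			if (n > PBL_CHUNK_ENTRIES)
      				n = PBL_CHUNK_ENTRIES;
      			for (j = 0; j < n; j++)
      				chunk[j] = next_page_addr(cursor);
      			ret = write_adapter_mem(pbl_addr + done * sizeof(uint64_t),
      						chunk, n * sizeof(uint64_t));
      			if (ret)
      				return ret;
      			done += n;
      		}
      		return 0;
      	}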
      
      This also allows several other cleanups to the __cxio_tpt_op()
      interface and related parts of the driver.
      
      This change leaves the reregister memory region and memory window
      operations broken, but they already didn't work due to other
      longstanding bugs, so fixing them will be left to a later patch.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  3. 03 May 2008 (1 commit)
  4. 30 Apr 2008 (1 commit)
    • RDMA/cxgb3: Support peer-2-peer connection setup · f8b0dfd1
      Committed by Steve Wise
      Open MPI, Intel MPI, and other applications don't respect the iWARP
      requirement that the client (active) side of the connection send the
      first RDMA message.  This class of application connection setup is
      called peer-to-peer.  Typically, once the connection is set up, _both_
      sides want to send data.
      
      This patch adds support for peer-to-peer connections over the Chelsio
      RNIC by enforcing this iWARP requirement in the driver itself as part
      of RDMA connection setup.
      
      Connection setup is extended, when the peer2peer module option is 1,
      so that the MPA initiator sends a 0B Read (the RTR) just after
      connection setup.  The MPA responder suspends SQ processing until
      the RTR message is received and replied to.
      
      In the longer term, this will be handled in a standardized way by
      enhancing the MPA negotiation so peers can indicate whether they
      want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
      should be sent.  This will be done by standardizing a few bits of the
      private data in order to negotiate all this.  However, this patch
      enables peer-to-peer applications now and allows most of the required
      firmware and driver changes to be done and tested now.
      
      Design:
      
       - Add a module option, peer2peer, to enable this mode.
      
       - New firmware support for peer-to-peer mode:
      
      	- a new bit in the rdma_init WR to tell it to do peer-2-peer
      	  and what form of RTR message to send or expect.
      
      	- process _all_ preposted recvs before moving the connection
      	  into rdma mode.
      
      	- passive side: defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the RTR is received.
      
      	- active side: expect and process the 0B read WR on offload TX
      	  queue. Defer completing the rdma_init WR until all
      	  pre-posted recvs are processed.  Suspend SQ processing until
      	  the 0B read WR is processed from the offload TX queue.
      
       - If peer2peer is set, the driver posts a 0B read request on the
         offload TX queue just after posting the rdma_init WR (see the
         sketch after this list).
      
       - Add CQ poll logic to ignore unsolicited read responses.
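
      A minimal sketch of the active-side ordering when peer2peer is set;
      the function names are illustrative stubs, not the actual iw_cxgb3
      symbols:

      	#include <stdbool.h>

      	struct qp;				/* opaque queue pair handle */

      	/* Stubs standing in for the driver's WR-posting primitives. */
      	int post_rdma_init_wr(struct qp *qp, bool p2p);
      	int post_zero_byte_read(struct qp *qp);	/* the RTR: a 0B RDMA Read WR */

      	static bool peer2peer = true;		/* the new module option */

      	/* Active (client) side: post the rdma_init WR, then, in peer2peer
      	 * mode, immediately post the 0B read so the first RDMA message
      	 * always comes from the active side, as iWARP requires. */
      	int activate_connection(struct qp *qp)
      	{
      		int ret = post_rdma_init_wr(qp, peer2peer);

      		if (ret)
      			return ret;
      		if (peer2peer)
      			ret = post_zero_byte_read(qp);
      		return ret;
      	}
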
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  5. 17 Apr 2008 (1 commit)
  6. 26 Jan 2008 (2 commits)
  7. 11 Oct 2007 (1 commit)
    • [NET]: Make the device list and device lookups per namespace. · 881d966b
      Committed by Eric W. Biederman
      This patch makes most of the generic device layer network
      namespace safe.  It makes dev_base_head a per-network-namespace
      variable, along with a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument and deal with
      it (a usage sketch follows this paragraph).
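
      A minimal in-kernel sketch of a converted caller ("eth0" is just an
      example device name); dev_get_by_name() now takes the namespace as
      its first argument:

      	#include <linux/netdevice.h>
      	#include <net/net_namespace.h>

      	static void example_lookup(void)
      	{
      		/* Every lookup now names its namespace explicitly; code
      		 * that is not yet namespace-aware passes &init_net. */
      		struct net_device *dev = dev_get_by_name(&init_net, "eth0");

      		if (!dev)
      			return;
      		/* ... use the device ... */
      		dev_put(dev);	/* dev_get_by_name takes a reference */
      	}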
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anything in the core of the network stack that was
      affected by the change to dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net, the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundamentally, ifindex numbers are per namespace, or else
      we will have corner-case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 31 Aug 2007 (1 commit)
  9. 10 Jul 2007 (2 commits)
  10. 07 May 2007 (1 commit)
    • IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727
      Committed by Roland Dreier
      The semantics defined by the InfiniBand specification say that
      completion events are only generated when a completion is added to a
      completion queue (CQ) after completion notification is requested.
      This means that the following race is possible:
      
      	while (CQ is not empty)
      		ib_poll_cq(CQ);
      	// new completion is added after while loop is exited
      	ib_req_notify_cq(CQ);
      	// no event is generated for the existing completion
      
      To close this race, the IB spec recommends doing another poll of the
      CQ after requesting notification.
      
      However, it is not always possible to arrange code this way (for
      example, we have found that NAPI for IPoIB cannot poll after
      requesting notification).  Also, some hardware (e.g. Mellanox HCAs)
      will actually generate an event for completions added before the call
      to ib_req_notify_cq() -- which is allowed by the spec, since there's
      no way for any upper-layer consumer to know exactly when a completion
      was really added -- so the extra poll of the CQ is just a waste.
      
      Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
      ib_req_notify_cq() so that it can return a hint about whether a
      completion may have been added before the request for notification.
      The return value of ib_req_notify_cq() is extended as follows:
      
      	 < 0	means an error occurred while requesting notification
      	== 0	means notification was requested successfully, and if
      		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
      		events were missed and it is safe to wait for another
      		event.
      	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
      		passed in.  It means that the consumer must poll the
      		CQ again to make sure it is empty to avoid the race
      		described above.
      
      We add a flag to enable this behavior rather than turning it on
      unconditionally, because checking for missed events may incur
      significant overhead for some low-level drivers, and consumers that
      don't care about the results of this test shouldn't be forced to pay
      for the test.
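
      A minimal sketch of the intended consumer loop (handle_completion()
      is an assumed placeholder for the consumer's own handler):

      	#include <rdma/ib_verbs.h>

      	void handle_completion(struct ib_wc *wc);	/* consumer-supplied */

      	/* Drain the CQ, then re-arm notification.  A positive return from
      	 * ib_req_notify_cq() means a completion may have slipped in after
      	 * the drain, so poll again instead of sleeping; 0 means it is now
      	 * safe to wait for the next event. */
      	static void drain_and_rearm(struct ib_cq *cq)
      	{
      		struct ib_wc wc;

      		do {
      			while (ib_poll_cq(cq, 1, &wc) > 0)
      				handle_completion(&wc);
      		} while (ib_req_notify_cq(cq, IB_CQ_NEXT_COMP |
      					  IB_CQ_REPORT_MISSED_EVENTS) > 0);
      	}
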
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  11. 27 Mar 2007 (1 commit)
  12. 07 Mar 2007 (1 commit)
  13. 24 Feb 2007 (1 commit)
    • RDMA/cxgb3: cleanups · 2b540355
      Committed by Adrian Bunk
      - don't mark static functions in C files as inline - gcc should know
        best whether inlining makes sense
      - never compile the unused cxio_dbg.c
      - make the following needlessly global functions static:
        - cxio_hal.c: cxio_hal_clear_qp_ctx()
        - iwch_provider.c: iwch_get_qp()
      - remove the following unused global functions:
        - cxio_hal.c: cxio_allocate_stag()
        - cxio_resource.c: cxio_hal_get_rhdl()
        - cxio_resource.c: cxio_hal_put_rhdl()
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  14. 17 Feb 2007 (1 commit)
  15. 13 Feb 2007 (1 commit)