1. 09 5月, 2007 1 次提交
    • R
      IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules · f7c6a7b5
      Roland Dreier 提交于
      Export ib_umem_get()/ib_umem_release() and put low-level drivers in
      control of when to call ib_umem_get() to pin and DMA map userspace,
      rather than always calling it in ib_uverbs_reg_mr() before calling the
      low-level driver's reg_user_mr method.
      
      Also move these functions to be in the ib_core module instead of
      ib_uverbs, so that driver modules using them do not depend on
      ib_uverbs.
      
      This has a number of advantages:
       - It is better design from the standpoint of making generic code a
         library that can be used or overridden by device-specific code as
         the details of specific devices dictate.
       - Drivers that do not need to pin userspace memory regions do not
         need to take the performance hit of calling ib_mem_get().  For
         example, although I have not tried to implement it in this patch,
         the ipath driver should be able to avoid pinning memory and just
         use copy_{to,from}_user() to access userspace memory regions.
       - Buffers that need special mapping treatment can be identified by
         the low-level driver.  For example, it may be possible to solve
         some Altix-specific memory ordering issues with mthca CQs in
         userspace by mapping CQ buffers with extra flags.
       - Drivers that need to pin and DMA map userspace memory for things
         other than memory regions can use ib_umem_get() directly, instead
         of hacks using extra parameters to their reg_phys_mr method.  For
         example, the mlx4 driver that is pending being merged needs to pin
         and DMA map QP and CQ buffers, but it does not need to create a
         memory key for these buffers.  So the cleanest solution is for mlx4
         to call ib_umem_get() in the create_qp and create_cq methods.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      f7c6a7b5
  2. 07 5月, 2007 2 次提交
    • R
      IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727
      Roland Dreier 提交于
      The semantics defined by the InfiniBand specification say that
      completion events are only generated when a completions is added to a
      completion queue (CQ) after completion notification is requested.  In
      other words, this means that the following race is possible:
      
      	while (CQ is not empty)
      		ib_poll_cq(CQ);
      	// new completion is added after while loop is exited
      	ib_req_notify_cq(CQ);
      	// no event is generated for the existing completion
      
      To close this race, the IB spec recommends doing another poll of the
      CQ after requesting notification.
      
      However, it is not always possible to arrange code this way (for
      example, we have found that NAPI for IPoIB cannot poll after
      requesting notification).  Also, some hardware (eg Mellanox HCAs)
      actually will generate an event for completions added before the call
      to ib_req_notify_cq() -- which is allowed by the spec, since there's
      no way for any upper-layer consumer to know exactly when a completion
      was really added -- so the extra poll of the CQ is just a waste.
      
      Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
      ib_req_notify_cq() so that it can return a hint about whether the a
      completion may have been added before the request for notification.
      The return value of ib_req_notify_cq() is extended so:
      
      	 < 0	means an error occurred while requesting notification
      	== 0	means notification was requested successfully, and if
      		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
      		events were missed and it is safe to wait for another
      		event.
      	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
      		passed in.  It means that the consumer must poll the
      		CQ again to make sure it is empty to avoid the race
      		described above.
      
      We add a flag to enable this behavior rather than turning it on
      unconditionally, because checking for missed events may incur
      significant overhead for some low-level drivers, and consumers that
      don't care about the results of this test shouldn't be forced to pay
      for the test.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      ed23a727
    • M
      IB: Add CQ comp_vector support · f4fd0b22
      Michael S. Tsirkin 提交于
      Add a num_comp_vectors member to struct ib_device and extend
      ib_create_cq() to pass in a comp_vector parameter -- this parallels
      the userspace libibverbs API.  Update all hardware drivers to set
      num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
      value.  Pass the value of num_comp_vectors to userspace rather than
      hard-coding a value of 1.
      
      We want multiple CQ event vector support (via MSI-X or similar for
      adapters that can generate multiple interrupts), but it's not clear
      how many vectors we want, or how we want to deal with policy issues
      such as how to decide which vector to use or how to set up interrupt
      affinity.  This patch is useful for experimenting, since no core
      changes will be necessary when updating a driver to support multiple
      vectors, and we know that we want to make at least these changes
      anyway.
      Signed-off-by: NMichael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      f4fd0b22
  3. 05 2月, 2007 2 次提交
    • M
      IB: Return qp pointer as part of ib_wc · 062dbb69
      Michael S. Tsirkin 提交于
      struct ib_wc currently only includes the local QP number: this matches
      the IB spec, but seems mostly useless. The following patch replaces
      this with the pointer to qp itself, and updates all low level drivers
      and all users.
      
      This has the following advantages:
      - Ability to get a per-qp context through wc->qp->qp_context
      - Existing drivers already have the qp pointer ready in poll cq, so
        this change actually saves a tiny bit (extra memory read) on data path
        (for ehca it would actually be expensive to find the QP pointer when
        polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
        NULL for ehca)
      - Users that need the QP number can still get it through wc->qp->qp_num
      
      Use case:
      
      In IPoIB connected mode code, I have a common CQ shared by multiple
      QPs.  To track connection usage, I need a way to get at some per-QP
      context upon the completion, and I would like to avoid allocating
      context object per work request just to stick a QP pointer into it.
      With this code, I can just use wc->qp->qp_context.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      062dbb69
    • M
      IB: Include <linux/kref.h> explicitly in <rdma/ib_verbs.h> · 459d6e2a
      Michael S. Tsirkin 提交于
      <rdma/ib_verbs.h> uses struct kref, so it should include <linux/kref.h>
      explicitly to avoid hidden include dependencies.
      Signed-off-by: NMichael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      459d6e2a
  4. 16 12月, 2006 1 次提交
    • R
      IB: Fix ib_dma_alloc_coherent() wrapper · c59a3da1
      Roland Dreier 提交于
      The ib_dma_alloc_coherent() wrapper uses a u64* for the dma_handle
      parameter, unlike dma_alloc_coherent, which uses dma_addr_t*.  This
      means that we need a temporary variable to handle the case when
      ib_dma_alloc_coherent() just falls through directly to
      dma_alloc_coherent() on architectures where sizeof u64 != sizeof
      dma_addr_t.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      c59a3da1
  5. 14 12月, 2006 1 次提交
  6. 13 12月, 2006 1 次提交
  7. 23 9月, 2006 2 次提交
  8. 18 6月, 2006 4 次提交
  9. 11 4月, 2006 1 次提交
    • J
      IB: simplify static rate encoding · bf6a9e31
      Jack Morgenstein 提交于
      Push translation of static rate to HCA format into low-level drivers,
      where it belongs.  For static rate encoding, use encoding of rate
      field from IB standard PathRecord, with addition of value 0, for
      backwards compatibility with current usage.  The changes are:
      
       - Add enum ib_rate to midlayer includes.
       - Get rid of static rate translation in IPoIB; just use static rate
         directly from Path and MulticastGroup records.
       - Update mthca driver to translate absolute static rate into the
         format used by hardware.  This also fixes mthca's static rate
         handling for HCAs that are capable of 4X DDR.
      Signed-off-by: NJack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      bf6a9e31
  10. 21 3月, 2006 5 次提交
  11. 10 1月, 2006 1 次提交
  12. 11 11月, 2005 1 次提交
    • R
      [IB] Have cq_resize() method take an int, not int* · 40de2e54
      Roland Dreier 提交于
      Change the struct ib_device.resize_cq() method to take a plain integer
      that holds the new CQ size, rather than a pointer to an integer that
      it uses to return the new size.  This makes the interface match the
      exported ib_resize_cq() signature, and allows the low-level driver to
      update the CQ size with proper locking if necessary.
      
      No in-tree drivers are exporting this method yet.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      40de2e54
  13. 26 10月, 2005 1 次提交
    • S
      [IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped · 34816ad9
      Sean Hefty 提交于
      The MAD layer was violating the DMA API by touching data buffers used
      for sends after the DMA mapping was done.  This causes problems on
      non-cache-coherent architectures, because the device doing DMA won't
      see updates to the payload buffers that exist only in the CPU cache.
      
      Fix this by having all MAD consumers use ib_create_send_mad() to
      allocate their send buffers, and moving the DMA mapping into the MAD
      layer so it can be done just before calling send (and after any
      modifications of the send buffer by the MAD layer).
      
      Tested on a non-cache-coherent PowerPC 440SPe system.
      Signed-off-by: NSean Hefty <sean.hefty@intel.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      34816ad9
  14. 18 10月, 2005 2 次提交
  15. 27 9月, 2005 1 次提交
    • R
      [IB] uverbs: Close some exploitable races · 63c47c28
      Roland Dreier 提交于
      Al Viro pointed out that the current IB userspace verbs interface
      allows userspace to cause mischief by closing file descriptors before
      we're ready, or issuing the same command twice at the same time.  This
      patch closes those races, and fixes other obvious problems such as a
      module reference leak.
      
      Some other interface bogosities will require an ABI change to fix
      properly, so I'm deferring those fixes until 2.6.15.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      63c47c28
  16. 27 8月, 2005 4 次提交
  17. 28 7月, 2005 2 次提交
  18. 08 7月, 2005 1 次提交
    • R
      [PATCH] IB uverbs: core API extensions · e2773c06
      Roland Dreier 提交于
      First of a series of patches which add support for direct userspace access to
      InfiniBand hardware -- so-called "userspace verbs." I believe these patches
      are ready to merge, but a final review would be useful.
      
      These patches should incorporate all of the feedback from the discussion when
      I posted an earlier version back in April (see
      http://lkml.org/lkml/2005/4/4/267 for the start of the thread).  In
      particular, memory pinned for use by userspace is accounted for in
      current->mm->vm_locked and requests to pin memory are checked against
      RLIMIT_MEMLOCK.
      
      This patch:
      
      Modify the ib_verbs.h header file with changes required for InfiniBand
      userspace verbs support.  We add a few structures to keep track of userspace
      context, and extend the driver API so that low-level drivers know when they're
      creating resources that will be used from userspace.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e2773c06
  19. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4