1. 06 Sep, 2009 1 commit
    • IB/mthca: Don't allow userspace open while recovering from catastrophic error · d8410647
      Authored by Jack Morgenstein
      Userspace apps are supposed to release all ib device resources if they
      receive a fatal async event (IBV_EVENT_DEVICE_FATAL).  However, the
      app has no way of knowing when the device has come back up, except to
      repeatedly attempt ibv_open_device() until it succeeds.
      
      However, currently there is no protection against the open succeeding
      while the device is being removed following the fatal event.  In
      this case, the open will succeed, but as a result the device waits in
      the middle of its removal until the new app releases its resources --
      and the new app will not do so, since the open succeeded at a point
      following the fatal event generation.
      
      This patch adds an "active" flag to the device. The active flag is set
      to false (in the fatal event flow) before the "fatal" event is
      generated, so any subsequent ibv_open_device() call to the device will
      fail until the device comes back up, thus preventing the above
      deadlock.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
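The ordering described above (clear the flag first, then raise the fatal event) can be sketched as a small userspace model. This is an illustration only, not the mthca code: `struct demo_device`, the `demo_device_*` functions, the mutex, and the `-ENODEV` return value are all assumptions introduced for the example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a device carrying an "active" flag. */
struct demo_device {
	pthread_mutex_t lock;   /* protects active and open_count */
	bool active;            /* false while recovering from a fatal error */
	int open_count;         /* userspace contexts currently holding the device */
	int fatal_events;       /* stands in for dispatched fatal async events */
};

void demo_device_init(struct demo_device *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->active = true;
	dev->open_count = 0;
	dev->fatal_events = 0;
}

/* Open succeeds only while the device is active (error code is assumed). */
int demo_device_open(struct demo_device *dev)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (!dev->active)
		ret = -ENODEV;
	else
		dev->open_count++;
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

void demo_device_close(struct demo_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->open_count--;
	pthread_mutex_unlock(&dev->lock);
}

/* Fatal flow: mark the device inactive BEFORE generating the event. */
void demo_device_fatal(struct demo_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->active = false;
	pthread_mutex_unlock(&dev->lock);
	dev->fatal_events++;    /* event is raised only after the flag is cleared */
}

/* Called once the device has come back up. */
void demo_device_recovered(struct demo_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->active = true;
	pthread_mutex_unlock(&dev->lock);
}
```

In this model, an application that retries the open after seeing the fatal event keeps getting an error until `demo_device_recovered()` runs, so removal never waits on a context opened mid-recovery.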
  2. 28 May, 2009 1 commit
  3. 23 Jul, 2008 1 commit
    • IB/mthca: Keep free count for MTT buddy allocator · e8bb4beb
      Authored by Roland Dreier
      MTT entries are allocated with a buddy allocator, which just keeps
      bitmaps for each level of the buddy table.  However, all free space
      starts out at the highest order, and small allocations start scanning
      from the lowest order.  When the lowest order tables have no free
      space, this can lead to scanning potentially millions of bits before
      finding a free entry at a higher order.
      
      We can avoid this by just keeping a count of how many free entries
      each order has, and skipping the bitmap scan when an order is
      completely empty.  This provides a nice performance boost for a
      negligible increase in memory usage.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
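A minimal sketch of the free-count idea, assuming a toy allocator with 16 entries. The names (`struct demo_buddy`, `demo_buddy_alloc`, `demo_buddy_free`) are hypothetical, and one byte per "bit" is used for readability, so this is not the driver's actual data layout. The point is that `num_free[o]` lets allocation skip an empty order without scanning its bitmap.

```c
#include <string.h>

#define DEMO_MAX_ORDER 4                      /* 2^4 = 16 entries in total */
#define DEMO_ENTRIES   (1 << DEMO_MAX_ORDER)

struct demo_buddy {
	/* bits[o][i] != 0 means block i of order o is free */
	unsigned char bits[DEMO_MAX_ORDER + 1][DEMO_ENTRIES];
	int num_free[DEMO_MAX_ORDER + 1];     /* free blocks per order */
};

void demo_buddy_init(struct demo_buddy *b)
{
	memset(b, 0, sizeof(*b));
	b->bits[DEMO_MAX_ORDER][0] = 1;       /* all space starts at the top order */
	b->num_free[DEMO_MAX_ORDER] = 1;
}

/* Returns the first entry index of the block, or -1 if nothing fits.
 * Assumes 0 <= order <= DEMO_MAX_ORDER. */
int demo_buddy_alloc(struct demo_buddy *b, int order)
{
	int o, seg, m;

	/* Skip empty orders by consulting the counters, not the bitmaps. */
	for (o = order; o <= DEMO_MAX_ORDER; o++)
		if (b->num_free[o] > 0)
			break;
	if (o > DEMO_MAX_ORDER)
		return -1;

	/* num_free[o] > 0 guarantees that this scan finds a set bit. */
	m = 1 << (DEMO_MAX_ORDER - o);
	for (seg = 0; seg < m && !b->bits[o][seg]; seg++)
		;
	b->bits[o][seg] = 0;
	b->num_free[o]--;

	/* Split down to the requested order, freeing the unused buddy. */
	while (o > order) {
		o--;
		seg <<= 1;
		b->bits[o][seg ^ 1] = 1;
		b->num_free[o]++;
	}
	return seg << order;
}

void demo_buddy_free(struct demo_buddy *b, int entry, int order)
{
	int seg = entry >> order;

	/* Merge with the buddy for as long as the buddy is also free. */
	while (order < DEMO_MAX_ORDER && b->bits[order][seg ^ 1]) {
		b->bits[order][seg ^ 1] = 0;
		b->num_free[order]--;
		seg >>= 1;
		order++;
	}
	b->bits[order][seg] = 1;
	b->num_free[order]++;
}
```

The counters cost one integer per order, which matches the commit's claim of a negligible increase in memory usage.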
  4. 15 Jul, 2008 2 commits
  5. 19 Apr, 2008 1 commit
  6. 17 Apr, 2008 2 commits
  7. 26 Jan, 2008 1 commit
  8. 10 Oct, 2007 1 commit
  9. 07 May, 2007 1 commit
    • IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727
      Authored by Roland Dreier
      The semantics defined by the InfiniBand specification say that
      completion events are only generated when a completion is added to a
      completion queue (CQ) after completion notification is requested.  In
      other words, this means that the following race is possible:
      
      	while (CQ is not empty)
      		ib_poll_cq(CQ);
      	// new completion is added after while loop is exited
      	ib_req_notify_cq(CQ);
      	// no event is generated for the existing completion
      
      To close this race, the IB spec recommends doing another poll of the
      CQ after requesting notification.
      
      However, it is not always possible to arrange code this way (for
      example, we have found that NAPI for IPoIB cannot poll after
      requesting notification).  Also, some hardware (e.g. Mellanox HCAs)
      actually will generate an event for completions added before the call
      to ib_req_notify_cq() -- which is allowed by the spec, since there's
      no way for any upper-layer consumer to know exactly when a completion
      was really added -- so the extra poll of the CQ is just a waste.
      
      Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
      ib_req_notify_cq() so that it can return a hint about whether a
      completion may have been added before the request for notification.
      The return value of ib_req_notify_cq() is extended so:
      
      	 < 0	means an error occurred while requesting notification
      	== 0	means notification was requested successfully, and if
      		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
      		events were missed and it is safe to wait for another
      		event.
      	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
      		passed in.  It means that the consumer must poll the
      		CQ again to make sure it is empty to avoid the race
      		described above.
      
      We add a flag to enable this behavior rather than turning it on
      unconditionally, because checking for missed events may incur
      significant overhead for some low-level drivers, and consumers that
      don't care about the results of this test shouldn't be forced to pay
      for the test.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
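The return-value convention above suggests a consumer loop that re-polls only when the hint says a completion may have been missed. The sketch below uses a mock CQ rather than the kernel API: `struct demo_cq`, `demo_poll_cq()`, `demo_req_notify_cq()`, the `DEMO_CQ_*` flags, and the `late_arrivals` field (which injects the race) are all hypothetical stand-ins.

```c
#define DEMO_CQ_NEXT_COMP            (1 << 0)
#define DEMO_CQ_REPORT_MISSED_EVENTS (1 << 1)

/* Mock completion queue used only to model the race. */
struct demo_cq {
	int pending;        /* completions waiting to be polled */
	int armed;          /* notification has been requested */
	int late_arrivals;  /* completions that land just before the next arm */
};

/* Returns 1 if a completion was consumed, 0 if the CQ is empty. */
int demo_poll_cq(struct demo_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Mock of the extended return value: > 0 means "poll again". */
int demo_req_notify_cq(struct demo_cq *cq, int flags)
{
	/* Model the race: completions arrive after the consumer's last poll. */
	cq->pending += cq->late_arrivals;
	cq->late_arrivals = 0;
	cq->armed = 1;

	if ((flags & DEMO_CQ_REPORT_MISSED_EVENTS) && cq->pending > 0)
		return 1;
	return 0;
}

/* Drain the CQ, arm it, and repeat while the hint reports a possible miss.
 * A zero or negative return from the notify call ends the loop. */
int demo_drain_and_arm(struct demo_cq *cq)
{
	int handled = 0;

	do {
		while (demo_poll_cq(cq) > 0)
			handled++;
	} while (demo_req_notify_cq(cq, DEMO_CQ_NEXT_COMP |
					DEMO_CQ_REPORT_MISSED_EVENTS) > 0);
	return handled;
}
```

With this shape the extra poll happens only when the low-level driver reports that it might be needed, instead of unconditionally after every notification request.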
  10. 13 Feb, 2007 1 commit
  11. 23 Sep, 2006 2 commits
  12. 10 May, 2006 1 commit
    • IB/mthca: Fix race in reference counting · a3285aa4
      Authored by Roland Dreier
      Fix races in destroying various objects.  If a destroy routine
      waits for an object to become free by doing
      
      	wait_event(&obj->wait, !atomic_read(&obj->refcount));
      	/* now clean up and destroy the object */
      
      and another place drops a reference to the object by doing
      
      	if (atomic_dec_and_test(&obj->refcount))
      		wake_up(&obj->wait);
      
      then this is susceptible to a race where the wait_event() and final
      freeing of the object occur between the atomic_dec_and_test() and the
      wake_up().  And this is a use-after-free, since wake_up() will be
      called on part of the already-freed object.
      
      Fix this in mthca by replacing the atomic_t refcounts with plain old
      integers protected by a spinlock.  This makes it possible to do the
      decrement of the reference count and the wake_up() so that it appears
      as a single atomic operation to the code waiting on the wait queue.
      
      While touching this code, also simplify mthca_cq_clean(): the CQ being
      cleaned cannot go away, because it still has a QP attached to it.  So
      there's no reason to be paranoid and look up the CQ by number; it's
      perfectly safe to use the pointer that the callers already have.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
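The fix described above can be sketched in userspace with a mutex and condition variable standing in for the kernel spinlock and wait queue. This is an analogy under stated assumptions, not the mthca code: `struct demo_obj` and the `demo_obj_*` functions are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical object whose refcount is a plain int guarded by a lock. */
struct demo_obj {
	pthread_mutex_t lock;   /* stands in for the kernel spinlock */
	pthread_cond_t  wait;   /* stands in for the wait queue */
	int refcount;
};

void demo_obj_init(struct demo_obj *obj)
{
	pthread_mutex_init(&obj->lock, NULL);
	pthread_cond_init(&obj->wait, NULL);
	obj->refcount = 1;      /* initial reference held by the creator */
}

void demo_obj_get(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	obj->refcount++;
	pthread_mutex_unlock(&obj->lock);
}

/* Decrement and wake-up happen under one lock, so a waiter can never
 * observe refcount == 0 before the wake-up has been issued. */
void demo_obj_put(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	if (--obj->refcount == 0)
		pthread_cond_broadcast(&obj->wait);
	pthread_mutex_unlock(&obj->lock);
}

/* The destroy path reads the count under the same lock. When this
 * returns, no other holder is still touching the object. */
void demo_obj_wait_free(struct demo_obj *obj)
{
	pthread_mutex_lock(&obj->lock);
	while (obj->refcount != 0)
		pthread_cond_wait(&obj->wait, &obj->lock);
	pthread_mutex_unlock(&obj->lock);
}
```

Because the waiter must take the lock to read the count, it cannot proceed to free the object until the thread dropping the last reference has finished its wake-up and released the lock.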
  13. 13 Apr, 2006 1 commit
    • IB/mthca: Fix max_srq_sge returned by ib_query_device for Tavor devices · 59fef3b1
      Authored by Jack Morgenstein
      The driver allocates SRQ WQEs with a power-of-2 size both for
      Tavor and for memfree. For Tavor, however, the hardware only requires
      the WQE size to be a multiple of 16, not a power of 2, and the max
      number of scatter-gather allowed is reported accordingly by the
      firmware (and this is the value currently returned by
      ib_query_device() and ibv_query_device()).
      
      If the max number of scatter/gather entries reported by the FW is used
      when creating an SRQ, the creation will fail for Tavor, since the
      required WQE size will be increased to the next power of 2, which
      turns out to be larger than the device permitted max WQE size (which
      is not a power of 2).
      
      This patch reduces the reported SRQ max wqe size so that it can be used
      successfully in creating an SRQ on Tavor HCAs.
      Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
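The arithmetic behind the failure can be worked through with a small sketch. The 16-byte header and 16-byte scatter/gather entry sizes, the sample limits (1008 bytes, 59 entries), and all `demo_*` names are assumptions chosen for illustration. They are not values taken from the driver or firmware.

```c
#define DEMO_NEXT_SEG_SIZE 16u   /* assumed per-WQE header size in bytes */
#define DEMO_DATA_SEG_SIZE 16u   /* assumed size of one scatter/gather entry */

/* Largest power of 2 that is <= x (x must be >= 1). */
unsigned demo_round_down_pow2(unsigned x)
{
	unsigned p = 1;

	while (p <= x / 2)
		p *= 2;
	return p;
}

/* Smallest power of 2 that is >= x (x must be >= 1). */
unsigned demo_round_up_pow2(unsigned x)
{
	unsigned p = 1;

	while (p < x)
		p *= 2;
	return p;
}

/* WQE size actually allocated for an SRQ with the given number of entries. */
unsigned demo_srq_wqe_size(unsigned sge)
{
	return demo_round_up_pow2(DEMO_NEXT_SEG_SIZE + sge * DEMO_DATA_SEG_SIZE);
}

/* Reported limit: entries that still fit after rounding to a power of 2,
 * capped by the firmware value. Assumes max_desc_sz >= DEMO_NEXT_SEG_SIZE. */
unsigned demo_max_srq_sge(unsigned fw_max_sg, unsigned max_desc_sz)
{
	unsigned fit = (demo_round_down_pow2(max_desc_sz) - DEMO_NEXT_SEG_SIZE) /
		       DEMO_DATA_SEG_SIZE;

	return fit < fw_max_sg ? fit : fw_max_sg;
}
```

With the sample limits, 59 entries need 16 + 59 * 16 = 960 bytes, which rounds up to 1024 and exceeds the 1008-byte maximum. Capping the reported value at (512 - 16) / 16 = 31 entries yields a 512-byte WQE, which fits.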
  14. 11 Apr, 2006 1 commit
    • IB: simplify static rate encoding · bf6a9e31
      Authored by Jack Morgenstein
      Push translation of static rate to HCA format into low-level drivers,
      where it belongs.  For static rate encoding, use encoding of rate
      field from IB standard PathRecord, with addition of value 0, for
      backwards compatibility with current usage.  The changes are:
      
       - Add enum ib_rate to midlayer includes.
       - Get rid of static rate translation in IPoIB; just use static rate
         directly from Path and MulticastGroup records.
       - Update mthca driver to translate absolute static rate into the
         format used by hardware.  This also fixes mthca's static rate
         handling for HCAs that are capable of 4X DDR.
      Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  15. 03 Apr, 2006 1 commit
    • IB/mthca: Always build debugging code unless CONFIG_EMBEDDED=y · 227c939b
      Authored by Roland Dreier
      Change the mthca debugging trace output code so that it can be enabled
      and disabled at runtime with the debug_level module parameter in
      sysfs.  Also, don't allow CONFIG_INFINIBAND_MTHCA_DEBUG to be disabled
      unless CONFIG_EMBEDDED is selected.  We want users (and especially
      distros) to have this turned on unless they really need to save space,
      because by the time we want debugging output, it's usually too late to
      rebuild a kernel.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
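The runtime switch can be illustrated with a userspace sketch in which tracing code is always compiled in but gated by a variable. `demo_debug_level` and `demo_dbg()` are hypothetical names. The real driver exposes its setting as a module parameter, which this example does not model.

```c
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for a runtime-tunable debug level; 0 disables tracing. */
int demo_debug_level = 0;

/* Always built, but prints only when tracing is enabled.
 * Returns 1 if a message was emitted, 0 if it was suppressed. */
int demo_dbg(const char *fmt, ...)
{
	va_list ap;

	if (!demo_debug_level)
		return 0;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return 1;
}
```

When tracing is off, the cost is one integer test per call site, which is why keeping the code built by default is cheap compared with rebuilding a kernel after a problem has already appeared.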
  16. 21 Mar, 2006 7 commits
  17. 14 Feb, 2006 1 commit
  18. 31 Jan, 2006 1 commit
  19. 13 Jan, 2006 1 commit
  20. 09 Jan, 2006 1 commit
  21. 11 Nov, 2005 1 commit
  22. 05 Nov, 2005 1 commit
  23. 29 Oct, 2005 1 commit
  24. 28 Oct, 2005 1 commit
    • [IB] mthca: first pass at catastrophic error reporting · 3d155f8c
      Authored by Roland Dreier
      Add some initial support for detecting and reporting catastrophic
      errors reported by Mellanox HCAs.  We start a periodic timer which
      polls the catastrophic error reporting buffer in device memory.  If an
      error is detected, we dump the contents of the buffer for post-mortem
      debugging, and report a fatal asynchronous error to higher levels.
      
      In the future we can try to recover from these errors by resetting the
      device, but this will require some work in higher-level code as well.
      Let's get this in now, so that we at least get catastrophic errors
      reported in logs.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
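One polling step of the kind described above might look like the following sketch. It assumes that a nonzero first word in the buffer signals an error, which the commit message does not state. `demo_poll_catas()` is a hypothetical function that a periodic timer would call, and its return value stands in for reporting the fatal asynchronous event.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Check the error buffer once. Returns 1 (and dumps the buffer to log)
 * if an error is recorded, 0 otherwise. The "first word nonzero" test
 * is an assumption made for this sketch. */
int demo_poll_catas(const uint32_t *buf, size_t words, FILE *log)
{
	size_t i;

	if (words == 0 || buf[0] == 0)
		return 0;

	fprintf(log, "catastrophic error detected\n");
	for (i = 0; i < words; i++)
		fprintf(log, "  buf[%02zx]: %08x\n", i, (unsigned)buf[i]);
	return 1;   /* caller would report a fatal event to higher levels */
}
```

A caller would stop rearming its timer once this returns 1, since the dump has already captured the state needed for post-mortem debugging.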
  25. 18 Oct, 2005 3 commits
  26. 27 Aug, 2005 4 commits