1. 18 6月, 2011 2 次提交
  2. 25 5月, 2011 1 次提交
    • S
      RDMA/cxgb4: Use completion objects for event blocking · c337374b
      Steve Wise 提交于
      There exists a race condition when using wait_queue_head_t objects
      that are declared on the stack.  This was being done in a few places
      where we are sending work requests to the FW and awaiting replies, but
      we don't have an endpoint structure with an embedded c4iw_wr_wait
      struct.  So the code was allocating it locally on the stack.  Bad
      design.  The race is:
      
        1) thread on cpuX declares the wait_queue_head_t on the stack, then
           posts a firmware WR with that wait object ptr as the cookie to be
           returned in the WR reply.  This thread will proceed to block in
           wait_event_timeout() but before it does:
      
        2) An interrupt runs on cpuY with the WR reply.  fw6_msg() handles
           this and calls c4iw_wake_up().  c4iw_wake_up() sets the condition
           variable in the c4iw_wr_wait object to TRUE and will call
           wake_up(), but before it calls wake_up():
      
        3) The thread on cpuX calls c4iw_wait_for_reply(), which calls
           wait_event_timeout().  The wait_event_timeout() macro checks the
           condition variable and returns immediately since it is TRUE.  So
           this thread never blocks/sleeps. The function then returns
           effectively deallocating the c4iw_wr_wait object that was on the
           stack.
      
        4) So at this point cpuY has a pointer to the c4iw_wr_wait object
           that is no longer valid.  Further its pointing to a stack frame
           that might now be in use by some other context/thread.  So cpuY
           continues execution and calls wake_up() on a ptr to a wait object
           that as been effectively deallocated.
      
      This race, when it hits, can cause a crash in wake_up(), which I've
      seen under heavy stress. It can also corrupt the referenced stack
      which can cause any number of failures.
      
      The fix:
      
      Use struct completion, which supports on-stack declarations.
      Completions use a spinlock around setting the condition to true and
      the wake up so that steps 2 and 4 above are atomic and step 3 can
      never happen in-between.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      c337374b
  3. 10 5月, 2011 5 次提交
  4. 04 5月, 2011 1 次提交
  5. 27 4月, 2011 1 次提交
  6. 15 3月, 2011 7 次提交
  7. 13 3月, 2011 1 次提交
  8. 03 3月, 2011 1 次提交
  9. 02 3月, 2011 2 次提交
  10. 29 1月, 2011 2 次提交
  11. 11 1月, 2011 2 次提交
  12. 15 11月, 2010 1 次提交
  13. 27 10月, 2010 1 次提交
  14. 24 10月, 2010 1 次提交
  15. 23 10月, 2010 2 次提交
  16. 18 10月, 2010 1 次提交
  17. 12 10月, 2010 2 次提交
  18. 29 9月, 2010 7 次提交