1. 14 Nov, 2006 1 commit
    • IB/mad: Fix race between cancel and receive completion · 39798695
      Authored by Roland Dreier
      When ib_cancel_mad() is called, it puts the canceled send on a list
      and schedules a "flushed" callback from process context.  However,
      this leaves a window where a receive completion could be processed
      before the send is fully flushed.
      
      This is fine, except that ib_find_send_mad() will find the MAD and
      return it to the receive processing, which results in the sender
      getting both a successful receive and a "flushed" send completion for
      the same request.  Understandably, this confuses the sender, which is
      expecting only one of these two callbacks, and leads to grief such as
      a use-after-free in IPoIB.
      
      Fix this by changing ib_find_send_mad() to return a send struct only
      if the status is still successful (and not "flushed").  The search of
      the send_list already had this check, so this patch just adds the same
      check to the search of the wait_list.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
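The fix described above can be illustrated with a small user-space sketch. This is not the kernel code: `struct send_wr`, `enum wr_status`, `find_in_list()`, `find_send()` and `cancel_send()` are hypothetical names chosen for illustration. The only idea taken from the commit message is that both list searches skip entries whose status is no longer successful.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MAD layer's send bookkeeping. */
enum wr_status { WR_SUCCESS, WR_FLUSH_ERR };

struct send_wr {
    unsigned long long tid;   /* transaction ID used to match a response */
    enum wr_status status;    /* becomes WR_FLUSH_ERR once canceled */
    struct send_wr *next;
};

/* Mark a send as canceled; the "flushed" callback would run later. */
static void cancel_send(struct send_wr *wr)
{
    assert(wr != NULL);
    wr->status = WR_FLUSH_ERR;
}

/* Return the first entry with a matching tid that has not been canceled. */
static struct send_wr *find_in_list(struct send_wr *head,
                                    unsigned long long tid)
{
    for (struct send_wr *wr = head; wr != NULL; wr = wr->next) {
        if (wr->tid == tid && wr->status == WR_SUCCESS)
            return wr;
    }
    return NULL;
}

/* Search the wait list, then the send list, with the same status check. */
static struct send_wr *find_send(struct send_wr *wait_list,
                                 struct send_wr *send_list,
                                 unsigned long long tid)
{
    struct send_wr *wr = find_in_list(wait_list, tid);
    return wr != NULL ? wr : find_in_list(send_list, tid);
}
```

With this check in place, a receive that arrives after `cancel_send()` finds no matching send, so the sender sees only the "flushed" completion.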
  2. 27 Sep, 2006 1 commit
  3. 23 Sep, 2006 2 commits
  4. 25 Jul, 2006 1 commit
  5. 27 Jun, 2006 1 commit
  6. 18 Jun, 2006 2 commits
  7. 13 May, 2006 1 commit
    • IB: refcount race fixes · 1b52fa98
      Authored by Sean Hefty
      Fix race condition during destruction calls to avoid possibility of
      accessing object after it has been freed.  Instead of waking up a wait
      queue directly, which is susceptible to a race where the object is
      freed between the reference count going to 0 and the wake_up(), use a
      completion to wait in the function doing the freeing.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
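The completion-based teardown described above might look roughly like the following user-space sketch. It is an analogy, not the kernel implementation: `struct completion` here is a hand-rolled stand-in built on a pthread mutex and condition variable, and `struct agent` with its `agent_*()` helpers are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* User-space stand-in for a kernel completion (assumption). */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical reference-counted object. */
struct agent {
    atomic_int refcount;
    struct completion comp;
};

static struct agent *agent_create(void)
{
    struct agent *a = malloc(sizeof(*a));
    if (a == NULL)
        return NULL;
    atomic_init(&a->refcount, 1);   /* reference owned by the creator */
    init_completion(&a->comp);
    return a;
}

static void agent_get(struct agent *a)
{
    atomic_fetch_add(&a->refcount, 1);
}

/* Dropping the last reference only signals; it never frees. */
static void agent_put(struct agent *a)
{
    assert(atomic_load(&a->refcount) > 0);
    if (atomic_fetch_sub(&a->refcount, 1) == 1)
        complete(&a->comp);
}

/* The destroying function waits for the count to reach 0, then frees. */
static void agent_destroy(struct agent *a)
{
    agent_put(a);                   /* drop the creator's reference */
    wait_for_completion(&a->comp);
    pthread_cond_destroy(&a->comp.cond);
    pthread_mutex_destroy(&a->comp.lock);
    free(a);
}
```

Because only `agent_destroy()` frees the object, and it cannot return from `wait_for_completion()` until `complete()` has released the lock, the last `agent_put()` never touches freed memory.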
  8. 20 Apr, 2006 1 commit
  9. 03 Apr, 2006 1 commit
    • IB/mad: fix oops in cancel_mads · 37289efe
      Authored by Michael S. Tsirkin
      We have seen the following OOPs in cancel_mads, when restarting opensm
      multiple times:
      
          Call Trace:
            [<c010549b>] show_stack+0x9b/0xb0
            [<c01055ec>] show_registers+0x11c/0x190
            [<c01057cd>] die+0xed/0x160
            [<c031b966>] do_page_fault+0x3f6/0x5d0
            [<c010511f>] error_code+0x4f/0x60
            [<f8ac4e38>] cancel_mads+0x128/0x150 [ib_mad]
            [<f8ac2811>] unregister_mad_agent+0x11/0x130 [ib_mad]
            [<f8ac2a12>] ib_unregister_mad_agent+0x12/0x20 [ib_mad]
            [<f8b10f23>] ib_umad_close+0xf3/0x130 [ib_umad]
            [<c0162937>] __fput+0x187/0x1c0
            [<c01627a9>] fput+0x19/0x20
            [<c0160f7a>] filp_close+0x3a/0x60
            [<c0121ca8>] put_files_struct+0x68/0xa0
            [<c0103cf7>] do_signal+0x47/0x100
            [<c0103ded>] do_notify_resume+0x3d/0x40
            [<c0103f9e>] work_notifysig+0x13/0x25
      
      We traced this back to local_completions unlocking mad_agent_priv->lock
      while still keeping a pointer into local_list. A later call to
      list_del(&local->completion_list) would then corrupt the list.
      
      To fix this, remove the entry from local_list after looking it up but
      before releasing mad_agent_priv->lock, to prevent cancel_mads from
      finding and freeing it.
      Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
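The pattern behind this fix, detaching an entry from a shared list before dropping the lock, can be sketched in user-space C as follows. The types and helpers (`struct agent`, `struct local_entry`, `push()`, `take_first()`, `cancel_all()`) are hypothetical, and a pthread mutex stands in for the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical list entry and owner; not the kernel structures. */
struct local_entry {
    int id;
    struct local_entry *next;
};

struct agent {
    pthread_mutex_t lock;
    struct local_entry *local_list;
};

static void agent_init(struct agent *a)
{
    pthread_mutex_init(&a->lock, NULL);
    a->local_list = NULL;
}

/* Add an entry at the head of the list. */
static void push(struct agent *a, struct local_entry *e)
{
    assert(e != NULL);
    pthread_mutex_lock(&a->lock);
    e->next = a->local_list;
    a->local_list = e;
    pthread_mutex_unlock(&a->lock);
}

/*
 * Completion path: look up the first entry AND unlink it while the lock
 * is still held, so nobody else can find it once the lock is released.
 */
static struct local_entry *take_first(struct agent *a)
{
    pthread_mutex_lock(&a->lock);
    struct local_entry *e = a->local_list;
    if (e != NULL) {
        a->local_list = e->next;
        e->next = NULL;
    }
    pthread_mutex_unlock(&a->lock);
    return e;   /* caller now owns the entry exclusively */
}

/* Cancel path: detach whatever is still on the list; return the count. */
static int cancel_all(struct agent *a)
{
    int n = 0;
    pthread_mutex_lock(&a->lock);
    while (a->local_list != NULL) {
        struct local_entry *e = a->local_list;
        a->local_list = e->next;
        e->next = NULL;
        n++;
    }
    pthread_mutex_unlock(&a->lock);
    return n;
}
```

An entry returned by `take_first()` is no longer reachable from `cancel_all()`, so the two paths cannot both process or free the same entry.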
  10. 30 Mar, 2006 2 commits
  11. 21 Mar, 2006 3 commits
    • IB/mad: Fix oopsable race on device removal · dc05980d
      Authored by Michael S. Tsirkin
      Fix an oopsable race debugged by Eli Cohen <eli@mellanox.co.il>:
      After removing the port from port_list, ib_mad_port_close flushes
      port_priv->wq before destroying the special QPs. This means that a
      completion event could arrive and queue new work on this workqueue
      after the flush.
      
      This patch also removes an unnecessary flush_workqueue():
      destroy_workqueue() already includes a flush.
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IB/umad: Add support for large RMPP transfers · f36e1793
      Authored by Jack Morgenstein
      Add support for sending and receiving large RMPP transfers.  The old
      code supports transfers only as large as a single contiguous kernel
      memory allocation.  This patch uses a linked list of memory buffers
      when sending and receiving data to avoid needing contiguous pages for
      larger transfers.
      
        Receive side: copy the arriving MADs in chunks instead of coalescing
        to one large buffer in kernel space.
      
        Send side: split a multipacket MAD buffer into a list of segments
        (multipacket_list) and send these using a gather list of size 2.
        Also, save pointer to last sent segment, and retrieve requested
        segments by walking list starting at last sent segment. Finally,
        save pointer to last-acked segment.  When retrying, retrieve
        segments for resending relative to this pointer.  When updating last
        ack, start at this pointer.
      Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il>
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
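The send-side bookkeeping described above (a segment list plus cached pointers to the last sent and last acked segments) could be sketched as below. This is a simplified user-space illustration with hypothetical names (`struct segment`, `struct send_buf`, `walk_to()`, `get_segment()`, `ack_segment()`, `restart_from_ack()`). It assumes a doubly linked list with ascending segment numbers and leaves out the payload and the gather list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical segment node; payload omitted for brevity. */
struct segment {
    int num;                        /* 1-based segment number */
    struct segment *prev, *next;
};

struct send_buf {
    struct segment *cur_seg;        /* last segment handed out for sending */
    struct segment *last_ack_seg;   /* last segment acknowledged by the peer */
};

/* Link an array of n segments into a list numbered 1..n. */
static void link_segments(struct segment *segs, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        segs[i].num = i + 1;
        segs[i].prev = (i > 0) ? &segs[i - 1] : NULL;
        segs[i].next = (i < n - 1) ? &segs[i + 1] : NULL;
    }
}

/* Walk from a cached position instead of rescanning from the list head. */
static struct segment *walk_to(struct segment *from, int num)
{
    struct segment *s = from;
    while (s != NULL && s->num < num)
        s = s->next;
    while (s != NULL && s->num > num)
        s = s->prev;
    return (s != NULL && s->num == num) ? s : NULL;
}

/* Fetch a segment for sending and remember it as the new cursor. */
static struct segment *get_segment(struct send_buf *buf, int num)
{
    struct segment *s = walk_to(buf->cur_seg, num);
    if (s != NULL)
        buf->cur_seg = s;
    return s;
}

/* Record an ACK, starting the walk at the previous ACK position. */
static struct segment *ack_segment(struct send_buf *buf, int num)
{
    struct segment *s = walk_to(buf->last_ack_seg, num);
    if (s != NULL)
        buf->last_ack_seg = s;
    return s;
}

/* On retry, resume sending relative to the last acked segment. */
static void restart_from_ack(struct send_buf *buf)
{
    buf->cur_seg = buf->last_ack_seg;
}
```

Sequential sends and ACKs then cost a step or two per lookup, since each walk starts next to the segment that was touched last.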
    • IB/mad: Simplify SMI by eliminating smi_check_local_dr_smp() · 5e9f71a1
      Authored by Ralph Campbell
      The call to ib_get_agent_port() should not be able to fail when
      smi_check_local_dr_smp() is called from ib_mad_recv_done_handler().
      When it is called from handle_outgoing_dr_smp(), the device and
      port_num come from mad_agent_priv so I assume the call to
      ib_get_agent_port() shouldn't fail either.  In either case,
      smi_check_local_smp() only uses the mad_agent pointer to check that
      mad_agent->device->process_mad is not NULL.  The device pointer would
      have to be the same as the one passed to smi_check_local_dr_smp()
      since that pointer is used later instead of the one checked in
      smi_check_local_smp().
      Signed-off-by: Hal Rosenstock <halr@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  12. 04 Feb, 2006 1 commit
  13. 28 Nov, 2005 1 commit
  14. 07 Nov, 2005 1 commit
  15. 02 Nov, 2005 1 commit
  16. 26 Oct, 2005 1 commit
    • [IB] Fix MAD layer DMA mappings to avoid touching data buffer once mapped · 34816ad9
      Authored by Sean Hefty
      The MAD layer was violating the DMA API by touching data buffers used
      for sends after the DMA mapping was done.  This causes problems on
      non-cache-coherent architectures, because the device doing DMA won't
      see updates to the payload buffers that exist only in the CPU cache.
      
      Fix this by having all MAD consumers use ib_create_send_mad() to
      allocate their send buffers, and moving the DMA mapping into the MAD
      layer so it can be done just before calling send (and after any
      modifications of the send buffer by the MAD layer).
      
      Tested on a non-cache-coherent PowerPC 440SPe system.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  17. 18 Oct, 2005 1 commit
  18. 09 Oct, 2005 1 commit
  19. 27 Aug, 2005 2 commits
  20. 28 Jul, 2005 15 commits