1. 22 Feb 2013, 2 commits
  2. 06 Feb 2013, 1 commit
    • IPoIB: Fix crash due to skb double destruct · 7e5a90c2
      Committed by Shlomo Pongratz
      After commit b13912bb ("IPoIB: Call skb_dst_drop() once skb is
      enqueued for sending"), using connected mode and running multithreaded
      iperf for a long time, i.e.
      
          iperf -c <IP> -P 16 -t 3600
      
      results in a crash.
      
      After the above-mentioned patch, the driver calls skb_orphan() and
      skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
      (and also in ipoib_ib.c::ipoib_send()).
      
      The problem with this is, as is written in a comment in both routines,
      "it's entirely possible that the completion handler will run before we
      execute anything after the post_send()."  This leads to running the
      skb cleanup routines simultaneously in two different contexts.
      
      The solution is to always perform the skb_orphan() and skb_dst_drop()
      before queueing the send work request.  If an error occurs, it is no
      different from the regular case, where dev_kfree_skb_any() runs in the
      completion path, which is assumed to come after these two routines.
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
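The ordering described in this commit can be illustrated with a small userspace model. This is only a sketch: `model_skb`, `model_dst`, and every `model_*` function are hypothetical stand-ins, not kernel APIs, and the "completion" is simulated by running it synchronously inside the post call, which is the worst case the quoted comment warns about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a route entry and an skb. */
struct model_dst { int refcnt; };

struct model_skb {
    struct model_dst *dst;                      /* reference held on the route */
    void (*destructor)(struct model_skb *skb);  /* socket accounting callback */
    int *destructor_calls;                      /* counts destructor runs */
};

static void count_destructor(struct model_skb *skb)
{
    (*skb->destructor_calls)++;
}

/* Modelled on skb_orphan(): run the destructor once and detach it. */
static void model_skb_orphan(struct model_skb *skb)
{
    if (skb->destructor) {
        skb->destructor(skb);
        skb->destructor = NULL;
    }
}

/* Modelled on skb_dst_drop(): release the route reference once. */
static void model_skb_dst_drop(struct model_skb *skb)
{
    if (skb->dst) {
        skb->dst->refcnt--;
        skb->dst = NULL;
    }
}

/* Completion path: final cleanup, then free the packet. */
static void model_completion(struct model_skb *skb)
{
    model_skb_orphan(skb);
    model_skb_dst_drop(skb);
    free(skb);
}

/* Worst case for the sender: the completion fires before post returns. */
static int model_post_send(struct model_skb *skb)
{
    model_completion(skb);
    return 0;
}

/* Fixed ordering: detach first, post last, never touch skb afterwards. */
static int model_send(struct model_skb *skb)
{
    model_skb_orphan(skb);
    model_skb_dst_drop(skb);
    return model_post_send(skb);
}

/* Allocate a packet that holds one dst reference and has a destructor. */
static struct model_skb *model_skb_new(struct model_dst *dst, int *calls)
{
    struct model_skb *skb = malloc(sizeof(*skb));
    assert(skb != NULL);
    dst->refcnt++;
    skb->dst = dst;
    skb->destructor = count_destructor;
    skb->destructor_calls = calls;
    return skb;
}
```

Because `model_send()` finishes its own cleanup before handing the packet over, the completion path finds nothing left to release, so the destructor and the reference drop each happen exactly once.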
  3. 20 Dec 2012, 1 commit
    • IPoIB: Call skb_dst_drop() once skb is enqueued for sending · b13912bb
      Committed by Roland Dreier
      Currently, IPoIB delays collecting send completions for TX packets in
      order to batch work more efficiently.  It does skb_orphan() right after
      queuing the packets so that destructors run early, to avoid problems
      like holding socket send buffers for too long (since we might not
      collect a send completion until a long time after the packet is
      actually sent).
      
      However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
      at skb_dst() to update the PMTU when it gets a too-long packet.  This
      means that the packets sitting in the TX ring with uncollected send
      completions are holding a reference on the dst.  We've seen this lead
      to pathological behavior with respect to route and neighbour GC.  The
      easy fix for this is to call skb_dst_drop() when we call skb_orphan().
      
      Also, give packets sent via connected mode (CM) the same skb_orphan()
      / skb_dst_drop() treatment that packets sent via datagram mode get.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
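To see why uncollected send completions pin the dst, consider the following userspace sketch of a TX ring with deferred reaping. All names (`tx_ring`, `ring_dst`, `ring_enqueue`, `ring_reap`, the `drop_early` switch) are assumptions made for illustration and do not correspond to the driver's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8

/* Hypothetical route entry with a plain reference counter. */
struct ring_dst { int refcnt; };

struct tx_ring {
    struct ring_dst *held[RING_SIZE]; /* dst still referenced by each slot */
    unsigned int head;                /* next slot to fill */
    unsigned int tail;                /* next slot to complete */
};

/*
 * Queue one packet.  With drop_early set, the dst reference is released
 * at enqueue time; otherwise the slot keeps it until the ring is reaped.
 * Returns 0 on success, -1 if the ring is full.
 */
static int ring_enqueue(struct tx_ring *ring, struct ring_dst *dst,
                        int drop_early)
{
    if (ring->head - ring->tail == RING_SIZE)
        return -1;

    dst->refcnt++;              /* the packet arrives holding a reference */
    if (drop_early) {
        dst->refcnt--;          /* release it before the packet sits idle */
        dst = NULL;
    }
    ring->held[ring->head % RING_SIZE] = dst;
    ring->head++;
    return 0;
}

/* Deferred completion handling: release whatever each slot still holds. */
static void ring_reap(struct tx_ring *ring)
{
    while (ring->tail != ring->head) {
        struct ring_dst *dst = ring->held[ring->tail % RING_SIZE];

        if (dst)
            dst->refcnt--;
        ring->held[ring->tail % RING_SIZE] = NULL;
        ring->tail++;
    }
}
```

Without the early drop, the reference count stays elevated for as long as packets wait in the ring, which is the window in which route and neighbour garbage collection cannot make progress.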
  4. 01 Dec 2012, 12 commits
  5. 29 Nov 2012, 1 commit
  6. 28 Nov 2012, 1 commit
    • ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref · 9474b043
      Committed by Nicholas Bellinger
      This patch converts the main srpt_handle_cmd() I/O path to use modern
      target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
      dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
      in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
      response callbacks.
      
      It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
      completion of aborted commands, and adds target_wait_for_sess_cmds() into
      srpt_release_channel_work() to allow outstanding I/O to complete during
      session shutdown.
      
      Also, update srpt_handle_tsk_mgmt() so that the remaining
      transport_init_se_cmd() call sets up ioctx->cmd with se_tmr_req.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Roland Dreier <roland@kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
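The reference-counting idea behind this conversion can be sketched in userspace as follows. This is a simplified model under stated assumptions: `model_cmd`, `model_submit`, `model_put`, and `MODEL_ACK_KREF` are hypothetical names that loosely mirror se_cmd->cmd_kref, target_submit_cmd(), target_put_sess_cmd(), and TARGET_SCF_ACK_KREF; the real target core does considerably more.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_ACK_KREF 0x1  /* hypothetical flag, modelled on TARGET_SCF_ACK_KREF */

struct model_cmd {
    int kref;       /* stands in for se_cmd->cmd_kref */
    bool released;  /* set when the last reference is dropped */
};

/* Submission: one reference for the core, one more if the fabric asks. */
static void model_submit(struct model_cmd *cmd, unsigned int flags)
{
    cmd->kref = 1;
    cmd->released = false;
    if (flags & MODEL_ACK_KREF)
        cmd->kref++;        /* extra reference owned by the fabric driver */
}

/* Drop one reference; returns true if this call released the command. */
static bool model_put(struct model_cmd *cmd)
{
    assert(cmd->kref > 0);
    if (--cmd->kref == 0) {
        cmd->released = true;
        return true;
    }
    return false;
}
```

In this model the command outlives whichever side finishes first, so a separate driver-private counter such as the old ioctx->kref becomes unnecessary.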
  7. 07 Nov 2012, 1 commit
    • target: pass sense_reason as a return value · de103c93
      Committed by Christoph Hellwig
      Pass the sense reason as an explicit return value from the I/O submission
      path instead of storing it in struct se_cmd and using negative return
      values.  This cleans up a lot of the code paths, and with the sparse
      annotations for the new sense_reason_t type allows for much better
      error checking.
      
      (nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
            sense_reason_t with Roland's MODE SELECT changes)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
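A minimal sketch of the pattern, returning a typed reason from the submission path instead of stashing it in the command structure, might look like this. The type `model_sense_reason_t`, its values, and `model_submit_io()` are hypothetical; the kernel's sense_reason_t has its own definitions and relies on sparse annotations that plain C cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason codes; the kernel's sense_reason_t values differ. */
typedef enum {
    MODEL_SENSE_OK = 0,
    MODEL_SENSE_INVALID_CDB,
    MODEL_SENSE_LBA_OUT_OF_RANGE
} model_sense_reason_t;

struct model_io {
    unsigned long long lba;   /* first logical block */
    unsigned int blocks;      /* number of blocks */
    unsigned char opcode;     /* SCSI operation code */
};

/*
 * The reason is the return value, so the caller cannot forget to look
 * at a side field, and a dedicated type keeps it apart from errno-style
 * negative integers.
 */
static model_sense_reason_t model_submit_io(const struct model_io *io,
                                            unsigned long long capacity)
{
    /* Accept only READ(10) (0x28) and WRITE(10) (0x2a) in this sketch. */
    if (io->opcode != 0x28 && io->opcode != 0x2a)
        return MODEL_SENSE_INVALID_CDB;

    /* Written to avoid overflow in lba + blocks. */
    if (io->blocks > capacity || io->lba > capacity - io->blocks)
        return MODEL_SENSE_LBA_OUT_OF_RANGE;

    return MODEL_SENSE_OK;
}
```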
  8. 04 Oct 2012, 1 commit
  9. 03 Oct 2012, 1 commit
  10. 02 Oct 2012, 1 commit
    • IB/ipoib: Add more rtnl_link_ops callbacks · 862096a8
      Committed by Or Gerlitz
      Add the rtnl_link_ops changelink and fill_info callbacks, through
      which the admin can now set and get policies such as the driver mode.
      Maintain the proprietary sysfs entries only for legacy children.
      
      For child devices, set dev->iflink to point to the parent
      device ifindex, such that user space tools can now correctly
      show the uplink relation as done for vlan, macvlan, etc
      devices. Pointed out by Patrick McHardy <kaber@trash.net>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 01 Oct 2012, 3 commits
  12. 21 Sep 2012, 1 commit
    • IB/ipoib: Add rtnl_link_ops support · 9baa0b03
      Committed by Or Gerlitz
      Add rtnl_link_ops to IPoIB, with the first usage being child device
      create/delete through them. Child devices are now either legacy ones,
      created/deleted through the ipoib sysfs entries, or RTNL ones.
      
      Adding support for RTNL children involved refactoring ipoib_vlan_add,
      which is now used by both the sysfs and the link_ops code.
      
      Also, add an ndo_uninit entry to support calling unregister_netdevice_queue
      from the rtnl dellink entry. This required removal of calls to
      ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
      since the networking core will invoke ipoib_uninit which does exactly that.
      Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 18 Sep 2012, 2 commits
  14. 13 Sep 2012, 2 commits
  15. 16 Aug 2012, 2 commits
  16. 15 Aug 2012, 2 commits
  17. 30 Jul 2012, 1 commit
    • IPoIB: Use a private hash table for path lookup in xmit path · b63b70d8
      Committed by Shlomo Pongratz
      Dave Miller <davem@davemloft.net> provided a detailed description of
      why the way IPoIB is using neighbours for its own ipoib_neigh struct
      is buggy:
      
          Any time an ipoib_neigh is changed, a sequence like the following is made:
      
          			spin_lock_irqsave(&priv->lock, flags);
          			/*
          			 * It's safe to call ipoib_put_ah() inside
          			 * priv->lock here, because we know that
          			 * path->ah will always hold one more reference,
          			 * so ipoib_put_ah() will never do more than
          			 * decrement the ref count.
          			 */
          			if (neigh->ah)
          				ipoib_put_ah(neigh->ah);
          			list_del(&neigh->list);
          			ipoib_neigh_free(dev, neigh);
          			spin_unlock_irqrestore(&priv->lock, flags);
          			ipoib_path_lookup(skb, n, dev);
      
          This doesn't work, because you're leaving a stale pointer to the freed up
          ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
          with all the locking done to protect _changes_ to *ipoib_neigh(n), and
          with the code in ipoib_neigh_free() that NULLs out the pointer.
      
          The core issue is that read side calls to *to_ipoib_neigh(n) are not
          being synchronized at all, they are performed without any locking.  So
          whether we hold the lock or not when making changes to *ipoib_neigh(n)
          you still can have threads see references to freed up ipoib_neigh
          objects.
      
          	cpu 1			cpu 2
          	n = *ipoib_neigh()
          				*ipoib_neigh() = NULL
          				kfree(n)
          	n->foo == OOPS
      
          [..]
      
          Perhaps the ipoib code can have a private path database it manages
          entirely itself, which holds all the necessary information and is
          looked up by some generic key which is available easily at transmit
          time and does not involve generic neighbour entries.
      
      See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
      <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
      for the full discussion.
      
      This patch aims to solve the race conditions found in the IPoIB driver.
      
      The patch removes the connection between the core networking neighbour
      structure and the ipoib_neigh structure.  In addition to avoiding the
      race described above, it allows us to handle SKBs carrying IP packets
      that don't have any associated neighbour.
      
      We add an ipoib_neigh hash table with N buckets where the key is the
      destination hardware address.  The ipoib_neigh is fetched from the
      hash table instead of from the stashed location in the neighbour
      structure. The hash table uses both RCU and reference counting to
      guarantee that no ipoib_neigh instance is ever deleted while in use.
      
      Fetching the ipoib_neigh structure instance from the hash also makes
      the special code in ipoib_start_xmit that handles remote and local
      bonding failover redundant.
      
      Aged ipoib_neigh instances are deleted by a garbage collection task
      that runs every M seconds and deletes every ipoib_neigh instance that
      was idle for at least 2*M seconds. The deletion is safe since the
      ipoib_neigh instances are protected using RCU and reference count
      mechanisms.
      
      The number of buckets (N) and the interval between GC runs (M)
      are taken from the exported arp_tbl.
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
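The hash table, reference counting, and ageing rule described in this commit can be sketched in single-threaded userspace C. This is a simplified model under explicit assumptions: it omits RCU and all locking, uses a toy hash function, and every `model_*` name and the `NBUCKETS` value are hypothetical. The 20-byte key length matches the size of an IPoIB hardware address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HWADDR_LEN 20   /* IPoIB hardware addresses are 20 bytes */
#define NBUCKETS   16   /* stands in for N; a power of two in this sketch */

struct model_neigh {
    unsigned char daddr[HWADDR_LEN]; /* key: destination hardware address */
    unsigned long last_used;         /* tick of the most recent lookup */
    int refcnt;                      /* 1 for the table + 1 per active user */
    struct model_neigh *next;        /* bucket chain */
};

struct model_table { struct model_neigh *buckets[NBUCKETS]; };

/* Toy hash over the address bytes (not the driver's hash function). */
static unsigned int model_hash(const unsigned char *daddr)
{
    unsigned int h = 0;
    int i;

    for (i = 0; i < HWADDR_LEN; i++)
        h = h * 31 + daddr[i];
    return h & (NBUCKETS - 1);
}

/* Drop one reference; the entry is freed only when nobody uses it. */
static void model_neigh_put(struct model_neigh *n)
{
    if (--n->refcnt == 0)
        free(n);
}

/* Look up or create an entry; the caller must model_neigh_put() it. */
static struct model_neigh *model_neigh_get(struct model_table *t,
                                           const unsigned char *daddr,
                                           unsigned long now)
{
    unsigned int b = model_hash(daddr);
    struct model_neigh *n;

    for (n = t->buckets[b]; n; n = n->next)
        if (memcmp(n->daddr, daddr, HWADDR_LEN) == 0)
            break;
    if (!n) {
        n = calloc(1, sizeof(*n));
        if (!n)
            return NULL;
        memcpy(n->daddr, daddr, HWADDR_LEN);
        n->refcnt = 1;               /* the table's own reference */
        n->next = t->buckets[b];
        t->buckets[b] = n;
    }
    n->last_used = now;
    n->refcnt++;                     /* the caller's reference */
    return n;
}

/* GC pass: unlink entries idle for at least 2*M ticks; returns the count. */
static int model_gc(struct model_table *t, unsigned long now, unsigned long m)
{
    int removed = 0;
    int b;

    for (b = 0; b < NBUCKETS; b++) {
        struct model_neigh **pp = &t->buckets[b];

        while (*pp) {
            struct model_neigh *n = *pp;

            if (now - n->last_used >= 2 * m) {
                *pp = n->next;       /* unlink from the table */
                model_neigh_put(n);  /* drop the table's reference */
                removed++;
            } else {
                pp = &n->next;
            }
        }
    }
    return removed;
}
```

An entry removed by the GC pass stays valid for any caller that still holds a reference, which is the property the commit relies on to make deletion safe.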
  18. 17 Jul 2012, 2 commits
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      Committed by David S. Miller
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_comp · e672a47f
      Committed by Christoph Hellwig
      srpt_handle_rdma_comp is called from kthread context and thus can execute
      target_execute_cmd directly.  srpt_abort_cmd sets the CMD_T_LUN_STOP
      flag directly, and thus the abuse of transport_generic_handle_data can be
      replaced with an opencoded variant of that code path.  I'm still not happy
      about a fabric driver poking into target core internals like this, but
      let's defer the bigger architecture changes for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
  19. 11 Jul 2012, 1 commit
    • IPoIB: fix skb truesize underestimation · b28ba726
      Committed by Eric Dumazet
      Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
      skb_try_coalesce().
      This warning tracks drivers that incorrectly set skb->truesize.
      
      IPoIB indeed allocates a full page to store a fragment, but only
      accounts for the used part of the page (the frame length) in
      skb->truesize.
      
      This patch fixes the skb truesize underestimation, and
      also fixes a performance issue: RX skbs do not have enough tailroom
      to allow the IP and TCP stacks to pull their headers into the skb linear
      part without an expensive call to pskb_expand_head().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Cc: Shlomo Pongratz <shlomop@mellanox.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
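The accounting difference can be shown with a short userspace sketch. The structure and function names are hypothetical, and the 4096-byte page size is an assumption; the point is only that truesize should reflect the memory actually allocated for a fragment rather than the bytes used within it.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this sketch */

struct model_rx_skb {
    unsigned int len;       /* bytes of payload */
    unsigned int truesize;  /* bytes of memory charged for this packet */
};

/* Underestimating variant: charge only the bytes used in the page. */
static void model_add_frag_by_len(struct model_rx_skb *skb,
                                  unsigned int frag_len)
{
    skb->len += frag_len;
    skb->truesize += frag_len;
}

/* Corrected variant: a full page was allocated, so charge a full page. */
static void model_add_frag_by_page(struct model_rx_skb *skb,
                                   unsigned int frag_len)
{
    assert(frag_len <= MODEL_PAGE_SIZE);
    skb->len += frag_len;
    skb->truesize += MODEL_PAGE_SIZE;
}
```

For a 100-byte frame, the first variant charges 100 bytes while the page it occupies costs 4096, which is the kind of mismatch the warning in skb_try_coalesce() is meant to catch.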
  20. 09 Jul 2012, 1 commit
  21. 06 Jul 2012, 1 commit