1. 04 Oct, 2012 1 commit
  2. 02 Oct, 2012 1 commit
    • O
      IB/ipoib: Add more rtnl_link_ops callbacks · 862096a8
      Or Gerlitz committed
      Add the rtnl_link_ops changelink and fill_info callbacks, through
      which the admin can now set/get the driver mode and similar policies.
      Maintain the proprietary sysfs entries only for legacy children.
      
      For child devices, set dev->iflink to point to the parent
      device ifindex, such that user space tools can now correctly
      show the uplink relation as done for vlan, macvlan, etc
      devices. Pointed out by Patrick McHardy <kaber@trash.net>.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      862096a8
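The callback arrangement described in this commit can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct model_dev`, `struct model_link_ops`, the two mode values, and the choice that a new child starts in datagram mode are all hypothetical stand-ins, not the kernel's `rtnl_link_ops` or `net_device` definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these types are the kernel's. */
enum model_mode { MODEL_DATAGRAM = 0, MODEL_CONNECTED = 1 };

struct model_dev {
    int ifindex;            /* this device's own index */
    int iflink;             /* index of the uplink (parent) device */
    enum model_mode mode;   /* the driver mode policy */
};

/* A table of callbacks, in the spirit of changelink and fill_info. */
struct model_link_ops {
    int (*changelink)(struct model_dev *dev, enum model_mode mode);
    int (*fill_info)(const struct model_dev *dev, enum model_mode *out);
};

/* "set": validate the requested mode and apply it. */
static int model_changelink(struct model_dev *dev, enum model_mode mode)
{
    if (mode != MODEL_DATAGRAM && mode != MODEL_CONNECTED)
        return -1;
    dev->mode = mode;
    return 0;
}

/* "get": report the current mode back to the caller. */
static int model_fill_info(const struct model_dev *dev, enum model_mode *out)
{
    *out = dev->mode;
    return 0;
}

static const struct model_link_ops model_ops = {
    .changelink = model_changelink,
    .fill_info  = model_fill_info,
};

/* A child records its parent's ifindex in iflink, so tools can show the uplink. */
static void model_init_child(struct model_dev *child,
                             const struct model_dev *parent, int ifindex)
{
    assert(child != NULL && parent != NULL);
    child->ifindex = ifindex;
    child->iflink = parent->ifindex;
    child->mode = MODEL_DATAGRAM;   /* assumption: default mode for a new child */
}
```

In this model, a tool that reads `iflink` on a child gets the parent's index, which is the relation the commit says is now shown for IPoIB children in the same way as for vlan and macvlan devices.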
  3. 01 Oct, 2012 3 commits
  4. 21 Sep, 2012 1 commit
    • O
      IB/ipoib: Add rtnl_link_ops support · 9baa0b03
      Or Gerlitz committed
      Add rtnl_link_ops to IPoIB, with the first usage being child device
      create/delete through them. Child devices are now either legacy ones,
      created/deleted through the ipoib sysfs entries, or RTNL ones.
      
      Adding support for RTNL children involved refactoring ipoib_vlan_add,
      which is now used by both the sysfs and the link_ops code.
      
      Also, add an ndo_uninit entry to support calling unregister_netdevice_queue
      from the rtnl dellink entry. This required removal of calls to
      ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
      since the networking core will invoke ipoib_uninit which does exactly that.
      Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9baa0b03
  5. 13 Sep, 2012 2 commits
  6. 16 Aug, 2012 2 commits
  7. 15 Aug, 2012 2 commits
  8. 30 Jul, 2012 1 commit
    • S
      IPoIB: Use a private hash table for path lookup in xmit path · b63b70d8
      Shlomo Pongratz committed
      Dave Miller <davem@davemloft.net> provided a detailed description of
      why the way IPoIB is using neighbours for its own ipoib_neigh struct
      is buggy:
      
          Any time an ipoib_neigh is changed, a sequence like the following is made:
      
          			spin_lock_irqsave(&priv->lock, flags);
          			/*
          			 * It's safe to call ipoib_put_ah() inside
          			 * priv->lock here, because we know that
          			 * path->ah will always hold one more reference,
          			 * so ipoib_put_ah() will never do more than
          			 * decrement the ref count.
          			 */
          			if (neigh->ah)
          				ipoib_put_ah(neigh->ah);
          			list_del(&neigh->list);
          			ipoib_neigh_free(dev, neigh);
          			spin_unlock_irqrestore(&priv->lock, flags);
          			ipoib_path_lookup(skb, n, dev);
      
          This doesn't work, because you're leaving a stale pointer to the freed up
          ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
          with all the locking done to protect _changes_ to *ipoib_neigh(n), and
          with the code in ipoib_neigh_free() that NULLs out the pointer.
      
          The core issue is that read side calls to *to_ipoib_neigh(n) are not
          being synchronized at all, they are performed without any locking.  So
          whether we hold the lock or not when making changes to *ipoib_neigh(n)
          you still can have threads see references to freed up ipoib_neigh
          objects.
      
          	cpu 1			cpu 2
          	n = *ipoib_neigh()
          				*ipoib_neigh() = NULL
          				kfree(n)
          	n->foo == OOPS
      
          [..]
      
          Perhaps the ipoib code can have a private path database it manages
          entirely itself, which holds all the necessary information and is
          looked up by some generic key which is available easily at transmit
          time and does not involve generic neighbour entries.
      
      See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
      <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
      for the full discussion.
      
      This patch aims to solve the race conditions found in the IPoIB driver.
      
      The patch removes the connection between the core networking neighbour
      structure and the ipoib_neigh structure.  In addition to avoiding the
      race described above, it allows us to handle SKBs carrying IP packets
      that don't have any associated neighbour.
      
      We add an ipoib_neigh hash table with N buckets where the key is the
      destination hardware address.  The ipoib_neigh is fetched from the
      hash table instead of the stashed location in the neighbour
      structure. The hash table uses both RCU and reference counting to
      guarantee that no ipoib_neigh instance is ever deleted while in use.
      
      Fetching the ipoib_neigh structure instance from the hash also makes
      the special code in ipoib_start_xmit that handles remote and local
      bonding failover redundant.
      
      Aged ipoib_neigh instances are deleted by a garbage collection task
      that runs every M seconds and deletes every ipoib_neigh instance that
      was idle for at least 2*M seconds. The deletion is safe since the
      ipoib_neigh instances are protected using RCU and reference count
      mechanisms.
      
      The number of buckets (N) and the frequency of running the GC thread (M)
      are taken from the exported arp_tbl.
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      b63b70d8
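The private table described in this commit (keyed by destination hardware address, reference counted, and aged out by a periodic GC pass) can be sketched in plain userspace C. This is a single-threaded model under stated assumptions: RCU is not modelled at all, the hash function and bucket count are arbitrary, and every name (`struct model_neigh`, `model_neigh_get`, `model_neigh_gc`, and so on) is hypothetical rather than taken from the driver. The 20-byte key matches the IPoIB hardware address length.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_HW_ALEN   20  /* IPoIB hardware address length in bytes */
#define MODEL_N_BUCKETS 16  /* "N" in the text; this value is arbitrary */

struct model_neigh {
    unsigned char haddr[MODEL_HW_ALEN];
    int refcnt;               /* one for the table plus one per active user */
    unsigned long alive;      /* last-used time, in seconds */
    struct model_neigh *next; /* bucket chain */
};

struct model_table {
    struct model_neigh *buckets[MODEL_N_BUCKETS];
};

static unsigned int model_hash(const unsigned char *haddr)
{
    unsigned int h = 0;
    for (int i = 0; i < MODEL_HW_ALEN; i++)
        h = h * 31u + haddr[i];
    return h % MODEL_N_BUCKETS;
}

/* Drop one reference; the entry is freed only when the last one goes. */
static void model_neigh_put(struct model_neigh *e)
{
    assert(e->refcnt > 0);
    if (--e->refcnt == 0)
        free(e);
}

/* Look up by hardware address, creating on a miss; returns with a reference held. */
static struct model_neigh *model_neigh_get(struct model_table *t,
                                           const unsigned char *haddr,
                                           unsigned long now)
{
    unsigned int b = model_hash(haddr);
    struct model_neigh *e;

    for (e = t->buckets[b]; e; e = e->next)
        if (memcmp(e->haddr, haddr, MODEL_HW_ALEN) == 0)
            break;
    if (!e) {
        e = calloc(1, sizeof(*e));
        if (!e)
            return NULL;
        memcpy(e->haddr, haddr, MODEL_HW_ALEN);
        e->refcnt = 1;              /* the table's own reference */
        e->next = t->buckets[b];
        t->buckets[b] = e;
    }
    e->refcnt++;                    /* the caller's reference */
    e->alive = now;
    return e;
}

/* GC pass: unlink entries idle for at least 2*m seconds; returns how many. */
static int model_neigh_gc(struct model_table *t, unsigned long now,
                          unsigned long m)
{
    int removed = 0;

    for (int b = 0; b < MODEL_N_BUCKETS; b++) {
        struct model_neigh **pp = &t->buckets[b];
        while (*pp) {
            struct model_neigh *e = *pp;
            if (now - e->alive >= 2 * m) {
                *pp = e->next;      /* unlink from the table */
                model_neigh_put(e); /* drop the table's reference */
                removed++;
            } else {
                pp = &e->next;
            }
        }
    }
    return removed;
}
```

The point of the reference count in this model is that a GC pass can unlink an entry while a user still holds it: the memory is released only on that user's final `model_neigh_put()`, which is the "never deleted while in use" property the commit message describes.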
  9. 17 Jul, 2012 2 commits
    • D
      net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller committed
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6700c270
    • C
      srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_comp · e672a47f
      Christoph Hellwig committed
      srpt_handle_rdma_comp is called from kthread context and thus can execute
      target_execute_cmd directly.  srpt_abort_cmd sets the CMD_T_LUN_STOP
      flag directly, and thus the abuse of transport_generic_handle_data can be
      replaced with an open-coded variant of that code path.  I'm still not happy
      about a fabric driver poking into target core internals like this, but
      let's defer the bigger architecture changes for now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      e672a47f
  10. 11 Jul, 2012 1 commit
    • E
      IPoIB: fix skb truesize underestimation · b28ba726
      Eric Dumazet committed
      Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
      skb_try_coalesce().
      This warning tracks drivers that incorrectly set skb->truesize.
      
      IPoIB indeed allocates a full page to store a fragment, but only
      accounts in skb->truesize the used part of the page (frame length).
      
      This patch fixes the skb truesize underestimation, and
      also fixes a performance issue: RX skbs do not have enough tailroom
      to allow the IP and TCP stacks to pull their headers into the skb linear part
      without an expensive call to pskb_expand_head().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Erez Shitrit <erezsh@mellanox.com>
      Cc: Shlomo Pongratz <shlomop@mellanox.com>
      Cc: Roland Dreier <roland@purestorage.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b28ba726
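The accounting problem in this commit comes down to simple arithmetic, which the following sketch illustrates. It is a userspace model, not driver code: `struct model_rx_buf` and both helper functions are hypothetical, a 4 KiB page size is assumed, and the tailroom part of the fix is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two skb fields that matter here. */
struct model_rx_buf {
    size_t len;        /* payload bytes carried */
    size_t truesize;   /* memory charged for the buffer */
};

/* Underestimating variant: charges only the bytes actually used. */
static void model_add_frag_underestimated(struct model_rx_buf *b,
                                          size_t frame_len)
{
    assert(b != NULL && frame_len <= MODEL_PAGE_SIZE);
    b->len += frame_len;
    b->truesize += frame_len;
}

/* Corrected variant: a whole page was allocated, so charge a whole page. */
static void model_add_frag_full_page(struct model_rx_buf *b, size_t frame_len)
{
    assert(b != NULL && frame_len <= MODEL_PAGE_SIZE);
    b->len += frame_len;
    b->truesize += MODEL_PAGE_SIZE;
}
```

For a 1500-byte frame, the first variant charges 1500 bytes while the buffer actually pins 4096, which is the kind of mismatch the warning in skb_try_coalesce() is meant to catch.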
  11. 09 Jul, 2012 1 commit
  12. 06 Jul, 2012 1 commit
  13. 05 Jul, 2012 1 commit
  14. 19 May, 2012 1 commit
  15. 15 Apr, 2012 2 commits
  16. 12 Apr, 2012 1 commit
    • R
      IB/srpt: Set srq_type to IB_SRQT_BASIC · 6f360336
      Roland Dreier committed
      Since commit 96104eda ("RDMA/core: Add SRQ type field"), kernel
      users of SRQs need to specify srq_type = IB_SRQT_BASIC in struct
      ib_srq_init_attr, or else most low-level drivers will fail
      when srpt_add_one() calls ib_create_srq() and gets -ENOSYS.
      
      (mlx4_ib works OK nearly all of the time, because it just needs
      srq_type != IB_SRQT_XRC.  And apparently nearly everyone using
      ib_srpt is using mlx4 hardware.)
      Reported-by: Alexey Shvetsov <alexxy@gentoo.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      6f360336
  17. 20 Mar, 2012 1 commit
  18. 18 Mar, 2012 1 commit
    • N
      ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception · 187e70a5
      Nicholas Bellinger committed
      This patch addresses a bug in srpt_handle_cmd() failure handling where
      send_ioctx->kref is being leaked with the local extra reference after init,
      causing the expected kref_put() in srpt_handle_send_comp() to not be the final
      call to invoke srpt_put_send_ioctx_kref() -> transport_generic_free_cmd() and
      perform se_cmd descriptor memory release.
      
      It also fixes a SCF_SCSI_RESERVATION_CONFLICT handling bug where this code
      is incorrectly falling through to transport_handle_cdb_direct() after
      invoking srpt_queue_status() to send SAM_STAT_RESERVATION_CONFLICT status.
      
      Note this patch is for >= v3.3 mainline code, and current lio-core.git
      code has already been converted to target_submit_cmd() + se_cmd->cmd_kref usage,
      and internal ioctx->kref usage has been removed.  I'm including this patch
      now into target-pending/for-next with a Cc for v3.3 stable.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      187e70a5
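The reference leak this commit describes follows a common pattern: an extra local reference is taken after init, and an error path returns without dropping it, so the later put from the completion handler is no longer the final one. A minimal userspace sketch of that pattern follows. All names here (`struct model_ioctx`, `model_handle_cmd`, the `fixed` switch) are hypothetical and exist only to contrast the two behaviours; this is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context object with a plain reference count. */
struct model_ioctx {
    int refcnt;
    bool released;   /* set when the final put runs the release step */
};

static void model_ioctx_init(struct model_ioctx *c)
{
    c->refcnt = 1;   /* reference owned by the send-completion path */
    c->released = false;
}

static void model_ioctx_get(struct model_ioctx *c)
{
    assert(c != NULL && c->refcnt > 0);
    c->refcnt++;
}

static void model_ioctx_put(struct model_ioctx *c)
{
    assert(c != NULL && c->refcnt > 0);
    if (--c->refcnt == 0)
        c->released = true;   /* stands in for freeing the descriptor */
}

/*
 * Command handling takes an extra local reference. With fixed == false the
 * failure path returns without dropping it (the leak); with fixed == true
 * the failure path drops it before returning.
 */
static int model_handle_cmd(struct model_ioctx *c, bool fail, bool fixed)
{
    model_ioctx_get(c);
    if (fail) {
        if (fixed)
            model_ioctx_put(c);
        return -1;
    }
    /* ... normal submission would happen here ... */
    model_ioctx_put(c);
    return 0;
}
```

With the leaky variant, the single put issued later from the completion path leaves the count at one and the release step never runs; with the fixed variant that same put is the final one.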
  19. 11 Mar, 2012 1 commit
  20. 09 Mar, 2012 1 commit
    • O
      IB: Change CQE "csum_ok" field to a bit flag · d927d505
      Or Gerlitz committed
      Use a bit in wc_flags rather than a whole integer to hold the
      "checksum OK" flag.  By itself, this change doesn't reduce the size of
      struct ib_wc on 64bit machines -- it stays at 56 bytes because of
      padding.  However, it will allow adding more fields in the future
      without enlarging the struct.  Also, it will let us have a unified
      approach with future libibverbs checksum offload reporting, because a
      bit flag doesn't break the library ABI.
      
      This patch was suggested during conversation with Liran Liss
      <liranl@mellanox.com>.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      d927d505
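Replacing a dedicated integer field with one bit in an existing flags word can be sketched as follows. The struct, the flag names, and the bit positions below are hypothetical and chosen only for illustration; the real values are defined by the verbs headers, and no claim is made here about the real struct layout or size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real enum lives in the verbs headers. */
enum model_wc_flags {
    MODEL_WC_GRH        = 1 << 0,
    MODEL_WC_WITH_IMM   = 1 << 1,
    MODEL_WC_IP_CSUM_OK = 1 << 3,
};

/* Hypothetical, heavily trimmed work-completion record. */
struct model_wc {
    uint32_t byte_len;
    int wc_flags;      /* the "checksum OK" state is one bit in here */
};

/* Record the checksum result without disturbing the other flag bits. */
static void model_wc_set_csum_ok(struct model_wc *wc, bool ok)
{
    assert(wc != NULL);
    if (ok)
        wc->wc_flags |= MODEL_WC_IP_CSUM_OK;
    else
        wc->wc_flags &= ~MODEL_WC_IP_CSUM_OK;
}

/* Consumers test the bit instead of reading a separate integer field. */
static bool model_wc_csum_ok(const struct model_wc *wc)
{
    assert(wc != NULL);
    return (wc->wc_flags & MODEL_WC_IP_CSUM_OK) != 0;
}
```

Because new states are added as further bits in the same word, the record does not have to grow when another flag is introduced, which is the motivation the commit message gives.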
  21. 06 Mar, 2012 1 commit
    • O
      IB/iser: Post initial receive buffers before sending the final login request · 89e984e2
      Or Gerlitz committed
      An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
      iSCSI session as fully operative.  This means that there is a window
      where there are no posted receive buffers on the initiator side, so
      it's possible for the iSER RC connection to break because of RNR NAK /
      retry errors.  To fix this, rely on the flag bits in the login
      request to have FFP (0x3) in the lower nibble as a marker for the
      final login request, and post an initial chunk of receive buffers
      before sending that login request instead of after getting the login
      response.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      89e984e2
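The ordering change in this commit (post receive buffers before sending the final login request, detected from the FFP marker in the flags) can be sketched with a small userspace model. Several things here are assumptions: the next-stage field is taken to be the two low-order bits of the flags byte, as in the iSCSI login PDU layout, and `struct model_conn`, `model_send_login`, and the buffer count are hypothetical names and values, not the initiator's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_FFP 0x03u   /* full-feature-phase marker (0x3) named in the text */

/* Hypothetical connection state for the sketch. */
struct model_conn {
    unsigned int posted_rx;    /* receive buffers currently posted */
    unsigned int logins_sent;  /* login requests sent so far */
    bool initial_rx_posted;    /* the initial chunk is posted only once */
};

/* Assumption: the next-stage field is the two low-order bits of flags. */
static bool model_is_final_login(uint8_t flags)
{
    return (flags & MODEL_FFP) == MODEL_FFP;
}

/*
 * Send one login request. If it is the final one, post the initial chunk of
 * receive buffers first, so they are in place before the target can start
 * sending PDUs.
 */
static void model_send_login(struct model_conn *c, uint8_t flags,
                             unsigned int initial_rx)
{
    assert(c != NULL);
    if (model_is_final_login(flags) && !c->initial_rx_posted) {
        c->posted_rx += initial_rx;
        c->initial_rx_posted = true;
    }
    c->logins_sent++;   /* stands in for actually transmitting the request */
}
```

In this model the buffers are already counted in `posted_rx` by the time the final request is recorded as sent, which closes the window the commit message describes.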
  22. 05 Mar, 2012 1 commit
  23. 28 Feb, 2012 2 commits
  24. 26 Feb, 2012 3 commits
  25. 10 Feb, 2012 1 commit
  26. 09 Feb, 2012 2 commits
  27. 07 Feb, 2012 1 commit
  28. 04 Feb, 2012 1 commit
  29. 03 Feb, 2012 1 commit