1. 23 Dec 2015: 1 commit
  2. 12 Dec 2015: 1 commit
    • IB: add a proper completion queue abstraction · 14d3a3b2
      Committed by Christoph Hellwig
      This adds an abstraction that allows ULPs to simply pass a completion
      object and completion callback with each submitted WR and let the RDMA
      core handle the nitty gritty details of how to handle completion
      interrupts and poll the CQ.
      
      In detail there is a new ib_cqe structure which just contains the
      completion callback, and which can be used to get at the containing
      object using container_of.  It is pointed to by the WR and WC as an
      alternative to the wr_id field, similar to how many ULPs already use
      the field to store a pointer using casts.
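The ib_cqe/container_of pattern can be mimicked in userspace C. This is a simplified mock, not the kernel API. The real `struct ib_cqe` callback also receives the `struct ib_cq *`. The names `struct my_request`, `my_request_done()` and `dispatch_completion()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ib_wc;

/* Simplified mock: the real callback is done(struct ib_cq *, struct ib_wc *). */
struct ib_cqe {
	void (*done)(struct ib_wc *wc);
};

/* Simplified mock of a work completion that points back at the ib_cqe. */
struct ib_wc {
	struct ib_cqe *wr_cqe;
	int status;
};

/* Hypothetical ULP request object with an embedded ib_cqe. */
struct my_request {
	int tag;
	int completed;
	struct ib_cqe cqe;
};

static void my_request_done(struct ib_wc *wc)
{
	/* Recover the containing request without casting a wr_id. */
	struct my_request *req =
		container_of(wc->wr_cqe, struct my_request, cqe);

	req->completed = (wc->status == 0);
}

/* Hypothetical core-side dispatch: call whatever callback the WR carried. */
static void dispatch_completion(struct ib_wc *wc)
{
	assert(wc->wr_cqe != NULL && wc->wr_cqe->done != NULL);
	wc->wr_cqe->done(wc);
}
```

The ULP only fills in `cqe.done` and points the WR at `&req->cqe`. The core never needs to know what type of object surrounds it.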
      
      A driver using the new completion callbacks allocates its CQs using
      the new ib_alloc_cq API, which in addition to the number of CQEs and
      the completion vector also takes a mode that selects how we poll for CQEs.
      Three modes are available: direct for drivers that never take CQ
      interrupts and just poll for them, softirq to poll from softirq context
      using the soon-to-be-renamed blk-iopoll infrastructure which takes care of
      rearming and budgeting, or a workqueue for consumers who want to be
      called from user context.
      
      Thanks a lot to Sagi Grimberg, who helped review the API, wrote
      the current version of the workqueue code because my two previous
      attempts sucked too much, and converted the iSER initiator to the new
      API.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  3. 08 Oct 2015: 1 commit
    • IB: split struct ib_send_wr · e622f2f4
      Committed by Christoph Hellwig
      This patch splits up struct ib_send_wr so that all non-trivial verbs
      use their own structure which embeds struct ib_send_wr.  This dramatically
      shrinks the size of a WR for most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
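The embedding pattern can be sketched with trimmed userspace structs. The field lists below are assumptions for illustration, so their sizes will not match the numbers above. The `rdma_wr()` accessor follows the container_of-based helper style used for these WR types, but treat the definitions as a sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Trimmed generic WR: only what every verb needs (assumed field list). */
struct ib_send_wr {
	struct ib_send_wr *next;
	uint64_t wr_id;
	int opcode;
	int num_sge;
};

/* Trimmed RDMA WR: embeds the generic WR, then adds RDMA-only fields. */
struct ib_rdma_wr {
	struct ib_send_wr wr;
	uint64_t remote_addr;
	uint32_t rkey;
};

/* Given the generic WR a caller posted, recover the RDMA-specific WR. */
static inline struct ib_rdma_wr *rdma_wr(struct ib_send_wr *wr)
{
	return container_of(wr, struct ib_rdma_wr, wr);
}
```

Posting code passes `&r.wr` through the generic WR chain. Only a driver that sees an RDMA opcode converts it back, so the common WR no longer pays for every verb's fields.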
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: Haggai Eran <haggaie@mellanox.com>
      Tested-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
  4. 31 Aug 2015: 2 commits
  5. 15 Jul 2015: 1 commit
    • IB/ipoib: Scatter-Gather support in connected mode · c4268778
      Committed by Yuval Shaia
      By default, the IPoIB-CM driver uses a 64k MTU. A larger MTU gives better
      performance.
      This MTU plus overhead puts the memory allocation for IP-based packets at
      32 4k pages (order 5), which have to be contiguous.
      When system memory is under pressure, it was observed that allocating 128k
      of contiguous physical memory is difficult and causes serious errors (such as
      the system becoming unusable).
      
      This enhancement resolves the issue by removing the physically contiguous
      memory requirement, using the Scatter/Gather feature that exists in the Linux stack.
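The arithmetic behind the order-5 figure can be checked with two small helpers. The 64 bytes of overhead used below are an assumed round number, not IPoIB's actual header budget. The point is that anything just over 64 KiB rounds up to a 128 KiB contiguous block, whereas a scatter/gather list only needs independent 4 KiB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest buddy order whose block (page_size << order) holds len bytes. */
static unsigned int contiguous_order(size_t len, size_t page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < len)
		order++;
	return order;
}

/* Number of independent pages a scatter/gather list needs for len bytes. */
static size_t sg_pages(size_t len, size_t page_size)
{
	return (len + page_size - 1) / page_size;
}
```

With 65536 + 64 bytes, the contiguous path needs an order-5 block (32 pages, 128 KiB). The scatter/gather path needs 17 separate pages, none of which must be adjacent.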
      
      With this fix, Scatter-Gather is also supported in connected mode.
      
      This change reverts some of the changes made in commit e112373f
      ("IPoIB/cm: Reduce connected mode TX object size").
      
      The ability to use SG in IPoIB CM is possible because the coupling
      between NETIF_F_SG and NETIF_F_CSUM was removed in commit
      ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
      Acked-by: Christian Marie <christian@ponies.io>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  6. 06 May 2015: 1 commit
  7. 16 Apr 2015: 1 commit
    • IB/ipoib: Use dedicated workqueues per interface · 0b39578b
      Committed by Doug Ledford
      During my recent work on the rtnl lock deadlock in the IPoIB driver, I
      saw that even once I fixed the apparent races for a single device, as
      soon as that device had any children, new races popped up.  It turns
      out that this is because no matter how well we protect against races
      on a single device, the fact that all devices use the same workqueue,
      and flush_workqueue() flushes *everything* from that workqueue means
      that we would also have to prevent all races between different devices
      (for instance, ipoib_mcast_restart_task on interface ib0 can race with
      ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
      the rtnl_lock).
      
      There are several possible solutions to this problem:
      
      Make carrier_on_task and mcast_restart_task try to take the rtnl for
      some set period of time and if they fail, then bail.  This runs the
      real risk of dropping work on the floor, which can end up being its
      own separate kind of deadlock.
      
      Set some global flag in the driver that says some device is in the
      middle of going down, letting all tasks know to bail.  Again, this can
      drop work on the floor.
      
      Or the method this patch attempts to use, which is when we bring an
      interface up, create a workqueue specifically for that interface, so
      that when we take it back down, we are flushing only those tasks
      associated with our interface.  In addition, keep the global
      workqueue, but now limit it to only flush tasks.  In this way, the
      flush tasks can always flush the device specific work queues without
      having deadlock issues.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  8. 31 Jan 2015: 1 commit
  9. 16 Dec 2014: 1 commit
    • IPoIB: Use dedicated workqueues per interface · 5141861c
      Committed by Doug Ledford
      During my recent work on the rtnl lock deadlock in the IPoIB driver, I
      saw that even once I fixed the apparent races for a single device, as
      soon as that device had any children, new races popped up.  It turns
      out that this is because no matter how well we protect against races
      on a single device, the fact that all devices use the same workqueue,
      and flush_workqueue() flushes *everything* from that workqueue, we can
      have one device in the middle of a down and holding the rtnl lock and
      another totally unrelated device needing to run mcast_restart_task,
      which wants the rtnl lock and will loop trying to take it unless it
      sees its own FLAG_ADMIN_UP flag go away.  Because the unrelated
      interface will never see its own ADMIN_UP flag drop, the interface
      going down will deadlock trying to flush the queue.  There are several
      possible solutions to this problem:
      
      Make carrier_on_task and mcast_restart_task try to take the rtnl for
      some set period of time and if they fail, then bail.  This runs the
      real risk of dropping work on the floor, which can end up being its
      own separate kind of deadlock.
      
      Set some global flag in the driver that says some device is in the
      middle of going down, letting all tasks know to bail.  Again, this can
      drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
      away, then maybe after a few tries on the rtnl lock we can queue our
      own task back up as a delayed work and return and avoid dropping work
      on the floor that way.  But I'm not 100% convinced that we won't cause
      other problems.
      
      Or the method this patch attempts to use, which is when we bring an
      interface up, create a workqueue specifically for that interface, so
      that when we take it back down, we are flushing only those tasks
      associated with our interface.  In addition, keep the global
      workqueue, but now limit it to only flush tasks.  In this way, the
      flush tasks can always flush the device specific work queues without
      having deadlock issues.
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  10. 03 Jun 2014: 1 commit
    • IB: Add a QP creation flag to use GFP_NOIO allocations · 09b93088
      Committed by Or Gerlitz
      This addresses a problem where NFS client writes over IPoIB connected
      mode may deadlock on memory allocation/writeback.
      
      The problem is not directly memory reclamation.  There is an indirect
      dependency between network filesystems writing back pages and
      ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
      make forward progress until ipoib_cm_tx_init() succeeds and it is
      stuck in page reclaim itself waiting for network transmission.
      Ordinarily this situation may be avoided by having the caller use
      GFP_NOFS but ipoib_cm_tx_init() does not have that information.
      
      To address this, take a general approach and add a new QP creation
      flag that tells the low-level hardware driver to use GFP_NOIO for the
      memory allocations related to the new QP.
      
      Use the new flag in the ipoib connected mode path, and if the driver
      doesn't support it, re-issue the QP creation without the flag.
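The fallback amounts to a retry without the flag. The sketch below is a userspace mock. `mock_create_qp()`, `MOCK_QP_CREATE_USE_GFP_NOIO` and the `driver_supports_noio` parameter are hypothetical stand-ins; the kernel flag is `IB_QP_CREATE_USE_GFP_NOIO`. The sketch also assumes that a driver which does not support the flag rejects the request with `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the IB_QP_CREATE_USE_GFP_NOIO create flag. */
#define MOCK_QP_CREATE_USE_GFP_NOIO (1u << 0)

/*
 * Hypothetical QP creation: returns a positive QP number on success, or a
 * negative errno. Assumption: a driver that does not understand the flag
 * rejects the request with -EINVAL.
 */
static int mock_create_qp(unsigned int flags, bool driver_supports_noio)
{
	static int next_qpn = 100;

	if ((flags & MOCK_QP_CREATE_USE_GFP_NOIO) && !driver_supports_noio)
		return -EINVAL;
	return next_qpn++;
}

/* Try with GFP_NOIO first; on -EINVAL, re-issue the request without it. */
static int create_tx_qp(bool driver_supports_noio, unsigned int *used_flags)
{
	unsigned int flags = MOCK_QP_CREATE_USE_GFP_NOIO;
	int qpn = mock_create_qp(flags, driver_supports_noio);

	if (qpn == -EINVAL) {
		flags &= ~MOCK_QP_CREATE_USE_GFP_NOIO;
		qpn = mock_create_qp(flags, driver_supports_noio);
	}
	*used_flags = flags;
	return qpn;
}
```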
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  11. 09 Nov 2013: 1 commit
  12. 14 Aug 2013: 1 commit
    • IPoIB: Fix race in deleting ipoib_neigh entries · 49b8e744
      Committed by Jim Foraker
      In several places, this snippet is used when removing neigh entries:
      
      	list_del(&neigh->list);
      	ipoib_neigh_free(neigh);
      
      The list_del() removes neigh from the associated struct ipoib_path, while
      ipoib_neigh_free() removes neigh from the device's neigh entry lookup
      table.  Both of these operations are protected by the priv->lock
      spinlock.  The table however is also protected via RCU, and so naturally
      the lock is not held when doing reads.
      
      This leads to a race condition, in which a thread may successfully look
      up a neigh entry that has already been deleted from neigh->list.  Since
      the previous deletion will have marked the entry with poison, a second
      list_del() on the object will cause a panic:
      
        #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
           [exception RIP: list_del+16]
           RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
           RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
           RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
           RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
           R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
           R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
           ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
        #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
        #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
        #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
        #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
       #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
       #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca
      
      We move the list_del() into ipoib_neigh_free(), so that deletion happens
      only once, after the entry has been successfully removed from the lookup
      table.  This same behavior is already used in ipoib_del_neighs_by_gid()
      and __ipoib_reap_neigh().
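The fix can be shown with a simplified, single-threaded model. The list removal lives inside the free routine and only happens when the entry is actually unlinked from the lookup table. A second free of the same entry therefore becomes a no-op instead of a `list_del()` on poisoned pointers. The `in_table` flag stands in for the real hash-table search done under `priv->lock`. All names below are hypothetical, and RCU is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Poison values, in the spirit of the kernel's LIST_POISON1/2. */
#define POISON1 ((struct list_head *)0x100)
#define POISON2 ((struct list_head *)0x200)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *e)
{
	/* Catch a double delete here; the kernel would fault on the poison. */
	assert(e->next != POISON1 && e->prev != POISON2);
	e->next->prev = e->prev;
	e->prev->next = e->next;
	e->next = POISON1;
	e->prev = POISON2;
}

/* Hypothetical simplified neigh entry. */
struct neigh {
	struct list_head list;  /* link in the owning path's list */
	bool in_table;          /* stand-in for "present in lookup table" */
};

/* Unlink from the table first; only the caller that succeeds does list_del(). */
static bool neigh_free(struct neigh *n)
{
	if (!n->in_table)
		return false;   /* someone else already removed it */
	n->in_table = false;
	list_del(&n->list);
	return true;
}
```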
      Signed-off-by: Jim Foraker <foraker1@llnl.gov>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
      Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  13. 08 May 2013: 1 commit
  14. 17 Apr 2013: 1 commit
  15. 23 Mar 2013: 1 commit
    • IPoIB: Fix send lockup due to missed TX completion · 1ee9e2aa
      Committed by Mike Marciniszyn
      Commit f0dc117a ("IPoIB: Fix TX queue lockup with mixed UD/CM
      traffic") attempts to solve an issue where unprocessed UD send
      completions can deadlock the netdev.
      
      The patch doesn't fully resolve the issue because if more than half
      the tx_outstanding's were UD and all of the destinations are RC
      reachable, arming the CQ doesn't solve the issue.
      
      This patch passes the IB_CQ_REPORT_MISSED_EVENTS flag to
      ib_req_notify_cq().  If the return code is above 0, the UD send CQ
      completion callback is called directly to re-arm the send completion timer.
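The re-arm pattern can be modelled with a mock CQ. `mock_req_notify_cq()` and the other names are hypothetical. The mock imitates the documented contract of `ib_req_notify_cq()` with `IB_CQ_REPORT_MISSED_EVENTS`: a positive return means completions may already be waiting, so the caller must poll again instead of waiting for an interrupt. The handler here simply drains the CQ. In the driver it re-arms a send completion timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock CQ: counts completions not yet polled. */
struct mock_cq {
	int pending;
	bool armed;
	int handler_calls;
};

/*
 * Mock of ib_req_notify_cq(cq, IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS):
 * arms the CQ and returns > 0 if completions are already waiting, meaning
 * no interrupt will be raised for them.
 */
static int mock_req_notify_cq(struct mock_cq *cq)
{
	cq->armed = true;
	return cq->pending > 0 ? 1 : 0;
}

/* Hypothetical completion handler: drains the CQ. */
static void send_cq_handler(struct mock_cq *cq)
{
	cq->handler_calls++;
	cq->pending = 0;
}

/* Arm, and if events were missed, call the handler directly. */
static void arm_send_cq(struct mock_cq *cq)
{
	if (mock_req_notify_cq(cq) > 0)
		send_cq_handler(cq);
}
```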
      
      This issue is seen in very large parallel filesystem deployments
      and the patch has been shown to correct the issue.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Dean Luick <dean.luick@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  16. 06 Feb 2013: 1 commit
    • IPoIB: Fix crash due to skb double destruct · 7e5a90c2
      Committed by Shlomo Pongratz
      After commit b13912bb ("IPoIB: Call skb_dst_drop() once skb is
      enqueued for sending"), using connected mode and running multithreaded
      iperf for a long time, i.e.
      
          iperf -c <IP> -P 16 -t 3600
      
      results in a crash.
      
      After the above-mentioned patch, the driver calls skb_orphan() and
      skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
      (and likewise in ipoib_ib.c::ipoib_send()).
      
      The problem with this is, as is written in a comment in both routines,
      "it's entirely possible that the completion handler will run before we
      execute anything after the post_send()."  This leads to running the
      skb cleanup routines simultaneously in two different contexts.
      
      The solution is to always perform the skb_orphan() and skb_dst_drop()
      before queueing the send work request.  If an error occurs, this is
      no different from the regular case where dev_kfree_skb_any() is called in
      the completion path, which is assumed to run after these two routines.
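The ordering rule can be illustrated with a single-threaded model in which the completion handler runs inside the post call. This is the worst case the driver comment warns about. `struct mock_skb`, `mock_post_send()` and both xmit variants are hypothetical. The `uses_after_free` counter stands in for the concurrent double cleanup seen in the real driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical skb model tracking cleanup state. */
struct mock_skb {
	bool has_owner;       /* socket reference still attached */
	bool has_dst;         /* dst reference still attached */
	bool freed;
	int uses_after_free;  /* touches after the completion freed it */
};

static void cleanup_refs(struct mock_skb *skb)
{
	if (skb->freed) {
		skb->uses_after_free++;
		return;
	}
	skb->has_owner = false;   /* models skb_orphan() */
	skb->has_dst = false;     /* models skb_dst_drop() */
}

/* Worst case: the completion handler runs before post_send() returns. */
static void mock_post_send(struct mock_skb *skb)
{
	skb->freed = true;        /* models the free in the completion path */
}

static void xmit_buggy(struct mock_skb *skb)
{
	mock_post_send(skb);
	cleanup_refs(skb);        /* too late: the completion may already have run */
}

static void xmit_fixed(struct mock_skb *skb)
{
	cleanup_refs(skb);        /* drop references before queueing the WR */
	mock_post_send(skb);
}
```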
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  17. 20 Dec 2012: 1 commit
    • IPoIB: Call skb_dst_drop() once skb is enqueued for sending · b13912bb
      Committed by Roland Dreier
      Currently, IPoIB delays collecting send completions for TX packets in
      order to batch work more efficiently.  It does skb_orphan() right after
      queuing the packets so that destructors run early, to avoid problems
      like holding socket send buffers for too long (since we might not
      collect a send completion until a long time after the packet is
      actually sent).
      
      However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
      at skb_dst() to update the PMTU when it gets a too-long packet.  This
      means that the packets sitting in the TX ring with uncollected send
      completions are holding a reference on the dst.  We've seen this lead
      to pathological behavior with respect to route and neighbour GC.  The
      easy fix for this is to call skb_dst_drop() when we call skb_orphan().
      
      Also, give packets sent via connected mode (CM) the same skb_orphan()
      / skb_dst_drop() treatment that packets sent via datagram mode get.
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  18. 03 Oct 2012: 1 commit
  19. 02 Oct 2012: 1 commit
    • IB/ipoib: Add more rtnl_link_ops callbacks · 862096a8
      Committed by Or Gerlitz
      Add the rtnl_link_ops changelink and fill_info callbacks, through
      which the admin can now set/get the driver mode and similar policies.
      Maintain the proprietary sysfs entries only for legacy child devices.
      
      For child devices, set dev->iflink to point to the parent
      device ifindex, such that user space tools can now correctly
      show the uplink relation, as is done for vlan, macvlan, etc.
      devices. Pointed out by Patrick McHardy <kaber@trash.net>.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 15 Aug 2012: 1 commit
  21. 30 Jul 2012: 1 commit
    • IPoIB: Use a private hash table for path lookup in xmit path · b63b70d8
      Committed by Shlomo Pongratz
      Dave Miller <davem@davemloft.net> provided a detailed description of
      why the way IPoIB is using neighbours for its own ipoib_neigh struct
      is buggy:
      
          Any time an ipoib_neigh is changed, a sequence like the following is made:
      
          			spin_lock_irqsave(&priv->lock, flags);
          			/*
          			 * It's safe to call ipoib_put_ah() inside
          			 * priv->lock here, because we know that
          			 * path->ah will always hold one more reference,
          			 * so ipoib_put_ah() will never do more than
          			 * decrement the ref count.
          			 */
          			if (neigh->ah)
          				ipoib_put_ah(neigh->ah);
          			list_del(&neigh->list);
          			ipoib_neigh_free(dev, neigh);
          			spin_unlock_irqrestore(&priv->lock, flags);
          			ipoib_path_lookup(skb, n, dev);
      
          This doesn't work, because you're leaving a stale pointer to the freed up
          ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
          with all the locking done to protect _changes_ to *ipoib_neigh(n), and
          with the code in ipoib_neigh_free() that NULLs out the pointer.
      
          The core issue is that read side calls to *to_ipoib_neigh(n) are not
          being synchronized at all, they are performed without any locking.  So
          whether we hold the lock or not when making changes to *ipoib_neigh(n)
          you still can have threads see references to freed up ipoib_neigh
          objects.
      
          	cpu 1			cpu 2
          	n = *ipoib_neigh()
          				*ipoib_neigh() = NULL
          				kfree(n)
          	n->foo == OOPS
      
          [..]
      
          Perhaps the ipoib code can have a private path database it manages
          entirely itself, which holds all the necessary information and is
          looked up by some generic key which is available easily at transmit
          time and does not involve generic neighbour entries.
      
      See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
      <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
      for the full discussion.
      
      This patch aims to solve the race conditions found in the IPoIB driver.
      
      The patch removes the connection between the core networking neighbour
      structure and the ipoib_neigh structure.  In addition to avoiding the
      race described above, it allows us to handle SKBs carrying IP packets
      that don't have any associated neighbour.
      
      We add an ipoib_neigh hash table with N buckets where the key is the
      destination hardware address.  The ipoib_neigh is fetched from the
      hash table instead of from the location stashed in the neighbour
      structure. The hash table uses both RCU and reference counting to
      guarantee that no ipoib_neigh instance is ever deleted while in use.
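The lookup side can be sketched single-threaded: a fixed-size table keyed by the 20-byte IPoIB hardware address, with a reference taken on each lookup. The bucket count, the FNV-1a hash and every name below are assumptions for illustration. The driver's own hash function differs, and the RCU protection that makes concurrent readers safe is deliberately omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HW_ADDR_LEN 20   /* IPoIB hardware address length */
#define NUM_BUCKETS 16   /* assumed value for N */

/* Hypothetical simplified entry. */
struct neigh_entry {
	uint8_t daddr[HW_ADDR_LEN];
	int refcnt;
	struct neigh_entry *next;
};

struct neigh_table {
	struct neigh_entry *buckets[NUM_BUCKETS];
};

/* FNV-1a over the address; an assumption, not the driver's hash. */
static unsigned int addr_hash(const uint8_t *daddr)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < HW_ADDR_LEN; i++) {
		h ^= daddr[i];
		h *= 16777619u;
	}
	return h % NUM_BUCKETS;
}

/* Look up by hardware address and take a reference; NULL if absent. */
static struct neigh_entry *neigh_get(struct neigh_table *t, const uint8_t *daddr)
{
	struct neigh_entry *n = t->buckets[addr_hash(daddr)];

	for (; n; n = n->next) {
		if (memcmp(n->daddr, daddr, HW_ADDR_LEN) == 0) {
			n->refcnt++;
			return n;
		}
	}
	return NULL;
}

/* Insert a new entry; the table itself holds one reference. */
static struct neigh_entry *neigh_add(struct neigh_table *t, const uint8_t *daddr)
{
	struct neigh_entry *n = calloc(1, sizeof(*n));
	unsigned int b = addr_hash(daddr);

	if (!n)
		return NULL;
	memcpy(n->daddr, daddr, HW_ADDR_LEN);
	n->refcnt = 1;
	n->next = t->buckets[b];
	t->buckets[b] = n;
	return n;
}

/* Drop a reference; the entry is freed only when the last one goes. */
static void neigh_put(struct neigh_entry *n)
{
	if (--n->refcnt == 0)
		free(n);
}
```

Because the table keeps its own reference, a transmit path that looked an entry up can keep using it until its matching `neigh_put()`, even if the entry is concurrently aged out.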
      
      Fetching the ipoib_neigh structure instance from the hash also makes
      the special code in ipoib_start_xmit that handles remote and local
      bonding failover redundant.
      
      Aged ipoib_neigh instances are deleted by a garbage collection task
      that runs every M seconds and deletes every ipoib_neigh instance that
      was idle for at least 2*M seconds. The deletion is safe since the
      ipoib_neigh instances are protected using RCU and reference count
      mechanisms.
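The aging rule reduces to a per-entry comparison. This sketch assumes times in whole seconds and uses a flat array with hypothetical names. In the driver, deletion also goes through RCU and the reference count, so in-flight users are never affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: only the last time (in seconds) it carried traffic. */
struct aged_entry {
	long last_used;
	bool live;
};

/* An entry is stale once it has been idle for at least 2 * M seconds. */
static bool is_stale(long now, long last_used, long gc_interval_m)
{
	return now - last_used >= 2 * gc_interval_m;
}

/* One GC pass, run every M seconds; returns how many entries it reaped. */
static size_t gc_pass(struct aged_entry *e, size_t count, long now, long m)
{
	size_t reaped = 0;

	for (size_t i = 0; i < count; i++) {
		if (e[i].live && is_stale(now, e[i].last_used, m)) {
			e[i].live = false;
			reaped++;
		}
	}
	return reaped;
}
```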
      
      The number of buckets (N) and the frequency of running the GC thread (M)
      are taken from the exported arp_tbl.
      Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  22. 17 Jul 2012: 1 commit
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      Committed by David S. Miller
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 09 Jul 2012: 1 commit
  24. 01 Nov 2011: 1 commit
  25. 19 Oct 2011: 2 commits
  26. 14 Oct 2011: 1 commit
  27. 27 Aug 2011: 1 commit
  28. 20 Apr 2011: 1 commit
  29. 13 Jan 2011: 1 commit
  30. 11 Jan 2011: 1 commit
  31. 30 Mar 2010: 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
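The first rule the script applies can be restated as a small decision function. This is a hypothetical C restatement, not a translation of slabh-sweep.py. The marker strings are a small assumed subset of gfp and slab identifiers, and plain substring matching stands in for whatever scanning the real script does. Because slab.h pulls in gfp.h, a file that uses slab facilities only needs slab.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed, incomplete marker lists; the real script's patterns differ. */
static const char *const slab_markers[] = { "kmalloc", "kzalloc", "kfree",
					    "kmem_cache_" };
static const char *const gfp_markers[] = { "GFP_", "__get_free_page",
					   "alloc_pages" };

static int uses_any(const char *src, const char *const *markers, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strstr(src, markers[i]))
			return 1;
	return 0;
}

/* Return the single header a file needs, or NULL if it needs neither. */
static const char *needed_include(const char *src)
{
	if (uses_any(src, slab_markers,
		     sizeof(slab_markers) / sizeof(slab_markers[0])))
		return "linux/slab.h";  /* slab.h already includes gfp.h */
	if (uses_any(src, gfp_markers,
		     sizeof(gfp_markers) / sizeof(gfp_markers[0])))
		return "linux/gfp.h";
	return NULL;
}
```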
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  32. 12 Mar 2010: 2 commits
    • IPoIB: Include return code in trace message for ib_post_send() failures · a48f509b
      Committed by Or Gerlitz
      Print the return code of ib_post_send() if it fails to make these
      debugging messages more useful.
      Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
    • IPoIB: Fix TX queue lockup with mixed UD/CM traffic · f0dc117a
      Committed by Eli Cohen
      The IPoIB UD QP reports send completions to priv->send_cq, which is
      usually left unarmed; it only gets armed when the number of
      outstanding send requests reaches the size of the TX queue. This
      arming is done only in the send path for the UD QP.  However, when
      sending CM packets, the net queue may be stopped for the same reasons
      but no measures are taken to recover the UD path from a lockup.
      
      Consider this scenario: a host sends a high rate of both CM and UD
      packets, with a TX queue length of N.  If at some time the number of
      outstanding UD packets is more than N/2 and the overall outstanding
      packets is N-1, and CM sends a packet (making the number of
      outstanding sends equal N), the TX queue will be stopped.  When all
      the CM packets complete, the number of outstanding packets will still
      be higher than N/2 so the TX queue will not be restarted.
      
      Fix this by calling ib_req_notify_cq() when the queue is stopped in
      the CM path.
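The lockup can be reproduced with a counter model. It rests on stated assumptions: the queue is woken when total outstanding sends fall to N/2 or below, and UD completions are only observed once the UD send CQ is armed. All names are hypothetical. With N = 8 and 5 UD plus 2 CM sends outstanding, one more CM send stops the queue. Completing the CM sends leaves 5 outstanding, which is still above N/2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one IPoIB TX queue of length n. */
struct txq {
	int n;
	int ud_out;        /* outstanding UD sends */
	int cm_out;        /* outstanding CM sends */
	bool stopped;      /* net queue stopped */
	bool ud_cq_armed;  /* UD send CQ armed, so its completions get seen */
};

static int outstanding(const struct txq *q)
{
	return q->ud_out + q->cm_out;
}

/* Assumed wake rule: restart once outstanding sends drop to n/2 or below. */
static void maybe_wake(struct txq *q)
{
	if (q->stopped && outstanding(q) <= q->n / 2)
		q->stopped = false;
}

/* CM send path; with_fix models calling ib_req_notify_cq() on stop. */
static void cm_send(struct txq *q, bool with_fix)
{
	q->cm_out++;
	if (outstanding(q) >= q->n) {
		q->stopped = true;
		if (with_fix)
			q->ud_cq_armed = true;
	}
}

static void cm_complete_all(struct txq *q)
{
	q->cm_out = 0;
	maybe_wake(q);
}

/* UD completions are only observed if the UD send CQ was armed. */
static void ud_complete_all(struct txq *q)
{
	if (!q->ud_cq_armed)
		return;
	q->ud_out = 0;
	maybe_wake(q);
}
```

Without the fix the queue stays stopped after every CM send completes. With it, the armed UD CQ delivers the UD completions and the queue is restarted.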
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  33. 19 Feb 2010: 1 commit
  34. 06 Sep 2009: 1 commit
  35. 03 Sep 2009: 1 commit
  36. 03 Jun 2009: 1 commit
  37. 19 May 2009: 1 commit