1. 17 Feb 2015, 1 commit
  2. 18 Mar 2014, 2 commits
  3. 14 Aug 2013, 2 commits
  4. 12 Jul 2013, 1 commit
  5. 22 Jun 2013, 4 commits
  6. 08 Sep 2012, 1 commit
  7. 30 Jul 2012, 1 commit
  8. 20 Jul 2012, 2 commits
    • IB/qib: Add congestion control agent implementation · 36a8f01c
      Mike Marciniszyn committed
      Add a congestion control agent in the driver that handles gets and
      sets from the congestion control manager in the fabric for the
      Performance Scale Messaging (PSM) library.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/qib: Reduce sdma_lock contention · 551ace12
      Mike Marciniszyn committed
      Profiling has shown that sdma_lock is a performance bottleneck. The
      affected situations include:
       - RDMA reads when krcvqs > 1
       - post sends from multiple threads
      
      In the RDMA read case, the current global qib_wq mechanism runs on all
      CPUs and contends for the sdma_lock when multiple RDMA read requests
      are fielded on different CPUs. For post sends, the direct call to
      qib_do_send() from multiple threads causes the contention.
      
      Since the sdma mechanism is per port, this fix converts the existing
      workqueue to a per-port, single-threaded workqueue to reduce the lock
      contention in the RDMA read case, and in any other case where the QP
      is scheduled via the workqueue mechanism from more than one CPU.
      
      For the post send case, this patch modifies the post send code to test
      for a non-empty sdma engine.  If the sdma engine is not idle, the (now
      single-threaded) workqueue is used to trigger the send engine instead
      of a direct call to qib_do_send().
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
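      The dispatch rule this commit describes (call the send engine directly
      only when the sdma engine is idle, otherwise defer to the per-port
      single-threaded workqueue) can be sketched in userspace C. All names
      here (sdma_engine, queue_to_port_wq, and the descriptor counters) are
      illustrative stand-ins, not the driver's actual symbols:

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      struct sdma_engine {
          int descq_added;   /* descriptors submitted */
          int descq_removed; /* descriptors completed */
      };

      static int direct_sends, queued_sends;

      /* Engine is idle when every submitted descriptor has completed. */
      static bool sdma_empty(const struct sdma_engine *sde)
      {
          return sde->descq_added == sde->descq_removed;
      }

      static void do_send(void)          { direct_sends++; }
      /* Stands in for queue_work() on the per-port ordered workqueue. */
      static void queue_to_port_wq(void) { queued_sends++; }

      /* The patch's rule: direct call only when the sdma engine is idle;
       * otherwise defer, so only the single worker touches sdma_lock. */
      static void post_send(struct sdma_engine *sde)
      {
          if (sdma_empty(sde))
              do_send();
          else
              queue_to_port_wq();
      }

      int main(void)
      {
          struct sdma_engine sde = { .descq_added = 0, .descq_removed = 0 };

          post_send(&sde);       /* idle -> direct call */
          sde.descq_added = 3;   /* three descriptors now in flight */
          post_send(&sde);       /* busy -> deferred to the workqueue */
          sde.descq_removed = 3; /* engine drained */
          post_send(&sde);       /* idle again -> direct call */

          printf("%d %d\n", direct_sends, queued_sends);
          assert(direct_sends == 2 && queued_sends == 1);
          return 0;
      }
      ```

      In the kernel the deferred path would also serialize with scheduling
      from the RDMA read side, since a single-threaded (ordered) workqueue
      runs at most one work item at a time.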
  9. 15 May 2012, 2 commits
  10. 26 Feb 2012, 1 commit
    • IB/qib: Add logic for affinity hint · a778f3fd
      Mike Marciniszyn committed
      Call irq_set_affinity_hint() to give userspace programs such as
      irqbalance the information to be able to distribute qib interrupts
      appropriately.
      
      The logic allocates all non-receive interrupts to the first CPU local
      to the HCA.  Receive interrupts are allocated round-robin, starting
      with the second CPU local to the HCA and wrapping back to that second
      CPU (the first stays dedicated to the non-receive interrupts).
      
      This patch also adds a refinement to the name registered for MSI-X
      interrupts so that user level scripts can determine the device
      associated with the IRQs when there are multiple HCAs with a
      potentially different set of local CPUs.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
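      The CPU layout described above is simple arithmetic over the HCA-local
      CPUs. A minimal sketch, assuming the local CPUs are a contiguous range
      starting at first_cpu (in the driver, the chosen CPU would be handed to
      irq_set_affinity_hint(); the helper names here are hypothetical):

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* All non-receive IRQs share the first CPU local to the HCA. */
      static int nonrecv_cpu(int first_cpu)
      {
          return first_cpu;
      }

      /* Receive IRQs: round-robin over the n_local HCA-local CPUs,
       * starting at the second one and wrapping back to the second
       * (the first CPU is never used for receive interrupts). */
      static int recv_cpu(int first_cpu, int n_local, int recv_idx)
      {
          return first_cpu + 1 + recv_idx % (n_local - 1);
      }

      int main(void)
      {
          assert(nonrecv_cpu(0) == 0);
          /* 4 local CPUs: receive IRQs land on 1, 2, 3, then wrap to 1 */
          assert(recv_cpu(0, 4, 0) == 1);
          assert(recv_cpu(0, 4, 1) == 2);
          assert(recv_cpu(0, 4, 2) == 3);
          assert(recv_cpu(0, 4, 3) == 1);
          printf("ok\n");
          return 0;
      }
      ```

      Note irq_set_affinity_hint() only publishes a hint via /proc/irq/;
      a userspace agent such as irqbalance decides whether to honor it.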
  11. 22 Oct 2011, 2 commits
  12. 07 Oct 2011, 1 commit
  13. 23 Jul 2011, 1 commit
    • IB/qib: Defer HCA error events to tasklet · e67306a3
      Mike Marciniszyn committed
      With ib_qib options:
      
          options ib_qib krcvqs=1 pcie_caps=0x51 rcvhdrcnt=4096 singleport=1 ibmtu=4
      
      a run of ib_write_bw -a yields the following:
      
          ------------------------------------------------------------------
           #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
           1048576   5000           2910.64            229.80
          ------------------------------------------------------------------
      
      The top CPU users in a profile are:
      
          CPU: Intel Architectural Perfmon, speed 2400.15 MHz (estimated)
          Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
          of 0x00 (No unit mask) count 1002300
          Counted LLC_MISSES events (Last level cache demand requests from this core that
          missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
          samples  %        samples  %        app name                 symbol name
          15237    29.2642  964      17.1195  ib_qib.ko                qib_7322intr
          12320    23.6618  1040     18.4692  ib_qib.ko                handle_7322_errors
          4106      7.8860  0              0  vmlinux                  vsnprintf
      
      
      Analysis of the stats, profile, the code, and the annotated profile indicate:
       - All of the overflow interrupts (one per packet overflow) are
         serviced on CPU0 with no mitigation on the frequency.
       - All of the receive interrupts are being serviced by CPU0.  (That is
         the way truescale.cmds statically allocates the kctx IRQs to CPUs.)
       - The code is spending all of its time servicing QIB_I_C_ERROR
         RcvEgrFullErr interrupts on CPU0, starving the packet receive
         processing.
       - The decode_err routine is very inefficient: it uses a printf variant
         to format a "%s", and it keeps looping even after the errs mask has
         been cleared.
       - Both qib_7322intr and handle_7322_errors read pci registers, which
         is very inefficient.
      
      The fix does the following:
       - Adds a tasklet to service QIB_I_C_ERROR
       - Replaces the very inefficient scnprintf() with a memcpy().  A field
         is added to qib_hwerror_msgs to save the sizeof("string") at
         compile time so that a strlen is not needed during err_decode().
       - The most frequent errors (Overflows) are serviced first to exit the
         loop as early as possible.
       - The loop now exits as soon as the errs mask is clear rather than
         fruitlessly looping through the msp array.
      
      With this fix the performance changes to:
      
          ------------------------------------------------------------------
           #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]
           1048576   5000           2990.64            2941.35
          ------------------------------------------------------------------
      
      During testing of the error handling overflow patch, it was determined
      that some CPUs were slower when servicing both overflow and receive
      interrupts on CPU0 with different MSI interrupt vectors.
      
      This patch adds an option (krcvq01_no_msi) to not use a dedicated MSI
      interrupt for kctx's < 2 and to service them on the default interrupt.
      For some CPUs, the interrupt entry/exit is more costly than the
      additional PCI read in the default handler.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
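      The err_decode() changes in this commit (a size field computed with
      sizeof at compile time, memcpy instead of scnprintf, the most frequent
      error first in the table, and an early exit once the error mask is
      clear) can be sketched in userspace C. The struct name, bit values,
      and message strings below are illustrative, not the driver's:

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      struct hwerror_msg {
          uint64_t mask;
          const char *msg;
          size_t sz; /* strlen(msg) + 1, known at compile time */
      };

      #define E(bit, s) { (bit), (s), sizeof(s) }

      /* Overflow first: it is by far the most frequent error, so the
       * loop can usually exit after examining a single entry. */
      static const struct hwerror_msg msgs[] = {
          E(0x1, "RcvEgrFullErr"),
          E(0x2, "RcvHdrFullErr"),
          E(0x4, "SDmaErr"),
      };

      /* Decode errs into buf; returns how many table entries were
       * examined, to make the early exit visible. */
      static int err_decode(char *buf, size_t len, uint64_t errs)
      {
          int examined = 0;
          size_t off = 0;

          /* Loop ends as soon as errs is clear, not at the table end. */
          for (size_t i = 0; i < sizeof(msgs) / sizeof(msgs[0]) && errs; i++) {
              examined++;
              if (!(errs & msgs[i].mask))
                  continue;
              errs &= ~msgs[i].mask;
              if (off + msgs[i].sz <= len) {
                  /* memcpy of a known size: no scnprintf, no strlen */
                  memcpy(buf + off, msgs[i].msg, msgs[i].sz);
                  off += msgs[i].sz - 1; /* next copy overwrites the NUL */
              }
          }
          return examined;
      }

      int main(void)
      {
          char buf[64] = "";

          /* Only the overflow bit set: one entry examined, then exit. */
          assert(err_decode(buf, sizeof(buf), 0x1) == 1);
          assert(strcmp(buf, "RcvEgrFullErr") == 0);
          printf("%s\n", buf);
          return 0;
      }
      ```

      Deferring this work to a tasklet on top of the cheaper decode is what
      keeps the hard-IRQ path short enough that receive processing on CPU0
      is no longer starved.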
  14. 31 Mar 2011, 1 commit
  15. 11 Jan 2011, 1 commit
  16. 27 Oct 2010, 1 commit
  17. 06 Aug 2010, 1 commit
  18. 20 Jul 2010, 1 commit
  19. 07 Jul 2010, 1 commit
  20. 24 May 2010, 1 commit