1. 05 2月, 2015 1 次提交
    • H
      cxgb4: Add low latency socket busy_poll support · 3a336cb1
      Hariprasad Shenai 提交于
      cxgb_busy_poll, corresponding to ndo_busy_poll, gets called by the socket
      waiting for data.
      
      With busy_poll enabled, improvement is seen in latency numbers as observed by
      collecting netperf TCP_RR numbers.
      Below are latency number, with and without busy-poll, in a switched environment
      for a particular msg size:
      netperf command: netperf -4 -H <ip> -l 30 -t TCP_RR -- -r1,1
      Latency without busy-poll: ~16.25 us
      Latency with busy-poll   : ~08.79 us
      
      Based on original work by Kumar Sanghvi <kumaras@chelsio.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a336cb1
  2. 28 1月, 2015 1 次提交
  3. 27 1月, 2015 4 次提交
  4. 25 1月, 2015 2 次提交
  5. 16 1月, 2015 1 次提交
  6. 09 1月, 2015 3 次提交
  7. 13 12月, 2014 1 次提交
  8. 11 12月, 2014 1 次提交
  9. 10 12月, 2014 3 次提交
  10. 23 11月, 2014 1 次提交
  11. 11 11月, 2014 2 次提交
  12. 15 10月, 2014 1 次提交
  13. 10 10月, 2014 1 次提交
  14. 29 9月, 2014 2 次提交
  15. 22 8月, 2014 1 次提交
  16. 08 8月, 2014 2 次提交
  17. 05 8月, 2014 1 次提交
  18. 16 7月, 2014 1 次提交
    • H
      cxgb4/iw_cxgb4: use firmware ord/ird resource limits · 4c2c5763
      Hariprasad Shenai 提交于
      Advertise a larger max read queue depth for qps, and gather the resource limits
      from fw and use them to avoid exhaustinq all the resources.
      
      Design:
      
      cxgb4:
      
      Obtain the max_ordird_qp and max_ird_adapter device params from FW
      at init time and pass them up to the ULDs when they attach.  If these
      parameters are not available, due to older firmware, then hard-code
      the values based on the known values for older firmware.
      iw_cxgb4:
      
      Fix the c4iw_query_device() to report these correct values based on
      adapter parameters.  ibv_query_device() will always return:
      
      max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
      max_res_rd_atom = max_ird_adapter
      
      Bump up the per qp max module option to 32, allowing it to be increased
      by the user up to the device max of max_ordird_qp.  32 seems to be
      sufficient to maximize throughput for streaming read benchmarks.
      
      Fail connection setup if the negotiated IRD exhausts the available
      adapter ird resources.  So the driver will track the amount of ird
      resource in use and not send an RI_WR/INIT to FW that would reduce the
      available ird resources below zero.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2c5763
  19. 02 7月, 2014 2 次提交
  20. 23 6月, 2014 2 次提交
  21. 11 6月, 2014 1 次提交
    • H
      iw_cxgb4: Allocate and use IQs specifically for indirect interrupts · cf38be6d
      Hariprasad Shenai 提交于
      Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
      RXQs, which also handle direct interrupts for offload CPLs during RDMA
      connection setup/teardown.  The intended T4 usage model, however, is to
      have indirect interrupts flow through dedicated IQs. IE not to mix
      indirect interrupts with CPL messages in an IQ.  This patch adds the
      concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
      LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
      flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
      CIQs.
      
      Design:
      
      cxgb4 creates and exports an array of CIQs for the RDMA ULD.  These IQs
      are sized according to the max available CQs available at adapter init.
      In addition, these IQs don't need FL buffers since they only service
      indirect interrupts.  One CIQ is setup per RX channel similar to the
      RDMA RXQs.
      
      iw_cxgb4 will utilize these CIQs based on the vector value passed into
      create_cq().  The num_comp_vectors advertised by iw_cxgb4 will be the
      number of CIQs configured, and thus the vector value will be the index
      into the array of CIQs.
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf38be6d
  22. 15 3月, 2014 1 次提交
    • S
      cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Steve Wise 提交于
      The current logic suffers from a slow response time to disable user DB
      usage, and also fails to avoid DB FIFO drops under heavy load. This commit
      fixes these deficiencies and makes the avoidance logic more optimal.
      This is done by more efficiently notifying the ULDs of potential DB
      problems, and implements a smoother flow control algorithm in iw_cxgb4,
      which is the ULD that puts the most load on the DB fifo.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB fifo when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each qp to indicate DB writes are disabled, we create
      a device-global status page that each user process maps.  This allows
      iw_cxgb4 to only set this single bit to disable all DB writes for all
      user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
      doesn't support this, then we fall back to the old approach of marking
      each QP.  Thus we allow the new driver to work with an older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
      via the status page and transition the DB state to STOPPED.  As user
      processes see that DB writes are disabled, they call into iw_cxgb4
      to submit their DB write events.  Since the DB state is in STOPPED,
      the QP trying to write gets enqueued on a new DB "flow control" list.
      As subsequent DB writes are submitted for this flow controlled QP, the
      amount of writes are accumulated for each QP on the flow control list.
      So all the user QPs that are actively ringing the DB get put on this
      list and the number of writes they request are accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
      context, we change the DB state to FLOW_CONTROL, and begin resuming all
      the QPs that are on the flow control list.  This logic runs on until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to
      a DB DROP upcall, for example).  QPs are removed from this list, and
      their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
      called chunks in the code, are removed at one time. The chunk size is 64.
      So 64 QPs are resumed at a time, and before the next chunk is resumed, the
      logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
      quickly and overflowing the FIFO.  Once the flow control list is empty,
      the db state transitions back to NORMAL and user QPs are again allowed
      to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens though, we
      resume to normal DB writes directly by user applications.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05eb2389
  23. 14 3月, 2014 2 次提交
  24. 19 2月, 2014 3 次提交