1. 29 Oct, 2015: 1 commit
  2. 22 Oct, 2015: 1 commit
  3. 08 Oct, 2015: 1 commit
      IB: split struct ib_send_wr · e622f2f4
      Committed by Christoph Hellwig
      This patch splits up struct ib_send_wr so that all non-trivial verbs
      use their own structure which embeds struct ib_send_wr.  This dramatically
      shrinks the size of a WR for the most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: Haggai Eran <haggaie@mellanox.com>
      Tested-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
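      The embedding pattern described above can be illustrated with a small userspace C sketch. The `sketch_*` structs and helper are hypothetical, heavily simplified stand-ins, not the real `ib_send_wr`/`ib_rdma_wr` layouts, so they do not reproduce the sizes listed in the commit. `container_of` is re-implemented with `offsetof` because the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's container_of(): recover a pointer to
 * the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified opcodes and layouts (only a few fields shown). */
enum sketch_wr_opcode { SKETCH_WR_SEND, SKETCH_WR_RDMA_WRITE };

/* Generic header shared by every verb: stays small. */
struct sketch_send_wr {
	struct sketch_send_wr *next;
	uint64_t wr_id;
	int num_sge;
	enum sketch_wr_opcode opcode;
};

/* RDMA-specific fields live in a wrapper that embeds the generic WR. */
struct sketch_rdma_wr {
	struct sketch_send_wr wr;
	uint64_t remote_addr;
	uint32_t rkey;
};

/* Downcast helper: only valid when the opcode says this is an RDMA WR. */
static inline struct sketch_rdma_wr *sketch_rdma_wr(struct sketch_send_wr *wr)
{
	assert(wr->opcode == SKETCH_WR_RDMA_WRITE);
	return container_of(wr, struct sketch_rdma_wr, wr);
}

/* A "driver" that only receives the generic WR pointer and dispatches
 * on the opcode before touching verb-specific fields. */
static uint64_t sketch_post_send(struct sketch_send_wr *wr)
{
	if (wr->opcode == SKETCH_WR_RDMA_WRITE)
		return sketch_rdma_wr(wr)->remote_addr;
	return 0;
}
```

      Because a plain SEND never carries the RDMA fields, only WRs that need them pay for the larger wrapper.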
  4. 12 Jun, 2015: 1 commit
  5. 05 May, 2015: 1 commit
  6. 16 Jan, 2015: 4 commits
  7. 16 Dec, 2014: 1 commit
  8. 11 Nov, 2014: 1 commit
  9. 02 Aug, 2014: 1 commit
      RDMA/cxgb4: Only call CQ completion handler if it is armed · 678ea9b5
      Committed by Steve Wise
      The function __flush_qp() always calls the ULP's CQ completion handler
      functions even if the CQ was not armed.  This can crash the system if
      the function pointer is NULL. The iSER ULP behaves this way: it has no
      completion handler and never arms the CQ for notification.  So now we
      track whether the CQ is armed at flush time and only call the
      completion handlers if their CQs were armed.
      
      Also, if the RCQ and SCQ are the same CQ, the completion handler is
      getting called twice.  It should only be called once after all SQ and
      RQ WRs are flushed from the QP.  So rearrange the logic to fix this.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
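      The two fixes can be sketched with a hypothetical, single-threaded CQ model. The completion handler runs only if the CQ was armed, and a CQ shared by the SQ and RQ is notified once, after both queues have been flushed. All `sketch_*` names are illustrative assumptions, not the iw_cxgb4 code, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_cq;
typedef void (*sketch_comp_handler)(struct sketch_cq *cq, void *ctx);

/* Hypothetical CQ: the handler may be NULL (as with iSER). */
struct sketch_cq {
	sketch_comp_handler comp_handler;
	void *cq_context;
	bool armed;        /* set by req_notify, consumed by one notification */
	int pending_cqes;  /* flush CQEs queued so far */
};

/* Arming requests exactly one future completion callback. */
static void sketch_req_notify_cq(struct sketch_cq *cq)
{
	cq->armed = true;
}

/* Returns whether the CQ was armed, and disarms it. */
static bool sketch_test_and_disarm(struct sketch_cq *cq)
{
	bool was_armed = cq->armed;

	cq->armed = false;
	return was_armed;
}

/* Call the handler only for an armed CQ (NULL check kept as a safety net). */
static void sketch_notify(struct sketch_cq *cq)
{
	if (sketch_test_and_disarm(cq) && cq->comp_handler)
		cq->comp_handler(cq, cq->cq_context);
}

/* Flush both queues first, then notify each distinct CQ at most once. */
static void sketch_flush_qp(struct sketch_cq *rcq, struct sketch_cq *scq,
			    int rq_flushed, int sq_flushed)
{
	rcq->pending_cqes += rq_flushed;
	scq->pending_cqes += sq_flushed;

	sketch_notify(rcq);
	if (scq != rcq)
		sketch_notify(scq);
}

/* Example handler that counts invocations through its context pointer. */
static void sketch_count_handler(struct sketch_cq *cq, void *ctx)
{
	(void)cq;
	++*(int *)ctx;
}
```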
  10. 22 Jul, 2014: 2 commits
  11. 16 Jul, 2014: 3 commits
      cxgb4/iw_cxgb4: work request logging feature · 7730b4c7
      Committed by Hariprasad Shenai
      This commit enhances the iWARP driver to optionally keep a log of RDMA
      work request timing data for kernel mode QPs.  If the iw_cxgb4 module option
      c4iw_wr_log is set to non-zero, each work request is tracked and timing
      data is maintained in a rolling log that is 4096 entries deep by default.
      Module option c4iw_wr_log_size_order allows specifying a log2 size to use
      instead of the default order of 12 (4096 entries). Both module options
      are read-only and must be passed in at module load time to set them, e.g.:
      
      modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10
      
      The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
      Writing anything to this file will clear all the timing data.
      Data tracked includes:
      
      - The host time when the work request was posted, just before ringing
      the doorbell, and the host time when the completion was polled by the
      application (which is also when the log entry is created).  The delta
      of these two times is how long it took to process the work request.
      
      - The qid of the EQ used to post the work request.
      
      - The work request opcode.
      
      - The cqe wr_id field.  For SQ completions this is the swsqe
      index.  For recv completions this is the MSN of the ingress SEND.
      This value can be used to match entries from this log with firmware
      flowc event entries.
      
      - The sge timestamp value just before ringing the doorbell when
      posting, the sge timestamp value just after polling the completion,
      and CQE.timestamp field from the completion itself.  With these three
      timestamps we can track the latency from post to poll, and the amount
      of time the completion resided in the CQ before being reaped by the
      application.  With debug firmware, the sge timestamp is also logged by
      firmware in its flowc history so that we can compute the latency from
      posting the work request until the firmware sees it.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
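      A rolling log of 2^order entries, as described above, can be sketched as a power-of-two ring buffer that overwrites its oldest entry. The entry layout keeps only a subset of the listed fields, and all `sketch_*` names are hypothetical. The real driver also needs an atomic index and debugfs plumbing, which this single-threaded sketch omits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified entry: a subset of the fields in the commit. */
struct sketch_wr_log_entry {
	uint64_t post_host_ns;  /* host time just before ringing the doorbell */
	uint64_t poll_host_ns;  /* host time when the completion was polled */
	uint16_t qid;
	uint8_t opcode;
	uint32_t wr_id;
};

struct sketch_wr_log {
	struct sketch_wr_log_entry *entries;
	unsigned int size;      /* always 1u << size_order */
	unsigned int idx;       /* next slot to (over)write */
};

/* Allocate 2^size_order entries (order 12 gives the default 4096). */
static int sketch_wr_log_init(struct sketch_wr_log *log, unsigned int size_order)
{
	log->size = 1u << size_order;
	log->idx = 0;
	log->entries = calloc(log->size, sizeof(*log->entries));
	return log->entries ? 0 : -1;
}

/* Record one entry; once full, the oldest entry is overwritten. */
static void sketch_wr_log_add(struct sketch_wr_log *log,
			      const struct sketch_wr_log_entry *e)
{
	log->entries[log->idx] = *e;
	/* Power-of-two size lets a mask replace the modulo. */
	log->idx = (log->idx + 1) & (log->size - 1);
}

/* Post-to-poll latency of one entry. */
static uint64_t sketch_wr_latency(const struct sketch_wr_log_entry *e)
{
	assert(e->poll_host_ns >= e->post_host_ns);
	return e->poll_host_ns - e->post_host_ns;
}

/* Models "writing anything to wr_log clears all the timing data". */
static void sketch_wr_log_clear(struct sketch_wr_log *log)
{
	memset(log->entries, 0, log->size * sizeof(*log->entries));
	log->idx = 0;
}

static void sketch_wr_log_free(struct sketch_wr_log *log)
{
	free(log->entries);
	log->entries = NULL;
}
```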
      cxgb4/iw_cxgb4: use firmware ord/ird resource limits · 4c2c5763
      Committed by Hariprasad Shenai
      Advertise a larger max read queue depth for QPs, and gather the resource limits
      from FW and use them to avoid exhausting all the resources.
      
      Design:
      
      cxgb4:
      
      Obtain the max_ordird_qp and max_ird_adapter device params from FW
      at init time and pass them up to the ULDs when they attach.  If these
      parameters are not available, due to older firmware, then hard-code
      the values based on the known values for older firmware.

      iw_cxgb4:
      
      Fix the c4iw_query_device() to report these correct values based on
      adapter parameters.  ibv_query_device() will always return:
      
      max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
      max_res_rd_atom = max_ird_adapter
      
      Bump up the per qp max module option to 32, allowing it to be increased
      by the user up to the device max of max_ordird_qp.  32 seems to be
      sufficient to maximize throughput for streaming read benchmarks.
      
      Fail connection setup if the negotiated IRD exhausts the available
      adapter IRD resources.  To do this, the driver tracks the amount of IRD
      resource in use and will not send an RI_WR/INIT to FW that would reduce
      the available IRD resources below zero.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
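      The two rules above can be sketched as follows: report `min(module_max, max_ordird_qp)` as the per-QP read depth, and refuse a connection whose IRD would drive the adapter-wide pool below zero. The adapter struct and `sketch_*` functions are hypothetical, and locking around `avail_ird` is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical adapter state; field names echo the commit message. */
struct sketch_adapter {
	unsigned int max_ordird_qp;   /* per-QP limit obtained from firmware */
	unsigned int max_ird_adapter; /* adapter-wide IRD pool from firmware */
	unsigned int avail_ird;       /* IRD not yet committed to connections */
};

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Value reported for both max_qp_rd_atom and max_qp_init_rd_atom. */
static unsigned int sketch_max_qp_rd_atom(const struct sketch_adapter *ad,
					  unsigned int module_max)
{
	return sketch_min(module_max, ad->max_ordird_qp);
}

/* Reserve IRD for a new connection; fail instead of going below zero. */
static bool sketch_alloc_ird(struct sketch_adapter *ad, unsigned int ird)
{
	if (ird > ad->avail_ird)
		return false;  /* caller fails connection setup */
	ad->avail_ird -= ird;
	return true;
}

/* Return IRD to the pool when a connection goes away. */
static void sketch_free_ird(struct sketch_adapter *ad, unsigned int ird)
{
	ad->avail_ird += ird;
	assert(ad->avail_ird <= ad->max_ird_adapter);
}
```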
      iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21
      Committed by Hariprasad Shenai
      Updates iw_cxgb4 to determine the Ingress Padding Boundary from
      cxgb4_lld_info and act on it accordingly.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 29 Apr, 2014: 2 commits
      RDMA/cxgb4: Only allow kernel db ringing for T4 devs · c2f9da92
      Committed by Steve Wise
      The doorbell drop avoidance logic applies only to T4, so we must not
      allow it to be enabled for T5 devices.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      RDMA/cxgb4: Fix endpoint mutex deadlocks · cc18b939
      Committed by Steve Wise
      In cases where the cm calls c4iw_modify_rc_qp() with the endpoint
      mutex held, they must be called with internal == 1.  rx_data() and
      process_mpa_reply() are not doing this.  This causes a deadlock
      because c4iw_modify_rc_qp() might call c4iw_ep_disconnect() in some
      !internal cases, and c4iw_ep_disconnect() acquires the endpoint mutex.
      The design was intended to only do the disconnect for !internal calls.
      
      Change rx_data(), FPDU_MODE case, to call c4iw_modify_rc_qp() with
      internal == 1, and then disconnect only after releasing the mutex.
      
      Change process_mpa_reply() to call c4iw_modify_rc_qp(TERMINATE) with
      internal == 1 and set a new attr flag telling it to send a TERMINATE
      message.  Previously this was implied by !internal.
      
      Change process_mpa_reply() to return whether the caller should
      disconnect after releasing the endpoint mutex.  Now rx_data() will do
      the disconnect in the cases where process_mpa_reply() wants to
      disconnect after the TERMINATE is sent.
      
      Change c4iw_modify_rc_qp() RTS->TERM to only disconnect if !internal,
      and to send a TERMINATE message if attrs->send_term is 1.
      
      Change abort_connection() to not acquire the ep mutex for setting the
      state, and make all calls to abort_connection() do so with the mutex
      held.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
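      The deadlock-avoidance pattern can be modeled roughly as follows. Under the endpoint mutex, the QP is modified with `internal == 1`, and the callee only reports whether a disconnect is needed. The caller disconnects after unlocking. A non-recursive mutex is simulated with a flag and an assert, so re-locking (the deadlock) trips the assert instead of hanging and no threading library is needed. All `sketch_*` names are hypothetical stand-ins for rx_data(), process_mpa_reply(), c4iw_modify_rc_qp() and c4iw_ep_disconnect().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a non-recursive mutex: locking it twice would deadlock,
 * so the sketch asserts instead of blocking. */
struct sketch_mutex { bool held; };

static void sketch_lock(struct sketch_mutex *m) { assert(!m->held); m->held = true; }
static void sketch_unlock(struct sketch_mutex *m) { assert(m->held); m->held = false; }

struct sketch_ep {
	struct sketch_mutex mutex;
	bool terminate_sent;
	bool disconnected;
};

/* Disconnect acquires the endpoint mutex itself. */
static void sketch_ep_disconnect(struct sketch_ep *ep)
{
	sketch_lock(&ep->mutex);
	ep->disconnected = true;
	sketch_unlock(&ep->mutex);
}

/* RTS->TERM: only !internal callers disconnect here; TERMINATE is sent
 * when explicitly requested via send_term. */
static void sketch_modify_qp_term(struct sketch_ep *ep, bool internal, bool send_term)
{
	if (send_term)
		ep->terminate_sent = true;
	if (!internal)
		sketch_ep_disconnect(ep);
}

/* Called with ep->mutex held; returns whether the caller should disconnect. */
static bool sketch_process_mpa_reply(struct sketch_ep *ep, bool bad_reply)
{
	assert(ep->mutex.held);
	if (!bad_reply)
		return false;
	sketch_modify_qp_term(ep, true, true);  /* internal == 1 */
	return true;
}

static void sketch_rx_data(struct sketch_ep *ep, bool bad_reply)
{
	bool disconnect;

	sketch_lock(&ep->mutex);
	disconnect = sketch_process_mpa_reply(ep, bad_reply);
	sketch_unlock(&ep->mutex);

	/* Safe now: the endpoint mutex is no longer held. */
	if (disconnect)
		sketch_ep_disconnect(ep);
}
```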
  13. 12 Apr, 2014: 5 commits
  14. 21 Mar, 2014: 2 commits
  15. 15 Mar, 2014: 1 commit
      cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Committed by Steve Wise
      The current logic suffers from a slow response time to disable user DB
      usage, and also fails to avoid DB FIFO drops under heavy load. This commit
      fixes these deficiencies and makes the avoidance logic more effective.
      It does so by notifying the ULDs of potential DB problems more efficiently,
      and by implementing a smoother flow control algorithm in iw_cxgb4,
      which is the ULD that puts the most load on the DB FIFO.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB fifo when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each qp to indicate DB writes are disabled, we create
      a device-global status page that each user process maps.  This allows
      iw_cxgb4 to only set this single bit to disable all DB writes for all
      user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
      doesn't support this, then we fall back to the old approach of marking
      each QP.  Thus we allow the new driver to work with an older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
      via the status page and transition the DB state to STOPPED.  As user
      processes see that DB writes are disabled, they call into iw_cxgb4
      to submit their DB write events.  Since the DB state is in STOPPED,
      the QP trying to write gets enqueued on a new DB "flow control" list.
      As subsequent DB writes are submitted for this flow controlled QP, the
      amount of writes are accumulated for each QP on the flow control list.
      So all the user QPs that are actively ringing the DB get put on this
      list and the number of writes they request are accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
      context, we change the DB state to FLOW_CONTROL, and begin resuming all
      the QPs that are on the flow control list.  This logic runs on until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to
      a DB DROP upcall, for example).  QPs are removed from this list, and
      their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
      called chunks in the code, are removed at one time. The chunk size is 64.
      So 64 QPs are resumed at a time, and before the next chunk is resumed, the
      logic waits (blocks) for the DB FIFO to drain.  This prevents resuming too
      quickly and overflowing the FIFO.  Once the flow control list is empty,
      the db state transitions back to NORMAL and user QPs are again allowed
      to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens, we
      return to normal DB writes performed directly by user applications.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
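      The NORMAL/STOPPED/FLOW_CONTROL state machine described above can be sketched in a single-threaded model. A DB FULL upcall sets a shared "DB off" bit. Ringing QPs are queued once each while their write counts accumulate. A DB EMPTY upcall resumes queued QPs in chunks of 64, with one accumulated write each. The chunk size of 64 comes from the commit. The fixed-size array (instead of a linked list), the counter standing in for the blocking FIFO-drain wait, and all `sketch_*` names are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_db_state { SKETCH_DB_NORMAL, SKETCH_DB_STOPPED, SKETCH_DB_FLOW_CONTROL };

#define SKETCH_MAX_QPS  256
#define SKETCH_DB_CHUNK 64   /* QPs resumed per chunk */

struct sketch_qp {
	int id;
	unsigned int pidx_inc;  /* DB writes accumulated while flow controlled */
	int on_fc_list;
};

struct sketch_dev {
	enum sketch_db_state state;
	int status_db_off;      /* the bit in the shared status page */
	struct sketch_qp *fc_list[SKETCH_MAX_QPS];
	size_t fc_len;
	unsigned long fifo_writes;  /* DB writes actually issued */
	unsigned long fifo_waits;   /* times we waited for the FIFO to drain */
};

/* DB FULL upcall: one bit disables user DB writes for every QP. */
static void sketch_db_full(struct sketch_dev *dev)
{
	dev->status_db_off = 1;
	dev->state = SKETCH_DB_STOPPED;
}

/* Ring `inc` WQEs; returns 1 if written directly, 0 if flow controlled. */
static int sketch_ring_db(struct sketch_dev *dev, struct sketch_qp *qp, unsigned int inc)
{
	if (!dev->status_db_off) {
		dev->fifo_writes++;
		return 1;
	}
	if (!qp->on_fc_list) {
		assert(dev->fc_len < SKETCH_MAX_QPS);
		dev->fc_list[dev->fc_len++] = qp;
		qp->on_fc_list = 1;
	}
	qp->pidx_inc += inc;
	return 0;
}

/* Resume up to one chunk: a single write per QP with its accumulated count. */
static void sketch_resume_chunk(struct sketch_dev *dev)
{
	size_t n = dev->fc_len < SKETCH_DB_CHUNK ? dev->fc_len : SKETCH_DB_CHUNK;

	for (size_t i = 0; i < n; i++) {
		dev->fifo_writes++;
		dev->fc_list[i]->pidx_inc = 0;
		dev->fc_list[i]->on_fc_list = 0;
	}
	memmove(dev->fc_list, dev->fc_list + n,
		(dev->fc_len - n) * sizeof(dev->fc_list[0]));
	dev->fc_len -= n;
}

/* DB EMPTY upcall: drain the list chunk by chunk, then back to NORMAL.
 * In the real driver a DB DROP upcall could leave FLOW_CONTROL mid-loop. */
static void sketch_db_empty(struct sketch_dev *dev)
{
	dev->state = SKETCH_DB_FLOW_CONTROL;
	while (dev->fc_len && dev->state == SKETCH_DB_FLOW_CONTROL) {
		sketch_resume_chunk(dev);
		if (dev->fc_len)
			dev->fifo_waits++;  /* real code blocks until the FIFO drains */
	}
	if (dev->state == SKETCH_DB_FLOW_CONTROL) {
		dev->state = SKETCH_DB_NORMAL;
		dev->status_db_off = 0;
	}
}
```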
  16. 14 Aug, 2013: 2 commits
      RDMA/cxgb4: Fix QP flush logic · 1cf24dce
      Committed by Steve Wise
      This patch makes the following fixes to the QP flush logic:
      
      - correctly flushes unsignaled WRs followed by a signaled WR
      - supports flushing a CQ bound to multiple QPs
      - resets cidx_flush if an active queue starts getting HW CQEs again
      - marks WQ in error when we leave RTS. This was only being done for
        user queues, but we need it for kernel queues too so that
        post_send/post_recv will start returning the appropriate error
        synchronously
      - eats unsignaled read resp CQEs. HW always inserts CQEs so we must
        silently discard them if the read work request was unsignaled.
      - handles QP flushes with pending SW CQEs. The flush and out-of-order
        completion logic had a bug: if out-of-order completions were flushed
        but not yet polled by the consumer, and the QP was then flushed,
        duplicate completions were inserted.
      - c4iw_flush_sq() should only flush wrs that have not already been
        flushed.  Since we already track where in the SQ we've flushed via
        sq.cidx_flush, just start at that point and flush any remaining.
        This bug only caused a problem in the presence of unsignaled work
        requests.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Vipul Pandya <vipul@chelsio.com>
      
      [ Fixed sparse warning due to htonl/ntohl confusion.  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
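      The last fix (start flushing at sq.cidx_flush) can be illustrated with a toy SQ ring in which `cidx_flush` marks the first WR not yet flushed. Flushing twice therefore never generates duplicate flush CQEs. This is a hypothetical sketch: the ring size and `sketch_*` names are assumptions, and it ignores signaled/unsignaled tracking and real CQE contents.

```c
#include <assert.h>

#define SKETCH_SQ_SIZE 8

/* Hypothetical SQ ring state. */
struct sketch_sq {
	unsigned int pidx;          /* next slot to post into */
	unsigned int cidx_flush;    /* first slot not yet flushed */
	unsigned int flushed_cqes;  /* flush CQEs generated so far */
};

/* Post one WR; one slot stays empty so pidx == cidx_flush means
 * "nothing left to flush". */
static void sketch_post_wr(struct sketch_sq *sq)
{
	assert((sq->pidx + 1) % SKETCH_SQ_SIZE != sq->cidx_flush);
	sq->pidx = (sq->pidx + 1) % SKETCH_SQ_SIZE;
}

/* Flush only WRs in [cidx_flush, pidx), advancing cidx_flush as we go,
 * so WRs that were already flushed are never flushed again. */
static unsigned int sketch_flush_sq(struct sketch_sq *sq)
{
	unsigned int n = 0;

	while (sq->cidx_flush != sq->pidx) {
		sq->flushed_cqes++;  /* insert a flush CQE for this WR */
		sq->cidx_flush = (sq->cidx_flush + 1) % SKETCH_SQ_SIZE;
		n++;
	}
	return n;
}
```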
  17. 31 Jul, 2013: 1 commit
  18. 17 Apr, 2013: 1 commit
      RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled · 5b0c2759
      Committed by Thadeu Lima de Souza Cascardo
      Commit c079c287 ("RDMA/cxgb4: Fix error handling in create_qp()")
      broke SQ allocation.  Instead of falling back to host allocation when
      on-chip allocation fails, it tries to allocate both.  And when it
      does, and we try to free the address from the genpool using the host
      address, we hit a BUG and the system crashes as below.
      
      We create a new function that has the previous behavior and properly
      propagate the error, as intended.
      
          kernel BUG at /usr/src/packages/BUILD/kernel-ppc64-3.0.68/linux-3.0/lib/genalloc.c:340!
          Oops: Exception in kernel mode, sig: 5 [#1]
          SMP NR_CPUS=1024 NUMA pSeries
          Modules linked in: rdma_ucm rdma_cm ib_addr ib_cm iw_cm ib_sa ib_mad ib_uverbs iw_cxgb4 ib_core ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables fuse loop dm_mod ipv6 ipv6_lib sr_mod cdrom ibmveth(X) cxgb4 sg ext3 jbd mbcache sd_mod crc_t10dif scsi_dh_emc scsi_dh_hp_sw scsi_dh_alua scsi_dh_rdac scsi_dh ibmvscsic(X) scsi_transport_srp scsi_tgt scsi_mod
          Supported: Yes
          NIP: c00000000037d41c LR: d000000003913824 CTR: c00000000037d3b0
          REGS: c0000001f350ae50 TRAP: 0700   Tainted: G            X  (3.0.68-0.9-ppc64)
          MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 24042482  XER: 00000001
          TASK = c0000001f6f2a840[3616] 'rping' THREAD: c0000001f3508000 CPU: 0
          GPR00: c0000001f6e875c8 c0000001f350b0d0 c000000000fc9690 c0000001f6e875c0
          GPR04: 00000000000c0000 0000000000010000 0000000000000000 c0000000009d482a
          GPR08: 000000006a170000 0000000000100000 c0000001f350b140 c0000001f6e875c8
          GPR12: d000000003915dd0 c000000003f40000 000000003e3ecfa8 c0000001f350bea0
          GPR16: c0000001f350bcd0 00000000003c0000 0000000000040100 c0000001f6e74a80
          GPR20: d00000000399a898 c0000001f6e74ac8 c0000001fad91600 c0000001f6e74ab0
          GPR24: c0000001f7d23f80 0000000000000000 0000000000000002 000000006a170000
          GPR28: 000000000000000c c0000001f584c8d0 d000000003925180 c0000001f6e875c8
          NIP [c00000000037d41c] .gen_pool_free+0x6c/0xf8
          LR [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4]
          Call Trace:
          [c0000001f350b0d0] [c0000001f350b180] 0xc0000001f350b180 (unreliable)
          [c0000001f350b170] [d000000003913824] .c4iw_ocqp_pool_free+0x8c/0xd8 [iw_cxgb4]
          [c0000001f350b210] [d00000000390fd70] .dealloc_sq+0x90/0xb0 [iw_cxgb4]
          [c0000001f350b280] [d00000000390fe08] .destroy_qp+0x78/0xf8 [iw_cxgb4]
          [c0000001f350b310] [d000000003912738] .c4iw_destroy_qp+0x208/0x2d0 [iw_cxgb4]
          [c0000001f350b460] [d000000003861874] .ib_destroy_qp+0x5c/0x130 [ib_core]
          [c0000001f350b510] [d0000000039911bc] .ib_uverbs_cleanup_ucontext+0x174/0x4f8 [ib_uverbs]
          [c0000001f350b5f0] [d000000003991568] .ib_uverbs_close+0x28/0x70 [ib_uverbs]
          [c0000001f350b670] [c0000000001e7b2c] .__fput+0xdc/0x278
          [c0000001f350b720] [c0000000001a9590] .remove_vma+0x68/0xd8
          [c0000001f350b7b0] [c0000000001a9720] .exit_mmap+0x120/0x160
          [c0000001f350b8d0] [c0000000000af330] .mmput+0x80/0x160
          [c0000001f350b960] [c0000000000b5d0c] .exit_mm+0x1ac/0x1e8
          [c0000001f350ba10] [c0000000000b8154] .do_exit+0x1b4/0x4b8
          [c0000001f350bad0] [c0000000000b84b0] .do_group_exit+0x58/0xf8
          [c0000001f350bb60] [c0000000000ce9f4] .get_signal_to_deliver+0x2f4/0x5d0
          [c0000001f350bc60] [c000000000017ee4] .do_signal_pending+0x6c/0x3e0
          [c0000001f350bdb0] [c0000000000182cc] .do_signal+0x74/0x78
          [c0000001f350be30] [c000000000009e74] do_work+0x24/0x28
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Emil Goode <emilgoode@gmail.com>
      Cc: <stable@vger.kernel.org>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
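      The restored behavior (try on-chip first, fall back to host memory, and free with the allocator that actually owns the buffer) can be sketched as follows. The single-slot pool is a hypothetical stand-in for the genalloc-backed on-chip pool, and an assert plays the role of the kernel BUG() shown in the trace above. All `sketch_*` names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical SQ memory descriptor: remembers which allocator owns addr. */
struct sketch_sq_mem {
	void *addr;
	bool on_chip;
};

/* Tiny single-slot stand-in for the on-chip queue memory pool. */
static char sketch_ocqp_pool[4096];
static bool sketch_ocqp_used;

static void *sketch_ocqp_alloc(size_t len)
{
	if (sketch_ocqp_used || len > sizeof(sketch_ocqp_pool))
		return NULL;
	sketch_ocqp_used = true;
	return sketch_ocqp_pool;
}

/* Passing a host address here is the bug from the commit; the kernel's
 * genpool hits BUG(), this sketch asserts. */
static void sketch_ocqp_free(void *addr)
{
	assert(addr == (void *)sketch_ocqp_pool && sketch_ocqp_used);
	sketch_ocqp_used = false;
}

/* Try on-chip only if wanted; on failure fall back to host memory,
 * and propagate an error only if both fail. */
static int sketch_alloc_sq(struct sketch_sq_mem *sq, size_t len, bool want_on_chip)
{
	if (want_on_chip) {
		sq->addr = sketch_ocqp_alloc(len);
		if (sq->addr) {
			sq->on_chip = true;
			return 0;
		}
	}
	sq->addr = malloc(len);
	sq->on_chip = false;
	return sq->addr ? 0 : -1;
}

/* Free with the allocator that actually owns the memory. */
static void sketch_dealloc_sq(struct sketch_sq_mem *sq)
{
	if (sq->on_chip)
		sketch_ocqp_free(sq->addr);
	else
		free(sq->addr);
	sq->addr = NULL;
}
```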
  19. 23 Mar, 2013: 1 commit
  20. 14 Mar, 2013: 5 commits
  21. 15 Feb, 2013: 1 commit
  22. 01 Oct, 2012: 1 commit
  23. 06 Sep, 2012: 1 commit