1. 16 Jun, 2017 (1 commit)
    • networking: make skb_put & friends return void pointers · 4df864c1
      Committed by Johannes Berg
      It seems like a historical accident that these return unsigned char *,
      which in many places forces callers to add casts.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
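      As an illustration of the change at a typical call site (the struct and
      function below are hypothetical; skb_put() returning void * is what this
      patch introduces):

          #include <linux/skbuff.h>

          /* hypothetical on-wire header, for illustration only */
          struct foo_hdr {
                  u8 type;
                  u8 len;
          };

          static void foo_fill(struct sk_buff *skb)
          {
                  struct foo_hdr *hdr;

                  /* before: hdr = (struct foo_hdr *)skb_put(skb, sizeof(*hdr));
                   * with skb_put() returning void *, no cast is needed:
                   */
                  hdr = skb_put(skb, sizeof(*hdr));
                  hdr->type = 1;
                  hdr->len = sizeof(*hdr);
          }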
  2. 21 Apr, 2017 (2 commits)
  3. 11 Jan, 2017 (1 commit)
    • iw_cxgb4: refactor sq/rq drain logic · 4fe7c296
      Committed by Steve Wise
      With the addition of the IB/Core drain API, iw_cxgb4 supported drain
      by watching the CQs when the QP was out of RTS and signalling "drain
      complete" when the last CQE is polled.  This, however, doesn't fully
      support the drain semantics. Namely, the drain logic is supposed to signal
      "drain complete" only when the application has _processed_ the last CQE,
      not just removed them from the CQ.  Thus a small timing hole exists that
      can cause touch after free type bugs in applications using the drain API
      (nvmf, iSER, for example).  So iw_cxgb4 needs a better solution.
      
      The iWARP Verbs spec mandates that "_at some point_ after the QP is
      moved to ERROR", the iWARP driver MUST synchronously fail post_send and
      post_recv calls.  Until now, iw_cxgb4 did not allow any posts once the
      QP was in ERROR.  This was in part due to the fact that the HW queues for
      the QP in ERROR state are disabled at this point, so there wasn't much
      else to do but fail the post operation synchronously.  This restriction
      is what drove the first drain implementation in iw_cxgb4 that has the
      above mentioned flaw.
      
      This patch changes iw_cxgb4 to allow post_send and post_recv WRs after
      the QP is moved to ERROR state for kernel mode users, thus still adhering
      to the Verbs spec for user mode users, but allowing flush WRs for kernel
      users.  Since the HW queues are disabled, we just synthesize a CQE for
      this post, queue it to the SW CQ, and then call the CQ event handler.
      This enables proper drain operations for the various storage applications.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
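      A minimal sketch of the approach, with hypothetical types standing in
      for the iw_cxgb4 structures:

          /* Hypothetical stand-ins; not the actual iw_cxgb4 types. */
          enum wc_status { WC_SUCCESS = 0, WC_WR_FLUSH_ERR = 5 };

          struct sw_cqe { unsigned long long wr_id; enum wc_status status; };

          struct sw_cq {
                  struct sw_cqe entries[64];
                  unsigned int pidx;                      /* producer index */
                  void (*event_handler)(struct sw_cq *cq);
          };

          /* Kernel-mode post while the QP is in ERROR: the HW queues are
           * disabled, so synthesize a FLUSHED completion, queue it to the
           * SW CQ, and kick the CQ event handler; drain then completes only
           * once the application actually processes the CQE. */
          static int post_while_in_error(struct sw_cq *cq, unsigned long long wr_id)
          {
                  struct sw_cqe *cqe = &cq->entries[cq->pidx++ % 64];

                  cqe->wr_id = wr_id;
                  cqe->status = WC_WR_FLUSH_ERR;
                  if (cq->event_handler)
                          cq->event_handler(cq);
                  return 0;       /* the post itself succeeds for kernel users */
          }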
  4. 17 Nov, 2016 (1 commit)
  5. 08 Oct, 2016 (1 commit)
    • iw_cxgb4: add fast-path for small REG_MR operations · 49b53a93
      Committed by Steve Wise
      When processing a REG_MR work request, if the FW supports the
      FW_RI_NSMR_TPTE_WR work request, and if the page list for this
      registration is <= 2 pages, and the current state of the MR is INVALID,
      then use FW_RI_NSMR_TPTE_WR to pass down a fully populated TPTE for the
      FW to write.  This avoids the FW having to do an async read of the TPTE,
      which blocks the SQ until the read completes.
      
      To know if the current MR state is INVALID or not, iw_cxgb4 must track the
      state of each fastreg MR.  The c4iw_mr struct state is updated as REG_MR
      and LOCAL_INV WRs are posted and completed, when a reg_mr is destroyed,
      and when RECV completions are processed that include a local invalidation.
      
      This optimization increases small IO IOPS for both iSER and NVMF.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
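      The fast-path decision reduces to a small check; an illustrative sketch
      (FW_RI_NSMR_TPTE_WR is the real WR type, everything else is hypothetical):

          /* Hypothetical sketch of the REG_MR fast-path check. */
          enum mr_state { MR_INVALID, MR_VALID };

          struct hypo_mr {
                  enum mr_state state;     /* tracked across REG_MR/LOCAL_INV */
                  unsigned int npages;     /* pages in this registration */
          };

          /* Use FW_RI_NSMR_TPTE_WR (pass down a fully populated TPTE) only
           * when the FW supports it, the page list fits in 2 pages, and the
           * MR is currently INVALID; otherwise fall back to the normal
           * REG_MR path that requires an async TPTE read by the FW. */
          static int use_tpte_write_fastpath(const struct hypo_mr *mr,
                                             int fw_supports_tpte_wr)
          {
                  return fw_supports_tpte_wr &&
                         mr->npages <= 2 &&
                         mr->state == MR_INVALID;
          }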
  6. 24 Aug, 2016 (1 commit)
  7. 23 Jun, 2016 (1 commit)
  8. 27 Apr, 2016 (1 commit)
    • RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips · 32cc92c7
      Committed by Hariprasad S
      For T4, kernel mode QPs don't use the user doorbell. User mode QPs
      during flow control DB ringing are forced into the kernel, where the
      user doorbell is treated as a kernel doorbell and the proper bar2
      offset in bar2 virtual space is calculated; in the case of T4 this is
      a bogus address, causing a kernel panic due to an illegal write during
      doorbell ringing.
      For T4, the kernel mode QP bar2 virtual address should be 0. Added a
      T4 check to the bar2 virtual address calculation to return 0.  Fixed
      the bar2 range checks to be based on the bar2 physical address.
      
      This fixes the oops below:
      
        <1>BUG: unable to handle kernel paging request at 000000000002aa08
        <1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4>PGD 1416a8067 PUD 15bf35067 PMD 0
        <4>Oops: 0002 [#1] SMP
        <4>last sysfs file:
        /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid
        <4>CPU 5
        <4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs
        ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
        iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack
        ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4
        target_core_iblock target_core_file target_core_pscsi target_core_mod
        configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q
        garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap
        macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev
        serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca
        i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi
        ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror
        dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
        <4>
        Supermicro X8ST3/X8ST3
        <4>RIP: 0010:[<ffffffffa011d800>]  [<ffffffffa011d800>]
        c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4>RSP: 0000:ffff880155a03db0  EFLAGS: 00010006
        <4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180
        <4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8
        <4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001
        <4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
        <4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88
        <4>FS:  0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000
        <4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
        <4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0
        <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        <4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0)
        <4>Stack:
        <4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b
        <4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000
        <4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88
        <4>Call Trace:
        <4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4]
        <4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4]
        <4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4]
        <4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0
        <4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40
        <4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0
        <4> [<ffffffff810a660e>] kthread+0x9e/0xc0
        <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
        <4> [<ffffffff810a6570>] ? kthread+0x0/0xc0
        <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
        <4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85
        71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8
        66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01
        <1>RIP  [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4]
        <4> RSP <ffff880155a03db0>
        <4>CR2: 000000000002aa08
      
      Based on original work by Bharat Potnuri <bharat@chelsio.com>
      
      Fixes: 74217d4c ("iw_cxgb4: support for bar2 qid densities exceeding the page size")
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Reviewed-by: Leon Romanovsky <leon@leon.nu>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
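      A minimal sketch of the fix (types and names are illustrative, not the
      actual c4iw code):

          #include <stddef.h>

          struct hypo_rdev {
                  int is_t4;               /* chip generation check */
                  void *bar2_kva;          /* bar2 virtual mapping (T5+) */
                  unsigned long bar2_len;  /* length of the bar2 window */
          };

          /* Kernel-mode QPs on T4 must get a NULL bar2 virtual address so
           * the kernel doorbell is used instead of a bogus bar2 write. */
          static void *bar2_virt_addr(struct hypo_rdev *rdev, unsigned long offset)
          {
                  if (rdev->is_t4)
                          return NULL;
                  if (offset >= rdev->bar2_len)   /* range check vs. bar2 window */
                          return NULL;
                  return (char *)rdev->bar2_kva + offset;
          }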
  9. 01 Mar, 2016 (1 commit)
  10. 24 Dec, 2015 (1 commit)
  11. 29 Oct, 2015 (1 commit)
  12. 28 Jul, 2015 (1 commit)
  13. 13 Jun, 2015 (1 commit)
  14. 12 Jun, 2015 (1 commit)
  15. 05 May, 2015 (2 commits)
  16. 16 Jan, 2015 (4 commits)
  17. 11 Nov, 2014 (1 commit)
  18. 22 Jul, 2014 (1 commit)
  19. 16 Jul, 2014 (2 commits)
    • cxgb4/iw_cxgb4: work request logging feature · 7730b4c7
      Committed by Hariprasad Shenai
      This commit enhances the iWARP driver to optionally keep a log of RDMA
      work request timing data for kernel mode QPs.  If the iw_cxgb4 module
      option c4iw_wr_log is set to non-zero, each work request is tracked and
      timing data maintained in a rolling log that is 4096 entries deep by
      default.  Module option c4iw_wr_log_size_order allows specifying a log2
      size to use instead of the default order of 12 (4096 entries).  Both
      module options are read-only and must be passed in at module load
      time, e.g.:

          modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10
      
      The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
      Writing anything to this file will clear all the timing data.
      Data tracked includes:
      
      - The host time when the work request was posted, just before ringing
      the doorbell, and the host time when the completion was polled by the
      application (which is also when the log entry is created).  The delta
      of these two times is the time taken to process the work request.
      
      - The qid of the EQ used to post the work request.
      
      - The work request opcode.
      
      - The CQE wr_id field.  For SQ completions this is the swsqe
      index.  For RECV completions this is the MSN of the ingress SEND.
      This value can be used to match log entries from this log with firmware
      flowc event entries.
      
      - The SGE timestamp value just before ringing the doorbell when
      posting, the SGE timestamp value just after polling the completion,
      and the CQE.timestamp field from the completion itself.  With these three
      timestamps we can track the latency from post to poll, and the amount
      of time the completion resided in the CQ before being reaped by the
      application.  With debug firmware, the sge timestamp is also logged by
      firmware in its flowc history so that we can compute the latency from
      posting the work request until the firmware sees it.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
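      The tracked data maps naturally onto one rolling-log entry; a
      hypothetical sketch of its shape (field names are illustrative, not the
      actual c4iw layout):

          struct wr_log_entry {
                  unsigned long long post_host_ts; /* host time at post, pre-doorbell */
                  unsigned long long poll_host_ts; /* host time at completion poll */
                  unsigned long long post_sge_ts;  /* SGE timestamp at post */
                  unsigned long long poll_sge_ts;  /* SGE timestamp at poll */
                  unsigned long long cqe_ts;       /* CQE.timestamp from the CQE */
                  unsigned int qid;                /* EQ qid the WR was posted to */
                  unsigned int opcode;             /* work request opcode */
                  unsigned long long wr_id;        /* swsqe index or ingress SEND MSN */
          };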
    • iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21
      Committed by Hariprasad Shenai
      Updates iw_cxgb4 to determine the Ingress Padding Boundary from
      cxgb4_lld_info, and take subsequent actions.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 11 Jun, 2014 (1 commit)
    • iw_cxgb4: Allocate and use IQs specifically for indirect interrupts · cf38be6d
      Committed by Hariprasad Shenai
      Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA
      RXQs, which also handle direct interrupts for offload CPLs during RDMA
      connection setup/teardown.  The intended T4 usage model, however, is to
      have indirect interrupts flow through dedicated IQs, i.e. not to mix
      indirect interrupts with CPL messages in an IQ.  This patch adds the
      concept of RDMA concentrator IQs, or CIQs, setup and maintained by the
      LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will
      flow through the LLD's RDMA RXQs, and CQ interrupts flow through the
      CIQs.
      
      Design:
      
      cxgb4 creates and exports an array of CIQs for the RDMA ULD.  These IQs
      are sized according to the maximum number of CQs available at adapter
      init.  In addition, these IQs don't need FL buffers since they only
      service indirect interrupts.  One CIQ is set up per RX channel, similar
      to the RDMA RXQs.
      
      iw_cxgb4 will utilize these CIQs based on the vector value passed into
      create_cq().  The num_comp_vectors advertised by iw_cxgb4 will be the
      number of CIQs configured, and thus the vector value will be the index
      into the array of CIQs.
      
      Based on original work by Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
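      A short sketch of the vector-to-CIQ mapping described above (names are
      illustrative; the real CIQ ids come from the LLD):

          /* Hypothetical view of the CIQ array exported by the LLD. */
          struct hypo_lld_info {
                  unsigned int nciq;       /* == num_comp_vectors advertised */
                  unsigned int *ciq_ids;   /* one CIQ per RX channel */
          };

          /* create_cq(): the comp_vector indexes the CIQ array. */
          static unsigned int select_ciq(const struct hypo_lld_info *lldi,
                                         unsigned int comp_vector)
          {
                  return lldi->ciq_ids[comp_vector % lldi->nciq];
          }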
  21. 30 May, 2014 (1 commit)
    • RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp · b6f04d3d
      Committed by Yann Droneaud
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs padding must be added at
      the end of such structures, while it is not required on i386.
      
      So on most ABIs struct c4iw_create_cq_resp gets implicitly padded
      to be aligned on an 8-byte multiple, while for i386 such padding
      is not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_create_cq_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        --- obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        +++ obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
                __u32                      size;                 /*    28     4 */
                __u32                      qid_mask;             /*    32     4 */
      
        -       /* size: 36, cachelines: 1, members: 6 */
        -       /* last cacheline: 36 bytes */
        +       /* size: 40, cachelines: 1, members: 6 */
        +       /* padding: 4 */
        +       /* last cacheline: 40 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      Once boundary checking is implemented, the x86_64 kernel will refuse
      to write past the buffer provided by i386 userspace, and the uverb
      will fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      This patch adds explicit padding at the end of structure
      c4iw_create_cq_resp and, like 92b0ca7c ("IB/mlx5: Fix stack info
      leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
      not write this padding field to userspace.  This way, an x86_64
      kernel will be able to write struct c4iw_create_cq_resp as expected
      by both unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Cc: <stable@vger.kernel.org>
      Fixes: cfdda9d7 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
      Fixes: e24a72a3 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
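      A sketch of the fix; the field list follows the pahole output above and
      the uapi header, but treat the details as illustrative:

          struct c4iw_create_cq_resp_sketch {
                  unsigned long long key;      /* __u64 */
                  unsigned long long gts_key;  /* __u64 */
                  unsigned long long memsize;  /* __u64 */
                  unsigned int cqid;           /* __u32, offset 24 */
                  unsigned int size;           /* __u32, offset 28 */
                  unsigned int qid_mask;       /* __u32, offset 32 */
                  unsigned int reserved;       /* explicit pad: 40 bytes everywhere */
          };

          /* In c4iw_create_cq(), copy everything except the pad so the
           * kernel never writes the final 4 bytes to userspace, e.g.:
           *
           *   ib_copy_to_udata(udata, &uresp,
           *                    sizeof(uresp) - sizeof(uresp.reserved));
           */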
  22. 12 Apr, 2014 (2 commits)
    • RDMA/cxgb4: Use uninitialized_var() · 97df1c67
      Committed by Steve Wise
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cxgb4: SQ flush fix · b4e2901c
      Committed by Steve Wise
      There is a race when moving a QP from RTS->CLOSING where an SQ work
      request could be posted after the FW receives the RDMA_RI/FINI WR.
      The SQ work request will never get processed, and should be completed
      with FLUSHED status.  Function c4iw_flush_sq(), however, was dropping
      the oldest SQ work request when in CLOSING or IDLE states, instead of
      completing the pending work request.  If that oldest pending work
      request was actually complete and had a CQE in the CQ, then when that
      CQE was processed in poll_cq, we'd BUG_ON() due to the inconsistent
      SQ/CQ state.
      
      This is a very small timing hole and has only been hit once so far.
      
      The fix is two-fold:
      
      1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
         status regardless of the QP state.
      
      2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
         before moving the state out of RTS.  This ensures that the state
         transition will not happen while another thread is in
         post_rc_send(), because set_state() and post_rc_send() both acquire
         the qp spinlock.  Also, once we transition the state out of RTS,
         subsequent calls to post_rc_send() will fail because the "in error"
         bit is set.  I don't think this fully closes the race where the FW
         can get a FINI followed by an SQ work request being posted (because
         they are posted to different EQs), but the #1 fix will handle the
         issue by flushing the SQ work request.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
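      Fix #2 as a minimal sketch (hypothetical names; the real logic lives in
      c4iw_modify_rc_qp() and the post path):

          #include <linux/spinlock.h>
          #include <linux/errno.h>

          struct hypo_qp {
                  spinlock_t lock;
                  bool in_error;
                  int state;               /* RTS, CLOSING, ... */
          };

          static void hypo_move_out_of_rts(struct hypo_qp *qp, int new_state)
          {
                  unsigned long flags;

                  spin_lock_irqsave(&qp->lock, flags);
                  qp->in_error = true;     /* set BEFORE leaving RTS */
                  qp->state = new_state;
                  spin_unlock_irqrestore(&qp->lock, flags);
          }

          static int hypo_post_send(struct hypo_qp *qp)
          {
                  unsigned long flags;
                  int ret = 0;

                  spin_lock_irqsave(&qp->lock, flags);
                  if (qp->in_error)
                          ret = -EINVAL;   /* fail synchronously once in error */
                  /* else: build the WR and ring the doorbell under the lock */
                  spin_unlock_irqrestore(&qp->lock, flags);
                  return ret;
          }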
  23. 25 Mar, 2014 (1 commit)
  24. 21 Mar, 2014 (3 commits)
  25. 14 Aug, 2013 (2 commits)
    • RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrap · 27ca34f5
      Committed by Steve Wise
      When determining how many WRs are completed with a signaled CQE,
      correctly deal with queue wraps.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Vipul Pandya <vipul@chelsio.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
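      The wrap handling amounts to modular arithmetic on the circular SQ; an
      illustrative sketch, not the actual c4iw code:

          /* Count WRs completed by a signaled CQE at signaled_idx, where the
           * consumer index cidx may have wrapped past the end of the SQ. */
          static unsigned int wrs_completed(unsigned int cidx,
                                            unsigned int signaled_idx,
                                            unsigned int sq_size)
          {
                  if (signaled_idx >= cidx)
                          return signaled_idx - cidx + 1;
                  /* wrapped: from cidx to the end, then from 0 to signaled_idx */
                  return (sq_size - cidx) + signaled_idx + 1;
          }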
    • RDMA/cxgb4: Fix QP flush logic · 1cf24dce
      Committed by Steve Wise
      This patch makes following fixes in QP flush logic:
      
      - correctly flushes unsignaled WRs followed by a signaled WR
      - supports flushing a CQ bound to multiple QPs
      - resets cidx_flush if an active queue starts getting HW CQEs again
      - marks WQ in error when we leave RTS. This was only being done for
        user queues, but we need it for kernel queues too so that
        post_send/post_recv will start returning the appropriate error
        synchronously
      - eats unsignaled read resp CQEs. HW always inserts CQEs so we must
        silently discard them if the read work request was unsignaled.
      - handles QP flushes with pending SW CQEs. The flush and out-of-order
        completion logic has a bug where, if out-of-order completions are
        flushed but not yet polled by the consumer and the QP is then
        flushed, we end up inserting duplicate completions.
      - c4iw_flush_sq() should only flush wrs that have not already been
        flushed.  Since we already track where in the SQ we've flushed via
        sq.cidx_flush, just start at that point and flush any remaining WRs.
        This bug only caused a problem in the presence of unsignaled work
        requests.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Vipul Pandya <vipul@chelsio.com>
      
      [ Fixed sparse warning due to htonl/ntohl confusion.  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
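      The last fix in the list above reduces to resuming the flush from the
      tracked position; a hypothetical sketch:

          struct hypo_sq {
                  unsigned int cidx_flush;   /* next index not yet flushed */
                  unsigned int pidx;         /* producer index */
                  unsigned int size;
          };

          /* Flush only WRs that have not already been flushed, starting at
           * sq->cidx_flush and stopping at the producer index. */
          static int hypo_flush_sq(struct hypo_sq *sq,
                                   void (*insert_flushed_cqe)(unsigned int idx))
          {
                  int flushed = 0;

                  while (sq->cidx_flush != sq->pidx) {
                          insert_flushed_cqe(sq->cidx_flush);
                          sq->cidx_flush = (sq->cidx_flush + 1) % sq->size;
                          flushed++;
                  }
                  return flushed;
          }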
  26. 29 Nov, 2011 (1 commit)
  27. 01 Nov, 2011 (1 commit)
  28. 15 Oct, 2011 (1 commit)
  29. 18 Jun, 2011 (1 commit)
  30. 29 Sep, 2010 (1 commit)