1. 22 7月, 2014 2 次提交
  2. 18 7月, 2014 2 次提交
  3. 16 7月, 2014 5 次提交
    • H
      cxgb4/iw_cxgb4: work request logging feature · 7730b4c7
      Hariprasad Shenai 提交于
      This commit enhances the iwarp driver to optionally keep a log of rdma
      work request timining data for kernel mode QPs.  If iw_cxgb4 module option
      c4iw_wr_log is set to non-zero, each work request is tracked and timing
      data maintained in a rolling log that is 4096 entries deep by default.
      Module option c4iw_wr_log_size_order allows specifing a log2 size to use
      instead of the default order of 12 (4096 entries). Both module options
      are read-only and must be passed in at module load time to set them. IE:
      
      modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10
      
      The timing data is viewable via the iw_cxgb4 debugfs file "wr_log".
      Writing anything to this file will clear all the timing data.
      Data tracked includes:
      
      - The host time when the work request was posted, just before ringing
      the doorbell.  The host time when the completion was polled by the
      application.  This is also the time the log entry is created.  The delta
      of these two times is the amount of time took processing the work request.
      
      - The qid of the EQ used to post the work request.
      
      - The work request opcode.
      
      - The cqe wr_id field.  For sq completions requests this is the swsqe
      index.  For recv completions this is the MSN of the ingress SEND.
      This value can be used to match log entries from this log with firmware
      flowc event entries.
      
      - The sge timestamp value just before ringing the doorbell when
      posting,  the sge timestamp value just after polling the completion,
      and CQE.timestamp field from the completion itself.  With these three
      timestamps we can track the latency from post to poll, and the amount
      of time the completion resided in the CQ before being reaped by the
      application.  With debug firmware, the sge timestamp is also logged by
      firmware in its flowc history so that we can compute the latency from
      posting the work request until the firmware sees it.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7730b4c7
    • H
      cxgb4/iw_cxgb4: display TPTE on errors · 031cf476
      Hariprasad Shenai 提交于
      With ingress WRITE or READ RESPONSE errors, HW provides the offending
      stag from the packet.  This patch adds logic to log the parsed TPTE
      in this case. cxgb4 now exports a function to read a TPTE entry
      from adapter memory.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      031cf476
    • H
      cxgb4/iw_cxgb4: use firmware ord/ird resource limits · 4c2c5763
      Hariprasad Shenai 提交于
      Advertise a larger max read queue depth for qps, and gather the resource limits
      from fw and use them to avoid exhaustinq all the resources.
      
      Design:
      
      cxgb4:
      
      Obtain the max_ordird_qp and max_ird_adapter device params from FW
      at init time and pass them up to the ULDs when they attach.  If these
      parameters are not available, due to older firmware, then hard-code
      the values based on the known values for older firmware.
      iw_cxgb4:
      
      Fix the c4iw_query_device() to report these correct values based on
      adapter parameters.  ibv_query_device() will always return:
      
      max_qp_rd_atom = max_qp_init_rd_atom = min(module_max, max_ordird_qp)
      max_res_rd_atom = max_ird_adapter
      
      Bump up the per qp max module option to 32, allowing it to be increased
      by the user up to the device max of max_ordird_qp.  32 seems to be
      sufficient to maximize throughput for streaming read benchmarks.
      
      Fail connection setup if the negotiated IRD exhausts the available
      adapter ird resources.  So the driver will track the amount of ird
      resource in use and not send an RI_WR/INIT to FW that would reduce the
      available ird resources below zero.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2c5763
    • H
      iw_cxgb4: Detect Ing. Padding Boundary at run-time · 04e10e21
      Hariprasad Shenai 提交于
      Updates iw_cxgb4 to determine the Ingress Padding Boundary from
      cxgb4_lld_info, and take subsequent actions.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04e10e21
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen 提交于
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c835a677
  4. 02 7月, 2014 1 次提交
  5. 11 6月, 2014 5 次提交
  6. 10 6月, 2014 1 次提交
  7. 06 6月, 2014 1 次提交
    • Y
      RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp · b7dfa889
      Yann Droneaud 提交于
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs a padding must be added
      at end of the structures, while it is not required on i386.
      
      So for most ABI struct c4iw_alloc_ucontext_resp gets implicitly padded
      to be aligned on a 8 bytes multiple, while for i386, such padding is
      not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_alloc_ucontext_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -2,9 +2,8 @@ struct c4iw_alloc_ucontext_resp {
                __u64                      status_page_key;      /*     0     8 */
                __u32                      status_page_size;     /*     8     4 */
      
        -       /* size: 12, cachelines: 1, members: 2 */
        -       /* last cacheline: 12 bytes */
        +       /* size: 16, cachelines: 1, members: 2 */
        +       /* padding: 4 */
        +       /* last cacheline: 16 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      When boundary check will be implemented, the x86_64 kernel will refuse
      to write past the i386 userspace provided buffer and the uverbs will
      fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      Additionally, as reported by Dan Carpenter, without the implicit
      padding being properly cleared, an information leak would take place
      in most architectures.
      
      This patch adds an explicit padding to struct c4iw_alloc_ucontext_resp,
      and, like 92b0ca7c ("IB/mlx5: Fix stack info leak in
      mlx5_ib_alloc_ucontext()"), makes function c4iw_alloc_ucontext()
      not writting this padding field to userspace. This way, x86_64 kernel
      will be able to write struct c4iw_alloc_ucontext_resp as expected by
      unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Link: http://marc.info/?i=1395848977.3297.15.camel@localhost.localdomain
      Link: http://marc.info/?i=20140328082428.GH25192@mwanda
      Cc: <stable@vger.kernel.org>
      Fixes: 05eb2389 ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
      Reported-by: NYann Droneaud <ydroneaud@opteya.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Acked-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b7dfa889
  8. 03 6月, 2014 3 次提交
    • J
      IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO · 40f2287b
      Jiri Kosina 提交于
      Modify the various routines used to allocate memory resources which
      serve QPs in mlx4 to get an input GFP directive.  Have the Ethernet
      driver to use GFP_KERNEL in it's QP allocations as done prior to this
      commit, and the IB driver to use GFP_NOIO when the IB verbs
      IB_QP_CREATE_USE_GFP_NOIO QP creation flag is provided.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      40f2287b
    • O
      IB: Return error for unsupported QP creation flags · 60093dc0
      Or Gerlitz 提交于
      Fix the usnic and thw qib drivers to err when QP creation flags that
      they don't understand are provided.
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      60093dc0
    • Y
      IB: Allow build of hw/ and ulp/ subdirectories independently · 729ee4ef
      Yann Droneaud 提交于
      It is not possible to build only the drivers/infiniband/hw/ (or ulp/)
      subdirectory with command such as:
      
          $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/
      
      This fails with following error messages:
      
          make[2]: Nothing to be done for `all'.
          make[2]: Nothing to be done for `relocs'.
            CHK     include/config/kernel.release
            Using /home/ydroneaud/src/linux as source for kernel
            GEN     /home/ydroneaud/src/linux/obj-x86_64/Makefile
            CHK     include/generated/uapi/linux/version.h
            CHK     include/generated/utsrelease.h
            CALL    /home/ydroneaud/src/linux/scripts/checksyscalls.sh
          /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory
          make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'.  Stop.
          make[1]: *** [drivers/infiniband/hw/] Error 2
          make: *** [sub-make] Error 2
      
      This patch creates a Makefile in hw/ and ulp/ and moves each
      corresponding parts of drivers/infiniband/Makefile in the new
      Makefiles.
      
      It should not break build except if some hw/ drivers or ulp/ were
      allowed previously to be built while CONFIG_INFINIBAND is set to 'n',
      but according to drivers/infiniband/Kconfig, it's not possible. So it
      should be safe to apply.
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      729ee4ef
  9. 02 6月, 2014 2 次提交
  10. 30 5月, 2014 6 次提交
    • Y
      RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp · b6f04d3d
      Yann Droneaud 提交于
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs a padding must be added
      at end of the structures, while it is not required on i386.
      
      So for most ABI struct c4iw_create_cq_resp gets implicitly padded
      to be aligned on a 8 bytes multiple, while for i386, such padding
      is not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_create_cq_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
                __u32                      size;                 /*    28     4 */
                __u32                      qid_mask;             /*    32     4 */
      
        -       /* size: 36, cachelines: 1, members: 6 */
        -       /* last cacheline: 36 bytes */
        +       /* size: 40, cachelines: 1, members: 6 */
        +       /* padding: 4 */
        +       /* last cacheline: 40 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      When boundary check will be implemented, the x86_64 kernel will refuse
      to write past the i386 userspace provided buffer and the uverbs will
      fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      This patch adds an explicit padding at end of structure
      c4iw_create_cq_resp, and, like 92b0ca7c ("IB/mlx5: Fix stack info
      leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
      not writting this padding field to userspace. This way, x86_64 kernel
      will be able to write struct c4iw_create_cq_resp as expected by
      unpatched and patched i386 libcxgb4.
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Cc: <stable@vger.kernel.org>
      Fixes: cfdda9d7 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
      Fixes: e24a72a3 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
      Acked-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      b6f04d3d
    • J
      IB/mlx4: Add interface for selecting VFs to enable QP0 via MLX proxy QPs · 65fed8a8
      Jack Morgenstein 提交于
      This commit adds the sysfs interface for enabling QP0 on VFs for
      selected VF/port.
      
      By default, no VFs are enabled for QP0 operation.
      
      To enable QP0 operation on a VF/port, under
      /sys/class/infiniband/mlx4_x/iov/<b:d:f>/ports/x there are two new entries:
      
      - smi_enabled (read-only). Indicates whether smi is currently
        enabled for the indicated VF/port
      
      - enable_smi_admin (rw). Used by the admin to request that smi
        capability be enabled or disabled for the indicated VF/port.
        0 = disable, 1 = enable.
        The requested enablement will occur at the next reset of the
        VF (e.g. driver restart on the VM which owns the VF).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      65fed8a8
    • J
      mlx4: Add infrastructure for selecting VFs to enable QP0 via MLX proxy QPs · 99ec41d0
      Jack Morgenstein 提交于
      This commit adds the infrastructure for enabling selected VFs to
      operate SMI (QP0) MADs without restriction.
      
      Additionally, for these enabled VFs, their QP0 proxy and tunnel QPs
      are MLX QPs.  As such, they operate over VL15.  Therefore, they are
      not affected by "credit" problems or changes in the VLArb table (which
      may shut down VL0).
      
      Non-enabled VFs may only create UD proxy QP0 qps (which are forced by
      the hypervisor to send packets using the q-key it assigns and places
      in the qp-context).  Thus, non-enabled VFs will not pose a security
      risk.  The hypervisor discards any privileged MADs it receives from
      these non-enabled VFs.
      
      By default, all VFs are NOT enabled, and must explicitly be enabled
      by the administrator.
      
      The sysfs interface which operates the VF enablement infrastructure
      is provided in the next commit.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      99ec41d0
    • J
      IB/mlx4: Preparation for VFs to issue/receive SMI (QP0) requests/responses · 97982f5a
      Jack Morgenstein 提交于
      Currently, VFs in SRIOV VFs are denied QP0 access.  The main reason
      for this decision is security, since Subnet Management Datagrams
      (SMPs) are not restricted by network partitioning and may affect the
      physical network topology.  Moreover, even the SM may be denied access
      from portions of the network by setting management keys unknown to the
      SM.
      
      However, it is desirable to grant SMI access to certain privileged
      VFs, so that certain network management activities may be conducted
      within virtual machines instead of the hypervisor.
      
      This commit does the following:
      
      1. Create QP0 tunnel QPs for all VFs.
      
      2. Discard SMI mads sent-from/received-for non-privileged VFs in the
         hypervisor MAD multiplex/demultiplex logic.  SMI mads from/for
         privileged VFs are allowed to pass.
      
      3. MAD_IFC wrapper changes/fixes.  For non-privileged VFs, only
         host-view MAD_IFC commands are allowed, and only for SMI LID-Routed
         GET mads.  For privileged VFs, there are no restrictions.
      
      This commit does not allow privileged VFs as yet.  To determine if a VF
      is privileged, it calls function mlx4_vf_smi_enabled().  This function
      returns 0 unconditionally for now.
      
      The next two commits allow defining and activating privileged VFs.
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      97982f5a
    • J
      IB/mlx4: SET_PORT called by mlx4_ib_modify_port should be wrapped · 61565013
      Jack Morgenstein 提交于
      mlx4_ib_modify_port is invoked in IB for resetting the Q_Key violations
      counters and for modifying the IB port capability flags.
      
      For example, when opensm is started up on the hypervisor,
      mlx4_ib_modify_port is called to set the port's IsSM flag.
      
      In multifunction mode, the SET_PORT command used in this flow should
      be wrapped (so that the PF port capability flags are also tracked,
      thus enabling the aggregate of all the VF/PF capability flags to be
      tracked properly).
      
      The procedure mlx4_SET_PORT() in main.c is also renamed to mlx4_ib_SET_PORT()
      to differentiate it from procedure mlx4_SET_PORT() in port.c.
      mlx4_ib_SET_PORT() is used exclusively by mlx4_ib_modify_port().
      
      Finally, the CM invokes ib_modify_port() to set the IsCMSupported flag
      even when running over RoCE.  Therefore, when RoCE is active,
      mlx4_ib_modify_port should return OK unconditionally (since the
      capability flags and qkey violations counter are not relevant).
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      61565013
    • V
      IB/qib: Additional Intel branding changes · 0a66d2bd
      Vinit Agnihotri 提交于
      This patches changes user visible function names containing "qlogic"
      in module init and cleanup.
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: NVinit Agnihotri <vinit.abhay.agnihotri@intel.com>
      Signed-off-by: NRoland Dreier <roland@purestorage.com>
      0a66d2bd
  11. 29 5月, 2014 4 次提交
  12. 28 5月, 2014 8 次提交