1. 09 Jul, 2019 8 commits
    • RDMA/core: Provide RDMA DIM support for ULPs · da662979
      Yamin Friedman committed
      Added the interface in the infiniband driver that applies the rdma_dim
      adaptive moderation. There is now a special function for allocating an
      ib_cq that uses rdma_dim.
      
      Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
      NVMf between two equal end-hosts with 56 cores across a Mellanox switch
      using null_blk device:
      
      IO READS without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.8GiB/s | 7.7M | 1401  usec               | 2442  usec
      4k       | 7.0GiB/s | 1.8M | 4817  usec               | 6587  usec
      64k      | 10.7GiB/s| 175k | 9896  usec               | 10028 usec
      
      IO WRITES without DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 3.6GiB/s | 7.5M | 1434  usec               | 2474  usec
      4k       | 6.3GiB/s | 1.6M | 938   usec               | 1221  usec
      64k      | 10.7GiB/s| 175k | 8979  usec               | 12780 usec
      
      IO READS with DIM:
      blk size | BW       | IOPS | 99th percentile latency  | 99.99th latency
      512B     | 4GiB/s   | 8.2M | 816    usec              | 889   usec
      4k       | 10.1GiB/s| 2.65M| 3359   usec              | 5080  usec
      64k      | 10.7GiB/s| 175k | 9896   usec              | 10028 usec
      
      IO WRITES with DIM:
      blk size | BW       | IOPS  | 99th percentile latency | 99.99th latency
      512B     | 3.9GiB/s | 8.1M  | 799   usec              | 922   usec
      4k       | 9.6GiB/s | 2.5M  | 717   usec              | 1004  usec
      64k      | 10.7GiB/s| 176k  | 8586  usec              | 12256 usec
      
      The rdma_dim algorithm was designed to measure the effectiveness of
      moderation on the flow in a general way and thus should be appropriate
      for all RDMA storage protocols.
      
      rdma_dim is configured to be the default option based on performance
      improvement seen after extensive tests.
      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      da662979
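The gains in the benchmark tables above can be read off as relative changes. A minimal sketch, using a hypothetical helper `pct_change` that is not part of the patch, with inputs copied from the table rows:

```c
#include <assert.h>

/* Hypothetical helper: relative change from `before` to `after`, in percent.
 * Positive means the value went up, negative means it went down. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For 4k reads, `pct_change(1.8, 2.65)` (IOPS in millions) comes to roughly +47%, and `pct_change(1401, 816)` for the 512B 99th percentile read latency comes to roughly -42%. The 64k rows are essentially unchanged, which is consistent with that block size already being bandwidth-bound at 10.7GiB/s.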
    • linux/dim: Implement RDMA adaptive moderation (DIM) · f4915455
      Yamin Friedman committed
      RDMA DIM implements a different algorithm from net DIM and is based on
      completions, which is how interrupt moderation can be implemented in RDMA.
      
      The algorithm optimizes for number of completions and ratio between
      completions and events. In order to avoid long latencies, the
      implementation performs fast reduction of moderation level when the
      traffic changes.
      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f4915455
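The commit message above describes the algorithm only in outline: it looks at the number of completions and at the ratio of completions to events, and it drops the moderation level quickly when traffic changes. The following is a toy sketch of a decision step with that shape. It is not the code in lib/dim; the names (`toy_sample`, `toy_dim_next_level`), the number of levels, and the "less than half" threshold are all assumptions made for illustration.

```c
#include <assert.h>

#define TOY_MAX_LEVEL 8 /* assumed number of moderation levels minus one */

/* One measurement window: completions processed and events (interrupts) taken. */
struct toy_sample {
    unsigned int completions;
    unsigned int events;
};

/* Pick the next moderation level from the previous and current windows. */
static int toy_dim_next_level(int level, struct toy_sample prev,
                              struct toy_sample curr)
{
    /* Compare completions-per-event by cross-multiplying, so that a window
     * with zero events does not cause a division by zero. */
    unsigned long long curr_cpe =
        (unsigned long long)curr.completions * prev.events;
    unsigned long long prev_cpe =
        (unsigned long long)prev.completions * curr.events;

    /* Fast reduction: traffic fell below half, so drop straight to level 0
     * instead of stepping down one level at a time (assumed threshold). */
    if ((unsigned long long)curr.completions * 2 < prev.completions)
        return 0;

    /* More completions, or more completions per event: moderate harder. */
    if (curr.completions > prev.completions || curr_cpe > prev_cpe)
        return level < TOY_MAX_LEVEL ? level + 1 : TOY_MAX_LEVEL;

    /* Mild slowdown: back off by a single level. */
    if (curr.completions < prev.completions)
        return level > 0 ? level - 1 : 0;

    return level;
}
```

The point of the first branch is the latency concern the message raises: when load falls away sharply, a high moderation level would delay the few remaining completions, so the sketch resets rather than decays.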
    • Merge tag 'blk-dim-v2' into rdma.git for-next · 2ef38e38
      Jason Gunthorpe committed
      Generic DIM
      
      From: Tal Gilboa and Yamin Friedman
      
      Implement net DIM over a generic DIM library, add RDMA DIM
      
      dim.h lib exposes an implementation of the DIM algorithm for
      dynamically-tuned interrupt moderation for networking interfaces.
      
      We want a similar functionality for other protocols, which might need to
      optimize interrupts differently. Main motivation here is DIM for NVMf
      storage protocol.
      
      Current DIM implementation prioritizes reducing interrupt overhead over
      latency. Also, in order to reduce DIM's own overhead, the algorithm might
      take some time to identify it needs to change profiles. While this is
      acceptable for networking, it might not work well on other scenarios.
      
      Here we propose a new structure to DIM. The idea is to allow a slightly
      modified functionality without the risk of breaking Net DIM behavior for
      netdev. We verified there are no degradations in current DIM behavior with
      the modified solution.
      
      Suggested solution:
      - Common logic is implemented in lib/dim/dim.c
      - Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
        the common logic in dim.c
      - Any new DIM logic will be implemented in "lib/dim/new_dim.c".
        This new implementation will expose modified versions of profiles,
        dim_step() and dim_decision().
      - DIM API is declared in include/linux/dim.h for all implementations.
      
      Pros for this solution are:
      - Zero impact on existing net_dim implementation and usage
      - Relatively more code reuse (compared to two separate solutions)
      - Increased extensibility
      
      Required for dependencies in the next series.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      2ef38e38
    • IB/mlx5: Report correctly tag matching rendezvous capability · 89705e92
      Danit Goldberg committed
      Userspace expects the IB_TM_CAP_RC bit to indicate that the device
      supports RC transport tag matching with rendezvous offload. However the
      firmware splits this into two capabilities for eager and rendezvous tag
      matching.
      
      Only if the FW supports both modes should userspace be told the tag
      matching capability is available.
      
      Cc: <stable@vger.kernel.org> # 4.13
      Fixes: eb761894 ("IB/mlx5: Fill XRQ capabilities")
      Signed-off-by: Danit Goldberg <danitg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      89705e92
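The rule in the message above (report the capability to userspace only when the firmware supports both eager and rendezvous tag matching) amounts to a mask test. A minimal sketch follows; the flag names and bit positions are hypothetical and do not correspond to the real mlx5 capability fields or to the value of IB_TM_CAP_RC.

```c
#include <assert.h>

/* Hypothetical firmware capability bits (not the real mlx5 layout). */
#define TOY_FW_TM_EAGER (1u << 0)
#define TOY_FW_TM_RNDV  (1u << 1)

/* Hypothetical stand-in for the flag userspace sees. */
#define TOY_USER_TM_CAP_RC (1u << 0)

/* Report the userspace capability only if BOTH firmware bits are set. */
static unsigned int toy_user_tm_flags(unsigned int fw_caps)
{
    const unsigned int both = TOY_FW_TM_EAGER | TOY_FW_TM_RNDV;

    return (fw_caps & both) == both ? TOY_USER_TM_CAP_RC : 0;
}
```

Comparing the masked value against `both`, rather than testing it for non-zero, is what keeps an eager-only device from advertising rendezvous offload.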
    • docs: infiniband: add it to the driver-api bookset · a3a400da
      Mauro Carvalho Chehab committed
      While this contains some uAPI stuff, it was intended to be read by a
      kernel doc. So, let's not move it to a different dir, but, instead, just
      add it to the driver-api bookset.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      a3a400da
    • Merge branch 'vhca-tunnel' into rdma.git for-next · 20893d9d
      Jason Gunthorpe committed
      Max Gurtovoy says:
      
      ====================
      These two patches introduce the VHCA tunnel mechanism to the DEVX
      interface, needed for the Bluefield SoC. See the extensive commit messages
      for more information.
      ====================
      
      Based on the mlx5-next branch from
      git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux for
      dependencies
      
      * branch 'vhca-tunnel':
        IB/mlx5: Implement VHCA tunnel mechanism in DEVX
        net/mlx5: Introduce VHCA tunnel device capability
      20893d9d
    • IB/mlx5: Implement VHCA tunnel mechanism in DEVX · b6142608
      Max Gurtovoy committed
      This mechanism will allow function-A to perform operations "on behalf" of
      function-B via tunnel object. Function-A will have privileges for creating
      and using this tunnel object.
      
      For example, with the device emulation capability introduced in the
      Bluefield-1 SoC, one can present an NVMe function to the host OS.
      
      Since the NVMe function doesn't have a normal command interface to the HCA
      HW, there is a need to create a channel that will be able to issue commands
      "on behalf" of this function.
      
      This channel is the VHCA_TUNNEL general object. The emulation software
      will create this tunnel for every managed function and issue commands via
      the DEVX general command interface using the appropriate tunnel ID. When a
      DEVX context receives a command with a non-zero vhca_tunnel_id, it will
      pass the command as-is down to the HCA.
      
      All validation, security and resource tracking of the commands and the
      created tunneled objects are the responsibility of the HCA FW. When a
      VHCA_TUNNEL object is destroyed, the device will issue an internal
      FLR (function level reset) to the emulated function associated with this
      tunnel. This will destroy all the resources created using the tunnel
      mechanism.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      b6142608
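The routing rule described in the message above (a command carrying a non-zero vhca_tunnel_id is passed down as-is) can be sketched as a single check. Everything here is hypothetical: the struct, the 16-bit width of the ID, and the assumption that a zero ID means the command takes the ordinary DEVX handling path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of inspecting a command header. */
enum toy_route {
    TOY_ROUTE_LOCAL,  /* assumed: normal DEVX handling for this function */
    TOY_ROUTE_TUNNEL  /* forwarded unchanged on behalf of another function */
};

/* Hypothetical command header; the field width is an assumption. */
struct toy_cmd {
    uint16_t vhca_tunnel_id;
};

/* A non-zero tunnel ID selects the pass-through path. */
static enum toy_route toy_devx_route(const struct toy_cmd *cmd)
{
    return cmd->vhca_tunnel_id != 0 ? TOY_ROUTE_TUNNEL : TOY_ROUTE_LOCAL;
}
```

In this reading the driver does no per-command checking on the tunnel path, which matches the message's statement that validation and resource tracking for tunneled commands are left to the HCA firmware.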
    • RDMA/rvt: Do not use a kernel header in the ABI · f10ff380
      Jason Gunthorpe committed
      rvt was using ib_sge as part of its ABI, which is not allowed. Introduce
      a new struct with the same layout and use it instead.
      
      Fixes: dabac6e4 ("IB/hfi1: Move receive work queue struct into uapi directory")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f10ff380
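The fix described above relies on the new ABI struct having the same layout as the kernel-internal one. One way to guard that property is with compile-time assertions, sketched here in plain C11. The struct names are stand-ins, and the field names and types are assumptions for illustration rather than a copy of the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel-internal struct (ib_sge in the commit text). */
struct toy_kernel_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* Hypothetical ABI-facing copy: same members, same order, fixed-width types. */
struct toy_uapi_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* The build fails if the two layouts ever drift apart. */
_Static_assert(sizeof(struct toy_uapi_sge) == sizeof(struct toy_kernel_sge),
               "size mismatch");
_Static_assert(offsetof(struct toy_uapi_sge, addr) ==
               offsetof(struct toy_kernel_sge, addr), "addr offset mismatch");
_Static_assert(offsetof(struct toy_uapi_sge, length) ==
               offsetof(struct toy_kernel_sge, length), "length offset mismatch");
_Static_assert(offsetof(struct toy_uapi_sge, lkey) ==
               offsetof(struct toy_kernel_sge, lkey), "lkey offset mismatch");
```

Keeping a separate struct means the internal type can change later without silently changing what userspace sees, while the assertions flag the moment the two stop being interchangeable.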
  2. 08 Jul, 2019 1 commit
    • RDMA/siw: Fix DEFINE_PER_CPU compilation when ARCH_NEEDS_WEAK_PER_CPU · 4c7d6dcd
      Jason Gunthorpe committed
      The initializer for the variable cannot be inside the macro (and zero
      initialization isn't needed anyhow).
      
      include/linux/percpu-defs.h:92:33: warning: '__pcpu_unique_use_cnt' initialized and declared 'extern'
        extern __PCPU_DUMMY_ATTRS char __pcpu_unique_##name;  \
                                       ^~~~~~~~~~~~~~
      include/linux/percpu-defs.h:115:2: note: in expansion of macro 'DEFINE_PER_CPU_SECTION'
        DEFINE_PER_CPU_SECTION(type, name, "")
        ^~~~~~~~~~~~~~~~~~~~~~
      drivers/infiniband/sw/siw/siw_main.c:129:8: note: in expansion of macro 'DEFINE_PER_CPU'
       static DEFINE_PER_CPU(atomic_t, use_cnt = ATOMIC_INIT(0));
              ^~~~~~~~~~~~~~
      
      Also the rules for PER_CPU require the variable names to be globally
      unique, so prefix them with siw_
      
      Fixes: b9be6f18 ("rdma/siw: transmit path")
      Fixes: bdcf26bf ("rdma/siw: network and RDMA core interface")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      4c7d6dcd
  3. 07 Jul, 2019 6 commits
  4. 05 Jul, 2019 25 commits