    RDMA/mlx5: Improve PI handover performance · de0ae958
    Committed by Israel Rukshin
    Under some workloads, performance degrades when a KLM mkey is used
    instead of an MTT mkey. This is because KLM descriptors are accessed
    through indirection, which might require more HW resources and cycles.
    A KLM descriptor is not necessary when there are no gaps in the
    data/metadata sg lists. As an optimization, use an MTT mkey whenever
    possible. To that end, allocate an internal MTT mkey and choose the
    effective pi_mr for each transaction according to the required mapping
    scheme.
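
The selection rule described above can be illustrated with a minimal sketch. This is not the driver's code: `struct sg_ent`, `sg_has_gaps`, `choose_desc`, and the 4 KiB `PAGE_SZ` are hypothetical names and assumptions introduced only for this example. It treats a list as gap-free when every segment except the first starts on a page boundary and every segment except the last ends on one, and falls back to a KLM descriptor otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u /* assumed page size, for illustration only */

/* Hypothetical, simplified sg element (not the kernel's struct scatterlist). */
struct sg_ent {
    uint64_t addr; /* DMA address of the segment */
    uint32_t len;  /* segment length in bytes */
};

enum desc_kind { DESC_MTT, DESC_KLM };

/*
 * Returns true if the list cannot be described as a run of whole pages:
 * a non-first segment starts off a page boundary, or a non-last segment
 * ends off a page boundary.
 */
static bool sg_has_gaps(const struct sg_ent *sg, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && (sg[i].addr % PAGE_SZ) != 0)
            return true;
        if (i + 1 < n && ((sg[i].addr + sg[i].len) % PAGE_SZ) != 0)
            return true;
    }
    return false;
}

/* Prefer the cheaper MTT mapping; use KLM only when either list has gaps. */
static enum desc_kind choose_desc(const struct sg_ent *data, size_t nd,
                                  const struct sg_ent *meta, size_t nm)
{
    if (sg_has_gaps(data, nd) || sg_has_gaps(meta, nm))
        return DESC_KLM;
    return DESC_MTT;
}
```

With two page-aligned 4 KiB data segments and a single metadata segment, `choose_desc` returns `DESC_MTT`; two 512-byte data segments on different pages leave a gap and yield `DESC_KLM`.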
    
    The setup of the tested benchmark (using iSER ULP):
     - 2 servers with 24 cores (1 initiator and 1 target)
     - ConnectX-4/ConnectX-5 adapters
     - 24 target sessions with 1 LUN each
     - ramdisk backstore
     - PI active
    
    Performance results running fio (24 jobs, 128 iodepth) using
    write_generate=1 and read_verify=1 (with patch/without patch/baseline):
    
    bs      IOPS(read)                IOPS(write)
    ----    ----------                ----------
    512   1262.4K/1243.3K/1147.1K    1732.1K/1725.1K/1423.8K
    4k    570902/571233/457874       773982/743293/642080
    32k   72086/72388/71933          96164/71789/93249
    
    Using write_generate=0 and read_verify=0 (with patch/without patch/baseline):
    
    bs      IOPS(read)                IOPS(write)
    ----    ----------                ----------
    512   1600.1K/1572.1K/1393.3K    1830.3K/1823.5K/1557.2K
    4k    937272/921992/762934       815304/753772/646071
    32k   77369/75052/72058          97435/73180/94612
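
To read the slash-separated triples as relative gains, a small helper such as the following can be used. It is only a sketch: `gain_pct` is a hypothetical name, and it assumes the first value in each triple is the patched result and the second the unpatched one, as the table labels suggest.

```c
/* Relative change of the patched result over the unpatched one, in percent. */
static double gain_pct(double with_patch, double without_patch)
{
    return (with_patch - without_patch) / without_patch * 100.0;
}
```

For the 32k write row directly above (97435 vs. 73180 IOPS), `gain_pct(97435, 73180)` evaluates to roughly 33%.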
    
    Signed-off-by: Israel Rukshin <israelr@mellanox.com>
    Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
    Suggested-by: Max Gurtovoy <maxg@mellanox.com>
    Suggested-by: Idan Burstein <idanb@mellanox.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>