1. 25 Jun 2020, 1 commit
  2. 24 Jun 2020, 1 commit
  3. 23 Jun 2020, 4 commits
  4. 09 Jun 2020, 1 commit
  5. 03 Jun 2020, 3 commits
  6. 30 May 2020, 2 commits
    • RDMA/core: Introduce shared CQ pool API · c7ff819a
      Authored by Yamin Friedman
      Allow a ULP to ask the core to provide a completion queue based on a
      least-used search of the per-device CQ pools. The device CQ pools grow
      lazily as more CQs are requested.
      
      This feature reduces the number of interrupts when using many QPs.
      Using shared CQs allows for more efficient completion handling. It
      also reduces the overhead needed for CQ contexts.
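
      A minimal sketch of how a ULP might use this pool API (the QP wiring
      is elided; ibdev, nr_cqe and comp_vector_hint stand in for the ULP's
      own device handle and sizing, so treat this as an illustration rather
      than the patched drivers' code):

       struct ib_cq *cq;

       /* Least-used CQ on the hinted vector; the pool grows lazily. */
       cq = ib_cq_pool_get(ibdev, nr_cqe, comp_vector_hint, IB_POLL_SOFTIRQ);
       if (IS_ERR(cq))
               return PTR_ERR(cq);

       /* ... create the QP with send_cq = recv_cq = cq ... */

       /* On teardown, return the CQEs to the pool; the CQ itself stays
        * alive for other users and is only freed with the device.
        */
       ib_cq_pool_put(cq, nr_cqe);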
      
      Test setup:
      Intel(R) Xeon(R) Platinum 8176M CPU @ 2.10GHz servers.
      Running NVMeoF 4KB read IOs over ConnectX-5EX across a Spectrum
      switch. TX-depth = 32. The patch was applied in the nvme driver on
      both the target and initiator. Four controllers are accessed from
      each core. In the current test case we have exposed sixteen NVMe
      namespaces using four different subsystems (four namespaces per
      subsystem) from one NVM port. Each controller allocated X queues
      (RDMA QPs) and attached to Y CQs. Before this series we had X == Y,
      i.e. for four controllers we created a total of 4X QPs and 4X CQs.
      In the shared case, we create 4X QPs but only X CQs, which means
      that four controllers share a completion queue per core. Up to
      fourteen cores there is no significant change in performance, and
      the number of interrupts per second stays below a million in the
      current case.
      ==================================================
      |Cores|Current KIOPs  |Shared KIOPs  |improvement|
      |-----|---------------|--------------|-----------|
      |14   |2332           |2723          |16.7%      |
      |20   |2086           |2712          |30%        |
      |28   |1971           |2669          |35.4%      |
      ==================================================
      |Cores|Current avg lat|Shared avg lat|improvement|
      |-----|---------------|--------------|-----------|
      |14   |767us          |657us         |14.3%      |
      |20   |1225us         |943us         |23%        |
      |28   |1816us         |1341us        |26.1%      |
      ========================================================
      |Cores|Current interrupts|Shared interrupts|improvement|
      |-----|------------------|-----------------|-----------|
      |14   |1.6M/sec          |0.4M/sec         |72%        |
      |20   |2.8M/sec          |0.6M/sec         |72.4%      |
      |28   |2.9M/sec          |0.8M/sec         |63.4%      |
      ====================================================================
      |Cores|Current 99.99th PCTL lat|Shared 99.99th PCTL lat|improvement|
      |-----|------------------------|-----------------------|-----------|
      |14   |67ms                    |6ms                    |90.9%      |
      |20   |5ms                     |6ms                    |-10%       |
      |28   |8.7ms                   |6ms                    |25.9%      |
      ====================================================================
      
      Performance improvement with sixteen disks (sixteen CQs per core) is
      comparable.
      
      Link: https://lore.kernel.org/r/1590568495-101621-3-git-send-email-yaminf@mellanox.com
      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      c7ff819a
    • RDMA/core: Add protection for shared CQs used by ULPs · 3446cbd2
      Authored by Yamin Friedman
      A pre-step for adding shared CQs. Add the infrastructure to prevent
      shared CQ users from altering the CQ configuration. For now all CQs
      are marked as private (non-shared). The core driver should use the
      new force functions to perform the resize/destroy/moderation changes
      that are not allowed for users of shared CQs, as sketched below.
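
      An illustrative sketch of the gating idea (the shared flag follows
      the commit, but the function bodies and the __ib_force_modify_cq
      helper here are assumptions, not the exact kernel code):

       /* User-facing verb: refuse to touch a CQ owned by the shared pool. */
       int ib_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
       {
               if (cq->shared)
                       return -EOPNOTSUPP; /* ULPs must not reconfigure it */

               return __ib_force_modify_cq(cq, cq_count, cq_period);
       }

       /* Core-only path for pool maintenance, bypassing the check. */
       int __ib_force_modify_cq(struct ib_cq *cq, u16 cq_count, u16 cq_period)
       {
               return cq->device->ops.modify_cq(cq, cq_count, cq_period);
       }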
      
      Link: https://lore.kernel.org/r/1590568495-101621-2-git-send-email-yaminf@mellanox.com
      Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      3446cbd2
  7. 22 May 2020, 2 commits
  8. 21 May 2020, 3 commits
  9. 18 May 2020, 1 commit
  10. 07 May 2020, 1 commit
  11. 06 May 2020, 1 commit
  12. 03 May 2020, 3 commits
  13. 21 Feb 2020, 1 commit
  14. 19 Feb 2020, 1 commit
  15. 17 Jan 2020, 3 commits
  16. 16 Jan 2020, 2 commits
  17. 14 Jan 2020, 4 commits
  18. 08 Jan 2020, 4 commits
  19. 13 Dec 2019, 1 commit
  20. 24 Nov 2019, 1 commit
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Authored by Jason Gunthorpe
      Replace the internal interval-tree-based mmu notifier with the new
      common mmu_interval_notifier_insert() API. This removes a lot of code
      and fixes a deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across range_start/end because the ODP
      algorithm for invalidate_range_end cannot tolerate changes to the
      interval tree. However, due to the nested invalidation regions, the
      second down_read() can deadlock if there are competing writers. The
      new core code provides an alternative scheme that solves this
      problem, sketched below.
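
      A minimal sketch of the replacement pattern, assuming a per-umem
      notifier member and an illustrative umem_mutex (the real ODP
      conversion differs in detail):

       static bool ib_umem_odp_invalidate(struct mmu_interval_notifier *mni,
                                          const struct mmu_notifier_range *range,
                                          unsigned long cur_seq)
       {
               if (!mmu_notifier_range_blockable(range))
                       return false;   /* cannot sleep here; caller retries */

               /* Short-lived per-umem lock instead of a semaphore held
                * across start/end, so nested invalidations cannot
                * deadlock against competing writers.
                */
               mutex_lock(&umem_mutex);
               mmu_interval_set_seq(mni, cur_seq); /* force readers to retry */
               /* ... zap the device mappings covered by range ... */
               mutex_unlock(&umem_mutex);
               return true;
       }

       static const struct mmu_interval_notifier_ops ib_umem_odp_ops = {
               .invalidate = ib_umem_odp_invalidate,
       };

       /* Registration covers only this umem's VA range: */
       ret = mmu_interval_notifier_insert(&umem_odp->notifier, mm,
                                          addr, length, &ib_umem_odp_ops);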
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f25a546e