1. 04 January 2020, 2 commits
  2. 24 November 2019, 1 commit
    • RDMA/odp: Use mmu_interval_notifier_insert() · f25a546e
      Committed by Jason Gunthorpe
      Replace the internal interval tree based mmu notifier with the new common
      mmu_interval_notifier_insert() API. This removes a lot of code and fixes a
      deadlock that can be triggered in ODP:
      
       zap_page_range()
        mmu_notifier_invalidate_range_start()
         [..]
          ib_umem_notifier_invalidate_range_start()
             down_read(&per_mm->umem_rwsem)
        unmap_single_vma()
          [..]
            __split_huge_page_pmd()
              mmu_notifier_invalidate_range_start()
              [..]
                 ib_umem_notifier_invalidate_range_start()
                    down_read(&per_mm->umem_rwsem)   // DEADLOCK
      
              mmu_notifier_invalidate_range_end()
                 up_read(&per_mm->umem_rwsem)
        mmu_notifier_invalidate_range_end()
           up_read(&per_mm->umem_rwsem)
      
      The umem_rwsem is held across the range_start/end as the ODP algorithm for
      invalidate_range_end cannot tolerate changes to the interval
      tree. However, due to the nested invalidation regions the second
      down_read() can deadlock if there are competing writers. The new core code
      provides an alternative scheme to solve this problem.
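      
      A rough sketch of the new pattern (mmu_interval_notifier_insert() and
      its ops struct are the core API named above; the my_odp_* wrapper, its
      lock, and the unmap step are illustrative assumptions, not code from
      this commit):
      
       #include <linux/mmu_notifier.h>
      
       /* hypothetical driver object embedding the interval notifier */
       struct my_odp_umem {
               struct mmu_interval_notifier notifier;
               struct mutex umem_mutex;
       };
      
       static bool my_odp_invalidate(struct mmu_interval_notifier *mni,
                                     const struct mmu_notifier_range *range,
                                     unsigned long cur_seq)
       {
               struct my_odp_umem *umem =
                       container_of(mni, struct my_odp_umem, notifier);
      
               if (!mmu_notifier_range_blockable(range))
                       return false;   /* the core retries in a blockable context */
      
               mutex_lock(&umem->umem_mutex);
               mmu_interval_set_seq(mni, cur_seq); /* must be set under the driver lock */
               /* ... unmap DMA pages overlapping [range->start, range->end) ... */
               mutex_unlock(&umem->umem_mutex);
               return true;
       }
      
       static const struct mmu_interval_notifier_ops my_odp_ops = {
               .invalidate = my_odp_invalidate,
       };
      
       /* registration replaces the old per-mm interval tree + umem_rwsem:
        * err = mmu_interval_notifier_insert(&umem->notifier, mm, start,
        *                                    length, &my_odp_ops);
        */
      
      Since no semaphore is held across range_start/end anymore, the nested
      invalidation above can no longer self-deadlock; collisions with the
      fault path are detected through the sequence count instead.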
      
      Fixes: ca748c39 ("RDMA/umem: Get rid of per_mm->notifier_count")
      Link: https://lore.kernel.org/r/20191112202231.3856-6-jgg@ziepe.ca
      Tested-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f25a546e
  3. 29 October 2019, 1 commit
  4. 05 October 2019, 1 commit
    • RDMA/odp: Lift umem_mutex out of ib_umem_odp_unmap_dma_pages() · 9dc775e7
      Committed by Jason Gunthorpe
      This fixes a race of the form:
           CPU0                                 CPU1
       mlx5_ib_invalidate_range()           mlx5_ib_invalidate_range()
                                              // This one actually makes npages == 0
                                              ib_umem_odp_unmap_dma_pages()
                                              if (npages == 0 && !dying)
         // This one does nothing
         ib_umem_odp_unmap_dma_pages()
         if (npages == 0 && !dying)
            dying = 1;
                                                 dying = 1;
                                                 schedule_work(&umem_odp->work);
            // Double schedule of the same work
            schedule_work(&umem_odp->work);  // BOOM
      
      npages and dying must be read and written under the umem_mutex lock.
      
      Since whenever ib_umem_odp_unmap_dma_pages() is called, mlx5 must also
      call mlx5_ib_update_xlt, and both need to be done in the same locking
      region, hoist the lock out of unmap.
      
      This avoids an expensive double critical section in
      mlx5_ib_invalidate_range().
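      
      A minimal sketch of the resulting call-site pattern (paraphrased from
      the description above; the variable names and exact checks are
      assumptions):
      
       	/* caller, e.g. mlx5_ib_invalidate_range(): hold umem_mutex across
       	 * the unmap, the npages/dying checks, and the work scheduling, so
       	 * only one racing invalidation can observe npages == 0 && !dying */
       	mutex_lock(&umem_odp->umem_mutex);
       	ib_umem_odp_unmap_dma_pages(umem_odp, start, end);
       	if (!umem_odp->npages && !umem_odp->dying) {
       		umem_odp->dying = 1;
       		schedule_work(&umem_odp->work);	/* scheduled exactly once */
       	}
       	mutex_unlock(&umem_odp->umem_mutex);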
      
      Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
      Link: https://lore.kernel.org/r/20191001153821.23621-4-jgg@ziepe.ca
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      9dc775e7
  5. 14 September 2019, 1 commit
  6. 22 August 2019, 12 commits
  7. 08 August 2019, 1 commit
    • IB/mlx5: Fix implicit MR release flow · f591822c
      Committed by Yishai Hadas
      Once an implicit MR is about to be released by
      ib_umem_notifier_release(), its leaves are marked as "dying".
      
      However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
      called, it skips running mr_leaf_free_action (i.e. umem_odp->work) for
      leaves that were already marked as "dying".
      
      As a result, ib_umem_release() is never called for those leaves, and
      their MRs are leaked as well.
      
      This flow can be hit when an application exits or is killed without
      calling dereg_mr().
      
      This fatal scenario is reported by a WARN_ON() in
      mlx5_ib_dealloc_ucontext(), as ibcontext->per_mm_list is not empty; the
      call trace is shown below.
      
      The "dying" mark in ib_umem_notifier_release() was originally
      introduced to prevent pagefault_mr() from returning a success response
      once this happened. However, we now have the completion mechanism, so
      the mark is no longer needed in those flows. Even if a success response
      were returned, the firmware would not find the pages and the following
      call would return an error, since a released mm causes
      ib_umem_odp_map_dma_pages() to permanently fail mmget_not_zero().
      
      Fix the issue by dropping the "dying" check from the above flows. The
      other flows that use "dying" still need it for their synchronization
      purposes.
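      
      The shape of the bug, reconstructed from the description above (treat
      the function body as an assumption rather than the exact upstream
      code):
      
       static int mr_leaf_free(struct ib_umem_odp *umem_odp, u64 start,
       			u64 end, void *cookie)
       {
       	struct mlx5_ib_mr *imr = cookie;
      
       	ib_umem_odp_unmap_dma_pages(umem_odp, ib_umem_start(umem_odp),
       				    ib_umem_end(umem_odp));
      
       	if (umem_odp->dying)	/* the check this fix drops: it meant   */
       		return 0;	/* umem_odp->work never ran for the leaf */
      
       	umem_odp->dying = 1;
       	atomic_inc(&imr->num_leaf_free);
       	schedule_work(&umem_odp->work);	/* mr_leaf_free_action */
       	return 0;
       }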
      
         WARNING: CPU: 1 PID: 7218 at
         drivers/infiniband/hw/mlx5/main.c:2004
      		  mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
         CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
      	       5.2.0-rc6+ #13
         Call Trace:
         uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
         ib_uverbs_close+0x1f/0x80 [ib_uverbs]
         __fput+0xbe/0x250
         task_work_run+0x88/0xa0
         do_exit+0x2cb/0xc30
         ? __fput+0x14b/0x250
         do_group_exit+0x39/0xb0
         get_signal+0x191/0x920
         ? _raw_spin_unlock_bh+0xa/0x20
         ? inet_csk_accept+0x229/0x2f0
         do_signal+0x36/0x5e0
         ? put_unused_fd+0x5b/0x70
         ? __sys_accept4+0x1a6/0x1e0
         ? inet_hash+0x35/0x40
         ? release_sock+0x43/0x90
         ? _raw_spin_unlock_bh+0xa/0x20
         ? inet_listen+0x9f/0x120
         exit_to_usermode_loop+0x5c/0xc6
         do_syscall_64+0x182/0x1b0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 81713d37 ("IB/mlx5: Add implicit MR support")
      Link: https://lore.kernel.org/r/20190805083010.21777-1-leon@kernel.org
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f591822c
  8. 21 June 2019, 1 commit
  9. 19 June 2019, 1 commit
  10. 28 May 2019, 1 commit
  11. 22 May 2019, 1 commit
  12. 15 May 2019, 1 commit
    • mm/mmu_notifier: convert user range->blockable to helper function · dfcd6660
      Committed by Jérôme Glisse
      Use the mmu_notifier_range_blockable() helper function instead of directly
      dereferencing the range->blockable field.  This is done to make it easier
      to change the mmu_notifier range field.
      
      This patch is the outcome of the following coccinelle patch:
      
      %<-------------------------------------------------------------------
      @@
      identifier I1, FN;
      @@
      FN(..., struct mmu_notifier_range *I1, ...) {
      <...
      -I1->blockable
      +mmu_notifier_range_blockable(I1)
      ...>
      }
      ------------------------------------------------------------------->%
      
      spatch --in-place --sp-file blockable.spatch --dir .
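      
      For illustration, the transformation this produces in a typical
      callback (the my_* names are hypothetical):
      
       static int my_invalidate_range_start(struct mmu_notifier *mn,
       				     const struct mmu_notifier_range *range)
       {
       	/* was: if (!range->blockable) */
       	if (!mmu_notifier_range_blockable(range))
       		return -EAGAIN;	/* atomic context: let the caller retry */
      
       	/* ... sleepable invalidation work for [range->start, range->end) ... */
       	return 0;
       }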
      
      Link: http://lkml.kernel.org/r/20190326164747.24405-3-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dfcd6660
  13. 07 May 2019, 1 commit
  14. 09 April 2019, 1 commit
  15. 27 March 2019, 1 commit
  16. 07 March 2019, 1 commit
  17. 05 March 2019, 1 commit
  18. 22 February 2019, 1 commit
  19. 26 January 2019, 1 commit
    • RDMA/umem: Add missing initialization of owning_mm · a2093dd3
      Committed by Artemy Kovalyov
      When allocating a umem leaf for an implicit ODP MR during a page
      fault, the owning_mm field was not set.
      
      Initialize this field and take a reference on it to avoid a kernel
      panic when the field is accessed.
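      
      A sketch of the initialization the fix adds when the leaf umem is
      built (the odp_data/per_mm names follow that era's ib_umem code, but
      the exact placement is an assumption):
      
       	/* the implicit-ODP leaf inherits its parent's mm; grab a
       	 * reference so umem->owning_mm stays valid for the later
       	 * mmget_not_zero()/mmdrop() calls in the page-fault path */
       	odp_data->umem.owning_mm = per_mm->mm;
       	mmgrab(per_mm->mm);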
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
       PGD 800000022dfed067 P4D 800000022dfed067 PUD 22dfcf067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 634 Comm: kworker/u33:0 Not tainted 4.20.0-rc6+ #89
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
       RIP: 0010:ib_umem_odp_map_dma_pages+0xf3/0x710 [ib_core]
       Code: 45 c0 48 21 f3 48 89 75 b0 31 f6 4a 8d 04 33 48 89 45 a8 49 8b 44 24 60 48 8b 78 10 e8 66 16 a8 c5 49 8b 54 24 08 48 89 45 98 <8b> 42 58 85 c0 0f 84 8e 05 00 00 8d 48 01 48 8d 72 58 f0 0f b1 4a
       RSP: 0000:ffffb610813a7c20 EFLAGS: 00010202
       RAX: ffff95ace6e8ac80 RBX: 0000000000000000 RCX: 000000000000000c
       RDX: 0000000000000000 RSI: 0000000000000850 RDI: ffff95aceaadae80
       RBP: ffffb610813a7ce0 R08: 0000000000000000 R09: 0000000000080c77
       R10: ffff95acfffdbd00 R11: 0000000000000000 R12: ffff95aceaa20a00
       R13: 0000000000001000 R14: 0000000000001000 R15: 000000000000000c
       FS:  0000000000000000(0000) GS:ffff95acf7800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000058 CR3: 000000022c834001 CR4: 00000000001606f0
       Call Trace:
        pagefault_single_data_segment+0x1df/0xc60 [mlx5_ib]
        mlx5_ib_eqe_pf_action+0x7bc/0xa70 [mlx5_ib]
        ? __switch_to+0xe1/0x470
        process_one_work+0x174/0x390
        worker_thread+0x4f/0x3e0
        kthread+0x102/0x140
        ? drain_workqueue+0x130/0x130
        ? kthread_stop+0x110/0x110
        ret_from_fork+0x1f/0x30
      
      Fixes: f27a0d50 ("RDMA/umem: Use umem->owning_mm inside ODP")
      Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      a2093dd3
  20. 25 January 2019, 2 commits
  21. 29 December 2018, 1 commit
    • mm/mmu_notifier: use structure for invalidate_range_start/end callback · 5d6527a7
      Committed by Jérôme Glisse
      Patch series "mmu notifier contextual informations", v2.
      
      This patchset adds contextual information (why an invalidation is
      happening) to the mmu notifier callbacks. This is necessary for mmu
      notifier users that wish to maintain their own data structures without
      having to add new fields to struct vm_area_struct (vma).
      
      For instance, a device can have its own page table that mirrors the
      process address space. When a vma is unmapped (munmap() syscall), the
      device driver can free the device page table for that range.
      
      Today we have no information on why an mmu notifier callback is
      happening, so a device driver has to assume that it is always a
      munmap(). This is inefficient, as it means the driver must re-allocate
      the device page table on the next page fault and rebuild the whole
      device-driver data structure for the range.
      
      Other use cases besides munmap() also exist. For instance, it is
      pointless for a device driver to invalidate the device page table when
      the invalidation is only for soft-dirty tracking, and a driver can
      optimize away an mprotect() that merely changes the page table access
      permissions for the range.
      
      This patchset enables all of these optimizations for device drivers. I
      do not include any of them in this series, but another patchset I am
      posting will leverage this.
      
      The patchset is pretty simple from a code point of view.  The first two
      patches consolidate all mmu notifier arguments into a struct so that it is
      easier to add/change arguments.  The last patch adds the contextual
      information (munmap, protection, soft dirty, clear, ...).
      
      This patch (of 3):
      
      To avoid having to change many callback definitions every time we want
      to add a parameter, use a structure to group all the parameters for
      the mmu_notifier invalidate_range_start/end callbacks. No functional
      changes with this patch.
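      
      Concretely, the consolidated shape after this patch looks roughly like
      the following (fields as introduced at this point in the series; the
      later patches add the event/context information):
      
       struct mmu_notifier_range {
       	struct mm_struct *mm;
       	unsigned long start;
       	unsigned long end;
       	bool blockable;
       };
      
       /* in struct mmu_notifier_ops, the separate arguments
        * (mn, mm, start, end, blockable) collapse into one struct: */
       int (*invalidate_range_start)(struct mmu_notifier *mn,
       			      const struct mmu_notifier_range *range);
       void (*invalidate_range_end)(struct mmu_notifier *mn,
       			     const struct mmu_notifier_range *range);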
      
      [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
      Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Jason Gunthorpe <jgg@mellanox.com>	[infiniband]
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d6527a7
  22. 27 November 2018, 1 commit
  23. 13 November 2018, 1 commit
  24. 21 September 2018, 4 commits