1. 07 May, 2019 1 commit
  2. 03 May, 2019 1 commit
    • S
      RDMA/umem: Handle page combining avoidance correctly in ib_umem_add_sg_table() · 7872168a
      Shiraz Saleem authored
      The flag update_cur_sg tracks whether contiguous pages from a new set of
      page_list pages can be merged into the SGE passed into
      ib_umem_add_sg_table(). If this flag is true, but the total segment length
      exceeds the max_seg_size supported by HW, we avoid combining to this SGE
      and move to a new SGE (x) and merge 'len' pages to it. However, if i <
      npages, the next iteration can incorrectly merge 'len' contiguous pages
      into x instead of into a new SGE since update_cur_sg is still true.
      
      Reset update_cur_sg to false always after the check to merge pages into
      the first SGE passed in to ib_umem_add_sg_table().  Also, prevent a new
      SGE's segment length from ever exceeding HW max_seg_sz.
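
The merge rule described above can be modelled in user space. This is a minimal sketch, not the kernel code: `struct seg`, `add_pages()` and `PAGE_SZ` are hypothetical names, pages are represented by bare page frame numbers, and the `update_cur_sg` decision is computed inside the function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Hypothetical stand-in for a scatterlist entry: a run of contiguous pages. */
struct seg {
	unsigned long first_pfn;
	unsigned int length; /* bytes */
};

/*
 * Append npages page frame numbers to segs[], where segs[nsegs - 1] is the
 * segment passed in. Returns the new number of segments in use.
 */
size_t add_pages(struct seg *segs, size_t nsegs, size_t cap,
		 const unsigned long *pfns, size_t npages,
		 unsigned int max_seg_sz)
{
	size_t i = 0;
	/* Only the first run may be merged into the segment passed in. */
	bool update_cur_sg = nsegs > 0 && npages > 0 &&
		segs[nsegs - 1].first_pfn +
		segs[nsegs - 1].length / PAGE_SZ == pfns[0];

	assert(max_seg_sz >= PAGE_SZ);

	while (i != npages) {
		unsigned long first_pfn = pfns[i];
		unsigned int len = 0;

		/* Count contiguous pages, capped so a run never exceeds max_seg_sz. */
		while (i != npages && first_pfn + len == pfns[i] &&
		       len < max_seg_sz / PAGE_SZ) {
			len++;
			i++;
		}

		if (update_cur_sg) {
			struct seg *cur = &segs[nsegs - 1];

			/* Reset unconditionally: later runs must start a new segment. */
			update_cur_sg = false;
			if (max_seg_sz - cur->length >= len * PAGE_SZ) {
				cur->length += len * PAGE_SZ;
				continue;
			}
		}

		assert(nsegs < cap);
		segs[nsegs].first_pfn = first_pfn;
		segs[nsegs].length = len * PAGE_SZ;
		nsegs++;
	}
	return nsegs;
}
```

With a 64K limit, an 8-page segment followed by 24 contiguous pages yields three segments in this model, none longer than the limit.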
      
      This causes a crash on hfi1, where max_seg_sz defaults to 64K.
      Because of the bug above, unfolding the SGEs in __ib_umem_release()
      reaches a bad page pointer.
      
       TEST comp-wfr.perfnative.STL-22166-WDT _ perftest native 2-Write_4097QP_4MB STARTING at 1555387093
       BUG: Bad page state in process ib_write_bw  pfn:7ebca0
       page:ffffcd675faf2800 count:0 mapcount:1 mapping:0000000000000000 index:0x1
       flags: 0x17ffffc0000000()
       raw: 0017ffffc0000000 dead000000000100 dead000000000200 0000000000000000
       raw: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
       page dumped because: nonzero mapcount
       CPU: 18 PID: 15853 Comm: ib_write_bw Tainted: G    B             5.1.0-rc4 #1
       Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
       Call Trace:
        dump_stack+0x5a/0x73
        bad_page+0xf5/0x10f
        free_pcppages_bulk+0x62c/0x680
        free_unref_page+0x54/0x70
        __ib_umem_release+0x148/0x1a0 [ib_uverbs]
        ib_umem_release+0x22/0x80 [ib_uverbs]
        rvt_dereg_mr+0x67/0xb0 [rdmavt]
        ib_dereg_mr_user+0x37/0x60 [ib_core]
        destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
        uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
        uobj_destroy+0x4d/0x60 [ib_uverbs]
        __uobj_get_destroy+0x33/0x50 [ib_uverbs]
        __uobj_perform_destroy+0xa/0x30 [ib_uverbs]
        ib_uverbs_dereg_mr+0x66/0x90 [ib_uverbs]
        ib_uverbs_write+0x3e1/0x500 [ib_uverbs]
        vfs_write+0xad/0x1b0
        ksys_write+0x5a/0xd0
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: d10bcf94 ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
      Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
      Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  3. 09 April, 2019 2 commits
  4. 27 March, 2019 1 commit
  5. 16 February, 2019 1 commit
  6. 08 February, 2019 2 commits
  7. 11 January, 2019 1 commit
  8. 28 September, 2018 1 commit
    • P
      RDMA/core: Acquire and release mmap_sem on page range · 3994586f
      Parav Pandit authored
      Currently mmap_sem is read-locked while memory is pinned.  In a
      multi-threaded process, holding mmap_sem creates contention with
      other threads that may be registering memory, creating QPs or simply
      calling mmap(), since such operations also need the mmap_sem write
      lock.
      
      None of these operations can make forward progress until the memory
      pin operation completes.  This gets worse if the memory is unpinned
      and/or the memory registration is large (in the GB range).
      
      Therefore, instead of holding mmap_sem for too long (for whole region
      pinning), acquire and release the lock for every few pages.  For example
      on x86 with 4K page size, acquire and release mmap_sem for every 2Mbytes
      memory chunk.
      
      This lets competing threads that need mmap_sem only for a short time
      make progress.
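
The chunked locking pattern can be sketched in user space as follows. Everything here is illustrative: `struct toy_lock` is a hypothetical stand-in for mmap_sem that only records how it is used, and the chunk size of one page worth of pointers is an assumption chosen because, with 8-byte pointers and 4K pages, it reproduces the 2 MB figure mentioned above (512 pages).

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u
/* Assumed chunk size: as many pages as pointers fit in one page. */
#define CHUNK_PAGES ((size_t)(PAGE_SZ / sizeof(void *)))

/* Toy lock that records usage instead of blocking (stand-in for mmap_sem). */
struct toy_lock {
	int held;
	size_t acquisitions;
	size_t max_pages_per_hold;
};

static void toy_lock_read(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->acquisitions++;
}

static void toy_unlock_read(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Pin npages pages, taking and dropping the lock once per chunk. */
size_t pin_region(struct toy_lock *l, size_t npages)
{
	size_t done = 0;

	while (done < npages) {
		size_t want = npages - done;

		if (want > CHUNK_PAGES)
			want = CHUNK_PAGES;

		toy_lock_read(l);
		/* A real implementation would pin `want` pages here. */
		if (want > l->max_pages_per_hold)
			l->max_pages_per_hold = want;
		toy_unlock_read(l);

		/* Other threads can take the lock between chunks. */
		done += want;
	}
	return done;
}
```

The point of the design is that the longest single hold is bounded by the chunk size, not by the size of the region being registered.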
      
      When memory registration latency is measured using [1] for memory sizes
      ranging from 4K to 48GB, <= 1% or 0.5% degradation is noticed. In many
      runs no difference is seen other than run-to-run variance.
      
      In other targeted tests of users with large memory, desired improvements
      are seen due to reduced contention of mmap_sem.
      
      [1] https://github.com/paravmellanox/rtool
      
      $ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A
      
      It registers pinned memory from 4K to 48GB size with 500 iterations for
      each memory size.
      
      $ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4
      
      4 competing threads pin memory, each of 12GB size with 500 iterations.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  9. 26 September, 2018 2 commits
  10. 21 September, 2018 4 commits
  11. 14 July, 2018 2 commits
  12. 27 June, 2018 1 commit
  13. 29 May, 2018 1 commit
  14. 16 May, 2018 2 commits
  15. 19 December, 2017 1 commit
  16. 30 November, 2017 1 commit
  17. 02 June, 2017 1 commit
    • Q
      RDMA/core: not to set page dirty bit if it's already set. · 53376fed
      Qing Huang authored
      This change optimizes kernel memory deregistration.
      __ib_umem_release() used to call set_page_dirty_lock() on every
      writable page in its memory region, to keep data synced between the
      CPU and the DMA device if swapping happens after memory
      deregistration. Now the page dirty bit is not set again if the kernel
      already set it before __ib_umem_release() was called. This cut
      memory deregistration time by half or more when we ran an
      application simulation test program.
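
The optimization amounts to one extra check before the expensive call. A minimal user-space sketch, with hypothetical names (`struct toy_page`, `toy_set_page_dirty_lock()`, `release_pages()`) standing in for the kernel's page structure and helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a page: only the dirty bit matters here. */
struct toy_page {
	bool dirty;
};

static size_t slow_calls; /* counts calls to the expensive helper */

/* Stand-in for the expensive dirtying call that takes the page lock. */
static void toy_set_page_dirty_lock(struct toy_page *p)
{
	slow_calls++;
	p->dirty = true;
}

/* Mark writable pages dirty on release, skipping pages already dirty. */
void release_pages(struct toy_page *pages, size_t n, bool writable)
{
	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (writable && !pages[i].dirty)
			toy_set_page_dirty_lock(&pages[i]);
}

size_t slow_call_count(void)
{
	return slow_calls;
}
```

In this model the saving grows with the fraction of pages that are already dirty when the region is released.
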
      Signed-off-by: Qing Huang <qing.huang@oracle.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  18. 26 April, 2017 2 commits
  19. 02 March, 2017 2 commits
  20. 15 February, 2017 1 commit
    • A
      IB/umem: Update on demand page (ODP) support · d07d1d70
      Artemy Kovalyov authored
      Currently an ODP MR can only explicitly register a virtual address
      space area of limited length.
      This change allows an MR to cover the entire process virtual address
      space, dynamically adding and removing translation entries in the
      device MTT.
      
      Add following changes to support implicit MR:
      * Allow umem to be zero size to back-up implicit MR.
      * Add new function ib_alloc_odp_umem() to add virtual memory regions
        to implicit MR dynamically on demand.
      * Add new function rbt_ib_umem_lookup() to find dynamically added
        virtual memory regions.
      * Expose function rbt_ib_umem_for_each_in_range() to other modules and
        make it safe
      Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  21. 25 January, 2017 1 commit
  22. 15 December, 2016 1 commit
  23. 17 November, 2016 1 commit
  24. 19 October, 2016 1 commit
  25. 04 August, 2016 1 commit
    • K
      dma-mapping: use unsigned long for dma_attrs · 00085f1e
      Krzysztof Kozlowski authored
      The dma-mapping core and the implementations do not change the DMA
      attributes passed by pointer.  Thus the pointer can point to const data.
      However the attributes do not have to be a bitfield.  Instead unsigned
      long will do fine:
      
      1. This is just simpler.  Both in terms of reading the code and setting
         attributes.  Instead of initializing local attributes on the stack
         and passing pointer to it to dma_set_attr(), just set the bits.
      
      2. It improves safety and makes const correctness checkable, because
         the attributes are passed by value.
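
To illustrate the two points above, here is a small self-contained sketch of the by-value style. The `EX_ATTR_*` flags and both functions are hypothetical names invented for this example; they are not the kernel's DMA attribute definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits, one bit per attribute. */
#define EX_ATTR_WRITE_BARRIER	(1UL << 0)
#define EX_ATTR_WEAK_ORDERING	(1UL << 1)
#define EX_ATTR_ALL		(EX_ATTR_WRITE_BARRIER | EX_ATTR_WEAK_ORDERING)

/*
 * Consumer side: attrs arrives by value, so the callee cannot modify the
 * caller's copy and no const-qualified pointer is needed.
 */
bool ex_map_uses_weak_ordering(unsigned long attrs)
{
	assert((attrs & ~EX_ATTR_ALL) == 0); /* reject unknown bits */
	return (attrs & EX_ATTR_WEAK_ORDERING) != 0;
}

/* Producer side: start from 0 (where NULL was passed before) and set bits. */
unsigned long ex_build_attrs(bool weak)
{
	unsigned long attrs = 0;

	attrs |= EX_ATTR_WRITE_BARRIER;
	if (weak)
		attrs |= EX_ATTR_WEAK_ORDERING;
	return attrs;
}
```

A caller with no attributes passes a plain `0`, which is what the `- NULL` / `+ 0` hunks in the semantic patches express.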
      
      Semantic patches for this change (at least most of them):
      
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
      
          @@
          f(...,
          - struct dma_attrs *attrs
          + unsigned long attrs
          , ...)
          {
          ...
          }
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      and
      
          // Options: --all-includes
          virtual patch
          virtual context
      
          @r@
          identifier f, attrs;
          type t;
      
          @@
          t f(..., struct dma_attrs *attrs);
      
          @@
          identifier r.f;
          @@
          f(...,
          - NULL
          + 0
           )
      
      Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Acked-by: Robin Murphy <robin.murphy@arm.com>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
      Acked-by: Mark Salter <msalter@redhat.com> [c6x]
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
      Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
      Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
      Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
      Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
      Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
      Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 16 February, 2016 1 commit
    • D
      mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d
      Dave Hansen authored
      We will soon modify the vanilla get_user_pages() so it can no
      longer be used on mm/tasks other than 'current/current->mm',
      which is by far the most common way it is called.  For now,
      we allow the old-style calls, but warn when they are used.
      (implemented in previous patch)
      
      This patch switches all callers of:
      
      	get_user_pages()
      	get_user_pages_unlocked()
      	get_user_pages_locked()
      
      to stop passing tsk/mm so they will no longer see the warnings.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  27. 16 April, 2015 2 commits
  28. 03 April, 2015 1 commit
  29. 16 December, 2014 1 commit
    • H
      IB/core: Implement support for MMU notifiers regarding on demand paging regions · 882214e2
      Haggai Eran authored
      * Add an interval tree implementation for ODP umems. Create an
        interval tree for each ucontext (including a count of the number of
        ODP MRs in this context, semaphore, etc.), and register ODP umems in
        the interval tree.
      * Add MMU notifiers handling functions, using the interval tree to
        notify only the relevant umems and underlying MRs.
      * Register to receive MMU notifier events from the MM subsystem upon
        ODP MR registration (and unregister accordingly).
      * Add a completion object to synchronize the destruction of ODP umems.
      * Add mechanism to abort page faults when there's a concurrent invalidation.
      
      The way we synchronize between concurrent invalidations and page
      faults is by keeping a counter of currently running invalidations, and
      a sequence number that is incremented whenever an invalidation is
      caught. The page fault code checks the counter and also verifies that
      the sequence number hasn't progressed before it updates the umem's
      page tables. This is similar to what the kvm module does.
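
The counter-plus-sequence-number check can be sketched as a single-threaded model. The names below (`struct toy_umem`, `invalidate_start()`, `invalidate_end()`, `fault_commit()`) are hypothetical, the locking a real implementation needs is omitted, and bumping the sequence number when an invalidation ends is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-umem state used to detect racing invalidations. */
struct toy_umem {
	unsigned long notifiers_seq; /* bumped once per invalidation */
	int notifiers_count;         /* invalidations currently running */
	int mapped;                  /* page table updates that succeeded */
};

void invalidate_start(struct toy_umem *u)
{
	u->notifiers_count++;
}

void invalidate_end(struct toy_umem *u)
{
	assert(u->notifiers_count > 0);
	u->notifiers_seq++; /* assumption: sequence advances at the end */
	u->notifiers_count--;
}

/* True if an invalidation is running or has run since seq_snapshot. */
bool fault_must_retry(const struct toy_umem *u, unsigned long seq_snapshot)
{
	return u->notifiers_count != 0 || u->notifiers_seq != seq_snapshot;
}

/*
 * The page fault path samples notifiers_seq before resolving pages, then
 * calls this to update the page tables only if nothing raced with it.
 */
bool fault_commit(struct toy_umem *u, unsigned long seq_snapshot)
{
	if (fault_must_retry(u, seq_snapshot))
		return false; /* caller aborts and retries the fault */
	u->mapped++;
	return true;
}
```

A fault that sampled the sequence number before an invalidation is rejected both while the invalidation runs and after it ends, and succeeds once it takes a fresh sample.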
      
      In order to prevent the case where we register a umem in the middle of
      an ongoing notifier, we also keep a per ucontext counter of the total
      number of active mmu notifiers. We only enable new umems when all the
      running notifiers complete.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Haggai Eran <haggaie@mellanox.com>
      Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>