提交 · 4a35339958f16d42a4ca06a8da9d4b5ab39ee8ea · openeuler / Kernel

07 5月, 2019 1 次提交

RDMA/umem: Add API to find best driver supported page size in an MR · 4a353399

由 Shiraz Saleem 提交于 5月 06, 2019

This helper iterates through the SG list to find the best page size to use
from a bitmap of HW supported page sizes. Drivers that support multiple
page sizes, but not mixed sizes in an MR can use this API.
Suggested-by: NJason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4a353399

03 5月, 2019 1 次提交

RDMA/umem: Handle page combining avoidance correctly in ib_umem_add_sg_table() · 7872168a

由 Shiraz Saleem 提交于 4月 29, 2019

The flag update_cur_sg tracks whether contiguous pages from a new set of
page_list pages can be merged into the SGE passed into
ib_umem_add_sg_table(). If this flag is true, but the total segment length
exceeds the max_seg_size supported by HW, we avoid combining to this SGE
and move to a new SGE (x) and merge 'len' pages to it. However, if i <
npages, the next iteration can incorrectly merge 'len' contiguous pages
into x instead of into a new SGE since update_cur_sg is still true.

Reset update_cur_sg to false always after the check to merge pages into
the first SGE passed in to ib_umem_add_sg_table().  Also, prevent a new
SGE's segment length from ever exceeding HW max_seg_sz.

There is a crash on hfi1 as result of this where-in max_seg_sz is
defaulting to 64K. Due to above bug, unfolding SGE's in __ib_umem_release
points to a bad page ptr.

 TEST comp-wfr.perfnative.STL-22166-WDT _ perftest native 2-Write_4097QP_4MB STARTING at 1555387093
 BUG: Bad page state in process ib_write_bw  pfn:7ebca0
 page:ffffcd675faf2800 count:0 mapcount:1 mapping:0000000000000000 index:0x1
 flags: 0x17ffffc0000000()
 raw: 0017ffffc0000000 dead000000000100 dead000000000200 0000000000000000
 raw: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
 page dumped because: nonzero mapcount
 CPU: 18 PID: 15853 Comm: ib_write_bw Tainted: G    B             5.1.0-rc4 #1
 Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS SE5C610.86B.01.01.0014.121820151719 12/18/2015
 Call Trace:
  dump_stack+0x5a/0x73
  bad_page+0xf5/0x10f
  free_pcppages_bulk+0x62c/0x680
  free_unref_page+0x54/0x70
  __ib_umem_release+0x148/0x1a0 [ib_uverbs]
  ib_umem_release+0x22/0x80 [ib_uverbs]
  rvt_dereg_mr+0x67/0xb0 [rdmavt]
  ib_dereg_mr_user+0x37/0x60 [ib_core]
  destroy_hw_idr_uobject+0x1c/0x50 [ib_uverbs]
  uverbs_destroy_uobject+0x2e/0x180 [ib_uverbs]
  uobj_destroy+0x4d/0x60 [ib_uverbs]
  __uobj_get_destroy+0x33/0x50 [ib_uverbs]
  __uobj_perform_destroy+0xa/0x30 [ib_uverbs]
  ib_uverbs_dereg_mr+0x66/0x90 [ib_uverbs]
  ib_uverbs_write+0x3e1/0x500 [ib_uverbs]
  vfs_write+0xad/0x1b0
  ksys_write+0x5a/0xd0
  do_syscall_64+0x5b/0x180
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d10bcf94 ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Tested-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

7872168a

09 4月, 2019 2 次提交

RDMA/umem: Use correct value for SG entries in sg_copy_to_buffer() · d0b5c01b

由 Shiraz Saleem 提交于 4月 04, 2019

With page combining, the assumption that number of SG entries in umem SGL
equal to number of system pages in umem no longer holds.

umem->sg_nents tracks the SG entries in umem SGL. Use it in
sg_pcopy_to_buffer() as opposed to ib_umem_num_pages(umem).

Fixes: d10bcf94 ("RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs")
Reported-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d0b5c01b

RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs · d10bcf94

由 Shiraz Saleem 提交于 4月 02, 2019

Combine contiguous regions of PAGE_SIZE pages into single scatter list
entry while building the scatter table for a umem. This minimizes the
number of the entries in the scatter list and reduces the DMA mapping
overhead, particularly with the IOMMU.

Set default max_seg_size in core for IB devices to 2G and do not combine
if we exceed this limit.

Also, purge npages in struct ib_umem as we now DMA map the umem SGL with
sg_nents and npage computation is not needed. Drivers should now be using
ib_umem_num_pages(), so fix the last stragglers.

Move npages tracking to ib_umem_odp as ODP drivers still need it.
Suggested-by: NJason Gunthorpe <jgg@ziepe.ca>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Acked-by: NAdit Ranadive <aditr@vmware.com>
Signed-off-by: NShiraz Saleem <shiraz.saleem@intel.com>
Tested-by: NGal Pressman <galpress@amazon.com>
Tested-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d10bcf94

27 3月, 2019 1 次提交

IB/core: Ensure an invalidate_range callback on ODP MR · 4ae27444

由 Ira Weiny 提交于 3月 13, 2019

No device supports ODP MR without an invalidate_range callback.

Warn on any any device which attempts to support ODP without supplying
this callback.

Then we can remove the checks for the callback within the code.

This stems from the discussion

https://www.spinics.net/lists/linux-rdma/msg76460.html

...which concluded this code was no longer necessary.
Acked-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

4ae27444

16 2月, 2019 1 次提交

IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows · 3d9dfd06

由 Shamir Rabinovitch 提交于 2月 07, 2019

Add ib_ucontext to the uverbs_attr_bundle sent down the iocl and cmd flows
as soon as the flow has ib_uobject.

In addition, remove rdma_get_ucontext helper function that is only used by
ib_umem_get.
Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3d9dfd06

08 2月, 2019 2 次提交

drivers/IB,core: reduce scope of mmap_sem · b95df5e3

由 Davidlohr Bueso 提交于 2月 06, 2019

ib_umem_get() uses gup_longterm() and relies on the lock to stabilze the
vma_list, so we cannot really get rid of mmap_sem altogether, but now that
the counter is atomic, we can get of some complexity that mmap_sem brings
with only pinned_vm.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

b95df5e3

mm: make mm->pinned_vm an atomic64 counter · 70f8a3ca

由 Davidlohr Bueso 提交于 2月 06, 2019

Taking a sleeping lock to _only_ increment a variable is quite the
overkill, and pretty much all users do this. Furthermore, some drivers
(ie: infiniband and scif) that need pinned semantics can go to quite
some trouble to actually delay via workqueue (un)accounting for pinned
pages when not possible to acquire it.

By making the counter atomic we no longer need to hold the mmap_sem and
can simply some code around it for pinned_vm users. The counter is 64-bit
such that we need not worry about overflows such as rdma user input
controlled from userspace.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NChristoph Lameter <cl@linux.com>
Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

70f8a3ca

11 1月, 2019 1 次提交

IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata · b0ea0fa5

由 Jason Gunthorpe 提交于 1月 09, 2019

ib_umem_get() can only be called in a method callback, which always has a
udata parameter. This allows ib_umem_get() to derive the ucontext pointer
directly from the udata without requiring the drivers to find it in some
way or another.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com>

b0ea0fa5

28 9月, 2018 1 次提交

RDMA/core: Acquire and release mmap_sem on page range · 3994586f

由 Parav Pandit 提交于 9月 25, 2018

Currently mmap_sem is read locked while pinning the memory.  In a
multi-threaded application of a process, holding mmap_sem lock creates
contention with other threads who might be either registering memory,
creating QPs or simply doing mmap() as such operations also require to
hold the mmap_sem write lock.

All such operation cannot make forward progress until one memory pin
operation is completed.  It becomes more worse if the memory is unpinned
and/or memory registration is large (in GB range).

Therefore, instead of holding mmap_sem for too long (for whole region
pinning), acquire and release the lock for every few pages.  For example
on x86 with 4K page size, acquire and release mmap_sem for every 2Mbytes
memory chunk.

This allows other competing threads to make progress who might wish to
hold mmap_sem for shorter duration.

When memory registration latency is measured using [1] for memory sizes
ranging from 4K to 48GB, <= 1% or 0.5% degradation is noticed. In many
runs no difference is seen other than run-to-run variance.

In other targeted tests of users with large memory, desired improvements
are seen due to reduced contention of mmap_sem.

[1] https://github.com/paravmellanox/rtool

$ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A

It registers pinned memory from 4K to 48GB size with 500 iterations for
each memory size.

$ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4

4 competing threads pin memory, each of 12GB size with 500 iterations.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3994586f

26 9月, 2018 2 次提交

RDMA/umem: Fix potential addition overflow · c6ce5807

由 Doug Ledford 提交于 9月 21, 2018

Given a large enough memory allocation, it is possible to wrap the
pinned_vm counter.  Check for addition overflow to prevent such
eventualities.

Fixes: 40ddacf2 ("RDMA/umem: Don't hold mmap_sem for too long")
Reported-by: NJason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

c6ce5807

RDMA/umem: Minor optimizations · 3312d1c6

由 Doug Ledford 提交于 9月 21, 2018

Noticed while reviewing commit d4b4dd1b ("RDMA/umem: Do not use
current->tgid to track the mm_struct") patch. Why would we take a lock,
adjust a protected variable, drop the lock, and *then* check the input
into our protected variable adjustment? Then we have to take the lock
again on our error unwind. Let's just check the input early and skip
taking the locks needlessly if the input isn't valid.

It was also noticed that we set mm = current->mm, we then never modify
mm, but we still go back and reference current->mm a number of times
needlessly. Be consistent in using the stored reference in mm.
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3312d1c6

21 9月, 2018 4 次提交

RDMA/umem: Get rid of struct ib_umem.odp_data · 597ecc5a

由 Jason Gunthorpe 提交于 9月 16, 2018

This no longer has any use, we can use container_of to get to the
umem_odp, and a simple flag to indicate if this is an odp MR. Remove the
few remaining references to it.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

597ecc5a

RDMA/umem: Make ib_umem_odp into a sub structure of ib_umem · 41b4deea

由 Jason Gunthorpe 提交于 9月 16, 2018

These two structures are linked together, use the container_of pattern
instead of a double allocation to make the code simpler and easier to
follow.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

41b4deea

RDMA/umem: Use ib_umem_odp in all function signatures connected to ODP · b5231b01

由 Jason Gunthorpe 提交于 9月 16, 2018

All of these functions already require the ODP version of the umem struct,
make this very clear by having the signature require it. This paves the
way to using the container_of() pattern to link umem_odp and umem
together.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b5231b01

RDMA/umem: Do not use current->tgid to track the mm_struct · d4b4dd1b

由 Jason Gunthorpe 提交于 9月 16, 2018

This is just wrong, the process that calls into the reg_mr is the process
associated with the umem, and that does not have to be the same process
that created the context.

When this code was first written mmgrab() didn't exist, however these days
we can just directly hold the mm_struct pointer in the umem and have no
ambiguity when it comes to releasing the umem as to which mm it was
associated with.
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d4b4dd1b

14 7月, 2018 2 次提交

RDMA/umem: Refactor exit paths in ib_umem_get · 1215cb7c

由 Leon Romanovsky 提交于 7月 10, 2018

Simplify exit paths in ib_umem_get to use the standard goto unwind
pattern.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Reviewed-by: NMichael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

1215cb7c

RDMA/umem: Don't hold mmap_sem for too long · 40ddacf2

由 Leon Romanovsky 提交于 7月 10, 2018

DMA mapping is time consuming operation and doesn't need to be performed
with mmap_sem semaphore is held.

The semaphore only needs to be held for accounting and get_user_pages
related activities.
Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

40ddacf2

27 6月, 2018 1 次提交

RDMA/umem: Don't check for a negative return value of dma_map_sg_attrs() · 3a2e791c

由 Leon Romanovsky 提交于 6月 24, 2018

dma_map_sg_attrs() returns 0 on error and can't return a negative number
(ensured by BUG_ON), so don't check.
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

3a2e791c

29 5月, 2018 1 次提交

IB/core: Make testing MR flags for writability a static inline function · 08bb558a

由 Jack Morgenstein 提交于 5月 23, 2018

Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h

This allows the function to be used by low-level infiniband drivers.

Cc: <stable@vger.kernel.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

08bb558a

16 5月, 2018 2 次提交

IB/umem: Use the correct mm during ib_umem_release · 8e907ed4

由 Lidong Chen 提交于 5月 08, 2018

User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct instead. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.

Cc: <stable@vger.kernel.org>
Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: NLidong Chen <lidongchen@tencent.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

8e907ed4

IB/core: Remove redundant return · aec05afe

由 Yuval Shaia 提交于 5月 10, 2018

"return" statement at the end of void function is redundant, removing
it.
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: NZhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: NQing Huang <qing.huang@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

aec05afe

19 12月, 2017 1 次提交

IB/umem: Fix use of npages/nmap fields · edf1a84f

由 Artemy Kovalyov 提交于 11月 14, 2017

In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.

Fixes: c5d76f13 ('IB/core: Add umem function to read data from user-space')
Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

edf1a84f

30 11月, 2017 1 次提交

IB/core: disable memory registration of filesystem-dax vmas · 5f1d43de

由 Dan Williams 提交于 11月 29, 2017

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Reported-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NJason Gunthorpe <jgg@mellanox.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f1d43de

02 6月, 2017 1 次提交

RDMA/core: not to set page dirty bit if it's already set. · 53376fed

由 Qing Huang 提交于 5月 18, 2017

This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.
Signed-off-by: NQing Huang <qing.huang@oracle.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

53376fed

26 4月, 2017 2 次提交

IB/umem: Add support to huge ODP · 0008b84e

由 Artemy Kovalyov 提交于 4月 05, 2017

Add IB_ACCESS_HUGETLB ib_reg_mr flag.
Hugetlb region registered with this flag
will use single translation entry per huge page.
Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0008b84e

IB: Replace ib_umem page_size by page_shift · 3e7e1193

由 Artemy Kovalyov 提交于 4月 05, 2017

Size of pages are held by struct ib_umem in page_size field.

It is better to store it as an exponent, because page size by nature
is always power-of-two and used as a factor, divisor or ilog2's argument.

The conversion of page_size to be page_shift allows to have portable
code and avoid following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Acked-by: NRam Amrani <Ram.Amrani@cavium.com>
Acked-by: NShiraz Saleem <shiraz.saleem@intel.com>
Acked-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Acked-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Acked-by: NAdit Ranadive <aditr@vmware.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3e7e1193

02 3月, 2017 2 次提交

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3f07c014

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> · 6e84f315

由 Ingo Molnar 提交于 2月 08, 2017

We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

6e84f315

15 2月, 2017 1 次提交

IB/umem: Update on demand page (ODP) support · d07d1d70

由 Artemy Kovalyov 提交于 1月 18, 2017

Currently ODP MR may explicitly register virtual address space area
of limited length.
This change allows MR to cover entire process virtual address space
dynamicaly adding/removing translation entries to device MTT.

Add following changes to support implicit MR:
* Allow umem to be zero size to back-up implicit MR.
* Add new function ib_alloc_odp_umem() to add virtual memory regions
  to implicit MR dynamically on demand.
* Add new function rbt_ib_umem_lookup() to find dynamically added
  virtual memory regions.
* Expose function rbt_ib_umem_for_each_in_range() to other modules and
  make it safe
Signed-off-by: NArtemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d07d1d70

25 1月, 2017 1 次提交

IB/umem: Release pid in error and ODP flow · 828f6fa6

由 Kenneth Lee 提交于 1月 05, 2017

1. Release pid before enter odp flow
2. Release pid when fail to allocate memory

Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Fixes: 8ada2c1c ("IB/core: Add support for on demand paging regions")
Signed-off-by: NKenneth Lee <liguozhu@hisilicon.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Reviewed-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

828f6fa6

15 12月, 2016 1 次提交

IB/core: fix unmap_sg argument · 17069d32

由 Sebastian Ott 提交于 12月 02, 2016

__ib_umem_release calls dma_unmap_sg with a different number of
sg_entries than ib_umem_get uses for dma_map_sg. This might cause
trouble for implementations that merge sglist entries and results
in the following dma debug complaint:

DMA-API: device driver frees DMA sg list with different entry
         count [map count=2] [unmap count=1]

Fix it by using the correct value.
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

17069d32

17 11月, 2016 1 次提交

IB/core: Avoid unsigned int overflow in sg_alloc_table · 3c7ba576

由 Mark Bloch 提交于 10月 27, 2016

sg_alloc_table gets unsigned int as parameter while the driver
returns it as size_t. Check npages isn't greater than maximum
unsigned int.

Fixes: eeb8461e ("IB: Refactor umem to use linear SG table")
Signed-off-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3c7ba576

19 10月, 2016 1 次提交

mm: replace get_user_pages() write/force parameters with gup_flags · 768ae309

由 Lorenzo Stoakes 提交于 10月 13, 2016

This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
Acked-by: NChristian König <christian.koenig@amd.com>
Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

768ae309

04 8月, 2016 1 次提交

dma-mapping: use unsigned long for dma_attrs · 00085f1e

由 Krzysztof Kozlowski 提交于 8月 03, 2016

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.comSigned-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: NVineet Gupta <vgupta@synopsys.com>
Acked-by: NRobin Murphy <robin.murphy@arm.com>
Acked-by: NHans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: NBart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: NBjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

00085f1e

16 2月, 2016 1 次提交

mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d

由 Dave Hansen 提交于 2月 12, 2016

We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called.  For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

	get_user_pages()
	get_user_pages_unlocked()
	get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d4edcf0d

16 4月, 2015 2 次提交

IB/core: don't disallow registering region starting at 0x0 · 66578b0b

由 Yann Droneaud 提交于 4月 13, 2015

In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057a
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().

There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org> # 8494057a ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic")
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

66578b0b

IB/core: disallow registering 0-sized memory region · 8abaae62

由 Yann Droneaud 提交于 4月 13, 2015

If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8abaae62

03 4月, 2015 1 次提交

IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic · 8494057a

由 Shachar Raindel 提交于 3月 18, 2015

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can cause also miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Cc: <stable@vger.kernel.org>
Signed-off-by: NShachar Raindel <raindel@mellanox.com>
Signed-off-by: NJack Morgenstein <jackm@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

8494057a

16 12月, 2014 1 次提交

IB/core: Implement support for MMU notifiers regarding on demand paging regions · 882214e2

由 Haggai Eran 提交于 12月 11, 2014

* Add an interval tree implementation for ODP umems. Create an
  interval tree for each ucontext (including a count of the number of
  ODP MRs in this context, semaphore, etc.), and register ODP umems in
  the interval tree.
* Add MMU notifiers handling functions, using the interval tree to
  notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon
  ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add mechanism to abort page faults when there's a concurrent invalidation.

The way we synchronize between concurrent invalidations and page
faults is by keeping a counter of currently running invalidations, and
a sequence number that is incremented whenever an invalidation is
caught. The page fault code checks the counter and also verifies that
the sequence number hasn't progressed before it updates the umem's
page tables. This is similar to what the kvm module does.

In order to prevent the case where we register a umem in the middle of
an ongoing notifier, we also keep a per ucontext counter of the total
number of active mmu notifiers. We only enable new umems when all the
running notifiers complete.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NShachar Raindel <raindel@mellanox.com>
Signed-off-by: NHaggai Eran <haggaie@mellanox.com>
Signed-off-by: NYuval Dagan <yuvalda@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

882214e2

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功