Commits · c7252a6532995fe6971295b7878e5a74b4f85d0c · openeuler / Kernel

27 Mar, 2019 1 commit

IB/core: Ensure an invalidate_range callback on ODP MR · 4ae27444

Committed by Ira Weiny on Mar 13, 2019

No device supports ODP MR without an invalidate_range callback.

Warn on any device which attempts to support ODP without supplying
this callback.

Then we can remove the checks for the callback within the code.

This stems from the discussion

https://www.spinics.net/lists/linux-rdma/msg76460.html

...which concluded this code was no longer necessary.
Acked-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

16 Feb, 2019 1 commit

IB/uverbs: Add ib_ucontext to uverbs_attr_bundle sent from ioctl and cmd flows · 3d9dfd06

Committed by Shamir Rabinovitch on Feb 07, 2019

Add ib_ucontext to the uverbs_attr_bundle sent down the ioctl and cmd flows
as soon as the flow has ib_uobject.

In addition, remove rdma_get_ucontext helper function that is only used by
ib_umem_get.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

08 Feb, 2019 2 commits

drivers/IB,core: reduce scope of mmap_sem · b95df5e3

Committed by Davidlohr Bueso on Feb 06, 2019

ib_umem_get() uses gup_longterm() and relies on the lock to stabilize the
vma_list, so we cannot really get rid of mmap_sem altogether, but now that
the counter is atomic, we can get rid of some complexity that mmap_sem brings
with only pinned_vm.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

mm: make mm->pinned_vm an atomic64 counter · 70f8a3ca

Committed by Davidlohr Bueso on Feb 06, 2019

Taking a sleeping lock to _only_ increment a variable is quite the
overkill, and pretty much all users do this. Furthermore, some drivers
(ie: infiniband and scif) that need pinned semantics can go to quite
some trouble to actually delay via workqueue (un)accounting for pinned
pages when not possible to acquire it.

By making the counter atomic we no longer need to hold the mmap_sem and
can simplify some code around it for pinned_vm users. The counter is 64-bit
such that we need not worry about overflows such as rdma user input
controlled from userspace.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
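
The lock-free accounting described in this commit can be modelled in userspace. The following is a minimal sketch that uses C11 atomics rather than the kernel's atomic64_t API; `struct mm_model`, `pin_account()`, `pin_unaccount()` and `pin_try_charge()` are hypothetical names chosen for illustration and do not appear in the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the pinned-page counter kept per address space. */
struct mm_model {
    atomic_int_fast64_t pinned_vm; /* pages currently pinned */
};

/* Add npages to the counter without taking any lock; returns the new total. */
int64_t pin_account(struct mm_model *mm, int64_t npages)
{
    assert(npages >= 0);
    return atomic_fetch_add(&mm->pinned_vm, npages) + npages;
}

/* Subtract npages again when the pages are released. */
void pin_unaccount(struct mm_model *mm, int64_t npages)
{
    atomic_fetch_sub(&mm->pinned_vm, npages);
}

/* Charge npages against a limit; roll back and return -1 if it is exceeded. */
int pin_try_charge(struct mm_model *mm, int64_t npages, int64_t limit)
{
    if (pin_account(mm, npages) > limit) {
        pin_unaccount(mm, npages);
        return -1;
    }
    return 0;
}
```

Because each update is a single atomic read-modify-write, a caller that only needs to adjust the counter never has to sleep on a lock, which is the simplification the commit message argues for.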

11 Jan, 2019 1 commit

IB/{core,hw}: Have ib_umem_get extract the ib_ucontext from ib_udata · b0ea0fa5

Committed by Jason Gunthorpe on Jan 09, 2019

ib_umem_get() can only be called in a method callback, which always has a
udata parameter. This allows ib_umem_get() to derive the ucontext pointer
directly from the udata without requiring the drivers to find it in some
way or another.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>

28 Sep, 2018 1 commit

RDMA/core: Acquire and release mmap_sem on page range · 3994586f

Committed by Parav Pandit on Sep 25, 2018

Currently mmap_sem is read locked while pinning the memory.  In a
multi-threaded process, holding the mmap_sem lock creates contention with
other threads that might be registering memory, creating QPs or simply
doing mmap(), as such operations also need to hold the mmap_sem write
lock.

None of these operations can make forward progress until the memory pin
operation is completed.  It becomes worse if the memory is unpinned
and/or the memory registration is large (in the GB range).

Therefore, instead of holding mmap_sem for too long (for whole region
pinning), acquire and release the lock for every few pages.  For example,
on x86 with 4K page size, acquire and release mmap_sem for every 2 MB
memory chunk.

This allows other competing threads, which might wish to hold mmap_sem
for a shorter duration, to make progress.

When memory registration latency is measured using [1] for memory sizes
ranging from 4K to 48GB, <= 1% or 0.5% degradation is noticed. In many
runs no difference is seen other than run-to-run variance.

In other targeted tests of users with large memory, desired improvements
are seen due to reduced contention of mmap_sem.

[1] https://github.com/paravmellanox/rtool

$ rdma_resource_lat -c 1 -s 48G -a -u L -i 500 -A

It registers pinned memory from 4K to 48GB size with 500 iterations for
each memory size.

$ rdma_resource_lat -c 1 -s 12G -a -u L -i 500 -t 4

4 competing threads pin memory, each of 12GB size with 500 iterations.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
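
The chunked locking described in this commit can be sketched as a loop that takes and drops the lock once per chunk. This is an illustrative userspace model, not the kernel code: `struct lock_model` only counts acquisitions, and the chunk size is assumed to be the number of page pointers that fit in one 4K page (512 on a 64-bit build, i.e. 512 x 4K = 2 MB, matching the x86 example in the message). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed 4K pages, as in the x86 example */

/* Pages handled per lock hold: as many page pointers as fit in one page. */
size_t pages_per_chunk(void)
{
    return MODEL_PAGE_SIZE / sizeof(void *);
}

/* Hypothetical lock model: it only records how often the lock is taken. */
struct lock_model {
    unsigned long acquisitions;
    int held;
};

void model_lock(struct lock_model *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

void model_unlock(struct lock_model *l)
{
    assert(l->held);
    l->held = 0;
}

/* Process npages in chunks, dropping the lock between chunks. */
size_t pin_in_chunks(struct lock_model *l, size_t npages)
{
    size_t done = 0;

    while (done < npages) {
        size_t n = npages - done;

        if (n > pages_per_chunk())
            n = pages_per_chunk();

        model_lock(l);
        /* a real implementation would pin n pages here */
        done += n;
        model_unlock(l);
    }
    return done;
}
```

Between two iterations the lock is free, so a competing thread waiting for it gets a chance to run after at most one chunk instead of after the whole region.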

26 Sep, 2018 2 commits

RDMA/umem: Fix potential addition overflow · c6ce5807

Committed by Doug Ledford on Sep 21, 2018

Given a large enough memory allocation, it is possible to wrap the
pinned_vm counter.  Check for addition overflow to prevent such
eventualities.

Fixes: 40ddacf2 ("RDMA/umem: Don't hold mmap_sem for too long")
Reported-by: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
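
A wrap-around check of the kind this commit describes can be written portably by testing before adding. The sketch below is a generic illustration and not the code from the patch; `add_no_wrap()` is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Store a + b in *sum and return true, or return false if it would wrap. */
bool add_no_wrap(unsigned long a, unsigned long b, unsigned long *sum)
{
    assert(sum != NULL);

    if (a > ULONG_MAX - b)
        return false; /* a + b would exceed ULONG_MAX */

    *sum = a + b;
    return true;
}
```

A caller would compute the new counter value with this helper and reject the request when it returns false, before comparing the sum against any limit.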

RDMA/umem: Minor optimizations · 3312d1c6

Committed by Doug Ledford on Sep 21, 2018

Noticed while reviewing commit d4b4dd1b ("RDMA/umem: Do not use
current->tgid to track the mm_struct") patch. Why would we take a lock,
adjust a protected variable, drop the lock, and *then* check the input
into our protected variable adjustment? Then we have to take the lock
again on our error unwind. Let's just check the input early and skip
taking the locks needlessly if the input isn't valid.

It was also noticed that we set mm = current->mm, we then never modify
mm, but we still go back and reference current->mm a number of times
needlessly. Be consistent in using the stored reference in mm.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

21 Sep, 2018 4 commits

RDMA/umem: Get rid of struct ib_umem.odp_data · 597ecc5a

Committed by Jason Gunthorpe on Sep 16, 2018

This no longer has any use, we can use container_of to get to the
umem_odp, and a simple flag to indicate if this is an odp MR. Remove the
few remaining references to it.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/umem: Make ib_umem_odp into a sub structure of ib_umem · 41b4deea

Committed by Jason Gunthorpe on Sep 16, 2018

These two structures are linked together, use the container_of pattern
instead of a double allocation to make the code simpler and easier to
follow.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
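
The container_of pattern mentioned in this commit embeds the base structure inside the derived one, so a single allocation serves both and the outer structure is recovered by pointer arithmetic. The following is a simplified userspace sketch; the `*_model` structures and their fields are hypothetical stand-ins, not the real ib_umem layout.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the real structures. */
struct umem_model {
    unsigned long address;
    int is_odp; /* simple flag marking an ODP umem */
};

struct umem_odp_model {
    struct umem_model umem; /* embedded base, not a pointer */
    unsigned long notifier_count;
};

/* Recover the enclosing ODP structure from a pointer to its embedded base. */
struct umem_odp_model *to_umem_odp(struct umem_model *umem)
{
    assert(umem->is_odp);
    return container_of(umem, struct umem_odp_model, umem);
}
```

Code that only holds a `struct umem_model *` can call `to_umem_odp()` when the flag is set, so no separate back-pointer or second allocation is needed.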

RDMA/umem: Use ib_umem_odp in all function signatures connected to ODP · b5231b01

Committed by Jason Gunthorpe on Sep 16, 2018

All of these functions already require the ODP version of the umem struct,
make this very clear by having the signature require it. This paves the
way to using the container_of() pattern to link umem_odp and umem
together.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

RDMA/umem: Do not use current->tgid to track the mm_struct · d4b4dd1b

Committed by Jason Gunthorpe on Sep 16, 2018

This is just wrong, the process that calls into the reg_mr is the process
associated with the umem, and that does not have to be the same process
that created the context.

When this code was first written mmgrab() didn't exist, however these days
we can just directly hold the mm_struct pointer in the umem and have no
ambiguity when it comes to releasing the umem as to which mm it was
associated with.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

14 Jul, 2018 2 commits

RDMA/umem: Refactor exit paths in ib_umem_get · 1215cb7c

Committed by Leon Romanovsky on Jul 10, 2018

Simplify exit paths in ib_umem_get to use the standard goto unwind
pattern.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
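
The goto unwind pattern named in this commit releases resources in reverse order of acquisition through a chain of labels. The sketch below illustrates the pattern in general and is not the refactored ib_umem_get(); `struct buf_pair` and the `fail_b` switch are hypothetical and exist only to exercise the error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct buf_pair {
    char *a;
    char *b;
};

/* Allocate two buffers; fail_b is a hypothetical switch forcing step 2 to fail. */
int buf_pair_init(struct buf_pair *p, size_t len, int fail_b)
{
    int ret;

    assert(p != NULL);

    p->a = malloc(len);
    if (!p->a) {
        ret = -ENOMEM;
        goto out;
    }

    p->b = fail_b ? NULL : malloc(len);
    if (!p->b) {
        ret = -ENOMEM;
        goto free_a; /* undo only what has been done so far */
    }

    return 0;

free_a:
    free(p->a);
    p->a = NULL;
out:
    return ret;
}

/* Release both buffers of a successfully initialised pair. */
void buf_pair_destroy(struct buf_pair *p)
{
    free(p->b);
    free(p->a);
    p->a = p->b = NULL;
}
```

Each failure point jumps to the label that undoes exactly the steps completed before it, so the cleanup code is written once instead of being repeated in every error branch.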

RDMA/umem: Don't hold mmap_sem for too long · 40ddacf2

Committed by Leon Romanovsky on Jul 10, 2018

DMA mapping is a time-consuming operation and doesn't need to be performed
while the mmap_sem semaphore is held.

The semaphore only needs to be held for accounting and get_user_pages
related activities.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

27 Jun, 2018 1 commit

RDMA/umem: Don't check for a negative return value of dma_map_sg_attrs() · 3a2e791c

Committed by Leon Romanovsky on Jun 24, 2018

dma_map_sg_attrs() returns 0 on error and can't return a negative number
(ensured by BUG_ON), so don't check.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

29 May, 2018 1 commit

IB/core: Make testing MR flags for writability a static inline function · 08bb558a

Committed by Jack Morgenstein on May 23, 2018

Make the MR writability flags check, which is performed in umem.c,
a static inline function in file ib_verbs.h

This allows the function to be used by low-level infiniband drivers.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

16 May, 2018 2 commits

IB/umem: Use the correct mm during ib_umem_release · 8e907ed4

Committed by Lidong Chen on May 08, 2018

User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads.

If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has
exited, get_pid_task will return NULL and ib_umem_release will not
decrease mm->pinned_vm.

Instead of using threads to locate the mm, use the overall tgid from the
ib_ucontext struct. This matches the behavior of ODP and
disassociate in handling the mm of the process that called ibv_reg_mr.

Cc: <stable@vger.kernel.org>
Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

IB/core: Remove redundant return · aec05afe

Committed by Yuval Shaia on May 10, 2018

"return" statement at the end of void function is redundant, removing
it.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

19 Dec, 2017 1 commit

IB/umem: Fix use of npages/nmap fields · edf1a84f

Committed by Artemy Kovalyov on Nov 14, 2017

In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.

Fixes: c5d76f13 ('IB/core: Add umem function to read data from user-space')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

30 Nov, 2017 1 commit

IB/core: disable memory registration of filesystem-dax vmas · 5f1d43de

Committed by Dan Williams on Nov 29, 2017

Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesystem-dax vmas.

Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Jun, 2017 1 commit

RDMA/core: not to set page dirty bit if it's already set. · 53376fed

Committed by Qing Huang on May 18, 2017

This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.
Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

26 Apr, 2017 2 commits

IB/umem: Add support to huge ODP · 0008b84e

Committed by Artemy Kovalyov on Apr 05, 2017

Add IB_ACCESS_HUGETLB ib_reg_mr flag.
Hugetlb region registered with this flag
will use single translation entry per huge page.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

IB: Replace ib_umem page_size by page_shift · 3e7e1193

Committed by Artemy Kovalyov on Apr 05, 2017

The page size is held by struct ib_umem in the page_size field.

It is better to store it as an exponent, because a page size is by nature
always a power of two and is used as a factor, divisor or ilog2's argument.

Converting page_size to page_shift makes the code portable and avoids
the following error while compiling on ARM:

  ERROR: "__aeabi_uldivmod" [drivers/infiniband/core/ib_core.ko] undefined!

CC: Selvin Xavier <selvin.xavier@broadcom.com>
CC: Steve Wise <swise@chelsio.com>
CC: Lijun Ou <oulijun@huawei.com>
CC: Shiraz Saleem <shiraz.saleem@intel.com>
CC: Adit Ranadive <aditr@vmware.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Ram Amrani <Ram.Amrani@Cavium.com>
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
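
Storing the exponent lets page arithmetic be done with shifts and masks instead of 64-bit division, which on 32-bit ARM would otherwise compile into a call to the `__aeabi_uldivmod` helper named in the error above. The following is a minimal sketch of such arithmetic; `num_pages()` and `page_offset()` are hypothetical helpers, and the sketch assumes `addr + len` does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages covering [addr, addr + len), using only shifts. */
uint64_t num_pages(uint64_t addr, uint64_t len, unsigned int page_shift)
{
    assert(page_shift < 64);

    uint64_t page_size = UINT64_C(1) << page_shift;
    uint64_t first = addr >> page_shift;                        /* index of the first page */
    uint64_t last = (addr + len + page_size - 1) >> page_shift; /* one past the last page */

    return last - first;
}

/* Byte offset of addr within its page: a mask instead of a modulo. */
uint64_t page_offset(uint64_t addr, unsigned int page_shift)
{
    assert(page_shift < 64);
    return addr & ((UINT64_C(1) << page_shift) - 1);
}
```

With `page_shift` set to 12, a region starting at 0x1001 with length 0x1000 touches two 4K pages, and `page_offset(0x1234, 12)` is 0x234.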

02 Mar, 2017 2 commits

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014

Committed by Ingo Molnar on Feb 08, 2017

We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> · 6e84f315

Committed by Ingo Molnar on Feb 08, 2017

We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/mm.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

The APIs that are going to be moved first are:

   mm_alloc()
   __mmdrop()
   mmdrop()
   mmdrop_async_fn()
   mmdrop_async()
   mmget_not_zero()
   mmput()
   mmput_async()
   get_task_mm()
   mm_access()
   mm_release()

Include the new header in the files that are going to need it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

15 Feb, 2017 1 commit

IB/umem: Update on demand page (ODP) support · d07d1d70

Committed by Artemy Kovalyov on Jan 18, 2017

Currently an ODP MR may explicitly register a virtual address space area
of limited length.
This change allows an MR to cover the entire process virtual address space,
dynamically adding/removing translation entries to the device MTT.

Add following changes to support implicit MR:
* Allow umem to be zero size to back-up implicit MR.
* Add new function ib_alloc_odp_umem() to add virtual memory regions
  to implicit MR dynamically on demand.
* Add new function rbt_ib_umem_lookup() to find dynamically added
  virtual memory regions.
* Expose function rbt_ib_umem_for_each_in_range() to other modules and
  make it safe
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

25 Jan, 2017 1 commit

IB/umem: Release pid in error and ODP flow · 828f6fa6

Committed by Kenneth Lee on Jan 05, 2017

1. Release pid before entering the ODP flow
2. Release pid when memory allocation fails

Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get")
Fixes: 8ada2c1c ("IB/core: Add support for on demand paging regions")
Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Reviewed-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

15 Dec, 2016 1 commit

IB/core: fix unmap_sg argument · 17069d32

Committed by Sebastian Ott on Dec 02, 2016

__ib_umem_release calls dma_unmap_sg with a different number of
sg_entries than ib_umem_get uses for dma_map_sg. This might cause
trouble for implementations that merge sglist entries and results
in the following dma debug complaint:

DMA-API: device driver frees DMA sg list with different entry
         count [map count=2] [unmap count=1]

Fix it by using the correct value.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

17 Nov, 2016 1 commit

IB/core: Avoid unsigned int overflow in sg_alloc_table · 3c7ba576

Committed by Mark Bloch on Oct 27, 2016

sg_alloc_table gets unsigned int as parameter while the driver
returns it as size_t. Check npages isn't greater than maximum
unsigned int.

Fixes: eeb8461e ("IB: Refactor umem to use linear SG table")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

19 Oct, 2016 1 commit

mm: replace get_user_pages() write/force parameters with gup_flags · 768ae309

Committed by Lorenzo Stoakes on Oct 13, 2016

This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Aug, 2016 1 commit

dma-mapping: use unsigned long for dma_attrs · 00085f1e

Committed by Krzysztof Kozlowski on Aug 03, 2016

The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer.  Thus the pointer can point to const data.
However the attributes do not have to be a bitfield.  Instead unsigned
long will do fine:

1. This is just simpler.  Both in terms of reading the code and setting
   attributes.  Instead of initializing local attributes on the stack
   and passing pointer to it to dma_set_attr(), just set the bits.

2. It brings safeness and checking for const correctness because the
   attributes are passed by value.

Semantic patches for this change (at least most of them):

    virtual patch
    virtual context

    @r@
    identifier f, attrs;

    @@
    f(...,
    - struct dma_attrs *attrs
    + unsigned long attrs
    , ...)
    {
    ...
    }

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

and

    // Options: --all-includes
    virtual patch
    virtual context

    @r@
    identifier f, attrs;
    type t;

    @@
    t f(..., struct dma_attrs *attrs);

    @@
    identifier r.f;
    @@
    f(...,
    - NULL
    + 0
     )

Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

16 Feb, 2016 1 commit

mm/gup: Switch all callers of get_user_pages() to not pass tsk/mm · d4edcf0d

Committed by Dave Hansen on Feb 12, 2016

We will soon modify the vanilla get_user_pages() so it can no
longer be used on mm/tasks other than 'current/current->mm',
which is by far the most common way it is called.  For now,
we allow the old-style calls, but warn when they are used.
(implemented in previous patch)

This patch switches all callers of:

	get_user_pages()
	get_user_pages_unlocked()
	get_user_pages_locked()

to stop passing tsk/mm so they will no longer see the warnings.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: jack@suse.cz
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160212210156.113E9407@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

16 Apr, 2015 2 commits

IB/core: don't disallow registering region starting at 0x0 · 66578b0b

Committed by Yann Droneaud on Apr 13, 2015

In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit 8494057a
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).

This patch allows such registration again: ib_umem_get()
should probably not care about the base address, provided it
can be pinned with get_user_pages().

There are two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size). This patch keeps ensuring that neither
of them happens while allowing memory to be pinned at address
0x0. The case of size equal to 0 is no longer (partially)
handled here, as 0-length memory regions are disallowed by an
earlier check.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org> # 8494057a ("IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic")
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
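
The checks discussed in this commit (zero size refused, no wrap in addr + size, no wrap when rounding the end up to a page boundary) can be modelled as a single predicate. This is a userspace sketch under the assumption of a 4K page size; `range_is_sane()` and the `MODEL_*` macros are hypothetical names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT64_C(4096) /* assumed page size */
#define MODEL_PAGE_ALIGN(x) \
    (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

static_assert((MODEL_PAGE_SIZE & (MODEL_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two");

/* True if [addr, addr + size) and its page-aligned end fit without wrapping. */
bool range_is_sane(uint64_t addr, uint64_t size)
{
    if (size == 0)
        return false; /* zero-length regions are refused */
    if (addr + size < addr)
        return false; /* addr + size wrapped around */
    if (MODEL_PAGE_ALIGN(addr + size) < addr + size)
        return false; /* rounding the end up wrapped around */
    return true;
}
```

Note that the predicate never compares against `addr` being zero, so a page-aligned region starting at address 0x0 passes as long as neither sum wraps.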

IB/core: disallow registering 0-sized memory region · 8abaae62

Committed by Yann Droneaud on Apr 13, 2015

If ib_umem_get() is called with a size equal to 0 and a
non-page-aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.

This should not be allowed: it's not expected for a memory
region to have a size equal to 0.

This patch adds a check to explicitly refuse to register
a 0-sized region.

Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: <stable@vger.kernel.org>
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

03 Apr, 2015 1 commit

IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic · 8494057a

Committed by Shachar Raindel on Mar 18, 2015

Properly verify that the resulting page aligned end address is larger
than both the start address and the length of the memory area requested.

Both the start and length arguments for ib_umem_get are controlled by
the user. A misbehaving user can provide values which will cause an
integer overflow when calculating the page aligned end address.

This overflow can also cause miscalculation of the number of pages
mapped, and additional logic issues.

Addresses: CVE-2014-8159
Cc: <stable@vger.kernel.org>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

16 Dec, 2014 5 commits

IB/core: Implement support for MMU notifiers regarding on demand paging regions · 882214e2

Committed by Haggai Eran on Dec 11, 2014

* Add an interval tree implementation for ODP umems. Create an
  interval tree for each ucontext (including a count of the number of
  ODP MRs in this context, semaphore, etc.), and register ODP umems in
  the interval tree.
* Add MMU notifiers handling functions, using the interval tree to
  notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon
  ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add mechanism to abort page faults when there's a concurrent invalidation.

The way we synchronize between concurrent invalidations and page
faults is by keeping a counter of currently running invalidations, and
a sequence number that is incremented whenever an invalidation is
caught. The page fault code checks the counter and also verifies that
the sequence number hasn't progressed before it updates the umem's
page tables. This is similar to what the kvm module does.

In order to prevent the case where we register a umem in the middle of
an ongoing notifier, we also keep a per ucontext counter of the total
number of active mmu notifiers. We only enable new umems when all the
running notifiers complete.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yuval Dagan <yuvalda@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
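
The counter-plus-sequence-number scheme described in this commit can be sketched as follows. This is a single-threaded model for illustration only: real code must protect both fields with a lock, and the choice to bump the sequence number when an invalidation ends is an assumption of this sketch. `struct inval_state` and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-umem state; in real code these fields need a lock. */
struct inval_state {
    unsigned long seq; /* bumped as each invalidation finishes (assumed) */
    int running;       /* invalidations currently in progress */
};

void invalidate_begin(struct inval_state *s)
{
    s->running++;
}

void invalidate_end(struct inval_state *s)
{
    assert(s->running > 0);
    s->seq++;
    s->running--;
}

/* Fault path, step 1: remember the sequence number before the slow work. */
unsigned long fault_begin(const struct inval_state *s)
{
    return s->seq;
}

/* Fault path, step 2: true if the result must be discarded and retried. */
bool fault_must_retry(const struct inval_state *s, unsigned long seen_seq)
{
    return s->running > 0 || s->seq != seen_seq;
}
```

A page fault handler would call `fault_begin()`, resolve the pages, and then check `fault_must_retry()` before updating the device page tables, so that a mapping made stale by a concurrent invalidation is never installed.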

IB/core: Add support for on demand paging regions · 8ada2c1c

Committed by Shachar Raindel on Dec 11, 2014

* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem
  (page_list, dma_list) and freeing as needed in the end of the run.
* Store a reference to the process PID struct in the ucontext.  Used to
  safely obtain the task_struct and the mm during fault handling,
  without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and
  ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
  of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on
  the pages used and call put_page when freeing an ODP umem
  area. Invalidations support will be added in a later patch.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Add flags for on demand paging support · 860f10a7

Committed by Sagi Grimberg on Dec 11, 2014

* Add a configuration option to enable on-demand paging support in
  the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
  later patch, this configuration option will select the MMU_NOTIFIER
  configuration option to enable mmu notifiers.
* Add a flag for on demand paging (ODP) support in the IB device capabilities.
* Add a flag to request ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver
  doesn't support this.
* Change the conditions in which an MR will be writable to explicitly
  specify the access flags.  This is to avoid making an MR writable just
  because it is an ODP MR.
* Add ODP capabilities to the extended query device verb.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Add umem function to read data from user-space · c5d76f13

Committed by Haggai Eran on Dec 11, 2014

In some drivers there's a need to read data from a user space area
that was pinned using ib_umem when running from a different process
context.

The ib_umem_copy_from function allows reading data from the physical
pages pinned in the ib_umem struct.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

IB/core: Replace ib_umem's offset field with a full address · 406f9e5f

Committed by Haggai Eran on Dec 11, 2014

In order to allow umems that do not pin memory, we need the umem to
keep track of its region's address.

This makes the offset field redundant, and so this patch removes it.
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>

openeuler / Kernel: synced successfully more than 1 year ago