Commits · 637f4600a8d3be44146ad7fbb5188484c3b0a1d4 · openeuler / Kernel

29 Aug, 2017 · 3 commits

IB/hfi1: Move structure definitions from user_exp_rcv.c to user_exp_rcv.h · 637f4600

Committed by Harish Chegondi on Aug 21, 2017

Clean up the user_exp_rcv.c file by moving structure definitions into the
header file user_exp_rcv.h. Since these structure definitions depend on the
structure definitions in mmu_rb.h, move #include "mmu_rb.h" above
the include of "user_exp_rcv.h", or above the include of any header file
that includes user_exp_rcv.h.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

637f4600

IB/hfi1: Remove duplicate definitions of num_user_pages() function · ddd3affb

Committed by Harish Chegondi on Aug 21, 2017

The num_user_pages() function is defined in both the user_exp_rcv.c file
and the user_sdma.c file. Move the function definition to a header file so
there is only one definition in the source repo.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ddd3affb
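
Note (illustrative sketch, not taken from the patch): a helper like num_user_pages() typically counts how many pages a user buffer touches, given its start address and length. The version below is a userspace approximation under stated assumptions: the sketch_ names and the fixed 4 KiB page size are hypothetical, and the kernel would use PAGE_SHIFT and PAGE_MASK instead.

```c
/* Assumption: 4 KiB pages. The kernel uses PAGE_SHIFT/PAGE_MASK instead. */
#define SKETCH_PAGE_SHIFT 12UL
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/*
 * Count the pages touched by the byte range [addr, addr + len).
 * The caller must pass len > 0.
 */
static inline unsigned long sketch_num_user_pages(unsigned long addr,
                                                  unsigned long len)
{
    /* Round the first and last byte addresses down to their page start. */
    const unsigned long spage = addr & SKETCH_PAGE_MASK;
    const unsigned long epage = (addr + len - 1) & SKETCH_PAGE_MASK;

    return 1 + ((epage - spage) >> SKETCH_PAGE_SHIFT);
}
```

Placing a single static inline definition in a shared header, as the commit describes, lets both callers use it without duplicating the body.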

IB/hfi1: Clean up hfi1_user_exp_rcv_setup function · 9dc11709

Committed by Harish Chegondi on Aug 21, 2017

Clean up the hfi1_user_exp_rcv_setup function by moving the page pinning and
unpinning code to separate functions. In order to reduce the
number of parameters passed between functions, a new data structure,
struct tid_user_buf, is defined and used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9dc11709

01 Aug, 2017 · 2 commits

IB/hfi1: Only set fd pointer when base context is completely initialized · e87473bc

Committed by Michael J. Ruhl on Jul 29, 2017

The allocate_ctxt() function adds the context to the fd data structure.
Since the context is not completely initialized, this can cause confusion
as to whether the context is valid or not.

Move the fd reference from allocate_ctxt() to setup_base_ctxt().
Update the necessary functions to be aware of this move.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e87473bc

IB/hfi1: Fix bar0 mapping to use write combining · cb51c5d2

Committed by Mike Marciniszyn on Jul 24, 2017

When the debugpat kernel boot flag is turned on, the following
traces are printed:

[ 1884.793168] x86/PAT: Overlap at 0x90000000-0x92000000
[ 1884.803510] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff], track uncached-minus, req write-combining, ret uncached-minus
[ 1884.818167] hfi1 0000:05:00.0: hfi1_0: WC Remapped RcvArray: ffffc9000a980000

The ioremap_wc() call is clearly not returning a write combining mapping, due
to an overlap where the RcvArray is mapped in an uncached mapping prior
to creating the proposed write combining mapping.

The patch replaces the single base register for uncached CSRs that
used to overlap the RcvArray with two mappings. One, kregbase1, from the
bar0 up to the RcvArray and another, kregbase2, from the end of the
RcvArray to the pio send buffer space. A new dd field, base2_start,
is used to convert the zero-based offset in the CSR routines to the
correct kregbase1/kregbase2 mapping. A single direct write of the
RcvArray CSRs is replaced with hfi1_put_tid() to ensure correct access
using the new disjoint mapping.

Additionally, the kregend field is deleted since it is only ever written.

debugpat now shows the RcvArray as write combining:
[   35.688990] x86/PAT: reserve_memtype added [mem 0x91200000-0x9127ffff], track write-combining, req write-combining, ret write-combining

To insulate from any potential issues with write combining, all
writeq are now flushed in hfi1_put_tid() and rcv_array_wc_fill().
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cb51c5d2
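
Note (illustrative sketch, not taken from the patch): the conversion of a zero-based CSR offset to one of the two disjoint mappings can be pictured roughly as below. The field names kregbase1, kregbase2 and base2_start come from the commit message; the struct, the function name and the use of plain byte pointers instead of __iomem pointers are assumptions made so the example compiles in userspace.

```c
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of the device data. */
struct sketch_dd {
    uint8_t *kregbase1;   /* maps bar0 up to the start of the RcvArray */
    uint8_t *kregbase2;   /* maps from the end of the RcvArray onward */
    uint32_t base2_start; /* first zero-based offset served by kregbase2 */
};

/*
 * Translate a zero-based CSR offset into an address in the right mapping.
 * Assumption: callers never pass an offset that falls inside the RcvArray
 * gap between the two mappings; that range is not checked here.
 */
static uint8_t *sketch_addr_from_offset(const struct sketch_dd *dd,
                                        uint32_t offset)
{
    if (offset >= dd->base2_start)
        return dd->kregbase2 + (offset - dd->base2_start);
    return dd->kregbase1 + offset;
}
```

With this shape, the CSR read and write routines can keep taking zero-based offsets while the RcvArray itself is reached only through its own write combining mapping.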

28 Jun, 2017 · 1 commit

IB/hfi1: Create common expected receive verbs/PSM code · 9c1a99c3

Committed by Mike Marciniszyn on Jun 09, 2017

Declarations and code in common between verbs and PSM are now moved
to exp_rcv.[ch].
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9c1a99c3

05 May, 2017 · 5 commits

IB/hfi1: Clean up on context initialization failure · 62239fc6

Committed by Michael J. Ruhl on May 04, 2017

The error path for context initialization is not consistent. Clean up all
resources on failure.

Remove the unused variable user_event_mask.

Add the _BASE_FAILED bit to the event flags so that a base context can
notify waiting sub contexts that they cannot continue.

Running out of sub contexts is an EBUSY result, not EINVAL.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

62239fc6

IB/hfi1: Clean up context initialization · 9b60d2cb

Committed by Michael J. Ruhl on May 04, 2017

Context initialization mixes base context init with sub context init.
This is bad because contexts can be reused, and reuse re-initializes things
that should not be re-initialized.

Normalize comments and function names to refer to base context and
sub context (not main, shared or slaves).

Separate the base context initialization from sub context initialization.

hfi1_init_ctxt() cannot return an error, so change it to return void and
remove the error message.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9b60d2cb

IB/hfi1: Use filedata rather than filepointer · 5042cddf

Committed by Michael J. Ruhl on May 04, 2017

Since almost all functions that use the hfi1_filedata get the pointer
from the file pointer, simplify by only passing the hfi1_filedata pointer.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5042cddf

IB/hfi1: Name function prototype parameters · f4cd8765

Committed by Michael J. Ruhl on May 04, 2017

To improve the readability of function prototypes, give the parameters
names.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f4cd8765

IB/hfi1: Fix a subcontext memory leak · 224d71f9

Committed by Michael J. Ruhl on May 04, 2017

The only context that frees user_exp_rcv data structures is the last
context closed (from a sub-context set).  This leaks the allocations
from the other sub-contexts.  Separate the common frees from the
specific frees and call them at the appropriate time.

Using KEDR to check for memory leaks we get:

Before test:

[leak_check] Possible leaks: 25

After test:

[leak_check] Possible leaks: 31  (6 leaked data structures)

After patch applied (before and after test have the same value)

[leak_check] Possible leaks: 25

Each leak is 192 + 13440 + 6720 = 20352 bytes per sub-context.

Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

224d71f9

29 Apr, 2017 · 1 commit

IB/hfi1: Validate the TID count before using it · db730894

Committed by Michael J. Ruhl on Apr 09, 2017

Improve the safety of the code by validating the user-supplied
tidcnt before use.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

db730894
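
Note (illustrative sketch, not taken from the patch): validating a user-supplied count before using it usually means rejecting values outside what the driver currently tracks, before any allocation or copy is sized from it. The function name, the tid_used parameter and the choice to reject a zero count are all assumptions for illustration; the commit message only states that tidcnt is validated.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Return 0 if the user-supplied TID count is usable, -EINVAL otherwise.
 * Assumption: a request is invalid when it is empty or asks for more
 * entries than are currently in use (tid_used).
 */
static int sketch_check_tidcnt(uint32_t tidcnt, uint32_t tid_used)
{
    if (tidcnt == 0 || tidcnt > tid_used)
        return -EINVAL;
    return 0;
}
```

Performing the check first means later code can size buffers from tidcnt without re-checking it.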

21 Apr, 2017 · 2 commits

IB/hfi1: Use kcalloc() in hfi1_user_exp_rcv_init() · 4076e518

Committed by Markus Elfring on Feb 09, 2017

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus reuse the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4076e518
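
Note (illustrative sketch, not taken from the patch): the two points in 4076e518 are the array-allocation call and the use of a pointer dereference in sizeof. A userspace analogue uses calloc(), which, like kcalloc(), takes the element count and element size separately and returns zero-filled memory. The struct and function names below are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for the driver's array entries. */
struct sketch_entry {
    uint32_t idx;
    void *node;
};

/*
 * Allocate a zero-filled array of `count` entries.
 * sizeof(*entries) follows the pointer's type, so the size stays correct
 * even if the declared type of `entries` changes later.
 */
static struct sketch_entry *sketch_alloc_entries(size_t count)
{
    struct sketch_entry *entries;

    entries = calloc(count, sizeof(*entries));
    return entries; /* NULL on failure; the caller must check and free() */
}
```

In the kernel, the equivalent call would pass a GFP flag as a third argument, for example kcalloc(count, sizeof(*entries), GFP_KERNEL).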

IB/hfi1: Virtual Network Interface Controller (VNIC) HW support · 2280740f

Committed by Vishwanathapura, Niranjana on Apr 12, 2017

HFI1 HW specific support for VNIC functionality.
Dynamically allocate a set of contexts for VNIC when the first vnic
port is instantiated. Allocate VNIC contexts from user contexts pool
and return them back to the same pool while freeing up. Set aside
enough MSI-X interrupts for VNIC contexts and assign them when the
contexts are allocated. On the receive side, use an RSM rule to
spread TCP/UDP streams among VNIC contexts.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Andrzej Kacprowski <andrzej.kacprowski@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2280740f

19 Feb, 2017 · 1 commit

IB/hfi1: Code reuse with memdup_copy · 1bb0d7b7

Committed by Michael J. Ruhl on Feb 08, 2017

Update several usages of kmalloc/user_copy to memdup_copy and
memdup_copy_nul.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1bb0d7b7
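
Note (illustrative sketch, not taken from the patch): the pattern being consolidated is "allocate a buffer, copy the caller's bytes into it, optionally NUL-terminate". The commit message calls the helpers memdup_copy and memdup_copy_nul; the kernel helpers with this behaviour are presumably memdup_user() and memdup_user_nul(). The userspace function below is a hypothetical analogue that copies from an ordinary pointer rather than from user space.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate `len` bytes from `src` into a new buffer and append a NUL,
 * so the result can be treated as a C string.
 * Returns NULL on allocation failure or if len + 1 would overflow.
 */
static char *sketch_memdup_nul(const void *src, size_t len)
{
    char *p;

    if (len == SIZE_MAX)
        return NULL;

    p = malloc(len + 1);
    if (!p)
        return NULL;

    memcpy(p, src, len);
    p[len] = '\0';
    return p;
}
```

Folding allocation, copy and termination into one helper leaves a single error path at each call site instead of two.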

03 Aug, 2016 · 8 commits

IB/hfi1: Fix memory leak during unexpected shutdown · 2677a768

Committed by Ira Weiny on Jul 28, 2016

During an unexpected shutdown, references to tid_rb_node were NULL'ed out
without properly being released.

Fix this by calling clear_tid_node in the mmu notifier remove callback
rather than after these callbacks are called.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2677a768

IB/hfi1: Remove unneeded mm argument in remove function · 082b3532

Committed by Dean Luick on Jul 28, 2016

The reworked mmu_rb interface allows the unused mm argument to be removed.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

082b3532

IB/hfi1: Consistently call ops->remove outside spinlock · b85ced91

Committed by Dean Luick on Jul 28, 2016

The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock.  In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages().

This had 2 problems.  First this would attempt to acquire the mmap_sem
under a spin lock.  Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.

Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock.  This also allows the correct
mm to be used in the remove callback without fear of deadlock.

Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.

Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.
Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b85ced91
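
Note (illustrative sketch, not taken from the patch): a common way to call callbacks outside a lock is to detach the nodes onto a private list while the lock is held, drop the lock, and only then invoke the callback on each node. The example below shows that pattern in userspace. All names are hypothetical, and an integer flag stands in for the spinlock so that the sketch can record whether a callback ever ran while the "lock" was held.

```c
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next;
    int removed;               /* set by the remove callback */
};

struct sketch_handler {
    int locked;                /* stand-in for a spinlock: 1 while held */
    struct sketch_node *head;  /* nodes tracked by the handler */
    int callbacks_under_lock;  /* counts callbacks that ran under the lock */
};

/* Stand-in for ops->remove(); it may sleep, so it must run unlocked. */
static void sketch_remove_cb(struct sketch_handler *h, struct sketch_node *n)
{
    if (h->locked)
        h->callbacks_under_lock++;
    n->removed = 1;
}

static void sketch_unregister(struct sketch_handler *h)
{
    struct sketch_node *del_list, *n;

    h->locked = 1;             /* spin_lock analogue */
    del_list = h->head;        /* detach every node while locked */
    h->head = NULL;
    h->locked = 0;             /* spin_unlock analogue */

    /* The lock is no longer held, so callbacks are free to block. */
    while (del_list) {
        n = del_list;
        del_list = n->next;
        n->next = NULL;
        sketch_remove_cb(h, n);
    }
}
```

Because the detached list is private to the caller, no other thread can reach those nodes after the lock is dropped, which is what makes the deferred callbacks safe in this pattern.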

IB/hfi1: Fix TID caching actions · 622c202c

Committed by Dean Luick on Jul 28, 2016

Per file descriptor TID caching actions depend on a global that can
change midway through the lifetime of that file descriptor.

Make the use of caching consistent for the life of the file descriptor
by using the presence of the cache handler to decide when to use the cache
functions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

622c202c

IB/hfi1: Make the cache handler own its rb tree root · e0b09ac5

Committed by Dean Luick on Jul 28, 2016

The objects which use cache handling should reference their own handler
object, not the internal data structure it uses to track the nodes.

Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owner's needs.

This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e0b09ac5

IB/hfi1: Make use of mm consistent · 3faa3d9a

Committed by Ira Weiny on Jul 28, 2016

The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed.  The driver
incorrectly assumes that the close will always happen from the same
context as the open.  In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context.  In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.

Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd.  Note: this
patch leaves 1 use of current->mm in place.  This use is removed in a
follow-on patch because other functional changes were required prior to
that use being removed.

If registration fails, there is no reason to keep the handler object
around.  Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.
Suggested-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3faa3d9a

IB/hfi1: Rename TID mmu_rb_* functions · a7cd2dc5

Committed by Dean Luick on Jul 28, 2016

Clarify the names of the TID mmu functions.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a7cd2dc5

IB/hfi1: Remove unused sub-context parameter · 5ed3b15b

Committed by Ira Weiny on Jul 28, 2016

subctxt is not used, so remove it.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5ed3b15b

26 May, 2016 · 1 commit

IB/hfi1: Move driver out of staging · f48ad614

Committed by Dennis Dalessandro on May 19, 2016

The TODO list for the hfi1 driver was completed during 4.6. In addition,
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to move the driver out of
staging and into the drivers/infiniband sub-tree.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f48ad614

29 Apr, 2016 · 3 commits

IB/hfi1: Fix memory leak in user ExpRcv and SDMA · 0ad2d3d0

Committed by Mitko Haralanov on Apr 12, 2016

The driver had two memory leaks - one in the user
expected receive code and one in SDMA buffer cache.

The leak in the expected receive code only showed up
when the user/admin had set ulimit sufficiently low
and the driver did not have enough room in the cache
before hitting the limit of allowed cachable memory.

When this condition occurred, the driver returned
early signaling userland that it needed to free some
buffers to free up room in the cache.

The bug was that the driver was not cleaning up
allocated memory prior to returning early.

The leak in the SDMA buffer cache could occur (even
though it never did) when the insertion of a buffer
node in the interval RB tree failed. In this case, the
driver failed to unpin the pages of the node, instead
erroneously returning success.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0ad2d3d0

IB/hfi1: Don't attempt to free resources if initialization failed · 94158442

Committed by Mitko Haralanov on Apr 20, 2016

Attempting to free resources which have not been allocated and
initialized properly led to the following kernel backtrace:

    BUG: unable to handle kernel NULL pointer dereference at           (null)
    IP: [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    PGD 852a43067 PUD 85d4a6067 PMD 0
    Oops: 0000 [#1] SMP
    CPU: 0 PID: 2831 Comm: osu_bw Tainted: G          IO 3.12.18-wfr+ #1
    task: ffff88085b15b540 ti: ffff8808588fe000 task.ti: ffff8808588fe000
    RIP: 0010:[<ffffffffa09658fe>]  [<ffffffffa09658fe>] unlock_exp_tids.isra.8+0x2e/0x120 [hfi1]
    RSP: 0018:ffff8808588ffde0  EFLAGS: 00010282
    RAX: 0000000000000000 RBX: ffff880858a31800 RCX: 0000000000000000
    RDX: ffff88085d971bc0 RSI: ffff880858a318f8 RDI: ffff880858a318c0
    RBP: ffff8808588ffe20 R08: 0000000000000000 R09: 0000000000000000
    R10: ffff88087ffd6f40 R11: 0000000001100348 R12: ffff880852900000
    R13: ffff880858a318c0 R14: 0000000000000000 R15: ffff88085d971be8
    FS:  00007f4674e83740(0000) GS:ffff88087f400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000085c377000 CR4: 00000000001407f0
    Stack:
     ffffffffa0941a71 ffff880858a318f8 ffff88085d971bc0 ffff880858a31800
     ffff880852900000 ffff880858a31800 00000000003ffff7 ffff88085d971bc0
     ffff8808588ffe60 ffffffffa09663fc ffff8808588ffe60 ffff880858a31800
    Call Trace:
     [<ffffffffa0941a71>] ? find_mmu_handler+0x51/0x70 [hfi1]
     [<ffffffffa09663fc>] hfi1_user_exp_rcv_free+0x6c/0x120 [hfi1]
     [<ffffffffa0932809>] hfi1_file_close+0x1a9/0x340 [hfi1]
     [<ffffffff8116c189>] __fput+0xe9/0x270
     [<ffffffff8116c35e>] ____fput+0xe/0x10
     [<ffffffff81065707>] task_work_run+0xa7/0xe0
     [<ffffffff81002969>] do_notify_resume+0x59/0x80
     [<ffffffff814ffc1a>] int_signal+0x12/0x17

This commit re-arranges the context initialization code in a way that
would allow for context event flags to be used to determine whether
the context has been successfully initialized.

In turn, this can be used to skip the resource de-allocation if they
were never allocated in the first place.

Fixes: 3abb33ac ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

94158442

IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643

Committed by Mitko Haralanov on Apr 12, 2016

There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken dereferences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f19bd643

22 Mar, 2016 · 6 commits

IB/hfi1: Switch to using the pin query function · a7922f7d

Committed by Mitko Haralanov on Mar 08, 2016

Use the new function to query whether the expected receive
user buffer can be pinned successfully. This requires that
a new variable be added to the hfi1_filedata structure used
to hold the number of pages pinned by the expected receive
code.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a7922f7d

IB/hfi1: Specify mm when releasing pages · bd3a8947

Committed by Mitko Haralanov on Mar 08, 2016

This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, in some
cases, namely when unpinning pages due to an MMU notifier call,
we do not want to drop into that code block, as it would cause a
deadlock (the MMU notifiers take the process' mmap_sem prior to
calling the callbacks).

By allowing the caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bd3a8947

IB/hfi1: Remove compare callback · b8718e2e

Committed by Mitko Haralanov on Mar 08, 2016

Interval RB trees provide their own searching function,
which also takes care of determining the path through
the tree that should be taken.

This makes the compare callback unnecessary.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b8718e2e

IB/hfi1: Notify remove MMU/RB callback of calling context · 909e2cd0

Committed by Mitko Haralanov on Mar 08, 2016

Tell the remove MMU/RB callback if it's being called as
part of a memory invalidation or not. This can be important
in preventing a deadlock if the remove callback attempts to
take the mmap_sem semaphore because the kernel's MMU
invalidation functions have already taken it.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

909e2cd0

IB/hfi1: Remove the use of add/remove RB function pointers · 368f2b59

Committed by Mitko Haralanov on Mar 08, 2016

The usage of function pointers for RB node insertion
and removal in the expected receive code path was
meant to be a small performance optimization. However,
maintaining it, especially with the new MMU API, would
become more troublesome as the API is extended.

Since the performance optimization is minor, remove the
function pointers and replace them with direct calls.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

368f2b59

IB/hfi1: Re-factor MMU notification code · 06e0ffa6

Committed by Mitko Haralanov on Mar 08, 2016

The MMU notification code added to the
expected receive side has been re-factored and
split into its own file. This was done in
order to make the code more general and, therefore,
usable by other parts of the driver.

The caching behavior remains the same. However,
the handling of the RB tree (insertion, deletions,
and searching) as well as the MMU invalidation
processing is now handled by functions in the
mmu_rb.[ch] files.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

06e0ffa6

11 Mar, 2016 · 2 commits

staging/rdma/hfi1: Fix header · 05d6ac1d

Committed by Jubin John on Feb 14, 2016

Fix the header by moving the copyright notice out of the license text
and to the top of the header. Also, update the copyright date.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

05d6ac1d

staging/rdma/hfi1: Improve performance of TID cache look up · a92ba6d6

Committed by Mitko Haralanov on Feb 03, 2016

When TID caching was enabled, the way the driver found
RB nodes when PSM was unprogramming TID entries was by
traversing the RB tree, looking for a match on the
RcvArray entry index.

The performance of this algorithm was not only poor but
also inconsistent depending on how many RB nodes would
have to be traversed before a match was found.

The lower performance was especially evident in cases where
there was a cache miss with the cache full, requiring the
unprogramming of several TID entries.

This commit changes how RB nodes are looked up when being
freed by PSM to an index-based lookup into a flat array on
the index of the RcvArray entry. This turns the entire
look-up process into an O(1) algorithm.

Special care needs to be taken for situations when TID
caching is disabled. In those cases, there is no need to
insert the RB nodes into an actual RB tree. Since the entire
RcvArray management mechanism is managed by an index-based
algorithm, the RB nodes can be saved into the flat array,
making both "insertion" and "removal" faster.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a92ba6d6
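
Note (illustrative sketch, not taken from the patch): an index-based lookup keyed on the RcvArray entry index can be pictured as an array of node pointers, where the slot is the entry index minus the first index owned by the context. The struct names, the fixed table size and the expected_base field are assumptions chosen for a compact userspace example.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ENTRIES 8   /* assumed number of RcvArray entries per context */

struct sketch_tid_node {
    uint32_t rcventry;     /* RcvArray entry index backing this node */
};

struct sketch_ctx {
    uint32_t expected_base; /* first RcvArray index owned by the context */
    struct sketch_tid_node *entry_to_rb[SKETCH_ENTRIES];
};

/* Record a node in its slot; returns 0 on success, -1 if out of range. */
static int sketch_store(struct sketch_ctx *ctx, struct sketch_tid_node *node)
{
    /* Unsigned wrap-around makes indices below the base fail the check. */
    uint32_t idx = node->rcventry - ctx->expected_base;

    if (idx >= SKETCH_ENTRIES)
        return -1;
    ctx->entry_to_rb[idx] = node;
    return 0;
}

/* O(1) lookup by RcvArray entry index; NULL if absent or out of range. */
static struct sketch_tid_node *sketch_lookup(const struct sketch_ctx *ctx,
                                             uint32_t rcventry)
{
    uint32_t idx = rcventry - ctx->expected_base;

    if (idx >= SKETCH_ENTRIES)
        return NULL;
    return ctx->entry_to_rb[idx];
}
```

The lookup cost no longer depends on how many nodes exist, which matches the consistency argument made in the commit message; the trade-off is one pointer of storage per RcvArray entry.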

01 Mar, 2016 · 5 commits

staging/hfi1: Enable TID caching feature · 0b091fb3

Committed by Mitko Haralanov on Feb 05, 2016

This commit "flips the switch" on the TID caching feature
implemented in this patch series.

As well as enabling the new feature by tying the new function
with the PSM API, it also cleans up the old unneeded code,
data structure members, and variables.

Due to differences in operation and information, the tracing
functions related to expected receives had to be changed. This
patch includes these changes.

The tracing function changes could not be split into a separate
commit without including both tracing variants at the same time.
This would have caused other complications and ugliness.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0b091fb3

staging/hfi1: Add TID entry program function body · 7e7a436e

Committed by Mitko Haralanov on Feb 05, 2016

The previous patch in the series added the free/invalidate
function bodies. Now, it's time for the programming side.

This large function takes the user's buffer, breaks it up
into manageable chunks, allocates enough RcvArray groups
and programs the chunks into the RcvArray entries in the
hardware.

With this function, the TID caching functionality is implemented.
However, it is still unused. The switch will come in a later
patch in the series, which will remove the old functionality and
switch the driver over to TID caching.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7e7a436e

staging/hfi1: Add TID free/clear function bodies · 455d7f1a

Committed by Mitko Haralanov on Feb 05, 2016

Up to now, the functions which cleared the programmed
TID entries and gave PSM the list of invalidated TID entries
were just stubs. With this commit, the bodies of these
functions are added.

This commit is a bit asymmetric as it only contains the
free code path. This is done on purpose to help with patch
reviews as the programming code path is much longer.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

455d7f1a

staging/hfi1: Add MMU notifier callback function · b5eb3b2f

Committed by Mitko Haralanov on Feb 05, 2016

TID caching will rely on the MMU notifier to be told
when memory is being invalidated. When the callback
is called, the driver will find all RcvArray entries
that span the invalidated buffer and "schedule" them
to be freed by the PSM library.

This function is currently unused and is being added
in preparation for the TID caching feature.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b5eb3b2f

staging/hfi1: Add TID cache receive init and free funcs · 3abb33ac

Committed by Mitko Haralanov on Feb 05, 2016

The upcoming TID caching feature requires different data
structures and, by extension, different initialization for each
of the MPI processes.

The two new functions (currently unused) perform the required
initialization and freeing of required resources and structures.
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3abb33ac

openeuler / Kernel · synced successfully nearly 2 years ago
