提交 · a383f8ec552c9af5066eb488cc7a2d8b3994151d · openeuler / Kernel

03 8月, 2016 9 次提交

IB/hfi1: Release node on insert failure · a383f8ec

由 Dean Luick 提交于 7月 28, 2016

If unable to insert node into the RB tree cache, node will be freed
before returning from the function.  Null out iovec's pointer to node
so iovec does not try to free it later.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a383f8ec

IB/hfi1: Validate SDMA user iovector count · 9ff73c87

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9ff73c87

IB/hfi1: Validate SDMA user request index · 4fa0d22c

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fa0d22c

IB/hfi1: Prevent null pointer dereference · 53445bb3

由 Ira Weiny 提交于 7月 28, 2016

If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

53445bb3

IB/hfi1: Make iovec loop index easy to understand · ff4ce9bd

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ff4ce9bd

IB/hfi1: Use "false" not 0 · 639297b4

由 Ira Weiny 提交于 7月 28, 2016

For bool parameters "false" should be used
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

639297b4

IB/hfi1: Fix minor format error · 72720ddf

由 Ira Weiny 提交于 7月 28, 2016

Brackets should be on the next line of a function
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

72720ddf

IB/hfi1: Allow for non-double word multiple message sizes for user SDMA · c4929802

由 Ira Weiny 提交于 7月 27, 2016

The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
Reviewed-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c4929802

IB/hfi1: Improve SDMA engine assignment for user SDMA · 14833b8c

由 Jianxin Xiong 提交于 7月 01, 2016

Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.

This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless how the "dlid" values are distributed.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NTadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

14833b8c

07 6月, 2016 2 次提交

IB/hfi1: Suppress sparse warnings · 55c40648

由 Bart Van Assche 提交于 6月 03, 2016

Avoid that sparse reports the following warnings for the hfi1 driver:

trace.c:217:13: warning: no previous prototype for ‘print_u64_array’ [-Wmissing-prototypes]
user_sdma.c:1361:17: warning: dubious: !x & y
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

55c40648

IB/hfi1: Use bit 0 instead of bit 1 · d55215c5

由 Bart Van Assche 提交于 6月 03, 2016

The first argument of test_bit() and clear_bit() is a bit number and
not a bitmask. Hence change that first argument from (1 << 0) into 0.
This patch avoids that smatch reports the following warnings:

user_sdma.c:1059: sdma_cache_evict() warn: test_bit() takes a bit number
user_sdma.c:1590: sdma_rb_remove() warn: test_bit() takes a bit number
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d55215c5

26 5月, 2016 3 次提交

IB/hfi1: Move driver out of staging · f48ad614

由 Dennis Dalessandro 提交于 5月 19, 2016

The TODO list for the hfi1 driver was completed during 4.6. In addition
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to remove the driver from
staging and into the drivers/infiniband sub-tree.
Reviewed-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f48ad614

IB/hfi1: Fix bug that blocks process on exit after port bounce · b583faf4

由 Jianxin Xiong 提交于 5月 19, 2016

During the processing of a user SDMA request, if there was an
error before the request counter was increased, the state of
the packet queue could be updated incorrectly, causing the
counter to underflow. As the result, the process could get
stuck later since the counter could never get back to 0.

This patch adds a condition to guard the packet queue update
so that the counter is only decreased if it has been increased
before the error happens.
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b583faf4

IB/hfi1: Fix an interval RB node reference count leak · 9565c6a3

由 Mitko Haralanov 提交于 5月 19, 2016

Commit e88c9271 ("IB/hfi1: Fix buffer cache corner case which
may cause corruption") introduced a bug which may cause a reference
count of a interval RB node to be leaked in the case where an SDMA
transfer from that node completes at the same time as the node is
being extended.

If a node is being extended, it is first removed from the RB tree
in order to be processed without the risk of an invalidation event
removing the node at the same time.

If a SDMA completion happens during that time, the completion handler
will fail to find the node in the RB tree and, therefore, fail to
correctly decrement its refcount. This leaves the node in the tree and
its pages pinned for the duration of the user process.

To prevent this from happening the io vector adds a reference to the
RB node, which is used during the SDMA completion instead of looking
up the node in the RB tree.

This change adds a performance improvement as a side effect by avoiding
the RB tree lookup.

Fixes: e88c9271 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9565c6a3

29 4月, 2016 8 次提交

IB/hfi1: Check P_KEY for all sent packets from user mode · e38d1e4f

由 Sebastian Sanchez 提交于 4月 12, 2016

Add the P_KEY check for user-context mechanism for
both PIO and SDMA. For PIO, the
SendCtxtCheckEnable.DisallowKDETHPackets is set by
default. When the P_KEY is set,
SendCtxtCheckEnable.DisallowKDETHPackets is cleared.
For SDMA, a software check was included. This change
requires user processes to set the P_KEY before sending
any packets, otherwise, the sent packet will fail. The
original submission didn't have this check but it's
required.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NMikto Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e38d1e4f

IB/hfi1: Fix buffer cache races which may cause corruption · e88c9271

由 Mitko Haralanov 提交于 4月 12, 2016

There are two possible causes for node/memory corruption both
of which are related to the cache eviction algorithm. One way
to cause corruption is due to the asynchronous nature of the
MMU invalidation and the locking used when invalidating node.

The MMU invalidation routine would temporarily release the
RB tree lock to avoid a deadlock. However, this would allow
the eviction function to take the lock resulting in the removal
of cache nodes.

If the node being removed by the eviction code is the same as
the node being invalidated, the result is use after free.

The same is true in the other direction due to the temporary
release of the eviction list lock in the eviction loop.

Another corner case exists when dealing with the SDMA buffer
cache that could cause memory corruption of kernel memory.
The most common way, in which this corruption exhibits itself
is a linked list node corruption. In that case, the kernel will
complain that a node with poisoned pointers is being removed.
The fact that the pointers are already poisoned means that the
node has already been removed from the list.

To root cause of this corruption was a mishandling of the
eviction list maintained by the driver. In order for this
to happen four conditions need to be satisfied:

   1. A node describing a user buffer already exists in the
      interval RB tree,
   2. The beginning of the current user buffer matches that
      node but is bigger. This will cause the node to be
      extended.
   3. The amount of cached buffers is close or at the limit
      of the buffer cache size.
   4. The node has dropped close to the end of the eviction
      list. This will cause the node to be considered for
      eviction.

If all of the above conditions have been satisfied, it is
possible for the eviction algorithm to evict the current node,
which will free the node without the driver knowing.

To solve both issues described above:
   - the locking around the MMU invalidation loop and cache
     eviction loop has been improved so locks are not released in
     the loop body,
   - a new RB function is introduced which will "atomically" find
     and remove the matching node from the RB tree, preventing the
     MMU invalidation loop from touching it, and
   - the node being extended by the pin_vector_pages() function is
     removed from the eviction list prior to calling the eviction
     function.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e88c9271

IB/hfi1: Extract and reinsert MMU RB node on lookup · f53af85e

由 Mitko Haralanov 提交于 4月 12, 2016

The page pinning function, which also maintains the pin cache,
behaves one of two ways when an exact buffer match is not found:
  1. If no node is not found (a buffer with the same starting address
     is not found in the cache), a new node is created, the buffer
     pages are pinned, and the node is inserted into the RB tree, or
  2. If a node is found but the buffer in that node is a subset of
     the new user buffer, the node is extended with the new buffer
     pages.

Both modes of operation require (re-)insertion into the interval RB
tree.

When the node being inserted is a new node, the operations are pretty
simple. However, when the node is already existing and is being
extended, special care must be taken.

First, we want to guard against an asynchronous attempt to
delete the node by the MMU invalidation notifier. The simplest way to
do this is to remove the node from the RB tree, preventing the search
algorithm from finding it.

Second, the node needs to be re-inserted so it lands in the proper place
in the tree and the tree is correctly re-balanced. This also requires
the node to be removed from the RB tree.

This commit adds the hfi1_mmu_rb_extract() function, which will search
for a node in the interval RB tree matching an address and length and
remove it from the RB tree if found. This allows for both of the above
special cases be handled in a single step.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f53af85e

IB/hfi1: Correctly compute node interval · de79093b

由 Mitko Haralanov 提交于 4月 12, 2016

The computation of the interval of an interval RB node
was incorrect leading to data corruption due to the RB
search algorithm not properly finding the all RB nodes
in an MMU invalidation interval.

The problem stemmed from the fact that the beginning
address of the node's range was being aligned to a page
boundary. For certain buffer sizes, this would lead to
a end address calculation that was off by 1 page.

An important aspect of keeping the RB same is also
updating the node's range in the case it's being extended.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

de79093b

IB/hfi1: Fix memory leak in user ExpRcv and SDMA · 0ad2d3d0

由 Mitko Haralanov 提交于 4月 12, 2016

The driver had two memory leaks - one in the user
expected receive code and one in SDMA buffer cache.

The leak in the expected receive code only showed up
when the user/admin had set ulimit sufficiently low
and the driver did not have enough room in the cache
before hitting the limit of allowed cachable memory.

When this condition occurred, the driver returned
early signaling userland that it needed to free some
buffers to free up room in the cache.

The bug was that the driver was not cleaning up
allocated memory prior to returning early.

The leak in the SDMA buffer cache could occur (even
though it never did), when the insertion of a buffer
node in the interval RB tree failed. In this case, the
driver failed to unpin the pages of the node instead
erroneously returning success.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0ad2d3d0

IB/hfi1: Don't remove list entries if they are not in a list · 4787bc5e

由 Mitko Haralanov 提交于 4月 12, 2016

The SDMA cache logic maintains an eviction list which is ordered
by most recently used user buffers. Upon errors or buffer freeing,
the list nodes were unconditionally being deleted. This would lead
to list corruption warnings if the nodes were never inserted in the
eviction list to begin with.

This commit prevents this by checking that the nodes are already
part of the eviction list.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4787bc5e

IB/hfi1: Prevent unpinning of wrong pages · 849e3e93

由 Mitko Haralanov 提交于 4月 12, 2016

The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.

In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.

There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.

This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

849e3e93

IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643

由 Mitko Haralanov 提交于 4月 12, 2016

There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f19bd643

22 3月, 2016 3 次提交

IB/hfi1: Add SDMA cache eviction algorithm · 5511d781

由 Mitko Haralanov 提交于 3月 08, 2016

This commit adds a cache eviction algorithm for the SDMA
user buffer cache.

Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is put at the beginning of the list. Less frequently
used nodes naturally move to the tail of the list.

When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least used cache nodes will be removed
from the cache.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5511d781

IB/hfi1: Specify mm when releasing pages · bd3a8947

由 Mitko Haralanov 提交于 3月 08, 2016

This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, is some
cases, namely when unpinning pages due to a MMU notifier call,
we want to drop into that code block as it will cause a deadlock
(the MMU notifiers take the process' mmap_sem prior to calling
the callbacks).

By allowing to caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bd3a8947

IB/hfi1: Implement SDMA-side buffer caching · 5cd3a88d

由 Mitko Haralanov 提交于 3月 08, 2016

Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeatedly pinning the pages of buffers, which
are being re-used by the application.

While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5cd3a88d

12 3月, 2016 2 次提交

staging: rdma: hfi1: Replace ALIGN with PAGE_ALIGN · 84449917

由 Amitoj Kaur Chawla 提交于 3月 04, 2016

mm.h contains a helper function PAGE_ALIGN which aligns the pointer
to the page boundary instead of using ALIGN(expression, PAGE_SIZE)

This change was made with the help of the following Coccinelle
semantic patch:
//<smpl>
@@
expression e;
symbol PAGE_SIZE;
@@
(
- ALIGN(e, PAGE_SIZE)
+ PAGE_ALIGN(e)
|
- IS_ALIGNED(e, PAGE_SIZE)
+ PAGE_ALIGNED(e)
)
//</smpl>
Signed-off-by: NAmitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

84449917

staging: rdma: hfi1: user_sdma.c: Drop void pointer cast · 16ccad04

由 Janani Ravichandran 提交于 2月 25, 2016

Void pointers need not be cast to other pointer types.
Semantic patch used:

@r@
expression x;
void *e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x) [...]
|
  ((T *)x)->f
|
- (T *)
  e
)
Signed-off-by: NJanani Ravichandran <janani.rvchndrn@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

16ccad04

11 3月, 2016 12 次提交

staging/rdma/hfi1: Fix header · 05d6ac1d

由 Jubin John 提交于 2月 14, 2016

Fix the header by moving the copyright notice out of the license text
and to the top of the header. Also, update the copyright date.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

05d6ac1d

staging/rdma/hfi1: Add braces on all arms of statement · e490974e

由 Jubin John 提交于 2月 14, 2016

Add braces on all arms of statements to fix checkpatch check:
CHECK: braces {} should be used on all arms of this statement
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e490974e

staging/rdma/hfi1: Fix code alignment · 17fb4f29

由 Jubin John 提交于 2月 14, 2016

Fix code alignment to fix checkpatch check:
CHECK: Alignment should match open parenthesis
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

17fb4f29

staging/rdma/hfi1: Fix block comments · 4d114fdd

由 Jubin John 提交于 2月 14, 2016

Fix block comments with proper formatting to fix checkpatch warnings:
WARNING: Block comments use * on subsequent lines
WARNING: Block comments use a trailing */ on a separate line
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4d114fdd

staging/rdma/hfi1: Remove blank line before close brace · 5161fc3e

由 Jubin John 提交于 2月 14, 2016

Remove extra blank line before close brace to fix checkpatch check:
CHECK: Blank lines aren't necessary before a close brace '}'
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5161fc3e

staging/rdma/hfi1: Remove space after cast · 50e5dcbe

由 Jubin John 提交于 2月 14, 2016

Remove the space after a cast to fix checkpatch check:
CHECK: No space is necessary after a cast
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

50e5dcbe

staging/rdma/hfi1: Add spaces around binary operators · 8638b77f

由 Jubin John 提交于 2月 14, 2016

Add spaces around binary operators.

Fixes checkpatch check:
CHECK: spaces preferred around that 'x'
where x is a binary operator
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

8638b77f

staging/rdma/hfi: fix CQ completion order issue · a545f530

由 Mike Marciniszyn 提交于 2月 14, 2016

The current implementation of the sdma_wait variable
has a timing hole that can cause a completion Q entry
to be returned from a pio send prior to an older
sdma packets completion queue entry.

The sdma_wait variable used to be decremented prior to
calling the packet complete routine.  The hole is between decrement
and the verbs completion where send engine using pio could return
a out of order completion in that window.

This patch closes the hole by allowing an API option to
specify an sdma_drained callback.   The atomic dec
is positioned after the complete callback to avoid the
window as long as the pio path doesn't execute when
there is a non-zero sdma count.
Reviewed-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a545f530

staging/rdma/hfi1: Fix bug that could block the process on context exit · a402d6ab

由 Mitko Haralanov 提交于 2月 03, 2016

A race was discovred in the user SDMA code, which could result
in an process being stuck in the kernel call indefinitely in
certain error conditions.

If, during the processing of a user SDMA request, there was an
error *and* all outstanding SDMA descriptor had been completed
by the time the that error case was handled in the calling function,
the state of the packet queue would not get correctly updated
resulting in the process subsequently getting stuck, thinking that
there are more descriptors to be completed.

To handle this scenario, the driver now checks the submitted
packet count vs. the completed. If all submitted packets have also
been completed, the driver can safely free the request and signal
user level. Otherwise, this will be handled by the completion
callback.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a402d6ab

staging/rdma/hfi1: Improve performance of user SDMA · 0840aea9

由 Mitko Haralanov 提交于 2月 03, 2016

To facilitate locked page counting, the user SDMA
routines would maintain a list of io vectors, which
were freed in the completion callback and then unpin
the associated pages during the next call into the
kernel.

Since the size of this list was unbounded, doing this
was bad for performance because the driver ended up
spending too much time freeing the io vectors.

This commit changes how the io vector freeing is done
by moving the actual page unpinning in the callback and
maintaining a count of unpinned pages. This count can
then be used during the next call into the kernel to
update the mm->pinned_vm variable (since that requires
process context and the ability to sleep.)
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0840aea9

staging/rdma/hfi1: Properly determine error status of SDMA slots · c7cbf2fa

由 Mitko Haralanov 提交于 2月 03, 2016

To ensure correct operation between the driver and PSM
with respect to managing the SDMA request ring, it is
important that the status for a particular request slot
is set at the correct time. Otherwise, PSM can get out
of sync with the driver, which could lead to hangs or
errors on new requests.

Properly determining of when to set the error status of
a SDMA slot depends on knowing exactly when the last txreq
for that request has been completed. This in turn requires
that the driver knows exactly how many requests have been
generated and how many of those requests have been successfully
submitted to the SDMA queue.

The previous implementation of the mid-layer SDMA API did not
provide a way for the caller of sdma_send_txlist() to know how
many of the txreqs in the input list have actually been submitted
without traversing the list and counting. Since sdma_send_txlist()
already traverses the list in order to process it, requiring
such traversal in the caller is completely unnecessary. Therefore,
it is much easier to enhance sdma_send_txlist() to return the
number of successfully submitted txreqs.

This, in turn, allows the caller to accurately determine the
progress of the SDMA request and, therefore, correctly set the
error status at the right time.
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c7cbf2fa

staging/rdma/hfi1: Improve performance of SDMA transfers · 0f2d87d2

由 Mitko Haralanov 提交于 2月 03, 2016

Commit a0d40693 ("staging/rdma/hfi1: Add page lock limit
check for SDMA requests") added a mechanism to
delay the clean-up of user SDMA requests in order to facilitate
proper locked page counting.

This delayed processing was done using a kernel workqueue, which
meant that a kernel thread would have to spin up and take CPU
cycles to do the clean-up.

This proved detrimental to performance because now there are two
execution threads (the kernel workqueue and the user process)
needing cycles on the same CPU.

Performance-wise, it is much better to do as much of the clean-up
as can be done in interrupt context (during the callback) and do
the remaining work in-line during subsequent calls of the user
process into the driver.

The changes required to implement the above also significantly
simplify the entire SDMA completion processing code and eliminate
a memory corruption causing the following observed crash:

    [ 2881.703362] BUG: unable to handle kernel NULL pointer dereference at        (null)
    [ 2881.703389] IP: [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x18e0 [hfi1]
    [ 2881.703422] PGD 7d4d25067 PUD 77d96d067 PMD 0
    [ 2881.703427] Oops: 0000 [#1] SMP
    [ 2881.703431] Modules linked in:
    [ 2881.703504] CPU: 28 PID: 6668 Comm: mpi_stress Tainted: G           OENX 3.12.28-4-default #1
    [ 2881.703508] Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0044.090
    [ 2881.703512] task: ffff88077da8e0c0 ti: ffff880856772000 task.ti: ffff880856772000
    [ 2881.703515] RIP: 0010:[<ffffffffa02897e4>]  [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x
    [ 2881.703529] RSP: 0018:ffff880856773c48  EFLAGS: 00010287
    [ 2881.703531] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000002000
    [ 2881.703534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000002000
    [ 2881.703537] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
    [ 2881.703540] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 2881.703543] R13: 0000000000000000 R14: ffff88071e782e68 R15: ffff8810532955c0
    [ 2881.703546] FS:  00007f9c4375e700(0000) GS:ffff88107eec0000(0000) knlGS:0000000000000000
    [ 2881.703549] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2881.703551] CR2: 0000000000000000 CR3: 00000007d4cba000 CR4: 00000000003407e0
    [ 2881.703554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 2881.703556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 2881.703558] Stack:
    [ 2881.703559]  ffffffff00002000 ffff881000001800 ffffffff00000000 00000000000080d0
    [ 2881.703570]  0000000000000000 0000200000000000 0000000000000000 ffff88071e782db8
    [ 2881.703580]  ffff8807d4d08d80 ffff881053295600 0000000000000008 ffff88071e782fc8
    [ 2881.703589] Call Trace:
    [ 2881.703691]  [<ffffffffa028b5da>] hfi1_user_sdma_process_request+0x84a/0xab0 [hfi1]
    [ 2881.703777]  [<ffffffffa0255412>] hfi1_aio_write+0xd2/0x110 [hfi1]
    [ 2881.703828]  [<ffffffff8119e3d8>] do_sync_readv_writev+0x48/0x80
    [ 2881.703837]  [<ffffffff8119f78b>] do_readv_writev+0xbb/0x230
    [ 2881.703843]  [<ffffffff8119fab8>] SyS_writev+0x48/0xc0

This commit also addresses issues related to notification of user
processes of SDMA request slot availability. The slot should be
cleaned up first before the user processes is notified of its
availability.
Reviewed-by: NArthur Kepner <arthur.kepner@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0f2d87d2

23 2月, 2016 1 次提交

staging: rdma: hfi1: Remove header file · 84d899e7

由 Amitoj Kaur Chawla 提交于 2月 22, 2016

Remove duplicate include file. Found using includecheck.
Signed-off-by: NAmitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

84d899e7

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功