提交 · e730139b3464cc740c33131c872f7d173744ef11 · openeuler / Kernel

12 12月, 2016 1 次提交

IB/hfi1: Disable header suppression for short packets · e730139b

由 Jakub Pawlak 提交于 12月 07, 2016

For the received packets with payload less or equal 8DWS
RxDmaDataFifoRdUncErr is not reported. There is set RHF.EccErr
if the header is not suppressed. When such packet is detected
on the send side the header suppression mechanism is disabled
by clearing SH bit in the packet header.
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e730139b

02 10月, 2016 2 次提交

IB/hfi1: Add sysfs interface for affinity setup · 0cb2aa69

由 Tadeusz Struk 提交于 9月 25, 2016

Some users want more control over which cpu cores are being used by the
driver. For example, users might want to restrict the driver to some
specified subset of the cores so that they can appropriately partition
processes, irq handlers, and work threads.
To allow the user to fine tune system affinity settings new sysfs
attributes are introduced per sdma engine.  This patch adds a new
attribute type for sdma engine and a new cpu_list attribute.
When the user writes a cpu range to the cpu_list attribute the driver
will create an internal cpu->sdma map, which will be used later as a
look-up table to choose an optimal engine for a user requests.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: NJianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0cb2aa69

IB/hfi1: Fix the count of user packets submitted to an SDMA engine · 0b115ef1

由 Harish Chegondi 提交于 9月 06, 2016

Each user SDMA request coming into the driver may contain multiple packets.
Each user packet may use multiple SDMA descriptors to fill the send buffer.
The field seqsubmitted in struct user_sdma_request counts the number of
user packets submitted to an SDMA engine. Sometimes, the intermediate count
may not be updated properly. However, once all the packets' descriptors
are successfully submitted to the SDMA engine, the final count is updated
correctly. But, if only some of the packets are submitted to the engine due
to an error, the intermediate count doesn't reflect the partial number of
packets submitted to the SDMA engine. This can cause a hang later in the
code as the count of packets submitted to the SDMA engine doesn't match the
the count of packets processed by the SDMA engine.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0b115ef1

03 9月, 2016 1 次提交

IB/hfi1: Fix AHG KDETH Intr shift · af534939

由 Jubin John 提交于 8月 31, 2016

In the set_txreq_header_ahg(), The KDETH Intr bit is obtained from the
header in the user sdma request using a KDETH_GET shift and mask macro.
This value is then futher right shifted by 16 causing us to lose the
value i.e it is shifted to zero, leading to the following
smatch warning:
drivers/infiniband/hw/hfi1/user_sdma.c:1482 set_txreq_header_ahg()
warn: mask and shift to zero

The Intr bit should be left shifted into its correct position in the
KDETH header before the AHG update.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

af534939

03 8月, 2016 16 次提交

IB/hfi1: Remove unneeded mm argument in remove function · 082b3532

由 Dean Luick 提交于 7月 28, 2016

The reworked mmu_rb interface allows the unused mm argument to be removed.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

082b3532

IB/hfi1: Consistently call ops->remove outside spinlock · b85ced91

由 Dean Luick 提交于 7月 28, 2016

The ops->remove() callback was called by hfi1_mmu_unregister() with a
NULL mm argument while holding a spinlock.  In the case of sdma_rb_remove()
this caused it to pass current->mm to hfi1_release_user_pages()

This had 2 problems.  First this would attempt to acquire the mmap_sem
under a spin lock.  Second the use of current->mm is not always guaranteed
to be the proper mm when the fd is being closed.

Rather than depend on this implicit behavior we move all calls to
ops->remove outside of the spinlock.  This also allows the correct
mm to be used in the remove callback without fear of deadlock.

Because the MMU notifier is not guaranteed to hold mm->mmap_sem, but
usually does, we must delay all remove callbacks until out of the notifier,
when the callbacks can take the mmap_sem if they need to.

Code comments were added to clarify what the expectations are for the
users of the mmu rb tree.
Suggested-by: NJim Foraker <foraker1@llnl.gov>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b85ced91

IB/hfi1: Use evict mmu rb operation · b7df192f

由 Dean Luick 提交于 7月 28, 2016

Use the new cache evict operation in the SDMA code.  This allows the cache
to properly coordinate evicts and removes, preventing any race.  With this
change, the separate list, lock, and race flag are not needed.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b7df192f

IB/hfi1: Make the cache handler own its rb tree root · e0b09ac5

由 Dean Luick 提交于 7月 28, 2016

The objects which use cache handling should reference their own handler
object not the internal data structure it uses to track the nodes.

Have the "users" of the mmu notifier code pass opaque objects which can
then be properly used in the mmu callbacks depending on the owners needs.

This patch has the additional benefit that operations no longer require a
look up in a list to find the handlers.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e0b09ac5

IB/hfi1: Make use of mm consistent · 3faa3d9a

由 Ira Weiny 提交于 7月 28, 2016

The hfi1 driver registers a mmu_notifier callback when /dev/hfi1_* is
opened, and unregisters it when the device is closed.  The driver
incorrectly assumes that the close will always happen from the same
context as the open.  In particular, closes due to SIGKILL or OOM killer
activity may happen from a different context.  In these cases, the wrong
mm is passed to mmu_notifier_unregister(), which causes improper reference
counting for the victim mm, and eventual memory corruption.

Preserve the mm for all open file descriptors and use this mm rather than
current->mm for memory operations for the lifetime of that fd.  Note: this
patch leaves 1 use of current->mm in place.  This use is removed in a
follow on patch because other functional changes were required prior to
that use being removed.

If registration fails, there is no reason to keep the handler object
around.  Free the handler object rather than add it to the list to
prevent any mmu_notifier operations, including unregister, when
registration fails.
Suggested-by: NJim Foraker <foraker1@llnl.gov>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

3faa3d9a

IB/hfi1: Fix user SDMA racy user request claim · 7b3256e3

由 Dean Luick 提交于 7月 28, 2016

The user SDMA in-use claim bit is in the structure that gets zeroed out
once the claim is made.  Move the request in-use flag into its own bit
array and use that for atomic claims.  This cleans up the claim code and
removes any race possibility.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

7b3256e3

IB/hfi1: Fix error condition that needs to clean up · 9da7e9a7

由 Dean Luick 提交于 7月 28, 2016

If input validation fails, properly free the request before returning.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9da7e9a7

IB/hfi1: Release node on insert failure · a383f8ec

由 Dean Luick 提交于 7月 28, 2016

If unable to insert node into the RB tree cache, node will be freed
before returning from the function.  Null out iovec's pointer to node
so iovec does not try to free it later.
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

a383f8ec

IB/hfi1: Validate SDMA user iovector count · 9ff73c87

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9ff73c87

IB/hfi1: Validate SDMA user request index · 4fa0d22c

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4fa0d22c

IB/hfi1: Prevent null pointer dereference · 53445bb3

由 Ira Weiny 提交于 7月 28, 2016

If a context has not been assigned or assignment failed, pq may be NULL.
Move the unregister within the protection of the null check.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

53445bb3

IB/hfi1: Make iovec loop index easy to understand · ff4ce9bd

由 Dean Luick 提交于 7月 28, 2016

Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

ff4ce9bd

IB/hfi1: Use "false" not 0 · 639297b4

由 Ira Weiny 提交于 7月 28, 2016

For bool parameters "false" should be used
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

639297b4

IB/hfi1: Fix minor format error · 72720ddf

由 Ira Weiny 提交于 7月 28, 2016

Brackets should be on the next line of a function
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

72720ddf

IB/hfi1: Allow for non-double word multiple message sizes for user SDMA · c4929802

由 Ira Weiny 提交于 7月 27, 2016

The driver pads non-double word multiple message sizes but it doesn't
account for this padding when the packet length is calculated. Also, the
data length is miscalculated for message sizes less than 4 bytes due to
the bit representation in LRH. And there's a check for non-double word
multiple message sizes that prevents these messages from being sent.
This patch fixes length miscalculations and enables the functionality to
send non-double word multiple message sizes.
Reviewed-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NIra Weiny <ira.weiny@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c4929802

IB/hfi1: Improve SDMA engine assignment for user SDMA · 14833b8c

由 Jianxin Xiong 提交于 7月 01, 2016

Currently each user context is assigned a single SDMA engine
based on the VL, context id, and subcontext id. That means for
MPI applications, each rank can only use one SDMA engine for
all messages. This may create unwanted backup for independent
messages going to different destinations upon congestion at one
destination.

This patch adds the packet "dlid" to the formula of SDMA engine
selection for user SDMA requests. A simple hash table is used
to maintain even distribution among the available SDMA engines
regardless how the "dlid" values are distributed.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NTadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

14833b8c

07 6月, 2016 2 次提交

IB/hfi1: Suppress sparse warnings · 55c40648

由 Bart Van Assche 提交于 6月 03, 2016

Avoid that sparse reports the following warnings for the hfi1 driver:

trace.c:217:13: warning: no previous prototype for ‘print_u64_array’ [-Wmissing-prototypes]
user_sdma.c:1361:17: warning: dubious: !x & y
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

55c40648

IB/hfi1: Use bit 0 instead of bit 1 · d55215c5

由 Bart Van Assche 提交于 6月 03, 2016

The first argument of test_bit() and clear_bit() is a bit number and
not a bitmask. Hence change that first argument from (1 << 0) into 0.
This patch avoids that smatch reports the following warnings:

user_sdma.c:1059: sdma_cache_evict() warn: test_bit() takes a bit number
user_sdma.c:1590: sdma_rb_remove() warn: test_bit() takes a bit number
Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d55215c5

26 5月, 2016 3 次提交

IB/hfi1: Move driver out of staging · f48ad614

由 Dennis Dalessandro 提交于 5月 19, 2016

The TODO list for the hfi1 driver was completed during 4.6. In addition
other objections raised (which are far beyond what was in the TODO list)
have been addressed as well. It is now time to remove the driver from
staging and into the drivers/infiniband sub-tree.
Reviewed-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f48ad614

IB/hfi1: Fix bug that blocks process on exit after port bounce · b583faf4

由 Jianxin Xiong 提交于 5月 19, 2016

During the processing of a user SDMA request, if there was an
error before the request counter was increased, the state of
the packet queue could be updated incorrectly, causing the
counter to underflow. As the result, the process could get
stuck later since the counter could never get back to 0.

This patch adds a condition to guard the packet queue update
so that the counter is only decreased if it has been increased
before the error happens.
Reviewed-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

b583faf4

IB/hfi1: Fix an interval RB node reference count leak · 9565c6a3

由 Mitko Haralanov 提交于 5月 19, 2016

Commit e88c9271 ("IB/hfi1: Fix buffer cache corner case which
may cause corruption") introduced a bug which may cause a reference
count of a interval RB node to be leaked in the case where an SDMA
transfer from that node completes at the same time as the node is
being extended.

If a node is being extended, it is first removed from the RB tree
in order to be processed without the risk of an invalidation event
removing the node at the same time.

If a SDMA completion happens during that time, the completion handler
will fail to find the node in the RB tree and, therefore, fail to
correctly decrement its refcount. This leaves the node in the tree and
its pages pinned for the duration of the user process.

To prevent this from happening the io vector adds a reference to the
RB node, which is used during the SDMA completion instead of looking
up the node in the RB tree.

This change adds a performance improvement as a side effect by avoiding
the RB tree lookup.

Fixes: e88c9271 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NHarish Chegondi <harish.chegondi@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9565c6a3

29 4月, 2016 8 次提交

IB/hfi1: Check P_KEY for all sent packets from user mode · e38d1e4f

由 Sebastian Sanchez 提交于 4月 12, 2016

Add the P_KEY check for user-context mechanism for
both PIO and SDMA. For PIO, the
SendCtxtCheckEnable.DisallowKDETHPackets is set by
default. When the P_KEY is set,
SendCtxtCheckEnable.DisallowKDETHPackets is cleared.
For SDMA, a software check was included. This change
requires user processes to set the P_KEY before sending
any packets, otherwise, the sent packet will fail. The
original submission didn't have this check but it's
required.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NMikto Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e38d1e4f

IB/hfi1: Fix buffer cache races which may cause corruption · e88c9271

由 Mitko Haralanov 提交于 4月 12, 2016

There are two possible causes for node/memory corruption both
of which are related to the cache eviction algorithm. One way
to cause corruption is due to the asynchronous nature of the
MMU invalidation and the locking used when invalidating node.

The MMU invalidation routine would temporarily release the
RB tree lock to avoid a deadlock. However, this would allow
the eviction function to take the lock resulting in the removal
of cache nodes.

If the node being removed by the eviction code is the same as
the node being invalidated, the result is use after free.

The same is true in the other direction due to the temporary
release of the eviction list lock in the eviction loop.

Another corner case exists when dealing with the SDMA buffer
cache that could cause memory corruption of kernel memory.
The most common way, in which this corruption exhibits itself
is a linked list node corruption. In that case, the kernel will
complain that a node with poisoned pointers is being removed.
The fact that the pointers are already poisoned means that the
node has already been removed from the list.

To root cause of this corruption was a mishandling of the
eviction list maintained by the driver. In order for this
to happen four conditions need to be satisfied:

   1. A node describing a user buffer already exists in the
      interval RB tree,
   2. The beginning of the current user buffer matches that
      node but is bigger. This will cause the node to be
      extended.
   3. The amount of cached buffers is close or at the limit
      of the buffer cache size.
   4. The node has dropped close to the end of the eviction
      list. This will cause the node to be considered for
      eviction.

If all of the above conditions have been satisfied, it is
possible for the eviction algorithm to evict the current node,
which will free the node without the driver knowing.

To solve both issues described above:
   - the locking around the MMU invalidation loop and cache
     eviction loop has been improved so locks are not released in
     the loop body,
   - a new RB function is introduced which will "atomically" find
     and remove the matching node from the RB tree, preventing the
     MMU invalidation loop from touching it, and
   - the node being extended by the pin_vector_pages() function is
     removed from the eviction list prior to calling the eviction
     function.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e88c9271

IB/hfi1: Extract and reinsert MMU RB node on lookup · f53af85e

由 Mitko Haralanov 提交于 4月 12, 2016

The page pinning function, which also maintains the pin cache,
behaves one of two ways when an exact buffer match is not found:
  1. If no node is not found (a buffer with the same starting address
     is not found in the cache), a new node is created, the buffer
     pages are pinned, and the node is inserted into the RB tree, or
  2. If a node is found but the buffer in that node is a subset of
     the new user buffer, the node is extended with the new buffer
     pages.

Both modes of operation require (re-)insertion into the interval RB
tree.

When the node being inserted is a new node, the operations are pretty
simple. However, when the node is already existing and is being
extended, special care must be taken.

First, we want to guard against an asynchronous attempt to
delete the node by the MMU invalidation notifier. The simplest way to
do this is to remove the node from the RB tree, preventing the search
algorithm from finding it.

Second, the node needs to be re-inserted so it lands in the proper place
in the tree and the tree is correctly re-balanced. This also requires
the node to be removed from the RB tree.

This commit adds the hfi1_mmu_rb_extract() function, which will search
for a node in the interval RB tree matching an address and length and
remove it from the RB tree if found. This allows for both of the above
special cases be handled in a single step.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f53af85e

IB/hfi1: Correctly compute node interval · de79093b

由 Mitko Haralanov 提交于 4月 12, 2016

The computation of the interval of an interval RB node
was incorrect leading to data corruption due to the RB
search algorithm not properly finding the all RB nodes
in an MMU invalidation interval.

The problem stemmed from the fact that the beginning
address of the node's range was being aligned to a page
boundary. For certain buffer sizes, this would lead to
a end address calculation that was off by 1 page.

An important aspect of keeping the RB same is also
updating the node's range in the case it's being extended.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

de79093b

IB/hfi1: Fix memory leak in user ExpRcv and SDMA · 0ad2d3d0

由 Mitko Haralanov 提交于 4月 12, 2016

The driver had two memory leaks - one in the user
expected receive code and one in SDMA buffer cache.

The leak in the expected receive code only showed up
when the user/admin had set ulimit sufficiently low
and the driver did not have enough room in the cache
before hitting the limit of allowed cachable memory.

When this condition occurred, the driver returned
early signaling userland that it needed to free some
buffers to free up room in the cache.

The bug was that the driver was not cleaning up
allocated memory prior to returning early.

The leak in the SDMA buffer cache could occur (even
though it never did), when the insertion of a buffer
node in the interval RB tree failed. In this case, the
driver failed to unpin the pages of the node instead
erroneously returning success.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

0ad2d3d0

IB/hfi1: Don't remove list entries if they are not in a list · 4787bc5e

由 Mitko Haralanov 提交于 4月 12, 2016

The SDMA cache logic maintains an eviction list which is ordered
by most recently used user buffers. Upon errors or buffer freeing,
the list nodes were unconditionally being deleted. This would lead
to list corruption warnings if the nodes were never inserted in the
eviction list to begin with.

This commit prevents this by checking that the nodes are already
part of the eviction list.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

4787bc5e

IB/hfi1: Prevent unpinning of wrong pages · 849e3e93

由 Mitko Haralanov 提交于 4月 12, 2016

The routine used by the SDMA cache to handle already
cached nodes can extend an already existing node.

In its error handling code, the routine will unpin pages
when not all pages of the buffer extension were pinned.

There was a bug in that part of the routine, which would
mistakenly unpin pages from the original set rather than
the newly pinned pages.

This commit fixes that bug by offsetting the page array
to the proper place pointing at the beginning of the newly
pinned pages.
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

849e3e93

IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643

由 Mitko Haralanov 提交于 4月 12, 2016

There is a potential kernel crash when the MMU notifier calls the
invalidation routines in the hfi1 pinned page caching code for sdma.

The invalidation routine could call the remove callback
for the node, which in turn ends up dereferencing the
current task_struct to get a pointer to the mm_struct.
However, the mm_struct pointer could be NULL resulting in
the following backtrace:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15
    task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878  EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40

To correct this, the mm_struct passed to us by the MMU notifier is
used (which is what should have been done to begin with). This avoids
the broken derefences and ensures that the correct mm_struct is used.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

f19bd643

22 3月, 2016 3 次提交

IB/hfi1: Add SDMA cache eviction algorithm · 5511d781

由 Mitko Haralanov 提交于 3月 08, 2016

This commit adds a cache eviction algorithm for the SDMA
user buffer cache.

Besides the interval RB tree used for node lookup, the cache
nodes are also arranged in a doubly-linked list. When a node is
used, it is put at the beginning of the list. Less frequently
used nodes naturally move to the tail of the list.

When the cache limit is reached, the eviction code starts
traversing the linked list in reverse, freeing buffers until
enough space has been freed to fit the new user buffer. This
guarantees that only the least used cache nodes will be removed
from the cache.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5511d781

IB/hfi1: Specify mm when releasing pages · bd3a8947

由 Mitko Haralanov 提交于 3月 08, 2016

This change adds a pointer to the process mm_struct when
calling hfi1_release_user_pages().

Previously, the function used the mm_struct of the current
process to adjust the number of pinned pages. However, is some
cases, namely when unpinning pages due to a MMU notifier call,
we want to drop into that code block as it will cause a deadlock
(the MMU notifiers take the process' mmap_sem prior to calling
the callbacks).

By allowing to caller to specify the pointer to the mm_struct,
the caller has finer control over that part of hfi1_release_user_pages().
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

bd3a8947

IB/hfi1: Implement SDMA-side buffer caching · 5cd3a88d

由 Mitko Haralanov 提交于 3月 08, 2016

Add support for caching of user buffers used for SDMA
transfers. This change improves performance by
avoiding repeatedly pinning the pages of buffers, which
are being re-used by the application.

While the cost of the pinning operation has been made
heavier by adding the extra code to search the cache tree,
re-allocate pages arrays, and future cache evictions,
that cost will be amortized against the savings when the
same buffer is re-used. It is also worth noting that in
most cases, the cost of pinning should be much lower due
to the buffer already being in the cache.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NDean Luick <dean.luick@intel.com>
Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

5cd3a88d

12 3月, 2016 2 次提交

staging: rdma: hfi1: Replace ALIGN with PAGE_ALIGN · 84449917

由 Amitoj Kaur Chawla 提交于 3月 04, 2016

mm.h contains a helper function PAGE_ALIGN which aligns the pointer
to the page boundary instead of using ALIGN(expression, PAGE_SIZE)

This change was made with the help of the following Coccinelle
semantic patch:
//<smpl>
@@
expression e;
symbol PAGE_SIZE;
@@
(
- ALIGN(e, PAGE_SIZE)
+ PAGE_ALIGN(e)
|
- IS_ALIGNED(e, PAGE_SIZE)
+ PAGE_ALIGNED(e)
)
//</smpl>
Signed-off-by: NAmitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

84449917

staging: rdma: hfi1: user_sdma.c: Drop void pointer cast · 16ccad04

由 Janani Ravichandran 提交于 2月 25, 2016

Void pointers need not be cast to other pointer types.
Semantic patch used:

@r@
expression x;
void *e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x) [...]
|
  ((T *)x)->f
|
- (T *)
  e
)
Signed-off-by: NJanani Ravichandran <janani.rvchndrn@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

16ccad04

11 3月, 2016 2 次提交

staging/rdma/hfi1: Fix header · 05d6ac1d

由 Jubin John 提交于 2月 14, 2016

Fix the header by moving the copyright notice out of the license text
and to the top of the header. Also, update the copyright date.
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

05d6ac1d

staging/rdma/hfi1: Add braces on all arms of statement · e490974e

由 Jubin John 提交于 2月 14, 2016

Add braces on all arms of statements to fix checkpatch check:
CHECK: braces {} should be used on all arms of this statement
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: NIra Weiny <ira.weiny@intel.com>
Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NJubin John <jubin.john@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e490974e

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功