- 03 Aug 2016: 9 commits
Committed by Dean Luick
If a node cannot be inserted into the RB tree cache, it is freed before the function returns. Null out the iovec's pointer to the node so the iovec does not try to free it again later.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Dean Luick
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Dean Luick
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Ira Weiny
If a context has not been assigned, or if assignment failed, pq may be NULL. Move the unregister call inside the NULL check.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Dean Luick
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Ira Weiny
For bool parameters, "false" should be used.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Ira Weiny
The opening brace of a function should be on the next line.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Ira Weiny
The driver pads message sizes that are not a multiple of a double word (4 bytes), but it does not account for this padding when it calculates the packet length. In addition, the data length is miscalculated for message sizes smaller than 4 bytes because of how the length is represented in the LRH. Finally, a check for message sizes that are not a multiple of a double word prevents these messages from being sent at all. This patch fixes the length miscalculations and enables sending message sizes that are not a multiple of a double word.
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
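The length arithmetic above is easy to get wrong. The following sketch shows it under stated assumptions. It assumes, as on InfiniBand-style links, that the LRH packet-length field counts 4-byte words, and that the header size is already a multiple of 4. It ignores the ICRC and any other trailer. The helper names are hypothetical and do not come from the driver.

```c
#include <stdint.h>

/* Round a payload length up to the next multiple of 4 bytes (one dword). */
static inline uint32_t pad_to_dword(uint32_t len)
{
	return (len + 3u) & ~3u;
}

/* Number of pad bytes needed to reach a dword boundary (0..3). */
static inline uint32_t pad_bytes(uint32_t len)
{
	return pad_to_dword(len) - len;
}

/*
 * Hypothetical packet length in dwords for a header of hdr_bytes
 * (assumed to be a multiple of 4) plus data_len payload bytes.
 * The pad must be counted. Without it, a 5-byte payload would be
 * reported as 1 dword instead of 2, and a 1-byte payload as 0 dwords.
 */
static inline uint32_t pkt_len_dwords(uint32_t hdr_bytes, uint32_t data_len)
{
	return (hdr_bytes + pad_to_dword(data_len)) >> 2;
}
```

For example, a 40-byte header with a 5-byte payload occupies 12 dwords, of which 3 bytes are padding.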
-
Committed by Jianxin Xiong
Currently each user context is assigned a single SDMA engine based on the VL, context id, and subcontext id. This means that, for MPI applications, each rank can use only one SDMA engine for all of its messages. When one destination is congested, independent messages going to other destinations can back up behind it. This patch adds the packet "dlid" to the formula that selects the SDMA engine for user SDMA requests. A simple hash table keeps the load evenly distributed among the available SDMA engines, regardless of how the "dlid" values are distributed.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
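The commit does not show the driver's actual hash function or table layout. The following is a minimal sketch of one way to get an even spread regardless of dlid values: the first time a dlid is seen, it gets the next engine round-robin, and an open-addressing table remembers that choice. The engine count, table size, multiplier, and all names are assumptions made for illustration.

```c
#include <stdint.h>

#define NUM_ENGINES 16      /* assumed number of SDMA engines */
#define TABLE_SIZE  256     /* assumed table capacity (power of two) */

struct dlid_slot {
	uint16_t dlid;
	uint8_t  engine;
	uint8_t  used;
};

struct engine_map {
	struct dlid_slot slot[TABLE_SIZE];
	unsigned int next_engine;   /* round-robin cursor for new dlids */
};

/*
 * Return the engine for dlid. On first use, assign the next engine
 * round-robin so engines stay evenly loaded even when dlids are
 * clustered. Returns -1 if the table is full.
 */
static int select_engine(struct engine_map *m, uint16_t dlid)
{
	unsigned int h = (dlid * 2654435761u) & (TABLE_SIZE - 1);

	for (unsigned int i = 0; i < TABLE_SIZE; i++) {
		struct dlid_slot *s = &m->slot[(h + i) & (TABLE_SIZE - 1)];

		if (s->used && s->dlid == dlid)
			return s->engine;              /* already mapped */
		if (!s->used) {                        /* first sighting */
			s->used = 1;
			s->dlid = dlid;
			s->engine = m->next_engine;
			m->next_engine = (m->next_engine + 1) % NUM_ENGINES;
			return s->engine;
		}
	}
	return -1;
}
```

Because engines are handed out in arrival order rather than derived from dlid bits, even a set of dlids that all hash alike still spreads across the engines.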
-
- 07 Jun 2016: 2 commits
Committed by Bart Van Assche
Prevent sparse from reporting the following warnings for the hfi1 driver:
    trace.c:217:13: warning: no previous prototype for ‘print_u64_array’ [-Wmissing-prototypes]
    user_sdma.c:1361:17: warning: dubious: !x & y
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
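Sparse flags `!x & y` as dubious because `!` binds tighter than `&`. The following illustrative pair of functions, which is not the driver's code, shows how the two readings differ. Which of them the original line at user_sdma.c:1361 intended is not stated in the commit.

```c
#include <stdbool.h>

/* Parsed as (!flags) & mask: true only if flags == 0 AND bit 0 of mask is set. */
static bool parsed_as_written(unsigned int flags, unsigned int mask)
{
	return !flags & mask;
}

/* A common intended meaning: true if none of the mask bits are set in flags. */
static bool likely_intended(unsigned int flags, unsigned int mask)
{
	return !(flags & mask);
}
```

With flags = 0 and mask = 2, the first function returns false and the second returns true. Adding parentheses makes the intent explicit either way.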
-
Committed by Bart Van Assche
The first argument of test_bit() and clear_bit() is a bit number, not a bitmask. Hence, change that first argument from (1 << 0) to 0. This patch prevents smatch from reporting the following warnings:
    user_sdma.c:1059: sdma_cache_evict() warn: test_bit() takes a bit number
    user_sdma.c:1590: sdma_rb_remove() warn: test_bit() takes a bit number
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
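The following sketch shows why the mask form is wrong in this case. It uses a simplified, non-atomic stand-in for the kernel's `test_bit()`; the name `sketch_test_bit` is hypothetical. Passing `(1 << 0)`, which equals 1, tests bit 1 rather than bit 0. The two forms only coincide by accident for other values.

```c
#include <stdbool.h>
#include <limits.h>

#define SK_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Simplified, non-atomic stand-in for test_bit(): the first argument
 * is a bit NUMBER within the bitmap, not a mask.
 */
static bool sketch_test_bit(unsigned long nr, const unsigned long *addr)
{
	return (addr[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL;
}
```

With only bit 0 set, `sketch_test_bit(0, flags)` is true, while `sketch_test_bit(1 << 0, flags)` is false because it inspects bit 1.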
-
- 26 May 2016: 3 commits
Committed by Dennis Dalessandro
The TODO list for the hfi1 driver was completed during 4.6. In addition, other objections that were raised (which go far beyond what was in the TODO list) have also been addressed. It is now time to move the driver out of staging and into the drivers/infiniband sub-tree.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jianxin Xiong
During the processing of a user SDMA request, if an error occurred before the request counter was incremented, the state of the packet queue could be updated incorrectly and the counter could underflow. As a result, the process could later get stuck, because the counter could never return to 0. This patch adds a condition that guards the packet queue update, so the counter is decremented only if it was incremented before the error happened.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
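The following minimal sketch shows the guard the commit describes. It assumes a per-request flag records whether the queue counter was incremented. The struct and function names are hypothetical, not the driver's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the packet queue state. */
struct pkt_queue {
	int n_reqs;            /* outstanding requests on this queue */
};

struct request {
	bool counted;          /* set once n_reqs was incremented for this request */
};

static void req_start(struct pkt_queue *pq, struct request *req)
{
	pq->n_reqs++;
	req->counted = true;
}

/*
 * Error/cleanup path: undo the increment only if it actually happened,
 * so an error before req_start() cannot drive the counter below zero.
 */
static void req_abort(struct pkt_queue *pq, struct request *req)
{
	if (req->counted) {
		pq->n_reqs--;
		req->counted = false;
	}
	assert(pq->n_reqs >= 0);
}
```

Clearing the flag after the decrement also makes a repeated cleanup call harmless.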
-
Committed by Mitko Haralanov
Commit e88c9271 ("IB/hfi1: Fix buffer cache corner case which may cause corruption") introduced a bug that can leak a reference count of an interval RB node when an SDMA transfer from that node completes at the same time as the node is being extended. When a node is being extended, it is first removed from the RB tree so that it can be processed without the risk of an invalidation event removing it at the same time. If an SDMA completion happens during that window, the completion handler fails to find the node in the RB tree and therefore fails to decrement its refcount. This leaves the node in the tree and its pages pinned for the lifetime of the user process. To prevent this, the io vector holds a reference to the RB node, and the SDMA completion uses that reference instead of looking up the node in the RB tree. As a side effect, avoiding the RB tree lookup also improves performance.
Fixes: e88c9271 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
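The following simplified sketch shows the idea: the io vector keeps a pointer to the node it took a reference on, so completion can drop that reference even while the node is temporarily out of the tree. All types and names here are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache node and io vector. */
struct cache_node {
	int refcount;
	int in_tree;               /* 1 while linked in the RB tree */
};

struct io_vec {
	struct cache_node *node;   /* reference taken at submit time */
};

static void iovec_attach(struct io_vec *iov, struct cache_node *node)
{
	node->refcount++;
	iov->node = node;
}

/*
 * SDMA completion: drop the reference through the pointer stored in
 * the io vector instead of searching the tree. This works even if the
 * node was pulled out of the tree (in_tree == 0) for extension.
 */
static void sdma_complete(struct io_vec *iov)
{
	if (iov->node) {
		iov->node->refcount--;
		assert(iov->node->refcount >= 0);
		iov->node = NULL;
	}
}
```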
-
- 29 Apr 2016: 8 commits
Committed by Sebastian Sanchez
Add the P_KEY check to the user-context send path for both PIO and SDMA. For PIO, SendCtxtCheckEnable.DisallowKDETHPackets is set by default and is cleared once the P_KEY is set. For SDMA, a software check was added. With this change, user processes must set the P_KEY before sending any packets; otherwise, their packets will fail. The original submission did not include this check, but it is required.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mitko Haralanov
There are two possible causes of node/memory corruption, both related to the cache eviction algorithm.
The first is due to the asynchronous nature of MMU invalidation and the locking used when invalidating a node. The MMU invalidation routine temporarily released the RB tree lock to avoid a deadlock. However, this allowed the eviction function to take the lock and remove cache nodes. If the node removed by the eviction code was the same node being invalidated, the result was a use-after-free. The same was true in the other direction, because the eviction loop temporarily released the eviction list lock.
The second is a corner case in the SDMA buffer cache that could corrupt kernel memory. This corruption most commonly shows up as linked-list node corruption: the kernel complains that a node with poisoned pointers is being removed. Because the pointers are already poisoned, the node has already been removed from the list. The root cause was mishandling of the eviction list maintained by the driver. Four conditions must be satisfied for this to happen:
    1. A node describing a user buffer already exists in the interval RB tree.
    2. The current user buffer starts at the same address as that node but is bigger, which causes the node to be extended.
    3. The amount of cached buffers is close to or at the buffer cache size limit.
    4. The node has moved close to the end of the eviction list, which makes it a candidate for eviction.
If all of these conditions are satisfied, the eviction algorithm can evict the current node, freeing it without the rest of the driver knowing.
To solve both issues:
    - the locking around the MMU invalidation loop and the cache eviction loop has been improved so locks are not released in the loop body,
    - a new RB function is introduced that "atomically" finds and removes the matching node from the RB tree, preventing the MMU invalidation loop from touching it, and
    - the node being extended by pin_vector_pages() is removed from the eviction list before the eviction function is called.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mitko Haralanov
When an exact buffer match is not found, the page pinning function, which also maintains the pin cache, behaves in one of two ways:
    1. If no node is found (no buffer with the same starting address is in the cache), a new node is created, the buffer pages are pinned, and the node is inserted into the RB tree.
    2. If a node is found but the buffer in that node is a subset of the new user buffer, the node is extended with the new buffer pages.
Both modes of operation require (re-)insertion into the interval RB tree. Inserting a new node is simple. Extending an existing node, however, needs special care. First, we want to guard against an asynchronous attempt by the MMU invalidation notifier to delete the node. The simplest way to do this is to remove the node from the RB tree, so the search algorithm cannot find it. Second, the node must be re-inserted so it lands in the proper place in the tree and the tree is correctly re-balanced. This also requires the node to be removed from the RB tree first. This commit adds the hfi1_mmu_rb_extract() function, which searches the interval RB tree for a node matching an address and length and, if found, removes it from the tree. This allows both of the above special cases to be handled in a single step.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
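The key property of an extract operation is that the search and the unlink happen under one lock acquisition, so no other path can find the node in between. The following sketch illustrates that property. A singly linked list stands in for the interval RB tree, and a C11 `atomic_flag` spinlock stands in for the driver's lock. The names are hypothetical, and this is not the real `hfi1_mmu_rb_extract()`.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node; a singly linked list stands in for the interval RB tree. */
struct mmu_node {
	unsigned long addr;
	unsigned long len;
	struct mmu_node *next;
};

struct mmu_cache {
	atomic_flag lock;          /* stands in for the tree spinlock */
	struct mmu_node *head;
};

static void cache_lock(struct mmu_cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;  /* spin until the holder releases the lock */
}

static void cache_unlock(struct mmu_cache *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/*
 * Find the node exactly matching (addr, len) and unlink it while still
 * holding the lock. Returns the unlinked node, or NULL if none matched.
 */
static struct mmu_node *cache_extract(struct mmu_cache *c,
				      unsigned long addr, unsigned long len)
{
	struct mmu_node **pp, *found = NULL;

	cache_lock(c);
	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->addr == addr && (*pp)->len == len) {
			found = *pp;
			*pp = found->next;     /* unlink before dropping the lock */
			found->next = NULL;
			break;
		}
	}
	cache_unlock(c);
	return found;
}
```

The caller then owns the extracted node exclusively. It can extend the node and re-insert it, and the re-insert places the node correctly for its new range.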
-
Committed by Mitko Haralanov
The interval of an interval RB node was computed incorrectly, which led to data corruption because the RB search algorithm did not find all RB nodes within an MMU invalidation interval. The problem was that the start address of the node's range was aligned to a page boundary. For certain buffer sizes, this produced an end address that was off by one page. Keeping the RB tree consistent also requires updating the node's range when the node is extended.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
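The following arithmetic sketch shows the off-by-one-page effect. It assumes 4 KiB pages. The buggy variant aligns the start down but then adds the unaligned length. The correct variant computes the last byte of `[addr, addr + len)`. The commit does not show the exact expression the driver used before or after the fix, so both functions are illustrative.

```c
#define SK_PAGE_SHIFT 12                       /* assumed 4 KiB pages */
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Buggy: start aligned down, length not adjusted, so the end falls short. */
static unsigned long last_buggy(unsigned long addr, unsigned long len)
{
	return (addr & SK_PAGE_MASK) + len - 1;
}

/* Correct: last byte actually covered by [addr, addr + len). */
static unsigned long last_correct(unsigned long addr, unsigned long len)
{
	return addr + len - 1;
}
```

For a 4 KiB buffer starting at 0x1800, the buffer really ends in page 2 (0x27ff). The buggy computation reports page 1 (0x1fff), so an invalidation of page 2 would miss the node. For page-aligned buffers, the two computations agree, which is why only certain sizes were affected.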
-
Committed by Mitko Haralanov
The driver had two memory leaks: one in the user expected-receive code and one in the SDMA buffer cache. The leak in the expected-receive code only appeared when the user/admin had set ulimit low enough that the driver ran out of room in the cache before hitting the limit of allowed cacheable memory. When this happened, the driver returned early to signal userland that it needed to free some buffers to make room in the cache. The bug was that the driver did not clean up allocated memory before returning early. The leak in the SDMA buffer cache could occur (although it was never observed) when inserting a buffer node into the interval RB tree failed. In that case, the driver failed to unpin the node's pages and instead erroneously returned success.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mitko Haralanov
The SDMA cache logic maintains an eviction list ordered by the most recently used user buffers. On errors or when buffers were freed, the list nodes were deleted unconditionally. This led to list corruption warnings if the nodes had never been inserted into the eviction list in the first place. This commit prevents this by checking that a node is on the eviction list before deleting it.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
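The following sketch shows the kind of membership check described above. It uses a minimal circular list modeled on the kernel's `list_head`, where a node that points at itself is not on any list. In kernel code, the equivalent idiom is typically `list_empty(&node->list)` combined with `list_del_init()`. The helpers below are hypothetical user-space stand-ins.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list modeled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static void list_add_front(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* A node that points at itself is not linked into any list. */
static bool list_linked(const struct list_node *n)
{
	return n->next != n;
}

/* Remove only if linked, then re-init so a second call is a harmless no-op. */
static void evict_list_remove(struct list_node *n)
{
	if (!list_linked(n))
		return;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}
```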
-
Committed by Mitko Haralanov
The routine the SDMA cache uses to handle already-cached nodes can extend an existing node. In its error handling code, the routine unpins pages when not all pages of the buffer extension were pinned. A bug in that part of the routine caused it to unpin pages from the original set rather than the newly pinned pages. This commit fixes the bug by offsetting the page array so that it points at the beginning of the newly pinned pages.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
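The following sketch shows the offset fix. The page array holds the original pages first and the newly pinned extension after them, so the error path must start unpinning at `pages + old_npages`. The `fake_page` type and both functions are hypothetical stand-ins for the kernel's page handling.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pinned page. */
struct fake_page { int pinned; };

static void unpin_pages(struct fake_page **pages, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		pages[i]->pinned = 0;
}

/*
 * pages[0 .. old_npages) belong to the existing node. The extension
 * pinned `got` more pages starting at pages[old_npages], but not all
 * it asked for. Undo only the new pins by offsetting the array;
 * passing `pages` itself would wrongly unpin the original pages.
 */
static void undo_partial_extend(struct fake_page **pages, size_t old_npages,
				size_t got)
{
	unpin_pages(pages + old_npages, got);
}
```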
-
Committed by Mitko Haralanov
There is a potential kernel crash when the MMU notifier calls the invalidation routines in the hfi1 pinned-page caching code for SDMA. The invalidation routine could call the node's remove callback, which in turn dereferences the current task_struct to get a pointer to the mm_struct. However, that mm_struct pointer could be NULL, resulting in the following backtrace:
    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
    IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    15 task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
    RIP: 0010:[<ffffffffa041f75a>] [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
    RSP: 0000:ffff88085c245878 EFLAGS: 00010002
    RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
    RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
    RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
    R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
    R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
    FS: 0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
    Stack:
     ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
     ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
     0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
    Call Trace:
     [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
     [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
     [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
     [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
     [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
     [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
     [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
     [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
     [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
     [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
     [<ffffffff81121641>] shrink_zone+0x31/0x100
     [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
     [<ffffffff811229e3>] kswapd+0x403/0x7e0
     [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
     [<ffffffff81068ac0>] kthread+0xc0/0xd0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
     [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
     [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
To correct this, the driver now uses the mm_struct passed in by the MMU notifier, which is what it should have done from the start. This avoids the broken dereferences and ensures that the correct mm_struct is used.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 22 Mar 2016: 3 commits
Committed by Mitko Haralanov
This commit adds a cache eviction algorithm for the SDMA user buffer cache. In addition to the interval RB tree used for node lookup, the cache nodes are arranged in a doubly linked list. When a node is used, it is moved to the head of the list, so nodes that have not been used recently naturally drift toward the tail. When the cache limit is reached, the eviction code traverses the list in reverse, freeing buffers until enough space has been freed for the new user buffer. This guarantees that only the least recently used cache nodes are removed from the cache.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
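The following sketch implements the LRU scheme described above. Nodes move to the head when used, and eviction walks back from the tail until the new buffer fits. The sizes are abstract units, and all names are hypothetical. In the driver, eviction would also unpin pages and remove the node from the RB tree.

```c
#include <stddef.h>

/* Hypothetical cache node on a doubly linked LRU list (head = most recent). */
struct lru_node {
	struct lru_node *prev, *next;
	size_t size;             /* amount of user buffer this node pins */
	int evicted;
};

struct lru_cache {
	struct lru_node head;    /* sentinel */
	size_t used, limit;
};

static void lru_init(struct lru_cache *c, size_t limit)
{
	c->head.prev = c->head.next = &c->head;
	c->used = 0;
	c->limit = limit;
}

static void lru_unlink(struct lru_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void lru_push_front(struct lru_cache *c, struct lru_node *n)
{
	n->next = c->head.next;
	n->prev = &c->head;
	c->head.next->prev = n;
	c->head.next = n;
}

/* Mark a node as just used: move it to the head of the list. */
static void lru_touch(struct lru_cache *c, struct lru_node *n)
{
	lru_unlink(n);
	lru_push_front(c, n);
}

/* Walk from the tail, evicting until `need` more fits under the limit. */
static void lru_make_room(struct lru_cache *c, size_t need)
{
	while (c->used + need > c->limit && c->head.prev != &c->head) {
		struct lru_node *victim = c->head.prev;

		lru_unlink(victim);
		c->used -= victim->size;
		victim->evicted = 1;     /* real code would unpin and free here */
	}
}

static void lru_insert(struct lru_cache *c, struct lru_node *n)
{
	lru_make_room(c, n->size);
	lru_push_front(c, n);
	c->used += n->size;
}
```

For example, with a limit of 3 and nodes a, b, d inserted in that order, touching a and then inserting e evicts b, the least recently used node.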
-
Committed by Mitko Haralanov
This change adds a pointer to the process's mm_struct as an argument to hfi1_release_user_pages(). Previously, the function used the mm_struct of the current process to adjust the number of pinned pages. However, in some cases, namely when pages are unpinned because of an MMU notifier call, we do not want to enter that code block, because it would cause a deadlock (the MMU notifiers take the process's mmap_sem before calling the callbacks). By letting the caller specify the mm_struct pointer, the caller gets finer control over that part of hfi1_release_user_pages().
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mitko Haralanov
Add support for caching user buffers used for SDMA transfers. This improves performance by avoiding repeated pinning of the pages of buffers that the application re-uses. The pinning operation itself becomes more expensive, because of the extra code to search the cache tree, re-allocate page arrays, and perform future cache evictions. That cost is amortized against the savings when the same buffer is re-used. In most cases, the cost of pinning should also be much lower, because the buffer is already in the cache.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 12 Mar 2016: 2 commits
Committed by Amitoj Kaur Chawla
mm.h provides a helper, PAGE_ALIGN, that aligns a pointer to a page boundary; use it instead of ALIGN(expression, PAGE_SIZE). This change was made with the help of the following Coccinelle semantic patch:
    //<smpl>
    @@
    expression e;
    symbol PAGE_SIZE;
    @@
    (
    - ALIGN(e, PAGE_SIZE)
    + PAGE_ALIGN(e)
    |
    - IS_ALIGNED(e, PAGE_SIZE)
    + PAGE_ALIGNED(e)
    )
    //</smpl>
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
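The following user-space sketch shows why the replacement preserves behavior: the page helpers are just `ALIGN` specialized to the page size. The macros below are simplified re-definitions with an `SK_` prefix, not the kernel's own headers. They assume 4 KiB pages and power-of-two alignments.

```c
#define SK_PAGE_SIZE 4096UL   /* assumed 4 KiB pages */

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
#define SK_ALIGN(x, a)       (((x) + ((a) - 1)) & ~((a) - 1))
/* Page-specific forms mirroring PAGE_ALIGN() and PAGE_ALIGNED(). */
#define SK_PAGE_ALIGN(x)     SK_ALIGN((x), SK_PAGE_SIZE)
#define SK_PAGE_ALIGNED(x)   (((x) & (SK_PAGE_SIZE - 1)) == 0)
```

`SK_PAGE_ALIGN(4097UL)` yields 8192, the same as `SK_ALIGN(4097UL, SK_PAGE_SIZE)`. The dedicated name mainly states the intent.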
-
Committed by Janani Ravichandran
Void pointers need not be cast to other pointer types. Semantic patch used:
    @r@
    expression x;
    void *e;
    type T;
    identifier f;
    @@
    (
     *((T *)e)
    |
     ((T *)x) [...]
    |
     ((T *)x)->f
    |
    - (T *)
      e
    )
Signed-off-by: Janani Ravichandran <janani.rvchndrn@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 11 Mar 2016: 12 commits
Committed by Jubin John
Fix the header by moving the copyright notice out of the license text to the top of the header. Also, update the copyright date.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Add braces on all arms of statements to fix the checkpatch check:
    CHECK: braces {} should be used on all arms of this statement
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Fix code alignment to fix the checkpatch check:
    CHECK: Alignment should match open parenthesis
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Reformat block comments to fix the checkpatch warnings:
    WARNING: Block comments use * on subsequent lines
    WARNING: Block comments use a trailing */ on a separate line
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Remove extra blank lines before closing braces to fix the checkpatch check:
    CHECK: Blank lines aren't necessary before a close brace '}'
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Remove the space after a cast to fix the checkpatch check:
    CHECK: No space is necessary after a cast
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Jubin John
Add spaces around binary operators to fix the checkpatch check:
    CHECK: spaces preferred around that 'x'
where x is a binary operator.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mike Marciniszyn
The current implementation of the sdma_wait variable has a timing hole that can cause a completion queue entry from a PIO send to be returned before an older SDMA packet's completion queue entry. The sdma_wait variable used to be decremented before the packet complete routine was called. The hole lies between the decrement and the verbs completion: in that window, a send engine using PIO could return an out-of-order completion. This patch closes the hole by adding an API option to specify an sdma_drained callback. The atomic decrement is moved after the complete callback, which closes the window as long as the PIO path does not execute while the SDMA count is non-zero.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Mitko Haralanov
A race was discovered in the user SDMA code that could leave a process stuck in the kernel call indefinitely under certain error conditions. The race required two things. First, an error occurred while a user SDMA request was being processed. Second, all outstanding SDMA descriptors had already completed by the time the calling function handled that error. In that case, the state of the packet queue was not updated correctly, and the process subsequently got stuck waiting for descriptors that would never complete. To handle this scenario, the driver now compares the submitted packet count against the completed packet count. If all submitted packets have also completed, the driver can safely free the request and signal user level. Otherwise, the completion callback handles it.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
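The following sketch shows the decision described above, under stated assumptions. The counter and function names are hypothetical, locking and atomics are omitted, and the "free" is reduced to setting a flag. If every submitted packet has already completed, no callback remains to clean up, so the error path frees the request itself. Otherwise, the last completion does.

```c
#include <stdbool.h>

/* Hypothetical per-request counters. */
struct sdma_req {
	unsigned int seqsubmitted;   /* packets handed to the SDMA engine */
	unsigned int seqcomp;        /* packets whose completion has run */
	bool freed;
};

/* Error path in the caller. Returns true if the request was freed here. */
static bool handle_error(struct sdma_req *req)
{
	if (req->seqcomp == req->seqsubmitted) {
		req->freed = true;   /* real code: free request, notify user level */
		return true;
	}
	return false;                /* completions still pending */
}

/* Completion callback: the last completion after an error does the cleanup. */
static void completion_cb(struct sdma_req *req, bool error_seen)
{
	req->seqcomp++;
	if (error_seen && req->seqcomp == req->seqsubmitted)
		req->freed = true;
}
```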
-
Committed by Mitko Haralanov
To support locked-page counting, the user SDMA routines maintained a list of io vectors that were freed in the completion callback, and then unpinned the associated pages during the next call into the kernel. Because the size of this list was unbounded, this hurt performance: the driver spent too much time freeing io vectors. This commit changes how io vectors are freed. The actual page unpinning moves into the callback, which maintains a count of unpinned pages. That count is then used during the next call into the kernel to update the mm->pinned_vm variable, since the update requires process context and the ability to sleep.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
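The following sketch shows the split described above, using C11 atomics as a stand-in for kernel primitives. The callback only adds to an atomic counter, and the next process-context call folds the count into the pinned-page total in one step. The struct, field, and function names are hypothetical; `pinned_vm` here is just a plain field standing in for `mm->pinned_vm`.

```c
#include <stdatomic.h>

/* Hypothetical per-process accounting state. */
struct user_ctx {
	atomic_long unpinned;   /* pages unpinned in callbacks, not yet accounted */
	long pinned_vm;         /* stands in for mm->pinned_vm (process context only) */
};

/* Completion callback (interrupt context in the driver): unpin, then just count. */
static void on_complete(struct user_ctx *ctx, long npages)
{
	/* real code would unpin the pages here */
	atomic_fetch_add_explicit(&ctx->unpinned, npages, memory_order_relaxed);
}

/* Next call into the kernel (process context): fold the pending count in. */
static void settle_pinned(struct user_ctx *ctx)
{
	long n = atomic_exchange_explicit(&ctx->unpinned, 0, memory_order_relaxed);

	ctx->pinned_vm -= n;
}
```

The exchange-with-zero takes the pending count and resets it in one step, so pages counted by a callback that runs concurrently are carried over to the next settle rather than lost.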
-
Committed by Mitko Haralanov
For the driver and PSM to manage the SDMA request ring correctly, the status of a particular request slot must be set at the correct time. Otherwise, PSM can get out of sync with the driver, which can lead to hangs or errors on new requests. Correctly determining when to set the error status of an SDMA slot depends on knowing exactly when the last txreq for that request has completed. This in turn requires the driver to know exactly how many txreqs were generated and how many of them were successfully submitted to the SDMA queue. The previous implementation of the mid-layer SDMA API gave the caller of sdma_send_txlist() no way to know how many of the txreqs in the input list had actually been submitted, short of traversing the list and counting. Since sdma_send_txlist() already traverses the list to process it, requiring the caller to traverse it again is unnecessary. It is therefore much simpler to enhance sdma_send_txlist() to return the number of successfully submitted txreqs. This lets the caller accurately track the progress of the SDMA request and set the error status at the right time.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
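The following sketch shows the API shape described above: one pass over the list submits txreqs while ring space lasts and reports through an out-parameter how many went in. This is not the real `sdma_send_txlist()` signature. The out-parameter form, the -1 return for a full ring, and the simplified `txreq` list are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical txreq on a singly linked list. */
struct txreq {
	struct txreq *next;
	int submitted;
};

/*
 * Submit txreqs in list order while descriptor slots remain. *count_out
 * always receives the number submitted, so the caller never has to walk
 * the list again. Returns 0 if all were submitted, -1 if the ring filled.
 */
static int send_txlist(struct txreq *list, int *free_slots,
		       unsigned int *count_out)
{
	unsigned int count = 0;

	for (struct txreq *tx = list; tx; tx = tx->next) {
		if (*free_slots == 0) {
			*count_out = count;
			return -1;   /* ring full; the rest stay unsubmitted */
		}
		(*free_slots)--;
		tx->submitted = 1;
		count++;
	}
	*count_out = count;
	return 0;
}
```

With the count in hand, the caller can compare the number submitted against the number completed to decide when the last txreq of a request has finished.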
-
Committed by Mitko Haralanov
Commit a0d40693 ("staging/rdma/hfi1: Add page lock limit check for SDMA requests") added a mechanism to delay the clean-up of user SDMA requests in order to support proper locked-page counting. This delayed processing used a kernel workqueue, which meant a kernel thread had to spin up and consume CPU cycles to do the clean-up. This proved detrimental to performance, because two execution threads (the kernel workqueue and the user process) were now competing for cycles on the same CPU. It is much better for performance to do as much of the clean-up as possible in interrupt context (during the callback) and to do the remaining work in-line during the user process's subsequent calls into the driver. The changes required to implement this also significantly simplify the SDMA completion processing code and eliminate a memory corruption that caused the following observed crash:
    [ 2881.703362] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 2881.703389] IP: [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x18e0 [hfi1]
    [ 2881.703422] PGD 7d4d25067 PUD 77d96d067 PMD 0
    [ 2881.703427] Oops: 0000 [#1] SMP
    [ 2881.703431] Modules linked in:
    [ 2881.703504] CPU: 28 PID: 6668 Comm: mpi_stress Tainted: G OENX 3.12.28-4-default #1
    [ 2881.703508] Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0044.090
    [ 2881.703512] task: ffff88077da8e0c0 ti: ffff880856772000 task.ti: ffff880856772000
    [ 2881.703515] RIP: 0010:[<ffffffffa02897e4>] [<ffffffffa02897e4>] user_sdma_send_pkts+0xcd4/0x
    [ 2881.703529] RSP: 0018:ffff880856773c48 EFLAGS: 00010287
    [ 2881.703531] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000002000
    [ 2881.703534] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000002000
    [ 2881.703537] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
    [ 2881.703540] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
    [ 2881.703543] R13: 0000000000000000 R14: ffff88071e782e68 R15: ffff8810532955c0
    [ 2881.703546] FS: 00007f9c4375e700(0000) GS:ffff88107eec0000(0000) knlGS:0000000000000000
    [ 2881.703549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2881.703551] CR2: 0000000000000000 CR3: 00000007d4cba000 CR4: 00000000003407e0
    [ 2881.703554] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 2881.703556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 2881.703558] Stack:
    [ 2881.703559] ffffffff00002000 ffff881000001800 ffffffff00000000 00000000000080d0
    [ 2881.703570] 0000000000000000 0000200000000000 0000000000000000 ffff88071e782db8
    [ 2881.703580] ffff8807d4d08d80 ffff881053295600 0000000000000008 ffff88071e782fc8
    [ 2881.703589] Call Trace:
    [ 2881.703691] [<ffffffffa028b5da>] hfi1_user_sdma_process_request+0x84a/0xab0 [hfi1]
    [ 2881.703777] [<ffffffffa0255412>] hfi1_aio_write+0xd2/0x110 [hfi1]
    [ 2881.703828] [<ffffffff8119e3d8>] do_sync_readv_writev+0x48/0x80
    [ 2881.703837] [<ffffffff8119f78b>] do_readv_writev+0xbb/0x230
    [ 2881.703843] [<ffffffff8119fab8>] SyS_writev+0x48/0xc0
This commit also addresses issues with notifying user processes that an SDMA request slot is available. The slot must be cleaned up before the user process is notified that it is available.
Reviewed-by: Arthur Kepner <arthur.kepner@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 23 Feb 2016: 1 commit
Committed by Amitoj Kaur Chawla
Remove a duplicate include file. Found using includecheck.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-