1. 03 8月, 2016 9 次提交
  2. 07 6月, 2016 2 次提交
  3. 26 5月, 2016 3 次提交
  4. 29 4月, 2016 8 次提交
    • S
      IB/hfi1: Check P_KEY for all sent packets from user mode · e38d1e4f
      Sebastian Sanchez 提交于
      Add the P_KEY check for user-context mechanism for
      both PIO and SDMA. For PIO, the
      SendCtxtCheckEnable.DisallowKDETHPackets is set by
      default. When the P_KEY is set,
      SendCtxtCheckEnable.DisallowKDETHPackets is cleared.
      For SDMA, a software check was included. This change
      requires user processes to set the P_KEY before sending
      any packets, otherwise, the sent packet will fail. The
      original submission didn't have this check but it's
      required.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: NMikto Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NSebastian Sanchez <sebastian.sanchez@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      e38d1e4f
    • M
      IB/hfi1: Fix buffer cache races which may cause corruption · e88c9271
      Mitko Haralanov 提交于
      There are two possible causes for node/memory corruption both
      of which are related to the cache eviction algorithm. One way
      to cause corruption is due to the asynchronous nature of the
      MMU invalidation and the locking used when invalidating node.
      
      The MMU invalidation routine would temporarily release the
      RB tree lock to avoid a deadlock. However, this would allow
      the eviction function to take the lock resulting in the removal
      of cache nodes.
      
      If the node being removed by the eviction code is the same as
      the node being invalidated, the result is use after free.
      
      The same is true in the other direction due to the temporary
      release of the eviction list lock in the eviction loop.
      
      Another corner case exists when dealing with the SDMA buffer
      cache that could cause memory corruption of kernel memory.
      The most common way, in which this corruption exhibits itself
      is a linked list node corruption. In that case, the kernel will
      complain that a node with poisoned pointers is being removed.
      The fact that the pointers are already poisoned means that the
      node has already been removed from the list.
      
      To root cause of this corruption was a mishandling of the
      eviction list maintained by the driver. In order for this
      to happen four conditions need to be satisfied:
      
         1. A node describing a user buffer already exists in the
            interval RB tree,
         2. The beginning of the current user buffer matches that
            node but is bigger. This will cause the node to be
            extended.
         3. The amount of cached buffers is close or at the limit
            of the buffer cache size.
         4. The node has dropped close to the end of the eviction
            list. This will cause the node to be considered for
            eviction.
      
      If all of the above conditions have been satisfied, it is
      possible for the eviction algorithm to evict the current node,
      which will free the node without the driver knowing.
      
      To solve both issues described above:
         - the locking around the MMU invalidation loop and cache
           eviction loop has been improved so locks are not released in
           the loop body,
         - a new RB function is introduced which will "atomically" find
           and remove the matching node from the RB tree, preventing the
           MMU invalidation loop from touching it, and
         - the node being extended by the pin_vector_pages() function is
           removed from the eviction list prior to calling the eviction
           function.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      e88c9271
    • M
      IB/hfi1: Extract and reinsert MMU RB node on lookup · f53af85e
      Mitko Haralanov 提交于
      The page pinning function, which also maintains the pin cache,
      behaves one of two ways when an exact buffer match is not found:
        1. If no node is not found (a buffer with the same starting address
           is not found in the cache), a new node is created, the buffer
           pages are pinned, and the node is inserted into the RB tree, or
        2. If a node is found but the buffer in that node is a subset of
           the new user buffer, the node is extended with the new buffer
           pages.
      
      Both modes of operation require (re-)insertion into the interval RB
      tree.
      
      When the node being inserted is a new node, the operations are pretty
      simple. However, when the node is already existing and is being
      extended, special care must be taken.
      
      First, we want to guard against an asynchronous attempt to
      delete the node by the MMU invalidation notifier. The simplest way to
      do this is to remove the node from the RB tree, preventing the search
      algorithm from finding it.
      
      Second, the node needs to be re-inserted so it lands in the proper place
      in the tree and the tree is correctly re-balanced. This also requires
      the node to be removed from the RB tree.
      
      This commit adds the hfi1_mmu_rb_extract() function, which will search
      for a node in the interval RB tree matching an address and length and
      remove it from the RB tree if found. This allows for both of the above
      special cases be handled in a single step.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      f53af85e
    • M
      IB/hfi1: Correctly compute node interval · de79093b
      Mitko Haralanov 提交于
      The computation of the interval of an interval RB node
      was incorrect leading to data corruption due to the RB
      search algorithm not properly finding the all RB nodes
      in an MMU invalidation interval.
      
      The problem stemmed from the fact that the beginning
      address of the node's range was being aligned to a page
      boundary. For certain buffer sizes, this would lead to
      a end address calculation that was off by 1 page.
      
      An important aspect of keeping the RB same is also
      updating the node's range in the case it's being extended.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      de79093b
    • M
      IB/hfi1: Fix memory leak in user ExpRcv and SDMA · 0ad2d3d0
      Mitko Haralanov 提交于
      The driver had two memory leaks - one in the user
      expected receive code and one in SDMA buffer cache.
      
      The leak in the expected receive code only showed up
      when the user/admin had set ulimit sufficiently low
      and the driver did not have enough room in the cache
      before hitting the limit of allowed cachable memory.
      
      When this condition occurred, the driver returned
      early signaling userland that it needed to free some
      buffers to free up room in the cache.
      
      The bug was that the driver was not cleaning up
      allocated memory prior to returning early.
      
      The leak in the SDMA buffer cache could occur (even
      though it never did), when the insertion of a buffer
      node in the interval RB tree failed. In this case, the
      driver failed to unpin the pages of the node instead
      erroneously returning success.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0ad2d3d0
    • M
      IB/hfi1: Don't remove list entries if they are not in a list · 4787bc5e
      Mitko Haralanov 提交于
      The SDMA cache logic maintains an eviction list which is ordered
      by most recently used user buffers. Upon errors or buffer freeing,
      the list nodes were unconditionally being deleted. This would lead
      to list corruption warnings if the nodes were never inserted in the
      eviction list to begin with.
      
      This commit prevents this by checking that the nodes are already
      part of the eviction list.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      4787bc5e
    • M
      IB/hfi1: Prevent unpinning of wrong pages · 849e3e93
      Mitko Haralanov 提交于
      The routine used by the SDMA cache to handle already
      cached nodes can extend an already existing node.
      
      In its error handling code, the routine will unpin pages
      when not all pages of the buffer extension were pinned.
      
      There was a bug in that part of the routine, which would
      mistakenly unpin pages from the original set rather than
      the newly pinned pages.
      
      This commit fixes that bug by offsetting the page array
      to the proper place pointing at the beginning of the newly
      pinned pages.
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      849e3e93
    • M
      IB/hfi1: Prevent NULL pointer deferences in caching code · f19bd643
      Mitko Haralanov 提交于
      There is a potential kernel crash when the MMU notifier calls the
      invalidation routines in the hfi1 pinned page caching code for sdma.
      
      The invalidation routine could call the remove callback
      for the node, which in turn ends up dereferencing the
      current task_struct to get a pointer to the mm_struct.
      However, the mm_struct pointer could be NULL resulting in
      the following backtrace:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
          IP: [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
          15
          task: ffff88085e66e080 ti: ffff88085c244000 task.ti: ffff88085c244000
          RIP: 0010:[<ffffffffa041f75a>]  [<ffffffffa041f75a>] sdma_rb_remove+0xaa/0x100 [hfi1]
          RSP: 0000:ffff88085c245878  EFLAGS: 00010002
          RAX: 0000000000000000 RBX: ffff88105b9bbd40 RCX: ffffea003931a830
          RDX: 0000000000000004 RSI: ffff88105754a9c0 RDI: ffff88105754a9c0
          RBP: ffff88085c245890 R08: ffff88105b9bbd70 R09: 00000000fffffffb
          R10: ffff88105b9bbd58 R11: 0000000000000013 R12: ffff88105754a9c0
          R13: 0000000000000001 R14: 0000000000000001 R15: ffff88105b9bbd40
          FS:  0000000000000000(0000) GS:ffff88107ef40000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00000000000000a8 CR3: 0000000001a0b000 CR4: 00000000001407e0
          Stack:
           ffff88105b9bbd40 ffff88080ec481a8 ffff88080ec481b8 ffff88085c2458c0
           ffffffffa03fa00e ffff88080ec48190 ffff88080ed9cd00 0000000001024000
           0000000000000000 ffff88085c245920 ffffffffa03fa0e7 0000000000000282
          Call Trace:
           [<ffffffffa03fa00e>] __mmu_rb_remove.isra.5+0x5e/0x70 [hfi1]
           [<ffffffffa03fa0e7>] mmu_notifier_mem_invalidate+0xc7/0xf0 [hfi1]
           [<ffffffffa03fa143>] mmu_notifier_page+0x13/0x20 [hfi1]
           [<ffffffff81156dd0>] __mmu_notifier_invalidate_page+0x50/0x70
           [<ffffffff81140bbb>] try_to_unmap_one+0x20b/0x470
           [<ffffffff81141ee7>] try_to_unmap_anon+0xa7/0x120
           [<ffffffff81141fad>] try_to_unmap+0x4d/0x60
           [<ffffffff8111fd7b>] shrink_page_list+0x2eb/0x9d0
           [<ffffffff81120ab3>] shrink_inactive_list+0x243/0x490
           [<ffffffff81121491>] shrink_lruvec+0x4c1/0x640
           [<ffffffff81121641>] shrink_zone+0x31/0x100
           [<ffffffff81121b0f>] kswapd_shrink_zone.constprop.62+0xef/0x1c0
           [<ffffffff811229e3>] kswapd+0x403/0x7e0
           [<ffffffff811225e0>] ? shrink_all_memory+0xf0/0xf0
           [<ffffffff81068ac0>] kthread+0xc0/0xd0
           [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
           [<ffffffff814ff8ec>] ret_from_fork+0x7c/0xb0
           [<ffffffff81068a00>] ? insert_kthread_work+0x40/0x40
      
      To correct this, the mm_struct passed to us by the MMU notifier is
      used (which is what should have been done to begin with). This avoids
      the broken derefences and ensures that the correct mm_struct is used.
      Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Reviewed-by: NDean Luick <dean.luick@intel.com>
      Signed-off-by: NMitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      f19bd643
  5. 22 3月, 2016 3 次提交
  6. 12 3月, 2016 2 次提交
  7. 11 3月, 2016 12 次提交
  8. 23 2月, 2016 1 次提交