1. 11 Jun, 2014: 18 commits
  2. 07 Jun, 2014: 2 commits
  3. 06 Jun, 2014: 1 commit
    • powerpc/mm: Check paca psize is up to date for huge mappings · 09567e7f
      Committed by Michael Ellerman
      We have a bug in our hugepage handling which exhibits as an infinite
      loop of hash faults. If the fault is being taken in the kernel it will
      typically trigger the softlockup detector, or the RCU stall detector.
      
      The bug is as follows:
      
        1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGETLB | MAP_ANONYMOUS ..)
        2. Slice code converts the slice psize to 16M.
        3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
           synchronises the mm->context with the paca->context. So the paca slice
           mask is updated to include the 16M slice.
        4. Either:
           * mmap() fails because there are no huge pages available.
           * mmap() succeeds and the mapping is then munmapped.
           In both cases the slice psize remains at 16M in both the paca & mm.
        5. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
        6. The slice psize is converted back to 64K. Because of the check on line 539
           of slice.c we DO NOT update the paca->context. The paca slice mask is now
           out of sync with the mm slice mask.
        7. User/kernel accesses 0xa0000000.
        8. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
           to create an SLB entry and inserts it in the SLB.
        9. With the 16M SLB entry in place the hardware does a hash lookup, no entry
           is found, so a data access exception is generated.
       10. The data access handler calls do_page_fault() -> handle_mm_fault().
       11. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
       12. The hardware retries the access; there is still nothing in the hash table,
           so once again a data access exception is generated.
       13. hash_page() calls into __hash_page_thp() and inserts a mapping in the
           hash. Although the THP mapping maps 16M, the hashing is done using 64K
           as the segment page size.
       14. hash_page() returns immediately after calling __hash_page_thp(), skipping
           over the code at line 1125, so the mismatch between the paca->context
           and the mm->context is never detected.
       15. The hardware retries the access; the hash it generates using the 16M
           SLB entry does NOT match the hash we inserted (see the sketch after
           this list).
       16. We take another data access and go into __hash_page_thp().
       17. We see a valid entry in the hpte_slot_array and so we call updatepp(),
           which succeeds.
       18. Goto 15.
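
       The loop closes because the hash depends on the base page size shift:
       hardware takes the shift from the SLB entry (16M), while the kernel
       hashed with the mm's 64K segment page size. A schematic of that
       dependency only (not the kernel's exact hpt_hash(), which lives in
       arch/powerpc/include/asm/mmu-hash64.h; constants here are illustrative):

           static unsigned long hpt_hash_sketch(unsigned long vsid,
                                                unsigned long ea,
                                                unsigned int page_shift)
           {
                   /* virtual page number: depends on the page size shift */
                   unsigned long vpn = ((vsid << 28) | (ea & 0x0fffffffUL)) >> page_shift;

                   /* fold VSID and VPN into a PTEG index */
                   return (vsid ^ vpn) & 0x7fffffffffUL;
           }

       With page_shift = 24 (16M, what the hardware uses via the SLB entry)
       and page_shift = 16 (64K, what __hash_page_thp() used), the results
       differ, so the HPTE inserted in step 13 is never found in step 15.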
      
      We could fix this in two ways. The first would be to remove or modify
      the check on line 539 of slice.c.
      
      The second option is to cause the check of paca psize in hash_page() on
      line 1125 to also be done for THP pages.
      
      We prefer the latter, because the check & update of the paca psize is
      not done until we know it's necessary. It's also done only on the
      current cpu, so we don't need to IPI all other cpus.
      
      Without further rearranging the code, the simplest fix is to pull out
      the code that checks paca psize and call it in two places. Firstly for
      THP/hugetlb, and secondly for other mappings as before.
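
       As a sketch, modelled on hash_page() in arch/powerpc/mm/hash_utils_64.c
       (treat the body as illustrative; the upstream commit is authoritative):

           /* Pulled-out check: sync the paca's context with the mm and
            * rebuild the SLB, on the current CPU only, when the segment
            * page size recorded in the paca is stale. */
           static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
                                        int psize, bool user_region)
           {
                   if (user_region) {
                           if (psize != get_paca_psize(ea)) {
                                   get_paca()->context = mm->context;
                                   slb_flush_and_rebolt();
                           }
                   } else if (get_paca()->vmalloc_sllp !=
                              mmu_psize_defs[mmu_vmalloc_psize].sllp) {
                           get_paca()->vmalloc_sllp =
                                   mmu_psize_defs[mmu_vmalloc_psize].sllp;
                           slb_vmalloc_update();
                   }
           }

       hash_page() then calls this helper once on the THP/hugetlb path before
       its early return, and once at the original location for other mappings.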
      
      Thanks to Dave Jones for trinity, which originally found this bug.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: stable@vger.kernel.org [v3.11+]
  4. 05 Jun, 2014: 16 commits
  5. 02 Jun, 2014: 1 commit
  6. 30 May, 2014: 2 commits
    • KVM: PPC: Book3S PR: Rework SLB switching code · d8d164a9
      Committed by Alexander Graf
      On LPAR guest systems Linux enables the shadow SLB to indicate to the
      hypervisor a number of SLB entries that always have to be available.
      
      Today we go through this shadow SLB and disable the valid bits of all
      its ESIDs.
      However, pHyp doesn't like this approach very much and honors us with
      fancy machine checks.
      
      Fortunately the shadow SLB descriptor also has an entry that indicates
      the number of valid entries following. During the lifetime of a guest
      we can just swap that value to 0 and don't have to worry about the
      SLB restoration magic.
      
      While we're touching the code, let's also make it more readable (get
      rid of rldicl), allow it to deal with a dynamic number of bolted
      SLB entries and only do shadow SLB swizzling on LPAR systems.
      Signed-off-by: Alexander Graf <agraf@suse.de>
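
      The idea in C, roughly (the actual change is in the assembly guest
      entry/exit path; struct slb_shadow comes from asm/lppaca.h, and the
      two helper names here are invented for illustration):

          /* Guest entry: tell the hypervisor there are no persistent
           * shadow SLB entries to restore, instead of clearing the valid
           * bit on each ESID (which pHyp answers with machine checks). */
          static u32 shadow_slb_guest_enter(struct slb_shadow *s)
          {
                  u32 n_valid = be32_to_cpu(s->persistent);

                  s->persistent = cpu_to_be32(0);
                  return n_valid;
          }

          /* Guest exit: re-expose the bolted host entries. */
          static void shadow_slb_guest_exit(struct slb_shadow *s, u32 n_valid)
          {
                  s->persistent = cpu_to_be32(n_valid);
          }
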
    • KVM: PPC: Book3S PR: Use SLB entry 0 · 207438d4
      Committed by Alexander Graf
      We didn't make use of SLB entry 0 because ... of no good reason. SLB entry 0
      will always be used by the Linux linear SLB entry, so the fact that slbia
      does not invalidate it doesn't matter as we overwrite SLB 0 on exit anyway.
      
      Just enable use of SLB entry 0 for our shadow SLB code.
      Signed-off-by: Alexander Graf <agraf@suse.de>
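
      Schematically, why reusing slot 0 is safe (hypothetical helper; the
      real change is in the Book3S PR exit assembly): slbia skips entry 0,
      but the exit path unconditionally rewrites slot 0 with the host
      kernel's linear-mapping SLBE, so nothing the guest put there survives.

          /* Guest exit: overwrite SLB slot 0 with the host's linear
           * mapping entry; the low bits of the ESID operand select the
           * SLB index (0 here). */
          static inline void restore_host_slb_entry0(unsigned long esid_data,
                                                     unsigned long vsid_data)
          {
                  asm volatile("slbmte %0,%1"
                               : : "r" (vsid_data), "r" (esid_data)
                               : "memory");
          }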