1. 17 Aug, 2017 1 commit
    • powerpc/mm: Rename find_linux_pte_or_hugepte() · 94171b19
      Aneesh Kumar K.V committed
      Add new helpers to make the function simpler to use. The recommended
      way to walk the page table is find_current_mm_pte(). If we cannot use
      find_current_mm_pte(), we should document why that particular use of
      __find_linux_pte() is safe against a parallel THP split.

      For now, KVM code uses __find_linux_pte(), because it ends up calling
      __find_linux_pte() in real mode with MSR_EE=0 but with PACA
      soft_enabled=1. We may want to fix that later and make sure we keep
      MSR_EE and PACA soft_enabled in sync. When we do that, we can switch
      KVM to use find_linux_pte().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
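      To make the KVM caveat concrete, here is a toy C model (not kernel code) of the two interrupt-state flags this commit mentions. The struct and function names are hypothetical. The model rests on our understanding of the mechanism: a THP split waits for an IPI, and that IPI cannot be serviced by a CPU that is soft-disabled or has MSR_EE=0. KVM real mode is hard-disabled but soft-enabled. The walk is therefore safe, yet it would fail a check that looks only at the soft state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the two interrupt-state flags; not kernel code. */
struct irq_state {
	bool msr_ee;            /* MSR[EE]: true = external interrupts delivered */
	bool paca_soft_enabled; /* PACA soft_enabled: true = soft-enabled */
};

/* What a checked walker such as find_current_mm_pte() expects (as we
 * understand it): interrupts soft-disabled. */
static bool soft_check_passes(const struct irq_state *s)
{
	return !s->paca_soft_enabled;
}

/*
 * Assumption: a parallel THP split waits for an IPI to complete on every
 * CPU. A CPU that is soft-disabled or has MSR_EE=0 cannot service it until
 * it re-enables interrupts, so either state keeps the lockless walk safe.
 */
static bool walk_is_safe(const struct irq_state *s)
{
	return !s->paca_soft_enabled || !s->msr_ee;
}
```

      Keeping the two flags in sync, as the commit proposes, would make the soft check pass in the KVM real-mode case as well.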
  2. 16 Aug, 2017 1 commit
    • powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line · 79cc38de
      Aneesh Kumar K.V committed
      With commit aa888a74 ("hugetlb: support larger than MAX_ORDER") we added
      support for allocating gigantic hugepages via kernel command line. Switch
      ppc64 arch specific code to use that.
      
      With respect to FSL support, we now limit our allocation range using
      BOOTMEM_ALLOC_ACCESSIBLE.

      We use the kernel command line to reserve hugetlb pages on powernv
      platforms. On pseries in hash MMU mode, the supported gigantic huge page
      size is 16GB, and it can only be allocated with hypervisor assistance.
      For pseries, the command line option doesn't do the allocation. Instead,
      pseries allocates gigantic hugepages based on a hypervisor hint specified
      via the "ibm,expected#pages" property of the memory node.
      
      Cc: Scott Wood <oss@buserror.net>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
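      As an illustration of command-line reservation, the sketch below computes how much memory a line such as `hugepagesz=16G hugepages=2` sets aside at boot. `hugepagesz=` and `hugepages=` are the generic hugetlb boot parameters. `parse_size()` is a hypothetical user-space stand-in for the kernel's own size parsing. Which sizes are valid depends on the platform and MMU mode. On pseries, per the commit text, such a line does not perform the allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memparse()-style parser: a decimal number
 * with an optional K, M or G suffix (binary multiples). */
static uint64_t parse_size(const char *s)
{
	char *end;
	uint64_t v = strtoull(s, &end, 10);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10; /* fall through */
	case 'M': case 'm':
		v <<= 10; /* fall through */
	case 'K': case 'k':
		v <<= 10;
		break;
	default:
		break;
	}
	return v;
}

/* Bytes reserved at boot by "hugepagesz=<size> hugepages=<count>". */
static uint64_t reserved_bytes(const char *hugepagesz, unsigned long count)
{
	return parse_size(hugepagesz) * count;
}
```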
  3. 08 Aug, 2017 1 commit
  4. 31 Jul, 2017 1 commit
  5. 23 Jun, 2017 1 commit
    • powerpc/mm: Trace tlbie(l) instructions · 0428491c
      Balbir Singh committed
      Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
      Entry (Local)) instructions.
      
      The tlbie instruction has changed over the years, so not all versions
      accept the same operands. Use the ISA v3 field operands because they
      are the most verbose; we may change them in the future.
      
      Example output:
      
        qemu-system-ppc-5371  [016]  1412.369519: tlbie:
        	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Add some missing trace_tlbie()s, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
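      The example trace line can be reproduced with a small C formatter. The struct and function are hypothetical. The format string is reconstructed from the example output and is not copied from the kernel's trace event definition.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of the ISA v3 tlbie operands reported by the trace. */
struct tlbie_rec {
	unsigned long lpid;
	int local;          /* 1 for tlbiel, 0 for tlbie */
	uint64_t rb, rs;
	int ric, prs, r;
};

/* Lay out a record like the example trace line in this commit message. */
static int format_tlbie(char *buf, size_t len, const struct tlbie_rec *t)
{
	return snprintf(buf, len,
			"tlbie with lpid %lu, local %d, rb=%llx, rs=%llx, "
			"ric=%d prs=%d r=%d",
			t->lpid, t->local,
			(unsigned long long)t->rb, (unsigned long long)t->rs,
			t->ric, t->prs, t->r);
}
```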
  6. 11 Apr, 2017 1 commit
  7. 03 Apr, 2017 1 commit
  8. 01 Apr, 2017 1 commit
    • powerpc/pseries: Skip using reserved virtual address range · 82228e36
      Aneesh Kumar K.V committed
      Now that we use all the available virtual address range, we need to make
      sure we don't generate a VSID that overlaps with the reserved VSID
      range. The reserved VSID range includes the virtual address range used by the
      adjunct partition and also the VRMA virtual segment. We find the context
      value that can result in generating such a VSID and reserve it early in
      boot.
      
      We don't look at the adjunct range, because for now we disable adjunct
      usage in a Linux LPAR via the CAS interface.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
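      The "reserve it early in boot" step amounts to marking an ID as taken before any allocation can hand it out. Below is a minimal sketch that uses a plain array as the allocator. The kernel uses its own ID allocator, and `reserve_context_id()`, `alloc_context_id()` and `MAX_CONTEXT` here are hypothetical. Finding which context would produce a reserved VSID depends on the VSID scrambling function and is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CONTEXT 64 /* hypothetical, tiny for illustration */

static bool context_used[MAX_CONTEXT];

/* Mark a context id as unavailable before any allocation happens. */
static int reserve_context_id(int id)
{
	if (id < 0 || id >= MAX_CONTEXT || context_used[id])
		return -1;
	context_used[id] = true;
	return 0;
}

/* Hand out the lowest free context id at or above 'min', or -1. */
static int alloc_context_id(int min)
{
	for (int id = min; id < MAX_CONTEXT; id++) {
		if (!context_used[id]) {
			context_used[id] = true;
			return id;
		}
	}
	return -1;
}
```

      Because the reservation happens first, later allocations simply step over the reserved ID.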
  9. 31 Mar, 2017 2 commits
  10. 02 Mar, 2017 1 commit
  11. 10 Feb, 2017 2 commits
  12. 17 Jan, 2017 1 commit
  13. 25 Dec, 2016 1 commit
  14. 25 Nov, 2016 1 commit
  15. 23 Nov, 2016 1 commit
    • powerpc/64: Provide functions for accessing POWER9 partition table · 9d661958
      Paul Mackerras committed
      POWER9 requires the host to set up a partition table, which is a
      table in memory indexed by logical partition ID (LPID) which
      contains the pointers to page tables and process tables for the
      host and each guest.
      
      This factors out the initialization of the partition table into
      a single function.  This code was previously duplicated between
      hash_utils_64.c and pgtable-radix.c.
      
      This provides a function for setting a partition table entry,
      which is used in early MMU initialization, and will be used by
      KVM whenever a guest is created.  This function includes a tlbie
      instruction which will flush all TLB entries for the LPID and
      all caches of the partition table entry for the LPID, across the
      system.
      
      This also moves a call to memblock_set_current_limit(), which was
      in radix_init_partition_table(), but has nothing to do with the
      partition table.  By analogy with the similar code for hash, the
      call gets moved to near the end of radix__early_init_mmu().  It
      now gets called when running as a guest, whereas previously it
      would only be called if the kernel is running as the host.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
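      Below is a simplified model of a partition table indexed by LPID, with a setter that flushes after each update. The names, the table size and the counter standing in for tlbie are hypothetical. The real table uses big-endian entries and needs barriers around the tlbie, and both are omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define NR_LPIDS 16 /* hypothetical table size */

/* Two doublewords per entry, holding (among other fields) the pointers
 * to the partition's page table and process table. */
struct patb_entry {
	uint64_t patb0;
	uint64_t patb1;
};

static struct patb_entry partition_tb[NR_LPIDS];
static unsigned long flushes; /* counts simulated tlbie flushes */

/* Stand-in for the tlbie that flushes TLB entries and cached copies
 * of the partition-table entry for one LPID. */
static void flush_lpid(unsigned int lpid)
{
	(void)lpid;
	flushes++;
}

static void set_partition_entry(unsigned int lpid, uint64_t dw0, uint64_t dw1)
{
	assert(lpid < NR_LPIDS);
	partition_tb[lpid].patb0 = dw0;
	partition_tb[lpid].patb1 = dw1;
	flush_lpid(lpid); /* stale cached copies must not survive the update */
}
```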
  16. 18 Nov, 2016 1 commit
  17. 12 Oct, 2016 1 commit
    • powerpc/mm/hash64: Fix might_have_hea() check · 08bf75ba
      Michael Ellerman committed
      In commit 2b4e3ad8 ("powerpc/mm/hash64: Don't test for machine type
      to detect HEA special case") we changed the logic in might_have_hea()
      to check FW_FEATURE_SPLPAR rather than machine_is(pseries).
      
      However the check was incorrectly negated, leading to crashes on
      machines with HEA adapters, such as:
      
        mm: Hashing failure ! EA=0xd000080080004040 access=0x800000000000000e current=NetworkManager
            trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
        Unable to handle kernel paging request for data at address 0xd000080080004040
        Call Trace:
          .ehea_create_cq+0x148/0x340 [ehea] (unreliable)
          .ehea_up+0x258/0x1200 [ehea]
          .ehea_open+0x44/0x1a0 [ehea]
          ...
      
      Fix it by removing the negation.
      
      Fixes: 2b4e3ad8 ("powerpc/mm/hash64: Don't test for machine type to detect HEA special case")
      Cc: stable@vger.kernel.org # v4.8+
      Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
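      Below is a model of the inverted condition, reduced to the FW_FEATURE_SPLPAR test that the commit describes (the real function may check other conditions too). It assumes, from our reading of the surrounding code, that the result decides whether I/O mappings must stay at 4K pages because an HEA cannot use 64K ones. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum io_psize { IO_PSIZE_4K, IO_PSIZE_64K }; /* hypothetical */

/* Before the fix: the SPLPAR test was negated by mistake. */
static bool might_have_hea_buggy(bool fw_splpar)
{
	return !fw_splpar;
}

/* After the fix: in this model, an HEA is only expected on SPLPAR systems. */
static bool might_have_hea_fixed(bool fw_splpar)
{
	return fw_splpar;
}

/* Assumption: if an HEA might be present, I/O mappings stay at 4K. */
static enum io_psize pick_io_psize(bool hea_possible)
{
	return hea_possible ? IO_PSIZE_4K : IO_PSIZE_64K;
}
```

      On an SPLPAR machine the buggy version picks 64K I/O pages. That is consistent with the hashing failure in ehea_create_cq shown in the commit message.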
  18. 23 Sep, 2016 1 commit
  19. 13 Sep, 2016 1 commit
  20. 09 Sep, 2016 1 commit
    • powerpc/mm: Speed up computation of base and actual page size for a HPTE · 0eeede0c
      Paul Mackerras committed
      This replaces a 2-D search through an array with a simple 8-bit table
      lookup for determining the actual and/or base page size for a HPT entry.
      
      The encoding in the second doubleword of the HPTE is designed to encode
      the actual and base page sizes without using any more bits than would be
      needed for a 4k page number, by using between 1 and 8 low-order bits of
      the RPN (real page number) field to encode the page sizes.  A single
      "large page" bit in the first doubleword indicates that these low-order
      bits are to be interpreted like this.
      
      We can determine the page sizes by using the low-order 8 bits of the RPN
      to look up a 256-entry table.  For actual page sizes less than 1MB, some
      of the upper bits of these 8 bits are going to be real address bits, but
      we can cope with that by replicating the entries for those smaller page
      sizes.
      
      While we're at it, let's move the hpte_page_size() and hpte_base_page_size()
      functions from a KVM-specific header to a header for 64-bit HPT systems,
      since this computation doesn't have anything specifically to do with KVM.
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
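      The table-driven lookup can be sketched in C. The encodings below are hypothetical and are not the ISA values. The point of the sketch is the replication step: an encoding that uses fewer than 8 bits is copied into every table slot whose upper bits take any value, because for that page size those bits are real-address bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (base, actual) page-size pair and its encoding in the low-order
 * RPN bits. Values are hypothetical, chosen not to collide. */
struct lp_enc {
	unsigned int bits;  /* encoding width, 1..8 */
	uint8_t penc;       /* encoding value in the low 'bits' bits */
	uint8_t base_shift; /* log2 of base page size */
	uint8_t act_shift;  /* log2 of actual page size */
};

static const struct lp_enc encs[] = {
	{ 4, 0x1,  12, 16 }, /* 4K base, 64K actual: 4 encoding bits */
	{ 8, 0x00, 24, 24 }, /* 16M page: all 8 bits are encoding */
};

struct lp_entry { uint8_t base_shift, act_shift; }; /* zeros = no match */
static struct lp_entry lp_table[256];

static void build_lp_table(void)
{
	for (size_t i = 0; i < sizeof(encs) / sizeof(encs[0]); i++) {
		unsigned int spare = 8 - encs[i].bits;

		/* Replicate the entry for every value of the spare upper bits. */
		for (unsigned int hi = 0; hi < (1u << spare); hi++) {
			unsigned int idx = (hi << encs[i].bits) | encs[i].penc;

			lp_table[idx].base_shift = encs[i].base_shift;
			lp_table[idx].act_shift = encs[i].act_shift;
		}
	}
}

/* Single array access instead of a 2-D search. */
static struct lp_entry lookup_lp(uint8_t low_rpn)
{
	return lp_table[low_rpn];
}
```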
  21. 01 Aug, 2016 2 commits
  22. 28 Jul, 2016 1 commit
  23. 26 Jul, 2016 2 commits
  24. 21 Jul, 2016 5 commits
  25. 17 Jul, 2016 2 commits
  26. 05 Jul, 2016 1 commit
  27. 30 Jun, 2016 1 commit
  28. 17 Jun, 2016 1 commit
    • powerpc/mm/hash: Don't add memory coherence if cache inhibited is set · e568006b
      Aneesh Kumar K.V committed
      H_ENTER hcall handling in QEMU assumed that a cache-inhibited HPTE
      entry won't have memory coherence set. Older kernels also indicated
      that some versions of pHyp required this; the code removed by the
      commit below says:
      
          /* Make pHyp happy */
          if ((rflags & _PAGE_NO_CACHE) && !(rflags & _PAGE_WRITETHRU))
                  hpte_r &= ~HPTE_R_M;
      
      But older kernels had inconsistent memory coherence mappings. We
      always enabled memory coherence in the page fault path, and removed
      memory coherence if _PAGE_NO_CACHE was set when we mapped the page
      via htab_bolt_mapping. The commit mentioned below tried to
      consolidate that by always enabling memory coherence. But, as
      mentioned above, that breaks QEMU's H_ENTER handling.

      This patch updates this so that we enable memory coherence only if
      cache inhibited is not set, and brings fault handling, lpar and bolt
      mapping in sync.
      
      Fixes: 30bda41a ("powerpc/mm: Drop WIMG in favour of new constant")
      Reported-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
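      Below is a minimal sketch of the rule this patch adopts: set M (memory coherence) only when the mapping is not cache-inhibited. `PTE_NO_CACHE` is hypothetical. The HPTE_R_I and HPTE_R_M values are given to the best of our knowledge and should be treated as illustrative. The real conversion handles more cache-control cases than this.

```c
#include <assert.h>
#include <stdint.h>

/* HPTE WIMG bits (illustrative values). */
#define HPTE_R_I 0x20ULL /* cache inhibited */
#define HPTE_R_M 0x10ULL /* memory coherence */

/* Hypothetical Linux PTE cache-control flag for this sketch. */
#define PTE_NO_CACHE 0x1UL

static uint64_t cache_bits_from_pte(unsigned long pteflags)
{
	uint64_t rflags = 0;

	if (pteflags & PTE_NO_CACHE)
		rflags |= HPTE_R_I;
	/* Add memory coherence only if cache inhibited is not set. */
	if (!(rflags & HPTE_R_I))
		rflags |= HPTE_R_M;
	return rflags;
}
```

      The same helper would be used by the fault, lpar and bolt paths, which is what keeps them in sync.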
  29. 01 Jun, 2016 1 commit
    • powerpc/mm/hash: Fix the reference bit update when handling hash fault · dc47c0c1
      Aneesh Kumar K.V committed
      When we converted the asm routines to C functions, we missed updating
      HPTE_R_R based on _PAGE_ACCESSED. The asm code used to copy over the
      lower bits from the pte via:
      
      andi.	r3,r30,0x1fe		/* Get basic set of flags */
      
      We also update the code so that we no longer always set the Change bit
      ('C' bit). That behavior was added by commit c5cf0e30 ("powerpc: Fix
      buglet with MMU hash management").
      
      With hash64, we need to make sure that hardware doesn't do a pte update
      directly. This is because we do end up with entries in the TLB that have
      no hash page table entry. This happens because when we find a hash
      bucket full, we "evict" a more or less random entry from it. When we do
      that, we don't invalidate the TLB (hpte_remove) because we assume the
      old translation is still technically "valid". For more info, see commit
      0608d692 ("powerpc/mm: Always invalidate tlb on hpte invalidate and
      update").
      
      Thus it's critical that valid hash PTEs always have reference bit set
      and writeable ones have change bit set. We do this by hashing a
      non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and
      thus R) when hashing anything else in. Any attempt by Linux at clearing
      those bits also removes the corresponding hash entry.
      
      Commit c5cf0e30 did that for the 'C' bit by always enabling it. We
      don't really need to do that, because we never map a RW pte entry
      without setting the 'C' bit. On a read fault on a RW pte entry, we
      still map it read-only, so a store to the page will still cause a
      hash pte fault.
      
      This patch reverts part of commit c5cf0e30 ("[PATCH] powerpc: Fix
      buglet with MMU hash management") and retains the updatepp part.

      - If we hit the updatepp path on native, the old code without that
        commit would fail to set C, because native_hpte_updatepp() was
        implemented to filter the same bits as H_PROTECT and not let C
        through. We would thus "upgrade" a RO HPTE to RW without setting
        C, causing the bug. So the real fix in that commit was the change
        to native_hpte_updatepp().
      
      Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
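      The invariant "valid HPTEs have R set, writable ones have C set" can be sketched as a flag conversion. All flag values are hypothetical, and `HPTE_RO` stands in for the real PP protection bits. A clean RW PTE is hashed read-only, so the first store faults again and marks it dirty.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Linux PTE flags for this sketch (not the real values). */
#define PTE_RW       0x1UL
#define PTE_DIRTY    0x2UL
#define PTE_ACCESSED 0x4UL

/* Hypothetical HPTE bits: R (referenced), C (changed), read-only marker. */
#define HPTE_R  0x100ULL
#define HPTE_C  0x080ULL
#define HPTE_RO 0x001ULL

static uint64_t hpte_flags_from_pte(unsigned long pte)
{
	uint64_t r = 0;

	/* The fault path sets ACCESSED before hashing, so R follows it. */
	if (pte & PTE_ACCESSED)
		r |= HPTE_R;
	/* Writable in the HPTE only when dirty, and then C is set too. */
	if ((pte & PTE_RW) && (pte & PTE_DIRTY))
		r |= HPTE_C;
	else
		r |= HPTE_RO;
	return r;
}
```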
  30. 11 May, 2016 2 commits
    • powerpc/mm/hash64: Fix subpage protection with 4K HPTE config · aac55d75
      Michael Ellerman committed
      With a Linux page size of 64K and hardware supporting only 4K HPTEs, if
      we use subpage protection, we always fail for subpage 0, as shown
      below (using the subpage_prot selftest):
      
        520175565:  (4520111850): Failed at 0x3fffad4b0000 (p=13,sp=0,w=0), want=fault, got=pass !
        4520890210: (4520826495): Failed at 0x3fffad5b0000 (p=29,sp=0,w=0), want=fault, got=pass !
        4521574251: (4521510536): Failed at 0x3fffad6b0000 (p=45,sp=0,w=0), want=fault, got=pass !
        4522258324: (4522194609): Failed at 0x3fffad7b0000 (p=61,sp=0,w=0), want=fault, got=pass !
      
      This is because hash preload wrongly inserts the HPTE entry for subpage
      0 without looking at the subpage protection information.
      
      Fix it by teaching should_hash_preload() not to preload if we have
      subpage protection configured for that range.
      
      It appears this has been broken since it was introduced in 2008.
      
      Fixes: fa28237c ("[POWERPC] Provide a way to protect 4k subpages when using 64k pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Rework into should_hash_preload() to avoid build fails w/SLICES=n]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
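      Below is a sketch of the added condition: skip the preload whenever any subpage protection is configured for the 64K page containing the address, and leave it to the fault path to apply the per-4K permissions. The table, its size and the function names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_64K_PAGES 8 /* hypothetical, tiny address space */

/* Hypothetical per-64K-page subpage protection bits (0 = none). */
static uint32_t spp_table[NR_64K_PAGES];

static uint32_t subpage_prot_for(uint64_t ea)
{
	uint64_t idx = ea >> 16; /* 64K page index */

	return idx < NR_64K_PAGES ? spp_table[idx] : 0;
}

/* Preload only if the page size allows it and no subpage protection
 * applies; otherwise the fault path handles it. */
static bool preload_allowed(uint64_t ea, bool psize_ok)
{
	return psize_ok && subpage_prot_for(ea) == 0;
}
```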
    • powerpc/mm/hash64: Factor out hash preload psize check · 8bbc9b7b
      Michael Ellerman committed
      Currently we have a check in hash_preload() against the psize, which is
      only included when CONFIG_PPC_MM_SLICES is enabled. We want to expand
      this check in a subsequent patch, so factor it out to allow that. As a
      bonus it removes the #ifdef in the C code.
      
      Unfortunately we can't put this in the existing CONFIG_PPC_MM_SLICES
      block because it would require a forward declaration.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
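      The refactor follows a common kernel pattern: a helper has a real body when the option is enabled and a trivial inline stub otherwise, so the call site needs no #ifdef. The sketch below uses hypothetical names. As we understand it, the slice check compares the slice's page size with the mm's default user page size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page-size ids for this sketch. */
enum { PSIZE_4K, PSIZE_64K };

#ifdef CONFIG_PPC_MM_SLICES
/* With slices, only preload when the slice uses the default page size. */
static bool psize_check(int slice_psize, int mm_psize)
{
	return slice_psize == mm_psize;
}
#else
/* Without slices there is nothing to compare, so the check passes and
 * the caller needs no #ifdef of its own. */
static inline bool psize_check(int slice_psize, int mm_psize)
{
	(void)slice_psize;
	(void)mm_psize;
	return true;
}
#endif
```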