1. 02 Mar, 2017 4 commits
  2. 17 Feb, 2017 3 commits
  3. 16 Feb, 2017 1 commit
    • powerpc/64: Disable use of radix under a hypervisor · 3f91a89d
      Paul Mackerras committed
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it may try to use the radix MMU even though it doesn't have
      the necessary code to do so (it doesn't negotiate use of radix, and it
      doesn't do the H_REGISTER_PROC_TBL hcall).  If the hypervisor supports
      both radix and HPT, then it will set up the guest to use HPT (since the
      guest doesn't request radix in the CAS call), but if the radix feature
      bit is set in the ibm,pa-features property (which is valid, since
      ibm,pa-features is defined to represent the capabilities of the
      processor) the guest will try to use radix, resulting in a crash when
      it turns the MMU on.
      
      This makes the minimal fix for the current code, which is to disable
      radix unless we are running in hypervisor mode.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 15 Feb, 2017 1 commit
    • powerpc/mm: Update PROTFAULT handling in the page fault path · 18061c17
      Aneesh Kumar K.V committed
      With radix, we can get a page fault with the DSISR_PROTFAULT value set in the
      case of a PROT_NONE or autonuma mapping. The PROT_NONE case is handled by the
      vma check, where we consider the access bad. For autonuma we should fall
      through and fix up the access mask correctly.
      
      Without this patch we trigger the WARN_ON() on radix. This code moves that
      WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside
      the if condition, making it apply to all types of faults (exec/write/read). It
      is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT to
      handle the D/I cache sync.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 14 Feb, 2017 1 commit
    • powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y · a05ef161
      Michael Ellerman committed
      Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y:
      
        arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’:
        arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function)
        if (get_pageblock_migratetype(page) == MIGRATE_CMA) {
        ^~~~~~~~~~~
      
      Fix it by using the existing is_migrate_cma_page(), which evaluates to
      false when CMA=n.
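
The fix relies on a common pattern: a helper that collapses to a constant when the feature is compiled out, so callers never name the config-dependent symbol. A minimal user-space sketch of that pattern follows. Every `demo_`/`DEMO_` name is a hypothetical stand-in, not the kernel's actual definition.

```c
#include <stdbool.h>

/* Stand-in for a kernel config switch; define DEMO_CONFIG_CMA to "enable" the feature. */
/* #define DEMO_CONFIG_CMA */

enum demo_migratetype {
    DEMO_MIGRATE_MOVABLE,
#ifdef DEMO_CONFIG_CMA
    DEMO_MIGRATE_CMA,   /* only declared when the feature is built in */
#endif
};

struct demo_page {
    enum demo_migratetype type;
};

#ifdef DEMO_CONFIG_CMA
#define demo_is_migrate_cma_page(p) ((p)->type == DEMO_MIGRATE_CMA)
#else
/* Feature compiled out: constant false, no reference to DEMO_MIGRATE_CMA. */
#define demo_is_migrate_cma_page(p) ((void)(p), false)
#endif

/* Builds in both configurations because it only uses the helper. */
static int demo_count_cma_pages(const struct demo_page *pages, int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++)
        if (demo_is_migrate_cma_page(&pages[i]))
            count++;
    return count;
}
```

A caller that compared against `DEMO_MIGRATE_CMA` directly would fail to compile with the switch undefined, which mirrors the build break described above.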
      
      Fixes: 2e5bbb54 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 10 Feb, 2017 3 commits
  7. 09 Feb, 2017 1 commit
  8. 08 Feb, 2017 1 commit
  9. 31 Jan, 2017 8 commits
    • powerpc/64: Make type of partition table flush depend on partition type · 16ed1416
      Paul Mackerras committed
      When changing a partition table entry on POWER9, we do a particular
      form of the tlbie instruction which flushes all TLBs and caches of
      the partition table for a given logical partition ID (LPID).
      This instruction has a field in the instruction word, labelled R
      (radix), which should be 1 if the partition was previously a radix
      partition and 0 if it was a HPT partition.  This implements that
      logic.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM · ba9b399a
      Paul Mackerras committed
      This exports the pgtable_cache array and the pgtable_cache_add
      function so that HV KVM can use them for allocating radix page
      tables for guests.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras committed
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64: Don't try to use radix MMU under a hypervisor · 18569c1f
      Paul Mackerras committed
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it will try to use the radix MMU even though it doesn't have
      the necessary code to use radix under a hypervisor (it doesn't negotiate
      use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
      result is that the guest kernel will crash when it tries to turn on the
      MMU.
      
      This fixes it by looking for the /chosen/ibm,architecture-vec-5
      property, and if it exists, clears the radix MMU feature bit, before we
      decide whether to initialize for radix or HPT. This property is created
      by the hypervisor as a result of the guest calling the
      ibm,client-architecture-support method to indicate its capabilities, so
      it will indicate whether the hypervisor agreed to us using radix.
      
      Systems without a hypervisor may have this property also (for example,
      skiboot creates it), so we check the HV bit in the MSR to see whether we
      are running as a guest or not. If we are in hypervisor mode, then we can
      do whatever we like including using the radix MMU.
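
The decision described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the kernel's code: the `demo_`/`DEMO_` names are hypothetical, the feature flag value is arbitrary, and the HV bit position is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel has its own MSR and feature-bit definitions. */
#define DEMO_MSR_HV        (UINT64_C(1) << 60)  /* assumed position of the HV bit */
#define DEMO_MMU_FTR_RADIX (UINT32_C(1) << 0)   /* illustrative feature flag */

/*
 * Return the MMU feature mask to use for early init.
 * vec5_present: whether /chosen/ibm,architecture-vec-5 exists in the device tree.
 */
static uint32_t demo_fixup_mmu_features(uint32_t features, uint64_t msr,
                                        bool vec5_present)
{
    bool hv_mode = (msr & DEMO_MSR_HV) != 0;

    /* Guest with the property present: radix was not negotiated, so clear it. */
    if (vec5_present && !hv_mode)
        features &= ~DEMO_MMU_FTR_RADIX;

    return features;
}
```

Keeping the check as a function of the MSR value and the property's presence makes the three cases (guest, bare metal with the property, bare metal without it) easy to reason about separately.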
      
      The reason for using this property is that in future, when we have
      support for using radix under a hypervisor, we will need to check this
      property to see whether the hypervisor agreed to us using radix.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: unstub radix__vmemmap_remove_mapping() · 0d0a4bc2
      Reza Arbab committed
      Use remove_pagetable() and friends for radix vmemmap removal.
      
      We do not require the special-case handling of vmemmap done in the x86
      versions of these functions. This is because vmemmap_free() has already
      freed the mapped pages, and calls us with an aligned address range.
      
      So, add a few failsafe WARNs, but otherwise the code to remove physical
      mappings is already sufficient for vmemmap.
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: add radix__remove_section_mapping() · 4b5d62ca
      Reza Arbab committed
      Tear down and free the four-level page tables of physical mappings
      during memory hotremove.
      
      Borrow the basic structure of remove_pagetable() and friends from the
      identically-named x86 functions. Reduce the frequency of tlb flushes and
      page_table_lock spinlocks by only doing them in the outermost function.
      There was some question as to whether the locking is needed at all.
      Leave it for now, but we could consider dropping it.
      
      Memory must be offline to be removed, thus not in use. So there
      shouldn't be the sort of concurrent page walking activity here that
      might prompt us to use RCU.
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: add radix__create_section_mapping() · 6cc27341
      Reza Arbab committed
      Wire up memory hotplug page mapping for radix. Share the mapping
      function already used by radix_init_pgtable().
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: refactor radix physical page mapping · b5200ec9
      Reza Arbab committed
      Move the page mapping code in radix_init_pgtable() into a separate
      function that will also be used for memory hotplug.
      
      The current goto loop progressively decreases its mapping size as it
      covers the tail of a range whose end is unaligned. Change this to a for
      loop which can do the same for both ends of the range.
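
A for loop of the kind described can be sketched as follows: at each step it picks the largest mapping size that is aligned at the current address and still fits before the end, so small mappings appear at both unaligned ends and large ones in the middle. The three sizes (1G, 2M, 64K) and all `demo_` names are assumptions for illustration, and both range ends are assumed to be aligned to the smallest size.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SZ_64K (UINT64_C(1) << 16)
#define DEMO_SZ_2M  (UINT64_C(1) << 21)
#define DEMO_SZ_1G  (UINT64_C(1) << 30)

/* Largest assumed size that is aligned at addr and fits in the remaining gap. */
static uint64_t demo_pick_mapping_size(uint64_t addr, uint64_t end)
{
    uint64_t gap = end - addr;

    if ((addr % DEMO_SZ_1G) == 0 && gap >= DEMO_SZ_1G)
        return DEMO_SZ_1G;
    if ((addr % DEMO_SZ_2M) == 0 && gap >= DEMO_SZ_2M)
        return DEMO_SZ_2M;
    return DEMO_SZ_64K;
}

/*
 * Walk [start, end) and record the size chosen at each step.
 * Returns the number of mappings; sizes[] receives at most max entries.
 */
static size_t demo_map_range(uint64_t start, uint64_t end,
                             uint64_t *sizes, size_t max)
{
    size_t n = 0;
    uint64_t addr, size;

    for (addr = start; addr < end; addr += size) {
        size = demo_pick_mapping_size(addr, end);
        if (n < max)
            sizes[n] = size;
        n++;
    }
    return n;
}
```

Because the size is re-evaluated on every iteration, the same loop handles an unaligned start (sizes grow) and an unaligned end (sizes shrink) without separate code paths.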
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 30 Jan, 2017 4 commits
  11. 25 Jan, 2017 1 commit
  12. 23 Jan, 2017 1 commit
  13. 18 Jan, 2017 3 commits
  14. 17 Jan, 2017 1 commit
  15. 25 Dec, 2016 2 commits
  16. 13 Dec, 2016 1 commit
    • powerpc/mm: allow memory hotplug into a memoryless node · 4a3bac4e
      Reza Arbab committed
      Patch series "enable movable nodes on non-x86 configs", v7.
      
      This patchset allows more configs to make use of movable nodes.  When
      CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such
      nodes into the system:
      
      1. Discover movable nodes at boot. Currently this is only possible on
         x86, but we will enable configs supporting fdt to do the same.
      
      2. Hotplug and online all of a node's memory using online_movable. This
         is already possible on any config supporting memory hotplug, not
         just x86, but the Kconfig doesn't say so. We will fix that.
      
      We'll also remove some cruft on power which would prevent (2).
      
      This patch (of 5):
      
      Remove the check which prevents us from hotplugging into an empty node.
      
      The original commit b226e462 ("[PATCH] powerpc: don't add memory to
      empty node/zone"), states that this was intended to be a temporary measure.
      It is a workaround for an oops which no longer occurs.
      
      Link: http://lkml.kernel.org/r/1479160961-25840-2-git-send-email-arbab@linux.vnet.ibm.com
      Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alistair Popple <apopple@au1.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 10 Dec, 2016 3 commits
    • powerpc/8xx: Implement support of hugepages · 4b914286
      Christophe Leroy committed
      The 8xx uses a two-level page table and supports two different Linux page
      sizes (4k and 16k). The 8xx also supports two different hugepage sizes,
      512k and 8M. In order to support them on Linux we define two different
      page table layouts.
      
      The size of pages is in the PGD entry, using PS field (bits 28-29):
      00 : Small pages (4k or 16k)
      01 : 512k pages
      10 : reserved
      11 : 8M pages
      
      For the 512K hugepage size, a pgd entry has the format
      [<hugepte address>0101]. The hugepte table allocated will contain 8
      entries pointing to 512K huge ptes in 4k pages mode and 128 entries in
      16k pages mode.
      
      For 8M in 16k mode, a pgd entry has the format
      [<hugepte address>1101]. The hugepte table allocated will contain 8
      entries pointing to 8M huge ptes.
      
      For 8M in 4k mode, multiple pgd entries point to the same hugepte
      address and pgd entry will have the below format
      [<hugepte address>1101]. The hugepte table allocated will only have one
      entry.
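
Decoding the PS field from the layouts above can be sketched as below. This assumes a 32-bit entry with bits numbered from 0 (most significant) to 31 (least significant), which is what makes the trailing `0101` and `1101` patterns line up with PS values `01` and `11`. The `demo_` names are hypothetical.

```c
#include <stdint.h>

enum demo_8xx_ps {
    DEMO_PS_SMALL    = 0,  /* 00: small pages (4k or 16k) */
    DEMO_PS_512K     = 1,  /* 01: 512k pages */
    DEMO_PS_RESERVED = 2,  /* 10: reserved */
    DEMO_PS_8M       = 3   /* 11: 8M pages */
};

/*
 * With bit 31 as the least significant bit, bits 28-29 sit two positions
 * above the low end of the word.
 */
static enum demo_8xx_ps demo_pgd_page_size(uint32_t pgd)
{
    return (enum demo_8xx_ps)((pgd >> 2) & 0x3);
}

/* Hugepage size in bytes for a PS value, or 0 for small pages and reserved. */
static uint32_t demo_hugepage_bytes(enum demo_8xx_ps ps)
{
    switch (ps) {
    case DEMO_PS_512K:
        return UINT32_C(512) << 10;
    case DEMO_PS_8M:
        return UINT32_C(8) << 20;
    default:
        return 0;
    }
}
```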
      
      For the time being, we do not support CPU15 ERRATA when HUGETLB is
      selected.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3, for the generic bits)
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc: get hugetlbpage handling more generic · 03bb2d65
      Christophe Leroy committed
      Today there are two implementations of hugetlbpages which are managed
      by exclusive #ifdefs:
      * FSL_BOOKE: several directory entries point to the same single hugepage
      * BOOK3S: one upper level directory entry points to a table of hugepages
      
      In preparation for the implementation of hugepage support on the 8xx, we
      need a mix of the two above solutions, because the 8xx needs both cases
      depending on the size of pages:
      * In 4k page size mode, each PGD entry covers a 4M byte area. It means
      that 2 PGD entries will be necessary to cover an 8M hugepage while a
      single PGD entry will cover 8x 512k hugepages.
      * In 16k page size mode, each PGD entry covers a 64M byte area. It means
      that 8x 8M hugepages will be covered by one PGD entry and 128x 512k
      hugepages will be covered by one PGD entry.
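
Each of these counts follows from dividing the area covered by one PGD entry by the hugepage size, or the reverse when the hugepage is larger than that area. A small sketch of the arithmetic, with hypothetical `demo_` helper names:

```c
#include <stdint.h>

#define DEMO_SZ_512K (UINT32_C(512) << 10)
#define DEMO_SZ_4M   (UINT32_C(4) << 20)
#define DEMO_SZ_8M   (UINT32_C(8) << 20)
#define DEMO_SZ_64M  (UINT32_C(64) << 20)

/* Hugepages that fit under one PGD entry; 0 means one hugepage spans several entries. */
static uint32_t demo_hugepages_per_pgd(uint32_t pgd_span, uint32_t huge_size)
{
    return pgd_span / huge_size;
}

/* PGD entries needed to cover one hugepage (at least 1). */
static uint32_t demo_pgds_per_hugepage(uint32_t pgd_span, uint32_t huge_size)
{
    return huge_size > pgd_span ? huge_size / pgd_span : 1;
}
```

The first helper corresponds to the "one directory entry points to a table of hugepages" case and the second to the "several directory entries point to the same hugepage" case.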
      
      This patch:
      * removes #ifdefs in favor of if/else based on the range sizes
      * merges the two huge_pte_alloc() functions as they are pretty similar
      * merges the two hugetlbpage_init() functions as they are pretty similar
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3)
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc: port 64 bits pgtable_cache to 32 bits · 9b081e10
      Christophe Leroy committed
      Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
      standard pages when using 4k pages and a single pgtable_cache
      if using other size pages.
      
      In preparation of implementing huge pages on the 8xx, this patch
      replaces the specific powerpc32 handling by the 64 bits approach.
      
      This is done by:
      * moving 64 bits pgtable_cache_add() and pgtable_cache_init()
      in a new file called init-common.c
      * modifying pgtable_cache_init() to also handle the case
      without PMD
      * removing the 32 bits version of pgtable_cache_add() and
      pgtable_cache_init()
      * copying related header contents from 64 bits into both the
      book3s/32 and nohash/32 header files
      
      On the 8xx, the following cache sizes will be used:
      * 4k pages mode:
      - PGT_CACHE(10) for PGD
      - PGT_CACHE(3) for 512k hugepage tables
      * 16k pages mode:
      - PGT_CACHE(6) for PGD
      - PGT_CACHE(7) for 512k hugepage tables
      - PGT_CACHE(3) for 8M hugepage tables
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Scott Wood <oss@buserror.net>
  18. 02 Dec, 2016 1 commit
    • powerpc/mm/iommu, vfio/spapr: Put pages on VFIO container shutdown · 4b6fad70
      Alexey Kardashevskiy committed
      At the moment the userspace tool is expected to request pinning of
      the entire guest RAM when VFIO IOMMU SPAPR v2 driver is present.
      When the userspace process finishes, all the pinned pages need to
      be put; this is done as a part of the userspace memory context (MM)
      destruction which happens on the very last mmdrop().
      
      The problem with this approach is that the MM of a userspace process
      may live longer than the process itself, because a kernel thread
      borrows the MM of the userspace process that was running on the CPU
      where the kernel thread was scheduled. If this happens, the MM remains
      referenced until that kernel thread wakes up again and releases the
      very last reference to the MM; on an idle system this can take hours.
      
      This moves preregistered regions tracking from MM to VFIO; instead of
      using mm_iommu_table_group_mem_t::used, tce_container::prereg_list is
      added so each container releases regions which it has pre-registered.
      
      This changes the userspace interface to return EBUSY if a memory
      region is already registered in a container. However, it should not
      have any practical effect, as the only userspace tool available now
      registers a memory region once per container anyway.
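
The per-container tracking and the EBUSY behaviour can be illustrated with a simplified user-space sketch. The structures and `demo_` functions are hypothetical stand-ins that only model the bookkeeping; page pinning and the locking under container->lock are left out.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a container and its region list. */
struct demo_region {
    unsigned long vaddr;
    unsigned long size;
    struct demo_region *next;
};

struct demo_container {
    struct demo_region *prereg_list;
};

/* Register a region once per container; a repeat registration returns -EBUSY. */
static int demo_register(struct demo_container *c,
                         unsigned long vaddr, unsigned long size)
{
    struct demo_region *r;

    for (r = c->prereg_list; r; r = r->next)
        if (r->vaddr == vaddr && r->size == size)
            return -EBUSY;

    r = malloc(sizeof(*r));
    if (!r)
        return -ENOMEM;
    r->vaddr = vaddr;
    r->size = size;
    r->next = c->prereg_list;
    c->prereg_list = r;
    return 0;
}

/* On container shutdown, release everything this container registered. */
static int demo_release_all(struct demo_container *c)
{
    int released = 0;

    while (c->prereg_list) {
        struct demo_region *r = c->prereg_list;

        c->prereg_list = r->next;
        free(r);
        released++;
    }
    return released;
}
```

Because the list belongs to the container rather than to the MM, cleanup happens when the container goes away and does not wait for the last reference to the MM to be dropped.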
      
      As tce_iommu_register_pages/tce_iommu_unregister_pages are called
      under container->lock, this does not need additional locking.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>