1. 06 Nov, 2017 1 commit
    • M
      powerpc/64s: Replace CONFIG_PPC_STD_MMU_64 with CONFIG_PPC_BOOK3S_64 · 4e003747
      Michael Ellerman committed
      CONFIG_PPC_STD_MMU_64 indicates support for the "standard" powerpc MMU
      on 64-bit CPUs. The "standard" MMU refers to the hash page table MMU
      found in "server" processors, from IBM mainly.
      
      Currently CONFIG_PPC_STD_MMU_64 is == CONFIG_PPC_BOOK3S_64. While it's
      annoying to have two symbols that always have the same value, it's not
      quite annoying enough to bother removing one.
      
      However with the arrival of Power9, we now have the situation where
      CONFIG_PPC_STD_MMU_64 is enabled, but the kernel is running using the
      Radix MMU - *not* the "standard" MMU. So it is now actively confusing
      to use it, because it implies that code is disabled or inactive when
      the Radix MMU is in use, however that is not necessarily true.
      
      So s/CONFIG_PPC_STD_MMU_64/CONFIG_PPC_BOOK3S_64/, and do some minor
      formatting updates of some of the affected lines.
      
      This will be a pain for backports, but c'est la vie.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      4e003747
  2. 10 Aug, 2017 1 commit
  3. 24 Jul, 2017 1 commit
  4. 02 Jul, 2017 2 commits
  5. 28 Jun, 2017 1 commit
  6. 31 Mar, 2017 1 commit
  7. 21 Mar, 2017 1 commit
  8. 06 Mar, 2017 1 commit
    • S
      powerpc: Update to new option-vector-5 format for CAS · 014d02cb
      Suraj Jitindar Singh committed
      On POWER9 the ibm,client-architecture-support (CAS) negotiation process
      has been updated to change how the host to guest negotiation is done for
      the new hash/radix mmu as well as the nest mmu, process tables and guest
      translation shootdown (GTSE).
      
      This is documented in the unreleased PAPR ACR "CAS option vector
      additions for P9".
      
      The host tells the guest which options it supports in
      ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
      to request in the CAS call and these are agreed to in the
      ibm,architecture-vec-5 property of the chosen node.
      
      Thus we read ibm,arch-vec-5-platform-support and make our selection before
      calling CAS. We then parse the ibm,architecture-vec-5 property of the
      chosen node to check whether we should run as hash or radix.
      
      ibm,arch-vec-5-platform-support format:
      
      index value pairs: <index, val> ... <index, val>
      
      index: Option vector 5 byte number
      val:   Some representation of supported values
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Acked-by: Paul Mackerras <paulus@ozlabs.org>
      [mpe: Don't print about unknown options, be consistent with OV5_FEAT]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      014d02cb
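The `<index, val>` layout described in this commit message can be walked with a simple loop. The following is a minimal sketch, assuming each index and each value occupies exactly one byte; the function name `ov5_lookup` is hypothetical and is not taken from the kernel source.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk a buffer of <index, val> byte pairs and fetch the value recorded
 * for one option-vector-5 byte index.
 *
 * Assumption: every index and every value is a single byte.
 * Returns 0 on success, -1 if the index is absent or the buffer is malformed.
 */
int ov5_lookup(const uint8_t *prop, size_t len, uint8_t index, uint8_t *val)
{
    size_t i;

    if (prop == NULL || val == NULL || (len % 2) != 0)
        return -1;                  /* a trailing half pair is malformed */

    for (i = 0; i + 1 < len; i += 2) {
        if (prop[i] == index) {
            *val = prop[i + 1];     /* first match wins */
            return 0;
        }
    }
    return -1;                      /* index not advertised */
}
```

A caller would look up the byte it cares about, decide which of the advertised options to request, and only then make the CAS call.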
  9. 16 Feb, 2017 1 commit
    • P
      powerpc/64: Disable use of radix under a hypervisor · 3f91a89d
      Paul Mackerras committed
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it may try to use the radix MMU even though it doesn't have
      the necessary code to do so (it doesn't negotiate use of radix, and it
      doesn't do the H_REGISTER_PROC_TBL hcall).  If the hypervisor supports
      both radix and HPT, then it will set up the guest to use HPT (since the
      guest doesn't request radix in the CAS call), but if the radix feature
      bit is set in the ibm,pa-features property (which is valid, since
      ibm,pa-features is defined to represent the capabilities of the
      processor) the guest will try to use radix, resulting in a crash when
      it turns the MMU on.
      
      This makes the minimal fix for the current code, which is to disable
      radix unless we are running in hypervisor mode.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3f91a89d
  10. 31 Jan, 2017 2 commits
    • P
      powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras committed
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      cc3d2940
    • P
      powerpc/64: Don't try to use radix MMU under a hypervisor · 18569c1f
      Paul Mackerras committed
      Currently, if the kernel is running on a POWER9 processor under a
      hypervisor, it will try to use the radix MMU even though it doesn't have
      the necessary code to use radix under a hypervisor (it doesn't negotiate
      use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The
      result is that the guest kernel will crash when it tries to turn on the
      MMU.
      
      This fixes it by looking for the /chosen/ibm,architecture-vec-5
      property, and if it exists, clears the radix MMU feature bit, before we
      decide whether to initialize for radix or HPT. This property is created
      by the hypervisor as a result of the guest calling the
      ibm,client-architecture-support method to indicate its capabilities, so
      it will indicate whether the hypervisor agreed to us using radix.
      
      Systems without a hypervisor may have this property also (for example,
      skiboot creates it), so we check the HV bit in the MSR to see whether we
      are running as a guest or not. If we are in hypervisor mode, then we can
      do whatever we like including using the radix MMU.
      
      The reason for using this property is that in future, when we have
      support for using radix under a hypervisor, we will need to check this
      property to see whether the hypervisor agreed to us using radix.
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      18569c1f
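The decision this commit describes (clear the radix feature bit when the property exists, unless the HV bit shows we are in hypervisor mode) can be sketched as a pure function. This is only an illustration: the constant names, the feature-bit value and the HV bit position used here are assumptions made for the sketch, not values copied from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_MSR_HV        (1ULL << 60)  /* assumed position of the MSR HV bit */
#define SKETCH_MMU_FTR_RADIX (1UL << 0)    /* hypothetical radix feature flag */

/*
 * Return the MMU feature mask to use for early init.
 * has_vec5 says whether /chosen/ibm,architecture-vec-5 was found.
 */
unsigned long filter_radix(unsigned long features, uint64_t msr, bool has_vec5)
{
    bool hv_mode = (msr & SKETCH_MSR_HV) != 0;

    /* A guest that sees the property must not pick radix on its own. */
    if (!hv_mode && has_vec5)
        features &= ~SKETCH_MMU_FTR_RADIX;

    return features;
}
```

Checking the HV bit first matters because firmware such as skiboot may create the same property on a system with no hypervisor above the kernel.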
  11. 25 Dec, 2016 1 commit
  12. 10 Dec, 2016 1 commit
    • C
      powerpc: port 64 bits pgtable_cache to 32 bits · 9b081e10
      Christophe Leroy committed
      Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
      standard pages when using 4k pages and a single pgtable_cache
      if using other size pages.
      
      In preparation for implementing huge pages on the 8xx, this patch
      replaces the specific powerpc32 handling by the 64 bits approach.
      
      This is done by:
      * moving 64 bits pgtable_cache_add() and pgtable_cache_init()
      in a new file called init-common.c
      * modifying pgtable_cache_init() to also handle the case
      without PMD
      * removing the 32 bits version of pgtable_cache_add() and
      pgtable_cache_init()
      * copying related header contents from 64 bits into both the
      book3s/32 and nohash/32 header files
      
      On the 8xx, the following cache sizes will be used:
      * 4k pages mode:
      - PGT_CACHE(10) for PGD
      - PGT_CACHE(3) for 512k hugepage tables
      * 16k pages mode:
      - PGT_CACHE(6) for PGD
      - PGT_CACHE(7) for 512k hugepage tables
      - PGT_CACHE(3) for 8M hugepage tables
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Scott Wood <oss@buserror.net>
      9b081e10
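To get a feel for the cache sizes listed in this commit, the sketch below converts a PGT_CACHE shift into a table size in bytes. It assumes that a cache for shift n holds tables of 2^n entries and that each entry is pointer-sized (4 bytes on a 32-bit 8xx); both points are assumptions of this sketch, and `pgt_table_bytes` is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Size in bytes of one table taken from a cache with the given shift,
 * assuming 2^shift entries of entry_size bytes each.
 */
size_t pgt_table_bytes(unsigned int shift, size_t entry_size)
{
    return entry_size << shift;
}
```

Under those assumptions, the 4k-page PGD cache (shift 10) would hand out 4096-byte tables, and the shift-3 hugepage tables would be 32 bytes each.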
  13. 01 Aug, 2016 6 commits
  14. 01 May, 2016 3 commits
  15. 03 Mar, 2016 1 commit
  16. 01 Mar, 2016 2 commits
    • D
      powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      David Gibson committed
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT which is more appropriate than the
          previous EINVAL (the failure is unlikely to be related to the
          function's parameters)
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      
      For errors creating the vmemmap for the hot-added region:
         * This is now reported with EFAULT instead of causing a BUG() - this
           could happen for an external reason (e.g. full hash table) so it's
           better to handle this non-fatally
         * An error message is also printed, so the failure won't be silent
         * As above a failure could cause a partially mapped region, we now
           clean this up. [mpe: move htab_remove_mapping() out of #ifdef
           CONFIG_MEMORY_HOTPLUG to enable this]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      1dace6c6
    • D
      powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98
      David Gibson committed
      At the moment the hpte_removebolted callback in ppc_md returns void and
      will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
      place.  This is awkward for the case of cleaning up a mapping which was
      partially made before failing.
      
      So, we add a return value to hpte_removebolted, and have it return ENOENT
      in the case that the HPTE to remove didn't exist in the first place.
      
      In the (sole) caller, we propagate errors in hpte_removebolted to its
      caller to handle.  However, we handle ENOENT specially, continuing to
      complete the unmapping over the specified range before returning the error
      to the caller.
      
      This means that htab_remove_mapping() will work sanely on a partially
      present mapping, removing any HPTEs which are present, while also returning
      ENOENT to its caller in case it's important there.
      
      There are two callers of htab_remove_mapping():
         - In remove_section_mapping() we already WARN_ON() any error return,
           which is reasonable - in this case the mapping should be fully
           present
         - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
           just a WARN_ON() in the case of ENOENT, since failing to remove a
           mapping that wasn't there in the first place probably shouldn't be
           fatal.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      27828f98
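The error handling described in this commit (keep unmapping past ENOENT, remember it, and abort on any other error) can be sketched in plain C. The names `remove_range` and `demo_remove` are hypothetical stand-ins; the real code works on hash page table entries rather than on a flag array.

```c
#include <errno.h>
#include <stddef.h>

/* Per-entry removal callback: 0 on success, negative errno on failure. */
typedef int (*remove_fn)(unsigned long addr, void *ctx);

/*
 * Remove every entry in [start, end). A missing entry (-ENOENT) does not
 * stop the walk, but is reported once the whole range has been processed.
 * Any other error aborts immediately.
 */
int remove_range(unsigned long start, unsigned long end, unsigned long step,
                 remove_fn remove_one, void *ctx)
{
    unsigned long addr;
    int ret = 0;

    for (addr = start; addr < end; addr += step) {
        int rc = remove_one(addr, ctx);

        if (rc == -ENOENT) {
            ret = rc;           /* remember it, keep unmapping */
            continue;
        }
        if (rc < 0)
            return rc;          /* anything else is fatal for the walk */
    }
    return ret;
}

/*
 * Hypothetical callback for demonstration: ctx is an array of flags,
 * indexed by address, that are nonzero while the entry is present.
 */
int demo_remove(unsigned long addr, void *ctx)
{
    unsigned char *present = ctx;

    if (!present[addr])
        return -ENOENT;
    present[addr] = 0;
    return 0;
}
```

With this shape, a caller cleaning up a partially created mapping can treat ENOENT as a warning while still being sure that every entry that did exist has been removed.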
  17. 14 Dec, 2015 1 commit
  18. 26 Mar, 2015 1 commit
    • Y
      powerpc/mm: Free string after creating kmem cache · e77553cb
      Yanjiang Jin committed
      kmem_cache_create()->kmem_cache_create_memcg()->kstrdup() allocates new
      space and copies the name's content, so it is safe to free the name
      memory after calling kmem_cache_create(). Otherwise kmemleak will report
      the warning below:
      
      unreferenced object 0xc0000000f9002160 (size 16):
        comm "swapper/0", pid 0, jiffies 4294892296 (age 1386.640s)
        hex dump (first 16 bytes):
          70 67 74 61 62 6c 65 2d 32 5e 39 00 de ad be ef  pgtable-2^9.....
        backtrace:
          [<c0000000004e03ec>] .kvasprintf+0x5c/0xa0
          [<c0000000004e045c>] .kasprintf+0x2c/0x50
          [<c00000000002e36c>] .pgtable_cache_add+0xac/0x100
          [<c00000000002e3e4>] .pgtable_cache_init+0x24/0x80
          [<c000000000c6c67c>] .start_kernel+0x228/0x4c8
          [<c000000000000594>] .start_here_common+0x24/0x90
      Signed-off-by: Yanjiang Jin <yanjiang.jin@windriver.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e77553cb
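The ownership rule behind this fix can be shown with a userspace analogue. In the sketch below, `demo_cache_create()` is a hypothetical stand-in for a cache constructor that duplicates the name it is given, so the caller can free its temporary string right after the call. None of these functions are kernel APIs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct demo_cache {
    char *name;             /* private copy owned by the cache */
    size_t object_size;
};

/* Stand-in constructor: duplicates the name, as the commit describes. */
struct demo_cache *demo_cache_create(const char *name, size_t object_size)
{
    struct demo_cache *c = malloc(sizeof(*c));
    size_t n = strlen(name) + 1;

    if (c == NULL)
        return NULL;
    c->name = malloc(n);
    if (c->name == NULL) {
        free(c);
        return NULL;
    }
    memcpy(c->name, name, n);
    c->object_size = object_size;
    return c;
}

void demo_cache_destroy(struct demo_cache *c)
{
    if (c != NULL) {
        free(c->name);
        free(c);
    }
}

/* Build a temporary name, create the cache, then release the temporary. */
struct demo_cache *demo_pgtable_cache_add(unsigned int shift)
{
    char *name = malloc(32);
    struct demo_cache *c;

    if (name == NULL)
        return NULL;
    snprintf(name, 32, "pgtable-2^%u", shift);
    c = demo_cache_create(name, sizeof(void *) << shift);
    free(name);             /* safe: the cache kept its own copy */
    return c;
}
```

Without the final `free(name)`, the temporary string is orphaned, which is the kind of unreferenced object the kmemleak report above points at.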
  19. 10 Nov, 2014 1 commit
  20. 25 Sep, 2014 1 commit
  21. 05 Aug, 2014 4 commits
  22. 11 Oct, 2013 1 commit
    • A
      powerpc: Prepare to support kernel handling of IOMMU map/unmap · 8e0861fa
      Alexey Kardashevskiy committed
      The current VFIO-on-POWER implementation supports only user mode
      driven mapping, i.e. QEMU is sending requests to map/unmap pages.
      However this approach is really slow, so we want to move that to KVM.
      Since H_PUT_TCE can be extremely performance sensitive (especially with
      network adapters where each packet needs to be mapped/unmapped) we chose
      to implement that as a "fast" hypercall directly in "real
      mode" (processor still in the guest context but MMU off).
      
      To be able to do that, we need to provide some facilities to
      access the struct page count within that real mode environment as things
      like the sparsemem vmemmap mappings aren't accessible.
      
      This adds an API function realmode_pfn_to_page() to get the struct page
      when the MMU is off.
      
      This adds to MM a new function put_page_unless_one() which drops a page
      reference if the counter is bigger than 1. It is going to be used when the
      MMU is off (for example, real mode on PPC64) and we want to make sure that
      page release will not happen in real mode as it may crash the kernel in
      a horrible way.
      
      CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
      
      Cc: linux-mm@kvack.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8e0861fa
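The "drop a reference only while the counter is bigger than 1" behaviour can be sketched with C11 atomics. This is a userspace illustration of the idea, not the kernel implementation; `put_unless_one` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Decrement *count only if it is currently greater than 1.
 * Returns true if a reference was dropped, false if the caller holds
 * what may be the last reference and must release it elsewhere.
 */
bool put_unless_one(atomic_int *count)
{
    int old = atomic_load(count);

    while (old > 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return true;
    }
    return false;
}
```

Because the function never takes the counter to zero, the actual page release is left to a context where the MMU is on.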
  23. 03 Oct, 2013 1 commit
    • N
      powerpc: Fix memory hotplug with sparse vmemmap · f7e3334a
      Nathan Fontenot committed
      Previous commit 46723bfa... introduced a new config option
      HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
      when sparse vmemmap is not defined.
      
      This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
      register_page_bootmem_info_node. Without this we get a BUG_ON for memory
      hot remove in put_page_bootmem().
      
      This also adds a stub for register_page_bootmem_memmap to allow ppc to build
      with sparse vmemmap defined. Leaving this as a stub is fine since the same
      vmemmap addresses are also handled in vmemmap_populate and as such are
      properly mapped.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.9+]
      f7e3334a
  24. 21 Jun, 2013 1 commit
  25. 14 May, 2013 1 commit
  26. 30 Apr, 2013 2 commits
    • A
      powerpc: New hugepage directory format · cf9427b8
      Aneesh Kumar K.V committed
      Change the hugepage directory format so that we can have leaf ptes directly
      at page directory avoiding the allocation of hugepage directory.
      
      With the new table format we have 3 cases for pgds and pmds:
      (1) invalid (all zeroes)
      (2) pointer to next table, as normal; bottom 6 bits == 0
      (3) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
      
      Instead of storing shift value in hugepd pointer we use mmu_psize_def index
      so that we can fit all the supported hugepage size in 4 bits
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9427b8
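The three cases listed in this commit message can be told apart by looking at the low bits of an entry. The sketch below is one reading of that description: it assumes the 4-bit size index sits in bits 2 to 5 and is nonzero for a hugepd pointer, and it lumps everything else into a catch-all category. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum entry_kind {
    ENTRY_INVALID,  /* all zeroes */
    ENTRY_TABLE,    /* pointer to next table, bottom 6 bits == 0 */
    ENTRY_HUGEPD,   /* bottom two bits == 00, next 4 bits nonzero */
    ENTRY_OTHER     /* anything else; not covered by the message */
};

enum entry_kind classify_entry(uint64_t e)
{
    if (e == 0)
        return ENTRY_INVALID;
    if ((e & 0x3f) == 0)
        return ENTRY_TABLE;
    if ((e & 0x3) == 0)
        return ENTRY_HUGEPD;    /* bits 2..5 must be nonzero here */
    return ENTRY_OTHER;
}

/* Extract the assumed 4-bit mmu_psize_def index from a hugepd pointer. */
unsigned int hugepd_psize_index(uint64_t e)
{
    return (unsigned int)((e >> 2) & 0xf);
}
```

Storing an index rather than a raw shift is what lets every supported hugepage size fit into the four available bits.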
    • J
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner committed
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Tested-by: David S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0aad818b
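The interface change in this commit amounts to doing the page-count-to-bytes conversion once, in generic code. A minimal sketch of that conversion follows; the struct and function names are hypothetical, and the size of one `struct page` is passed in as a parameter because it varies by configuration.

```c
#include <stddef.h>
#include <stdint.h>

struct byte_range {
    uintptr_t start;    /* first byte of the vmemmap range */
    uintptr_t end;      /* one past the last byte */
};

/*
 * Convert "nr_pages page descriptors starting at index first_page"
 * into the byte range they occupy in the memory map.
 */
struct byte_range section_to_bytes(uintptr_t memmap_base,
                                   unsigned long first_page,
                                   unsigned long nr_pages,
                                   size_t page_struct_size)
{
    struct byte_range r;

    r.start = memmap_base + (uintptr_t)first_page * page_struct_size;
    r.end = r.start + (uintptr_t)nr_pages * page_struct_size;
    return r;
}
```

Architecture code that receives `start` and `end` in bytes can then align them to PAGE_SIZE or PMD_SIZE directly, without first translating from descriptor counts.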