1. 30 1月, 2017 1 次提交
    • R
      powerpc/mm: Simplify loop control in parse_numa_properties() · 7656cd8e
      Reza Arbab 提交于
      The flow of the main loop in parse_numa_properties() is overly
      complicated. Simplify it to be less confusing and easier to read.
      No functional change.
      
      The end of the main loop in parse_numa_properties() looks like this:
      
      	for_each_node_by_type(...) {
      		...
      		if (!condition) {
      			if (--ranges)
      				goto new_range;
      			else
      				continue;
      		}
      
      		statement();
      
      		if (--ranges)
      			goto new_range;
      		/* else
      		 *	continue; <- implicit, this is the end of the loop
      		 */
      	}
      
      The only effect of !condition is to skip execution of statement(). This
      can be rewritten in a simpler way:
      
      	for_each_node_by_type(...) {
      		...
      		if (condition)
      			statement();
      
      		if (--ranges)
      			goto new_range;
      	}
      Signed-off-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      7656cd8e
  2. 25 1月, 2017 1 次提交
  3. 23 1月, 2017 1 次提交
  4. 18 1月, 2017 3 次提交
  5. 17 1月, 2017 1 次提交
  6. 25 12月, 2016 2 次提交
  7. 13 12月, 2016 1 次提交
    • R
      powerpc/mm: allow memory hotplug into a memoryless node · 4a3bac4e
      Reza Arbab 提交于
      Patch series "enable movable nodes on non-x86 configs", v7.
      
      This patchset allows more configs to make use of movable nodes.  When
      CONFIG_MOVABLE_NODE is selected, there are two ways to introduce such
      nodes into the system:
      
      1. Discover movable nodes at boot. Currently this is only possible on
         x86, but we will enable configs supporting fdt to do the same.
      
      2. Hotplug and online all of a node's memory using online_movable. This
         is already possible on any config supporting memory hotplug, not
         just x86, but the Kconfig doesn't say so. We will fix that.
      
      We'll also remove some cruft on power which would prevent (2).
      
      This patch (of 5):
      
      Remove the check which prevents us from hotplugging into an empty node.
      
      The original commit b226e462 ("[PATCH] powerpc: don't add memory to
      empty node/zone"), states that this was intended to be a temporary measure.
      It is a workaround for an oops which no longer occurs.
      
      Link: http://lkml.kernel.org/r/1479160961-25840-2-git-send-email-arbab@linux.vnet.ibm.comSigned-off-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NBalbir Singh <bsingharora@gmail.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alistair Popple <apopple@au1.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a3bac4e
  8. 10 12月, 2016 3 次提交
    • C
      powerpc/8xx: Implement support of hugepages · 4b914286
      Christophe Leroy 提交于
      8xx uses a two level page table with two different linux page size
      support (4k and 16k). 8xx also support two different hugepage sizes
      512k and 8M. In order to support them on linux we define two different
      page table layout.
      
      The size of pages is in the PGD entry, using PS field (bits 28-29):
      00 : Small pages (4k or 16k)
      01 : 512k pages
      10 : reserved
      11 : 8M pages
      
      For 512K hugepage size a pgd entry have the below format
      [<hugepte address >0101] . The hugepte table allocated will contain 8
      entries pointing to 512K huge pte in 4k pages mode and 64 entries in
      16k pages mode.
      
      For 8M in 16k mode, a pgd entry have the below format
      [<hugepte address >1101] . The hugepte table allocated will contain 8
      entries pointing to 8M huge pte.
      
      For 8M in 4k mode, multiple pgd entries point to the same hugepte
      address and pgd entry will have the below format
      [<hugepte address>1101]. The hugepte table allocated will only have one
      entry.
      
      For the time being, we do not support CPU15 ERRATA when HUGETLB is
      selected
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3, for the generic bits)
      Signed-off-by: NScott Wood <oss@buserror.net>
      4b914286
    • C
      powerpc: get hugetlbpage handling more generic · 03bb2d65
      Christophe Leroy 提交于
      Today there are two implementations of hugetlbpages which are managed
      by exclusive #ifdefs:
      * FSL_BOOKE: several directory entries points to the same single hugepage
      * BOOK3S: one upper level directory entry points to a table of hugepages
      
      In preparation of implementation of hugepage support on the 8xx, we
      need a mix of the two above solutions, because the 8xx needs both cases
      depending on the size of pages:
      * In 4k page size mode, each PGD entry covers a 4M bytes area. It means
      that 2 PGD entries will be necessary to cover an 8M hugepage while a
      single PGD entry will cover 8x 512k hugepages.
      * In 16 page size mode, each PGD entry covers a 64M bytes area. It means
      that 8x 8M hugepages will be covered by one PGD entry and 64x 512k
      hugepages will be covers by one PGD entry.
      
      This patch:
      * removes #ifdefs in favor of if/else based on the range sizes
      * merges the two huge_pte_alloc() functions as they are pretty similar
      * merges the two hugetlbpage_init() functions as they are pretty similar
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> (v3)
      Signed-off-by: NScott Wood <oss@buserror.net>
      03bb2d65
    • C
      powerpc: port 64 bits pgtable_cache to 32 bits · 9b081e10
      Christophe Leroy 提交于
      Today powerpc64 uses a set of pgtable_caches while powerpc32 uses
      standard pages when using 4k pages and a single pgtable_cache
      if using other size pages.
      
      In preparation of implementing huge pages on the 8xx, this patch
      replaces the specific powerpc32 handling by the 64 bits approach.
      
      This is done by:
      * moving 64 bits pgtable_cache_add() and pgtable_cache_init()
      in a new file called init-common.c
      * modifying pgtable_cache_init() to also handle the case
      without PMD
      * removing the 32 bits version of pgtable_cache_add() and
      pgtable_cache_init()
      * copying related header contents from 64 bits into both the
      book3s/32 and nohash/32 header files
      
      On the 8xx, the following cache sizes will be used:
      * 4k pages mode:
      - PGT_CACHE(10) for PGD
      - PGT_CACHE(3) for 512k hugepage tables
      * 16k pages mode:
      - PGT_CACHE(6) for PGD
      - PGT_CACHE(7) for 512k hugepage tables
      - PGT_CACHE(3) for 8M hugepage tables
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NScott Wood <oss@buserror.net>
      9b081e10
  9. 02 12月, 2016 3 次提交
  10. 01 12月, 2016 1 次提交
    • M
      powerpc/mm: Fix page table dump build on non-Book3S · dd5ac03e
      Michael Ellerman 提交于
      In the recent commit 1515ab93 ("powerpc/mm: Dump hash table") we
      added code to dump the hage page table. Currently this can be selected
      to build on any platform. However it breaks the build if we're building
      for a non-Book3S platform, because none of the hash page table related
      defines and so on exist. So restrict it to building only on Book3S.
      
      Similarly in commit 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      we added code to dump the Linux page tables, which uses some constants
      which are only defined on Book3S - so guard those with an #ifdef.
      
      Fixes: 1515ab93 ("powerpc/mm: Dump hash table")
      Fixes: 8eb07b18 ("powerpc/mm: Dump linux pagetables")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dd5ac03e
  11. 30 11月, 2016 1 次提交
  12. 29 11月, 2016 1 次提交
    • B
      powerpc/mm: Fix lazy icache flush on pre-POWER5 · dd7b2f03
      Benjamin Herrenschmidt 提交于
      On 64-bit CPUs with no-execute support and non-snooping icache, such as
      970 or POWER4, we have a software mechanism to ensure coherency of the
      cache (using exec faults when needed).
      
      This was broken due to a logic error when the code was rewritten
      from assembly to C, previously the assembly code did:
      
        BEGIN_FTR_SECTION
               mr      r4,r30
               mr      r5,r7
               bl      hash_page_do_lazy_icache
        END_FTR_SECTION(CPU_FTR_NOEXECUTE|CPU_FTR_COHERENT_ICACHE, CPU_FTR_NOEXECUTE)
      
      Which tests that:
         (cpu_features & (NOEXECUTE | COHERENT_ICACHE)) == NOEXECUTE
      
      Which says that the current cpu does have NOEXECUTE, but does not have
      COHERENT_ICACHE.
      
      Fixes: 91f1da99 ("powerpc/mm: Convert 4k hash insert to C")
      Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C")
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k insert from asm to C")
      Cc: stable@vger.kernel.org # v4.5+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Change log verbosification]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dd7b2f03
  13. 28 11月, 2016 2 次提交
  14. 26 11月, 2016 1 次提交
    • B
      powerpc/mm/radix: Prevent kernel execution of user space · 3b10d009
      Balbir Singh 提交于
      ISA 3 defines new encoded access authority that allows instruction
      access prevention in privileged mode and allows normal access
      to problem state. This patch just enables IAMR (Instruction Authority
      Mask Register), enabling AMR would require more work.
      
      I've tested this with a buggy driver and a simple payload. The payload
      is specific to the build I've tested.
      
      mpe: Also tested with LKDTM:
      
        # echo EXEC_USERSPACE > /sys/kernel/debug/provoke-crash/DIRECT
        lkdtm: Performing direct entry EXEC_USERSPACE
        lkdtm: attempting ok execution at c0000000005bf560
        lkdtm: attempting bad execution at 00003fff8d940000
        Unable to handle kernel paging request for instruction fetch
        Faulting instruction address: 0x3fff8d940000
        Oops: Kernel access of bad area, sig: 11 [#1]
        NIP: 00003fff8d940000 LR: c0000000005bfa58 CTR: 00003fff8d940000
        REGS: c0000000f1fcf900 TRAP: 0400   Not tainted  (4.9.0-rc5-compiler_gcc-6.2.0-00109-g956dbc06232a)
        MSR: 9000000010009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 48002222  XER: 00000000
        ...
        Call Trace:
          lkdtm_EXEC_USERSPACE+0x104/0x120 (unreliable)
          lkdtm_do_action+0x3c/0x80
          direct_entry+0x100/0x1b0
          full_proxy_write+0x94/0x100
          __vfs_write+0x3c/0x1b0
          vfs_write+0xcc/0x230
          SyS_write+0x60/0x110
          system_call+0x38/0xfc
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3b10d009
  15. 25 11月, 2016 3 次提交
  16. 23 11月, 2016 1 次提交
    • P
      powerpc/64: Provide functions for accessing POWER9 partition table · 9d661958
      Paul Mackerras 提交于
      POWER9 requires the host to set up a partition table, which is a
      table in memory indexed by logical partition ID (LPID) which
      contains the pointers to page tables and process tables for the
      host and each guest.
      
      This factors out the initialization of the partition table into
      a single function.  This code was previously duplicated between
      hash_utils_64.c and pgtable-radix.c.
      
      This provides a function for setting a partition table entry,
      which is used in early MMU initialization, and will be used by
      KVM whenever a guest is created.  This function includes a tlbie
      instruction which will flush all TLB entries for the LPID and
      all caches of the partition table entry for the LPID, across the
      system.
      
      This also moves a call to memblock_set_current_limit(), which was
      in radix_init_partition_table(), but has nothing to do with the
      partition table.  By analogy with the similar code for hash, the
      call gets moved to near the end of radix__early_init_mmu().  It
      now gets called when running as a guest, whereas previously it
      would only be called if the kernel is running as the host.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9d661958
  17. 22 11月, 2016 1 次提交
  18. 18 11月, 2016 2 次提交
  19. 17 11月, 2016 4 次提交
  20. 16 11月, 2016 1 次提交
    • P
      powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format · 6b243fcf
      Paul Mackerras 提交于
      This changes the way that we support the new ISA v3.00 HPTE format.
      Instead of adapting everything that uses HPTE values to handle either
      the old format or the new format, depending on which CPU we are on,
      we now convert explicitly between old and new formats if necessary
      in the low-level routines that actually access HPTEs in memory.
      This limits the amount of code that needs to know about the new
      format and makes the conversions explicit.  This is OK because the
      old format contains all the information that is in the new format.
      
      This also fixes operation under a hypervisor, because the H_ENTER
      hypercall (and other hypercalls that deal with HPTEs) will continue
      to require the HPTE value to be supplied in the old format.  At
      present the kernel will not boot in HPT mode on POWER9 under a
      hypervisor.
      
      This fixes and partially reverts commit 50de596d
      ("powerpc/mm/hash: Add support for Power9 Hash", 2016-04-29).
      
      Fixes: 50de596d ("powerpc/mm/hash: Add support for Power9 Hash")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      6b243fcf
  21. 14 11月, 2016 2 次提交
    • B
      powerpc/hash64: Be more careful when generating tlbiel · f923efbc
      Balbir Singh 提交于
      In ISA v2.05, the tlbiel instruction takes two arguments, RB and L:
      
      tlbiel RB,L
      
      +---------+---------+----+---------+---------+---------+----+
      |    31   |    /    | L  |    /    |    RB   |   274   | /  |
      | 31 - 26 | 25 - 22 | 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+----+---------+---------+---------+----+
      
      In ISA v2.06 tlbiel takes only one argument, RB:
      
      tlbiel RB
      
      +---------+---------+---------+---------+---------+----+
      |    31   |    /    |    /    |    RB   |   274   | /  |
      | 31 - 26 | 25 - 21 | 20 - 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+---------+---------+---------+----+
      
      And in ISA v3.00 tlbiel takes five arguments:
      
      tlbiel RB,RS,RIC,PRS,R
      
      +---------+---------+----+---------+----+----+---------+---------+----+
      |    31   |    RS   | /  |   RIC   |PRS | R  |    RB   |   274   | /  |
      | 31 - 26 | 25 - 21 | 20 | 19 - 18 | 17 | 16 | 15 - 11 |  10 - 1 | 0  |
      +---------+---------+----+---------+----+----+---------+---------+----+
      
      However the assembler also accepts "tlbiel RB", and generates
      "tlbiel RB,r0,0,0,0".
      
      As you can see above the L field from the v2.05 encoding overlaps with the
      reserved field of the v2.06 encoding, and the low bit of the RS field of the
      v3.00 encoding.
      
      Currently in __tlbiel() we generate two tlbiel instructions manually using hex
      constants. In the first case, for MMU_PAGE_4K, we generate "tlbiel RB,0", which
      is safe in all cases, because the L bit is zero.
      
      However in the default case we generate "tlbiel RB,1", therefore setting bit 21
      to 1.
      
      This is not an actual bug on v2.06 processors, because the CPU ignores the value
      of the reserved field. However software is supposed to encode the reserved
      fields as zero to enable forward compatibility.
      
      On v3.00 processors setting bit 21 to 1 and no other bits of RS, means we are
      using r1 for the value of RS.
      
      Although it's not obvious, the code sets the IS field (bits 10-11) to 0 (by
      omission), and L=1, in the va value, which is passed as RB. We also pass R=0 in
      the instruction.
      
      The combination of IS=0, L=1 and R=0 means the value of RS is not used, so even
      on ISA v3.00 there is no actual bug.
      
      We should still fix it, as setting a reserved bit on v2.06 is naughty, and we
      are only avoiding a bug on v3.00 by accident rather than design. Use
      ASM_FTR_IFSET() to generate the single argument form on ISA v2.06 and later, and
      the two argument form on pre v2.06.
      
      Although there may be very old toolchains which don't understand tlbiel, we have
      other code in the tree which has been using tlbiel for over five years, and no
      one has reported any build failures, so just let the assembler generate the
      instructions.
      Signed-off-by: NBalbir Singh <bsingharora@gmail.com>
      [mpe: Rewrite change log, use IFSET instead of IFCLR]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f923efbc
    • N
      powerpc: Add support for relative exception tables · 61a92f70
      Nicholas Piggin 提交于
      This halves the exception table size on 64-bit builds, and it allows
      build-time sorting of exception tables to work on relocated kernels.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      [mpe: Minor asm fixups and bits to keep the selftests working]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      61a92f70
  22. 27 10月, 2016 1 次提交
    • A
      powerpc/mm/radix: Use tlbiel only if we ever ran on the current cpu · bd77c449
      Aneesh Kumar K.V 提交于
      Before this patch, we used tlbiel, if we ever ran only on this core.
      That was mostly derived from the nohash usage of the same. But is
      incorrect, the ISA 3.0 clarifies tlbiel such that:
      
      "All TLB entries that have all of the following properties are made
      invalid on the thread executing the tlbiel instruction"
      
      ie. tlbiel only invalidates TLB entries on the current thread. So if the
      mm has been used on any other thread (aka. cpu) then we must broadcast
      the invalidate.
      
      This bug could lead to invalid TLB entries if a program runs on multiple
      threads of a core.
      
      Hence use tlbiel, if we only ever ran on only the current cpu.
      
      Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bd77c449
  23. 19 10月, 2016 3 次提交