1. 11 Apr 2016, 1 commit
    • powerpc/mm: Remove long disabled SLB code · 1f4c66e8
      Authored by Michael Ellerman
      We have a bunch of SLB related code in the tree which is there to handle
      dynamic VSIDs - but currently it's all disabled at compile time. The
      comments say "Keep that around for when we re-implement dynamic VSIDs".
      
      But that was over 10 years ago (commit 3c726f8d ("[PATCH] ppc64:
      support 64k pages")). The chance that it would still work unchanged is
      minimal, and in the meantime it's confusing to folks browsing/grepping
      the code. If we ever want to re-instate it, it's in the git history.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
  2. 29 Mar 2016, 1 commit
    • powerpc/mm: Fixup preempt underflow with huge pages · 08a5bb29
      Authored by Sebastian Siewior
      hugepd_free() used __get_cpu_var() once. Nothing ensured that the code
      accessing the variable did not migrate from one CPU to another and soon
      this was noticed by Tiejun Chen in 94b09d75 ("powerpc/hugetlb:
      Replace __get_cpu_var with get_cpu_var"). So we had it fixed.
      
      Christoph Lameter was doing his __get_cpu_var() replaces and forgot
      PowerPC. Then he noticed this and sent his fixed up batch again which
      got applied as 69111bac ("powerpc: Replace __get_cpu_var uses").
      
      The careful reader will notice one little detail: get_cpu_var() got
      replaced with this_cpu_ptr(). So now we have a put_cpu_var() which does
      a preempt_enable() but nothing that does preempt_disable(), so we
      underflow the preempt counter.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  3. 18 Mar 2016, 2 commits
  4. 12 Mar 2016, 9 commits
    • powerpc32: PAGE_EXEC required for inittext · 060ef9d8
      Authored by Christophe Leroy
      PAGE_EXEC is required for inittext, otherwise CONFIG_DEBUG_PAGEALLOC
      ends up with an Oops:
      
      [    0.000000] Inode-cache hash table entries: 8192 (order: 1, 32768 bytes)
      [    0.000000] Sorting __ex_table...
      [    0.000000] bootmem::free_all_bootmem_core nid=0 start=0 end=2000
      [    0.000000] Unable to handle kernel paging request for instruction fetch
      [    0.000000] Faulting instruction address: 0xc045b970
      [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
      [    0.000000] PREEMPT DEBUG_PAGEALLOC CMPC885
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.25-local-dirty #1673
      [    0.000000] task: c04d83d0 ti: c04f8000 task.ti: c04f8000
      [    0.000000] NIP: c045b970 LR: c045b970 CTR: 0000000a
      [    0.000000] REGS: c04f9ea0 TRAP: 0400   Not tainted  (3.18.25-local-dirty)
      [    0.000000] MSR: 08001032 <ME,IR,DR,RI>  CR: 39955d35  XER: a000ff40
      [    0.000000]
      GPR00: c045b970 c04f9f50 c04d83d0 00000000 ffffffff c04dcdf4 00000048 c04f6b10
      GPR08: c04f6ab0 00000001 c0563488 c04f6ab0 c04f8000 00000000 00000000 b6db6db7
      GPR16: 00003474 00000180 00002000 c7fec000 00000000 000003ff 00000176 c0415014
      GPR24: c0471018 c0414ee8 c05304e8 c03aeaac c0510000 c0471018 c0471010 00000000
      [    0.000000] NIP [c045b970] free_all_bootmem+0x164/0x228
      [    0.000000] LR [c045b970] free_all_bootmem+0x164/0x228
      [    0.000000] Call Trace:
      [    0.000000] [c04f9f50] [c045b970] free_all_bootmem+0x164/0x228 (unreliable)
      [    0.000000] [c04f9fa0] [c0454044] mem_init+0x3c/0xd0
      [    0.000000] [c04f9fb0] [c045080c] start_kernel+0x1f4/0x390
      [    0.000000] [c04f9ff0] [c0002214] start_here+0x38/0x98
      [    0.000000] Instruction dump:
      [    0.000000] 2f150000 7f968840 72a90001 3ad60001 56b5f87e 419a0028 419e0024 41a20018
      [    0.000000] 807cc20c 38800000 7c638214 4bffd2f5 <3a940001> 3a100024 4bffffc8 7e368b78
      [    0.000000] ---[ end trace dc8fa200cb88537f ]---
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc: Simplify test in __dma_sync() · 8478d7f0
      Authored by Christophe Leroy
      This simplification helps the compiler. We now have only one test
      instead of two, so it reduces the number of branches.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc/8xx: rewrite flush_instruction_cache() in C · 766d45cb
      Authored by Christophe Leroy
      On PPC8xx, flushing the instruction cache is performed by writing
      to the SPRN_IC_CST register. This register suffers from the CPU6
      ERRATA. This patch rewrites the function in C so that the CPU6
      ERRATA is handled transparently.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc/8xx: rewrite set_context() in C · a7761fe4
      Authored by Christophe Leroy
      There is no real need to have set_context() in assembly.
      Now that we have mtspr() handling the CPU6 ERRATA directly, we
      can rewrite set_context() in C for easier maintenance.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc32: remove ioremap_base · e974cd4b
      Authored by Christophe Leroy
      ioremap_base is not initialised and is not used anywhere, so remove it.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc32: Remove useless/wrong MMU:setio progress message · c562eb06
      Authored by Christophe Leroy
      Commit 77116849 ("[POWERPC] Remove unused machine call outs")
      removed the call to setup_io_mappings(), so remove the associated
      progress message.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() together · 3084cdb7
      Authored by Christophe Leroy
      x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of
      purpose and are never defined at the same time, so rename them
      x_block_mapped() and define them in the relevant places.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.c · 516d9189
      Authored by Christophe Leroy
      Now that we have an 8xx-specific .c file for this, put it there,
      as the other powerpc variants do.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
    • powerpc/8xx: Map linear kernel RAM with 8M pages · a372acfa
      Authored by Christophe Leroy
      On a live running system (a VoIP gateway for Air Traffic Control), over
      a 10 minute period (with 277s idle), we get 87 million DTLB misses
      and approximately 35 seconds are spent in the DTLB handler.
      This represents 5.8% of the overall time and 10.8% of the
      non-idle time.
      Among those 87 million DTLB misses, 15% are on user addresses and
      85% are on kernel addresses. And within the kernel addresses, 93%
      are on addresses from the linear address space and only 7% are on
      addresses from the virtual address space.
      
      MPC8xx has no BATs but it has an 8M page size. This patch implements
      mapping of kernel RAM using 8M pages, on the same model as what is
      done on the 40x.
      
      In 4k pages mode, each PGD entry maps a 4M area: we map every two
      entries to the same 8M physical page. In each second entry, we add
      4M to the page physical address to ease the life of the FixupDAR
      routine. This is simply ignored by the HW.
      
      In 16k pages mode, each PGD entry maps a 64M area: each PGD entry
      will point to the first page of the area. The DTLB handler adds
      the 3 bits from the EPN to map the correct page.
      
      With this patch applied, we now get only 13 million TLB misses
      during the 10 minute period. The idle time has increased to 313s
      and the overall time spent in the DTLB miss handler is 6.3s, which
      represents 1% of the overall time and 2.2% of the non-idle time.
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: Scott Wood <oss@buserror.net>
  5. 05 Mar 2016, 1 commit
  6. 04 Mar 2016, 1 commit
  7. 03 Mar 2016, 4 commits
  8. 02 Mar 2016, 1 commit
  9. 01 Mar 2016, 4 commits
    • powerpc/mm: Clean up memory hotplug failure paths · 1dace6c6
      Authored by David Gibson
      This makes a number of cleanups to handling of mapping failures during
      memory hotplug on Power:
      
      For errors creating the linear mapping for the hot-added region:
        * This is now reported with EFAULT, which is more appropriate than
          the previous EINVAL (the failure is unlikely to be related to the
          function's parameters).
        * An error in this path now prints a warning message, rather than just
          silently failing to add the extra memory.
        * Previously a failure here could result in the region being partially
          mapped.  We now clean up any partial mapping before failing.
      
      For errors creating the vmemmap for the hot-added region:
        * This is now reported with EFAULT instead of causing a BUG() - this
          could happen for external reasons (e.g. a full hash table), so it's
          better to handle it non-fatally.
        * An error message is also printed, so the failure won't be silent.
        * As above, a failure could cause a partially mapped region; we now
          clean this up. [mpe: move htab_remove_mapping() out of #ifdef
          CONFIG_MEMORY_HOTPLUG to enable this]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paul Mackerras <paulus@samba.org>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Handle removing maybe-present bolted HPTEs · 27828f98
      Authored by David Gibson
      At the moment the hpte_removebolted callback in ppc_md returns void and
      will BUG_ON() if the hpte it's asked to remove doesn't exist in the first
      place.  This is awkward for the case of cleaning up a mapping which was
      partially made before failing.
      
      So, we add a return value to hpte_removebolted, and have it return ENOENT
      in the case that the HPTE to remove didn't exist in the first place.
      
      In the (sole) caller, we propagate errors in hpte_removebolted to its
      caller to handle.  However, we handle ENOENT specially, continuing to
      complete the unmapping over the specified range before returning the error
      to the caller.
      
      This means that htab_remove_mapping() will work sanely on a partially
      present mapping, removing any HPTEs which are present, while also returning
      ENOENT to its caller in case it's important there.
      
      There are two callers of htab_remove_mapping():
         - In remove_section_mapping() we already WARN_ON() any error return,
           which is reasonable - in this case the mapping should be fully
           present
         - In vmemmap_remove_mapping() we BUG_ON() any error.  We change that to
           just a WARN_ON() in the case of ENOENT, since failing to remove a
           mapping that wasn't there in the first place probably shouldn't be
           fatal.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Clean up error handling for htab_remove_mapping · abd0a0e7
      Authored by David Gibson
      Currently, the only error that htab_remove_mapping() can report is
      -EINVAL, if removal of bolted HPTEs isn't implemented for this
      platform. We make a few clean ups to the handling of this:
      
       * EINVAL isn't really the right code - there's nothing wrong with the
         function's arguments - use ENODEV instead.
       * We were also printing a warning message, but that's a decision better
         left up to the callers, so remove it.
       * One caller is vmemmap_remove_mapping(), which will just BUG_ON() on
         error, making the warning message redundant, so no change is needed
         there.
       * The other caller is remove_section_mapping().  This is called in the
         memory hot remove path at a point after vmemmap_remove_mapping() so
         if hpte_removebolted isn't implemented, we'd expect to have already
         BUG()ed anyway.  Put a WARN_ON() here, in lieu of a printk() since this
         really shouldn't be happening.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • 446957ba
  10. 29 Feb 2016, 3 commits
  11. 28 Feb 2016, 1 commit
    • mm: ASLR: use get_random_long() · 5ef11c35
      Authored by Daniel Cashman
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long().  Also address a shifting bug which, in
      the case of x86, removed the entropy mask for mmap_rnd_bits values > 31.
      Signed-off-by: Daniel Cashman <dcashman@android.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 27 Feb 2016, 2 commits
  13. 22 Feb 2016, 1 commit
    • powerpc/mm/hash: Clear the invalid slot information correctly · 9ab3ac23
      Authored by Aneesh Kumar K.V
      We can get a hash pte fault with a 4k base page size and find the pte
      already inserted with a 64K base page size. In that case we need to
      clear the existing slot information from the old pte. Fix this
      correctly.
      
      With THP, we also clear the slot information for all the 64K hash
      ptes mapping that 16MB page; they are all invalid now. This makes
      sure we don't find the slot valid when we fault with a 4k base page
      size. Finding the slot valid should not result in any wrong behaviour,
      because we check again in the hash page table for validity, but we
      can avoid that check completely.
      
      Fixes: a43c0eb8 ("powerpc/mm: Convert 4k hash insert to C")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 16 Feb 2016, 1 commit
  15. 15 Feb 2016, 1 commit
    • powerpc/mm: Fix Multi hit ERAT cause by recent THP update · c777e2a8
      Authored by Aneesh Kumar K.V
      With ppc64 we use the deposited pgtable_t to store the hash pte slot
      information. We should not withdraw the deposited pgtable_t without
      marking the pmd none. This ensures that low level hash fault handling
      will skip this huge pte and we will handle it at upper levels.
      
      Recent change to pmd splitting changed the above in order to handle the
      race between pmd split and exit_mmap. The race is explained below.
      
      Consider following race:
      
      		CPU0				CPU1
      shrink_page_list()
        add_to_swap()
          split_huge_page_to_list()
            __split_huge_pmd_locked()
              pmdp_huge_clear_flush_notify()
      	// pmd_none() == true
      					exit_mmap()
      					  unmap_vmas()
      					    zap_pmd_range()
      					      // no action on pmd since pmd_none() == true
      	pmd_populate()
      
      As a result the THP will not be freed. The leak is detected by check_mm():
      
      	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
      
      The above required us to not mark pmd none during a pmd split.
      
      The fix for ppc is to clear _PAGE_USER in the huge pte, so that the
      low level fault handling code skips this pte. At the higher level we
      do take the ptl lock, which should serialize us against the pmd
      split. Once the lock is acquired we check the pmd again using
      pmd_same(). That should always return false for us, and hence we
      retry the access. We do the pmd_same() check in all cases after
      taking the ptl with THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page
      and huge_pmd_set_accessed).
      
      Also make sure we wait for the irq disable section on other cpus to
      finish before flipping a huge pte entry to a regular pmd entry. Code
      paths like find_linux_pte_or_hugepte depend on irq disable to get
      a stable pte_t pointer. A parallel thp split needs to make sure we
      don't convert a pmd pte to a regular pmd entry without waiting for
      the irq disable section to finish.
      
      Fixes: eef1b3ba ("thp: implement split_huge_pmd()")
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 30 Jan 2016, 1 commit
    • arch: Set IORESOURCE_SYSTEM_RAM flag for System RAM · 35d98e93
      Authored by Toshi Kani
      Set IORESOURCE_SYSTEM_RAM in flags of resource ranges with
      "System RAM", "Kernel code", "Kernel data", and "Kernel bss".
      
      Note that:
      
       - IORESOURCE_SYSRAM (i.e. modifier bit) is set in flags when
         IORESOURCE_MEM is already set. IORESOURCE_SYSTEM_RAM is defined
         as (IORESOURCE_MEM|IORESOURCE_SYSRAM).
      
       - Some archs do not set 'flags' for child nodes, such as
         "Kernel code".  This patch does not change 'flags' in that
         case.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Link: http://lkml.kernel.org/r/1453841853-11383-7-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 25 Jan 2016, 1 commit
  18. 16 Jan 2016, 3 commits
  19. 09 Jan 2016, 1 commit
  20. 27 Dec 2015, 1 commit