1. 21 6月, 2013 4 次提交
    • A
      powerpc: Make linux pagetable walk safe with THP enabled · 0ac52dd7
      Aneesh Kumar K.V 提交于
      We need to have irqs disabled to handle all the possible parallel update for
      linux page table without holding locks.
      
      Events that we are intersted in while walking page tables are
      1) Page fault
      2) umap
      3) THP split
      4) THP collapse
      
      A) local_irq_disabled:
      ------------------------
      1) page fault:
      A none to valid transition via page fault is not an issue because we
      would either see a none or valid. If it is none, we would error out
      the page table walk. We may need to use on stack values when checking for
      type of page table elements, because if we do
      
      if (!is_hugepd()) {
          if (!pmd_none() {
             if (pmd_bad() {
      
      We could take that bad condition because the pmd got converted to a hugepd
      after the !is_hugepd check via a hugetlb fault.
      
      The right way would be to check for pmd_none higher up or use on stack value.
      
      2) A valid to none conversion via unmap:
      We can safely walk the upper level table, because we don't remove the the
      page table entries until rcu grace period. So even if we followed a
      wrong pointer we still have the pointer valid till the grace period.
      
      A PTE pointer returned need to be atomically checked for _PAGE_PRESENT and
       _PAGE_BUSY. A valid pointer returned could becoming none later. To prevent
      pte_clear we take _PAGE_BUSY.
      
      3) THP split:
      A valid transparent hugepage is converted to nomal page. Before we split we
      do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING
      So when walking page table we need to check for pmd_trans_splitting and
      handle that. The pte returned should also need to be checked for
      _PAGE_SPLITTING before setting _PAGE_BUSY similar to _PAGE_PRESENT. We save
      the value of PTE on stack and check for the flag in the local pte value.
      If we don't have the value set we can safely operate on the local pte value
      and we atomicaly set _PAGE_BUSY.
      
      4) THP collapse:
      A normal page gets converted to hugepage. In the collapse path, we
      mark the pmd none early (pmdp_clear_flush). With irq disabled, if we
      are aleady walking page table we would see the pmd_none and won't continue.
      If we see a valid PMD, we should still check for _PAGE_PRESENT before
      setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0ac52dd7
    • A
      powerpc/THP: Add code to handle HPTE faults for hugepages · 6d492ecc
      Aneesh Kumar K.V 提交于
      The deposted PTE page in the second half of the PMD table is used to
      track the state on hash PTEs. After updating the HPTE, we mark the
      coresponding slot in the deposted PTE page valid.
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6d492ecc
    • A
      powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte · 12bc9f6f
      Aneesh Kumar K.V 提交于
      Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly
      document why we don't need to handle transparent hugepages at callsites.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      12bc9f6f
    • A
      powerpc/mm: handle hugepage size correctly when invalidating hpte entries · db3d8534
      Aneesh Kumar K.V 提交于
      If a hash bucket gets full, we "evict" a more/less random entry from it.
      When we do that we don't invalidate the TLB (hpte_remove) because we assume
      the old translation is still technically "valid". This implies that when
      we are invalidating or updating pte, even if HPTE entry is not valid
      we should do a tlb invalidate. With hugepages, we need to pass the correct
      actual page size value for tlb invalidation.
      
      This change update the patch 0608d692
      "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle
      transparent hugepages correctly.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      db3d8534
  2. 14 5月, 2013 1 次提交
    • L
      powerpc: Exception hooks for context tracking subsystem · ba12eede
      Li Zhong 提交于
      This is the exception hooks for context tracking subsystem, including
      data access, program check, single step, instruction breakpoint, machine check,
      alignment, fp unavailable, altivec assist, unknown exception, whose handlers
      might use RCU.
      
      This patch corresponds to
      [PATCH] x86: Exception hooks for userspace RCU extended QS
        commit 6ba3c97a
      
      But after the exception handling moved to generic code, and some changes in
      following two commits:
      56dd9470
        context_tracking: Move exception handling to generic code
      6c1e0256
        context_tracking: Restore correct previous context state on exception exit
      
      it is able for exception hooks to use the generic code above instead of a
      redundant arch implementation.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ba12eede
  3. 06 5月, 2013 1 次提交
  4. 30 4月, 2013 3 次提交
  5. 18 4月, 2013 2 次提交
    • L
      powerpc: Try to insert the hptes repeatedly in kernel_map_linear_page() · 016af59f
      Li Zhong 提交于
      This patch fixes the following oops, which could be trigged by build the kernel
      with many concurrent threads, under CONFIG_DEBUG_PAGEALLOC.
      
      hpte_insert() might return -1, indicating that the bucket (primary here)
      is full. We are not necessarily reporting a BUG in this case. Instead, we could
      try repeatedly (try secondary, remove and try again) until we find a slot.
      
      [  543.075675] ------------[ cut here ]------------
      [  543.075701] kernel BUG at arch/powerpc/mm/hash_utils_64.c:1239!
      [  543.075714] Oops: Exception in kernel mode, sig: 5 [#1]
      [  543.075722] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
      [  543.075741] Modules linked in: binfmt_misc ehea
      [  543.075759] NIP: c000000000036eb0 LR: c000000000036ea4 CTR: c00000000005a594
      [  543.075771] REGS: c0000000a90832c0 TRAP: 0700   Not tainted  (3.8.0-next-20130222)
      [  543.075781] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 22224482  XER: 00000000
      [  543.075816] SOFTE: 0
      [  543.075823] CFAR: c00000000004c200
      [  543.075830] TASK = c0000000e506b750[23934] 'cc1' THREAD: c0000000a9080000 CPU: 1
      GPR00: 0000000000000001 c0000000a9083540 c000000000c600a8 ffffffffffffffff
      GPR04: 0000000000000050 fffffffffffffffa c0000000a90834e0 00000000004ff594
      GPR08: 0000000000000001 0000000000000000 000000009592d4d8 c000000000c86854
      GPR12: 0000000000000002 c000000006ead300 0000000000a51000 0000000000000001
      GPR16: f000000003354380 ffffffffffffffff ffffffffffffff80 0000000000000000
      GPR20: 0000000000000001 c000000000c600a8 0000000000000001 0000000000000001
      GPR24: 0000000003354380 c000000000000000 0000000000000000 c000000000b65950
      GPR28: 0000002000000000 00000000000cd50e 0000000000bf50d9 c000000000c7c230
      [  543.076005] NIP [c000000000036eb0] .kernel_map_pages+0x1e0/0x3f8
      [  543.076016] LR [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8
      [  543.076025] Call Trace:
      [  543.076033] [c0000000a9083540] [c000000000036ea4] .kernel_map_pages+0x1d4/0x3f8 (unreliable)
      [  543.076053] [c0000000a9083640] [c000000000167638] .get_page_from_freelist+0x6cc/0x8dc
      [  543.076067] [c0000000a9083800] [c000000000167a48] .__alloc_pages_nodemask+0x200/0x96c
      [  543.076082] [c0000000a90839c0] [c0000000001ade44] .alloc_pages_vma+0x160/0x1e4
      [  543.076098] [c0000000a9083a80] [c00000000018ce04] .handle_pte_fault+0x1b0/0x7e8
      [  543.076113] [c0000000a9083b50] [c00000000018d5a8] .handle_mm_fault+0x16c/0x1a0
      [  543.076129] [c0000000a9083c00] [c0000000007bf1dc] .do_page_fault+0x4d0/0x7a4
      [  543.076144] [c0000000a9083e30] [c0000000000090e8] handle_page_fault+0x10/0x30
      [  543.076155] Instruction dump:
      [  543.076163] 7c630038 78631d88 e80a0000 f8410028 7c0903a6 e91f01de e96a0010 e84a0008
      [  543.076192] 4e800421 e8410028 7c7107b4 7a200fe0 <0b000000> 7f63db78 48785781 60000000
      [  543.076224] ---[ end trace bd5807e8d6ae186b ]---
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      016af59f
    • L
      powerpc: Split the code trying to insert hpte repeatedly as an helper function · b170bd3d
      Li Zhong 提交于
      Move the logic trying to insert hpte in __hash_page_huge() to an helper
      function, so it could also be used by others.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      b170bd3d
  6. 17 3月, 2013 1 次提交
    • A
      powerpc: Update kernel VSID range · c60ac569
      Aneesh Kumar K.V 提交于
      This patch change the kernel VSID range so that we limit VSID_BITS to 37.
      This enables us to support 64TB with 65 bit VA (37+28). Without this patch
      we have boot hangs on platforms that only support 65 bit VA.
      
      With this patch we now have proto vsid generated as below:
      
      We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated
      from mmu context id and effective segment id of the address.
      
      For user processes max context id is limited to ((1ul << 19) - 5)
      for kernel space, we use the top 4 context ids to map address as below
      0x7fffc -  [ 0xc000000000000000 - 0xc0003fffffffffff ]
      0x7fffd -  [ 0xd000000000000000 - 0xd0003fffffffffff ]
      0x7fffe -  [ 0xe000000000000000 - 0xe0003fffffffffff ]
      0x7ffff -  [ 0xf000000000000000 - 0xf0003fffffffffff ]
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Tested-by: NGeoff Levand <geoff@infradead.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.8]
      c60ac569
  7. 13 3月, 2013 1 次提交
  8. 15 2月, 2013 1 次提交
  9. 17 9月, 2012 2 次提交
  10. 05 9月, 2012 1 次提交
  11. 29 3月, 2012 1 次提交
  12. 21 3月, 2012 1 次提交
  13. 23 2月, 2012 1 次提交
    • M
      fadump: Register for firmware assisted dump. · 3ccc00a7
      Mahesh Salgaonkar 提交于
      On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
      > On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
      >
      > If I have read the code correctly, we are going to get this printk on
      > non-pSeries machines or on older pSeries machines, even if the user
      > has not put the fadump=on option on the kernel command line.  The
      > printk will be annoying since there is no actual error condition.  It
      > seems to me that the condition for the printk should include
      > fw_dump.fadump_enabled.  In other words you should probably add
      >
      > 	if (!fw_dump.fadump_enabled)
      > 		return 0;
      >
      > at the beginning of the function.
      
      Hi Paul,
      
      Thanks for pointing it out. Please find the updated patch below.
      
      The existing patches above this (4/10 through 10/10) cleanly applies
      on this update.
      
      Thanks,
      -Mahesh.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      3ccc00a7
  14. 01 11月, 2011 1 次提交
  15. 20 9月, 2011 2 次提交
    • A
      powerpc: Fix oops when echoing bad values to /sys/devices/system/memory/probe · a1194097
      Anton Blanchard 提交于
      If we echo an address the hypervisor doesn't like to
      /sys/devices/system/memory/probe we oops the box:
      
      # echo 0x10000000000 > /sys/devices/system/memory/probe
      
      kernel BUG at arch/powerpc/mm/hash_utils_64.c:541!
      
      The backtrace is:
      
      create_section_mapping
      arch_add_memory
      add_memory
      memory_probe_store
      sysdev_class_store
      sysfs_write_file
      vfs_write
      SyS_write
      
      In create_section_mapping we BUG if htab_bolt_mapping returned
      an error. A better approach is to return an error which will
      propagate back to userspace.
      
      Rerunning the test with this patch applied:
      
      # echo 0x10000000000 > /sys/devices/system/memory/probe
      -bash: echo: write error: Invalid argument
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a1194097
    • B
      powerpc: Hugetlb for BookE · 41151e77
      Becky Bruce 提交于
      Enable hugepages on Freescale BookE processors.  This allows the kernel to
      use huge TLB entries to map pages, which can greatly reduce the number of
      TLB misses and the amount of TLB thrashing experienced by applications with
      large memory footprints.  Care should be taken when using this on FSL
      processors, as the number of large TLB entries supported by the core is low
      (16-64) on current processors.
      
      The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
      Page sizes larger than the max zone size are called "gigantic" pages and
      must be allocated on the command line (and cannot be deallocated).
      
      This is currently only fully implemented for Freescale 32-bit BookE
      processors, but there is some infrastructure in the code for
      64-bit BooKE.
      Signed-off-by: NBecky Bruce <beckyb@kernel.crashing.org>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      41151e77
  16. 27 4月, 2011 1 次提交
  17. 20 4月, 2011 1 次提交
  18. 31 3月, 2011 1 次提交
  19. 29 11月, 2010 1 次提交
  20. 18 11月, 2010 1 次提交
    • M
      powerpc: Fix call to subpage_protection() · 1c2c25c7
      Michael Neuling 提交于
      In:
        powerpc/mm: Fix pgtable cache cleanup with CONFIG_PPC_SUBPAGE_PROT
        commit d28513bc
        Author: David Gibson <david@gibson.dropbear.id.au>
      
      subpage_protection() was changed to to take an mm rather a pgdir but it
      didn't change calling site in hashpage_preload().  The change wasn't
      noticed at compile time since hashpage_preload() used a void* as the
      parameter to subpage_protection().
      
      This is obviously wrong and can trigger the following crash when
      CONFIG_SLAB, CONFIG_DEBUG_SLAB, CONFIG_PPC_64K_PAGES
      CONFIG_PPC_SUBPAGE_PROT are enabled.
      
      Freeing unused kernel memory: 704k freed
      Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6c49b7
      Faulting instruction address: 0xc0000000000410f4
      cpu 0x2: Vector: 300 (Data Access) at [c00000004233f590]
          pc: c0000000000410f4: .hash_preload+0x258/0x338
          lr: c000000000041054: .hash_preload+0x1b8/0x338
          sp: c00000004233f810
         msr: 8000000000009032
         dar: 6b6b6b6b6b6c49b7
       dsisr: 40000000
        current = 0xc00000007e2c0070
        paca    = 0xc000000007fe0500
          pid   = 1, comm = init
      enter ? for help
      [c00000004233f810] c000000000041020 .hash_preload+0x184/0x338 (unreliable)
      [c00000004233f8f0] c00000000003ed98 .update_mmu_cache+0xb0/0xd0
      [c00000004233f990] c000000000157754 .__do_fault+0x48c/0x5dc
      [c00000004233faa0] c000000000158fd0 .handle_mm_fault+0x508/0xa8c
      [c00000004233fb90] c0000000006acdd4 .do_page_fault+0x428/0x6ac
      [c00000004233fe30] c000000000005260 handle_page_fault+0x20/0x74
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1c2c25c7
  21. 05 8月, 2010 3 次提交
  22. 04 8月, 2010 2 次提交
  23. 23 7月, 2010 2 次提交
  24. 14 7月, 2010 1 次提交
  25. 18 12月, 2009 2 次提交
    • D
      powerpc/mm: Fix stupid bug in subpge protection handling · a1128f8f
      David Gibson 提交于
      Commit d28513bc ("Fix bug in pagetable
      cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for
      breakage caused by an earlier clean up patch of mine, contains a
      stupid bug.  I changed the parameters of the subpage_protection()
      function, but failed to update one of the callers.
      
      This patch fixes it, and replaces a void * with a typed pointer so
      that the compiler will warn on such an error in future.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a1128f8f
    • S
      powerpc/mm: Fix hash_utils_64.c compile errors with DEBUG enabled. · 5c339919
      Sachin P. Sant 提交于
      This time without the funny characters.
      
      Fix following build errors generated with DEBUG=1
      
      cc1: warnings being treated as errors
      arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes':
      arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
      arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int'
      arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize':
      arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
      ... SNIP ...
      Signed-off-by: NSachin Sant <sachinp@in.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5c339919
  26. 08 12月, 2009 1 次提交
  27. 02 12月, 2009 1 次提交