1. 20 Aug 2009, 9 commits
    • powerpc/mm: Move around mmu_gathers definition on 64-bit · a8f7758c
      Committed by Benjamin Herrenschmidt
      The definition of the global structure mmu_gathers, used by generic code,
      is currently spread across multiple places, none of which covers
      64-bit Book3E. This moves it to a single place common to all
      processors.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Add memory management headers for new 64-bit BookE · 57e2a99f
      Committed by Benjamin Herrenschmidt
      This adds the PTE and pgtable format definitions, along with changes
      to the kernel memory map and other definitions related to implementing
      support for 64-bit Book3E. This also shields some asm-offset bits that
      are currently only relevant on 32-bit.
      
      We also move the definition of the "linux" page size constants to
      the common mmu.h file and add a few sizes that are relevant to
      embedded processors.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Rework & cleanup page table freeing code path · c7cc58a1
      Committed by Benjamin Herrenschmidt
      This patch started out as just adding a hook to page table flushing,
      but pulling on that string brought out a whole bunch of issues, so it
      now does that and more:
      
       - We now make the RCU batching of page freeing SMP only, as I
      believe it was intended initially. We make a few more things compile
      to nothing on !CONFIG_SMP
      
       - Some macros are turned into functions, though that forced me to
      move a few things out of line due to unsolvable include dependencies;
      it's probably better that way anyway, as this is not -that- critical
      a code path.
      
       - 32-bit didn't call pte_free_finish() from tlb_flush(), which means
      it wouldn't push the batch out to RCU for delayed freeing when a
      bunch of page tables had been freed; they would just stay in there
      until the batch got full.
      
      64-bit BookE will use that hook to maintain the virtually linear
      page tables or the indirect entries in the TLB when using the
      HW loader.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Make low level TLB flush ops on BookE take additional args · d4e167da
      Committed by Benjamin Herrenschmidt
      We need to pass down whether the page is direct or indirect and we'll
      need to pass the page size to _tlbil_va and _tlbivax_bcast.
      
      We also add a new low level _tlbil_pid_noind() which does a TLB flush
      by PID but avoids flushing indirect entries if possible.
      
      This implements those new prototypes but defines them with inlines
      or macros so that no additional arguments are actually passed on current
      processors.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
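      The "no additional arguments on current processors" trick can be sketched
      in plain C. The names below mirror the commit's subject area, but the
      bodies are hypothetical stand-ins for illustration, not the kernel's
      actual implementation:

      ```c
      #include <assert.h>

      /* Stand-in for the old single-argument flush routine. */
      static unsigned long flushed_va;

      static void __tlbil_va(unsigned long addr)
      {
          flushed_va = addr;   /* pretend this address was flushed */
      }

      /* The new prototype carries PID, page size and a direct/indirect
       * flag, but on processors that don't need them a macro discards
       * the extras, so call sites pass no additional arguments. */
      #define _tlbil_va(addr, pid, tsize, ind) __tlbil_va(addr)

      int main(void)
      {
          _tlbil_va(0x4000UL, 12, 4, 1);  /* extras compile away here */
          assert(flushed_va == 0x4000UL);
          return 0;
      }
      ```

      On processors that do need the information, the macro would instead
      expand to a call that forwards all four arguments.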
    • powerpc/mm: Add support for early ioremap on non-hash 64-bit processors · a245067e
      Committed by Benjamin Herrenschmidt
      This adds some code to do early ioremaps using page tables instead of
      bolting entries in the hash table. This will be used by the upcoming
      64-bit BookE port.
      
      The patch also changes the test for early vs. late ioremap to use
      slab_is_available() instead of our old hackish mem_init_done.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Add HW threads support to no_hash TLB management · fcce8109
      Committed by Benjamin Herrenschmidt
      The current "no hash" MMU context management code is written with
      the assumption that one CPU == one TLB. This is not the case on
      implementations that support HW multithreading, where several
      Linux CPUs can share the same TLB.
      
      This adds some basic support for this to our context management
      and our TLB flushing code.
      
      It also cleans up the optional debugging output a bit.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Use names rather than numbers for SPRGs (v2) · ee43eb78
      Committed by Benjamin Herrenschmidt
      The kernel uses SPRG registers for various purposes, typically in
      low level assembly code as scratch registers or to hold per-CPU
      globals such as the PACA or the current thread_info pointer.

      We want to be able to easily shuffle the usage of those registers
      as some implementations have specific constraints related to some
      of them, for example, some have userspace readable aliases, etc.,
      and the current choice isn't always the best.
      
      This patch should not change any code generation, and replaces the
      usage of SPRN_SPRGn everywhere in the kernel with a named replacement
      and adds documentation next to the definition of the names as to
      what those are used for on each processor family.
      
      The only parts that still use the original numbers are bits of KVM
      or suspend/resume code that just blindly needs to save/restore all
      the SPRGs.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Preload application text segment instead of TASK_UNMAPPED_BASE · de4376c2
      Committed by Anton Blanchard
      TASK_UNMAPPED_BASE is not used with the new top down mmap layout. We can
      reuse this preload slot by loading in the segment at 0x10000000, where almost
      all PowerPC binaries are linked.
      
      On a microbenchmark that bounces a token between two 64-bit processes over pipes
      and calls gettimeofday each iteration (to access the VDSO), both the 32-bit and
      64-bit context switch rates improve (tested on a 4GHz POWER6):
      
      32bit: 273k/sec -> 283k/sec
      64bit: 277k/sec -> 284k/sec
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rearrange SLB preload code · 5eb9bac0
      Committed by Anton Blanchard
      With the new top down layout it is likely that the pc and stack will be in the
      same segment, because the pc is most likely in a library allocated via a top
      down mmap. Right now we bail out early if these segments match.
      
      Rearrange the SLB preload code to first sanity-check that the preload
      addresses are not in the kernel, then check all addresses for conflicts.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 30 Jul 2009, 1 commit
    • powerpc/mm: Fix SMP issue with MMU context handling code · 5156ddce
      Committed by Kumar Gala
      In switch_mmu_context(), if we call steal_context_smp() to get a context
      to use, we shouldn't fall through and then call steal_context_up().  Doing
      so can be problematic in that the 'mm' that steal_context_up() ends up
      using will not get marked dirty in the stale_map[] for other CPUs that
      might have used that mm.  Thus we could end up with stale TLB entries in
      the other CPUs that can cause all kinds of havoc.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  3. 28 Jul 2009, 1 commit
    • mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Committed by Benjamin Herrenschmidt
      Upcoming patches to support the new 64-bit "BookE" powerpc architecture
      will need the virtual address corresponding to the PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page's struct page
      sucks too much; the address is almost readily available at all call
      sites, and almost everybody implements these as macros, so we may as
      well add the argument everywhere. I added it to the pmd and pud
      variants for consistency.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
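      A minimal sketch of the interface change described above; the
      invalidation body is invented for illustration (the real
      implementations are per-architecture):

      ```c
      #include <assert.h>

      /* Records the last virtual address "invalidated"; a stand-in for
       * what a tlbilx/tlbivax-based implementation would do with the
       * indirect TLB entry covering a freed PTE page. */
      static unsigned long last_inval;

      static void invalidate_indirect_entry(unsigned long address)
      {
          last_inval = address;
      }

      /* Most architectures implement the free-PTE hook as a macro, so
       * threading the virtual address through is cheap: callers already
       * have it at hand.  tlb and ptepage are unused in this sketch. */
      #define __pte_free_tlb(tlb, ptepage, address) \
          invalidate_indirect_entry(address)

      int main(void)
      {
          __pte_free_tlb(0, 0, 0x40000000UL);
          assert(last_inval == 0x40000000UL);
          return 0;
      }
      ```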
  4. 08 Jul 2009, 5 commits
  5. 26 Jun 2009, 1 commit
  6. 22 Jun 2009, 1 commit
  7. 16 Jun 2009, 1 commit
    • powerpc: Add configurable -Werror for arch/powerpc · ba55bd74
      Committed by Michael Ellerman
      Add the option to build the code under arch/powerpc with -Werror.
      
      The intention is to make it harder for people to inadvertently introduce
      warnings in the arch/powerpc code. It needs to be configurable so that
      if a warning is introduced, people can easily work around it while it's
      being fixed.
      
      The option is a negative, i.e. "don't enable -Werror", so that it will be
      turned on for allyes and allmodconfig builds.

      The default is n, in the hope that developers will build with -Werror.
      That will probably lead to some build breaks; I am prepared to be flamed.
      
      It's not enabled for math-emu, which is a steaming pile of warnings.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
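      A negative option of the kind described might look like the sketch
      below; the symbol name and help text are illustrative, not necessarily
      the exact ones merged:

      ```kconfig
      config PPC_DISABLE_WERROR
              bool "Don't build arch/powerpc code with -Werror"
              default n
              help
                Say Y here to NOT add -Werror to the flags used when
                compiling arch/powerpc code. This lets you keep building
                if a newly introduced warning breaks the build while the
                warning itself is being fixed.
      ```

      Phrasing the option as a negative means allyesconfig and allmodconfig
      set it to y, which switches -Werror off for exactly those warning-heavy
      builds that would otherwise fail.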
  8. 13 Jun 2009, 1 commit
  9. 11 Jun 2009, 1 commit
  10. 09 Jun 2009, 4 commits
    • powerpc: Shield code specific to 64-bit server processors · 94491685
      Committed by Benjamin Herrenschmidt
      This is a random collection of added ifdefs around portions of
      code that only make sense on server processors, using either
      CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.
      
      This is meant to make the future merging of Book3E 64-bit support
      easier.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Set init_bootmem_done on NUMA platforms as well · d3f6204a
      Committed by Benjamin Herrenschmidt
      For some obscure reason, we only set init_bootmem_done after initializing
      bootmem when NUMA isn't enabled. We even document this next to the declaration
      of that global in system.h which of course I didn't read before I had to
      debug why some WIP code wasn't working properly...
      
      This patch changes it so that we always set it after bootmem is
      initialized, which should have always been the case... go figure!
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix an AB->BA deadlock scenario with nohash MMU context lock · b46b6942
      Committed by Benjamin Herrenschmidt
      The MMU context_lock can be taken from switch_mm() while the
      rq->lock is held. The rq->lock can also be taken from interrupts,
      thus if we get interrupted in destroy_context() with the context
      lock held and that interrupt tries to take the rq->lock, there's
      a possible deadlock scenario with another CPU having the rq->lock
      and calling switch_mm() which takes our context lock.
      
      The fix is to always ensure interrupts are off when taking our
      context lock. The switch_mm() path is already good so this fixes
      the destroy_context() path.
      
      While at it, turn the context lock into a new style spinlock.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix some SMP issues with MMU context handling · 3035c863
      Committed by Benjamin Herrenschmidt
      This patch fixes a couple of issues that can happen as a result
      of steal_context() dropping the context_lock when all possible
      PIDs are ineligible for stealing (hopefully an extremely hard to
      hit occurrence).
      
      This case exposes the possibility of a stale context_mm[] entry
      being seen, since destroy_context() doesn't clear it and the free
      map isn't re-tested. It also means steal_context() will not notice
      a context freed while the lock was held, thus possibly trying to
      steal a context when a free one was available.
      
      This fixes it by always returning to the caller from steal_context
      when it dropped the lock, with a return value that causes the
      caller to re-sample the number of free contexts, along with
      properly clearing the context_mm[] array for destroyed contexts.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
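      The retry contract can be modeled in a few lines of C. The names below
      follow the commit's subject area, but the logic is an illustrative
      model, not the kernel's actual code:

      ```c
      #include <assert.h>

      /* Sentinel return value: steal_context had to drop the lock, so
       * the caller must re-sample the free-context state. */
      #define MMU_NO_CONTEXT 0

      static int nr_free_contexts;

      static int steal_context(void)
      {
          /* Pretend the lock was dropped and another CPU freed a
           * context meanwhile: don't act on stale state here. */
          nr_free_contexts = 1;
          return MMU_NO_CONTEXT;   /* tell the caller to retry */
      }

      static int get_context(void)
      {
          int id;
          do {
              if (nr_free_contexts > 0)
                  return 42;       /* grab a free context id */
              id = steal_context();
          } while (id == MMU_NO_CONTEXT);
          return id;
      }

      int main(void)
      {
          nr_free_contexts = 0;
          /* The retry loop notices the context freed "while the lock
           * was dropped" instead of stealing a busy one. */
          assert(get_context() == 42);
          return 0;
      }
      ```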
  11. 27 May 2009, 3 commits
  12. 26 May 2009, 1 commit
  13. 21 May 2009, 1 commit
  14. 18 May 2009, 1 commit
    • powerpc: Do not assert pte_locked for hugepage PTE entries · af3e4aca
      Committed by Mel Gorman
      With CONFIG_DEBUG_VM, an assertion is made when changing the protection
      flags of a PTE that the PTE is locked. Huge pages use a different pagetable
      format, so the assertion is bogus for huge pages and will always trigger
      a bug looking something like:
      
       Unable to handle kernel paging request for data at address 0xf1a00235800006f8
       Faulting instruction address: 0xc000000000034a80
       Oops: Kernel access of bad area, sig: 11 [#1]
       SMP NR_CPUS=32 NUMA Maple
       Modules linked in: dm_snapshot dm_mirror dm_region_hash
        dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
        pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
        windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
        windfarm_cpufreq_clamp windfarm_core i2c_powermac
       NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
       REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
       MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
       DAR: f1a00235800006f8, DSISR: 0000000040010000
       TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
       GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
       GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
       GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
       GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
       GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
       GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
       GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
       GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
       NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
       LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       Call Trace:
       [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
       [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
       [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
       [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
       [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
       [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
       Instruction dump:
       7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
       7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000
      
      This patch fixes the problem by not asserting that the PTE is locked
      for VMAs backed by huge pages.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  15. 15 May 2009, 1 commit
  16. 23 Apr 2009, 2 commits
  17. 22 Apr 2009, 1 commit
  18. 09 Apr 2009, 1 commit
  19. 08 Apr 2009, 1 commit
  20. 07 Apr 2009, 1 commit
  21. 06 Apr 2009, 2 commits