1. 11 6月, 2009 1 次提交
  2. 27 5月, 2009 3 次提交
  3. 26 5月, 2009 1 次提交
  4. 18 5月, 2009 1 次提交
    • M
      powerpc: Do not assert pte_locked for hugepage PTE entries · af3e4aca
      Mel Gorman 提交于
      With CONFIG_DEBUG_VM, an assertion is made when changing the protection
      flags of a PTE that the PTE is locked. Huge pages use a different pagetable
      format and the assertion is bogus and will always trigger with a bug looking
      something like
      
       Unable to handle kernel paging request for data at address 0xf1a00235800006f8
       Faulting instruction address: 0xc000000000034a80
       Oops: Kernel access of bad area, sig: 11 [#1]
       SMP NR_CPUS=32 NUMA Maple
       Modules linked in: dm_snapshot dm_mirror dm_region_hash
        dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
        pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
        windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
        windfarm_cpufreq_clamp windfarm_core i2c_powermac
       NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
       REGS: c000000003037600 TRAP: 0300   Not tainted (2.6.30-rc3-autokern1)
       MSR: 9000000000009032 <EE,ME,IR,DR>  CR: 28002484  XER: 200fffff
       DAR: f1a00235800006f8, DSISR: 0000000040010000
       TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
       GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
       GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
       GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
       GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
       GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
       GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
       GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
       GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
       NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
       LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       Call Trace:
       [c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
       [c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
       [c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
       [c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
       [c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
       [c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
       [c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
       Instruction dump:
       7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
       7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000
      
      This patch fixes the problem by not asseting the PTE is locked for VMAs
      backed by huge pages.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      af3e4aca
  5. 15 5月, 2009 1 次提交
  6. 23 4月, 2009 2 次提交
  7. 22 4月, 2009 1 次提交
  8. 09 4月, 2009 1 次提交
  9. 08 4月, 2009 1 次提交
  10. 07 4月, 2009 1 次提交
  11. 06 4月, 2009 2 次提交
  12. 24 3月, 2009 5 次提交
  13. 11 3月, 2009 2 次提交
  14. 09 3月, 2009 1 次提交
  15. 23 2月, 2009 6 次提交
  16. 13 2月, 2009 3 次提交
    • D
      powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Dave Hansen 提交于
      Fix the powerpc NUMA reserve bootmem page selection logic.
      
      commit 8f64e1f2 (powerpc: Reserve
      in bootmem lmb reserved regions that cross NUMA nodes) changed
      the logic for how the powerpc LMB reserved regions were converted
      to bootmen reserved regions.  As the folowing discussion reports,
      the new logic was not correct.
      
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      can not reserve things on other nodes as they may not have
      bootmem structures allocated, yet.
      
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      
      The reservation may only span 2 bytes, but that those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statment, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
      
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      
      0x1000 may not be on the same node as 0xfff.  Oops.
      
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> >> PAGE_SHIFT) will make this consistent with the other VM
      code.
      
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
      
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      06eccea6
    • K
      powerpc/fsl-booke: Fix compile warning · 96a8bac5
      Kumar Gala 提交于
      arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
      arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      96a8bac5
    • K
      powerpc/fsl-booke: Add new ISA 2.06 page sizes and MAS defines · d66c82ea
      Kumar Gala 提交于
      The Power ISA 2.06 added power of two page sizes to the embedded MMU
      architecture.  Its done it such a way to be code compatiable with the
      existing HW.  Made the minor code changes to support both power of two
      and power of four page sizes.  Also added some new MAS bits and macros
      that are defined as part of the 2.06 ISA.  Renamed some things to use
      the 'Book-3e' concept to convey the new MMU that is based on the
      Freescale Book-E MMU programming model.
      
      Note, its still invalid to try and use a page size that isn't supported
      by cpu.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      d66c82ea
  17. 11 2月, 2009 5 次提交
    • K
      powerpc/mm: Fix _PAGE_COHERENT support on classic ppc32 HW · f99fb8a2
      Kumar Gala 提交于
      The following commit:
      
      commit 64b3d0e8
      Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Date:   Thu Dec 18 19:13:51 2008 +0000
      
          powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
      
      broke setting of the _PAGE_COHERENT bit in the PPC HW PTE.  Since we now
      actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
      out before we propogate it to the PPC HW PTE.
      Reported-by: NMartyn Welch <martyn.welch@gefanuc.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f99fb8a2
    • B
      powerpc/mm: Rework I$/D$ coherency (v3) · 8d30c14c
      Benjamin Herrenschmidt 提交于
      This patch reworks the way we do I and D cache coherency on PowerPC.
      
      The "old" way was split in 3 different parts depending on the processor type:
      
         - Hash with per-page exec support (64-bit and >= POWER4 only) does it
      at hashing time, by preventing exec on unclean pages and cleaning pages
      on exec faults.
      
         - Everything without per-page exec support (32-bit hash, 8xx, and
      64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
      
         - Embedded with per-page exec support does it from do_page_fault() on
      exec faults, in a way similar to what the hash code does.
      
      That leads to confusion, and bugs. For example, the method using update_mmu_cache()
      is racy on SMP where another processor can see the new PTE and hash it in before
      we have cleaned the cache, and then blow trying to execute. This is hard to hit but
      I think it has bitten us in the past.
      
      Also, it's inefficient for embedded where we always end up having to do at least
      one more page fault.
      
      This reworks the whole thing by moving the cache sync into two main call sites,
      though we keep different behaviours depending on the HW capability. The call
      sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
      which joins the former in pgtable.c
      
      The base idea for Embedded with per-page exec support, is that we now do the
      flush at set_pte_at() time when coming from an exec fault, which allows us
      to avoid the double fault problem completely (we can even improve the situation
      more by implementing TLB preload in update_mmu_cache() but that's for later).
      
      If for some reason we didn't do it there and we try to execute, we'll hit
      the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
      to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
      this guys also perform the I/D cache sync for exec faults now. This second path
      is the catch all for things that weren't cleaned at set_pte_at() time.
      
      For cpus without per-pag exec support, we always do the sync at set_pte_at(),
      thus guaranteeing that when the PTE is visible to other processors, the cache
      is clean.
      
      For the 64-bit hash with per-page exec support case, we keep the old mechanism
      for now. I'll look into changing it later, once I've reworked a bit how we
      use _PAGE_EXEC.
      
      This is also a first step for adding _PAGE_EXEC support for embedded platforms
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8d30c14c
    • A
      powerpc/mm: Move 64-bit unmapped_area to top of address space · 91b0f5ec
      Anton Blanchard 提交于
      We currently place mmaps just below the stack on 32bit, but leave them
      in the middle of the address space on 64bit:
      
      00100000-00120000 r-xp 00100000 00:00 0                    [vdso]
      10000000-10010000 r-xp 00000000 08:06 179534               /tmp/sleep
      10010000-10020000 rw-p 00000000 08:06 179534               /tmp/sleep
      10020000-10130000 rw-p 10020000 00:00 0                    [heap]
      40000000000-40000030000 r-xp 00000000 08:06 440743         /lib64/ld-2.9.so
      40000030000-40000040000 rw-p 00020000 08:06 440743         /lib64/ld-2.9.so
      40000050000-400001f0000 r-xp 00000000 08:06 440671         /lib64/libc-2.9.so
      400001f0000-40000200000 r--p 00190000 08:06 440671         /lib64/libc-2.9.so
      40000200000-40000220000 rw-p 001a0000 08:06 440671         /lib64/libc-2.9.so
      40000220000-40008230000 rw-p 40000220000 00:00 0
      fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0           [stack]
      
      Right now it isn't an issue, but at some stage we will run into mmap or
      hugetlb allocation issues. Using the same layout as 32bit gives us a
      some breathing room. This matches what x86-64 is doing too.
      
      00100000-00103000 r-xp 00100000 00:00 0                    [vdso]
      10000000-10001000 r-xp 00000000 08:06 554894               /tmp/test
      10010000-10011000 r--p 00000000 08:06 554894               /tmp/test
      10011000-10012000 rw-p 00001000 08:06 554894               /tmp/test
      10012000-10113000 rw-p 10012000 00:00 0                    [heap]
      fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
      ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591         /lib64/libc-2.9.so
      ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591         /lib64/libc-2.9.so
      ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591         /lib64/libc-2.9.so
      ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591         /lib64/libc-2.9.so
      ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
      ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663         /lib64/ld-2.9.so
      ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
      ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
      ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663         /lib64/ld-2.9.so
      ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663         /lib64/ld-2.9.so
      ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
      fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0           [stack]
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      91b0f5ec
    • M
      powerpc/numa: Remove redundant find_cpu_node() · 8b16cd23
      Milton Miller 提交于
      Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in
      a less restrictive section (text vs cpuinit).
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8b16cd23
    • M
      powerpc/numa: Avoid possible reference beyond prop. length in find_min_common_depth() · 20fcefe5
      Milton Miller 提交于
      find_min_common_depth() was checking the property length incorrectly.
      The value is in bytes not cells, and it is using the second entry.
      Signed-off-By: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      20fcefe5
  18. 10 2月, 2009 1 次提交
  19. 29 1月, 2009 2 次提交
    • T
      powerpc/fsl-booke: Make CAM entries used for lowmem configurable · 96051465
      Trent Piepho 提交于
      On booke processors, the code that maps low memory only uses up to three
      CAM entries, even though there are sixteen and nothing else uses them.
      
      Make this number configurable in the advanced options menu along with max
      low memory size.  If one wants 1 GB of lowmem, then it's typically
      necessary to have four CAM entries.
      Signed-off-by: NTrent Piepho <tpiepho@freescale.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      96051465
    • T
      powerpc/fsl-booke: Allow larger CAM sizes than 256 MB · c8f3570b
      Trent Piepho 提交于
      The code that maps kernel low memory would only use page sizes up to 256
      MB.  On E500v2 pages up to 4 GB are supported.
      
      However, a page must be aligned to a multiple of the page's size.  I.e.
      256 MB pages must aligned to a 256 MB boundary.  This was enforced by a
      requirement that the physical and virtual addresses of the start of lowmem
      be aligned to 256 MB.  Clearly requiring 1GB or 4GB alignment to allow
      pages of that size isn't acceptable.
      
      To solve this, I simply have adjust_total_lowmem() take alignment into
      account when it decides what size pages to use.  Give it PAGE_OFFSET =
      0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
      pages like this:
      PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
      PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
      PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
      PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
      PA 0xA000_0000 VA 0xE000_0000 Size 256 MB
      
      Because the lowmem mapping code now takes alignment into account,
      PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB.  Even lower might be
      possible.  The lowmem code will work down to 4 kB but it's possible some of
      the boot code will fail before then.  Poor alignment will force small pages
      to be used, which combined with the limited number of TLB1 pages available,
      will result in very little memory getting mapped.  So alignments less than
      64 MB probably aren't very useful anyway.
      Signed-off-by: NTrent Piepho <tpiepho@freescale.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      c8f3570b