1. 11 February 2009 (1 commit)
    • powerpc/mm: Rework I$/D$ coherency (v3) · 8d30c14c
      Authored by Benjamin Herrenschmidt
      This patch reworks the way we do I and D cache coherency on PowerPC.
      
      The "old" way was split into 3 different parts depending on the processor type:
      
         - Hash with per-page exec support (64-bit and >= POWER4 only) does it
      at hashing time, by preventing exec on unclean pages and cleaning pages
      on exec faults.
      
         - Everything without per-page exec support (32-bit hash, 8xx, and
      64-bit < POWER4) does it for all pages going to user space in update_mmu_cache().
      
         - Embedded with per-page exec support does it from do_page_fault() on
      exec faults, in a way similar to what the hash code does.
      
      That leads to confusion and bugs. For example, the method using update_mmu_cache()
      is racy on SMP, where another processor can see the new PTE and hash it in before
      we have cleaned the cache, and then blow up trying to execute. This is hard to hit but
      I think it has bitten us in the past.
      
      Also, it's inefficient for embedded where we always end up having to do at least
      one more page fault.
      
      This reworks the whole thing by moving the cache sync into two main call sites,
      though we keep different behaviours depending on the HW capability. The call
      sites are set_pte_at(), which is now made out of line, and ptep_set_access_flags(),
      which joins the former in pgtable.c.
      
      The basic idea for Embedded with per-page exec support is that we now do the
      flush at set_pte_at() time when coming from an exec fault, which allows us
      to avoid the double fault problem completely (we can even improve the situation
      more by implementing TLB preload in update_mmu_cache(), but that's for later).
      
      If for some reason we didn't do it there and we try to execute, we'll hit
      the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
      to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed; we now make
      that path also perform the I/D cache sync for exec faults. This second path
      is the catch-all for things that weren't cleaned at set_pte_at() time.
      
      For cpus without per-page exec support, we always do the sync at set_pte_at(),
      thus guaranteeing that when the PTE is visible to other processors, the cache
      is clean.
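      
      A minimal sketch of that scheme (the helpers pte_exec(), flush_dcache_icache_page()
      and the PG_arch_1 "cache clean" bit are stand-ins for whatever the real code uses;
      this is an illustration, not the patch itself):
      
          void set_pte_at(struct mm_struct *mm, unsigned long addr,
                          pte_t *ptep, pte_t pte)
          {
                  /* Clean the D-cache and invalidate the I-cache for the page
                   * *before* an executable PTE becomes visible to anyone. */
                  if (pte_present(pte) && pte_exec(pte)) {
                          struct page *pg = pte_page(pte);
      
                          if (!test_bit(PG_arch_1, &pg->flags)) {
                                  flush_dcache_icache_page(pg);
                                  set_bit(PG_arch_1, &pg->flags);
                          }
                  }
                  /* Only now publish the PTE, so another CPU can never execute
                   * from a page whose caches haven't been synchronized. */
                  *ptep = pte;    /* simplified; the real code uses a low-level setter */
          }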
      
      For the 64-bit hash with per-page exec support case, we keep the old mechanism
      for now. I'll look into changing it later, once I've reworked a bit how we
      use _PAGE_EXEC.
      
      This is also a first step towards adding _PAGE_EXEC support for embedded platforms.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  2. 07 January 2009 (1 commit)
    • mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Authored by Gary Hade
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
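      
      Under the hood this presumably amounts to one sysfs_create_link() call per memory
      section when the section is registered against its node (a sketch; the function
      and field names below are assumptions, not quotes from the patch):
      
          static int link_mem_section_to_node(struct memory_block *mem_blk,
                                              struct node *node)
          {
                  /* e.g. node1/memory135 -> ../../memory/memory135 */
                  return sysfs_create_link(&node->sysdev.kobj,
                                           &mem_blk->sysdev.kobj,
                                           kobject_name(&mem_blk->sysdev.kobj));
          }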
      
      Also revise the documentation to cover this change, and update
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of the memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 21 December 2008 (2 commits)
    • powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED · 64b3d0e8
      Authored by Benjamin Herrenschmidt
      Currently, we never set _PAGE_COHERENT in the PTEs; we just OR it in,
      in the hash code, based on some CPU feature bit.  We also manipulate
      _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
      
      This changes the logic so that instead, the PTE now contains
      _PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
      that need it.  The hash code clears it if the feature bit is not set.
      
      It also adds some clean accessors to set up various valid combinations
      of access flags, and changes various bits of code to use them instead.
      
      This should help ensure the PTE actually contains the bit
      combinations that we really want.
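      
      The "clean accessors" are presumably small helpers along these lines (illustrative
      only; the macro names and exact bit handling are assumptions):
      
          /* Build a cache-inhibited, guarded protection for I/O mappings
           * from a base protection, instead of OR-ing bits in by hand. */
          #define pgprot_noncached(prot) \
                  (__pgprot(pgprot_val(prot) | _PAGE_NO_CACHE | _PAGE_GUARDED))
      
          /* Normal cacheable RAM: coherent where the platform needs it. */
          #define pgprot_cached(prot) \
                  (__pgprot((pgprot_val(prot) & ~_PAGE_NO_CACHE) | _PAGE_COHERENT))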
      
      I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
      set it explicitly from the TLB miss.  I will ultimately remove it
      completely as it appears that it might not be needed after all,
      but in the meantime, having it in the TLB miss makes things a
      lot easier.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc/mm: Add SMP support to no-hash TLB handling · f048aace
      Authored by Benjamin Herrenschmidt
      This commit moves the whole no-hash TLB handling out of line into a
      new tlb_nohash.c file, and implements some basic SMP support using
      IPIs and/or broadcast tlbivax instructions.
      
      Note that I'm using local invalidations for D->I cache coherency.
      
      At worst, if another processor is trying to execute the same page and
      still has the old entry in its TLB, it will just take a fault and re-do
      the TLB flush locally (it won't re-do the cache flush in any case).
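      
      A rough sketch of the IPI fallback path (the structure and function names here are
      assumptions for illustration; the real code may broadcast with tlbivax instead):
      
          struct tlb_flush_param {
                  unsigned long addr;
                  unsigned int pid;
          };
      
          static void do_flush_tlb_page_ipi(void *param)
          {
                  struct tlb_flush_param *p = param;
      
                  /* Runs on each CPU: invalidate this one entry locally. */
                  _tlbil_va(p->addr, p->pid);
          }
      
          void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
          {
                  struct tlb_flush_param p = {
                          .addr = vmaddr,
                          .pid  = vma->vm_mm->context.id,
                  };
      
                  /* No hardware broadcast available: ask every CPU to do a
                   * local invalidation and wait for them to finish. */
                  on_each_cpu(do_flush_tlb_page_ipi, &p, 1);
          }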
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  4. 20 October 2008 (1 commit)
    • mm: cleanup to make remove_memory() arch-neutral · 71088785
      Authored by Badari Pulavarty
      There is nothing architecture specific about remove_memory().  The
      remove_memory() function is common to all architectures that support
      hotplug memory removal.  Instead of duplicating it in every architecture,
      collapse the copies into one arch-neutral function.
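      
      The generic version is presumably just a thin wrapper over offline_pages() (sketch;
      the timeout value is illustrative):
      
          int remove_memory(u64 start, u64 size)
          {
                  unsigned long start_pfn = start >> PAGE_SHIFT;
                  unsigned long end_pfn = start_pfn + (size >> PAGE_SHIFT);
      
                  /* Offlining the pages is all any architecture was doing here. */
                  return offline_pages(start_pfn, end_pfn, 120 * HZ);
          }
          EXPORT_SYMBOL_GPL(remove_memory);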
      
      [akpm@linux-foundation.org: fix the export]
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 07 October 2008 (1 commit)
    • powerpc: Avoid integer overflow in page_is_ram() · a880e762
      Authored by Roland Dreier
      Commit 8b150478 ("ppc: make phys_mem_access_prot() work with pfns
      instead of addresses") fixed page_is_ram() in arch/ppc to avoid overflow
      for addresses above 4G on 32-bit kernels.  However arch/powerpc's
      page_is_ram() is missing the same fix -- it computes a physical address
      by doing pfn << PAGE_SHIFT, which overflows if pfn corresponds to a page
      above 4G.
      
      In particular this causes pages above 4G to be mapped with the wrong
      caching attribute; for example many ppc440-based SoCs have PCI space
      above 4G, and mmap()ing MMIO space may end up with a mapping that has
      caching enabled.
      
      Fix this by working with the pfn and avoiding the conversion to
      physical address that causes the overflow.  This patch compares the
      pfn to max_pfn, which is a semantic change from the old code -- that
      code compared the physical address to high_memory, which corresponds
      to max_low_pfn.  However, I think that was another bug, since
      highmem pages are still RAM.
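      
      Roughly, the 32-bit side of the fix (a sketch of the idea, not the literal patch):
      
          int page_is_ram(unsigned long pfn)
          {
                  /* The old code computed pfn << PAGE_SHIFT and compared it to
                   * high_memory; the shift overflows a 32-bit physical address
                   * for pages above 4G.  Comparing pfns avoids the overflow, and
                   * using max_pfn (not max_low_pfn) keeps highmem counted as RAM. */
                  return pfn < max_pfn;
          }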
      Reported-by: vb <vb@vsbe.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 04 August 2008 (1 commit)
  7. 27 July 2008 (1 commit)
  8. 10 July 2008 (1 commit)
  9. 03 July 2008 (1 commit)
  10. 09 June 2008 (1 commit)
  11. 29 April 2008 (1 commit)
  12. 28 April 2008 (1 commit)
  13. 24 April 2008 (2 commits)
    • [POWERPC] Port fixmap from x86 and use for kmap_atomic · 2c419bde
      Authored by Kumar Gala
      The fixmap code from x86 allows us to have compile-time virtual addresses
      whose physical addresses we can change at run time.
      
      This is useful for things like kmap_atomic, PCI config space accessed via a
      direct memory map, and kexec/kdump.
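      
      The core fixmap trick is tiny: each slot gets a compile-time constant virtual
      address counted down from the top of the address space, and the physical page
      behind it is installed later (a sketch along the lines of the x86 code; names
      and the map_page() call are illustrative):
      
          enum fixed_addresses {
                  FIX_HOLE,
                  /* one kmap_atomic slot per KM_* type per CPU */
                  FIX_KMAP_BEGIN,
                  FIX_KMAP_END = FIX_KMAP_BEGIN + (KM_TYPE_NR * NR_CPUS) - 1,
                  __end_of_fixed_addresses
          };
      
          /* Slot index -> constant virtual address. */
          #define __fix_to_virt(x)  (FIXADDR_TOP - ((x) << PAGE_SHIFT))
      
          /* At run time, point a slot at a physical page. */
          static void __set_fixmap(enum fixed_addresses idx, phys_addr_t phys,
                                   pgprot_t flags)
          {
                  map_page(__fix_to_virt(idx), phys, pgprot_val(flags));
          }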
      
      We got rid of CONFIG_HIGHMEM_START as we can now determine a more optimal
      location for PKMAP_BASE based on where the fixmap addresses start and
      working back from there.
      
      Additionally, the kmap code in asm-powerpc/highmem.h always had debugging
      enabled.  It now uses CONFIG_DEBUG_HIGHMEM to determine whether the extra
      debug checking should be done.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero) · 37dd2bad
      Authored by Kumar Gala
      Added support to allow an 85xx kernel to be run from a non-zero physical
      address (useful for cooperative asymmetric multiprocessing situations and
      kdump).  The support can be configured at compile time by setting
      CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
      desired.
      
      Alternatively, the kernel build can set CONFIG_RELOCATABLE.  Setting this
      config option causes the kernel to determine at runtime the physical
      addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START.  If
      CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
      However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
      header physical address field in the resulting ELF image.
      
      Currently we are limited to running at a physical address that is a
      multiple of 256M.  This is due to how we map TLBs to cover
      lowmem.  This should be fixed to allow 64M or maybe even 16M alignment
      in the future.  It is considered an error to try and run a kernel at a
      non-aligned physical address.
      
      All the magic for this support is accomplished by proper initialization
      of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
      
      The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
      ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
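      
      For a flat memory model, ARCH_PFN_OFFSET simply shifts where mem_map[] starts
      (illustrative definitions; MEMORY_START stands for wherever the kernel's physical
      start ends up):
      
          /* When the kernel runs at a non-zero physical address, the first RAM
           * pfn is no longer 0; mem_map[] only covers pfns from this offset up. */
          #define ARCH_PFN_OFFSET      (MEMORY_START >> PAGE_SHIFT)
      
          #define __pfn_to_page(pfn)   (mem_map + ((pfn) - ARCH_PFN_OFFSET))
          #define __page_to_pfn(page)  ((unsigned long)((page) - mem_map) + \
                                        ARCH_PFN_OFFSET)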
      
      /dev/mem continues to allow access to any physical address in the system
      regardless of how CONFIG_PHYSICAL_START is set.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  14. 17 April 2008 (1 commit)
  15. 01 April 2008 (2 commits)
  16. 14 February 2008 (1 commit)
  17. 08 February 2008 (3 commits)
  18. 06 February 2008 (1 commit)
  19. 24 January 2008 (1 commit)
    • [POWERPC] Fix handling of memreserve if the range lands in highmem · f98eeb4e
      Authored by Kumar Gala
      There were several issues if a memreserve range existed and happened
      to be in highmem:
      
      * The bootmem allocator is only aware of lowmem so calling
        reserve_bootmem with a highmem address would cause a BUG_ON
      * All highmem pages were provided to the buddy allocator
      
      Added an lmb_is_reserved() API that we now use to determine if a highmem
      page should continue to be PageReserved or provided to the buddy
      allocator.
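      
      In the loop that hands highmem pages to the buddy allocator, the check presumably
      looks something like this (sketch; variable names are illustrative):
      
          unsigned long pfn;
      
          for (pfn = highmem_mapnr; pfn < max_mapnr; ++pfn) {
                  struct page *page = pfn_to_page(pfn);
                  u64 paddr = (u64)pfn << PAGE_SHIFT;
      
                  /* Leave reserved ranges marked PageReserved instead of
                   * feeding them to the buddy allocator. */
                  if (lmb_is_reserved(paddr))
                          continue;
      
                  ClearPageReserved(page);
                  init_page_count(page);
                  __free_page(page);
                  totalhigh_pages++;
          }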
      
      Also, we incorrectly reported the number of pages reserved, since all
      highmem pages are initially marked reserved and we clear the
      PageReserved flag as we "free" up the highmem pages.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  20. 20 November 2007 (1 commit)
  21. 17 October 2007 (1 commit)
  22. 17 August 2007 (1 commit)
  23. 10 July 2007 (1 commit)
  24. 14 June 2007 (1 commit)
  25. 22 May 2007 (1 commit)
  26. 09 May 2007 (1 commit)
  27. 02 May 2007 (1 commit)
  28. 13 April 2007 (1 commit)
    • [POWERPC] Fix 32-bit mm operations when not using BATs · ee4f2ea4
      Authored by Benjamin Herrenschmidt
      On hash-table-based 32-bit powerpcs, the hash management code runs with
      a big spinlock. It's thus important that it never causes itself a hash
      fault. That code is generally safe (it does its memory accesses in real mode,
      among other things) with the exception of the actual access to the code
      itself. That is, the kernel text needs to be accessible without taking
      hash miss exceptions.
      
      This is currently guaranteed by having a BAT register mapping part of the
      linear mapping permanently, which includes the kernel text. But this is
      not true if using the "nobats" kernel command line option (which can be
      useful for debugging) and will not be true when using DEBUG_PAGEALLOC
      implemented in a subsequent patch.
      
      This patch fixes this by pre-faulting in the hash table pages that hit
      the kernel text, and making sure we never evict such a page under hash
      pressure.
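      
      Conceptually, the pre-faulting just walks the kernel text and installs a hash PTE
      for each page before the hash lock can ever matter (a hypothetical sketch; the
      helper name and its arguments are assumptions, and the real work lives in
      hash_low_32.S):
      
          unsigned long addr;
      
          for (addr = (unsigned long)_stext;
               addr < (unsigned long)_etext;
               addr += PAGE_SIZE) {
                  /* Install (and later refuse to evict) the hash entry
                   * covering this page of kernel text. */
                  hash_preload(&init_mm, addr, 0 /* read */, 0x300 /* DSI */);
          }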
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/mm/hash_low_32.S |   22 ++++++++++++++++++++--
       arch/powerpc/mm/mem.c         |    3 ---
       arch/powerpc/mm/mmu_decl.h    |    4 ++++
       arch/powerpc/mm/pgtable_32.c  |   11 +++++++----
       4 files changed, 31 insertions(+), 9 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  29. 13 February 2007 (1 commit)
    • [POWERPC] Fix vDSO page count calculation · 7ac9a137
      Authored by Benjamin Herrenschmidt
      The recent vDSO consolidation patches broke powerpc due to a mistake
      in the definition of the MAXPAGES constants. This fixes it by moving to
      a dynamically allocated array of pages instead, as I don't much like
      hard-coded size limits. Also move the vDSO initialisation to an initcall
      since it doesn't really need to be done -that- early.
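      
      Sizing the page list at run time instead of against a MAXPAGES constant is
      presumably as simple as this sketch (the "+ 2" extra slots, e.g. for a data page
      and a NULL terminator, are an assumption):
      
          vdso32_pagelist = kzalloc((vdso32_pages + 2) * sizeof(struct page *),
                                    GFP_KERNEL);
          if (vdso32_pagelist == NULL)
                  return -ENOMEM;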
      
      Apologies for not catching the breakage earlier; Roland _did_ CC me on
      his patches a while ago, but I got busy with other things and forgot to test
      them.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  30. 08 February 2007 (1 commit)
  31. 07 February 2007 (1 commit)
  32. 12 October 2006 (1 commit)
    • [PATCH] mm: use symbolic names instead of indices for zone initialisation · 6391af17
      Authored by Mel Gorman
      Arch-independent zone-sizing uses indices instead of symbolic names to
      offset within an array related to zones (max_zone_pfns).  The unintended
      impact is that ZONE_DMA and ZONE_NORMAL are initialised on powerpc instead
      of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result,
      the machine fails to boot, but will boot with CONFIG_HIGHMEM turned off.
      
      The following patch properly initialises the max_zone_pfns[] array and uses
      symbolic names instead of indices in each architecture using
      arch-independent zone-sizing.  Two users have successfully booted their
      powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
      x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.
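      
      On powerpc the symbolic-name style ends up looking roughly like the sketch below
      (variable names such as lowmem_end_pfn are illustrative):
      
          unsigned long max_zone_pfns[MAX_NR_ZONES];
      
          memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
          #ifdef CONFIG_HIGHMEM
          max_zone_pfns[ZONE_DMA]     = lowmem_end_pfn;
          max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
          #else
          max_zone_pfns[ZONE_DMA]     = max_pfn;
          #endif
          free_area_init_nodes(max_zone_pfns);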
      
      Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
      first fix.  Additional credit to Johannes Berg and Andreas Schwab for
      reporting the problem and testing on powerpc.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  33. 27 September 2006 (1 commit)
  34. 01 July 2006 (1 commit)
  35. 28 June 2006 (1 commit)