1. 03 10月, 2013 1 次提交
    • N
      powerpc: Fix memory hotplug with sparse vmemmap · f7e3334a
      Nathan Fontenot 提交于
      Previous commit 46723bfa... introduced a new config option
      HAVE_BOOTMEM_INFO_NODE that ended up breaking memory hot-remove for ppc
      when sparse vmemmap is not defined.
      
      This patch defines HAVE_BOOTMEM_INFO_NODE for ppc and adds the call to
      register_page_bootmem_info_node. Without this we get a BUG_ON for memory
      hot remove in put_page_bootmem().
      
      This also adds a stub for register_page_bootmem_memmap to allow ppc to build
      with sparse vmemmap defined. Leaving this as a stub is fine since the same
      vmemmap addresses are also handled in vmemmap_populate and as such are
      properly mapped.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: <stable@vger.kernel.org> [v3.9+]
      f7e3334a
  2. 21 6月, 2013 1 次提交
  3. 14 5月, 2013 1 次提交
  4. 30 4月, 2013 2 次提交
    • A
      powerpc: New hugepage directory format · cf9427b8
      Aneesh Kumar K.V 提交于
      Change the hugepage directory format so that we can have leaf ptes directly
      at page directory avoiding the allocation of hugepage directory.
      
      With the new table format we have 3 cases for pgds and pmds:
      (1) invalid (all zeroes)
      (2) pointer to next table, as normal; bottom 6 bits == 0
      (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
      
      Instead of storing shift value in hugepd pointer we use mmu_psize_def index
      so that we can fit all the supported hugepage size in 4 bits
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cf9427b8
    • J
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner 提交于
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Tested-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0aad818b
  5. 24 2月, 2013 2 次提交
  6. 05 9月, 2012 1 次提交
  7. 29 3月, 2012 1 次提交
  8. 30 6月, 2011 1 次提交
  9. 09 6月, 2011 1 次提交
    • B
      powerpc: Force page alignment for initrd reserved memory · 307cfe71
      Benjamin Herrenschmidt 提交于
      When using 64K pages with a separate cpio rootfs, U-Boot will align
      the rootfs on a 4K page boundary. When the memory is reserved, and
      subsequent early memblock_alloc is called, it will allocate memory
      between the 64K page alignment and reserved memory. When the reserved
      memory is subsequently freed, it is done so by pages, causing the
      early memblock_alloc requests to be re-used, which in my case, caused
      the device-tree to be clobbered.
      
      This patch forces the reserved memory for initrd to be kernel page
      aligned, and will move the device tree if it overlaps with the range
      extension of initrd. This patch will also consolidate the identical
      function free_initrd_mem() from mm/init_32.c, init_64.c to mm/mem.c,
      and adds the same range extension when freeing initrd. free_initrd_mem()
      is also moved to the __init section.
      
      Many thanks to Milton Miller for his input on this patch.
      
      [BenH: Fixed build without CONFIG_BLK_DEV_INITRD]
      Signed-off-by: NDave Carroll <dcarroll@astekcorp.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      307cfe71
  10. 24 8月, 2010 1 次提交
  11. 05 8月, 2010 1 次提交
    • B
      memblock: Remove rmo_size, burry it in arch/powerpc where it belongs · cd3db0c4
      Benjamin Herrenschmidt 提交于
      The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
      server ppc64 though I hijack it on embedded ppc64 for similar purposes)
      and represents the area of memory that can be accessed in real mode
      (aka with MMU off), or on embedded, from the exception vectors (which
      is bolted in the TLB) which pretty much boils down to the same thing.
      
      We take that out of the generic MEMBLOCK data structure and move it into
      arch/powerpc where it belongs, renaming it to "RMA" while at it.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      cd3db0c4
  12. 14 7月, 2010 1 次提交
  13. 06 5月, 2010 1 次提交
    • M
      powerpc/mm: Track backing pages allocated by vmemmap_populate() · 91eea67c
      Mark Nelson 提交于
      We need to keep track of the backing pages that get allocated by
      vmemmap_populate() so that when we use kdump, the dump-capture kernel knows
      where these pages are.
      
      We use a simple linked list of structures that contain the physical address
      of the backing page and corresponding virtual address to track the backing
      pages.
      To save space, we just use a pointer to the next struct vmemmap_backing. We
      can also do this because we never remove nodes.  We call the pointer "list"
      to be compatible with changes made to the crash utility.
      
      vmemmap_populate() is called either at boot-time or on a memory hotplug
      operation. We don't have to worry about the boot-time calls because they
      will be inherently single-threaded, and for a memory hotplug operation
      vmemmap_populate() is called through:
      sparse_add_one_section()
                  |
                  V
      kmalloc_section_memmap()
                  |
                  V
      sparse_mem_map_populate()
                  |
                  V
      vmemmap_populate()
      and in sparse_add_one_section() we're protected by pgdat_resize_lock().
      So, we don't need a spinlock to protect the vmemmap_list.
      
      We allocate space for the vmemmap_backing structs by allocating whole pages
      in vmemmap_list_alloc() and then handing out chunks of this to
      vmemmap_list_populate().
      
      This means that we waste at most just under one page, but this keeps the code
      is simple.
      Signed-off-by: NMark Nelson <markn@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      91eea67c
  14. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  15. 30 10月, 2009 2 次提交
    • D
      powerpc/mm: Allow more flexible layouts for hugepage pagetables · a4fe3ce7
      David Gibson 提交于
      Currently each available hugepage size uses a slightly different
      pagetable layout: that is, the bottem level table of pointers to
      hugepages is a different size, and may branch off from the normal page
      tables at a different level.  Every hugepage aware path that needs to
      walk the pagetables must therefore look up the hugepage size from the
      slice info first, and work out the correct way to walk the pagetables
      accordingly.  Future hardware is likely to add more possible hugepage
      sizes, more layout options and more mess.
      
      This patch, therefore reworks the handling of hugepage pagetables to
      reduce this complexity.  In the new scheme, instead of having to
      consult the slice mask, pagetable walking code can check a flag in the
      PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
      and the entry also contains the information (eseentially hugepage
      shift) necessary to then interpret that table without recourse to the
      slice mask.  This scheme can be extended neatly to handle multiple
      levels of self-describing "special" hugepage pagetables, although for
      now we assume only one level exists.
      
      This approach means that only the pagetable allocation path needs to
      know how the pagetables should be set out.  All other (hugepage)
      pagetable walking paths can just interpret the structure as they go.
      
      There already was a flag bit in PGD/PUD/PMD entries for hugepage
      directory pointers, but it was only used for debug.  We alter that
      flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
      pointer (normally it would be 1 since the pointer lies in the linear
      mapping).  This means that asm pagetable walking can test for (and
      punt on) hugepage pointers with the same test that checks for
      unpopulated page directory entries (beq becomes bge), since hugepage
      pointers will always be positive, and normal pointers always negative.
      
      While we're at it, we get rid of the confusing (and grep defeating)
      #defining of hugepte_shift to be the same thing as mmu_huge_psizes.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a4fe3ce7
    • D
      powerpc/mm: Cleanup management of kmem_caches for pagetables · a0668cdc
      David Gibson 提交于
      Currently we have a fair bit of rather fiddly code to manage the
      various kmem_caches used to store page tables of various levels.  We
      generally have two caches holding some combination of PGD, PUD and PMD
      tables, plus several more for the special hugepage pagetables.
      
      This patch cleans this all up by taking a different approach.  Rather
      than the caches being designated as for PUDs or for hugeptes for 16M
      pages, the caches are simply allocated to be a specific size.  Thus
      sharing of caches between different types/levels of pagetables happens
      naturally.  The pagetable size, where needed, is passed around encoded
      in the same way as {PGD,PUD,PMD}_INDEX_SIZE; that is n where the
      pagetable contains 2^n pointers.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a0668cdc
  16. 23 9月, 2009 3 次提交
  17. 20 8月, 2009 1 次提交
  18. 09 6月, 2009 1 次提交
  19. 14 10月, 2008 1 次提交
  20. 11 8月, 2008 1 次提交
    • B
      powerpc/mm: Fix attribute confusion with htab_bolt_mapping() · bc033b63
      Benjamin Herrenschmidt 提交于
      The function htab_bolt_mapping() is used to create permanent
      mappings in the MMU hash table, for example, in order to create
      the linear mapping of vmemmap.  It's also used by early boot
      ioremap (before mem_init_done).
      
      However, the way ioremap uses it is incorrect as it passes it the
      protection flags in the "linux PTE" form while htab_bolt_mapping()
      expects them in the hash table format.  This is made more confusing by
      the fact that some of those flags are actually in the same position in
      both cases.
      
      This fixes it all by making htab_bolt_mapping() take normal linux
      protection flags instead, and use a little helper to convert them to
      htab flags. Callers can now use the usual PAGE_* definitions safely.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/include/asm/mmu-hash64.h |    2 -
       arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
       arch/powerpc/mm/init_64.c             |    9 +---
       3 files changed, 44 insertions(+), 32 deletions(-)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bc033b63
  21. 27 7月, 2008 1 次提交
  22. 25 7月, 2008 1 次提交
  23. 15 5月, 2008 1 次提交
    • B
      [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
      Benjamin Herrenschmidt 提交于
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pseries, or PS3's.
      
      In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
      tends to prevent hotplugging the HV's "additional" memory, thus limiting
      the available memory even more, from my experience down to something
      like 80M total, which makes it really not very useable.
      
      The logic used by my match to choose the vmemmap page size is:
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      one point, as I simplify the SLB miss handler, but that will be for a
      later patch.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cec08e7a
  24. 14 5月, 2008 1 次提交
  25. 24 4月, 2008 1 次提交
    • K
      [POWERPC] 85xx: Add support for relocatable kernel (and booting at non-zero) · 37dd2bad
      Kumar Gala 提交于
      Added support to allow an 85xx kernel to be run from a non-zero physical
      address (useful for cooperative asymmetric multiprocessing situations and
      kdump).  The support can be configured at compile time by setting
      CONFIG_PAGE_OFFSET, CONFIG_KERNEL_START, and CONFIG_PHYSICAL_START as
      desired.
      
      Alternatively, the kernel build can set CONFIG_RELOCATABLE.  Setting this
      config option causes the kernel to determine at runtime the physical
      addresses of CONFIG_PAGE_OFFSET and CONFIG_KERNEL_START.  If
      CONFIG_RELOCATABLE is set, then CONFIG_PHYSICAL_START has no meaning.
      However, CONFIG_PHYSICAL_START will always be used to set the LOAD program
      header physical address field in the resulting ELF image.
      
      Currently we are limited to running at a physical address that is a
      multiple of 256M.  This is due to how we map TLBs to cover
      lowmem.  This should be fixed to allow 64M or maybe even 16M alignment
      in the future.  It is considered an error to try and run a kernel at a
      non-aligned physical address.
      
      All the magic for this support is accomplished by proper initialization
      of the kernel memory subsystem and use of ARCH_PFN_OFFSET.
      
      The use of ARCH_PFN_OFFSET only affects normal memory and not IO mappings.
      ioremap uses map_page and isn't affected by ARCH_PFN_OFFSET.
      
      /dev/mem continues to allow access to any physical address in the system
      regardless of how CONFIG_PHYSICAL_START is set.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      37dd2bad
  26. 18 4月, 2008 1 次提交
  27. 17 4月, 2008 1 次提交
  28. 01 4月, 2008 1 次提交
  29. 14 2月, 2008 1 次提交
  30. 13 11月, 2007 1 次提交
  31. 17 10月, 2007 4 次提交
  32. 03 10月, 2007 1 次提交