1. 30 Oct 2009, 1 commit
    • powerpc/mm: Allow more flexible layouts for hugepage pagetables · a4fe3ce7
      Committed by David Gibson
      Currently each available hugepage size uses a slightly different
      pagetable layout: that is, the bottom level table of pointers to
      hugepages is a different size, and may branch off from the normal page
      tables at a different level.  Every hugepage aware path that needs to
      walk the pagetables must therefore look up the hugepage size from the
      slice info first, and work out the correct way to walk the pagetables
      accordingly.  Future hardware is likely to add more possible hugepage
      sizes, more layout options and more mess.
      
      This patch therefore reworks the handling of hugepage pagetables to
      reduce this complexity.  In the new scheme, instead of having to
      consult the slice mask, pagetable walking code can check a flag in the
      PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
      and the entry also contains the information (essentially hugepage
      shift) necessary to then interpret that table without recourse to the
      slice mask.  This scheme can be extended neatly to handle multiple
      levels of self-describing "special" hugepage pagetables, although for
      now we assume only one level exists.
      
      This approach means that only the pagetable allocation path needs to
      know how the pagetables should be set out.  All other (hugepage)
      pagetable walking paths can just interpret the structure as they go.
      
      There already was a flag bit in PGD/PUD/PMD entries for hugepage
      directory pointers, but it was only used for debug.  We alter that
      flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
      pointer (normally it would be 1 since the pointer lies in the linear
      mapping).  This means that asm pagetable walking can test for (and
      punt on) hugepage pointers with the same test that checks for
      unpopulated page directory entries (beq becomes bge), since hugepage
      pointers will always be positive, and normal pointers always negative.
      
      While we're at it, we get rid of the confusing (and grep defeating)
      #defining of hugepte_shift to be the same thing as mmu_huge_psizes.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
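The sign-bit scheme in the a4fe3ce7 message can be sketched in C. This is a hypothetical model rather than the kernel's code: it assumes 64-bit directory entries, linear-mapping pointers with the top bits set (0xc000000000000000), and a hugepage-directory encoding that clears those bits and keeps the hugepage shift in the low 6 bits of a 64-byte-aligned table address. The helper names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding, for illustration only. */
#define LINEAR_MAP_BITS 0xc000000000000000ULL /* set in every linear-mapping pointer */
#define HPD_SHIFT_MASK  0x3fULL               /* low bits carry the hugepage shift */

/* One signed compare covers both "empty" (0) and "hugepage" (> 0),
 * mirroring the beq -> bge change in the asm fast path. */
static bool needs_slow_path(uint64_t entry)
{
    return (int64_t)entry >= 0;
}

static bool is_hugepd(uint64_t entry)
{
    return (int64_t)entry > 0;
}

/* table_vaddr: a linear-mapping address aligned to 64 bytes, so the
 * low 6 bits are free to hold the shift. Clearing the top bits makes
 * the entry positive. */
static uint64_t make_hugepd(uint64_t table_vaddr, unsigned shift)
{
    return (table_vaddr & ~LINEAR_MAP_BITS) | (shift & HPD_SHIFT_MASK);
}

/* The entry describes itself: no slice-mask lookup is needed. */
static unsigned hugepd_shift(uint64_t entry)
{
    return (unsigned)(entry & HPD_SHIFT_MASK);
}

/* Recover the linear-mapping address of the hugepage table. */
static uint64_t hugepd_table(uint64_t entry)
{
    return (entry & ~HPD_SHIFT_MASK) | LINEAR_MAP_BITS;
}
```

A walker that finds `is_hugepd(entry)` true can take the page size from `hugepd_shift(entry)` directly, which is the point of making the entries self-describing.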
  2. 22 Apr 2009, 1 commit
  3. 24 Mar 2009, 2 commits
  4. 23 Feb 2009, 1 commit
  5. 22 Oct 2008, 1 commit
  6. 14 Oct 2008, 1 commit
  7. 16 Sep 2008, 1 commit
    • powerpc: Make the 64-bit kernel as a position-independent executable · 549e8152
      Committed by Paul Mackerras
      This implements CONFIG_RELOCATABLE for 64-bit by building the kernel
      as a position-independent executable (PIE) when it is set.  This involves
      processing the dynamic relocations in the image in the early stages of
      booting, even if the kernel is being run at the address it is linked at,
      since the linker does not necessarily fill in words in the image for
      which there are dynamic relocations.  (In fact the linker does fill in
      such words for 64-bit executables, though not for 32-bit executables,
      so in principle we could avoid calling relocate() entirely when we're
      running a 64-bit kernel at the linked address.)
      
      The dynamic relocations are processed by a new function relocate(addr),
      where the addr parameter is the virtual address where the image will be
      run.  In fact we call it twice: once before calling prom_init, and again
      when starting the main kernel.  This means that reloc_offset() returns
      0 in prom_init (since it has been relocated to the address it is running
      at), which necessitated a few adjustments.
      
      This also changes __va and __pa to use an equivalent definition that is
      simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
      constants (for 64-bit) whereas PHYSICAL_START is a variable (and
      KERNELBASE ideally should be too, but isn't yet).
      
      With this, relocatable kernels still copy themselves down to physical
      address 0 and run there.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
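The core of processing dynamic relocations, as described in the 549e8152 message, can be illustrated with the relocation type a PIE mostly relies on, R_PPC64_RELATIVE (type 22 in the ELF ABI), whose result is the run-time address plus the addend. This is a simplified C sketch, not the kernel's assembly `relocate()`: it assumes the image was linked at address 0, treats `image` as the in-memory copy, and skips every other relocation type. `struct rela64` is a stand-in with the same layout as `Elf64_Rela`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as Elf64_Rela. */
struct rela64 {
    uint64_t r_offset; /* link-time address of the word to patch */
    uint64_t r_info;   /* symbol index (high 32 bits), type (low 32 bits) */
    int64_t  r_addend; /* link-time value the word should hold */
};

#define R_PPC64_RELATIVE 22

/* Patch every RELATIVE word so it holds run_addr + addend.
 * Assumes the image was linked at 0; returns the number of words patched. */
static size_t apply_relative(uint8_t *image, uint64_t run_addr,
                             const struct rela64 *rela, size_t count)
{
    size_t done = 0;
    for (size_t i = 0; i < count; i++) {
        if ((uint32_t)rela[i].r_info != R_PPC64_RELATIVE)
            continue; /* other types are out of scope for this sketch */
        uint64_t value = run_addr + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof value);
        done++;
    }
    return done;
}
```

In this sketch the addend lives in the relocation table rather than in the patched word, so running the loop again with a different `run_addr` simply overwrites the earlier result; that property is what makes relocating more than once (as the message describes) straightforward.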
  8. 03 Sep 2008, 1 commit
    • powerpc: Only make kernel text pages of linear mapping executable · 9e88ba4e
      Committed by Paul Mackerras
      Commit bc033b63 ("powerpc/mm: Fix
      attribute confusion with htab_bolt_mapping()") moved the check for
      whether we should make pages of the linear mapping executable from
      htab_bolt_mapping into its callers, including htab_initialize.
      A side-effect of this is that the decision is now made once for
      each contiguous section in the LMB array rather than for each page
      individually.  This can often mean that the whole of the linear
      mapping ends up being executable.
      
      This reverts to the previous behaviour, where individual pages are
      checked for being part of the kernel text or not, by moving the check
      back down into htab_bolt_mapping.
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
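The difference between deciding once per contiguous LMB region and once per page, as discussed in the 9e88ba4e message, can be shown with a small model. The helpers `overlaps_text` and `count_exec_pages`, and passing the kernel text bounds as parameters, are illustrative assumptions, not the code in htab_bolt_mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [start, end) overlaps the kernel text [text_start, text_end). */
static bool overlaps_text(uint64_t start, uint64_t end,
                          uint64_t text_start, uint64_t text_end)
{
    return start < text_end && end > text_start;
}

/* Count the pages of [vstart, vend) that would be mapped executable.
 * per_page = false: one decision for the whole region (the regression).
 * per_page = true:  each page is checked individually (the fix). */
static uint64_t count_exec_pages(uint64_t vstart, uint64_t vend, uint64_t step,
                                 uint64_t text_start, uint64_t text_end,
                                 bool per_page)
{
    bool region_exec = overlaps_text(vstart, vend, text_start, text_end);
    uint64_t n = 0;
    for (uint64_t va = vstart; va < vend; va += step) {
        bool exec = per_page ? overlaps_text(va, va + step, text_start, text_end)
                             : region_exec;
        if (exec)
            n++;
    }
    return n;
}
```

With a 16MB region mapped in 64k pages and 1MB of kernel text at its start, the per-region decision marks all 256 pages executable, whereas the per-page check marks only the 16 that hold text.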
  9. 20 Aug 2008, 1 commit
  10. 11 Aug 2008, 1 commit
    • powerpc/mm: Fix attribute confusion with htab_bolt_mapping() · bc033b63
      Committed by Benjamin Herrenschmidt
      The function htab_bolt_mapping() is used to create permanent
      mappings in the MMU hash table, for example, in order to create
      the linear mapping of vmemmap.  It's also used by early boot
      ioremap (before mem_init_done).
      
      However, the way ioremap uses it is incorrect as it passes it the
      protection flags in the "linux PTE" form while htab_bolt_mapping()
      expects them in the hash table format.  This is made more confusing by
      the fact that some of those flags are actually in the same position in
      both cases.
      
      This fixes it all by making htab_bolt_mapping() take normal linux
      protection flags instead, and use a little helper to convert them to
      htab flags. Callers can now use the usual PAGE_* definitions safely.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/include/asm/mmu-hash64.h |    2 -
       arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
       arch/powerpc/mm/init_64.c             |    9 +---
       3 files changed, 44 insertions(+), 32 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  11. 25 Jul 2008, 2 commits
  12. 01 Jul 2008, 1 commit
    • powerpc: Only demote individual slices rather than whole process · 3a8247cc
      Committed by Paul Mackerras
      At present, if we have a kernel with a 64kB page size, and some
      process maps something that has to be mapped with 4kB pages (such as a
      cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
      pages), we change the process to use 4kB pages everywhere.  This hurts
      the performance of HPC programs that access eHCA from userspace.
      
      With this patch, the kernel will only demote the slice(s) containing
      the eHCA or cache-inhibited mappings, leaving the remaining slices
      able to use 64kB hardware pages.
      
      This also changes the slice_get_unmapped_area code so that it is
      willing to place a 64k-page mapping into (or across) a 4k-page slice
      if there is no better alternative, i.e. if the program specified
      MAP_FIXED or if there is not sufficient space available in slices that
      are either empty or already have 64k-page mappings in them.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
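Demoting only the affected slices, as in the 3a8247cc message, amounts to computing which slices a mapping touches and adding just those to the 4k set. The sketch assumes a slice geometry of sixteen 256MB low slices below 4GB and 1TB high slices above; the `slice_mask` struct and both helpers are hypothetical illustrations, not the kernel's definitions.

```c
#include <stdint.h>

#define SLICE_LOW_SHIFT  28            /* 256MB low slices (assumed) */
#define SLICE_LOW_TOP    (1ULL << 32)  /* low slices cover 0..4GB */
#define SLICE_HIGH_SHIFT 40            /* 1TB high slices (assumed) */

struct slice_mask {
    uint16_t low;   /* bit i set: low slice i is included */
    uint64_t high;  /* bit i set: high slice i is included */
};

/* Slices touched by [start, start + len); len must be non-zero. */
static struct slice_mask range_to_slices(uint64_t start, uint64_t len)
{
    struct slice_mask m = { 0, 0 };
    uint64_t last = start + len - 1;  /* inclusive end */

    if (start < SLICE_LOW_TOP) {
        uint64_t low_last = last < SLICE_LOW_TOP ? last : SLICE_LOW_TOP - 1;
        for (uint64_t i = start >> SLICE_LOW_SHIFT; i <= low_last >> SLICE_LOW_SHIFT; i++)
            m.low |= (uint16_t)(1u << i);
    }
    if (last >= SLICE_LOW_TOP) {
        uint64_t hi_first = start < SLICE_LOW_TOP ? SLICE_LOW_TOP : start;
        for (uint64_t i = hi_first >> SLICE_HIGH_SHIFT;
             i <= last >> SLICE_HIGH_SHIFT && i < 64; i++)
            m.high |= 1ULL << i;
    }
    return m;
}

/* Demote only what the mapping touches; other slices keep 64k pages. */
static void demote_range(struct slice_mask *psize_4k, uint64_t start, uint64_t len)
{
    struct slice_mask m = range_to_slices(start, len);
    psize_4k->low |= m.low;
    psize_4k->high |= m.high;
}
```

A 4k cache-inhibited mapping at 0x30000000 would then demote only low slice 3, leaving every other slice free to use 64kB hardware pages.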
  13. 15 May 2008, 1 commit
    • [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
      Committed by Benjamin Herrenschmidt
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pSeries, or PS3s.
      
      In fact, on the PS3, failure to allocate the 16M page backing vmemmap
      tends to prevent hotplugging the HV's "additional" memory, thus limiting
      the available memory even more, from my experience down to something
      like 80M total, which makes it really not very usable.
      
      The logic used by my patch to choose the vmemmap page size is:
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      some point, as I simplify the SLB miss handler, but that will be for a
      later patch.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
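The three-way rule in the cec08e7a message can be written as a small pure function. The parameters `have_16m`, `have_64k` and `boot_ram_bytes` are hypothetical stand-ins for the boot-time MMU feature probe and the amount of RAM found at boot; the function returns log2 of the chosen page size.

```c
#include <stdbool.h>
#include <stdint.h>

/* log2 of the vmemmap page size, following the rules listed above. */
static unsigned vmemmap_page_shift(bool have_16m, bool have_64k,
                                   uint64_t boot_ram_bytes)
{
    if (have_16m && boot_ram_bytes >= (1ULL << 30))
        return 24;  /* 16M pages: only with 1G or more RAM */
    if (have_64k)
        return 16;  /* 64K pages */
    return 12;      /* 4K pages */
}
```

Note that a machine with 16M page support but less than 1G of RAM and no 64K support falls through to 4K pages.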
  14. 14 May 2008, 2 commits
  15. 07 Apr 2008, 1 commit
  16. 01 Apr 2008, 1 commit
  17. 24 Mar 2008, 1 commit
    • [POWERPC] Don't use 64k pages for ioremap on pSeries · cfe666b1
      Committed by Paul Mackerras
      On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
      adapter using 64k pages, and thus the ehea driver will fail if 64k
      pages are configured.  This works around the problem by always
      using 4k pages for ioremap on pSeries (but not on other platforms).
      A better fix would be to check whether the partition could ever
      have an eHEA adapter, and only force 4k pages if it could, but this
      will do for 2.6.25.
      
      This is based on an earlier patch by Tony Breeds.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  18. 13 Mar 2008, 1 commit
    • [POWERPC] Fix large hash table allocation on Cell blades · 31bf1119
      Committed by Michael Ellerman
      My recent hack to allocate the hash table under 1GB on cell was poorly
      tested, *cough*. It turns out on blades with large amounts of memory we
      fail to allocate the hash table at all. This is because RTAS has been
      instantiated just below 768MB, and 0-x MB are used by the kernel,
      leaving no areas that are both large enough and also naturally-aligned.
      
      For the cell IOMMU hack the page tables must be under 2GB, so use that
      as the limit instead. This has been tested on real hardware and boots
      happily.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
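Why the 1GB limit in the 31bf1119 message failed can be reproduced with a toy allocator that searches top-down for a naturally aligned block (aligned to its own size) below a limit while avoiding reserved ranges. This is a hypothetical model, not the LMB allocator, and the sizes in the usage example are invented to resemble the description (kernel at the bottom, RTAS just below 768MB).

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t base;
    uint64_t size;
};

/* Highest naturally aligned block of `size` bytes (a power of two) that
 * lies entirely below `limit` and overlaps no reserved region.
 * Returns UINT64_MAX if there is none. */
static uint64_t alloc_aligned_below(uint64_t size, uint64_t limit,
                                    const struct region *rsv, size_t nrsv)
{
    if (size == 0 || limit < size)
        return UINT64_MAX;
    for (uint64_t base = (limit - size) & ~(size - 1);; base -= size) {
        int clash = 0;
        for (size_t i = 0; i < nrsv; i++) {
            if (base < rsv[i].base + rsv[i].size && rsv[i].base < base + size) {
                clash = 1;
                break;
            }
        }
        if (!clash)
            return base;
        if (base == 0)
            return UINT64_MAX; /* every aligned candidate collided */
    }
}
```

With a 512MB table, reservations at 0-64MB and 760-768MB, and a 1GB limit, both aligned candidates (0 and 512MB) collide and the search fails; raising the limit to 2GB exposes a free block at 1536MB.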
  19. 26 Feb 2008, 1 commit
  20. 14 Feb 2008, 1 commit
  21. 31 Jan 2008, 1 commit
  22. 24 Jan 2008, 1 commit
    • [POWERPC] Provide a way to protect 4k subpages when using 64k pages · fa28237c
      Committed by Paul Mackerras
      Using 64k pages on 64-bit PowerPC systems makes life difficult for
      emulators that are trying to emulate an ISA, such as x86, which uses a
      smaller page size, since the emulator can no longer use the MMU and
      the normal system calls for controlling page protections.  Of course,
      the emulator can emulate the MMU by checking and possibly remapping
      the address for each memory access in software, but that is pretty
      slow.
      
      This provides a facility for such programs to control the access
      permissions on individual 4k sub-pages of 64k pages.  The idea is
      that the emulator supplies an array of protection masks to apply to a
      specified range of virtual addresses.  These masks are applied at the
      level where hardware PTEs are inserted into the hardware page table
      based on the Linux PTEs, so the Linux PTEs are not affected.  Note
      that this new mechanism does not allow any access that would otherwise
      be prohibited; it can only prohibit accesses that would otherwise be
      allowed.  This new facility is only available on 64-bit PowerPC and
      only when the kernel is configured for 64k pages.
      
      The masks are supplied using a new subpage_prot system call, which
      takes a starting virtual address and length, and a pointer to an array
      of protection masks in memory.  The array has a 32-bit word per 64k
      page to be protected; each 32-bit word consists of 16 2-bit fields,
      for which 0 allows any access (that is otherwise allowed), 1 prevents
      write accesses, and 2 or 3 prevent any access.
      
      Implicit in this is that the regions of the address space that are
      protected are switched to use 4k hardware pages rather than 64k
      hardware pages (on machines with hardware 64k page support).  In fact
      the whole process is switched to use 4k hardware pages when the
      subpage_prot system call is used, but this could be improved in future
      to switch only the affected segments.
      
      The subpage protection bits are stored in a 3 level tree akin to the
      page table tree.  The top level of this tree is stored in a structure
      that is appended to the top level of the page table tree, i.e., the
      pgd array.  Since it will often only be 32-bit addresses (below 4GB)
      that are protected, the pointers to the first four bottom level pages
      are also stored in this structure (each bottom level page contains the
      protection bits for 1GB of address space), so the protection bits for
      addresses below 4GB can be accessed with one fewer load than those
      for higher addresses.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
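The mask format and the lookup tree described in the fa28237c message can be sketched together in C. The geometry follows the arithmetic in the message (a 64k bottom-level page holds 16384 32-bit masks, one per 64k page, so it covers 1GB), but the struct layout, the top-level size, the upper-level index split and the choice that subpage 0 occupies the two most significant bits are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SP_PAGE_SHIFT 16                           /* 64k base pages */
#define SP_L1_BITS    (SP_PAGE_SHIFT - 2)          /* 14: u32 masks per page */
#define SP_L1_SHIFT   (SP_PAGE_SHIFT + SP_L1_BITS) /* 30: 1GB per bottom page */
#define SP_L2_BITS    (SP_PAGE_SHIFT - 3)          /* 13: pointers per page */
#define SP_L2_SHIFT   (SP_L1_SHIFT + SP_L2_BITS)   /* 43 */
#define SP_TOP_COUNT  16                           /* arbitrary for the sketch */

struct subpage_prot_table {
    uint32_t **protptrs[SP_TOP_COUNT]; /* top level of the tree */
    uint32_t *low_prot[4];             /* bottom pages for 0..4GB */
};

/* Mask word for the 64k page containing addr; a missing table gives 0,
 * which means "no extra restriction". */
static uint32_t sp_lookup(const struct subpage_prot_table *spt, uint64_t addr)
{
    uint32_t *bottom;

    if (addr < (4ULL << 30)) {
        bottom = spt->low_prot[addr >> SP_L1_SHIFT]; /* one load skipped */
    } else {
        uint64_t top = addr >> SP_L2_SHIFT;
        if (top >= SP_TOP_COUNT || !spt->protptrs[top])
            return 0;
        bottom = spt->protptrs[top][(addr >> SP_L1_SHIFT) & ((1u << SP_L2_BITS) - 1)];
    }
    if (!bottom)
        return 0;
    return bottom[(addr >> SP_PAGE_SHIFT) & ((1u << SP_L1_BITS) - 1)];
}

/* 2-bit field for the 4k subpage holding addr (subpage 0 = top bits). */
static unsigned sp_field(uint32_t mask, uint64_t addr)
{
    unsigned idx = (unsigned)((addr >> 12) & 0xf);
    return (mask >> (30 - 2 * idx)) & 0x3;
}

/* The mask only removes access: 0 = no change, 1 = no writes,
 * 2 or 3 = no access. It never grants what the PTE denies. */
static bool sp_permits(const struct subpage_prot_table *spt, uint64_t addr,
                       bool is_write, bool pte_permits)
{
    if (!pte_permits)
        return false;
    switch (sp_field(sp_lookup(spt, addr), addr)) {
    case 0:  return true;
    case 1:  return !is_write;
    default: return false;
    }
}
```

Addresses below 4GB reach their bottom-level page straight from `low_prot`, while higher addresses go through the extra `protptrs` level, which is the one-load saving the message mentions.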
  23. 17 Jan 2008, 1 commit
  24. 11 Dec 2007, 1 commit
  25. 08 Nov 2007, 1 commit
  26. 29 Oct 2007, 1 commit
    • [POWERPC] powerpc: Fix demotion of segments to 4K pages · f6ab0b92
      Committed by Benjamin Herrenschmidt
      When demoting a process to use 4K HW pages (instead of 64K), which
      happens under various circumstances such as doing cache inhibited
      mappings on machines that do not support 64K CI pages, the assembly
      hash code calls back into the C function flush_hash_page().  This
      function prototype was recently changed to accommodate 1T segments
      but the assembly call site was not updated, causing applications that
      do demotion to hang.  In addition, when updating the per-CPU PACA for
      the new sizes, we didn't properly update the slice "map", thus causing
      the SLB miss code to re-insert segments for the wrong size.
      
      This fixes both and adds a warning comment next to the C
      implementation to try to avoid problems next time someone changes it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  27. 17 Oct 2007, 2 commits
  28. 12 Oct 2007, 1 commit
    • [POWERPC] Use 1TB segments · 1189be65
      Committed by Paul Mackerras
      This makes the kernel use 1TB segments for all kernel mappings and for
      user addresses of 1TB and above, on machines which support them
      (currently POWER5+, POWER6 and PA6T).
      
      We detect that the machine supports 1TB segments by looking at the
      ibm,processor-segment-sizes property in the device tree.
      
      We don't currently use 1TB segments for user addresses < 1TB, since
      that would effectively prevent 32-bit processes from using huge pages
      unless we also had a way to revert to using 256MB segments.  That
      would be possible but would involve extra complications (such as
      keeping track of which segment size was used when HPTEs were inserted)
      and is not addressed here.
      
      Parts of this patch were originally written by Ben Herrenschmidt.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
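The segment-size policy in the 1189be65 message can be expressed as a small decision function. `cpu_has_1t` stands in for the result of checking the ibm,processor-segment-sizes property, and the constant names are illustrative; 28 and 40 are log2 of 256MB and 1TB.

```c
#include <stdbool.h>
#include <stdint.h>

#define SEG_SHIFT_256M 28
#define SEG_SHIFT_1T   40

/* Kernel addresses and user addresses >= 1TB use 1TB segments when the
 * CPU supports them; user addresses below 1TB stay on 256MB segments. */
static unsigned segment_shift(uint64_t ea, bool is_kernel, bool cpu_has_1t)
{
    if (!cpu_has_1t)
        return SEG_SHIFT_256M;
    if (is_kernel || ea >= (1ULL << SEG_SHIFT_1T))
        return SEG_SHIFT_1T;
    return SEG_SHIFT_256M;
}

/* Effective segment ID: the address bits above the segment offset. */
static uint64_t esid_of(uint64_t ea, unsigned shift)
{
    return ea >> shift;
}
```

Keeping 256MB segments below 1TB means a 32-bit process's address space is still split into small segments, which is what lets it keep using huge pages as the message explains.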
  29. 17 Aug 2007, 3 commits
  30. 03 Aug 2007, 1 commit
    • [POWERPC] Fixes for the SLB shadow buffer code · 67439b76
      Committed by Michael Neuling
      On a machine with hardware 64kB pages and a kernel configured for a
      64kB base page size, we need to change the vmalloc segment from 64kB
      pages to 4kB pages if some driver creates a non-cacheable mapping in
      the vmalloc area.  However, we never updated the SLB shadow buffer.
      This fixes it.  Thanks to paulus for finding this.
      
      Also added some write barriers to ensure the shadow buffer contents
      are always consistent.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
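The ordering that write barriers give a shadow-buffer update, as mentioned in the 67439b76 message, can be sketched with C11 atomics: invalidate the entry, write the new VSID, then publish the ESID last. The struct layout and the rule that a zero ESID means "invalid" are assumptions for this sketch; kernel code would use its own barrier primitives rather than `atomic_thread_fence`.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shadow-buffer entry; valid only while esid is non-zero. */
struct slb_shadow_entry {
    _Atomic uint64_t esid;
    _Atomic uint64_t vsid;
};

/* Each release fence orders the stores before it ahead of the stores
 * after it, so a reader using acquire loads that sees a later step
 * also sees the earlier ones (e.g. a new esid implies the new vsid). */
static void shadow_update(struct slb_shadow_entry *e,
                          uint64_t esid_data, uint64_t vsid_data)
{
    /* 1: invalidate first so a half-updated entry is never valid */
    atomic_store_explicit(&e->esid, 0, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    /* 2: install the new VSID */
    atomic_store_explicit(&e->vsid, vsid_data, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    /* 3: publish the ESID, making the entry valid again */
    atomic_store_explicit(&e->esid, esid_data, memory_order_relaxed);
}
```

Without the fences, the stores could become visible out of order, so an observer could pair a valid ESID with a stale VSID.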
  31. 22 Jul 2007, 1 commit
  32. 14 Jun 2007, 1 commit
  33. 17 May 2007, 1 commit
  34. 09 May 2007, 1 commit