  1. 25 Jul, 2008 1 commit
  2. 01 Jul, 2008 1 commit
    • powerpc: Only demote individual slices rather than whole process · 3a8247cc
      Committed by Paul Mackerras
      At present, if we have a kernel with a 64kB page size, and some
      process maps something that has to be mapped with 4kB pages (such as a
      cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair
      pages), we change the process to use 4kB pages everywhere.  This hurts
      the performance of HPC programs that access eHCA from userspace.
      
      With this patch, the kernel will only demote the slice(s) containing
      the eHCA or cache-inhibited mappings, leaving the remaining slices
      able to use 64kB hardware pages.
      
      This also changes the slice_get_unmapped_area code so that it is
      willing to place a 64k-page mapping into (or across) a 4k-page slice
      if there is no better alternative, i.e. if the program specified
      MAP_FIXED or if there is not sufficient space available in slices that
      are either empty or already have 64k-page mappings in them.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
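The per-slice demotion described in this commit can be modelled roughly as follows. This is only a sketch, not the kernel's code: the `slice_psizes` table, `slices_init_64k` and `demote_range_to_4k` are hypothetical names, and the slice layout (sixteen 256MB slices below 4GB, sixteen 1TB slices above, with high slice 0 starting at 4GB) is taken from the "slices" commit further down this history.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_LOW_SLICES  16   /* 256MB each, covering 0..4GB */
#define NUM_HIGH_SLICES 16   /* 1TB each; high slice 0 is 4GB..1TB */

enum psize { PSIZE_4K, PSIZE_64K };

/* Hypothetical per-process table: one hardware page size per slice. */
struct slice_psizes {
    enum psize low[NUM_LOW_SLICES];
    enum psize high[NUM_HIGH_SLICES];
};

/* Start every slice on 64kB pages, as on a 64kB-page kernel. */
void slices_init_64k(struct slice_psizes *sp)
{
    for (int i = 0; i < NUM_LOW_SLICES; i++)
        sp->low[i] = PSIZE_64K;
    for (int i = 0; i < NUM_HIGH_SLICES; i++)
        sp->high[i] = PSIZE_64K;
}

/* True if [a_start, a_end] and [b_start, b_end] share at least one byte. */
static int overlaps(uint64_t a_start, uint64_t a_end,
                    uint64_t b_start, uint64_t b_end)
{
    return a_start <= b_end && b_start <= a_end;
}

/* Demote only the slices touched by the mapping [start, start + len). */
void demote_range_to_4k(struct slice_psizes *sp, uint64_t start, uint64_t len)
{
    assert(len > 0);
    uint64_t end = start + len - 1;

    for (uint64_t i = 0; i < NUM_LOW_SLICES; i++) {
        uint64_t s = i << 28;
        if (overlaps(start, end, s, s + (1ULL << 28) - 1))
            sp->low[i] = PSIZE_4K;
    }
    for (uint64_t i = 0; i < NUM_HIGH_SLICES; i++) {
        /* High slice 0 starts at 4GB because the low slices own 0..4GB. */
        uint64_t s = (i == 0) ? (1ULL << 32) : (i << 40);
        uint64_t e = ((i + 1) << 40) - 1;
        if (overlaps(start, end, s, e))
            sp->high[i] = PSIZE_4K;
    }
}
```

In this model a single 4kB cache-inhibited mapping at 0x10000000 demotes low slice 1 only; every other slice keeps its 64kB page size.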
  3. 15 May, 2008 1 commit
    • [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
      Committed by Benjamin Herrenschmidt
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pseries, or PS3's.
      
      In fact, on the PS3, failure to allocate the 16M page backing vmemmap
      tends to prevent hotplugging the HV's "additional" memory, thus limiting
      the available memory even more, from my experience down to something
      like 80M total, which makes it really not very useable.
      
      The logic used by my patch to choose the vmemmap page size is:
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      one point, as I simplify the SLB miss handler, but that will be for a
      later patch.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
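The three-step rule this commit gives for picking the vmemmap page size can be written as a small decision function. This is a sketch under stated assumptions: `choose_vmemmap_psize`, `vmemmap_page_bytes` and the enum are illustrative names rather than kernel symbols, and "1G" is read as 2^30 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum vmemmap_psize { VMEMMAP_4K, VMEMMAP_64K, VMEMMAP_16M };

/* Apply the rule from the commit message, in order of preference. */
enum vmemmap_psize choose_vmemmap_psize(bool have_16m, bool have_64k,
                                        uint64_t ram_bytes)
{
    const uint64_t one_gib = 1ULL << 30;  /* assumption: "1G" means 2^30 bytes */

    if (have_16m && ram_bytes >= one_gib)
        return VMEMMAP_16M;
    if (have_64k)
        return VMEMMAP_64K;
    return VMEMMAP_4K;
}

/* Size in bytes of the chosen page size. */
uint64_t vmemmap_page_bytes(enum vmemmap_psize p)
{
    assert(p == VMEMMAP_4K || p == VMEMMAP_64K || p == VMEMMAP_16M);
    switch (p) {
    case VMEMMAP_16M: return 16ULL << 20;
    case VMEMMAP_64K: return 64ULL << 10;
    default:          return 4ULL << 10;
    }
}
```

A machine with 16M pages but less than 1G of RAM at boot falls through to 64K pages if it has them, which is the small-partition and PS3 case the commit is concerned with.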
  4. 14 May, 2008 2 commits
  5. 07 Apr, 2008 1 commit
  6. 01 Apr, 2008 1 commit
  7. 24 Mar, 2008 1 commit
    • [POWERPC] Don't use 64k pages for ioremap on pSeries · cfe666b1
      Committed by Paul Mackerras
      On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
      adapter using 64k pages, and thus the ehea driver will fail if 64k
      pages are configured.  This works around the problem by always
      using 4k pages for ioremap on pSeries (but not on other platforms).
      A better fix would be to check whether the partition could ever
      have an eHEA adapter, and only force 4k pages if it could, but this
      will do for 2.6.25.
      
      This is based on an earlier patch by Tony Breeds.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  8. 13 Mar, 2008 1 commit
    • [POWERPC] Fix large hash table allocation on Cell blades · 31bf1119
      Committed by Michael Ellerman
      My recent hack to allocate the hash table under 1GB on cell was poorly
      tested, *cough*. It turns out on blades with large amounts of memory we
      fail to allocate the hash table at all. This is because RTAS has been
      instantiated just below 768MB, and 0-x MB are used by the kernel,
      leaving no areas that are both large enough and also naturally-aligned.
      
      For the cell IOMMU hack the page tables must be under 2GB, so use that
      as the limit instead. This has been tested on real hardware and boots
      happily.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  9. 26 Feb, 2008 1 commit
  10. 14 Feb, 2008 1 commit
  11. 31 Jan, 2008 1 commit
  12. 24 Jan, 2008 1 commit
    • [POWERPC] Provide a way to protect 4k subpages when using 64k pages · fa28237c
      Committed by Paul Mackerras
      Using 64k pages on 64-bit PowerPC systems makes life difficult for
      emulators that are trying to emulate an ISA, such as x86, which uses a
      smaller page size, since the emulator can no longer use the MMU and
      the normal system calls for controlling page protections.  Of course,
      the emulator can emulate the MMU by checking and possibly remapping
      the address for each memory access in software, but that is pretty
      slow.
      
      This provides a facility for such programs to control the access
      permissions on individual 4k sub-pages of 64k pages.  The idea is
      that the emulator supplies an array of protection masks to apply to a
      specified range of virtual addresses.  These masks are applied at the
      level where hardware PTEs are inserted into the hardware page table
      based on the Linux PTEs, so the Linux PTEs are not affected.  Note
      that this new mechanism does not allow any access that would otherwise
      be prohibited; it can only prohibit accesses that would otherwise be
      allowed.  This new facility is only available on 64-bit PowerPC and
      only when the kernel is configured for 64k pages.
      
      The masks are supplied using a new subpage_prot system call, which
      takes a starting virtual address and length, and a pointer to an array
      of protection masks in memory.  The array has a 32-bit word per 64k
      page to be protected; each 32-bit word consists of 16 2-bit fields,
      for which 0 allows any access (that is otherwise allowed), 1 prevents
      write accesses, and 2 or 3 prevent any access.
      
      Implicit in this is that the regions of the address space that are
      protected are switched to use 4k hardware pages rather than 64k
      hardware pages (on machines with hardware 64k page support).  In fact
      the whole process is switched to use 4k hardware pages when the
      subpage_prot system call is used, but this could be improved in future
      to switch only the affected segments.
      
      The subpage protection bits are stored in a 3 level tree akin to the
      page table tree.  The top level of this tree is stored in a structure
      that is appended to the top level of the page table tree, i.e., the
      pgd array.  Since it will often only be 32-bit addresses (below 4GB)
      that are protected, the pointers to the first four bottom level pages
      are also stored in this structure (each bottom level page contains the
      protection bits for 1GB of address space), so the protection bits for
      addresses below 4GB can be accessed with one fewer load than those
      for higher addresses.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
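The mask layout described in this commit (one 32-bit word per 64k page, sixteen 2-bit fields, values 0, 1, and 2 or 3) can be sketched as below. The helper names are hypothetical, and one detail is an assumption not stated in the commit message: the field for the first 4k subpage is placed in the most significant two bits of the word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPP_ALLOW      0u   /* any access that is otherwise allowed */
#define SPP_NO_WRITE   1u   /* write accesses prevented */
#define SPP_NO_ACCESS  2u   /* 2 or 3: all accesses prevented */

#define SUBPAGES_PER_PAGE 16u

/* ASSUMPTION: subpage 0 lives in the top two bits of the word. */
static unsigned field_shift(unsigned subpage)
{
    assert(subpage < SUBPAGES_PER_PAGE);
    return 30u - 2u * subpage;
}

/* Return `word` with the 2-bit field for `subpage` set to `prot`. */
uint32_t spp_set(uint32_t word, unsigned subpage, unsigned prot)
{
    unsigned shift = field_shift(subpage);
    return (word & ~(UINT32_C(3) << shift)) | ((uint32_t)(prot & 3u) << shift);
}

/* Extract the 2-bit field for `subpage`. */
unsigned spp_get(uint32_t word, unsigned subpage)
{
    return (word >> field_shift(subpage)) & 3u;
}

/* Would the mask alone permit this access? It can only restrict. */
bool spp_permits(uint32_t word, unsigned subpage, bool is_write)
{
    unsigned prot = spp_get(word, subpage);
    if (prot >= SPP_NO_ACCESS)
        return false;
    if (prot == SPP_NO_WRITE)
        return !is_write;
    return true;
}

/* Address space covered by one bottom-level page of 32-bit mask words. */
uint64_t spp_bytes_covered(uint64_t page_bytes)
{
    return (page_bytes / sizeof(uint32_t)) * 65536ULL;
}
```

With 64k pages, one bottom-level page holds 16384 mask words, and 16384 × 64k = 1GB, which matches the figure quoted for the tree's bottom level.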
  13. 17 Jan, 2008 1 commit
  14. 11 Dec, 2007 1 commit
  15. 08 Nov, 2007 1 commit
  16. 29 Oct, 2007 1 commit
    • [POWERPC] powerpc: Fix demotion of segments to 4K pages · f6ab0b92
      Committed by Benjamin Herrenschmidt
      When demoting a process to use 4K HW pages (instead of 64K), which
      happens under various circumstances such as doing cache inhibited
      mappings on machines that do not support 64K CI pages, the assembly
      hash code calls back into the C function flush_hash_page().  This
      function prototype was recently changed to accommodate 1T segments
      but the assembly call site was not updated, causing applications that
      do demotion to hang.  In addition, when updating the per-CPU PACA for
      the new sizes, we didn't properly update the slice "map", thus causing
      the SLB miss code to re-insert segments for the wrong size.
      
      This fixes both and adds a warning comment next to the C
      implementation to try to avoid problems next time someone changes it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  17. 17 Oct, 2007 2 commits
  18. 12 Oct, 2007 1 commit
    • [POWERPC] Use 1TB segments · 1189be65
      Committed by Paul Mackerras
      This makes the kernel use 1TB segments for all kernel mappings and for
      user addresses of 1TB and above, on machines which support them
      (currently POWER5+, POWER6 and PA6T).
      
      We detect that the machine supports 1TB segments by looking at the
      ibm,processor-segment-sizes property in the device tree.
      
      We don't currently use 1TB segments for user addresses < 1T, since
      that would effectively prevent 32-bit processes from using huge pages
      unless we also had a way to revert to using 256MB segments.  That
      would be possible but would involve extra complications (such as
      keeping track of which segment size was used when HPTEs were inserted)
      and is not addressed here.
      
      Parts of this patch were originally written by Ben Herrenschmidt.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
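The segment-size policy in this commit reduces to two small predicates. The sketch below only restates the rule from the message; the function and enum names are invented for illustration, and `cpu_has_1t` stands in for whatever the kernel derives from the ibm,processor-segment-sizes property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum seg_size { SEG_256M, SEG_1T };

#define ONE_TB (1ULL << 40)

/* Kernel mappings: 1TB segments whenever the machine supports them. */
enum seg_size kernel_segment_size(bool cpu_has_1t)
{
    return cpu_has_1t ? SEG_1T : SEG_256M;
}

/* User mappings: 1TB segments only for addresses of 1TB and above. */
enum seg_size user_segment_size(uint64_t ea, bool cpu_has_1t)
{
    return (cpu_has_1t && ea >= ONE_TB) ? SEG_1T : SEG_256M;
}

/* Size in bytes of a segment of the given kind. */
uint64_t segment_bytes(enum seg_size s)
{
    assert(s == SEG_256M || s == SEG_1T);
    return s == SEG_1T ? ONE_TB : (1ULL << 28);
}
```

Keeping user addresses below 1TB on 256MB segments is what lets 32-bit processes continue to use huge pages, as the message explains.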
  19. 17 Aug, 2007 3 commits
  20. 03 Aug, 2007 1 commit
    • [POWERPC] Fixes for the SLB shadow buffer code · 67439b76
      Committed by Michael Neuling
      On a machine with hardware 64kB pages and a kernel configured for a
      64kB base page size, we need to change the vmalloc segment from 64kB
      pages to 4kB pages if some driver creates a non-cacheable mapping in
      the vmalloc area.  However, we never updated the SLB shadow buffer.
      This fixes it.  Thanks to paulus for finding this.
      
      Also added some write barriers to ensure the shadow buffer contents
      are always consistent.
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  21. 22 Jul, 2007 1 commit
  22. 14 Jun, 2007 1 commit
  23. 17 May, 2007 1 commit
  24. 09 May, 2007 3 commits
    • [POWERPC] Add ability to 4K kernel to hash in 64K pages · 16c2d476
      Committed by Benjamin Herrenschmidt
      This adds the ability for a kernel compiled with 4K page size
      to have special slices containing 64K pages and hash the right type
      of hash PTEs.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Introduce address space "slices" · d0f13e3c
      Committed by Benjamin Herrenschmidt
      The basic issue is to be able to do what hugetlbfs does but with
      different page sizes for some other special filesystems; more
      specifically, my need is:
      
       - Huge pages
      
       - SPE local store mappings using 64K pages on a 4K base page size
      kernel on Cell
      
       - Some special 4K segments in 64K-page kernels for mapping a dodgy
      type of powerpc-specific infiniband hardware that requires 4K MMU
      mappings for various reasons I won't explain here.
      
      The main issues are:
      
       - To maintain/keep track of the page size per "segment" (as we can
      only have one page size per segment on powerpc, which are 256MB
      divisions of the address space).
      
       - To make sure special mappings stay within their allotted
      "segments" (including MAP_FIXED crap)
      
       - To make sure everybody else doesn't mmap/brk/grow_stack into a
      "segment" that is used for a special mapping
      
      Some of the necessary mechanisms to handle that were present in the
      hugetlbfs code, but mostly in ways not suitable for anything else.
      
      The patch relies on some changes to the generic get_unmapped_area()
      that just got merged.  It still hijacks hugetlb callbacks here or
      there as the generic code hasn't been entirely cleaned up yet but
      that shouldn't be a problem.
      
      So what is a slice ?  Well, I re-used the mechanism used formerly by our
      hugetlbfs implementation which divides the address space in
      "meta-segments" which I called "slices".  The division is done using
      256MB slices below 4G, and 1T slices above.  Thus the address space is
      divided currently into 16 "low" slices and 16 "high" slices.  (Special
      case: high slice 0 is the area between 4G and 1T).
      
      Doing so simplifies significantly the tracking of segments and avoids
      having to keep track of all the 256MB segments in the address space.
      
      While I used the "concepts" of hugetlbfs, I mostly re-implemented
      everything in a more generic way and "ported" hugetlbfs to it.
      
      Slices can have an associated page size, which is encoded in the mmu
      context and used by the SLB miss handler to set the segment sizes.  The
      hash code currently doesn't care, it has a specific check for hugepages,
      though I might add a mechanism to provide per-slice hash mapping
      functions in the future.
      
      The slice code provides a pair of "generic" get_unmapped_area() (bottomup
      and topdown) functions that should work with any slice size.  There is
      some trickiness here so I would appreciate people to have a look at the
      implementation of these and let me know if I got something wrong.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
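The slice layout described in this commit (256MB slices below 4G, 1T slices above, with high slice 0 covering only 4G to 1T) maps an address to a slice with two shifts. The following is a sketch with hypothetical names (`slice_id`, `slice_of`); it assumes the 16 + 16 slice layout from the message, so addresses must lie below 16T.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOW_SLICE_SHIFT   28              /* 256MB low slices */
#define HIGH_SLICE_SHIFT  40              /* 1TB high slices */
#define LOW_SLICE_LIMIT   (1ULL << 32)    /* low slices cover 0..4GB */
#define ADDR_SPACE_LIMIT  (16ULL << 40)   /* 16 high slices of 1TB each */

struct slice_id {
    bool     high;    /* false: one of the 16 low slices */
    unsigned index;   /* 0..15 within its group */
};

/* Map an address to the slice that contains it. */
struct slice_id slice_of(uint64_t addr)
{
    struct slice_id s;

    assert(addr < ADDR_SPACE_LIMIT);
    if (addr < LOW_SLICE_LIMIT) {
        s.high = false;
        s.index = (unsigned)(addr >> LOW_SLICE_SHIFT);
    } else {
        /* Anything from 4GB up to 1TB shifts to 0: high slice 0. */
        s.high = true;
        s.index = (unsigned)(addr >> HIGH_SLICE_SHIFT);
    }
    return s;
}
```

Tracking 32 slices instead of every 256MB segment is the simplification the message refers to: a 16T address space would otherwise contain 65536 segments.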
    • [POWERPC] Small fixes & cleanups in segment page size demotion · 16f1c746
      Committed by Benjamin Herrenschmidt
      The code for demoting segments to 4K had some issues, like for example,
      when using _PAGE_4K_PFN flag, the first CPU to hit it would do the
      demotion, but other CPUs hitting the same page wouldn't properly flush
      their SLBs if mmu_ci_restrictions isn't set.  There are also potential
      issues with hash_preload not handling _PAGE_4K_PFN.  All of these are
      non issues on current hardware but might bite us in the future.
      
      This patch thus fixes it by:
      
       - Taking the test comparing the mm and current CPU context page
      sizes to decide to flush SLBs out of the mmu_ci_restrictions test
      since that can also be triggered by _PAGE_4K_PFN pages
      
       - Due to the above being done all the time, demote_segment_4k
      doesn't need to update the context and flush the SLB
      
       - demote_segment_4k can be static and doesn't need an EXPORT_SYMBOL
      
       - Making hash_preload ignore anything that has either _PAGE_4K_PFN
      or _PAGE_NO_CACHE set, thus avoiding duplication of the complicated
      logic in hash_page() (and possibly making hash_preload a little bit
      faster for the normal case).
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  25. 02 May, 2007 1 commit
    • [POWERPC] Initialise spinlock in the DEBUG_PAGEALLOC code · ed166692
      Committed by Michael Ellerman
      Fixes:
      
      BUG: spinlock bad magic on CPU#0, swapper/0
       lock: c00000000064ec30, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      Call Trace:
      [c00000000062b980] [c00000000000f920] .show_stack+0x6c/0x1a0 (unreliable)
      [c00000000062ba20] [c0000000001c2b40] .spin_bug+0xb0/0xd4
      [c00000000062bab0] [c0000000001c2ed0] ._raw_spin_lock+0x44/0x184
      [c00000000062bb50] [c0000000003a42b4] ._spin_lock+0x10/0x24
      [c00000000062bbd0] [c00000000002b4dc] .kernel_map_pages+0x198/0x278
      [c00000000062bc90] [c000000000079720] .free_hot_cold_page+0x124/0x418
      [c00000000062bd70] [c000000000530278] .free_all_bootmem_core+0x14c/0x224
      [c00000000062be50] [c00000000052a178] .mem_init+0x68/0x170
      [c00000000062bee0] [c00000000051d874] .start_kernel+0x2a0/0x37c
      [c00000000062bf90] [c0000000000084c8] .start_here_common+0x54/0x8c
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  26. 13 Apr, 2007 2 commits
    • [POWERPC] DEBUG_PAGEALLOC for 64-bit · 370a908d
      Committed by Benjamin Herrenschmidt
      Here's an implementation of DEBUG_PAGEALLOC for 64 bits powerpc.
      It applies on top of the 32 bits patch.
      
      Unlike Anton's previous attempt, I'm not using updatepp. I'm removing
      the hash entries from the bolted mapping (using a map in RAM of all the
      slots). Expensive but it doesn't really matter, does it ? :-)
      
      Memory hot-added doesn't benefit from this unless it's added at an
      address that is below end_of_DRAM() as calculated at boot time.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/Kconfig.debug      |    2
       arch/powerpc/mm/hash_utils_64.c |   84 ++++++++++++++++++++++++++++++++++++++--
       2 files changed, 82 insertions(+), 4 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Allow drivers to map individual 4k pages to userspace · 721151d0
      Committed by Paul Mackerras
      Some drivers have resources that they want to be able to map into
      userspace that are 4k in size.  On a kernel configured with 64k pages
      we currently end up mapping the 4k we want plus another 60k of
      physical address space, which could contain anything.  This can
      introduce security problems, for example in the case of an infiniband
      adaptor where the other 60k could contain registers that some other
      program is using for its communications.
      
      This patch adds a new function, remap_4k_pfn, which drivers can use to
      map a single 4k page to userspace regardless of whether the kernel is
      using a 4k or a 64k page size.  Like remap_pfn_range, it would
      typically be called in a driver's mmap function.  It only maps a
      single 4k page, which on a 64k page kernel appears replicated 16 times
      throughout a 64k page.  On a 4k page kernel it reduces to a call to
      remap_pfn_range.
      
      The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN,
      gets set on the linux PTE.  This alters the way that __hash_page_4K
      computes the real address to put in the HPTE.  The RPN field of the
      linux PTE becomes the 4k RPN directly rather than being interpreted as
      a 64k RPN.  Since the RPN field is 32 bits, this means that physical
      addresses being mapped with remap_4k_pfn have to be below 2^44,
      i.e. 0x100000000000.
      
      The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c
      that deals with demoting a process to use 4k pages into one function
      that gets called in the various different places where we need to do
      that.  There were some discrepancies between exactly what was done in
      the various places, such as a call to spu_flush_all_slbs in one case
      but not in others.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
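The 2^44 limit quoted in this commit follows from a 32-bit RPN field holding a 4k page number: 32 + 12 = 44 address bits. The sketch below checks that limit and models how the real address differs with and without _PAGE_4K_PFN. Both helper names are hypothetical, and the non-4K_PFN branch (64k RPN plus a 4k subpage offset) is an assumption about the ordinary case rather than something the message spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RPN_BITS   32   /* width of the RPN field, per the commit message */
#define SHIFT_4K   12
#define SHIFT_64K  16

/* Can this physical address be expressed as a 4k RPN in a 32-bit field? */
bool phys_fits_4k_rpn(uint64_t phys)
{
    return (phys >> SHIFT_4K) < (1ULL << RPN_BITS);
}

/*
 * Real address backing 4k subpage `subpage` (0..15) of one Linux PTE.
 * With the 4K_PFN flag the RPN is already a 4k page number, so all 16
 * subpages resolve to the same 4k frame (the "replicated 16 times" effect).
 */
uint64_t hpte_real_addr(uint32_t rpn, bool is_4k_pfn, unsigned subpage)
{
    assert(subpage < 16);
    if (is_4k_pfn)
        return (uint64_t)rpn << SHIFT_4K;
    /* ASSUMPTION: ordinary case, RPN counts 64k pages. */
    return ((uint64_t)rpn << SHIFT_64K) + ((uint64_t)subpage << SHIFT_4K);
}
```

The first address that no longer fits is 0x100000000000, which is the 2^44 bound given in the message.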
  27. 10 Mar, 2007 1 commit
    • [POWERPC] Fix spu SLB invalidations · 94b2a439
      Committed by Benjamin Herrenschmidt
      The SPU code doesn't properly invalidate the SPUs' SLBs when necessary,
      for example when changing a segment size from the hugetlbfs code. In
      addition, it saves and restores the SLB content on context switches
      which makes it harder to properly handle those invalidations.
      
      This patch removes the saving & restoring for now, something more
      efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
      that can be used by the core mm code to flush the SLBs of all SPEs that
      are running a given mm at the time of the flush.
      
      In order to do that, it adds a spinlock to the list of all SPEs and moves
      some bits & pieces from spufs to spu_base.c.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  28. 04 Dec, 2006 1 commit
  29. 01 Jul, 2006 1 commit
  30. 28 Jun, 2006 2 commits
  31. 15 Jun, 2006 1 commit
    • powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
      Committed by Paul Mackerras
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
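The role of the _PAGE_COMBO bit described in this commit can be illustrated with a toy state model. This is not the kernel's hash_page code: `linux_pte_model`, `hash_stats` and the three functions are hypothetical, and the counters merely stand in for real hash-table operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the two Linux PTE bits that matter here. */
struct linux_pte_model {
    bool hashed;   /* some HPTE exists for this PTE */
    bool combo;    /* hashed as a set of 4k HPTEs rather than one 64k HPTE */
};

/* Counters standing in for real hash-table operations. */
struct hash_stats {
    unsigned inserted_64k, inserted_4k;
    unsigned invalidated_64k, invalidated_4k_sets;
};

/* Hash a not-yet-hashed PTE in as a single 64k HPTE. */
void hash_insert_64k(struct linux_pte_model *pte, struct hash_stats *st)
{
    assert(!pte->hashed);
    st->inserted_64k++;
    pte->hashed = true;
    pte->combo = false;
}

/* Insert a 4k HPTE; a stale 64k HPTE is invalidated first. */
void hash_insert_4k(struct linux_pte_model *pte, struct hash_stats *st)
{
    if (pte->hashed && !pte->combo) {
        st->invalidated_64k++;
        pte->hashed = false;
    }
    st->inserted_4k++;
    pte->hashed = true;
    pte->combo = true;
}

/* Invalidation consults combo to decide what to look for. */
void hash_invalidate(struct linux_pte_model *pte, struct hash_stats *st)
{
    if (!pte->hashed)
        return;
    if (pte->combo)
        st->invalidated_4k_sets++;
    else
        st->invalidated_64k++;
    pte->hashed = false;
    pte->combo = false;
}
```

Because the check happens lazily at insert time, a region can switch from 64k to 4k pages without an immediate sweep of the hash table, which is the point the message makes.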
  32. 22 Apr, 2006 1 commit