1. 09 5月, 2007 3 次提交
    • B
      [POWERPC] Add ability to 4K kernel to hash in 64K pages · 16c2d476
      Benjamin Herrenschmidt 提交于
      This adds the ability for a kernel compiled with 4K page size
      to have special slices containing 64K pages and hash the right type
      of hash PTEs.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      16c2d476
    • B
      [POWERPC] Introduce address space "slices" · d0f13e3c
      Benjamin Herrenschmidt 提交于
      The basic issue is to be able to do what hugetlbfs does but with
      different page sizes for some other special filesystems; more
      specifically, my need is:
      
       - Huge pages
      
       - SPE local store mappings using 64K pages on a 4K base page size
      kernel on Cell
      
       - Some special 4K segments in 64K-page kernels for mapping a dodgy
      type of powerpc-specific infiniband hardware that requires 4K MMU
      mappings for various reasons I won't explain here.
      
      The main issues are:
      
       - To maintain/keep track of the page size per "segment" (as we can
      only have one page size per segment on powerpc, which are 256MB
      divisions of the address space).
      
       - To make sure special mappings stay within their allotted
      "segments" (including MAP_FIXED crap)
      
       - To make sure everybody else doesn't mmap/brk/grow_stack into a
      "segment" that is used for a special mapping
      
      Some of the necessary mechanisms to handle that were present in the
      hugetlbfs code, but mostly in ways not suitable for anything else.
      
      The patch relies on some changes to the generic get_unmapped_area()
      that just got merged.  It still hijacks hugetlb callbacks here or
      there as the generic code hasn't been entirely cleaned up yet but
      that shouldn't be a problem.
      
      So what is a slice ?  Well, I re-used the mechanism used formerly by our
      hugetlbfs implementation which divides the address space in
      "meta-segments" which I called "slices".  The division is done using
      256MB slices below 4G, and 1T slices above.  Thus the address space is
      divided currently into 16 "low" slices and 16 "high" slices.  (Special
      case: high slice 0 is the area between 4G and 1T).
      
      Doing so simplifies significantly the tracking of segments and avoids
      having to keep track of all the 256MB segments in the address space.
      
      While I used the "concepts" of hugetlbfs, I mostly re-implemented
      everything in a more generic way and "ported" hugetlbfs to it.
      
      Slices can have an associated page size, which is encoded in the mmu
      context and used by the SLB miss handler to set the segment sizes.  The
      hash code currently doesn't care, it has a specific check for hugepages,
      though I might add a mechanism to provide per-slice hash mapping
      functions in the future.
      
      The slice code provide a pair of "generic" get_unmapped_area() (bottomup
      and topdown) functions that should work with any slice size.  There is
      some trickiness here so I would appreciate people to have a look at the
      implementation of these and let me know if I got something wrong.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d0f13e3c
    • B
      [POWERPC] Small fixes & cleanups in segment page size demotion · 16f1c746
      Benjamin Herrenschmidt 提交于
      The code for demoting segments to 4K had some issues, like for example,
      when using _PAGE_4K_PFN flag, the first CPU to hit it would do the
      demotion, but other CPUs hitting the same page wouldn't properly flush
      their SLBs if mmu_ci_restriction isn't set.  There are also potential
      issues with hash_preload not handling _PAGE_4K_PFN.  All of these are
      non issues on current hardware but might bite us in the future.
      
      This patch thus fixes it by:
      
       - Taking the test comparing the mm and current CPU context page
      sizes to decide to flush SLBs out of the mmu_ci_restrictions test
      since that can also be triggered by _PAGE_4K_PFN pages
      
       - Due to the above being done all the time, demote_segment_4k
      doesn't need update the context and flush the SLB
      
       - demote_segment_4k can be static and doesn't need an EXPORT_SYMBOL
      
       - Making hash_preload ignore anything that has either _PAGE_4K_PFN
      or _PAGE_NO_CACHE set, thus avoiding duplication of the complicated
      logic in hash_page() (and possibly making hash_preload a little bit
      faster for the normal case).
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      16f1c746
  2. 02 5月, 2007 1 次提交
    • M
      [POWERPC] Initialise spinlock in the DEBUG_PAGEALLOC code · ed166692
      Michael Ellerman 提交于
      Fixes:
      
      BUG: spinlock bad magic on CPU#0, swapper/0
       lock: c00000000064ec30, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
      Call Trace:
      [c00000000062b980] [c00000000000f920] .show_stack+0x6c/0x1a0 (unreliable)
      [c00000000062ba20] [c0000000001c2b40] .spin_bug+0xb0/0xd4
      [c00000000062bab0] [c0000000001c2ed0] ._raw_spin_lock+0x44/0x184
      [c00000000062bb50] [c0000000003a42b4] ._spin_lock+0x10/0x24
      [c00000000062bbd0] [c00000000002b4dc] .kernel_map_pages+0x198/0x278
      [c00000000062bc90] [c000000000079720] .free_hot_cold_page+0x124/0x418
      [c00000000062bd70] [c000000000530278] .free_all_bootmem_core+0x14c/0x224
      [c00000000062be50] [c00000000052a178] .mem_init+0x68/0x170
      [c00000000062bee0] [c00000000051d874] .start_kernel+0x2a0/0x37c
      [c00000000062bf90] [c0000000000084c8] .start_here_common+0x54/0x8c
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      ed166692
  3. 13 4月, 2007 2 次提交
    • B
      [POWERPC] DEBUG_PAGEALLOC for 64-bit · 370a908d
      Benjamin Herrenschmidt 提交于
      Here's an implementation of DEBUG_PAGEALLOC for 64 bits powerpc.
      It applies on top of the 32 bits patch.
      
      Unlike Anton's previous attempt, I'm not using updatepp. I'm removing
      the hash entries from the bolted mapping (using a map in RAM of all the
      slots). Expensive but it doesn't really matter, does it ? :-)
      
      Memory hot-added doesn't benefit from this unless it's added at an
      address that is below end_of_DRAM() as calculated at boot time.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/Kconfig.debug      |    2
       arch/powerpc/mm/hash_utils_64.c |   84 ++++++++++++++++++++++++++++++++++++++--
       2 files changed, 82 insertions(+), 4 deletions(-)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      370a908d
    • P
      [POWERPC] Allow drivers to map individual 4k pages to userspace · 721151d0
      Paul Mackerras 提交于
      Some drivers have resources that they want to be able to map into
      userspace that are 4k in size.  On a kernel configured with 64k pages
      we currently end up mapping the 4k we want plus another 60k of
      physical address space, which could contain anything.  This can
      introduce security problems, for example in the case of an infiniband
      adaptor where the other 60k could contain registers that some other
      program is using for its communications.
      
      This patch adds a new function, remap_4k_pfn, which drivers can use to
      map a single 4k page to userspace regardless of whether the kernel is
      using a 4k or a 64k page size.  Like remap_pfn_range, it would
      typically be called in a driver's mmap function.  It only maps a
      single 4k page, which on a 64k page kernel appears replicated 16 times
      throughout a 64k page.  On a 4k page kernel it reduces to a call to
      remap_pfn_range.
      
      The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN,
      gets set on the linux PTE.  This alters the way that __hash_page_4K
      computes the real address to put in the HPTE.  The RPN field of the
      linux PTE becomes the 4k RPN directly rather than being interpreted as
      a 64k RPN.  Since the RPN field is 32 bits, this means that physical
      addresses being mapped with remap_4k_pfn have to be below 2^44,
      i.e. 0x100000000000.
      
      The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c
      that deals with demoting a process to use 4k pages into one function
      that gets called in the various different places where we need to do
      that.  There were some discrepancies between exactly what was done in
      the various places, such as a call to spu_flush_all_slbs in one case
      but not in others.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      721151d0
  4. 10 3月, 2007 1 次提交
    • B
      [POWERPC] Fix spu SLB invalidations · 94b2a439
      Benjamin Herrenschmidt 提交于
      The SPU code doesn't properly invalidate SPUs SLBs when necessary,
      for example when changing a segment size from the hugetlbfs code. In
      addition, it saves and restores the SLB content on context switches
      which makes it harder to properly handle those invalidations.
      
      This patch removes the saving & restoring for now, something more
      efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
      that can be used by the core mm code to flush the SLBs of all SPEs that
      are running a given mm at the time of the flush.
      
      In order to do that, it adds a spinlock to the list of all SPEs and move
      some bits & pieces from spufs to spu_base.c
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      94b2a439
  5. 04 12月, 2006 1 次提交
  6. 01 7月, 2006 1 次提交
  7. 28 6月, 2006 2 次提交
  8. 15 6月, 2006 1 次提交
    • P
      powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
      Paul Mackerras 提交于
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bf72aeba
  9. 22 4月, 2006 1 次提交
  10. 28 3月, 2006 1 次提交
  11. 22 3月, 2006 2 次提交
    • M
      [PATCH] powerpc: Replace platform_is_lpar() with a firmware feature · 57cfb814
      Michael Ellerman 提交于
      It has been decreed that platform numbers are evil, so as a step in that
      direction, replace platform_is_lpar() with a FW_FEATURE_LPAR bit.
      
      Currently FW_FEATURE_LPAR really means i/pSeries LPAR, in the future we might
      have to clean that up if we need to be more specific about what LPAR actually
      means. But that's another patch ...
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      57cfb814
    • M
      [PATCH] powerpc: Unconfuse htab_bolt_mapping() callers · caf80e57
      Michael Ellerman 提交于
      htab_bolt_mapping() takes a vstart and pstart parameter, but all but one of
      its callers actually pass it vstart and vstart. Luckily before it passes
      paddr (calculated from paddr) to the hpte_insert routines it calls
      virt_to_abs() (aka. __pa()) on the address, so there isn't actually a bug.
      
      map_io_page() however does pass pstart properly, so currently it's broken
      AFAICT because we're calling __pa(paddr) which will get us something very
      large. Presumably no one's calling map_io_page() in the right context.
      
      Anyway, change htab_bolt_mapping() callers to properly pass pstart, and then
      use it properly in htab_bolt_mapping(), ie. don't call __pa() on it again.
      
      Booted on p5 LPAR, iSeries and Power3.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      caf80e57
  12. 28 2月, 2006 1 次提交
    • M
      [PATCH] powerpc/iseries: Fix double phys_to_abs bug in htab_bolt_mapping · 56ec6462
      Michael Ellerman 提交于
      Before the merge I updated create_pte_mapping() to work for iSeries, by
      calling iSeries_hpte_bolt_or_insert. (4c55130b)
      
      Later we changed iSeries_hpte_insert to cope with the bolting case, and called
      that instead from create_pte_mapping() (which was renamed to htab_bolt_mapping)
      (3c726f8d).
      
      Unfortunately that change introduced a subtle bug, where we pass an absolute
      address to iSeries_hpte_insert() where it expects a physical address. This
      leads to us calling phys_to_abs() twice on the physical address, which is
      seriously bogus.
      
      This only causes a problem if the absolute address from the first translation
      can be looked up again in the chunk_map, which depends on the size and layout
      of memory. I've seen it fail on one box, but not others.
      
      The minimal fix is to pass the physical address to iSeries_hpte_insert(). For
      2.6.17 we should make phys_to_abs() BUG if we try to double-translate an
      address.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      56ec6462
  13. 24 2月, 2006 1 次提交
  14. 07 2月, 2006 1 次提交
    • M
      [PATCH] powerpc: Always panic if lmb_alloc() fails · d7a5b2ff
      Michael Ellerman 提交于
      Currently most callers of lmb_alloc() don't check if it worked or not, if it
      ever does weird bad things will probably happen. The few callers who do check
      just panic or BUG_ON.
      
      So make lmb_alloc() panic internally, to catch bugs at the source. The few
      callers who did check the result no longer need to.
      
      The only caller that did anything interesting with the return result was
      careful_allocation(). For it we create __lmb_alloc_base() which _doesn't_ panic
      automatically, a little messy, but passable.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d7a5b2ff
  15. 10 1月, 2006 1 次提交
  16. 09 1月, 2006 2 次提交
    • M
      [PATCH] powerpc: Separate usage of KERNELBASE and PAGE_OFFSET · b5666f70
      Michael Ellerman 提交于
      This patch separates usage of KERNELBASE and PAGE_OFFSET. I haven't
      looked at any of the PPC32 code, if we ever want to support Kdump on
      PPC we'll have to do another audit, ditto for iSeries.
      
      This patch makes PAGE_OFFSET the constant, it'll always be 0xC * 1
      gazillion for 64-bit.
      
      To get a physical address from a virtual one you subtract PAGE_OFFSET,
      _not_ KERNELBASE.
      
      KERNELBASE is the virtual address of the start of the kernel, it's
      often the same as PAGE_OFFSET, but _might not be_.
      
      If you want to know something's offset from the start of the kernel
      you should subtract KERNELBASE.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      b5666f70
    • A
      [PATCH] spufs: The SPU file system, base · 67207b96
      Arnd Bergmann 提交于
      This is the current version of the spu file system, used
      for driving SPEs on the Cell Broadband Engine.
      
      This release is almost identical to the version for the
      2.6.14 kernel posted earlier, which is available as part
      of the Cell BE Linux distribution from
      http://www.bsc.es/projects/deepcomputing/linuxoncell/.
      
      The first patch provides all the interfaces for running
      spu application, but does not have any support for
      debugging SPU tasks or for scheduling. Both these
      functionalities are added in the subsequent patches.
      
      See Documentation/filesystems/spufs.txt on how to use
      spufs.
      Signed-off-by: NArnd Bergmann <arndb@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      67207b96
  17. 30 12月, 2005 1 次提交
  18. 09 12月, 2005 1 次提交
    • D
      [PATCH] powerpc: Add missing icache flushes for hugepages · cbf52afd
      David Gibson 提交于
      On most powerpc CPUs, the dcache and icache are not coherent so
      between writing and executing a page, the caches must be flushed.
      Userspace programs assume pages given to them by the kernel are icache
      clean, so we must do this flush between the kernel clearing a page and
      it being mapped into userspace for execute.  We were not doing this
      for hugepages, this patch corrects the situation.
      
      We use the same lazy mechanism as we use for normal pages, delaying
      the flush until userspace actually attempts to execute from the page
      in question.
      
      Tested on G5.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cbf52afd
  19. 10 11月, 2005 1 次提交
  20. 08 11月, 2005 2 次提交
  21. 07 11月, 2005 3 次提交
    • D
      [PATCH] ppc64: Fix bug in SLB miss handler for hugepages · 7d24f0b8
      David Gibson 提交于
      This patch, however, should be applied on top of the 64k-page-size patch to
      fix some problems with hugepage (some pre-existing, another introduced by
      this patch).
      
      The patch fixes a bug in the SLB miss handler for hugepages on ppc64
      introduced by the dynamic hugepage patch (commit id
      c594adad) due to a misunderstanding of the
      srd instruction's behaviour (mea culpa).  The problem arises when a 64-bit
      process maps some hugepages in the low 4GB of the address space (unusual).
      In this case, as well as the 256M segment in question being marked for
      hugepages, other segments at 32G intervals will be incorrectly marked for
      hugepages.
      
      In the process, this patch tweaks the semantics of the hugepage bitmaps to
      be more sensible.  Previously, an address below 4G was marked for hugepages
      if the appropriate segment bit in the "low areas" bitmask was set *or* if
      the low bit in the "high areas" bitmap was set (which would mark all
      addresses below 1TB for hugepage).  With this patch, any given address is
      governed by a single bitmap.  Addresses below 4GB are marked for hugepage
      if and only if their bit is set in the "low areas" bitmap (256M
      granularity).  Addresses between 4GB and 1TB are marked for hugepage iff
      the low bit in the "high areas" bitmap is set.  Higher addresses are marked
      for hugepage iff their bit in the "high areas" bitmap is set (1TB
      granularity).
      
      To avoid conflicts, this patch must be applied on top of BenH's pending
      patch for 64k base page size [0].  As such, this patch also addresses a
      hugepage problem introduced by that patch.  That patch allows hugepages of
      1MB in size on hardware which supports it, however, that won't work when
      using 4k pages (4 level pagetable), because in that case hugepage PTEs are
      stored at the PMD level, and each PMD entry maps 2MB.  This patch simply
      disallows hugepages in that case (we can do something cleverer to re-enable
      them some other day).
      
      Built, booted, and a handful of hugepage related tests passed on POWER5
      LPAR (both ARCH=powerpc and ARCH=ppc64).
      
      [0] http://gate.crashing.org/~benh/ppc64-64k-pages.diffSigned-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7d24f0b8
    • D
      [PATCH] powerpc: Kill ppcdebug · dcad47fc
      David Gibson 提交于
      The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
      First, in the hash setup code, one of the bits allows the size of the
      hash table to be reduced by a factor of 8 - which would be better
      accomplished with a command line option for that purpose.  The other
      was a bunch of bus walking related messages in the iSeries code, which
      would seem to be insufficient reason to keep the mechanism.
      
      This patch removes the last traces of this mechanism.
      
      Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc).
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      dcad47fc
    • B
      [PATCH] ppc64: support 64k pages · 3c726f8d
      Benjamin Herrenschmidt 提交于
      Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
      base page size to 64K.  The resulting kernel still boots on any
      hardware.  On current machines with 4K pages support only, the kernel
      will maintain 16 "subpages" for each 64K page transparently.
      
      Note that while real 64K capable HW has been tested, the current patch
      will not enable it yet as such hardware is not released yet, and I'm
      still verifying with the firmware architects the proper to get the
      information from the newer hypervisors.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3c726f8d
  22. 12 10月, 2005 1 次提交
  23. 10 10月, 2005 1 次提交
  24. 23 9月, 2005 1 次提交
    • M
      ppc64 iSeries: Update create_pte_mapping to replace iSeries_bolt_kernel() · 4c55130b
      Michael Ellerman 提交于
      early_setup() calls htab_initialize() which is similar, but not identical
      to iSeries_bolt_kernel().
      
      On iSeries the Hypervisor has already inserted some ptes for us, and we
      simply have to detect that and bolt them. iSeries_hpte_bolt_or_insert()
      implements that logic.
      
      For the case of a non-existing pte we just call iSeries_hpte_insert(). This
      appears to work, although it's not entirely equivalent to the old code in
      iSeries_make_pte() which panicked if we got a secondary slot. Not sure if
      that's important.
      
      Finally we call iSeries_hpte_bolt_or_insert() from create_pte_mapping(),
      which is called from htab_initialize() for each lmb region.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      4c55130b
  25. 21 9月, 2005 1 次提交
  26. 29 8月, 2005 2 次提交
  27. 14 7月, 2005 1 次提交
  28. 22 6月, 2005 2 次提交
  29. 06 5月, 2005 1 次提交
    • D
      [PATCH] ppc64: pgtable.h and other header cleanups · 1f8d419e
      David Gibson 提交于
      This patch started as simply removing a few never-used macros from
      asm-ppc64/pgtable.h, then kind of grew.  It now makes a bunch of
      cleanups to the ppc64 low-level header files (with corresponding
      changes to .c files where necessary) such as:
      	- Abolishing never-used macros
      	- Eliminating multiple #defines with the same purpose
      	- Removing pointless macros (cases where just expanding the
      macro everywhere turns out clearer and more sensible)
      	- Removing some cases where macros which could be defined in
      terms of each other weren't
      	- Moving imalloc() related definitions from pgtable.h to their
      own header file (imalloc.h)
      	- Re-arranging headers to group things more logically
      	- Moving all VSID allocation related things to mmu.h, instead
      of being split between mmu.h and mmu_context.h
      	- Removing some reserved space for flags from the PMD - we're
      not using it.
      	- Fix some bugs which broke compile with STRICT_MM_TYPECHECKS.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1f8d419e