1. 30 10月, 2009 1 次提交
    • D
      powerpc/mm: Allow more flexible layouts for hugepage pagetables · a4fe3ce7
      David Gibson 提交于
      Currently each available hugepage size uses a slightly different
      pagetable layout: that is, the bottem level table of pointers to
      hugepages is a different size, and may branch off from the normal page
      tables at a different level.  Every hugepage aware path that needs to
      walk the pagetables must therefore look up the hugepage size from the
      slice info first, and work out the correct way to walk the pagetables
      accordingly.  Future hardware is likely to add more possible hugepage
      sizes, more layout options and more mess.
      
      This patch, therefore reworks the handling of hugepage pagetables to
      reduce this complexity.  In the new scheme, instead of having to
      consult the slice mask, pagetable walking code can check a flag in the
      PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
      and the entry also contains the information (eseentially hugepage
      shift) necessary to then interpret that table without recourse to the
      slice mask.  This scheme can be extended neatly to handle multiple
      levels of self-describing "special" hugepage pagetables, although for
      now we assume only one level exists.
      
      This approach means that only the pagetable allocation path needs to
      know how the pagetables should be set out.  All other (hugepage)
      pagetable walking paths can just interpret the structure as they go.
      
      There already was a flag bit in PGD/PUD/PMD entries for hugepage
      directory pointers, but it was only used for debug.  We alter that
      flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
      pointer (normally it would be 1 since the pointer lies in the linear
      mapping).  This means that asm pagetable walking can test for (and
      punt on) hugepage pointers with the same test that checks for
      unpopulated page directory entries (beq becomes bge), since hugepage
      pointers will always be positive, and normal pointers always negative.
      
      While we're at it, we get rid of the confusing (and grep defeating)
      #defining of hugepte_shift to be the same thing as mmu_huge_psizes.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a4fe3ce7
  2. 02 9月, 2009 1 次提交
    • B
      powerpc/pseries: Fix to handle slb resize across migration · 46db2f86
      Brian King 提交于
      The SLB can change sizes across a live migration, which was not
      being handled, resulting in possible machine crashes during
      migration if migrating to a machine which has a smaller max SLB
      size than the source machine. Fix this by first reducing the
      SLB size to the minimum possible value, which is 32, prior to
      migration. Then during the device tree update which occurs after
      migration, we make the call to ensure the SLB gets updated. Also
      add the slb_size to the lparcfg output so that the migration
      tools can check to make sure the kernel has this capability
      before allowing migration in scenarios where the SLB size will change.
      
      BenH: Fixed #include <asm/mmu-hash64.h> -> <asm/mmu.h> to avoid
            breaking ppc32 build
      Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      46db2f86
  3. 20 8月, 2009 1 次提交
  4. 24 3月, 2009 1 次提交
  5. 01 12月, 2008 1 次提交
  6. 16 9月, 2008 1 次提交
    • P
      powerpc: Make the 64-bit kernel as a position-independent executable · 549e8152
      Paul Mackerras 提交于
      This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
      a position-independent executable (PIE) when it is set.  This involves
      processing the dynamic relocations in the image in the early stages of
      booting, even if the kernel is being run at the address it is linked at,
      since the linker does not necessarily fill in words in the image for
      which there are dynamic relocations.  (In fact the linker does fill in
      such words for 64-bit executables, though not for 32-bit executables,
      so in principle we could avoid calling relocate() entirely when we're
      running a 64-bit kernel at the linked address.)
      
      The dynamic relocations are processed by a new function relocate(addr),
      where the addr parameter is the virtual address where the image will be
      run.  In fact we call it twice; once before calling prom_init, and again
      when starting the main kernel.  This means that reloc_offset() returns
      0 in prom_init (since it has been relocated to the address it is running
      at), which necessitated a few adjustments.
      
      This also changes __va and __pa to use an equivalent definition that is
      simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
      constants (for 64-bit) whereas PHYSICAL_START is a variable (and
      KERNELBASE ideally should be too, but isn't yet).
      
      With this, relocatable kernels still copy themselves down to physical
      address 0 and run there.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      549e8152
  7. 11 8月, 2008 1 次提交
    • B
      powerpc/mm: Fix attribute confusion with htab_bolt_mapping() · bc033b63
      Benjamin Herrenschmidt 提交于
      The function htab_bolt_mapping() is used to create permanent
      mappings in the MMU hash table, for example, in order to create
      the linear mapping of vmemmap.  It's also used by early boot
      ioremap (before mem_init_done).
      
      However, the way ioremap uses it is incorrect as it passes it the
      protection flags in the "linux PTE" form while htab_bolt_mapping()
      expects them in the hash table format.  This is made more confusing by
      the fact that some of those flags are actually in the same position in
      both cases.
      
      This fixes it all by making htab_bolt_mapping() take normal linux
      protection flags instead, and use a little helper to convert them to
      htab flags. Callers can now use the usual PAGE_* definitions safely.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/include/asm/mmu-hash64.h |    2 -
       arch/powerpc/mm/hash_utils_64.c       |   65 ++++++++++++++++++++--------------
       arch/powerpc/mm/init_64.c             |    9 +---
       3 files changed, 44 insertions(+), 32 deletions(-)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bc033b63
  8. 04 8月, 2008 1 次提交
  9. 25 7月, 2008 2 次提交
  10. 15 5月, 2008 1 次提交
    • B
      [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
      Benjamin Herrenschmidt 提交于
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pseries, or PS3's.
      
      In fact, on the PS3, failure to allocate the 16M page backing vmmemmap
      tends to prevent hotplugging the HV's "additional" memory, thus limiting
      the available memory even more, from my experience down to something
      like 80M total, which makes it really not very useable.
      
      The logic used by my match to choose the vmemmap page size is:
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      one point, as I simplify the SLB miss handler, but that will be for a
      later patch.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cec08e7a
  11. 14 5月, 2008 1 次提交
  12. 17 4月, 2008 1 次提交
    • K
      [POWERPC] Move phys_addr_t definition into asm/types.h · d04ceb3f
      Kumar Gala 提交于
      Moved phys_addr_t out of mmu-*.h and into asm/types.h so we can use it in
      places that before would have caused recursive includes.
      
      For example to use phys_addr_t in <asm/page.h> we would have included
      <asm/mmu.h> which would have possibly included <asm/mmu-hash64.h> which
      includes <asm/page.h>.  Wheeee recursive include.
      
      CONFIG_PHYS_64BIT is a bit counterintuitive in light of ppc64 systems
      and thus the config option is only used for ppc32 systems with >32-bit
      physical addresses (44x, 85xx, 745x, etc.).
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d04ceb3f
  13. 24 1月, 2008 1 次提交
    • P
      [POWERPC] Provide a way to protect 4k subpages when using 64k pages · fa28237c
      Paul Mackerras 提交于
      Using 64k pages on 64-bit PowerPC systems makes life difficult for
      emulators that are trying to emulate an ISA, such as x86, which use a
      smaller page size, since the emulator can no longer use the MMU and
      the normal system calls for controlling page protections.  Of course,
      the emulator can emulate the MMU by checking and possibly remapping
      the address for each memory access in software, but that is pretty
      slow.
      
      This provides a facility for such programs to control the access
      permissions on individual 4k sub-pages of 64k pages.  The idea is
      that the emulator supplies an array of protection masks to apply to a
      specified range of virtual addresses.  These masks are applied at the
      level where hardware PTEs are inserted into the hardware page table
      based on the Linux PTEs, so the Linux PTEs are not affected.  Note
      that this new mechanism does not allow any access that would otherwise
      be prohibited; it can only prohibit accesses that would otherwise be
      allowed.  This new facility is only available on 64-bit PowerPC and
      only when the kernel is configured for 64k pages.
      
      The masks are supplied using a new subpage_prot system call, which
      takes a starting virtual address and length, and a pointer to an array
      of protection masks in memory.  The array has a 32-bit word per 64k
      page to be protected; each 32-bit word consists of 16 2-bit fields,
      for which 0 allows any access (that is otherwise allowed), 1 prevents
      write accesses, and 2 or 3 prevent any access.
      
      Implicit in this is that the regions of the address space that are
      protected are switched to use 4k hardware pages rather than 64k
      hardware pages (on machines with hardware 64k page support).  In fact
      the whole process is switched to use 4k hardware pages when the
      subpage_prot system call is used, but this could be improved in future
      to switch only the affected segments.
      
      The subpage protection bits are stored in a 3 level tree akin to the
      page table tree.  The top level of this tree is stored in a structure
      that is appended to the top level of the page table tree, i.e., the
      pgd array.  Since it will often only be 32-bit addresses (below 4GB)
      that are protected, the pointers to the first four bottom level pages
      are also stored in this structure (each bottom level page contains the
      protection bits for 1GB of address space), so the protection bits for
      addresses below 4GB can be accessed with one fewer loads than those
      for higher addresses.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      fa28237c
  14. 17 1月, 2008 2 次提交
  15. 15 1月, 2008 1 次提交
    • P
      [POWERPC] Fix boot failure on POWER6 · dfbe0d3b
      Paul Mackerras 提交于
      Commit 473980a9 added a call to clear
      the SLB shadow buffer before registering it.  Unfortunately this means
      that we clear out the entries that slb_initialize has previously set in
      there.  On POWER6, the hypervisor uses the SLB shadow buffer when doing
      partition switches, and that means that after the next partition switch,
      each non-boot CPU has no SLB entries to map the kernel text and data,
      which causes it to crash.
      
      This fixes it by reverting most of 473980a9 and instead clearing the
      3rd entry explicitly in slb_initialize.  This fixes the problem that
      473980a9 was trying to solve, but without breaking POWER6.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      dfbe0d3b
  16. 11 1月, 2008 1 次提交
  17. 11 12月, 2007 1 次提交
  18. 12 10月, 2007 1 次提交
    • P
      [POWERPC] Use 1TB segments · 1189be65
      Paul Mackerras 提交于
      This makes the kernel use 1TB segments for all kernel mappings and for
      user addresses of 1TB and above, on machines which support them
      (currently POWER5+, POWER6 and PA6T).
      
      We detect that the machine supports 1TB segments by looking at the
      ibm,processor-segment-sizes property in the device tree.
      
      We don't currently use 1TB segments for user addresses < 1T, since
      that would effectively prevent 32-bit processes from using huge pages
      unless we also had a way to revert to using 256MB segments.  That
      would be possible but would involve extra complications (such as
      keeping track of which segment size was used when HPTEs were inserted)
      and is not addressed here.
      
      Parts of this patch were originally written by Ben Herrenschmidt.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1189be65
  19. 03 10月, 2007 1 次提交
  20. 03 8月, 2007 1 次提交
    • M
      [POWERPC] Fixes for the SLB shadow buffer code · 67439b76
      Michael Neuling 提交于
      On a machine with hardware 64kB pages and a kernel configured for a
      64kB base page size, we need to change the vmalloc segment from 64kB
      pages to 4kB pages if some driver creates a non-cacheable mapping in
      the vmalloc area.  However, we never updated with SLB shadow buffer.
      This fixes it.  Thanks to paulus for finding this.
      
      Also added some write barriers to ensure the shadow buffer contents
      are always consistent.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      67439b76
  21. 25 6月, 2007 1 次提交
  22. 14 6月, 2007 1 次提交
  23. 10 5月, 2007 1 次提交
  24. 09 5月, 2007 1 次提交
    • B
      [POWERPC] Introduce address space "slices" · d0f13e3c
      Benjamin Herrenschmidt 提交于
      The basic issue is to be able to do what hugetlbfs does but with
      different page sizes for some other special filesystems; more
      specifically, my need is:
      
       - Huge pages
      
       - SPE local store mappings using 64K pages on a 4K base page size
      kernel on Cell
      
       - Some special 4K segments in 64K-page kernels for mapping a dodgy
      type of powerpc-specific infiniband hardware that requires 4K MMU
      mappings for various reasons I won't explain here.
      
      The main issues are:
      
       - To maintain/keep track of the page size per "segment" (as we can
      only have one page size per segment on powerpc, which are 256MB
      divisions of the address space).
      
       - To make sure special mappings stay within their allotted
      "segments" (including MAP_FIXED crap)
      
       - To make sure everybody else doesn't mmap/brk/grow_stack into a
      "segment" that is used for a special mapping
      
      Some of the necessary mechanisms to handle that were present in the
      hugetlbfs code, but mostly in ways not suitable for anything else.
      
      The patch relies on some changes to the generic get_unmapped_area()
      that just got merged.  It still hijacks hugetlb callbacks here or
      there as the generic code hasn't been entirely cleaned up yet but
      that shouldn't be a problem.
      
      So what is a slice ?  Well, I re-used the mechanism used formerly by our
      hugetlbfs implementation which divides the address space in
      "meta-segments" which I called "slices".  The division is done using
      256MB slices below 4G, and 1T slices above.  Thus the address space is
      divided currently into 16 "low" slices and 16 "high" slices.  (Special
      case: high slice 0 is the area between 4G and 1T).
      
      Doing so simplifies significantly the tracking of segments and avoids
      having to keep track of all the 256MB segments in the address space.
      
      While I used the "concepts" of hugetlbfs, I mostly re-implemented
      everything in a more generic way and "ported" hugetlbfs to it.
      
      Slices can have an associated page size, which is encoded in the mmu
      context and used by the SLB miss handler to set the segment sizes.  The
      hash code currently doesn't care, it has a specific check for hugepages,
      though I might add a mechanism to provide per-slice hash mapping
      functions in the future.
      
      The slice code provide a pair of "generic" get_unmapped_area() (bottomup
      and topdown) functions that should work with any slice size.  There is
      some trickiness here so I would appreciate people to have a look at the
      implementation of these and let me know if I got something wrong.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d0f13e3c
  25. 27 4月, 2007 1 次提交
    • D
      [POWERPC] Prepare for splitting up mmu.h by MMU type · 8d2169e8
      David Gibson 提交于
      Currently asm-powerpc/mmu.h has definitions for the 64-bit hash based
      MMU.  If CONFIG_PPC64 is not set, it instead includes asm-ppc/mmu.h
      which contains a particularly horrible mess of #ifdefs giving the
      definitions for all the various 32-bit MMUs.
      
      It would be nice to have the low level definitions for each MMU type
      neatly in their own separate files.  It would also be good to wean
      arch/powerpc off dependence on the old asm-ppc/mmu.h.
      
      This patch makes a start on such a cleanup by moving the definitions
      for the 64-bit hash MMU to their own file, asm-powerpc/mmu_hash64.h.
      Definitions for the other MMUs still all come from asm-ppc/mmu.h,
      however each MMU type can now be one-by-one moved over to their own
      file, in the process cleaning them up stripping them of cruft no
      longer necessary in arch/powerpc.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      8d2169e8
  26. 24 4月, 2007 1 次提交
    • A
      [POWERPC] spufs: make spu page faults not block scheduling · 57dace23
      Arnd Bergmann 提交于
      Until now, we have always entered the spu page fault handler
      with a mutex for the spu context held. This has multiple
      bad side-effects:
      - it becomes impossible to suspend the context during
        page faults
      - if an spu program attempts to access its own mmio
        areas through DMA, we get an immediate livelock when
        the nopage function tries to acquire the same mutex
      
      This patch makes the page fault logic operate on a
      struct spu_context instead of a struct spu, and moves it
      from spu_base.c to a new file fault.c inside of spufs.
      
      We now also need to copy the dar and dsisr contents
      of the last fault into the saved context to have it
      accessible in case we schedule out the context before
      activating the page fault handler.
      Signed-off-by: NArnd Bergmann <arnd.bergmann@de.ibm.com>
      57dace23
  27. 07 2月, 2007 1 次提交
  28. 16 10月, 2006 1 次提交
  29. 28 6月, 2006 1 次提交
  30. 15 6月, 2006 1 次提交
    • P
      powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
      Paul Mackerras 提交于
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      bf72aeba
  31. 09 6月, 2006 2 次提交
    • B
      [PATCH] powerpc: Fix buglet with MMU hash management · c5cf0e30
      Benjamin Herrenschmidt 提交于
      Our MMU hash management code would not set the "C" bit (changed bit) in
      the hardware PTE when updating a RO PTE into a RW PTE. That would cause
      the hardware to possibly to a write back to the hash table to set it on
      the first store access, which in addition to being a performance issue,
      might also hit a bug when running with native hash management (non-HV)
      as our code is specifically optimized for the case where no write back
      happens.
      
      Thus there is a very small therocial window were a hash PTE can become
      corrupted if that HPTE has just been upgraded to read write, a store
      access happens on it, and that races with another processor evicting
      that same slot. Since eviction (caused by an almost full hash) is
      extremely rare, the bug is very unlikely to happen fortunately.
      
      This fixes by allowing the updating of the protection bits in the native
      hash handling to also set (but not clear) the "C" bit, and, in order to
      also improve performances in the general case, by always setting that
      bit on newly inserted hash PTE so that writeback really never happens.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c5cf0e30
    • B
      [PATCH] powerpc vdso updates · a5bba930
      Benjamin Herrenschmidt 提交于
      This patch cleans up some locking & error handling in the ppc vdso and
      moves the vdso base pointer from the thread struct to the mm context
      where it more logically belongs. It brings the powerpc implementation
      closer to Ingo's new x86 one and also adds an arch_vma_name() function
      allowing to print [vsdo] in /proc/<pid>/maps if Ingo's x86 vdso patch is
      also applied.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      a5bba930
  32. 22 3月, 2006 1 次提交
  33. 24 2月, 2006 1 次提交
  34. 09 1月, 2006 3 次提交
    • A
      [PATCH] powerpc: sanitize header files for user space includes · 88ced031
      Arnd Bergmann 提交于
      include/asm-ppc/ had #ifdef __KERNEL__ in all header files that
      are not meant for use by user space, include/asm-powerpc does
      not have this yet.
      
      This patch gets us a lot closer there. There are a few cases
      where I was not sure, so I left them out. I have verified
      that no CONFIG_* symbols are used outside of __KERNEL__
      any more and that there are no obvious compile errors when
      including any of the headers in user space libraries.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      88ced031
    • M
      [PATCH] powerpc: Fixups for kernel linked at 32 MB · 758438a7
      Michael Ellerman 提交于
      There's a few places where we need to fix things up for the kernel to work
      if it's linked at 32MB:
      
       - platforms/powermac/smp.c
         To start secondary cpus on pmac we patch the reset vector, which is fine.
         Except if we're above 32MB we don't have enough bits for an absolute branch,
         it needs to relative.
       - kernel/head_64.s
          - A few branches in the cpu hold code need to load the full target address
            and do a bctr.
          - after_prom_start needs to load PHYSICAL_START as the dest address, not 0.
          - The exception prolog needs to load the low word of the target adddress,
            not just the low halfword.
          - Fixup handling of the initial stab address.
       - kernel/setup_64.c
         smp_release_cpus() needs to write 1 to the spinloop flag near 0, not 32 MB.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      758438a7
    • B
      [PATCH] powerpc: Add OF address parsing code (#2) · d1405b86
      Benjamin Herrenschmidt 提交于
      Parsing addresses extracted from Open Firmware isn't a simple matter. We
      have various bits of code that try to do it in various place, including
      some heuristics in prom.c that pre-parse addresses at boot and fill
      device-nodes "addrs", but those are dodgy at best and I want to
      deprecate them. So this patch introduces a new set of routines that
      should be capable of parsing most types of addresses and translating
      them into CPU physical addresses. It currently works for things on PCI
      busses and ISA busses and should work on "standard" busses like the root
      bus or the MacIO bus that don't put funky flags in addresses. If you
      have other bus types that do use funky flags, you'll have to add new bus
      type translators, which is fairly easy.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      d1405b86
  35. 09 12月, 2005 1 次提交
    • D
      [PATCH] powerpc: Add missing icache flushes for hugepages · cbf52afd
      David Gibson 提交于
      On most powerpc CPUs, the dcache and icache are not coherent so
      between writing and executing a page, the caches must be flushed.
      Userspace programs assume pages given to them by the kernel are icache
      clean, so we must do this flush between the kernel clearing a page and
      it being mapped into userspace for execute.  We were not doing this
      for hugepages, this patch corrects the situation.
      
      We use the same lazy mechanism as we use for normal pages, delaying
      the flush until userspace actually attempts to execute from the page
      in question.
      
      Tested on G5.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cbf52afd