1. 09 June 2009, 1 commit
2. 24 March 2009, 3 commits
• powerpc/mm: Add option for non-atomic PTE updates to ppc64 · a033a487
Committed by Benjamin Herrenschmidt
ppc32 already has this option; add it to ppc64 as a preliminary for adding
64-bit Book3E support.
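A minimal sketch of the idea, assuming a PTE_ATOMIC_UPDATES switch like the one ppc32 uses (the asm and the hash-flush handling here are illustrative, not the committed code):

	static inline unsigned long pte_update(struct mm_struct *mm, unsigned long addr,
					       pte_t *ptep, unsigned long clr, int huge)
	{
		unsigned long old;
	#ifdef PTE_ATOMIC_UPDATES
		unsigned long tmp;

		/* ldarx/stdcx. loop: retry until the store conditional succeeds */
		__asm__ __volatile__(
		"1:	ldarx	%0,0,%3\n"
		"	andc	%1,%0,%4\n"
		"	stdcx.	%1,0,%3\n"
		"	bne-	1b"
		: "=&r" (old), "=&r" (tmp), "=m" (*ptep)
		: "r" (ptep), "r" (clr), "m" (*ptep)
		: "cc");
	#else
		/* Non-atomic: fine when nothing else writes PTEs concurrently */
		old = pte_val(*ptep);
		*ptep = __pte(old & ~clr);
	#endif
		if (old & _PAGE_HASHPTE)
			hpte_need_flush(mm, addr, ptep, old, huge);
		return old;
	}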
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
• powerpc/mm: Merge various PTE bits and accessors definitions · 71087002
Committed by Benjamin Herrenschmidt
      Now that they are almost identical, we can merge some of the definitions
      related to the PTE format into common files.
      
This creates a new pte-common.h, which is included by both 32 and 64-bit
right after the CPU specific pte-*.h file, and which defines some
bits to "default" values if they haven't been defined already; it
then provides generic definitions of most of the bit combinations
based on these, which are exposed to the rest of the kernel.
      
I also moved most of the "small" accessors to the PTE bits and
modification helpers (pte_mk*) into the common pgtable.h. The actual
accessors remain in their separate files.
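The "default if not already defined" pattern looks roughly like this (a sketch; the exact bit names and combinations in pte-common.h may differ):

	/* The CPU specific pte-*.h may or may not have defined these */
	#ifndef _PAGE_EXEC
	#define _PAGE_EXEC	0
	#endif
	#ifndef _PAGE_HWWRITE
	#define _PAGE_HWWRITE	0
	#endif
	#ifndef _PAGE_COHERENT
	#define _PAGE_COHERENT	0
	#endif

	/* ...so generic combinations can then be written once for 32/64-bit */
	#define _PAGE_KERNEL_RW		(_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE)
	#define _PAGE_KERNEL_RWX	(_PAGE_DIRTY | _PAGE_RW | _PAGE_HWWRITE | _PAGE_EXEC)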
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
• powerpc/mm: Tweak PTE bit combination definitions · 8d1cf34e
Committed by Benjamin Herrenschmidt
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variants become almost identical, which will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.
      
      The combination of bits defining access to kernel pages are now clearly
      separated from the combination used by userspace and the core VM. The
      resulting generated code should remain identical unless I made a mistake.
      
Note: While at it, I removed a nonsensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrotted.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
3. 20 March 2009, 1 commit
• powerpc/mm: Split the various pgtable-* headers based on MMU type · c605782b
Committed by Benjamin Herrenschmidt
      This patch moves the definition of the PTE format for each MMU type
      to separate files instead of all in one file. This improves overall
      maintainability and will make it easier to add new types.
      
On 64-bit, additionally, I've separated the headers relative to the
format of the page table tree (3 vs. 4 levels for 64K vs 4K pages)
from the headers specific to the PTE format for hash based processors;
this will make it easier to add support for Book3 "E" 64-bit
implementations.
      
There are still some type-related #ifdefs in the generic headers;
we might remove them in the long run, but this patch shouldn't result
in any code change, hopefully just definitions being moved around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
4. 11 February 2009, 1 commit
• powerpc/mm: Rework I$/D$ coherency (v3) · 8d30c14c
Committed by Benjamin Herrenschmidt
      This patch reworks the way we do I and D cache coherency on PowerPC.
      
      The "old" way was split in 3 different parts depending on the processor type:
      
         - Hash with per-page exec support (64-bit and >= POWER4 only) does it
      at hashing time, by preventing exec on unclean pages and cleaning pages
      on exec faults.
      
   - Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all pages going to user space in update_mmu_cache().
      
         - Embedded with per-page exec support does it from do_page_fault() on
      exec faults, in a way similar to what the hash code does.
      
That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow up trying to execute. This is hard to hit but
I think it has bitten us in the past.
      
      Also, it's inefficient for embedded where we always end up having to do at least
      one more page fault.
      
      This reworks the whole thing by moving the cache sync into two main call sites,
      though we keep different behaviours depending on the HW capability. The call
      sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
      which joins the former in pgtable.c
      
The base idea for Embedded with per-page exec support is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache(), but that's for later).
      
If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed; we now make
that path also perform the I/D cache sync for exec faults. This second path
is the catch-all for things that weren't cleaned at set_pte_at() time.
      
For CPUs without per-page exec support, we always do the sync at set_pte_at(),
      thus guaranteeing that when the PTE is visible to other processors, the cache
      is clean.
      
      For the 64-bit hash with per-page exec support case, we keep the old mechanism
      for now. I'll look into changing it later, once I've reworked a bit how we
      use _PAGE_EXEC.
      
This is also a first step for adding _PAGE_EXEC support for embedded platforms.
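As a sketch, the set_pte_at() side for an embedded CPU with per-page exec support could look like this (flush_dcache_icache_page() and PG_arch_1 are real powerpc facilities; the filtering shown here is an assumption, not the committed logic):

	void set_pte_at(struct mm_struct *mm, unsigned long addr,
			pte_t *ptep, pte_t pte)
	{
		/* Only executable pages backed by real memory need I$/D$ sync */
		if ((pte_val(pte) & _PAGE_EXEC) && pfn_valid(pte_pfn(pte))) {
			struct page *pg = pte_page(pte);

			/* Clean each page once, tracked with an arch page flag */
			if (!test_bit(PG_arch_1, &pg->flags)) {
				flush_dcache_icache_page(pg);
				set_bit(PG_arch_1, &pg->flags);
			}
		}
		__set_pte_at(mm, addr, ptep, pte);
	}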
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
5. 21 December 2008, 1 commit
• powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED · 64b3d0e8
Committed by Benjamin Herrenschmidt
Currently, we never set _PAGE_COHERENT in the PTEs; we just OR it in,
in the hash code, based on some CPU feature bit.  We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.
      
This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages that have I = 0 on platforms
that need it.  The hash code clears it if the feature bit is not set.
      
      It also adds some clean accessors to setup various valid combinations
      of access flags and change various bits of code to use them instead.
      
      This should help having the PTE actually containing the bit
      combinations that we really want.
      
I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitly from the TLB miss.  I will ultimately remove it
completely, as it appears that it might not be needed after all,
but in the meantime, having it in the TLB miss makes things a
lot easier.
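The clean accessors are in this spirit (a sketch; the committed macro names and the exact bit masks may differ):

	/* Build valid cacheability combinations instead of poking bits by hand */
	#define pgprot_noncached(prot)						\
		(__pgprot((pgprot_val(prot) & ~(_PAGE_COHERENT | _PAGE_WRITETHRU)) \
			  | _PAGE_NO_CACHE | _PAGE_GUARDED))

	#define pgprot_cached(prot)						\
		(__pgprot((pgprot_val(prot) & ~(_PAGE_NO_CACHE | _PAGE_GUARDED)) \
			  | _PAGE_COHERENT))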
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
6. 16 December 2008, 1 commit
7. 14 October 2008, 1 commit
8. 03 September 2008, 1 commit
9. 04 August 2008, 1 commit
10. 30 July 2008, 1 commit
11. 28 July 2008, 1 commit
12. 25 July 2008, 1 commit
• powerpc ioremap_prot · a1f242ff
Committed by Benjamin Herrenschmidt
This adds ioremap_prot() and pte_pgprot() so that one can extract protection
bits from a PTE and use them with ioremap_prot() (in order to support ptrace
of VM_IO | VM_PFNMAP as per Rik's patch).
      
      This moves a couple of flag checks around in the ioremap implementations
of arch/powerpc.  There's a side effect of allowing non-cacheable and
non-guarded mappings on ppc32, which before would always have _PAGE_GUARDED
set whenever _PAGE_NO_CACHE was.
      
(Standard ioremap will still set _PAGE_GUARDED, but ioremap_prot will be
capable of setting such a non-guarded mapping.)
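A sketch of the two additions (the protection-bit mask name is an assumption):

	/* Extract just the protection/attribute bits, dropping the PFN */
	#define pte_pgprot(pte)	__pgprot(pte_val(pte) & PAGE_PROT_BITS)

	/* Usage: remap a physical range with prot bits taken from an existing PTE */
	void __iomem *io = ioremap_prot(paddr, size, pgprot_val(pte_pgprot(pte)));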
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13. 09 July 2008, 2 commits
• powerpc/mm: Define flags for Strong Access Ordering · aba46c50
Committed by Dave Kleikamp
This patch defines (see the sketch after this list):
      
      - PROT_SAO, which is passed into mmap() and mprotect() in the prot field
      - VM_SAO in vma->vm_flags, and
      - _PAGE_SAO, the combination of WIMG bits in the pte that enables strong
      access ordering for the page.
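A sketch of the three layers (the numeric values are assumptions for illustration; _PAGE_SAO as a WIMG combination follows the description above):

	/* mman.h: new protection flag for mmap()/mprotect() (value assumed) */
	#define PROT_SAO	0x10

	/* mm.h: matching vma flag (value assumed) */
	#define VM_SAO		0x20000000

	/* PTE: strong access ordering as a specific WIMG combination */
	#define _PAGE_SAO	(_PAGE_WRITETHRU | _PAGE_NO_CACHE | _PAGE_COHERENT)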
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
• Correct hash flushing from huge_ptep_set_wrprotect() · 86df8642
Committed by David Gibson
      As Andy Whitcroft recently pointed out, the current powerpc version of
      huge_ptep_set_wrprotect() has a bug.  It just calls ptep_set_wrprotect()
      which in turn calls pte_update() then hpte_need_flush() with the 'huge'
      argument set to 0.  This will cause hpte_need_flush() to flush the wrong
hash entries (if any).  Andy's fix for this is already in the powerpc
      tree as commit 016b33c4.
      
I have confirmed this is a real bug, not masked by some other
synchronization, with a new testcase for libhugetlbfs.  A process writes to
a (MAP_PRIVATE) hugepage mapping, fork()s, then alters the mapping; the
child incorrectly sees the second write.
      
      Therefore, this should be fixed for 2.6.26, and for the stable tree.
      Here is a suitable patch for 2.6.26, which I think will also be suitable
      for the stable tree (neither of the headers in question has been changed
      much recently).
      
It is cut down slightly from Andy's original version, in that it does
      not include a 32-bit version of huge_ptep_set_wrprotect().  Currently,
      hugepages are not supported on any 32-bit powerpc platform.  When they
      are, a suitable 32-bit version can be added - the only 32-bit hardware
      which supports hugepages does not use the conventional hashtable MMU and
      so will have different needs anyway.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
14. 01 July 2008, 1 commit
• powerpc: Add 64 bit version of huge_ptep_set_wrprotect · 016b33c4
Committed by Andy Whitcroft
      The implementation of huge_ptep_set_wrprotect() directly calls
      ptep_set_wrprotect() to mark a hugepte write protected.  However this
      call is not appropriate on ppc64 kernels as this is a small page only
      implementation.  This can lead to the hash not being flushed correctly
      when a mapping is being converted to COW, allowing processes to continue
      using the original copy.
      
      Currently huge_ptep_set_wrprotect() unconditionally calls
      ptep_set_wrprotect().  This is fine on ppc32 kernels as this call is
      generic.  On 64 bit this is implemented as:
      
      	pte_update(mm, addr, ptep, _PAGE_RW, 0);
      
      On ppc64 this last parameter is the page size and is passed directly on
      to hpte_need_flush():
      
      	hpte_need_flush(mm, addr, ptep, old, huge);
      
      And this directly affects the page size we pass to flush_hash_page():
      
      	flush_hash_page(vaddr, rpte, psize, ssize, 0);
      
      As this changes the way the hash is calculated we will flush the wrong
      pages, potentially leaving live hashes to the original page.
      
      Move the definition of huge_ptep_set_wrprotect() to the 32/64 bit specific
      headers.
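Based on the description above, the 64-bit version presumably ends up along these lines (a sketch, passing 1 rather than 0 as the huge flag):

	static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
						   unsigned long addr, pte_t *ptep)
	{
		if ((pte_val(*ptep) & _PAGE_RW) == 0)
			return;
		/* huge = 1, so hpte_need_flush() flushes hugepage hash entries */
		pte_update(mm, addr, ptep, _PAGE_RW, 1);
	}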
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
15. 30 June 2008, 1 commit
16. 15 May 2008, 1 commit
• [POWERPC] vmemmap fixes to use smaller pages · cec08e7a
Committed by Benjamin Herrenschmidt
      This changes vmemmap to use a different region (region 0xf) of the
      address space, and to configure the page size of that region
      dynamically at boot.
      
      The problem with the current approach of always using 16M pages is that
      it's not well suited to machines that have small amounts of memory such
      as small partitions on pseries, or PS3's.
      
In fact, on the PS3, failure to allocate the 16M page backing vmemmap
tends to prevent hotplugging the HV's "additional" memory, thus limiting
the available memory even more, from my experience down to something
like 80M total, which makes it really not very usable.
      
The logic used by my patch to choose the vmemmap page size is (see the sketch after this list):
      
       - If 16M pages are available and there's 1G or more RAM at boot,
         use that size.
       - Else if 64K pages are available, use that
       - Else use 4K pages
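A sketch of that selection (mmu_psize_defs[] and mmu_vmemmap_psize are real powerpc symbols; the memory-size helper and the 1G threshold as written are assumptions):

	/* Pick the vmemmap page size at boot, preferring the largest workable */
	if (mmu_psize_defs[MMU_PAGE_16M].shift &&
	    lmb_phys_mem_size() >= 0x40000000)	/* at least 1G of RAM */
		mmu_vmemmap_psize = MMU_PAGE_16M;
	else if (mmu_psize_defs[MMU_PAGE_64K].shift)
		mmu_vmemmap_psize = MMU_PAGE_64K;
	else
		mmu_vmemmap_psize = MMU_PAGE_4K;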
      
      I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages)
      and it seems to work fine.
      
      Note that I intend to change the way we organize the kernel regions &
      SLBs so the actual region will change from 0xf back to something else at
      one point, as I simplify the SLB miss handler, but that will be for a
      later patch.
Signed-off-by: Paul Mackerras <paulus@samba.org>
17. 28 April 2008, 1 commit
• mm: introduce pte_special pte bit · 7e675137
Committed by Nick Piggin
      s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory
      model (which is more dynamic than most).  Instead, they had proposed to
      implement it with an additional path through vm_normal_page(), using a bit in
      the pte to determine whether or not the page should be refcounted:
      
      vm_normal_page()
      {
      	...
              if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
                      if (vma->vm_flags & VM_MIXEDMAP) {
      #ifdef s390
      			if (!mixedmap_refcount_pte(pte))
      				return NULL;
      #else
                              if (!pfn_valid(pfn))
                                      return NULL;
      #endif
                              goto out;
                      }
      	...
      }
      
This is fine; however, if we are allowed to use a bit in the pte to determine
refcountedness, we can use that to _completely_ replace all the vma based
schemes.  So instead of adding more cases to the already complex vma-based
scheme, we can have a clearly separate and simple pte-based scheme (and get
      slightly better code generation in the process):
      
      vm_normal_page()
      {
      #ifdef s390
      	if (!mixedmap_refcount_pte(pte))
      		return NULL;
      	return pte_page(pte);
      #else
      	...
      #endif
      }
      
And finally, we would rather make this concept usable by any architecture
than make it s390 only, so implement a new type of pte state for this.
Unfortunately the old vma based code must stay, because some architectures may
not be able to spare pte bits.  This makes vm_normal_page a little bit more
ugly than we would like, but the 2 cases are clearly separate.
      
      So introduce a pte_special pte state, and use it in mm/memory.c.  It is
      currently a noop for all architectures, so this doesn't actually result in any
      compiled code changes to mm/memory.o.
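The noop defaults are trivial, something like this per-arch sketch:

	/* No spare pte bit on this arch (yet): nothing is "special" */
	static inline int pte_special(pte_t pte)
	{
		return 0;
	}

	static inline pte_t pte_mkspecial(pte_t pte)
	{
		return pte;
	}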
      
      BTW:
      I haven't put vm_normal_page() into arch code as-per an earlier suggestion.
      The reason is that, regardless of where vm_normal_page is actually
      implemented, the *abstraction* is still exactly the same. Also, while it
depends on whether the architecture has pte_special or not, those are the
only two possible cases, and it really isn't an arch specific function --
      the role of the arch code should be to provide primitive functions and
      accessors with which to build the core code; pte_special does that. We do
      not want architectures to know or care about vm_normal_page itself, and
      we definitely don't want them being able to invent something new there
      out of sight of mm/ code. If we made vm_normal_page an arch function, then
      we have to make vm_insert_mixed (next patch) an arch function too. So I
      don't think moving it to arch code fundamentally improves any abstractions,
      while it does practically make the code more difficult to follow, for both
      mm and arch developers, and easier to misuse.
      
      [akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Carsten Otte <cotte@de.ibm.com>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
18. 17 October 2007, 1 commit
19. 18 July 2007, 1 commit
20. 17 July 2007, 1 commit
21. 17 June 2007, 1 commit
22. 14 June 2007, 2 commits
• [POWERPC] Start factoring pgtable-ppc32.h and pgtable-ppc64.h · 9c709f3b
Committed by David Gibson
      This factors some things defined in both pgtable-ppc32.h and
      pgtable-ppc64.h into the common part of asm-powerpc/pgtable.h.  These
      are all things which have essentially identical definitions, and which
      by their nature are very unlikely ever to need different definitions
      in the two cases.
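For instance, accessors of this shape were textually identical in both headers and can move to the common asm-powerpc/pgtable.h (illustrative picks, not the commit's exact list):

	static inline int pte_write(pte_t pte)	{ return pte_val(pte) & _PAGE_RW; }
	static inline int pte_dirty(pte_t pte)	{ return pte_val(pte) & _PAGE_DIRTY; }
	static inline int pte_young(pte_t pte)	{ return pte_val(pte) & _PAGE_ACCESSED; }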
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
• [POWERPC] Rewrite IO allocation & mapping on powerpc64 · 3d5134ee
Committed by Benjamin Herrenschmidt
      This rewrites pretty much from scratch the handling of MMIO and PIO
      space allocations on powerpc64.  The main goals are:
      
       - Get rid of imalloc and use more common code where possible
       - Simplify the current mess so that PIO space is allocated and
         mapped in a single place for PCI bridges
       - Handle allocation constraints of PIO for all bridges including
         hot plugged ones within the 2GB space reserved for IO ports,
         so that devices on hotplugged busses will now work with drivers
         that assume IO ports fit in an int.
       - Cleanup and separate tracking of the ISA space in the reserved
         low 64K of IO space. No ISA -> Nothing mapped there.
      
      I booted a cell blade with IDE on PIO and MMIO and a dual G5 so
      far, that's it :-)
      
      With this patch, all allocations are done using the code in
      mm/vmalloc.c, though we use the low level __get_vm_area with
      explicit start/stop constraints in order to manage separate
      areas for vmalloc/vmap, ioremap, and PCI IOs.
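Roughly (a sketch; __get_vm_area() does take explicit start/end bounds, while the PHB_IO_* limits and the error handling shown here are illustrative):

	/* Carve one bridge's IO window out of the dedicated PCI IO region */
	struct vm_struct *area;

	area = __get_vm_area(size, VM_IOREMAP, PHB_IO_BASE, PHB_IO_END);
	if (area == NULL)
		return -ENOMEM;
	/* area->addr then gets mapped to the bridge's IO space */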
      
      This greatly simplifies a lot of things, as you can see in the
      diffstat of that patch :-)
      
      A new pair of functions pcibios_map/unmap_io_space() now replace
      all of the previous code that used to manipulate PCI IOs space.
      The allocation is done at mapping time, which is now called from
      scan_phb's, just before the devices are probed (instead of after,
      which is by itself a bug fix). The only other caller is the PCI
      hotplug code for hot adding PCI-PCI bridges (slots).
      
imalloc is gone, as is the "sub-allocation" thing, but I do believe
      that hotplug should still work in the sense that the space allocation
      is always done by the PHB, but if you unmap a child bus of this PHB
      (which seems to be possible), then the code should properly tear
      down all the HPTE mappings for that area of the PHB allocated IO space.
      
      I now always reserve the first 64K of IO space for the bridge with
the ISA bus on it. I have moved the code for tracking ISA into a separate
file, which should also make it smarter if we ever are capable of
      hot unplugging or re-plugging an ISA bridge.
      
      This should have a side effect on platforms like powermac where VGA IOs
      will no longer work. This is done on purpose though as they would have
      worked semi-randomly before. The idea at this point is to isolate drivers
      that might need to access those and fix them by providing a proper
      function to obtain an offset to the legacy IOs of a given bus.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
23. 02 May 2007, 1 commit
• [POWERPC] Remove arch/powerpc's dependence on asm-ppc/pg{alloc,table}.h · f88df14b
Committed by David Gibson
      Currently, all 32-bit powerpc platforms use asm-ppc/pgtable.h and
      asm-ppc/pgalloc.h, even when otherwise compiled with ARCH=powerpc.
      Those asm-ppc files are a fairly nasty tangle of #ifdefs including a
      bunch of things which shouldn't be necessary any more in arch/powerpc.
      
      Cleaning up that mess is going to take a while, but this patch is a
      first step.  It separates the asm-powerpc/pg{alloc,table}.h into 64
      bit and 32 bit versions in asm-powerpc, which the basic .h files in
      asm-powerpc select based on config.  We make a few tiny tweaks to the
      innards of the files along the way, making the outermost ifdefs
      (double-inclusion protection and __KERNEL__) a little cleaner, and
      #including asm-generic/pgtable.h from the top-level
      asm-powerpc/pgtable.h (since both the old 32-bit and 64-bit versions
      ended with such an #include).
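The top-level dispatch then looks something like this sketch:

	/* asm-powerpc/pgtable.h */
	#ifndef _ASM_POWERPC_PGTABLE_H
	#define _ASM_POWERPC_PGTABLE_H
	#ifdef __KERNEL__

	#if defined(CONFIG_PPC64)
	#include <asm/pgtable-ppc64.h>
	#else
	#include <asm/pgtable-ppc32.h>
	#endif

	#include <asm-generic/pgtable.h>

	#endif /* __KERNEL__ */
	#endif /* _ASM_POWERPC_PGTABLE_H */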
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
24. 24 April 2007, 1 commit
• [POWERPC] Cleanup and fix breakage in tlbflush.h · 62102307
Committed by David Gibson
      BenH's commit a741e679 in powerpc.git,
      although (AFAICT) only intended to affect ppc64, also has side-effects
      which break 44x.  I think 40x, 8xx and Freescale Book E are also
      affected, though I haven't tested them.
      
      The problem lies in unconditionally removing flush_tlb_pending() from
      the versions of flush_tlb_mm(), flush_tlb_range() and
      flush_tlb_kernel_range() used on ppc64 - which are also used the
      embedded platforms mentioned above.
      
The patch below cleans up the convoluted #ifdef logic in tlbflush.h,
in the process restoring the necessary flushes for the software TLB
platforms.  There are three sets of definitions for the flushing
hooks: the software TLB versions (revised to avoid using names which
appear to relate to TLB batching), the 32-bit hash based versions
(external functions) and the 64-bit hash based versions (which
implement batching).
      
      It also moves the declaration of update_mmu_cache() to always be in
      tlbflush.h (previously it was in tlbflush.h except for PPC64, where it
      was in pgtable.h).
      
      Booted on Ebony (440GP) and compiled for 64-bit and 32-bit
      multiplatform.
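The resulting structure is roughly this shape (a sketch showing one representative hook; the exact config conditions and function lists are assumptions):

	#if defined(CONFIG_4xx) || defined(CONFIG_8xx) || defined(CONFIG_FSL_BOOKE)
	/* Software-loaded TLB: the hooks must really invalidate entries */
	static inline void flush_tlb_mm(struct mm_struct *mm)
	{
		_tlbia();
	}
	#elif defined(CONFIG_PPC32)
	/* 32-bit hash MMU: out-of-line flush functions */
	extern void flush_tlb_mm(struct mm_struct *mm);
	#else
	/* 64-bit hash MMU: flushing is handled by the batching machinery */
	static inline void flush_tlb_mm(struct mm_struct *mm)
	{
	}
	#endif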
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
25. 13 April 2007, 1 commit
• [POWERPC] Make tlb flush batch use lazy MMU mode · a741e679
Committed by Benjamin Herrenschmidt
The current tlb flush code on powerpc 64 bits has a subtle race since we
lost the page table lock due to the possible faulting in of new PTEs
after a previous one has been removed but before the corresponding hash
entry has been evicted, which can lead to all sorts of fatal problems.
      
      This patch reworks the batch code completely. It doesn't use the mmu_gather
      stuff anymore. Instead, we use the lazy mmu hooks that were added by the
      paravirt code. They have the nice property that the enter/leave lazy mmu
      mode pair is always fully contained by the PTE lock for a given range
      of PTEs. Thus we can guarantee that all batches are flushed on a given
      CPU before it drops that lock.
      
      We also generalize batching for any PTE update that require a flush.
      
Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and
disabled by arch_leave_lazy_mmu_mode(). The code expects that this is
always contained within a PTE lock section, so no preemption can happen
and no PTE insertion in that range can come from another CPU. When batching
is enabled on a CPU, every PTE update that needs a hash flush will
use the batch for that flush.
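A sketch of the enable/disable pair (arch_enter/leave_lazy_mmu_mode are the real hook names; the batch structure fields shown are assumptions):

	static inline void arch_enter_lazy_mmu_mode(void)
	{
		struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);

		batch->active = 1;	/* hash flushes now accumulate here */
	}

	static inline void arch_leave_lazy_mmu_mode(void)
	{
		struct ppc64_tlb_batch *batch = &__get_cpu_var(ppc64_tlb_batch);

		if (batch->index)
			__flush_tlb_pending(batch); /* drain before the PTE lock drops */
		batch->active = 0;
	}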
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
26. 26 September 2006, 1 commit
• [PATCH] Standardize pxx_page macros · 46a82b2d
Committed by Dave McCracken
      One of the changes necessary for shared page tables is to standardize the
      pxx_page macros.  pte_page and pmd_page have always returned the struct
      page associated with their entry, while pte_page_kernel and pmd_page_kernel
      have returned the kernel virtual address.  pud_page and pgd_page, on the
      other hand, return the kernel virtual address.
      
      Shared page tables needs pud_page and pgd_page to return the actual page
      structures.  There are very few actual users of these functions, so it is
      simple to standardize their usage.
      
      Since this is basic cleanup, I am submitting these changes as a standalone
      patch.  Per Hugh Dickins' comments about it, I am also changing the
      pxx_page_kernel macros to pxx_page_vaddr to clarify their meaning.
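On powerpc64 the rename comes out roughly as (a sketch; mask names vary per architecture):

	/* Kernel-virtual-address accessors, renamed from pxx_page_kernel */
	#define pmd_page_vaddr(pmd)	(pmd_val(pmd) & ~PMD_MASKED_BITS)
	#define pud_page_vaddr(pud)	(pud_val(pud) & ~PUD_MASKED_BITS)

	/* pud_page/pgd_page now return the struct page, like pte_page/pmd_page */
	#define pud_page(pud)		virt_to_page(pud_page_vaddr(pud))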
Signed-off-by: Dave McCracken <dmccr@us.ibm.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
27. 15 June 2006, 1 commit
• powerpc: Use 64k pages without needing cache-inhibited large pages · bf72aeba
Committed by Paul Mackerras
      Some POWER5+ machines can do 64k hardware pages for normal memory but
      not for cache-inhibited pages.  This patch lets us use 64k hardware
      pages for most user processes on such machines (assuming the kernel
      has been configured with CONFIG_PPC_64K_PAGES=y).  User processes
      start out using 64k pages and get switched to 4k pages if they use any
      non-cacheable mappings.
      
      With this, we use 64k pages for the vmalloc region and 4k pages for
      the imalloc region.  If anything creates a non-cacheable mapping in
      the vmalloc region, the vmalloc region will get switched to 4k pages.
      I don't know of any driver other than the DRM that would do this,
      though, and these machines don't have AGP.
      
      When a region gets switched from 64k pages to 4k pages, we do not have
      to clear out all the 64k HPTEs from the hash table immediately.  We
      use the _PAGE_COMBO bit in the Linux PTE to indicate whether the page
      was hashed in as a 64k page or a set of 4k pages.  If hash_page is
      trying to insert a 4k page for a Linux PTE and it sees that it has
      already been inserted as a 64k page, it first invalidates the 64k HPTE
      before inserting the 4k HPTE.  The hash invalidation routines also use
      the _PAGE_COMBO bit, to determine whether to look for a 64k HPTE or a
      set of 4k HPTEs to remove.  With those two changes, we can tolerate a
      mix of 4k and 64k HPTEs in the hash table, and they will all get
      removed when the address space is torn down.
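A heavily simplified sketch of the hash_page-side check (the variables and the flush call signature here are illustrative, not the committed code):

	unsigned long old = pte_val(*ptep);

	if ((old & _PAGE_HASHPTE) && !(old & _PAGE_COMBO)) {
		/* PTE was hashed as one 64k HPTE: invalidate that first */
		flush_hash_page(vaddr, rpte, MMU_PAGE_64K, local);
	}
	/* From now on this PTE is tracked as a set of 4k HPTEs */
	*ptep = __pte(old | _PAGE_COMBO);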
Signed-off-by: Paul Mackerras <paulus@samba.org>
28. 26 April 2006, 1 commit
29. 22 March 2006, 1 commit
• [PATCH] hugepage: Fix hugepage logic in free_pgtables() · 9da61aef
Committed by David Gibson
      free_pgtables() has special logic to call hugetlb_free_pgd_range() instead
      of the normal free_pgd_range() on hugepage VMAs.  However, the test it uses
      to do so is incorrect: it calls is_hugepage_only_range on a hugepage sized
      range at the start of the vma.  is_hugepage_only_range() will return true
      if the given range has any intersection with a hugepage address region, and
      in this case the given region need not be hugepage aligned.  So, for
      example, this test can return true if called on, say, a 4k VMA immediately
      preceding a (nicely aligned) hugepage VMA.
      
      At present we get away with this because the powerpc version of
      hugetlb_free_pgd_range() is just a call to free_pgd_range().  On ia64 (the
      only other arch with a non-trivial is_hugepage_only_range()) we get away
      with it for a different reason; the hugepage area is not contiguous with
      the rest of the user address space, and VMAs are not permitted in between,
      so the test can't return a false positive there.
      
      Nonetheless this should be fixed.  We do that in the patch below by
      replacing the is_hugepage_only_range() test with an explicit test of the
      VMA using is_vm_hugetlb_page().
      
      This in turn changes behaviour for platforms where is_hugepage_only_range()
      returns false always (everything except powerpc and ia64).  We address this
      by ensuring that hugetlb_free_pgd_range() is defined to be identical to
      free_pgd_range() (instead of a no-op) on everything except ia64.  Even so,
      it will prevent some otherwise possible coalescing of calls down to
      free_pgd_range().  Since this only happens for hugepage VMAs, removing this
      small optimization seems unlikely to cause any trouble.
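In free_pgtables() the dispatch presumably becomes (a sketch; the surrounding loop and argument plumbing are elided):

	/* Per-VMA choice of page table freeing routine */
	if (is_vm_hugetlb_page(vma))
		hugetlb_free_pgd_range(tlb, addr, vma->vm_end,
				       floor, next ? next->vm_start : ceiling);
	else
		free_pgd_range(tlb, addr, vma->vm_end,
			       floor, next ? next->vm_start : ceiling);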
      
      This patch causes no regressions on the libhugetlbfs testsuite - ppc64
      POWER5 (8-way), ppc64 G5 (2-way) and i386 Pentium M (UP).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
30. 17 March 2006, 1 commit
31. 09 January 2006, 2 commits
• [PATCH] powerpc: sanitize header files for user space includes · 88ced031
Committed by Arnd Bergmann
      include/asm-ppc/ had #ifdef __KERNEL__ in all header files that
      are not meant for use by user space, include/asm-powerpc does
      not have this yet.
      
This patch gets us a lot closer to that. There are a few cases
where I was not sure, so I left them out. I have verified
      that no CONFIG_* symbols are used outside of __KERNEL__
      any more and that there are no obvious compile errors when
      including any of the headers in user space libraries.
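The guard pattern being added is simply this (a sketch on a hypothetical header):

	/* include/asm-powerpc/example.h -- "example.h" is hypothetical */
	#ifndef _ASM_POWERPC_EXAMPLE_H
	#define _ASM_POWERPC_EXAMPLE_H

	/* definitions exported to user space stay above */

	#ifdef __KERNEL__
	/* CONFIG_*-dependent and kernel-internal definitions only below */
	#endif /* __KERNEL__ */

	#endif /* _ASM_POWERPC_EXAMPLE_H */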
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
• [PATCH] powerpc: Replace VMALLOCBASE with VMALLOC_START · 14c89e7f
Committed by David Gibson
      On ppc64, we independently define VMALLOCBASE and VMALLOC_START to be
      the same thing: the start of the vmalloc() area at 0xd000000000000000.
      VMALLOC_START is used much more widely, including in generic code, so
      this patch gets rid of the extraneous VMALLOCBASE.
      
      This does require moving the definitions of region IDs from page_64.h
      to pgtable.h, but they don't clearly belong in the former rather than
the latter, anyway.  While we're moving them, clean up the definitions
of the REGION_IDs (see the sketch after this list):
      	- Abolish REGION_SIZE, it was only used once, to define
      REGION_MASK anyway
      	- Define the specific region ids in terms of the REGION_ID()
      macro.
      	- Define KERNEL_REGION_ID in terms of PAGE_OFFSET rather than
      KERNELBASE.  It amounts to the same thing, but conceptually this is
      about the region of the linear mapping (which starts at PAGE_OFFSET)
      rather than of the kernel text itself (which is at KERNELBASE).
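A sketch of the cleaned-up definitions (the 0xc/0xd region values follow the standard ppc64 layout; exact spelling may differ from the commit):

	/* A region ID is the top 4 bits of a ppc64 effective address */
	#define REGION_SHIFT		60UL
	#define REGION_MASK		(0xfUL << REGION_SHIFT)
	#define REGION_ID(ea)		(((unsigned long)(ea)) >> REGION_SHIFT)

	#define VMALLOC_REGION_ID	(REGION_ID(VMALLOC_START))	/* 0xd */
	#define KERNEL_REGION_ID	(REGION_ID(PAGE_OFFSET))	/* 0xc */
	#define USER_REGION_ID		(0UL)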
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
32. 19 November 2005, 2 commits
33. 07 November 2005, 2 commits