1. 02 May 2007 (4 commits)
  2. 24 April 2007 (2 commits)
    • [POWERPC] Abolish PHYS_FMT macro from arch/powerpc · 37f01d64
      Committed by David Gibson
      32-bit powerpc systems define a macro, PHYS_FMT, giving a printf
      format string fragment for displaying physical addresses, since most
      32-bit powerpc platforms use 32-bit physical addresses but a few use
      64-bit physical addresses.
      
      This macro is used in exactly one place, a rare error message, where
      we can solve the problem more simply by just unconditionally casting
      the address up to a 64-bit quantity before formatting it.
      
      This patch does so, meaning that as we bring MMU definitions from
      asm-ppc over to asm-powerpc, cleaning them up in the process, we don't
      need to implement this ugly macro (which additionally has a very bad
      name for something global).
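      
      For illustration, a minimal standalone sketch of the idea (the types
      here are illustrative, not the kernel's; the point is that one cast
      plus "%llx" covers both 32-bit and 64-bit physical addresses):
      
      	#include <stdio.h>
      	#include <stdint.h>
      
      	/* On some platforms this would be 32-bit, on others 64-bit. */
      	typedef uint64_t phys_addr_t;
      
      	static void report_bad_page(phys_addr_t pa)
      	{
      		/* Widen unconditionally before formatting, so no per-platform
      		 * format-string fragment (like PHYS_FMT) is needed. */
      		printf("bad page at physical address 0x%llx\n",
      		       (unsigned long long)pa);
      	}
      
      	int main(void)
      	{
      		report_bad_page((phys_addr_t)0x1f0000000ULL);
      		return 0;
      	}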
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Cleanup and fix breakage in tlbflush.h · 62102307
      Committed by David Gibson
      BenH's commit a741e679 in powerpc.git,
      although (AFAICT) only intended to affect ppc64, also has side-effects
      which break 44x.  I think 40x, 8xx and Freescale Book E are also
      affected, though I haven't tested them.
      
      The problem lies in unconditionally removing flush_tlb_pending() from
      the versions of flush_tlb_mm(), flush_tlb_range() and
      flush_tlb_kernel_range() used on ppc64 - which are also used on the
      embedded platforms mentioned above.
      
      The patch below cleans up the convoluted #ifdef logic in tlbflush.h,
      in the process restoring the necessary flushes for the software TLB
      platforms.  There are three sets of definitions for the flushing
      hooks: the software TLB versions (revised to avoid using names which
      appear to be related to TLB batching), the 32-bit hash based versions
      (external functions) and the 64-bit hash based versions (which
      implement batching).
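      
      A rough sketch of the shape of that three-way split (config symbols
      and declarations are illustrative here, not the actual header
      contents):
      
      	#if defined(CONFIG_4xx) || defined(CONFIG_8xx) || defined(CONFIG_FSL_BOOKE)
      	/* software-loaded TLBs: flush directly, no batching involved */
      	extern void _tlbie(unsigned long address);
      	extern void _tlbia(void);
      	#elif defined(CONFIG_PPC64)
      	/* 64-bit hash MMU: flushes go through the per-CPU batch */
      	struct ppc64_tlb_batch;
      	extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
      	#else
      	/* 32-bit hash MMU: plain external functions */
      	extern void flush_tlb_mm(struct mm_struct *mm);
      	extern void flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
      	#endif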
      
      It also moves the declaration of update_mmu_cache() to always be in
      tlbflush.h (previously it was in tlbflush.h except for PPC64, where it
      was in pgtable.h).
      
      Booted on Ebony (440GP) and compiled for 64-bit and 32-bit
      multiplatform.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  3. 13 April 2007 (9 commits)
    • [POWERPC] DEBUG_PAGEALLOC for 64-bit · 370a908d
      Committed by Benjamin Herrenschmidt
      Here's an implementation of DEBUG_PAGEALLOC for 64-bit powerpc.
      It applies on top of the 32-bit patch.
      
      Unlike Anton's previous attempt, I'm not using updatepp. I'm removing
      the hash entries from the bolted mapping (using a map in RAM of all the
      slots). Expensive but it doesn't really matter, does it ? :-)
      
      Hot-added memory doesn't benefit from this unless it's added at an
      address below end_of_DRAM() as calculated at boot time.
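      
      For context, the hook an architecture provides for DEBUG_PAGEALLOC is
      kernel_map_pages(); schematically, the 64-bit hash version has to do
      something like the following (the helper names are illustrative):
      
      	void kernel_map_pages(struct page *page, int numpages, int enable)
      	{
      		unsigned long vaddr = (unsigned long)page_address(page);
      		int i;
      
      		for (i = 0; i < numpages; i++, vaddr += PAGE_SIZE) {
      			if (enable)
      				/* re-insert the hash PTE for this linear-mapping page */
      				kernel_map_linear_page(vaddr);		/* illustrative name */
      			else
      				/* evict the hash PTE, found via the RAM map of slots */
      				kernel_unmap_linear_page(vaddr);	/* illustrative name */
      		}
      	}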
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/Kconfig.debug      |    2
       arch/powerpc/mm/hash_utils_64.c |   84 ++++++++++++++++++++++++++++++++++++++--
       2 files changed, 82 insertions(+), 4 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] DEBUG_PAGEALLOC for 32-bit · 88df6e90
      Committed by Benjamin Herrenschmidt
      Here's an implementation of DEBUG_PAGEALLOC for ppc32. It disables BAT
      mapping and has only been tested on hash-table-based processors, though
      it shouldn't be too hard to adapt to others.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/Kconfig.debug       |    9 ++++++
       arch/powerpc/mm/init_32.c        |    4 +++
       arch/powerpc/mm/pgtable_32.c     |   52 +++++++++++++++++++++++++++++++++++++++
       arch/powerpc/mm/ppc_mmu_32.c     |    4 ++-
       include/asm-powerpc/cacheflush.h |    6 ++++
       5 files changed, 74 insertions(+), 1 deletion(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Fix 32-bit mm operations when not using BATs · ee4f2ea4
      Committed by Benjamin Herrenschmidt
      On hash-table-based 32-bit powerpcs, the hash management code runs with
      a big spinlock. It's thus important that it never causes itself a hash
      fault. That code is generally safe (it does memory accesses in real mode
      among other things) with the exception of the actual access to the code
      itself. That is, the kernel text needs to be accessible without taking
      a hash miss exception.
      
      This is currently guaranteed by having a BAT register mapping part of the
      linear mapping permanently, which includes the kernel text. But this is
      not true if using the "nobats" kernel command line option (which can be
      useful for debugging) and will not be true when using DEBUG_PAGEALLOC
      implemented in a subsequent patch.
      
      This patch fixes this by pre-faulting in the hash table pages that hit
      the kernel text, and making sure we never evict such a page under hash
      pressure.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/mm/hash_low_32.S |   22 ++++++++++++++++++++--
       arch/powerpc/mm/mem.c         |    3 ---
       arch/powerpc/mm/mmu_decl.h    |    4 ++++
       arch/powerpc/mm/pgtable_32.c  |   11 +++++++----
       4 files changed, 31 insertions(+), 9 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Cleanup 32-bit map_page · 3be4e699
      Committed by Benjamin Herrenschmidt
      The 32-bit map_page() function is used internally by the mm code
      for early MMU mappings and for ioremap. It should never be called
      for an address that already has a valid PTE or hash entry, so we
      add a BUG_ON for that and remove the useless flush_HPTE call.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      
       arch/powerpc/mm/pgtable_32.c |    9 ++++++---
       1 file changed, 6 insertions(+), 3 deletions(-)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Make tlb flush batch use lazy MMU mode · a741e679
      Committed by Benjamin Herrenschmidt
      The current tlb flush code on 64-bit powerpc has a subtle race: since we
      lost the page table lock, new PTEs can be faulted in after a previous one
      has been removed but before the corresponding hash entry has been
      evicted, which can lead to all sorts of fatal problems.
      
      This patch reworks the batch code completely. It doesn't use the mmu_gather
      stuff anymore. Instead, we use the lazy mmu hooks that were added by the
      paravirt code. They have the nice property that the enter/leave lazy mmu
      mode pair is always fully contained by the PTE lock for a given range
      of PTEs. Thus we can guarantee that all batches are flushed on a given
      CPU before it drops that lock.
      
      We also generalize batching for any PTE update that requires a flush.
      
      Batching is now enabled on a CPU by arch_enter_lazy_mmu_mode() and
      disabled by arch_leave_lazy_mmu_mode(). The code expects that this is
      always contained within a PTE lock section, so no preemption can happen
      and no PTE can be inserted in that range by another CPU. When batching
      is enabled on a CPU, every PTE update that needs a hash flush will
      use the batch for that flush.
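      
      Schematically, the contract described above looks like this (a sketch
      of the pattern, not a quote of the generic mm code):
      
      	static void update_ptes_in_range(struct mm_struct *mm, spinlock_t *ptl)
      	{
      		spin_lock(ptl);				/* PTE page lock for this range */
      		arch_enter_lazy_mmu_mode();		/* hash flushes start batching */
      
      		/* ... PTE updates that need a hash flush are queued into the
      		 *     per-CPU batch here ... */
      
      		arch_leave_lazy_mmu_mode();		/* batch flushed on this CPU */
      		spin_unlock(ptl);			/* safe: nothing left pending */
      	}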
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Allow drivers to map individual 4k pages to userspace · 721151d0
      Committed by Paul Mackerras
      Some drivers have resources that they want to be able to map into
      userspace that are 4k in size.  On a kernel configured with 64k pages
      we currently end up mapping the 4k we want plus another 60k of
      physical address space, which could contain anything.  This can
      introduce security problems, for example in the case of an infiniband
      adaptor where the other 60k could contain registers that some other
      program is using for its communications.
      
      This patch adds a new function, remap_4k_pfn, which drivers can use to
      map a single 4k page to userspace regardless of whether the kernel is
      using a 4k or a 64k page size.  Like remap_pfn_range, it would
      typically be called in a driver's mmap function.  It only maps a
      single 4k page, which on a 64k page kernel appears replicated 16 times
      throughout a 64k page.  On a 4k page kernel it reduces to a call to
      remap_pfn_range.
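      
      A hedged sketch of how a driver's mmap handler might use it (foo_device,
      its regs_phys field and the shift by 12 are assumptions for
      illustration; the argument order follows remap_pfn_range minus the
      size):
      
      	static int foo_mmap(struct file *file, struct vm_area_struct *vma)
      	{
      		struct foo_device *dev = file->private_data;	/* hypothetical driver state */
      		unsigned long pfn = dev->regs_phys >> 12;	/* 4k frame number of the registers */
      
      		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
      		/* Maps exactly one 4k page, even on a 64k-page kernel. */
      		return remap_4k_pfn(vma, vma->vm_start, pfn, vma->vm_page_prot);
      	}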
      
      The way this works on a 64k kernel is that a new bit, _PAGE_4K_PFN,
      gets set on the linux PTE.  This alters the way that __hash_page_4K
      computes the real address to put in the HPTE.  The RPN field of the
      linux PTE becomes the 4k RPN directly rather than being interpreted as
      a 64k RPN.  Since the RPN field is 32 bits, this means that physical
      addresses being mapped with remap_4k_pfn have to be below 2^44,
      i.e. 0x100000000000.
      
      The patch also factors out the code in arch/powerpc/mm/hash_utils_64.c
      that deals with demoting a process to use 4k pages into one function
      that gets called in the various different places where we need to do
      that.  There were some discrepancies between exactly what was done in
      the various places, such as a call to spu_flush_all_slbs in one case
      but not in others.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Rename prom_n_size_cells to of_n_size_cells · 9213feea
      Committed by Stephen Rothwell
      This is more consistent and gets us closer to the Sparc code.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Rename prom_n_addr_cells to of_n_addr_cells · a8bda5dd
      Committed by Stephen Rothwell
      This is more consistent and gets us closer to the Sparc code.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  4. 10 March 2007 (1 commit)
    • [POWERPC] Fix spu SLB invalidations · 94b2a439
      Committed by Benjamin Herrenschmidt
      The SPU code doesn't properly invalidate SPUs' SLBs when necessary,
      for example when changing a segment size from the hugetlbfs code. In
      addition, it saves and restores the SLB content on context switches,
      which makes it harder to properly handle those invalidations.
      
      This patch removes the saving & restoring for now; something more
      efficient might be found later on. It also adds a spu_flush_all_slbs(mm)
      that can be used by the core mm code to flush the SLBs of all SPEs that
      are running a given mm at the time of the flush.
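      
      A sketch of the kind of call site this enables in core mm code (the
      surrounding function is made up; only spu_flush_all_slbs(mm) comes
      from this patch):
      
      	static void segment_geometry_changed(struct mm_struct *mm)
      	{
      		/* ... CPU-side SLB/hash invalidation for this mm ... */
      	#ifdef CONFIG_SPU_BASE
      		/* also drop the SLBs of every SPE currently running this mm */
      		spu_flush_all_slbs(mm);
      	#endif
      	}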
      
      In order to do that, it adds a spinlock to the list of all SPEs and moves
      some bits & pieces from spufs to spu_base.c.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 08 March 2007 (1 commit)
    • [POWERPC] Allow duplicate lmb_reserve() calls · eb6de286
      Committed by David Gibson
      At present calling lmb_reserve() (and hence lmb_add_region()) twice
      for exactly the same memory region will cause strange behaviour.
      
      This makes life difficult when booting from a flat device tree with a
      memory reserve map.  Which regions are automatically reserved by the
      kernel has changed over time, so it's quite possible a newer kernel
      could attempt to auto-reserve a region which is also explicitly listed
      in the device tree's reserve map, leading to trouble.
      
      This patch avoids the problem by making lmb_reserve() ignore a call to
      reserve a previously reserved region.  It also removes a now redundant
      test designed to avoid one specific case of the problem noted above.
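      
      A hedged sketch of the added check (structure and field names are
      assumed for illustration):
      
      	static long lmb_add_region(struct lmb_region *rgn,
      				   unsigned long base, unsigned long size)
      	{
      		unsigned long i;
      
      		/* A reservation identical to an existing one is simply accepted. */
      		for (i = 0; i < rgn->cnt; i++)
      			if (rgn->region[i].base == base && rgn->region[i].size == size)
      				return 0;
      
      		/* ... otherwise fall through to the usual coalesce/insert logic ... */
      		return 0;
      	}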
      
      At present, this patch deals only with duplicate reservations of an
      identical region.  Attempting to reserve two different, but
      overlapping regions will still cause problems.  I might post another
      patch later dealing with this case, but I'm avoiding it now since it
      is substantially more complicated to deal with, less likely to occur
      and more likely to indicate a genuine bug elsewhere if it does occur.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  6. 18 February 2007 (1 commit)
  7. 16 February 2007 (1 commit)
  8. 13 February 2007 (1 commit)
    • [POWERPC] Fix vDSO page count calculation · 7ac9a137
      Committed by Benjamin Herrenschmidt
      The recent vDSO consolidation patches broke powerpc due to a mistake
      in the definition of the MAXPAGES constants. This fixes it by moving to
      a dynamically allocated array of pages instead, as I don't much like
      hard-coded size limits. It also moves the vdso initialisation to an
      initcall since it doesn't really need to be done -that- early.
      
      Apologies for not catching the breakage earlier; Roland _did_ CC me on
      his patches a while ago, but I got busy with other things and forgot to
      test them.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  9. 09 February 2007 (1 commit)
    • [POWERPC] Fix is_power_of_4(x) compile error · 8dabba5d
      Committed by Kumar Gala
      When building an 85xx kernel we get:
      
        CC      arch/powerpc/mm/pgtable_32.o
      arch/powerpc/mm/pgtable_32.c: In function 'io_block_mapping':
      arch/powerpc/mm/pgtable_32.c:330: error: expected identifier before '(' token
      arch/powerpc/mm/pgtable_32.c:330: error: expected statement before ')' token
      
      The is_power_of_2(x) fixup patch left an extra ')' on the is_power_of_4 macro.
      There is a similar issue on the arch/ppc side.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  10. 08 February 2007 (1 commit)
  11. 07 February 2007 (2 commits)
  12. 24 January 2007 (1 commit)
  13. 09 January 2007 (1 commit)
    • [POWERPC] Fix bogus BUG_ON() in hugetlb_get_unmapped_area() · 6aa3e1e9
      Committed by David Gibson
      The powerpc specific version of hugetlb_get_unmapped_area() makes some
      unwarranted assumptions about what checks have been made to its
      parameters by its callers.  This will lead to a BUG_ON() if a 32-bit
      process attempts to make a hugepage mapping which extends above
      TASK_SIZE (4GB).
      
      I'm not sure if these assumptions came about because they were valid
      with earlier versions of the get_unmapped_area() path, or if it was
      always broken.  Nonetheless this patch fixes the logic, and removes
      the crash.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  14. 14 December 2006 (1 commit)
    • [PATCH] getting rid of all casts of k[cmz]alloc() calls · 5cbded58
      Committed by Robert P. J. Day
      Run this:
      
      	#!/bin/sh
      	for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
      	  echo "De-casting $f..."
      	  perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
      	done
      
      And then go through and reinstate those cases where code is casting pointers
      to non-pointers.
      
      And then drop a few hunks which conflicted with outstanding work.
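      
      As a before/after illustration of the rewrite the script performs
      (struct foo is a made-up example type):
      
      	#include <linux/slab.h>
      
      	struct foo { int x; };
      
      	static struct foo *alloc_foo(void)
      	{
      		/* before: struct foo *p = (struct foo *)kzalloc(sizeof(*p), GFP_KERNEL); */
      		struct foo *p = kzalloc(sizeof(*p), GFP_KERNEL);	/* after: no cast needed */
      		return p;
      	}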
      
      Cc: Russell King <rmk@arm.linux.org.uk>, Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Paul Fulghum <paulkf@microgate.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  15. 11 December 2006 (1 commit)
    • [POWERPC] Support ibm,dynamic-reconfiguration-memory nodes · 0204568a
      Committed by Paul Mackerras
      For PAPR partitions with large amounts of memory, the firmware has an
      alternative, more compact representation for the information about the
      memory in the partition and its NUMA associativity information.  This
      adds the code to the kernel to parse this alternative representation.
      
      The other part of this patch is telling the firmware that we can
      handle the alternative representation.  There is however a subtlety
      here, because the firmware will invoke a reboot if the memory
      representation we request is different from the representation that
      firmware is currently using.  This is because firmware can't change
      the representation on the fly.  Further, some firmware versions used
      on POWER5+ machines have a bug where this reboot leaves the machine
      with an altered value of load-base, which will prevent any kernel
      booting until it is reset to the normal value (0x4000).  Because of
      this bug, we do NOT set fake_elf.rpanote.new_mem_def = 1, and thus we
      do not request the new representation on POWER5+ and earlier machines.
      We do request the new representation on POWER6, which uses the
      ibm,client-architecture-support call.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  16. 08 December 2006 (2 commits)
    • [PATCH] slab: remove kmem_cache_t · e18b890b
      Committed by Christoph Lameter
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this
      
      	sh replace kmem_cache_t "struct kmem_cache"
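      
      The net effect on a typical declaration (foo_cachep is a made-up name;
      the two lines show before and after):
      
      	kmem_cache_t *foo_cachep;		/* before: the old typedef */
      	struct kmem_cache *foo_cachep;		/* after: the struct named directly */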
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] shared page table for hugetlb page · 39dde65c
      Committed by Chen, Kenneth W
      Following up on the shared page table work done by Dave McCracken, this
      set of patches targets shared page tables for hugetlb memory only.
      
      Shared page tables are particularly useful when a large number of
      independent processes share large shared memory segments.  In the normal
      page case, the amount of memory saved from the processes' page tables is
      quite significant.  For hugetlb, the saving on page table memory is not
      the primary objective (hugetlb itself already cuts down page table
      overhead significantly); instead, the purpose of using shared page
      tables with hugetlb is to allow faster TLB refill and less cache
      pollution on a TLB miss.
      
      With PT sharing, pte entries are shared among hundreds of processes, so
      the cache footprint of all the page tables is smaller and, in return,
      the application gets a much higher cache hit ratio.  Another effect is
      that the hardware page walker is more likely to find the pte in cache,
      which helps reduce TLB miss latency.  These two effects contribute to
      higher application performance.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Dave McCracken <dmccr@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  17. 04 December 2006 (7 commits)
  18. 15 November 2006 (1 commit)
    • [PATCH] hugetlb: prepare_hugepage_range check offset too · 68589bc3
      Committed by Hugh Dickins
      (David:)
      
      If hugetlbfs_file_mmap() returns a failure to do_mmap_pgoff() - for example,
      because the given file offset is not hugepage aligned - then do_mmap_pgoff
      will go to the unmap_and_free_vma backout path.
      
      But at this stage the vma hasn't been marked as hugepage, and the backout path
      will call unmap_region() on it.  That will eventually call down to the
      non-hugepage version of unmap_page_range().  On ppc64, at least, that will
      cause serious problems if there are any existing hugepage pagetable entries in
      the vicinity - for example if there are any other hugepage mappings under the
      same PUD.  unmap_page_range() will trigger a bad_pud() on the hugepage pud
      entries.  I suspect this will also cause bad problems on ia64, though I don't
      have a machine to test it on.
      
      (Hugh:)
      
      prepare_hugepage_range() should check file offset alignment when it checks
      virtual address and length, to stop MAP_FIXED with a bad huge offset from
      unmapping before it fails further down.  PowerPC should apply the same
      prepare_hugepage_range alignment checks as ia64 and all the others do.
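      
      A sketch of what the generic check looks like once the offset is
      included (treat this as illustrative rather than a quote of the final
      code):
      
      	int prepare_hugepage_range(unsigned long addr, unsigned long len,
      				   pgoff_t pgoff)
      	{
      		if (pgoff & (~HPAGE_MASK >> PAGE_SHIFT))
      			return -EINVAL;		/* file offset not hugepage aligned */
      		if (addr & ~HPAGE_MASK)
      			return -EINVAL;		/* address not hugepage aligned */
      		if (len & ~HPAGE_MASK)
      			return -EINVAL;		/* length not a hugepage multiple */
      		return 0;
      	}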
      
      Then none of the alignment checks in hugetlbfs_file_mmap are required (nor
      is the check for too small a mapping); but even so, move up setting of
      VM_HUGETLB and add a comment to warn of what David Gibson discovered - if
      hugetlbfs_file_mmap fails before setting it, do_mmap_pgoff's unmap_region
      when unwinding from error will go the non-huge way, which may cause bad
      behaviour on architectures (powerpc and ia64) which segregate their huge
      mappings into a separate region of the address space.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 13 November 2006 (1 commit)
  20. 01 November 2006 (1 commit)