1. 30 Oct 2005, 1 commit
    • [PATCH] mm: init_mm without ptlock · 872fec16
      Authored by Hugh Dickins
      First step in pushing down the page_table_lock.  init_mm.page_table_lock has
      been used throughout the architectures (usually for ioremap): not to serialize
      kernel address space allocation (that's usually vmlist_lock), but because
      pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
      
      Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
      architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
      and drop it when allocating a new one, to check lest a racing task already
      did.  Similarly no page_table_lock in vmalloc's map_vm_area.
      
      Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
      user mms, which are converted only by a later patch, for now they have to lock
      differently according to whether or not it's init_mm.
      
      If sources get muddled, there's a danger that an arch source taking
      init_mm.page_table_lock will be mixed with common source also taking it (or
      neither take it).  So break the rules and make another change, which should
      break the build for such a mismatch: remove the redundant mm arg from
      pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
      
      Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
      used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
      pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
      map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
      took page_table_lock for no good reason.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
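
      A minimal userspace sketch of the "allocate, then take the lock and
      re-check" pattern that pud_alloc, pmd_alloc and pte_alloc_kernel now
      apply internally; struct pmd_slot and pte_alloc_model are hypothetical
      stand-ins for illustration, not the kernel's code:

      #include <pthread.h>
      #include <stdlib.h>

      static pthread_mutex_t page_table_lock = PTHREAD_MUTEX_INITIALIZER;

      struct pmd_slot {
              void *pte;              /* NULL until a page table is installed */
      };

      /* The callee, not the caller, takes page_table_lock, and re-checks in
       * case a racing task installed a table while we were allocating ours. */
      static void *pte_alloc_model(struct pmd_slot *slot)
      {
              void *new = calloc(512, sizeof(unsigned long)); /* speculative */

              if (!new)
                      return NULL;

              pthread_mutex_lock(&page_table_lock);
              if (!slot->pte)
                      slot->pte = new;        /* we won the race: install it */
              else
                      free(new);              /* a racing task beat us: discard */
              pthread_mutex_unlock(&page_table_lock);

              return slot->pte;
      }
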
  2. 22 Oct 2005, 1 commit
    • [PATCH] ppc64: Fix pages marked dirty abusively · 5d965515
      Authored by Benjamin Herrenschmidt
      While working on 64K pages, I found this little buglet in our
      update_mmu_cache() implementation.
      
      The code calls __hash_page() passing it an "access" parameter (the type
      of access that triggers the hash) containing the bits _PAGE_RW and
      _PAGE_USER of the linux PTE.  The latter is useless in this case and the
      former is wrong.  In fact, if we have a writeable PTE and we pass
      _PAGE_RW to hash_page(), it will set _PAGE_DIRTY (since we track dirty
      that way, by hash faulting !dirty) which is not what we want.
      
      In fact, the correct fix is to always pass 0.  That means that only
      read-only or already-dirty read-write PTEs will be preloaded.  The
      (hopefully rare) case of a non-dirty read-write PTE can't be preloaded
      this way; it will have to fault in hash_page on the actual access.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
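
      A toy model of the problem described above; the flag values and the
      helper are illustrative, not the real ppc64 definitions. Passing
      _PAGE_RW as the "access" type makes a clean, writeable PTE come back
      dirty, while passing 0 leaves it clean:

      #include <stdio.h>

      #define _PAGE_RW     0x1        /* illustrative bit values */
      #define _PAGE_DIRTY  0x2

      /* Dirty is tracked by hash-faulting !dirty pages: a write access to a
       * writeable PTE is the moment the dirty bit gets set. */
      static unsigned long hash_page_model(unsigned long pte, unsigned long access)
      {
              if ((access & _PAGE_RW) && (pte & _PAGE_RW))
                      pte |= _PAGE_DIRTY;
              return pte;
      }

      int main(void)
      {
              unsigned long clean_rw_pte = _PAGE_RW;

              /* old preload, access = _PAGE_RW: PTE wrongly marked dirty */
              printf("access=_PAGE_RW -> dirty=%d\n",
                     !!(hash_page_model(clean_rw_pte, _PAGE_RW) & _PAGE_DIRTY));

              /* fixed preload, access = 0: PTE stays clean */
              printf("access=0        -> dirty=%d\n",
                     !!(hash_page_model(clean_rw_pte, 0) & _PAGE_DIRTY));
              return 0;
      }
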
  3. 28 Sep 2005, 1 commit
  4. 24 Sep 2005, 1 commit
    • [PATCH] ppc64: Fix huge pages MMU mapping bug · 67b10813
      Authored by Benjamin Herrenschmidt
      Current kernel has a couple of sneaky bugs in the ppc64 hugetlb code that
      cause huge pages to be potentially left stale in the hash table and TLBs
      (improperly invalidated), with all the nasty consequences that can have.
      
      One is that we forgot to set the "secondary" bit in the hash PTEs when
      hashing a huge page in the secondary bucket (fortunately very rare).
      
      The other one is on non-LPAR machines (like Apple G5s), flush_hash_range()
      which is used to flush a batch of PTEs simply did not work for huge pages.
      Historically, our huge page code didn't batch, but this was changed without
      fixing this routine.  This patch fixes both.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  5. 18 Sep 2005, 1 commit
  6. 12 Sep 2005, 1 commit
  7. 10 Sep 2005, 1 commit
  8. 09 Sep 2005, 1 commit
  9. 08 Sep 2005, 1 commit
  10. 06 Sep 2005, 2 commits
  11. 05 Sep 2005, 1 commit
    • [PATCH] SPARSEMEM EXTREME · 802f192e
      Authored by Bob Picco
      A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME.  Architecture
      platforms with a very sparse physical address space would likely want to
      select this option.  For those architecture platforms that don't select the
      option, the code generated is equivalent to SPARSEMEM currently in -mm.
      I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.
      
      ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
      pointers to mem_sections.  This two level layout scheme is able to achieve
      smaller memory requirements for SPARSEMEM with the tradeoff of an
      additional shift and load when fetching the memory section.  The current
      SPARSEMEM -mm implementation is a one dimensional array of mem_sections
      which is the default SPARSEMEM configuration.  The patch attempts isolates
      the implementation details of the physical layout of the sparsemem section
      array.
      
      ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.
      
      I've boot-tested ia64 configured for ARCH_SPARSEMEM_EXTREME under aim load.
      I've also boot-tested a 4-way Opteron machine with !ARCH_SPARSEMEM_EXTREME
      under aim.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Bob Picco <bob.picco@hp.com>
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
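
      A compressed sketch of the two lookup schemes; the sizes and names are
      illustrative rather than the kernel's actual constants. Classic
      SPARSEMEM indexes one flat array, while ARCH_SPARSEMEM_EXTREME goes
      through a root array of pointers and pays one extra shift and load:

      #include <stddef.h>

      struct mem_section { unsigned long section_mem_map; };

      #define SECTIONS_PER_ROOT  128   /* illustrative: one page of mem_sections */
      #define NR_SECTION_ROOTS   256
      #define NR_SECTIONS        (SECTIONS_PER_ROOT * NR_SECTION_ROOTS)

      /* classic SPARSEMEM: one statically sized flat array */
      static struct mem_section flat_sections[NR_SECTIONS];

      /* SPARSEMEM_EXTREME: roots point at dynamically allocated blocks of
       * mem_sections, so huge holes in the physical address space cost only
       * a NULL pointer instead of a block of unused mem_sections. */
      static struct mem_section *section_roots[NR_SECTION_ROOTS];

      static struct mem_section *lookup_flat(unsigned long nr)
      {
              return &flat_sections[nr];
      }

      static struct mem_section *lookup_extreme(unsigned long nr)
      {
              struct mem_section *root = section_roots[nr / SECTIONS_PER_ROOT];

              /* the extra load: fetch the root pointer, then index into it */
              return root ? &root[nr % SECTIONS_PER_ROOT] : NULL;
      }
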
  12. 02 Sep 2005, 1 commit
  13. 30 Aug 2005, 1 commit
    • [PATCH] Remove nested feature sections · 8913ca1c
      Authored by David Gibson
      The {BEGIN,END}_FTR_SECTION asm macros used in ppc64 to nop out
      sections of code at runtime cannot be nested.  However, we do nest
      them in hash_low.S.  We get away with it there, because there is
      nothing between the BEGIN markers for each section.  However, that's
      confusing to someone reading the code.
      
      This patch removes the nested ifset and ifclr feature sections,
      replacing them with a single feature section in the full mask/value
      form.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  14. 29 Aug 2005, 6 commits
    • [PATCH] Dynamic hugepage addresses for ppc64 · c594adad
      Authored by David Gibson
      Paulus, I think this is now a reasonable candidate for the post-2.6.13
      queue.
      
      Relax address restrictions for hugepages on ppc64
      
      Presently, 64-bit applications on ppc64 may only use hugepages in the
      address region from 1-1.5T.  Furthermore, if hugepages are enabled in
      the kernel config, they may only use hugepages and never normal pages
      in this area.  This patch relaxes this restriction, allowing any
      address to be used with hugepages, but with a 1TB granularity.  That
      is, if you map a hugepage anywhere in the region 1TB-2TB, that entire
      area will be reserved exclusively for hugepages for the remainder of
      the process's lifetime.  This works analogously to hugepages in 32-bit
      applications, where hugepages can be mapped anywhere, but with 256MB
      (mmu segment) granularity.
      
      This patch applies on top of the four level pagetable patch
      (http://patchwork.ozlabs.org/linuxppc64/patch?id=1936).
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
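
      A rough model of the bookkeeping this implies; the struct and helpers
      are hypothetical (the real patch keeps this state in the ppc64 mm
      context). Creating a hugepage mapping claims the whole containing 1TB
      slice for hugepages for the rest of the process's lifetime:

      #include <stdint.h>

      #define SLICE_SHIFT_1T  40              /* 1TB slices */

      struct htlb_state {
              uint64_t high_slices;           /* one bit per 1TB slice */
      };

      /* Called (in this model) when a hugepage mapping is created: reserve
       * the containing 1TB slice exclusively for hugepages from now on. */
      static void reserve_1tb_slice(struct htlb_state *s, uint64_t addr)
      {
              s->high_slices |= 1ULL << (addr >> SLICE_SHIFT_1T);
      }

      /* Normal-page faults would check this and refuse addresses that fall
       * inside a slice already claimed for hugepages. */
      static int slice_is_hugepage_only(const struct htlb_state *s, uint64_t addr)
      {
              return (s->high_slices >> (addr >> SLICE_SHIFT_1T)) & 1;
      }
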
    • [PATCH] ppc64: Remove physbase from the lmb_property struct · 180379dc
      Authored by Michael Ellerman
      We no longer need the lmb code to know about abs and phys addresses, so
      remove the physbase variable from the lmb_property struct.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [PATCH] ppc64: Remove redundant abs_to_phys() macro · e88bcd1b
      Authored by Michael Ellerman
      abs_to_phys() is a macro that turns out to do nothing, and also has the
      unfortunate property that it's not the inverse of phys_to_abs() on iSeries.
      
      The following is for my benefit as much as everyone else.
      
      With CONFIG_MSCHUNKS enabled, the lmb code is changed such that it keeps
      a physbase variable for each lmb region. This is used to take the possibly
      discontiguous lmb regions and present them as a contiguous address space
      beginning from zero.
      
      In this context each lmb region's base address is its "absolute" base
      address, and its physbase is its "physical" address (from Linux's point of
      view). The abs_to_phys() macro does the mapping from "absolute" to "physical".
      
      Note: This is not related to the iSeries mapping of physical to absolute
      (ie. Hypervisor) addresses which is maintained with the msChunks structure.
      And the msChunks structure is not controlled via CONFIG_MSCHUNKS.
      
      Once upon a time you could compile for non-iSeries with CONFIG_MSCHUNKS
      enabled. But these days CONFIG_MSCHUNKS depends on CONFIG_PPC_ISERIES, so
      for non-iSeries code abs_to_phys() is a no-op.
      
      On iSeries we always have one lmb region which spans from 0 to
      systemcfg->physicalMemorySize (arch/ppc64/kernel/iSeries_setup.c line 383).
      This region has a base (ie. absolute) address of 0, and a physbase address
      of 0 (as calculated in lmb_analyze() (arch/ppc64/kernel/lmb.c line 144)).
      
      On iSeries, abs_to_phys(aa) is defined as lmb_abs_to_phys(aa), which finds
      the lmb region containing aa (and there's only one, ie. 0), and then does:
      
       return lmb.memory.region[0].physbase + (aa - lmb.memory.region[0].base)
      
      physbase == base == 0, so you're left with "return aa".
      
      So remove abs_to_phys(), and lmb_abs_to_phys() which is the implementation
      of abs_to_phys() for iSeries.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [PATCH] ppc64: Remove redundant uses of physRpn_to_absRpn · aefd16b0
      Authored by Michael Ellerman
      physRpn_to_absRpn is a no-op on non-iSeries platforms, remove the two
      redundant calls.
      
      There's only one caller on iSeries so fold the logic in there so we can get
      rid of it completely.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [PATCH] ppc64: move iSeries vio iommu init · 8c65b5c9
      Authored by Stephen Rothwell
      Since the iSeries vio iommu tables cannot be used until after the vio bus has
      been initialised, move the initialisation of the tables there.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [PATCH] Four level pagetables for ppc64 · e28f7faf
      Authored by David Gibson
      Implement 4-level pagetables for ppc64
      
      This patch implements full four-level page tables for ppc64, thereby
      extending the usable user address range to 44 bits (16T).
      
      The patch uses a full page for the tables at the bottom and top level,
      and a quarter page for the intermediate levels.  It uses full 64-bit
      pointers at every level, thus also increasing the addressable range of
      physical memory.  This patch also tweaks the VSID allocation to allow
      matching range for user addresses (this halves the number of available
      contexts) and adds some #if and BUILD_BUG sanity checks.
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
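
      An illustrative split of a 44-bit address over the four levels, using
      the sizes described above (a full 4K page of 8-byte entries gives 9
      bits at the top and bottom, a quarter page gives 7 bits at each middle
      level, plus a 12-bit page offset); the real ppc64 names and shift
      values may differ:

      #include <stdint.h>

      #define PAGE_SHIFT   12                 /* 4K pages */
      #define PTE_BITS      9                 /* full page of PTEs -> 512 entries */
      #define PMD_BITS      7                 /* quarter page of pointers -> 128  */
      #define PUD_BITS      7
      #define PGD_BITS      9                 /* full page of pointers -> 512     */

      /* 12 + 9 + 7 + 7 + 9 = 44 usable address bits (16TB) */
      #define PMD_SHIFT    (PAGE_SHIFT + PTE_BITS)
      #define PUD_SHIFT    (PMD_SHIFT + PMD_BITS)
      #define PGD_SHIFT    (PUD_SHIFT + PUD_BITS)

      static unsigned long pgd_index(uint64_t ea) { return (ea >> PGD_SHIFT) & ((1 << PGD_BITS) - 1); }
      static unsigned long pud_index(uint64_t ea) { return (ea >> PUD_SHIFT) & ((1 << PUD_BITS) - 1); }
      static unsigned long pmd_index(uint64_t ea) { return (ea >> PMD_SHIFT) & ((1 << PMD_BITS) - 1); }
      static unsigned long pte_index(uint64_t ea) { return (ea >> PAGE_SHIFT) & ((1 << PTE_BITS) - 1); }
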
  15. 02 Aug 2005, 1 commit
  16. 28 Jul 2005, 1 commit
    • [PATCH] ppc64: dynamically allocate segment tables · 533f0817
      Authored by David Gibson
      PPC64 machines before Power4 need a segment table page allocated for each
      CPU.  Currently these are allocated statically in a big array in head.S for
      all CPUs.  The segment tables need to be in the first segment (so
      do_stab_bolted doesn't take a recursive fault on the stab itself), but
      other than that there are no constraints which require the stabs for the
      secondary CPUs to be statically allocated.
      
      This patch allocates segment tables dynamically during boot, using
      lmb_alloc() to ensure they are within the first 256M segment.  This reduces
      the kernel image size by 192k...
      
      Tested on RS64 iSeries, POWER3 pSeries, and POWER5.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  17. 14 Jul 2005, 1 commit
  18. 26 Jun 2005, 1 commit
  19. 24 Jun 2005, 3 commits
  20. 22 Jun 2005, 4 commits
    • [PATCH] ppc64: Mark kernel hptes dirty · 515bae9c
      Authored by Anton Blanchard
      We don't use the hardware referenced and changed bits, and setting them early
      avoids a store to memory.  We already do this for userspace hptes but not
      kernel ones.  Do it.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ppc64: Abolish ioremap_mm · 20cee16c
      Authored by David Gibson
      Currently ppc64 has two mm_structs for the kernel, init_mm and also
      ioremap_mm.  The latter really isn't necessary: this patch abolishes it,
      instead restricting vmallocs to the lower 1TB of the init_mm's range and
      placing io mappings in the upper 1TB.  This simplifies the code in a number
      of places and eliminates an unnecessary set of pagetables.  It also tweaks
      the unmap/free path a little, allowing us to remove the unmap_im_area() set
      of page table walkers, replacing them with unmap_vm_area().
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
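
      A sketch of the resulting single-mm layout; the constants are
      illustrative rather than the exact ppc64 definitions. Vmalloc takes the
      lower 1TB of the kernel's mapped region and io mappings the 1TB above
      it, so one set of init_mm page tables covers both:

      /* Illustrative layout after the patch: one init_mm, carved into a
       * vmalloc area and an imalloc (ioremap) area, instead of a second
       * ioremap_mm with its own page tables. */
      #define KERNEL_REGION_BASE  0xd000000000000000UL   /* illustrative */
      #define ONE_TB              (1UL << 40)

      #define VMALLOC_START       KERNEL_REGION_BASE
      #define VMALLOC_END         (VMALLOC_START + ONE_TB)  /* lower 1TB: vmalloc */
      #define IMALLOC_START       VMALLOC_END
      #define IMALLOC_END         (IMALLOC_START + ONE_TB)  /* upper 1TB: ioremap */

      static int addr_is_vmalloc(unsigned long addr)
      {
              return addr >= VMALLOC_START && addr < VMALLOC_END;
      }

      static int addr_is_ioremap(unsigned long addr)
      {
              return addr >= IMALLOC_START && addr < IMALLOC_END;
      }
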
    • [PATCH] Avoiding mmap fragmentation · 1363c3cd
      Authored by Wolfgang Wander
      Ingo recently introduced a great speedup for allocating new mmaps using the
      free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
      causes huge performance increases in thread creation.
      
      The downside of this patch is that it does lead to fragmentation in the
      mmap-ed areas (visible via /proc/self/maps), such that some applications
      that work fine under 2.4 kernels quickly run out of memory on any 2.6
      kernel.
      
      The problem is twofold:
      
        1) the free_area_cache is used to continue a search for memory where
           the last search ended.  Before the change new areas were always
           searched from the base address on.
      
           So now new small areas are cluttering holes of all sizes
           throughout the whole mmap-able region whereas before small holes
           tended to close holes near the base leaving holes far from the base
           large and available for larger requests.
      
        2) the free_area_cache also is set to the location of the last
           munmap-ed area, so in a scenario where we allocate e.g. five regions of
           1K each, then free regions 4, 2, 3 in that order, the next request for 1K
           will be placed in the position of the old region 3, whereas before we
           appended it to the still active region 1, placing it at the location
           of the old region 2.  Before we had 1 free region of 2K, now we only
           get two free regions of 1K -> fragmentation.
      
      The patch addresses these issues by introducing yet another cache descriptor
      cached_hole_size that contains the largest known hole size below the
      current free_area_cache.  If a new request comes in the size is compared
      against the cached_hole_size and if the request can be filled with a hole
      below free_area_cache the search is started from the base instead.
      
      The results look promising: Whereas 2.6.12-rc4 fragments quickly and my
      (earlier posted) leakme.c test program terminates after 50000+ iterations
      with 96 distinct and fragmented maps in /proc/self/maps it performs nicely
      (as expected) with thread creation, Ingo's test_str02 with 20000 threads
      requires 0.7s system time.
      
      Taking out Ingo's patch (un-patch available per request) by basically
      deleting all mentions of free_area_cache from the kernel and starting the
      search for new memory always at the respective bases we observe: leakme
      terminates successfully with 11 distinctive hardly fragmented areas in
      /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
      time for Ingo's test_str02 with 20000 threads.
      
      Now - drumroll ;-) the appended patch works fine with leakme: it ends with
      only 7 distinct areas in /proc/self/maps and also thread creation seems
      sufficiently fast with 0.71s for 20000 threads.
      Signed-off-by: Wolfgang Wander <wwc@rentec.com>
      Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
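
      A standalone model of the new decision; the field names mirror the
      kernel's but this is not the real allocator. If the request could fit
      in a hole already known to exist below free_area_cache, the first-fit
      search restarts from the base, otherwise it continues from the cache:

      struct mm_model {
              unsigned long free_area_cache;   /* where the last search ended */
              unsigned long cached_hole_size;  /* largest known hole below it */
              unsigned long unmapped_base;     /* TASK_UNMAPPED_BASE stand-in */
      };

      static unsigned long pick_search_start(struct mm_model *mm, unsigned long len)
      {
              if (len <= mm->cached_hole_size) {
                      /* a hole below the cache could hold us: go back to the
                       * base and rediscover the largest hole as we search */
                      mm->cached_hole_size = 0;
                      return mm->unmapped_base;
              }
              /* otherwise continue where the previous search stopped */
              return mm->free_area_cache;
      }
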
    • [PATCH] Hugepage consolidation · 63551ae0
      Authored by David Gibson
      A lot of the code in arch/*/mm/hugetlbpage.c is quite similar.  This patch
      attempts to consolidate a lot of the code across the arch's, putting the
      combined version in mm/hugetlb.c.  There are a couple of uglyish hacks in
      order to convert all the hugepage archs, but the result is a very large
      reduction in the total amount of code.  It also means things like hugepage
      lazy allocation could be implemented in one place, instead of six.
      
      Tested, at least a little, on ppc64, i386 and x86_64.
      
      Notes:
      	- this patch changes the meaning of set_huge_pte() to be more
      	  analogous to set_pte()
      	- does SH4 need a special huge_ptep_get_and_clear()?
      Acked-by: William Lee Irwin <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 06 May 2005, 2 commits
  22. 01 May 2005, 3 commits
  23. 20 Apr 2005, 2 commits
    • [PATCH] freepgt: hugetlb area is clean · 021740dc
      Authored by Hugh Dickins
      Once we're strict about clearing away page tables, hugetlb_prefault can assume
      there are no page tables left within its range.  Since the other arches
      continue if !pte_none here, let i386 do the same.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] freepgt: hugetlb_free_pgd_range · 3bf5ee95
      Authored by Hugh Dickins
      ia64 and ppc64 had hugetlb_free_pgtables functions which were no longer being
      called, and it wasn't obvious what to do about them.
      
      The ppc64 case turns out to be easy: the associated tables are noted elsewhere
      and freed later, safe to either skip its hugetlb areas or go through the
      motions of freeing nothing.  Since ia64 does need a special case, restore to
      ppc64 the special case of skipping them.
      
      The ia64 hugetlb case has been broken since pgd_addr_end went in, though it
      probably appeared to work okay if you just had one such area; in fact it's
      been broken much longer if you consider a long munmap spanning from another
      region into the hugetlb region.
      
      In the ia64 hugetlb region, more virtual address bits are available than in
      the other regions, yet the page tables are structured the same way: the page
      at the bottom is larger.  Here we need to scale down each addr before passing
      it to the standard free_pgd_range.  Was about to write a hugely_scaled_down
      macro, but found htlbpage_to_page already exists for just this purpose.  Fixed
      off-by-one in ia64 is_hugepage_only_range.
      
      Uninline free_pgd_range to make it available to ia64.  Make sure the
      vma-gathering loop in free_pgtables cannot join a hugepage_only_range to any
      other (safe to join huges?  probably but don't bother).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  24. 17 Apr 2005, 2 commits
    • [PATCH] ppc64: Fix semantics of __ioremap · dfbacdc1
      Authored by Benjamin Herrenschmidt
      This patch fixes ppc64 __ioremap() so that it stops implicitly adding
      _PAGE_GUARDED when the cache is not writeback, and instead lets the callers
      provide the flags they want.  This allows things like framebuffers to
      explicitly request a non-cacheable and non-guarded mapping, which is more
      efficient for that type of memory without side effects.  The patch also
      fixes all current callers to add _PAGE_GUARDED except btext, which is fine
      without it.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
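
      A hedged sketch of the new calling convention; the flag values and
      helper functions are illustrative, not the real ppc64 headers. Callers
      now spell out the attributes they want, so ordinary device mappings
      pass _PAGE_GUARDED themselves while a framebuffer can leave it out:

      #define __iomem                 /* sparse annotation, empty here */
      #define _PAGE_NO_CACHE  0x0020  /* illustrative bit values */
      #define _PAGE_GUARDED   0x0008

      void __iomem *__ioremap(unsigned long addr, unsigned long size,
                              unsigned long flags);

      static void __iomem *map_device_regs(unsigned long paddr, unsigned long size)
      {
              /* typical MMIO registers: non-cacheable and guarded */
              return __ioremap(paddr, size, _PAGE_NO_CACHE | _PAGE_GUARDED);
      }

      static void __iomem *map_framebuffer(unsigned long paddr, unsigned long size)
      {
              /* framebuffer: non-cacheable but not guarded, which is more
               * efficient for that type of memory and has no side effects */
              return __ioremap(paddr, size, _PAGE_NO_CACHE);
      }
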
    • Linux-2.6.12-rc2 · 1da177e4
      Authored by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!