1. 07 1月, 2009 1 次提交
    • H
      badpage: vm_normal_page use print_bad_pte · 22b31eec
      Hugh Dickins 提交于
      print_bad_pte() is so far being called only when zap_pte_range() finds
      negative page_mapcount, or there's a fault on a pte_file where it does not
      belong.  That's weak coverage when we suspect pagetable corruption.
      
      Originally, it was called when vm_normal_page() found an invalid pfn: but
      pfn_valid is expensive on some architectures and configurations, so 2.6.24
      put that under CONFIG_DEBUG_VM (which doesn't help in the field), then
      2.6.26 replaced it by a VM_BUG_ON (likewise).
      
      Reinstate the print_bad_pte() in vm_normal_page(), but use a cheaper test
      than pfn_valid(): memmap_init_zone() (used in bootup and hotplug) keep a
      __read_mostly note of the highest_memmap_pfn, vm_normal_page() then check
      pfn against that.  We could call this pfn_plausible() or pfn_sane(), but I
      doubt we'll need it elsewhere: of course it's not reliable, but gives much
      stronger pagetable validation on many boxes.
      
      Also use print_bad_pte() when the pte_special bit is found outside a
      VM_PFNMAP or VM_MIXEDMAP area, instead of VM_BUG_ON.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22b31eec
  2. 07 11月, 2008 2 次提交
    • A
      hugetlb: pull gigantic page initialisation out of the default path · 18229df5
      Andy Whitcroft 提交于
      As we can determine exactly when a gigantic page is in use we can optimise
      the common regular page cases by pulling out gigantic page initialisation
      into its own function.  As gigantic pages are never released to buddy we
      do not need a destructor.  This effectivly reverts the previous change to
      the main buddy allocator.  It also adds a paranoid check to ensure we
      never release gigantic pages from hugetlbfs to the main buddy.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18229df5
    • A
      hugetlbfs: handle pages higher order than MAX_ORDER · 69d177c2
      Andy Whitcroft 提交于
      When working with hugepages, hugetlbfs assumes that those hugepages are
      smaller than MAX_ORDER.  Specifically it assumes that the mem_map is
      contigious and uses that to optimise access to the elements of the mem_map
      that represent the hugepage.  Gigantic pages (such as 16GB pages on
      powerpc) by definition are of greater order than MAX_ORDER (larger than
      MAX_ORDER_NR_PAGES in size).  This means that we can no longer make use of
      the buddy alloctor guarentees for the contiguity of the mem_map, which
      ensures that the mem_map is at least contigious for maximmally aligned
      areas of MAX_ORDER_NR_PAGES pages.
      
      This patch adds new mem_map accessors and iterator helpers which handle
      any discontiguity at MAX_ORDER_NR_PAGES boundaries.  It then uses these to
      implement gigantic page versions of copy_huge_page and clear_huge_page,
      and to allow follow_hugetlb_page handle gigantic pages.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: <stable@kernel.org>		[2.6.27.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69d177c2
  3. 20 10月, 2008 6 次提交
    • L
      mlock: count attempts to free mlocked page · 985737cf
      Lee Schermerhorn 提交于
      Allow free of mlock()ed pages.  This shouldn't happen, but during
      developement, it occasionally did.
      
      This patch allows us to survive that condition, while keeping the
      statistics and events correct for debug.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      985737cf
    • N
      vmstat: mlocked pages statistics · 5344b7e6
      Nick Piggin 提交于
      Add NR_MLOCK zone page state, which provides a (conservative) count of
      mlocked pages (actually, the number of mlocked pages moved off the LRU).
      
      Reworked by lts to fit in with the modified mlock page support in the
      Reclaim Scalability series.
      
      [kosaki.motohiro@jp.fujitsu.com: fix incorrect Mlocked field of /proc/meminfo]
      [lee.schermerhorn@hp.com: mlocked-pages: add event counting with statistics]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5344b7e6
    • R
      mmap: handle mlocked pages during map, remap, unmap · ba470de4
      Rik van Riel 提交于
      Originally by Nick Piggin <npiggin@suse.de>
      
      Remove mlocked pages from the LRU using "unevictable infrastructure"
      during mmap(), munmap(), mremap() and truncate().  Try to move back to
      normal LRU lists on munmap() when last mlocked mapping removed.  Remove
      PageMlocked() status when page truncated from file.
      
      [akpm@linux-foundation.org: cleanup]
      [kamezawa.hiroyu@jp.fujitsu.com: fix double unlock_page()]
      [kosaki.motohiro@jp.fujitsu.com: split LRU: munlock rework]
      [lee.schermerhorn@hp.com: mlock: fix __mlock_vma_pages_range comment block]
      [akpm@linux-foundation.org: remove bogus kerneldoc token]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamewzawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba470de4
    • N
      mlock: mlocked pages are unevictable · b291f000
      Nick Piggin 提交于
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.
      
      This is achieved through various strategies:
      
      1) add yet another page flag--PG_mlocked--to indicate that
         the page is locked for efficient testing in vmscan and,
         optionally, fault path.  This allows early culling of
         unevictable pages, preventing them from getting to
         page_referenced()/try_to_unmap().  Also allows separate
         accounting of mlock'd pages, as Nick's original patch
         did.
      
         Note:  Nick's original mlock patch used a PG_mlocked
         flag.  I had removed this in favor of the PG_unevictable
         flag + an mlock_count [new page struct member].  I
         restored the PG_mlocked flag to eliminate the new
         count field.
      
      2) add the mlock/unevictable infrastructure to mm/mlock.c,
         with internal APIs in mm/internal.h.  This is a rework
         of Nick's original patch to these files, taking into
         account that mlocked pages are now kept on unevictable
         LRU list.
      
      3) update vmscan.c:page_evictable() to check PageMlocked()
         and, if vma passed in, the vm_flags.  Note that the vma
         will only be passed in for new pages in the fault path;
         and then only if the "cull unevictable pages in fault
         path" patch is included.
      
      4) add try_to_unlock() to rmap.c to walk a page's rmap and
         ClearPageMlocked() if no other vmas have it mlocked.
         Reuses as much of try_to_unmap() as possible.  This
         effectively replaces the use of one of the lru list links
         as an mlock count.  If this mechanism let's pages in mlocked
         vmas leak through w/o PG_mlocked set [I don't know that it
         does], we should catch them later in try_to_unmap().  One
         hopes this will be rare, as it will be relatively expensive.
      
      Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      splitlru: introduce __get_user_pages():
      
        New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
        because current get_user_pages() can't grab PROT_NONE pages theresore it
        cause PROT_NONE pages can't munlock.
      
      [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
      [akpm@linux-foundation.org: untangle patch interdependencies]
      [akpm@linux-foundation.org: fix things after out-of-order merging]
      [hugh@veritas.com: fix page-flags mess]
      [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
      [kosaki.motohiro@jp.fujitsu.com: build fix]
      [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
      [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b291f000
    • L
      Unevictable LRU Infrastructure · 894bc310
      Lee Schermerhorn 提交于
      When the system contains lots of mlocked or otherwise unevictable pages,
      the pageout code (kswapd) can spend lots of time scanning over these
      pages.  Worse still, the presence of lots of unevictable pages can confuse
      kswapd into thinking that more aggressive pageout modes are required,
      resulting in all kinds of bad behaviour.
      
      Infrastructure to manage pages excluded from reclaim--i.e., hidden from
      vmscan.  Based on a patch by Larry Woodman of Red Hat.  Reworked to
      maintain "unevictable" pages on a separate per-zone LRU list, to "hide"
      them from vmscan.
      
      Kosaki Motohiro added the support for the memory controller unevictable
      lru list.
      
      Pages on the unevictable list have both PG_unevictable and PG_lru set.
      Thus, PG_unevictable is analogous to and mutually exclusive with
      PG_active--it specifies which LRU list the page is on.
      
      The unevictable infrastructure is enabled by a new mm Kconfig option
      [CONFIG_]UNEVICTABLE_LRU.
      
      A new function 'page_evictable(page, vma)' in vmscan.c tests whether or
      not a page may be evictable.  Subsequent patches will add the various
      !evictable tests.  We'll want to keep these tests light-weight for use in
      shrink_active_list() and, possibly, the fault path.
      
      To avoid races between tasks putting pages [back] onto an LRU list and
      tasks that might be moving the page from non-evictable to evictable state,
      the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()'
      -- tests the "evictability" of a page after placing it on the LRU, before
      dropping the reference.  If the page has become unevictable,
      putback_lru_page() will redo the 'putback', thus moving the page to the
      unevictable list.  This way, we avoid "stranding" evictable pages on the
      unevictable list.
      
      [akpm@linux-foundation.org: fix fallout from out-of-order merge]
      [riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]
      [nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]
      [kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]
      [kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]
      [kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]
      [kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]
      [kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Debugged-by: NBenjamin Kidwell <benjkidwell@yahoo.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      894bc310
    • N
      vmscan: move isolate_lru_page() to vmscan.c · 62695a84
      Nick Piggin 提交于
      On large memory systems, the VM can spend way too much time scanning
      through pages that it cannot (or should not) evict from memory.  Not only
      does it use up CPU time, but it also provokes lock contention and can
      leave large systems under memory presure in a catatonic state.
      
      This patch series improves VM scalability by:
      
      1) putting filesystem backed, swap backed and unevictable pages
         onto their own LRUs, so the system only scans the pages that it
         can/should evict from memory
      
      2) switching to two handed clock replacement for the anonymous LRUs,
         so the number of pages that need to be scanned when the system
         starts swapping is bound to a reasonable number
      
      3) keeping unevictable pages off the LRU completely, so the
         VM does not waste CPU time scanning them. ramfs, ramdisk,
         SHM_LOCKED shared memory segments and mlock()ed VMA pages
         are keept on the unevictable list.
      
      This patch:
      
      isolate_lru_page logically belongs to be in vmscan.c than migrate.c.
      
      It is tough, because we don't need that function without memory migration
      so there is a valid argument to have it in migrate.c.  However a
      subsequent patch needs to make use of it in the core mm, so we can happily
      move it to vmscan.c.
      
      Also, make the function a little more generic by not requiring that it
      adds an isolated page to a given list.  Callers can do that.
      
      	Note that we now have '__isolate_lru_page()', that does
      	something quite different, visible outside of vmscan.c
      	for use with memory controller.  Methinks we need to
      	rationalize these names/purposes.	--lts
      
      [akpm@linux-foundation.org: fix mm/memory_hotplug.c build]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62695a84
  4. 25 7月, 2008 6 次提交
  5. 28 4月, 2008 1 次提交
    • Y
      memory hotplug: free memmaps allocated by bootmem · 0c0a4a51
      Yasunori Goto 提交于
      This patch is to free memmaps which is allocated by bootmem.
      
      Freeing usemap is not necessary.  The pages of usemap may be necessary for
      other sections.
      
      If removing section is last section on the node, its section is the final user
      of usemap page.  (usemaps are allocated on its section by previous patch.) But
      it shouldn't be freed too, because the section must be logical offline state
      which all pages are isolated against page allocater.  If it is freed, page
      alloctor may use it which will be removed physically soon.  It will be
      disaster.  So, this patch keeps it as it is.
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0c0a4a51
  6. 24 2月, 2008 1 次提交
    • A
      Solve section mismatch for free_area_init_core. · b5a0e011
      Alexander van Heukelum 提交于
      WARNING: vmlinux.o(.meminit.text+0x649):
      Section mismatch in reference from the
      function free_area_init_core() to the function .init.text:setup_usemap()
      The function __meminit free_area_init_core() references
      a function __init setup_usemap().
      If free_area_init_core is only used by setup_usemap then
      annotate free_area_init_core with a matching annotation.
      
      The warning is covers this stack of functions in mm/page_alloc.c:
      
      alloc_bootmem_node must be marked __init.
      alloc_bootmem_node is used by setup_usemap, if !SPARSEMEM.
      (usemap_size is only used by setup_usemap, if !SPARSEMEM.)
      setup_usemap is only used by free_area_init_core.
      free_area_init_core is only used by free_area_init_node.
      
      free_area_init_node is used by:
      arch/alpha/mm/numa.c: __init paging_init()
      arch/arm/mm/init.c: __init bootmem_init_node()
      arch/avr32/mm/init.c: __init paging_init()
      arch/cris/arch-v10/mm/init.c: __init paging_init()
      arch/cris/arch-v32/mm/init.c: __init paging_init()
      arch/m32r/mm/discontig.c: __init zone_sizes_init()
      arch/m32r/mm/init.c: __init zone_sizes_init()
      arch/m68k/mm/motorola.c: __init paging_init()
      arch/m68k/mm/sun3mmu.c: __init paging_init()
      arch/mips/sgi-ip27/ip27-memory.c: __init paging_init()
      arch/parisc/mm/init.c: __init paging_init()
      arch/sparc/mm/srmmu.c: __init srmmu_paging_init()
      arch/sparc/mm/sun4c.c: __init sun4c_paging_init()
      arch/sparc64/mm/init.c: __init paging_init()
      mm/page_alloc.c: __init free_area_init_nodes()
      mm/page_alloc.c: __init free_area_init()
      and
      mm/memory_hotplug.c: hotadd_new_pgdat()
      
      hotadd_new_pgdat can not be an __init function, but:
      
      It is compiled for MEMORY_HOTPLUG configurations only
      MEMORY_HOTPLUG depends on SPARSEMEM || X86_64_ACPI_NUMA
      X86_64_ACPI_NUMA depends on X86_64
      ARCH_FLATMEM_ENABLE depends on X86_32
      ARCH_DISCONTIGMEM_ENABLE depends on X86_32
      So X86_64_ACPI_NUMA implies SPARSEMEM, right?
      
      So we can mark the stack of functions __init for !SPARSEMEM, but we must mark
      them __meminit for SPARSEMEM configurations.  This is ok, because then the
      calls to alloc_bootmem_node are also avoided.
      
      Compile-tested on:
      silly minimal config
      defconfig x86_32
      defconfig x86_64
      defconfig x86_64 -HIBERNATION +MEMORY_HOTPLUG
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Reviewed-by: NSam Ravnborg <sam@ravnborg.org>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5a0e011
  7. 06 2月, 2008 2 次提交
  8. 17 10月, 2007 1 次提交
  9. 08 5月, 2007 1 次提交
    • C
      Make page->private usable in compound pages · d85f3385
      Christoph Lameter 提交于
      If we add a new flag so that we can distinguish between the first page and the
      tail pages then we can avoid to use page->private in the first page.
      page->private == page for the first page, so there is no real information in
      there.
      
      Freeing up page->private makes the use of compound pages more transparent.
      They become more usable like real pages.  Right now we have to be careful f.e.
       if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
      can then no longer use the private field.  This is one of the issues that
      cause us not to support debugging for page size slabs in SLAB.
      
      Having page->private available for SLUB would allow more meta information in
      the page struct.  I can probably avoid the 16 bit ints that I have in there
      right now.
      
      Also if page->private is available then a compound page may be equipped with
      buffer heads.  This may free up the way for filesystems to support larger
      blocks than page size.
      
      We add PageTail as an alias of PageReclaim.  Compound pages cannot currently
      be reclaimed.  Because of the alias one needs to check PageCompound first.
      
      The RFC for the this approach was discussed at
      http://marc.info/?t=117574302800001&r=1&w=2
      
      [nacc@us.ibm.com: fix hugetlbfs]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d85f3385
  10. 26 9月, 2006 1 次提交
    • N
      [PATCH] mm: VM_BUG_ON · 725d704e
      Nick Piggin 提交于
      Introduce a VM_BUG_ON, which is turned on with CONFIG_DEBUG_VM.  Use this
      in the lightweight, inline refcounting functions; PageLRU and PageActive
      checks in vmscan, because they're pretty well confined to vmscan.  And in
      page allocate/free fastpaths which can be the hottest parts of the kernel
      for kbuilds.
      
      Unlike BUG_ON, VM_BUG_ON must not be used to execute statements with
      side-effects, and should not be used outside core mm code.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      725d704e
  11. 22 3月, 2006 3 次提交
  12. 07 1月, 2006 2 次提交
  13. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4