1. 20 10月, 2008 4 次提交
    • N
      mlock: mlocked pages are unevictable · b291f000
      Nick Piggin 提交于
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.
      
      This is achieved through various strategies:
      
      1) add yet another page flag--PG_mlocked--to indicate that
         the page is locked for efficient testing in vmscan and,
         optionally, fault path.  This allows early culling of
         unevictable pages, preventing them from getting to
         page_referenced()/try_to_unmap().  Also allows separate
         accounting of mlock'd pages, as Nick's original patch
         did.
      
         Note:  Nick's original mlock patch used a PG_mlocked
         flag.  I had removed this in favor of the PG_unevictable
         flag + an mlock_count [new page struct member].  I
         restored the PG_mlocked flag to eliminate the new
         count field.
      
      2) add the mlock/unevictable infrastructure to mm/mlock.c,
         with internal APIs in mm/internal.h.  This is a rework
         of Nick's original patch to these files, taking into
         account that mlocked pages are now kept on unevictable
         LRU list.
      
      3) update vmscan.c:page_evictable() to check PageMlocked()
         and, if vma passed in, the vm_flags.  Note that the vma
         will only be passed in for new pages in the fault path;
         and then only if the "cull unevictable pages in fault
         path" patch is included.
      
      4) add try_to_unlock() to rmap.c to walk a page's rmap and
         ClearPageMlocked() if no other vmas have it mlocked.
         Reuses as much of try_to_unmap() as possible.  This
         effectively replaces the use of one of the lru list links
         as an mlock count.  If this mechanism let's pages in mlocked
         vmas leak through w/o PG_mlocked set [I don't know that it
         does], we should catch them later in try_to_unmap().  One
         hopes this will be rare, as it will be relatively expensive.
      
      Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      splitlru: introduce __get_user_pages():
      
        New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
        because current get_user_pages() can't grab PROT_NONE pages theresore it
        cause PROT_NONE pages can't munlock.
      
      [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
      [akpm@linux-foundation.org: untangle patch interdependencies]
      [akpm@linux-foundation.org: fix things after out-of-order merging]
      [hugh@veritas.com: fix page-flags mess]
      [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
      [kosaki.motohiro@jp.fujitsu.com: build fix]
      [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
      [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b291f000
    • L
      Unevictable LRU Infrastructure · 894bc310
      Lee Schermerhorn 提交于
      When the system contains lots of mlocked or otherwise unevictable pages,
      the pageout code (kswapd) can spend lots of time scanning over these
      pages.  Worse still, the presence of lots of unevictable pages can confuse
      kswapd into thinking that more aggressive pageout modes are required,
      resulting in all kinds of bad behaviour.
      
      Infrastructure to manage pages excluded from reclaim--i.e., hidden from
      vmscan.  Based on a patch by Larry Woodman of Red Hat.  Reworked to
      maintain "unevictable" pages on a separate per-zone LRU list, to "hide"
      them from vmscan.
      
      Kosaki Motohiro added the support for the memory controller unevictable
      lru list.
      
      Pages on the unevictable list have both PG_unevictable and PG_lru set.
      Thus, PG_unevictable is analogous to and mutually exclusive with
      PG_active--it specifies which LRU list the page is on.
      
      The unevictable infrastructure is enabled by a new mm Kconfig option
      [CONFIG_]UNEVICTABLE_LRU.
      
      A new function 'page_evictable(page, vma)' in vmscan.c tests whether or
      not a page may be evictable.  Subsequent patches will add the various
      !evictable tests.  We'll want to keep these tests light-weight for use in
      shrink_active_list() and, possibly, the fault path.
      
      To avoid races between tasks putting pages [back] onto an LRU list and
      tasks that might be moving the page from non-evictable to evictable state,
      the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()'
      -- tests the "evictability" of a page after placing it on the LRU, before
      dropping the reference.  If the page has become unevictable,
      putback_lru_page() will redo the 'putback', thus moving the page to the
      unevictable list.  This way, we avoid "stranding" evictable pages on the
      unevictable list.
      
      [akpm@linux-foundation.org: fix fallout from out-of-order merge]
      [riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]
      [nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]
      [kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]
      [kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]
      [kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]
      [kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]
      [kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Debugged-by: NBenjamin Kidwell <benjkidwell@yahoo.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      894bc310
    • L
      pageflag helpers for configed-out flags · 8a7a8544
      Lee Schermerhorn 提交于
      Define proper false/noop inline functions for noreclaim page flags when
      !defined(CONFIG_UNEVICTABLE_LRU)
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a7a8544
    • R
      define page_file_cache() function · b2e18538
      Rik van Riel 提交于
      Define page_file_cache() function to answer the question:
      	is page backed by a file?
      
      Originally part of Rik van Riel's split-lru patch.  Extracted to make
      available for other, independent reclaim patches.
      
      Moved inline function to linux/mm_inline.h where it will be needed by
      subsequent "split LRU" and "noreclaim" patches.
      
      Unfortunately this needs to use a page flag, since the PG_swapbacked state
      needs to be preserved all the way to the point where the page is last
      removed from the LRU.  Trying to derive the status from other info in the
      page resulted in wrong VM statistics in earlier split VM patchsets.
      
      The total number of page flags in use on a 32 bit machine after this patch
      is 19.
      
      [akpm@linux-foundation.org: fix up out-of-order merge fallout]
      [hugh@veritas.com: splitlru: shmem_getpage SetPageSwapBacked sooner[
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NMinChan Kim <minchan.kim@gmail.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2e18538
  2. 05 8月, 2008 1 次提交
  3. 01 8月, 2008 1 次提交
    • M
      [S390] Optimize storage key operations for anon pages · a4b526b3
      Martin Schwidefsky 提交于
      For anonymous pages without a swap cache backing the check in
      page_remove_rmap for the physical dirty bit in page_remove_rmap is
      unnecessary. The instructions that are used to check and reset the dirty
      bit are expensive. Removing the check noticably speeds up process exit.
      In addition the clearing of the dirty bit in __SetPageUptodate is
      pointless as well. With these two changes there is no storage key
      operation for an anonymous page anymore if it does not hit the swap
      space.
      
      The micro benchmark which repeatedly executes an empty shell script
      gets about 5% faster.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      a4b526b3
  4. 25 7月, 2008 3 次提交
    • A
      slob: record page flag overlays explicitly · 9023cb7e
      Andy Whitcroft 提交于
      SLOB reuses two page bits for internal purposes, it overlays PG_active and
      PG_private.  This is hidden away in slob.c.  Document these overlays
      explicitly in the main page-flags enum along with all the others.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9023cb7e
    • A
      slub: record page flag overlays explicitly · 8a38082d
      Andy Whitcroft 提交于
      SLUB reuses two page bits for internal purposes, it overlays PG_active and
      PG_error.  This is hidden away in slub.c.  Document these overlays
      explicitly in the main page-flags enum along with all the others.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a38082d
    • A
      page-flags: record page flag overlays explicitly · 0cad47cf
      Andy Whitcroft 提交于
      With the recent page flag reorganisation we have a single enum which
      defines the valid page flags and their values, nice and clear.  However
      there are a number of bits which are overloaded by different subsystems.
      Firstly there is PG_owner_priv_1 which is used by filesystems and by XEN.
      Secondly both SLOB and SLUB use a couple of extra page bits to manage
      internal state for pages they own; both overlay other bits.  All of these
      "aliases" are scattered about the source making it very hard for a reader
      to know if the bits are safe to rely on in all contexts; confusion here is
      bad.
      
      As we now have a single place where the bits are clearly assigned it makes
      sense to clarify the reuse of bits by making the aliases explicit and
      visible with the original bit assignments.  This patch creates explicit
      aliases within the enum itself for the overloaded bits, creates standard
      bit accessors PageFoo etc.  and uses those throughout.
      
      This version pulls the bit manipulation out to standard named page bit
      accessors as suggested by Christoph, it retains the explicit mapping to
      the overlayed bits.  A fusion of both ideas.  This has been SLUB and SLOB
      have been compile tested on x86_64 only, and SLUB boot tested.  If people
      feel this is worth doing then I can run a fuller set of testing.
      
      This patch:
      
      Some page flags are used for more than one purpose, for example
      PG_owner_priv_1.  Currently there are individual accessors for each user,
      each built using the common flag name far away from the bit definitions.
      This makes it hard to see all possible uses of these bits.
      
      Now that we have a single enum to generate the bit orders it makes sense
      to express overlays in the same place.  So create per use aliases for this
      bit in the main page-flags enum and use those in the accessors.
      
      [akpm@linux-foundation.org: fix xen]
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0cad47cf
  5. 10 6月, 2008 1 次提交
  6. 27 5月, 2008 1 次提交
    • J
      xen: implement save/restore · 0e91398f
      Jeremy Fitzhardinge 提交于
      This patch implements Xen save/restore and migration.
      
      Saving is triggered via xenbus, which is polled in
      drivers/xen/manage.c.  When a suspend request comes in, the kernel
      prepares itself for saving by:
      
      1 - Freeze all processes.  This is primarily to prevent any
          partially-completed pagetable updates from confusing the suspend
          process.  If CONFIG_PREEMPT isn't defined, then this isn't necessary.
      
      2 - Suspend xenbus and other devices
      
      3 - Stop_machine, to make sure all the other vcpus are quiescent.  The
          Xen tools require the domain to run its save off vcpu0.
      
      4 - Within the stop_machine state, it pins any unpinned pgds (under
          construction or destruction), performs canonicalizes various other
          pieces of state (mostly converting mfns to pfns), and finally
      
      5 - Suspend the domain
      
      Restore reverses the steps used to save the domain, ending when all
      the frozen processes are thawed.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      0e91398f
  7. 28 4月, 2008 9 次提交
  8. 22 2月, 2008 1 次提交
  9. 06 2月, 2008 1 次提交
    • N
      mm: fix PageUptodate data race · 0ed361de
      Nick Piggin 提交于
      After running SetPageUptodate, preceeding stores to the page contents to
      actually bring it uptodate may not be ordered with the store to set the
      page uptodate.
      
      Therefore, another CPU which checks PageUptodate is true, then reads the
      page contents can get stale data.
      
      Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
      PageUptodate.
      
      Many places that test PageUptodate, do so with the page locked, and this
      would be enough to ensure memory ordering in those places if
      SetPageUptodate were only called while the page is locked.  Unfortunately
      that is not always the case for some filesystems, but it could be an idea
      for the future.
      
      Also bring the handling of anonymous page uptodateness in line with that of
      file backed page management, by marking anon pages as uptodate when they
      _are_ uptodate, rather than when our implementation requires that they be
      marked as such.  Doing allows us to get rid of the smp_wmb's in the page
      copying functions, which were especially added for anonymous pages for an
      analogous memory ordering problem.  Both file and anonymous pages are
      handled with the same barriers.
      
      FAQ:
      Q. Why not do this in flush_dcache_page?
      A. Firstly, flush_dcache_page handles only one side (the smb side) of the
      ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away
      memory barriers in a completely unrelated function is nasty; at least in the
      PageUptodate macros, they are located together with (half) the operations
      involved in the ordering. Thirdly, the smp_wmb is only required when first
      bringing the page uptodate, wheras flush_dcache_page should be called each time
      it is written to through the kernel mapping. It is logically the wrong place to
      put it.
      
      Q. Why does this increase my text size / reduce my performance / etc.
      A. Because it is adding the necessary instructions to eliminate the data-race.
      
      Q. Can it be improved?
      A. Yes, eg. if you were to create a rule that all SetPageUptodate operations
      run under the page lock, we could avoid the smp_rmb places where PageUptodate
      is queried under the page lock. Requires audit of all filesystems and at least
      some would need reworking. That's great you're interested, I'm eagerly awaiting
      your patches.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ed361de
  10. 20 7月, 2007 3 次提交
    • A
      move page writeback acounting out of macros · d688abf5
      Andrew Morton 提交于
      page-writeback accounting is presently performed in the page-flags macros.
      This is inconsistent and a bit ugly and makes it awkward to implement
      per-backing_dev under-writeback page accounting.
      
      So move this accounting down to the callsite(s).
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d688abf5
    • F
      mm: share PG_readahead and PG_reclaim · fe3cba17
      Fengguang Wu 提交于
      Share the same page flag bit for PG_readahead and PG_reclaim.
      
      One is used only on file reads, another is only for emergency writes.  One
      is used mostly for fresh/young pages, another is for old pages.
      
      Combinations of possible interactions are:
      
      a) clear PG_reclaim => implicit clear of PG_readahead
      	it will delay an asynchronous readahead into a synchronous one
      	it actually does _good_ for readahead:
      		the pages will be reclaimed soon, it's readahead thrashing!
      		in this case, synchronous readahead makes more sense.
      
      b) clear PG_readahead => implicit clear of PG_reclaim
      	one(and only one) page will not be reclaimed in time
      	it can be avoided by checking PageWriteback(page) in readahead first
      
      c) set PG_reclaim => implicit set of PG_readahead
      	will confuse readahead and make it restart the size rampup process
      	it's a trivial problem, and can mostly be avoided by checking
      	PageWriteback(page) first in readahead
      
      d) set PG_readahead => implicit set of PG_reclaim
      	PG_readahead will never be set on already cached pages.
      	PG_reclaim will always be cleared on dirtying a page.
      	so not a problem.
      
      In summary,
      	a)   we get better behavior
      	b,d) possible interactions can be avoided
      	c)   racy condition exists that might affect readahead, but the chance
      	     is _really_ low, and the hurt on readahead is trivial.
      
      Compound pages also use PG_reclaim, but for now they do not interact with
      reclaim/readahead code.
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe3cba17
    • F
      readahead: introduce PG_readahead · d77c2d7c
      Fengguang Wu 提交于
      Introduce a new page flag: PG_readahead.
      
      It acts as a look-ahead mark, which tells the page reader: Hey, it's time to
      invoke the read-ahead logic.  For the sake of I/O pipelining, don't wait until
      it runs out of cached pages!
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Steven Pratt <slpratt@austin.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d77c2d7c
  11. 18 7月, 2007 1 次提交
  12. 08 5月, 2007 3 次提交
  13. 27 4月, 2007 1 次提交
    • M
      [S390] split page_test_and_clear_dirty. · 6c210482
      Martin Schwidefsky 提交于
      The page_test_and_clear_dirty primitive really consists of two
      operations, page_test_dirty and the page_clear_dirty. The combination
      of the two is not an atomic operation, so it makes more sense to have
      two separate operations instead of one.
      In addition to the improved readability of the s390 version of
      SetPageUptodate, it now avoids the page_test_dirty operation which is
      an insert-storage-key-extended (iske) instruction which is an expensive
      operation.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      6c210482
  14. 02 3月, 2007 1 次提交
  15. 22 12月, 2006 1 次提交
    • L
      VM: Remove "clear_page_dirty()" and "test_clear_page_dirty()" functions · fba2591b
      Linus Torvalds 提交于
      They were horribly easy to mis-use because of their tempting naming, and
      they also did way more than any users of them generally wanted them to
      do.
      
      A dirty page can become clean under two circumstances:
      
       (a) when we write it out.  We have "clear_page_dirty_for_io()" for
           this, and that function remains unchanged.
      
           In the "for IO" case it is not sufficient to just clear the dirty
           bit, you also have to mark the page as being under writeback etc.
      
       (b) when we actually remove a page due to it becoming inaccessible to
           users, notably because it was truncate()'d away or the file (or
           metadata) no longer exists, and we thus want to cancel any
           outstanding dirty state.
      
      For the (b) case, we now introduce "cancel_dirty_page()", which only
      touches the page state itself, and verifies that the page is not mapped
      (since cancelling writes on a mapped page would be actively wrong as it
      is still accessible to users).
      
      Some filesystems need to be fixed up for this: CIFS, FUSE, JFS,
      ReiserFS, XFS all use the old confusing functions, and will be fixed
      separately in subsequent commits (with some of them just removing the
      offending logic, and others using clear_page_dirty_for_io()).
      
      This was confirmed by Martin Michlmayr to fix the apt database
      corruption on ARM.
      
      Cc: Martin Michlmayr <tbm@cyrius.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fba2591b
  16. 30 9月, 2006 1 次提交
  17. 26 9月, 2006 1 次提交
  18. 01 7月, 2006 2 次提交
    • C
      [PATCH] zoned vm counters: conversion of nr_writeback to per zone counter · ce866b34
      Christoph Lameter 提交于
      Conversion of nr_writeback to per zone counter.
      
      This removes the last page_state counter from arch/i386/mm/pgtable.c so we
      drop the page_state from there.
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ce866b34
    • C
      [PATCH] zoned vm counters: create vmstat.c/.h from page_alloc.c/.h · f6ac2354
      Christoph Lameter 提交于
      NOTE: ZVC are *not* the lightweight event counters.  ZVCs are reliable whereas
      event counters do not need to be.
      
      Zone based VM statistics are necessary to be able to determine what the state
      of memory in one zone is.  In a NUMA system this can be helpful for local
      reclaim and other memory optimizations that may be able to shift VM load in
      order to get more balanced memory use.
      
      It is also useful to know how the computing load affects the memory
      allocations on various zones.  This patchset allows the retrieval of that data
      from userspace.
      
      The patchset introduces a framework for counters that is a cross between the
      existing page_stats --which are simply global counters split per cpu-- and the
      approach of deferred incremental updates implemented for nr_pagecache.
      
      Small per cpu 8 bit counters are added to struct zone.  If the counter exceeds
      certain thresholds then the counters are accumulated in an array of
      atomic_long in the zone and in a global array that sums up all zone values.
      The small 8 bit counters are next to the per cpu page pointers and so they
      will be in high in the cpu cache when pages are allocated and freed.
      
      Access to VM counter information for a zone and for the whole machine is then
      possible by simply indexing an array (Thanks to Nick Piggin for pointing out
      that approach).  The access to the total number of pages of various types does
      no longer require the summing up of all per cpu counters.
      
      Benefits of this patchset right now:
      
      - Ability for UP and SMP configuration to determine how memory
        is balanced between the DMA, NORMAL and HIGHMEM zones.
      
      - loops over all processors are avoided in writeback and
        reclaim paths. We can avoid caching the writeback information
        because the needed information is directly accessible.
      
      - Special handling for nr_pagecache removed.
      
      - zone_reclaim_interval vanishes since VM stats can now determine
        when it is worth to do local reclaim.
      
      - Fast inline per node page state determination.
      
      - Accurate counters in /sys/devices/system/node/node*/meminfo. Current
        counters are counting simply which processor allocated a page somewhere
        and guestimate based on that. So the counters were not useful to show
        the actual distribution of page use on a specific zone.
      
      - The swap_prefetch patch requires per node statistics in order to
        figure out when processors of a node can prefetch. This patch provides
        some of the needed numbers.
      
      - Detailed VM counters available in more /proc and /sys status files.
      
      References to earlier discussions:
      V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
      V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
      V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
      V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2
      
      Performance tests with AIM7 did not show any regressions.  Seems to be a tad
      faster even.  Tested on ia64/NUMA.  Builds fine on i386, SMP / UP.  Includes
      fixes for s390/arm/uml arch code.
      
      This patch:
      
      Move counter code from page_alloc.c/page-flags.h to vmstat.c/h.
      
      Create vmstat.c/vmstat.h by separating the counter code and the proc
      functions.
      
      Move the vm_stat_text array before zoneinfo_show.
      
      [akpm@osdl.org: s390 build fix]
      [akpm@osdl.org: HOTPLUG_CPU build fix]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f6ac2354
  19. 23 6月, 2006 1 次提交
    • A
      [PATCH] PG_uncached is ia64 only · f886ed44
      Andrew Morton 提交于
      As Nick points out, only ia64 uses PG_uncached.  So we can push it up into the
      higher bits of the lower half of page->flags and make room for another flag on
      32-bit machines.
      
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@sgi.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f886ed44
  20. 11 4月, 2006 2 次提交
  21. 22 3月, 2006 1 次提交