1. 16 12月, 2009 3 次提交
    • H
      ksm: hold anon_vma in rmap_item · db114b83
      Hugh Dickins 提交于
      For full functionality, page_referenced_one() and try_to_unmap_one() need
      to know the vma: to pass vma down to arch-dependent flushes, or to observe
      VM_LOCKED or VM_EXEC.  But KSM keeps no record of vma: nor can it, since
      vmas get split and merged without its knowledge.
      
      Instead, note page's anon_vma in its rmap_item when adding to stable tree:
      all the vmas which might map that page are listed by its anon_vma.
      
      page_referenced_ksm() and try_to_unmap_ksm() then traverse the anon_vma,
      first to find the probable vma, that which matches rmap_item's mm; but if
      that is not enough to locate all instances, traverse again to try the
      others.  This catches those occasions when fork has duplicated a pte of a
      ksm page, but ksmd has not yet come around to assign it an rmap_item.
      
      But each rmap_item in the stable tree which refers to an anon_vma needs to
      take a reference to it.  Andrea's anon_vma design cleverly avoided a
      reference count (an anon_vma was free when its list of vmas was empty),
      but KSM now needs to add that.  Is a 32-bit count sufficient?  I believe
      so - the anon_vma is only free when both count is 0 and list is empty.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      db114b83
    • H
      ksm: let shared pages be swappable · 5ad64688
      Hugh Dickins 提交于
      Initial implementation for swapping out KSM's shared pages: add
      page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
      faced with a PageKsm page.
      
      Most of what's needed can be got from the rmap_items listed from the
      stable_node of the ksm page, without discovering the actual vma: so in
      this patch just fake up a struct vma for page_referenced_one() or
      try_to_unmap_one(), then refine that in the next patch.
      
      Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
      implicit there (being only set with VM_SHARED, already excluded), but
      let's make it explicit, to help justify the lack of nonlinear unmap.
      
      Rely on the page lock to protect against concurrent modifications to that
      page's node of the stable tree.
      
      The awkward part is not swapout but swapin: do_swap_page() and
      page_add_anon_rmap() now have to allow for new possibilities - perhaps a
      ksm page still in swapcache, perhaps a swapcache page associated with one
      location in one anon_vma now needed for another location or anon_vma.
      (And the vma might even be no longer VM_MERGEABLE when that happens.)
      
      ksm_might_need_to_copy() checks for that case, and supplies a duplicate
      page when necessary, simply leaving it to a subsequent pass of ksmd to
      rediscover the identity and merge them back into one ksm page.
      Disappointingly primitive: but the alternative would have to accumulate
      unswappable info about the swapped out ksm pages, limiting swappability.
      
      Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
      particular case it was handling, so just use it instead.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ad64688
    • H
      mm: define PAGE_MAPPING_FLAGS · 3ca7b3c5
      Hugh Dickins 提交于
      At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
      in page->mapping, with the higher bits a pointer to the anon_vma; and have
      defined PageKsm(page) as that with NULL anon_vma.
      
      But KSM swapping will need to store a pointer there: so in preparation for
      that, now define PAGE_MAPPING_FLAGS as the low two bits, including
      PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
      other use for the bit emerges).
      
      Declare page_rmapping(page) to return the pointer part of page->mapping,
      and page_anon_vma(page) to return the anon_vma pointer when that's what it
      is.  Use these in a few appropriate places: notably, unuse_vma() has been
      testing page->mapping, but is better to be testing page_anon_vma() (cases
      may be added in which flag bits are set without any pointer).
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ca7b3c5
  2. 22 9月, 2009 1 次提交
    • H
      ksm: no debug in page_dup_rmap() · 21333b2b
      Hugh Dickins 提交于
      page_dup_rmap(), used on each mapped page when forking, was originally
      just an inline atomic_inc of mapcount.  2.6.22 added CONFIG_DEBUG_VM
      out-of-line checks to it, which would need to be ever-so-slightly
      complicated to allow for the PageKsm() we're about to define.
      
      But I think these checks never caught anything.  And if it's coding errors
      we're worried about, such checks should be in page_remove_rmap() too, not
      just when forking; whereas if it's pagetable corruption we're worried
      about, then they shouldn't be limited to CONFIG_DEBUG_VM.
      
      Oh, just revert page_dup_rmap() to an inline atomic_inc of mapcount.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NChris Wright <chrisw@redhat.com>
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      21333b2b
  3. 16 9月, 2009 4 次提交
    • A
      HWPOISON: The high level memory error handler in the VM v7 · 6a46079c
      Andi Kleen 提交于
      Add the high level memory handler that poisons pages
      that got corrupted by hardware (typically by a two bit flip in a DIMM
      or a cache) on the Linux level. The goal is to prevent everyone
      from accessing these pages in the future.
      
      This done at the VM level by marking a page hwpoisoned
      and doing the appropriate action based on the type of page
      it is.
      
      The code that does this is portable and lives in mm/memory-failure.c
      
      To quote the overview comment:
      
      High level machine check handler. Handles pages reported by the
      hardware as being corrupted usually due to a 2bit ECC memory or cache
      failure.
      
      This focuses on pages detected as corrupted in the background.
      When the current CPU tries to consume corruption the currently
      running process can just be killed directly instead. This implies
      that if the error cannot be handled for some reason it's safe to
      just ignore it because no corruption has been consumed yet. Instead
      when that happens another machine check will happen.
      
      Handles page cache pages in various states. The tricky part
      here is that we can access any page asynchronous to other VM
      users, because memory failures could happen anytime and anywhere,
      possibly violating some of their assumptions. This is why this code
      has to be extremely careful. Generally it tries to use normal locking
      rules, as in get the standard locks, even if that means the
      error handling takes potentially a long time.
      
      Some of the operations here are somewhat inefficient and have non
      linear algorithmic complexity, because the data structures have not
      been optimized for this case. This is in particular the case
      for the mapping from a vma to a process. Since this case is expected
      to be rare we hope we can get away with this.
      
      There are in principle two strategies to kill processes on poison:
      - just unmap the data and wait for an actual reference before
      killing
      - kill as soon as corruption is detected.
      Both have advantages and disadvantages and should be used
      in different situations. Right now both are implemented and can
      be switched with a new sysctl vm.memory_failure_early_kill
      The default is early kill.
      
      The patch does some rmap data structure walking on its own to collect
      processes to kill. This is unusual because normally all rmap data structure
      knowledge is in rmap.c only. I put it here for now to keep
      everything together and rmap knowledge has been seeping out anyways
      
      Includes contributions from Johannes Weiner, Chris Mason, Fengguang Wu,
      Nick Piggin (who did a lot of great work) and others.
      
      Cc: npiggin@suse.de
      Cc: riel@redhat.com
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      6a46079c
    • A
      HWPOISON: Handle hardware poisoned pages in try_to_unmap · 888b9f7c
      Andi Kleen 提交于
      When a page has the poison bit set replace the PTE with a poison entry.
      This causes the right error handling to be done later when a process runs
      into it.
      
      v2: add a new flag to not do that (needed for the memory-failure handler
      later) (Fengguang)
      v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan)
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      888b9f7c
    • A
      HWPOISON: Use bitmask/action code for try_to_unmap behaviour · 14fa31b8
      Andi Kleen 提交于
      try_to_unmap currently has multiple modi (migration, munlock, normal unmap)
      which are selected by magic flag variables. The logic is not very straight
      forward, because each of these flag change multiple behaviours (e.g.
      migration turns off aging, not only sets up migration ptes etc.)
      Also the different flags interact in magic ways.
      
      A later patch in this series adds another mode to try_to_unmap, so
      this becomes quickly unmanageable.
      
      Replace the different flags with a action code (migration, munlock, munmap)
      and some additional flags as modifiers (ignore mlock, ignore aging).
      This makes the logic more straight forward and allows easier extension
      to new behaviours. Change all the caller to declare what they want to
      do.
      
      This patch is supposed to be a nop in behaviour. If anyone can prove
      it is not that would be a bug.
      
      Cc: Lee.Schermerhorn@hp.com
      Cc: npiggin@suse.de
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      14fa31b8
    • A
      HWPOISON: Export some rmap vma locking to outside world · 10be22df
      Andi Kleen 提交于
      Needed for later patch that walks rmap entries on its own.
      
      This used to be very frowned upon, but memory-failure.c does
      some rather specialized rmap walking and rmap has been stable
      for quite some time, so I think it's ok now to export it.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      10be22df
  4. 24 6月, 2009 1 次提交
    • M
      rmap: fixup page_referenced() for nommu systems · 01ff53f4
      Mike Frysinger 提交于
      After the recent changes that went into mm/vmscan.c to overhaul stuff, we
      ended up with these warnings on no-mmu systems:
      
        mm/vmscan.c: In function `shrink_page_list':
        mm/vmscan.c:580: warning: unused variable `vm_flags'
        mm/vmscan.c: In function `shrink_active_list':
        mm/vmscan.c:1294: warning: `vm_flags' may be used uninitialized in this function
        mm/vmscan.c:1242: note: `vm_flags' was declared here
      
      This is because the no-mmu function defines page_referenced() to work on
      the first argument only (the page).  It does not clear the vm_flags given
      to it because for no-mmu systems, they never actually get utilized.  Since
      that is no longer strictly true, we need to set vm_flags to 0 like
      everyone else so gcc can do proper dead code elimination without annoying
      us with unused warnings.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Cc: David Howells <dhowells@redhat.com>
      Acked-by: NDavid McCullough <davidm@snapgear.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01ff53f4
  5. 17 6月, 2009 2 次提交
  6. 07 1月, 2009 2 次提交
  7. 20 10月, 2008 3 次提交
    • A
      make mm/rmap.c:anon_vma_cachep static · fdd2e5f8
      Adrian Bunk 提交于
      This patch makes the needlessly global anon_vma_cachep static.
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fdd2e5f8
    • L
      vmscan: unevictable LRU scan sysctl · af936a16
      Lee Schermerhorn 提交于
      This patch adds a function to scan individual or all zones' unevictable
      lists and move any pages that have become evictable onto the respective
      zone's inactive list, where shrink_inactive_list() will deal with them.
      
      Adds sysctl to scan all nodes, and per node attributes to individual
      nodes' zones.
      
      Kosaki: If evictable page found in unevictable lru when write
      /proc/sys/vm/scan_unevictable_pages, print filename and file offset of
      these pages.
      
      [akpm@linux-foundation.org: fix one CONFIG_MMU=n build error]
      [kosaki.motohiro@jp.fujitsu.com: adapt vmscan-unevictable-lru-scan-sysctl.patch to new sysfs API]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af936a16
    • N
      mlock: mlocked pages are unevictable · b291f000
      Nick Piggin 提交于
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.
      
      This is achieved through various strategies:
      
      1) add yet another page flag--PG_mlocked--to indicate that
         the page is locked for efficient testing in vmscan and,
         optionally, fault path.  This allows early culling of
         unevictable pages, preventing them from getting to
         page_referenced()/try_to_unmap().  Also allows separate
         accounting of mlock'd pages, as Nick's original patch
         did.
      
         Note:  Nick's original mlock patch used a PG_mlocked
         flag.  I had removed this in favor of the PG_unevictable
         flag + an mlock_count [new page struct member].  I
         restored the PG_mlocked flag to eliminate the new
         count field.
      
      2) add the mlock/unevictable infrastructure to mm/mlock.c,
         with internal APIs in mm/internal.h.  This is a rework
         of Nick's original patch to these files, taking into
         account that mlocked pages are now kept on unevictable
         LRU list.
      
      3) update vmscan.c:page_evictable() to check PageMlocked()
         and, if vma passed in, the vm_flags.  Note that the vma
         will only be passed in for new pages in the fault path;
         and then only if the "cull unevictable pages in fault
         path" patch is included.
      
      4) add try_to_unlock() to rmap.c to walk a page's rmap and
         ClearPageMlocked() if no other vmas have it mlocked.
         Reuses as much of try_to_unmap() as possible.  This
         effectively replaces the use of one of the lru list links
         as an mlock count.  If this mechanism let's pages in mlocked
         vmas leak through w/o PG_mlocked set [I don't know that it
         does], we should catch them later in try_to_unmap().  One
         hopes this will be rare, as it will be relatively expensive.
      
      Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      splitlru: introduce __get_user_pages():
      
        New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
        because current get_user_pages() can't grab PROT_NONE pages theresore it
        cause PROT_NONE pages can't munlock.
      
      [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
      [akpm@linux-foundation.org: untangle patch interdependencies]
      [akpm@linux-foundation.org: fix things after out-of-order merging]
      [hugh@veritas.com: fix page-flags mess]
      [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
      [kosaki.motohiro@jp.fujitsu.com: build fix]
      [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
      [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b291f000
  8. 21 8月, 2008 1 次提交
    • N
      mm: dirty page tracking race fix · 479db0bf
      Nick Piggin 提交于
      There is a race with dirty page accounting where a page may not properly
      be accounted for.
      
      clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
      
      page_mkclean walks the rmaps for that page, and for each one it cleans and
      write protects the pte if it was dirty.  It uses page_check_address to
      find the pte.  That function has a shortcut to avoid the ptl if the pte is
      not present.  Unfortunately, the pte can be switched to not-present then
      back to present by other code while holding the page table lock -- this
      should not be a signal for page_mkclean to ignore that pte, because it may
      be dirty.
      
      For example, powerpc64's set_pte_at will clear a previously present pte
      before setting it to the desired value.  There may also be other code in
      core mm or in arch which do similar things.
      
      The consequence of the bug is loss of data integrity due to msync, and
      loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
      also be unreliable (depending on the exact XIP locking scheme), which can
      lead to data corruption.
      
      Fix this by having an option to always take ptl to check the pte in
      page_check_address.
      
      It's possible to retain this optimization for page_referenced and
      try_to_unmap.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Carsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      479db0bf
  9. 29 7月, 2008 1 次提交
    • A
      mmu-notifiers: add mm_take_all_locks() operation · 7906d00c
      Andrea Arcangeli 提交于
      mm_take_all_locks holds off reclaim from an entire mm_struct.  This allows
      mmu notifiers to register into the mm at any time with the guarantee that
      no mmu operation is in progress on the mm.
      
      This operation locks against the VM for all pte/vma/mm related operations
      that could ever happen on a certain mm.  This includes vmtruncate,
      try_to_unmap, and all page faults.
      
      The caller must take the mmap_sem in write mode before calling
      mm_take_all_locks().  The caller isn't allowed to release the mmap_sem
      until mm_drop_all_locks() returns.
      
      mmap_sem in write mode is required in order to block all operations that
      could modify pagetables and free pages without need of altering the vma
      layout (for example populate_range() with nonlinear vmas).  It's also
      needed in write mode to avoid new anon_vmas to be associated with existing
      vmas.
      
      A single task can't take more than one mm_take_all_locks() in a row or it
      would deadlock.
      
      mm_take_all_locks() and mm_drop_all_locks are expensive operations that
      may have to take thousand of locks.
      
      mm_take_all_locks() can fail if it's interrupted by signals.
      
      When mmu_notifier_register returns, we must be sure that the driver is
      notified if some task is in the middle of a vmtruncate for the 'mm' where
      the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
      is run around the vmtruncation but mmu_notifier_register can run after
      mmu_notifier_invalidate_range_start and before
      mmu_notifier_invalidate_range_end).  Same problem for rmap paths.  And
      we've to remove page pinning to avoid replicating the tlb_gather logic
      inside KVM (and GRU doesn't work well with page pinning regardless of
      needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
      the page, kvm would have no way to notice that it mapped into sptes a page
      that is going into the freelist without a chance of any further
      mmu_notifier notification.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Izik Eidus <izike@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7906d00c
  10. 08 2月, 2008 1 次提交
    • B
      Memory controller: make page_referenced() cgroup aware · bed7161a
      Balbir Singh 提交于
      Make page_referenced() cgroup aware.  Without this patch, page_referenced()
      can cause a page to be skipped while reclaiming pages.  This patch ensures
      that other cgroups do not hold pages in a particular cgroup hostage.  It
      is required to ensure that shared pages are freed from a cgroup when they
      are not actively referenced from the cgroup that brought them in
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bed7161a
  11. 17 5月, 2007 1 次提交
  12. 23 12月, 2006 1 次提交
  13. 08 12月, 2006 2 次提交
  14. 26 9月, 2006 1 次提交
    • P
      [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra 提交于
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - its shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Page from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages, if the page is found the be of a VMA that accounts dirty pages it will
      also wrprotect the PTE.
      
      Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d08b3851
  15. 23 6月, 2006 1 次提交
  16. 26 4月, 2006 1 次提交
  17. 02 2月, 2006 2 次提交
  18. 07 1月, 2006 1 次提交
  19. 29 11月, 2005 1 次提交
    • R
      [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Rik van Riel 提交于
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - th system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f7b7fd8f
  20. 30 10月, 2005 1 次提交
    • H
      [PATCH] mm: rmap with inner ptlock · c0718806
      Hugh Dickins 提交于
      rmap's page_check_address descend without page_table_lock.  First just
      pte_offset_map in case there's no pte present worth locking for, then take
      page_table_lock for the full check, and pass ptl back to caller in the same
      style as pte_offset_map_lock.  __xip_unmap, page_referenced_one and
      try_to_unmap_one use pte_unmap_unlock.  try_to_unmap_cluster also.
      
      page_check_address reformatted to avoid progressive indentation.  No use is
      made of its one error code, return NULL when it fails.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c0718806
  21. 24 6月, 2005 1 次提交
    • C
      [PATCH] xip: fs/mm: execute in place · ceffc078
      Carsten Otte 提交于
      - generic_file* file operations do no longer have a xip/non-xip split
      - filemap_xip.c implements a new set of fops that require get_xip_page
        aop to work proper. all new fops are exported GPL-only (don't like to
        see whatever code use those except GPL modules)
      - __xip_unmap now uses page_check_address, which is no longer static
        in rmap.c, and defined in linux/rmap.h
      - mm/filemap.h is now much more clean, plainly having just Linus'
        inline funcs moved here from filemap.c
      - fix includes in filemap_xip to make it build cleanly on i386
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ceffc078
  22. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4