1. 16 9月, 2009 3 次提交
    • A
      HWPOISON: Handle hardware poisoned pages in try_to_unmap · 888b9f7c
      Andi Kleen 提交于
      When a page has the poison bit set replace the PTE with a poison entry.
      This causes the right error handling to be done later when a process runs
      into it.
      
      v2: add a new flag to not do that (needed for the memory-failure handler
      later) (Fengguang)
      v3: remove unnecessary is_migration_entry() test (Fengguang, Minchan)
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      888b9f7c
    • A
      HWPOISON: Use bitmask/action code for try_to_unmap behaviour · 14fa31b8
      Andi Kleen 提交于
      try_to_unmap currently has multiple modi (migration, munlock, normal unmap)
      which are selected by magic flag variables. The logic is not very straight
      forward, because each of these flag change multiple behaviours (e.g.
      migration turns off aging, not only sets up migration ptes etc.)
      Also the different flags interact in magic ways.
      
      A later patch in this series adds another mode to try_to_unmap, so
      this becomes quickly unmanageable.
      
      Replace the different flags with a action code (migration, munlock, munmap)
      and some additional flags as modifiers (ignore mlock, ignore aging).
      This makes the logic more straight forward and allows easier extension
      to new behaviours. Change all the caller to declare what they want to
      do.
      
      This patch is supposed to be a nop in behaviour. If anyone can prove
      it is not that would be a bug.
      
      Cc: Lee.Schermerhorn@hp.com
      Cc: npiggin@suse.de
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      14fa31b8
    • A
      HWPOISON: Export some rmap vma locking to outside world · 10be22df
      Andi Kleen 提交于
      Needed for later patch that walks rmap entries on its own.
      
      This used to be very frowned upon, but memory-failure.c does
      some rather specialized rmap walking and rmap has been stable
      for quite some time, so I think it's ok now to export it.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      10be22df
  2. 24 6月, 2009 1 次提交
    • M
      rmap: fixup page_referenced() for nommu systems · 01ff53f4
      Mike Frysinger 提交于
      After the recent changes that went into mm/vmscan.c to overhaul stuff, we
      ended up with these warnings on no-mmu systems:
      
        mm/vmscan.c: In function `shrink_page_list':
        mm/vmscan.c:580: warning: unused variable `vm_flags'
        mm/vmscan.c: In function `shrink_active_list':
        mm/vmscan.c:1294: warning: `vm_flags' may be used uninitialized in this function
        mm/vmscan.c:1242: note: `vm_flags' was declared here
      
      This is because the no-mmu function defines page_referenced() to work on
      the first argument only (the page).  It does not clear the vm_flags given
      to it because for no-mmu systems, they never actually get utilized.  Since
      that is no longer strictly true, we need to set vm_flags to 0 like
      everyone else so gcc can do proper dead code elimination without annoying
      us with unused warnings.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Cc: David Howells <dhowells@redhat.com>
      Acked-by: NDavid McCullough <davidm@snapgear.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01ff53f4
  3. 17 6月, 2009 2 次提交
  4. 07 1月, 2009 2 次提交
  5. 20 10月, 2008 3 次提交
    • A
      make mm/rmap.c:anon_vma_cachep static · fdd2e5f8
      Adrian Bunk 提交于
      This patch makes the needlessly global anon_vma_cachep static.
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fdd2e5f8
    • L
      vmscan: unevictable LRU scan sysctl · af936a16
      Lee Schermerhorn 提交于
      This patch adds a function to scan individual or all zones' unevictable
      lists and move any pages that have become evictable onto the respective
      zone's inactive list, where shrink_inactive_list() will deal with them.
      
      Adds sysctl to scan all nodes, and per node attributes to individual
      nodes' zones.
      
      Kosaki: If evictable page found in unevictable lru when write
      /proc/sys/vm/scan_unevictable_pages, print filename and file offset of
      these pages.
      
      [akpm@linux-foundation.org: fix one CONFIG_MMU=n build error]
      [kosaki.motohiro@jp.fujitsu.com: adapt vmscan-unevictable-lru-scan-sysctl.patch to new sysfs API]
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af936a16
    • N
      mlock: mlocked pages are unevictable · b291f000
      Nick Piggin 提交于
      Make sure that mlocked pages also live on the unevictable LRU, so kswapd
      will not scan them over and over again.
      
      This is achieved through various strategies:
      
      1) add yet another page flag--PG_mlocked--to indicate that
         the page is locked for efficient testing in vmscan and,
         optionally, fault path.  This allows early culling of
         unevictable pages, preventing them from getting to
         page_referenced()/try_to_unmap().  Also allows separate
         accounting of mlock'd pages, as Nick's original patch
         did.
      
         Note:  Nick's original mlock patch used a PG_mlocked
         flag.  I had removed this in favor of the PG_unevictable
         flag + an mlock_count [new page struct member].  I
         restored the PG_mlocked flag to eliminate the new
         count field.
      
      2) add the mlock/unevictable infrastructure to mm/mlock.c,
         with internal APIs in mm/internal.h.  This is a rework
         of Nick's original patch to these files, taking into
         account that mlocked pages are now kept on unevictable
         LRU list.
      
      3) update vmscan.c:page_evictable() to check PageMlocked()
         and, if vma passed in, the vm_flags.  Note that the vma
         will only be passed in for new pages in the fault path;
         and then only if the "cull unevictable pages in fault
         path" patch is included.
      
      4) add try_to_unlock() to rmap.c to walk a page's rmap and
         ClearPageMlocked() if no other vmas have it mlocked.
         Reuses as much of try_to_unmap() as possible.  This
         effectively replaces the use of one of the lru list links
         as an mlock count.  If this mechanism let's pages in mlocked
         vmas leak through w/o PG_mlocked set [I don't know that it
         does], we should catch them later in try_to_unmap().  One
         hopes this will be rare, as it will be relatively expensive.
      
      Original mm/internal.h, mm/rmap.c and mm/mlock.c changes:
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      splitlru: introduce __get_user_pages():
      
        New munlock processing need to GUP_FLAGS_IGNORE_VMA_PERMISSIONS.
        because current get_user_pages() can't grab PROT_NONE pages theresore it
        cause PROT_NONE pages can't munlock.
      
      [akpm@linux-foundation.org: fix this for pagemap-pass-mm-into-pagewalkers.patch]
      [akpm@linux-foundation.org: untangle patch interdependencies]
      [akpm@linux-foundation.org: fix things after out-of-order merging]
      [hugh@veritas.com: fix page-flags mess]
      [lee.schermerhorn@hp.com: fix munlock page table walk - now requires 'mm']
      [kosaki.motohiro@jp.fujitsu.com: build fix]
      [kosaki.motohiro@jp.fujitsu.com: fix truncate race and sevaral comments]
      [kosaki.motohiro@jp.fujitsu.com: splitlru: introduce __get_user_pages()]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b291f000
  6. 21 8月, 2008 1 次提交
    • N
      mm: dirty page tracking race fix · 479db0bf
      Nick Piggin 提交于
      There is a race with dirty page accounting where a page may not properly
      be accounted for.
      
      clear_page_dirty_for_io() calls page_mkclean; then TestClearPageDirty.
      
      page_mkclean walks the rmaps for that page, and for each one it cleans and
      write protects the pte if it was dirty.  It uses page_check_address to
      find the pte.  That function has a shortcut to avoid the ptl if the pte is
      not present.  Unfortunately, the pte can be switched to not-present then
      back to present by other code while holding the page table lock -- this
      should not be a signal for page_mkclean to ignore that pte, because it may
      be dirty.
      
      For example, powerpc64's set_pte_at will clear a previously present pte
      before setting it to the desired value.  There may also be other code in
      core mm or in arch which do similar things.
      
      The consequence of the bug is loss of data integrity due to msync, and
      loss of dirty page accounting accuracy.  XIP's __xip_unmap could easily
      also be unreliable (depending on the exact XIP locking scheme), which can
      lead to data corruption.
      
      Fix this by having an option to always take ptl to check the pte in
      page_check_address.
      
      It's possible to retain this optimization for page_referenced and
      try_to_unmap.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Carsten Otte <cotte@freenet.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      479db0bf
  7. 29 7月, 2008 1 次提交
    • A
      mmu-notifiers: add mm_take_all_locks() operation · 7906d00c
      Andrea Arcangeli 提交于
      mm_take_all_locks holds off reclaim from an entire mm_struct.  This allows
      mmu notifiers to register into the mm at any time with the guarantee that
      no mmu operation is in progress on the mm.
      
      This operation locks against the VM for all pte/vma/mm related operations
      that could ever happen on a certain mm.  This includes vmtruncate,
      try_to_unmap, and all page faults.
      
      The caller must take the mmap_sem in write mode before calling
      mm_take_all_locks().  The caller isn't allowed to release the mmap_sem
      until mm_drop_all_locks() returns.
      
      mmap_sem in write mode is required in order to block all operations that
      could modify pagetables and free pages without need of altering the vma
      layout (for example populate_range() with nonlinear vmas).  It's also
      needed in write mode to avoid new anon_vmas to be associated with existing
      vmas.
      
      A single task can't take more than one mm_take_all_locks() in a row or it
      would deadlock.
      
      mm_take_all_locks() and mm_drop_all_locks are expensive operations that
      may have to take thousand of locks.
      
      mm_take_all_locks() can fail if it's interrupted by signals.
      
      When mmu_notifier_register returns, we must be sure that the driver is
      notified if some task is in the middle of a vmtruncate for the 'mm' where
      the mmu notifier was registered (mmu_notifier_invalidate_range_start/end
      is run around the vmtruncation but mmu_notifier_register can run after
      mmu_notifier_invalidate_range_start and before
      mmu_notifier_invalidate_range_end).  Same problem for rmap paths.  And
      we've to remove page pinning to avoid replicating the tlb_gather logic
      inside KVM (and GRU doesn't work well with page pinning regardless of
      needing tlb_gather), so without mm_take_all_locks when vmtruncate frees
      the page, kvm would have no way to notice that it mapped into sptes a page
      that is going into the freelist without a chance of any further
      mmu_notifier notification.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAndrea Arcangeli <andrea@qumranet.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Kanoj Sarcar <kanojsarcar@yahoo.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Avi Kivity <avi@qumranet.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Izik Eidus <izike@qumranet.com>
      Cc: Anthony Liguori <aliguori@us.ibm.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7906d00c
  8. 08 2月, 2008 1 次提交
    • B
      Memory controller: make page_referenced() cgroup aware · bed7161a
      Balbir Singh 提交于
      Make page_referenced() cgroup aware.  Without this patch, page_referenced()
      can cause a page to be skipped while reclaiming pages.  This patch ensures
      that other cgroups do not hold pages in a particular cgroup hostage.  It
      is required to ensure that shared pages are freed from a cgroup when they
      are not actively referenced from the cgroup that brought them in
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bed7161a
  9. 17 5月, 2007 1 次提交
  10. 23 12月, 2006 1 次提交
  11. 08 12月, 2006 2 次提交
  12. 26 9月, 2006 1 次提交
    • P
      [PATCH] mm: tracking shared dirty pages · d08b3851
      Peter Zijlstra 提交于
      Tracking of dirty pages in shared writeable mmap()s.
      
      The idea is simple: write protect clean shared writeable pages, catch the
      write-fault, make writeable and set dirty.  On page write-back clean all the
      PTE dirty bits and write protect them once again.
      
      The implementation is a tad harder, mainly because the default
      backing_dev_info capabilities were too loosely maintained.  Hence it is not
      enough to test the backing_dev_info for cap_account_dirty.
      
      The current heuristic is as follows, a VMA is eligible when:
       - its shared writeable
          (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
       - it is not a 'special' mapping
          (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
       - the backing_dev_info is cap_account_dirty
          mapping_cap_account_dirty(vma->vm_file->f_mapping)
       - f_op->mmap() didn't change the default page protection
      
      Page from remap_pfn_range() are explicitly excluded because their COW
      semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
      because they don't have a backing store anyway.
      
      mprotect() is taught about the new behaviour as well.  However it overrides
      the last condition.
      
      Cleaning the pages on write-back is done with page_mkclean() a new rmap call.
      It can be called on any page, but is currently only implemented for mapped
      pages, if the page is found the be of a VMA that accounts dirty pages it will
      also wrprotect the PTE.
      
      Finally, in fs/buffers.c:try_to_free_buffers(); remove clear_page_dirty() from
      under ->private_lock.  This seems to be safe, since ->private_lock is used to
      serialize access to the buffers, not the page itself.  This is needed because
      clear_page_dirty() will call into page_mkclean() and would thereby violate
      locking order.
      
      [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d08b3851
  13. 23 6月, 2006 1 次提交
  14. 26 4月, 2006 1 次提交
  15. 02 2月, 2006 2 次提交
  16. 07 1月, 2006 1 次提交
  17. 29 11月, 2005 1 次提交
    • R
      [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Rik van Riel 提交于
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - th system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f7b7fd8f
  18. 30 10月, 2005 1 次提交
    • H
      [PATCH] mm: rmap with inner ptlock · c0718806
      Hugh Dickins 提交于
      rmap's page_check_address descend without page_table_lock.  First just
      pte_offset_map in case there's no pte present worth locking for, then take
      page_table_lock for the full check, and pass ptl back to caller in the same
      style as pte_offset_map_lock.  __xip_unmap, page_referenced_one and
      try_to_unmap_one use pte_unmap_unlock.  try_to_unmap_cluster also.
      
      page_check_address reformatted to avoid progressive indentation.  No use is
      made of its one error code, return NULL when it fails.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c0718806
  19. 24 6月, 2005 1 次提交
    • C
      [PATCH] xip: fs/mm: execute in place · ceffc078
      Carsten Otte 提交于
      - generic_file* file operations do no longer have a xip/non-xip split
      - filemap_xip.c implements a new set of fops that require get_xip_page
        aop to work proper. all new fops are exported GPL-only (don't like to
        see whatever code use those except GPL modules)
      - __xip_unmap now uses page_check_address, which is no longer static
        in rmap.c, and defined in linux/rmap.h
      - mm/filemap.h is now much more clean, plainly having just Linus'
        inline funcs moved here from filemap.c
      - fix includes in filemap_xip to make it build cleanly on i386
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ceffc078
  20. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4