1. 23 Jun 2006: 40 commits
    • [PATCH] SELinux: add security_task_movememory calls to mm code · 86c3a764
      Committed by David Quigley
      This patch inserts security_task_movememory hook calls into memory management
      code to enable security modules to mediate this operation between tasks.
      
      Since the last posting, the hook has been renamed following feedback from
      Christoph Lameter.
      Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: James Morris <jmorris@namei.org>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: sys_move_pages(): support moving of individual pages · 742755a1
      Committed by Christoph Lameter
      move_pages() is used to move individual pages of a process. The function can
      be used to determine the location of pages and to move them onto the desired
      node. move_pages() returns status information for each page.
      
      long move_pages(pid, number_of_pages_to_move,
      		addresses_of_pages[],
      		nodes[] or NULL,
      		status[],
      		flags);
      
      The addresses_of_pages array holds void * pointers to the
      pages to be moved.
      
      The nodes array contains the node numbers that the pages should be moved
      to. If a NULL is passed instead of an array then no pages are moved but
      the status array is updated. The status request may be used to determine
      the page state before issuing another move_pages() to move pages.
      
      The status array will contain the state of all individual page migration
      attempts when the function terminates. The status array is only valid if
      move_pages() completed successfully.
      
      Possible page states in status[]:
      
      0..MAX_NUMNODES	The page is now on the indicated node.
      
      -ENOENT		Page is not present
      
      -EACCES		Page is mapped by multiple processes and can only
      		be moved if MPOL_MF_MOVE_ALL is specified.
      
      -EPERM		The page has been mlocked by a process/driver and
      		cannot be moved.
      
      -EBUSY		Page is busy and cannot be moved. Try again later.
      
      -EFAULT		Invalid address (no VMA or zero page).
      
      -ENOMEM		Unable to allocate memory on target node.
      
      -EIO		Unable to write back page. The page must be written
      		back in order to move it since the page is dirty and the
      		filesystem does not provide a migration function that
      		would allow the moving of dirty pages.
      
      -EINVAL		A dirty page cannot be moved. The filesystem does not provide
      		a migration function and has no ability to write back pages.
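      As an illustration only, a caller might classify each status[] entry
      along the lines of the list above. This is a minimal user-space sketch:
      the enum and the function name classify_page_status() are hypothetical,
      and it relies only on the errno constants from <errno.h>.

```c
#include <errno.h>

/* Hypothetical classification of one status[] entry from move_pages(). */
enum page_state {
	PS_ON_NODE,		/* >= 0: page is on the indicated node */
	PS_NOT_PRESENT,		/* -ENOENT */
	PS_SHARED,		/* -EACCES: needs MPOL_MF_MOVE_ALL */
	PS_MLOCKED,		/* -EPERM */
	PS_BUSY,		/* -EBUSY: try again later */
	PS_BAD_ADDRESS,		/* -EFAULT */
	PS_NO_MEMORY,		/* -ENOMEM on the target node */
	PS_WRITEBACK_FAILED,	/* -EIO */
	PS_DIRTY_UNMOVABLE,	/* -EINVAL */
	PS_UNKNOWN
};

enum page_state classify_page_status(int status)
{
	if (status >= 0)
		return PS_ON_NODE;

	switch (-status) {
	case ENOENT:	return PS_NOT_PRESENT;
	case EACCES:	return PS_SHARED;
	case EPERM:	return PS_MLOCKED;
	case EBUSY:	return PS_BUSY;
	case EFAULT:	return PS_BAD_ADDRESS;
	case ENOMEM:	return PS_NO_MEMORY;
	case EIO:	return PS_WRITEBACK_FAILED;
	case EINVAL:	return PS_DIRTY_UNMOVABLE;
	default:	return PS_UNKNOWN;
	}
}
```

      A caller could, for example, retry only the entries classified as
      PS_BUSY.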
      
      The flags parameter indicates what types of pages to move:
      
      MPOL_MF_MOVE	Move pages that are only mapped by the process.
      
      MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes.
      		Requires sufficient capabilities.
      
      Possible return codes from move_pages()
      
      -ENOENT		No pages found that would require moving. All pages
      		are either already on the target node, not present, had an
      		invalid address or could not be moved because they were
      		mapped by multiple processes.
      
      -EINVAL		Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt
      		to migrate pages in a kernel thread.
      
      -EPERM		MPOL_MF_MOVE_ALL specified without sufficient privileges,
      		or an attempt to move a process belonging to another user.
      
      -EACCES		One of the target nodes is not allowed by the current cpuset.
      
      -ENODEV		One of the target nodes is not online.
      
      -ESRCH		Process does not exist.
      
      -E2BIG		Too many pages to move.
      
      -ENOMEM		Not enough memory to allocate control array.
      
      -EFAULT		Parameters could not be accessed.
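      A minimal sketch of the flag checks implied by the -EINVAL and -EPERM
      return codes above. The MPOL_MF_* values are assumed to match
      <linux/mempolicy.h> (MPOL_MF_MOVE = 1 << 1, MPOL_MF_MOVE_ALL = 1 << 2)
      and should be checked against real headers; check_move_pages_flags()
      and its has_privilege argument are hypothetical stand-ins for the
      kernel's capability check.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed values from <linux/mempolicy.h>; verify against your headers. */
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE		(1 << 1)
#endif
#ifndef MPOL_MF_MOVE_ALL
#define MPOL_MF_MOVE_ALL	(1 << 2)
#endif

/* Hypothetical sketch of the flag validation described above. */
int check_move_pages_flags(int flags, bool has_privilege)
{
	/* Anything other than MPOL_MF_MOVE(_ALL) is rejected. */
	if (flags & ~(MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
		return -EINVAL;
	/* Moving pages shared with other processes needs extra privilege. */
	if ((flags & MPOL_MF_MOVE_ALL) && !has_privilege)
		return -EPERM;
	return 0;
}
```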
      
      A test program for move_pages() may be found with the patches
      on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3
      
      From: Christoph Lameter <clameter@sgi.com>
      
        Detailed results for sys_move_pages()
      
        Pass a pointer to an integer to get_new_page() that may be used to
        indicate where the completion status of a migration operation should be
        placed.  This allows sys_move_pages() to report back exactly what happened to
        each page.
      
        Wish there would be a better way to do this. Looks a bit hacky.
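      A simplified user-space model of the int ** result idea. Everything
      here (struct page, struct page_to_node, new_page_node(), migrate_one(),
      the fixed pool) is a hypothetical stand-in, not the kernel code; it only
      shows how an allocator callback can tell the migration loop which status
      slot to fill.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model, not kernel code: a "page" only records the node it lives on. */
struct page {
	int node;
};

/* One entry per page passed in: source page, target node, final status. */
struct page_to_node {
	struct page *page;	/* NULL terminates the array */
	int node;
	int status;
};

/* Allocator callback: returns a new page and may point *result at the slot
 * that should receive the outcome for this particular page. */
typedef struct page *new_page_t(struct page *old, unsigned long private,
				int **result);

#define POOL_PAGES 16
static struct page pool[POOL_PAGES];	/* stand-in for the page allocator */
static int pool_used;

static struct page *new_page_node(struct page *old, unsigned long private,
				  int **result)
{
	struct page_to_node *pm = (struct page_to_node *)private;

	while (pm->page && pm->page != old)	/* find the entry for old */
		pm++;
	if (!pm->page)
		return NULL;
	*result = &pm->status;	/* report into this page's own status slot */
	if (pool_used == POOL_PAGES)
		return NULL;
	pool[pool_used].node = pm->node;
	return &pool[pool_used++];
}

/* Migrate one page; the outcome lands wherever the callback pointed. */
static int migrate_one(struct page *old, new_page_t *get_new_page,
		       unsigned long private)
{
	int *result = NULL;
	struct page *newpage = get_new_page(old, private, &result);
	int rc = newpage ? 0 : -ENOMEM;

	if (result)
		*result = rc ? rc : newpage->node;
	return rc;
}
```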
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: use allocator function for migrate_pages() · 95a402c3
      Committed by Christoph Lameter
      Instead of passing a list of new pages, pass a function to allocate a new
      page.  This allows the correct placement of MPOL_INTERLEAVE pages during page
      migration.  It also further simplifies the callers of migrate_pages().
      migrate_pages() becomes similar to migrate_pages_to() so drop
      migrate_pages_to().  The batching of new page allocations becomes unnecessary.
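      A toy sketch of passing an allocator instead of a list of new pages,
      with a hypothetical round-robin allocator standing in for
      MPOL_INTERLEAVE placement. The types and names (new_page_t,
      struct interleave_alloc, migrate_pages_sketch()) are illustrative
      assumptions, not the kernel interface.

```c
#include <stddef.h>

/* Toy model, not kernel code. */
struct page {
	int node;
};

/* Callback that supplies a destination page for each page being migrated. */
typedef struct page *new_page_t(struct page *old, unsigned long private);

/* Hypothetical allocator state: hand out pages round-robin across nodes. */
struct interleave_alloc {
	int nr_nodes;
	int next_node;
	struct page *pool;
	size_t used, cap;
};

static struct page *alloc_interleave(struct page *old, unsigned long private)
{
	struct interleave_alloc *ia = (struct interleave_alloc *)private;
	struct page *p;

	(void)old;	/* a real policy might look at the old page's address */
	if (ia->used == ia->cap)
		return NULL;
	p = &ia->pool[ia->used++];
	p->node = ia->next_node;
	ia->next_node = (ia->next_node + 1) % ia->nr_nodes;
	return p;
}

/* Request each new page only when it is needed; returns the failure count. */
static int migrate_pages_sketch(struct page **from, struct page **to, size_t n,
				new_page_t *get_new_page, unsigned long private)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++) {
		to[i] = get_new_page(from[i], private);
		if (!to[i])
			failed++;
	}
	return failed;
}
```

      Because the allocator is consulted per page, no batch of new pages has
      to be preallocated, and the placement decision can depend on each page.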
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: handle freeing of pages in migrate_pages() · aaa994b3
      Committed by Christoph Lameter
      Do not leave pages on the lists passed to migrate_pages().  It seems that
      we will not need any postprocessing of pages.  This will simplify the handling of
      pages by the callers of migrate_pages().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration: simplify migrate_pages() · e24f0b8f
      Committed by Christoph Lameter
      Currently migrate_pages() is a mess with lots of gotos.  Extract two
      functions from migrate_pages() and get rid of the gotos.
      
      Plus we can just unconditionally set the locked bit on the new page since we
      are the only one holding a reference.  Locking is to stop others from
      accessing the page once we establish references to the new page.
      
      Remove the list_del from move_to_lru in order to have finer control over list
      processing.
      
      [akpm@osdl.org: add debug check]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Jes Sorensen <jes@trained-monkey.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] printk() should not be called under zone->lock · 8f9de51a
      Committed by Kirill Korotaev
      This patch fixes printk() under zone->lock in show_free_areas().  It can be
      unsafe to call printk() under this lock, since the caller can try to
      allocate/free some memory and self-deadlock on this lock.  I found memory
      allocation/freeing in both netconsole and the serial console.
      
      This issue was hit in reality when meminfo was periodically printed for
      debug purposes and netconsole was used.
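      The fix follows a common pattern: copy what you need while holding the
      lock, drop it, then print. A user-space sketch of that pattern, assuming
      a toy struct zone whose "lock" is just a flag; console_emit() is a
      hypothetical printk() stand-in that asserts the lock is not held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_ORDERS 4

/* Toy model, not kernel code: free-block counters guarded by a "lock" flag. */
struct zone {
	bool locked;
	unsigned long nr_free[NR_ORDERS];
};

static void zone_lock(struct zone *z)   { z->locked = true; }
static void zone_unlock(struct zone *z) { z->locked = false; }

/* Stand-in for printk(): a console may allocate memory, so it must never
 * run while the zone lock is held. */
static void console_emit(const struct zone *z, unsigned long count, int order)
{
	assert(!z->locked);
	printf("%lu*%lu ", count, 1UL << order);
}

/* Copy the counters under the lock, then print with the lock dropped.
 * Returns the total number of free pages. */
static unsigned long show_free_areas_sketch(struct zone *z)
{
	unsigned long snap[NR_ORDERS], total = 0;

	zone_lock(z);
	for (int i = 0; i < NR_ORDERS; i++)
		snap[i] = z->nr_free[i];
	zone_unlock(z);

	for (int i = 0; i < NR_ORDERS; i++) {
		console_emit(z, snap[i], i);
		total += snap[i] << i;	/* an order-i block holds 1 << i pages */
	}
	printf("= %lu pages\n", total);
	return total;
}
```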
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kernel-doc for mm/filemap.c · 485bb99b
      Committed by Randy Dunlap
      mm/filemap.c:
      - add lots of kernel-doc;
      - fix some typos and kernel-doc errors;
      - drop some blank lines between function close and EXPORT_SYMBOL();
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: kmalloc, kzalloc comments cleanup and fix · 800590f5
      Committed by Paul Drynoff
      - Move the comments for kmalloc to the right place; currently they are near __do_kmalloc
      
      - Comments for kzalloc
      
      - More detailed comments for kmalloc
      
      - Appearance of "kmalloc" and "kzalloc" man pages after "make mandocs"
      
      [rdunlap@xenotime.net: simplification]
      Signed-off-by: Paul Drynoff <pauldrynoff@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] initialise total_memory() earlier · bd1e22b8
      Committed by Andrew Morton
      Initialise total_memory earlier in boot.  Because if for some reason we run
      page reclaim early in boot, we don't want total_memory to be zero when we use
      it as a divisor.
      
      And rename total_memory to vm_total_pages to avoid naming clashes with
      architectures.
      
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm/slab.c: fix early init assumption · e0a42726
      Committed by Ingo Molnar
      The SLAB bootstrap code assumes that the first two kmalloc caches created
      (the INDEX_AC and INDEX_L3 kmalloc caches) won't be off-slab.  But due to AC
      and L3 structure size increase in lockdep, one of them ended up being
      off-slab, and subsequently crashing with:
      
      Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
       [<ffffffff80267478>] kmem_cache_alloc+0x26/0x7d
      
      The fix is to introduce a bootstrap flag and to use it to prevent off-slab
      caches being created so early during bootup.
      
      (The calculation for off-slab caches is quite complex, so I didn't want to
      complicate things by introducing yet another INDEX_ calculation; the flag
      approach is simpler and smaller.)
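      A sketch of the bootstrap-flag approach. The variable name
      slab_early_init, the PAGE_SIZE >> 3 threshold and the CFLGS_OFF_SLAB
      value are assumptions made for illustration (4 KiB pages assumed);
      consult mm/slab.c for the real logic.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE	4096UL		/* assumption: 4 KiB pages */
#define CFLGS_OFF_SLAB	0x80000000UL	/* illustrative flag value */

/* Set during boot until the first kmalloc caches exist. */
static bool slab_early_init = true;

/* Decide whether a new cache keeps its slab management data off-slab. */
static unsigned long cache_flags_for(size_t size, unsigned long flags)
{
	/* Large objects normally go off-slab, but never during bootstrap,
	 * because the off-slab management data would itself need kmalloc. */
	if (size >= (PAGE_SIZE >> 3) && !slab_early_init)
		flags |= CFLGS_OFF_SLAB;
	return flags;
}
```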
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix update_mmu_cache in fremap.c · 668e0d8f
      Committed by Hugh Dickins
      There are two calls to update_mmu_cache in fremap.c, both defective.
      The one in install_page needs to be accompanied by lazy_mmu_prot_update
      (at some later cleanup, move that into ia64 update_mmu_cache itself); and
      the one in install_file_pte should be removed since the pte is not present.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swapoff: use atomic_inc_not_zero() on mm_users · 70af7c5c
      Committed by Hugh Dickins
      Now that we have atomic_inc_not_zero, it's more elegant for try_to_unuse to
      use that on mm_users: doesn't actually matter at present, but safer to be
      sure that once mm_users has gone to 0, nothing raises it for an instant.
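      atomic_inc_not_zero() only takes a reference if the count has not
      already dropped to zero. A user-space analogue using C11 <stdatomic.h>;
      inc_not_zero() is a hypothetical name, and the compare-and-swap loop is
      one common way to build this, not necessarily how every architecture
      implements it.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space analogue of atomic_inc_not_zero(): take a reference only if
 * the count is still non-zero, so a count that has reached 0 stays at 0. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}
```

      Unlike a plain increment, there is no instant at which a count of 0 is
      briefly raised to 1.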
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] add page_mkwrite() vm_operations method · 9637a5ef
      Committed by David Howells
      Add a new VMA operation to notify a filesystem or other driver about the
      MMU generating a fault because userspace attempted to write to a page
      mapped through a read-only PTE.
      
      This facility permits the filesystem or driver to:
      
       (*) Implement storage allocation/reservation on attempted write, and so to
           deal with problems such as ENOSPC more gracefully (perhaps by generating
           SIGBUS).
      
       (*) Delay making the page writable until the contents have been written to a
           backing cache. This is useful for NFS/AFS when using FS-Cache/CacheFS.
           It permits the filesystem to have some guarantee about the state of the
           cache.
      
       (*) Account and limit number of dirty pages. This is one piece of the puzzle
           needed to make shared writable mappings work safely in FUSE.
      
      Needed by cachefs (Or is it cachefiles?  Or fscache? <head spins>).
      
      At least four other groups have stated an interest in it or a desire to use
      the functionality it provides: FUSE, OCFS2, NTFS and JFFS2.  Also, things like
      EXT3 really ought to use it to deal with the case of shared-writable mmap
      encountering ENOSPC before we permit the page to be dirtied.
      
      From: Peter Zijlstra <a.p.zijlstra@chello.nl>
      
        get_user_pages(.write=1, .force=1) can generate COW hits on read-only
        shared mappings; this patch traps those as page_mkwrite candidates and
        fails to handle them the old way.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Joel Becker <Joel.Becker@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sparsemem: record nid during memory present · 30c253e6
      Committed by Andy Whitcroft
      Record the node id as we mark sections for instantiation.  Use this nid
      during instantiation to direct allocations.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Cc: Mike Kravetz <kravetz@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Bob Picco <bob.picco@hp.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: verify pointers before free · ddc2e812
      Committed by Pekka Enberg
      Passing an invalid pointer to kfree() and kmem_cache_free() is likely to
      cause bad memory corruption or even take down the whole system because the
      bad pointer is likely reused immediately due to the per-CPU caches.  Until
      now, we did not do any verification for this if CONFIG_DEBUG_SLAB was
      disabled.
      
      As suggested by Linus, add PageSlab check to page_to_cache() and
      page_to_slab() to verify pointers passed to kfree().  Also, move the
      stronger check from cache_free_debugcheck() to kmem_cache_free() to ensure
      the passed pointer actually belongs to the cache we're about to free the
      object to.
      
      For page_to_cache() and page_to_slab(), the assertions should have
      virtually no extra cost (two instructions, no data cache pressure) and for
      kmem_cache_free() the overhead should be minimal.
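      A toy model of the two checks described above: a PageSlab-style flag
      test in page_to_cache(), and the stronger "does this object belong to
      this cache" test for kmem_cache_free(). The structures are hypothetical,
      and where the kernel would BUG() this sketch returns NULL/false so the
      behaviour can be tested.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code. */
struct kmem_cache {
	const char *name;
};

struct page {
	bool slab;			/* stands in for the PageSlab flag */
	struct kmem_cache *cache;	/* owning cache when slab is set */
};

/* Refuse pages that never came from the slab allocator. */
static struct kmem_cache *page_to_cache(const struct page *page)
{
	if (!page->slab)
		return NULL;		/* the kernel would BUG() here */
	return page->cache;
}

/* kmem_cache_free() check: the object's page must belong to cachep. */
static bool free_is_valid(const struct kmem_cache *cachep,
			  const struct page *page)
{
	return page_to_cache(page) == cachep;
}
```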
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] More page migration: use migration entries for file pages · 04e62a29
      Committed by Christoph Lameter
      This implements the use of migration entries to preserve ptes of file backed
      pages during migration.  Processes can therefore be migrated back and forth
      without losing their connection to pagecache pages.
      
      Note that we implement the migration entries only for linear mappings.
      Nonlinear mappings still require the unmapping of the ptes for migration.
      
      And another writepage() ugliness shows up.  writepage() can drop the page
      lock.  Therefore we have to remove migration ptes before calling writepage()
      in order to avoid having migration entries point to unlocked pages.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] More page migration: do not inc/dec rss counters · 442c9137
      Committed by Christoph Lameter
      If we install a migration entry, the rss does not really decrease, since
      the page is just moved somewhere else.  We can save ourselves the work of
      decrementing and later incrementing, which would eventually just cause
      cacheline bouncing.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: modify core logic · 6c5240ae
      Committed by Christoph Lameter
      Use the migration entries for page migration
      
      This modifies the migration code to use the new migration entries.  It now
      becomes possible to migrate anonymous pages without having to add a swap
      entry.
      
      We add a couple of new functions to replace migration entries with the proper
      ptes.
      
      We cannot take the tree_lock for migrating anonymous pages anymore.  However,
      we know that we hold the only remaining reference to the page when the page
      count reaches 1.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: rip out swap based logic · d75a0fcd
      Committed by Christoph Lameter
      Rip the page migration logic out.
      
      Remove all code that has to do with swapping during page migration.
      
      This also guts the ability to migrate pages to swap.  No one used that, so
      let's let it go for good.
      
      Page migration should be a bit broken after this patch.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Committed by Christoph Lameter
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  Migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_cache and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to the page.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
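      A user-space sketch of the encoding: a swap entry packs a type and an
      offset, and the top two of the 32 possible types are reserved for read
      and write migration entries. The 64-bit layout and using a plain pfn as
      the offset are simplifying assumptions; the helper names follow those
      mentioned in this message (make_migration_entry,
      is_write_migration_entry, make_migration_entry_read), but the bodies
      are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SWAPFILES_SHIFT	5
/* The top two of the 32 possible swap types are taken for migration. */
#define MAX_SWAPFILES		((1 << MAX_SWAPFILES_SHIFT) - 2)	/* 30 */
#define SWP_MIGRATION_READ	MAX_SWAPFILES
#define SWP_MIGRATION_WRITE	(MAX_SWAPFILES + 1)

/* Assumed layout: type in the top 5 bits of a 64-bit value, offset below. */
#define SWP_TYPE_SHIFT		(64 - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK		((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)

typedef struct {
	uint64_t val;
} swp_entry_t;

static swp_entry_t swp_entry(unsigned int type, uint64_t offset)
{
	swp_entry_t e = { ((uint64_t)type << SWP_TYPE_SHIFT) |
			  (offset & SWP_OFFSET_MASK) };
	return e;
}

static unsigned int swp_type(swp_entry_t e)
{
	return (unsigned int)(e.val >> SWP_TYPE_SHIFT);
}

static uint64_t swp_offset(swp_entry_t e)
{
	return e.val & SWP_OFFSET_MASK;
}

/* "write" is a boolean: any non-zero value selects a write entry. */
static swp_entry_t make_migration_entry(uint64_t pfn, int write)
{
	return swp_entry(write ? SWP_MIGRATION_WRITE : SWP_MIGRATION_READ, pfn);
}

static bool is_migration_entry(swp_entry_t e)
{
	return swp_type(e) == SWP_MIGRATION_READ ||
	       swp_type(e) == SWP_MIGRATION_WRITE;
}

static bool is_write_migration_entry(swp_entry_t e)
{
	return swp_type(e) == SWP_MIGRATION_WRITE;
}

/* Downgrade a write entry to read, e.g. when a COW mapping is copied. */
static void make_migration_entry_read(swp_entry_t *e)
{
	*e = swp_entry(SWP_MIGRATION_READ, swp_offset(*e));
}
```

      Because write is treated as a boolean, passing SWP_MIGRATION_READ (30)
      as that argument yields a write entry; this is the copy_one_pte pitfall
      Hugh Dickins describes later in this message.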
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: move fallback handling into special function · 8351a6e4
      Committed by Christoph Lameter
      Move the fallback code into a new fallback function and make the function
      behave like any other migration function.  This requires retaking the lock if
      pageout() drops it.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: pass "mapping" to migration functions · 2d1db3b1
      Committed by Christoph Lameter
      Change handling of address spaces.
      
      Pass a pointer to the address space in which the page is migrated to all
      migration functions.  This avoids repeatedly having to retrieve the address
      space pointer from the page and checking it for validity.  The old page
      mapping will change once migration has gone to a certain step, so it is less
      confusing to have the pointer always available.
      
      Move the setting of the mapping and index for the new page into
      migrate_pages().
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: extract try_to_unmap from migration functions · c3fcf8a5
      Committed by Christoph Lameter
      Extract try_to_unmap and rename remove_references -> move_mapping
      
      try_to_unmap() may significantly change the page state, for example by setting
      the dirty bit.  It is therefore best to unmap in migrate_pages() before
      calling any migration functions.
      
      migrate_page_remove_references() will then only move the new page in place of
      the old page in the mapping.  Rename the function to
      migrate_page_move_mapping().
      
      This allows us to get rid of the special unmapping for the fallback path.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: drop nr_refs in remove_references() · 5b5c7120
      Committed by Christoph Lameter
      Drop nr_refs parameter from migrate_page_remove_references()
      
      The nr_refs parameter is not really useful since the number of remaining
      references is always
      
      1 for anonymous pages without a mapping
      2 for pages with a mapping
      3 for pages with a mapping and PagePrivate set.
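      The table above can be written as a small helper. expected_page_refs()
      and refs_allow_migration() are hypothetical names; the sketch only
      encodes the three cases listed and assumes the comparison is made after
      the tree_lock has been taken.

```c
#include <stdbool.h>

/* Expected page_count() for a page about to be migrated, per the table. */
static int expected_page_refs(bool has_mapping, bool page_private)
{
	if (!has_mapping)
		return 1;		/* anonymous page without a mapping */
	return page_private ? 3 : 2;	/* mapping, plus PagePrivate if set */
}

/* Only the refcount matters once the tree_lock is held. */
static bool refs_allow_migration(int page_count, bool has_mapping,
				 bool page_private)
{
	return page_count == expected_page_refs(has_mapping, page_private);
}
```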
      
      Remove the early check for the number of references since we are checking
      page_mapcount() earlier.  Ultimately only the refcount matters after the
      tree_lock has been obtained.
      Signed-off-by: Christoph Lameter <clameter@sgi.coim>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: remove useless definitions · e7340f73
      Committed by Christoph Lameter
      Remove the export for migrate_page_remove_references() and migrate_page_copy()
      that are unlikely to be used directly by filesystems implementing migration.
      The export was useful when buffer_migrate_page() lived in fs/buffer.c but it
      has now been moved to migrate.c in the migration reorg.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: group functions · 1d8b85cc
      Committed by Christoph Lameter
      Reorder functions in migrate.c.  Group all migration functions for struct
      address_space_operations together.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] page migration cleanup: rename "ignrefs" to "migration" · 7352349a
      Committed by Christoph Lameter
      migrate is a better name since it is only used by page migration.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] writeback: fix range handling · 111ebb6e
      Committed by OGAWA Hirofumi
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte-range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request.  Because we're currently overloading (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes how writeback_control expresses
      a range.
      
      The caller now does this: whenever it calls ->writepages() to write
      pages, it always sets a range (range_start/end or range_cyclic).
      
      If range_cyclic is true, ->writepages() treats the range as cyclic;
      otherwise it just uses range_start and range_end.
      
      This patch does the following:
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
            -1 is usually OK for range_end (its type is long long), but if someone did
      
      		range_end += val;		range_end is "val - 1"
      		u64val = range_end >> bits;	u64val is "~(0ULL)"
      
            or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
            things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() set range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index. It already seems a bit strange:
            if a scan starts at 0 and is ended by the nr_to_write check, the
            saved index may reduce the chance of scanning the end of the file.
            So this updates ->writeback_index only if range_cyclic is true or
            the whole file is
            scanned.
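      A simplified model of how a ->writepages() implementation could
      interpret the new fields. struct wbc_range, wbc_page_range() and the
      4 KiB PAGE_SHIFT are assumptions for illustration; the real
      writeback_control carries more state.

```c
#include <limits.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Simplified view of the range fields this patch puts in writeback_control. */
struct wbc_range {
	long long range_start;	/* first byte, inclusive */
	long long range_end;	/* last byte, inclusive; LLONG_MAX = to EOF */
	bool range_cyclic;	/* ignore start/end, resume from saved index */
};

/* Translate a non-cyclic byte range into first/last page indexes.
 * Returns false for a cyclic request, whose start comes from the
 * mapping's saved ->writeback_index instead. */
static bool wbc_page_range(const struct wbc_range *wbc,
			   unsigned long long *first, unsigned long long *last)
{
	if (wbc->range_cyclic)
		return false;
	*first = (unsigned long long)wbc->range_start >> PAGE_SHIFT;
	*last = (unsigned long long)wbc->range_end >> PAGE_SHIFT;
	return true;
}
```

      With this convention a one-byte write at offset 0 is range_start = 0,
      range_end = 0, range_cyclic = false, which is no longer ambiguous, and a
      whole-file request uses range_end = LLONG_MAX.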
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: redzone double-free detection · 58ce1fd5
      Committed by Pekka Enberg
      At present our slab debugging tells us that it detected a double-free or
      corruption - it does not distinguish between them.  Sometimes it's useful
      to be able to differentiate between these two types of information.
      
      Add double-free detection to redzone verification when freeing an object.
      As explained by Manfred, when we are freeing an object, both redzones
      should be RED_ACTIVE.  However, if both are RED_INACTIVE, we are trying to
      free an object that was already free'd.
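      The double-free rule reduces to a three-way check. RED_ACTIVE and
      RED_INACTIVE below are placeholder values (mm/slab.c defines its own),
      and check_redzones_on_free() is a hypothetical helper, not the slab
      debug code itself.

```c
#include <stdint.h>

/* Placeholder poison values; mm/slab.c defines its own constants. */
#define RED_INACTIVE	UINT64_C(0x5555555555555555)
#define RED_ACTIVE	UINT64_C(0xAAAAAAAAAAAAAAAA)

enum redzone_verdict {
	REDZONE_OK,		/* both zones active: a live object */
	REDZONE_DOUBLE_FREE,	/* both zones inactive: already freed */
	REDZONE_CORRUPTED	/* anything else: something overwrote a zone */
};

/* Classify an object's two redzone words at free time. */
static enum redzone_verdict check_redzones_on_free(uint64_t rz1, uint64_t rz2)
{
	if (rz1 == RED_ACTIVE && rz2 == RED_ACTIVE)
		return REDZONE_OK;
	if (rz1 == RED_INACTIVE && rz2 == RED_INACTIVE)
		return REDZONE_DOUBLE_FREE;
	return REDZONE_CORRUPTED;
}
```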
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] likely cleanup: remove unlikely in sys_mprotect() · b344e05c
      Committed by Hua Zhong
      With likely/unlikely profiling on my not-so-busy typical development system,
      there are 5k misses vs 2k hits.  So I guess we should remove the unlikely.
      Signed-off-by: Hua Zhong <hzhong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm: introduce remap_vmalloc_range() · 83342314
      Committed by Nick Piggin
      Add remap_vmalloc_range, vmalloc_user, and vmalloc_32_user so that drivers
      can have a nice interface for remapping vmalloc memory.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
    • R
      [PATCH] swsusp: rework memory shrinker · d6277db4
      Rafael J. Wysocki committed
      Rework the swsusp memory shrinker in the following way:
      
      - Simplify balance_pgdat() by removing all of the swsusp-related code
        from it.
      
      - Make shrink_all_memory() use shrink_slab() and a new function
        shrink_all_zones() which calls shrink_active_list() and
        shrink_inactive_list() directly for each zone in a way that's optimized
        for suspend.
      
      In shrink_all_memory() we try to free exactly as many pages as the caller
      asks for, preferably in one shot, starting from the easiest targets.  If
      the slab caches are huge, they are the most likely to have enough pages to
      reclaim, so they go first.  The inactive lists are next (the zones with
      more inactive pages go first), and so on.
      
      Each time it is called, shrink_all_memory() attempts to shrink the active
      and inactive lists of each zone in 5 passes.  In the first pass, only the
      inactive lists are taken into consideration.  In the next two passes the
      active lists are also shrunk, but mapped pages are not reclaimed.  In the
      last two passes both the active and inactive lists are shrunk and mapped
      pages are reclaimed as well.  The aim is to make the reclaim logic choose
      the best pages to keep on resume and to improve the responsiveness of the
      resumed system.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • C
      [PATCH] slab: stop using list_for_each · 7a7c381d
      Christoph Hellwig committed
      Use list_for_each_entry() everywhere to clean the code up a tiny bit.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • C
      [PATCH] slab: clean up kmem_getpages · e1b6aa6f
      Christoph Hellwig committed
      The last #ifdef addition pushed this function over the ugliness threshold, so:
      
       - rename the variable i to nr_pages so it's somewhat descriptive
       - remove the addr variable and do the page_address call at the very end
       - instead of ifdef'ing the whole alloc_pages_node call just make the
         __GFP_COMP addition to flags conditional
       - rewrite the __GFP_COMP comment to make sense
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • C
      [PATCH] tightening hugetlb strict accounting · a43a8c39
      Chen, Kenneth W committed
      The current hugetlb strict accounting for shared mappings always assumes
      that a mapping starts at file offset zero and reserves the pages between
      zero and the size of the file.  This assumption often reserves (or locks
      down) many more pages than necessary if an application maps at a non-zero
      file offset.  libhugetlbfs is one example that requires proper reservation
      for shared mappings that start at a non-zero offset.
      
      This patch extends the reservation and hugetlb strict accounting to support
      an arbitrary (offset, len) pair, resulting in a much more robust and
      accurate scheme.  More importantly, it no longer locks down any hugetlb
      pages outside the file mapping.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • D
      [PATCH] mm: fix typos in comments in mm/oom_kill.c · 6937a25c
      Dave Peterson committed
      This fixes a few typos in the comments in mm/oom_kill.c.
      Signed-off-by: David S. Peterson <dsp@llnl.gov>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • K
      [PATCH] support for panic at OOM · fadd8fbd
      KAMEZAWA Hiroyuki committed
      This patch adds a panic_on_oom sysctl under /proc/sys/vm.
      
      When the sysctl vm.panic_on_oom is 1, the kernel panics instead of killing
      rogue processes.  If vm.panic_on_oom is 0, the kernel runs oom_kill() in
      the same way as it does today.  The default value is 0, and only root can
      modify it.
      
      In general, the OOM killer works well: it kills rogue processes so that
      the system as a whole can survive.  But there are environments where a
      panic is preferable to killing some processes.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] squash duplicate page_to_pfn and pfn_to_page · 67de6482
      Andy Whitcroft committed
      We have architectures where page_to_pfn and pfn_to_page contribute enough
      to the overall image size that they wish to push them out of line.
      However, in the process we have grown a second copy of the implementation
      of each of these routines for each memory model.  Share the
      implementation, exposing it either inline or out-of-line as required.
      Signed-off-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Y
      [PATCH] wait_table and zonelist initializing for memory hotadd: update zonelists · 6811378e
      Yasunori Goto committed
      In the current code, the zonelists are assumed to be built once and never
      modified.  But memory hotplug can add a new zone or pgdat, so they must be
      updated.
      
      This patch modifies build_all_zonelists() so that it can reconfigure a
      pgdat's zonelists.
      
      To update them safely, this patch uses stop_machine_run(), so that other
      CPUs cannot touch the zonelists while they are being updated.
      
      In the old version (V2 of node hot-add), the kernel updated them right
      after zone initialization.  But present_pages of the new zone is still 0
      at that point, because online_pages() has not been called yet, and
      build_zonelists() checks present_pages to find populated zones.  That was
      too early, so I moved the update after online_pages().
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>