1. 01 7月, 2006 1 次提交
  2. 28 6月, 2006 1 次提交
  3. 23 6月, 2006 5 次提交
    • A
      [PATCH] initialise total_memory() earlier · bd1e22b8
      Andrew Morton 提交于
      Initialise total_memory earlier in boot.  Because if for some reason we run
      page reclaim early in boot, we don't want total_memory to be zero when we use
      it as a divisor.
      
      And rename total_memory to vm_total_pages to avoid naming clashes with
      architectures.
      
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Martin Bligh <mbligh@google.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bd1e22b8
    • C
      [PATCH] mm: fix swap unused warning · bd96b9eb
      Con Kolivas 提交于
      If CONFIG_SWAP is not defined we get:
      
      mm/vmscan.c: In function ‘remove_mapping’:
      mm/vmscan.c:387: warning: unused variable ‘swap’
      
      Convert defines in swap.h into blank inline functions to fix this warning
      and be consistent.
      Signed-off-by: NCon Kolivas <kernel@kolivas.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bd96b9eb
    • C
      [PATCH] More page migration: use migration entries for file pages · 04e62a29
      Christoph Lameter 提交于
      This implements the use of migration entries to preserve ptes of file backed
      pages during migration.  Processes can therefore be migrated back and forth
      without loosing their connection to pagecache pages.
      
      Note that we implement the migration entries only for linear mappings.
      Nonlinear mappings still require the unmapping of the ptes for migration.
      
      And another writepage() ugliness shows up.  writepage() can drop the page
      lock.  Therefore we have to remove migration ptes before calling writepages()
      in order to avoid having migration entries point to unlocked pages.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      04e62a29
    • C
      [PATCH] Swapless page migration: add R/W migration entries · 0697212a
      Christoph Lameter 提交于
      Implement read/write migration ptes
      
      We take the upper two swapfiles for the two types of migration ptes and define
      a series of macros in swapops.h.
      
      The VM is modified to handle the migration entries.  migration entries can
      only be encountered when the page they are pointing to is locked.  This limits
      the number of places one has to fix.  We also check in copy_pte_range and in
      mprotect_pte_range() for migration ptes.
      
      We check for migration ptes in do_swap_cache and call a function that will
      then wait on the page lock.  This allows us to effectively stop all accesses
      to apge.
      
      Migration entries are created by try_to_unmap if called for migration and
      removed by local functions in migrate.c
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration (I've no NUMA, just
        hacking it up to migrate recklessly while running load), I've hit the
        BUG_ON(!PageLocked(p)) in migration_entry_to_page.
      
        This comes from an orphaned migration entry, unrelated to the current
        correctly locked migration, but hit by remove_anon_migration_ptes as it
        checks an address in each vma of the anon_vma list.
      
        Such an orphan may be left behind if an earlier migration raced with fork:
        copy_one_pte can duplicate a migration entry from parent to child, after
        remove_anon_migration_ptes has checked the child vma, but before it has
        removed it from the parent vma.  (If the process were later to fault on this
        orphaned entry, it would hit the same BUG from migration_entry_wait.)
      
        This could be fixed by locking anon_vma in copy_one_pte, but we'd rather
        not.  There's no such problem with file pages, because vma_prio_tree_add
        adds child vma after parent vma, and the page table locking at each end is
        enough to serialize.  Follow that example with anon_vma: add new vmas to the
        tail instead of the head.
      
        (There's no corresponding problem when inserting migration entries,
        because a missed pte will leave the page count and mapcount high, which is
        allowed for.  And there's no corresponding problem when migrating via swap,
        because a leftover swap entry will be correctly faulted.  But the swapless
        method has no refcounting of its entries.)
      
      From: Ingo Molnar <mingo@elte.hu>
      
        pte_unmap_unlock() takes the pte pointer as an argument.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Several times while testing swapless page migration, gcc has tried to exec
        a pointer instead of a string: smells like COW mappings are not being
        properly write-protected on fork.
      
        The protection in copy_one_pte looks very convincing, until at last you
        realize that the second arg to make_migration_entry is a boolean "write",
        and SWP_MIGRATION_READ is 30.
      
        Anyway, it's better done like in change_pte_range, using
        is_write_migration_entry and make_migration_entry_read.
      
      From: Hugh Dickins <hugh@veritas.com>
      
        Remove unnecessary obfuscation from sys_swapon's range check on swap type,
        which blew up causing memory corruption once swapless migration made
        MAX_SWAPFILES no longer 2 ^ MAX_SWAPFILES_SHIFT.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NChristoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      From: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0697212a
    • A
      [PATCH] reserve space for swap label · e8f03d02
      Andreas Dilger 提交于
      Reserve space in the swap disk header for a LABEL and UUID to be specified.
       This has been possible with util-linux-2.12b (via e2fsprogs 1.36
      libblkid), and is used by at least FC3 and later.  The kernel doesn't
      really care about this, but the space shouldn't accidentally be used by
      something else either.
      
      Also make the on-disk structures be fixed-size types, instead of "int",
      though I don't know of any architecture in use where an "int" isn't the
      same size as a "__u32" (all current kernel arches have it as "unsigned
      int").
      Signed-off-by: NAndreas Dilger <adilger@shaw.ca>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e8f03d02
  4. 16 5月, 2006 1 次提交
  5. 26 4月, 2006 1 次提交
  6. 11 4月, 2006 1 次提交
    • H
      [PATCH] overcommit: add calculate_totalreserve_pages() · cb45b0e9
      Hideo AOKI 提交于
      These patches are an enhancement of OVERCOMMIT_GUESS algorithm in
      __vm_enough_memory().
      
      - why the kernel needed patching
      
        When the kernel can't allocate anonymous pages in practice, currnet
        OVERCOMMIT_GUESS could return success. This implementation might be
        the cause of oom kill in memory pressure situation.
      
        If the Linux runs with page reservation features like
        /proc/sys/vm/lowmem_reserve_ratio and without swap region, I think
        the oom kill occurs easily.
      
      - the overall design approach in the patch
      
        When the OVERCOMMET_GUESS algorithm calculates number of free pages,
        the reserved free pages are regarded as non-free pages.
      
        This change helps to avoid the pitfall that the number of free pages
        become less than the number which the kernel tries to keep free.
      
      - testing results
      
        I tested the patches using my test kernel module.
      
        If the patches aren't applied to the kernel, __vm_enough_memory()
        returns success in the situation but autual page allocation is
        failed.
      
        On the other hand, if the patches are applied to the kernel, memory
        allocation failure is avoided since __vm_enough_memory() returns
        failure in the situation.
      
        I checked that on i386 SMP 16GB memory machine. I haven't tested on
        nommu environment currently.
      
      This patch adds totalreserve_pages for __vm_enough_memory().
      
      Calculate_totalreserve_pages() checks maximum lowmem_reserve pages and
      pages_high in each zone. Finally, the function stores the sum of each
      zone to totalreserve_pages.
      
      The totalreserve_pages is calculated when the VM is initilized.
      And the variable is updated when /proc/sys/vm/lowmem_reserve_raito
      or /proc/sys/vm/min_free_kbytes are changed.
      Signed-off-by: NHideo Aoki <haoki@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cb45b0e9
  7. 23 3月, 2006 1 次提交
  8. 22 3月, 2006 2 次提交
  9. 21 2月, 2006 1 次提交
    • C
      [PATCH] Terminate process that fails on a constrained allocation · 9b0f8b04
      Christoph Lameter 提交于
      Some allocations are restricted to a limited set of nodes (due to memory
      policies or cpuset constraints).  If the page allocator is not able to find
      enough memory then that does not mean that overall system memory is low.
      
      In particular going postal and more or less randomly shooting at processes
      is not likely going to help the situation but may just lead to suicide (the
      whole system coming down).
      
      It is better to signal to the process that no memory exists given the
      constraints that the process (or the configuration of the process) has
      placed on the allocation behavior.  The process may be killed but then the
      sysadmin or developer can investigate the situation.  The solution is
      similar to what we do when running out of hugepages.
      
      This patch adds a check before we kill processes.  At that point
      performance considerations do not matter much so we just scan the zonelist
      and reconstruct a list of nodes.  If the list of nodes does not contain all
      online nodes then this is a constrained allocation and we should kill the
      current process.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9b0f8b04
  10. 02 2月, 2006 4 次提交
  11. 19 1月, 2006 2 次提交
  12. 15 1月, 2006 1 次提交
  13. 09 1月, 2006 6 次提交
    • C
      [PATCH] SwapMig: Extend parameters for migrate_pages() · d4984711
      Christoph Lameter 提交于
      Extend the parameters of migrate_pages() to allow the caller control over the
      fate of successfully migrated or impossible to migrate pages.
      
      Swap migration and direct migration will have the same interface after this
      patch so that patches can be independently applied to the policy layer and the
      core migration code.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d4984711
    • C
      [PATCH] SwapMig: add_to_swap() avoid atomic allocations · 1480a540
      Christoph Lameter 提交于
      Add gfp_mask to add_to_swap
      
      add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
      swapping.  During migration we may have use add_to_swap extensively which may
      lead to out of memory errors.
      
      This patch makes add_to_swap take a parameter that specifies the gfp mask.
      The page migration code can then make add_to_swap use GFP_KERNEL.
      Signed-off-by: NHirokazu Takahashi <taka@valinux.co.jp>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1480a540
    • C
      [PATCH] SwapMig: CONFIG_MIGRATION fixes · 8419c318
      Christoph Lameter 提交于
      Move move_to_lru, putback_lru_pages and isolate_lru in section surrounded by
      CONFIG_MIGRATION saving some codesize for single processor kernels.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8419c318
    • C
      [PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support · 7cbe34cf
      Christoph Lameter 提交于
      Include page migration if the system is NUMA or having a memory model that
      allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).
      
      And:
      - Only include lru_add_drain_per_cpu if building for an SMP system.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7cbe34cf
    • C
      [PATCH] Swap Migration V5: migrate_pages() function · 49d2e9cc
      Christoph Lameter 提交于
      This adds the basic page migration function with a minimal implementation that
      only allows the eviction of pages to swap space.
      
      Page eviction and migration may be useful to migrate pages, to suspend
      programs or for remapping single pages (useful for faulty pages or pages with
      soft ECC failures)
      
      The process is as follows:
      
      The function wanting to migrate pages must first build a list of pages to be
      migrated or evicted and take them off the lru lists via isolate_lru_page().
      isolate_lru_page determines that a page is freeable based on the LRU bit set.
      
      Then the actual migration or swapout can happen by calling migrate_pages().
      
      migrate_pages does its best to migrate or swapout the pages and does multiple
      passes over the list.  Some pages may only be swappable if they are not dirty.
       migrate_pages may start writing out dirty pages in the initial passes over
      the pages.  However, migrate_pages may not be able to migrate or evict all
      pages for a variety of reasons.
      
      The remaining pages may be returned to the LRU lists using putback_lru_pages().
      
      Changelog V4->V5:
      - Use the lru caches to return pages to the LRU
      
      Changelog V3->V4:
      - Restructure code so that applying patches to support full migration does
        require minimal changes. Rename swapout_pages() to migrate_pages().
      
      Changelog V2->V3:
      - Extract common code from shrink_list() and swapout_pages()
      Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      49d2e9cc
    • C
      [PATCH] Swap Migration V5: LRU operations · 21eac81f
      Christoph Lameter 提交于
      This is the start of the `swap migration' patch series.
      
      Swap migration allows the moving of the physical location of pages between
      nodes in a numa system while the process is running.  This means that the
      virtual addresses that the process sees do not change.  However, the system
      rearranges the physical location of those pages.
      
      The main intent of page migration patches here is to reduce the latency of
      memory access by moving pages near to the processor where the process
      accessing that memory is running.
      
      The patchset allows a process to manually relocate the node on which its
      pages are located through the MF_MOVE and MF_MOVE_ALL options while
      setting a new memory policy.
      
      The pages of process can also be relocated from another process using the
      sys_migrate_pages() function call.  Requires CAP_SYS_ADMIN.  The migrate_pages
      function call takes two sets of nodes and moves pages of a process that are
      located on the from nodes to the destination nodes.
      
      Manual migration is very useful if for example the scheduler has relocated a
      process to a processor on a distant node.  A batch scheduler or an
      administrator can detect the situation and move the pages of the process
      nearer to the new processor.
      
      sys_migrate_pages() could be used on non-numa machines as well, to force all
      of a particualr process's pages out to swap, if someone thinks that's useful.
      
      Larger installations usually partition the system using cpusets into sections
      of nodes.  Paul has equipped cpusets with the ability to move pages when a
      task is moved to another cpuset.  This allows automatic control over locality
      of a process.  If a task is moved to a new cpuset then also all its pages are
      moved with it so that the performance of the process does not sink
      dramatically (as is the case today).
      
      Swap migration works by simply evicting the page.  The pages must be faulted
      back in.  The pages are then typically reallocated by the system near the node
      where the process is executing.
      
      For swap migration the destination of the move is controlled by the allocation
      policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
      in order to move the pages as intended.
      
      No allocation policy changes are performed for sys_migrate_pages().  This
      means that the pages may not faulted in to the specified nodes if no
      allocation policy was set by other means.  The pages will just end up near the
      node where the fault occurred.
      
      There's another patch series in the pipeline which implements "direct
      migration".
      
      The direct migration patchset extends the migration functionality to avoid
      going through swap.  The destination node of the relation is controllable
      during the actual moving of pages.  The crutch of using the allocation policy
      to relocate is not necessary and the pages are moved directly to the target.
      Its also faster since swap is not used.
      
      And sys_migrate_pages() can then move pages directly to the specified node.
      Implement functions to isolate pages from the LRU and put them back later.
      
      This patch:
      
      An earlier implementation was provided by Hirokazu Takahashi
      <taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
      memory hotplug project.
      
      From: Magnus
      
      This breaks out isolate_lru_page() and putpack_lru_page().  Needed for swap
      migration.
      Signed-off-by: NMagnus Damm <magnus.damm@gmail.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      21eac81f
  14. 07 1月, 2006 2 次提交
  15. 29 11月, 2005 1 次提交
    • R
      [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Rik van Riel 提交于
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - th system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f7b7fd8f
  16. 28 10月, 2005 1 次提交
  17. 09 10月, 2005 1 次提交
  18. 05 9月, 2005 5 次提交
    • H
      [PATCH] swap: swap_lock replace list+device · 5d337b91
      Hugh Dickins 提交于
      The idea of a swap_device_lock per device, and a swap_list_lock over them all,
      is appealing; but in practice almost every holder of swap_device_lock must
      already hold swap_list_lock, which defeats the purpose of the split.
      
      The only exceptions have been swap_duplicate, valid_swaphandles and an
      untrodden path in try_to_unuse (plus a few places added in this series).
      valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
      demand attention.  However, with the hold time in get_swap_pages so much
      reduced, I've not yet found a load and set of swap device priorities to show
      even swap_duplicate benefitting from the split.  Certainly the split is mere
      overhead in the common case of a single swap device.
      
      So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
      (generally we seem to prefer an _ in the name, and not hide in a macro).
      
      If someone can show a regression in swap_duplicate, then probably we should
      add a hashlock for the swap_map entries alone (shorts being anatomic), so as
      to help the case of the single swap device too.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5d337b91
    • H
      [PATCH] swap: scan_swap_map drop swap_device_lock · 52b7efdb
      Hugh Dickins 提交于
      get_swap_page has often shown up on latency traces, doing lengthy scans while
      holding two spinlocks.  swap_list_lock is already dropped, now scan_swap_map
      drop swap_device_lock before scanning the swap_map.
      
      While scanning for an empty cluster, don't worry that racing tasks may
      allocate what was free and free what was allocated; but when allocating an
      entry, check it's still free after retaking the lock.  Avoid dropping the lock
      in the expected common path.  No barriers beyond the locks, just let the
      cookie crumble; highest_bit limit is volatile, but benign.
      
      Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
      SWP_SCANNING reference count while in scan_swap_map, swapoff wait for that to
      fall - just use schedule_timeout, we don't want to burden scan_swap_map
      itself, and it's very unlikely that anyone can really still be in
      scan_swap_map once swapoff gets this far.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      52b7efdb
    • H
      [PATCH] swap: swap unsigned int consistency · 6eb396dc
      Hugh Dickins 提交于
      The swap header's unsigned int last_page determines the range of swap pages,
      but swap_info has been using int or unsigned long in some cases: use unsigned
      int throughout (except, in several places a local unsigned long is useful to
      avoid overflows when adding).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6eb396dc
    • H
      [PATCH] swap: show span of swap extents · 53092a74
      Hugh Dickins 提交于
      The "Adding %dk swap" message shows the number of swap extents, as a guide to
      how fragmented the swapfile may be.  But a useful further guide is what total
      extent they span across (sometimes scarily large).
      
      And there's no need to keep nr_extents in swap_info: it's unused after the
      initial message, so save a little space by keeping it on stack.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      53092a74
    • H
      [PATCH] swap: swap extent list is ordered · 11d31886
      Hugh Dickins 提交于
      There are several comments that swap's extent_list.prev points to the lowest
      extent: that's not so, it's extent_list.next which points to it, as you'd
      expect.  And a couple of loops in add_swap_extent which go all the way through
      the list, when they should just add to the other end.
      
      Fix those up, and let map_swap_page search the list forwards: profiles shows
      it to be twice as quick that way - because prefetch works better on how the
      structs are typically kmalloc'ed?  or because usually more is written to than
      read from swap, and swap is allocated ascendingly?
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      11d31886
  19. 08 8月, 2005 1 次提交
  20. 13 7月, 2005 1 次提交
  21. 08 7月, 2005 1 次提交