1. 13 Mar 2010: 1 commit
    • memcg: move charges of anonymous swap · 02491447
      Committed by Daisuke Nishimura
      This patch is another core part of this move-charge-at-task-migration
      feature.  It enables moving charges of anonymous swaps.
      
      To move the charge of swap, we need to exchange swap_cgroup's record.
      
      In the current implementation, swap_cgroup's record is protected by:
      
        - page lock: if the entry is on swap cache.
        - swap_lock: if the entry is not on swap cache.
      
      This works well in usual swap-in/out activity.
      
      But this behavior makes the move-swap-charge feature check many
      conditions before it can exchange swap_cgroup's record safely.
      
      So I changed the modification of swap_cgroup's record (swap_cgroup_record())
      to use xchg, and defined a new function to cmpxchg swap_cgroup's record.
      
      This patch also enables moving the charge of swap caches that are not
      pte_present but not yet uncharged, which can exist on the swap-out path,
      by getting the target pages via find_get_page() as do_mincore() does.
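A minimal user-space sketch of the xchg/cmpxchg idea, assuming a flat array holding one 16-bit memcg id per swap slot and C11 atomics in place of the kernel's xchg()/cmpxchg(). The names swap_cgroup_record() and swap_cgroup_cmpxchg() follow the text; the array, the id type and both signatures are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical layout: one memcg id per swap slot; 0 means "no owner". */
#define NR_SWAP_SLOTS 64
static _Atomic unsigned short swap_cgroup_ids[NR_SWAP_SLOTS];

/* Unconditionally record a new owner and return the previous one (xchg). */
unsigned short swap_cgroup_record(size_t offset, unsigned short id)
{
	assert(offset < NR_SWAP_SLOTS);
	return atomic_exchange(&swap_cgroup_ids[offset], id);
}

/*
 * Move ownership only if the slot is still owned by 'old' (cmpxchg).
 * Returns the id seen before the attempt: the move succeeded iff the
 * return value equals 'old'.
 */
unsigned short swap_cgroup_cmpxchg(size_t offset, unsigned short old,
				   unsigned short new)
{
	unsigned short expected = old;

	assert(offset < NR_SWAP_SLOTS);
	/* On failure, 'expected' is overwritten with the current owner. */
	atomic_compare_exchange_strong(&swap_cgroup_ids[offset], &expected, new);
	return expected;
}
```

A mover would call swap_cgroup_cmpxchg(offset, from_id, to_id) and treat any return value other than from_id as "the owner changed under us", without first having to know whether the page lock or swap_lock protects the entry.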
      
      [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
      [akpm@linux-foundation.org: fix typos]
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Dec 2009: 10 commits
  3. 24 Sep 2009: 2 commits
  4. 22 Sep 2009: 1 commit
  5. 16 Sep 2009: 1 commit
    • HWPOISON: Add support for poison swap entries v2 · a7420aa5
      Committed by Andi Kleen
      Memory migration uses special swap entry types to trigger special actions on
      page faults. Extend this mechanism to also support poisoned swap entries, to
      trigger poison handling on page faults. This allows follow-on patches to
      prevent processes from faulting in poisoned pages again.
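The reserved-type idea can be sketched as below. This is a simplified model, not the kernel's encoding: the 5-bit type field, the reservation of one type for poison and two for migration entries, and the helper names (loosely modeled on helpers in include/linux/swapops.h, with hypothetical signatures taking a raw pfn) are assumptions. Every reserved type is taken out of the range available to real swap files, which is why MAX_SWAPFILES shrinks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model; constants are assumptions loosely following swap.h. */
#define MAX_SWAPFILES_SHIFT 5
#define SWP_MIGRATION_NUM   2	/* migration read + write entry types */
#define SWP_HWPOISON_NUM    1	/* one type reserved for poisoned pages */
#define MAX_SWAPFILES \
	((1 << MAX_SWAPFILES_SHIFT) - SWP_MIGRATION_NUM - SWP_HWPOISON_NUM)

#define SWP_HWPOISON        MAX_SWAPFILES
#define SWP_MIGRATION_READ  (MAX_SWAPFILES + SWP_HWPOISON_NUM)
#define SWP_MIGRATION_WRITE (SWP_MIGRATION_READ + 1)

/* Hypothetical encoding: type in the top 5 bits, offset (e.g. pfn) below. */
#define SWP_TYPE_SHIFT  (64 - MAX_SWAPFILES_SHIFT)
#define SWP_OFFSET_MASK ((UINT64_C(1) << SWP_TYPE_SHIFT) - 1)

typedef struct { uint64_t val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned type, uint64_t offset)
{
	assert(type < (1u << MAX_SWAPFILES_SHIFT));
	assert((offset & ~SWP_OFFSET_MASK) == 0);
	return (swp_entry_t){ ((uint64_t)type << SWP_TYPE_SHIFT) | offset };
}

static unsigned swp_type(swp_entry_t e) { return (unsigned)(e.val >> SWP_TYPE_SHIFT); }
static uint64_t swp_offset(swp_entry_t e) { return e.val & SWP_OFFSET_MASK; }

/* A poisoned page is remembered as a special entry whose offset is its pfn. */
static swp_entry_t make_hwpoison_entry(uint64_t pfn) { return swp_entry(SWP_HWPOISON, pfn); }
static bool is_hwpoison_entry(swp_entry_t e) { return swp_type(e) == (unsigned)SWP_HWPOISON; }

/* Any type at or above MAX_SWAPFILES is special, not a real swap file. */
static bool non_swap_entry(swp_entry_t e) { return swp_type(e) >= (unsigned)MAX_SWAPFILES; }
```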
      
      v2: Fix overflow in MAX_SWAPFILES (Fengguang Wu)
      v3: Better overflow fix (Hidehiro Kawai)
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  6. 24 Jun 2009: 1 commit
  7. 19 Jun 2009: 2 commits
  8. 17 Jun 2009: 4 commits
  9. 29 May 2009: 1 commit
  10. 01 Apr 2009: 2 commits
    • shmem: writepage directly to swap · 9fab5619
      Committed by Hugh Dickins
      Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
      swap loads benefit, and a catastrophic interaction between SLUB and some
      flash storage is avoided.
      
      shmem_writepage() has always been peculiar in making no attempt to write:
      it has just transferred a shmem page from file cache to swap cache, then
      let that page make its way around the LRU again before being written and
      freed.
      
      The idea was that people use tmpfs because they want those pages to stay
      in RAM; so although we give it an overflow to swap, we should resist
      writing too soon, giving those pages a second chance before they can be
      reclaimed.
      
      That was always questionable, and I've toyed with this patch for years;
      but never had a clear justification to depart from the original design.
      
      It became more questionable in 2.6.28, when the split LRU patches classed
      shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
      itself gives them more resistance to reclaim than normal file pages.  I
      prepared this patch for 2.6.29, but the merge window arrived before I'd
      completed gathering statistics to justify sending it in.
      
      Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
      habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
      tests five times slower than SLAB or SLQB - other machines slower too, but
      nowhere near so bad.  Simpler "cp -a" swapping tests showed the same.
      
      slub_max_order=0 brings sanity to all, but heavy swapping is too far from
      normal to justify such a tuning.  The crucial factor on that laptop turns
      out to be that I'm using an SD card for swap.  What happens is this:
      
      By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
      fs inodes), so creating tmpfs files under memory pressure brings lumpy
      reclaim into play.  One subpage of the order is chosen from the bottom of
      the LRU as usual, then the other three picked out from their random
      positions on the LRUs.
      
      In a tmpfs load, many of these pages will be ones which already passed
      through shmem_writepage, so already have swap allocated.  And though their
      offsets on swap were probably allocated sequentially, now that the pages
      are picked off at random, their swap offsets are scattered.
      
      But the flash storage on the SD card is very sensitive to having its
      writes merged: once swap is written at scattered offsets, performance
      falls apart.  Rotating disk seeks increase too, but less disastrously.
      
      So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
      out to swap as soon as their swap has been allocated.
      
      It's surely possible to devise an artificial load which runs faster the
      old way, one whose sizing is such that the tmpfs pages on their second
      pass are the ones that are wanted again, and other pages not.
      
      But I've not yet found such a load: on all machines, under the loads I've
      tried, immediate swap_writepage speeds up shmem swapping: especially when
      using the SLUB allocator (and more effectively than slub_max_order=0), but
      also with the others; and it also reduces the variance between runs.  How
      much faster varies widely: a factor of five is rare, 5% is common.
      
      One load which might have suffered: imagine a swapping shmem load in a
      limited mem_cgroup on a machine with plenty of memory.  Before 2.6.29 the
      swapcache was not charged, and such a load would have run quickest with
      the shmem swapcache never written to swap.  But now swapcache is charged,
      so even this load benefits from shmem_writepage directly to swap.
      
      Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
      it's silly because that will never get called; but refactoring shmem.c
      sensibly according to CONFIG_SWAP will be a separate task.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: fix it to take care of nodemask · 327c0e96
      Committed by KAMEZAWA Hiroyuki
      try_to_free_pages() is used for the direct reclaim of up to
      SWAP_CLUSTER_MAX pages when watermarks are low.  The caller to
      alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to
      be used but this is not passed to try_to_free_pages().  This can lead to
      unnecessary reclaim of pages that are unusable by the caller and, in the
      worst case, lead to allocation failure, as no progress was made where
      it was needed.
      
      This patch passes the nodemask used for alloc_pages_nodemask() to
      try_to_free_pages().
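A toy model of why the nodemask matters, assuming each zone belongs to one node and a nodemask is a plain bitmask; try_to_free_pages_masked(), the struct zone fields and node_isset() here are hypothetical stand-ins, not the kernel's definitions. Reclaim simply skips zones on nodes the caller cannot allocate from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a nodemask is a bitmask over node ids. */
typedef unsigned long nodemask_t;

struct zone {
	int node;			/* node this zone belongs to */
	unsigned long reclaimable;	/* pages reclaim could free here */
};

static int node_isset(int node, nodemask_t mask)
{
	return (int)((mask >> node) & 1UL);
}

/*
 * Reclaim up to nr_to_reclaim pages, only from zones on nodes the caller
 * may use.  Passing mask == ~0UL models the old behaviour (no nodemask),
 * which could spend effort freeing pages the caller cannot use.
 */
unsigned long try_to_free_pages_masked(struct zone *zones, size_t nr_zones,
				       nodemask_t mask,
				       unsigned long nr_to_reclaim)
{
	unsigned long freed = 0;

	assert(zones != NULL || nr_zones == 0);
	for (size_t i = 0; i < nr_zones && freed < nr_to_reclaim; i++) {
		unsigned long take;

		if (!node_isset(zones[i].node, mask))
			continue;	/* pages here would not help this caller */
		take = zones[i].reclaimable;
		if (take > nr_to_reclaim - freed)
			take = nr_to_reclaim - freed;
		zones[i].reclaimable -= take;
		freed += take;
	}
	return freed;
}
```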
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 09 Jan 2009: 4 commits
    • memcg: fix shmem's swap accounting · b5a84319
      Committed by KAMEZAWA Hiroyuki
      Currently, you can see the following even when swap accounting is enabled.
      
       1. Create groups 01 and 02.
       2. Allocate a "file" on tmpfs by a task under 01.
       3. Swap out the "file" (by memory pressure).
       4. Read the "file" from a task in group 02.
       5. The charge of the "file" is moved to group 02.
      
      This is not ideal behavior. It happens because SwapCache loaded by
      read-ahead is not taken into account.
      
      This is a patch to fix shmem's swapcache behavior.
        - remove mem_cgroup_cache_charge_swapin().
        - Add SwapCache handler routine to mem_cgroup_cache_charge().
          By this, shmem's file cache is charged at add_to_page_cache()
          with GFP_NOWAIT.
        - pass the page of swapcache to shrink_mem_cgroup.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: swappiness · a7885eb8
      Committed by KOSAKI Motohiro
      Currently, /proc/sys/vm/swappiness can change the swappiness ratio for
      global reclaim.  However, memcg reclaim has no tuning parameter of its own.
      
      In general, the optimal swappiness depends on the workload (e.g. HPC
      workloads need lower swappiness than others).
      
      Therefore, per-cgroup swappiness improves administrator tunability.
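A sketch of how a per-cgroup swappiness might be selected and then fed into the anon/file balance. It assumes, as a simplification, that global reclaim and the root group fall back to vm_swappiness, and that scan pressure is split as anon_prio = swappiness, file_prio = 200 - swappiness, in the spirit of get_scan_ratio() from that era. The struct mem_cgroup fields and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory cgroup with its own swappiness knob. */
struct mem_cgroup {
	bool is_root;
	int swappiness;		/* 0..100, set per cgroup by the administrator */
};

static int vm_swappiness = 60;	/* default of /proc/sys/vm/swappiness */

/* Global reclaim (memcg == NULL) and the root group use the global value. */
int reclaim_swappiness(const struct mem_cgroup *memcg)
{
	if (memcg == NULL || memcg->is_root)
		return vm_swappiness;
	return memcg->swappiness;
}

/*
 * Relative scan pressure on the anon vs file LRUs.  The kernel also
 * weights these by recent_scanned/recent_rotated feedback; omitted here.
 */
void scan_priorities(int swappiness, int *anon_prio, int *file_prio)
{
	assert(swappiness >= 0 && swappiness <= 100);
	*anon_prio = swappiness;
	*file_prio = 200 - swappiness;
}
```

An HPC group configured with swappiness 10 then puts far less pressure on its anon pages than the global default would.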
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: mem+swap controller core · 8c7c6e34
      Committed by KAMEZAWA Hiroyuki
      This patch implements a per-cgroup limit on memory+swap usage.  Although
      SwapCache exists, double counting of a swap-cache page and its swap entry
      is avoided.
      
      The Mem+Swap controller works as follows.
        - memory usage is limited by memory.limit_in_bytes.
        - memory + swap usage is limited by memory.memsw_limit_in_bytes.
      
      This has the following benefits.
        - A user can limit total resource usage of mem+swap.
      
          Without this, because the memory resource controller doesn't account
          for swap usage, a process can exhaust all the swap (e.g. by a memory
          leak).  This limit avoids that case.
      
          Also, swap is a shared resource, but a swap entry cannot be reclaimed
          until the page is used again (brought back to memory).  This
          characteristic can cause trouble when memory is divided into parts
          by cpuset or memcg.
          Assume group A and group B.
          After some applications run, the system can end up like this:
      
          Group A -- very large free memory space but occupies 99% of swap.
          Group B -- under memory shortage but cannot use swap... it's nearly full.
      
          The ability to set an appropriate swap limit for each group is required.
      
      Someone may wonder, "why limit mem+swap rather than just swap?"
      
        - The global LRU (kswapd) can swap out arbitrary pages.  Swap-out means
          moving the charge from memory to swap... there is no change in
          mem+swap usage.
      
          In other words, when we want to limit the usage of swap without affecting
          global LRU, mem+swap limit is better than just limiting swap.
      
      Accounting target information is stored in swap_cgroup, which is a
      per-swap-entry record.
      
      Charging is done as follows.
        map
          - charge  page and memsw.
      
        unmap
          - uncharge page/memsw if not SwapCache.
      
        swap-out (__delete_from_swap_cache)
          - uncharge page
          - record mem_cgroup information to swap_cgroup.
      
        swap-in (do_swap_page)
          - charged as page and memsw.
            record in swap_cgroup is cleared.
            memsw accounting is decremented.
      
        swap-free (swap_free())
          - if swap entry is freed, memsw is uncharged by PAGE_SIZE.
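The charge rules above can be traced with two counters per group, res (memory) and memsw (memory+swap), counted in pages. This is a hypothetical model: struct memcg and the on_*() helpers are invented for illustration, the swap_cgroup record itself is not modeled, and the kernel would try reclaim rather than fail immediately at a limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-cgroup counters, in pages. */
struct memcg {
	long res, res_limit;		/* memory usage and its limit */
	long memsw, memsw_limit;	/* memory+swap usage and its limit */
};

/* Charge one page to both counters; fail if either limit would be hit. */
static bool charge_page(struct memcg *m)
{
	if (m->res + 1 > m->res_limit || m->memsw + 1 > m->memsw_limit)
		return false;
	m->res++;
	m->memsw++;
	return true;
}

/* map: charge page and memsw. */
bool on_map(struct memcg *m) { return charge_page(m); }

/* unmap of a page that is not SwapCache: uncharge page and memsw. */
void on_unmap(struct memcg *m)
{
	assert(m->res > 0 && m->memsw > 0);
	m->res--;
	m->memsw--;
}

/* swap-out: uncharge the page only; its data now occupies swap instead. */
void on_swap_out(struct memcg *m)
{
	assert(m->res > 0);
	m->res--;
}

/* swap-in: charge as page and memsw, then drop the swap entry's memsw share. */
bool on_swap_in(struct memcg *m)
{
	if (!charge_page(m))
		return false;
	m->memsw--;	/* the swap_cgroup record is cleared */
	return true;
}

/* swap-free: the swap entry is released, so memsw shrinks by one page. */
void on_swap_free(struct memcg *m)
{
	assert(m->memsw > 0);
	m->memsw--;
}
```

Swap-out lowers only res, which is the "no change in usage of mem+swap" property that lets a mem+swap limit coexist with global LRU swap-out.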
      
      Some people work in never-swap environments and consider swap something
      bad.  For them, this mem+swap controller extension is just overhead.  This
      overhead can be avoided by a config or boot option.
      (See Kconfig; details are not in this patch.)
      
      TODO:
       - more optimization may be possible in the swap-in path (but it is not
         very safe).  For now, we just do simple accounting.
      
      [nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
      [hugh@veritas.com: memswap controller core swapcache fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: handle swap caches · d13d1443
      Committed by KAMEZAWA Hiroyuki
      SwapCache support for memory resource controller (memcg)
      
      Before the mem+swap controller, memcg itself should handle SwapCache in a
      proper way.  This patch is cut out from that work.
      
      In the current memcg, SwapCache is just leaked, and the user can create
      tons of SwapCache.  This is an accounting leak and should be handled.
      
      SwapCache accounting is done as follows.
      
        charge (anon)
      	- charged when it's mapped.
      	  (because of readahead, charge at add_to_swap_cache() is not sane)
        uncharge (anon)
      	- uncharged when it's dropped from swapcache and fully unmapped.
      	  This means it's not uncharged at unmap.
      	  Note: delete from swap cache at swap-in is done after rmap information
      	        is established.
        charge (shmem)
      	- charged at swap-in.  This prevents a charge at add_to_page_cache().
      
        uncharge (shmem)
      	- uncharged when it's dropped from swapcache and not on shmem's
      	  radix-tree.
      
        At migration, the check against 'old page' is modified to handle shmem.
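The uncharge rules can be condensed into one predicate. struct page_state and may_uncharge() are hypothetical; they only encode the anon and shmem conditions listed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page state relevant to the SwapCache accounting rules. */
struct page_state {
	bool anon;		/* anonymous page (otherwise shmem) */
	bool in_swapcache;
	int mapcount;		/* number of ptes mapping the page */
	bool in_shmem_tree;	/* shmem: still on shmem's radix-tree */
};

/*
 * May the memcg charge be dropped?  Anon: only once the page is out of
 * swapcache AND fully unmapped, so a plain unmap of a SwapCache page does
 * not uncharge.  Shmem: once it is out of swapcache and no longer on
 * shmem's radix-tree.
 */
bool may_uncharge(const struct page_state *p)
{
	assert(p->mapcount >= 0);
	if (p->in_swapcache)
		return false;
	if (p->anon)
		return p->mapcount == 0;
	return !p->in_shmem_tree;
}
```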
      
      Compared to the old version that was discussed (and caused trouble), we
      have the advantages of
        - the PCG_USED bit.
        - simpler migration handling.
      
      So the situation is probably much easier than it was several months ago.
      
      [hugh@veritas.com: memcg: handle swap caches build fix]
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 07 Jan 2009: 11 commits
    • badpage: zap print_bad_pte on swap and file · 2509ef26
      Committed by Hugh Dickins
      Complete zap_pte_range()'s coverage of bad pagetable entries by calling
      print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
      That needs free_swap_and_cache() to tell it, which will also have shown
      one of those "swap_free" errors (but with much less information).
      
      Similar checks in fork's copy_one_pte()?  No, that would be more noisy
      than helpful: we'll see them when parent and child exec or exit.
      
      Where do_nonlinear_fault() calls print_bad_pte(): omit !VM_CAN_NONLINEAR
      case, that could only be a bug in sys_remap_file_pages(), not a bad pte.
      VM_FAULT_OOM rather than VM_FAULT_SIGBUS?  Well, okay, that is consistent
      with what happens if do_swap_page() operates on a bad swap entry; but don't
      we have patches to be more careful about killing when VM_FAULT_OOM?
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: swapon randomize if nonrot · 20137a49
      Committed by Hugh Dickins
      Swap allocation has always started from the beginning of the swap area;
      but if we're dealing with a solidstate swap device which can only remap
      blocks within limited zones, that would sooner wear out the first zone.
      
      Therefore sys_swapon() tests whether blk_queue is non-rotational, and if so
      randomizes the cluster_next starting position for allocation.
      
      If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
      with an "SS" at the right end of the kernel's "Adding ...  swap" message
      (so that if it's both nonrot and discardable, "SSD" will be shown there).
      Perhaps something should be shown in /proc/swaps (swapon -s), but we have
      to be more cautious before making any addition to that format.
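A sketch of the swapon-time choices described above, assuming a cut-down struct swap_info and using rand() in place of the kernel's random number source; swapon_set_start() and swapon_flags_suffix() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of swap_info_struct. */
struct swap_info {
	bool solidstate;		/* SWP_SOLIDSTATE: queue is non-rotational */
	bool discardable;		/* SWP_DISCARDABLE: swapon discard succeeded */
	unsigned long highest_bit;	/* last usable swap slot */
	unsigned long cluster_next;	/* where allocation scanning starts */
};

/*
 * On a non-rotational device, start allocating at a random slot in
 * [1, highest_bit] instead of slot 1 (slot 0 is the header), so wear is
 * not concentrated at the front of the device.
 */
void swapon_set_start(struct swap_info *p)
{
	assert(p->highest_bit >= 1);
	if (p->solidstate)
		p->cluster_next = 1 + (unsigned long)rand() % p->highest_bit;
	else
		p->cluster_next = 1;
}

/* Suffix for the "Adding ... swap" message: "", "SS", "D" or "SSD". */
const char *swapon_flags_suffix(const struct swap_info *p, char buf[4])
{
	buf[0] = '\0';
	if (p->solidstate)
		strcat(buf, "SS");
	if (p->discardable)
		strcat(buf, "D");
	return buf;
}
```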
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: swap allocation use discard · 7992fde7
      Committed by Hugh Dickins
      When scan_swap_map() finds a free cluster of swap pages to allocate,
      discard the old contents of the cluster if the device supports discard.
      But don't bother when swap is so fragmented that we allocate single pages.
      
      Be careful about racing allocations made while we're scanning for a
      cluster; and hold up allocations made while we're discarding.
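A simplified model of the "whole cluster or single pages" decision, assuming a swap_map byte array where 0 means free, slot 0 holding the header, and a cluster size of 256 slots. find_free_cluster() and should_discard() are hypothetical, and the handling of racing allocations described above is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SWAPFILE_CLUSTER 256	/* assumption: cluster size in swap slots */

/*
 * Find the first run of SWAPFILE_CLUSTER free slots (swap_map[i] == 0).
 * Returns its first slot, or 0 if swap is too fragmented; slot 0 holds
 * the header and is never part of a run.
 */
size_t find_free_cluster(const unsigned char *swap_map, size_t nr_slots)
{
	size_t run = 0;

	assert(swap_map != NULL);
	for (size_t i = 1; i < nr_slots; i++) {
		run = swap_map[i] ? 0 : run + 1;
		if (run == SWAPFILE_CLUSTER)
			return i + 1 - SWAPFILE_CLUSTER;
	}
	return 0;
}

/* Discard only when a whole free cluster was found on a discard device. */
bool should_discard(bool discardable, size_t cluster_start)
{
	return discardable && cluster_start != 0;
}
```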
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: swapon use discard (trim) · 6a6ba831
      Committed by Hugh Dickins
      When adding swap, all the old data on swap can be forgotten: sys_swapon()
      discards all but the header page of the swap partition (or every extent but
      the header of the swap file), to give a solidstate swap device the
      opportunity to optimize its wear-levelling.
      
      If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
      "D" at the right end of the kernel's "Adding ...  swap" message.  Perhaps
      something should be shown in /proc/swaps (swapon -s), but we have to be
      more cautious before making any addition to that format.
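The header-skipping rule can be sketched as a function that plans discard ranges in 512-byte sectors, assuming 4 KiB pages. The struct swap_extent fields, struct discard_range and plan_swapon_discards() are hypothetical, and actually issuing each range to the block layer is left out.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12					/* assumption: 4 KiB pages */
#define SECTORS_PER_PAGE (1UL << (PAGE_SHIFT - 9))	/* 512-byte sectors */

/* Hypothetical swap extent: a run of swap pages mapped to disk pages. */
struct swap_extent {
	unsigned long start_page;	/* first swap page in this extent */
	unsigned long nr_pages;
	unsigned long start_block;	/* first disk page backing it */
};

struct discard_range {
	unsigned long sector, nr_sectors;
};

/* Plan a discard of every extent, never touching swap page 0 (the header). */
size_t plan_swapon_discards(const struct swap_extent *se, size_t n,
			    struct discard_range *out)
{
	size_t nr_out = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned long start = se[i].start_block * SECTORS_PER_PAGE;
		unsigned long nr = se[i].nr_pages * SECTORS_PER_PAGE;

		if (se[i].start_page == 0) {	/* skip the header page */
			assert(nr >= SECTORS_PER_PAGE);
			start += SECTORS_PER_PAGE;
			nr -= SECTORS_PER_PAGE;
		}
		if (nr == 0)
			continue;
		out[nr_out].sector = start;
		out[nr_out].nr_sectors = nr;
		nr_out++;
	}
	return nr_out;
}
```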
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: rearrange scan and swap_info · ebebbbe9
      Committed by Hugh Dickins
      Before making functional changes, rearrange scan_swap_map() to simplify
      subsequent diffs.  Actually, there is one functional change in there:
      leave cluster_nr negative while scanning for a new cluster - resetting it
      early increased the likelihood that when we have difficulty finding a free
      cluster, another task may come in and try doing exactly the same - just a
      waste of cpu.
      
      Before making functional changes, rearrange struct swap_info_struct
      slightly: flags will be needed as an unsigned long (for wait_on_bit), next
      is a good int to pair with prio, old_block_size is uninteresting so shift
      it to the end.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: remove SWP_ACTIVE mask · 22c6f8fd
      Committed by Hugh Dickins
      Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: optimize get_scan_ratio for no swap · b962716b
      Committed by Hugh Dickins
      Rik suggests a simplified get_scan_ratio() for !CONFIG_SWAP.  Yes, the gcc
      optimizer gives us that, when nr_swap_pages is #defined as 0L.  Move usual
      declaration to swapfile.c: it never belonged in page_alloc.c.
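The constant-folding trick can be shown in isolation. In this sketch, anon_scan_percent() is a hypothetical, heavily simplified stand-in for the anon share computed by get_scan_ratio() (rotation feedback is ignored); the point is only that when nr_swap_pages is the constant 0L, the compiler can see that the early return is always taken and drop the rest.

```c
#include <assert.h>

/* Define CONFIG_SWAP to model a kernel built with swap support. */
/* #define CONFIG_SWAP 1 */

#ifdef CONFIG_SWAP
long nr_swap_pages;		/* a real variable, declared in swapfile.c */
#else
#define nr_swap_pages 0L	/* a constant the optimizer can fold */
#endif

/*
 * Hypothetical simplification: percentage of scanning aimed at anon pages.
 * With no swap, anon pages cannot be reclaimed, so the answer is 0; when
 * nr_swap_pages is 0L the remaining code is dead and can be eliminated.
 */
int anon_scan_percent(int swappiness)
{
	assert(swappiness >= 0 && swappiness <= 100);
	if (nr_swap_pages <= 0)
		return 0;
	return swappiness / 2;	/* anon_prio / (anon_prio + file_prio) */
}
```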
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add add_to_swap stub · 60371d97
      Committed by Hugh Dickins
      If we add a failing stub for add_to_swap(), then we can remove the #ifdef
      CONFIG_SWAP from mm/vmscan.c.
      
      This was intended as a source cleanup, but looking more closely, it turns
      out that the !CONFIG_SWAP case was going to keep_locked for an anonymous
      page, whereas now it goes to the more suitable activate_locked, like the
      CONFIG_SWAP nr_swap_pages 0 case.
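A sketch of the stub pattern, with a hypothetical minimal struct page and a hypothetical reclaim_anon_page() standing in for the relevant branch of shrink_page_list(): the caller no longer needs its own #ifdef, and a failed add_to_swap() leads to activation.

```c
#include <assert.h>

struct page { int anon; };	/* hypothetical minimal page */

#ifdef CONFIG_SWAP
int add_to_swap(struct page *page);	/* real implementation elsewhere */
#else
/* Failing stub: without swap there is nowhere to put an anonymous page. */
static inline int add_to_swap(struct page *page)
{
	(void)page;
	return 0;
}
#endif

enum reclaim_result { ACTIVATE_LOCKED, KEEP_LOCKED, PAGE_OUT };

/*
 * Hypothetical slice of shrink_page_list(): one code path for both
 * configurations; an anon page that cannot get swap is activated rather
 * than kept on the inactive list.
 */
enum reclaim_result reclaim_anon_page(struct page *page)
{
	assert(page->anon);
	if (!add_to_swap(page))
		return ACTIVATE_LOCKED;
	return PAGE_OUT;
}
```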
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove gfp_mask from add_to_swap · ac47b003
      Committed by Hugh Dickins
      Remove gfp_mask argument from add_to_swap(): it's misleading because its
      only caller, shrink_page_list(), is not atomic at that point; and in due
      course (implementing discard) we'll sometimes want to allocate some memory
      with GFP_NOIO (as is used in swap_writepage) when allocating swap.
      
      No change to the gfp_mask passed down to add_to_swap_cache(): still use
      __GFP_HIGH without __GFP_WAIT (with nomemalloc and nowarn as before):
      though it's not obvious if that's the best combination to ask for here.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: try_to_free_swap replaces remove_exclusive_swap_page · a2c43eed
      Committed by Hugh Dickins
      remove_exclusive_swap_page(): its problem is in living up to its name.
      
      It doesn't matter if someone else has a reference to the page (raised
      page_count); it doesn't matter if the page is mapped into userspace
      (raised page_mapcount - though that hints it may be worth keeping the
      swap): all that matters is that there be no more references to the swap
      (and no writeback in progress).
      
      swapoff (try_to_unuse) has been removing pages from swapcache for years,
      with no concern for page count or page mapcount, and we used to have a
      comment in lookup_swap_cache() recognizing that: if you go for a page of
      swapcache, you'll get the right page, but it could have been removed from
      swapcache by the time you get page lock.
      
      So, give up asking for exclusivity: get rid of
      remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
      remove_exclusive_swap_page_count() which were spawned for the recent LRU
      work: replace them by the simpler try_to_free_swap() which just checks
      page_swapcount().
      
      Similarly, remove the page_count limitation from free_swap_and_cache(),
      but assume that it's worth holding on to the swap if page is mapped and
      swap nowhere near full.  Add a vm_swap_full() test in free_swap_cache()?
      It would be consistent, but I think we probably have enough for now.
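A sketch of the simpler check, using a hypothetical struct page that carries its flags and swap count as plain fields (the kernel uses page flags and page_swapcount()). The count and mapcount fields are present only to show that they are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page: just the state try_to_free_swap() looks at. */
struct page {
	bool locked, swapcache, writeback, dirty;
	int swapcount;		/* remaining references to the swap slot */
	int count, mapcount;	/* deliberately ignored below */
};

static void delete_from_swap_cache(struct page *page)
{
	page->swapcache = false;
}

/*
 * Free the page's swap slot if nothing else needs it.  Unlike
 * remove_exclusive_swap_page(), page count and mapcount do not matter:
 * only remaining swap references and writeback in progress do.
 */
int try_to_free_swap(struct page *page)
{
	assert(page->locked);
	if (!page->swapcache || page->writeback || page->swapcount)
		return 0;
	delete_from_swap_cache(page);
	page->dirty = true;	/* the only copy of the data is now in memory */
	return 1;
}
```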
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: reuse_swap_page replaces can_share_swap_page · 7b1fe597
      Committed by Hugh Dickins
      A good place to free up old swap is where do_wp_page(), or do_swap_page(),
      is about to redirty the page: the data on disk is then stale and won't be
      read again; and if we do decide to write the page out later, using the
      previous swap location makes an unnecessary disk seek very likely.
      
      So give can_share_swap_page() the side-effect of delete_from_swap_cache()
      when it safely can.  And can_share_swap_page() was always a misleading
      name, the more so if it has a side-effect: rename it reuse_swap_page().
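A sketch of the reuse test with its new side effect, again over a hypothetical struct page with plain fields. The logic follows the description above (reuse only when the single mapping is the only user of the swap slot, and drop the stale swap copy when no writeback is in flight); it is an illustration, not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with the fields the reuse decision needs. */
struct page {
	bool locked, swapcache, writeback, dirty;
	int mapcount;		/* ptes mapping the page */
	int swapcount;		/* other references to its swap slot */
};

static void delete_from_swap_cache(struct page *page)
{
	page->swapcache = false;
}

/*
 * Called when a fault is about to redirty the page.  Returns true if the
 * caller holds the only reference and may write the page in place.  As a
 * side effect, the now-stale swap copy is dropped, so a later writeout can
 * pick a fresh location instead of seeking back to the old one.
 */
bool reuse_swap_page(struct page *page)
{
	int count;

	assert(page->locked);
	count = page->mapcount;
	if (count <= 1 && page->swapcache) {
		count += page->swapcount;
		if (count == 1 && !page->writeback) {
			delete_from_swap_cache(page);
			page->dirty = true;
		}
	}
	return count == 1;
}
```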
      
      Irrelevant cleanup nearby: remove swap_token_default_timeout definition
      from swap.h: it's used nowhere.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>