1. 07 Jan, 2009 14 commits
    • swapfile: swap allocation cycle if nonrot · c60aa176
      Committed by Hugh Dickins
      Though attempting to find free clusters (Andrea), swap allocation has
      always restarted its searches from the beginning of the swap area (sct),
      to reduce seek times between swap pages, by not scattering them all over
      the partition.
      
      But on a solidstate swap device, seeks are cheap, and block remapping to
      level the wear may be limited by zones: in that case it's better to cycle
      around the whole partition.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
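The policy described in the "swap allocation cycle if nonrot" message can be modelled roughly as below. This is a minimal sketch, not the kernel's scan_swap_map(): `struct swap_area` and `pick_scan_start()` are hypothetical names, and the real allocator also has to look for free clusters and handle races.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's swap_info_struct. */
struct swap_area {
    unsigned long lowest_bit;   /* lowest offset that may be free */
    unsigned long highest_bit;  /* highest offset that may be free */
    unsigned long cluster_next; /* where the previous allocation left off */
    bool nonrot;                /* true for a non-rotational device */
};

/*
 * Choose where the next search for free swap begins.
 * Rotating disk: restart at the lowest free offset to keep pages close.
 * Non-rotational device: carry on from cluster_next, wrapping back to
 * the start once it falls outside the usable range, so that allocation
 * cycles around the whole area.
 */
unsigned long pick_scan_start(const struct swap_area *si)
{
    assert(si->lowest_bit <= si->highest_bit);
    if (!si->nonrot)
        return si->lowest_bit;
    if (si->cluster_next < si->lowest_bit ||
        si->cluster_next > si->highest_bit)
        return si->lowest_bit;
    return si->cluster_next;
}
```

With `cluster_next` at 500 in an area spanning 1 to 1000, a rotating device would restart at 1 while a non-rotational one would continue from 500.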
    • swapfile: swapon randomize if nonrot · 20137a49
      Committed by Hugh Dickins
      Swap allocation has always started from the beginning of the swap area;
      but if we're dealing with a solidstate swap device which can only remap
      blocks within limited zones, that would sooner wear out the first zone.
      
      Therefore sys_swapon() tests whether blk_queue is non-rotational, and if so
      randomizes the cluster_next starting position for allocation.
      
      If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
      with an "SS" at the right end of the kernel's "Adding ...  swap" message
      (so that if it's both nonrot and discardable, "SSD" will be shown there).
      Perhaps something should be shown in /proc/swaps (swapon -s), but we have
      to be more cautious before making any addition to that format.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
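Randomizing the starting position, as the "swapon randomize if nonrot" message describes, comes down to mapping a random number onto a valid swap offset. The sketch below assumes the caller supplies the random value; `random_cluster_start()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a random 32-bit value onto a starting offset in [1, highest_bit].
 * Offset 0 is skipped because the first page holds the swap header.
 */
unsigned long random_cluster_start(uint32_t rnd, unsigned long highest_bit)
{
    assert(highest_bit >= 1);
    return 1 + (rnd % highest_bit);
}
```

Taking the random value as a parameter keeps the helper deterministic and easy to check; in a real caller it would come from whatever random source is available.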
    • swapfile: swap allocation use discard · 7992fde7
      Committed by Hugh Dickins
      When scan_swap_map() finds a free cluster of swap pages to allocate,
      discard the old contents of the cluster if the device supports discard.
      But don't bother when swap is so fragmented that we allocate single pages.
      
      Be careful about racing allocations made while we're scanning for a
      cluster; and hold up allocations made while we're discarding.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: swapon use discard (trim) · 6a6ba831
      Committed by Hugh Dickins
      When adding swap, all the old data on swap can be forgotten: sys_swapon()
      discards all but the header page of the swap partition (or every extent but
      the header of the swap file), to give a solidstate swap device the
      opportunity to optimize its wear-levelling.
      
      If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
      "D" at the right end of the kernel's "Adding ...  swap" message.  Perhaps
      something should be shown in /proc/swaps (swapon -s), but we have to be
      more cautious before making any addition to that format.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Joern Engel <joern@logfs.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Donjun Shin <djshin90@gmail.com>
      Cc: Tejun Heo <teheo@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
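"All but the header page" translates into a sector range that starts one page into the partition. The following is a rough sketch under stated assumptions (4 kB pages, 512-byte sectors, a single contiguous partition); `struct sector_range` and `discard_range_after_header()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED   12 /* 4 kB pages (assumption) */
#define SECTOR_SHIFT_ASSUMED 9  /* 512-byte sectors (assumption) */

struct sector_range {
    uint64_t start; /* first sector to discard */
    uint64_t count; /* number of sectors to discard */
};

/*
 * Sectors to discard for a swap partition of nr_pages pages,
 * keeping page 0, which holds the swap header.
 */
struct sector_range discard_range_after_header(uint64_t nr_pages)
{
    struct sector_range r;
    unsigned shift = PAGE_SHIFT_ASSUMED - SECTOR_SHIFT_ASSUMED;

    assert(nr_pages >= 1);
    r.start = (uint64_t)1 << shift;     /* skip the header page */
    r.count = (nr_pages - 1) << shift;  /* everything after it */
    return r;
}
```

For a 1024-page partition this gives a range starting at sector 8 and covering 8184 sectors. A swap file would need the same calculation per extent, with the header excluded only from the first one.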
    • swapfile: rearrange scan and swap_info · ebebbbe9
      Committed by Hugh Dickins
      Before making functional changes, rearrange scan_swap_map() to simplify
      subsequent diffs.  Actually, there is one functional change in there:
      leave cluster_nr negative while scanning for a new cluster - resetting it
      early increased the likelihood that when we have difficulty finding a free
      cluster, another task may come in and try doing exactly the same - just a
      waste of cpu.
      
      Before making functional changes, rearrange struct swap_info_struct
      slightly: flags will be needed as an unsigned long (for wait_on_bit), next
      is a good int to pair with prio, old_block_size is uninteresting so shift
      it to the end.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: remove v0 SWAP-SPACE message · 81e33971
      Committed by Hugh Dickins
      The kernel has not supported v0 SWAP-SPACE since 2.5.22: I think we can
      now safely drop its "version 0 swap is no longer supported" message - just
      say "Unable to find swap-space signature" as usual.  This removes one
      level of indentation from a stretch of sys_swapon().
      
      I'd have liked to be specific, saying "Unable to find SWAPSPACE2
      signature", but it's just too confusing that the version 1 signature shows
      the number 2.
      
      Irrelevant nearby cleanup: kmap(page) already gives page_address(page).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: remove surplus whitespace · 886bb7e9
      Committed by Hugh Dickins
      Remove trailing whitespace from swapfile.c, and odd swap_show() alignment.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: remove SWP_ACTIVE mask · 22c6f8fd
      Committed by Hugh Dickins
      Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapfile: swapon needs larger size type · 73fd8748
      Committed by Hugh Dickins
      sys_swapon()'s swapfilesize (better renamed swapfilepages) is declared as
      an int, but should be an unsigned long like the maxpages it's compared
      against: on 64-bit (with 4kB pages) a swapfile of 2^44 bytes was rejected
      with "Swap area shorter than signature indicates".
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
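The arithmetic behind the "swapon needs larger size type" message is easy to check: with 4 kB pages, 2^44 bytes is 2^32 pages, one more than a 32-bit quantity can hold. The helpers below are hypothetical and only illustrate the overflow; they do not reproduce the kernel's size check. A `uint32_t` is used for the narrow value so that the truncation is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12 /* 4 kB pages, as in the commit message */

/* Number of whole pages in a swap file of the given size in bytes. */
uint64_t swapfile_pages(uint64_t bytes)
{
    return bytes >> PAGE_SHIFT_ASSUMED;
}

/* Nonzero when the page count fits in a 32-bit signed int. */
int fits_in_int32(uint64_t pages)
{
    return pages <= (uint64_t)INT32_MAX;
}

/* The part of the page count that survives in a 32-bit variable. */
uint32_t low_32_bits(uint64_t pages)
{
    return (uint32_t)pages;
}
```

For a file of 2^44 bytes, `swapfile_pages()` returns 2^32, `fits_in_int32()` reports that it does not fit, and the low 32 bits are all zero, which is why a wider type is needed for the comparison against maxpages.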
    • mm: optimize get_scan_ratio for no swap · b962716b
      Committed by Hugh Dickins
      Rik suggests a simplified get_scan_ratio() for !CONFIG_SWAP.  Yes, the gcc
      optimizer gives us that, when nr_swap_pages is #defined as 0L.  Move usual
      declaration to swapfile.c: it never belonged in page_alloc.c.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: try_to_unuse check removing right swap · 68bdc8d6
      Committed by Hugh Dickins
      There's a possible race in try_to_unuse() which Nick Piggin led me to two
      years ago.  Where it does lock_page() after read_swap_cache_async(), what
      if another task removed that page from swapcache just before we locked it?
      
      It would sail through the (*swap_map > 1) tests doing nothing (because it
      could not have been removed from swapcache before its swap references were
      gone), until it reaches the delete_from_swap_cache(page) near the bottom.
      
      Now imagine that this page has been allocated to swap on a different swap
      area while we dropped page lock (perhaps at the top, perhaps in unuse_mm):
      we could wrongly remove from swap cache before the page has been written
      to swap, so a subsequent do_swap_page() would read in stale data from
      swap.
      
      I think this case could not happen before: remove_exclusive_swap_page()
      refused while page count was raised.  But now with reuse_swap_page() and
      try_to_free_swap() removing from swap cache without minding page count, I
      think it could happen - the previous patch argued that it was safe because
      try_to_unuse() already ignored page count, but overlooked that it might be
      breaking the assumptions in try_to_unuse() itself.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: try_to_free_swap replaces remove_exclusive_swap_page · a2c43eed
      Committed by Hugh Dickins
      remove_exclusive_swap_page(): its problem is in living up to its name.
      
      It doesn't matter if someone else has a reference to the page (raised
      page_count); it doesn't matter if the page is mapped into userspace
      (raised page_mapcount - though that hints it may be worth keeping the
      swap): all that matters is that there be no more references to the swap
      (and no writeback in progress).
      
      swapoff (try_to_unuse) has been removing pages from swapcache for years,
      with no concern for page count or page mapcount, and we used to have a
      comment in lookup_swap_cache() recognizing that: if you go for a page of
      swapcache, you'll get the right page, but it could have been removed from
      swapcache by the time you get page lock.
      
      So, give up asking for exclusivity: get rid of
      remove_exclusive_swap_page(), and remove_exclusive_swap_page_ref() and
      remove_exclusive_swap_page_count() which were spawned for the recent LRU
      work: replace them by the simpler try_to_free_swap() which just checks
      page_swapcount().
      
      Similarly, remove the page_count limitation from free_swap_and_count(),
      but assume that it's worth holding on to the swap if page is mapped and
      swap nowhere near full.  Add a vm_swap_full() test in free_swap_cache()?
      It would be consistent, but I think we probably have enough for now.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: reuse_swap_page replaces can_share_swap_page · 7b1fe597
      Committed by Hugh Dickins
      A good place to free up old swap is where do_wp_page(), or do_swap_page(),
      is about to redirty the page: the data on disk is then stale and won't be
      read again; and if we do decide to write the page out later, using the
      previous swap location makes an unnecessary disk seek very likely.
      
      So give can_share_swap_page() the side-effect of delete_from_swap_cache()
      when it safely can.  And can_share_swap_page() was always a misleading
      name, the more so if it has a side-effect: rename it reuse_swap_page().
      
      Irrelevant cleanup nearby: remove swap_token_default_timeout definition
      from swap.h: it's used nowhere.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: replace some BUG_ONs by VM_BUG_ONs · 51726b12
      Committed by Hugh Dickins
      The swap code is over-provisioned with BUG_ONs on assorted page flags,
      mostly dating back to 2.3.  They're good documentation, and guard against
      developer error, but a waste of space on most systems: change them to
      VM_BUG_ONs, conditional on CONFIG_DEBUG_VM.  Just delete the PagePrivate
      ones: they're later, from 2.5.69, but even less interesting now.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
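The difference between an unconditional BUG_ON and a VM_BUG_ON that depends on CONFIG_DEBUG_VM can be shown with a userspace stand-in. This is a sketch only: the kernel's BUG_ON() does not call abort(), and `counted_check()` and `run_checked_path()` are hypothetical helpers added to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in: always evaluates its condition. */
#define BUG_ON(cond) do { if (cond) abort(); } while (0)

#ifdef CONFIG_DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
/* Compiled out: the condition is not evaluated at all. */
#define VM_BUG_ON(cond) do { } while (0)
#endif

int checks_run; /* how many conditions were actually evaluated */

/* Record that a check ran, then report whether it "failed". */
int counted_check(int failed)
{
    checks_run++;
    return failed;
}

/* Returns the number of checks evaluated on this code path. */
int run_checked_path(void)
{
    checks_run = 0;
    BUG_ON(counted_check(0));
    VM_BUG_ON(counted_check(0));
    return checks_run;
}
```

Built without CONFIG_DEBUG_VM defined, `run_checked_path()` returns 1 because only the BUG_ON condition is evaluated; built with `-DCONFIG_DEBUG_VM` it returns 2. That is the space and time saving the commit message refers to, with the checks still available on debug builds.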
  2. 17 Dec, 2008 1 commit
    • x86: consolidate __swp_XXX() macros · 1796316a
      Committed by Jan Beulich
      Impact: cleanup, code robustization
      
      The __swp_...() macros silently relied upon which bits are used for
      _PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
      our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
      were reported that could have been avoided if these macros properly
      used the symbolic constants. Since, as pointed out earlier, for Xen
      Dom0 support mainline likewise will need to eliminate the conflict
      between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
      adjustments, plus it introduces a mechanism to check consistency
      between MAX_SWAPFILES_SHIFT and the actual encoding macros.
      
      This also fixes a latent bug in that x86-64 used a 6-bit mask in
      __swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
      seemingly unrelated) linux/swap.h, this would have resulted in a
      collision with _PAGE_FILE.
      
      Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
      pgoff_to_pte() calculations.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
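The idea of deriving the swap-entry encoding from symbolic bit positions, and checking it against MAX_SWAPFILES_SHIFT at build time, can be sketched as follows. The bit positions and helper names here are illustrative assumptions, not the actual x86 layout, and the C11 `_Static_assert` stands in for whatever build-time check the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/* All values below are illustrative assumptions, not the real layout. */
#define MAX_SWAPFILES_SHIFT 5 /* number of bits needed for the swap type */
#define PAGE_BIT_PRESENT    0 /* must stay clear in a swap entry */
#define PAGE_BIT_FILE       6 /* first flag bit above the type field */

/* Derive the field geometry from the flag positions, not magic numbers. */
#define SWP_TYPE_SHIFT   (PAGE_BIT_PRESENT + 1)
#define SWP_TYPE_BITS    (PAGE_BIT_FILE - PAGE_BIT_PRESENT - 1)
#define SWP_OFFSET_SHIFT (PAGE_BIT_FILE + 1)

/* Fail the build if more type bits are requested than the field has. */
_Static_assert(MAX_SWAPFILES_SHIFT <= SWP_TYPE_BITS,
               "swap type field too narrow for MAX_SWAPFILES_SHIFT");

/* Pack a swap type and offset into one word. */
uint64_t swp_entry(unsigned type, uint64_t offset)
{
    assert(type < (1u << SWP_TYPE_BITS));
    return ((uint64_t)type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
}

/* Extract the swap type, masking with the derived field width. */
unsigned swp_type(uint64_t entry)
{
    return (unsigned)(entry >> SWP_TYPE_SHIFT) & ((1u << SWP_TYPE_BITS) - 1);
}

/* Extract the swap offset. */
uint64_t swp_offset(uint64_t entry)
{
    return entry >> SWP_OFFSET_SHIFT;
}
```

Raising MAX_SWAPFILES_SHIFT to 6 in this sketch would stop the build instead of silently letting the type field collide with the file bit, which is the class of latent bug the commit message describes.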
  3. 20 Oct, 2008 2 commits
  4. 05 Aug, 2008 1 commit
  5. 31 Jul, 2008 1 commit
  6. 27 Jul, 2008 2 commits
  7. 25 Jul, 2008 1 commit
    • mm: fix ever-decreasing swap priority · 78ecba08
      Committed by Hugh Dickins
      Vegard Nossum has noticed the ever-decreasing negative priority in a
      swapon/swapoff loop, which eventually would misprioritize when int wraps
      positive.  Not worth spending much code on, but probably better fixed.
      
      It's easy to handle the swapping on and off of just one area, but there's
      not much point if a pair or more still misbehave.  To handle the general
      case, swapoff should compact negative priorities, keeping them always from
      -1 to -MAX_SWAPFILES.  That's a change, but should cause no regression,
      since these negative (unspecified) priorities are disjoint from the
      positive specified priorities 0 to 32767.
      
      One small functional difference, which seems appropriate: when swapoff
      fails to free all swap from a negative priority area, that area is now
      reinserted at lowest priority, rather than at its original priority.
      
      In moving down swapon's setting of priority, I notice that an area is
      visible to /proc/swaps when it has swap_map set, yet that was being set
      before all the visible fields were properly filled in: corrected.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
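Compacting negative priorities on swapoff can be modelled with a small table. This is a simplified sketch rather than the kernel's list-based code: `struct prio_table`, `model_swapon()` and `model_swapoff()` are hypothetical, and NR_AREAS is an arbitrary small bound standing in for MAX_SWAPFILES.

```c
#include <assert.h>
#include <stddef.h>

#define NR_AREAS 4 /* small illustrative bound, not MAX_SWAPFILES */

struct prio_table {
    int prio[NR_AREAS];   /* priority of each slot */
    int active[NR_AREAS]; /* nonzero while the slot is swapped on */
    int least_priority;   /* most negative priority currently in use */
};

/* swapon without an explicit priority: take the next negative value. */
int model_swapon(struct prio_table *t, size_t slot)
{
    assert(slot < NR_AREAS && !t->active[slot]);
    t->active[slot] = 1;
    t->prio[slot] = --t->least_priority;
    return t->prio[slot];
}

/* swapoff: close the gap so negatives stay contiguous from -1 down. */
void model_swapoff(struct prio_table *t, size_t slot)
{
    assert(slot < NR_AREAS && t->active[slot]);
    t->active[slot] = 0;
    if (t->prio[slot] >= 0)
        return; /* explicitly specified priorities are left alone */
    for (size_t i = 0; i < NR_AREAS; i++)
        if (t->active[i] && t->prio[i] < t->prio[slot])
            t->prio[i]++; /* move lower-priority areas up by one */
    t->least_priority++;
}
```

In this model, swapping one area on and off in a loop leaves its priority unchanged on every iteration instead of drifting towards ever more negative values, and removing an area from the middle shifts the ones below it up by one.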
  8. 29 Apr, 2008 1 commit
  9. 28 Apr, 2008 1 commit
  10. 15 Feb, 2008 1 commit
  11. 08 Feb, 2008 4 commits
    • memcgroup: reinstate swapoff mod · 044d66c1
      Committed by Hugh Dickins
      This patch reinstates the "swapoff: scan ptes preemptibly" mod we started
      with: in due course it should be rendered down into the earlier patches,
      leaving us with a more straightforward mem_cgroup_charge mod to unuse_pte,
      allocating with GFP_KERNEL while holding no spinlock and no atomic kmap.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Memory controller: make charging gfp mask aware · e1a1cd59
      Committed by Balbir Singh
      Nick Piggin pointed out that swap cache and page cache addition routines
      could be called from non GFP_KERNEL contexts.  This patch makes the
      charging routine aware of the gfp context.  Charging might fail if the
      cgroup is over its limit, in which case a suitable error is returned.
      
      This patch was tested on a Powerpc box.  I am still looking at being able
      to test the path, through which allocations happen in non GFP_KERNEL
      contexts.
      
      [kamezawa.hiroyu@jp.fujitsu.com: problem with ZONE_MOVABLE]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Memory controller: memory accounting · 8a9f3ccd
      Committed by Balbir Singh
      Add the accounting hooks.  The accounting is carried out for RSS and Page
      Cache (unmapped) pages.  There is now a common limit and accounting for both.
      The RSS accounting is accounted at page_add_*_rmap() and page_remove_rmap()
      time.  Page cache is accounted at add_to_page_cache(),
      __delete_from_page_cache().  Swap cache is also accounted for.
      
      Each page's page_cgroup is protected with the last bit of the
      page_cgroup pointer; this makes handling of race conditions involving
      simultaneous mappings of a page easier.  A reference count is kept in the
      page_cgroup to deal with cases where a page might be unmapped from the RSS
      of all tasks, but still lives in the page cache.
      
      Credits go to Vaidyanathan Srinivasan for helping with reference counting work
      of the page cgroup.  Almost all of the page cache accounting code has help
      from Vaidyanathan Srinivasan.
      
      [hugh@veritas.com: fix swapoff breakage]
      [akpm@linux-foundation.org: fix locking]
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcgroup: temporarily revert swapoff mod · 59bd2658
      Committed by Hugh Dickins
      This patch precisely reverts the "swapoff: scan ptes preemptibly" patch
      just presented.  It's a temporary measure to allow existing memory
      controller patches to apply without rejects: in due course they should be
      rendered down into one sensible patch, and this reversion disappear.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 06 Feb, 2008 4 commits
    • tmpfs: open a window in shmem_unuse_inode · 2e0e26c7
      Committed by Hugh Dickins
      There are a couple of reasons (patches follow) why it would be good to open a
      window for sleep in shmem_unuse_inode, between its search for a matching swap
      entry, and its handling of the entry found.
      
      shmem_unuse_inode must then use igrab to hold the inode against deletion in
      that window, and its corresponding iput might result in deletion: so it had
      better unlock_page before the iput, and might as well release the page too.
      
      Nor is there any need to hold on to shmem_swaplist_mutex once we know we'll
      leave the loop.  So this unwinding moves from try_to_unuse and shmem_unuse
      into shmem_unuse_inode, in the case when it finds a match.
      
      Let try_to_unuse break on error in the shmem_unuse case, as it does in the
      unuse_mm case: though at this point in the series, no error to break on.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapoff: scan ptes preemptibly · 2e441889
      Committed by Hugh Dickins
      Provided that CONFIG_HIGHPTE is not set, unuse_pte_range can reduce latency
      in swapoff by scanning the page table preemptibly: so long as unuse_pte is
      careful to recheck that entry under pte lock.
      
      (To tell the truth, this patch was not inspired by any cries for lower
      latency here: rather, this restructuring permits a future memory controller
      patch to allocate with GFP_KERNEL in unuse_pte, where before it could not.
      But it would be wrong to tuck this change away inside a memcgroup patch.)
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapin: fix valid_swaphandles defect · 8952898b
      Committed by Hugh Dickins
      valid_swaphandles is supposed to do a quick pass over the swap map entries
      neighbouring the entry which swapin_readahead is targeting, to determine for
      it a range worth reading all together.  But since it always starts its search
      from the beginning of the swap "cluster", a reject (free entry) there
      immediately curtails the readaround, and every swapin_readahead from that
      cluster is for just a single page.  Instead scan forwards and backwards around
      the target entry.
      
      Use better names for some variables: a swap_info pointer is usually called
      "si" not "swapdev".  And at the end, if only the target page should be read,
      return count of 0 to disable readaround, to avoid the unnecessarily repeated
      call to read_swap_cache_async.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
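Scanning forwards and backwards around the target entry, as the valid_swaphandles fix describes, can be sketched over a plain array. This is a simplified model: `readaround_window()` is a hypothetical name, a nonzero `swap_map` byte simply means "in use", and the kernel's handling of bad slots and locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Find the run of in-use slots around target, limited to the aligned
 * cluster of (1 << cluster_shift) slots that contains it.
 * Stores the first slot of the run in *start and returns its length,
 * or 0 when only the target itself would be read (no readaround).
 */
size_t readaround_window(const unsigned char *swap_map, size_t max,
                         size_t target, unsigned cluster_shift,
                         size_t *start)
{
    size_t base = (target >> cluster_shift) << cluster_shift;
    size_t end = base + ((size_t)1 << cluster_shift);
    size_t lo = target, hi = target;

    assert(target >= 1 && target < max);
    if (base == 0)
        base = 1; /* slot 0 is the swap header */
    if (end > max)
        end = max;

    /* Extend upwards while the next slot is in use. */
    while (hi + 1 < end && swap_map[hi + 1])
        hi++;
    /* Extend downwards while the previous slot is in use. */
    while (lo > base && swap_map[lo - 1])
        lo--;

    *start = lo;
    return (hi > lo) ? hi - lo + 1 : 0;
}
```

With slots 9 to 12 in use, slot 8 free, and a target of 11 in an 8-slot cluster, the window is 4 slots starting at 9. A scan that always began at the cluster's first slot would have hit the free slot 8 immediately and given up on readaround.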
    • swapin needs gfp_mask for loop on tmpfs · 02098fea
      Committed by Hugh Dickins
      Building in a filesystem on a loop device on a tmpfs file can hang when
      swapping, the loop thread caught in that infamous throttle_vm_writeout.
      
      In theory this is a long standing problem, which I've either never seen in
      practice, or long ago suppressed the recollection, after discounting my load
      and my tmpfs size as unrealistically high.  But now, with the new aops, it has
      become easy to hang on one machine.
      
      Loop used to grab_cache_page before the old prepare_write to tmpfs, which
      seems to have been enough to free up some memory for any swapin needed; but
      the new write_begin lets tmpfs find or allocate the page (much nicer, since
      grab_cache_page missed tmpfs pages in swapcache).
      
      When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
      has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
      break out when __GFP_IO or __GFP_FS is unset; but when tmpfs swaps in,
      read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
      mapping_gfp_mask - hence the hang.
      
      So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
      swapin_readahead to read_swap_cache_async to add_to_swap_cache.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 30 Jul, 2007 1 commit
  14. 17 Jul, 2007 1 commit
  15. 08 May, 2007 1 commit
  16. 06 Jan, 2007 1 commit
  17. 09 Dec, 2006 1 commit
  18. 08 Dec, 2006 2 commits