1. 23 3月, 2011 14 次提交
  2. 25 2月, 2011 1 次提交
  3. 14 1月, 2011 1 次提交
    • A
      thp: split_huge_page paging · 3f04f62f
      Andrea Arcangeli 提交于
      Paging logic that splits the page before it is unmapped and added to swap
      to ensure backwards compatibility with the legacy swap code.  Eventually
      swap should natively pageout the hugepages to increase performance and
      decrease seeking and fragmentation of swap space.  swapoff can just skip
      over huge pmd as they cannot be part of swap yet.  In add_to_swap be
      careful to split the page only if we got a valid swap entry so we don't
      split hugepages with a full swap.
      
      In theory we could split pages before isolating them during the lru scan,
      but for khugepaged to be safe, I'm relying on either mmap_sem write mode,
      or PG_lock taken, so split_huge_page has to run either with mmap_sem
      read/write mode or PG_lock taken.  Calling it from isolate_lru_page would
      make locking more complicated, in addition to that split_huge_page would
      deadlock if called by __isolate_lru_page because it has to take the lru
      lock to add the tail pages.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3f04f62f
  4. 13 11月, 2010 1 次提交
    • T
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo 提交于
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
  5. 27 10月, 2010 1 次提交
  6. 17 9月, 2010 1 次提交
    • C
      block: remove BLKDEV_IFL_WAIT · dd3932ed
      Christoph Hellwig 提交于
      All the blkdev_issue_* helpers can only sanely be used for synchronous
      caller.  To issue cache flushes or barriers asynchronously the caller needs
      to set up a bio by itself with a completion callback to move the asynchronous
      state machine ahead.  So drop the BLKDEV_IFL_WAIT flag that is always
      specified when calling blkdev_issue_* and also remove the now unused flags
      argument to blkdev_issue_flush and blkdev_issue_zeroout.  For
      blkdev_issue_discard we need to keep it for the secure discard flag, which
      gains a more descriptive name and loses the bitops vs flag confusion.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      dd3932ed
  7. 10 9月, 2010 5 次提交
    • C
      swap: do not send discards as barriers · 349f429e
      Christoph Hellwig 提交于
      The swap code already uses synchronous discards, no need to add I/O barriers.
      
      tj: superflous newlines removed.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NHugh Dickins <hughd@google.com>
      Tested-by: NNigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      349f429e
    • H
      swap: discard while swapping only if SWAP_FLAG_DISCARD · 33994466
      Hugh Dickins 提交于
      Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs
      show a shift since I last tested in December: in part because of firmware
      updates, in part because of the necessary move from barriers to awaiting
      completion at the block layer.  While discard at swapon still shows as
      slightly beneficial on both, discarding 1MB swap cluster when allocating
      is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV).
      
      Surrender: discard as presently implemented is more hindrance than help
      for swap; but might prove useful on other devices, or with improvements.
      So continue to do the discard at swapon, but make discard while swapping
      conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using
      only the lower 16 bits of int flags).
      
      We can add a --discard or -d to swapon(8), and a "discard" to swap in
      /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33994466
    • C
      swap: do not send discards as barriers · 8f2ae0fa
      Christoph Hellwig 提交于
      The swap code already uses synchronous discards, no need to add I/O
      barriers.
      
      This fixes the worst of the terrible slowdown in swap allocation for
      hibernation, reported on 2.6.35 by Nigel Cunningham; but does not entirely
      eliminate that regression.
      
      [tj@kernel.org: superflous newlines removed]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NNigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f2ae0fa
    • H
      swap: prevent reuse during hibernation · b73d7fce
      Hugh Dickins 提交于
      Move the hibernation check from scan_swap_map() into try_to_free_swap():
      to catch not only the common case when hibernation's allocation itself
      triggers swap reuse, but also the less likely case when concurrent page
      reclaim (shrink_page_list) might happen to try_to_free_swap from a page.
      
      Hibernation already clears __GFP_IO from the gfp_allowed_mask, to stop
      reclaim from going to swap: check that to prevent swap reuse too.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ondrej Zary <linux@rainbow-software.org>
      Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b73d7fce
    • H
      swap: revert special hibernation allocation · 910321ea
      Hugh Dickins 提交于
      Please revert 2.6.36-rc commit d2997b10
      "hibernation: freeze swap at hibernation".  It complicated matters by
      adding a second swap allocation path, just for hibernation; without in any
      way fixing the issue that it was intended to address - page reclaim after
      fixing the hibernation image might free swap from a page already imaged as
      swapcache, letting its swap be reallocated to store a different page of
      the image: resulting in data corruption if the imaged page were freed as
      clean then swapped back in.  Pages freed to si->swap_map were still in
      danger of being reallocated by the alternative allocation path.
      
      I guess it inadvertently fixed slow SSD swap allocation for hibernation,
      as reported by Nigel Cunningham: by missing out the discards that occur on
      the usual swap allocation path; but that was unintentional, and needs a
      separate fix.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ondrej Zary <linux@rainbow-software.org>
      Cc: Andrea Gelmini <andrea.gelmini@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nigel Cunningham <nigel@tuxonice.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      910321ea
  8. 10 8月, 2010 2 次提交
  9. 19 5月, 2010 2 次提交
  10. 29 4月, 2010 1 次提交
  11. 13 3月, 2010 1 次提交
    • D
      memcg: move charges of anonymous swap · 02491447
      Daisuke Nishimura 提交于
      This patch is another core part of this move-charge-at-task-migration
      feature.  It enables moving charges of anonymous swaps.
      
      To move the charge of swap, we need to exchange swap_cgroup's record.
      
      In current implementation, swap_cgroup's record is protected by:
      
        - page lock: if the entry is on swap cache.
        - swap_lock: if the entry is not on swap cache.
      
      This works well in usual swap-in/out activity.
      
      But this behavior make the feature of moving swap charge check many
      conditions to exchange swap_cgroup's record safely.
      
      So I changed modification of swap_cgroup's recored(swap_cgroup_record())
      to use xchg, and define a new function to cmpxchg swap_cgroup's record.
      
      This patch also enables moving charge of non pte_present but not uncharged
      swap caches, which can be exist on swap-out path, by getting the target
      pages via find_get_page() as do_mincore() does.
      
      [kosaki.motohiro@jp.fujitsu.com: fix ia64 build]
      [akpm@linux-foundation.org: fix typos]
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02491447
  12. 07 3月, 2010 4 次提交
    • H
      mm: add comment on swap_duplicate's error code · 08259d58
      Hugh Dickins 提交于
      swap_duplicate()'s loop appears to miss out on returning the error code
      from __swap_duplicate(), except when that's -ENOMEM.  In fact this is
      intentional: prior to -ENOMEM for swap_count_continuation,
      swap_duplicate() was void (and the case only occurs when copy_one_pte()
      hits a corrupt pte).  But that's surprising behaviour, which certainly
      deserves a comment.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NHuang Shijie <shijie8@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08259d58
    • H
      mm/swapfile.c: fix swapon size off-by-one · ad2bd7e0
      Hugh Dickins 提交于
      There's an off-by-one disagreement between mkswap and swapon about the
      meaning of swap_header last_page: mkswap (in all versions I've looked at:
      util-linux-ng and BusyBox and old util-linux; probably as far back as
      1999) consistently means the offset (in page units) of the last page of
      the swap area, whereas kernel sys_swapon (as far back as 2.2 and 2.3)
      strangely takes it to mean the size (in page units) of the swap area.
      
      This disagreement is the safe way round; but it's worrying people, and
      loses us one page of swap.
      
      The fix is not just to add one to nr_good_pages: we need to get maxpages
      (the size of the swap_map array) right before that; and though that is an
      unsigned long, be careful not to overflow the unsigned int p->max which
      later holds it (probably why header uses __u32 last_page instead of size).
      
      Why did we subtract one from the maximum swp_offset to calculate maxpages?
       Though it was probably me who made that change in 2.4.10, I don't get it:
      and now we should be adding one (without risk of overflow in this case).
      
      Fix the handling of swap_header badpages: it could have overrun the
      swap_map when very large swap area used on a more limited architecture.
      
      Remove pre-initializations of swap_header, nr_good_pages and maxpages:
      those date from when sys_swapon was supporting other versions of header.
      Reported-by: NNitin Gupta <ngupta@vflare.org>
      Reported-by: NJarkko Lavinen <jarkko.lavinen@nokia.com>
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad2bd7e0
    • K
      mm: count swap usage · b084d435
      KAMEZAWA Hiroyuki 提交于
      A frequent questions from users about memory management is what numbers of
      swap ents are user for processes.  And this information will give some
      hints to oom-killer.
      
      Besides we can count the number of swapents per a process by scanning
      /proc/<pid>/smaps, this is very slow and not good for usual process
      information handler which works like 'ps' or 'top'.  (ps or top is now
      enough slow..)
      
      This patch adds a counter of swapents to mm_counter and update is at each
      swap events.  Information is exported via /proc/<pid>/status file as
      
      [kamezawa@bluextal memory]$ cat /proc/self/status
      Name:   cat
      State:  R (running)
      Tgid:   2910
      Pid:    2910
      PPid:   2823
      TracerPid:      0
      Uid:    500     500     500     500
      Gid:    500     500     500     500
      FDSize: 256
      Groups: 500
      VmPeak:    82696 kB
      VmSize:    82696 kB
      VmLck:         0 kB
      VmHWM:       432 kB
      VmRSS:       432 kB
      VmData:      172 kB
      VmStk:        84 kB
      VmExe:        48 kB
      VmLib:      1568 kB
      VmPTE:        40 kB
      VmSwap:        0 kB <=============== this.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b084d435
    • K
      mm: clean up mm_counter · d559db08
      KAMEZAWA Hiroyuki 提交于
      Presently, per-mm statistics counter is defined by macro in sched.h
      
      This patch modifies it to
        - defined in mm.h as inlinf functions
        - use array instead of macro's name creation.
      
      This patch is for reducing patch size in future patch to modify
      implementation of per-mm counter.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d559db08
  13. 16 12月, 2009 6 次提交
    • H
      ksm: let shared pages be swappable · 5ad64688
      Hugh Dickins 提交于
      Initial implementation for swapping out KSM's shared pages: add
      page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
      faced with a PageKsm page.
      
      Most of what's needed can be got from the rmap_items listed from the
      stable_node of the ksm page, without discovering the actual vma: so in
      this patch just fake up a struct vma for page_referenced_one() or
      try_to_unmap_one(), then refine that in the next patch.
      
      Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
      implicit there (being only set with VM_SHARED, already excluded), but
      let's make it explicit, to help justify the lack of nonlinear unmap.
      
      Rely on the page lock to protect against concurrent modifications to that
      page's node of the stable tree.
      
      The awkward part is not swapout but swapin: do_swap_page() and
      page_add_anon_rmap() now have to allow for new possibilities - perhaps a
      ksm page still in swapcache, perhaps a swapcache page associated with one
      location in one anon_vma now needed for another location or anon_vma.
      (And the vma might even be no longer VM_MERGEABLE when that happens.)
      
      ksm_might_need_to_copy() checks for that case, and supplies a duplicate
      page when necessary, simply leaving it to a subsequent pass of ksmd to
      rediscover the identity and merge them back into one ksm page.
      Disappointingly primitive: but the alternative would have to accumulate
      unswappable info about the swapped out ksm pages, limiting swappability.
      
      Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
      particular case it was handling, so just use it instead.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ad64688
    • H
      mm: define PAGE_MAPPING_FLAGS · 3ca7b3c5
      Hugh Dickins 提交于
      At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
      in page->mapping, with the higher bits a pointer to the anon_vma; and have
      defined PageKsm(page) as that with NULL anon_vma.
      
      But KSM swapping will need to store a pointer there: so in preparation for
      that, now define PAGE_MAPPING_FLAGS as the low two bits, including
      PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
      other use for the bit emerges).
      
      Declare page_rmapping(page) to return the pointer part of page->mapping,
      and page_anon_vma(page) to return the anon_vma pointer when that's what it
      is.  Use these in a few appropriate places: notably, unuse_vma() has been
      testing page->mapping, but is better to be testing page_anon_vma() (cases
      may be added in which flag bits are set without any pointer).
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ca7b3c5
    • L
      swap: rework map_swap_page() again · d4906e1a
      Lee Schermerhorn 提交于
      Seems that page_io.c doesn't really need to know that page_private(page)
      is the swp_entry 'val'.  Rework map_swap_page() to do what its name says
      and map a page to a page offset in the swap space.
      
      The only other caller of map_swap_page() is internal to mm/swapfile.c and
      it does want to map a swap entry to the 'sector'.  So rename
      map_swap_page() to map_swap_entry(), make it 'static' and and implement
      map_swap_page() as a wrapper around that.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d4906e1a
    • H
      swap_info: note SWAP_MAP_SHMEM · aaa46865
      Hugh Dickins 提交于
      While we're fiddling with the swap_map values, let's assign a particular
      value to shmem/tmpfs swap pages: their swap counts are never incremented,
      and it helps swapoff's try_to_unuse() a little if it can immediately
      distinguish those pages from process pages.
      
      Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED,
      we might as well use that 0xbf value for SWAP_MAP_SHMEM.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aaa46865
    • H
      swap_info: swap count continuations · 570a335b
      Hugh Dickins 提交于
      Swap is duplicated (reference count incremented by one) whenever the same
      swap page is inserted into another mm (when forking finds a swap entry in
      place of a pte, or when reclaim unmaps a pte to insert the swap entry).
      
      swap_info_struct's vmalloc'ed swap_map is the array of these reference
      counts: but what happens when the unsigned short (or unsigned char since
      the preceding patch) is full? (and its high bit is kept for a cache flag)
      
      We then lose track of it, never freeing, leaving it in use until swapoff:
      at which point we _hope_ that a single pass will have found all instances,
      assume there are no more, and will lose user data if we're wrong.
      
      Swapping of KSM pages has not yet been enabled; but it is implemented,
      and makes it very easy for a user to overflow the maximum swap count:
      possible with ordinary process pages, but unlikely, even when pid_max
      has been raised from PID_MAX_DEFAULT.
      
      This patch implements swap count continuations: when the count overflows,
      a continuation page is allocated and linked to the original vmalloc'ed
      map page, and this used to hold the continuation counts for that entry
      and its neighbours.  These continuation pages are seldom referenced:
      the common paths all work on the original swap_map, only referring to
      a continuation page when the low "digit" of a count is incremented or
      decremented through SWAP_MAP_MAX.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      570a335b
    • H
      swap_info: swap_map of chars not shorts · 8d69aaee
      Hugh Dickins 提交于
      Halve the vmalloc'ed swap_map array from unsigned shorts to unsigned
      chars: it's still very unusual to reach a swap count of 126, and the
      next patch allows it to be extended indefinitely.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d69aaee