1. 22 9月, 2009 11 次提交
    • H
      ksm: fix endless loop on oom · d952b791
      Hugh Dickins 提交于
      break_ksm has been looping endlessly ignoring VM_FAULT_OOM: that should
      only be a problem for ksmd when a memory control group imposes limits
      (normally the OOM killer will kill others with an mm until it succeeds);
      but in general (especially for MADV_UNMERGEABLE and KSM_RUN_UNMERGE) we
      do need to route the error (or kill) back to the caller (or sighandling).
      
      Test signal_pending in unmerge_ksm_pages, which could be a lengthy
      procedure if it has to spill into swap: returning -ERESTARTSYS so that
      trivial signals will restart but fatals will terminate (is that right?
      we do different things in different places in mm, none exactly this).
      
      unmerge_and_remove_all_rmap_items was forgetting to lock when going
      down the mm_list: fix that.  Whether it's successful or not, reset
      ksm_scan cursor to head; but only if it's successful, reset seqnr
      (shown in full_scans) - page counts will have gone down to zero.
      
      This patch leaves a significant OOM deadlock, but it's a good step
      on the way, and that deadlock is fixed in a subsequent patch.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d952b791
    • H
      ksm: five little cleanups · 81464e30
      Hugh Dickins 提交于
      1. We don't use __break_cow entry point now: merge it into break_cow.
      2. remove_all_slot_rmap_items is just a special case of
         remove_trailing_rmap_items: use the latter instead.
      3. Extend comment on unmerge_ksm_pages and rmap_items.
      4. try_to_merge_two_pages should use try_to_merge_with_ksm_page
         instead of duplicating its code; and so swap them around.
      5. Comment on cmp_and_merge_page described last year's: update it.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81464e30
    • H
      ksm: keep quiet while list empty · 6e158384
      Hugh Dickins 提交于
      ksm_scan_thread already sleeps in wait_event_interruptible until setting
      ksm_run activates it; but if there's nothing on its list to look at, i.e.
      nobody has yet said madvise MADV_MERGEABLE, it's a shame to be clocking
      up system time and full_scans: ksmd_should_run added to check that too.
      
      And move the mutex_lock out around it: the new counts showed that when
      ksm_run is stopped, a little work often got done afterwards, because it
      had been read before taking the mutex.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e158384
    • H
      ksm: break cow once unshared · 26465d3e
      Hugh Dickins 提交于
      We kept agreeing not to bother about the unswappable shared KSM pages
      which later become unshared by others: observation suggests they're not
      a significant proportion.  But they are disadvantageous, and it is easier
      to break COW to replace them by swappable pages, than offer statistics
      to show that they don't matter; then we can stop worrying about them.
      
      Doing this in ksm_do_scan, they don't go through cmp_and_merge_page on
      this pass: give them a good chance of getting into the unstable tree
      on the next pass, or back into the stable, by computing checksum now.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26465d3e
    • H
      ksm: pages_unshared and pages_volatile · 473b0ce4
      Hugh Dickins 提交于
      The pages_shared and pages_sharing counts give a good picture of how
      successful KSM is at sharing; but no clue to how much wasted work it's
      doing to get there.  Add pages_unshared (count of unique pages waiting
      in the unstable tree, hoping to find a mate) and pages_volatile.
      
      pages_volatile is harder to define.  It includes those pages changing
      too fast to get into the unstable tree, but also whatever other edge
      conditions prevent a page getting into the trees: a high value may
      deserve investigation.  Don't try to calculate it from the various
      conditions: it's the total of rmap_items less those accounted for.
      
      Also show full_scans: the number of completed scans of everything
      registered in the mm list.
      
      The locking for all these counts is simply ksm_thread_mutex.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      473b0ce4
    • H
      ksm: move pages_sharing updates · e178dfde
      Hugh Dickins 提交于
      The pages_shared count is incremented and decremented when adding a node
      to and removing a node from the stable tree: easy to understand.  But the
      pages_sharing count was hard to follow, being adjusted in various places:
      increment and decrement it when adding to and removing from the stable tree.
      
      And the pages_sharing variable used to include the pages_shared, then those
      were subtracted when shown in the pages_sharing sysfs file: now keep it as
      an exclusive count of leaves hanging off the stable tree nodes, throughout.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e178dfde
    • H
      ksm: rename kernel_pages_allocated · b4028260
      Hugh Dickins 提交于
      We're not implementing swapping of KSM pages in its first release;
      but when that follows, "kernel_pages_allocated" will be a very poor
      name for the sysfs file showing number of nodes in the stable tree:
      rename that to "pages_shared" throughout.
      
      But we already have a "pages_shared", counting those page slots
      sharing the shared pages: first rename that to... "pages_sharing".
      
      What will become of "max_kernel_pages" when the pages shared can
      be swapped?  I guess it will just be removed, so keep that name.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4028260
    • I
      ksm: change ksm nice level to be 5 · 339aa624
      Izik Eidus 提交于
      ksm should try not to disturb other tasks as much as possible.
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      339aa624
    • I
      ksm: change copyright message · 36b2528d
      Izik Eidus 提交于
      Adding Hugh Dickins into the authors list.
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36b2528d
    • I
      ksm: Kernel SamePage Merging · 31dbd01f
      Izik Eidus 提交于
      Ksm is code that allows merging of identical pages between one or more
      applications, in a way invisible to the applications that use it.  Pages
      that are merged are marked as read-only, then COWed when any application
      tries to change them.
      
      Whereas fork() allows sharing anonymous pages between parent and child,
      ksm can share anonymous pages between unrelated processes.
      
      Ksm works by walking over the memory pages of the applications it scans,
      in order to find identical pages.  It uses two sorted data structures,
      called the stable and unstable trees, to locate identical pages in an
      effective way.
      
      When ksm finds two identical pages, it marks them as readonly and merges
      them into a single page.  After the pages have been marked as readonly and
      merged into one, Linux treats them as normal copy-on-write pages, copying
      to a fresh anonymous page if write access is required later.
      
      Ksm scans and merges anonymous pages only in those memory areas that have
      been registered with it by madvise(addr, length, MADV_MERGEABLE).
      
      The ksm scanner is controlled by sysfs files in /sys/kernel/mm/ksm/:
      
      max_kernel_pages - the maximum number of unswappable kernel pages
                         which may be allocated by ksm (0 for unlimited).
      
      kernel_pages_allocated - how many ksm pages are currently allocated,
                               sharing identical content between different
                               processes (pages unswappable in this release).
      
      pages_shared - how many pages have been saved by sharing with ksm pages
                     (kernel_pages_allocated being excluded from this count).
      
      pages_to_scan - how many pages ksm should scan before sleeping.
      
      sleep_millisecs - how many milliseconds ksm should sleep between scans.
      
      run - write 0 to disable ksm, read 0 while ksm is disabled (default),
            write 1 to run ksm, read 1 while ksm is running,
            write 2 to disable ksm and unmerge all its pages.
      
      Includes contributions by Andrea Arcangeli Chris Wright and Hugh Dickins.
      
      [hugh.dickins@tiscali.co.uk: fix rare page leak]
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NChris Wright <chrisw@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31dbd01f
    • H
      ksm: the mm interface to ksm · f8af4da3
      Hugh Dickins 提交于
      This patch presents the mm interface to a dummy version of ksm.c, for
      better scrutiny of that interface: the real ksm.c follows later.
      
      When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
      MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
      pretending that they can be serviced.  But when CONFIG_KSM=y, accept them
      even if KSM is not currently running, and even on areas which KSM will not
      touch (e.g.  hugetlb or shared file or special driver mappings).
      
      Like other madvices, report ENOMEM despite success if any area in the
      range is unmapped, and use EAGAIN to report out of memory.
      
      Define vma flag VM_MERGEABLE to identify an area on which KSM may try
      merging pages: leave it to ksm_madvise() to decide whether to set it.
      Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
      VM_MERGEABLE areas, to minimize callouts when forking or exiting.
      
      Based upon earlier patches by Chris Wright and Izik Eidus.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NChris Wright <chrisw@redhat.com>
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8af4da3