1. 10 8月, 2010 1 次提交
  2. 16 12月, 2009 5 次提交
    • H
      fix ksm.h breakage of nommu build · f42647ac
      Hugh Dickins 提交于
      Commit 5ad64688 "ksm: let shared pages
      be swappable" breaks the build on m68knommu and I suspect on any nommu:
      
        In file included from kernel/fork.c:52:
        include/linux/ksm.h:129: warning: 'enum ttu_flags' declared inside parameter list
        include/linux/ksm.h:129: warning: its scope is only this definition or declaration, which is probably not what you want
        include/linux/ksm.h:129: error: parameter 2 ('flags') has incomplete type
        make[1]: *** [kernel/fork.o] Error 1
        make[1]: *** Waiting for unfinished jobs....
      
      Let's fix that with CONFIG_MMU around most of the !CONFIG_KSM declarations.
      Reported-by: NSteven King <sfking@fdwdc.com>
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Tested-by: NSteven King <sfking@fdwdc.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f42647ac
    • H
      ksm: rmap_walk to remove_migation_ptes · e9995ef9
      Hugh Dickins 提交于
      A side-effect of making ksm pages swappable is that they have to be placed
      on the LRUs: which then exposes them to isolate_lru_page() and hence to
      page migration.
      
      Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and
      rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c.  Perhaps some
      consolidation with existing code is possible, but don't attempt that yet
      (try_to_unmap needs to handle nonlinears, but migration pte removal does
      not).
      
      rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like
      remove_anon_migration_ptes() which it replaces, avoids calling
      page_lock_anon_vma(), because that includes a page_mapped() test which
      fails when all migration ptes are in place.  That was valid when NUMA page
      migration was introduced (holding mmap_sem provided the missing guarantee
      that anon_vma's slab had not already been destroyed), but I believe not
      valid in the memory hotremove case added since.
      
      For now do the same as before, and consider the best way to fix that
      unlikely race later on.  When fixed, we can probably use rmap_walk() on
      hwpoisoned ksm pages too: for now, they remain among hwpoison's various
      exceptions (its PageKsm test comes before the page is locked, but its
      page_lock_anon_vma fails safely if an anon gets upgraded).
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9995ef9
    • H
      ksm: let shared pages be swappable · 5ad64688
      Hugh Dickins 提交于
      Initial implementation for swapping out KSM's shared pages: add
      page_referenced_ksm() and try_to_unmap_ksm(), which rmap.c calls when
      faced with a PageKsm page.
      
      Most of what's needed can be got from the rmap_items listed from the
      stable_node of the ksm page, without discovering the actual vma: so in
      this patch just fake up a struct vma for page_referenced_one() or
      try_to_unmap_one(), then refine that in the next patch.
      
      Add VM_NONLINEAR to ksm_madvise()'s list of exclusions: it has always been
      implicit there (being only set with VM_SHARED, already excluded), but
      let's make it explicit, to help justify the lack of nonlinear unmap.
      
      Rely on the page lock to protect against concurrent modifications to that
      page's node of the stable tree.
      
      The awkward part is not swapout but swapin: do_swap_page() and
      page_add_anon_rmap() now have to allow for new possibilities - perhaps a
      ksm page still in swapcache, perhaps a swapcache page associated with one
      location in one anon_vma now needed for another location or anon_vma.
      (And the vma might even be no longer VM_MERGEABLE when that happens.)
      
      ksm_might_need_to_copy() checks for that case, and supplies a duplicate
      page when necessary, simply leaving it to a subsequent pass of ksmd to
      rediscover the identity and merge them back into one ksm page.
      Disappointingly primitive: but the alternative would have to accumulate
      unswappable info about the swapped out ksm pages, limiting swappability.
      
      Remove page_add_ksm_rmap(): page_add_anon_rmap() now has to allow for the
      particular case it was handling, so just use it instead.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ad64688
    • H
      ksm: stable_node point to page and back · 08beca44
      Hugh Dickins 提交于
      Add a pointer to the ksm page into struct stable_node, holding a reference
      to the page while the node exists.  Put a pointer to the stable_node into
      the ksm page's ->mapping.
      
      Then we don't need get_ksm_page() while traversing the stable tree: the
      page to compare against is sure to be present and correct, even if it's no
      longer visible through any of its existing rmap_items.
      
      And we can handle the forked ksm page case more efficiently: no need to
      memcmp our way through the tree to find its match.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08beca44
    • H
      mm: define PAGE_MAPPING_FLAGS · 3ca7b3c5
      Hugh Dickins 提交于
      At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
      in page->mapping, with the higher bits a pointer to the anon_vma; and have
      defined PageKsm(page) as that with NULL anon_vma.
      
      But KSM swapping will need to store a pointer there: so in preparation for
      that, now define PAGE_MAPPING_FLAGS as the low two bits, including
      PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
      other use for the bit emerges).
      
      Declare page_rmapping(page) to return the pointer part of page->mapping,
      and page_anon_vma(page) to return the anon_vma pointer when that's what it
      is.  Use these in a few appropriate places: notably, unuse_vma() has been
      testing page->mapping, but is better to be testing page_anon_vma() (cases
      may be added in which flag bits are set without any pointer).
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ca7b3c5
  3. 22 9月, 2009 5 次提交
    • H
      ksm: clean up obsolete references · a913e182
      Hugh Dickins 提交于
      A few cleanups, given the munlock fix: the comment on ksm_test_exit() no
      longer applies, and it can be made private to ksm.c; there's no more
      reference to mmu_gather or tlb.h, and mmap.c doesn't need ksm.h.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a913e182
    • A
      ksm: fix deadlock with munlock in exit_mmap · 1c2fb7a4
      Andrea Arcangeli 提交于
      Rawhide users have reported hang at startup when cryptsetup is run: the
      same problem can be simply reproduced by running a program int main() {
      mlockall(MCL_CURRENT | MCL_FUTURE); return 0; }
      
      The problem is that exit_mmap() applies munlock_vma_pages_all() to
      clean up VM_LOCKED areas, and its current implementation (stupidly)
      tries to fault in absent pages, for example where PROT_NONE prevented
      them being faulted in when mlocking.  Whereas the "ksm: fix oom
      deadlock" patch, knowing there's a race by which KSM might try to fault
      in pages after exit_mmap() had finally zapped the range, backs out of
      such faults doing nothing when its ksm_test_exit() notices mm_users 0.
      
      So revert that part of "ksm: fix oom deadlock" which moved the
      ksm_exit() call from before exit_mmap() to the middle of exit_mmap();
      and remove those ksm_test_exit() checks from the page fault paths, so
      allowing the munlocking to proceed without interference.
      
      ksm_exit, if there are rmap_items still chained on this mm slot, takes
      mmap_sem write side: so preventing KSM from working on an mm while
      exit_mmap runs.  And KSM will bail out as soon as it notices that
      mm_users is already zero, thanks to its internal ksm_test_exit checks.
      So that when a task is killed by OOM killer or the user, KSM will not
      indefinitely prevent it from running exit_mmap to release its memory.
      
      This does break a part of what "ksm: fix oom deadlock" was trying to
      achieve.  When unmerging KSM (echo 2 >/sys/kernel/mm/ksm), and even
      when ksmd itself has to cancel a KSM page, it is possible that the
      first OOM-kill victim would be the KSM process being faulted: then its
      memory won't be freed until a second victim has been selected (freeing
      memory for the unmerging fault to complete).
      
      But the OOM killer is already liable to kill a second victim once the
      intended victim's p->mm goes to NULL: so there's not much point in
      rejecting this KSM patch before fixing that OOM behaviour.  It is very
      much more important to allow KSM users to boot up, than to haggle over
      an unlikely and poorly supported OOM case.
      
      We also intend to fix munlocking to not fault pages: at which point
      this patch _could_ be reverted; though that would be controversial, so
      we hope to find a better solution.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NJustin M. Forbes <jforbes@redhat.com>
      Acked-for-now-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c2fb7a4
    • H
      ksm: fix oom deadlock · 9ba69294
      Hugh Dickins 提交于
      There's a now-obvious deadlock in KSM's out-of-memory handling:
      imagine ksmd or KSM_RUN_UNMERGE handling, holding ksm_thread_mutex,
      trying to allocate a page to break KSM in an mm which becomes the
      OOM victim (quite likely in the unmerge case): it's killed and goes
      to exit, and hangs there waiting to acquire ksm_thread_mutex.
      
      Clearly we must not require ksm_thread_mutex in __ksm_exit, simple
      though that made everything else: perhaps use mmap_sem somehow?
      And part of the answer lies in the comments on unmerge_ksm_pages:
      __ksm_exit should also leave all the rmap_item removal to ksmd.
      
      But there's a fundamental problem, that KSM relies upon mmap_sem to
      guarantee the consistency of the mm it's dealing with, yet exit_mmap
      tears down an mm without taking mmap_sem.  And bumping mm_users won't
      help at all, that just ensures that the pages the OOM killer assumes
      are on their way to being freed will not be freed.
      
      The best answer seems to be, to move the ksm_exit callout from just
      before exit_mmap, to the middle of exit_mmap: after the mm's pages
      have been freed (if the mmu_gather is flushed), but before its page
      tables and vma structures have been freed; and down_write,up_write
      mmap_sem there to serialize with KSM's own reliance on mmap_sem.
      
      But KSM then needs to be careful, whenever it downs mmap_sem, to
      check that the mm is not already exiting: there's a danger of using
      find_vma on a layout that's being torn apart, or writing into page
      tables which have been freed for reuse; and even do_anonymous_page
      and __do_fault need to check they're not being called by break_ksm
      to reinstate a pte after zap_pte_range has zapped that page table.
      
      Though it might be clearer to add an exiting flag, set while holding
      mmap_sem in __ksm_exit, that wouldn't cover the issue of reinstating
      a zapped pte.  All we need is to check whether mm_users is 0 - but
      must remember that ksmd may detect that before __ksm_exit is reached.
      So, ksm_test_exit(mm) added to comment such checks on mm->mm_users.
      
      __ksm_exit now has to leave clearing up the rmap_items to ksmd,
      that needs ksm_thread_mutex; but shift the exiting mm just after the
      ksm_scan cursor so that it will soon be dealt with.  __ksm_enter raise
      mm_count to hold the mm_struct, ksmd's exit processing (exactly like
      its processing when it finds all VM_MERGEABLEs unmapped) mmdrop it,
      similar procedure for KSM_RUN_UNMERGE (which has stopped ksmd).
      
      But also give __ksm_exit a fast path: when there's no complication
      (no rmap_items attached to mm and it's not at the ksm_scan cursor),
      it can safely do all the exiting work itself.  This is not just an
      optimization: when ksmd is not running, the raised mm_count would
      otherwise leak mm_structs.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ba69294
    • H
      ksm: identify PageKsm pages · 9a840895
      Hugh Dickins 提交于
      KSM will need to identify its kernel merged pages unambiguously, and
      /proc/kpageflags will probably like to do so too.
      
      Since KSM will only be substituting anonymous pages, statistics are best
      preserved by making a PageKsm page a special PageAnon page: one with no
      anon_vma.
      
      But KSM then needs its own page_add_ksm_rmap() - keep it in ksm.h near
      PageKsm; and do_wp_page() must COW them, unlike singly mapped PageAnons.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NChris Wright <chrisw@redhat.com>
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a840895
    • H
      ksm: the mm interface to ksm · f8af4da3
      Hugh Dickins 提交于
      This patch presents the mm interface to a dummy version of ksm.c, for
      better scrutiny of that interface: the real ksm.c follows later.
      
      When CONFIG_KSM is not set, madvise(2) reject MADV_MERGEABLE and
      MADV_UNMERGEABLE with EINVAL, since that seems more helpful than
      pretending that they can be serviced.  But when CONFIG_KSM=y, accept them
      even if KSM is not currently running, and even on areas which KSM will not
      touch (e.g.  hugetlb or shared file or special driver mappings).
      
      Like other madvices, report ENOMEM despite success if any area in the
      range is unmapped, and use EAGAIN to report out of memory.
      
      Define vma flag VM_MERGEABLE to identify an area on which KSM may try
      merging pages: leave it to ksm_madvise() to decide whether to set it.
      Define mm flag MMF_VM_MERGEABLE to identify an mm which might contain
      VM_MERGEABLE areas, to minimize callouts when forking or exiting.
      
      Based upon earlier patches by Chris Wright and Izik Eidus.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: NChris Wright <chrisw@redhat.com>
      Signed-off-by: NIzik Eidus <ieidus@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f8af4da3