1. 22 3月, 2012 11 次提交
    • R
      vmscan: kswapd carefully call compaction · 7be62de9
      Rik van Riel 提交于
      With CONFIG_COMPACTION enabled, kswapd does not try to free contiguous
      free pages, even when it is woken for a higher order request.
      
      This could be bad for eg.  jumbo frame network allocations, which are done
      from interrupt context and cannot compact memory themselves.  Higher than
      before allocation failure rates in the network receive path have been
      observed in kernels with compaction enabled.
      
      Teach kswapd to defragment the memory zones in a node, but only if
      required and compaction is not deferred in a zone.
      
      [akpm@linux-foundation.org: reduce scope of zones_need_compaction]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7be62de9
    • R
      vmscan: reclaim at order 0 when compaction is enabled · fe2c2a10
      Rik van Riel 提交于
      When built with CONFIG_COMPACTION, kswapd should not try to free
      contiguous pages, because it is not trying hard enough to have a real
      chance at being successful, but still disrupts the LRU enough to break
      other things.
      
      Do not do higher order page isolation unless we really are in lumpy
      reclaim mode.
      
      Stop reclaiming pages once we have enough free pages that compaction can
      deal with things, and we hit the normal order 0 watermarks used by kswapd.
      
      Also remove a line of code that increments balanced right before exiting
      the function.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe2c2a10
    • R
      mm: make swapin readahead skip over holes · 67f96aa2
      Rik van Riel 提交于
      Ever since abandoning the virtual scan of processes, for scalability
      reasons, swap space has been a little more fragmented than before.  This
      can lead to the situation where a large memory user is killed, swap space
      ends up full of "holes" and swapin readahead is totally ineffective.
      
      On my home system, after killing a leaky firefox it took over an hour to
      page just under 2GB of memory back in, slowing the virtual machines down
      to a crawl.
      
      This patch makes swapin readahead simply skip over holes, instead of
      stopping at them.  This allows the system to swap things back in at rates
      of several MB/second, instead of a few hundred kB/second.
      
      The checks done in valid_swaphandles are already done in
      read_swap_cache_async as well, allowing us to remove a fair amount of
      code.
      
      [akpm@linux-foundation.org: fix it for page_cluster >= 32]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Adrian Drzewiecki <z@drze.net>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      67f96aa2
    • H
      mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone() · c38446cc
      Hillf Danton 提交于
      The value of nr_reclaimed is the number of pages reclaimed in the current
      round of the loop, whereas nr_to_reclaim should be compared with the
      number of pages reclaimed in all rounds.
      
      In each round of the loop, reclaimed pages are cut off from the reclaim
      goal, and the loop stops once the goal achieved.
      Signed-off-by: NHillf Danton <dhillf@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c38446cc
    • K
      mm: make get_mm_counter static-inline · 69c97823
      Konstantin Khlebnikov 提交于
      Make get_mm_counter() always static inline, it is simple enough for that.
      And remove unused set_mm_counter()
      
      bloat-o-meter:
      
      add/remove: 0/1 grow/shrink: 4/12 up/down: 99/-341 (-242)
      function                                     old     new   delta
      try_to_unmap_one                             886     952     +66
      sys_remap_file_pages                        1214    1230     +16
      dup_mm                                      1684    1700     +16
      do_exit                                     2277    2278      +1
      zap_page_range                               208     205      -3
      unmap_region                                 304     296      -8
      static.oom_kill_process                      554     546      -8
      try_to_unmap_file                           1716    1700     -16
      getrusage                                    925     909     -16
      flush_old_exec                              1704    1688     -16
      static.dump_header                           416     390     -26
      acct_update_integrals                        218     187     -31
      do_task_stat                                2986    2954     -32
      get_mm_counter                                34       -     -34
      xacct_add_tsk                                371     334     -37
      task_statm                                   172     118     -54
      task_mem                                     383     323     -60
      
      try_to_unmap_one() grows because update_hiwater_rss() now completely inline.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69c97823
    • H
      mm/vmscan.c: cleanup with s/reclaim_mode/isolate_mode/ · 61317289
      Hillf Danton 提交于
      With tons of reclaim_mode (defined as one field of struct scan_control)
      already in the file, it is clearer to rename the local reclaim_mode when
      setting up the isolation mode.
      Signed-off-by: NHillf Danton <dhillf@gmail.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61317289
    • D
      mm, oom: introduce independent oom killer ratelimit state · dc3f21ea
      David Rientjes 提交于
      printk_ratelimit() uses the global ratelimit state for all printks.  The
      oom killer should not be subjected to this state just because another
      subsystem or driver may be flooding the kernel log.
      
      This patch introduces printk ratelimiting specifically for the oom killer.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc3f21ea
    • D
      mm, oom: do not emit oom killer warning if chosen thread is already exiting · 8447d950
      David Rientjes 提交于
      If a thread is chosen for oom kill and is already PF_EXITING, then the oom
      killer simply sets TIF_MEMDIE and returns.  This allows the thread to have
      access to memory reserves so that it may quickly exit.  This logic is
      preceeded with a comment saying there's no need to alarm the sysadmin.
      This patch adds truth to that statement.
      
      There's no need to emit any warning about the oom condition if the thread
      is already exiting since it will not be killed.  In this condition, just
      silently return the oom killer since its only giving access to memory
      reserves and is otherwise a no-op.
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8447d950
    • D
      mm, oom: fold oom_kill_task() into oom_kill_process() · 647f2bdf
      David Rientjes 提交于
      oom_kill_task() has a single caller, so fold it into its parent function,
      oom_kill_process().  Slightly reduces the number of lines in the oom
      killer.
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      647f2bdf
    • D
      mm, oom: avoid looping when chosen thread detaches its mm · 2a1c9b1f
      David Rientjes 提交于
      oom_kill_task() returns non-zero iff the chosen process does not have any
      threads with an attached ->mm.
      
      In such a case, it's better to just return to the page allocator and retry
      the allocation because memory could have been freed in the interim and the
      oom condition may no longer exist.  It's unnecessary to loop in the oom
      killer and find another thread to kill.
      
      This allows both oom_kill_task() and oom_kill_process() to be converted to
      void functions.  If the oom condition persists, the oom killer will be
      recalled.
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2a1c9b1f
    • A
      mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode · 1a5a9906
      Andrea Arcangeli 提交于
      In some cases it may happen that pmd_none_or_clear_bad() is called with
      the mmap_sem hold in read mode.  In those cases the huge page faults can
      allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
      false positive from pmd_bad() that will not like to see a pmd
      materializing as trans huge.
      
      It's not khugepaged causing the problem, khugepaged holds the mmap_sem
      in write mode (and all those sites must hold the mmap_sem in read mode
      to prevent pagetables to go away from under them, during code review it
      seems vm86 mode on 32bit kernels requires that too unless it's
      restricted to 1 thread per process or UP builds).  The race is only with
      the huge pagefaults that can convert a pmd_none() into a
      pmd_trans_huge().
      
      Effectively all these pmd_none_or_clear_bad() sites running with
      mmap_sem in read mode are somewhat speculative with the page faults, and
      the result is always undefined when they run simultaneously.  This is
      probably why it wasn't common to run into this.  For example if the
      madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
      fault, the hugepage will not be zapped, if the page fault runs first it
      will be zapped.
      
      Altering pmd_bad() not to error out if it finds hugepmds won't be enough
      to fix this, because zap_pmd_range would then proceed to call
      zap_pte_range (which would be incorrect if the pmd become a
      pmd_trans_huge()).
      
      The simplest way to fix this is to read the pmd in the local stack
      (regardless of what we read, no need of actual CPU barriers, only
      compiler barrier needed), and be sure it is not changing under the code
      that computes its value.  Even if the real pmd is changing under the
      value we hold on the stack, we don't care.  If we actually end up in
      zap_pte_range it means the pmd was not none already and it was not huge,
      and it can't become huge from under us (khugepaged locking explained
      above).
      
      All we need is to enforce that there is no way anymore that in a code
      path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
      can run into a hugepmd.  The overhead of a barrier() is just a compiler
      tweak and should not be measurable (I only added it for THP builds).  I
      don't exclude different compiler versions may have prevented the race
      too by caching the value of *pmd on the stack (that hasn't been
      verified, but it wouldn't be impossible considering
      pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
      and there's no external function called in between pmd_trans_huge and
      pmd_none_or_clear_bad).
      
      		if (pmd_trans_huge(*pmd)) {
      			if (next-addr != HPAGE_PMD_SIZE) {
      				VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
      				split_huge_page_pmd(vma->vm_mm, pmd);
      			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
      				continue;
      			/* fall through */
      		}
      		if (pmd_none_or_clear_bad(pmd))
      
      Because this race condition could be exercised without special
      privileges this was reported in CVE-2012-1179.
      
      The race was identified and fully explained by Ulrich who debugged it.
      I'm quoting his accurate explanation below, for reference.
      
      ====== start quote =======
            mapcount 0 page_mapcount 1
            kernel BUG at mm/huge_memory.c:1384!
      
          At some point prior to the panic, a "bad pmd ..." message similar to the
          following is logged on the console:
      
            mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).
      
          The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
          the page's PMD table entry.
      
              143 void pmd_clear_bad(pmd_t *pmd)
              144 {
          ->  145         pmd_ERROR(*pmd);
              146         pmd_clear(pmd);
              147 }
      
          After the PMD table entry has been cleared, there is an inconsistency
          between the actual number of PMD table entries that are mapping the page
          and the page's map count (_mapcount field in struct page). When the page
          is subsequently reclaimed, __split_huge_page() detects this inconsistency.
      
             1381         if (mapcount != page_mapcount(page))
             1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
             1383                        mapcount, page_mapcount(page));
          -> 1384         BUG_ON(mapcount != page_mapcount(page));
      
          The root cause of the problem is a race of two threads in a multithreaded
          process. Thread B incurs a page fault on a virtual address that has never
          been accessed (PMD entry is zero) while Thread A is executing an madvise()
          system call on a virtual address within the same 2 MB (huge page) range.
      
                     virtual address space
                    .---------------------.
                    |                     |
                    |                     |
                  .-|---------------------|
                  | |                     |
                  | |                     |<-- B(fault)
                  | |                     |
            2 MB  | |/////////////////////|-.
            huge <  |/////////////////////|  > A(range)
            page  | |/////////////////////|-'
                  | |                     |
                  | |                     |
                  '-|---------------------|
                    |                     |
                    |                     |
                    '---------------------'
      
          - Thread A is executing an madvise(..., MADV_DONTNEED) system call
            on the virtual address range "A(range)" shown in the picture.
      
          sys_madvise
            // Acquire the semaphore in shared mode.
            down_read(&current->mm->mmap_sem)
            ...
            madvise_vma
              switch (behavior)
              case MADV_DONTNEED:
                   madvise_dontneed
                     zap_page_range
                       unmap_vmas
                         unmap_page_range
                           zap_pud_range
                             zap_pmd_range
                               //
                               // Assume that this huge page has never been accessed.
                               // I.e. content of the PMD entry is zero (not mapped).
                               //
                               if (pmd_trans_huge(*pmd)) {
                                   // We don't get here due to the above assumption.
                               }
                               //
                               // Assume that Thread B incurred a page fault and
                   .---------> // sneaks in here as shown below.
                   |           //
                   |           if (pmd_none_or_clear_bad(pmd))
                   |               {
                   |                 if (unlikely(pmd_bad(*pmd)))
                   |                     pmd_clear_bad
                   |                     {
                   |                       pmd_ERROR
                   |                         // Log "bad pmd ..." message here.
                   |                       pmd_clear
                   |                         // Clear the page's PMD entry.
                   |                         // Thread B incremented the map count
                   |                         // in page_add_new_anon_rmap(), but
                   |                         // now the page is no longer mapped
                   |                         // by a PMD entry (-> inconsistency).
                   |                     }
                   |               }
                   |
                   v
          - Thread B is handling a page fault on virtual address "B(fault)" shown
            in the picture.
      
          ...
          do_page_fault
            __do_page_fault
              // Acquire the semaphore in shared mode.
              down_read_trylock(&mm->mmap_sem)
              ...
              handle_mm_fault
                if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
                    // We get here due to the above assumption (PMD entry is zero).
                    do_huge_pmd_anonymous_page
                      alloc_hugepage_vma
                        // Allocate a new transparent huge page here.
                      ...
                      __do_huge_pmd_anonymous_page
                        ...
                        spin_lock(&mm->page_table_lock)
                        ...
                        page_add_new_anon_rmap
                          // Here we increment the page's map count (starts at -1).
                          atomic_set(&page->_mapcount, 0)
                        set_pmd_at
                          // Here we set the page's PMD entry which will be cleared
                          // when Thread A calls pmd_clear_bad().
                        ...
                        spin_unlock(&mm->page_table_lock)
      
          The mmap_sem does not prevent the race because both threads are acquiring
          it in shared mode (down_read).  Thread B holds the page_table_lock while
          the page's map count and PMD table entry are updated.  However, Thread A
          does not synchronize on that lock.
      
      ====== end quote =======
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Reported-by: NUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Jones <davej@redhat.com>
      Acked-by: NLarry Woodman <lwoodman@redhat.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>		[2.6.38+]
      Cc: Mark Salter <msalter@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a5a9906
  2. 20 3月, 2012 1 次提交
  3. 16 3月, 2012 1 次提交
    • H
      memcg: free mem_cgroup by RCU to fix oops · 59927fb9
      Hugh Dickins 提交于
      After fixing the GPF in mem_cgroup_lru_del_list(), three times one
      machine running a similar load (moving and removing memcgs while
      swapping) has oopsed in mem_cgroup_zone_nr_lru_pages(), when retrieving
      memcg zone numbers for get_scan_count() for shrink_mem_cgroup_zone():
      this is where a struct mem_cgroup is first accessed after being chosen
      by mem_cgroup_iter().
      
      Just what protects a struct mem_cgroup from being freed, in between
      mem_cgroup_iter()'s css_get_next() and its css_tryget()? css_tryget()
      fails once css->refcnt is zero with CSS_REMOVED set in flags, yes: but
      what if that memory is freed and reused for something else, which sets
      "refcnt" non-zero? Hmm, and scope for an indefinite freeze if refcnt is
      left at zero but flags are cleared.
      
      It's tempting to move the css_tryget() into css_get_next(), to make it
      really "get" the css, but I don't think that actually solves anything:
      the same difficulty in moving from css_id found to stable css remains.
      
      But we already have rcu_read_lock() around the two, so it's easily fixed
      if __mem_cgroup_free() just uses kfree_rcu() to free mem_cgroup.
      
      However, a big struct mem_cgroup is allocated with vzalloc() instead of
      kzalloc(), and we're not allowed to vfree() at interrupt time: there
      doesn't appear to be a general vfree_rcu() to help with this, so roll
      our own using schedule_work().  The compiler decently removes
      vfree_work() and vfree_rcu() when the config doesn't need them.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59927fb9
  4. 10 3月, 2012 1 次提交
  5. 07 3月, 2012 4 次提交
    • L
      vm: avoid using find_vma_prev() unnecessarily · 097d5910
      Linus Torvalds 提交于
      Several users of "find_vma_prev()" were not in fact interested in the
      previous vma if there was no primary vma to be found either.  And in
      those cases, we're much better off just using the regular "find_vma()",
      and then "prev" can be looked up by just checking vma->vm_prev.
      
      The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
      commit 83cd904d: "mm: fix find_vma_prev"), and the whole "return
      prev by reference" means that it generates worse code too.
      
      Thus this "let's avoid using this inconvenient and clearly too subtle
      interface when we don't really have to" patch.
      
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      097d5910
    • M
      mm: fix find_vma_prev · 83cd904d
      Mikulas Patocka 提交于
      Commit 6bd4837d ("mm: simplify find_vma_prev()") broke memory
      management on PA-RISC.
      
      After application of the patch, programs that allocate big arrays on the
      stack crash with segfault, for example, this will crash if compiled
      without optimization:
      
        int main()
        {
      	char array[200000];
      	array[199999] = 0;
      	return 0;
        }
      
      The reason is that PA-RISC has up-growing stack and the stack is usually
      the last memory area.  In the above example, a page fault happens above
      the stack.
      
      Previously, if we passed too high address to find_vma_prev, it returned
      NULL and stored the last VMA in *pprev.  After "simplify find_vma_prev"
      change, it stores NULL in *pprev.  Consequently, the stack area is not
      found and it is not expanded, as it used to be before the change.
      
      This patch restores the old behavior and makes it return the last VMA in
      *pprev if the requested address is higher than address of any other VMA.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83cd904d
    • H
      mmap: EINVAL not ENOMEM when rejecting VM_GROWS · ce8fea7a
      Hugh Dickins 提交于
      Currently error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP
      from shared anonymous: hoist the file case's -EINVAL up for both.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce8fea7a
    • H
      page_cgroup: fix horrid swap accounting regression · c09ff089
      Hugh Dickins 提交于
      Why is memcg's swap accounting so broken? Insane counts, wrong
      ownership, unfreeable structures, which later get freed and then
      accessed after free.
      
      Turns out to be a tiny a little 3.3-rc1 regression in 9fb4b7cc
      "page_cgroup: add helper function to get swap_cgroup": the helper
      function (actually named lookup_swap_cgroup()) returns an address using
      void* arithmetic, but the structure in question is a short.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NBob Liu <lliubbo@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c09ff089
  6. 06 3月, 2012 6 次提交
    • N
      memcg: fix mapcount check in move charge code for anonymous page · e6ca7b89
      Naoya Horiguchi 提交于
      Currently the charge on shared anonyous pages is supposed not to moved in
      task migration.  To implement this, we need to check that mapcount > 1,
      instread of > 2.  So this patch fixes it.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6ca7b89
    • A
      mm: thp: fix BUG on mm->nr_ptes · 1c641e84
      Andrea Arcangeli 提交于
      Dave Jones reports a few Fedora users hitting the BUG_ON(mm->nr_ptes...)
      in exit_mmap() recently.
      
      Quoting Hugh's discovery and explanation of the SMP race condition:
      
        "mm->nr_ptes had unusual locking: down_read mmap_sem plus
         page_table_lock when incrementing, down_write mmap_sem (or mm_users
         0) when decrementing; whereas THP is careful to increment and
         decrement it under page_table_lock.
      
         Now most of those paths in THP also hold mmap_sem for read or write
         (with appropriate checks on mm_users), but two do not: when
         split_huge_page() is called by hwpoison_user_mappings(), and when
         called by add_to_swap().
      
         It's conceivable that the latter case is responsible for the
         exit_mmap() BUG_ON mm->nr_ptes that has been reported on Fedora."
      
      The simplest way to fix it without having to alter the locking is to make
      split_huge_page() a noop in nr_ptes terms, so by counting the preallocated
      pagetables that exists for every mapped hugepage.  It was an arbitrary
      choice not to count them and either way is not wrong or right, because
      they are not used but they're still allocated.
      Reported-by: NDave Jones <davej@redhat.com>
      Reported-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Josh Boyer <jwboyer@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.0.x, 3.1.x, 3.2.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c641e84
    • H
      memcg: fix GPF when cgroup removal races with last exit · 7512102c
      Hugh Dickins 提交于
      When moving tasks from old memcg (with move_charge_at_immigrate on new
      memcg), followed by removal of old memcg, hit General Protection Fault in
      mem_cgroup_lru_del_list() (called from release_pages called from
      free_pages_and_swap_cache from tlb_flush_mmu from tlb_finish_mmu from
      exit_mmap from mmput from exit_mm from do_exit).
      
      Somewhat reproducible, takes a few hours: the old struct mem_cgroup has
      been freed and poisoned by SLAB_DEBUG, but mem_cgroup_lru_del_list() is
      still trying to update its stats, and take page off lru before freeing.
      
      A task, or a charge, or a page on lru: each secures a memcg against
      removal.  In this case, the last task has been moved out of the old memcg,
      and it is exiting: anonymous pages are uncharged one by one from the
      memcg, as they are zapped from its pagetables, so the charge gets down to
      0; but the pages themselves are queued in an mmu_gather for freeing.
      
      Most of those pages will be on lru (and force_empty is careful to
      lru_add_drain_all, to add pages from pagevec to lru first), but not
      necessarily all: perhaps some have been isolated for page reclaim, perhaps
      some isolated for other reasons.  So, force_empty may find no task, no
      charge and no page on lru, and let the removal proceed.
      
      There would still be no problem if these pages were immediately freed; but
      typically (and the put_page_testzero protocol demands it) they have to be
      added back to lru before they are found freeable, then removed from lru
      and freed.  We don't see the issue when adding, because the
      mem_cgroup_iter() loops keep their own reference to the memcg being
      scanned; but when it comes to mem_cgroup_lru_del_list().
      
      I believe this was not an issue in v3.2: there, PageCgroupAcctLRU and
      PageCgroupUsed flags were used (like a trick with mirrors) to deflect view
      of pc->mem_cgroup to the stable root_mem_cgroup when neither set.
      38c5d72f ("memcg: simplify LRU handling by new rule") mercifully
      removed those convolutions, but left this General Protection Fault.
      
      But it's surprisingly easy to restore the old behaviour: just check
      PageCgroupUsed in mem_cgroup_lru_add_list() (which decides on which lruvec
      to add), and reset pc to root_mem_cgroup if page is uncharged.  A risky
      change?  just going back to how it worked before; testing, and an audit of
      uses of pc->mem_cgroup, show no problem.
      
      And there's a nice bonus: with mem_cgroup_lru_add_list() itself making
      sure that an uncharged page goes to root lru, mem_cgroup_reset_owner() no
      longer has any purpose, and we can safely revert 4e5f01c2 ("memcg:
      clear pc->mem_cgroup if necessary").
      
      Calling update_page_reclaim_stat() after add_page_to_lru_list() in swap.c
      is not strictly necessary: the lru_lock there, with RCU before memcg
      structures are freed, makes mem_cgroup_get_reclaim_stat_from_page safe
      without that; but it seems cleaner to rely on one dependency less.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7512102c
    • H
      memcg: fix deadlock by inverting lrucare nesting · 9ce70c02
      Hugh Dickins 提交于
      We have forgotten the rules of lock nesting: the irq-safe ones must be
      taken inside the non-irq-safe ones, otherwise we are open to deadlock:
      
      CPU0                          CPU1
      ----                          ----
      lock(&(&pc->lock)->rlock);
                                    local_irq_disable();
                                    lock(&(&zone->lru_lock)->rlock);
                                    lock(&(&pc->lock)->rlock);
      <Interrupt>
      lock(&(&zone->lru_lock)->rlock);
      
      To check a different locking issue, I happened to add a spin_lock to
      memcg's bit_spin_lock in lock_page_cgroup(), and lockdep very quickly
      complained about __mem_cgroup_commit_charge_lrucare() (on CPU1 above).
      
      So delete __mem_cgroup_commit_charge_lrucare(), passing a bool lrucare to
      __mem_cgroup_commit_charge() instead, taking zone->lru_lock under
      lock_page_cgroup() in the lrucare case.
      
      The original was using spin_lock_irqsave, but we'd be in more trouble if
      it were ever called at interrupt time: unconditional _irq is enough.  And
      ClearPageLRU before del from lru, SetPageLRU before add to lru: no strong
      reason, but that is the ordering used consistently elsewhere.
      
      Fixes 36b62ad5 ("memcg: simplify corner case handling
      of LRU").
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9ce70c02
    • A
      flush_tlb_range() needs ->page_table_lock when ->mmap_sem is not held · cd2934a3
      Al Viro 提交于
      All other callers already hold either ->mmap_sem (exclusive) or
      ->page_table_lock.  And we need it because some page table flushing
      instanced do work explicitly with ge tables.
      
      See e.g.  arch/powerpc/mm/tlb_hash32.c, flush_tlb_range() and
      flush_range() in there.  The same goes for uml, with a lot more
      extensive playing with page tables.
      
      Almost all callers are actually fine - flush_tlb_range() may have no
      need to bother playing with page tables, but it can do so safely; again,
      this caller is the sole exception - everything else either has exclusive
      ->mmap_sem on the mm in question, or mm->page_table_lock is held.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cd2934a3
    • A
      835ee797
  7. 01 3月, 2012 1 次提交
    • T
      memblock: Fix size aligning of memblock_alloc_base_nid() · 847854f5
      Tejun Heo 提交于
      memblock allocator aligns @size to @align to reduce the amount
      of fragmentation.  Commit:
      
       7bd0b0f0 ("memblock: Reimplement memblock allocation using reverse free area iterator")
      
      Broke it by incorrectly relocating @size aligning to
      memblock_find_in_range_node().  As the aligned size is not
      propagated back to memblock_alloc_base_nid(), the actually
      reserved size isn't aligned.
      
      While this increases memory use for memblock reserved array,
      this shouldn't cause any critical failure; however, it seems
      that the size aligning was hiding a use-beyond-allocation bug in
      sparc64 and losing the aligning causes boot failure.
      
      The underlying problem is currently being debugged but this is a
      proper fix in itself, it's already pretty late in -rc cycle for
      boot failures and reverting the change for debugging isn't
      difficult. Restore the size aligning moving it to
      memblock_alloc_base_nid().
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120228205621.GC3252@dhcp-172-17-108-109.mtv.corp.google.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <alpine.SOC.1.00.1202130942030.1488@math.ut.ee>
      847854f5
  8. 25 2月, 2012 3 次提交
  9. 23 2月, 2012 1 次提交
  10. 14 2月, 2012 1 次提交
  11. 09 2月, 2012 2 次提交
    • H
      mm: fix UP THP spin_is_locked BUGs · b9980cdc
      Hugh Dickins 提交于
      Fix CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_SMP=n CONFIG_DEBUG_VM=y
      CONFIG_DEBUG_SPINLOCK=n kernel: spin_is_locked() is then always false,
      and so triggers some BUGs in Transparent HugePage codepaths.
      
      asm-generic/bug.h mentions this problem, and provides a WARN_ON_SMP(x);
      but being too lazy to add VM_BUG_ON_SMP, BUG_ON_SMP, WARN_ON_SMP_ONCE,
      VM_WARN_ON_SMP_ONCE, just test NR_CPUS != 1 in the existing VM_BUG_ONs.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9980cdc
    • M
      mm: compaction: check for overlapping nodes during isolation for migration · dc908600
      Mel Gorman 提交于
      When isolating pages for migration, migration starts at the start of a
      zone while the free scanner starts at the end of the zone.  Migration
      avoids entering a new zone by never going beyond the free scanned.
      
      Unfortunately, in very rare cases nodes can overlap.  When this happens,
      migration isolates pages without the LRU lock held, corrupting lists
      which will trigger errors in reclaim or during page free such as in the
      following oops
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        IP: [<ffffffff810f795c>] free_pcppages_bulk+0xcc/0x450
        PGD 1dda554067 PUD 1e1cb58067 PMD 0
        Oops: 0000 [#1] SMP
        CPU 37
        Pid: 17088, comm: memcg_process_s Tainted: G            X
        RIP: free_pcppages_bulk+0xcc/0x450
        Process memcg_process_s (pid: 17088, threadinfo ffff881c2926e000, task ffff881c2926c0c0)
        Call Trace:
          free_hot_cold_page+0x17e/0x1f0
          __pagevec_free+0x90/0xb0
          release_pages+0x22a/0x260
          pagevec_lru_move_fn+0xf3/0x110
          putback_lru_page+0x66/0xe0
          unmap_and_move+0x156/0x180
          migrate_pages+0x9e/0x1b0
          compact_zone+0x1f3/0x2f0
          compact_zone_order+0xa2/0xe0
          try_to_compact_pages+0xdf/0x110
          __alloc_pages_direct_compact+0xee/0x1c0
          __alloc_pages_slowpath+0x370/0x830
          __alloc_pages_nodemask+0x1b1/0x1c0
          alloc_pages_vma+0x9b/0x160
          do_huge_pmd_anonymous_page+0x160/0x270
          do_page_fault+0x207/0x4c0
          page_fault+0x25/0x30
      
      The "X" in the taint flag means that external modules were loaded but but
      is unrelated to the bug triggering.  The real problem was because the PFN
      layout looks like this
      
        Zone PFN ranges:
          DMA      0x00000010 -> 0x00001000
          DMA32    0x00001000 -> 0x00100000
          Normal   0x00100000 -> 0x01e80000
        Movable zone start PFN for each node
        early_node_map[14] active PFN ranges
            0: 0x00000010 -> 0x0000009b
            0: 0x00000100 -> 0x0007a1ec
            0: 0x0007a354 -> 0x0007a379
            0: 0x0007f7ff -> 0x0007f800
            0: 0x00100000 -> 0x00680000
            1: 0x00680000 -> 0x00e80000
            0: 0x00e80000 -> 0x01080000
            1: 0x01080000 -> 0x01280000
            0: 0x01280000 -> 0x01480000
            1: 0x01480000 -> 0x01680000
            0: 0x01680000 -> 0x01880000
            1: 0x01880000 -> 0x01a80000
            0: 0x01a80000 -> 0x01c80000
            1: 0x01c80000 -> 0x01e80000
      
      The fix is straight-forward.  isolate_migratepages() has to make a
      similar check to isolate_freepage to ensure that it never isolates pages
      from a zone it does not hold the LRU lock for.
      
      This was discovered in a 3.0-based kernel but it affects 3.1.x, 3.2.x
      and current mainline.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc908600
  12. 04 2月, 2012 5 次提交
    • M
      mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block... · 0bf380bc
      Mel Gorman 提交于
      mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for migration
      
      When isolating for migration, migration starts at the start of a zone
      which is not necessarily pageblock aligned.  Further, it stops isolating
      when COMPACT_CLUSTER_MAX pages are isolated so migrate_pfn is generally
      not aligned.  This allows isolate_migratepages() to call pfn_to_page() on
      an invalid PFN which can result in a crash.  This was originally reported
      against a 3.0-based kernel with the following trace in a crash dump.
      
      PID: 9902   TASK: d47aecd0  CPU: 0   COMMAND: "memcg_process_s"
       #0 [d72d3ad0] crash_kexec at c028cfdb
       #1 [d72d3b24] oops_end at c05c5322
       #2 [d72d3b38] __bad_area_nosemaphore at c0227e60
       #3 [d72d3bec] bad_area at c0227fb6
       #4 [d72d3c00] do_page_fault at c05c72ec
       #5 [d72d3c80] error_code (via page_fault) at c05c47a4
          EAX: 00000000  EBX: 000c0000  ECX: 00000001  EDX: 00000807  EBP: 000c0000
          DS:  007b      ESI: 00000001  ES:  007b      EDI: f3000a80  GS:  6f50
          CS:  0060      EIP: c030b15a  ERR: ffffffff  EFLAGS: 00010002
       #6 [d72d3cb4] isolate_migratepages at c030b15a
       #7 [d72d3d14] zone_watermark_ok at c02d26cb
       #8 [d72d3d2c] compact_zone at c030b8de
       #9 [d72d3d68] compact_zone_order at c030bba1
      #10 [d72d3db4] try_to_compact_pages at c030bc84
      #11 [d72d3ddc] __alloc_pages_direct_compact at c02d61e7
      #12 [d72d3e08] __alloc_pages_slowpath at c02d66c7
      #13 [d72d3e78] __alloc_pages_nodemask at c02d6a97
      #14 [d72d3eb8] alloc_pages_vma at c030a845
      #15 [d72d3ed4] do_huge_pmd_anonymous_page at c03178eb
      #16 [d72d3f00] handle_mm_fault at c02f36c6
      #17 [d72d3f30] do_page_fault at c05c70ed
      #18 [d72d3fb0] error_code (via page_fault) at c05c47a4
          EAX: b71ff000  EBX: 00000001  ECX: 00001600  EDX: 00000431
          DS:  007b      ESI: 08048950  ES:  007b      EDI: bfaa3788
          SS:  007b      ESP: bfaa36e0  EBP: bfaa3828  GS:  6f50
          CS:  0073      EIP: 080487c8  ERR: ffffffff  EFLAGS: 00010202
      
      It was also reported by Herbert van den Bergh against 3.1-based kernel
      with the following snippet from the console log.
      
      BUG: unable to handle kernel paging request at 01c00008
      IP: [<c0522399>] isolate_migratepages+0x119/0x390
      *pdpt = 000000002f7ce001 *pde = 0000000000000000
      
      It is expected that it also affects 3.2.x and current mainline.
      
      The problem is that pfn_valid is only called on the first PFN being
      checked and that PFN is not necessarily aligned.  Lets say we have a case
      like this
      
      H = MAX_ORDER_NR_PAGES boundary
      | = pageblock boundary
      m = cc->migrate_pfn
      f = cc->free_pfn
      o = memory hole
      
      H------|------H------|----m-Hoooooo|ooooooH-f----|------H
      
      The migrate_pfn is just below a memory hole and the free scanner is beyond
      the hole.  When isolate_migratepages started, it scans from migrate_pfn to
      migrate_pfn+pageblock_nr_pages which is now in a memory hole.  It checks
      pfn_valid() on the first PFN but then scans into the hole where there are
      not necessarily valid struct pages.
      
      This patch ensures that isolate_migratepages calls pfn_valid when
      necessary.
      Reported-by: NHerbert van den Bergh <herbert.van.den.bergh@oracle.com>
      Tested-by: NHerbert van den Bergh <herbert.van.den.bergh@oracle.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0bf380bc
    • S
      readahead: fix pipeline break caused by block plug · 3deaa719
      Shaohua Li 提交于
      Herbert Poetzl reported a performance regression since 2.6.39.  The test
      is a simple dd read, but with big block size.  The reason is:
      
      T1: ra (A, A+128k), (A+128k, A+256k)
      T2: lock_page for page A, submit the 256k
      T3: hit page A+128K, ra (A+256k, A+384). the range isn't submitted
      because of plug and there isn't any lock_page till we hit page A+256k
      because all pages from A to A+256k is in memory
      T4: hit page A+256k, ra (A+384, A+ 512). Because of plug, the range isn't
      submitted again.
      T5: lock_page A+256k, so (A+256k, A+512k) will be submitted. The task is
      waitting for (A+256k, A+512k) finish.
      
      There is no request to disk in T3 and T4, so readahead pipeline breaks.
      
      We really don't need block plug for generic_file_aio_read() for buffered
      I/O.  The readahead already has plug and has fine grained control when I/O
      should be submitted.  Deleting plug for buffered I/O fixes the regression.
      
      One side effect is plug makes the request size 256k, the size is 128k
      without it.  This is because default ra size is 128k and not a reason we
      need plug here.
      
      Vivek said:
      
      : We submit some readahead IO to device request queue but because of nested
      : plug, queue never gets unplugged.  When read logic reaches a page which is
      : not in page cache, it waits for page to be read from the disk
      : (lock_page_killable()) and that time we flush the plug list.
      :
      : So effectively read ahead logic is kind of broken in parts because of
      : nested plugging.  Removing top level plug (generic_file_aio_read()) for
      : buffered reads, will allow unplugging queue earlier for readahead.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Reported-by: NHerbert Poetzl <herbert@13thfloor.at>
      Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3deaa719
    • C
      mm/filemap_xip.c: fix race condition in xip_file_fault() · 99f02ef1
      Carsten Otte 提交于
      Fix a race condition that shows in conjunction with xip_file_fault() when
      two threads of the same user process fault on the same memory page.
      
      In this case, the race winner will install the page table entry and the
      unlucky loser will cause an oops: xip_file_fault calls vm_insert_pfn (via
      vm_insert_mixed) which drops out at this check:
      
      	retval = -EBUSY;
      	if (!pte_none(*pte))
      		goto out_unlock;
      
      The resulting -EBUSY return value will trigger a BUG_ON() in
      xip_file_fault.
      
      This fix simply considers the fault as fixed in this case, because the
      race winner has successfully installed the pte.
      
      [akpm@linux-foundation.org: use conventional (and consistent) comment layout]
      Reported-by: NDavid Sadler <dsadler@us.ibm.com>
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Reported-by: NLouis Alex Eisner <leisner@cs.ucsd.edu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99f02ef1
    • A
      mm/memcontrol.c: fix warning with CONFIG_NUMA=n · 82b3f2a7
      Andrew Morton 提交于
      mm/memcontrol.c: In function 'memcg_check_events':
      mm/memcontrol.c:779: warning: unused variable 'do_numainfo'
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Hiroyuki KAMEZAWA <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82b3f2a7
    • K
      mm: postpone migrated page mapping reset · 35512eca
      Konstantin Khlebnikov 提交于
      Postpone resetting page->mapping until the final remove_migration_ptes().
      Otherwise the expression PageAnon(migration_entry_to_page(entry)) does not
      work.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35512eca
  13. 03 2月, 2012 2 次提交
  14. 01 2月, 2012 1 次提交