1. 01 Mar, 2011 1 commit
  2. 17 Feb, 2011 1 commit
  3. 12 Feb, 2011 5 commits
    • memcg: fix leak of accounting at failure path of hugepage collapsing · 678ff896
      KAMEZAWA Hiroyuki committed
      mem_cgroup_uncharge_page() should be called in all failure cases after
      mem_cgroup_newpage_charge() is called in
      huge_memory.c::collapse_huge_page().  Otherwise the charge leaks and
      freeing the page triggers a report like this:
      
       [ 4209.076861] BUG: Bad page state in process khugepaged  pfn:1e9800
       [ 4209.077601] page:ffffea0006b14000 count:0 mapcount:0 mapping:          (null) index:0x2800
       [ 4209.078674] page flags: 0x40000000004000(head)
       [ 4209.079294] pc:ffff880214a30000 pc->flags:2146246697418756 pc->mem_cgroup:ffffc9000177a000
       [ 4209.082177] (/A)
       [ 4209.082500] Pid: 31, comm: khugepaged Not tainted 2.6.38-rc3-mm1 #1
       [ 4209.083412] Call Trace:
       [ 4209.083678]  [<ffffffff810f4454>] ? bad_page+0xe4/0x140
       [ 4209.084240]  [<ffffffff810f53e6>] ? free_pages_prepare+0xd6/0x120
       [ 4209.084837]  [<ffffffff8155621d>] ? rwsem_down_failed_common+0xbd/0x150
       [ 4209.085509]  [<ffffffff810f5462>] ? __free_pages_ok+0x32/0xe0
       [ 4209.086110]  [<ffffffff810f552b>] ? free_compound_page+0x1b/0x20
       [ 4209.086699]  [<ffffffff810fad6c>] ? __put_compound_page+0x1c/0x30
       [ 4209.087333]  [<ffffffff810fae1d>] ? put_compound_page+0x4d/0x200
       [ 4209.087935]  [<ffffffff810fb015>] ? put_page+0x45/0x50
       [ 4209.097361]  [<ffffffff8113f779>] ? khugepaged+0x9e9/0x1430
       [ 4209.098364]  [<ffffffff8107c870>] ? autoremove_wake_function+0x0/0x40
       [ 4209.099121]  [<ffffffff8113ed90>] ? khugepaged+0x0/0x1430
       [ 4209.099780]  [<ffffffff8107c236>] ? kthread+0x96/0xa0
       [ 4209.100452]  [<ffffffff8100dda4>] ? kernel_thread_helper+0x4/0x10
       [ 4209.101214]  [<ffffffff8107c1a0>] ? kthread+0x0/0xa0
       [ 4209.101842]  [<ffffffff8100dda0>] ? kernel_thread_helper+0x0/0x10
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: fix zone shrinking exit when scan work is done · f0fdc5e8
      Johannes Weiner committed
      Commit 3e7d3449 ("mm: vmscan: reclaim order-0 and use compaction
      instead of lumpy reclaim") introduced an indefinite loop in
      shrink_zone().
      
      It meant to break out of this loop when no pages had been reclaimed and
      not a single page was even scanned.  The way it would detect the latter
      is by taking a snapshot of sc->nr_scanned at the beginning of the
      function and comparing it against the new sc->nr_scanned after the scan
      loop.  But it would re-iterate without updating that snapshot, looping
      forever if sc->nr_scanned changed at least once since shrink_zone() was
      invoked.
      
      This is not the only condition that exits the loop, but the others
      require other processes to change the zone state, since the stuck
      reclaimer obviously no longer can.
      
      This is only happening for higher-order allocations, where reclaim is
      run back to back with compaction.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Tested-by: Kent Overstreet <kent.overstreet@gmail.com>
      Reported-by: Kent Overstreet <kent.overstreet@gmail.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
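The stale-snapshot loop described in the vmscan commit above can be reproduced in a small userspace sketch. This is only an illustration of the pattern, not the kernel's shrink_zone(): `scan_pass()`, `shrink_demo()`, the 32-page batch, and the `max_passes` cap are all hypothetical names and values introduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sc->nr_scanned counter. */
struct scan_control {
	unsigned long nr_scanned;
};

/*
 * Hypothetical scan pass: scans up to 32 pages from the remaining
 * budget and reclaims nothing. Returns the number of pages reclaimed.
 */
static unsigned long scan_pass(struct scan_control *sc, unsigned long *budget)
{
	unsigned long n = *budget > 32 ? 32 : *budget;

	*budget -= n;
	sc->nr_scanned += n;
	return 0;
}

/*
 * Loop until a pass neither reclaims nor scans anything. With
 * refresh_snapshot == false the snapshot is taken only once, which is
 * the bug pattern. max_passes only keeps the demo finite.
 * Returns the number of passes made.
 */
static int shrink_demo(unsigned long budget, bool refresh_snapshot,
		       int max_passes)
{
	struct scan_control sc = { 0 };
	unsigned long snapshot = sc.nr_scanned;
	int passes = 0;

	while (passes < max_passes) {
		unsigned long reclaimed;

		if (refresh_snapshot)
			snapshot = sc.nr_scanned;
		reclaimed = scan_pass(&sc, &budget);
		passes++;
		/* Exit once nothing was reclaimed and nothing was scanned. */
		if (reclaimed == 0 && sc.nr_scanned == snapshot)
			break;
	}
	return passes;
}
```

With a budget of 64 pages, `shrink_demo(64, true, 1000)` stops after 3 passes, while `shrink_demo(64, false, 1000)` only stops at the artificial cap of 1000, because the counter never again equals the initial snapshot.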
    • mlock: do not munlock pages in __do_fault() · 419d8c96
      Michel Lespinasse committed
      If the page is going to be written to, __do_fault() needs to break COW.
      
      However, the old page (before breaking COW) was never mapped into
      the current pte (__do_fault() is only called when the pte is not present),
      so vmscan can't have marked the old page as PageMlocked due to being
      mapped in __do_fault()'s VMA.  Therefore, __do_fault() does not need to
      worry about clearing PageMlocked() on the old page.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mlock: fix race when munlocking pages in do_wp_page() · e15f8c01
      Michel Lespinasse committed
      vmscan can lazily find pages that are mapped within VM_LOCKED vmas, and
      set the PageMlocked bit on these pages, transferring them onto the
      unevictable list.  When do_wp_page() breaks COW within a VM_LOCKED vma,
      it may need to clear PageMlocked on the old page and set it on the new
      page instead.
      
      This change fixes an issue where do_wp_page() was clearing PageMlocked
      on the old page while the pte was still pointing to it (as well as
      rmap).  Therefore, we were not protected against vmscan immediately
      transferring the old page back onto the unevictable list.  This could
      cause pages to get stranded there forever.
      
      I propose to move the corresponding code to the end of do_wp_page(),
      after the pte (and rmap) have been pointed to the new page.
      Additionally, we can use munlock_vma_page() instead of
      clear_page_mlock(), so that the old page stays mlocked if there are
      still other VM_LOCKED vmas mapping it.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: don't adjust size in memblock_find_base() · e6d2e2b2
      Yinghai Lu committed
      While applying the patch to use memblock to find the aperture on
      64-bit x86, Ingo found that a system with 1G + force_iommu reports:
      
      > No AGP bridge found
      > Node 0: aperture @ 38000000 size 32 MB
      > Aperture pointing to e820 RAM. Ignoring.
      > Your BIOS doesn't leave a aperture memory hole
      > Please enable the IOMMU option in the BIOS setup
      > This costs you 64 MB of RAM
      > Cannot allocate aperture memory hole (0,65536K)
      
      the corresponding code:
      
      	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
      	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
      		printk(KERN_ERR
      			"Cannot allocate aperture memory hole (%lx,%uK)\n",
      				addr, aper_size>>10);
      		return 0;
      	}
      	memblock_x86_reserve_range(addr, addr + aper_size, "aperture64");
      
      fails because the memblock core code aligns the size to 512M, which
      can make the size far too big.
      
      So don't align the size in that case.
      
      Actually __memblock_alloc_base(), the other caller, already aligns the
      size before calling that function.
      
      BTW, x86 does not use __memblock_alloc_base()...
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
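Why rounding the size up to the alignment can make the memblock search fail may be easier to see with numbers. The sketch below is a simplified single-region, top-down search and not the memblock implementation; `find_in_region()`, `FIND_ERROR`, the `align_size` switch, and the region bounds used in the usage note are assumptions made for illustration.

```c
#include <stdint.h>

#define FIND_ERROR UINT64_MAX	/* hypothetical stand-in for MEMBLOCK_ERROR */

/* Both helpers assume align is a power of two. */
static uint64_t round_down_to(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}

static uint64_t round_up_to(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/*
 * Top-down search for size bytes at an align-aligned base inside one
 * free region [start, end). If align_size is non-zero, the size is
 * first rounded up to the alignment, which is the behaviour the commit
 * above removes.
 */
static uint64_t find_in_region(uint64_t start, uint64_t end,
			       uint64_t size, uint64_t align, int align_size)
{
	uint64_t base;

	if (align_size)
		size = round_up_to(size, align);
	if (end < start || size > end - start)
		return FIND_ERROR;

	base = round_down_to(end - size, align);
	return base >= start ? base : FIND_ERROR;
}
```

For an assumed free region from 0x20000000 to 0x3ff00000 (just under 1G), a 64 MB request with 512 MB alignment fits at 0x20000000 when the size is left alone. Rounding the size up to 512 MB makes the same request fail, since the region is slightly smaller than 512 MB.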
  4. 03 Feb, 2011 11 commits
  5. 02 Feb, 2011 1 commit
  6. 28 Jan, 2011 2 commits
  7. 26 Jan, 2011 9 commits
    • memcg: fix race at move_parent around compound_order() · 52dbb905
      KAMEZAWA Hiroyuki committed
      Fix up mem_cgroup_move_parent(), which uses compound_order()
      asynchronously.  compound_order() may return an unreliable value there
      because we don't take the lock.  Use PageTransHuge() and HPAGE_SIZE
      instead.
      
      Also clean up for mem_cgroup_move_parent().
       - remove unnecessary initialization of local variable.
       - rename charge_size -> page_size
       - remove unnecessary (wrong) comment.
       - added a comment about THP.
      
      Note:
       The current design takes compound_page_lock() in the caller of
       move_account().  This should be revisited when we implement direct
       move_task of hugepages without splitting.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: bugfix check mem_cgroup_disabled() at split fixup · 3d37c4a9
      KAMEZAWA Hiroyuki committed
      mem_cgroup_disabled() should be checked at splitting.  If memcg is
      disabled, no heavy work is necessary.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix account leak at failure of memsw acconting · 01c88e2d
      KAMEZAWA Hiroyuki committed
      Commit 4b534334 ("memcg: clean up try_charge main loop") removed the
      cancellation of the memory charge in the case where the memory charge
      succeeds but the mem+swap charge fails.
      
      This leaks memory usage.  Fix it.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>	[2.6.36+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
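The leak described in the memsw commit above is an instance of a two-stage charge where the first stage must be rolled back when the second fails. A minimal userspace sketch follows, assuming a hypothetical `struct counter` with a usage and a limit; none of these names come from the kernel's res_counter API.

```c
#include <stdbool.h>

/* Hypothetical resource counter: usage may not exceed limit. */
struct counter {
	long usage;
	long limit;
};

static bool counter_charge(struct counter *c, long nr)
{
	if (c->usage + nr > c->limit)
		return false;
	c->usage += nr;
	return true;
}

static void counter_uncharge(struct counter *c, long nr)
{
	c->usage -= nr;
}

/*
 * Charge nr units to both the memory counter and the mem+swap counter.
 * Returns 0 on success and -1 on failure, leaving both counters
 * unchanged in the failure case.
 */
static int charge_both(struct counter *mem, struct counter *memsw, long nr)
{
	if (!counter_charge(mem, nr))
		return -1;

	if (!counter_charge(memsw, nr)) {
		/* Cancel the memory charge, or its usage would leak. */
		counter_uncharge(mem, nr);
		return -1;
	}
	return 0;
}
```

Dropping the `counter_uncharge()` call in the second failure branch would leave `mem->usage` raised even though the overall charge failed, which is the kind of leak the commit describes.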
    • mm: migration: clarify migrate_pages() comment · 28bd6578
      Minchan Kim committed
      Callers of migrate_pages() should call putback_lru_pages() to return
      the isolated pages to the LRU or the free list.  The current comment is
      rather confusing: it says the caller always has to call it.
      
      It is clearer to point out that the caller has to call it if
      migrate_pages()'s return value isn't zero.
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: don't depend on HUGETLB_PAGE · 33a93877
      Andrea Arcangeli committed
      Commit 5d689240 ("thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE
      enabled") causes this warning during the configuration process:
      
        warning: (TRANSPARENT_HUGEPAGE) selects COMPACTION which has unmet
        direct dependencies (EXPERIMENTAL && HUGETLB_PAGE && MMU)
      
      COMPACTION doesn't depend on HUGETLB_PAGE, it doesn't depend on THP
      either, it is also useful for regular alloc_pages(order > 0) including
      the very kernel stack during fork (THREAD_ORDER = 1).  It's always
      better to enable COMPACTION.
      
      The warning should be an error because we would end up with MIGRATION
      not selected, and COMPACTION wouldn't work without migration (even
      though it seems to build with an inline migrate_pages() returning -ENOSYS).
      
      I'd also like to remove EXPERIMENTAL: compaction has been in the kernel
      for some releases (for full safety the default remains disabled which I
      think is enough).
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Luca Tettamanti <kronos.it@gmail.com>
      Tested-by: Luca Tettamanti <kronos.it@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: fix uninitialized variable use in mem_cgroup_move_parent() · 8dba474f
      Jesper Juhl committed
      In mm/memcontrol.c::mem_cgroup_move_parent() there's a path that jumps
      to the 'put_back' label
      
        	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
        	if (ret || !parent)
        		goto put_back;
      
      where we'll
      
        	if (charge > PAGE_SIZE)
        		compound_unlock_irqrestore(page, flags);
      
      but, we have not assigned anything to 'flags' at this point, nor have we
      called 'compound_lock_irqsave()' (which is what sets 'flags').  The
      'put_back' label should be moved below the call to
      compound_unlock_irqrestore() as per this patch.
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
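The label placement issue in the mem_cgroup_move_parent() commit above can be shown with a stripped-down goto cleanup pattern. This is a sketch under assumptions, not the kernel function: `lock_save()`, `lock_restore()`, `lock_depth`, and `move_parent_demo()` are hypothetical stand-ins for the compound lock helpers and their caller.

```c
/* Hypothetical lock state, used only to observe balanced lock/unlock. */
static int lock_depth;

static void lock_save(unsigned long *flags)
{
	*flags = 0x200;		/* stand-in for the saved IRQ state */
	lock_depth++;
}

static void lock_restore(unsigned long flags)
{
	(void)flags;
	lock_depth--;
}

/*
 * Returns 0 on success and -1 if the (simulated) charge fails.
 * The put_back label sits below the unlock, so the early failure path
 * never touches flags, which has not been set at that point.
 */
static int move_parent_demo(int charge_fails)
{
	unsigned long flags;
	int ret = -1;

	if (charge_fails)
		goto put_back;	/* lock not taken, flags not assigned */

	lock_save(&flags);
	ret = 0;		/* the actual move would happen here */
	lock_restore(flags);
put_back:
	return ret;
}
```

If the label were placed above `lock_restore()`, the failure path would restore from an unassigned `flags` and unlock a lock it never took.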
    • mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator · 2ff754fa
      David Rientjes committed
      Commit 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone") uncovered a livelock in the page
      allocator that resulted in tasks infinitely looping trying to find
      memory and kswapd running at 100% cpu.
      
      The issue occurs because drain_all_pages() is called immediately
      following direct reclaim when no memory is freed and try_to_free_pages()
      returns non-zero because all zones in the zonelist do not have their
      all_unreclaimable flag set.
      
      When draining the per-cpu pagesets back to the buddy allocator for each
      zone, the zone->pages_scanned counter is cleared to avoid erroneously
      setting zone->all_unreclaimable later.  The problem is that no pages may
      actually be drained and, thus, the unreclaimable logic never fails
      direct reclaim so the oom killer may be invoked.
      
      This apparently only manifested after wait_iff_congested() was
      introduced and the zone was full of anonymous memory that would not
      congest the backing store.  The page allocator would infinitely loop if
      there were no other tasks waiting to be scheduled and clear
      zone->pages_scanned because of drain_all_pages() as the result of this
      change before kswapd could scan enough pages to trigger the reclaim
      logic.  Additionally, with every loop of the page allocator and in the
      reclaim path, kswapd would be kicked and would end up running at 100%
      cpu.  In this scenario, current and kswapd are all running continuously
      with kswapd incrementing zone->pages_scanned and current clearing it.
      
      The problem is even more pronounced when current swaps some of its
      memory to swap cache and the reclaimable logic then considers all active
      anonymous memory in the all_unreclaimable logic, which requires a much
      higher zone->pages_scanned value for try_to_free_pages() to return zero
      that is never attainable in this scenario.
      
      Before wait_iff_congested(), the page allocator would incur an
      unconditional timeout and allow kswapd to elevate zone->pages_scanned to
      a level that the oom killer would be called the next time it loops.
      
      The fix is to only attempt to drain pcp pages if there is actually a
      quantity to be drained.  The unconditional clearing of
      zone->pages_scanned in free_pcppages_bulk() need not be changed since
      other callers already ensure that draining will occur.  This patch
      ensures that free_pcppages_bulk() will actually free memory before
      calling into it from drain_all_pages() so zone->pages_scanned is only
      cleared if appropriate.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
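The fix in the pages_scanned commit above amounts to guarding the bulk free, which unconditionally clears the scan counter, with a check that there is something to drain. A minimal sketch, assuming simplified hypothetical `struct zone_demo` and `struct pcp_demo` types rather than the kernel's structures:

```c
/* Hypothetical, simplified per-cpu page list and zone state. */
struct pcp_demo {
	int count;			/* pages held on the per-cpu list */
};

struct zone_demo {
	unsigned long pages_scanned;	/* scan progress seen by reclaim */
	int free_pages;			/* pages in the buddy allocator */
};

/* Bulk free: moves pages to the zone and always clears pages_scanned. */
static void free_bulk_demo(struct zone_demo *z, struct pcp_demo *p, int nr)
{
	z->pages_scanned = 0;
	z->free_pages += nr;
	p->count -= nr;
}

/* Drain only when the list is non-empty, so an empty drain is a no-op. */
static void drain_demo(struct zone_demo *z, struct pcp_demo *p)
{
	if (p->count)
		free_bulk_demo(z, p, p->count);
}
```

With an empty per-cpu list, `drain_demo()` leaves `pages_scanned` alone, so repeated drains can no longer keep resetting the counter that the unreclaimable logic depends on.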
    • mm: fix deferred congestion timeout if preferred zone is not allowed · f33261d7
      David Rientjes committed
      Before 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone"), preferred_zone was only used for NUMA
      statistics, to determine the zoneidx from which to allocate given
      the type requested, and whether to utilize memory compaction.
      
      wait_iff_congested(), though, uses preferred_zone to determine if the
      congestion wait should be deferred because its dirty pages are backed by
      a congested bdi.  This incorrectly defers the timeout and busy loops in
      the page allocator with various cond_resched() calls if preferred_zone
      is not allowed in the current context, usually consuming 100% of a cpu.
      
      This patch ensures preferred_zone is an allowed zone in the fastpath
      depending on whether current is constrained by its cpuset or nodes in
      its mempolicy (when the nodemask passed is non-NULL).  This is correct
      since the fastpath allocation always passes ALLOC_CPUSET when trying to
      allocate memory.  In the slowpath, this patch resets preferred_zone to
      the first zone of the allowed type when the allocation is not
      constrained by current's cpuset, i.e.  it does not pass ALLOC_CPUSET.
      
      This patch also ensures preferred_zone is from the set of allowed nodes
      when called from within direct reclaim since allocations are always
      constrained by cpusets in this context (it is blockable).
      
      Both of these uses of cpuset_current_mems_allowed are protected by
      get_mems_allowed().
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/pgtable-generic.c: fix CONFIG_SWAP=n build · f95ba941
      Andrew Morton committed
      mips (and sparc32):
      
        In file included from arch/mips/include/asm/tlb.h:21,
                         from mm/pgtable-generic.c:9:
        include/asm-generic/tlb.h: In function `tlb_flush_mmu':
        include/asm-generic/tlb.h:76: error: implicit declaration of function `release_pages'
        include/asm-generic/tlb.h: In function `tlb_remove_page':
        include/asm-generic/tlb.h:105: error: implicit declaration of function `page_cache_release'
      
      free_pages_and_swap_cache() and free_page_and_swap_cache() are macros
      which call release_pages() and page_cache_release().  The obvious fix is
      to include pagemap.h in swap.h, where those macros are defined.  But that
      breaks sparc for weird reasons.
      
      So fix it within mm/pgtable-generic.c instead.
      Reported-by: Yoichi Yuasa <yuasa@linux-mips.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: Sergei Shtylyov <sshtylyov@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 21 Jan, 2011 10 commits