1. 12 Feb, 2011 1 commit
    • memblock: don't adjust size in memblock_find_base() · e6d2e2b2
      Yinghai Lu authored
      While applying the patch to use memblock to find the aperture for 64-bit
      x86, Ingo found that a system with 1 GB of RAM and force_iommu fails with:
      
      > No AGP bridge found
      > Node 0: aperture @ 38000000 size 32 MB
      > Aperture pointing to e820 RAM. Ignoring.
      > Your BIOS doesn't leave a aperture memory hole
      > Please enable the IOMMU option in the BIOS setup
      > This costs you 64 MB of RAM
      > Cannot allocate aperture memory hole (0,65536K)
      
      The corresponding code:
      
      	addr = memblock_find_in_range(0, 1ULL<<32, aper_size, 512ULL<<20);
      	if (addr == MEMBLOCK_ERROR || addr + aper_size > 0xffffffff) {
      		printk(KERN_ERR
      			"Cannot allocate aperture memory hole (%lx,%uK)\n",
      				addr, aper_size>>10);
      		return 0;
      	}
      	memblock_x86_reserve_range(addr, addr + aper_size, "aperture64");
      
      fails because the memblock core code aligns the size to 512M, which can
      make the requested size far too big.
      
      So don't align the size in that case.
      
      __memblock_alloc_base(), the other caller, already aligns the size
      before calling that function.
      
      BTW, x86 does not use __memblock_alloc_base()...
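
      The effect of rounding the size up to the alignment can be sketched in
      a few lines. This is only an illustration under assumptions: the
      helpers align_up(), align_down() and find_in_region(), the NOT_FOUND
      value, and the single free region are hypothetical and are not the
      memblock implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NOT_FOUND UINT64_MAX	/* hypothetical stand-in for MEMBLOCK_ERROR */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/*
 * Hypothetical top-down search in one free region [start, end):
 * return the highest aligned base where size bytes fit, or NOT_FOUND.
 */
static uint64_t find_in_region(uint64_t start, uint64_t end,
			       uint64_t size, uint64_t align)
{
	uint64_t base;

	if (end - start < size)
		return NOT_FOUND;
	base = align_down(end - size, align);
	return base >= start ? base : NOT_FOUND;
}
```

      With an assumed 256 MB free region starting at 512 MB, a 64 MB request
      with 512 MB alignment fits at base 512 MB. If the size is first rounded
      up with align_up(), the request becomes 512 MB and no longer fits,
      which mirrors the failure described above.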
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 03 Feb, 2011 11 commits
  3. 02 Feb, 2011 1 commit
  4. 28 Jan, 2011 2 commits
  5. 26 Jan, 2011 9 commits
    • memcg: fix race at move_parent around compound_order() · 52dbb905
      KAMEZAWA Hiroyuki authored
      Fix up mem_cgroup_move_parent(), which uses compound_order()
      asynchronously.  compound_order() may return an unknown value there
      because we don't take the lock.  Use PageTransHuge() and HPAGE_SIZE
      instead.
      
      Also clean up mem_cgroup_move_parent():
       - remove unnecessary initialization of a local variable.
       - rename charge_size -> page_size
       - remove an unnecessary (wrong) comment.
       - add a comment about THP.
      
      Note:
       The current design takes compound_page_lock() in the caller of
       move_account().  This should be revisited when we implement direct
       move_task of hugepages without splitting.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: bugfix check mem_cgroup_disabled() at split fixup · 3d37c4a9
      KAMEZAWA Hiroyuki authored
      mem_cgroup_disabled() should be checked at splitting.  If memcg is
      disabled, no heavy work is necessary.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix account leak at failure of memsw accounting · 01c88e2d
      KAMEZAWA Hiroyuki authored
      Commit 4b534334 ("memcg: clean up try_charge main loop") removed the
      cancel of the memory charge in the case where the memory charge
      succeeds but the mem+swap charge fails.
      
      This leaks usage of memory.  Fix it.
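
      The leak pattern can be sketched with two toy counters. The struct and
      function names below are hypothetical stand-ins, not the kernel's
      res_counter API or the memcg charge path: when the second charge
      fails, the first one has to be cancelled before returning.

```c
#include <assert.h>

/* Simplified stand-in for a resource counter; not the kernel's res_counter. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Charge n units; fail without side effects if the limit would be exceeded. */
static int counter_charge(struct counter *c, unsigned long n)
{
	if (n > c->limit - c->usage)
		return -1;
	c->usage += n;
	return 0;
}

/* Undo a previous successful charge of n units. */
static void counter_uncharge(struct counter *c, unsigned long n)
{
	assert(c->usage >= n);
	c->usage -= n;
}

/* Charge memory first, then mem+swap; roll back memory if mem+swap fails. */
static int charge_mem_then_memsw(struct counter *mem, struct counter *memsw,
				 unsigned long n)
{
	if (counter_charge(mem, n))
		return -1;
	if (counter_charge(memsw, n)) {
		counter_uncharge(mem, n);	/* the cancel whose absence leaks usage */
		return -1;
	}
	return 0;
}
```

      Without the counter_uncharge() call, every failed mem+swap charge in
      this toy model would leave the memory counter permanently inflated.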
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>	[2.6.36+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: migration: clarify migrate_pages() comment · 28bd6578
      Minchan Kim authored
      Callers of migrate_pages() should call putback_lru_pages() to return
      isolated pages to the LRU or the free list.  The current comment is
      rather confusing: it says the caller always has to call it.
      
      It is clearer to point out that the caller has to call it if the
      return value of migrate_pages() isn't zero.
      Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: don't depend on HUGETLB_PAGE · 33a93877
      Andrea Arcangeli authored
      Commit 5d689240 ("thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE
      enabled") causes this warning during the configuration process:
      
        warning: (TRANSPARENT_HUGEPAGE) selects COMPACTION which has unmet
        direct dependencies (EXPERIMENTAL && HUGETLB_PAGE && MMU)
      
      COMPACTION doesn't depend on HUGETLB_PAGE, and it doesn't depend on THP
      either.  It is also useful for regular alloc_pages(order > 0), including
      the kernel stack allocation during fork (THREAD_ORDER = 1).  It's always
      better to enable COMPACTION.
      
      The warning should be an error because we would end up with MIGRATION
      not selected, and COMPACTION wouldn't work without migration (even though
      it seems to build with an inline migrate_pages() returning -ENOSYS).
      
      I'd also like to remove EXPERIMENTAL: compaction has been in the kernel
      for some releases (for full safety the default remains disabled, which I
      think is enough).
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Luca Tettamanti <kronos.it@gmail.com>
      Tested-by: Luca Tettamanti <kronos.it@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcontrol.c: fix uninitialized variable use in mem_cgroup_move_parent() · 8dba474f
      Jesper Juhl authored
      In mm/memcontrol.c::mem_cgroup_move_parent() there's a path that jumps
      to the 'put_back' label
      
        	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
        	if (ret || !parent)
        		goto put_back;
      
      where we'll
      
        	if (charge > PAGE_SIZE)
        		compound_unlock_irqrestore(page, flags);
      
      but, we have not assigned anything to 'flags' at this point, nor have we
      called 'compound_lock_irqsave()' (which is what sets 'flags').  The
      'put_back' label should be moved below the call to
      compound_unlock_irqrestore() as per this patch.
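
      The corrected control flow can be sketched as follows. The toy lock
      helpers and move_parent_sketch() are hypothetical stand-ins for
      compound_lock_irqsave(), compound_unlock_irqrestore() and the real
      function, and the sketch drops the 'charge > PAGE_SIZE' condition for
      brevity; it only shows the label placement.

```c
#include <assert.h>

static int lock_depth;	/* tracks whether the toy lock is held */

/* Toy stand-in for compound_lock_irqsave(): takes the lock, returns flags. */
static unsigned long toy_lock_irqsave(void)
{
	lock_depth++;
	return 0x200UL;	/* pretend these are the saved IRQ flags */
}

/* Toy stand-in for compound_unlock_irqrestore(). */
static void toy_unlock_irqrestore(unsigned long flags)
{
	assert(flags == 0x200UL);	/* flags must come from toy_lock_irqsave() */
	lock_depth--;
}

static int move_parent_sketch(int charge_ret)
{
	unsigned long flags;

	if (charge_ret)
		goto put_back;	/* the lock was never taken on this path */

	flags = toy_lock_irqsave();
	/* ... the accounting would be moved here ... */
	toy_unlock_irqrestore(flags);
put_back:
	/* The label sits below the unlock, so the early exit skips it. */
	return charge_ret;
}
```

      With the label above the unlock instead, the failure path would pass
      an unassigned 'flags' to the unlock helper and release a lock it
      never took.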
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator · 2ff754fa
      David Rientjes authored
      Commit 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone") uncovered a livelock in the page
      allocator that resulted in tasks infinitely looping trying to find
      memory and kswapd running at 100% cpu.
      
      The issue occurs because drain_all_pages() is called immediately
      following direct reclaim when no memory is freed and try_to_free_pages()
      returns non-zero because all zones in the zonelist do not have their
      all_unreclaimable flag set.
      
      When draining the per-cpu pagesets back to the buddy allocator for each
      zone, the zone->pages_scanned counter is cleared to avoid erroneously
      setting zone->all_unreclaimable later.  The problem is that no pages may
      actually be drained and, thus, the unreclaimable logic never fails
      direct reclaim so the oom killer may be invoked.
      
      This apparently only manifested after wait_iff_congested() was
      introduced and the zone was full of anonymous memory that would not
      congest the backing store.  The page allocator would infinitely loop if
      there were no other tasks waiting to be scheduled and clear
      zone->pages_scanned because of drain_all_pages() as the result of this
      change before kswapd could scan enough pages to trigger the reclaim
      logic.  Additionally, with every loop of the page allocator and in the
      reclaim path, kswapd would be kicked and would end up running at 100%
      cpu.  In this scenario, current and kswapd are all running continuously
      with kswapd incrementing zone->pages_scanned and current clearing it.
      
      The problem is even more pronounced when current swaps some of its
      memory to swap cache and the reclaimable logic then considers all active
      anonymous memory in the all_unreclaimable logic, which requires a much
      higher zone->pages_scanned value for try_to_free_pages() to return zero
      that is never attainable in this scenario.
      
      Before wait_iff_congested(), the page allocator would incur an
      unconditional timeout and allow kswapd to elevate zone->pages_scanned to
      a level that the oom killer would be called the next time it loops.
      
      The fix is to only attempt to drain pcp pages if there is actually a
      quantity to be drained.  The unconditional clearing of
      zone->pages_scanned in free_pcppages_bulk() need not be changed since
      other callers already ensure that draining will occur.  This patch
      ensures that free_pcppages_bulk() will actually free memory before
      calling into it from drain_all_pages() so zone->pages_scanned is only
      cleared if appropriate.
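
      The shape of the fix can be sketched with a toy model. The struct and
      the toy_ functions below are simplified, hypothetical stand-ins whose
      names only echo the ones in this message; they are not the page
      allocator code.

```c
#include <assert.h>

/* Toy zone; field names echo the commit message, not the real struct zone. */
struct toy_zone {
	unsigned long pages_scanned;
	unsigned long free_pages;
	unsigned int pcp_count;	/* pages sitting on the per-cpu list */
};

/*
 * Stand-in for free_pcppages_bulk(): moves count pages to the buddy
 * free lists and unconditionally clears pages_scanned.
 */
static void toy_free_pcppages_bulk(struct toy_zone *z, unsigned int count)
{
	assert(count <= z->pcp_count);
	z->pages_scanned = 0;
	z->pcp_count -= count;
	z->free_pages += count;
}

/* Drain path after the fix: only call in when there is something to free. */
static void toy_drain_pages(struct toy_zone *z)
{
	if (z->pcp_count)
		toy_free_pcppages_bulk(z, z->pcp_count);
}
```

      In this model an empty per-cpu list leaves pages_scanned untouched, so
      repeated drains can no longer keep resetting the counter that the
      all_unreclaimable logic depends on.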
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix deferred congestion timeout if preferred zone is not allowed · f33261d7
      David Rientjes authored
      Before 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone"), preferred_zone was only used for NUMA
      statistics, to determine the zoneidx from which to allocate given
      the type requested, and whether to utilize memory compaction.
      
      wait_iff_congested(), though, uses preferred_zone to determine if the
      congestion wait should be deferred because its dirty pages are backed by
      a congested bdi.  This incorrectly defers the timeout and busy loops in
      the page allocator with various cond_resched() calls if preferred_zone
      is not allowed in the current context, usually consuming 100% of a cpu.
      
      This patch ensures preferred_zone is an allowed zone in the fastpath
      depending on whether current is constrained by its cpuset or nodes in
      its mempolicy (when the nodemask passed is non-NULL).  This is correct
      since the fastpath allocation always passes ALLOC_CPUSET when trying to
      allocate memory.  In the slowpath, this patch resets preferred_zone to
      the first zone of the allowed type when the allocation is not
      constrained by current's cpuset, i.e.  it does not pass ALLOC_CPUSET.
      
      This patch also ensures preferred_zone is from the set of allowed nodes
      when called from within direct reclaim since allocations are always
      constrained by cpusets in this context (it is blockable).
      
      Both of these uses of cpuset_current_mems_allowed are protected by
      get_mems_allowed().
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/pgtable-generic.c: fix CONFIG_SWAP=n build · f95ba941
      Andrew Morton authored
      mips (and sparc32):
      
        In file included from arch/mips/include/asm/tlb.h:21,
                         from mm/pgtable-generic.c:9:
        include/asm-generic/tlb.h: In function `tlb_flush_mmu':
        include/asm-generic/tlb.h:76: error: implicit declaration of function `release_pages'
        include/asm-generic/tlb.h: In function `tlb_remove_page':
        include/asm-generic/tlb.h:105: error: implicit declaration of function `page_cache_release'
      
      free_pages_and_swap_cache() and free_page_and_swap_cache() are macros
      which call release_pages() and page_cache_release().  The obvious fix is
      to include pagemap.h in swap.h, where those macros are defined.  But that
      breaks sparc for weird reasons.
      
      So fix it within mm/pgtable-generic.c instead.
      Reported-by: Yoichi Yuasa <yuasa@linux-mips.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: Sergei Shtylyov <sshtylyov@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 21 Jan, 2011 10 commits
  7. 18 Jan, 2011 2 commits
    • Revert "mm: simplify code of swap.c" · 83896fb5
      Linus Torvalds authored
      This reverts commit d8505dee.
      
      Chris Mason ended up chasing down some page allocation errors and pages
      stuck waiting on the IO scheduler, and was able to narrow it down to two
      commits: commit 744ed144 ("mm: batch activate_page() to reduce lock
      contention") and d8505dee ("mm: simplify code of swap.c").
      
      This reverts the second one.
      Reported-and-debugged-by: Chris Mason <chris.mason@oracle.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "mm: batch activate_page() to reduce lock contention" · 7a608572
      Linus Torvalds authored
      This reverts commit 744ed144.
      
      Chris Mason ended up chasing down some page allocation errors and pages
      stuck waiting on the IO scheduler, and was able to narrow it down to two
      commits: commit 744ed144 ("mm: batch activate_page() to reduce lock
      contention") and d8505dee ("mm: simplify code of swap.c").
      
      This reverts the first of them.
      Reported-and-debugged-by: Chris Mason <chris.mason@oracle.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 17 Jan, 2011 1 commit
    • fix non-x86 build failure in pmdp_get_and_clear · b3697c02
      Andrea Arcangeli authored
      pmdp_get_and_clear/pmdp_clear_flush/pmdp_splitting_flush were trapped as
      BUG() and were defined only to reduce the risk of build issues on
      non-x86 archs and to be consistent with the generic pte methods previously
      defined in include/asm-generic/pgtable.h.
      
      But they are causing more trouble than they were supposed to solve, so
      it's simpler not to define them when THP is off.
      
      This also corrects the export of pmdp_splitting_flush, which is
      currently unused (x86 isn't using the generic implementation in
      mm/pgtable-generic.c and no other arch needs that [yet]).
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Sam Ravnborg <sam@ravnborg.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 15 Jan, 2011 1 commit
  10. 14 Jan, 2011 2 commits