1. 24 2月, 2013 8 次提交
  2. 19 2月, 2013 1 次提交
    • L
      mm: fix pageblock bitmap allocation · 7c45512d
      Linus Torvalds 提交于
      Commit c060f943 ("mm: use aligned zone start for pfn_to_bitidx
      calculation") fixed out calculation of the index into the pageblock
      bitmap when a !SPARSEMEM zome was not aligned to pageblock_nr_pages.
      
      However, the _allocation_ of that bitmap had never taken this alignment
      requirement into accout, so depending on the exact size and alignment of
      the zone, the use of that index could then access past the allocation,
      resulting in some very subtle memory corruption.
      
      This was reported (and bisected) by Ingo Molnar: one of his random
      config builds would hang with certain very specific kernel command line
      options.
      
      In the meantime, commit c060f943 has been marked for stable, so this
      fix needs to be back-ported to the stable kernels that backported the
      commit to use the right alignment.
      Bisected-and-tested-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c45512d
  3. 13 2月, 2013 1 次提交
  4. 08 2月, 2013 1 次提交
  5. 12 1月, 2013 3 次提交
    • M
      mm: compaction: partially revert capture of suitable high-order page · 8fb74b9f
      Mel Gorman 提交于
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca ("mm: compaction: capture a
      suitable high-order page immediately when it is made available").
      Reported-and-tested-by: NEric Wong <normalperson@yhbt.net>
      Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8fb74b9f
    • L
      mm: use aligned zone start for pfn_to_bitidx calculation · c060f943
      Laura Abbott 提交于
      The current calculation in pfn_to_bitidx assumes that (pfn -
      zone->zone_start_pfn) >> pageblock_order will return the same bit for
      all pfn in a pageblock.  If zone_start_pfn is not aligned to
      pageblock_nr_pages, this may not always be correct.
      
      Consider the following with pageblock order = 10, zone start 2MB:
      
        pfn     | pfn - zone start | (pfn - zone start) >> page block order
        ----------------------------------------------------------------
        0x26000 | 0x25e00	   |  0x97
        0x26100 | 0x25f00	   |  0x97
        0x26200 | 0x26000	   |  0x98
        0x26300 | 0x26100	   |  0x98
      
      This means that calling {get,set}_pageblock_migratetype on a single page
      will not set the migratetype for the full block.  Fix this by rounding
      down zone_start_pfn when doing the bitidx calculation.
      
      For our use case, the effects of this bug were mostly tied to the fact
      that CMA allocations would either take a long time or fail to happen.
      Depending on the driver using CMA, this could result in anything from
      visual glitches to application failures.
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c060f943
    • M
      mm: compaction: Partially revert capture of suitable high-order page · 47ecfcb7
      Mel Gorman 提交于
      Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
      waiting for POLLIN on a local TCP socket.  It was easier to trigger if
      there was disk IO and dirty pages at the same time and he bisected it to
      commit 1fb3f8ca ("mm: compaction: capture a suitable high-order page
      immediately when it is made available").
      
      The intention of that patch was to improve high-order allocations under
      memory pressure after changes made to reclaim in 3.6 drastically hurt
      THP allocations but the approach was flawed.  For Eric, the problem was
      that page->pfmemalloc was not being cleared for captured pages leading
      to a poor interaction with swap-over-NFS support causing the packets to
      be dropped.  However, I identified a few more problems with the patch
      including the fact that it can increase contention on zone->lock in some
      cases which could result in async direct compaction being aborted early.
      
      In retrospect the capture patch took the wrong approach.  What it should
      have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
      was allocating for THP and avoided races that way.  While the patch was
      showing to improve allocation success rates at the time, the benefit is
      marginal given the relative complexity and it should be revisited from
      scratch in the context of the other reclaim-related changes that have
      taken place since the patch was first written and tested.  This patch
      partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
      high-order page immediately when it is made available".
      Reported-and-tested-by: NEric Wong <normalperson@yhbt.net>
      Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      47ecfcb7
  6. 05 1月, 2013 1 次提交
  7. 21 12月, 2012 1 次提交
  8. 19 12月, 2012 2 次提交
  9. 13 12月, 2012 5 次提交
  10. 12 12月, 2012 8 次提交
    • M
      mm: cma: remove watermark hacks · bc357f43
      Marek Szyprowski 提交于
      Commits 2139cbe6 ("cma: fix counting of isolated pages") and
      d95ea5d1 ("cma: fix watermark checking") introduced a reliable
      method of free page accounting when memory is being allocated from CMA
      regions, so the workaround introduced earlier by commit 49f223a9
      ("mm: trigger page reclaim in alloc_contig_range() to stabilise
      watermarks") can be finally removed.
      Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc357f43
    • M
      mm: cma: skip watermarks check for already isolated blocks in split_free_page() · 2e30abd1
      Marek Szyprowski 提交于
      Since commit 2139cbe6 ("cma: fix counting of isolated pages") free
      pages in isolated pageblocks are not accounted to NR_FREE_PAGES counters,
      so watermarks check is not required if one operates on a free page in
      isolated pageblock.
      Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e30abd1
    • R
      mm: introduce putback_movable_pages() · 5733c7d1
      Rafael Aquini 提交于
      The PATCH "mm: introduce compaction and migration for virtio ballooned pages"
      hacks around putback_lru_pages() in order to allow ballooned pages to be
      re-inserted on balloon page list as if a ballooned page was like a LRU page.
      
      As ballooned pages are not legitimate LRU pages, this patch introduces
      putback_movable_pages() to properly cope with cases where the isolated
      pageset contains ballooned pages and LRU pages, thus fixing the mentioned
      inelegant hack around putback_lru_pages().
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5733c7d1
    • W
      memory-hotplug: allocate zone's pcp before onlining pages · 6dcd73d7
      Wen Congyang 提交于
      We use __free_page() to put a page to buddy system when onlining pages.
      __free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
      should allocate zone's pcp before onlining pages, otherwise we will lose
      some free pages.
      
      [mhocko@suse.cz: make zone_pcp_reset independent of MEMORY_HOTREMOVE]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6dcd73d7
    • W
      memory-hotplug: fix NR_FREE_PAGES mismatch · 97d0da22
      Wen Congyang 提交于
      NR_FREE_PAGES will be wrong after offlining pages.  We add/dec
      NR_FREE_PAGES like this now:
      
      1. move all pages in buddy system to MIGRATE_ISOLATE, and dec NR_FREE_PAGES
      
      2. don't add NR_FREE_PAGES when it is freed and the migratetype is
         MIGRATE_ISOLATE
      
      3. dec NR_FREE_PAGES when offlining isolated pages.
      
      4. add NR_FREE_PAGES when undoing isolate pages.
      
      When we come to step 3, all pages are in MIGRATE_ISOLATE list, and
      NR_FREE_PAGES are right.  When we come to step4, all pages are not in
      buddy system, so we don't change NR_FREE_PAGES in this step, but we change
      NR_FREE_PAGES in step3.  So NR_FREE_PAGES is wrong after offlining pages.
      So there is no need to change NR_FREE_PAGES in step3.
      
      This patch also fixs a problem in step2: if the migratetype is
      MIGRATE_ISOLATE, we should not add NR_FRR_PAGES when we remove pages from
      pcppages.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Jianguo Wu <wujianguo106@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97d0da22
    • W
      memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Wen Congyang 提交于
      hwpoisoned may be set when we offline a page by the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not in free list. So we can't offline these pages again. So we
      should skip such page when offlining pages.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b023f468
    • K
      mm: use IS_ENABLED(CONFIG_NUMA) instead of NUMA_BUILD · e5adfffc
      Kirill A. Shutemov 提交于
      We don't need custom NUMA_BUILD anymore, since we have handy
      IS_ENABLED().
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5adfffc
    • R
      mm: show migration types in show_mem · 377e4f16
      Rabin Vincent 提交于
      This is useful to diagnose the reason for page allocation failure for
      cases where there appear to be several free pages.
      
      Example, with this alloc_pages(GFP_ATOMIC) failure:
      
       swapper/0: page allocation failure: order:0, mode:0x0
       ...
       Mem-info:
       Normal per-cpu:
       CPU    0: hi:   90, btch:  15 usd:  48
       CPU    1: hi:   90, btch:  15 usd:  21
       active_anon:0 inactive_anon:0 isolated_anon:0
        active_file:0 inactive_file:84 isolated_file:0
        unevictable:0 dirty:0 writeback:0 unstable:0
        free:4026 slab_reclaimable:75 slab_unreclaimable:484
        mapped:0 shmem:0 pagetables:0 bounce:0
       Normal free:16104kB min:2296kB low:2868kB high:3444kB active_anon:0kB
       inactive_anon:0kB active_file:0kB inactive_file:336kB unevictable:0kB
       isolated(anon):0kB isolated(file):0kB present:331776kB mlocked:0kB
       dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:300kB
       slab_unreclaimable:1936kB kernel_stack:328kB pagetables:0kB unstable:0kB
       bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
       lowmem_reserve[]: 0 0
      
      Before the patch, it's hard (for me, at least) to say why all these free
      chunks weren't considered for allocation:
      
       Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB
       1*1024kB 1*2048kB 3*4096kB = 16128kB
      
      After the patch, it's obvious that the reason is that all of these are
      in the MIGRATE_CMA (C) freelist:
      
       Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB (C) 1*512kB
       (C) 1*1024kB (C) 1*2048kB (C) 3*4096kB (C) = 16128kB
      Signed-off-by: NRabin Vincent <rabin.vincent@stericsson.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      377e4f16
  11. 11 12月, 2012 5 次提交
  12. 06 12月, 2012 1 次提交
  13. 01 12月, 2012 3 次提交
    • M
      mm: avoid waking kswapd for THP allocations when compaction is deferred or contended · 782fd304
      Mel Gorman 提交于
      With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
      based on failures" reverted, Zdenek Kabelac reported the following
      
        Hmm,  so it's just took longer to hit the problem and observe
        kswapd0 spinning on my CPU again - it's not as endless like before -
        but still it easily eats minutes - it helps to turn off  Firefox
        or TB  (memory hungry apps) so kswapd0 stops soon - and restart
        those apps again.  (And I still have like >1GB of cached memory)
      
        kswapd0         R  running task        0    30      2 0x00000000
        Call Trace:
          preempt_schedule+0x42/0x60
          _raw_spin_unlock+0x55/0x60
          put_super+0x31/0x40
          drop_super+0x22/0x30
          prune_super+0x149/0x1b0
          shrink_slab+0xba/0x510
      
      The sysrq+m indicates the system has no swap so it'll never reclaim
      anonymous pages as part of reclaim/compaction.  That is one part of the
      problem but not the root cause as file-backed pages could also be
      reclaimed.
      
      The likely underlying problem is that kswapd is woken up or kept awake
      for each THP allocation request in the page allocator slow path.
      
      If compaction fails for the requesting process then compaction will be
      deferred for a time and direct reclaim is avoided.  However, if there
      are a storm of THP requests that are simply rejected, it will still be
      the the case that kswapd is awake for a prolonged period of time as
      pgdat->kswapd_max_order is updated each time.  This is noticed by the
      main kswapd() loop and it will not call kswapd_try_to_sleep().  Instead
      it will loopp, shrinking a small number of pages and calling
      shrink_slab() on each iteration.
      
      This patch defers when kswapd gets woken up for THP allocations.  For
      !THP allocations, kswapd is always woken up.  For THP allocations,
      kswapd is woken up iff the process is willing to enter into direct
      reclaim/compaction.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      782fd304
    • A
      revert "Revert "mm: remove __GFP_NO_KSWAPD"" · a5091539
      Andrew Morton 提交于
      It apepars that this patch was innocent, and we hope that "mm: avoid
      waking kswapd for THP allocations when compaction is deferred or
      contended" will fix the final kswapd-spinning cause.
      
      Cc: Zdenek Kabelac <zkabelac@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5091539
    • M
      mm: compaction: fix return value of capture_free_page() · 58d00209
      Mel Gorman 提交于
      Commit ef6c5be6 ("fix incorrect NR_FREE_PAGES accounting (appears
      like memory leak)") fixes a NR_FREE_PAGE accounting leak but missed the
      return value which was also missed by this reviewer until today.
      
      That return value is used by compaction when adding pages to a list of
      isolated free pages and without this follow-up fix, there is a risk of
      free list corruption.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58d00209