1. 14 11月, 2014 1 次提交
    • J
      mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype · ad53f92e
      Joonsoo Kim 提交于
      Before describing bugs itself, I first explain definition of freepage.
      
       1. pages on buddy list are counted as freepage.
       2. pages on isolate migratetype buddy list are *not* counted as freepage.
       3. pages on cma buddy list are counted as CMA freepage, too.
      
      Now, I describe problems and related patch.
      
      Patch 1: There is race conditions on getting pageblock migratetype that
      it results in misplacement of freepages on buddy list, incorrect
      freepage count and un-availability of freepage.
      
      Patch 2: Freepages on pcp list could have stale cached information to
      determine migratetype of buddy list to go.  This causes misplacement of
      freepages on buddy list and incorrect freepage count.
      
      Patch 4: Merging between freepages on different migratetype of
      pageblocks will cause freepages accouting problem.  This patch fixes it.
      
      Without patchset [3], above problem doesn't happens on my CMA allocation
      test, because CMA reserved pages aren't used at all.  So there is no
      chance for above race.
      
      With patchset [3], I did simple CMA allocation test and get below
      result:
      
       - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
       - run kernel build (make -j16) on background
       - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
       - Result: more than 5000 freepage count are missed
      
      With patchset [3] and this patchset, I found that no freepage count are
      missed so that I conclude that problems are solved.
      
      On my simple memory offlining test, these problems also occur on that
      environment, too.
      
      This patch (of 4):
      
      There are two paths to reach core free function of buddy allocator,
      __free_one_page(), one is free_one_page()->__free_one_page() and the
      other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
      Each paths has race condition causing serious problems.  At first, this
      patch is focused on first type of freepath.  And then, following patch
      will solve the problem in second type of freepath.
      
      In the first type of freepath, we got migratetype of freeing page
      without holding the zone lock, so it could be racy.  There are two cases
      of this race.
      
       1. pages are added to isolate buddy list after restoring orignal
          migratetype
      
          CPU1                                   CPU2
      
          get migratetype => return MIGRATE_ISOLATE
          call free_one_page() with MIGRATE_ISOLATE
      
                                      grab the zone lock
                                      unisolate pageblock
                                      release the zone lock
      
          grab the zone lock
          call __free_one_page() with MIGRATE_ISOLATE
          freepage go into isolate buddy list,
          although pageblock is already unisolated
      
      This may cause two problems.  One is that we can't use this page anymore
      until next isolation attempt of this pageblock, because freepage is on
      isolate buddy list.  The other is that freepage accouting could be wrong
      due to merging between different buddy list.  Freepages on isolate buddy
      list aren't counted as freepage, but ones on normal buddy list are
      counted as freepage.  If merge happens, buddy freepage on normal buddy
      list is inevitably moved to isolate buddy list without any consideration
      of freepage accouting so it could be incorrect.
      
       2. pages are added to normal buddy list while pageblock is isolated.
          It is similar with above case.
      
      This also may cause two problems.  One is that we can't keep these
      freepages from being allocated.  Although this pageblock is isolated,
      freepage would be added to normal buddy list so that it could be
      allocated without any restriction.  And the other problem is same as
      case 1, that it, incorrect freepage accouting.
      
      This race condition would be prevented by checking migratetype again
      with holding the zone lock.  Because it is somewhat heavy operation and
      it isn't needed in common case, we want to avoid rechecking as much as
      possible.  So this patch introduce new variable, nr_isolate_pageblock in
      struct zone to check if there is isolated pageblock.  With this, we can
      avoid to re-check migratetype in common case and do it only if there is
      isolated pageblock or migratetype is MIGRATE_ISOLATE.  This solve above
      mentioned problems.
      
      Changes from v3:
      Add one more check in free_one_page() that checks whether migratetype is
      MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad53f92e
  2. 12 9月, 2013 1 次提交
    • N
      mm: memory-hotplug: enable memory hotplug to handle hugepage · c8721bbb
      Naoya Horiguchi 提交于
      Until now we can't offline memory blocks which contain hugepages because a
      hugepage is considered as an unmovable page.  But now with this patch
      series, a hugepage has become movable, so by using hugepage migration we
      can offline such memory blocks.
      
      What's different from other users of hugepage migration is that we need to
      decompose all the hugepages inside the target memory block into free buddy
      pages after hugepage migration, because otherwise free hugepages remaining
      in the memory block intervene the memory offlining.  For this reason we
      introduce new functions dissolve_free_huge_page() and
      dissolve_free_huge_pages().
      
      Other than that, what this patch does is straightforwardly to add hugepage
      migration code, that is, adding hugepage code to the functions which scan
      over pfn and collect hugepages to be migrated, and adding a hugepage
      allocation function to alloc_migrate_target().
      
      As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
      over them because it's larger than memory block.  So we now simply leave
      it to fail as it is.
      
      [yongjun_wei@trendmicro.com.cn: remove duplicated include]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8721bbb
  3. 20 8月, 2013 1 次提交
  4. 05 1月, 2013 1 次提交
  5. 12 12月, 2012 1 次提交
    • W
      memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Wen Congyang 提交于
      hwpoisoned may be set when we offline a page by the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not in free list. So we can't offline these pages again. So we
      should skip such page when offlining pages.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b023f468
  6. 09 10月, 2012 6 次提交
  7. 01 8月, 2012 2 次提交
    • M
      memory-hotplug: fix kswapd looping forever problem · 702d1a6e
      Minchan Kim 提交于
      When hotplug offlining happens on zone A, it starts to mark freed page as
      MIGRATE_ISOLATE type in buddy for preventing further allocation.
      (MIGRATE_ISOLATE is very irony type because it's apparently on buddy but
      we can't allocate them).
      
      When the memory shortage happens during hotplug offlining, current task
      starts to reclaim, then wake up kswapd.  Kswapd checks watermark, then go
      sleep because current zone_watermark_ok_safe doesn't consider
      MIGRATE_ISOLATE freed page count.  Current task continue to reclaim in
      direct reclaim path without kswapd's helping.  The problem is that
      zone->all_unreclaimable is set by only kswapd so that current task would
      be looping forever like below.
      
      __alloc_pages_slowpath
      restart:
      	wake_all_kswapd
      rebalance:
      	__alloc_pages_direct_reclaim
      		do_try_to_free_pages
      			if global_reclaim && !all_unreclaimable
      				return 1; /* It means we did did_some_progress */
      	skip __alloc_pages_may_oom
      	should_alloc_retry
      		goto rebalance;
      
      If we apply KOSAKI's patch[1] which doesn't depends on kswapd about
      setting zone->all_unreclaimable, we can solve this problem by killing some
      task in direct reclaim path.  But it doesn't wake up kswapd, still.  It
      could be a problem still if other subsystem needs GFP_ATOMIC request.  So
      kswapd should consider MIGRATE_ISOLATE when it calculate free pages BEFORE
      going sleep.
      
      This patch counts the number of MIGRATE_ISOLATE page block and
      zone_watermark_ok_safe will consider it if the system has such blocks
      (fortunately, it's very rare so no problem in POV overhead and kswapd is
      never hotpath).
      
      Copy/modify from Mel's quote
      "
      Ideal solution would be "allocating" the pageblock.
      It would keep the free space accounting as it is but historically,
      memory hotplug didn't allocate pages because it would be difficult to
      detect if a pageblock was isolated or if part of some balloon.
      Allocating just full pageblocks would work around this, However,
      it would play very badly with CMA.
      "
      
      [1] http://lkml.org/lkml/2012/6/14/74
      
      [akpm@linux-foundation.org: simplify nr_zone_isolate_freepages(), rework zone_watermark_ok_safe() comment, simplify set_pageblock_isolate() and restore_pageblock_isolate()]
      [akpm@linux-foundation.org: fix CONFIG_MEMORY_ISOLATION=n build]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Suggested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: NAaditya Kumar <aaditya.kumar.30@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      702d1a6e
    • M
      mm: factor out memory isolate functions · ee6f509c
      Minchan Kim 提交于
      mm/page_alloc.c has some memory isolation functions but they are used only
      when we enable CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}.  So let's make
      it configurable by new CONFIG_MEMORY_ISOLATION so that it can reduce
      binary size and we can check it simple by CONFIG_MEMORY_ISOLATION, not if
      defined CONFIG_{CMA|MEMORY_HOTPLUG|MEMORY_FAILURE}.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ee6f509c
  8. 21 5月, 2012 1 次提交
  9. 27 10月, 2010 1 次提交
  10. 07 11月, 2008 1 次提交
  11. 03 10月, 2008 1 次提交
    • G
      memory hotplug: missing zone->lock in test_pages_isolated() · 6c1b7f68
      Gerald Schaefer 提交于
      __test_page_isolated_in_pageblock() in mm/page_isolation.c has a comment
      saying that the caller must hold zone->lock. But the only caller of that
      function, test_pages_isolated(), does not hold zone->lock and the lock is
      also not acquired anywhere before. This patch adds the missing zone->lock
      to test_pages_isolated().
      
      We reproducibly run into BUG_ON(!PageBuddy(page)) in __offline_isolated_pages()
      during memory hotplug stress test, see trace below. This patch fixes that
      problem, it would be good if we could have it in 2.6.27.
      
      kernel BUG at /home/autobuild/BUILD/linux-2.6.26-20080909/mm/page_alloc.c:4561!
      illegal operation: 0001 [#1] PREEMPT SMP
      Modules linked in: dm_multipath sunrpc bonding qeth_l3 dm_mod qeth ccwgroup vmur
      CPU: 1 Not tainted 2.6.26-29.x.20080909-s390default #1
      Process memory_loop_all (pid: 10025, task: 2f444028, ksp: 2b10dd28)
      Krnl PSW : 040c0000 801727ea (__offline_isolated_pages+0x18e/0x1c4)
       R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:0 PM:0
      Krnl GPRS: 00000000 7e27fc00 00000000 7e27fc00
       00000000 00000400 00014000 7e27fc01
       00606f00 7e27fc00 00013fe0 2b10dd28
       00000005 80172662 801727b2 2b10dd28
      Krnl Code: 801727de: 5810900c l %r1,12(%r9)
       801727e2: a7f4ffb3 brc 15,80172748
       801727e6: a7f40001 brc 15,801727e8
       >801727ea: a7f4ffbc brc 15,80172762
       801727ee: a7f40001 brc 15,801727f0
       801727f2: a7f4ffaf brc 15,80172750
       801727f6: 0707 bcr 0,%r7
       801727f8: 0017 unknown
      Call Trace:
      ([<0000000000172772>] __offline_isolated_pages+0x116/0x1c4)
       [<00000000001953a2>] offline_isolated_pages_cb+0x22/0x34
       [<000000000013164c>] walk_memory_resource+0xcc/0x11c
       [<000000000019520e>] offline_pages+0x36a/0x498
       [<00000000001004d6>] remove_memory+0x36/0x44
       [<000000000028fb06>] memory_block_change_state+0x112/0x150
       [<000000000028ffb8>] store_mem_state+0x90/0xe4
       [<0000000000289c00>] sysdev_store+0x34/0x40
       [<00000000001ee048>] sysfs_write_file+0xd0/0x178
       [<000000000019b1a8>] vfs_write+0x74/0x118
       [<000000000019b9ae>] sys_write+0x46/0x7c
       [<000000000011160e>] sysc_do_restart+0x12/0x16
       [<0000000077f3e8ca>] 0x77f3e8ca
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6c1b7f68
  12. 02 9月, 2008 1 次提交
  13. 15 11月, 2007 1 次提交
  14. 17 10月, 2007 1 次提交
    • K
      memory unplug: page isolation · a5d76b54
      KAMEZAWA Hiroyuki 提交于
      Implement generic chunk-of-pages isolation method by using page grouping ops.
      
      This patch add MIGRATE_ISOLATE to MIGRATE_TYPES. By this
       - MIGRATE_TYPES increases.
       - bitmap for migratetype is enlarged.
      
      pages of MIGRATE_ISOLATE migratetype will not be allocated even if it is free.
      By this, you can isolated *freed* pages from users. How-to-free pages is not
      a purpose of this patch. You may use reclaim and migrate codes to free pages.
      
      If start_isolate_page_range(start,end) is called,
       - migratetype of the range turns to be MIGRATE_ISOLATE  if
         its type is MIGRATE_MOVABLE. (*) this check can be updated if other
         memory reclaiming works make progress.
       - MIGRATE_ISOLATE is not on migratetype fallback list.
       - All free pages and will-be-freed pages are isolated.
      To check all pages in the range are isolated or not,  use test_pages_isolated(),
      To cancel isolation, use undo_isolate_page_range().
      
      Changes V6 -> V7
       - removed unnecessary #ifdef
      
      There are HOLES_IN_ZONE handling codes...I'm glad if we can remove them..
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5d76b54