1. 24 Feb, 2013 12 commits
  2. 19 Dec, 2012 1 commit
  3. 13 Dec, 2012 3 commits
    • mm: introduce new field "managed_pages" to struct zone · 9feedc9d
      Committed by Jiang Liu
      Currently a zone's present_pages is calculated as below, which is
      inaccurate and may cause trouble for memory hotplug.
      
      	spanned_pages - absent_pages - memmap_pages - dma_reserve.
      
      While fixing bugs caused by inaccurate zone->present_pages, we found that
      zone->present_pages has been abused.  The field zone->present_pages may
      have different meanings in different contexts:
      
      1) pages existing in a zone.
      2) pages managed by the buddy system.
      
      For more discussions about the issue, please refer to:
        http://lkml.org/lkml/2012/11/5/866
        https://patchwork.kernel.org/patch/1346751/
      
      This patchset introduces a new field named "managed_pages" to
      struct zone, which counts "pages managed by the buddy system".  It also reverts
      zone->present_pages to count "physical pages existing in a zone", which
      keeps it consistent with pgdat->node_present_pages.
      
      We will set an initial value for zone->managed_pages in function
      free_area_init_core() and will adjust it later if the initial value is
      inaccurate.
      
      For DMA/normal zones, the initial value is set to:
      
      	(spanned_pages - absent_pages - memmap_pages - dma_reserve)
      
      Later zone->managed_pages will be adjusted to the accurate value when the
      bootmem allocator frees all free pages to the buddy system in function
      free_all_bootmem_node() and free_all_bootmem().
      
      The bootmem allocator doesn't touch highmem pages, so highmem zones'
      managed_pages is set to the accurate value "spanned_pages - absent_pages"
      in function free_area_init_core() and won't be updated anymore.
      
      This patch also adds a new field "managed_pages" to /proc/zoneinfo
      and sysrq showmem.
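
The initial values described above can be sketched in userspace C. This is only an illustration: `struct zone_sketch` and `init_zone_counters()` are hypothetical names, `present_pages = spanned - absent` is an assumed reading of "physical pages existing in a zone", and clamping the low-memory result at zero is an added assumption rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct zone. */
struct zone_sketch {
	unsigned long spanned_pages;  /* total span, including holes */
	unsigned long present_pages;  /* physical pages existing in the zone */
	unsigned long managed_pages;  /* pages managed by the buddy system */
};

/*
 * Set the initial counters as the commit message describes them.
 * For DMA/normal zones the value is only an estimate that the bootmem
 * free path corrects later; for highmem zones it is already accurate.
 */
static void init_zone_counters(struct zone_sketch *z, unsigned long spanned,
			       unsigned long absent, unsigned long memmap,
			       unsigned long dma_reserve, bool highmem)
{
	assert(absent <= spanned);

	z->spanned_pages = spanned;
	z->present_pages = spanned - absent;    /* assumed definition */

	if (highmem) {
		/* bootmem never touches highmem, so nothing else is deducted */
		z->managed_pages = z->present_pages;
	} else {
		unsigned long overhead = memmap + dma_reserve;

		/* clamp at 0 (assumption) so the unsigned value cannot wrap */
		z->managed_pages = overhead < z->present_pages ?
				   z->present_pages - overhead : 0;
	}
}
```

With spanned = 1000, absent = 100, memmap = 20 and dma_reserve = 10, a low-memory zone starts at managed_pages = 870 and a highmem zone at 900.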
      
      [akpm@linux-foundation.org: small comment tweaks]
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
      Tested-by: Chris Clayton <chris2553@googlemail.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory_hotplug: allow online/offline memory to result movable node · 09285af7
      Committed by Lai Jiangshan
      Memory management can now handle movable nodes, that is, nodes which don't have
      any normal memory, so we can dynamically configure and add a movable node by:
      
      	onlining ZONE_MOVABLE memory on a previously offline node
      	offlining the last normal memory, which results in a node without normal memory
      
      Movable nodes are very important for power saving, hardware partitioning and
      high-availability systems (hardware fault management).
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hotplug: update nodemasks management · 6715ddf9
      Committed by Lai Jiangshan
      Update nodemasks management for N_MEMORY.
      
      [lliubbo@gmail.com: fix build]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Lin Feng <linfeng@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Bob Liu <lliubbo@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 12 Dec, 2012 7 commits
    • memory_hotplug: ensure every online node has NORMAL memory · 74d42d8f
      Committed by Lai Jiangshan
      The old memory hotplug code and the new online/movable support may leave an online node
      without any normal memory, but memory management misbehaves when
      a node is online but has no normal memory.  For example, a task bound to
      such a node may fail all kernel allocations, so that it cannot
      create tasks or other kernel objects.
      
      So we disallow nodes without normal memory here, and will allow them once we
      are prepared.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory_hotplug: handle empty zone when online_movable/online_kernel · e455a9b9
      Committed by Lai Jiangshan
      Allow online_movable/online_kernel to empty a zone or to move memory to an
      empty zone.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory-hotplug: dynamic configure movable memory and portion memory · 511c2aba
      Committed by Lai Jiangshan
      Add online_movable and online_kernel for logical memory hotplug.  This is
      the dynamic version of "movablecore" & "kernelcore".
      
      We introduce it for the same reasons as "movablecore" &
      "kernelcore".  It has the same motive as they do, but
      it is dynamic (run-time):
      
      o We can configure memory as kernelcore or movablecore after boot.
      
        When the userspace workload increases and we need more hugepages, we can use
        "online_movable" to add memory and allow the system to use more
        THP (transparent huge pages), and vice versa when the kernel workload increases.
      
        It also helps virtualization to dynamically configure host/guest memory,
        to save memory (reduce waste).
      
        Memory capacity on Demand
      
      o When a new node is physically onlined after boot, we need to use
        "online_movable" or "online_kernel" to configure/partition it as
        expected when we logically online it.
      
        This configuration also helps with physical memory migration.
      
      o All the benefits of the existing "movablecore" & "kernelcore".
      
      o Preparing for movable nodes, which are very important for power saving,
        hardware partitioning and high-availability systems (hardware fault
        management).
      
      (Note, we don't introduce movable-node here.)
      
      Action behavior:
      When a memory block/memory section is onlined by "online_movable", the kernel
      will not hold direct references to the pages of the memory block,
      thus we can remove that memory at any time when needed.
      
      When it is onlined by "online_kernel", the kernel can use it.
      When it is onlined by "online", the zone type doesn't change.
      
      Current constraints:
      Only the memoryblock which is adjacent to the ZONE_MOVABLE
      can be online from ZONE_NORMAL to ZONE_MOVABLE.
      
      [akpm@linux-foundation.org: use min_t, cleanups]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug.c: update start_pfn in zone and pg_data when spanned_pages == 0. · 712cd386
      Committed by Tang Chen
      If we hot-remove memory only and leave the cpus alive, the corresponding
      node will not be removed.  But the node_start_pfn and node_spanned_pages
      in pg_data will be reset to 0.  In this case, when we hot-add the memory
      back next time, the node_start_pfn will always be 0 because no pfn is less
      than 0.  After that, if we hot-remove the memory again, it will cause
      kernel panic in function find_biggest_section_pfn() when it tries to scan
      all the pfns.
      
      The zone will also have the same problem.
      
      This patch sets start_pfn to the start_pfn of the section being added when
      spanned_pages of the zone or pg_data is 0.
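
The rule described above can be sketched in userspace C. This is an illustration only: `struct span_sketch` and `grow_span()` are hypothetical names that stand in for the zone/pg_data span bookkeeping, not the kernel's actual functions.

```c
#include <assert.h>

/* Hypothetical stand-in for the (start_pfn, spanned_pages) pair. */
struct span_sketch {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

/*
 * Extend the span to cover [start_pfn, end_pfn).  An empty span takes
 * the new range as-is; otherwise a stale start_pfn of 0 would always
 * "win" the minimum comparison, which is the bug described above.
 */
static void grow_span(struct span_sketch *s, unsigned long start_pfn,
		      unsigned long end_pfn)
{
	unsigned long old_end;

	assert(start_pfn < end_pfn);

	if (s->spanned_pages == 0) {
		s->start_pfn = start_pfn;
		s->spanned_pages = end_pfn - start_pfn;
		return;
	}

	old_end = s->start_pfn + s->spanned_pages;
	if (start_pfn < s->start_pfn)
		s->start_pfn = start_pfn;
	if (end_pfn > old_end)
		old_end = end_pfn;
	s->spanned_pages = old_end - s->start_pfn;
}
```

Starting from an emptied span `{0, 0}`, adding pfns 0x80000-0x88000 yields start_pfn = 0x80000 rather than 0.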
      
        ---How to reproduce---
      
      1. hot-add a container with some memory and cpus;
      2. hot-remove the container's memory, and leave cpus there;
      3. hot-add this memory again;
      4. hot-remove them again;
      
      then, the kernel will panic.
      
        ---Call trace---
      
        BUG: unable to handle kernel paging request at 00000fff82a8cc38
        IP: [<ffffffff811c0d55>] find_biggest_section_pfn+0xe5/0x180
        ......
        Call Trace:
         [<ffffffff811c1124>] __remove_zone+0x184/0x1b0
         [<ffffffff811c11dc>] __remove_section+0x8c/0xb0
         [<ffffffff811c12e7>] __remove_pages+0xe7/0x120
         [<ffffffff81654f7c>] arch_remove_memory+0x2c/0x80
         [<ffffffff81655bb6>] remove_memory+0x56/0x90
         [<ffffffff813da0c8>] acpi_memory_device_remove_memory+0x48/0x73
         [<ffffffff813da55a>] acpi_memory_device_notify+0x153/0x274
         [<ffffffff813b6786>] acpi_ev_notify_dispatch+0x41/0x5f
         [<ffffffff813a3867>] acpi_os_execute_deferred+0x27/0x34
         [<ffffffff81090589>] process_one_work+0x219/0x680
         [<ffffffff810923be>] worker_thread+0x12e/0x320
         [<ffffffff81098396>] kthread+0xc6/0xd0
         [<ffffffff8167c7c4>] kernel_thread_helper+0x4/0x10
        ......
        ---[ end trace 96d845dbf33fee11 ]---
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory_hotplug: fix possible incorrect node_states[N_NORMAL_MEMORY] · d9713679
      Committed by Lai Jiangshan
      Currently memory_hotplug only manages the node_states[N_HIGH_MEMORY], it
      forgets to manage node_states[N_NORMAL_MEMORY].  This may cause
      node_states[N_NORMAL_MEMORY] to become incorrect.
      
      For example, suppose a node is empty before onlining, and we online memory which is
      in ZONE_NORMAL.  After onlining, node_states[N_HIGH_MEMORY] is correct,
      but node_states[N_NORMAL_MEMORY] is incorrect, because the online code doesn't set
      the newly onlined node in node_states[N_NORMAL_MEMORY].
      
      The same thing happens when offlining (the offline code doesn't clear
      the node from node_states[N_NORMAL_MEMORY] when needed).  Some memory
      management code depends on node_states[N_NORMAL_MEMORY], so we have to fix up
      node_states[N_NORMAL_MEMORY].
      
      We add node_states_check_changes_online() and
      node_states_check_changes_offline() to detect whether
      node_states[N_HIGH_MEMORY] and node_states[N_NORMAL_MEMORY] are changed
      while hotplugging.
      
      Also add @status_change_nid_normal to struct memory_notify, so that the
      memory hotplug callbacks know whether node_states[N_NORMAL_MEMORY] is
      changed.  (We could add a @flags and reuse @status_change_nid instead of
      introducing @status_change_nid_normal, but that would add much more
      complexity to the memory hotplug callbacks in every subsystem.  So introducing
      @status_change_nid_normal is better, and it doesn't change the semantics of
      @status_change_nid.)
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: allocate zone's pcp before onlining pages · 6dcd73d7
      Committed by Wen Congyang
      We use __free_page() to put a page to buddy system when onlining pages.
      __free_page() will store NR_FREE_PAGES in zone's pcp.vm_stat_diff, so we
      should allocate zone's pcp before onlining pages, otherwise we will lose
      some free pages.
      
      [mhocko@suse.cz: make zone_pcp_reset independent of MEMORY_HOTREMOVE]
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Committed by Wen Congyang
      The hwpoisoned flag may be set when we offline a page through the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not be on the free list, so we can't offline these pages again. We
      should therefore skip such pages when offlining pages.
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 Dec, 2012 1 commit
  6. 19 Nov, 2012 1 commit
  7. 17 Nov, 2012 1 commit
    • revert "mm: fix-up zone present pages" · 5576646f
      Committed by Andrew Morton
      Revert commit 7f1290f2 ("mm: fix-up zone present pages")
      
      That patch tried to fix an issue when calculating zone->present_pages,
      but it caused a regression on 32bit systems with HIGHMEM.  With that
      change, reset_zone_present_pages() resets all zone->present_pages to
      zero, and fixup_zone_present_pages() is called to recalculate
      zone->present_pages when the boot allocator frees core memory pages into
      buddy allocator.  Because highmem pages are not freed by bootmem
      allocator, all highmem zones' present_pages becomes zero.
      
      Various options for improving the situation are being discussed but for
      now, let's return to the 3.6 code.
      
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Tested-by: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 09 Oct, 2012 6 commits
    • memory-hotplug: suppress "Trying to free nonexistent resource... · d760afd4
      Committed by Yasuaki Ishimatsu
      memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
      
      When our x86 box calls __remove_pages(), release_mem_region() shows many
      warnings, and the x86 box cannot unregister iomem_resource.
      
        "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"
      
      release_mem_region() has been changed to be called for each
      PAGES_PER_SECTION chunk by commit de7f0cba ("memory hotplug: release
      memory regions in PAGES_PER_SECTION chunks"), because powerpc registers
      iomem_resource in each PAGES_PER_SECTION chunk.  But when I hot-add
      memory on an x86 box, iomem_resource is registered for each _CRS, not for each
      PAGES_PER_SECTION chunk.  So the x86 box cannot unregister iomem_resource.
      
      The patch fixes the problem.
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Nathan Fontenot <nfont@austin.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: update memory block's state and notify userspace · e90bdb7f
      Committed by Wen Congyang
      remove_memory() will be called when hot-removing a memory device.  But
      even when memory is offlined this way, userspace cannot notice it.  So the patch updates the
      memory block's state and sends a notification to userspace.
      
      Additionally, the memory device may contain more than one memory block.
      If a memory block has already been offlined, __offline_pages() will fail.  So we
      should try to offline one memory block at a time.
      
      Thus remove_memory() also checks each memory block's state, so there is no
      need to check the memory block's state before calling remove_memory().
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: preparation to notify memory block's state at memory hot remove · a16cee10
      Committed by Wen Congyang
      remove_memory() is called in two cases:
      1. echo offline >/sys/devices/system/memory/memoryXX/state
      2. hot remove a memory device
      
      In the 1st case, the memory block's state is changed and the notification
      that the memory block's state changed is sent to userland after calling
      remove_memory().  So the user can notice that the memory block has changed.
      
      But in the 2nd case, the memory block's state is not changed and the
      notification is not sent to userspace either, even though
      remove_memory() is called.  So the user cannot notice that the memory block has changed.
      
      To add the notification at memory hot remove, the patch just prepares
      as follows:
      1st case uses offline_pages() for offlining memory.
      2nd case uses remove_memory() for offlining memory, changing the memory block's
          state and sending the notification.
      
      The patch does not yet add the notification to remove_memory().
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix-up zone present pages · 7f1290f2
      Committed by Jianguo Wu
      I think zone->present_pages indicates the pages that the buddy system can manage,
      it should be:
      
      	zone->present_pages = spanned pages - absent pages - bootmem pages,
      
      but is now:
      	zone->present_pages = spanned pages - absent pages - memmap pages.
      
      spanned pages: total size, including holes.
      absent pages: holes.
      bootmem pages: pages used in system boot, managed by bootmem allocator.
      memmap pages: pages used by page structs.
      
      This may cause zone->present_pages to be less than it should be.  For example,
      if numa node 1 has ZONE_NORMAL and ZONE_MOVABLE, its memmap and other
      bootmem will be allocated from ZONE_MOVABLE, so ZONE_NORMAL's
      present_pages should be spanned pages - absent pages, but now the
      memmap pages are also subtracted (free_area_init_core()), although they are actually allocated from
      ZONE_MOVABLE.  When offlining all memory of a zone, this causes
      zone->present_pages to drop below 0; because present_pages is of type unsigned long,
      it actually becomes a very large integer.  That indirectly makes
      zone->watermark[WMARK_MIN] a large
      integer (setup_per_zone_wmarks()), which then makes totalreserve_pages a
      large integer (calculate_totalreserve_pages()), and finally causes memory
      allocation failures when forking a process (__vm_enough_memory()).
      
      [root@localhost ~]# dmesg
      -bash: fork: Cannot allocate memory
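
The unsigned wraparound behind this failure can be reproduced in a few lines of userspace C. The function names are hypothetical, and the clamped variant is only shown for contrast; it is an assumption of this sketch, not the fix the patch applies.

```c
#include <assert.h>
#include <limits.h>

/* Naive accounting: plain unsigned subtraction, which wraps on underflow. */
static unsigned long offline_naive(unsigned long present_pages,
				   unsigned long nr_offlined)
{
	return present_pages - nr_offlined;
}

/* Contrast case (assumption, not the patch's fix): clamp the result at 0. */
static unsigned long offline_clamped(unsigned long present_pages,
				     unsigned long nr_offlined)
{
	return nr_offlined > present_pages ? 0 : present_pages - nr_offlined;
}

/* Self-check: offlining 1000 pages from a zone that only counts 900. */
static void present_pages_demo(void)
{
	/* 900 - 1000 wraps to a value just below ULONG_MAX */
	assert(offline_naive(900, 1000) == ULONG_MAX - 99);
	assert(offline_clamped(900, 1000) == 0);
}
```

Any watermark derived from the wrapped value inherits its size, which matches the chain of failures the message describes.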
      
      I think the bug described in
      
        http://marc.info/?l=linux-mm&m=134502182714186&w=2
      
      is also caused by wrong zone present pages.
      
      This patch intends to fix up zone->present_pages when memory is freed to
      the buddy system on x86_64 and IA64 platforms.
      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Reported-by: Petr Tesarik <ptesarik@suse.cz>
      Tested-by: Petr Tesarik <ptesarik@suse.cz>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: don't replace lowmem pages with highmem · 74c08f98
      Committed by Minchan Kim
      The changelog for commit 6a6dccba ("mm: cma: don't replace lowmem
      pages with highmem") mentioned that lowmem pages can be replaced by
      highmem pages during CMA migration.  6a6dccba fixed that issue.
      
      Quote from that changelog:
      
      :   The filesystem layer expects pages in the block device's mapping to not
      :   be in highmem (the mapping's gfp mask is set in bdget()), but CMA can
      :   currently replace lowmem pages with highmem pages, leading to crashes in
      :   filesystem code such as the one below:
      :
      :     Unable to handle kernel NULL pointer dereference at virtual address 00000400
      :     pgd = c0c98000
      :     [00000400] *pgd=00c91831, *pte=00000000, *ppte=00000000
      :     Internal error: Oops: 817 [#1] PREEMPT SMP ARM
      :     CPU: 0    Not tainted  (3.5.0-rc5+ #80)
      :     PC is at __memzero+0x24/0x80
      :     ...
      :     Process fsstress (pid: 323, stack limit = 0xc0cbc2f0)
      :     Backtrace:
      :     [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98)
      :     [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc)
      :      r4:c15337f0
      :     [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98)
      :     [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac)
      :      r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0
      :     [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24)
      :      r6:beccdcf0 r5:00074000 r4:beccdbbc
      :     [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
      
      Memory-hotplug has same problem as CMA has so the same fix can be applied
      to memory-hotplug as well.
      
      Fix it by reusing.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory-hotplug: build zonelists when offlining pages · 1e8537ba
      Committed by Xishi Qiu
      online_pages() does build_all_zonelists() and zone_pcp_update(), I think
      offline_pages() should do it too.
      
      When the zone has no memory to allocate, remove it from other nodes'
      zonelists.  zone_batchsize() depends on zone's present pages, if zone's
      present pages are changed, zone's pcp should be updated.
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 18 Sep, 2012 1 commit
    • memory hotplug: fix section info double registration bug · f14851af
      Committed by qiuxishi
      There may be a bug when registering section info.  For example, on my
      Itanium platform, the pfn range of node0 includes the other nodes, so
      the other nodes' section info will be registered twice, and the memmap page's
      count will be 3.
      
        node0: start_pfn=0x100,    spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
        node1: start_pfn=0x80000,  spanned_pfn=0x80000,  present_pfn=0x80000, => 0x080000-0x100000
        node2: start_pfn=0x100000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x100000-0x180000
        node3: start_pfn=0x180000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x180000-0x200000
      
        free_all_bootmem_node()
      	register_page_bootmem_info_node()
      		register_page_bootmem_info_section()
      
      When hot-removing memory, we can't free the memmap's page because
      page_count() is 2 after put_page_bootmem().
      
        sparse_remove_one_section()
      	free_section_usemap()
      		free_map_bootmem()
      			put_page_bootmem()
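
The reference counting behind this can be modelled in userspace C. This is a simplified sketch under stated assumptions: `struct page_sketch`, `register_section()` and `put_section()` are hypothetical names, and the model assumes a bootmem page starts with a count of 1 and is released only when a put brings the count back down to 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memmap page's reference count. */
struct page_sketch {
	int count;
};

/* Models one registration of the section info: takes a reference. */
static void register_section(struct page_sketch *p)
{
	p->count++;
}

/*
 * Models the put on hot-remove: drops one reference and reports
 * whether the page may be freed (assumed: only when count returns to 1).
 */
static bool put_section(struct page_sketch *p)
{
	assert(p->count > 1);
	p->count--;
	return p->count == 1;
}
```

With one registration the count goes 1 -> 2 -> 1 and the page can be freed; with the double registration it goes 1 -> 3 -> 2 and the page is never released.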
      
      [akpm@linux-foundation.org: add code comment]
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 01 Aug, 2012 3 commits
    • mm/hotplug: free zone->pageset when a zone becomes empty · 340175b7
      Committed by Jiang Liu
      When a zone becomes empty after memory offlining, free zone->pageset.
      Otherwise it will cause memory leak when adding memory to the empty zone
      again because build_all_zonelists() will allocate zone->pageset for an
      empty zone.
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Signed-off-by: Wei Wang <Bessel.Wang@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/hotplug: correctly add new zone to all other nodes' zone lists · 08dff7b7
      Committed by Jiang Liu
      When online_pages() is called to add new memory to an empty zone, it
      rebuilds all zone lists by calling build_all_zonelists().  But there's a
      bug which prevents the new zone from being added to other nodes' zone lists.
      
      online_pages() {
      	build_all_zonelists()
      	.....
      	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
      }
      
      Here the node of the zone is put into N_HIGH_MEMORY state after calling
      build_all_zonelists(), but build_all_zonelists() only adds zones from
      nodes in N_HIGH_MEMORY state to the fallback zone lists.

      build_all_zonelists()
          ->__build_all_zonelists()
              ->build_zonelists()
                  ->find_next_best_node()
                      ->for_each_node_state(n, N_HIGH_MEMORY)
      
      So memory in the new zone will never be used by other nodes, and it may
      cause strange behavior when the system is under memory pressure.  So put
      the node into N_HIGH_MEMORY state before calling build_all_zonelists().
      Signed-off-by: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
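The ordering problem in this commit can be reproduced in a toy userspace model. This is a sketch under stated assumptions: `struct toy_system` and the `toy_*` functions are hypothetical stand-ins, with a plain boolean array playing the role of the N_HIGH_MEMORY node state. It only shows why setting the state after rebuilding the lists leaves the new node out of every fallback list.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_NODES 4

/* Toy model only: these are not the kernel's data structures. */
struct toy_system {
    bool has_memory[TOY_MAX_NODES];               /* stands in for N_HIGH_MEMORY */
    bool fallback[TOY_MAX_NODES][TOY_MAX_NODES];  /* fallback[a][b]: a may use b */
};

/* Like the rebuild in the message: only nodes flagged has_memory are added. */
static void toy_build_all_zonelists(struct toy_system *sys)
{
    for (int a = 0; a < TOY_MAX_NODES; a++)
        for (int b = 0; b < TOY_MAX_NODES; b++)
            sys->fallback[a][b] = sys->has_memory[b];
}

/* Buggy order: lists are rebuilt before the node is marked as having memory. */
static void toy_online_pages_buggy(struct toy_system *sys, int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    toy_build_all_zonelists(sys);
    sys->has_memory[nid] = true;
}

/* Fixed order: mark the node first, then rebuild the lists. */
static void toy_online_pages_fixed(struct toy_system *sys, int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    sys->has_memory[nid] = true;
    toy_build_all_zonelists(sys);
}
```

With node 0 already populated, onlining node 1 through the buggy path leaves `fallback[0][1]` false, while the fixed path sets it to true.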
    • J
      mm/hotplug: correctly setup fallback zonelists when creating new pgdat · 9adb62a5
      Jiang Liu committed
      When hotadd_new_pgdat() is called to create new pgdat for a new node, a
      fallback zonelist should be created for the new node.  There's code to try
      to achieve that in hotadd_new_pgdat() as below:
      
      	/*
      	 * The node we allocated has no zone fallback lists. For avoiding
      	 * to access not-initialized zonelist, build here.
      	 */
      	mutex_lock(&zonelists_mutex);
      	build_all_zonelists(pgdat, NULL);
      	mutex_unlock(&zonelists_mutex);
      
      But it doesn't work as expected.  When hotadd_new_pgdat() is called, the
      new node is still in offline state because node_set_online(nid) hasn't
      been called yet.  And build_all_zonelists() only builds zonelists for
      online nodes as:
      
              for_each_online_node(nid) {
                      pg_data_t *pgdat = NODE_DATA(nid);
      
                      build_zonelists(pgdat);
                      build_zonelist_cache(pgdat);
              }
      
      Although the intent is to create a zonelist for the new pgdat, none is
      built.  So add a new parameter "pgdat" to build_all_zonelists() to build
      a zonelist for the new pgdat too.
      Signed-off-by: Jiang Liu <liuj97@gmail.com>
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
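The effect of the extra "pgdat" parameter can be sketched in a toy model. The names below (`struct toy_pgdat`, `toy_build_all_zonelists()`) are hypothetical and the logic is an assumption about the shape of the fix, not the kernel's implementation: the loop still visits only online nodes, and the explicitly passed node is handled in addition, even though it is still offline.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: not the kernel's pg_data_t. */
struct toy_pgdat {
    bool online;
    bool zonelist_built;
};

/*
 * Build zonelists for every online node, as in the loop quoted in the
 * message. If 'extra' is non-NULL, also build one for that node; this is
 * the role the new "pgdat" parameter plays for a node being hot-added.
 */
static void toy_build_all_zonelists(struct toy_pgdat *nodes, size_t count,
                                    struct toy_pgdat *extra)
{
    assert(nodes != NULL);

    if (extra != NULL)
        extra->zonelist_built = true;

    for (size_t nid = 0; nid < count; nid++) {
        if (nodes[nid].online)
            nodes[nid].zonelist_built = true;
    }
}
```

Passing `NULL` reproduces the old behaviour, in which an offline node is skipped; passing the new node makes its zonelist available before the node is marked online.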
  11. 12 Jul, 2012 1 commit
  12. 30 May, 2012 1 commit
  13. 21 May, 2012 1 commit
  14. 13 Jan, 2012 1 commit
    • M
      mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Mel Gorman committed
      This patch adds a lightweight sync migration mode, MIGRATE_SYNC_LIGHT,
      that avoids writing back pages to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
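The mode mapping described in this commit can be written down as a small standalone sketch. The enum below reuses the three mode names from the message but is a local copy for illustration, and the helpers `toy_compaction_mode()`, `toy_hotplug_mode()` and `toy_may_writeback()` are hypothetical, not kernel functions. The writeback rule encodes only what the message states: the light mode avoids writing pages back, so in this model only full sync migration may do so.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the mode names from the commit message, for illustration. */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC
};

/* Async compaction maps to MIGRATE_ASYNC, sync compaction to the light mode. */
static enum migrate_mode toy_compaction_mode(bool sync)
{
    return sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

/* Other migrate_pages users such as memory hotplug use full sync migration. */
static enum migrate_mode toy_hotplug_mode(void)
{
    return MIGRATE_SYNC;
}

/* In this model, only full sync migration may write pages back to storage. */
static bool toy_may_writeback(enum migrate_mode mode)
{
    switch (mode) {
    case MIGRATE_ASYNC:
    case MIGRATE_SYNC_LIGHT:
        return false;
    case MIGRATE_SYNC:
        return true;
    }
    assert(!"unknown migrate mode");
    return false;
}
```

Under this mapping, sync compaction never reaches the writeback path, which is how the stall on dirty pages described above is avoided.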