1. 18 9月, 2012 1 次提交
    • Q
      memory hotplug: fix section info double registration bug · f14851af
      qiuxishi 提交于
      There may be a bug when registering section info.  For example, on my
      Itanium platform, the pfn range of node0 includes the other nodes, so
      other nodes' section info will be double registered, and memmap's page
      count will equal to 3.
      
        node0: start_pfn=0x100,    spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00
        node1: start_pfn=0x80000,  spanned_pfn=0x80000,  present_pfn=0x80000, => 0x080000-0x100000
        node2: start_pfn=0x100000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x100000-0x180000
        node3: start_pfn=0x180000, spanned_pfn=0x80000,  present_pfn=0x80000, => 0x180000-0x200000
      
        free_all_bootmem_node()
      	register_page_bootmem_info_node()
      		register_page_bootmem_info_section()
      
      When hot remove memory, we can't free the memmap's page because
      page_count() is 2 after put_page_bootmem().
      
        sparse_remove_one_section()
      	free_section_usemap()
      		free_map_bootmem()
      			put_page_bootmem()
      
      [akpm@linux-foundation.org: add code comment]
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f14851af
  2. 01 8月, 2012 3 次提交
    • J
      mm/hotplug: free zone->pageset when a zone becomes empty · 340175b7
      Jiang Liu 提交于
      When a zone becomes empty after memory offlining, free zone->pageset.
      Otherwise it will cause memory leak when adding memory to the empty zone
      again because build_all_zonelists() will allocate zone->pageset for an
      empty zone.
      Signed-off-by: NJiang Liu <liuj97@gmail.com>
      Signed-off-by: NWei Wang <Bessel.Wang@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      340175b7
    • J
      mm/hotplug: correctly add new zone to all other nodes' zone lists · 08dff7b7
      Jiang Liu 提交于
      When online_pages() is called to add new memory to an empty zone, it
      rebuilds all zone lists by calling build_all_zonelists().  But there's a
      bug which prevents the new zone to be added to other nodes' zone lists.
      
      online_pages() {
      	build_all_zonelists()
      	.....
      	node_set_state(zone_to_nid(zone), N_HIGH_MEMORY)
      }
      
      Here the node of the zone is put into N_HIGH_MEMORY state after calling
      build_all_zonelists(), but build_all_zonelists() only adds zones from
      nodes in N_HIGH_MEMORY state to the fallback zone lists.
      build_all_zonelists()
      
          ->__build_all_zonelists()
      	->build_zonelists()
      	    ->find_next_best_node()
      		->for_each_node_state(n, N_HIGH_MEMORY)
      
      So memory in the new zone will never be used by other nodes, and it may
      cause strange behavor when system is under memory pressure.  So put node
      into N_HIGH_MEMORY state before calling build_all_zonelists().
      Signed-off-by: NJianguo Wu <wujianguo@huawei.com>
      Signed-off-by: NJiang Liu <liuj97@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08dff7b7
    • J
      mm/hotplug: correctly setup fallback zonelists when creating new pgdat · 9adb62a5
      Jiang Liu 提交于
      When hotadd_new_pgdat() is called to create new pgdat for a new node, a
      fallback zonelist should be created for the new node.  There's code to try
      to achieve that in hotadd_new_pgdat() as below:
      
      	/*
      	 * The node we allocated has no zone fallback lists. For avoiding
      	 * to access not-initialized zonelist, build here.
      	 */
      	mutex_lock(&zonelists_mutex);
      	build_all_zonelists(pgdat, NULL);
      	mutex_unlock(&zonelists_mutex);
      
      But it doesn't work as expected.  When hotadd_new_pgdat() is called, the
      new node is still in offline state because node_set_online(nid) hasn't
      been called yet.  And build_all_zonelists() only builds zonelists for
      online nodes as:
      
              for_each_online_node(nid) {
                      pg_data_t *pgdat = NODE_DATA(nid);
      
                      build_zonelists(pgdat);
                      build_zonelist_cache(pgdat);
              }
      
      Though we hope to create zonelist for the new pgdat, but it doesn't.  So
      add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
      the new pgdat too.
      Signed-off-by: NJiang Liu <liuj97@gmail.com>
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9adb62a5
  3. 12 7月, 2012 1 次提交
  4. 30 5月, 2012 1 次提交
  5. 21 5月, 2012 1 次提交
  6. 13 1月, 2012 1 次提交
    • M
      mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Mel Gorman 提交于
      This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
      mode that avoids writing back pages to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6bc32b8
  7. 31 10月, 2011 1 次提交
  8. 26 7月, 2011 1 次提交
    • D
      mm: extend memory hotplug API to allow memory hotplug in virtual machines · 9d0ad8ca
      Daniel Kiper 提交于
      This patch contains online_page_callback and apropriate functions for
      registering/unregistering online page callbacks.  It allows to do some
      machine specific tasks during online page stage which is required to
      implement memory hotplug in virtual machines.  Currently this patch is
      required by latest memory hotplug support for Xen balloon driver patch
      which will be posted soon.
      
      Additionally, originial online_page() function was splited into
      following functions doing "atomic" operations:
      
        - __online_page_set_limits() - set new limits for memory management code,
        - __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
        - __online_page_free() - free page to allocator.
      
      It was done to:
        - not duplicate existing code,
        - ease hotplug code devolpment by usage of well defined interface,
        - avoid stupid bugs which are unavoidable when the same code
          (by design) is developed in many places.
      
      [akpm@linux-foundation.org: use explicit indirect-call syntax]
      Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d0ad8ca
  9. 23 6月, 2011 2 次提交
  10. 16 6月, 2011 1 次提交
    • K
      mm/memory_hotplug.c: fix building of node hotplug zonelist · 959ecc48
      KAMEZAWA Hiroyuki 提交于
      During memory hotplug we refresh zonelists when we online a page in a new
      zone.  It means that the node's zonelist is not initialized until pages
      are onlined.  So for example, "nid" passed by MEM_GOING_ONLINE notifier
      will point to NODE_DATA(nid) which has no zone fallback list.  Moreover,
      if we hot-add cpu-only nodes, alloc_pages() will do no fallback.
      
      This patch makes a zonelist when a new pgdata is available.
      
      Note: in production, at fujitsu, memory should be onlined before cpu
            and our server didn't have any memory-less nodes and had no problems.
      
            But recent changes in MEM_GOING_ONLINE+page_cgroup
            will access not initialized zonelist of node.
            Anyway, there are memory-less node and we need some care.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      959ecc48
  11. 25 5月, 2011 4 次提交
  12. 15 4月, 2011 1 次提交
  13. 31 3月, 2011 1 次提交
  14. 14 1月, 2011 3 次提交
  15. 11 1月, 2011 1 次提交
  16. 03 12月, 2010 1 次提交
  17. 27 10月, 2010 6 次提交
  18. 19 10月, 2010 1 次提交
  19. 10 9月, 2010 1 次提交
  20. 25 5月, 2010 3 次提交
    • H
      mem-hotplug: fix potential race while building zonelist for new populated zone · 4eaf3f64
      Haicheng Li 提交于
      Add global mutex zonelists_mutex to fix the possible race:
      
           CPU0                                  CPU1                    CPU2
      (1) zone->present_pages += online_pages;
      (2)                                       build_all_zonelists();
      (3)                                                               alloc_page();
      (4)                                                               free_page();
      (5) build_all_zonelists();
      (6)   __build_all_zonelists();
      (7)     zone->pageset = alloc_percpu();
      
      In step (3,4), zone->pageset still points to boot_pageset, so bad
      things may happen if 2+ nodes are in this state. Even if only 1 node
      is accessing the boot_pageset, (3) may still consume too much memory
      to fail the memory allocations in step (7).
      
      Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
      since there is a new fresh memory block added in step(6).
      
      [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: NAndi Kleen <andi.kleen@intel.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4eaf3f64
    • H
      mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset · 1f522509
      Haicheng Li 提交于
      For each new populated zone of hotadded node, need to update its pagesets
      with dynamically allocated per_cpu_pageset struct for all possible CPUs:
      
          1) Detach zone->pageset from the shared boot_pageset
             at end of __build_all_zonelists().
      
          2) Use mutex to protect zone->pageset when it's still
             shared in onlined_pages()
      
      Otherwises, multiple zones of different nodes would share same boot strapping
      boot_pageset for same CPU, which will finally cause below kernel panic:
      
        ------------[ cut here ]------------
        kernel BUG at mm/page_alloc.c:1239!
        invalid opcode: 0000 [#1] SMP
        ...
        Call Trace:
         [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
         [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
         [<ffffffff81128407>] __page_cache_alloc+0x67/0x70
         [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
         [<ffffffff81132751>] ra_submit+0x21/0x30
         [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
         [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
         [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
         [<ffffffff81266cfa>] nfs_file_read+0xca/0x130
         [<ffffffff8117b20a>] do_sync_read+0xfa/0x140
         [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
         [<ffffffff8117c151>] sys_read+0x51/0x80
         [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
        RIP  [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
         RSP <ffff88000d1e78a8>
        ---[ end trace 4bda28328b9990db ]
      
      [akpm@linux-foundation.org: merge fix]
      Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: NAndi Kleen <andi.kleen@intel.com>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f522509
    • M
      cpu/mem hotplug: enable CPUs online before local memory online · cf23422b
      minskey guo 提交于
      Enable users to online CPUs even if the CPUs belongs to a numa node which
      doesn't have onlined local memory.
      
      The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
      in system boot/init period, or at the time of local memory online.  For a
      numa node without onlined local memory, its zonelists are not initialized
      at present.  As a result, any memory allocation operations executed by
      CPUs within this node will fail.  In fact, an out-of-memory error is
      triggered when attempt to online CPUs before memory comes to online.
      
      This patch tries to create zonelists for such numa nodes, so that the
      memory allocation for this node can be fallback'ed to other nodes.
      
      [akpm@linux-foundation.org: remove unneeded export]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: minskey guo<chaohong.guo@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf23422b
  21. 13 3月, 2010 1 次提交
    • W
      mm: introduce dump_page() and print symbolic flag names · 718a3821
      Wu Fengguang 提交于
      - introduce dump_page() to print the page info for debugging some error
        condition.
      
      - convert three mm users: bad_page(), print_bad_pte() and memory offline
        failure.
      
      - print an extra field: the symbolic names of page->flags
      
      Example dump_page() output:
      
      [  157.521694] page:ffffea0000a7cba8 count:2 mapcount:1 mapping:ffff88001c901791 index:0x147
      [  157.525570] page flags: 0x100000000100068(uptodate|lru|active|swapbacked)
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alex Chiang <achiang@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      718a3821
  22. 07 3月, 2010 1 次提交
  23. 16 12月, 2009 3 次提交
    • R
      mm: fix section mismatch in memory_hotplug.c · 23ce932a
      Rakib Mullick 提交于
      __free_pages_bootmem() is a __meminit function - which has been called
      from put_pages_bootmem thus causes a section mismatch warning.
      
       We were warned by the following warning:
      
        LD      mm/built-in.o
      WARNING: mm/built-in.o(.text+0x26b22): Section mismatch in reference
      from the function put_page_bootmem() to the function
      .meminit.text:__free_pages_bootmem()
      The function put_page_bootmem() references
      the function __meminit __free_pages_bootmem().
      This is often because put_page_bootmem lacks a __meminit
      annotation or the annotation of __free_pages_bootmem is wrong.
      Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23ce932a
    • A
      mm: memory_hotplug: make offline_pages() static · b4e655a4
      Andrew Morton 提交于
      It has no references outside memory_hotplug.c.
      
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4e655a4
    • H
      ksm: memory hotremove migration only · 62b61f61
      Hugh Dickins 提交于
      The previous patch enables page migration of ksm pages, but that soon gets
      into trouble: not surprising, since we're using the ksm page lock to lock
      operations on its stable_node, but page migration switches the page whose
      lock is to be used for that.  Another layer of locking would fix it, but
      do we need that yet?
      
      Do we actually need page migration of ksm pages?  Yes, memory hotremove
      needs to offline sections of memory: and since we stopped allocating ksm
      pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
      candidates for migration.
      
      But KSM is currently unconscious of NUMA issues, happily merging pages
      from different NUMA nodes: at present the rule must be, not to use
      MADV_MERGEABLE where you care about NUMA.  So no, NUMA page migration of
      ksm pages does not make sense yet.
      
      So, to complete support for ksm swapping we need to make hotremove safe.
      ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
      release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE.  But if mapped pages
      are freed before migration reaches them, stable_nodes may be left still
      pointing to struct pages which have been removed from the system: the
      stable_node needs to identify a page by pfn rather than page pointer, then
      it can safely prune them when MEM_OFFLINE.
      
      And make NUMA migration skip PageKsm pages where it skips PageReserved.
      But it's only when we reach unmap_and_move() that the page lock is taken
      and we can be sure that raised pagecount has prevented a PageAnon from
      being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
      page when offlining (has sufficient locking) but reject it otherwise.
      Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Izik Eidus <ieidus@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Wright <chrisw@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62b61f61