1. 16 1月, 2016 1 次提交
  2. 23 10月, 2015 1 次提交
  3. 28 8月, 2015 1 次提交
    • D
      mm: ZONE_DEVICE for "device memory" · 033fbae9
      Dan Williams 提交于
      While pmem is usable as a block device or via DAX mappings to userspace
      there are several usage scenarios that can not target pmem due to its
      lack of struct page coverage. In preparation for "hot plugging" pmem
      into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
      separately from the ones that are subject to standard page allocations.
      Importantly "device memory" can be removed at will by userspace
      unbinding the driver of the device.
      
      Having a separate zone prevents allocation and otherwise marks these
      pages that are distinct from typical uniform memory.  Device memory has
      different lifetime and performance characteristics than RAM.  However,
      since we have run out of ZONES_SHIFT bits this functionality currently
      depends on sacrificing ZONE_DMA.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Jerome Glisse <j.glisse@gmail.com>
      [hch: various simplifications in the arch interface]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      033fbae9
  4. 15 4月, 2015 1 次提交
    • D
      mm, hotplug: fix concurrent memory hot-add deadlock · 30467e0b
      David Rientjes 提交于
      There's a deadlock when concurrently hot-adding memory through the probe
      interface and switching a memory block from offline to online.
      
      When hot-adding memory via the probe interface, add_memory() first takes
      mem_hotplug_begin() and then device_lock() is later taken when registering
      the newly initialized memory block.  This creates a lock dependency of (1)
      mem_hotplug.lock (2) dev->mutex.
      
      When switching a memory block from offline to online, dev->mutex is first
      grabbed in device_online() when the write(2) transitions an existing
      memory block from offline to online, and then online_pages() will take
      mem_hotplug_begin().
      
      This creates a lock inversion between mem_hotplug.lock and dev->mutex.
      Vitaly reports that this deadlock can happen when kworker handling a probe
      event races with systemd-udevd switching a memory block's state.
      
      This patch requires the state transition to take mem_hotplug_begin()
      before dev->mutex.  Hot-adding memory via the probe interface creates a
      memory block while holding mem_hotplug_begin(), there is no way to take
      dev->mutex first in this case.
      
      online_pages() and offline_pages() are only called when transitioning
      memory block state.  We now require that mem_hotplug_begin() is taken
      before calling them -- this requires exporting the mem_hotplug_begin() and
      mem_hotplug_done() to generic code.  In all hot-add and hot-remove cases,
      mem_hotplug_begin() is done prior to device_online().  This is all that is
      needed to avoid the deadlock.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Reported-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30467e0b
  5. 10 10月, 2014 1 次提交
    • Z
      memory-hotplug: add sysfs valid_zones attribute · ed2f2400
      Zhang Zhen 提交于
      Currently memory-hotplug has two limits:
      
      1. If the memory block is in ZONE_NORMAL, you can change it to
         ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
      
      2. If the memory block is in ZONE_MOVABLE, you can change it to
         ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
      
      With this patch, we can easy to know a memory block can be onlined to
      which zone, and don't need to know the above two limits.
      
      Updated the related Documentation.
      
      [akpm@linux-foundation.org: use conventional comment layout]
      [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
      [akpm@linux-foundation.org: remove unused local zone_prev]
      Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed2f2400
  6. 07 8月, 2014 2 次提交
    • W
      memory-hotplug: add zone_for_memory() for selecting zone for new memory · 63264400
      Wang Nan 提交于
      This series of patches fixes a problem when adding memory in bad manner.
      For example: for a x86_64 machine booted with "mem=400M" and with 2GiB
      memory installed, following commands cause problem:
      
        # echo 0x40000000 > /sys/devices/system/memory/probe
       [   28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
        # echo 0x48000000 > /sys/devices/system/memory/probe
       [   28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
        # echo online_movable > /sys/devices/system/memory/memory9/state
        # echo 0x50000000 > /sys/devices/system/memory/probe
       [   29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
        # echo 0x58000000 > /sys/devices/system/memory/probe
       [   29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
        # echo online_movable > /sys/devices/system/memory/memory11/state
        # echo online> /sys/devices/system/memory/memory8/state
        # echo online> /sys/devices/system/memory/memory10/state
        # echo offline> /sys/devices/system/memory/memory9/state
       [   30.558819] Offlined Pages 32768
        # free
                    total       used       free     shared    buffers     cached
       Mem:        780588 18014398509432020     830552          0          0      51180
       -/+ buffers/cache: 18014398509380840     881732
       Swap:            0          0          0
      
      This is because the above commands probe higher memory after online a
      section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
      for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE.
      
      After the second online_movable, the problem can be observed from
      zoneinfo:
      
        # cat /proc/zoneinfo
        ...
        Node 0, zone  Movable
          pages free     65491
                min      250
                low      312
                high     375
                scanned  0
                spanned  18446744073709518848
                present  65536
                managed  65536
        ...
      
      This series of patches solve the problem by checking ZONE_MOVABLE when
      choosing zone for new memory.  If new memory is inside or higher than
      ZONE_MOVABLE, makes it go there instead.
      
      After applying this series of patches, following are free and zoneinfo
      result (after offlining memory9):
      
        bash-4.2# free
                      total       used       free     shared    buffers     cached
         Mem:        780956      80112     700844          0          0      51180
         -/+ buffers/cache:      28932     752024
         Swap:            0          0          0
      
        bash-4.2# cat /proc/zoneinfo
      
        Node 0, zone      DMA
          pages free     3389
                min      14
                low      17
                high     21
                scanned  0
                spanned  4095
                present  3998
                managed  3977
            nr_free_pages 3389
        ...
          start_pfn:         1
          inactive_ratio:    1
        Node 0, zone    DMA32
          pages free     73724
                min      341
                low      426
                high     511
                scanned  0
                spanned  98304
                present  98304
                managed  92958
            nr_free_pages 73724
          ...
          start_pfn:         4096
          inactive_ratio:    1
        Node 0, zone   Normal
          pages free     32630
                min      120
                low      150
                high     180
                scanned  0
                spanned  32768
                present  32768
                managed  32768
            nr_free_pages 32630
        ...
          start_pfn:         262144
          inactive_ratio:    1
        Node 0, zone  Movable
          pages free     65476
                min      241
                low      301
                high     361
                scanned  0
                spanned  98304
                present  65536
                managed  65536
            nr_free_pages 65476
        ...
          start_pfn:         294912
          inactive_ratio:    1
      
      This patch (of 7):
      
      Introduce zone_for_memory() in arch independent code for
      arch_add_memory() use.
      
      Many arch_add_memory() function simply selects ZONE_HIGHMEM or
      ZONE_NORMAL and add new memory into it.  However, with the existance of
      ZONE_MOVABLE, the selection method should be carefully considered: if
      new, higher memory is added after ZONE_MOVABLE is setup, the default
      zone and ZONE_MOVABLE may overlap each other.
      
      should_add_memory_movable() checks the status of ZONE_MOVABLE.  If it
      has already contain memory, compare the address of new memory and
      movable memory.  If new memory is higher than movable, it should be
      added into ZONE_MOVABLE instead of default zone.
      Signed-off-by: NWang Nan <wangnan0@huawei.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: "Mel Gorman" <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63264400
    • T
      mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1 · 4f7c6b49
      Tang Chen 提交于
      In store_mem_state(), we have:
      
        ...
        334         else if (!strncmp(buf, "offline", min_t(int, count, 7)))
        335                 online_type = -1;
        ...
        355         case -1:
        356                 ret = device_offline(&mem->dev);
        357                 break;
        ...
      
      Here, "offline" is hard coded as -1.
      
      This patch does the following renaming:
      
       ONLINE_KEEP     ->  MMOP_ONLINE_KEEP
       ONLINE_KERNEL   ->  MMOP_ONLINE_KERNEL
       ONLINE_MOVABLE  ->  MMOP_ONLINE_MOVABLE
      
      and introduces MMOP_OFFLINE = -1 to avoid hard coding.
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: Hu Tao <hutao@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f7c6b49
  7. 05 6月, 2014 1 次提交
    • V
      mem-hotplug: implement get/put_online_mems · bfc8c901
      Vladimir Davydov 提交于
      kmem_cache_{create,destroy,shrink} need to get a stable value of
      cpu/node online mask, because they init/destroy/access per-cpu/node
      kmem_cache parts, which can be allocated or destroyed on cpu/mem
      hotplug.  To protect against cpu hotplug, these functions use
      {get,put}_online_cpus.  However, they do nothing to synchronize with
      memory hotplug - taking the slab_mutex does not eliminate the
      possibility of race as described in patch 2.
      
      What we need there is something like get_online_cpus, but for memory.
      We already have lock_memory_hotplug, which serves for the purpose, but
      it's a bit of a hammer right now, because it's backed by a mutex.  As a
      result, it imposes some limitations to locking order, which are not
      desirable, and can't be used just like get_online_cpus.  That's why in
      patch 1 I substitute it with get/put_online_mems, which work exactly
      like get/put_online_cpus except they block not cpu, but memory hotplug.
      
      [ v1 can be found at https://lkml.org/lkml/2014/4/6/68.  I NAK'ed it by
        myself, because it used an rw semaphore for get/put_online_mems,
        making them dead lock prune.  ]
      
      This patch (of 2):
      
      {un}lock_memory_hotplug, which is used to synchronize against memory
      hotplug, is currently backed by a mutex, which makes it a bit of a
      hammer - threads that only want to get a stable value of online nodes
      mask won't be able to proceed concurrently.  Also, it imposes some
      strong locking ordering rules on it, which narrows down the set of its
      usage scenarios.
      
      This patch introduces get/put_online_mems, which are the same as
      get/put_online_cpus, but for memory hotplug, i.e.  executing a code
      inside a get/put_online_mems section will guarantee a stable value of
      online nodes, present pages, etc.
      
      lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bfc8c901
  8. 13 11月, 2013 2 次提交
  9. 02 6月, 2013 3 次提交
  10. 12 5月, 2013 1 次提交
    • R
      ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes · e2ff3940
      Rafael J. Wysocki 提交于
      During ACPI memory hotplug configuration bind memory blocks residing
      in modules removable through the standard ACPI mechanism to struct
      acpi_device objects associated with ACPI namespace objects
      representing those modules.  Accordingly, unbind those memory blocks
      from the struct acpi_device objects when the memory modules in
      question are being removed.
      
      When "offline" operation for devices representing memory blocks is
      introduced, this will allow the ACPI core's device hot-remove code to
      use it to carry out remove_memory() for those memory blocks and check
      the results of that before it actually removes the modules holding
      them from the system.
      
      Since walk_memory_range() is used for accessing all memory blocks
      corresponding to a given ACPI namespace object, it is exported from
      memory_hotplug.c so that the code in acpi_memhotplug.c can use it.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: NVasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      e2ff3940
  11. 30 4月, 2013 1 次提交
  12. 24 2月, 2013 5 次提交
    • W
      memory-hotplug: export the function try_offline_node() · 90b30cdc
      Wen Congyang 提交于
      try_offline_node() will be needed in the tristate
      drivers/acpi/processor_driver.c.
      
      The node will be offlined when all memory/cpu on the node have been
      hotremoved.  So we need the function try_offline_node() in cpu-hotplug
      path.
      
      If the memory-hotplug is disabled, and cpu-hotplug is enabled
      
      1. no memory no the node
         we don't online the node, and cpu's node is the nearest node.
      
      2. the node contains some memory
         the node has been onlined, and cpu's node is still needed
         to migrate the sleep task on the cpu to the same node.
      
      So we do nothing in try_offline_node() in this case.
      
      [rientjes@google.com: export the function try_offline_node() fix]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90b30cdc
    • T
      memory-hotplug: remove sysfs file of node · 60a5a19e
      Tang Chen 提交于
      Introduce a new function try_offline_node() to remove sysfs file of node
      when all memory sections of this node are removed.  If some memory
      sections of this node are not removed, this function does nothing.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60a5a19e
    • Y
      memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap · 46723bfa
      Yasuaki Ishimatsu 提交于
      For removing memmap region of sparse-vmemmap which is allocated bootmem,
      memmap region of sparse-vmemmap needs to be registered by
      get_page_bootmem().  So the patch searches pages of virtual mapping and
      registers the pages by get_page_bootmem().
      
      NOTE: register_page_bootmem_memmap() is not implemented for ia64,
            ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
            and revert register_page_bootmem_info_node() when platform doesn't
            support it.
      
            It's implemented by adding a new Kconfig option named
            CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
            by memory-hotplug feature fully supported archs(currently only on
            x86_64).
      
            Since we have 2 config options called MEMORY_HOTPLUG and
            MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
            and codes in function register_page_bootmem_info_node() are only
            used for collecting infomation for hot-remove, so reside it under
            MEMORY_HOTREMOVE.
      
            Besides page_isolation.c selected by MEMORY_ISOLATION under
            MEMORY_HOTPLUG is also such case, move it too.
      
      [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
      [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
      [mhocko@suse.cz: remove the arch specific functions without any implementation]
      [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
      [rientjes@google.com: fix defined but not used warning]
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NWu Jianguo <wujianguo@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46723bfa
    • W
      memory-hotplug: introduce new arch_remove_memory() for removing page table · 24d335ca
      Wen Congyang 提交于
      For removing memory, we need to remove page tables.  But it depends on
      architecture.  So the patch introduce arch_remove_memory() for removing
      page table.  Now it only calls __remove_pages().
      
      Note: __remove_pages() for some archtecuture is not implemented
            (I don't know how to implement it for s390).
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      24d335ca
    • Y
      memory-hotplug: check whether all memory blocks are offlined or not when removing memory · 6677e3ea
      Yasuaki Ishimatsu 提交于
      We remove the memory like this:
      
       1. lock memory hotplug
       2. offline a memory block
       3. unlock memory hotplug
       4. repeat 1-3 to offline all memory blocks
       5. lock memory hotplug
       6. remove memory(TODO)
       7. unlock memory hotplug
      
      All memory blocks must be offlined before removing memory.  But we don't
      hold the lock in the whole operation.  So we should check whether all
      memory blocks are offlined before step6.  Otherwise, kernel maybe
      panicked.
      
      Offlining a memory block and removing a memory device can be two
      different operations.  Users can just offline some memory blocks without
      removing the memory device.  For this purpose, the kernel has held
      lock_memory_hotplug() in __offline_pages().  To reuse the code for
      memory hot-remove, we repeat step 1-3 to offline all the memory blocks,
      repeatedly lock and unlock memory hotplug, but not hold the memory
      hotplug lock in the whole operation.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6677e3ea
  13. 12 12月, 2012 1 次提交
    • L
      mm, memory-hotplug: dynamic configure movable memory and portion memory · 511c2aba
      Lai Jiangshan 提交于
      Add online_movable and online_kernel for logic memory hotplug.  This is
      the dynamic version of "movablecore" & "kernelcore".
      
      We have the same reason to introduce it as to introduce "movablecore" &
      "kernelcore".  It has the same motive as "movablecore" & "kernelcore", but
      it is dynamic/running-time:
      
      o We can configure memory as kernelcore or movablecore after boot.
      
        Userspace workload is increased, we need more hugepage, we can't use
        "online_movable" to add memory and allow the system use more
        THP(transparent-huge-page), vice-verse when kernel workload is increase.
      
        Also help for virtualization to dynamic configure host/guest's memory,
        to save/(reduce waste) memory.
      
        Memory capacity on Demand
      
      o When a new node is physically online after boot, we need to use
        "online_movable" or "online_kernel" to configure/portion it as we
        expected when we logic-online it.
      
        This configuration also helps for physically-memory-migrate.
      
      o all benefit as the same as existed "movablecore" & "kernelcore".
      
      o Preparing for movable-node, which is very important for power-saving,
        hardware partitioning and high-available-system(hardware fault
        management).
      
      (Note, we don't introduce movable-node here.)
      
      Action behavior:
      When a memoryblock/memorysection is onlined by "online_movable", the kernel
      will not have directly reference to the page of the memoryblock,
      thus we can remove that memory any time when needed.
      
      When it is online by "online_kernel", the kernel can use it.
      When it is online by "online", the zone type doesn't changed.
      
      Current constraints:
      Only the memoryblock which is adjacent to the ZONE_MOVABLE
      can be online from ZONE_NORMAL to ZONE_MOVABLE.
      
      [akpm@linux-foundation.org: use min_t, cleanups]
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      511c2aba
  14. 09 10月, 2012 2 次提交
    • W
      memory-hotplug: update memory block's state and notify userspace · e90bdb7f
      Wen Congyang 提交于
      remove_memory() will be called when hot removing a memory device.  But
      even if offlining memory, we cannot notice it.  So the patch updates the
      memory block's state and sends notification to userspace.
      
      Additionally, the memory device may contain more than one memory block.
      If the memory block has been offlined, __offline_pages() will fail.  So we
      should try to offline one memory block at a time.
      
      Thus remove_memory() also check each memory block's state.  So there is no
      need to check the memory block's state before calling remove_memory().
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e90bdb7f
    • W
      memory-hotplug: preparation to notify memory block's state at memory hot remove · a16cee10
      Wen Congyang 提交于
      remove_memory() is called in two cases:
      1. echo offline >/sys/devices/system/memory/memoryXX/state
      2. hot remove a memory device
      
      In the 1st case, the memory block's state is changed and the notification
      that memory block's state changed is sent to userland after calling
      remove_memory().  So user can notice memory block is changed.
      
      But in the 2nd case, the memory block's state is not changed and the
      notification is not also sent to userspcae even if calling
      remove_memory().  So user cannot notice memory block is changed.
      
      For adding the notification at memory hot remove, the patch just prepare
      as follows:
      1st case uses offline_pages() for offlining memory.
      2nd case uses remove_memory() for offlining memory and changing memory block's
          state and notifing the information.
      
      The patch does not implement notification to remove_memory().
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a16cee10
  15. 05 3月, 2012 1 次提交
    • P
      BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Paul Gortmaker 提交于
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define) then
      that header really should be including <linux/bug.h> and not just
      expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
  16. 26 7月, 2011 1 次提交
    • D
      mm: extend memory hotplug API to allow memory hotplug in virtual machines · 9d0ad8ca
      Daniel Kiper 提交于
      This patch contains online_page_callback and apropriate functions for
      registering/unregistering online page callbacks.  It allows to do some
      machine specific tasks during online page stage which is required to
      implement memory hotplug in virtual machines.  Currently this patch is
      required by latest memory hotplug support for Xen balloon driver patch
      which will be posted soon.
      
      Additionally, originial online_page() function was splited into
      following functions doing "atomic" operations:
      
        - __online_page_set_limits() - set new limits for memory management code,
        - __online_page_increment_counters() - increment totalram_pages and totalhigh_pages,
        - __online_page_free() - free page to allocator.
      
      It was done to:
        - not duplicate existing code,
        - ease hotplug code devolpment by usage of well defined interface,
        - avoid stupid bugs which are unavoidable when the same code
          (by design) is developed in many places.
      
      [akpm@linux-foundation.org: use explicit indirect-call syntax]
      Signed-off-by: NDaniel Kiper <dkiper@net-space.pl>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d0ad8ca
  17. 14 1月, 2011 1 次提交
  18. 11 1月, 2011 1 次提交
  19. 03 12月, 2010 1 次提交
  20. 27 10月, 2010 1 次提交
    • K
      memory hotplug: unify is_removable and offline detection code · 49ac8255
      KAMEZAWA Hiroyuki 提交于
      Now, sysfs interface of memory hotplug shows whether the section is
      removable or not.  But it checks only migrateype of pages and doesn't
      check details of cluster of pages.
      
      Next, memory hotplug's set_migratetype_isolate() has the same kind of
      check, too.
      
      This patch adds the function __count_unmovable_pages() and makes above 2
      checks to use the same logic.  Then, is_removable and hotremove code uses
      the same logic.  No changes in the hotremove logic itself.
      
      TODO: need to find a way to check RECLAMABLE. But, considering bit,
            calling shrink_slab() against a range before starting memory hotremove
            sounds better. If so, this patch's logic doesn't need to be changed.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49ac8255
  21. 25 5月, 2010 1 次提交
    • M
      cpu/mem hotplug: enable CPUs online before local memory online · cf23422b
      minskey guo 提交于
      Enable users to online CPUs even if the CPUs belongs to a numa node which
      doesn't have onlined local memory.
      
      The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either
      in system boot/init period, or at the time of local memory online.  For a
      numa node without onlined local memory, its zonelists are not initialized
      at present.  As a result, any memory allocation operations executed by
      CPUs within this node will fail.  In fact, an out-of-memory error is
      triggered when attempt to online CPUs before memory comes to online.
      
      This patch tries to create zonelists for such numa nodes, so that the
      memory allocation for this node can be fallback'ed to other nodes.
      
      [akpm@linux-foundation.org: remove unneeded export]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: minskey guo<chaohong.guo@intel.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf23422b
  22. 16 12月, 2009 1 次提交
  23. 23 9月, 2009 1 次提交
    • K
      walk system ram range · 908eedc6
      KAMEZAWA Hiroyuki 提交于
      Originally, walk_memory_resource() was introduced to traverse all memory
      of "System RAM" for detecting memory hotplug/unplug range.  For doing so,
      flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
      memory hotplug.
      
      But for using other purpose, /proc/kcore, this may includes some firmware
      area marked as IORESOURCE_BUSY | IORESOUCE_MEM.  This patch makes the
      check strict to find out busy "System RAM".
      
      Note: PPC64 keeps their own walk_memory_resouce(), which walk through
      ppc64's lmb informaton.  Because old kclist_add() is called per lmb, this
      patch makes no difference in behavior, finally.
      
      And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
      Because pfn_valid() just show "there is memmap or not* and cannot be used
      for "there is physical memory or not", this function is useful in generic
      to scan physical memory range.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Américo Wang <xiyou.wangcong@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      908eedc6
  24. 07 1月, 2009 1 次提交
    • G
      mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Gary Hade 提交于
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c04fc586
  25. 25 7月, 2008 2 次提交
    • B
      memory-hotplug: add sysfs removable attribute for hotplug memory remove · 5c755e9f
      Badari Pulavarty 提交于
      Memory may be hot-removed on a per-memory-block basis, particularly on
      POWER where the SPARSEMEM section size often matches the memory-block
      size.  A user-level agent must be able to identify which sections of
      memory are likely to be removable before attempting the potentially
      expensive operation.  This patch adds a file called "removable" to the
      memory directory in sysfs to help such an agent.  In this patch, a memory
      block is considered removable if;
      
      o It contains only MOVABLE pageblocks
      o It contains only pageblocks with free pages regardless of pageblock type
      
      On the other hand, a memory block starting with a PageReserved() page will
      never be considered removable.  Without this patch, the user-agent is
      forced to choose a memory block to remove randomly.
      
      Sample output of the sysfs files:
      
      ./memory/memory0/removable: 0
      ./memory/memory1/removable: 0
      ./memory/memory2/removable: 0
      ./memory/memory3/removable: 0
      ./memory/memory4/removable: 0
      ./memory/memory5/removable: 0
      ./memory/memory6/removable: 0
      ./memory/memory7/removable: 1
      ./memory/memory8/removable: 0
      ./memory/memory9/removable: 0
      ./memory/memory10/removable: 0
      ./memory/memory11/removable: 0
      ./memory/memory12/removable: 0
      ./memory/memory13/removable: 0
      ./memory/memory14/removable: 0
      ./memory/memory15/removable: 0
      ./memory/memory16/removable: 0
      ./memory/memory17/removable: 1
      ./memory/memory18/removable: 1
      ./memory/memory19/removable: 1
      ./memory/memory20/removable: 1
      ./memory/memory21/removable: 1
      ./memory/memory22/removable: 1
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c755e9f
    • Y
      memory hotplug: small fixes to bootmem freeing for memory hotremove · af370fb8
      Yasunori Goto 提交于
      - Change some naming
        * Magic -> types
        * MIX_INFO -> MIX_SECTION_INFO
        * Change definition of bootmem type from direct hex value
      
      - __free_pages_bootmem() becomes __meminit.
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af370fb8
  26. 09 6月, 2008 1 次提交
  27. 28 4月, 2008 2 次提交
    • Y
      memory hotplug: register section/node id to free · 04753278
      Yasunori Goto 提交于
      This patch set is to free pages which is allocated by bootmem for
      memory-hotremove.  Some structures of memory management are allocated by
      bootmem.  ex) memmap, etc.
      
      To remove memory physically, some of them must be freed according to
      circumstance.  This patch set makes basis to free those pages, and free
      memmaps.
      
      Basic my idea is using remain members of struct page to remember information
      of users of bootmem (section number or node id).  When the section is
      removing, kernel can confirm it.  By this information, some issues can be
      solved.
      
        1) When the memmap of removing section is allocated on other
           section by bootmem, it should/can be free.
        2) When the memmap of removing section is allocated on the
           same section, it shouldn't be freed. Because the section has to be
           logical memory offlined already and all pages must be isolated against
           page allocater. If it is freed, page allocator may use it which will
           be removed physically soon.
        3) When removing section has other section's memmap,
           kernel will be able to show easily which section should be removed
           before it for user. (Not implemented yet)
        4) When the above case 2), the page isolation will be able to check and skip
           memmap's page when logical memory offline (offline_pages()).
           Current page isolation code fails in this case because this page is
           just reserved page and it can't distinguish this pages can be
           removed or not. But, it will be able to do by this patch.
           (Not implemented yet.)
        5) The node information like pgdat has similar issues. But, this
           will be able to be solved too by this.
           (Not implemented yet, but, remembering node id in the pages.)
      
      Fortunately, current bootmem allocator just keeps PageReserved flags,
      and doesn't use any other members of page struct. The users of
      bootmem doesn't use them too.
      
      This patch:
      
      This is to register information which is node or section's id.  Kernel can
      distinguish which node/section uses the pages allcated by bootmem.  This is
      basis for hot-remove sections or nodes.
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      04753278
    • B
      hotplug memory remove: generic __remove_pages() support · ea01ea93
      Badari Pulavarty 提交于
      Generic helper function to remove section mappings and sysfs entries for the
      section of the memory we are removing.  offline_pages() correctly adjusted
      zone and marked the pages reserved.
      
      TODO: Yasunori Goto is working on patches to free up allocations from bootmem.
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea01ea93
  28. 17 10月, 2007 2 次提交