1. 05 6月, 2021 1 次提交
    • D
      drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 · 92813053
      David Hildenbrand 提交于
      offline_pages() properly checks for memory holes and bails out.
      However, we do a page_zone(pfn_to_page(start_pfn)) before calling
      offline_pages() when offlining a memory block.
      
      We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
      aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
      memory hole:
      
         kernel BUG at include/linux/mm.h:1383!
         Internal error: Oops - BUG: 0 [#1] SMP
         Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
         CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
         Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
         pc : memory_subsys_offline+0x1f8/0x250
         lr : memory_subsys_offline+0x1f8/0x250
         Call trace:
           memory_subsys_offline+0x1f8/0x250
           device_offline+0x154/0x1d8
           online_store+0xa4/0x118
           dev_attr_store+0x44/0x78
           sysfs_kf_write+0xe8/0x138
           kernfs_fop_write_iter+0x26c/0x3d0
           new_sync_write+0x2bc/0x4f8
           vfs_write+0x718/0xc88
           ksys_write+0xf8/0x1e0
           __arm64_sys_write+0x74/0xa8
           invoke_syscall.constprop.0+0x78/0x1e8
           do_el0_svc+0xe4/0x298
           el0_svc+0x20/0x30
           el0_sync_handler+0xb0/0xb8
           el0_sync+0x178/0x180
         Kernel panic - not syncing: Oops - BUG: Fatal exception
         SMP: stopping secondary CPUs
         Kernel Offset: disabled
         CPU features: 0x00000251,20000846
         Memory Limit: none
      
      If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
      memory that doesn't have any holes.  So call
      page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
      nr_vmemmap_pages is set and we actually adjust the present pages.
      
      Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
      Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Reported-by: NQian Cai (QUIC) <quic_qiancai@quicinc.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92813053
  2. 04 6月, 2021 1 次提交
  3. 06 5月, 2021 2 次提交
    • O
      mm,memory_hotplug: allocate memmap from the added memory range · a08a2ae3
      Oscar Salvador 提交于
      Physical memory hotadd has to allocate a memmap (struct page array) for
      the newly added memory section.  Currently, alloc_pages_node() is used
      for those allocations.
      
      This has some disadvantages:
       a) an existing memory is consumed for that purpose
          (eg: ~2MB per 128MB memory section on x86_64)
          This can even lead to extreme cases where system goes OOM because
          the physically hotplugged memory depletes the available memory before
          it is onlined.
       b) if the whole node is movable then we have off-node struct pages
          which has performance drawbacks.
       c) It might be there are no PMD_ALIGNED chunks so memmap array gets
          populated with base pages.
      
      This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
      
      Vmemap page tables can map arbitrary memory.  That means that we can
      reserve a part of the physically hotadded memory to back vmemmap page
      tables.  This implementation uses the beginning of the hotplugged memory
      for that purpose.
      
      There are some non-obviously things to consider though.
      
      Vmemmap pages are allocated/freed during the memory hotplug events
      (add_memory_resource(), try_remove_memory()) when the memory is
      added/removed.  This means that the reserved physical range is not
      online although it is used.  The most obvious side effect is that
      pfn_to_online_page() returns NULL for those pfns.  The current design
      expects that this should be OK as the hotplugged memory is considered a
      garbage until it is onlined.  For example hibernation wouldn't save the
      content of those vmmemmaps into the image so it wouldn't be restored on
      resume but this should be OK as there no real content to recover anyway
      while metadata is reachable from other data structures (e.g.  vmemmap
      page tables).
      
      The reserved space is therefore (de)initialized during the {on,off}line
      events (mhp_{de}init_memmap_on_memory).  That is done by extracting page
      allocator independent initialization from the regular onlining path.
      The primary reason to handle the reserved space outside of
      {on,off}line_pages is to make each initialization specific to the
      purpose rather than special case them in a single function.
      
      As per above, the functions that are introduced are:
      
       - mhp_init_memmap_on_memory:
         Initializes vmemmap pages by calling move_pfn_range_to_zone(), calls
         kasan_add_zero_shadow(), and onlines as many sections as vmemmap pages
         fully span.
      
       - mhp_deinit_memmap_on_memory:
         Offlines as many sections as vmemmap pages fully span, removes the
         range from zhe zone by remove_pfn_range_from_zone(), and calls
         kasan_remove_zero_shadow() for the range.
      
      The new function memory_block_online() calls mhp_init_memmap_on_memory()
      before doing the actual online_pages().  Should online_pages() fail, we
      clean up by calling mhp_deinit_memmap_on_memory().  Adjusting of
      present_pages is done at the end once we know that online_pages()
      succedeed.
      
      On offline, memory_block_offline() needs to unaccount vmemmap pages from
      present_pages() before calling offline_pages().  This is necessary because
      offline_pages() tears down some structures based on the fact whether the
      node or the zone become empty.  If offline_pages() fails, we account back
      vmemmap pages.  If it succeeds, we call mhp_deinit_memmap_on_memory().
      
      Hot-remove:
      
       We need to be careful when removing memory, as adding and
       removing memory needs to be done with the same granularity.
       To check that this assumption is not violated, we check the
       memory range we want to remove and if a) any memory block has
       vmemmap pages and b) the range spans more than a single memory
       block, we scream out loud and refuse to proceed.
      
       If all is good and the range was using memmap on memory (aka vmemmap pages),
       we construct an altmap structure so free_hugepage_table does the right
       thing and calls vmem_altmap_free instead of free_pagetable.
      
      Link: https://lkml.kernel.org/r/20210421102701.25051-5-osalvador@suse.deSigned-off-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a08a2ae3
    • O
      drivers/base/memory: introduce memory_block_{online,offline} · 8736cc2d
      Oscar Salvador 提交于
      Patch series "Allocate memmap from hotadded memory (per device)", v10.
      
      The primary goal of this patchset is to reduce memory overhead of the
      hot-added memory (at least for SPARSEMEM_VMEMMAP memory model).  The
      current way we use to populate memmap (struct page array) has two main
      drawbacks:
      
      a) it consumes an additional memory until the hotadded memory itself is
         onlined and
      
      b) memmap might end up on a different numa node which is especially
         true for movable_node configuration.
      
      c) due to fragmentation we might end up populating memmap with base
         pages
      
      One way to mitigate all these issues is to simply allocate memmap array
      (which is the largest memory footprint of the physical memory hotplug)
      from the hot-added memory itself.  SPARSEMEM_VMEMMAP memory model allows
      us to map any pfn range so the memory doesn't need to be online to be
      usable for the array.  See patch 4 for more details.  This feature is
      only usable when CONFIG_SPARSEMEM_VMEMMAP is set.
      
      [Overall design]:
      
      Implementation wise we reuse vmem_altmap infrastructure to override the
      default allocator used by vmemap_populate.  memory_block structure gains a
      new field called nr_vmemmap_pages, which accounts for the number of
      vmemmap pages used by that memory_block.  E.g: On x86_64, that is 512
      vmemmap pages on small memory bloks and 4096 on large memory blocks (1GB)
      
      We also introduce new two functions: memory_block_{online,offline}.  These
      functions take care of initializing/unitializing vmemmap pages prior to
      calling {online,offline}_pages, so the latter functions can remain totally
      untouched.
      
      More details can be found in the respective changelogs.
      
      This patch (of 8):
      
      This is a preparatory patch that introduces two new functions:
      memory_block_online() and memory_block_offline().
      
      For now, these functions will only call online_pages() and offline_pages()
      respectively, but they will be later in charge of preparing the vmemmap
      pages, carrying out the initialization and proper accounting of such
      pages.
      
      Since memory_block struct contains all the information, pass this struct
      down the chain till the end functions.
      
      Link: https://lkml.kernel.org/r/20210421102701.25051-1-osalvador@suse.de
      Link: https://lkml.kernel.org/r/20210421102701.25051-2-osalvador@suse.deSigned-off-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8736cc2d
  4. 27 2月, 2021 2 次提交
  5. 17 10月, 2020 1 次提交
    • D
      mm/memory_hotplug: prepare passing flags to add_memory() and friends · b6117199
      David Hildenbrand 提交于
      We soon want to pass flags, e.g., to mark added System RAM resources.
      mergeable.  Prepare for that.
      
      This patch is based on a similar patch by Oscar Salvador:
      
      https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.deSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Juergen Gross <jgross@suse.com> # Xen related part
      Reviewed-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Acked-by: NWei Liu <wei.liu@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Libor Pechacek <lpechacek@suse.cz>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Leonardo Bras <leobras.c@gmail.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julien Grall <julien@xen.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6117199
  6. 02 10月, 2020 3 次提交
    • J
      drivers core: Miscellaneous changes for sysfs_emit · 948b3edb
      Joe Perches 提交于
      Change additional instances that could use sysfs_emit and sysfs_emit_at
      that the coccinelle script could not convert.
      
      o macros creating show functions with ## concatenation
      o unbound sprintf uses with buf+len for start of output to sysfs_emit_at
      o returns with ?: tests and sprintf to sysfs_emit
      o sysfs output with struct class * not struct device * arguments
      
      Miscellanea:
      
      o remove unnecessary initializations around these changes
      o consistently use int len for return length of show functions
      o use octal permissions and not S_<FOO>
      o rename a few show function names so DEVICE_ATTR_<FOO> can be used
      o use DEVICE_ATTR_ADMIN_RO where appropriate
      o consistently use const char *output for strings
      o checkpatch/style neatening
      Signed-off-by: NJoe Perches <joe@perches.com>
      Link: https://lore.kernel.org/r/8bc24444fe2049a9b2de6127389b57edfdfe324d.1600285923.git.joe@perches.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      948b3edb
    • J
      drivers core: Remove strcat uses around sysfs_emit and neaten · 973c3911
      Joe Perches 提交于
      strcat is no longer necessary for sysfs_emit and sysfs_emit_at uses.
      
      Convert the strcat uses to sysfs_emit calls and neaten other block
      uses of direct returns to use an intermediate const char *.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Link: https://lore.kernel.org/r/5d606519698ce4c8f1203a2b35797d8254c6050a.1600285923.git.joe@perches.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      973c3911
    • J
      drivers core: Use sysfs_emit and sysfs_emit_at for show(device *...) functions · aa838896
      Joe Perches 提交于
      Convert the various sprintf fmaily calls in sysfs device show functions
      to sysfs_emit and sysfs_emit_at for PAGE_SIZE buffer safety.
      
      Done with:
      
      $ spatch -sp-file sysfs_emit_dev.cocci --in-place --max-width=80 .
      
      And cocci script:
      
      $ cat sysfs_emit_dev.cocci
      @@
      identifier d_show;
      identifier dev, attr, buf;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	return
      -	sprintf(buf,
      +	sysfs_emit(buf,
      	...);
      	...>
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	return
      -	snprintf(buf, PAGE_SIZE,
      +	sysfs_emit(buf,
      	...);
      	...>
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	return
      -	scnprintf(buf, PAGE_SIZE,
      +	sysfs_emit(buf,
      	...);
      	...>
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      expression chr;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	return
      -	strcpy(buf, chr);
      +	sysfs_emit(buf, chr);
      	...>
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      identifier len;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	len =
      -	sprintf(buf,
      +	sysfs_emit(buf,
      	...);
      	...>
      	return len;
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      identifier len;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	len =
      -	snprintf(buf, PAGE_SIZE,
      +	sysfs_emit(buf,
      	...);
      	...>
      	return len;
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      identifier len;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      	len =
      -	scnprintf(buf, PAGE_SIZE,
      +	sysfs_emit(buf,
      	...);
      	...>
      	return len;
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      identifier len;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	<...
      -	len += scnprintf(buf + len, PAGE_SIZE - len,
      +	len += sysfs_emit_at(buf, len,
      	...);
      	...>
      	return len;
      }
      
      @@
      identifier d_show;
      identifier dev, attr, buf;
      expression chr;
      @@
      
      ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
      {
      	...
      -	strcpy(buf, chr);
      -	return strlen(buf);
      +	return sysfs_emit(buf, chr);
      }
      Signed-off-by: NJoe Perches <joe@perches.com>
      Link: https://lore.kernel.org/r/3d033c33056d88bbe34d4ddb62afd05ee166ab9a.1600285923.git.joe@perches.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aa838896
  7. 10 7月, 2020 2 次提交
  8. 04 6月, 2020 1 次提交
    • S
      drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup · 4fb6eabf
      Scott Cheloha 提交于
      Searching for a particular memory block by id is an O(n) operation because
      each memory block's underlying device is kept in an unsorted linked list
      on the subsystem bus.
      
      We can cut the lookup cost to O(log n) if we cache each memory block
      in an xarray.  This time complexity improvement is significant on
      systems with many memory blocks.  For example:
      
      1. A 128GB POWER9 VM with 256MB memblocks has 512 blocks.  With this
         change  memory_dev_init() completes ~12ms faster and walk_memory_blocks()
         completes ~12ms faster.
      
      Before:
      [    0.005042] memory_dev_init: adding memory blocks
      [    0.021591] memory_dev_init: added memory blocks
      [    0.022699] walk_memory_blocks: walking memory blocks
      [    0.038730] walk_memory_blocks: walked memory blocks 0-511
      
      After:
      [    0.005057] memory_dev_init: adding memory blocks
      [    0.009415] memory_dev_init: added memory blocks
      [    0.010519] walk_memory_blocks: walking memory blocks
      [    0.014135] walk_memory_blocks: walked memory blocks 0-511
      
      2. A 256GB POWER9 LPAR with 256MB memblocks has 1024 blocks.  With
         this change memory_dev_init() completes ~88ms faster and
         walk_memory_blocks() completes ~87ms faster.
      
      Before:
      [    0.252246] memory_dev_init: adding memory blocks
      [    0.395469] memory_dev_init: added memory blocks
      [    0.409413] walk_memory_blocks: walking memory blocks
      [    0.433028] walk_memory_blocks: walked memory blocks 0-511
      [    0.433094] walk_memory_blocks: walking memory blocks
      [    0.500244] walk_memory_blocks: walked memory blocks 131072-131583
      
      After:
      [    0.245063] memory_dev_init: adding memory blocks
      [    0.299539] memory_dev_init: added memory blocks
      [    0.313609] walk_memory_blocks: walking memory blocks
      [    0.315287] walk_memory_blocks: walked memory blocks 0-511
      [    0.315349] walk_memory_blocks: walking memory blocks
      [    0.316988] walk_memory_blocks: walked memory blocks 131072-131583
      
      3. A 32TB POWER9 LPAR with 256MB memblocks has 131072 blocks.  With
         this change we complete memory_dev_init() ~37 minutes faster and
         walk_memory_blocks() at least ~30 minutes faster.  The exact timing
         for walk_memory_blocks() is  missing, though I observed that the
         soft lockups in walk_memory_blocks() disappeared with the change,
         suggesting that lower bound.
      
      Before:
      [   13.703907] memory_dev_init: adding blocks
      [ 2287.406099] memory_dev_init: added all blocks
      [ 2347.494986] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 2527.625378] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 2707.761977] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 2887.899975] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3068.028318] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3248.158764] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3428.287296] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3608.425357] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3788.554572] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 3968.695071] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      [ 4148.823970] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
      
      After:
      [   13.696898] memory_dev_init: adding blocks
      [   15.660035] memory_dev_init: added all blocks
      (the walk_memory_blocks traces disappear)
      
      There should be no significant negative impact for machines with few
      memory blocks.  A sparse xarray has a small footprint and an O(log n)
      lookup is negligibly slower than an O(n) lookup for only the smallest
      number of memory blocks.
      
      1. A 16GB x86 machine with 128MB memblocks has 132 blocks.  With this
         change memory_dev_init() completes ~300us faster and walk_memory_blocks()
         completes no faster or slower.  The improvement is pretty close to noise.
      
      Before:
      [    0.224752] memory_dev_init: adding memory blocks
      [    0.227116] memory_dev_init: added memory blocks
      [    0.227183] walk_memory_blocks: walking memory blocks
      [    0.227183] walk_memory_blocks: walked memory blocks 0-131
      
      After:
      [    0.224911] memory_dev_init: adding memory blocks
      [    0.226935] memory_dev_init: added memory blocks
      [    0.227089] walk_memory_blocks: walking memory blocks
      [    0.227089] walk_memory_blocks: walked memory blocks 0-131
      
      [david@redhat.com: document the locking]
        Link: http://lkml.kernel.org/r/bc21eec6-7251-4c91-2f57-9a0671f8d414@redhat.comSigned-off-by: NScott Cheloha <cheloha@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Hildenbrand <david@redhat.com>
      Acked-by: NNathan Lynch <nathanl@linux.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rick Lindsley <ricklind@linux.vnet.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Link: http://lkml.kernel.org/r/20200121231028.13699-1-cheloha@linux.ibm.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4fb6eabf
  9. 08 4月, 2020 7 次提交
    • D
      mm/memory_hotplug: allow to specify a default online_type · 5f47adf7
      David Hildenbrand 提交于
      For now, distributions implement advanced udev rules to essentially
      - Don't online any hotplugged memory (s390x)
      - Online all memory to ZONE_NORMAL (e.g., most virt environments like
        hyperv)
      - Online all memory to ZONE_MOVABLE in case the zone imbalance is taken
        care of (e.g., bare metal, special virt environments)
      
      In summary: All memory is usually onlined the same way, however, the
      kernel always has to ask user space to come up with the same answer.
      E.g., Hyper-V always waits for a memory block to get onlined before
      continuing, otherwise it might end up adding memory faster than
      onlining it, which can result in strange OOM situations.  This waiting
      slows down adding of a bigger amount of memory.
      
      Let's allow to specify a default online_type, not just "online" and
      "offline".  This allows distributions to configure the default online_type
      when booting up and be done with it.
      
      We can now specify "offline", "online", "online_movable" and
      "online_kernel" via
      - "memhp_default_state=" on the kernel cmdline
      - /sys/devices/system/memory/auto_online_blocks
      just like we are able to specify for a single memory block via
      /sys/devices/system/memory/memoryX/state
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-9-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5f47adf7
    • D
      mm/memory_hotplug: convert memhp_auto_online to store an online_type · 862919e5
      David Hildenbrand 提交于
      ...  and rename it to memhp_default_online_type.  This is a preparation
      for more detailed default online behavior.
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-8-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      862919e5
    • D
      drivers/base/memory: store mapping between MMOP_* and string in an array · 4dc8207b
      David Hildenbrand 提交于
      Let's use a simple array which we can reuse soon.  While at it, move the
      string->mmop conversion out of the device hotplug lock.
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-4-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4dc8207b
    • D
      drivers/base/memory: map MMOP_OFFLINE to 0 · efc978ad
      David Hildenbrand 提交于
      Historically, we used the value -1.  Just treat 0 as the special case now.
      Clarify a comment (which was wrong, when we come via device_online() the
      first time, the online_type would have been 0 / MEM_ONLINE).  The default
      is now always MMOP_OFFLINE.  This removes the last user of the manual
      "-1", which didn't use the enum value.
      
      This is a preparation to use the online_type as an array index.
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Link: http://lkml.kernel.org/r/20200317104942.11178-3-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      efc978ad
    • D
      drivers/base/memory: rename MMOP_ONLINE_KEEP to MMOP_ONLINE · 956f8b44
      David Hildenbrand 提交于
      Patch series "mm/memory_hotplug: allow to specify a default online_type", v3.
      
      Distributions nowadays use udev rules ([1] [2]) to specify if and how to
      online hotplugged memory.  The rules seem to get more complex with many
      special cases.  Due to the various special cases,
      CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used.  All memory hotplug
      is handled via udev rules.
      
      Every time we hotplug memory, the udev rule will come to the same
      conclusion.  Especially Hyper-V (but also soon virtio-mem) add a lot of
      memory in separate memory blocks and wait for memory to get onlined by
      user space before continuing to add more memory blocks (to not add memory
      faster than it is getting onlined).  This of course slows down the whole
      memory hotplug process.
      
      To make the job of distributions easier and to avoid udev rules that get
      more and more complicated, let's extend the mechanism provided by
      - /sys/devices/system/memory/auto_online_blocks
      - "memhp_default_state=" on the kernel cmdline
      to be able to specify also "online_movable" as well as "online_kernel"
      
      === Example /usr/libexec/config-memhotplug ===
      
      #!/bin/bash
      
      VIRT=`systemd-detect-virt --vm`
      ARCH=`uname -p`
      
      sense_virtio_mem() {
        if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
          DEVICES=`find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l`
          if [ $DEVICES != "0" ]; then
              return 0
          fi
        fi
        return 1
      }
      
      if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
        echo "Memory hotplug configuration support missing in the kernel"
        exit 1
      fi
      
      if grep "memhp_default_state=" /proc/cmdline > /dev/null; then
        echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
        exit 1
      fi
      
      if [ $VIRT == "microsoft" ]; then
        echo "Detected Hyper-V on $ARCH"
        # Hyper-V wants all memory in ZONE_NORMAL
        ONLINE_TYPE="online_kernel"
      elif sense_virtio_mem; then
        echo "Detected virtio-mem on $ARCH"
        # virtio-mem wants all memory in ZONE_NORMAL
        ONLINE_TYPE="online_kernel"
      elif [ $ARCH == "s390x" ] || [ $ARCH == "s390" ]; then
        echo "Detected $ARCH"
        # standby memory should not be onlined automatically
        ONLINE_TYPE="offline"
      elif [ $ARCH == "ppc64" ] || [ $ARCH == "ppc64le" ]; then
        echo "Detected" $ARCH
        # PPC64 onlines all hotplugged memory right from the kernel
        ONLINE_TYPE="offline"
      elif [ $VIRT == "none" ]; then
        echo "Detected bare-metal on $ARCH"
        # Bare metal users expect hotplugged memory to be unpluggable. We assume
        # that ZONE imbalances on such enterpise servers cannot happen and is
        # properly documented
        ONLINE_TYPE="online_movable"
      else
        # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
        # imbalances won't happen
        echo "Detected $VIRT on $ARCH"
        # Usually, ballooning is used in virtual environments, so memory should go to
        # ZONE_NORMAL. However, sometimes "movable_node" is relevant.
        ONLINE_TYPE="online"
      fi
      
      echo "Selected online_type:" $ONLINE_TYPE
      
      # Configure what to do with memory that will be hotplugged in the future
      echo $ONLINE_TYPE 2>/dev/null > /sys/devices/system/memory/auto_online_blocks
      if [ $? != "0" ]; then
        echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
        # A backup udev rule should handle old kernels if necessary
        exit 1
      fi
      
      # Process all already pluggedd blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
      if [ $ONLINE_TYPE != "offline" ]; then
        for MEMORY in /sys/devices/system/memory/memory*; do
          STATE=`cat $MEMORY/state`
          if [ $STATE == "offline" ]; then
              echo $ONLINE_TYPE > $MEMORY/state
          fi
        done
      fi
      
      === Example /usr/lib/systemd/system/config-memhotplug.service ===
      
      [Unit]
      Description=Configure memory hotplug behavior
      DefaultDependencies=no
      Conflicts=shutdown.target
      Before=sysinit.target shutdown.target
      After=systemd-modules-load.service
      ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks
      
      [Service]
      ExecStart=/usr/libexec/config-memhotplug
      Type=oneshot
      TimeoutSec=0
      RemainAfterExit=yes
      
      [Install]
      WantedBy=sysinit.target
      
      === Example modification to the 40-redhat.rules [2] ===
      
      : diff --git a/40-redhat.rules b/40-redhat.rules-new
      : index 2c690e5..168fd03 100644
      : --- a/40-redhat.rules
      : +++ b/40-redhat.rules-new
      : @@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
      :  # Memory hotadd request
      :  SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
      :  ACTION!="add", GOTO="memory_hotplug_end"
      : +# memory hotplug behavior configured
      : +PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
      : +
      :  PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"
      :
      :  ENV{.state}="online"
      
      ===
      
      [1] https://github.com/lnykryn/systemd-rhel/pull/281
      [2] https://github.com/lnykryn/systemd-rhel/blob/staging/rules/40-redhat.rules
      
      This patch (of 8):
      
      The name is misleading and it's not really clear what is "kept".  Let's
      just name it like the online_type name we expose to user space ("online").
      
      Add some documentation to the types.
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Yumei Huang <yuhuang@redhat.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Link: http://lkml.kernel.org/r/20200319131221.14044-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20200317104942.11178-2-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      956f8b44
    • D
      drivers/base/memory.c: drop pages_correctly_probed() · fada9ae3
      David Hildenbrand 提交于
      pages_correctly_probed() is a leftover from ancient times.  It dates back
      to commit 3947be19 ("[PATCH] memory hotplug: sysfs and add/remove
      functions"), where Pg_reserved checks were added as a sfety net:
      
      	/*
      	 * The probe routines leave the pages reserved, just
      	 * as the bootmem code does.  Make sure they're still
      	 * that way.
      	 */
      
      The checks were refactored quite a bit over the years, especially in
      commit b77eab70 ("mm/memory_hotplug: optimize probe routine"), where
      checks for present, valid, and online sections were added.
      
      Hotplugged memory is added via add_memory(), which will create the full
      memmap for the hotplugged memory, and mark all sections valid and present.
      
      Only full memory blocks are onlined/offlined, so we also cannot have an
      inconsistency in that regard (especially, memory blocks with some sections
      being online and some being offline).
      
      1. Boot memory always starts online.  Since commit c5e79ef5
         ("mm/memory_hotplug.c: don't allow to online/offline memory blocks with
         holes") we disallow to offline any memory with holes.  Therefore, we
         never online memory with holes.  Present and validity checks are
         superfluous.
      
      2. Only complete memory blocks are onlined/offlined (and especially,
         the state - online or offline - is stored for whole memory blocks).
         Besides the core, only arch/powerpc/platforms/powernv/memtrace.c
         manually calls offline_pages() and fiddels with memory block states.
         But it also only offlines complete memory blocks.
      
      3. To make any of these conditions trigger, something would have to be
         terribly messed up in the core.  (e.g., online/offline only some
         sections of a memory block).
      
      4. Memory unplug properly makes sure that all sysfs attributes were
         removed (and therefore, that all threads left the sysfs handlers).  We
         don't have to worry about zombie devices at this point.
      
      5. The valid_section_nr(section_nr) check is actually dead code, as it
         would never have been reached due to the WARN_ON_ONCE(!pfn_valid(pfn)).
      
      No wonder we haven't seen any of these errors in a long time (or even
         ever, according to my search).  Let's just get rid of them.  Now, all
         checks that could hinder onlining and offlining are completely
         contained in online_pages()/offline_pages().
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: http://lkml.kernel.org/r/20200127110424.5757-3-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fada9ae3
    • D
      drivers/base/memory.c: drop section_count · 68c3a6ac
      David Hildenbrand 提交于
      Patch series "mm: drop superfluous section checks when onlining/offlining".
      
      Let's drop some superfluous section checks on the onlining/offlining path.
      
      This patch (of 3):
      
      Since commit c5e79ef5 ("mm/memory_hotplug.c: don't allow to
      online/offline memory blocks with holes") we have a generic check in
      offline_pages() that disallows offlining memory blocks with holes.
      
      Memory blocks with missing sections are just another variant of these type
      of blocks.  We can stop checking (and especially storing) present
      sections.  A proper error message is now printed why offlining failed.
      
      section_count was initially introduced in commit 07681215 ("Driver
      core: Add section count to memory_block struct") in order to detect when
      it is okay to remove a memory block.  It was used in commit 26bbe7ef
      ("drivers/base/memory.c: prohibit offlining of memory blocks with missing
      sections") to disallow offlining memory blocks with missing sections.  As
      we refactored creation/removal of memory devices and have a proper check
      for holes in place, we can drop the section_count.
      
      This also removes a leftover comment regarding the mem_sysfs_mutex, which
      was removed in commit 848e19ad ("drivers/base/memory.c: drop the
      mem_sysfs_mutex").
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Link: http://lkml.kernel.org/r/20200127110424.5757-2-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68c3a6ac
  10. 30 3月, 2020 1 次提交
    • D
      drivers/base/memory.c: indicate all memory blocks as removable · 53cdc1cb
      David Hildenbrand 提交于
      We see multiple issues with the implementation/interface to compute
      whether a memory block can be offlined (exposed via
      /sys/devices/system/memory/memoryX/removable) and would like to simplify
      it (remove the implementation).
      
      1. It runs basically lockless. While this might be good for performance,
         we see possible races with memory offlining that will require at
         least some sort of locking to fix.
      
      2. Nowadays, more false positives are possible. No arch-specific checks
         are performed that validate if memory offlining will not be denied
         right away (and such check will require locking). For example, arm64
         won't allow to offline any memory block that was added during boot -
         which will imply a very high error rate. Other archs have other
         constraints.
      
      3. The interface is inherently racy. E.g., if a memory block is detected
         to be removable (and was not a false positive at that time), there is
         still no guarantee that offlining will actually succeed. So any
         caller already has to deal with false positives.
      
      4. It is unclear which performance benefit this interface actually
         provides. The introducing commit 5c755e9f ("memory-hotplug: add
         sysfs removable attribute for hotplug memory remove") mentioned
      
      	"A user-level agent must be able to identify which sections
      	 of memory are likely to be removable before attempting the
      	 potentially expensive operation."
      
         However, no actual performance comparison was included.
      
      Known users:
      
       - lsmem: Will group memory blocks based on the "removable" property. [1]
      
       - chmem: Indirect user. It has a RANGE mode where one can specify
                removable ranges identified via lsmem to be offlined. However,
                it also has a "SIZE" mode, which allows a sysadmin to skip the
                manual "identify removable blocks" step. [2]
      
       - powerpc-utils: Uses the "removable" attribute to skip some memory
                blocks right away when trying to find some to offline+remove.
                However, with ballooning enabled, it already skips this
                information completely (because it once resulted in many false
                negatives). Therefore, the implementation can deal with false
                positives properly already. [3]
      
      According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
      driven from userspace via the drmgr command (powerpc-utils).  Nowadays
      it's managed in the kernel - including onlining/offlining of memory
      blocks - triggered by drmgr writing to /sys/kernel/dlpar.  So the
      affected legacy userspace handling is only active on old kernels.  Only
      very old versions of drmgr on a new kernel (unlikely) might execute
      slower - totally acceptable.
      
      With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
      break any user space tool.  We implement a very bad heuristic now.
      Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
      "not removable" as before.
      
      Original discussion can be found in [4] ("[PATCH RFC v1] mm:
      is_mem_section_removable() overhaul").
      
      Other users of is_mem_section_removable() will be removed next, so that
      we can remove is_mem_section_removable() completely.
      
      [1] http://man7.org/linux/man-pages/man1/lsmem.1.html
      [2] http://man7.org/linux/man-pages/man8/chmem.8.html
      [3] https://github.com/ibm-power-utilities/powerpc-utils
      [4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
      
      Also, this patch probably fixes a crash reported by Steve.
      http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.comReported-by: N"Scargall, Steve" <steve.scargall@intel.com>
      Suggested-by: NMichal Hocko <mhocko@kernel.org>
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NNathan Fontenot <ndfont@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53cdc1cb
  11. 04 2月, 2020 1 次提交
  12. 01 2月, 2020 2 次提交
  13. 02 12月, 2019 2 次提交
    • D
      drivers/base/memory.c: drop the mem_sysfs_mutex · 848e19ad
      David Hildenbrand 提交于
      The mem_sysfs_mutex isn't really helpful.  Also, it's not really clear
      what the mutex protects at all.
      
      The device lists of the memory subsystem are protected separately.  We
      don't need that mutex when looking up.  creating, or removing
      independent devices.  find_memory_block_by_id() will perform locking on
      its own and grab a reference of the returned device.
      
      At the time memory_dev_init() is called, we cannot have concurrent
      hot(un)plug operations yet - we're still fairly early during boot.  We
      don't need any locking.
      
      The creation/removal of memory block devices should be protected on a
      higher level - especially using the device hotplug lock to avoid
      documented issues (see Documentation/core-api/memory-hotplug.rst) - or
      if that is reworked, using similar locking.
      
      Protecting in the context of these functions only doesn't really make
      sense.  Especially, if we would have a situation where the same memory
      blocks are created/deleted at the same time, there is something horribly
      going wrong (imagining adding/removing a DIMM at the same time from two
      call paths) - after the functions succeeded something else in the
      callers would blow up (e.g., create_memory_block_devices() succeeded but
      there are no memory block devices anymore).
      
      All relevant call paths (except when adding memory early during boot via
      ACPI, which is now documented) hold the device hotplug lock when adding
      memory, and when removing memory.  Let's document that instead.
      
      Add a simple safety net to create_memory_block_devices() in case we
      would actually remove memory blocks while adding them, so we'll never
      dereference a NULL pointer.  Simplify memory_dev_init() now that the
      lock is gone.
      
      Link: http://lkml.kernel.org/r/20190925082621.4927-1-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      848e19ad
    • N
      mm, soft-offline: convert parameter to pfn · feec24a6
      Naoya Horiguchi 提交于
      Currently soft_offline_page() receives struct page, and its sibling
      memory_failure() receives pfn.  This discrepancy looks weird and makes
      precheck on pfn validity tricky.  So let's align them.
      
      Link: http://lkml.kernel.org/r/20191016234706.GA5493@www9186uo.sakura.ne.jpSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      feec24a6
  14. 16 11月, 2019 1 次提交
    • D
      mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
      David Hildenbrand 提交于
      try_offline_node() is pretty much broken right now:
      
       - The node span is updated when onlining memory, not when adding it. We
         ignore memory that was mever onlined. Bad.
      
       - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
         trigger a kernel panic. Bad for memory that is offline but also bad
         for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
         first PFN of a section might contain garbage.
      
       - Sections belonging to mixed nodes are not properly considered.
      
      As memory blocks might belong to multiple nodes, we would have to walk
      all pageblocks (or at least subsections) within present sections.
      However, we don't have a way to identify whether a memmap that is not
      online was initialized (relevant for ZONE_DEVICE).  This makes things
      more complicated.
      
      Luckily, we can piggy pack on the node span and the nid stored in memory
      blocks.  Currently, the node span is grown when calling
      move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
      removing memory, before calling try_offline_node().  Sysfs links are
      created via link_mem_sections(), e.g., during boot or when adding
      memory.
      
      If the node still spans memory or if any memory block belongs to the
      nid, we don't set the node offline.  As memory blocks that span multiple
      nodes cannot get offlined, the nid stored in memory blocks is reliable
      enough (for such online memory blocks, the node still spans the memory).
      
      Introduce for_each_memory_block() to efficiently walk all memory blocks.
      
      Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
      when removing ZONE_DEVICE memory to fix similar issues (access of
      garbage memmaps) - until we have a reliable way to identify whether
      these memmaps were properly initialized.  This implies later, that once
      a node had ZONE_DEVICE memory, we won't be able to set a node offline -
      which should be acceptable.
      
      Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") memory that is added is not
      assoziated with a zone/node (memmap not initialized).  The introducing
      commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
      already missed that we could have multiple nodes for a section and that
      the zone/node span is updated when onlining pages, not when adding them.
      
      I tested this by hotplugging two DIMMs to a memory-less and cpu-less
      NUMA node.  The node is properly onlined when adding the DIMMs.  When
      removing the DIMMs, the node is properly offlined.
      
      Masayoshi Mizuma reported:
      
      : Without this patch, memory hotplug fails as panic:
      :
      :  BUG: kernel NULL pointer dereference, address: 0000000000000000
      :  ...
      :  Call Trace:
      :   remove_memory_block_devices+0x81/0xc0
      :   try_remove_memory+0xb4/0x130
      :   __remove_memory+0xa/0x20
      :   acpi_memory_device_remove+0x84/0x100
      :   acpi_bus_trim+0x57/0x90
      :   acpi_bus_trim+0x2e/0x90
      :   acpi_device_hotplug+0x2b2/0x4d0
      :   acpi_hotplug_work_fn+0x1a/0x30
      :   process_one_work+0x171/0x380
      :   worker_thread+0x49/0x3f0
      :   kthread+0xf8/0x130
      :   ret_from_fork+0x35/0x40
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
      Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visiable after d0dc12e8Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Tested-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2c91f8fc
  15. 19 10月, 2019 1 次提交
  16. 25 9月, 2019 4 次提交
  17. 19 7月, 2019 8 次提交
    • D
      drivers/base/memory.c: get rid of find_memory_block_hinted() · dd625285
      David Hildenbrand 提交于
      No longer needed, let's remove it.  Also, drop the "hint" parameter
      completely from "find_memory_block_by_id", as nobody needs it anymore.
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-7-david@redhat.com
      [david@redhat.com: handle zero-length walks]
        Link: http://lkml.kernel.org/r/1c2edc22-afd7-2211-c4c7-40e54e5007e8@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-7-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NQian Cai <cai@lca.pw>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd625285
    • D
      mm/memory_hotplug: move and simplify walk_memory_blocks() · ea884641
      David Hildenbrand 提交于
      Let's move walk_memory_blocks() to the place where memory block logic
      resides and simplify it.  While at it, add a type for the callback
      function.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-6-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea884641
    • D
      drivers/base/memory: use "unsigned long" for block ids · 90ec010f
      David Hildenbrand 提交于
      Block ids are just shifted section numbers, so let's also use "unsigned
      long" for them, too.
      
      Link: http://lkml.kernel.org/r/20190614100114.311-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      90ec010f
    • D
      mm: section numbers use the type "unsigned long" · 2491f0a2
      David Hildenbrand 提交于
      Patch series "mm: Further memory block device cleanups", v1.
      
      Some further cleanups around memory block devices.  Especially, clean up
      and simplify walk_memory_range().  Including some other minor cleanups.
      
      This patch (of 6):
      
      We are using a mixture of "int" and "unsigned long".  Let's make this
      consistent by using "unsigned long" everywhere.  We'll do the same with
      memory block ids next.
      
      While at it, turn the "unsigned long i" in removable_show() into an int
      - sections_per_block is an int.
      
      [akpm@linux-foundation.org: s/unsigned long i/unsigned long nr/]
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-2-david@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2491f0a2
    • D
      mm/memory_hotplug: remove memory block devices before arch_remove_memory() · 4c4b7f9b
      David Hildenbrand 提交于
      Let's factor out removing of memory block devices, which is only
      necessary for memory added via add_memory() and friends that created
      memory block devices.  Remove the devices before calling
      arch_remove_memory().
      
      This finishes factoring out memory block device handling from
      arch_add_memory() and arch_remove_memory().
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-10-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c4b7f9b
    • D
      mm/memory_hotplug: create memory block devices after arch_add_memory() · db051a0d
      David Hildenbrand 提交于
      Only memory to be added to the buddy and to be onlined/offlined by user
      space using /sys/devices/system/memory/...  needs (and should have!)
      memory block devices.
      
      Factor out creation of memory block devices.  Create all devices after
      arch_add_memory() succeeded.  We can later drop the want_memblock
      parameter, because it is now effectively stale.
      
      Only after memory block devices have been added, memory can be onlined
      by user space.  This implies, that memory is not visible to user space
      at all before arch_add_memory() succeeded.
      
      While at it
       - use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
       - introduce find_memory_block_by_id() to search via block id
       - Use find_memory_block_by_id() in init_memory_block() to catch
         duplicates
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-8-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      db051a0d
    • D
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand 提交于
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d
    • D
      drivers/base/memory: pass a block_id to init_memory_block() · 18115825
      David Hildenbrand 提交于
      We'll rework hotplug_memory_register() shortly, so it no longer consumes
      pass a section.
      
      [cai@lca.pw: fix a compilation warning]
        Link: http://lkml.kernel.org/r/1559320186-28337-1-git-send-email-cai@lca.pw
      Link: http://lkml.kernel.org/r/20190527111152.16324-6-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18115825