1. 08 8月, 2020 3 次提交
    • M
      mm/sparse: cleanup the code surrounding memory_present() · c89ab04f
      Mike Rapoport 提交于
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
      functions that call memory_present() for each region in memblock.memory:
      sparse_memory_present_with_active_regions() and membocks_present().
      
      Moreover, all architectures have a call to either of these functions
      preceding the call to sparse_init() and in the most cases they are called
      one after the other.
      
      Mark the regions from memblock.memory as present during sparce_init() by
      making sparse_init() call memblocks_present(), make memblocks_present()
      and memory_present() functions static and remove redundant
      sparse_memory_present_with_active_regions() function.
      
      Also remove no longer required HAVE_MEMORY_PRESENT configuration option.
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c89ab04f
    • W
      mm/sparse: never partially remove memmap for early section · ef69bc9f
      Wei Yang 提交于
      For early sections, its memmap is handled specially even sub-section is
      enabled.  The memmap could only be populated as a whole.
      
      Quoted from the comment of section_activate():
      
          * The early init code does not consider partially populated
          * initial sections, it simply assumes that memory will never be
          * referenced.  If we hot-add memory into such a section then we
          * do not need to populate the memmap and can simply reuse what
          * is already there.
      
      While current section_deactivate() breaks this rule.  When hot-remove a
      sub-section, section_deactivate() would depopulate its memmap.  The
      consequence is if we hot-add this subsection again, its memmap never get
      proper populated.
      
      We can reproduce the case by following steps:
      
      1. Hacking qemu to allow sub-section early section
      
      :   diff --git a/hw/i386/pc.c b/hw/i386/pc.c
      :   index 51b3050d01..c6a78d83c0 100644
      :   --- a/hw/i386/pc.c
      :   +++ b/hw/i386/pc.c
      :   @@ -1010,7 +1010,7 @@ void pc_memory_init(PCMachineState *pcms,
      :            }
      :
      :            machine->device_memory->base =
      :   -            ROUND_UP(0x100000000ULL + x86ms->above_4g_mem_size, 1 * GiB);
      :   +            0x100000000ULL + x86ms->above_4g_mem_size;
      :
      :            if (pcmc->enforce_aligned_dimm) {
      :                /* size device region assuming 1G page max alignment per slot */
      
      2. Bootup qemu with PSE disabled and a sub-section aligned memory size
      
         Part of the qemu command would look like this:
      
         sudo x86_64-softmmu/qemu-system-x86_64 \
             --enable-kvm -cpu host,pse=off \
             -m 4160M,maxmem=20G,slots=1 \
             -smp sockets=2,cores=16 \
             -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3 \
             -machine pc,nvdimm \
             -nographic \
             -object memory-backend-ram,id=mem0,size=8G \
             -device nvdimm,id=vm0,memdev=mem0,node=0,addr=0x144000000,label-size=128k
      
      3. Re-config a pmem device with sub-section size in guest
      
         ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax --size=16M
      
      Then you would see the following call trace:
      
         pmem0: detected capacity change from 0 to 16777216
         BUG: unable to handle page fault for address: ffffec73c51000b4
         #PF: supervisor write access in kernel mode
         #PF: error_code(0x0002) - not-present page
         PGD 81ff8067 P4D 81ff8067 PUD 81ff7067 PMD 1437cb067 PTE 0
         Oops: 0002 [#1] SMP NOPTI
         CPU: 16 PID: 1348 Comm: ndctl Kdump: loaded Tainted: G        W         5.8.0-rc2+ #24
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.4
         RIP: 0010:memmap_init_zone+0x154/0x1c2
         Code: 77 16 f6 40 10 02 74 10 48 03 48 08 48 89 cb 48 c1 eb 0c e9 3a ff ff ff 48 89 df 48 c1 e7 06 48f
         RSP: 0018:ffffbdc7011a39b0 EFLAGS: 00010282
         RAX: ffffec73c5100088 RBX: 0000000000144002 RCX: 0000000000144000
         RDX: 0000000000000004 RSI: 007ffe0000000000 RDI: ffffec73c5100080
         RBP: 027ffe0000000000 R08: 0000000000000001 R09: ffff9f8d38f6d708
         R10: ffffec73c0000000 R11: 0000000000000000 R12: 0000000000000004
         R13: 0000000000000001 R14: 0000000000144200 R15: 0000000000000000
         FS:  00007efe6b65d780(0000) GS:ffff9f8d3f780000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: ffffec73c51000b4 CR3: 000000007d718000 CR4: 0000000000340ee0
         Call Trace:
          move_pfn_range_to_zone+0x128/0x150
          memremap_pages+0x4e4/0x5a0
          devm_memremap_pages+0x1e/0x60
          dev_dax_probe+0x69/0x160 [device_dax]
          really_probe+0x298/0x3c0
          driver_probe_device+0xe1/0x150
          ? driver_allows_async_probing+0x50/0x50
          bus_for_each_drv+0x7e/0xc0
          __device_attach+0xdf/0x160
          bus_probe_device+0x8e/0xa0
          device_add+0x3b9/0x740
          __devm_create_dev_dax+0x127/0x1c0
          __dax_pmem_probe+0x1f2/0x219 [dax_pmem_core]
          dax_pmem_probe+0xc/0x1b [dax_pmem]
          nvdimm_bus_probe+0x69/0x1c0 [libnvdimm]
          really_probe+0x147/0x3c0
          driver_probe_device+0xe1/0x150
          device_driver_attach+0x53/0x60
          bind_store+0xd1/0x110
          kernfs_fop_write+0xce/0x1b0
          vfs_write+0xb6/0x1a0
          ksys_write+0x5f/0xe0
          do_syscall_64+0x4d/0x90
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: ba72b4c8 ("mm/sparsemem: support sub-section hotplug")
      Signed-off-by: NWei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Link: http://lkml.kernel.org/r/20200625223534.18024-1-richard.weiyang@linux.alibaba.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef69bc9f
    • M
      mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Mike Rapoport 提交于
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and early mm initialization in arch and there is no
      actual reason to have the <asm/pgalloc.h> included all over the place.
      The first patch in this series removes unneeded includes of
      <asm/pgalloc.h>
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases <asm/pgalloc.h> header is required only for allocations of
      page table memory.  Most of the .c files that include that header do not
      use symbols declared in <asm/pgalloc.h> and do not require that header.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca15ca40
  2. 10 6月, 2020 1 次提交
    • M
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Mike Rapoport 提交于
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definition of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always
      possibility to override the generic version with the usual ifdefs magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are remove with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e31cf2f4
  3. 05 6月, 2020 1 次提交
  4. 08 4月, 2020 5 次提交
  5. 03 4月, 2020 3 次提交
  6. 30 3月, 2020 1 次提交
    • A
      mm/sparse: fix kernel crash with pfn_section_valid check · b943f045
      Aneesh Kumar K.V 提交于
      Fix the crash like this:
      
          BUG: Kernel NULL pointer dereference on read at 0x00000000
          Faulting instruction address: 0xc000000000c3447c
          Oops: Kernel access of bad area, sig: 11 [#1]
          LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
          CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
          ...
          NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
          LR [c000000000088354] vmemmap_free+0x144/0x320
          Call Trace:
             section_deactivate+0x220/0x240
             __remove_pages+0x118/0x170
             arch_remove_memory+0x3c/0x150
             memunmap_pages+0x1cc/0x2f0
             devm_action_release+0x30/0x50
             release_nodes+0x2f8/0x3e0
             device_release_driver_internal+0x168/0x270
             unbind_store+0x130/0x170
             drv_attr_store+0x44/0x60
             sysfs_kf_write+0x68/0x80
             kernfs_fop_write+0x100/0x290
             __vfs_write+0x3c/0x70
             vfs_write+0xcc/0x240
             ksys_write+0x7c/0x140
             system_call+0x5c/0x68
      
      The crash is due to NULL dereference at
      
      	test_bit(idx, ms->usage->subsection_map);
      
      due to ms->usage = NULL in pfn_section_valid()
      
      With commit d41e2f3b ("mm/hotplug: fix hot remove failure in
      SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
      depopulate_section_mem().  This was done so that pfn_page() can work
      correctly with kernel config that disables SPARSEMEM_VMEMMAP.  With that
      config pfn_to_page does
      
      	__section_mem_map_addr(__sec) + __pfn;
      
      where
      
        static inline struct page *__section_mem_map_addr(struct mem_section *section)
        {
      	unsigned long map = section->section_mem_map;
      	map &= SECTION_MAP_MASK;
      	return (struct page *)map;
        }
      
      Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
      used to check the pfn validity (pfn_valid()).  Since section_deactivate
      release mem_section->usage if a section is fully deactivated,
      pfn_valid() check after a subsection_deactivate cause a kernel crash.
      
        static inline int pfn_valid(unsigned long pfn)
        {
        ...
      	return early_section(ms) || pfn_section_valid(ms, pfn);
        }
      
      where
      
        static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
        {
      	int idx = subsection_map_index(pfn);
      
      	return test_bit(idx, ms->usage->subsection_map);
        }
      
      Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
      freed.  For architectures like ppc64 where large pages are used for
      vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
      sections.  Hence before a vmemmap mapping page can be freed, the kernel
      needs to make sure there are no valid sections within that mapping.
      Clearing the section valid bit before depopulate_section_memap enables
      this.
      
      [aneesh.kumar@linux.ibm.com: add comment]
        Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.comLink: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
      Fixes: d41e2f3b ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
      Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Reviewed-by: NWei Yang <richard.weiyang@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b943f045
  7. 22 3月, 2020 1 次提交
  8. 22 2月, 2020 1 次提交
    • W
      mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM · 18e19f19
      Wei Yang 提交于
      When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page()
      doesn't work before sparse_init_one_section() is called.
      
      This leads to a crash when hotplug memory:
      
          BUG: unable to handle page fault for address: 0000000006400000
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0002) - not-present page
          PGD 0 P4D 0
          Oops: 0002 [#1] SMP PTI
          CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G        W         5.5.0-next-20200205+ #343
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          Workqueue: kacpi_hotplug acpi_hotplug_work_fn
          RIP: 0010:__memset+0x24/0x30
          Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
          RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87
          RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000
          RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000
          RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000
          R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000
          R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280
          FS:  0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           sparse_add_section+0x1c9/0x26a
           __add_pages+0xbf/0x150
           add_pages+0x12/0x60
           add_memory_resource+0xc8/0x210
           __add_memory+0x62/0xb0
           acpi_memory_device_add+0x13f/0x300
           acpi_bus_attach+0xf6/0x200
           acpi_bus_scan+0x43/0x90
           acpi_device_hotplug+0x275/0x3d0
           acpi_hotplug_work_fn+0x1a/0x30
           process_one_work+0x1a7/0x370
           worker_thread+0x30/0x380
           kthread+0x112/0x130
           ret_from_fork+0x35/0x40
      
      We should use memmap as it did.
      
      On x86 the impact is limited to x86_32 builds, or x86_64 configurations
      that override the default setting for SPARSEMEM_VMEMMAP.
      
      Other memory hotplug archs (arm64, ia64, and ppc) also default to
      SPARSEMEM_VMEMMAP=y.
      
      [dan.j.williams@intel.com: changelog update]
      {rppt@linux.ibm.com: changelog update]
      Link: http://lkml.kernel.org/r/20200219030454.4844-1-bhe@redhat.com
      Fixes: ba72b4c8 ("mm/sparsemem: support sub-section hotplug")
      Signed-off-by: NWei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Acked-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NBaoquan He <bhe@redhat.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18e19f19
  9. 04 2月, 2020 1 次提交
  10. 01 2月, 2020 1 次提交
  11. 14 1月, 2020 1 次提交
    • D
      mm/memory_hotplug: don't free usage map when removing a re-added early section · 8068df3b
      David Hildenbrand 提交于
      When we remove an early section, we don't free the usage map, as the
      usage maps of other sections are placed into the same page.  Once the
      section is removed, it is no longer an early section (especially, the
      memmap is freed).  When we re-add that section, the usage map is reused,
      however, it is no longer an early section.  When removing that section
      again, we try to kfree() a usage map that was allocated during early
      boot - bad.
      
      Let's check against PageReserved() to see if we are dealing with an
      usage map that was allocated during boot.  We could also check against
      !(PageSlab(usage_page) || PageCompound(usage_page)), but PageReserved() is
      cleaner.
      
      Can be triggered using memtrace under ppc64/powernv:
      
        $ mount -t debugfs none /sys/kernel/debug/
        $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
        $ echo 0x20000000 > /sys/kernel/debug/powerpc/memtrace/enable
         ------------[ cut here ]------------
         kernel BUG at mm/slub.c:3969!
         Oops: Exception in kernel mode, sig: 5 [#1]
         LE PAGE_SIZE=3D64K MMU=3DHash SMP NR_CPUS=3D2048 NUMA PowerNV
         Modules linked in:
         CPU: 0 PID: 154 Comm: sh Not tainted 5.5.0-rc2-next-20191216-00005-g0be1dba7b7c0 #61
         NIP kfree+0x338/0x3b0
         LR section_deactivate+0x138/0x200
         Call Trace:
           section_deactivate+0x138/0x200
           __remove_pages+0x114/0x150
           arch_remove_memory+0x3c/0x160
           try_remove_memory+0x114/0x1a0
           __remove_memory+0x20/0x40
           memtrace_enable_set+0x254/0x850
           simple_attr_write+0x138/0x160
           full_proxy_write+0x8c/0x110
           __vfs_write+0x38/0x70
           vfs_write+0x11c/0x2a0
           ksys_write+0x84/0x140
           system_call+0x5c/0x68
         ---[ end trace 4b053cbd84e0db62 ]---
      
      The first invocation will offline+remove memory blocks.  The second
      invocation will first add+online them again, in order to offline+remove
      them again (usually we are lucky and the exact same memory blocks will
      get "reallocated").
      
      Tested on powernv with boot memory: The usage map will not get freed.
      Tested on x86-64 with DIMMs: The usage map will get freed.
      
      Using Dynamic Memory under a Power DLAPR can trigger it easily.
      
      Triggering removal (I assume after previously removed+re-added) of
      memory from the HMC GUI can crash the kernel with the same call trace
      and is fixed by this patch.
      
      Link: http://lkml.kernel.org/r/20191217104637.5509-1-david@redhat.com
      Fixes: 326e1b8f ("mm/sparsemem: introduce a SECTION_IS_EARLY flag")
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Tested-by: NPingfan Liu <piliu@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8068df3b
  12. 02 12月, 2019 4 次提交
  13. 08 10月, 2019 1 次提交
  14. 25 9月, 2019 5 次提交
  15. 19 7月, 2019 11 次提交
    • D
      mm/sparsemem: cleanup 'section number' data types · 9a845030
      Dan Williams 提交于
      David points out that there is a mixture of 'int' and 'unsigned long'
      usage for section number data types.  Update the memory hotplug path to
      use 'unsigned long' consistently for section numbers.
      
      [akpm@linux-foundation.org: fix printk format]
      Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a845030
    • D
      mm/sparsemem: support sub-section hotplug · ba72b4c8
      Dan Williams 提交于
      The libnvdimm sub-system has suffered a series of hacks and broken
      workarounds for the memory-hotplug implementation's awkward
      section-aligned (128MB) granularity.
      
      For example the following backtrace is emitted when attempting
      arch_add_memory() with physical address ranges that intersect 'System
      RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
      
          # cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
          100000000-1ffffffff : System RAM
          200000000-303ffffff : Persistent Memory (legacy)
          304000000-43fffffff : System RAM
          440000000-23ffffffff : Persistent Memory
          2400000000-43bfffffff : Persistent Memory
            2400000000-43bfffffff : namespace2.0
      
          WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
          [..]
          RIP: 0010:add_pages+0x5c/0x60
          [..]
          Call Trace:
           devm_memremap_pages+0x460/0x6e0
           pmem_attach_disk+0x29e/0x680 [nd_pmem]
           ? nd_dax_probe+0xfc/0x120 [libnvdimm]
           nvdimm_bus_probe+0x66/0x160 [libnvdimm]
      
      It was discovered that the problem goes beyond RAM vs PMEM collisions as
      some platform produce PMEM vs PMEM collisions within a given section.
      The libnvdimm workaround for that case revealed that the libnvdimm
      section-alignment-padding implementation has been broken for a long
      while.
      
      A fix for that long-standing breakage introduces as many problems as it
      solves as it would require a backward-incompatible change to the
      namespace metadata interpretation.  Instead of that dubious route [1],
      address the root problem in the memory-hotplug implementation.
      
      Note that EEXIST is no longer treated as success as that is how
      sparse_add_section() reports subsection collisions, it was also obviated
      by recent changes to perform the request_region() for 'System RAM'
      before arch_add_memory() in the add_memory() sequence.
      
      [1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      
      [osalvador@suse.de: fix deactivate_section for early sections]
        Link: http://lkml.kernel.org/r/20190715081549.32577-2-osalvador@suse.de
      Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NOscar Salvador <osalvador@suse.de>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba72b4c8
    • D
      mm/sparsemem: prepare for sub-section ranges · 7ea62160
      Dan Williams 提交于
      Prepare the memory hot-{add,remove} paths for handling sub-section
      ranges by plumbing the starting page frame and number of pages being
      handled through arch_{add,remove}_memory() to
      sparse_{add,remove}_one_section().
      
      This is simply plumbing, small cleanups, and some identifier renames.
      No intended functional changes.
      
      Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ea62160
    • D
      mm/sparsemem: convert kmalloc_section_memmap() to populate_section_memmap() · e9c0a3f0
      Dan Williams 提交于
      Allow sub-section sized ranges to be added to the memmap.
      
      populate_section_memmap() takes an explict pfn range rather than
      assuming a full section, and those parameters are plumbed all the way
      through to vmmemap_populate().  There should be no sub-section usage in
      current deployments.  New warnings are added to clarify which memmap
      allocation paths are sub-section capable.
      
      Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9c0a3f0
    • D
      mm/sparsemem: add helpers track active portions of a section at boot · f46edbd1
      Dan Williams 提交于
      Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
      sub-section active bitmask, each bit representing a PMD_SIZE span of the
      architecture's memory hotplug section size.
      
      The implications of a partially populated section is that pfn_valid()
      needs to go beyond a valid_section() check and either determine that the
      section is an "early section", or read the sub-section active ranges
      from the bitmask.  The expectation is that the bitmask (subsection_map)
      fits in the same cacheline as the valid_section() / early_section()
      data, so the incremental performance overhead to pfn_valid() should be
      negligible.
      
      The rationale for using early_section() to short-ciruit the
      subsection_map check is that there are legacy code paths that use
      pfn_valid() at section granularity before validating the pfn against
      pgdat data.  So, the early_section() check allows those traditional
      assumptions to persist while also permitting subsection_map to tell the
      truth for purposes of populating the unused portions of early sections
      with PMEM and other ZONE_DEVICE mappings.
      
      Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NQian Cai <cai@lca.pw>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f46edbd1
    • D
      mm/sparsemem: introduce a SECTION_IS_EARLY flag · 326e1b8f
      Dan Williams 提交于
      In preparation for sub-section hotplug, track whether a given section
      was created during early memory initialization, or later via memory
      hotplug.  This distinction is needed to maintain the coarse expectation
      that pfn_valid() returns true for any pfn within a given section even if
      that section has pages that are reserved from the page allocator.
      
      For example one of the of goals of subsection hotplug is to support
      cases where the system physical memory layout collides System RAM and
      PMEM within a section.  Several pfn_valid() users expect to just check
      if a section is valid, but they are not careful to check if the given
      pfn is within a "System RAM" boundary and instead expect pgdat
      information to further validate the pfn.
      
      Rather than unwind those paths to make their pfn_valid() queries more
      precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
      traditional expectation that pfn_valid() returns true for all early
      sections.
      
      Link: https://lore.kernel.org/lkml/1560366952-10660-1-git-send-email-cai@lca.pw/
      Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NQian Cai <cai@lca.pw>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      326e1b8f
    • D
      mm/sparsemem: introduce struct mem_section_usage · f1eca35a
      Dan Williams 提交于
      Patch series "mm: Sub-section memory hotplug support", v10.
      
      The memory hotplug section is an arbitrary / convenient unit for memory
      hotplug.  'Section-size' units have bled into the user interface
      ('memblock' sysfs) and can not be changed without breaking existing
      userspace.  The section-size constraint, while mostly benign for typical
      memory hotplug, has and continues to wreak havoc with 'device-memory'
      use cases, persistent memory (pmem) in particular.  Recall that pmem
      uses devm_memremap_pages(), and subsequently arch_add_memory(), to
      allocate a 'struct page' memmap for pmem.  However, it does not use the
      'bottom half' of memory hotplug, i.e.  never marks pmem pages online and
      never exposes the userspace memblock interface for pmem.  This leaves an
      opening to redress the section-size constraint.
      
      To date, the libnvdimm subsystem has attempted to inject padding to
      satisfy the internal constraints of arch_add_memory().  Beyond
      complicating the code, leading to bugs [2], wasting memory, and limiting
      configuration flexibility, the padding hack is broken when the platform
      changes this physical memory alignment of pmem from one boot to the
      next.  Device failure (intermittent or permanent) and physical
      reconfiguration are events that can cause the platform firmware to
      change the physical placement of pmem on a subsequent boot, and device
      failure is an everyday event in a data-center.
      
      It turns out that sections are only a hard requirement of the
      user-facing interface for memory hotplug and with a bit more
      infrastructure sub-section arch_add_memory() support can be added for
      kernel internal usages like devm_memremap_pages().  Here is an analysis
      of the current design assumptions in the current code and how they are
      addressed in the new implementation:
      
      Current design assumptions:
      
       - Sections that describe boot memory (early sections) are never
         unplugged / removed.
      
       - pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
         valid_section() check
      
       - __add_pages() and helper routines assume all operations occur in
         PAGES_PER_SECTION units.
      
       - The memblock sysfs interface only comprehends full sections
      
      New design assumptions:
      
       - Sections are instrumented with a sub-section bitmask to track (on
         x86) individual 2MB sub-divisions of a 128MB section.
      
       - Partially populated early sections can be extended with additional
         sub-sections, and those sub-sections can be removed with
         arch_remove_memory(). With this in place we no longer lose usable
         memory capacity to padding.
      
       - pfn_valid() is updated to look deeper than valid_section() to also
         check the active-sub-section mask. This indication is in the same
         cacheline as the valid_section() so the performance impact is
         expected to be negligible. So far the lkp robot has not reported any
         regressions.
      
       - Outside of the core vmemmap population routines which are replaced,
         other helper routines like shrink_{zone,pgdat}_span() are updated to
         handle the smaller granularity. Core memory hotplug routines that
         deal with online memory are not touched.
      
       - The existing memblock sysfs user api guarantees / assumptions are not
         touched since this capability is limited to !online
         !memblock-sysfs-accessible sections.
      
      Meanwhile the issue reports continue to roll in from users that do not
      understand when and how the 128MB constraint will bite them.  The current
      implementation relied on being able to support at least one misaligned
      namespace, but that immediately falls over on any moderately complex
      namespace creation attempt.  Beyond the initial problem of 'System RAM'
      colliding with pmem, and the unsolvable problem of physical alignment
      changes, Linux is now being exposed to platforms that collide pmem ranges
      with other pmem ranges by default [3].  In short, devm_memremap_pages()
      has pushed the venerable section-size constraint past the breaking point,
      and the simplicity of section-aligned arch_add_memory() is no longer
      tenable.
      
      These patches are exposed to the kbuild robot on a subsection-v10 branch
      [4], and a preview of the unit test for this functionality is available
      on the 'subsection-pending' branch of ndctl [5].
      
      [2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
      [3]: https://github.com/pmem/ndctl/issues/76
      [4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
      [5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
      
      This patch (of 13):
      
      Towards enabling memory hotplug to track partial population of a section,
      introduce 'struct mem_section_usage'.
      
      A pointer to a 'struct mem_section_usage' instance replaces the existing
      pointer to a 'pageblock_flags' bitmap.  Effectively it adds one more
      'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
      a new 'subsection_map' bitmap.  The new bitmap enables the memory
      hot{plug,remove} implementation to act on incremental sub-divisions of a
      section.
      
      SUBSECTION_SHIFT is defined as global constant instead of per-architecture
      value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
      subsection users.  Specifically a common subsection size allows for the
      possibility that persistent memory namespace configurations be made
      compatible across architectures.
      
      The primary motivation for this functionality is to support platforms that
      mix "System RAM" and "Persistent Memory" within a single section, or
      multiple PMEM ranges with different mapping lifetimes within a single
      section.  The section restriction for hotplug has caused an ongoing saga
      of hacks and bugs for devm_memremap_pages() users.
      
      Beyond the fixups to teach existing paths how to retrieve the 'usemap'
      from a section, and updates to usemap allocation path, there are no
      expected behavior changes.
      
      Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NWei Yang <richardw.yang@linux.intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1eca35a
    • D
      mm: section numbers use the type "unsigned long" · 2491f0a2
      David Hildenbrand 提交于
      Patch series "mm: Further memory block device cleanups", v1.
      
      Some further cleanups around memory block devices.  Especially, clean up
      and simplify walk_memory_range().  Including some other minor cleanups.
      
      This patch (of 6):
      
      We are using a mixture of "int" and "unsigned long".  Let's make this
      consistent by using "unsigned long" everywhere.  We'll do the same with
      memory block ids next.
      
      While at it, turn the "unsigned long i" in removable_show() into an int
      - sections_per_block is an int.
      
      [akpm@linux-foundation.org: s/unsigned long i/unsigned long nr/]
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20190620183139.4352-2-david@redhat.com
      Link: http://lkml.kernel.org/r/20190614100114.311-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2491f0a2
    • W
      mm/sparse.c: set section nid for hot-add memory · 26f26bed
      Wei Yang 提交于
      In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
      section_to_node_table[].  While for hot-add memory, this is missed.
      Without this information, page_to_nid() may not give the right node id.
      
      BTW, current online_pages works because it leverages nid in
      memory_block.  But the granularity of node id should be mem_section
      wide.
      
      Link: http://lkml.kernel.org/r/20190618005537.18878-1-richardw.yang@linux.intel.comSigned-off-by: NWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26f26bed
    • D
      mm/memory_hotplug: remove "zone" parameter from sparse_remove_one_section · b9bf8d34
      David Hildenbrand 提交于
      The parameter is unused, so let's drop it.  Memory removal paths should
      never care about zones.  This is the job of memory offlining and will
      require more refactorings.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-12-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NWei Yang <richardw.yang@linux.intel.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9bf8d34
    • D
      mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      David Hildenbrand 提交于
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80ec922d