  1. 09 Jan, 2018 1 commit
  2. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note"; otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
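The tagging step described in this commit can be sketched roughly as follows. This is only an illustration, not the script Thomas and Greg used: the function names and file paths are hypothetical, and the comment styles (`//` for .c files, `/* */` for headers, `#` for everything else) are assumed from the kernel's documented SPDX conventions rather than taken from this commit message.

```python
import os


def default_license(path):
    """License applied when neither scanner found any license text."""
    # Files under a */uapi/* path get the syscall exception note.
    if "/uapi/" in path:
        return "GPL-2.0 WITH Linux-syscall-note"
    return "GPL-2.0"


def spdx_line(path, license_id):
    """Format the SPDX tag in the comment style the file type expects."""
    ext = os.path.splitext(path)[1]
    if ext == ".c":
        return f"// SPDX-License-Identifier: {license_id}"
    if ext == ".h":
        return f"/* SPDX-License-Identifier: {license_id} */"
    # Assumption: remaining candidates (Makefiles, Kconfig, scripts) use '#'.
    return f"# SPDX-License-Identifier: {license_id}"


if __name__ == "__main__":
    for path in ("mm/example.c", "include/uapi/linux/example.h"):
        print(path, "->", spdx_line(path, default_license(path)))
```

Keeping the license decision separate from the comment formatting mirrors the split in the commit message between the spreadsheet review and the script that applied the tags.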
  3. 09 Sep, 2017 1 commit
    • mm/memory_hotplug: introduce add_pages · 3072e413
      Committed by Michal Hocko
      There are new users of memory hotplug emerging.  Some of them require
      a different subset of arch_add_memory.  There are some which only require
      allocation of struct pages without mapping those pages to the kernel
      address space.  We currently have __add_pages for that purpose.  But this
      is rather low-level and not very suitable for code outside of the
      memory hotplug.  E.g.  x86_64 wants to update max_pfn which should be done
      by the caller.  Introduce add_pages() which should take care of those
      details if they are needed.  Each architecture should define its
      implementation and select CONFIG_ARCH_HAS_ADD_PAGES.  All others use the
      currently existing __add_pages.
      
      Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mark Hairgrove <mhairgrove@nvidia.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Sherry Cheung <SCheung@nvidia.com>
      Cc: Subhash Gutti <sgutti@nvidia.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Bob Liu <liubo95@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
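The split between the low-level helper and the new wrapper might be modelled like this. It is a toy sketch in Python, not kernel code: `low_level_add_pages` stands in for __add_pages, the `state` dictionary is a hypothetical substitute for kernel globals, and the x86_64 variant only illustrates the max_pfn update mentioned above.

```python
def low_level_add_pages(state, start_pfn, nr_pages):
    """Stand-in for __add_pages: only record the new struct page range."""
    state["ranges"].append((start_pfn, start_pfn + nr_pages))
    return 0


def add_pages_generic(state, start_pfn, nr_pages):
    """Architectures without their own add_pages fall back to the helper."""
    return low_level_add_pages(state, start_pfn, nr_pages)


def add_pages_x86_64(state, start_pfn, nr_pages):
    """Arch-specific wrapper: also handles the detail callers should not see."""
    ret = low_level_add_pages(state, start_pfn, nr_pages)
    if ret == 0:
        # Keep max_pfn in sync so that callers do not have to do it.
        state["max_pfn"] = max(state["max_pfn"], start_pfn + nr_pages)
    return ret
```

The point of the wrapper is that code outside memory hotplug calls a single entry point and does not need to know which architecture details exist.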
  4. 07 Sep, 2017 1 commit
    • mm, memory_hotplug: display allowed zones in the preferred ordering · e5e68930
      Committed by Michal Hocko
      Prior to commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") we used to allow to change the
      valid zone types of a memory block if it is adjacent to a different zone
      type.
      
      This fact was reflected in memoryNN/valid_zones by the ordering of
      printed zones.  The first one was default (echo online > memoryNN/state)
      and the other one could be onlined explicitly by online_{movable,kernel}.
      
      This behavior was removed by the said patch and as such the ordering was
      not all that important.  In most cases a kernel zone would be default
      anyway.  The only exception is movable_node handled by "mm,
      memory_hotplug: support movable_node for hotpluggable nodes".
      
      Let's reintroduce this behavior because a later patch will remove
      the zone overlap restriction and so the user will be allowed to online
      a kernel resp.  movable block regardless of its placement.  The original
      behavior will then become significant again because it would be
      non-trivial for users to see what the default zone to online into is.
      
      Implementation is really simple.  Pull zone selection out of
      move_pfn_range into a zone_for_pfn_range helper and use it in
      show_valid_zones to display the zone for default onlining and then both
      kernel and movable if they are allowed.  The default online zone is not
      duplicated.
      
      Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: <slaoub@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
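The ordering rule for valid_zones described in this commit can be illustrated with a small sketch. The function and parameter names are hypothetical and the zones are plain strings; the real show_valid_zones works on struct zone pointers.

```python
def valid_zones(default_zone, kernel_zone, kernel_allowed, movable_allowed):
    """Return the valid_zones string: default zone first, no duplicates."""
    zones = [default_zone]
    # Then the kernel zone and the movable zone, if onlining there is allowed.
    if kernel_allowed and kernel_zone != default_zone:
        zones.append(kernel_zone)
    if movable_allowed and default_zone != "Movable":
        zones.append("Movable")
    return " ".join(zones)
```

With this ordering, the first name printed is always the zone that a plain `echo online` would use.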
  5. 07 Jul, 2017 7 commits
    • mm, memory_hotplug: move movable_node to the hotplug proper · 4932381e
      Committed by Michal Hocko
      movable_node_is_enabled is defined in memblock proper while it is
      initialized from the memory hotplug proper.  This is quite messy and it
      makes a dependency between the two so move movable_node along with the
      helper functions to memory_hotplug.
      
      To make it more entertaining the kernel parameter is ignored unless
      CONFIG_HAVE_MEMBLOCK_NODE_MAP=y because we do not have the node
      information for each memblock otherwise.  So let's warn when the option
      is disabled.
      
      Link: http://lkml.kernel.org/r/20170529114141.536-4-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: Chen Yucong <slaoub@gmail.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: remove unused cruft after memory hotplug rework · 559bfc7d
      Committed by Michal Hocko
      zone_for_memory doesn't have any user anymore as well as the whole zone
      shifting infrastructure so drop them all.
      
      This shouldn't introduce any functional changes.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-15-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory · 3d79a728
      Committed by Michal Hocko
      arch_add_memory gets a for_device argument which then controls whether we
      want to create memblocks for created memory sections.  Simplify the
      logic by telling whether we want memblocks directly rather than going
      through pointless negation.  This also makes the API easier to
      understand because it is clear what we want, whereas for_device tells
      nothing and can mean anything.
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: do not assume ZONE_NORMAL is default kernel zone · c246a213
      Committed by Michal Hocko
      Heiko Carstens has noticed that he can generate overlapping zones for
      ZONE_DMA and ZONE_NORMAL:
      
        DMA      [mem 0x0000000000000000-0x000000007fffffff]
        Normal   [mem 0x0000000080000000-0x000000017fffffff]
      
        $ cat /sys/devices/system/memory/block_size_bytes
        10000000
        $ cat /sys/devices/system/memory/memory5/valid_zones
        DMA
        $ echo 0 > /sys/devices/system/memory/memory5/online
        $ cat /sys/devices/system/memory/memory5/valid_zones
        Normal
        $ echo 1 > /sys/devices/system/memory/memory5/online
        Normal
      
        $ cat /proc/zoneinfo
        Node 0, zone      DMA
        spanned  524288        <-----
        present  458752
        managed  455078
        start_pfn:           0 <-----
      
        Node 0, zone   Normal
        spanned  720896
        present  589824
        managed  571648
        start_pfn:           327680 <-----
      
      The reason is that we assume that the default zone for kernel onlining
      is ZONE_NORMAL.  This was a simplification introduced by the memory
      hotplug rework and it is easily fixable by checking the range overlap in
      the zone order and considering the first matching zone as the default
      one.  If there is no such zone then assume ZONE_NORMAL as we have been
      doing so far.
      
      Fixes: "mm, memory_hotplug: do not associate hotadded memory to zones until online"
      Link: http://lkml.kernel.org/r/20170601083746.4924-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
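The fix described in this commit, walking the kernel zones in order and taking the first one that overlaps the range, can be sketched as follows. The data layout (a dict mapping zone name to start pfn and spanned pages) and the function names are assumptions made for illustration; the numbers are derived from the zone ranges and the 0x10000000-byte block size quoted in the report, assuming 4 KiB pages.

```python
KERNEL_ZONE_ORDER = ("DMA", "Normal")  # lowest zone first


def zone_intersects(zone, start_pfn, nr_pages):
    """True if the pfn range overlaps a non-empty zone."""
    zone_start, spanned = zone
    if spanned == 0:
        return False
    return start_pfn < zone_start + spanned and zone_start < start_pfn + nr_pages


def default_kernel_zone(zones, start_pfn, nr_pages):
    """First kernel zone overlapping the range, else fall back to Normal."""
    for name in KERNEL_ZONE_ORDER:
        if name in zones and zone_intersects(zones[name], start_pfn, nr_pages):
            return name
    return "Normal"


# Zone layout from the report: DMA covers 0-2 GiB, Normal covers 2-6 GiB.
ZONES = {"DMA": (0x0, 0x80000), "Normal": (0x80000, 0x100000)}
BLOCK_PAGES = 0x10000000 >> 12  # pages per memory block
```

In this model memory5 starts at pfn 5 * BLOCK_PAGES = 327680, which lies inside the DMA zone, so it is onlined back into DMA instead of stretching Normal over it.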
    • mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1
      Committed by Michal Hocko
      The current memory hotplug implementation relies on having all the
      struct pages associated with a zone/node during the physical hotplug
      phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
      vast majority of cases this means that they are added to ZONE_NORMAL.
      This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory
      hotadd without sparsemem") and it wasn't a big deal back then because
      movable onlining didn't exist yet.
      
      Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
      onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable
      memory and portion memory") and then things got more complicated.
      Rather than reconsidering the zone association which was no longer
      needed (because the memory hotplug already depended on SPARSEMEM) a
      convoluted semantic of zone shifting has been developed.  Only the
      currently last memblock or the one adjacent to the zone_movable can be
      onlined movable.  This essentially means that the online type changes as
      the new memblocks are added.
      
      Let's simulate memory hot online manually:
        $ echo 0x100000000 > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory32/valid_zones
        Normal Movable
      
        $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        $ echo online_movable > /sys/devices/system/memory/memory34/state
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable Normal
      
      This is an awkward semantic because an udev event is sent as soon as the
      block is onlined and an udev handler might want to online it based on
      some policy (e.g.  association with a node) but it will inherently race
      with new blocks showing up.
      
      This patch changes the physical online phase to not associate pages with
      any zone at all.  All the pages are just marked reserved and wait for
      the onlining phase to be associated with the zone as per the online
      request.  There are only two requirements:
      
      	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
      
      	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
      
      The latter is not an inherent requirement and can be changed in the
      future.  It preserves the current behavior and makes the code slightly
      simpler.
      
      This means that the same physical online steps as above will lead to the
      following state: Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable
      
      Implementation:
      The current move_pfn_range is reimplemented to check the above
      requirements (allow_online_pfn_range) and then updates the respective
      zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
      pfn range with the zone/node.  __add_pages is updated to not require the
      zone and only initializes sections in the range.  This allowed to
      simplify the arch_add_memory code (s390 could get rid of quite some of
      code).
      
      devm_memremap_pages is the only user of arch_add_memory which relies on
      the zone association because it hooks into the memory hotplug only
      half way.  It uses it to associate the new memory with ZONE_DEVICE but
      doesn't allow it to be {on,off}lined via sysfs.  This means that this
      particular code path has to call move_pfn_range_to_zone explicitly.
      
      The original zone shifting code is kept in place and will be removed in
      the follow up patch for an easier review.
      
      Please note that this patch also changes the original behavior where
      offlining a memory block adjacent to another zone (Normal vs.  Movable)
      used to allow changing its movable type.  This will be handled later.
      
      [richard.weiyang@gmail.com: simplify zone_intersects()]
        Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
      [richard.weiyang@gmail.com: remove duplicate call for set_page_links]
        Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
      [akpm@linux-foundation.org: remove unused local `i']
      Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
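The two requirements in this commit (the zones must not overlap, and ZONE_NORMAL precedes ZONE_MOVABLE) can be expressed as a small check. This is a hedged model of the idea behind allow_online_pfn_range, not the kernel function itself: the signature is invented, the zones are reduced to two boundary pfns, and the Normal zone is assumed to end at 4 GiB, where the probed blocks begin.

```python
PAGE_SHIFT = 12
BLOCK_PAGES = (128 << 20) >> PAGE_SHIFT  # 128 MiB memory blocks


def allow_online(online_type, start_pfn, nr_pages, normal_end_pfn, movable_start_pfn):
    """Check one online request against the two zone ordering requirements."""
    if online_type == "kernel":
        # The block must end before the movable zone starts (if it exists).
        return movable_start_pfn is None or start_pfn + nr_pages <= movable_start_pfn
    if online_type == "movable":
        # The block must start at or after the end of the Normal zone.
        return normal_end_pfn <= start_pfn
    return False


def valid_zones(start_pfn, nr_pages, normal_end_pfn, movable_start_pfn):
    """Zones an offline block could be onlined into, kernel zone first."""
    names = []
    if allow_online("kernel", start_pfn, nr_pages, normal_end_pfn, movable_start_pfn):
        names.append("Normal")
    if allow_online("movable", start_pfn, nr_pages, normal_end_pfn, movable_start_pfn):
        names.append("Movable")
    return " ".join(names)


def block_start_pfn(block_nr):
    """First pfn of memoryNN for 128 MiB blocks."""
    return block_nr * BLOCK_PAGES
```

Because the answer depends only on the zone boundaries and not on which block was added last, the valid zones of a block no longer change when new blocks show up.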
    • mm: consider zone which is not fully populated to have holes · 2d070eab
      Committed by Michal Hocko
      __pageblock_pfn_to_page has two users currently, set_zone_contiguous
      which checks whether the given zone contains holes and
      pageblock_pfn_to_page which then carefully returns a first valid page
      from the given pfn range for the given zone.  This doesn't handle zones
      which are not fully populated though.  Memory pageblocks can be offlined
      or might not have been onlined yet.  In such a case the zone should be
      considered to have holes otherwise pfn walkers can touch and play with
      offline pages.
      
      Current callers of pageblock_pfn_to_page in compaction seem to work
      properly right now because they only isolate PageBuddy
      (isolate_freepages_block) or PageLRU resp.  __PageMovable
      (isolate_migratepages_block) which will be always false for these pages.
      It would be safer to skip these pages altogether, though.
      
      In order to do this patch adds a new memory section state
      (SECTION_IS_ONLINE) which is set in memory_present (during boot time) or
      in online_pages_range during the memory hotplug.  Similarly
      offline_mem_sections clears the bit and it is called when the memory
      range is offlined.
      
      A pfn_to_online_page helper is then added which checks the mem section and
      only returns a page if it is onlined already.
      
      Use the new helper in __pageblock_pfn_to_page and skip the whole page
      block in such a case.
      
      [mhocko@suse.com: check valid section number in pfn_to_online_page (Vlastimil),
       mark sections online after all struct pages are initialized in
       online_pages_range (Vlastimil)]
        Link: http://lkml.kernel.org/r/20170518164210.GD18333@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170515085827.16474-8-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
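The per-section online state and the lookup helper might look like this in a simplified model. The class and method names are hypothetical, the section size is an assumption (128 MiB sections with 4 KiB pages), and a pfn is returned in place of a struct page.

```python
PAGES_PER_SECTION = 1 << 15  # assumption: 128 MiB sections, 4 KiB pages


class SectionMap:
    """Tracks which memory sections are online (the SECTION_IS_ONLINE idea)."""

    def __init__(self, nr_sections):
        self.online = [False] * nr_sections

    def set_online(self, start_pfn, nr_pages, is_online):
        """Mark every section touched by the pfn range online or offline."""
        first = start_pfn // PAGES_PER_SECTION
        last = (start_pfn + nr_pages - 1) // PAGES_PER_SECTION
        for nr in range(first, last + 1):
            self.online[nr] = is_online

    def pfn_to_online_page(self, pfn):
        """Return the page (here: the pfn) only if its section is online."""
        nr = pfn // PAGES_PER_SECTION
        # Check that the section number is valid before looking at its state.
        if not 0 <= nr < len(self.online) or not self.online[nr]:
            return None
        return pfn
```

A pfn walker that gets None back can skip the whole pageblock instead of touching struct pages of offline memory.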
    • mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec
      Committed by Michal Hocko
      Device memory hotplug hooks into regular memory hotplug only half way.
      It needs memory sections to track struct pages but there is no
      need/desire to associate those sections with memory blocks and export
      them to the userspace via sysfs because they cannot be onlined anyway.
      
      This is currently expressed by for_device argument to arch_add_memory
      which then makes sure to associate the given memory range with
      ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
      to distinguish special memory hotplug from the regular one.  While this
      works now, later patches in this series want to move __add_zone outside
      of arch_add_memory path so we have to come up with something else.
      
      Add want_memblock down the __add_pages path and use it to control
      whether the section->memblock association should be done.
      arch_add_memory then just trivially wants memblock for everything but
      for_device hotplug.
      
      remove_memory_section doesn't need is_zone_device_section either.  We
      can simply skip all the memblock specific cleanup if there is no
      memblock for the given section.
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 04 Feb, 2017 1 commit
    • T
      base/memory, hotplug: fix a kernel oops in show_valid_zones() · a96dfddb
      Toshi Kani committed
      Reading a sysfs "memoryN/valid_zones" file leads to the following oops
      when the first page of a range is not backed by struct page.
      show_valid_zones() assumes that 'start_pfn' is always valid for
      page_zone().
      
       BUG: unable to handle kernel paging request at ffffea017a000000
       IP: show_valid_zones+0x6f/0x160
      
      This issue may happen on x86-64 systems with 64GiB or more memory since
      their memory block size is bumped up to 2GiB.  [1] An example of such
      a system is described below.  0x3240000000 is only aligned to 1GiB and
      its memory block starts at 0x3200000000, which is not backed by
      struct page.
      
       BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable
      
      Since test_pages_in_a_zone() already checks holes, fix this issue by
      extending this function to return 'valid_start' and 'valid_end' for a
      given range.  show_valid_zones() then proceeds with the valid range.
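
      The address arithmetic behind the example above can be sketched in a
      few lines of userspace C.  This is only an illustration, not the
      kernel's code: block_start() and leading_hole() are hypothetical
      helper names, and the 2GiB block size is taken from the description
      above.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to a multiple of block_size (must be a power of two). */
static uint64_t block_start(uint64_t addr, uint64_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return addr & ~(block_size - 1);
}

/*
 * Number of bytes at the head of the memory block that precede the first
 * usable address, i.e. the part of the block with no backing memory.
 */
static uint64_t leading_hole(uint64_t first_usable, uint64_t block_size)
{
    return first_usable - block_start(first_usable, block_size);
}
```

      With a 2GiB block size, the usable range starting at 0x3240000000
      falls into the block starting at 0x3200000000, leaving a 1GiB hole
      at the head of that block; with a 1GiB block size there would be no
      hole.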
      
      [1] 'Commit bdee237c ("x86: mm: Use 2GB memory block size on
          large-memory x86-64 systems")'
      
      Link: http://lkml.kernel.org/r/20170127222149.30893-3-toshi.kani@hpe.com
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>	[4.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 25 Jan, 2017 1 commit
    • Y
      memory_hotplug: make zone_can_shift() return a boolean value · 8a1f780e
      Yasuaki Ishimatsu committed
      online_{kernel|movable} is used to change the memory zone to
      ZONE_{NORMAL|MOVABLE} and online the memory.
      
      To check that memory zone can be changed, zone_can_shift() is used.
      Currently the function returns a negative integer, a positive integer,
      or 0. When the function returns a negative or positive integer,
      it means that the memory zone can be changed to ZONE_{NORMAL|MOVABLE}.
      
      But when the function returns 0, there are two meanings.
      
      One of the meanings is that the memory zone does not need to be changed.
      For example, when memory is in ZONE_NORMAL and onlined by online_kernel
      the memory zone does not need to be changed.
      
      Another meaning is that the memory zone cannot be changed. When memory
      is in ZONE_NORMAL and onlined by online_movable, the memory zone may
      not be changed to ZONE_MOVABLE due to memory online limitation (see
      Documentation/memory-hotplug.txt). In this case, memory must not be
      onlined.
      
      The patch changes the return type of zone_can_shift() so that memory
      online operation fails when memory zone cannot be changed as follows:
      
      Before applying patch:
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
         # echo online_movable > memory4097/state
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  8388608
                 managed  8388608
      
         online_movable operation succeeded. But memory is onlined as
         ZONE_NORMAL, not ZONE_MOVABLE.
      
      After applying patch:
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
         # echo online_movable > memory4097/state
         bash: echo: write error: Invalid argument
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
      
         online_movable operation failed because of failure of changing
         the memory zone from ZONE_NORMAL to ZONE_MOVABLE
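
      The ambiguity of a 0 return value, and how a boolean result with a
      separate output parameter removes it, can be sketched as follows.
      This is a simplified userspace illustration, not the kernel's
      zone_can_shift(): the enum, both function names, and the "adjacent"
      flag (standing in for the memory online limitation) are assumptions
      made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone indices for this sketch only. */
enum sketch_zone { SKETCH_ZONE_NORMAL = 0, SKETCH_ZONE_MOVABLE = 1 };

/* Old style: the shift itself is the return value, so 0 has two meanings. */
static int can_shift_old(enum sketch_zone cur, enum sketch_zone target,
                         bool adjacent)
{
    if (cur == target)
        return 0;               /* no change needed */
    if (!adjacent)
        return 0;               /* change impossible: same value as above */
    return (int)target - (int)cur;
}

/* New style: the return value says whether onlining may proceed. */
static bool can_shift_new(enum sketch_zone cur, enum sketch_zone target,
                          bool adjacent, int *zone_shift)
{
    *zone_shift = 0;
    if (cur == target)
        return true;            /* allowed, nothing to shift */
    if (!adjacent)
        return false;           /* not allowed: caller must fail */
    *zone_shift = (int)target - (int)cur;
    return true;
}
```

      In the old style a caller cannot tell "already in the right zone"
      from "cannot be moved"; in the new style the second case is a plain
      false, which a caller can turn into an error.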
      
      Fixes: df429ac0 ("memory-hotplug: more general validation of zone during online")
      Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 27 Jul, 2016 1 commit
  9. 28 May, 2016 1 commit
    • L
      mm: fix section mismatch warning · 7ded384a
      Linus Torvalds committed
      The register_page_bootmem_info_node() function needs to be marked __init
      in order to avoid a new warning introduced by commit f65e91df ("mm:
      use early_pfn_to_nid in register_page_bootmem_info_node").
      
      Otherwise you'll get a warning about how a non-init function calls
      early_pfn_to_nid (which is __meminit)
      
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 20 May, 2016 1 commit
  11. 16 Mar, 2016 2 commits
    • J
      mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous · 7cf91a98
      Joonsoo Kim committed
      A performance drop has been reported for hugepage allocation, where
      half of the CPU time is spent in pageblock_pfn_to_page() during
      compaction [1].
      
      In that workload, compaction is triggered to make hugepages, but most
      pageblocks are unavailable for compaction due to pageblock type and
      skip bit, so compaction usually fails.  The most costly operation in this
      case is finding a valid pageblock while scanning the whole zone range.  To
      check whether a pageblock is valid to compact, a valid pfn within the
      pageblock is required, and we can obtain it by calling
      pageblock_pfn_to_page().  This function checks whether the pageblock is in
      a single zone and returns a valid pfn if possible.  The problem is that we
      need to check it every time before scanning a pageblock, even if we
      re-visit it, and this turns out to be very expensive in this workload.
      
      Although we have no way to skip this pageblock check on a system where
      holes exist at arbitrary positions, we can use a cached value for zone
      contiguity and just do pfn_to_page() on a system where no hole
      exists.  This optimization considerably speeds up the above workload.
      
      Before vs After
        Max: 1096 MB/s vs 1325 MB/s
        Min: 635 MB/s vs 1015 MB/s
        Avg: 899 MB/s vs 1194 MB/s
      
      Avg is improved by roughly 30% [2].
      
      [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
      [2]: https://lkml.org/lkml/2015/12/9/23
      
      [akpm@linux-foundation.org: don't forget to restore zone->contiguous on error path, per Vlastimil]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Aaron Lu <aaron.lu@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • V
      memory-hotplug: add automatic onlining policy for the newly added memory · 31bc3858
      Vitaly Kuznetsov committed
      Currently, all newly added memory blocks remain in the 'offline' state
      unless someone onlines them.  Some Linux distributions carry special udev
      rules like:
      
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      
      to make this happen automatically.  This is not a great solution for
      virtual machines where memory hotplug is being used to address high
      memory pressure situations, as such onlining is slow and a userspace
      process doing this (udev) has a chance of being killed by the OOM killer,
      since it will probably need to allocate some memory.
      
      Introduce default policy for the newly added memory blocks in
      /sys/devices/system/memory/auto_online_blocks file with two possible
      values: "offline" which preserves the current behavior and "online"
      which causes all newly added memory blocks to go online as soon as
      they're added.  The default is "offline".
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 16 Jan, 2016 1 commit
  13. 23 Oct, 2015 1 commit
  14. 28 Aug, 2015 1 commit
    • D
      mm: ZONE_DEVICE for "device memory" · 033fbae9
      Dan Williams committed
      While pmem is usable as a block device or via DAX mappings to userspace,
      there are several usage scenarios that cannot target pmem due to its
      lack of struct page coverage. In preparation for "hot plugging" pmem
      into the vmemmap, add ZONE_DEVICE as a new zone to tag these pages
      separately from the ones that are subject to standard page allocations.
      Importantly, "device memory" can be removed at will by userspace
      unbinding the driver of the device.
      
      Having a separate zone prevents allocation and otherwise marks these
      pages as distinct from typical uniform memory.  Device memory has
      different lifetime and performance characteristics than RAM.  However,
      since we have run out of ZONES_SHIFT bits this functionality currently
      depends on sacrificing ZONE_DMA.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Jerome Glisse <j.glisse@gmail.com>
      [hch: various simplifications in the arch interface]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  15. 15 Apr, 2015 1 commit
    • D
      mm, hotplug: fix concurrent memory hot-add deadlock · 30467e0b
      David Rientjes committed
      There's a deadlock when concurrently hot-adding memory through the probe
      interface and switching a memory block from offline to online.
      
      When hot-adding memory via the probe interface, add_memory() first takes
      mem_hotplug_begin() and then device_lock() is later taken when registering
      the newly initialized memory block.  This creates a lock dependency of (1)
      mem_hotplug.lock (2) dev->mutex.
      
      When switching a memory block from offline to online, dev->mutex is first
      grabbed in device_online() when the write(2) transitions an existing
      memory block from offline to online, and then online_pages() will take
      mem_hotplug_begin().
      
      This creates a lock inversion between mem_hotplug.lock and dev->mutex.
      Vitaly reports that this deadlock can happen when kworker handling a probe
      event races with systemd-udevd switching a memory block's state.
      
      This patch requires the state transition to take mem_hotplug_begin()
      before dev->mutex.  Hot-adding memory via the probe interface creates a
      memory block while holding mem_hotplug_begin(), so there is no way to take
      dev->mutex first in this case.
      
      online_pages() and offline_pages() are only called when transitioning
      memory block state.  We now require that mem_hotplug_begin() is taken
      before calling them -- this requires exporting the mem_hotplug_begin() and
      mem_hotplug_done() to generic code.  In all hot-add and hot-remove cases,
      mem_hotplug_begin() is done prior to device_online().  This is all that is
      needed to avoid the deadlock.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 10 Oct, 2014 1 commit
    • Z
      memory-hotplug: add sysfs valid_zones attribute · ed2f2400
      Zhang Zhen committed
      Currently memory-hotplug has two limits:
      
      1. If the memory block is in ZONE_NORMAL, you can change it to
         ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
      
      2. If the memory block is in ZONE_MOVABLE, you can change it to
         ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
      
      With this patch, we can easily tell which zone a memory block can be
      onlined to, without needing to know the above two limits.
      
      Updated the related Documentation.
      
      [akpm@linux-foundation.org: use conventional comment layout]
      [akpm@linux-foundation.org: fix build with CONFIG_MEMORY_HOTREMOVE=n]
      [akpm@linux-foundation.org: remove unused local zone_prev]
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 07 Aug, 2014 2 commits
    • W
      memory-hotplug: add zone_for_memory() for selecting zone for new memory · 63264400
      Wang Nan committed
      This series of patches fixes a problem that occurs when memory is added
      in a bad order.  For example, on an x86_64 machine booted with "mem=400M"
      and with 2GiB of memory installed, the following commands cause a problem:
      
        # echo 0x40000000 > /sys/devices/system/memory/probe
       [   28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
        # echo 0x48000000 > /sys/devices/system/memory/probe
       [   28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
        # echo online_movable > /sys/devices/system/memory/memory9/state
        # echo 0x50000000 > /sys/devices/system/memory/probe
       [   29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
        # echo 0x58000000 > /sys/devices/system/memory/probe
       [   29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
        # echo online_movable > /sys/devices/system/memory/memory11/state
        # echo online > /sys/devices/system/memory/memory8/state
        # echo online > /sys/devices/system/memory/memory10/state
        # echo offline > /sys/devices/system/memory/memory9/state
       [   30.558819] Offlined Pages 32768
        # free
                    total       used       free     shared    buffers     cached
       Mem:        780588 18014398509432020     830552          0          0      51180
       -/+ buffers/cache: 18014398509380840     881732
       Swap:            0          0          0
      
      This is because the above commands probe higher memory after onlining a
      section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
      for systems without ZONE_HIGHMEM) to overlap ZONE_MOVABLE.
      
      After the second online_movable, the problem can be observed from
      zoneinfo:
      
        # cat /proc/zoneinfo
        ...
        Node 0, zone  Movable
          pages free     65491
                min      250
                low      312
                high     375
                scanned  0
                spanned  18446744073709518848
                present  65536
                managed  65536
        ...
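
      The huge "spanned" and "used" values above look like unsigned
      wraparound.  The commit message does not spell out the arithmetic,
      so the following is only one reading that happens to reproduce both
      numbers: a zone whose end lies 32768 pages below its start, and a
      "used" value computed as total minus free in 64-bit unsigned bytes.
      The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* end - start in unsigned arithmetic; wraps modulo 2^64 when end < start. */
static uint64_t spanned_pages(uint64_t start_pfn, uint64_t end_pfn)
{
    return end_pfn - start_pfn;
}

/*
 * "used" as total - free, computed in bytes and reported in KiB.
 * When free exceeds total, the byte count wraps before the division.
 */
static uint64_t used_kib(uint64_t total_kib, uint64_t free_kib)
{
    uint64_t used_bytes = total_kib * 1024u - free_kib * 1024u;

    return used_bytes / 1024u;
}
```

      Under these assumptions spanned_pages(32768, 0) gives
      18446744073709518848, and used_kib(780588, 830552) gives
      18014398509432020, matching the zoneinfo and free output quoted in
      this message.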
      
      This series of patches solves the problem by checking ZONE_MOVABLE when
      choosing a zone for new memory.  If the new memory is inside or higher than
      ZONE_MOVABLE, it goes there instead.
      
      After applying this series of patches, the free and zoneinfo results are
      as follows (after offlining memory9):
      
        bash-4.2# free
                      total       used       free     shared    buffers     cached
         Mem:        780956      80112     700844          0          0      51180
         -/+ buffers/cache:      28932     752024
         Swap:            0          0          0
      
        bash-4.2# cat /proc/zoneinfo
      
        Node 0, zone      DMA
          pages free     3389
                min      14
                low      17
                high     21
                scanned  0
                spanned  4095
                present  3998
                managed  3977
            nr_free_pages 3389
        ...
          start_pfn:         1
          inactive_ratio:    1
        Node 0, zone    DMA32
          pages free     73724
                min      341
                low      426
                high     511
                scanned  0
                spanned  98304
                present  98304
                managed  92958
            nr_free_pages 73724
          ...
          start_pfn:         4096
          inactive_ratio:    1
        Node 0, zone   Normal
          pages free     32630
                min      120
                low      150
                high     180
                scanned  0
                spanned  32768
                present  32768
                managed  32768
            nr_free_pages 32630
        ...
          start_pfn:         262144
          inactive_ratio:    1
        Node 0, zone  Movable
          pages free     65476
                min      241
                low      301
                high     361
                scanned  0
                spanned  98304
                present  65536
                managed  65536
            nr_free_pages 65476
        ...
          start_pfn:         294912
          inactive_ratio:    1
      
      This patch (of 7):
      
      Introduce zone_for_memory() in arch independent code for
      arch_add_memory() use.
      
      Many arch_add_memory() functions simply select ZONE_HIGHMEM or
      ZONE_NORMAL and add new memory into it.  However, with the existence of
      ZONE_MOVABLE, the selection method should be carefully considered: if
      new, higher memory is added after ZONE_MOVABLE is set up, the default
      zone and ZONE_MOVABLE may overlap each other.
      
      should_add_memory_movable() checks the status of ZONE_MOVABLE.  If it
      already contains memory, the address of the new memory is compared with
      that of the movable memory.  If the new memory is higher than the movable
      memory, it should be added into ZONE_MOVABLE instead of the default zone.
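
      The selection rule described above can be sketched in userspace C.
      This is a minimal illustration under stated assumptions, not the
      kernel implementation: struct sketch_zone and should_go_movable()
      are hypothetical names, and a zone is reduced to a start pfn plus a
      spanned page count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified zone: pfn range [start_pfn, start_pfn + spanned). */
struct sketch_zone {
    uint64_t start_pfn;
    uint64_t spanned_pages;     /* 0 means the zone holds no memory */
};

/* True when new memory at new_start_pfn should be added to the movable zone. */
static bool should_go_movable(const struct sketch_zone *movable,
                              uint64_t new_start_pfn)
{
    if (movable->spanned_pages == 0)
        return false;           /* movable zone is empty: use the default zone */

    /* New memory inside or above the movable zone goes to the movable zone. */
    return new_start_pfn >= movable->start_pfn;
}
```

      For instance, with a movable zone starting at pfn 294912 (the
      start_pfn shown in the zoneinfo output above), new memory at pfn
      327680 would be placed in the movable zone, while new memory at
      pfn 262144 would stay in the default zone.
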
      Signed-off-by: Wang Nan <wangnan0@huawei.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: "Mel Gorman" <mgorman@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      mem-hotplug: introduce MMOP_OFFLINE to replace the hard coding -1 · 4f7c6b49
      Tang Chen committed
      In store_mem_state(), we have:
      
        ...
        else if (!strncmp(buf, "offline", min_t(int, count, 7)))
                online_type = -1;
        ...
        case -1:
                ret = device_offline(&mem->dev);
                break;
        ...
      
      Here, "offline" is hard coded as -1.
      
      This patch does the following renaming:
      
       ONLINE_KEEP     ->  MMOP_ONLINE_KEEP
       ONLINE_KERNEL   ->  MMOP_ONLINE_KERNEL
       ONLINE_MOVABLE  ->  MMOP_ONLINE_MOVABLE
      
      and introduces MMOP_OFFLINE = -1 to avoid hard coding.
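
      A userspace sketch of parsing a state string into named constants
      rather than a bare -1 is shown below.  Only MMOP_OFFLINE = -1 is
      stated in this message; the values of the other three constants,
      the helper names, and the handling of a trailing newline are
      assumptions made for illustration, and the comparison logic is not
      the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* MMOP_OFFLINE = -1 comes from the text; the remaining values are assumed. */
enum {
    MMOP_OFFLINE = -1,
    MMOP_ONLINE_KEEP,
    MMOP_ONLINE_KERNEL,
    MMOP_ONLINE_MOVABLE,
};

/* True when buf is exactly word, optionally followed by one newline. */
static bool token_is(const char *buf, const char *word)
{
    size_t n = strlen(word);

    if (strncmp(buf, word, n) != 0)
        return false;
    return buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0');
}

/* Hypothetical parser: stores the operation in *online_type on success. */
static bool parse_mem_state(const char *buf, int *online_type)
{
    if (token_is(buf, "online_kernel"))
        *online_type = MMOP_ONLINE_KERNEL;
    else if (token_is(buf, "online_movable"))
        *online_type = MMOP_ONLINE_MOVABLE;
    else if (token_is(buf, "online"))
        *online_type = MMOP_ONLINE_KEEP;
    else if (token_is(buf, "offline"))
        *online_type = MMOP_OFFLINE;
    else
        return false;
    return true;
}
```

      A switch on the parsed value can then use case MMOP_OFFLINE: in
      place of case -1:, which is the readability gain this patch is
      after.
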
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Hu Tao <hutao@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 05 Jun, 2014 1 commit
    • V
      mem-hotplug: implement get/put_online_mems · bfc8c901
      Vladimir Davydov committed
      kmem_cache_{create,destroy,shrink} need to get a stable value of
      cpu/node online mask, because they init/destroy/access per-cpu/node
      kmem_cache parts, which can be allocated or destroyed on cpu/mem
      hotplug.  To protect against cpu hotplug, these functions use
      {get,put}_online_cpus.  However, they do nothing to synchronize with
      memory hotplug - taking the slab_mutex does not eliminate the
      possibility of race as described in patch 2.
      
      What we need there is something like get_online_cpus, but for memory.
      We already have lock_memory_hotplug, which serves the purpose, but
      it's a bit of a hammer right now, because it's backed by a mutex.  As a
      result, it imposes some limitations on locking order, which are not
      desirable, and can't be used just like get_online_cpus.  That's why in
      patch 1 I substitute it with get/put_online_mems, which work exactly
      like get/put_online_cpus except they block not cpu, but memory hotplug.
      
      [ v1 can be found at https://lkml.org/lkml/2014/4/6/68.  I NAK'ed it
        myself, because it used an rw semaphore for get/put_online_mems,
        making them deadlock prone.  ]
      
      This patch (of 2):
      
      {un}lock_memory_hotplug, which is used to synchronize against memory
      hotplug, is currently backed by a mutex, which makes it a bit of a
      hammer - threads that only want to get a stable value of online nodes
      mask won't be able to proceed concurrently.  Also, it imposes some
      strong locking ordering rules on it, which narrows down the set of its
      usage scenarios.
      
      This patch introduces get/put_online_mems, which are the same as
      get/put_online_cpus, but for memory hotplug, i.e. executing code
      inside a get/put_online_mems section guarantees a stable value of
      online nodes, present pages, etc.
      
      lock_memory_hotplug()/unlock_memory_hotplug() are removed altogether.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 13 Nov, 2013 2 commits
  20. 02 Jun, 2013 3 commits
  21. 12 May, 2013 1 commit
    • R
      ACPI / memhotplug: Bind removable memory blocks to ACPI device nodes · e2ff3940
      Rafael J. Wysocki committed
      During ACPI memory hotplug configuration bind memory blocks residing
      in modules removable through the standard ACPI mechanism to struct
      acpi_device objects associated with ACPI namespace objects
      representing those modules.  Accordingly, unbind those memory blocks
      from the struct acpi_device objects when the memory modules in
      question are being removed.
      
      When "offline" operation for devices representing memory blocks is
      introduced, this will allow the ACPI core's device hot-remove code to
      use it to carry out remove_memory() for those memory blocks and check
      the results of that before it actually removes the modules holding
      them from the system.
      
      Since walk_memory_range() is used for accessing all memory blocks
      corresponding to a given ACPI namespace object, it is exported from
      memory_hotplug.c so that the code in acpi_memhotplug.c can use it.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
  22. 30 Apr, 2013 1 commit
  23. 24 Feb, 2013 5 commits
    • W
      memory-hotplug: export the function try_offline_node() · 90b30cdc
      Wen Congyang committed
      try_offline_node() will be needed in the tristate
      drivers/acpi/processor_driver.c.
      
      The node will be offlined when all memory/cpu on the node have been
      hotremoved.  So we need the function try_offline_node() in cpu-hotplug
      path.
      
      If memory-hotplug is disabled and cpu-hotplug is enabled:
      
      1. no memory on the node
         we don't online the node, and cpu's node is the nearest node.
      
      2. the node contains some memory
         the node has been onlined, and cpu's node is still needed
         to migrate the sleeping tasks on the cpu to the same node.
      
      So we do nothing in try_offline_node() in this case.
      
      [rientjes@google.com: export the function try_offline_node() fix]
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      memory-hotplug: remove sysfs file of node · 60a5a19e
      Tang Chen committed
      Introduce a new function try_offline_node() to remove sysfs file of node
      when all memory sections of this node are removed.  If some memory
      sections of this node are not removed, this function does nothing.
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Y
      memory-hotplug: implement register_page_bootmem_info_section of sparse-vmemmap · 46723bfa
      Yasuaki Ishimatsu committed
      To remove a sparse-vmemmap memmap region that was allocated from bootmem,
      the memmap region needs to be registered by
      get_page_bootmem().  So the patch searches the pages of the virtual mapping
      and registers them with get_page_bootmem().
      
      NOTE: register_page_bootmem_memmap() is not implemented for ia64,
            ppc, s390, and sparc.  So introduce CONFIG_HAVE_BOOTMEM_INFO_NODE
            and revert register_page_bootmem_info_node() when platform doesn't
            support it.
      
            It's implemented by adding a new Kconfig option named
            CONFIG_HAVE_BOOTMEM_INFO_NODE, which will be automatically selected
            by archs that fully support the memory-hotplug feature (currently
            only x86_64).
      
            Since we have 2 config options called MEMORY_HOTPLUG and
            MEMORY_HOTREMOVE used for memory hot-add and hot-remove separately,
            and the code in register_page_bootmem_info_node() is only
            used for collecting information for hot-remove, place it under
            MEMORY_HOTREMOVE.
      
            page_isolation.c, selected by MEMORY_ISOLATION under
            MEMORY_HOTPLUG, is a similar case, so move it too.
      
      [mhocko@suse.cz: put register_page_bootmem_memmap inside CONFIG_MEMORY_HOTPLUG_SPARSE]
      [linfeng@cn.fujitsu.com: introduce CONFIG_HAVE_BOOTMEM_INFO_NODE and revert register_page_bootmem_info_node()]
      [mhocko@suse.cz: remove the arch specific functions without any implementation]
      [linfeng@cn.fujitsu.com: mm/Kconfig: move auto selects from MEMORY_HOTPLUG to MEMORY_HOTREMOVE as needed]
      [rientjes@google.com: fix defined but not used warning]
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Wu Jianguo <wujianguo@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      46723bfa
    • W
      memory-hotplug: introduce new arch_remove_memory() for removing page table · 24d335ca
      Wen Congyang committed
      Removing memory requires removing its page tables, which is
      architecture-dependent.  This patch introduces arch_remove_memory()
      for removing page tables.  For now it only calls __remove_pages().
      
      Note: __remove_pages() is not implemented for some architectures
            (I don't know how to implement it for s390).
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      24d335ca
    • Y
      memory-hotplug: check whether all memory blocks are offlined or not when removing memory · 6677e3ea
      Yasuaki Ishimatsu committed
      We remove the memory like this:
      
       1. lock memory hotplug
       2. offline a memory block
       3. unlock memory hotplug
       4. repeat 1-3 to offline all memory blocks
       5. lock memory hotplug
       6. remove memory(TODO)
       7. unlock memory hotplug
      
      All memory blocks must be offlined before the memory is removed.  But
      we don't hold the lock across the whole operation, so we should check
      that all memory blocks are offlined before step 6.  Otherwise, the
      kernel may panic.
      
      Offlining a memory block and removing a memory device can be two
      different operations.  Users can offline some memory blocks without
      removing the memory device.  For this purpose, the kernel already
      takes lock_memory_hotplug() in __offline_pages().  To reuse that code
      for memory hot-remove, we repeat steps 1-3 to offline all the memory
      blocks, locking and unlocking memory hotplug each time, rather than
      holding the memory hotplug lock across the whole operation.
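      
      The sequence above can be modelled in a few lines.  What follows is
      a toy user-space sketch in Python, not kernel code: hotplug_lock,
      MemoryBlock, offline_block(), remove_memory() and hot_remove() are
      all hypothetical names chosen for illustration.  It only shows why
      the "all blocks offline?" check has to be repeated under the lock
      before step 6, since the lock is dropped between steps.

```python
import threading

# Hypothetical stand-in for the kernel's memory hotplug lock.
hotplug_lock = threading.Lock()


class MemoryBlock:
    """Toy model of a memory block; not a kernel structure."""

    def __init__(self, name, online=True):
        self.name = name
        self.online = online


def offline_block(block):
    # Steps 1-3: take the lock, offline one block, drop the lock.
    with hotplug_lock:
        block.online = False


def remove_memory(blocks):
    """Steps 5-7: recheck under the lock, then remove.

    Returns True if removal went ahead, False if any block was found
    online (for example, onlined again by someone else in the meantime).
    """
    with hotplug_lock:
        if any(b.online for b in blocks):
            return False  # refuse rather than remove memory still in use
        blocks.clear()  # placeholder for the real removal work
        return True


def hot_remove(blocks, between_steps=None):
    # Step 4: repeat steps 1-3 for every block.
    for block in blocks:
        offline_block(block)
    # The lock is not held here, so state may change.  The optional
    # callback simulates a concurrent actor in that window.
    if between_steps is not None:
        between_steps(blocks)
    return remove_memory(blocks)
```

      In this model, a block that is onlined again between step 4 and
      step 5 makes hot_remove() return False instead of removing memory
      that is in use.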
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6677e3ea
  24. 12 Dec, 2012 1 commit
    • L
      mm, memory-hotplug: dynamic configure movable memory and portion memory · 511c2aba
      Lai Jiangshan committed
      Add online_movable and online_kernel for logical memory hotplug.  This
      is the dynamic version of "movablecore" & "kernelcore".
      
      The motivation is the same as for "movablecore" & "kernelcore", except
      that the configuration is dynamic and done at run time:
      
      o We can configure memory as kernelcore or movablecore after boot.
      
        When the userspace workload increases and we need more hugepages, we
        can use "online_movable" to add memory and let the system use more
        THP (transparent huge pages), and vice versa when the kernel
        workload increases.
      
        This also helps virtualization dynamically configure host/guest
        memory, to save memory (reduce waste).
      
        Memory capacity on Demand
      
      o When a new node is physically onlined after boot, we need to use
        "online_movable" or "online_kernel" to configure/partition it as
        expected when we logically online it.
      
        This configuration also helps with physical memory migration.
      
      o All the benefits of the existing "movablecore" & "kernelcore" apply.
      
      o It prepares for movable-node, which is very important for power
        saving, hardware partitioning and high-availability systems
        (hardware fault management).
      
      (Note, we don't introduce movable-node here.)
      
      Behavior:
      When a memoryblock/memorysection is onlined by "online_movable", the
      kernel will not hold direct references to the pages of the memoryblock,
      so we can remove that memory at any time when needed.
      
      When it is onlined by "online_kernel", the kernel can use it.
      When it is onlined by "online", the zone type is not changed.
      
      Current constraints:
      Only a memoryblock that is adjacent to ZONE_MOVABLE
      can be onlined from ZONE_NORMAL to ZONE_MOVABLE.
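      
      From userspace, these modes are requested by writing the keyword to
      a memory block's sysfs "state" file.  The following is a minimal
      Python sketch of that, assuming the usual
      /sys/devices/system/memory/memoryN/state layout.  The helper names
      state_path() and online_block() and the "root" parameter are
      hypothetical and exist only for illustration.  The kernel may still
      reject a write, for example when the adjacency constraint above is
      not met.

```python
import os

# Usual sysfs location of memory block directories (memory0, memory1, ...).
SYSFS_MEMORY = "/sys/devices/system/memory"

# Keywords described in this commit message.
VALID_STATES = ("online", "online_kernel", "online_movable")


def state_path(block, root=SYSFS_MEMORY):
    """Return the path of the 'state' file for memory block number `block`."""
    return os.path.join(root, f"memory{block}", "state")


def online_block(block, state, root=SYSFS_MEMORY):
    """Request that a memory block be onlined in the given mode.

    `root` is a hypothetical parameter so the helper can be pointed at a
    scratch directory for testing. Writing to the real sysfs file needs
    root privileges and may fail with OSError if the kernel refuses.
    """
    if state not in VALID_STATES:
        raise ValueError(f"unknown state keyword: {state!r}")
    with open(state_path(block, root), "w") as f:
        f.write(state)
```

      For example, online_block(32, "online_movable") would ask the kernel
      to online memory block 32 (a made-up block number) as movable
      memory.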
      
      [akpm@linux-foundation.org: use min_t, cleanups]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      511c2aba
  25. 09 Oct, 2012 1 commit