1. 02 9月, 2020 2 次提交
    • A
      mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE · 80aa4777
      Alex Shi 提交于
      task #29077503
      commit 756d25be457fc5497da0ceee0f3d0c9eb4d8535d upstream
      We have two types of users of page isolation:
      
       1. Memory offlining:  Offline memory so it can be unplugged. Memory
                             won't be touched.
      
       2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
                             become the owner of the memory and make use of
                             it.
      
      For example, in case we want to offline memory, we can ignore (skip
      over) PageHWPoison() pages, as the memory won't get used.  We can allow
      to offline memory.  In contrast, we don't want to allow to allocate such
      memory.
      
      Let's generalize the approach so we can special case other types of
      pages we want to skip over in case we offline memory.  While at it, also
      pass the same flags to test_pages_isolated().
      Original-by: NDavid Hildenbrand <david@redhat.com>
      Link: http://lkml.kernel.org/r/20191021172353.3056-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
      Suggested-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from ccommit 756d25be457fc5497da0ceee0f3d0c9eb4d8535d)
      Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
      
      Conflicts:
      	reenable patch context on all files.
      80aa4777
    • M
      mm: only report isolation failures when offlining memory · 5316eb6e
      Michal Hocko 提交于
      task #29077503
      commit d381c54760dcfad23743da40516e7e003d73952a upstream
      Heiko has complained that his log is swamped by warnings from
      has_unmovable_pages
      
      [   20.536664] page dumped because: has_unmovable_pages
      [   20.536792] page:000003d081ff4080 count:1 mapcount:0 mapping:000000008ff88600 index:0x0 compound_mapcount: 0
      [   20.536794] flags: 0x3fffe0000010200(slab|head)
      [   20.536795] raw: 03fffe0000010200 0000000000000100 0000000000000200 000000008ff88600
      [   20.536796] raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000
      [   20.536797] page dumped because: has_unmovable_pages
      [   20.536814] page:000003d0823b0000 count:1 mapcount:0 mapping:0000000000000000 index:0x0
      [   20.536815] flags: 0x7fffe0000000000()
      [   20.536817] raw: 07fffe0000000000 0000000000000100 0000000000000200 0000000000000000
      [   20.536818] raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000
      
      which are not triggered by the memory hotplug but rather CMA allocator.
      The original idea behind dumping the page state for all call paths was
      that these messages will be helpful debugging failures.  From the above it
      seems that this is not the case for the CMA path because we are lacking
      much more context.  E.g the second reported page might be a CMA allocated
      page.  It is still interesting to see a slab page in the CMA area but it
      is hard to tell whether this is bug from the above output alone.
      
      Address this issue by dumping the page state only on request.  Both
      start_isolate_page_range and has_unmovable_pages already have an argument
      to ignore hwpoison pages so make this argument more generic and turn it
      into flags and allow callers to combine non-default modes into a mask.
      While we are at it, has_unmovable_pages call from
      is_pageblock_removable_nolock (sysfs removable file) is questionable to
      report the failure so drop it from there as well.
      
      Link: http://lkml.kernel.org/r/20181218092802.31429-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NOscar Salvador <osalvador@suse.de>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      (cherry picked from ccommit d381c54760dcfad23743da40516e7e003d73952a)
      Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
      
      Conflicts:
      	mm/page_alloc.c
      5316eb6e
  2. 13 4月, 2020 1 次提交
  3. 12 4月, 2018 1 次提交
  4. 06 4月, 2018 1 次提交
    • M
      mm/page_isolation.c: make start_isolate_page_range() fail if already isolated · 2c7452a0
      Mike Kravetz 提交于
      start_isolate_page_range() is used to set the migrate type of a set of
      pageblocks to MIGRATE_ISOLATE while attempting to start a migration
      operation.  It assumes that only one thread is calling it for the
      specified range.  This routine is used by CMA, memory hotplug and
      gigantic huge pages.  Each of these users synchronize access to the
      range within their subsystem.  However, two subsystems (CMA and gigantic
      huge pages for example) could attempt operations on the same range.  If
      this happens, one thread may 'undo' the work another thread is doing.
      This can result in pageblocks being incorrectly left marked as
      MIGRATE_ISOLATE and therefore not available for page allocation.
      
      What is ideally needed is a way to synchronize access to a set of
      pageblocks that are undergoing isolation and migration.  The only thing
      we know about these pageblocks is that they are all in the same zone.  A
      per-node mutex is too coarse as we want to allow multiple operations on
      different ranges within the same zone concurrently.  Instead, we will
      use the migration type of the pageblocks themselves as a form of
      synchronization.
      
      start_isolate_page_range sets the migration type on a set of page-
      blocks going in order from the one associated with the smallest pfn to
      the largest pfn.  The zone lock is acquired to check and set the
      migration type.  When going through the list of pageblocks check if
      MIGRATE_ISOLATE is already set.  If so, this indicates another thread is
      working on this pageblock.  We know exactly which pageblocks we set, so
      clean up by undo those and return -EBUSY.
      
      This allows start_isolate_page_range to serve as a synchronization
      mechanism and will allow for more general use of callers making use of
      these interfaces.  Update comments in alloc_contig_range to reflect this
      new functionality.
      
      Each CPU holds the associated zone lock to modify or examine the
      migration type of a pageblock.  And, it will only examine/update a
      single pageblock per lock acquire/release cycle.
      
      Link: http://lkml.kernel.org/r/20180309224731.16978-1-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2c7452a0
  5. 16 11月, 2017 1 次提交
  6. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  7. 11 7月, 2017 1 次提交
    • M
      mm: unify new_node_page and alloc_migrate_target · 8b913238
      Michal Hocko 提交于
      Commit 394e31d2 ("mem-hotplug: alloc new page from a nearest
      neighbor node when mem-offline") has duplicated a large part of
      alloc_migrate_target with some hotplug specific special casing.
      
      To be more precise it tried to enfore the allocation from a different
      node than the original page.  As a result the two function diverged in
      their shared logic, e.g.  the hugetlb allocation strategy.
      
      Let's unify the two and express different NUMA requirements by the given
      nodemask.  new_node_page will simply exclude the node it doesn't care
      about and alloc_migrate_target will use all the available nodes.
      alloc_migrate_target will then learn to migrate hugetlb pages more
      sanely and use preallocated pool when possible.
      
      Please note that alloc_migrate_target used to call alloc_page resp.
      alloc_pages_current so the memory policy of the current context which is
      quite strange when we consider that it is used in the context of
      alloc_contig_range which just tries to migrate pages which stand in the
      way.
      
      Link: http://lkml.kernel.org/r/20170608074553.22152-4-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8b913238
  8. 07 7月, 2017 1 次提交
    • M
      mm: __first_valid_page skip over offline pages · 2ce13640
      Michal Hocko 提交于
      __first_valid_page skips over invalid pfns in the range but it might
      still stumble over offline pages.  At least start_isolate_page_range
      will mark those set_migratetype_isolate.  This doesn't represent any
      immediate AFAICS because alloc_contig_range will fail to isolate those
      pages but it relies on not fully initialized page which will become a
      problem later when we stop associating offline pages to zones.  Use
      pfn_to_online_page to handle this.
      
      This is more a preparatory patch than a fix.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-10-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ce13640
  9. 09 5月, 2017 1 次提交
    • V
      mm, page_alloc: count movable pages when stealing from pageblock · 02aa0cdd
      Vlastimil Babka 提交于
      When stealing pages from pageblock of a different migratetype, we count
      how many free pages were stolen, and change the pageblock's migratetype
      if more than half of the pageblock was free.  This might be too
      conservative, as there might be other pages that are not free, but were
      allocated with the same migratetype as our allocation requested.
      
      While we cannot determine the migratetype of allocated pages precisely
      (at least without the page_owner functionality enabled), we can count
      pages that compaction would try to isolate for migration - those are
      either on LRU or __PageMovable().  The rest can be assumed to be
      MIGRATE_RECLAIMABLE or MIGRATE_UNMOVABLE, which we cannot easily
      distinguish.  This counting can be done as part of free page stealing
      with little additional overhead.
      
      The page stealing code is changed so that it considers free pages plus
      pages of the "good" migratetype for the decision whether to change
      pageblock's migratetype.
      
      The result should be more accurate migratetype of pageblocks wrt the
      actual pages in the pageblocks, when stealing from semi-occupied
      pageblocks.  This should help the efficiency of page grouping by
      mobility.
      
      In testing based on 4.9 kernel with stress-highalloc from mmtests
      configured for order-4 GFP_KERNEL allocations, this patch has reduced
      the number of unmovable allocations falling back to movable pageblocks
      by 47%.  The number of movable allocations falling back to other
      pageblocks are increased by 55%, but these events don't cause permanent
      fragmentation, so the tradeoff should be positive.  Later patches also
      offset the movable fallback increase to some extent.
      
      [akpm@linux-foundation.org: merge fix]
      Link: http://lkml.kernel.org/r/20170307131545.28577-5-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02aa0cdd
  10. 04 5月, 2017 1 次提交
  11. 23 2月, 2017 2 次提交
  12. 08 10月, 2016 1 次提交
  13. 27 7月, 2016 3 次提交
  14. 20 5月, 2016 2 次提交
  15. 02 4月, 2016 2 次提交
  16. 16 1月, 2016 1 次提交
  17. 15 1月, 2016 3 次提交
  18. 09 9月, 2015 2 次提交
    • N
      mm, page_isolation: make set/unset_migratetype_isolate() file-local · c5b4e1b0
      Naoya Horiguchi 提交于
      Nowaday, set/unset_migratetype_isolate() is defined and used only in
      mm/page_isolation, so let's limit the scope within the file.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5b4e1b0
    • V
      mm, page_isolation: remove bogus tests for isolated pages · aa016d14
      Vlastimil Babka 提交于
      The __test_page_isolated_in_pageblock() is used to verify whether all
      pages in pageblock were either successfully isolated, or are hwpoisoned.
      Two of the possible state of pages, that are tested, are however bogus
      and misleading.
      
      Both tests rely on get_freepage_migratetype(page), which however has no
      guarantees about pages on freelists.  Specifically, it doesn't guarantee
      that the migratetype returned by the function actually matches the
      migratetype of the freelist that the page is on.  Such guarantee is not
      its purpose and would have negative impact on allocator performance.
      
      The first test checks whether the freepage_migratetype equals
      MIGRATE_ISOLATE, supposedly to catch races between page isolation and
      allocator activity.  These races should be fixed nowadays with
      51bb1a40 ("mm/page_alloc: add freepage on isolate pageblock to correct
      buddy list") and related patches.  As explained above, the check
      wouldn't be able to catch them reliably anyway.  For the same reason
      false positives can happen, although they are harmless, as the
      move_freepages() call would just move the page to the same freelist it's
      already on.  So removing the test is not a bug fix, just cleanup.  After
      this patch, we assume that all PageBuddy pages are on the correct
      freelist and that the races were really fixed.  A truly reliable
      verification in the form of e.g.  VM_BUG_ON() would be complicated and
      is arguably not needed.
      
      The second test (page_count(page) == 0 && get_freepage_migratetype(page)
      == MIGRATE_ISOLATE) is probably supposed (the code comes from a big
      memory isolation patch from 2007) to catch pages on MIGRATE_ISOLATE
      pcplists.  However, pcplists don't contain MIGRATE_ISOLATE freepages
      nowadays, those are freed directly to free lists, so the check is
      obsolete.  Remove it as well.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Seungho Park <seungho1.park@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aa016d14
  19. 15 5月, 2015 1 次提交
    • H
      CMA: page_isolation: check buddy before accessing it · 1ae7013d
      Hui Zhu 提交于
      I had an issue:
      
          Unable to handle kernel NULL pointer dereference at virtual address 0000082a
          pgd = cc970000
          [0000082a] *pgd=00000000
          Internal error: Oops: 5 [#1] PREEMPT SMP ARM
          PC is at get_pageblock_flags_group+0x5c/0xb0
          LR is at unset_migratetype_isolate+0x148/0x1b0
          pc : [<c00cc9a0>]    lr : [<c0109874>]    psr: 80000093
          sp : c7029d00  ip : 00000105  fp : c7029d1c
          r10: 00000001  r9 : 0000000a  r8 : 00000004
          r7 : 60000013  r6 : 000000a4  r5 : c0a357e4  r4 : 00000000
          r3 : 00000826  r2 : 00000002  r1 : 00000000  r0 : 0000003f
          Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
          Control: 10c5387d  Table: 2cb7006a  DAC: 00000015
          Backtrace:
              get_pageblock_flags_group+0x0/0xb0
              unset_migratetype_isolate+0x0/0x1b0
              undo_isolate_page_range+0x0/0xdc
              __alloc_contig_range+0x0/0x34c
              alloc_contig_range+0x0/0x18
      
      This issue is because when calling unset_migratetype_isolate() to unset
      a part of CMA memory, it try to access the buddy page to get its status:
      
      		if (order >= pageblock_order) {
      			page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
      			buddy_idx = __find_buddy_index(page_idx, order);
      			buddy = page + (buddy_idx - page_idx);
      
      			if (!is_migrate_isolate_page(buddy)) {
      
      But the begin addr of this part of CMA memory is very close to a part of
      memory that is reserved at boot time (not in buddy system).  So add a
      check before accessing it.
      
      [akpm@linux-foundation.org: use conventional code layout]
      Signed-off-by: NHui Zhu <zhuhui@xiaomi.com>
      Suggested-by: NLaura Abbott <labbott@redhat.com>
      Suggested-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ae7013d
  20. 26 3月, 2015 1 次提交
    • L
      mm/page_alloc.c: call kernel_map_pages in unset_migrateype_isolate · cfa86943
      Laura Abbott 提交于
      Commit 3c605096 ("mm/page_alloc: restrict max order of merging on
      isolated pageblock") changed the logic of unset_migratetype_isolate to
      check the buddy allocator and explicitly call __free_pages to merge.
      
      The page that is being freed in this path never had prep_new_page called
      so set_page_refcounted is called explicitly but there is no call to
      kernel_map_pages.  With the default kernel_map_pages this is mostly
      harmless but if kernel_map_pages does any manipulation of the page
      tables (unmapping or setting pages to read only) this may trigger a
      fault:
      
          alloc_contig_range test_pages_isolated(ceb00, ced00) failed
          Unable to handle kernel paging request at virtual address ffffffc0cec00000
          pgd = ffffffc045fc4000
          [ffffffc0cec00000] *pgd=0000000000000000
          Internal error: Oops: 9600004f [#1] PREEMPT SMP
          Modules linked in: exfatfs
          CPU: 1 PID: 23237 Comm: TimedEventQueue Not tainted 3.10.49-gc72ad36-dirty #1
          task: ffffffc03de52100 ti: ffffffc015388000 task.ti: ffffffc015388000
          PC is at memset+0xc8/0x1c0
          LR is at kernel_map_pages+0x1ec/0x244
      
      Fix this by calling kernel_map_pages to ensure the page is set in the
      page table properly
      
      Fixes: 3c605096 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cfa86943
  21. 11 12月, 2014 2 次提交
    • V
      mm, page_isolation: drain single zone pcplists · ec25af84
      Vlastimil Babka 提交于
      When setting MIGRATETYPE_ISOLATE on a pageblock, pcplists are drained to
      have a better chance that all pages will be successfully isolated and
      not left in the per-cpu caches.  Since isolation is always concerned
      with a single zone, we can reduce the pcplists drain to the single zone,
      which is now possible.
      
      The change should make memory isolation faster and not disturbing
      unrelated pcplists anymore.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ec25af84
    • V
      mm: introduce single zone pcplists drain · 93481ff0
      Vlastimil Babka 提交于
      The functions for draining per-cpu pages back to buddy allocators
      currently always operate on all zones.  There are however several cases
      where the drain is only needed in the context of a single zone, and
      spilling other pcplists is a waste of time both due to the extra
      spilling and later refilling.
      
      This patch introduces new zone pointer parameter to drain_all_pages()
      and changes the dummy parameter of drain_local_pages() to be also a zone
      pointer.  When NULL is passed, the functions operate on all zones as
      usual.  Passing a specific zone pointer reduces the work to the single
      zone.
      
      All callers are updated to pass the NULL pointer in this patch.
      Conversion to single zone (where appropriate) is done in further
      patches.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93481ff0
  22. 14 11月, 2014 2 次提交
    • J
      mm/page_alloc: restrict max order of merging on isolated pageblock · 3c605096
      Joonsoo Kim 提交于
      Current pageblock isolation logic could isolate each pageblock
      individually.  This causes freepage accounting problem if freepage with
      pageblock order on isolate pageblock is merged with other freepage on
      normal pageblock.  We can prevent merging by restricting max order of
      merging to pageblock order if freepage is on isolate pageblock.
      
      A side-effect of this change is that there could be non-merged buddy
      freepage even if finishing pageblock isolation, because undoing
      pageblock isolation is just to move freepage from isolate buddy list to
      normal buddy list rather than to consider merging.  So, the patch also
      makes undoing pageblock isolation consider freepage merge.  When
      un-isolation, freepage with more than pageblock order and it's buddy are
      checked.  If they are on normal pageblock, instead of just moving, we
      isolate the freepage and free it in order to get merged.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c605096
    • J
      mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype · ad53f92e
      Joonsoo Kim 提交于
      Before describing bugs itself, I first explain definition of freepage.
      
       1. pages on buddy list are counted as freepage.
       2. pages on isolate migratetype buddy list are *not* counted as freepage.
       3. pages on cma buddy list are counted as CMA freepage, too.
      
      Now, I describe problems and related patch.
      
      Patch 1: There is race conditions on getting pageblock migratetype that
      it results in misplacement of freepages on buddy list, incorrect
      freepage count and un-availability of freepage.
      
      Patch 2: Freepages on pcp list could have stale cached information to
      determine migratetype of buddy list to go.  This causes misplacement of
      freepages on buddy list and incorrect freepage count.
      
      Patch 4: Merging between freepages on different migratetype of
      pageblocks will cause freepages accouting problem.  This patch fixes it.
      
      Without patchset [3], above problem doesn't happens on my CMA allocation
      test, because CMA reserved pages aren't used at all.  So there is no
      chance for above race.
      
      With patchset [3], I did simple CMA allocation test and get below
      result:
      
       - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
       - run kernel build (make -j16) on background
       - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
       - Result: more than 5000 freepage count are missed
      
      With patchset [3] and this patchset, I found that no freepage count are
      missed so that I conclude that problems are solved.
      
      On my simple memory offlining test, these problems also occur on that
      environment, too.
      
      This patch (of 4):
      
      There are two paths to reach core free function of buddy allocator,
      __free_one_page(), one is free_one_page()->__free_one_page() and the
      other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
      Each paths has race condition causing serious problems.  At first, this
      patch is focused on first type of freepath.  And then, following patch
      will solve the problem in second type of freepath.
      
      In the first type of freepath, we got migratetype of freeing page
      without holding the zone lock, so it could be racy.  There are two cases
      of this race.
      
       1. pages are added to isolate buddy list after restoring orignal
          migratetype
      
          CPU1                                   CPU2
      
          get migratetype => return MIGRATE_ISOLATE
          call free_one_page() with MIGRATE_ISOLATE
      
                                      grab the zone lock
                                      unisolate pageblock
                                      release the zone lock
      
          grab the zone lock
          call __free_one_page() with MIGRATE_ISOLATE
          freepage go into isolate buddy list,
          although pageblock is already unisolated
      
      This may cause two problems.  One is that we can't use this page anymore
      until next isolation attempt of this pageblock, because freepage is on
      isolate buddy list.  The other is that freepage accouting could be wrong
      due to merging between different buddy list.  Freepages on isolate buddy
      list aren't counted as freepage, but ones on normal buddy list are
      counted as freepage.  If merge happens, buddy freepage on normal buddy
      list is inevitably moved to isolate buddy list without any consideration
      of freepage accouting so it could be incorrect.
      
       2. pages are added to normal buddy list while pageblock is isolated.
          It is similar with above case.
      
      This also may cause two problems.  One is that we can't keep these
      freepages from being allocated.  Although this pageblock is isolated,
      freepage would be added to normal buddy list so that it could be
      allocated without any restriction.  And the other problem is same as
      case 1, that it, incorrect freepage accouting.
      
      This race condition would be prevented by checking migratetype again
      with holding the zone lock.  Because it is somewhat heavy operation and
      it isn't needed in common case, we want to avoid rechecking as much as
      possible.  So this patch introduce new variable, nr_isolate_pageblock in
      struct zone to check if there is isolated pageblock.  With this, we can
      avoid to re-check migratetype in common case and do it only if there is
      isolated pageblock or migratetype is MIGRATE_ISOLATE.  This solve above
      mentioned problems.
      
      Changes from v3:
      Add one more check in free_one_page() that checks whether migratetype is
      MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ad53f92e
  23. 12 9月, 2013 1 次提交
    • N
      mm: memory-hotplug: enable memory hotplug to handle hugepage · c8721bbb
      Naoya Horiguchi 提交于
      Until now we can't offline memory blocks which contain hugepages because a
      hugepage is considered as an unmovable page.  But now with this patch
      series, a hugepage has become movable, so by using hugepage migration we
      can offline such memory blocks.
      
      What's different from other users of hugepage migration is that we need to
      decompose all the hugepages inside the target memory block into free buddy
      pages after hugepage migration, because otherwise free hugepages remaining
      in the memory block intervene the memory offlining.  For this reason we
      introduce new functions dissolve_free_huge_page() and
      dissolve_free_huge_pages().
      
      Other than that, what this patch does is straightforwardly to add hugepage
      migration code, that is, adding hugepage code to the functions which scan
      over pfn and collect hugepages to be migrated, and adding a hugepage
      allocation function to alloc_migrate_target().
      
      As for larger hugepages (1GB for x86_64), it's not easy to do hotremove
      over them because it's larger than memory block.  So we now simply leave
      it to fail as it is.
      
      [yongjun_wei@trendmicro.com.cn: remove duplicated include]
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c8721bbb
  24. 20 8月, 2013 1 次提交
  25. 05 1月, 2013 1 次提交
  26. 12 12月, 2012 1 次提交
    • W
      memory-hotplug: skip HWPoisoned page when offlining pages · b023f468
      Wen Congyang 提交于
      hwpoisoned may be set when we offline a page by the sysfs interface
      /sys/devices/system/memory/soft_offline_page or
      /sys/devices/system/memory/hard_offline_page. If we don't clear
      this flag when onlining pages, this page can't be freed, and will
      not in free list. So we can't offline these pages again. So we
      should skip such page when offlining pages.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b023f468
  27. 09 10月, 2012 3 次提交