1. 15 8月, 2015 1 次提交
  2. 07 8月, 2015 4 次提交
    • N
      mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* · f4c18e6f
      Naoya Horiguchi 提交于
      The race condition addressed in commit add05cec ("mm: soft-offline:
      don't free target page in successful page migration") was not closed
      completely, because that can happen not only for soft-offline, but also
      for hard-offline.  Consider that a slab page is about to be freed into
      buddy pool, and then an uncorrected memory error hits the page just
      after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
      PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
      necessary because the data on the affected page is not consumed.
      
      To solve it, this patch drops __PG_HWPOISON from page flag checks at
      allocation/free time.  I think it's justified because __PG_HWPOISON
      flags is defined to prevent the page from being reused, and setting it
      outside the page's alloc-free cycle is a designed behavior (not a bug.)
      
      For recent months, I was annoyed about BUG_ON when soft-offlined page
      remains on lru cache list for a while, which is avoided by calling
      put_page() instead of putback_lru_page() in page migration's success
      path.  This means that this patch reverts a major change from commit
      add05cec about the new refcounting rule of soft-offlined pages, so
      "reuse window" revives.  This will be closed by a subsequent patch.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dean Nelson <dnelson@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4c18e6f
    • M
      fs, file table: reinit files_stat.max_files after deferred memory initialisation · 4248b0da
      Mel Gorman 提交于
      Dave Hansen reported the following;
      
      	My laptop has been behaving strangely with 4.2-rc2.  Once I log
      	in to my X session, I start getting all kinds of strange errors
      	from applications and see this in my dmesg:
      
              	VFS: file-max limit 8192 reached
      
      The problem is that the file-max is calculated before memory is fully
      initialised and miscalculates how much memory the kernel is using.  This
      patch recalculates file-max after deferred memory initialisation.  Note
      that using memory hotplug infrastructure would not have avoided this
      problem as the value is not recalculated after memory hot-add.
      
      4.1:             files_stat.max_files = 6582781
      4.2-rc2:         files_stat.max_files = 8192
      4.2-rc2 patched: files_stat.max_files = 6562467
      
      Small differences with the patch applied and 4.1 but not enough to matter.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4248b0da
    • N
      mm, meminit: replace rwsem with completion · d3cd131d
      Nicolai Stange 提交于
      Commit 0e1cc95b ("mm: meminit: finish initialisation of struct pages
      before basic setup") introduced a rwsem to signal completion of the
      initialization workers.
      
      Lockdep complains about possible recursive locking:
        =============================================
        [ INFO: possible recursive locking detected ]
        4.1.0-12802-g1dc51b82 #3 Not tainted
        ---------------------------------------------
        swapper/0/1 is trying to acquire lock:
        (pgdat_init_rwsem){++++.+},
          at: [<ffffffff8424c7fb>] page_alloc_init_late+0xc7/0xe6
      
        but task is already holding lock:
        (pgdat_init_rwsem){++++.+},
          at: [<ffffffff8424c772>] page_alloc_init_late+0x3e/0xe6
      
      Replace the rwsem by a completion together with an atomic
      "outstanding work counter".
      
      [peterz@infradead.org: Barrier removal on the grounds of being pointless]
      [mgorman@suse.de: Applied review feedback]
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d3cd131d
    • M
      mm, meminit: allow early_pfn_to_nid to be used during runtime · 7ace9917
      Mel Gorman 提交于
      early_pfn_to_nid() historically was inherently not SMP safe but only
      used during boot which is inherently single threaded or during hotplug
      which is protected by a giant mutex.
      
      With deferred memory initialisation there was a thread-safe version
      introduced and the early_pfn_to_nid would trigger a BUG_ON if used
      unsafely.  Memory hotplug hit that check.  This patch makes
      early_pfn_to_nid introduces a lock to make it safe to use during
      hotplug.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NAlex Ng <alexng@microsoft.com>
      Tested-by: NAlex Ng <alexng@microsoft.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ace9917
  3. 18 7月, 2015 3 次提交
    • J
      mm/page_owner: set correct gfp_mask on page_owner · e2cfc911
      Joonsoo Kim 提交于
      Currently, we set wrong gfp_mask to page_owner info in case of isolated
      freepage by compaction and split page.  It causes incorrect mixed
      pageblock report that we can get from '/proc/pagetypeinfo'.  This metric
      is really useful to measure fragmentation effect so should be accurate.
      This patch fixes it by setting correct information.
      
      Without this patch, after kernel build workload is finished, number of
      mixed pageblock is 112 among roughly 210 movable pageblocks.
      
      But, with this fix, output shows that mixed pageblock is just 57.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2cfc911
    • J
      mm/page_owner: fix possible access violation · f3a14ced
      Joonsoo Kim 提交于
      When I tested my new patches, I found that page pointer which is used
      for setting page_owner information is changed.  This is because page
      pointer is used to set new migratetype in loop.  After this work, page
      pointer could be out of bound.  If this wrong pointer is used for
      page_owner, access violation happens.  Below is error message that I
      got.
      
        BUG: unable to handle kernel paging request at 0000000000b00018
        IP: [<ffffffff81025f30>] save_stack_address+0x30/0x40
        PGD 1af2d067 PUD 166e0067 PMD 0
        Oops: 0002 [#1] SMP
        ...snip...
        Call Trace:
          print_context_stack+0xcf/0x100
          dump_trace+0x15f/0x320
          save_stack_trace+0x2f/0x50
          __set_page_owner+0x46/0x70
          __isolate_free_page+0x1f7/0x210
          split_free_page+0x21/0xb0
          isolate_freepages_block+0x1e2/0x410
          compaction_alloc+0x22d/0x2d0
          migrate_pages+0x289/0x8b0
          compact_zone+0x409/0x880
          compact_zone_order+0x6d/0x90
          try_to_compact_pages+0x110/0x210
          __alloc_pages_direct_compact+0x3d/0xe6
          __alloc_pages_nodemask+0x6cd/0x9a0
          alloc_pages_current+0x91/0x100
          runtest_store+0x296/0xa50
          simple_attr_write+0xbd/0xe0
          __vfs_write+0x28/0xf0
          vfs_write+0xa9/0x1b0
          SyS_write+0x46/0xb0
          system_call_fastpath+0x16/0x75
      
      This patch fixes this error by moving up set_page_owner().
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3a14ced
    • M
      mm, meminit: suppress unused memory variable warning · ae026b2a
      Mel Gorman 提交于
      The kbuild test robot reported the following
      
        tree:   git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
        head:   14a6f198
        commit: 3b242c66 x86: mm: enable deferred struct page initialisation on x86-64
        date:   3 days ago
        config: x86_64-randconfig-x006-201527 (attached as .config)
        reproduce:
          git checkout 3b242c66
          # save the attached .config to linux build tree
          make ARCH=x86_64
      
        All warnings (new ones prefixed by >>):
      
           mm/page_alloc.c: In function 'early_page_uninitialised':
        >> mm/page_alloc.c:247:6: warning: unused variable 'nid' [-Wunused-variable]
             int nid = early_pfn_to_nid(pfn);
      
      It's due to the NODE_DATA macro ignoring the nid parameter on !NUMA
      configurations.  This patch avoids the warning by not declaring nid.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NWu Fengguang <fengguang.wu@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae026b2a
  4. 01 7月, 2015 12 次提交
  5. 25 6月, 2015 5 次提交
  6. 12 5月, 2015 1 次提交
    • A
      mm/net: Rename and move page fragment handling from net/ to mm/ · b63ae8ca
      Alexander Duyck 提交于
      This change moves the __alloc_page_frag functionality out of the networking
      stack and into the page allocation portion of mm.  The idea it so help make
      this maintainable by placing it with other page allocation functions.
      
      Since we are moving it from skbuff.c to page_alloc.c I have also renamed
      the basic defines and structure from netdev_alloc_cache to page_frag_cache
      to reflect that this is now part of a different kernel subsystem.
      
      I have also added a simple __free_page_frag function which can handle
      freeing the frags based on the skb->head pointer.  The model for this is
      based off of __free_pages since we don't actually need to deal with all of
      the cases that put_page handles.  I incorporated the virt_to_head_page call
      and compound_order into the function as it actually allows for a signficant
      size reduction by reducing code duplication.
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b63ae8ca
  7. 16 4月, 2015 1 次提交
  8. 15 4月, 2015 7 次提交
    • Y
      42ff2703
    • D
      mm: remove GFP_THISNODE · 4167e9b2
      David Rientjes 提交于
      NOTE: this is not about __GFP_THISNODE, this is only about GFP_THISNODE.
      
      GFP_THISNODE is a secret combination of gfp bits that have different
      behavior than expected.  It is a combination of __GFP_THISNODE,
      __GFP_NORETRY, and __GFP_NOWARN and is special-cased in the page
      allocator slowpath to fail without trying reclaim even though it may be
      used in combination with __GFP_WAIT.
      
      An example of the problem this creates: commit e97ca8e5 ("mm: fix
      GFP_THISNODE callers and clarify") fixed up many users of GFP_THISNODE
      that really just wanted __GFP_THISNODE.  The problem doesn't end there,
      however, because even it was a no-op for alloc_misplaced_dst_page(),
      which also sets __GFP_NORETRY and __GFP_NOWARN, and
      migrate_misplaced_transhuge_page(), where __GFP_NORETRY and __GFP_NOWAIT
      is set in GFP_TRANSHUGE.  Converting GFP_THISNODE to __GFP_THISNODE is a
      no-op in these cases since the page allocator special-cases
      __GFP_THISNODE && __GFP_NORETRY && __GFP_NOWARN.
      
      It's time to just remove GFP_THISNODE entirely.  We leave __GFP_THISNODE
      to restrict an allocation to a local node, but remove GFP_THISNODE and
      its obscurity.  Instead, we require that a caller clear __GFP_WAIT if it
      wants to avoid reclaim.
      
      This allows the aforementioned functions to actually reclaim as they
      should.  It also enables any future callers that want to do
      __GFP_THISNODE but also __GFP_NORETRY && __GFP_NOWARN to reclaim.  The
      rule is simple: if you don't want to reclaim, then don't set __GFP_WAIT.
      
      Aside: ovs_flow_stats_update() really wants to avoid reclaim as well, so
      it is unchanged.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Jarno Rajahalme <jrajahalme@nicira.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4167e9b2
    • K
      mm: completely remove dumping per-cpu lists from show_mem() · 761b0677
      Konstantin Khlebnikov 提交于
      It seems nobody needs this.
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      761b0677
    • K
      mm: hide per-cpu lists in output of show_mem() · d1bfcdb8
      Konstantin Khlebnikov 提交于
      This makes show_mem() much less verbose on huge machines.  Instead of huge
      and almost useless dump of counters for each per-zone per-cpu lists this
      patch prints the sum of these counters for each zone (free_pcp) and size
      of per-cpu list for current cpu (local_pcp).
      
      The filter flag SHOW_MEM_PERCPU_LISTS reverts to the old verbose mode.
      
      [akpm@linux-foundation.org: update show_free_areas comment]
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1bfcdb8
    • J
      mm/compaction: enhance compaction finish condition · 2149cdae
      Joonsoo Kim 提交于
      Compaction has anti fragmentation algorithm.  It is that freepage should
      be more than pageblock order to finish the compaction if we don't find any
      freepage in requested migratetype buddy list.  This is for mitigating
      fragmentation, but, there is a lack of migratetype consideration and it is
      too excessive compared to page allocator's anti fragmentation algorithm.
      
      Not considering migratetype would cause premature finish of compaction.
      For example, if allocation request is for unmovable migratetype, freepage
      with CMA migratetype doesn't help that allocation and compaction should
      not be stopped.  But, current logic regards this situation as compaction
      is no longer needed, so finish the compaction.
      
      Secondly, condition is too excessive compared to page allocator's logic.
      We can steal freepage from other migratetype and change pageblock
      migratetype on more relaxed conditions in page allocator.  This is
      designed to prevent fragmentation and we can use it here.  Imposing hard
      constraint only to the compaction doesn't help much in this case since
      page allocator would cause fragmentation again.
      
      To solve these problems, this patch borrows anti fragmentation logic from
      page allocator.  It will reduce premature compaction finish in some cases
      and reduce excessive compaction work.
      
      stress-highalloc test in mmtests with non movable order 7 allocation shows
      considerable increase of compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      31.82 : 42.20
      
      I tested it on non-reboot 5 runs stress-highalloc benchmark and found that
      there is no more degradation on allocation success rate than before.  That
      roughly means that this patch doesn't result in more fragmentations.
      
      Vlastimil suggests additional idea that we only test for fallbacks when
      migration scanner has scanned a whole pageblock.  It looked good for
      fragmentation because chance of stealing increase due to making more free
      pages in certain pageblock.  So, I tested it, but, it results in decreased
      compaction success rate, roughly 38.00.  I guess the reason that if system
      is low memory condition, watermark check could be failed due to not enough
      order 0 free page and so, sometimes, we can't reach a fallback check
      although migrate_pfn is aligned to pageblock_nr_pages.  I can insert code
      to cope with this situation but it makes code more complicated so I don't
      include his idea at this patch.
      
      [akpm@linux-foundation.org: fix CONFIG_CMA=n build]
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2149cdae
    • J
      mm/page_alloc: factor out fallback freepage checking · 4eb7dce6
      Joonsoo Kim 提交于
      This is preparation step to use page allocator's anti fragmentation logic
      in compaction.  This patch just separates fallback freepage checking part
      from fallback freepage management part.  Therefore, there is no functional
      change.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4eb7dce6
    • J
      mm/cma: change fallback behaviour for CMA freepage · dc67647b
      Joonsoo Kim 提交于
      Freepage with MIGRATE_CMA can be used only for MIGRATE_MOVABLE and they
      should not be expanded to other migratetype buddy list to protect them
      from unmovable/reclaimable allocation.  Implementing these requirements in
      __rmqueue_fallback(), that is, finding largest possible block of freepage
      has bad effect that high order freepage with MIGRATE_CMA are broken
      continually although there are suitable order CMA freepage.  Reason is
      that they are not be expanded to other migratetype buddy list and next
      __rmqueue_fallback() invocation try to finds another largest block of
      freepage and break it again.  So, MIGRATE_CMA fallback should be handled
      separately.  This patch introduces __rmqueue_cma_fallback(), that just
      wrapper of __rmqueue_smallest() and call it before __rmqueue_fallback() if
      migratetype == MIGRATE_MOVABLE.
      
      This results in unintended behaviour change that MIGRATE_CMA freepage is
      always used first rather than other migratetype as movable allocation's
      fallback.  But, as already mentioned above, MIGRATE_CMA can be used only
      for MIGRATE_MOVABLE, so it is better to use MIGRATE_CMA freepage first as
      much as possible.  Otherwise, we needlessly take up precious freepages
      with other migratetype and increase chance of fragmentation.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc67647b
  9. 13 3月, 2015 1 次提交
    • M
      mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled · e009d5dc
      Michal Hocko 提交于
      Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
      after OOM killer is disabled if the allocation is performed by a kernel
      thread.  This behavior was introduced from the very beginning by
      7f33d49a ("mm, PM/Freezer: Disable OOM killer when tasks are frozen").
       This means that the basic contract for the allocation request is broken
      and the context requesting such an allocation might blow up unexpectedly.
      
      There are basically two ways forward.
      
      1) move oom_killer_disable after kernel threads are frozen.  This has a
         risk that the OOM victim wouldn't be able to finish because it would
         depend on an already frozen kernel thread.  This would be really tricky
         to debug.
      
      2) do not fail GFP_NOFAIL allocation no matter what and risk a
         potential Freezable kernel threads will loop and fail the suspend.
         Incidental allocations after kernel threads are frozen will at least
         dump a warning - if we are lucky and the serial console is still active
         of course...
      
      This patch implements the later option because it is safer.  We would see
      warning rather than allocation failures for the kernel threads which would
      blow up otherwise and have a higher chances to identify __GFP_NOFAIL users
      from deeper pm code.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDavid Rientjes <rientjes@gooogle.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e009d5dc
  10. 01 3月, 2015 1 次提交
  11. 14 2月, 2015 1 次提交
    • A
      mm: page_alloc: add kasan hooks on alloc and free paths · b8c73fc2
      Andrey Ryabinin 提交于
      Add kernel address sanitizer hooks to mark allocated page's addresses as
      accessible in corresponding shadow region.  Mark freed pages as
      inaccessible.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8c73fc2
  12. 13 2月, 2015 2 次提交
  13. 12 2月, 2015 1 次提交
    • V
      mm: more aggressive page stealing for UNMOVABLE allocations · 9c0415eb
      Vlastimil Babka 提交于
      When allocation falls back to stealing free pages of another migratetype,
      it can decide to steal extra pages, or even the whole pageblock in order
      to reduce fragmentation, which could happen if further allocation
      fallbacks pick a different pageblock.  In try_to_steal_freepages(), one of
      the situations where extra pages are stolen happens when we are trying to
      allocate a MIGRATE_RECLAIMABLE page.
      
      However, MIGRATE_UNMOVABLE allocations are not treated the same way,
      although spreading such allocation over multiple fallback pageblocks is
      arguably even worse than it is for RECLAIMABLE allocations.  To minimize
      fragmentation, we should minimize the number of such fallbacks, and thus
      steal as much as is possible from each fallback pageblock.
      
      Note that in theory this might put more pressure on movable pageblocks and
      cause movable allocations to steal back from unmovable pageblocks.
      However, movable allocations are not as aggressive with stealing, and do
      not cause permanent fragmentation, so the tradeoff is reasonable, and
      evaluation seems to support the change.
      
      This patch thus adds a check for MIGRATE_UNMOVABLE to the decision to
      steal extra free pages.  When evaluating with stress-highalloc from
      mmtests, this has reduced the number of MIGRATE_UNMOVABLE fallbacks to
      roughly 1/6.  The number of these fallbacks stealing from MIGRATE_MOVABLE
      block is reduced to 1/3.  There was no observation of growing number of
      unmovable pageblocks over time, and also not of increased movable
      allocation fallbacks.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c0415eb