1. 18 Mar, 2016 31 commits
    • include/linux/list_bl.h: use bool instead of int for boolean functions · 26a247fd
      Chen Gang committed
      hlist_bl_unhashed() and hlist_bl_empty() are all boolean functions, so
      return bool instead of int.
      Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/uapi/linux/elf-em.h: remove v850 · faeb50b9
      Rob Landley committed
      The v850 port was removed by commits f606ddf4 and 07a887d3 in
      2008.  These #defines are not used in the current kernel.
      Signed-off-by: Rob Landley <rob@landley.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix Christoph's email addresses · 93e205a7
      Christoph Lameter committed
      There are various email addresses for me throughout the kernel.  Use the
      one that will always be valid.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bug: set warn variable before calling WARN() · dfbf2897
      Steven Rostedt committed
      This has hit me a couple of times already.  I would be debugging code
      and the system would simply hang and then reboot.  Finally, I found that
      the problem was caused by WARN_ON_ONCE() and friends.
      
      The macro WARN_ON_ONCE(condition) is defined as:
      
      	static bool __section(.data.unlikely) __warned;
      	int __ret_warn_once = !!(condition);
      
      	if (unlikely(__ret_warn_once))
      		if (WARN_ON(!__warned))
      			__warned = true;
      
      	unlikely(__ret_warn_once);
      
      Which looks great and all.  But what I have hit, is an issue when
      WARN_ON() itself hits the same WARN_ON_ONCE() code.  Because, the
      variable __warned is not yet set.  Then it too calls WARN_ON() and that
      triggers the warning again.  It keeps doing this until the stack is
      overflowed and the system crashes.
      
      Setting __warned before calling WARN_ON() makes the original
      WARN_ON_ONCE() really warn only once, rather than an unbounded number
      of times when WARN_ON() itself triggers the same warning.
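The ordering problem can be reproduced in user space. The following is a minimal sketch, not the kernel macro: every `demo_*` name is hypothetical, the reporting function deliberately re-enters the once-only check to imitate WARN_ON() hitting the same WARN_ON_ONCE(), and the recursion is capped at 1000 levels so the example terminates instead of overflowing the stack.

```c
#include <stdbool.h>

/* Hypothetical counters, used only to make the behaviour observable. */
static int demo_warn_count;
static int demo_depth;

static bool demo_warned;	/* plays the role of __warned */

static void demo_warn_once(bool set_flag_first);

/*
 * Stand-in for WARN_ON(): the reporting path itself trips the same
 * once-only check, which is the situation described above.
 */
static void demo_warn(bool set_flag_first)
{
	demo_warn_count++;
	if (++demo_depth < 1000)	/* bounded so the sketch terminates */
		demo_warn_once(set_flag_first);
	demo_depth--;
}

static void demo_warn_once(bool set_flag_first)
{
	if (demo_warned)
		return;
	if (set_flag_first) {
		demo_warned = true;		/* fixed order: mark, then warn */
		demo_warn(set_flag_first);
	} else {
		demo_warn(set_flag_first);	/* old order: warn, then mark */
		demo_warned = true;
	}
}

/* Returns how many warnings a single trigger produces. */
static int demo_run(bool set_flag_first)
{
	demo_warned = false;
	demo_warn_count = 0;
	demo_depth = 0;
	demo_warn_once(set_flag_first);
	return demo_warn_count;
}
```

In this sketch `demo_run(true)` reports once, while `demo_run(false)` keeps reporting until the artificial depth cap is reached.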
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • timer: convert timer_slack_ns from unsigned long to u64 · da8b44d5
      John Stultz committed
      This patchset introduces a /proc/<pid>/timerslack_ns interface which
      would allow controlling processes to be able to set the timerslack value
      on other processes in order to save power by avoiding wakeups (Something
      Android currently does via out-of-tree patches).
      
      The first patch tries to fix the internal timer_slack_ns usage which was
      defined as a long, which limits the slack range to ~4 seconds on 32bit
      systems.  It converts it to a u64, which provides the same basically
      unlimited slack (500 years) on both 32bit and 64bit machines.
      
      The second patch introduces the /proc/<pid>/timerslack_ns interface
      which allows the full 64bit slack range for a task to be read or set on
      both 32bit and 64bit machines.
      
      With these two patches, on a 32bit machine, after setting the slack on
      bash to 10 seconds:
      
      $ time sleep 1
      
      real    0m10.747s
      user    0m0.001s
      sys     0m0.005s
      
      The first patch is a little ugly, since I had to chase the slack delta
      arguments through a number of functions converting them to u64s.  Let me
      know if it makes sense to break that up more or not.
      
      Other than that things are fairly straightforward.
      
      This patch (of 2):
      
      The timer_slack_ns value in the task struct is currently an unsigned
      long.  This means that on 32bit applications, the maximum slack is just
      over 4 seconds.  However, on 64bit machines, it is much larger (~500
      years).

      This disparity could make application development a little confusing,
      so this patch converts timer_slack_ns (as well as the default_slack)
      to a u64.  This means both 32bit and 64bit systems have the same
      effective internal slack range.
      
      The existing ABI via PR_GET_TIMERSLACK and PR_SET_TIMERSLACK specifies
      the interface as an unsigned long, so we preserve that limitation on
      32bit systems, where SET_TIMERSLACK can only set the slack to an unsigned
      long value, and GET_TIMERSLACK will return ULONG_MAX if the slack is
      actually larger than what can be stored by an unsigned long.

      This patch also modifies hrtimer functions which specified the slack
      delta as an unsigned long.
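The ranges and the saturating read described above can be checked with a little arithmetic. This is a sketch under stated assumptions, not the kernel code: the `demo_*` names are hypothetical, `uint32_t` stands in for a 32bit `unsigned long` so the example behaves the same on any host, and a year is taken as 365 days.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000ULL

/* Largest slack, in whole seconds, that a 32-bit unsigned long can hold. */
static uint64_t demo_max_slack_secs_32bit(void)
{
	return (uint64_t)UINT32_MAX / DEMO_NSEC_PER_SEC;
}

/* Largest slack, in whole 365-day years, that a u64 can hold. */
static uint64_t demo_max_slack_years_u64(void)
{
	return UINT64_MAX / (DEMO_NSEC_PER_SEC * 31536000ULL);
}

/*
 * Value reported through a 32-bit unsigned long interface: saturate at
 * the maximum instead of silently truncating the 64-bit internal value.
 */
static uint32_t demo_get_timerslack_32bit(uint64_t slack_ns)
{
	if (slack_ns > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)slack_ns;
}
```

A 10 second slack (10,000,000,000 ns) does not fit in 32 bits, so the sketch reports the saturated maximum for it, while small values pass through unchanged.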
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oren Laadan <orenl@cellrox.com>
      Cc: Ruchi Kandoi <kandoiruchi@google.com>
      Cc: Rom Lemarchand <romlem@android.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Android Kernel Team <kernel-team@android.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers · fec89c10
      Kirill A. Shutemov committed
      The freeze_page() and unfreeze_page() helpers have evolved into rather
      complex beasts.  It would be nice to cut the complexity of this code.
      
      This patch rewrites freeze_page() using standard try_to_unmap().
      unfreeze_page() is rewritten with remove_migration_ptes().
      
      The result is much simpler.
      
      But the new variant is somewhat slower for PTE-mapped THPs.  The current
      helpers iterate over the VMAs the compound page is mapped to, and then
      over the ptes within each VMA.  The new helpers iterate over each small
      page, then over the VMAs the small page is mapped to, and only then find
      the relevant pte.

      We have a shortcut for PMD-mapped THP: we directly install migration
      entries on PMD split.

      I don't think the slowdown is critical, considering how much simpler
      the result is and that split_huge_page() is quite rare nowadays.  It only
      happens due to memory pressure or migration.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: make remove_migration_ptes() beyond mm/migration.c · e388466d
      Kirill A. Shutemov committed
      Make remove_migration_ptes() available to be used in split_huge_page().
      
      A new parameter 'locked' is added: as with try_to_unmap(), we need a way
      to indicate that the caller holds the rmap lock.

      We also shouldn't try to mlock() pte-mapped huge pages: pte-mapped THP
      pages are never mlocked.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rmap: extend try_to_unmap() to be usable by split_huge_page() · 2a52bcbc
      Kirill A. Shutemov committed
      Add support for two ttu_flags:
      
        - TTU_SPLIT_HUGE_PMD would split PMD if it's there, before trying to
          unmap page;
      
        - TTU_RMAP_LOCKED indicates that caller holds relevant rmap lock;
      
      Also, change rwc->done to !page_mapcount() instead of !page_mapped().
      try_to_unmap() works on pte level, so we are really interested in the
      mappedness of this small page rather than of the compound page it's a
      part of.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • rmap: introduce rmap_walk_locked() · b9773199
      Kirill A. Shutemov committed
      This patchset rewrites freeze_page() and unfreeze_page() using
      try_to_unmap() and remove_migration_ptes().  Result is much simpler, but
      somewhat slower.
      
      Migration 8GiB worth of PMD-mapped THP:
      
        Baseline	20.21 +/- 0.393
        Patched	20.73 +/- 0.082
        Slowdown	1.03x
      
      It's 3% slower, compared to 14% in v1.  I don't think it should be a stopper.
      
      Splitting of PTE-mapped pages slowed more.  But this is not a common
      case.
      
      Migration 8GiB worth of PTE-mapped THP:
      
        Baseline	20.39 +/- 0.225
        Patched	22.43 +/- 0.496
        Slowdown	1.10x
      
      rmap_walk_locked() is the same as rmap_walk(), but the caller takes care
      of the relevant rmap lock.
      
      This is preparation for switching THP splitting from custom rmap walk in
      freeze_page()/unfreeze_page() to the generic one.
      
      There is no support for KSM pages for now: not clear which lock is
      implied.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove VM_FAULT_MINOR · 0e8fb931
      Jan Kara committed
      The define has a comment from Nick Piggin from 2007:
      
       /* For backwards compat. Remove me quickly. */
      
      I guess 9 years is not too hurried a sense of 'quickly', even by
      kernel standards.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: exclude ZONE_DEVICE from GFP_ZONE_TABLE · b11a7b94
      Dan Williams committed
      ZONE_DEVICE (merged in 4.3) and ZONE_CMA (proposed) are examples of new
      mm zones that are bumping up against the current maximum limit of 4
      zones, i.e.  2 bits in page->flags for the GFP_ZONE_TABLE.
      
      The GFP_ZONE_TABLE poses an interesting constraint since
      include/linux/gfp.h gets included by the 32-bit portion of a 64-bit
      build.  We need to be careful to only build the table for zones that
      have a corresponding gfp_t flag.  GFP_ZONES_SHIFT is introduced for this
      purpose.  This patch does not attempt to solve the problem of adding a
      new zone that also has a corresponding GFP_ flag.
      
      Vlastimil points out that ZONE_DEVICE, by depending on x86_64 and
      SPARSEMEM_VMEMMAP implies that SECTIONS_WIDTH is zero.  In other words
      even though ZONE_DEVICE does not fit in GFP_ZONE_TABLE it is free to
      consume another bit in page->flags (expand ZONES_WIDTH) with room to
      spare.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110931
      Fixes: 033fbae9 ("mm: ZONE_DEVICE for "device memory"")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Mark <markk@clara.co.uk>
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_ref: add tracepoint to track down page reference manipulation · 95813b8f
      Joonsoo Kim committed
      CMA allocation should be guaranteed to succeed by definition, but,
      unfortunately, it sometimes fails.  It is hard to track down the
      problem, because it is related to page reference manipulation and we
      don't have any facility to analyze it.

      This patch adds tracepoints to track down page reference manipulation.
      With them, we can find the exact reason for a failure and fix the problem.
      Following is an example of tracepoint output.  (Note: this example is
      from a stale version that prints flags as a number.  Recent versions
      print them as a human-readable string.)
      
      <...>-9018  [004]    92.678375: page_ref_set:         pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
      <...>-9018  [004]    92.678378: kernel_stack:
       => get_page_from_freelist (ffffffff81176659)
       => __alloc_pages_nodemask (ffffffff81176d22)
       => alloc_pages_vma (ffffffff811bf675)
       => handle_mm_fault (ffffffff8119e693)
       => __do_page_fault (ffffffff810631ea)
       => trace_do_page_fault (ffffffff81063543)
       => do_async_page_fault (ffffffff8105c40a)
       => async_page_fault (ffffffff817581d8)
      [snip]
      <...>-9018  [004]    92.678379: page_ref_mod:         pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
      [snip]
      ...
      ...
      <...>-9131  [001]    93.174468: test_pages_isolated:  start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
      [snip]
      <...>-9018  [004]    93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
       => release_pages (ffffffff8117c9e4)
       => free_pages_and_swap_cache (ffffffff811b0697)
       => tlb_flush_mmu_free (ffffffff81199616)
       => tlb_finish_mmu (ffffffff8119a62c)
       => exit_mmap (ffffffff811a53f7)
       => mmput (ffffffff81073f47)
       => do_exit (ffffffff810794e9)
       => do_group_exit (ffffffff81079def)
       => SyS_exit_group (ffffffff81079e74)
       => entry_SYSCALL_64_fastpath (ffffffff817560b6)
      
      This output shows that the problem comes from the exit path.  In the exit
      path, to improve performance, pages are not freed immediately.  They are
      gathered and processed in batches.  During this process, migration is not
      possible and CMA allocation fails.  This problem is hard to find
      without this page reference tracepoint facility.

      Enabling this feature bloats kernel text by about 30 KB in my configuration.
      
            text    data     bss      dec     hex filename
        12127327 2243616 1507328 15878271  f2487f vmlinux_disabled
        12157208 2258880 1507328 15923416  f2f8d8 vmlinux_enabled
      
      Note that, due to a header file dependency problem between mm.h and
      tracepoint.h, this feature has to open code the static key functions for
      tracepoints.  This was proposed by Steven Rostedt at the following link.
      
      https://lkml.org/lkml/2015/12/9/699
      
      [arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
      [iamjoonsoo.kim@lge.com: fix build failure for xtensa]
      [akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce page reference manipulation functions · fe896d18
      Joonsoo Kim committed
      The success of CMA allocation largely depends on the success of
      migration, and a key factor in that is the page reference count.  Until
      now, the page reference has been manipulated by calling atomic functions
      directly, so we cannot track who manipulates it and where.  That makes
      it hard to find the actual reason for a CMA allocation failure.  CMA
      allocation should be guaranteed to succeed, so finding the offending
      place is really important.
      
      In this patch, call sites where the page reference is manipulated are
      converted to the newly introduced wrapper functions.  This is a
      preparation step for adding a tracepoint to each page reference
      manipulation function.  With this facility, we can easily find the reason
      for a CMA allocation failure.  There is no functional change in this patch.
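The wrapper idea can be sketched in user space with C11 atomics. This is an illustration only, not the kernel's implementation: `struct demo_page`, the `demo_*` functions and the counting hook are all hypothetical, and the hook merely marks the single place where a tracepoint could later be attached.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_page {
	atomic_int refcount;	/* stands in for the page reference count */
};

/* Hypothetical hook: a tracepoint would be emitted here. */
static int demo_trace_hits;

static void demo_trace_page_ref(const struct demo_page *page, int delta)
{
	(void)page;
	(void)delta;
	demo_trace_hits++;
}

/* Read site: callers no longer touch the counter field directly. */
static int demo_page_ref_count(struct demo_page *page)
{
	return atomic_load(&page->refcount);
}

static void demo_page_ref_inc(struct demo_page *page)
{
	atomic_fetch_add(&page->refcount, 1);
	demo_trace_page_ref(page, 1);
}

/* Drops one reference; returns true when the count reaches zero. */
static bool demo_page_ref_dec_and_test(struct demo_page *page)
{
	bool zero = atomic_fetch_sub(&page->refcount, 1) == 1;

	demo_trace_page_ref(page, -1);
	return zero;
}
```

Because every modification goes through one of these functions, instrumenting them once covers all call sites, which is the point of converting the callers first.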
      
      In addition, this patch also converts reference read sites.  This will
      help a second step that renames page._count to something else and
      prevents later attempts to access it directly (suggested by Andrew).
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: set THP defrag by default to madvise and add a stall-free defrag option · 444eb2a4
      Mel Gorman committed
      THP defrag is enabled by default to direct reclaim/compact but not wake
      kswapd in the event of a THP allocation failure.  The problem is that
      THP allocation requests potentially enter reclaim/compaction.  This
      potentially incurs a severe stall that is not guaranteed to be offset by
      reduced TLB misses.  While there has been considerable effort to reduce
      the impact of reclaim/compaction, it is still a high cost and workloads
      that should fit in memory fail to do so.  Specifically, a simple
      anon/file streaming workload will enter direct reclaim on NUMA at least
      even though the working set size is 80% of RAM.  It's been years and
      it's time to throw in the towel.
      
      First, this patch defines THP defrag as follows:
      
       madvise: A failed allocation will direct reclaim/compact if the application requests it
       never:   Neither reclaim/compact nor wake kswapd
       defer:   A failed allocation will wake kswapd/kcompactd
       always:  A failed allocation will direct reclaim/compact (historical behaviour)
                khugepaged defrag will enter direct/reclaim but not wake kswapd.
      
      Next it sets the default defrag option to be "madvise" to only enter
      direct reclaim/compaction for applications that specifically requested
      it.
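The four settings listed above can be read as a small decision table for the page fault path. The sketch below encodes that table and nothing more; the enum, struct and function names are hypothetical, khugepaged's separate behaviour is left out, and it assumes that in "madvise" mode a fault from a region that did not request THP neither reclaims nor wakes the background threads.

```c
#include <stdbool.h>

enum demo_defrag_mode { DEMO_ALWAYS, DEMO_DEFER, DEMO_MADVISE, DEMO_NEVER };

struct demo_defrag_action {
	bool direct_reclaim;	/* fault may stall in direct reclaim/compaction */
	bool wake_background;	/* wake kswapd/kcompactd and fall back instead */
};

/*
 * What a failed THP allocation does at fault time for each setting.
 * 'madvised' is true when the application requested THP for the region.
 */
static struct demo_defrag_action
demo_thp_fault_action(enum demo_defrag_mode mode, bool madvised)
{
	struct demo_defrag_action act = { false, false };

	switch (mode) {
	case DEMO_ALWAYS:	/* historical behaviour */
		act.direct_reclaim = true;
		break;
	case DEMO_DEFER:	/* stall-free: background threads do the work */
		act.wake_background = true;
		break;
	case DEMO_MADVISE:	/* new default: stall only on request */
		act.direct_reclaim = madvised;
		break;
	case DEMO_NEVER:
		break;
	}
	return act;
}
```

With the new default, `demo_thp_fault_action(DEMO_MADVISE, false)` returns an action with both fields false, which corresponds to the fault falling back to base pages without stalling.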
      
      Lastly, it removes a check from the page allocator slowpath that is
      related to __GFP_THISNODE to allow "defer" to work.  The callers that
      really care are slub/slab and they are updated accordingly.  The slab
      one may be surprising because it also corrects a comment as kswapd was
      never woken up by that path.
      
      This means that a THP fault will no longer stall for most applications
      by default, which is ideal for most users, who get THP if they are
      immediately available.  There are still options for users that prefer a
      stall at startup of a new application: either restore historical
      behaviour with "always" or pick a half-way point with "defer", where
      kswapd does some of the work in the background and wakes kcompactd if
      necessary.  THP defrag for khugepaged remains enabled and will enter
      direct reclaim but not wake kswapd or kcompactd.
      
      After this patch a THP allocation failure will quickly fall back and rely
      on khugepaged to recover the situation at some time in the future.  In
      some cases, this will reduce THP usage but the benefit of THP is hard to
      measure and not a universal win, whereas a stall due to reclaim/compaction
      is definitely measurable and can be painful.
      
      The first test for this is using "usemem" to read a large file and write
      a large anonymous mapping (to avoid the zero page) multiple times.  The
      total size of the mappings is 80% of RAM and the benchmark simply
      measures how long it takes to complete.  It uses multiple threads to see
      if that is a factor.  On UMA, the performance is almost identical so is
      not reported but on NUMA, we see this
      
      usemem
                                         4.4.0                 4.4.0
                                kcompactd-v1r1         nodefrag-v1r3
      Amean    System-1       102.86 (  0.00%)       46.81 ( 54.50%)
      Amean    System-4        37.85 (  0.00%)       34.02 ( 10.12%)
      Amean    System-7        48.12 (  0.00%)       46.89 (  2.56%)
      Amean    System-12       51.98 (  0.00%)       56.96 ( -9.57%)
      Amean    System-21       80.16 (  0.00%)       79.05 (  1.39%)
      Amean    System-30      110.71 (  0.00%)      107.17 (  3.20%)
      Amean    System-48      127.98 (  0.00%)      124.83 (  2.46%)
      Amean    Elapsd-1       185.84 (  0.00%)      105.51 ( 43.23%)
      Amean    Elapsd-4        26.19 (  0.00%)       25.58 (  2.33%)
      Amean    Elapsd-7        21.65 (  0.00%)       21.62 (  0.16%)
      Amean    Elapsd-12       18.58 (  0.00%)       17.94 (  3.43%)
      Amean    Elapsd-21       17.53 (  0.00%)       16.60 (  5.33%)
      Amean    Elapsd-30       17.45 (  0.00%)       17.13 (  1.84%)
      Amean    Elapsd-48       15.40 (  0.00%)       15.27 (  0.82%)
      
      For a single thread, the benchmark completes 43.23% faster with this
      patch applied, with smaller benefits as the thread count increases.
      Similarly, notice the large reduction in most cases in system CPU usage.
      The overall CPU time is
      
                     4.4.0       4.4.0
              kcompactd-v1r1 nodefrag-v1r3
      User        10357.65    10438.33
      System       3988.88     3543.94
      Elapsed      2203.01     1634.41
      
      Which is substantial. Now, the reclaim figures
      
                                       4.4.0       4.4.0
                                kcompactd-v1r1 nodefrag-v1r3
      Minor Faults                 128458477   278352931
      Major Faults                   2174976         225
      Swap Ins                      16904701           0
      Swap Outs                     17359627           0
      Allocation stalls                43611           0
      DMA allocs                           0           0
      DMA32 allocs                  19832646    19448017
      Normal allocs                614488453   580941839
      Movable allocs                       0           0
      Direct pages scanned          24163800           0
      Kswapd pages scanned                 0           0
      Kswapd pages reclaimed               0           0
      Direct pages reclaimed        20691346           0
      Compaction stalls                42263           0
      Compaction success                 938           0
      Compaction failures              41325           0
      
      This patch eliminates almost all swapping and direct reclaim activity.
      There is still overhead but it's from NUMA balancing which does not
      identify that it's pointless trying to do anything with this workload.
      
      I also tried the thpscale benchmark which forces a corner case where
      compaction can be used heavily and measures the latency of whether base
      or huge pages were used
      
      thpscale Fault Latencies
                                             4.4.0                 4.4.0
                                    kcompactd-v1r1         nodefrag-v1r3
      Amean    fault-base-1      5288.84 (  0.00%)     2817.12 ( 46.73%)
      Amean    fault-base-3      6365.53 (  0.00%)     3499.11 ( 45.03%)
      Amean    fault-base-5      6526.19 (  0.00%)     4363.06 ( 33.15%)
      Amean    fault-base-7      7142.25 (  0.00%)     4858.08 ( 31.98%)
      Amean    fault-base-12    13827.64 (  0.00%)    10292.11 ( 25.57%)
      Amean    fault-base-18    18235.07 (  0.00%)    13788.84 ( 24.38%)
      Amean    fault-base-24    21597.80 (  0.00%)    24388.03 (-12.92%)
      Amean    fault-base-30    26754.15 (  0.00%)    19700.55 ( 26.36%)
      Amean    fault-base-32    26784.94 (  0.00%)    19513.57 ( 27.15%)
      Amean    fault-huge-1      4223.96 (  0.00%)     2178.57 ( 48.42%)
      Amean    fault-huge-3      2194.77 (  0.00%)     2149.74 (  2.05%)
      Amean    fault-huge-5      2569.60 (  0.00%)     2346.95 (  8.66%)
      Amean    fault-huge-7      3612.69 (  0.00%)     2997.70 ( 17.02%)
      Amean    fault-huge-12     3301.75 (  0.00%)     6727.02 (-103.74%)
      Amean    fault-huge-18     6696.47 (  0.00%)     6685.72 (  0.16%)
      Amean    fault-huge-24     8000.72 (  0.00%)     9311.43 (-16.38%)
      Amean    fault-huge-30    13305.55 (  0.00%)     9750.45 ( 26.72%)
      Amean    fault-huge-32     9981.71 (  0.00%)    10316.06 ( -3.35%)
      
      The average time to fault pages is substantially reduced in the majority
      of cases but with the obvious caveat that fewer THPs are actually used
      in this adverse workload
      
                                         4.4.0                 4.4.0
                                kcompactd-v1r1         nodefrag-v1r3
      Percentage huge-1         0.71 (  0.00%)       14.04 (1865.22%)
      Percentage huge-3        10.77 (  0.00%)       33.05 (206.85%)
      Percentage huge-5        60.39 (  0.00%)       38.51 (-36.23%)
      Percentage huge-7        45.97 (  0.00%)       34.57 (-24.79%)
      Percentage huge-12       68.12 (  0.00%)       40.07 (-41.17%)
      Percentage huge-18       64.93 (  0.00%)       47.82 (-26.35%)
      Percentage huge-24       62.69 (  0.00%)       44.23 (-29.44%)
      Percentage huge-30       43.49 (  0.00%)       55.38 ( 27.34%)
      Percentage huge-32       50.72 (  0.00%)       51.90 (  2.35%)
      
                                       4.4.0       4.4.0
                                kcompactd-v1r1 nodefrag-v1r3
      Minor Faults                  37429143    47564000
      Major Faults                      1916        1558
      Swap Ins                          1466        1079
      Swap Outs                      2936863      149626
      Allocation stalls                62510           3
      DMA allocs                           0           0
      DMA32 allocs                   6566458     6401314
      Normal allocs                216361697   216538171
      Movable allocs                       0           0
      Direct pages scanned          25977580       17998
      Kswapd pages scanned                 0     3638931
      Kswapd pages reclaimed               0      207236
      Direct pages reclaimed         8833714          88
      Compaction stalls               103349           5
      Compaction success                 270           4
      Compaction failures             103079           1
      
      Note again that while this does swap as it's an aggressive workload, the
      direct reclaim activity and allocation stalls are substantially reduced.
      There is some kswapd activity but ftrace showed that the kswapd activity
      was due to normal wakeups from 4K pages being allocated.
      Compaction-related stalls and activity are almost eliminated.
      
      I also tried the stutter benchmark.  For this, I do not have figures for
      NUMA but it's something that does impact UMA so I'll report what is
      available
      
      stutter
                                       4.4.0                 4.4.0
                              kcompactd-v1r1         nodefrag-v1r3
      Min         mmap      7.3571 (  0.00%)      7.3438 (  0.18%)
      1st-qrtle   mmap      7.5278 (  0.00%)     17.9200 (-138.05%)
      2nd-qrtle   mmap      7.6818 (  0.00%)     21.6055 (-181.25%)
      3rd-qrtle   mmap     11.0889 (  0.00%)     21.8881 (-97.39%)
      Max-90%     mmap     27.8978 (  0.00%)     22.1632 ( 20.56%)
      Max-93%     mmap     28.3202 (  0.00%)     22.3044 ( 21.24%)
      Max-95%     mmap     28.5600 (  0.00%)     22.4580 ( 21.37%)
      Max-99%     mmap     29.6032 (  0.00%)     25.5216 ( 13.79%)
      Max         mmap   4109.7289 (  0.00%)   4813.9832 (-17.14%)
      Mean        mmap     12.4474 (  0.00%)     19.3027 (-55.07%)
      
      This benchmark is trying to fault an anonymous mapping while there is a
      heavy IO load -- a scenario that desktop users used to complain about
      frequently.  This shows a mix because the ideal case of mapping with THP
      is not hit as often.  However, note that 99% of the mappings complete
      13.79% faster.  The CPU usage here is particularly interesting
      
                     4.4.0       4.4.0
              kcompactd-v1r1 nodefrag-v1r3
      User           67.50        0.99
      System       1327.88       91.30
      Elapsed      2079.00     2128.98
      
      And once again we look at the reclaim figures
      
                                       4.4.0       4.4.0
                                kcompactd-v1r1 nodefrag-v1r3
      Minor Faults                 335241922  1314582827
      Major Faults                       715         819
      Swap Ins                             0           0
      Swap Outs                            0           0
      Allocation stalls               532723           0
      DMA allocs                           0           0
      DMA32 allocs                1822364341  1177950222
      Normal allocs               1815640808  1517844854
      Movable allocs                       0           0
      Direct pages scanned          21892772           0
      Kswapd pages scanned          20015890    41879484
      Kswapd pages reclaimed        19961986    41822072
      Direct pages reclaimed        21892741           0
      Compaction stalls              1065755           0
      Compaction success                 514           0
      Compaction failures            1065241           0
      
      Allocation stalls and all direct reclaim activity are eliminated, as are
      compaction-related stalls.
      
      THP gives impressive gains in some cases but only if they are quickly
      available.  We're not going to reach the point where they are completely
      free so let's take the costs out of the fast paths finally and defer the
      cost to kswapd, kcompactd and khugepaged where it belongs.
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove unnecessary description about a non-exist gfp flag · b14a1ef5
      Satoru Takeuchi committed
      Since __GFP_NOACCOUNT was removed by commit 20b5c303 ("Revert 'gfp:
      add __GFP_NOACCOUNT'"), its description is not necessary.
      Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      mm: scale kswapd watermarks in proportion to memory · 795ae7a0
      Johannes Weiner committed
      In machines with 140G of memory and enterprise flash storage, we have
      seen read and write bursts routinely exceed the kswapd watermarks and
      cause thundering herds in direct reclaim.  Unfortunately, the only way
      to tune kswapd aggressiveness is through adjusting min_free_kbytes - the
      system's emergency reserves - which is entirely unrelated to the
      system's latency requirements.  In order to get kswapd to maintain a
      250M buffer of free memory, the emergency reserves need to be set to 1G.
      That is a lot of memory wasted for no good reason.
      
      On the other hand, it's reasonable to assume that allocation bursts and
      overall allocation concurrency scale with memory capacity, so it makes
      sense to make kswapd aggressiveness a function of that as well.
      
      Change the kswapd watermark scale factor from the currently fixed 25% of
      the tunable emergency reserve to a tunable 0.1% of memory.
      
      Beyond 1G of memory, this will produce bigger watermark steps than the
      current formula in default settings.  Ensure that the new formula never
      chooses steps smaller than that, i.e.  25% of the emergency reserve.
      
      On a 140G machine, this raises the default watermark steps - the
      distance between min and low, and low and high - from 16M to 143M.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      795ae7a0
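The arithmetic behind the 795ae7a0 watermark change can be checked with a small sketch. This is not the kernel code: the function name is hypothetical, the scale factor is assumed to be expressed in units of 1/10000 of memory (so 10 means 0.1%), and the 64M emergency reserve is an assumption chosen because 25% of it gives the 16M step quoted in the message.

```python
def watermark_step_kb(managed_kb, min_reserve_kb, scale_factor=10):
    """Distance between the min/low and low/high watermarks, in kB.

    scale_factor is assumed to be in units of 1/10000 of memory,
    so the default of 10 corresponds to 0.1%.
    """
    proportional = managed_kb * scale_factor // 10000
    floor = min_reserve_kb // 4  # the old fixed 25% of the emergency reserve
    # The new formula never chooses a step smaller than the old one.
    return max(floor, proportional)


GIB_KB = 1024 * 1024

# Hypothetical 140G machine with an assumed 64M emergency reserve.
old_step_kb = (64 * 1024) // 4
new_step_kb = watermark_step_kb(140 * GIB_KB, 64 * 1024)
print(old_step_kb // 1024, "M ->", new_step_kb // 1024, "M")
```

Under these assumptions the step grows from 16M to about 143M, which matches the figures in the commit message.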
    • K
      mm: cleanup *pte_alloc* interfaces · 3ed3a4f0
      Kirill A. Shutemov committed
      There are a few things about the *pte_alloc*() helpers worth cleaning up:
      
       - 'vma' argument is unused, let's drop it;
      
       - most __pte_alloc() callers do a speculative check for pmd_none()
         before taking the ptl: let's introduce a pte_alloc() macro which
         does the check.
      
         The only direct user of __pte_alloc() left is userfaultfd, which has
         different expectations about atomicity wrt pmd.
      
       - pte_alloc_map() and pte_alloc_map_lock() are redefined using
         pte_alloc().
      
      [sudeep.holla@arm.com: fix build for arm64 hugetlbpage]
      [sfr@canb.auug.org.au: fix arch/arm/mm/mmu.c some more]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ed3a4f0
    • I
      virtio_balloon: export 'available' memory to balloon statistics · 5057dcd0
      Igor Redko committed
      Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
      statistics protocol, corresponding to 'Available' in /proc/meminfo.
      
      It indicates to the hypervisor how far the balloon can be inflated
      without pushing the guest system to swap.
      Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5057dcd0
    • I
      mm/page_alloc.c: calculate 'available' memory in a separate function · d02bd27b
      Igor Redko committed
      Add a new field, VIRTIO_BALLOON_S_AVAIL, to virtio_balloon memory
      statistics protocol, corresponding to 'Available' in /proc/meminfo.
      
      It indicates to the hypervisor how far the balloon can be inflated
      without pushing the guest system to swap.  This metric would be very
      useful in VM orchestration software to improve memory management of
      different VMs under overcommit.
      
      This patch (of 2):
      
      Factor out calculation of the available memory counter into a separate
      exportable function, in order to be able to use it in other parts of the
      kernel.
      
      In particular, it appears to be a relevant metric to report to the
      hypervisor via the virtio-balloon statistics interface (in a followup
      patch).
      Signed-off-by: Igor Redko <redkoi@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d02bd27b
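From user space, the counter that d02bd27b factors out is visible as the `MemAvailable` line of /proc/meminfo. The following is a minimal sketch of reading it; the helper name and the sample values are hypothetical, and the kernel-side function itself is not shown here.

```python
def meminfo_kb(text, field):
    """Return the value of `field` in kB from /proc/meminfo-style text, or None."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # Lines look like "MemAvailable:   12040268 kB".
            return int(rest.split()[0])
    return None


# Made-up sample; on a live system, read the text from /proc/meminfo instead.
SAMPLE = """MemTotal:       16318432 kB
MemFree:         3134992 kB
MemAvailable:   12040268 kB
"""

available_kb = meminfo_kb(SAMPLE, "MemAvailable")
print("available:", available_kb, "kB")
```

A hypervisor-side consumer would receive the same quantity through the balloon statistics rather than by parsing this file.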
    • A
      mm/thp/migration: switch from flush_tlb_range to flush_pmd_tlb_range · 458aa76d
      Aneesh Kumar K.V committed
      We remove one instance of flush_tlb_range here.  That was added by commit
      f714f4f2 ("mm: numa: call MMU notifiers on THP migration").  But the
      pmdp_huge_clear_flush_notify should have done the required flush for us.
      Hence remove the extra flush.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      458aa76d
    • K
      mm, tracing: refresh __def_vmaflag_names · bcf66917
      Kirill A. Shutemov committed
      Get list of VMA flags up-to-date and sort it to match VM_* definition
      order.
      
      [vbabka@suse.cz: add a note above vmaflag definitions to update the names when changing]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bcf66917
    • A
      mm: move max_map_count bits into mm.h · ea606cf5
      Andrey Ryabinin committed
      The max_map_count sysctl is unrelated to the scheduler.  Move its bits from
      include/linux/sched/sysctl.h to include/linux/mm.h.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ea606cf5
    • K
      thp, vmstats: count deferred split events · f9719a03
      Kirill A. Shutemov committed
      Count how many times we put a THP in the split queue.  Currently, this
      happens on partial unmap of a THP.
      
      A rapidly growing value can indicate that an application is unfriendly
      to THP: it often faults in a huge page and then unmaps part of it.
      This leads to unnecessary memory fragmentation, and the application may
      require tuning.
      
      The event also can help with debugging kernel [mis-]behaviour.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9719a03
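Whether the f9719a03 counter is "rapidly growing" can be judged by sampling it twice and computing a rate. The sketch below assumes the event is exposed in /proc/vmstat under the name `thp_deferred_split_page`; the commit message above does not state the name, so treat it, the helper names and the sample values as assumptions.

```python
def vmstat_counter(text, name):
    """Return the integer value of `name` from /proc/vmstat-style text, or None."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == name:
            return int(parts[1])
    return None


def events_per_second(before, after, name, interval_s):
    """Rate of change of a counter between two snapshots taken interval_s apart."""
    first = vmstat_counter(before, name)
    second = vmstat_counter(after, name)
    if first is None or second is None:
        return None
    return (second - first) / interval_s


# Made-up snapshots taken 10 seconds apart.
SNAP_A = "thp_fault_alloc 1200\nthp_deferred_split_page 40\n"
SNAP_B = "thp_fault_alloc 1500\nthp_deferred_split_page 340\n"

rate = events_per_second(SNAP_A, SNAP_B, "thp_deferred_split_page", 10)
print("deferred splits per second:", rate)
```

What rate counts as a problem depends on the workload; the sketch only makes the trend measurable.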
    • V
      mm: workingset: make shadow node shrinker memcg aware · 0a6b76dd
      Vladimir Davydov committed
      Workingset code was recently made memcg aware, but shadow node shrinker
      is still global.  As a result, one small cgroup can consume all memory
      available for shadow nodes, possibly hurting other cgroups by reclaiming
      their shadow nodes, even though reclaim distances stored in its shadow
      nodes have no effect.  To avoid this, we need to make shadow node
      shrinker memcg aware.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a6b76dd
    • V
      mm: memcontrol: zap memcg_kmem_online helper · b6ecd2de
      Vladimir Davydov committed
      As kmem accounting is now either enabled for all cgroups or disabled
      system-wide, there's no point in having memcg_kmem_online() helper -
      instead one can use memcg_kmem_enabled() and mem_cgroup_online(), as
      shrink_slab() now does.
      
      There are only two places left where this helper is used -
      __memcg_kmem_charge() and memcg_create_kmem_cache().  The former can
      only be called if memcg_kmem_enabled() returned true.  Since the cgroup
      it operates on is online, mem_cgroup_is_root() check will be enough.
      
      memcg_create_kmem_cache() can't use mem_cgroup_online() helper instead
      of memcg_kmem_online(), because it relies on the fact that in
      memcg_offline_kmem() memcg->kmem_state is changed before
      memcg_deactivate_kmem_caches() is called, but there we can just
      open-code the check.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b6ecd2de
    • D
      include/linux/page-flags.h: force inlining of selected page flag modifications · 4b0f3261
      Denys Vlasenko committed
      Sometimes gcc mysteriously doesn't inline
      very small functions we expect to be inlined. See
      
          https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
      
      With this .config:
      http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
      the following functions get deinlined many times.
      Examples of disassembly:
      
      <SetPageUptodate> (43 copies, 141 calls):
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 80 0f 08             lock orb $0x8,(%rdi)
             5d                      pop    %rbp
             c3                      retq
      
      <PagePrivate> (10 copies, 134 calls):
             48 8b 07                mov    (%rdi),%rax
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             48 c1 e8 0b             shr    $0xb,%rax
             83 e0 01                and    $0x1,%eax
             5d                      pop    %rbp
             c3                      retq
      
      This patch fixes this via s/inline/__always_inline/.
      
      Code size decrease after the patch is ~7k:
      
          text     data      bss       dec     hex filename
      92125002 20826048 36417536 149368586 8e72f0a vmlinux
      92118087 20826112 36417536 149361735 8e71447 vmlinux7_pageops_after
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b0f3261
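The "~7k" figure in 4b0f3261 can be reproduced from the two `size` rows quoted in the message. This is a small sketch with hypothetical helper names; the rows are copied from the commit message.

```python
def parse_size_row(row):
    """Split one data row of `size` output into named integer fields."""
    text, data, bss, dec, hex_total, filename = row.split()
    return {
        "text": int(text),
        "data": int(data),
        "bss": int(bss),
        "dec": int(dec),
        "hex": int(hex_total, 16),  # same total as "dec", printed in hex
        "filename": filename,
    }


before = parse_size_row("92125002 20826048 36417536 149368586 8e72f0a vmlinux")
after = parse_size_row(
    "92118087 20826112 36417536 149361735 8e71447 vmlinux7_pageops_after"
)

# Negative values mean the section shrank after the patch.
delta = {key: after[key] - before[key] for key in ("text", "data", "bss", "dec")}
print(delta)
```

The text section shrinks by 6915 bytes while data grows by 64, so the total drops by 6851 bytes, in line with the rounded figure in the message.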
    • D
      bufferhead: force inlining of buffer head flag operations · ee91ef61
      Denys Vlasenko committed
      With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously doesn't inline
      very small functions we expect to be inlined.  See
      
          https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
      
      With this .config:
      http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
      set_buffer_foo(), clear_buffer_foo() and similar functions get deinlined
      about 60 times. Examples of disassembly:
      
      <set_buffer_mapped> (14 copies, 43 calls):
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 80 0f 20             lock orb $0x20,(%rdi)
             5d                      pop    %rbp
             c3                      retq
      <buffer_mapped> (3 copies, 34 calls):
             48 8b 07                mov    (%rdi),%rax
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             48 c1 e8 05             shr    $0x5,%rax
             83 e0 01                and    $0x1,%eax
             5d                      pop    %rbp
             c3                      retq
      <set_buffer_new> (5 copies, 13 calls):
             55                      push   %rbp
             48 89 e5                mov    %rsp,%rbp
             f0 80 0f 40             lock orb $0x40,(%rdi)
             5d                      pop    %rbp
             c3                      retq
      
      This patch fixes this via s/inline/__always_inline/.
      This decreases vmlinux by about 3 kbytes.
      
          text	    data	     bss	      dec	    hex	filename
      88200439	19905208	36421632	144527279	89d4faf	vmlinux2
      88197239	19905240	36421632	144524111	89d434f	vmlinux
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee91ef61
    • V
      mm, compaction: introduce kcompactd · 698b1b30
      Vlastimil Babka committed
      Memory compaction can be currently performed in several contexts:
      
       - kswapd balancing a zone after a high-order allocation failure
       - direct compaction to satisfy a high-order allocation, including THP
         page fault attempts
       - khugepaged trying to collapse a hugepage
       - manually from /proc
      
      The purpose of compaction is two-fold.  The obvious purpose is to
      satisfy a (pending or future) high-order allocation, and is easy to
      evaluate.  The other purpose is to keep overall memory fragmentation low
      and help the anti-fragmentation mechanism.  The success wrt the latter
      purpose is more difficult to evaluate.
      
      The current situation wrt the purposes has a few drawbacks:
      
       - compaction is invoked only when a high-order page or hugepage is not
         available (or manually).  This might be too late for the purposes of
         keeping memory fragmentation low.
       - direct compaction increases latency of allocations.  Again, it would
         be better if compaction was performed asynchronously to keep
         fragmentation low, before the allocation itself comes.
       - (a special case of the previous) the cost of compaction during THP
         page faults can easily offset the benefits of THP.
       - kswapd compaction appears to be complex, fragile and not working in
         some scenarios.  It could also end up compacting for a high-order
         allocation request when it should be reclaiming memory for a later
         order-0 request.
      
      To improve the situation, we should be able to benefit from an
      equivalent of kswapd, but for compaction - i.e. a background thread
      which responds to fragmentation and the need for high-order allocations
      (including hugepages) somewhat proactively.
      
      One possibility is to extend the responsibilities of kswapd, which could
      however complicate its design too much.  It should be better to let
      kswapd handle reclaim, as order-0 allocations are often more critical
      than high-order ones.
      
      Another possibility is to extend khugepaged, but this kthread is a
      single instance and tied to THP configs.
      
      This patch goes with the option of a new set of per-node kthreads called
      kcompactd, and lays the foundations, without introducing any new
      tunables.  The lifecycle mimics kswapd kthreads, including the memory
      hotplug hooks.
      
      For compaction, kcompactd uses the standard compaction_suitable() and
      compact_finished() criteria and the deferred compaction functionality.
      Unlike direct compaction, it uses only sync compaction, as there's no
      allocation latency to minimize.
      
      This patch doesn't yet add a call to wakeup_kcompactd.  The kswapd
      compact/reclaim loop for high-order pages will be replaced by waking up
      kcompactd in the next patch with the description of what's wrong with
      the old approach.
      
      Waking up of the kcompactd threads is also tied to kswapd activity and
      follows these rules:
       - we don't want to affect any fastpaths, so wake up kcompactd only from
         the slowpath, as it's done for kswapd
       - if kswapd is doing reclaim, it's more important than compaction, so
         don't invoke kcompactd until kswapd goes to sleep
       - the target order used for kswapd is passed to kcompactd
      
      Possible future uses for kcompactd include the ability to wake up
      kcompactd on demand in special situations, such as when hugepages are
      not available (currently not done due to __GFP_NO_KSWAPD) or when a
      fragmentation event (i.e.  __rmqueue_fallback()) occurs.  It's also
      possible to perform periodic compaction with kcompactd.
      
      [arnd@arndb.de: fix build errors with kcompactd]
      [paul.gortmaker@windriver.com: don't use modular references for non modular code]
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      698b1b30
    • N
      /proc/kpageflags: return KPF_BUDDY for "tail" buddy pages · 832fc1de
      Naoya Horiguchi committed
      Currently /proc/kpageflags returns nothing for "tail" buddy pages, which
      is inconvenient when grasping how free pages are distributed.  This
      patch sets KPF_BUDDY for such pages.
      
      With this patch:
      
        $ grep MemFree /proc/meminfo ; tools/vm/page-types -b buddy
        MemFree:         3134992 kB
                     flags      page-count       MB  symbolic-flags                     long-symbolic-flags
        0x0000000000000400          779272     3044  __________B_______________________________ buddy
        0x0000000000000c00            4385       17  __________BM______________________________ buddy,mmap
                     total          783657     3061
      
      783657 pages is 3134628 kB (roughly consistent with the global counter),
      so it's OK.
      
      [akpm@linux-foundation.org: update comment, per Naoya]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      832fc1de
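The consistency check described in 832fc1de is simple arithmetic and can be written out as follows. The page counts and the MemFree value are taken from the output quoted in the message; the 4 kB page size is an assumption that happens to reproduce the 3134628 kB figure.

```python
PAGE_KB = 4  # assumed page size in kB; not stated in the commit message

# Page counts from the two page-types rows (buddy and buddy,mmap).
buddy_pages = 779272 + 4385
buddy_kb = buddy_pages * PAGE_KB

memfree_kb = 3134992  # MemFree from the quoted /proc/meminfo line
gap_kb = memfree_kb - buddy_kb

print(buddy_pages, "pages =", buddy_kb, "kB; gap to MemFree:", gap_kb, "kB")
```

Under that assumption the two numbers differ by a few hundred kB, which is what the message means by "roughly consistent".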
    • V
      mm: memcontrol: report kernel stack usage in cgroup2 memory.stat · 12580e4b
      Vladimir Davydov committed
      Show how much memory is allocated to kernel stacks.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      12580e4b
    • V
      mm: memcontrol: report slab usage in cgroup2 memory.stat · 27ee57c9
      Vladimir Davydov committed
      Show how much memory is used for storing reclaimable and unreclaimable
      in-kernel data structures allocated from slab caches.
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27ee57c9
  2. 16 Mar, 2016 9 commits