1. 27 Oct, 2018: 20 commits
  2. 18 Oct, 2018: 1 commit
  3. 16 Oct, 2018: 1 commit
  4. 13 Oct, 2018: 2 commits
    • mm/thp: fix call to mmu_notifier in set_pmd_migration_entry() v2 · bfba8e5c
      Committed by Jérôme Glisse
      Inside set_pmd_migration_entry() we are holding page table locks and thus
      cannot sleep, so we cannot call invalidate_range_start/end().
      
      So remove the calls to mmu_notifier_invalidate_range_start/end(); they are
      already made by the function that calls set_pmd_migration_entry() (see
      try_to_unmap_one()).
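      
      The resulting call pattern, condensed from try_to_unmap_one(), looks roughly
      like this (a simplified sketch, not the verbatim code; the 4.19-era
      three-argument mmu_notifier helpers and the pvmw/page locals are assumed):
      
          /* Caller side: may sleep, so it brackets the whole page walk. */
          mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
          while (page_vma_mapped_walk(&pvmw)) {
                  /*
                   * Runs with the page table lock held via pvmw, so it must not
                   * sleep and therefore must not invoke the notifiers itself.
                   */
                  if (!pvmw.pte && (flags & TTU_MIGRATION)) {
                          set_pmd_migration_entry(&pvmw, page);
                          continue;
                  }
                  /* ... regular pte-level unmap work ... */
          }
          mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);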
      
      Link: http://lkml.kernel.org/r/20181012181056.7864-1-jglisse@redhat.com
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Reported-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfba8e5c
    • mm/mmap.c: don't clobber partially overlapping VMA with MAP_FIXED_NOREPLACE · 7aa867dd
      Committed by Jann Horn
      Daniel Micay reports that attempting to use MAP_FIXED_NOREPLACE in an
      application causes that application to randomly crash.  The existing check
      for handling MAP_FIXED_NOREPLACE looks up the first VMA that either
      overlaps or follows the requested region, and then bails out if that VMA
      overlaps *the start* of the requested region.  It does not bail out if the
      VMA only overlaps another part of the requested region.
      
      Fix it by checking that the found VMA only starts at or after the end of
      the requested region, in which case there is no overlap.
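      
      The shape of the corrected check in do_mmap() is roughly the following
      (a hedged sketch of the fix, not the verbatim diff):
      
          if (flags & MAP_FIXED_NOREPLACE) {
                  struct vm_area_struct *vma = find_vma(mm, addr);
      
                  /*
                   * find_vma() returns the first VMA whose end is above addr;
                   * any such VMA that starts below addr + len overlaps the
                   * requested region somewhere, not only at its start.
                   */
                  if (vma && vma->vm_start < addr + len)
                          return -EEXIST;
          }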
      
      Test case:
      
      user@debian:~$ cat mmap_fixed_simple.c
      #include <sys/mman.h>
      #include <errno.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      #ifndef MAP_FIXED_NOREPLACE
      #define MAP_FIXED_NOREPLACE 0x100000
      #endif
      
      int main(void) {
        char *p;
      
        errno = 0;
        p = mmap((void*)0x10001000, 0x4000, PROT_NONE,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0);
        printf("p1=%p err=%m\n", p);
      
        errno = 0;
        p = mmap((void*)0x10000000, 0x2000, PROT_READ,
      MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED_NOREPLACE, -1, 0);
        printf("p2=%p err=%m\n", p);
      
        char cmd[100];
        sprintf(cmd, "cat /proc/%d/maps", getpid());
        system(cmd);
      
        return 0;
      }
      user@debian:~$ gcc -o mmap_fixed_simple mmap_fixed_simple.c
      user@debian:~$ ./mmap_fixed_simple
      p1=0x10001000 err=Success
      p2=0x10000000 err=Success
      10000000-10002000 r--p 00000000 00:00 0
      10002000-10005000 ---p 00000000 00:00 0
      564a9a06f000-564a9a070000 r-xp 00000000 fe:01 264004
        /home/user/mmap_fixed_simple
      564a9a26f000-564a9a270000 r--p 00000000 fe:01 264004
        /home/user/mmap_fixed_simple
      564a9a270000-564a9a271000 rw-p 00001000 fe:01 264004
        /home/user/mmap_fixed_simple
      564a9a54a000-564a9a56b000 rw-p 00000000 00:00 0                          [heap]
      7f8eba447000-7f8eba5dc000 r-xp 00000000 fe:01 405885
        /lib/x86_64-linux-gnu/libc-2.24.so
      7f8eba5dc000-7f8eba7dc000 ---p 00195000 fe:01 405885
        /lib/x86_64-linux-gnu/libc-2.24.so
      7f8eba7dc000-7f8eba7e0000 r--p 00195000 fe:01 405885
        /lib/x86_64-linux-gnu/libc-2.24.so
      7f8eba7e0000-7f8eba7e2000 rw-p 00199000 fe:01 405885
        /lib/x86_64-linux-gnu/libc-2.24.so
      7f8eba7e2000-7f8eba7e6000 rw-p 00000000 00:00 0
      7f8eba7e6000-7f8eba809000 r-xp 00000000 fe:01 405876
        /lib/x86_64-linux-gnu/ld-2.24.so
      7f8eba9e9000-7f8eba9eb000 rw-p 00000000 00:00 0
      7f8ebaa06000-7f8ebaa09000 rw-p 00000000 00:00 0
      7f8ebaa09000-7f8ebaa0a000 r--p 00023000 fe:01 405876
        /lib/x86_64-linux-gnu/ld-2.24.so
      7f8ebaa0a000-7f8ebaa0b000 rw-p 00024000 fe:01 405876
        /lib/x86_64-linux-gnu/ld-2.24.so
      7f8ebaa0b000-7f8ebaa0c000 rw-p 00000000 00:00 0
      7ffcc99fa000-7ffcc9a1b000 rw-p 00000000 00:00 0                          [stack]
      7ffcc9b44000-7ffcc9b47000 r--p 00000000 00:00 0                          [vvar]
      7ffcc9b47000-7ffcc9b49000 r-xp 00000000 00:00 0                          [vdso]
      ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
        [vsyscall]
      user@debian:~$ uname -a
      Linux debian 4.19.0-rc6+ #181 SMP Wed Oct 3 23:43:42 CEST 2018 x86_64 GNU/Linux
      user@debian:~$
      
      As you can see, the first page of the mapping at 0x10001000 was clobbered.
      
      Link: http://lkml.kernel.org/r/20181010152736.99475-1-jannh@google.com
      Fixes: a4ff8e86 ("mm: introduce MAP_FIXED_NOREPLACE")
      Signed-off-by: Jann Horn <jannh@google.com>
      Reported-by: Daniel Micay <danielmicay@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: John Hubbard <jhubbard@nvidia.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7aa867dd
  5. 09 Oct, 2018: 2 commits
  6. 08 Oct, 2018: 1 commit
  7. 06 Oct, 2018: 9 commits
  8. 02 Oct, 2018: 2 commits
    • mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration · efaffc5e
      Committed by Mel Gorman
      Rate limiting of page migrations due to automatic NUMA balancing was
      introduced to mitigate the worst-case scenario of migrating at high
      frequency due to false sharing or slowly ping-ponging between nodes.
      Since then, a lot of effort has been spent on correctly identifying these
      pages and avoiding unnecessary migrations, so the safety net may no longer
      be required.
      
      Jirka Hladky reported a regression in 4.17 due to a scheduler patch that
      avoids spreading STREAM tasks wide prematurely. However, once the task
      was properly placed, it delayed migrating the memory due to rate limiting.
      Increasing the limit fixed the problem for him.
      
      Currently, the limit is hard-coded and does not account for the real
      capabilities of the hardware. Even if an estimate was attempted, it would
      not properly account for the number of memory controllers and it could
      not account for the amount of bandwidth used for normal accesses. Rather
      than fudging, this patch simply eliminates the rate limiting.
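      
      For reference, the throttling being removed boils down to a check of
      roughly this shape at the migration call sites (a simplified sketch;
      numamigrate_update_ratelimit() and its call sites are recalled from the
      pre-4.19 mm/migrate.c and may differ in detail):
      
          /* in migrate_misplaced_page() and its transhuge counterpart */
          if (numamigrate_update_ratelimit(pgdat, nr_pages))
                  goto out;       /* over the per-node migration budget, skip */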
      
      However, Jirka reports that a STREAM configuration using multiple
      processes achieved similar performance to 4.16. In local tests, this patch
      improved performance of STREAM relative to the baseline but it is somewhat
      machine-dependent. Most workloads show little or no performance difference,
      implying that there is no heavy reliance on the throttling mechanism and
      that it is safe to remove.
      
      STREAM on 2-socket machine
                               4.19.0-rc5             4.19.0-rc5
                               numab-v1r1       noratelimit-v1r1
      MB/sec copy     43298.52 (   0.00%)    44673.38 (   3.18%)
      MB/sec scale    30115.06 (   0.00%)    31293.06 (   3.91%)
      MB/sec add      32825.12 (   0.00%)    34883.62 (   6.27%)
      MB/sec triad    32549.52 (   0.00%)    34906.60 (   7.24%)
      
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux-MM <linux-mm@kvack.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      efaffc5e
    • mm/migrate: Use spin_trylock() while resetting rate limit · 75346121
      Committed by Srikar Dronamraju
      Since this spinlock only serializes the migration rate limiting, convert
      the spin_lock() to a spin_trylock(). If another thread is already updating
      the rate limit, this task can move on.
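      
      The conversion is roughly the following (a hedged sketch; the pg_data_t
      field names are recalled from the 4.19-era rate limiter and may not match
      the tree exactly):
      
          /* Before: every task racing on an expired window takes the lock. */
          if (time_after(jiffies, pgdat->numabalancing_migrate_next_window)) {
                  spin_lock(&pgdat->numabalancing_migrate_lock);
                  pgdat->numabalancing_migrate_nr_pages = 0;
                  pgdat->numabalancing_migrate_next_window = jiffies +
                          msecs_to_jiffies(migrate_interval_millisecs);
                  spin_unlock(&pgdat->numabalancing_migrate_lock);
          }
      
          /* After: if another task is already resetting the window, move on. */
          if (time_after(jiffies, READ_ONCE(pgdat->numabalancing_migrate_next_window)) &&
              spin_trylock(&pgdat->numabalancing_migrate_lock)) {
                  pgdat->numabalancing_migrate_nr_pages = 0;
                  WRITE_ONCE(pgdat->numabalancing_migrate_next_window,
                             jiffies + msecs_to_jiffies(migrate_interval_millisecs));
                  spin_unlock(&pgdat->numabalancing_migrate_lock);
          }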
      
      Specjbb2005 results (8 warehouses)
      Higher bops are better
      
      2 Socket - 2  Node Haswell - X86
      JVMS  Prev    Current  %Change
      4     205332  198512   -3.32145
      1     319785  313559   -1.94693
      
      2 Socket - 4 Node Power8 - PowerNV
      JVMS  Prev    Current  %Change
      8     74912   74761.9  -0.200368
      1     206585  214874   4.01239
      
      2 Socket - 2  Node Power9 - PowerNV
      JVMS  Prev    Current  %Change
      4     189162  180536   -4.56011
      1     213760  210281   -1.62753
      
      4 Socket - 4  Node Power7 - PowerVM
      JVMS  Prev     Current  %Change
      8     58736.8  56511.4  -3.78877
      1     105419   104899   -0.49327
      
      Avoiding the stretching of window intervals may be the reason for the
      regression. The code also now uses READ_ONCE/WRITE_ONCE, which may be
      hurting performance to some extent.
      
      Some events stats before and after applying the patch.
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        14,285,708      13,818,546
      migrations                1,180,621       1,149,960
      faults                    339,114         385,583
      cache-misses              55,205,631,894  55,259,546,768
      sched:sched_move_numa     843             2,257
      sched:sched_stick_numa    6               9
      sched:sched_swap_numa     219             512
      migrate:mm_migrate_pages  365             2,225
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        26907   72692
      numa_hint_faults_local  24279   62270
      numa_hit                239771  238762
      numa_huge_pte_updates   0       48
      numa_interleave         68      75
      numa_local              239688  238676
      numa_other              83      86
      numa_pages_migrated     363     2225
      numa_pte_updates        27415   98557
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                     Before          After
      cs                        3,202,779       3,173,490
      migrations                37,186          36,966
      faults                    106,076         108,776
      cache-misses              12,024,873,744  12,200,075,320
      sched:sched_move_numa     931             1,264
      sched:sched_stick_numa    0               0
      sched:sched_swap_numa     1               0
      migrate:mm_migrate_pages  637             899
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Haswell - X86
      Event                   Before  After
      numa_hint_faults        17409   21109
      numa_hint_faults_local  14367   17120
      numa_hit                73953   72934
      numa_huge_pte_updates   20      42
      numa_interleave         25      33
      numa_local              73892   72866
      numa_other              61      68
      numa_pages_migrated     668     915
      numa_pte_updates        27276   42326
      
      perf stats 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before       After
      cs                        8,474,013    8,312,022
      migrations                254,934      231,705
      faults                    320,506      310,242
      cache-misses              110,580,458  402,324,573
      sched:sched_move_numa     725          193
      sched:sched_stick_numa    0            0
      sched:sched_swap_numa     7            3
      migrate:mm_migrate_pages  145          93
      
      vmstat 8th warehouse Multi JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        22797   11838
      numa_hint_faults_local  21539   11216
      numa_hit                89308   90689
      numa_huge_pte_updates   0       0
      numa_interleave         865     1579
      numa_local              88955   89634
      numa_other              353     1055
      numa_pages_migrated     149     92
      numa_pte_updates        22930   12109
      
      perf stats 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                     Before     After
      cs                        2,195,628  2,170,481
      migrations                11,179     10,126
      faults                    149,656    160,962
      cache-misses              8,117,515  10,834,845
      sched:sched_move_numa     49         10
      sched:sched_stick_numa    0          0
      sched:sched_swap_numa     0          0
      migrate:mm_migrate_pages  5          2
      
      vmstat 8th warehouse Single JVM 2 Socket - 2  Node Power9 - PowerNV
      Event                   Before  After
      numa_hint_faults        3577    403
      numa_hint_faults_local  3476    358
      numa_hit                26142   25898
      numa_huge_pte_updates   0       0
      numa_interleave         358     207
      numa_local              26042   25860
      numa_other              100     38
      numa_pages_migrated     5       2
      numa_pte_updates        3587    400
      
      perf stats 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before           After
      cs                        100,602,296      110,339,633
      migrations                4,135,630        4,139,812
      faults                    789,256          863,622
      cache-misses              226,160,621,058  231,838,045,660
      sched:sched_move_numa     1,366            2,196
      sched:sched_stick_numa    16               33
      sched:sched_swap_numa     374              544
      migrate:mm_migrate_pages  1,350            2,469
      
      vmstat 8th warehouse Multi JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        47857   85748
      numa_hint_faults_local  39768   66831
      numa_hit                240165  242213
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              240165  242211
      numa_other              0       2
      numa_pages_migrated     1224    2376
      numa_pte_updates        48354   86233
      
      perf stats 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                     Before          After
      cs                        58,515,496      59,331,057
      migrations                564,845         552,019
      faults                    245,807         266,586
      cache-misses              73,603,757,976  73,796,312,990
      sched:sched_move_numa     996             981
      sched:sched_stick_numa    10              54
      sched:sched_swap_numa     193             286
      migrate:mm_migrate_pages  646             713
      
      vmstat 8th warehouse Single JVM 4 Socket - 4  Node Power7 - PowerVM
      Event                   Before  After
      numa_hint_faults        13422   14807
      numa_hint_faults_local  5619    5738
      numa_hit                36118   36230
      numa_huge_pte_updates   0       0
      numa_interleave         0       0
      numa_local              36116   36228
      numa_other              2       2
      numa_pages_migrated     616     703
      numa_pte_updates        13374   14742
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jirka Hladky <jhladky@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537552141-27815-6-git-send-email-srikar@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      75346121
  9. 22 Sep, 2018: 1 commit
  10. 21 Sep, 2018: 1 commit
    • mm: slowly shrink slabs with a relatively small number of objects · 172b06c3
      Committed by Roman Gushchin
      9092c71b ("mm: use sc->priority for slab shrink targets") changed the
      way that the target slab pressure is calculated and made it
      priority-based:
      
          delta = freeable >> priority;
          delta *= 4;
          do_div(delta, shrinker->seeks);
      
      The problem is that at the default priority (which is 12) no pressure is
      applied at all if the number of potentially reclaimable objects is less
      than 4096 (1<<12).
      
      This causes the last objects on slab caches of no longer used cgroups to
      (almost) never get reclaimed.  It's obviously a waste of memory.
      
      It can be especially painful, if these stale objects are holding a
      reference to a dying cgroup.  Slab LRU lists are reparented on memcg
      offlining, but corresponding objects are still holding a reference to the
      dying cgroup.  If we don't scan these objects, the dying cgroup can't go
      away.  Most likely, the parent cgroup doesn't have any directly charged
      objects, only objects left over from dying child cgroups, so it can easily
      hold a reference to hundreds of dying cgroups.
      
      If there are no big spikes in memory pressure, and new memory cgroups are
      created and destroyed periodically, this causes the number of dying
      cgroups to grow steadily, producing a slow-ish and hard-to-detect memory
      "leak".  It's not a real leak, as the memory can eventually be reclaimed,
      but in real life that may never happen.  I've seen hosts with a steadily
      climbing number of dying cgroups that showed no signs of decline for
      months, despite the hosts being loaded with a production workload.
      
      It is an obvious waste of memory, and to prevent it, let's apply minimal
      pressure even on small shrinker lists.  E.g.  if there are freeable
      objects, let's scan at least min(freeable, scan_batch) objects.
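      
      In do_shrink_slab() this amounts to clamping the computed delta from
      below, roughly as follows (a hedged sketch; batch_size is the local
      scan-batch variable in the 4.19-era code):
      
          delta = freeable >> priority;
          delta *= 4;
          do_div(delta, shrinker->seeks);
      
          /*
           * Apply some minimal pressure even on small lists, so stale objects
           * (and any dying cgroups they pin) are eventually scanned.
           */
          delta = max_t(unsigned long long, delta, min(freeable, batch_size));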
      
      This fix significantly improves the chances of a dying cgroup being
      reclaimed and, together with some previous patches, stops the steady
      growth in the number of dying cgroups on some of our hosts.
      
      Link: http://lkml.kernel.org/r/20180905230759.12236-1-guro@fb.com
      Fixes: 9092c71b ("mm: use sc->priority for slab shrink targets")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      172b06c3