  1. 13 Oct 2022, 2 commits
  2. 04 Oct 2022, 21 commits
  3. 27 Sep 2022, 2 commits
  4. 12 Sep 2022, 8 commits
  5. 30 Jul 2022, 2 commits
  6. 18 Jul 2022, 5 commits
    • mm/page_alloc: use try_cmpxchg in set_pfnblock_flags_mask · 04ec0061
      Committed by Uros Bizjak
      Use try_cmpxchg instead of cmpxchg in set_pfnblock_flags_mask().  The x86
      CMPXCHG instruction reports success in the ZF flag, so this change saves a
      compare after the cmpxchg (and the related move instruction in front of it).
      The main loop improves from:
      
          1c5d:	48 89 c2             	mov    %rax,%rdx
          1c60:	48 89 c1             	mov    %rax,%rcx
          1c63:	48 21 fa             	and    %rdi,%rdx
          1c66:	4c 09 c2             	or     %r8,%rdx
          1c69:	f0 48 0f b1 16       	lock cmpxchg %rdx,(%rsi)
          1c6e:	48 39 c1             	cmp    %rax,%rcx
          1c71:	75 ea                	jne    1c5d <...>
      
      to:
      
          1c60:	48 89 ca             	mov    %rcx,%rdx
          1c63:	48 21 c2             	and    %rax,%rdx
          1c66:	4c 09 c2             	or     %r8,%rdx
          1c69:	f0 48 0f b1 16       	lock cmpxchg %rdx,(%rsi)
          1c6e:	75 f0                	jne    1c60 <...>
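
      In C terms, the change swaps the compare-after-cmpxchg idiom for the
      boolean-returning one.  A minimal user-space sketch of both idioms using
      C11 atomics (set_flags_mask and its parameters are illustrative
      stand-ins, not the kernel code):

      ```
      #include <stdatomic.h>

      /* Illustrative stand-in for the word update in set_pfnblock_flags_mask() */
      static void set_flags_mask(_Atomic unsigned long *word,
                                 unsigned long flags, unsigned long mask)
      {
              unsigned long old = atomic_load_explicit(word, memory_order_relaxed);
              unsigned long new;

              /*
               * Old idiom: the kernel's cmpxchg() returns the previous value,
               * so the caller compares it against the expected value itself,
               * producing the extra cmp/mov in the "before" listing:
               *
               *     for (;;) {
               *             new = (old & ~mask) | flags;
               *             cur = cmpxchg(word, old, new);
               *             if (cur == old)
               *                     break;
               *             old = cur;
               *     }
               */

              /* New idiom: the exchange reports success as a boolean and
               * refreshes 'old' on failure, so the compiler can branch on the
               * ZF flag directly, as in the "after" listing. */
              do {
                      new = (old & ~mask) | flags;
              } while (!atomic_compare_exchange_weak(word, &old, new));
      }
      ```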
      
      Link: https://lkml.kernel.org/r/20220708140736.8737-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm, hugetlb: skip irrelevant nodes in show_free_areas() · dcadcf1c
      Committed by Gang Li
      show_free_areas() can filter out node-specific data that is irrelevant
      to the allocation request, but hugetlb_show_meminfo() still shows
      hugetlb usage on all nodes, which is redundant and unnecessary.
      
      Use show_mem_node_skip() to skip irrelevant nodes, and replace
      hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid).
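
      A condensed sketch of the resulting shape, assuming the helper names
      from the commit (filter_flags and the caller loop are illustrative; this
      is not the exact kernel code):

      ```
      /* Per-node dump: print hugetlb counters for one node only. */
      void hugetlb_show_meminfo_node(int nid)
      {
              struct hstate *h;

              if (!hugepages_supported())
                      return;

              for_each_hstate(h)
                      printk("Node %d hugepages_total=%lu hugepages_free=%lu "
                             "hugepages_surp=%lu hugepages_size=%lukB\n",
                             nid,
                             h->nr_huge_pages_node[nid],
                             h->free_huge_pages_node[nid],
                             h->surplus_huge_pages_node[nid],
                             huge_page_size(h) / 1024);
      }

      /* Caller side: only visit nodes relevant to the allocation request. */
      for_each_online_node(nid) {
              if (show_mem_node_skip(filter_flags, nid, nodemask))
                      continue;
              hugetlb_show_meminfo_node(nid);
      }
      ```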
      
      Before-and-after sample output from an OOM event:
      
      before:
      ```
      [  214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB
      [  214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
      [  214.388334] lowmem_reserve[]: 0 0 0 0 0
      [  214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20
      [  214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [  214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      ```
      
      after:
      ```
      [  145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u
      [  145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig
      [  145.152315] lowmem_reserve[]: 0 0 0 0 0
      [  145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME)
      [  145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      ```
      
      Link: https://lkml.kernel.org/r/20220706034655.1834-1-ligang.bdlg@bytedance.com
      Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: replace local_lock with normal spinlock · 01b44456
      Committed by Mel Gorman
      struct per_cpu_pages is no longer strictly local, as PCP lists can be
      drained remotely using a lock for protection.  While the use of local_lock
      works, it goes against the intent of local_lock, which is for "pure CPU
      local concurrency control mechanisms and not suited for inter-CPU
      concurrency control" (Documentation/locking/locktypes.rst).
      
      local_lock protects against migration between when the percpu pointer is
      accessed and when pcp->lock is acquired.  The lock acquisition is a
      preemption point, so in the worst case a task could migrate to another
      NUMA node and accidentally allocate remote memory.  The main requirement
      is to pin the task to a CPU in a way that is suitable for both PREEMPT_RT
      and !PREEMPT_RT.
      
      Replace local_lock with helpers that pin a task to a CPU, look up the
      per-cpu structure, and acquire the embedded lock.  This is similar to
      local_lock without breaking the intent behind the API.  It is not a
      complete API, as only the parts needed for PCP-alloc are implemented,
      but in theory the generic helpers could be promoted to a general API if
      there were demand for an embedded lock within a per-cpu struct, with a
      guarantee that the locked per-cpu structure matches the running CPU, for
      code that cannot use get_cpu_var due to RT concerns.  PCP requires these
      semantics to avoid accidentally allocating remote memory.
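
      A condensed sketch of the helper shape described above; pcp_spin_lock()
      and pcp_spin_unlock() are illustrative names modeled on the commit, and
      the real helpers are macro-based and pick preempt_disable() or
      migrate_disable() depending on PREEMPT_RT:

      ```
      static struct per_cpu_pages *pcp_spin_lock(struct per_cpu_pages __percpu *ptr)
      {
              struct per_cpu_pages *pcp;

              migrate_disable();        /* pin: the task stays on this CPU */
              pcp = this_cpu_ptr(ptr);  /* lookup cannot race with migration */
              spin_lock(&pcp->lock);    /* lock embedded in the per-cpu struct */
              return pcp;
      }

      static void pcp_spin_unlock(struct per_cpu_pages *pcp)
      {
              spin_unlock(&pcp->lock);
              migrate_enable();
      }
      ```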
      
      [mgorman@techsingularity.net: use pcp_spin_trylock_irqsave instead of pcpu_spin_trylock_irqsave]
        Link: https://lkml.kernel.org/r/20220627084645.GA27531@techsingularity.net
      Link: https://lkml.kernel.org/r/20220624125423.6126-8-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: remotely drain per-cpu lists · 443c2acc
      Committed by Nicolas Saenz Julienne
      Some setups, notably NOHZ_FULL CPUs, are too busy to handle the per-cpu
      drain work queued by __drain_all_pages(), so introduce a new mechanism to
      remotely drain the per-cpu lists.  It is made possible by remotely taking
      the new per-cpu spinlocks embedded in 'struct per_cpu_pages'.  A benefit
      of this new scheme is that drain operations are now migration safe.
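
      A hedged sketch of the mechanism; drain_pages_remote() is an
      illustrative name, and the real __drain_all_pages() also handles the
      IRQ-save lock variants and zone iteration:

      ```
      static void drain_pages_remote(struct zone *zone)
      {
              int cpu;

              for_each_online_cpu(cpu) {
                      struct per_cpu_pages *pcp =
                              per_cpu_ptr(zone->per_cpu_pageset, cpu);

                      spin_lock(&pcp->lock);   /* remote acquisition is now legal */
                      if (pcp->count)
                              free_pcppages_bulk(zone, pcp->count, pcp, 0);
                      spin_unlock(&pcp->lock);
              }
      }
      ```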
      
      There was no observed performance degradation vs. the previous scheme.
      Both netperf and hackbench were run in parallel with a loop triggering
      the __drain_all_pages(NULL, true) code path around 100 times per second.
      The new scheme performs slightly better (~5%), although the important
      point here is that there are no performance regressions vs. the previous
      mechanism.  Per-cpu list draining happens only in slow paths.
      
      Minchan Kim tested an earlier version and reported:
      
      	My workload does not use NOHZ CPUs, but it runs apps under heavy
      	memory pressure, so they go into direct reclaim and get stuck on
      	drain_all_pages until the work on the workqueue runs.
      
      	unit: nanosecond
      	max(dur)        avg(dur)                count(dur)
      	166713013       487511.77786438033      1283
      
      	From traces, the system encountered drain_all_pages 1283 times;
      	the worst case was 166ms and the average was 487us.
      
      	The other problem was alloc_contig_range in CMA.  The PCP draining
      	sometimes takes several hundred milliseconds even when there is no
      	memory pressure or only a few pages need to be migrated out, because
      	the CPUs were fully booked.
      
      	Your patch completely removed that wasted time.
      
      Link: https://lkml.kernel.org/r/20220624125423.6126-7-mgorman@techsingularity.net
      Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_alloc: protect PCP lists with a spinlock · 4b23a68f
      Committed by Mel Gorman
      Currently the PCP lists are protected by local_lock_irqsave to prevent
      migration and IRQ reentrancy, but this is inconvenient.  Remote draining
      of the lists is impossible, a workqueue is required for it instead, and
      every allocation or free must disable and then re-enable interrupts,
      which is expensive.
      
      As preparation for dealing with both of those problems, protect the
      lists with a spinlock.  The IRQ-unsafe version of the lock is used
      because IRQs are already disabled by local_lock_irqsave.  spin_trylock
      is used in combination with local_lock_irqsave() but will later be
      replaced with spin_trylock_irqsave when the local_lock is removed.
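
      A condensed sketch of the transitional fast path (pcp_alloc_sketch and
      its argument list are simplified; the 5.19-era pagesets local_lock is
      assumed):

      ```
      static struct page *pcp_alloc_sketch(struct zone *zone, unsigned int order,
                                           int migratetype)
      {
              struct per_cpu_pages *pcp;
              struct page *page = NULL;
              unsigned long flags;

              local_lock_irqsave(&pagesets.lock, flags); /* IRQs off, CPU pinned */
              pcp = this_cpu_ptr(zone->per_cpu_pageset);

              /* The IRQ-unsafe trylock is enough because IRQs are already
               * disabled; once local_lock is removed, this becomes
               * spin_trylock_irqsave(). */
              if (spin_trylock(&pcp->lock)) {
                      page = __rmqueue_pcplist(zone, order, migratetype, pcp); /* args simplified */
                      spin_unlock(&pcp->lock);
              }
              local_unlock_irqrestore(&pagesets.lock, flags);
              return page;
      }
      ```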
      
      struct per_cpu_pages still fits within the same number of cache lines
      after this patch as it did before the series:
      
      struct per_cpu_pages {
              spinlock_t                 lock;                 /*     0     4 */
              int                        count;                /*     4     4 */
              int                        high;                 /*     8     4 */
              int                        batch;                /*    12     4 */
              short int                  free_factor;          /*    16     2 */
              short int                  expire;               /*    18     2 */
      
              /* XXX 4 bytes hole, try to pack */
      
              struct list_head           lists[13];            /*    24   208 */
      
              /* size: 256, cachelines: 4, members: 7 */
              /* sum members: 228, holes: 1, sum holes: 4 */
              /* padding: 24 */
      } __attribute__((__aligned__(64)));
      
      There is overhead in the fast path due to acquiring the spinlock, even
      though the spinlock is per-cpu and uncontended in the common case.  Page
      Fault Test (PFT) reported the following results on a 1-socket machine.
      
                                           5.19.0-rc3               5.19.0-rc3
                                              vanilla      mm-pcpspinirq-v5r16
      Hmean     faults/sec-1   869275.7381 (   0.00%)   874597.5167 *   0.61%*
      Hmean     faults/sec-3  2370266.6681 (   0.00%)  2379802.0362 *   0.40%*
      Hmean     faults/sec-5  2701099.7019 (   0.00%)  2664889.7003 *  -1.34%*
      Hmean     faults/sec-7  3517170.9157 (   0.00%)  3491122.8242 *  -0.74%*
      Hmean     faults/sec-8  3965729.6187 (   0.00%)  3939727.0243 *  -0.66%*
      
      There is a small hit in the number of faults per second, but given that
      the results are more stable, it's borderline noise.
      
      [akpm@linux-foundation.org: add missing local_unlock_irqrestore() on contention path]
      Link: https://lkml.kernel.org/r/20220624125423.6126-6-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>