1. 11 Dec 2021 (1 commit)
  2. 17 Nov 2021 (1 commit)
  3. 12 Nov 2021 (2 commits)
  4. 07 Nov 2021 (9 commits)
  5. 27 Sep 2021 (15 commits)
  6. 24 Sep 2021 (1 commit)
    • memcg: flush lruvec stats in the refault · 1f828223
      Committed by Shakeel Butt
      Prior to commit 7e1c0d6f ("memcg: switch lruvec stats to rstat")
      and commit aa48e47e ("memcg: infrastructure to flush memcg
      stats"), each lruvec memcg stat could be off by (nr_cgroups *
      nr_cpus * 32) at worst, and for an unbounded amount of time.
      Commit aa48e47e moved the lruvec stats to the rstat
      infrastructure and commit 7e1c0d6f bounded the error for all the
      lruvec stats to (nr_cpus * 32) at worst, for at most 2 seconds.
      More specifically, it decoupled the error rate from the number of
      stats and the number of cgroups.
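
      The batching scheme behind that bound can be illustrated with a
      minimal user-space C simulation (a sketch; all names below are
      hypothetical, not the kernel's): each CPU accumulates a private
      delta and folds it into the shared counter only once the delta
      crosses a batch of 32, so a reader can miss at most
      nr_cpus * 32 worth of updates per stat.

        #include <stdio.h>
        #include <stdlib.h>

        #define NR_CPUS 4          /* simulated CPU count */
        #define BATCH   32         /* per-CPU batch, mirroring the 32 above */

        static long shared_stat;               /* globally visible value */
        static long percpu_delta[NR_CPUS];     /* per-CPU unflushed deltas */

        /* Update path: a cheap per-CPU add; the shared counter is only
         * touched (the "slowpath") once the local delta crosses BATCH. */
        static void stat_add(int cpu, long val)
        {
                percpu_delta[cpu] += val;
                if (labs(percpu_delta[cpu]) > BATCH) {
                        shared_stat += percpu_delta[cpu];
                        percpu_delta[cpu] = 0;
                }
        }

        int main(void)
        {
                /* Every CPU stays just at the threshold... */
                for (int cpu = 0; cpu < NR_CPUS; cpu++)
                        for (int i = 0; i < BATCH; i++)
                                stat_add(cpu, 1);

                /* ...so a reader misses NR_CPUS * BATCH updates. */
                printf("visible %ld, actual %d\n",
                       shared_stat, NR_CPUS * BATCH);
                return 0;
        }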
      
      However, this reduction in error comes at the cost of triggering
      the slowpath of the stats update more frequently.  Previously, the
      slowpath added the stats up the memcg tree.  After aa48e47e, the
      kernel triggers an asynchronous lruvec stats flush through
      queue_work().  This caused regression reports from the 0day kernel
      bot [1] as well as from the phoronix test suite [2].
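
      The shape of that change can be sketched in the same toy model
      (hypothetical names; only queue_work() is the real kernel API):
      the old slowpath folded the delta into the tree inline, while the
      rstat version only marks a flush as pending and leaves the actual
      folding to deferred work.

        #include <stdbool.h>
        #include <stdio.h>

        static long pending_delta;  /* one CPU's unflushed delta */
        static long tree_stat;      /* aggregated value readers see */
        static bool flush_queued;   /* stands in for a queued work item */

        /* Old slowpath: propagate the delta up the tree immediately. */
        static void slowpath_sync(void)
        {
                tree_stat += pending_delta;
                pending_delta = 0;
        }

        /* After aa48e47e: the update path only queues the flush
         * (queue_work() in the kernel) and returns right away. */
        static void slowpath_async(void)
        {
                flush_queued = true;
        }

        /* The deferred work item performs the actual flush later. */
        static void flush_worker(void)
        {
                if (!flush_queued)
                        return;
                tree_stat += pending_delta;
                pending_delta = 0;
                flush_queued = false;
        }

        int main(void)
        {
                /* Old behavior: cost paid inline in the update path. */
                pending_delta = 40;
                slowpath_sync();
                printf("sync flush:  %ld\n", tree_stat);   /* 40 */

                /* New behavior: the value stays stale until the
                 * deferred worker runs. */
                pending_delta = 40;
                slowpath_async();
                printf("before work: %ld\n", tree_stat);   /* still 40 */
                flush_worker();
                printf("after work:  %ld\n", tree_stat);   /* 80 */
                return 0;
        }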
      
      We tried two options to fix the regression:
      
       1) Increase the threshold for triggering the slowpath in the
          lruvec stats update codepath from 32 to 512.
      
       2) Remove the slowpath from the lruvec stats update codepath and
          instead flush the stats in the page refault codepath (see the
          sketch after this list). The assumption is that the kernel
          flushes the stats in a timely manner, so the update tree seen
          by the refault codepath stays small enough not to cause a
          performance impact.
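
      As referenced in option-2 above, here is a minimal sketch of its
      shape in the same toy model (hypothetical names; the actual patch
      flushes from the kernel's refault codepath): the update path loses
      its slowpath entirely, and the refault path flushes before
      reading, so it sees exact values while updates stay on the fast
      path.

        #include <stdio.h>

        #define NR_CPUS 4

        static long shared_stat;
        static long percpu_delta[NR_CPUS];

        /* Option-2 update path: always fast, no flush trigger. */
        static void stat_add(int cpu, long val)
        {
                percpu_delta[cpu] += val;
        }

        /* Fold all pending per-CPU deltas into the shared counter. */
        static void flush_stats(void)
        {
                for (int cpu = 0; cpu < NR_CPUS; cpu++) {
                        shared_stat += percpu_delta[cpu];
                        percpu_delta[cpu] = 0;
                }
        }

        /* Refault path: flush first, then read an exact value. */
        static long refault_read(void)
        {
                flush_stats();
                return shared_stat;
        }

        int main(void)
        {
                for (int cpu = 0; cpu < NR_CPUS; cpu++)
                        stat_add(cpu, 10);
                printf("refault sees: %ld\n", refault_read()); /* 40 */
                return 0;
        }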
      
      Following are the results of the will-it-scale/page_fault[1|2|3]
      benchmarks in four settings: (1) 5.15-rc1 as baseline, (2)
      5.15-rc1 with aa48e47e and 7e1c0d6f reverted, (3) 5.15-rc1
      with option-1, and (4) 5.15-rc1 with option-2.
      
        test       (1)      (2)               (3)               (4)
        pg_f1   368563   406277 (10.23%)   399693  (8.44%)   416398 (12.97%)
        pg_f2   338399   372133  (9.96%)   369180  (9.09%)   381024 (12.59%)
        pg_f3   500853   575399 (14.88%)   570388 (13.88%)   576083 (15.02%)
      
      From the above results, it appears that option-2 not only resolves
      the regression but also improves performance, at least for these
      benchmarks.
      
      Feng Tang (Intel) ran the aim7 benchmark with these two options
      and confirmed that option-1 reduces the regression while option-2
      removes it.
      
      Michael Larabel (Phoronix) ran multiple benchmarks with these
      options and reported the results at [3]; they show that, for most
      benchmarks, option-2 removes the regression introduced by commit
      aa48e47e ("memcg: infrastructure to flush memcg stats").
      
      Based on these experimental results, this patch takes option-2 as
      the solution to the regression.
      
      Link: https://lore.kernel.org/all/20210726022421.GB21872@xsang-OptiPlex-9020 [1]
      Link: https://www.phoronix.com/scan.php?page=article&item=linux515-compile-regress [2]
      Link: https://openbenchmarking.org/result/2109226-DEBU-LINUX5104 [3]
      Fixes: aa48e47e ("memcg: infrastructure to flush memcg stats")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Tested-by: Michael Larabel <Michael@phoronix.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Sep 2021 (11 commits)