1. 09 Jun 2020 (7 commits)
  2. 08 Jun 2020 (1 commit)
  3. 28 May 2020 (1 commit)
  4. 26 May 2020 (1 commit)
  5. 07 May 2020 (1 commit)
  6. 06 May 2020 (2 commits)
  7. 01 May 2020 (1 commit)
  8. 26 Apr 2020 (2 commits)
  9. 24 Apr 2020 (9 commits)
  10. 23 Apr 2020 (1 commit)
    • mm, compaction: capture a page under direct compaction · 35d915be
      Committed by Mel Gorman
      to #26255339
      
      commit 5e1f0f098b4649fad53011246bcaeff011ffdf5d upstream
      
      Compaction is inherently race-prone as a suitable page freed during
      compaction can be allocated by any parallel task.  This patch uses a
      capture_control structure to isolate a page immediately when it is freed
      by a direct compactor in the slow path of the page allocator.  The
      intent is to avoid redundant scanning.
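
      A minimal sketch of the mechanism described above, assuming simplified
      field and helper names (the capture checks in the upstream commit are
      more involved; this is illustrative, not the exact code):

        /* Attached to the direct compactor before it starts scanning, so the
         * page-free path can hand a suitable page straight back to it. */
        struct capture_control {
                struct compact_control *cc;     /* the active compaction request */
                struct page *page;              /* filled in by the page freer */
        };

        /* Called from the free path: if a direct compactor is waiting for a
         * page of at least this order, capture it instead of returning it to
         * the free lists, avoiding a redundant rescan. */
        static inline bool compaction_capture(struct capture_control *capc,
                                              struct page *page, unsigned int order)
        {
                if (!capc || capc->page || order < capc->cc->order)
                        return false;

                capc->page = page;
                return true;
        }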
      
                                           5.0.0-rc1              5.0.0-rc1
                                     selective-v3r17          capture-v3r19
      Amean     fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
      Amean     fault-both-3      2582.11 (   0.00%)     2563.68 (   0.71%)
      Amean     fault-both-5      4500.26 (   0.00%)     4233.52 (   5.93%)
      Amean     fault-both-7      5819.53 (   0.00%)     6333.65 (  -8.83%)
      Amean     fault-both-12     9321.18 (   0.00%)     9759.38 (  -4.70%)
      Amean     fault-both-18     9782.76 (   0.00%)    10338.76 (  -5.68%)
      Amean     fault-both-24    15272.81 (   0.00%)    13379.55 *  12.40%*
      Amean     fault-both-30    15121.34 (   0.00%)    16158.25 (  -6.86%)
      Amean     fault-both-32    18466.67 (   0.00%)    18971.21 (  -2.73%)
      
      Latency is only moderately affected, but the devil is in the details.  A
      closer examination indicates that base page fault latency is reduced, but
      the latency of huge pages is increased as greater care is taken to succeed.
      Part of the "problem" is that allocation success rates are close to 100%
      even when under pressure and compaction gets harder.
      
                                      5.0.0-rc1              5.0.0-rc1
                                selective-v3r17          capture-v3r19
      Percentage huge-3        96.70 (   0.00%)       98.23 (   1.58%)
      Percentage huge-5        96.99 (   0.00%)       95.30 (  -1.75%)
      Percentage huge-7        94.19 (   0.00%)       97.24 (   3.24%)
      Percentage huge-12       94.95 (   0.00%)       97.35 (   2.53%)
      Percentage huge-18       96.74 (   0.00%)       97.30 (   0.58%)
      Percentage huge-24       97.07 (   0.00%)       97.55 (   0.50%)
      Percentage huge-30       95.69 (   0.00%)       98.50 (   2.95%)
      Percentage huge-32       96.70 (   0.00%)       99.27 (   2.65%)
      
      And scan rates are reduced, as expected, by 6% for the migration scanner
      and 29% for the free scanner, indicating that there is less redundant
      work.
      
      Compaction migrate scanned    20815362    19573286
      Compaction free scanned       16352612    11510663
      
      [mgorman@techsingularity.net: remove redundant check]
        Link: http://lkml.kernel.org/r/20190201143853.GH9565@techsingularity.net
      Link: http://lkml.kernel.org/r/20190118175136.31341-23-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
  11. 22 Apr 2020 (4 commits)
    • sysctl: handle overflow in proc_get_long · 662ef34f
      Committed by Christian Brauner
      fix #27124689
      
      commit 7f2923c4f73f21cfd714d12a2d48de8c21f11cfe upstream.
      
      proc_get_long() is a funny function.  It uses simple_strtoul() and for a
      good reason: proc_get_long() wants the parse to always succeed and to
      return the possibly incorrect value together with the trailing characters,
      which it checks against a pre-defined list of acceptable trailing values.
      However, simple_strtoul() explicitly ignores overflows, which can cause
      funny things like the following to happen:
      
        echo 18446744073709551616 > /proc/sys/fs/file-max
        cat /proc/sys/fs/file-max
        0
      
      (Which will cause your system to silently die behind your back.)
      
      On the other hand, kstrtoul() does do overflow detection but does not
      return the trailing characters, and it also fails the parse when anything
      other than '\n' is a trailing character, whereas proc_get_long() wants to
      be more lenient.
      
      Now, before adding yet another kstrtoul() variant, let's simply add a
      static parser strtoul_lenient() which:
       - fails on overflow with -ERANGE
       - returns the trailing characters to the caller
      
      The reason we should fail with -ERANGE is that we already fail partially
      on overflow right now, namely when TMPBUFLEN is exceeded.  So we already
      reject values such as 184467440737095516160 (21 chars) but accept values
      such as 18446744073709551616 (20 chars), even though both overflow.  We
      should just always reject 64-bit overflows and not special-case this
      based on the number of chars.
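
      The intended semantics can be modelled with a small userspace sketch
      (plain C using strtoull(); this only illustrates the behaviour, not the
      in-kernel implementation, which builds on the kernel's own integer
      parser):

        #include <errno.h>
        #include <stdlib.h>

        /* Parse an unsigned long: fail with -ERANGE on overflow, but still
         * let the caller inspect the trailing characters via *endp. */
        static int strtoul_lenient(const char *cp, char **endp,
                                   unsigned int base, unsigned long *res)
        {
                unsigned long long val;

                errno = 0;
                val = strtoull(cp, endp, base);
                if (errno == ERANGE || val != (unsigned long)val)
                        return -ERANGE;   /* reject 64-bit overflow */

                *res = (unsigned long)val;
                return 0;                 /* *endp now points at trailing chars */
        }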
      
      Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.io
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • sched: Avoid scale real weight down to zero · 9b83fd88
      Committed by Michael Wang
      fix #26198889
      
      commit 26cf52229efc87e2effa9d788f9b33c40fb3358a linux-next
      
      During our testing, we found a case where shares no longer work
      correctly; the cgroup topology is like:
      
        /sys/fs/cgroup/cpu/A		(shares=102400)
        /sys/fs/cgroup/cpu/A/B	(shares=2)
        /sys/fs/cgroup/cpu/A/B/C	(shares=1024)
      
        /sys/fs/cgroup/cpu/D		(shares=1024)
        /sys/fs/cgroup/cpu/D/E	(shares=1024)
        /sys/fs/cgroup/cpu/D/E/F	(shares=1024)
      
      The same benchmark is running in groups C and F, no other tasks are
      running, and the benchmark is capable of consuming all the CPUs.
      
      We expected group C to win more CPU resources, since it can enjoy all
      the shares of group A, but it is F that wins much more.
      
      The reason is that group B has shares set to 2: since
      A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus,
      A->cfs_rq.load.weight becomes very small.
      
      And in calc_group_shares() we calculate shares as:
      
        load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
        shares = (tg_shares * load) / tg_weight;
      
      Since 'cfs_rq->load.weight' is so small, the load becomes 0 after the
      scale-down; although 'tg_shares' is 102400, the shares of the se that
      stands for group A on the root cfs_rq become 2.
      
      Meanwhile, the se of D on the root cfs_rq is far bigger than 2, so it
      wins the battle.
      
      Thus when scale_load_down() scales the real weight down to 0, it no
      longer tells the real story: the caller gets the wrong information and
      the calculation becomes buggy.
      
      This patch adds a check in scale_load_down() so that the real weight is
      >= MIN_SHARES after scaling; with the patch applied, group C wins as
      expected.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.com
      Acked-by: Shanpei Chen <shanpeic@linux.alibaba.com>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
    • sched/fair: Fix race between runtime distribution and assignment · 70a23044
      Committed by Huaixin Chang
      fix #25892693
      
      commit 26a8b12747c975b33b4a82d62e4a307e1c07f31b upstream
      
      Currently, there is a potential race between distribute_cfs_runtime()
      and assign_cfs_rq_runtime().  The race happens when cfs_b->runtime is
      read and distributed without holding the lock, and only after the
      distribution is it found that there is not enough runtime left to charge
      against, because assign_cfs_rq_runtime() may be called during the
      distribution and consume cfs_b->runtime at the same time.
      
      Fibtest is the tool used to test this race.  Assume all gcfs_rqs are
      throttled and the cfs period timer runs: slow threads might run and
      sleep, returning unused cfs_rq runtime and keeping min_cfs_rq_runtime in
      their local pool.  If all this happens sufficiently quickly,
      cfs_b->runtime drops a lot.  If the runtime distributed is large too,
      over-use of runtime happens.
      
      Runtime over-use of about 70 percent of the quota is seen when we test
      fibtest on a 96-core machine.  We run fibtest with 1 fast thread and 95
      slow threads in the test group, configure a 10ms quota for this group,
      and see that the CPU usage of fibtest is 17.0%, far more than the
      expected 10%.
      
      On a smaller machine with 32 cores, we also run fibtest with 96 threads.
      CPU usage is more than 12%, again above the expected 10%.  This shows
      that on similar workloads this race does affect CPU bandwidth control.
      
      Solve this by holding the lock inside distribute_cfs_runtime().
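
      A simplified sketch of the charging step after the fix (abbreviated;
      per-rq locking, error paths and the surrounding loop in
      kernel/sched/fair.c are omitted):

        /* Inside the distribution loop: charge each throttled cfs_rq against
         * the global pool while holding cfs_b->lock, so a concurrent
         * assign_cfs_rq_runtime() can never work from a stale snapshot. */
        raw_spin_lock(&cfs_b->lock);
        runtime = -cfs_rq->runtime_remaining + 1;  /* what this cfs_rq needs */
        if (runtime > cfs_b->runtime)
                runtime = cfs_b->runtime;          /* never hand out more than is left */
        cfs_b->runtime -= runtime;
        raw_spin_unlock(&cfs_b->lock);

        cfs_rq->runtime_remaining += runtime;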
      
      Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
      Signed-off-by: Huaixin Chang <changhuaixin@linux.alibaba.com>
      Reviewed-by: Ben Segall <bsegall@google.com>
      Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
    • alinux: cgroup: Fix task_css_check rcu warnings · 798cfa76
      Committed by Xunlei Pang
      to #26424323
      
      task_css() should be called under RCU protection; fix several callers.
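
      The pattern applied to the affected callers looks roughly like the
      following (illustrative only; the controller id and surrounding code
      depend on the individual call site):

        rcu_read_lock();
        css = task_css(task, cpu_cgrp_id);   /* lookup is only valid under RCU */
        /* ... use css only while the read-side lock is held ... */
        rcu_read_unlock();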
      
      Fixes: 1f49a738 ("alinux: psi: Support PSI under cgroup v1")
      Acked-by: Michael Wang <yun.wany@linux.alibaba.com>
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
  12. 17 Apr 2020 (1 commit)
  13. 18 Mar 2020 (9 commits)