    sched/fair: Do not migrate if the prev_cpu is idle · 806486c3
    Mel Gorman committed
    wake_affine_idle() prefers to move a task to the current CPU if the
    wakeup is due to an interrupt. The expectation is that the interrupt
    data is cache hot and relevant to the waking task as well as avoiding
    a search. However, there is no way to determine if there was cache hot
    data on the previous CPU that may exceed the interrupt data. Furthermore,
    round-robin delivery of interrupts can migrate tasks around a socket where
    each CPU is under-utilised.  This can interact badly with cpufreq which
    makes decisions based on per-cpu data. It has been observed on machines
    with HWP that p-states are not boosted to their maximum levels even though
    the workload is latency and throughput sensitive.
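
    To make the round-robin point concrete, here is a minimal userspace sketch, not kernel code.
    It assumes a hypothetical four-CPU socket in which every CPU shares one last-level cache and
    is idle between wakeups. It models the pre-patch preference only as "an idle waking CPU
    wins" and leaves out the scheduler's other wake-affine heuristics; old_wake_target() and
    count_migrations_round_robin() are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one socket whose CPUs all share a last-level cache. */
#define NR_MODEL_CPUS 4

/*
 * Simplified model of the pre-patch preference: an interrupt-driven wakeup
 * finds the waking CPU (this_cpu) idle, so the task is pulled there.
 * Falling back to prev_cpu otherwise is a simplification; the real
 * scheduler applies further heuristics. Hypothetical helper.
 */
static int old_wake_target(int this_cpu, int prev_cpu, bool this_cpu_idle)
{
	assert(this_cpu >= 0 && this_cpu < NR_MODEL_CPUS);
	assert(prev_cpu >= 0 && prev_cpu < NR_MODEL_CPUS);
	return this_cpu_idle ? this_cpu : prev_cpu;
}

/*
 * Deliver `wakeups` interrupts round-robin across the socket, starting on
 * the CPU after start_cpu, and count how often the task changes CPU.
 * Every CPU is assumed idle (under-utilised) at each wakeup.
 */
static int count_migrations_round_robin(int start_cpu, int wakeups)
{
	int cpu = start_cpu;
	int migrations = 0;

	for (int i = 1; i <= wakeups; i++) {
		int irq_cpu = (start_cpu + i) % NR_MODEL_CPUS;
		int target = old_wake_target(irq_cpu, cpu, true);

		if (target != cpu)
			migrations++;
		cpu = target;
	}
	return migrations;
}
```

    Because each interrupt lands on a different idle CPU, count_migrations_round_robin(0, 8)
    counts a migration on every one of the 8 wakeups, spreading the task's load thinly across
    the socket.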
    
    This patch uses the previous CPU for the task if it is idle and cache-affine
    with the current CPU, even if the current CPU is idle because the wakeup
    is related to an interrupt. This reduces migrations at the cost of
    the interrupt data not being cache hot when the task wakes.
    
    A variety of workloads were tested on various machines and no adverse
    impact outside of noise was noticed. dbench on ext4 on UMA showed a
    roughly 10% reduction in the number of CPU migrations, and it is a case
    where interrupts are frequent for IO completions. In most cases, the
    difference in performance is quite small but variability is often
    reduced. For example, this is the result for pgbench running on a UMA
    machine with different numbers of clients.
    
                              4.15.0-rc9             4.15.0-rc9
                                baseline              waprev-v1
     Hmean     1     22096.28 (   0.00%)    22734.86 (   2.89%)
     Hmean     4     74633.42 (   0.00%)    75496.77 (   1.16%)
     Hmean     7    115017.50 (   0.00%)   113030.81 (  -1.73%)
     Hmean     12   126209.63 (   0.00%)   126613.40 (   0.32%)
     Hmean     16   131886.91 (   0.00%)   130844.35 (  -0.79%)
     Stddev    1       636.38 (   0.00%)      417.11 (  34.46%)
     Stddev    4       614.64 (   0.00%)      583.24 (   5.11%)
     Stddev    7       542.46 (   0.00%)      435.45 (  19.73%)
     Stddev    12      173.93 (   0.00%)      171.50 (   1.40%)
     Stddev    16      671.42 (   0.00%)      680.30 (  -1.32%)
     CoeffVar  1         2.88 (   0.00%)        1.83 (  36.26%)
    
    Note that the difference in performance is marginal, but at low
    utilisation there is less variability.
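
    Judging from the numbers, the percentage columns appear to be relative to the baseline,
    with the sign inverted for Stddev so that a positive value means lower variability (for
    example, 636.38 to 417.11 for one client is reported as 34.46%). This convention is
    inferred, not stated in the commit; pct_change() below is a hypothetical helper that
    reproduces the Hmean and Stddev rows to within display rounding. The CoeffVar row is not
    checked.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Percentage change of `patched` relative to `baseline`. For metrics where
 * lower is better (e.g. Stddev), the sign is flipped so that a positive
 * result always means an improvement. Hypothetical helper.
 */
static double pct_change(double baseline, double patched, bool lower_is_better)
{
	assert(baseline != 0.0);
	double delta = (patched - baseline) / baseline * 100.0;

	return lower_is_better ? -delta : delta;
}
```

    For instance, pct_change(22096.28, 22734.86, false) gives about 2.89, matching the
    one-client Hmean row.
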
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matt Fleming <matt@codeblueprint.co.uk>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Link: http://lkml.kernel.org/r/20180130104555.4125-4-mgorman@techsingularity.net
    Signed-off-by: Ingo Molnar <mingo@kernel.org>