1. 16 Dec, 2016: 1 commit
  2. 15 Dec, 2016: 28 commits
  3. 14 Dec, 2016: 1 commit
  4. 13 Dec, 2016: 7 commits
  5. 11 Dec, 2016: 2 commits
    • sched/core: Use load_avg for selecting idlest group · 6b94780e
      Committed by Vincent Guittot
      find_idlest_group() only compares the runnable_load_avg when looking
      for the least loaded group. But on fork-intensive use cases like
      hackbench, where tasks block quickly after the fork, this can lead to
      selecting the same CPU again and again instead of other CPUs which
      have a similar runnable load but a lower load_avg.
      
      When the runnable_load_avg of 2 CPUs is close, we now take into
      account the amount of blocked load as a 2nd selection factor. There
      are now 3 zones for the runnable_load of the rq:
      
       - [0 .. (runnable_load - imbalance)]:
      	Select the new rq which has significantly less runnable_load
      
       - [(runnable_load - imbalance) .. (runnable_load + imbalance)]:
      	The runnable loads are close so we use load_avg to choose
      	between the 2 rqs
      
       - [(runnable_load + imbalance) .. ULONG_MAX]:
      	Keep the current rq which has significantly less runnable_load
      
      The scaling factor that is currently used for comparing runnable_load
      doesn't work well with small values. As an example, the use of a
      scaling factor fails as soon as this_runnable_load == 0, because we
      then always select the local rq even if min_runnable_load is only 1,
      which doesn't really make sense because the two loads are practically
      the same. So instead of a scaling factor, we use an absolute margin
      for runnable_load to detect CPUs with a similar runnable_load, and we
      keep using the scaling factor for blocked load, as sketched below.
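
      A minimal, self-contained sketch of the resulting three-zone decision
      (the helper and parameter names are illustrative assumptions, not the
      exact find_idlest_group() code):

      /*
       * Illustrative sketch only: 'imbalance' is the absolute margin used
       * for runnable_load, 'imbalance_scale' is the relative factor kept
       * for the blocked load (load_avg). Returns 1 when the candidate rq
       * should be preferred over the local one, 0 otherwise.
       */
      static int prefer_candidate_rq(unsigned long local_runnable_load,
                                     unsigned long local_avg_load,
                                     unsigned long cand_runnable_load,
                                     unsigned long cand_avg_load,
                                     unsigned long imbalance,
                                     unsigned long imbalance_scale)
      {
              /* [0 .. local - imbalance]: candidate has significantly less runnable_load. */
              if (cand_runnable_load + imbalance <= local_runnable_load)
                      return 1;

              /* [local + imbalance .. ULONG_MAX]: keep the current rq. */
              if (cand_runnable_load >= local_runnable_load + imbalance)
                      return 0;

              /*
               * Middle zone: the runnable loads are close, so decide with
               * the blocked load, still compared with the scaling factor.
               */
              return 100 * local_avg_load >= imbalance_scale * cand_avg_load;
      }

      The absolute margin keeps the comparison meaningful even when both
      runnable loads are tiny (e.g. 0 vs 1), which is exactly the case the
      pure scaling factor mishandled.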
      
      For use cases like hackbench, this enables the scheduler to select
      different CPUs during the fork sequence and to spread tasks across
      the system.
      
      Tests have been done on a Hikey board (ARM-based octa-core) for
      several kernels. The results below give the min, max, avg and stdev
      values of 18 runs with each configuration.
      
      The patches depend on the "no missing update_rq_clock()" work.
      
      hackbench -P -g 1
      
               ea86cb4b  7dc603c9  v4.8        v4.8+patches
        min    0.049         0.050         0.051       0.048
        avg    0.057         0.057(0%)     0.057(0%)   0.055(+5%)
        max    0.066         0.068         0.070       0.063
        stdev  +/-9%         +/-9%         +/-8%       +/-9%
      
      More performance numbers here:
      
        https://lkml.kernel.org/r/20161203214707.GI20785@codeblueprint.co.uk

      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: kernellwp@gmail.com
      Cc: umgwanakikbuti@gmail.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1481216215-24651-3-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Fix find_idlest_group() for fork · f519a3f1
      Committed by Vincent Guittot
      During fork, the utilization of a task is initialized once the rq has
      been selected, because the current utilization level of the rq is
      used to set the utilization of the forked task. As the task's
      utilization is still 0 at this step of the fork sequence, it doesn't
      make sense to look for spare capacity that can fit the task's
      utilization.
      Furthermore, I can see perf regressions for the test:
      
         hackbench -P -g 1
      
      because the least loaded policy is always bypassed and tasks are not
      spread during fork.
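
      For illustration, here is a minimal self-contained sketch of the
      policy change described above (the helper, struct and parameter names
      are assumptions made for the example, not the kernel's actual code):
      during fork the task's utilization is still 0, so the spare-capacity
      criterion is skipped and the least-loaded policy is applied directly.

      #include <stdbool.h>

      /* Illustrative sketch only; names are assumptions, not kernel code. */
      struct group_stats {
              unsigned long runnable_load;    /* runnable load of the group */
              unsigned long spare_capacity;   /* capacity left after utilization */
      };

      /* Return true when the candidate group should be picked over the local one. */
      static bool pick_candidate_group(const struct group_stats *local,
                                       const struct group_stats *candidate,
                                       unsigned long task_util, bool is_fork)
      {
              /*
               * Spare capacity can't be used for fork: task_util is still 0
               * at this point, so every group would trivially "fit" the task
               * and the least-loaded policy would always be bypassed.
               */
              if (!is_fork && candidate->spare_capacity > task_util / 2 &&
                  candidate->spare_capacity > local->spare_capacity)
                      return true;

              /* Otherwise (and always for fork) fall back to the least-loaded policy. */
              return candidate->runnable_load < local->runnable_load;
      }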
      
      With this patch and the fix below, we are back to the same
      performance as v4.8. The fix below is only a temporary one, used for
      the test until a smarter solution is found, because we can't simply
      remove the check, which is useful for other benchmarks:
      
      | @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
      |
      |	avg_cost = this_sd->avg_scan_cost;
      |
      | -	/*
      | -	 * Due to large variance we need a large fuzz factor; hackbench in
      | -	 * particularly is sensitive here.
      | -	 */
      | -	if ((avg_idle / 512) < avg_cost)
      | -		return -1;
      | -
      |	time = local_clock();
      |
      |	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: kernellwp@gmail.com
      Cc: umgwanakikbuti@gmail.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 09 Dec, 2016: 1 commit