1. 10 January 2018 (8 commits)
    • sched/fair: Correct obsolete comment about cpufreq_update_util() · 9783be2c
      Authored by Joel Fernandes
      Since the remote cpufreq callback work was merged, the cpufreq_update_util()
      call can happen from remote CPUs. The comment about local CPUs is thus
      obsolete; update it accordingly.
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-2-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9783be2c
    • sched/fair: Remove impossible condition from find_idlest_group_cpu() · 18cec7e0
      Authored by Joel Fernandes
      find_idlest_group_cpu() goes through the CPUs of a group previously
      selected by find_idlest_group(). find_idlest_group() returns NULL if the
      local group is the selected one, and find_idlest_group_cpu() is not
      executed if the group containing 'cpu' is chosen. So we are always
      guaranteed to call find_idlest_group_cpu() with a group to which 'cpu'
      is non-local.

      This makes one of the conditions in find_idlest_group_cpu() impossible,
      so we can get rid of it.
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Android Kernel <kernel-team@android.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: EAS Dev <eas-dev@lists.linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Rohit Jain <rohit.k.jain@oracle.com>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171215153944.220146-3-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      18cec7e0
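
      To illustrate the kind of dead check being removed, here is a hedged
      reconstruction (variable names approximated, not the verbatim kernel
      diff): since 'this_cpu' can never be a member of the group being
      scanned, a tie-break against it can never fire:

        /* Inside find_idlest_group_cpu()'s scan loop (sketch): */
        load = weighted_cpuload(cpu_rq(i));
        if (load < min_load) {  /* was: load < min_load ||
                                 * (load == min_load && i == this_cpu),
                                 * where i == this_cpu is impossible here */
                min_load = load;
                least_loaded_cpu = i;
        }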
    • sched/cpufreq: Don't pass flags to sugov_set_iowait_boost() · 5083452f
      Authored by Viresh Kumar
      We already pass sg_cpu as an argument to the sugov_set_iowait_boost()
      helper, and it can be used to retrieve the flags value. Get rid of the
      redundant argument.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: juri.lelli@redhat.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/4ec5562b1a87e146ebab11fb5dde1ca9c763a7fb.1513158452.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5083452f
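
      A hedged sketch of the cleanup described above (struct and field names
      approximated, not the verbatim diff): when a value is already reachable
      through an argument that is passed anyway, the extra parameter can
      simply be dropped:

        /* Before: flags passed even though sg_cpu already carries it. */
        static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
                                           unsigned int flags);

        /* After: derive flags from sg_cpu itself. */
        static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time)
        {
                if (sg_cpu->flags & SCHED_CPUFREQ_IOWAIT)
                        sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
                /* ... */
        }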
    • sched/cpufreq: Initialize sg_cpu->flags to 0 · 6257e704
      Authored by Viresh Kumar
      Initializing sg_cpu->flags to SCHED_CPUFREQ_RT has no obvious benefit.
      The flags field wouldn't be used until the utilization update handler is
      called for the first time, and once that is called we will overwrite
      flags anyway.
      
      Initialize it to 0.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: dietmar.eggemann@arm.com
      Cc: joelaf@google.com
      Cc: morten.rasmussen@arm.com
      Cc: tkjos@android.com
      Link: http://lkml.kernel.org/r/763feda6424ced8486b25a0c52979634e6104478.1513158452.git.viresh.kumar@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6257e704
    • sched/fair: Consider RT/IRQ pressure in capacity_spare_wake() · f453ae22
      Authored by Joel Fernandes
      capacity_spare_wake() in the slow path influences the choice of the
      idlest group, as we search for groups with maximum spare capacity. In
      scenarios where RT pressure is high, a suboptimal group can be chosen
      and hurt performance of the task being woken up.
      
      Fix this by using capacity_of() instead of capacity_orig_of() in capacity_spare_wake().
      
      Test results showing improvements from this change are below. More tests
      were also done by myself and Matt Fleming to ensure no degradation in
      different benchmarks.
      
      1) Rohit ran the barrier.c test (details below) with the following improvements:
      ------------------------------------------------------------------------
      This was Rohit's original use case for a patch he posted at [1]; however,
      his recent tests showed that this patch can replace his slow-path changes
      [1], with no need to selectively scan/skip CPUs in find_idlest_group_cpu()
      in the slow path to get the improvement he sees.
      
      barrier.c (OpenMP code) is used as a micro-benchmark. It does a number of
      iterations with a barrier sync at the end of each for loop.

      Here barrier.c runs along with ping on CPUs 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'

      barrier.c can be found at:
      http://www.spinics.net/lists/kernel/msg2506955.html
      
      Following are the iterations-per-second results with this micro-benchmark
      (higher is better), on a 44-core, 2-socket, 88-thread Intel x86 machine:
      +--------+------------------+---------------------------+
      |Threads | Without patch    | With patch                |
      |        |                  |                           |
      +--------+--------+---------+-----------------+---------+
      |        | Mean   | Std Dev | Mean            | Std Dev |
      +--------+--------+---------+-----------------+---------+
      |1       | 539.36 | 60.16   | 572.54 (+6.15%) | 40.95   |
      |2       | 481.01 | 19.32   | 530.64 (+10.32%)| 56.16   |
      |4       | 474.78 | 22.28   | 479.46 (+0.99%) | 18.89   |
      |8       | 450.06 | 24.91   | 447.82 (-0.50%) | 12.36   |
      |16      | 436.99 | 22.57   | 441.88 (+1.12%) | 7.39    |
      |32      | 388.28 | 55.59   | 429.4  (+10.59%)| 31.14   |
      |64      | 314.62 | 6.33    | 311.81 (-0.89%) | 11.99   |
      +--------+--------+---------+-----------------+---------+
      
      2) ping+hackbench test on a bare-metal server (by Rohit)
      -----------------------------------------------------
      Here hackbench runs in threaded mode along with ping on CPUs 0 and 1 as:
      'ping -l 10000 -q -s 10 -f hostX'

      This test runs on a 2-socket, 20-core, 40-thread Intel x86 machine.
      The number of loops is 10000 and runtime is in seconds (lower is better).
      
      +--------------+-----------------+--------------------------+
      |Task Groups   | Without patch   |  With patch              |
      |              +-------+---------+----------------+---------+
      |(Groups of 40)| Mean  | Std Dev |  Mean          | Std Dev |
      +--------------+-------+---------+----------------+---------+
      |1             | 0.851 | 0.007   |  0.828 (+2.77%)| 0.032   |
      |2             | 1.083 | 0.203   |  1.087 (-0.37%)| 0.246   |
      |4             | 1.601 | 0.051   |  1.611 (-0.62%)| 0.055   |
      |8             | 2.837 | 0.060   |  2.827 (+0.35%)| 0.031   |
      |16            | 5.139 | 0.133   |  5.107 (+0.63%)| 0.085   |
      |25            | 7.569 | 0.142   |  7.503 (+0.88%)| 0.143   |
      +--------------+-------+---------+----------------+---------+
      
      [1] https://patchwork.kernel.org/patch/9991635/
      
      Matt Fleming also ran several different hackbench tests and cyclictest
      to sanity-check that the patch doesn't harm other use cases.
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Tested-by: Rohit Jain <rohit.k.jain@oracle.com>
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Atish Patra <atish.patra@oracle.com>
      Cc: Brendan Jackman <brendan.jackman@arm.com>
      Cc: Chris Redpath <Chris.Redpath@arm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Saravana Kannan <skannan@quicinc.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171214212158.188190-1-joelaf@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f453ae22
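
      A hedged sketch of the fix (helper names as referenced in the changelog,
      not guaranteed verbatim): spare capacity is now computed against the
      RT/IRQ-adjusted capacity instead of the original capacity:

        static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
        {
                /* was: capacity_orig_of(cpu) - cpu_util_wake(cpu, p) */
                return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
        }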
    • sched/fair: Use 'unsigned long' for utilization, consistently · f01415fd
      Authored by Patrick Bellasi
      Utilization and capacity are tracked as 'unsigned long', however some
      functions using them return an 'int' which is ultimately assigned back
      to 'unsigned long' variables.

      Since there is no reason to use a different, signed type, consolidate
      the signatures of functions returning utilization to always use the
      native type.

      This change improves code consistency, and it also benefits code paths
      where utilization should be clamped, by avoiding further type
      conversions or ugly type casts.
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Chris Redpath <chris.redpath@arm.com>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20171205171018.9203-2-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f01415fd
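
      A hedged illustration of the kind of signature consolidation described
      above (the function name is an example, not the full diff):

        /* Before: utilization silently narrowed to 'int' ... */
        static int cpu_util(int cpu);

        /* After: keep the native 'unsigned long' type end to end. */
        static unsigned long cpu_util(int cpu);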
    • sched/core: Rework and clarify prepare_lock_switch() · 31cb1bc0
      Authored by Rodrigo Siqueira
      The prepare_lock_switch() function has an unused parameter, and the
      function name is not descriptive. To improve readability and remove the
      extra parameter, make the following changes:

      * Move prepare_lock_switch() from kernel/sched/sched.h to
        kernel/sched/core.c, rename it to prepare_task(), and remove the
        unused parameter.

      * Split the smp_store_release() out of finish_lock_switch() into a
        function named finish_task().

      * Adjust comments accordingly.
      Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215140603.gxe5i2y6fg5ojfpp@smtp.gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      31cb1bc0
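
      A hedged sketch of the resulting helpers, reconstructed from the
      description above (not guaranteed to match the final diff):

        static inline void prepare_task(struct task_struct *next)
        {
        #ifdef CONFIG_SMP
                /* Claim the task: it is about to run on this CPU. */
                next->on_cpu = 1;
        #endif
        }

        static inline void finish_task(struct task_struct *prev)
        {
        #ifdef CONFIG_SMP
                /* Release store: the task may now be taken by another CPU. */
                smp_store_release(&prev->on_cpu, 0);
        #endif
        }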
    • membarrier: Disable preemption when calling smp_call_function_many() · 54167607
      Authored by Mathieu Desnoyers
      smp_call_function_many() requires disabling preemption around the call.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215192310.25293-1-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      54167607
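
      A hedged sketch of the fix (mask and callback names approximated):

        preempt_disable();
        smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
        preempt_enable();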
  2. 28 December 2017 (1 commit)
  3. 15 December 2017 (1 commit)
    • sched/rt: Do not pull from current CPU if only one CPU to pull · f73c52a5
      Authored by Steven Rostedt
      Daniel Wagner reported a crash on the BeagleBone Black SoC.

      This is a single-CPU architecture which does not have a functional
      arch_send_call_function_single_ipi() implementation, so calling that
      function can crash the kernel.

      As it only has one CPU, it shouldn't be called in the first place. But
      if the kernel is compiled for SMP, the push/pull RT scheduling logic
      now calls it via irq_work when the one CPU is overloaded; the CPU can
      then use that function to send an IPI to itself and crash the kernel.
      
      Ideally, we should disable SCHED_FEAT(RT_PUSH_IPI) if the system only
      has a single CPU. But SCHED_FEAT is a constant if sched debugging is
      turned off. Another fix can be used instead, one that should also help
      on normal SMP machines: do not initiate the pull code if there's only
      one RT-overloaded CPU and that CPU happens to be the current CPU, which
      is scheduling in a lower priority task.
      
      Even on a system with many CPUs, if there are many RT tasks waiting to
      run on a single CPU, and that CPU schedules in another RT task of lower
      priority, it will initiate the pull logic in case there's a higher
      priority RT task on another CPU that is waiting to run. But if there is
      no other CPU with waiting RT tasks, it will initiate the RT pull logic
      on itself (as it still has RT tasks waiting to run). This is wasted
      effort.
      
      Not only does this help SMP systems where the current CPU is the only
      one with RT-overloaded tasks, it should also solve the issue that
      Daniel encountered: because there's only one CPU on the system, the
      check added here prevents the pull logic from executing at all.
      Reported-by: Daniel Wagner <wagi@monom.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f73c52a5
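
      A hedged sketch of the added check, reconstructed from the description
      (field and helper names approximated):

        /* In the RT pull path (sketch): */
        int rt_overload_count = rt_overloaded(this_rq);

        if (likely(!rt_overload_count))
                return;

        /* If we are the only overloaded CPU, there is nothing to pull. */
        if (rt_overload_count == 1 &&
            cpumask_test_cpu(this_rq->cpu, this_rq->rd->rto_mask))
                return;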
  4. 11 December 2017 (1 commit)
  5. 08 December 2017 (1 commit)
  6. 07 December 2017 (2 commits)
    • sched/fair: Update and fix the runnable propagation rule · a4c3c049
      Authored by Vincent Guittot
      Unlike running, the runnable part can't be directly propagated through
      the hierarchy when we migrate a task. The main reason is that runnable
      time can be shared with other sched_entities that stay on the rq and
      this runnable time will also remain on prev cfs_rq and must not be
      removed.
      
      Instead, we can estimate what the new runnable of the prev cfs_rq should
      be and check that this estimate stays within a plausible range. The
      prop_runnable_sum is a good estimate when adding runnable_sum, but it
      fails most often when we remove it. In that case we can use the formula
      below:
      
        gcfs_rq's runnable_sum = gcfs_rq->avg.load_sum / gcfs_rq->load.weight
      
      which assumes that tasks are equally runnable; this is not true, but it
      is easy to compute.
      
      Besides these estimates, we have several simple rules that help us
      filter out wrong ones:

       - ge->avg.runnable_sum <= LOAD_AVG_MAX
       - ge->avg.runnable_sum >= ge->avg.running_sum (ge->avg.util_sum << LOAD_AVG_MAX)
       - ge->avg.runnable_sum can't increase when we detach a task
      
      The effect of these fixes is better cgroups balancing.
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/1510842112-21028-1-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a4c3c049
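
      A hedged sketch applying the estimate and the filter rules above (names
      approximated; the real patch is more involved):

        /* Equal-runnability estimate (formula from the changelog): */
        runnable_sum = gcfs_rq->avg.load_sum / gcfs_rq->load.weight;

        /* Rule 1: never above the PELT maximum. */
        runnable_sum = min_t(u64, runnable_sum, LOAD_AVG_MAX);

        /* Rule 2: runnable time includes running time. */
        runnable_sum = max(runnable_sum, running_sum);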
    • sched/wait: Fix add_wait_queue() behavioral change · c6b9d9a3
      Authored by Omar Sandoval
      The following cleanup commit:
      
        50816c48 ("sched/wait: Standardize internal naming of wait-queue entries")
      
      ... unintentionally changed the behavior of add_wait_queue() from
      inserting the wait entry at the head of the wait queue to the tail
      of the wait queue.
      
      Beyond a negative performance impact, this change in behavior
      theoretically also breaks wait queues which mix exclusive and
      non-exclusive waiters, as non-exclusive waiters will not be
      woken up if they are queued behind enough exclusive waiters.
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-team@fb.com
      Fixes: 50816c48 ("sched/wait: Standardize internal naming of wait-queue entries")
      Link: http://lkml.kernel.org/r/a16c8ccffd39bd08fdaa45a5192294c784b803a7.1512544324.git.osandov@fb.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c6b9d9a3
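
      A hedged sketch of the restored behavior (reconstructed, not guaranteed
      verbatim): non-exclusive entries go back to the head of the queue:

        void add_wait_queue(struct wait_queue_head *wq_head,
                            struct wait_queue_entry *wq_entry)
        {
                unsigned long flags;

                wq_entry->flags &= ~WQ_FLAG_EXCLUSIVE;
                spin_lock_irqsave(&wq_head->lock, flags);
                __add_wait_queue(wq_head, wq_entry);    /* head, not tail */
                spin_unlock_irqrestore(&wq_head->lock, flags);
        }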
  7. 05 December 2017 (1 commit)
  8. 28 November 2017 (1 commit)
  9. 13 November 2017 (1 commit)
    • Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      Authored by David Howells
      Make wait_on_atomic_t() pass the TASK_* mode on to its action function
      as an extra argument, and make it 'unsigned int' throughout.
      
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
      
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      5e4def20
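
      A hedged sketch of a default action function taking the mode
      (reconstructed from the description; return value approximated):

        static int atomic_t_wait(atomic_t *counter, unsigned int mode)
        {
                schedule();
                if (signal_pending_state(mode, current))
                        return -EINTR;

                return 0;
        }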
  10. 09 November 2017 (2 commits)
    • sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds · 765cc3a4
      Authored by Patrick Bellasi
      When the kernel is compiled with !CONFIG_SCHED_DEBUG support, we expect
      all SCHED_FEATs to be turned into compile-time constants that are
      propagated to enable compiler optimizations.
      
      Specifically, we expect that code blocks like this:
      
         if (sched_feat(FEATURE_NAME) [&& <other_conditions>]) {
      	/* FEATURE CODE */
         }
      
      are turned into dead code when FEATURE_NAME defaults to FALSE, and are
      thus removed by the compiler from the final image.
      
      For this mechanism to work properly, the compiler must have full access,
      from each translation unit, to the value produced by the sched_feat()
      macro. This macro is defined as:
      
         #define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
      
      and thus, the compiler can optimize that code only if the value of
      sysctl_sched_features is visible within each translation unit.
      
      Since:
      
         029632fb ("sched: Make separate sched*.c translation units")
      
      the scheduler code has been split into separate translation units
      however the definition of sysctl_sched_features is part of
      kernel/sched/core.c while, for all the other scheduler modules, it is
      visible only via kernel/sched/sched.h as an:
      
         extern const_debug unsigned int sysctl_sched_features
      
      Unfortunately, an extern reference does not allow the compiler to apply
      constant propagation. Thus, on !CONFIG_SCHED_DEBUG kernels we still end
      up with code that loads a memory reference and (eventually) performs an
      unconditional jump over a chunk of code.
      
      This mechanism is unavoidable when sched_features can be turned on and
      off at run-time. However, that is not the case for "production" kernels
      compiled with !CONFIG_SCHED_DEBUG. There, sysctl_sched_features is just
      a constant value which cannot be changed at run-time, so the memory
      loads and jumps can be avoided altogether.
      
      This patch fixes the !CONFIG_SCHED_DEBUG case by declaring a local
      version of the sysctl_sched_features constant for each translation
      unit. This ultimately allows the compiler to perform constant
      propagation and dead-code pruning.
      
      Tests were done on v4.14-rc8 with !CONFIG_SCHED_DEBUG, with and without
      the patch, by running 30 iterations of:
      
         perf bench sched messaging --pipe --thread --group 4 --loop 50000
      
      on a 40-core Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz, using the
      powersave governor to rule out variations due to frequency scaling.
      
      Statistics on the reported completion time:
      
                         count     mean       std     min       99%     max
        v4.14-rc8         30.0  15.7831  0.176032  15.442  16.01226  16.014
        v4.14-rc8+patch   30.0  15.5033  0.189681  15.232  15.93938  15.962
      
      ... show a 1.8% speedup in average completion time and a 0.5% speedup
      at the 99th percentile.
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Chris Redpath <chris.redpath@arm.com>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: Brendan Jackman <brendan.jackman@arm.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171108184101.16006-1-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      765cc3a4
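
      A hedged sketch of the approach (macro details approximated): each
      translation unit gets a local constant built from the feature defaults,
      so sched_feat() can fold at compile time:

        /* kernel/sched/sched.h, !CONFIG_SCHED_DEBUG case (sketch): */
        #define SCHED_FEAT(name, enabled)       \
                (1UL << __SCHED_FEAT_##name) * enabled |
        static const unsigned int sysctl_sched_features =
        #include "features.h"
                0;
        #undef SCHED_FEAT

        #define sched_feat(x) !!(sysctl_sched_features & (1UL << __SCHED_FEAT_##x))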
    • cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq · 07458f6a
      Authored by Viresh Kumar
      'cached_raw_freq' is used to get the next frequency quickly, but it
      should always be in sync with sg_policy->next_freq. There is a case
      where it is not, and in such cases it should be reset to avoid
      switching to incorrect frequencies.
      
      Consider this case for example:
      
       - policy->cur is 1.2 GHz (Max)
       - New request comes for 780 MHz and we store that in cached_raw_freq.
       - Based on 780 MHz, we calculate the effective frequency as 800 MHz.
       - We then see the CPU wasn't idle recently and choose to keep the next
         freq as 1.2 GHz.
       - Now we have cached_raw_freq is 780 MHz and sg_policy->next_freq is
         1.2 GHz.
       - Now, if the utilization doesn't change in the next request, the next
         target frequency will still be 780 MHz and it will match
         cached_raw_freq. But we will choose 1.2 GHz instead of 800 MHz here.
      
      Fixes: b7eaf1aa (cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely)
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: 4.12+ <stable@vger.kernel.org> # 4.12+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      07458f6a
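
      A hedged sketch of the fix (reconstructed; not guaranteed verbatim):
      when the busy-CPU heuristic overrides the computed frequency,
      invalidate the cache so a later identical request cannot short-circuit
      to the stale value:

        if (busy && next_f < sg_policy->next_freq) {
                next_f = sg_policy->next_freq;

                /* Reset cached freq as next_freq has changed. */
                sg_policy->cached_raw_freq = 0;
        }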
  11. 08 November 2017 (1 commit)
  12. 05 November 2017 (1 commit)
  13. 02 November 2017 (1 commit)
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Authored by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - the file had no licensing information in it,
       - the file was a */uapi/* one with no licensing information in it,
       - the file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to
      have copy/paste license identifier errors, and they have been fixed to
      reflect the correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  14. 27 October 2017 (10 commits)
  15. 26 October 2017 (1 commit)
    • sched/idle: Micro-optimize the idle loop · 54b933c6
      Authored by Cheng Jian
      Move the loop-invariant calculation of 'cpu' in do_idle() out of the
      loop body, because the current CPU is constant for the whole duration
      of the loop.
      This improves the generated code both on x86-64 and ARM64:
      
      x86-64:
      
      Before patch (execution in loop):
      	864:       0f ae e8                lfence
      	867:       65 8b 05 c2 38 f1 7e    mov %gs:0x7ef138c2(%rip),%eax
      	86e:       89 c0                   mov %eax,%eax
      	870:       48 0f a3 05 68 19 08    bt  %rax,0x1081968(%rip)
      	877:	   01
      
      After patch (execution in loop):
      	872:       0f ae e8                lfence
      	875:       4c 0f a3 25 63 19 08    bt  %r12,0x1081963(%rip)
      	87c:       01
      
      ARM64:
      
      Before patch (execution in loop):
      	c58:       d5033d9f        dsb     ld
      	c5c:       d538d080        mrs     x0, tpidr_el1
      	c60:       b8606a61        ldr     w1, [x19,x0]
      	c64:       1100fc20        add     w0, w1, #0x3f
      	c68:       7100003f        cmp     w1, #0x0
      	c6c:       1a81b000        csel    w0, w0, w1, lt
      	c70:       13067c00        asr     w0, w0, #6
      	c74:       93407c00        sxtw    x0, w0
      	c78:       f8607a80        ldr     x0, [x20,x0,lsl #3]
      	c7c:       9ac12401        lsr     x1, x0, x1
      	c80:       36000581        tbz     w1, #0, d30 <do_idle+0x128>
      
      After patch (execution in loop):
      	c84:       d5033d9f        dsb     ld
      	c88:       f9400260        ldr     x0, [x19]
      	c8c:       ea14001f        tst     x0, x20
      	c90:       54000580        b.eq    d40 <do_idle+0x138>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      [ Rewrote the title and the changelog. ]
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: huawei.libin@huawei.com
      Cc: xiexiuqi@huawei.com
      Link: http://lkml.kernel.org/r/1508930907-107755-1-git-send-email-cj.chengjian@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      54b933c6
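
      A hedged sketch of the hoist (reconstructed, not guaranteed verbatim):

        static void do_idle(void)
        {
                int cpu = smp_processor_id();   /* invariant: idle never migrates */

                while (!need_resched()) {
                        rmb();

                        if (cpu_is_offline(cpu))  /* was: smp_processor_id() per pass */
                                arch_cpu_idle_dead();
                        /* ... */
                }
        }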
  16. 24 October 2017 (1 commit)
    • sched/isolcpus: Fix "isolcpus=" boot parameter handling when !CONFIG_CPUMASK_OFFSTACK · e22cdc3f
      Authored by Rakib Mullick
      cpulist_parse() uses nr_cpumask_bits as a limit when parsing the
      buffer passed on the kernel command line. What nr_cpumask_bits
      represents varies depending on the CONFIG_CPUMASK_OFFSTACK option:

       - If CONFIG_CPUMASK_OFFSTACK=n, then nr_cpumask_bits is the same as
         NR_CPUS, which might not represent the number of CPUs that really
         exist (default 64). So there can be a gap between nr_cpu_ids and
         NR_CPUS, which ultimately leads to invalid cpulist_parse()
         operation. For example, if isolcpus=9 is passed on an 8-CPU
         system (CONFIG_CPUMASK_OFFSTACK=n) it doesn't show the error
         that it's supposed to.
      
      This patch fixes the bug by finding the last CPU of the passed
      isolcpus= list and checking it against nr_cpu_ids.

      It also fixes the error message, where the limit printed should be
      nr_cpu_ids-1, since CPU numbering starts from 0.
      Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adobriyan@gmail.com
      Cc: akpm@linux-foundation.org
      Cc: longman@redhat.com
      Cc: mka@chromium.org
      Cc: tj@kernel.org
      Link: http://lkml.kernel.org/r/20171023130154.9050-1-rakib.mullick@gmail.com
      [ Enhanced the changelog and the kernel message. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      
       include/linux/cpumask.h |   16 ++++++++++++++++
       kernel/sched/topology.c |    4 ++--
       2 files changed, 18 insertions(+), 2 deletions(-)
      e22cdc3f
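
      A hedged sketch of the added validation, reconstructed from the
      description and the diffstat above (error message approximated):

        /* In the isolcpus= setup path (sketch): */
        if (cpulist_parse(str, cpu_isolated_map) ||
            cpumask_last(cpu_isolated_map) >= nr_cpu_ids) {
                pr_err("sched: Error, all isolcpus= values must be between 0 and %u\n",
                       nr_cpu_ids - 1);
                return 0;
        }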
  17. 20 October 2017 (1 commit)
    • membarrier: Provide register expedited private command · a961e409
      Authored by Mathieu Desnoyers
      This introduces a "register private expedited" membarrier command which
      allows eventual removal of important memory barrier constraints on the
      scheduler fast-paths. It changes how the "private expedited" membarrier
      command (new to 4.14) is used from user-space.
      
      This new command allows processes to register their intent to use the
      private expedited command.  This affects how the expedited private
      command introduced in 4.14-rc is meant to be used, and should be merged
      before 4.14 final.
      
      Processes are now required to register before using
      MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
      
      This fixes a problem that arose when designing requested extensions to
      sys_membarrier() to allow JITs to efficiently flush old code from
      instruction caches.  Several potential algorithms are much less painful
      if the user registers intent to use this functionality early on, for
      example before the process spawns its second thread.  Registering at
      that time removes the need to interrupt each and every thread in the
      process at the first expedited sys_membarrier() system call.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a961e409
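
      A hedged user-space sketch of the resulting protocol (error handling
      elided):

        #include <linux/membarrier.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        /* Register intent once, e.g. before spawning the second thread ... */
        syscall(__NR_membarrier, MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0);

        /* ... later, expedited private barriers no longer return -EPERM. */
        syscall(__NR_membarrier, MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0);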
  18. 10 October 2017 (5 commits)
    • sched/rt: Simplify the IPI based RT balancing logic · 4bdced5c
      Authored by Steven Rostedt (Red Hat)
      When a CPU lowers its priority (schedules out a high priority task for a
      lower priority one), a check is made to see if any other CPU has
      overloaded RT tasks (more than one). It checks the rto_mask to determine
      this, and if so it will request to pull one of those tasks to itself if
      the non-running RT task is of higher priority than the new priority of
      the next task to run on the current CPU.
      
      When we deal with a large number of CPUs, the original pull logic
      suffered from heavy lock contention on a single CPU's run queue, which
      caused huge latency across all CPUs. This was caused by only one CPU
      having overloaded RT tasks while a bunch of other CPUs were lowering
      their priority. To solve this issue, commit:
      
        b6366f04 ("sched/rt: Use IPI to trigger RT task push migration instead of pulling")
      
      changed the way to request a pull. Instead of grabbing the lock of the
      overloaded CPU's runqueue, it simply sent an IPI to that CPU to do the work.
      
      Although the IPI logic worked very well in removing the large latency
      build-up, it could still suffer from a large number of IPIs being sent
      to a single CPU. On an 80 CPU box, I measured over 200us of processing
      IPIs. Worse yet, when I tested this on a 120 CPU box, with a stress test
      that had lots of RT tasks scheduling on all CPUs, it actually triggered
      the hard lockup detector! One CPU had so many IPIs sent to it, and due
      to the restart mechanism that is triggered when the source run queue has
      a priority status change, the CPU spent minutes(!) processing the IPIs.
      
      Thinking about this further, I realized there's no reason for each run
      queue to send its own IPI. All CPUs with overloaded tasks must be
      scanned regardless of whether one or many CPUs are lowering their
      priority, because there's currently no way to find the CPU with the
      highest priority task that can schedule to one of these CPUs; so there
      really only needs to be one IPI being sent around at a time.

      This greatly simplifies the code!
      
      The new approach is to have each root domain have its own irq work, as
      the rto_mask is per root domain. The root domain has the following
      fields attached to it:

        rto_push_work  - the irq work to process each CPU set in rto_mask
        rto_lock       - the lock to protect some of the other rto fields
        rto_loop_start - an atomic that keeps contention down on rto_lock;
                         the first CPU scheduling in a lower priority task
                         is the one to kick off the process
        rto_loop_next  - an atomic that gets incremented for each CPU that
                         schedules in a lower priority task
        rto_loop       - a variable protected by rto_lock that is used to
                         compare against rto_loop_next
        rto_cpu        - the CPU to send the next IPI to, also protected by
                         the rto_lock
      
      When a CPU schedules in a lower priority task and wants to make sure
      overloaded CPUs know about it, it increments rto_loop_next. Then it
      atomically sets rto_loop_start with a cmpxchg. If the old value is not
      "0", then it is done, as another CPU is kicking off the IPI loop. If
      the old value is "0", then it will take the rto_lock to synchronize
      with a possible IPI being sent around to the overloaded CPUs.
      
      If rto_cpu is greater than or equal to nr_cpu_ids, then there's either no
      IPI being sent around, or one is about to finish. Then rto_cpu is set to the
      first CPU in rto_mask and an IPI is sent to that CPU. If there's no CPUs set
      in rto_mask, then there's nothing to be done.
      
      When the CPU receives the IPI, it will first try to push any RT tasks
      that are queued on the CPU but can't run because a higher priority RT
      task is currently running on that CPU.
      
      Then it takes the rto_lock and looks for the next CPU in the rto_mask. If it
      finds one, it simply sends an IPI to that CPU and the process continues.
      
      If there are no more CPUs in the rto_mask, then rto_loop is compared
      with rto_loop_next. If they match, everything is done and the process
      is over. If they do not match, then a CPU scheduled in a lower priority
      task while the IPI was being passed around, and the process needs to
      start again. The first CPU in rto_mask is sent the IPI.
      
      This change removes the duplication of work in the IPI logic, and
      greatly lowers the latency caused by the IPIs. This removed the lockup
      happening on the 120 CPU machine. It also simplifies the code
      tremendously. What else could anyone ask for?
      
      Thanks to Peter Zijlstra for simplifying the rto_loop_start atomic logic and
      supplying me with the rto_start_trylock() and rto_start_unlock() helper
      functions.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Scott Wood <swood@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170424114732.1aac6dc4@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4bdced5c
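
      A hedged sketch of the hand-off described above, using the helper names
      credited in the changelog (reconstructed, not verbatim):

        static void tell_cpu_to_push(struct rq *rq)
        {
                int cpu = -1;

                atomic_inc(&rq->rd->rto_loop_next);

                /* Only the first CPU lowering its priority starts the chain. */
                if (!rto_start_trylock(&rq->rd->rto_loop_start))
                        return;

                raw_spin_lock(&rq->rd->rto_lock);
                if (rq->rd->rto_cpu < 0)        /* no IPI currently circulating */
                        cpu = rto_next_cpu(rq); /* first CPU in rto_mask */
                raw_spin_unlock(&rq->rd->rto_lock);

                rto_start_unlock(&rq->rd->rto_loop_start);

                if (cpu >= 0)
                        irq_work_queue_on(&rq->rd->rto_push_work, cpu);
        }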
    • sched/fair: Fix usage of find_idlest_group() when the local group is idlest · 93f50f90
      Authored by Brendan Jackman
      find_idlest_group() returns NULL when the local group is idlest. The
      caller then continues the find_idlest_group() search at a lower level
      of the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is
      not consulted and, crucially, @new_cpu is not updated. This means the
      search is pointless and we return @prev_cpu from select_task_rq_fair().
      
      This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.
      Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      93f50f90
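
      A hedged sketch of the one-line fix (context approximated):

        /* In select_task_rq_fair() (sketch): */
        int new_cpu = cpu;      /* was: int new_cpu = prev_cpu; */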
    • sched/fair: Fix usage of find_idlest_group() when no groups are allowed · 6fee85cc
      Authored by Brendan Jackman
      When 'p' is not allowed on any of the CPUs in the sched_domain, we
      currently return NULL from find_idlest_group(), and pointlessly
      continue the search on lower sched_domain levels (where 'p' is also not
      allowed) before returning prev_cpu regardless (as we have not updated
      new_cpu).
      
      Add an explicit check for this case, and add a comment to
      find_idlest_group(). Now when find_idlest_group() returns NULL, it always
      means that the local group is allowed and idlest.
      Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6fee85cc
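
      A hedged sketch of the explicit check, reconstructed from the
      description (not verbatim):

        /* Bail out early if 'p' cannot run anywhere in this sched_domain. */
        if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
                return prev_cpu;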
    • sched/fair: Fix find_idlest_group() when local group is not allowed · 0d10ab95
      Authored by Brendan Jackman
      When the local group is not allowed, we do not modify this_*_load from
      their initial value of 0. That means the load checks at the end of
      find_idlest_group() cause us to incorrectly return NULL. Fixing the
      initial values to ULONG_MAX means we will instead return the idlest
      remote group in that case.
      Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0d10ab95
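
      A hedged sketch of the initialization change (variable names follow the
      this_*_load pattern above; not verbatim):

        /* In find_idlest_group() (sketch): */
        unsigned long this_runnable_load = ULONG_MAX;   /* was 0 */
        unsigned long this_avg_load = ULONG_MAX;        /* was 0 */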
    • sched/fair: Remove unnecessary comparison with -1 · e90381ea
      Authored by Brendan Jackman
      Since commit:
      
        83a0a96a ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")
      
      find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
      so we can simplify the checking of the return value in find_idlest_cpu().
      Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e90381ea