1. 02 3月, 2017 16 次提交
  2. 08 2月, 2017 1 次提交
    • I
      sched/autogroup: Rename auto_group.[ch] to autogroup.[ch] · 1051408f
      Ingo Molnar 提交于
      The names are all 'autogroup', not 'auto_group' - so rename
      the kernel/sched/auto_group.[ch] to match the existing
      nomenclature.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1051408f
  3. 07 2月, 2017 1 次提交
  4. 01 2月, 2017 1 次提交
    • F
      sched/cputime: Increment kcpustat directly on irqtime account · a499a5a1
      Frederic Weisbecker 提交于
      The irqtime is accounted is nsecs and stored in
      cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
      accumulated amount reaches a new jiffy, this one gets accounted to the
      kcpustat.
      
      This was necessary when kcpustat was stored in cputime_t, which could at
      worst have jiffies granularity. But now kcpustat is stored in nsecs
      so this whole discretization game with temporary irqtime storage has
      become unnecessary.
      
      We can now directly account the irqtime to the kcpustat.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a499a5a1
  5. 14 1月, 2017 2 次提交
    • M
      sched/core: Add debugging code to catch missing update_rq_clock() calls · cb42c9a3
      Matt Fleming 提交于
      There's no diagnostic checks for figuring out when we've accidentally
      missed update_rq_clock() calls. Let's add some by piggybacking on the
      rq_*pin_lock() wrappers.
      
      The idea behind the diagnostic checks is that upon pining rq lock the
      rq clock should be updated, via update_rq_clock(), before anybody
      reads the clock with rq_clock() or rq_clock_task().
      
      The exception to this rule is when updates have explicitly been
      disabled with the rq_clock_skip_update() optimisation.
      
      There are some functions that only unpin the rq lock in order to grab
      some other lock and avoid deadlock. In that case we don't need to
      update the clock again and the previous diagnostic state can be
      carried over in rq_repin_lock() by saving the state in the rq_flags
      context.
      
      Since this patch adds a new clock update flag and some already exist
      in rq::clock_skip_update, that field has now been renamed. An attempt
      has been made to keep the flag manipulation code small and fast since
      it's used in the heart of the __schedule() fast path.
      
      For the !CONFIG_SCHED_DEBUG case the only object code change (other
      than addresses) is the following change to reset RQCF_ACT_SKIP inside
      of __schedule(),
      
        -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
        -       00 00 00
        +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb42c9a3
    • M
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming 提交于
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d8ac8971
  6. 24 11月, 2016 1 次提交
  7. 16 11月, 2016 3 次提交
    • V
      sched/fair: Propagate load during synchronous attach/detach · 09a43ace
      Vincent Guittot 提交于
      When a task moves from/to a cfs_rq, we set a flag which is then used to
      propagate the change at parent level (sched_entity and cfs_rq) during
      next update. If the cfs_rq is throttled, the flag will stay pending until
      the cfs_rq is unthrottled.
      
      For propagating the utilization, we copy the utilization of group cfs_rq to
      the sched_entity.
      
      For propagating the load, we have to take into account the load of the
      whole task group in order to evaluate the load of the sched_entity.
      Similarly to what was done before the rewrite of PELT, we add a correction
      factor in case the task group's load is greater than its share so it will
      contribute the same load of a task of equal weight.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: kernellwp@gmail.com
      Cc: pjt@google.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      09a43ace
    • V
      sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list · 9c2791f9
      Vincent Guittot 提交于
      Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
      child will always be called before its parent.
      
      The hierarchical order in shares update list has been introduced by
      commit:
      
        67e86250 ("sched: Introduce hierarchal order on shares update list")
      
      With the current implementation a child can be still put after its
      parent.
      
      Lets take the example of:
      
             root
              \
               b
               /\
               c d*
                 |
                 e*
      
      with root -> b -> c already enqueued but not d -> e so the
      leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail
      
      The branch d -> e will be added the first time that they are enqueued,
      starting with e then d.
      
      When e is added, its parents is not already on the list so e is put at
      the tail : head -> c -> b -> root -> e -> tail
      
      Then, d is added at the head because its parent is already on the
      list: head -> d -> c -> b -> root -> e -> tail
      
      e is not placed at the right position and will be called the last
      whereas it should be called at the beginning.
      
      Because it follows the bottom-up enqueue sequence, we are sure that we
      will finished to add either a cfs_rq without parent or a cfs_rq with a
      parent that is already on the list. We can use this event to detect
      when we have finished to add a new branch. For the others, whose
      parents are not already added, we have to ensure that they will be
      added after their children that have just been inserted the steps
      before, and after any potential parents that are already in the list.
      The easiest way is to put the cfs_rq just after the last inserted one
      and to keep track of it untl the branch is fully added.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: kernellwp@gmail.com
      Cc: pjt@google.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c2791f9
    • M
      sched/fair: Add per-CPU min capacity to sched_group_capacity · bf475ce0
      Morten Rasmussen 提交于
      struct sched_group_capacity currently represents the compute capacity
      sum of all CPUs in the sched_group.
      
      Unless it is divided by the group_weight to get the average capacity
      per CPU, it hides differences in CPU capacity for mixed capacity systems
      (e.g. high RT/IRQ utilization or ARM big.LITTLE).
      
      But even the average may not be sufficient if the group covers CPUs of
      different capacities.
      
      Instead, by extending struct sched_group_capacity to indicate min per-CPU
      capacity in the group a suitable group for a given task utilization can
      more easily be found such that CPUs with reduced capacity can be avoided
      for tasks with high utilization (not implemented by this patch).
      Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bf475ce0
  8. 30 9月, 2016 6 次提交
    • F
      sched/irqtime: Consolidate accounting synchronization with u64_stats API · 19d23dbf
      Frederic Weisbecker 提交于
      The irqtime accounting currently implement its own ad hoc implementation
      of u64_stats API. Lets rather consolidate it with the appropriate
      library.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1474849761-12678-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      19d23dbf
    • P
      sched/debug: Add SCHED_WARN_ON() · 9148a3a1
      Peter Zijlstra 提交于
      Provide SCHED_WARN_ON as wrapper for WARN_ON_ONCE() to avoid
      CONFIG_SCHED_DEBUG wrappery.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9148a3a1
    • P
      sched/fair: Introduce set_curr_task() helper · b2bf6c31
      Peter Zijlstra 提交于
      Now that the ia64 only set_curr_task() symbol is gone, provide a
      helper just like put_prev_task().
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b2bf6c31
    • P
      sched/core: Optimize SCHED_SMT · 1b568f0a
      Peter Zijlstra 提交于
      Avoid pointless SCHED_SMT code when running on !SMT hardware.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b568f0a
    • P
      sched/core: Rewrite and improve select_idle_siblings() · 10e2f1ac
      Peter Zijlstra 提交于
      select_idle_siblings() is a known pain point for a number of
      workloads; it either does too much or not enough and sometimes just
      does plain wrong.
      
      This rewrite attempts to address a number of issues (but sadly not
      all).
      
      The current code does an unconditional sched_domain iteration; with
      the intent of finding an idle core (on SMT hardware). The problems
      which this patch tries to address are:
      
       - its pointless to look for idle cores if the machine is real busy;
         at which point you're just wasting cycles.
      
       - it's behaviour is inconsistent between SMT and !SMT hardware in
         that !SMT hardware ends up doing a scan for any idle CPU in the LLC
         domain, while SMT hardware does a scan for idle cores and if that
         fails, falls back to a scan for idle threads on the 'target' core.
      
      The new code replaces the sched_domain scan with 3 explicit scans:
      
       1) search for an idle core in the LLC
       2) search for an idle CPU in the LLC
       3) search for an idle thread in the 'target' core
      
      where 1 and 3 are conditional on SMT support and 1 and 2 have runtime
      heuristics to skip the step.
      
      Step 1) is conditional on sd_llc_shared->has_idle_cores; when a cpu
      goes idle and sd_llc_shared->has_idle_cores is false, we scan all SMT
      siblings of the CPU going idle. Similarly, we clear
      sd_llc_shared->has_idle_cores when we fail to find an idle core.
      
      Step 2) tracks the average cost of the scan and compares this to the
      average idle time guestimate for the CPU doing the wakeup. There is a
      significant fudge factor involved to deal with the variability of the
      averages. Esp. hackbench was sensitive to this.
      
      Step 3) is unconditional; we assume (also per step 1) that scanning
      all SMT siblings in a core is 'cheap'.
      
      With this; SMT systems gain step 2, which cures a few benchmarks --
      notably one from Facebook.
      
      One 'feature' of the sched_domain iteration, which we preserve in the
      new code, is that it would start scanning from the 'target' CPU,
      instead of scanning the cpumask in cpu id order. This avoids multiple
      CPUs in the LLC scanning for idle to gang up and find the same CPU
      quite as much. The down side is that tasks can end up hopping across
      the LLC for no apparent reason.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      10e2f1ac
    • P
      sched/core: Replace sd_busy/nr_busy_cpus with sched_domain_shared · 0e369d75
      Peter Zijlstra 提交于
      Move the nr_busy_cpus thing from its hacky sd->parent->groups->sgc
      location into the much more natural sched_domain_shared location.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0e369d75
  9. 15 9月, 2016 1 次提交
  10. 18 8月, 2016 2 次提交
  11. 17 8月, 2016 2 次提交
  12. 13 7月, 2016 1 次提交
  13. 27 6月, 2016 3 次提交
    • P
      sched/fair: Rework throttle_count sync · 55e16d30
      Peter Zijlstra 提交于
      Since we already take rq->lock when creating a cgroup, use it to also
      sync the throttle_count and avoid the extra state and enqueue path
      branch.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: linux-kernel@vger.kernel.org
      [ Fixed build warning. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      55e16d30
    • P
      sched/fair: Reorder cgroup creation code · 8663e24d
      Peter Zijlstra 提交于
      A future patch needs rq->lock held _after_ we link the task_group into
      the hierarchy. In order to avoid taking every rq->lock twice, reorder
      things a little and create online_fair_sched_group() to be called
      after we link the task_group.
      
      All this code is still ran from css_alloc() so css_online() isn't in
      fact used for this.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8663e24d
    • V
      sched/cgroup: Fix cpu_cgroup_fork() handling · ea86cb4b
      Vincent Guittot 提交于
      A new fair task is detached and attached from/to task_group with:
      
        cgroup_post_fork()
          ss->fork(child) := cpu_cgroup_fork()
            sched_move_task()
              task_move_group_fair()
      
      Which is wrong, because at this point in fork() the task isn't fully
      initialized and it cannot 'move' to another group, because its not
      attached to any group as yet.
      
      In fact, cpu_cgroup_fork() needs a small part of sched_move_task() so we
      can just call this small part directly instead sched_move_task(). And
      the task doesn't really migrate because it is not yet attached so we
      need the following sequence:
      
        do_fork()
          sched_fork()
            __set_task_cpu()
      
          cgroup_post_fork()
            set_task_rq() # set task group and runqueue
      
          wake_up_new_task()
            select_task_rq() can select a new cpu
            __set_task_cpu
            post_init_entity_util_avg
              attach_task_cfs_rq()
            activate_task
              enqueue_task
      
      This patch makes that happen.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      [ Added TASK_SET_GROUP to set depth properly. ]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ea86cb4b