1. 08 Jun, 2017 (4 commits)
  2. 15 May, 2017 (5 commits)
    • sched/topology: Rename sched_group_cpus() · ae4df9d6
      Peter Zijlstra committed
      There's a discrepancy in naming between the sched_domain and
      sched_group cpumask accessors. Since we're doing changes anyway, fix it.
      
        $ git grep sched_group_cpus | wc -l
        28
        $ git grep sched_domain_span | wc -l
        38
      
      Suggests changing sched_group_cpus() into sched_group_span():
      
        for i in `git grep -l sched_group_cpus`
        do
          sed -ie 's/sched_group_cpus/sched_group_span/g' $i
        done
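
      After the mass rename, the accessor itself is unchanged apart from its
      name; a minimal sketch of the resulting helper (surrounding sched.h
      context elided):

        static inline struct cpumask *sched_group_span(struct sched_group *sg)
        {
                /* the group's cpumask is embedded at the end of the struct */
                return to_cpumask(sg->cpumask);
        }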
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Rename sched_group_mask() · e5c14b1f
      Peter Zijlstra committed
      Since sched_group_mask() is now an independent cpumask (it no longer
      masks sched_group_cpus()), rename the thing.
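
      The commit log doesn't spell out the new name; the accessor ended up
      as group_balance_mask(). A sketch, assuming the mask stays embedded in
      sched_group_capacity:

        static inline struct cpumask *group_balance_mask(struct sched_group *sg)
        {
                /* independent balance mask, no longer masking sched_group_span() */
                return to_cpumask(sg->sgc->cpumask);
        }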
      Suggested-by: Lauro Ramos Venancio <lvenanci@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Add sched_group_capacity debugging · 005f874d
      Peter Zijlstra committed
      Add sgc::id to make it easier to spot domain construction issues.
      
      Take the opportunity to slightly rework the group printing, because
      adding more "(id: %d)" strings makes the entire thing very hard to
      read. Also the individual groups are very hard to separate, so add
      explicit visual grouping, which allows replacing all the "(%s: %d)"
      format things with shorter "%s=%d" variants.
      
      Then fix up some inconsistencies in surrounding prints for domains.
      
      The end result looks like:
      
        [] CPU0 attaching sched-domain(s):
        []  domain-0: span=0,4 level=DIE
        []   groups: 0:{ span=0 }, 4:{ span=4 }
        []   domain-1: span=0-1,3-5,7 level=NUMA
        []    groups: 0:{ span=0,4 mask=0,4 cap=2048 }, 1:{ span=1,5 mask=1,5 cap=2048 }, 3:{ span=3,7 mask=3,7 cap=2048 }
        []    domain-2: span=0-7 level=NUMA
        []     groups: 0:{ span=0-1,3-5,7 mask=0,4 cap=6144 }, 2:{ span=1-3,5-7 mask=2,6 cap=6144 }
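
      A sketch of where the new identifier lives (other members abbreviated):

        struct sched_group_capacity {
                atomic_t        ref;
                unsigned long   capacity;
                unsigned long   min_capacity;
        #ifdef CONFIG_SCHED_DEBUG
                int             id;     /* printed as the N in "N:{ ... }" above */
        #endif
                unsigned long   cpumask[0];     /* balance mask */
        };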
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/topology: Small cleanup · 8d5dc512
      Peter Zijlstra committed
      Move the allocation of topology specific cpumasks into the topology
      code.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Call __schedule() from do_idle() without enabling preemption · 8663effb
      Steven Rostedt (VMware) committed
      I finally got around to creating trampolines for dynamically allocated
      ftrace_ops using synchronize_rcu_tasks(). For users of the ftrace
      function hook callbacks, like perf, that allocate the ftrace_ops
      descriptor via kmalloc() and friends, ftrace was not able to optimize
      the functions being traced to use a trampoline because they would also
      need to be allocated dynamically. The problem is that they cannot be
      freed when CONFIG_PREEMPT is set, as there's no way to tell if a task
      was preempted on the trampoline. That was before Paul McKenney
      implemented synchronize_rcu_tasks() that would make sure all tasks
      (except idle) have scheduled out or have entered user space.
      
      While testing this, I triggered this bug:
      
       BUG: unable to handle kernel paging request at ffffffffa0230077
       ...
       RIP: 0010:0xffffffffa0230077
       ...
       Call Trace:
        schedule+0x5/0xe0
        schedule_preempt_disabled+0x18/0x30
        do_idle+0x172/0x220
      
      What happened was that the idle task was preempted on the trampoline.
      As synchronize_rcu_tasks() ignores the idle thread, there's nothing
      that lets ftrace know that the idle task was preempted on a trampoline.
      
      The idle task should never need to enable preemption. The idle task
      is simply a loop that calls schedule() or places the CPU into idle
      mode. In fact, having preemption enabled there is inefficient, because
      an interrupt can arrive when idle is just about to call schedule()
      anyway, causing schedule() to be called twice: once when the interrupt
      returns back to normal context, and then again in the normal path of
      the idle loop, which is pointless, as it has already scheduled.
      
      The only reason schedule_preempt_disabled() enables preemption is to be
      able to call sched_submit_work(), which requires preemption enabled. As
      this is a nop when the task is in the RUNNING state, and idle is always
      in the RUNNING state, there is no reason for idle to enable preemption.
      But that means it cannot use schedule_preempt_disabled(), as other
      callers of that function require sched_submit_work() to be called.
      
      Add a new function, local to kernel/sched/, that allows idle to call
      the scheduler without enabling preemption. This fixes the
      synchronize_rcu_tasks() issue and removes the pointless spurious
      schedule calls caused by interrupts arriving in the brief window where
      preemption was enabled just before calling schedule.
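
      A sketch of the resulting idle-only entry point (the 'false' argument
      to __schedule() means "not a preemption"):

        void __sched schedule_idle(void)
        {
                /*
                 * As this skips calling sched_submit_work(), which the idle
                 * task does regardless (because it is a nop when the task is
                 * RUNNING), make sure this is never used from a context where
                 * the task could be in any other state.
                 */
                WARN_ON_ONCE(current->state);
                do {
                        __schedule(false);
                } while (need_resched());
        }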
      
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170414084809.3dacde2a@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 27 Apr, 2017 (1 commit)
  4. 16 Mar, 2017 (2 commits)
    • sched/core: Add {EN,DE}QUEUE_NOCLOCK flags · 0a67d1ee
      Peter Zijlstra committed
      Currently {en,de}queue_task() do an unconditional update_rq_clock().
      However, since we want to avoid duplicate updates, so that each
      rq->lock section appears atomic in time, we need to be able to skip
      these clock updates.
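
      A sketch of the intended usage: callers inside an rq->lock section
      that have already updated the clock pass the new flag (details of the
      real enqueue path elided):

        static inline void enqueue_task(struct rq *rq, struct task_struct *p,
                                        int flags)
        {
                if (!(flags & ENQUEUE_NOCLOCK))
                        update_rq_clock(rq);

                p->sched_class->enqueue_task(rq, p, flags);
        }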
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Add rq->lock wrappers · 8a8c69c3
      Peter Zijlstra committed
      The missing update_rq_clock() check can work with partial rq->lock
      wrappery, since a missing wrapper can cause the warning not to be
      emitted when it should have been, but cannot cause the warning to
      trigger when it should not have.

      The duplicate update_rq_clock() check, however, can cause false
      warnings to trigger. Therefore add more comprehensive rq->lock
      wrappery.
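
      A sketch of the wrappers, which keep the lock and the lockdep pin
      travelling together (the irq/irqsave variants follow the same pattern):

        static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
                __acquires(rq->lock)
        {
                raw_spin_lock(&rq->lock);
                rq_pin_lock(rq, rf);
        }

        static inline void rq_unlock(struct rq *rq, struct rq_flags *rf)
                __releases(rq->lock)
        {
                rq_unpin_lock(rq, rf);
                raw_spin_unlock(&rq->lock);
        }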
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 02 Mar, 2017 (17 commits)
  6. 08 Feb, 2017 (1 commit)
    • sched/autogroup: Rename auto_group.[ch] to autogroup.[ch] · 1051408f
      Ingo Molnar committed
      The names are all 'autogroup', not 'auto_group' - so rename
      the kernel/sched/auto_group.[ch] to match the existing
      nomenclature.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 07 Feb, 2017 (1 commit)
  8. 01 Feb, 2017 (1 commit)
    • sched/cputime: Increment kcpustat directly on irqtime account · a499a5a1
      Frederic Weisbecker committed
      The irqtime is accounted in nsecs and stored in
      cpu_irq_time.hardirq_time and cpu_irq_time.softirq_time. Once the
      accumulated amount reaches a new jiffy, it gets accounted to the
      kcpustat.
      
      This was necessary when kcpustat was stored in cputime_t, which could at
      worst have jiffies granularity. But now kcpustat is stored in nsecs
      so this whole discretization game with temporary irqtime storage has
      become unnecessary.
      
      We can now directly account the irqtime to the kcpustat.
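
      A heavily simplified sketch of the resulting flow (the u64_stats sync
      and irqtime-enabled checks are elided, and the per-CPU irq_start_time
      bookkeeping is shown schematically):

        void irqtime_account_irq(struct task_struct *curr)
        {
                u64 *cpustat = kcpustat_this_cpu->cpustat;
                s64 delta;

                delta = sched_clock_cpu(smp_processor_id()) -
                        __this_cpu_read(cpu_irqtime.irq_start_time);
                __this_cpu_add(cpu_irqtime.irq_start_time, delta);

                /* account straight into kcpustat, in nsecs */
                if (hardirq_count())
                        cpustat[CPUTIME_IRQ] += delta;
                else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
                        cpustat[CPUTIME_SOFTIRQ] += delta;
        }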
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1485832191-26889-17-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 14 Jan, 2017 (2 commits)
    • sched/core: Add debugging code to catch missing update_rq_clock() calls · cb42c9a3
      Matt Fleming committed
      There are no diagnostic checks for figuring out when we've
      accidentally missed update_rq_clock() calls. Let's add some by
      piggybacking on the rq_*pin_lock() wrappers.

      The idea behind the diagnostic checks is that upon pinning the rq lock
      the rq clock should be updated, via update_rq_clock(), before anybody
      reads the clock with rq_clock() or rq_clock_task().
      
      The exception to this rule is when updates have explicitly been
      disabled with the rq_clock_skip_update() optimisation.
      
      There are some functions that only unpin the rq lock in order to grab
      some other lock and avoid deadlock. In that case we don't need to
      update the clock again and the previous diagnostic state can be
      carried over in rq_repin_lock() by saving the state in the rq_flags
      context.
      
      Since this patch adds a new clock-update flag alongside those that
      already existed in rq::clock_skip_update, that field has been renamed.
      An attempt has been made to keep the flag manipulation code small and
      fast since it's used in the heart of the __schedule() fast path.
      
      For the !CONFIG_SCHED_DEBUG case the only object code change (other
      than addresses) is the following change to reset RQCF_ACT_SKIP inside
      of __schedule(),
      
        -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
        -       00 00 00
        +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)
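
      A sketch of the flag encoding in the renamed field and the resulting
      check:

        /* rq::clock_update_flags (was rq::clock_skip_update) */
        #define RQCF_REQ_SKIP   0x01    /* skip requested for next update */
        #define RQCF_ACT_SKIP   0x02    /* skip active inside __schedule() */
        #define RQCF_UPDATED    0x04    /* debug: updated since rq_pin_lock() */

        static inline void assert_clock_updated(struct rq *rq)
        {
                /*
                 * The only valid reason for not seeing a clock update since
                 * the last rq_pin_lock() is if the update was skipped.
                 */
                SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);
        }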
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming committed
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
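
      A sketch of the wrappers and the widened context they take:

        struct rq_flags {
                unsigned long flags;
                struct pin_cookie cookie;
        };

        static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
        {
                rf->cookie = lockdep_pin_lock(&rq->lock);
        }

        static inline void rq_unpin_lock(struct rq *rq, struct rq_flags *rf)
        {
                lockdep_unpin_lock(&rq->lock, rf->cookie);
        }

        static inline void rq_repin_lock(struct rq *rq, struct rq_flags *rf)
        {
                lockdep_repin_lock(&rq->lock, rf->cookie);
        }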
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 24 Nov, 2016 (1 commit)
  11. 16 Nov, 2016 (3 commits)
    • sched/fair: Propagate load during synchronous attach/detach · 09a43ace
      Vincent Guittot committed
      When a task moves from/to a cfs_rq, we set a flag which is then used
      to propagate the change at the parent level (sched_entity and cfs_rq)
      during the next update. If the cfs_rq is throttled, the flag stays
      pending until the cfs_rq is unthrottled.
      
      For propagating the utilization, we copy the utilization of the group
      cfs_rq to the sched_entity.

      For propagating the load, we have to take into account the load of the
      whole task group in order to evaluate the load of the sched_entity.
      Similarly to what was done before the rewrite of PELT, we add a
      correction factor in case the task group's load is greater than its
      share, so that it contributes the same load as a task of equal weight.
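
      A simplified sketch of the utilization side (sums, bounds checks and
      the load correction factor elided):

        /* mark the cfs_rq so the change is propagated at the next update */
        static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq)
        {
                cfs_rq->propagate_avg = 1;
        }

        /* a group entity simply mirrors the utilization of its group cfs_rq */
        static inline void
        update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se)
        {
                struct cfs_rq *gcfs_rq = group_cfs_rq(se);
                long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;

                se->avg.util_avg = gcfs_rq->avg.util_avg;
                cfs_rq->avg.util_avg += delta;
        }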
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: kernellwp@gmail.com
      Cc: pjt@google.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1478598827-32372-5-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list · 9c2791f9
      Vincent Guittot committed
      Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
      child will always be called before its parent.
      
      The hierarchical order in shares update list has been introduced by
      commit:
      
        67e86250 ("sched: Introduce hierarchal order on shares update list")
      
      With the current implementation a child can be still put after its
      parent.
      
      Let's take the example of:

             root
              \
               b
              / \
             c   d*
                 |
                 e*
      
      with root -> b -> c already enqueued but not d -> e so the
      leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail
      
      The branch d -> e will be added the first time that they are enqueued,
      starting with e then d.
      
      When e is added, its parent is not yet on the list, so e is put at
      the tail: head -> c -> b -> root -> e -> tail

      Then, d is added at the head because its parent is already on the
      list: head -> d -> c -> b -> root -> e -> tail

      e is not placed at the right position and will be called last,
      whereas it should be called first.
      
      Because it follows the bottom-up enqueue sequence, we are sure that we
      will finish by adding either a cfs_rq without a parent or a cfs_rq
      whose parent is already on the list. We can use this event to detect
      when a new branch has been fully added. The others, whose parents have
      not been added yet, must be placed after their children that were
      inserted in the previous steps, and after any potential parents that
      are already in the list. The easiest way is to put the cfs_rq just
      after the last inserted one and to keep track of it until the branch
      is fully added, as sketched below.
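
      A simplified sketch of the insertion logic; parent_on_list() and
      parent_list_entry() are hypothetical shorthands for the per-CPU parent
      cfs_rq lookups in the real code:

        static inline void list_add_leaf_cfs_rq(struct cfs_rq *cfs_rq)
        {
                struct rq *rq = rq_of(cfs_rq);

                if (cfs_rq->on_list)
                        return;

                if (parent_on_list(cfs_rq)) {
                        /* branch is complete: add behind the parent and
                         * reset the cursor to the head of the list */
                        list_add_tail_rcu(&cfs_rq->leaf_cfs_rq_list,
                                          parent_list_entry(cfs_rq));
                        rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
                } else {
                        /* parent not added yet: queue right after the last
                         * inserted child and remember the position */
                        list_add_rcu(&cfs_rq->leaf_cfs_rq_list,
                                     rq->tmp_alone_branch);
                        rq->tmp_alone_branch = &cfs_rq->leaf_cfs_rq_list;
                }
                cfs_rq->on_list = 1;
        }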
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: kernellwp@gmail.com
      Cc: pjt@google.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Add per-CPU min capacity to sched_group_capacity · bf475ce0
      Morten Rasmussen committed
      struct sched_group_capacity currently represents the compute capacity
      sum of all CPUs in the sched_group.
      
      Unless it is divided by the group_weight to get the average capacity
      per CPU, it hides differences in CPU capacity for mixed capacity systems
      (e.g. high RT/IRQ utilization or ARM big.LITTLE).
      
      But even the average may not be sufficient if the group covers CPUs of
      different capacities.
      
      Instead, by extending struct sched_group_capacity to indicate the
      minimum per-CPU capacity in the group, a suitable group for a given
      task utilization can more easily be found, such that CPUs with reduced
      capacity can be avoided for tasks with high utilization (not
      implemented by this patch).
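
      A sketch of the extended structure (other members elided):

        struct sched_group_capacity {
                atomic_t        ref;
                unsigned long   capacity;       /* sum over CPUs in the group */
                unsigned long   min_capacity;   /* new: min per-CPU capacity */
                /* ... */
        };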
      Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 30 Sep, 2016 (2 commits)