1. 02 3月, 2017 1 次提交
    • P
      sched/fair: Make select_idle_cpu() more aggressive · 4c77b18c
      Peter Zijlstra 提交于
      Kitsunyan reported desktop latency issues on his Celeron 887 because
      of commit:
      
        1b568f0a ("sched/core: Optimize SCHED_SMT")
      
      ... even though his CPU doesn't do SMT.
      
      The effect of running the SMT code on a !SMT part is basically a more
      aggressive select_idle_cpu(). Removing the avg condition fixed things
      for him.
      
      I also know FB likes this test gone, even though other workloads like
      having it.
      
      For now, take it out by default, until we get a better idea.
      Reported-by: Nkitsunyan <kitsunyan@inbox.ru>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c77b18c
  2. 28 2月, 2017 1 次提交
  3. 24 2月, 2017 3 次提交
  4. 22 2月, 2017 1 次提交
  5. 10 2月, 2017 1 次提交
  6. 08 2月, 2017 1 次提交
    • I
      sched/autogroup: Rename auto_group.[ch] to autogroup.[ch] · 1051408f
      Ingo Molnar 提交于
      The names are all 'autogroup', not 'auto_group' - so rename
      the kernel/sched/auto_group.[ch] to match the existing
      nomenclature.
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1051408f
  7. 07 2月, 2017 4 次提交
  8. 01 2月, 2017 12 次提交
  9. 30 1月, 2017 6 次提交
  10. 28 1月, 2017 1 次提交
  11. 20 1月, 2017 1 次提交
    • P
      sched/clock: Fix hotplug crash · acb04058
      Peter Zijlstra 提交于
      Mike reported that he could trigger the WARN_ON_ONCE() in
      set_sched_clock_stable() using hotplug.
      
      This exposed a fundamental problem with the interface, we should never
      mark the TSC stable if we ever find it to be unstable. Therefore
      set_sched_clock_stable() is a broken interface.
      
      The reason it existed is that not having it is a pain, it means all
      relevant architecture code needs to call clear_sched_clock_stable()
      where appropriate.
      
      Of the three architectures that select HAVE_UNSTABLE_SCHED_CLOCK ia64
      and parisc are trivial in that they never called
      set_sched_clock_stable(), so add an unconditional call to
      clear_sched_clock_stable() to them.
      
      For x86 the story is a lot more involved, and what this patch tries to
      do is ensure we preserve the status quo. So even is Cyrix or Transmeta
      have usable TSC they never called set_sched_clock_stable() so they now
      get an explicit mark unstable.
      Reported-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9881b024 ("sched/clock: Delay switching sched_clock to stable")
      Link: http://lkml.kernel.org/r/20170119133633.GB6536@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      acb04058
  12. 14 1月, 2017 8 次提交
    • T
      sched/core: Separate out io_schedule_prepare() and io_schedule_finish() · 10ab5643
      Tejun Heo 提交于
      Now that IO schedule accounting is done inside __schedule(),
      io_schedule() can be split into three steps - prep, schedule, and
      finish - where the schedule part doesn't need any special annotation.
      This allows marking a sleep as iowait by simply wrapping an existing
      blocking function with io_schedule_prepare() and io_schedule_finish().
      
      Because task_struct->in_iowait is single bit, the caller of
      io_schedule_prepare() needs to record and the pass its state to
      io_schedule_finish() to be safe regarding nesting.  While this isn't
      the prettiest, these functions are mostly gonna be used by core
      functions and we don't want to use more space for ->in_iowait.
      
      While at it, as it's simple to do now, reimplement io_schedule()
      without unnecessarily going through io_schedule_timeout().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/1477673892-28940-3-git-send-email-tj@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      10ab5643
    • T
      sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler · e33a9bba
      Tejun Heo 提交于
      For an interface to support blocking for IOs, it must call
      io_schedule() instead of schedule().  This makes it tedious to add IO
      blocking to existing interfaces as the switching between schedule()
      and io_schedule() is often buried deep.
      
      As we already have a way to mark the task as IO scheduling, this can
      be made easier by separating out io_schedule() into multiple steps so
      that IO schedule preparation can be performed before invoking a
      blocking interface and the actual accounting happens inside the
      scheduler.
      
      io_schedule_timeout() does the following three things prior to calling
      schedule_timeout().
      
       1. Mark the task as scheduling for IO.
       2. Flush out plugged IOs.
       3. Account the IO scheduling.
      
      done close to the actual scheduling.  This patch moves #3 into the
      scheduler so that later patches can separate out preparation and
      finish steps from io_schedule().
      Patch-originally-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: akpm@linux-foundation.org
      Cc: axboe@kernel.dk
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/20161207204841.GA22296@htj.duckdns.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e33a9bba
    • D
      sched/fair: Explain why MIN_SHARES isn't scaled in calc_cfs_shares() · b8fd8423
      Dietmar Eggemann 提交于
      Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Turner <pjt@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e9a4d858-bcf3-36b9-e3a9-449953e34569@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b8fd8423
    • V
      sched/core: Fix group_entity's share update · 89ee048f
      Vincent Guittot 提交于
      The update of the share of a cfs_rq is done when its load_avg is updated
      but before the group_entity's load_avg has been updated for the past time
      slot. This generates wrong load_avg accounting which can be significant
      when small tasks are involved in the scheduling.
      
      Let take the example of a task a that is dequeued of its task group A:
         root
        (cfs_rq)
          \
          (se)
           A
          (cfs_rq)
            \
            (se)
             a
      
      Task "a" was the only task in task group A which becomes idle when a is
      dequeued.
      
      We have the sequence:
      
      - dequeue_entity a->se
          - update_load_avg(a->se)
          - dequeue_entity_load_avg(A->cfs_rq, a->se)
          - update_cfs_shares(A->cfs_rq)
      	A->cfs_rq->load.weight == 0
              A->se->load.weight is updated with the new share (0 in this case)
      - dequeue_entity A->se
          - update_load_avg(A->se) but its weight is now null so the last time
            slot (up to a tick) will be accounted with a weight of 0 instead of
            its real weight during the time slot. The last time slot will be
            accounted as an idle one whereas it was a running one.
      
      If the running time of task a is short enough that no tick happens when it
      runs, all running time of group entity A->se will be accounted as idle
      time.
      
      Instead, we should update the share of a cfs_rq (in fact the weight of its
      group entity) only after having updated the load_avg of the group_entity.
      
      update_cfs_shares() now takes the sched_entity as a parameter instead of the
      cfs_rq, and the weight of the group_entity is updated only once its load_avg
      has been synced with current time.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/1482335426-7664-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      89ee048f
    • P
      sched/completions: Fix complete_all() semantics · da9647e0
      Peter Zijlstra 提交于
      Documentation/scheduler/completion.txt says this about complete_all():
      
        "calls complete_all() to signal all current and future waiters."
      
      Which doesn't strictly match the current semantics. Currently
      complete_all() is equivalent to UINT_MAX/2 complete() invocations,
      which is distinctly less than 'all current and future waiters'
      (enumerable vs innumerable), although it has worked in practice.
      
      However, Dmitry had a weird case where it might matter, so change
      completions to use saturation semantics for complete()/complete_all().
      Once done hits UINT_MAX (and complete_all() sets it there) it will
      never again be decremented.
      Requested-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: der.herr@hofr.at
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      da9647e0
    • T
      sched/deadline: Show leftover runtime and abs deadline in /proc/*/sched · 59f8c298
      Tommaso Cucinotta 提交于
      This patch allows for reading the current (leftover) runtime and
      absolute deadline of a SCHED_DEADLINE task through /proc/*/sched
      (entries dl.runtime and dl.deadline), while debugging/testing.
      Signed-off-by: NTommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NJuri Lelli <juri.lelli@arm.com>
      Reviewed-by: NLuca Abeni <luca.abeni@unitn.it>
      Acked-by: NDaniel Bistrot de Oliveira <danielbristot@gmail.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1477473437-10346-2-git-send-email-tommaso.cucinotta@sssup.itSigned-off-by: NIngo Molnar <mingo@kernel.org>
      59f8c298
    • P
      sched/clock: Provide better clock continuity · 5680d809
      Peter Zijlstra 提交于
      When switching between the unstable and stable variants it is
      currently possible that clock discontinuities occur.
      
      And while these will mostly be 'small', attempt to do better.
      
      As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
      ktime_get_ns() based timeline at the point of switchover
      (sched_clock_init_late()) after SMP bringup.
      
      Equally, when the TSC is later found to be unstable -- typically
      because SMM tries to hide its SMI latencies by mucking with the TSC --
      we want to avoid large jumps.
      
      Since the clocksource watchdog reports the issue after the fact we
      cannot exactly fix up time, but since SMI latencies are typically
      small (~10ns range), the discontinuity is mainly due to drift between
      sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
      24days).
      
      I dislike this patch because it adds overhead to the good case in
      favour of dealing with badness. But given the widespread failure of
      TSC stability this is worth it.
      
      Note that in case the TSC makes drastic jumps after SMP bringup we're
      still hosed. There's just not much we can do in that case without
      stupid overhead.
      
      If we were to somehow expose tsc_clocksource_reliable (which is hard
      because this code is also used on ia64 and parisc) we could avoid some
      of the newly introduced overhead.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5680d809
    • P
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra 提交于
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9881b024