1. 02 3月, 2017 10 次提交
  2. 28 2月, 2017 1 次提交
  3. 24 2月, 2017 3 次提交
  4. 22 2月, 2017 1 次提交
  5. 10 2月, 2017 1 次提交
  6. 07 2月, 2017 4 次提交
  7. 01 2月, 2017 1 次提交
  8. 30 1月, 2017 5 次提交
  9. 14 1月, 2017 9 次提交
    • T
      sched/core: Separate out io_schedule_prepare() and io_schedule_finish() · 10ab5643
      Tejun Heo 提交于
      Now that IO schedule accounting is done inside __schedule(),
      io_schedule() can be split into three steps - prep, schedule, and
      finish - where the schedule part doesn't need any special annotation.
      This allows marking a sleep as iowait by simply wrapping an existing
      blocking function with io_schedule_prepare() and io_schedule_finish().
      
      Because task_struct->in_iowait is single bit, the caller of
      io_schedule_prepare() needs to record and the pass its state to
      io_schedule_finish() to be safe regarding nesting.  While this isn't
      the prettiest, these functions are mostly gonna be used by core
      functions and we don't want to use more space for ->in_iowait.
      
      While at it, as it's simple to do now, reimplement io_schedule()
      without unnecessarily going through io_schedule_timeout().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/1477673892-28940-3-git-send-email-tj@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      10ab5643
    • T
      sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler · e33a9bba
      Tejun Heo 提交于
      For an interface to support blocking for IOs, it must call
      io_schedule() instead of schedule().  This makes it tedious to add IO
      blocking to existing interfaces as the switching between schedule()
      and io_schedule() is often buried deep.
      
      As we already have a way to mark the task as IO scheduling, this can
      be made easier by separating out io_schedule() into multiple steps so
      that IO schedule preparation can be performed before invoking a
      blocking interface and the actual accounting happens inside the
      scheduler.
      
      io_schedule_timeout() does the following three things prior to calling
      schedule_timeout().
      
       1. Mark the task as scheduling for IO.
       2. Flush out plugged IOs.
       3. Account the IO scheduling.
      
      done close to the actual scheduling.  This patch moves #3 into the
      scheduler so that later patches can separate out preparation and
      finish steps from io_schedule().
      Patch-originally-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adilger.kernel@dilger.ca
      Cc: akpm@linux-foundation.org
      Cc: axboe@kernel.dk
      Cc: jack@suse.com
      Cc: kernel-team@fb.com
      Cc: mingbo@fb.com
      Cc: tytso@mit.edu
      Link: http://lkml.kernel.org/r/20161207204841.GA22296@htj.duckdns.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e33a9bba
    • P
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra 提交于
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9881b024
    • M
      sched/core: Add debugging code to catch missing update_rq_clock() calls · cb42c9a3
      Matt Fleming 提交于
      There's no diagnostic checks for figuring out when we've accidentally
      missed update_rq_clock() calls. Let's add some by piggybacking on the
      rq_*pin_lock() wrappers.
      
      The idea behind the diagnostic checks is that upon pining rq lock the
      rq clock should be updated, via update_rq_clock(), before anybody
      reads the clock with rq_clock() or rq_clock_task().
      
      The exception to this rule is when updates have explicitly been
      disabled with the rq_clock_skip_update() optimisation.
      
      There are some functions that only unpin the rq lock in order to grab
      some other lock and avoid deadlock. In that case we don't need to
      update the clock again and the previous diagnostic state can be
      carried over in rq_repin_lock() by saving the state in the rq_flags
      context.
      
      Since this patch adds a new clock update flag and some already exist
      in rq::clock_skip_update, that field has now been renamed. An attempt
      has been made to keep the flag manipulation code small and fast since
      it's used in the heart of the __schedule() fast path.
      
      For the !CONFIG_SCHED_DEBUG case the only object code change (other
      than addresses) is the following change to reset RQCF_ACT_SKIP inside
      of __schedule(),
      
        -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
        -       00 00 00
        +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb42c9a3
    • P
      sched/core: Add missing update_rq_clock() call in set_user_nice() · 2fb8d367
      Peter Zijlstra 提交于
      Address this rq-clock update bug:
      
        WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          set_next_entity()
          ? _raw_spin_lock()
          set_curr_task_fair()
          set_user_nice.part.85()
          set_user_nice()
          create_worker()
          worker_thread()
          kthread()
          ret_from_fork()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2fb8d367
    • P
      sched/core: Add missing update_rq_clock() in detach_task_cfs_rq() · 80f5c1b8
      Peter Zijlstra 提交于
      Instead of adding the update_rq_clock() all the way at the bottom of
      the callstack, add one at the top, this to aid later effort to
      minimize update_rq_lock() calls.
      
        WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          detach_task_cfs_rq()
          switched_from_fair()
          __sched_setscheduler()
          _sched_setscheduler()
          sched_set_stop_task()
          cpu_stop_create()
          __smpboot_create_thread.part.2()
          smpboot_register_percpu_thread_cpumask()
          cpu_stop_init()
          do_one_initcall()
          ? print_cpu_info()
          kernel_init_freeable()
          ? rest_init()
          kernel_init()
          ret_from_fork()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      80f5c1b8
    • P
      sched/core: Add missing update_rq_clock() in post_init_entity_util_avg() · 4126bad6
      Peter Zijlstra 提交于
      Address this rq-clock update bug:
      
        WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          __warn()
          post_init_entity_util_avg()
          wake_up_new_task()
          _do_fork()
          kernel_thread()
          rest_init()
          start_kernel()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4126bad6
    • M
      sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock · 92509b73
      Matt Fleming 提交于
      rq_clock() is called from sched_info_{depart,arrive}() after resetting
      RQCF_ACT_SKIP but prior to a call to update_rq_clock().
      
      In preparation for pending patches that check whether the rq clock has
      been updated inside of a pin context before rq_clock() is called, move
      the reset of rq->clock_skip_update immediately before unpinning the rq
      lock.
      
      This will avoid the new warnings which check if update_rq_clock() is
      being actively skipped.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92509b73
    • M
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming 提交于
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d8ac8971
  10. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  11. 29 11月, 2016 1 次提交
    • P
      sched/idle: Add support for tasks that inject idle · c1de45ca
      Peter Zijlstra 提交于
      Idle injection drivers such as Intel powerclamp and ACPI PAD drivers use
      realtime tasks to take control of CPU then inject idle. There are two
      issues with this approach:
      
       1. Low efficiency: injected idle task is treated as busy so sched ticks
          do not stop during injected idle period, the result of these
          unwanted wakeups can be ~20% loss in power savings.
      
       2. Idle accounting: injected idle time is presented to user as busy.
      
      This patch addresses the issues by introducing a new PF_IDLE flag which
      allows any given task to be treated as idle task while the flag is set.
      Therefore, idle injection tasks can run through the normal flow of NOHZ
      idle enter/exit to get the correct accounting as well as tick stop when
      possible.
      
      The implication is that idle task is then no longer limited to PID == 0.
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c1de45ca
  12. 24 11月, 2016 1 次提交
  13. 16 11月, 2016 2 次提交
    • V
      sched/fair: Fix hierarchical order in rq->leaf_cfs_rq_list · 9c2791f9
      Vincent Guittot 提交于
      Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a
      child will always be called before its parent.
      
      The hierarchical order in shares update list has been introduced by
      commit:
      
        67e86250 ("sched: Introduce hierarchal order on shares update list")
      
      With the current implementation a child can be still put after its
      parent.
      
      Lets take the example of:
      
             root
              \
               b
               /\
               c d*
                 |
                 e*
      
      with root -> b -> c already enqueued but not d -> e so the
      leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail
      
      The branch d -> e will be added the first time that they are enqueued,
      starting with e then d.
      
      When e is added, its parents is not already on the list so e is put at
      the tail : head -> c -> b -> root -> e -> tail
      
      Then, d is added at the head because its parent is already on the
      list: head -> d -> c -> b -> root -> e -> tail
      
      e is not placed at the right position and will be called the last
      whereas it should be called at the beginning.
      
      Because it follows the bottom-up enqueue sequence, we are sure that we
      will finished to add either a cfs_rq without parent or a cfs_rq with a
      parent that is already on the list. We can use this event to detect
      when we have finished to add a new branch. For the others, whose
      parents are not already added, we have to ensure that they will be
      added after their children that have just been inserted the steps
      before, and after any potential parents that are already in the list.
      The easiest way is to put the cfs_rq just after the last inserted one
      and to keep track of it untl the branch is fully added.
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bsegall@google.com
      Cc: kernellwp@gmail.com
      Cc: pjt@google.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1478598827-32372-3-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c2791f9
    • M
      sched/fair: Add per-CPU min capacity to sched_group_capacity · bf475ce0
      Morten Rasmussen 提交于
      struct sched_group_capacity currently represents the compute capacity
      sum of all CPUs in the sched_group.
      
      Unless it is divided by the group_weight to get the average capacity
      per CPU, it hides differences in CPU capacity for mixed capacity systems
      (e.g. high RT/IRQ utilization or ARM big.LITTLE).
      
      But even the average may not be sufficient if the group covers CPUs of
      different capacities.
      
      Instead, by extending struct sched_group_capacity to indicate min per-CPU
      capacity in the group a suitable group for a given task utilization can
      more easily be found such that CPUs with reduced capacity can be avoided
      for tasks with high utilization (not implemented by this patch).
      Signed-off-by: NMorten Rasmussen <morten.rasmussen@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: freedom.tan@mediatek.com
      Cc: keita.kobayashi.ym@renesas.com
      Cc: mgalbraith@suse.de
      Cc: sgurrappadi@nvidia.com
      Cc: vincent.guittot@linaro.org
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1476452472-24740-4-git-send-email-morten.rasmussen@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bf475ce0