1. 03 June 2016, 1 commit
    • cpufreq: governor: Get rid of governor events · e788892b
      Authored by Rafael J. Wysocki
      The design of the cpufreq governor API is not very straightforward,
      as struct cpufreq_governor provides only one callback to be invoked
      from different code paths for different purposes.  The purpose it is
      invoked for is determined by its second "event" argument, causing it
      to act as a "callback multiplexer" of sorts.
      
      Unfortunately, that leads to extra complexity in governors, some of
      which implement the ->governor() callback as a switch statement
      that simply checks the event argument and invokes a separate function
      to handle that specific event.
      
      That extra complexity can be eliminated by replacing the all-purpose
      ->governor() callback with a family of callbacks that carry out specific
      governor operations: initialization and exit, start and stop, and policy
      limits updates.  That also turns out to reduce code size, so do it.
      (A sketch of the resulting callback family follows this entry.)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      e788892b
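      A minimal sketch of the direction of the change, with illustrative
      signatures and the "old" struct name invented for contrast (the exact
      upstream definitions may differ):

          struct cpufreq_policy;

          /* Before: one multiplexed entry point, dispatched on 'event'. */
          struct cpufreq_governor_old {
                  int (*governor)(struct cpufreq_policy *policy, unsigned int event);
          };

          /* After: one callback per governor operation, no event switch. */
          struct cpufreq_governor {
                  int  (*init)(struct cpufreq_policy *policy);
                  void (*exit)(struct cpufreq_policy *policy);
                  int  (*start)(struct cpufreq_policy *policy);
                  void (*stop)(struct cpufreq_policy *policy);
                  void (*limits)(struct cpufreq_policy *policy);
          };

      Governors then fill in only the callbacks they need, instead of
      implementing a switch (event) dispatcher.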
  2. 25 May 2016, 1 commit
    • sched/core: Fix remote wakeups · b7e7ade3
      Authored by Peter Zijlstra
      Commit:
      
        b5179ac7 ("sched/fair: Prepare to fix fairness problems on migration")
      
      ... introduced a bug: Mike Galbraith found a performance regression,
      while Paul E. McKenney reported lost wakeups and bisected it to this
      commit.
      
      The reason is that I mis-read ttwu_queue() such that I assumed any
      wakeup that got a remote queue must have had the task migrated.
      
      Since this is not so, we need to transfer this information between
      queueing the wakeup and actually doing the wakeup.  Use a new
      task_struct::sched_flag for this; we already write to
      sched_contributes_to_load in the wakeup path, so this is a hot and
      already-modified cacheline.  (A sketch of the idea follows this entry.)
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Fixes: b5179ac7 ("sched/fair: Prepare to fix fairness problems on migration")
      Link: http://lkml.kernel.org/r/20160523091907.GD15728@worktop.ger.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b7e7ade3
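      A sketch of the idea; the field name sched_remote_wakeup is an
      assumption for illustration (the changelog only says a new task_struct
      flag is used), and the calls below are trimmed fragments, not the
      verbatim diff:

          /* A spare bit next to sched_contributes_to_load records whether
           * the queued wakeup was a migrating one. */
          struct task_struct {
                  /* ... */
                  unsigned                sched_contributes_to_load:1;
                  unsigned                sched_remote_wakeup:1;
                  /* ... */
          };

          /* Queueing side: remember whether this wakeup migrated the task. */
          p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);

          /* Remote CPU, when it finally performs the wakeup (arguments trimmed): */
          ttwu_do_activate(rq, p, p->sched_remote_wakeup ? WF_MIGRATED : 0);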
  3. 19 May 2016, 1 commit
  4. 12 May 2016, 10 commits
    • sched/core: Provide a tsk_nr_cpus_allowed() helper · 50605ffb
      Authored by Thomas Gleixner
      tsk_nr_cpus_allowed() is an accessor for task->nr_cpus_allowed, which
      allows us to change the representation of ->nr_cpus_allowed if required
      (a sketch follows this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462969411-17735-2-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      50605ffb
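      The helper is small; a sketch of what such an accessor looks like
      (kernel context assumed, the upstream definition may differ in detail):

          static inline int tsk_nr_cpus_allowed(struct task_struct *p)
          {
                  return p->nr_cpus_allowed;
          }

      Call sites then read tsk_nr_cpus_allowed(p) instead of
      p->nr_cpus_allowed, so a later change of representation only has to
      touch the helper.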
    • sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed · ade42e09
      Authored by Thomas Gleixner
      Use the future-safe accessor instead of dereferencing struct
      task_struct's ->cpus_allowed directly (an illustrative call-site
      conversion follows this entry).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462969411-17735-1-git-send-email-bigeasy@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ade42e09
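      An illustrative call-site conversion (example only, not a specific hunk
      from the patch):

          /* before */
          cpumask_copy(&mask, &p->cpus_allowed);

          /* after */
          cpumask_copy(&mask, tsk_cpus_allowed(p));

      where tsk_cpus_allowed() is the existing accessor that simply returns
      &p->cpus_allowed, mirroring the tsk_nr_cpus_allowed() helper above.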
    • sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems · 20878232
      Authored by Vik Heyndrickx
      Systems show a minimal load average of 0.00, 0.01, 0.05 even when they
      have no load at all.
      
      Uptime and /proc/loadavg on all systems with kernels released during the
      last five years up until kernel version 4.6-rc5, show a 5- and 15-minute
      minimum loadavg of 0.01 and 0.05 respectively. This should be 0.00 on
      idle systems, but the way the kernel calculates this value prevents it
      from getting lower than the mentioned values.
      
      Likewise, but not as obviously noticeable, a fully loaded system with
      no processes waiting shows a maximum 1/5/15 loadavg of 1.00, 0.99, 0.95
      (multiplied by the number of cores).
      
      Once the (old) load becomes 93 or higher, it mathematically can never
      get lower than 93, even when the active (load) remains 0 forever.
      This results in the strange 0.00, 0.01, 0.05 uptime values on idle
      systems.  Note: 93/2048 = 0.0454..., which rounds up to 0.05.
      
      It is not correct to add a 0.5 rounding (= 1024/2048) here, since the
      result of this function is fed back into the next iteration: that +0.5
      rounding value then gets multiplied by (2048-2037) and rounded again,
      creating a virtual "ghost" load next to the old and active load terms.
      
      By changing the way the internally kept value is rounded, the
      equivalent of that internal value can now reach 0.00 on idle and 1.00
      on full load: while the load is increasing, the internally kept value
      is rounded up; while it is decreasing, it is rounded down.  (A sketch
      of the changed rounding, with a worked example, follows this entry.)
      
      The modified code was tested on nohz=off and nohz kernels.  It was
      tested on vanilla kernel 4.6-rc5 and on the CentOS 7.1 kernel
      3.10.0-327.  It was tested on single-, dual-, and octal-core systems,
      on virtual hosts and on bare hardware.  No unwanted effects were
      observed, and the problems the patch intended to fix were indeed gone.
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Signed-off-by: Vik Heyndrickx <vik.heyndrickx@veribox.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Doug Smythies <dsmythies@telus.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0f004f5a ("sched: Cure more NO_HZ load average woes")
      Link: http://lkml.kernel.org/r/e8d32bff-d544-7748-72b5-3c86cc71f09f@veribox.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      20878232
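      A sketch of the changed rounding, assuming the usual fixed-point
      constants (FIXED_1 = 2048; the exact upstream hunk may differ):

          static unsigned long
          calc_load(unsigned long load, unsigned long exp, unsigned long active)
          {
                  unsigned long newload;

                  newload = load * exp + active * (FIXED_1 - exp);
                  if (active >= load)
                          newload += FIXED_1 - 1;   /* load rising: round up */
                                                    /* load falling: round down */
                  return newload / FIXED_1;
          }

      Worked example for the 15-minute average (exp = 2037) on an idle system
      (active = 0), starting from load = 93: the old code computed
      (93*2037 + 1024) / 2048 = 190465 / 2048 = 93, so 93 was a fixed point;
      the new code computes 93*2037 / 2048 = 189441 / 2048 = 92, so the value
      keeps decaying toward 0.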
    • sched/fair: Correct unit of load_above_capacity · cfa10334
      Authored by Morten Rasmussen
      In calculate_imbalance(), load_above_capacity currently has the unit
      [capacity] while it is used as being [load/capacity].  Not only is that
      wrong, it also makes it unlikely that load_above_capacity is ever used,
      as the subsequent code picks the smaller of load_above_capacity and the
      avg_load.
      
      This patch ensures that load_above_capacity has the right unit
      [load/capacity].  (A sketch of the conversion follows this entry.)
      Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
      [ Changed changelog to note it was in capacity unit; +rebase. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1461958364-675-4-git-send-email-dietmar.eggemann@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cfa10334
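      A sketch of the conversion, reconstructed from the changelog (not the
      verbatim hunk): the surplus expressed in [capacity] is scaled by the
      NICE_0 load and divided by the group's capacity so that it becomes
      comparable with avg_load.

          load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE;
          if (load_above_capacity > busiest->group_capacity) {
                  load_above_capacity -= busiest->group_capacity;
                  load_above_capacity *= scale_load_down(NICE_0_LOAD);
                  load_above_capacity /= busiest->group_capacity;
          } else {
                  load_above_capacity = ~0UL;
          }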
    • sched/fair: Clean up scale confusion · 1be0eb2a
      Authored by Peter Zijlstra
      Wanpeng noted that the scale_load_down() in calculate_imbalance() was
      weird.  I agree; it should be SCHED_CAPACITY_SCALE, since we're going
      to compare against busiest->group_capacity, which is in [capacity]
      units.  (An illustrative one-liner follows this entry.)
      Reported-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1be0eb2a
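      An illustrative one-liner showing the intent (not the verbatim diff):

          /* before: load units, yet compared against a capacity value */
          load_above_capacity = busiest->sum_nr_running * scale_load_down(NICE_0_LOAD);

          /* after: capacity units, same scale as busiest->group_capacity */
          load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE;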
    • sched/nohz: Fix affine unpinned timers mess · 44496922
      Authored by Wanpeng Li
      The following commit:
      
        9642d18e ("nohz: Affine unpinned timers to housekeepers")'
      
      intended to affine unpinned timers to housekeepers:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
      
      However, the !idle_cpu(i) && is_housekeeping_cpu(cpu) check changed
      that intention to:
      
        unpinned timers (full dynticks, idle)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (full dynticks, busy)   =>   any housekeeper (no matter the CPU topology)
        unpinned timers (housekeeper, idle)     =>   any busy CPU (otherwise, fall back to any housekeeper)
      
      This patch fixes it by checking whether there are busy housekeepers
      nearby, and otherwise falling back to any housekeeper, or to the CPU
      itself.  (A sketch of the resulting selection logic follows this
      entry.)  After the patch:
      
        unpinned timers (full dynticks, idle)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (full dynticks, busy)   =>   nearest busy housekeeper (otherwise, fall back to any housekeeper)
        unpinned timers (housekeeper, idle)     =>   nearest busy housekeeper (otherwise, fall back to itself)
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Fixed the changelog. ]
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 9642d18e ("nohz: Affine unpinned timers to housekeepers")
      Link: http://lkml.kernel.org/r/1462344334-8303-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      44496922
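      A sketch of the resulting selection logic, reconstructed around
      get_nohz_timer_target() as described in the changelog (the exact
      upstream code may differ); the key point is testing the candidate CPU
      'i', not the local 'cpu', inside the domain walk:

          int get_nohz_timer_target(void)
          {
                  int i, cpu = smp_processor_id();
                  struct sched_domain *sd;

                  /* A busy housekeeping CPU keeps its own timers. */
                  if (!idle_cpu(cpu) && is_housekeeping_cpu(cpu))
                          return cpu;

                  rcu_read_lock();
                  for_each_domain(cpu, sd) {
                          for_each_cpu(i, sched_domain_span(sd)) {
                                  if (cpu == i)
                                          continue;
                                  /* Nearest CPU that is both busy and a housekeeper. */
                                  if (!idle_cpu(i) && is_housekeeping_cpu(i)) {
                                          cpu = i;
                                          goto unlock;
                                  }
                          }
                  }

                  /* No busy housekeeper nearby: fall back to any housekeeper. */
                  if (!is_housekeeping_cpu(cpu))
                          cpu = housekeeping_any_cpu();
          unlock:
                  rcu_read_unlock();
                  return cpu;
          }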
    • sched/fair: Fix fairness issue on migration · 2f950354
      Authored by Peter Zijlstra
      Pavan reported that in the presence of very light tasks (or cgroups)
      the placement of migrated tasks can cause severe fairness issues.
      
      The problem is that enqueue_entity() places the task before it updates
      time, and can thereby place the task far in the past (remember that
      light tasks shoot virtual time forward at a high speed, so relative to
      the pre-existing light task we can land far in the past).
      
      This is done because update_curr() needs the current task, and we
      might be placing the current task.
      
      The obvious solution is to differentiate between the current task and
      any other task: place the current task before we update time, and any
      other task after, such that !curr tasks end up at the current moment in
      time rather than in the past.  (A sketch of the resulting ordering
      follows this entry.)
      
      This commit re-introduces the previously reverted commit:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      ... which is now safe to do, after we've also fixed another
      underlying bug first, in:
      
        sched/fair: Prepare to fix fairness problems on migration
      
      and cleaned up other details in the migration code:
      
        sched/core: Kill sched_class::task_waking
      Reported-by: Pavan Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2f950354
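      A sketch of the resulting ordering in enqueue_entity(), simplified and
      with the flag names as assumptions (the real code carries more state):

          static void
          enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
          {
                  bool renorm = !(flags & ENQUEUE_WAKEUP) || (flags & ENQUEUE_MIGRATED);
                  bool curr = cfs_rq->curr == se;

                  /* The current entity must be renormalised before
                   * update_curr(), which needs its vruntime to be sane. */
                  if (renorm && curr)
                          se->vruntime += cfs_rq->min_vruntime;

                  update_curr(cfs_rq);

                  /* Everyone else is renormalised after update_curr(), so
                   * they are placed at the present, not in the past. */
                  if (renorm && !curr)
                          se->vruntime += cfs_rq->min_vruntime;

                  /* ... placement, load tracking, etc. ... */
          }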
    • sched/core: Kill sched_class::task_waking to clean up the migration logic · 59efa0ba
      Authored by Peter Zijlstra
      With sched_class::task_waking now being called only when we do
      set_task_cpu(), we can make sched_class::migrate_task_rq() do the work
      and eliminate sched_class::task_waking entirely.  (A sketch follows
      this entry.)
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59efa0ba
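      A rough sketch of the direction, under the assumption that the vruntime
      renormalisation previously done by task_waking_fair() simply moves into
      migrate_task_rq_fair() (simplified, not the verbatim diff):

          static void migrate_task_rq_fair(struct task_struct *p)
          {
                  struct sched_entity *se = &p->se;
                  struct cfs_rq *cfs_rq = cfs_rq_of(se);

                  /* Formerly task_waking_fair(): make vruntime relative to
                   * the old runqueue before the task changes CPUs. */
                  se->vruntime -= cfs_rq->min_vruntime;

                  /* ... existing "leave the old runqueue" bookkeeping ... */
          }

      With that in place, the ->task_waking member can be dropped from
      struct sched_class altogether.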
    • sched/fair: Prepare to fix fairness problems on migration · b5179ac7
      Authored by Peter Zijlstra
      Mike reported that our recent attempt to fix migration problems:
      
        3a47d512 ("sched/fair: Fix fairness issue on migration")
      
      broke interactivity and the signal starve test. We reverted that
      commit and now let's try it again more carefully, with some other
      underlying problems fixed first.
      
      One problem is that I assumed ENQUEUE_WAKING was only set when we do a
      cross-cpu wakeup (migration), which isn't true. This means we now
      destroy the vruntime history of tasks and wakeup-preemption suffers.
      
      Cure this by making my assumption true: only call
      sched_class::task_waking() when we do a cross-CPU wakeup.  This also
      avoids the indirect call in the case of a local wakeup.  (A sketch of
      the wakeup path follows this entry.)
      Reported-by: Mike Galbraith <mgalbraith@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Fixes: 3a47d512 ("sched/fair: Fix fairness issue on migration")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b5179ac7
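      A simplified sketch of the wakeup path after this change (structure
      assumed from the changelog, not the verbatim diff): ->task_waking()
      only runs when the task actually changes CPU, so ENQUEUE_WAKING really
      does imply a migration.

          cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
          if (task_cpu(p) != cpu) {
                  wake_flags |= WF_MIGRATED;
                  if (p->sched_class->task_waking)
                          p->sched_class->task_waking(p);  /* cross-CPU only */
                  set_task_cpu(p, cpu);
          }
          ttwu_queue(p, cpu, wake_flags);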
    • sched/fair: Move record_wakee() · c58d25f3
      Authored by Peter Zijlstra
      Since I want to make ->task_woken() conditional on the task getting
      migrated, we cannot use it to call record_wakee().
      
      Move it to select_task_rq_fair(), which gets called under almost the
      same conditions.  The only exception is when the woken task (@p) is
      CPU-bound (as per the nr_cpus_allowed test in select_task_rq()).  (A
      sketch follows this entry.)
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Pavan Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: byungchul.park@lge.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c58d25f3
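      A sketch of where the bookkeeping ends up, assuming the usual shape of
      select_task_rq_fair() (simplified):

          static int
          select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag,
                              int wake_flags)
          {
                  /* Only genuine wakeups should update wakee statistics. */
                  if (sd_flag & SD_BALANCE_WAKE)
                          record_wakee(p);

                  /* ... existing wake_affine / domain-walk logic ... */
          }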
  5. 11 May 2016, 1 commit
  6. 10 May 2016, 1 commit
    • sched/rt, sched/dl: Don't push if task's scheduling class was changed · 13b5ab02
      Authored by Xunlei Pang
      We got this warning:
      
          WARNING: CPU: 1 PID: 2468 at kernel/sched/core.c:1161 set_task_cpu+0x1af/0x1c0
          [...]
          Call Trace:
      
          dump_stack+0x63/0x87
          __warn+0xd1/0xf0
          warn_slowpath_null+0x1d/0x20
          set_task_cpu+0x1af/0x1c0
          push_dl_task.part.34+0xea/0x180
          push_dl_tasks+0x17/0x30
          __balance_callback+0x45/0x5c
          __sched_setscheduler+0x906/0xb90
          SyS_sched_setattr+0x150/0x190
          do_syscall_64+0x62/0x110
          entry_SYSCALL64_slow_path+0x25/0x25
      
      This corresponds to:
      
          WARN_ON_ONCE(p->state == TASK_RUNNING &&
                   p->sched_class == &fair_sched_class &&
                   (p->on_rq && !task_on_rq_migrating(p)))
      
      It happens because in find_lock_later_rq(), the task whose scheduling
      class was changed to fair class is still pushed away as if it were
      a deadline task ...
      
      So, in find_lock_later_rq(), check after double_lock_balance() whether
      the scheduling class of the deadline task has changed; if it has, break
      and retry.
      
      Apply the same logic to RT tasks.  (A sketch of the added check follows
      this entry.)
      Signed-off-by: Xunlei Pang <xlpang@redhat.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Link: http://lkml.kernel.org/r/1462767091-1215-1-git-send-email-xlpang@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      13b5ab02
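      A sketch of the added check in find_lock_later_rq(), reconstructed from
      the changelog (the RT side gains an analogous !rt_task() test in
      find_lock_lowest_rq(); not the verbatim hunk):

          if (double_lock_balance(rq, later_rq)) {
                  if (unlikely(task_rq(task) != rq ||
                               !cpumask_test_cpu(later_rq->cpu,
                                                 tsk_cpus_allowed(task)) ||
                               task_running(rq, task) ||
                               !dl_task(task) ||        /* class changed under us */
                               !task_on_rq_queued(task))) {
                          double_unlock_balance(rq, later_rq);
                          later_rq = NULL;
                          break;
                  }
          }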
  7. 09 May 2016, 2 commits
  8. 07 May 2016, 1 commit
  9. 06 May 2016, 13 commits
  10. 05 May 2016, 9 commits