1. 24 Jun 2020: 4 commits
  2. 04 Jun 2019: 1 commit
  3. 06 Apr 2019: 1 commit
  4. 10 Sep 2018: 1 commit
    • sched/debug: Fix potential deadlock when writing to sched_features · e73e8197
      Authored by Jiada Wang
      The following lockdep report can be triggered by writing to /sys/kernel/debug/sched_features:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        4.18.0-rc6-00152-gcd3f77d7-dirty #18 Not tainted
        ------------------------------------------------------
        sh/3358 is trying to acquire lock:
        000000004ad3989d (cpu_hotplug_lock.rw_sem){++++}, at: static_key_enable+0x14/0x30
        but task is already holding lock:
        00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3 (&sb->s_type->i_mutex_key#3){+.+.}:
               lock_acquire+0xb8/0x148
               down_write+0xac/0x140
               start_creating+0x5c/0x168
               debugfs_create_dir+0x18/0x220
               opp_debug_register+0x8c/0x120
               _add_opp_dev+0x104/0x1f8
               dev_pm_opp_get_opp_table+0x174/0x340
               _of_add_opp_table_v2+0x110/0x760
               dev_pm_opp_of_add_table+0x5c/0x240
               dev_pm_opp_of_cpumask_add_table+0x5c/0x100
               cpufreq_init+0x160/0x430
               cpufreq_online+0x1cc/0xe30
               cpufreq_add_dev+0x78/0x198
               subsys_interface_register+0x168/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #2 (opp_table_lock){+.+.}:
               lock_acquire+0xb8/0x148
               __mutex_lock+0x104/0xf50
               mutex_lock_nested+0x1c/0x28
               _of_add_opp_table_v2+0xb4/0x760
               dev_pm_opp_of_add_table+0x5c/0x240
               dev_pm_opp_of_cpumask_add_table+0x5c/0x100
               cpufreq_init+0x160/0x430
               cpufreq_online+0x1cc/0xe30
               cpufreq_add_dev+0x78/0x198
               subsys_interface_register+0x168/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #1 (subsys mutex#6){+.+.}:
               lock_acquire+0xb8/0x148
               __mutex_lock+0x104/0xf50
               mutex_lock_nested+0x1c/0x28
               subsys_interface_register+0xd8/0x270
               cpufreq_register_driver+0x1c8/0x278
               dt_cpufreq_probe+0xdc/0x1b8
               platform_drv_probe+0xb4/0x168
               driver_probe_device+0x318/0x4b0
               __device_attach_driver+0xfc/0x1f0
               bus_for_each_drv+0xf8/0x180
               __device_attach+0x164/0x200
               device_initial_probe+0x10/0x18
               bus_probe_device+0x110/0x178
               device_add+0x6d8/0x908
               platform_device_add+0x138/0x3d8
               platform_device_register_full+0x1cc/0x1f8
               cpufreq_dt_platdev_init+0x174/0x1bc
               do_one_initcall+0xb8/0x310
               kernel_init_freeable+0x4b8/0x56c
               kernel_init+0x10/0x138
               ret_from_fork+0x10/0x18
        -> #0 (cpu_hotplug_lock.rw_sem){++++}:
               __lock_acquire+0x203c/0x21d0
               lock_acquire+0xb8/0x148
               cpus_read_lock+0x58/0x1c8
               static_key_enable+0x14/0x30
               sched_feat_write+0x314/0x428
               full_proxy_write+0xa0/0x138
               __vfs_write+0xd8/0x388
               vfs_write+0xdc/0x318
               ksys_write+0xb4/0x138
               sys_write+0xc/0x18
               __sys_trace_return+0x0/0x4
        other info that might help us debug this:
        Chain exists of:
          cpu_hotplug_lock.rw_sem --> opp_table_lock --> &sb->s_type->i_mutex_key#3
         Possible unsafe locking scenario:
               CPU0                    CPU1
               ----                    ----
          lock(&sb->s_type->i_mutex_key#3);
                                       lock(opp_table_lock);
                                       lock(&sb->s_type->i_mutex_key#3);
          lock(cpu_hotplug_lock.rw_sem);
         *** DEADLOCK ***
        2 locks held by sh/3358:
         #0: 00000000a8c4b363 (sb_writers#10){.+.+}, at: vfs_write+0x238/0x318
         #1: 00000000c1b31a88 (&sb->s_type->i_mutex_key#3){+.+.}, at: sched_feat_write+0x160/0x428
        stack backtrace:
        CPU: 5 PID: 3358 Comm: sh Not tainted 4.18.0-rc6-00152-gcd3f77d7-dirty #18
        Hardware name: Renesas H3ULCB Kingfisher board based on r8a7795 ES2.0+ (DT)
        Call trace:
         dump_backtrace+0x0/0x288
         show_stack+0x14/0x20
         dump_stack+0x13c/0x1ac
         print_circular_bug.isra.10+0x270/0x438
         check_prev_add.constprop.16+0x4dc/0xb98
         __lock_acquire+0x203c/0x21d0
         lock_acquire+0xb8/0x148
         cpus_read_lock+0x58/0x1c8
         static_key_enable+0x14/0x30
         sched_feat_write+0x314/0x428
         full_proxy_write+0xa0/0x138
         __vfs_write+0xd8/0x388
         vfs_write+0xdc/0x318
         ksys_write+0xb4/0x138
         sys_write+0xc/0x18
         __sys_trace_return+0x0/0x4
      
      This is because, when the cpufreq_dt module is loaded, we first
      acquire the cpu_hotplug_lock.rw_sem lock, and then cpufreq_init()
      takes the &sb->s_type->i_mutex_key lock.
      
      But when writing to /sys/kernel/debug/sched_features,
      cpu_hotplug_lock.rw_sem is acquired while &sb->s_type->i_mutex_key
      is already held, so the dependency runs the other way around.
      
      To fix this bug, reverse the lock acquisition order when writing to
      sched_features; this way cpu_hotplug_lock.rw_sem no longer depends
      on &sb->s_type->i_mutex_key.
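      A minimal sketch of the reversed ordering (a hypothetical
      simplification of sched_feat_write(); the real handler also parses
      the optional "NO_" prefix and walks the feature table):
      
        static ssize_t sched_feat_write(struct file *filp,
                                        const char __user *ubuf,
                                        size_t cnt, loff_t *ppos)
        {
                struct inode *inode = file_inode(filp);
                char buf[64];
                int ret;
        
                if (cnt > sizeof(buf) - 1)
                        cnt = sizeof(buf) - 1;
                if (copy_from_user(buf, ubuf, cnt))
                        return -EFAULT;
                buf[cnt] = 0;
        
                /*
                 * static_key_enable()/disable() take cpu_hotplug_lock
                 * internally, so take it *before* the inode lock -- the
                 * same order as on the module-load path -- never after.
                 */
                cpus_read_lock();
                inode_lock(inode);
                ret = sched_feat_set(strstrip(buf));
                inode_unlock(inode);
                cpus_read_unlock();
        
                if (ret < 0)
                        return ret;
        
                *ppos += cnt;
                return cnt;
        }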
      Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Eugeniu Rosca <erosca@de.adit-jv.com>
      Cc: George G. Davis <george_davis@mentor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180731121222.26195-1-jiada_wang@mentor.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 25 Jul 2018: 1 commit
  6. 20 Jul 2018: 1 commit
  7. 21 Jun 2018: 1 commit
  8. 16 May 2018: 1 commit
  9. 20 Mar 2018: 3 commits
    • sched/debug: Adjust newlines for better alignment · e9ca2670
      Authored by Joe Lawrence
      Scheduler debug stats include newlines that display out of alignment
      when prefixed by timestamps.  For example, the dmesg utility:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        [   83.124251]
        runnable tasks:
         S           task   PID         tree-key  switches  prio     wait-time
        sum-exec        sum-sleep
        -----------------------------------------------------------------------------------------------------------
      
      At the same time, some syslog utilities (like rsyslog by default) don't
      like the additional newlines control characters, saving lines like this
      to /var/log/messages:
      
        Mar 16 16:02:29 localhost kernel: #012runnable tasks:#012 S           task   PID         tree-key ...
                                          ^^^^               ^^^^
      Clean these up by moving the newline characters to their own
      SEQ_printf() invocations. This leaves /proc/sched_debug unchanged,
      but brings the entire output into alignment when prefixed:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        [   62.410368] runnable tasks:
        [   62.410368]  S           task   PID         tree-key  switches  prio     wait-time             sum-exec        sum-sleep
        [   62.410369] -----------------------------------------------------------------------------------------------------------
        [   62.410369]  I  kworker/u12:0     5      1932.215593       332   120         0.000000         3.621252         0.000000 0 0 /
      
      and no escaped control characters from rsyslog in /var/log/messages:
      
        Mar 16 16:15:06 localhost kernel: runnable tasks:
        Mar 16 16:15:06 localhost kernel: S           task   PID         tree-key  ...
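      The change itself is mechanical; a hedged before/after sketch
      (simplified from the header printing in kernel/sched/debug.c):
      
        /* Before: the leading newline is glued to the header text, so a
         * timestamp prefix lands in front of an effectively empty line. */
        SEQ_printf(m, "\nrunnable tasks:\n");
        
        /* After: the newline gets its own invocation, so every printed
         * line starts with real content and aligns under its timestamp. */
        SEQ_printf(m, "\n");
        SEQ_printf(m, "runnable tasks:\n");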
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Fix per-task line continuation for console output · a8c024cd
      Authored by Joe Lawrence
      When the SEQ_printf() macro prints to the console, it runs a simple
      printk() without KERN_CONT "continued" line printing.  The result of
      this is oddly wrapped task info, for example:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        runnable tasks:
        ...
        [   29.608611]  I
        [   29.608613]       rcu_sched     8      3252.013846      4087   120
        [   29.608614]         0.000000        29.090111         0.000000
        [   29.608615]  0 0
        [   29.608616]  /
      
      Modify SEQ_printf to use pr_cont() for expected one-line results:
      
        % echo t > /proc/sysrq-trigger
        % dmesg
        ...
        runnable tasks:
        ...
        [  106.716329]  S        cpuhp/5    37      2006.315026        14   120         0.000000         0.496893         0.000000 0 0 /
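      A hedged sketch of the macro change (console branch only; the
      seq_file branch is unchanged):
      
        /* Before: every console invocation is a full printk(), so one
         * logical row is split across several log records. */
        #define SEQ_printf(m, x...)                    \
         do {                                          \
                if (m)                                 \
                        seq_printf(m, x);              \
                else                                   \
                        printk(x);                     \
         } while (0)
        
        /* After: pr_cont() continues the current record, so a row that
         * is built up piecewise stays on a single line. */
        #define SEQ_printf(m, x...)                    \
         do {                                          \
                if (m)                                 \
                        seq_printf(m, x);              \
                else                                   \
                        pr_cont(x);                    \
         } while (0)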
      Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Add util_est on top of PELT · 7f65ea42
      Authored by Patrick Bellasi
      The util_avg signal computed by PELT is too variable for some use-cases.
      For example, a big task waking up after a long sleep period will have its
      utilization almost completely decayed. This introduces some latency before
      schedutil will be able to pick the best frequency to run a task.
      
      The same issue can affect task placement. Since the task's
      utilization has already decayed at wakeup, when the task is
      enqueued on a CPU, that CPU -- although running a big task -- can
      be temporarily represented as almost empty. This opens a race
      window in which other tasks can be placed on a CPU which has just
      started to run a big task that slept for a relatively long period.
      
      Moreover, the PELT utilization of a task can be updated every [ms],
      which makes it a continuously changing value for longer-running
      tasks. This means that the instantaneous PELT utilization of a
      RUNNING task is not really meaningful for properly supporting
      scheduler decisions.
      
      For all these reasons, a more stable signal can do a better job of
      representing the expected/estimated utilization of a task/cfs_rq.
      Such a signal can be easily created on top of PELT by still using it as
      an estimator which produces values to be aggregated on meaningful
      events.
      
      This patch adds a simple implementation of util_est, a new signal built on
      top of PELT's util_avg where:
      
          util_est(task) = max(task::util_avg, f(task::util_avg@dequeue))
      
      This allows us to remember how big a task has been reported to be
      by PELT in its previous activations via f(task::util_avg@dequeue),
      which is the new _task_util_est(struct task_struct*) function added
      by this patch.
      
      If a task changes its behavior and runs longer in a new activation,
      then after a certain time its util_est will simply track the
      original PELT signal (i.e. task::util_avg).
      
      The estimated utilization of a cfs_rq is defined only for root
      cfs_rqs. That's because the only sensible consumers of this signal
      are the scheduler and schedutil, when looking for the overall CPU
      utilization due to FAIR tasks.
      
      For this reason, the estimated utilization of a root cfs_rq is simply
      defined as:
      
          util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued)
      
      where:
      
          cfs_rq::util_est::enqueued = sum(_task_util_est(task))
                                       for each RUNNABLE task on that root cfs_rq
      
      It's worth noting that the estimated utilization is tracked only
      for objects of interest, specifically:
      
       - Tasks: to better support task placement decisions
       - root cfs_rqs: to better support both task placement decisions
                       and frequency selection
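      A hedged sketch of the two max() expressions above
      (_task_util_est() is named in the commit; cfs_rq_util_est() is a
      hypothetical wrapper, and the in-tree version also applies an EWMA
      to the dequeue snapshots):
      
        /* Estimated utilization; for a task, 'enqueued' is its own
         * snapshot, for a cfs_rq it is the sum over its RUNNABLE tasks. */
        struct util_est {
                unsigned int    enqueued;
                unsigned int    ewma;      /* f(util_avg@dequeue) */
        };
        
        static inline unsigned long _task_util_est(struct task_struct *p)
        {
                struct util_est ue = p->se.avg.util_est;
        
                return max(ue.ewma, ue.enqueued);
        }
        
        /* Root cfs_rq: the max of the PELT signal and the sum of the
         * estimated utilization of its RUNNABLE tasks. */
        static inline unsigned long cfs_rq_util_est(struct cfs_rq *cfs_rq)
        {
                return max_t(unsigned long, cfs_rq->avg.util_avg,
                             cfs_rq->avg.util_est.enqueued);
        }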
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 04 Mar 2018: 1 commit
    • sched/headers: Simplify and clean up header usage in the scheduler · 325ea10c
      Authored by Ingo Molnar
      Do the following cleanups and simplifications:
      
       - sched/sched.h already includes <asm/paravirt.h>, so no need to
         include it in sched/core.c again.
      
       - order the <linux/sched/*.h> headers alphabetically
      
       - add all <linux/sched/*.h> headers to kernel/sched/sched.h
      
       - remove all unnecessary includes from the .c files that
         are already included in kernel/sched/sched.h.
      
      Finally, make all scheduler .c files use a single common header:
      
        #include "sched.h"
      
      ... which now contains a union of the relied-upon headers.
      
      This makes the various .c files easier to read and easier to handle.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 03 Mar 2018: 1 commit
    • sched: Clean up and harmonize the coding style of the scheduler code base · 97fb7a0a
      Authored by Ingo Molnar
      A good number of small style inconsistencies have accumulated
      in the scheduler core, so do a pass over them to harmonize
      all these details:
      
       - fix spelling in comments,
      
       - use curly braces for multi-line statements,
      
       - remove unnecessary parentheses from integer literals,
      
       - capitalize consistently,
      
       - remove stray newlines,
      
       - add comments where necessary,
      
       - remove invalid/unnecessary comments,
      
       - align structure definitions and other data types vertically,
      
       - add missing newlines for increased readability,
      
       - fix vertical tabulation where it's misaligned,
      
       - harmonize preprocessor conditional block labeling
         and vertical alignment,
      
       - remove line-breaks where they uglify the code,
      
       - add newlines after local variable definitions.
      
      No change in functionality:
      
        md5:
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.before.asm
           1191fa0a890cfa8132156d2959d7e9e2  built-in.o.after.asm
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 30 Sep 2017: 3 commits
    • sched/fair: Propagate an effective runnable_load_avg · 1ea6c46a
      Authored by Peter Zijlstra
      The load balancer uses runnable_load_avg as load indicator. For
      !cgroup this is:
      
        runnable_load_avg = \Sum se->avg.load_avg ; where se->on_rq
      
      That is, a direct sum of all runnable tasks on that runqueue. As
      opposed to load_avg, which is a sum of all tasks on the runqueue,
      which includes a blocked component.
      
      However, in the cgroup case, this comes apart since the group entities
      are always runnable, even if most of their constituent entities are
      blocked.
      
      Therefore introduce a runnable_weight which for task entities is the
      same as the regular weight, but for group entities is a fraction of
      the entity weight and represents the runnable part of the group
      runqueue.
      
      Then propagate this load through the PELT hierarchy to arrive at an
      effective runnable load average -- which we should not confuse with
      the canonical runnable load average.
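      A hedged sketch of that fraction (a hypothetical helper; the
      in-tree code guards against a zero denominator differently and
      clamps intermediate values):
      
        /* Task entities: runnable_weight == weight. Group entities:
         * scale the entity weight by the runnable fraction of the
         * group runqueue. */
        static void update_runnable_weight(struct sched_entity *ge,
                                           struct cfs_rq *grq)
        {
                unsigned long load_avg = max(grq->avg.load_avg, 1UL);
        
                ge->runnable_weight = ge->load.weight *
                                grq->avg.runnable_load_avg / load_avg;
        }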
      Suggested-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Rewrite PELT migration propagation · 0e2d2aaa
      Authored by Peter Zijlstra
      When an entity migrates in (or out) of a runqueue, we need to add (or
      remove) its contribution from the entire PELT hierarchy, because even
      non-runnable entities are included in the load average sums.
      
      In order to do this we have some propagation logic that updates the
      PELT tree, however the way it 'propagates' the runnable (or load)
      change is (more or less):
      
                           tg->weight * grq->avg.load_avg
        ge->avg.load_avg = ------------------------------
                                     tg->load_avg
      
      But that is the expression for ge->weight, and per the definition of
      load_avg:
      
        ge->avg.load_avg := ge->weight * ge->avg.runnable_avg
      
      That destroys the runnable_avg (by setting it to 1) we wanted to
      propagate.
      
      Instead directly propagate runnable_sum.
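      A hedged sketch of the direct propagation (a hypothetical
      simplification; the in-tree code also tracks the delta against the
      group entity's previous contribution):
      
        /* Carry the child runqueue's runnable_sum up to the group
         * entity, then derive load_avg from it -- rather than
         * recomputing load_avg from weights, which flattens the
         * runnable fraction to 1. LOAD_AVG_MAX is the maximum of the
         * PELT geometric series; div_s64() is from <linux/math64.h>. */
        static void propagate_runnable_sum(struct sched_entity *ge,
                                           long runnable_sum)
        {
                ge->avg.load_sum = runnable_sum;
                ge->avg.load_avg = div_s64((s64)ge->load.weight *
                                           runnable_sum, LOAD_AVG_MAX);
        }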
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/fair: Rewrite cfs_rq->removed_*avg · 2a2f5d4e
      Authored by Peter Zijlstra
      Since on wakeup migration we don't hold the rq->lock for the old
      CPU, we cannot update its state. Instead we add the removed 'load'
      to an atomic variable and have the next update on that CPU collect
      and process it.
      
      Currently we have two atomic variables, which already have the
      issue that they can be read out of sync. Also, two atomic ops on a
      single cacheline are already more expensive than an uncontended
      lock.
      
      Since we want to add more, convert the thing over to an explicit
      cacheline with a lock in it.
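      A hedged sketch of the conversion (in the tree the struct is
      anonymous inside struct cfs_rq; it is named here for clarity):
      
        /* One cacheline, one lock -- instead of two independent atomics
         * that can be read out of sync. */
        struct cfs_rq_removed {
                raw_spinlock_t  lock ____cacheline_aligned;
                int             nr;        /* entries since last fold */
                unsigned long   load_avg;  /* load of departed tasks  */
                unsigned long   util_avg;  /* util of departed tasks  */
        };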
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 29 Sep 2017: 1 commit
  14. 12 Sep 2017: 1 commit
  15. 09 Sep 2017: 1 commit
  16. 25 Aug 2017: 1 commit
  17. 10 Aug 2017: 3 commits
  18. 30 Jun 2017: 1 commit
  19. 02 Mar 2017: 2 commits
  20. 14 Jan 2017: 1 commit
  21. 05 Sep 2016: 3 commits
  22. 10 Aug 2016: 1 commit
    • cgroup: make cgroup_path() and friends behave in the style of strlcpy() · 4c737b41
      Authored by Tejun Heo
      cgroup_path() and friends used to format the path from the end,
      and thus the resulting path usually didn't start at the start of
      the passed-in buffer. Also, when the buffer was too small, the
      partial result was truncated from the head rather than the tail,
      and there was no way to tell how long the full path would be.
      This made the functions less robust and more awkward to use.
      
      With recent updates to kernfs_path(), cgroup_path() and friends can be
      made to behave in strlcpy() style.
      
      * cgroup_path(), cgroup_path_ns[_locked]() and task_cgroup_path()
        now always return the length of the full path. If the buffer is
        too small, it contains nul-terminated, truncated output.
      
      * All users updated accordingly.
      
      v2: cgroup_path() usage in kernel/sched/debug.c converted.
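      A hedged usage sketch of the new semantics (hypothetical caller):
      
        /* strlcpy() style: the return value is the full path length, so
         * truncation is detected by comparing it to the buffer size. */
        static void show_cgroup_path(struct cgroup *cgrp)
        {
                char buf[PATH_MAX];
                int len = cgroup_path(cgrp, buf, sizeof(buf));
        
                if (len >= (int)sizeof(buf))
                        pr_warn("path truncated, full length %d\n", len);
                else
                        pr_info("path: %s\n", buf);
        }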
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
  23. 08 Jun 2016: 2 commits
  24. 05 May 2016: 1 commit
  25. 29 Feb 2016: 3 commits
    • sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug · ef477183
      Authored by Steven Rostedt (Red Hat)
      Playing with SCHED_DEADLINE and cpusets, I found that I was unable
      to create new SCHED_DEADLINE tasks: they failed with EBUSY, as if
      the bandwidth was already used up. I then realized there was no way
      to see what bandwidth is used by the runqueues to debug the issue.
      
      Adding dl_bw->bw and dl_bw->total_bw to the deadline info in
      /proc/sched_debug allows us to see what bandwidth has been reserved
      and where a problem may exist.
      
      For example, before triggering the issue, we see the bandwidth
      ratio:
      
       # cat /proc/sys/kernel/sched_rt_runtime_us
       950000
       # cat /proc/sys/kernel/sched_rt_period_us
       1000000
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
      
      Note: (950000 / 1000000) << 20 == 996147
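      That is, the ratio is runtime/period in 20-bit fixed point. A
      hedged sketch of the conversion, mirroring the kernel's to_ratio()
      helper (constants inlined here for self-containment; div64_u64()
      is from <linux/math64.h>):
      
        #define BW_SHIFT        20
        #define BW_UNIT         (1 << BW_SHIFT)
        #define RUNTIME_INF     ((u64)~0ULL)
        
        /* (950000 << 20) / 1000000 == 996147, i.e. 0.95 in fixed point */
        static u64 to_ratio(u64 period, u64 runtime)
        {
                if (runtime == RUNTIME_INF)
                        return BW_UNIT;
        
                return div64_u64(runtime << BW_SHIFT, period);
        }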
      
      After I played with cpusets and hit the issue, the result is now:
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
      
      This shows that there is definitely a problem as we should never have a
      negative total bandwidth.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.756849091@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Move sched_domain_sysctl to debug.c · 3866e845
      Authored by Steven Rostedt (Red Hat)
      The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is
      configured. As debug.c is only compiled when SCHED_DEBUG is configured as
      well, move the setup of sched_domain_sysctl into that file.
      
      Note that the (un)register_sched_domain_sysctl() functions had to
      be made non-static so that they remain accessible from core.c.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.599278093@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c · d6ca41d7
      Authored by Steven Rostedt (Red Hat)
      As /sys/kernel/debug/sched_features is only created when
      SCHED_DEBUG is enabled, and debug.c is likewise only compiled when
      SCHED_DEBUG is enabled, it makes sense to move the sched_features
      setup into that file and get rid of the #ifdef.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.464193063@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>