1. 06 May 2016 (13 commits)
  2. 29 March 2016 (1 commit)
  3. 23 March 2016 (1 commit)
    • kernel: add kcov code coverage · 5c9a8750
      Committed by Dmitry Vyukov
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is a function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts, and instrumentation of some inherently non-deterministic or
      non-interesting parts of the kernel is disabled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on the kernel side.  The complementary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build the syzkaller system call fuzzer, which
      has found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov?  A typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  The
      typical coverage can be just a dozen basic blocks (e.g.  an invalid
      input).  In such a context gcov becomes prohibitively expensive, as the
      reset/collect coverage steps depend on the total number of basic
      blocks/edges in the program (in the case of the kernel it is about 2M).
      The cost of kcov depends only on the number of executed basic
      blocks/edges.  On top of that, the kernel requires per-thread coverage
      because there are always background threads and unrelated processes
      that also produce coverage.  With inlined gcov instrumentation,
      per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space, which is
      insecure.  But debugfs should not be mapped as user accessible.
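      
      As a rough sketch of the intended usage from user-space (modeled on the
      example in the accompanying Documentation; the ioctl definitions and
      buffer layout are reproduced as best I recall, so treat this as
      illustrative rather than authoritative):
      
      	#include <stdio.h>
      	#include <fcntl.h>
      	#include <unistd.h>
      	#include <sys/ioctl.h>
      	#include <sys/mman.h>
      
      	#define KCOV_INIT_TRACE	_IOR('c', 1, unsigned long)
      	#define KCOV_ENABLE	_IO('c', 100)
      	#define KCOV_DISABLE	_IO('c', 101)
      	#define COVER_SIZE	(64 << 10)	/* in unsigned longs */
      
      	int main(void)
      	{
      		/* One debugfs fd and one coverage buffer per thread. */
      		int fd = open("/sys/kernel/debug/kcov", O_RDWR);
      		unsigned long *cover, n, i;
      
      		ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      			     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      		ioctl(fd, KCOV_ENABLE, 0);
      		__atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED); /* reset */
      
      		read(-1, NULL, 0);	/* the syscall under test */
      
      		/* cover[0] holds the number of collected PCs. */
      		n = __atomic_load_n(&cover[0], __ATOMIC_RELAXED);
      		for (i = 0; i < n; i++)
      			printf("0x%lx\n", cover[i + 1]);
      		ioctl(fd, KCOV_DISABLE, 0);
      		return 0;
      	}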
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
  4. 21 March 2016 (5 commits)
  5. 11 March 2016 (1 commit)
  6. 10 March 2016 (1 commit)
  7. 09 March 2016 (1 commit)
  8. 08 March 2016 (2 commits)
  9. 05 March 2016 (1 commit)
    • sched/cputime: Fix steal time accounting vs. CPU hotplug · e9532e69
      Committed by Thomas Gleixner
      On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
      value over CPU down and up. So after the CPU comes up again, the delta
      calculation in steal_account_process_tick() wrecks itself due to the
      unsigned math:
      
      	 u64 steal = paravirt_steal_clock(smp_processor_id());
      
      	 steal -= this_rq()->prev_steal_time;
      
      So if steal is smaller than rq->prev_steal_time, we end up with an insanely
      large value which then gets added to rq->prev_steal_time, resulting in
      permanent wreckage of the accounting. As a consequence the per-CPU stats
      in /proc/stat become stale.
      
      Nice trick to tell the world how idle the system is (100%) while the CPU is
      100% busy running tasks. Though we prefer realistic numbers.
      
      None of the accounting values which use a previous value to account for
      fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
      check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
      deals with clock warps and limits the /proc/stat visible wreckage. The
      prev_time values are still wrong.
      
      Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
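      
      A minimal sketch of that reset (helper name and config guards are
      assumed, not necessarily those of the final patch):
      
      	/* Called when a CPU is plugged back in; clears the stale
      	 * baselines so the next delta computation starts from zero. */
      	static inline void account_reset_rq(struct rq *rq)
      	{
      	#ifdef CONFIG_IRQ_TIME_ACCOUNTING
      		rq->prev_irq_time = 0;
      	#endif
      	#ifdef CONFIG_PARAVIRT
      		rq->prev_steal_time = 0;
      	#endif
      	#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
      		rq->prev_steal_time_rq = 0;
      	#endif
      	}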
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: commit 095c0aa8 "sched: adjust scheduler cpu power for stolen time"
      Fixes: commit aa483808 "sched: Remove irq time from available CPU power"
      Fixes: commit e6e6685a "KVM guest: Steal time accounting"
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e9532e69
  10. 02 March 2016 (8 commits)
    • sched-clock: Migrate to use new tick dependency mask model · 4f49b90a
      Committed by Frederic Weisbecker
      Instead of checking sched_clock_stable from the nohz subsystem to verify
      its tick dependency, migrate it to the new mask in order to include it
      in the all-in-one check.
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      4f49b90a
    • sched: Migrate sched to use new tick dependency mask model · 76d92ac3
      Committed by Frederic Weisbecker
      Instead of providing asynchronous checks for the nohz subsystem to verify
      the sched tick dependency, migrate sched to the new mask.
      
      Every time a task is enqueued or dequeued, we evaluate the state of the
      tick dependency on top of the policy of the tasks in the runqueue, in
      order of priority:
      
      SCHED_DEADLINE: Need the tick in order to periodically check for runtime
      SCHED_FIFO    : Don't need the tick (no round-robin)
      SCHED_RR      : Need the tick if more than 1 task of the same priority
                      for round robin (simplified with checking if more than
                      one SCHED_RR task no matter what priority).
      SCHED_NORMAL  : Need the tick if more than 1 task for round-robin.
      
      We could optimize that further with one flag per sched policy on the tick
      dependency mask and perform only the checks relevant to the policy
      concerned by an enqueue/dequeue operation.
      
      Since the checks aren't based on the current task anymore, we could get
      rid of the task switch hook but it's still needed for posix cpu
      timers.
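      
      The evaluation boils down to something like the following sketch
      (simplified; the field names follow the rq layout of this series as
      best I can tell, so treat them as approximate):
      
      	/* Can the tick be stopped on this runqueue? (sketch) */
      	static bool sched_can_stop_tick(struct rq *rq)
      	{
      		/* SCHED_DEADLINE needs the tick for runtime checks. */
      		if (rq->dl.dl_nr_running)
      			return false;
      
      		/* SCHED_RR: tick needed if more than one RR task,
      		 * regardless of priority (the simplification above). */
      		if (rq->rt.rr_nr_running)
      			return rq->rt.rr_nr_running == 1;
      
      		/* Remaining RT tasks are SCHED_FIFO: no round-robin,
      		 * so no tick is needed on their account. */
      		if (rq->rt.rt_nr_running - rq->rt.rr_nr_running)
      			return true;
      
      		/* SCHED_NORMAL: round-robin only with 2+ tasks. */
      		return rq->nr_running <= 1;
      	}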
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      76d92ac3
    • sched: Account rr tasks · 01d36d0a
      Committed by Frederic Weisbecker
      In order to evaluate the scheduler tick dependency without probing
      context switches, we need to know how many SCHED_RR and SCHED_FIFO tasks
      are enqueued, as those policies don't have the same preemption
      requirements.
      
      To prepare for that, let's account SCHED_RR tasks; from that count and
      the total number of RT tasks in the runqueue we can deduce the number
      of SCHED_FIFO tasks as well.
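      
      A minimal sketch of the idea (the hook and helper names here are
      illustrative; the real accounting lives in the rt scheduling class's
      enqueue/dequeue paths):
      
      	/* Sketch: maintain rr_nr_running next to rt_nr_running. */
      	static void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
      	{
      		rt_rq->rt_nr_running += 1;
      		if (rt_task_of(rt_se)->policy == SCHED_RR)
      			rt_rq->rr_nr_running += 1;
      		/* SCHED_FIFO count = rt_nr_running - rr_nr_running */
      	}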
      Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      01d36d0a
    • sched/core: Get rid of 'cpu' argument in wq_worker_sleeping() · 9b7f6597
      Committed by Alexander Gordeev
      Given that wq_worker_sleeping() can only be called for the CPU it is
      running on, we do not need to pass a CPU ID as an argument.
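      
      In effect (a sketch of the prototype change; the actual declaration
      lives in the workqueue internals header):
      
      	/* before */
      	struct task_struct *wq_worker_sleeping(struct task_struct *task, int cpu);
      	/* after: the CPU is implicitly smp_processor_id() */
      	struct task_struct *wq_worker_sleeping(struct task_struct *task);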
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      9b7f6597
    • rcu: Make CPU_DYING_IDLE an explicit call · 27d50c7e
      Committed by Thomas Gleixner
      Make the RCU CPU_DYING_IDLE callback an explicit function call, so it gets
      invoked at the proper place.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.870167933@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      27d50c7e
    • cpu/hotplug: Make wait for dead cpu completion based · e69aab13
      Committed by Thomas Gleixner
      Kill the busy spinning on the control side and just wait for the hotplugged
      cpu to announce that it has reached the dead state.
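      
      Conceptually (a sketch using the kernel's completion API; the 'done'
      field is assumed to live in the hotplug per-cpu state of this series):
      
      	/* control side: block instead of polling the CPU state */
      	wait_for_completion(&st->done);
      
      	/* dying CPU, once it has truly reached the dead state: */
      	complete(&st->done);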
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.776157858@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      e69aab13
    • cpu/hotplug: Let upcoming cpu bring itself fully up · 8df3e07e
      Committed by Thomas Gleixner
      Let the upcoming cpu kick the hotplug thread and complete the bringup
      itself. That way the control side can just wait for the completion, or,
      later when we make the hotplug machinery async, not care at all.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.697655464@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8df3e07e
    • cpu/hotplug: Move scheduler cpu_online notifier to hotplug core · 949338e3
      Committed by Thomas Gleixner
      Move the scheduler cpu online notifier part to the hotplug core. This is
      the highest priority callback anyway, and we need that functionality
      right now for the next changes.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.200791046@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      949338e3
  11. 29 February 2016 (6 commits)
    • sched/deadline: Remove superfluous call to switched_to_dl() · 801ccdbf
      Committed by Peter Zijlstra
      	if (A || B) {
      
      	} else if (A && !B) {
      
      	}
      
      If A holds we take the first branch; if A does not hold, the second
      condition (A && !B) cannot be satisfied either. Therefore the second
      branch will never be taken.
      Reported-by: luca abeni <luca.abeni@unitn.it>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160225140149.GK6357@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      801ccdbf
    • sched/debug: Fix preempt_disable_ip recording for preempt_disable() · f904f582
      Committed by Sebastian Andrzej Siewior
      preempt_disable() invokes preempt_count_add(), which saves the caller
      in ->preempt_disable_ip. It uses CALLER_ADDR1, which yields not its
      caller but the parent of the caller. This means we get the correct
      caller for something like spin_lock(), unless the architecture inlines
      those invocations. It is always wrong for preempt_disable() or
      local_bh_disable().
      
      This patch adds the function get_lock_parent_ip(), which tries
      CALLER_ADDR0, 1 and 2 in turn, moving up while the address belongs to a
      locking function. This records the preempt_disable() caller properly
      for preempt_disable() itself as well as for get_cpu_var() or
      local_bh_disable().
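      
      A sketch of the helper (close to the actual static inline, if memory
      serves; treat the details as approximate):
      
      	static inline unsigned long get_lock_parent_ip(void)
      	{
      		unsigned long addr = CALLER_ADDR0;
      
      		if (!in_lock_functions(addr))	/* not inside a lock function */
      			return addr;
      		addr = CALLER_ADDR1;
      		if (!in_lock_functions(addr))
      			return addr;
      		return CALLER_ADDR2;
      	}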
      
      Steven asked for the get_parent_ip() -> get_lock_parent_ip() rename.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160226135456.GB18244@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f904f582
    • sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity · ff9a9b4c
      Committed by Rik van Riel
      When profiling syscall overhead on nohz-full kernels,
      after removing __acct_update_integrals() from the profile,
      native_sched_clock() remains as the top CPU user. This can be
      reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.
      
      This will reduce timing accuracy on nohz_full CPUs to jiffy
      based sampling, just like on normal CPUs. It results in
      totally removing native_sched_clock from the profile, and
      significantly speeding up the syscall entry and exit path,
      as well as irq entry and exit, and KVM guest entry & exit.
      
      Additionally, only call the more expensive functions (and
      advance the seqlock) when jiffies actually changed.
      
      This code relies on another CPU advancing jiffies when the
      system is busy. On a nohz_full system, this is done by a
      housekeeping CPU.
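      
      The gist of the cheap path is something like this sketch (helper and
      field names are assumed from this series as best I recall, so treat
      them as approximate):
      
      	/* Nothing to account until at least one jiffy has elapsed;
      	 * this skips the seqlock write and the clock read entirely. */
      	static u64 vtime_delta(struct task_struct *tsk)
      	{
      		unsigned long now = READ_ONCE(jiffies);
      
      		if (now == tsk->vtime_snap)
      			return 0;
      
      		return jiffies_to_nsecs(now - tsk->vtime_snap);
      	}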
      
      A microbenchmark calling an invalid syscall number 10 million
      times in a row speeds up an additional 30% over the numbers
      with just the previous patches, for a total speedup of about
      40% over 4.4 and 4.5-rc1.
      
      Run times for the microbenchmark:
      
       4.4				3.8 seconds
       4.5-rc1			3.7 seconds
       4.5-rc1 + first patch		3.3 seconds
       4.5-rc1 + first 3 patches	3.1 seconds
       4.5-rc1 + all patches		2.3 seconds
      
      A non-NOHZ_FULL cpu (not the housekeeping CPU):
      
       all kernels			1.86 seconds
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: clark@redhat.com
      Cc: eric.dumazet@gmail.com
      Cc: fweisbec@gmail.com
      Cc: luto@amacapital.net
      Link: http://lkml.kernel.org/r/1455152907-18495-5-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ff9a9b4c
    • sched/rt: Kick RT bandwidth timer immediately on start up · c3a990dc
      Committed by Steven Rostedt
      I've been debugging why deadline tasks can cause the RT scheduler to
      throttle, even when the deadline tasks are only taking up 50% of the
      CPU and RT tasks are not even using 1% of the CPU. Here's what I found.
      
      In order to keep a CPU from being hogged by RT tasks, the deadline
      scheduler adds its run time (delta_exec) to the rt_time of the RT
      bandwidth. That way, if the two use more than 95% of the CPU within one
      second (default settings), the RT tasks are throttled to allow non-RT
      tasks to run.
      
      Although the deadline tasks add their run time to the RT bandwidth, it
      lets the RT tasks do the accounting. This is where the problem lies. If
      a deadline task runs for a bit, and no RT tasks are running, then it
      will continually add to the RT rt_time that is used to calculate how
      much CPU the RT tasks use. But no RT period is in play, and this
      accumulation of the runtime never gets reset.
      
      When an RT task finally gets to run, and the watchdog goes off, it can
      see that the RT task has used more than it should have, because the
      deadline task added all this runtime to its rt_time. Then the RT task
      that just woke up gets throttled for no good reason.
      
      I also noticed that when an RT task is queued, it starts the timer to
      account for overload and such. But that timer goes off one period
      later, which may be too late and the extra rt_time will trigger a
      throttle.
      
      This is a quick work around to the problem. When a new RT task is
      queued, the bandwidth timer is set to go off immediately. Then the
      timer can clear out the extra time added to the rt_time while there was
      no RT task running. This stops my tests from triggering the throttle,
      and it will still throttle if an RT task runs too much, even while a
      deadline task is running.
      
      A better solution may be to subtract the bandwidth that the deadline
      task uses from the rt_runtime, and add it back when it's finished. Then
      there won't be a need for runtime tracking of the time used by deadline
      tasks.
      
      I may play with that solution tomorrow.
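      
      The work-around amounts to something like this sketch (the enqueue-path
      hook and timer fields are illustrative, not necessarily the exact
      patch):
      
      	static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
      	{
      		raw_spin_lock(&rt_b->rt_runtime_lock);
      		if (!rt_b->rt_period_active) {
      			rt_b->rt_period_active = 1;
      			/* Kick the timer to fire now rather than one full
      			 * period later, so stale rt_time gets cleared. */
      			hrtimer_forward_now(&rt_b->rt_period_timer, ns_to_ktime(0));
      			hrtimer_start_expires(&rt_b->rt_period_timer,
      					      HRTIMER_MODE_ABS_PINNED);
      		}
      		raw_spin_unlock(&rt_b->rt_runtime_lock);
      	}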
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <juri.lelli@gmail.com>
      Cc: <williams@redhat.com>
      Cc: Clark Williams
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Juri Lelli
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160216183746.349ec98b@gandalf.local.home
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c3a990dc
    • sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug · ef477183
      Committed by Steven Rostedt (Red Hat)
      Playing with SCHED_DEADLINE and cpusets, I found that I was unable to create
      new SCHED_DEADLINE tasks, getting EBUSY as if the bandwidth were
      already used up. I then realized there was no way to see what bandwidth is
      used by the runqueues to debug the issue.
      
      Adding dl_bw->bw and dl_bw->total_bw to the output of the deadline
      info in /proc/sched_debug lets us see what bandwidth has been
      reserved and where a problem may exist.
      
      For example, before the issue we see the ratio of the bandwidth:
      
       # cat /proc/sys/kernel/sched_rt_runtime_us
       950000
       # cat /proc/sys/kernel/sched_rt_period_us
       1000000
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 0
      
      Note: 996147 == (950000 << 20) / 1000000, i.e. the 95% ratio scaled by 2^20.
      
      After I played with cpusets and hit the issue, the result is now:
      
        # grep dl /proc/sched_debug
        dl_rq[0]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[1]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[2]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[3]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : 104857
        dl_rq[4]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[5]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[6]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
        dl_rq[7]:
          .dl_nr_running                 : 0
          .dl_bw->bw                     : 996147
          .dl_bw->total_bw               : -104857
      
      This shows that there is definitely a problem as we should never have a
      negative total bandwidth.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.756849091@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ef477183
    • sched/debug: Move sched_domain_sysctl to debug.c · 3866e845
      Committed by Steven Rostedt (Red Hat)
      The sched_domain_sysctl setup is only enabled when SCHED_DEBUG is
      configured. As debug.c is only compiled when SCHED_DEBUG is configured as
      well, move the setup of sched_domain_sysctl into that file.
      
      Note that the (un)register_sched_domain_sysctl() functions had to be made
      non-static to allow access to them from core.c.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160222212825.599278093@goodmis.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3866e845