1. 10 1月, 2018 3 次提交
  2. 11 12月, 2017 1 次提交
  3. 05 12月, 2017 1 次提交
  4. 28 11月, 2017 1 次提交
  5. 09 11月, 2017 1 次提交
    • P
      sched/core: Optimize sched_feat() for !CONFIG_SCHED_DEBUG builds · 765cc3a4
      Patrick Bellasi 提交于
      When the kernel is compiled with !CONFIG_SCHED_DEBUG support, we expect that
      all SCHED_FEAT are turned into compile time constants being propagated
      to support compiler optimizations.
      
      Specifically, we expect that code blocks like this:
      
         if (sched_feat(FEATURE_NAME) [&& <other_conditions>]) {
      	/* FEATURE CODE */
         }
      
      are turned into dead-code in case FEATURE_NAME defaults to FALSE, and thus
      being removed by the compiler from the finale image.
      
      For this mechanism to properly work it's required for the compiler to
      have full access, from each translation unit, to whatever is the value
      defined by the sched_feat macro. This macro is defined as:
      
         #define sched_feat(x) (sysctl_sched_features & (1UL << __SCHED_FEAT_##x))
      
      and thus, the compiler can optimize that code only if the value of
      sysctl_sched_features is visible within each translation unit.
      
      Since:
      
         029632fb ("sched: Make separate sched*.c translation units")
      
      the scheduler code has been split into separate translation units
      however the definition of sysctl_sched_features is part of
      kernel/sched/core.c while, for all the other scheduler modules, it is
      visible only via kernel/sched/sched.h as an:
      
         extern const_debug unsigned int sysctl_sched_features
      
      Unfortunately, an extern reference does not allow the compiler to apply
      constants propagation. Thus, on !CONFIG_SCHED_DEBUG kernel we still end up
      with code to load a memory reference and (eventually) doing an unconditional
      jump of a chunk of code.
      
      This mechanism is unavoidable when sched_features can be turned on and off at
      run-time. However, this is not the case for "production" kernels compiled with
      !CONFIG_SCHED_DEBUG. In this case, sysctl_sched_features is just a constant value
      which cannot be changed at run-time and thus memory loads and jumps can be
      avoided altogether.
      
      This patch fixes the case of !CONFIG_SCHED_DEBUG kernel by declaring a local version
      of the sysctl_sched_features constant for each translation unit. This will
      ultimately allow the compiler to perform constants propagation and dead-code
      pruning.
      
      Tests have been done, with !CONFIG_SCHED_DEBUG on a v4.14-rc8 with and without
      the patch, by running 30 iterations of:
      
         perf bench sched messaging --pipe --thread --group 4 --loop 50000
      
      on a 40 cores Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz using the
      powersave governor to rule out variations due to frequency scaling.
      
      Statistics on the reported completion time:
      
                         count     mean       std     min       99%     max
        v4.14-rc8         30.0  15.7831  0.176032  15.442  16.01226  16.014
        v4.14-rc8+patch   30.0  15.5033  0.189681  15.232  15.93938  15.962
      
      ... show a 1.8% speedup on average completion time and 0.5% speedup in the
      99 percentile.
      Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: NChris Redpath <chris.redpath@arm.com>
      Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Reviewed-by: NBrendan Jackman <brendan.jackman@arm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20171108184101.16006-1-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      765cc3a4
  6. 27 10月, 2017 5 次提交
  7. 10 10月, 2017 3 次提交
    • P
      rcutorture: Dump writer stack if stalled · 0032f4e8
      Paul E. McKenney 提交于
      Right now, rcutorture warns if an rcu_torture_writer() kthread stalls,
      but this warning is not always all that helpful.  This commit therefore
      makes the first such warning include a stack dump.
      
      This in turn requires that sched_show_task() be exported to GPL modules,
      so this commit makes that change as well.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      0032f4e8
    • P
      sched,rcu: Make cond_resched() provide RCU quiescent state · f79c3ad6
      Paul E. McKenney 提交于
      There is some confusion as to which of cond_resched() or
      cond_resched_rcu_qs() should be added to long in-kernel loops.
      This commit therefore eliminates the decision by adding RCU quiescent
      states to cond_resched().  This commit also simplifies the code that
      used to interact with cond_resched_rcu_qs(), and that now interacts with
      cond_resched(), to reduce its overhead.  This reduction is necessary to
      allow the heavier-weight cond_resched_rcu_qs() mechanism to be invoked
      everywhere that cond_resched() is invoked.
      
      Part of that reduction in overhead converts the jiffies_till_sched_qs
      kernel parameter to read-only at runtime, thus eliminating the need for
      bounds checking.
      Reported-by: NMichal Hocko <mhocko@kernel.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      [ paulmck: Keep PREEMPT=n cond_resched a no-op, per Peter Zijlstra. ]
      f79c3ad6
    • P
      sched: Make resched_cpu() unconditional · 7c2102e5
      Paul E. McKenney 提交于
      The current implementation of synchronize_sched_expedited() incorrectly
      assumes that resched_cpu() is unconditional, which it is not.  This means
      that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
      fails as follows (analysis by Neeraj Upadhyay):
      
      o	CPU1 is waiting for expedited wait to complete:
      
      	sync_rcu_exp_select_cpus
      	     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
      	     IPI sent to CPU5
      
      	synchronize_sched_expedited_wait
      		 ret = swait_event_timeout(rsp->expedited_wq,
      					   sync_rcu_preempt_exp_done(rnp_root),
      					   jiffies_stall);
      
      	expmask = 0x20, CPU 5 in idle path (in cpuidle_enter())
      
      o	CPU5 handles IPI and fails to acquire rq lock.
      
      	Handles IPI
      	     sync_sched_exp_handler
      		 resched_cpu
      		     returns while failing to try lock acquire rq->lock
      		 need_resched is not set
      
      o	CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
      	idle (schedule() is not called).
      
      o	CPU 1 reports RCU stall.
      
      Given that resched_cpu() is now used only by RCU, this commit fixes the
      assumption by making resched_cpu() unconditional.
      Reported-by: NNeeraj Upadhyay <neeraju@codeaurora.org>
      Suggested-by: NNeeraj Upadhyay <neeraju@codeaurora.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      7c2102e5
  8. 30 9月, 2017 3 次提交
    • T
      sched: Implement interface for cgroup unified hierarchy · 0d593634
      Tejun Heo 提交于
      There are a couple interface issues which can be addressed in cgroup2
      interface.
      
      * Stats from cpuacct being reported separately from the cpu stats.
      
      * Use of different time units.  Writable control knobs use
        microseconds, some stat fields use nanoseconds while other cpuacct
        stat fields use centiseconds.
      
      * Control knobs which can't be used in the root cgroup still show up
        in the root.
      
      * Control knob names and semantics aren't consistent with other
        controllers.
      
      This patchset implements cpu controller's interface on cgroup2 which
      adheres to the controller file conventions described in
      Documentation/cgroups/cgroup-v2.txt.  Overall, the following changes
      are made.
      
      * cpuacct is implictly enabled and disabled by cpu and its information
        is reported through "cpu.stat" which now uses microseconds for all
        time durations.  All time duration fields now have "_usec" appended
        to them for clarity.
      
        Note that cpuacct.usage_percpu is currently not included in
        "cpu.stat".  If this information is actually called for, it will be
        added later.
      
      * "cpu.shares" is replaced with "cpu.weight" and operates on the
        standard scale defined by CGROUP_WEIGHT_MIN/DFL/MAX (1, 100, 10000).
        The weight is scaled to scheduler weight so that 100 maps to 1024
        and the ratio relationship is preserved - if weight is W and its
        scaled value is S, W / 100 == S / 1024.  While the mapped range is a
        bit smaller than the orignal scheduler weight range, the dead zones
        on both sides are relatively small and covers wider range than the
        nice value mappings.  This file doesn't make sense in the root
        cgroup and isn't created on root.
      
      * "cpu.weight.nice" is added. When read, it reads back the nice value
        which is closest to the current "cpu.weight".  When written, it sets
        "cpu.weight" to the weight value which matches the nice value.  This
        makes it easy to configure cgroups when they're competing against
        threads in threaded subtrees.
      
      * "cpu.cfs_quota_us" and "cpu.cfs_period_us" are replaced by "cpu.max"
        which contains both quota and period.
      
      v4: - Use cgroup2 basic usage stat as the information source instead
            of cpuacct.
      
      v3: - Added "cpu.weight.nice" to allow using nice values when
            configuring the weight.  The feature is requested by PeterZ.
          - Merge the patch to enable threaded support on cpu and cpuacct.
          - Dropped the bits about getting rid of cpuacct from patch
            description as there is a pretty strong case for making cpuacct
            an implicit controller so that basic cpu usage stats are always
            available.
          - Documentation updated accordingly.  "cpu.rt.max" section is
            dropped for now.
      
      v2: - cpu_stats_show() was incorrectly using CONFIG_FAIR_GROUP_SCHED
            for CFS bandwidth stats and also using raw division for u64.
            Use CONFIG_CFS_BANDWITH and do_div() instead.  "cpu.rt.max" is
            not included yet.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      0d593634
    • T
      sched: Misc preps for cgroup unified hierarchy interface · a1f7164c
      Tejun Heo 提交于
      Make the following changes in preparation for the cpu controller
      interface implementation for cgroup2.  This patch doesn't cause any
      functional differences.
      
      * s/cpu_stats_show()/cpu_cfs_stat_show()/
      
      * s/cpu_files/cpu_legacy_files/
      
      v2: Dropped cpuacct changes as it won't be used by cpu controller
          interface anymore.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      a1f7164c
    • V
      sched/fair: Use reweight_entity() for set_user_nice() · 9059393e
      Vincent Guittot 提交于
      Now that we directly change load_avg and propagate that change into
      the sums, sys_nice() and co should do the same, otherwise its possible
      to confuse load accounting when we migrate near the weight change.
      Fixes-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org>
      [ Added changelog, fixed the call condition. ]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170517095045.GA8420@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9059393e
  9. 29 9月, 2017 1 次提交
  10. 20 9月, 2017 1 次提交
  11. 12 9月, 2017 1 次提交
  12. 07 9月, 2017 1 次提交
    • P
      sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs · 50e76632
      Peter Zijlstra 提交于
      Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
      because it now resulted in non-cpuset usage breaking too.
      
      On suspend cpuset_cpu_inactive() doesn't call into
      cpuset_update_active_cpus() because it doesn't want to move tasks about,
      there is no need, all tasks are frozen and won't run again until after
      we've resumed everything.
      
      But this means that when we finally do call into
      cpuset_update_active_cpus() after resuming the last frozen cpu in
      cpuset_cpu_active(), the top_cpuset will not have any difference with
      the cpu_active_mask and this it will not in fact do _anything_.
      
      So the cpuset configuration will not be restored. This was largely
      hidden because we would unconditionally create identity domains and
      mobile users would not in fact use cpusets much. And servers what do use
      cpusets tend to not suspend-resume much.
      
      An addition problem is that we'd not in fact wait for the cpuset work to
      finish before resuming the tasks, allowing spurious migrations outside
      of the specified domains.
      
      Fix the rebuild by introducing cpuset_force_rebuild() and fix the
      ordering with cpuset_wait_for_hotplug().
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: deb7aa30 ("cpuset: reorganize CPU / memory hotplug handling")
      Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      50e76632
  13. 17 8月, 2017 1 次提交
    • M
      membarrier: Provide expedited private command · 22e4ebb9
      Mathieu Desnoyers 提交于
      Implement MEMBARRIER_CMD_PRIVATE_EXPEDITED with IPIs using cpumask built
      from all runqueues for which current thread's mm is the same as the
      thread calling sys_membarrier. It executes faster than the non-expedited
      variant (no blocking). It also works on NOHZ_FULL configurations.
      
      Scheduler-wise, it requires a memory barrier before and after context
      switching between processes (which have different mm). The memory
      barrier before context switch is already present. For the barrier after
      context switch:
      
      * Our TSO archs can do RELEASE without being a full barrier. Look at
        x86 spin_unlock() being a regular STORE for example.  But for those
        archs, all atomics imply smp_mb and all of them have atomic ops in
        switch_mm() for mm_cpumask(), and on x86 the CR3 load acts as a full
        barrier.
      
      * From all weakly ordered machines, only ARM64 and PPC can do RELEASE,
        the rest does indeed do smp_mb(), so there the spin_unlock() is a full
        barrier and we're good.
      
      * ARM64 has a very heavy barrier in switch_to(), which suffices.
      
      * PPC just removed its barrier from switch_to(), but appears to be
        talking about adding something to switch_mm(). So add a
        smp_mb__after_unlock_lock() for now, until this is settled on the PPC
        side.
      
      Changes since v3:
      - Properly document the memory barriers provided by each architecture.
      
      Changes since v2:
      - Address comments from Peter Zijlstra,
      - Add smp_mb__after_unlock_lock() after finish_lock_switch() in
        finish_task_switch() to add the memory barrier we need after storing
        to rq->curr. This is much simpler than the previous approach relying
        on atomic_dec_and_test() in mmdrop(), which actually added a memory
        barrier in the common case of switching between userspace processes.
      - Return -EINVAL when MEMBARRIER_CMD_SHARED is used on a nohz_full
        kernel, rather than having the whole membarrier system call returning
        -ENOSYS. Indeed, CMD_PRIVATE_EXPEDITED is compatible with nohz_full.
        Adapt the CMD_QUERY mask accordingly.
      
      Changes since v1:
      - move membarrier code under kernel/sched/ because it uses the
        scheduler runqueue,
      - only add the barrier when we switch from a kernel thread. The case
        where we switch from a user-space thread is already handled by
        the atomic_dec_and_test() in mmdrop().
      - add a comment to mmdrop() documenting the requirement on the implicit
        memory barrier.
      
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      CC: Boqun Feng <boqun.feng@gmail.com>
      CC: Andrew Hunter <ahh@google.com>
      CC: Maged Michael <maged.michael@gmail.com>
      CC: gromer@google.com
      CC: Avi Kivity <avi@scylladb.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NDave Watson <davejwatson@fb.com>
      22e4ebb9
  14. 12 8月, 2017 1 次提交
  15. 10 8月, 2017 4 次提交
  16. 29 7月, 2017 1 次提交
    • T
      sched: Allow migrating kthreads into online but inactive CPUs · 955dbdf4
      Tejun Heo 提交于
      Per-cpu workqueues have been tripping CPU affinity sanity checks while
      a CPU is being offlined.  A per-cpu kworker ends up running on a CPU
      which isn't its target CPU while the CPU is online but inactive.
      
      While the scheduler allows kthreads to wake up on an online but
      inactive CPU, it doesn't allow a running kthread to be migrated to
      such a CPU, which leads to an odd situation where setting affinity on
      a sleeping and running kthread leads to different results.
      
      Each mem-reclaim workqueue has one rescuer which guarantees forward
      progress and the rescuer needs to bind itself to the CPU which needs
      help in making forward progress; however, due to the above issue,
      while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on
      the correct CPU if the CPU is in the process of going offline,
      tripping the sanity check and executing the work item on the wrong
      CPU.
      
      This patch updates __migrate_task() so that kthreads can be migrated
      into an inactive but online CPU.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      955dbdf4
  17. 25 7月, 2017 1 次提交
  18. 23 6月, 2017 3 次提交
  19. 20 6月, 2017 2 次提交
    • I
      sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c · 5822a454
      Ingo Molnar 提交于
      The key hashed waitqueue data structures and their initialization
      was done in the main scheduler file for no good reason, move them
      to sched/wait_bit.c instead.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5822a454
    • I
      sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar 提交于
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ac6424b9
  20. 11 6月, 2017 1 次提交
  21. 08 6月, 2017 4 次提交
    • P
      sched: Rely on synchronize_rcu_mult() de-duplication · d7d34d5e
      Paul E. McKenney 提交于
      The synchronize_rcu_mult() function now detects duplicate requests
      for the same grace-period flavor and waits only once for each flavor.
      This commit therefore removes the ugly #ifdef from sched_cpu_deactivate()
      because synchronize_rcu_mult(call_rcu, call_rcu_sched) now does what
      the #ifdef used to be needed for.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      d7d34d5e
    • N
      sched/core: Omit building stop_sched_class when !SMP · f5832c19
      Nicolas Pitre 提交于
      The stop class is invoked through stop_machine only.
      This is dead code on UP builds.
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170529210302.26868-3-nicolas.pitre@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f5832c19
    • D
      sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks · 3effcb42
      Daniel Bristot de Oliveira 提交于
      We have been facing some problems with self-suspending constrained
      deadline tasks. The main reason is that the original CBS was not
      designed for such sort of tasks.
      
      One problem reported by Xunlei Pang takes place when a task
      suspends, and then is awakened before the deadline, but so close
      to the deadline that its remaining runtime can cause the task
      to have an absolute density higher than allowed. In such situation,
      the original CBS assumes that the task is facing an early activation,
      and so it replenishes the task and set another deadline, one deadline
      in the future. This rule works fine for implicit deadline tasks.
      Moreover, it allows the system to adapt the period of a task in which
      the external event source suffered from a clock drift.
      
      However, this opens the window for bandwidth leakage for constrained
      deadline tasks. For instance, a task with the following parameters:
      
        runtime   = 5 ms
        deadline  = 7 ms
        [density] = 5 / 7 = 0.71
        period    = 1000 ms
      
      If the task runs for 1 ms, and then suspends for another 1ms,
      it will be awakened with the following parameters:
      
        remaining runtime = 4
        laxity = 5
      
      presenting a absolute density of 4 / 5 = 0.80.
      
      In this case, the original CBS would assume the task had an early
      wakeup. Then, CBS will reset the runtime, and the absolute deadline will
      be postponed by one relative deadline, allowing the task to run.
      
      The problem is that, if the task runs this pattern forever, it will keep
      receiving bandwidth, being able to run 1ms every 2ms. Following this
      behavior, the task would be able to run 500 ms in 1 sec. Thus running
      more than the 5 ms / 1 sec the admission control allowed it to run.
      
      Trying to address the self-suspending case, Luca Abeni, Giuseppe
      Lipari, and Juri Lelli [1] revisited the CBS in order to deal with
      self-suspending tasks. In the new approach, rather than
      replenishing/postponing the absolute deadline, the revised wakeup rule
      adjusts the remaining runtime, reducing it to fit into the allowed
      density.
      
      A revised version of the idea is:
      
      At a given time t, the maximum absolute density of a task cannot be
      higher than its relative density, that is:
      
        runtime / (deadline - t) <= dl_runtime / dl_deadline
      
      Knowing the laxity of a task (deadline - t), it is possible to move
      it to the other side of the equality, thus enabling to define max
      remaining runtime a task can use within the absolute deadline, without
      over-running the allowed density:
      
        runtime = (dl_runtime / dl_deadline) * (deadline - t)
      
      For instance, in our previous example, the task could still run:
      
        runtime = ( 5 / 7 ) * 5
        runtime = 3.57 ms
      
      Without causing damage for other deadline tasks. It is note worthy
      that the laxity cannot be negative because that would cause a negative
      runtime. Thus, this patch depends on the patch:
      
        df8eac8c ("sched/deadline: Throttle a constrained deadline task activated after the deadline")
      
      Which throttles a constrained deadline task activated after the
      deadline.
      
      Finally, it is also possible to use the revised wakeup rule for
      all other tasks, but that would require some more discussions
      about pros and cons.
      Reported-by: NXunlei Pang <xpang@redhat.com>
      Signed-off-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      [peterz: replaced dl_is_constrained with dl_is_implicit]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@santannapisa.it>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/5c800ab3a74a168a84ee5f3f84d12a02e11383be.1495803804.git.bristot@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3effcb42
    • L
      sched/deadline: Reclaim bandwidth not used by dl tasks · daec5798
      Luca Abeni 提交于
      This commit introduces a per-runqueue "extra utilization" that can be
      reclaimed by deadline tasks. In this way, the maximum fraction of CPU
      time that can reclaimed by deadline tasks is fixed (and configurable)
      and does not depend on the total deadline utilization.
      The GRUB accounting rule is modified to add this "extra utilization"
      to the inactive utilization of the runqueue, and to avoid reclaiming
      more than a maximum fraction of the CPU time.
      Tested-by: NDaniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: NLuca Abeni <luca.abeni@santannapisa.it>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Claudio Scordino <claudio@evidence.eu.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
      Link: http://lkml.kernel.org/r/1495138417-6203-10-git-send-email-luca.abeni@santannapisa.itSigned-off-by: NIngo Molnar <mingo@kernel.org>
      daec5798