1. 22 Feb 2014, 1 commit
    • sched: Fix hotplug task migration · 3f1d2a31
      Committed by Peter Zijlstra
      Dan Carpenter reported:
      
      > kernel/sched/rt.c:1347 pick_next_task_rt() warn: variable dereferenced before check 'prev' (see line 1338)
      > kernel/sched/deadline.c:1011 pick_next_task_dl() warn: variable dereferenced before check 'prev' (see line 1005)
      
      Kirill also spotted that migrate_tasks() will have an instant NULL
      deref because pick_next_task() will immediately deref prev.
      
      Instead of fixing all the corner cases caused by migrate_tasks()
      passing in a NULL prev task in the unlikely case of hot-unplug,
      provide a fake task so that we can remove all the NULL checks from
      the far more common paths.
      
      A further problem, not previously spotted, is that because we pushed
      pre_schedule() and idle_balance() into pick_next_task(), we now need
      to avoid those getting called and pulling more tasks onto our dying CPU.
      
      We avoid pull_{dl,rt}_task() by setting fake_task.prio to MAX_PRIO+1.
      We also note that since we call pick_next_task() exactly as many
      times as we have runnable tasks present, we should never land in
      idle_balance().
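
      A minimal sketch of the fake-task idea (illustrative; the exact
      fields in the patch may differ slightly):

        static void put_prev_task_fake(struct rq *rq, struct task_struct *prev)
        {
        }

        static const struct sched_class fake_sched_class = {
        	.put_prev_task = put_prev_task_fake,
        };

        static struct task_struct fake_task = {
        	/*
        	 * prio = MAX_PRIO + 1 means the fake task never looks like
        	 * an rt/dl task, so pull_{dl,rt}_task() is not triggered on
        	 * the dying CPU.
        	 */
        	.prio = MAX_PRIO + 1,
        	.sched_class = &fake_sched_class,
        };

        /* migrate_tasks() then passes &fake_task as prev instead of NULL. */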
      
      Fixes: 38033c37 ("sched: Push down pre_schedule() and idle_balance()")
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Kirill Tkhai <tkhai@yandex.ru>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140212094930.GB3545@laptop.programming.kicks-ass.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 11 Feb 2014, 1 commit
    • sched: Push down pre_schedule() and idle_balance() · 38033c37
      Committed by Peter Zijlstra
      This patch merges idle_balance() and pre_schedule(), and pushes
      both of them into pick_next_task().
      
      Conceptually pre_schedule() and idle_balance() are rather similar,
      both are used to pull more work onto the current CPU.
      
      We cannot, however, simply move idle_balance() into pre_schedule_fair(),
      since there is no guarantee the last runnable task is a fair task, and
      thus we would miss newidle balances.
      
      Similarly, the dl and rt pre_schedule calls must be run before
      idle_balance(), since their respective tasks have higher priority and
      it would not do to delay their execution by searching for less
      important tasks first.
      
      However, by noticing that pick_next_task() already traverses the
      sched_class hierarchy in the right order, we can get the right
      behaviour and do away with both calls.
      
      We must, however, change the special-case optimization to also require
      that prev is of fair_sched_class, otherwise we can miss doing a dl or
      rt pull where we needed one.
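
      Since pick_next_task() walks the classes highest-priority-first,
      doing the pulls inside each class's ->pick_next_task() preserves the
      dl -> rt -> fair ordering by construction. A rough sketch of the
      core loop (simplified from kernel/sched/core.c):

        static inline struct task_struct *
        pick_next_task(struct rq *rq, struct task_struct *prev)
        {
        	const struct sched_class *class;
        	struct task_struct *p;

        	for_each_class(class) {
        		/* a class may pull work here before it picks */
        		p = class->pick_next_task(rq, prev);
        		if (p)
        			return p;
        	}

        	BUG(); /* the idle class should always have a runnable task */
        }
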
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/n/tip-a8k6vvaebtn64nie345kx1je@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 10 Feb 2014, 1 commit
  4. 13 Jan 2014, 1 commit
    • sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic · 1baca4ce
      Committed by Juri Lelli
      Introduces the data structures needed to implement dynamic
      migration of -deadline tasks, together with the logic for checking
      whether runqueues are overloaded with -deadline tasks and for
      choosing where a task should migrate when that is the case.
      
      It also adds dynamic migrations to SCHED_DEADLINE, so that tasks can
      be moved among CPUs when necessary. It is also possible to bind a
      task to a (set of) CPU(s), thus restricting its ability to
      migrate, or forbidding migrations altogether.
      
      The very same approach used in sched_rt is utilised:
       - -deadline tasks are kept in CPU-specific runqueues,
       - -deadline tasks are migrated among runqueues to achieve the
         following:
          * on an M-CPU system the M earliest deadline ready tasks
            are always running;
          * affinity/cpusets settings of all the -deadline tasks are
            always respected.
      
      Therefore, this very special form of "load balancing" is done with
      an active method, i.e., the scheduler pushes or pulls tasks between
      runqueues when they are woken up and/or (de)scheduled.
      IOW, every time a preemption occurs, the descheduled task might be sent
      to some other CPU (depending on its deadline) to continue executing
      (push). On the other hand, every time a CPU becomes idle, it might pull
      the second earliest deadline ready task from some other CPU.
      
      To enforce this, a pull operation is always attempted before taking any
      scheduling decision (pre_schedule()), as well as a push one after each
      scheduling decision (post_schedule()). In addition, when a task arrives
      or wakes up, the best CPU on which to resume it is selected taking into
      account its affinity mask and the system topology, but also its deadline.
      E.g., from the scheduling point of view, the best CPU on which to wake
      up (and also to which to push) a task is the one which is running the
      task with the latest deadline among the M executing ones.
      
      In order to facilitate these decisions, per-runqueue "caching" of the
      deadlines of the currently running and of the first ready task is used.
      Queued but not running tasks are also parked in another rb-tree to
      speed up pushes.
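
      A condensed sketch of the per-runqueue state this introduces (names
      follow the description above; the real struct in kernel/sched/sched.h
      carries more detail):

        struct dl_rq {
        	/* runnable -deadline tasks, ordered by deadline */
        	struct rb_root rb_root;
        	struct rb_node *rb_leftmost;

        	unsigned long dl_nr_running;

        #ifdef CONFIG_SMP
        	/* "cached" deadlines used for push/pull decisions */
        	struct {
        		u64 curr;	/* deadline of the currently running task */
        		u64 next;	/* deadline of the earliest ready task */
        	} earliest_dl;

        	unsigned long dl_nr_migratory;
        	int overloaded;

        	/* queued-but-not-running tasks, parked here to speed up pushes */
        	struct rb_root pushable_dl_tasks_root;
        	struct rb_node *pushable_dl_tasks_leftmost;
        #endif
        };
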
      Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 17 Dec 2013, 1 commit
  6. 26 Oct 2013, 1 commit
  7. 16 Oct 2013, 1 commit
    • sched/rt: Add missing rmb() · 7c3f2ab7
      Committed by Peter Zijlstra
      While discussing the proposed SCHED_DEADLINE patches, which in part
      mimic the existing FIFO code, it was noticed that the wmb in
      rt_set_overload() didn't have a matching barrier.
      
      The only site using rt_overloaded() to test the rto_count is
      pull_rt_task(), and we should issue a matching rmb there before
      assuming there's an rto_mask bit set.
      
      Without that smp_rmb() in there we could actually miss seeing the
      rto_mask bit.
      
      Also, change to using smp_[wr]mb(), even though this is SMP only code;
      memory barriers without smp_ always make me think they're against
      hardware of some sort.
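
      The pairing, sketched against kernel/sched/rt.c (abridged):

        static inline void rt_set_overload(struct rq *rq)
        {
        	cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
        	/*
        	 * Make the rto_mask bit visible before bumping rto_count;
        	 * otherwise a reader could see the count and still miss
        	 * the mask bit.
        	 */
        	smp_wmb();
        	atomic_inc(&rq->rd->rto_count);
        }

        static int pull_rt_task(struct rq *this_rq)
        {
        	if (likely(!rt_overloaded(this_rq)))
        		return 0;

        	/* pairs with the smp_wmb() in rt_set_overload() */
        	smp_rmb();

        	/* ... scan rto_mask and pull eligible tasks ... */
        }
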
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: vincent.guittot@linaro.org
      Cc: luca.abeni@unitn.it
      Cc: bruce.ashfield@windriver.com
      Cc: dhaval.giani@gmail.com
      Cc: rostedt@goodmis.org
      Cc: hgu1972@gmail.com
      Cc: oleg@redhat.com
      Cc: fweisbec@gmail.com
      Cc: darren@dvhart.com
      Cc: johan.eker@ericsson.com
      Cc: p.faure@akatech.ch
      Cc: paulmck@linux.vnet.ibm.com
      Cc: raistlin@linux.it
      Cc: claudio@evidence.eu.com
      Cc: insop.song@gmail.com
      Cc: michael@amarulasolutions.com
      Cc: liming.wang@windriver.com
      Cc: fchecconi@gmail.com
      Cc: jkacur@redhat.com
      Cc: tommaso.cucinotta@sssup.it
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: harald.gustafsson@ericsson.com
      Cc: nicola.manica@disi.unitn.it
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 09 Oct 2013, 1 commit
  9. 06 Oct 2013, 1 commit
  10. 19 Jun 2013, 1 commit
  11. 28 May 2013, 2 commits
  12. 10 May 2013, 1 commit
  13. 08 Feb 2013, 1 commit
  14. 04 Feb 2013, 1 commit
  15. 31 Jan 2013, 1 commit
  16. 25 Jan 2013, 3 commits
    • sched/rt: Avoid updating RT entry timeout twice within one tick period · 57d2aa00
      Committed by Ying Xue
      The issue below was found in 2.6.34-rt rather than in the mainline
      rt kernel, but the issue still exists upstream as well.
      
      So please let me describe how it was noticed on 2.6.34-rt:
      
      On this version, each softirq has its own thread, which means there
      is at least one RT FIFO task per cpu. The priority of these
      tasks is set to 49 by default. If a user launches an RT FIFO task
      with a priority lower than the softirq RT tasks' 49, it's possible
      for two RT FIFO tasks to be enqueued on one cpu's runqueue at the
      same moment. By the current strategy of balancing RT tasks, when it
      comes to RT tasks, we really want to put them off to a CPU that they
      can run on as soon as possible. Even if it means a bit of cache
      line flushing, we want RT tasks to be run with the least latency.
      
      When the user RT FIFO task launched above is running, the sched
      timer tick of the current cpu fires. In this tick period, the
      timeout value of the user RT task is updated once. Subsequently,
      we try to wake up one softirq RT task on its local cpu. As the
      priority of the current user RT task is lower than that of the
      softirq RT task, the current task is preempted by the higher
      priority softirq RT task. Before preemption, we check whether
      current can readily move to a different cpu. If so, we reschedule
      to allow the RT push logic to try to move current somewhere else.
      Whenever the woken softirq RT task runs, it first tries to migrate
      the user RT FIFO task over to a cpu that is running a task of
      lesser priority. If the migration is done, it sends a reschedule
      request to the found cpu via an IPI. Once the target cpu responds
      to the IPI, it picks the migrated user RT task to preempt its
      current task. When the user RT task is running on the new cpu,
      the sched timer tick of that cpu fires, so it ticks the user RT
      task again. This means the RT task timeout value is updated again.
      As the migration may complete within one tick period, the user RT
      task's timeout value can be updated twice within one tick.
      
      If we set a limit on the amount of cpu time for the user RT task
      by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
      upon reaching the soft limit.
      
      But exactly when the SIGXCPU signal should be sent depends on the
      RT task timeout value. In fact, the timeout mechanism for sending
      the SIGXCPU signal assumes the RT task timeout is increased once
      every tick.
      
      However, currently the timeout value may be incremented twice per
      tick, which results in the SIGXCPU signal being sent earlier than
      expected.
      
      To solve this issue, we prevent the timeout value from increasing
      twice within one tick by remembering the jiffies value at which
      the timeout was last updated. As long as the RT task's remembered
      jiffies value differs from the global jiffies value, its timeout
      is allowed to be updated.
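
      The fix, sketched against watchdog() in kernel/sched/rt.c (assuming
      a watchdog_stamp field added to struct sched_rt_entity):

        static void watchdog(struct rq *rq, struct task_struct *p)
        {
        	unsigned long soft, hard;

        	/* max may change after cur was read, this will be fixed next tick */
        	soft = task_rlimit(p, RLIMIT_RTTIME);
        	hard = task_rlimit_max(p, RLIMIT_RTTIME);

        	if (soft != RLIM_INFINITY) {
        		unsigned long next;

        		/* count a tick at most once, even across a migration */
        		if (p->rt.watchdog_stamp != jiffies) {
        			p->rt.timeout++;
        			p->rt.watchdog_stamp = jiffies;
        		}

        		next = DIV_ROUND_UP(min(soft, hard), USEC_PER_SEC/HZ);
        		if (p->rt.timeout > next)
        			p->cputime_expires.sched_exp = p->se.sum_exec_runtime;
        	}
        }
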
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/rt: Use root_domain of rt_rq not current processor · aa7f6730
      Committed by Shawn Bohrer
      When the system has multiple root domains, do_sched_rt_period_timer()
      can run on any CPU and may iterate over all rt_rqs in
      cpu_online_mask. This means that when balance_runtime() is run for
      a given rt_rq, that rt_rq may be in a different root domain (rd)
      than the current processor's. Thus if we use smp_processor_id() to
      get the rd in do_balance_runtime() we may borrow runtime from an
      rt_rq that is not part of our rd.
      
      This changes do_balance_runtime() to get the rd from the passed-in
      rt_rq, ensuring that we borrow runtime only from the correct rd
      for the given rt_rq.
      
      This fixes a BUG at kernel/sched/rt.c:687! in __disable_runtime,
      hit when we try to reclaim runtime lent to other rt_rqs but the
      runtime has been lent to an rt_rq in another rd.
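
      The essence of the fix, sketched as a diff against
      do_balance_runtime() (abridged):

        static int do_balance_runtime(struct rt_rq *rt_rq)
        {
        	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
        -	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
        +	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
        	int i, weight;
        	u64 rt_period;
        	/* ... borrow runtime from the rt_rqs in rd->span ... */
        }
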
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Mike Galbraith <bitbucket@online.de>
      Cc: peterz@infradead.org
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1358186131-29494-1-git-send-email-sbohrer@rgmadvisors.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/rt: Add reschedule check to switched_from_rt() · 1158ddb5
      Committed by Kirill Tkhai
      Reschedule rq->curr if the first RT task has just been
      pulled to the rq.
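
      In sketch form (switched_from_rt() in kernel/sched/rt.c, with
      pull_rt_task() changed to report whether it pulled anything):

        static void switched_from_rt(struct rq *rq, struct task_struct *p)
        {
        	/*
        	 * If there are other RT tasks then we will reschedule
        	 * and the scheduling of those tasks will handle the
        	 * balancing. But if we are the last RT task and anything
        	 * got pulled in, the pulled task may deserve the CPU now.
        	 */
        	if (!p->on_rq || rq->rt.rt_nr_running)
        		return;

        	if (pull_rt_task(rq))
        		resched_task(rq->curr);
        }
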
      Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tkhai Kirill <tkhai@yandex.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/118761353614535@web28f.yandex.ru
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 13 Sep 2012, 1 commit
  18. 04 Sep 2012, 1 commit
  19. 14 Aug 2012, 1 commit
  20. 06 Jun 2012, 1 commit
  21. 30 May 2012, 2 commits
  22. 13 Apr 2012, 1 commit
  23. 27 Mar 2012, 1 commit
  24. 13 Mar 2012, 1 commit
  25. 01 Mar 2012, 2 commits
  26. 22 Feb 2012, 1 commit
  27. 27 Jan 2012, 1 commit
    • sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW · cb297a3e
      Committed by Chanho Min
      This issue happens under the following conditions:
      
       1. preemption is off
       2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
       3. RT scheduling class
       4. SMP system
      
      The sequence is as follows:
      
       1. Suppose the current task is A; schedule() starts.
       2. Task A is enqueued as a pushable task at the entry of schedule():
          __schedule
           prev = rq->curr;
           ...
           put_prev_task
            put_prev_task_rt
             enqueue_pushable_task
       3. Task B is picked as the next task:
          next = pick_next_task(rq);
       4. rq->curr is set to task B and the context switch starts:
          rq->curr = next;
       5. At the entry of context_switch(), this cpu's rq->lock is released:
          context_switch
           prepare_task_switch
            prepare_lock_switch
             raw_spin_unlock_irq(&rq->lock);
       6. Shortly after rq->lock is released, an interrupt occurs and IRQ
          context is entered.
       7. try_to_wake_up(), called from the ISR, acquires rq->lock:
          try_to_wake_up
           ttwu_remote
            rq = __task_rq_lock(p)
            ttwu_do_wakeup(rq, p, wake_flags);
              task_woken_rt
       8. push_rt_task() picks task A, which was enqueued earlier:
          task_woken_rt
           push_rt_tasks(rq)
            next_task = pick_next_pushable_task(rq)
       9. At find_lock_lowest_rq(), if double_lock_balance() returns 0,
          lowest_rq can be the remote rq.
          (But if preemption is on, double_lock_balance() always returns 1
          and this doesn't happen.)
          push_rt_task
           find_lock_lowest_rq
            if (double_lock_balance(rq, lowest_rq))..
       10. find_lock_lowest_rq() returns the available rq; task A is
           migrated to the remote cpu/rq:
           push_rt_task
            ...
            deactivate_task(rq, next_task, 0);
            set_task_cpu(next_task, lowest_rq->cpu);
            activate_task(lowest_rq, next_task, 0);
       11. But task A is still in IRQ context on this cpu, so task A is
           scheduled by two cpus at the same time until it returns from
           the IRQ. Task A's stack is corrupted.
      
      To fix it, don't migrate an RT task if it's still running.
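
      Sketched against the re-validation in find_lock_lowest_rq(), after
      double_lock_balance() may have dropped rq->lock (the task_running()
      test is the new check):

        if (double_lock_balance(rq, lowest_rq)) {
        	/* rq->lock was dropped: re-validate before migrating */
        	if (unlikely(task_rq(task) != rq ||
        		     !cpumask_test_cpu(lowest_rq->cpu,
        				       tsk_cpus_allowed(task)) ||
        		     task_running(rq, task) ||
        		     !task->on_rq)) {
        		raw_spin_unlock(&lowest_rq->lock);
        		lowest_rq = NULL;
        		break;
        	}
        }
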
      Signed-off-by: Chanho Min <chanho.min@lge.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  28. 06 Dec 2011, 2 commits
    • sched/rt: Code cleanup, remove a redundant function call · 5b680fd6
      Committed by Shan Hai
      The second call to sched_rt_runtime() is redundant, because the
      rt_runtime value was already read and was protected by the
      ->rt_runtime_lock.
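
      The cleanup, sketched as a diff against sched_rt_runtime_exceeded()
      (abridged; based on the description above):

        static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
        {
        	u64 runtime = sched_rt_runtime(rt_rq);

        	if (rt_rq->rt_throttled)
        		return rt_rq_throttled(rt_rq);

        -	if (sched_rt_runtime(rt_rq) >= sched_rt_period(rt_rq))
        +	if (runtime >= sched_rt_period(rt_rq))
        		return 0;
        	/* ... */
        }
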
      Signed-off-by: Shan Hai <haishan.bai@gmail.com>
      Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Use rt.nr_cpus_allowed to recover select_task_rq() cycles · 76854c7e
      Committed by Mike Galbraith
      rt.nr_cpus_allowed is always available; use it to bail out of
      select_task_rq() when only one cpu can be used, saving some cycles
      for pinned tasks. A sketch of the bail-out follows the profile below.
      
      See the line marked with '*' below:
      
        # taskset -c 3 pipe-test
      
         PerfTop:     997 irqs/sec  kernel:89.5%  exact:  0.0% [1000Hz cycles],  (all, CPU: 3)
      ------------------------------------------------------------------------------------------------
      
                   Virgin                                    Patched
                   samples  pcnt function                    samples  pcnt function
                   _______ _____ ___________________________ _______ _____ ___________________________
      
                   2880.00 10.2% __schedule                  3136.00 11.3% __schedule
                   1634.00  5.8% pipe_read                   1615.00  5.8% pipe_read
                   1458.00  5.2% system_call                 1534.00  5.5% system_call
                   1382.00  4.9% _raw_spin_lock_irqsave      1412.00  5.1% _raw_spin_lock_irqsave
                   1202.00  4.3% pipe_write                  1255.00  4.5% copy_user_generic_string
                   1164.00  4.1% copy_user_generic_string    1241.00  4.5% __switch_to
                   1097.00  3.9% __switch_to                  929.00  3.3% mutex_lock
                    872.00  3.1% mutex_lock                   846.00  3.0% mutex_unlock
                    687.00  2.4% mutex_unlock                 804.00  2.9% pipe_write
                    682.00  2.4% native_sched_clock           713.00  2.6% native_sched_clock
                    643.00  2.3% system_call_after_swapgs     653.00  2.3% _raw_spin_unlock_irqrestore
                    617.00  2.2% sched_clock_local            633.00  2.3% fsnotify
                    612.00  2.2% fsnotify                     605.00  2.2% sched_clock_local
                    596.00  2.1% _raw_spin_unlock_irqrestore  593.00  2.1% system_call_after_swapgs
                    542.00  1.9% sysret_check                 559.00  2.0% sysret_check
                    467.00  1.7% fget_light                   472.00  1.7% fget_light
                    462.00  1.6% finish_task_switch           461.00  1.7% finish_task_switch
                    437.00  1.5% vfs_write                    442.00  1.6% vfs_write
                    431.00  1.5% do_sync_write                428.00  1.5% do_sync_write
      *             413.00  1.5% select_task_rq_fair          404.00  1.5% _raw_spin_lock_irq
                    386.00  1.4% update_curr                  402.00  1.4% update_curr
                    385.00  1.4% rw_verify_area               389.00  1.4% do_sync_read
                    377.00  1.3% _raw_spin_lock_irq           378.00  1.4% vfs_read
                    369.00  1.3% do_sync_read                 340.00  1.2% pipe_iov_copy_from_user
                    360.00  1.3% vfs_read                     316.00  1.1% __wake_up_sync_key
                    342.00  1.2% hrtick_start_fair            313.00  1.1% __wake_up_common
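
      One way to realize the bail-out (illustrative; the exact shape of
      the mainline change may differ):

        static int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
        {
        	int cpu;

        	/* a pinned task has nowhere else to go: skip the class callback */
        	if (p->rt.nr_cpus_allowed == 1)
        		cpu = task_cpu(p);
        	else
        		cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);

        	return cpu;
        }
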
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 17 Nov 2011, 2 commits
  30. 16 Nov 2011, 1 commit
  31. 14 Nov 2011, 1 commit
  32. 06 Oct 2011, 2 commits