1. 17 Dec 2013: 1 commit
  2. 26 Oct 2013: 1 commit
  3. 16 Oct 2013: 1 commit
    • sched/rt: Add missing rmb() · 7c3f2ab7
      Authored by Peter Zijlstra
      While discussing the proposed SCHED_DEADLINE patches, which in part
      mimic the existing FIFO code, it was noticed that the wmb in
      rt_set_overloaded() didn't have a matching barrier.
      
      The only site using rt_overloaded() to test rto_count is
      pull_rt_task(), and we should issue a matching rmb there before
      assuming there's an rto_mask bit set.
      
      Without that smp_rmb() in there we could actually miss seeing the
      rto_mask bit.
      
      Also, change to using smp_[wr]mb(), even though this is SMP-only
      code; memory barriers without smp_ always make me think they're
      against hardware of some sort.
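      
      For reference, a minimal sketch of the pairing being described (the
      rt_set_overload()/rt_overloaded() helpers and the rd->rto_mask and
      rd->rto_count fields follow kernel/sched/rt.c of that era; the bodies
      are condensed for illustration and are not the literal patch):
      
        static inline void rt_set_overload(struct rq *rq)
        {
                if (!rq->online)
                        return;
      
                cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
                /*
                 * Publish the mask bit before bumping the count that
                 * readers use to decide whether to look at the mask at all.
                 */
                smp_wmb();
                atomic_inc(&rq->rd->rto_count);
        }
      
        static int pull_rt_task(struct rq *this_rq)
        {
                if (likely(!rt_overloaded(this_rq)))
                        return 0;
      
                /*
                 * Match the wmb in rt_set_overload(); without this we could
                 * observe rto_count != 0 yet still read a stale, empty
                 * rto_mask below.
                 */
                smp_rmb();
      
                /* ... scan this_rq->rd->rto_mask for runqueues to pull from ... */
                return 0;
        }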
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: vincent.guittot@linaro.org
      Cc: luca.abeni@unitn.it
      Cc: bruce.ashfield@windriver.com
      Cc: dhaval.giani@gmail.com
      Cc: rostedt@goodmis.org
      Cc: hgu1972@gmail.com
      Cc: oleg@redhat.com
      Cc: fweisbec@gmail.com
      Cc: darren@dvhart.com
      Cc: johan.eker@ericsson.com
      Cc: p.faure@akatech.ch
      Cc: paulmck@linux.vnet.ibm.com
      Cc: raistlin@linux.it
      Cc: claudio@evidence.eu.com
      Cc: insop.song@gmail.com
      Cc: michael@amarulasolutions.com
      Cc: liming.wang@windriver.com
      Cc: fchecconi@gmail.com
      Cc: jkacur@redhat.com
      Cc: tommaso.cucinotta@sssup.it
      Cc: Juri Lelli <juri.lelli@gmail.com>
      Cc: harald.gustafsson@ericsson.com
      Cc: nicola.manica@disi.unitn.it
      Cc: tglx@linutronix.de
      Link: http://lkml.kernel.org/r/20131015103507.GF10651@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7c3f2ab7
  4. 09 Oct 2013: 1 commit
  5. 06 Oct 2013: 1 commit
  6. 19 Jun 2013: 1 commit
  7. 28 May 2013: 2 commits
  8. 10 May 2013: 1 commit
  9. 08 Feb 2013: 1 commit
  10. 04 Feb 2013: 1 commit
  11. 31 Jan 2013: 1 commit
  12. 25 Jan 2013: 3 commits
    • sched/rt: Avoid updating RT entry timeout twice within one tick period · 57d2aa00
      Authored by Ying Xue
      The issue below was found in 2.6.34-rt rather than the mainline rt
      kernel, but it still exists upstream as well.
      
      So please let me describe how it was noticed on 2.6.34-rt:
      
      On this version each softirq has its own thread, which means there
      is at least one RT FIFO task per cpu. The priority of these tasks
      is set to 49 by default. If a user launches an RT FIFO task with a
      priority lower than the softirq RT tasks' 49, it's possible for two
      RT FIFO tasks to be enqueued on one cpu's runqueue at the same
      moment. Under the current strategy of balancing RT tasks, we really
      want to push RT tasks off to a CPU that they can run on as soon as
      possible; even if it means a bit of cache line flushing, we want RT
      tasks to run with the least latency.
      
      When the user RT FIFO task launched above is running, the sched
      timer tick of the current cpu fires. In this tick period, the
      timeout value of the user RT task is updated once. Subsequently, we
      try to wake up a softirq RT task on its local cpu. As the priority
      of the current user RT task is lower than that of the softirq RT
      task, the current task will be preempted by the higher priority
      softirq RT task. Before the preemption, we check whether current
      can readily move to a different cpu; if so, we reschedule to allow
      the RT push logic to try to move current somewhere else. When the
      woken softirq RT task runs, it first tries to migrate the user FIFO
      RT task over to a cpu that is running a task of lesser priority. If
      the migration is done, it sends a reschedule request to the found
      cpu via IPI. Once the target cpu handles the IPI, it picks the
      migrated user RT task to preempt its current task. When the user RT
      task runs on the new cpu, that cpu's sched timer tick fires and
      ticks the user RT task again, updating its timeout value a second
      time. Since the migration can complete within a single tick period,
      the user RT task's timeout value may thus be updated twice within
      one tick.
      
      If we set a limit on the amount of cpu time for the user RT task
      by setrlimit(RLIMIT_RTTIME), the SIGXCPU signal should be posted
      upon reaching the soft limit.
      
      But exactly when the SIGXCPU signal should be sent depends on the
      RT task timeout value. In fact the timeout mechanism of sending
      the SIGXCPU signal assumes the RT task timeout is increased once
      every tick.
      
      However, the timeout value may currently be incremented twice per
      tick, which results in the SIGXCPU signal being sent earlier than
      expected.
      
      To solve this issue, we prevent the timeout value from increasing
      twice within one tick by remembering the jiffies value at which the
      timeout was last updated. Only when the RT task's recorded jiffies
      value differs from the global jiffies value do we allow its timeout
      to be updated.
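      
      A hedged sketch of the fix in the per-tick watchdog (it assumes a
      watchdog_stamp field added to struct sched_rt_entity; the rest of
      watchdog() is condensed from kernel/sched/rt.c of that era and is
      illustrative rather than the literal patch):
      
        static void watchdog(struct rq *rq, struct task_struct *p)
        {
                unsigned long soft, hard;
      
                soft = task_rlimit(p, RLIMIT_RTTIME);
                hard = task_rlimit_max(p, RLIMIT_RTTIME);
      
                if (soft != RLIM_INFINITY) {
                        unsigned long next;
      
                        /* account at most one timeout increment per jiffy */
                        if (p->rt.watchdog_stamp != jiffies) {
                                p->rt.timeout++;
                                p->rt.watchdog_stamp = jiffies;
                        }
      
                        next = DIV_ROUND_UP(min(soft, hard), USEC_PER_SEC/HZ);
                        if (p->rt.timeout > next)
                                p->cputime_expires.sched_exp = p->se.sum_exec_runtime;
                }
        }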
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Fan Du <fan.du@windriver.com>
      Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1342508623-2887-1-git-send-email-ying.xue@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      57d2aa00
    • sched/rt: Use root_domain of rt_rq not current processor · aa7f6730
      Authored by Shawn Bohrer
      When the system has multiple domains, do_sched_rt_period_timer()
      can run on any CPU and may iterate over all rt_rq in
      cpu_online_mask.  This means that when balance_runtime() is run for
      a given rt_rq, that rt_rq may be in a different rd than the current
      processor's.  Thus, if we use smp_processor_id() to get the rd in
      do_balance_runtime(), we may borrow runtime from an rt_rq that is
      not part of our rd.
      
      This changes do_balance_runtime() to get the rd from the passed-in
      rt_rq, ensuring that we borrow runtime only from the correct rd
      for the given rt_rq.
      
      This fixes a BUG at kernel/sched/rt.c:687! in __disable_runtime
      when we try to reclaim runtime lent to other rt_rq but the runtime
      has been lent to an rt_rq in another rd.
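      
      A hedged sketch of the change (only the rd lookup matters; the
      rq_of_rt_rq() accessor and the rest of do_balance_runtime() are taken
      from kernel/sched/rt.c of that era and condensed for illustration):
      
        static int do_balance_runtime(struct rt_rq *rt_rq)
        {
                struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
                /*
                 * was: cpu_rq(smp_processor_id())->rd, which can name the
                 * wrong root domain when the period timer runs on a CPU
                 * outside rt_rq's domain
                 */
                struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
                int i, weight, more = 0;
      
                weight = cpumask_weight(rd->span);
      
                /* ... borrow runtime (under rt_b's lock) from the other rt_rqs in rd->span ... */
                return more;
        }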
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Mike Galbraith <bitbucket@online.de>
      Cc: peterz@infradead.org
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1358186131-29494-1-git-send-email-sbohrer@rgmadvisors.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa7f6730
    • sched/rt: Add reschedule check to switched_from_rt() · 1158ddb5
      Authored by Kirill Tkhai
      Reschedule rq->curr if the first RT task has just been
      pulled to the rq.
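      
      A hedged sketch of the resulting switched_from_rt() (pull_rt_task()
      returning whether it pulled anything, and resched_task(), follow the
      kernels of that era; condensed for illustration):
      
        static void switched_from_rt(struct rq *rq, struct task_struct *p)
        {
                /*
                 * If other RT tasks remain queued, normal RT scheduling
                 * handles the balancing.  Only when the last RT task leaves
                 * do we need to pull, and if that pull brought in an RT
                 * task it should preempt whatever rq->curr now is.
                 */
                if (!p->on_rq || rq->rt.rt_nr_running)
                        return;
      
                if (pull_rt_task(rq))
                        resched_task(rq->curr);
        }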
      Signed-off-by: Kirill V Tkhai <tkhai@yandex.ru>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tkhai Kirill <tkhai@yandex.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/118761353614535@web28f.yandex.ru
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1158ddb5
  13. 13 Sep 2012: 1 commit
  14. 04 Sep 2012: 1 commit
  15. 14 Aug 2012: 1 commit
  16. 06 Jun 2012: 1 commit
  17. 30 May 2012: 2 commits
  18. 13 Apr 2012: 1 commit
  19. 27 Mar 2012: 1 commit
  20. 13 Mar 2012: 1 commit
  21. 01 Mar 2012: 2 commits
  22. 22 Feb 2012: 1 commit
  23. 27 Jan 2012: 1 commit
    • sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW · cb297a3e
      Authored by Chanho Min
      This issue happens under the following conditions:
      
       1. preemption is off
       2. __ARCH_WANT_INTERRUPTS_ON_CTXSW is defined
       3. RT scheduling class
       4. SMP system
      
      The sequence is as follows:
      
       1. Suppose the current task is A; schedule() starts.
       2. Task A is enqueued as a pushable task at the entry of schedule():
         __schedule
          prev = rq->curr;
          ...
          put_prev_task
           put_prev_task_rt
            enqueue_pushable_task
       3. Task B is picked as the next task:
         next = pick_next_task(rq);
       4. rq->curr is set to task B and context_switch() is started:
         rq->curr = next;
       5. At the entry of context_switch(), this cpu's rq->lock is released:
         context_switch
          prepare_task_switch
           prepare_lock_switch
            raw_spin_unlock_irq(&rq->lock);
       6. Shortly after rq->lock is released, an interrupt occurs and IRQ
          context starts.
       7. try_to_wake_up(), called by the ISR, acquires rq->lock:
          try_to_wake_up
           ttwu_remote
            rq = __task_rq_lock(p)
            ttwu_do_wakeup(rq, p, wake_flags);
              task_woken_rt
       8. push_rt_task() picks task A, which was enqueued before:
         task_woken_rt
          push_rt_tasks(rq)
           next_task = pick_next_pushable_task(rq)
       9. In find_lock_lowest_rq(), if double_lock_balance() returns 0,
         lowest_rq can be the remote rq.
        (But if preemption is on, double_lock_balance() always returns 1 and
         this doesn't happen.)
         push_rt_task
          find_lock_lowest_rq
           if (double_lock_balance(rq, lowest_rq))..
       10. find_lock_lowest_rq() returns the available rq; task A is migrated
         to the remote cpu/rq.
         push_rt_task
          ...
          deactivate_task(rq, next_task, 0);
          set_task_cpu(next_task, lowest_rq->cpu);
          activate_task(lowest_rq, next_task, 0);
       11. But task A is still in IRQ context on this cpu, so task A is
           scheduled by two cpus at the same time until it returns from the
           IRQ, and task A's stack is corrupted.
      
      To fix it, don't migrate an RT task if it's still running.
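      
      A hedged sketch of that guard in push_rt_task() (the placement of the
      check and the __ARCH_WANT_INTERRUPTS_ON_CTXSW/task_running() names are
      reconstructed from the kernels of that era; the rest of the function
      is elided):
      
        static int push_rt_task(struct rq *rq)
        {
                struct task_struct *next_task;
      
                if (!rq->rt.overloaded)
                        return 0;
      
                next_task = pick_next_pushable_task(rq);
                if (!next_task)
                        return 0;
      
        #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
                /*
                 * The task's switch-out may not have completed on this CPU
                 * yet, so don't migrate it while it is still running here.
                 */
                if (unlikely(task_running(rq, next_task)))
                        return 0;
        #endif
      
                /* ... find_lock_lowest_rq(), deactivate/set_task_cpu/activate ... */
                return 0;
        }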
      Signed-off-by: Chanho Min <chanho.min@lge.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/CAOAMb1BHA=5fm7KTewYyke6u-8DP0iUuJMpgQw54vNeXFsGpoQ@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cb297a3e
  24. 06 Dec 2011: 2 commits
    • sched/rt: Code cleanup, remove a redundant function call · 5b680fd6
      Authored by Shan Hai
      The second call to sched_rt_period() is redundant, because the value of the
      rt_runtime was already read and it was protected by the ->rt_runtime_lock.
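      
      A hedged reconstruction of the kind of cleanup described (the call
      site, sched_rt_runtime_exceeded(), and the exact redundant call are
      assumptions here; the pattern is simply reusing the value already
      cached in a local under ->rt_runtime_lock instead of re-reading it
      through the accessor):
      
        static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
        {
                u64 runtime = sched_rt_runtime(rt_rq);  /* read once, lock held */
      
                if (rt_rq->rt_throttled)
                        return rt_rq_throttled(rt_rq);
      
                /* reuse 'runtime' instead of a second accessor call */
                if (runtime >= sched_rt_period(rt_rq))
                        return 0;
      
                /* ... throttling logic ... */
                return 0;
        }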
      Signed-off-by: Shan Hai <haishan.bai@gmail.com>
      Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5b680fd6
    • sched: Use rt.nr_cpus_allowed to recover select_task_rq() cycles · 76854c7e
      Authored by Mike Galbraith
      rt.nr_cpus_allowed is always available; use it to bail out of
      select_task_rq() when only one cpu can be used, saving some cycles
      for pinned tasks.
      
      See the line marked with '*' below:
      
        # taskset -c 3 pipe-test
      
         PerfTop:     997 irqs/sec  kernel:89.5%  exact:  0.0% [1000Hz cycles],  (all, CPU: 3)
      ------------------------------------------------------------------------------------------------
      
                   Virgin                                    Patched
                   samples  pcnt function                    samples  pcnt function
                   _______ _____ ___________________________ _______ _____ ___________________________
      
                   2880.00 10.2% __schedule                  3136.00 11.3% __schedule
                   1634.00  5.8% pipe_read                   1615.00  5.8% pipe_read
                   1458.00  5.2% system_call                 1534.00  5.5% system_call
                   1382.00  4.9% _raw_spin_lock_irqsave      1412.00  5.1% _raw_spin_lock_irqsave
                   1202.00  4.3% pipe_write                  1255.00  4.5% copy_user_generic_string
                   1164.00  4.1% copy_user_generic_string    1241.00  4.5% __switch_to
                   1097.00  3.9% __switch_to                  929.00  3.3% mutex_lock
                    872.00  3.1% mutex_lock                   846.00  3.0% mutex_unlock
                    687.00  2.4% mutex_unlock                 804.00  2.9% pipe_write
                    682.00  2.4% native_sched_clock           713.00  2.6% native_sched_clock
                    643.00  2.3% system_call_after_swapgs     653.00  2.3% _raw_spin_unlock_irqrestore
                    617.00  2.2% sched_clock_local            633.00  2.3% fsnotify
                    612.00  2.2% fsnotify                     605.00  2.2% sched_clock_local
                    596.00  2.1% _raw_spin_unlock_irqrestore  593.00  2.1% system_call_after_swapgs
                    542.00  1.9% sysret_check                 559.00  2.0% sysret_check
                    467.00  1.7% fget_light                   472.00  1.7% fget_light
                    462.00  1.6% finish_task_switch           461.00  1.7% finish_task_switch
                    437.00  1.5% vfs_write                    442.00  1.6% vfs_write
                    431.00  1.5% do_sync_write                428.00  1.5% do_sync_write
      *             413.00  1.5% select_task_rq_fair          404.00  1.5% _raw_spin_lock_irq
                    386.00  1.4% update_curr                  402.00  1.4% update_curr
                    385.00  1.4% rw_verify_area               389.00  1.4% do_sync_read
                    377.00  1.3% _raw_spin_lock_irq           378.00  1.4% vfs_read
                    369.00  1.3% do_sync_read                 340.00  1.2% pipe_iov_copy_from_user
                    360.00  1.3% vfs_read                     316.00  1.1% __wake_up_sync_key
                    342.00  1.2% hrtick_start_fair            313.00  1.1% __wake_up_common
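      
      The bail-out being described might look like the sketch below (hedged;
      the exact placement inside select_task_rq() and the surrounding
      fallback handling are condensed from the kernels of that era and are
      illustrative, not the literal patch):
      
        static inline
        int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
        {
                int cpu;
      
                /* A task pinned to a single cpu has nothing to decide. */
                if (p->rt.nr_cpus_allowed == 1)
                        cpu = task_cpu(p);
                else
                        cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
      
                /* ... usual fallback if the chosen cpu is not allowed or offline ... */
                return cpu;
        }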
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      76854c7e
  25. 17 Nov 2011: 2 commits
  26. 16 Nov 2011: 1 commit
  27. 14 Nov 2011: 1 commit
  28. 06 Oct 2011: 3 commits
  29. 18 Sep 2011: 1 commit
  30. 14 Aug 2011: 2 commits
    • sched: Implement hierarchical task accounting for SCHED_OTHER · 953bfcd1
      Authored by Paul Turner
      Introduce hierarchical task accounting for the group scheduling case
      in CFS, and promote the responsibility for maintaining rq->nr_running
      to the scheduling classes.
      
      The primary motivation for this is that with scheduling classes
      supporting bandwidth throttling, it is possible for entities
      participating in throttled sub-trees to not have root-visible
      changes in rq->nr_running across activate and deactivate operations.
      This in turn leads to incorrect idle and weight-per-task load
      balance decisions.
      
      This also allows us to make a small fixlet to the fastpath in pick_next_task()
      under group scheduling.
      
      Note: this issue also exists with the existing sched_rt throttling mechanism.
      This patch does not address that.
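      
      A hedged sketch of the accounting shape this introduces on the CFS
      side (the h_nr_running field and the inc_nr_running() helper follow
      the kernel of that era; the enqueue path is heavily condensed):
      
        static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
        {
                struct cfs_rq *cfs_rq;
                struct sched_entity *se = &p->se;
      
                for_each_sched_entity(se) {
                        if (se->on_rq)
                                break;
                        cfs_rq = cfs_rq_of(se);
                        enqueue_entity(cfs_rq, se, flags);
                        /* count the task at every group level it became visible at */
                        cfs_rq->h_nr_running++;
                        flags = ENQUEUE_WAKEUP;
                }
      
                /* ... a second walk bumps h_nr_running on already-queued ancestors ... */
      
                /* the class, not the core, now maintains rq->nr_running */
                inc_nr_running(rq);
        }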
      Signed-off-by: Paul Turner <pjt@google.com>
      Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110721184756.878333391@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      953bfcd1
    • sched: Use pushable_tasks to determine next highest prio · 5181f4a4
      Authored by Steven Rostedt
      Hillf Danton proposed a patch (see link) that cleaned up the
      sched_rt code that calculates the priority of the next highest priority
      task to be used in finding run queues to pull from.
      
      His patch removed the calculation of the next prio and just used the
      current prio when determining whether we should examine a run queue
      to pull from. The problem with his patch was that it caused more
      false checks: we check a run queue for pushable tasks if the current
      priority of that run queue is higher than that of the task about to
      run on our run queue, but after grabbing the locks and doing the
      real check, we may find there is no higher-prio task to pull. Thus
      the locks were taken with nothing to do.
      
      I added some trace_printks() to record when and how many times the run queue
      locks were taken to check for pullable tasks, compared to how many times we
      pulled a task.
      
      With the current method, it was:
      
        3806 locks taken vs 2812 pulled tasks
      
      With Hillf's patch:
      
        6728 locks taken vs 2804 pulled tasks
      
      The number of times locks were taken to pull a task almost doubled,
      with no better success rate.
      
      But his patch did get me thinking. When we look at the priority of
      the highest task to consider taking the locks to do a pull, a
      failure to pull can be one of the following (in order of likelihood):
      
       o RT task was pushed off already between the check and taking the lock
       o Waiting RT task cannot be migrated
       o RT task's CPU affinity does not include the target run queue's CPU
       o RT task's priority changed between the check and taking the lock
      
      And with Hillf's patch, the thing that caused most of the failures
      was that the RT task to pull was not at a high enough priority (not
      greater than the priority of the current RT task on the target run
      queue).
      
      Most of the above cases we can't help. But the current method does
      not check whether the next highest prio RT task can be migrated, and
      if it cannot, we still grab the locks to do the test (we don't find
      out about this until after we have the locks). I thought about this
      case, and realized that the pushable task plist that is maintained
      only holds RT tasks that can migrate. If we move the calculation of
      the next highest prio task from the inc/dec_rt_task() functions into
      the queuing of the pushable tasks, then we only measure the
      priorities of those tasks that we push, and we get this basically
      for free.
      
      Not only does this patch make the code a little more efficient, it cleans it
      up and makes it a little simpler.
      
      Thanks to Hillf Danton for inspiring me on this patch.
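      
      A hedged sketch of where the next-highest-prio bookkeeping moves to
      (helper names such as has_pushable_tasks() and the highest_prio.next
      field follow kernel/sched/rt.c of that era; bodies are condensed):
      
        static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
        {
                plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
                plist_node_init(&p->pushable_tasks, p->prio);
                plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
      
                /* only migratable tasks live here, so this is the pull prio */
                if (p->prio < rq->rt.highest_prio.next)
                        rq->rt.highest_prio.next = p->prio;
        }
      
        static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
        {
                plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
      
                /* re-derive the next-highest prio from the remaining plist head */
                if (has_pushable_tasks(rq)) {
                        p = plist_first_entry(&rq->rt.pushable_tasks,
                                              struct task_struct, pushable_tasks);
                        rq->rt.highest_prio.next = p->prio;
                } else {
                        rq->rt.highest_prio.next = MAX_RT_PRIO;
                }
        }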
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Link: http://lkml.kernel.org/r/BANLkTimQ67180HxCx5vgMqumqw1EkFh3qg@mail.gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5181f4a4