1. 04 Mar 2011 (1 commit)
    • sched: Allow users with sufficient RLIMIT_NICE to change from SCHED_IDLE policy · c02aa73b
      Authored by Darren Hart
      The current scheduler implementation returns -EPERM when trying to
      change from SCHED_IDLE to SCHED_OTHER or SCHED_BATCH. Since SCHED_IDLE
      is considered to be a nice 20 on steroids, changing to another policy
      should be allowed provided the RLIMIT_NICE is accounted for.
      
      This patch allows the following test-case to pass with RLIMIT_NICE=40,
      but still fail with RLIMIT_NICE=10 when the calling process is run
      from a typical shell (nice 0, or 20 in rlimit terms).  A setrlimit()
      sketch follows this entry.
      
      #define _GNU_SOURCE	/* glibc exposes SCHED_IDLE in <sched.h> under _GNU_SOURCE */
      #include <stdio.h>
      #include <sched.h>

      int main(void)
      {
      	int ret;
      	struct sched_param sp;
      	sp.sched_priority = 0;

      	/* switch to SCHED_IDLE */
      	ret = sched_setscheduler(0, SCHED_IDLE, &sp);
      	printf("setscheduler IDLE: %d\n", ret);
      	if (ret)
      		return ret;

      	/* switch back to SCHED_OTHER */
      	ret = sched_setscheduler(0, SCHED_OTHER, &sp);
      	printf("setscheduler OTHER: %d\n", ret);

      	return ret;
      }
      
       $ ulimit -e
       40
       $ ./test
       setscheduler IDLE: 0
       setscheduler OTHER: 0
      
       $ ulimit -e 10
       $ ulimit -e
       10
       $ ./test
       setscheduler IDLE: 0
       setscheduler OTHER: -1
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
      LKML-Reference: <4D657BEE.4040608@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c02aa73b
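      A minimal sketch (not part of the patch) of raising RLIMIT_NICE
      programmatically with the standard getrlimit(2)/setrlimit(2) API
      instead of the shell's ulimit -e; the value 40 mirrors the passing
      case above:

      #include <stdio.h>
      #include <sys/resource.h>

      int main(void)
      {
      	/* RLIMIT_NICE is expressed as 20 - nice, i.e. the range 1..40. */
      	struct rlimit rl = { .rlim_cur = 40, .rlim_max = 40 };

      	if (setrlimit(RLIMIT_NICE, &rl)) {
      		/* Raising the hard limit needs CAP_SYS_RESOURCE. */
      		perror("setrlimit(RLIMIT_NICE)");
      		return 1;
      	}

      	if (getrlimit(RLIMIT_NICE, &rl) == 0)
      		printf("RLIMIT_NICE soft=%ld hard=%ld\n",
      		       (long)rl.rlim_cur, (long)rl.rlim_max);
      	return 0;
      }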
  2. 26 Feb 2011 (1 commit)
    • sched: Clean up the IRQ_TIME_ACCOUNTING code · 544b4a1f
      Authored by Venkatesh Pallipadi
      Fix this warning:
      
        lkml.org/lkml/2011/1/30/124
      
       kernel/sched.c:3719: warning: 'irqtime_account_idle_ticks' defined but not used
       kernel/sched.c:3720: warning: 'irqtime_account_process_tick' defined but not used
      
      In a cleaner way than:
      
       7e949870: sched: Add #ifdef around irq time accounting functions
      
      This patch will not have any functional impact.  (A generic
      stub-pattern sketch follows this entry.)
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Cc: heiko.carstens@de.ibm.com
      Cc: a.p.zijlstra@chello.nl
      LKML-Reference: <1298675596-10992-1-git-send-email-venki@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      544b4a1f
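      A hand-written sketch (not the patch itself) of the stub pattern that
      avoids such "defined but not used" warnings: when the config option
      is off, the functions become empty static inlines, so every call
      site compiles unconditionally and no warning is emitted:

      #ifdef CONFIG_IRQ_TIME_ACCOUNTING

      static void irqtime_account_idle_ticks(int ticks)
      {
      	/* real irq-time accounting work goes here */
      }

      #else /* !CONFIG_IRQ_TIME_ACCOUNTING */

      /* Empty inline stubs: no code, and no unused-function warnings. */
      static inline void irqtime_account_idle_ticks(int ticks) { }

      #endif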
  3. 25 Feb 2011 (1 commit)
  4. 03 Feb 2011 (2 commits)
    • sched: Add yield_to(task, preempt) functionality · d95f4122
      Authored by Mike Galbraith
      Currently only implemented for fair class tasks.
      
      Add a yield_to_task() method to the fair scheduling class, allowing
      the caller of yield_to() to accelerate another thread in its thread
      group or task group.

      Implemented via a scheduler hint, using cfs_rq->next to encourage
      the target being selected.  We can rely on pick_next_entity to keep
      things fair, so no one can accelerate a thread that has already used
      its fair share of CPU time.

      This also means callers should only call yield_to() when they really
      mean it.  Calling it too often can result in the scheduler just
      ignoring the hint.  (A caller sketch follows this entry.)
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d95f4122
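      A hedged kernel-style caller sketch, loosely modeled on lock-holder
      boosting in a hypervisor; vcpu_task() is hypothetical, and treating
      yield_to()'s return as "hint taken or not" is assumed from the
      description above:

      #include <linux/sched.h>

      /* Hypothetical lookup of the task backing a vCPU. */
      extern struct task_struct *vcpu_task(int idx);

      static void boost_lock_holder(int holder)
      {
      	struct task_struct *p = vcpu_task(holder);

      	/* Hint the scheduler to run the lock holder next,
      	 * preempting us, rather than spinning uselessly. */
      	if (p && yield_to(p, true))
      		return;		/* hint was taken */

      	yield();		/* fall back to a plain yield */
      }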
    • sched: Use a buddy to implement yield_task_fair() · ac53db59
      Authored by Rik van Riel
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" tasks to be different processes in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.  (A simplified ordering
      sketch follows this entry.)
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ac53db59
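      A simplified sketch (not the kernel's exact code) of the ordering
      described above; leftmost_entity(), second_leftmost_entity() and
      buddy_eligible() stand in for the real tree walk and fairness check
      and are hypothetical:

      static struct sched_entity *pick_next_entity_sketch(struct cfs_rq *cfs_rq)
      {
      	struct sched_entity *se = leftmost_entity(cfs_rq);	/* hypothetical */

      	/* 1) yield: skip the entity that just yielded */
      	if (cfs_rq->skip == se)
      		se = second_leftmost_entity(cfs_rq);		/* hypothetical */

      	/* 2) last: prefer the cache-hot previous entity if that stays fair */
      	if (cfs_rq->last && buddy_eligible(cfs_rq, cfs_rq->last))
      		se = cfs_rq->last;

      	/* 3) next: checked last, so it can override the yield hint */
      	if (cfs_rq->next && buddy_eligible(cfs_rq, cfs_rq->next))
      		se = cfs_rq->next;

      	return se;
      }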
  5. 27 Jan 2011 (1 commit)
  6. 26 Jan 2011 (7 commits)
  7. 19 Jan 2011 (1 commit)
  8. 18 Jan 2011 (2 commits)
  9. 07 Jan 2011 (3 commits)
  10. 05 Jan 2011 (2 commits)
    • sched: Change wait_for_completion_*_timeout() to return a signed long · 6bf41237
      Authored by NeilBrown
      wait_for_completion_*_timeout() can return:
      
         0: if the wait timed out
       -ve: if the wait was interrupted
       +ve: if the completion was completed.
      
      As they currently return an 'unsigned long', the last two cases
      are not easily distinguished, which can easily result in buggy
      code, as is the case for the recently added
      wait_for_completion_interruptible_timeout() call in
      net/sunrpc/cache.c.

      So change them both to return 'long'.  As MAX_SCHEDULE_TIMEOUT
      is LONG_MAX, a large +ve return value should never overflow.
      (A sketch of the resulting three-way check follows this entry.)
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110105125016.64ccab0e@notabene.brown>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6bf41237
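      A sketch of the three-way check the signed return type enables;
      my_done and the one-second (HZ) timeout are illustrative:

      #include <linux/completion.h>
      #include <linux/errno.h>
      #include <linux/jiffies.h>

      static int wait_for_my_event(struct completion *my_done)
      {
      	long ret;	/* signed: all three outcomes are distinguishable */

      	ret = wait_for_completion_interruptible_timeout(my_done, HZ);
      	if (ret < 0)
      		return ret;		/* interrupted, e.g. -ERESTARTSYS */
      	if (ret == 0)
      		return -ETIMEDOUT;	/* wait timed out */
      	return 0;			/* completed; ret is jiffies left */
      }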
    • [S390] mutex: Introduce arch_mutex_cpu_relax() · 34b133f8
      Authored by Gerald Schaefer
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in these specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.

      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect architectures other than System z for now.
      (The guard pattern is sketched after this entry.)
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      34b133f8
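      The default can be provided with a guard along these lines (a sketch
      of the pattern the commit describes; exact header placement is
      simplified):

      #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
      /* Default: plain cpu_relax(), as before this patch. */
      #define arch_mutex_cpu_relax()	cpu_relax()
      #else
      /* Architectures selecting HAVE_ARCH_MUTEX_CPU_RELAX supply their own. */
      extern void arch_mutex_cpu_relax(void);
      #endif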
  11. 04 Jan 2011 (1 commit)
  12. 20 Dec 2010 (1 commit)
  13. 16 Dec 2010 (3 commits)
  14. 09 Dec 2010 (3 commits)
  15. 30 Nov 2010 (3 commits)
    • sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Authored by Mike Galbraith
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the previous task group is dropped.  Child
      processes inherit this task group thereafter, and increase its
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable by setting its nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of a nice <level>
      task.  Setting the nice level is rate limited for !admin users
      due to the abuse risk of task group locking.
      
      The feature is enabled at boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:

        echo [01] > /proc/sys/kernel/sched_autogroup_enabled

      ... which will automatically move tasks to/from the root task
      group.  (A read-out sketch follows this entry.)
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5091faa4
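      A small userspace sketch (illustrative, not from the patch) that
      reads the caller's autogroup via the proc interface described above;
      the output format shown in the comment is an assumption:

      #include <stdio.h>

      int main(void)
      {
      	char buf[128];
      	FILE *f = fopen("/proc/self/autogroup", "r");

      	if (!f) {
      		/* likely a kernel without CONFIG_SCHED_AUTOGROUP */
      		perror("fopen(/proc/self/autogroup)");
      		return 1;
      	}
      	if (fgets(buf, sizeof(buf), f))
      		fputs(buf, stdout);	/* e.g. "/autogroup-123 nice 0" */
      	fclose(f);
      	return 0;
      }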
    • sched: Fix unregister_fair_sched_group() · 822bc180
      Authored by Paul Turner
      In the flipping and flopping between calling
      unregister_fair_sched_group() on a per-cpu versus per-group basis
      we ended up in a bad state.
      
      Remove from the list for the passed cpu as opposed to some
      arbitrary index.
      
      ( This fixes explosions w/ autogroup as well as a group
        creation/destruction stress test. )
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20101130005740.080828123@google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      822bc180
    • rcu,cleanup: move synchronize_sched_expedited() out of sched.c · 7b27d547
      Authored by Lai Jiangshan
      The first version of synchronize_sched_expedited() used the migration
      code in the scheduler, and was therefore implemented in kernel/sched.c.
      However, the more recent version of this code no longer uses the
      migration code, so this commit moves it to the main RCU source files.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      7b27d547
  16. 26 Nov 2010 (2 commits)
  17. 23 Nov 2010 (1 commit)
  18. 18 Nov 2010 (5 commits)