1. 07 12月, 2009 1 次提交
    • P
      sched: Fix balance vs hotplug race · 6ad4c188
      Peter Zijlstra 提交于
      Since (e761b772: cpu hotplug, sched: Introduce cpu_active_map and redo
      sched domain managment) we have cpu_active_mask which is suppose to rule
      scheduler migration and load-balancing, except it never (fully) did.
      
      The particular problem being solved here is a crash in try_to_wake_up()
      where select_task_rq() ends up selecting an offline cpu because
      select_task_rq_fair() trusts the sched_domain tree to reflect the
      current state of affairs, similarly select_task_rq_rt() trusts the
      root_domain.
      
      However, the sched_domains are updated from CPU_DEAD, which is after the
      cpu is taken offline and after stop_machine is done. Therefore it can
      race perfectly well with code assuming the domains are right.
      
      Cure this by building the domains from cpu_active_mask on
      CPU_DOWN_PREPARE.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6ad4c188
  2. 03 12月, 2009 3 次提交
    • F
      mutex: Fix missing conditions to build mutex_spin_on_owner() · c08f7829
      Frederic Weisbecker 提交于
      We don't need to build mutex_spin_on_owner() if we have
      CONFIG_DEBUG_MUTEXES or CONFIG_HAVE_DEFAULT_NO_SPIN_MUTEXES as
      it won't be used under such configs.
      
      Use CONFIG_MUTEX_SPIN_ON_OWNER as it gathers all the necessary
      checks before building it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1259783357-8542-2-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      c08f7829
    • H
      sched, cputime: Introduce thread_group_times() · 0cf55e1e
      Hidetoshi Seto 提交于
      This is a real fix for problem of utime/stime values decreasing
      described in the thread:
      
         http://lkml.org/lkml/2009/11/3/522
      
      Now cputime is accounted in the following way:
      
       - {u,s}time in task_struct are increased every time when the thread
         is interrupted by a tick (timer interrupt).
      
       - When a thread exits, its {u,s}time are added to signal->{u,s}time,
         after adjusted by task_times().
      
       - When all threads in a thread_group exits, accumulated {u,s}time
         (and also c{u,s}time) in signal struct are added to c{u,s}time
         in signal struct of the group's parent.
      
      So {u,s}time in task struct are "raw" tick count, while
      {u,s}time and c{u,s}time in signal struct are "adjusted" values.
      
      And accounted values are used by:
      
       - task_times(), to get cputime of a thread:
         This function returns adjusted values that originates from raw
         {u,s}time and scaled by sum_exec_runtime that accounted by CFS.
      
       - thread_group_cputime(), to get cputime of a thread group:
         This function returns sum of all {u,s}time of living threads in
         the group, plus {u,s}time in the signal struct that is sum of
         adjusted cputimes of all exited threads belonged to the group.
      
      The problem is the return value of thread_group_cputime(),
      because it is mixed sum of "raw" value and "adjusted" value:
      
        group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
      
      This misbehavior can break {u,s}time monotonicity.
      Assume that if there is a thread that have raw values greater
      than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
      but only runs 45ms) and if it exits, cputime will decrease (e.g.
      -5ms).
      
      To fix this, we could do:
      
        group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
      
      But task_times() contains hard divisions, so applying it for
      every thread should be avoided.
      
      This patch fixes the above problem in the following way:
      
       - Modify thread's exit (= __exit_signal()) not to use task_times().
         It means {u,s}time in signal struct accumulates raw values instead
         of adjusted values.  As the result it makes thread_group_cputime()
         to return pure sum of "raw" values.
      
       - Introduce a new function thread_group_times(*task, *utime, *stime)
         that converts "raw" values of thread_group_cputime() to "adjusted"
         values, in same calculation procedure as task_times().
      
       - Modify group's exit (= wait_task_zombie()) to use this introduced
         thread_group_times().  It make c{u,s}time in signal struct to
         have adjusted values like before this patch.
      
       - Replace some thread_group_cputime() by thread_group_times().
         This replacements are only applied where conveys the "adjusted"
         cputime to users, and where already uses task_times() near by it.
         (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
      
      This patch have a positive side effect:
      
       - Before this patch, if a group contains many short-life threads
         (e.g. runs 0.9ms and not interrupted by ticks), the group's
         cputime could be invisible since thread's cputime was accumulated
         after adjusted: imagine adjustment function as adj(ticks, runtime),
           {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
         After this patch it will not happen because the adjustment is
         applied after accumulated.
      
      v2:
       - remove if()s, put new variables into signal_struct.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0cf55e1e
    • H
      sched, cputime: Cleanups related to task_times() · d99ca3b9
      Hidetoshi Seto 提交于
      - Remove if({u,s}t)s because no one call it with NULL now.
      - Use cputime_{add,sub}().
      - Add ifndef-endif for prev_{u,s}time since they are used
        only when !VIRT_CPU_ACCOUNTING.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d99ca3b9
  3. 02 12月, 2009 2 次提交
    • R
      sched: Fix isolcpus boot option · bdddd296
      Rusty Russell 提交于
      Anton Blanchard wrote:
      
      > We allocate and zero cpu_isolated_map after the isolcpus
      > __setup option has run. This means cpu_isolated_map always
      > ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
      > cpumask that hasn't been allocated.
      
      I introduced this regression in 49557e62 (sched: Fix
      boot crash by zalloc()ing most of the cpu masks).
      
      Use the bootmem allocator if they set isolcpus=, otherwise
      allocate and zero like normal.
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: peterz@infradead.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Tested-by: NAnton Blanchard <anton@samba.org>
      bdddd296
    • T
      sched: Revert 498657a4 · 8592e648
      Tejun Heo 提交于
      498657a4 incorrectly assumed
      that preempt wasn't disabled around context_switch() and thus
      was fixing imaginary problem.  It also broke KVM because it
      depended on ->sched_in() to be called with irq enabled so that
      it can do smp calls from there.
      
      Revert the incorrect commit and add comment describing different
      contexts under with the two callbacks are invoked.
      
      Avi: spotted transposed in/out in the added comment.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Cc: peterz@infradead.org
      Cc: efault@gmx.de
      Cc: rusty@rustcorp.com.au
      LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8592e648
  4. 26 11月, 2009 4 次提交
    • H
      sched, time: Define nsecs_to_jiffies() · b7b20df9
      Hidetoshi Seto 提交于
      Use of msecs_to_jiffies() for nsecs_to_cputime() have some
      problems:
      
       - The type of msecs_to_jiffies()'s argument is unsigned int, so
         it cannot convert msecs greater than UINT_MAX = about 49.7 days.
      
       - msecs_to_jiffies() returns MAX_JIFFY_OFFSET if MSB of argument
         is set, assuming that input was negative value.  So it cannot
         convert msecs greater than INT_MAX = about 24.8 days too.
      
      This patch defines a new function nsecs_to_jiffies() that can
      deal greater values, and that can deal all incoming values as
      unsigned.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Amrico Wang <xiyou.wangcong@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4B0E16E7.5070307@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b7b20df9
    • H
      sched: Remove task_{u,s,g}time() · d5b7c78e
      Hidetoshi Seto 提交于
      Now all task_{u,s}time() pairs are replaced by task_times().
      And task_gtime() is too simple to be an inline function.
      
      Cleanup them all.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16D1.70902@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d5b7c78e
    • H
      sched: Introduce task_times() to replace task_{u,s}time() pair · d180c5bc
      Hidetoshi Seto 提交于
      Functions task_{u,s}time() are called in pair in almost all
      cases.  However task_stime() is implemented to call task_utime()
      from its inside, so such paired calls run task_utime() twice.
      
      It means we do heavy divisions (div_u64 + do_div) twice to get
      utime and stime which can be obtained at same time by one set
      of divisions.
      
      This patch introduces a function task_times(*tsk, *utime,
      *stime) to retrieve utime and stime at once in better, optimized
      way.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16AE.906@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d180c5bc
    • M
      sched: Limit the number of scheduler debug messages · f6630114
      Mike Travis 提交于
      Remove the verbose scheduler debug messages unless kernel
      parameter "sched_debug" set.  /proc/sched_debug unchanged.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091118002221.489305000@alcatraz.americas.sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f6630114
  5. 25 11月, 2009 1 次提交
  6. 24 11月, 2009 2 次提交
    • T
      sched: Optimize branch hint in context_switch() · 710390d9
      Tim Blechmann 提交于
      Branch hint profiling on my nehalem machine showed over 90%
      incorrect branch hints:
      
        10420275 170645395  94 context_switch                 sched.c
         3043
        10408421 171098521  94 context_switch                 sched.c
         3050
      Signed-off-by: NTim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBB9F.6080304@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      710390d9
    • J
      sched_feat_write(): Update ppos instead of file->f_pos · 42994724
      Jan Blunck 提交于
      sched_feat_write() should update ppos instead of file->f_pos.
      
      (This reduces some BKL dependencies of this code.)
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Cc: jkacur@redhat.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      LKML-Reference: <1258735245-25826-8-git-send-email-jblunck@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      42994724
  7. 16 11月, 2009 1 次提交
  8. 15 11月, 2009 1 次提交
  9. 12 11月, 2009 2 次提交
    • H
      sched: Fix granularity of task_u/stime() · 761b1d26
      Hidetoshi Seto 提交于
      Originally task_s/utime() were designed to return clock_t but
      later changed to return cputime_t by following commit:
      
        commit efe567fc
        Author: Christian Borntraeger <borntraeger@de.ibm.com>
        Date:   Thu Aug 23 15:18:02 2007 +0200
      
      It only changed the type of return value, but not the
      implementation. As the result the granularity of task_s/utime()
      is still that of clock_t, not that of cputime_t.
      
      So using task_s/utime() in __exit_signal() makes values
      accumulated to the signal struct to be rounded and coarse
      grained.
      
      This patch removes casts to clock_t in task_u/stime(), to keep
      granularity of cputime_t over the calculation.
      
      v2:
        Use div_u64() to avoid error "undefined reference to `__udivdi3`"
        on some 32bit systems.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: xiyou.wangcong@gmail.com
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      761b1d26
    • M
      sched: Fix/add missing update_rq_clock() calls · 055a0086
      Mike Galbraith 提交于
      kthread_bind(), migrate_task() and sched_fork were missing
      updates, and try_to_wake_up() was updating after having already
      used the stale clock.
      
      Aside from preventing potential latency hits, there' a side
      benefit in that early boot printk time stamps become monotonic.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1258020464.6491.2.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
      055a0086
  10. 11 11月, 2009 2 次提交
  11. 10 11月, 2009 1 次提交
  12. 08 11月, 2009 3 次提交
  13. 05 11月, 2009 1 次提交
  14. 04 11月, 2009 2 次提交
  15. 03 11月, 2009 1 次提交
  16. 02 11月, 2009 1 次提交
  17. 28 10月, 2009 1 次提交
    • J
      sched: move rq_weight data array out of .percpu · 4a6cc4bd
      Jiri Kosina 提交于
      Commit 34d76c41 introduced percpu array update_shares_data, size of which
      being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
      NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
      
      Fix this by allocating this array dynamically and keep only pointer to it
      percpu.
      
      The per-cpu handling doesn't impose significant performance penalty on
      potentially contented path in tg_shares_up().
      
      ...
      ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
      ffffffff81043383:       00 00
      ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
      ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff81043393:       00
      ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff8104339b:       00
      ffffffff8104339c:       48 01 d0                add    %rdx,%rax
      ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
      ffffffff810433a6:       00
      ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
      ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
      ...
      
      After:
      
      ...
      ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
      ffffffff81043383:       00
      ffffffff81043384:       48 98                   cltq
      ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
      ffffffff8104338d:       00
      ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
      ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
      ffffffff8104339c:       81
      ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff810433a4:       00
      ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
      ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff810433b5:       00
      ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
      ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
      ...
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4a6cc4bd
  18. 26 10月, 2009 1 次提交
  19. 24 10月, 2009 1 次提交
    • M
      sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac
      Mike Galbraith 提交于
      This patch restores the effectiveness of LAST_BUDDY in preventing
      pgsql+oltp from collapsing due to wakeup preemption. It also
      switches LAST_BUDDY to exclusively do what it does best, namely
      mitigate the effects of aggressive wakeup preemption, which
      improves vmark throughput markedly, and restores mysql+oltp
      scalability.
      
      Since buddies are about scalability, enable them beginning at the
      point where we begin expanding sched_latency, namely
      sched_nr_latency. Previously, buddies were cleared aggressively,
      which seriously reduced their effectiveness. Not clearing
      aggressively however, produces a small drop in mysql+oltp
      throughput immediately after peak, indicating that LAST_BUDDY is
      actually doing some harm. This is right at the point where X on the
      desktop in competition with another load wants low latency service.
      Ergo, do not enable until we need to scale.
      
      To mitigate latency induced by buddies, or by a task just missing
      wakeup preemption, check latency at tick time.
      
      Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
      CACHE_HOT_BUDDY.
      
      Supporting performance tests:
      
       tip   = v2.6.32-rc5-1497-ga525b32
       tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
       tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
      
      (Three run averages except where noted.)
      
       vmark:
       ------
       tip           108466 messages per second
       tip+          125307 messages per second
       tip+x         125335 messages per second
       tipx          117781 messages per second
       2.6.31.3      122729 messages per second
      
       mysql+oltp:
       -----------
       clients          1        2        4        8       16       32       64        128    256
       ..........................................................................................
       tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
       tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
       tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
       2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
      
       pgsql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
       tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
       tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
       tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
       2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f685ceac
  20. 12 10月, 2009 1 次提交
  21. 09 10月, 2009 2 次提交
  22. 06 10月, 2009 1 次提交
  23. 05 10月, 2009 1 次提交
  24. 02 10月, 2009 1 次提交
  25. 24 9月, 2009 2 次提交
  26. 22 9月, 2009 1 次提交
    • A
      cpuidle: fix the menu governor to boost IO performance · 69d25870
      Arjan van de Ven 提交于
      Fix the menu idle governor which balances power savings, energy efficiency
      and performance impact.
      
      The reason for a reworked governor is that there have been serious
      performance issues reported with the existing code on Nehalem server
      systems.
      
      To show this I'm sure Andrew wants to see benchmark results:
      (benchmark is "fio", "no cstates" is using "idle=poll")
      
      		no cstates	current linux	new algorithm
      1 disk		107 Mb/s	85 Mb/s		105 Mb/s
      2 disks		215 Mb/s	123 Mb/s	209 Mb/s
      12 disks	590 Mb/s	320 Mb/s	585 Mb/s
      
      In various power benchmark measurements, no degredation was found by our
      measurement&diagnostics team.  Obviously a small percentage more power was
      used in the "fio" benchmark, due to the much higher performance.
      
      While it would be a novel idea to describe the new algorithm in this
      commit message, I cheaped out and described it in comments in the code
      instead.
      
      [changes since first post: spelling fixes from akpm, review feedback,
      folded menu-tng into menu.c]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69d25870