1. 03 12月, 2009 2 次提交
    • H
      sched, cputime: Introduce thread_group_times() · 0cf55e1e
      Hidetoshi Seto 提交于
      This is a real fix for problem of utime/stime values decreasing
      described in the thread:
      
         http://lkml.org/lkml/2009/11/3/522
      
      Now cputime is accounted in the following way:
      
       - {u,s}time in task_struct are increased every time when the thread
         is interrupted by a tick (timer interrupt).
      
       - When a thread exits, its {u,s}time are added to signal->{u,s}time,
         after adjusted by task_times().
      
       - When all threads in a thread_group exits, accumulated {u,s}time
         (and also c{u,s}time) in signal struct are added to c{u,s}time
         in signal struct of the group's parent.
      
      So {u,s}time in task struct are "raw" tick count, while
      {u,s}time and c{u,s}time in signal struct are "adjusted" values.
      
      And accounted values are used by:
      
       - task_times(), to get cputime of a thread:
         This function returns adjusted values that originates from raw
         {u,s}time and scaled by sum_exec_runtime that accounted by CFS.
      
       - thread_group_cputime(), to get cputime of a thread group:
         This function returns sum of all {u,s}time of living threads in
         the group, plus {u,s}time in the signal struct that is sum of
         adjusted cputimes of all exited threads belonged to the group.
      
      The problem is the return value of thread_group_cputime(),
      because it is mixed sum of "raw" value and "adjusted" value:
      
        group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time)
      
      This misbehavior can break {u,s}time monotonicity.
      Assume that if there is a thread that have raw values greater
      than adjusted values (e.g. interrupted by 1000Hz ticks 50 times
      but only runs 45ms) and if it exits, cputime will decrease (e.g.
      -5ms).
      
      To fix this, we could do:
      
        group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time)
      
      But task_times() contains hard divisions, so applying it for
      every thread should be avoided.
      
      This patch fixes the above problem in the following way:
      
       - Modify thread's exit (= __exit_signal()) not to use task_times().
         It means {u,s}time in signal struct accumulates raw values instead
         of adjusted values.  As the result it makes thread_group_cputime()
         to return pure sum of "raw" values.
      
       - Introduce a new function thread_group_times(*task, *utime, *stime)
         that converts "raw" values of thread_group_cputime() to "adjusted"
         values, in same calculation procedure as task_times().
      
       - Modify group's exit (= wait_task_zombie()) to use this introduced
         thread_group_times().  It make c{u,s}time in signal struct to
         have adjusted values like before this patch.
      
       - Replace some thread_group_cputime() by thread_group_times().
         This replacements are only applied where conveys the "adjusted"
         cputime to users, and where already uses task_times() near by it.
         (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.)
      
      This patch have a positive side effect:
      
       - Before this patch, if a group contains many short-life threads
         (e.g. runs 0.9ms and not interrupted by ticks), the group's
         cputime could be invisible since thread's cputime was accumulated
         after adjusted: imagine adjustment function as adj(ticks, runtime),
           {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0.
         After this patch it will not happen because the adjustment is
         applied after accumulated.
      
      v2:
       - remove if()s, put new variables into signal_struct.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B162517.8040909@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0cf55e1e
    • H
      sched, cputime: Cleanups related to task_times() · d99ca3b9
      Hidetoshi Seto 提交于
      - Remove if({u,s}t)s because no one call it with NULL now.
      - Use cputime_{add,sub}().
      - Add ifndef-endif for prev_{u,s}time since they are used
        only when !VIRT_CPU_ACCOUNTING.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4B1624C7.7040302@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d99ca3b9
  2. 02 12月, 2009 2 次提交
    • R
      sched: Fix isolcpus boot option · bdddd296
      Rusty Russell 提交于
      Anton Blanchard wrote:
      
      > We allocate and zero cpu_isolated_map after the isolcpus
      > __setup option has run. This means cpu_isolated_map always
      > ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
      > cpumask that hasn't been allocated.
      
      I introduced this regression in 49557e62 (sched: Fix
      boot crash by zalloc()ing most of the cpu masks).
      
      Use the bootmem allocator if they set isolcpus=, otherwise
      allocate and zero like normal.
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: peterz@infradead.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Tested-by: NAnton Blanchard <anton@samba.org>
      bdddd296
    • T
      sched: Revert 498657a4 · 8592e648
      Tejun Heo 提交于
      498657a4 incorrectly assumed
      that preempt wasn't disabled around context_switch() and thus
      was fixing imaginary problem.  It also broke KVM because it
      depended on ->sched_in() to be called with irq enabled so that
      it can do smp calls from there.
      
      Revert the incorrect commit and add comment describing different
      contexts under with the two callbacks are invoked.
      
      Avi: spotted transposed in/out in the added comment.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Cc: peterz@infradead.org
      Cc: efault@gmx.de
      Cc: rusty@rustcorp.com.au
      LKML-Reference: <1259726212-30259-2-git-send-email-tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8592e648
  3. 26 11月, 2009 4 次提交
    • H
      sched, time: Define nsecs_to_jiffies() · b7b20df9
      Hidetoshi Seto 提交于
      Use of msecs_to_jiffies() for nsecs_to_cputime() have some
      problems:
      
       - The type of msecs_to_jiffies()'s argument is unsigned int, so
         it cannot convert msecs greater than UINT_MAX = about 49.7 days.
      
       - msecs_to_jiffies() returns MAX_JIFFY_OFFSET if MSB of argument
         is set, assuming that input was negative value.  So it cannot
         convert msecs greater than INT_MAX = about 24.8 days too.
      
      This patch defines a new function nsecs_to_jiffies() that can
      deal greater values, and that can deal all incoming values as
      unsigned.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Amrico Wang <xiyou.wangcong@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4B0E16E7.5070307@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b7b20df9
    • H
      sched: Remove task_{u,s,g}time() · d5b7c78e
      Hidetoshi Seto 提交于
      Now all task_{u,s}time() pairs are replaced by task_times().
      And task_gtime() is too simple to be an inline function.
      
      Cleanup them all.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16D1.70902@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d5b7c78e
    • H
      sched: Introduce task_times() to replace task_{u,s}time() pair · d180c5bc
      Hidetoshi Seto 提交于
      Functions task_{u,s}time() are called in pair in almost all
      cases.  However task_stime() is implemented to call task_utime()
      from its inside, so such paired calls run task_utime() twice.
      
      It means we do heavy divisions (div_u64 + do_div) twice to get
      utime and stime which can be obtained at same time by one set
      of divisions.
      
      This patch introduces a function task_times(*tsk, *utime,
      *stime) to retrieve utime and stime at once in better, optimized
      way.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      LKML-Reference: <4B0E16AE.906@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d180c5bc
    • M
      sched: Limit the number of scheduler debug messages · f6630114
      Mike Travis 提交于
      Remove the verbose scheduler debug messages unless kernel
      parameter "sched_debug" set.  /proc/sched_debug unchanged.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091118002221.489305000@alcatraz.americas.sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f6630114
  4. 25 11月, 2009 1 次提交
  5. 24 11月, 2009 2 次提交
    • T
      sched: Optimize branch hint in context_switch() · 710390d9
      Tim Blechmann 提交于
      Branch hint profiling on my nehalem machine showed over 90%
      incorrect branch hints:
      
        10420275 170645395  94 context_switch                 sched.c
         3043
        10408421 171098521  94 context_switch                 sched.c
         3050
      Signed-off-by: NTim Blechmann <tim@klingt.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4B0BBB9F.6080304@klingt.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      710390d9
    • J
      sched_feat_write(): Update ppos instead of file->f_pos · 42994724
      Jan Blunck 提交于
      sched_feat_write() should update ppos instead of file->f_pos.
      
      (This reduces some BKL dependencies of this code.)
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Cc: jkacur@redhat.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      LKML-Reference: <1258735245-25826-8-git-send-email-jblunck@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      42994724
  6. 16 11月, 2009 1 次提交
  7. 15 11月, 2009 1 次提交
  8. 12 11月, 2009 2 次提交
    • H
      sched: Fix granularity of task_u/stime() · 761b1d26
      Hidetoshi Seto 提交于
      Originally task_s/utime() were designed to return clock_t but
      later changed to return cputime_t by following commit:
      
        commit efe567fc
        Author: Christian Borntraeger <borntraeger@de.ibm.com>
        Date:   Thu Aug 23 15:18:02 2007 +0200
      
      It only changed the type of return value, but not the
      implementation. As the result the granularity of task_s/utime()
      is still that of clock_t, not that of cputime_t.
      
      So using task_s/utime() in __exit_signal() makes values
      accumulated to the signal struct to be rounded and coarse
      grained.
      
      This patch removes casts to clock_t in task_u/stime(), to keep
      granularity of cputime_t over the calculation.
      
      v2:
        Use div_u64() to avoid error "undefined reference to `__udivdi3`"
        on some 32bit systems.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: xiyou.wangcong@gmail.com
      Cc: Spencer Candland <spencer@bluehost.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      761b1d26
    • M
      sched: Fix/add missing update_rq_clock() calls · 055a0086
      Mike Galbraith 提交于
      kthread_bind(), migrate_task() and sched_fork were missing
      updates, and try_to_wake_up() was updating after having already
      used the stale clock.
      
      Aside from preventing potential latency hits, there' a side
      benefit in that early boot printk time stamps become monotonic.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1258020464.6491.2.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
      055a0086
  9. 11 11月, 2009 1 次提交
  10. 10 11月, 2009 1 次提交
  11. 08 11月, 2009 3 次提交
  12. 05 11月, 2009 1 次提交
  13. 04 11月, 2009 2 次提交
  14. 03 11月, 2009 1 次提交
  15. 02 11月, 2009 1 次提交
  16. 28 10月, 2009 1 次提交
    • J
      sched: move rq_weight data array out of .percpu · 4a6cc4bd
      Jiri Kosina 提交于
      Commit 34d76c41 introduced percpu array update_shares_data, size of which
      being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
      NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
      
      Fix this by allocating this array dynamically and keep only pointer to it
      percpu.
      
      The per-cpu handling doesn't impose significant performance penalty on
      potentially contented path in tg_shares_up().
      
      ...
      ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
      ffffffff81043383:       00 00
      ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
      ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff81043393:       00
      ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff8104339b:       00
      ffffffff8104339c:       48 01 d0                add    %rdx,%rax
      ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
      ffffffff810433a6:       00
      ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
      ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
      ...
      
      After:
      
      ...
      ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
      ffffffff81043383:       00
      ffffffff81043384:       48 98                   cltq
      ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
      ffffffff8104338d:       00
      ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
      ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
      ffffffff8104339c:       81
      ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff810433a4:       00
      ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
      ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff810433b5:       00
      ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
      ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
      ...
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      4a6cc4bd
  17. 26 10月, 2009 1 次提交
  18. 24 10月, 2009 1 次提交
    • M
      sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac
      Mike Galbraith 提交于
      This patch restores the effectiveness of LAST_BUDDY in preventing
      pgsql+oltp from collapsing due to wakeup preemption. It also
      switches LAST_BUDDY to exclusively do what it does best, namely
      mitigate the effects of aggressive wakeup preemption, which
      improves vmark throughput markedly, and restores mysql+oltp
      scalability.
      
      Since buddies are about scalability, enable them beginning at the
      point where we begin expanding sched_latency, namely
      sched_nr_latency. Previously, buddies were cleared aggressively,
      which seriously reduced their effectiveness. Not clearing
      aggressively however, produces a small drop in mysql+oltp
      throughput immediately after peak, indicating that LAST_BUDDY is
      actually doing some harm. This is right at the point where X on the
      desktop in competition with another load wants low latency service.
      Ergo, do not enable until we need to scale.
      
      To mitigate latency induced by buddies, or by a task just missing
      wakeup preemption, check latency at tick time.
      
      Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
      CACHE_HOT_BUDDY.
      
      Supporting performance tests:
      
       tip   = v2.6.32-rc5-1497-ga525b32
       tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
       tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
      
      (Three run averages except where noted.)
      
       vmark:
       ------
       tip           108466 messages per second
       tip+          125307 messages per second
       tip+x         125335 messages per second
       tipx          117781 messages per second
       2.6.31.3      122729 messages per second
      
       mysql+oltp:
       -----------
       clients          1        2        4        8       16       32       64        128    256
       ..........................................................................................
       tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
       tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
       tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
       2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
      
       pgsql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
       tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
       tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
       tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
       2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f685ceac
  19. 12 10月, 2009 1 次提交
  20. 09 10月, 2009 2 次提交
  21. 06 10月, 2009 1 次提交
  22. 05 10月, 2009 1 次提交
  23. 02 10月, 2009 1 次提交
  24. 24 9月, 2009 2 次提交
  25. 22 9月, 2009 1 次提交
    • A
      cpuidle: fix the menu governor to boost IO performance · 69d25870
      Arjan van de Ven 提交于
      Fix the menu idle governor which balances power savings, energy efficiency
      and performance impact.
      
      The reason for a reworked governor is that there have been serious
      performance issues reported with the existing code on Nehalem server
      systems.
      
      To show this I'm sure Andrew wants to see benchmark results:
      (benchmark is "fio", "no cstates" is using "idle=poll")
      
      		no cstates	current linux	new algorithm
      1 disk		107 Mb/s	85 Mb/s		105 Mb/s
      2 disks		215 Mb/s	123 Mb/s	209 Mb/s
      12 disks	590 Mb/s	320 Mb/s	585 Mb/s
      
      In various power benchmark measurements, no degredation was found by our
      measurement&diagnostics team.  Obviously a small percentage more power was
      used in the "fio" benchmark, due to the much higher performance.
      
      While it would be a novel idea to describe the new algorithm in this
      commit message, I cheaped out and described it in comments in the code
      instead.
      
      [changes since first post: spelling fixes from akpm, review feedback,
      folded menu-tng into menu.c]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69d25870
  26. 21 9月, 2009 3 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
    • P
      sched: Simplify sys_sched_rr_get_interval() system call · 0d721cea
      Peter Williams 提交于
      By removing the need for it to know details of scheduling classes.
      
      This allows PlugSched to define orthogonal scheduling classes.
      Signed-off-by: NPeter Williams <pwil3058@bigpond.net.au>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0d721cea
    • Y
      sched: Fix potential NULL derference of doms_cur · cb5fd13f
      Yong Zhang 提交于
      If CONFIG_CPUMASK_OFFSTACK is enabled but doms_cur alloc failed in
      arch_init_sched_domains(), doms_cur will move back to
      fallback_doms. But this time, fallback_doms has not been
      initialized yet.
      Signed-off-by: NYong Zhang <yong.zhang0@gmail.com>
      Cc: a.p.zijlstra@chello.nl
      LKML-Reference: <1252930816-7672-1-git-send-email-yong.zhang0@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cb5fd13f