1. 29 Aug, 2011 3 commits
    • perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Committed by Stephane Eranian
      The current cgroup context switch code was incorrect, leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase significantly, as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
      $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts
      for the non-cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
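The figures quoted in the commit message above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the patch: the helper names `switch_penalty` and `effective_ghz` are hypothetical, and the 100-second duration of the first perf stat run is an assumption taken from its `sleep 100` argument, since the message does not print an elapsed time for that run.

```python
def switch_penalty(baseline: float, loaded: float) -> float:
    """Fractional drop in context switches per second relative to the baseline."""
    return (baseline - loaded) / baseline


def effective_ghz(cycles: int, seconds: float) -> float:
    """Average clock rate implied by a cycle count over a wall-clock interval."""
    return cycles / seconds / 1e9


# ctxsw/s figures from the commit message, without and with a cgroup event active.
before_patch = switch_penalty(10684.51, 6674.61)
after_patch = switch_penalty(10775.30, 10687.80)

# Cycle counts of the non-cgroup 'cycles' event on CPU1.
# Assumption: the pre-patch run lasted about 100 s (it wrapped 'sleep 100').
bogus_rate = effective_ghz(16_984_189_138, 100.0)
fixed_rate = effective_ghz(23_933_981_448, 10.001457237)

print(f"penalty before patch: {before_patch:.1%}")
print(f"penalty after patch:  {after_patch:.1%}")
print(f"pre-patch rate:  {bogus_rate:.3f} GHz")
print(f"post-patch rate: {fixed_rate:.3f} GHz")
```

The pre-patch penalty comes out at roughly 37%, the post-patch penalty is under 2%, and only the post-patch count is consistent with the 2.4GHz CPU clock mentioned in the message.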
    • Linux 3.1-rc4 · c6a389f1
      Committed by Linus Torvalds
    • Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c11a7e26
      Committed by Linus Torvalds
      * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ARM: mach-shmobile: sh7372 LCDC1 suspend fix V2 (incremental)
        OMAP: omap_device: only override _noirq methods, not normal suspend/resume
        PM / Runtime: Correct documentation of pm_runtime_irq_safe()
        ARM: mach-shmobile: sh7372 LCDC1 suspend fix
        sh-sci / PM: Use power.irq_safe
        PM: Use spinlock instead of mutex in clock management functions
  2. 28 Aug, 2011 1 commit
  3. 27 Aug, 2011 10 commits
  4. 26 Aug, 2011 26 commits