1. 29 9月, 2011 28 次提交
  2. 26 9月, 2011 1 次提交
  3. 20 9月, 2011 2 次提交
  4. 15 9月, 2011 1 次提交
  5. 12 9月, 2011 1 次提交
  6. 31 8月, 2011 1 次提交
    • E
      perf_event: Fix broken calc_timer_values() · 7f310a5d
      Eric B Munson 提交于
      We detected a serious issue with PERF_SAMPLE_READ and
      timing information when events were being multiplexing.
      
      Samples would have time_running > time_enabled. That
      was easy to reproduce with a libpfm4 example (ran 3
      times to cause multiplexing on Core 2):
      
       $ syst_smpl -e uops_retired:freq=1 &
       $ syst_smpl -e uops_retired:freq=1 &
       $ syst_smpl -e uops_retired:freq=1 &
       IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
       syst_smpl: WARNING: time_running > time_enabled
      	63277537998 uops_retired:freq=1 , scaled
      
      The bug was not present in kernel up to (and including) 3.0. It turns
      out the bug was introduced by the following commit:
      
      commit c4794295
      
          events: Move lockless timer calculation into helper function
      
      The parameters of the function got reversed yet the call sites
      were not updated to reflect the change. That lead to time_running
      and time_enabled being swapped. That had no effect when there was
      no multiplexing because in that case time_running = time_enabled
      but it would show up in any other scenario.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110829124112.GA4828@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      7f310a5d
  7. 29 8月, 2011 4 次提交
    • S
      perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Stephane Eranian 提交于
      The current cgroup context switch code was incorrect leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase by a significant amount as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
      $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ /pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts for
      for the non cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a8d757ef
    • W
      sched: Fix a memory leak in __sdt_free() · feff8fa0
      WANG Cong 提交于
      This patch fixes the following memory leak:
      
      unreferenced object 0xffff880107266800 (size 512):
        comm "sched-powersave", pid 3718, jiffies 4323097853 (age 27495.450s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81133940>] create_object+0x187/0x28b
          [<ffffffff814ac103>] kmemleak_alloc+0x73/0x98
          [<ffffffff811232ba>] __kmalloc_node+0x104/0x159
          [<ffffffff81044b98>] kzalloc_node.clone.97+0x15/0x17
          [<ffffffff8104cb90>] build_sched_domains+0xb7/0x7f3
          [<ffffffff8104d4df>] partition_sched_domains+0x1db/0x24a
          [<ffffffff8109ee4a>] do_rebuild_sched_domains+0x3b/0x47
          [<ffffffff810a00c7>] rebuild_sched_domains+0x10/0x12
          [<ffffffff8104d5ba>] sched_power_savings_store+0x6c/0x7b
          [<ffffffff8104d5df>] sched_mc_power_savings_store+0x16/0x18
          [<ffffffff8131322c>] sysdev_class_store+0x20/0x22
          [<ffffffff81193876>] sysfs_write_file+0x108/0x144
          [<ffffffff81135b10>] vfs_write+0xaf/0x102
          [<ffffffff81135d23>] sys_write+0x4d/0x74
          [<ffffffff814c8a42>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
      Signed-off-by: NWANG Cong <amwang@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org # 3.0
      Link: http://lkml.kernel.org/r/1313671017-4112-1-git-send-email-amwang@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      feff8fa0
    • T
      sched: Move blk_schedule_flush_plug() out of __schedule() · 9c40cef2
      Thomas Gleixner 提交于
      There is no real reason to run blk_schedule_flush_plug() with
      interrupts and preemption disabled.
      
      Move it into schedule() and call it when the task is going voluntarily
      to sleep. There might be false positives when the task is woken
      between that call and actually scheduling, but that's not really
      different from being woken immediately after switching away.
      
      This fixes a deadlock in the scheduler where the
      blk_schedule_flush_plug() callchain enables interrupts and thereby
      allows a wakeup to happen of the task that's going to sleep.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@kernel.org # 2.6.39+
      Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      9c40cef2
    • T
      sched: Separate the scheduler entry for preemption · c259e01a
      Thomas Gleixner 提交于
      Block-IO and workqueues call into notifier functions from the
      scheduler core code with interrupts and preemption disabled. These
      calls should be made before entering the scheduler core.
      
      To simplify this, separate the scheduler core code into
      __schedule(). __schedule() is directly called from the places which
      set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
      checks into schedule(), so they are only called when a task voluntary
      goes to sleep.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@kernel.org # 2.6.39+
      Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.deSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c259e01a
  8. 27 8月, 2011 1 次提交
  9. 26 8月, 2011 1 次提交