1. 16 2月, 2011 2 次提交
    • S
      perf: Add cgroup support · e5d1367f
      Stephane Eranian 提交于
      This kernel patch adds the ability to filter monitoring based on
      container groups (cgroups). This is for use in per-cpu mode only.
      
      The cgroup to monitor is passed as a file descriptor in the pid
      argument to the syscall. The file descriptor must be opened to
      the cgroup name in the cgroup filesystem. For instance, if the
      cgroup name is foo and cgroupfs is mounted in /cgroup, then the
      file descriptor is opened to /cgroup/foo. Cgroup mode is
      activated by passing PERF_FLAG_PID_CGROUP in the flags argument
      to the syscall.
      
      For instance to measure in cgroup foo on CPU1 assuming
      cgroupfs is mounted under /cgroup:
      
      struct perf_event_attr attr;
      int cgroup_fd, fd;
      
      cgroup_fd = open("/cgroup/foo", O_RDONLY);
      fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
      close(cgroup_fd);
      Signed-off-by: NStephane Eranian <eranian@google.com>
      [ added perf_cgroup_{exit,attach} ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5d1367f
    • P
      perf: Fix throttle logic · 4fe757dd
      Peter Zijlstra 提交于
      It was possible to call pmu::start() on an already running event. In
      particular this lead so some wreckage as the hrtimer events would
      re-initialize active timers.
      
      This was due to throttled events being activated again by scheduling.
      Scheduling in a context would add and force start events, resulting in
      running events with a possible throttle status. The next tick to hit
      that task will then try to unthrottle the event and call ->start() on
      an already running event.
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4fe757dd
  2. 03 2月, 2011 2 次提交
  3. 28 1月, 2011 1 次提交
    • E
      perf: Fix alloc_callchain_buffers() · 88d4f0db
      Eric Dumazet 提交于
      Commit 927c7a9e ("perf: Fix race in callchains") introduced
      a mismatch in the sizing of struct callchain_cpus_entries.
      
      nr_cpu_ids must be used instead of num_possible_cpus(), or we
      might get out of bound memory accesses on some machines.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Stephane Eranian <eranian@google.com>
      CC: stable@kernel.org
      LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      88d4f0db
  4. 22 1月, 2011 1 次提交
    • O
      perf: perf_event_exit_task_context: s/rcu_dereference/rcu_dereference_raw/ · 806839b2
      Oleg Nesterov 提交于
      In theory, almost every user of task->child->perf_event_ctxp[]
      is wrong. find_get_context() can install the new context at any
      moment, we need read_barrier_depends().
      
      dbe08d82 "perf: Fix
      find_get_context() vs perf_event_exit_task() race" added
      rcu_dereference() into perf_event_exit_task_context() to make
      the precedent, but this makes __rcu_dereference_check() unhappy.
      Use rcu_dereference_raw() to shut up the warning.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: acme@redhat.com
      Cc: paulus@samba.org
      Cc: stern@rowland.harvard.edu
      Cc: a.p.zijlstra@chello.nl
      Cc: fweisbec@gmail.com
      Cc: roland@redhat.com
      Cc: prasad@linux.vnet.ibm.com
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <20110121174547.GA8796@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      806839b2
  5. 21 1月, 2011 1 次提交
    • P
      perf: Annotate cpuctx->ctx.mutex to avoid a lockdep splat · 547e9fd7
      Peter Zijlstra 提交于
      Lockdep spotted:
      
      	loop_1b_instruc/1899 is trying to acquire lock:
      	 (event_mutex){+.+.+.}, at: [<ffffffff810e1908>] perf_trace_init+0x3b/0x2f7
      
      	but task is already holding lock:
      	 (&ctx->mutex){+.+.+.}, at: [<ffffffff810eb45b>] perf_event_init_context+0xc0/0x218
      
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #3 (&ctx->mutex){+.+.+.}:
      	-> #2 (cpu_hotplug.lock){+.+.+.}:
      	-> #1 (module_mutex){+.+...}:
      	-> #0 (event_mutex){+.+.+.}:
      
      But because the deadlock would be cpuhotplug (cpu-event) vs fork
      (task-event) it cannot, in fact, happen. We can annotate this by giving the
      perf_event_context used for the cpuctx a different lock class from those
      used by tasks.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      547e9fd7
  6. 20 1月, 2011 2 次提交
    • O
      perf: Fix perf_event_init_task()/perf_event_free_task() interaction · 8550d7cb
      Oleg Nesterov 提交于
      perf_event_init_task() should clear child->perf_event_ctxp[]
      before anything else. Otherwise, if
      perf_event_init_context(perf_hw_context) fails,
      perf_event_free_task() can free perf_event_ctxp[perf_sw_context]
      copied from parent->perf_event_ctxp[] by dup_task_struct().
      
      Also move the initialization of perf_event_mutex and
      perf_event_list from perf_event_init_context() to
      perf_event_init_context().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      LKML-Reference: <20110119182228.GC12183@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8550d7cb
    • O
      perf: Fix find_get_context() vs perf_event_exit_task() race · dbe08d82
      Oleg Nesterov 提交于
      find_get_context() must not install the new perf_event_context
      if the task has already passed perf_event_exit_task().
      
      If nothing else, this means the memory leak. Initially
      ctx->refcount == 2, it is supposed that
      perf_event_exit_task_context() should participate and do the
      necessary put_ctx().
      
      find_lively_task_by_vpid() checks PF_EXITING but this buys
      nothing, by the time we call find_get_context() this task can be
      already dead. To the point, cmpxchg() can succeed when the task
      has already done the last schedule().
      
      Change find_get_context() to populate task->perf_event_ctxp[]
      under task->perf_event_mutex, this way we can trust PF_EXITING
      because perf_event_exit_task() takes the same mutex.
      
      Also, change perf_event_exit_task_context() to use
      rcu_dereference(). Probably this is not strictly needed, but
      with or without this change find_get_context() can race with
      setup_new_exec()->perf_event_exit_task(), rcu_dereference()
      looks better.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      LKML-Reference: <20110119182207.GB12183@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dbe08d82
  7. 19 1月, 2011 2 次提交
    • O
      perf: Validate cpu early in perf_event_alloc() · 66832eb4
      Oleg Nesterov 提交于
      Starting from perf_event_alloc()->perf_init_event(), the kernel
      assumes that event->cpu is either -1 or the valid CPU number.
      
      Change perf_event_alloc() to validate this argument early. This
      also means we can remove the similar check in
      find_get_context().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: gregkh@suse.de
      Cc: stable@kernel.org
      LKML-Reference: <20110118161032.GC693@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      66832eb4
    • O
      perf: Find_get_context: fix the per-cpu-counter check · 22a4ec72
      Oleg Nesterov 提交于
      If task == NULL, find_get_context() should always check that cpu
      is correct.
      
      Afaics, the bug was introduced by 38a81da2 "perf events: Clean
      up pid passing", but even before that commit "&& cpu != -1" was
      not exactly right, -ESRCH from find_task_by_vpid() is not
      accurate.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: gregkh@suse.de
      Cc: stable@kernel.org
      LKML-Reference: <20110118161008.GB693@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      22a4ec72
  8. 18 1月, 2011 1 次提交
    • P
      perf: Fix contexted inheritance · c5ed5145
      Peter Zijlstra 提交于
      Linus reported that the RCU lockdep annotation bits triggered for this
      rcu_dereference() because we're not holding rcu_read_lock().
      
      Going over the code I cannot convince myself its correct:
      
       - holding a ref on the parent_ctx, doesn't avoid it being uncloned
         concurrently (as the comment says), so we can race with a free.
      
       - holding parent_ctx->mutex doesn't avoid the above free from taking
         place either, it would at best avoid parent_ctx from being freed.
      
      I.e. the warning is correct. To fix the bug, serialize against the
      unclone_ctx() call by extending the reach of the parent_ctx->lock.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c5ed5145
  9. 07 1月, 2011 3 次提交
  10. 16 12月, 2010 3 次提交
  11. 09 12月, 2010 2 次提交
  12. 05 12月, 2010 3 次提交
  13. 01 12月, 2010 1 次提交
  14. 29 11月, 2010 1 次提交
    • J
      Kill off a bunch of warning: ‘inline’ is not at beginning of declaration · fa9f90be
      Jesper Juhl 提交于
      These warnings are spewed during a build of a 'allnoconfig' kernel
      (especially the ones from u64_stats_sync.h show up a lot) when building
      with -Wextra (which I often do)..
      They are
        a) annoying
        b) easy to get rid of.
      This patch kills them off.
      
      include/linux/u64_stats_sync.h:70:1: warning: ‘inline’ is not at beginning of declaration
      include/linux/u64_stats_sync.h:77:1: warning: ‘inline’ is not at beginning of declaration
      include/linux/u64_stats_sync.h:84:1: warning: ‘inline’ is not at beginning of declaration
      include/linux/u64_stats_sync.h:96:1: warning: ‘inline’ is not at beginning of declaration
      include/linux/u64_stats_sync.h:115:1: warning: ‘inline’ is not at beginning of declaration
      include/linux/u64_stats_sync.h:127:1: warning: ‘inline’ is not at beginning of declaration
      kernel/time.c:241:1: warning: ‘inline’ is not at beginning of declaration
      kernel/time.c:257:1: warning: ‘inline’ is not at beginning of declaration
      kernel/perf_event.c:4513:1: warning: ‘inline’ is not at beginning of declaration
      mm/page_alloc.c:4012:1: warning: ‘inline’ is not at beginning of declaration
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      fa9f90be
  15. 26 11月, 2010 6 次提交
  16. 18 11月, 2010 2 次提交
    • F
      tracing: New flag to allow non privileged users to use a trace event · 61c32659
      Frederic Weisbecker 提交于
      This adds a new trace event internal flag that allows them to be
      used in perf by non privileged users in case of task bound tracing.
      
      This is desired for syscalls tracepoint because they don't leak
      global system informations, like some other tracepoints.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      61c32659
    • P
      perf: Fix owner-list vs exit · 8882135b
      Peter Zijlstra 提交于
      Oleg noticed that a perf-fd keeping a reference on the creating task
      leads to a few funny side effects.
      
      There's two different aspects to this:
      
        - kernel based perf-events, these should not take out
          a reference on the creating task and appear on the task's
          event list since they're not bound to fds nor visible
          to userspace.
      
        - fork() and pthread_create(), these can lead to the creating
          task dying (and thus the task's event-list becomming useless)
          but keeping the list and ref alive until the event is closed.
      
      Combined they lead to malfunction of the ptrace hw_tracepoints.
      
      Cure this by not considering kernel based perf_events for the
      owner-list and destroying the owner-list when the owner dies.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      LKML-Reference: <1289576883.2084.286.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8882135b
  17. 12 11月, 2010 1 次提交
    • J
      perf,hw_breakpoint: Initialize hardware api earlier · 3c502e7a
      Jason Wessel 提交于
      When using early debugging, the kernel does not initialize the
      hw_breakpoint API early enough and causes the late initialization of
      the kernel debugger to fail. The boot arguments are:
      
          earlyprintk=vga ekgdboc=kbd kgdbwait
      
      Then simply type "go" at the kdb prompt and boot. The kernel will
      later emit the message:
      
          kgdb: Could not allocate hwbreakpoints
      
      And at that point the kernel debugger will cease to work correctly.
      
      The solution is to initialize the hw_breakpoint at the same time that
      all the other perf call backs are initialized instead of using a
      core_initcall() initialization which happens well after the kernel
      debugger can make use of hardware breakpoints.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4CD3396D.1090308@windriver.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      3c502e7a
  18. 11 11月, 2010 1 次提交
    • S
      perf_events: Fix time tracking in samples · eed01528
      Stephane Eranian 提交于
      This patch corrects time tracking in samples. Without this patch
      both time_enabled and time_running are bogus when user asks for
      PERF_SAMPLE_READ.
      
      One uses PERF_SAMPLE_READ to sample the values of other counters
      in each sample. Because of multiplexing, it is necessary to know
      both time_enabled, time_running to be able to scale counts correctly.
      
      In this second version of the patch, we maintain a shadow
      copy of ctx->time which allows us to compute ctx->time without
      calling update_context_time() from NMI context. We avoid the
      issue that update_context_time() must always be called with
      ctx->lock held.
      
      We do not keep shadow copies of the other event timings
      because if the lead event is overflowing then it is active
      and thus it's been scheduled in via event_sched_in() in
      which case neither tstamp_stopped, tstamp_running can be modified.
      
      This timing logic only applies to samples when PERF_SAMPLE_READ
      is used.
      
      Note that this patch does not address timing issues related
      to sampling inheritance between tasks. This will be addressed
      in a future patch.
      
      With this patch, the libpfm4 example task_smpl now reports
      correct counts (shown on 2.4GHz Core 2):
      
      $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears  noploop 5
      noploop for 5 seconds
      IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3
      	2,400,000,254 unhalted_core_cycles:u (33)
      	2,399,273,744 instructions_retired:u (34)
      	53,340 baclears (35)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eed01528
  19. 22 10月, 2010 2 次提交
    • S
      perf_events: Fix for transaction recovery in group_sched_in() · d7842da4
      Stephane Eranian 提交于
      This new version (see commit 8e5fc1a7) is much simpler and ensures that
      in case of error in group_sched_in() during event_sched_in(), the
      events up to the failed event go through regular event_sched_out().
      But the failed event and the remaining events in the group have their
      timings adjusted as if they had also gone through event_sched_in() and
      event_sched_out(). This ensures timing uniformity across all events in
      a group. This also takes care of the tstamp_stopped problem in case
      the group could never be scheduled. The tstamp_stopped is updated as
      if the event had actually run.
      
      With this patch, the following now reports correct time_enabled,
      in case the NMI watchdog is active:
      
      $ task -e unhalted_core_cycles,instructions_retired,baclears,baclears
      noploop 1
      noploop for 1 seconds
      
      0 unhalted_core_cycles (100.00% scaling, ena=997,552,872, run=0)
      0 instructions_retired (100.00% scaling, ena=997,552,872, run=0)
      0 baclears (100.00% scaling, ena=997,552,872, run=0)
      0 baclears (100.00% scaling, ena=997,552,872, run=0)
      
      And the older test case also works:
      
      $ task -einstructions_retired,baclears,baclears -e
      unhalted_core_cycles,baclears,baclears sleep 5
      
      1680885 instructions_retired (69.39% scaling, ena=950756, run=291006)
        10735 baclears (69.39% scaling, ena=950756, run=291006)
        10735 baclears (69.39% scaling, ena=950756, run=291006)
      
            0 unhalted_core_cycles (100.00% scaling, ena=817932, run=0)
            0 baclears (100.00% scaling, ena=817932, run=0)
            0 baclears (100.00% scaling, ena=817932, run=0)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cbeeebc.8ee7d80a.5a28.0d5f@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d7842da4
    • S
      perf_events: Revert: Fix transaction recovery in group_sched_in() · 9ffcfa6f
      Stephane Eranian 提交于
      This patch reverts commit 8e5fc1a7 (perf_events: Fix transaction
      recovery in group_sched_in()) because it had one flaw in case the
      group could never be scheduled. It would cause time_enabled to get
      negative.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cbeeeb7.0aefd80a.6e40.0e2f@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9ffcfa6f
  20. 19 10月, 2010 3 次提交