1. 05 12月, 2010 2 次提交
  2. 01 12月, 2010 1 次提交
  3. 26 11月, 2010 6 次提交
  4. 18 11月, 2010 2 次提交
    • F
      tracing: New flag to allow non privileged users to use a trace event · 61c32659
      Frederic Weisbecker 提交于
      This adds a new trace event internal flag that allows them to be
      used in perf by non privileged users in case of task bound tracing.
      
      This is desired for syscalls tracepoint because they don't leak
      global system informations, like some other tracepoints.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      61c32659
    • P
      perf: Fix owner-list vs exit · 8882135b
      Peter Zijlstra 提交于
      Oleg noticed that a perf-fd keeping a reference on the creating task
      leads to a few funny side effects.
      
      There's two different aspects to this:
      
        - kernel based perf-events, these should not take out
          a reference on the creating task and appear on the task's
          event list since they're not bound to fds nor visible
          to userspace.
      
        - fork() and pthread_create(), these can lead to the creating
          task dying (and thus the task's event-list becomming useless)
          but keeping the list and ref alive until the event is closed.
      
      Combined they lead to malfunction of the ptrace hw_tracepoints.
      
      Cure this by not considering kernel based perf_events for the
      owner-list and destroying the owner-list when the owner dies.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      LKML-Reference: <1289576883.2084.286.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8882135b
  5. 12 11月, 2010 1 次提交
    • J
      perf,hw_breakpoint: Initialize hardware api earlier · 3c502e7a
      Jason Wessel 提交于
      When using early debugging, the kernel does not initialize the
      hw_breakpoint API early enough and causes the late initialization of
      the kernel debugger to fail. The boot arguments are:
      
          earlyprintk=vga ekgdboc=kbd kgdbwait
      
      Then simply type "go" at the kdb prompt and boot. The kernel will
      later emit the message:
      
          kgdb: Could not allocate hwbreakpoints
      
      And at that point the kernel debugger will cease to work correctly.
      
      The solution is to initialize the hw_breakpoint at the same time that
      all the other perf call backs are initialized instead of using a
      core_initcall() initialization which happens well after the kernel
      debugger can make use of hardware breakpoints.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4CD3396D.1090308@windriver.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      3c502e7a
  6. 11 11月, 2010 1 次提交
    • S
      perf_events: Fix time tracking in samples · eed01528
      Stephane Eranian 提交于
      This patch corrects time tracking in samples. Without this patch
      both time_enabled and time_running are bogus when user asks for
      PERF_SAMPLE_READ.
      
      One uses PERF_SAMPLE_READ to sample the values of other counters
      in each sample. Because of multiplexing, it is necessary to know
      both time_enabled, time_running to be able to scale counts correctly.
      
      In this second version of the patch, we maintain a shadow
      copy of ctx->time which allows us to compute ctx->time without
      calling update_context_time() from NMI context. We avoid the
      issue that update_context_time() must always be called with
      ctx->lock held.
      
      We do not keep shadow copies of the other event timings
      because if the lead event is overflowing then it is active
      and thus it's been scheduled in via event_sched_in() in
      which case neither tstamp_stopped, tstamp_running can be modified.
      
      This timing logic only applies to samples when PERF_SAMPLE_READ
      is used.
      
      Note that this patch does not address timing issues related
      to sampling inheritance between tasks. This will be addressed
      in a future patch.
      
      With this patch, the libpfm4 example task_smpl now reports
      correct counts (shown on 2.4GHz Core 2):
      
      $ task_smpl -p 2400000000 -e unhalted_core_cycles:u,instructions_retired:u,baclears  noploop 5
      noploop for 5 seconds
      IIP:0x000000004006d6 PID:5596 TID:5596 TIME:466,210,211,430 STREAM_ID:33 PERIOD:2,400,000,000 ENA=1,010,157,814 RUN=1,010,157,814 NR=3
      	2,400,000,254 unhalted_core_cycles:u (33)
      	2,399,273,744 instructions_retired:u (34)
      	53,340 baclears (35)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cc6e14b.1e07e30a.256e.5190@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eed01528
  7. 22 10月, 2010 2 次提交
    • S
      perf_events: Fix for transaction recovery in group_sched_in() · d7842da4
      Stephane Eranian 提交于
      This new version (see commit 8e5fc1a7) is much simpler and ensures that
      in case of error in group_sched_in() during event_sched_in(), the
      events up to the failed event go through regular event_sched_out().
      But the failed event and the remaining events in the group have their
      timings adjusted as if they had also gone through event_sched_in() and
      event_sched_out(). This ensures timing uniformity across all events in
      a group. This also takes care of the tstamp_stopped problem in case
      the group could never be scheduled. The tstamp_stopped is updated as
      if the event had actually run.
      
      With this patch, the following now reports correct time_enabled,
      in case the NMI watchdog is active:
      
      $ task -e unhalted_core_cycles,instructions_retired,baclears,baclears
      noploop 1
      noploop for 1 seconds
      
      0 unhalted_core_cycles (100.00% scaling, ena=997,552,872, run=0)
      0 instructions_retired (100.00% scaling, ena=997,552,872, run=0)
      0 baclears (100.00% scaling, ena=997,552,872, run=0)
      0 baclears (100.00% scaling, ena=997,552,872, run=0)
      
      And the older test case also works:
      
      $ task -einstructions_retired,baclears,baclears -e
      unhalted_core_cycles,baclears,baclears sleep 5
      
      1680885 instructions_retired (69.39% scaling, ena=950756, run=291006)
        10735 baclears (69.39% scaling, ena=950756, run=291006)
        10735 baclears (69.39% scaling, ena=950756, run=291006)
      
            0 unhalted_core_cycles (100.00% scaling, ena=817932, run=0)
            0 baclears (100.00% scaling, ena=817932, run=0)
            0 baclears (100.00% scaling, ena=817932, run=0)
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cbeeebc.8ee7d80a.5a28.0d5f@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d7842da4
    • S
      perf_events: Revert: Fix transaction recovery in group_sched_in() · 9ffcfa6f
      Stephane Eranian 提交于
      This patch reverts commit 8e5fc1a7 (perf_events: Fix transaction
      recovery in group_sched_in()) because it had one flaw in case the
      group could never be scheduled. It would cause time_enabled to get
      negative.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cbeeeb7.0aefd80a.6e40.0e2f@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9ffcfa6f
  8. 19 10月, 2010 9 次提交
    • P
      perf: Optimize sw events · 7e54a5a0
      Peter Zijlstra 提交于
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7e54a5a0
    • P
      perf: Use jump_labels to optimize the scheduler hooks · 82cd6def
      Peter Zijlstra 提交于
      Trades a call + conditional + ret for an unconditional jmp.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101014203625.501657727@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      82cd6def
    • P
      perf, hw_breakpoint: Fix crash in hw_breakpoint creation · d580ff86
      Peter Zijlstra 提交于
      hw_breakpoint creation needs to account stuff per-task to ensure there
      is always sufficient hardware resources to back these things due to
      ptrace.
      
      With the perf per pmu context changes the event initialization no
      longer has access to the event context, for the simple reason that we
      need to first find the pmu (result of initialization) before we can
      find the context.
      
      This makes hw_breakpoints unhappy, because it can no longer do per
      task accounting, cure this by frobbing a task pointer in the event::hw
      bits for now...
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20101014203625.391543667@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d580ff86
    • P
      perf: Find task before event alloc · c6be5a5c
      Peter Zijlstra 提交于
      So that we can pass the task pointer to the event allocation, so that
      we can use task associated data during event initialization.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101014203625.340789919@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c6be5a5c
    • P
      perf: Fix task refcount bugs · e7d0bc04
      Peter Zijlstra 提交于
      Currently it looks like find_lively_task_by_vpid() takes a task ref
      and relies on find_get_context() to drop it.
      
      The problem is that perf_event_create_kernel_counter() shouldn't be
      dropping task refs.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMatt Helsley <matthltc@us.ibm.com>
      LKML-Reference: <20101014203625.278436085@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e7d0bc04
    • P
      perf: Fix group moving · 74c3337c
      Peter Zijlstra 提交于
      Matt found we trigger the WARN_ON_ONCE() in perf_group_attach() when we take
      the move_group path in perf_event_open().
      
      Since we cannot de-construct the group (we rely on it to move the events), we
      have to simply ignore the double attach. The group state is context invariant
      and doesn't need changing.
      Reported-by: NMatt Fleming <matt@console-pimps.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1287135757.29097.1368.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      74c3337c
    • P
      irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra 提交于
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like wakeup a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this get to do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e360adbe
    • S
      perf_events: Fix transaction recovery in group_sched_in() · 8e5fc1a7
      Stephane Eranian 提交于
      The group_sched_in() function uses a transactional approach to schedule
      a group of events. In a group, either all events can be scheduled or
      none are. To schedule each event in, the function calls event_sched_in().
      In case of error, event_sched_out() is called on each event in the group.
      
      The problem is that event_sched_out() does not completely cancel the
      effects of event_sched_in(). Furthermore event_sched_out() changes the
      state of the event as if it had run which is not true is this particular
      case.
      
      Those inconsistencies impact time tracking fields and may lead to events
      in a group not all reporting the same time_enabled and time_running values.
      This is demonstrated with the example below:
      
      $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
      1946101 unhalted_core_cycles (32.85% scaling, ena=829181, run=556827)
        11423 baclears (32.85% scaling, ena=829181, run=556827)
         7671 baclears (0.00% scaling, ena=556827, run=556827)
      
      2250443 unhalted_core_cycles (57.83% scaling, ena=962822, run=405995)
        11705 baclears (57.83% scaling, ena=962822, run=405995)
        11705 baclears (57.83% scaling, ena=962822, run=405995)
      
      Notice that in the first group, the last baclears event does not
      report the same timings as its siblings.
      
      This issue comes from the fact that tstamp_stopped is updated
      by event_sched_out() as if the event had actually run.
      
      To solve the issue, we must ensure that, in case of error, there is
      no change in the event state whatsoever. That means timings must
      remain as they were when entering group_sched_in().
      
      To do this we defer updating tstamp_running until we know the
      transaction succeeded. Therefore, we have split event_sched_in()
      in two parts separating the update to tstamp_running.
      
      Similarly, in case of error, we do not want to update tstamp_stopped.
      Therefore, we have split event_sched_out() in two parts separating
      the update to tstamp_stopped.
      
      With this patch, we now get the following output:
      
      $ task -eunhalted_core_cycles,baclears,baclears -e unhalted_core_cycles,baclears,baclears sleep 5
      2492050 unhalted_core_cycles (71.75% scaling, ena=1093330, run=308841)
        11243 baclears (71.75% scaling, ena=1093330, run=308841)
        11243 baclears (71.75% scaling, ena=1093330, run=308841)
      
      1852746 unhalted_core_cycles (0.00% scaling, ena=784489, run=784489)
         9253 baclears (0.00% scaling, ena=784489, run=784489)
         9253 baclears (0.00% scaling, ena=784489, run=784489)
      
      Note that the uneven timing between groups is a side effect of
      the process spending most of its time sleeping, i.e., not enough
      event rotations (but that's a separate issue).
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4cb86b4c.41e9d80a.44e9.3e19@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8e5fc1a7
    • S
      perf_events: Fix bogus context time tracking · c530ccd9
      Stephane Eranian 提交于
      You can only call update_context_time() when the context
      is active, i.e., the thread it is attached to is still running.
      
      However, perf_event_read() can be called even when the context
      is inactive, e.g., user read() the counters. The call to
      update_context_time() must be conditioned on the status of
      the context, otherwise, bogus time_enabled, time_running may
      be returned. Here is an example on AMD64. The task program
      is an example from libpfm4. The -p prints deltas every 1s.
      
      $ task -p -e cpu_clk_unhalted sleep 5
          2,266,610 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      	    0 cpu_clk_unhalted (0.00% scaling, ena=2,158,982, run=2,158,982)
      5,242,358,071 cpu_clk_unhalted (99.95% scaling, ena=5,000,359,984, run=2,319,270)
      
      Whereas if you don't read deltas, e.g., no call to perf_event_read() until
      the process terminates:
      
      $ task -e cpu_clk_unhalted sleep 5
          2,497,783 cpu_clk_unhalted (0.00% scaling, ena=2,376,899, run=2,376,899)
      
      Notice that time_enable, time_running are bogus in the first example
      causing bogus scaling.
      
      This patch fixes the problem, by conditionally calling update_context_time()
      in perf_event_read().
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <4cb856dc.51edd80a.5ae0.38fb@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c530ccd9
  9. 12 10月, 2010 1 次提交
  10. 11 10月, 2010 1 次提交
  11. 04 10月, 2010 1 次提交
    • S
      perf_events: Fix invalid pointer when pid is invalid · 540804b5
      Stephane Eranian 提交于
      This patch fixes an error in perf_event_open() when the pid
      provided by the user is invalid. find_lively_task_by_vpid()
      does not return NULL on error but an error code. Without the
      fix the error code was silently passed to find_get_context()
      which would eventually cause a invalid pointer dereference.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Cc: paulus@samba.org
      Cc: davem@davemloft.net
      Cc: fweisbec@gmail.com
      Cc: perfmon2-devel@lists.sf.net
      Cc: eranian@gmail.com
      Cc: robert.richter@amd.com
      LKML-Reference: <4ca9a5d1.e8e9d80a.3dbb.ffff8f2e@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      540804b5
  12. 21 9月, 2010 1 次提交
  13. 17 9月, 2010 4 次提交
  14. 15 9月, 2010 2 次提交
    • M
      perf events: Clean up pid passing · 38a81da2
      Matt Helsley 提交于
      The kernel perf event creation path shouldn't use find_task_by_vpid()
      because a vpid exists in a specific namespace. find_task_by_vpid() uses
      current's pid namespace which isn't always the correct namespace to use
      for the vpid in all the places perf_event_create_kernel_counter() (and
      thus find_get_context()) is called.
      
      The goal is to clean up pid namespace handling and prevent bugs like:
      
      	https://bugzilla.kernel.org/show_bug.cgi?id=17281
      
      Instead of using pids switch find_get_context() to use task struct
      pointers directly. The syscall is responsible for resolving the pid to
      a task struct. This moves the pid namespace resolution into the syscall
      much like every other syscall that takes pid parameters.
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robin Green <greenrd@greenrd.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      LKML-Reference: <a134e5e392ab0204961fd1a62c84a222bf5874a9.1284407763.git.matthltc@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      38a81da2
    • M
      perf events: Split out task search into helper · 2ebd4ffb
      Matt Helsley 提交于
      Split out the code which searches for non-exiting tasks into its own
      helper. Creating this helper not only makes the code slightly more
      readable it prepares to move the search out of find_get_context() in
      a subsequent commit.
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robin Green <greenrd@greenrd.org>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      LKML-Reference: <561205417b450b8a4bf7488374541d64b4690431.1284407762.git.matthltc@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ebd4ffb
  15. 13 9月, 2010 2 次提交
  16. 10 9月, 2010 4 次提交