1. 19 6月, 2013 1 次提交
  2. 28 5月, 2013 4 次提交
    • P
      perf: Fix perf mmap bugs · 26cb63ad
      Peter Zijlstra 提交于
      Vince reported a problem found by his perf specific trinity
      fuzzer.
      
      Al noticed 2 problems with perf's mmap():
      
       - it has issues against fork() since we use vma->vm_mm for accounting.
       - it has an rb refcount leak on double mmap().
      
      We fix the issues against fork() by using VM_DONTCOPY; I don't
      think there's code out there that uses this; we didn't hear
      about weird accounting problems/crashes. If we do need this to
      work, the previously proposed VM_PINNED could make this work.
      
      Aside from the rb reference leak spotted by Al, Vince's example
      prog was indeed doing a double mmap() through the use of
      perf_event_set_output().
      
      This exposes another problem, since we now have 2 events with
      one buffer, the accounting gets screwy because we account per
      event. Fix this by making the buffer responsible for its own
      accounting.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      26cb63ad
    • S
      perf: Add sysfs entry to adjust multiplexing interval per PMU · 62b85639
      Stephane Eranian 提交于
      This patch adds /sys/device/xxx/perf_event_mux_interval_ms to ajust
      the multiplexing interval per PMU. The unit is milliseconds. Value has
      to be >= 1.
      
      In the 4th version, we renamed the sysfs file to be more consistent
      with the other /proc/sys/kernel entries for perf_events.
      
      In the 5th version, we handle the reprogramming of the hrtimer using
      hrtimer_forward_now(). That way, we sync up to new timer value quickly
      (suggested by Jiri Olsa).
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1364991694-5876-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      62b85639
    • S
      perf: Use hrtimers for event multiplexing · 9e630205
      Stephane Eranian 提交于
      The current scheme of using the timer tick was fine for per-thread
      events. However, it was causing bias issues in system-wide mode
      (including for uncore PMUs). Event groups would not get their fair
      share of runtime on the PMU. With tickless kernels, if a core is idle
      there is no timer tick, and thus no event rotation (multiplexing).
      However, there are events (especially uncore events) which do count
      even though cores are asleep.
      
      This patch changes the timer source for multiplexing.  It introduces a
      per-PMU per-cpu hrtimer. The advantage is that even when a core goes
      idle, it will come back to service the hrtimer, thus multiplexing on
      system-wide events works much better.
      
      The per-PMU implementation (suggested by PeterZ) enables adjusting the
      multiplexing interval per PMU. The preferred interval is stashed into
      the struct pmu. If not set, it will be forced to the default interval
      value.
      
      In order to minimize the impact of the hrtimer, it is turned on and
      off on demand. When the PMU on a CPU is overcommited, the hrtimer is
      activated.  It is stopped when the PMU is not overcommitted.
      
      In order for this to work properly, we had to change the order of
      initialization in start_kernel() such that hrtimer_init() is run
      before perf_event_init().
      
      The default interval in milliseconds is set to a timer tick just like
      with the old code. We will provide a sysctl to tune this in another
      patch.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e630205
    • J
      perf: Fix hw breakpoints overflow period sampling · ab573844
      Jiri Olsa 提交于
      The hw breakpoint pmu 'add' function is missing the
      period_left update needed for SW events.
      
      The perf HW breakpoint events use the SW events framework
      to process the overflow, so it needs to be properly initialized
      in the PMU 'add' method.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Stephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1367421944-19082-5-git-send-email-jolsa@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab573844
  3. 23 4月, 2013 1 次提交
    • F
      perf: New helper to prevent full dynticks CPUs from stopping tick · 026249ef
      Frederic Weisbecker 提交于
      Provide a new helper that help full dynticks CPUs to prevent
      from stopping their tick in case there are events in the local
      rotation list.
      
      This way we make sure that perf_event_task_tick() is serviced
      on demand.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      026249ef
  4. 01 4月, 2013 3 次提交
  5. 27 3月, 2013 1 次提交
  6. 18 3月, 2013 1 次提交
  7. 16 3月, 2013 1 次提交
  8. 06 3月, 2013 1 次提交
  9. 09 2月, 2013 1 次提交
    • O
      perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6
      Oleg Nesterov 提交于
      sys_perf_event_open()->perf_init_event(event) is called before
      find_get_context(event), this means that event->ctx == NULL when
      class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
      can't know if this event is per-task or system-wide.
      
      This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
      this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
      The patch also moves ->bp_target up so that it can overlap with the
      new member, this can help the compiler to generate the better code.
      
      trace_uprobe_register() will use it for prefiltering to avoid the
      unnecessary breakpoints in mm's we do not want to trace.
      
      ->tp_target doesn't have its own reference, but we can rely on the
      fact that either sys_perf_event_open() holds a reference, or it is
      equal to event->ctx->task. So this pointer is always valid until
      free_event().
      
      Also add the "struct list_head tp_list" into this union. It is not
      strictly necessary, but it can simplify the next changes and we can
      add it for free.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      f22c1bb6
  10. 01 2月, 2013 1 次提交
  11. 24 10月, 2012 2 次提交
  12. 13 10月, 2012 1 次提交
  13. 05 10月, 2012 1 次提交
  14. 13 9月, 2012 1 次提交
  15. 04 9月, 2012 2 次提交
  16. 23 8月, 2012 1 次提交
  17. 10 8月, 2012 5 次提交
    • F
      perf: Add attribute to filter out callchains · d0775264
      Frederic Weisbecker 提交于
      Introducing following bits to the the perf_event_attr struct:
      
        - exclude_callchain_kernel to filter out kernel callchain
          from the sample dump
      
        - exclude_callchain_user to filter out user callchain
          from the sample dump
      
      We need to be able to disable standard user callchain dump when we use
      the dwarf cfi callchain mode, because frame pointer based user
      callchains are useless in this mode.
      
      Implementing also exclude_callchain_kernel to have complete set of
      options.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      [ Added kernel callchains filtering ]
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d0775264
    • J
      perf: Add ability to attach user stack dump to sample · c5ebcedb
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
      of the user level stack on sample. The size of the dump is specified by
      sample_stack_user value.
      
      Being able to dump parts of the user stack, starting from the stack
      pointer, will be useful to make a post mortem dwarf CFI based stack
      unwinding.
      
      Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
      architecture provides user stack dump on perf event samples.  This needs
      access to the user stack pointer which is not unified across
      architectures. Enabling this for x86 architecture.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c5ebcedb
    • J
      perf: Add perf_output_skip function to skip bytes in sample · 5685e0ff
      Jiri Olsa 提交于
      Introducing perf_output_skip function to be able to skip data within the
      perf ring buffer.
      
      When writing data into perf ring buffer we first reserve needed place in
      ring buffer and then copy the actual data.
      
      There's a possibility we won't be able to fill all the reserved size
      with data, so we need a way to skip the remaining bytes.
      
      This is going to be useful when storing the user stack dump, where we
      might end up with less data than we originally requested.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5685e0ff
    • F
      perf: Factor __output_copy to be usable with specific copy function · 91d7753a
      Frederic Weisbecker 提交于
      Adding a generic way to use __output_copy function with specific copy
      function via DEFINE_PERF_OUTPUT_COPY macro.
      
      Using this to add new __output_copy_user function, that provides output
      copy from user pointers. For x86 the copy_from_user_nmi function is used
      and __copy_from_user_inatomic for the rest of the architectures.
      
      This new function will be used in user stack dump on sample, coming in
      next patches.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91d7753a
    • J
      perf: Add ability to attach user level registers dump to sample · 4018994f
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
      user level registers on sample. Registers we want to dump are specified
      by sample_regs_user bitmask.
      
      Only user level registers are dumped at the moment. Meaning the register
      values of the user space context as it was before the user entered the
      kernel for whatever reason (syscall, irq, exception, or a PMI happening
      in userspace).
      
      The layout of the sample_regs_user bitmap is described in
      asm/perf_regs.h for archs that support register dump.
      
      This is going to be useful to bring Dwarf CFI based stack unwinding on
      top of samples.
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Dump registers ABI specification. ]
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Suggested-by: NStephane Eranian <eranian@google.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4018994f
  18. 31 7月, 2012 1 次提交
  19. 18 6月, 2012 1 次提交
  20. 06 6月, 2012 2 次提交
  21. 31 5月, 2012 1 次提交
  22. 23 5月, 2012 1 次提交
    • J
      Revert "sched, perf: Use a single callback into the scheduler" · ab0cce56
      Jiri Olsa 提交于
      This reverts commit cb04ff9a ("sched, perf: Use a single
      callback into the scheduler").
      
      Before this change was introduced, the process switch worked
      like this (wrt. to perf event schedule):
      
           schedule (prev, next)
             - schedule out all perf events for prev
             - switch to next
             - schedule in all perf events for current (next)
      
      After the commit, the process switch looks like:
      
           schedule (prev, next)
             - schedule out all perf events for prev
             - schedule in all perf events for (next)
             - switch to next
      
      The problem is, that after we schedule perf events in, the pmu
      is enabled and we can receive events even before we make the
      switch to next - so "current" still being prev process (event
      SAMPLE data are filled based on the value of the "current"
      process).
      
      Thats exactly what we see for test__PERF_RECORD test. We receive
      SAMPLES with PID of the process that our tracee is scheduled
      from.
      
      Discussed with Peter Zijlstra:
      
       > Bah!, yeah I guess reverting is the right thing for now. Sad
       > though.
       >
       > So by having the two hooks we have a black-spot between them
       > where we receive no events at all, this black-spot covers the
       > hand-over of current and we thus don't receive the 'wrong'
       > events.
       >
       > I rather liked we could do away with both that black-spot and
       > clean up the code a little, but apparently people rely on it.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: acme@redhat.com
      Cc: paulus@samba.org
      Cc: cjashfor@linux.vnet.ibm.com
      Cc: fweisbec@gmail.com
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/20120523111302.GC1638@m.brq.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab0cce56
  23. 09 5月, 2012 2 次提交
  24. 24 3月, 2012 1 次提交
  25. 23 3月, 2012 1 次提交
    • P
      perf: Fix mmap_page capabilities and docs · c7206205
      Peter Zijlstra 提交于
      Complete the syscall-less self-profiling feature and address
      all complaints, namely:
      
       - capabilities, so we can detect what is actually available at runtime
      
           Add a capabilities field to perf_event_mmap_page to indicate
           what is actually available for use.
      
       - on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending
         properly.
      
       - ABI documentation as to how all this stuff works.
      
      Also improve the documentation for the new features.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vweaver1@eecs.utk.edu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c7206205
  26. 17 3月, 2012 1 次提交
  27. 09 3月, 2012 1 次提交