1. 20 Oct, 2015 2 commits
    • perf: Add PERF_SAMPLE_BRANCH_CALL · c229bf9d
      Authored by Stephane Eranian
      Add a new branch sample type to cover only call branches (function calls).
      The current ANY_CALL includes direct calls, indirect calls and far jumps.
      
      We want to be able to differentiate indirect from direct calls. Therefore
      we introduce PERF_SAMPLE_BRANCH_CALL. The implementation is up to each
      architecture.
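
A minimal sketch of how a tool might request the new branch type, assuming the UAPI header `<linux/perf_event.h>` from a kernel that carries this commit. The helper name `init_call_branch_attr` and the sampling period are illustrative only; whether a given PMU honours the filter is, as the message says, up to each architecture.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: prepare an attr that samples the branch stack and
 * asks for user-space call branches only, without ANY_CALL or IND_CALL. */
static void init_call_branch_attr(struct perf_event_attr *attr)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = 100000;	/* arbitrary illustrative period */
	attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
	attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
				   PERF_SAMPLE_BRANCH_CALL;
}
```

The filled-in attr would then be passed to the perf_event_open() syscall; that step is omitted here because it needs a PMU with branch-stack support.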
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: khandual@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/r/1444720151-10275-2-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Fix time_shift in perf_event_mmap_page · b9511cd7
      Authored by Adrian Hunter
      Commit:
      
        b20112ed ("perf/x86: Improve accuracy of perf/sched clock")
      
      allowed the time_shift value in perf_event_mmap_page to be as much
      as 32.  Unfortunately the documented algorithms for using time_shift
      have it shifting an integer, whereas to work correctly with the value
      32, the type must be u64.
      
      In the case of perf tools, Intel PT decodes correctly but the timestamps
      that are output (for example by perf script) have lost 32-bits of
      granularity so they look like they are not changing at all.
      
      Fix by limiting the shift to 31 and adjusting the multiplier accordingly.
      
      Also update the documentation of perf_event_mmap_page so that new code
      based on it will be more future-proof.
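
The arithmetic can be sketched as follows. This is an illustration under assumptions, not the kernel code: the helper names are hypothetical, and the values are passed in as plain parameters rather than read from a live perf_event_mmap_page. The first function builds the remainder mask from a 64-bit one, so the expression stays well defined for larger shifts; the second mirrors the described fix by reducing a shift of 32 to 31 and halving the multiplier, which keeps the same ratio.

```c
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to nanoseconds from the
 * time_shift, time_mult and time_offset values. The mask uses a 64-bit
 * one, so it does not depend on the width of int. */
static uint64_t cyc_to_ns(uint64_t cyc, uint16_t time_shift,
			  uint32_t time_mult, uint64_t time_offset)
{
	uint64_t quot = cyc >> time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << time_shift) - 1);

	return time_offset + quot * time_mult +
	       ((rem * time_mult) >> time_shift);
}

/* Hypothetical helper mirroring the fix described above: limit the shift
 * to 31 and adjust the multiplier so that mult / 2^shift is unchanged. */
static void limit_time_shift(uint16_t *time_shift, uint32_t *time_mult)
{
	if (*time_shift == 32) {
		*time_shift = 31;
		*time_mult >>= 1;
	}
}
```

With a shift of 32, a reader that computes the mask as a plain int shift cannot represent it, which matches the symptom of timestamps that appear not to change.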
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: b20112ed ("perf/x86: Improve accuracy of perf/sched clock")
      Link: http://lkml.kernel.org/r/1445001845-13688-2-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 04 Aug, 2015 1 commit
  3. 24 Jul, 2015 1 commit
    • perf: Add PERF_RECORD_SWITCH to indicate context switches · 45ac1403
      Authored by Adrian Hunter
      There are already two events for context switches, namely the tracepoint
      sched:sched_switch and the software event context_switches.
      Unfortunately neither is suitable for use by non-privileged users for
      the purpose of synchronizing hardware trace data (e.g. Intel PT) to the
      context switch.
      
      Tracepoints are no good at all for non-privileged users because they
      need either CAP_SYS_ADMIN or /proc/sys/kernel/perf_event_paranoid <= -1.
      
      On the other hand, kernel software events need either CAP_SYS_ADMIN or
      /proc/sys/kernel/perf_event_paranoid <= 1.
      
      Now many distributions do default perf_event_paranoid to 1 making
      context_switches a contender, except it has another problem (which is
      also shared with sched:sched_switch) which is that it happens before
      perf schedules events out instead of after perf schedules events in.
      Whereas a privileged user can see all the events anyway, a
      non-privileged user only sees events for their own processes, in other
      words they see when their process was scheduled out not when it was
      scheduled in. That presents two problems for using the event:
      
      1. the information comes too late, so tools have to look ahead in the
         event stream to find out what the current state is
      
      2. if they are unlucky tracing might have stopped before the
         context-switches event is recorded.
      
      This new PERF_RECORD_SWITCH event does not have those problems
      and it also has a couple of other small advantages.
      
      It is easier to use because it is an auxiliary event (like mmap, comm
      and task events) which can be enabled by setting a single bit. It is
      smaller than sched:sched_switch and easier to parse.
      
      To make the event useful for privileged users also, if the
      context is cpu-wide then the event record will be
      PERF_RECORD_SWITCH_CPU_WIDE which is the same as
      PERF_RECORD_SWITCH except it also provides the next or
      previous pid/tid.
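
A minimal sketch of the consumer side, assuming a UAPI header that carries this commit. The "single bit" is taken to be the `context_switch` bit of struct perf_event_attr, and the direction is read from `PERF_RECORD_MISC_SWITCH_OUT` in the record header; both come from the UAPI header rather than from the message above, and the helper names are hypothetical.

```c
#include <linux/perf_event.h>

/* Hypothetical helper: ask for PERF_RECORD_SWITCH records on an event. */
static void enable_switch_records(struct perf_event_attr *attr)
{
	attr->context_switch = 1;
}

/* Hypothetical helper: returns 1 for a switch-out record, 0 for a
 * switch-in record, and -1 if the record is not a switch record. */
static int switch_direction(const struct perf_event_header *hdr)
{
	if (hdr->type != PERF_RECORD_SWITCH &&
	    hdr->type != PERF_RECORD_SWITCH_CPU_WIDE)
		return -1;
	return (hdr->misc & PERF_RECORD_MISC_SWITCH_OUT) ? 1 : 0;
}
```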
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1437471846-26995-2-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 20 Jun, 2015 1 commit
  5. 07 Jun, 2015 2 commits
  6. 02 Apr, 2015 7 commits
    • perf: Add ITRACE_START record to indicate that tracing has started · ec0d7729
      Authored by Alexander Shishkin
      For counters that generate AUX data that is bound to the context of a
      running task, such as instruction tracing, the decoder needs to know
      exactly which task is running when the event is first scheduled in,
      before the first sched_switch. The decoder's need to know this stems
      from the fact that instruction flow trace decoding will almost always
      require program's object code in order to reconstruct said flow and
      for that we need at least its pid/tid in the perf stream.
      
      To single out such instruction tracing pmus, this patch introduces
      ITRACE PMU capability. The reason this is not part of RECORD_AUX
      record is that not all pmus capable of generating AUX data need this,
      and the opposite is *probably* also true.
      
      While sched_switch covers for most cases, there are two problems with it:
      the consumer will need to process events out of order (that is, having
      found RECORD_AUX, it will have to skip forward to the nearest sched_switch
      to figure out which task it was, then go back to the actual trace to
      decode it) and it completely misses the case when the tracing is enabled
      and disabled before sched_switch, for example, via PERF_EVENT_IOC_DISABLE.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-15-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Add wakeup watermark control to the AUX area · 1a594131
      Authored by Alexander Shishkin
      When AUX area gets a certain amount of new data, we want to wake up
      userspace to collect it. This adds a new control to specify how much
      data will cause a wakeup. This is then passed down to pmu drivers via
      output handle's "wakeup" field, so that the driver can find the nearest
      point where it can generate an interrupt.
      
      We repurpose __reserved_2 in the event attribute for this. Even though
      it was never checked to be zero before, aux_watermark will only matter
      for new AUX-aware code, so the old code should still be fine.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-10-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Support overwrite mode for the AUX area · 2023a0d2
      Authored by Alexander Shishkin
      This adds support for overwrite mode in the AUX area, which means "keep
      collecting data till you're stopped", turning AUX area into a circular
      buffer, where new data overwrites old data. It does not depend on data
      buffer's overwrite mode, so that it doesn't lose sideband data that is
      instrumental for processing AUX data.
      
      Overwrite mode is enabled by mapping the AUX area read-only. Even though
      aux_tail in the buffer's user page might be user writable, it will be
      ignored in this mode.
      
      A PERF_RECORD_AUX with PERF_AUX_FLAG_OVERWRITE set is written to the perf
      data stream every time an event writes new data to the AUX area. The pmu
      driver might not be able to infer the exact beginning of the new data in
      each snapshot; some drivers will only provide the tail, which is
      aux_offset + aux_size in the AUX record. Consumer has to be able to tell
      the new data from the old one, for example, by means of time stamps if
      such are provided in the trace.
      
      Consumer is also responsible for disabling any events that might write
      to the AUX area (thus potentially racing with the consumer) before
      collecting the data.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-9-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Add AUX record · 68db7e98
      Authored by Alexander Shishkin
      When there's new data in the AUX space, output a record indicating its
      offset and size and a set of flags, such as PERF_AUX_FLAG_TRUNCATED, to
      mean the described data was truncated to fit in the ring buffer.
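
A minimal sketch of how a consumer might inspect such a record. The struct `aux_record` is a local, hypothetical mirror of the fixed part of the record (header, offset, size, flags) as laid out in the UAPI header comment; any trailing sample_id data is ignored here.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* Hypothetical local mirror of the fixed part of a PERF_RECORD_AUX record.
 * The UAPI header documents this layout but does not define a struct. */
struct aux_record {
	struct perf_event_header header;
	uint64_t aux_offset;	/* where the new data starts in the AUX area */
	uint64_t aux_size;	/* how many bytes of new data there are */
	uint64_t flags;		/* PERF_AUX_FLAG_* bits */
};

/* Returns nonzero if this is an AUX record whose data was truncated. */
static int aux_data_truncated(const struct aux_record *rec)
{
	return rec->header.type == PERF_RECORD_AUX &&
	       (rec->flags & PERF_AUX_FLAG_TRUNCATED) != 0;
}
```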
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-7-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Add AUX area to ring buffer for raw data streams · 45bfb2e5
      Authored by Peter Zijlstra
      This patch introduces "AUX space" in the perf mmap buffer, intended for
      exporting high bandwidth data streams to userspace, such as instruction
      flow traces.
      
      AUX space is a ring buffer, defined by aux_{offset,size} fields in the
      user_page structure, and read/write pointers aux_{head,tail}, which abide
      by the same rules as data_* counterparts of the main perf buffer.
      
      In order to allocate/mmap AUX, userspace needs to set up aux_offset to
      such an offset that will be greater than data_offset+data_size and
      aux_size to be the desired buffer size. Both need to be page aligned.
      Then, same aux_offset and aux_size should be passed to mmap() call and
      if everything adds up, you should have an AUX buffer as a result.
      
      Pages that are mapped into this buffer also come out of user's mlock
      rlimit plus perf_event_mlock_kb allowance.
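
The setup step can be sketched as below, assuming a UAPI header that has the aux_{offset,size} fields. The helper `plan_aux_area` is hypothetical. It places the AUX area at the first page boundary at or after the end of the data area; treating an offset exactly at that end as acceptable is an assumption on my part, since the message only asks for an offset beyond the data area.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* Hypothetical helper: record the desired AUX placement in the user page.
 * page_size must be a power of two. Returns 0 on success, or -1 if
 * aux_size is zero or not page aligned. */
static int plan_aux_area(struct perf_event_mmap_page *pc,
			 uint64_t aux_size, uint64_t page_size)
{
	uint64_t end = pc->data_offset + pc->data_size;

	if (aux_size == 0 || (aux_size & (page_size - 1)))
		return -1;

	/* Round the end of the data area up to a page boundary. */
	pc->aux_offset = (end + page_size - 1) & ~(page_size - 1);
	pc->aux_size = aux_size;
	return 0;
}
```

After this, the same values would be handed to mmap(), for example `mmap(NULL, pc->aux_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, pc->aux_offset)`, where `fd` is the perf event file descriptor.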
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Add data_{offset,size} to user_page · e8c6deac
      Authored by Alexander Shishkin
      Currently, the actual perf ring buffer is one page into the mmap area,
      following the user page and the userspace follows this convention. This
      patch adds data_{offset,size} fields to user_page that can be used by
      userspace instead for locating perf data in the mmap area. This is also
      helpful when mapping existing or shared buffers if their size is not
      known in advance.
      
      Right now, it is made to follow the existing convention that
      
      	data_offset == PAGE_SIZE and
      	data_offset + data_size == mmap_size.
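
A minimal sketch of locating the data area from userspace, assuming a UAPI header that has these fields. The helper name is hypothetical, and the fallback to the old one-page convention when the fields read as zero is an assumption about how older kernels present the user page, not something the message states.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* Hypothetical helper: return a pointer to the perf data area inside a
 * mapping that starts at base, and store its size in *size. Falls back to
 * "data starts one page in" when data_offset/data_size are zero. */
static const unsigned char *perf_data_area(const void *base,
					   uint64_t page_size,
					   uint64_t mmap_size,
					   uint64_t *size)
{
	const struct perf_event_mmap_page *pc = base;
	uint64_t off = pc->data_offset ? pc->data_offset : page_size;

	*size = pc->data_size ? pc->data_size : mmap_size - off;
	return (const unsigned char *)base + off;
}
```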
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-2-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • tracing, perf: Implement BPF programs attached to kprobes · 2541517c
      Authored by Alexei Starovoitov
      BPF programs, attached to kprobes, provide a safe way to execute
      user-defined BPF byte-code programs without being able to crash or
      hang the kernel in any way. The BPF engine makes sure that such
      programs have a finite execution time and that they cannot break
      out of their sandbox.
      
      The user interface is to attach to a kprobe via the perf syscall:
      
      	struct perf_event_attr attr = {
      		.type	= PERF_TYPE_TRACEPOINT,
      		.config	= event_id,
      		...
      	};
      
      	event_fd = perf_event_open(&attr,...);
      	ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
      
      'prog_fd' is a file descriptor associated with BPF program
      previously loaded.
      
      'event_id' is an ID of the kprobe created.
      
      Closing 'event_fd':
      
      	close(event_fd);
      
      ... automatically detaches BPF program from it.
      
      BPF programs can call in-kernel helper functions to:
      
        - lookup/update/delete elements in maps
      
        - probe_read - wrapper of probe_kernel_read() used to access any
          kernel data structures
      
      BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is
      architecture dependent) and return 0 to ignore the event and 1 to store
      kprobe event into the ring buffer.
      
      Note, kprobes are fundamentally _not_ a stable kernel ABI,
      so BPF programs attached to kprobes must be recompiled for
      every kernel version and user must supply correct LINUX_VERSION_CODE
      in attr.kern_version during bpf_prog_load() call.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 27 Mar, 2015 1 commit
    • perf: Add per event clockid support · 34f43927
      Authored by Peter Zijlstra
      While thinking on the whole clock discussion it occurred to me we have
      two distinct uses of time:
      
       1) the tracking of event/ctx/cgroup enabled/running/stopped times
          which includes the self-monitoring support in struct
          perf_event_mmap_page.
      
       2) the actual timestamps visible in the data records.
      
      And we've been conflating them.
      
      The first is all about tracking time deltas; nobody should really care
      in what time base that happens. It's all relative information; as long
      as it's internally consistent it works.
      
      The second however is what people are worried about when having to
      merge their data with external sources. And here we have the
      discussion on MONOTONIC vs MONOTONIC_RAW etc..
      
      Where MONOTONIC is good for correlating between machines (static
      offset), MONOTONIC_RAW is required for correlating against a fixed rate
      hardware clock.
      
      This means configurability; now 1) makes that hard because it needs to
      be internally consistent across groups of unrelated events; which is
      why we had to have a global perf_clock().
      
      However, for 2) it doesn't really matter, perf itself doesn't care
      what it writes into the buffer.
      
      The below patch makes the distinction between these two cases by
      adding perf_event_clock() which is used for the second case. It
      further makes this configurable on a per-event basis, but adds a few
      sanity checks such that we cannot combine events with different clocks
      in confusing ways.
      
      And since we then have per-event configurability we might as well
      retain the 'legacy' behaviour as a default.
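
A minimal sketch of the per-event configuration from userspace, assuming the `use_clockid` and `clockid` fields of struct perf_event_attr from a UAPI header that carries this commit. The helper name is hypothetical, and which clock ids the kernel accepts is not stated in the message.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <time.h>

/* Hypothetical helper: ask for record timestamps taken from
 * CLOCK_MONOTONIC_RAW instead of the default perf clock. Leaving
 * use_clockid at 0 keeps the legacy behaviour. */
static void use_monotonic_raw(struct perf_event_attr *attr)
{
	attr->use_clockid = 1;
	attr->clockid = CLOCK_MONOTONIC_RAW;
}
```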
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 23 Mar, 2015 1 commit
  9. 25 Feb, 2015 1 commit
    • perf/x86/intel: Support task events with Intel CQM · bfe1fcd2
      Authored by Matt Fleming
      Add support for task events as well as system-wide events. This change
      has a big impact on the way that we gather LLC occupancy values in
      intel_cqm_event_read().
      
      Currently, for system-wide (per-cpu) events we defer processing to
      userspace which knows how to discard all but one cpu result per package.
      
      Things aren't so simple for task events because we need to do the value
      aggregation ourselves. To do this, we defer updating the LLC occupancy
      value in event->count from intel_cqm_event_read() and do an SMP
      cross-call to read values for all packages in intel_cqm_event_count().
      We need to ensure that we only do this for one task event per cache
      group, otherwise we'll report duplicate values.
      
      If we're a system-wide event we want to fallback to the default
      perf_event_count() implementation. Refactor this into a common function
      so that we don't duplicate the code.
      
      Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
      event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
      driver.  This task information is only available in the upper layers of
      the perf infrastructure.
      
      Other perf backends stash the target task in event->hw.*target so we
      need to do something similar. The task is used to determine whether
      events should share a cache group and an RMID.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kanaka Juvva <kanaka.d.juvva@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: linux-api@vger.kernel.org
      Link: http://lkml.kernel.org/r/1422038748-21397-8-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 19 Feb, 2015 2 commits
  11. 16 Nov, 2014 1 commit
  12. 28 Oct, 2014 1 commit
  13. 09 Jun, 2014 1 commit
  14. 06 Jun, 2014 1 commit
    • perf: Differentiate exec() and non-exec() comm events · 82b89778
      Authored by Adrian Hunter
      perf tools like 'perf report' can aggregate samples by comm strings,
      which generally works.  However, there are other potential use-cases.
      For example, to pair up 'calls' with 'returns' accurately (from branch
      events like Intel BTS) it is necessary to identify whether the process
      has exec'd.  Although a comm event is generated when an 'exec' happens
      it is also generated whenever the comm string is changed on a whim
      (e.g. by prctl PR_SET_NAME).  This patch adds a flag to the comm event
      to differentiate one case from the other.
      
      In order to determine whether the kernel supports the new flag, a
      selection bit named 'exec' is added to struct perf_event_attr.  The
      bit does nothing but will cause perf_event_open() to fail if the bit
      is set on kernels that do not have it defined.
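
A minimal sketch of both halves, assuming a UAPI header that carries this commit. The selection bit is taken to be `comm_exec` in struct perf_event_attr, and the flag on the comm event is taken to be `PERF_RECORD_MISC_COMM_EXEC` in the record header; the helper names are hypothetical.

```c
#include <linux/perf_event.h>

/* Hypothetical helper: request comm events and set the 'exec' selection
 * bit, so that perf_event_open() fails on kernels without the flag. */
static void request_comm_exec(struct perf_event_attr *attr)
{
	attr->comm = 1;
	attr->comm_exec = 1;
}

/* Returns nonzero if the record is a comm event caused by an exec. */
static int comm_is_exec(const struct perf_event_header *hdr)
{
	return hdr->type == PERF_RECORD_COMM &&
	       (hdr->misc & PERF_RECORD_MISC_COMM_EXEC) != 0;
}
```

A tool could retry perf_event_open() with `comm_exec` cleared if the first attempt fails, and then treat every comm event as ambiguous.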
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 05 Jun, 2014 1 commit
  16. 19 May, 2014 1 commit
  17. 24 Jan, 2014 1 commit
  18. 12 Jan, 2014 1 commit
  19. 17 Dec, 2013 1 commit
  20. 29 Oct, 2013 1 commit
  21. 04 Oct, 2013 1 commit
    • perf: Add generic transaction flags · fdfbbd07
      Authored by Andi Kleen
      Add a generic qualifier for transaction events, as a new sample
      type that returns a flag word. This is particularly useful
      for qualifying aborts: to distinguish aborts which happen
      due to asynchronous events (like conflicts caused by another
      CPU) versus instructions that lead to an abort.
      
      The tuning strategies are very different for those cases,
      so it's important to distinguish them easily and early.
      
      Since it's inconvenient and inflexible to filter for this
      in the kernel we report all the events out and allow
      some post processing in user space.
      
      The flags are based on the Intel TSX events, but should be fairly
      generic and mostly applicable to other HTM architectures too. In addition
      to various flag words there's also reserved space to report a
      program-supplied abort code. For TSX this is used to distinguish specific
      classes of aborts, like a lock busy abort when doing lock elision.
      
      Flags:
      
      Elision and generic transactions 		   (ELISION vs TRANSACTION)
      (HLE vs RTM on TSX; IBM etc.  would likely only use TRANSACTION)
      Aborts caused by current thread vs aborts caused by others (SYNC vs ASYNC)
      Retryable transaction				   (RETRY)
      Conflicts with other threads			   (CONFLICT)
      Transaction write capacity overflow		   (CAPACITY WRITE)
      Transaction read capacity overflow		   (CAPACITY READ)
      
      Transactions implicitly aborted can also return an abort code.
      This can be used to signal specific events to the profiler. A common
      case is abort on lock busy in an RTM eliding library (code 0xff).
      To handle this case we include the TSX abort code.
      
      Common example aborts in TSX would be:
      
      - Data conflict with another thread on memory read.
                                            Flags: TRANSACTION|ASYNC|CONFLICT
      - executing a WRMSR in a transaction. Flags: TRANSACTION|SYNC
      - HLE transaction in user space is too large
                                            Flags: ELISION|SYNC|CAPACITY-WRITE
      
      The only flag that is somewhat TSX specific is ELISION.
      
      This adds the perf core glue needed for reporting the new flag word out.
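
A minimal sketch of post-processing the flag word in user space, assuming the `PERF_TXN_*` constants from a UAPI header that carries this commit, with the abort code in the upper 32 bits as selected by `PERF_TXN_ABORT_MASK` and `PERF_TXN_ABORT_SHIFT`. The helper names are hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* Hypothetical helper: extract the program-supplied abort code. */
static uint32_t txn_abort_code(uint64_t txn)
{
	return (uint32_t)((txn & PERF_TXN_ABORT_MASK) >> PERF_TXN_ABORT_SHIFT);
}

/* Hypothetical helper: nonzero if the abort was caused asynchronously by
 * a conflict, such as another thread touching the same memory. */
static int txn_is_async_conflict(uint64_t txn)
{
	uint64_t want = PERF_TXN_ASYNC | PERF_TXN_CONFLICT;

	return (txn & want) == want;
}
```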
      
      v2: Add MEM/MISC
      v3: Move transaction to the end
      v4: Separate capacity-read/write and remove misc
      v5: Remove _SAMPLE. Move abort flags to 32bit. Rename
          transaction to txn
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1379688044-14173-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  22. 20 Sep, 2013 2 commits
    • perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' · fa731587
      Authored by Peter Zijlstra
      Solve the problems around the broken definition of perf_event_mmap_page::
      cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
      fixed by:
      
        860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")
      
      The problem with the fix (merged in v3.12-rc1 and not yet released
      officially), noticed by Vince Weaver is that the new behavior is
      not detectable by new user-space, and that due to the reuse of the
      field names it's easy to mis-compile a binary if old headers are used
      on a new kernel or new headers are used on an old kernel.
      
      To solve all that make this change explicit, detectable and self-contained,
      by iterating the ABI the following way:
      
       - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
         confuse old user-space binaries. RDPMC will be marked as unavailable
         to old binaries but that's within the ABI, this is a capability bit.
      
       - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
         libraries can reliably detect that bit 0 is deprecated and perma-zero
         without having to check the kernel version.
      
       - Use bits 2, 3, 4 for the newly defined, correct functionality:
      
      	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
      	cap_user_time		: 1, /* The time_* fields are used */
      	cap_user_time_zero	: 1, /* The time_zero field is used */
      
       - Rename all the bitfield names in perf_event.h to be different from the
         old names, to make sure it's not possible to mis-compile it
         accidentally with old assumptions.
      
      The 'size' field can then be used in the future to add new fields and it
      will act as a natural ABI version indicator as well.
      
      Also adjust tools/perf/ userspace for the new definitions, noticed by
      Adrian Hunter.
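
A minimal sketch of the detection logic for new user-space, assuming the renamed bitfields in a UAPI header that carries this commit. The helper name is hypothetical, and returning "not usable" when the deprecation bit is clear is a conservative choice for this sketch rather than something the message prescribes.

```c
#include <linux/perf_event.h>

/* Hypothetical helper: decide whether RDPMC may be used. On kernels with
 * this change, cap_bit0_is_deprecated is always 1 and cap_user_rdpmc has
 * its new, unambiguous position. Without that marker the layout cannot be
 * trusted, so this sketch reports RDPMC as unavailable. */
static int can_use_rdpmc(const struct perf_event_mmap_page *pc)
{
	if (!pc->cap_bit0_is_deprecated)
		return 0;
	return pc->cap_user_rdpmc;
}
```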
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Also-Fixed-by: Adrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Update ABI comment · c5ecceef
      Authored by Peter Zijlstra
      For some mysterious reason the sample_id field of PERF_RECORD_MMAP went AWOL.
      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 18 Sep, 2013 1 commit
  24. 03 Sep, 2013 1 commit
  25. 02 Sep, 2013 2 commits
  26. 30 Aug, 2013 1 commit
    • perf: make events stream always parsable · ff3d527c
      Authored by Adrian Hunter
      The event stream is not always parsable because the format of a sample
      is dependent on the sample_type of the selected event.  When there is
      more than one selected event and the sample_types are not the same then
      parsing becomes problematic.  A sample can be matched to its selected
      event using the ID that is allocated when the event is opened.
      Unfortunately, to get the ID from the sample means first parsing it.
      
      This patch adds a new sample format bit PERF_SAMPLE_IDENTIFIER that puts
      the ID at a fixed position so that the ID can be retrieved without
      parsing the sample.  For sample events, that is the first position
      immediately after the header.  For non-sample events, that is the last
      position.
      
      In this respect parsing samples requires that the sample_type and ID
      values are recorded.  For example, perf tools records struct
      perf_event_attr and the IDs within the perf.data file.  Those must be
      read first before it is possible to parse samples found later in the
      perf.data file.
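
A minimal sketch of retrieving the ID without parsing the record body. It assumes every event was opened with PERF_SAMPLE_IDENTIFIER, and that non-sample records carry the trailing ID fields (which, as far as I know, needs sample_id_all). The helper name is hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the event ID of a raw record into *id.
 * For sample records the ID is the first u64 after the header; for other
 * records it is the last u64 of the record. Returns 0 on success, or -1
 * if the record is too short to hold an ID. */
static int record_id(const void *rec, uint64_t *id)
{
	const unsigned char *p = rec;
	struct perf_event_header hdr;

	memcpy(&hdr, p, sizeof(hdr));
	if (hdr.size < sizeof(hdr) + sizeof(*id))
		return -1;

	if (hdr.type == PERF_RECORD_SAMPLE)
		memcpy(id, p + sizeof(hdr), sizeof(*id));
	else
		memcpy(id, p + hdr.size - sizeof(*id), sizeof(*id));
	return 0;
}
```

Once the ID is known, the tool can look up the matching perf_event_attr and use its sample_type to parse the rest of the record.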
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1377591794-30553-6-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  27. 08 Aug, 2013 1 commit
  28. 23 Jul, 2013 2 commits