1. 19 8月, 2009 3 次提交
    • L
      tracing/events: Add ftrace_event_call param to define_fields() · 14be96c9
      Li Zefan 提交于
      This parameter is needed by syscall events to add define_fields()
      handler.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAF90.6060801@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      14be96c9
    • L
      tracing/syscalls: Add fields format for exit events · 10a5b66f
      Li Zefan 提交于
      Add "format" file for syscall exit events:
      
       # cat events/syscalls/sys_exit_open/format
       name: sys_exit_open
       ID: 344
       format:
               field:unsigned short common_type;       offset:0;       size:2;
               field:unsigned char common_flags;       offset:2;       size:1;
               field:unsigned char common_preempt_count;       offset:3;       size:1;
               field:int common_pid;   offset:4;       size:4;
               field:int common_tgid;  offset:8;       size:4;
      
               field:int nr;   offset:12;      size:4;
               field:unsigned long ret;        offset:16;      size:4;
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAF61.3060307@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10a5b66f
    • L
      tracing/syscalls: Fix fields format for enter events · e6971969
      Li Zefan 提交于
      The "format" file of a trace event is originally for parsers to
      parse ftrace binary output.
      
      But the "format" file of a syscall event can only be used by
      perfcounter, because it describes the format of struct
      syscall_enter_record not struct syscall_trace_enter.
      
      To fix this, we remove struct syscall_enter_record, and then
      struct syscall_trace_enter will be used by both perf profile
      and ftrace.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAF39.1030404@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e6971969
  2. 17 8月, 2009 5 次提交
    • L
      ftrace: Simplify seqfile code · 3be04b47
      Li Zefan 提交于
      Use seq_release_private().
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A891AAB.8090701@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3be04b47
    • L
      trace_stack: Simplify seqfile code · 2fc5f0cf
      Li Zefan 提交于
      Extract duplicate code in t_start() and t_next().
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A891A91.4030602@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2fc5f0cf
    • L
      trace_stat: Fix missing entry in stat file · 97d53202
      Li Zefan 提交于
      One entry is missing in the output of a stat file.
      
      The cause is, when stat_seq_start() is called the 2nd time, we
      should start from the (pos-1)th elem in the rbtree but not pos,
      because pos == 0 is the header.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A891A65.70009@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97d53202
    • L
      tracing/syscalls: Fix to print parameter types · ba8b3a40
      Li Zefan 提交于
      When syscall tracing was implemented as a tracer,
      "syscall_arg_type" trace option could be set to enable the
      display of syscall parameter types.
      
      Now this option is gone since it's no longer a tracer, but the
      code is still there but dead.
      
      So we remove dead code and re-enable the printing of paramete
      types via the verbose option:
      
        # echo verbose > trace_options
        # echo syscalls > set_event
        # cat trace
      	...
              bash-3331  [000]    95.348937: sys_fcntl64 -> 0x1
              bash-3331  [000]    95.348942: sys_close(unsigned int fd: a)
      	...
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <4A891AF6.5050102@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ba8b3a40
    • L
      tracing/events: Add module tracepoints · 7ead8b83
      Li Zefan 提交于
      Add trace points to trace module_load, module_free, module_get,
      module_put and module_request, and use trace_event facility to
      get the trace output.
      
      Here's the sample output:
      
           TASK-PID    CPU#    TIMESTAMP  FUNCTION
              | |       |          |         |
          <...>-42    [000]     1.758380: module_request: fb0 wait=1 call_site=fb_open
          ...
          <...>-60    [000]     3.269403: module_load: scsi_wait_scan
          <...>-60    [000]     3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
          <...>-61    [001]     3.273168: module_free: scsi_wait_scan
          ...
          <...>-1021  [000]    13.836081: module_load: sunrpc
          <...>-1021  [000]    13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
          <...>-1027  [000]    13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
          <...>-1027  [000]    13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
          <...>-1027  [000]    13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
          ...
       modprobe-2587  [001]  1088.437213: module_load: trace_events_sample F
       modprobe-2587  [001]  1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0
      
      Note:
      
      - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0
      
      - the module refcnt is percpu, so it can be negative in a
        specific cpu
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4A891B3C.5030608@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ead8b83
  3. 14 8月, 2009 1 次提交
    • I
      tracing: Fix syscall tracing on !HAVE_FTRACE_SYSCALLS architectures · 60d970c2
      Ingo Molnar 提交于
      The new syscall_regfunc()/unregfunc() functions rely on
      the existence of TIF_SYSCALL_FTRACE - but that TIF flag
      is only offered by HAVE_FTRACE_SYSCALLS.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60d970c2
  4. 12 8月, 2009 11 次提交
    • F
      tracing: Support for syscall events raw records in perfcounters · 19007a67
      Frederic Weisbecker 提交于
      This bring the support for raw syscall events in perfcounters.
      The arguments or exit value are saved as a raw sample using
      the PERF_SAMPLE_RAW attribute in a perf counter.
      
      Example (for now you must explicitly set the PERF_SAMPLE_RAW flag
      in perf record):
      
      perf record -e syscalls:sys_enter_open -f -F 1 -a
      perf report -D
      
      	0x2cbb8 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 02 00 50 00 20 e9 39 ab 0a 7f 00 00  ......P. .9....
      	.  0010:  bc 14 00 00 bc 14 00 00 01 00 00 00 00 00 00 00  ...............
      	.  0020:  2c 00 00 00 15 01 01 00 bc 14 00 00 bc 14 00 00  ,..............
                        ^  ^  ^  ^  ^  ^  ^  ..........................
                        Event Size  struct trace_entry
      
      	.  0030:  00 00 00 00 46 98 43 02 00 00 00 00 80 08 00 00  ....F.C........
                        ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
                        ptr to file name        open flags
      
      	.  0040:  00 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00  ...............
                        ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
      	.         open mode               padding
      
      	0x2cbb8 [0x50]: PERF_EVENT_SAMPLE (IP, 2): 5308: 0x7f0aab39e920 period: 1
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      19007a67
    • F
      tracing: Add fields format definition for syscall events · dc4ddb4c
      Frederic Weisbecker 提交于
      Define the format of the syscall trace fields to parse the binary
      values from a raw trace using the syscall events "format" file.
      
      This is defined dynamically using the syscalls metadata.
      It prepares the export of syscall event raw records to perf
      counters.
      
      Example:
      
      $ cat /debug/tracing/events/syscalls/sys_enter_sched_getparam/format
      name: sys_enter_sched_getparam
      ID: 39
      format:
      	field:unsigned short common_type;	offset:0;	size:2;
      	field:unsigned char common_flags;	offset:2;	size:1;
      	field:unsigned char common_preempt_count;	offset:3;	size:1;
      	field:int common_pid;	offset:4;	size:4;
      	field:int common_tgid;	offset:8;	size:4;
      
      	field:pid_t pid;	offset:12;	size:8;
      	field:struct sched_param * param;	offset:20;	size:8;
      
      print fmt: "pid: 0x%08lx, param: 0x%08lx", ((unsigned long)(REC->pid)), ((unsigned long)(REC->param))
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      dc4ddb4c
    • F
      tracing: Add ftrace event call parameter to its field descriptor handler · e8f9f4d7
      Frederic Weisbecker 提交于
      Add the struct ftrace_event_call as a parameter of its show_format()
      callback. This way we can use it from the syscall trace events to
      retrieve the syscall name from the ftrace event call parameter and
      describe its fields using the syscalls metadata.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      e8f9f4d7
    • J
      tracing: Add perf counter support for syscalls tracing · f4b5ffcc
      Jason Baron 提交于
      The perf counter support is automated for usual trace events. But we
      have to define specific callbacks for this to handle syscalls trace
      events
      
      Make 'perf stat -e syscalls:sys_enter_blah' work with syscall style
      tracepoints.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      f4b5ffcc
    • J
      tracing: Add individual syscalls tracepoint id support · 64c12e04
      Jason Baron 提交于
      The current state of syscalls tracepoints generates only one event id
      for every syscall events.
      
      This patch associates an id with each syscall trace event, so that we
      can identify each syscall trace event using the 'perf' tool.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      64c12e04
    • J
      tracing: Add trace events for each syscall entry/exit · fb34a08c
      Jason Baron 提交于
      Layer Frederic's syscall tracer on tracepoints. We create trace events
      via hooking into the SYSCALL_DEFINE macros. This allows us to
      individually toggle syscall entry and exit points on/off.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      fb34a08c
    • J
      tracing: Add ftrace_event_call void * 'data' field · 69fd4f0e
      Jason Baron 提交于
      add an optional void * pointer to 'ftrace_event_call' that is
      passed in for regfunc and unregfunc.
      
      This prepares for syscall tracepoints creation by passing the name of
      the syscall we want to trace and then retrieve its number through our
      arch syscall table.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      69fd4f0e
    • J
      tracing: Raw_init() bailout in trace event register fail case · f744bd57
      Jason Baron 提交于
      Allow the return value of raw_init() trace event callback to bail us out
      of creating a trace event file, in case we fail to register our
      event.
      
      Also, we plan to return -ENOSYS for syscall events that don't match any
      syscalls listed in our arch tracing syscall table, we don't want to warn
      in that case, we just want this event to be invisible in debugfs and
      ignored.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      f744bd57
    • J
      tracing: Add syscall tracepoints · a871bd33
      Jason Baron 提交于
      add two tracepoints in syscall exit and entry path, conditioned on
      TIF_SYSCALL_FTRACE. Supports the syscall trace event code.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      a871bd33
    • J
      tracing: Call arch_init_ftrace_syscalls at boot · 066e0378
      Jason Baron 提交于
      Call arch_init_ftrace_syscalls at boot, so we can determine early the
      set of syscalls for the syscall trace events.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      066e0378
    • Z
      tracing: Rename set_tracer_flags()'s local variable trace_flags · 7770841e
      Zhaolei 提交于
      set_tracer_flags() have a local variable named trace_flags which has
      the same name than a global one in the same scope.
      This leads to confusion, using tracer_flags should be better by its
      meaning.
      
      Changelog:
      v1->v2: Simplified another patch in this patchset, no change in this
              patch.
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      7770841e
  5. 10 8月, 2009 2 次提交
  6. 09 8月, 2009 7 次提交
    • P
      perf_counter: Fix a race on perf_counter_ctx · 3a80b4a3
      Peter Zijlstra 提交于
      While extending perfcounters with BTS hw-tracing, Markus
      Metzger managed to trigger this warning:
      
         [  995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
      
      triggers because commit
      9f498cc5 (perf_counter: Full
      task tracing) removed clearing of tsk->perf_counter_ctxp out
      from under ctx->lock which introduced a race (against
      perf_lock_task_context).
      
      Move it back and deal with the exit notification by explicitly
      passing along the former task context.
      Reported-by: NMarkus T Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249667341.17467.5.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a80b4a3
    • F
      perf_counter: Fix tracepoint sampling to be part of generic sampling · 3a43ce68
      Frederic Weisbecker 提交于
      Based on Peter's comments, make tracepoint sampling generic
      just like all the other sampling bits are. This is a rename
      with no code changes:
      
      - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
      - struct perf_tracepoint_record to perf_raw_record
      
      We want the system in place that transport tracepoints raw
      samples events into the perf ring buffer to be generalized and
      usable by any type of counter.
      
      Reported-by; Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a43ce68
    • F
      perf_counter: Work around gcc warning by initializing tracepoint record unconditionally · 10b8e306
      Frederic Weisbecker 提交于
      Despite that the tracepoint record is always present when the
      PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
      thinking it might not be initialized:
      
        kernel/perf_counter.c: In function ‘perf_counter_output’:
        kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
      
      Then, initialize it to NULL and always check if it's not NULL
      before dereference it.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249698400-5441-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10b8e306
    • P
      perf_counter: Fix software counters for fast moving event sources · 7b4b6658
      Peter Zijlstra 提交于
      Reimplement the software counters to deal with fast moving
      event sources (such as tracepoints). This means being able
      to generate multiple overflows from a single 'event' as well
      as support throttling.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7b4b6658
    • F
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker 提交于
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f413cdb8
    • P
      perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Peter Zijlstra 提交于
      Adds possible second part to the assign argument of TP_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified make the swcounter increment with @foo instead
      of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
      associated with the event) when this triggers a counter overflow.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a659305
    • S
      posix_cpu_timers_exit_group(): Do not use thread_group_cputimer() · 17d42c1c
      Stanislaw Gruszka 提交于
      When the process exits we don't have to run new cputimer nor
      use running one (as it not accounts when tsk->exit_state != 0)
      to get process CPU times.  As there is only one thread we can
      just use CPU times fields from task and signal structs.
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      17d42c1c
  7. 08 8月, 2009 7 次提交
    • T
      tracing/filters: Don't use pred on alloc failure · fb82ad71
      Tom Zanussi 提交于
      Dan Carpenter sent me a fix to prevent pred from being used if
      it couldn't be allocated.  This updates his patch for the same
      problem in the tracing tree (which has changed this code quite
      substantially).
      Reported-by: NDan Carpenter <error27@gmail.com>
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <1249746576.6453.30.camel@tropicana>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      The original report:
      
      create_logical_pred() could sometimes return NULL.
      
      It's a static checker complaining rather than problems at runtime...
      fb82ad71
    • T
      tracing/filters: Always free pred on filter_add_subsystem_pred() failure · 26528e77
      Tom Zanussi 提交于
      If filter_add_subsystem_pred() fails due to ENOSPC or ENOMEM,
      the pred doesn't get freed, while as a side effect it does for
      other errors. Make it so the caller always frees the pred for
      any error.
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <1249746593.6453.32.camel@tropicana>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      26528e77
    • T
      tracing/filters: Don't use pred on alloc failure · 96b2de31
      Tom Zanussi 提交于
      Dan Carpenter sent me a fix to prevent pred from being used if
      it couldn't be allocated.  I noticed the same problem also
      existed for the create_pred() case and added a fix for that.
      Reported-by: NDan Carpenter <error27@gmail.com>
      Signed-off-by: NTom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <1249746549.6453.29.camel@tropicana>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      96b2de31
    • Y
      x86/irq: Fix move_irq_desc() for nodes without ram · ad7d6c7a
      Yinghai Lu 提交于
      Don't move it if target node is -1.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4A785B5D.4070702@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ad7d6c7a
    • E
      execve: must clear current->clear_child_tid · 9c8a8228
      Eric Dumazet 提交于
      While looking at Jens Rosenboom bug report
      (http://lkml.org/lkml/2009/7/27/35) about strange sys_futex call done from
      a dying "ps" program, we found following problem.
      
      clone() syscall has special support for TID of created threads.  This
      support includes two features.
      
      One (CLONE_CHILD_SETTID) is to set an integer into user memory with the
      TID value.
      
      One (CLONE_CHILD_CLEARTID) is to clear this same integer once the created
      thread dies.
      
      The integer location is a user provided pointer, provided at clone()
      time.
      
      kernel keeps this pointer value into current->clear_child_tid.
      
      At execve() time, we should make sure kernel doesnt keep this user
      provided pointer, as full user memory is replaced by a new one.
      
      As glibc fork() actually uses clone() syscall with CLONE_CHILD_SETTID and
      CLONE_CHILD_CLEARTID set, chances are high that we might corrupt user
      memory in forked processes.
      
      Following sequence could happen:
      
      1) bash (or any program) starts a new process, by a fork() call that
         glibc maps to a clone( ...  CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID
         ...) syscall
      
      2) When new process starts, its current->clear_child_tid is set to a
         location that has a meaning only in bash (or initial program) context
         (&THREAD_SELF->tid)
      
      3) This new process does the execve() syscall to start a new program.
         current->clear_child_tid is left unchanged (a non NULL value)
      
      4) If this new program creates some threads, and initial thread exits,
         kernel will attempt to clear the integer pointed by
         current->clear_child_tid from mm_release() :
      
              if (tsk->clear_child_tid
                  && !(tsk->flags & PF_SIGNALED)
                  && atomic_read(&mm->mm_users) > 1) {
                      u32 __user * tidptr = tsk->clear_child_tid;
                      tsk->clear_child_tid = NULL;
      
                      /*
                       * We don't check the error code - if userspace has
                       * not set up a proper pointer then tough luck.
                       */
      << here >>      put_user(0, tidptr);
                      sys_futex(tidptr, FUTEX_WAKE, 1, NULL, NULL, 0);
              }
      
      5) OR : if new program is not multi-threaded, but spied by /proc/pid
         users (ps command for example), mm_users > 1, and the exiting program
         could corrupt 4 bytes in a persistent memory area (shm or memory mapped
         file)
      
      If current->clear_child_tid points to a writeable portion of memory of the
      new program, kernel happily and silently corrupts 4 bytes of memory, with
      unexpected effects.
      
      Fix is straightforward and should not break any sane program.
      Reported-by: NJens Rosenboom <jens@mcbone.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sonny Rao <sonnyrao@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c8a8228
    • X
      generic-ipi: fix hotplug_cfd() · 69dd647f
      Xiao Guangrong 提交于
      Use CONFIG_HOTPLUG_CPU, not CONFIG_CPU_HOTPLUG
      
      When hot-unpluging a cpu, it will leak memory allocated at cpu hotplug,
      but only if CPUMASK_OFFSTACK=y, which is default to n.
      
      The bug was introduced by 8969a5ed
      ("generic-ipi: remove kmalloc()").
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69dd647f
    • E
      ring-buffer: Fix memleak in ring_buffer_free() · bd3f0221
      Eric Dumazet 提交于
      I noticed oprofile memleaked in linux-2.6 current tree,
      and tracked this ring-buffer leak.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4A7C06B9.2090302@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bd3f0221
  8. 07 8月, 2009 2 次提交
    • L
      lockdep: Fix file mode of lock_stat · 9795447f
      Li Zefan 提交于
      /proc/lock_stat is writable.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <4A7BE7B6.10904@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9795447f
    • P
      perf_counter: Fix double list iteration in per task precise stats · 1054598c
      Peter Zijlstra 提交于
      Brice Goglin reported this crash with per task precise stats:
      
      > I finally managed to test the threaded perfcounter statistics (thanks a
      > lot for implementing it). I am running 2.6.31-rc5 (with the AMD
      > magny-cours patches but I don't think they matter here). I am trying to
      > measure local/remote memory accesses per thread during the well-known
      > stream benchmark. It's compiled with OpenMP using 16 threads on a
      > quad-socket quad-core barcelona machine.
      >
      > Command line is:
      >  /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf record -f -s
      > -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 ./stream
      >
      > It seems to work fine with a single -e <counter> on the command line
      > while it crashes when there are at least 2 of them.
      > It seems to work fine without -s as well.
      
      A silly copy-paste resulted in a messed up iteration which would
      cause the OOPS.
      Reported-by: NBrice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NBrice Goglin <Brice.Goglin@inria.fr>
      LKML-Reference: <1249574786.32113.550.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1054598c
  9. 06 8月, 2009 2 次提交
    • R
      ring-buffer: Fix advance of reader in rb_buffer_peek() · 469535a5
      Robert Richter 提交于
      When calling rb_buffer_peek() from ring_buffer_consume() and a
      padding event is returned, the function rb_advance_reader() is
      called twice. This may lead to missing samples or under high
      workloads to the warning below. This patch fixes this. If a padding
      event is returned by rb_buffer_peek() it will be consumed by the
      calling function now.
      
      Also, I simplified some code in ring_buffer_consume().
      
      ------------[ cut here ]------------
      WARNING: at /dev/shm/.source/linux/kernel/trace/ring_buffer.c:2289 rb_advance_reader+0x2e/0xc5()
      Hardware name: Anaheim
      Modules linked in:
      Pid: 29, comm: events/2 Tainted: G        W  2.6.31-rc3-oprofile-x86_64-standard-00059-g5050dc2 #1
      Call Trace:
      [<ffffffff8106776f>] ? rb_advance_reader+0x2e/0xc5
      [<ffffffff81039ffe>] warn_slowpath_common+0x77/0x8f
      [<ffffffff8103a025>] warn_slowpath_null+0xf/0x11
      [<ffffffff8106776f>] rb_advance_reader+0x2e/0xc5
      [<ffffffff81068bda>] ring_buffer_consume+0xa0/0xd2
      [<ffffffff81326933>] op_cpu_buffer_read_entry+0x21/0x9e
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff8132749b>] sync_buffer+0xa5/0x401
      [<ffffffff810be3af>] ? __find_get_block+0x4b/0x165
      [<ffffffff81326c1b>] ? wq_sync_buffer+0x0/0x78
      [<ffffffff81326c76>] wq_sync_buffer+0x5b/0x78
      [<ffffffff8104aa30>] worker_thread+0x113/0x1ac
      [<ffffffff8104dd95>] ? autoremove_wake_function+0x0/0x38
      [<ffffffff8104a91d>] ? worker_thread+0x0/0x1ac
      [<ffffffff8104dc9a>] kthread+0x88/0x92
      [<ffffffff8100bdba>] child_rip+0xa/0x20
      [<ffffffff8104dc12>] ? kthread+0x0/0x92
      [<ffffffff8100bdb0>] ? child_rip+0x0/0x20
      ---[ end trace f561c0a58fcc89bd ]---
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      469535a5
    • F
      tracing/events: Only define remove_subsystem_dir() if CONFIG_MODULES · a2ca5e03
      Frederic Weisbecker 提交于
      If we disable modules, we get the following warning in ftrace events
      file:
      
      kernel/trace/trace_events.c:912: attention : ‘remove_subsystem_dir’ defined but not used
      
      remove_subystem_dir() is useless if !CONFIG_MODULES, then move it to
      the appropriate #ifdef section of trace_events.c
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      a2ca5e03