1. 04 9月, 2009 1 次提交
    • S
      ring-buffer: remove ring_buffer_event_discard · dc892f73
      Steven Rostedt 提交于
      The function ring_buffer_event_discard can be used on any item in the
      ring buffer, even after the item was committed. This function provides
      no safety nets and is very race prone.
      
      An item may be safely removed from the ring buffer before it is committed
      with the ring_buffer_discard_commit.
      
      Since there are currently no users of this function, and because this
      function is racey and error prone, this patch removes it altogether.
      
      Note, removing this function also allows the counters to ignore
      all discarded events (patches will follow).
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      dc892f73
  2. 31 8月, 2009 1 次提交
    • L
      tracing/filters: Defer pred allocation · 8e254c1d
      Li Zefan 提交于
      init_preds() allocates about 5392 bytes of memory (on x86_32) for
      a TRACE_EVENT. With my config, at system boot total memory occupied
      is:
      
      	5392 * (642 + 15) == 3459KB
      
      642 == cat available_events | wc -l
      15 == number of dirs in events/ftrace
      
      That's quite a lot, so we'd better defer memory allocation util
      it's needed, that's when filter is used.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4A9B8EA5.6020700@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8e254c1d
  3. 28 8月, 2009 2 次提交
    • F
      tracing: Fix double CPP substitution in TRACE_EVENT_FN · 0dd7b747
      Frederic Weisbecker 提交于
      TRACE_EVENT_FN relays on TRACE_EVENT by reprocessing its parameters
      into the ftrace events CPP macro. This leads to a double substitution
      in some cases.
      
      For example, a bad consequence is a format always prefixed by
      "%s, %s\n" for every TRACE_EVENT_FN based events.
      
      Eg:
      	cat /debug/tracing/events/syscalls/sys_enter/format
      	[...]
      	print fmt: "%s, %s\n", "\"NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)\"",\
      	"REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3],\
      	REC->args[4], REC->args[5]"
      
      This creates a failure in post-processing tools such as perf trace or
      trace-cmd.
      
      Then drop this double substitution and replace it by a new __cpparg()
      macro that relays CPP arguments containing commas.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Josh Stone <jistone@redhat.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <1251413406-6704-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0dd7b747
    • F
      tracing: Undef TRACE_EVENT_FN between trace events headers inclusion · 6c347d43
      Frederic Weisbecker 提交于
      The recent commit:
      
      	tracing/events: fix the include file dependencies
      
      fixed a file dependency problem while including more than
      one trace event header file.
      
      This fix undefined TRACE_EVENT after an event header macro
      preprocessing in order to make tracepoint.h able to correctly declare
      the tracepoints necessary for the next event header file.
      
      But now we also need to undefine TRACE_EVENT_FN at the end of an event
      header file preprocessing for the same reason.
      
      This fixes the following build error:
      
      In file included from include/trace/events/napi.h:5,
                       from net/core/net-traces.c:28:
      include/linux/tracepoint.h:285:1: warning: "TRACE_EVENT_FN" redefined
      In file included from include/trace/define_trace.h:61,
                       from include/trace/events/skb.h:40,
                       from net/core/net-traces.c:27:
      include/trace/ftrace.h:50:1: warning: this is the location of the previous definition
      In file included from include/trace/events/napi.h:5,
                       from net/core/net-traces.c:28:
      include/linux/tracepoint.h:285:1: warning: "TRACE_EVENT_FN" redefined
      In file included from include/trace/define_trace.h:61,
                       from include/trace/events/skb.h:40,
                       from net/core/net-traces.c:27:
      include/trace/ftrace.h:50:1: warning: this is the location of the previous definition
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <20090827161732.GA7618@nowhere>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6c347d43
  4. 26 8月, 2009 8 次提交
    • S
      tracing: add comments to explain TRACE_EVENT out of protection · 7cb2e3ee
      Steven Rostedt 提交于
      The commit:
        commit 5ac35daa
        Author: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
        tracing/events: fix the include file dependencies
      
      Moved the TRACE_EVENT out of the ifdef protection of tracepoints.h
      but uses the define of TRACE_EVENT itself as protection. This patch
      adds comments to explain why.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7cb2e3ee
    • X
      tracing/events: fix the include file dependencies · 5ac35daa
      Xiao Guangrong 提交于
      The TRACE_EVENT depends on the include/linux/tracepoint.h first
      and include/trace/ftrace.h later, if we include the ftrace.h early,
      a building error will occur.
      
      Both define TRACE_EVENT in trace_a.h and trace_b.h, if we include
      those in .c file, like this:
      
      #define CREATE_TRACE_POINTS
      include <trace/events/trace_a.h>
      include <trace/events/trace_b.h>
      
      The above will not work, because the TRACE_EVENT was re-defined by
      the previous .h file.
      Reported-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      LKML-Reference: <4A937F5E.3020802@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5ac35daa
    • L
      tracing/filters: Support filtering for char * strings · 87a342f5
      Li Zefan 提交于
      Usually, char * entries are dangerous in traces because the string
      can be released whereas a pointer to it can still wait to be read from
      the ring buffer.
      
      But sometimes we can assume it's safe, like in case of RO data
      (eg: __file__ or __line__, used in bkl trace event). If these RO data
      are in a module and so is the call to the trace event, then it's safe,
      because the ring buffer will be flushed once this module get unloaded.
      
      To allow char * to be treated as a string:
      
      	TRACE_EVENT(...,
      
      		TP_STRUCT__entry(
      			__field_ext(const char *, name, FILTER_PTR_STRING)
      			...
      		)
      
      		...
      	);
      
      The filtering will not dereference "char *" unless the developer
      explicitly sets FILTER_PTR_STR in __field_ext.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A7B9287.90205@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      87a342f5
    • L
      tracing/filters: Add __field_ext() to TRACE_EVENT · 43b51ead
      Li Zefan 提交于
      Add __field_ext(), so a field can be assigned to a specific
      filter_type, which matches a corresponding filter function.
      
      For example, a later patch will allow this:
      	__field_ext(const char *, str, FILTER_PTR_STR);
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A7B9272.60507095@cn.fujitsu.com>
      
      [
        Fixed a -1 to FILTER_OTHER
        Forward ported to latest kernel.
      ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      43b51ead
    • S
      tracing/sched: show CPU task wakes up on in trace event · f0693c8b
      Steven Rostedt 提交于
      While debugging the scheduler push / pull algorithm, I found
      it very annoying that the sched wake up events did not show
      the CPU that the task was waking on. In order to analyze the
      scheduler, I needed that information.
      
      This patch adds recording of the CPU that a task is waking up
      on.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f0693c8b
    • J
      tracing: Create generic syscall TRACE_EVENTs · 1c569f02
      Josh Stone 提交于
      This converts the syscall_enter/exit tracepoints into TRACE_EVENTs, so
      you can have generic ftrace events that capture all system calls with
      arguments and return values.  These generic events are also renamed to
      sys_enter/exit, so they're more closely aligned to the specific
      sys_enter_foo events.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      LKML-Reference: <1251150194-1713-5-git-send-email-jistone@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      1c569f02
    • J
      tracing: Move tracepoint callbacks from declaration to definition · 97419875
      Josh Stone 提交于
      It's not strictly correct for the tracepoint reg/unreg callbacks to
      occur when a client is hooking up, because the actual tracepoint may not
      be present yet.  This happens to be fine for syscall, since that's in
      the core kernel, but it would cause problems for tracepoints defined in
      a module that hasn't been loaded yet.  It also means the reg/unreg has
      to be EXPORTed for any modules to use the tracepoint (as in SystemTap).
      
      This patch removes DECLARE_TRACE_WITH_CALLBACK, and instead introduces
      DEFINE_TRACE_FN which stores the callbacks in struct tracepoint.  The
      callbacks are used now when the active state of the tracepoint changes
      in set_tracepoint & disable_tracepoint.
      
      This also introduces TRACE_EVENT_FN, so ftrace events can also provide
      registration callbacks if needed.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      LKML-Reference: <1251150194-1713-4-git-send-email-jistone@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      97419875
    • J
      tracing: Make syscall tracepoints conditional · 3d27d8cb
      Josh Stone 提交于
      The syscall enter/exit tracepoints are only supported on archs that
      HAVE_SYSCALL_TRACEPOINTS, so the declarations should be #ifdef'ed.
      Also, the definition of syscall_regfunc and syscall_unregfunc should
      depend on this same config, rather than the ftrace-specific one.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <1251150194-1713-3-git-send-email-jistone@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      3d27d8cb
  5. 19 8月, 2009 4 次提交
    • L
      tracing/syscalls: Add filtering support · 540b7b8d
      Li Zefan 提交于
      Add filtering support for syscall events:
      
       # echo 'mode == 0666' > events/syscalls/sys_enter_open
       # echo 'ret == 0' > events/syscalls/sys_exit_open
       # echo 1 > events/syscalls/sys_enter_open
       # echo 1 > events/syscalls/sys_exit_open
       # cat trace
       ...
         modprobe-3084 [001] 117.463140: sys_open(filename: 917d3e8, flags: 0, mode: 1b6)
         modprobe-3084 [001] 117.463176: sys_open -> 0x0
             less-3086 [001] 117.510455: sys_open(filename: 9c6bdb8, flags: 8000, mode: 1b6)
         sendmail-2574 [001] 122.145840: sys_open(filename: b807a365, flags: 0, mode: 1b6)
       ...
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAFCB.1040006@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      540b7b8d
    • L
      tracing/events: Add trace_define_common_fields() · e647d6b3
      Li Zefan 提交于
      Extract duplicate code. Also prepare for the later patch.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAFB8.1010304@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e647d6b3
    • L
      tracing/events: Add ftrace_event_call param to define_fields() · 14be96c9
      Li Zefan 提交于
      This parameter is needed by syscall events to add define_fields()
      handler.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAF90.6060801@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      14be96c9
    • L
      tracing/syscalls: Add fields format for exit events · 10a5b66f
      Li Zefan 提交于
      Add "format" file for syscall exit events:
      
       # cat events/syscalls/sys_exit_open/format
       name: sys_exit_open
       ID: 344
       format:
               field:unsigned short common_type;       offset:0;       size:2;
               field:unsigned char common_flags;       offset:2;       size:1;
               field:unsigned char common_preempt_count;       offset:3;       size:1;
               field:int common_pid;   offset:4;       size:4;
               field:int common_tgid;  offset:8;       size:4;
      
               field:int nr;   offset:12;      size:4;
               field:unsigned long ret;        offset:16;      size:4;
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A8BAF61.3060307@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10a5b66f
  6. 17 8月, 2009 1 次提交
    • L
      tracing/events: Add module tracepoints · 7ead8b83
      Li Zefan 提交于
      Add trace points to trace module_load, module_free, module_get,
      module_put and module_request, and use trace_event facility to
      get the trace output.
      
      Here's the sample output:
      
           TASK-PID    CPU#    TIMESTAMP  FUNCTION
              | |       |          |         |
          <...>-42    [000]     1.758380: module_request: fb0 wait=1 call_site=fb_open
          ...
          <...>-60    [000]     3.269403: module_load: scsi_wait_scan
          <...>-60    [000]     3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
          <...>-61    [001]     3.273168: module_free: scsi_wait_scan
          ...
          <...>-1021  [000]    13.836081: module_load: sunrpc
          <...>-1021  [000]    13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
          <...>-1027  [000]    13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
          <...>-1027  [000]    13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
          <...>-1027  [000]    13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
          ...
       modprobe-2587  [001]  1088.437213: module_load: trace_events_sample F
       modprobe-2587  [001]  1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0
      
      Note:
      
      - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0
      
      - the module refcnt is percpu, so it can be negative in a
        specific cpu
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4A891B3C.5030608@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ead8b83
  7. 12 8月, 2009 9 次提交
    • F
      tracing: Add fields format definition for syscall events · dc4ddb4c
      Frederic Weisbecker 提交于
      Define the format of the syscall trace fields to parse the binary
      values from a raw trace using the syscall events "format" file.
      
      This is defined dynamically using the syscalls metadata.
      It prepares the export of syscall event raw records to perf
      counters.
      
      Example:
      
      $ cat /debug/tracing/events/syscalls/sys_enter_sched_getparam/format
      name: sys_enter_sched_getparam
      ID: 39
      format:
      	field:unsigned short common_type;	offset:0;	size:2;
      	field:unsigned char common_flags;	offset:2;	size:1;
      	field:unsigned char common_preempt_count;	offset:3;	size:1;
      	field:int common_pid;	offset:4;	size:4;
      	field:int common_tgid;	offset:8;	size:4;
      
      	field:pid_t pid;	offset:12;	size:8;
      	field:struct sched_param * param;	offset:20;	size:8;
      
      print fmt: "pid: 0x%08lx, param: 0x%08lx", ((unsigned long)(REC->pid)), ((unsigned long)(REC->param))
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      dc4ddb4c
    • F
      tracing: Add ftrace event call parameter to its field descriptor handler · e8f9f4d7
      Frederic Weisbecker 提交于
      Add the struct ftrace_event_call as a parameter of its show_format()
      callback. This way we can use it from the syscall trace events to
      retrieve the syscall name from the ftrace event call parameter and
      describe its fields using the syscalls metadata.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      e8f9f4d7
    • J
      tracing: Add perf counter support for syscalls tracing · f4b5ffcc
      Jason Baron 提交于
      The perf counter support is automated for usual trace events. But we
      have to define specific callbacks for this to handle syscalls trace
      events
      
      Make 'perf stat -e syscalls:sys_enter_blah' work with syscall style
      tracepoints.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      f4b5ffcc
    • J
      tracing: Add individual syscalls tracepoint id support · 64c12e04
      Jason Baron 提交于
      The current state of syscalls tracepoints generates only one event id
      for every syscall events.
      
      This patch associates an id with each syscall trace event, so that we
      can identify each syscall trace event using the 'perf' tool.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      64c12e04
    • J
      tracing: Add trace events for each syscall entry/exit · fb34a08c
      Jason Baron 提交于
      Layer Frederic's syscall tracer on tracepoints. We create trace events
      via hooking into the SYSCALL_DEFINE macros. This allows us to
      individually toggle syscall entry and exit points on/off.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      fb34a08c
    • J
      tracing: Add ftrace_event_call void * 'data' field · 69fd4f0e
      Jason Baron 提交于
      add an optional void * pointer to 'ftrace_event_call' that is
      passed in for regfunc and unregfunc.
      
      This prepares for syscall tracepoints creation by passing the name of
      the syscall we want to trace and then retrieve its number through our
      arch syscall table.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      69fd4f0e
    • J
      tracing: Add syscall tracepoints · a871bd33
      Jason Baron 提交于
      add two tracepoints in syscall exit and entry path, conditioned on
      TIF_SYSCALL_FTRACE. Supports the syscall trace event code.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      a871bd33
    • J
      tracing: Add DECLARE_TRACE_WITH_CALLBACK() macro · 63fbdab3
      Jason Baron 提交于
      Introduce a new 'DECLARE_TRACE_WITH_CALLBACK()' macro, so that
      tracepoints can associate an external register/unregister function.
      
      This prepares for the syscalls tracer conversion to trace events. We
      will need to perform arch level operations once a syscall event is
      turned on/off, such as TIF flags setting, hence the need of such
      specific callbacks.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      63fbdab3
    • J
      tracing: Call arch_init_ftrace_syscalls at boot · 066e0378
      Jason Baron 提交于
      Call arch_init_ftrace_syscalls at boot, so we can determine early the
      set of syscalls for the syscall trace events.
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      066e0378
  8. 10 8月, 2009 3 次提交
    • F
      perf_counter: Zero dead bytes from ftrace raw samples size alignment · 1853db0e
      Frederic Weisbecker 提交于
      After aligning the ftrace raw samples, there are dead bytes storing
      random data from the stack. We don't want to leak these to userspace,
      then zero these out.
      
      Before:
      
      	0x2de88 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 2c 00 00 00 00 00 00 00  h...h...,......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 40 7f 46 81 ff ff ff ff 00 10 1b 7f  h...@.F........
                                                            ^  ^  ^  ^
                                                               Leak
      
      After:
      
      	0x2d318 [0x50]: event: 9
      	.
      	. ... raw event: size 80 bytes
      	.  0000:  09 00 00 00 01 00 50 00 d0 c7 00 81 ff ff ff ff  ......P........
      	.  0010:  68 01 00 00 68 01 00 00 68 14 00 00 00 00 00 00  h...h...h......
      	.  0020:  2c 00 00 00 2b 00 01 02 68 01 00 00 68 01 00 00  ,...+...h...h..
      	.  0030:  6b 6f 6e 64 65 6d 61 6e 64 2f 30 00 00 00 00 00  kondemand/0....
      	.  0040:  68 01 00 00 a0 80 46 81 ff ff ff ff 00 00 00 00  h.....F........
                                                            ^  ^  ^  ^
      							 Fixed
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <1249915116-5210-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      1853db0e
    • F
      perf_counter: Subtract the buffer size field from the event record size · 304703ab
      Frederic Weisbecker 提交于
      We compute the perf raw sample size by aligning the raw ftrace
      event size plus the buffer size field itself. We do that
      instead of aligning only the perf raw sample size, so that we
      might economize some in some cases.
      
      But this buffer size field is not stored in the perf raw
      sample, we must then substract its size from the buffer once we
      computed the alignment unless we may get a useless u32 field in
      the buffer.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090810141129.GA5124@nowhere>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      304703ab
    • P
      perf_counter: Correct PERF_SAMPLE_RAW output · a044560c
      Peter Zijlstra 提交于
      PERF_SAMPLE_* output switches should unconditionally output the
      correct format, as they are the only way to unambiguously parse
      the PERF_EVENT_SAMPLE data.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249896447.17467.74.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a044560c
  9. 09 8月, 2009 3 次提交
    • F
      perf_counter: Fix tracepoint sampling to be part of generic sampling · 3a43ce68
      Frederic Weisbecker 提交于
      Based on Peter's comments, make tracepoint sampling generic
      just like all the other sampling bits are. This is a rename
      with no code changes:
      
      - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
      - struct perf_tracepoint_record to perf_raw_record
      
      We want the system in place that transport tracepoints raw
      samples events into the perf ring buffer to be generalized and
      usable by any type of counter.
      
      Reported-by; Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a43ce68
    • F
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker 提交于
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f413cdb8
    • P
      perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Peter Zijlstra 提交于
      Adds possible second part to the assign argument of TP_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified make the swcounter increment with @foo instead
      of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
      associated with the event) when this triggers a counter overflow.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a659305
  10. 08 8月, 2009 4 次提交
    • P
      bzip2/lzma/gzip: fix comments describing decompressor API · daeb6b6f
      Phillip Lougher 提交于
      Fix and improve comments in decompress/generic.h that describe the
      decompressor API.  Also remove an unused definition, and rename INBUF_LEN
      in lib/decompress_inflate.c to conform to bzip2/lzma naming.
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      daeb6b6f
    • K
      mm: make set_mempolicy(MPOL_INTERLEAV) N_HIGH_MEMORY aware · 4bfc4495
      KAMEZAWA Hiroyuki 提交于
      At first, init_task's mems_allowed is initialized as this.
       init_task->mems_allowed == node_state[N_POSSIBLE]
      
      And cpuset's top_cpuset mask is initialized as this
       top_cpuset->mems_allowed = node_state[N_HIGH_MEMORY]
      
      Before 2.6.29:
      policy's mems_allowed is initialized as this.
      
        1. update tasks->mems_allowed by its cpuset->mems_allowed.
        2. policy->mems_allowed = nodes_and(tasks->mems_allowed, user's mask)
      
      Updating task's mems_allowed in reference to top_cpuset's one.
      cpuset's mems_allowed is aware of N_HIGH_MEMORY, always.
      
      In 2.6.30: After commit 58568d2a
      ("cpuset,mm: update tasks' mems_allowed in time"), policy's mems_allowed
      is initialized as this.
      
        1. policy->mems_allowd = nodes_and(task->mems_allowed, user's mask)
      
      Here, if task is in top_cpuset, task->mems_allowed is not updated from
      init's one.  Assume user excutes command as #numactrl --interleave=all
      ,....
      
        policy->mems_allowd = nodes_and(N_POSSIBLE, ALL_SET_MASK)
      
      Then, policy's mems_allowd can includes a possible node, which has no pgdat.
      
      MPOL's INTERLEAVE just scans nodemask of task->mems_allowd and access this
      directly.
      
        NODE_DATA(nid)->zonelist even if NODE_DATA(nid)==NULL
      
      Then, what's we need is making policy->mems_allowed be aware of
      N_HIGH_MEMORY.  This patch does that.  But to do so, extra nodemask will
      be on statck.  Because I know cpumask has a new interface of
      CPUMASK_ALLOC(), I added it to node.
      
      This patch stands on old behavior.  But I feel this fix itself is just a
      Band-Aid.  But to do fundametal fix, we have to take care of memory
      hotplug and it takes time.  (task->mems_allowd should be N_HIGH_MEMORY, I
      think.)
      
      mpol_set_nodemask() should be aware of N_HIGH_MEMORY and policy's nodemask
      should be includes only online nodes.
      
      In old behavior, this is guaranteed by frequent reference to cpuset's
      code.  Now, most of them are removed and mempolicy has to check it by
      itself.
      
      To do check, a few nodemask_t will be used for calculating nodemask.  But,
      size of nodemask_t can be big and it's not good to allocate them on stack.
      
      Now, cpumask_t has CPUMASK_ALLOC/FREE an easy code for get scratch area.
      NODEMASK_ALLOC/FREE shoudl be there.
      
      [akpm@linux-foundation.org: cleanups & tweaks]
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4bfc4495
    • C
      vfs: add __destroy_inode · 2e00c97e
      Christoph Hellwig 提交于
      When we want to tear down an inode that lost the add to the cache race
      in XFS we must not call into ->destroy_inode because that would delete
      the inode that won the race from the inode cache radix tree.
      
      This patch provides the __destroy_inode helper needed to fix this,
      the actual fix will be in th next patch.  As XFS was the only reason
      destroy_inode was exported we shift the export to the new __destroy_inode.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      2e00c97e
    • C
      vfs: fix inode_init_always calling convention · 54e34621
      Christoph Hellwig 提交于
      Currently inode_init_always calls into ->destroy_inode if the additional
      initialization fails.  That's not only counter-intuitive because
      inode_init_always did not allocate the inode structure, but in case of
      XFS it's actively harmful as ->destroy_inode might delete the inode from
      a radix-tree that has never been added.  This in turn might end up
      deleting the inode for the same inum that has been instanciated by
      another process and cause lots of cause subtile problems.
      
      Also in the case of re-initializing a reclaimable inode in XFS it would
      free an inode we still want to keep alive.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
      54e34621
  11. 06 8月, 2009 2 次提交
  12. 05 8月, 2009 2 次提交