1. 25 Oct, 2017 1 commit
    • bpf: permit multiple bpf attachments for a single perf event · e87c6bc3
      Committed by Yonghong Song
      This patch enables multiple bpf attachments for a
      kprobe/uprobe/tracepoint single trace event.
      Each trace_event keeps a list of attached perf events.
      When an event happens, all attached bpf programs will
      be executed based on the order of attachment.
      
      A global bpf_event_mutex lock is introduced to protect
      prog_array attaching and detaching. An alternative would
      be to introduce a mutex lock in every trace_event_call
      structure, but that takes a lot of extra memory.
      So a global bpf_event_mutex lock is a good compromise.
      
      The bpf prog detachment involves allocation of memory.
      If the allocation fails, a dummy do-nothing program
      will replace the to-be-detached program in place.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
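The attach, run, and detach behaviour described in this commit can be illustrated with a small user-space sketch. This is not the kernel's prog_array code: `prog_fn`, `struct prog_array`, the `prog_array_*` helpers, and the `alloc_fails` switch are all hypothetical names chosen for illustration, and the locking that bpf_event_mutex provides is left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bpf program: a callback that receives a context. */
typedef int (*prog_fn)(void *ctx);

/* Context used by the sample programs to record execution order. */
struct run_log {
    char buf[16];
    size_t len;
};

static int log_char(void *ctx, char c)
{
    struct run_log *log = ctx;

    if (log->len + 1 < sizeof(log->buf))
        log->buf[log->len++] = c;
    return 1;
}

static int prog_a(void *ctx) { return log_char(ctx, 'a'); }
static int prog_b(void *ctx) { return log_char(ctx, 'b'); }

/* Do-nothing program that can stand in for a detached one. */
static int dummy_prog(void *ctx) { (void)ctx; return 1; }

struct prog_array {
    size_t count;
    prog_fn *progs;
};

/* Append a program, so that programs run in attachment order. */
static int prog_array_attach(struct prog_array *arr, prog_fn fn)
{
    prog_fn *grown = realloc(arr->progs, (arr->count + 1) * sizeof(*grown));

    if (!grown)
        return -1;
    grown[arr->count] = fn;
    arr->progs = grown;
    arr->count++;
    return 0;
}

/*
 * Detach by copying the remaining programs into a fresh array.
 * alloc_fails simulates an allocation failure, in which case the
 * slot is overwritten in place with dummy_prog.
 */
static int prog_array_detach(struct prog_array *arr, prog_fn fn, int alloc_fails)
{
    size_t i, j, idx = arr->count;
    prog_fn *fresh;

    for (i = 0; i < arr->count; i++) {
        if (arr->progs[i] == fn) {
            idx = i;
            break;
        }
    }
    if (idx == arr->count)
        return -1; /* not attached */

    /* calloc(count, ...) keeps the request non-zero even for one entry. */
    fresh = alloc_fails ? NULL : calloc(arr->count, sizeof(*fresh));
    if (!fresh) {
        arr->progs[idx] = dummy_prog;
        return 0;
    }
    for (i = 0, j = 0; i < arr->count; i++)
        if (i != idx)
            fresh[j++] = arr->progs[i];
    free(arr->progs);
    arr->progs = fresh;
    arr->count--;
    return 0;
}

/* Run every attached program in order; returns how many were called. */
static size_t prog_array_run(const struct prog_array *arr, void *ctx)
{
    size_t i;

    for (i = 0; i < arr->count; i++)
        arr->progs[i](ctx);
    return arr->count;
}
```

When the simulated allocation fails, the array keeps its length and the detached slot simply becomes a no-op, which mirrors the fallback the commit message describes.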
  2. 12 Sep, 2017 1 commit
  3. 29 Aug, 2017 1 commit
    • perf/ftrace: Fix double traces of perf on ftrace:function · 75e83876
      Committed by Zhou Chengming
      When running perf on the ftrace:function tracepoint, there is a bug
      which can be reproduced by:
      
        perf record -e ftrace:function -a sleep 20 &
        perf record -e ftrace:function ls
        perf script
      
                    ls 10304 [005]   171.853235: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853237: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853239: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853240: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853242: ftrace:function:
        __task_pid_nr_ns
                    ls 10304 [005]   171.853244: ftrace:function:
        __task_pid_nr_ns
      
      We can see that all the function traces are doubled.
      
      The problem is caused by the inconsistency of the register
      function perf_ftrace_event_register() with the probe function
      perf_ftrace_function_call(). The former registers one probe
      for every perf_event. And the latter handles all perf_events
      on the current cpu. So when there are two perf_events on the current cpu,
      their traces are doubled.
      
      So this patch adds an extra parameter "event" to perf_tp_event, which
      only sends sample data to this event when it's not NULL.
      Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
      Reviewed-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
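A toy model may help show why the traces doubled and how an explicit event argument avoids it. The sketch below is an assumption-laden simplification: `struct demo_event` and `demo_tp_event()` are hypothetical and only mimic the idea of delivering a sample either to every event on a list or to one named event.

```c
#include <stddef.h>

/* Hypothetical, stripped-down perf event: just a sample counter and a link. */
struct demo_event {
    int samples;
    struct demo_event *next;
};

/*
 * Deliver one sample. With event == NULL the sample goes to every
 * event on the list (the old behaviour); otherwise only to 'event'.
 */
static void demo_tp_event(struct demo_event *head, struct demo_event *event)
{
    struct demo_event *e;

    if (event) {
        event->samples++;
        return;
    }
    for (e = head; e; e = e->next)
        e->samples++;
}
```

With two events registered, each owning its own probe, two broadcast calls give every event two samples per function hit, while two targeted calls give each event exactly one.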
  4. 08 Aug, 2017 1 commit
    • bpf: add support for sys_enter_* and sys_exit_* tracepoints · cf5f5cea
      Committed by Yonghong Song
      Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
      style tracepoints. The iovisor/bcc issue #748
      (https://github.com/iovisor/bcc/issues/748) documents this issue.
      For example, if you try to attach a bpf program to tracepoints
      syscalls/sys_enter_newfstat, you will get the following error:
         # ./tools/trace.py t:syscalls:sys_enter_newfstat
         Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
         Failed to attach BPF to tracepoint
      
      The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
      tracepoints are treated differently from other tracepoints and have
      no bpf hook.
      
      This patch adds bpf support for these syscalls tracepoints by
        . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
        . calling bpf programs in perf_syscall_enter and perf_syscall_exit
      
      The legality of bpf program ctx access is also checked.
      The function trace_event_get_offsets returns the correct max offset for
      each specific syscall tracepoint, which is compared against the maximum
      offset accessed by the bpf program.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
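The ctx access check mentioned at the end of this commit amounts to comparing two numbers. The following is only a sketch of that comparison under assumed names: `struct demo_field`, `demo_max_offset()` and `demo_ctx_access_ok()` are hypothetical and do not reproduce the kernel's trace_event_get_offsets.

```c
#include <stddef.h>

/* Hypothetical description of one tracepoint field. */
struct demo_field {
    unsigned int offset;
    unsigned int size;
};

/* Largest end offset (offset + size) over all fields of a tracepoint. */
static unsigned int demo_max_offset(const struct demo_field *fields, size_t n)
{
    unsigned int max = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (fields[i].offset + fields[i].size > max)
            max = fields[i].offset + fields[i].size;
    return max;
}

/* A program may attach only if it never reads past the tracepoint record. */
static int demo_ctx_access_ok(unsigned int prog_max_ctx_offset,
                              unsigned int tracepoint_max_offset)
{
    return prog_max_ctx_offset <= tracepoint_max_offset;
}
```

A program that reads up to byte 24 of a 24-byte record would be accepted, while one that reads up to byte 32 would be rejected.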
  5. 29 Sep, 2016 1 commit
  6. 08 Apr, 2016 1 commit
  7. 09 Mar, 2016 1 commit
  8. 29 Feb, 2016 1 commit
    • tracing/syscalls: Rename "/format" tracepoint field name "nr" to "__syscall_nr" · 026842d1
      Committed by Taeung Song
      Some tracepoints have multiple fields with the same name, "nr". The first
      one is a unique syscall ID, the other is a syscall argument:
      
        # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_io_getevents/format
        name: sys_enter_io_getevents
        ID: 747
        format:
       	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
       	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
       	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
       	field:int common_pid;	offset:4;	size:4;	signed:1;
      
       	field:int nr;	offset:8;	size:4;	signed:1;
       	field:aio_context_t ctx_id;	offset:16;	size:8;	signed:0;
       	field:long min_nr;	offset:24;	size:8;	signed:0;
       	field:long nr;	offset:32;	size:8;	signed:0;
       	field:struct io_event * events;	offset:40;	size:8;	signed:0;
       	field:struct timespec * timeout;	offset:48;	size:8;	signed:0;
      
        print fmt: "ctx_id: 0x%08lx, min_nr: 0x%08lx, nr: 0x%08lx, events: 0x%08lx, timeout: 0x%08lx", ((unsigned long)(REC->ctx_id)), ((unsigned long)(REC->min_nr)), ((unsigned long)(REC->nr)), ((unsigned long)(REC->events)), ((unsigned long)(REC->timeout))
        #
      
      Fix it by renaming the "/format" common tracepoint field "nr" to "__syscall_nr".
      Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
      [ Do not rename the struct member, just the '/format' field name ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160226132301.3ae065a4@gandalf.local.home
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
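Why a duplicated field name is a problem can be seen with a name-based lookup, which is roughly what a tool parsing the format file has to do. The sketch below is illustrative only: the offsets are taken from the sys_enter_io_getevents listing above, while `struct demo_format_field` and `demo_field_offset()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parsed entry of a '/format' file: field name and offset. */
struct demo_format_field {
    const char *name;
    int offset;
};

/* Return the offset of the first field with the given name, or -1. */
static int demo_field_offset(const struct demo_format_field *fields, size_t n,
                             const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(fields[i].name, name) == 0)
            return fields[i].offset;
    return -1;
}

/* Layout before the rename: two fields are both called "nr". */
static const struct demo_format_field demo_before[] = {
    { "nr", 8 }, { "ctx_id", 16 }, { "min_nr", 24 },
    { "nr", 32 }, { "events", 40 }, { "timeout", 48 },
};

/* Layout after the rename: the syscall ID is "__syscall_nr". */
static const struct demo_format_field demo_after[] = {
    { "__syscall_nr", 8 }, { "ctx_id", 16 }, { "min_nr", 24 },
    { "nr", 32 }, { "events", 40 }, { "timeout", 48 },
};
```

Before the rename a lookup of "nr" can only ever reach the syscall ID at offset 8, so the syscall argument at offset 32 is shadowed; after the rename both are addressable.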
  9. 01 Oct, 2015 1 commit
    • tracing: Move trace_flags from global to a trace_array field · 983f938a
      Committed by Steven Rostedt (Red Hat)
      In preparation to make trace options per instance, the global trace_flags
      needs to be moved from being a global variable to a field within the trace
      instance trace_array structure.
      
      There's still more work to do, as there are some functions that use
      trace_flags without passing in a way to get to the current_trace array. For
      those, the global_trace is used directly (from trace.c). This includes
      setting and clearing the trace_flags. This means that when a new instance is
      created, it just gets the trace_flags of the global_trace and will not be
      able to modify them. Depending on the functions that have access to the
      trace_array, the flags of an instance may not affect parts of its trace,
      where the global_trace is used. These will be fixed in future changes.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  10. 14 May, 2015 3 commits
  11. 14 Jan, 2015 1 commit
    • perf: Avoid horrible stack usage · 86038c5e
      Committed by Peter Zijlstra (Intel)
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to ascertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
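The idea of replacing an on-stack pt_regs with static storage that follows the nesting of software events can be sketched as below. Everything here is an assumption made for illustration: the four nesting levels, `struct demo_regs`, and the `demo_*` helpers are hypothetical, and a single array stands in for what would be per-cpu storage in the kernel.

```c
#include <stddef.h>

/* Assumed nesting levels; one storage slot is reserved for each. */
enum demo_context {
    DEMO_CTX_TASK,
    DEMO_CTX_SOFTIRQ,
    DEMO_CTX_HARDIRQ,
    DEMO_CTX_NMI,
    DEMO_CTX_MAX
};

/* Hypothetical minimal register set needed to report an event location. */
struct demo_regs {
    unsigned long ip;
    unsigned long sp;
};

/* Static storage instead of a large on-stack structure. */
static struct demo_regs demo_perf_regs[DEMO_CTX_MAX];

/* Fill the slot for this nesting level and return a pointer to it. */
static struct demo_regs *demo_fetch_regs(enum demo_context ctx,
                                         unsigned long ip, unsigned long sp)
{
    struct demo_regs *regs = &demo_perf_regs[ctx];

    regs->ip = ip;
    regs->sp = sp;
    return regs;
}
```

Because an event in interrupt context uses a different slot from the task-context event it interrupted, the outer event's registers stay intact without any stack allocation.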
  12. 15 Dec, 2014 1 commit
  13. 20 Nov, 2014 1 commit
  14. 31 Oct, 2014 1 commit
    • tracing/syscalls: Ignore numbers outside NR_syscalls' range · 086ba77a
      Committed by Rabin Vincent
      ARM has some private syscalls (for example, set_tls(2)) which lie
      outside the range of NR_syscalls.  If any of these are called while
      syscall tracing is being performed, out-of-bounds array access will
      occur in the ftrace and perf sys_{enter,exit} handlers.
      
       # trace-cmd record -e raw_syscalls:* true && trace-cmd report
       ...
       true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
       true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
       true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
       true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
       ...
      
       # trace-cmd record -e syscalls:* true
       [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
       [   17.289590] pgd = 9e71c000
       [   17.289696] [aaaaaace] *pgd=00000000
       [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
       [   17.290169] Modules linked in:
       [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
       [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
       [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
       [   17.290866] LR is at syscall_trace_enter+0x124/0x184
      
      Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
      
      Commit cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      added the check for less than zero, but it should have also checked
      for greater than NR_syscalls.
      
      Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
      
      Fixes: cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      Cc: stable@vger.kernel.org # 2.6.33+
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
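The fix boils down to checking both ends of the range before indexing a per-syscall table. A minimal sketch, assuming a made-up table size `DEMO_NR_SYSCALLS` and a hypothetical `demo_syscall_enabled[]` array in place of the kernel's own data:

```c
#include <stddef.h>

/* Hypothetical table size standing in for NR_syscalls. */
#define DEMO_NR_SYSCALLS 400

/* Hypothetical per-syscall "tracing enabled" table. */
static int demo_syscall_enabled[DEMO_NR_SYSCALLS];

/*
 * Return nonzero if the syscall should be traced. Numbers below zero
 * or at and above the table size are ignored instead of being used
 * as an index.
 */
static int demo_should_trace(long syscall_nr)
{
    if (syscall_nr < 0 || syscall_nr >= DEMO_NR_SYSCALLS)
        return 0;
    return demo_syscall_enabled[syscall_nr];
}
```

With this check, a private syscall number such as the 983045 seen in the raw_syscalls output above is skipped rather than read from outside the table.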
  15. 10 Sep, 2014 1 commit
  16. 10 Jan, 2014 1 commit
  17. 07 Jan, 2014 1 commit
  18. 22 Dec, 2013 1 commit
    • tracing: Add and use generic set_trigger_filter() implementation · bac5fb97
      Committed by Tom Zanussi
      Add a generic event_command.set_trigger_filter() op implementation and
      have the current set of trigger commands use it - this essentially
      gives them all support for filters.
      
      Syntactically, filters are supported by adding 'if <filter>' just
      after the command, in which case only events matching the filter will
      invoke the trigger.  For example, to add a filter to an
      enable/disable_event command:
      
          echo 'enable_event:system:event if common_pid == 999' > \
                    .../othersys/otherevent/trigger
      
      The above command will only enable the system:event event if the
      common_pid field in the othersys:otherevent event is 999.
      
      As another example, to add a filter to a stacktrace command:
      
          echo 'stacktrace if common_pid == 999' > \
                         .../somesys/someevent/trigger
      
      The above command will only trigger a stacktrace if the common_pid
      field in the event is 999.
      
      The filter syntax is the same as that described in the 'Event
      filtering' section of Documentation/trace/events.txt.
      
      Because triggers can now use filters, the trigger-invoking logic needs
      to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
      trigger has a filter associated with it, the trigger invocation now
      needs to happen after the { assign; } part of the call, in order for
      the trigger condition to be tested.
      
      There's still a SOFT_DISABLED-only check at the top of e.g. the
      ftrace_raw_events function, so when an event is soft disabled but not
      because of the presence of a trigger, the original SOFT_DISABLED
      behavior remains unchanged.
      
      There's also a bit of trickiness in that some triggers need to avoid
      being invoked while an event is currently in the process of being
      logged, since the trigger may itself log data into the trace buffer.
      Thus we make sure the current event is committed before invoking those
      triggers.  To do that, we split the trigger invocation in two - the
      first part (event_triggers_call()) checks the filter using the current
      trace record; if a command has the post_trigger flag set, it sets a
      bit for itself in the return value, otherwise it directly invokes the
      trigger.  Once all commands have been either invoked or set their
      return flag, event_triggers_call() returns.  The current record is
      then either committed or discarded; if any commands have deferred
      their triggers, those commands are finally invoked following the close
      of the current event by event_triggers_post_call().
      
      To simplify the above and make it more efficient, the TRIGGER_COND bit
      is introduced, which is set only if a soft-disabled trigger needs to
      use the log record for filter testing or needs to wait until the
      current log record is closed.
      
      The syscall event invocation code is also changed in analogous ways.
      
      Because event triggers need to be able to create and free filters,
      this also adds a couple external wrappers for the existing
      create_filter and free_filter functions, which are too generic to be
      made extern functions themselves.
      
      Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
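The two-phase trigger invocation described in this commit can be modelled in a few lines. This is a sketch under stated assumptions rather than the kernel's code: `struct demo_trigger`, the pid-only filter, and the `demo_triggers_call()` / `demo_triggers_post_call()` pair are hypothetical simplifications of event_triggers_call() and event_triggers_post_call().

```c
#include <stddef.h>

/* Hypothetical trace record: only the field used by the filter. */
struct demo_record {
    int common_pid;
};

struct demo_trigger {
    unsigned int type_bit; /* unique bit identifying the command */
    int post_trigger;      /* nonzero: run only after the record is closed */
    int filter_pid;        /* fire only for this pid; -1 matches any pid */
    int fired;             /* how many times the trigger ran */
};

/*
 * Phase one: test each filter against the current record. Triggers
 * without post_trigger run immediately; the others set their bit in
 * the returned mask and wait.
 */
static unsigned int demo_triggers_call(struct demo_trigger *t, size_t n,
                                       const struct demo_record *rec)
{
    unsigned int deferred = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (t[i].filter_pid != -1 && t[i].filter_pid != rec->common_pid)
            continue;
        if (t[i].post_trigger)
            deferred |= t[i].type_bit;
        else
            t[i].fired++;
    }
    return deferred;
}

/* Phase two: after the record is committed or discarded, run the rest. */
static void demo_triggers_post_call(struct demo_trigger *t, size_t n,
                                    unsigned int deferred)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (deferred & t[i].type_bit)
            t[i].fired++;
}
```

The caller would commit or discard the record between the two calls, so a deferred trigger that itself writes to the trace buffer never runs while the current event is still open.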
  19. 21 Dec, 2013 1 commit
    • tracing: Add basic event trigger framework · 85f2b082
      Committed by Tom Zanussi
      Add a 'trigger' file for each trace event, enabling 'trace event
      triggers' to be set for trace events.
      
      'trace event triggers' are patterned after the existing 'ftrace
      function triggers' implementation except that triggers are written to
      per-event 'trigger' files instead of to a single file such as the
      'set_ftrace_filter' used for ftrace function triggers.
      
      The implementation is meant to be entirely separate from ftrace
      function triggers, in order to keep the respective implementations
      relatively simple and to allow them to diverge.
      
      The event trigger functionality is built on top of SOFT_DISABLE
      functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
      flags which is checked when any trace event fires.  Triggers set for a
      particular event need to be checked regardless of whether that event
      is actually enabled or not - getting an event to fire even if it's not
      enabled is what's already implemented by SOFT_DISABLE mode, so trigger
      mode directly reuses that.  Event triggers essentially inherit the soft
      disable logic in __ftrace_event_enable_disable() while adding a bit of
      logic and trigger reference counting via tm_ref on top of that in a
      new trace_event_trigger_enable_disable() function.  Because the base
      __ftrace_event_enable_disable() code now needs to be invoked from
      outside trace_events.c, a wrapper is also added for those usages.
      
      The triggers for an event are actually invoked via a new function,
      event_triggers_call(), and code is also added to invoke them for
      ftrace_raw_event calls as well as syscall events.
      
      The main part of the patch creates a new trace_events_trigger.c file
      to contain the trace event triggers implementation.
      
      The standard open, read, and release file operations are implemented
      here.
      
      The open() implementation sets up for the various open modes of the
      'trigger' file.  It creates and attaches the trigger iterator and sets
      up the command parser.  If opened for reading, it sets up the trigger
      seq_ops.
      
      The read() implementation parses the event trigger written to the
      'trigger' file, looks up the trigger command, and passes it along to
      that event_command's func() implementation for command-specific
      processing.
      
      The release() implementation does whatever cleanup is needed to
      release the 'trigger' file, like releasing the parser and trigger
      iterator, etc.
      
      A couple of functions for event command registration and
      unregistration are added, along with a list to add them to and a mutex
      to protect them, as well as an (initially empty) registration function
      to add the set of commands that will be added by future commits, and
      a call to it from the trace event initialization code.
      
      Also added are a couple trigger-specific data structures needed for
      these implementations such as a trigger iterator and a struct for
      trigger-specific data.
      
      A couple structs consisting mostly of functions meant to be implemented
      in command-specific ways, event_command and event_trigger_ops, are
      used by the generic event trigger command implementations.  They're
      being put into trace.h alongside the other trace_event data structures
      and functions, in the expectation that they'll be needed in several
      trace_event-related files such as trace_events_trigger.c and
      trace_events.c.
      
      The event_command.func() function is meant to be called by the trigger
      parsing code in order to add a trigger instance to the corresponding
      event.  It essentially coordinates adding a live trigger instance to
      the event, and arming the triggering event.
      
      Every event_command func() implementation essentially does the
      same thing for any command:
      
         - choose ops - use the value of param to choose either a number or
           count version of event_trigger_ops specific to the command
         - do the register or unregister of those ops
         - associate a filter, if specified, with the triggering event
      
      The reg() and unreg() ops allow command-specific implementations for
      event_trigger_op registration and unregistration, and the
      get_trigger_ops() op allows command-specific event_trigger_ops
      selection to be parameterized.  When a trigger instance is added, the
      reg() op essentially adds that trigger to the triggering event and
      arms it, while unreg() does the opposite.  The set_filter() function
      is used to associate a filter with the trigger - if the command
      doesn't specify a set_filter() implementation, the command will ignore
      filters.
      
      Each command has an associated trigger_type, which serves double duty,
      both as a unique identifier for the command as well as a value that
      can be used for setting a trigger mode bit during trigger invocation.
      
      The signature of func() adds a pointer to the event_command struct,
      used to invoke those functions, along with a command_data param that
      can be passed to the reg/unreg functions.  This allows func()
      implementations to use command-specific blobs and supports code
      re-use.
      
      The event_trigger_ops.func() command corresponds to the trigger 'probe'
      function that gets called when the triggering event is actually
      invoked.  The other functions are used to list the trigger when
      needed, along with a couple mundane book-keeping functions.
      
      This also moves event_file_data() into trace.h so it can be used
      outside of trace_events.c.
      
      Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Idea-by: Steve Rostedt <rostedt@goodmis.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
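The "choose ops" step, where the presence of a parameter selects either a plain or a count version of the trigger ops, might look roughly like this. The names `struct demo_trigger_data`, `struct demo_trigger_ops` and `demo_get_trigger_ops()` are hypothetical, and the signatures are deliberately simpler than the real event_trigger_ops.

```c
#include <stddef.h>

/* Hypothetical per-trigger state. */
struct demo_trigger_data {
    long count; /* remaining invocations for the count version */
    int fired;  /* how many times the trigger actually ran */
};

/* Hypothetical ops table: only the 'probe' function is modelled. */
struct demo_trigger_ops {
    void (*func)(struct demo_trigger_data *data);
};

/* Plain version: fires every time the event hits. */
static void demo_fire_always(struct demo_trigger_data *data)
{
    data->fired++;
}

/* Count version: fires until the remaining count reaches zero. */
static void demo_fire_counted(struct demo_trigger_data *data)
{
    if (data->count == 0)
        return;
    data->count--;
    data->fired++;
}

static const struct demo_trigger_ops demo_plain_ops = { demo_fire_always };
static const struct demo_trigger_ops demo_count_ops = { demo_fire_counted };

/* Pick the count version only when the command carried a parameter. */
static const struct demo_trigger_ops *demo_get_trigger_ops(const char *param)
{
    return param ? &demo_count_ops : &demo_plain_ops;
}
```

A command written with a count parameter of 2 would then stop firing after the second hit, while the same command without a parameter keeps firing.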
  20. 06 Dec, 2013 1 commit
  21. 06 Nov, 2013 2 commits
    • tracing: Add support for SOFT_DISABLE to syscall events · d562aff9
      Committed by Tom Zanussi
      The original SOFT_DISABLE patches didn't add support for soft disable
      of syscall events; this adds it.
      
      Add an array of ftrace_event_file pointers indexed by syscall number
      to the trace array and remove the existing enabled bitmaps, which as a
      result are now redundant.  The ftrace_event_file structs in turn
      contain the soft disable flags we need for per-syscall soft disable
      accounting.
      
      Adding ftrace_event_files also means we can remove the USE_CALL_FILTER
      bit, thus enabling multibuffer filter support for syscall events.
      
      Link: http://lkml.kernel.org/r/6e72b566e85d8df8042f133efbc6c30e21fb017e.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
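Replacing an enabled bitmap with an array of per-syscall file pointers can be pictured as follows. The sketch is an illustration under assumptions: `DEMO_NR_SYSCALLS`, the flag value, `struct demo_event_file` and `struct demo_trace_array` are hypothetical, and a NULL pointer is taken to mean "not enabled in this trace array".

```c
#include <stddef.h>

#define DEMO_NR_SYSCALLS 8

/* Hypothetical flag bit kept in each event file. */
#define DEMO_FILE_SOFT_DISABLED (1u << 0)

struct demo_event_file {
    unsigned int flags;
};

/* One pointer per syscall number replaces the old enabled bitmap. */
struct demo_trace_array {
    struct demo_event_file *enter_files[DEMO_NR_SYSCALLS];
};

/* Return nonzero if a syscall-enter event should be recorded. */
static int demo_should_record(const struct demo_trace_array *tr, long nr)
{
    const struct demo_event_file *file;

    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return 0;
    file = tr->enter_files[nr];
    if (!file)
        return 0; /* not enabled in this trace array */
    if (file->flags & DEMO_FILE_SOFT_DISABLED)
        return 0; /* enabled, but currently soft disabled */
    return 1;
}
```

The pointer answers the question the bitmap used to answer, and the file it points to additionally carries the soft disable state for that syscall.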
    • tracing: Update event filters for multibuffer · f306cc82
      Committed by Tom Zanussi
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
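The difference between a filter stored on the shared event call and one stored on each instance's event file can be shown with the kmalloc example from this commit. The sketch is hypothetical throughout: a filter is reduced to a single `bytes_alloc >` threshold, and `struct demo_event_call`, `struct demo_event_file` and `demo_filter_check_discard()` are illustrative names.

```c
#include <stddef.h>

/* Hypothetical flag: keep using the filter stored on the event call. */
#define DEMO_USE_CALL_FILTER (1u << 0)

/* Shared across all trace instances. */
struct demo_event_call {
    unsigned int flags;
    long min_bytes; /* call-wide filter: bytes_alloc > min_bytes */
};

/* One per trace instance, pointing at the shared call. */
struct demo_event_file {
    struct demo_event_call *call;
    long min_bytes; /* per-instance filter: bytes_alloc > min_bytes */
};

/* Return nonzero if the record fails the filter and should be discarded. */
static int demo_filter_check_discard(const struct demo_event_file *file,
                                     long bytes_alloc)
{
    long threshold;

    if (file->call->flags & DEMO_USE_CALL_FILTER)
        threshold = file->call->min_bytes;
    else
        threshold = file->min_bytes;
    return !(bytes_alloc > threshold);
}
```

With per-file thresholds of 8192 for the top-level instance and 2048 for test1, a 4096-byte allocation is discarded in the former and kept in the latter; with the call-wide flag set, both instances share one threshold as in the "Before" listing.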
  22. 22 Aug, 2013 1 commit
  23. 19 Jul, 2013 3 commits
  24. 03 Jul, 2013 1 commit
    • tracing: Fix irqs-off tag display in syscall tracing · 11034ae9
      Committed by zhangwei(Jovi)
      All syscall tracing irqs-off tags are wrong, the syscall enter entry doesn't
      disable irqs.
      
       [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
       [root@jovi tracing]# cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 13/13   #P:2
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
             irqbalance-513   [000] d... 56115.496766: sys_open(filename: 804e1a6, flags: 0, mode: 1b6)
             irqbalance-513   [000] d... 56115.497008: sys_open(filename: 804e1bb, flags: 0, mode: 1b6)
               sendmail-771   [000] d... 56115.827982: sys_open(filename: b770e6d1, flags: 0, mode: 1b6)
      
      The reason is syscall tracing doesn't record irq_flags into buffer.
      The proper display is:
      
       [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
       [root@jovi tracing]# cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 14/14   #P:2
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
             irqbalance-514   [001] ....    46.213921: sys_open(filename: 804e1a6, flags: 0, mode: 1b6)
             irqbalance-514   [001] ....    46.214160: sys_open(filename: 804e1bb, flags: 0, mode: 1b6)
                  <...>-920   [001] ....    47.307260: sys_open(filename: 4e82a0c5, flags: 80000, mode: 0)
      
      Link: http://lkml.kernel.org/r/1365564393-10972-3-git-send-email-jovi.zhangwei@huawei.com
      
      Cc: stable@vger.kernel.org # 2.6.35
      Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  25. 16 Mar, 2013 1 commit
  26. 15 Mar, 2013 4 commits
  27. 13 Feb, 2013 1 commit
    • tracing/syscalls: Allow archs to ignore tracing compat syscalls · f431b634
      Committed by Steven Rostedt
      The tracing of ia32 compat system calls has been a bit of a pain as they
      use different system call numbers than the 64bit equivalents.
      
      I wrote a simple 'lls' program that lists files. I compiled it as an i686
      ELF binary and ran it on an x86_64 box. This is the result:
      
      echo 0 > /debug/tracing/tracing_on
      echo 1 > /debug/tracing/events/syscalls/enable
      echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on
      
      grep lls /debug/tracing/trace
      
      [.. skipping calls before TS_COMPAT is set ...]
      
                   lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
                   lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
                   lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
                   lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
                   lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
                   lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
                   lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
                   lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
                   lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
                   lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
                   lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
                   lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
                   lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409279: sys_close(fd: 3)
                   lls-1127  [005] d...   936.421550: sys_close -> 0x200
                   lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
                   lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
                   lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
                   lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
                   lls-1127  [005] d...   936.421580: sys_capget -> 0x0
                   lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
                   lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
                   lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
                   lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
                   lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
                   lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
      
      Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
      obviously incorrect, and confusing.
      
      Other efforts have been made to fix this:
      
      https://lkml.org/lkml/2012/3/26/367
      
      But the real solution is to rewrite the syscall internals and come up
      with a fixed solution. One that doesn't require all the kluge that the
      current solution has.
      
      Thus for now, instead of outputting incorrect data, simply ignore them.
      With this patch the changes now have:
      
       #> grep lls /debug/tracing/trace
       #>
      
      Compat system calls simply are not traced. If users need compat
      syscalls, then they should just use the raw syscall tracepoints.
      
      For an architecture to make their compat syscalls ignored, it must
      define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
      define an arch_trace_is_compat_syscall() function that will return true
      if the current task should ignore tracing the syscall.
      
      I want to stress that this change does not affect actual syscalls in any
      way, shape or form. It is only used within the tracing system and
      doesn't interfere with the syscall logic at all. The changes are
      consolidated nicely into trace_syscalls.c and asm/ftrace.h.
      
      I had to make one small modification to asm/thread_info.h and that was
      to remove the include of asm/ftrace.h. As asm/ftrace.h required the
      current_thread_info() it was causing include hell. That include was
      added back in 2008 when the function graph tracer was added:
      
       commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"
      
      It does not need to be included there.
      
      Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
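The arch hook described in this commit lets the tracer bail out before it looks up a syscall number that would be misinterpreted. A minimal sketch of that control flow, assuming a hypothetical `struct demo_task` and `demo_`-prefixed stand-ins for arch_trace_is_compat_syscall() and the tracer's number lookup:

```c
#include <stddef.h>

/* Hypothetical, simplified view of the task state the tracer consults. */
struct demo_task {
    int in_compat_mode; /* nonzero while executing a compat syscall */
    long syscall_nr;    /* raw syscall number */
};

/* Stand-in for the arch hook: nonzero if tracing should be skipped. */
static int demo_arch_trace_is_compat_syscall(const struct demo_task *task)
{
    return task->in_compat_mode;
}

/* Return the syscall number to trace, or -1 to skip tracing entirely. */
static long demo_trace_get_syscall_nr(const struct demo_task *task)
{
    if (demo_arch_trace_is_compat_syscall(task))
        return -1;
    return task->syscall_nr;
}
```

Returning -1 for compat tasks means the syscall tracepoints stay silent for them, which matches the empty grep output shown above, while the syscall itself is unaffected.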
  28. 22 Jan, 2013 1 commit
  29. 01 Nov, 2012 1 commit
  30. 25 Sep, 2012 1 commit
  31. 18 Aug, 2012 1 commit
  32. 31 Jul, 2012 1 commit