1. 01 Sep, 2017 1 commit
    • S
      tracing: Only have rmmod clear buffers that its events were active in · 065e63f9
      Steven Rostedt (VMware) authored
      Currently, if any event of a module has been enabled, removing that module
      clears all ring buffers. This prevents another module from being loaded
      and having one of its trace event IDs reuse a trace event ID of the
      removed module. Such reuse could cause undesirable effects, as the trace
      event of the new module would use its own processing algorithms to process
      the raw data of another event. To prevent this, when a module is removed, if
      any of its events have been used (signified by the WAS_ENABLED event call
      flag, which is never cleared), all ring buffers are cleared, just in case
      any one of them contains event data of the removed event.
      
      The problem is that there is no reason to clear all ring buffers if only one
      (or fewer than all of them) uses one of the events. Instead, only clear the
      ring buffers that recorded the events of a module that is being removed.
      
      To do this, instead of keeping the WAS_ENABLED flag with the trace event
      call, move it to the per instance (per ring buffer) event file descriptor.
      The event file descriptor maps each event to a separate ring buffer
      instance. Then when the module is removed, only the ring buffers that
      activated one of the module's events get cleared. The rest are not touched.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      065e63f9
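The per-instance flag described above can be pictured with a small userspace model. This is only a sketch: `struct event_file`, `struct instance`, `enable_event()` and `module_going()` are hypothetical stand-ins, not the kernel's structures or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-instance event file. */
struct event_file {
    int event_id;      /* trace event ID owned by some module */
    bool was_enabled;  /* set on first enable, never cleared */
};

/* Hypothetical stand-in for a tracing instance and its ring buffer. */
struct instance {
    struct event_file file;  /* one file per event per instance */
    size_t entries;          /* number of records in the ring buffer */
};

/* Enabling an event marks only this instance's file. */
static void enable_event(struct instance *inst)
{
    inst->file.was_enabled = true;
}

/*
 * On module removal, reset only the buffers that may hold its events.
 * Returns how many ring buffers were cleared.
 */
static int module_going(struct instance *insts, size_t n, int event_id)
{
    int cleared = 0;

    for (size_t i = 0; i < n; i++) {
        if (insts[i].file.event_id != event_id || !insts[i].file.was_enabled)
            continue;
        insts[i].entries = 0;
        cleared++;
    }
    return cleared;
}
```

With three instances of which only one enabled the event, `module_going()` resets that single buffer and leaves the other two untouched.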
  2. 28 Jun, 2017 1 commit
    • J
      tracing: Add support for recording tgid of tasks · d914ba37
      Joel Fernandes authored
      In order to support recording of tgid, the following changes are made:
      
      * Introduce a new API (tracing_record_taskinfo) to additionally record the tgid
        along with the task's comm at the same time. This has the benefit of not
        setting trace_cmdline_save before all the information for a task is saved.
      * Add a new API tracing_record_taskinfo_sched_switch to record task information
        for 2 tasks at a time (previous and next) and use it from sched_switch probe.
      * Preserve the old API (tracing_record_cmdline) and create it as a wrapper
        around the new one so that existing callers aren't affected.
      * Reuse the existing sched_switch and sched_wakeup probes to record tgid
        information and add a new option 'record-tgid' to enable recording of tgid.
      
      When the record-tgid option isn't enabled to begin with, we take care to make
      sure that there isn't memory or runtime overhead.
      
      Link: http://lkml.kernel.org/r/20170627020155.5139-1-joelaf@google.com
      
      Cc: kernel-team@android.com
      Cc: Ingo Molnar <mingo@redhat.com>
      Tested-by: Michael Sartain <mikesart@gmail.com>
      Signed-off-by: Joel Fernandes <joelaf@google.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d914ba37
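The wrapper arrangement in the list above might look roughly like the following userspace model. The flag names, the `struct task` type, the table sizes and the shortened function names are assumptions made for illustration; they are not taken from the patch.

```c
#include <string.h>

#define TRACE_RECORD_CMDLINE (1 << 0)  /* assumed flag name */
#define TRACE_RECORD_TGID    (1 << 1)  /* assumed flag name */
#define MAX_PID  64                    /* tiny table, illustration only */
#define COMM_LEN 16

static char saved_comm[MAX_PID][COMM_LEN];
static int saved_tgid[MAX_PID];

/* Hypothetical, minimal view of a task. */
struct task {
    int pid;
    int tgid;
    const char *comm;
};

/* Record whatever the flags ask for in a single pass. */
static void record_taskinfo(const struct task *t, int flags)
{
    if (t->pid < 0 || t->pid >= MAX_PID)
        return;
    if (flags & TRACE_RECORD_CMDLINE) {
        strncpy(saved_comm[t->pid], t->comm, COMM_LEN - 1);
        saved_comm[t->pid][COMM_LEN - 1] = '\0';
    }
    if (flags & TRACE_RECORD_TGID)
        saved_tgid[t->pid] = t->tgid;
}

/* sched_switch variant: record both tasks with the same flags. */
static void record_taskinfo_sched_switch(const struct task *prev,
                                         const struct task *next, int flags)
{
    record_taskinfo(prev, flags);
    record_taskinfo(next, flags);
}

/* Old entry point kept as a thin wrapper, so callers need no change. */
static void record_cmdline(const struct task *t)
{
    record_taskinfo(t, TRACE_RECORD_CMDLINE);
}
```

In this model a caller that never passes `TRACE_RECORD_TGID` never touches `saved_tgid`, which mirrors the stated goal of adding no runtime cost when the option is off.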
  3. 14 Jun, 2017 3 commits
  4. 21 Apr, 2017 8 commits
    • S
      tracing/ftrace: Add a better way to pass data via the probe functions · 6e444319
      Steven Rostedt (VMware) authored
      With the redesign of the registration and execution of the function probes
      (triggers), data can now be passed from the setup of the probe to the probe
      callers that are specific to the trace_array it is on. Although all probes
      still only affect the toplevel trace array, this change will allow for
      instances to have their own probes separated from other instances and the
      top array.
      
      That is, something like the stacktrace probe can be set to trace only in an
      instance and not the toplevel trace array. This isn't implemented yet, but
      this change lays the groundwork for it.
      
      When a probe callback is triggered (someone writes the probe format into
      set_ftrace_filter), it calls register_ftrace_function_probe() passing in
      init_data that will be used to initialize the probe. Then for every matching
      function, register_ftrace_function_probe() will call the probe_ops->init()
      function with the init data that was passed to it, as well as an address to
      a place holder that is associated with the probe and the instance. The first
      occurrence will have a NULL in the pointer. The init() function will then
      initialize it. If other probes are added, or more functions are part of the
      probe, the place holder will be passed to the init() function with the place
      holder data that it was initialized to the last time.
      
      Then this place holder is passed to each of the other probe_ops functions,
      where it can be used in the function callback. When the probe_ops free()
      function is called, it can be called either with the ip of the function
      that is being removed from the probe, or zero, indicating that there are no
      more functions attached to the probe, and the place holder is about to be
      freed. This gives the probe_ops a way to free the data it assigned to the
      place holder if it was allocated during the first init call.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      6e444319
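The place holder life cycle described in this message can be modelled in userspace as follows. The signatures are simplified assumptions (the real callbacks take more arguments), and `struct probe_state` is a hypothetical payload.

```c
#include <stdlib.h>

/* Hypothetical per-probe, per-instance state kept behind the place holder. */
struct probe_state {
    int nr_funcs;  /* functions currently attached to the probe */
};

/* Called once per matching function; *data is NULL on the first call. */
static int probe_init(unsigned long ip, void *init_data, void **data)
{
    struct probe_state *st = *data;

    (void)ip;
    (void)init_data;
    if (!st) {
        /* First occurrence: allocate and publish the state. */
        st = calloc(1, sizeof(*st));
        if (!st)
            return -1;
        *data = st;
    }
    st->nr_funcs++;
    return 0;
}

/*
 * ip != 0: one function is leaving the probe.
 * ip == 0: no functions remain and the place holder is about to go away,
 *          so release what the first init call allocated.
 */
static void probe_free(unsigned long ip, void **data)
{
    struct probe_state *st = *data;

    if (!st)
        return;
    if (ip) {
        st->nr_funcs--;
        return;
    }
    free(st);
    *data = NULL;
}
```

Two `probe_init()` calls on the same slot share one allocation, and only the final `probe_free(0, ...)` call releases it.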
    • S
      ftrace: Dynamically create the probe ftrace_ops for the trace_array · 7b60f3d8
      Steven Rostedt (VMware) authored
      In order to eventually have each trace_array instance have its own unique
      set of function probes (triggers), the trace array needs to hold the ops and
      the filters for the probes.
      
      This is the first step to accomplish this. Instead of having the private
      data of the probe ops point to the trace_array, create a separate list that
      the trace_array holds. There's only one private_data for a probe, but we need
      one per trace_array. The probe ftrace_ops will be dynamically created for
      each instance, instead of being static.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      7b60f3d8
    • S
      tracing: Pass the trace_array into ftrace_probe_ops functions · b5f081b5
      Steven Rostedt (VMware) authored
      Pass the trace_array associated with a ftrace_probe_ops into the probe_ops
      func(), init() and free() functions. The trace_array is the descriptor that
      describes a tracing instance. This will help create the infrastructure that
      will allow having function probes unique to tracing instances.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      b5f081b5
    • S
      tracing: Have the trace_array hold the list of registered func probes · 04ec7bb6
      Steven Rostedt (VMware) authored
      Add a linked list to the trace_array to hold func probes that are registered.
      Currently, all function probes are the same for all instances as it was
      before, that is, only the top level trace_array holds the function probes.
      But this lays the groundwork to have function probes be attached to
      individual instances, and having the event trigger only affect events in the
      given instance. But that work is still to be done.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      04ec7bb6
    • S
      ftrace: Have unregister_ftrace_function_probe_func() return a value · d3d532d7
      Steven Rostedt (VMware) authored
      Currently unregister_ftrace_function_probe_func() is a void function. It
      does not give any feedback if an error occurred or no item was found to
      remove and nothing was done.
      
      Change it to return a status, reporting success only if it removed
      something. Also update the callers to return that feedback to the user.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      d3d532d7
    • S
      ftrace: Remove data field from ftrace_func_probe structure · 1a48df00
      Steven Rostedt (VMware) authored
      No users of the function probes use the data field anymore. Remove it, and
      change the init function to take a void *data parameter instead of a
      void **data, because the init will just get the data that the registering
      function received, and there's no state after it is called.
      
      The other functions for ftrace_probe_ops still take the data parameter, but
      it will currently only be passed NULL. It will stay as a parameter for
      future data to be passed to these functions.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1a48df00
    • S
      ftrace: Added ftrace_func_mapper for function probe triggers · 41794f19
      Steven Rostedt (VMware) authored
      In order to move the ops to the function probes directly, they need a way to
      map function ips to their own data without depending on the infrastructure
      of the function probes, as the data field will be going away.
      
      New helper functions are added that are based on the ftrace_hash code.
      ftrace_func_mapper functions are there to let the probes map ips to their
      data. These can be allocated by the probe ops, and referenced in the
      function callbacks.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      41794f19
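To make the idea of mapping function ips to probe-owned data concrete, here is a small chained hash map in userspace C. It is a sketch of the concept only: the names `func_mapper`, `mapper_find()`, `mapper_add()` and `mapper_remove()` are hypothetical, and the hash is deliberately naive rather than the ftrace_hash code the commit builds on.

```c
#include <stdlib.h>

#define MAPPER_BUCKETS 16

struct map_entry {
    unsigned long ip;        /* function address used as the key */
    void *data;              /* probe-owned data for that function */
    struct map_entry *next;
};

struct func_mapper {
    struct map_entry *buckets[MAPPER_BUCKETS];
};

static unsigned int bucket_of(unsigned long ip)
{
    return (unsigned int)((ip >> 4) % MAPPER_BUCKETS);
}

/* Return the address of the data slot for ip, or NULL if unmapped. */
static void **mapper_find(struct func_mapper *m, unsigned long ip)
{
    for (struct map_entry *e = m->buckets[bucket_of(ip)]; e; e = e->next)
        if (e->ip == ip)
            return &e->data;
    return NULL;
}

/* Map ip to data; fails if ip is already mapped or memory runs out. */
static int mapper_add(struct func_mapper *m, unsigned long ip, void *data)
{
    struct map_entry *e;

    if (mapper_find(m, ip))
        return -1;
    e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->ip = ip;
    e->data = data;
    e->next = m->buckets[bucket_of(ip)];
    m->buckets[bucket_of(ip)] = e;
    return 0;
}

/* Unmap ip and hand the data back so the caller can free it. */
static void *mapper_remove(struct func_mapper *m, unsigned long ip)
{
    struct map_entry **pp = &m->buckets[bucket_of(ip)];

    for (; *pp; pp = &(*pp)->next) {
        if ((*pp)->ip == ip) {
            struct map_entry *e = *pp;
            void *data = e->data;

            *pp = e->next;
            free(e);
            return data;
        }
    }
    return NULL;
}
```

Returning a pointer to the slot from `mapper_find()` lets a callback update its per-function data in place, for example a counter, without a second lookup.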
    • S
      ftrace: Pass probe ops to probe function · bca6c8d0
      Steven Rostedt (VMware) authored
      In preparation for cleaning up the probe function registration code, the
      "data" parameter will eventually be removed from the probe->func() call.
      Instead it will receive its own "ops" function, in which it can set up its
      own data that it needs to map.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      bca6c8d0
  5. 09 Dec, 2016 1 commit
  6. 24 Nov, 2016 1 commit
    • S
      tracing: Make tracepoint_printk a static_key · 42391745
      Steven Rostedt (Red Hat) authored
      Currently, when tracepoint_printk is set (enabled by the "tp_printk" kernel
      command line), it causes trace events to print via printk(). This is a very
      dangerous operation, but is useful for debugging.
      
      The issue is, it's seldom used, but it is always checked even if it's not
      enabled by the kernel command line. Instead of having this feature called by
      a branch against a variable, turn that variable into a static key, and this
      will remove the test and jump.
      
      To simplify things, the functions output_printk() and
      trace_event_buffer_commit() were moved from trace_events.c to trace.c.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      42391745
  7. 23 Nov, 2016 1 commit
  8. 20 Jun, 2016 5 commits
  9. 04 May, 2016 2 commits
    • S
      tracing: Use temp buffer when filtering events · 0fc1b09f
      Steven Rostedt (Red Hat) authored
      Filtering of events requires the data to be written to the ring buffer
      before it can be decided to filter or not. This is because the parameters of
      the filter are based on the result that is written to the ring buffer and
      not on the parameters that are passed into the trace functions.
      
      The ftrace ring buffer is optimized for writing into the ring buffer and
      committing. The discard procedure used when filtering decides the event
      should be discarded is much more heavyweight. Thus, using a temporary
      buffer when filtering events can speed things up drastically.
      
      Without a temp buffer we have:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.790706626 seconds time elapsed ( +-  0.71% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.566904059 seconds time elapsed ( +-  0.27% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.690598511 seconds time elapsed ( +-  0.19% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.707486364 seconds time elapsed ( +-  0.30% )
      
      The first run above is without any tracing, just to get a base figure.
      hackbench takes ~0.79 seconds to run on the system.
      
      The second run enables tracing all events where nothing is filtered. This
      increases the time by 100% and hackbench takes 1.57 seconds to run.
      
      The third run filters all events where the preempt count will equal "20"
      (this should never happen) thus all events are discarded. This takes 1.69
      seconds to run. This is 10% slower than just committing the events!
      
      The last run enables all events and filters where the filter will commit all
      events, and this takes 1.70 seconds to run. The filtering overhead is
      approximately 10%. Thus, the discard and commit of an event from the ring
      buffer may be about the same time.
      
      With this patch, the numbers change:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.778233033 seconds time elapsed ( +-  0.38% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.582102692 seconds time elapsed ( +-  0.28% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.309230710 seconds time elapsed ( +-  0.22% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.786001924 seconds time elapsed ( +-  0.20% )
      
      The first run is again the base with no tracing.
      
      The second run is all tracing with no filtering. It is a little slower, but
      that may be well within the noise.
      
      The third run shows that discarding all events only took 1.3 seconds. This
      is a speed up of 23%! The discard is much faster than even the commit.
      
      The one downside is shown in the last run. Events that are not discarded by
      the filter will take longer to add, due to the extra copy of the event.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0fc1b09f
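The trade-off measured above (cheap discards, one extra copy for kept events) follows from the shape of the approach. A minimal userspace sketch, assuming a hypothetical fixed-size `struct ring` and a stack variable standing in for the temp buffer:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SIZE 8

struct event {
    int preempt_count;
    int payload;
};

/* Hypothetical, simplified ring buffer: no wrap-around, no overwrite. */
struct ring {
    struct event slots[RING_SIZE];
    size_t count;   /* committed events */
    size_t copies;  /* extra copies paid for events that pass the filter */
};

typedef bool (*filter_fn)(const struct event *);

/* Build the event in scratch space, filter it, copy only on a match. */
static bool trace_with_temp_buffer(struct ring *rb, filter_fn filter,
                                   int preempt_count, int payload)
{
    struct event tmp = { preempt_count, payload };  /* the temp buffer */

    if (filter && !filter(&tmp))
        return false;  /* discarded: the ring buffer was never touched */
    if (rb->count == RING_SIZE)
        return false;  /* sketch only: a full buffer drops the event */
    memcpy(&rb->slots[rb->count++], &tmp, sizeof(tmp));
    rb->copies++;
    return true;
}

/* The two filters used in the benchmark runs above. */
static bool preempt_is_20(const struct event *e)
{
    return e->preempt_count == 20;
}

static bool preempt_not_20(const struct event *e)
{
    return e->preempt_count != 20;
}
```

A discarded event costs only the filter call, while a kept event pays for the `memcpy()`, which matches the direction of the third and fourth benchmark runs.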
    • C
      tracing: Don't display trigger file for events that can't be enabled · 854145e0
      Chunyu Hu authored
      Currently, register functions for events are called
      through the 'reg' field of the event class directly, without
      any check, when setting up triggers.
      
      Triggers for events that don't support registration through
      debugfs (events under events/ftrace are for trace-cmd to
      read the event format, and most of them don't have a register
      function, except events/ftrace/functionx) can't be enabled
      at all, and an oops will be hit when setting up a trigger
      for those events. Simply not creating the trigger files is an
      easy way to avoid the oops.
      
      Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com
      
      Cc: stable@vger.kernel.org # 3.14+
      Fixes: 85f2b082 ("tracing: Add basic event trigger framework")
      Signed-off-by: Chunyu Hu <chuhu@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      854145e0
  10. 30 Apr, 2016 1 commit
  11. 20 Apr, 2016 1 commit
    • T
      tracing: Add 'hist' event trigger command · 7ef224d1
      Tom Zanussi authored
      'hist' triggers allow users to continually aggregate trace events,
      which can then be viewed afterwards by simply reading a 'hist' file
      containing the aggregation in a human-readable format.
      
      The basic idea is very simple and boils down to a mechanism whereby
      trace events, rather than being exhaustively dumped in raw form and
      viewed directly, are automatically 'compressed' into meaningful tables
      completely defined by the user.
      
      This is done strictly via single-line command-line commands and
      without the aid of any kind of programming language or interpreter.
      
      A surprising number of typical use cases can be accomplished by users
      via this simple mechanism.  In fact, a large number of the tasks that
      users typically do using the more complicated script-based tracing
      tools, at least during the initial stages of an investigation, can be
      accomplished by simply specifying a set of keys and values to be used
      in the creation of a hash table.
      
      The Linux kernel trace event subsystem happens to provide an extensive
      list of keys and values ready-made for such a purpose in the form of
      the event format files associated with each trace event.  By simply
      consulting the format file for field names of interest and by plugging
      them into the hist trigger command, users can create an endless number
      of useful aggregations to help with investigating various properties
      of the system.  See Documentation/trace/events.txt for examples.
      
      hist triggers are implemented on top of the existing event trigger
      infrastructure, and as such are consistent with the existing triggers
      from a user's perspective as well.
      
      The basic syntax follows the existing trigger syntax.  Users start an
      aggregation by writing a 'hist' trigger to the event of interest's
      trigger file:
      
        # echo hist:keys=xxx [ if filter] > event/trigger
      
      Once a hist trigger has been set up, by default it continually
      aggregates every matching event into a hash table using the event key
      and a value field named 'hitcount'.
      
      To view the aggregation at any point in time, simply read the 'hist'
      file in the same directory as the 'trigger' file:
      
        # cat event/hist
      
      The detailed syntax provides additional options for user control, and
      is described exhaustively in Documentation/trace/events.txt and in the
      virtual tracing/README file in the tracing subsystem.
      
      Link: http://lkml.kernel.org/r/72d263b5e1853fe9c314953b65833c3aa75479f2.1457029949.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      7ef224d1
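The default behaviour described above, counting every matching event under its key in a 'hitcount' value, amounts to a keyed counter table. The following userspace sketch illustrates that idea with a fixed-size, linear-probing table; the structure and function names are hypothetical and say nothing about how the kernel implements its map.

```c
#include <stddef.h>
#include <stdint.h>

#define HIST_SLOTS 64  /* fixed capacity, illustration only */

struct hist_entry {
    uint64_t key;       /* value of the event field chosen as the key */
    uint64_t hitcount;  /* number of events seen with that key */
    int used;
};

struct hist {
    struct hist_entry slots[HIST_SLOTS];
};

/* Linear-probing lookup; optionally claims a free slot for a new key. */
static struct hist_entry *hist_lookup(struct hist *h, uint64_t key, int create)
{
    /* Multiplicative hash; the top 6 bits select one of the 64 slots. */
    size_t start = (size_t)((key * 0x9E3779B97F4A7C15ull) >> 58);

    for (size_t i = 0; i < HIST_SLOTS; i++) {
        struct hist_entry *e = &h->slots[(start + i) % HIST_SLOTS];

        if (e->used && e->key == key)
            return e;
        if (!e->used) {
            if (!create)
                return NULL;
            e->used = 1;
            e->key = key;
            return e;
        }
    }
    return NULL;  /* table full */
}

/* Aggregate one event: bump the hitcount stored under its key. */
static int hist_event(struct hist *h, uint64_t key)
{
    struct hist_entry *e = hist_lookup(h, key, 1);

    if (!e)
        return -1;
    e->hitcount++;
    return 0;
}

/* Read side: what a 'hist' file would print for one key. */
static uint64_t hist_hitcount(struct hist *h, uint64_t key)
{
    struct hist_entry *e = hist_lookup(h, key, 0);

    return e ? e->hitcount : 0;
}
```

Feeding it the key field of each event yields the same kind of per-key totals that reading the 'hist' file reports.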
  12. 19 Apr, 2016 3 commits
    • S
      tracing: Add infrastructure to allow set_event_pid to follow children · c37775d5
      Steven Rostedt authored
      Add the infrastructure needed to have the PIDs in set_event_pid to
      automatically add PIDs of the children of the tasks that have their PIDs in
      set_event_pid. This will also remove PIDs from set_event_pid when a task
      exits.
      
      This is implemented by adding hooks into the fork and exit tracepoints. On
      fork, the PIDs are added to the list, and on exit, they are removed.
      
      Add a new option called event_fork. When it is set, PIDs in set_event_pid
      will automatically get their children's PIDs added when they fork, and any
      task that exits will have its PID removed from set_event_pid.
      
      This works for instances as well.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c37775d5
    • S
      tracing: Use pid bitmap instead of a pid array for set_event_pid · f4d34a87
      Steven Rostedt authored
      In order to add the ability to let tasks that are filtered by the events
      have their children also be traced on fork (and then not traced on exit),
      convert the array into a pid bitmask. Most of the time the maximum number of
      pids is only 32768, which fits in a 4k bitmask. That is the same size as the
      default list currently is, and that list could grow if more pids are listed.
      
      This also greatly simplifies the code.
      Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f4d34a87
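The size argument and the fork/exit behaviour can be checked with a short userspace model. This is a sketch under assumptions: `struct pid_filter` and the helper names are hypothetical, and the 32768 limit is the figure quoted in the message.

```c
#include <limits.h>
#include <stdbool.h>

#define PID_MAX       32768  /* pid limit quoted in the message */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define PID_WORDS     (PID_MAX / BITS_PER_WORD)

/* One bit per pid: 32768 bits = 4096 bytes. */
struct pid_filter {
    unsigned long bits[PID_WORDS];
};

static bool pid_valid(int pid)
{
    return pid >= 0 && pid < PID_MAX;
}

static void filter_add(struct pid_filter *f, int pid)
{
    if (pid_valid(pid))
        f->bits[pid / BITS_PER_WORD] |= 1UL << (pid % BITS_PER_WORD);
}

static void filter_remove(struct pid_filter *f, int pid)
{
    if (pid_valid(pid))
        f->bits[pid / BITS_PER_WORD] &= ~(1UL << (pid % BITS_PER_WORD));
}

static bool filter_test(const struct pid_filter *f, int pid)
{
    return pid_valid(pid) &&
           ((f->bits[pid / BITS_PER_WORD] >> (pid % BITS_PER_WORD)) & 1UL);
}

/* Fork hook: a child is followed only if its parent is in the filter. */
static void on_fork(struct pid_filter *f, int parent, int child)
{
    if (filter_test(f, parent))
        filter_add(f, child);
}

/* Exit hook: the pid may be reused later, so drop it from the filter. */
static void on_exit_task(struct pid_filter *f, int pid)
{
    filter_remove(f, pid);
}
```

Adding, removing and testing a pid are all constant-time bit operations here, which is one way to read the remark that the bitmap simplifies the code.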
    • S
      tracing: Rename check_ignore_pid() to ignore_this_task() · 9ebc57cf
      Steven Rostedt authored
      The name "check_ignore_pid" is confusing in trying to figure out if the pid
      should be ignored or not. Rename it to "ignore_this_task" which is pretty
      straightforward, as a task (not a pid) is passed in, and if the function
      returns true, the task should be ignored.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      9ebc57cf
  13. 08 Apr, 2016 1 commit
  14. 04 Mar, 2016 1 commit
    • S
      tracing: Do not have 'comm' filter override event 'comm' field · e57cbaf0
      Steven Rostedt (Red Hat) authored
      Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
      process names" added a 'comm' filter that will filter events based on the
      current task's 'comm'. But this now hides the ability to filter events
      that have a 'comm' field too. For example, the sched_migrate_task trace
      event has a 'comm' field of the task to be migrated.
      
       echo 'comm == "bash"' > events/sched_migrate_task/filter
      
      will now filter all sched_migrate_task events for tasks named "bash" that
      migrate other tasks (in interrupt context), instead of seeing when "bash"
      itself gets migrated.
      
      This fix requires a couple of changes.
      
      1) Change the look up order for filter predicates to look at the events
         fields before looking at the generic filters.
      
      2) Instead of basing the filter function off of the "comm" name, have the
         generic "comm" filter have its own filter_type (FILTER_COMM). Test
         against the type instead of the name to assign the filter function.
      
      3) Add a new "COMM" filter that works just like "comm" but will filter based
         on the current task, even if the trace event contains a "comm" field.
      
      Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
      
      Cc: stable@vger.kernel.org # v4.3+
      Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e57cbaf0
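The new lookup order from points 1 to 3 can be sketched as a small resolver. `FILTER_COMM` and `FILTER_CPU` come from the message; the other enum values, the function names and the field lists in the usage note are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

enum filter_type {
    FILTER_NONE,         /* name not recognised */
    FILTER_EVENT_FIELD,  /* a field recorded by the event itself */
    FILTER_COMM,         /* generic filter on the current task's comm */
    FILTER_CPU,          /* generic filter on the current CPU */
};

/* Generic filters available to every event, in both spellings. */
static enum filter_type generic_filter(const char *name)
{
    if (!strcmp(name, "comm") || !strcmp(name, "COMM"))
        return FILTER_COMM;
    if (!strcmp(name, "cpu") || !strcmp(name, "CPU"))
        return FILTER_CPU;
    return FILTER_NONE;
}

/* Resolve a predicate name: the event's own fields are checked first. */
static enum filter_type resolve_filter(const char *const *fields,
                                       size_t nr_fields, const char *name)
{
    for (size_t i = 0; i < nr_fields; i++)
        if (!strcmp(fields[i], name))
            return FILTER_EVENT_FIELD;
    return generic_filter(name);
}
```

For an event whose field list contains "comm", the name "comm" resolves to the event field while "COMM" still reaches the generic filter; for an event without such a field, both spellings resolve to `FILTER_COMM`.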
  15. 24 Feb, 2016 1 commit
    • S
      tracing: Fix showing function event in available_events · d045437a
      Steven Rostedt (Red Hat) authored
      The ftrace:function event is only displayed for parsing the function tracer
      data. It is not used to enable function tracing, and does not include an
      "enable" file in its event directory.
      
      Originally, this event was kept separate from other events because it did
      not have a ->reg parameter. But perf added a "reg" parameter for its use
      which caused issues, because it made the event available to functions
      with which it was not compatible.
      
      Commit 9b63776f "tracing: Do not enable function event with enable"
      added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
      from being enabled by normal trace events. But this commit missed keeping
      the function event from being displayed by the "available_events" file,
      which is used to show what events can be enabled by set_event.
      
      One documented way to enable all events is to:
      
       cat available_events > set_event
      
      But because the function event is displayed in the available_events, this
      now causes an INVALID error:
      
       cat: write error: Invalid argument
      Reported-by: Chunyu Hu <chuhu@redhat.com>
      Fixes: 9b63776f "tracing: Do not enable function event with enable"
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d045437a
  16. 04 Jan, 2016 1 commit
  17. 02 Dec, 2015 1 commit
    • S
      tracing: Add sched_wakeup_new and sched_waking tracepoints for pid filter · 0f72e37e
      Steven Rostedt (Red Hat) authored
      The set_event_pid filter relies on attaching to the sched_switch and
      sched_wakeup tracepoints to see if it should filter the tracing on schedule
      tracepoints. By adding the callbacks to sched_wakeup, pids in the
      set_event_pid file will trace the wakeups of those tasks with those pids.
      
      But sched_wakeup_new and sched_waking were missed. These two should also be
      traced. Luckily, these tracepoints share the same class as sched_wakeup
      which means they can use the same pre and post callbacks as sched_wakeup
      does.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0f72e37e
  18. 04 Nov, 2015 1 commit
    • S
      tracing: Put back comma for empty fields in boot string parsing · 43ed3843
      Steven Rostedt (Red Hat) authored
      Both early_enable_events() and apply_trace_boot_options() parse a boot
      string that may get parsed later on. They both use strsep() which converts a
      comma into a nul character. To still allow the boot string to be parsed
      again the same way, the nul character gets converted back to a comma after
      the token is processed.
      
      The problem is that these two functions check for an empty parameter (two
      commas in a row ",,"), and continue the loop if the parameter is empty, but
      fail to place the comma back. In this case, the second parsing will end at
      this blank field, and not process fields afterward.
      
      In most cases, users should not have an empty field, but if it's going to be
      checked, the code might as well be correct.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      43ed3843
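The fix is easy to reproduce in userspace. The sketch below uses `strchr()` in place of the kernel's `strsep()` so that it stays within standard C, and `parse_tokens()` is a hypothetical stand-in for the two boot-time parsers; the point it illustrates is restoring the comma on every path, including the empty-field one.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the non-empty comma-separated tokens in buf.
 * buf is modified while parsing but is unchanged on return,
 * so the same string can be parsed again later.
 */
static int parse_tokens(char *buf)
{
    int handled = 0;
    char *cur = buf;

    while (cur) {
        char *comma = strchr(cur, ',');

        if (comma)
            *comma = '\0';  /* terminate the token, as strsep() would */

        if (*cur)
            handled++;      /* stand-in for processing the token */

        /* Put the separator back even when the token was empty. */
        if (comma)
            *comma = ',';

        cur = comma ? comma + 1 : NULL;
    }
    return handled;
}
```

Parsing "sched,,irq" twice gives two tokens both times. If the restore were skipped for the empty field, the second pass would stop at the leftover nul and see only "sched".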
  19. 03 Nov, 2015 1 commit
  20. 26 Oct, 2015 4 commits
    • S
      tracing: Fix sparse RCU warning · fb662288
      Steven Rostedt (Red Hat) authored
      p_start() and p_stop() are seq_file functions that match. Teach sparse to
      know that rcu_read_lock_sched() that is taken by p_start() is released by
      p_stop().
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      fb662288
    • S
      tracing: Check all tasks on each CPU when filtering pids · 8ca532ad
      Steven Rostedt (Red Hat) authored
      My tests found that if a task is running but not filtered when set_event_pid
      is modified, then it can still be traced.
      
      Call on_each_cpu() to check if the current running task should be filtered
      and update the per cpu flags of tr->data appropriately.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8ca532ad
    • S
      tracing: Implement event pid filtering · 3fdaf80f
      Steven Rostedt (Red Hat) authored
      Add the necessary hooks to use the pids loaded in set_event_pid to filter
      all the events enabled in the tracing instance that match the pids listed.
      
      Two probes are added to both sched_switch and sched_wakeup tracepoints to be
      called before other probes are called and after the other probes are called.
      The first is used to set the necessary flags to let the probes know to test
      if they should be traced or not.
      
      The sched_switch pre probe will set the "ignore_pid" flag if neither the
      previous nor the next task has a matching pid.
      
      The sched_switch post probe will set the "ignore_pid" flag if the next task
      does not have a matching pid.
      
      The pre probe allows for probes tracing sched_switch to be traced if
      necessary.
      
      The sched_wakeup pre probe will set the "ignore_pid" flag if neither the
      current task nor the wakee task has a matching pid.
      
      The sched_wakeup post probe will set the "ignore_pid" flag if the current
      task does not have a matching pid.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3fdaf80f
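The four rules above reduce to simple boolean expressions. A minimal sketch, assuming a hypothetical `pid_match_fn` callback in place of the real pid list lookup; each function returns the value the corresponding probe would store in the "ignore_pid" flag.

```c
#include <stdbool.h>

/* Hypothetical matcher: true if pid is listed in set_event_pid. */
typedef bool (*pid_match_fn)(int pid);

/* Before other sched_switch probes: keep tracing if either side matches. */
static bool switch_pre_ignore(pid_match_fn match, int prev, int next)
{
    return !match(prev) && !match(next);
}

/* After other sched_switch probes: only the incoming task matters now. */
static bool switch_post_ignore(pid_match_fn match, int next)
{
    return !match(next);
}

/* Before other sched_wakeup probes: waker or wakee may match. */
static bool wakeup_pre_ignore(pid_match_fn match, int current_pid, int wakee)
{
    return !match(current_pid) && !match(wakee);
}

/* After other sched_wakeup probes: fall back to the current task alone. */
static bool wakeup_post_ignore(pid_match_fn match, int current_pid)
{
    return !match(current_pid);
}

/* Example matcher for the usage note below: only pid 42 is listed. */
static bool only_42(int pid)
{
    return pid == 42;
}
```

With `only_42`, a switch from pid 42 to pid 7 is not ignored by the pre probe, so the sched_switch event itself is recorded, but the post probe then sets the flag because pid 7 is now running.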
    • S
      tracing: Add set_event_pid directory for future use · 49090107
      Steven Rostedt (Red Hat) authored
      Create a tracing directory called set_event_pid, which currently has no
      function, but will be used to filter all events for the tracing instance or
      the pids that are added to the file.
      
      The reason no functionality is added with this commit is that this commit
      focuses on the creation and removal of the pids in a safe manner. And tests
      can be made against this change to make sure things are correct before
      hooking features to the list of pids.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      49090107
  21. 01 Oct, 2015 1 commit