1. 30 7月, 2013 4 次提交
  2. 19 7月, 2013 3 次提交
  3. 03 7月, 2013 2 次提交
    • S
      tracing: Fix race between deleting buffer and setting events · 2a6c24af
      Steven Rostedt (Red Hat) 提交于
      While analyzing the code, I discovered that there's a potential race between
      deleting a trace instance and setting events. There are a few races that can
      occur if events are being traced as the buffer is being deleted. Mostly the
      problem comes with freeing the descriptor used by the trace event callback.
      To prevent problems like this, the events are disabled before the buffer is
      deleted. The problem with the current solution is that the event_mutex is let
      go between disabling the events and freeing the files, which means that the events
      could be enabled again while the freeing takes place.
      
      Cc: stable@vger.kernel.org # 3.10
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a6c24af
    • S
      tracing: Add trace_array_get/put() to event handling · 8e2e2fa4
      Steven Rostedt (Red Hat) 提交于
      Commit a695cb58 "tracing: Prevent deleting instances when they are being read"
      tried to fix a race between deleting a trace instance and reading contents
      of a trace file. But it wasn't good enough. The following could crash the kernel:
      
       # cd /sys/kernel/debug/tracing/instances
       # ( while :; do mkdir foo; rmdir foo; done ) &
       # ( while :; do echo 1 > foo/events/sched/sched_switch 2> /dev/null; done ) &
      
      Luckily this can only be done by root user, but it should be fixed regardless.
      
      The problem is that a delete of the file can happen after the write to the event
      is opened, but before the enabling happens.
      
      The solution is to make sure the trace_array is available before succeeding in
      opening for write, and incerment the ref counter while opened.
      
      Now the instance can be deleted when the events are writing to the buffer,
      but the deletion of the instance will disable all events before the instance
      is actually deleted.
      
      Cc: stable@vger.kernel.org # 3.10
      Reported-by: NAlexander Lam <azl@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8e2e2fa4
  4. 02 7月, 2013 4 次提交
  5. 12 6月, 2013 2 次提交
  6. 16 5月, 2013 1 次提交
  7. 10 5月, 2013 3 次提交
  8. 09 5月, 2013 2 次提交
    • S
      tracing: Return error if register_ftrace_function_probe() fails for event_enable_func() · ff305ded
      Steven Rostedt (Red Hat) 提交于
      register_ftrace_function_probe() returns the number of functions
      it registered, which can be zero, it can also return a negative number
      if something went wrong. But event_enable_func() only checks for
      the case that it didn't register anything, it needs to also check
      for the case that something went wrong and return that error code
      as well.
      
      Added some comments about the code as well, to make it more
      understandable.
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ff305ded
    • M
      tracing: Don't succeed if event_enable_func did not register anything · a5b85bd1
      Masami Hiramatsu 提交于
      Return 0 instead of the number of activated ftrace function probes if
      event_enable_func succeeded and return an error code if it failed or
      did not register any functions. But it currently returns the number
      of registered functions and if it didn't register anything, it returns 0,
      but that is considered success.
      
      This also fixes the return value. As if it succeeds, it returns the
      number of functions that were enabled, which is returned back to
      the user in ftrace_regex_write (the write() return code). If only
      one function is enabled, then the return code of the write is one,
      and this can confuse the user program in thinking it only wrote 1
      byte.
      
      Link: http://lkml.kernel.org/r/20130509054413.30398.55650.stgit@mhiramat-M0-7522
      
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      [ Rewrote change log to reflect that this fixes two bugs - SR ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a5b85bd1
  9. 16 3月, 2013 3 次提交
  10. 15 3月, 2013 12 次提交
    • S
      tracing: Add function probe triggers to enable/disable events · 3cd715de
      Steven Rostedt (Red Hat) 提交于
      Add triggers to function tracer that lets an event get enabled or
      disabled when a function is called:
      
      format is:
      
       <function>:enable_event:<system>:<event>[:<count>]
       <function>:disable_event:<system>:<event>[:<count>]
      
       echo 'schedule:enable_event:sched:sched_switch' > /debug/tracing/set_ftrace_filter
      
      Every time schedule is called, it will enable the sched_switch event.
      
       echo 'schedule:disable_event:sched:sched_switch:2' > /debug/tracing/set_ftrace_filter
      
      The first two times schedule is called while the sched_switch
      event is enabled, it will disable it. It will not count for a time
      that the event is already disabled (or enabled for enable_event).
      
      [ fixed return without mutex_unlock() - thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3cd715de
    • S
      tracing: Add a way to soft disable trace events · 417944c4
      Steven Rostedt (Red Hat) 提交于
      In order to let triggers enable or disable events, we need a 'soft'
      method for doing so. For example, if a function probe is added that
      lets a user enable or disable events when a function is called, that
      change must be done without taking locks or a mutex, and definitely
      it can't sleep. But the full enabling of a tracepoint is expensive.
      
      By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
      without the protection of a mutex (using set/clear_bit()), this soft
      disable flag can be used to allow critical sections to enable or disable
      events from being traced (after the event has been placed into "SOFT_MODE").
      
      Some caveats though: The comm recorder (to map pids with a comm) can not
      be soft disabled (yet). If you disable an event with with a "soft"
      disable and wait a while before reading the trace, the comm cache may be
      replaced and you'll get a bunch of <...> for comms in the trace.
      
      Reading the "enable" file for an event that is disabled will now give
      you "0*" where the '*' denotes that the tracepoint is still active but
      the event itself is "disabled".
      
      [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      417944c4
    • S
      tracing: Add alloc_snapshot kernel command line parameter · 55034cd6
      Steven Rostedt (Red Hat) 提交于
      If debugging the kernel, and the developer wants to use
      tracing_snapshot() in places where tracing_snapshot_alloc() may
      be difficult (or more likely, the developer is lazy and doesn't
      want to bother with tracing_snapshot_alloc() at all), then adding
      
        alloc_snapshot
      
      to the kernel command line parameter will tell ftrace to allocate
      the snapshot buffer (if configured) when it allocates the main
      tracing buffer.
      
      I also noticed that ring_buffer_expanded and tracing_selftest_disabled
      had inconsistent use of boolean "true" and "false" with "0" and "1".
      I cleaned that up too.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      55034cd6
    • S
      tracing: Clear all trace buffers when unloaded module event was used · 873c642f
      Steven Rostedt (Red Hat) 提交于
      Currently we do not know what buffer a module event was enabled in.
      On unload, it is safest to clear all buffer instances, not just the
      top level buffer.
      
      Todo: Clear only the buffer that the event was used in. The
      infrastructure is there to do this, but it makes the code a bit
      more complex. Lets get the current code vetted before we add that.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      873c642f
    • S
      tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Steven Rostedt (Red Hat) 提交于
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag to the event "call" structure called WAS_ENABLED and gets set
      when the event is ever enabled, and this flag never gets cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      575380da
    • S
      tracing: Fix trace events build without modules · 315326c1
      Steven Rostedt (Red Hat) 提交于
      The new multi-buffers added a descriptor that kept track of module
      events, and the directories they use, with struct ftace_module_file_ops.
      This is used to add a ref count to keep modules from unloading while
      their files are being accessed.
      
      As the descriptor is only needed when CONFIG_MODULES is enabled, it
      is only declared when the config is enabled. But that struct is
      dereferenced in a few areas outside the #ifdef CONFIG_MODULES.
      
      By adding some helper routines and moving code around a little,
      events can be compiled again without modules.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      315326c1
    • S
      tracing: Use direct field, type and system names · 92edca07
      Steven Rostedt 提交于
      The names used to display the field and type in the event format
      files are copied, as well as the system name that is displayed.
      
      All these names are created by constant values passed in.
      If one of theses values were to be removed by a module, the module
      would also be required to remove any event it created.
      
      By using the strings directly, we can save over 100K of memory.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      92edca07
    • S
      tracing: Use kmem_cache_alloc instead of kmalloc in trace_events.c · d1a29143
      Steven Rostedt 提交于
      The event structures used by the trace events are mostly persistent,
      but they are also allocated by kmalloc, which is not the best at
      allocating space for what is used. By converting these kmallocs
      into kmem_cache_allocs, we can save over 50K of space that is
      permanently allocated.
      
      After boot we have:
      
       slab name          active allocated size
       ---------          ------ --------- ----
      ftrace_event_file    979   1005     56   67    1
      ftrace_event_field   2301   2310     48   77    1
      
      The ftrace_event_file has at boot up 979 active objects out of
      1005 allocated in the slabs. Each object is 56 bytes. In a normal
      kmalloc, that would allocate 64 bytes for each object.
      
       1005 - 979  = 26 objects not used
       26 * 56 = 1456 bytes wasted
      
      But if we used kmalloc:
      
       64 - 56 = 8 bytes unused per allocation
       8 * 979 = 7832 bytes wasted
      
       7832 - 1456 = 6376 bytes in savings
      
      Doing the same for ftrace_event_field where there's 2301 objects
      allocated in a slab that can hold 2310 with 48 bytes each we have:
      
       2310 - 2301 = 9 objects not used
       9 * 48 = 432 bytes wasted
      
      A kmalloc would also use 64 bytes per object:
      
       64 - 48 = 16 bytes unused per allocation
       16 * 2301 = 36816 bytes wasted!
      
       36816 - 432 = 36384 bytes in savings
      
      This change gives us a total of 42760 bytes in savings. At least
      on my machine, but as there's a lot of these persistent objects
      for all configurations that use trace points, this is a net win.
      
      Thanks to Ezequiel Garcia for his trace_analyze presentation which
      pointed out the wasted space in my code.
      
      Cc: Ezequiel Garcia <elezegarcia@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      d1a29143
    • S
      tracing: Get trace_events kernel command line working again · 77248221
      Steven Rostedt 提交于
      With the new descriptors used to allow multiple buffers in the
      tracing directory added, the kernel command line parameter
      trace_events=... no longer works. This is because the top level
      (global) trace array now has a list of descriptors associated
      with the events and the files in the debugfs directory. But in
      early bootup, when the command line is processed and the events
      enabled, the trace array list of events has not been set up yet.
      
      Without the list of events in the trace array, the setting of
      events to record will fail because it would not match any events.
      
      The solution is to set up the top level array in two stages.
      The first is to just add the ftrace file descriptors that just point
      to the events. This will allow events to be enabled and start tracing.
      The second stage is called after the filesystem is set up, and this
      stage will create the debugfs event files and directories associated
      with the trace array events.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      77248221
    • S
      tracing: Add rmdir to remove multibuffer instances · 0c8916c3
      Steven Rostedt 提交于
      Add a method to the hijacked dentry descriptor of the
      "instances" directory to allow for rmdir to remove an
      instance of a multibuffer.
      
      Example:
      
        cd /debug/tracing/instances
        mkdir hello
        ls
      hello/
        rmdir hello
        ls
      
      Like the mkdir method, the i_mutex is dropped for the instances
      directory. The instances directory is created at boot up and can
      not be renamed or removed. The trace_types_lock mutex is used to
      synchronize adding and removing of instances.
      
      I've run several stress tests with different threads trying to
      create and delete directories of the same name, and it has stood
      up fine.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0c8916c3
    • S
      tracing: Add interface to allow multiple trace buffers · 277ba044
      Steven Rostedt 提交于
      Add the interface ("instances" directory) to add multiple buffers
      to ftrace. To create a new instance, simply do a mkdir in the
      instances directory:
      
      This will create a directory with the following:
      
       # cd instances
       # mkdir foo
       # ls foo
      buffer_size_kb        free_buffer  trace_clock    trace_pipe
      buffer_total_size_kb  set_event    trace_marker   tracing_enabled
      events/               trace        trace_options  tracing_on
      
      Currently only events are able to be set, and there isn't a way
      to delete a buffer when one is created (yet).
      
      Note, the i_mutex lock is dropped from the parent "instances"
      directory during the mkdir operation. As the "instances" directory
      can not be renamed or deleted (created on boot), I do not see
      any harm in dropping the lock. The creation of the sub directories
      is protected by trace_types_lock mutex, which only lets one
      instance get into the code path at a time. If two tasks try to
      create or delete directories of the same name, only one will occur
      and the other will fail with -EEXIST.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      277ba044
    • S
      tracing: Separate out trace events from global variables · ae63b31e
      Steven Rostedt 提交于
      The trace events for ftrace are all defined via global variables.
      The arrays of events and event systems are linked to a global list.
      This prevents multiple users of the event system (what to enable and
      what not to).
      
      By adding descriptors to represent the event/file relation, as well
      as to which trace_array descriptor they are associated with, allows
      for more than one set of events to be defined. Once the trace events
      files have a link between the trace event and the trace_array they
      are associated with, we can create multiple trace_arrays that can
      record separate events in separate buffers.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae63b31e
  11. 22 1月, 2013 1 次提交
    • S
      tracing: Remove the extra 4 bytes of padding in events · b000c806
      Steven Rostedt 提交于
      Due to a userspace issue with PowerTop v2beta, which hardcoded
      the offset of event fields that it was using, it broke when
      we removed the Big Kernel Lock counter from the event header.
      
       (commit e6e1e259 "tracing: Remove lock_depth from event entry")
      
      Because this broke userspace, it was determined that we must
      keep those 4 bytes around.
      
       (commit a3a4a5ac "Regression: partial revert "tracing: Remove lock_depth from event entry"")
      
      This unfortunately wastes space in the ring buffer. 4 bytes per
      event, where a lot of events are just 24 bytes. That's 16% of the
      buffer wasted. A million events will add 4 megs of white space
      into the buffer.
      
      It was later noticed that PowerTop v2beta could not work on systems
      where the kernel was 64 bit but the userspace was 32 bits.
      The reason was because the offsets are different between the
      two and the hard coded offset of one would not work with the other.
      
      With PowerTop v2 final, it implemented the same interface that both
      perf and trace-cmd use. That is, it reads the format file of
      the event to find the offsets of the fields it needs. This fixes
      the problem with running powertop on a 32 bit userspace running
      on a 64 bit kernel. It also no longer requires the 4 byte padding.
      
      As PowerTop v2 has been out for a while, and is included in all
      major distributions, it is time that we can safely remove the
      4 bytes of padding. Users of PowerTop v2beta should upgrade to
      PowerTop v2 final.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b000c806
  12. 02 11月, 2012 2 次提交
    • S
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Steven Rostedt 提交于
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
    • S
      tracing: Separate open function from set_event and available_events · 15075cac
      Steven Rostedt 提交于
      The open function used by available_events is the same as set_event even
      though it uses different seq functions. This causes a side effect of
      writing into available_events clearing all events, even though
      available_events is suppose to be read only.
      
      There's no reason to keep a single function for just the open and have
      both use different functions for everything else. It is a little
      confusing and causes strange behavior. Just have each have their own
      function.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      15075cac
  13. 01 11月, 2012 1 次提交
    • S
      tracing: Enable comm recording if trace_printk() is used · 81698831
      Steven Rostedt 提交于
      If comm recording is not enabled when trace_printk() is used then
      you just get this type of output:
      
      [ adding trace_printk("hello! %d", irq); in do_IRQ ]
      
                 <...>-2843  [001] d.h.    80.812300: do_IRQ: hello! 14
                 <...>-2734  [002] d.h2    80.824664: do_IRQ: hello! 14
                 <...>-2713  [003] d.h.    80.829971: do_IRQ: hello! 14
                 <...>-2814  [000] d.h.    80.833026: do_IRQ: hello! 14
      
      By enabling the comm recorder when trace_printk is enabled:
      
             hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
                  sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
                <idle>-0     [001] d.h1   193.665996: do_IRQ: hello! 21
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      81698831