1. 27 May, 2009 1 commit
    • tracing: add __print_flags for events · be74b73a
      Steven Rostedt committed
      Developers have been asking for the ability in the ftrace event tracer
      to display names of bits in a flags variable.
      
      Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
      assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
      
      Some examples where this would be useful are the state flags in a context
      switch, kmalloc flags, and even permission flags in accessing files.
      
      [
        v2 changes include:
      
        Frederic Weisbecker's idea of using a mask instead of bits,
        thus we can output GFP_KERNEL instead of GFP_WAIT|GFP_IO|GFP_FS.
      
        Li Zefan's idea of allowing the caller of __print_flags to add their
        own delimiter (or no delimiter) where we can get for file permissions
        rwx instead of r|w|x.
      ]
      
      [
        v3 changes:
      
         Christoph Hellwig's idea of using an array instead of va_args.
      ]
      
      [ Impact: better displaying of flags in trace output ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
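The mask-plus-delimiter idea described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel's `__print_flags` implementation: `struct flag_name`, `format_flags()` and the table contents are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: a bit mask and the name printed for it. */
struct flag_name {
	unsigned long mask;
	const char *name;
};

/*
 * Write the name of every mask fully contained in 'flags' into 'buf',
 * separated by 'delim'. The table ends with an entry whose name is NULL.
 * Bits not covered by any entry are appended in hex. Returns 'buf'.
 */
char *format_flags(char *buf, size_t len, unsigned long flags,
		   const char *delim, const struct flag_name *tbl)
{
	size_t used = 0;
	int n;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (; tbl->name; tbl++) {
		/* A mask matches only if all of its bits are set. */
		if (!tbl->mask || (flags & tbl->mask) != tbl->mask)
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? delim : "", tbl->name);
		if (n < 0 || (size_t)n >= len - used)
			return buf;	/* output truncated, stop here */
		used += (size_t)n;
		flags &= ~tbl->mask;	/* do not report these bits again */
	}

	/* Report leftover bits that have no name in the table. */
	if (flags)
		snprintf(buf + used, len - used, "%s0x%lx",
			 used ? delim : "", flags);
	return buf;
}
```

With a table of `{0x02, "FOO"}, {0x40, "BAR"}, {0x80, "GOO"}` and the delimiter `"|"`, the value 0xc2 is rendered as `FOO|BAR|GOO`; with an empty delimiter and single-letter names, permission bits come out as `rwx`. Because matched bits are cleared, a multi-bit mask listed before its component bits is printed by its combined name.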
  2. 26 May, 2009 3 commits
    • ftrace: clean up of using ftrace_event_enable_disable() · 0e907c99
      Zhaolei committed
      Always use ftrace_event_enable_disable() to enable/disable an event
      so that we can factorize out the event toggling code.
      
      [ Impact: factorize and cleanup event tracing code ]
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • ftrace: Add task_comm support for trace_event · b11c53e1
      Zhaolei committed
      If we enable a trace event alone without any tracer running (such as
      function tracer, sched switch tracer, etc...) the output lacks the
      task command names, which show up as <...>.
      
      We need to use the tracing_{start/stop}_cmdline_record() helpers
      which are designed to keep track of cmdlines for any tasks that
      were scheduled during the tracing.
      
      Before this patch:
       # echo 1 > debugfs/tracing/events/sched/sched_switch/enable
       # cat debugfs/tracing/trace
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
                  <...>-2289  [000] 526276.724790: sched_switch: task bash:2289 [120] ==> sshd:2287 [120]
                  <...>-2287  [000] 526276.725231: sched_switch: task sshd:2287 [120] ==> bash:2289 [120]
                  <...>-2289  [000] 526276.725452: sched_switch: task bash:2289 [120] ==> sshd:2287 [120]
                  <...>-2287  [000] 526276.727181: sched_switch: task sshd:2287 [120] ==> swapper:0 [140]
                 <idle>-0     [000] 526277.032734: sched_switch: task swapper:0 [140] ==> events/0:5 [115]
                  <...>-5     [000] 526277.032782: sched_switch: task events/0:5 [115] ==> swapper:0 [140]
       ...
      
      After this patch:
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
                   bash-2269  [000] 527347.989229: sched_switch: task bash:2269 [120] ==> sshd:2267 [120]
                   sshd-2267  [000] 527347.990960: sched_switch: task sshd:2267 [120] ==> bash:2269 [120]
                   bash-2269  [000] 527347.991143: sched_switch: task bash:2269 [120] ==> sshd:2267 [120]
                   sshd-2267  [000] 527347.992959: sched_switch: task sshd:2267 [120] ==> swapper:0 [140]
                 <idle>-0     [000] 527348.531989: sched_switch: task swapper:0 [140] ==> events/0:5 [115]
               events/0-5     [000] 527348.532115: sched_switch: task events/0:5 [115] ==> swapper:0 [140]
       ...
      
      Changelog:
      v1->v2: Update Kconfig to select CONTEXT_SWITCH_TRACER in
              ENABLE_EVENT_TRACING
      v2->v3: v2 solved the problem caused by selecting config EVENT_TRACING
              alone, but when CONFIG_FTRACE is off and CONFIG_TRACING is
              selected by another config option, the build failed again.
              This version fixes that.
      
      [ Impact: fix incomplete output of event tracing ]
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing: add trace_event_read_lock() · 4f535968
      Lai Jiangshan committed
      I found that there is nothing to protect event_hash in
      ftrace_find_event(). RCU protects the event hashlist
      but not the event itself while we use it after its extraction
      through ftrace_find_event().

      The lack of proper locking in this spot opens a race
      window between any event dereferencing and module removal.
      
      Eg:
      
      --Task A--
      
      print_trace_line(trace) {
        event = ftrace_find_event(trace)
      
      --Task B--
      
      trace_module_remove_events(mod) {
        list_trace_events_module(ev, mod) {
          unregister_ftrace_event(ev->event) {
            hlist_del(ev->event->node)
              list_del(....)
          }
        }
      }
      |--> module removed, the event has been dropped
      
      --Task A--
      
        event->print(trace); // Dereferencing freed memory
      
      If the event retrieved belongs to a module and this module
      is concurrently removed, we may end up dereferencing data
      from a freed module.

      RCU could solve this, but it would add latency to the kernel and
      forbid tracer output callbacks from calling any code that may sleep.
      So this fix converts 'trace_event_mutex' to a read/write semaphore,
      and adds trace_event_read_lock() to protect ftrace_find_event().
      
      [ Impact: fix possible freed memory dereference in ftrace ]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <4A114806.7090302@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  3. 21 May, 2009 1 commit
  4. 19 May, 2009 1 commit
    • blktrace: remove debugfs entries on bad path · fd51d251
      Stefan Raspl committed
      debugfs directory entries for devices are not removed on some
      of the failure paths in do_blk_trace_setup().
      One way to reproduce is to start blktrace on multiple devices
      with insufficient vmalloc space: devices will fail with
      a message like this:
      
      	BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error
      
      If so, the respective entries in debugfs
      (e.g. /sys/kernel/debug/block/sdu) will remain and subsequent
      attempts to start blktrace on the respective devices will not
      succeed due to existing directories.
      
      [ Impact: fix /debug/tracing file cleanup corner case ]
      Signed-off-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: schwidefsky@de.ibm.com
      Cc: heiko.carstens@de.ibm.com
      LKML-Reference: <4A1266CC.5040801@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 18 May, 2009 1 commit
  6. 15 May, 2009 5 commits
  7. 12 May, 2009 7 commits
  8. 11 May, 2009 1 commit
    • blktrace: pdu_buf of pc events should be unsigned · 04986257
      Li Zefan committed
      I got this:
        8,0    1   305.417782332  2037  I   R 32 (ffffff9e 10 00 ...) [bash]
      
      It should be:
        8,0    1   305.417782332  2037  I   R 32 (9e 10 00 ...) [bash]
      
      [ Impact: fix output of pc events ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A07C6B3.9080802@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
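The `ffffff9e` versus `9e` difference above is the usual sign-extension effect when a byte with the top bit set is held in a signed type. A minimal user-space sketch, assuming a 32-bit `unsigned int`; `hex_signed()` and `hex_unsigned()` are hypothetical helpers, not blktrace functions:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Buggy variant: the byte is a signed char, so 0x9e is the negative
 * value -98 and is sign-extended when widened for printing.
 */
void hex_signed(char *out, size_t len, signed char b)
{
	snprintf(out, len, "%x", (unsigned int)b);
}

/* Fixed variant: an unsigned byte widens to 0x0000009e. */
void hex_unsigned(char *out, size_t len, unsigned char b)
{
	snprintf(out, len, "%x", (unsigned int)b);
}
```

Calling `hex_signed()` with the byte value -98 (bit pattern 0x9e) yields `ffffff9e`, while `hex_unsigned()` with 0x9e yields `9e`, which is why declaring the buffer as unsigned fixes the output.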
  9. 09 May, 2009 4 commits
  10. 08 May, 2009 6 commits
  11. 07 May, 2009 10 commits
    • tracing: append ":*" to internal setting of system events · d6bf81ef
      Steven Rostedt committed
      The system enabling of events uses the same code as the set_event file.
      It passes in the name of the system to the parser and that will enable
      all the events that have that system as a name.
      
      The problem is that it will also enable events with the same name as the
      system.
      
      Suppose you have a system named foo and a system named bar, and
      within the system bar there exists an event called foo. By setting
      the system name foo, you will also be enabling the event foo in the
      system bar. This is not an expected result.
      
      The solution is to pass in "foo:*", which will only enable the system
      foo and not events called foo.
      
      [ Impact: prevent accidental enabling of events with same name as a system ]
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
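The ambiguity described above can be illustrated with a small matcher. This is a sketch of the matching rules as the commit message describes them, not the kernel's set_event parser; `pattern_selects()` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if 'pattern' selects the event 'name' in system 'system'.
 *   "sys:*"   selects every event in system sys
 *   "sys:evt" selects one event
 *   "word"    selects any system OR any event named word (ambiguous)
 */
int pattern_selects(const char *pattern, const char *system,
		    const char *name)
{
	const char *colon = strchr(pattern, ':');
	size_t syslen;

	if (!colon)
		return strcmp(pattern, system) == 0 ||
		       strcmp(pattern, name) == 0;

	/* The part before ':' must equal the system name exactly. */
	syslen = (size_t)(colon - pattern);
	if (strlen(system) != syslen ||
	    strncmp(pattern, system, syslen) != 0)
		return 0;

	return strcmp(colon + 1, "*") == 0 || strcmp(colon + 1, name) == 0;
}
```

With the bare pattern `foo`, the event `foo` inside system `bar` is selected as well; with `foo:*` it is not, while every event of system `foo` still is.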
    • ring-buffer: remove complex calculations in ring-buffer-test · 29c8000e
      Steven Rostedt committed
      Ingo Molnar thought that the code to calculate the time in cond_resched
      is a bit too ugly and is not needed. This patch removes it and replaces
      it with a simple call to cond_resched. I kept the comment that explains
      the reason for the cond_resched.
      
      [ Impact: remove ugly code ]
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: fix concurrent access to ftrace_events list, fix · d94fc523
      Li Zefan committed
      In filter_add_subsystem_pred() we should release event_mutex before
      calling filter_free_subsystem_preds(), since both functions hold
      event_mutex.
      
      [ Impact: fix deadlock when writing invalid pred into subsystem filter ]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: tzanussi@gmail.com
      Cc: a.p.zijlstra@chello.nl
      Cc: fweisbec@gmail.com
      Cc: rostedt@goodmis.org
      LKML-Reference: <4A028993.7020509@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/filters: support for operator reserved characters in strings · 5928c3cc
      Frederic Weisbecker committed
      When we set a filter for an event, such as:
      
      echo "name == my_lock_name" > \
      	/debug/tracing/events/lockdep/lock_acquired/filter
      
      then the token types are parsed in the following order:
      
      - space
      - operator
      - parentheses
      - operand
      
      Because the operators and parentheses take precedence over the
      operand characters, which is normal, we can't use any string
      containing such special characters:

      ()=<>!&|

      To support this and also avoid ambiguous interpretation by
      the parser or the human, we can use double quotes, which
      matches the habits of the usual languages.

      After this patch you can still declare a string condition as
      before:
      
      echo name == myname
      
      But if you want to compare against a string containing an operator
      character, you can use double quotes:
      
      echo 'name == "&myname"'
      
      Don't forget to enclose the whole expression in single quotes or
      the double ones will be eaten by the shell.
      
      [ Impact: support strings with special characters for tracing filters ]
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Zhaolei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
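The quoting rule described above can be sketched as a tiny operand scanner. It is an illustration of the idea under stated assumptions rather than the kernel's filter parser: `parse_operand()` and `SPECIAL_CHARS` are hypothetical names, and escape sequences inside quotes are not handled.

```c
#include <assert.h>
#include <string.h>

/* Characters treated as operators or parentheses by the filter syntax. */
#define SPECIAL_CHARS "()=<>!&|"

/*
 * Copy one operand from 's' into 'out' (at most len - 1 characters) and
 * return the number of input characters consumed. A double-quoted
 * operand may contain special characters; an unquoted operand ends at
 * the first space or special character. Returns 0 if nothing could be
 * parsed, including the case of an unterminated quote.
 */
size_t parse_operand(const char *s, char *out, size_t len)
{
	size_t i = 0, n = 0;

	if (len == 0)
		return 0;

	if (s[0] == '"') {
		for (i = 1; s[i] && s[i] != '"'; i++)
			if (n < len - 1)
				out[n++] = s[i];
		out[n] = '\0';
		return s[i] == '"' ? i + 1 : 0;
	}

	for (i = 0; s[i] && s[i] != ' ' && !strchr(SPECIAL_CHARS, s[i]); i++)
		if (n < len - 1)
			out[n++] = s[i];
	out[n] = '\0';
	return i;
}
```

An unquoted `&myname` yields no operand because `&` is read as an operator character, whereas `"&myname"` yields the operand `&myname` with the quotes stripped.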
    • tracing/filters: support for filters of dynamic sized arrays · e8808c10
      Frederic Weisbecker committed
      Currently the filtering infrastructure handles numeric types and
      fixed-size array types well.

      But the recently added __string() field uses a specific
      indirect offset mechanism which requires a specific
      predicate. Until now it wasn't supported.

      This patch adds this support with very few changes: only a new
      predicate is needed, and this specific field can be managed
      through the usual string helpers in the filtering infrastructure.
      
      [ Impact: support all kinds of strings in the tracing filters ]
      
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Zhaolei <zhaolei@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
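The difference between a fixed-size string field and an indirectly addressed one can be sketched as two predicates. The record layout here is an assumption made for illustration (a 16-bit offset stored in the record that points at the string data elsewhere in the same record); `pred_fixed_string()` and `pred_dyn_string()` are hypothetical names, not the kernel's predicate functions.

```c
#include <assert.h>
#include <string.h>

/* Fixed-size array field: the characters live at the field offset. */
int pred_fixed_string(const unsigned char *rec, size_t field_off,
		      const char *want)
{
	return strcmp((const char *)rec + field_off, want) == 0;
}

/*
 * Dynamic string field (assumed layout): the field holds a 16-bit
 * offset, relative to the start of the record, at which the
 * NUL-terminated string is stored.
 */
int pred_dyn_string(const unsigned char *rec, size_t field_off,
		    const char *want)
{
	unsigned short str_off;

	/* memcpy avoids an unaligned read of the offset. */
	memcpy(&str_off, rec + field_off, sizeof(str_off));
	return strcmp((const char *)rec + str_off, want) == 0;
}
```

Only the address computation differs between the two; the comparison itself is the same string helper, which is consistent with the commit's remark that a single new predicate is enough.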
    • tracing: add hierarchical enabling of events · 8ae79a13
      Steven Rostedt committed
      With the current event directory, you can only enable individual events.
      The file debugfs/tracing/set_event can be used to enable or
      disable several events at once. But that can still be awkward.
      
      This patch adds hierarchical enabling of events. That is, each directory
      in debugfs/tracing/events has an "enable" file. This file can enable
      or disable all events within the directory and below.
      
       # echo 1 > /debugfs/tracing/events/enable
      
      will enable all events.
      
       # echo 1 > /debugfs/tracing/events/sched/enable
      
      will enable all events in the sched subsystem.
      
       # echo 1 > /debugfs/tracing/events/enable
       # echo 0 > /debugfs/tracing/events/irq/enable
      
      will enable all events, but then disable just the irq subsystem events.
      
      When reading one of these enable files, there are four results:
      
       0 - all events this file affects are disabled
       1 - all events this file affects are enabled
       X - there is a mixture of events enabled and disabled
       ? - this file does not affect any event
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
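The four read results listed above amount to a small reduction over the enabled state of the affected events. A minimal sketch, where `enable_state()` is a hypothetical helper and the events are represented as a plain array of 0/1 values:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Summarize the state of 'n' events the way an "enable" file reads:
 *   '0' all disabled, '1' all enabled, 'X' mixed, '?' no events at all.
 */
char enable_state(const int *enabled, size_t n)
{
	int any_on = 0, any_off = 0;

	for (size_t i = 0; i < n; i++) {
		if (enabled[i])
			any_on = 1;
		else
			any_off = 1;
	}

	if (!any_on && !any_off)
		return '?';
	if (any_on && any_off)
		return 'X';
	return any_on ? '1' : '0';
}
```

For example, enabling everything and then disabling one subsystem would make that subsystem's file read `0` and the top-level file read `X`.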
    • tracing: reset ring buffer when removing modules with events · 9456f0fa
      Steven Rostedt committed
      Li Zefan found that there's a race using the event ids of events and
      modules. When a module is loaded, an event id is incremented. We only
      have 16 bits for event ids (65536) and there is a possible (but highly
      unlikely) race that we could load and unload a module that registers
      events so many times that the event id counter overflows.
      
      When it overflows, it then restarts and goes looking for available
      ids. An id is available if it was added by a module and released.
      
      The race occurs if one module adds an id and is then removed.
      Another module loaded can use that same event id. But if the old module
      still had events in the ring buffer, the new module's call back would
      get bogus data.  At best (and most likely) the output would just be
      garbage. But if the module for some reason used pointers (not recommended)
      then this could potentially crash.
      
      The safest thing to do is just reset the ring buffer if a module that
      registered events is removed.
      
      [ Impact: prevent unpredictable results of event id overflows ]
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
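The id reuse described above can be reproduced with a toy allocator. This is a sketch under stated assumptions, not the kernel's event registration code: `MAX_ID` is shrunk to 4 to stand in for the 16-bit limit, and `alloc_event_id()` and `free_event_id()` are hypothetical names.

```c
#include <assert.h>

#define MAX_ID 4	/* stands in for the 16-bit limit (assumption) */

int next_id = 1;			/* monotonically increasing counter */
unsigned char id_in_use[MAX_ID + 1];	/* index 0 is unused */

/* Return a fresh id, or a previously released one after overflow. */
int alloc_event_id(void)
{
	if (next_id <= MAX_ID) {
		id_in_use[next_id] = 1;
		return next_id++;
	}

	/* Counter exhausted: search for an id that was released. */
	for (int id = 1; id <= MAX_ID; id++) {
		if (!id_in_use[id]) {
			id_in_use[id] = 1;
			return id;
		}
	}
	return 0;	/* nothing available */
}

void free_event_id(int id)
{
	if (id >= 1 && id <= MAX_ID)
		id_in_use[id] = 0;
}
```

Once the counter is exhausted, an id released by one owner is handed to the next caller. Any records still tagged with that id in a buffer would then be interpreted by the new owner, which is the situation the ring buffer reset avoids.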
    • Eliminate thousands of warnings with gcc 3.2 build · 57adc4d2
      Andi Kleen committed
      When building with gcc 3.2 I get thousands of warnings such as
      
      include/linux/gfp.h: In function `allocflags_to_migratetype':
      include/linux/gfp.h:105: warning: null format string
      
      due to passing a NULL format string to warn_slowpath() in
      
      #define __WARN()		warn_slowpath(__FILE__, __LINE__, NULL)
      
      Split this case out into a separate call.  This also shrinks the kernel
      slightly:
      
                text    data     bss     dec     hex filename
             4802274  707668  712704 6222646  5ef336 vmlinux
                text    data     bss     dec     hex filename
             4799027  703572  712704 6215303  5ed687 vmlinux
      
      due to removing one argument from the commonly-called __WARN().
      
      [akpm@linux-foundation.org: reduce scope of `empty']
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
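Splitting the message-less case into its own function, as described above, can be sketched in user space like this. The names `warn_plain()`, `warn_with_fmt()`, `MY_WARN()` and `MY_WARN_MSG()` are hypothetical and chosen for the example; the sketch records warnings in a buffer instead of printing a kernel backtrace.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

int warn_count;		/* number of warnings emitted so far */
char warn_last[256];	/* text of the most recent warning */

/* Warning with a message: the format string is never NULL here. */
void warn_with_fmt(const char *file, int line, const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(warn_last, sizeof(warn_last), "%s:%d: ", file, line);

	va_start(ap, fmt);
	if (n >= 0 && (size_t)n < sizeof(warn_last))
		vsnprintf(warn_last + n, sizeof(warn_last) - (size_t)n,
			  fmt, ap);
	va_end(ap);
	warn_count++;
}

/* Warning without a message: no format argument is passed at all. */
void warn_plain(const char *file, int line)
{
	snprintf(warn_last, sizeof(warn_last), "%s:%d", file, line);
	warn_count++;
}

/* The common, message-less macro now passes two arguments, not three. */
#define MY_WARN()		warn_plain(__FILE__, __LINE__)
#define MY_WARN_MSG(...)	warn_with_fmt(__FILE__, __LINE__, __VA_ARGS__)
```

Since the message-less macro no longer passes a NULL format, the compiler has nothing to warn about, and every call site passes one argument fewer, which is where the size reduction quoted above comes from.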
    • inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive · 381a80e6
      Wu Fengguang committed
      There is what we believe to be a false positive reported by lockdep.
      
      inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
      kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
      dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
      
      The plan is to fix this via lockdep annotation, but that is proving to be
      quite involved.
      
      The patch flips the allocation over to GFP_NOFS to shut the warning up, for
      the 2.6.30 release.
      
      Hopefully we will fix this for real in 2.6.31.  I'll queue a patch in -mm
      to switch it back to GFP_KERNEL so we don't forget.
      
        =================================
        [ INFO: inconsistent lock state ]
        2.6.30-rc2-next-20090417 #203
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
        {RECLAIM_FS-ON-W} state was registered at:
          [<ffffffff81079188>] mark_held_locks+0x68/0x90
          [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
          [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
          [<ffffffff81130652>] kernel_event+0xe2/0x190
          [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
          [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
          [<ffffffff8110444d>] vfs_create+0xcd/0x140
          [<ffffffff8110825d>] do_filp_open+0x88d/0xa20
          [<ffffffff810f6b68>] do_sys_open+0x98/0x140
          [<ffffffff810f6c50>] sys_open+0x20/0x30
          [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
        irq event stamp: 690455
        hardirqs last  enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
        hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
        softirqs last  enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
        softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
      
        other info that might help us debug this:
        2 locks held by kswapd0/380:
         #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
         #1:  (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
      
        stack backtrace:
        Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
        Call Trace:
         [<ffffffff810789ef>] print_usage_bug+0x19f/0x200
         [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
         [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
         [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
         [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
         [<ffffffff810f478c>] ? slob_free+0x10c/0x370
         [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81012fe9>] ? sched_clock+0x9/0x10
         [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
         [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
         [<ffffffff8110cb23>] d_kill+0x33/0x60
         [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
         [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
         [<ffffffff810d0cc5>] shrink_slab+0x125/0x180
         [<ffffffff810d1540>] kswapd+0x560/0x7a0
         [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
         [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
         [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
         [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
         [<ffffffff8106555b>] kthread+0x5b/0xa0
         [<ffffffff8100d40a>] child_rip+0xa/0x20
         [<ffffffff8100cdd0>] ? restore_args+0x0/0x30
         [<ffffffff81065500>] ? kthread+0x0/0xa0
         [<ffffffff8100d400>] ? child_rip+0x0/0x20
      
      [eparis@redhat.com: fix audit too]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ring-buffer: change test to be more latency friendly · 3e07a4f6
      Steven Rostedt committed
      The ring buffer benchmark/test runs a producer for 10 seconds.
      This is done with preemption and interrupts enabled. But if the kernel
      is not compiled with CONFIG_PREEMPT, it basically stops everything
      but interrupts for 10 seconds.
      
      Although this is just a test and is not for production, this attribute
      can be quite annoying. It can also spawn badness elsewhere.
      
      This patch solves the issues by calling "cond_resched" when the system
      is not compiled with CONFIG_PREEMPT. It also keeps track of the time
      spent to call cond_resched such that it does not go against the
      time calculations. That is, if the task schedules away, the time scheduled
      out is removed from the test data. Note, this only works for non PREEMPT
      because we do not know when the task is scheduled out if we have PREEMPT
      enabled.
      
      [ Impact: prevent test from stopping the world for 10 seconds ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>