1. 13 April 2013 (2 commits)
  2. 15 March 2013 (3 commits)
  3. 14 March 2013 (1 commit)
    • tracing: Fix free of probe entry by calling call_rcu_sched() · 740466bc
      Steven Rostedt (Red Hat) authored
      Because function tracing is very invasive, and can even trace
      calls to rcu_read_lock(), RCU access in function tracing is done
      with preempt_disable_notrace(). This requires a synchronize_sched()
      for updates and not a synchronize_rcu().
      
      Function probes (traceon, traceoff, etc.) must be freed only after
      a synchronize_sched() grace period has elapsed once their entry has
      been removed from the hash. But call_rcu() is used. Fix this by
      using call_rcu_sched().
      
      Also fix the usage to use hlist_del_rcu() instead of hlist_del()
      (a sketch of the resulting pattern follows this entry).
      
      Cc: stable@vger.kernel.org
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      740466bc
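      A minimal sketch of the resulting pattern, assuming an illustrative
      probe-entry structure (probe_entry, free_probe_entry and
      remove_probe_entry are hypothetical names, not the kernel's):
      
          #include <linux/rculist.h>
          #include <linux/rcupdate.h>
          #include <linux/slab.h>
      
          struct probe_entry {
                  struct hlist_node node;  /* linked into the probe hash */
                  struct rcu_head   rcu;
          };
      
          static void free_probe_entry(struct rcu_head *rcu)
          {
                  kfree(container_of(rcu, struct probe_entry, rcu));
          }
      
          static void remove_probe_entry(struct probe_entry *ent)
          {
                  /* readers walk the hash under preempt_disable_notrace(),
                   * so use the RCU-aware delete and wait for a sched grace
                   * period, not a normal RCU grace period, before freeing */
                  hlist_del_rcu(&ent->node);
                  call_rcu_sched(&ent->rcu, free_probe_entry);
          }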
  4. 12 March 2013 (1 commit)
    • tracing: Fix race in snapshot swapping · 2721e72d
      Steven Rostedt (Red Hat) authored
      Although the swap is wrapped with a spin_lock, the assignment
      of the temp buffer used to swap is not within that lock.
      It needs to be moved into that lock; otherwise, two swaps
      happening on two different CPUs can end up using the wrong
      temp buffer to assign in the swap (a sketch of the fix follows
      this entry).
      
      Luckily, all current callers of the swap function appear to have
      their own locks. But in case something is added that allows two
      different callers to call the swap, then there's a chance that
      this race can trigger and corrupt the buffers.
      
      New code is coming soon that will allow for this race to trigger.
      
      I've Cc'd stable, so this bug will not show up if someone backports
      one of the changes that can trigger this bug.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2721e72d
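      A sketch of the fix; the identifiers are modeled on kernel/trace/trace.c
      of that era and should be treated as illustrative:
      
          arch_spin_lock(&ftrace_max_lock);
      
          buf = tr->buffer;       /* moved inside the lock: reading the temp
                                   * buffer outside it let two CPUs swap with
                                   * the same stale value */
          tr->buffer = max_tr.buffer;
          max_tr.buffer = buf;
      
          arch_spin_unlock(&ftrace_max_lock);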
  5. 07 March 2013 (2 commits)
    • tracing: Do not return EINVAL in snapshot when not allocated · c9960e48
      Steven Rostedt (Red Hat) authored
      To use the tracing snapshot feature, writing a '1' into the snapshot
      file causes the snapshot buffer to be allocated, if it has not already
      been, and does a 'swap' with the main buffer, so that the
      snapshot now contains what was in the main buffer, and the main buffer
      now writes to what was the snapshot buffer.
      
      To free the snapshot buffer, a '0' is written into the snapshot file.
      
      To clear the snapshot buffer, any number other than a '0' or '1' is
      written into the snapshot file. But if the buffer is not allocated,
      this returns the -EINVAL error code, which is rather pointless. It is
      better to just do nothing and return success (a sketch follows this
      entry).
      Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      c9960e48
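      A sketch of the resulting behavior in the snapshot file's write
      handler; the control flow follows the description above, while the
      variable names (val, cnt, snapshot_allocated) are hypothetical:
      
          switch (val) {
          case 0:
                  /* free the snapshot buffer, if any */
                  break;
          case 1:
                  /* allocate if needed, then swap with the main buffer */
                  break;
          default:
                  if (!snapshot_allocated)        /* hypothetical flag */
                          break;  /* nothing to clear: succeed silently
                                   * instead of returning -EINVAL */
                  /* clear the snapshot buffer */
                  break;
          }
          return cnt;             /* report the write as successful */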
    • tracing: Add help of snapshot feature when snapshot is empty · d8741e2e
      Steven Rostedt (Red Hat) authored
      When cat'ing the snapshot file, instead of showing an empty trace
      header like the trace file does, show how to use the snapshot
      feature.
      
      Also, this is a good place to show whether the snapshot has been
      allocated. Users may want to "pre-allocate" the snapshot to get a fast
      "swap" of the current buffer. Otherwise, a swap would be slow, as it
      would need to allocate the snapshot buffer first, and that allocation
      might fail under tight memory constraints.
      
      Here's what it looked like before:
      
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
      
      Here's what it looks like now:
      
       # tracer: nop
       #
       #
       # * Snapshot is freed *
       #
       # Snapshot commands:
       # echo 0 > snapshot : Clears and frees snapshot buffer
       # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
       #                      Takes a snapshot of the main buffer.
       # echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
       #                      (Doesn't have to be '2' works with any number that
       #                       is not a '0' or '1')
      Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d8741e2e
  6. 03 March 2013 (1 commit)
    • trace/ring_buffer: handle 64bit aligned structs · 649508f6
      James Hogan authored
      Some 32-bit architectures require 64-bit values to be aligned (for
      example Meta, which has 64-bit read/write instructions). These require
      8-byte alignment of event data too, so use
      !CONFIG_HAVE_64BIT_ALIGNED_ACCESS instead of !CONFIG_64BIT ||
      CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to decide alignment, and align
      buffer_data_page::data accordingly (a sketch follows this entry).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org> (previous version subtly different)
      649508f6
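      A sketch of the resulting selection logic in ring_buffer.c; the macro
      names follow the commit, and the struct layout is abbreviated:
      
          #ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS
          # define RB_FORCE_8BYTE_ALIGNMENT      0
          # define RB_ARCH_ALIGNMENT             RB_ALIGNMENT
          #else
          # define RB_FORCE_8BYTE_ALIGNMENT      1
          # define RB_ARCH_ALIGNMENT             8U
          #endif
      
          #define RB_ALIGN_DATA  __aligned(RB_ARCH_ALIGNMENT)
      
          struct buffer_data_page {
                  u64           time_stamp;  /* page time stamp */
                  local_t       commit;      /* write committed index */
                  unsigned char data[] RB_ALIGN_DATA;  /* now 8-byte aligned
                                                        * where required */
          };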
  7. 28 February 2013 (2 commits)
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin authored
      I'm not sure why, but the hlist for-each-entry iterators were conceived
      differently from the list iterator, which looks like this:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate (a before/after
      sketch follows this entry).
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
         were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
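      A before/after sketch of the new iterator shape, using a hypothetical
      struct foo:
      
          #include <linux/list.h>
      
          struct foo {
                  int               val;
                  struct hlist_node member;
          };
      
          static int sum_vals(struct hlist_head *head)
          {
                  struct foo *f;
                  int total = 0;
      
                  /* old form needed a scratch node:
                   *         struct hlist_node *pos;
                   *         hlist_for_each_entry(f, pos, head, member)
                   */
                  hlist_for_each_entry(f, head, member)  /* new, list-like */
                          total += f->val;
      
                  return total;
          }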
    • ftrace: Update the kconfig for DYNAMIC_FTRACE · db05021d
      Steven Rostedt authored
      The prompt to enable DYNAMIC_FTRACE (the ability to nop and
      enable function tracing at run time) had a confusing statement:
      
       "enable/disable ftrace tracepoints dynamically"
      
      This was written before tracepoints were added to the kernel,
      but now that tracepoints have been added, this is very confusing
      and has confused people enough that they have given wrong information
      during presentations.
      
      Not only that, I looked at the help text, and it still references
      that dreaded daemon that used to wake up once a second to update
      the nop locations and brick NICs, and that hasn't been around for over
      five years.
      
      Time to bring the text up to the current decade.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ezequiel Garcia <elezegarcia@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      db05021d
  8. 19 February 2013 (1 commit)
    • ftrace: Call ftrace cleanup module notifier after all other notifiers · 8c189ea6
      Steven Rostedt (Red Hat) authored
      Commit: c1bf08ac "ftrace: Be first to run code modification on modules"
      
      changed ftrace module notifier's priority to INT_MAX in order to
      process the ftrace nops before anything else could touch them
      (namely kprobes). This was the correct thing to do.
      
      Unfortunately, the ftrace module notifier also contains the ftrace
      clean up code. As opposed to the set up code, this code should be
      run *after* all the other module notifiers have run, in case a module
      is doing correct clean-up and unregisters its ftrace hooks. Basically,
      ftrace needs to do clean up on module removal, as it needs to know
      about code being removed so that it doesn't try to modify that code.
      But after it removes the module from its records, if an ftrace user
      tries to remove a probe, that removal will fail, as the record of that
      code segment no longer exists.
      
      Nothing really bad happens if the probe removal is called after ftrace
      did the clean up, but the ftrace removal function will return an error.
      Correct code (such as kprobes) will produce a WARN_ON() if it fails
      to remove the probe. As people get annoyed by frivolous warnings, it's
      best to do the ftrace clean up after everything else.
      
      By splitting the ftrace_module_notifier into two notifiers, one that
      does the module load setup and is run at high priority, and another
      that is called for module clean up and is run at low priority, the
      problem is solved (a sketch follows this entry).
      
      Cc: stable@vger.kernel.org
      Reported-by: Frank Ch. Eigler <fche@redhat.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8c189ea6
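      A sketch of the split; the handler names and exact priorities are
      modeled on the description above and should be treated as
      illustrative. Both blocks are registered with
      register_module_notifier():
      
          #include <linux/kernel.h>
          #include <linux/module.h>
          #include <linux/notifier.h>
      
          static int ftrace_module_notify_enter(struct notifier_block *self,
                                                unsigned long val, void *data);
          static int ftrace_module_notify_exit(struct notifier_block *self,
                                               unsigned long val, void *data);
      
          /* setup: runs before any other module notifier (e.g. kprobes) */
          static struct notifier_block ftrace_module_enter_nb = {
                  .notifier_call = ftrace_module_notify_enter,
                  .priority      = INT_MAX,
          };
      
          /* cleanup: runs after every other module notifier has had its turn */
          static struct notifier_block ftrace_module_exit_nb = {
                  .notifier_call = ftrace_module_notify_exit,
                  .priority      = INT_MIN,
          };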
  9. 13 February 2013 (1 commit)
    • tracing/syscalls: Allow archs to ignore tracing compat syscalls · f431b634
      Steven Rostedt authored
      The tracing of ia32 compat system calls has been a bit of a pain, as
      they use different system call numbers than their 64-bit equivalents.
      
      I wrote a simple 'lls' program that lists files. I compiled it as an
      i686 ELF binary and ran it on an x86_64 box. This is the result:
      
      echo 0 > /debug/tracing/tracing_on
      echo 1 > /debug/tracing/events/syscalls/enable
      echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on
      
      grep lls /debug/tracing/trace
      
      [.. skipping calls before TS_COMPAT is set ...]
      
                   lls-1127  [005] d...   936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
                   lls-1127  [005] d...   936.409190: sys_recvfrom -> 0x8a77000
                   lls-1127  [005] d...   936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
                   lls-1127  [005] d...   936.409215: sys_lgetxattr -> 0xf76ff000
                   lls-1127  [005] d...   936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
                   lls-1127  [005] d...   936.409228: sys_dup2 -> 0xfffffffffffffffe
                   lls-1127  [005] d...   936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
                   lls-1127  [005] d...   936.409242: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
                   lls-1127  [005] d...   936.409244: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
                   lls-1127  [005] d...   936.409248: sys_lgetxattr -> 0xf76e5000
                   lls-1127  [005] d...   936.409248: sys_newlstat(filename: 3, statbuf: 19614)
                   lls-1127  [005] d...   936.409249: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
                   lls-1127  [005] d...   936.409279: sys_newfstat -> 0x3
                   lls-1127  [005] d...   936.409279: sys_close(fd: 3)
                   lls-1127  [005] d...   936.421550: sys_close -> 0x200
                   lls-1127  [005] d...   936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
                   lls-1127  [005] d...   936.421560: sys_removexattr -> 0x0
                   lls-1127  [005] d...   936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
                   lls-1127  [005] d...   936.421574: sys_lgetxattr -> 0x4d564000
                   lls-1127  [005] d...   936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
                   lls-1127  [005] d...   936.421580: sys_capget -> 0x0
                   lls-1127  [005] d...   936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
                   lls-1127  [005] d...   936.421589: sys_lgetxattr -> 0x4d710000
                   lls-1127  [005] d...   936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
                   lls-1127  [005] d...   936.426141: sys_lgetxattr -> 0x4d713000
                   lls-1127  [005] d...   936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
                   lls-1127  [005] d...   936.426146: sys_newlstat -> 0x0
                   lls-1127  [005] d...   936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
      
      Obviously I'm not calling newfstat with an fd of 4d55b085. The calls
      are obviously incorrect, and confusing.
      
      Other efforts have been made to fix this:
      
      https://lkml.org/lkml/2012/3/26/367
      
      But the real solution is to rewrite the syscall internals and come up
      with a fixed solution, one that doesn't require all the kludge that
      the current solution has.
      
      Thus for now, instead of outputting incorrect data, simply ignore the
      compat syscalls. With this patch applied, the trace now shows:
      
       #> grep lls /debug/tracing/trace
       #>
      
      Compat system calls simply are not traced. If users need compat
      syscalls, they should just use the raw syscall tracepoints.
      
      For an architecture to have its compat syscalls ignored, it must
      define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and
      also define an arch_trace_is_compat_syscall() function that returns
      true if the current task's syscall tracing should be ignored (a sketch
      follows this entry).
      
      I want to stress that this change does not affect actual syscalls in any
      way, shape or form. It is only used within the tracing system and
      doesn't interfere with the syscall logic at all. The changes are
      consolidated nicely into trace_syscalls.c and asm/ftrace.h.
      
      I had to make one small modification to asm/thread_info.h, and that
      was to remove the include of asm/ftrace.h. As asm/ftrace.h required
      current_thread_info(), it was causing include hell. That include was
      added back in 2008 when the function graph tracer was added:
      
       commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"
      
      It does not need to be included there.
      
      Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.home
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f431b634
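      A sketch of the two pieces on x86; the TS_COMPAT test is an assumption
      about the x86 implementation of the hook:
      
          /* arch/x86/include/asm/ftrace.h (sketch) */
          #define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
      
          static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
          {
                  /* a 32-bit task on a 64-bit kernel runs with TS_COMPAT set */
                  return !!(current_thread_info()->status & TS_COMPAT);
          }
      
          /* kernel/trace/trace_syscalls.c (sketch, helper name hypothetical) */
          static inline bool trace_syscall_ignored(struct pt_regs *regs)
          {
          #ifdef ARCH_TRACE_IGNORE_COMPAT_SYSCALLS
                  if (arch_trace_is_compat_syscall(regs))
                          return true;    /* skip the syscall tracepoint */
          #endif
                  return false;
          }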
  10. 09 February 2013 (12 commits)
    • uprobes/perf: Avoid uprobe_apply() whenever possible · b2fe8ba6
      Oleg Nesterov authored
      uprobe_perf_open/close call the costly uprobe_apply() every time;
      we can avoid it if:
      
      	- "nr_systemwide != 0" is not changed.
      
      	- There is another process/thread with the same ->mm.
      
      	- copy_process() does inherit_event(). dup_mmap() preserves the
      	  inserted breakpoints.
      
      	- event->attr.enable_on_exec == T, we can rely on uprobe_mmap()
      	  called by exec/mmap paths.
      
      	- tp_target is exiting. Only _close() checks PF_EXITING, I don't
      	  think TRACE_REG_PERF_OPEN can hit the dying task too often.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      b2fe8ba6
    • uprobes/perf: Teach trace_uprobe/perf code to use UPROBE_HANDLER_REMOVE · f42d24a1
      Oleg Nesterov authored
      Change uprobe_trace_func() and uprobe_perf_func() to return "int", and
      change uprobe_dispatcher() to return "trace_ret | perf_ret", although
      this is not strictly needed; currently TP_FLAG_TRACE/TP_FLAG_PROFILE
      are mutually exclusive (a sketch of the dispatcher follows this entry).
      
      The only functional change is that uprobe_perf_func() checks the
      filtering too and returns UPROBE_HANDLER_REMOVE if nobody wants to
      trace current.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perf record -e probe_libc:syscall -i perl -e 'fork; syscall -1 for 1..10; wait'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				20
      
      A child process doesn't have a counter, but it still hits this
      breakpoint, "copied" by dup_mmap().
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				11
      
      The child process hits this int3 only once and does unapply_uprobe().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      f42d24a1
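      A sketch of the changed dispatcher, following the description above
      (kernel-internal types such as struct trace_uprobe are assumed):
      
          static int uprobe_dispatcher(struct uprobe_consumer *con,
                                       struct pt_regs *regs)
          {
                  struct trace_uprobe *tu;
                  int ret = 0;
      
                  tu = container_of(con, struct trace_uprobe, consumer);
      
                  if (tu->flags & TP_FLAG_TRACE)
                          ret |= uprobe_trace_func(tu, regs);
      
                  if (tu->flags & TP_FLAG_PROFILE)
                          ret |= uprobe_perf_func(tu, regs);
      
                  /* UPROBE_HANDLER_REMOVE propagates if nobody traces current */
                  return ret;
          }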
    • uprobes/perf: Teach trace_uprobe/perf code to pre-filter · 31ba3348
      Oleg Nesterov authored
      Finally implement uprobe_perf_filter() which checks ->nr_systemwide or
      ->perf_events to figure out whether we need to insert the breakpoint.
      
      uprobe_perf_open/close are changed to do uprobe_apply(true/false) when
      the new perf event comes or goes away.
      
      Note that currently this is very suboptimal:
      
      	- uprobe_register() called by TRACE_REG_PERF_REGISTER becomes a
      	  heavy nop, consumer->filter() always returns F at this stage.
      
      	  As it was already discussed we need uprobe_register_only() to
      	  avoid the costly register_for_each_vma() when possible.
      
      	- uprobe_apply() is often overkill. Unless "nr_systemwide != 0"
      	  changes we need uprobe_apply_mm(); unapply_uprobe() is almost
      	  what we need.
      
      	- uprobe_apply() can be simply avoided sometimes, see the next
      	  changes.
      
      Testing:
      
      	# perf probe -x /lib/libc.so.6 syscall
      
      	# perl -e 'syscall -1 while 1' &
      	[1] 530
      
      	# perf record -e probe_libc:syscall perl -e 'syscall -1 for 1..10; sleep 1'
      
      	# perf report --show-total-period
      		100.00%            10     perl  libc-2.8.so    [.] syscall
      
      Before this patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				79291
      
      A huge ->nrhit == 79291 reflects the fact that the background process
      530 constantly hits this breakpoint too, even though it doesn't
      contribute to the output.
      
      After the patch:
      
      	# cat /sys/kernel/debug/tracing/uprobe_profile
      		/lib/libc.so.6 syscall				10
      
      This shows that only the target process was punished by int3.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      31ba3348
    • uprobes/perf: Teach trace_uprobe/perf code to track the active perf_event's · 736288ba
      Oleg Nesterov authored
      Introduce "struct trace_uprobe_filter" which records the "active"
      perf_event's attached to ftrace_event_call. For the start we simply
      use list_head, we can optimize this later if needed. For example, we
      do not really need to record an event with ->parent != NULL, we can
      rely on parent->child_list. And we can certainly do some optimizations
      for the case when 2 events have the same ->tp_target or tp_target->mm.
      
      Change trace_uprobe_register() to process TRACE_REG_PERF_OPEN/CLOSE
      and add/del this perf_event to the list.
      
      We can probably avoid any locking, but lets start with the "obvioulsy
      correct" trace_uprobe_filter->rwlock which protects everything.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      736288ba
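      A sketch of the new structure and its initialization; the field names
      are taken from the commit descriptions in this series:
      
          #include <linux/list.h>
          #include <linux/spinlock.h>
      
          struct trace_uprobe_filter {
                  rwlock_t         rwlock;         /* "obviously correct" lock */
                  int              nr_systemwide;  /* events with no tp_target */
                  struct list_head perf_events;    /* active perf_event's */
          };
      
          static void init_trace_uprobe_filter(struct trace_uprobe_filter *f)
          {
                  rwlock_init(&f->rwlock);
                  f->nr_systemwide = 0;
                  INIT_LIST_HEAD(&f->perf_events);
          }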
    • uprobes/perf: Always increment trace_uprobe->nhit · 1b47aefd
      Oleg Nesterov authored
      Move tu->nhit++ from uprobe_trace_func() to uprobe_dispatcher().
      
      ->nhit counts how many times we hit the breakpoint inserted by this
      uprobe; we do not want to lose this info if the uprobe was enabled by
      sys_perf_event_open().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      1b47aefd
    • uprobes/tracing: Kill uprobe_trace_consumer, embed uprobe_consumer into trace_uprobe · a932b738
      Oleg Nesterov authored
      trace_uprobe->consumer and "struct uprobe_trace_consumer" add
      unnecessary indirection and complicate the code for no reason.
      
      This patch simply embeds uprobe_consumer into "struct trace_uprobe";
      all other changes only fix the compilation errors.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      a932b738
    • uprobes/tracing: Introduce is_trace_uprobe_enabled() · b64b0077
      Oleg Nesterov authored
      probe_event_enable/disable() check tu->consumer != NULL to avoid the
      wrong uprobe_register/unregister().
      
      We are going to kill this pointer and "struct uprobe_trace_consumer",
      so we add the new helper, is_trace_uprobe_enabled(), which can rely
      on TP_FLAG_TRACE/TP_FLAG_PROFILE instead.
      
      Note: the current logic doesn't look optimal; it is not clear why
      TP_FLAG_TRACE/TP_FLAG_PROFILE are mutually exclusive. We will probably
      change this later.
      
      Also kill the unused TP_FLAG_UPROBE.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      b64b0077
    • uprobes/tracing: Ensure inode != NULL in create_trace_uprobe() · 7e4e28c5
      Oleg Nesterov authored
      probe_event_enable/disable() check tu->inode != NULL at the start.
      This is ugly; if igrab() can fail, create_trace_uprobe() should not
      succeed and thereby "postpone" the failure.
      
      And the S_ISREG(inode->i_mode) check added by d24d7dbf is not safe.
      
      Note: alloc_uprobe() should probably check igrab() != NULL as well.
      Note: alloc_uprobe() should probably check igrab() != NULL as well.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      7e4e28c5
    • uprobes/tracing: Fully initialize uprobe_trace_consumer before uprobe_register() · 4161824f
      Oleg Nesterov authored
      probe_event_enable() does uprobe_register() and only after that sets
      utc->tu and tu->consumer/flags. This can race with uprobe_dispatcher()
      which can miss these assignments or see them out of order. Nothing
      really bad can happen, but this doesn't look clean/safe.
      
      And this does not allow us to use the uprobe_consumer->filter() we are
      going to add; it is called by uprobe_register() and it needs utc->tu.
      
      Change this code to initialize everything before uprobe_register(), and
      reset tu->consumer/flags if it fails. We can't race with
      event_disable(); the caller holds event_mutex, and if we could race,
      the code would be wrong anyway.
      
      In fact I think uprobe_trace_consumer should die; it buys nothing and
      only complicates the code. We can simply add uprobe_consumer into
      trace_uprobe.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      4161824f
    • uprobes/tracing: Fix dentry/mount leak in create_trace_uprobe() · 84d7ed79
      Oleg Nesterov authored
      create_trace_uprobe() does kern_path() to find ->d_inode, but forgets
      to do path_put(). We can do this right after igrab() (a sketch follows
      this entry).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      84d7ed79
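      A sketch of the fixed lookup; the helper name is hypothetical and
      error handling is abbreviated:
      
          #include <linux/fs.h>
          #include <linux/namei.h>
      
          static struct inode *lookup_probe_inode(const char *filename)
          {
                  struct path path;
                  struct inode *inode;
      
                  if (kern_path(filename, LOOKUP_FOLLOW, &path))
                          return NULL;
      
                  inode = igrab(path.dentry->d_inode);
                  path_put(&path);  /* the fix: drop the dentry/mount refs
                                     * right after taking the inode ref */
                  return inode;
          }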
    • uprobes: Change handle_swbp() to expose bp_vaddr to handler_chain() · 74e59dfc
      Oleg Nesterov authored
      Change handle_swbp() to set regs->ip = bp_vaddr in advance; this is
      what consumer->handler() needs, but uprobe_get_swbp_addr() is not
      exported.
      
      This also simplifies the code and makes it more consistent across
      the supported architectures. handle_swbp() becomes the only caller
      of uprobe_get_swbp_addr().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      74e59dfc
    • uprobes: Kill uprobe_consumer->filter() · fe20d71f
      Oleg Nesterov authored
      uprobe_consumer->filter() is pointless in its current form; kill it.
      
      We will add it back, but with a different signature/semantics. Perhaps
      we will even re-introduce the callsite in handler_chain(), but not
      just to skip uc->handler().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      fe20d71f
  11. 08 February 2013 (1 commit)
  12. 02 February 2013 (1 commit)
    • tracing: Init current_trace to nop_trace and remove NULL checks · d840f718
      Steven Rostedt (Red Hat) authored
      On early boot up, when the ftrace ring buffer is initialized, the
      static variable current_trace is initialized to &nop_trace.
      Before this initialization, current_trace is NULL; after it, it never
      becomes NULL again, as it is only ever reassigned to another ftrace
      tracer.
      
      Several places check if current_trace is NULL before using it, and
      these checks are frivolous, because at the point in time when the
      checks are made the only way current_trace could be NULL is if ftrace
      failed its allocations at boot up, and the paths to these locations
      would probably not be reachable anyway.
      
      By initializing current_trace to &nop_trace where it is declared,
      current_trace will never be NULL, and we can remove all these
      checks of current_trace being NULL, which never needed to be
      checked in the first place (a sketch follows this entry).
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d840f718
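      A sketch of the change at the declaration site:
      
          /* before the patch, "static struct tracer *current_trace;" left
           * the pointer NULL until early boot init, forcing checks like
           * "if (current_trace && ...)" at every use site */
          static struct tracer *current_trace = &nop_trace;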
  13. 31 January 2013 (4 commits)
  14. 30 January 2013 (1 commit)
    • tracing/fgraph: Adjust fgraph depth before calling trace return callback · 03274a3f
      Steven Rostedt (Red Hat) authored
      While debugging the virtual cputime with the function graph tracer
      with a max_depth of 1 (most common use of the max_depth so far),
      I found that I was missing kernel execution because of a race condition.
      
      The code for the return side of the function has a slight race:
      
      	ftrace_pop_return_trace(&trace, &ret, frame_pointer);
      	trace.rettime = trace_clock_local();
      	ftrace_graph_return(&trace);
      	barrier();
      	current->curr_ret_stack--;
      
      The ftrace_pop_return_trace() initializes the trace structure for
      the callback. The ftrace_graph_return() uses the trace structure
      for its own use, as that structure is on the stack and is local
      to this function. Then curr_ret_stack is decremented, and it is
      what trace.depth was set from.
      
      If an interrupt comes in after the ftrace_graph_return() but
      before the curr_ret_stack decrement, then the called function will
      get a depth of 2. If max_depth is set to 1, this function will be
      ignored.
      
      The problem is that the trace has already been called, and the
      timestamp for that trace will not reflect the time the function
      was about to re-enter userspace. Calls to the interrupt will not
      be traced because the max_depth has prevented this.
      
      To solve this issue, the ftrace_graph_return() can safely be
      moved to after the current->curr_ret_stack has been updated.
      This way the timestamp for the return callback will reflect
      the actual time (a sketch of the reordered path follows this entry).
      
      If an interrupt comes in between the curr_ret_stack update and
      ftrace_graph_return(), it will be traced. It may look a little
      confusing to see it within the other function, but at least
      it will not be lost.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      03274a3f
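      A sketch of the reordered return path:
      
          ftrace_pop_return_trace(&trace, &ret, frame_pointer);
          trace.rettime = trace_clock_local();
          barrier();
          current->curr_ret_stack--;      /* drop the depth first ...      */
          ftrace_graph_return(&trace);    /* ... then fire the callback, so
                                           * an interrupt here is traced at
                                           * the correct depth */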
  15. 29 January 2013 (1 commit)
  16. 26 January 2013 (2 commits)
  17. 25 January 2013 (1 commit)
  18. 24 January 2013 (1 commit)
  19. 23 January 2013 (2 commits)
    • ring-buffer: Remove trace.h from ring_buffer.c · 0b07436d
      Steven Rostedt authored
      ring_buffer.c used to require declarations from trace.h, but
      these have moved to the generic header files. There's nothing
      in trace.h that ring_buffer.c requires.
      
      There are some headers that trace.h included that ring_buffer.c
      needs, but it's best that it includes them directly rather than
      through trace.h.
      
      Also, some things may use ring_buffer.c without having tracing
      configured. This removes a dependency that may otherwise appear in
      the future.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0b07436d
    • ring-buffer: User context bit recursion checking · 567cd4da
      Steven Rostedt authored
      Using context bit recursion checking, we can help increase the
      performance of the ring buffer (a sketch of the approach follows
      this entry).
      
      Before this patch:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 10.285
      Time: 10.407
      Time: 10.243
      Time: 10.372
      Time: 10.380
      Time: 10.198
      Time: 10.272
      Time: 10.354
      Time: 10.248
      Time: 10.253
      
      (average: 10.3012)
      
      Now we have:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 9.712
      Time: 9.824
      Time: 9.861
      Time: 9.827
      Time: 9.962
      Time: 9.905
      Time: 9.886
      Time: 10.088
      Time: 9.861
      Time: 9.834
      
      (average: 9.876)
      
       a 4% savings!
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      567cd4da
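      A sketch of per-context recursion bits, assuming a four-way context
      split (NMI/IRQ/softirq/normal); the identifiers are illustrative, not
      the literal kernel code:
      
          #include <linux/hardirq.h>
          #include <linux/percpu.h>
      
          static DEFINE_PER_CPU(unsigned int, current_context);
      
          static __always_inline int trace_recursive_lock(void)
          {
                  unsigned int val = this_cpu_read(current_context);
                  int bit;
      
                  if (in_nmi())
                          bit = 0;
                  else if (in_irq())
                          bit = 1;
                  else if (in_serving_softirq())
                          bit = 2;
                  else
                          bit = 3;        /* normal context */
      
                  if (unlikely(val & (1 << bit)))
                          return -1;      /* recursion within this context */
      
                  this_cpu_write(current_context, val | (1 << bit));
                  return bit;
          }
      
          static __always_inline void trace_recursive_unlock(int bit)
          {
                  this_cpu_and(current_context, ~(1 << bit));
          }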