1. 23 1月, 2015 1 次提交
  2. 14 1月, 2015 1 次提交
    • P
      perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) 提交于
      Both Linus (most recent) and Steve (a while ago) reported that perf
      related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind stack. And because we
      could not assume one was present we allocated one on stack and filled
      it with minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to assertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      86038c5e
  3. 20 11月, 2014 1 次提交
  4. 14 11月, 2014 2 次提交
  5. 06 6月, 2014 1 次提交
  6. 24 4月, 2014 2 次提交
  7. 09 4月, 2014 1 次提交
  8. 21 2月, 2014 1 次提交
  9. 10 1月, 2014 1 次提交
  10. 07 1月, 2014 1 次提交
  11. 03 1月, 2014 8 次提交
  12. 06 11月, 2013 1 次提交
    • T
      tracing: Update event filters for multibuffer · f306cc82
      Tom Zanussi 提交于
      The trace event filters are still tied to event calls rather than
      event files, which means you don't get what you'd expect when using
      filters in the multibuffer case:
      
      Before:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 2048
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      Setting the filter in tracing/instances/test1/events shouldn't affect
      the same event in tracing/events as it does above.
      
      After:
      
        # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # mkdir /sys/kernel/debug/tracing/instances/test1
        # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
        bytes_alloc > 8192
        # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
        bytes_alloc > 2048
      
      We'd like to just move the filter directly from ftrace_event_call to
      ftrace_event_file, but there are a couple cases that don't yet have
      multibuffer support and therefore have to continue using the current
      event_call-based filters.  For those cases, a new USE_CALL_FILTER bit
      is added to the event_call flags, whose main purpose is to keep the
      old behavior for those cases until they can be updated with
      multibuffer support; at that point, the USE_CALL_FILTER flag (and the
      new associated call_filter_check_discard() function) can go away.
      
      The multibuffer support also made filter_current_check_discard()
      redundant, so this change removes that function as well and replaces
      it with filter_check_discard() (or call_filter_check_discard() as
      appropriate).
      
      Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f306cc82
  13. 01 8月, 2013 1 次提交
    • S
      tracing/kprobes: Fail to unregister if probe event files are in use · 40c32592
      Steven Rostedt (Red Hat) 提交于
      When a probe is being removed, it cleans up the event files that correspond
      to the probe. But there is a race between writing to one of these files
      and deleting the probe. This is especially true for the "enable" file.
      
      	CPU 0				CPU 1
      	-----				-----
      
      				  fd = open("enable",O_WRONLY);
      
        probes_open()
        release_all_trace_probes()
        unregister_trace_probe()
        if (trace_probe_is_enabled(tp))
      	return -EBUSY
      
      				   write(fd, "1", 1)
      				   __ftrace_set_clr_event()
      				   call->class->reg()
      				    (kprobe_register)
      				     enable_trace_probe(tp)
      
        __unregister_trace_probe(tp);
        list_del(&tp->list)
        unregister_probe_event(tp) <-- fails!
        free_trace_probe(tp)
      
      				   write(fd, "0", 1)
      				   __ftrace_set_clr_event()
      				   call->class->unreg
      				    (kprobe_register)
      				    disable_trace_probe(tp) <-- BOOM!
      
      A test program was written that used two threads to simulate the
      above scenario adding a nanosleep() interval to change the timings
      and after several thousand runs, it was able to trigger this bug
      and crash:
      
      BUG: unable to handle kernel paging request at 00000005000000f9
      IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
      PGD 7808a067 PUD 0
      Oops: 0000 [#1] PREEMPT SMP
      Dumping ftrace buffer:
      ---------------------------------
      Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
      CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
      Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
      task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
      RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
      RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
      RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
      RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
      RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
      R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
      R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
      FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
      Stack:
       ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
       ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
       ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
      Call Trace:
       [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
       [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
       [<ffffffff81130f78>] do_dentry_open+0x162/0x226
       [<ffffffff81131186>] finish_open+0x46/0x54
       [<ffffffff8113f30b>] do_last+0x7f6/0x996
       [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
       [<ffffffff8113f6dd>] path_openat+0x232/0x496
       [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
       [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
       [<ffffffff81131f4e>] do_sys_open+0x70/0x102
       [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
       [<ffffffff81131ffe>] SyS_open+0x1e/0x20
       [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
      Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
      RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
       RSP <ffff880076e53c38>
      CR2: 00000005000000f9
      ---[ end trace 35f17d68fc569897 ]---
      
      The unregister_trace_probe() must be done first, and if it fails it must
      fail the removal of the kprobe.
      
      Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
      to allow moving the unregister_probe_event() before the removal of
      the probe and exit the function if it fails. This prevents the tp
      structure from being used after it is freed.
      
      Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.orgAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      40c32592
  14. 19 7月, 2013 2 次提交
  15. 02 7月, 2013 4 次提交
  16. 20 6月, 2013 1 次提交
  17. 16 5月, 2013 3 次提交
  18. 10 5月, 2013 6 次提交
  19. 02 11月, 2012 1 次提交
    • S
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Steven Rostedt 提交于
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
  20. 01 11月, 2012 1 次提交