1. 11 November 2014 · 1 commit
• tracing: Do not busy wait in buffer splice · e30f53aa
  Committed by Rabin Vincent
      On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
      lockup:
      
       # trace-cmd record -e raw_syscalls:* -F false
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
       ...
       Call Trace:
        [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
        [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
        [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
        [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
        [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
        [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
        [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff8112415d>] do_splice_to+0x6d/0x90
        [<ffffffff81126971>] SyS_splice+0x7c1/0x800
        [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
      
      The problem is this: tracing_buffers_splice_read() calls
      ring_buffer_wait() to wait for data in the ring buffers.  The buffers
      are not empty so ring_buffer_wait() returns immediately.  But
      tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
      meaning it only wants to read a full page.  When the full page is not
      available, tracing_buffers_splice_read() tries to wait again with
      ring_buffer_wait(), which again returns immediately, and so on.
      
      Fix this by adding a "full" argument to ring_buffer_wait() which will
      make ring_buffer_wait() wait until the writer has left the reader's
      page, i.e.  until full-page reads will succeed.
      
      Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
      
      Cc: stable@vger.kernel.org # 3.16+
      Fixes: b1169cc6 ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e30f53aa
2. 31 October 2014 · 1 commit
• tracing/syscalls: Ignore numbers outside NR_syscalls' range · 086ba77a
  Committed by Rabin Vincent
      ARM has some private syscalls (for example, set_tls(2)) which lie
      outside the range of NR_syscalls.  If any of these are called while
      syscall tracing is being performed, out-of-bounds array access will
      occur in the ftrace and perf sys_{enter,exit} handlers.
      
       # trace-cmd record -e raw_syscalls:* true && trace-cmd report
       ...
       true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
       true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
       true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
       true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
       ...
      
       # trace-cmd record -e syscalls:* true
       [   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
       [   17.289590] pgd = 9e71c000
       [   17.289696] [aaaaaace] *pgd=00000000
       [   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
       [   17.290169] Modules linked in:
       [   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
       [   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
       [   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
       [   17.290866] LR is at syscall_trace_enter+0x124/0x184
      
      Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.
      
      Commit cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      added the check for less than zero, but it should have also checked
      for greater than NR_syscalls.
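
      The fix amounts to a bounds check of roughly this shape in the
      sys_enter/sys_exit handlers (a sketch, not the literal diff):

       int syscall_nr;

       syscall_nr = syscall_get_nr(current, regs);
       /* Ignore private/out-of-range syscall numbers entirely. */
       if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
               return;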
      
      Link: http://lkml.kernel.org/p/1414620418-29472-1-git-send-email-rabin@rab.in
      
      Fixes: cd0980fc "tracing: Check invalid syscall nr while tracing syscalls"
      Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      086ba77a
3. 25 October 2014 · 2 commits
• ftrace: Fix checking of trampoline ftrace_ops in finding trampoline · 4fc40904
  Committed by Steven Rostedt (Red Hat)
      When modifying code, ftrace has several checks to make sure things
      are being done correctly. One of them is to make sure any code it
      modifies is exactly what it expects it to be before it modifies it.
      In order to do so with the new trampoline logic, it must be able
      to find out what trampoline a function is hooked to in order to
      see if the code that hooks to it is what's expected.
      
The logic to find the trampoline from a record (the accounting descriptor
for a function that is hooked) needs to look only at the "old_hash" of an
ops that is being modified. The old_hash is the list of functions an ops
was hooked to before its update; a record would only be pointing to an
ops that is being modified if it was already hooked before.

Currently, the code can pick a modified ops based on the new functions it
will be hooked to, which picks the wrong trampoline and causes the check
to fail, disabling ftrace.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      
      ftrace: squash into ordering of ops for modification
      4fc40904
• ftrace: Set ops->old_hash on modifying what an ops hooks to · 8252ecf3
  Committed by Steven Rostedt (Red Hat)
The code that checks for trampolines when modifying function hooks
tests against a modified ops' "old_hash". But the ops' old_hash pointer
is not updated before the changes are made, making it possible to fail
to find the right hash for the callback, which can break ftrace's
accounting and cause it to disable itself.

Have the ops set its old_hash before the modification takes place.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      8252ecf3
4. 09 October 2014 · 2 commits
5. 03 October 2014 · 1 commit
• ring-buffer: Fix infinite spin in reading buffer · 24607f11
  Committed by Steven Rostedt (Red Hat)
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
be called by any number of readers. The iterator continues advancing as
long as it is not interrupted by a consuming reader. If a consuming read
occurs, the iterator restarts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
Commit 651e22f2 changed what was saved by the cache_read when
rb_iter_reset() occurred, making the iterator never match the cache.
Then when the iterator calls rb_iter_reset(), it goes into an infinite
loop: it checks that the cache doesn't match, does the reset, and
retries, only to find that the cache still doesn't match! This should
never happen, as the reset is supposed to set the cache to the current
value, and there are locks that keep a consuming reader from having
access to the data.
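
      The check-and-reset being described is roughly of this form (a sketch;
      the real code lives in the ring-buffer iterator path):

       /* Re-sync the iterator if a consuming read happened since the last
        * read.  If rb_iter_reset() never makes the cached values match the
        * live ones, this check keeps firing and the reader spins forever. */
       if (iter->cache_read != cpu_buffer->read ||
           iter->cache_reader_page != cpu_buffer->reader_page)
               rb_iter_reset(iter);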
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      24607f11
6. 19 September 2014 · 3 commits
• sched: Add helper for task stack page overrun checking · a70857e4
  Committed by Aaron Tomlin
      This facility is used in a few places so let's introduce
      a helper function to improve code readability.
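
      A minimal sketch of such a helper (macro name assumed for illustration;
      end_of_stack() returns the address of the stack canary word):

       #define task_stack_end_corrupted(task) \
               (*(end_of_stack(task)) != STACK_END_MAGIC)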
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: oleg@redhat.com
      Cc: riel@redhat.com
      Cc: prarit@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: mpe@ellerman.id.au
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a70857e4
• init/main.c: Give init_task a canary · d4311ff1
  Committed by Aaron Tomlin
Tasks get their end of stack set to STACK_END_MAGIC with the
aim of catching stack overruns. Currently this feature does not
apply to init_task. This patch removes that restriction.
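
      A rough sketch of the kind of helper this implies (function name assumed
      for illustration, not taken from the patch itself):

       void set_task_stack_end_magic(struct task_struct *tsk)
       {
               unsigned long *stackend;

               stackend = end_of_stack(tsk);
               *stackend = STACK_END_MAGIC;    /* for overflow detection */
       }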
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
  http://marc.info/?l=linux-kernel&m=127144305403241&w=2
Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d4311ff1
• sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
  Committed by Kirill Tkhai
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
(All of the places changed in this patch are visible in the diff; the
 only exception is kiblnd_scheduler() from:

      drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c

 where the schedule() call sits one line above the standard 3 lines of
 unified diff context.)

None of these places use set_current_state() for its implied memory
barrier (mb()).
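
      The pattern being removed looks like this (illustrative only):

       /* before: redundant, schedule() already returns in TASK_RUNNING */
       schedule();
       set_current_state(TASK_RUNNING);

       /* after */
       schedule();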
Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f139caf2
7. 13 September 2014 · 2 commits
8. 10 September 2014 · 7 commits
• kernel: trace_syscalls: Replace rcu_assign_pointer() with RCU_INIT_POINTER() · fb5a613b
  Committed by Andreea-Cristina Bernat
      The uses of "rcu_assign_pointer()" are NULLing out the pointers.
      According to RCU_INIT_POINTER()'s block comment:
      "1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
it is better to use it instead of rcu_assign_pointer() because it has
lower overhead.
      
      The following Coccinelle semantic patch was used:
      @@
      @@
      
      - rcu_assign_pointer
      + RCU_INIT_POINTER
        (..., NULL)
      
Link: http://lkml.kernel.org/p/20140822142822.GA32391@ada
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      fb5a613b
• ftrace: Replace tramp_hash with old_*_hash to save space · fef5aeee
  Committed by Steven Rostedt (Red Hat)
      Allowing function callbacks to declare their own trampolines requires
      that each ftrace_ops that has a trampoline must have some sort of
      accounting that keeps track of which ops has a trampoline attached
      to a record.
      
The easy way to solve this was to add a "tramp_hash" that created a
hash entry for every function that an ops uses with a trampoline.
But since we can have literally tens of thousands of functions being
traced, that means we need tens of thousands of descriptors to map
the ops to the functions in the hash. This is quite expensive and
can cause enabling and disabling of the function graph tracer to
take some time. It can take up to several seconds to disable or
enable all functions in the function graph tracer for this reason.
      
The better approach, albeit more complex, is to keep track of how ops
are being enabled and disabled, and use that along with the counting
of the number of ops attached to records, to determine which ops has
a trampoline attached to a record when tracing is enabled or disabled.

To do this, the tramp_hash has been replaced with an old_filter_hash
and an old_notrace_hash, which get a copy of the ops' filter_hash and
notrace_hash respectively. The old hashes are kept until the ops has
been modified or removed, and they are used with the accounting logic
to determine which ops has the trampoline of a record. The reason this
has a smaller footprint is the trick that an "empty" filter_hash means
"all functions" and an empty notrace_hash means "no functions" in the
hash.

This is much more efficient, doesn't have the delay, and takes up
much less memory, as we do not need to map all the functions but
just figure out which functions are mapped at the time tracing is
enabled or disabled.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      fef5aeee
• ftrace: Annotate the ops operation on update · e1effa01
  Committed by Steven Rostedt (Red Hat)
      Add three new flags for ftrace_ops:
      
        FTRACE_OPS_FL_ADDING
        FTRACE_OPS_FL_REMOVING
        FTRACE_OPS_FL_MODIFYING
      
These will be set on an ftrace_ops when it is first added to
function tracing, when it is being removed from function tracing,
or when only the set of functions it traces is being changed,
respectively.
      
This will be needed to remove the tramp_hash, which can grow quite
big. The tramp_hash is used to note which functions a ftrace_ops
is using a trampoline for. Denoting which ftrace_ops is being
modified will allow us to use the ftrace_ops hashes themselves,
which are much smaller, as they have a global flag to denote that
a ftrace_ops is tracing all functions, as well as a notrace hash
if the ftrace_ops is tracing all but a few. The tramp_hash just
creates a hash item for every function, which can go into the tens
of thousands if all functions are using the ftrace_ops trampoline.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e1effa01
• ftrace: Grab any ops for a rec for enabled_functions output · 5fecaa04
  Committed by Steven Rostedt (Red Hat)
When dumping enabled_functions, use the first ops that is found
with a trampoline attached to the record; there should only be
one, since only one ops can be registered to a function that has
a trampoline.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5fecaa04
• ftrace: Remove freeing of old_hash from ftrace_hash_move() · 3296fc4e
  Committed by Steven Rostedt (Red Hat)
      ftrace_hash_move() currently frees the old hash that is passed to it
      after replacing the pointer with the new hash. Instead of having the
      function do that chore, have the caller perform the free.
      
      This lets the ftrace_hash_move() be used a bit more freely, which
      is needed for changing the way the trampoline logic is done.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      3296fc4e
• ftrace: Set callback to ftrace_stub when no ops are registered · f7aad4e1
  Committed by Steven Rostedt (Red Hat)
      The clean up that adds the helper function ftrace_ops_get_func()
      caused the default function to not change when DYNAMIC_FTRACE was not
      set and no ftrace_ops were registered. Although static tracing is
      not very useful (not having DYNAMIC_FTRACE set), it is still supported
      and we don't want to break it.
      
      Clean up the if statement even more to specifically have the default
      function call ftrace_stub when no ftrace_ops are registered. This
      fixes the small bug for static tracing as well as makes the code a
      bit more understandable.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f7aad4e1
• ftrace: Add helper function ftrace_ops_get_func() · 87354059
  Committed by Steven Rostedt (Red Hat)
Add a helper function that returns what the mcount trampoline is to
call for a given ftrace_ops. This helper will be used by arch code
in the future to set up dynamic trampolines. But as it performs the
same tests that are done when choosing what function to call for
the default mcount trampoline, it might as well be used to clean up
the existing code.
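
      A rough sketch of the shape of such a helper (simplified; the actual
      checks differ across kernel versions):

       ftrace_func_t ftrace_ops_get_func(struct ftrace_ops *ops)
       {
               /* Ops that need the extra per-call checks are routed through
                * the list function rather than being called directly. */
               if (ops->flags & FTRACE_OPS_FL_DYNAMIC || FTRACE_FORCE_LIST_FUNC)
                       return ftrace_ops_list_func;

               return ops->func;
       }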
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      87354059
9. 09 September 2014 · 1 commit
10. 26 August 2014 · 1 commit
• trace: Fix epoll hang when we race with new entries · 4ce97dbf
  Committed by Josef Bacik
      Epoll on trace_pipe can sometimes hang in a weird case.  If the ring buffer is
      empty when we set waiters_pending but an event shows up exactly at that moment
we can miss being woken up by the ring buffer's irq work.  Since
ring_buffer_empty() is inherently racy, we will sometimes think that the buffer
is not empty.  So we don't get woken up and we don't think there are any events,
even though there were some ready when we added the watch, which makes us hang.
      This patch fixes this by making sure that we are actually on the wait list
      before we set waiters_pending, and add a memory barrier to make sure
      ring_buffer_empty() is going to be correct.
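
      The ordering the fix establishes is roughly the following (a sketch of
      the poll path; not the literal diff):

       poll_wait(filp, &work->waiters, poll_table); /* 1. get on the wait list first */
       work->waiters_pending = true;                /* 2. ask the writer to wake us  */
       smp_mb();                                    /* 3. order flag vs. empty check */
       if (!ring_buffer_empty(buffer))
               return POLLIN | POLLRDNORM;          /* 4. data already present       */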
      
      Link: http://lkml.kernel.org/p/1408989581-23727-1-git-send-email-jbacik@fb.com
      
      Cc: stable@vger.kernel.org # 3.10+
      Cc: Martin Lau <kafai@fb.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      4ce97dbf
11. 23 August 2014 · 5 commits
• ftrace: Use current addr when converting to nop in __ftrace_replace_code() · 39b5552c
  Committed by Steven Rostedt (Red Hat)
      In __ftrace_replace_code(), when converting the call to a nop in a function
      it needs to compare against the "curr" (current) value of the ftrace ops, and
      not the "new" one. It currently does not affect x86 which is the only arch
      to do the trampolines with function graph tracer, but when other archs that do
      depend on this code implement the function graph trampoline, it can crash.
      
      Here's an example when ARM uses the trampolines (in the future):
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
       Modules linked in: omap_rng rng_core ipv6
       CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28-dirty #52
       [<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
       [<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
       [<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
       [<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
       [<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
       [<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
       [<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
       [<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
       [<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
       [<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
       [<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
       [<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
       [<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
       ---[ end trace dc9ce72c5b617d8f ]---
      [   65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
      [   65.054070]  actual: 85:1b:00:eb
      
      Fixes: 7413af1f "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      39b5552c
• ftrace: Fix function_profiler and function tracer together · 5f151b24
  Committed by Steven Rostedt (Red Hat)
      The latest rewrite of ftrace removed the separate ftrace_ops of
      the function tracer and the function graph tracer and had them
      share the same ftrace_ops. This simplified the accounting by removing
      the multiple layers of functions called, where the global_ops func
      would call a special list that would iterate over the other ops that
      were registered within it (like function and function graph), which
      itself was registered to the ftrace ops list of all functions
      currently active. If that sounds confusing, the code that implemented
      it was also confusing and its removal is a good thing.
      
      The problem with this change was that it assumed that the function
      and function graph tracer can never be used at the same time.
      This is mostly true, but there is an exception. That is when the
      function profiler uses the function graph tracer to profile.
The function profiler can be activated at the same time as the function
tracer, and this breaks the assumption; the result is that ftrace
will crash (it detects the error and shuts itself down, it does not
cause a kernel oops).
      
      To solve this issue, a previous change allowed the hash tables
      for the functions traced by a ftrace_ops to be a pointer and let
      multiple ftrace_ops share the same hash. This allows the function
      and function_graph tracer to have separate ftrace_ops, but still
      share the hash, which is what is done.
      
      Now the function and function graph tracers have separate ftrace_ops
      again, and the function tracer can be run while the function_profile
      is active.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5f151b24
• ftrace: Fix up trampoline accounting with looping on hash ops · bce0b6c5
  Committed by Steven Rostedt (Red Hat)
Now that a ftrace_hash can be shared by multiple ftrace_ops, they can
decrement the rec->flags more than once (once per ops that shares the
ftrace_hash). This means that the tramp_hash may not have a hash item
for a record when that record was added.

For example, if two ftrace_ops share a hash for a ftrace record, and the
first ops has a trampoline, when it adds itself it will set the rec->flags
TRAMP flag and increment its nr_trampolines counter. When the second ops
is added, it must clear that tramp flag but also decrement the
nr_trampolines of the other ops that shares its hash. As the update to the
function callbacks has not yet been performed, the other ops will not have
its tramp hash set yet, and it can not be used to know to decrement its
nr_trampolines.
      
Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
held, an ops with a trampoline to a record will, during an update of another
ops that shares the record, still have its func_hash pointing to it. Since a
trampoline can only be set for a record if only one ops is attached to it,
we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
is set) and then find the ops that has this record in its hashes.
      
      Also added some output to help debug when things go wrong.
      
      Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bce0b6c5
• ftrace: Update all ftrace_ops for a ftrace_hash_ops update · 84261912
  Committed by Steven Rostedt (Red Hat)
      When updating what an ftrace_ops traces, if it is registered (that is,
      actively tracing), and that ftrace_ops uses the shared global_ops
      local_hash, then we need to update all tracers that are active and
      also share the global_ops' ftrace_hash_ops.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      84261912
• ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
  Committed by Steven Rostedt (Red Hat)
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
But that assumption proved to be incorrect. The function profiler
can use the function graph tracer when function tracing is enabled.
Since all function graph users use the function tracing ftrace_ops,
this causes a conflict, and when a user enables both function profiling
and the function tracer it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
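
      A sketch of the layout this describes (simplified; the real struct
      ftrace_ops has many more members):

       struct ftrace_ops_hash {
               struct ftrace_hash      *notrace_hash;
               struct ftrace_hash      *filter_hash;
       };

       struct ftrace_ops {
               ftrace_func_t            func;
               struct ftrace_ops_hash   local_hash;  /* this ops' own hashes       */
               struct ftrace_ops_hash  *func_hash;   /* normally &local_hash; can  */
                                                     /* point at another ops' hash */
                                                     /* so the two share filters   */
       };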
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      33b7f99c
12. 07 August 2014 · 2 commits
• ring-buffer: Always reset iterator to reader page · 651e22f2
  Committed by Steven Rostedt (Red Hat)
      When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with an empty page and this page that
      was swapped out becomes the new reader page. The reader page
      is owned by the reader and since it was swapped out of the ring
      buffer, writers do not have access to it (there's an exception
      to that rule, but it's out of scope for this commit).
      
      When reading the "trace" file, it is a non consuming read, which
      means that the data in the ring buffer will not be modified.
      When the trace file is opened, a ring buffer iterator is allocated
      and writes to the ring buffer are disabled, such that the iterator
      will not have issues iterating over the data.
      
Although the ring buffer has writes disabled, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.
      
      My tests would sometimes trigger this bug on my i386 box:
      
      WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
      Modules linked in:
      CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
       f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80
      Call Trace:
       [<c18796b3>] dump_stack+0x4b/0x75
       [<c103a0e3>] warn_slowpath_common+0x7e/0x95
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa
       [<c10bed04>] trace_find_cmdline+0x40/0x64
       [<c10c3c16>] trace_print_context+0x27/0xec
       [<c10c4360>] ? trace_seq_printf+0x37/0x5b
       [<c10c0b15>] print_trace_line+0x319/0x39b
       [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
       [<c10c13b1>] s_show+0x192/0x1ab
       [<c10bfd9a>] ? s_next+0x5a/0x7c
       [<c112e76e>] seq_read+0x267/0x34c
       [<c1115a25>] vfs_read+0x8c/0xef
       [<c112e507>] ? seq_lseek+0x154/0x154
       [<c1115ba2>] SyS_read+0x54/0x7f
       [<c188488e>] syscall_call+0x7/0xb
      ---[ end trace 3f507febd6b4cc83 ]---
      >>>> ##### CPU 1 buffer started ####
      
This was the __trace_find_cmdline() function complaining that the pid
in the event record was negative.
      
      After adding more test cases, this would trigger more often. Strangely
      enough, it would never trigger on a single test, but instead would trigger
      only when running all the tests. I believe that was the case because it
      required one of the tests to be shutting down via delayed instances while
      a new test started up.
      
      After spending several days debugging this, I found that it was caused by
      the iterator becoming corrupted. Debugging further, I found out why
      the iterator became corrupted. It happened with the rb_iter_reset().
      
      As consuming reads may not read the full reader page, and only part
      of it, there's a "read" field to know where the last read took place.
The iterator must also start at the read position. In the rb_iter_reset()
      code, if the reader page was disconnected from the ring buffer, the iterator
      would start at the head page within the ring buffer (where writes still
      happen). But the mistake there was that it still used the "read" field
      to start the iterator on the head page, where it should always start
      at zero because readers never read from within the ring buffer where
      writes occur.
      
      I originally wrote a patch to have it set the iter->head to 0 instead
      of iter->head_page->read, but then I questioned why it wasn't always
      setting the iter to point to the reader page, as the reader page is
      still valid.  The list_empty(reader_page->list) just means that it was
      successful in swapping out. But the reader_page may still have data.
      
      There was a bug report a long time ago that was not reproducible that
      had something about trace_pipe (consuming read) not matching trace
      (iterator read). This may explain why that happened.
      
Anyway, the correct answer to this bug is to always use the reader page
and not reset the iterator to inside the writable ring buffer.
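
      A rough sketch of the corrected reset (simplified from the description
      above; not the literal diff):

       static void rb_iter_reset(struct ring_buffer_iter *iter)
       {
               struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer;

               /* Always start from the reader page: it is owned by the
                * reader and may still hold unread data. */
               iter->head_page = cpu_buffer->reader_page;
               iter->head = cpu_buffer->reader_page->read;
       }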
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: d769041f "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      651e22f2
• ring-buffer: Up rb_iter_peek() loop count to 3 · 021de3d9
  Committed by Steven Rostedt (Red Hat)
After writing a test to try to trigger the bug that caused the
ring buffer iterator to become corrupted, I hit another bug:
      
       WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
       Modules linked in: ipt_MASQUERADE sunrpc [...]
       CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
        ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
        ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
       Call Trace:
        [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
        [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
        [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
        [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
        [<ffffffff8114cf94>] ? seq_read+0x148/0x361
        [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
        [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
        [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2
      
      Debugging this bug, which triggers when the rb_iter_peek() loops too
      many times (more than 2 times), I discovered there's a case that can
      cause that function to legitimately loop 3 times!
      
rb_iter_peek() is different from rb_buffer_peek(), as rb_buffer_peek()
only deals with the reader page (it is for consuming reads), while
rb_iter_peek() is for traversing the buffer without consuming it, and as
such, it can loop for one more reason. That is, if we hit the end of
the reader page, or of any page, it will go to the next page and try again.
      
      That is, we have this:
      
       1. iter->head > iter->head_page->page->commit
          (rb_inc_iter() which moves the iter to the next page)
          try again
      
       2. event = rb_iter_head_event()
          event->type_len == RINGBUF_TYPE_TIME_EXTEND
          rb_advance_iter()
          try again
      
       3. read the event.
      
      But we never get to 3, because the count is greater than 2 and we
      cause the WARNING and return NULL.
      
      Up the counter to 3.
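
      The guard in question has roughly this shape (a sketch; the constant is
      what changes from 2 to 3):

       /* at the top of rb_iter_peek()'s retry loop */
       if (RB_WARN_ON(cpu_buffer, ++nr_loops > 3))
               return NULL;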
      
      Cc: stable@vger.kernel.org # 2.6.37+
      Fixes: 69d1b839 "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      021de3d9
13. 28 July 2014 · 1 commit
14. 24 July 2014 · 5 commits
• ftrace: Add warning if tramp hash does not match nr_trampolines · dc6f03f2
  Committed by Steven Rostedt (Red Hat)
After adding all the records to the tramp_hash, add a check that the
number of records added matches the number of records expected to
match; do a WARN_ON and disable ftrace if they do not.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      dc6f03f2
• ftrace: Fix trampoline hash update check on rec->flags · 2a0343ba
  Committed by Steven Rostedt (Red Hat)
      In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
      to the ops hash if the rec has only one callback attached and the
      ops is connected to the rec. It gives a nasty warning and shuts down
      ftrace if the rec doesn't have a trampoline set for it. But this
      can happen with the following scenario:
      
        # cd /sys/kernel/debug/tracing
        # echo schedule do_IRQ > set_ftrace_filter
        # mkdir instances/foo
        # echo schedule > instances/foo/set_ftrace_filter
  # echo function_graph > current_tracer
  # echo function > instances/foo/current_tracer
  # echo nop > instances/foo/current_tracer
      
      The above would then trigger the following warning and disable
      ftrace:
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
       Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
       CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
        ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
        ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
       Call Trace:
        [<ffffffff81502130>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
        [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
        [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
        [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
        [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
        [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
        [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
        [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
        [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
        [<ffffffff81130e49>] ? vfs_write+0xac/0x11c
        [<ffffffff8113115d>] ? SyS_write+0x60/0x8e
        [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
       ---[ end trace 938c4415cbc7dc96 ]---
       ------------[ cut here ]------------
      
Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2a0343ba
• ring-buffer: Use rb_page_size() instead of open coded head_page size · 10e83fd0
  Committed by Steven Rostedt (Red Hat)
      There's a helper function to get a ring buffer page size (the number
      of bytes of data recorded on the page), called rb_page_size().
      Use that instead of open coding it.
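
      Illustratively, the change is of this form (variable and field names may
      not match the actual open-coded sites):

       /* before: open-coded page size */
       commit = local_read(&bpage->page->commit);

       /* after: use the helper */
       commit = rb_page_size(bpage);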
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      10e83fd0
• ftrace: Provide trace clocks monotonic · 1b3e5c09
  Committed by Thomas Gleixner
      Expose the new NMI safe accessor to clock monotonic to the tracer.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
      1b3e5c09
• ftrace: Rename ftrace_ops field from trampolines to nr_trampolines · 0162d621
  Committed by Steven Rostedt (Red Hat)
Having two fields within the same struct whose names differ by only one
character can be confusing and error prone. Rename the counter "trampolines"
to "nr_trampolines" to explicitly show it is a counter and avoid confusion
with the "trampoline" field.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0162d621
15. 21 July 2014 · 1 commit
16. 19 July 2014 · 5 commits