1. 20 11月, 2014 7 次提交
  2. 19 11月, 2014 1 次提交
    • S
      tracing: Fix race of function probes counting · a9ce7c36
      Steven Rostedt (Red Hat) 提交于
      The function probe counting for traceon and traceoff suffered a race
      condition where if the probe was executing on two or more CPUs at the
      same time, it could decrement the counter by more than one when
      disabling (or enabling) the tracer only once.
      
      The way the traceon and traceoff probes are suppose to work is that
      they disable (or enable) tracing once per count. If a user were to
      echo 'schedule:traceoff:3' into set_ftrace_filter, then when the
      schedule function was called, it would disable tracing. But the count
      should only be decremented once (to 2). Then if the user enabled tracing
      again (via tracing_on file), the next call to schedule would disable
      tracing again and the count would be decremented to 1.
      
      But if multiple CPUS called schedule at the same time, it is possible
      that the count would be decremented more than once because of the
      simple "count--" used.
      
      By reading the count into a local variable and using memory barriers
      we can guarantee that the count would only be decremented once per
      disable (or enable).
      
      The stack trace probe had a similar race, but here the stack trace will
      decrement for each time it is called. But this had the read-modify-
      write race, where it could stack trace more than the number of times
      that was specified. This case we use a cmpxchg to stack trace only the
      number of times specified.
      
      The dump probes can still use the old "update_count()" function as
      they only run once, and that is controlled by the dump logic
      itself.
      
      Link: http://lkml.kernel.org/r/20141118134643.4b550ee4@gandalf.local.homeSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a9ce7c36
  3. 14 11月, 2014 9 次提交
  4. 12 11月, 2014 6 次提交
  5. 01 11月, 2014 2 次提交
    • S
      ftrace/x86: Show trampoline call function in enabled_functions · 15d5b02c
      Steven Rostedt (Red Hat) 提交于
      The file /sys/kernel/debug/tracing/eneabled_functions is used to debug
      ftrace function hooks. Add to the output what function is being called
      by the trampoline if the arch supports it.
      
      Add support for this feature in x86_64.
      
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      15d5b02c
    • S
      ftrace/x86: Add dynamic allocated trampoline for ftrace_ops · f3bea491
      Steven Rostedt (Red Hat) 提交于
      The current method of handling multiple function callbacks is to register
      a list function callback that calls all the other callbacks based on
      their hash tables and compare it to the function that the callback was
      called on. But this is very inefficient.
      
      For example, if you are tracing all functions in the kernel and then
      add a kprobe to a function such that the kprobe uses ftrace, the
      mcount trampoline will switch from calling the function trace callback
      to calling the list callback that will iterate over all registered
      ftrace_ops (in this case, the function tracer and the kprobes callback).
      That means for every function being traced it checks the hash of the
      ftrace_ops for function tracing and kprobes, even though the kprobes
      is only set at a single function. The kprobes ftrace_ops is checked
      for every function being traced!
      
      Instead of calling the list function for functions that are only being
      traced by a single callback, we can call a dynamically allocated
      trampoline that calls the callback directly. The function graph tracer
      already uses a direct call trampoline when it is being traced by itself
      but it is not dynamically allocated. It's trampoline is static in the
      kernel core. The infrastructure that called the function graph trampoline
      can also be used to call a dynamically allocated one.
      
      For now, only ftrace_ops that are not dynamically allocated can have
      a trampoline. That is, users such as function tracer or stack tracer.
      kprobes and perf allocate their ftrace_ops, and until there's a safe
      way to free the trampoline, it can not be used. The dynamically allocated
      ftrace_ops may, although, use the trampoline if the kernel is not
      compiled with CONFIG_PREEMPT. But that will come later.
      Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f3bea491
  6. 25 10月, 2014 2 次提交
    • S
      ftrace: Fix checking of trampoline ftrace_ops in finding trampoline · 4fc40904
      Steven Rostedt (Red Hat) 提交于
      When modifying code, ftrace has several checks to make sure things
      are being done correctly. One of them is to make sure any code it
      modifies is exactly what it expects it to be before it modifies it.
      In order to do so with the new trampoline logic, it must be able
      to find out what trampoline a function is hooked to in order to
      see if the code that hooks to it is what's expected.
      
      The logic to find the trampoline from a record (accounting descriptor
      for a function that is hooked) needs to only look at the "old_hash"
      of an ops that is being modified. The old_hash is the list of function
      an ops is hooked to before its update. Since a record would only be
      pointing to an ops that is being modified if it was already hooked
      before.
      
      Currently, it can pick a modified ops based on its new functions it
      will be hooked to, and this picks the wrong trampoline and causes
      the check to fail, disabling ftrace.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      
      ftrace: squash into ordering of ops for modification
      4fc40904
    • S
      ftrace: Set ops->old_hash on modifying what an ops hooks to · 8252ecf3
      Steven Rostedt (Red Hat) 提交于
      The code that checks for trampolines when modifying function hooks
      tests against a modified ops "old_hash". But the ops old_hash pointer
      is not being updated before the changes are made, making it possible
      to not find the right hash to the callback and possibly causing
      ftrace to break in accounting and disable itself.
      
      Have the ops set its old_hash before the modifying takes place.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8252ecf3
  7. 09 10月, 2014 2 次提交
  8. 03 10月, 2014 1 次提交
    • S
      ring-buffer: Fix infinite spin in reading buffer · 24607f11
      Steven Rostedt (Red Hat) 提交于
      Commit 651e22f2 "ring-buffer: Always reset iterator to reader page"
      fixed one bug but in the process caused another one. The reset is to
      update the header page, but that fix also changed the way the cached
      reads were updated. The cache reads are used to test if an iterator
      needs to be updated or not.
      
      A ring buffer iterator, when created, disables writes to the ring buffer
      but does not stop other readers or consuming reads from happening.
      Although all readers are synchronized via a lock, they are only
      synchronized when in the ring buffer functions. Those functions may
      be called by any number of readers. The iterator continues down when
      its not interrupted by a consuming reader. If a consuming read
      occurs, the iterator starts from the beginning of the buffer.
      
      The way the iterator sees that a consuming read has happened since
      its last read is by checking the reader "cache". The cache holds the
      last counts of the read and the reader page itself.
      
      Commit 651e22f2 changed what was saved by the cache_read when
      the rb_iter_reset() occurred, making the iterator never match the cache.
      Then if the iterator calls rb_iter_reset(), it will go into an
      infinite loop by checking if the cache doesn't match, doing the reset
      and retrying, just to see that the cache still doesn't match! Which
      should never happen as the reset is suppose to set the cache to the
      current value and there's locks that keep a consuming reader from
      having access to the data.
      
      Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      24607f11
  9. 19 9月, 2014 3 次提交
    • A
      sched: Add helper for task stack page overrun checking · a70857e4
      Aaron Tomlin 提交于
      This facility is used in a few places so let's introduce
      a helper function to improve code readability.
      Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: oleg@redhat.com
      Cc: riel@redhat.com
      Cc: prarit@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: mpe@ellerman.id.au
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-3-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a70857e4
    • A
      init/main.c: Give init_task a canary · d4311ff1
      Aaron Tomlin 提交于
      Tasks get their end of stack set to STACK_END_MAGIC with the
      aim to catch stack overruns. Currently this feature does not
      apply to init_task. This patch removes this restriction.
      
      Note that a similar patch was posted by Prarit Bhargava
      some time ago but was never merged:
      
        http://marc.info/?l=linux-kernel&m=127144305403241&w=2Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: aneesh.kumar@linux.vnet.ibm.com
      Cc: dzickus@redhat.com
      Cc: bmr@redhat.com
      Cc: jcastillo@redhat.com
      Cc: jgh@redhat.com
      Cc: minchan@kernel.org
      Cc: tglx@linutronix.de
      Cc: hannes@cmpxchg.org
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/r/1410527779-8133-2-git-send-email-atomlin@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4311ff1
    • K
      sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after schedule() · f139caf2
      Kirill Tkhai 提交于
      schedule(), io_schedule() and schedule_timeout() always return
      with TASK_RUNNING state set, so one more setting is unnecessary.
      
      (All places in patch are visible good, only exception is
       kiblnd_scheduler() from:
      
            drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
      
       Its schedule() is one line above standard 3 lines of unified diff)
      
      No places where set_current_state() is used for mb().
      Signed-off-by: NKirill Tkhai <ktkhai@parallels.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Anil Belur <askb23@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dmitry Eremin <dmitry.eremin@intel.com>
      Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Isaac Huang <he.huang@intel.com>
      Cc: James E.J. Bottomley <JBottomley@parallels.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liang Zhen <liang.zhen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masaru Nomura <massa.nomura@gmail.com>
      Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Oleg Drokin <green@linuxhacker.ru>
      Cc: Peng Tao <bergwolf@gmail.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Robert Love <robert.w.love@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Ursula Braun <ursula.braun@de.ibm.com>
      Cc: Zi Shen Lim <zlim.lnx@gmail.com>
      Cc: devel@driverdev.osuosl.org
      Cc: dm-devel@redhat.com
      Cc: dri-devel@lists.freedesktop.org
      Cc: fcoe-devel@open-fcoe.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux390@de.ibm.com
      Cc: linux-afs@lists.infradead.org
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-raid@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-scsi@vger.kernel.org
      Cc: qla2xxx-upstream@qlogic.com
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: user-mode-linux-user@lists.sourceforge.net
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f139caf2
  10. 13 9月, 2014 2 次提交
  11. 10 9月, 2014 5 次提交
    • A
      kernel: trace_syscalls: Replace rcu_assign_pointer() with RCU_INIT_POINTER() · fb5a613b
      Andreea-Cristina Bernat 提交于
      The uses of "rcu_assign_pointer()" are NULLing out the pointers.
      According to RCU_INIT_POINTER()'s block comment:
      "1.   This use of RCU_INIT_POINTER() is NULLing out the pointer"
      it is better to use it instead of rcu_assign_pointer() because it has a
      smaller overhead.
      
      The following Coccinelle semantic patch was used:
      @@
      @@
      
      - rcu_assign_pointer
      + RCU_INIT_POINTER
        (..., NULL)
      
      Link: http://lkml.kernel.org/p/20140822142822.GA32391@adaSigned-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      fb5a613b
    • S
      ftrace: Replace tramp_hash with old_*_hash to save space · fef5aeee
      Steven Rostedt (Red Hat) 提交于
      Allowing function callbacks to declare their own trampolines requires
      that each ftrace_ops that has a trampoline must have some sort of
      accounting that keeps track of which ops has a trampoline attached
      to a record.
      
      The easy way to solve this was to add a "tramp_hash" that created a
      hash entry for every function that a ops uses with a trampoline.
      But since we can have literally tens of thousands of functions being
      traced, that means we need tens of thousands of descriptors to map
      the ops to the function in the hash. This is quite expensive and
      can cause enabling and disabling the function graph tracer to take
      some time to start and stop. It can take up to several seconds to
      disable or enable all functions in the function graph tracer for this
      reason.
      
      The better approach albeit more complex, is to keep track of how ops
      are being enabled and disabled, and use that along with the counting
      of the number of ops attached to records, to determive what ops has
      a trampoline attached to a record at enabling and disabling of
      tracing.
      
      To do this, the tramp_hash has been replaced with an old_filter_hash
      and old_notrace_hash, which get the copy of the ops filter_hash and
      notrace_hash respectively. The old hashes is kept until the ops has
      been modified or removed and the old hashes are used with the logic
      of the accounting to determine the ops that have the trampoline of
      a record. The reason this has less of a footprint is due to the trick
      that an "empty" hash in the filter_hash means "all functions" and
      an empty hash in the notrace hash means "no functions" in the hash.
      
      This is much more efficienct, doesn't have the delay, and takes up
      much less memory, as we do not need to map all the functions but
      just figure out which functions are mapped at the time it is
      enabled or disabled.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      fef5aeee
    • S
      ftrace: Annotate the ops operation on update · e1effa01
      Steven Rostedt (Red Hat) 提交于
      Add three new flags for ftrace_ops:
      
        FTRACE_OPS_FL_ADDING
        FTRACE_OPS_FL_REMOVING
        FTRACE_OPS_FL_MODIFYING
      
      These will be set for the ftrace_ops when they are first added
      to the function tracing, being removed from function tracing
      or just having their functions changed from function tracing,
      respectively.
      
      This will be needed to remove the tramp_hash, which can grow quite
      big. The tramp_hash is used to note what functions a ftrace_ops
      is using a trampoline for. Denoting which ftrace_ops is being
      modified, will allow us to use the ftrace_ops hashes themselves,
      which are much smaller as they have a global flag to denote if
      a ftrace_ops is tracing all functions, as well as a notrace hash
      if the ftrace_ops is tracing all but a few. The tramp_hash just
      creates a hash item for every function, which can go into the 10s
      of thousands if all functions are using the ftrace_ops trampoline.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e1effa01
    • S
      ftrace: Grab any ops for a rec for enabled_functions output · 5fecaa04
      Steven Rostedt (Red Hat) 提交于
      When dumping the enabled_functions, use the first op that is
      found with a trampoline to the record, as there should only be
      one, as only one ops can be registered to a function that has
      a trampoline.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5fecaa04
    • S
      ftrace: Remove freeing of old_hash from ftrace_hash_move() · 3296fc4e
      Steven Rostedt (Red Hat) 提交于
      ftrace_hash_move() currently frees the old hash that is passed to it
      after replacing the pointer with the new hash. Instead of having the
      function do that chore, have the caller perform the free.
      
      This lets the ftrace_hash_move() be used a bit more freely, which
      is needed for changing the way the trampoline logic is done.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3296fc4e