  1. 19 May, 2012 · 1 commit
  2. 17 May, 2012 · 4 commits
    • ring-buffer: Reset head page before running self test · 308f7eeb
      Committed by Steven Rostedt
      When the ring buffer does its consistency test on itself, it
      removes the head page, runs the tests, and then adds it back
      to what the "head_page" pointer was. But because the head_page
      pointer may lag behind the real head page (held by the link
      list pointer), the reset may be incorrect.
      
      Instead, if the head_page exists (it does not on first allocation)
      reset it back to the real head page before running the consistency
      tests. Then it will be put back to its original location after
      the tests are complete.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Add integrity check at end of iter read · 659f451f
      Committed by Steven Rostedt
      There used to be ring buffer integrity checks after updating the
      size of the ring buffer. But now that the ring buffer can modify
      the size while the system is running, the integrity checks were
      removed, as they require the ring buffer to be disabled to perform
      the check.
      
      Move the integrity check to the reading of the ring buffer via the
      iterator reads (the "trace" file). As reading via an iterator requires
      disabling the ring buffer, it is a perfect place to have it.
      
      If the ring buffer happens to be disabled when updating the size,
      we still perform the integrity check.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Make addition of pages in ring buffer atomic · 5040b4b7
      Committed by Vaibhav Nagarnaik
      This patch adds the capability to add new pages to a ring buffer
      atomically while write operations are going on. This makes it possible
      to expand the ring buffer size without reinitializing the ring buffer.
      
      The new pages are attached between the head page and its previous page.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-2-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
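The placement described in the commit above (new pages go between the head page and its previous page) can be sketched on a plain circular doubly linked list. This is only an illustration: `struct page_node` and the `ring_*` helpers are hypothetical names, not the kernel's structures, and the sketch is single-threaded, so it does not show the atomicity against concurrent writers that the patch is actually about.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ring buffer page (not struct buffer_page). */
struct page_node {
    struct page_node *next;
    struct page_node *prev;
    int id;
};

/* Turn a single node into a one-element circular list. */
void ring_init(struct page_node *n, int id)
{
    n->next = n;
    n->prev = n;
    n->id = id;
}

/*
 * Splice the chain first..last (already linked to each other) between
 * head->prev and head. Single-threaded sketch only.
 */
void ring_insert_before_head(struct page_node *head,
                             struct page_node *first,
                             struct page_node *last)
{
    struct page_node *prev;

    assert(head && first && last);
    prev = head->prev;

    first->prev = prev;
    last->next = head;
    prev->next = first;
    head->prev = last;
}

/* Count the pages by walking once around the ring. */
size_t ring_length(const struct page_node *head)
{
    size_t n = 1;
    const struct page_node *p;

    for (p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because the new pages land just before the head page, they are the last pages a writer reaches, which is consistent with growing the buffer without reinitializing it.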
    • ring-buffer: Make removal of ring buffer pages atomic · 83f40318
      Committed by Vaibhav Nagarnaik
      This patch adds the capability to remove pages from a ring buffer
      without destroying any existing data in it.
      
      This is done by removing the pages after the tail page. This makes sure
      that first all the empty pages in the ring buffer are removed. If the
      head page is one in the list of pages to be removed, then the page after
      the removed ones is made the head page. This removes the oldest data
      from the ring buffer and keeps the latest data around to be read.
      
      To do this in a non-racey manner, tracing is stopped for a very short
      time while the pages to be removed are identified and unlinked from the
      ring buffer. The pages are freed after the tracing is restarted to
      minimize the time needed to stop tracing.
      
      The context in which the pages from the per-cpu ring buffer are removed
      runs on the respective CPU. This minimizes the events not traced to only
      NMI trace contexts.
      
      Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Laurent Chavey <chavey@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 24 Apr, 2012 · 1 commit
  4. 23 Feb, 2012 · 1 commit
    • tracing/ring-buffer: Only have tracing_on disable tracing buffers · 499e5470
      Committed by Steven Rostedt
      As the ring-buffer code is being used by other facilities in the
      kernel, having the tracing_on file disable *all* buffers is not a desired
      effect. It should only disable the ftrace buffers that are being used.
      
      Move the code into the trace.c file and use the buffer disabling
      for tracing_on() and tracing_off(). This way only the ftrace buffers
      will be affected by them and other kernel utilities will not be
      confused as to why their output suddenly stopped.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 13 Sep, 2011 · 1 commit
  6. 31 Aug, 2011 · 1 commit
    • trace: Add ring buffer stats to measure rate of events · c64e148a
      Committed by Vaibhav Nagarnaik
      The stats file under per_cpu folder provides the number of entries,
      overruns and other statistics about the CPU ring buffer. However, the
      numbers do not provide any indication of how full the ring buffer is in
      bytes compared to the overall size in bytes. Also, it is helpful to know
      the rate at which the cpu buffer is filling up.
      
      This patch adds an entry "bytes: " in printed stats for per_cpu ring
      buffer which provides the actual bytes consumed in the ring buffer. This
      field includes the number of bytes used by recorded events and the
      padding bytes added when moving the tail pointer to next page.
      
      It also adds the following time stamps:
      "oldest event ts:" - the oldest timestamp in the ring buffer
      "now ts:"  - the timestamp at the time of reading
      
      The field "now ts" provides a consistent time snapshot to the userspace
      when being read. This is read from the same trace clock used by tracing
      event timestamps.
      
      Together, these values provide the rate at which the buffer is filling
      up, from the formula:
      bytes / (now_ts - oldest_event_ts)
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
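A minimal sketch of the rate formula from the commit above, as it might be applied by a userspace tool after parsing the stats file. The helper name `fill_rate_bytes_per_sec` is hypothetical, and the sketch assumes both timestamps have already been converted to seconds; the commit message does not state the unit.

```c
#include <assert.h>

/*
 * Fill rate from the formula in the commit message:
 *   bytes / (now_ts - oldest_event_ts)
 * Assumption: both timestamps are in seconds, so the result is in
 * bytes per second. Returns 0.0 when no time has elapsed.
 */
double fill_rate_bytes_per_sec(unsigned long long bytes,
                               double oldest_event_ts, double now_ts)
{
    double elapsed;

    /* Both values come from the same trace clock, so "now" is not earlier. */
    assert(now_ts >= oldest_event_ts);
    elapsed = now_ts - oldest_event_ts;

    if (elapsed == 0.0)
        return 0.0;
    return (double)bytes / elapsed;
}
```

For example, 4096 bytes recorded between timestamps 10.0 and 12.0 gives 2048 bytes per second.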
  7. 15 Jun, 2011 · 3 commits
  8. 26 May, 2011 · 1 commit
    • ftrace: Add internal recursive checks · b1cff0ad
      Committed by Steven Rostedt
      Witold reported a reboot caused by the selftests of the dynamic function
      tracer. He sent me a config and I used ktest to do a config_bisect on it
      (as my config did not cause the crash). It pointed out that the problem
      config was CONFIG_PROVE_RCU.
      
      What happened was that if multiple callbacks are attached to the
      function tracer, we iterate a list of callbacks. Because the list is
      managed by synchronize_sched() and preempt_disable, the access to the
      pointers uses rcu_dereference_raw().
      
      When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
      debugging functions, which happen to be traced. The tracing of the debug
      function would then call rcu_dereference_raw() which would then call the
      debug function and then... well you get the idea.
      
      I first wrote two different patches to solve this bug.
      
      1) add a __rcu_dereference_raw() that would not do any checks.
      2) add notrace to the offending debug functions.
      
      Both of these patches worked.
      
      Talking with Paul McKenney on IRC, he suggested to add recursion
      detection instead. This seemed to be a better solution, so I decided to
      implement it. As the task_struct already has a trace_recursion to detect
      recursion in the ring buffer, and that has a very small number it
      allows, I decided to use that same variable to add flags that can detect
      the recursion inside the infrastructure of the function tracer.
      
      I plan to change it so that the task struct bit can be checked in
      mcount, but as that requires changes to all archs, I will hold that off
      to the next merge window.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
      Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
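The recursion detection described in the commit above can be sketched as a flag bit that is set on entry and tested before doing any work. Everything here is a simplified userspace model: `TRACE_INTERNAL_BIT`, `tracer_entry` and `traced_debug_helper` are hypothetical names, and a plain static variable stands in for the per-task `trace_recursion` field.

```c
#include <assert.h>

/* Hypothetical flag bit; the kernel keeps such bits in the task struct. */
#define TRACE_INTERNAL_BIT (1u << 0)

/* Stand-in for the per-task trace_recursion field. */
static unsigned int trace_recursion;

int callbacks_run;   /* times the callback body actually ran */
int recursion_hits;  /* nested entries that were rejected */

void traced_debug_helper(void);

/* Models the function-tracer entry point that iterates the callbacks. */
void tracer_entry(void)
{
    if (trace_recursion & TRACE_INTERNAL_BIT) {
        recursion_hits++;            /* already inside the tracer: bail out */
        return;
    }
    trace_recursion |= TRACE_INTERNAL_BIT;

    callbacks_run++;
    traced_debug_helper();           /* a debug function that is itself traced */

    assert(trace_recursion & TRACE_INTERNAL_BIT);
    trace_recursion &= ~TRACE_INTERNAL_BIT;
}

/* Models a traced debug function, such as those called under PROVE_RCU. */
void traced_debug_helper(void)
{
    tracer_entry();
}
```

Without the flag test, `tracer_entry()` and `traced_debug_helper()` would call each other without end, which is the loop the commit message describes.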
  9. 31 Mar, 2011 · 1 commit
  10. 10 Mar, 2011 · 3 commits
  11. 18 Feb, 2011 · 1 commit
  12. 09 Feb, 2011 · 1 commit
    • tracing: Add unstable sched clock note to the warning · 5e38ca8f
      Committed by Jiri Olsa
      The "Delta way too big" warning might appear on a system with an
      unstable sched clock right after the system is resumed and tracing
      was enabled during the suspend.
      
      Since it's not really a bug, and the unstable sched clock is otherwise
      fast and reliable, Steven suggested to keep using the sched clock
      in any case and just to make a note in the warning itself.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1296649698-6003-1-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  13. 19 Jan, 2011 · 1 commit
  14. 24 Dec, 2010 · 1 commit
    • ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page · e1e35927
      Committed by David Sharp
      Fix two related problems in the event-copying loop of
      ring_buffer_read_page.
      
      The loop condition for copying events is off-by-one.
      "len" is the remaining space in the caller-supplied page.
      "size" is the size of the next event (or two events).
      If len == size, then there is just enough space for the next event.
      
      size was set to rb_event_ts_length, which may include the size of two
      events if the first event is a time-extend, in order to assure time-
      extends are kept together with the event after it. However,
      rb_advance_reader always advances by one event. This would result in the
      event after any time-extend being duplicated. Instead, get the size of
      a single event for the memcpy, but use rb_event_ts_length for the loop
      condition.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1293064704-8101-1-git-send-email-dhsharp@google.com>
      LKML-Reference: <AANLkTin7nLrRPc9qGjdjHbeVDDWiJjAiYyb-L=gH85bx@mail.gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
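The two fixes in the commit above can be modeled with a simplified loop: advance by the size of one event, but test the loop condition against the size of a time extend plus the event after it, and accept an exact fit (`>=`). This is a sketch under stated assumptions, not the kernel code: `struct sim_event`, `event_length`, `event_ts_length` and `copy_events` are hypothetical, and the loop only counts events instead of copying bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified event: its size and whether it is a time extend. */
struct sim_event {
    size_t size;
    int is_time_extend;
};

/* Size of one event (models the role of rb_event_length). */
size_t event_length(const struct sim_event *ev)
{
    return ev->size;
}

/*
 * Size of the event, plus the following event when this one is a time
 * extend (models the role of rb_event_ts_length). "remaining" counts the
 * events from ev onward.
 */
size_t event_ts_length(const struct sim_event *ev, size_t remaining)
{
    size_t len = ev->size;

    if (ev->is_time_extend && remaining > 1)
        len += ev[1].size;
    return len;
}

/* Count how many events fit into "len" bytes of a caller-supplied page. */
size_t copy_events(const struct sim_event *evs, size_t count, size_t len)
{
    size_t i = 0;

    if (count == 0 || event_ts_length(&evs[0], count) > len)
        return 0;

    do {
        size_t size = event_length(&evs[i]);  /* one event per step */

        assert(len >= size);   /* guaranteed by the loop condition */
        len -= size;
        i++;                   /* the reader advances by exactly one event */
        if (i >= count)
            break;
        /* ">=": an exact fit is enough; extend and data stay together */
    } while (len >= event_ts_length(&evs[i], count - i));

    return i;
}
```

With events of 8 (time extend), 16 and 24 bytes, a 48-byte page takes all three, while a 47-byte page stops after the first two.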
  15. 21 Oct, 2010 · 5 commits
    • ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE · b8b2663b
      Committed by Steven Rostedt
      With the binding of time extends to events we no longer need to use
      the macro RB_TIMESTAMPS_PER_PAGE. Remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Micro-optimize with some strategic inlining · d9abde21
      Committed by Steven Rostedt
      By using inline and noinline, we are able to make the fast path of
      recording an event 4% faster.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Remove condition to add timestamp in fast path · 140ff891
      Committed by Steven Rostedt
      There's a condition to check if we should add a time extend or
      not in the fast path. But this condition is racey (in the sense
      that we can add an unnecessary time extend, but nothing that
      can break anything). We later check if the time or event time
      delta should be zero or have real data in it (not racey), making
      this first check redundant.
      
      This check may help save space once in a while, but really is
      not worth the hassle to try to save some space that happens at
      most 134 ms at a time.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Bind time extend and data events together · 69d1b839
      Committed by Steven Rostedt
      When the time between two timestamps is greater than
      2^27 nanosecs (~134 ms) a time extend event is added that extends
      the time difference to 59 bits (~18 years). This is due to
      events only having a 27 bit field to store time.
      
      Currently this time extend is a separate event. We add it just before
      the event data that is being written to the buffer. But before
      the event data is committed, the event data can also be discarded (as
      with the case of filters). But because the time extend has already been
      committed, it will stay in the buffer.
      
      If lots of events are being filtered and no event is being
      written, then every 134ms a time extend can be added to the buffer
      without any data attached. To keep from filling the entire buffer
      with time extends, a time extend will never be the first event
      in a page because the page timestamp can be used. Time extends can
      only fill the rest of a page with some data at the beginning.
      
      This patch binds the time extend with the data. The difference here
      is that the time extend is not committed before the data is added.
      Instead, when a time extend is needed, the space reserved on
      the ring buffer is the time extend + the data event size. The
      time extend is added to the first part of the reserved block and
      the data is added to the second. The time extend event is passed
      back to the reserver, but since the reserver also uses a function
      to find the data portion of the reserved block, no changes to the
      ring buffer interface need to be made.
      
      When a commit is discarded, we now remove both the time extend and
      the event. With this approach no more than one time extend can
      be in the buffer in a row. Data must always follow a time extend.
      
      Thanks to Mathieu Desnoyers for suggesting this idea.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
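The numbers in the commit above follow from the field widths: 2^27 ns is 134,217,728 ns (about 134 ms), and 2^59 ns is about 5.76 x 10^8 seconds (roughly 18 years). A minimal sketch of the single reservation the patch describes is shown below. The constant and function names are hypothetical; the 8-byte time extend size is taken from the related commit d0134324 in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DELTA_BITS        27                        /* per-event delta field */
#define DELTA_MAX         ((1ULL << DELTA_BITS) - 1)
#define EXTEND_BITS       59                        /* time extend delta width */
#define TIME_EXTEND_BYTES 8                         /* size of a time extend */

/* A delta that does not fit in 27 bits needs a time extend before the event. */
int needs_time_extend(uint64_t delta_ns)
{
    assert(delta_ns < (1ULL << EXTEND_BITS));  /* must fit in the extend */
    return delta_ns > DELTA_MAX;
}

/* Bytes to reserve as one block: the time extend (if needed) plus the data. */
size_t reserve_length(uint64_t delta_ns, size_t event_bytes)
{
    return event_bytes +
           (needs_time_extend(delta_ns) ? TIME_EXTEND_BYTES : 0);
}
```

Because both parts live in one reservation, discarding the reservation removes the time extend together with the data event.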
    • ring-buffer: Pass delta by value and not by reference · f25106ae
      Committed by Steven Rostedt
      The delta between events is passed to the timestamp code by reference
      and the timestamp code will reset the value. But it can be reset
      from the caller. No need to pass it in by reference.
      
      Changing the call to pass by value lets gcc optimize the code
      a bit more where it can store the delta in a register and not
      worry about updating the reference.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  16. 20 Oct, 2010 · 2 commits
    • ring-buffer: Pass timestamp by value and not by reference · e8bc43e8
      Committed by Steven Rostedt
      The original code for the ring buffer had locations that modified
      the timestamp and that change was used by the callers. Now,
      the timestamp is not reused by the callers and there is no reason
      to pass it by reference.
      
      Changing the call to pass by value lets gcc optimize the code
      a bit more where it can store the timestamp in a register and not
      worry about updating the reference.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Make write slow path out of line · 747e94ae
      Committed by Steven Rostedt
      Gcc inlines the slow path of the ring buffer write which can
      hurt performance. This patch simply forces the slow path function
      rb_move_tail() to always be a function.
      
      The ring_buffer_benchmark module with reader_disabled=1 shows that
      this patch changes the time to record an event from 135 ns to
      132 ns. (3 ns or 2.22% improvement)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
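The technique in the commit above, keeping a rare slow path as a real function so the hot path stays small, can be sketched as follows. This assumes GCC or Clang (`__attribute__((noinline))` and `__builtin_expect` are compiler extensions). `struct mini_buffer`, `reserve_fast` and `move_tail_slow` are hypothetical names, and the tiny page size exists only for the example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 64   /* hypothetical tiny page for the sketch */

struct mini_buffer {
    size_t tail;            /* write offset within the current page */
    unsigned pages_used;    /* pages the writer has moved past */
};

/* Rare path: forced out of line so it does not bloat the fast path. */
__attribute__((noinline))
size_t move_tail_slow(struct mini_buffer *b, size_t len)
{
    b->pages_used++;
    b->tail = len;          /* the event lands at the start of the next page */
    return 0;
}

/* Hot path: reserve len bytes and return the offset of the reservation. */
static inline size_t reserve_fast(struct mini_buffer *b, size_t len)
{
    size_t start = b->tail;

    assert(len <= PAGE_BYTES);
    if (__builtin_expect(start + len > PAGE_BYTES, 0))
        return move_tail_slow(b, len);
    b->tail = start + len;
    return start;
}
```

The improvement quoted in the message is 3 ns out of 135 ns, which is 3 / 135 ≈ 2.22%.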
  17. 15 Oct, 2010 · 1 commit
    • llseek: automatically add .llseek fop · 6038f373
      Committed by Arnd Bergmann
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
  18. 13 Oct, 2010 · 1 commit
    • ring-buffer: Fix typo of time extends per page · d0134324
      Committed by Steven Rostedt
      Time stamps for the ring buffer are created by the difference between
      two events. Each page of the ring buffer holds a full 64 bit timestamp.
      Each event has a 27 bit delta stamp from the last event. The unit of time
      is nanoseconds, so 27 bits can hold ~134 milliseconds. If two events
      happen more than 134 milliseconds apart, a time extend is inserted
      to add more bits for the delta. The time extend has 59 bits, which
      is good for ~18 years.
      
      Currently the time extend is committed separately from the event.
      If an event is discarded before it is committed, due to filtering,
      the time extend still exists. If all events are being filtered, then
      after ~134 milliseconds a new time extend will be added to the buffer.
      
      This can only happen till the end of the page. Since each page holds
      a full timestamp, there is no reason to add a time extend to the
      beginning of a page. Time extends can only fill a page that has actual
      data at the beginning, so there is no fear that time extends will fill
      more than a page without any data.
      
      When reading an event, a loop is made to skip over time extends
      since they are only used to maintain the time stamp and are never
      given to the caller. As a paranoid check to prevent the loop running
      forever, with the knowledge that time extends may only fill a page,
      a check is made that tests the iteration of the loop, and if the
      iteration is more than the number of time extends that can fit in a page
      a warning is printed and the ring buffer is disabled (all of ftrace
      is also disabled with it).
      
      There is another event type that is called a TIMESTAMP which can
      hold 64 bits of data in the theoretical case that two events happen
      18 years apart. This code has not been implemented, but the name
      of this event exists, as well as the structure for it. The
      size of a TIMESTAMP is 16 bytes, whereas a time extend is only
      8 bytes. The macro used to calculate how many time extends can fit on
      a page used the TIMESTAMP size instead of the time extend size
      cutting the amount in half.
      
      The following test case can easily trigger the warning since we only
      need to have half the page filled with time extends to trigger the
      warning:
      
       # cd /sys/kernel/debug/tracing/
       # echo function > current_tracer
       # echo 'common_pid < 0' > events/ftrace/function/filter
       # echo > trace
       # echo 1 > trace_marker
       # sleep 120
       # cat trace
      
      This enables the function tracer and sets the filter to only trace
      functions where the process id is negative (no events). It then clears
      the trace buffer to ensure that we have nothing in the buffer,
      writes to trace_marker to add an event to the beginning of a page,
      sleeps for 2 minutes (only 35 seconds is probably needed, but this
      guarantees the bug), and finally reads the trace, which will
      trigger the bug.
      
      This patch fixes the typo and prevents the false positive of that warning.
      Reported-by: Hans J. Koch <hjk@linutronix.de>
      Tested-by: Hans J. Koch <hjk@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stable Kernel <stable@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
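The "only 35 seconds" estimate in the commit above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and ignores the page header, neither of which is stated in the message; the helper names are hypothetical. With the mistaken 16-byte size the limit is 256 records instead of 512, and 256 time extends at one per 2^27 ns take about 34.4 seconds.

```c
#include <assert.h>

#define TIME_EXTEND_BYTES 8     /* size of a time extend, per the message */
#define TIMESTAMP_BYTES   16    /* size of a TIMESTAMP, per the message */

/* How many fixed-size records fit in page_bytes of page data. */
unsigned records_per_page(unsigned page_bytes, unsigned record_bytes)
{
    assert(record_bytes != 0);
    return page_bytes / record_bytes;
}

/* Seconds needed to add n time extends when one is added every 2^27 ns. */
double seconds_for_extends(unsigned n)
{
    return n * (double)(1u << 27) / 1e9;
}
```

Using the time extend size doubles the limit, so a page legitimately filled with time extends no longer trips the warning.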
  19. 05 Sep, 2010 · 1 commit
  20. 02 Sep, 2010 · 1 commit
    • ring-buffer: Place duplicate expression into a single function · f6195aa0
      Committed by Steven Rostedt
      While discussing the strictness of the 80 character limit on the
      Kernel Summit Discussion mailing list, I showed examples that I
      broke that limit slightly with some algorithms. In discussing with
      John Linville, what looked better, I realized that two of the
      80 char breaking culprits were an identical expression.
      
      As a clean up, this patch moves the identical expression into its
      own helper function and that is used instead. As a side effect,
      the offending code is now under the 80 character limit. :-)
      
      This clean up code also changes the expression from
      
      	(A - B) - C  to  A - (B + C)
      
      This makes the code look a little nicer too.
      
      Cc: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  21. 07 Aug, 2010 · 1 commit
    • tracing: Fix ring_buffer_read_page reading out of page boundary · 18fab912
      Committed by Huang Ying
      With the configuration: CONFIG_DEBUG_PAGEALLOC=y and Shaohua's patch:
      
      [PATCH]x86: make spurious_fault check correct pte bit
      
      Function call graph trace with the following will trigger a page fault.
      
      # cd /sys/kernel/debug/tracing/
      # echo function_graph > current_tracer
      # cat per_cpu/cpu1/trace_pipe_raw > /dev/null
      
      BUG: unable to handle kernel paging request at ffff880006e99000
      IP: [<ffffffff81085572>] rb_event_length+0x1/0x3f
      PGD 1b19063 PUD 1b1d063 PMD 3f067 PTE 6e99160
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/devices/virtual/net/lo/operstate
      CPU 1
      Modules linked in:
      
      Pid: 1982, comm: cat Not tainted 2.6.35-rc6-aes+ #300 /Bochs
      RIP: 0010:[<ffffffff81085572>]  [<ffffffff81085572>] rb_event_length+0x1/0x3f
      RSP: 0018:ffff880006475e38  EFLAGS: 00010006
      RAX: 0000000000000ff0 RBX: ffff88000786c630 RCX: 000000000000001d
      RDX: ffff880006e98000 RSI: 0000000000000ff0 RDI: ffff880006e99000
      RBP: ffff880006475eb8 R08: 000000145d7008bd R09: 0000000000000000
      R10: 0000000000008000 R11: ffffffff815d9336 R12: ffff880006d08000
      R13: ffff880006e605d8 R14: 0000000000000000 R15: 0000000000000018
      FS:  00007f2b83e456f0(0000) GS:ffff880002100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffff880006e99000 CR3: 00000000064a8000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process cat (pid: 1982, threadinfo ffff880006474000, task ffff880006e40770)
      Stack:
       ffff880006475eb8 ffffffff8108730f 0000000000000ff0 000000145d7008bd
      <0> ffff880006e98010 ffff880006d08010 0000000000000296 ffff88000786c640
      <0> ffffffff81002956 0000000000000000 ffff8800071f4680 ffff8800071f4680
      Call Trace:
       [<ffffffff8108730f>] ? ring_buffer_read_page+0x15a/0x24a
       [<ffffffff81002956>] ? return_to_handler+0x15/0x2f
       [<ffffffff8108a575>] tracing_buffers_read+0xb9/0x164
       [<ffffffff810debfe>] vfs_read+0xaf/0x150
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248b0>] __bad_area_nosemaphore+0x17e/0x1a1
       [<ffffffff81002941>] return_to_handler+0x0/0x2f
       [<ffffffff810248e6>] bad_area_nosemaphore+0x13/0x15
      Code: 80 25 b2 16 b3 00 fe c9 c3 55 48 89 e5 f0 80 0d a4 16 b3 00 02 c9 c3 55 31 c0 48 89 e5 48 83 3d 94 16 b3 00 01 c9 0f 94 c0 c3 55 <8a> 0f 48 89 e5 83 e1 1f b8 08 00 00 00 0f b6 d1 83 fa 1e 74 27
      RIP  [<ffffffff81085572>] rb_event_length+0x1/0x3f
       RSP <ffff880006475e38>
      CR2: ffff880006e99000
      ---[ end trace a6877bb92ccb36bb ]---
      
      The root cause is that ring_buffer_read_page() may read past the page
      boundary, because the boundary check is done after the read. The fix
      is to do the boundary check before reading.
      Reported-by: Shaohua Li <shaohua.li@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1280297641.2771.307.camel@yhuang-dev>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      18fab912
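The ordering described in 18fab912 (check the boundary first, then read) can be sketched as follows. This is not the kernel code: `toy_event_length()` and `count_events()` are hypothetical names, and the one-byte length header is a simplifying assumption about the event layout. A loop that reads the next header first and checks afterwards can touch the byte just past the page when the data ends exactly at the page boundary, which would be consistent with the page-aligned faulting address in the oops above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: each event starts with one header byte holding
 * its total length in bytes (header included). Not the kernel format. */
static size_t toy_event_length(const uint8_t *event)
{
	return event[0];
}

/*
 * Count the complete events stored in the first `commit` bytes of `page`.
 * The bounds check sits in the loop condition, so the header at `pos` is
 * only dereferenced after `pos` is known to be inside the committed data.
 */
static size_t count_events(const uint8_t *page, size_t commit)
{
	size_t pos = 0;
	size_t events = 0;

	while (pos < commit) {
		size_t len = toy_event_length(page + pos);

		assert(len > 0);	/* a zero length would loop forever */
		if (len > commit - pos)
			break;		/* truncated event: stop, do not read on */
		pos += len;
		events++;
	}
	return events;
}
```

When the committed data ends exactly at the end of the buffer, the loop exits on `pos < commit` without ever forming a read at `page + commit`.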
  22. 21 Jul, 2010 1 commit
  23. 04 Jun, 2010 1 commit
    • S
      tracing: Remove ftrace_preempt_disable/enable · 5168ae50
      Steven Rostedt committed
      The ftrace_preempt_disable/enable functions were to address a
      recursive race caused by the function tracer. The function tracer
      traces all functions which makes it easily susceptible to recursion.
      One area was preempt_enable(). This would call the scheduler and
      the scheduler would call the function tracer and loop.
      (Or so it was thought.)
      
      The ftrace_preempt_disable/enable was made to protect against recursion
      inside the scheduler by storing the NEED_RESCHED flag. If it was
      set before the ftrace_preempt_disable() it would not call schedule
      on ftrace_preempt_enable(), thinking that if it was set before then
      it would have already scheduled unless it was already in the scheduler.
      
      This worked fine except in the case of SMP, where another task would set
      the NEED_RESCHED flag for a task on another CPU, and then kick off an
      IPI to trigger it. This could cause the NEED_RESCHED to be saved at
      ftrace_preempt_disable() but the IPI to arrive in the preempt
      disabled section. The ftrace_preempt_enable() would not call the scheduler
      because the flag was already set before entering the section.
      
      This bug would cause a missed preemption check and cause higher latencies.
      
      Investigating further, I found that the recursion caused by the function
      tracer was not due to schedule(), but due to preempt_schedule(). Now
      that preempt_schedule() is completely annotated with notrace, the
      recursion is no longer an issue.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5168ae50
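The SMP race described in 5168ae50 can be replayed in a small single-threaded model. Everything here is a hypothetical stand-in (`toy_task`, `toy_saved_disable()`, `toy_saved_enable()`, and so on); it only mimics the logic in the commit message: the old scheme saves the NEED_RESCHED state on entry and skips the reschedule check on exit if the flag was already set.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_task {
	bool need_resched;	/* set by another CPU that wants us to reschedule */
	int preempt_count;	/* > 0 means preemption is disabled */
	int schedules;		/* how many times the scheduler ran */
};

static void toy_schedule(struct toy_task *t)
{
	t->need_resched = false;
	t->schedules++;
}

/* Reschedule IPI: it can only preempt when preemption is enabled. */
static void toy_resched_ipi(struct toy_task *t)
{
	if (t->preempt_count == 0 && t->need_resched)
		toy_schedule(t);
}

/* Old scheme: remember whether the flag was already set on entry. */
static bool toy_saved_disable(struct toy_task *t)
{
	bool was_set = t->need_resched;

	t->preempt_count++;
	return was_set;
}

/* Old scheme: skip the reschedule check if the flag was set on entry. */
static void toy_saved_enable(struct toy_task *t, bool was_set)
{
	t->preempt_count--;
	if (!was_set && t->preempt_count == 0 && t->need_resched)
		toy_schedule(t);
}

/* Plain scheme: always check the flag when preemption is re-enabled. */
static void toy_plain_enable(struct toy_task *t)
{
	t->preempt_count--;
	if (t->preempt_count == 0 && t->need_resched)
		toy_schedule(t);
}

/* Replay the race; returns how many times the task was rescheduled. */
static int toy_replay_race(bool use_saved_flag)
{
	struct toy_task t = { false, 0, 0 };
	bool was_set;

	t.need_resched = true;			/* remote CPU sets the flag ... */
	was_set = toy_saved_disable(&t);	/* ... before we enter the section */
	toy_resched_ipi(&t);			/* IPI lands while preemption is off */

	if (use_saved_flag)
		toy_saved_enable(&t, was_set);
	else
		toy_plain_enable(&t);
	return t.schedules;
}
```

In this model `toy_replay_race(true)` returns 0 (the preemption is missed), while `toy_replay_race(false)` returns 1.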
  24. 25 May, 2010 2 commits
    • S
      ring-buffer: Move zeroing out excess in page to ring buffer code · 2711ca23
      Steven Rostedt committed
      Currently the trace splice code zeros out the excess bytes in the page before
      sending it off to userspace.
      
      This is to make sure userspace is not getting anything it should not be
      when reading the pages, because the excess data was never initialized
      to zero before writing (for performance reasons).
      
      But the splice code has no business doing this work; it should be
      done by the ring buffer. With the latest changes for recording lost
      events, the splice code gets it wrong anyway.
      
      Move the zeroing out of excess bytes into the ring buffer code.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2711ca23
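The zeroing step that 2711ca23 moves into the ring buffer code amounts to clearing everything after the valid data before the page is handed out. A minimal sketch, assuming a hypothetical fixed page size `TOY_PAGE_SIZE` and a hypothetical helper `zero_excess()`; the real code operates on the ring buffer's own page structures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 64	/* assumed size, for illustration only */

/*
 * Clear every byte after the `commit` valid bytes so that stale,
 * never-initialized data does not leave with the page.
 */
static void zero_excess(unsigned char *page, size_t commit)
{
	assert(commit <= TOY_PAGE_SIZE);
	memset(page + commit, 0, TOY_PAGE_SIZE - commit);
}
```

Doing this where the page is filled in keeps the knowledge of where the valid data ends in one place, which is the argument the commit makes for moving it out of the splice code.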
    • S
      ring-buffer: Reset "real_end" when page is filled · b3230c8b
      Steven Rostedt committed
      The code to store the "lost events" requires knowing the real end
      of the page. Since the 'commit' includes the padding at the end of
      a page, a "real_end" variable was used to keep track of the end not
      including the padding.
      
      If events were lost, the reader can place the count of events in
      the padded area if there is enough room.
      
      The bug this patch fixes is that when we fill the page we do not
      reset the real_end variable, and if the writer had wrapped a few
      times, the real_end would be incorrect.
      
      This patch simply resets the real_end if the page was filled.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b3230c8b
  25. 05 May, 2010 1 commit
  26. 28 Apr, 2010 1 commit
    • D
      ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd
      David Miller committed
      When performing a non-consuming read, a synchronize_sched() is
      performed once for every cpu which is actively tracing.
      
      This is very expensive, and can make it take several seconds to open
      up the 'trace' file with lots of cpus.
      
      Only one synchronize_sched() call is actually necessary.  What is
      desired is for all cpus to see the disabling state change.  So we
      transform the existing sequence:
      
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      where each ring_buffer_read_start() call performs a synchronize_sched(),
      into the following:
      
      	for_each_cpu() {
      		ring_buffer_read_prepare();
      	}
      	ring_buffer_read_prepare_sync();
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      wherein only the single ring_buffer_read_prepare_sync() call needs to
      do the synchronize_sched().
      
      The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
      memory and increments ->record_disabled.
      
      In the second phase, ring_buffer_read_prepare_sync() makes sure this
      ->record_disabled state is visible fully to all cpus.
      
      And in the final third phase, the ring_buffer_read_start() calls reset
      the 'iter' objects allocated in the first phase since we now know that
      none of the cpus are adding trace entries any more.
      
      This makes opening the 'trace' file nearly instantaneous on a
      sparc64 Niagara2 box with 128 cpus tracing.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      72c9ddfd
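The three-phase sequence from 72c9ddfd can be modelled to show why the cost drops from one wait per CPU to a single wait. This is a toy model under stated assumptions: the `toy_` functions only mirror the roles of ring_buffer_read_prepare(), ring_buffer_read_prepare_sync(), and ring_buffer_read_start(), and `toy_synchronize()` merely counts calls instead of waiting like synchronize_sched().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4	/* assumed CPU count for the model */

struct toy_cpu_buffer {
	int record_disabled;	/* > 0 means writers must not record */
	bool iter_ready;	/* iterator has been reset and can be used */
};

static int toy_sync_calls;	/* counts the expensive waits */

/* Stand-in for synchronize_sched(): here it only counts invocations. */
static void toy_synchronize(void)
{
	toy_sync_calls++;
}

/* Phase 1: per CPU, mark recording disabled. No waiting here. */
static void toy_read_prepare(struct toy_cpu_buffer *b)
{
	b->record_disabled++;
}

/* Phase 2: one wait covers every buffer prepared in phase 1. */
static void toy_read_prepare_sync(void)
{
	toy_synchronize();
}

/* Phase 3: per CPU, the iterator can now be reset safely. */
static void toy_read_start(struct toy_cpu_buffer *b)
{
	assert(b->record_disabled > 0);
	b->iter_ready = true;
}

/* Open iterators on all CPUs; returns how many waits were needed. */
static int toy_open_all(struct toy_cpu_buffer *bufs)
{
	int before = toy_sync_calls;
	int cpu;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_read_prepare(&bufs[cpu]);
	toy_read_prepare_sync();
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		toy_read_start(&bufs[cpu]);
	return toy_sync_calls - before;
}
```

`toy_open_all()` returns 1 regardless of `TOY_NR_CPUS`, whereas a loop that waited inside each start call would wait once per CPU.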
  27. 01 Apr, 2010 1 commit
    • S
      ring-buffer: Add lost event count to end of sub buffer · ff0ff84a
      Steven Rostedt committed
      Currently, binary readers of the ring buffer only know where events were
      lost, but not how many events were lost at that location.
      This information is available, but it would require adding another
      field to the sub buffer header to include it.
      
      But when an event cannot fit at the end of a sub buffer, it is written
      to the next sub buffer. This means there is a good chance that the
      buffer may have room to hold this counter. If it does, write
      the counter at the end of the sub buffer and set another flag
      in the data size field that states that this information exists.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      ff0ff84a
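The idea in ff0ff84a, storing the lost-event count in the unused tail of a sub buffer and flagging its presence in the size field, can be sketched like this. The structure, the flag values, and the 32-bit counter are assumptions made for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SUBBUF_SIZE   64u		/* assumed data area size */
#define TOY_MISSED_EVENTS (1u << 31)	/* assumed flag: events were lost */
#define TOY_MISSED_STORED (1u << 30)	/* assumed flag: count is stored */

struct toy_subbuf {
	uint32_t size_and_flags;	/* low bits: bytes used; high bits: flags */
	unsigned char data[TOY_SUBBUF_SIZE];
};

/*
 * Record that `lost` events were dropped before this sub buffer.
 * The count is written right after the `used` bytes of event data,
 * but only if the leftover space can hold it. Returns true when the
 * count was stored.
 */
static bool toy_record_lost(struct toy_subbuf *sb, uint32_t used, uint32_t lost)
{
	assert(used <= TOY_SUBBUF_SIZE);
	sb->size_and_flags = used;
	if (lost == 0)
		return false;

	sb->size_and_flags |= TOY_MISSED_EVENTS;
	if (TOY_SUBBUF_SIZE - used < sizeof(lost))
		return false;	/* no room: reader only learns that events were lost */

	memcpy(sb->data + used, &lost, sizeof(lost));
	sb->size_and_flags |= TOY_MISSED_STORED;
	return true;
}
```

A reader would check the "stored" flag before looking past the used bytes; without it, only the fact that events were lost is known, not how many.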