1. 20 6月, 2016 7 次提交
  2. 08 6月, 2016 1 次提交
  3. 21 5月, 2016 1 次提交
    • S
      ftrace: Don't disable irqs when taking the tasklist_lock read_lock · 6112a300
      Soumya PN 提交于
      In ftrace.c inside the function alloc_retstack_tasklist() (which will be
      invoked when function_graph tracing is on) the tasklist_lock is being
      held as reader while iterating through a list of threads. Here the lock
      is being held as reader with irqs disabled. The tasklist_lock is never
      write_locked in interrupt context so it is safe to not disable interrupts
      for the duration of read_lock in this block which, can be significant,
      given the block of code iterates through all threads. Hence changing the
      code to call read_lock() and read_unlock() instead of read_lock_irqsave()
      and read_unlock_irqrestore().
      
      A similar change was made in commits: 8063e41d ("tracing: Change
      syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")'
      and 3472eaa1 ("sched: normalize_rt_tasks(): Don't use _irqsave for
      tasklist_lock, use task_rq_lock()")'
      
      Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.comSigned-off-by: NSoumya PN <soumya.p.n@hpe.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6112a300
  4. 14 5月, 2016 1 次提交
    • S
      ring-buffer: Prevent overflow of size in ring_buffer_resize() · 59643d15
      Steven Rostedt (Red Hat) 提交于
      If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
      then the DIV_ROUND_UP() will return zero.
      
      Here's the details:
      
        # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
      
      tracing_entries_write() processes this and converts kb to bytes.
      
       18014398509481980 << 10 = 18446744073709547520
      
      and this is passed to ring_buffer_resize() as unsigned long size.
      
       size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
      
      Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
      
      BUF_PAGE_SIZE is 4080 and here
      
       18446744073709547520 + 4080 - 1 = 18446744073709551599
      
      where 18446744073709551599 is still smaller than 2^64
      
       2^64 - 18446744073709551599 = 17
      
      But now 18446744073709551599 / 4080 = 4521260802379792
      
      and size = size * 4080 = 18446744073709551360
      
      This is checked to make sure its still greater than 2 * 4080,
      which it is.
      
      Then we convert to the number of buffer pages needed.
      
       nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
      
      but this time size is 18446744073709551360 and
      
       2^64 - (18446744073709551360 + 4080 - 1) = -3823
      
      Thus it overflows and the resulting number is less than 4080, which makes
      
        3823 / 4080 = 0
      
      an nr_pages is set to this. As we already checked against the minimum that
      nr_pages may be, this causes the logic to fail as well, and we crash the
      kernel.
      
      There's no reason to have the two DIV_ROUND_UP() (that's just result of
      historical code changes), clean up the code and fix this bug.
      
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      59643d15
  5. 13 5月, 2016 1 次提交
    • S
      ring-buffer: Use long for nr_pages to avoid overflow failures · 9b94a8fb
      Steven Rostedt (Red Hat) 提交于
      The size variable to change the ring buffer in ftrace is a long. The
      nr_pages used to update the ring buffer based on the size is int. On 64 bit
      machines this can cause an overflow problem.
      
      For example, the following will cause the ring buffer to crash:
      
       # cd /sys/kernel/debug/tracing
       # echo 10 > buffer_size_kb
       # echo 8556384240 > buffer_size_kb
      
      Then you get the warning of:
      
       WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
      
      Which is:
      
        RB_WARN_ON(cpu_buffer, nr_removed);
      
      Note each ring buffer page holds 4080 bytes.
      
      This is because:
      
       1) 10 causes the ring buffer to have 3 pages.
          (10kb requires 3 * 4080 pages to hold)
      
       2) (2^31 / 2^10  + 1) * 4080 = 8556384240
          The value written into buffer_size_kb is shifted by 10 and then passed
          to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
      
       3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
          which is 4080. 8761737461760 / 4080 = 2147484672
      
       4) nr_pages is subtracted from the current nr_pages (3) and we get:
          2147484669. This value is saved in a signed integer nr_pages_to_update
      
       5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
          turns into the value of -2147482627
      
       6) As the value is a negative number, in update_pages_handler() it is
          negated and passed to rb_remove_pages() and 2147482627 pages will
          be removed, which is much larger than 3 and it causes the warning
          because not all the pages asked to be removed were removed.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
      
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: 7a8e76a3 ("tracing: unified trace buffer")
      Reported-by: NHao Qin <QEver.cn@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9b94a8fb
  6. 10 5月, 2016 2 次提交
  7. 04 5月, 2016 2 次提交
    • S
      tracing: Use temp buffer when filtering events · 0fc1b09f
      Steven Rostedt (Red Hat) 提交于
      Filtering of events requires the data to be written to the ring buffer
      before it can be decided to filter or not. This is because the parameters of
      the filter are based on the result that is written to the ring buffer and
      not on the parameters that are passed into the trace functions.
      
      The ftrace ring buffer is optimized for writing into the ring buffer and
      committing. The discard procedure used when filtering decides the event
      should be discarded is much more heavy weight. Thus, using a temporary
      filter when filtering events can speed things up drastically.
      
      Without a temp buffer we have:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.790706626 seconds time elapsed ( +-  0.71% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.566904059 seconds time elapsed ( +-  0.27% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.690598511 seconds time elapsed ( +-  0.19% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.707486364 seconds time elapsed ( +-  0.30% )
      
      The first run above is without any tracing, just to get a based figure.
      hackbench takes ~0.79 seconds to run on the system.
      
      The second run enables tracing all events where nothing is filtered. This
      increases the time by 100% and hackbench takes 1.57 seconds to run.
      
      The third run filters all events where the preempt count will equal "20"
      (this should never happen) thus all events are discarded. This takes 1.69
      seconds to run. This is 10% slower than just committing the events!
      
      The last run enables all events and filters where the filter will commit all
      events, and this takes 1.70 seconds to run. The filtering overhead is
      approximately 10%. Thus, the discard and commit of an event from the ring
      buffer may be about the same time.
      
      With this patch, the numbers change:
      
       # trace-cmd start -p nop
       # perf stat -r 10 hackbench 50
             0.778233033 seconds time elapsed ( +-  0.38% )
      
       # trace-cmd start -e all
       # perf stat -r 10 hackbench 50
             1.582102692 seconds time elapsed ( +-  0.28% )
      
       # trace-cmd start -e all -f 'common_preempt_count==20'
       # perf stat -r 10 hackbench 50
             1.309230710 seconds time elapsed ( +-  0.22% )
      
       # trace-cmd start -e all -f 'common_preempt_count!=20'
       # perf stat -r 10 hackbench 50
             1.786001924 seconds time elapsed ( +-  0.20% )
      
      The first run is again the base with no tracing.
      
      The second run is all tracing with no filtering. It is a little slower, but
      that may be well within the noise.
      
      The third run shows that discarding all events only took 1.3 seconds. This
      is a speed up of 23%! The discard is much faster than even the commit.
      
      The one downside is shown in the last run. Events that are not discarded by
      the filter will take longer to add, this is due to the extra copy of the
      event.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0fc1b09f
    • C
      tracing: Don't display trigger file for events that can't be enabled · 854145e0
      Chunyu Hu 提交于
      Currently register functions for events will be called
      through the 'reg' field of event class directly without
      any check when seting up triggers.
      
      Triggers for events that don't support register through
      debug fs (events under events/ftrace are for trace-cmd to
      read event format, and most of them don't have a register
      function except events/ftrace/functionx) can't be enabled
      at all, and an oops will be hit when setting up trigger
      for those events, so just not creating them is an easy way
      to avoid the oops.
      
      Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com
      
      Cc: stable@vger.kernel.org # 3.14+
      Fixes: 85f2b082 ("tracing: Add basic event trigger framework")
      Signed-off-by: NChunyu Hu <chuhu@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      854145e0
  8. 03 5月, 2016 1 次提交
  9. 30 4月, 2016 6 次提交
  10. 27 4月, 2016 4 次提交
  11. 26 4月, 2016 4 次提交
  12. 20 4月, 2016 10 次提交