1. 24 Nov 2015 (2 commits)
    • ring-buffer: Put back the length if crossed page with add_timestamp · bd1b7cd3
      Committed by Steven Rostedt (Red Hat)
      Commit fcc742ea "ring-buffer: Add event descriptor to simplify passing
      data" added a descriptor that holds various data instead of passing around
      several variables through parameters. The problem was that one of the
      parameters was modified in a function and the code was designed not to have
an effect on that modified parameter. Now that the parameter is a
      descriptor and any modifications to it are non-volatile, the size of the
      data could be unnecessarily expanded.
      
      Remove the extra space added if a timestamp was added and the event went
      across the page.
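      
      A minimal sketch of the kind of fix described, assuming the
      rb_event_info descriptor fields (add_timestamp, length) and the
      RB_LEN_TIME_EXTEND constant from the ring-buffer code; this is not
      the verbatim patch:
      
        /* On the slow path, when the event went across the page, give
         * back the bytes reserved for the time extend so the recorded
         * event length is not left unnecessarily expanded.
         */
        if (info->add_timestamp)
                info->length -= RB_LEN_TIME_EXTEND;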
      
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: fcc742ea "ring-buffer: Add event descriptor to simplify passing data"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Update read stamp with first real commit on page · b81f472a
      Committed by Steven Rostedt (Red Hat)
      Do not update the read stamp after swapping out the reader page from the
      write buffer. If the reader page is swapped out of the buffer before an
      event is written to it, then the read_stamp may get an out of date
      timestamp, as the page timestamp is updated on the first commit to that
      page.
      
      rb_get_reader_page() only returns a page if it has an event on it;
      otherwise it returns NULL. At that point, check whether the page being
      returned has events and has not been read yet, and if so update the
      read_stamp to match the time stamp of the reader page.
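      
      A hedged sketch of the described check in rb_get_reader_page(),
      assuming the reader page tracks its read offset in reader->read and
      the page's first-commit time in reader->page->time_stamp:
      
        /* Only take the page timestamp once the page actually has
         * events and nothing on it has been consumed yet. */
        if (reader && reader->read == 0)
                cpu_buffer->read_stamp = reader->page->time_stamp;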
      
      Cc: stable@vger.kernel.org # 2.6.30+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 11 Nov 2015 (1 commit)
    • bpf_trace: Make dependent on PERF_EVENTS · a31d82d8
      Committed by Steven Rostedt
      Arnd Bergmann reported:
      
        In my ARM randconfig tests, I'm getting a build error for
        newly added code in bpf_perf_event_read and bpf_perf_event_output
        whenever CONFIG_PERF_EVENTS is disabled:
      
        kernel/trace/bpf_trace.c: In function 'bpf_perf_event_read':
        kernel/trace/bpf_trace.c:203:11: error: 'struct perf_event' has no member named 'oncpu'
        if (event->oncpu != smp_processor_id() ||
                 ^
        kernel/trace/bpf_trace.c:204:11: error: 'struct perf_event' has no member named 'pmu'
              event->pmu->count)
      
        This can happen when UPROBE_EVENT is enabled but KPROBE_EVENT
        is disabled. I'm not sure if that is a configuration we care
  about, otherwise we could prevent this case from occurring by
        adding Kconfig dependencies.
      
      Looking at this further, it's really that UPROBE_EVENT enables PERF_EVENTS.
      By just having BPF_EVENTS depend on PERF_EVENTS, all is fine.
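      
      The resulting dependency in kernel/trace/Kconfig would look something
      like the following (a sketch of the change the commit describes, not
      necessarily the exact hunk):
      
        config BPF_EVENTS
                bool
                default y
                depends on BPF_SYSCALL
                depends on (KPROBE_EVENT || UPROBE_EVENT) && PERF_EVENTS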
      
      Link: http://lkml.kernel.org/r/4525348.Aq9YoXkChv@wuerfel
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 10 Nov 2015 (1 commit)
  4. 08 Nov 2015 (1 commit)
  5. 06 Nov 2015 (1 commit)
  6. 04 Nov 2015 (9 commits)
    • tracing: Put back comma for empty fields in boot string parsing · 43ed3843
      Committed by Steven Rostedt (Red Hat)
      Both early_enable_events() and apply_trace_boot_options() parse a boot
      string that may get parsed later on. They both use strsep() which converts a
      comma into a nul character. To still allow the boot string to be parsed
      again the same way, the nul character gets converted back to a comma after
      the token is processed.
      
      The problem is that these two functions check for an empty parameter
      (two commas in a row ",,") and continue the loop if the parameter is
      empty, but fail to place the comma back. In this case, the second
      parsing will end at this blank field and not process the fields
      afterward.
      
      In most cases users should not have an empty field, but if the field
      is going to be checked, the code might as well be correct.
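      
      A small self-contained user-space sketch of the parse-and-restore
      pattern, including the empty-field case the fix covers (illustrative
      only; the kernel code operates on the boot string):
      
        #define _GNU_SOURCE
        #include <stdio.h>
        #include <string.h>
      
        int main(void)
        {
                char buf[] = "traceoff,,userstacktrace";
                char *str = buf;
                char *tok;
      
                while ((tok = strsep(&str, ","))) {
                        if (*tok == '\0') {
                                /* empty field: the comma must still be
                                 * put back, or a later re-parse of buf
                                 * stops at this point */
                                if (str)
                                        *(str - 1) = ',';
                                continue;
                        }
                        printf("option: %s\n", tok);
                        if (str)
                                *(str - 1) = ',';  /* restore for re-parse */
                }
                printf("buffer intact again: %s\n", buf);
                return 0;
        }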
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Apply tracer specific options from kernel command line. · a4d1e688
      Committed by Jiaxing Wang
      Currently, the trace_options parameter is only applied in
      tracer_alloc_buffers() when global_trace.current_trace is nop_trace,
      so a tracer-specific option will not be applied even when the specific
      tracer is also enabled from the kernel command line. For example, the
      'func_stack_trace' option can't be enabled with the following kernel
      parameters:
      
        ftrace=function ftrace_filter=kfree trace_options=func_stack_trace
      
      We can enable tracer-specific options by simply applying the options
      again if the specific tracer is also supplied on the command line and
      started in register_tracer().
      
      To allow trace_boot_options_buf to be parsed again, a comma and a
      space are put back if they were replaced by strsep() and strstrip()
      respectively.
      
      Also make register_tracer() __init so it can access __init data;
      register_tracer() is in fact only called from __init code.
      
      Link: http://lkml.kernel.org/r/1446599669-9294-1-git-send-email-hello.wjx@gmail.com
      Signed-off-by: Jiaxing Wang <hello.wjx@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark · 54ed1444
      Committed by Steven Rostedt (Red Hat)
      wake_up_process() has a memory barrier before doing anything, thus adding a
      memory barrier before calling it is redundant.
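      
      Sketched, the pattern being removed (the commit relies on the barrier
      that wake_up_process() itself performs):
      
        smp_wmb();                  /* redundant: removed by this commit */
        wake_up_process(consumer);  /* already has a memory barrier */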
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Allow dumping traces without tracking trace started cpus · 919cd979
      Committed by Sasha Levin
      We don't init iter->started when dumping the ftrace buffer, and there's
      no real need to do so; allow skipping that check if the iter doesn't
      have an initialized ->started cpumask.
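      
      A hedged sketch of the resulting guard, assuming the iterator carries
      the optional cpumask in iter->started; annotate_cpu_start() is a
      hypothetical stand-in for the bookkeeping being skipped:
      
        /* Skip the started-cpu annotation when the dump path never
         * initialized the cpumask. */
        if (iter->started && !cpumask_test_cpu(iter->cpu, iter->started)) {
                annotate_cpu_start(iter);       /* hypothetical helper */
                cpumask_set_cpu(iter->cpu, iter->started);
        }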
      
      Link: http://lkml.kernel.org/r/1441385156-27279-1-git-send-email-sasha.levin@oracle.com
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring_buffer: Fix more races when terminating the producer in the benchmark · f47cb66d
      Committed by Petr Mladek
      The commit b44754d8 ("ring_buffer: Allow to exit the ring
      buffer benchmark immediately") added a hack into ring_buffer_producer()
      that set @kill_test when kthread_should_stop() returned true. It
      improved the situation a lot: it stopped the kthread in most cases
      because the producer spent most of its time in the patched while loop.
      
      But there are still a few possible races when kthread_should_stop()
      becomes true outside of that loop. Then we do not set @kill_test and
      some other checks pass.
      
      This patch adds a better fix. It renames @test_kill/TEST_KILL() to
      the more descriptive @test_error/TEST_ERROR(). It also introduces a
      break_test() function that checks both @test_error and
      kthread_should_stop().
      
      The new function is used in the producer when the check for @test_error
      is not enough. It is not used in the consumer because its state
      is manipulated by the producer via the "reader_finish" variable.
      
      Also we add a missing check into ring_buffer_producer_thread()
      between setting TASK_INTERRUPTIBLE and calling schedule_timeout().
      Otherwise, we might miss a wakeup from kthread_stop().
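      
      Presumably along these lines (a sketch that matches the description;
      the sleep interval is illustrative):
      
        static bool break_test(void)
        {
                return test_error || kthread_should_stop();
        }
      
        /* ... and the missed-wakeup fix in the producer thread:
         * re-check after setting the task state, before sleeping */
        set_current_state(TASK_INTERRUPTIBLE);
        if (!break_test())
                schedule_timeout(HZ / 10);
        __set_current_state(TASK_RUNNING);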
      
      Link: http://lkml.kernel.org/r/1441629518-32712-3-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring_buffer: Do not complete benchmark reader too early · 8b46ff69
      Committed by Petr Mladek
      It seems that complete(&read_done) might be called too early
      in some situations.
      
      1st scenario:
      -------------
      
      CPU0					CPU1
      
      ring_buffer_producer_thread()
        wake_up_process(consumer);
        wait_for_completion(&read_start);
      
      					ring_buffer_consumer_thread()
      					  complete(&read_start);
      
        ring_buffer_producer()
          # producing data in
          # the do-while cycle
      
      					  ring_buffer_consumer();
      					    # reading data
      					    # got error
      					    # set kill_test = 1;
      					    set_current_state(
      						TASK_INTERRUPTIBLE);
      					    if (reader_finish)  # false
      					    schedule();
      
          # producer still in the middle of
          # do-while cycle
          if (consumer && !(cnt % wakeup_interval))
            wake_up_process(consumer);
      
      					    # spurious wakeup
      					    while (!reader_finish &&
      						   !kill_test)
      					    # leaving because
      					    # kill_test == 1
      					    reader_finish = 0;
      					    complete(&read_done);
      
      1st BANG: We might access an uninitialized "read_done" if this is
      	  the first round.
      
          # producer finally leaving
          # the do-while cycle because kill_test == 1;
      
          if (consumer) {
            reader_finish = 1;
            wake_up_process(consumer);
            wait_for_completion(&read_done);
      
      2nd BANG: This will never complete because consumer already did
      	  the completion.
      
      2nd scenario:
      -------------
      
      CPU0					CPU1
      
      ring_buffer_producer_thread()
        wake_up_process(consumer);
        wait_for_completion(&read_start);
      
      					ring_buffer_consumer_thread()
      					  complete(&read_start);
      
        ring_buffer_producer()
          # CPU3 removes the module	  <--- difference from
          # and stops producer          <--- the 1st scenario
          if (kthread_should_stop())
            kill_test = 1;
      
      					  ring_buffer_consumer();
      					    while (!reader_finish &&
      						   !kill_test)
      					    # kill_test == 1 => we never go
      					    # into the top level while()
      					    reader_finish = 0;
      					    complete(&read_done);
      
          # producer still in the middle of
          # do-while cycle
          if (consumer && !(cnt % wakeup_interval))
            wake_up_process(consumer);
      
      					    # spurious wakeup
      					    while (!reader_finish &&
      						   !kill_test)
      					    # leaving because kill_test == 1
      					    reader_finish = 0;
      					    complete(&read_done);
      
      BANG: We are in the same "bang" situations as in the 1st scenario.
      
      Root of the problem:
      --------------------
      
      ring_buffer_consumer() must complete "read_done" only when the
      "reader_finish" variable is set. It must not be skipped due to other
      conditions.
      
      Note that we still must keep the check for "reader_finish" in a loop
      because there might be spurious wakeups as described in the
      above scenarios.
      
      Solution:
      ----------
      
      The top level cycle in ring_buffer_consumer() will finish only when
      "reader_finish" is set. The data are read in a "while-do" loop so
      that they are not read after an error (kill_test == 1) or a spurious
      wakeup.
      
      In addition, "reader_finish" is manipulated by the producer thread.
      Therefore we add READ_ONCE() to make sure that the fresh value is
      read in each cycle. Also we add the corresponding barrier
      to synchronize the sleep check.
      
      Next we set the state back to TASK_RUNNING for the situation where we
      did not sleep.
      
      Purely out of paranoia, we initialize both completions statically.
      This is safer, in case there are other races that we are unaware of.
      
      As a side effect we could remove the memory barrier from
      ring_buffer_producer_thread(). IMHO, this was the reason for
      the barrier. ring_buffer_reset() uses spin locks that should
      provide the needed memory barrier for using the buffer.
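      
      A hedged sketch of the reshaped consumer, per the description above;
      read_next_event() is a hypothetical stand-in for the actual
      page/event reading:
      
        /* Complete "read_done" only once the producer has really set
         * "reader_finish"; errors and spurious wakeups merely stop the
         * inner reading loop. */
        do {
                int found = 1;
      
                while (found && !test_error)
                        found = read_next_event();      /* hypothetical */
      
                set_current_state(TASK_INTERRUPTIBLE);
                if (!READ_ONCE(reader_finish))
                        schedule();
                /* back to TASK_RUNNING even if we did not sleep */
                __set_current_state(TASK_RUNNING);
        } while (!READ_ONCE(reader_finish));
      
        reader_finish = 0;
        complete(&read_done);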
      
      Link: http://lkml.kernel.org/r/1441629518-32712-2-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove redundant TP_ARGS redefining · fb8c2293
      Committed by Dmitry Safonov
      TP_ARGS is not used anywhere in trace.h nor trace_entries.h.
      First I left just the #undef TP_ARGS and had no errors, so remove it.
      
      Link: http://lkml.kernel.org/r/1446576560-14085-1-git-send-email-0x7f454c46@gmail.com
      Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Rename max_stack_lock to stack_trace_max_lock · d332736d
      Committed by Steven Rostedt (Red Hat)
      Now that max_stack_lock is a global variable, it requires a naming
      convention that is unlikely to collide. Rename it to the same naming
      convention that the other stack_trace variables have.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Allow arch-specific stack tracer · bb99d8cc
      Committed by AKASHI Takahiro
      A stack frame may be used in a different way depending on the cpu
      architecture. Thus it is not always appropriate to slurp the stack
      contents, as the current check_stack() does, in order to calculate a
      stack index (height) at a given function call. At least not on arm64.
      In addition, there is a possibility that we will mistakenly detect a
      stale stack frame which has not been overwritten.
      
      This patch makes check_stack() a weak function so that an
      arch-specific version can be implemented later.
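      
      The weak-symbol mechanism itself, shown as a self-contained
      user-space sketch (the kernel uses the same linkage for the generic
      check_stack()):
      
        #include <stdio.h>
      
        /* Generic definition marked weak: an arch-specific file may
         * provide a strong check_stack() that wins at link time. */
        __attribute__((weak)) void check_stack(void)
        {
                puts("generic check_stack()");
        }
      
        int main(void)
        {
                check_stack();  /* calls the override if one is linked in */
                return 0;
        }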
      
      Link: http://lkml.kernel.org/r/1446182741-31019-5-git-send-email-takahiro.akashi@linaro.org
      Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  7. 03 Nov 2015 (11 commits)
  8. 30 Oct 2015 (1 commit)
    • blktrace: re-write setting q->blk_trace · cdea01b2
      Committed by Davidlohr Bueso
      This is really about simplifying the double xchg pattern into
      a single cmpxchg, with the same logic (a sketch of the single-step
      version follows the list below). Other than the immediate
      cleanup, there are some subtleties this change deals with:
      
      (i) While the load of the old bt is fully ordered wrt everything,
      ie:
      
              old_bt = xchg(&q->blk_trace, bt);             [barrier]
              if (old_bt)
      	     (void) xchg(&q->blk_trace, old_bt);    [barrier]
      
      blk_trace could still be changed between the xchg and the old_bt
      load. Note that this race is merely theoretical and afaict the window
      is very small, but doing everything in a single operation with cmpxchg
      closes this potential race.
      
      (ii) Ordering guarantees are obviously kept with cmpxchg.
      
      (iii) Gets rid of the hacky-by-nature (void)xchg pattern.
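      
      A sketch of the single-step version (the error path is illustrative):
      
        /* Install bt only if no blk_trace is set; the load, compare and
         * store happen as one atomic operation, so nothing can slip in
         * between them. */
        if (cmpxchg(&q->blk_trace, NULL, bt) != NULL)
                goto err;       /* lost the race: a trace already exists */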
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  9. 27 Oct 2015 (2 commits)
  10. 26 Oct 2015 (4 commits)
    • tracing: Fix sparse RCU warning · fb662288
      Committed by Steven Rostedt (Red Hat)
      p_start() and p_stop() are matching seq_file functions. Teach sparse
      that the rcu_read_lock_sched() taken by p_start() is released by
      p_stop().
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Check all tasks on each CPU when filtering pids · 8ca532ad
      Committed by Steven Rostedt (Red Hat)
      My tests found that if a task is running but not filtered when
      set_event_pid is modified, then it can still be traced.
      
      Call on_each_cpu() to check whether the task currently running on each
      CPU should be filtered, and update the per-cpu flags of tr->data
      appropriately.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Implement event pid filtering · 3fdaf80f
      Committed by Steven Rostedt (Red Hat)
      Add the necessary hooks to use the pids loaded in set_event_pid to filter
      all the events enabled in the tracing instance that match the pids listed.
      
      Two probes are added to each of the sched_switch and sched_wakeup
      tracepoints: one called before the other probes and one called after
      them. The first is used to set the necessary flags to let the other
      probes know whether they should trace or not.
      
      The sched_switch pre probe will set the "ignore_pid" flag if neither
      the previous nor the next task has a matching pid.
      
      The sched_switch post probe will set the "ignore_pid" flag if the
      next task does not have a matching pid.
      
      The pre probe allows probes tracing sched_switch itself to be traced
      if necessary.
      
      The sched_wakeup pre probe will set the "ignore_pid" flag if neither the
      current task nor the wakee task has a matching pid.
      
      The sched_wakeup post probe will set the "ignore_pid" flag if the current
      task does not have a matching pid.
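      
      A hedged sketch of the sched_switch pre probe described above;
      pid_listed() is a hypothetical helper for the set_event_pid lookup,
      and the probe signature is abbreviated:
      
        static void
        sched_switch_filter_pre(void *data, bool preempt,
                                struct task_struct *prev,
                                struct task_struct *next)
        {
                struct trace_array *tr = data;
      
                /* ignore this cpu's events unless prev or next matches */
                this_cpu_write(tr->trace_buffer.data->ignore_pid,
                               !pid_listed(tr, prev) &&
                               !pid_listed(tr, next));
        }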
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add set_event_pid directory for future use · 49090107
      Committed by Steven Rostedt (Red Hat)
      Create a tracing directory called set_event_pid, which currently has
      no function, but will be used to filter all events for the tracing
      instance by the pids that are added to the file.
      
      The reason no functionality is added with this commit is that it
      focuses on the creation and removal of the pids in a safe manner. Tests
      can then be made against this change to make sure things are correct
      before hooking features to the list of pids.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  11. 22 Oct 2015 (1 commit)
    • bpf: introduce bpf_perf_event_output() helper · a43eec30
      Committed by Alexei Starovoitov
      This helper is used to send raw data from an eBPF program into a
      special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
      User space needs to perf_event_open() it (either for one or all cpus)
      and store the FD into a perf_event_array (similar to the
      bpf_perf_event_read() helper) before the eBPF program can send data
      into it.
      
      Today the programs triggered by kprobe collect the data and either
      store it into maps or print it via bpf_trace_printk(), where the
      latter is a debug facility not suitable for streaming data. This new
      helper replaces such bpf_trace_printk() usage and gives programs a
      dedicated channel into user space for post-processing of the raw data
      collected.
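      
      A hedged sketch of how a kprobe program might use the helper, in the
      samples/bpf style of the time; the map name, event layout and attach
      point are illustrative:
      
        struct bpf_map_def SEC("maps") output_map = {
                .type        = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                .key_size    = sizeof(int),
                .value_size  = sizeof(u32),
                .max_entries = 64,      /* one slot per possible cpu */
        };
      
        SEC("kprobe/sys_write")
        int bpf_prog(struct pt_regs *ctx)
        {
                struct { u64 ts; u32 pid; } data = {
                        .ts  = bpf_ktime_get_ns(),
                        .pid = bpf_get_current_pid_tgid(),
                };
      
                /* stream the raw bytes to the perf event FD user space
                 * stored at this cpu's index in the array */
                bpf_perf_event_output(ctx, &output_map,
                                      bpf_get_smp_processor_id(),
                                      &data, sizeof(data));
                return 0;
        }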
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 21 Oct 2015 (5 commits)
  13. 20 Oct 2015 (1 commit)
    • tracing: Have stack tracer force RCU to be watching · a2d76290
      Committed by Steven Rostedt (Red Hat)
      The stack tracer was triggering the WARN_ON() in module.c:
      
       static void module_assert_mutex_or_preempt(void)
       {
       #ifdef CONFIG_LOCKDEP
      	if (unlikely(!debug_locks))
      		return;
      
      	WARN_ON(!rcu_read_lock_sched_held() &&
      		!lockdep_is_held(&module_mutex));
       #endif
       }
      
      The reason is that the stack tracer traces all function calls, and
      some of those calls happen while exiting or entering user space and
      idle. Some of these functions are called after RCU has already stopped
      watching, as RCU does not watch user space or idle CPUs.
      
      If a max stack is hit, then save_stack_trace() is called, which will
      check module addresses and call module_assert_mutex_or_preempt(), and
      then trigger the warning. The sad part is that the warning itself will
      also do a stack trace and trigger the same warning. That probably
      should be fixed.
      
      The warning was added by commit 0be964be ("module: Sanitize RCU usage
      and locking"), but this bug has probably been around longer. The bug
      itself is unlikely to cause much harm, but the new warning causes the
      system to lock up.
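      
      A hedged sketch of the shape of the fix: bracket the stack walk so
      RCU is watching for its duration (the call site is illustrative):
      
        /* User space and idle are outside RCU's watch, but the stack
         * walk ends up in module code that asserts
         * rcu_read_lock_sched_held(), so make RCU see us first. */
        rcu_irq_enter();
        save_stack_trace(&stack_trace_max);
        rcu_irq_exit();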
      
      Cc: stable@vger.kernel.org # 4.2+
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc:"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>