1. 04 12月, 2008 7 次提交
    • S
      ftrace: add ability to only trace swapper tasks · e32d8956
      Steven Rostedt 提交于
      Impact: new feature
      
      This patch lets the swapper tasks of all CPUS be filtered by the
      set_ftrace_pid file.
      
      If '0' is echoed into this file, then all the idle tasks (aka swapper)
      is flagged to be traced.  This affects all CPU idle tasks.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e32d8956
    • S
      ftrace: use struct pid · 978f3a45
      Steven Rostedt 提交于
      Impact: clean up, extend PID filtering to PID namespaces
      
      Eric Biederman suggested using the struct pid for filtering on
      pids in the kernel. This patch is based off of a demonstration
      of an implementation that Eric sent me in an email.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      978f3a45
    • S
      ftrace: trace single pid for function graph tracer · 804a6851
      Steven Rostedt 提交于
      Impact: New feature
      
      This patch makes the changes to set_ftrace_pid apply to the function
      graph tracer.
      
        # echo $$ > /debugfs/tracing/set_ftrace_pid
        # echo function_graph > /debugfs/tracing/current_tracer
      
      Will cause only the current task to be traced. Note, the trace flags are
      also inherited by child processes, so the children of the shell
      will also be traced.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      804a6851
    • S
      ftrace: use task struct trace flag to filter on pid · 0ef8cde5
      Steven Rostedt 提交于
      Impact: clean up
      
      Use the new task struct trace flags to determine if a process should be
      traced or not.
      
      Note: this moves the searching of the pid to the slow path of setting
      the pid field. This needs to be converted to the pid name space.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0ef8cde5
    • S
      ftrace: graph of a single function · ea4e2bc4
      Steven Rostedt 提交于
      This patch adds the file:
      
         /debugfs/tracing/set_graph_function
      
      which can be used along with the function graph tracer.
      
      When this file is empty, the function graph tracer will act as
      usual. When the file has a function in it, the function graph
      tracer will only trace that function.
      
      For example:
      
       # echo blk_unplug > /debugfs/tracing/set_graph_function
       # cat /debugfs/tracing/trace
       [...]
       ------------------------------------------
       | 2)  make-19003  =>  kjournald-2219
       ------------------------------------------
      
       2)               |  blk_unplug() {
       2)               |    dm_unplug_all() {
       2)               |      dm_get_table() {
       2)      1.381 us |        _read_lock();
       2)      0.911 us |        dm_table_get();
       2)      1. 76 us |        _read_unlock();
       2) +   12.912 us |      }
       2)               |      dm_table_unplug_all() {
       2)               |        blk_unplug() {
       2)      0.778 us |          generic_unplug_device();
       2)      2.409 us |        }
       2)      5.992 us |      }
       2)      0.813 us |      dm_table_put();
       2) +   29. 90 us |    }
       2) +   34.532 us |  }
      
      You can add up to 32 functions into this file. Currently we limit it
      to 32, but this may change with later improvements.
      
      To add another function, use the append '>>':
      
        # echo sys_read >> /debugfs/tracing/set_graph_function
        # cat /debugfs/tracing/set_graph_function
        blk_unplug
        sys_read
      
      Using the '>' will clear out the function and write anew:
      
        # echo sys_write > /debug/tracing/set_graph_function
        # cat /debug/tracing/set_graph_function
        sys_write
      
      Note, if you have function graph running while doing this, the small
      time between clearing it and updating it will cause the graph to
      record all functions. This should not be an issue because after
      it sets the filter, only those functions will be recorded from then on.
      If you need to only record a particular function then set this
      file first before starting the function graph tracer. In the future
      this side effect may be corrected.
      
      The set_graph_function file is similar to the set_ftrace_filter but
      it does not take wild cards nor does it allow for more than one
      function to be set with a single write. There is no technical reason why
      this is the case, I just do not have the time yet to implement that.
      
      Note, dynamic ftrace must be enabled for this to appear because it
      uses the dynamic ftrace records to match the name to the mcount
      call sites.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ea4e2bc4
    • S
      ftrace: fix race in function graph during fork · e8e1abe9
      Steven Rostedt 提交于
      Impact: graph tracer race/crash fix
      
      There is a nasy race in startup of a new process running the
      function graph tracer. In fork.c:
      
      	total_forks++;
      	spin_unlock(&current->sighand->siglock);
      	write_unlock_irq(&tasklist_lock);
      	ftrace_graph_init_task(p);
      	proc_fork_connector(p);
      	cgroup_post_fork(p);
      	return p;
      
      The new task is free to run as soon as the tasklist_lock is released.
      This is before the ftrace_graph_init_task. If the task does run
      it will be using the same ret_stack and curr_ret_stack as the parent.
      This will cause crashes that are difficult to debug.
      
      This patch moves the ftrace_graph_init_task to just after the alloc_pid
      code. This fixes the above race.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e8e1abe9
    • S
      trace: fix output of stack trace · 0a37119d
      Steven Rostedt 提交于
      Impact: fix to output of stack trace
      
      If a function is not found in the stack of the stack tracer, the
      number printed is quite strange. This fixes the algorithm to handle
      missing functions better.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0a37119d
  2. 03 12月, 2008 9 次提交
    • I
      tracing/function-graph-tracer: enabled by default · 764f3b95
      Ingo Molnar 提交于
      CONFIG_FUNCTION_GRAPH_TRACER depends on FUNCTION_TRACER already,
      (turning it non-default) so it so making it default-n is pointless.
      
      So enable it by default - it's a nice extension of the function tracer.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      764f3b95
    • F
      tracing/function-graph-tracer: improve duration output · 166d3c79
      Frederic Weisbecker 提交于
      Impact: better trace output of duration for long calls
      
      The old duration output didn't exceeded 9999.999 us to fit the column
      and the nanosecs were always 3 numbers. As Ingo suggested, it's better
      to have the whole microseconds elapsed time and shift the nanosecs precision
      if needed to fit the maximum 7 numbers. And usec need more number, the case
      should be rare and important enough to break a bit the column alignment to
      show it.
      
      So, depending of the duration value, we now have these patterns:
      
          u.nnn us
         uu.nnn us
        uuu.nnn us
       uuuu.nnn us
       uuuuu.nn us
       uuuuuu.n us
       uuuuuuuu..... us
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      166d3c79
    • F
      tracing/function-graph-tracer: display unified style cmdline and pid · 11e84acc
      Frederic Weisbecker 提交于
      Impact: extend function-graph output: let one know which thread called a function
      
      This patch implements a helper function to print the couple cmdline/pid.
      Its output is provided during task switching and on each row if the new
      "funcgraph-proc" defualt-off option is set through trace_options file.
      
      The output is center aligned and never exceeds 14 characters. The cmdline
      is truncated over 7 chars.
      But note that if the pid exceeds 6 characters, the column will overflow (but
      the situation is abnormal).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      11e84acc
    • S
      ftrace: function graph return for function entry · e49dc19c
      Steven Rostedt 提交于
      Impact: feature, let entry function decide to trace or not
      
      This patch lets the graph tracer entry function decide if the tracing
      should be done at the end as well. This requires all function graph
      entry functions return 1 if it should trace, or 0 if the return should
      not be traced.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e49dc19c
    • S
      ring-buffer: change "page" variable names to "bpage" · 044fa782
      Steven Rostedt 提交于
      Impact: clean up
      
      Andrew Morton pointed out that the kernel convention of a variable
      named page should be of type page struct. The ring buffer uses
      a variable named "page" for a pointer to something else.
      
      This patch converts those to be called "bpage" (as in "buffer page").
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      044fa782
    • S
      ftrace: add ftrace_graph_stop() · 14a866c5
      Steven Rostedt 提交于
      Impact: new ftrace_graph_stop function
      
      While developing more features of function graph, I hit a bug that
      caused the WARN_ON to trigger in the prepare_ftrace_return function.
      Well, it was hard for me to find out that was happening because the
      bug would not print, it would just cause a hard lockup or reboot.
      The reason is that it is not safe to call printk from this function.
      
      Looking further, I also found that it calls unregister_ftrace_graph,
      which grabs a mutex and calls kstop machine. This would definitely
      lock the box up if it were to trigger.
      
      This patch adds a fast and safe ftrace_graph_stop() which will
      stop the function tracer. Then it is safe to call the WARN ON.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      14a866c5
    • S
      ring-buffer: read page interface · 8789a9e7
      Steven Rostedt 提交于
      Impact: new API to ring buffer
      
      This patch adds a new interface into the ring buffer that allows a
      page to be read from the ring buffer on a given CPU. For every page
      read, one must also be given to allow for a "swap" of the pages.
      
       rpage = ring_buffer_alloc_read_page(buffer);
       if (!rpage)
      	goto err;
       ret = ring_buffer_read_page(buffer, &rpage, cpu, full);
       if (!ret)
      	goto empty;
       process_page(rpage);
       ring_buffer_free_read_page(rpage);
      
      The caller of these functions must handle any waits that are
      needed to wait for new data. The ring_buffer_read_page will simply
      return 0 if there is no data, or if "full" is set and the writer
      is still on the current page.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8789a9e7
    • S
      ring-buffer: move some metadata into buffer page · abc9b56d
      Steven Rostedt 提交于
      Impact: get ready for splice changes
      
      This patch moves the commit and timestamp into the beginning of each
      data page of the buffer. This change will allow the page to be moved
      to another location (disk, network, etc) and still have information
      in the page to be able to read it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      abc9b56d
    • S
      ftrace: replace raw_local_irq_save with local_irq_save · a5e25883
      Steven Rostedt 提交于
      Impact: fix for lockdep and ftrace
      
      The raw_local_irq_save/restore confuses lockdep. This patch
      converts them to the local_irq_save/restore variants.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a5e25883
  3. 02 12月, 2008 4 次提交
    • F
      tracing/function-graph-tracer: support for x86-64 · 48d68b20
      Frederic Weisbecker 提交于
      Impact: extend and enable the function graph tracer to 64-bit x86
      
      This patch implements the support for function graph tracer under x86-64.
      Both static and dynamic tracing are supported.
      
      This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
      wanted to use probe_kernel_read/write to make the return address
      saving/patching code more generic but it causes tracing recursion.
      
      That would be perhaps useful to implement a notrace version of these
      function for other archs ports.
      
      Note that arch/x86/process_64.c is not traced, as in X86-32. I first
      thought __switch_to() was responsible of crashes during tracing because I
      believed current task were changed inside but that's actually not the
      case (actually yes, but not the "current" pointer).
      
      So I will have to investigate to find the functions that harm here, to
      enable tracing of the other functions inside (but there is no issue at
      this time, while process_64.c stays out of -pg flags).
      
      A little possible race condition is fixed inside this patch too. When the
      tracer allocate a return stack dynamically, the current depth is not
      initialized before but after. An interrupt could occur at this time and,
      after seeing that the return stack is allocated, the tracer could try to
      trace it with a random uninitialized depth. It's a prevention, even if I
      hadn't problems with it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      48d68b20
    • L
      function trace: fix a bug of single thread function trace · 66eafebc
      Liming Wang 提交于
      Impact: fix "no output from tracer" bug caused by ftrace_update_pid_func()
      
      When disabling single thread function trace using
      "echo -1 > set_ftrace_pid", the normal function trace
      has to restore to original function, otherwise the normal
      function trace will not work well.
      
      Without this commit, something like below:
      
      	$ ps |grep 850
      	  850 root      2556 S    -/bin/sh
      	$ echo 850 > /debug/tracing/set_ftrace_pid
      	$ echo function > /debug/tracing/current_tracer
      	$ echo 1 > /debug/tracing/tracing_enabled
      	$ sleep 1
      	$ echo 0 > /debug/tracing/tracing_enabled
      	$ cat /debug/tracing/trace_pipe |wc -l
      	59704
      	$ echo -1 > /debug/tracing/set_ftrace_pid
      	$ echo 1 > /debug/tracing/tracing_enabled
      	$ sleep 1
      	$ echo 0 > /debug/tracing/tracing_enabled
      	$ more /debug/tracing/trace_pipe
      		<====== nothing output now!
      			it should output trace record.
      Signed-off-by: NLiming Wang <liming.wang@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      66eafebc
    • A
      taint: add missing comment · a8005992
      Arjan van de Ven 提交于
      The description for 'D' was missing in the comment...  (causing me a
      minute of WTF followed by looking at more of the code)
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a8005992
    • D
      epoll: introduce resource usage limits · 7ef9964e
      Davide Libenzi 提交于
      It has been thought that the per-user file descriptors limit would also
      limit the resources that a normal user can request via the epoll
      interface.  Vegard Nossum reported a very simple program (a modified
      version attached) that can make a normal user to request a pretty large
      amount of kernel memory, well within the its maximum number of fds.  To
      solve such problem, default limits are now imposed, and /proc based
      configuration has been introduced.  A new directory has been created,
      named /proc/sys/fs/epoll/ and inside there, there are two configuration
      points:
      
        max_user_instances = Maximum number of devices - per user
      
        max_user_watches   = Maximum number of "watched" fds - per user
      
      The current default for "max_user_watches" limits the memory used by epoll
      to store "watches", to 1/32 of the amount of the low RAM.  As example, a
      256MB 32bit machine, will have "max_user_watches" set to roughly 90000.
      That should be enough to not break existing heavy epoll users.  The
      default value for "max_user_instances" is set to 128, that should be
      enough too.
      
      This also changes the userspace, because a new error code can now come out
      from EPOLL_CTL_ADD (-ENOSPC).  The EMFILE from epoll_create() was already
      listed, so that should be ok.
      
      [akpm@linux-foundation.org: use get_current_user()]
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <stable@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: NVegard Nossum <vegardno@ifi.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef9964e
  4. 01 12月, 2008 2 次提交
  5. 30 11月, 2008 2 次提交
    • I
      sched: prevent divide by zero error in cpu_avg_load_per_task, update · af6d596f
      Ingo Molnar 提交于
      Regarding the bug addressed in:
      
        4cd42620: sched: prevent divide by zero error in cpu_avg_load_per_task
      
      Linus points out that the fix is not complete:
      
      > There's nothing that keeps gcc from deciding not to reload
      > rq->nr_running.
      >
      > Of course, in _practice_, I don't think gcc ever will (if it decides
      > that it will spill, gcc is likely going to decide that it will
      > literally spill the local variable to the stack rather than decide to
      > reload off the pointer), but it's a valid compiler optimization, and
      > it even has a name (rematerialization).
      >
      > So I suspect that your patch does fix the bug, but it still leaves the
      > fairly unlikely _potential_ for it to re-appear at some point.
      >
      > We have ACCESS_ONCE() as a macro to guarantee that the compiler
      > doesn't rematerialize a pointer access. That also would clarify
      > the fact that we access something unsafe outside a lock.
      
      So make sure our nr_running value is immutable and cannot change
      after we check it for nonzero.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      af6d596f
    • I
      sched, cpusets: fix warning in kernel/cpuset.c · 1583715d
      Ingo Molnar 提交于
      this warning:
      
        kernel/cpuset.c: In function ‘generate_sched_domains’:
        kernel/cpuset.c:588: warning: ‘ndoms’ may be used uninitialized in this function
      
      triggers because GCC does not recognize that ndoms stays uninitialized
      only if doms is NULL - but that flow is covered at the end of
      generate_sched_domains().
      
      Help out GCC by initializing this variable to 0. (that's prudent anyway)
      
      Also, this function needs a splitup and code flow simplification:
      with 160 lines length it's clearly too long.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1583715d
  6. 29 11月, 2008 1 次提交
  7. 28 11月, 2008 4 次提交
    • L
      ftrace: improve seq_operation of ftrace · 50cdaf08
      Liming Wang 提交于
      Impact: make ftrace position computing more sane
      
      First remove useless ->pos field. Then we needn't check seq_printf
      in .show like other place.
      Signed-off-by: NLiming Wang <liming.wang@windriver.com>
      Reviewed-by: NBruce Ashfield <bruce.ashfield@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      50cdaf08
    • T
      tracing, alpha: fix build: add missing #ifdef CONFIG_STACKTRACE · c7425acb
      Török Edwin 提交于
      There are architectures that still have no stacktrace support.
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c7425acb
    • I
      tracing/function-graph-tracer: more output tweaks · d51090b3
      Ingo Molnar 提交于
      Impact: prettify the output some more
      
      Before:
      
      0)           |     sys_read() {
      0)      0.796 us |   fget_light();
      0)           |       vfs_read() {
      0)           |         rw_verify_area() {
      0)           |           security_file_permission() {
      ------------8<---------- thread sshd-1755 ------------8<----------
      
      After:
      
       0)               |  sys_read() {
       0)      0.796 us |    fget_light();
       0)               |    vfs_read() {
       0)               |      rw_verify_area() {
       0)               |        security_file_permission() {
       ------------------------------------------
       | 1)  migration/0--1  =>  sshd-1755
       ------------------------------------------
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d51090b3
    • F
      tracing/function-graph-tracer: adjustments of the trace informations · 1a056155
      Frederic Weisbecker 提交于
      Impact: increase the visual qualities of the call-graph-tracer output
      
      This patch applies various trace output formatting changes:
      
       - CPU is now a decimal number, followed by a parenthesis.
      
       - Overhead is now on the second column (gives a good visibility)
      
       - Cost is now on the third column, can't exceed 9999.99 us. It is
         followed by a virtual line based on a "|" character.
      
       - Functions calls are now the last column on the right. This way, we
         haven't dynamic column (which flow is harder to follow) on its right.
      
       - CPU and Overhead have their own option flag. They are default-on but you
         can disable them easily:
      
            echo nofuncgraph-cpu > trace_options
            echo nofuncgraph-overhead > trace_options
      
      TODO:
      
      _ Refactoring of the thread switch output.
      _ Give a default-off option to output the thread and its pid on each row.
      _ Provide headers
      _ ....
      
      Here is an example of the new trace style:
      
      0)           |             mutex_unlock() {
      0)      0.639 us |           __mutex_unlock_slowpath();
      0)      1.607 us |         }
      0)           |             remove_wait_queue() {
      0)      0.616 us |           _spin_lock_irqsave();
      0)      0.616 us |           _spin_unlock_irqrestore();
      0)      2.779 us |         }
      0)      0.495 us |         n_tty_set_room();
      0) ! 9999.999 us |       }
      0)           |           tty_ldisc_deref() {
      0)      0.615 us |         _spin_lock_irqsave();
      0)      0.616 us |         _spin_unlock_irqrestore();
      0)      2.793 us |       }
      0)           |           current_fs_time() {
      0)      0.488 us |         current_kernel_time();
      0)      0.495 us |         timespec_trunc();
      0)      2.486 us |       }
      0) ! 9999.999 us |     }
      0) ! 9999.999 us |   }
      0) ! 9999.999 us | }
      0)           |     sys_read() {
      0)      0.796 us |   fget_light();
      0)           |       vfs_read() {
      0)           |         rw_verify_area() {
      0)           |           security_file_permission() {
      0)      0.488 us |         cap_file_permission();
      0)      1.720 us |       }
      0)      3.  4 us |     }
      0)           |         tty_read() {
      0)      0.488 us |       tty_paranoia_check();
      0)           |           tty_ldisc_ref_wait() {
      0)           |             tty_ldisc_try() {
      0)      0.615 us |           _spin_lock_irqsave();
      0)      0.615 us |           _spin_unlock_irqrestore();
      0)      5.436 us |         }
      0)      6.427 us |       }
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1a056155
  8. 27 11月, 2008 3 次提交
    • F
      tracing/function-graph-tracer: enhancements for the trace output · 83a8df61
      Frederic Weisbecker 提交于
      Impact: enhance the output of the graph-tracer
      
      This patch applies some ideas of Ingo Molnar and Steven Rostedt.
      
      * Output leaf functions in one line with parenthesis, semicolon and duration
        output.
      
      * Add a second column (after cpu) for an overhead sign.
        if duration > 100 us, "!"
        if duration > 10 us, "+"
        else " "
      
      * Print output in us with remaining nanosec: u.n
      
      * Print duration on the right end, following the indentation of the functions.
        Use also visual clues: "-" on entry call (no duration to output) and "+" on
        return (duration output).
      
      The name of the tracer has been fixed as well: function-branch becomes
      function_branch.
      
      Here is an example of the new output:
      
      CPU[000]           dequeue_entity() {                    -
      CPU[000]             update_curr() {                    -
      CPU[000]               update_min_vruntime();                    + 0.512 us
      CPU[000]             }                                + 1.504 us
      CPU[000]             clear_buddies();                    + 0.481 us
      CPU[000]             update_min_vruntime();                    + 0.504 us
      CPU[000]           }                                + 4.557 us
      CPU[000]           hrtick_update() {                    -
      CPU[000]             hrtick_start_fair();                    + 0.489 us
      CPU[000]           }                                + 1.443 us
      CPU[000] +       }                                + 14.655 us
      CPU[000] +     }                                + 15.678 us
      CPU[000] +   }                                + 16.686 us
      CPU[000]     msecs_to_jiffies();                    + 0.481 us
      CPU[000]     put_prev_task_fair();                    + 0.504 us
      CPU[000]     pick_next_task_fair();                    + 0.482 us
      CPU[000]     pick_next_task_rt();                    + 0.504 us
      CPU[000]     pick_next_task_fair();                    + 0.481 us
      CPU[000]     pick_next_task_idle();                    + 0.489 us
      CPU[000]     _spin_trylock();                    + 0.655 us
      CPU[000]     _spin_unlock();                    + 0.609 us
      
      CPU[000]  ------------8<---------- thread bash-2794 ------------8<----------
      
      CPU[000]               finish_task_switch() {                    -
      CPU[000]                 _spin_unlock_irq();                    + 0.722 us
      CPU[000]               }                                + 2.369 us
      CPU[000] !           }                                + 501972.605 us
      CPU[000] !         }                                + 501973.763 us
      CPU[000]           copy_from_read_buf() {                    -
      CPU[000]             _spin_lock_irqsave();                    + 0.670 us
      CPU[000]             _spin_unlock_irqrestore();                    + 0.699 us
      CPU[000]             copy_to_user() {                    -
      CPU[000]               might_fault() {                    -
      CPU[000]                 __might_sleep();                    + 0.503 us
      CPU[000]               }                                + 1.632 us
      CPU[000]               __copy_to_user_ll();                    + 0.542 us
      CPU[000]             }                                + 3.858 us
      CPU[000]             tty_audit_add_data() {                    -
      CPU[000]               _spin_lock_irq();                    + 0.609 us
      CPU[000]               _spin_unlock_irq();                    + 0.624 us
      CPU[000]             }                                + 3.196 us
      CPU[000]             _spin_lock_irqsave();                    + 0.624 us
      CPU[000]             _spin_unlock_irqrestore();                    + 0.625 us
      CPU[000] +         }                                + 13.611 us
      CPU[000]           copy_from_read_buf() {                    -
      CPU[000]             _spin_lock_irqsave();                    + 0.624 us
      CPU[000]             _spin_unlock_irqrestore();                    + 0.616 us
      CPU[000]           }                                + 2.820 us
      CPU[000]
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      83a8df61
    • S
      sched: prevent divide by zero error in cpu_avg_load_per_task · 4cd42620
      Steven Rostedt 提交于
      Impact: fix divide by zero crash in scheduler rebalance irq
      
      While testing the branch profiler, I hit this crash:
      
      divide error: 0000 [#1] PREEMPT SMP
      [...]
      RIP: 0010:[<ffffffff8024a008>]  [<ffffffff8024a008>] cpu_avg_load_per_task+0x50/0x7f
      [...]
      Call Trace:
       <IRQ> <0> [<ffffffff8024fd43>] find_busiest_group+0x3e5/0xcaa
       [<ffffffff8025da75>] rebalance_domains+0x2da/0xa21
       [<ffffffff80478769>] ? find_next_bit+0x1b2/0x1e6
       [<ffffffff8025e2ce>] run_rebalance_domains+0x112/0x19f
       [<ffffffff8026d7c2>] __do_softirq+0xa8/0x232
       [<ffffffff8020ea7c>] call_softirq+0x1c/0x3e
       [<ffffffff8021047a>] do_softirq+0x94/0x1cd
       [<ffffffff8026d5eb>] irq_exit+0x6b/0x10e
       [<ffffffff8022e6ec>] smp_apic_timer_interrupt+0xd3/0xff
       [<ffffffff8020e4b3>] apic_timer_interrupt+0x13/0x20
      
      The code for cpu_avg_load_per_task has:
      
      	if (rq->nr_running)
      		rq->avg_load_per_task = rq->load.weight / rq->nr_running;
      
      The runqueue lock is not held here, and there is nothing that prevents
      the rq->nr_running from going to zero after it passes the if condition.
      
      The branch profiler simply made the race window bigger.
      
      This patch saves off the rq->nr_running to a local variable and uses that
      for both the condition and the division.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4cd42620
    • L
      ftrace: prevent recursion · 4f5a7f40
      Lai Jiangshan 提交于
      Impact: prevent unnecessary stack recursion
      
      if the resched flag was set before we entered, then don't reschedule.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4f5a7f40
  9. 26 11月, 2008 8 次提交
    • A
      tracing: add "power-tracer": C/P state tracer to help power optimization · f3f47a67
      Arjan van de Ven 提交于
      Impact: new "power-tracer" ftrace plugin
      
      This patch adds a C/P-state ftrace plugin that will generate
      detailed statistics about the C/P-states that are being used,
      so that we can look at detailed decisions that the C/P-state
      code is making, rather than the too high level "average"
      that we have today.
      
      An example way of using this is:
      
       mount -t debugfs none /sys/kernel/debug
       echo cstate > /sys/kernel/debug/tracing/current_tracer
       echo 1 > /sys/kernel/debug/tracing/tracing_enabled
       sleep 1
       echo 0 > /sys/kernel/debug/tracing/tracing_enabled
       cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f3f47a67
    • S
      ftrace: add cpu annotation for function graph tracer · 437f24fb
      Steven Rostedt 提交于
      Impact: enhancement for function graph tracer
      
      When run on a SMP box, the function graph tracer is confusing because
      it shows the different CPUS as changes in the trace.
      
      This patch adds the annotation of 'CPU[###]' where ### is a three digit
      number. The output will look similar to this:
      
      CPU[001]     dput() {
      CPU[000] } 726
      CPU[001]     } 487
      CPU[000] do_softirq() {
      CPU[001]   } 2221
      CPU[000]   __do_softirq() {
      CPU[000]     __local_bh_disable() {
      CPU[001]   unroll_tree_refs() {
      CPU[000]     } 569
      CPU[001]   } 501
      CPU[000]     rcu_process_callbacks() {
      CPU[001]   kfree() {
      
      What makes this nice is that now you can grep the file and produce
      readable format for a particular CPU.
      
       # cat /debug/tracing/trace > /tmp/trace
       # grep '^CPU\[000\]' /tmp/trace > /tmp/trace0
       # grep '^CPU\[001\]' /tmp/trace > /tmp/trace1
      
      Will give you:
      
       # head /tmp/trace0
      CPU[000] ------------8<---------- thread sshd-3899 ------------8<----------
      CPU[000]     inotify_dentry_parent_queue_event() {
      CPU[000]     } 2531
      CPU[000]     inotify_inode_queue_event() {
      CPU[000]     } 505
      CPU[000]   } 69626
      CPU[000] } 73089
      CPU[000] audit_syscall_exit() {
      CPU[000]   path_put() {
      CPU[000]     dput() {
      
       # head /tmp/trace1
      CPU[001] ------------8<---------- thread pcscd-3446 ------------8<----------
      CPU[001]               } 4186
      CPU[001]               dput() {
      CPU[001]               } 543
      CPU[001]               vfs_permission() {
      CPU[001]                 inode_permission() {
      CPU[001]                   shmem_permission() {
      CPU[001]                     generic_permission() {
      CPU[001]                     } 501
      CPU[001]                   } 2205
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      437f24fb
    • S
      ftrace: add thread comm to function graph tracer · 660c7f9b
      Steven Rostedt 提交于
      Impact: enhancement to function graph tracer
      
      Export the trace_find_cmdline so the function graph tracer can
      use it to print the comms of the threads.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      660c7f9b
    • S
      ftrace: let function tracing and function return run together · e53a6319
      Steven Rostedt 提交于
      Impact: feature
      
      This patch enables function tracing and function return to run together.
      I've tested this by enabling the stack tracer and return tracer, where
      both the function entry and function return are used together with
      dynamic ftrace.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e53a6319
    • S
      ftrace: use code patching for ftrace graph tracer · 5a45cfe1
      Steven Rostedt 提交于
      Impact: more efficient code for ftrace graph tracer
      
      This patch uses the dynamic patching, when available, to patch
      the function graph code into the kernel.
      
      This patch will ease the way for letting both function tracing
      and function graph tracing run together.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5a45cfe1
    • S
      ftrace: add function tracing to single thread · df4fc315
      Steven Rostedt 提交于
      Impact: feature to function trace a single thread
      
      This patch adds the ability to function trace a single thread.
      The file:
      
        /debugfs/tracing/set_ftrace_pid
      
      contains the pid to trace. Valid pids are any positive integer.
      Writing any negative number to this file will disable the pid
      tracing and the function tracer will go back to tracing all of
      threads.
      
      This feature works with both static and dynamic function tracing.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      df4fc315
    • F
      tracing/function-return-tracer: set a more human readable output · 287b6e68
      Frederic Weisbecker 提交于
      Impact: feature
      
      This patch sets a C-like output for the function graph tracing.
      For this aim, we now call two handler for each function: one on the entry
      and one other on return. This way we can draw a well-ordered call stack.
      
      The pid of the previous trace is loosely stored to be compared against
      the one of the current trace to see if there were a context switch.
      
      Without this little feature, the call tree would seem broken at
      some locations.
      We could use the sched_tracer to capture these sched_events but this
      way of processing is much more simpler.
      
      2 spaces have been chosen for indentation to fit the screen while deep
      calls. The time of execution in nanosecs is printed just after closed
      braces, it seems more easy this way to find the corresponding function.
      If the time was printed as a first column, it would be not so easy to
      find the corresponding function if it is called on a deep depth.
      
      I plan to output the return value but on 32 bits CPU, the return value
      can be 32 or 64, and its difficult to guess on which case we are.
      I don't know what would be the better solution on X86-32: only print
      eax (low-part) or even edx (high-part).
      
      Actually it's thee same problem when a function return a 8 bits value, the
      high part of eax could contain junk values...
      
      Here is an example of trace:
      
      sys_read() {
        fget_light() {
        } 526
        vfs_read() {
          rw_verify_area() {
            security_file_permission() {
              cap_file_permission() {
              } 519
            } 1564
          } 2640
          do_sync_read() {
            pipe_read() {
              __might_sleep() {
              } 511
              pipe_wait() {
                prepare_to_wait() {
                } 760
                deactivate_task() {
                  dequeue_task() {
                    dequeue_task_fair() {
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 504
                        } 1587
                        clear_buddies() {
                        } 512
                        add_cfs_task_weight() {
                        } 519
                        update_min_vruntime() {
                        } 511
                      } 5602
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 496
                        } 1631
                        clear_buddies() {
                        } 496
                        update_min_vruntime() {
                        } 527
                      } 4580
                      hrtick_update() {
                        hrtick_start_fair() {
                        } 488
                      } 1489
                    } 13700
                  } 14949
                } 16016
                msecs_to_jiffies() {
                } 496
                put_prev_task_fair() {
                } 504
                pick_next_task_fair() {
                } 489
                pick_next_task_rt() {
                } 496
                pick_next_task_fair() {
                } 489
                pick_next_task_idle() {
                } 489
      
      ------------8<---------- thread 4 ------------8<----------
      
      finish_task_switch() {
      } 1203
      do_softirq() {
        __do_softirq() {
          __local_bh_disable() {
          } 669
          rcu_process_callbacks() {
            __rcu_process_callbacks() {
              cpu_quiet() {
                rcu_start_batch() {
                } 503
              } 1647
            } 3128
            __rcu_process_callbacks() {
            } 542
          } 5362
          _local_bh_enable() {
          } 587
        } 8880
      } 9986
      kthread_should_stop() {
      } 669
      deactivate_task() {
        dequeue_task() {
          dequeue_task_fair() {
            dequeue_entity() {
              update_curr() {
                calc_delta_mine() {
                } 511
                update_min_vruntime() {
                } 511
              } 2813
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      287b6e68
    • F
      tracing/function-return-tracer: change the name into function-graph-tracer · fb52607a
      Frederic Weisbecker 提交于
      Impact: cleanup
      
      This patch changes the name of the "return function tracer" into
      function-graph-tracer which is a more suitable name for a tracing
      which makes one able to retrieve the ordered call stack during
      the code flow.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fb52607a