提交 · 8ef9925b02c23e3838d5e593c5cf37984141150f · openanolis / cloud-kernel

29 9月, 2017 3 次提交

sched/debug: Add explicit TASK_PARKED printing · 8ef9925b

由 Peter Zijlstra 提交于 9月 22, 2017

Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it
its own print state because it will not in fact get woken by regular
wakeups and is a long-term state.

This requires moving TASK_PARKED into the TASK_REPORT mask, and since
that latter needs to be a contiguous bitmask, we need to shuffle the
bits around a bit.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

8ef9925b

sched/debug: Add explicit TASK_IDLE printing · 06eb6184

由 Peter Zijlstra 提交于 9月 22, 2017

Markus reported that kthreads that idle using TASK_IDLE instead of
TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things
like htop mark those red.

This is undesirable, so add an explicit state for TASK_IDLE.
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

06eb6184

sched/tracing: Fix trace_sched_switch task-state printing · efb40f58

由 Peter Zijlstra 提交于 9月 22, 2017

Convert trace_sched_switch to use the common task-state helpers and
fix the "X" and "Z" order, possibly they ended up in the wrong order
because TASK_REPORT has them in the wrong order too.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

efb40f58

04 4月, 2017 1 次提交

sched,tracing: Update trace_sched_pi_setprio() · b91473ff

由 Peter Zijlstra 提交于 3月 23, 2017

Pass the PI donor task, instead of a numerical priority.

Numerical priorities are not sufficient to describe state ever since
SCHED_DEADLINE.

Annotate all sched tracepoints that are currently broken; fixing them
will bork userspace. *hate*.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170323150216.353599881@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

b91473ff

02 3月, 2017 1 次提交

sched/headers: Prepare for new header dependencies before moving code to... · 6a3827d7

由 Ingo Molnar 提交于 2月 08, 2017

sched/headers: Prepare for new header dependencies before moving code to <linux/sched/numa_balancing.h>

We are going to split <linux/sched/numa_balancing.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/numa_balancing.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

6a3827d7

06 10月, 2015 1 次提交

sched/core: Fix trace_sched_switch() · c73464b1

由 Peter Zijlstra 提交于 9月 28, 2015

__trace_sched_switch_state() is the last remaining PREEMPT_ACTIVE
user, move trace_sched_switch() from prepare_task_switch() to
__schedule() and propagate the @preempt argument.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c73464b1

03 8月, 2015 1 次提交

sched: Introduce the 'trace_sched_waking' tracepoint · fbd705a0

由 Peter Zijlstra 提交于 6月 09, 2015

Mathieu reported that since 317f3941 ("sched: Move the second half
of ttwu() to the remote cpu") trace_sched_wakeup() can happen out of
context of the waker.

This is a problem when you want to analyse wakeup paths because it is
now very hard to correlate the wakeup event to whoever issued the
wakeup.

OTOH trace_sched_wakeup() is issued at the point where we set
p->state = TASK_RUNNING, which is right were we hand the task off to
the scheduler, so this is an important point when looking at
scheduling behaviour, up to here its been the wakeup path everything
hereafter is due to scheduler policy.

To bridge this gap, introduce a second tracepoint: trace_sched_waking.
It is guaranteed to be called in the waker context.
Reported-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Francis Giraldeau <francis.giraldeau@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150609091336.GQ3644@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

fbd705a0

19 5月, 2015 1 次提交

sched/wait: Introduce TASK_NOLOAD and TASK_IDLE · 80ed87c8

由 Peter Zijlstra 提交于 5月 08, 2015

Currently people use TASK_INTERRUPTIBLE to idle kthreads and wait for
'work' because TASK_UNINTERRUPTIBLE contributes to the loadavg. Having
all idle kthreads contribute to the loadavg is somewhat silly.

Now mostly this works OK, because kthreads have all their signals
masked. However there's a few sites where this is causing problems and
TASK_UNINTERRUPTIBLE should be used, except for that loadavg issue.

This patch adds TASK_NOLOAD which, when combined with
TASK_UNINTERRUPTIBLE avoids the loadavg accounting.

As most of imagined usage sites are loops where a thread wants to
idle, waiting for work, a helper TASK_IDLE is introduced.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

80ed87c8

13 12月, 2014 1 次提交

tracing/sched: Check preempt_count() for current when reading task->state · aee4e5f3

由 Steven Rostedt (Red Hat) 提交于 12月 10, 2014

When recording the state of a task for the sched_switch tracepoint a check of
task_preempt_count() is performed to see if PREEMPT_ACTIVE is set. This is
because, technically, a task being preempted is really in the TASK_RUNNING
state, and that is what should be recorded when tracing a sched_switch,
even if the task put itself into another state (it hasn't scheduled out
in that state yet).

But with the change to use per_cpu preempt counts, the
task_thread_info(p)->preempt_count is no longer used, and instead
task_preempt_count(p) is used.

The problem is that this does not use the current preempt count but a stale
one from a previous sched_switch. The task_preempt_count(p) uses
saved_preempt_count and not preempt_count(). But for tracing sched_switch,
if p is current, we really want preempt_count().

I hit this bug when I was tracing sleep and the call from do_nanosleep()
scheduled out in the "RUNNING" state.

           sleep-4290  [000] 537272.259992: sched_switch:         sleep:4290 [120] R ==> swapper/0:0 [120]
           sleep-4290  [000] 537272.260015: kernel_stack:         <stack trace>
=> __schedule (ffffffff8150864a)
=> schedule (ffffffff815089f8)
=> do_nanosleep (ffffffff8150b76c)
=> hrtimer_nanosleep (ffffffff8108d66b)
=> SyS_nanosleep (ffffffff8108d750)
=> return_to_handler (ffffffff8150e8e5)
=> tracesys_phase2 (ffffffff8150c844)

After a bit of hair pulling, I found that the state was really
TASK_INTERRUPTIBLE, but the saved_preempt_count had an old PREEMPT_ACTIVE
set and caused the sched_switch tracepoint to show it as RUNNING.

Link: http://lkml.kernel.org/r/20141210174428.3cb7542a@gandalf.local.homeAcked-by: NIngo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org # 3.13+
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 01028747 "sched: Create more preempt_count accessors"
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

aee4e5f3

28 10月, 2014 1 次提交

sched: Fix the PREEMPT_ACTIVE check in __trace_sched_switch_state() · 8f9fbf09

由 Oleg Nesterov 提交于 10月 07, 2014

task_preempt_count() has nothing to do with the actual preempt counter,
thread_info->saved_preempt_count is only valid right after switch_to().

__trace_sched_switch_state() can use preempt_count(), prev is still the
current task when trace_sched_switch() is called.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
[ Added BUG_ON(). ]
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20141007195108.GB28002@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8f9fbf09

05 6月, 2014 1 次提交

sched, trace: Add a tracepoint for IPI-less remote wakeups · dfc68f29

由 Andy Lutomirski 提交于 6月 04, 2014

Remote wakeups of polling CPUs are a valuable performance
improvement; add a tracepoint to make it much easier to verify that
they're working.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: nicolas.pitre@linaro.org
Cc: daniel.lezcano@linaro.org
Cc: umgwanakikbuti@gmail.com
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/16205aee116772aa686814f9b13bccb562108047.1401902905.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

dfc68f29

22 1月, 2014 1 次提交

sched: add tracepoints related to NUMA task migration · 286549dc

由 Mel Gorman 提交于 1月 21, 2014

This patch adds three tracepoints
 o trace_sched_move_numa	when a task is moved to a node
 o trace_sched_swap_numa	when a task is swapped with another task
 o trace_sched_stick_numa	when a numa-related migration fails

The tracepoints allow the NUMA scheduler activity to be monitored and the
following high-level metrics can be calculated

 o NUMA migrated stuck	 nr trace_sched_stick_numa
 o NUMA migrated idle	 nr trace_sched_move_numa
 o NUMA migrated swapped nr trace_sched_swap_numa
 o NUMA local swapped	 trace_sched_swap_numa src_nid == dst_nid (should never happen)
 o NUMA remote swapped	 trace_sched_swap_numa src_nid != dst_nid (should == NUMA migrated swapped)
 o NUMA group swapped	 trace_sched_swap_numa src_ngid == dst_ngid
			 Maybe a small number of these are acceptable
			 but a high number would be a major surprise.
			 It would be even worse if bounces are frequent.
 o NUMA avg task migs.	 Average number of migrations for tasks
 o NUMA stddev task mig	 Self-explanatory
 o NUMA max task migs.	 Maximum number of migrations for a single task

In general the intent of the tracepoints is to help diagnose problems
where automatic NUMA balancing appears to be doing an excessive amount
of useless work.

[akpm@linux-foundation.org: remove semicolon-after-if, repair coding-style]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

286549dc

31 10月, 2013 1 次提交

hung_task debugging: Add tracepoint to report the hang · 6a716c90

由 Oleg Nesterov 提交于 10月 19, 2013

Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.

Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6a716c90

25 9月, 2013 1 次提交

sched: Create more preempt_count accessors · 01028747

由 Peter Zijlstra 提交于 8月 14, 2013

We need a few special preempt_count accessors:
 - task_preempt_count() for when we're interested in the preemption
   count of another (non-running) task.
 - init_task_preempt_count() for properly initializing the preemption
   count.
 - init_idle_preempt_count() a special case of the above for the idle
   threads.

With these no generic code ever touches thread_info::preempt_count
anymore and architectures could choose to remove it.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-jf5swrio8l78j37d06fzmo4r@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

01028747

14 8月, 2013 2 次提交

tracing/perf: Reimplement TP_perf_assign() logic · 12473965

由 Oleg Nesterov 提交于 8月 06, 2013

The next patch tries to avoid the costly perf_trace_buf_* calls
when possible but there is a problem. We can only do this if
__task == NULL, perf_tp_event(task != NULL) has the additional
code for this case.

Unfortunately, TP_perf_assign/__perf_xxx which changes the default
values of __count/__task variables for perf_trace_buf_submit() is
called "too late", after we already did perf_trace_buf_prepare(),
and the optimization above can't work.

So this patch simply embeds __perf_xxx() into TP_ARGS(), this way
DECLARE_EVENT_CLASS() can use the result of assignments hidden in
"args" right after ftrace_get_offsets_##call() which is mostly
trivial. This allows us to have the fast-path "__task != NULL"
check at the start, see the next patch.

Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.comTested-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

12473965

tracing/perf: Expand TRACE_EVENT(sched_stat_runtime) · 36009d07

由 Oleg Nesterov 提交于 8月 06, 2013

To simplify the review of the next patches:

1. We are going to reimplent __perf_task/counter and embedd them
   into TP_ARGS(). expand TRACE_EVENT(sched_stat_runtime) into
   DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
   different TP_ARGS's.

2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
   right before perf_trace_buf_prepare().

   This way it evaluates TP_ARGS() asap, the next patch explores
   this fact.

   Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
   "struct pt_regs *regs", perhaps it makes sense to remove this
   argument. And perhaps we can teach perf_trace_buf_submit()
   to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
   in this case.

3. Cosmetic, but the typecast from "void*" buys nothing. It just
   adds the noise, remove it.

Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.comAcked-by: NPeter Zijlstra <peterz@infradead.org>
Tested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

36009d07

12 4月, 2013 1 次提交

kthread: Prevent unpark race which puts threads on the wrong cpu · f2530dc7

由 Thomas Gleixner 提交于 4月 09, 2013

The smpboot threads rely on the park/unpark mechanism which binds per
cpu threads on a particular core. Though the functionality is racy:

CPU0	       	 	CPU1  	     	    CPU2
unpark(T)				    wake_up_process(T)
  clear(SHOULD_PARK)	T runs
			leave parkme() due to !SHOULD_PARK  
  bind_to(CPU2)		BUG_ON(wrong CPU)						    

We cannot let the tasks move themself to the target CPU as one of
those tasks is actually the migration thread itself, which requires
that it starts running on the target cpu right away.

The solution to this problem is to prevent wakeups in park mode which
are not from unpark(). That way we can guarantee that the association
of the task to the target cpu is working correctly.

Add a new task state (TASK_PARKED) which prevents other wakeups and
use this state explicitly for the unpark wakeup.

Peter noticed: Also, since the task state is visible to userspace and
all the parked tasks are still in the PID space, its a good hint in ps
and friends that these tasks aren't really there for the moment.

The migration thread has another related issue.

CPU0	      	     	 CPU1
Bring up CPU2
create_thread(T)
park(T)
 wait_for_completion()
			 parkme()
			 complete()
sched_set_stop_task()
			 schedule(TASK_PARKED)

The sched_set_stop_task() call is issued while the task is on the
runqueue of CPU1 and that confuses the hell out of the stop_task class
on that cpu. So we need the same synchronizaion before
sched_set_stop_task().
Reported-by: NDave Jones <davej@redhat.com>
Reported-and-tested-by: NDave Hansen <dave@sr71.net>
Reported-and-tested-by: NBorislav Petkov <bp@alien8.de>
Acked-by: NPeter Ziljstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: dhillf@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f2530dc7

31 7月, 2012 1 次提交

perf/trace: Add ability to set a target task for events · e6dab5ff

由 Andrew Vagin 提交于 7月 11, 2012

A few events are interesting not only for a current task.
For example, sched_stat_* events are interesting for a task
which wakes up. For this reason, it will be good if such
events will be delivered to a target task too.

Now a target task can be set by using __perf_task().

The original idea and a draft patch belongs to Peter Zijlstra.

I need these events for profiling sleep times. sched_switch is used for
getting callchains and sched_stat_* is used for getting time periods.
These events are combined in user space, then it can be analyzed by
perf tools.
Inspired-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arun Sharma <asharma@fb.com>
Signed-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e6dab5ff

31 3月, 2012 1 次提交

tracing, sched, vfs: Fix 'old_pid' usage in trace_sched_process_exec() · 6308191f

由 Oleg Nesterov 提交于 3月 30, 2012

1. TRACE_EVENT(sched_process_exec) forgets to actually use the
   old pid argument, it sets ->old_pid = p->pid.

2. search_binary_handler() uses the wrong pid number. tracepoint
   needs the global pid_t from the root namespace, while old_pid
   is the virtual pid number as it seen by the tracer/parent.

With this patch we have two pid_t's in search_binary_handler(),
not really nice. Perhaps we should switch to "struct pid*", but
in this case it would be better to cleanup the current code
first and move the "depth == 0" code outside.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: David Smith <dsmith@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Link: http://lkml.kernel.org/r/20120330162636.GA4857@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6308191f

23 2月, 2012 1 次提交

tracepoint, vfs, sched: Add exec() tracepoint · 4ff16c25

由 David Smith 提交于 2月 07, 2012

Added a minimal exec tracepoint. Exec is an important major event
in the life of a task, like fork(), clone() or exit(), all of
which we already trace.

[ We also do scheduling re-balancing during exec() - so it's useful
  from a scheduler instrumentation POV as well. ]

If you want to watch a task start up, when it gets exec'ed is a good place
to start.  With the addition of this tracepoint, exec's can be monitored
and better picture of general system activity can be obtained. This
tracepoint will also enable better process life tracking, allowing you to
answer questions like "what process keeps starting up binary X?".

This tracepoint can also be useful in ftrace filtering and trigger
conditions: i.e. starting or stopping filtering when exec is called.
Signed-off-by: NDavid Smith <dsmith@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/4F314D19.7030504@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

4ff16c25

22 2月, 2012 1 次提交

sched/events: Revert trace_sched_stat_sleeptime() · 8c79a045

由 Peter Zijlstra 提交于 1月 30, 2012

Commit 1ac9bc69 ("sched/tracing: Add a new tracepoint for sleeptime")
added a new sched:sched_stat_sleeptime tracepoint.

It's broken: the first sample we get on a task might be bad because
of a stale sleep_start value that wasn't reset at the last task switch
because the tracepoint was not active.

It also breaks the existing schedstat samples due to the side
effects of:

-               se->statistics.sleep_start = 0;
...
-               se->statistics.block_start = 0;

Nor do I see means to fix it without adding overhead to the scheduler
fast path, which I'm not willing to for the sake of redundant
instrumentation.

Most importantly, sleep time information can already be constructed
by tracing context switches and wakeups, and taking the timestamp
difference between the schedule-out, the wakeup and the schedule-in.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-pc4c9qhl8q6vg3bs4j6k0rbd@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

8c79a045

24 12月, 2011 1 次提交

sched/tracing: Add a new tracepoint for sleeptime · 1ac9bc69

由 Arun Sharma 提交于 12月 21, 2011

If CONFIG_SCHEDSTATS is defined, the kernel maintains
information about how long the task was sleeping or
in the case of iowait, blocking in the kernel before
getting woken up.

This will be useful for sleep time profiling.

Note: this information is only provided for sched_fair.
Other scheduling classes may choose to provide this in
the future.

Note: the delay includes the time spent on the runqueue
as well.
Signed-off-by: NArun Sharma <asharma@fb.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/1324512940-32060-2-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1ac9bc69

06 12月, 2011 1 次提交

events, sched: Add tracepoint for accounting blocked time · b781a602

由 Andrew Vagin 提交于 11月 28, 2011

This tracepoint shows how long a task is sleeping in uninterruptible state.

E.g. it may show how long and where a mutex is waited for.
Signed-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

b781a602

26 9月, 2011 1 次提交

sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch · 557ab425

由 Peter Zijlstra 提交于 9月 16, 2011

We had need to see the difference between scheduling a runnable task and
a runnable task being involuntarily preempted.

No app should rely on the old string output (the binary
trace event record format is not changed).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

557ab425

21 9月, 2010 1 次提交

tracing/sched: Add sched_pi_setprio tracepoint · a8027073

由 Steven Rostedt 提交于 9月 20, 2010

Add a tracepoint that shows the priority of a task being boosted
via priority inheritance.

Cc: Gregory Haskins <ghaskins@novell.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a8027073

29 6月, 2010 1 次提交

tracing: Convert more sched events to DEFINE_EVENT · 210f7669

由 Li Zefan 提交于 5月 24, 2010

Convert sched_wait_task to DEFINE_EVENT, and save ~1K:

   text    data     bss     dec     hex filename
 104595    9424    4992  119011   1d0e3 kernel/sched.o.orig
 103619    9344    4992  117955   1ccc3 kernel/sched.o

No change in functionality.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4BFA3787.2040800@cn.fujitsu.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

210f7669

01 6月, 2010 1 次提交

sched, trace: Fix sched_switch() prev_state argument · 02f72694

由 Peter Zijlstra 提交于 5月 31, 2010

For CONFIG_PREEMPT=y kernels the sched_switch(.prev_state) argument isn't
useful because we can get preempted with current->state != TASK_RUNNING
without actually getting removed from the runqueue.

Cure this by treating all preempted tasks as runnable from the tracer's
point of view.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cautiously-acked-by: NSteven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1275322715.27810.23323.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

02f72694

07 5月, 2010 1 次提交

sched: Remove rq argument to the tracepoints · 27a9da65

由 Peter Zijlstra 提交于 5月 04, 2010

struct rq isn't visible outside of sched.o so its near useless to
expose the pointer, also there are no users of it, so remove it.
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1272997616.1642.207.camel@laptop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

27a9da65

26 11月, 2009 3 次提交

tracepoint: Move signal sending tracepoint to events/signal.h · d1eb650f

由 Masami Hiramatsu 提交于 11月 24, 2009

Move signal sending event to events/signal.h. This patch also
renames sched_signal_send event to signal_generate.

Changes in v4:
 - Fix a typo of task_struct pointer.

Changes in v3:
 - Add docbook style comments

Changes in v2:
 - Add siginfo argument
 - Add siginfo storing macro
Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
Reviewed-by: NJason Baron <jbaron@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Oleg Nesterov <oleg@redhat.com>
LKML-Reference: <20091124215645.30449.60208.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d1eb650f

tracing: Restore original format of sched events · 470dda74

由 Li Zefan 提交于 11月 26, 2009

The original format for sched_stat_iowait and sched_stat_sleep:

  $ cat events/sched/sched_stat_iowait/format
  ...
  print fmt: "comm=%s pid=%d delay=%Lu [ns]", ...
  $ cat events/sched/sched_stat_sleep/format
  ...
  print fmt: "comm=%s pid=%d delay=%Lu [ns]", ...

But commit commit 75ec29ab
("tracing: Convert some sched trace events to DEFINE_EVENT and
_PRINT") broke the format:

  $ cat events/sched/sched_stat_iowait/format
  print fmt: "task: %s:%d iowait: %Lu [ns]", ...
  $ cat events/sched/sched_stat_sleep/format
  print fmt: "task: %s:%d sleep: %Lu [ns]", ...

No change in functionality.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E2951.9050800@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

470dda74

events: Rename TRACE_EVENT_TEMPLATE() to DECLARE_EVENT_CLASS() · 091ad365

由 Ingo Molnar 提交于 11月 26, 2009

It is not quite obvious at first sight what TRACE_EVENT_TEMPLATE
does: does it define an event as well beyond defining a template?

To clarify this, rename it to DECLARE_EVENT_CLASS, which follows
the various 'DECLARE_*()' idioms we already have in the kernel:

  DECLARE_EVENT_CLASS(class)

    DEFINE_EVENT(class, event1)
    DEFINE_EVENT(class, event2)
    DEFINE_EVENT(class, event3)

To complete this logic we should also rename TRACE_EVENT() to:

  DEFINE_SINGLE_EVENT(single_event)

... but in a more quiet moment of the kernel cycle.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4B0E286A.2000405@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

091ad365

25 11月, 2009 1 次提交

tracing: Convert some sched trace events to DEFINE_EVENT and _PRINT · 75ec29ab

由 Steven Rostedt 提交于 11月 18, 2009

Converting some of the scheduler trace events to use the
TRACE_EVENT_TEMPLATE, DEFINE_EVENT and DEFINE_EVENT_PRINT helped to
save some space:

$ size kernel/sched.o-*
   text	   data	    bss	    dec	    hex	filename
  79299	   6776	   2520	  88595	  15a13	kernel/sched.o-notrace
 101941	  11896	   2584	 116421	  1c6c5	kernel/sched.o-templ
 104779	  11896	   2584	 119259	  1d1db	kernel/sched.o-trace

sched.o-notrace is without any tracepoints compiled
sched.o-templ is with this patch
sched.o-trace is the tracepoints before this patch

The trace events converted to DEFINE_EVENT:

sched_wakeup, sched_wakeup_new, sched_process_free, sched_process_exit,
and sched_stat_wait.

The trace events converted to DEFINE_EVENT_PRINT:

sched_stat_sleep and sched_stat_iowait.

Note, since the TRACE_EVENT_TEMPLATE always uses a print, the
sched_stat_wait print format is defined in the template and this
template is used by sched_stat_sleep and sched_stat_iowait. But the
later two override the print format.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

75ec29ab

15 10月, 2009 1 次提交

events: Harmonize event field names and print output names · 434a83c3

由 Ingo Molnar 提交于 10月 15, 2009

Now that we can filter based on fields via perf record, people
will start using filter expressions and will expect them to
be obvious.

The primary way to see which fields are available is by looking
at the trace output, such as:

  gcc-18676 [000]   343.011728: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.012727: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.032692: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.033690: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.034687: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.035686: irq_handler_entry: irq=0 handler=timer
  cc1-18677 [000]   343.036684: irq_handler_entry: irq=0 handler=timer

While 'irq==0' filters work, the 'handler==<x>' filter expression
does not work:

  $ perf record -R -f -a -e irq:irq_handler_entry --filter handler=timer sleep 1
   Error: failed to set filter with 22 (Invalid argument)

The problem is that while an 'irq' field exists and is recognized
as a filter field - 'handler' does not exist - its name is 'name'
in the output.

To solve this, we need to synchronize the printout and the field
names, wherever possible.

In cases where the printout prints a non-field, we enclose
that information in square brackets, such as:

  perf-1380  [013]   724.903505: softirq_exit: vec=9 [action=RCU]
  perf-1380  [013]   724.904482: softirq_exit: vec=1 [action=TIMER]

This way users can use filter expressions more intuitively: all
fields that show up as 'primary' (non-bracketed) information is
filterable.

This patch harmonizes the field names for all irq, bkl, power,
sched and timer events.

We might in fact think about dropping the print format bit of
generic tracepoints altogether, and just print the fields that
are being recorded.

Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

434a83c3

14 9月, 2009 1 次提交

perf_counter, sched: Add sched_stat_runtime tracepoint · f977bb49

由 Ingo Molnar 提交于 9月 13, 2009

This allows more precise tracking of how the scheduler accounts
(and acts upon) a task having spent N nanoseconds of CPU time.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f977bb49

02 9月, 2009 1 次提交

sched: Add wait, sleep and iowait accounting tracepoints · 768d0c27

由 Peter Zijlstra 提交于 7月 23, 2009

Add 3 schedstat tracepoints to help account for wait-time,
sleep-time and iowait-time.

They can also be used as a perf-counter source to profile tasks
on these clocks.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <new-submission>
[ build fix for the !CONFIG_SCHEDSTATS case ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

768d0c27

26 8月, 2009 1 次提交

tracing/sched: show CPU task wakes up on in trace event · f0693c8b

由 Steven Rostedt 提交于 8月 06, 2009

While debugging the scheduler push / pull algorithm, I found
it very annoying that the sched wake up events did not show
the CPU that the task was waking on. In order to analyze the
scheduler, I needed that information.

This patch adds recording of the CPU that a task is waking up
on.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f0693c8b

13 7月, 2009 1 次提交

tracing/events: Move TRACE_SYSTEM outside of include guard · d0b6e04a

由 Li Zefan 提交于 7月 13, 2009

If TRACE_INCLDUE_FILE is defined, <trace/events/TRACE_INCLUDE_FILE.h>
will be included and compiled, otherwise it will be
<trace/events/TRACE_SYSTEM.h>

So TRACE_SYSTEM should be defined outside of #if proctection,
just like TRACE_INCLUDE_FILE.

Imaging this scenario:

 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == foo
 ...
 #include <trace/events/bar.h>
    -> TRACE_SYSTEM == bar
 ...
 #define CREATE_TRACE_POINTS
 #include <trace/events/foo.h>
    -> TRACE_SYSTEM == bar !!!

and then bar.h will be included and compiled.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A5A9CF1.2010007@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d0b6e04a

27 5月, 2009 1 次提交

tracing: add previous task state info to sched switch event · 937cdb9d

由 Steven Rostedt 提交于 5月 15, 2009

It is useful to see the state of a task that is being switched out.
This patch adds the output of the state of the previous task in
the context switch event.

[ Impact: see state of switched out task in context switch ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

937cdb9d

06 5月, 2009 1 次提交

tracepoint: trace_sched_migrate_task(): remove parameter · de1d7286

由 Mathieu Desnoyers 提交于 5月 05, 2009

The orig_cpu parameter in trace_sched_migrate_task() is not necessary,
it can be got by using task_cpu(p) in the probe.

[ Impact: micro-optimization ]
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
[ modified from Mathieu's patch. The original patch is at:
  http://marc.info/?l=linux-kernel&m=123791201716239&w=2 ]
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: fweisbec@gmail.com
Cc: rostedt@goodmis.org
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: zhaolei@cn.fujitsu.com
Cc: laijs@cn.fujitsu.com
LKML-Reference: <49FFFDB7.1050402@cn.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

de1d7286

15 4月, 2009 1 次提交

tracing/events: move trace point headers into include/trace/events · ad8d75ff

由 Steven Rostedt 提交于 4月 14, 2009

Impact: clean up

Create a sub directory in include/trace called events to keep the
trace point headers in their own separate directory. Only headers that
declare trace points should be defined in this directory.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ad8d75ff

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功