Commits · 94dcf29a11b3d20a28790598d701f98484a969da · openanolis / cloud-kernel

23 Mar, 2011 · 1 commit

kthread: use kthread_create_on_node() · 94dcf29a

Committed by Eric Dumazet on Mar 22, 2011

ksoftirqd, kworker, migration, and pktgend kthreads can be created with
kthread_create_on_node(), to get proper NUMA affinities for their stack and
task_struct.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 Feb, 2011 · 1 commit

genirq: Provide forced interrupt threading · 8d32a307

Committed by Thomas Gleixner on Feb 23, 2011

Add a commandline parameter "threadirqs" which forces all interrupts except
those marked IRQF_NO_THREAD to run threaded. That's mostly a debug option to
allow retrieving better debug data from crashing interrupt handlers. If
"threadirqs" is not enabled on the kernel command line, then there is no
impact in the interrupt hotpath.

Architecture code needs to select CONFIG_IRQ_FORCED_THREADING after
marking the interrupts which can't be threaded IRQF_NO_THREAD. All
interrupts which have IRQF_TIMER set are implicitly marked
IRQF_NO_THREAD. Also, all PER_CPU interrupts are excluded.
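The exclusion rules above can be modelled as a small predicate. This is a user-space sketch under stated assumptions: the flag values are invented for illustration (they are not the real bits from <linux/interrupt.h>), and `irq_gets_forced_thread()` is a hypothetical helper, not the genirq code path.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values (hypothetical; not the real <linux/interrupt.h> bits). */
#define IRQF_NO_THREAD 0x1u
#define IRQF_PERCPU    0x2u
#define __IRQF_TIMER   0x4u
/* A timer interrupt is implicitly marked IRQF_NO_THREAD. */
#define IRQF_TIMER     (__IRQF_TIMER | IRQF_NO_THREAD)

/*
 * Hypothetical helper: would a handler registered with @flags be forced
 * into a thread?  @threadirqs models "threadirqs" on the command line.
 */
static bool irq_gets_forced_thread(bool threadirqs, unsigned int flags)
{
	if (!threadirqs)
		return false;	/* option off: hotpath unchanged */
	if (flags & (IRQF_NO_THREAD | IRQF_PERCPU))
		return false;	/* explicitly or implicitly excluded */
	return true;
}
```

Because IRQF_TIMER already contains the IRQF_NO_THREAD bit in this model, a single mask test covers both the explicit and the implicit exclusion.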

Forcing hard interrupts into threads also forces all softirq
handling into thread context.

When enabled, it might slow things down a bit, but for debugging problems in
interrupt code it's a reasonable penalty, as it does not immediately
crash and burn the machine when an interrupt handler is buggy.

Some test results on a Core2Duo machine:

Cache cold run of:
 # time git grep irq_desc

      non-threaded       threaded
 real 1m18.741s          1m19.061s
 user 0m1.874s           0m1.757s
 sys  0m5.843s           0m5.427s

 # iperf -c server
non-threaded
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   933 Mbits/sec
threaded
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   934 Mbits/sec
[  3]  0.0-10.0 sec  1.09 GBytes   937 Mbits/sec
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20110223234956.772668648@linutronix.de>

09 Feb, 2011 · 1 commit

softirq: Avoid stack switch from ksoftirqd · c305d524

Committed by Thomas Gleixner on Feb 02, 2011

ksoftirqd() calls do_softirq() which switches stacks on several
architectures. That makes no sense at all. ksoftirqd's stack is
sufficient.

Call __do_softirq() directly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Frank Rowand <frank.rowand@am.sony.com>
LKML-Reference: <alpine.LFD.2.00.1102021704530.31804@localhost6.localdomain6>

26 Jan, 2011 · 1 commit

softirqs: Free up pf flag PF_KSOFTIRQD · 4dd53d89

Committed by Venkatesh Pallipadi on Dec 21, 2010

Cleanup patch: free up PF_KSOFTIRQD and use the per_cpu ksoftirqd pointer
instead, as suggested by Eric Dumazet.
Tested-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1292980144-28796-2-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

14 Jan, 2011 · 1 commit

kernel: clean up USE_GENERIC_SMP_HELPERS · 351f8f8e

Committed by Amerigo Wang on Jan 12, 2011

An architecture that needs USE_GENERIC_SMP_HELPERS has to select
USE_GENERIC_SMP_HELPERS, rather than leaving the choice to the user, since
it doesn't provide its own implementation.

Also, move on_each_cpu() to kernel/smp.c; it is strange to put it in
kernel/softirq.c.

For architectures which don't use USE_GENERIC_SMP_HELPERS, e.g. blackfin,
only on_each_cpu() is compiled.
Signed-off-by: Amerigo Wang <amwang@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Jan, 2011 · 1 commit

sched: Constify function scope static struct sched_param usage · c9b5f501

Committed by Peter Zijlstra on Jan 07, 2011

Function-scope statics are discouraged because they are
easily overlooked and can cause subtle bugs/races due to
their global (non-SMP safe) nature.

Linus noticed that we did this for sched_param; at minimum,
make them const.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: Message-ID: <AANLkTinotRxScOHEb0HgFgSpGPkq_6jKTv5CfvnQM=ee@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

17 Dec, 2010 · 1 commit

core: Replace __get_cpu_var with __this_cpu_read if not used for an address. · 909ea964

Committed by Christoph Lameter on Dec 08, 2010

__get_cpu_var() can be replaced with this_cpu_read and will then use a
single read instruction with implied address calculation to access the
correct per cpu instance.

However, the address of a per cpu variable passed to __this_cpu_read()
cannot be determined (since it's an implied address conversion through
segment prefixes).  Therefore apply this only to uses of __get_cpu_var
where the address of the variable is not used.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Hugh Dickins <hughd@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

23 Oct, 2010 · 1 commit

sched: Make sched_param argument static in sched_setscheduler() callers · fe7de49f

Committed by KOSAKI Motohiro on Oct 20, 2010

Andrew Morton pointed out that almost all sched_setscheduler() callers
use fixed parameters and can be converted to static. This reduces runtime
memory use a little.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

21 Oct, 2010 · 1 commit

tracing: Cleanup the convoluted softirq tracepoints · f4bc6bb2

Committed by Thomas Gleixner on Oct 19, 2010

With the addition of trace_softirq_raise() the softirq tracepoint got
even more convoluted. Why the tracepoints take two pointers to assign
an integer is beyond my comprehension.

But adding an extra case which treats the first pointer as an unsigned
long when the second pointer is NULL including the back and forth
type casting is just horrible.

Convert the softirq tracepoints to take a single unsigned int argument
for the softirq vector number and fix the call sites.
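After this change a call site computes the vector number once and hands the tracepoint a plain unsigned int. A minimal user-space sketch, assuming a 10-entry vector; `trace_softirq_entry()` here is a hypothetical stand-in that prints to stdout (not the TRACE_EVENT-generated function), and `run_softirq()` is likewise illustrative.

```c
#include <assert.h>
#include <stdio.h>

#define NR_SOFTIRQS 10	/* assumed vector count for this sketch */

struct softirq_action {
	void (*action)(struct softirq_action *);
};

static struct softirq_action softirq_vec[NR_SOFTIRQS];

/* Hypothetical stand-in for the tracepoint: it takes one plain vector number. */
static unsigned int last_traced_vec = NR_SOFTIRQS;

static void trace_softirq_entry(unsigned int vec_nr)
{
	last_traced_vec = vec_nr;
	printf("softirq_entry: vec=%u\n", vec_nr);
}

/* Call site: derive the index once by pointer subtraction, then trace it. */
static void run_softirq(struct softirq_action *h)
{
	assert(h >= softirq_vec && h < softirq_vec + NR_SOFTIRQS);
	unsigned int vec_nr = (unsigned int)(h - softirq_vec);

	trace_softirq_entry(vec_nr);
	if (h->action)
		h->action(h);
}
```

The tracepoint no longer needs to know about the softirq_vec array at all; the pointer arithmetic lives in one place at the call site.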
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: mathieu.desnoyers@efficios.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>

19 Oct, 2010 · 3 commits

sched: Call tick_check_idle before __irq_enter · d267f87f

Committed by Venkatesh Pallipadi on Oct 04, 2010

When the CPU is idle, on the first interrupt irq_enter calls tick_check_idle()
to notify the interruption from idle. But there is a problem if this call
is done after __irq_enter, as routines in __irq_enter may find stale
time because tick_check_idle has not run yet.

Specifically, this affects trace calls in __irq_enter when they use the global
clock, and also the account_system_vtime change in this patch, as it wants to
use sched_clock_cpu() to do proper irq timing.

But tick_check_idle was moved after __irq_enter intentionally, to
prevent the problem of unneeded ksoftirqd wakeups, by commit ee5f80a9:

    irq: call __irq_enter() before calling the tick_idle_check
    Impact: avoid spurious ksoftirqd wakeups

Moving tick_check_idle() before __irq_enter and wrapping it with
local_bh_disable/enable solves both problems.
Fixed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-9-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Add a PF flag for ksoftirqd identification · 6cdd5199

Committed by Venkatesh Pallipadi on Oct 04, 2010

To account softirq time cleanly in the scheduler, we need to identify whether
a softirq is invoked in ksoftirqd context or at hardirq tail context.
Add PF_KSOFTIRQD for that purpose.

As all PF flag bits are currently taken, create space by moving one of the
infrequently used bits (PF_THREAD_BOUND) down in task_struct, alongside
some other state fields.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-4-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: Fix softirq time accounting · 75e1056f

Committed by Venkatesh Pallipadi on Oct 04, 2010

Peter Zijlstra found a bug in the way softirq time is accounted in
VIRT_CPU_ACCOUNTING on this thread:

http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

The problem is that softirq processing uses local_bh_disable internally. There
is no way, later in the flow, to tell whether a softirq is being processed or
bh has merely been disabled. So a hardirq that arrives while bh is disabled
results in time being wrongly accounted as softirq.

Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
as well, because account_system_time() in normal tick-based accounting also
uses softirq_count, which is nonzero even when not in softirq if bh is disabled.

Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq count
for local_bh_{disable,enable} and just SOFTIRQ_OFFSET during softirq
processing. The patch below does that and adds the API in_serving_softirq(),
which returns whether we are currently processing a softirq.

It also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
to in_serving_softirq.

Looks like many usages of in_softirq really want in_serving_softirq. Those
changes can be made individually on a case by case basis.
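The counting scheme can be modelled in user space. The bit layout (softirq byte at bits 8..15) is assumed from kernels of this era, `preempt_count` is a plain global standing in for the per-task counter, and `serve_softirq_begin()`/`serve_softirq_end()` are hypothetical names for what __do_softirq() does around its loop. This local_bh_enable() does not run pending softirqs.

```c
#include <assert.h>
#include <stdbool.h>

/* Softirq byte of preempt_count, assumed to occupy bits 8..15. */
#define SOFTIRQ_SHIFT          8
#define SOFTIRQ_MASK           (0xffu << SOFTIRQ_SHIFT)
#define SOFTIRQ_OFFSET         (1u << SOFTIRQ_SHIFT)
/* bh disabling counts in steps of two, so the lowest softirq bit is
 * set only while a softirq is actually being served. */
#define SOFTIRQ_DISABLE_OFFSET (2u * SOFTIRQ_OFFSET)

static unsigned int preempt_count;	/* stands in for the per-task counter */

static unsigned int softirq_count(void) { return preempt_count & SOFTIRQ_MASK; }
static bool in_softirq(void) { return softirq_count() != 0; }
static bool in_serving_softirq(void) { return (softirq_count() & SOFTIRQ_OFFSET) != 0; }

static void local_bh_disable(void) { preempt_count += SOFTIRQ_DISABLE_OFFSET; }

static void local_bh_enable(void)
{
	assert(softirq_count() >= SOFTIRQ_DISABLE_OFFSET);
	preempt_count -= SOFTIRQ_DISABLE_OFFSET;
}

/* Hypothetical helpers bracketing softirq processing. */
static void serve_softirq_begin(void) { preempt_count += SOFTIRQ_OFFSET; }

static void serve_softirq_end(void)
{
	assert(in_serving_softirq());
	preempt_count -= SOFTIRQ_OFFSET;
}
```

With bh merely disabled, in_softirq() is true but in_serving_softirq() is false, which is exactly the distinction the accounting code needs.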
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

12 Oct, 2010 · 2 commits

genirq: Remove arch_init_chip_data() · b7d0d825

Committed by Thomas Gleixner on Sep 29, 2010

This function should not have been there in the first place.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>

genirq: Query arch for number of early descriptors · b683de2b

Committed by Thomas Gleixner on Sep 27, 2010

sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go
ahead and allocate more.

Use the unused return value of arch_probe_nr_irqs() to let the
architecture return the number of early allocations. Fix up all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>

22 Sep, 2010 · 1 commit

softirqs: Make wakeup_softirqd static · 676cb02d

Committed by Thomas Gleixner on Jul 20, 2009

No users outside of kernel/softirq.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

05 Jun, 2010 · 1 commit

kernel/: fix BUG_ON checks for cpu notifier callbacks direct call · 9e506f7a

Committed by Akinobu Mita on Jun 04, 2010

Commit 80b5184c ("kernel/: convert cpu notifier to return encapsulate
errno value") changed the return value of cpu notifier callbacks.

Those callbacks don't return NOTIFY_BAD on failure anymore. But there
are a few callbacks which are called directly at init time, and their
callers check the return value.

I forgot to change the BUG_ON checks in those direct callers in that commit.
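The failure mode is easy to see with the errno encoding. The constants and helpers below are written to match <linux/notifier.h> as it is believed to look (verify against the tree); `failing_callback()` is hypothetical, and `new_check_fires()` shows one check, `ret != NOTIFY_OK`, that catches encapsulated errors. That check is an assumption about the shape of the fix, not a quote from it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Notifier return codes, believed to match <linux/notifier.h>. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Encode a negative errno into a notifier return value. */
static int notifier_from_errno(int err)
{
	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode it again; returns 0 when no errno is encapsulated. */
static int notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* A hypothetical callback that fails with -ENOMEM. */
static int failing_callback(void)
{
	return notifier_from_errno(-ENOMEM);
}

/* Old direct-caller check vs. a check that catches any non-OK result. */
static bool old_check_fires(int ret) { return ret == NOTIFY_BAD; }
static bool new_check_fires(int ret) { return ret != NOTIFY_OK; }
```

An -ENOMEM failure encodes to NOTIFY_STOP_MASK | 13, which is not equal to NOTIFY_BAD, so a `BUG_ON(err == NOTIFY_BAD)` style check silently lets it through.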
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 May, 2010 · 1 commit

kernel/: convert cpu notifier to return encapsulate errno value · 80b5184c

Committed by Akinobu Mita on May 26, 2010

With the previous modification, the cpu notifier can return an encapsulated
errno value. This converts the cpu notifiers in kernel/*.c.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 May, 2010 · 1 commit

rcu: refactor RCU's context-switch handling · 25502a6c

Committed by Paul E. McKenney on Apr 01, 2010

The addition of preemptible RCU to treercu resulted in a bit of
confusion and inefficiency surrounding the handling of context switches
for RCU-sched and for RCU-preempt. For RCU-sched, a context switch
is a quiescent state, pure and simple, just like it always has been.
For RCU-preempt, a context switch is in no way a quiescent state, but
special handling is required when a task blocks in an RCU read-side
critical section.

However, the callout from the scheduler and the outer loop in ksoftirqd
still call something named rcu_sched_qs(), whose name is no longer
accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched
quiescent state, it ends up unnecessarily (though harmlessly, aside
from the performance hit) enqueuing the current task if it happens to
be running in an RCU-preempt read-side critical section. This not only
increases the maximum latency of scheduler_tick(), it also needlessly
increases the overhead of the next outermost rcu_read_unlock() invocation.

This patch addresses this situation by separating the notion of RCU's
context-switch handling from that of RCU-sched's quiescent states.
The context-switch handling is covered by rcu_note_context_switch() in
general and by rcu_preempt_note_context_switch() for preemptible RCU.
This permits rcu_sched_qs() to handle quiescent states and only quiescent
states. It also reduces the maximum latency of scheduler_tick(), though
probably by much less than a microsecond. Finally, it means that tasks
within preemptible-RCU read-side critical sections avoid incurring the
overhead of queuing unless there really is a context switch.
Suggested-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>

04 Feb, 2010 · 1 commit

hrtimer, softirq: Fix hrtimer->softirq trampoline · b9c30322

Committed by Peter Zijlstra on Feb 03, 2010

hrtimer callbacks are always run from hardirq context, either the
jiffy tick interrupt or the hrtimer device interrupt.

[ there is currently one exception that can still call a hrtimer
  callback from softirq, but even in that case this will still
  work correctly. ]
Reported-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yury Polyanskiy <ypolyans@princeton.edu>
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <1265120401.24455.306.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

02 Nov, 2009 · 1 commit

rcu: Cleanup: balance rcu_irq_enter()/rcu_irq_exit() calls · c5e0cb3d

Committed by Lai Jiangshan on Oct 28, 2009

Currently, rcu_irq_exit() is invoked only for CONFIG_NO_HZ,
while rcu_irq_enter() is invoked unconditionally.  This patch
moves rcu_irq_exit() out from under CONFIG_NO_HZ so that the
calls are balanced.

This patch has no effect on the behavior of the kernel because
both rcu_irq_enter() and rcu_irq_exit() are empty for
!CONFIG_NO_HZ, but the code is easier to understand if the calls
are obviously balanced in all cases.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <12567428891605-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

29 Oct, 2009 · 1 commit

percpu: make percpu symbols under kernel/ and mm/ unique · 1871e52c

Committed by Tejun Heo on Oct 29, 2009

This patch updates percpu related symbols under kernel/ and mm/ such
that percpu symbols are unique and don't clash with local symbols.
This serves two purposes: decreasing the possibility of global
percpu symbol collision, and allowing the per_cpu__ prefix to be dropped
from percpu symbols.

* kernel/lockdep.c: s/lock_stats/cpu_lock_stats/

* kernel/sched.c: s/init_rq_rt/init_rt_rq_var/	(any better idea?)
  		  s/sched_group_cpus/sched_groups/

* kernel/softirq.c: s/ksoftirqd/run_ksoftirqd/a

* kernel/softlockup.c: s/(*)_timestamp/softlockup_\1_ts/
  		       s/watchdog_task/softlockup_watchdog/
		       s/timestamp/ts/ for local variables

* kernel/time/timer_stats: s/lookup_lock/tstats_lookup_lock/

* mm/slab.c: s/reap_work/slab_reap_work/
  	     s/reap_node/slab_reap_node/

* mm/vmstat.c: local variable changed to avoid collision with vmstat_work

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: (slab/vmstat) Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>

18 Sep, 2009 · 1 commit

softirq: add BLOCK_IOPOLL to softirq_to_name · 5dd4de58

Committed by Li Zefan on Sep 17, 2009

With BLOCK_IOPOLL_SOFTIRQ added, softirq_to_name[] and
show_softirq_name() need to be updated.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AB20398.8070209@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

23 Aug, 2009 · 1 commit

rcu: Renamings to increase RCU clarity · d6714c22

Committed by Paul E. McKenney on Aug 22, 2009

Make RCU-sched, RCU-bh, and RCU-preempt be underlying
implementations, with "RCU" defined in terms of one of the
three.  Update the outdated rcu_qsctr_inc() names, as these
functions no longer increment anything.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: akpm@linux-foundation.org
Cc: mathieu.desnoyers@polymtl.ca
Cc: josht@linux.vnet.ibm.com
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
LKML-Reference: <12509746132696-git-send-email->
Signed-off-by: Ingo Molnar <mingo@elte.hu>

22 Jul, 2009 · 1 commit

softirq: introduce tasklet_hrtimer infrastructure · 9ba5f005

Committed by Peter Zijlstra on Jul 22, 2009

commit ca109491 (hrtimer: removing all ur callback modes) moved all
hrtimer callbacks into hard interrupt context when high resolution
timers are active. That breaks code which relied on the assumption
that the callback happens in softirq context.

Provide a generic infrastructure which combines tasklets and hrtimers
together to provide an in-softirq hrtimer experience.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: torvalds@linux-foundation.org
Cc: kaber@trash.net
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1248265724.27058.1366.camel@twins>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

19 Jun, 2009 · 1 commit

softirq: introduce statistics for softirq · aa0ce5bb

Committed by Keika Kobayashi on Jun 17, 2009

There are no statistics for softirqs yet, although they
would be as helpful as the statistics for interrupts.
This patch introduces counting the number of softirqs,
which will be exported in /proc/softirqs.

When a softirq handler consumes a lot of CPU time,
/proc/stat looks like the following.

$ while :; do  cat /proc/stat | head -n1 ; sleep 10 ; done
cpu  88 0 408 739665 583 28 2 0 0
cpu  450 0 1090 740970 594 28 1294 0 0
                              ^^^^
                             softirq
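The softirq column can also be pulled out of such a line programmatically. A small sketch, assuming the /proc/stat column order user, nice, system, idle, iowait, irq, softirq (values in USER_HZ ticks); `proc_stat_softirq()` is a hypothetical helper meant for the aggregate `cpu` line only.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Extract the softirq column from the aggregate "cpu" line of /proc/stat.
 * Columns: user nice system idle iowait irq softirq ... (USER_HZ ticks).
 * The first six columns are skipped with assignment suppression (%*llu).
 * Returns 0 on success, -1 if the line does not parse.
 */
static int proc_stat_softirq(const char *line, unsigned long long *softirq)
{
	if (sscanf(line, "cpu %*llu %*llu %*llu %*llu %*llu %*llu %llu",
		   softirq) != 1)
		return -1;
	return 0;
}
```

On a live system the first line of /proc/stat can be read with fgets() and passed in. Taking the two sample lines above as consecutive iterations of the loop, the softirq column grew by 1294 - 2 = 1292 ticks over roughly 10 seconds.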

In such a situation,
/proc/softirqs shows us which softirq handlers are invoked,
and we can see the rate at which each softirq count increases.

<before>
$ cat /proc/softirqs
                CPU0       CPU1       CPU2       CPU3
HI                 0          0          0          0
TIMER         462850     462805     462782     462718
NET_TX             0          0          0        365
NET_RX          2472          2          2         40
BLOCK              0          0        381       1164
TASKLET            0          0          0        224
SCHED         462654     462689     462698     462427
RCU             3046       2423       3367       3173

<after>
$ cat /proc/softirqs
                CPU0       CPU1       CPU2       CPU3
HI                 0          0          0          0
TIMER         463361     465077     465056     464991
NET_TX            53          0          1        365
NET_RX          3757          2          2         40
BLOCK              0          0        398       1170
TASKLET            0          0          0        224
SCHED         463074     464318     464612     463330
RCU             3505       2948       3947       3673

When the CPU time of softirqs is high,
the rates of increase are as follows.
  TIMER  : 220/sec     : CPU1-3
  NET_TX : 5/sec       : CPU0
  NET_RX : 120/sec     : CPU0
  SCHED  : 40-200/sec  : all CPU
  RCU    : 45-58/sec   : all CPU

The rates of increase in idle mode are as follows.
  TIMER  : 250/sec
  SCHED  : 250/sec
  RCU    : 2/sec

It seems many softirqs for receiving packets and for RCU are invoked. This
helps when checking the system.
Signed-off-by: Keika Kobayashi <kobayashi.kk@ncos.nec.co.jp>
Reviewed-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Jun, 2009 · 1 commit

tasklets: new tasklet scheduling function · 7c692cba

Committed by Vegard Nossum on May 21, 2008

Rationale: kmemcheck needs to be able to schedule a tasklet without
touching any dynamically allocated memory _at_ _all_ (since that would
lead to a recursive page fault). This tasklet is used for writing the
error reports to the kernel log.

The new scheduling function avoids touching any other tasklets by
inserting the new tasklet at the head of the "tasklet_hi" list instead
of at the tail.
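Head versus tail insertion can be shown on a simplified singly linked list with a tail pointer. This is a user-space model only: the real code works on the per-CPU tasklet_hi_vec with interrupts disabled and also raises HI_SOFTIRQ, and the types and function names below are hypothetical. The empty-list tail fix-up in `schedule_first()` is this sketch's own choice to keep later tail appends consistent.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct tasklet_struct and the per-CPU list. */
struct tasklet {
	struct tasklet *next;
};

struct tasklet_list {
	struct tasklet *head;
	struct tasklet **tail;	/* points at the last ->next slot (or at head) */
};

static void tasklet_list_init(struct tasklet_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Normal scheduling: append, which writes into the last queued tasklet. */
static void schedule_tail(struct tasklet_list *l, struct tasklet *t)
{
	t->next = NULL;
	*l->tail = t;
	l->tail = &t->next;
}

/* "First" scheduling: push at the head; no other tasklet is written to. */
static void schedule_first(struct tasklet_list *l, struct tasklet *t)
{
	t->next = l->head;
	if (!l->head)
		l->tail = &t->next;	/* keep tail valid if the list was empty */
	l->head = t;
}
```

Tail insertion stores into the previous tail's `next` field, i.e. into another tasklet's memory; head insertion only writes the new tasklet and the list head, which is what kmemcheck needs.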

Also don't wake up the softirq thread lest the scheduler access some
tracked memory and we go down with a recursive page fault.

In this case, we'd better just wait for the maximum time of 1/HZ for the
message to appear.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>

29 Apr, 2009 · 1 commit

tracing: fix build failure on s390 · a0e39ed3

Committed by Heiko Carstens on Apr 29, 2009

"tracing: create automated trace defines" causes this compile error on s390,
as reported by Sachin Sant against linux-next:

 kernel/built-in.o: In function `__do_softirq':
 (.text+0x1c680): undefined reference to `__tracepoint_softirq_entry'

This happens because the definitions of the softirq tracepoints were moved
from kernel/softirq.c to kernel/irq/handle.c. Since s390 doesn't support
generic hardirqs, handle.c doesn't get compiled and the definitions are
missing.

So move the tracepoints to softirq.c again.

[ Impact: fix build failure on s390 ]
Reported-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <20090429135139.5fac79b8@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

28 Apr, 2009 · 1 commit

x86/irq: change irq_desc_alloc() to take node instead of cpu · 85ac16d0

Committed by Yinghai Lu on Apr 27, 2009

This simplifies the node awareness of the code. All our allocators
only deal with NUMA node ID locality, not with CPU ids, anyway, so
there's no need to maintain (and transform) a CPU id all across the
IRQ layer.

v2: keep move_irq_desc related

[ Impact: cleanup, prepare IRQ code to be NUMA-aware ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
LKML-Reference: <49F65536.2020300@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

17 Apr, 2009 · 1 commit

kernel/softirq.c: fix sparse warning · 79d381c9

Committed by H Hartley Sweeten on Apr 16, 2009

Fix sparse warning in kernel/softirq.c.

  warning: do-while statement is not a compound statement
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
LKML-Reference: <BD79186B4FD85F4B8E60E381CAEE1909015F9033@mi8nycmail19.Mi8.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

15 Apr, 2009 · 2 commits

tracing/events: move trace point headers into include/trace/events · ad8d75ff

Committed by Steven Rostedt on Apr 14, 2009

Impact: clean up

Create a subdirectory in include/trace called events to keep the
trace point headers in their own separate directory. Only headers that
declare trace points should be placed in this directory.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: create automated trace defines · a8d154b0

Committed by Steven Rostedt on Apr 10, 2009

This patch lowers the number of places a developer must modify to add
new tracepoints. The current method to add a new tracepoint
into an existing system is to write the trace point macro in the
trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
DECLARE_TRACE, then they must add the same named item into the C file
with the macro DEFINE_TRACE(name) and then add the trace point.

This change removes the need to add the DEFINE_TRACE(name).
Every file that uses the tracepoint must still include the trace/<type>.h
file, but the one C file must also add a define before including
that file.

 #define CREATE_TRACE_POINTS
 #include <trace/mytrace.h>

This will cause the trace/mytrace.h file to also produce the C code
necessary to implement the trace point.

Note: if more than one trace/<type>.h is used to create the C code,
it is best to list them all together.

 #define CREATE_TRACE_POINTS
 #include <trace/foo.h>
 #include <trace/bar.h>
 #include <trace/fido.h>

Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
the cleaner solution of the define above the includes over my first
design to have the C code include a "special" header.

This patch converts sched, irq, lockdep and skb to use this new
method.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

31 Mar, 2009 · 1 commit

hrtimer: fix rq->lock inversion (again) · 7f1e2ca9

Committed by Peter Zijlstra on Mar 13, 2009

It appears I inadvertently introduced rq->lock recursion to the
hrtimer_start() path when I delegated running already expired
timers to softirq context.

This patch fixes it by introducing a __hrtimer_start_range_ns()
method that will not use raise_softirq_irqoff() but
__raise_softirq_irqoff() which avoids the wakeup.
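The difference between the two raise variants can be modelled as follows. The structure mirrors how raise_softirq_irqoff() is understood to work (mark the vector pending, then wake ksoftirqd when not in interrupt context); the globals are hypothetical stand-ins for per-CPU state and in_interrupt(), and the wakeup is only counted, so no lock is actually modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SOFTIRQS 10	/* assumed vector count for this sketch */

static unsigned long softirq_pending;	/* per-CPU pending bitmask in the kernel */
static bool in_interrupt_ctx;		/* models in_interrupt() */
static unsigned int ksoftirqd_wakeups;	/* counts simulated wakeups */

/* In the kernel this wakes a task, which takes that CPU's rq->lock. */
static void wakeup_softirqd(void) { ksoftirqd_wakeups++; }

/* Only mark the vector pending: no wakeup, hence no rq->lock. */
static void __raise_softirq_irqoff(unsigned int nr)
{
	assert(nr < NR_SOFTIRQS);
	softirq_pending |= 1UL << nr;
}

/* Mark pending and, outside interrupt context, wake ksoftirqd to run it. */
static void raise_softirq_irqoff(unsigned int nr)
{
	__raise_softirq_irqoff(nr);
	if (!in_interrupt_ctx)
		wakeup_softirqd();
}
```

A caller that already holds rq->lock can therefore use only the double-underscore variant; the pending bit is picked up later, which is why schedule() is changed to check for pending softirqs.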

It then also changes schedule() to check for pending softirqs and
do the wakeup then. I'm not quite sure I like this last bit, nor
am I convinced it's really needed.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <20090313112301.096138802@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

13 Mar, 2009 · 4 commits

softirq: no need to have SOFTIRQ in softirq name · 899039e8

Committed by Steven Rostedt on Mar 13, 2009

Impact: clean up

It is redundant to have 'SOFTIRQ' in the softirq names.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

tracing: tracepoints for softirq entry/exit - tracepoints · 39842323

Committed by Jason Baron on Mar 12, 2009

Introduce softirq entry/exit tracepoints. These are useful for
augmenting existing tracers, and to figure out softirq frequencies and
timings.

[
  s/irq_softirq_/softirq_/ for trace point names and
  Fixed printf format in TRACE_FORMAT macro
   - Steven Rostedt
]

LKML-Reference: <20090312183603.GC3352@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

tracing: tracepoints for softirq entry/exit - add softirq-to-name array · 5d592b44

Committed by Jason Baron on Mar 12, 2009

Create a 'softirq_to_name' array, which is indexed by softirq #, so
that we can easily convert between the softirq index # and its name, in
order to get more meaningful output messages.
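A name table indexed by vector number can be sketched as below. The enum order and the short names are assumptions: they follow the later cleanup that dropped the "SOFTIRQ" suffix and include BLOCK_IOPOLL and HRTIMER, so the array added by this commit may differ. `softirq_name()` and `softirq_from_name()` are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Vector numbers in an assumed order for this sketch. */
enum {
	HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ, BLOCK_IOPOLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

/* Designated initializers tie each name to its vector number. */
static const char *const softirq_to_name[NR_SOFTIRQS] = {
	[HI_SOFTIRQ] = "HI",		[TIMER_SOFTIRQ] = "TIMER",
	[NET_TX_SOFTIRQ] = "NET_TX",	[NET_RX_SOFTIRQ] = "NET_RX",
	[BLOCK_SOFTIRQ] = "BLOCK",	[BLOCK_IOPOLL_SOFTIRQ] = "BLOCK_IOPOLL",
	[TASKLET_SOFTIRQ] = "TASKLET",	[SCHED_SOFTIRQ] = "SCHED",
	[HRTIMER_SOFTIRQ] = "HRTIMER",	[RCU_SOFTIRQ] = "RCU",
};

/* Index -> name, with a fallback for out-of-range numbers. */
static const char *softirq_name(unsigned int vec_nr)
{
	return vec_nr < NR_SOFTIRQS ? softirq_to_name[vec_nr] : "UNKNOWN";
}

/* Name -> index by linear search; -1 if the name is unknown. */
static int softirq_from_name(const char *name)
{
	for (int i = 0; i < NR_SOFTIRQS; i++)
		if (strcmp(softirq_to_name[i], name) == 0)
			return i;
	return -1;
}
```

Using designated initializers means a new vector (such as BLOCK_IOPOLL) only needs one added line, and a forgotten entry shows up as a NULL slot rather than silently shifting every later name.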

LKML-Reference: <20090312183336.GB3352@redhat.com>
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>

locking: rename trace_softirq_[enter|exit] => lockdep_softirq_[enter|exit] · d820ac4c

Committed by Ingo Molnar on Mar 13, 2009

Impact: cleanup

The naming clashes with upcoming softirq tracepoints, so rename the
APIs to lockdep_*().
Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

05 Mar, 2009 · 1 commit

rcu: increment quiescent state counter in ksoftirqd() · 64ca5ab9

Committed by Eric Dumazet on Mar 04, 2009

If a machine is flooded by network frames, a cpu can loop
100% of its time inside ksoftirqd() without calling schedule().
This can delay the RCU grace period to insane values.

Adding rcu_qsctr_inc() call in ksoftirqd() solves this problem.

Paul: "This regression was a result of the recent change from
"schedule()" to "cond_resched()", which got rid of that quiescent
state in the common case where a reschedule is not needed".
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

25 Feb, 2009 · 1 commit

generic-ipi: remove CSD_FLAG_WAIT · 6e275637

Committed by Peter Zijlstra on Feb 25, 2009

Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
the code so that we can use CSD_FLAG_LOCK for both purposes.
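One flag can serve both purposes because "the csd is in use" and "the remote call has not finished yet" end at the same moment. A single-threaded C11 model under stated assumptions: the names loosely follow kernel/smp.c, but `struct csd`, `handle_ipi()` and `set_done()` are hypothetical simplifications, and in real use the lock side and the unlock side run on different CPUs.

```c
#include <assert.h>
#include <stdatomic.h>

#define CSD_FLAG_LOCK 0x01u

/* Simplified stand-in for struct call_single_data. */
struct csd {
	atomic_uint flags;
	void (*func)(void *info);
	void *info;
};

/* Spin until the previous user of @csd has finished with it. */
static void csd_lock_wait(struct csd *csd)
{
	while (atomic_load_explicit(&csd->flags, memory_order_acquire) & CSD_FLAG_LOCK)
		;	/* the kernel would cpu_relax() here */
}

/* Claim @csd before queueing it to another CPU. */
static void csd_lock(struct csd *csd)
{
	csd_lock_wait(csd);
	atomic_store(&csd->flags, CSD_FLAG_LOCK);	/* seq_cst, like smp_mb() */
}

/* Target side, after func() ran: frees @csd and releases any waiter. */
static void csd_unlock(struct csd *csd)
{
	assert(atomic_load(&csd->flags) & CSD_FLAG_LOCK);
	atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK, memory_order_release);
}

/* Hypothetical payload and IPI handler for the demonstration. */
static void set_done(void *info) { *(int *)info = 1; }

static void handle_ipi(struct csd *csd)
{
	csd->func(csd->info);
	csd_unlock(csd);
}
```

A synchronous sender simply calls csd_lock_wait() after queueing; when it returns, the function has run and the csd is free for reuse, so no separate WAIT flag is needed.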
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

23 Jan, 2009 · 1 commit

trace, lockdep: manual preempt count adding for local_bh_disable · 7e49fcce

Committed by Steven Rostedt on Jan 22, 2009

Impact: fix to preempt trace triggering lockdep check_flag failure

In local_bh_disable, the use of add_preempt_count causes the
preempt tracer to start recording the time preemption is off.
But because it has already modified preempt_count to show
softirqs disabled before calling the lockdep code to
handle this, it causes a state that lockdep cannot handle.

The preempt tracer will reset the ring buffer on start of a trace,
and the ring buffer reset code does a spin_lock_irqsave. This
calls into lockdep and lockdep will fail when it detects the
invalid state of having softirqs disabled but the internal
current->softirqs_enabled is still set.

The fix is to manually add the SOFTIRQ_OFFSET to preempt count
and call the preempt tracer code outside the lockdep critical
area.

Thanks to Peter Zijlstra for suggesting this solution.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

13 Jan, 2009 · 1 commit

x86: arch_probe_nr_irqs · 4a046d17

Committed by Yinghai Lu on Jan 12, 2009

Impact: save RAM with large NR_CPUS, get smaller nr_irqs
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Mike Travis <travis@sgi.com>