Commits · 0f2541d299d233eddddee4345795e0c46264fd56 · _Walt / cloud-kernel

06 Aug, 2009 1 commit

ring-buffer: fix check of try_to_discard result · 0f2541d2

Authored by Steven Rostedt on Aug 05, 2009

The function ring_buffer_discard_commit inverted the code path
for the result of try_to_discard. It should skip incrementing the
entry counter if try_to_discard succeeded. But instead, it increments
the entry counter if the discard succeeded, and does not increment
it if the discard fails.

The result of this bug is that filtering will make the stat counters
incorrect.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

0f2541d2
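
The corrected branch in 0f2541d2 can be illustrated with a small userspace sketch. This is not the kernel code: `demo_buffer` and `demo_discard_commit` are hypothetical names, and the `discarded` flag stands in for the result of try_to_discard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-CPU ring buffer statistics. */
struct demo_buffer {
    unsigned long entries;   /* events a reader can still see */
};

/*
 * Hypothetical model of the fixed logic. `discarded` plays the role of
 * the try_to_discard result: true means the event was removed.
 */
static void demo_discard_commit(struct demo_buffer *buf, bool discarded)
{
    assert(buf != NULL);

    if (discarded)
        return;          /* event is gone, so it must not be counted */

    buf->entries++;      /* event stays visible to readers: count it */
}
```

With the inverted check described in the message, the two branches would be swapped, so every successfully filtered event would inflate the entry counter.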

29 Jul, 2009 2 commits

tracing: Fix missing function_graph events when we splice_read from trace_pipe · 74e7ff8c

Authored by Lai Jiangshan on Jul 28, 2009

About half of the events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1)               |      ktime_get_real() {
1)   2.846 us    |          read_hpet();
1)   4.558 us    |        }
1)   6.195 us    |      }

After this patch:

0)               |      ktime_get_real() {
0)               |        getnstimeofday() {
0)   1.960 us    |          read_hpet();
0)   3.597 us    |        }
0)   5.196 us    |      }

The fix also applies on 2.6.30
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org
LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

74e7ff8c

tracing: Fix invalid function_graph entry · 38ceb592

Authored by Lai Jiangshan on Jul 28, 2009

When print_graph_entry() computes a function call entry event, it needs
to also check the next entry to guess if it matches the return event of
the current function entry.
In order to look at this next event, it needs to consume the current
entry before going ahead in the ring buffer.

However, if the current event that gets consumed is the last one in the
ring buffer head page, the ring_buffer may reuse the page for writers.
The consumed entry will then become invalid because of possible
racy overwriting.

We must then handle this entry by making a copy of it.

The fix also applies on 2.6.30
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org
LKML-Reference: <4A6EEAEC.3050508@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

38ceb592

28 Jul, 2009 2 commits

update the comment in kthread_stop() · 9ae26027

Authored by Oleg Nesterov on Jun 19, 2009

Commit 63706172 ("kthreads: rework
kthread_stop()") removed the limitation that the thread function must
not call do_exit() itself, but forgot to update the comment.

Since that commit it is OK to use kthread_stop() even if kthread can
exit itself.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9ae26027

module: use MODULE_SYMBOL_PREFIX with module_layout · 6560dc16

Authored by Mike Frysinger on Jul 23, 2009

The check_modstruct_version() needs to look up the symbol "module_layout"
in the kernel, but it does so literally and not by a C identifier. The
trouble is that it does not include a symbol prefix for those ports that
need it (like the Blackfin and H8300 port). So make sure we tack on the
MODULE_SYMBOL_PREFIX define to the front of it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6560dc16
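
The idea behind 6560dc16, tacking a per-port prefix onto a symbol name that is looked up as a string, can be sketched with compile-time string concatenation. `DEMO_SYMBOL_PREFIX` and `demo_layout_symbol` are hypothetical names, and the `"_"` value is only an assumed example of what a prefixed port might use.

```c
#include <string.h>

/*
 * Hypothetical stand-in for MODULE_SYMBOL_PREFIX. A port without a
 * symbol prefix would define this as "" instead.
 */
#ifndef DEMO_SYMBOL_PREFIX
#define DEMO_SYMBOL_PREFIX "_"
#endif

/* Adjacent string literals are joined by the compiler, so the lookup
 * name always carries the prefix of the port being built. */
static const char *demo_layout_symbol(void)
{
    return DEMO_SYMBOL_PREFIX "module_layout";
}
```

Looking the symbol up by the bare string `"module_layout"` would miss on such a port, which is the failure the message describes.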

25 Jul, 2009 1 commit

trace: stop tracer in oops_enter() · bdff7870

Authored by Thomas Gleixner on Jul 24, 2009

If trace_printk_on_oops is set we lose interesting trace information
when the tracer is enabled across oops handling and printing. We want
the trace which might give us information _WHY_ we oopsed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

bdff7870

23 Jul, 2009 11 commits

tracing: only truncate ftrace files when O_TRUNC is set · 8650ae32

Authored by Steven Rostedt on Jul 22, 2009

The current code will truncate the ftrace files contents if O_APPEND
is not set and the file is opened in write mode. This is incorrect.
It should only truncate the file if O_TRUNC is set. Otherwise
if one of these files is opened by a C program with fopen "r+",
it will incorrectly truncate the file.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

8650ae32
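
The difference between the old and new truncation conditions in 8650ae32 can be sketched in userspace with the standard open(2) flags. The two helper names are hypothetical, and the predicates only model the conditions stated in the message, not the kernel's file-mode checks.

```c
#include <fcntl.h>
#include <stdbool.h>

/* True when the open flags request write access. */
static bool demo_is_write_open(int flags)
{
    int acc = flags & O_ACCMODE;
    return acc == O_WRONLY || acc == O_RDWR;
}

/* Old behaviour per the message: truncate on any write open without O_APPEND. */
static bool demo_truncates_old(int flags)
{
    return demo_is_write_open(flags) && !(flags & O_APPEND);
}

/* Fixed behaviour per the message: truncate only when O_TRUNC is requested. */
static bool demo_truncates_new(int flags)
{
    return demo_is_write_open(flags) && (flags & O_TRUNC) != 0;
}
```

fopen() with mode "r+" opens the file with O_RDWR and neither O_TRUNC nor O_APPEND, so the old predicate truncates while the new one leaves the contents alone.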

tracing: show proper address for trace-printk format · 4c739ff0

Authored by Steven Rostedt on Jul 22, 2009

Since the trace_printk may use pointers to the format fields
in the buffer, they are exported via debugfs/tracing/printk_formats.
This is used by utilities that read the ring buffer in binary format.
It helps the utilities map the address of the format in the binary
buffer to what the printf format looks like.

Unfortunately, the way the output code works, it exports the address
of the pointer to the format address, and not the format address
itself. This makes the file totally useless in trying to figure
out what format string a binary address belongs to.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

4c739ff0
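
The mix-up described in 4c739ff0 is the difference between the address of a pointer slot and the address the slot points to. A minimal sketch, assuming a hypothetical table `demo_formats` in place of the kernel's list of trace_printk formats:

```c
#include <stdint.h>

/* Hypothetical table of format-string pointers. */
static const char *const demo_formats[] = {
    "irq %d\n",
    "cpu %d freq %lu\n",
};

/* Buggy export: the address of the table slot that holds the pointer. */
static uintptr_t demo_exported_addr_buggy(const char *const *slot)
{
    return (uintptr_t)slot;
}

/* Fixed export: the address of the format string itself, which is the
 * value a binary trace record would carry. */
static uintptr_t demo_exported_addr_fixed(const char *const *slot)
{
    return (uintptr_t)*slot;
}
```

A tool that matches addresses from the binary buffer against the exported list can only succeed with the second form.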

tracing/stat: Fix seqfile memory leak · 636eacee

Authored by Li Zefan on Jul 23, 2009

Every time we cat a trace_stat file, we leak memory allocated by
seq_open().

Also fix memory leak in a failure path in tracing_stat_open().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A67D92B.4060704@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

636eacee

function-graph: Fix seqfile memory leak · 87827111

Authored by Li Zefan on Jul 23, 2009

Every time we cat set_graph_function, we leak memory allocated
by seq_open().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A67D907.2010500@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

87827111

trace_stack: Fix seqfile memory leak · d8cc1ab7

Authored by Li Zefan on Jul 23, 2009

Every time we cat stack_trace, we leak memory allocated by seq_open().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A67D8E8.3020500@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

d8cc1ab7

genirq: Fix UP compile failure caused by irq_thread_check_affinity · 61f38261

Authored by Bruno Premont on Jul 22, 2009

Since genirq: Delegate irq affinity setting to the irq thread
(591d2fb0) compilation with
CONFIG_SMP=n fails with following error:

/usr/src/linux-2.6/kernel/irq/manage.c:
   In function 'irq_thread_check_affinity':
/usr/src/linux-2.6/kernel/irq/manage.c:475:
   error: 'struct irq_desc' has no member named 'affinity'
make[4]: *** [kernel/irq/manage.o] Error 1

That commit adds a new function irq_thread_check_affinity() which
uses struct irq_desc.affinity which is only available for CONFIG_SMP=y.
Move that function under #ifdef CONFIG_SMP.

[ tglx@brownpaperbag: compile and boot tested on UP and SMP ]
Signed-off-by: Bruno Premont <bonbons@linux-vserver.org>
LKML-Reference: <20090722222232.2eb3e1c4@neptune.home>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

61f38261
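
The build failure fixed by 61f38261 follows a common pattern: a function touches a struct member that only exists under a config option. A sketch of the guard, using the hypothetical macro `DEMO_SMP` and hypothetical types in place of CONFIG_SMP and struct irq_desc. The empty stub for the non-SMP case is an assumption of this sketch; the message itself only says the function moves under the #ifdef.

```c
/* Hypothetical descriptor: `affinity` exists only in SMP builds. */
struct demo_irq_desc {
    unsigned int irq;
#ifdef DEMO_SMP
    unsigned long affinity;
#endif
};

#ifdef DEMO_SMP
/* SMP build: the member exists, so the check can read it. */
static inline int demo_check_affinity(const struct demo_irq_desc *desc)
{
    return desc->affinity != 0;
}
#else
/* UP build: nothing to check, and the missing member is never named. */
static inline int demo_check_affinity(const struct demo_irq_desc *desc)
{
    (void)desc;
    return 0;
}
#endif
```

Callers stay identical in both configurations, which keeps the #ifdef out of the call sites.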

perf: fix stack data leak · 0dc3d523

Authored by Arjan van de Ven on Jul 21, 2009

the "reserved" field was not initialized to zero, resulting in 4 bytes
of stack data leaking to userspace....
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0dc3d523
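
The class of leak fixed in 0dc3d523 arises when an on-stack record is copied out with a field that was never written. One common defence, shown here as a sketch and not as the actual patch, is to zero the whole record before filling it. All type and function names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record layout with a field reserved for future use. */
struct demo_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;
};

struct demo_event {
    struct demo_header header;
    uint32_t pid;
    uint32_t reserved;   /* must reach userspace as zero */
};

/* Fill a record that will later be copied out of the kernel. */
static void demo_fill_event(struct demo_event *ev, uint32_t pid)
{
    /* Clear every byte first, including fields never assigned below. */
    memset(ev, 0, sizeof(*ev));

    ev->header.type = 1;                       /* hypothetical type code */
    ev->header.size = (uint16_t)sizeof(*ev);
    ev->pid = pid;
}
```

Without the memset, `reserved` would hold whatever 4 bytes happened to be on the stack at that location.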

perf_counter: Fix throttle/unthrottle event logging · 966ee4d6

Authored by Anton Blanchard on Jul 22, 2009

Right now we only print PERF_EVENT_THROTTLE + 1 (ie PERF_EVENT_UNTHROTTLE).
Fix this to print both a throttle and unthrottle event.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090722130546.GE9029@kryten>

966ee4d6

perf_counter: PERF_SAMPLE_ID and inherited counters · 7f453c24

Authored by Peter Zijlstra on Jul 21, 2009

Anton noted that for inherited counters the counter-id as provided by
PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
because each inherited counter gets its own id.

His suggestion was to always return the parent counter id, since that
is the primary counter id as exposed. However, these inherited
counters have a unique identifier so that events like
PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
counter gets modified, which is important when trying to normalize the
sample streams.

This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
which is more useful anyway, since changing periods became a lot more
common than initially thought -- rendering PERF_EVENT_PERIOD the less
useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
value, since it reports the value used to trigger the overflow,
whereas PERF_EVENT_PERIOD simply reports the requested period changed,
which might only take effect on the next cycle).

This still leaves us PERF_EVENT_THROTTLE to consider, but since that
_should_ be a rare occurrence, and linking it to a primary id is the
most useful bit to diagnose the problem, we introduce a
PERF_SAMPLE_STREAM_ID, for those few cases where the full
reconstruction is important.

[Does change the ABI a little, but I see no other way out]
Suggested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248095846.15751.8781.camel@twins>

7f453c24

perf_counter: Plug more stack leaks · 573402db

Authored by Peter Zijlstra on Jul 22, 2009

Per example of Arjan's patch, I went through and found a few more.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

573402db

perf: Fix stack data leak · c9f73a3d

Authored by Arjan van de Ven on Jul 21, 2009

the "reserved" field was not initialized to zero, resulting in 4 bytes
of stack data leaking to userspace....
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

c9f73a3d

22 Jul, 2009 1 commit

softirq: introduce tasklet_hrtimer infrastructure · 9ba5f005

Authored by Peter Zijlstra on Jul 22, 2009

commit ca109491 (hrtimer: removing all ur callback modes) moved all
hrtimer callbacks into hard interrupt context when high resolution
timers are active. That breaks code which relied on the assumption
that the callback happens in softirq context.

Provide a generic infrastructure which combines tasklets and hrtimers
together to provide an in-softirq hrtimer experience.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: torvalds@linux-foundation.org
Cc: kaber@trash.net
Cc: David Miller <davem@davemloft.net>
LKML-Reference: <1248265724.27058.1366.camel@twins>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9ba5f005

21 Jul, 2009 1 commit

genirq: Delegate irq affinity setting to the irq thread · 591d2fb0

Authored by Thomas Gleixner on Jul 21, 2009

irq_set_thread_affinity() calls set_cpus_allowed_ptr() which might
sleep, but irq_set_thread_affinity() is called with desc->lock held
and can be called from hard interrupt context as well. The code has
another bug as it does not hold a ref on the task struct as required
by set_cpus_allowed_ptr().

Just set the IRQTF_AFFINITY bit in action->thread_flags. The next time
the thread runs it migrates itself. Solves all of the above problems
nicely.

Add kerneldoc to irq_set_thread_affinity() while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>

591d2fb0
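
The delegation pattern in 591d2fb0, where the caller only sets a flag and the thread acts on it later, can be sketched with C11 atomics. The names `DEMO_TF_AFFINITY`, `demo_action`, `demo_request_affinity` and `demo_thread_apply_affinity` are hypothetical, and writing the `cpu` field merely stands in for the real migration.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_TF_AFFINITY 0x1u   /* hypothetical counterpart of IRQTF_AFFINITY */

struct demo_action {
    atomic_uint thread_flags;
    int cpu;                    /* CPU the handler thread currently uses */
};

/* Safe in a context that must not sleep: it only sets a bit. */
static void demo_request_affinity(struct demo_action *a)
{
    atomic_fetch_or(&a->thread_flags, DEMO_TF_AFFINITY);
}

/* Run by the handler thread itself, where sleeping is allowed.
 * Returns true if a pending request was consumed. */
static bool demo_thread_apply_affinity(struct demo_action *a, int target_cpu)
{
    unsigned int old = atomic_fetch_and(&a->thread_flags, ~DEMO_TF_AFFINITY);

    if (!(old & DEMO_TF_AFFINITY))
        return false;           /* nothing requested since the last run */

    a->cpu = target_cpu;        /* stand-in for the thread migrating itself */
    return true;
}
```

Because the requester never calls anything that may sleep, it can run with a lock held or in interrupt context, which is the constraint the message describes.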

19 Jul, 2009 2 commits

clocksource: Prevent NULL pointer dereference · 79ef2bb0

Authored by Thomas Gleixner on Jul 19, 2009

Writing a zero length string to sys/.../current_clocksource will cause
a NULL pointer dereference if the clock events system is in one shot
(highres or nohz) mode.
Pointed-out-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <alpine.DEB.2.00.0907191545580.12306@bicker>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

79ef2bb0

timer: Avoid reading uninitialized data · 4841158b

Authored by Pavel Roskin on Jul 18, 2009

timer->expires may be uninitialized, so check timer_pending() before
touching timer->expires to pacify kmemcheck.
Signed-off-by: Pavel Roskin <proski@gnu.org>
LKML-Reference: <20090718204602.5191.360.stgit@mj.roinet.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

4841158b
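
The ordering fix in 4841158b relies on short-circuit evaluation: test whether the timer is pending first, and read the expiry only if it is. A sketch with a hypothetical `demo_timer` type, not the kernel's struct timer_list:

```c
#include <stdbool.h>

/* Hypothetical timer: `expires` is only meaningful while `pending` is set. */
struct demo_timer {
    bool pending;
    unsigned long expires;
};

/* Returns true when the timer has to be (re)armed for new_expires. */
static bool demo_timer_needs_update(const struct demo_timer *t,
                                    unsigned long new_expires)
{
    /* `pending` is checked first, so `expires` is never read for a
     * timer that has not been armed yet. */
    if (t->pending && t->expires == new_expires)
        return false;

    return true;
}
```

Swapping the two operands of `&&` gives the same result for pending timers but reads `expires` unconditionally, which is what a checker such as kmemcheck flags.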

18 Jul, 2009 6 commits

sched: fix nr_uninterruptible accounting of frozen tasks really · 6301cb95

Authored by Thomas Gleixner on Jul 17, 2009

commit e3c8ca83 (sched: do not count frozen tasks toward load) broke
the nr_uninterruptible accounting on freeze/thaw. On freeze the task
is excluded from accounting with a check for (task->flags &
PF_FROZEN), but that flag is cleared before the task is thawed. So
while we prevent that the task with state TASK_UNINTERRUPTIBLE
is accounted to nr_uninterruptible on freeze we decrement
nr_uninterruptible on thaw.

Use a separate flag which is handled by the freezing task itself. Set
it before calling the scheduler with TASK_UNINTERRUPTIBLE state and
clear it after we return from frozen state.

Cc: <stable@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

6301cb95

sched: fix load average accounting vs. cpu hotplug · a468d389

Authored by Thomas Gleixner on Jul 17, 2009

The new load average code clears rq->calc_load_active on
CPU_ONLINE. That's wrong as the newly onlined CPU might have got a
scheduler tick already and accounted the delta to the stale value of
the time we offlined the CPU.

Clear the value when we cleanup the dead CPU instead. 

Also move the update of the calc_load_update time for the newly online
CPU to CPU_UP_PREPARE to avoid that the CPU plays catch up with the
stale update time value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

a468d389

profile: Suppress warning about large allocations when profile=1 is specified · e5d490b2

Authored by Mel Gorman on Jul 15, 2009

When profile= is used, a large buffer is allocated early at
boot. This can be larger than what the page allocator can
provide so it prints a warning. However, the caller is able to
handle the situation so this patch suppresses the warning.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>
Cc: Heinz Diehl <htd@fancy-poultry.org>
Cc: David Miller <davem@davemloft.net>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1247656992-19846-3-git-send-email-mel@csn.ul.ie>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e5d490b2

perf_counter: Log vfork as a fork event · ed900c05

Authored by Anton Blanchard on Jul 16, 2009

Right now we don't output vfork events. Even though we should
always see an exec after a vfork, we may get perfcounter
samples between the vfork and exec. These samples can lead to
some confusion when parsing perfcounter data.

To keep things consistent we should always log a fork event. It
will result in a little more log data, but is less confusing to
trace parsing tools.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.589309391@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ed900c05

perf_counter: Make sure we dont leak kernel memory to userspace · 413ee3b4

Authored by Anton Blanchard on Jul 16, 2009

There are a few places we are leaking tiny amounts of kernel
memory to userspace. This happens when writing out strings
because we always align the end to 64 bits.

To avoid this we should always use an appropriately sized
temporary buffer and ensure it is zeroed.

Since d_path assembles the string from the end of the buffer
backwards, we need to add 64 bits after the buffer to allow for
alignment.

We also need to copy arch_vma_name to the temporary buffer,
because if we use it directly we may end up copying to
userspace a number of bytes after the end of the string
constant.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.273972048@samba.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

413ee3b4
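
The approach described in 413ee3b4, copying a string into a zeroed temporary buffer before padding its length to 64 bits, can be sketched as follows. `DEMO_ALIGN8` and `demo_pack_name` are hypothetical names, and the sketch covers only the padding step, not the d_path handling.

```c
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 8 bytes (64 bits). */
#define DEMO_ALIGN8(n) (((size_t)(n) + 7u) & ~(size_t)7u)

/*
 * Copy `name` into `out` and pad it with zero bytes up to a 64-bit
 * boundary. Returns the padded length including the terminating NUL,
 * or 0 if `out` is too small.
 */
static size_t demo_pack_name(char *out, size_t out_size, const char *name)
{
    size_t len = strlen(name) + 1;       /* include the NUL */
    size_t padded = DEMO_ALIGN8(len);

    if (padded > out_size)
        return 0;

    memset(out, 0, padded);              /* padding bytes are known zeros */
    memcpy(out, name, len);
    return padded;
}
```

Emitting `padded` bytes straight from the source string instead would read up to 7 bytes past its end, which is the leak the message describes for string constants such as the arch_vma_name result.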

sched: Account for vruntime wrapping · 54fdc581

Authored by Fabio Checconi on Jul 16, 2009

I spotted two sites that didn't take vruntime wrap-around into
account. Fix these by creating a comparison helper that does do
so.
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

54fdc581
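
A wrap-safe comparison of the kind 54fdc581 describes can be sketched by comparing the signed difference of two unsigned counters. `demo_vruntime_before` is a hypothetical name. The cast from an out-of-range unsigned value to `int64_t` is implementation-defined in ISO C, and the sketch assumes the usual two's complement behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if counter value `a` is logically earlier than `b`, even when
 * the 64-bit counter has wrapped between the two samples. Valid as
 * long as the two values are less than 2^63 apart.
 */
static bool demo_vruntime_before(uint64_t a, uint64_t b)
{
    return (int64_t)(a - b) < 0;
}
```

For example, with `a = UINT64_MAX - 5` and `b = 4` (ten ticks later, after a wrap), a plain `a < b` reports false while the helper reports true.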

17 Jul, 2009 1 commit

tracing/function: Fix the return value of ftrace_trace_onoff_callback() · 04aef32d

Authored by Xiao Guangrong on Jul 15, 2009

ftrace_trace_onoff_callback() will return an error even if we do the
right operation, for example:

 # echo _spin_*:traceon:10 > set_ftrace_filter
 -bash: echo: write error: Invalid argument
 # cat set_ftrace_filter
 #### all functions enabled ####
 _spin_trylock_bh:traceon:count=10
 _spin_unlock_irq:traceon:count=10
 _spin_unlock_bh:traceon:count=10
 _spin_lock_irq:traceon:count=10
 _spin_unlock:traceon:count=10
 _spin_trylock:traceon:count=10
 _spin_unlock_irqrestore:traceon:count=10
 _spin_lock_irqsave:traceon:count=10
 _spin_lock_bh:traceon:count=10
 _spin_lock:traceon:count=10

We want to write _spin_*:traceon:10 to set_ftrace_filter. It complains
with "Invalid argument", but the operation is successful.

This is because ftrace_process_regex() returns the number of functions that
matched the pattern. If the number is not 0, this value is returned
by ftrace_regex_write() whereas we want to return the number of bytes
virtually written.
Also the file offset pointer is not updated in this case.

If the number of matched functions is lower than the number of bytes written
by the user, this results in a reprocessing of the string given by the user with
a lower size, leading to a malformed ftrace regex and then a -EINVAL returned.

So, this patch fixes it by returning 0 if no error occurred.
The fix also applies on 2.6.30
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>

04aef32d

13 Jul, 2009 3 commits

tracing/function-profiler: do not free per cpu variable stat · 6ab5d668

Authored by Steven Rostedt on Jun 04, 2009

The per cpu variable stat is freed if we fail to allocate a name
on start up. This was due to stat at first being allocated in the
initial design. But since then, it has become a static per cpu variable
but the free on error was not removed.

Also added __init annotation to the function that this is in.

[ Impact: prevent possible memory corruption on low mem at boot up ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

6ab5d668

perf_counter: Fix the tracepoint channel to perfcounters · d4d7d0b9

Authored by Chris Wilson on Jul 06, 2009

Fix a missed rename in EVENT_PROFILE support so that it gets
built and allows tracepoint tracing from the 'perf' tool.

Fix a typo in the (never before built & enabled) portion in
perf_counter.c as well, and update that code to the
attr.config changes as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ben Gamari <bgamari.foss@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d4d7d0b9

headers: smp_lock.h redux · 405f5571

Authored by Alexey Dobriyan on Jul 11, 2009

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

405f5571

11 Jul, 2009 3 commits

futexes: Fix infinite loop in get_futex_key() on huge page · ce2ae53b

Authored by Sonny Rao on Jul 10, 2009

get_futex_key() can infinitely loop if it is called on a
virtual address that is within a huge page but not aligned to
the beginning of that page.  The call to get_user_pages_fast
will return the struct page for a sub-page within the huge page
and the check for page->mapping will always fail.

The fix is to call compound_head on the page before checking
that it's mapped.
Signed-off-by: Sonny Rao <sonnyrao@us.ibm.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
Cc: anton@samba.org
Cc: rajamony@us.ibm.com
Cc: speight@us.ibm.com
Cc: mstephen@us.ibm.com
Cc: grimm@us.ibm.com
Cc: mikey@ozlabs.au.ibm.com
LKML-Reference: <20090710231313.GA23572@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ce2ae53b

sched: Fix bug in SCHED_IDLE interaction with group scheduling · d07387b4

Authored by Paul Turner on Jul 10, 2009

One of the isolation modifications for SCHED_IDLE is the
unitization of sleeper credit.  However the check for this
assumes that the sched_entity we're placing always belongs to a
task.

This is potentially not true with group scheduling and leaves
us rummaging randomly when we try to pull the policy.
Signed-off-by: Paul Turner <pjt@google.com>
Cc: peterz@infradead.org
LKML-Reference: <alpine.DEB.1.00.0907101649570.29914@kitami.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

d07387b4

sched: optimize cond_resched() · d86ee480

Authored by Peter Zijlstra on Jul 10, 2009

Optimize cond_resched() by removing one conditional.

Currently cond_resched() checks system_state ==
SYSTEM_RUNNING in order to avoid scheduling before the
scheduler is running.

We can however, as per suggestion of Matt, use
PREEMPT_ACTIVE to accomplish that very same.
Suggested-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d86ee480

10 Jul, 2009 6 commits

hrtimer: Fix migration expiry check · 6ff7041d

Authored by Thomas Gleixner on Jul 10, 2009

The timer migration expiry check should prevent the migration of a
timer to another CPU when the timer expires before the next event is
scheduled on the other CPU. Migrating the timer might delay it because
we can not reprogram the clock event device on the other CPU. But the
code implementing that check has two flaws:

- for !HIGHRES the check compares the expiry value with the clock
  events device expiry value which is wrong for CLOCK_REALTIME based
  timers.

- the check is racy. It holds the hrtimer base lock of the target CPU,
  but the clock event device expiry value can be modified
  nevertheless, e.g. by a timer interrupt firing.

The !HIGHRES case is easy to fix as we can enqueue the timer on the
cpu which was selected by the load balancer. It runs the idle
balancing code once per jiffy anyway. So the maximum delay for the
timer is the same as when we keep the tick on the current cpu going.

In the HIGHRES case we can get the next expiry value from the hrtimer
cpu_base of the target CPU and serialize the update with the cpu_base
lock. This moves the lock section in hrtimer_interrupt() so we can set
next_event to KTIME_MAX while we are handling the expired timers and
set it to the next expiry value after we handled the timers under the
base lock. While the expired timers are processed timer migration is
blocked because the expiry time of the timer is always <= KTIME_MAX.

Also remove the now useless clockevents_get_next_event() function.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

6ff7041d

hrtimer: migration: do not check expiry time on current CPU · 7e0c5086

Authored by Thomas Gleixner on Jul 09, 2009

The timer migration code needs to check whether the expiry time of the
timer is before the programmed clock event expiry time when the timer
is enqueued on another CPU because we can not reprogram the timer
device on the other CPU. The current logic checks the expiry time even
if we enqueue on the current CPU when nohz_get_load_balancer() returns
current CPU. This might lead to an endless loop in the expiry check
code when the expiry time of the timer is before the current
programmed next event.

Check whether nohz_get_load_balancer() returns current CPU and skip
the expiry check if this is the case.

The bug was triggered from the networking code. The patch fixes the
regression http://bugzilla.kernel.org/show_bug.cgi?id=13738
(Soft-Lockup/Race in networking in 2.6.31-rc1+195)

Cc: Arun Bharadwaj <arun@linux.vnet.ibm.com>
Tested-by: Joao Correia <joaomiguelcorreia@gmail.com>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

7e0c5086

sched: Fix rt_rq->pushable_tasks initialization in init_rt_rq() · c20b08e3

Authored by Fabio Checconi on Jun 15, 2009

init_rt_rq() initializes only rq->rt.pushable_tasks, and not the
pushable_tasks field of the passed rt_rq.  The plist is not used
uninitialized since the only pushable_tasks plists used are the
ones of root rt_rqs; anyway reinitializing the list on every group
creation corrupts the root plist, losing its previous contents.
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090615185638.GK21741@gandalf.sssup.it>
CC: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c20b08e3

sched: Reset sched stats on fork() · 7793527b

Authored by Lucas De Marchi on Jul 09, 2009

The sched_stat fields are currently not reset upon fork.
Ingo's recent commit 6c594c21
did reset nr_migrations, but it didn't reset any of the
others.

This patch resets all sched_stat fields on fork.
Signed-off-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <193b0f820907090457s7a3662f4gcdecdc22fcae857b@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7793527b

sched_rt: Fix overload bug on rt group scheduling · a1ba4d8b

Authored by Peter Zijlstra on Apr 01, 2009

Fixes an easily triggerable BUG() when setting process affinities.

Make sure to count the number of migratable tasks in the same place:
the root rt_rq. Otherwise the number doesn't make sense and we'll hit
the BUG in set_cpus_allowed_rt().

Also, make sure we only count tasks, not groups (this is probably
already taken care of by the fact that rt_se->nr_cpus_allowed will be 0
for groups, but be more explicit)
Tested-by: Thomas Gleixner <tglx@linutronix.de>
CC: stable@kernel.org
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Gregory Haskins <ghaskins@novell.com>
LKML-Reference: <1247067476.9777.57.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

a1ba4d8b

perf_counter: Stop open coding unclone_ctx · 71a851b4

Authored by Peter Zijlstra on Jul 10, 2009

Instead of open coding the unclone context thingy, put it in
a common function.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71a851b4
