提交 · f740e6cd0cb5e7468e46831aeb4d9c30e03d5ebc · openeuler / raspberrypi-kernel

28 6月, 2011 3 次提交

stop_machine: implement stop_machine_from_inactive_cpu() · f740e6cd

由 Tejun Heo 提交于 6月 23, 2011

Currently, mtrr wants stop_machine functionality while a CPU is being
brought up.  As stop_machine() requires the calling CPU to be active,
mtrr implements its own stop_machine using stop_one_cpu() on each
online CPU.  This doesn't only unnecessarily duplicate complex logic
but also introduces a possibility of deadlock when it races against
the generic stop_machine().

This patch implements stop_machine_from_inactive_cpu() to serve such
use cases.  Its functionality is basically the same as stop_machine();
however, it should be called from a CPU which isn't active and doesn't
depend on working scheduling on the calling CPU.

This is achieved by using busy loops for synchronization and
open-coding stop_cpus queuing and waiting with direct invocation of
fn() for local CPU inbetween.
Signed-off-by: NTejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110623182056.982526827@sbsiddha-MOBL3.sc.intel.comSigned-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

f740e6cd

stop_machine: reorganize stop_cpus() implementation · fd7355ba

由 Tejun Heo 提交于 6月 23, 2011

Refactor the queuing part of the stop cpus work from __stop_cpus() into
queue_stop_cpus_work().

The reorganization is to help future improvements to stop_machine()
and doesn't introduce any behavior difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110623182056.897818337@sbsiddha-MOBL3.sc.intel.comSigned-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

fd7355ba

x86, mtrr: lock stop machine during MTRR rendezvous sequence · 6d3321e8

由 Suresh Siddha 提交于 6月 23, 2011

MTRR rendezvous sequence using stop_one_cpu_nowait() can potentially
happen in parallel with another system wide rendezvous using
stop_machine(). This can lead to deadlock (The order in which
works are queued can be different on different cpu's. Some cpu's
will be running the first rendezvous handler and others will be running
the second rendezvous handler. Each set waiting for the other set to join
for the system wide rendezvous, leading to a deadlock).

MTRR rendezvous sequence is not implemented using stop_machine() as this
gets called both from the process context aswell as the cpu online paths
(where the cpu has not come online and the interrupts are disabled etc).
stop_machine() works with only online cpus.

For now, take the stop_machine mutex in the MTRR rendezvous sequence that
gets called from an online cpu (here we are in the process context
and can potentially sleep while taking the mutex). And the MTRR rendezvous
that gets triggered during cpu online doesn't need to take this stop_machine
lock (as the stop_machine() already ensures that there is no cpu hotplug
going on in parallel by doing get_online_cpus())

TBD: Pursue a cleaner solution of extending the stop_machine()
infrastructure to handle the case where the calling cpu is
still not online and use this for MTRR rendezvous sequence.

fixes: https://bugzilla.novell.com/show_bug.cgi?id=672008Reported-by: NVadim Kotelnikov <vadimuzzz@inbox.ru>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/20110623182056.807230326@sbsiddha-MOBL3.sc.intel.com
Cc: stable@kernel.org # 2.6.35+, backport a week or two after this gets more testing in mainline
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6d3321e8

22 6月, 2011 3 次提交

alarmtimers: Return -ENOTSUPP if no RTC device is present · 1c6b39ad

由 John Stultz 提交于 6月 16, 2011

Toralf Förster and Richard Weinberger noted that if there is
no RTC device, the alarm timers core prints out an annoying
"ALARM timers will not wake from suspend" message.

This warning has been removed in a previous patch, however
the issue still remains:  The original idea was to support
alarm timers even if there was no rtc device, as long as the
system didn't go into suspend.

However, after further consideration, communicating to the application
that alarmtimers are not fully functional seems like the better
solution.

So this patch makes it so we return -ENOTSUPP to any posix _ALARM
clockid calls if there is no backing RTC device on the system.

Further this changes the behavior where when there is no rtc device
we will check for one on clock_getres, clock_gettime, timer_create,
and timer_nsleep instead of on suspend.

CC: Toralf Förster <toralf.foerster@gmx.de>
CC: Richard Weinberger <richard@nod.at
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Reported by: Richard Weinberger <richard@nod.at>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

1c6b39ad

alarmtimers: Handle late rtc module loading · c008ba58

由 John Stultz 提交于 6月 16, 2011

The alarmtimers code currently picks a rtc device to use at
late init time. However, if your rtc driver is loaded as a module,
it may be registered after the alarmtimers late init code, leaving
the alarmtimers nonfunctional.

This patch moves the the rtcdevice selection to when we actually try
to use it, allowing us to make use of rtc modules that may have been
loaded at any point since bootup.

CC: Thomas Gleixner <tglx@linutronix.de>
CC: Meelis Roos <mroos@ut.ee>
Reported-by: NMeelis Roos <mroos@ut.ee>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>

c008ba58

PM: Free memory bitmaps if opening /dev/snapshot fails · 8440f4b1

由 Michal Kubecek 提交于 6月 18, 2011

When opening /dev/snapshot device, snapshot_open() creates memory
bitmaps which are freed in snapshot_release(). But if any of the
callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
fails, snapshot_release() is never called and bitmaps are not freed.
Next attempt to open /dev/snapshot then triggers BUG_ON() check in
create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
is active on s390x.
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org

8440f4b1

18 6月, 2011 1 次提交

KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring · 87966996

由 David Howells 提交于 6月 17, 2011

____call_usermodehelper() now erases any credentials set by the
subprocess_inf::init() function.  The problem is that commit
17f60a7d ("capabilites: allow the application of capability limits
to usermode helpers") creates and commits new credentials with
prepare_kernel_cred() after the call to the init() function.  This wipes
all keyrings after umh_keys_init() is called.

The best way to deal with this is to put the init() call just prior to
the commit_creds() call, and pass the cred pointer to init().  That
means that umh_keys_init() and suchlike can modify the credentials
_before_ they are published and potentially in use by the rest of the
system.

This prevents request_key() from working as it is prevented from passing
the session keyring it set up with the authorisation token to
/sbin/request-key, and so the latter can't assume the authority to
instantiate the key.  This causes the in-kernel DNS resolver to fail
with ENOKEY unconditionally.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Tested-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87966996

17 6月, 2011 3 次提交

generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts · d8ad7d11

由 Takao Indoh 提交于 3月 29, 2011

There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.

To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.

The details of the crash are:

 (1) 2nd kernel boots up

 (2) A pending IPI from 1st kernel comes when irqs are first enabled
     in start_kernel().

 (3) Kernel tries to handle the interrupt, but call_single_queue
     is not initialized yet at this point. As a result, in the
     generic_smp_call_function_single_interrupt(), NULL pointer
     dereference occurs when list_replace_init() tries to access
     &q->list.next.

Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().
Signed-off-by: NTakao Indoh <indou.takao@jp.fujitsu.com>
Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Milton Miller <miltonm@bga.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

d8ad7d11

rcu: Move RCU_BOOST #ifdefs to header file · f8b7fc6b

由 Paul E. McKenney 提交于 6月 16, 2011

The commit "use softirq instead of kthreads except when RCU_BOOST=y"
just applied #ifdef in place. This commit is a cleanup that moves
the newly #ifdef'ed code to the header file kernel/rcutree_plugin.h.
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f8b7fc6b

clocksource: Make watchdog robust vs. interruption · b5199515

由 Thomas Gleixner 提交于 6月 16, 2011

The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.

The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.
Reported-and-tested-by: NVernon Mauery <vernux@us.ibm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org

b5199515

16 6月, 2011 3 次提交

rcu: use softirq instead of kthreads except when RCU_BOOST=y · a46e0899

由 Paul E. McKenney 提交于 6月 15, 2011

This patch #ifdefs RCU kthreads out of the kernel unless RCU_BOOST=y,
thus eliminating context-switch overhead if RCU priority boosting has
not been configured.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

a46e0899

gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL · d2c32258

由 Josh Triplett 提交于 6月 15, 2011

CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time. According to commit b99b87f7 ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this. However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it. Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

Observed in the short list of =y values in a minimal kernel configuration.
Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Acked-by: NPeter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d2c32258

memcg: clear mm->owner when last possible owner leaves · 733eda7a

由 KAMEZAWA Hiroyuki 提交于 6月 15, 2011

The following crash was reported:

> Call Trace:
> [<ffffffff81139792>] mem_cgroup_from_task+0x15/0x17
> [<ffffffff8113a75a>] __mem_cgroup_try_charge+0x148/0x4b4
> [<ffffffff810493f3>] ? need_resched+0x23/0x2d
> [<ffffffff814cbf43>] ? preempt_schedule+0x46/0x4f
> [<ffffffff8113afe8>] mem_cgroup_charge_common+0x9a/0xce
> [<ffffffff8113b6d1>] mem_cgroup_newpage_charge+0x5d/0x5f
> [<ffffffff81134024>] khugepaged+0x5da/0xfaf
> [<ffffffff81078ea0>] ? __init_waitqueue_head+0x4b/0x4b
> [<ffffffff81133a4a>] ? add_mm_counter.constprop.5+0x13/0x13
> [<ffffffff81078625>] kthread+0xa8/0xb0
> [<ffffffff814d13e8>] ? sub_preempt_count+0xa1/0xb4
> [<ffffffff814d5664>] kernel_thread_helper+0x4/0x10
> [<ffffffff814ce858>] ? retint_restore_args+0x13/0x13
> [<ffffffff8107857d>] ? __init_kthread_worker+0x5a/0x5a

What happens is that khugepaged tries to charge a huge page against an mm
whose last possible owner has already exited, and the memory controller
crashes when the stale mm->owner is used to look up the cgroup to charge.

mm->owner has never been set to NULL with the last owner going away, but
nobody cared until khugepaged came along.

Even then it wasn't a problem because the final mmput() on an mm was
forced to acquire and release mmap_sem in write-mode, preventing an
exiting owner to go away while the mmap_sem was held, and until "692e0b35
mm: thp: optimize memcg charge in khugepaged", the memory cgroup charge
was protected by mmap_sem in read-mode.

Instead of going back to relying on the mmap_sem to enforce lifetime of a
task, this patch ensures that mm->owner is properly set to NULL when the
last possible owner is exiting, which the memory controller can handle
just fine.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reported-by: NHugh Dickins <hughd@google.com>
Reported-by: NDave Jones <davej@redhat.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

733eda7a

15 6月, 2011 5 次提交

sched: Check if lowest_mask is initialized in find_lowest_rq() · 0da938c4

由 Steven Rostedt 提交于 6月 14, 2011

On system boot up, the lowest_mask is initialized with an
early_initcall(). But RT tasks may wake up on other
early_initcall() callers before the lowest_mask is initialized,
causing a system crash.

Commit "d72bce0e rcu: Cure load woes" was the first commit
to wake up RT tasks in early init. Before this commit this bug
should not happen.
Reported-by: NAndrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: NAndrew Theurer <habanero@linux.vnet.ibm.com>
Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110614223657.824872966@goodmis.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

0da938c4

sched: Fix need_resched() when checking peempt · 8dd0de8b

由 Hillf Danton 提交于 6月 14, 2011

The RT preempt check tests the wrong task if NEED_RESCHED is
set. It currently checks the local CPU task. It is supposed to
check the task that is running on the runqueue we are about to
wake another task on.
Signed-off-by: NHillf Danton <dhillf@gmail.com>
Reviewed-by: NYong Zhang <yong.zhang0@gmail.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110614223657.450239027@goodmis.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

8dd0de8b

signal.c: fix kernel-doc notation · ada9c933

由 Randy Dunlap 提交于 6月 14, 2011

Fix kernel-doc warnings in signal.c:

Warning(kernel/signal.c:2374): No description found for parameter 'nset'
Warning(kernel/signal.c:2374): Excess function parameter 'set' description in 'sys_rt_sigprocmask'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ada9c933

rcu: Use softirq to address performance regression · 09223371

由 Shaohua Li 提交于 6月 14, 2011

Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread. A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler. Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling. (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime. And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work. RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case. This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Tested-by: N"Alex,Shi" <alex.shi@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

09223371

rcu: Simplify curing of load woes · 9a432736

由 Paul E. McKenney 提交于 5月 30, 2011

Make the functions creating the kthreads wake them up.  Leverage the
fact that the per-node and boost kthreads can run anywhere, thus
dispensing with the need to wake them up once the incoming CPU has
gone fully online.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NDaniel J Blueman <daniel.blueman@gmail.com>

9a432736

10 6月, 2011 1 次提交

genirq: Prevent potential NULL dereference in irq_set_irq_wake() · 13863a66

由 Jesper Juhl 提交于 6月 09, 2011

In kernel/irq/manage.c::irq_set_irq_wake() we call
irq_get_desc_buslock() which may return NULL, but the code
dereferences the result unconditionally.

irq_set_irq_wake() has lots of callers - I checked a few and I couldn't
find anything that guarantees that they won't call it with some input that
will cause irq_get_desc_buslock() to return NULL, so I think it's a good
thing to test and -EINVAL was the most sane error code in this situation
that I could think of.

Not all callers test the return value of irq_set_irq_wake(), but those
that do take != 0 to mean error as far as I can see, so they should be
fine. I guess those that don't test actually should, but that's a
different issue.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1106092300360.17868@swampdragon.chaosbits.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

13863a66

09 6月, 2011 1 次提交

tracing: Fix regression in printk_formats file · db5e7ecc

由 Steven Rostedt 提交于 6月 09, 2011

The fix to fix the printk_formats of modules broke the
printk_formats of trace_printks in the kernel.

The update of what to show via the seq_file was only updated
if the passed in fmt was NULL, which happens only on the first
iteration. The result was showing the first format every time
instead of iterating through the available formats.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

db5e7ecc

08 6月, 2011 2 次提交

ftrace: Revert ftrace: Remove unnecessary disabling of irqs · a4f18ed1

由 Steven Rostedt 提交于 6月 07, 2011

Revert the commit that removed the disabling of interrupts around
the initial modifying of mcount callers to nops, and update the comment.

The original comment was outdated and stated that the interrupts were
being disabled to prevent kstop machine, which was required with the
old ftrace daemon, but was no longer the case.

What the comment failed to mention was that interrupts needed to be
disabled to keep interrupts from preempting the modifying of the code
and then executing the code that was partially modified.

Revert the commit and update the comment.
Reported-by: NRichard W.M. Jones <rjones@redhat.com>
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a4f18ed1

kprobes/trace: Fix kprobe selftest for gcc 4.6 · 265a5b7e

由 Steven Rostedt 提交于 6月 06, 2011

With gcc 4.6, the self test kprobe function:

 kprobe_trace_selftest_target()

is optimized such that kallsyms does not list it. The kprobes
test uses this function to insert a probe and test it. But
it will fail the test if the function is not listed in kallsyms.

Adding a __used annotation keeps the symbol in the kallsyms table.
Suggested-by: NDavid Daney <ddaney@caviumnetworks.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

265a5b7e

07 6月, 2011 3 次提交

sched: Fix/clarify set_task_cpu() locking rules · 6c6c54e1

由 Peter Zijlstra 提交于 6月 03, 2011

Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock
will serialize against sched_move_task().

Update the comments and fix the task_group() lockdep test.
Reported-and-tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307115427.2353.3456.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

6c6c54e1

lockdep: Fix lock_is_held() on recursion · f2513cde

由 Peter Zijlstra 提交于 6月 06, 2011

The main lock_is_held() user is lockdep_assert_held(), avoid false
assertions in lockdep_off() sections by unconditionally reporting the
lock is taken.

[ the reason this is important is a lockdep_assert_held() in ttwu()
  which triggers a warning under lockdep_off() as in printk() which
  can trigger another wakeup and lock up due to spinlock
  recursion, as reported and heroically debugged by Arne Jansen ]
Reported-and-tested-by: NArne Jansen <lists@die-jansens.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1307398759.2497.966.camel@laptopSigned-off-by: NIngo Molnar <mingo@elte.hu>

f2513cde

ftrace: Fix possible undefined return code · 0aff1c0c

由 GuoWen Li 提交于 6月 01, 2011

kernel/trace/ftrace.c: In function 'ftrace_regex_write.clone.15':
kernel/trace/ftrace.c:2743:6: warning: 'ret' may be used uninitialized in this
function
Signed-off-by: NGuoWen Li <guowen.li.linux@gmail.com>
Link: http://lkml.kernel.org/r/201106011918.47939.guowen.li.linux@gmail.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>

0aff1c0c

04 6月, 2011 1 次提交

perf: Comment /proc/sys/kernel/perf_event_paranoid to be part of user ABI · aa4a2218

由 Vince Weaver 提交于 6月 03, 2011

Turns out that distro packages use this file as an indicator of
the perf event subsystem - this is easier to check for from scripts
than the existence of the system call.

This is easy enough to keep around for the kernel, so add a
comment to make sure it stays so.
Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
Cc: David Ahern <dsahern@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031751170.29381@cl320.eecs.utk.eduSigned-off-by: NIngo Molnar <mingo@elte.hu>

aa4a2218

03 6月, 2011 6 次提交

timers: Consider slack value in mod_timer() · 1c3cc116

由 Sebastian Andrzej Siewior 提交于 5月 21, 2011

There is an optimization which does not update the timer if the timer
was pending and the expiration time was unchanged.

Since commit 3bbb9ec9 ("timers: Introduce the concept of timer slack
for legacy timers") this optimization is no longer applied for timers
where the expiration time got extended due to the slack value. So we
need to check again after the expiration time might have been updated.

[ tglx: Made it a single check by applying slack first and sorting
  out the slack = 0 value (all timeouts < 256 jiffies) early ]
Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/20110521105828.GA29442@Chamillionaire.breakpoint.ccSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

1c3cc116

genirq: Ensure we locate the passed IRQ in irq_alloc_descs() · c5182b88

由 Mark Brown 提交于 6月 02, 2011

When irq_alloc_descs() is called with no base IRQ specified then it will
search for a range of IRQs starting from a specified base address. In the
case where an IRQ is specified it still does this search in order to ensure
that none of the requested range is already allocated and it still uses the
from parameter to specify the base for the search. This means that in the
case where a base is specified but from is zero (which is reasonable as
any IRQ number is in the range specified by a zero from) the function will
get confused and try to allocate the first suitably sized block of free IRQs
it finds.

Instead use a specified IRQ as the base address for the search, and insist
that any from that is specified can support that IRQ.
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Link: http://lkml.kernel.org/r/1307037313-15733-1-git-send-email-broonie@opensource.wolfsonmicro.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

c5182b88

genirq: Fix descriptor init on non-sparse IRQs · e7fbad30

由 Linus Walleij 提交于 5月 31, 2011

The genirq changes are initializing descriptors for sparse IRQs quite
differently from how non-sparse (stacked?) IRQs are initialized, with
the effect that on my platform all IRQs are default-disabled on sparse
IRQs and default-enabled if non-sparse IRQs are used, crashing some
GPIO driver.

Fix this by refactoring the non-sparse IRQs to use the same descriptor
init function as the sparse IRQs.

Signed-off: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/1306858479-16622-1-git-send-email-linus.walleij@stericsson.com
Cc: stable@kernel.org # 2.6.39
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

e7fbad30

irq: Handle spurios irq detection for threaded irqs · 3a43e05f

由 Sebastian Andrzej Siewior 提交于 5月 31, 2011

The detection of spurios interrupts is currently limited to first level
handler. In force-threaded mode we never notice if the threaded irq does
not feel responsible.
This patch catches the return value of the threaded handler and forwards
it to the spurious detector. If the primary handler returns only
IRQ_WAKE_THREAD then the spourious detector ignores it because it gets
called again from the threaded handler.

[ tglx: Report the erroneous return value early and bail out ]
Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/1306824972-27067-2-git-send-email-sebastian@breakpoint.ccSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

3a43e05f

genirq: Print threaded handler in spurious debug output · ef26f20c

由 Sebastian Andrzej Siewior 提交于 5月 31, 2011

In forced threaded mode (or with an explicit threaded handler) we only
see the primary handler, but not the threaded handler.
Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/1306824972-27067-1-git-send-email-sebastian@breakpoint.ccSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

ef26f20c

clockevents: Handle empty cpumask gracefully · 1b054b67

由 Thomas Gleixner 提交于 6月 03, 2011

For UP it's stupid to request an initialized cpumask for the clock
event devices. Though we need the mask set even on UP to avoid a
horrible ifdeffery especially in the broadcast code.

For SMP we can at least try to survive with a warning and set the
cpumask of the cpu we're running on. That gives a decent chance to
bring the machine up and retrieve the debug info.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: Stephen Boyd <sboyd@codeaurora.org>

1b054b67

31 5月, 2011 4 次提交

perf, cgroups: Fix up for new API · 74c355fb

由 Peter Zijlstra 提交于 5月 30, 2011

Ben changed the cgroup API in commit f780bdb7 (cgroups: add
per-thread subsystem callbacks) in an incompatible way, but
forgot to convert the perf cgroup bits.

Avoid compile warnings and runtime splats and convert perf too ;-)
Acked-by: NBen Blum <bblum@andrew.cmu.edu>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306767651.1200.2990.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

74c355fb

sched: Fix schedstat.nr_wakeups_migrate · f339b9dc

由 Peter Zijlstra 提交于 5月 31, 2011

While looking over the code I found that with the ttwu rework the
nr_wakeups_migrate test broke since we now switch cpus prior to
calling ttwu_stat(), hence the test is always true.

Cure this by passing the migration state in wake_flags. Also move the
whole test under CONFIG_SMP, its hard to migrate tasks on UP :-)
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-pwwxl7gdqs5676f1d4cx6pj7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

f339b9dc

sched: Fix cross-cpu clock sync on remote wakeups · f01114cb

由 Peter Zijlstra 提交于 5月 31, 2011

Markus reported that commit 317f3941 ("sched: Move the second half
of ttwu() to the remote cpu") caused some accounting funnies on his AMD
Phenom II X4, such as weird 'top' results.

It turns out that this is due to non-synced TSC and the queued remote
wakeups stopped coupeling the two relevant cpu clocks, which leads to
wakeups seeing time jumps, which in turn lead to skewed runtime stats.

Add an explicit call to sched_clock_cpu() to couple the per-cpu clocks
to restore the normal flow of time.
Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1306835745.2353.3.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

f01114cb

rcu: Cure load woes · d72bce0e

由 Peter Zijlstra 提交于 5月 30, 2011

Commit cc3ce517 (rcu: Start RCU kthreads in TASK_INTERRUPTIBLE
state) fudges a sleeping task' state, resulting in the scheduler seeing
a TASK_UNINTERRUPTIBLE task going to sleep, but a TASK_INTERRUPTIBLE
task waking up. The result is unbalanced load calculation.

The problem that patch tried to address is that the RCU threads could
stay in UNINTERRUPTIBLE state for quite a while and triggering the hung
task detector due to on-demand wake-ups.

Cure the problem differently by always giving the tasks at least one
wake-up once the CPU is fully up and running, this will kick them out of
the initial UNINTERRUPTIBLE state and into the regular INTERRUPTIBLE
wait state.

[ The alternative would be teaching kthread_create() to start threads as
  INTERRUPTIBLE but that needs a tad more thought. ]
Reported-by: NDamien Wyart <damien.wyart@free.fr>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Link: http://lkml.kernel.org/r/1306755291.1200.2872.camel@twinsSigned-off-by: NIngo Molnar <mingo@elte.hu>

d72bce0e

30 5月, 2011 1 次提交

mm: Fix boot crash in mm_alloc() · 6345d24d

由 Linus Torvalds 提交于 5月 29, 2011

Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:

    BUG: unable to handle kernel NULL pointer dereference at   (null)
    IP: [<c11ae035>] find_next_bit+0x55/0xb0
    Call Trace:
     [<c11addda>] cpumask_any_but+0x2a/0x70
     [<c102396b>] flush_tlb_mm+0x2b/0x80
     [<c1022705>] pud_populate+0x35/0x50
     [<c10227ba>] pgd_alloc+0x9a/0xf0
     [<c103a3fc>] mm_init+0xec/0x120
     [<c103a7a3>] mm_alloc+0x53/0xd0

which was introduced by commit de03c72c ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask

Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.
Reported-by: NThomas Gleixner <tglx@linutronix.de>
Reported-by: NIngo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6345d24d

29 5月, 2011 1 次提交

idle governor: Avoid lock acquisition to read pm_qos before entering idle · 333c5ae9

由 Tim Chen 提交于 2月 11, 2011

Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.

I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos target value according to the list of qos
requests received.  This look up currently needs the acquisition of a
lock to access the list of qos requests to find the qos target value,
slowing down the entrance into idle state due to contention by multiple
cpus to access this list.  The contention is severe when there are a lot
of cpus waking and going into idle.  For example, for a simple workload
that has 32 pair of processes ping ponging messages to each other, where
64 cpu cores are active in test system, I see the following profile with
37.82% of cpu cycles spent in contention of pm_qos_lock:

-     37.82%          swapper  [kernel.kallsyms]          [k]
_raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.65% pm_qos_request
           menu_select
           cpuidle_idle_call
         - cpu_idle
              99.98% start_secondary

A better approach will be to cache the updated pm_qos target value so
reading it does not require lock acquisition as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.

cc: stable@kernel.org
Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJames Bottomley <James.Bottomley@suse.de>
Acked-by: Nmark gross <markgross@thegnar.org>
Signed-off-by: NLen Brown <len.brown@intel.com>

333c5ae9

28 5月, 2011 2 次提交

rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state · cc3ce517

由 Paul E. McKenney 提交于 5月 25, 2011

Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
result in softlockup warnings.  Because some of RCU's kthreads can
legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
state in order to avoid those warnings.
Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cc3ce517

rcu: Remove waitqueue usage for cpu, node, and boost kthreads · 08bca60a

由 Peter Zijlstra 提交于 5月 20, 2011

It is not necessary to use waitqueues for the RCU kthreads because
we always know exactly which thread is to be awakened.  In addition,
wake_up() only issues an actual wakeup when there is a thread waiting on
the queue, which was why there was an extra explicit wake_up_process()
to get the RCU kthreads started.

Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
eliminates the need for the initial wake_up_process() and also shrinks
the data structure size a bit.  The wakeup logic is placed in a new
rcu_wait() macro.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

08bca60a