1. 21 March 2012 (2 commits)
    • exit_signal: simplify the "we have changed execution domain" logic · e6368253
      Committed by Oleg Nesterov
      exit_notify() checks "tsk->self_exec_id != tsk->parent_exec_id"
      to handle the "we have changed execution domain" case.
      
      We can change de_thread() to always set ->exit_signal = SIGCHLD
      and remove this check to simplify the code.
      
      We could change setup_new_exec() instead, this looks more logical
      because it increments ->self_exec_id. But note that de_thread()
      already resets ->exit_signal if it changes the leader, let's keep
      both changes close to each other.
      
      Note that we change ->exit_signal lockless, this changes the rules.
      Thereafter ->exit_signal is not stable under tasklist but this is
      fine, the only possible change is OLDSIG -> SIGCHLD. This can race
      with eligible_child() but the race is harmless. We can race with
      reparent_leader() which changes our ->exit_signal in parallel, but
      it does the same change to SIGCHLD.
      
      The noticeable user-visible change is that the execing task is not
      "visible" to do_wait()->eligible_child(__WCLONE) right after exec.
      To me this looks more logical, and it is consistent with the multi-threaded (mt) case.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
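
      A minimal sketch of the resulting flow, assuming the change lands in de_thread() and the exec-id check in exit_notify() goes away (not the verbatim diff):

        /* fs/exec.c: de_thread() -- sketch: exec always resets the signal */
        tsk->exit_signal = SIGCHLD;    /* "we have changed execution domain" */

        /* kernel/exit.c: exit_notify() -- the exec-id comparison becomes unnecessary:
         *
         *    if (thread_group_leader(tsk) && !ptrace_reparented(tsk) &&
         *        tsk->self_exec_id != tsk->parent_exec_id)
         *            tsk->exit_signal = SIGCHLD;
         */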
    • CLONE_PARENT shouldn't allow to set ->exit_signal · 5f8aadd8
      Committed by Oleg Nesterov
      The child must not control its ->exit_signal; it is the parent who
      decides which signal the child should use for notification.
      
      This means that CLONE_PARENT should not use "clone_flags & CSIGNAL":
      the forking task is the sibling of the new process, and their parent
      doesn't control exit_signal in this case.
      
      This patch uses ->exit_signal of the forking process, but perhaps
      we should simply use SIGCHLD.
      
      We read group_leader->exit_signal lockless, this can race with the
      ORIGINAL_SIGNAL -> SIGCHLD transition, but this is fine.
      
      Potentially this change allows us to kill self_exec_id/parent_exec_id.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
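
      A minimal sketch of the notification-signal choice in copy_process(), assuming the surrounding kernel/fork.c context (details approximate):

        /* kernel/fork.c: copy_process() -- sketch */
        if (clone_flags & (CLONE_THREAD | CLONE_PARENT)) {
            if (clone_flags & CLONE_THREAD)
                p->exit_signal = -1;    /* sub-threads never notify the parent */
            else
                /* sibling of the caller: inherit, do not honor CSIGNAL */
                p->exit_signal = current->group_leader->exit_signal;
        } else {
            p->exit_signal = (clone_flags & CSIGNAL);
        }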
  2. 16 March 2012 (3 commits)
  3. 15 March 2012 (1 commit)
  4. 14 March 2012 (1 commit)
    • genirq: Flush the irq thread on synchronization · 7140ea19
      Committed by Ido Yariv
      The current implementation does not always flush the threaded handler
      when disabling the irq. In case the irq handler was called, but the
      threaded handler hasn't started running yet, the interrupt will be
      flagged as pending, and the handler will not run. This implementation
      has some issues:
      
      First, if the interrupt is a wake source and flagged as pending, the
      system will not be able to suspend.
      
      Second, when quickly disabling and re-enabling the irq, the threaded
      handler might continue to run after the irq is re-enabled without the
      irq handler being called first. This might be an unexpected behavior.
      
      In addition, it might be counter-intuitive that the threaded handler
      will not be called even though the irq handler was called and returned
      IRQ_WAKE_THREAD.
      
      Fix this by always waiting for the threaded handler to complete in
      synchronize_irq().
      
      [ tglx: Massaged comments, added WARN_ONs and the missing
        	IRQTF_RUNTHREAD check in exit_irq_thread() ]
      Signed-off-by: Ido Yariv <ido@wizery.com>
      Link: http://lkml.kernel.org/r/1322843052-7166-1-git-send-email-ido@wizery.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
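
      A rough sketch of the resulting wait in kernel/irq/manage.c, assuming desc->threads_active counts in-flight threaded handlers (simplified; details approximate):

        /* synchronize_irq() -- sketch */
        void synchronize_irq(unsigned int irq)
        {
            struct irq_desc *desc = irq_to_desc(irq);

            if (!desc)
                return;

            /* as before: wait for the hard irq handler on other CPUs */
            while (irqd_irq_inprogress(&desc->irq_data))
                cpu_relax();

            /* new: also wait until every woken irq thread has finished */
            wait_event(desc->wait_for_threads,
                       !atomic_read(&desc->threads_active));
        }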
  5. 13 March 2012 (6 commits)
  6. 10 March 2012 (4 commits)
  7. 08 March 2012 (1 commit)
  8. 07 March 2012 (1 commit)
  9. 06 March 2012 (9 commits)
    • softirq: Reduce invoke_softirq() code duplication · b2a00178
      Committed by Heiko Carstens
      The two invoke_softirq() variants are identical except for a single
      line. So move the #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED inside one of
      the functions and get rid of the other one.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
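
      The shape of the merged helper, as a sketch (the force_irqthreads branch is kept as in the surrounding kernel/softirq.c code):

        static inline void invoke_softirq(void)
        {
            if (!force_irqthreads) {
        #ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
                __do_softirq();
        #else
                do_softirq();
        #endif
            } else {
                wakeup_softirqd();
            }
        }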
    • genirq: Fix long-term regression in genirq irq_set_irq_type() handling · a09b659c
      Committed by Russell King
      In 2008, commit 0c5d1eb7 ("genirq: record trigger type") modified the
      way set_irq_type() handles the 'no trigger' condition.  However, this has
      an adverse effect on PCMCIA support on Intel StrongARM and probably PXA
      platforms.
      
      PCMCIA has several status signals on the socket which can trigger
      interrupts; some of these status signals depend on the card's mode
      (whether it is configured in memory or IO mode).  For example, cards have
      a 'Ready/IRQ' signal: in memory mode, this provides an indication to
      PCMCIA that the card has finished its power up initialization.  In IO
      mode, it provides the device interrupt signal.  Other status signals
      switch between on-board battery status and loud speaker output.
      
      In classical PCMCIA implementations, where you have a specific socket
      controller, the controller provides a method to mask interrupts from the
      socket, and importantly ignore any state transitions on the pins which
      correspond with interrupts once masked.  This masking prevents unwanted
      events caused by the removal and application of socket power being
      forwarded.
      
      However, on platforms where there is no socket controller, the PCMCIA
      status and interrupt signals are routed to standard edge-triggered GPIOs. 
      These GPIOs can be configured to interrupt on rising edge, falling edge,
      or never.  This is where the problems start.
      
      Edge triggered interrupts are required to record events while disabled via
      the usual methods of {free,request,disable,enable}_irq() to prevent
      problems with dropped interrupts (eg, the 8390 driver uses disable_irq()
      to defer the delivery of interrupts).  As a result, these interfaces can
      not be used to implement the desired behaviour.
      
      The side effect of this is that if the 'Ready/IRQ' GPIO is disabled via
      disable_irq() on suspend, and enabled via enable_irq() after resume, we
      will record the state transitions caused by powering events as valid
      interrupts, and forward them to the card driver, which may attempt to
      access a card which is not powered up.
      
      This leads to delays in resume while drivers spin in their interrupt
      handlers, and to complaints from drivers before they realize what's
      happened.
      
      Moreover, in the case of the 'Ready/IRQ' signal, this is requested and
      freed by the card driver itself; the PCMCIA core has no idea whether the
      interrupt is requested, and, therefore, whether a call to disable_irq()
      would be valid.  (We tried this around 2.4.17 / 2.5.1 kernel era, and
      ended up throwing it out because of this problem.)
      
      Therefore, it was decided back in around 2002 to disable the edge
      triggering instead, resulting in all state transitions on the GPIO being
      ignored.  That's what we actually need the hardware to do.
      
      The commit above changes this behaviour; it explicitly prevents the 'no
      trigger' state being selected.
      
      The reason that request_irq() does not accept the 'no trigger' state is
      for compatibility with existing drivers which do not provide their desired
      triggering configuration.  The set_irq_type() function is 'new' and not
      used by non-trigger aware drivers.
      
      Therefore, revert this change, and restore previously working platforms
      back to their former state.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: linux@arm.linux.org.uk
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
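
      A hedged illustration of the pattern the revert restores (hypothetical driver snippet; the exact trigger flags depend on the board): instead of disable_irq(), which keeps latching edges, the code selects 'no trigger' so pin transitions during power changes are ignored.

        irq_set_irq_type(irq, IRQ_TYPE_NONE);          /* stop reacting to edges */
        /* ... remove/apply socket power across suspend/resume ... */
        irq_set_irq_type(irq, IRQ_TYPE_EDGE_BOTH);     /* re-arm edge detection */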
    • hung_task: fix the broken rcu_lock_break() logic · 6027ce49
      Committed by Oleg Nesterov
      check_hung_uninterruptible_tasks()->rcu_lock_break() introduced by
      "softlockup: check all tasks in hung_task" commit ce9dbe24 looks
      absolutely wrong.
      
      	- rcu_lock_break() does put_task_struct(). If the task has exited
      	  it is not safe to even read its ->state, nothing protects this
      	  task_struct.
      
      	- The TASK_DEAD checks are wrong too. Contrary to the comment, we
      	  can't use it to check if the task was unhashed. It can be unhashed
      	  without TASK_DEAD, or it can be valid with TASK_DEAD.
      
      	  For example, an autoreaping task can do release_task(current)
      	  long before it sets TASK_DEAD in do_exit().
      
      	  Or, a zombie task can have ->state == TASK_DEAD but release_task()
      	  was not called, and in this case we must not break the loop.
      
      Change this code to check pid_alive() instead, and do this before we drop
      the reference to the task_struct.
      
      Note: while_each_thread() under rcu_read_lock() is not really safe, it can
      livelock.  This will be fixed later, but fortunately in this case the
      "max_count" logic saves us anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Mandeep Singh Baines <msb@google.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
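
      A sketch of the fixed helper in kernel/hung_task.c: the liveness check is done via pid_alive() while the task_struct references are still held (approximate, not the verbatim patch):

        static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
        {
            bool can_cont;

            get_task_struct(g);
            get_task_struct(t);
            rcu_read_unlock();
            cond_resched();
            rcu_read_lock();
            /* check before dropping the references, not after */
            can_cont = pid_alive(g) && pid_alive(t);
            put_task_struct(t);
            put_task_struct(g);

            return can_cont;
        }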
    • vfork: kill PF_STARTING · 6e27f63e
      Committed by Oleg Nesterov
      Previously it was (ab)used by utrace.  Then it was wrongly used by the
      scheduler code.
      
      Currently it is not used, kill it before it finds the new erroneous user.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump_wait: don't call complete_vfork_done() · 57b59c4a
      Committed by Oleg Nesterov
      Now that CLONE_VFORK is killable, coredump_wait() no longer needs
      complete_vfork_done().  zap_threads() should find and kill all tasks with
      the same ->mm, this includes our parent if ->vfork_done is set.
      
      mm_release() becomes the only caller, so unexport complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfork: make it killable · d68b46fe
      Committed by Oleg Nesterov
      Make vfork() killable.
      
      Change do_fork(CLONE_VFORK) to do wait_for_completion_killable().  If it
      fails we do not return to user mode and never touch the memory shared
      with our child.
      
      However, in this case we should clear child->vfork_done before returning;
      we use task_lock() in do_fork()->wait_for_vfork_done() and in
      complete_vfork_done() to serialize with each other.
      
      Note: now that we use task_lock() we don't really need the completion, we
      could turn task->vfork_done into "task_struct *wake_up_me", but this
      would add some complications.
      
      NOTE: this and the next patches do not affect in-kernel users of
      CLONE_VFORK; kernel threads run with all signals ignored, including
      SIGKILL/SIGSTOP.
      
      However, this is obviously a user-visible change.  Not only can a fatal
      signal kill the vforking parent, a sub-thread can do execve or
      exit_group() and kill the thread sleeping in vfork().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
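
      A minimal sketch of the killable wait and the task_lock() serialization described above (helper name taken from the commit message; details approximate). On the other side, complete_vfork_done() takes task_lock(tsk) around its own vfork_done check for the same reason.

        /* kernel/fork.c -- sketch */
        static int wait_for_vfork_done(struct task_struct *child,
                                       struct completion *vfork)
        {
            int killed = wait_for_completion_killable(vfork);

            if (killed) {
                /* never touch the shared mm; detach from the child instead */
                task_lock(child);
                child->vfork_done = NULL;
                task_unlock(child);
            }
            return killed;
        }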
    • vfork: introduce complete_vfork_done() · c415c3b4
      Committed by Oleg Nesterov
      No functional changes.
      
      Move the clear-and-complete-vfork_done code into the new trivial helper,
      complete_vfork_done().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
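
      In rough terms, the new helper just factors out the existing clear-and-complete sequence, along these lines (sketch):

        void complete_vfork_done(struct task_struct *tsk)
        {
            struct completion *vfork_done = tsk->vfork_done;

            tsk->vfork_done = NULL;
            complete(vfork_done);
        }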
    • kprobes: return proper error code from register_kprobe() · f986a499
      Committed by Prashanth Nageshappa
      register_kprobe() aborts if the address of the new request falls in a
      prohibited area (such as ftrace pages, __kprobes annotated functions,
      non-kernel text addresses, jump label text).  We however don't return the
      right error on this abort, resulting in a silent failure - incorrect
      adding/reporting of kprobes ('perf probe do_fork+18' or 'perf probe
      mcount' for instance).
      
      In V2 we are incorporating Masami Hiramatsu's feedback.
      
      This patch fixes it by returning -EINVAL upon failure.
      
      While we are here, rename the label used for exit to be more appropriate.
      Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: Prashanth K Nageshappa <prashanth@linux.vnet.ibm.com>
      Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
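
      The gist of the fix, as a sketch of the address validation in register_kprobe() (the exact set of predicates and the renamed exit label are approximations):

        /* kernel/kprobes.c: register_kprobe() -- sketch */
        preempt_disable();
        if (!kernel_text_address((unsigned long)p->addr) ||
            in_kprobes_functions((unsigned long)p->addr) ||
            ftrace_text_reserved(p->addr, p->addr) ||
            jump_label_text_reserved(p->addr, p->addr)) {
            ret = -EINVAL;          /* previously this path did not set an error */
            goto cannot_probe;      /* renamed exit label */
        }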
    • kmsg_dump: don't run on non-error paths by default · c22ab332
      Committed by Matthew Garrett
      Since commit 04c6862c ("kmsg_dump: add kmsg_dump() calls to the
      reboot, halt, poweroff and emergency_restart paths"), kmsg_dump() gets
      run on normal paths including poweroff and reboot.
      
      This is less than ideal given pstore implementations that can only
      represent single backtraces, since a reboot may overwrite a stored oops
      before it's been picked up by userspace.  In addition, some pstore
      backends may have low performance and provide a significant delay in
      reboot as a result.
      
      This patch adds a printk.always_kmsg_dump kernel parameter (which can also
      be changed from userspace).  Without it, the code will only be run on
      failure paths rather than on normal paths.  The option can be enabled in
      environments where there's a desire to audit whether or not a reboot was
      cleanly requested.
      Signed-off-by: Matthew Garrett <mjg@redhat.com>
      Acked-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marco Stornelli <marco.stornelli@gmail.com>
      Cc: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
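
      A sketch of how such a gate looks in kernel/printk.c, with the exact reason ordering treated as an assumption; the knob can be set at boot with printk.always_kmsg_dump=1 or, typically, flipped later via the module parameter under /sys/module/printk/parameters/.

        /* kernel/printk.c -- sketch */
        static bool always_kmsg_dump;
        module_param_named(always_kmsg_dump, always_kmsg_dump, bool,
                           S_IRUGO | S_IWUSR);

        void kmsg_dump(enum kmsg_dump_reason reason)
        {
            /* by default, skip the reboot/halt/poweroff/restart reasons */
            if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
                return;

            /* ... iterate over the registered dumpers as before ... */
        }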
  10. 05 March 2012 (3 commits)
    • perf: Add callback to flush branch_stack on context switch · d010b332
      Committed by Stephane Eranian
      With branch stack sampling, it is possible to filter by priv levels.
      
      In system-wide mode, that means it is possible to capture only user
      level branches. The builtin SW LBR filter needs to disassemble code
      based on LBR captured addresses. For that, it needs to know the task
      the addresses are associated with. Because of context switches, the
      content of the branch stack buffer may contain addresses from
      different tasks.
      
      We need a callback on context switch to either flush the branch stack
      or save it. This patch adds a new callback in struct pmu which is called
      during context switches. The callback is called only when necessary,
      that is, when a system-wide context has at least one event which
      uses PERF_SAMPLE_BRANCH_STACK. The callback is never called for a
      per-thread context.
      
      In this version, the Intel x86 code simply flushes (resets) the LBR
      on context switches (fills it with zeroes). Those zeroed branches are
      then filtered out by the SW filter.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
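
      The shape of the new hook, as a sketch of the struct pmu addition plus the Intel x86 side described above (callback name per the commit; other details approximate):

        /* include/linux/perf_event.h -- sketch */
        struct pmu {
            /* ... existing callbacks ... */

            /*
             * Flush branch stack on context switch; used only by cpu-wide
             * contexts that have at least one PERF_SAMPLE_BRANCH_STACK event.
             */
            void (*flush_branch_stack)(void);
        };

        /* arch/x86 Intel side -- sketch: simply reset (zero) the LBR */
        static void intel_pmu_flush_branch_stack(void)
        {
            if (x86_pmu.lbr_nr)
                intel_pmu_lbr_reset();
        }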
    • perf: Disable PERF_SAMPLE_BRANCH_* when not supported · 2481c5fa
      Committed by Stephane Eranian
      PERF_SAMPLE_BRANCH_* is disabled for:
      
       - SW events (sw counters, tracepoints)
       - HW breakpoints
       - ALL but Intel x86 architecture
       - AMD64 processors
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-10-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
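
      In rough terms, the unsupported event types reject branch sampling at init time; a sketch assuming a has_branch_stack() helper over attr.sample_type:

        /* sketch -- generic side */
        static inline bool has_branch_stack(struct perf_event *event)
        {
            return event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK;
        }

        static int perf_swevent_init(struct perf_event *event)
        {
            /* ... existing type checks ... */

            if (has_branch_stack(event))
                return -EOPNOTSUPP;    /* SW events cannot sample branches */

            /* ... */
            return 0;
        }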
    • perf: Add generic taken branch sampling support · bce38cd5
      Committed by Stephane Eranian
      This patch adds the ability to sample taken branches to the
      perf_event interface.
      
      The ability to capture taken branches is very useful for all
      sorts of analysis. For instance, basic block profiling, call
      counts, statistical call graph.
      
      This new capability requires hardware assist and as such may
      not be available on all HW platforms. On Intel x86 it is
      implemented on top of the Last Branch Record (LBR) facility.
      
      To enable taken branches sampling, the PERF_SAMPLE_BRANCH_STACK
      bit must be set in attr->sample_type.
      
      Sampled taken branches may be filtered by type and/or priv
      levels.
      
      The patch adds a new field, called branch_sample_type, to the
      perf_event_attr structure. It contains a bitmask of filters
      to apply to the sampled taken branches.
      
      Filters may be implemented in HW. If the HW filter does not exist
      or is not good enough, some arch may also implement a SW filter.
      
      The following generic filters are currently defined:
      - PERF_SAMPLE_BRANCH_USER
        only branches whose targets are at the user level
      
      - PERF_SAMPLE_BRANCH_KERNEL
        only branches whose targets are at the kernel level
      
      - PERF_SAMPLE_BRANCH_HV
        only branches whose targets are at the hypervisor level
      
      - PERF_SAMPLE_BRANCH_ANY
        any type of branches (subject to priv level filters)
      
      - PERF_SAMPLE_BRANCH_ANY_CALL
        any call branches (may incl. syscall on some arch)
      
      - PERF_SAMPLE_BRANCH_ANY_RETURN
        any return branches (may incl. syscall returns on some arch)
      
      - PERF_SAMPLE_BRANCH_IND_CALL
        indirect call branches
      
      Obviously filters may be combined. The priv level bits are optional;
      if not provided, the priv level of the associated event is used. It
      is possible to collect branches at a priv level different from that
      of the associated event. Use of the kernel and hv priv levels is
      subject to permissions and availability (hv).
      
      The number of taken branch records present in each sample may vary
      based on the HW, the type of sampled branches, and the executed code.
      Therefore each sample records how many taken branch entries it
      contains.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1328826068-11713-2-git-send-email-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
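
      A hedged userspace usage sketch of the new interface: request branch-stack samples on a hardware cycles event via perf_event_open(), filtering to user-level call branches. The constants are the ones listed above; error handling and sample parsing are omitted.

        #include <linux/perf_event.h>
        #include <string.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        static int open_branch_sampling_event(void)
        {
            struct perf_event_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.size = sizeof(attr);
            attr.type = PERF_TYPE_HARDWARE;
            attr.config = PERF_COUNT_HW_CPU_CYCLES;
            attr.sample_period = 100000;
            attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_BRANCH_STACK;
            attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                                      PERF_SAMPLE_BRANCH_ANY_CALL;

            /* pid = 0 (self), cpu = -1 (any), no group leader, no flags */
            return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        }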
  11. 02 March 2012 (2 commits)
  12. 01 March 2012 (7 commits)