Commits · 35b740e4662ef386f0c60e1b60aaf5b44db9914c · OpenHarmony / kernel_linux

05 Jan, 2012 2 commits

ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b

Committed by Oleg Nesterov on Jan 04, 2012

This is the temporary simple fix for 3.2, we need more changes in this
area.

1. do_signal_stop() assumes that the running untraced thread in the
   stopped thread group is not possible. This was our goal but it is
   not yet achieved: a stopped-but-resumed tracee can clone the running
   thread which can initiate another group-stop.

   Remove WARN_ON_ONCE(!current->ptrace).

2. A new thread always starts with ->jobctl = 0. If it is auto-attached
   and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
   but JOBCTL_STOP_SIGMASK part is zero, this triggers WARN_ON(!signr)
   in do_jobctl_trap() if another debugger attaches.

   Change __ptrace_unlink() to set the artificial SIGSTOP for report.

   Alternatively we could change ptrace_init_task() to copy signr from
   current, but this means we can copy it for no reason and hide the
   possible similar problems.
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.1]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
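
The second point of the ptrace fix above can be pictured with a small userspace model. This is only a sketch: the `MODEL_*` constants and `model_set_stop_pending()` are hypothetical names, and the bit positions are assumptions chosen to resemble a "signal number in the low bits, pending flag above them" layout. It is not the kernel's code.

```c
#include <signal.h>

/* Assumed layout: the low 16 bits carry the stop signal number and a
 * separate bit marks a pending group stop. Hypothetical values. */
#define MODEL_STOP_SIGMASK  0xffffUL
#define MODEL_STOP_PENDING  (1UL << 17)

/* Mark a group stop as pending. If no signal number was recorded yet
 * (a freshly cloned thread starts with jobctl == 0), report an
 * artificial SIGSTOP so the signal part is never zero. */
unsigned long model_set_stop_pending(unsigned long jobctl)
{
	jobctl |= MODEL_STOP_PENDING;
	if (!(jobctl & MODEL_STOP_SIGMASK))
		jobctl |= SIGSTOP;
	return jobctl;
}
```

Calling `model_set_stop_pending(0)` yields a value whose signal part is `SIGSTOP`, while a value that already carries a signal number keeps it.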

ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race · 50b8d257

Committed by Oleg Nesterov on Jan 04, 2012

Test-case:

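	#include <assert.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>

	int main(void)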
	{
		int pid, status;

		pid = fork();
		if (!pid) {
			for (;;) {
				if (!fork())
					return 0;
				if (waitpid(-1, &status, 0) < 0) {
					printf("ERR!! wait: %m\n");
					return 0;
				}
			}
		}

		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
		assert(waitpid(-1, NULL, 0) == pid);

		assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
					PTRACE_O_TRACEFORK) == 0);

		do {
			ptrace(PTRACE_CONT, pid, 0, 0);
			pid = waitpid(-1, NULL, 0);
		} while (pid > 0);

		return 1;
	}

It fails because ->real_parent sees its child in EXIT_DEAD state
while the tracer is going to change the state back to EXIT_ZOMBIE
in wait_task_zombie().

The offending commit is 823b018e which moved the EXIT_DEAD check,
but in fact we should not blame it. The original code was not
correct as well because it didn't take ptrace_reparented() into
account and because we can't really trust ->ptrace.

This patch adds the additional check to close this particular
race but it doesn't solve the whole problem. We simply can't
rely on ->ptrace in this case, it can be cleared if the tracer
is multithreaded by the exiting ->parent.

I think we should kill EXIT_DEAD altogether, we should always
remove the soon-to-be-reaped child from ->children or at least
we should never do the DEAD->ZOMBIE transition. But this is too
complex for 3.2.
Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>		[3.0+]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Jan, 2012 1 commit

hung_task: fix false positive during vfork · f9fab10b

Committed by Mandeep Singh Baines on Jan 03, 2012

vfork parent uninterruptibly and unkillably waits for its child to
exec/exit. This wait is of unbounded length. Ignore such waits
in the hung_task detector.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reported-by: Sasha Levin <levinsasha928@gmail.com>
LKML-Reference: <1325344394.28904.43.camel@lappy>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Jan, 2012 1 commit

futex: Fix uninterruptible loop due to gate_area · e6780f72

Committed by Hugh Dickins on Dec 31, 2011

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Dec, 2011 1 commit

Revert "clockevents: Set noop handler in clockevents_exchange_device()" · 3b87487a

Committed by Linus Torvalds on Dec 30, 2011

This reverts commit de28f25e.

It results in resume problems for various people. See for example

  http://thread.gmane.org/gmane.linux.kernel/1233033
  http://thread.gmane.org/gmane.linux.kernel/1233389
  http://thread.gmane.org/gmane.linux.kernel/1233159
  http://thread.gmane.org/gmane.linux.kernel/1227868/focus=1230877

and the fedora and ubuntu bug reports

  https://bugzilla.redhat.com/show_bug.cgi?id=767248
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/904569

which got bisected down to the stable version of this commit.
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Phil Miller <mille121@illinois.edu>
Reported-by: Philip Langdale <philipl@overt.org>
Reported-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <gregkh@suse.de>
Cc: stable@kernel.org    # for stable kernels that applied the original
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

21 Dec, 2011 3 commits

lockdep/waitqueues: Add better annotation · f07fdec5

Committed by Peter Zijlstra on Dec 13, 2011

 -> #2 (&tty->write_wait){-.-...}:

is a lot more informative than:

 -> #2 (key#19){-.....}:
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-8zpopbny51023rdb0qq67eye@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
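
One way a lock report can show a readable name such as `&tty->write_wait` instead of an anonymous key is preprocessor stringification of the initializer's argument. The sketch below illustrates that technique in userspace; every `model_*` name is hypothetical and this is not presented as the kernel's waitqueue API.

```c
/* Hypothetical wait-queue head that remembers a human-readable name. */
struct model_wait_queue_head {
	const char *name;
};

/* Initializer that takes the name explicitly. */
void model_init_waitqueue_head_named(struct model_wait_queue_head *q,
				     const char *name)
{
	q->name = name;
}

/* Wrapper macro: #q turns the argument expression into a string, so
 * the call site's own spelling becomes the name. */
#define model_init_waitqueue_head(q) \
	model_init_waitqueue_head_named((q), #q)

/* Hypothetical owner structure, used only to form an expression. */
struct model_tty {
	struct model_wait_queue_head write_wait;
};
```

With `struct model_tty *tty`, calling `model_init_waitqueue_head(&tty->write_wait)` records the string `&tty->write_wait` as the name.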

binary_sysctl(): fix memory leak · 3d3c8f93

Committed by Michel Lespinasse on Dec 19, 2011

binary_sysctl() calls sysctl_getname() which allocates from names_cache
slab using __getname().

The matching function to free the name is __putname(), and not putname()
which should be used only to match getname() allocations.

This is because when auditing is enabled, putname() calls audit_putname
*instead* (not in addition) to __putname().  Then, if a syscall is in
progress, audit_putname does not release the name - instead, it expects
the name to get released when the syscall completes, but that will happen
only if audit_getname() was called previously, i.e.  if the name was
allocated with getname() rather than the naked __getname().  So,
__getname() followed by putname() ends up leaking memory.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
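
The leak described in the binary_sysctl() commit comes from pairing a raw allocation with a release routine that defers to bookkeeping which never saw the allocation. The userspace model below is a sketch of that mismatch only: all `model_*` names are hypothetical, a single tracked slot stands in for the audit state, and auditing is assumed to be always enabled with a syscall in progress.

```c
#include <stdbool.h>
#include <stdlib.h>

int model_live_names;           /* buffers allocated and not yet freed */
static char *model_audit_name;  /* name the audit model will free later */
bool model_in_syscall = true;   /* assumption: a syscall is in progress */

/* Raw pair: allocate and free directly. */
char *model_raw_getname(void)
{
	model_live_names++;
	return malloc(64);
}

void model_raw_putname(char *name)
{
	free(name);
	model_live_names--;
}

/* Audited allocation: the name is also recorded for deferred release. */
char *model_getname(void)
{
	char *name = model_raw_getname();

	model_audit_name = name;
	return name;
}

/* Audited release: during a syscall it frees nothing and relies on
 * model_syscall_exit() to release whatever was recorded. */
void model_putname(char *name)
{
	if (model_in_syscall)
		return;
	model_raw_putname(name);
}

void model_syscall_exit(void)
{
	if (model_audit_name) {
		model_raw_putname(model_audit_name);
		model_audit_name = NULL;
	}
}
```

In this model, `model_getname()` followed by `model_putname()` is balanced once `model_syscall_exit()` runs, whereas `model_raw_getname()` followed by `model_putname()` leaves `model_live_names` at 1, which is the leak.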

cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask · b246272e

Committed by David Rientjes on Dec 19, 2011

Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.

c0ff7453 ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread.  This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.

This stall is unnecessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes.  This was addressed by
89e8a244 ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind.  To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.

This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.

Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change.  This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
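
The rule stated in the cpusets commit (skip the stall only when the old and new node sets share a node and the thread has no mempolicy) can be written as a small predicate. This is a sketch under simplifying assumptions: node sets are modelled as 64-bit masks and `model_need_stall()` is a hypothetical name, not a kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true when an update from old_nodes to new_nodes should wait
 * for readers. The wait is skipped only if at least one node stays
 * set across the update and the task has no mempolicy whose nodemask
 * could be remapped to something disjoint. */
bool model_need_stall(uint64_t old_nodes, uint64_t new_nodes,
		      bool has_mempolicy)
{
	bool intersects = (old_nodes & new_nodes) != 0;

	return has_mempolicy || !intersects;
}
```

For example, moving from nodes {0,1} to {1,2} without a mempolicy needs no stall, while the same move with a mempolicy, or a move to the disjoint set {2,3}, does.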

20 Dec, 2011 1 commit

cgroups: fix a css_set not found bug in cgroup_attach_proc · e0197aae

Committed by Mandeep Singh Baines on Dec 15, 2011

There is a BUG when migrating a PF_EXITING proc. Since css_set_prefetch()
is not called for the PF_EXITING case, find_existing_css_set() will return
NULL inside cgroup_task_migrate() causing a BUG.

This bug is easy to reproduce. Create a zombie and echo its pid to
cgroup.procs.

$ cat zombie.c
#include <unistd.h>

int main()
{
  if (fork())
      pause();
  return 0;
}
$

We are hitting this bug pretty regularly on ChromeOS.

This bug is already fixed by Tejun Heo's cgroup patchset which is
targeted for the next merge window:

https://lkml.org/lkml/2011/11/1/356

I've created a smaller patch here which just fixes this bug so that a
fix can be merged into the current release and stable.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Downstream-Bug-Report: http://crosbug.com/23953
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: stable@kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Olof Johansson <olofj@chromium.org>

19 Dec, 2011 1 commit

time/clocksource: Fix kernel-doc warnings · b1b73d09

Committed by Kusanagi Kouichi on Dec 19, 2011

Fix various KernelDoc build warnings.
Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp>
Cc: John Stultz <johnstul@us.ibm.com>
Link: http://lkml.kernel.org/r/20111219091320.0D5AF6FC03D@msa105.auone-net.jp
Signed-off-by: Ingo Molnar <mingo@elte.hu>

16 Dec, 2011 1 commit

sched: Fix select_idle_sibling() regression in selecting an idle SMT sibling · ab278921

Committed by Peter Zijlstra on Dec 15, 2011

Mike Galbraith reported that this recent commit:

   commit 4dcfe102
   Author: Peter Zijlstra <peterz@infradead.org>
   Date:   Thu Nov 10 13:01:10 2011 +0100

       sched: Avoid SMT siblings in select_idle_sibling() if possible

stopped selecting an idle SMT sibling when there are no idle
cores in a single socket system.

The intent of select_idle_sibling() was to fall back to an idle
SMT sibling if it fails to identify an idle core. But this
fallback was not happening on systems where all the scheduler
domains had `SD_SHARE_PKG_RESOURCES' flag set.

Fix it. Slightly bigger patch of cleaning all these goto's etc
is queued up for the next release.
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1323978421.1984.244.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

14 Dec, 2011 1 commit

perf events: Fix ring_buffer_wakeup() brown paperbag bug · 44b7f4b9

Committed by Will Deacon on Dec 13, 2011

Commit 10c6db11 ("perf: Fix loss of notification with multi-event")
seems to unconditionally dereference event->rb in the wakeup handler,
this is wrong, there might not be a buffer attached.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20111213152651.GP20297@mudshark.cambridge.arm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
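
The bug class named in the perf commit is an unconditional dereference of a pointer that may legitimately be NULL. A minimal userspace sketch of the guarded form follows; the `model_*` types and function are hypothetical stand-ins, and the real handler also has locking concerns that this model leaves out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for an event and its optional buffer. */
struct model_ring_buffer {
	int wakeups;
};

struct model_event {
	struct model_ring_buffer *rb;	/* NULL when no buffer is attached */
};

/* Notify the attached buffer, if any. Returns 1 when a buffer was
 * notified and 0 when the event has none. */
int model_ring_buffer_wakeup(struct model_event *event)
{
	struct model_ring_buffer *rb = event->rb;

	if (!rb)
		return 0;
	rb->wakeups++;
	return 1;
}
```

An event with `rb == NULL` is simply skipped, and an event with a buffer has its `wakeups` counter incremented once per call.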

13 Dec, 2011 1 commit

cpu: Export cpu_up() · a513f6ba

Committed by Paul E. McKenney on Dec 11, 2011

Building rcutorture as a module requires cpu_up() as well as cpu_down()
exported, so apply EXPORT_SYMBOL_GPL().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

12 Dec, 2011 27 commits

rcu: Apply ACCESS_ONCE() to rcu_boost() return value · 4f89b336

Committed by Paul E. McKenney on Dec 09, 2011

Both TINY_RCU's and TREE_RCU's implementations of rcu_boost() access
the ->boost_tasks and ->exp_tasks fields without preventing concurrent
changes to these fields.  This commit therefore applies ACCESS_ONCE in
order to prevent compiler mischief.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
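
Reading a concurrently modified field through a volatile-qualified lvalue is a common way to make the compiler perform exactly one load. The sketch below shows that idiom in userspace, assuming GCC or Clang for `__typeof__`; `MODEL_ACCESS_ONCE`, `struct model_node` and `model_boost_work_pending()` are hypothetical names, and the field names are borrowed from the commit text only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Force a single, non-cached load of x by going through a volatile
 * pointer to it (relies on the __typeof__ compiler extension). */
#define MODEL_ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical node with two task-list pointers that other threads
 * may change at any time. */
struct model_node {
	void *boost_tasks;
	void *exp_tasks;
};

/* Report whether either list currently looks non-empty, loading each
 * field exactly once. */
bool model_boost_work_pending(struct model_node *n)
{
	return MODEL_ACCESS_ONCE(n->exp_tasks) != NULL ||
	       MODEL_ACCESS_ONCE(n->boost_tasks) != NULL;
}
```

This guards against the compiler re-reading or caching the fields; it does not by itself order the loads against other CPUs.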

Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" · 70321d44

Committed by Paul E. McKenney on Dec 09, 2011

This reverts commit 5342e269.

The approach taken in this patch was deemed too abusive to mutexes,
and thus too likely to result in maintenance problems in the future.
Instead, we will disallow RCU read-side critical sections that partially
overlap with interrupt-disabled code segments.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Augment rcu_batch_end tracing for idle and callback state · 4968c300

Committed by Paul E. McKenney on Dec 07, 2011

The current rcu_batch_end event trace records only the name of the RCU
flavor and the total number of callbacks that remain queued on the
current CPU. This is insufficient for testing and tuning the new
dyntick-idle RCU_FAST_NO_HZ code, so this commit adds idle state along
with whether or not any of the callbacks that were ready to invoke
at the beginning of rcu_do_batch() are still queued.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Add rcutorture tests for srcu_read_lock_raw() · 101db7b4

Committed by Paul E. McKenney on Dec 05, 2011

This commit adds simple rcutorture tests for srcu_read_lock_raw() and
srcu_read_unlock_raw().  It does not test doing srcu_read_lock_raw()
in an exception handler and releasing it in the corresponding process
context.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Make rcutorture test for hotpluggability before offlining CPUs · f220242a

Committed by Paul E. McKenney on Dec 03, 2011

The rcutorture test now can automatically exercise CPU hotplug and
collect success statistics, which can be correlated with other rcutorture
activity. This permits rcutorture to completely exercise RCU regardless
of what sort of userspace and filesystem layout is in use. Unfortunately,
rcutorture is happy to attempt to offline CPUs that cannot be offlined,
for example, CPU 0 in both the x86 and ARM architectures. Although this
allows rcutorture testing to proceed normally, it confounds attempts at
error analysis due to the resulting flood of spurious CPU-hotplug errors.

Therefore, this commit uses the new cpu_is_hotpluggable() function to
avoid attempting to offline CPUs that are not hotpluggable, which in
turn avoids spurious CPU-hotplug errors.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Remove redundant rcu_cpu_stall_suppress declaration · 2d1dc9a6

Committed by Paul E. McKenney on Nov 30, 2011

No point in having two identical rcu_cpu_stall_suppress declarations,
so remove the more obscure of the two.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Adaptive dyntick-idle preparation · f23f7fa1

Committed by Paul E. McKenney on Nov 30, 2011

If there are other CPUs active at a given point in time, then there is a
limit to what a given CPU can do to advance the current RCU grace period.
Beyond this limit, attempting to force the RCU grace period forward will
do nothing but consume energy burning CPU cycles.

Therefore, this commit takes an adaptive approach to RCU_FAST_NO_HZ
preparations for idle. It pushes the RCU core state machine for
two cycles unconditionally, and then it will push from zero to three
additional cycles, but only as long as the RCU core has work for this
CPU to do immediately. The rcu_pending() function is used to check
whether the RCU core has such work.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
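
The adaptive policy in the commit above (two unconditional pushes, then up to three more while work remains) can be sketched as a bounded loop. This is a userspace model with hypothetical names: `pending_cycles` stands in for how many more pushes a check like rcu_pending() would keep reporting work, and each push is assumed to consume one unit of it.

```c
#define MODEL_MIN_FLUSHES 2	/* pushes made unconditionally */
#define MODEL_MAX_FLUSHES 5	/* 2 unconditional + up to 3 optional */

/* Return how many times the state machine is pushed before the CPU
 * gives up and goes idle. */
int model_idle_flush_cycles(int pending_cycles)
{
	int pushes = 0;

	while (pushes < MODEL_MAX_FLUSHES) {
		/* Optional pushes only happen while work remains. */
		if (pushes >= MODEL_MIN_FLUSHES && pending_cycles <= 0)
			break;
		pending_cycles--;	/* one push consumes one unit of work */
		pushes++;
	}
	return pushes;
}
```

With no pending work the model stops after the two unconditional pushes, and with unlimited work it stops at five, so the energy spent on forcing the grace period stays bounded.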

rcu: Keep invoking callbacks if CPU otherwise idle · dff1672d

Committed by Paul E. McKenney on Nov 29, 2011

The rcu_do_batch() function that invokes callbacks for TREE_RCU and
TREE_PREEMPT_RCU normally throttles callback invocation to avoid degrading
scheduling latency. However, as long as the CPU would otherwise be idle,
there is no downside to continuing to invoke any callbacks that have passed
through their grace periods. In fact, processing such callbacks in a
timely manner has the benefit of increasing the probability that the
CPU can enter the power-saving dyntick-idle mode.

Therefore, this commit allows callback invocation to continue beyond the
preset limit as long as the scheduler does not have some other task to
run and as long as context is that of the idle task or the relevant
RCU kthread.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Irq nesting is always 0 on rcu_enter_idle_common · facc4e15

Committed by Frederic Weisbecker on Nov 28, 2011

Because tasks don't nest, the ->dyntick_nesting must always be zero upon
entry to rcu_idle_enter_common().  Therefore, pass "0" rather than the
counter itself.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Don't check irq nesting from rcu idle entry/exit · b6fc6020

Committed by Frederic Weisbecker on Nov 28, 2011

Because tasks do not nest, rcu_idle_enter() and rcu_idle_exit() do
not need to check for nesting.  This commit therefore moves nesting
checks from rcu_idle_enter_common() to rcu_irq_exit() and from
rcu_idle_exit_common() to rcu_irq_enter().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Permit dyntick-idle with callbacks pending · 7cb92499

Committed by Paul E. McKenney on Nov 28, 2011

The current implementation of RCU_FAST_NO_HZ prevents CPUs from entering
dyntick-idle state if they have RCU callbacks pending. Unfortunately,
this has the side-effect of often preventing them from entering this
state, especially if at least one other CPU is not in dyntick-idle state.
However, the resulting per-tick wakeup is wasteful in many cases: if the
CPU has already fully responded to the current RCU grace period, there
will be nothing for it to do until this grace period ends, which will
frequently take several jiffies.

This commit therefore permits a CPU that has done everything that the
current grace period has asked of it (rcu_pending() == 0) to enter
dyntick-idle state even if it still has RCU callbacks pending. However,
such a CPU posts a timer to
wake it up several jiffies later (6 jiffies, based on experience with
grace-period lengths). This wakeup is required to handle situations
that can result in all CPUs being in dyntick-idle mode, thus failing
to ever complete the current grace period. If a CPU wakes up before
the timer goes off, then it cancels that timer, thus avoiding spurious
wakeups.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
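
The idle-entry rule described in the commit above reduces to a three-way decision. The sketch below models it in userspace: the enum, `model_idle_decision()` and `MODEL_IDLE_GP_DELAY` are hypothetical names, the 6-jiffy delay is taken from the commit text, and "stay awake" is a simplification of whatever the real code does while the grace period still needs the CPU.

```c
#include <stdbool.h>

enum model_idle_action {
	MODEL_STAY_AWAKE,	/* the grace period still needs this CPU */
	MODEL_IDLE,		/* nothing queued: idle with no timer */
	MODEL_IDLE_WITH_TIMER	/* callbacks queued: idle, wake up later */
};

#define MODEL_IDLE_GP_DELAY 6	/* jiffies, per the commit text */

/* Decide how a CPU may go idle. rcu_pending models rcu_pending() != 0;
 * *timer_jiffies receives the wakeup delay, or 0 when no timer is set. */
enum model_idle_action model_idle_decision(bool rcu_pending,
					   bool callbacks_queued,
					   int *timer_jiffies)
{
	*timer_jiffies = 0;
	if (rcu_pending)
		return MODEL_STAY_AWAKE;
	if (!callbacks_queued)
		return MODEL_IDLE;
	*timer_jiffies = MODEL_IDLE_GP_DELAY;
	return MODEL_IDLE_WITH_TIMER;
}
```

The timer case is what keeps a system where every CPU is idle from leaving the grace period unfinished forever.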

rcu: Identify dyntick-idle CPUs on first force_quiescent_state() pass · f0e7c19d

Committed by Paul E. McKenney on Nov 23, 2011

Fixes and workarounds for a number of issues (for example, that in
df4012edc) make it safe to once again detect dyntick-idle CPUs on the
first pass of force_quiescent_state(), so this commit makes that change.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Remove dynticks false positives and RCU failures · c92b131b

Committed by Paul E. McKenney on Nov 23, 2011

Assertions in rcu_init_percpu_data() unknowingly relied on outgoing
CPUs being turned off before reaching the idle loop. Unfortunately,
when running under kvm/qemu on x86, CPUs really can get to idle before
being shut off. These CPUs are then born in dyntick-idle mode from an
RCU perspective, which results in splats in rcu_init_percpu_data() and
in RCU wrongly ignoring those CPUs despite them being active. This in
turn can cause RCU to end grace periods prematurely, potentially freeing
up memory that the newly onlined CPUs were still using. This is most
decidedly not what we need to see in an RCU implementation.

This commit therefore replaces the assertions in rcu_init_percpu_data()
with code that forces RCU's dyntick-idle view of newly onlined CPUs to
match reality.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Reduce latency of rcu_prepare_for_idle() · 3ad0decf

Committed by Paul E. McKenney on Nov 22, 2011

Re-enable interrupts across calls to quiescent-state functions and
also across force_quiescent_state() to reduce latency.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Eliminate RCU_FAST_NO_HZ grace-period hang · f535a607

Committed by Paul E. McKenney on Nov 22, 2011

With the new implementation of RCU_FAST_NO_HZ, it was possible to hang
RCU grace periods as follows:

o	CPU 0 attempts to go idle, cycles several times through the
	rcu_prepare_for_idle() loop, then goes dyntick-idle when
	RCU needs nothing more from it, while still having at least
	one RCU callback pending.

o	CPU 1 goes idle with no callbacks.

Both CPUs can then stay in dyntick-idle mode indefinitely, preventing
the RCU grace period from ever completing, possibly hanging the system.

This commit therefore prevents CPUs that have RCU callbacks from entering
dyntick-idle mode.  This approach also eliminates the need for the
end-of-grace-period IPIs used previously.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Avoid needlessly IPIing CPUs at GP end · 84ad00cb

Committed by Paul E. McKenney on Nov 22, 2011

If a CPU enters dyntick-idle mode with callbacks pending, it will need
an IPI at the end of the grace period. However, if it exits dyntick-idle
mode before the grace period ends, it will be needlessly IPIed at the
end of the grace period.

Therefore, this commit clears the per-CPU rcu_awake_at_gp_end flag
when a CPU determines that it does not need it. This in turn requires
disabling interrupts across much of rcu_prepare_for_idle() in order to
avoid having nested interrupts clearing this state out from under us.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Go dyntick-idle more quickly if CPU has serviced current grace period · 3084f2f8

Committed by Paul E. McKenney on Nov 22, 2011

The earlier version would attempt to push callbacks through five times
before going into dyntick-idle mode if callbacks remained, but the CPU
had done all that it needed to do for the current RCU grace periods.
This is wasteful: In most cases, once the CPU has done all that it
needs to for the current RCU grace periods, it will make no further
progress on the callbacks no matter how many times it loops through
the RCU core processing and the idle-entry code.

This commit therefore goes to dyntick-idle mode whenever the current
CPU has done all it can for the current grace period.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Add tracing for RCU_FAST_NO_HZ · 433cdddc

Committed by Paul E. McKenney on Nov 22, 2011

This commit adds trace_rcu_prep_idle(), which is invoked from
rcu_prepare_for_idle() and rcu_wake_cpu() to trace attempts on
the part of RCU to force CPUs into dyntick-idle mode.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

nohz: Remove tick_nohz_idle_enter_norcu() / tick_nohz_idle_exit_norcu() · 1268fbc7

Committed by Frederic Weisbecker on Nov 17, 2011

Those two APIs were provided to optimize the calls of
tick_nohz_idle_enter() and rcu_idle_enter() into a single
irq disabled section. This way no interrupt happening in-between would
needlessly process any RCU job.

Now we are talking about an optimization for which benefits
have yet to be measured. Let's start simple and completely decouple
idle rcu and dyntick idle logics to simplify.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Add rcutorture CPU-hotplug capability · b58bdcca

Committed by Paul E. McKenney on Nov 16, 2011

Running CPU-hotplug operations concurrently with rcutorture has
historically been a good way to find bugs in both RCU and CPU hotplug.
This commit therefore adds an rcutorture module parameter called
"onoff_interval" that causes a randomly selected CPU-hotplug operation to
be executed at the specified interval, in seconds.  The default value of
"onoff_interval" is zero, which disables rcutorture-instigated CPU-hotplug
operations.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

events: Make events use the new is_idle_task() API · 77aeeebd

Committed by Paul E. McKenney on Nov 10, 2011

Change from direct comparison of ->pid with zero to is_idle_task().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
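
Replacing an open-coded comparison of ->pid with zero by a named helper keeps the intent in one place. The userspace sketch below assumes the helper simply wraps that comparison; `struct model_task` and `model_is_idle_task()` are hypothetical stand-ins and this is not the kernel's definition.

```c
#include <stdbool.h>

/* Hypothetical task descriptor holding only the field of interest. */
struct model_task {
	int pid;
};

/* Assumed behaviour: the idle task is the one whose pid is zero. */
bool model_is_idle_task(const struct model_task *p)
{
	return p->pid == 0;
}
```

Call sites then read `if (model_is_idle_task(t))` instead of `if (t->pid == 0)`, so a later change to how idle tasks are identified touches only the helper.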

kdb: Make KDB use the new is_idle_task() API · 7fc20c5c

Committed by Paul E. McKenney on Nov 10, 2011

Change from direct comparison of ->pid with zero to is_idle_task().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Make RCU use the new is_idle_task() API · 99745b6a

Committed by Paul E. McKenney on Nov 10, 2011

Change from direct comparison of ->pid with zero to is_idle_task().
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Control rcutorture startup from kernel boot parameters · bb3bf705

Committed by Paul E. McKenney on Nov 04, 2011

Currently, if rcutorture is built into the kernel, it must be manually
started or started from an init script. This is inconvenient for
automated KVM testing, where it is good to be able to fully control
rcutorture execution from the kernel parameters. This patch therefore
adds a module parameter named "rcutorture_runnable" that defaults
to zero ("don't start automatically"), but which can be set to one
to cause rcutorture to start up immediately during boot.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Add rcutorture system-shutdown capability · d5f546d8

Committed by Paul E. McKenney on Nov 04, 2011

Although it is easy to run rcutorture tests under KVM, there is currently
no nice way to run such a test for a fixed time period, collect all of
the rcutorture data, and then shut the system down cleanly. This commit
therefore adds an rcutorture module parameter named "shutdown_secs" that
specifies the run duration in seconds, after which rcutorture terminates
the test and powers the system down. The default value for "shutdown_secs"
is zero, which disables shutdown.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Fix idle-task checks · 11dbaa8c

Committed by Paul E. McKenney on Nov 02, 2011

RCU has traditionally relied on idle_cpu() to determine whether a given
CPU is running in the context of an idle task, but commit 908a3283
(Fix idle_cpu()) has invalidated this approach.  After commit 908a3283,
idle_cpu() will return true if the current CPU is currently running the
idle task, and will be doing so for the foreseeable future.  RCU instead
needs to know whether or not the current CPU is currently running the
idle task, regardless of what the near future might bring.

This commit therefore switches from idle_cpu() to "current->pid != 0".
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Suggested-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Allow dyntick-idle mode for CPUs with callbacks · aea1b35e

Committed by Paul E. McKenney on Nov 02, 2011

Currently, RCU does not permit a CPU to enter dyntick-idle mode if that
CPU has any RCU callbacks queued. This means that workloads for which
each CPU wakes up and does some RCU updates every few ticks will never
enter dyntick-idle mode. This can result in significant unnecessary power
consumption, so this patch permits a given CPU to enter dyntick-idle mode if
it has callbacks, but only if that same CPU has completed all current
work for the RCU core. We use rcu_pending() to determine
whether a given CPU has completed all current work for the RCU core.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

OpenHarmony / kernel_linux · last synced about 4 years ago