提交 · 55ec936ff4e57cc626db336a7bf33b267390e9b4 · openeuler / raspberrypi-kernel

11 5月, 2010 9 次提交

rcu: slim down rcutiny by removing rcu_scheduler_active and friends · bbad9379

由 Paul E. McKenney 提交于 4月 02, 2010

TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC.
So conditionally compile rcu_scheduler_active in order to slim down
rcutiny a bit more. Also gets rid of an EXPORT_SYMBOL_GPL, which is
responsible for most of the slimming.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

bbad9379

rcu: refactor RCU's context-switch handling · 25502a6c

由 Paul E. McKenney 提交于 4月 01, 2010

The addition of preemptible RCU to treercu resulted in a bit of
confusion and inefficiency surrounding the handling of context switches
for RCU-sched and for RCU-preempt. For RCU-sched, a context switch
is a quiescent state, pure and simple, just like it always has been.
For RCU-preempt, a context switch is in no way a quiescent state, but
special handling is required when a task blocks in an RCU read-side
critical section.

However, the callout from the scheduler and the outer loop in ksoftirqd
still calls something named rcu_sched_qs(), whose name is no longer
accurate. Furthermore, when rcu_check_callbacks() notes an RCU-sched
quiescent state, it ends up unnecessarily (though harmlessly, aside
from the performance hit) enqueuing the current task if it happens to
be running in an RCU-preempt read-side critical section. This not only
increases the maximum latency of scheduler_tick(), it also needlessly
increases the overhead of the next outermost rcu_read_unlock() invocation.

This patch addresses this situation by separating the notion of RCU's
context-switch handling from that of RCU-sched's quiescent states.
The context-switch handling is covered by rcu_note_context_switch() in
general and by rcu_preempt_note_context_switch() for preemptible RCU.
This permits rcu_sched_qs() to handle quiescent states and only quiescent
states. It also reduces the maximum latency of scheduler_tick(), though
probably by much less than a microsecond. Finally, it means that tasks
within preemptible-RCU read-side critical sections avoid incurring the
overhead of queuing unless there really is a context switch.
Suggested-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>

25502a6c

rcu: rename rcutiny rcu_ctrlblk to rcu_sched_ctrlblk · 99652b54

由 Paul E. McKenney 提交于 3月 30, 2010

Make naming line up in preparation for CONFIG_TINY_PREEMPT_RCU.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

99652b54

rcu: shrink rcutiny by making synchronize_rcu_bh() be inline · da848c47

由 Paul E. McKenney 提交于 3月 30, 2010

Because synchronize_rcu_bh() is identical to synchronize_sched(),
make the former a static inline invoking the latter, saving the
overhead of an EXPORT_SYMBOL_GPL() and the duplicate code.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

da848c47

rcu: ignore offline CPUs in last non-dyntick-idle CPU check · 5db35673

由 Lai Jiangshan 提交于 3月 30, 2010

Offline CPUs are not in nohz_cpu_mask, but can be ignored when checking
for the last non-dyntick-idle CPU. This patch therefore only checks
online CPUs for not being dyntick idle, allowing fast entry into
full-system dyntick-idle state even when there are some offline CPUs.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

5db35673

rcu: move some code from macro to function · 0c34029a

由 Lai Jiangshan 提交于 3月 28, 2010

Shrink the RCU_INIT_FLAVOR() macro by moving all but the initialization
of the ->rda[] array to rcu_init_one(). The call to rcu_init_one()
can then be moved to the end of the RCU_INIT_FLAVOR() macro, which is
required because rcu_boot_init_percpu_data(), which is now called from
rcu_init_one(), depends on the initialization of the ->rda[] array.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

0c34029a

rcu: make dead code really dead · f261414f

由 Lai Jiangshan 提交于 3月 28, 2010

cleanup: make dead code really dead
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

f261414f

rcu: substitute set_need_resched for sending resched IPIs · d25eb944

由 Paul E. McKenney 提交于 3月 18, 2010

This patch adds a check to __rcu_pending() that does a local
set_need_resched() if the current CPU is holding up the current grace
period and if force_quiescent_state() will be called soon. The goal is
to reduce the probability that force_quiescent_state() will need to do
smp_send_reschedule(), which sends an IPI and is therefore more expensive
on most architectures.
Signed-off-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

d25eb944

rcu: optionally leave lockdep enabled after RCU lockdep splat · 2b3fc35f

由 Lai Jiangshan 提交于 4月 20, 2010

There is no need to disable lockdep after an RCU lockdep splat,
so remove the debug_lockdeps_off() from lockdep_rcu_dereference().
To avoid repeated lockdep splats, use a static variable in the inlined
rcu_dereference_check() and rcu_dereference_protected() macros so that
a given instance splats only once, but so that multiple instances can
be detected per boot.

This is controlled by a new config variable CONFIG_PROVE_RCU_REPEATEDLY,
which is disabled by default.  This provides the normal lockdep behavior
by default, but permits people who want to find multiple RCU-lockdep
splats per boot to easily do so.
Requested-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

2b3fc35f

07 5月, 2010 1 次提交

rcu: create rcu_my_thread_group_empty() wrapper · ee84b824

由 Paul E. McKenney 提交于 5月 06, 2010

Some RCU-lockdep splat repairs need to know whether they are running
in a single-threaded process.  Unfortunately, the thread_group_empty()
primitive is defined in sched.h, and can induce #include hell.  This
commit therefore introduces a rcu_my_thread_group_empty() wrapper that
is defined in rcupdate.c, thus avoiding the need to include sched.h
everywhere.
Signed-off-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

ee84b824

05 5月, 2010 3 次提交

sched: Fix an RCU warning in print_task() · b629317e

由 Li Zefan 提交于 4月 22, 2010

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  $ cat /proc/sched_debug

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

Both cgroup_path() and task_group() should be called with either
rcu_read_lock or cgroup_mutex held.

The rcu_dereference_check() does include cgroup_lock_is_held(), so we
know that this lock is not held.  Therefore, in a CONFIG_PREEMPT kernel,
to say nothing of a CONFIG_PREEMPT_RT kernel, the original code could
have ended up copying a string out of the freelist.

This patch inserts RCU read-side primitives needed to prevent this
scenario.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

b629317e

cgroup: Fix an RCU warning in alloc_css_id() · fae9c791

由 Li Zefan 提交于 4月 22, 2010

With CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o memory xxx /mnt
  # mkdir /mnt/0

...
kernel/cgroup.c:4442 invoked rcu_dereference_check() without protection!
...

This is a false-positive. It's safe to directly access parent_css->id.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

fae9c791

cgroup: Fix an RCU warning in cgroup_path() · 9a9686b6

由 Li Zefan 提交于 4月 22, 2010

with CONFIG_PROVE_RCU=y, a warning can be triggered:

  # mount -t cgroup -o debug xxx /mnt
  # cat /proc/$$/cgroup

...
kernel/cgroup.c:1649 invoked rcu_dereference_check() without protection!
...

This is a false-positive, because cgroup_path() can be called
with either rcu_read_lock() held or cgroup_mutex held.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

9a9686b6

01 5月, 2010 1 次提交

perf: Fix resource leak in failure path of perf_event_open() · 048c8520

由 Tejun Heo 提交于 5月 01, 2010

perf_event_open() kfrees event after init failure which doesn't
release all resources allocated by perf_event_alloc().  Use
free_event() instead.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: <stable@kernel.org>
LKML-Reference: <4BDBE237.1040809@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

048c8520

30 4月, 2010 3 次提交

rcu: Fix RCU lockdep splat on freezer_fork path · 8b46f880

由 Paul E. McKenney 提交于 4月 21, 2010

Add an RCU read-side critical section to suppress this false
positive.
Located-by: NEric Paris <eparis@parisplace.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NLi Zefan <lizf@cn.fujitsu.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <1271880131-3951-2-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8b46f880

rcu: Fix RCU lockdep splat in set_task_cpu on fork path · 8b08ca52

由 Peter Zijlstra 提交于 4月 21, 2010

Add an RCU read-side critical section to suppress this false
positive.
Located-by: NEric Paris <eparis@parisplace.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <1271880131-3951-1-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8b08ca52

workqueue: flush_delayed_work: keep the original workqueue for re-queueing · 47dd5be2

由 Oleg Nesterov 提交于 4月 30, 2010

flush_delayed_work() always uses keventd_wq for re-queueing,
but it should use the workqueue this dwork was queued on.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

47dd5be2

25 4月, 2010 1 次提交

kernel/sys.c: fix compat uname machine · 46da2766

由 Andreas Schwab 提交于 4月 23, 2010

On ppc64 you get this error:

  $ setarch ppc -R true
  setarch: ppc: Unrecognized architecture

because uname still reports ppc64 as the machine.

So mask off the personality flags when checking for PER_LINUX32.
Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

46da2766

23 4月, 2010 1 次提交

mutex: Don't spin when the owner CPU is offline or other weird cases · 4b402210

由 Benjamin Herrenschmidt 提交于 4月 16, 2010

Due to recent load-balancer changes that delay the task migration to
the next wakeup, the adaptive mutex spinning ends up in a live lock
when the owner's CPU gets offlined because the cpu_online() check
lives before the owner running check.

This patch changes mutex_spin_on_owner() to return 0 (don't spin) in
any case where we aren't sure about the owner struct validity or CPU
number, and if the said CPU is offline. There is no point going back &
re-evaluate spinning in corner cases like that, let's just go to
sleep.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271212509.13059.135.camel@pasglop>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4b402210

22 4月, 2010 1 次提交

CRED: Fix a race in creds_are_invalid() in credentials debugging · e134d200

由 David Howells 提交于 4月 21, 2010

creds_are_invalid() reads both cred->usage and cred->subscribers and then
compares them to make sure the number of processes subscribed to a cred struct
never exceeds the refcount of that cred struct.

The problem is that this can cause a race with both copy_creds() and
exit_creds() as the two counters, whilst they are of atomic_t type, are only
atomic with respect to themselves, and not atomic with respect to each other.

This means that if creds_are_invalid() can read the values on one CPU whilst
they're being modified on another CPU, and so can observe an evolving state in
which the subscribers count now is greater than the usage count a moment
before.

Switching the order in which the counts are read cannot help, so the thing to
do is to remove that particular check.

I had considered rechecking the values to see if they're in flux if the test
fails, but I can't guarantee they won't appear the same, even if they've
changed several times in the meantime.

Note that this can only happen if CONFIG_DEBUG_CREDENTIALS is enabled.

The problem is only likely to occur with multithreaded programs, and can be
tested by the tst-eintr1 program from glibc's "make check".  The symptoms look
like:

	CRED: Invalid credentials
	CRED: At include/linux/cred.h:240
	CRED: Specified credentials: ffff88003dda5878 [real][eff]
	CRED: ->magic=43736564, put_addr=(null)
	CRED: ->usage=766, subscr=766
	CRED: ->*uid = { 0,0,0,0 }
	CRED: ->*gid = { 0,0,0,0 }
	CRED: ->security is ffff88003d72f538
	CRED: ->security {359, 359}
	------------[ cut here ]------------
	kernel BUG at kernel/cred.c:850!
	...
	RIP: 0010:[<ffffffff81049889>]  [<ffffffff81049889>] __invalid_creds+0x4e/0x52
	...
	Call Trace:
	 [<ffffffff8104a37b>] copy_creds+0x6b/0x23f

Note the ->usage=766 and subscr=766.  The values appear the same because
they've been re-read since the check was made.
Reported-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

e134d200

21 4月, 2010 1 次提交

CRED: Fix double free in prepare_usermodehelper_creds() error handling · eff30363

由 David Howells 提交于 4月 20, 2010

Patch 570b8fb5:

	Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
	Date:   Tue Mar 30 00:04:00 2010 +0100
	Subject: CRED: Fix memory leak in error handling

attempts to fix a memory leak in the error handling by making the offending
return statement into a jump down to the bottom of the function where a
kfree(tgcred) is inserted.

This is, however, incorrect, as it does a kfree() after doing put_cred() if
security_prepare_creds() fails.  That will result in a double free if 'error'
is jumped to as put_cred() will also attempt to free the new tgcred record by
virtue of it being pointed to by the new cred record.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

eff30363

19 4月, 2010 1 次提交

rcu: Make RCU lockdep check the lockdep_recursion variable · bc293d62

由 Paul E. McKenney 提交于 4月 15, 2010

The lockdep facility temporarily disables lockdep checking by
incrementing the current->lockdep_recursion variable.  Such
disabling happens in NMIs and in other situations where lockdep
might expect to recurse on itself.

This patch therefore checks current->lockdep_recursion, disabling RCU
lockdep splats when this variable is non-zero.  In addition, this patch
removes the "likely()", as suggested by Lai Jiangshan.
Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
Reported-by: NDavid Miller <davem@davemloft.net>
Tested-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: eric.dumazet@gmail.com
LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bc293d62

11 4月, 2010 1 次提交

PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handling · d88d4050

由 Jiri Slaby 提交于 4月 10, 2010

When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device
improperly by old_decode_dev and it results in an error while
hibernating with s2disk.

All users already pass the new device number, so switch to
new_decode_dev().
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Reported-and-tested-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: N"Rafael J. Wysocki" <rjw@sisk.pl>

d88d4050

07 4月, 2010 1 次提交

mm: avoid null-pointer deref in sync_mm_rss() · a3a2e76c

由 KAMEZAWA Hiroyuki 提交于 4月 06, 2010

- We weren't zeroing p->rss_stat[] at fork()

- Consequently sync_mm_rss() was dereferencing tsk->mm for kernel
  threads and was oopsing.

- Make __sync_task_rss_stat() static, too.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648

[akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)]
Reported-by: NTroels Liebe Bentsen <tlb@rapanden.dk>
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
"Michael S. Tsirkin" <mst@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a3a2e76c

06 4月, 2010 3 次提交

sched: Fix sched_getaffinity() · 84fba5ec

由 Anton Blanchard 提交于 4月 06, 2010

taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with
the following error:

  sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument)

This box has 128 threads and 16 bytes is enough to cover it.

Commit cd3d8031 (sched:
sched_getaffinity(): Allow less than NR_CPUS length) is
comparing this 16 bytes agains nr_cpu_ids.

Fix it by comparing nr_cpu_ids to the number of bits in the
cpumask we pass in.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Sharyathi Nagesh <sharyath@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Mike Travis <travis@sgi.com>
LKML-Reference: <20100406070218.GM5594@kryten>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

84fba5ec

Fix up possibly racy module refcounting · 5fbfb18d

由 Nick Piggin 提交于 4月 01, 2010

Module refcounting is implemented with a per-cpu counter for speed.
However there is a race when tallying the counter where a reference may
be taken by one CPU and released by another. Reference count summation
may then see the decrement without having seen the previous increment,
leading to lower than expected count. A module which never has its
actual reference drop below 1 may return a reference count of 0 due to
this race.

Module removal generally runs under stop_machine, which prevents this
race causing bugs due to removal of in-use modules. However there are
other real bugs in module.c code and driver code (module_refcount is
exported) where the callers do not run under stop_machine.

Fix this by maintaining running per-cpu counters for the number of
module refcount increments and the number of refcount decrements. The
increments are tallied after the decrements, so any decrement seen will
always have its corresponding increment counted. The final refcount is
the difference of the total increments and decrements, preventing a
low-refcount from being returned.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fbfb18d

audit: preface audit printk with audit · 449cedf0

由 Eric Paris 提交于 4月 05, 2010

There have been a number of reports of people seeing the message:
"name_count maxed, losing inode data: dev=00:05, inode=3185"
in dmesg. These usually lead to people reporting problems to the filesystem
group who are in turn clueless what they mean.

Eventually someone finds me and I explain what is going on and that
these come from the audit system. The basics of the problem is that the
audit subsystem never expects a single syscall to 'interact' (for some
wish washy meaning of interact) with more than 20 inodes. But in fact
some operations like loading kernel modules can cause changes to lots of
inodes in debugfs.

There are a couple real fixes being bandied about including removing the
fixed compile time limit of 20 or not auditing changes in debugfs (or
both) but neither are small and obvious so I am not sending them for
immediate inclusion (I hope Al forwards a real solution next devel
window).

In the meantime this patch simply adds 'audit' to the beginning of the
crap message so if a user sees it, they come blame me first and we can
talk about what it means and make sure we understand all of the reasons
it can happen and make sure this gets solved correctly in the long run.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

449cedf0

03 4月, 2010 8 次提交

perf: Always build the stub perf_arch_fetch_caller_regs version · 26d80aa7

由 Frederic Weisbecker 提交于 4月 03, 2010

Now that software events use perf_arch_fetch_caller_regs() too, we
need the stub version to be always built in for archs that don't
implement it.

Fixes the following build error in PARISC:

	kernel/built-in.o: In function `perf_event_task_sched_out':
	(.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs'
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>

26d80aa7

kgdb: Turn off tracing while in the debugger · 4da75b9c

由 Jason Wessel 提交于 4月 02, 2010

The kernel debugger should turn off kernel tracing any time the
debugger is active and restore it on resume.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>

4da75b9c

kgdb: use atomic_inc and atomic_dec instead of atomic_set · ae6bf53e

由 Jason Wessel 提交于 4月 02, 2010

Memory barriers should be used for the kgdb cpu synchronization.  The
atomic_set() does not imply a memory barrier.
Reported-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

ae6bf53e

kgdb: eliminate kgdb_wait(), all cpus enter the same way · 62fae312

由 Jason Wessel 提交于 4月 02, 2010

This is a kgdb architectural change to have all the cpus (master or
slave) enter the same function.

A cpu that hits an exception (wants to be the master cpu) will call
kgdb_handle_exception() from the trap handler and then invoke a
kgdb_roundup_cpu() to synchronize the other cpus and bring them into
the kgdb_handle_exception() as well.

A slave cpu will enter kgdb_handle_exception() from the
kgdb_nmicallback() and set the exception state to note that the
processor is a slave.

Previously the salve cpu would have called kgdb_wait().  This change
allows the debug core to change cpus without resuming the system in
order to inspect arch specific cpu information.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

62fae312

kgdb: have ebin2mem call probe_kernel_write once · a0279bd5

由 Jason Wessel 提交于 4月 02, 2010

Rather than call probe_kernel_write() one byte at a time, process the
whole buffer locally and pass the entire result in one go.  This way,
architectures that need to do special handling based on the length can
do so, or we only end up calling memcpy() once.

[sonic.zhang@analog.com: Reported original problem and preliminary patch]
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
Signed-off-by: NSonic Zhang <sonic.zhang@analog.com>
Signed-off-by: NMike Frysinger <vapier@gentoo.org>

a0279bd5

sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlock · 47a70985

由 Oleg Nesterov 提交于 3月 30, 2010

Trivial typo fix. rq->migration_thread can be NULL after
task_rq_unlock(), this is why we have "mt" which should be
 used instead.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100330165829.GA18284@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

47a70985

sched: Fix proc_sched_set_task() · 269484a4

由 Mike Galbraith 提交于 3月 30, 2010

Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks
task_times().  Other places in kernel use nvcsw and nivcsw, which are
being cleared as well,  Clear task statistics only.
Reported-by: NTörök Edwin <edwintorok@gmail.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1269940193.19286.14.camel@marge.simson.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

269484a4

perf: Fix 'perf sched record' deadlock · 8bb39f9a

由 Mike Galbraith 提交于 3月 26, 2010

perf sched record can deadlock a box should the holder of
handle->data->lock take an interrupt, and then attempt to
acquire an rq lock held by a CPU trying to acquire the
same lock. Disable interrupts.

   CPU0                            CPU1
   sched event with rq->lock held
                                   grab handle->data->lock
   spin on handle->data->lock
                                   interrupt
                                   try to grab rq->lock
Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1269598293.6174.8.camel@marge.simson.net>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8bb39f9a

01 4月, 2010 2 次提交

perf: Use hot regs with software sched switch/migrate events · e49a5bd3

由 Frederic Weisbecker 提交于 3月 22, 2010

Scheduler's task migration events don't work because they always
pass NULL regs perf_sw_event(). The event hence gets filtered
in perf_swevent_add().

Scheduler's context switches events use task_pt_regs() to get
the context when the event occured which is a wrong thing to
do as this won't give us the place in the kernel where we went
to sleep but the place where we left userspace. The result is
even more wrong if we switch from a kernel thread.

Use the hot regs snapshot for both events as they belong to the
non-interrupt/exception based events family. Unlike page faults
or so that provide the regs matching the exact origin of the event,
we need to save the current context.

This makes the task migration event working and fix the context
switch callchains and origin ip.

Example: perf record -a -e cs

Before:

    10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                |
                --- (nil)
                    perf_callchain
                    perf_prepare_sample
                    __perf_event_overflow
                    perf_swevent_overflow
                    perf_swevent_add
                    perf_swevent_ctx_event
                    do_perf_sw_event
                    __perf_sw_event
                    perf_event_task_sched_out
                    schedule
                    run_ksoftirqd
                    kthread
                    kernel_thread_helper

After:

    23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
            |
            --- schedule
               |
               |--60.00%-- schedule_timeout
               |          wait_for_common
               |          wait_for_completion
               |          blk_execute_rq
               |          scsi_execute
               |          scsi_execute_req
               |          sr_test_unit_ready
               |          |
               |          |--66.67%-- sr_media_change
               |          |          media_changed
               |          |          cdrom_media_changed
               |          |          sr_block_media_changed
               |          |          check_disk_change
               |          |          cdrom_open

v2: Always build perf_arch_fetch_caller_regs() now that software
events need that too. They don't need it from modules, unlike trace
events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>

e49a5bd3

perf: Correctly align perf event tracing buffer · eb1e7961

由 Frederic Weisbecker 提交于 3月 23, 2010

The trace event buffer used by perf to record raw sample events
is typed as an array of char and may then not be aligned to 8
by alloc_percpu().

But we need it to be aligned to 8 in sparc64 because we cast
this buffer into a random structure type built by the TRACE_EVENT()
macro to store the traces. So if a random 64 bits field is accessed
inside, it may be not under an expected good alignment.

Use an array of long instead to force the appropriate alignment, and
perform a compile time check to ensure the size in byte of the buffer
is a multiple of sizeof(long) so that its actual size doesn't get
shrinked under us.

This fixes unaligned accesses reported while using perf lock
in sparc 64.
Suggested-by: NDavid Miller <davem@davemloft.net>
Suggested-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Steven Rostedt <rostedt@goodmis.org>

eb1e7961

31 3月, 2010 1 次提交

genirq: Force MSI irq handlers to run with interrupts disabled · 753649db

由 Thomas Gleixner 提交于 3月 31, 2010

Network folks reported that directing all MSI-X vectors of their multi
queue NICs to a single core can cause interrupt stack overflows when
enough interrupts fire at the same time.

This is caused by the fact that we run interrupt handlers by default
with interrupts enabled unless the driver reuqests the interrupt with
the IRQF_DISABLED set. The NIC handlers do not set this flag, so
simultaneous interrupts can nest unlimited and cause the stack
overflow.

The only safe counter measure is to run the interrupt handlers with
interrupts disabled. We can't switch to this mode in general right
now, but it is safe to do so for MSI interrupts.

Force IRQF_DISABLED for MSI interrupt handlers.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Linus Torvalds <torvalds@osdl.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: stable@kernel.org

753649db

30 3月, 2010 2 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

CRED: Fix memory leak in error handling · 570b8fb5

由 Mathieu Desnoyers 提交于 3月 30, 2010

Fix a memory leak on an OOM condition in prepare_usermodehelper_creds().
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

570b8fb5