提交 · 37dd3818d1f3d875125cffb6d84392df1a38f2f9 · openeuler / raspberrypi-kernel

09 3月, 2015 1 次提交

workqueue: dump workqueues on sysrq-t · 3494fc30

由 Tejun Heo 提交于 3月 09, 2015

Workqueues are used extensively throughout the kernel but sometimes
it's difficult to debug stalls involving work items because visibility
into its inner workings is fairly limited.  Although sysrq-t task dump
annotates each active worker task with the information on the work
item being executed, it is challenging to find out which work items
are pending or delayed on which queues and how pools are being
managed.

This patch implements show_workqueue_state() which dumps all busy
workqueues and pools and is called from the sysrq-t handler.  At the
end of sysrq-t dump, something like the following is printed.

 Showing busy workqueues and worker pools:
 ...
 workqueue filler_wq: flags=0x0
   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=2/256
     in-flight: 491:filler_workfn, 507:filler_workfn
   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
     in-flight: 501:filler_workfn
     pending: filler_workfn
 ...
 workqueue test_wq: flags=0x8
   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/1
     in-flight: 510(RESCUER):test_workfn BAR(69) BAR(500)
     delayed: test_workfn1 BAR(492), test_workfn2
 ...
 pool 0: cpus=0 node=0 flags=0x0 nice=0 workers=2 manager: 137
 pool 2: cpus=1 node=0 flags=0x0 nice=0 workers=3 manager: 469
 pool 3: cpus=1 node=0 flags=0x0 nice=-20 workers=2 idle: 16
 pool 8: cpus=0-3 flags=0x4 nice=0 workers=2 manager: 62

The above shows that test_wq is executing test_workfn() on pid 510
which is the rescuer and also that there are two tasks 69 and 500
waiting for the work item to finish in flush_work().  As test_wq has
max_active of 1, there are two work items for test_workfn1() and
test_workfn2() which are delayed till the current work item is
finished.  In addition, pid 492 is flushing test_workfn1().

The work item for test_workfn() is being executed on pwq of pool 2
which is the normal priority per-cpu pool for CPU 1.  The pool has
three workers, two of which are executing filler_workfn() for
filler_wq and the last one is assuming the manager role trying to
create more workers.

This extra workqueue state dump will hopefully help chasing down hangs
involving workqueues.

v3: cpulist_pr_cont() replaced with "%*pbl" printf formatting.

v2: As suggested by Andrew, minor formatting change in pr_cont_work(),
    printk()'s replaced with pr_info()'s, and cpumask printing now
    uses cpulist_pr_cont().
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@redhat.com>

3494fc30

12 2月, 2015 2 次提交

oom, PM: make OOM detection in the freezer path raceless · c32b3cbe

由 Michal Hocko 提交于 2月 11, 2015

Commit 5695be14 ("OOM, PM: OOM killed task shouldn't escape PM
suspend") has left a race window when OOM killer manages to
note_oom_kill after freeze_processes checks the counter.  The race
window is quite small and really unlikely and partial solution deemed
sufficient at the time of submission.

Tejun wasn't happy about this partial solution though and insisted on a
full solution.  That requires the full OOM and freezer's task freezing
exclusion, though.  This is done by this patch which introduces oom_sem
RW lock and turns oom_killer_disable() into a full OOM barrier.

oom_killer_disabled check is moved from the allocation path to the OOM
level and we take oom_sem for reading for both the check and the whole
OOM invocation.

oom_killer_disable() takes oom_sem for writing so it waits for all
currently running OOM killer invocations.  Then it disable all the further
OOMs by setting oom_killer_disabled and checks for any oom victims.
Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
last victim wakes up all waiters enqueued by oom_killer_disable().
Therefore this function acts as the full OOM barrier.

The page fault path is covered now as well although it was assumed to be
safe before.  As per Tejun, "We used to have freezing points deep in file
system code which may be reacheable from page fault." so it would be
better and more robust to not rely on freezing points here.  Same applies
to the memcg OOM killer.

out_of_memory tells the caller whether the OOM was allowed to trigger and
the callers are supposed to handle the situation.  The page allocation
path simply fails the allocation same as before.  The page fault path will
retry the fault (more on that later) and Sysrq OOM trigger will simply
complain to the log.

Normally there wouldn't be any unfrozen user tasks after
try_to_freeze_tasks so the function will not block. But if there was an
OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
finish yet then we have to wait for it. This should complete in a finite
time, though, because

	- the victim cannot loop in the page fault handler (it would die
	  on the way out from the exception)
	- it cannot loop in the page allocator because all the further
	  allocation would fail and __GFP_NOFAIL allocations are not
	  acceptable at this stage
	- it shouldn't be blocked on any locks held by frozen tasks
	  (try_to_freeze expects lockless context) and kernel threads and
	  work queues are not frozen yet
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Suggested-by: NTejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c32b3cbe

sysrq: convert printk to pr_* equivalent · 401e4a7c

由 Michal Hocko 提交于 2月 11, 2015

While touching this area let's convert printk to pr_*.  This also makes
the printing of continuation lines done properly.
Signed-off-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

401e4a7c

07 8月, 2014 1 次提交

mm, oom: ensure memoryless node zonelist always includes zones · 8d060bf4

由 David Rientjes 提交于 8月 06, 2014

With memoryless node support being worked on, it's possible that for
optimizations that a node may not have a non-NULL zonelist.  When
CONFIG_NUMA is enabled and node 0 is memoryless, this means the zonelist
for first_online_node may become NULL.

The oom killer requires a zonelist that includes all memory zones for
the sysrq trigger and pagefault out of memory handler.

Ensure that a non-NULL zonelist is always passed to the oom killer.

[akpm@linux-foundation.org: fix non-numa build]
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8d060bf4

07 6月, 2014 2 次提交

sysrq,rcu: suppress RCU stall warnings while sysrq runs · 722773af

由 Rik van Riel 提交于 6月 06, 2014

Some sysrq handlers can run for a long time, because they dump a lot of
data onto a serial console.  Having RCU stall warnings pop up in the
middle of them only makes the problem worse.

This patch temporarily disables RCU stall warnings while a sysrq request
is handled.
Signed-off-by: NRik van Riel <riel@redhat.com>
Suggested-by: NPaul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Madper Xie <cxie@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

722773af

sysrq: rcu-ify __handle_sysrq · 984d74a7

由 Rik van Riel 提交于 6月 06, 2014

Echoing values into /proc/sysrq-trigger seems to be a popular way to get
information out of the kernel.  However, dumping information about
thousands of processes, or hundreds of CPUs to serial console can result
in IRQs being blocked for minutes, resulting in various kinds of cascade
failures.

The most common failure is due to interrupts being blocked for a very
long time.  This can lead to things like failed IO requests, and other
things the system cannot easily recover from.

This problem is easily fixable by making __handle_sysrq use RCU instead
of spin_lock_irqsave.

This leaves the warning that RCU grace periods have not elapsed for a
long time, but the system will come back from that automatically.

It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
but that is probably desired since people want that to work in
situations where the system is already hosed.

The callers of register_sysrq_key and unregister_sysrq_key appear to be
capable of sleeping.
Signed-off-by: NRik van Riel <riel@redhat.com>
Reported-by: NMadper Xie <cxie@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

984d74a7

05 6月, 2014 1 次提交

kernel/printk: use symbolic defines for console loglevels · a8fe19eb

由 Borislav Petkov 提交于 6月 04, 2014

... instead of naked numbers.

Stuff in sysrq.c used to set it to 8 which is supposed to mean above
default level so set it to DEBUG instead as we're terminating/killing all
tasks and we want to be verbose there.

Also, correct the check in x86_64_start_kernel which should be >= as
we're clearly issuing the string there for all debug levels, not only
the magical 10.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NKees Cook <keescook@chromium.org>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: Joe Perches <joe@perches.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8fe19eb

17 10月, 2013 1 次提交

sysrq: Allow magic SysRq key functions to be disabled through Kconfig · 8eaede49

由 Ben Hutchings 提交于 10月 07, 2013

Turn the initial value of sysctl kernel.sysrq (SYSRQ_DEFAULT_ENABLE)
into a Kconfig variable.

Original version by Bastian Blank <waldi@debian.org>.
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

8eaede49

13 8月, 2013 1 次提交

Input: sysrq - DT binding for key sequence · 4c076eb0

由 Mathieu J. Poirier 提交于 8月 03, 2013

Adding a simple device tree binding for the specification of key
sequences. Definition of the keys found in the sequence are located in
'include/uapi/linux/input.h'.

For the sysrq driver, holding the sequence of keys down for a specific
amount of time will reset the system.
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: NGrant Likely <grant.likely@linaro.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

4c076eb0

07 6月, 2013 1 次提交

Input: sysrq - request graceful shutdown for key reset · 3d289517

由 Mathieu J. Poirier 提交于 6月 05, 2013

Attempt to reboot the system gracefully when a key combo is detected.
If the reste combination is pressed the 2nd time we assume that graceful
reboot failed and perform emergency reboot. This fucntionality is useful
when UI is stuck but the system is otherwise working fine.
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

3d289517

04 6月, 2013 1 次提交

tty: replace strict_strtoul() with kstrtoul() · 86b40567

由 Jingoo Han 提交于 6月 01, 2013

The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.
Signed-off-by: NJingoo Han <jg1.han@samsung.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

86b40567

02 4月, 2013 1 次提交

Input: sysrq - supplement reset sequence with timeout functionality · 39030786

由 Mathieu J. Poirier 提交于 4月 01, 2013

Some devices have too few buttons, which it makes it hard to have
a reset combo that won't trigger automatically.  As such a
timeout functionality that requires the combination to be held for
a given amount of time before triggering is introduced.

If a key combo is recognized and held for a 'timeout' amount of time,
the system triggers a reset.  If the timeout value is omitted the
driver simply ignores the functionality.
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

39030786

16 3月, 2013 1 次提交

sysrq: fix inconstistent help message of sysrq key · afa80ccb

由 zhangwei(Jovi) 提交于 3月 07, 2013

Currently help message of /proc/sysrq-trigger highlight its
upper-case characters, like below:

      SysRq : HELP : loglevel(0-9) reBoot Crash terminate-all-tasks(E)
      memory-full-oom-kill(F) kill-all-tasks(I) ...

this would confuse user trigger sysrq by upper-case character, which is
inconsistent with the real lower-case character registed key.

This inconsistent help message will also lead more confused when
26 upper-case letters put into use in future.

This patch fix it.

Thanks the comments from Andrew and Randy.
Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

afa80ccb

28 2月, 2013 1 次提交

sysrq: don't depend on weak undefined arrays to have an address that compares as NULL · adf96e6f

由 Linus Torvalds 提交于 2月 27, 2013

When taking an address of an extern array, gcc quite naturally should be
able to say "an address of an object can never be NULL" and just
optimize away the test entirely.

However, the new alternate sysrq reset code (commit 154b7a48:
"Input: sysrq - allow specifying alternate reset sequence") did exactly
that, and declared platform_sysrq_reset_seq[] as a weak array, and
expecting that testing the address of the array would show whether it
actually got linked against something or not.

And that doesn't work with all gcc versions.  Clearly it works with
*some* versions of gcc, and maybe it's even supposed to work, but it
really is a very fragile concept.

So instead of testing the address of the weak variable, just create a
weak instance of that array that is empty.  If some platform then has a
real platform_sysrq_reset_seq[] that overrides our weak one, the linker
will switch to that one, and it all works without any run-time
conditionals at all.
Reported-by: NDave Airlie <airlied@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

adf96e6f

08 2月, 2013 1 次提交

sched/rt: Move rt specific bits into new header file · 8bd75c77

由 Clark Williams 提交于 2月 07, 2013

Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.h
Signed-off-by: NClark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

8bd75c77

17 1月, 2013 1 次提交

Input: sysrq - allow specifying alternate reset sequence · 154b7a48

由 Mathieu Poirier 提交于 1月 06, 2013

This patch adds keyreset functionality to the sysrq driver. It allows
certain button/key combinations to be used in order to trigger emergency
reboots.

Redefining the '__weak platform_sysrq_reset_seq' variable is required
to trigger the feature.  Alternatively keys can be passed to the driver
via a module parameter.

This functionality comes from the keyreset driver submitted by
Arve Hjønnevåg in the Android kernel.
Signed-off-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

154b7a48

16 11月, 2012 1 次提交

mm, oom: ensure sysrq+f always passes valid zonelist · 53186095

由 David Rientjes 提交于 11月 14, 2012

With hotpluggable and memoryless nodes, it's possible that node 0 will
not be online, so use the first online node's zonelist rather than
hardcoding node 0 to pass a zonelist with all zones to the oom killer.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

53186095

17 10月, 2012 1 次提交
- D
  sparc64: Add global PMU register dumping via sysrq. · 916ca14a
  由 David S. Miller 提交于 10月 16, 2012
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  916ca14a
06 4月, 2012 1 次提交

sysrq: use SEND_SIG_FORCED instead of force_sig() · b82c3287

由 Anton Vorontsov 提交于 4月 05, 2012

Change send_sig_all() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL).  With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.

And this is more correct.  force_sig() can race with the exiting thread,
while do_send_sig_info(group => true) kill the whole process.

Some more notes from Oleg Nesterov:

> Just one note. This change makes no difference for sysrq_handle_kill().
> But it obviously changes the behaviour sysrq_handle_term(). I think
> this is fine, if you want to really kill the task which blocks/ignores
> SIGTERM you can use sysrq_handle_kill().
>
> Even ignoring the reasons why force_sig() is simply wrong here,
> force_sig(SIGTERM) looks strange. The task won't be killed if it has
> a handler, but SIG_IGN can't help. However if it has the handler
> but blocks SIGTERM temporary (this is very common) it will be killed.

Also,

> force_sig() can't kill the process if the main thread has already
> exited. IOW, it is trivial to create the process which can't be
> killed by sysrq.

So, this patch fixes the issue.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b82c3287

22 3月, 2012 1 次提交

mm, oom: force oom kill on sysrq+f · 08ab9b10

由 David Rientjes 提交于 3月 21, 2012

The oom killer chooses not to kill a thread if:

 - an eligible thread has already been oom killed and has yet to exit,
   and

 - an eligible thread is exiting but has yet to free all its memory and
   is not the thread attempting to currently allocate memory.

SysRq+F manually invokes the global oom killer to kill a memory-hogging
task.  This is normally done as a last resort to free memory when no
progress is being made or to test the oom killer itself.

For both uses, we always want to kill a thread and never defer.  This
patch causes SysRq+F to always kill an eligible thread and can be used to
force a kill even if another oom killed thread has failed to exit.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NPekka Enberg <penberg@kernel.org>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08ab9b10

09 3月, 2012 1 次提交

vt:tackle kbd_table · 079c9534

由 Alan Cox 提交于 2月 28, 2012

Keyboard struct lifetime is easy, but the locking is not and is completely
ignored by the existing code. Tackle this one head on

- Make the kbd_table private so we can run down all direct users
- Hoick the relevant ioctl handlers into the keyboard layer
- Lock them with the keyboard lock so they don't change mid keypress
- Add helpers for things like console stop/start so we isolate the poking
  around properly
- Tweak the braille console so it still builds

There are a couple of FIXME locking cases left for ioctls that are so hideous
they should be addressed in a later patch. After this patch the kbd_table is
private and all the keyboard jiggery pokery is in one place.

This update fixes speakup and also a memory leak in the original.
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

079c9534

10 2月, 2012 2 次提交

sysrq: Properly check for kernel threads · d3a532a9

由 Anton Vorontsov 提交于 2月 07, 2012

There's a real possibility of killing kernel threads that might
have issued use_mm(), so kthread's mm might become non-NULL.

This patch fixes the issue by checking for PF_KTHREAD (just as
get_task_mm()).
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d3a532a9

sysrq: Fix possible race with exiting task · e502babe

由 Anton Vorontsov 提交于 2月 07, 2012

sysrq should grab the tasklist lock, otherwise calling force_sig() is
not safe, as it might race with exiting task, which ->sighand might be
set to NULL already.
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

e502babe

04 1月, 2012 1 次提交

fs: move code out of buffer.c · ff01bb48

由 Al Viro 提交于 9月 16, 2011

Move invalidate_bdev, block_sync_page into fs/block_dev.c.  Export
kill_bdev as well, so brd doesn't have to open code it.  Reduce
buffer_head.h requirement accordingly.

Removed a rather large comment from invalidate_bdev, as it looked a bit
obsolete to bother moving.  The small comment replacing it says enough.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ff01bb48

25 3月, 2011 1 次提交

lib, arch: add filter argument to show_mem and fix private implementations · b2b755b5

由 David Rientjes 提交于 3月 24, 2011

Commit ddd588b5 ("oom: suppress nodes that are not allowed from
meminfo on oom kill") moved lib/show_mem.o out of lib/lib.a, which
resulted in build warnings on all architectures that implement their own
versions of show_mem():

	lib/lib.a(show_mem.o): In function `show_mem':
	show_mem.c:(.text+0x1f4): multiple definition of `show_mem'
	arch/sparc/mm/built-in.o:(.text+0xd70): first defined here

The fix is to remove __show_mem() and add its argument to show_mem() in
all implementations to prevent this breakage.

Architectures that implement their own show_mem() actually don't do
anything with the argument yet, but they could be made to filter nodes
that aren't allowed in the current context in the future just like the
generic implementation.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Reported-by: NJames Bottomley <James.Bottomley@hansenpartnership.com>
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2b755b5

05 11月, 2010 1 次提交

TTY: create drivers/tty and move the tty core files there · 96fd7ce5

由 Greg Kroah-Hartman 提交于 11月 04, 2010

The tty code should be in its own subdirectory and not in the char
driver with all of the cruft that is currently there.

Based on work done by Arnd Bergmann <arnd@arndb.de>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

96fd7ce5

15 10月, 2010 1 次提交

llseek: automatically add .llseek fop · 6038f373

由 Arnd Bergmann 提交于 8月 15, 2010

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time.  Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
//   but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
   *off = E
|
   *off += E
|
   func(..., off, ...)
|
   E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
  *off = E
|
  *off += E
|
  func(..., off, ...)
|
  E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
 ...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
 .llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
 .read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
 .write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
 .open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
...  .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
...  .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
...  .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+	.llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
 .write = write_f,
 .read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>

6038f373

30 9月, 2010 1 次提交

Input: sysrq - add locking to sysrq_filter() · 1966cb22

由 Dmitry Torokhov 提交于 9月 29, 2010

Similarly to the keyboard handler, we are called by different input
devices and thus need to add spinlock if we want to maintain our
state properly.
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

1966cb22

21 8月, 2010 1 次提交

Input: sysrq - drop tty argument form handle_sysrq() · f335397d

由 Dmitry Torokhov 提交于 8月 17, 2010

Sysrq operations do not accept tty argument anymore so no need to pass
it to us.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix build breakage in drm code
 caused by sysrq using bool but not including linux/types.h]

[Sachin Sant <sachinp@in.ibm.com>: fix build breakage in s390 keyboadr
 driver]
Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: NJason Wessel <jason.wessel@windriver.com>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

f335397d

20 8月, 2010 1 次提交

Input: sysrq - drop tty argument from sysrq ops handlers · 1495cc9d

由 Dmitry Torokhov 提交于 8月 17, 2010

Noone is using tty argument so let's get rid of it.
Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: NJason Wessel <jason.wessel@windriver.com>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

1495cc9d

22 7月, 2010 1 次提交

sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function · edd63cb6

由 Jason Wessel 提交于 7月 21, 2010

The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>

edd63cb6

09 6月, 2010 1 次提交

Input: sysrq - fix "stuck" SysRq mode · f5dec511

由 Dmitry Torokhov 提交于 6月 09, 2010

This shoud fix the problem with SysRq mode staying half-way enabled
and interfereing with normal PrtScrn operation after user presses ALT
for the first time.
Reported-and-tested-by: NÉric Piel <E.A.B.Piel@tudelft.nl>
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

f5dec511

22 4月, 2010 1 次提交

tracing: Dump either the oops's cpu source or all cpus buffers · cecbca96

由 Frederic Weisbecker 提交于 4月 18, 2010

The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>

cecbca96

14 4月, 2010 1 次提交

Input: implement SysRq as a separate input handler · 97f5f0cd

由 Dmitry Torokhov 提交于 3月 21, 2010

Instead of keeping SysRq support inside of legacy keyboard driver split
it out into a separate input handler (filter). This stops most SysRq input
events from leaking into evdev clients (some events, such as first SysRq
scancode - not keycode - event, are still leaked into both legacy keyboard
and evdev).

[martinez.javier@gmail.com: fix compile error when CONFIG_MAGIC_SYSRQ is
 not defined]
Signed-off-by: NDmitry Torokhov <dtor@mail.ru>

97f5f0cd

30 3月, 2010 1 次提交

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

由 Tejun Heo 提交于 3月 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

5a0e3ad6

16 12月, 2009 1 次提交

oom-kill: fix NUMA constraint check with nodemask · 4365a567

由 KAMEZAWA Hiroyuki 提交于 12月 15, 2009

Fix node-oriented allocation handling in oom-kill.c I myself think of this
as a bugfix not as an ehnancement.

In these days, things are changed as
  - alloc_pages() eats nodemask as its arguments, __alloc_pages_nodemask().
  - mempolicy don't maintain its own private zonelists.
  (And cpuset doesn't use nodemask for __alloc_pages_nodemask())

So, current oom-killer's check function is wrong.

This patch does
  - check nodemask, if nodemask && nodemask doesn't cover all
    node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
  - Scan all zonelist under nodemask, if it hits cpuset's wall
    this faiulre is from cpuset.
And
  - modifies the caller of out_of_memory not to call oom if __GFP_THISNODE.
    This doesn't change "current" behavior. If callers use __GFP_THISNODE
    it should handle "page allocation failure" by itself.

  - handle __GFP_NOFAIL+__GFP_THISNODE path.
    This is something like a FIXME but this gfpmask is not used now.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4365a567

21 9月, 2009 1 次提交

perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482

由 Ingo Molnar 提交于 9月 21, 2009

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in one, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks goes to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
  with hardware registers - and these sed scripts are a bit
  over-eager in renaming them. I've undone some of that, but
  in case there's something left where 'counter' would be
  better than 'event' we can undo that on an individual basis
  instead of touching an otherwise nicely automated patch. )
Suggested-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NPaul Mackerras <paulus@samba.org>
Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

cdd6c482

03 8月, 2009 1 次提交

debug lockups: Improve lockup detection, fix generic arch fallback · 47cab6a7

由 Ingo Molnar 提交于 8月 03, 2009

As Andrew noted, my previous patch ("debug lockups: Improve lockup
detection") broke/removed SysRq-L support from architecture that do
not provide a __trigger_all_cpu_backtrace implementation.

Restore a fallback path and clean up the SysRq-L machinery a bit:

 - Rename the arch method to arch_trigger_all_cpu_backtrace()

 - Simplify the define

 - Document the method a bit - in the hope of more architectures
   adding support for it.

[ The patch touches Sparc code for the rename. ]

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
LKML-Reference: <20090802140809.7ec4bb6b.akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

47cab6a7

02 8月, 2009 1 次提交

debug lockups: Improve lockup detection · c1dc0b9c

由 Ingo Molnar 提交于 8月 02, 2009

When debugging a recent lockup bug i found various deficiencies
in how our current lockup detection helpers work:

 - SysRq-L is not very efficient as it uses a workqueue, hence
   it cannot punch through hard lockups and cannot see through
   most soft lockups either.

 - The SysRq-L code depends on the NMI watchdog - which is off
   by default.

 - We dont print backtraces from the RCU code's built-in
   'RCU state machine is stuck' debug code. This debug
   code tends to be one of the first (and only) mechanisms
   that show that a lockup has occured.

This patch changes the code so taht we:

 - Trigger the NMI backtrace code from SysRq-L instead of using
   a workqueue (which cannot punch through hard lockups)

 - Trigger print-all-CPU-backtraces from the RCU lockup detection
   code

Also decouple the backtrace printing code from the NMI watchdog:

 - Dont use variable size cpumasks (it might not be initialized
   and they are a bit more fragile anyway)

 - Trigger an NMI immediately via an IPI, instead of waiting
   for the NMI tick to occur. This is a lot faster and can
   produce more relevant backtraces. It will also work if the
   NMI watchdog is disabled.

 - Dont print the 'dazed and confused' message when we print
   a backtrace from the NMI

 - Do a show_regs() plus a dump_stack() to get maximum info
   out of the dump. Worst-case we get two stacktraces - which
   is not a big deal. Sometimes, if register content is
   corrupted, the precise stack walker in show_regs() wont
   give us a full backtrace - in this case dump_stack() will
   do it.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c1dc0b9c

30 7月, 2009 1 次提交

sysrq, kdump: make sysrq-c consistent · cab8bd34

由 Hidetoshi Seto 提交于 7月 29, 2009

commit d6580a9f ("kexec: sysrq: simplify
sysrq-c handler") changed the behavior of sysrq-c to unconditional
dereference of NULL pointer.  So in cases with CONFIG_KEXEC, where
crash_kexec() was directly called from sysrq-c before, now it can be said
that a step of "real oops" was inserted before starting kdump.

However, in contrast to oops via SysRq-c from keyboard which results in
panic due to in_interrupt(), oops via "echo c > /proc/sysrq-trigger" will
not become panic unless panic_on_oops=1.  It means that even if dump is
properly configured to be taken on panic, the sysrq-c from proc interface
might not start crashdump while the sysrq-c from keyboard can start
crashdump.  This confuses traditional users of kdump, i.e.  people who
expect sysrq-c to do common behavior in both of the keyboard and proc
interface.

This patch brings the keyboard and proc interface behavior of sysrq-c in
line, by forcing panic_on_oops=1 before oops in sysrq-c handler.

And some updates in documentation are included, to clarify that there is
no longer dependency with CONFIG_KEXEC, and that now the system can just
crash by sysrq-c if no dump mechanism is configured.
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Brayan Arraes <brayan@yack.com.br>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cab8bd34