1. 07 Jan, 2011 (2 commits)
  2. 03 Jan, 2011 (1 commit)
    • watchdog: Improve initialisation error message and documentation · 55142374
      Ben Hutchings authored
      The error message 'NMI watchdog failed to create perf event...'
      does not make it clear that this is a fatal error for the
      watchdog.  It also currently prints the error value as a
      pointer, rather than extracting the error code with PTR_ERR().
      Fix that.
      
      Add a note to the description of the 'nowatchdog' kernel
      parameter to associate it with this message.
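      
      A minimal sketch of the corrected error path, assuming the event is
      created with perf_event_create_kernel_counter() as in kernel/watchdog.c
      (the message text and surrounding names are illustrative, not the
      verbatim patch):
      
      	/* the create call returns an ERR_PTR() value on failure, so
      	 * extract the errno with PTR_ERR() instead of printing the raw
      	 * pointer with %p */
      	event = perf_event_create_kernel_counter(wd_attr, cpu, NULL,
      						 watchdog_overflow_callback);
      	if (IS_ERR(event)) {
      		pr_err("NMI watchdog disabled for cpu%i: unable to create perf event: %ld\n",
      		       cpu, PTR_ERR(event));
      		return PTR_ERR(event);
      	}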
      Reported-by: Cesare Leonardi <celeonar@gmail.com>
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Cc: 599368@bugs.debian.org
      Cc: 608138@bugs.debian.org
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@kernel.org> # .37.x and later
      LKML-Reference: <1294009362.3167.126.camel@localhost>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 30 Dec, 2010 (1 commit)
  4. 24 Dec, 2010 (1 commit)
    • ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page · e1e35927
      David Sharp authored
      Fix two related problems in the event-copying loop of
      ring_buffer_read_page.
      
      The loop condition for copying events is off-by-one.
      "len" is the remaining space in the caller-supplied page.
      "size" is the size of the next event (or two events).
      If len == size, then there is just enough space for the next event.
      
      size was set to rb_event_ts_length(), which may include the size of
      two events if the first event is a time extend, in order to ensure
      time extends are kept together with the event after them. However,
      rb_advance_reader() always advances by one event. This would result
      in the event after any time extend being duplicated. Instead, get the
      size of a single event for the memcpy, but use rb_event_ts_length()
      for the loop condition.
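      
      A sketch of the corrected copy loop, close to (but not verbatim) the
      patched ring_buffer_read_page():
      
      	do {
      		/* rb_advance_reader() moves past a single event, so copy
      		 * one event's worth of data ... */
      		size = rb_event_length(event);
      		memcpy(bpage->data + pos, rpage->data + rpos, size);
      
      		len -= size;
      		pos += size;
      		rb_advance_reader(cpu_buffer);
      		rpos = reader->read;
      
      		/* ... but test the remaining space against
      		 * rb_event_ts_length(), which may cover a time extend plus
      		 * its event, and let len == size pass: it still fits */
      		event = rb_reader_event(cpu_buffer);
      		size = rb_event_ts_length(event);
      	} while (len >= size);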
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1293064704-8101-1-git-send-email-dhsharp@google.com>
      LKML-Reference: <AANLkTin7nLrRPc9qGjdjHbeVDDWiJjAiYyb-L=gH85bx@mail.gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 23 Dec, 2010 (1 commit)
    • taskstats: pad taskstats netlink response for alignment issues on ia64 · 4be2c95d
      Jeff Mahoney authored
      The taskstats structure is internally aligned on 8 byte boundaries, but
      the layout of the aggregate reply, with two NLA headers and the pid
      (each 4 bytes), actually forces the entire structure to be unaligned.
      This causes the kernel to issue unaligned access warnings on some
      architectures like ia64.  Unfortunately, some software out there
      doesn't properly unroll the NLA packet and assumes that the start of
      the taskstats structure will always be 20 bytes from the start of the
      netlink payload.  Aligning the start of the taskstats structure breaks
      this software, which we don't want.  So, for now the alignment only
      happens on architectures that require it, and those users will have to
      update to fixed versions of those packages.  Space is reserved in the
      packet only when needed.  This ifdef should be removed in several
      years, e.g. 2012, once we can be confident that fixed versions are
      installed on most systems.  We add the padding before the aggregate
      since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
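      
      A sketch of the padding scheme (attribute and macro names follow the
      commit; details may differ from the final code): a zero-length null
      attribute is emitted ahead of the aggregate, and its 4-byte NLA
      header pushes the taskstats payload back onto an 8-byte boundary.
      
      	/* in mk_reply(), before the aggregate attribute is nested */
      #ifdef TASKSTATS_NEEDS_PADDING
      	/* zero-payload attribute: only its 4-byte header lands in the
      	 * packet, shifting everything after it by one word */
      	if (nla_put(skb, TASKSTATS_TYPE_NULL, 0, NULL) < 0)
      		goto err;
      #endif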
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: Brian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 22 Dec, 2010 (1 commit)
  7. 20 Dec, 2010 (1 commit)
  8. 18 Dec, 2010 (2 commits)
  9. 17 Dec, 2010 (2 commits)
  10. 16 Dec, 2010 (3 commits)
  11. 14 Dec, 2010 (1 commit)
    • workqueue: It is likely that WORKER_NOT_RUNNING is true · 2d64672e
      Steven Rostedt authored
      Running the annotate branch profiler on three boxes, including my
      main box that runs firefox, evolution, xchat, and is part of the distcc farm,
      showed this for the likely()s in the workqueue code:
      
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
            96   996253  99 wq_worker_sleeping             workqueue.c          703
            96   996247  99 wq_worker_waking_up            workqueue.c          677
      
      The likely()s in this case were assuming that WORKER_NOT_RUNNING will
      most likely be false. But this is not the case. The reason (as shown
      by adding trace_printks and testing) is that most of the time
      WORKER_PREP is set.
      
      In worker_thread() we have:
      
      	worker_clr_flags(worker, WORKER_PREP);
      
      	[ do work stuff ]
      
      	worker_set_flags(worker, WORKER_PREP, false);
      
      (that 'false' means not to wake up an idle worker)
      
      wq_worker_sleeping() is called from schedule() when a worker thread
      is putting itself to sleep, which happens most of the time outside
      of that [ do work stuff ].
      
      wq_worker_waking_up() is called by the wakeup worker code, which
      is also called outside that [ do work stuff ].
      
      Thus, the likely and unlikely used by those two functions are actually
      backwards.
      
      Remove the annotation and let gcc figure it out.
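      
      A sketch of the change for one of the two functions (the gcwq helper
      name follows the workqueue code of that era and is illustrative):
      
      	void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
      	{
      		struct worker *worker = kthread_data(task);
      
      		/* was: if (likely(!(worker->flags & WORKER_NOT_RUNNING)));
      		 * the annotation is dropped and gcc decides for itself */
      		if (!(worker->flags & WORKER_NOT_RUNNING))
      			atomic_inc(get_gcwq_nr_running(cpu));
      	}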
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  12. 09 Dec, 2010 (4 commits)
    • nohz: Fix get_next_timer_interrupt() vs cpu hotplug · dbd87b5a
      Heiko Carstens authored
      This fixes a bug as seen on 2.6.32 based kernels where timers got
      enqueued on offline cpus.
      
      If a cpu goes offline it might still have pending timers. These will
      be migrated during CPU_DEAD handling after the cpu is offline.
      However while the cpu is going offline it will schedule the idle task
      which will then call tick_nohz_stop_sched_tick().
      
      That function in turn will call get_next_timer_interrupt() to figure
      out if the tick of the cpu can be stopped or not. If it turns out that
      the next tick is just one jiffy off (delta_jiffies == 1),
      tick_nohz_stop_sched_tick() incorrectly assumes that the tick should
      not stop, takes an early exit, and thus won't update the load
      balancer cpu.
      
      Just afterwards the cpu will be killed and the load balancer cpu could
      be the offline cpu.
      
      On 2.6.32 based kernels get_nohz_load_balancer() gets called to decide
      on which cpu a timer should be enqueued (see __mod_timer()), which
      leads to the possibility that timers get enqueued on an offline cpu.
      These will never expire and can cause a system hang.
      
      This has been observed on 2.6.32 kernels. On current kernels
      __mod_timer() uses get_nohz_timer_target(), which doesn't have that
      problem. However, there might be other problems because of the too
      early exit from tick_nohz_stop_sched_tick() in case a cpu goes offline.
      
      The easiest and probably safest fix seems to be to let
      get_next_timer_interrupt() just lie and let it say there isn't any
      pending timer if the current cpu is offline.
      
      I also thought of moving migrate_[hr]timers() from CPU_DEAD to
      CPU_DYING, but seeing that there already have been fixes at least in
      the hrtimer code in this area I'm afraid that this could add new
      subtle bugs.
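      
      A sketch of the fix at the top of get_next_timer_interrupt() in
      kernel/timer.c (NEXT_TIMER_MAX_DELTA is the existing "no timer
      pending" horizon):
      
      	/*
      	 * Pretend that there is no timer pending if the cpu is offline.
      	 * Possible pending timers will be migrated later to an active cpu.
      	 */
      	if (cpu_is_offline(smp_processor_id()))
      		return now + NEXT_TIMER_MAX_DELTA;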
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101201091109.GA8984@osiris.boeblingen.de.ibm.com>
      Cc: stable@kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Sched: fix skip_clock_update optimization · f26f9aff
      Mike Galbraith authored
      idle_balance() drops/retakes rq->lock, leaving the previous task
      vulnerable to set_tsk_need_resched().  Clear it after we return
      from balancing instead, and in setup_thread_stack() as well, so
      no successfully descheduled or never scheduled task has it set.
      
      A pending need_resched confused the skip_clock_update logic, which
      assumes that the next call to update_rq_clock() will come nearly
      immediately after the flag is set.  Make the optimization robust
      against the case where a sleeper is woken before it successfully
      deschedules, by checking that the current task has not been dequeued
      before setting the flag, since it is that useless clock update we're
      trying to save, and clear the flag unconditionally in schedule()
      proper instead of conditionally in put_prev_task().
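      
      A sketch of the two sides of the change (field names follow the
      commit text, not necessarily the verbatim patch):
      
      	/* in check_preempt_curr(): only flag the skip while the current
      	 * task is still queued */
      	if (rq->curr->se.on_rq && test_tsk_need_resched(rq->curr))
      		rq->skip_clock_update = 1;
      
      	/* in schedule() proper, after put_prev_task(): clear the flag
      	 * unconditionally so a stale value can't suppress a real update */
      	rq->skip_clock_update = 0;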
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Reported-by: Bjoern B. Brandenburg <bbb.lst@gmail.com>
      Tested-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <1291802742.1417.9.camel@marge.simson.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Cure more NO_HZ load average woes · 0f004f5a
      Peter Zijlstra authored
      There's a long-running regression that proved difficult to fix, which
      is hitting certain people and is rather annoying in its effects.
      
      Damien reported that after 74f5187a (sched: Cure load average vs
      NO_HZ woes) his load average was unnaturally high; he also noted that
      even with that patch reverted the load average numbers were not
      correct.
      
      The problem is that the previous patch only solved half the NO_HZ
      problem: it addressed going into NO_HZ mode, but not coming out of
      it. This patch implements that missing half.
      
      When coming out of NO_HZ mode there are two important things to take
      care of:
      
       - Folding the pending idle delta into the global active count.
       - Correctly aging the averages for the idle-duration.
      
      So with this patch the NO_HZ interaction should be complete and
      behaviour between CONFIG_NO_HZ=[yn] should be equivalent.
      
      Furthermore, this patch slightly changes the load average computation
      by adding a rounding term to the fixed point multiplication.
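      
      A sketch of the rounding change in the fixed-point average update
      (FSHIFT and FIXED_1 = 1 << FSHIFT are the kernel's existing
      load-average fixed-point constants):
      
      	static unsigned long
      	calc_load(unsigned long load, unsigned long exp, unsigned long active)
      	{
      		load *= exp;
      		load += active * (FIXED_1 - exp);
      		load += 1UL << (FSHIFT - 1);	/* the added rounding term */
      		return load >> FSHIFT;
      	}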
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: Tim McGrath <tmhikaru@gmail.com>
      Tested-by: Damien Wyart <damien.wyart@free.fr>
      Tested-by: Orion Poplawski <orion@cora.nwra.com>
      Tested-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Chase Douglas <chase.douglas@canonical.com>
      LKML-Reference: <1291129145.32004.874.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix duplicate events with multiple-pmu vs software events · 51676957
      Peter Zijlstra authored
      Because the multi-pmu bits can share contexts between struct pmu
      instances, we could get duplicate events when iterating the pmu list.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 07 Dec, 2010 (2 commits)
    • PM / Hibernate: Fix memory corruption related to swap · c9e664f1
      Rafael J. Wysocki authored
      There is a problem that swap pages allocated before the creation of
      a hibernation image can be released and used for storing the contents
      of different memory pages while the image is being saved.  Since the
      kernel stored in the image doesn't know of that, it causes memory
      corruption to occur after resume from hibernation, especially on
      systems with relatively small RAM that need to swap often.
      
      This issue can be addressed by keeping the GFP_IOFS bits clear
      in gfp_allowed_mask during the entire hibernation, including the
      saving of the image, until the system is finally turned off or
      the hibernation is aborted.  Unfortunately, for this purpose
      it's necessary to rework the way in which the hibernate and
      suspend code manipulates gfp_allowed_mask.
      
      This change is based on an earlier patch from Hugh Dickins.
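      
      A sketch of the helpers this rework introduced on the mm side (names
      follow the commit; locking asserts omitted):
      
      	static gfp_t saved_gfp_mask;
      
      	void pm_restrict_gfp_mask(void)
      	{
      		WARN_ON(saved_gfp_mask);
      		saved_gfp_mask = gfp_allowed_mask;
      		gfp_allowed_mask &= ~GFP_IOFS;	/* no I/O or FS allocations */
      	}
      
      	void pm_restore_gfp_mask(void)
      	{
      		if (saved_gfp_mask) {
      			gfp_allowed_mask = saved_gfp_mask;
      			saved_gfp_mask = 0;
      		}
      	}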
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-by: Ondrej Zary <linux@rainbow-software.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: stable@kernel.org
    • PM / Hibernate: Use async I/O when reading compressed hibernation image · 9f339caf
      Bojan Smojver authored
      This is a fix for reading an LZO compressed image using async I/O.
      Essentially, instead of having just one page into which we keep
      reading blocks from swap, we allocate enough of them to cover the
      largest compressed size and then let block I/O pick them all up. Once
      we have them all (and here we wait), we decompress them, as usual.
      Obviously, the very first block we still pick up synchronously,
      because we need to know the size of the lot before we pick up the
      rest.
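      
      A sketch of the scheme, assuming the kernel/power/swap.c helpers of
      that era (swap_read_page() chains bios when given a bio pointer and
      hib_wait_on_bio_chain() waits for the chain; constants illustrative):
      
      	/* the first page is read synchronously: it carries the
      	 * compressed length of the chunk */
      	error = swap_read_page(handle, page[0], NULL);
      	cmp_len = *(size_t *)page[0];
      
      	/* queue the remaining pages without waiting, chaining bios */
      	for (off = PAGE_SIZE, i = 1;
      	     off < LZO_HEADER + cmp_len; off += PAGE_SIZE, i++)
      		error = swap_read_page(handle, page[i], &bio);
      
      	/* block once for the whole batch, then decompress as usual */
      	error = hib_wait_on_bio_chain(&bio);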
      
      Also fixed the copyright line, which I'd forgotten before.
      Signed-off-by: Bojan Smojver <bojan@rexursive.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
  14. 03 Dec, 2010 (1 commit)
    • do_exit(): make sure that we run with get_fs() == USER_DS · 33dd94ae
      Nelson Elhage authored
      If a user manages to trigger an oops with fs set to KERNEL_DS, fs is
      not otherwise reset before do_exit().  do_exit() may later (via
      mm_release in fork.c) do a put_user to a user-controlled address,
      potentially allowing a user to leverage an oops into a controlled
      write into kernel memory.
      
      This is only triggerable in the presence of another bug, but this
      potentially turns a lot of DoS bugs into privilege escalations, so it's
      worth fixing.  I have proof-of-concept code which uses this bug along
      with CVE-2010-3849 to write a zero to an arbitrary kernel address, so
      I've tested that this is not theoretical.
      
      A more logical place to put this fix might be when we know an oops has
      occurred, before we call do_exit(), but that would involve changing
      every architecture, in multiple places.
      
      Let's just stick it in do_exit instead.
      
      [akpm@linux-foundation.org: update code comment]
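      
      The change itself is tiny; a sketch of the added lines at the top of
      do_exit() in kernel/exit.c (comment wording approximate):
      
      	/*
      	 * If we oopsed with fs left as KERNEL_DS, reset it to USER_DS
      	 * so a later put_user() (e.g. from mm_release()) cannot write
      	 * to a kernel address.
      	 */
      	set_fs(USER_DS);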
      Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 01 Dec, 2010 (2 commits)
  16. 26 Nov, 2010 (5 commits)
    • nohz: Fix printk_needs_cpu() return value on offline cpus · 61ab2544
      Heiko Carstens authored
      This patch fixes a hang observed with 2.6.32 kernels where timers got enqueued
      on offline cpus.
      
      printk_needs_cpu() may return 1 if called on offline cpus. When a cpu
      gets offlined it schedules the idle process which, before killing its
      own cpu, will call tick_nohz_stop_sched_tick(). That function in turn
      will call printk_needs_cpu() in order to check if the local tick can
      be disabled. On offline cpus this function should naturally return 0,
      since regardless of whether the tick gets disabled or not the cpu will
      be dead shortly after. That is besides the fact that __cpu_disable()
      should already have made sure that no interrupts on the offlined cpu
      will be delivered anyway.
      
      In this case it prevents tick_nohz_stop_sched_tick() from calling
      select_nohz_load_balancer(). No idea if that really is a problem.
      However, what made me debug this is that on 2.6.32 the function
      get_nohz_load_balancer() is used within __mod_timer() to select a cpu
      on which a timer gets enqueued. If printk_needs_cpu() returns 1 then
      the nohz_load_balancer cpu doesn't get updated when a cpu gets
      offlined. It may contain the cpu number of an offline cpu. In turn
      timers get enqueued on an offline cpu and, not very surprisingly, they
      never expire and cause system hangs.
      
      This has been observed on 2.6.32 kernels. On current kernels
      __mod_timer() uses get_nohz_timer_target(), which doesn't have that
      problem. However, there might be other problems because of the too
      early exit from tick_nohz_stop_sched_tick() in case a cpu goes
      offline.
      
      The easiest way to fix this is just to test if the current cpu is
      offline and call printk_tick() directly, which clears the condition.
      
      Alternatively I tried a cpu hotplug notifier which would clear the condition,
      however between calling the notifier function and printk_needs_cpu() something
      could have called printk() again and the problem is back again. This seems to
      be the safest fix.
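      
      A sketch of the resulting function in kernel/printk.c, close to the
      actual change:
      
      	int printk_needs_cpu(int cpu)
      	{
      		if (cpu_is_offline(cpu))
      			printk_tick();	/* flushes and clears printk_pending */
      		return __this_cpu_read(printk_pending);
      	}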
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      LKML-Reference: <20101126120235.406766476@de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • printk: Fix wake_up_klogd() vs cpu hotplug · 49f41383
      Heiko Carstens authored
      wake_up_klogd() may get called from preemptible context but uses
      __raw_get_cpu_var() to write to a per cpu variable. If it gets
      preempted between getting the address and writing to it, the cpu in
      question could be offline by the time the process gets scheduled back,
      and hence it would write to the per cpu data of an offline cpu.
      
      This buggy behaviour was introduced with fa33507a "printk: robustify
      printk, fix #2" which was supposed to fix a "using smp_processor_id() in
      preemptible" warning.
      
      Let's use this_cpu_write() instead which disables preemption and makes sure
      that the outlined scenario cannot happen.
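      
      A sketch of the fixed function, close to the actual change in
      kernel/printk.c:
      
      	void wake_up_klogd(void)
      	{
      		if (waitqueue_active(&log_wait))
      			/* address calculation and store happen as one
      			 * preemption-safe operation */
      			this_cpu_write(printk_pending, 1);
      	}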
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101126124247.GC7023@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix the software context switch counter · ee6dcfa4
      Peter Zijlstra authored
      Stephane noticed that because the perf_sw_event() call is inside the
      perf_event_task_sched_out() call it won't get called unless we
      have a per-task counter.
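      
      A sketch of the fix: hoist the software event into the inline wrapper
      so it fires on every context switch, not only when a per-task context
      exists (perf_sw_event() signature as of that era, with the nmi
      argument):
      
      	static inline void
      	perf_event_task_sched_out(struct task_struct *task,
      				  struct task_struct *next)
      	{
      		/* count the switch unconditionally ... */
      		perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
      
      		/* ... before the per-task work, which may bail out early */
      		__perf_event_task_sched_out(task, next);
      	}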
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix inherit vs. context rotation bug · dddd3379
      Thomas Gleixner authored
      It was found that sometimes children of tasks with inherited events
      had one extra event. Eventually it turned out to be due to the list
      rotation not being exclusive with the list iteration in the
      inheritance code.
      
      Cure this by temporarily disabling the rotation while we inherit the events.
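      
      A sketch of the mechanism, assuming a rotate_disable count in the
      context as described in the commit (details may differ from the final
      code):
      
      	static void rotate_ctx(struct perf_event_context *ctx)
      	{
      		raw_spin_lock(&ctx->lock);
      		/* the inheritance walk holds rotate_disable non-zero,
      		 * making rotation exclusive with its list iteration */
      		if (!ctx->rotate_disable)
      			list_rotate_left(&ctx->flexible_groups);
      		raw_spin_unlock(&ctx->lock);
      	}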
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • workqueue: check the allocation of system_unbound_wq · e5cba24e
      Hitoshi Mitake authored
      I found a trivial bug in the initialization of the workqueue code.
      The current init_workqueues() doesn't check the result of the
      allocation of system_unbound_wq; this should be checked like the
      other queues.
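      
      A sketch of the extended sanity check in init_workqueues() (the
      sibling queue names are those of that kernel era):
      
      	BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq ||
      	       !system_unbound_wq);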
      Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  17. 20 Nov, 2010 (1 commit)
    • Revert "kernel: make /proc/kallsyms mode 400 to reduce ease of attacking" · 33e0d57f
      Linus Torvalds authored
      This reverts commit 59365d13.
      
      It turns out that this can break certain existing user land setups.
      Quoth Sarah Sharp:
      
       "On Wednesday, I updated my branch to commit 460781b5 from linus' tree,
        and my box would not boot.  klogd segfaulted, which stalled the whole
        system.
      
        At first I thought it actually hung the box, but it continued booting
        after 5 minutes, and I was able to log in.  It dropped back to the
        text console instead of the graphical bootup display for that period
        of time.  dmesg surprisingly still works.  I've bisected the problem
        down to this commit (commit 59365d13)
      
        The box is running klogd 1.5.5ubuntu3 (from Jaunty).  Yes, I know
        that's old.  I read the bit in the commit about changing the
        permissions of kallsyms after boot, but if I can't boot that doesn't
        help."
      
      So let's just keep the old default, and encourage distributions to do
      the "chmod -r /proc/kallsyms" in their bootup scripts.  This is not
      worth a kernel option to change default behavior, since it's so easily
      done in user space.
      Reported-and-bisected-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Cc: Marcus Meissner <meissner@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
  18. 18 Nov, 2010 (7 commits)
  19. 17 Nov, 2010 (1 commit)
    • kernel: make /proc/kallsyms mode 400 to reduce ease of attacking · 59365d13
      Marcus Meissner authored
      Making /proc/kallsyms readable only for root by default makes it
      slightly harder for attackers to write generic kernel exploits by
      removing one source of knowledge where things are in the kernel.
      
      This is the second submission; discussion on the first mostly
      concerned that this is just one hole of the sieve ...  but one of the
      bigger ones.
      
      Changing the permissions of at least System.map and vmlinux is also
      required to fix the same set, but that is a packaging issue.
      
      The target of this starter patch and its follow-ups is to remove any
      kind of kernel space address information leak from the kernel.
      
      [ Side note: the default of root-only reading is the "safe" value, and
        it's easy enough to then override at any time after boot.  The /proc
        filesystem allows root to change the permissions with a regular
        chmod, so you can "revert" this at run-time by simply doing
      
          chmod og+r /proc/kallsyms
      
        as root if you really want regular users to see the kernel symbols.
        It does help some tools like "perf" figure them out without any
        setup, so it may well make sense in some situations.  - Linus ]
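      
      The mechanical change is a one-liner; a sketch of the affected call
      in kernel/kallsyms.c:
      
      	/* was: proc_create("kallsyms", 0444, NULL, &kallsyms_operations); */
      	proc_create("kallsyms", 0400, NULL, &kallsyms_operations);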
      Signed-off-by: Marcus Meissner <meissner@suse.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Eugene Teo <eugeneteo@kernel.org>
      Reviewed-by: Jesper Juhl <jj@chaosbits.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 16 Nov, 2010 (1 commit)