提交 · 19631cb3d67c24c8b1fa58bc69bc2fed8d15095d · openeuler / raspberrypi-kernel

26 4月, 2012 1 次提交

perf: Use static variant of perf_event_overflow in core.c · 33b07b8b

由 Robert Richter 提交于 4月 05, 2012

No need to have an additional function layer.
Signed-off-by: NRobert Richter <robert.richter@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333643084-26776-4-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

33b07b8b

24 4月, 2012 3 次提交

ring-buffer: Add per_cpu ring buffer control files · 438ced17

由 Vaibhav Nagarnaik 提交于 2月 02, 2012

Add a debugfs entry under per_cpu/ folder for each cpu called
buffer_size_kb to control the ring buffer size for each CPU
independently.

If the global file buffer_size_kb is used to set size, the individual
ring buffers will be adjusted to the given size. The buffer_size_kb will
report the common size to maintain backward compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change buffer size for a specific CPU, only the size of the respective
ring buffer is updated. When tracing/buffer_size_kb is read, it reports
'X' to indicate that sizes of per_cpu ring buffers are not equivalent.

Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Justin Teravest <teravest@google.com>
Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

438ced17

tracing: Remove an unneeded check in trace_seq_buffer() · 5a26c8f0

由 Dan Carpenter 提交于 4月 20, 2012

memcpy() returns a pointer to "bug".  Hopefully, it's not NULL here or
we would already have Oopsed.

Link: http://lkml.kernel.org/r/20120420063145.GA22649@elgon.mountain

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5a26c8f0

tracing: Add percpu buffers for trace_printk() · 07d777fe

由 Steven Rostedt 提交于 9月 22, 2011

Currently, trace_printk() uses a single buffer to write into
to calculate the size and format needed to save the trace. To
do this safely in an SMP environment, a spin_lock() is taken
to only allow one writer at a time to the buffer. But this could
also affect what is being traced, and add synchronization that
would not be there otherwise.

Ideally, using percpu buffers would be useful, but since trace_printk()
is only used in development, having per cpu buffers for something
never used is a waste of space. Thus, the use of the trace_bprintk()
format section is changed to be used for static fmts as well as dynamic ones.
Then at boot up, we can check if the section that holds the trace_printk
formats is non-empty, and if it does contain something, then we
know a trace_printk() has been added to the kernel. At this time
the trace_printk per cpu buffers are allocated. A check is also
done at module load time in case a module is added that contains a
trace_printk().

Once the buffers are allocated, they are never freed. If you use
a trace_printk() then you should know what you are doing.

A buffer is made for each type of context:

  normal
  softirq
  irq
  nmi

The context is checked and the appropriate buffer is used.
This allows for totally lockless usage of trace_printk(),
and they no longer even disable interrupts.
Requested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

07d777fe

14 4月, 2012 1 次提交

irq_work: fix compile failure on tile from missing include · ef1f0982

由 Chris Metcalf 提交于 4月 11, 2012

Building with IRQ_WORK configured results in

kernel/irq_work.c: In function ‘irq_work_run’:
kernel/irq_work.c:110: error: implicit declaration of function ‘irqs_disabled’

The appropriate header just needs to be included.
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

ef1f0982

13 4月, 2012 2 次提交

irq_domain: fix type mismatch in debugfs output format · 5269a9ab

由 Grant Likely 提交于 4月 12, 2012

sizeof(void*) returns an unsigned long, but it was being used as a width parameter to a "%-*s" format string which requires an int.  On 64 bit platforms this causes a type mismatch:

    linux/kernel/irq/irqdomain.c:575: warning: field width should have type
    'int', but argument 6 has type 'long unsigned int'

This change casts the size to an int so printf gets the right data type.
Reported-by: NAndreas Schwab <schwab@linux-m68k.org>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>

5269a9ab

panic: fix stack dump print on direct call to panic() · 026ee1f6

由 Jason Wessel 提交于 4月 12, 2012

Commit 6e6f0a1f ("panic: don't print redundant backtraces on oops")
causes a regression where no stack trace will be printed at all for the
case where kernel code calls panic() directly while not processing an
oops, and of course there are 100's of instances of this type of call.

The original commit executed the check (!oops_in_progress), but this will
always be false because just before the dump_stack() there is a call to
bust_spinlocks(1), which does the following:

  void __attribute__((weak)) bust_spinlocks(int yes)
  {
	if (yes) {
		++oops_in_progress;

The proper way to resolve the problem that original commit tried to
solve is to avoid printing a stack dump from panic() when the either of
the following conditions is true:

  1) TAINT_DIE has been set (this is done by oops_end())
     This indicates and oops has already been printed.
  2) oops_in_progress > 1
     This guards against the rare case where panic() is invoked
     a second time, or in between oops_begin() and oops_end()
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>	[3.3+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

026ee1f6

12 4月, 2012 1 次提交

irq_domain: Move irq_virq_count into NOMAP revmap · 6fa6c8e2

由 Grant Likely 提交于 2月 15, 2012

This patch replaces the old global setting of irq_virq_count that is only
used by the NOMAP mapping and instead uses a revmap_data property so that
the maximum NOMAP allocation can be set per NOMAP irq_domain.

There is exactly one user of irq_virq_count in-tree right now: PS3.
Also, irq_virq_count is only useful for the NOMAP mapping.  So,
instead of having a single global irq_virq_count values, this change
drops it entirely and added a max_irq argument to irq_domain_add_nomap().
That makes it a property of an individual nomap irq domain instead of
a global system settting.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Milton Miller <miltonm@bga.com>

6fa6c8e2

11 4月, 2012 4 次提交

cred: copy_process() should clear child->replacement_session_keyring · 79549c6d

由 Oleg Nesterov 提交于 4月 09, 2012

keyctl_session_to_parent(task) sets ->replacement_session_keyring,
it should be processed and cleared by key_replace_session_keyring().

However, this task can fork before it notices TIF_NOTIFY_RESUME and
the new child gets the bogus ->replacement_session_keyring copied by
dup_task_struct(). This is obviously wrong and, if nothing else, this
leads to put_cred(already_freed_cred).

change copy_creds() to clear this member. If copy_process() fails
before this point the wrong ->replacement_session_keyring doesn't
matter, exit_creds() won't be called.

Cc: <stable@vger.kernel.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79549c6d

irqdomain: Fix debugfs formatting · 15e06bf6

由 Grant Likely 提交于 4月 11, 2012

This patch fixes the irq_domain_mapping debugfs output to pad pointer
values with leading zeros so that pointer values are displayed
correctly.  Otherwise you get output similar to "0x 5e0000000000000".
Also, when the irq_domain is set to 'null'
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Cc: David Daney <david.daney@cavium.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>

15e06bf6

irq_domain: correct the debugfs file name · ac5830a3

由 Mika Westerberg 提交于 4月 10, 2012

The actual name of the irq_domain mapping debugfs file is
"irq_domain_mapping" not "virq_mapping".
Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

ac5830a3

irq/irq_domain: Quit ignoring error returns from irq_alloc_desc_from(). · 5b7526e3

由 David Daney 提交于 4月 05, 2012

In commit 4bbdd45a (irq_domain/powerpc: eliminate irq_map; use
irq_alloc_desc() instead) code was added that ignores error returns
from irq_alloc_desc_from() by (silently) casting the return value to
unsigned.  The negitive value error return now suddenly looks like a
valid irq number.

Commits cc79ca69 (irq_domain: Move irq_domain code from powerpc to
kernel/irq) and 1bc04f2c (irq_domain: Add support for base irq and
hwirq in legacy mappings) move this code to its current location in
irqdomain.c

The result of all of this is a null pointer dereference OOPS if one of
the error cases is hit.

The fix: Don't cast away the negativeness of the return value and then
check for errors.
Signed-off-by: NDavid Daney <david.daney@cavium.com>
Acked-by: NRob Herring <rob.herring@calxeda.com>
[grant.likely: dropped addition of new 'irq' variable]
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

5b7526e3

10 4月, 2012 2 次提交

clockevents: tTack broadcast device mode change in tick_broadcast_switch_to_oneshot() · fa4da365

由 Suresh Siddha 提交于 4月 09, 2012

In the commit 77b0d60c,
"clockevents: Leave the broadcast device in shutdown mode when not needed",
we were bailing out too quickly in tick_broadcast_switch_to_oneshot(),
with out tracking the broadcast device mode change to 'TICKDEV_MODE_ONESHOT'.

This breaks the platforms which need broadcast device oneshot services during
deep idle states. tick_broadcast_oneshot_control() thinks that it is
in periodic mode and fails to take proper decisions based on the
CLOCK_EVT_NOTIFY_BROADCAST_[ENTER, EXIT] notifications during deep
idle entry/exit.

Fix this by tracking the broadcast device mode as 'TICKDEV_MODE_ONESHOT',
before leaving the broadcast HW device in shutdown mode if there are no active
requests for the moment.
Reported-and-tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: johnstul@us.ibm.com
Link: http://lkml.kernel.org/r/1334011304.12400.81.camel@sbsiddha-desk.sc.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

fa4da365

itimer: Use printk_once instead of WARN_ONCE · 9886f444

由 Thomas Gleixner 提交于 4月 10, 2012

David pointed out, that WARN_ONCE() to report usage of an deprecated
misfeature make folks unhappy. Use printk_once() instead.

Andrew told me to stop grumbling and to remove the silly typecast
while touching the file.
Reported-by: NDavid Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

9886f444

06 4月, 2012 2 次提交

nohz: Fix stale jiffies update in tick_nohz_restart() · 6f103929

由 Neal Cardwell 提交于 3月 27, 2012

Fix tick_nohz_restart() to not use a stale ktime_t "now" value when
calling tick_do_update_jiffies64(now).

If we reach this point in the loop it means that we crossed a tick
boundary since we grabbed the "now" timestamp, so at this point "now"
refers to a time in the old jiffy, so using the old value for "now" is
incorrect, and is likely to give us a stale jiffies value.

In particular, the first time through the loop the
tick_do_update_jiffies64(now) call is always a no-op, since the
caller, tick_nohz_restart_sched_tick(), will have already called
tick_do_update_jiffies64(now) with that "now" value.

Note that tick_nohz_stop_sched_tick() already uses the correct
approach: when we notice we cross a jiffy boundary, grab a new
timestamp with ktime_get(), and *then* update jiffies.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1332875377-23014-1-git-send-email-ncardwell@google.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

6f103929

simple_open: automatically convert to simple_open() · 234e3405

由 Stephen Boyd 提交于 4月 05, 2012

Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op.  This leads to a
proliferation of the default_open() implementation across the entire
tree.

Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().

This replacement was done with the following semantic patch:

<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}

@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

234e3405

05 4月, 2012 1 次提交

sysctl: fix write access to dmesg_restrict/kptr_restrict · 620f6e8e

由 Kees Cook 提交于 4月 04, 2012

Commit bfdc0b49 adds code to restrict access to dmesg_restrict,
however, it incorrectly alters kptr_restrict rather than
dmesg_restrict.

The original patch from Richard Weinberger
(https://lkml.org/lkml/2011/3/14/362) alters dmesg_restrict as
expected, and so the patch seems to have been misapplied.

This adds the CAP_SYS_ADMIN check to both dmesg_restrict and
kptr_restrict, since both are sensitive.
Reported-by: NPhillip Lougher <plougher@redhat.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Acked-by: NRichard Weinberger <richard@nod.at>
Cc: stable@vger.kernel.org
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

620f6e8e

02 4月, 2012 1 次提交

irq_work: fix compile failure on MIPS from system.h split · 83e3fa6f

由 Paul Gortmaker 提交于 4月 01, 2012

Builds of the MIPS platform ip32_defconfig fails as of commit
0195c002 ("Merge tag 'split-asm_system_h ...") because MIPS xchg()
macro uses BUILD_BUG_ON and it was moved in commit b81947c6
("Disintegrate asm/system.h for MIPS").

The root cause is that the system.h split wasn't tested on a baseline
with commit 6c03438e ("kernel.h: doesn't explicitly use bug.h, so
don't include it.")

Since this file uses BUG code in several other places besides the xchg
call, simply make the inclusion explicit.
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83e3fa6f

31 3月, 2012 3 次提交

tick: Document TICK_ONESHOT config option · 3872c48b

由 Thomas Gleixner 提交于 3月 31, 2012

This option has been selected from arch code as it was assumed that
it's necessary to support oneshot mode clockevent devices. But it's
just a core internal helper to compile tick-oneshot.c if NOHZ or
HIG_RES_TIMERS are selected.
Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

3872c48b

sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq() · e3831edd

由 Srivatsa S. Bhat 提交于 3月 30, 2012

The function for_each_cpu_mask() expects a *pointer* to struct
cpumask as its second argument, whereas select_fallback_rq()
passes the value itself.

And moreover, for_each_cpu_mask() has been marked as obselete
in include/linux/cpumask.h. So move to the more appropriate
for_each_cpu() variant.
Reported-by: NSasha Levin <levinsasha928@gmail.com>
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Dave Jones <davej@redhat.com>
Cc: Liu Chuansheng <chuansheng.liu@intel.com>
Cc: vapier@gentoo.org
Cc: rusty@rustcorp.com.au
Link: http://lkml.kernel.org/r/4F75BED4.9050005@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e3831edd

genirq: Adjust irq thread affinity on IRQ_SET_MASK_OK_NOCOPY return value · f5cb92ac

由 Jiang Liu 提交于 3月 30, 2012

irq_move_masked_irq() checks the return code of
chip->irq_set_affinity() only for 0, but IRQ_SET_MASK_OK_NOCOPY is
also a valid return code, which is there to avoid a redundant copy of
the cpumask. But in case of IRQ_SET_MASK_OK_NOCOPY we not only avoid
the redundant copy, we also fail to adjust the thread affinity of an
eventually threaded interrupt handler.

Handle IRQ_SET_MASK_OK (==0) and IRQ_SET_MASK_OK_NOCOPY(==1) return
values correctly by checking the valid return values seperately.
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Keping Chen <chenkeping@huawei.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1333120296-13563-2-git-send-email-jiang.liu@huawei.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f5cb92ac

30 3月, 2012 5 次提交

itimer: Schedule silent NULL pointer fixup in setitimer() for removal · aa2bf9bc

由 Sasikantha babu 提交于 3月 21, 2012

setitimer() should return -EFAULT if called with an invalid pointer
for value. The current code excludes a NULL pointer from this rule and
silently uses it to stop the timer. This violates the spec.

Warn about user space apps which rely on that feature and schedule it
for removal.

[ tglx: Massaged changelog, warn message and Doc entry ]
Signed-off-by: NSasikantha babu <sasikanth.v19@gmail.com>
Link: http://lkml.kernel.org/r/1332340854-26053-1-git-send-email-sasikanth.v19@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

aa2bf9bc

cgroup: cgroup_attach_task() could return -errno after success · 8f121918

由 Tejun Heo 提交于 3月 29, 2012

61d1d219 "cgroup: remove extra calls to find_existing_css_set" made
cgroup_task_migrate() return void.  An unfortunate side effect was
that cgroup_attach_task() was depending on that function's return
value to clear its @retval on the success path.  On cgroup mounts
without any subsystem with ->can_attach() callback,
cgroup_attach_task() ended up returning @retval without initializing
it on success.

For some reason, gcc failed to warn about it and it didn't cause
cgroup_attach_task() to return non-zero value in many cases, probably
due to difference in register allocation.  When the problem
materializes, systemd fails to populate /systemd cgroup mount and
fails to boot.

Fix it by initializing @retval to zero on declaration.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJiri Kosina <jkosina@suse.cz>
LKML-Reference: <alpine.LNX.2.00.1203282354440.25526@pobox.suse.cz>
Reviewed-by: NMandeep Singh Baines <msb@chromium.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

8f121918

kgdb,debug_core: pass the breakpoint struct instead of address and memory · 98b54aa1

由 Jason Wessel 提交于 3月 21, 2012

There is extra state information that needs to be exposed in the
kgdb_bpt structure for tracking how a breakpoint was installed.  The
debug_core only uses the the probe_kernel_write() to install
breakpoints, but this is not enough for all the archs.  Some arch such
as x86 need to use text_poke() in order to install a breakpoint into a
read only page.

Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and
kgdb_arch_remove_breakpoint() allows other archs to set the type
variable which indicates how the breakpoint was installed.

Cc: stable@vger.kernel.org # >= 2.6.36
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

98b54aa1

kdb: Fix smatch warning on dbg_io_ops->is_console · 78724b8e

由 Jason Wessel 提交于 3月 29, 2012

The Smatch tool warned that the change from commit b8adde8d
(kdb: Avoid using dbg_io_ops until it is initialized) should
add another null check later in the kdb_printf().

It is worth noting that the second use of dbg_io_ops->is_console
is protected by the KDB_PAGER state variable which would only
get set when kdb is fully active and initialized.  If we
ever encounter changes or defects in the KDB_PAGER state
we do not want to crash the kernel in a kdb_printf/printk.

CC: Tim Bird <tim.bird@am.sony.com>
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJason Wessel <jason.wessel@windriver.com>

78724b8e

irqdomain: Remove powerpc dependency from debugfs file · 092b2fb0

由 Grant Likely 提交于 3月 29, 2012

The debugfs code is really generic for all platforms.  This patch removes the
powerpc-specific directory reference and makes it available to all
architectures.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

092b2fb0

29 3月, 2012 14 次提交

padata: Fix cpu hotplug · 96120905

由 Steffen Klassert 提交于 3月 28, 2012

We don't remove the cpu that went offline from our cpumasks
on cpu hotplug. This got lost somewhere along the line, so
restore it. This fixes a hang of the padata instance on cpu
hotplug.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

96120905

padata: Use the online cpumask as the default · 13614e0f

由 Steffen Klassert 提交于 3月 28, 2012

We use the active cpumask to determine the superset of cpus
to use for parallelization. However, the active cpumask is
for internal usage of the scheduler and therefore not the
appropriate cpumask for these purposes. So use the online
cpumask instead.
Reported-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

13614e0f

padata: Add a reference to the api documentation · 107f8bda

由 Steffen Klassert 提交于 3月 28, 2012

Add a reference to the padata api documentation at Documentation/padata.txt
Suggested-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

107f8bda

futex: Mark get_robust_list as deprecated · ec0c4274

由 Kees Cook 提交于 3月 23, 2012

Notify get_robust_list users that the syscall is going away.
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: kernel-hardening@lists.openwall.com
Cc: spender@grsecurity.net
Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

ec0c4274

futex: Do not leak robust list to unprivileged process · bdbb776f

由 Kees Cook 提交于 3月 19, 2012

It was possible to extract the robust list head address from a setuid
process if it had used set_robust_list(), allowing an ASLR info leak. This
changes the permission checks to be the same as those used for similar
info that comes out of /proc.

Running a setuid program that uses robust futexes would have had:
  cred->euid != pcred->euid
  cred->euid == pcred->uid
so the old permissions check would allow it. I'm not aware of any setuid
programs that use robust futexes, so this is just a preventative measure.

(This patch is based on changes from grsecurity.)
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: kernel-hardening@lists.openwall.com
Cc: spender@grsecurity.net
Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

bdbb776f

genirq: Respect NUMA node affinity in setup_irq_irq affinity() · 241fc640

由 Prarit Bhargava 提交于 3月 26, 2012

We respect node affinity of devices already in the irq descriptor
allocation, but we ignore it for the initial interrupt affinity
setup, so the interrupt might be routed to a different node.

Restrict the default affinity mask to the node on which the irq
descriptor is allocated.

[ tglx: Massaged changelog ]
Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1332788538-17425-1-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

241fc640

genirq: Get rid of unneeded force parameter in irq_finalize_oneshot() · f3f79e38

由 Alexander Gordeev 提交于 3月 21, 2012

The only place irq_finalize_oneshot() is called with force parameter set
is the threaded handler error exit path. But IRQTF_RUNTHREAD is dropped
at this point and irq_wake_thread() is not going to set it again,
since PF_EXITING is set for this thread already. So irq_finalize_oneshot()
will drop the threads bit in threads_oneshot anyway and hence the force
parameter is superfluous.
Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120321162234.GP24806@dhcp-26-207.brq.redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f3f79e38

genirq: Minor readablity improvement in irq_wake_thread() · 69592db2

由 Alexander Gordeev 提交于 3月 21, 2012

exit_irq_thread() clears IRQTF_RUNTHREAD flag and drops the thread's bit in
desc->threads_oneshot then. The bit must not be set again in between and it
does not, since irq_wake_thread() sees PF_EXITING flag first and returns.

Due to above the order or checking PF_EXITING and IRQTF_RUNTHREAD flags in
irq_wake_thread() is important. This change just makes it more visible in the
source code.
Signed-off-by: NAlexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120321162212.GO24806@dhcp-26-207.brq.redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

69592db2

sched: Fix __schedule_bug() output when called from an interrupt · 6135fc1e

由 Stephen Boyd 提交于 3月 28, 2012

If schedule is called from an interrupt handler __schedule_bug()
will call show_regs() with the registers saved during the
interrupt handling done in do_IRQ(). This means we'll see the
registers and the backtrace for the process that was interrupted
and not the full backtrace explaining who called schedule().

This is due to 838225b4 ("sched: use show_regs() to improve
__schedule_bug() output", 2007-10-24) which improperly assumed
that get_irq_regs() would return the registers for the current
stack because it is being called from within an interrupt
handler. Simply remove the show_reg() code so that we dump a
backtrace for the interrupt handler that called schedule().

[ I ran across this when I was presented with a scheduling while
  atomic log with a stacktrace pointing at spin_unlock_irqrestore().
  It made no sense and I had to guess what interrupt handler could
  be called and poke around for someone calling schedule() in an
  interrupt handler. A simple test of putting an msleep() in
  an interrupt handler works better with this patch because you
  can actually see the msleep() call in the backtrace. ]
Also-reported-by: NChris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: Satyam Sharma <satyam@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1332979847-27102-1-git-send-email-sboyd@codeaurora.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

6135fc1e

documentation: remove references to cpu_*_map. · 5f054e31

由 Rusty Russell 提交于 3月 29, 2012

This has been obsolescent for a while, fix documentation and
misc comments.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

5f054e31

pidns: add reboot_pid_ns() to handle the reboot syscall · cf3f8921

由 Daniel Lezcano 提交于 3月 28, 2012

In the case of a child pid namespace, rebooting the system does not really
makes sense.  When the pid namespace is used in conjunction with the other
namespaces in order to create a linux container, the reboot syscall leads
to some problems.

A container can reboot the host.  That can be fixed by dropping the
sys_reboot capability but we are unable to correctly to poweroff/
halt/reboot a container and the container stays stuck at the shutdown time
with the container's init process waiting indefinitively.

After several attempts, no solution from userspace was found to reliabily
handle the shutdown from a container.

This patch propose to make the init process of the child pid namespace to
exit with a signal status set to : SIGINT if the child pid namespace
called "halt/poweroff" and SIGHUP if the child pid namespace called
"reboot".  When the reboot syscall is called and we are not in the initial
pid namespace, we kill the pid namespace for "HALT", "POWEROFF",
"RESTART", and "RESTART2".  Otherwise we return EINVAL.

Returning EINVAL is also an easy way to check if this feature is supported
by the kernel when invoking another 'reboot' option like CAD.

By this way the parent process of the child pid namespace knows if it
rebooted or not and can take the right decision.

Test case:
==========

#include <alloca.h>
#include <stdio.h>
#include <sched.h>
#include <unistd.h>
#include <signal.h>
#include <sys/reboot.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <linux/reboot.h>

static int do_reboot(void *arg)
{
        int *cmd = arg;

        if (reboot(*cmd))
                printf("failed to reboot(%d): %m\n", *cmd);
}

int test_reboot(int cmd, int sig)
{
        long stack_size = 4096;
        void *stack = alloca(stack_size) + stack_size;
        int status;
        pid_t ret;

        ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd);
        if (ret < 0) {
                printf("failed to clone: %m\n");
                return -1;
        }

        if (wait(&status) < 0) {
                printf("unexpected wait error: %m\n");
                return -1;
        }

        if (!WIFSIGNALED(status)) {
                printf("child process exited but was not signaled\n");
                return -1;
        }

        if (WTERMSIG(status) != sig) {
                printf("signal termination is not the one expected\n");
                return -1;
        }

        return 0;
}

int main(int argc, char *argv[])
{
        int status;

        status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT);
        if (status < 0)
                return 1;
        printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n");

        status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1);
        if (status >= 0) {
                printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n");
                return 1;
        }
        printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n");

        return 0;
}

[akpm@linux-foundation.org: tweak and add comments]
[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Tested-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cf3f8921

sysctl: use bitmap library functions · 5a04cca6

由 Akinobu Mita 提交于 3月 28, 2012

Use bitmap_set() instead of using set_bit() for each bit.  This conversion
is valid because the bitmap is private in the function call and atomic
bitops were unnecessary.

This also includes minor change.
- Use bitmap_copy() for shorter typing
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5a04cca6

kexec: add further check to crashkernel · eaa3be6a

由 Zhenzhong Duan 提交于 3月 28, 2012

When using crashkernel=2M-256M, the kernel doesn't give any warning.  This
is misleading sometimes.
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eaa3be6a

kexec: crash: don't save swapper_pg_dir for !CONFIG_MMU configurations · d034cfab

由 Will Deacon 提交于 3月 28, 2012

nommu platforms don't have very interesting swapper_pg_dir pointers and
usually just #define them to NULL, meaning that we can't include them in
the vmcoreinfo on the kexec crash path.

This patch only saves the swapper_pg_dir if we have an MMU.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Reviewed-by: NSimon Horman <horms@verge.net.au>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d034cfab