提交 · 6fd84c0831ec78d98736b76dc5e9b849f1dbfc9e · openanolis / cloud-kernel

02 6月, 2012 8 次提交

A
TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set · 6fd84c08
由 Al Viro 提交于 5月 23, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6fd84c08

don't call try_to_freeze() from do_signal() · bf343dfd

由 Al Viro 提交于 4月 27, 2012

get_signal_to_deliver() will handle it itself
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bf343dfd

A
pull clearing RESTORE_SIGMASK into block_sigmask() · a610d6e6
由 Al Viro 提交于 5月 21, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
a610d6e6
A
sh64: failure to build sigframe != signal without handler · 5754f412
由 Al Viro 提交于 4月 26, 2012
```
it's actually "send me SIGSEGV"...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
5754f412
A
openrisc: tracehook_signal_handler() is supposed to be called on success · 39974d08
由 Al Viro 提交于 4月 26, 2012
```
... not if sigframe couldn't have been built.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
39974d08

new helper: sigmask_to_save() · b7f9a11a

由 Al Viro 提交于 5月 02, 2012

replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b7f9a11a

new helper: restore_saved_sigmask() · 51a7b448

由 Al Viro 提交于 5月 21, 2012

first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper.  Open-coded instances switched...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

51a7b448

new helpers: {clear,test,test_and_clear}_restore_sigmask() · 4ebefe3e

由 Al Viro 提交于 4月 26, 2012

helpers parallel to set_restore_sigmask(), used in the next commits
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4ebefe3e

01 6月, 2012 10 次提交

syscalls, x86: add __NR_kcmp syscall · d97b46a6

由 Cyrill Gorcunov 提交于 5月 31, 2012

While doing the checkpoint-restore in the user space one need to determine
whether various kernel objects (like mm_struct-s of file_struct-s) are
shared between tasks and restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the
unshare syscall, while there's currently no ways for solving the 1st one.

One of the ways for checking whether two tasks share e.g.  mm_struct is to
provide some mm_struct ID of a task to its proc file, but showing such
info considered to be not that good for security reasons.

Thus after some debates we end up in conclusion that using that named
'comparison' syscall might be the best candidate.  So here is it --
__NR_kcmp.

It takes up to 5 arguments - the pids of the two tasks (which
characteristics should be compared), the comparison type and (in case of
comparison of files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At moment only x86 is supported and tested.

[akpm@linux-foundation.org: fix up selftests, warnings]
[akpm@linux-foundation.org: include errno.h]
[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: Michal Marek <mmarek@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d97b46a6

um: properly check all process' threads for a live mm · 2c922c51

由 Anton Vorontsov 提交于 5月 31, 2012

kill_off_processes() might miss a valid process, this is because checking
for process->mm is not enough.  Process' main thread may exit or detach
its mm via use_mm(), but other threads may still have a valid mm.

To catch this we use find_lock_task_mm(), which walks up all threads and
returns an appropriate task (with task lock held).
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2c922c51

um: fix possible race on task->mm · 137d1a26

由 Anton Vorontsov 提交于 5月 31, 2012

Checking for task->mm is dangerous as ->mm might disappear (exit_mm()
assigns NULL under task_lock(), so tasklist lock is not enough).

We can't use get_task_mm()/mmput() pair as mmput() might sleep, so let's
take the task lock while we care about its mm.

Note that we should also use find_lock_task_mm() to check all process'
threads for a valid mm, but for uml we'll do it in a separate patch.
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

137d1a26

um: should hold tasklist_lock while traversing processes · 9bd0a077

由 Anton Vorontsov 提交于 5月 31, 2012

Traversing the tasks requires holding tasklist_lock, otherwise it is
unsafe.

p.s.  However, I'm not sure that calling os_kill_ptraced_process() in the
atomic context is correct.  It seem to work, but please take a closer
look.
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bd0a077

blackfin: fix possible deadlock in decode_address() · af1be5a5

由 Anton Vorontsov 提交于 5月 31, 2012

Oleg Nesterov found an interesting deadlock possibility:

> sysrq_showregs_othercpus() does smp_call_function(showacpu)
> and showacpu() show_stack()->decode_address(). Now suppose that IPI
> interrupts the task holding read_lock(tasklist).

To fix this, blackfin should not grab the write_ variant of the
tasklist lock, read_ one is enough.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af1be5a5

blackfin: a couple of task->mm handling fixes · 2214f707

由 Anton Vorontsov 提交于 5月 31, 2012

The patch fixes two problems:

1. Working with task->mm w/o getting mm or grabing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep,
   so we have to take the task lock while handle its mm.

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

   To catch this we use find_lock_task_mm(), which walks up all
   threads and returns an appropriate task (with task lock held).
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2214f707

sh: use clear_tasks_mm_cpumask() · 1198c8b9

由 Anton Vorontsov 提交于 5月 31, 2012

Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.

To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and returns an appropriate task (with task lock held).

clear_tasks_mm_cpumask() has the issue fixed, so let's use it.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1198c8b9

powerpc: use clear_tasks_mm_cpumask() · 73863ab0

由 Anton Vorontsov 提交于 5月 31, 2012

Current CPU hotplug code has some task->mm handling issues:

1. Working with task->mm w/o getting mm or grabing the task lock is
   dangerous as ->mm might disappear (exit_mm() assigns NULL under
   task_lock(), so tasklist lock is not enough).

   We can't use get_task_mm()/mmput() pair as mmput() might sleep,
   so we must take the task lock while handle its mm.

2. Checking for process->mm is not enough because process' main
   thread may exit or detach its mm via use_mm(), but other threads
   may still have a valid mm.

   To fix this we would need to use find_lock_task_mm(), which would
   walk up all threads and returns an appropriate task (with task
   lock held).

clear_tasks_mm_cpumask() has all the issues fixed, so let's use it.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73863ab0

arm: use clear_tasks_mm_cpumask() · 3eaa73bd

由 Anton Vorontsov 提交于 5月 31, 2012

Checking for process->mm is not enough because process' main thread may
exit or detach its mm via use_mm(), but other threads may still have a
valid mm.

To fix this we would need to use find_lock_task_mm(), which would walk up
all threads and returns an appropriate task (with task lock held).

clear_tasks_mm_cpumask() has this issue fixed, so let's use it.
Suggested-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3eaa73bd

um/kernel/trap.c: port OOM changes to handle_page_fault() · 1cefe28f

由 Kautuk Consul 提交于 5月 31, 2012

Commit d065bd81 ("mm: retry page fault when blocking on disk
transfer") and commit 37b23e05 ("x86,mm: make pagefault killable")
introduced changes into the x86 pagefault handler for making the page
fault handler retryable as well as killable.

These changes reduce the mmap_sem hold time, which is crucial during OOM
killer invocation.

Port these changes to um.
Signed-off-by: NKautuk Consul <consul.kautuk@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1cefe28f

31 5月, 2012 6 次提交

J
[PARISC] update parisc to use generic strncpy_from_user() · b1195c0e
由 James Bottomley 提交于 5月 26, 2012
```
Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
```
b1195c0e

ARM: LPC32xx: Adjust dts files to gpio dt binding · a035254a

由 Roland Stigge 提交于 5月 19, 2012

The GPIO devicetree binding in 3.5 doesn't register the various LPC32xx GPIO
banks via DT subnodes but always all at once, and changes the gpio referencing
to 3 cells (bank, gpio, flags). This patch adjusts the DTS files to this
binding that was just accepted to the gpio subsystem.
Signed-off-by: NRoland Stigge <stigge@antcom.de>
Signed-off-by: NOlof Johansson <olof@lixom.net>

a035254a

x86, amd, xen: Avoid NULL pointer paravirt references · 1ab46fd3

由 Konrad Rzeszutek Wilk 提交于 5月 30, 2012

Stub out MSR methods that aren't actually needed.  This fixes a crash
as Xen Dom0 on AMD Trinity systems.  A bigger patch should be added to
remove the paravirt machinery completely for the methods which
apparently have no users!
Reported-by: NAndre Przywara <andre.przywara@amd.com>
Link: http://lkml.kernel.org/r/20120530222356.GA28417@andromeda.dapyr.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>

1ab46fd3

x86/mce: Cleanup timer mess · 82f7af09

由 Thomas Gleixner 提交于 5月 24, 2012

Use unsigned long for dealing with jiffies not int. Rename the
callback to something sensible. Use __this_cpu_read/write for
accessing per cpu data.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>

82f7af09

x86, mtrr: Fix a type overflow in range_to_mtrr func · 2da06af8

由 zhenzhong.duan 提交于 5月 30, 2012

When boot on sun G5+ with 4T mem, see an overflow in mtrr cleanup as below.

*BAD*gran_size: 2G      chunk_size: 2G  num_reg: 10     lose cover RAM:
-18014398505283592M

This is because 1<<31 sign extended. Use an unsigned long constant to
fix it.  Useful for mem larger than or equal to 4T.

-v2: Use 64bit constant instead of explicit type conversion as suggested
by Yinghai. Description updated too.
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
Link: http://lkml.kernel.org/r/4FC5A77F.6060505@oracle.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

2da06af8

x86, realmode: Unbreak the ia64 build of drivers/acpi/sleep.c · 319b6ffc

由 H. Peter Anvin 提交于 5月 30, 2012

Revert usage of acpi_wakeup_address and move definition
to x86 architecture code in order to make compilation work
in ia64.

[jsakkine: tested compilation in ia64/x86-64 and added
proper commit message]
Reported-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
Originally-by: NH. Peter Anvin <hpa@kernel.org>
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@intel.com>
Link: http://lkml.kernel.org/r/1338370421-27735-1-git-send-email-jarkko.sakkinen@intel.com
Cc: Tony Luck <tony.luck@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

319b6ffc

30 5月, 2012 16 次提交

x86/mm/pat: Improve scaling of pat_pagerange_is_ram() · fa83523f

由 John Dykstra 提交于 5月 25, 2012

Function pat_pagerange_is_ram() scales poorly to large address
ranges, because it probes the resource tree for each page.

On a 2.6 GHz Opteron, this function consumes 34 ms for a 1 GB range.

It is called twice during untrack_pfn_vma(), slowing process
cleanup and handicapping the OOM killer.

This replacement consumes less than 1ms, under the same conditions.

Signed-off-by: John Dykstra <jdykstra@cray.com> on behalf of Cray Inc.
Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337980366.1979.6.camel@redwood
[ Small stylistic cleanups and renames ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

fa83523f

s390/uaccess: fix access_ok compile warnings · 491af990

由 Heiko Carstens 提交于 5月 29, 2012

On s390 access_ok is a macro which discards all parameters and always
returns 1. This can result in compile warnings which warn about unused
variables like this:

fs/read_write.c: In function 'rw_copy_check_uvector':
fs/read_write.c:684:16: warning: unused variable 'buf' [-Wunused-variable]

Fix this by adding a __range_ok() function which consumes all parameters
but still always returns 1.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

491af990

s390/cmpxchg: select HAVE_CMPXCHG_LOCAL option · 2e30db95

由 Heiko Carstens 提交于 5月 29, 2012

Now that hopefully all cmpxchg/xchg bugs have been fixed select
HAVE_CMPXCHG_LOCAL option which uncovered a couple of bugs on s390.

The only call site which is affected seems to be within mm/vmstat.c.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

2e30db95

s390/cmpxchg: fix sign extension bugs · 1896d256

由 Heiko Carstens 提交于 5月 29, 2012

For 1 and 2 byte operands for xchg and cmpxchg the old and new values
get or'ed into the larger 4 byte old value before the compare and swap
instruction gets executed. This is done without using the proper byte
mask before or'ing the values.
If the caller passed in negative old or new values these got sign
extended by the caller. Which in turn means that either the old value
never matches, or, even worse, unrelated bytes would be changed in memory.

Luckily there don't seem to be any callers around yet, since that would
have resulted in the specification exception fixed in an earlies patch.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1896d256

s390/cmpxchg: fix 1 and 2 byte memory accesses · bf3db853

由 Heiko Carstens 提交于 5月 29, 2012

When accessing a 1 or 2 byte memory operand we cannot use the
passed address since the compare and swap instruction only works
for 4 byte aligned memory operands.
Hence we calculate an aligned address so that compare and swap works
correctly. However we don't pass the calculated address to the inline
assembly. This results in incorrect memory accesses and in a
specification exception if used on non 4 byte aligned memory operands.

Since this didn't happen until now, there don't seem to be
too many users of cmpxchg on unaligned addresses.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

bf3db853

s390/cmpxchg: fix compile warnings specific to s390 · 6b894a40

由 Heiko Carstens 提交于 5月 29, 2012

The cmpxchg macros and functions are a bit different than on other
architectures. In particular the macros do not store the return
value of a __cmpxchg function call in a variable before returning the
value.

This causes compile warnings that only occur on s390 like this one:

net/ipv4/af_inet.c: In function 'build_ehash_secret':
net/ipv4/af_inet.c:241:2: warning: value computed is not used [-Wunused-value]

To get rid of these warnings use the same construct that we already use
for the xchg macro, which was introduced for the same reason.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

6b894a40

s390/cmpxchg: add missing memory barrier to cmpxchg64 · 0c44ca71

由 Heiko Carstens 提交于 5月 29, 2012

All cmpxchg functions imply a memory barrier.
cmpxch64 did not have one for 31 bit code, so add it.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0c44ca71

s390/cpu: remove cpu "capabilities" sysfs attribute · b9e3f776

由 Heiko Carstens 提交于 5月 25, 2012

It has been a big mistage to add the capabilities attribute to the
cpus in sysfs:
First the attribute only contains the cpu capability of primary cpus,
which however is not necessarily (or better: unlikely) the type of
cpu the kernel runs on, which is typically an IFL.
In addition all information that is necessary is available in
/proc/sysinfo already. So this attribute partially duplicated
informations.
So programs should look into the sysinfo file to retrieve all
informations they are interested in.

Since with this kernel release also the powersavings cpu attributes
are removed this seems to be a good opportunity to remove another
broken interface.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b9e3f776

s390/kernel: Fix smp_call_ipl_cpu() for offline CPUs · 061da3df

由 Michael Holzheu 提交于 5月 24, 2012

If the IPL CPU is offline, currently the pcpu_delegate() function
used by smp_call_ipl_cpu() does not work because pcpu_delegate()
modifies the lowcore of the target CPU. In case of an offline
IPL CPU currently the prefix register is zero but pcpu->lowcore
still points to the old prefix page. Therefore the lowcore changes
done by pcpu_delegate() have no effect.

With this fix pcpu_delegate() now uses memcpy_absolute() and therefore
also prepares the absolute zero lowcore if the target CPU has prefix
register zero.
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

061da3df

s390/kernel: Introduce memcpy_absolute() function · 73bf463e

由 Michael Holzheu 提交于 5月 24, 2012

This patch introduces the new function memcpy_absolute() that allows to
copy memory using absolute addressing. This means that the prefix swap
does not apply when this function is used.

With this patch also all s390 kernel code that accesses absolute zero
now uses the new memcpy_absolute() function. The old and less generic
copy_to_absolute_zero() function is removed.
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

73bf463e

rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC · 79811595

由 Fabio Estevam 提交于 5月 29, 2012

In order to keep consistency with other rtc drivers,rename CONFIG_RTC_MXC
to CONFIG_RTC_DRV_MXC.
Signed-off-by: NFabio Estevam <fabio.estevam@freescale.com>
Acked-by: NWolfram Sang <w.sang@pengutronix.de>
Cc: Alessandro Zummo <a.zummo@towertech.it>
[akpm@linux-foundation.org: fix missed arch/arm/configs/imx_v6_v7_defconfig]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79811595

mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 26c19178

由 Andrea Arcangeli 提交于 5月 29, 2012

When holding the mmap_sem for reading, pmd_offset_map_lock should only
run on a pmd_t that has been read atomically from the pmdp pointer,
otherwise we may read only half of it leading to this crash.

PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
 #0 [f06a9dd8] crash_kexec at c049b5ec
 #1 [f06a9e2c] oops_end at c083d1c2
 #2 [f06a9e40] no_context at c0433ded
 #3 [f06a9e64] bad_area_nosemaphore at c043401a
 #4 [f06a9e6c] __do_page_fault at c0434493
 #5 [f06a9eec] do_page_fault at c083eb45
 #6 [f06a9f04] error_code (via page_fault) at c083c5d5
    EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
    00000000
    DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
    CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
 #7 [f06a9f38] _spin_lock at c083bc14
 #8 [f06a9f44] sys_mincore at c0507b7d
 #9 [f06a9fb0] system_call at c083becd
                         start           len
    EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
    DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
    SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
    CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286

This should be a longstanding bug affecting x86 32bit PAE without THP.
Only archs with 64bit large pmd_t and 32bit unsigned long should be
affected.

With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
would partly hide the bug when the pmd transition from none to stable,
by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
enabled a new set of problem arises by the fact could then transition
freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
unconditional isn't good idea and it would be a flakey solution.

This should be fully fixed by introducing a pmd_read_atomic that reads
the pmd in order with THP disabled, or by reading the pmd atomically
with cmpxchg8b with THP enabled.

Luckily this new race condition only triggers in the places that must
already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
is localized there but this bug is not related to THP.

NOTE: this can trigger on x86 32bit systems with PAE enabled with more
than 4G of ram, otherwise the high part of the pmd will never risk to be
truncated because it would be zero at all times, in turn so hiding the
SMP race.

This bug was discovered and fully debugged by Ulrich, quote:

----
[..]
pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
eax.

    496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
    *pmd)
    497 {
    498         /* depend on compiler for an atomic pmd read */
    499         pmd_t pmdval = *pmd;

                                // edi = pmd pointer
0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
...
                                // edx = PTE page table high address
0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
...
                                // eax = PTE page table low address
0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax

[..]

Please note that the PMD is not read atomically. These are two "mov"
instructions where the high order bits of the PMD entry are fetched
first. Hence, the above machine code is prone to the following race.

-  The PMD entry {high|low} is 0x0000000000000000.
   The "mov" at 0xc0507a84 loads 0x00000000 into edx.

-  A page fault (on another CPU) sneaks in between the two "mov"
   instructions and instantiates the PMD.

-  The PMD entry {high|low} is now 0x00000003fda38067.
   The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
----
Reported-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

26c19178

x86: print physical addresses consistently with other parts of kernel · 365811d6

由 Bjorn Helgaas 提交于 5月 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

365811d6

x86: print e820 physical addresses consistently with other parts of kernel · 91eb0f67

由 Bjorn Helgaas 提交于 5月 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -BIOS-provided physical RAM map:
    +e820: BIOS-provided physical RAM map:
    - BIOS-e820: 0000000000000100 - 000000000009e000 (usable)
    +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable
    -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
    +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
    -reserve RAM buffer: 000000000009e000 - 000000000009ffff
    +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

91eb0f67

cris: select GENERIC_ATOMIC64 · 4c9c6a1b

由 Cong Wang 提交于 5月 29, 2012

Cris doesn't implement atomic64 operations neither, should select
GENERIC_ATOMIC64.
Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c9c6a1b

sparc: fix sparc64 build due to leon.h inclusion · e49e6ff5

由 Sam Ravnborg 提交于 5月 29, 2012

Stephen Rothwell <sfr@canb.auug.org.au> reported following error:
In file included from arch/sparc/kernel/prom_common.c:26:0:
arch/sparc/include/asm/leon.h:221:9: error: unknown type name 'irq_flow_handler_t'
arch/sparc/include/asm/leon.h:224:10: error: unknown type name 'irq_flow_handler_t'

Fix this by:
1) Avoid including leon.h in prom_commen.h (not needed)
2) Include irq.h in leon.h to avoid the missing symbol error
Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e49e6ff5

openanolis / cloud-kernel 11 个月 前同步成功

openanolis / cloud-kernel
11 个月前同步成功