提交 · c745552a82cbf3a82adea5210212ed31ec03388d · openeuler / Kernel

22 1月, 2011 4 次提交

L
Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6 · c745552a
由 Linus Torvalds 提交于 1月 21, 2011
```
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
  powerpc/83xx: fix build failures on dt compatible list.
```
c745552a

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · d41ad6df

由 Linus Torvalds 提交于 1月 21, 2011

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (34 commits)
  powerpc/mpic: Fix mask/unmask timeout message
  powerpc/pseries: Add BNX2=m to defconfig
  powerpc: Enable 64kB pages and 1024 threads in pseries config
  powerpc: Disable mcount tracers in pseries defconfig
  powerpc/boot/dts: Install dts from the right directory
  powerpc: machine_check_generic is wrong on 64bit
  powerpc: Check RTAS extended log flag before checking length
  powerpc: Fix corruption when grabbing FWNMI data
  powerpc: Rework pseries machine check handler
  powerpc: Don't silently handle machine checks from userspace
  powerpc: Remove duplicate debugger hook in machine_check_exception
  powerpc: Never halt RTAS error logging after receiving an unrecoverable machine check
  powerpc: Don't force MSR_RI in machine_check_exception
  powerpc: Print 32 bits of DSISR in show_regs
  powerpc/kdump: Disable ftrace during kexec
  powerpc/kdump: Move crash_kexec_stop_spus to kdump crash handler
  powerpc/kexec: Remove empty ppc_md.machine_kexec_prepare
  powerpc/kexec: Don't initialise kexec hooks to default handlers
  powerpc/kdump: Remove ppc_md.machine_crash_shutdown
  powerpc/kexec: Remove ppc_md.machine_kexec
  ...

d41ad6df

mm: System without MMU do not need pte_mkwrite · 3dece370

由 Michal Simek 提交于 1月 21, 2011

The patch "thp: export maybe_mkwrite" (commit 14fd403f) breaks
systems without MMU.

Error log:

    CC      arch/microblaze/mm/init.o
  In file included from include/linux/mman.h:14,
                   from arch/microblaze/mm/consistent.c:24:
  include/linux/mm.h: In function 'maybe_mkwrite':
  include/linux/mm.h:482: error: implicit declaration of function 'pte_mkwrite'
  include/linux/mm.h:482: error: incompatible types in assignment
Signed-off-by: NMichal Simek <monstr@monstr.eu>
CC: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: NRik van Riel <riel@redhat.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dece370

MAINTAINERS: Update Roland Dreier's email address · db9fd848

由 Roland Dreier 提交于 1月 20, 2011

The cisco.com address will stop working soon, and besides no one can
remember the second "d" in "rolandd" or how to spell "rdreier."
Signed-off-by: NRoland Dreier <rolandd@cisco.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db9fd848

21 1月, 2011 36 次提交

L
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 · 4843456c
由 Linus Torvalds 提交于 1月 21, 2011
```
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
  quota: Fix deadlock during path resolution
```
4843456c

powerpc/mpic: Fix mask/unmask timeout message · 8bfc5e36

由 Scott Wood 提交于 1月 17, 2011

Don't say that enable timed out when it was disable, and
show which IRQ had the problem.
Signed-off-by: NScott Wood <scottwood@freescale.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8bfc5e36

powerpc/pseries: Add BNX2=m to defconfig · cb046de7

由 Nishanth Aravamudan 提交于 1月 13, 2011

Upcoming servers will include a Broadcom NIC, add to the defconfig to
increase testing coverage and make sure mainline builds come up with
networking.
Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cb046de7

powerpc: Enable 64kB pages and 1024 threads in pseries config · ef4f7c2d

由 Anton Blanchard 提交于 1月 12, 2011

- Enable 64kB pages so it gets some regular testing.

- The largest POWER7 has 1024 threads so bump NR_CPUS it to match.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ef4f7c2d

powerpc: Disable mcount tracers in pseries defconfig · 7d2dcd04

由 Anton Blanchard 提交于 1月 12, 2011

IRQSOFF_TRACER and STACK_TRACER force the kernel to be built with -pg
which is a substantial overhead.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7d2dcd04

powerpc/boot/dts: Install dts from the right directory · 4d9ef89d

由 Ben Hutchings 提交于 1月 08, 2011

The dts-installed variable is initialised using a wildcard path that
will be expanded relative to the build directory.  Use the existing
variable dtstree to generate an absolute wildcard path that will work
when building in a separate directory.
Reported-by: NGerhard Pircher <gerhard_pircher@gmx.net>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Tested-by: Gerhard Pircher <gerhard_pircher@gmx.net> [against 2.6.32]
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4d9ef89d

powerpc: machine_check_generic is wrong on 64bit · fbe754ca

由 Anton Blanchard 提交于 1月 11, 2011

Decoding machine checks is CPU specific and so machine_check_generic doesn't
do the right thing on 64bit chips. Luckily we never call into this code
because we call ppc_md.machine_check_exception instead if available.

Since we check cur_cpu_spec->machine_check before calling it, we may as
well remove machine_check_generic from 64bit archs.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fbe754ca

powerpc: Check RTAS extended log flag before checking length · 7f32c9c6

由 Anton Blanchard 提交于 1月 11, 2011

The spec suggests we should first check the extended log flag before checking
the length field.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7f32c9c6

powerpc: Fix corruption when grabbing FWNMI data · d368514c

由 Anton Blanchard 提交于 1月 11, 2011

The FWNMI code uses a global buffer without any locks to read the RTAS error
information. If two CPUs take a machine check at once then we will corrupt
this buffer.

Since most FWNMI rtas messages are not of the extended type, we can create a
64bit percpu buffer and use it where possible. If we do receive an extended
RTAS log then we fall back to the old behaviour of using the global buffer.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d368514c

powerpc: Rework pseries machine check handler · d47d1d8a

由 Anton Blanchard 提交于 1月 11, 2011

Rework pseries machine check handler:

- If MSR_RI isn't set, we cannot recover even if the machine check was fully
  recovered

- Rename nonfatal to recovered

- Handle RTAS_DISP_LIMITED_RECOVERY

- Use BUS_MCEERR_AR instead of BUS_ADRERR

- Don't check all the RTAS error log fields when receiving a synchronous
  machine check. Recent versions of the pseries firmware do not fill them
  in during a machine check and instead send a follow up error log with
  the detailed information. If we see a synchronous machine check, and we
  came from userspace then kill the task.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d47d1d8a

powerpc: Don't silently handle machine checks from userspace · e49b1fae

由 Anton Blanchard 提交于 1月 11, 2011

If a machine check comes from userspace we send a SIGBUS to the task and
fail to printk anything.

If we are taking machine checks due to bad hardware we want to know about
it right away. Furthermore if we don't complain loudly then it will look
a lot like a bug in the userspace application, potentially causing a lot
of confusion.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e49b1fae

powerpc: Remove duplicate debugger hook in machine_check_exception · dfb5509f

由 Anton Blanchard 提交于 1月 11, 2011

We are calling debugger_fault_handler twice in machine_check_exception.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

dfb5509f

powerpc: Never halt RTAS error logging after receiving an unrecoverable machine check · 3f9793e6

由 Anton Blanchard 提交于 1月 11, 2011

Newer versions of the System p firwmare send a partial RTAS error log in the
machine check handler with a more detailed response appearing sometime later
via check event.

This means at machine check time we do not have enough information to
ascertain exactly what went on. Furthermore, I have found the RTAS error
logs in the machine check handler contain no useful information, so halting on
them makes little sense. If we want to halt it would make more sense to do
it following the error log received sometime later via check event.

In light of this, never halt the error log in the pseries machine
check handler.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3f9793e6

powerpc: Don't force MSR_RI in machine_check_exception · a443506b

由 Anton Blanchard 提交于 1月 11, 2011

We should never force MSR_RI on. If we take a machine check with MSR_RI off
then we have no chance of recovering safely.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a443506b

powerpc: Print 32 bits of DSISR in show_regs · 7071854b

由 Anton Blanchard 提交于 1月 11, 2011

We were printing 64 bits of DSISR in show_regs even though it is 32 bit.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7071854b

powerpc/kdump: Disable ftrace during kexec · ac4414e4

由 Anton Blanchard 提交于 1月 06, 2011

We should disable ftrace during kexec, some of the tracers are very invasive
and we do not want them going off while doing the low level work of swapping
one kernel out for another. This mirrors what we do on x86.

Even though we cannot return from a kexec on powerpc (since we do not implement
CONFIG_KEXEC_JUMP), add the restore code in case we do one day.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

ac4414e4

powerpc/kdump: Move crash_kexec_stop_spus to kdump crash handler · 158d5b5e

由 Anton Blanchard 提交于 1月 21, 2011

Use the crash handler hooks to run the SPU stop code, just like we do for
ehea and cell RAS code.

While I'm here I noticed "CPUSs reliabally"

so fix the spelling MISTAKESs reliabally.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

158d5b5e

powerpc/kexec: Remove empty ppc_md.machine_kexec_prepare · c6baabfb

由 Anton Blanchard 提交于 1月 06, 2011

We check for a valid handler before calling ppc_md.machine_kexec_prepare
so we can just remove these empty handlers.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c6baabfb

powerpc/kexec: Don't initialise kexec hooks to default handlers · 2bb44d62

由 Anton Blanchard 提交于 1月 06, 2011

There's no need to initialise ppc_md.machine_kexec and
ppc_md.machine_kexec_prepare to the default handlers.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

2bb44d62

powerpc/kdump: Remove ppc_md.machine_crash_shutdown · c1f784e5

由 Anton Blanchard 提交于 1月 06, 2011

No one uses ppc_md.machine_crash_shutdown, so remove it.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c1f784e5

powerpc/kexec: Remove ppc_md.machine_kexec · c9486878

由 Anton Blanchard 提交于 1月 06, 2011

No one uses ppc_md.machine_kexec, so remove it.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c9486878

powerpc/kexec: Remove ppc_md.machine_kexec_cleanup · 619b2677

由 Anton Blanchard 提交于 1月 06, 2011

No one uses ppc_md.machine_kexec_cleanup, so remove it.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

619b2677

powerpc/kexec: Move all ppc_md kexec function pointers together · 50266a1f

由 Anton Blanchard 提交于 1月 06, 2011

Move all the kexec handlers together.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

50266a1f

powerpc/cell: Use system_wq in cpufreq_spudemand · b18ae08d

由 Tejun Heo 提交于 1月 03, 2011

With cmwq, there's no reason to use a separate workqueue in
cpufreq_spudemand.  Use system_wq instead.  The work items are already
sync canceled on stop, so it's already guaranteed that no work is
running when spu_gov_exit() is entered.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Dave Jones <davej@redhat.com>
Cc: cpufreq@vger.kernel.org
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b18ae08d

powerpc/macintosh: Fix wrong test in fan_{read,write}_reg() · f32be0c5

由 roel kluin 提交于 12月 31, 2010

Fix error test in fan_{read,write}_reg()
Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f32be0c5

powerpc/rtas_flash: Use simple_read_from_buffer · 4c4a5cf6

由 Akinobu Mita 提交于 12月 24, 2010

Simplify read file operation for /proc/powerpc/rtas/* interface
by using simple_read_from_buffer.
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4c4a5cf6

powerpc/spufs: Use simple_write_to_buffer · 63c3b9d7

由 Akinobu Mita 提交于 12月 24, 2010

Simplify several write fileoperations for spufs by using
simple_write_to_buffer().
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

63c3b9d7

powerpc/ppc32/tracing: Add stack frame to calls of trace_hardirqs_on/off · 06ca2188

由 Steven Rostedt 提交于 12月 22, 2010

32-bit variant of the previous patch for 64-bit:

<<
    When an interrupt occurs in userspace, we can call trace_hardirqs_on/off()
    With one level stack. But if we have irqsoff tracing enabled,
    it checks both CALLER_ADDR0 and CALLER_ADDR1. The second call
    goes two stack frames up. If this is from user space, then there may
    not exist a second stack....
>>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

06ca2188

powerpc/ppc64/tracing: Add stack frame to calls of trace_hardirqs_on/off · 3cb5f1a3

由 Steven Rostedt 提交于 12月 23, 2010

    When an interrupt occurs in userspace, we can call trace_hardirqs_on/off()
    With one level stack. But if we have irqsoff tracing enabled,
    it checks both CALLER_ADDR0 and CALLER_ADDR1. The second call
    goes two stack frames up. If this is from user space, then there may
    not exist a second stack.

    Add a second stack when calling trace_hardirqs_on/off() otherwise
    the following oops might occur:

    Oops: Kernel access of bad area, sig: 11 [#1]
    PREEMPT SMP NR_CPUS=2 PA Semi PWRficient
    last sysfs file: /sys/block/sda/size
    Modules linked in: ohci_hcd ehci_hcd usbcore
    NIP: c0000000000e1c00 LR: c0000000000034d4 CTR: 000000011012c440
    REGS: c00000003e2f3af0 TRAP: 0300   Not tainted  (2.6.37-rc6+)
    MSR: 9000000000001032 <ME,IR,DR>  CR: 48044444  XER: 20000000
    DAR: 00000001ffb9db50, DSISR: 0000000040000000
    TASK = c00000003e1a00a0[2088] 'emacs' THREAD: c00000003e2f0000 CPU: 1
    GPR00: 0000000000000001 c00000003e2f3d70 c00000000084e0d0 c0000000008816e8
    GPR04: 000000001034c678 000000001032e8f9 0000000010336540 0000000040020000
    GPR08: 0000000040020000 00000001ffb9db40 c00000003e2f3e30 0000000060000000
    GPR12: 100000000000f032 c00000000fff0280 000000001032e8c9 0000000000000008
    GPR16: 00000000105be9c0 00000000105be950 00000000105be9b0 00000000105be950
    GPR20: 00000000ffb9dc50 00000000ffb9dbf0 00000000102f0000 00000000102f0000
    GPR24: 00000000102e0000 00000000102f0000 0000000010336540 c0000000009ded38
    GPR28: 00000000102e0000 c0000000000034d4 c0000000007ccb10 c00000003e2f3d70
    NIP [c0000000000e1c00] .trace_hardirqs_off+0xb0/0x1d0
    LR [c0000000000034d4] decrementer_common+0xd4/0x100
    Call Trace:
    [c00000003e2f3d70] [c00000003e2f3e30] 0xc00000003e2f3e30 (unreliable)
    [c00000003e2f3e30] [c0000000000034d4] decrementer_common+0xd4/0x100
    Instruction dump:
    81690000 7f8b0000 419e0018 f84a0028 60000000 60000000 60000000 e95f0000
    80030000 e92a0000 eb6301f8 2f800000 <eb890010> 41fe00dc a06d000a eb1e8050
    ---[ end trace 4ec7fd2be9240928 ]---
Reported-by: NJoerg Sommer <joerg@alea.gnuu.de>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3cb5f1a3

powerpc: Ensure the else case of feature sections will fit · c0337288

由 Michael Ellerman 提交于 11月 07, 2010

When we create an alternative feature section, the else case must be the
same size or smaller than the body. This is because when we patch the
else case in we just overwrite the body, so there must be room.

Up to now we just did this by inspection, but it's quite easy to enforce
it in the assembler, so we should.

The only change is to add the ifgt block, but that effects the alignment
of the tabs and so the whole macro is modified.

Also add a test, but #if 0 it because we don't want to break the build.
Anyone who's modifying the feature macros should enable the test.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c0337288

Merge branch 'core-fixes-for-linus' of... · 2b1caf6e

由 Linus Torvalds 提交于 1月 20, 2011

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  smp: Allow on_each_cpu() to be called while early_boot_irqs_disabled status to init/main.c
  lockdep: Move early boot local IRQ enable/disable status to init/main.c

2b1caf6e

ACPI / PM: Call suspend_nvs_free() earlier during resume · d551d81d

由 Rafael J. Wysocki 提交于 1月 19, 2011

It turns out that some device drivers map pages from the ACPI NVS region
during resume using ioremap(), which conflicts with ioremap_cache() used
for mapping those pages by the NVS save/restore code in nvs.c.

Make the NVS pages mapped by the code in nvs.c be unmapped before device
drivers' resume routines run.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d551d81d

ACPI: Introduce acpi_os_ioremap() · 2d6d9fd3

由 Rafael J. Wysocki 提交于 1月 19, 2011

Commit ca9b600b ("ACPI / PM: Make suspend_nvs_save() use
acpi_os_map_memory()") attempted to prevent the code in osl.c and nvs.c
from using different ioremap() variants by making the latter use
acpi_os_map_memory() for mapping the NVS pages.  However, that also
requires acpi_os_unmap_memory() to be used for unmapping them, which
causes synchronize_rcu() to be executed many times in a row
unnecessarily and introduces substantial delays during resume on some
systems.

Instead of using acpi_os_map_memory() for mapping the NVS pages in nvs.c
introduce acpi_os_ioremap() calling ioremap_cache() and make the code in
both osl.c and nvs.c use it.
Reported-by: NJeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d6d9fd3

Merge branch 'akpm' · 8d99641f

由 Linus Torvalds 提交于 1月 20, 2011

* akpm:
  kernel/smp.c: consolidate writes in smp_call_function_interrupt()
  kernel/smp.c: fix smp_call_function_many() SMP race
  memcg: correctly order reading PCG_USED and pc->mem_cgroup
  backlight: fix 88pm860x_bl macro collision
  drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking
  MAINTAINERS: update Atmel AT91 entry
  mm: fix truncate_setsize() comment
  memcg: fix rmdir, force_empty with THP
  memcg: fix LRU accounting with THP
  memcg: fix USED bit handling at uncharge in THP
  memcg: modify accounting function for supporting THP better
  fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio
  mm: compaction: prevent division-by-zero during user-requested compaction
  mm/vmscan.c: remove duplicate include of compaction.h
  memblock: fix memblock_is_region_memory()
  thp: keep highpte mapped until it is no longer needed
  kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT

8d99641f

kernel/smp.c: consolidate writes in smp_call_function_interrupt() · 225c8e01

由 Milton Miller 提交于 1月 20, 2011

We have to test the cpu mask in the interrupt handler before checking the
refs, otherwise we can start to follow an entry before its deleted and
find it partially initailzed for the next trip.  Presently we also clear
the cpumask bit before executing the called function, which implies
getting write access to the line.  After the function is called we then
decrement refs, and if they go to zero we then unlock the structure.

However, this implies getting write access to the call function data
before and after another the function is called.  If we can assert that no
smp_call_function execution function is allowed to enable interrupts, then
we can move both writes to after the function is called, hopfully allowing
both writes with one cache line bounce.

On a 256 thread system with a kernel compiled for 1024 threads, the time
to execute testcase in the "smp_call_function_many race" changelog was
reduced by about 30-40ms out of about 545 ms.

I decided to keep this as WARN because its now a buggy function, even
though the stack trace is of no value -- a simple printk would give us the
information needed.

Raw data:

Without patch:
  ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
  ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
  ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
  ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
  ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
  ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
  ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
  ipi_test startup took 21245824ns complete 530280180ns total 551526004ns

With patch:
  ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
  ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
  ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
  ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
  ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
  ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
  ipi_test startup took 6789954ns complete 493388112ns total 500178066ns

	#include <linux/module.h>
	#include <linux/init.h>
	#include <linux/sched.h> /* sched clock */

	#define ITERATIONS 100

	static void do_nothing_ipi(void *dummy)
	{
	}

	static void do_ipis(struct work_struct *dummy)
	{
		int i;

		for (i = 0; i < ITERATIONS; i++)
			smp_call_function(do_nothing_ipi, NULL, 1);

		printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
	}

	static struct work_struct work[NR_CPUS];

	static int __init testcase_init(void)
	{
		int cpu;
		u64 start, started, done;

		start = local_clock();
		for_each_online_cpu(cpu) {
			INIT_WORK(&work[cpu], do_ipis);
			schedule_work_on(cpu, &work[cpu]);
		}
		started = local_clock();
		for_each_online_cpu(cpu)
			flush_work(&work[cpu]);
		done = local_clock();
		pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
			started-start, done-started, done-start);

		return 0;
	}

	static void __exit testcase_exit(void)
	{
	}

	module_init(testcase_init)
	module_exit(testcase_exit)
	MODULE_LICENSE("GPL");
	MODULE_AUTHOR("Anton Blanchard");
Signed-off-by: NMilton Miller <miltonm@bga.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

225c8e01

kernel/smp.c: fix smp_call_function_many() SMP race · 6dc19899

由 Anton Blanchard 提交于 1月 20, 2011

I noticed a failure where we hit the following WARN_ON in
generic_smp_call_function_interrupt:

                if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
                        continue;

                data->csd.func(data->csd.info);

                refs = atomic_dec_return(&data->refs);
                WARN_ON(refs < 0);      <-------------------------

We atomically tested and cleared our bit in the cpumask, and yet the
number of cpus left (ie refs) was 0.  How can this be?

It turns out commit 54fdade1
("generic-ipi: make struct call_function_data lockless") is at fault.  It
removes locking from smp_call_function_many and in doing so creates a
rather complicated race.

The problem comes about because:

 - The smp_call_function_many interrupt handler walks call_function.queue
   without any locking.
 - We reuse a percpu data structure in smp_call_function_many.
 - We do not wait for any RCU grace period before starting the next
   smp_call_function_many.

Imagine a scenario where CPU A does two smp_call_functions back to back,
and CPU B does an smp_call_function in between.  We concentrate on how CPU
C handles the calls:

CPU A            CPU B                  CPU C              CPU D

smp_call_function
                                        smp_call_function_interrupt
                                            walks
					call_function.queue sees
					data from CPU A on list

                 smp_call_function

                                        smp_call_function_interrupt
                                            walks

                                        call_function.queue sees
                                          (stale) CPU A on list
							   smp_call_function int
							   clears last ref on A
							   list_del_rcu, unlock
smp_call_function reuses
percpu *data A
                                         data->cpumask sees and
                                         clears bit in cpumask
                                         might be using old or new fn!
                                         decrements refs below 0

set data->refs (too late!)

The important thing to note is since the interrupt handler walks a
potentially stale call_function.queue without any locking, then another
cpu can view the percpu *data structure at any time, even when the owner
is in the process of initialising it.

The following test case hits the WARN_ON 100% of the time on my PowerPC
box (having 128 threads does help :)

#include <linux/module.h>
#include <linux/init.h>

#define ITERATIONS 100

static void do_nothing_ipi(void *dummy)
{
}

static void do_ipis(struct work_struct *dummy)
{
	int i;

	for (i = 0; i < ITERATIONS; i++)
		smp_call_function(do_nothing_ipi, NULL, 1);

	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
}

static struct work_struct work[NR_CPUS];

static int __init testcase_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		INIT_WORK(&work[cpu], do_ipis);
		schedule_work_on(cpu, &work[cpu]);
	}

	return 0;
}

static void __exit testcase_exit(void)
{
}

module_init(testcase_init)
module_exit(testcase_exit)
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anton Blanchard");

I tried to fix it by ordering the read and the write of ->cpumask and
->refs.  In doing so I missed a critical case but Paul McKenney was able
to spot my bug thankfully :) To ensure we arent viewing previous
iterations the interrupt handler needs to read ->refs then ->cpumask then
->refs _again_.

Thanks to Milton Miller and Paul McKenney for helping to debug this issue.

[miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
[miltonm@bga.com: remove excess tests]
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NMilton Miller <miltonm@bga.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: <stable@kernel.org> [2.6.32+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6dc19899

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功