Commits · 1ea2950884aa320c46315c8ddf62717c6ecf78d0 · openeuler / raspberrypi-kernel

Aug 12, 2008 · 23 commits

Merge branch 'sched-fixes-for-linus' of... · 1ea29508

Committed by Linus Torvalds on Aug 11, 2008

Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks
  sched: fix mysql+oltp regression
  sched_clock: delay using sched_clock()
  sched clock: couple local and remote clocks
  sched clock: simplify __update_sched_clock()
  sched: eliminate scd->prev_raw
  sched clock: clean up sched_clock_cpu()
  sched clock: revert various sched_clock() changes
  sched: move sched_clock before first use
  sched: test runtime rather than period in global_rt_runtime()
  sched: fix SCHED_HRTICK dependency
  sched: fix warning in hrtick_start_fair()

1ea29508

Merge branch 'timers-fixes-for-linus' of... · 67a077dc

Committed by Linus Torvalds on Aug 11, 2008

Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  posix-timers: fix posix_timer_event() vs dequeue_signal() race
  posix-timers: do_schedule_next_timer: fix the setting of ->si_overrun

67a077dc

Merge branch 'core-fixes-for-linus' of... · 9b4d0bab

Committed by Linus Torvalds on Aug 11, 2008

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  lockdep: fix debug_lock_alloc
  lockdep: increase MAX_LOCKDEP_KEYS
  generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask()
  lockdep: fix overflow in the hlock shrinkage code
  lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]()
  lockdep: handle chains involving classes defined in modules
  mm: fix mm_take_all_locks() locking order
  lockdep: annotate mm_take_all_locks()
  lockdep: spin_lock_nest_lock()
  lockdep: lock protection locks
  lockdep: map_acquire
  lockdep: shrink held_lock structure
  lockdep: re-annotate scheduler runqueues
  lockdep: lock_set_subclass - reset a held lock's subclass
  lockdep: change scheduler annotation
  debug_locks: set oops_in_progress if we will log messages.
  lockdep: fix combinatorial explosion in lock subgraph traversal

9b4d0bab

Merge branch 'x86-fixes-for-linus' of... · 7019b1b5

Committed by Linus Torvalds on Aug 11, 2008

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix 2.6.27rc1 cannot boot more than 8CPUs
  x86: make "apic" an early_param() on 32-bit, NULL check
  EFI, x86: fix function prototype
  x86, pci-calgary: fix function declaration
  x86: work around gcc 3.4.x bug
  x86: make "apic" an early_param() on 32-bit
  x86, debug: tone down arch/x86/kernel/mpparse.c debugging printk
  x86_64: restore the proper NR_IRQS define so larger systems work.
  x86: Restore proper vector locking during cpu hotplug
  x86: Fix broken VMI in 2.6.27-rc..
  x86: fdiv bug detection fix

7019b1b5

Merge branch 'core/locking' into core/urgent · 23a0ee90
Committed by Ingo Molnar on Aug 12, 2008

23a0ee90

Merge branch 'sched/clock' into sched/urgent · e26b33e9
Committed by Ingo Molnar on Aug 12, 2008

e26b33e9

lockdep: fix debug_lock_alloc · 0f2bc27b

Committed by Peter Zijlstra on Aug 11, 2008

When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
LOCK_STAT, lock_alloc() and lock_release() turn into nops, even though
we should be doing hlock checking (check=1).

This causes a false warning and a lockdep self-disable.

Rectify this.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

0f2bc27b

x86: fix 2.6.27rc1 cannot boot more than 8CPUs · b74548e7

Committed by Yinghai Lu on Aug 11, 2008

Jeff Chua reported that booting a !bigsmp kernel on a 16-way box
hangs silently.

This is a long-standing issue: the SMP code that starts the AP CPUs
could check for conditions such as an APIC ID >= 8 before trying to
start each one.

Achieve this by moving the def_to_bigsmp check later and skipping
CPUs with an APIC ID > 8.

[ mingo@elte.hu: clean up the message that is printed. ]

Reported-by: "Jeff Chua" <jeff.chua.linux@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

 arch/x86/kernel/setup.c   |    6 ------
 arch/x86/kernel/smpboot.c |   10 ++++++++++
 2 files changed, 10 insertions(+), 6 deletions(-)

b74548e7

make struct scsi_dh_devlist's static · f08c0761

Committed by Adrian Bunk on Aug 11, 2008

This patch makes several needlessly global struct scsi_dh_devlist's
static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f08c0761

Merge branch 'for-linus' of git://git.o-hand.com/linux-mfd · 10fec20e

Committed by Linus Torvalds on Aug 11, 2008

* 'for-linus' of git://git.o-hand.com/linux-mfd:
  mfd: tc6393 cleanup and update
  mfd: have TMIO drivers and subdevices depend on ARM
  mfd: TMIO MMC driver
  mfd: driver for the TMIO NAND controller
  mfd: t7l66 MMC platform data
  mfd: tc6387 MMC platform data
  mfd: Fix 7l66 and 6387 according to the new mfd-core API
  mfd: Fix tc6393 according to the new tmio.h
  mfd: driver for the TC6387XB TMIO controller.
  mfd: driver for the T7L66XB TMIO SoC
  mfd: TMIO MMC structures and accessors.

10fec20e

Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6 · 29bb1bdb

Committed by Linus Torvalds on Aug 11, 2008

* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  hwmon: (lm75) Drop legacy i2c driver
  i2c: correct some size_t printk formats
  i2c: Check for address business before creating clients
  i2c: Let users select algorithm drivers manually again
  i2c: Fix NULL pointer dereference in i2c_new_probed_device
  i2c: Fix oops on bus multiplexer driver loading

29bb1bdb

Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · 3f1ae223

Committed by Linus Torvalds on Aug 11, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [WATCHDOG] pcwd.c - fix open_allowed type.
  [WATCHDOG] fix watchdog/ixp4xx_wdt.c compilation
  [WATCHDOG] fix watchdog/wdt285.c compilation
  [WATCHDOG] fix watchdog/at91rm9200_wdt.c compilation
  [WATCHDOG] fix watchdog/shwdt.c compilation
  [WATCHDOG] fix watchdog/txx9wdt.c compilation
  [WATCHDOG] MAINTAINERS: remove ZF MACHZ WATCHDOG entry
  [WATCHDOG] Fix build with CONFIG_ITCO_VENDOR_SUPPORT=n

3f1ae223

x86: make "apic" an early_param() on 32-bit, NULL check · 48d97cb6

Committed by Rene Herman on Aug 11, 2008

Cyrill Gorcunov observed:

> you turned it into early_param so now it's NULL injecting vulnerabled.
> Could you please add checking for NULL str param?

Fix that.

Also, change the name of 'str' into 'arg', to make it more apparent
that this is an optional argument that can be NULL, not a string
parameter that is empty when unset.

Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

48d97cb6
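
The NULL check requested in 48d97cb6 can be illustrated with a small userspace sketch. This is not the kernel's handler: the function name `parse_apic_sketch`, the verbosity enum, and the choice to reject a bare option with -1 are all assumptions made here for illustration. The only point taken from the commit message is that the argument may be NULL and must be checked before it is treated as a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical verbosity levels; the names are illustrative only. */
enum toy_apic_verbosity { TOY_APIC_QUIET, TOY_APIC_VERBOSE, TOY_APIC_DEBUG };

static enum toy_apic_verbosity toy_apic_level = TOY_APIC_QUIET;

/*
 * Sketch of an early-parameter handler. `arg` is NULL when the option is
 * passed without a value and points at the value text otherwise.
 * Returns 0 on success and -1 when the option cannot be interpreted
 * (the -1 convention is an assumption of this sketch).
 */
static int parse_apic_sketch(const char *arg)
{
	/* An optional argument can be absent: never dereference it blindly. */
	if (arg == NULL)
		return -1;

	if (strcmp(arg, "debug") == 0)
		toy_apic_level = TOY_APIC_DEBUG;
	else if (strcmp(arg, "verbose") == 0)
		toy_apic_level = TOY_APIC_VERBOSE;
	else
		return -1;

	return 0;
}
```

Calling `parse_apic_sketch(NULL)` returns an error instead of crashing inside `strcmp()`, which is the failure the report describes.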

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc · e2205a15

Committed by Linus Torvalds on Aug 11, 2008

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  powerpc: Remove include/linux/harrier_defs.h
  powerpc: Do not ignore arch/powerpc/include
  powerpc: Delete completed "ppc removal" task from feature removal file
  powerpc/mm: Fix attribute confusion with htab_bolt_mapping()
  powerpc/pci: Don't keep ISA memory hole resources in the tree
  powerpc: Zero fill the return values of rtas argument buffer
  powerpc/4xx: Update defconfig files for 2.6.27-rc1
  powerpc/44x: Incorrect NOR offset in Warp DTS
  powerpc/44x: Warp DTS changes for board updates
  powerpc/4xx: Cleanup Warp for i2c driver changes.
  powerpc/44x: Adjust warp-nand resource end address

e2205a15

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 · a7ef6a40

Committed by Linus Torvalds on Aug 11, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: Limit VPD length for Broadcom 5708S
  PCI PM: Export pci_pme_active to drivers
  PCI: remove duplicate symbol from pci_ids.h
  PCI: check the return value of device_create_bin_file() in pci_create_bus()
  PCI: fully restore MSI state at resume time
  DMA: make dma-coherent.c documentation kdoc-friendly
  PCI: make pci_register_driver() a macro
  PCI: add Broadcom 5708S to VPD length quirk

a7ef6a40

Fix race/oops in tty layer after BKL pushdown · 000b9151

Committed by Christian Borntraeger on Aug 11, 2008

While testing our KVM code for s390 (starting and killall kvm in a loop)
I can reproduce the following oops:

  Unable to handle kernel pointer dereference at virtual kernel address 6b6b6b6b6b6b6000
  Oops: 0038 [#1] SMP
  Modules linked in: dm_multipath sunrpc qeth_l3 qeth_l2 dm_mod qeth ccwgroup
  CPU: 1 Not tainted 2.6.27-rc1 #54
  Process kuli (pid: 4409, task: 00000000b6aa5940, ksp: 00000000b7343e10)
  Krnl PSW : 0704e00180000000 00000000002e0b8c (disassociate_ctty+0x1c0/0x288)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3
  Krnl GPRS: 0000000000000000 6b6b6b6b6b6b6b6b 0000000000000001 00000000000003a6
             00000000002e0a46 00000000004b4160 0000000000000001 00000000bbd79758
             00000000b7343e58 00000000b8854148 00000000bd34dea0 00000000b7343c20
             0000000000000001 00000000004b6d08 00000000002e0a46 00000000b7343c20
  Krnl Code: 00000000002e0b7e: eb9fb0a00004  lmg   %r9,%r15,160(%r11)
             00000000002e0b84: 07f4          bcr   15,%r4
             00000000002e0b86: e31090080004  lg    %r1,8(%r9)
            >00000000002e0b8c: d501109cd000  clc   156(2,%r1),0(%r13)
             00000000002e0b92: a784ff5d      brc   8,2e0a4c
             00000000002e0b96: b9040029      lgr   %r2,%r9
             00000000002e0b9a: c0e5fffff9c3  brasl %r14,2dff20
             00000000002e0ba0: a7f4ff56      brc   15,2e0a4c
  Call Trace:
  ([<00000000002e0a46>] disassociate_ctty+0x7a/0x288)
   [<0000000000141fe6>] do_exit+0x212/0x8d4
   [<0000000000142708>] do_group_exit+0x60/0xcc
   [<0000000000150660>] get_signal_to_deliver+0x270/0x3ac
   [<000000000010bfd6>] do_signal+0x8e/0x8dc
   [<0000000000113772>] sysc_sigpending+0xe/0x22
   [<000001ff0000b134>] 0x1ff0000b134
  INFO: lockdep is turned off.
  Last Breaking-Event-Address:
   [<00000000002e0a48>] disassociate_ctty+0x7c/0x288
  Kernel panic - not syncing: Fatal exception: panic_on_oops

It seems that tty was already freed in disassociate_ctty when it tries
to dereference tty->driver.

After moving the lock_kernel before the mutex_unlock, I can no longer
reproduce the problem.

[ This is a temporary partial fix for the documented and long standing
  race in disassociate_tty.  This stops most problem cases for now.

  For the next release the -next tree has an initial implementation of
  kref counting for tty structures and this quickfix will be dropped.

                                                              - Alan ]

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

000b9151

m68k{,nommu}: Wire up new system calls · 0e7d5bb8

Committed by Geert Uytterhoeven on Aug 11, 2008

Wire up for m68k{,nommu} the system calls that were added in the last merge
window:

 - 4006553b ("flag parameters: inotify_init")
 - ed8cae8b ("flag parameters: pipe")
 - 336dd1f7 ("flag parameters: dup2")
 - a0998b50 ("flag parameters: epoll_create")
 - 9fe5ad9c ("flag parameters add-on: remove epoll_create size param")
 - b087498e ("flag parameters: eventfd")
 - 9deb27ba ("flag parameters: signalfd")

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0e7d5bb8

Revert "fbcon: bgcolor fix" · 3838f59f

Committed by Linus Torvalds on Aug 11, 2008

This reverts commit 2d04a4a7, which made
it impossible to make the softcursor use the highlight colors.

Yes, the fourth bit should be "blinking", but since we cannot reasonably
blink in fbcon, highlighting it with a bright background is preferable.

Reported-by: Pavel Machek <pavel@suse.cz>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Antonino A. Daplas <adaplas@pol.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3838f59f

EFI, x86: fix function prototype · b0fbaa6b

Committed by Randy Dunlap on Aug 07, 2008

Fix function prototype in header file to match source code:

linux-next-20080807/arch/x86/kernel/efi_64.c:100:14: error: symbol 'efi_ioremap' redeclared with different type (originally declared at include2/asm/efi.h:89) - different address spaces

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b0fbaa6b

x86, pci-calgary: fix function declaration · 9b0094f7

Committed by Randy Dunlap on Aug 07, 2008

Fix function declaration:

linux-next-20080807/arch/x86/kernel/pci-calgary_64.c:1353:36: warning: non-ANSI function declaration of function 'get_tce_space_from_tar'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

9b0094f7

x86: work around gcc 3.4.x bug · cf3e5050

Committed by Jeremy Fitzhardinge on Aug 08, 2008

Simon Horman reported that gcc-3.4.x crashes when compiling
pgd_prepopulate_pmd() when PREALLOCATED_PMDS == 0 and CONFIG_DEBUG_INFO
is enabled.

Adding an extra check for PREALLOCATED_PMDS == 0 [which is compiled out
by gcc] seems to avoid the problem.

Reported-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cf3e5050

x86: make "apic" an early_param() on 32-bit · fb6bef80

Committed by Rene Herman on Aug 11, 2008

On 32-bit, "apic" is a __setup() param meaning it is parsed rather
late in the game. Make it an early_param() for apic_printk() use
by arch/x86/kernel/mpparse.c.

On 64-bit, it already is an early_param().

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

fb6bef80

x86, debug: tone down arch/x86/kernel/mpparse.c debugging printk · eeb0d7d1

Committed by Rene Herman on Aug 11, 2008

Commit 11a62a05 turns some formerly
nopped debugging printks in arch/x86/kernel/mpparse.c into regular
ones. The one at the top of smp_scan_config() in particular also
prints on !CONFIG_SMP/CONFIG_X86_LOCAL_APIC kernels and UP machines
without anything resembling MP tables which makes their lowly UP
owners wonder...

Turn the former Dprintk()s into apic_printk()s instead, meaning that
their printing is dependent on passing the apic=verbose (or =debug)
command line param.

On 32-bit, "apic" is a __setup() param which isn't early enough
for this code and therefore needs a followup changing it into an
early_param(). On 64-bit, it already is.

Signed-off-by: Rene Herman <rene.herman@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

eeb0d7d1

Aug 11, 2008 · 17 commits

sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks · 279ef6bb

Committed by Dmitry Adamushko on Jul 30, 2008

Mark Langsdorf reported:

> One of my co-workers noticed that the powernow-k8
> driver no longer restarts when a CPU core is
> hot-disabled and then hot-enabled on AMD quad-core
> systems.
>
> The following commands work fine on 2.6.26 and fail
> on 2.6.27-rc1:
>
> echo 0 > /sys/devices/system/cpu/cpu3/online
> echo 1 > /sys/devices/system/cpu/cpu3/online
> find /sys -name cpufreq
>
> For 2.6.26, the find will return a cpufreq
> directory for each processor.  In 2.6.27-rc1,
> the cpu3 directory is missing.
>
> After digging through the code, the following
> logic is failing when the core is hot-enabled
> at runtime.  The code works during the boot
> sequence.
>
>       cpumask_t = current->cpus_allowed;
>       set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
>       if (smp_processor_id() != cpu)
>               return -ENODEV;

So set the CPU active before calling the CPU_ONLINE notifier chain;
there are a handful of notifiers that use set_cpus_allowed().

This fix also solves the problem with x86-microcode. I've sent
alternative patches for microcode, but as this "rely on
set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
assumption seems to be more broad than what we thought, perhaps this fix
should be applied.

With this patch we define that by the moment CPU_ONLINE is being sent,
a 'cpu' is online and ready for tasks to be migrated onto it.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

279ef6bb

lockdep: increase MAX_LOCKDEP_KEYS · e5f363e3

Committed by Ingo Molnar on Aug 11, 2008

Certain configs produce:

 [   70.076229] BUG: MAX_LOCKDEP_KEYS too low!
 [   70.080230] turning off the locking correctness validator.

Tune them up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

e5f363e3

generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c

Committed by Nick Piggin on Aug 11, 2008

* Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:

> Found a OOPS on a big SMP box during an overnight reboot test with
> upstream git.
>
> Suresh and I looked at the oops and looks like the root cause is in
> generic_smp_call_function_interrupt() and smp_call_function_mask() with
> wait parameter.
>
> The actual oops looked like
>
> [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
> [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.278155] PGD 202063 PUD 0
> [   11.278576] Oops: 0010 [1] SMP
> [   11.279006] CPU 5
> [   11.279336] Modules linked in:
> [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7 #290
> [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
> [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
> [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
> [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
> [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
> [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
> [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
> [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
> [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
> [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
> [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
> [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
> [   11.287120] Call Trace:
> [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
> [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
> [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
> [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
> [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
> [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
> [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
> [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
> [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
> [   11.291010]
> [   11.291287]
> [   11.291654] Code:  Bad RIP value.
> [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
> [   11.292380]  RSP <ffff88027f1f7f70>
> [   11.292741] CR2: ffff8802ffffffff
> [   11.310951] ---[ end trace 137c54d525305f1c ]---
>
> The problem is with the following sequence of events:
>
> - CPU A calls smp_call_function_mask() for CPU B with wait parameter
> - CPU A sets up the call_function_data on the stack and does an rcu add to
>   call_function_queue
> - CPU A waits until the WAIT flag is cleared
> - CPU B gets the call function interrupt and starts going through the
>   call_function_queue
> - CPU C also gets some other call function interrupt and starts going through
>   the call_function_queue
> - CPU C, which is also going through the call_function_queue, starts referencing
>   CPU A's stack, as that element is still in call_function_queue
> - CPU B finishes the function call that CPU A set up and as there are no other
>   references to it, rcu deletes the call_function_data (which was from CPU A
>   stack)
> - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
> - CPU A which was waiting on the flag continues executing and the stack
>   contents change
>
> - CPU C is still in rcu_read section accessing the CPU A's stack sees
>   inconsistent call_function_data and can try to execute
>   function with some random pointer, causing stack corruption for A
>   (by clearing the bits in mask field) and oops.

Nice debugging work.

I'd suggest something like the attached (boot tested) patch as the simple
fix for now.

I expect the benefits from the less synchronized, multiple-in-flight-data
global queue will still outweigh the costs of dynamic allocations. But
if worst comes to worst then we just go back to a globally synchronous
one-at-a-time implementation, but that would be pretty sad!

Signed-off-by: Ingo Molnar <mingo@elte.hu>

cc7a486c

sched: fix mysql+oltp regression · 77ae6513

Committed by Mike Galbraith on Aug 11, 2008

Defer commit 6d299f1b to the next release.

Testing of the tip/sched/clock tree revealed a mysql+oltp regression
which bisection eventually traced back to this commit in mainline.

Pertinent test results:  Three run sysbench averages, throughput units
in read/write requests/sec.

clients         1     2     4     8    16    32    64
6e0534f2     9646 17876 34774 33868 32230 30767 29441
2.6.26.1     9112 17936 34652 33383 31929 30665 29232
6d299f1b     9112 14637 28370 33339 32038 30762 29204

Note: subsequent commits hide the majority of this regression until you
apply the clock fixes, at which time it reemerges at full magnitude.

We cannot see anything bad about the change itself so we defer it to the
next release until this problem is fully analysed.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

77ae6513
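
To put the sysbench table in 77ae6513 into proportion, the relative drop can be worked out directly from the quoted numbers. The sketch below copies two rows of that table; treating the 2.6.26.1 row as the baseline is an assumption made here for illustration, and the helper name `drop_tenths_percent` is hypothetical.

```c
#include <stddef.h>

/*
 * Rows copied from the table in the commit message (read/write requests/sec).
 * Columns are 1, 2, 4, 8, 16, 32 and 64 clients.
 * Using 2.6.26.1 as the baseline is an assumption of this sketch.
 */
static const long toy_baseline[7]    = { 9112, 17936, 34652, 33383, 31929, 30665, 29232 };
static const long toy_with_commit[7] = { 9112, 14637, 28370, 33339, 32038, 30762, 29204 };

/*
 * Throughput drop relative to `base`, in tenths of a percent, truncated
 * toward zero. A negative result means `value` is higher than `base`.
 */
static long drop_tenths_percent(long base, long value)
{
	return (base - value) * 1000 / base;
}
```

With these rows, `drop_tenths_percent(toy_baseline[1], toy_with_commit[1])` evaluates to 183 (about 18.3% at 2 clients) and the same call for index 2 evaluates to 181 (about 18.1% at 4 clients), while the 1-client column shows no difference.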

Merge branch 'linus' into sched/urgent · 251a169c
Committed by Ingo Molnar on Aug 11, 2008

251a169c

powerpc: Remove include/linux/harrier_defs.h · 13fa00a8

Committed by Paul Mackerras on Aug 11, 2008

It was only used by code in arch/ppc, and arch/ppc is gone, so remove
the unused harrier_defs.h as well.

Signed-off-by: Paul Mackerras <paulus@samba.org>

13fa00a8

lockdep: fix overflow in the hlock shrinkage code · b42e737e

Committed by Peter Zijlstra on Aug 11, 2008

There is an overflow-by-one case in the new shrunken hlock code.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b42e737e

x86_64: restore the proper NR_IRQS define so larger systems work. · 3c7569b2

Committed by Eric W. Biederman on Aug 10, 2008

As pointed out and tracked by Yinghai Lu <yhlu.kernel@gmail.com>:

 Dhaval Giani got:
 kernel BUG at arch/x86/kernel/io_apic_64.c:357!
 invalid opcode: 0000 [1] SMP
 CPU 24
 ...

His system (x3950) has 8 IO-APICs, with IRQ numbers > 256.

This was caused by:

       commit 9b7dc567
       Author: Thomas Gleixner <tglx@linutronix.de>
       Date:   Fri May 2 20:10:09 2008 +0200

          x86: unify interrupt vector defines

          The interrupt vector defines are copied 4 times around with minimal
          differences. Move them all into asm-x86/irq_vectors.h

It appears that Thomas did not notice that x86_64 does something
completely different when he merged irq_vectors.h.

We can solve this for 2.6.27 by simply reintroducing the old heuristic
for setting NR_IRQS on x86_64 to a usable value, which trivially removes
the regression.

Long term it would be nice to harmonize the handling of ioapic interrupts
of x86_32 and x86_64 so we don't have this kind of confusion.

Dhaval Giani <dhaval@linux.vnet.ibm.com> tested an earlier version of
this patch by YH which confirms simply increasing NR_IRQS fixes the
problem.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3c7569b2

x86: Restore proper vector locking during cpu hotplug · d388e5fd

Committed by Eric W. Biederman on Aug 09, 2008

Having cpu_online_map change during assign_irq_vector can result
in some really nasty and weird things happening.  The one that
bit me last time was accessing non existent per cpu memory for non
existent cpus.

This locking was removed in a sloppy x86_64 and x86_32 merge patch.

Guys can we please try and avoid subtly breaking x86 when we are
merging files together?

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

d388e5fd

lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]() · 3295f0ef

Committed by Ingo Molnar on Aug 11, 2008

The names were too generic:

 drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
 drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
 drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

3295f0ef

lockdep: handle chains involving classes defined in modules · 8bfe0298

Committed by Rabin Vincent on Aug 11, 2008

Solve this by marking the classes as unused and not printing information
about the unused classes.

Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

8bfe0298

mm: fix mm_take_all_locks() locking order · 7cd5a02f

Committed by Peter Zijlstra on Aug 11, 2008

Lockdep spotted:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc1 #270
-------------------------------------------------------
qemu-kvm/2033 is trying to acquire lock:
 (&inode->i_data.i_mmap_lock){----}, at: [<ffffffff802996cc>] mm_take_all_locks+0xc2/0xea

but task is already holding lock:
 (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&anon_vma->lock){----}:
       [<ffffffff8025cd37>] __lock_acquire+0x11be/0x14d2
       [<ffffffff8025d0a9>] lock_acquire+0x5e/0x7a
       [<ffffffff804c655b>] _spin_lock+0x3b/0x47
       [<ffffffff8029a2ef>] vma_adjust+0x200/0x444
       [<ffffffff8029a662>] split_vma+0x12f/0x146
       [<ffffffff8029bc60>] mprotect_fixup+0x13c/0x536
       [<ffffffff8029c203>] sys_mprotect+0x1a9/0x21e
       [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&inode->i_data.i_mmap_lock){----}:
       [<ffffffff8025ca54>] __lock_acquire+0xedb/0x14d2
       [<ffffffff8025d397>] lock_release_non_nested+0x1c2/0x219
       [<ffffffff8025d515>] lock_release+0x127/0x14a
       [<ffffffff804c6403>] _spin_unlock+0x1e/0x50
       [<ffffffff802995d9>] mm_drop_all_locks+0x7f/0xb0
       [<ffffffff802a965d>] do_mmu_notifier_register+0xe2/0x112
       [<ffffffff802a96a8>] mmu_notifier_register+0xe/0x10
       [<ffffffffa0043b6b>] kvm_dev_ioctl+0x11e/0x287 [kvm]
       [<ffffffff802bd0ca>] vfs_ioctl+0x2a/0x78
       [<ffffffff802bd36f>] do_vfs_ioctl+0x257/0x274
       [<ffffffff802bd3e1>] sys_ioctl+0x55/0x78
       [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

5 locks held by qemu-kvm/2033:
 #0:  (&mm->mmap_sem){----}, at: [<ffffffff802a95d0>] do_mmu_notifier_register+0x55/0x112
 #1:  (mm_all_locks_mutex){--..}, at: [<ffffffff8029963e>] mm_take_all_locks+0x34/0xea
 #2:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
 #3:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea
 #4:  (&anon_vma->lock){----}, at: [<ffffffff8029967a>] mm_take_all_locks+0x70/0xea

stack backtrace:
Pid: 2033, comm: qemu-kvm Not tainted 2.6.27-rc1 #270

Call Trace:
 [<ffffffff8025b7c7>] print_circular_bug_tail+0xb8/0xc3
 [<ffffffff8025ca54>] __lock_acquire+0xedb/0x14d2
 [<ffffffff80259bb1>] ? add_lock_to_list+0x7e/0xad
 [<ffffffff8029967a>] ? mm_take_all_locks+0x70/0xea
 [<ffffffff8029967a>] ? mm_take_all_locks+0x70/0xea
 [<ffffffff8025d397>] lock_release_non_nested+0x1c2/0x219
 [<ffffffff802996cc>] ? mm_take_all_locks+0xc2/0xea
 [<ffffffff802996cc>] ? mm_take_all_locks+0xc2/0xea
 [<ffffffff8025b202>] ? trace_hardirqs_on_caller+0x4d/0x115
 [<ffffffff802995d9>] ? mm_drop_all_locks+0x7f/0xb0
 [<ffffffff8025d515>] lock_release+0x127/0x14a
 [<ffffffff804c6403>] _spin_unlock+0x1e/0x50
 [<ffffffff802995d9>] mm_drop_all_locks+0x7f/0xb0
 [<ffffffff802a965d>] do_mmu_notifier_register+0xe2/0x112
 [<ffffffff802a96a8>] mmu_notifier_register+0xe/0x10
 [<ffffffffa0043b6b>] kvm_dev_ioctl+0x11e/0x287 [kvm]
 [<ffffffff8033f9f2>] ? file_has_perm+0x83/0x8e
 [<ffffffff802bd0ca>] vfs_ioctl+0x2a/0x78
 [<ffffffff802bd36f>] do_vfs_ioctl+0x257/0x274
 [<ffffffff802bd3e1>] sys_ioctl+0x55/0x78
 [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b

Which the locking hierarchy in mm/rmap.c confirms as valid.

Fix this by first taking all the mapping->i_mmap_lock instances and then
taking all anon_vma->lock instances.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7cd5a02f
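
The ordering fix described in 7cd5a02f amounts to a two-pass walk: take every lock of the first kind, and only then any lock of the second kind. The sketch below is a single-threaded model that records the order in which lock kinds would be taken. It does not lock anything and is not the code in mm/mmap.c; `struct toy_vma`, the enum, and `take_all_locks_sketch` are hypothetical names.

```c
#include <stddef.h>

/* The two lock kinds named in the commit message. */
enum toy_lock_kind { TOY_I_MMAP_LOCK, TOY_ANON_VMA_LOCK };

/* Hypothetical stand-in for a VMA: which kinds of lock it contributes. */
struct toy_vma {
	int has_mapping;   /* nonzero: contributes a mapping->i_mmap_lock */
	int has_anon_vma;  /* nonzero: contributes an anon_vma->lock */
};

/*
 * Record, in acquisition order, the kind of every lock that would be taken.
 * Pass 1 visits only i_mmap_lock instances, pass 2 only anon_vma->lock
 * instances, so no i_mmap_lock is ever taken after an anon_vma->lock.
 * Returns the number of entries written to `order` (at most `cap`).
 */
static size_t take_all_locks_sketch(const struct toy_vma *vmas, size_t n,
				    enum toy_lock_kind *order, size_t cap)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (vmas[i].has_mapping && used < cap)
			order[used++] = TOY_I_MMAP_LOCK;

	for (i = 0; i < n; i++)
		if (vmas[i].has_anon_vma && used < cap)
			order[used++] = TOY_ANON_VMA_LOCK;

	return used;
}
```

A single loop that took both locks per VMA would interleave the two kinds, which is the pattern the lockdep report above flags.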

lockdep: annotate mm_take_all_locks() · 454ed842

Committed by Peter Zijlstra on Aug 11, 2008

The nesting is correct due to holding mmap_sem, use the new annotation
to annotate this.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

454ed842

lockdep: spin_lock_nest_lock() · b7d39aff

Committed by Peter Zijlstra on Aug 11, 2008

Expose the new lock protection lock.

This can be used to annotate places where we take multiple locks of the
same class and avoid deadlocks by always taking another (top-level) lock
first.

NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b7d39aff

lockdep: lock protection locks · 7531e2f3

Committed by Peter Zijlstra on Aug 11, 2008

On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote:

> On Fri, 1 Aug 2008, David Miller wrote:
> >
> > Taking more than a few locks of the same class at once is bad
> > news and it's better to find an alternative method.
>
> It's not always wrong.
>
> If you can guarantee that anybody that takes more than one lock of a
> particular class will always take a single top-level lock _first_, then
> that's all good. You can obviously screw up and take the same lock _twice_
> (which will deadlock), but at least you cannot get into ABBA situations.
>
> So maybe the right thing to do is to just teach lockdep about "lock
> protection locks". That would have solved the multi-queue issues for
> networking too - all the actual network drivers would still have taken
> just their single queue lock, but the one case that needs to take all of
> them would have taken a separate top-level lock first.
>
> Never mind that the multi-queue locks were always taken in the same order:
> it's never wrong to just have some top-level serialization, and anybody
> who needs to take <n> locks might as well do <n+1>, because they sure as
> hell aren't going to be on _any_ fastpaths.
>
> So the simplest solution really sounds like just teaching lockdep about
> that one special case. It's not "nesting" exactly, although it's obviously
> related to it.

Do as Linus suggested. The lock protection lock is called nest_lock.

Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so anything
that spills that is still up shit creek.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7531e2f3
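
The rule Linus describes in 7531e2f3 (several locks of one class are acceptable if a single top-level lock is always taken first) can be modelled with a toy validator. This is a single-threaded sketch and not lockdep's implementation: every `toy_` name is hypothetical, classes are plain integers, and only the depth limit of 48 is taken from the commit message.

```c
#include <stddef.h>

#define TOY_MAX_LOCK_DEPTH 48  /* the MAX_LOCK_DEPTH value quoted above */

/* A lock is identified by its address and belongs to one class. */
struct toy_lock {
	int class_id;
};

/* Per-task stack of currently held locks. */
struct toy_task {
	const struct toy_lock *held[TOY_MAX_LOCK_DEPTH];
	size_t depth;
};

static int toy_is_held(const struct toy_task *t, const struct toy_lock *l)
{
	size_t i;

	for (i = 0; i < t->depth; i++)
		if (t->held[i] == l)
			return 1;
	return 0;
}

/*
 * Try to acquire `l`, optionally naming a protecting `nest_lock`.
 * Returns 0 if this toy validator accepts the acquisition, -1 if not.
 */
static int toy_acquire(struct toy_task *t, const struct toy_lock *l,
		       const struct toy_lock *nest_lock)
{
	size_t i;

	if (t->depth >= TOY_MAX_LOCK_DEPTH)
		return -1;              /* too many held locks */
	if (toy_is_held(t, l))
		return -1;              /* same lock twice would deadlock */

	for (i = 0; i < t->depth; i++) {
		if (t->held[i]->class_id != l->class_id)
			continue;
		/* Same class already held: only fine under a held nest lock. */
		if (nest_lock == NULL || !toy_is_held(t, nest_lock))
			return -1;
	}

	t->held[t->depth++] = l;
	return 0;
}
```

Taking two same-class locks with no nest lock is rejected, while taking the top-level lock first and naming it on each subsequent acquisition is accepted, up to the depth limit.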

lockdep: map_acquire · 4f3e7524

Committed by Peter Zijlstra on Aug 11, 2008

Most of the free-standing lock_acquire() usages look remarkably similar;
sweep them into a new helper.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4f3e7524

lockdep: shrink held_lock structure · f82b217e

Committed by Dave Jones on Aug 11, 2008

struct held_lock {
        u64                        prev_chain_key;       /*     0     8 */
        struct lock_class *        class;                /*     8     8 */
        long unsigned int          acquire_ip;           /*    16     8 */
        struct lockdep_map *       instance;             /*    24     8 */
        int                        irq_context;          /*    32     4 */
        int                        trylock;              /*    36     4 */
        int                        read;                 /*    40     4 */
        int                        check;                /*    44     4 */
        int                        hardirqs_off;         /*    48     4 */

        /* size: 56, cachelines: 1 */
        /* padding: 4 */
        /* last cacheline: 56 bytes */
};

struct held_lock {
        u64                        prev_chain_key;       /*     0     8 */
        long unsigned int          acquire_ip;           /*     8     8 */
        struct lockdep_map *       instance;             /*    16     8 */
        unsigned int               class_idx:11;         /*    24:21  4 */
        unsigned int               irq_context:2;        /*    24:19  4 */
        unsigned int               trylock:1;            /*    24:18  4 */
        unsigned int               read:2;               /*    24:16  4 */
        unsigned int               check:2;              /*    24:14  4 */
        unsigned int               hardirqs_off:1;       /*    24:13  4 */

        /* size: 32, cachelines: 1 */
        /* padding: 4 */
        /* bit_padding: 13 bits */
        /* last cacheline: 32 bytes */
};

[mingo@elte.hu: shrunk hlock->class too]
[peterz@infradead.org: fixup bit sizes]

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

f82b217e
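
The size reduction shown by the two layouts in f82b217e comes from replacing a class pointer and five `int` fields with bitfields that share one 32-bit word. The userspace sketch below mirrors the field names and bit widths from the listing above, but the struct names, the `void *` stand-ins for kernel types, and the helper functions are assumptions; exact sizes depend on the ABI, so only the relative comparison is meaningful.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the change: one full-width field per flag. */
struct toy_held_lock_wide {
	uint64_t      prev_chain_key;
	void         *class_ptr;      /* stand-in for struct lock_class * */
	unsigned long acquire_ip;
	void         *instance;       /* stand-in for struct lockdep_map * */
	int           irq_context;
	int           trylock;
	int           read;
	int           check;
	int           hardirqs_off;
};

/* Layout after the change: class index and flags packed into bitfields. */
struct toy_held_lock_packed {
	uint64_t      prev_chain_key;
	unsigned long acquire_ip;
	void         *instance;
	unsigned int  class_idx:11;   /* index instead of a pointer */
	unsigned int  irq_context:2;
	unsigned int  trylock:1;
	unsigned int  read:2;
	unsigned int  check:2;
	unsigned int  hardirqs_off:1; /* 19 bits used in total */
};

static size_t toy_wide_size(void)
{
	return sizeof(struct toy_held_lock_wide);
}

static size_t toy_packed_size(void)
{
	return sizeof(struct toy_held_lock_packed);
}
```

The trade-off is that an 11-bit `class_idx` can only address 2048 distinct classes and that each flag is limited to the values its width can hold.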