提交 · d9d05fda919cb6414ae4889b696b2fada7a23217 · xiphi1978 / linux

08 5月, 2009 5 次提交

x86: MCE: make cmci_discover_lock irq-safe · e5299926

由 Hidetoshi Seto 提交于 5月 08, 2009

Lockdep reports the warning below when Li tries to offline one cpu:

[  110.835487] =================================
[  110.835616] [ INFO: inconsistent lock state ]
[  110.835688] 2.6.30-rc4-00336-g8c9ed899 #52
[  110.835757] ---------------------------------
[  110.835828] inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
[  110.835908] swapper/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
[  110.835982]  (cmci_discover_lock){?.+...}, at: [<ffffffff80236dc0>] cmci_clear+0x30/0x9b

cmci_clear() can be called via smp_call_function_single().

It is better to disable interrupt while holding cmci_discover_lock,
to turn it into an irq-safe lock - we can deadlock otherwise.

[ Impact: fix possible deadlock in the MCE code ]
Reported-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A03ED38.8000700@jp.fujitsu.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Reported-by: Shaohua Li<shaohua.li@intel.com>

e5299926

x86: xen, i386: reserve Xen pagetables · 33df4db0

由 Jeremy Fitzhardinge 提交于 5月 07, 2009

The Xen pagetables are no longer implicitly reserved as part of the other
i386_start_kernel reservations, so make sure we explicitly reserve them.
This prevents them from being released into the general kernel free page
pool and reused.

[ Impact: fix Xen guest crash ]
Also-Bisected-by: NBryan Donlan <bdonlan@gmail.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4A032EEC.30509@goop.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

33df4db0

x86, kexec: fix crashdump panic with CONFIG_KEXEC_JUMP · 6407df5c

由 Huang Ying 提交于 5月 08, 2009

Tim Starling reported that crashdump will panic with kernel compiled
with CONFIG_KEXEC_JUMP due to null pointer deference in
machine_kexec_32.c: machine_kexec(), when deferencing
kexec_image. Refering to:

http://bugzilla.kernel.org/show_bug.cgi?id=13265

This patch fixes the BUG via replacing global variable reference:
kexec_image in machine_kexec() with local variable reference: image,
which is more appropriate, and will not be null.

Same BUG is in machine_kexec_64.c too, so fixed too in the same way.

[ Impact: fix crash on kexec ]
Reported-by: NTim Starling <tstarling@wikimedia.org>
Signed-off-by: NHuang Ying <ying.huang@intel.com>
LKML-Reference: <1241751101.6259.85.camel@yhuang-dev.sh.intel.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

6407df5c

x86-64: finish cleanup_highmaps()'s job wrt. _brk_end · 49834396

由 Jan Beulich 提交于 5月 06, 2009

With the introduction of the .brk section, special care must be taken
that no unused page table entries remain if _brk_end and _end are
separated by a 2M page boundary. cleanup_highmap() runs very early and
hence cannot take care of that, hence potential entries needing to be
removed past _brk_end must be cleared once the brk allocator has done
its job.

[ Impact: avoids undesirable TLB aliases ]
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

49834396

x86: fix boot hang in early_reserve_e820() · 61438766

由 Jan Beulich 提交于 5月 06, 2009

If the first non-reserved (sub-)range doesn't fit the size requested,
an endless loop will be entered. If a range returned from
find_e820_area_size() turns out insufficient in size, the range must
be skipped before calling the function again.

[ Impact: fixes boot hang on some platforms ]
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

61438766

06 5月, 2009 2 次提交

x86: Fix a typo in a printk message · e0e5ea32

由 Nikanth Karthikesan 提交于 5月 04, 2009

[ Impact: printk message cleanup ]
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
LKML-Reference: <200905040908.27299.knikanth@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e0e5ea32

x86, srat: do not register nodes beyond e820 map · 7eccf7b2

由 David Rientjes 提交于 5月 05, 2009

The mem= option will truncate the memory map at a specified address so
it's not possible to register nodes with memory beyond the e820 upper
bound.

unparse_node() is only called when then node had memory associated with
it, although with the mem= option it is no longer addressable.

[ Impact: fix boot hang on certain (large) systems ]
Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NJack Steiner <steiner@sgi.com>
LKML-Reference: <alpine.DEB.2.00.0905051248150.20021@chino.kir.corp.google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7eccf7b2

05 5月, 2009 1 次提交

x86: show number of core_siblings instead of thread_siblings in /proc/cpuinfo · 35d11680

由 Andreas Herrmann 提交于 5月 04, 2009

Commit 7ad728f9
(cpumask: x86: convert cpu_sibling_map/cpu_core_map to cpumask_var_t)
changed the output of /proc/cpuinfo for siblings:

Example on an AMD Phenom:

  physical id   : 0
  siblings : 1
  core id	   : 3
  cpu cores  : 4

Before that commit it was:

  physical id	: 0
  siblings : 4
  core id	   : 3
  cpu cores  : 4

Instead of cpu_core_mask it now uses cpu_sibling_mask to count siblings.
This is due to the following hunk of above commit:

|  --- a/arch/x86/kernel/cpu/proc.c
|  +++ b/arch/x86/kernel/cpu/proc.c
|  @@ -14,7 +14,7 @@ static void show_cpuinfo_core(struct seq_file *m, struct cpuinf
|          if (c->x86_max_cores * smp_num_siblings > 1) {
|                  seq_printf(m, "physical id\t: %d\n", c->phys_proc_id);
|                  seq_printf(m, "siblings\t: %d\n",
|  -                          cpus_weight(per_cpu(cpu_core_map, cpu)));
|  +                          cpumask_weight(cpu_sibling_mask(cpu)));
|                  seq_printf(m, "core id\t\t: %d\n", c->cpu_core_id);
|                  seq_printf(m, "cpu cores\t: %d\n", c->booted_cores);
|                  seq_printf(m, "apicid\t\t: %d\n", c->apicid);

This was a mistake, because the impact line shows that this side-effect
was not anticipated:

   Impact: reduce per-cpu size for CONFIG_CPUMASK_OFFSTACK=y

So revert the respective hunk to restore the old behavior.

[ Impact: fix sibling-info regression in /proc/cpuinfo ]
Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <20090504182859.GA29045@alberich.amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

35d11680

04 5月, 2009 1 次提交

amd-iommu: fix iommu flag masks · 6da7342f

由 Joerg Roedel 提交于 5月 04, 2009

The feature bits should be set via bitmasks, not via feature IDs.

[ Impact: fix feature enabling in newer IOMMU versions ]
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20090504102028.GA30307@amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6da7342f

02 5月, 2009 1 次提交

x86: initialize io_bitmap_base on 32bit · f9a196b8

由 Thomas Gleixner 提交于 5月 01, 2009

commit db949bba (x86-32: use non-lazy
io bitmap context switching) broke ioperm for 32bit because it removed
the lazy initialization of io_bitmap_base and did not set it to the
real bitmap offset.

[ Impact: fix non-working sys_ioperm() on 32-bit kernels ]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

f9a196b8

30 4月, 2009 1 次提交

x86: gettimeofday() vDSO: fix segfault when tv == NULL · 2f65dd47

由 John Wright 提交于 4月 29, 2009

According to the gettimeofday(2) manual:

       If either tv or tz is NULL, the corresponding structure is not
       set or returned.

Since it is legal to give NULL as the tv argument, the code should make
sure tv is not NULL before trying to dereference it.

This issue manifests itself on x86_64 when vdso=0 is not on the kernel
command-line and libc uses the vDSO for gettimeofday() (e.g. glibc >=
2.7).  A simple reproducer:

  #include <stdio.h>
  #include <sys/time.h>

  int main(void)
  {
      struct timezone tz;

      gettimeofday(NULL, &tz);

      return 0;
  }

See http://bugs.debian.org/466491 for more details.

[ Impact: fix gettimeofday(NULL, &tz) segfault ]
Signed-off-by: NJohn Wright <john.wright@hp.com>
Cc: Andi Kleen <ak@suse.de>
Cc: John Wright <john.wright@hp.com>
LKML-Reference: <1241037121-14805-1-git-send-email-john.wright@hp.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2f65dd47

29 4月, 2009 1 次提交

tracing: x86, mmiotrace: fix range test · 33015c85

由 Stuart Bennett 提交于 4月 28, 2009

Matching on (addr == (p->addr + p->len)) causes problems when mappings
are adjacent.

[ Impact: fix mmiotrace confusion on adjacent iomaps ]
Signed-off-by: NStuart Bennett <stuart@freedesktop.org>
Acked-by: NPekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1240946271-7083-2-git-send-email-stuart@freedesktop.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

33015c85

24 4月, 2009 1 次提交

x86, hpet: Stop soliciting hpet=force users on ICH4M · d2c86041

由 Len Brown 提交于 4月 23, 2009

The HPET in the ICH4M is not documented in the data sheet
because it was not officially validated.

While it is fine for hackers to continue to use "hpet=force"
to enable the hardware that they have, it is not prudent to
solicit additional "hpet=force" users on this hardware.

[ Impact: remove hpet=force syslog message on old-ICH systems ]
Signed-off-by: NLen Brown <len.brown@intel.com>
Acked-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <alpine.LFD.2.00.0904231918510.15843@localhost.localdomain>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d2c86041

23 4月, 2009 7 次提交

x86: check boundary in setup_node_bootmem() · 4c31e92b

由 Yinghai Lu 提交于 4月 22, 2009

Commit dc098551 ("x86/uv: fix init of memory-less nodes") causes a
two sockets system (where node-1 doesn't have RAM installed) to crash.

That commit makes node_possible include cpu nodes that do not have memory.
So check boundary in setup_node_bootmem().

[ Impact: fix boot crash on RAM-less NUMA node system ]
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: Jack Steiner <steiner@sgi.com>
LKML-Reference: <49EF89DF.9090404@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

4c31e92b

x86/PCI: don't bother with root quirks if _CRS is used · b10ceb55

由 Yinghai Lu 提交于 4月 20, 2009

It will be overwriten later if _CRS is used, so don't bother to set it.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

b10ceb55

x86/PCI: set_pci_bus_resources_arch_default cleanups · 0e94ecd0

由 Yinghai Lu 提交于 4月 18, 2009

Rename set_pci_bus_resources_arch_default to x86_pci_root_bus_res_quirks, move
the weak version from common.c to i386.c, and before calling, make sure it's a
root bus.
Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

0e94ecd0

x86/PCI: Move set_pci_bus_resources_arch_default into arch/x86 · 0bb1be3e

由 Matthew Wilcox 提交于 4月 16, 2009

Commit 30a18d6c introduced a new
function to set the PCI bus resources.  Unfortunately, neither the
author, nor the committers seemed to know that we already have somewhere
to do that -- pcibios_fixup_bus().  This patch moves the hook (used only
by the K8 code) into x86-specific code where it should have been in the
first place.

Cc: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

0bb1be3e

x86/PCI: don't call e820_all_mapped with -1 in the mmconfig case · 044cd809

由 Yinghai Lu 提交于 4月 18, 2009

e820_all_mapped need end is (addr + size) instead of (addr + size - 1)

Cc: stable@kernel.org
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>

044cd809

x86, mce: fix boot logging logic · 5679af4c

由 Andi Kleen 提交于 4月 07, 2009

The earlier patch to change the poller to a separate function subtly
broke the boot logging logic. This could lead to machine checks
getting logged at boot even when disabled or defaulting to off
on some systems. Fix that.

[ Impact: bug fix - avoid spurious MCE in log ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

5679af4c

x86, mce: make polling timer interval per CPU · 6298c512

由 Andi Kleen 提交于 4月 09, 2009

The polling timer while running per CPU still uses a global next_interval
variable, which lead to some CPUs either polling too fast or too slow.   
This was not a serious problem because all errors get picked up eventually,
but it's still better to avoid it. Turn next_interval into a per cpu variable.

v2: Fix check_interval == 0 case (Hidetoshi Seto)

[ Impact: minor bug fix ]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6298c512

22 4月, 2009 8 次提交

uv_time: add parameter to uv_read_rtc() · c5428e95

由 Coly Li 提交于 4月 22, 2009

uv_read_rtc() is referenced by read member of struct clocksource clocksource_uv.
In include/linux/clocksource.h, read of struct clocksource is declared as:
cycle_t (*read)(struct clocksource *cs)

This got introduced recently in:

   8e19608e: clocksource: pass clocksource to read() callback

But arch/x86/kernel/uv_time.c was not properly converted by that pach.

This patch adds a dummy parameter (struct clocksource type) to uv_read_rtc() to
fix the incompatible reference in clocksource_uv, and add a NULL parameter in
all places where uv_read_rtc() gets called.

[ Impact: cleanup, address compiler warning ]
Signed-off-by: NColy Li <coly.li@suse.de>
Cc: Dimitri Sivanich <sivanich@sgi.com>
Cc: Magnus Damm <damm@igel.co.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hugh@veritas.com>
LKML-Reference: <49EF3614.1050806@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Cc: Dimitri Sivanich <sivanich@sgi.com>

c5428e95

x86: hpet: fix periodic mode programming on AMD 81xx · 7a6f9cbb

由 Andreas Herrmann 提交于 4月 21, 2009

(See http://bugzilla.kernel.org/show_bug.cgi?id=12961)

It partially reverts commit c23e253e
(x86: hpet: stop HPET_COUNTER when programming periodic mode)

HPET on AMD 81xx chipset needs a second write (with HPET_TN_SETVAL
cleared) to T0_CMP register to set the period in periodic mode.

With this patch HPET_COUNTER is still stopped but not reset when HPET
is programmed in periodic mode. This should help to avoid races when
HPET is programmed in periodic mode and fixes a boot time hang that
I've observed on a machine when using 1000HZ.

[ Impact: fix boot time hang on machines with AMD 81xx chipset ]
Reported-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: NJeff Mahoney <jeffm@suse.com>
LKML-Reference: <20090421180037.GA2763@alberich.amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7a6f9cbb

KVM: Unregister cpufreq notifier on unload · 888d256e

由 Jan Kiszka 提交于 4月 17, 2009

Properly unregister cpufreq notifier on onload if it was registered
during init.
Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

888d256e

KVM: x86: release time_page on vcpu destruction · 7f1ea208

由 Joerg Roedel 提交于 2月 25, 2009

Not releasing the time_page causes a leak of that page or the compound
page it is situated in.

Cc: stable@kernel.org
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

7f1ea208

KVM: MMU: disable global page optimization · bf47a760

由 Marcelo Tosatti 提交于 4月 05, 2009

Complexity to fix it not worthwhile the gains, as discussed
in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649.

Cc: stable@kernel.org
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NAvi Kivity <avi@redhat.com>

bf47a760

FRV: Fix the section attribute on UP DECLARE_PER_CPU() · 9b8de747

由 David Howells 提交于 4月 21, 2009

In non-SMP mode, the variable section attribute specified by DECLARE_PER_CPU()
does not agree with that specified by DEFINE_PER_CPU(). This means that
architectures that have a small data section references relative to a base
register may throw up linkage errors due to too great a displacement between
where the base register points and the per-CPU variable.

On FRV, the .h declaration says that the variable is in the .sdata section, but
the .c definition says it's actually in the .data section. The linker throws
up the following errors:

kernel/built-in.o: In function `release_task':
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o
kernel/exit.c:78: relocation truncated to fit: R_FRV_GPREL12 against symbol `per_cpu__process_counts' defined in .data section in kernel/built-in.o

To fix this, DECLARE_PER_CPU() should simply apply the same section attribute
as does DEFINE_PER_CPU(). However, this is made slightly more complex by
virtue of the fact that there are several variants on DEFINE, so these need to
be matched by variants on DECLARE.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b8de747

x86: more than 8 32-bit CPUs requires X86_BIGSMP · 2a3313f4

由 Michael K. Johnson 提交于 4月 21, 2009

$ cat x86-more-than-8-cpus-requires-bigsmp.patch

Enforce NR_CPUS <= 8 limitation if X86_BIGSMP not set

Configuring more than 8 logical CPUs on 32-bit x86 requires
X86_BIGSMP to be set in order to boot successfully, if more than 8
logical CPUs are actually found at boot time.  The X86_BIGSMP help
text describes that it is required to be set if more than 8 CPUs
are configured, but this was previously not enforced.

This configuration error has affected multiple distributions:
    https://bugzilla.redhat.com/show_bug.cgi?id=480844
    https://issues.rpath.com/browse/RPL-3022Signed-off-by: NMichael K Johnson <johnsonm@rpath.com>
LKML-Reference: <20090422014448.GB32541@logo.rdu.rpath.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

2a3313f4

clocksource: pass clocksource to read() callback · 8e19608e

由 Magnus Damm 提交于 4月 21, 2009

Pass clocksource pointer to the read() callback for clocksources.  This
allows us to share the callback between multiple instances.

[hugh@veritas.com: fix powerpc build of clocksource pass clocksource mods]
[akpm@linux-foundation.org: cleanup]
Signed-off-by: NMagnus Damm <damm@igel.co.jp>
Acked-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e19608e

21 4月, 2009 5 次提交

x86: avoid theoretical spurious NMI backtraces with CONFIG_CPUMASK_OFFSTACK=y · fcc5c4a2

由 Rusty Russell 提交于 4月 21, 2009

In theory (though not shown in practice) alloc_cpumask_var() doesn't zero
memory, so CPUs might print an "NMI backtrace for cpu %d" once on boot.

(Bug introduced in fcef8576).

[ Impact: avoid theoretical syslog noise in rare configs ]
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fcc5c4a2

x86: fix boot crash in NMI watchdog with CONFIG_CPUMASK_OFFSTACK=y and flat APIC · 2f537a9f

由 Rusty Russell 提交于 4月 21, 2009

fcef8576 converted backtrace_mask to a
cpumask_var_t, and assumed check_nmi_watchdog was called before
nmi_watchdog_tick was ever called.  Steven's oops shows I was wrong.

This is something of a bandaid: I'm not sure we *should* be calling
nmi_watchdog_tick before check_nmi_watchdog.  Note that gcc eliminates
this test for the CONFIG_CPUMASK_OFFSTACK=n case.

[ Impact: fix boot crash in rare configs ]
Reported-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
LKML-Reference: <alpine.DEB.2.00.0904202113520.10097@gandalf.stny.rr.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2f537a9f

Separate out common fstatat code into vfs_fstatat · 0112fc22

由 Oleg Drokin 提交于 4月 08, 2009

This is a version incorporating Christoph's suggestion.

Separate out common *fstatat functionality into a single function
instead of duplicating it all over the code.
Signed-off-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0112fc22

x86-64: fix FPU corruption with signals and preemption · 06c38d5e

由 Suresh Siddha 提交于 4月 09, 2009

In 64bit signal delivery path, clear_used_math() was happening before saving
the current active FPU state on to the user stack for signal handling. Between
clear_used_math() and the state store on to the user stack, potentially we
can get a page fault for the user address and can block. Infact, while testing
we were hitting the might_fault() in __clear_user() which can do a schedule().

At a later point in time, we will schedule back into this process and
resume the save state (using "xsave/fxsave" instruction) which can lead
to DNA fault. And as used_math was cleared before, we will reinit the FP state
in the DNA fault and continue. This reinit will result in loosing the
FPU state of the process.

Move clear_used_math() to a point after the FPU state has been stored
onto the user stack.

This issue is present from a long time (even before the xsave changes
and the x86 merge). But it can easily be exposed in 2.6.28.x and 2.6.29.x
series because of the __clear_user() in this path, which has an explicit
__cond_resched() leading to a context switch with CONFIG_PREEMPT_VOLUNTARY.

[ Impact: fix FPU state corruption ]
Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org>			[2.6.28.x, 2.6.29.x]
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

06c38d5e

x86/uv: fix for no memory at paddr 0 · fc61e663

由 Jack Steiner 提交于 4月 20, 2009

Fix endcase where the memory at physical address 0 does not really
exist AND one of the sockets on blade 0 has no active cpus.

The memory that _appears_ to be at physical address 0 is actually
memory that located at a different address but has been remapped by
the chipset so that it appears to be at physical address 0.

When determining the UV pnode, the algorithm for determining the pnode
incorrectly used the relocated physical address instead of the actual
(global) address.

[ Impact: boot failure on partitioned systems ]
Signed-off-by: NJack Steiner <steiner@sgi.com>
LKML-Reference: <20090420132530.GA23156@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fc61e663

20 4月, 2009 4 次提交

acpi-cpufreq: Do not let get_measured perf depend on internal variable · d876dfbb

由 Thomas Renninger 提交于 4月 17, 2009

Take already available policy->cpuinfo.max_freq and get rid of acpi-cpufreq
specific max_freq variable.

This implies that P0 is always the highest frequency which should always
be true as ACPI spec says:
As a result, the zeroth entry describes the highest performance state
Signed-off-by: NThomas Renninger <trenn@suse.de>
Acked-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

d876dfbb

T
acpi-cpufreq: style-only: add parens to math expression · d91758f5
由 Thomas Renninger 提交于 4月 17, 2009
```
Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NLen Brown <len.brown@intel.com>
```
d91758f5

acpi-cpufreq: Cleanup: Use printk_once · e0e8c4e5

由 Thomas Renninger 提交于 4月 17, 2009

Signed-off-by: NThomas Renninger <trenn@suse.de>
Signed-off-by: NLen Brown <len.brown@intel.com>

e0e8c4e5

x86, acpi_cpufreq: Fix the NULL pointer dereference in get_measured_perf · 093f13e2

由 Pallipadi, Venkatesh 提交于 4月 15, 2009

Fix for a regression that was introduced by earlier commit
18b2646f on Mon Apr 6 11:26:08 2009

Regression resulted in the below error happened on systems with
software coordination where per_cpu acpi data will not be initiated for
secondary CPUs in a P-state domain.

On Tue, 2009-04-14 at 23:01 -0700, Zhang, Yanmin wrote:
 My machine hanged with kernel 2.6.30-rc2 when script read
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor.
>
> opps happens in get_measured_perf:
>
>         cur.aperf.whole = readin.aperf.whole -
>                                 per_cpu(drv_data, cpu)->saved_aperf;
>
> Because per_cpu(drv_data, cpu)=NULL.
>
> So function get_measured_perf should check if (per_cpu(drv_data,
> cpu)==NULL)
> and return 0 if it's NULL.

--------------sys log------------------

BUG: unable to handle kernel NULL pointer dereference at
0000000000000020
IP: [<ffffffff8021af75>] get_measured_perf+0x4a/0xf9
PGD a7dd88067 PUD a7ccf5067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
CPU 0
Modules linked in: video output
Pid: 2091, comm: kondemand/0 Not tainted 2.6.30-rc2 #1 MP Server
RIP: 0010:[<ffffffff8021af75>]  [<ffffffff8021af75>]
get_measured_perf+0x4a/0xf9
RSP: 0018:ffff880a7d56de20  EFLAGS: 00010246
RAX: 0000000000000000 RBX: 00000046241a42b6 RCX: ffff88004d219000
RDX: 000000000000b660 RSI: 0000000000000020 RDI: 0000000000000001
RBP: ffff880a7f052000 R08: 00000046241a42b6 R09: ffffffff807639f0
R10: 00000000ffffffea R11: ffffffff802207f4 R12: ffff880a7f052000
R13: ffff88004d20e460 R14: 0000000000ddd5a6 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff88004d200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 0000000a7f1bf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kondemand/0 (pid: 2091, threadinfo ffff880a7d56c000, task
ffff880a7d4d18c0)
Stack:
 ffff880a7f052078 ffffffff803efd54 00000046241a42b6 000000462ffa9e95
 0000000000000001 0000000000000001 00000000ffffffea ffffffff8064f41a
 0000000000000012 0000000000000012 ffff880a7f052000 ffffffff80650547
Call Trace:
 [<ffffffff803efd54>] ? kobject_get+0x12/0x17
 [<ffffffff8064f41a>] ? __cpufreq_driver_getavg+0x42/0x57
 [<ffffffff80650547>] ? do_dbs_timer+0x147/0x272
 [<ffffffff80650400>] ? do_dbs_timer+0x0/0x272
 [<ffffffff802474ca>] ? worker_thread+0x15b/0x1f5
 [<ffffffff8024a02c>] ? autoremove_wake_function+0x0/0x2e
 [<ffffffff8024736f>] ? worker_thread+0x0/0x1f5
 [<ffffffff80249f0d>] ? kthread+0x54/0x83
 [<ffffffff8020c87a>] ? child_rip+0xa/0x20
 [<ffffffff80249eb9>] ? kthread+0x0/0x83
 [<ffffffff8020c870>] ? child_rip+0x0/0x20
Code: 99 a6 03 00 31 c9 85 c0 0f 85 c3 00 00 00 89 df 4c 8b 44 24 10 48
c7 c2 60 b6 00 00 48 8b 0c fd e0 30 a5 80 4c 89 c3 48 8b 04 0a <48> 2b
58 20 48 8b 44 24 18 48 89 1c 24 48 8b 34 0a 48 2b 46 28
RIP  [<ffffffff8021af75>] get_measured_perf+0x4a/0xf9
 RSP <ffff880a7d56de20>
CR2: 0000000000000020
---[ end trace 2b8fac9a49e19ad4 ]---
Tested-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

093f13e2

19 4月, 2009 1 次提交

lguest: fix guest crash on non-linear addresses in gdt pvops · a489f0b5

由 Rusty Russell 提交于 4月 19, 2009

Fixes guest crash 'lguest: bad read address 0x4800000 len 256'

The new per-cpu allocator ends up handing a non-linear address to
write_gdt_entry.  We do __pa() on it, and hand it to the host, which
kills us.

I've long wanted to make the hypercall "LOAD_GDT_ENTRY" to match the IDT
code, but had no pressing reason until now.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org

a489f0b5

18 4月, 2009 2 次提交

lockdep, x86: account for irqs enabled in paranoid_exit · 0300e7f1

由 Steven Rostedt 提交于 4月 17, 2009

I hit the check_flags error of lockdep:

 WARNING: at kernel/lockdep.c:2893 check_flags+0x1a7/0x1d0()
 [...]
 hardirqs last  enabled at (12567): [<ffffffff8026206a>] local_bh_enable+0xaa/0x110
 hardirqs last disabled at (12569): [<ffffffff80610c76>] int3+0x16/0x40
 softirqs last  enabled at (12566): [<ffffffff80514d2b>] lock_sock_nested+0xfb/0x110
 softirqs last disabled at (12568): [<ffffffff8058454e>] tcp_prequeue_process+0x2e/0xa0

The check_flags warning of lockdep tells me that lockdep thought interrupts
were disabled, but they were really enabled.

The numbers in the above parenthesis show the order of events:

 12566: softirqs last enabled:  lock_sock_nested
 12567: hardirqs last enabled:  local_bh_enable
 12568: softirqs last disabled: tcp_prequeue_process
 12566: hardirqs last disabled: int3

int3 is a breakpoint!

Examining this further, I have CONFIG_NET_TCPPROBE enabled which adds
break points into the kernel.

The paranoid_exit of the return of int3 does not account for enabling
interrupts on return to kernel. This code is a bit tricky since it
is also used by the nmi handler (when lockdep is off), and we must be
careful about the swapgs. We can not call kernel code after the swapgs
has been performed.

[ Impact: fix lockdep check_flags warning + self-turn-off ]
Acked-by: NPeter Zijlsta <a.p.zijlstra@chello.nl>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0300e7f1

x86: mm/numa_32.c calculate_numa_remap_pages should use __init · a81b6314

由 Jaswinder Singh Rajput 提交于 4月 17, 2009

calculate_numa_remap_pages() is called only by __init initmem_init()
further calculate_numa_remap_pages is calling:
	__init find_e820_area() and __init reserve_early()

So calculate_numa_remap_pages() should be __init calculate_numa_remap_pages().

 WARNING: arch/x86/built-in.o(.text+0x82ea3): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:find_e820_area()
 The function calculate_numa_remap_pages() references
 the function __init find_e820_area().
 This is often because calculate_numa_remap_pages lacks a __init
 annotation or the annotation of find_e820_area is wrong.

 WARNING: arch/x86/built-in.o(.text+0x82f5f): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:reserve_early()
 The function calculate_numa_remap_pages() references
 the function __init reserve_early().
 This is often because calculate_numa_remap_pages lacks a __init
 annotation or the annotation of reserve_early is wrong.

[ Impact: save memory, address Section mismatch warning ]
Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
LKML-Reference: <1239991281.3153.4.camel@ht.satnam>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a81b6314