- 12 October 2010, 1 commit
-
-
Submitted by Yinghai Lu
Russ reported SGI UV is broken recently. He said:

| The SRAT table shows that memory range is spread over two nodes.
|
| SRAT: Node 0 PXM 0 100000000-800000000
| SRAT: Node 1 PXM 1 800000000-1000000000
| SRAT: Node 0 PXM 0 1000000000-1080000000
|
| Previously, the kernel early_node_map[] would show three entries
| with the proper node.
|
| [ 0.000000] 0: 0x00100000 -> 0x00800000
| [ 0.000000] 1: 0x00800000 -> 0x01000000
| [ 0.000000] 0: 0x01000000 -> 0x01080000
|
| The problem is recent community kernel early_node_map[] shows
| only two entries with the node 0 entry overlapping the node 1
| entry.
|
|  0: 0x00100000 -> 0x01080000
|  1: 0x00800000 -> 0x01000000

After looking at the changelog, found out that it has been broken for a while by the following commit:

| commit 8716273c
| Author: David Rientjes <rientjes@google.com>
| Date: Fri Sep 25 15:20:04 2009 -0700
|
|     x86: Export srat physical topology

Before that commit, register_active_regions() is called for every SRAT memory entry right away. Use nodememblk_range[] instead of nodes[] in order to make sure we capture the actual memory blocks registered with each node. nodes[] contains an extended range which spans all memory regions associated with a node, but that does not mean that all the memory in between is included.

Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
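A small standalone demonstration of the difference described above: building each node's coverage from the per-node span (what nodes[] describes) merges the interleaved SRAT entries and produces the reported overlap, while registering the individual memblk ranges keeps all three entries. The arrays and arithmetic below are illustrative only, using the addresses from Russ's report.

```c
#include <stdio.h>
#include <stdint.h>

struct range { int node; uint64_t start, end; };

/* The three SRAT memory affinity entries from the report above. */
static const struct range memblk[] = {
	{ 0, 0x100000000ULL,  0x800000000ULL  },
	{ 1, 0x800000000ULL,  0x1000000000ULL },
	{ 0, 0x1000000000ULL, 0x1080000000ULL },
};

int main(void)
{
	uint64_t span_start[2] = { UINT64_MAX, UINT64_MAX };
	uint64_t span_end[2]   = { 0, 0 };

	/* Per-memblk registration: one early_node_map entry per SRAT entry. */
	for (unsigned i = 0; i < 3; i++)
		printf("memblk: node %d  %#llx -> %#llx\n", memblk[i].node,
		       (unsigned long long)memblk[i].start,
		       (unsigned long long)memblk[i].end);

	/* Per-node span: min(start)..max(end) over all entries of the node. */
	for (unsigned i = 0; i < 3; i++) {
		int n = memblk[i].node;
		if (memblk[i].start < span_start[n]) span_start[n] = memblk[i].start;
		if (memblk[i].end   > span_end[n])   span_end[n]   = memblk[i].end;
	}
	for (int n = 0; n < 2; n++)
		printf("span:   node %d  %#llx -> %#llx\n", n,
		       (unsigned long long)span_start[n],
		       (unsigned long long)span_end[n]);

	/* Node 0's span now covers node 1's range, which is the reported overlap. */
	return 0;
}
```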
-
- 11 October 2010, 3 commits
-
-
Submitted by Zachary Amsden
The VMCB is reset whenever we receive a startup IPI, so Linux setting the TSC back to zero happens very late in the boot process and destabilizes the TSC. Instead, just set TSC to zero once at VCPU creation time. Why the separate patch? So git-bisect is your friend. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Submitted by Zachary Amsden
On reset, VMCB TSC should be set to zero. Instead, code was setting tsc_offset to zero, which passes through the underlying TSC. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
-
Submitted by Borislav Petkov
This fixes possible cases of not collecting valid error info in the MCE error thresholding groups on F10h hardware. The current code contains a subtle problem of checking only the Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM thresholding group) in its first iteration and breaking out if the bit is cleared. But (!), this MSR contains an offset value, BlkPtr[31:24], which points to the remaining MSRs in this thresholding group which might contain valid information too. But if we bail out only after we checked the valid bit in the first MSR and not the block pointer too, we miss that other information. The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked prior to iterating over the MCI_MISCj thresholding group, irrespective of the MC4_MISC0[Valid] setting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
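A minimal standalone sketch of the iteration order the fix describes: follow BlkPtr regardless of MC4_MISC0[Valid]. The fake MSR table, the group size, and the bit layout below are illustrative stand-ins, not the kernel's mce_amd code.

```c
#include <stdint.h>
#include <stdio.h>

#define MC4_MISC0   0x413          /* first MSR of the DRAM thresholding group */
#define VALID_BIT   (1u << 31)
#define GROUP_SIZE  3

/* Fake MSR space standing in for the F10h thresholding group. */
static uint32_t fake_msr_hi(uint32_t msr)
{
	switch (msr) {
	case MC4_MISC0:      return (1u << 24);          /* Valid clear, BlkPtr = 1 */
	case MC4_MISC0 + 1:  return VALID_BIT | 0x1234;  /* extra block, valid      */
	case MC4_MISC0 + 2:  return VALID_BIT | 0x5678;  /* extra block, valid      */
	default:             return 0;
	}
}

int main(void)
{
	uint32_t hi = fake_msr_hi(MC4_MISC0);
	uint32_t blkptr = (hi >> 24) & 0xff;             /* BlkPtr[31:24] */

	/* Old behaviour: bail out on !Valid and never follow BlkPtr. */
	if (!(hi & VALID_BIT))
		printf("old code: MC4_MISC0 not valid, group skipped entirely\n");

	/* Fixed behaviour: follow BlkPtr irrespective of MC4_MISC0[Valid]. */
	for (uint32_t i = 0; blkptr && i < GROUP_SIZE - 1; i++) {
		uint32_t block = fake_msr_hi(MC4_MISC0 + blkptr + i);

		if (!(block & VALID_BIT))
			break;                           /* per-block Valid still applies */
		printf("new code: block MSR %#x is valid, info collected\n",
		       MC4_MISC0 + blkptr + i);
	}
	return 0;
}
```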
-
- 08 October 2010, 1 commit
-
-
Submitted by Jin Dongming
When the PTS feature is not supported by the CPU, the sysfs file package_power_limit_count for the package should not be generated. This patch is used for fixing missing { and }. The patch is not complete as there are other error handling problems in this function - but that can wait until the merge window. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@initel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Brown Len <len.brown@intel.com> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org> LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 06 October 2010, 1 commit
-
-
Submitted by Linus Torvalds
With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe.

Future fixups:
 - move the module list handling code into kernel/module.c where it belongs.
 - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons.

Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
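A minimal sketch of the locking shape described above: the bug-table list is only updated while the same mutex that protects the module list is held, and from generic (not arch-specific) code. The struct, list, and function names here are illustrative stand-ins, not the actual kernel/module.c code.

```c
#include <pthread.h>
#include <stdio.h>

/* Illustrative stand-ins for the kernel's module and bug-table lists. */
struct module { const char *name; struct module *next_mod; struct module *next_bug; };

static pthread_mutex_t module_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct module *modules;          /* list of loaded modules      */
static struct module *module_bug_list;  /* list used for BUG() lookups */

static void module_bug_finalize(struct module *mod)
{
	/* The list insertion is only safe because the caller holds module_mutex. */
	mod->next_bug = module_bug_list;
	module_bug_list = mod;
}

/* Generic load path: both lists change under the one lock. */
static void complete_load(struct module *mod)
{
	pthread_mutex_lock(&module_mutex);
	module_bug_finalize(mod);
	mod->next_mod = modules;
	modules = mod;
	pthread_mutex_unlock(&module_mutex);
}

int main(void)
{
	struct module m = { .name = "demo" };
	complete_load(&m);
	printf("loaded %s; bug list head: %s\n", modules->name, module_bug_list->name);
	return 0;
}
```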
-
- 05 October 2010, 1 commit
-
-
Submitted by Stefano Stabellini
If !xen_have_vector_callback, do not initialize the PV timer unconditionally, because we still don't know how many cpus are available and if there is more than one we won't be able to receive the timer interrupts on cpu > 0. This patch fixes a hang at boot when Xen does not support vector callbacks and the guest has multiple vcpus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
-
- 01 October 2010, 4 commits
-
-
Submitted by Thomas Gleixner
create_irq() returns -1 if the interrupt allocation failed, but the code checks for irq == 0. Use create_irq_nr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
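A sketch of the return-value pitfall; the two allocators below are stubs with the contracts described above (-1 on failure for the old interface, 0 on failure for create_irq_nr()), not the real io_apic code.

```c
#include <stdio.h>

/* Stand-in: the old interface signals failure with -1 ... */
static int create_irq(void) { return -1; }

/* ... while create_irq_nr() returns 0 on failure and the irq number otherwise. */
static unsigned int create_irq_nr(unsigned int from) { (void)from; return 0; }

int main(void)
{
	int irq = create_irq();
	if (irq == 0)
		printf("old check: would catch the failure here, but never fires\n");
	else
		printf("old check: irq=%d slips through and gets used despite the failure\n", irq);

	unsigned int irq2 = create_irq_nr(1);
	if (!irq2)
		printf("new check: allocation failure correctly detected\n");
	return 0;
}
```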
-
Submitted by Thomas Gleixner
free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this triggers a use after free caused by the fact that copying struct irq_cfg is done with memcpy, which copies the pointer not the cpumask. Fix both places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
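A standalone illustration of the shallow-copy problem: memcpy() duplicates the pointer inside the struct, so the "copy" shares (and can double-free) the original mask. The struct below is an illustrative stand-in for irq_cfg, not the kernel definition.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct irq_cfg_demo {
	int vector;
	unsigned long *domain;   /* plays the role of the cpumask_var_t */
};

int main(void)
{
	struct irq_cfg_demo old = { .vector = 32, .domain = calloc(1, sizeof(unsigned long)) };
	struct irq_cfg_demo new;

	/* Shallow copy: new.domain is the SAME allocation as old.domain. */
	memcpy(&new, &old, sizeof(new));
	printf("shared pointer: %p == %p\n", (void *)old.domain, (void *)new.domain);

	/* Correct copy: give the new cfg its own mask, then copy the bits. */
	new.domain = calloc(1, sizeof(unsigned long));
	memcpy(new.domain, old.domain, sizeof(unsigned long));

	/* And freeing a cfg must free its mask, or it leaks as described above. */
	free(old.domain);
	free(new.domain);
	return 0;
}
```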
-
Submitted by Pekka Enberg
If the acpi_evaluate_object() function call doesn't fail, we must kfree() output.buffer before returning from pcc_cpufreq_do_osc(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Jones <davej@redhat.com>
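A sketch of the leak pattern: the output buffer must be released on the success path, not only when the call fails. The evaluate() stub and names below are illustrative, not the ACPI API.

```c
#include <stdio.h>
#include <stdlib.h>

struct buffer { void *pointer; };

/* Stub standing in for acpi_evaluate_object(): allocates an output buffer. */
static int evaluate(struct buffer *out)
{
	out->pointer = malloc(64);
	return out->pointer ? 0 : -1;   /* 0 = success */
}

static int do_osc(void)
{
	struct buffer output = { 0 };
	int ret = evaluate(&output);

	if (ret)
		return ret;             /* failure path: nothing to release here */

	/* ... use output.pointer ... */

	free(output.pointer);           /* the fix: release the buffer on success too */
	return 0;
}

int main(void) { return do_osc(); }
```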
-
Submitted by Namhyung Kim
acpi_perf_data is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Jones <davej@redhat.com>
-
- 30 September 2010, 1 commit
-
-
Submitted by Cyrill Gorcunov
Stephane reported we've forgotten to guard the P4 platform against spurious in-flight performance IRQs. Fix it. This fixes potential spurious 'dazed and confused' NMI messages. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 29 September 2010, 2 commits
-
-
Submitted by Namhyung Kim
cpu_cstate_entry is a percpu pointer but was missing __percpu markup. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>
-
Submitted by H. Peter Anvin
After uncapping the CPUID level, we need to also re-run the CPU feature detection code. This resolves kernel bugzilla 16322. Reported-by: boris64 <bugzilla.kernel.org@boris64.net> Cc: <stable@kernel.org> v2.6.29..2.6.35 LKML-Reference: <tip-@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 27 September 2010, 1 commit
-
-
Submitted by Alexander Chumachenko
While debugging a bit_spin_lock() hang, it was tracked down to a gcc-4.4 misoptimization of the non-inlined constant_test_bit(): the address loses its volatile qualifier when 'const volatile unsigned long *addr' is cast to 'unsigned long *', and the subsequent unconditional jump to the pause (and not to the test) leads to the hang. Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields an inlined constant_test_bit() and a correct jump, thus working around the kernel bug. Other arches than asm-x86 may implement this slightly differently; 2.6.29 mitigates the misoptimization by changing the function prototype (commit c4295fbb) but probably fixing the issue itself is better. Signed-off-by: Alexander Chumachenko <ledest@gmail.com> Signed-off-by: Michael Shigorin <mike@osdn.org.ua> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
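A minimal sketch of the fix's shape: keeping the volatile qualifier in the access stops the compiler from caching or hoisting the load out of a spin loop. This is a simplified stand-in for the x86 constant_test_bit(), not the kernel header itself.

```c
#include <stdio.h>

/* Buggy shape: the volatile qualifier is cast away, so the load may be hoisted. */
static inline int test_bit_nonvolatile(int nr, const volatile unsigned long *addr)
{
	return ((1UL << (nr % (8 * sizeof(long)))) &
		(((const unsigned long *)addr)[nr / (8 * sizeof(long))])) != 0;
}

/* Fixed shape: the access stays volatile, so each call really re-reads memory. */
static inline int test_bit_volatile(int nr, const volatile unsigned long *addr)
{
	return ((1UL << (nr % (8 * sizeof(long)))) &
		(addr[nr / (8 * sizeof(long))])) != 0;
}

int main(void)
{
	volatile unsigned long word = 1UL << 3;

	/* In a spin loop like bit_spin_lock(), only the volatile read is safe. */
	printf("%d %d\n", test_bit_nonvolatile(3, &word), test_bit_volatile(3, &word));
	return 0;
}
```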
-
- 25 September 2010, 1 commit
-
-
Submitted by Jan Beulich
Using cpuid_eax() to determine feature availability on other than the current CPU is invalid. And feature availability should also be checked in the hotplug code path. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
-
- 24 September 2010, 1 commit
-
-
Submitted by Robert Richter
Some cpus still deliver spurious interrupts after disabling a counter. This caused 'undelivered NMI' messages. This patch fixes this. Introduced by: 4177c42a: perf, x86: Try to handle unknown nmis with an enabled PMU Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Don Zickus <dzickus@redhat.com> Cc: gorcunov@gmail.com <gorcunov@gmail.com> Cc: fweisbec@gmail.com <fweisbec@gmail.com> Cc: ying.huang@intel.com <ying.huang@intel.com> Cc: ming.m.lin@intel.com <ming.m.lin@intel.com> Cc: yinghai@kernel.org <yinghai@kernel.org> Cc: andi@firstfloor.org <andi@firstfloor.org> Cc: eranian@google.com <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100915162034.GO13563@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 23 September 2010, 5 commits
-
-
Submitted by Joerg Roedel
In the __unmap_single function the dma_addr is rounded down to a page boundary before the dma pages are unmapped. The address is later also used to flush the TLB entries for that mapping. But without the offset into the dma page the amount of pages to flush might be miscalculated in the TLB flushing path. This patch fixes this bug by using the original address to flush the TLB. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
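A worked example of the miscalculation: the number of pages a mapping spans has to be computed from the original (unaligned) dma address, not from the address after rounding down. The helper below mirrors the usual "offset into the first page plus length" arithmetic and is an illustration, not the driver code.

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Pages spanned by [addr, addr + len). */
static unsigned long num_pages(unsigned long addr, unsigned long len)
{
	unsigned long size = (addr & (PAGE_SIZE - 1)) + len;

	return (size + PAGE_SIZE - 1) / PAGE_SIZE;
}

int main(void)
{
	unsigned long dma_addr = 0x1ff0;   /* 16 bytes before a page boundary */
	unsigned long size     = 0x20;     /* mapping straddles two pages     */
	unsigned long aligned  = dma_addr & ~(PAGE_SIZE - 1);

	/* Flushing from the rounded-down address under-counts the pages ... */
	printf("flush from aligned addr:  %lu page(s)\n", num_pages(aligned, size));
	/* ... while the original address covers the whole mapping. */
	printf("flush from original addr: %lu page(s)\n", num_pages(dma_addr, size));
	return 0;
}
```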
-
Submitted by Joerg Roedel
This patch adds a workaround for an IOMMU BIOS problem to the AMD IOMMU driver. The result of the bug is that the IOMMU does not execute commands anymore when the system comes out of the S3 state, resulting in system failure. The bug in the BIOS is that it does not restore certain hardware specific registers correctly. This workaround reads out the contents of these registers at boot time and restores them on resume from S3. The workaround is limited to the specific IOMMU chipset where this problem occurs. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
Submitted by Joerg Roedel
This patch moves the setting of the configuration and feature flags out of the acpi table parsing path and moves it into the iommu-enable path. This is needed to reliably fix resume-from-s3. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
-
Submitted by Steven Rostedt
The guest can use the paravirt clock in kvmclock.c which is used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for kvmclock.o. Cc: stable@kernel.org Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
Submitted by Jeremy Fitzhardinge
When using a paravirt clock, pvclock.c can be used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for pvclock.o. Cc: stable@kernel.org Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4C9A9A3F.4040201@goop.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
-
- 22 September 2010, 2 commits
-
-
Submitted by Yinghai Lu
earlyprintk can take an I/O port, so we need to handle this case in the setup code too, otherwise 0x3f8 will be treated as a baud rate. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C7B05A6.4010801@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
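A sketch of the argument handling: a serial token that starts with "0x" names an I/O port and must not be parsed as a baud rate (e.g. "earlyprintk=serial,0x3f8,115200"). The parsing below is illustrative, not the actual setup code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void parse_earlyprintk(const char *arg)
{
	unsigned int port = 0x3f8;         /* default ttyS0 */
	unsigned int baud = 9600;

	if (!strncmp(arg, "serial,", 7)) {
		const char *p = arg + 7;

		if (!strncmp(p, "0x", 2)) {
			/* The token is an I/O port, not a baud rate. */
			char *end;
			port = (unsigned int)strtoul(p, &end, 16);
			p = end;
			if (*p == ',')
				p++;
		}
		if (*p)
			baud = (unsigned int)strtoul(p, NULL, 10);
	}
	printf("port=%#x baud=%u\n", port, baud);
}

int main(void)
{
	parse_earlyprintk("serial,0x3f8,115200");   /* port given explicitly */
	parse_earlyprintk("serial,115200");         /* only a baud rate      */
	return 0;
}
```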
-
Submitted by Yinghai Lu
Torsten reported that there is garbage output after commit 8fee13a4 (x86, setup: enable early console output from the decompressor). It turns out we missed the offset for that case. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C7B0578.8090807@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 21 September 2010, 2 commits
-
-
Submitted by Jiri Olsa
This patch adds CPU type detection for the dunnington processor (Family 6 / Model 29) to be identified as core 2 family cpu type (wikipedia source). I tested oprofile on an Intel(R) Xeon(R) CPU E7440 reporting itself as model 29, and it runs without an issue. Spec: http://www.intel.com/Assets/en_US/PDF/specupdate/320336.pdf Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
-
Submitted by Rusty Russell
We used to have a hypercall which reloaded the entire GDT, then we switched to one which loaded a single entry (to match the IDT code). Some comments were not updated, so fix them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported by: Eviatar Khen <eviatarkhen@gmail.com>
-
- 17 September 2010, 1 commit
-
-
Submitted by Frederic Weisbecker
Lengths and types of breakpoints are encoded in a half byte into CPU registers. However when we extract these values and store them, we add a high half byte part to them: 0x40 to the length and 0x80 to the type. When that gets reloaded to the CPU registers, the high part is masked. While making the instruction breakpoints available for perf, I zapped that high part on instruction breakpoint encoding and that broke the arch -> generic translation used by ptrace instruction breakpoints. Writing dr7 to set an inst breakpoint was then failing. There is no apparent reason for these high parts so we could get rid of them altogether. That's an invasive change though so let's do that later and for now fix the problem by restoring that inst breakpoint high part encoding in this sole patch. Reported-by: Kelvie Wong <kelvie@ieee.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com>
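A sketch of the encoding scheme described above: the arch-side constants carry a high half-byte tag (0x40 for lengths, 0x80 for types) that is masked off when the bits go into DR7, so the arch-to-generic translation only works when the tagged form is used consistently. The exact constant values and bit positions below are assumptions for illustration, not quoted from the x86 header.

```c
#include <stdio.h>

/* Illustrative tags: DR7 itself keeps only the low nibble of each value. */
#define TAG_LEN   0x40
#define TAG_TYPE  0x80

#define LEN_2     (TAG_LEN  | 0x1)
#define TYPE_WR   (TAG_TYPE | 0x1)

/* arch value -> raw DR7 bits: the high tag is masked away */
static unsigned int to_dr7_bits(unsigned int v) { return v & 0xf; }

/* raw DR7 bits -> arch value: the tag must be put back for the generic layer */
static unsigned int from_dr7_len(unsigned int bits)  { return TAG_LEN  | bits; }
static unsigned int from_dr7_type(unsigned int bits) { return TAG_TYPE | bits; }

int main(void)
{
	unsigned int len = LEN_2, type = TYPE_WR;

	/* Assumed layout: type bits at 16-17, length bits at 18-19 for slot 0. */
	unsigned int dr7 = (to_dr7_bits(type) << 16) | (to_dr7_bits(len) << 18);
	printf("dr7 control bits: %#x\n", dr7);

	/* Dropping the tag on one side (as the regression did for instruction
	 * breakpoints) makes the arch -> generic translation reject the value. */
	printf("decoded len=%#x type=%#x\n",
	       from_dr7_len((dr7 >> 18) & 0x3), from_dr7_type((dr7 >> 16) & 0x3));
	return 0;
}
```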
-
- 16 September 2010, 1 commit
-
-
Submitted by Patrick Simmons
This patch adds CPU type detection for the Intel Celeron 540, which is part of the Core 2 family according to Wikipedia; the family and ID pair is absent from the Volume 3B table referenced in the source code comments. I have tested this patch on an Intel Celeron 540 machine reporting itself as Family 6 Model 22, and OProfile runs on the machine without issue. Spec: http://download.intel.com/design/mobile/SPECUPDT/317667.pdf Signed-off-by: Patrick Simmons <linuxrocks123@netscape.net> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
-
- 15 September 2010, 4 commits
-
-
Submitted by Roland McGrath
In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a 32-bit tracee in system call entry. A %rax value set via ptrace at the entry tracing stop gets used whole as a 32-bit syscall number, while we only check the low 32 bits for validity. Fix it by truncating %rax back to 32 bits after syscall_trace_enter, in addition to testing the full 64 bits as has already been added. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Submitted by H. Peter Anvin
On 64 bits, we always, by necessity, jump through the system call table via %rax. For 32-bit system calls, in theory the system call number is stored in %eax, and the code was testing %eax for a valid system call number. At one point we loaded the stored value back from the stack to enforce zero-extension, but that was removed in checkin d4d67150. An actual 32-bit process will not be able to introduce a non-zero-extended number, but it can happen via ptrace. Instead of re-introducing the zero-extension, test what we are actually going to use, i.e. %rax. This only adds a handful of REX prefixes to the code. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>
-
Submitted by H. Peter Anvin
compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_alloc_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>
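A sketch of the wrapper shape described above: the generic function calls the renamed arch helper, then validates the area and the requested length itself, so callers get NULL instead of an unchecked pointer. The stubs, the size limit, and the fake user area below are assumptions for illustration, not the kernel implementation.

```c
#include <stddef.h>
#include <stdio.h>

#define COMPAT_MAX_ALLOC  (8 * 1024)   /* illustrative sanity limit */

/* Stand-in for the arch-specific allocator (renamed arch_... in the patch). */
static void *arch_compat_alloc_user_space(size_t len)
{
	static char fake_user_area[8 * 1024];
	return len <= sizeof(fake_user_area) ? fake_user_area : NULL;
}

/* Stand-in for access_ok() on the returned range. */
static int access_ok_stub(const void *ptr, size_t len)
{
	return ptr != NULL && len <= COMPAT_MAX_ALLOC;
}

/* Generic wrapper: incorporates the access check and a length sanity check. */
static void *compat_alloc_user_space(size_t len)
{
	void *ptr;

	if (len > COMPAT_MAX_ALLOC)
		return NULL;

	ptr = arch_compat_alloc_user_space(len);
	if (!access_ok_stub(ptr, len))
		return NULL;

	return ptr;
}

int main(void)
{
	printf("small: %p\n", compat_alloc_user_space(128));
	printf("huge:  %p\n", compat_alloc_user_space((size_t)1 << 20));
	return 0;
}
```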
-
Submitted by Thomas Gleixner
This more or less reverts commits 08be9796 (x86: Force HPET readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict read back to affected ATI chipsets) to the status of commit 8da854cb (x86, hpet: Erratum workaround for read after write of HPET comparator). The delta to commit 8da854cb is mostly comments and the change from WARN_ONCE to printk_once as we know the call path of this function already.

This needs really in depth explanation:

First of all the HPET design is a complete failure. Having a counter compare register which generates an interrupt on matching values forces the software to do at least one superfluous readback of the counter register. While it is nice in theory to program "absolute" time events it is practically useless because the timer runs at some absurd frequency which can never be matched to real world units. So we are forced to calculate a relative delta and this forces a readout of the actual counter value, adding the delta and programming the compare register. When the delta is small enough we run into the danger that we program a compare value which is already in the past. Due to the compare for equal nature of HPET we need to read back the counter value after writing the compare register (btw. this is necessary for absolute timeouts as well) to make sure that we did not miss the timer event. We try to work around that by setting the minimum delta to a value which is larger than the theoretical time which elapses between the counter readout and the compare register write, but that's only true in theory. A NMI or SMI which hits between the readout and the write can easily push us beyond that limit. This would result in waiting for the next HPET timer interrupt until the 32bit wraparound of the counter happens which takes about 306 seconds.

So we designed the next event function to look like:

   match = read_cnt() + delta;
   write_compare_ref(match);
   return read_cnt() < match ? 0 : -ETIME;

At some point we got into trouble with certain ATI chipsets. Even the above "safe" procedure failed. The reason was that the write to the compare register was delayed probably for performance reasons. The theory was that they wanted to avoid the synchronization of the write with the HPET clock, which is understandable. So the write does not hit the compare register directly, instead it goes to some intermediate register which is copied to the real compare register in sync with the HPET clock. That opens another window for hitting the dreaded "wait for a wraparound" problem.

To work around that "optimization" we added a read back of the compare register which either enforced the update of the just written value or just delayed the readout of the counter enough to avoid the issue. We unfortunately never got any affirmative info from ATI/AMD about this. One thing is sure, that we nuked the performance "optimization" that way completely and I'm pretty sure that the result is worse than before some HW folks came up with those.

Just for paranoia reasons I added a check whether the read back compare register value was the same as the value we wrote right before. That paranoia check triggered a couple of years after it was added on an Intel ICH9 chipset. Venki added a workaround (commit 8da854cb) which was reading the compare register twice when the first check failed. We considered this to be a penalty in general and restricted the readback (thus the wasted CPU cycles) to the known to be affected ATI chipsets. This turned out to be an utterly wrong decision. 2.6.35 testers experienced massive problems and finally one of them bisected it down to commit 30a564be which spurred some further investigation. Finally we got confirmation that the write to the compare register can be delayed by up to two HPET clock cycles which explains the problems nicely.

All we can do about this is to go back to Venki's initial workaround in a slightly modified version. Just for the record I need to say, that all of this could have been avoided if hardware designers and of course the HPET committee would have thought about the consequences for a split second. It's out of my comprehension why designing a working timer is so hard. There are two ways to achieve it:

1) Use a counter wrap around aware compare_reg <= counter_reg implementation instead of the easy compare_reg == counter_reg
   Downsides:
    - It needs more silicon.
    - It needs a readout of the counter to apply a relative timeout. This is necessary as the counter does not run in any useful (and adjustable) frequency and there is no guarantee that the counter which is used for timer events is the same which is used for reading the actual time (and therefore for calculating the delta)
   Upsides:
    - None

2) Use a simple down counter for relative timer events
   Downsides:
    - Absolute timeouts are not possible, which is not a problem at all in the context of an OS and the expected max. latencies/jitter (also see Downsides of #1)
   Upsides:
    - It needs less or equal silicon.
    - It works ALWAYS
    - It is way faster than a compare register based solution (One write versus one write plus at least one and up to four reads)

I would not be so grumpy about all of this, if I would not have been ignored for many years when pointing out these flaws to various hardware folks. I really hate timers (at least those which seem to be designed by janitors). Though finally we got a reasonable explanation plus a solution and I want to thank all the folks involved in chasing it down and providing valuable input to this.

Bisected-by: Nix <nix@esperi.org.uk> Reported-by: Artur Skawina <art.08.09@gmail.com> Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
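A standalone sketch of the next-event sequence the changelog spells out, with the compare-register readback reinstated between the write and the final counter check. The register stubs, the minimum-delta constant, and the way "time passes" are all illustrative, not the real hpet.c.

```c
#include <stdint.h>
#include <stdio.h>

#define ETIME          62
#define HPET_MIN_CHECK 8    /* illustrative minimum remaining delta */

/* Stubbed HPET: the counter advances a little on every read. */
static uint32_t counter, compare;
static uint32_t hpet_read_counter(void)        { return counter += 3; }
static uint32_t hpet_read_compare(void)        { return compare; }
static void     hpet_write_compare(uint32_t v) { compare = v; }

static int hpet_next_event(uint32_t delta)
{
	uint32_t cnt = hpet_read_counter();
	uint32_t match = cnt + delta;

	hpet_write_compare(match);

	/*
	 * Read the compare register back: on the affected chipsets the write
	 * can be delayed, and the readback either forces it through or delays
	 * us long enough that the counter check below is meaningful.
	 */
	(void)hpet_read_compare();

	/*
	 * Compare-for-equal hardware: if the match value is already behind
	 * the counter we would wait for a full 32-bit wraparound, so report
	 * -ETIME and let the caller retry with a larger delta.
	 */
	return (int32_t)(match - hpet_read_counter()) < HPET_MIN_CHECK ? -ETIME : 0;
}

int main(void)
{
	printf("large delta: %d\n", hpet_next_event(1000));  /* programmed fine */
	printf("tiny delta:  %d\n", hpet_next_event(2));     /* already in past */
	return 0;
}
```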
-
- 14 September 2010, 2 commits
-
-
The arch/x86/Makefile uses scripts/gcc-x86_$(BITS)-has-stack-protector.sh to check if cc1 supports -fstack-protector. When -fPIE is passed to cc1, these scripts fail, causing stack protection to be disabled even when it is available. This fix is similar to commit c47efe55. Reported-by: Kai Dietrich <mail@cleeus.de> Signed-off-by: Magnus Granberg <zorry@gentoo.org> LKML-Reference: <20100913101319.748A1148E216@opensource.dyc.edu> Signed-off-by: Anthony G. Basile <basile@opensource.dyc.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Submitted by Tetsuo Handa
Gcc 3.x generates the warning "arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has': arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints" on each file. But static_cpu_has() for gcc 3.x does not need __static_cpu_has(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> LKML-Reference: <201008300127.o7U1RC6Z044051@www262.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- 11 September 2010, 2 commits
-
-
Submitted by Peter Zijlstra
Doh, a real life genuine preemption leak.. This caused a suspend failure. Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nico Schottelius <nico-linux-20100709@schottelius.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Florian Pritz <flo@xssn.at> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: <stable@kernel.org> # Greg, please apply after: cd7240c0 ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states") LKML-Reference: <1284150773.402.122.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Submitted by Peter Zijlstra
A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 September 2010, 1 commit
-
-
Submitted by Jack Steiner
Fix calculation of "max_pnode" for systems where the highest blade has neither cpus nor memory. (And, yes, although rare this does occur). Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100910150808.GA19802@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 09 September 2010, 3 commits
-
-
Submitted by Gleb Natapov
The top of the kvm_kpic_state structure should have the same memory layout as kvm_pic_state since it is copied by memcpy. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
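A small illustration of the layout requirement: when one struct is copied into the head of another with memcpy(), the wrapper's leading members must mirror the copied struct exactly, with any kernel-only members strictly after them. The struct contents here are simplified stand-ins for kvm_pic_state/kvm_kpic_state, not the real definitions.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Simplified stand-in for the userspace-visible PIC state. */
struct pic_state {
	unsigned char last_irr;
	unsigned char irr;
	unsigned char imr;
	unsigned char isr;
};

/* Kernel-side wrapper: the shared prefix must match the struct above exactly. */
struct kpic_state {
	struct pic_state pics[1];   /* copied with memcpy()                   */
	int wakeup_needed;          /* kernel-only members come strictly after */
};

int main(void)
{
	struct pic_state user = { .irr = 0x5, .imr = 0xf0 };
	struct kpic_state kernel = { 0 };

	/* Copying sizeof(pic_state) bytes only works because the prefix matches. */
	memcpy(&kernel, &user, sizeof(user));

	printf("irr=%#x imr=%#x (kernel-only data starts at offset %zu)\n",
	       kernel.pics[0].irr, kernel.pics[0].imr,
	       offsetof(struct kpic_state, wakeup_needed));
	return 0;
}
```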
-
Submitted by Avi Kivity
If there are no vcpus, found will be NULL. Check before doing anything with it. Signed-off-by: Avi Kivity <avi@redhat.com>
-
Submitted by Avi Kivity
operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
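A sketch of the union change: on a 32-bit build, unsigned long is only 32 bits, so cmpxchg8b emulation needs explicit 64-bit members. The field names follow the changelog; the rest of the emulator's struct operand is omitted.

```c
#include <stdint.h>
#include <stdio.h>

/* Trimmed-down operand: val/orig_val are only as wide as unsigned long. */
struct operand {
	union {
		unsigned long val;
		uint64_t val64;        /* added so cmpxchg8b can carry 64 bits */
	};
	union {
		unsigned long orig_val;
		uint64_t orig_val64;
	};
};

int main(void)
{
	struct operand op = { 0 };

	/* A cmpxchg8b operand on i386 would be truncated through .val ... */
	op.val = (unsigned long)0x1122334455667788ULL;
	/* ... but survives intact through the 64-bit union member. */
	op.val64 = 0x1122334455667788ULL;

	printf("sizeof(unsigned long)=%zu, val=%#lx, val64=%#llx\n",
	       sizeof(unsigned long), op.val, (unsigned long long)op.val64);
	return 0;
}
```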
-