Commits · 4efc0670baf4b14bc95502e54a83ccf639146125 · openanolis / cloud-kernel

29 May, 2009 · 29 commits

x86, mce: use 64bit machine check code on 32bit · 4efc0670

Committed by Andi Kleen on Apr 28, 2009

The 64bit machine check code is in many ways much better than
the 32bit machine check code: it is more specification compliant,
is cleaner, only has a single code base versus one per CPU,
has better infrastructure for recovery, has a cleaner way to communicate
with user space etc. etc.

Use the 64bit code for 32bit too.

This is the second attempt to do this. There was one a couple of years
ago to unify this code for 32bit and 64bit.  Back then this ran into some
trouble with K7s and was reverted.

I believe this time the K7 problems (and some others) are addressed.
I went over the old handlers and was very careful to retain
all quirks.

But of course this needs a lot of testing on old systems. On newer
64bit capable systems I don't expect many problems because they have
already been tested with the 64bit kernel.

I made this a CONFIG for now that still allows selecting the old
machine check code. This is mostly to make testing easier:
if someone runs into a problem we can ask them to try
with the CONFIG switched.

The new code is default y for more coverage.

Once there is confidence the 64bit code works well on older hardware
too the CONFIG_X86_OLD_MCE and the associated code can be easily
removed.

This causes a behaviour change for 32bit installations. They now
have to install the mcelog package to be able to log
corrected machine checks.

The 64bit machine check code only handles CPUs which support the
standard Intel machine check architecture described in the IA32 SDM.
The 32bit code has special support for some older CPUs which
have non standard machine check architectures, in particular
WinChip C3 and Intel P5.  I made those a separate CONFIG option
and kept them for now. The WinChip variant could be probably
removed without too much pain, it doesn't really do anything
interesting. P5 is also disabled by default (like it
was before) because many motherboards have it miswired, but
according to Alan Cox a few embedded setups use that one.

Forward ported/heavily changed version of old patch, original patch
included review/fixes from Thomas Gleixner, Bert Wesarg.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

4efc0670

x86, mce: remove oops_begin() use in 64bit machine check · d896a940

Committed by Andi Kleen on Apr 28, 2009

First, 32bit doesn't have oops_begin, so it's a barrier to using
this code on 32bit.

On closer examination it turns out oops_begin is not
a good idea in a machine check panic anyway. All oops_begin
does is check for recursive/parallel oopses and implement the
"wait on oops" heuristic. But there's actually no good reason
to lock machine checks against oopses or prevent them
from recursion. Also "wait on oops" does not really make
sense for a machine check either.

Replace it with a manual bust_spinlocks/console_verbose.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

d896a940

x86, mce: remove machine check handler idle notify on 64bit · 8e97aef5

Committed by Andi Kleen on Apr 28, 2009

i386 has no idle notifiers, but the 64bit machine check
code uses them to wake up mcelog from a fatal machine check
exception.

For corrected machine checks found by the poller or
threshold interrupts, going through an idle notifier is not needed
because the wake_up is just done directly. The idle notifier
is only needed for logging exceptions.

To be honest I never liked the idle notifier even though I signed
off on it. On closer investigation the code actually turned out
to be nearly dead. Right now machine check exceptions on x86 are always
unrecoverable (lead to panic due to PCC), which means we never execute
the idle notifier path.

The only exception is the somewhat weird tolerant==3 case, which
ignores PCC. I'll fix this in a future patch in a much cleaner way.

So remove the "mcelog wakeup through idle notifier" code
from 64bit.

This allows compiling the 64bit machine check handler on 32bit
which doesn't have idle notifiers.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

8e97aef5

x86, mce: move mce_disabled option into common 32bit/64bit code · d7c3c9a6

Committed by Andi Kleen on Apr 28, 2009

It's the same function, so let's share it.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

d7c3c9a6

x86, mce: rename 64bit mce_dont_init to mce_disabled · 04b2b1a4

Committed by Andi Kleen on Apr 28, 2009

Give it the same name as on 32bit. This makes further merging easier.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

04b2b1a4

x86, mce: use a call vector to call the 64bit mce handler · 5d727926

Committed by Andi Kleen on Apr 27, 2009

Allows calling different machine check handlers from the low
level machine check entry vector.

This is needed for later when it will be used for 32bit too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

5d727926
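
To illustrate the call vector idea from 5d727926, here is a minimal user-space sketch. It is not the kernel code: `struct regs_sketch`, `mc_vector`, `mc_entry`, `unexpected_mc` and `do_mc_new` are hypothetical names chosen for this example. The point is only that the low level entry calls through a function pointer, so a different handler can be installed without touching the entry path.

```c
#include <stdio.h>

/* Hypothetical stand-in for the register frame passed to a handler. */
struct regs_sketch {
    unsigned long ip;
};

/* Counts how often the replacement handler ran (for demonstration only). */
static int handled_by_new;

/* Default handler: used until a real handler has been installed. */
static void unexpected_mc(struct regs_sketch *regs, long error_code)
{
    (void)regs;
    printf("unexpected machine check, error code %ld\n", error_code);
}

/* Replacement handler that init code could install later. */
static void do_mc_new(struct regs_sketch *regs, long error_code)
{
    (void)regs;
    (void)error_code;
    handled_by_new++;
}

/* The call vector: a function pointer that selects the active handler. */
static void (*mc_vector)(struct regs_sketch *, long) = unexpected_mc;

/* Low level entry: always dispatches through the vector. */
static void mc_entry(struct regs_sketch *regs, long error_code)
{
    mc_vector(regs, error_code);
}
```

Assigning `mc_vector = do_mc_new;` before calling `mc_entry()` switches handlers; the entry function itself does not change.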

x86, mce: port K7 bank 0 quirk to 64bit mce code · 2e6f694f

Committed by Andi Kleen on Apr 27, 2009

Various K7s have broken bank 0s. Don't enable it by default.

Port from the 32bit code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

2e6f694f

x86, mce: implement the PPro bank 0 quirk in the 64bit machine check code · 06b7a7a5

Committed by Andi Kleen on Apr 27, 2009

Quoting the comment:

* SDM documents that on family 6 bank 0 should not be written
* because it aliases to another special BIOS controlled
* register.
* But it's not aliased anymore on model 0x1a+
* Don't ignore bank 0 completely because there could be a valid
* event later, merely don't write CTL0.

This is mostly a port of the 32bit code, except that 32bit
never wrote it and didn't have the 0x1a heuristic. I checked
with the CPU designers that the quirk is not required starting with
this model.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

06b7a7a5
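
The rule quoted in 06b7a7a5 can be written down as a small predicate. This is a sketch under assumptions, not the kernel implementation: `may_write_bank_ctl` is a hypothetical helper, and the caller is assumed to have already established that the CPU is an Intel part. It only encodes what the comment says: on family 6 before model 0x1a, the control register of bank 0 is left alone, while the bank itself is still consulted for events.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may init code write the control register of `bank`?
 * Assumes the caller has already checked that the vendor is Intel.
 */
static bool may_write_bank_ctl(unsigned int family, unsigned int model,
                               unsigned int bank)
{
    /*
     * Family 6 before model 0x1a: the bank 0 control register aliases a
     * BIOS controlled register, so it must not be written.
     */
    if (family == 6 && model < 0x1a && bank == 0)
        return false;

    /* All other banks, and bank 0 from model 0x1a on, are written normally. */
    return true;
}
```

Note that the predicate only gates the write to the control register; it says nothing about skipping bank 0 when reading status, which matches the "don't ignore bank 0 completely" part of the comment.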

x86, mce: initial steps to make 64bit mce code 32bit clean · 3cde5c8c

Committed by Andi Kleen on Apr 27, 2009

Replace unsigned long with u64s if they need to contain 64bit values.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

3cde5c8c
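
Why 3cde5c8c matters: on a 32bit build `unsigned long` is only 32 bits wide, so a 64bit register value stored in it loses its upper half. The sketch below shows the effect in portable C, using `uint32_t` to stand in for a 32bit `unsigned long`. The helper names `msr_value` and `as_ulong32` are made up for this example.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a 64-bit register value (high:low). */
static uint64_t msr_value(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* What survives if the value is kept in a 32-bit wide unsigned long. */
static uint32_t as_ulong32(uint64_t v)
{
    return (uint32_t)v;   /* the upper 32 bits are silently dropped */
}
```

For example, a status value with only bit 63 and bit 0 set becomes plain `1` after truncation, so any flag in the upper half is lost. Declaring such variables as `u64` avoids this on both 32bit and 64bit builds.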

x86, mce: Cleanup MCG definitions · 01c6680a

Committed by Thomas Gleixner on Apr 08, 2009

Decode more magic constants and turn them into symbols.

[ Sort definitions bitwise, introduce MCG_EXT_CNT - HS ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

01c6680a

x86, mce: Cleanup symbols in intel thermal codes · ba2d0f2b

Committed by Thomas Gleixner on Apr 08, 2009

Decode magic constants and turn them into symbols.

[ Cleanup to use symbols that already exist - HS ]

[ Impact: cleanup ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ba2d0f2b

x86, mce: print number of MCE banks · b659294b

Committed by Ingo Molnar on Apr 08, 2009

The number of MCE banks supported by a CPU is a useful number to know,
so print it out during CPU initialization.

[ Impact: add printout ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

b659294b

x86, mce: Rename sysfs variables · cb491fca

Committed by Ingo Molnar on Apr 08, 2009

Shorten variable names. This also compacts the code a bit.

	device_mce		=> mce_dev
	mce_device_initialized	=> mce_dev_initialized
	mce_attribute		=> mce_attrs

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

cb491fca

x86, mce: unify · dba3725d

Committed by Ingo Molnar on Apr 08, 2009

Move mce_64.c => mce.c and glue it up in the Makefile.
Remove mce_32.c.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

dba3725d

x86, mce: unify, prepare for 32-bit v2 · 711c2e48

Committed by Ingo Molnar on Apr 08, 2009

Prepare the 64-bit mce_64.c code side to be built on 32-bit.

[ includes ifdef relocation by Andi Kleen ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@firstfloor.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

711c2e48

x86, mce: unify, prepare codes · a988d334

Committed by Ingo Molnar on Apr 08, 2009

Move current 32-bit mce_32.c code into mce_64.c.

[ Remove unused artifact stop/restart_mce pointed out by Andi Kleen ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <ak@firstfloor.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

a988d334

x86, mce: unify Intel thermal init · a65d0862

Committed by Thomas Gleixner on Apr 08, 2009

Mechanical unification. No change in code.

[ Impact: cleanup, 32-bit / 64-bit unification ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

a65d0862

x86, mce: unify Intel thermal init, prepare · 6cc6f3eb

Committed by Thomas Gleixner on Apr 08, 2009

Prepare for unification, make the two intel_init_thermal functions equal.

[ Impact: cleanup ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

6cc6f3eb

x86, mce: clean up mce_amd_64.c · 1cb2a8e1

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

1cb2a8e1

x86, mce: clean up therm_throt.c · cb6f3c15

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

cb6f3c15

x86, mce: clean up non-fatal.c · bdbfbdd5

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

bdbfbdd5

x86, mce: clean up winchip.c · 91425084

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

91425084

x86, mce: clean up k7.c · efee4ca8

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

efee4ca8

x86, mce: clean up p6.c · ea2566ff

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ea2566ff

x86, mce: clean up p5.c · ed8bc7ed

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ed8bc7ed

x86, mce: clean up p4.c · c5aaf0e0

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

c5aaf0e0

x86, mce: clean up mce_32.c · 3b58dfd0

Committed by Ingo Molnar on Apr 08, 2009

Make the coding style match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

3b58dfd0

x86, mce: clean up mce_64.c · e9eee03e

Committed by Ingo Molnar on Apr 08, 2009

This file has been modified many times over the years, by multiple
authors, so the general style and structure have diverged in a number
of areas, making this file hard to read.

So fix the coding style to match that of the rest of the x86 arch code.

[ Impact: cleanup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

e9eee03e

x86, mce: Cleanup param parser · 13503fa9

Committed by Hidetoshi Seto on Mar 26, 2009

- Fix the comment formatting.

- The error path does not return 0, and printk lacks level and "\n".

- Move __setup("nomce") next to mcheck_disable().

- Improve readability etc.

[ Impact: cleanup ]
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <49CB3F38.7090703@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

13503fa9

27 May, 2009 · 4 commits

[CPUFREQ] powernow-k8: determine exact CPU frequency for HW Pstates · ca446d06

Committed by Andreas Herrmann on Apr 22, 2009

Slightly modified by trenn@suse.de -> only do this on fam 10h and fam 11h.

Currently powernow-k8 determines CPU frequency from ACPI PSS objects, but
according to AMD family 11h BKDG this frequency is just a rounded value:

  "CoreFreq (MHz) = The CPU COF specified by MSRC001_00[6B:64][CpuFid]
  rounded to the nearest 100 Mhz."

As a consequence powernow-k8 reports the wrong CPU frequency on some systems,
e.g. on Turion X2 Ultra:

  powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82
               processors (2 cpu cores) (version 2.20.00)
  powernow-k8:    0 : pstate 0 (2200 MHz)
  powernow-k8:    1 : pstate 1 (1100 MHz)
  powernow-k8:    2 : pstate 2 (600 MHz)

But this is wrong as frequency for Pstate2 is 550 MHz. x86info reports it
correctly:

  #x86info -a |grep Pstate
  ...
  Pstate-0: fid=e, did=0, vid=24 (2200MHz)
  Pstate-1: fid=e, did=1, vid=30 (1100MHz)
  Pstate-2: fid=e, did=2, vid=3c (550MHz) (current)

Solution is to determine the frequency directly from Pstate MSRs instead
of using rounded values from ACPI table.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>

ca446d06
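
The fid/did values in the x86info output of ca446d06 can be turned into frequencies with a short calculation. The sketch below assumes the formula as I recall it from the AMD BKDGs: 100 MHz * (CpuFid + offset) / 2^CpuDid, with offset 0x10 on family 10h and 0x08 on family 11h. The function name `pstate_mhz` is hypothetical and this is not the driver code itself.

```c
/*
 * Core frequency in MHz from a Pstate's frequency ID (fid) and divisor
 * ID (did). Assumed formula: 100 * (fid + offset) / 2^did, where the
 * offset is 0x08 on family 0x11 and 0x10 on family 0x10.
 */
static unsigned int pstate_mhz(unsigned int family, unsigned int fid,
                               unsigned int did)
{
    unsigned int offset = (family == 0x11) ? 0x08 : 0x10;

    /* Shifting right by did divides by 2^did. */
    return (100 * (fid + offset)) >> did;
}
```

With family 0x11 and the values shown above (fid=e with did=0, 1 and 2) this yields 2200, 1100 and 550 MHz, matching x86info rather than the rounded 600 MHz from the ACPI table.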

[CPUFREQ] powernow-k8 cleanup msg if BIOS does not export ACPI _PSS cpufreq data · df182977

Committed by Thomas Renninger on Apr 22, 2009

- Make the message shorter and easier to grep for
- Use printk_once instead of WARN_ONCE (functionality of these was mixed)
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>

df182977

[CPUFREQ] powernow-k7 build fix when ACPI=n · d38e73e8

Committed by Dave Jones on Apr 23, 2009

arch/x86/kernel/cpu/cpufreq/powernow-k7.c:172: warning: 'invalidate_entry' defined but not used
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>

d38e73e8

[CPUFREQ] add atom family to p4-clockmod · 43195037

Committed by Jarod Wilson on Mar 06, 2009

Some atom procs don't do freq scaling (such as the atom 330 on my own
littlefalls2 board). By adding the atom family here, we at least get
the benefit of passive cooling in a thermal emergency. Not sure how
to see that it's actually helping any, but the driver does bind and
claim it's functioning on my atom 330.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>

43195037

25 May, 2009 · 1 commit

x86: Remove remap percpu allocator for the time being · 71c9d8b6

Committed by Tejun Heo on May 25, 2009

Remap percpu allocator has a subtle bug when combined with page
attribute changing.  Remap percpu allocator aliases PMD pages for the
first chunk and as pageattr doesn't know about the alias it ends up
updating page attributes of the original mapping thus leaving the
aliases in an inconsistent state which might lead to subtle data
corruption.  Please read the following thread for more information:

  http://thread.gmane.org/gmane.linux.kernel/835783

The following is the proposed fix which teaches pageattr about percpu
aliases.

  http://thread.gmane.org/gmane.linux.kernel/837157

However, the above changes are deemed too pervasive for upstream
inclusion for 2.6.30 release, so this patch essentially disables
the remap allocator for the time being.
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <4A1A0A27.4050301@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71c9d8b6

23 May, 2009 · 1 commit

x86: introduce noxsave boot parameter · 0c752a93

Committed by Suresh Siddha on May 22, 2009

Introduce "noxsave" boot parameter which will disable the cpu's xsave/xrstor
capabilities. Useful for debugging and working around xsave related issues.

[ Impact: make it possible to debug problems in the field ]
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

0c752a93

22 May, 2009 · 1 commit

x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot · 88dff493

Committed by Zhang Rui on May 22, 2009

x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot,
see:

  http://bugzilla.kernel.org/show_bug.cgi?id=12901

[ Impact: fix hung reboot on certain systems ]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
LKML-Reference: <1242963350.32574.53.camel@rzhang-dt>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

88dff493

16 May, 2009 · 1 commit

x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126

Committed by Jeremy Fitzhardinge on May 13, 2009

Xiaohui Xin and some other folks at Intel have been looking into what's
behind the performance hit of paravirt_ops when running native.

It appears that the hit is entirely due to the paravirtualized
spinlocks introduced by:

 | commit 8efcbab6
 | Date:   Mon Jul 7 12:07:51 2008 -0700
 |
 |     paravirt: introduce a "lock-byte" spinlock implementation

The extra call/return in the spinlock path is somehow
causing an increase in the cycles/instruction of somewhere around 2-7%
(seems to vary quite a lot from test to test).  The working theory is
that the CPU's pipeline is getting upset about the
call->call->locked-op->return->return, and seems to be failing to
speculate (though I haven't seen anything definitive about the precise
reasons).  This doesn't entirely make sense, because the performance
hit is also visible on unlock and other operations which don't involve
locked instructions.  But spinlock operations clearly swamp all the
other pvops operations, even though I can't imagine that they're
nearly as common (there's only a .05% increase in instructions
executed).

If I disable just the pv-spinlock calls, my tests show that pvops is
identical to non-pvops performance on native (my measurements show that
it is actually about .1% faster, but Xiaohui shows a .05% slowdown).

Summary of results, averaging 10 runs of the "mmperf" test, using a
no-pvops build as baseline:

		nopv		Pv-nospin	Pv-spin
CPU cycles	100.00%		99.89%		102.18%
instructions	100.00%		100.10%		100.15%
CPI		100.00%		99.79%		102.03%
cache ref	100.00%		100.84%		100.28%
cache miss	100.00%		90.47%		88.56%
cache miss rate	100.00%		89.72%		88.31%
branches	100.00%		99.93%		100.04%
branch miss	100.00%		103.66%		107.72%
branch miss rt	100.00%		103.73%		107.67%
wallclock	100.00%		99.90%		102.20%

The clear effect here is that the 2% increase in CPI is
directly reflected in the final wallclock time.

(The other interesting effect is that the more ops are
out of line calls via pvops, the lower the cache access
and miss rates.  Not too surprising, but it suggests that
the non-pvops kernel is over-inlined.  On the flipside,
the branch misses go up correspondingly...)

So, what's the fix?

Paravirt patching turns all the pvops calls into direct calls, so
_spin_lock etc do end up having direct calls.  For example, the compiler
generated code for paravirtualized _spin_lock is:

<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq  *0xffffffff805a5b30
<_spin_lock+22>:	retq

The indirect call will get patched to:
<_spin_lock+0>:		mov    %gs:0xb4c8,%rax
<_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
<_spin_lock+15>:	callq <__ticket_spin_lock>
<_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
<_spin_lock+22>:	retq

One possibility is to inline _spin_lock, etc, when building an
optimised kernel (ie, when there's no spinlock/preempt
instrumentation/debugging enabled).  That will remove the outer
call/return pair, returning the instruction stream to a single
call/return, which will presumably execute the same as the non-pvops
case.  The downsides are: 1) it will replicate the
preempt_disable/enable code at each lock/unlock callsite; this code is
fairly small, but not nothing; and 2) the spinlock definitions are
already a very heavily tangled mass of #ifdefs and other preprocessor
magic, and making any changes will be non-trivial.

The other obvious answer is to disable pv-spinlocks.  Making them a
separate config option is fairly easy, and it would be trivial to
enable them only when Xen is enabled (as the only non-default user).
But it doesn't really address the common case of a distro build which
is going to have Xen support enabled, and leaves the open question of
whether the native performance cost of pv-spinlocks is worth the
performance improvement on a loaded Xen system (10% saving of overall
system CPU when guests block rather than spin).  Still it is a
reasonable short-term workaround.

[ Impact: fix pvops performance regression when running native ]
Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
Analysed-by: "Li Xin" <xin.li@intel.com>
Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Xen-devel <xen-devel@lists.xensource.com>
LKML-Reference: <4A0B62F7.5030802@goop.org>
[ fixed the help text ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

b4ecc126

15 May, 2009 · 1 commit

kgdb,i386: use address that SP register points to in the exception frame · 33ab1979

Committed by Jason Wessel on Feb 11, 2009

The treatment of the SP register is different on x86_64 and i386.
This is a regression fix that lived outside the mainline kernel from
2.6.27 to now.  The regression was a result of the original merge
consolidation of the i386 and x86_64 archs to x86.

The incorrectly reported SP on i386 prevented stack tracebacks from
working correctly in gdb.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>

33ab1979

14 May, 2009 · 1 commit

x86/function-graph: fix constraint for recording old return value · aa512a27

Committed by Steven Rostedt on May 13, 2009

After upgrading from gcc 4.2.2 to 4.4.0, the function graph tracer broke.
Investigating, I found that in the asm that replaces the return value,
gcc was using the same register for the old value as it was for the
new value.

	mov	(addr), old
	mov	new, (addr)

But if old and new are the same register, we clobber new with old!
I first thought this was a bug in gcc 4.4.0 and reported it:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40132

Andrew Pinski responded (quickly), saying that it was correct gcc behavior
and the code needed to denote old as an "early clobber".

Instead of "=r"(old), we need "=&r"(old).

[Impact: keep function graph tracer from breaking with gcc 4.4.0 ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

aa512a27
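
The early clobber fix in aa512a27 can be shown with a small stand-alone function. This is a sketch, not the tracer code: `swap_return_slot` is a hypothetical name, and the inline asm is only used on x86 builds, with a plain C fallback elsewhere. The asm writes its output before it has finished reading its inputs, which is exactly the case the `&` modifier exists for.

```c
/*
 * Store new_val at *addr and return the value that was there before.
 * Hypothetical helper illustrating the "=&r" early clobber constraint.
 */
static unsigned long swap_return_slot(unsigned long *addr,
                                      unsigned long new_val)
{
    unsigned long old;

#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        /* old = *addr: the output is written first ... */
        "mov (%1), %0\n\t"
        /* ... and the input new_val is still read afterwards. */
        "mov %2, (%1)"
        /* '&' tells gcc that old must not share a register with any input. */
        : "=&r" (old)
        : "r" (addr), "r" (new_val)
        : "memory");
#else
    /* Portable fallback with the same effect. */
    old = *addr;
    *addr = new_val;
#endif

    return old;
}
```

With plain `"=r"` the compiler is allowed to place `old` and `new_val` in the same register, in which case the first `mov` would overwrite the new value before the second `mov` stores it.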

11 May, 2009 · 1 commit

x86: mtrr: Fix high_width computation when phys-addr is >= 44bit · 917a0153

Committed by Yinghai Lu on May 06, 2009

Found one system where the cpu address line is 44 bits and the mtrr
printout is not right:

 [    0.000000] MTRR variable ranges enabled:
 [    0.000000]   0 base 0   00000000 mask FF0 00000000 write-back
 [    0.000000]   1 base 10  00000000 mask FFF 80000000 write-back
 [    0.000000]   2 base 0   80000000 mask FFF 80000000 uncachable
 [    0.000000]   3 base 0   7F800000 mask FFF FF800000 uncachable

Li Zefan and Frederic pointed out that high_width could somehow be -4.

It turns out that when phys_addr is 44 bits, size_or_mask will be
ffffffff,00000000 so ffs(size_or_mask) will be 0.

Check only the low 32 bits to get the correct high_width.
Signed-off-by: Yinghai Lu <yinghai@kerne.org>
Also-analyzed-by: Frederic Weisbecker <fweisbec@gmail.com>
Also-analyzed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zhaolei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4A026540.8060504@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

917a0153
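
The arithmetic behind 917a0153 can be reproduced in user space. The sketch below is an approximation under assumptions: a page shift of 12, a `size_or_mask` built as the inverted page-frame mask for the given physical address width, and hypothetical function names. It is not a copy of the kernel code, but it shows how `ffs()`, which takes an `int`, sees only the low 32 bits and so produces -4 for a 44 bit address width.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define PAGE_SHIFT_SKETCH 12   /* assumed 4 KiB pages */

/* Assumed construction: all page-frame bits above the address width set. */
static uint64_t size_or_mask_for(unsigned int phys_addr_bits)
{
    return ~((1ULL << (phys_addr_bits - PAGE_SHIFT_SKETCH)) - 1);
}

/* Old computation: ffs() takes an int, so the upper 32 bits are dropped. */
static int high_width_old(uint64_t size_or_mask)
{
    int bit = size_or_mask ? ffs((int)size_or_mask) - 1 : 32;

    return (bit - (32 - PAGE_SHIFT_SKETCH) + 3) / 4;
}

/* Fixed computation: look at the low 32 bits only and treat 0 as "bit 32". */
static int high_width_fixed(uint64_t size_or_mask)
{
    int bit = ffs((int)(size_or_mask & 0xffffffffULL));

    bit = bit ? bit - 1 : 32;
    return (bit - (32 - PAGE_SHIFT_SKETCH) + 3) / 4;
}
```

For 44 bits the mask is ffffffff,00000000: the old version computes (-1 - 20 + 3) / 4 = -4, while the fixed version gives (32 - 20 + 3) / 4 = 3, i.e. three hex digits for address bits 32 to 43. For a 36 bit width both versions agree on 1.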

openanolis / cloud-kernel · synced successfully nearly 2 years ago