提交 · d3f7eae182b04997be19343a23f7009170f4f7a5 · openanolis / cloud-kernel

12 8月, 2007 3 次提交

i386: Use global flag to disable broken local apic timer on AMD CPUs. · d3f7eae1

由 Andi Kleen 提交于 8月 10, 2007

The Averatec 2370 and some other Turion laptop BIOS seems to program the
ENABLE_C1E MSR inconsistently between cores. This confuses the lapic
use heuristics because when C1E is enabled anywhere it seems to affect
the complete chip.

Use a global flag instead of a per cpu flag to handle this.
If any CPU has C1E enabled disabled lapic use.

Thanks to Cal Peake for debugging.

Cc: tglx@linutronix.de
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d3f7eae1

i386: Make patching more robust, fix paravirt issue · ab144f5e

由 Andi Kleen 提交于 8月 10, 2007

Commit 19d36ccd "x86: Fix alternatives
and kprobes to remap write-protected kernel text" uses code which is
being patched for patching.

In particular, paravirt_ops does patching in two stages: first it
calls paravirt_ops.patch, then it fills any remaining instructions
with nop_out().  nop_out calls text_poke() which calls
lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
that call site is one of the places we patch.

If we always do patching as one single call to text_poke(), we only
need make sure we're not patching the memcpy in text_poke itself.
This means the prototype to paravirt_ops.patch needs to change, to
marshal the new code into a buffer rather than patching in place as it
does now.  It also means all patching goes through text_poke(), which
is known to be safe (apply_alternatives is also changed to make a
single patch).

AK: fix compilation on x86-64 (bad rusty!)
AK: fix boot on x86-64 (sigh)
AK: merged with other patches
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ab144f5e

finish i386 and x86-64 sysdata conversion · 73c59afc

由 Muli Ben-Yehuda 提交于 8月 10, 2007

This patch finishes the i386 and x86-64 ->sysdata conversion and hopefully
also fixes Riku's and Andy's observed bugs.  It is based on Yinghai Lu's
and Andy Whitcroft's patches (thanks!) with some changes:

- introduce pci_scan_bus_with_sysdata() and use it instead of
  pci_scan_bus() where appropriate. pci_scan_bus_with_sysdata() will
  allocate the sysdata structure and then call pci_scan_bus().
- always allocate pci_sysdata dynamically. The whole point of this
  sysdata work is to make it easy to do root-bus specific things
  (e.g., support PCI domains and IOMMU's). I dislike using a default
  struct pci_sysdata in some places and a dynamically allocated
  pci_sysdata elsewhere - the potential for someone indavertantly
  changing the default structure is too high.
- this patch only makes the minimal changes necessary, i.e., the NUMA node is
  always initialized to -1. Patches to do the right thing with regards
  to the NUMA node can build on top of this (either add a 'node'
  parameter to pci_scan_bus_with_sysdata() or just update the node
  when it becomes known).

The patch was compile tested with various configurations (e.g., NUMAQ,
VISWS) and run-time tested on i386 and x86-64.  Unfortunately none of my
machines exhibited the bugs so caveat emptor.

Andy, could you please see if this fixes the NUMA issues you've seen?
Riku, does this fix "pci=noacpi" on your laptop?
Signed-off-by: NMuli Ben-Yehuda <muli@il.ibm.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: <riku.seppala@kymp.net>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73c59afc

01 8月, 2007 2 次提交

revert "x86, serial: convert legacy COM ports to platform devices" · 57d4810e

由 Andrew Morton 提交于 7月 31, 2007

Revert 7e92b4fc.  It broke Sébastien Dugué's
machine and Jeff said (persuasively)

  This seems like it will break decades-long-working stuff, in favor of
  breaking new ground in our favorite area, "trusting the BIOS."

  It's just not worth it for serial ports, IMO.  Serial ports are something
  that just shouldn't break at this late stage in the game.  My new Intel
  platform boxes don't even have serial ports, so I question the value of
  messing with serial port probing even more...  because...  just wait a year,
  and your box won't have a serial port either!  :)

  I certainly don't object to the use of platform devices (or isa_driver),
  but the probe change seems questionable.  That's sorta analagous to
  rewriting the floppy driver probe routine.  Sure you could do it...  but why
  risk all that damage and go through debugging all over again?

  It seems clear from this report that we cannot, should not, trust BIOS for
  something (a) so simple and (b) that has been working for over a decade.

Much discussion ensued and we've decided to have another go at all of this.

Cc: Sébastien Dugué <sebastien.dugue@bull.net>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <ambx1@neo.rr.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jeff Garzik <jeff@garzik.org>
Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Michal Piotrowski <michal.k.k.piotrowski@gmail.com>
Cc: Sascha Sommer <saschasommer@freenet.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57d4810e

remove unused TIF_NOTIFY_RESUME flag · a583f1b5

由 Stephane Eranian 提交于 7月 31, 2007

Remove unused TIF_NOTIFY_RESUME flag for all processor architectures.  The
flag was not used excecpt on IA-64 where the patch replaces it with
TIF_PERFMON_WORK.
Signed-off-by: Nstephane eranian <eranian@hpl.hp.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a583f1b5

30 7月, 2007 1 次提交

Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION · b0cb1a19

由 Rafael J. Wysocki 提交于 7月 29, 2007

Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b0cb1a19

26 7月, 2007 2 次提交

[x86 setup] Make struct ist_info cross-architecture, and use in setup code · 238b706d

由 H. Peter Anvin 提交于 7月 18, 2007

Make "struct ist_info" valid on both i386 and x86-64, and use the
structure by name in the setup code.  Additionally, "Intel SpeedStep
IST" is redundant, refer to it as IST consistently.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

238b706d

[x86 setup] Fix typos in struct efi_info · f77b1ab3

由 H. Peter Anvin 提交于 7月 18, 2007

Fix missing letters in the structure members of struct efi_info.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

f77b1ab3

25 7月, 2007 1 次提交

ACPI: Kconfig: remove CONFIG_ACPI_SLEEP from source · e8b2fd01

由 Len Brown 提交于 7月 24, 2007

As it was a synonym for (CONFIG_ACPI && CONFIG_X86),
the ifdefs for it were more clutter than they were worth.

For ia64, just add a few stubs in anticipation of future
S3 or S4 support.
Signed-off-by: NLen Brown <len.brown@intel.com>

e8b2fd01

23 7月, 2007 4 次提交

i386: Use patchable lock prefix in set_64bit · e05aff85

由 Andi Kleen 提交于 7月 22, 2007

Previously lock was unconditionally used, but shouldn't be needed on
UP systems.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e05aff85

x86: Replace NSC/Cyrix specific chipset access macros by inlined functions. · f25f64ed

由 Juergen Beisert 提交于 7月 22, 2007

Due to index register access ordering problems, when using macros a line
like this fails (and does nothing):

	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);

With inlined functions this line will work as expected.

Note about a side effect: Seems on Geode GX1 based systems the
"suspend on halt power saving feature" was never enabled due to this
wrong macro expansion. With inlined functions it will be enabled, but
this will stop the TSC when the CPU runs into a HLT instruction.
Kernel output something like this:
	Clocksource tsc unstable (delta = -472746897 ns)

This is the 3rd version of this patch.

 - Adding missed arch/i386/kernel/cpu/mtrr/state.c
	Thanks to Andres Salomon
 - Adding some big fat comments into the new header file
 	Suggested by Andi Kleen

AK: fixed x86-64 compilation
Signed-off-by: NJuergen Beisert <juergen@kreuzholzen.de>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f25f64ed

x86: Stop MCEs and NMIs during code patching · 8f4e956b

由 Andi Kleen 提交于 7月 22, 2007

When a machine check or NMI occurs while multiple byte code is patched
the CPU could theoretically see an inconsistent instruction and crash.
Prevent this by temporarily disabling MCEs and returning early in the
NMI handler.

Based on discussion with Mathieu Desnoyers.

Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8f4e956b

x86: Fix alternatives and kprobes to remap write-protected kernel text · 19d36ccd

由 Andi Kleen 提交于 7月 22, 2007

Reenable kprobes and alternative patching when the kernel text is write
protected by DEBUG_RODATA

Add a general utility function to change write protected text.  The new
function remaps the code using vmap to write it and takes care of CPU
synchronization.  It also does CLFLUSH to make icache recovery faster.

There are some limitations on when the function can be used, see the
comment.

This is a newer version that also changes the paravirt_ops code.
text_poke also supports multi byte patching now.

Contains bug fixes from Zach Amsden and suggestions from Mathieu
Desnoyers.

Cc: Jan Beulich <jbeulich@novell.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

19d36ccd

22 7月, 2007 15 次提交

x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata · 08f1c192

由 Muli Ben-Yehuda 提交于 7月 22, 2007

This patch introduces struct pci_sysdata to x86 and x86-64, and
converts the existing two users (NUMA, Calgary) to use it.

This lays the groundwork for having other users of sysdata, such as
the PCI domains work.

The Calgary bits are tested, the NUMA bits just look ok.
Signed-off-by: NJeff Garzik <jeff@garzik.org>
Signed-off-by: NMuli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

08f1c192

i386: basic infrastructure support for AMD geode-class machines · f62e5184

由 Andres Salomon 提交于 7月 21, 2007

This builds upon the existing geode infrastructure, but adds southbridge
support, some GPIO functions, and a header file (asm-i386/geode.h) with some
useful GX/LX detection tests.

The majority of this code was written by Jordan Crouse.
Signed-off-by: NJordan Crouse <jordan.crouse@amd.com>
Signed-off-by: NAndres Salomon <dilinger@debian.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f62e5184

i386: move PIT function declarations and constants to correct header file · a2900975

由 Thomas Gleixner 提交于 7月 21, 2007

setup_pit_timer is declared in asm-i386/timer.h.  Move it to the pit header
file, so it can be used by x86_64 as well.

Move also the PIT constants.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2900975

i386: replace hard-coded constant with appropriate macro from kernel.h · 48dd9343

由 Robert P. J. Day 提交于 7月 21, 2007

Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

48dd9343

i386: add cpu_relax() to cmos_lock() · 267eb01a

由 Andreas Mohr 提交于 7月 21, 2007

Add cpu_relax() to cmos_lock() inline function for faster operation on SMT
CPUs and less power consumption on others in case of lock contention (which
probably doesn't happen too often, so admittedly this patch is not too
exciting).

[akpm@linux-foundation.org: Include the header file for cpu_relax()]
Signed-off-by: NAndreas Mohr <andi@lisas.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

267eb01a

i386: do not restore reserved memory after hibernation · 1c10070a

由 Rafael J. Wysocki 提交于 7月 21, 2007

On some systems the ACPI NVS area is located in the first 1 MB of RAM and
it is overwritten by the i386 code during the restore after hibernation.
This confuses the ACPI platform firmware that doesn't update the AC adapter
status appropriately as a result
(http://bugzilla.kernel.org/show_bug.cgi?id=7995).

The solution is to register the reserved memory in the first 1 MB as
'nosave', so that swsusp doesn't touch it during the restore.  Also, this
has been done on x86_64 for a long time now, so this patch makes the i386
restore code behave like the x86_64 one.

[akpm@linux-foundation.org: build fix]
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NPavel Machek <pavel@ucw.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1c10070a

x86: remove support for the Rise CPU · 9596017e

由 Adrian Bunk 提交于 7月 21, 2007

The Rise CPUs were only very short-lived, and there are no reports of
anyone both owning one and running Linux on it.

Googling for the printk string "CPU: Rise iDragon" didn't find any dmesg
available online.

If it turns out that against all expectations there are actually users
reverting this patch would be easy.

This patch will make the kernel images smaller by a few bytes for all
i386 users.
Signed-off-by: NAdrian Bunk <bunk@stusta.de>
Acked-by: NDave Jones <davej@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9596017e

i386: add reference to the arguments · 45902954

由 Andrew Morton 提交于 7月 21, 2007

Prevent stuff like this:

mm/vmalloc.c: In function 'unmap_kernel_range':
mm/vmalloc.c:75: warning: unused variable 'start'

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45902954

x86: PM_TRACE support · 44bf4cea

由 Nigel Cunningham 提交于 7月 21, 2007

Signed-off-by: NNigel Cunningham <nigel@nigel.suspend2.net>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

44bf4cea

i386: fix machine rebooting · 8b937898

由 Truxton Fulton 提交于 7月 21, 2007

Commit 59f4e7d5 fixed machine rebooting
on Truxton's machine (when no keyboard was present).  But it broke it on
Lee's machine.

The patch reinstates the old (pre-59f4e7d5)
code and if that doesn't work out, try the new,
post-59f4e7d5 code instead.

Cc: Lee Garrett <lee-in-berlin@web.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b937898

i386: minor nx handling adjustment · d5321abe

由 Jan Beulich 提交于 7月 21, 2007

Constrain __supported_pte_mask and NX handling to just the PAE kernel.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5321abe

x86: share hpet.h with i386 · 0655d7c3

由 Thomas Gleixner 提交于 7月 21, 2007

hpet.h in asm-i386 and asm-x86_64 contain tons of duplicated stuff.
Consolidate into one shared header file.

AK: Fix i386 compilation with !X86_IO_APIC
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0655d7c3

i386: remove pit_interrupt_hook · bef9f9de

由 Chris Wright 提交于 7月 21, 2007

Remove pit_interrupt_hook as it adds just an extra layer.
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bef9f9de

i386: Move all simple string operations out of line · b520b85a

由 Andi Kleen 提交于 7月 21, 2007

The compiler generally generates reasonable inline code for the simple
cases and for the rest it's better for code size for them to be out of line.
Also there they can be potentially optimized more in the future.

In fact they probably should be in a .S file because they're all pure
assembly, but that's for another day.

Also some code style cleanup on them while I was on it (this seems
to be the last untouched really early Linux code)

This saves ~12k text for a defconfig kernel with gcc 4.1.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b520b85a

NTP: move the cmos update code into ntp.c · 82644459

由 Thomas Gleixner 提交于 7月 21, 2007

i386 and sparc64 have the identical code to update the cmos clock.  Move it
into kernel/time/ntp.c as there are other architectures coming along with the
same requirements.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

82644459

20 7月, 2007 6 次提交

i386: Allow KVM on i386 nonpae · 2d9ce177

由 Avi Kivity 提交于 7月 19, 2007

Currently, CONFIG_X86_CMPXCHG64 both enables boot-time checking of
the cmpxchg64b feature and enables compilation of the set_64bit() family.
Since the option is dependent on PAE, and since KVM depends on set_64bit(),
this effectively disables KVM on i386 nopae.

Simplify by removing the config option altogether: the boot check is made
dependent on CONFIG_X86_PAE directly, and the set_64bit() family is exposed
without constraints. It is up to users to check for the feature flag (KVM
does not as virtualiation extensions imply its existence).
Signed-off-by: NAvi Kivity <avi@qumranet.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d9ce177

[PATCH] sched: sched_cacheflush is now unused · c41917df

由 Ralf Baechle 提交于 7月 19, 2007

Since Ingo's recent scheduler rewrite which was merged as commit
0437e109 sched_cacheflush is unused.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c41917df

lguest: the host code · d7e28ffe

由 Rusty Russell 提交于 7月 19, 2007

This is the code for the "lg.ko" module, which allows lguest guests to
be launched.

[akpm@linux-foundation.org: update for futex-new-private-futexes]
[akpm@linux-foundation.org: build fix]
[jmorris@namei.org: lguest: use hrtimers]
[akpm@linux-foundation.org: x86_64 build fix]
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7e28ffe

arch: personality independent stack top · b111757c

由 Peter Zijlstra 提交于 7月 19, 2007

New arch macro STACK_TOP_MAX it gives the larges valid stack address for the
architecture in question.

It differs from STACK_TOP in that it will not distinguish between
personalities but will always return the largest possible address.

This is used to create the initial stack on execve, which we will move down to
the proper location once the binfmt code has figured out where that is.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NOllie Wild <aaw@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b111757c

define new percpu interface for shared data · 5fb7dc37

由 Fenghua Yu 提交于 7月 19, 2007

per cpu data section contains two types of data.  One set which is
exclusively accessed by the local cpu and the other set which is per cpu,
but also shared by remote cpus.  In the current kernel, these two sets are
not clearely separated out.  This can potentially cause the same data
cacheline shared between the two sets of data, which will result in
unnecessary bouncing of the cacheline between cpus.

One way to fix the problem is to cacheline align the remotely accessed per
cpu data, both at the beginning and at the end.  Because of the padding at
both ends, this will likely cause some memory wastage and also the
interface to achieve this is not clean.

This patch:

Moves the remotely accessed per cpu data (which is currently marked
as ____cacheline_aligned_in_smp) into a different section, where all the data
elements are cacheline aligned. And as such, this differentiates the local
only data and remotely accessed data cleanly.
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fb7dc37

jprobes: remove JPROBE_ENTRY() · 9e367d85

由 Michael Ellerman 提交于 7月 19, 2007

AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything
useful - so remove it ..

I've left a do-nothing version so that out-of-tree jprobes code will still
compile without modifications.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9e367d85

18 7月, 2007 6 次提交

xen: add the Xenbus sysfs and virtual device hotplug driver · 4bac07c9

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

This communicates with the machine control software via a registry
residing in a controlling virtual machine. This allows dynamic
creation, destruction and modification of virtual device
configurations (network devices, block devices and CPUS, to name some
examples).

[ Greg, would you mind giving this a review?  Thanks -J ]
Signed-off-by: NIan Pratt <ian.pratt@xensource.com>
Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Cc: Greg KH <greg@kroah.com>

4bac07c9

xen: Core Xen implementation · 5ead97c8

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

This patch is a rollup of all the core pieces of the Xen
implementation, including:
 - booting and setup
 - pagetable setup
 - privileged instructions
 - segmentation
 - interrupt flags
 - upcalls
 - multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops.  In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though its not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>

5ead97c8

xen: Add Xen interface header files · a42089dd

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

Add Xen interface header files. These are taken fairly directly from
the Xen tree, but somewhat rearranged to suit the kernel's conventions.

Define macros and inline functions for doing hypercalls into the
hypervisor.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NIan Pratt <ian.pratt@xensource.com>
Signed-off-by: NChristian Limpach <Christian.Limpach@cl.cam.ac.uk>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>

a42089dd

Add a sched_clock paravirt_op · 688340ea

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

The tsc-based get_scheduled_cycles interface is not a good match for
Xen's runstate accounting, which reports everything in nanoseconds.

This patch replaces this interface with a sched_clock interface, which
matches both Xen and VMI's requirements.

In order to do this, we:
   1. replace get_scheduled_cycles with sched_clock
   2. hoist cycles_2_ns into a common header
   3. update vmi accordingly

One thing to note: because sched_clock is implemented as a weak
function in kernel/sched.c, we must define a real function in order to
override this weak binding.  This means the usual paravirt_ops
technique of using an inline function won't work in this case.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: john stultz <johnstul@us.ibm.com>

688340ea

paravirt: helper to disable all IO space · d572929c

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

In a virtual environment, device drivers such as legacy IDE will waste
quite a lot of time probing for their devices which will never appear.
This helper function allows a paravirt implementation to lay claim to
the whole iomem and ioport space, thereby disabling all device drivers
trying to claim IO resources.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>

d572929c

paravirt: make siblingmap functions visible · c70df743

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

Paravirt implementations need to set the sibling map on new cpus.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>

c70df743

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功