提交 · 9e41bff2708e420e61e6b89a54c15232857069b1 · openeuler / raspberrypi-kernel

22 10月, 2008 1 次提交

由 Andy Henroid 提交于 10月 09, 2008

The Intel 7300 Memory Controller supports dynamic throttling of memory which can
be used to save power when system is idle. This driver does the memory
throttling when all CPUs are idle on such a system.

Refer to "Intel 7300 Memory Controller Hub (MCH)" datasheet
for the config space description.
Signed-off-by: NAndy Henroid <andrew.d.henroid@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>

27471fdb

20 10月, 2008 1 次提交

container freezer: implement freezer cgroup subsystem · dc52ddc0

由 Matt Helsley 提交于 10月 18, 2008

This patch implements a new freezer subsystem in the control groups
framework.  It provides a way to stop and resume execution of all tasks in
a cgroup by writing in the cgroup filesystem.

The freezer subsystem in the container filesystem defines a file named
freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
the cgroup.  Reading will return the current state.

* Examples of usage :

   # mkdir /containers/freezer
   # mount -t cgroup -ofreezer freezer  /containers
   # mkdir /containers/0
   # echo $some_pid > /containers/0/tasks

to get status of the freezer subsystem :

   # cat /containers/0/freezer.state
   RUNNING

to freeze all tasks in the container :

   # echo FROZEN > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   FREEZING
   # cat /containers/0/freezer.state
   FROZEN

to unfreeze all tasks in the container :

   # echo RUNNING > /containers/0/freezer.state
   # cat /containers/0/freezer.state
   RUNNING

This is the basic mechanism which should do the right thing for user space
task in a simple scenario.

It's important to note that freezing can be incomplete.  In that case we
return EBUSY.  This means that some tasks in the cgroup are busy doing
something that prevents us from completely freezing the cgroup at this
time.  After EBUSY, the cgroup will remain partially frozen -- reflected
by freezer.state reporting "FREEZING" when read.  The state will remain
"FREEZING" until one of these things happens:

	1) Userspace cancels the freezing operation by writing "RUNNING" to
		the freezer.state file
	2) Userspace retries the freezing operation by writing "FROZEN" to
		the freezer.state file (writing "FREEZING" is not legal
		and returns EIO)
	3) The tasks that blocked the cgroup from entering the "FROZEN"
		state disappear from the cgroup's set of tasks.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: export thaw_process]
Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
Tested-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dc52ddc0

17 10月, 2008 2 次提交

cpuidle: upon BIOS bug, default to default_idle rather than polling · 89cedfef

由 Venkatesh Pallipadi 提交于 10月 16, 2008

http://bugzilla.kernel.org/show_bug.cgi?id=11345Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

89cedfef

Kconfig: eliminate "def_bool n" constructs · 9ba16087

由 Jan Beulich 提交于 10月 15, 2008

Using "def_bool n" is pointless, simply using bool here appears more
appropriate.

Further, retaining such options that don't have a prompt and aren't
selected by anything seems also at least questionable.
Signed-off-by: NJan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9ba16087

16 10月, 2008 7 次提交

genirq: revert dynarray · d6c88a50

由 Thomas Gleixner 提交于 10月 15, 2008

Revert the dynarray changes. They need more thought and polishing.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

d6c88a50

T
x86: remove sparse irq from Kconfig · 3235e936
由 Thomas Gleixner 提交于 10月 15, 2008
```
This code is not ready yet.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
```
3235e936

x86: make HAVE_SPARSE_IRQ support selectable · 8f09cd20

由 Yinghai Lu 提交于 8月 19, 2008

Ingo said sparse_irq is some intrusive. need to make it selectable

to make it simple, remove irq_desc as parameter in some functions.
(ack, eoi, set_affinity).
may need to make member if irq_chip to take irq_desc, or struct irq later.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8f09cd20

x86: make 32 bit to use sparse_irq · 199751d7

由 Yinghai Lu 提交于 8月 19, 2008

but actually irq still needs to be less than NR_IRQS, because
interrupt[NR_IRQS] in entry.S.

need to enable per_cpu vector...
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

199751d7

x86: remove irqbalance in kernel for 32 bit · 8b8e8c1b

由 Yinghai Lu 提交于 8月 19, 2008

This has been deprecated for years, the user space irqbalanced utility
works better with numa, has configurable policies, etc...
Signed-off-by: NYinghai Lu <yhlu.kernel@gmai.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8b8e8c1b

generic: sparse irqs: use irq_desc() together with dyn_array, instead of irq_desc[] · 08678b08

由 Yinghai Lu 提交于 8月 19, 2008

add CONFIG_HAVE_SPARSE_IRQ to for use condensed array.
Get rid of irq_desc[] array assumptions.

Preallocate 32 irq_desc, and irq_desc() will try to get more.

( No change in functionality is expected anywhere, except the odd build
  failure where we missed a code site or where a crossing commit itroduces
  new irq_desc[] usage. )

v2: according to Eric, change get_irq_desc() to irq_desc()
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

08678b08

x86: enable dyn_array support · 6da55c3e

由 Yinghai Lu 提交于 8月 19, 2008

Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6da55c3e

14 10月, 2008 1 次提交

ftrace: enable using mcount recording on x86 · e4b2b886

由 Steven Rostedt 提交于 8月 14, 2008

Enable the use of the __mcount_loc infrastructure on x86_64 and i386.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e4b2b886

13 10月, 2008 2 次提交

x86: remove additional_cpus configurability · b8073050

由 Ingo Molnar 提交于 10月 05, 2008

additional_cpus=<x> parameter is dangerous and broken: for example
if we boot additional_cpus=-2 on a stock dual-core system it will
crash the box on bootup.

So reduce the maze of code a bit by removingthe user-configurability
angle.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b8073050

x86: allow number of additional hotplug CPUs to be set at compile time, V2 · 7f2f49a5

由 Chuck Ebbert 提交于 10月 02, 2008

x86: allow number of additional hotplug CPUs to be set at compile time, V2

The default number of additional CPU IDs for hotplugging is determined
by asking ACPI or mptables how many "disabled" CPUs there are in the
system, but many systems get this wrong so that e.g. a uniprocessor
machine gets an extra CPU allocated and never switches to single CPU
mode.

And sometimes CPU hotplugging is enabled only for suspend/hibernate
anyway, so the additional CPU IDs are not wanted. Allow the number
to be set to zero at compile time.

Also, force the number of extra CPUs to zero if hotplugging is disabled
which allows removing some conditional code.

Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
with CPU hotplugging configured.

("After" has the number of additional CPUs set to 0)
Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1

[Changed the name of the option and the prompt according to Ingo's
 suggestion.]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7f2f49a5

01 10月, 2008 1 次提交

x86: change MTRR_SANITIZER to def_bool y · 2ffb3501

由 Yinghai Lu 提交于 9月 30, 2008

This option has been added in v2.6.26 as a default-disabled
feature and went through several revisions since then.

The feature fixes a wide range of MTRR setup problems that BIOSes
leave us with: slow system, slow Xorg, slow system when adding lots
of RAM, etc., so we want to enable it by default for v2.6.28.

See:

  [Bug 10508] Upgrade to 4GB of RAM messes up MTRRs
  http://bugzilla.kernel.org/show_bug.cgi?id=10508

and the test results in:

   http://lkml.org/lkml/2008/9/29/273

1. hpa
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x13c000000 (5056MB), size=  64MB: uncachable, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg04: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
reg05: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M   chunk_size: 128M        num_reg: 6      lose RAM: 0M
range0: 0000000000000000 - 00000000c0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
hole: 00000000bf700000 - 00000000c0000000
Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
range0: 0000000100000000 - 0000000140000000
Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB
hole: 000000013c000000 - 0000000140000000
Setting variable MTRR 5, base: 5056MB, range: 64MB, type UC

2. Dylan Taft
reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg01: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg02: base=0x120000000 (4608MB), size= 256MB: write-back, count=1
reg03: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg04: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg05: base=0xc7e00000 (3198MB), size=   2MB: uncachable, count=1
reg06: base=0xc8000000 (3200MB), size= 128MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M   chunk_size: 4M  num_reg: 6      lose RAM: 0M
range0: 0000000000000000 - 00000000c8000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
Setting variable MTRR 2, base: 3072MB, range: 128MB, type WB
hole: 00000000c7e00000 - 00000000c8000000
Setting variable MTRR 3, base: 3198MB, range: 2MB, type UC
rangeX: 0000000100000000 - 0000000130000000
Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
Setting variable MTRR 5, base: 4608MB, range: 256MB, type WB

3. Gabriel
reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
reg03: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
reg04: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
reg05: base=0x128000000 (4736MB), size=  64MB: write-back, count=1
reg06: base=0xcf600000 (3318MB), size=   2MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M   chunk_size: 16M         num_reg: 7      lose RAM: 0M
range0: 0000000000000000 - 00000000d0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
Setting variable MTRR 2, base: 3072MB, range: 256MB, type WB
hole: 00000000cf600000 - 00000000cf800000
Setting variable MTRR 3, base: 3318MB, range: 2MB, type UC
rangeX: 0000000100000000 - 000000012c000000
Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
Setting variable MTRR 5, base: 4608MB, range: 128MB, type WB
Setting variable MTRR 6, base: 4736MB, range: 64MB, type WB

4. Mika Fischer
reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
reg01: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
reg03: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1
reg04: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1

will get
Found optimal setting for mtrr clean up
gran_size: 1M   chunk_size: 16M         num_reg: 5      lose RAM: 0M
range0: 0000000000000000 - 00000000c0000000
Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
hole: 00000000bf700000 - 00000000c0000000
Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
rangeX: 0000000100000000 - 0000000140000000
Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2ffb3501

23 9月, 2008 1 次提交

x86: moved microcode.c to microcode_intel.c · 18dbc916

由 Dmitry Adamushko 提交于 9月 23, 2008

Combine both generic and arch-specific parts of microcode into a
single module (arch-specific parts are config-dependent).

Also while we are at it, move arch-specific parts from microcode.h
into their respective arch-specific .c files.
Signed-off-by: NDmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: "Peter Oruba" <peter.oruba@amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

18dbc916

19 9月, 2008 1 次提交

AMD IOMMU: add MSI interrupt support · a80dc3e0

由 Joerg Roedel 提交于 9月 11, 2008

The AMD IOMMU can generate interrupts for various reasons. This patch
adds the basic interrupt enabling infrastructure to the driver.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a80dc3e0

16 9月, 2008 1 次提交

x86: add X86_RESERVE_LOW_64K · fc381519

由 Ingo Molnar 提交于 9月 16, 2008

This bugzilla:

  http://bugzilla.kernel.org/show_bug.cgi?id=11237

Documents a wide range of systems where the BIOS utilizes the first
64K of physical memory during suspend/resume and other hardware events.

Currently we reserve this memory on all AMI and Phoenix BIOS systems.
Life is too short to hunt subtle memory corruption problems like this,
so we try to be robust by default.

Still, allow this to be overriden: allow users who want that first 64K
of memory to be available to the kernel disable the quirk, via
CONFIG_X86_RESERVE_LOW_64K=n.

Also, allow the early reservation to overlap with other
early reservations.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

fc381519

14 9月, 2008 3 次提交

generic: redefine resource_size_t as phys_addr_t · 8308c54d

由 Jeremy Fitzhardinge 提交于 9月 11, 2008

There's no good reason why a resource_size_t shouldn't just be a
physical address, so simply redefine it in terms of phys_addr_t.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8308c54d

generic: add phys_addr_t for holding physical addresses · 600715dc

由 Jeremy Fitzhardinge 提交于 9月 11, 2008

Add a kernel-wide "phys_addr_t" which is guaranteed to be able to hold
any physical address.  By default it equals the word size of the
architecture, but a 32-bit architecture can set ARCH_PHYS_ADDR_T_64BIT
if it needs a 64-bit phys_addr_t.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

600715dc

x86: simpler SYSVIPC_COMPAT definition · b8992195

由 Alexey Dobriyan 提交于 9月 14, 2008

X86_64 part is entirely redundant.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b8992195

09 9月, 2008 1 次提交

seccomp: drop now bogus dependency on PROC_FS · 9c0bbee8

由 Alexey Dobriyan 提交于 9月 09, 2008

seccomp is prctl(2)-driven now.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9c0bbee8

07 9月, 2008 5 次提交

x86: default corruption check to off, but put parameter default in Kconfig · c885df50

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

Default the low memory corruption check to off, but make the default setting of
the memory_corruption_check kernel parameter a config parameter.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c885df50

x86: clean up memory corruption check and add more kernel parameters · 9f077871

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

The corruption check is enabled in Kconfig by default, but disabled at runtime.

This patch adds several kernel parameters to control the corruption
check's behaviour; these are documented in kernel-parameters.txt.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9f077871

x86: check for and defend against BIOS memory corruption · 5394f80f

由 Jeremy Fitzhardinge 提交于 9月 07, 2008

Some BIOSes have been observed to corrupt memory in the low 64k.  This
change:
 - Reserves all memory which does not have to be in that area, to
   prevent it from being used as general memory by the kernel.  Things
   like the SMP trampoline are still in the memory, however.
 - Clears the reserved memory so we can observe changes to it.
 - Adds a function check_for_bios_corruption() which checks and reports on
   memory becoming unexpectedly non-zero.  Currently it's called in the
   x86 fault handler, and the powermanagement debug output.
Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5394f80f

Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT · e17c6d56

由 David Woodhouse 提交于 6月 17, 2008

HAVE_AOUT doesn't quite do the same thing as the recently removed
ARCH_SUPPORTS_AOUT config option. That was set even on platforms where
binfmt_aout isn't supported, although it's not entirely clear why.

So it's best just to introduce a new symbol, handled consistently with
other similar HAVE_xxx symbols; with a simple 'select' in the arch Kconfig.
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>

e17c6d56

Remove redundant CONFIG_ARCH_SUPPORTS_AOUT · 6b213e1b

由 David Woodhouse 提交于 6月 16, 2008

We don't need this any more; arguably we never really did.
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>

6b213e1b

26 8月, 2008 1 次提交

[x86] Clean up MAXSMP Kconfig, and limit NR_CPUS to 512 · d25e26b6

由 Linus Torvalds 提交于 8月 25, 2008

This fixes a regression that was indirectly caused by commit
1184dc2f ("x86: modify Kconfig to allow
up to 4096 cpus").

Allowing 4k CPU's is not practical at this time, because we still have a
number of places that have several 'cpumask_t's on the stack, and a
4k-bit cpumask is 512 bytes of stack-space for each such variable.  This
literally caused functions like 'smp_call_function_mask' to have a 2.5kB
stack frame, and several functions to have 2kB stackframes.

With an 8kB stack total, smashing the stack was simply much too likely.
At least bugzilla entry

	http://bugzilla.kernel.org/show_bug.cgi?id=11342

was due to this.

The earlier commit to not inline load_module() into sys_init_module()
fixed the particular symptoms of this that Alan Brunelle saw in that
bugzilla entry, but the huge stack waste by cpumask_t's was the more
direct cause.

Some day we'll have allocation helpers that allocate large CPU masks
dynamically, but in the meantime we simply cannot allow cpumasks this
large.

Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d25e26b6

15 8月, 2008 2 次提交

x86, bootup: add built-in kernel command line for x86 (v2) · 516cbf37

由 Tim Bird 提交于 8月 12, 2008

Allow x86 to support a built-in kernel command line.  The built-in
command line can override the one provided by the boot loader, for
those cases where the boot loader is broken or it is difficult
to change the command line in the the boot loader.

H. Peter Anvin wrote:
> Ingo Molnar wrote:
>> Best would be to make it really apparent in the code that nothing
>> changes if this config option is not set. Preferably there should be
>> no extra code at all in that case.
>>
>
> I would like to see this:
[...Nested ifdefs...]

OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is not
set (the default).  Also, no space is appended even when CONFIG_CMDLINE_BOOL
is set, but the builtin string is empty.  This is less sloppy all the way
around, IMHO.

Note that I use the same option names as on other arches for
this feature.

[ mingo@elte.hu: build fix ]
Signed-off-by: NTim Bird <tim.bird@am.sony.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

516cbf37

arch/x86/Kconfig: clean up, experimental adjustement · 04b69447

由 Pavel Machek 提交于 8月 14, 2008

Adjust experimental tags in Kconfig, update config to notice that
i386/x86_64 is now single architecture.
Signed-off-by: NPavel Machek <pavel@suse.cz>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

04b69447

12 8月, 2008 3 次提交

mm: Make generic weak get_user_pages_fast and EXPORT_GPL it · 912985dc

由 Rusty Russell 提交于 8月 12, 2008

Out of line get_user_pages_fast fallback implementation, make it a weak
symbol, get rid of CONFIG_HAVE_GET_USER_PAGES_FAST.

Export the symbol to modules so lguest can use it.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

912985dc

x86: make sparsemem more available · 99809963

由 Jeff Chua 提交于 8月 06, 2008

With CONFIG_X86_PC, I can set CONFIG_SPARSEMEM=y.

With CONFIG_X86_GENERICARCH, CONFIG_SPARSEMEM depends on CONFIG_NUMA.

I'm using the patch below to enable sparsemem instead of flatmem.
System booted and is running.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

99809963

x86: remove EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU · 7c13e6a3

由 Dimitri Sivanich 提交于 8月 11, 2008

This removes the EXPERIMENTAL restriction from CONFIG_HOTPLUG_CPU
on the x86 architecture.
Signed-off-by: NDimitri Sivanich <sivanich@sgi.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7c13e6a3

29 7月, 2008 2 次提交

x86: AMD microcode patch loading support · 80cc9f10

由 Peter Oruba 提交于 7月 28, 2008

This patch introduces microcode patch loading for AMD
processors. It is based on previous corresponding work
for Intel processors.

It hooks into the general patch loading module. Main
difference is that a container file format is used to hold
all patch data for multiple processors as well as an
equivalent CPU table, which comes seperately, as opposed
to Intel's microcode patching solution.

Kconfig and Makefile have been changed provice config
and build option for new source file.
Signed-off-by: NPeter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

80cc9f10

x86: major refactoring · 8d86f390

由 Peter Oruba 提交于 7月 28, 2008

Refactored code by introducing a two-module solution.

There is one general module in which vendor specific modules can hook into.
However, that is exclusive, there is only one vendor specific module
allowed at a time. A CPU vendor check makes sure only the correct
module for the underlying system gets called.

Functinally in terms of patch loading itself there are no changes. This
refactoring provides a basis for future implementations of other vendors'
patch loaders.
Signed-off-by: NPeter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8d86f390

28 7月, 2008 1 次提交

documentation: move mtrr.txt to Doc/x86/ subdir · 7225e751

由 Randy Dunlap 提交于 7月 26, 2008

Move mtrr.txt to the Documentation/x86/ subdirectory.
Add 00-INDEX to the Documentation/x86/ subdirectory.
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7225e751

27 7月, 2008 4 次提交

x86: tracehook: CONFIG_HAVE_ARCH_TRACEHOOK · 99bbc4b1

由 Roland McGrath 提交于 4月 20, 2008

The x86 arch code has all the prerequisites, so set HAVE_ARCH_TRACEHOOK.
Signed-off-by: NRoland McGrath <roland@redhat.com>

99bbc4b1

x86: lockless get_user_pages_fast() · 8174c430

由 Nick Piggin 提交于 7月 25, 2008

Implement get_user_pages_fast without locking in the fastpath on x86.

Do an optimistic lockless pagetable walk, without taking mmap_sem or any
page table locks or even mmap_sem.  Page table existence is guaranteed by
turning interrupts off (combined with the fact that we're always looking
up the current mm, means we can do the lockless page table walk within the
constraints of the TLB shootdown design).  Basically we can do this
lockless pagetable walk in a similar manner to the way the CPU's pagetable
walker does not have to take any locks to find present ptes.

This patch (combined with the subsequent ones to convert direct IO to use
it) was found to give about 10% performance improvement on a 2 socket 8
core Intel Xeon system running an OLTP workload on DB2 v9.5

 "To test the effects of the patch, an OLTP workload was run on an IBM
  x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
  2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
  runs with and without the patch resulted in an overall performance
  benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
  __up_read and __down_read routines that is seen during thread contention
  for system resources was reduced from 2.8% down to .05%.  Monitoring the
  /proc/vmstat output from the patched run showed that the counter for
  fast_gup contained a very high number while the fast_gup_slow value was
  zero."

(fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
counter we had for the number of times the slowpath was invoked).

The main reason for the improvement is that DB2 has multiple threads each
issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
contend the mmap_sem cacheline, and can also contend on page table locks.

I would anticipate larger performance gains on larger systems, however I
think DB2 uses an adaptive mix of threads and processes, so it could be
that thread contention remains pretty constant as machine size increases.
In which case, we stuck with "only" a 10% gain.

The downside of using get_user_pages_fast is that if there is not a pte
with the correct permissions for the access, we end up falling back to
get_user_pages and so the get_user_pages_fast is a bit of extra work.
However this should not be the common case in most performance critical
code.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: Kconfig fix]
[akpm@linux-foundation.org: Makefile fix/cleanup]
[akpm@linux-foundation.org: warning fix]
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8174c430

kexec jump: save/restore device state · 89081d17

由 Huang Ying 提交于 7月 25, 2008

This patch implements devices state save/restore before after kexec.

This patch together with features in kexec_jump patch can be used for
following:

- A simple hibernation implementation without ACPI support.  You can kexec a
  hibernating kernel, save the memory image of original system and shutdown
  the system.  When resuming, you restore the memory image of original system
  via ordinary kexec load then jump back.

- Kernel/system debug through making system snapshot.  You can make system
  snapshot, jump back, do some thing and make another system snapshot.

- Cooperative multi-kernel/system.  With kexec jump, you can switch between
  several kernels/systems quickly without boot process except the first time.
  This appears like swap a whole kernel/system out/in.

- A general method to call program in physical mode (paging turning
  off). This can be used to invoke BIOS code under Linux.

The following user-space tools can be used with kexec jump:

- kexec-tools needs to be patched to support kexec jump. The patches
  and the precompiled kexec can be download from the following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

- makedumpfile with patches are used as memory image saving tool, it
  can exclude free pages from original kernel memory image file. The
  patches and the precompiled makedumpfile can be download from the
  following URL:
       source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image can be used as the root file system of kexeced
  kernel. An initramfs image built with "BuildRoot" can be downloaded
  from the following URL:
       initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
  All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image contains kexec-tool and makedumpfile, or
   download the pre-built initramfs image, called rootfs.gz in
   following text.

3. Prepare a partition to save memory image of original kernel, called
   hibernating partition in following text.

4. Boot kernel compiled in step 1 (kernel A).

5. In the kernel A, load kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follow:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
     --mem-max=0xffffff --initrd=rootfs.gz

6. Boot the kernel B with following shell command line:

   /sbin/kexec -e

7. The kernel B will boot as normal kexec. In kernel B the memory
   image of kernel A can be saved into hibernating partition as
   follow:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shutdown the machine as normal.

8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
   root file system.

9. In kernel C, load the memory image of kernel A as follow:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to the kernel A as follow:

   /sbin/kexec -e

   Then, kernel A is resumed.

Implementation point:

To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.

Known issues:

- Because the segment number supported by sys_kexec_load is limited,
  hibernation image with many segments may not be load. This is
  planned to be eliminated by adding a new flag to sys_kexec_load to
  make a image can be loaded with multiple sys_kexec_load invoking.

Now, only the i386 architecture is supported.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89081d17

kexec jump · 3ab83521

由 Huang Ying 提交于 7月 25, 2008

This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ab83521