提交 · 796216a57fe45c04adc35bda1f0782efec78a713 · openanolis / cloud-kernel

15 3月, 2009 3 次提交

x86: allow extend_brk users to reserve brk space · 796216a5

由 Jeremy Fitzhardinge 提交于 3月 12, 2009

Impact: new interface; remove hard-coded limit

Add RESERVE_BRK(name, size) macro to reserve space in the brk
area.  This should be a conservative (ie, larger) estimate of
how much space might possibly be required from the brk area.
Any unused space will be freed, so there's no real downside
on making the reservation too large (within limits).

The name should be unique within a given file, and somewhat
descriptive.

The C definition of RESERVE_BRK() ends up being more complex than
one would expect to work around a cluster of gcc infelicities:

  The first attempt was to simply try putting __section(.brk_reservation)
  on a variable.  This doesn't work because it ends up making it a
  @progbits section, which gets actual space allocated in the vmlinux
  executable.

  The second attempt was to emit the space into a section using asm,
  but gcc doesn't allow arguments to be passed to file-level asm()
  statements, making it hard to pass in the size.

  The final attempt is to wrap the asm() in a function to allow
  it to have arguments, and put the function itself into the
  .discard section, which vmlinux*.lds drops entirely from the
  emitted vmlinux.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

796216a5

x86-32: compute initial mapping size more accurately · 7543c1de

由 Yinghai Lu 提交于 3月 12, 2009

Impact: simplification

We only need to map the kernel in head_32.S, not the whole of
lowmem.  We use 512MB as a reasonable (but arbitrary) limit on
the maximum size of the kernel image.
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

7543c1de

x86-32: use brk segment for allocating initial kernel pagetable · ccf3fe02

由 Jeremy Fitzhardinge 提交于 2月 27, 2009

Impact: use new interface instead of previous ad hoc implementation

Rather than having special purpose init_pg_table_start/end variables
to delimit the kernel pagetable built by head_32.S, just use the brk
mechanism to extend the bss for the new pagetable.

This patch removes init_pg_table_start/end and pg0, defines __brk_base
(which is page-aligned and immediately follows _end), initializes
the brk region to start there, and uses it for the 32-bit pagetable.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

ccf3fe02

14 2月, 2009 1 次提交

x86: use _types.h headers in asm where available · 0341c14d

由 Jeremy Fitzhardinge 提交于 2月 13, 2009

In general, the only definitions that assembly files can use
are in _types.S headers (where available), so convert them.
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

0341c14d

11 2月, 2009 1 次提交

x86: fix x86_32 stack protector bugs · 5c79d2a5

由 Tejun Heo 提交于 2月 11, 2009

Impact: fix x86_32 stack protector

Brian Gerst found out that %gs was being initialized to stack_canary
instead of stack_canary - 20, which basically gave the same canary
value for all threads.  Fixing this also exposed the following bugs.

* cpu_idle() didn't call boot_init_stack_canary()

* stack canary switching in switch_to() was being done too late making
  the initial run of a new thread use the old stack canary value.

Fix all of them and while at it update comment in cpu_idle() about
calling boot_init_stack_canary().
Reported-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5c79d2a5

10 2月, 2009 1 次提交

x86: implement x86_32 stack protector · 60a5317f

由 Tejun Heo 提交于 2月 09, 2009

Impact: stack protector for x86_32

Implement stack protector for x86_32.  GDT entry 28 is used for it.
It's set to point to stack_canary-20 and have the length of 24 bytes.
CONFIG_CC_STACKPROTECTOR turns off CONFIG_X86_32_LAZY_GS and sets %gs
to the stack canary segment on entry.  As %gs is otherwise unused by
the kernel, the canary can be anywhere.  It's defined as a percpu
variable.

x86_32 exception handlers take register frame on stack directly as
struct pt_regs.  With -fstack-protector turned on, gcc copies the
whole structure after the stack canary and (of course) doesn't copy
back on return thus losing all changed.  For now, -fno-stack-protector
is added to all files which contain those functions.  We definitely
need something better.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

60a5317f

26 1月, 2009 2 次提交

I
x86: improve early fault/irq printout · d5e397cb
由 Ingo Molnar 提交于 1月 26, 2009
```
Impact: add a stack dump to early IRQs/faults
Signed-off-by: NIngo Molnar <mingo@elte.hu>
```
d5e397cb

x86, debug: remove early_printk() #ifdefs from head_32.S · 34707bcd

由 Ingo Molnar 提交于 1月 26, 2009

Impact: cleanup

Remove such constructs:

 #ifdef CONFIG_EARLY_PRINTK
        call early_printk
 #else
        call printk
 #endif

Not only are they ugly, they are also pointless: a call to printk()
maps to early_printk during early bootup anyway, if CONFIG_EARLY_PRINTK
is enabled.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

34707bcd

21 1月, 2009 1 次提交

x86: set %fs to __KERNEL_PERCPU unconditionally for x86_32 · 0dd76d73

由 Brian Gerst 提交于 1月 21, 2009

Impact: cleanup

%fs is currently set to __KERNEL_DS at boot, and conditionally
switched to __KERNEL_PERCPU for secondary cpus.  Instead, initialize
GDT_ENTRY_PERCPU to the same attributes as GDT_ENTRY_KERNEL_DS and
set %fs to __KERNEL_PERCPU unconditionally.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

0dd76d73

11 10月, 2008 1 次提交

x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot · b2bc2731

由 Suresh Siddha 提交于 9月 23, 2008

Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b2bc2731

28 7月, 2008 1 次提交

x86: fix cpu hotplug on 32bit · 583323b9

由 Thomas Gleixner 提交于 7月 27, 2008

commit 3e970473 ("x86: boot secondary
cpus through initial_code") causes the kernel to crash when a CPU is
brought online after the read only sections have been write
protected. The write to initial_code in do_boot_cpu() fails.

Move inital_code to .cpuinit.data section.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NH. Peter Anvin <hpa@zytor.com>

583323b9

08 7月, 2008 2 次提交

x86: boot secondary cpus through initial_code · 3e970473

由 Glauber Costa 提交于 5月 28, 2008

remove "initialize_secondary". Boot both architectures via
initial_code.
Signed-off-by: NGlauber Costa <gcosta@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3e970473

x86: use initial_code for i386 · e3f77edf

由 Glauber Costa 提交于 5月 28, 2008

x86_64 jumps to whatever is written in "initial_code" symbol,
instead of a fixed address. Do it for i386 too. It will allow us
to integrate more of the smp boot code.
Signed-off-by: NGlauber Costa <gcosta@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e3f77edf

13 6月, 2008 1 次提交

x86: fix asm warning in head_32.S · 86b2b70e

由 Joe Korty 提交于 6月 02, 2008

On Mon, May 19, 2008 at 04:10:02PM -0700, Linus Torvalds wrote:
> It also causes these warnings on 32-bit PAE:
>
> 	  AS      arch/x86/kernel/head_32.o
> 	arch/x86/kernel/head_32.S: Assembler messages:
> 	arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
> 	arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed
>
> and I do not see why (the end result seems to be identical).

Fix head_32.S gcc bignum warnings when CONFIG_PAE=y.

    arch/x86/kernel/head_32.S: Assembler messages:
    arch/x86/kernel/head_32.S:225: Warning: left operand is a bignum; integer 0 assumed
    arch/x86/kernel/head_32.S:609: Warning: left operand is a bignum; integer 0 assumed

The assembler was stumbling over the 64-bit constant 0x100000000 in the
KPMDS #define.

Testing: a cmp(1) on head_32.o before and after shows the binary is unchanged.

Signed-off-by: Joe Korty <joe.korty@ccur.com
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Gabriel C <nix.or.die@googlemail.com>
Cc: Keith Packard <keithp@keithp.com>
Cc: "Pallipadi Venkatesh" <venkatesh.pallipadi@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: "Siddha Suresh B" <suresh.b.siddha@intel.com>
Cc: bugme-daemon@bugzilla.kernel.org
Cc: airlied@linux.ie
Cc: "Barnes Jesse" <jesse.barnes@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

86b2b70e

03 6月, 2008 1 次提交

x86: clean up max_pfn_mapped usage - 32-bit · 6af61a76

由 Yinghai Lu 提交于 6月 01, 2008

on 32-bit in head_32.S after initial page table is done, we get initial
max_pfn_mapped, and then kernel_physical_mapping_init will give us
a final one.

We need to use that to make sure find_e820_area will get valid addresses
for boot_map and for NODE_DATA(0) on numa32.

XEN PV and lguest may need to assign max_pfn_mapped too.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6af61a76

31 5月, 2008 1 次提交

x86: extend e820 early_res support 32bit -fix · f0d43100

由 Yinghai Lu 提交于 5月 29, 2008

introduce init_pg_table_start, so xen PV could specify the value.
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

f0d43100

01 5月, 2008 1 次提交

x86: fix early-BUG message · 575ca735

由 Vegard Nossum 提交于 4月 25, 2008

The .asciz directive takes any number of strings, but each one is zero-
terminated, and string pasting is not done as in C. That results in only the
first line being output.

Replace .asciz with multiple .ascii directives and terminate with .asciz.
Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

575ca735

20 4月, 2008 1 次提交

x86: remove pointless comments · cf9b111c

由 WANG Cong 提交于 3月 08, 2008

Remove old comments that include the old arch/i386 directory.
Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

cf9b111c

17 4月, 2008 1 次提交

x86: introduce kernel/head32.c · 700efc1b

由 Eric W. Biederman 提交于 2月 23, 2008

Copy x86_64 and add a head32.c so we can start moving early
architecture initialization out of assembly.

[ Sam Ravnborg <sam@ravnborg.org>: updated it to x86 ]
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

700efc1b

22 3月, 2008 1 次提交

x86: fix fault_msg nul termination · e215f3c2

由 Jiri Slaby 提交于 3月 12, 2008

The fault_msg text is not explictly nul terminated now in startup
assembly. Do so by converting .ascii to .asciz.
Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

e215f3c2

26 2月, 2008 1 次提交

x86: don't make swapper_pg_pmd global · ed2b7e2b

由 Adrian Bunk 提交于 2月 22, 2008

There doesn't seem to be any reason for swapper_pg_pmd being global.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ed2b7e2b

19 2月, 2008 1 次提交

x86: don't make swapper_pg_fixmap global · aa65af3f

由 Adrian Bunk 提交于 2月 13, 2008

Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Cc: Ian Campbell <ijc@hellion.org.uk>
Cc: hpa@zytor.com
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

aa65af3f

10 2月, 2008 1 次提交

x86: construct 32-bit boot time page tables in native format. · 551889a6

由 Ian Campbell 提交于 2月 09, 2008

Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled
kernel are in PAE format.

early_ioremap is updated to use the standard page table accessors.

Clear any mappings beyond max_low_pfn from the boot page tables in
native_pagetable_setup_start because the initial mappings can extend
beyond the range of physical memory and into the vmalloc area.

Derived from patches by Eric Biederman and H. Peter Anvin.

[ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]
Signed-off-by: NIan Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika PenttilÃÂ¤ <mika.penttila@kolumbus.fi>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

551889a6

30 1月, 2008 4 次提交

x86: fix Section mismatch: reference to .init.text:lguest_entry · 8b2f7fff

由 Sam Ravnborg 提交于 1月 30, 2008

fix:

> WARNING: vmlinux.o(.data+0x4): Section mismatch: reference to .init.text:lguest_entry (between 'subarch_entries' and 'stack_start')
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

8b2f7fff

x86_32: always run the full set of paging state. · 5756dd59

由 Ian Campbell 提交于 1月 30, 2008

I am preparing to convert the boot time page table to the kernels
native format.  To achieve that I need to enable PAE. Enabling PSE
and the no execute bit would not hurt.  So this patch modifies
the boot cpu path to execute all of the kernels enable code
if and only if we have the proper bits set in mmu_cr4_features.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NIan Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5756dd59

x86_32: remove unnecessary use of %ebx as the boot cpu flag · 50359501

由 Ian Campbell 提交于 1月 30, 2008

Currently in head_32.S there are two ways we test to see if we
are the boot cpu.  By looking at %ebx and by looking at the
static variable ready.  When changing things around I have
found that it gets tricky to preserve %ebx.  So this
patch just switches head.S over to the more reliable
test of always using ready.

Hopefully later we can kill these tests entirely.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NIan Campbell <ijc@hellion.org.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

50359501

x86: early fault debugging improvement · 94878efd

由 Ingo Molnar 提交于 1月 30, 2008

Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

94878efd

02 1月, 2008 1 次提交

fix lguest rmmod "bad pgd" · 476c6c11

由 Rusty Russell 提交于 1月 01, 2008

After 17d57a92 ("x86: fix x86-32 early
fixmap initialization.") removing lg.ko caused a printk from vunmap:

	mm/memory.c:115: bad pgd 004b3027.

On the second use after module load, the kernel crashes.

This fixes the immediate problem (accessed and dirty bits not set as
expected in pmd_none_or_clear_bad).  I can't see why this would cause
a crash, but I haven't been able to reproduce it once this is applied.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

476c6c11

04 12月, 2007 1 次提交

x86: fix x86-32 early fixmap initialization. · 17d57a92

由 Eric W. Biederman 提交于 12月 01, 2007

pageexec@freemail.hu writes:

> i've just noticed that the chunk in i386/kernel/head.S ended up in a
> weird place, namely, it's not going to be executed as it's just after
> a 'jmp 3f' and before startup_32_smp, probably not what you intended.
> on a sidenote, the whole thing can be done in a single insn, like:
>
> movl $(swapper_pg_pmd - __PAGE_OFFSET + 0x067), (swapper_pg_dir -
> __PAGE_OFFSET+ 4092)

Thanks for the reminder I thought we had fixed this problem a while ago.

Needed to get fixed virtual address for USB debug and earlycon with mmio.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

17d57a92

24 10月, 2007 1 次提交

x86: clean up setup.h and the boot code · fa76dab9

由 H. Peter Anvin 提交于 10月 23, 2007

Make <asm/setup.h> usable by the boot code.

Clean up vestiges of the old command-line protocol from setup.h and
head_32.S (it is still supported from the boot loader point of
view, since it is converted to the new command-line protocol by the
boot code.)
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

fa76dab9

22 10月, 2007 1 次提交

i386: paravirt boot sequence · a24e7851

由 Rusty Russell 提交于 10月 21, 2007

This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:

 1. Check the bootparams loadflags to see if we should reload the
    segment registers and clear interrupts.  This is appropriate
    for normal native boot and some paravirtualized environments, but
    inapproprate for others.

 2. Check the hardware architecture, and dispatch to the appropriate
    kernel entrypoint.  If the bootloader doesn't set this, then we
    simply do the normal boot sequence.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a24e7851

18 10月, 2007 2 次提交

i386: print better early fault info · 382f64ab

由 Ingo Molnar 提交于 10月 17, 2007

improve early fault output.

old format:

 Int 14: CR2 010001e3  err 00000002  EIP c011f2f9  CS 00000060  flags 00010046
 Stack: c073695e c0791c10 00000000 ffffffff 00000000 01000000 00001000 c0791c10

new format:

 BUG: Int 14: CR2 010001e3
      EDI c1000000  ESI c0693c10  EBP c0637f9c  ESP c0637f08
      EBX 00000000  EDX 0000000e  ECX 00000000  EAX 010001e3
      err 00000002  EIP c0123119   CS 00000060  flg 00010046
 Stack: c064d589 c0693000 00000000 c0637f60 00c001e3 01000000 00038000 00000163
        00000000 00000163 00000000 ffffffff 00038000 00000000 00000000 00001000
        00001000 00000000 c0637f88 c06509be c0a2ae60 00001000 00001000 00000000
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

382f64ab

x86: prepare page allocator for high allocations on PAGEALLOC=y · 1e3e1972

由 Ingo Molnar 提交于 10月 17, 2007

To preserve the DMA pool in CONFIG_DEBUG_PAGEALLOC=y kernels, we'll
allocate pagetables from above the 16MB DMA limit, so we'll have to set
up boot pagetables to cover 16MB more RAM (worst-case).
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1e3e1972

11 10月, 2007 3 次提交

i386: move kernel · 9a163ed8

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9a163ed8

i386: move xen · 9702785a

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9702785a

i386: prepare shared kernel/head.S · deb86771

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

deb86771

12 8月, 2007 1 次提交

i386: Fix start_kernel warning · 5fe4486c

由 Andi Kleen 提交于 8月 10, 2007

Fix

WARNING: vmlinux.o(.text+0x183): Section mismatch: reference to .init.text:start_kernel (between 'is386' and 'check_x87')
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fe4486c

18 7月, 2007 1 次提交

xen: Core Xen implementation · 5ead97c8

由 Jeremy Fitzhardinge 提交于 7月 17, 2007

This patch is a rollup of all the core pieces of the Xen
implementation, including:
 - booting and setup
 - pagetable setup
 - privileged instructions
 - segmentation
 - interrupt flags
 - upcalls
 - multicall batching

BOOTING AND SETUP

The vmlinux image is decorated with ELF notes which tell the Xen
domain builder what the kernel's requirements are; the domain builder
then constructs the address space accordingly and starts the kernel.

Xen has its own entrypoint for the kernel (contained in an ELF note).
The ELF notes are set up by xen-head.S, which is included into head.S.
In principle it could be linked separately, but it seems to provoke
lots of binutils bugs.

Because the domain builder starts the kernel in a fairly sane state
(32-bit protected mode, paging enabled, flat segments set up), there's
not a lot of setup needed before starting the kernel proper.  The main
steps are:
  1. Install the Xen paravirt_ops, which is simply a matter of a
     structure assignment.
  2. Set init_mm to use the Xen-supplied pagetables (analogous to the
     head.S generated pagetables in a native boot).
  3. Reserve address space for Xen, since it takes a chunk at the top
     of the address space for its own use.
  4. Call start_kernel()

PAGETABLE SETUP

Once we hit the main kernel boot sequence, it will end up calling back
via paravirt_ops to set up various pieces of Xen specific state.  One
of the critical things which requires a bit of extra care is the
construction of the initial init_mm pagetable.  Because Xen places
tight constraints on pagetables (an active pagetable must always be
valid, and must always be mapped read-only to the guest domain), we
need to be careful when constructing the new pagetable to keep these
constraints in mind.  It turns out that the easiest way to do this is
use the initial Xen-provided pagetable as a template, and then just
insert new mappings for memory where a mapping doesn't already exist.

This means that during pagetable setup, it uses a special version of
xen_set_pte which ignores any attempt to remap a read-only page as
read-write (since Xen will map its own initial pagetable as RO), but
lets other changes to the ptes happen, so that things like NX are set
properly.

PRIVILEGED INSTRUCTIONS AND SEGMENTATION

When the kernel runs under Xen, it runs in ring 1 rather than ring 0.
This means that it is more privileged than user-mode in ring 3, but it
still can't run privileged instructions directly.  Non-performance
critical instructions are dealt with by taking a privilege exception
and trapping into the hypervisor and emulating the instruction, but
more performance-critical instructions have their own specific
paravirt_ops.  In many cases we can avoid having to do any hypercalls
for these instructions, or the Xen implementation is quite different
from the normal native version.

The privileged instructions fall into the broad classes of:
  Segmentation: setting up the GDT and the GDT entries, LDT,
     TLS and so on.  Xen doesn't allow the GDT to be directly
     modified; all GDT updates are done via hypercalls where the new
     entries can be validated.  This is important because Xen uses
     segment limits to prevent the guest kernel from damaging the
     hypervisor itself.
  Traps and exceptions: Xen uses a special format for trap entrypoints,
     so when the kernel wants to set an IDT entry, it needs to be
     converted to the form Xen expects.  Xen sets int 0x80 up specially
     so that the trap goes straight from userspace into the guest kernel
     without going via the hypervisor.  sysenter isn't supported.
  Kernel stack: The esp0 entry is extracted from the tss and provided to
     Xen.
  TLB operations: the various TLB calls are mapped into corresponding
     Xen hypercalls.
  Control registers: all the control registers are privileged.  The most
     important is cr3, which points to the base of the current pagetable,
     and we handle it specially.

Another instruction we treat specially is CPUID, even though its not
privileged.  We want to control what CPU features are visible to the
rest of the kernel, and so CPUID ends up going into a paravirt_op.
Xen implements this mainly to disable the ACPI and APIC subsystems.

INTERRUPT FLAGS

Xen maintains its own separate flag for masking events, which is
contained within the per-cpu vcpu_info structure.  Because the guest
kernel runs in ring 1 and not 0, the IF flag in EFLAGS is completely
ignored (and must be, because even if a guest domain disables
interrupts for itself, it can't disable them overall).

(A note on terminology: "events" and interrupts are effectively
synonymous.  However, rather than using an "enable flag", Xen uses a
"mask flag", which blocks event delivery when it is non-zero.)

There are paravirt_ops for each of cli/sti/save_fl/restore_fl, which
are implemented to manage the Xen event mask state.  The only thing
worth noting is that when events are unmasked, we need to explicitly
see if there's a pending event and call into the hypervisor to make
sure it gets delivered.

UPCALLS

Xen needs a couple of upcall (or callback) functions to be implemented
by each guest.  One is the event upcalls, which is how events
(interrupts, effectively) are delivered to the guests.  The other is
the failsafe callback, which is used to report errors in either
reloading a segment register, or caused by iret.  These are
implemented in i386/kernel/entry.S so they can jump into the normal
iret_exc path when necessary.

MULTICALL BATCHING

Xen provides a multicall mechanism, which allows multiple hypercalls
to be issued at once in order to mitigate the cost of trapping into
the hypervisor.  This is particularly useful for context switches,
since the 4-5 hypercalls they would normally need (reload cr3, update
TLS, maybe update LDT) can be reduced to one.  This patch implements a
generic batching mechanism for hypercalls, which gets used in many
places in the Xen code.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NChris Wright <chrisw@sous-sol.org>
Cc: Ian Pratt <ian.pratt@xensource.com>
Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk>
Cc: Adrian Bunk <bunk@stusta.de>

5ead97c8

17 7月, 2007 1 次提交

x86: initial fixmap support · b1c931e3

由 Eric W. Biderman 提交于 7月 15, 2007

Needed to get fixed virtual address for USB debug and earlycon with mmio.
Signed-off-by: NEric W. Biderman <ebiderman@xmisson.com>
Signed-off-by: NYinghai Lu <yinghai.lu@sun.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Gerd Hoffmann <kraxel@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b1c931e3

11 5月, 2007 1 次提交

Revert "[PATCH] paravirt: Add startup infrastructure for paravirtualization" · 5a18c92a

由 Eric W. Biederman 提交于 5月 10, 2007

This reverts commit c9ccf30d.

Entering the kernel at startup_32 without passing our real mode data in
%esi, and without guaranteeing that physical and virtual addresses are
identity mapped makes head.S impossible to maintain.

The only user of this infrastructure is lguest which is not merged so
nothing we currently support will break by removing this over designed
nightmare, and only the pending lguest patches will be affected.  The
pending Xen patches have a different entry point that they use.

We are currently discussing what Xen and lguest need to do to boot the
kernel in a more normal fashion so using startup_32 in this weird manner is
clearly not their long term direction.

So let's remove this code in head.S before it causes brain damage to people
trying to maintain head.S

Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Zachary Amsden <zach@vmware.com>
CC: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5a18c92a

openanolis / cloud-kernel 11 个月 前同步成功

openanolis / cloud-kernel
11 个月前同步成功