Commits · 62ec56524f0eeaa1aa4f7281425fa34d400cdacc · openanolis / cloud-kernel

12 Nov, 2007 · 1 commit

virtio: Force use of power-of-two for descriptor ring sizes · 42b36cc0

Committed by Rusty Russell on Nov 12, 2007

The virtio descriptor rings of size N-1 were nicely set up to be
aligned to an N-byte boundary. But as Anthony Liguori points out, the
free-running indices used by virtio require that the sizes be a power
of 2, otherwise we get problems on wrap (demonstrated with lguest).

So we replace the clever "2^n-1" scheme with a simple "align to page
boundary" scheme: this means that all virtio rings take at least two
pages, but it's safer than guessing cache alignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
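
The wrap problem can be reproduced in isolation. The following is a minimal sketch, assuming 16-bit free-running indices that are reduced to a ring slot with a plain modulo; the helper names `ring_slot` and `wrap_is_seamless` are illustrative and are not taken from the virtio code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Map a free-running 16-bit index onto a slot in a ring of `num` entries. */
static unsigned ring_slot(uint16_t idx, unsigned num)
{
	assert(num != 0);
	return idx % num;
}

/*
 * True if the slot used after index 65535 (which wraps to 0) directly
 * follows the slot used by index 65535.  This holds only when `num`
 * divides 65536, i.e. when `num` is a power of two.
 */
static bool wrap_is_seamless(unsigned num)
{
	uint16_t last = UINT16_MAX;
	uint16_t next = (uint16_t)(last + 1);	/* wraps to 0 */

	return ring_slot(next, num) == (ring_slot(last, num) + 1) % num;
}
```

With a ring of 255 entries, index 65535 and index 0 both map to slot 0, so one slot is used twice in a row at the wrap; with 256 entries the sequence continues from slot 255 to slot 0 as expected.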

25 Oct, 2007 · 3 commits

lguest: documentation update · e1e72965

Committed by Rusty Russell on Oct 25, 2007

Went through the documentation doing typo and content fixes.  This
patch contains only comment and whitespace changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lguest: remove unused "wake" element from struct lguest · 197bff63

Committed by Rusty Russell on Oct 25, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lguest: use defines from x86 headers instead of magic numbers · 25c47bb3

Committed by Rusty Russell on Oct 25, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

23 Oct, 2007 · 25 commits

generalize lgread_u32/lgwrite_u32. · 2d37f94a

Committed by Rusty Russell on Oct 22, 2007

Jes complains that page table code still uses lgread_u32 even though
it now uses general kernel pte types.  The best thing to do is to
generalize lgread_u32 and lgwrite_u32.

This means we lose the efficiency of get_user().  We could potentially
regain it if we used __copy_from_user instead of copy_from_user, but
I'm not certain that our range check is equivalent to access_ok() on
all platforms.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jes Sorensen <jes@sgi.com>
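
The idea of one size-generic accessor guarded by a range check can be sketched in userspace C. This is only an illustration: `struct guest`, `guest_range_ok` and `guest_read` are hypothetical names, and the `memcpy()` stands in for what would be a `copy_from_user()` in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a Guest: a flat buffer of `size` bytes. */
struct guest {
	unsigned char *mem;
	size_t size;
};

/* Overflow-safe check that [addr, addr + len) lies inside Guest memory. */
static bool guest_range_ok(const struct guest *g, size_t addr, size_t len)
{
	return addr <= g->size && len <= g->size - addr;
}

/*
 * Read `len` bytes at Guest address `addr` into `dst`.  One routine serves
 * u32, pte-sized values and anything else, instead of one helper per type.
 */
static bool guest_read(const struct guest *g, void *dst, size_t addr, size_t len)
{
	assert(g && dst);
	if (!guest_range_ok(g, addr, len))
		return false;
	memcpy(dst, g->mem + addr, len);
	return true;
}
```

The range check is written as `len <= size - addr` rather than `addr + len <= size` so that a huge `addr` cannot wrap around and pass.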

Lguest support for Virtio · 19f1537b

Committed by Rusty Russell on Oct 22, 2007

This makes lguest able to use the virtio devices.

We change the device descriptor page from a simple array to a variable
length "type, config_len, status, config data..." format, and
implement virtio_config_ops to read from that config data.

We use the virtio ring implementation for an efficient Guest <-> Host
virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
host when it changes.

We also use LHCALL_NOTIFY on kernel addresses for very very early
console output.  We could have another hypercall, but this hack works
quite well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
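
A variable-length "type, config_len, status, config data..." page can be walked entry by entry. The sketch below assumes one-byte fields and a zero `type` as the end marker; both are guesses made for illustration, and `struct dev_desc` and `count_devices` are hypothetical names rather than the lguest definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout of one entry: the field order follows the commit message,
 * the one-byte widths are a guess made for this sketch.
 */
struct dev_desc {
	uint8_t type;		/* 0 marks the end of the list (assumption) */
	uint8_t config_len;	/* number of config bytes that follow */
	uint8_t status;
	uint8_t config[];	/* config_len bytes */
};

/* Count the descriptors packed back to back in a page of `len` bytes. */
static unsigned count_devices(const uint8_t *page, size_t len)
{
	size_t off = 0;
	unsigned n = 0;

	assert(page);
	while (len - off >= sizeof(struct dev_desc)) {
		const struct dev_desc *d = (const struct dev_desc *)(page + off);
		size_t entry = sizeof(*d) + d->config_len;

		/* Stop at the end marker or at an entry that would overrun. */
		if (d->type == 0 || entry > len - off)
			break;
		off += entry;
		n++;
	}
	return n;
}
```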

Remove old lguest I/O infrastructure. · 15045275

Committed by Rusty Russell on Oct 22, 2007

This patch gets rid of the old lguest host I/O infrastructure and
replaces it with a single hypercall "LHCALL_NOTIFY" which takes an
address.

The main change is the removal of io.c: that mainly did inter-guest
I/O, which virtio doesn't yet support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Remove old lguest bus and drivers. · 0ca49ca9

Committed by Rusty Russell on Oct 22, 2007

This gets rid of the lguest bus, drivers and DMA mechanism, to make
way for a generic virtio mechanism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Boot with virtual == physical to get closer to native Linux. · 47436aa4

Committed by Rusty Russell on Oct 22, 2007

1) This allows us to get a lot closer to booting bzImages.

2) It means we don't have to know page_offset.

3) The Guest needs to modify the boot pagetables to create the
   PAGE_OFFSET mapping before jumping to C code.

4) guest_pa() walks the page tables rather than using page_offset.

5) We don't use page_offset to figure out whether to emulate: it was
   always kinda questionable, and won't work for instructions done
   before remapping (bzImage unpacking in particular).

6) We still want the kernel address for tlb flushing: have the initial
   hypercall give us that, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Allow guest to specify syscall vector to use. · c18acd73

Committed by Rusty Russell on Oct 22, 2007

(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).

This patch allows Guests to specify what system call vector they want,
and we try to reserve it.  We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Rename "cr3" to "gpgdir" to avoid x86-specific naming. · ee3db0f2

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Pagetables to use normal kernel types · df29f43e

Committed by Matias Zabaljauregui on Oct 22, 2007

This is my first step in the migration of page_tables.c to the kernel
types and functions/macros (2.6.23-rc3). Seems to be working OK.
Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Move register setup into i386_core.c · d612cde0

Committed by Jes Sorensen on Oct 22, 2007

Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given
that this is very architecture specific.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Change example launcher to use unsigned long not u32 · 511801dc

Committed by Jes Sorensen on Oct 22, 2007

Apply Clue 2x4 to lguest userland<->kernel handling code and the
lguest launcher. Pointers are not to be passed in u32's!

Basic rule of thumb: Anything passing u32's back and forth should be
passing unsigned longs to be portable to 64 bit archs.

For those who have forgotten already, I repeat: NO POINTERS IN u32!
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Make hypercalls arch-independent. · b410e7b1

Committed by Jes Sorensen on Oct 22, 2007

Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.

This patch requires the previous patch which reorganizes the layout of
struct lguest_regs on i386 so they match the layout of struct
hcall_args.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Introduce "hcall" pointer to indicate pending hypercall. · cc6d4fbc

Committed by Rusty Russell on Oct 22, 2007

Currently we look at the "trapnum" to see if the Guest wants a
hypercall.  But once the hypercall is done we have to reset trapnum to
a bogus value, otherwise if we exit to userspace and return, we'd run
the same hypercall twice (that was a nasty bug to find!).

This has two main effects:

1) When Jes's patch changes the hypercall args to be a generic "struct
   hcall_args" we simply change the type of "lg->hcall".  It's set by
   arch code, so if it has to copy args or something it can do so, and
   point "hcall" into lg->arch somewhere.

2) Async hypercalls only get run when an actual hypercall is pending.
   This simplifies the code a little and is a more logical semantic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
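
The "pending pointer" idea can be sketched as follows: the hypercall runs only while the pointer is set, and clearing it afterwards prevents a second run when the host re-enters. The structures are cut-down, hypothetical versions for illustration; only the names `hcall_args` and `hcall` come from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down versions of the structures named above. */
struct hcall_args {
	unsigned long arg0, arg1, arg2, arg3;
};

struct guest {
	struct hcall_args *hcall;	/* non-NULL only while a hypercall is pending */
	unsigned int handled;		/* how many hypercalls have been run */
};

/* Run the pending hypercall, if any, and mark it done by clearing the pointer. */
static void do_pending_hcall(struct guest *lg)
{
	assert(lg);
	if (!lg->hcall)
		return;
	lg->handled++;		/* stand-in for the real hypercall work */
	lg->hcall = NULL;	/* a later re-entry finds nothing to repeat */
}
```

Calling `do_pending_hcall()` twice in a row handles the hypercall once, which is the property that previously required resetting trapnum to a bogus value.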

Reorder guest saved regs to match hypercall order · 4614a3a3

Committed by Jes Sorensen on Oct 22, 2007

Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they
will be located together and allow it to map directly to a struct
hcall_ring entry (which will be renamed struct hcall_args in a
subsequent patch).

This is in preparation for making the hcall code architecture
independent.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Move i386 part of core.c to x86/core.c. · 625efab1

Committed by Jes Sorensen on Oct 22, 2007

Separate the i386 architecture-specific code from core.c, move it to
x86/core.c, and add an x86/lguest.h header file to match.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Make shadow IDT a complete IDT with 256 entries. · 56adbe9d

Committed by Rusty Russell on Oct 22, 2007

This simplifies the code a little, in preparation for allowing
alternate system call vectors in guests (Plan 9 uses 0x40).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Remove fixed limit on number of guests, and lguests array. · 48245cc0

Committed by Rusty Russell on Oct 22, 2007

Back when we had all the Guest state in the switcher, we had a fixed
array of them.  This is no longer necessary.

If we switch the network code to using random_ether_addr (46 bits is
enough to avoid clashes), we can get rid of the concept of "guest id"
altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Introduce guest mem offset, static link example launcher · 3c6b5bfa

Committed by Rusty Russell on Oct 22, 2007

In order to avoid problematic special linking of the Launcher, we give
the Host an offset: this means we can use any memory region in the
Launcher as Guest memory rather than insisting on mmap() at 0.

The result is quite pleasing: a number of casts are replaced with
simple additions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
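
With an offset, translating a Guest-physical address to a Launcher pointer becomes a bounds check plus an addition. A minimal sketch, assuming a single contiguous region; `struct guest_mem` and `from_guest_phys` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of where the Launcher placed Guest memory. */
struct guest_mem {
	char *base;	/* start of the region inside the Launcher */
	size_t size;	/* size of the region in bytes */
};

/* Translate a Guest-physical address to a Launcher pointer, or NULL if out of range. */
static void *from_guest_phys(const struct guest_mem *m, size_t addr)
{
	assert(m && m->base);
	return addr < m->size ? m->base + addr : NULL;
}
```

Because the base is just a field, the region can come from any allocation rather than a mapping fixed at address 0.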

Rename switcher.S to x86/switcher_32.S · 1f4e1de4

Committed by Rusty Russell on Oct 22, 2007

lguest uses a "switcher" shim mapped high to bounce between host and
guest.  As lguest becomes less i386-centric, we separate this code
into a subdir.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Move lguest guest support to arch/x86. · 34b8867a

Committed by Rusty Russell on Oct 22, 2007

Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops).  This moves the
guest side to arch/x86/lguest where it's closer to related code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>

Clocksource is continuous regardless of the state of the host's TSC. · 05aa026a

Committed by Tony Breeds on Oct 22, 2007

Currently lguest will spend a lot of time waking up the host, as it
cannot go tickless (if the [host] TSC has been marked unstable). On my
laptop I was getting ~40% of wakeups from lguest.

With this patch applied, my laptop is much happier!
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

lguest_devices belongs in lguest_bus.c: it's not i386-specific. · ebac5252

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Lguest currently depends on 32-bit x86, not just x86. · 141341cd

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Use copy_to_user() not put_user for struct timespec · 891ff65f

Committed by Jes Sorensen on Oct 22, 2007

Use copy_to_user() when copying a struct timespec to the guest -
put_user() cannot handle two longs in one go on a 64-bit arch.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>

Remove binfmts.h include from lg.h · 25e82eba

Committed by Rusty Russell on Oct 23, 2007

It wasn't needed since a very early prototype of lguest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Normalize config options for guest support · d3d1c4bd

Committed by Rusty Russell on Oct 22, 2007

1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
   menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>

17 Oct, 2007 · 3 commits

[x86] remove uses of magic macros for boot_params access · 30c82645

Committed by H. Peter Anvin on Oct 15, 2007

Instead of using magic macros for boot_params access, simply use the
boot_params structure.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

paravirt: clean up lazy mode handling · 8965c1c0

Committed by Jeremy Fitzhardinge on Oct 16, 2007

Currently, the set_lazy_mode pv_op is overloaded with 5 functions:
 1. enter lazy cpu mode
 2. leave lazy cpu mode
 3. enter lazy mmu mode
 4. leave lazy mmu mode
 5. flush pending batched operations

This complicates each paravirt backend, since it needs to deal with
all the possible state transitions, handling flushing, etc. In
particular, flushing is quite distinct from the other 4 functions, and
seems to just cause complication.

This patch removes the set_lazy_mode operation, and adds "enter" and
"leave" lazy mode operations on mmu_ops and cpu_ops.  All the logic
associated with entering and leaving lazy states is now in common code
(basically BUG_ONs to make sure that no mode is current when entering
a lazy mode, and make sure that the mode is current when leaving).
Also, flush is handled in a common way, by simply leaving and
re-entering the lazy mode.

The result is that the Xen, lguest and VMI lazy mode implementations
are much simpler.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>
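
The enter/leave discipline described above can be sketched as a tiny state machine, with `assert()` standing in for the kernel's BUG_ON. All names are illustrative, and the assumption that a backend flushes its batched work on leave is made only for this sketch.

```c
#include <assert.h>

enum lazy_mode { LAZY_NONE, LAZY_MMU, LAZY_CPU };

static enum lazy_mode current_mode = LAZY_NONE;
static unsigned int flushes;	/* counts backend flushes, for illustration */

/* Entering is only legal when no lazy mode is active. */
static void enter_lazy(enum lazy_mode mode)
{
	assert(current_mode == LAZY_NONE);
	current_mode = mode;
}

/* Leaving is only legal for the active mode; batched work is flushed here. */
static void leave_lazy(enum lazy_mode mode)
{
	assert(current_mode == mode);
	flushes++;
	current_mode = LAZY_NONE;
}

/* A flush is just "leave, then re-enter" the current mode. */
static void flush_lazy(void)
{
	enum lazy_mode mode = current_mode;

	if (mode != LAZY_NONE) {
		leave_lazy(mode);
		enter_lazy(mode);
	}
}
```

A backend then only implements enter and leave; it no longer needs a separate flush case or a table of state transitions.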

paravirt: refactor struct paravirt_ops into smaller pv_*_ops · 93b1eab3

Committed by Jeremy Fitzhardinge on Oct 16, 2007

This patch refactors the paravirt_ops structure into groups of
functionally related ops:

pv_info - random info, rather than function entrypoints
pv_init_ops - functions used at boot time (some for module_init too)
pv_misc_ops - lazy mode, which didn't fit well anywhere else
pv_time_ops - time-related functions
pv_cpu_ops - various privileged instruction ops
pv_irq_ops - operations for managing interrupt state
pv_apic_ops - APIC operations
pv_mmu_ops - operations for managing pagetables

There are several motivations for this:

1. Some of these ops will be general to all x86, and some will be
   i386/x86-64 specific.  This makes it easier to share common stuff
   while allowing separate implementations where needed.

2. At the moment we must export all of paravirt_ops, but modules only
   need selected parts of it.  This allows us to export on a case by case
   basis (and also choose which export license we want to apply).

3. Functional groupings make things a bit more readable.

Struct paravirt_ops is now only used as a template to generate
patch-site identifiers, and to extract function pointers for inserting
into jmp/calls when patching.  It is only instantiated when needed.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Cc: "Glauber de Oliveira Costa" <glommer@gmail.com>
Cc: Jun Nakajima <jun.nakajima@intel.com>

25 Sep, 2007 · 1 commit

fix modules oopsing in lguest guests · bbbd2bf0

Committed by Rusty Russell on Sep 24, 2007

The assembly templates for lguest guest patching are in the .init.text
section.  This means that modules get patched with "cc cc cc cc" or similar
junk.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

13 Sep, 2007 · 1 commit

lguest: Fix guest crash when CONFIG_X86_USE_3DNOW=y · c413fecc

Committed by Rusty Russell on Sep 11, 2007

One of the very first things lguest_init() does is a memcpy.  On
Athlon/Duron/K7 or CyrixIII/VIA-C3 or Geode GX/LX, this tries to use
MMX.

memcpy -> _mmx_memcpy -> kernel_fpu_begin -> clts -> paravirt_ops.clts

But we haven't set paravirt_ops.clts yet, so we do the native version
and crash.  The simplest solution is to use __memcpy.

Thanks to Michael Rasenberger for the bug report.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Aug, 2007 · 1 commit

Fix lguest page-pinning logic ("lguest: bad stack page 0xc057a000") · 8057d763

Committed by Rusty Russell on Aug 30, 2007

If the stack pointer is 0xc057a000, then the first stack page is at
0xc0579000 (the stack pointer is decremented before use).  Not
calculating this correctly caused guests with CONFIG_DEBUG_PAGEALLOC=y
to be killed with a "bad stack page" message: the initial kernel stack
was just preceding the .smp_locks section which
CONFIG_DEBUG_PAGEALLOC marks read-only when freeing.

Thanks to Frederik Deweerdt for the bug report!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
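
The arithmetic behind the fix fits in one line: because the stack pointer is decremented before use, the first byte touched is `sp - 1`, so the page must be derived from that address. A minimal sketch, assuming 4 KiB pages; `first_stack_page` and `SKETCH_PAGE_SIZE` are illustrative names, not the lguest ones.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* i386 page size, assumed for this sketch */

/*
 * The stack grows down and the pointer is decremented before each push,
 * so the first byte used is sp - 1, not sp.
 */
static uint32_t first_stack_page(uint32_t sp)
{
	assert(sp != 0);
	return (sp - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

For a page-aligned stack pointer of 0xc057a000 this yields 0xc0579000, whereas masking `sp` directly would name the page above the stack.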

24 Aug, 2007 · 1 commit

lguest should depend on CONFIG_FUTEX · deec5950

Committed by Alexey Dobriyan on Aug 24, 2007

It uses get_futex_key().
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Aug, 2007 · 2 commits

i386: Make patching more robust, fix paravirt issue · ab144f5e

Committed by Andi Kleen on Aug 10, 2007

Commit 19d36ccd "x86: Fix alternatives and kprobes to remap
write-protected kernel text" uses code which is being patched for
patching.

In particular, paravirt_ops does patching in two stages: first it
calls paravirt_ops.patch, then it fills any remaining instructions
with nop_out().  nop_out calls text_poke() which calls
lookup_address() which calls pgd_val() (aka paravirt_ops.pgd_val):
that call site is one of the places we patch.

If we always do patching as one single call to text_poke(), we only
need make sure we're not patching the memcpy in text_poke itself.
This means the prototype to paravirt_ops.patch needs to change, to
marshal the new code into a buffer rather than patching in place as it
does now.  It also means all patching goes through text_poke(), which
is known to be safe (apply_alternatives is also changed to make a
single patch).

AK: fix compilation on x86-64 (bad rusty!)
AK: fix boot on x86-64 (sigh)
AK: merged with other patches
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
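
The "marshal into a buffer, then patch once" approach can be sketched in userspace C. This is only an illustration: `build_patch` and `patch_site` are hypothetical names, the final `memcpy()` stands in for the single text_poke() call, and padding with the one-byte 0x90 nop is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP 0x90	/* one-byte x86 nop */

/*
 * Build the complete replacement for a patch site in a local buffer:
 * the new instructions first, then nop padding up to the site length.
 */
static void build_patch(unsigned char *buf, size_t site_len,
			const unsigned char *insns, size_t insn_len)
{
	assert(insn_len <= site_len);
	memcpy(buf, insns, insn_len);
	memset(buf + insn_len, NOP, site_len - insn_len);
}

/* Apply it with one copy, standing in for a single text_poke() call. */
static void patch_site(unsigned char *site, size_t site_len,
		       const unsigned char *insns, size_t insn_len)
{
	unsigned char buf[64];

	assert(site_len <= sizeof(buf));
	build_patch(buf, site_len, insns, insn_len);
	memcpy(site, buf, site_len);
}
```

Because the site is written in one step, no code runs between "new instructions written" and "remainder filled with nops", which is the window the two-stage scheme left open.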

lguest files should explicitly include asm/paravirt.h · b1a47190

Committed by Jes Sorensen on Aug 10, 2007

Files using bits from paravirt.h should explicitly include it rather than
relying on it being pulled in by something else.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09 Aug, 2007 · 2 commits

lguest: Fix Malicious Guest GDT Host Crash · 0d027c01

Committed by Rusty Russell on Aug 09, 2007

If a Guest makes a hypercall which sets a GDT entry to not present, we
currently set any segment registers using that GDT entry to 0.
Unfortunately, this is not sufficient: there are other ways of
altering GDT entries which will cause a fault.

The correct solution is to do what Linux does: let them set any GDT value
they want and handle the #GP when popping causes a fault.  This has
the added benefit of making our Switcher slightly more robust in the
case of any other bugs which cause it to fault.

We kill the Guest if it causes a fault in the Switcher: it's the
Guest's responsibility to make sure it's not using segments when it
changes them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix non-TSC guest clocksource lockup · 37250097

Committed by Rusty Russell on Aug 09, 2007

lguest uses a host-supplied wallclock-based clocksource when the TSC
is not reliable.  As this is already in nanoseconds, I naively used a
multiplier of 1 and a shift of 0.

But update_wall_time() in its infinite wisdom decides to adjust the
clock a little (where does it think it's getting a more accurate time
from?)

It will happily tweak the multiplier... to 0, then -1.

So the "fix" is to use a shift of 22 like everyone else, and a
multiplier of 1 << 22.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
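
The effect of the multiplier and shift can be checked with the usual clocksource conversion, ns = (cycles * mult) >> shift. A minimal sketch; `cyc2ns` is an illustrative helper, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Clocksource conversion: nanoseconds = (cycles * mult) >> shift. */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	return (cycles * mult) >> shift;
}
```

With `mult = 1, shift = 0`, the smallest possible adjustment takes the multiplier to 0, and `cyc2ns(1000, 0, 0)` is 0: time stops. With `mult = 1 << 22, shift = 22`, one step of adjustment changes the rate by 1/4194304, roughly 0.24 parts per million, so `cyc2ns(1000000, (1u << 22) - 1, 22)` is 999999.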
