Commits · 56ae43dfe233323683248a5c553bad7160db2fa5 · openanolis / cloud-kernel

23 Oct, 2007 34 commits

Committed by Rusty Russell on Oct 22, 2007

This makes lguest able to use the virtio devices.

We change the device descriptor page from a simple array to a variable
length "type, config_len, status, config data..." format, and
implement virtio_config_ops to read from that config data.

We use the virtio ring implementation for an efficient Guest <-> Host
virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
host when it changes.

We also use LHCALL_NOTIFY on kernel addresses for very very early
console output.  We could have another hypercall, but this hack works
quite well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

19f1537b
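
The variable-length "type, config_len, status, config data..." descriptor format mentioned in this commit can be sketched as a walk over a byte page. This is only an illustration, not the lguest code: the struct name `desc_hdr`, the one-byte field widths, the helper names, and the convention that a zero type ends the list are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header: "type, config_len, status", followed directly by
 * config_len bytes of config data. The field widths are assumed. */
struct desc_hdr {
    uint8_t type;
    uint8_t config_len;
    uint8_t status;
};

/* The config data starts right after the fixed header. */
static const uint8_t *desc_config(const struct desc_hdr *d)
{
    return (const uint8_t *)(d + 1);
}

/* Copy len bytes from the given offset in the config data.
 * Returns 0 on success, -1 if the read would leave the config area. */
static int desc_config_get(const struct desc_hdr *d, size_t offset,
                           void *buf, size_t len)
{
    if (offset > d->config_len || len > d->config_len - offset)
        return -1;
    memcpy(buf, desc_config(d) + offset, len);
    return 0;
}

/* Count descriptors in a page, stopping at type 0 (assumed terminator)
 * or when the next entry would run past the end of the page. */
static size_t count_descs(const uint8_t *page, size_t page_len)
{
    size_t off = 0, n = 0;

    assert(page != NULL);
    while (off + sizeof(struct desc_hdr) <= page_len) {
        const struct desc_hdr *d = (const struct desc_hdr *)(page + off);

        if (d->type == 0)
            break;
        if (off + sizeof(*d) + d->config_len > page_len)
            break;
        off += sizeof(*d) + d->config_len;
        n++;
    }
    return n;
}
```

Because each entry carries its own length, a reader has to walk the page from the start rather than index into a fixed-size array, which is the trade-off the commit accepts for variable-sized config data.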

Remove old lguest I/O infrastructure. · 15045275

Committed by Rusty Russell on Oct 22, 2007

This patch gets rid of the old lguest host I/O infrastructure and
replaces it with a single hypercall "LHCALL_NOTIFY" which takes an
address.

The main change is the removal of io.c: that mainly did inter-guest
I/O, which virtio doesn't yet support.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

15045275

Remove old lguest bus and drivers. · 0ca49ca9

Committed by Rusty Russell on Oct 22, 2007

This gets rid of the lguest bus, drivers and DMA mechanism, to make
way for a generic virtio mechanism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

0ca49ca9

Virtio helper routines for a descriptor ringbuffer implementation · 0a8a69dd

Committed by Rusty Russell on Oct 22, 2007

These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio.  Unlike the previous lguest
implementation:

1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
   cacheline.
5) We do a modulo on a variable.  We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.

Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>

0a8a69dd
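
Points 2 and 5 of the ring helper commit above can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions, not the virtio ring code: the function names are invented, reading "tricky" as a power-of-two mask is one interpretation, and attributing the 65535-byte limit to a 16-bit length field is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Map a running index onto a ring of num slots: the "modulo on a
 * variable" from point 5. Works for any non-zero ring size. */
static unsigned int ring_slot(unsigned int idx, unsigned int num)
{
    assert(num != 0);
    return idx % num;
}

/* One possible "tricky" alternative: a mask instead of a modulo.
 * Only valid when num is a power of two. */
static unsigned int ring_slot_pow2(unsigned int idx, unsigned int num)
{
    assert(num != 0 && (num & (num - 1)) == 0);
    return idx & (num - 1);
}

/* Assuming a 16-bit length field, one sg element can describe at most
 * 65535 bytes (point 2). */
static int sg_len_fits(unsigned long len)
{
    return len <= UINT16_MAX;
}
```

For a power-of-two ring both slot functions agree; the modulo version is the one that also handles sizes such as 7.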

Module autoprobing support for virtio drivers. · b01d9f28

Committed by Rusty Russell on Oct 22, 2007

This adds the logic to convert the virtio ids into module aliases, and
includes a modalias entry in sysfs and the env var to make probing work.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

b01d9f28

Virtio console driver · 31610434

Committed by Rusty Russell on Oct 22, 2007

This is an hvc-based virtio console driver.  It's suboptimal because
hvc expects to have raw access to interrupts and virtio doesn't assume
that, so it currently polls.

There are two solutions: expose hvc's "kick" interface, or wean off hvc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

31610434

Block driver using virtio. · e467cde2

Committed by Rusty Russell on Oct 22, 2007

The block driver uses scatter-gather lists with sg[0] being the
request information (struct virtio_blk_outhdr) with the type, sector
and inbuf id.  The next N sg entries are the bio itself, then the last
sg is the status byte.  Whether the N entries are in or out depends on
whether it's a read or a write.

We accept the normal (SCSI) ioctls: they get handed through to the other
side which can then handle it or reply that it's unsupported.  It's
not clear that this actually works in general, since I don't know
if blk_pc_request() requests have an accurate rq_data_dir().

Although we try to reply -ENOTTY on unsupported commands, ioctl(fd,
CDROMEJECT) returns success to userspace.  This needs a separate
patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <jens.axboe@oracle.com>

e467cde2
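
The scatter-gather layout described in the block driver commit (header first, then the N data segments, then one status byte) can be sketched as follows. The types here are stand-ins, not the kernel's: `blk_outhdr` only mirrors the fields the commit names for struct virtio_blk_outhdr, the field widths and the `BLK_READ`/`BLK_WRITE` values are assumed, and `sg_entry` is a simplified placeholder for a scatterlist entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct virtio_blk_outhdr: the commit says it
 * holds the type, sector and inbuf id; the widths are assumed. */
struct blk_outhdr {
    uint32_t type;
    uint64_t sector;
    uint64_t id;
};

/* Simplified placeholder for one scatter-gather element. */
struct sg_entry {
    void *addr;
    size_t len;
};

enum { BLK_READ = 0, BLK_WRITE = 1 };   /* assumed values */

/* Fill sg[] as: header, n data segments, status byte.
 * Returns the number of entries used. *out_num counts entries the other
 * side reads, *in_num counts entries the other side writes. */
static size_t build_blk_sg(struct sg_entry *sg, struct blk_outhdr *hdr,
                           const struct sg_entry *data, size_t n,
                           uint8_t *status, size_t *out_num, size_t *in_num)
{
    size_t i;

    sg[0].addr = hdr;
    sg[0].len = sizeof(*hdr);
    for (i = 0; i < n; i++)
        sg[1 + i] = data[i];
    sg[1 + n].addr = status;
    sg[1 + n].len = 1;

    if (hdr->type == BLK_WRITE) {
        /* Header and data go out; only the status byte comes back. */
        *out_num = 1 + n;
        *in_num = 1;
    } else {
        /* Only the header goes out; data and status come back. */
        *out_num = 1;
        *in_num = n + 1;
    }
    return n + 2;
}
```

The caller must size `sg` for n + 2 entries; the direction split is the only thing that differs between a read and a write.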

Net driver using virtio · 296f96fc

Committed by Rusty Russell on Oct 22, 2007

The network driver uses two virtqueues: one for input packets and one
for output packets.  This has nice locking properties (ie. we don't do
any for recv vs send).

TODO:
	1) Big packets.
	2) Multi-client devices (maybe separate driver?).
	3) Resolve freeing of old xmit skbs (Christian Borntraeger)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org

296f96fc

Virtio interface · ec3d41c4

Committed by Rusty Russell on Oct 22, 2007

This attempts to implement a "virtual I/O" layer which should allow
common drivers to be efficiently used across most virtual I/O
mechanisms.  It will no doubt need further enhancement.

The virtio drivers add buffers to virtio queues; as the buffers are consumed
the driver "interrupt" callbacks are invoked.

There is also a generic implementation of config space which drivers can query
to get setup information from the host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
Cc: Arnd Bergmann <arnd@arndb.de>

ec3d41c4

Boot with virtual == physical to get closer to native Linux. · 47436aa4

Committed by Rusty Russell on Oct 22, 2007

1) This allows us to get a lot closer to booting bzImages.

2) It means we don't have to know page_offset.

3) The Guest needs to modify the boot pagetables to create the
   PAGE_OFFSET mapping before jumping to C code.

4) guest_pa() walks the page tables rather than using page_offset.

5) We don't use page_offset to figure out whether to emulate: it was
   always kinda questionable, and won't work for instructions done
   before remapping (bzImage unpacking in particular).

6) We still want the kernel address for tlb flushing: have the initial
   hypercall give us that, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

47436aa4

Allow guest to specify syscall vector to use. · c18acd73

Committed by Rusty Russell on Oct 22, 2007

(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).

This patch allows Guests to specify what system call vector they want,
and we try to reserve it.  We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

c18acd73

Rename "cr3" to "gpgdir" to avoid x86-specific naming. · ee3db0f2

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ee3db0f2

Pagetables to use normal kernel types · df29f43e

Committed by Matias Zabaljauregui on Oct 22, 2007

This is my first step in the migration of page_tables.c to the kernel
types and functions/macros (2.6.23-rc3). Seems to be working OK.
Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

df29f43e

Move register setup into i386_core.c · d612cde0

Committed by Jes Sorensen on Oct 22, 2007

Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given
that this is very architecture specific.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

d612cde0

Change example launcher to use unsigned long not u32 · 511801dc

Committed by Jes Sorensen on Oct 22, 2007

Apply Clue 2x4 to lguest userland<->kernel handling code and the
lguest launcher. Pointers are not to be passed in u32's!

Basic rule of thumb: Anything passing u32's back and forth should be
passing unsigned longs to be portable to 64 bit archs.

For those who have forgotten already, I repeat: NO POINTERS IN u32!
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

511801dc
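
The rule of thumb in this commit comes down to integer width. The following sketch shows what a 32-bit container does to a 64-bit address, and why `unsigned long` is the safer carrier on Linux, where it is as wide as a pointer. The function names are made up for the example, and the pointer-width claim is an assumption about the Linux ABIs, not a guarantee of the C standard.

```c
#include <assert.h>
#include <stdint.h>

/* What goes wrong: storing an address in 32 bits silently drops the
 * upper half on a 64-bit arch. */
static uint32_t addr_as_u32(uint64_t addr)
{
    return (uint32_t)addr;
}

/* The portable-in-Linux alternative: unsigned long is pointer-sized on
 * both 32-bit and 64-bit Linux, so the round trip is lossless there. */
static void *roundtrip_ulong(void *p)
{
    return (void *)(unsigned long)p;
}
```

An address such as 0x123456789 comes back from `addr_as_u32()` as 0x23456789, which points somewhere else entirely.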

Make hypercalls arch-independent. · b410e7b1

Committed by Jes Sorensen on Oct 22, 2007

Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.

This patch requires the previous patch, which reorganizes the layout of
struct lguest_regs on i386 so that it matches the layout of struct
hcall_args.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

b410e7b1
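
The dispatch order described in this commit (common hypercalls first, then the arch hook for anything left over) can be sketched like this. It is a toy model, not hypercalls.c: the hypercall numbers, the field names inside `hcall_args`, and the body of the arch hook are all invented for illustration; only the names `hcall_args` and `lguest_arch_do_hcall()` come from the commit message.

```c
#include <assert.h>

/* Argument block; the commit names the struct, the fields are assumed. */
struct hcall_args {
    unsigned long arg0, arg1, arg2, arg3;
};

enum { HC_COMMON_NOP = 1, HC_ARCH_ONLY = 100 };   /* made-up numbers */

/* Stand-in for lguest_arch_do_hcall(): returns 0 if the arch code
 * recognised the call, -1 otherwise. */
static int arch_do_hcall(struct hcall_args *a)
{
    if (a->arg0 == HC_ARCH_ONLY) {
        a->arg1 = 1;    /* pretend the arch code did some work */
        return 0;
    }
    return -1;
}

/* Generic entry point: try the common calls, fall back to the arch. */
static int do_hcall(struct hcall_args *a)
{
    switch (a->arg0) {
    case HC_COMMON_NOP:
        return 0;
    default:
        return arch_do_hcall(a);
    }
}
```

With this split the generic file never needs to know which calls a given architecture adds.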

Introduce "hcall" pointer to indicate pending hypercall. · cc6d4fbc

Committed by Rusty Russell on Oct 22, 2007

Currently we look at the "trapnum" to see if the Guest wants a
hypercall.  But once the hypercall is done we have to reset trapnum to
a bogus value, otherwise if we exit to userspace and return, we'd run
the same hypercall twice (that was a nasty bug to find!).

This has two main effects:

1) When Jes's patch changes the hypercall args to be a generic "struct
   hcall_args" we simply change the type of "lg->hcall".  It's set by
   arch code, so if it has to copy args or something it can do so, and
   point "hcall" into lg->arch somewhere.

2) Async hypercalls only get run when an actual hypercall is pending.
   This simplifies the code a little and is a more logical semantic.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cc6d4fbc
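
The idea of a pointer that is non-NULL only while a hypercall is pending can be modelled in a few lines. This is a toy sketch of the pattern, not the lguest structures: the `guest` struct, its `regs` and `handled` fields, and both function names are assumptions; only the name `hcall` and the type name `hcall_args` appear in the commit message.

```c
#include <assert.h>
#include <stddef.h>

struct hcall_args {
    unsigned long arg0, arg1, arg2, arg3;
};

struct guest {
    struct hcall_args *hcall;   /* non-NULL only while a call is pending */
    struct hcall_args regs;     /* where arch code keeps the args (assumed) */
    unsigned int handled;       /* number of hypercalls serviced so far */
};

/* Arch side: the guest trapped asking for hypercall number nr. */
static void guest_trap_hcall(struct guest *g, unsigned long nr)
{
    g->regs.arg0 = nr;
    g->hcall = &g->regs;
}

/* Core side: service the pending hypercall once, then clear the pointer
 * so re-entering after a trip to userspace cannot run it again. */
static void run_pending_hcall(struct guest *g)
{
    if (!g->hcall)
        return;
    g->handled++;
    g->hcall = NULL;
}
```

Calling `run_pending_hcall()` twice after a single trap services the hypercall only once, which is the double-execution bug the commit message describes avoiding.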

Reorder guest saved regs to match hypercall order · 4614a3a3

Committed by Jes Sorensen on Oct 22, 2007

Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they
will be located together and allow it to map directly to a struct
hcall_ring entry (which will be renamed struct hcall_args in a
subsequent patch).

This is in preparation for making the hcall code architecture
independent.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

4614a3a3

Move i386 part of core.c to x86/core.c. · 625efab1

Committed by Jes Sorensen on Oct 22, 2007

Separate the i386 architecture-specific code from core.c, move it to
x86/core.c, and add an x86/lguest.h header file to match.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

625efab1

Make shadow IDT a complete IDT with 256 entries. · 56adbe9d

Committed by Rusty Russell on Oct 22, 2007

This simplifies the code a little, in preparation for allowing
alternate system call vectors in guests (Plan 9 uses 0x40).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

56adbe9d

Remove fixed limit on number of guests, and lguests array. · 48245cc0

Committed by Rusty Russell on Oct 22, 2007

Back when we had all the Guest state in the switcher, we had a fixed
array of them.  This is no longer necessary.

If we switch the network code to using random_ether_addr (46 bits is
enough to avoid clashes), we can get rid of the concept of "guest id"
altogether.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

48245cc0
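
The "46 bits" in this commit message follows from how a random Ethernet address is formed: of the 48 address bits, the multicast bit is cleared and the locally-administered bit is set, leaving 46 random bits. The userspace sketch below mimics that rule with `rand()`; it is not the kernel's random_ether_addr(), and the function name `random_mac` is invented.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill addr with a random unicast, locally administered MAC address.
 * rand() is used only to keep the example self-contained. */
static void random_mac(uint8_t addr[6])
{
    int i;

    for (i = 0; i < 6; i++)
        addr[i] = (uint8_t)(rand() & 0xff);
    addr[0] &= 0xfe;    /* clear the multicast bit */
    addr[0] |= 0x02;    /* set the locally administered bit */
}
```

Two bits of the first byte are fixed, so 6 * 8 - 2 = 46 bits remain to keep guests from colliding.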

Introduce guest mem offset, static link example launcher · 3c6b5bfa

Committed by Rusty Russell on Oct 22, 2007

In order to avoid problematic special linking of the Launcher, we give
the Host an offset: this means we can use any memory region in the
Launcher as Guest memory rather than insisting on mmap() at 0.

The result is quite pleasing: a number of casts are replaced with
simple additions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

3c6b5bfa
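
The "casts replaced with simple additions" remark can be made concrete. With a guest memory offset, a guest-physical address becomes a Launcher pointer by adding the base of whatever region was chosen as guest memory. The helper names and the bounds check below are assumptions made for this sketch, not the Launcher's actual functions.

```c
#include <assert.h>
#include <stddef.h>

/* Translate a guest-physical address into a Launcher pointer.
 * guest_base is the start of the region used as guest memory and
 * guest_limit its size in bytes. Returns NULL if gpa is out of range. */
static void *from_guest_phys(char *guest_base, size_t guest_limit,
                             unsigned long gpa)
{
    if (gpa >= guest_limit)
        return NULL;
    return guest_base + gpa;
}

/* The inverse: a Launcher pointer inside the region back to a
 * guest-physical address. */
static unsigned long to_guest_phys(const char *guest_base, const void *p)
{
    return (unsigned long)((const char *)p - guest_base);
}
```

Since the base is just a variable, the region can live anywhere in the Launcher's address space, which is what removes the need for mmap() at 0.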

Rename switcher.S to x86/switcher_32.S · 1f4e1de4

Committed by Rusty Russell on Oct 22, 2007

lguest uses a "switcher" shim mapped high to bounce between host and
guest.  As lguest becomes less i386-centric, we separate this code
into a subdir.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1f4e1de4

Move lguest guest support to arch/x86. · 34b8867a

Committed by Rusty Russell on Oct 22, 2007

Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops).  This moves the
guest side to arch/x86/lguest where it's closer to related code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>

34b8867a

Clocksource is continuous regardless of the state of the host's TSC. · 05aa026a

Committed by Tony Breeds on Oct 22, 2007

Currently lguest will spend a lot of time waking up the host, as it
cannot go tickless (if the [host] TSC has been marked unstable). On my
laptop I was getting ~40% of wakeups from lguest.

With this patch applied, my laptop is much happier!
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

05aa026a

lguest_devices belongs in lguest_bus.c: it's not i386-specific. · ebac5252

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ebac5252

Lguest currently depends on 32-bit x86, not just x86. · 141341cd

Committed by Rusty Russell on Oct 22, 2007

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

141341cd

Use copy_to_user() not put_user for struct timespec · 891ff65f

Committed by Jes Sorensen on Oct 22, 2007

Use copy_to_user() when copying a struct timespec to the guest -
put_user() cannot handle two long's in one go on a 64bit arch.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>

891ff65f

Remove binfmts.h include from lg.h · 25e82eba

Committed by Rusty Russell on Oct 23, 2007

It wasn't needed since a very early prototype of lguest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

25e82eba

Consolidate host virtualization support under Virtualization menu · 9525ca02

Committed by Rusty Russell on Oct 22, 2007

Move lguest under the virtualization menu.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>

9525ca02

Normalize config options for guest support · d3d1c4bd

Committed by Rusty Russell on Oct 22, 2007

1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
   menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>

d3d1c4bd

[SG] Update drivers to use sg helpers · 45711f1a

Committed by Jens Axboe on Oct 22, 2007

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

45711f1a

firewire: fw-ohci: shut up a superfluous compiler warning · 4b6d51ec

Committed by Stefan Richter on Oct 21, 2007

New warning since commit ab88ca48,
"firewire: fw-ohci: missing dma_unmap_single":
drivers/firewire/fw-ohci.c: In function 'at_context_transmit':
drivers/firewire/fw-ohci.c:609: warning: 'payload_bus' may be used
 uninitialized in this function

Access to payload_bus is conditional on packet->payload_length > 0,
and that won't change while in at_context_queue_packet.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

4b6d51ec

firewire: fw-ohci: log a note about unsupported features · c74e92c2

Committed by Stefan Richter on Oct 21, 2007

because there seems to be more time needed to implement this.
Also, change related error return values to more appropriate ones.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>

c74e92c2

22 Oct, 2007 6 commits

KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs() · 49d3bd7e

Committed by Laurent Vivier on Oct 22, 2007

In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single()
by a single call to smp_call_function_mask() (which is new for x86_64).
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

49d3bd7e

intel-iommu sg chaining support · c03ab37c

Committed by FUJITA Tomonori on Oct 21, 2007

x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
support sg chaining, we will get data corruption.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c03ab37c

intel-iommu: fix for IOMMU early crash · 358dd8ac

Committed by Keshavamurthy, Anil S on Oct 21, 2007

pci_dev->sysdata is highly overloaded, and the IOMMU is currently broken
because the IOMMU code depends on this field.

This patch introduces a new field in pci_dev's dev.archdata struct to hold
IOMMU-specific per-device private data.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

358dd8ac

intel-iommu: optimize sg map/unmap calls · f76aec76

Committed by Keshavamurthy, Anil S on Oct 21, 2007

This patch adds PageSelectiveInvalidation support, replacing the existing
DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls. It also
enables mapping one big contiguous DMA virtual address range onto
discontiguous physical addresses for SG map/unmap calls.

"Domain selective invalidation" wipes out the IOMMU address translation
cache based on domain ID, whereas "page selective invalidation" wipes out
the IOMMU address translation cache only for the given address mask range,
which is more cache friendly than domain selective invalidation.

Here is how it is done.
1) Changes to iova.c
alloc_iova() now takes a bool size_aligned argument which,
when set, returns an I/O virtual address that is
naturally aligned to 2 ^ x, where x is the order
of the size requested.

Returning a naturally aligned I/O virtual address
lets the IOMMU do "page selective
invalidations", which are more IOMMU cache friendly
than "domain selective invalidations".

2) Changes to driver/pci/intel-iommu.c
Clean up the intel_{map/unmap}_{single/sg}() calls so that
the s/g map/unmap calls no longer depend on
intel_{map/unmap}_single().

intel_map_sg() now computes the total DMA virtual address
required and allocates the size aligned total DMA virtual address
and maps the discontiguous physical address to the allocated
contiguous DMA virtual address.

In the intel_unmap_sg() case since the DMA virtual address
is contiguous and size_aligned, PageSelectiveInvalidation
is used replacing earlier DomainSelectiveInvalidations.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Suresh B <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f76aec76
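
The "naturally aligned to 2 ^ x, where x is the order of the size requested" rule from this commit is plain arithmetic and can be sketched in isolation. The helpers below only show the alignment calculation; they are not alloc_iova(), they do not model how the allocator searches for free space, and all three function names are invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* Smallest power of two >= n (n == 0 yields 1). */
static unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    assert(n <= (ULONG_MAX >> 1) + 1);   /* otherwise p would overflow */
    while (p < n)
        p <<= 1;
    return p;
}

/* Order x such that 2^x pages cover npages. */
static unsigned int size_order(unsigned long npages)
{
    unsigned int x = 0;

    while ((1UL << x) < npages)
        x++;
    return x;
}

/* First page frame >= pfn that is naturally aligned for npages, i.e.
 * a multiple of the request size rounded up to a power of two. */
static unsigned long size_aligned_pfn(unsigned long pfn, unsigned long npages)
{
    unsigned long span = roundup_pow2(npages);

    return (pfn + span - 1) & ~(span - 1);
}
```

A range aligned this way can be named by a start address plus an order, which is the shape of request the commit message calls an address mask range for page selective invalidation.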

Intel IOMMU: Iommu floppy workaround · 49a0429e

Committed by Keshavamurthy, Anil S on Oct 21, 2007

This config option (DMAR_FLPY_WA) sets up a 1:1 mapping for the floppy device so
that the floppy device, which does not use the DMA APIs, will continue to work.

Once the floppy driver starts using the DMA APIs, this config option can be turned
off or this patch can be yanked out of the kernel.

[akpm@linux-foundation.org: cleanups, rename things, build fix]
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

49a0429e

Intel IOMMU: Iommu Gfx workaround · e820482c

Committed by Keshavamurthy, Anil S on Oct 21, 2007

When we fix all the open source gfx drivers to use the DMA APIs, we can
yank this config option out.

[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e820482c

openanolis / cloud-kernel synced successfully more than 1 year ago