Commits · 38bc2b8c56a2e212bbd19de7cf9976dcc7bf9953 · openeuler / Kernel

12 Jun, 2009 8 commits

lguest: implement deferred interrupts in example Launcher · 38bc2b8c

Authored by Rusty Russell on Jun 12, 2009

Rather than sending an interrupt on every buffer, we only send an interrupt
when we're about to wait for the Guest to send us a new one.  The console
input and network input still send interrupts manually, but the block device,
network and console output queues can simply rely on this logic to send
interrupts to the Guest at the right time.
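The deferral described above can be sketched as follows. This is a minimal illustration, not the Launcher's actual code: `struct vq_sketch`, `add_used_deferred()`, `flush_irq()` and `wait_for_guest()` are hypothetical names, and a counter stands in for really interrupting the Guest.

```c
/* Hypothetical, simplified virtqueue: only the fields this sketch needs. */
struct vq_sketch {
	unsigned int pending_used; /* buffers returned since the last interrupt */
	unsigned int irqs_sent;    /* stands in for interrupting the Guest */
};

/* Hand a buffer back to the Guest, but do not interrupt yet. */
static void add_used_deferred(struct vq_sketch *vq)
{
	vq->pending_used++;
}

/* Send a single interrupt covering everything returned so far, if anything. */
static void flush_irq(struct vq_sketch *vq)
{
	if (vq->pending_used == 0)
		return;
	vq->pending_used = 0;
	vq->irqs_sent++;
}

/* Called when the queue is empty and we are about to wait for the Guest. */
static void wait_for_guest(struct vq_sketch *vq)
{
	flush_irq(vq);
	/* A real device would now block until the Guest adds a new buffer. */
}
```

With this shape, returning three buffers and then waiting produces one interrupt rather than three, and waiting again with nothing pending produces none.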

The patch is cluttered by moving trigger_irq() higher in the code.

In practice, two factors make this optimization less interesting:
(1) we often only get one input at a time, even for networking,
(2) triggering an interrupt rapidly tends to get coalesced anyway.

Before:				Secs	RxIRQS	TxIRQs
 1G TCP Guest->Host:		3.72	32784	32771
 1M normal pings:		99	1000004	995541
 100,000 1k pings (-l 120):	5	49510	49058

After:
 1G TCP Guest->Host:		3.69	32809	32769
 1M normal pings:		99	1000004	996196
 100,000 1k pings (-l 120):	5	52435	52361

(Note the interrupt count on 100k pings goes *up*: see next patch).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

38bc2b8c

lguest: have example Launcher service all devices in separate threads · 659a0e66

Authored by Rusty Russell on Jun 12, 2009

Currently lguest has three threads: the main Launcher thread, a Waker
thread, and a thread for the block device (because synchronous block
was simply too painful to bear).

The Waker selects() on all the input file descriptors (eg. stdin, net
devices, pipe to the block thread) and when one becomes readable it calls
into the kernel to kick the Launcher thread out into userspace, which
repeats the poll, services the device(s), and then tells the kernel to
release the Waker before re-entering the kernel to run the Guest.

Also, to make a slightly-decent network transmit routine, the Launcher
would suppress further network interrupts while it set a timer: that
signal handler would write to a pipe, which would rouse the Waker
which would prod the Launcher out of the kernel to check the network
device again.

Now we can convert all our virtqueues to separate threads: each one has
a separate eventfd for when the Guest pokes the device, and can trigger
interrupts in the Guest directly.
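The eventfd handshake each virtqueue thread relies on can be sketched as below. This is a Linux-only illustration under stated assumptions: `wait_for_kick()` and `kick()` are hypothetical helper names, thread creation is omitted, and `kick()` merely stands in for the Guest's notification creating an event on the eventfd.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* One iteration of a hypothetical per-virtqueue service loop: block until
 * the queue has been kicked, then return how many kicks were pending. */
static uint64_t wait_for_kick(int efd)
{
	uint64_t count = 0;

	/* read() on an eventfd blocks while its counter is zero, then
	 * returns the counter value and resets it to zero. */
	if (read(efd, &count, sizeof(count)) != sizeof(count))
		return 0;
	return count;
}

/* Stand-in for the Guest poking the device: add 1 to the eventfd counter. */
static int kick(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == sizeof(one) ? 0 : -1;
}
```

A device thread would call `wait_for_kick()` on an fd from `eventfd(0, 0)`, service the ring, and loop. Because kicks accumulate in the counter, several notifications that arrive while the thread is busy are collapsed into one wakeup.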

The linecount shows how much this simplifies, but to really bring it
home, here's an strace analysis of single Guest->Host ping before:

* Guest sends packet, notifies xmit vq, return control to Launcher
* Launcher clears notification flag on xmit ring
* Launcher writes packet to TUN device
	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Launcher sets up interrupt for Guest (xmit ring is empty)
	write(10, "\2\0\0\0\3\0\0\0", 8) = 0
* Launcher sets up timer for interrupt mitigation
	setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
* Launcher re-runs guest
	pread64(10, 0xbfa5f4d4, 4, 0) ...
* Waker notices reply packet in tun device (it was in select)
	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
* Waker kicks Launcher out of guest:
	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher returns from running guest:
	... = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks at input fds:
	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
* Launcher reads pong from tun device:
	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
* Launcher injects guest notification:
	write(10, "\2\0\0\0\2\0\0\0", 8) = 0
* Launcher rechecks fds:
	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher reruns Guest:
	pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
* Signal comes in, uses pipe to wake up Launcher:
	--- SIGALRM (Alarm clock) @ 0 (0) ---
	write(8, "\0", 1)       = 1
	sigreturn()             = ? (mask now [])
* Waker sees write on pipe:
	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
* Waker kicks Launcher out of Guest:
	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher exits from kernel:
	pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks to see what fd woke it:
	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
* Launcher reads timeout fd, sets notification flag on xmit ring
	read(6, "\0", 32)       = 1
* Launcher rechecks fds:
	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher resumes Guest:
	pread64(10, "\0p\0\4", 4, 0) ....

strace analysis of single Guest->Host ping after:

* Guest sends packet, notifies xmit vq, creates event on eventfd.
* Network xmit thread wakes from read on eventfd:
	read(7, "\1\0\0\0\0\0\0\0", 8)          = 8
* Network xmit thread writes packet to TUN device
	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Network recv thread wakes up from read on tunfd:
	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
* Network recv thread sets up interrupt for the Guest
	write(6, "\2\0\0\0\2\0\0\0", 8) = 0
* Network recv thread goes back to reading tunfd
	13:39:42.460285 readv(4,  <unfinished ...>
* Network xmit thread sets up interrupt for Guest (xmit ring is empty)
	write(6, "\2\0\0\0\3\0\0\0", 8) = 0
* Network xmit thread goes back to reading from eventfd
	read(7, <unfinished ...>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

659a0e66

lguest: fix writev returning short on console output · 7b5c806c

Authored by Rusty Russell on Jun 12, 2009

I've never seen it here, but I can't find anywhere that says writev
will write everything.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
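One way to cope with a short `writev()` is to retry with the remaining data. The helper below is a sketch of that idea, not the code from this commit; the name `writev_all()` is made up for illustration, and it modifies the caller's `iov` array as it goes.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Write every byte described by iov[0..iovcnt), retrying after short
 * writes. Returns 0 on success, -1 on error with errno set. */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
	while (iovcnt > 0) {
		ssize_t n = writev(fd, iov, iovcnt);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		/* Skip the elements that were written completely... */
		while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
			n -= iov->iov_len;
			iov++;
			iovcnt--;
		}
		/* ...and trim the one that was written partially. */
		if (iovcnt > 0) {
			iov->iov_base = (char *)iov->iov_base + n;
			iov->iov_len -= n;
		}
	}
	return 0;
}
```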

7b5c806c

lguest: clean up length-used value in example launcher · e606490c

Authored by Rusty Russell on Jun 12, 2009

The "len" field in the used ring for virtio indicates the number of
bytes *written* to the buffer.  This means the guest doesn't have to
zero the buffers in advance as it always knows the used length.

Erroneously, the console and network example code puts the length
*read* into that field.  The guest ignores it, but it's wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
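The rule can be illustrated with a small sketch. The struct mirrors the two 32-bit fields of a virtio used-ring element; `make_used()` and the `buf_direction` enum are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout idea as a virtio used-ring element: descriptor head + length. */
struct used_elem_sketch {
	uint32_t id;  /* head of the descriptor chain being returned */
	uint32_t len; /* bytes the device WROTE into the buffer */
};

enum buf_direction { DEVICE_READS_BUFFER, DEVICE_WRITES_BUFFER };

/* Build the used entry: only data written by the device counts towards len. */
static struct used_elem_sketch make_used(uint32_t head, enum buf_direction dir,
					 size_t bytes_transferred)
{
	struct used_elem_sketch e = { .id = head, .len = 0 };

	if (dir == DEVICE_WRITES_BUFFER)
		e.len = (uint32_t)bytes_transferred;
	return e;
}
```

Under this rule, a console output or network transmit buffer (which the device only reads) is returned with a length of 0, while an input buffer is returned with the number of bytes received.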

e606490c

lguest: remove invalid interrupt forcing logic. · ebf9a5a9

Authored by Rusty Russell on Jun 12, 2009

20887611 (lguest: notify on empty) introduced
lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
interrupts all the time.

Because we always process one buffer at a time, the inflight count is always 0
when we call trigger_irq() and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
the Guest.

It should be looking to see if there are more buffers in the Guest's queue:
if it's empty, then we force an interrupt.

This makes little difference, since we usually have an empty queue; but
that's the subject of another patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
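The intended decision can be sketched as a pure function. This is an illustration of the logic described in the message, not the patch itself: `should_interrupt()` and its parameters are hypothetical, while the flag value matches the virtio ring definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1 /* Guest: "don't interrupt me" hint */

/* Decide whether to interrupt the Guest after returning a buffer.
 * avail_idx is the Guest's published index, last_avail_idx is how far
 * we have consumed; equal values mean the Guest's queue is empty. */
static bool should_interrupt(uint16_t avail_flags, uint16_t avail_idx,
			     uint16_t last_avail_idx, bool notify_on_empty)
{
	/* No suppression requested: always interrupt. */
	if (!(avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
		return true;

	/* Suppression requested: override it only when NOTIFY_ON_EMPTY
	 * applies and there are no more buffers waiting in the queue. */
	return notify_on_empty && avail_idx == last_avail_idx;
}
```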

ebf9a5a9

lguest: get more serious about wmb() in example Launcher code · f7027c63

Authored by Rusty Russell on Jun 12, 2009

Since the Launcher process runs the Guest, it doesn't have to be very
serious about its barriers: the Guest isn't running while we are (Guest
is UP).

Before we change to use threads to service devices, we need to fix this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
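Why the barrier matters once devices run in their own threads can be shown with a sketch of publishing a used entry. All names here are hypothetical, and the GCC/Clang builtin `__sync_synchronize()` (a full barrier) is used as a stand-in for `wmb()`.

```c
#include <stdint.h>

#define RING_SIZE_SKETCH 8 /* power of two, chosen arbitrarily */

struct used_elem_sketch {
	uint32_t id;
	uint32_t len;
};

struct used_ring_sketch {
	uint16_t flags;
	volatile uint16_t idx; /* the Guest polls this index */
	struct used_elem_sketch ring[RING_SIZE_SKETCH];
};

/* Stand-in for wmb(): a full memory barrier from the compiler builtin. */
#define wmb_sketch() __sync_synchronize()

static void add_used_sketch(struct used_ring_sketch *used,
			    uint32_t head, uint32_t len)
{
	struct used_elem_sketch *e = &used->ring[used->idx % RING_SIZE_SKETCH];

	e->id = head;
	e->len = len;
	/* The entry must be visible before the index that publishes it,
	 * otherwise a concurrently running Guest could read a stale entry. */
	wmb_sketch();
	used->idx++;
}
```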

f7027c63

lguest: cleanup passing of /dev/lguest fd around example launcher. · 56739c80

Authored by Rusty Russell on Jun 12, 2009

We hand the /dev/lguest fd everywhere; it's far neater to just make it
a global (it already is, in fact, hidden in the waker_fds struct).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

56739c80

lguest: be paranoid about guest playing with device descriptors. · 713b15b3

Authored by Rusty Russell on Jun 12, 2009

We can't trust the values in the device descriptor table once the
guest has booted, so keep local copies.  They could set them to
strange values then cause us to segv (they're 8 bit values, so they
can't make our pointers go too wild).

This becomes more important with the following patches which read them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
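The defensive pattern is to snapshot the values once at setup and never re-read them from Guest-writable memory. The sketch below assumes hypothetical type and field names (`shared_desc_sketch`, `device_sketch`); only the idea of keeping private copies comes from the message.

```c
#include <stdint.h>

/* Lives in memory the Guest can write to after boot. */
struct shared_desc_sketch {
	uint8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
};

/* Launcher-private view of the device. */
struct device_sketch {
	volatile struct shared_desc_sketch *desc; /* untrusted after boot */
	uint8_t num_vq;      /* trusted copy taken at setup time */
	uint8_t feature_len; /* trusted copy taken at setup time */
};

/* Take the trusted copies while the descriptor is still ours alone. */
static void device_init_sketch(struct device_sketch *dev,
			       volatile struct shared_desc_sketch *desc)
{
	dev->desc = desc;
	dev->num_vq = desc->num_vq;
	dev->feature_len = desc->feature_len;
}
```

Later pointer arithmetic (for example, locating the config space after the feature bits) would then use `dev->feature_len` rather than `dev->desc->feature_len`.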

713b15b3

30 Mar, 2009 1 commit

lguest: barrier me harder · d1881d31

Authored by Rusty Russell on Mar 30, 2009

Impact: barrier correctness in example launcher

I doubt either lguest user will complain about performance.
Reported-by: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

d1881d31

30 Dec, 2008 2 commits

lguest: move the initial guest page table creation code to the host · 58a24566

Authored by Matias Zabaljauregui on Sep 29, 2008

This patch moves the initial guest page table creation code to the host,
so the launcher keeps working with PAE enabled configs.
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

58a24566

virtio: use LGUEST_VRING_ALIGN instead of relying on pagesize · 2966af73

Authored by Rusty Russell on Dec 30, 2008

This doesn't really matter, since lguest is i386 only at the moment,
but we could actually choose a different value.  (lguest doesn't have
a guaranteed ABI).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

2966af73

31 Oct, 2008 1 commit

lguest: fix example launcher compile after moved asm-x86 dir. · d5d02d6d

Authored by Rusty Russell on Oct 31, 2008

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

d5d02d6d

28 Oct, 2008 1 commit

doc/x86: fix doc subdirs · 71cced6e

Authored by Uwe Hermann on Oct 20, 2008

The Documentation/i386 and Documentation/x86_64 directories and their
contents have been moved into Documentation/x86. Fix references to
those files accordingly.
Signed-off-by: Uwe Hermann <uwe@hermann-uwe.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71cced6e

25 Aug, 2008 1 commit

lguest: update commentry · 1dc3e3bc

Authored by Rusty Russell on Aug 26, 2008

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

1dc3e3bc

12 Aug, 2008 1 commit

lguest: don't set MAC address for guest unless specified · 40c42076

Authored by Rusty Russell on Aug 12, 2008

This shows up when trying to bridge:
	tap0: received packet with  own address as source address

As Max Krasnyansky points out, there's no reason to give the guest the
same mac address as the TUN device.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Max Krasnyansky <maxk@qualcomm.com>

40c42076

29 Jul, 2008 12 commits

lguest: turn Waker into a thread, not a process · 8c79873d

Authored by Rusty Russell on Jul 29, 2008

lguest uses a Waker process to break it out of the kernel (ie.
actually running the guest) when a file descriptor needs attention.

Changing this from a process to a thread somewhat simplifies things:
it can directly access the fd_set of things to watch.  More
importantly, it means that the Waker can see Guest memory correctly,
so /dev/vring file descriptors will work as anticipated (the
alternative is to actually mmap MAP_SHARED, but you can't do that with
/dev/zero).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

8c79873d

lguest: Enlarge virtio rings · 0f0c4fab

Authored by Rusty Russell on Jul 29, 2008

With big packets, 128 entries is a little small.

Guest -> Host 1GB TCP:
Before: 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252
After: 8.01099 seconds xmit 49200 recv 102263 timeout 26014 usec 2118
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

0f0c4fab

lguest: Use GSO/IFF_VNET_HDR extensions on tun/tap · 398f187d

Authored by Rusty Russell on Jul 29, 2008

Guest -> Host 1GB TCP:
Before: 20.1974 seconds xmit 214510 recv 5 timeout 214491 usec 278
After: 8.43625 seconds xmit 95640 recv 198266 timeout 49771 usec 1252

Host -> Guest 1GB TCP:
Before: Seconds 9.98854 xmit 172166 recv 5344 timeout 172157 usec 251
After: Seconds 5.72803 xmit 244322 recv 9919 timeout 244302 usec 156
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

398f187d

lguest: Remove 'network: no dma buffer!' warning · 9254926f

Authored by Rusty Russell on Jul 29, 2008

This warning can happen a lot under load, and it should be warnx not
warn anyway.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

9254926f

lguest: Adaptive timeout · aa124984

Authored by Rusty Russell on Jul 29, 2008

Since the correct timeout value varies, use a heuristic which adjusts
the timeout depending on how many packets we've seen.  This gives
slightly worse results, but doesn't need tweaking when GSO is
introduced.

500 usec	19.1887		xmit 561141 recv 1 timeout 559657
Dynamic (278)	20.1974		xmit 214510 recv 5 timeout 214491 usec 278
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

aa124984

lguest: Tell Guest net not to notify us on every packet xmit · a161883a

Authored by Rusty Russell on Jul 29, 2008

virtio_ring has the ability to suppress notifications.  This prevents
a guest exit for every packet, but we need to set a timer on packet
receipt to re-check if there were any remaining packets.

Here are the times for 1G TCP Guest->Host with different timeout
settings (it matters because the TCP window doesn't grow big enough to
fill the entire buffer):

Timeout value	Seconds		Xmit/Recv/Timeout
None (before)	25.3784		xmit 7750233 recv 1
2500 usec	62.5119		xmit 207020 recv 2 timeout 207020
1000 usec	34.5379		xmit 207003 recv 2 timeout 207003
750 usec	29.2305		xmit 207002 recv 1 timeout 207002
500 usec	19.1887		xmit 561141 recv 1 timeout 559657
250 usec	20.0465		xmit 214128 recv 2 timeout 214110
100 usec	19.2583		xmit 561621 recv 1 timeout 560153

(Note that these values are sensitive to the GSO patches which come
 later, and probably other traffic-related variables, so take with a
 large grain of salt).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a161883a

lguest: net block unneeded receive queue update notifications · 5dae785a

Authored by Rusty Russell on Jul 29, 2008

Number of exits transmitting 10GB Guest->Host before:
	network xmit 7858610 recv 118136

After:
	network xmit 7750233 recv 1
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

5dae785a

lguest: wrap last_avail accesses. · b5111790

Authored by Rusty Russell on Jul 29, 2008

To simplify the transition to when we publish indices in the ring
(and make shuffling my patch queue easier), wrap them in a lg_last_avail()
macro.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
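A wrapper of this kind might look like the sketch below. The struct and the helper `consume_one_sketch()` are hypothetical, and the macro body is an assumption about the simplest form such a wrapper could take; the point is that callers stop naming the field directly, so its storage can later move into the ring without touching them.

```c
#include <stdint.h>

/* Hypothetical virtqueue with a privately kept "last seen" index. */
struct vq_sketch {
	uint16_t last_avail_idx;
};

/* Every access goes through the macro; it expands to an lvalue, so it
 * can be read, assigned and incremented like the field itself. */
#define lg_last_avail(vq) ((vq)->last_avail_idx)

/* Consume one available entry and return the index that was consumed. */
static uint16_t consume_one_sketch(struct vq_sketch *vq)
{
	return lg_last_avail(vq)++;
}
```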

b5111790

lguest: virtio-rng support · 28fd6d7f

Authored by Rusty Russell on Jul 29, 2008

This is a simple patch to add support for the virtio "hardware random
generator" to lguest.  It gets about 1.2 MB/sec reading from /dev/hwrng
in the guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

28fd6d7f

lguest: Support assigning a MAC address · dec6a2be

Authored by Mark McLoughlin on Jul 29, 2008

If you've got a nice DHCP configuration which maps MAC
addresses to specific IP addresses, then you're going to
want to start your guest with one of those MAC addresses.

Also, in Fedora, we have persistent network interface naming
based on the MAC address, so with randomly assigned
addresses you're soon going to hit eth13. Who knows what
will happen then!

Allow assigning a MAC address to the network interface with
e.g.

  --tunnet=bridge:eth0:00:FF:95:6B:DA:3D

or:

  --tunnet=192.168.121.1:00:FF:95:6B:DA:3D

which is pretty unintelligible, but ...

(includes Rusty's minor rework)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
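Parsing the trailing `00:FF:95:6B:DA:3D` part of such an option could be sketched as follows. `parse_mac()` is a hypothetical helper, not the function used by the Launcher, and it deliberately handles only the MAC itself, not the `bridge:eth0:` or IP prefix.

```c
#include <stdio.h>

/* Parse "xx:xx:xx:xx:xx:xx" into six bytes.
 * Returns 0 on success, -1 if the string is not exactly a MAC address. */
static int parse_mac(const char *s, unsigned char mac[6])
{
	unsigned int b[6];
	char trailing;
	int i;

	/* The final %c only matches if junk follows the sixth byte, so a
	 * clean address yields exactly 6 conversions. */
	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &trailing) != 6)
		return -1;

	for (i = 0; i < 6; i++)
		mac[i] = (unsigned char)b[i];
	return 0;
}
```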

dec6a2be

lguest: Don't leak /dev/zero fd · 34bdaab4

Authored by Mark McLoughlin on Jun 13, 2008

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

34bdaab4

lguest: fix verbose printing of device features. · 32c68e5c

Authored by Rusty Russell on Jul 29, 2008

%02x is more appropriate for bytes than %08x.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

32c68e5c

30 May, 2008 1 commit

lguest: notify on empty · 20887611

Authored by Rusty Russell on May 30, 2008

This is the lguest implementation of the VIRTIO_F_NOTIFY_ON_EMPTY feature.
It is currently only published for network devices, but it is turned on for
everyone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

20887611

02 May, 2008 2 commits

lguest: make Launcher see device status updates · a007a751

Authored by Rusty Russell on May 02, 2008

This brings us closer to Real Life, where we'd examine the device
features once it's set the DRIVER_OK status bit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a007a751

virtio: de-structify virtio_block status byte · cb38fa23

Authored by Rusty Russell on May 02, 2008

Ron Minnich points out that a struct containing a char is not always
sizeof(char); simplest to remove the structure to avoid confusion.

Cc: "ron minnich" <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

cb38fa23

28 Mar, 2008 2 commits

lguest: comment documentation update. · a6bd8e13

Authored by Rusty Russell on Mar 28, 2008

Took some cycles to re-read the Lguest Journey end-to-end, fix some
rot and tighten some phrases.

Only comments change.  No new jokes, but a couple of recycled old jokes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a6bd8e13

lguest: Don't need comment terminator before disk section. · e18b094f

Authored by Rusty Russell on Mar 28, 2008

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

e18b094f

11 Mar, 2008 1 commit

lguest: Do not append space to guests kernel command line · 1ef36fa6

Authored by Paul Bolle on Mar 10, 2008

The lguest launcher appends a space to the kernel command line (if kernel
arguments are specified on its command line). This space is unneeded. More
importantly, this appended space will make Red Hat's nash script interpreter
(used in a Fedora style initramfs) add an empty argument to init's command
line. This empty argument will make kernel arguments like "init=/bin/bash"
fail (because the shell will try to execute a script with an empty name).
This could be considered a bug in nash, but is easily fixed in the lguest
launcher too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
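Building the command line so that a separator goes only between arguments, never after the last one, can be sketched as below. `join_args()` is a hypothetical helper written for illustration, not the Launcher's own routine.

```c
#include <stddef.h>
#include <string.h>

/* Join argv[0..argc) into dst with single spaces between arguments and
 * no trailing space. Returns 0 on success, -1 if dst is too small. */
static int join_args(char *dst, size_t dstlen, int argc, char *const argv[])
{
	size_t used = 0;
	int i;

	if (dstlen == 0)
		return -1;
	dst[0] = '\0';

	for (i = 0; i < argc; i++) {
		size_t len = strlen(argv[i]);
		size_t sep = (i > 0) ? 1 : 0; /* space before, not after */

		if (used + sep + len + 1 > dstlen)
			return -1;
		if (sep)
			dst[used++] = ' ';
		memcpy(dst + used, argv[i], len);
		used += len;
		dst[used] = '\0';
	}
	return 0;
}
```

Joining `root=/dev/sda1` and `init=/bin/bash` this way yields `root=/dev/sda1 init=/bin/bash`, with nothing after the final argument for a downstream parser to turn into an empty argument.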

1ef36fa6

04 Feb, 2008 3 commits

virtio: reset function · 6e5aa7ef

Authored by Rusty Russell on Feb 04, 2008

A reset function solves three problems:

1) It allows us to renegotiate features, eg. if we want to upgrade a
   guest driver without rebooting the guest.

2) It gives us a clean way of shutting down virtqueues: after a reset,
   we know that the buffers won't be used by the host, and

3) It helps the guest recover from messed-up drivers.

So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.

We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

6e5aa7ef

virtio: clarify NO_NOTIFY flag usage · 426e3e0a

Authored by Rusty Russell on Feb 04, 2008

The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things".  Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
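The advisory nature of the flag can be sketched from the driver's side. `should_kick()` and its parameters are hypothetical; the flag value matches the virtio ring definition.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1 /* host: "no need to kick me" hint */

/* Decide whether the driver should notify the host after adding buffers.
 * free_descs is how many descriptors remain unused in the ring. */
static bool should_kick(uint16_t used_flags, unsigned int free_descs)
{
	/* A full ring must always be reported, whatever the hint says:
	 * otherwise nothing may ever prompt the host to drain it. */
	if (free_descs == 0)
		return true;

	/* Otherwise honour the hint. */
	return !(used_flags & VRING_USED_F_NO_NOTIFY);
}
```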

426e3e0a

virtio: simplify config mechanism. · a586d4f6

Authored by Rusty Russell on Feb 04, 2008

Previously we used a type/len pair within the config space, but this
seems overkill.  We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.

The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

a586d4f6

30 Jan, 2008 2 commits

lguest: adapt launcher to per-cpuness · e3283fa0

Authored by Glauber de Oliveira Costa on Jan 07, 2008

This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread variable, which means that in the future we'll spawn vcpus as
threads. But right now, only the infrastructure is out there.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

e3283fa0

lguest: Reboot support · ec04b13f

Authored by Balaji Rao on Dec 28, 2007

Reboot Implemented

(Prevent fd leak, fix style and fix documentation --RR)
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

ec04b13f

19 Nov, 2007 1 commit

lguest: Fix uninitialized members in example launcher · d1c856e0

Authored by Rusty Russell on Nov 19, 2007

Thanks valgrind!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

d1c856e0

12 Nov, 2007 1 commit

virtio: Force use of power-of-two for descriptor ring sizes · 42b36cc0

Authored by Rusty Russell on Nov 12, 2007

The virtio descriptor rings of size N-1 were nicely set up to be
aligned to an N-byte boundary. But as Anthony Liguori points out, the
free-running indices used by virtio require that the sizes be a power
of 2, otherwise we get problems on wrap (demonstrated with lguest).

So we replace the clever "2^n-1" scheme with a simple "align to page
boundary" scheme: this means that all virtio rings take at least two
pages, but it's safer than guessing cache alignment.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
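The wrap problem can be checked with a short experiment. The function below is an illustrative sketch (the name `wrap_glitches()` is made up): it walks a free-running 16-bit index through one full cycle and counts how often the next index fails to land on the next ring slot.

```c
#include <stdint.h>

/* Count the places where consecutive free-running 16-bit indices do not
 * map to consecutive ring slots when reduced modulo ring_size. */
static unsigned int wrap_glitches(unsigned int ring_size)
{
	unsigned int glitches = 0;
	uint16_t idx = 0;

	do {
		unsigned int slot = idx % ring_size;
		unsigned int next = (uint16_t)(idx + 1) % ring_size;

		if (next != (slot + 1) % ring_size)
			glitches++;
		idx++;
	} while (idx != 0); /* stop after all 65536 values */

	return glitches;
}
```

For a ring of 128 entries the count is 0, because 65536 is a multiple of 128. For 127 entries it is 1: index 65535 maps to slot 3 (65535 = 516 × 127 + 3), but the next index wraps to 0 and maps to slot 0 instead of slot 4.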

42b36cc0
