1. 12 Jun, 2009 · 8 commits
    • lguest: implement deferred interrupts in example Launcher · 38bc2b8c
      Committed by Rusty Russell
      Rather than sending an interrupt on every buffer, we only send an interrupt
      when we're about to wait for the Guest to send us a new one.  The console
      input and network input still send interrupts manually, but the block device,
      network and console output queues can simply rely on this logic to send
      interrupts to the Guest at the right time.
      
      The patch is cluttered by moving trigger_irq() higher in the code.
      
      In practice, two factors make this optimization less interesting:
      (1) we often only get one input at a time, even for networking,
      (2) triggering an interrupt rapidly tends to get coalesced anyway.
      
      Before:                        Secs   RxIRQs    TxIRQs
       1G TCP Guest->Host:           3.72    32784     32771
       1M normal pings:                99  1000004    995541
       100,000 1k pings (-l 120):       5    49510     49058

      After:                         Secs   RxIRQs    TxIRQs
       1G TCP Guest->Host:           3.69    32809     32769
       1M normal pings:                99  1000004    996196
       100,000 1k pings (-l 120):       5    52435     52361
      
      (Note the interrupt count on 100k pings goes *up*: see next patch).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
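The deferral described in this commit can be sketched roughly as follows. This is a minimal illustration, not the Launcher's actual code: the struct and every function name ending in `_sketch` are hypothetical, and the interrupt is modelled as a counter rather than a write to /dev/lguest.

```c
#include <stdio.h>

/* Hypothetical, simplified per-virtqueue state (not the Launcher's struct). */
struct vq_sketch {
    unsigned int pending_used; /* buffers handed back since the last interrupt */
    unsigned int irqs_sent;    /* interrupts raised so far */
};

/* Raise at most one interrupt for everything returned so far. */
static void trigger_irq_sketch(struct vq_sketch *vq)
{
    if (vq->pending_used == 0)
        return; /* nothing new for the Guest to look at */
    vq->pending_used = 0;
    vq->irqs_sent++;
}

/* Hand a buffer back to the Guest without interrupting it yet. */
static void add_used_sketch(struct vq_sketch *vq)
{
    vq->pending_used++;
}

/* Called when no more buffers are available and we are about to block. */
static void about_to_wait_sketch(struct vq_sketch *vq)
{
    trigger_irq_sketch(vq);
}
```

With this shape, servicing three buffers back to back and then waiting costs one interrupt instead of three, which only helps when several buffers really do arrive together.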
    • lguest: have example Launcher service all devices in separate threads · 659a0e66
      Committed by Rusty Russell
      Currently lguest has three threads: the main Launcher thread, a Waker
      thread, and a thread for the block device (because synchronous block
      was simply too painful to bear).
      
      The Waker selects() on all the input file descriptors (eg. stdin, net
      devices, pipe to the block thread) and when one becomes readable it calls
      into the kernel to kick the Launcher thread out into userspace, which
      repeats the poll, services the device(s), and then tells the kernel to
      release the Waker before re-entering the kernel to run the Guest.
      
      Also, to make a slightly-decent network transmit routine, the Launcher
      would suppress further network interrupts while it set a timer: that
      signal handler would write to a pipe, which would rouse the Waker
      which would prod the Launcher out of the kernel to check the network
      device again.
      
      Now we can convert all our virtqueues to separate threads: each one has
      a separate eventfd for when the Guest pokes the device, and can trigger
      interrupts in the Guest directly.
      
      The linecount shows how much this simplifies, but to really bring it
      home, here's an strace analysis of single Guest->Host ping before:
      
      * Guest sends packet, notifies xmit vq, returns control to Launcher
      * Launcher clears notification flag on xmit ring
      * Launcher writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Launcher sets up interrupt for Guest (xmit ring is empty)
      	write(10, "\2\0\0\0\3\0\0\0", 8) = 0
      * Launcher sets up timer for interrupt mitigation
      	setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
      * Launcher re-runs guest
      	pread64(10, 0xbfa5f4d4, 4, 0) ...
      * Waker notices reply packet in tun device (it was in select)
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
      * Waker kicks Launcher out of guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher returns from running guest:
      	... = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks at input fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
      * Launcher reads pong from tun device:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
      * Launcher injects guest notification:
      	write(10, "\2\0\0\0\2\0\0\0", 8) = 0
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher reruns Guest:
      	pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
      * Signal comes in, uses pipe to wake up Launcher:
      	--- SIGALRM (Alarm clock) @ 0 (0) ---
      	write(8, "\0", 1)       = 1
      	sigreturn()             = ? (mask now [])
      * Waker sees write on pipe:
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
      * Waker kicks Launcher out of Guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher exits from kernel:
      	pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks to see what fd woke it:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
      * Launcher reads timeout fd, sets notification flag on xmit ring
      	read(6, "\0", 32)       = 1
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher resumes Guest:
      	pread64(10, "\0p\0\4", 4, 0) ....
      
      strace analysis of single Guest->Host ping after:
      
      * Guest sends packet, notifies xmit vq, creates event on eventfd.
      * Network xmit thread wakes from read on eventfd:
      	read(7, "\1\0\0\0\0\0\0\0", 8)          = 8
      * Network xmit thread writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Network recv thread wakes up from read on tunfd:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
      * Network recv thread sets up interrupt for the Guest
      	write(6, "\2\0\0\0\2\0\0\0", 8) = 0
      * Network recv thread goes back to reading tunfd
      	13:39:42.460285 readv(4,  <unfinished ...>
      * Network xmit thread sets up interrupt for Guest (xmit ring is empty)
      	write(6, "\2\0\0\0\3\0\0\0", 8) = 0
      * Network xmit thread goes back to reading from eventfd
      	read(7, <unfinished ...>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
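The eventfd handshake in the "after" trace can be sketched as below. This is an assumption-laden illustration: the helper names are hypothetical, and `kick_sketch()` stands in for the Guest's notification, which in lguest arrives via the kernel rather than from a userspace write. In the Launcher, the waiting side would run in the device's own thread.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Create the per-virtqueue eventfd (counter starts at 0, blocking reads). */
static int make_kick_fd_sketch(void)
{
    return eventfd(0, 0);
}

/* Stand-in for "Guest pokes the device": add 1 to the eventfd counter. */
static int kick_sketch(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * What a device thread would block in: read() sleeps until the counter is
 * non-zero, then returns the accumulated count and resets it to zero.
 * Returns 0 on error.
 */
static uint64_t wait_for_kick_sketch(int efd)
{
    uint64_t count;

    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

The 8-byte `read(7, "\1\0\0\0\0\0\0\0", 8)` in the trace corresponds to a count of 1; several kicks that arrive before the thread wakes are coalesced into a single read.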
    • lguest: fix writev returning short on console output · 7b5c806c
      Committed by Rusty Russell
      I've never seen it here, but I can't find anywhere that says writev
      will write everything.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
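One common way to cope with a short writev() is to loop, advancing the iovec array past whatever was written. The helper below is a hypothetical sketch of that pattern, not the code from this commit; it modifies the caller's iovec array in place.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: keep calling writev() until every byte is written.
 * Returns 0 on success, -1 on error (errno set by writev). */
static int writev_all_sketch(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t done = writev(fd, iov, iovcnt);

        if (done < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything: retry */
            return -1;
        }
        /* Drop the elements that were written completely. */
        while (iovcnt > 0 && (size_t)done >= iov->iov_len) {
            done -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Trim the element that was only partly written, if any. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + done;
            iov->iov_len -= done;
        }
    }
    return 0;
}
```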
    • lguest: clean up length-used value in example launcher · e606490c
      Committed by Rusty Russell
      The "len" field in the used ring for virtio indicates the number of
      bytes *written* to the buffer.  This means the guest doesn't have to
      zero the buffers in advance as it always knows the used length.
      
      Erroneously, the console and network example code puts the length
      *read* into that field.  The guest ignores it, but it's wrong.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: remove invalid interrupt forcing logic. · ebf9a5a9
      Committed by Rusty Russell
      20887611 (lguest: notify on empty) introduced
      lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
      interrupts all the time.
      
      Because we always process one buffer at a time, the inflight count is always 0
      when we call trigger_irq(), and so we always ignore VRING_AVAIL_F_NO_INTERRUPT
      from the Guest.
      
      It should be looking to see if there are more buffers in the Guest's queue:
      if it's empty, then we force an interrupt.
      
      This makes little difference, since we usually have an empty queue; but
      that's the subject of another patch.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
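The intended decision can be sketched as a small predicate. The function and parameter names are hypothetical, and `SKETCH_AVAIL_F_NO_INTERRUPT` is a local constant assumed to mirror VRING_AVAIL_F_NO_INTERRUPT; the real Launcher reads these values from the ring and its feature bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in, assumed to mirror VRING_AVAIL_F_NO_INTERRUPT. */
#define SKETCH_AVAIL_F_NO_INTERRUPT 1

/*
 * Decide whether to interrupt the Guest after returning buffers.
 * avail_flags:     flags the Guest wrote into the avail ring
 * notify_on_empty: whether VIRTIO_F_NOTIFY_ON_EMPTY was negotiated
 * last_avail_idx:  next avail entry we have not yet consumed
 * avail_idx:       the Guest's current avail index
 */
static bool should_interrupt_sketch(uint16_t avail_flags, bool notify_on_empty,
                                    uint16_t last_avail_idx, uint16_t avail_idx)
{
    bool queue_empty = (last_avail_idx == avail_idx);

    /* The Guest did not ask us to hold back: interrupt as usual. */
    if (!(avail_flags & SKETCH_AVAIL_F_NO_INTERRUPT))
        return true;

    /* The Guest asked for quiet; override only if its queue has run dry. */
    return notify_on_empty && queue_empty;
}
```

The key point is that emptiness is judged from the Guest's queue, not from a count of buffers the Launcher currently holds.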
    • lguest: get more serious about wmb() in example Launcher code · f7027c63
      Committed by Rusty Russell
      Since the Launcher process runs the Guest, it doesn't have to be very
      serious about its barriers: the Guest isn't running while we are (Guest
      is UP).
      
      Before we change to use threads to service devices, we need to fix this.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
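Why the barrier matters once device threads run concurrently with the Guest can be sketched as follows. This illustration swaps in a C11 release fence where the Launcher uses its own wmb() macro, and the ring layout and names are simplified, hypothetical stand-ins for the virtio used ring.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE_SKETCH 8u /* hypothetical; must be a power of two */

struct used_elem_sketch {
    uint32_t id;  /* head of the descriptor chain being returned */
    uint32_t len; /* number of bytes written into the buffer */
};

struct used_ring_sketch {
    _Atomic uint16_t idx; /* free-running index the Guest polls */
    struct used_elem_sketch ring[RING_SIZE_SKETCH];
};

/* Publish one used buffer: fill the entry first, then expose it. */
static void publish_used_sketch(struct used_ring_sketch *used,
                                uint32_t id, uint32_t len)
{
    uint16_t idx = atomic_load_explicit(&used->idx, memory_order_relaxed);
    struct used_elem_sketch *elem = &used->ring[idx % RING_SIZE_SKETCH];

    elem->id = id;
    elem->len = len;

    /* Stand-in for wmb(): the entry must be visible before the new index. */
    atomic_thread_fence(memory_order_release);

    atomic_store_explicit(&used->idx, (uint16_t)(idx + 1),
                          memory_order_relaxed);
}
```

Without the fence, a Guest running on another CPU could observe the incremented index and read a stale entry.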
    • lguest: cleanup passing of /dev/lguest fd around example launcher. · 56739c80
      Committed by Rusty Russell
      We hand the /dev/lguest fd everywhere; it's far neater to just make it
      a global (it already is, in fact, hidden in the waker_fds struct).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: be paranoid about guest playing with device descriptors. · 713b15b3
      Committed by Rusty Russell
      We can't trust the values in the device descriptor table once the
      guest has booted, so keep local copies.  They could set them to
      strange values then cause us to segv (they're 8 bit values, so they
      can't make our pointers go too wild).
      
      This becomes more important with the following patches which read them.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
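The "keep local copies" idea can be sketched like this. The descriptor layout and all names are hypothetical simplifications; the point is only that values are snapshotted once at setup and the Guest-writable memory is never consulted for them again.

```c
#include <stdint.h>

/* Hypothetical descriptor as it sits in Guest-writable memory. */
struct desc_sketch {
    uint8_t type;
    uint8_t num_vq;
    uint8_t feature_len;
    uint8_t config_len;
};

/* Launcher-side device state with private copies of the lengths. */
struct device_sketch {
    volatile struct desc_sketch *desc; /* shared with the Guest: untrusted */
    uint8_t num_vq;
    uint8_t feature_len;
    uint8_t config_len;
};

/* Snapshot the descriptor fields before the Guest can tamper with them. */
static void device_init_sketch(struct device_sketch *dev,
                               volatile struct desc_sketch *desc)
{
    dev->desc = desc;
    dev->num_vq = desc->num_vq;
    dev->feature_len = desc->feature_len;
    dev->config_len = desc->config_len;
}

/* Later code sizes its accesses from the private copy only. */
static unsigned int feature_len_sketch(const struct device_sketch *dev)
{
    return dev->feature_len;
}
```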
  2. 30 Mar, 2009 · 1 commit
  3. 30 Dec, 2008 · 2 commits
  4. 31 Oct, 2008 · 1 commit
  5. 28 Oct, 2008 · 1 commit
  6. 25 Aug, 2008 · 1 commit
  7. 12 Aug, 2008 · 1 commit
  8. 29 Jul, 2008 · 12 commits
  9. 30 May, 2008 · 1 commit
  10. 02 May, 2008 · 2 commits
  11. 28 Mar, 2008 · 2 commits
  12. 11 Mar, 2008 · 1 commit
    • lguest: Do not append space to guests kernel command line · 1ef36fa6
      Committed by Paul Bolle
      The lguest launcher appends a space to the kernel command line (if kernel
      arguments are specified on its command line). This space is unneeded. More
      importantly, this appended space will make Red Hat's nash script interpreter
      (used in a Fedora style initramfs) add an empty argument to init's command
      line. This empty argument will make kernel arguments like "init=/bin/bash"
      fail (because the shell will try to execute a script with an empty name).
      This could be considered a bug in nash, but is easily fixed in the lguest
      launcher too.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
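Joining arguments with a separator between them, rather than after each one, avoids the trailing space. The helper below is a hypothetical sketch of that approach, not the Launcher's own routine; it assumes `dstlen` is at least 1.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: join args with single spaces, none at the end.
 * Stops early (leaving a NUL-terminated, truncated string) if dst is full. */
static void concat_args_sketch(char *dst, size_t dstlen,
                               char *const args[], int nargs)
{
    size_t used = 0;

    dst[0] = '\0';
    for (int i = 0; i < nargs; i++) {
        /* The separator goes before every argument except the first. */
        int n = snprintf(dst + used, dstlen - used, "%s%s",
                         i ? " " : "", args[i]);

        if (n < 0 || (size_t)n >= dstlen - used)
            break; /* output error or truncation */
        used += (size_t)n;
    }
}
```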
  13. 04 Feb, 2008 · 3 commits
    • virtio: reset function · 6e5aa7ef
      Committed by Rusty Russell
      A reset function solves three problems:
      
      1) It allows us to renegotiate features, eg. if we want to upgrade a
         guest driver without rebooting the guest.
      
      2) It gives us a clean way of shutting down virtqueues: after a reset,
         we know that the buffers won't be used by the host, and
      
      3) It helps the guest recover from messed-up drivers.
      
      So we remove the ->shutdown hook, and the only way we now remove
      feature bits is via reset.
      
      We leave it to the driver to do the reset before it deletes queues:
      the balloon driver, for example, needs to chat to the host in its
      remove function.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: clarify NO_NOTIFY flag usage · 426e3e0a
      Committed by Rusty Russell
      The other side (host) can set the NO_NOTIFY flag as an optimization,
      to say "no need to kick me when you add things".  Make it clear that
      this is advisory only; especially that we should always notify when
      the ring is full.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: simplify config mechanism. · a586d4f6
      Committed by Rusty Russell
      Previously we used a type/len pair within the config space, but this
      seems overkill.  We now simply define a structure which represents the
      layout in the config space: the config space can now only be extended
      at the end.
      
      The main driver-visible changes:
      1) We indicate what fields are present with an explicit feature bit.
      2) Virtqueues are explicitly numbered, and not in the config space.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  14. 30 Jan, 2008 · 2 commits
  15. 19 Nov, 2007 · 1 commit
  16. 12 Nov, 2007 · 1 commit
    • virtio: Force use of power-of-two for descriptor ring sizes · 42b36cc0
      Committed by Rusty Russell
      The virtio descriptor rings of size N-1 were nicely set up to be
      aligned to an N-byte boundary.  But as Anthony Liguori points out, the
      free-running indices used by virtio require that the sizes be a power
      of 2, otherwise we get problems on wrap (demonstrated with lguest).
      
      So we replace the clever "2^n-1" scheme with a simple "align to page
      boundary" scheme: this means that all virtio rings take at least two
      pages, but it's safer than guessing cache alignment.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
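The wrap problem can be checked with a few lines of arithmetic. The sketch below assumes 16-bit free-running indices reduced modulo the ring size to pick a slot; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With a free-running 16-bit index, the slot used is (index % ring_size).
 * When the index wraps from 65535 to 0, the slot sequence stays consecutive
 * only if ring_size divides 65536, i.e. only if it is a power of two.
 */
static bool wrap_is_seamless_sketch(uint16_t ring_size)
{
    uint16_t before = UINT16_MAX;             /* last index before the wrap */
    uint16_t after = (uint16_t)(before + 1u); /* wraps around to 0 */
    unsigned int expected = (before % ring_size + 1u) % ring_size;

    return after % ring_size == expected;
}
```

For a ring of 256 entries, index 65535 maps to slot 255 and index 0 maps to slot 0, which is the expected next slot. For 255 entries, index 65535 maps to slot 0 and so does index 0, so one slot is used twice in a row and the ring contents go out of step.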