  1. 15 Aug 2011, 1 commit
  2. 22 Jul 2011, 3 commits
    • lguest: update comments · 9f54288d
      Rusty Russell committed
      Also removes a long-unused #define and an extraneous semicolon.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: Simplify device initialization. · 3c3ed482
      Rusty Russell committed
      We used to notify the Host every time we updated a device's status.  However,
      it only really needs to know when we're resetting the device, or failed to
      initialize it, or when we've finished our feature negotiation.
      
      In particular, we used to wait for VIRTIO_CONFIG_S_DRIVER_OK in the
      status byte before starting the device service threads.  But this
      corresponds to the successful finish of device initialization, which
      might (like virtio_blk's partition scanning) use the device.  So we
      had a hack: if the Guest used the device before we expected, we started the
      threads anyway.
      
      Now we hook into the finalize_features hook in the Guest: at that
      point we tell the Launcher that it can rely on the features we have
      acked.  On the Launcher side, we look at the status at that point, and
      start servicing the device.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: Do not exit on non-fatal errors · e0377e25
      Sakari Ailus committed
      Do not exit on some non-fatal errors:
      
      - writev() fails in net_output(). The result is a lost packet or packets.
      - writev() fails in console_output(). The result is partially lost console
      output.
      - readv() fails in net_input(). The result is a lost packet or packets.
      
      Rather than bringing the guest down, this patch ignores e.g. an allocation
      failure on the host side. Example:
      
      lguest: page allocation failure. order:4, mode:0x4d0
      Pid: 4045, comm: lguest Tainted: G        W   2.6.36 #1
      Call Trace:
       [<c138d614>] ? printk+0x18/0x1c
       [<c106a4e2>] __alloc_pages_nodemask+0x4d2/0x570
       [<c1087954>] cache_alloc_refill+0x2a4/0x4d0
       [<c1305149>] ? __netif_receive_skb+0x189/0x270
       [<c1087c5a>] __kmalloc+0xda/0xf0
       [<c12fffa5>] __alloc_skb+0x55/0x100
       [<c1305519>] ? net_rx_action+0x79/0x100
       [<c12fafed>] sock_alloc_send_pskb+0x18d/0x280
       [<c11fda25>] ? _copy_from_user+0x35/0x130
       [<c13010b6>] ? memcpy_fromiovecend+0x56/0x80
       [<c12a74dc>] tun_chr_aio_write+0x1cc/0x500
       [<c108a125>] do_sync_readv_writev+0x95/0xd0
       [<c11fda25>] ? _copy_from_user+0x35/0x130
       [<c1089fa8>] ? rw_copy_check_uvector+0x58/0x100
       [<c108a7bc>] do_readv_writev+0x9c/0x1d0
       [<c12a7310>] ? tun_chr_aio_write+0x0/0x500
       [<c108a93a>] vfs_writev+0x4a/0x60
       [<c108aa21>] sys_writev+0x41/0x80
       [<c138f061>] syscall_call+0x7/0xb
      Mem-Info:
      DMA per-cpu:
      CPU    0: hi:    0, btch:   1 usd:   0
      Normal per-cpu:
      CPU    0: hi:  186, btch:  31 usd:   0
      HighMem per-cpu:
      CPU    0: hi:  186, btch:  31 usd:   0
      active_anon:134651 inactive_anon:50543 isolated_anon:0
       active_file:96881 inactive_file:132007 isolated_file:0
       unevictable:0 dirty:3 writeback:0 unstable:0
       free:91374 slab_reclaimable:6300 slab_unreclaimable:2802
       mapped:2281 shmem:9 pagetables:330 bounce:0
      DMA free:3524kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:8kB active_file:8760kB inactive_file:2760kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15868kB mlocked:0kB dirty:0kB writeback:0kB mapped:16kB shmem:0kB slab_reclaimable:88kB slab_unreclaimable:148kB kernel_stack:40kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 865 2016 2016
      Normal free:150100kB min:3728kB low:4660kB high:5592kB active_anon:6224kB inactive_anon:15772kB active_file:324084kB inactive_file:325944kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:885944kB mlocked:0kB dirty:12kB writeback:0kB mapped:1520kB shmem:0kB slab_reclaimable:25112kB slab_unreclaimable:11060kB kernel_stack:1888kB pagetables:1320kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 9207 9207
      HighMem free:211872kB min:512kB low:1752kB high:2992kB active_anon:532380kB inactive_anon:186392kB active_file:54680kB inactive_file:199324kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1178504kB mlocked:0kB dirty:0kB writeback:0kB mapped:7588kB shmem:36kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      DMA: 3*4kB 65*8kB 35*16kB 18*32kB 11*64kB 9*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3524kB
      Normal: 35981*4kB 344*8kB 158*16kB 28*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 150100kB
      HighMem: 5732*4kB 5462*8kB 2826*16kB 1598*32kB 84*64kB 10*128kB 7*256kB 1*512kB 1*1024kB 1*2048kB 9*4096kB = 211872kB
      231237 total pagecache pages
      2340 pages in swap cache
      Swap cache stats: add 160060, delete 157720, find 189017/194106
      Free swap  = 4179840kB
      Total swap = 4194300kB
      524271 pages RAM
      296946 pages HighMem
      5668 pages reserved
      867664 pages shared
      82155 pages non-shared
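
A minimal sketch of the idea in this commit, not the Launcher's actual code: a failed writev() is reported and the packet is dropped, instead of terminating the process that runs the Guest. The helper name try_send_packet() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: try to send one packet.  On failure, log the
 * error and return -1 so the caller can carry on with the next buffer.
 * The packet is lost, but the Guest keeps running.
 */
static int try_send_packet(int fd, const struct iovec *iov, int iovcnt)
{
	ssize_t ret = writev(fd, iov, iovcnt);

	if (ret < 0) {
		fprintf(stderr, "lguest: writev failed: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

The caller would still return the buffer to the Guest after a failure, so the ring does not stall.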
      Signed-off-by: Sakari Ailus <sakari.ailus@iki.fi>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  3. 30 May 2011, 2 commits
  4. 07 May 2011, 1 commit
  5. 20 Jan 2011, 2 commits
  6. 10 Sep 2010, 1 commit
  7. 27 Aug 2010, 1 commit
    • lguest: clean up warnings in demonstration launcher. · f846619e
      Rusty Russell committed
      These days the headers we use are in glibc.  If those are too old, you can
      add the -I lines to get the kernel headers.
      
      In file included from ../../include/linux/if_tun.h:19,
                       from lguest.c:33:
      ../../include/linux/types.h:13:2: warning: #warning "Attempt to use kernel headers from user space, see http://kernelnewbies.org/KernelHeaders"
      lguest.c: In function ‘setup_tun_net’:
      lguest.c:1456: warning: dereferencing pointer ‘sin’ does break strict-aliasing rules
      lguest.c:1457: warning: dereferencing pointer ‘sin’ does break strict-aliasing rules
      lguest.c:1450: note: initialized from here
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  8. 23 Apr 2010, 1 commit
  9. 24 Feb 2010, 1 commit
  10. 04 Dec 2009, 1 commit
  11. 22 Oct 2009, 1 commit
    • virtio: let header files include virtio_ids.h · e95646c3
      Christian Borntraeger committed
      Rusty,
      
      commit 3ca4f5ca
          virtio: add virtio IDs file
      moved all device IDs into a single file. While the change itself is
      a very good one, it can break userspace applications. For example
      if a userspace tool wanted to get the ID of virtio_net it used to
      include virtio_net.h. This no longer works, since virtio_net.h
      does not include virtio_ids.h.
      This patch moves all "#include <linux/virtio_ids.h>" from the C
      files into the header files, making the header files compatible with
      the old ones.
      
      In addition, this patch exports virtio_ids.h to userspace.
      
      CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  12. 23 Sep 2009, 2 commits
  13. 30 Jul 2009, 4 commits
  14. 12 Jun 2009, 12 commits
    • lguest: add support for indirect ring entries · d1f0132e
      Mark McLoughlin committed
      Support the VIRTIO_RING_F_INDIRECT_DESC feature.
      
      This is a simple matter of changing the descriptor walking
      code to operate on a struct vring_desc* and supplying it
      with an indirect table if detected.
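
As an illustration only, the descriptor walk described above could look like the sketch below. The struct layout and flag values follow the virtio ring definitions. chain_length() is a hypothetical helper, and the sketch assumes descriptor addresses are directly usable host pointers, which a real Launcher would have to translate and bounds-check.

```c
#include <stddef.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT     1 /* chain continues via the next field */
#define VRING_DESC_F_INDIRECT 4 /* buffer holds a table of descriptors */

struct vring_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/*
 * Count the descriptors in the chain starting at head.  The walk works
 * on a plain struct vring_desc pointer, so switching to an indirect
 * table only means swapping the pointer and the bound.
 * Returns -1 for a malformed chain (bad index, bad length, or a loop).
 */
static int chain_length(const struct vring_desc *desc, unsigned int max,
			unsigned int head)
{
	unsigned int i = head, n = 0;

	if (head >= max)
		return -1;

	if (desc[i].flags & VRING_DESC_F_INDIRECT) {
		if (desc[i].len == 0 || desc[i].len % sizeof(*desc))
			return -1;
		max = desc[i].len / sizeof(*desc);
		/* Assumption: addr is already a valid host address. */
		desc = (const struct vring_desc *)(uintptr_t)desc[i].addr;
		i = 0;
	}

	for (;;) {
		if (++n > max)
			return -1; /* more links than entries: a loop */
		if (!(desc[i].flags & VRING_DESC_F_NEXT))
			break;
		i = desc[i].next;
		if (i >= max)
			return -1;
	}
	return (int)n;
}
```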
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: suppress notifications in example Launcher · b60da13f
      Rusty Russell committed
      The Guest only really needs to tell us about activity when we're going
      to listen to the eventfd: normally, we don't want to know.
      
      So if there are no available buffers, turn on notifications, re-check,
      then wait for the Guest to notify us via the eventfd, then turn
      notifications off again.
      
      There's enough else going on that the differences are in the noise.
      
      Before:				Secs	RxKicks	TxKicks
       1G TCP Guest->Host:		3.94	  4686	  32815
       1M normal pings:		104	142862	1000010
       1M 1k pings (-l 120):		57	142026	1000007
      
      After:
       1G TCP Guest->Host:		3.76	  4691	  32811
       1M normal pings:		111	142859	 997467
       1M 1k pings (-l 120):		55	 19648	 501549
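
The "turn on notifications, re-check, wait, turn off again" sequence from this commit can be sketched roughly as follows. struct demo_ring and wait_for_buffers() are hypothetical stand-ins for the real ring and service loop, and C11 atomics are used here in place of the Launcher's own barriers.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <unistd.h>

#define VRING_USED_F_NO_NOTIFY 1

/* Hypothetical, simplified view of one ring shared with the Guest. */
struct demo_ring {
	_Atomic uint16_t avail_idx;  /* advanced by the Guest */
	_Atomic uint16_t used_flags; /* written by the device thread */
	uint16_t last_avail;         /* private: next entry to consume */
};

/*
 * Block until the Guest has made a buffer available.
 * Returns 0 when a buffer is ready, -1 if reading the eventfd failed.
 */
static int wait_for_buffers(struct demo_ring *r, int efd)
{
	uint64_t count;

	while (atomic_load(&r->avail_idx) == r->last_avail) {
		/* Nothing to do: ask the Guest to notify us. */
		atomic_fetch_and(&r->used_flags,
				 (uint16_t)~VRING_USED_F_NO_NOTIFY);

		/* Re-check, in case a buffer arrived before the flag changed. */
		if (atomic_load(&r->avail_idx) != r->last_avail) {
			atomic_fetch_or(&r->used_flags, VRING_USED_F_NO_NOTIFY);
			break;
		}

		/* Sleep until the Guest kicks the eventfd. */
		if (read(efd, &count, sizeof(count)) != sizeof(count))
			return -1;

		/* We are awake and will poll the ring: no more kicks needed. */
		atomic_fetch_or(&r->used_flags, VRING_USED_F_NO_NOTIFY);
	}
	return 0;
}
```

The re-check after enabling notifications is what closes the race: without it, a buffer added just before the flag was cleared would never produce a kick.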
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: try to batch interrupts on network receive · 4a8962e2
      Rusty Russell committed
      Rather than triggering an interrupt every time, we only trigger an
      interrupt when there are no more incoming packets (or the recv queue
      is full).
      
      However, the overhead of doing the select to figure this out is
      measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host
      TCP goes from 3.69 to 3.94 seconds.  It's close to the noise though.
      
      I tested various timeouts, including reducing it as the number of
      pending packets increased, timing a 1 gigabyte TCP send from Guest ->
      Host and Host -> Guest (GSO disabled, to increase packet rate).
      
      // time tcpblast -o -s 65536 -c 16k 192.168.2.1:9999 > /dev/null
      
      Timeout		Guest->Host	Pkts/irq	Host->Guest	Pkts/irq
      Before		11.3s		1.0		6.3s		1.0
      0		11.7s		1.0		6.6s		23.5
      1		17.1s		8.8		8.6s		26.0
      1/pending	13.4s		1.9		6.6s		23.8
      2/pending	13.6s		2.8		6.6s		24.1
      5/pending	14.1s		5.0		6.6s		24.4
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: avoid sending interrupts to Guest when no activity occurs. · 95c517c0
      Rusty Russell committed
      If we track how many buffers we've used, we can tell whether we really
      need to interrupt the Guest.  This happens as a side effect of
      spurious notifications.
      
      Spurious notifications happen because it can take a while before the
      Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and
      meanwhile the Guest can send more notifications.
      
      A real fix would be to use wake counts, rather than a suppression
      flag, but the practical difference is generally in the noise: the
      interrupt is usually coalesced into a pending one anyway so we just
      save a system call which isn't clearly measurable.
      
      				Secs	Spurious IRQS
      1G TCP Guest->Host:		3.93	58
      1M normal pings:		100	72
      1M 1k pings (-l 120):		57	492904
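
A rough sketch of the bookkeeping described in this commit, with hypothetical names (struct demo_vq, add_used(), trigger_irq_if_needed()). The interrupt itself is replaced by a counter so the logic can be read in isolation.

```c
#include <stdbool.h>

/* Hypothetical per-virtqueue state. */
struct demo_vq {
	unsigned int pending_used; /* buffers returned since the last interrupt */
	unsigned int irqs_sent;    /* stands in for the real interrupt write */
};

/* Called each time a buffer is handed back to the Guest. */
static void add_used(struct demo_vq *vq)
{
	vq->pending_used++;
}

/*
 * Interrupt the Guest only if something was actually returned.
 * A wake-up caused by a spurious notification leaves pending_used
 * at zero, so no interrupt is sent.
 */
static bool trigger_irq_if_needed(struct demo_vq *vq)
{
	if (vq->pending_used == 0)
		return false;

	vq->pending_used = 0;
	vq->irqs_sent++;
	return true;
}
```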
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: implement deferred interrupts in example Launcher · 38bc2b8c
      Rusty Russell committed
      Rather than sending an interrupt on every buffer, we only send an interrupt
      when we're about to wait for the Guest to send us a new one.  The console
      input and network input still send interrupts manually, but the block device,
      network and console output queues can simply rely on this logic to send
      interrupts to the Guest at the right time.
      
      The patch is cluttered by moving trigger_irq() higher in the code.
      
      In practice, two factors make this optimization less interesting:
      (1) we often only get one input at a time, even for networking,
      (2) triggering an interrupt rapidly tends to get coalesced anyway.
      
      Before:				Secs	RxIRQS	TxIRQs
       1G TCP Guest->Host:		3.72	32784	32771
       1M normal pings:		99	1000004	995541
       100,000 1k pings (-l 120):	5	49510	49058
      
      After:
       1G TCP Guest->Host:		3.69	32809	32769
       1M normal pings:		99	1000004	996196
       100,000 1k pings (-l 120):	5	52435	52361
      
      (Note the interrupt count on 100k pings goes *up*: see next patch).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: have example Launcher service all devices in separate threads · 659a0e66
      Rusty Russell committed
      Currently lguest has three threads: the main Launcher thread, a Waker
      thread, and a thread for the block device (because synchronous block
      was simply too painful to bear).
      
      The Waker selects() on all the input file descriptors (eg. stdin, net
      devices, pipe to the block thread) and when one becomes readable it calls
      into the kernel to kick the Launcher thread out into userspace, which
      repeats the poll, services the device(s), and then tells the kernel to
      release the Waker before re-entering the kernel to run the Guest.
      
      Also, to make a slightly-decent network transmit routine, the Launcher
      would suppress further network interrupts while it set a timer: that
      signal handler would write to a pipe, which would rouse the Waker
      which would prod the Launcher out of the kernel to check the network
      device again.
      
      Now we can convert all our virtqueues to separate threads: each one has
      a separate eventfd for when the Guest pokes the device, and can trigger
      interrupts in the Guest directly.
      
      The linecount shows how much this simplifies, but to really bring it
      home, here's an strace analysis of single Guest->Host ping before:
      
      * Guest sends packet, notifies xmit vq, return control to Launcher
      * Launcher clears notification flag on xmit ring
      * Launcher writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Launcher sets up interrupt for Guest (xmit ring is empty)
      	write(10, "\2\0\0\0\3\0\0\0", 8) = 0
      * Launcher sets up timer for interrupt mitigation
      	setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
      * Launcher re-runs guest
      	pread64(10, 0xbfa5f4d4, 4, 0) ...
      * Waker notices reply packet in tun device (it was in select)
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
      * Waker kicks Launcher out of guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher returns from running guest:
      	... = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks at input fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
      * Launcher reads pong from tun device:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
      * Launcher injects guest notification:
      	write(10, "\2\0\0\0\2\0\0\0", 8) = 0
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher reruns Guest:
      	pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
      * Signal comes in, uses pipe to wake up Launcher:
      	--- SIGALRM (Alarm clock) @ 0 (0) ---
      	write(8, "\0", 1)       = 1
      	sigreturn()             = ? (mask now [])
      * Waker sees write on pipe:
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
      * Waker kicks Launcher out of Guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher exits from kernel:
      	pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks to see what fd woke it:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
      * Launcher reads timeout fd, sets notification flag on xmit ring
      	read(6, "\0", 32)       = 1
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher resumes Guest:
      	pread64(10, "\0p\0\4", 4, 0) ....
      
      strace analysis of single Guest->Host ping after:
      
      * Guest sends packet, notifies xmit vq, creates event on eventfd.
      * Network xmit thread wakes from read on eventfd:
      	read(7, "\1\0\0\0\0\0\0\0", 8)          = 8
      * Network xmit thread writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Network recv thread wakes up from read on tunfd:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
      * Network recv thread sets up interrupt for the Guest
      	write(6, "\2\0\0\0\2\0\0\0", 8) = 0
      * Network recv thread goes back to reading tunfd
      	13:39:42.460285 readv(4,  <unfinished ...>
      * Network xmit thread sets up interrupt for Guest (xmit ring is empty)
      	write(6, "\2\0\0\0\3\0\0\0", 8) = 0
      * Network xmit thread goes back to reading from eventfd
      	read(7, <unfinished ...>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: fix writev returning short on console output · 7b5c806c
      Rusty Russell committed
      I've never seen it here, but I can't find anywhere that says writev
      will write everything.
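
One common way to cope with a short writev() is to retry with the unwritten remainder. The sketch below shows that pattern. iov_consume() and writev_all() are hypothetical helpers rather than the functions used in the Launcher, and the iovec array is modified in place.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Skip past `used` bytes of the iovec array, then past any entries
 * that are now empty, so *iov points at the next byte to write.
 */
static void iov_consume(struct iovec **iov, int *iovcnt, size_t used)
{
	while (*iovcnt > 0) {
		size_t n = used < (*iov)->iov_len ? used : (*iov)->iov_len;

		(*iov)->iov_base = (char *)(*iov)->iov_base + n;
		(*iov)->iov_len -= n;
		used -= n;
		if ((*iov)->iov_len != 0)
			break; /* partially written entry: resume here */
		(*iov)++;
		(*iovcnt)--;
	}
}

/* Keep calling writev() until everything is written or an error occurs. */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
	iov_consume(&iov, &iovcnt, 0); /* drop leading empty entries */

	while (iovcnt > 0) {
		ssize_t ret = writev(fd, iov, iovcnt);

		if (ret < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ret == 0)
			return -1; /* no progress: give up rather than spin */
		iov_consume(&iov, &iovcnt, (size_t)ret);
	}
	return 0;
}
```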
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: clean up length-used value in example launcher · e606490c
      Rusty Russell committed
      The "len" field in the used ring for virtio indicates the number of
      bytes *written* to the buffer.  This means the guest doesn't have to
      zero the buffers in advance as it always knows the used length.
      
      Erroneously, the console and network example code puts the length
      *read* into that field.  The guest ignores it, but it's wrong.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: remove invalid interrupt forcing logic. · ebf9a5a9
      Rusty Russell committed
      20887611 (lguest: notify on empty) introduced
      lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
      interrupts all the time.
      
      Because we always process one buffer at a time, the inflight count is always 0
      when we call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
      the Guest.
      
      It should be looking to see if there are more buffers in the Guest's queue:
      if it's empty, then we force an interrupt.
      
      This makes little difference, since we usually have an empty queue; but
      that's the subject of another patch.
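
The intended decision can be written as a small predicate. This is a sketch under assumptions: should_interrupt() and its parameters are hypothetical, with last_avail standing for the index of the next entry the Launcher has not yet consumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1

/*
 * Decide whether to interrupt the Guest after returning a buffer.
 * If the Guest has not asked for suppression, always interrupt.
 * If it has, interrupt anyway only when notify-on-empty was negotiated
 * and the Guest's available queue has no more buffers for us.
 */
static bool should_interrupt(uint16_t avail_flags, uint16_t avail_idx,
			     uint16_t last_avail, bool notify_on_empty)
{
	if (!(avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
		return true;

	return notify_on_empty && avail_idx == last_avail;
}
```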
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: get more serious about wmb() in example Launcher code · f7027c63
      Rusty Russell committed
      Since the Launcher process runs the Guest, it doesn't have to be very
      serious about its barriers: the Guest isn't running while we are (the Guest
      is uniprocessor).
      
      Before we change to use threads to service devices, we need to fix this.
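
To illustrate why the barrier matters once a device thread runs concurrently with the Guest, here is a sketch that publishes a used-ring entry. It uses a C11 release fence where the Launcher uses its own wmb(). The demo_* types and publish_used() are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8

struct demo_used_elem {
	uint32_t id;  /* head of the descriptor chain being returned */
	uint32_t len; /* bytes written into the buffer */
};

struct demo_used_ring {
	uint16_t flags;
	_Atomic uint16_t idx; /* read by the Guest */
	struct demo_used_elem ring[DEMO_RING_SIZE];
};

/* Hand one buffer back to the Guest. */
static void publish_used(struct demo_used_ring *used, uint32_t head,
			 uint32_t len)
{
	uint16_t idx = atomic_load_explicit(&used->idx, memory_order_relaxed);
	struct demo_used_elem *e = &used->ring[idx % DEMO_RING_SIZE];

	e->id = head;
	e->len = len;

	/*
	 * Write barrier: the entry must be visible before the index that
	 * advertises it, or a concurrently running Guest could read a
	 * stale entry.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&used->idx, (uint16_t)(idx + 1),
			      memory_order_relaxed);
}
```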
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: cleanup passing of /dev/lguest fd around example launcher. · 56739c80
      Rusty Russell committed
      We hand the /dev/lguest fd everywhere; it's far neater to just make it
      a global (it already is, in fact, hidden in the waker_fds struct).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: be paranoid about guest playing with device descriptors. · 713b15b3
      Rusty Russell committed
      We can't trust the values in the device descriptor table once the
      guest has booted, so keep local copies.  They could set them to
      strange values then cause us to segv (they're 8 bit values, so they
      can't make our pointers go too wild).
      
      This becomes more important with the following patches which read them.
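
The "keep local copies" approach might look like the sketch below. The shared descriptor here is a simplified, hypothetical struct demo_desc with 8-bit length fields. The point is that later lookups use the values captured at setup, never the Guest-writable ones.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor living in memory the Guest can write. */
struct demo_desc {
	uint8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
};

/* Launcher-private device state. */
struct demo_device {
	struct demo_desc *shared; /* Guest-visible, untrusted after boot */
	uint8_t num_vq;           /* trusted copies taken at setup */
	uint8_t feature_len;
	uint8_t config_len;
};

/* Snapshot the lengths while the descriptor is still under our control. */
static void device_init(struct demo_device *dev, struct demo_desc *shared)
{
	dev->shared = shared;
	dev->num_vq = shared->num_vq;
	dev->feature_len = shared->feature_len;
	dev->config_len = shared->config_len;
}

/*
 * Offset of the config space after the two feature bitmaps (offered
 * and accepted), computed only from the private copy.
 */
static size_t config_offset(const struct demo_device *dev)
{
	return (size_t)dev->feature_len * 2;
}
```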
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  15. 30 Mar 2009, 1 commit
  16. 30 Dec 2008, 2 commits
  17. 31 Oct 2008, 1 commit
  18. 28 Oct 2008, 1 commit
  19. 25 Aug 2008, 1 commit
  20. 12 Aug 2008, 1 commit