1. 05 Feb, 2009 5 commits
  2. 27 Jan, 2009 1 commit
  3. 26 Jan, 2009 1 commit
  4. 22 Jan, 2009 2 commits
  5. 07 Jan, 2009 1 commit
  6. 23 Dec, 2008 1 commit
  7. 02 Dec, 2008 1 commit
  8. 17 Nov, 2008 3 commits
    • virtio_net: VIRTIO_NET_F_MRG_RXBUF (improve rcv buffer allocation) · 3f2c31d9
      Committed by Mark McLoughlin
      If segmentation offload is enabled by the host, we currently allocate
      maximum sized packet buffers and pass them to the host. This uses up
      20 ring entries, allowing us to supply only 20 packet buffers to the
      host with a 256 entry ring. This is a huge overhead when receiving
      small packets, and is most keenly felt when receiving MTU sized
      packets from off-host.
      
      The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
      using receive buffers which are smaller than the maximum packet size.
      In order to transfer large packets to the guest, the host merges
      together multiple receive buffers to form a larger logical buffer.
      The number of merged buffers is returned to the guest via a field in
      the virtio_net_hdr.
      
      Make use of this support by supplying single page receive buffers to
      the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
      the payload to the skb's linear data buffer and adjust the fragment
      offset to point to the remaining data. This ensures proper alignment
      and allows us to not use any paged data for small packets. If the
      payload occupies multiple pages, we simply append those pages as
      fragments and free the associated skbs.
      
      This scheme allows us to be efficient in our use of ring entries
      while still supporting large packets. Benchmarking using netperf from
      an external machine to a guest over a 10Gb/s network shows a 100%
      improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
      with GSO disabled on the host side, throughput was seen to increase
      from 700Mb/s to 1.7Gb/s.
      
      Based on a patch from Herbert Xu.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
      Signed-off-by: David S. Miller <davem@davemloft.net>
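
The receive-side arithmetic described in the commit message above can be sketched in user-space C. This is only an illustration under stated assumptions: 4096-byte pages and a 12-byte header are assumed, and `buffers_needed()`, `split_first_page()` and the `SKETCH_*` constants are hypothetical names, not the driver's code. Only the 128-byte copy length comes from the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define SKETCH_HDR_LEN   12u   /* assumption: header size, not taken from this log */
#define SKETCH_COPY_LEN  128u  /* from the commit text: bytes copied to the linear area */

/* Number of single-page receive buffers the host must merge for one packet. */
static unsigned buffers_needed(size_t payload_len)
{
    size_t total = SKETCH_HDR_LEN + payload_len;

    return (unsigned)((total + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
}

struct first_page_split {
    size_t linear_len;  /* bytes copied into the skb's linear buffer */
    size_t frag_offset; /* where the remaining paged data starts in the page */
    size_t frag_len;    /* bytes left in the page fragment; 0 means no paged data */
};

/* Split the payload held in the first page: header, small copy, remainder. */
static struct first_page_split split_first_page(size_t payload_in_page)
{
    struct first_page_split s;

    assert(SKETCH_HDR_LEN + payload_in_page <= SKETCH_PAGE_SIZE);
    s.linear_len = payload_in_page < SKETCH_COPY_LEN ? payload_in_page : SKETCH_COPY_LEN;
    s.frag_offset = SKETCH_HDR_LEN + s.linear_len;
    s.frag_len = payload_in_page - s.linear_len;
    return s;
}
```

Under these assumptions a 64-byte packet needs one buffer and no paged data, while a 64 KiB payload spans 17 merged single-page buffers.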
    • virtio_net: hook up the set-tso ethtool op · 0276b497
      Committed by Mark McLoughlin
      Seems like an oversight that we have set-tx-csum and set-sg hooked
      up, but not set-tso.
      
      Also leads to the strange situation that if you e.g. disable tx-csum,
      then tso doesn't get disabled.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: Recycle some more rx buffer pages · 0a888fd1
      Committed by Mark McLoughlin
      Each time we re-fill the recv queue with buffers, we allocate
      one too many skbs and free it again when adding fails. We should
      recycle the pages allocated in this case.
      
      A previous version of this patch made trim_pages() trim trailing
      unused pages from skbs with some paged data, but this actually
      caused a barely measurable slowdown.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 13 Nov, 2008 1 commit
    • netdevice: safe convert to netdev_priv() #part-3 · 8f15ea42
      Committed by Wang Chen
      We have some reasons to kill netdev->priv:
      1. netdev->priv is equal to netdev_priv().
      2. netdev_priv() wraps the calculation of netdev->priv's offset, so
         netdev_priv() is more flexible than netdev->priv.
      But we can't kill netdev->priv yet, because so many drivers reference it
      directly.
      
      This patch is a safe conversion from netdev->priv to netdev_priv(netdev),
      since every converted use of netdev->priv is read-only.
      It is too big to be sent in one mail, so
      I split it into 4 parts and made every part smaller than 100,000 bytes,
      which is the maximum size allowed by vger.
      Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 28 Oct, 2008 1 commit
  11. 25 Jul, 2008 4 commits
    • virtio: Recycle unused recv buffer pages for large skbs in net driver · fb6813f4
      Committed by Rusty Russell
      If we hack the virtio_net driver to always allocate full-sized (64k+)
      skbuffs, the driver slows down (lguest numbers):
      
        Time to receive 1GB (small buffers): 10.85 seconds
        Time to receive 1GB (64k+ buffers): 24.75 seconds
      
      Of course, large buffers use up more space in the ring, so we increase
      that from 128 to 2048:
      
        Time to receive 1GB (64k+ buffers, 2k ring): 16.61 seconds
      
      If we recycle pages rather than using alloc_page/free_page:
      
        Time to receive 1GB (64k+ buffers, 2k ring, recycle pages): 10.81 seconds
      
      This demonstrates that with efficient allocation, we don't need to
      have a separate "small buffer" queue.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
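
The idea of recycling pages rather than returning them to the allocator, as measured in the commit message above, can be sketched as a simple free list. This is a user-space illustration only: `malloc()`/`free()` stand in for `alloc_page`/`free_page`, a 4 KiB page size is assumed, and `struct page_pool` and the `pool_*` functions are hypothetical names, not the driver's code.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* A free page stores the link to the next free page in its own first bytes. */
struct free_page {
    struct free_page *next;
};

struct page_pool {
    struct free_page *free_list; /* pages waiting to be reused */
    unsigned long fresh_allocs;  /* pages obtained from the allocator */
    unsigned long reuses;        /* pages served from the free list */
};

/* Take a page from the pool, falling back to the allocator when it is empty. */
static void *pool_get_page(struct page_pool *pool)
{
    void *page;

    if (pool->free_list) {
        page = pool->free_list;
        pool->free_list = pool->free_list->next;
        pool->reuses++;
        return page;
    }
    page = malloc(SKETCH_PAGE_SIZE);
    if (page)
        pool->fresh_allocs++;
    return page;
}

/* Return an unused page to the pool instead of freeing it. */
static void pool_put_page(struct page_pool *pool, void *page)
{
    struct free_page *node = page;

    assert(page != NULL);
    node->next = pool->free_list;
    pool->free_list = node;
}

/* Release every pooled page back to the allocator. */
static void pool_destroy(struct page_pool *pool)
{
    while (pool->free_list) {
        struct free_page *node = pool->free_list;

        pool->free_list = node->next;
        free(node);
    }
}
```

A page that is put back and fetched again is served from the list, so the allocator is touched only once for it.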
    • virtio net: Allow receiving SG packets · 97402b96
      Committed by Herbert Xu
      Finally this patch lets virtio_net receive GSO packets in addition
      to sending them.  This can definitely be optimised for the non-GSO
      case.  For comparison the Xen approach stores one page in each skb
      and uses subsequent skb's pages to construct an SG skb instead of
      preallocating the maximum amount of pages per skb.
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added feature bits)
    • virtio net: Add ethtool ops for SG/GSO · a9ea3fc6
      Committed by Herbert Xu
      This patch adds some basic ethtool operations to virtio_net so
      I could test SG without GSO (which was really useful because TSO
      turned out to be buggy :)
      
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (remove MTU setting)
    • virtio: fix virtio_net xmit of freed skb bug · 9953ca6c
      Committed by Mark McLoughlin
      On Mon, 2008-05-26 at 17:42 +1000, Rusty Russell wrote:
      > If we fail to transmit a packet, we assume the queue is full and put
      > the skb into last_xmit_skb.  However, if more space frees up before we
      > xmit it, we loop, and the result can be transmitting the same skb twice.
      >
      > Fix is simple: set skb to NULL if we've used it in some way, and check
      > before sending.
      ...
      > diff -r 564237b31993 drivers/net/virtio_net.c
      > --- a/drivers/net/virtio_net.c	Mon May 19 12:22:00 2008 +1000
      > +++ b/drivers/net/virtio_net.c	Mon May 19 12:24:58 2008 +1000
      > @@ -287,21 +287,25 @@ again:
      >  	free_old_xmit_skbs(vi);
      >
      >  	/* If we has a buffer left over from last time, send it now. */
      > -	if (vi->last_xmit_skb) {
      > +	if (unlikely(vi->last_xmit_skb)) {
      >  		if (xmit_skb(vi, vi->last_xmit_skb) != 0) {
      >  			/* Drop this skb: we only queue one. */
      >  			vi->dev->stats.tx_dropped++;
      >  			kfree_skb(skb);
      > +			skb = NULL;
      >  			goto stop_queue;
      >  		}
      >  		vi->last_xmit_skb = NULL;
      
      With this, we may drop an skb and then later in the function discover that
      we could have sent it after all. Poor wee skb :)
      
      How about the incremental patch below?
      
      Cheers,
      Mark.
      
      Subject: [PATCH] virtio_net: Delay dropping tx skbs
      
      Currently we drop the skb in start_xmit() if we have a
      queued buffer and fail to transmit it.
      
      However, if we delay dropping it until we've stopped the
      queue and enabled the tx notification callback, then there
      is a chance space might become available for it.
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  12. 11 Jul, 2008 1 commit
  13. 11 Jun, 2008 3 commits
  14. 31 May, 2008 2 commits
  15. 23 May, 2008 1 commit
  16. 02 May, 2008 5 commits
    • virtio: explicit advertisement of driver features · c45a6816
      Committed by Rusty Russell
      A recent proposed feature addition to the virtio block driver revealed
      some flaws in the API: in particular, we assume that feature
      negotiation is complete once a driver's probe function returns.
      
      There is nothing in the API to require this, however, and even I
      didn't notice when it was violated.
      
      So instead, we require the driver to specify what features it supports
      in a table; we can then move the feature negotiation into the virtio
      core.  The intersection of device and driver features is presented in
      a new 'features' bitmap in the struct virtio_device.
      
      Note that this highlights the difference between Linux unsigned-long
      bitmaps where each unsigned long is in native endian, and a
      straight-forward little-endian array of bytes.
      
      Drivers can still remove feature bits in their probe routine if they
      really have to.
      
      API changes:
      - dev->config->feature() no longer gets and acks a feature.
      - drivers should advertise their features in the 'feature_table' field
      - use virtio_has_feature() for extra sanity when checking feature bits
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
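
The table-driven negotiation described in the commit message above amounts to intersecting two feature sets. The following is a minimal user-space sketch, assuming a single 32-bit bitmap; the bit numbers and the names `driver_mask()`, `negotiate()` and `has_feature()` are hypothetical and are not the virtio core's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit numbers, chosen only for this sketch. */
enum { SKETCH_F_CSUM = 0, SKETCH_F_MAC = 5, SKETCH_F_GSO = 6 };

/* Build a bitmask from a driver's table of supported feature bit numbers. */
static uint32_t driver_mask(const unsigned *table, size_t n)
{
    uint32_t mask = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        assert(table[i] < 32);
        mask |= UINT32_C(1) << table[i];
    }
    return mask;
}

/* The negotiated set is the intersection of device and driver features. */
static uint32_t negotiate(uint32_t device_features, const unsigned *table, size_t n)
{
    return device_features & driver_mask(table, n);
}

/* Check one bit of the negotiated set, with a range check for sanity. */
static bool has_feature(uint32_t negotiated, unsigned bit)
{
    assert(bit < 32);
    return (negotiated >> bit) & 1u;
}
```

A feature is usable only when both sides list it: a bit offered by the device but missing from the driver's table is cleared, and so is a bit the driver supports but the device does not offer.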
    • virtio: finer-grained features for virtio_net · 5539ae96
      Committed by Rusty Russell
      So, we previously had a 'VIRTIO_NET_F_GSO' bit which meant that 'the
      host can handle csum offload, and any TSO (v4&v6 incl ECN) or UFO
      packets you might want to send'.  I thought this was good enough for
      Linux, but it actually isn't, since we don't do UFO in software.
      
      So, add separate feature bits for what the host can handle.  Add
      equivalent ones for the guest to say what it can handle, because LRO
      is coming too (thanks Herbert!).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: wean net driver off NETDEV_TX_BUSY · 99ffc696
      Committed by Rusty Russell
      Herbert tells me that returning NETDEV_TX_BUSY from hard_start_xmit is
      seen as a poor thing to do; we should cache the packet and stop the queue.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
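
The "cache the packet and stop the queue" approach named in the commit message above can be sketched as a small single-threaded simulation. The names `struct tx_sim`, `sim_ring_add()`, `sim_start_xmit()` and `sim_tx_done()` are hypothetical; the one-packet cache and the drop when it is already occupied follow the `last_xmit_skb` handling quoted elsewhere in this log, and everything else is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct tx_sim {
    int ring_free;          /* free slots in the simulated transmit ring */
    const void *cached_pkt; /* the single packet held back while the ring is full */
    bool queue_stopped;     /* true once the stack is told to stop sending */
    unsigned long dropped;  /* packets dropped because the cache was occupied */
};

/* Try to place one packet in the ring; fails when no slot is free. */
static bool sim_ring_add(struct tx_sim *tx)
{
    if (tx->ring_free == 0)
        return false;
    tx->ring_free--;
    return true;
}

/* Transmit entry point: never reports "busy" to the caller. */
static void sim_start_xmit(struct tx_sim *tx, const void *pkt)
{
    assert(pkt != NULL);

    /* Send the packet cached by an earlier call first. */
    if (tx->cached_pkt) {
        if (!sim_ring_add(tx)) {
            /* Only one packet is cached, so the new one is dropped. */
            tx->dropped++;
            tx->queue_stopped = true;
            return;
        }
        tx->cached_pkt = NULL;
    }

    /* Ring full: keep the packet and stop the queue instead of returning busy. */
    if (!sim_ring_add(tx)) {
        tx->cached_pkt = pkt;
        tx->queue_stopped = true;
    }
}

/* Completion path: slots were freed, so the queue may run again. */
static void sim_tx_done(struct tx_sim *tx, int freed)
{
    tx->ring_free += freed;
    tx->queue_stopped = false;
}
```

With one free slot, the first packet goes out, the second is cached while the queue stops, and after a completion the cached packet is sent ahead of the next new one.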
    • virtio: fix scatterlist sizing in net driver. · 05271685
      Committed by Rusty Russell
      Herbert Xu points out (within another patch) that my scatterlists are
      too short: one entry for the gso header, one for the skb->data, and
      MAX_SKB_FRAGS for all the fragments.
      
      Fix both xmit and recv sides (recv currently unused, coming in later
      patch).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • virtio: fix tx_ stats in virtio_net · 655aa31f
      Committed by Rusty Russell
      get_buf() gives the length written by the other side, which will be
      zero.  We want to add the skb length.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  17. 09 Apr, 2008 1 commit
    • [NET]: Undo code bloat in hot paths due to print_mac(). · 21f644f3
      Committed by David S. Miller
      If print_mac() is used inside of a pr_debug() the compiler
      can't see that the call is redundant, so it still performs it
      even if pr_debug() ends up being a nop.
      
      So don't use print_mac() in such cases in hot code paths,
      use MAC_FMT et al. instead.
      
      As noted by Joe Perches, pr_debug() could be modified to
      handle this better, but that is a change to an interface
      used by the entire kernel and thus needs to be validated
      carefully.  This here is thus the less risky fix for
      2.6.25.
      Signed-off-by: David S. Miller <davem@davemloft.net>
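
The effect described in the commit message above can be reproduced in user-space C: arguments to a function are evaluated even when the function body does nothing. This is an illustration only; `format_mac()`, `debug_nop()`, `SKETCH_MAC_FMT` and `SKETCH_MAC_ARGS` are hypothetical stand-ins for `print_mac()`, a compiled-out `pr_debug()` and `MAC_FMT`, not the kernel's definitions.

```c
#include <assert.h>
#include <stdio.h>

static int format_calls; /* counts how often the formatting helper runs */

/* Stand-in for a helper that formats a MAC address into a buffer. */
static const char *format_mac(char *buf, size_t len, const unsigned char *mac)
{
    assert(len >= 18);
    format_calls++;
    snprintf(buf, len, "%02x:%02x:%02x:%02x:%02x:%02x",
             (unsigned)mac[0], (unsigned)mac[1], (unsigned)mac[2],
             (unsigned)mac[3], (unsigned)mac[4], (unsigned)mac[5]);
    return buf;
}

/* Stand-in for a debug print that has been compiled down to a no-op function. */
static inline int debug_nop(const char *fmt, ...)
{
    (void)fmt;
    return 0;
}

/* Format string plus raw bytes: no helper call is needed at the call site. */
#define SKETCH_MAC_FMT "%02x:%02x:%02x:%02x:%02x:%02x"
#define SKETCH_MAC_ARGS(m) (unsigned)(m)[0], (unsigned)(m)[1], (unsigned)(m)[2], \
                           (unsigned)(m)[3], (unsigned)(m)[4], (unsigned)(m)[5]

/* Returns how many times the helper ran for one no-op debug call. */
static int calls_with_helper(const unsigned char *mac)
{
    char buf[18];

    format_calls = 0;
    debug_nop("mac %s\n", format_mac(buf, sizeof(buf), mac));
    return format_calls;
}

/* Same message using the format-macro style; the helper is never invoked. */
static int calls_with_macro_fmt(const unsigned char *mac)
{
    format_calls = 0;
    debug_nop("mac " SKETCH_MAC_FMT "\n", SKETCH_MAC_ARGS(mac));
    return format_calls;
}
```

The helper still runs once per call even though nothing is printed, whereas the format-macro variant only passes six bytes as arguments.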
  18. 08 Apr, 2008 1 commit
  19. 17 Mar, 2008 2 commits
    • virtio: fix race in enable_cb · 4265f161
      Committed by Christian Borntraeger
      There is a race in virtio_net, dealing with disabling/enabling the callback.
      I saw the following oops:
      
      kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:218!
      illegal operation: 0001 [#1] SMP
      Modules linked in: sunrpc dm_mod
      CPU: 2 Not tainted 2.6.25-rc1zlive-host-10623-gd358142-dirty #99
      Process swapper (pid: 0, task: 000000000f85a610, ksp: 000000000f873c60)
      Krnl PSW : 0404300180000000 00000000002b81a6 (vring_disable_cb+0x16/0x20)
                 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
      Krnl GPRS: 0000000000000001 0000000000000001 0000000010005800 0000000000000001
                 000000000f3a0900 000000000f85a610 0000000000000000 0000000000000000
                 0000000000000000 000000000f870000 0000000000000000 0000000000001237
                 000000000f3a0920 000000000010ff74 00000000002846f6 000000000fa0bcd8
      Krnl Code: 00000000002b819a: a7110001           tmll    %r1,1
                 00000000002b819e: a7840004           brc     8,2b81a6
                 00000000002b81a2: a7f40001           brc     15,2b81a4
                >00000000002b81a6: a51b0001           oill    %r1,1
                 00000000002b81aa: 40102000           sth     %r1,0(%r2)
                 00000000002b81ae: 07fe               bcr     15,%r14
                 00000000002b81b0: eb7ff0380024       stmg    %r7,%r15,56(%r15)
                 00000000002b81b6: a7f13e00           tmll    %r15,15872
      Call Trace:
      ([<000000000fa0bcd0>] 0xfa0bcd0)
       [<00000000002b8350>] vring_interrupt+0x5c/0x6c
       [<000000000010ab08>] do_extint+0xb8/0xf0
       [<0000000000110716>] ext_no_vtime+0x16/0x1a
       [<0000000000107e72>] cpu_idle+0x1c2/0x1e0
      
      The problem can be triggered with a high amount of host->guest traffic.
      I think it's the following race:
      
      poll says netif_rx_complete
      poll calls enable_cb
      enable_cb opens the interrupt mask
      a new packet comes, an interrupt is triggered----\
      enable_cb sees that there is more work           |
      enable_cb disables the interrupt                 |
             .                                         V
             .                            interrupt is delivered
             .                            skb_recv_done does atomic napi test, ok
       some waiting                       disable_cb is called->check fails->bang!
             .
      poll would do napi check
      poll would do disable_cb
      
      The fix is to let enable_cb not disable the interrupt again, but expect the
      caller to do the cleanup if it returns false. In that case, the interrupt is
      only disabled if the napi test_set_bit was successful.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned up doco)
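
The fix described in the commit message above can be modelled in a single-threaded user-space sketch: `enable_cb` leaves callbacks enabled when more work is pending, and only whoever wins the atomic schedule bit calls `disable_cb`. All names here (`struct sim_dev` and the `sim_*` functions) are hypothetical, the `assert()` in `sim_disable_cb()` stands in for the check that fired in the oops, and the real driver's structure may differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sim_dev {
    atomic_flag napi_sched; /* stands in for the NAPI scheduled bit */
    bool cb_enabled;        /* whether ring callbacks are currently enabled */
    int pending;            /* buffers that arrived and are not yet processed */
    int disable_calls;      /* how many times callbacks were disabled */
};

/* Start in the state of a running poll: scheduled, callbacks disabled. */
static void sim_init(struct sim_dev *d, int pending)
{
    atomic_flag_clear(&d->napi_sched);
    (void)atomic_flag_test_and_set(&d->napi_sched);
    d->cb_enabled = false;
    d->pending = pending;
    d->disable_calls = 0;
}

/* Atomic test-and-set: true only for the one caller that wins the bit. */
static bool sim_schedule_prep(struct sim_dev *d)
{
    return !atomic_flag_test_and_set(&d->napi_sched);
}

static void sim_disable_cb(struct sim_dev *d)
{
    assert(d->cb_enabled); /* disabling twice is the bug being avoided */
    d->cb_enabled = false;
    d->disable_calls++;
}

/* Enable callbacks; return false if more work is pending, without disabling again. */
static bool sim_enable_cb(struct sim_dev *d)
{
    d->cb_enabled = true;
    return d->pending == 0;
}

/* End of a poll pass: the caller cleans up only if it wins the schedule bit. */
static void sim_poll_complete(struct sim_dev *d)
{
    atomic_flag_clear(&d->napi_sched);
    if (!sim_enable_cb(d) && sim_schedule_prep(d))
        sim_disable_cb(d);
}

/* Interrupt path: disable callbacks only after winning the schedule bit. */
static void sim_interrupt(struct sim_dev *d)
{
    if (sim_schedule_prep(d))
        sim_disable_cb(d);
}
```

Whichever path wins the bit disables callbacks exactly once; the losing path sees the bit already set and leaves the callback state alone.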
    • virtio: Enable netpoll interface for netconsole logging · da74e89d
      Committed by Amit Shah
      Add a new poll_controller handler that the netpoll interface needs.
      
      This enables netconsole logging from a kvm guest over the virtio
      net interface.
      Signed-off-by: Amit Shah <amitshah@gmx.net>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  20. 24 Feb, 2008 1 commit
  21. 06 Feb, 2008 1 commit
    • virtio net: fix oops on interface-up · 370076d9
      Committed by Christian Borntraeger
      I got the following oops during interface ifup. Unfortunately it's not
      easily reproducible so I can't say for sure that my fix fixes this
      problem, but I am confident and I think it's correct anyway:
      
         <2>kernel BUG at /space/kvm/drivers/virtio/virtio_ring.c:234!
          <4>illegal operation: 0001 [#1] PREEMPT SMP
          <4>Modules linked in:
          <4>CPU: 0 Not tainted 2.6.24zlive-guest-07293-gf1ca1512-dirty #91
          <4>Process swapper (pid: 0, task: 0000000000800938, ksp: 000000000084ddb8)
          <4>Krnl PSW : 0404300180000000 0000000000466374 (vring_disable_cb+0x30/0x34)
          <4>           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3
          <4>Krnl GPRS: 0000000000000001 0000000000000001 0000000010003800 0000000000466344
          <4>           000000000e980900 00000000008848b0 000000000084e748 0000000000000000
          <4>           000000000087b300 0000000000001237 0000000000001237 000000000f85bdd8
          <4>           000000000e980920 00000000001137c0 0000000000464754 000000000f85bdd8
          <4>Krnl Code: 0000000000466368: e3b0b0700004        lg      %r11,112(%r11)
          <4>           000000000046636e: 07fe                bcr     15,%r14
          <4>           0000000000466370: a7f40001            brc     15,466372
          <4>          >0000000000466374: a7f4fff6            brc     15,466360
          <4>           0000000000466378: eb7ff0500024        stmg    %r7,%r15,80(%r15)
          <4>           000000000046637e: a7f13e00            tmll    %r15,15872
          <4>           0000000000466382: b90400ef            lgr     %r14,%r15
          <4>           0000000000466386: a7840001            brc     8,466388
          <4>Call Trace:
          <4>([<000201500f85c000>] 0x201500f85c000)
          <4> [<0000000000466556>] vring_interrupt+0x72/0x88
          <4> [<00000000004801a0>] kvm_extint_handler+0x34/0x44
          <4> [<000000000010d22c>] do_extint+0xbc/0xf8
          <4> [<0000000000113f98>] ext_no_vtime+0x16/0x1a
          <4> [<000000000010a182>] cpu_idle+0x216/0x238
          <4>([<000000000010a162>] cpu_idle+0x1f6/0x238)
          <4> [<0000000000568656>] rest_init+0xaa/0xb8
          <4> [<000000000084ee2c>] start_kernel+0x3fc/0x490
          <4> [<0000000000100020>] _stext+0x20/0x80
          <4>
          <4> <0>Kernel panic - not syncing: Fatal exception in interrupt
          <4>
      
      After looking at the code and the dump I think the following scenario
      happened: Ifup was running on cpu 2 and the interrupt arrived on cpu 0.
      Now virtnet_open on cpu 2 managed to execute napi_enable and disable_cb
      but did not execute rx_schedule. Meanwhile on cpu 0 skb_recv_done was
      called by vring_interrupt, executed netif_rx_schedule_prep, which
      succeeded and therefore called disable_cb. This triggered the BUG_ON,
      as interrupts were already disabled by cpu 2.
      
      I think the proper solution is to make the call to disable_cb depend on
      the atomic update of NAPI_STATE_SCHED by using netif_rx_schedule_prep
      in the same way as skb_recv_done.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Jeff Garzik <jeff@garzik.org>
  22. 04 Feb, 2008 1 commit