1. 21 January 2015, 1 commit
  2. 23 December 2014, 1 commit
    • virtio_net: Fix napi poll list corruption · 8acdf999
      Authored by Herbert Xu
      The commit d75b1ade (net: less
      interrupt masking in NAPI) breaks virtio_net in an insidious way.
      
      It is now required that if the entire budget is consumed when poll
      returns, the napi poll_list must remain empty.  However, like some
      other drivers virtio_net tries to do a last-ditch check and if
      there is more work it will call napi_schedule and then immediately
      process some of this new work.  Should the entire budget be consumed
      while processing such new work then we will violate the new caller
      contract.
      
      This patch fixes this by not touching any work when we reschedule
      in virtio_net.
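
      A minimal sketch of a poll routine that honours this contract (illustrative
      only; my_poll(), my_process_rx() and more_work_pending() are hypothetical
      names, not the driver's actual code):
      
      	static int my_poll(struct napi_struct *napi, int budget)
      	{
      		int received = my_process_rx(napi, budget);
      
      		if (received < budget) {
      			/* Budget not exhausted: complete, then reschedule if
      			 * new work raced in - but do not process that new
      			 * work in this invocation. */
      			napi_complete(napi);
      			if (more_work_pending(napi))
      				napi_schedule(napi);
      		}
      
      		/* If received == budget we return immediately, leaving the
      		 * napi instance on the poll_list as the core now requires. */
      		return received;
      	}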
      
      The worst part of this bug is that the list corruption causes other
      napi users to be moved off-list.  In my case I was chasing a stall
      in IPsec (IPsec uses netif_rx) and I only belatedly realised that it
      was virtio_net which caused the stall even though the virtio_net
      poll was still functioning perfectly after IPsec stalled.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 09 December 2014, 8 commits
  4. 21 November 2014, 1 commit
  5. 31 October 2014, 1 commit
    • drivers/net: Disable UFO through virtio · 3d0ad094
      Authored by Ben Hutchings
      IPv6 does not allow fragmentation by routers, so there is no
      fragmentation ID in the fixed header.  UFO for IPv6 requires the ID to
      be passed separately, but there is no provision for this in the virtio
      net protocol.
      
      Until recently our software implementation of UFO/IPv6 generated a new
      ID, but this was a bug.  Now we will use ID=0 for any UFO/IPv6 packet
      passed through a tap, which is even worse.
      
      Unfortunately there is no distinction between UFO/IPv4 and v6
      features, so disable UFO on taps and virtio_net completely until we
      have a proper solution.
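
      As a rough illustration of what "disable UFO" means at the driver level
      (a sketch, not the actual patch; the function name is made up), the
      feature bit is simply dropped from the advertised feature masks:
      
      	/* Sketch: stop advertising UDP fragmentation offload. */
      	static void my_disable_ufo(struct net_device *dev)
      	{
      		dev->hw_features &= ~NETIF_F_UFO;
      		dev->features    &= ~NETIF_F_UFO;
      	}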
      
      We cannot depend on VM managers respecting the tap feature flags, so
      keep accepting UFO packets but log a warning the first time we do
      this.
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 916e4cf4 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 October 2014, 1 commit
  7. 15 October 2014, 6 commits
  8. 14 September 2014, 1 commit
    • virtio_net: pass well-formed sgs to virtqueue_add_*() · a5835440
      Authored by Rusty Russell
      This is the only driver which doesn't hand virtqueue_add_inbuf and
      virtqueue_add_outbuf a well-formed, well-terminated sg.  Fix it,
      so we can make virtio_add_* simpler.
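
      For reference, a well-formed, well-terminated sg is what sg_init_table()
      produces; a minimal sketch (hdr, hdr_len, data, data_len, vq and skb are
      placeholders, not the driver's variables):
      
      	struct scatterlist sg[2];
      	int err;
      
      	sg_init_table(sg, 2);               /* zeroes entries, marks sg[1] as last */
      	sg_set_buf(&sg[0], hdr, hdr_len);   /* virtio-net header */
      	sg_set_buf(&sg[1], data, data_len); /* packet payload    */
      
      	err = virtqueue_add_outbuf(vq, sg, 2, skb, GFP_ATOMIC);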
      
      pktgen results:
      	modprobe pktgen
      	echo 'add_device eth0' > /proc/net/pktgen/kpktgend_0
      	echo nowait 1 > /proc/net/pktgen/eth0
      	echo count 1000000 > /proc/net/pktgen/eth0
      	echo clone_skb 100000 > /proc/net/pktgen/eth0
      	echo dst_mac 4e:14:25:a9:30:ac > /proc/net/pktgen/eth0
      	echo dst 192.168.1.2 > /proc/net/pktgen/eth0
      	for i in `seq 20`; do echo start > /proc/net/pktgen/pgctrl; tail -n1 /proc/net/pktgen/eth0; done
      
      Before:
        746547-793084(786421+/-9.6e+03)pps 346-367(364.4+/-4.4)Mb/sec (346397808-367990976(3.649e+08+/-4.5e+06)bps) errors: 0
      
      After:
        767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 28 August 2014, 1 commit
  10. 26 August 2014, 1 commit
    • net: Remove ndo_xmit_flush netdev operation, use signalling instead. · 0b725a2c
      Authored by David S. Miller
      As reported by Jesper Dangaard Brouer, for high packet rates the
      overhead of having another indirect call in the TX path is
      non-trivial.
      
      There is the indirect call itself, and then there is all of the
      reloading of the state to refetch the tail pointer value and
      then write the device register.
      
      Move to a more passive scheme, which requires very light modifications
      to the device drivers.
      
      The signal is a new skb->xmit_more value; if it is non-zero, it means
      that more SKBs are pending to be transmitted on the same queue as the
      current SKB, and therefore the driver may elide the tail pointer
      update.
      
      Right now skb->xmit_more is always zero.
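
      A sketch of how a driver consumes the new signal (my_start_xmit(),
      my_queue_desc() and my_write_tail_ptr() are placeholder names):
      
      	static netdev_tx_t my_start_xmit(struct sk_buff *skb, struct net_device *dev)
      	{
      		struct my_priv *priv = netdev_priv(dev);
      
      		my_queue_desc(priv, skb);
      
      		/* Elide the tail pointer / doorbell write while more SKBs
      		 * are pending for this queue; the last SKB in the burst
      		 * flushes. */
      		if (!skb->xmit_more)
      			my_write_tail_ptr(priv);
      
      		return NETDEV_TX_OK;
      	}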
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 25 August 2014, 1 commit
  12. 24 July 2014, 2 commits
    • virtio-net: rx busy polling support · 91815639
      Authored by Jason Wang
      Add basic support for rx busy polling. Instead of introducing new
      states and a spinlock to synchronize between NAPI and the polling
      method, this patch just reuses the NAPI state to avoid extra overhead
      on the fast path and simplifies the code.
      
      The test was done between a KVM guest and an external host. The two
      hosts were connected through 40Gb mlx4 cards. With both busy_poll and
      busy_read set to 50 in the guest, 1-byte netperf TCP_RR shows a 127%
      improvement: the transaction rate increased from 8353.33 to 18966.87.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: introduce virtnet_receive() · 2ffa7598
      Authored by Jason Wang
      Move common receive logic to a new helper, virtnet_receive(). It will
      also be used by the rx busy polling method.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 14 May 2014, 1 commit
  14. 01 May 2014, 1 commit
    • virtio-net: Set needed_headroom for virtio-net when VIRTIO_F_ANY_LAYOUT is true · 6ebbc1a6
      Authored by Zhangjie (HZ)
      This is a small supplement for commit e7428e95
      ("virtio-net: put virtio-net header inline with data"). TCP packets have
      enough room to put the virtio-net header in, but UDP packets do not. By
      setting dev->needed_headroom for the virtio-net device, UDP packets can
      have enough room.
      
      For UDP packets, the sk_buff is allocated in __ip_append_data(). The size
      is "alloclen + hh_len + 15", where "hh_len = LL_RESERVED_SPACE(rt->dst.dev);".
      The macro is defined as follows:
      #define LL_RESERVED_SPACE(dev) \
           ((((dev)->hard_header_len+(dev)->needed_headroom)\
           &~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
      By default, for UDP packets, only 16 bytes are reserved after the skb is
      allocated, and only 2 bytes remain after the MAC header is set. That is
      not enough to put the virtio-net header in. If we set dev->needed_headroom
      to 12 or 10 (depending on whether mergeable_rx_bufs is on or off), more
      room can be reserved, leaving enough room for UDP packets to put the
      header in.
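
      To make the arithmetic concrete, here is a small standalone mock of the
      macro (assuming an Ethernet hard_header_len of 14 and HH_DATA_MOD of 16;
      this is an illustration, not kernel code):
      
      	#include <stdio.h>
      
      	#define HH_DATA_MOD 16
      	#define LL_RESERVED_SPACE(hhl, nhr) \
      		((((hhl) + (nhr)) & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD)
      
      	int main(void)
      	{
      		/* Default: ((14 + 0) & ~15) + 16 = 16 reserved, so only
      		 * 16 - 14 = 2 bytes of headroom remain after the MAC header. */
      		printf("%d\n", LL_RESERVED_SPACE(14, 0) - 14);   /* prints 2  */
      
      		/* needed_headroom = 12: ((14 + 12) & ~15) + 16 = 32 reserved,
      		 * 32 - 14 = 18 bytes left - enough for the 12-byte header. */
      		printf("%d\n", LL_RESERVED_SPACE(14, 12) - 14);  /* prints 18 */
      		return 0;
      	}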
      
      Test results are listed below.
      Guest and host: suse11sp3, netperf, Intel 2.4GHz.
      +-------+---------+---------+---------+---------+
      |  UDP  |        old        |        new        |
      | bytes +---------+---------+---------+---------+
      |       |  Gbit/s |   pps   |  Gbit/s |   pps   |
      | 64    |  0.57   | 692232  |  0.61   | 742420  |
      | 256   |  1.60   | 686860  |  1.71   | 733331  |
      | 512   |  2.92   | 674576  |  3.07   | 710446  |
      | 1024  |  4.99   | 598977  |  5.17   | 620821  |
      | 1460  |  5.68   | 483757  |  7.16   | 610519  |
      | 4096  |  6.98   | 637468  |  7.21   | 658471  |
      +-------+---------+---------+---------+---------+
      Signed-off-by: Zhang Jie <zhangjie14@huawei.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 23 April 2014, 1 commit
  16. 28 March 2014, 1 commit
    • virtio-net: correct error handling of virtqueue_kick() · 681daee2
      Authored by Jason Wang
      The current error handling of virtqueue_kick() is wrong in two places:
      - The skb was freed immediately when virtqueue_kick() failed during
        xmit. This may lead to a double free, since the skb was not detached
        from the virtqueue.
      - try_fill_recv() returned false when virtqueue_kick() failed. This
        leads to unnecessary rescheduling of the refill work.
      
      Actually, it's safe to just ignore the kick failure in those two
      places. So this patch fixes this by partially reverting commit
      67975901.
      
      Fixes: 67975901 ("virtio_net: verify if virtqueue_kick() succeeded").
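
      A sketch of the resulting pattern (the function shape and name are
      illustrative; the real driver differs in detail): the buffer is already
      on the virtqueue, so a failed kick must not be turned into a free or a
      "refill failed" result.
      
      	static bool my_add_recv_buf(struct virtqueue *vq, void *buf,
      				    size_t len, gfp_t gfp)
      	{
      		struct scatterlist sg;
      		int err;
      
      		sg_init_one(&sg, buf, len);
      		err = virtqueue_add_inbuf(vq, &sg, 1, buf, gfp);
      
      		/* Kick unconditionally and ignore its return value. */
      		virtqueue_kick(vq);
      
      		return err >= 0;   /* only the add result decides success */
      	}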
      
      Cc: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 25 March 2014, 1 commit
  18. 15 March 2014, 1 commit
  19. 13 March 2014, 1 commit
  20. 25 February 2014, 1 commit
  21. 17 January 2014, 4 commits
    • virtio-net: initial rx sysfs support, export mergeable rx buffer size · fbf28d78
      Authored by Michael Dalton
      Add initial support for per-rx queue sysfs attributes to virtio-net. If
      mergeable packet buffers are enabled, adds a read-only mergeable packet
      buffer size sysfs attribute for each RX queue.
      Suggested-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael Dalton <mwdalton@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: auto-tune mergeable rx buffer size for improved performance · ab7db917
      Authored by Michael Dalton
      Commit 2613af0e ("virtio_net: migrate mergeable rx buffers to page frag
      allocators") changed the mergeable receive buffer size from PAGE_SIZE to
      MTU-size, introducing a single-stream regression for benchmarks with large
      average packet size. There is no single optimal buffer size for all
      workloads.  For workloads with packet size <= MTU bytes, MTU + virtio-net
      header-sized buffers are preferred as larger buffers reduce the TCP window
      due to SKB truesize. However, single-stream workloads with large average
      packet sizes have higher throughput if larger (e.g., PAGE_SIZE) buffers
      are used.
      
      This commit auto-tunes the mergeable receive buffer packet size by
      choosing the packet buffer size based on an EWMA of the recent packet
      sizes for the receive queue. Packet buffer sizes range from MTU-size +
      virtio-net header length to PAGE_SIZE. This improves throughput for
      large-packet workloads, as any workload with average packet size >=
      PAGE_SIZE will use PAGE_SIZE buffers.
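
      A sketch of the auto-tuning idea (constants, names and the EWMA weight
      are illustrative, not the driver's actual implementation):
      
      	#define HDR_LEN       12     /* assumed mergeable virtio-net header size */
      	#define MIN_BUF_LEN   (1500 + HDR_LEN)
      	#define MAX_BUF_LEN   4096   /* PAGE_SIZE on this example system */
      	#define EWMA_WEIGHT   64     /* larger weight = slower-moving average */
      
      	static unsigned long pkt_len_avg = MIN_BUF_LEN;   /* per-RX-queue EWMA */
      
      	static void note_rx_packet_len(unsigned int len)
      	{
      		pkt_len_avg = (pkt_len_avg * (EWMA_WEIGHT - 1) + len) / EWMA_WEIGHT;
      	}
      
      	static unsigned int pick_rx_buf_len(void)
      	{
      		unsigned long len = pkt_len_avg + HDR_LEN;
      
      		/* Clamp between an MTU-sized buffer and a full page. */
      		if (len < MIN_BUF_LEN)
      			len = MIN_BUF_LEN;
      		if (len > MAX_BUF_LEN)
      			len = MAX_BUF_LEN;
      		return len;
      	}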
      
      These optimizations interact positively with recent commit
      ba275241 ("virtio-net: coalesce rx frags when possible during rx"),
      which coalesces adjacent RX SKB fragments in virtio_net. The coalescing
      optimizations benefit buffers of any size.
      
      Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
      between two QEMU VMs on a single physical machine. Each VM has two VCPUs
      with all offloads & vhost enabled. All VMs and vhost threads run in a
      single 4 CPU cgroup cpuset, using cgroups to ensure that other processes
      in the system will not be scheduled on the benchmark CPUs. Trunk includes
      SKB rx frag coalescing.
      
      net-next w/ virtio_net before 2613af0e (PAGE_SIZE bufs): 14642.85Gb/s
      net-next (MTU-size bufs):  13170.01Gb/s
      net-next + auto-tune: 14555.94Gb/s
      
      Jason Wang also reported a throughput increase on mlx4 from 22Gb/s
      using MTU-sized buffers to about 26Gb/s using auto-tuning.
      Signed-off-by: Michael Dalton <mwdalton@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: use per-receive queue page frag alloc for mergeable bufs · fb51879d
      Authored by Michael Dalton
      The virtio-net driver currently uses netdev_alloc_frag() for GFP_ATOMIC
      mergeable rx buffer allocations. This commit migrates virtio-net to use
      per-receive-queue page frags for GFP_ATOMIC allocation. This change unifies
      mergeable rx buffer memory allocation, which will now use skb_page_frag_refill()
      for both atomic and GFP-WAIT buffer allocations.
      
      To address fragmentation concerns, if after buffer allocation there
      is too little space left in the page frag to allocate a subsequent
      buffer, the remaining space is added to the current allocated buffer
      so that the remaining space can be used to store packet data.
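
      A sketch of that fragmentation rule (simplified; struct page_frag fields
      as in the kernel, the function name otherwise illustrative):
      
      	static unsigned int carve_mergeable_buf(struct page_frag *frag,
      						unsigned int buf_len)
      	{
      		unsigned int leftover;
      
      		frag->offset += buf_len;
      		leftover = frag->size - frag->offset;
      
      		/* If the leftover cannot hold another buffer, fold it into
      		 * the buffer just carved out so it can still store packet
      		 * data. */
      		if (leftover < buf_len) {
      			buf_len += leftover;
      			frag->offset = frag->size;
      		}
      
      		return buf_len;   /* final length of the buffer just carved out */
      	}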
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Michael Dalton <mwdalton@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: drop rq->max and rq->num · be121f46
      Authored by Jason Wang
      It looks like there's no need for those two fields:
      
      - Unless there's a failure on the first refill try, rq->max should always
        be equal to the vring size.
      - rq->num is only used to determine whether we need to do a refill; we
        could check vq->num_free instead (see the sketch below).
      - rq->num was required to be increased or decreased explicitly after each
        get/put, which results in a bad API.
      
      So this patch removes them both to make the code simpler.
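
      A sketch of the refill condition using the vring's own bookkeeping
      (the half-ring threshold is illustrative):
      
      	static bool rx_needs_refill(struct virtqueue *vq)
      	{
      		/* Refill when more than half of the ring entries are unused. */
      		return vq->num_free > virtqueue_get_vring_size(vq) / 2;
      	}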
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 03 January 2014, 1 commit
    • virtio-net: fix refill races during restore · 6cd4ce00
      Authored by Jason Wang
      During restore, try_fill_recv() was called with neither the napi lock
      held nor napi disabled. This could lead to two try_fill_recv() calls
      running at the same time. Fix this by refilling before trying to enable
      napi, as sketched below.
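
      A sketch of the restore ordering described here (vi, rq and the helper
      names are placeholders): refill each queue first, and only then enable
      its napi instance.
      
      	static int my_restore(struct virtio_device *vdev)
      	{
      		struct my_info *vi = vdev->priv;
      		int i;
      
      		for (i = 0; i < vi->num_queues; i++) {
      			/* Refill while napi is still disabled, so no second
      			 * try_fill_recv() can run concurrently. */
      			if (!my_try_fill_recv(&vi->rq[i], GFP_KERNEL))
      				my_schedule_refill(vi);
      
      			napi_enable(&vi->rq[i].napi);
      		}
      
      		return 0;
      	}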
      
      Fixes: 0741bcb5 ("virtio: net: Add freeze, restore handlers to support S4").
      
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 11 December 2013, 2 commits