1. 19 11月, 2013 1 次提交
  2. 15 11月, 2013 1 次提交
    • M
      virtio-net: mergeable buffer size should include virtio-net header · 5061de36
      Michael Dalton 提交于
      Commit 2613af0e ("virtio_net: migrate mergeable rx buffers to page
      frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
      to MTU-size. However, the merge buffer size does not take into account the
      size of the virtio-net header. Consequently, packets that are MTU-size
      will take two buffers intead of one (to store the virtio-net header),
      substantially decreasing the throughput of MTU-size traffic due to TCP
      window / SKB truesize effects.
      
      This commit changes the mergeable buffer size to include the virtio-net
      header. The buffer size is cacheline-aligned because skb_page_frag_refill
      will not automatically align the requested size.
      
      Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
      between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
      vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
      cpuset, using cgroups to ensure that other processes in the system will not
      be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
      buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
      force MTU-sized packets on the receiver.
      
      next-net trunk before 2613af0e (PAGE_SIZE buf): 3861.08Gb/s
      net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
      net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
      net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s
      Suggested-by: NEric Northup <digitaleric@google.com>
      Signed-off-by: NMichael Dalton <mwdalton@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5061de36
  3. 06 11月, 2013 1 次提交
  4. 05 11月, 2013 1 次提交
    • J
      virtio-net: coalesce rx frags when possible during rx · ba275241
      Jason Wang 提交于
      Commit 2613af0e (virtio_net: migrate mergeable
      rx buffers to page frag allocators) try to increase the payload/truesize for
      MTU-sized traffic. But this will introduce the extra overhead for GSO packets
      received because of the frag list. This commit tries to reduce this issue by
      coalesce the possible rx frags when possible during rx. Test result shows the
      about 15% improvement on full size GSO packet receiving (and even better than
      before commit 2613af0e).
      
      Before this commit:
      ./netperf -H 192.168.100.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
      () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    20303.87
      
      After this commit:
      ./netperf -H 192.168.100.4
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.100.4
      () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    23841.26
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Michael Dalton <mwdalton@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba275241
  5. 30 10月, 2013 1 次提交
    • J
      virtio-net: correctly handle cpu hotplug notifier during resuming · ec9debbd
      Jason Wang 提交于
      commit 3ab098df (virtio-net: don't respond to
      cpu hotplug notifier if we're not ready) tries to bypass the cpu hotplug
      notifier by checking the config_enable and does nothing is it was false. So it
      need to try to hold the config_lock mutex which may happen in atomic
      environment which leads the following warnings:
      
      [  622.944441] CPU0 attaching NULL sched-domain.
      [  622.944446] CPU1 attaching NULL sched-domain.
      [  622.944485] CPU0 attaching NULL sched-domain.
      [  622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
      [  622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
      [  622.950796] no locks held by migration/1/10.
      [  622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
      [  622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  622.950802]  0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
      [  622.950803]  ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
      [  622.950805]  0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
      [  622.950805] Call Trace:
      [  622.950810]  [<ffffffff81a32f22>] dump_stack+0x54/0x74
      [  622.950815]  [<ffffffff810edb02>] __might_sleep+0x112/0x114
      [  622.950817]  [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6
      [  622.950818]  [<ffffffff810e861d>] ? up+0x39/0x3e
      [  622.950821]  [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d
      [  622.950824]  [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62
      [  622.950828]  [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87
      [  622.950830]  [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e
      [  622.950832]  [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10
      [  622.950835]  [<ffffffff810c5556>] __cpu_notify+0x20/0x37
      [  622.950836]  [<ffffffff810c5580>] cpu_notify+0x13/0x15
      [  622.950838]  [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a
      [  622.950841]  [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1
      [  622.950842]  [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f
      [  622.950844]  [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f
      [  622.950847]  [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8
      [  622.950848]  [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47
      [  622.950850]  [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3
      [  622.950852]  [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67
      [  622.950854]  [<ffffffff810e36b7>] kthread+0xd8/0xe0
      [  622.950857]  [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164
      [  622.950859]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
      [  622.950861]  [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0
      [  622.950862]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
      [  622.950876] smpboot: CPU 1 is now offline
      [  623.194556] SMP alternatives: lockdep: fixing up alternatives
      [  623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
      ...
      
      A correct fix is to unregister the hotcpu notifier during restore and register a
      new one in resume.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec9debbd
  6. 29 10月, 2013 1 次提交
  7. 18 10月, 2013 2 次提交
  8. 04 9月, 2013 1 次提交
  9. 28 7月, 2013 1 次提交
  10. 10 7月, 2013 1 次提交
    • M
      virtio_net: fix race in RX VQ processing · cbdadbbf
      Michael S. Tsirkin 提交于
      virtio net called virtqueue_enable_cq on RX path after napi_complete, so
      with NAPI_STATE_SCHED clear - outside the implicit napi lock.
      This violates the requirement to synchronize virtqueue_enable_cq wrt
      virtqueue_add_buf.  In particular, used event can move backwards,
      causing us to lose interrupts.
      In a debug build, this can trigger panic within START_USE.
      
      Jason Wang reports that he can trigger the races artificially,
      by adding udelay() in virtqueue_enable_cb() after virtio_mb().
      
      However, we must call napi_complete to clear NAPI_STATE_SCHED before
      polling the virtqueue for used buffers, otherwise napi_schedule_prep in
      a callback will fail, causing us to lose RX events.
      
      To fix, call virtqueue_enable_cb_prepare with NAPI_STATE_SCHED
      set (under napi lock), later call virtqueue_poll with
      NAPI_STATE_SCHED clear (outside the lock).
      Reported-by: NJason Wang <jasowang@redhat.com>
      Tested-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbdadbbf
  11. 04 7月, 2013 1 次提交
    • J
      virtio-net: fix the race between channels setting and refill · 9b9cd802
      Jason Wang 提交于
      Commit 55257d72 (virtio-net: fill only rx queues
      which are being used) tries to refill on demand when changing the number of
      channels by call try_refill_recv() directly, this may race:
      
      - the refill work who may do the refill in the same time
      - the try_refill_recv() called in bh since napi was not disabled
      
      Which may led guest complain during setting channels:
      
      virtio_net virtio0: input.1:id 0 is not a head!
      
      Solve this issue by scheduling a refill work which can guarantee the
      serialization of refill.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      9b9cd802
  12. 23 5月, 2013 1 次提交
  13. 12 5月, 2013 1 次提交
  14. 29 4月, 2013 1 次提交
    • S
      virtio-net: fill only rx queues which are being used · 55257d72
      Sasha Levin 提交于
      Due to MQ support we may allocate a whole bunch of rx queues but
      never use them. With this patch we'll safe the space used by
      the receive buffers until they are actually in use:
      
      sh-4.2# free -h
                   total       used       free     shared    buffers     cached
      Mem:          490M        35M       455M         0B         0B       4.1M
      -/+ buffers/cache:        31M       459M
      Swap:           0B         0B         0B
      sh-4.2# ethtool -L eth0 combined 8
      sh-4.2# free -h
                   total       used       free     shared    buffers     cached
      Mem:          490M       162M       327M         0B         0B       4.1M
      -/+ buffers/cache:       158M       331M
      Swap:           0B         0B         0B
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      55257d72
  15. 20 4月, 2013 2 次提交
  16. 12 4月, 2013 1 次提交
    • J
      virtio-net: initialize vlan_features · 4fda8302
      Jason Wang 提交于
      There's nothing that prevent passing the device features of virtio_net to its
      vlan device. So this patch simply passes those to vlan device to benefit from
      advanced features.
      
      Netperf shows better sending performance for vlan device since TSO can work on
      vlan now.
      
      before:
      netperf -H 192.168.5.2
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
      port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    4162.35
      
      after:
      netperf -H 192.168.5.2
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
      port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    9365.42
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fda8302
  17. 22 3月, 2013 1 次提交
  18. 20 3月, 2013 2 次提交
  19. 14 2月, 2013 1 次提交
    • P
      net: Fix possible wrong checksum generation. · c9af6db4
      Pravin B Shelar 提交于
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed wrong checksum calculation but it broke TSO by
      defining new GSO type but not a netdev feature for that type.
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without the feature.
      
      Following patch fixes TSO and wrong checksum. This patch uses
      same logic that Eric Dumazet used. Patch introduces new flag
      SKBTX_SHARED_FRAG if at least one frag can be modified by
      the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
      info tx_flags rather than gso_type.
      
      tx_flags is better compared to gso_type since we can have skb with
      shared frag without gso packet. It does not link SHARED_FRAG to
      GSO, So there is no need to define netdev feature for this.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9af6db4
  20. 13 2月, 2013 1 次提交
  21. 05 2月, 2013 1 次提交
  22. 28 1月, 2013 1 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
  23. 27 1月, 2013 3 次提交
  24. 22 1月, 2013 2 次提交
  25. 18 12月, 2012 4 次提交
  26. 11 12月, 2012 1 次提交
  27. 09 12月, 2012 3 次提交
  28. 04 12月, 2012 1 次提交
  29. 10 11月, 2012 1 次提交