1. 29 11月, 2013 1 次提交
  2. 15 11月, 2013 1 次提交
    • J
      macvtap: limit head length of skb allocated · 16a3fa28
      Jason Wang 提交于
      We currently use hdr_len as a hint of head length which is advertised by
      guest. But when guest advertise a very big value, it can lead to an 64K+
      allocating of kmalloc() which has a very high possibility of failure when host
      memory is fragmented or under heavy stress. The huge hdr_len also reduce the
      effect of zerocopy or even disable if a gso skb is linearized in guest.
      
      To solves those issues, this patch introduces an upper limit (PAGE_SIZE) of the
      head, which guarantees an order 0 allocation each time.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16a3fa28
  3. 21 8月, 2013 3 次提交
  4. 12 8月, 2013 1 次提交
    • E
      macvtap: fix two races · 29d79196
      Eric Dumazet 提交于
      Since commit ac4e4af1 ("macvtap: Consistently use rcu functions"),
      Thomas gets two different warnings :
      
      BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45891/45892
      caller is macvtap_do_read+0x45c/0x600 [macvtap]
      CPU: 1 PID: 45892 Comm: vhost-45891 Not tainted 3.11.0-bisecttest #13
      Call Trace:
      ([<00000000001126ee>] show_trace+0x126/0x144)
       [<00000000001127d2>] show_stack+0xc6/0xd4
       [<000000000068bcec>] dump_stack+0x74/0xd8
       [<0000000000481066>] debug_smp_processor_id+0xf6/0x114
       [<000003ff802e9a18>] macvtap_do_read+0x45c/0x600 [macvtap]
       [<000003ff802e9c1c>] macvtap_recvmsg+0x60/0x88 [macvtap]
       [<000003ff80318c5e>] handle_rx+0x5b2/0x800 [vhost_net]
       [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
       [<000000000015f3ac>] kthread+0xd8/0xe4
       [<00000000006934a6>] kernel_thread_starter+0x6/0xc
       [<00000000006934a0>] kernel_thread_starter+0x0/0xc
      
      And
      
      BUG: using smp_processor_id() in preemptible [00000000] code: vhost-45897/45898
      caller is macvlan_start_xmit+0x10a/0x1b4 [macvlan]
      CPU: 1 PID: 45898 Comm: vhost-45897 Not tainted 3.11.0-bisecttest #16
      Call Trace:
      ([<00000000001126ee>] show_trace+0x126/0x144)
       [<00000000001127d2>] show_stack+0xc6/0xd4
       [<000000000068bdb8>] dump_stack+0x74/0xd4
       [<0000000000481132>] debug_smp_processor_id+0xf6/0x114
       [<000003ff802b72ca>] macvlan_start_xmit+0x10a/0x1b4 [macvlan]
       [<000003ff802ea69a>] macvtap_get_user+0x982/0xbc4 [macvtap]
       [<000003ff802ea92a>] macvtap_sendmsg+0x4e/0x60 [macvtap]
       [<000003ff8031947c>] handle_tx+0x494/0x5ec [vhost_net]
       [<000003ff8028f77c>] vhost_worker+0x15c/0x1c4 [vhost]
       [<000000000015f3ac>] kthread+0xd8/0xe4
       [<000000000069356e>] kernel_thread_starter+0x6/0xc
       [<0000000000693568>] kernel_thread_starter+0x0/0xc
      2 locks held by vhost-45897/45898:
       #0:  (&vq->mutex){+.+.+.}, at: [<000003ff8031903c>] handle_tx+0x54/0x5ec [vhost_net]
       #1:  (rcu_read_lock){.+.+..}, at: [<000003ff802ea53c>] macvtap_get_user+0x824/0xbc4 [macvtap]
      
      In the first case, macvtap_put_user() calls macvlan_count_rx()
      in a preempt-able context, and this is not allowed.
      
      In the second case, macvtap_get_user() calls
      macvlan_start_xmit() with BH enabled, and this is not allowed.
      Reported-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Bisected-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NThomas Huth <thuth@linux.vnet.ibm.com>
      Cc: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29d79196
  5. 10 8月, 2013 1 次提交
    • E
      net: attempt high order allocations in sock_alloc_send_pskb() · 28d64271
      Eric Dumazet 提交于
      Adding paged frags skbs to af_unix sockets introduced a performance
      regression on large sends because of additional page allocations, even
      if each skb could carry at least 100% more payload than before.
      
      We can instruct sock_alloc_send_pskb() to attempt high order
      allocations.
      
      Most of the time, it does a single page allocation instead of 8.
      
      I added an additional parameter to sock_alloc_send_pskb() to
      let other users to opt-in for this new feature on followup patches.
      
      Tested:
      
      Before patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    46861.15
      
      After patch :
      
      $ netperf -t STREAM_STREAM
      STREAM STREAM TEST
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       2304  212992  212992    10.00    57981.11
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28d64271
  6. 08 8月, 2013 2 次提交
  7. 19 7月, 2013 1 次提交
    • J
      macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS · ece793fc
      Jason Wang 提交于
      We try to linearize part of the skb when the number of iov is greater than
      MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
      one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
      network.
      
      Solve this problem by calculate the pages needed for iov before trying to do
      zerocopy and switch to use copy instead of zerocopy if it needs more than
      MAX_SKB_FRAGS.
      
      This is done through introducing a new helper to count the pages for iov, and
      call uarg->callback() manually when switching from zerocopy to copy to notify
      vhost.
      
      We can do further optimization on top.
      
      This bug were introduced from b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ece793fc
  8. 17 7月, 2013 2 次提交
  9. 11 7月, 2013 1 次提交
    • J
      macvtap: correctly linearize skb when zerocopy is used · 61d46bf9
      Jason Wang 提交于
      Userspace may produce vectors greater than MAX_SKB_FRAGS. When we try to
      linearize parts of the skb to let the rest of iov to be fit in
      the frags, we need count copylen into linear when calling macvtap_alloc_skb()
      instead of partly counting it into data_len. Since this breaks
      zerocopy_sg_from_iovec() since its inner counter assumes nr_frags should
      be zero at beginning. This cause nr_frags to be increased wrongly without
      setting the correct frags.
      
      This bug were introduced from b92946e2
      (macvtap: zerocopy: validate vectors before building skb).
      
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61d46bf9
  10. 26 6月, 2013 5 次提交
  11. 13 6月, 2013 2 次提交
    • J
      macvtap: fix uninitialized return value macvtap_ioctl_set_queue() · f57855a5
      Jason Wang 提交于
      Return -EINVAL on illegal flag instead of uninitialized value. This fixes the
      kbuild test warning.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f57855a5
    • J
      macvtap: slient sparse warnings · d9a90a31
      Jason Wang 提交于
      This patch silents the following sparse warnings:
      
      drivers/net/macvtap.c:98:9: warning: incorrect type in assignment (different
      address spaces)
      drivers/net/macvtap.c:98:9:    expected struct macvtap_queue *<noident>
      drivers/net/macvtap.c:98:9:    got struct macvtap_queue [noderef]
      <asn:4>*<noident>
      drivers/net/macvtap.c:120:9: warning: incorrect type in assignment (different
      address spaces)
      drivers/net/macvtap.c:120:9:    expected struct macvtap_queue *<noident>
      drivers/net/macvtap.c:120:9:    got struct macvtap_queue [noderef]
      <asn:4>*<noident>
      drivers/net/macvtap.c:151:22: error: incompatible types in comparison expression
      (different address spaces)
      drivers/net/macvtap.c:233:23: error: incompatible types in comparison expression
      (different address spaces)
      drivers/net/macvtap.c:243:23: error: incompatible types in comparison expression
      (different address spaces)
      drivers/net/macvtap.c:247:15: error: incompatible types in comparison expression
      (different address spaces)
        CC [M]  drivers/net/macvtap.o
      drivers/net/macvlan.c:232:24: error: incompatible types in comparison expression
      (different address spaces)
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9a90a31
  12. 08 6月, 2013 6 次提交
  13. 29 5月, 2013 1 次提交
  14. 28 3月, 2013 1 次提交
  15. 27 3月, 2013 1 次提交
  16. 28 2月, 2013 1 次提交
  17. 14 2月, 2013 1 次提交
    • P
      net: Fix possible wrong checksum generation. · c9af6db4
      Pravin B Shelar 提交于
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed wrong checksum calculation but it broke TSO by
      defining new GSO type but not a netdev feature for that type.
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without the feature.
      
      Following patch fixes TSO and wrong checksum. This patch uses
      same logic that Eric Dumazet used. Patch introduces new flag
      SKBTX_SHARED_FRAG if at least one frag can be modified by
      the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
      info tx_flags rather than gso_type.
      
      tx_flags is better compared to gso_type since we can have skb with
      shared frag without gso packet. It does not link SHARED_FRAG to
      GSO, So there is no need to define netdev feature for this.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9af6db4
  18. 28 1月, 2013 1 次提交
    • E
      net: fix possible wrong checksum generation · cef401de
      Eric Dumazet 提交于
      Pravin Shelar mentioned that GSO could potentially generate
      wrong TX checksum if skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested to linearize skbs but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the SENDFILE is impacted by the extra allocation and
      copy, and because we use order-0 pages, while the TCP_STREAM uses
      bigger pages.
      Reported-by: NPravin Shelar <pshelar@nicira.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cef401de
  19. 13 8月, 2012 1 次提交
  20. 08 6月, 2012 1 次提交
  21. 12 5月, 2012 1 次提交
  22. 02 5月, 2012 5 次提交