1. 12 6月, 2018 1 次提交
  2. 31 5月, 2018 1 次提交
  3. 25 5月, 2018 1 次提交
  4. 11 4月, 2018 3 次提交
  5. 09 4月, 2018 1 次提交
    • H
      vhost-net: set packet weight of tx polling to 2 * vq size · a2ac9990
      haibinzhang(张海斌) 提交于
      handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
      polling udp packets with small length(e.g. 1byte udp payload), because setting
      VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
      
      Ping-Latencies shown below were tested between two Virtual Machines using
      netperf (UDP_STREAM, len=1), and then another machine pinged the client:
      
      vq size=256
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           3.319   18.489    57.303
      64               1.643    2.021     2.552
      128              1.825    2.600     3.224
      256              1.997    2.710     4.295
      512              1.860    3.171     4.631
      1024             2.002    4.173     9.056
      2048             2.257    5.650     9.688
      4096             2.093    8.508    15.943
      
      vq size=512
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           6.537   29.177    66.245
      64               2.798    3.614     4.403
      128              2.861    3.820     4.775
      256              3.008    4.018     4.807
      512              3.254    4.523     5.824
      1024             3.079    5.335     7.747
      2048             3.944    8.201    12.762
      4096             4.158   11.057    19.985
      
      Seems pretty consistent, a small dip at 2 VQ sizes.
      Ring size is a hint from device about a burst size it can tolerate. Based on
      benchmarks, set the weight to 2 * vq size.
      
      To evaluate this change, another tests were done using netperf(RR, TX) between
      two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
      tweaked through qemu. Results shown below does not show obvious changes.
      
      vq size=256 TCP_RR                vq size=512 TCP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -7%/        -2%      1/       1/   0%/        -2%
         1/       4/  +1%/         0%      1/       4/  +1%/         0%
         1/       8/  +1%/        -2%      1/       8/   0%/        +1%
        64/       1/  -6%/         0%     64/       1/  +7%/        +3%
        64/       4/   0%/        +2%     64/       4/  -1%/        +1%
        64/       8/   0%/         0%     64/       8/  -1%/        -2%
       256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
       256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
       256/       8/  +2%/         0%    256/       8/  +1%/        -1%
      
      vq size=256 UDP_RR                vq size=512 UDP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
         1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
         1/       8/  -1%/        -1%      1/       8/  -1%/         0%
        64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
        64/       4/  -5%/        -1%     64/       4/  +2%/         0%
        64/       8/   0%/        -1%     64/       8/  -2%/        +1%
       256/       1/  +7%/        +1%    256/       1/  -7%/         0%
       256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
       256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
      
      vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
        64/       1/   0%/        -3%     64/       1/   0%/         0%
        64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
        64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
       256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
       256/       4/  -1%/        -1%    256/       4/  -3%/         0%
       256/       8/  +7%/        +5%    256/       8/  -3%/         0%
       512/       1/  +1%/         0%    512/       1/  -1%/        -1%
       512/       4/  +1%/        -1%    512/       4/   0%/         0%
       512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
      1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
      1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
      1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
      2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
      2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
      2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
      4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
      4096/       4/  +2%/         0%   4096/       4/   0%/         0%
      4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NHaibin Zhang <haibinzhang@tencent.com>
      Signed-off-by: NYunfang Tai <yunfangtai@tencent.com>
      Signed-off-by: NLidong Chen <lidongchen@tencent.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2ac9990
  6. 30 3月, 2018 1 次提交
  7. 28 3月, 2018 1 次提交
  8. 27 3月, 2018 1 次提交
  9. 20 3月, 2018 2 次提交
  10. 10 3月, 2018 4 次提交
    • J
      vhost_net: examine pointer types during un-producing · 3a403076
      Jason Wang 提交于
      After commit fc72d1d5 ("tuntap: XDP transmission"), we can
      actually queueing XDP pointers in the pointer ring, so we should
      examine the pointer type before freeing the pointer.
      
      Fixes: fc72d1d5 ("tuntap: XDP transmission")
      Reported-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a403076
    • J
      vhost_net: keep private_data and rx_ring synced · 303fd71b
      Jason Wang 提交于
      We get pointer ring from the exported sock, this means we should keep
      rx_ring and vq->private synced during both vq stop and backend set,
      otherwise we may see stale rx_ring.
      
      Fixes: c67df11f ("vhost_net: try batch dequing from skb array")
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303fd71b
    • A
      vhost_net: initialize rx_ring in vhost_net_open() · ab7e34b3
      Alexander Potapenko 提交于
      KMSAN reported a use of uninit memory in vhost_net_buf_unproduce()
      while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring:
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vho
      et.c:170
      CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170
       vhost_net_stop_vq drivers/vhost/net.c:974 [inline]
       vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982
       vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015
       __fput+0x49f/0xa00 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:191 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:166 [inline]
       prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196
       syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265
       do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292
      ...
      origin:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
       kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392
       kmalloc_large_node_hook mm/slub.c:1366 [inline]
       kmalloc_large_node mm/slub.c:3808 [inline]
       __kmalloc_node+0x100e/0x1290 mm/slub.c:3818
       kmalloc_node include/linux/slab.h:554 [inline]
       kvmalloc_node+0x1a5/0x2e0 mm/util.c:419
       kvmalloc include/linux/mm.h:541 [inline]
       vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921
       misc_open+0x7b5/0x8b0 drivers/char/misc.c:154
       chrdev_open+0xc28/0xd90 fs/char_dev.c:417
       do_dentry_open+0xccb/0x1430 fs/open.c:752
       vfs_open+0x272/0x2e0 fs/open.c:866
       do_last fs/namei.c:3378 [inline]
       path_openat+0x49ad/0x6580 fs/namei.c:3519
       do_filp_open+0x267/0x640 fs/namei.c:3553
       do_sys_open+0x6ad/0x9c0 fs/open.c:1059
       SYSC_openat+0xc7/0xe0 fs/open.c:1086
       SyS_openat+0x63/0x90 fs/open.c:1080
       do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287
      ==================================================================
      
      Fixes: c67df11f ("vhost_net: try batch dequing from skb array")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab7e34b3
    • V
      drivers: vhost: vsock: fixed a brace coding style issue · ff3c1b1a
      Vaibhav Murkute 提交于
      Fixed a coding style issue.
      Signed-off-by: NVaibhav Murkute <vaibhavmurkute88@gmail.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff3c1b1a
  11. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  12. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  13. 01 2月, 2018 5 次提交
  14. 31 1月, 2018 1 次提交
  15. 30 1月, 2018 1 次提交
  16. 25 1月, 2018 2 次提交
  17. 11 1月, 2018 1 次提交
    • J
      vhost_net: batch used ring update in rx · e2b3b35e
      Jason Wang 提交于
      This patch tries to batched used ring update during RX. This is pretty
      fit for the case when guest is much faster (e.g dpdk based
      backend). In this case, used ring is almost empty:
      
      - we may get serious cache line misses/contending on both used ring
        and used idx.
      - at most 1 packet could be dequeued at one time, batching in guest
        does not make much effect.
      
      Update used ring in a batch can help since guest won't access the used
      ring until used idx was advanced for several descriptors and since we
      advance used ring for every N packets, guest will only need to access
      used idx for every N packet since it can cache the used idx. To have a
      better interaction for both batch dequeuing and dpdk batching,
      VHOST_RX_BATCH was used as the maximum number of descriptors that
      could be batched.
      
      Test were done between two machines with 2.40GHz Intel(R) Xeon(R) CPU
      E5-2630 connected back to back through ixgbe. Traffic were generated
      on one remote ixgbe through MoonGen and measure the RX pps through
      testpmd in guest when do xdp_redirect_map from local ixgbe to
      tap. RX pps were increased from 3.05 Mpps to 4.00 Mpps (about 31%
      improvement).
      
      One possible concern for this is the implications for TCP (especially
      latency sensitive workload). Result[1] does not show obvious changes
      for most of the netperf test (RR, TX, and RX). And we do get some
      improvements for RX on some specific size.
      
      Guest RX:
      
      size/sessions/+thu%/+normalize%
         64/     1/   +2%/   +2%
         64/     2/   +2%/   -1%
         64/     4/   +1%/   +1%
         64/     8/    0%/    0%
        256/     1/   +6%/   -3%
        256/     2/   -3%/   +2%
        256/     4/  +11%/  +11%
        256/     8/    0%/    0%
        512/     1/   +4%/    0%
        512/     2/   +2%/   +2%
        512/     4/    0%/   -1%
        512/     8/   -8%/   -8%
       1024/     1/   -7%/  -17%
       1024/     2/   -8%/   -7%
       1024/     4/   +1%/    0%
       1024/     8/    0%/    0%
       2048/     1/  +30%/  +14%
       2048/     2/  +46%/  +40%
       2048/     4/    0%/    0%
       2048/     8/    0%/    0%
       4096/     1/  +23%/  +22%
       4096/     2/  +26%/  +23%
       4096/     4/    0%/   +1%
       4096/     8/    0%/    0%
      16384/     1/   -2%/   -3%
      16384/     2/   +1%/   -4%
      16384/     4/   -1%/   -3%
      16384/     8/    0%/   -1%
      65535/     1/  +15%/   +7%
      65535/     2/   +4%/   +7%
      65535/     4/    0%/   +1%
      65535/     8/    0%/    0%
      
      TCP_RR:
      
      size/sessions/+thu%/+normalize%
          1/     1/    0%/   +1%
          1/    25/   +2%/   +1%
          1/    50/   +4%/   +1%
         64/     1/    0%/   -4%
         64/    25/   +2%/   +1%
         64/    50/    0%/   -1%
        256/     1/    0%/    0%
        256/    25/    0%/    0%
        256/    50/   +4%/   +2%
      
      Guest TX:
      
      size/sessions/+thu%/+normalize%
         64/     1/   +4%/   -2%
         64/     2/   -6%/   -5%
         64/     4/   +3%/   +6%
         64/     8/    0%/   +3%
        256/     1/  +15%/  +16%
        256/     2/  +11%/  +12%
        256/     4/   +1%/    0%
        256/     8/   +5%/   +5%
        512/     1/   -1%/   -6%
        512/     2/    0%/   -8%
        512/     4/   -2%/   +4%
        512/     8/   +6%/   +9%
       1024/     1/   +3%/   +1%
       1024/     2/   +3%/   +9%
       1024/     4/    0%/   +7%
       1024/     8/    0%/   +7%
       2048/     1/   +8%/   +2%
       2048/     2/   +3%/   -1%
       2048/     4/   -1%/  +11%
       2048/     8/   +3%/   +9%
       4096/     1/   +8%/   +8%
       4096/     2/    0%/   -7%
       4096/     4/   +4%/   +4%
       4096/     8/   +2%/   +5%
      16384/     1/   -3%/   +1%
      16384/     2/   -1%/  -12%
      16384/     4/   -1%/   +5%
      16384/     8/    0%/   +1%
      65535/     1/    0%/   -3%
      65535/     2/   +5%/  +16%
      65535/     4/   +1%/   +2%
      65535/     8/   +1%/   -1%
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2b3b35e
  18. 09 1月, 2018 2 次提交
    • J
      tuntap: XDP transmission · fc72d1d5
      Jason Wang 提交于
      This patch implements XDP transmission for TAP. Since we can't create
      new queues for TAP during XDP set, exist ptr_ring was reused for
      queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
      (0x1UL) was encoded into lowest bit of xpd_buff pointer during
      ptr_ring_produce, and was decoded during consuming. XDP metadata was
      stored in the headroom of the packet which should work in most of
      cases since driver usually reserve enough headroom. Very minor changes
      were done for vhost_net: it just need to peek the length depends on
      the type of pointer.
      
      Tests were done on two Intel E5-2630 2.40GHz machines connected back
      to back through two 82599ES. Traffic were generated/received through
      MoonGen/testpmd(rxonly). It reports ~20% improvements when
      xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps
      to 3.05Mpps)
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc72d1d5
    • J
      tun/tap: use ptr_ring instead of skb_array · 5990a305
      Jason Wang 提交于
      This patch switches to use ptr_ring instead of skb_array. This will be
      used to enqueue different types of pointers by encoding type into
      lower bits.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5990a305
  19. 06 12月, 2017 1 次提交
  20. 03 12月, 2017 1 次提交
  21. 29 11月, 2017 1 次提交
  22. 28 11月, 2017 3 次提交
  23. 15 11月, 2017 4 次提交