1. 06 9月, 2017 2 次提交
  2. 19 8月, 2017 1 次提交
    • E
      tun: handle register_netdevice() failures properly · ff244c6b
      Eric Dumazet 提交于
      syzkaller reported a double free [1], caused by the fact
      that tun driver was not updated properly when priv_destructor
      was added.
      
      When/if register_netdevice() fails, priv_destructor() must have been
      called already.
      
      [1]
      BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
      
      CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x7f/0x260 mm/kasan/report.c:252
       kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333
       kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_set_iff drivers/net/tun.c:1884 [inline]
       __tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x443ff9
      RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9
      RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0
      R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551
       kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627
       kmalloc include/linux/slab.h:493 [inline]
       kzalloc include/linux/slab.h:666 [inline]
       selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012
       security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506
       tun_set_iff drivers/net/tun.c:1839 [inline]
       __tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563
       register_netdevice+0x8d0/0xee0 net/core/dev.c:7605
       tun_set_iff drivers/net/tun.c:1859 [inline]
       __tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801d2843b40
       which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 0 bytes inside of
       32-byte region [ffff8801d2843b40, ffff8801d2843b60)
      The buggy address belongs to the page:
      page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f
      raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
      >ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
                                                 ^
       ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
      
      ==================================================================
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff244c6b
  3. 17 8月, 2017 2 次提交
  4. 14 8月, 2017 2 次提交
    • J
      tap: XDP support · 761876c8
      Jason Wang 提交于
      This patch tries to implement XDP for tun. The implementation was
      split into two parts:
      
      - fast path: small and no gso packet. We try to do XDP at page level
        before build_skb(). For XDP_TX, since creating/destroying queues
        were completely under control of userspace, it was implemented
        through generic XDP helper after skb has been built. This could be
        optimized in the future.
      - slow path: big or gso packet. We try to do it after skb was created
        through generic XDP helpers.
      
      Test were done through pktgen with small packets.
      
      xdp1 test shows ~41.1% improvement:
      
      Before: ~1.7Mpps
      After:  ~2.3Mpps
      
      xdp_redirect to ixgbe shows ~60% improvement:
      
      Before: ~0.8Mpps
      After:  ~1.38Mpps
      Suggested-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      761876c8
    • J
      tap: use build_skb() for small packet · 66ccbc9c
      Jason Wang 提交于
      We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate
      skb in the past. This socket based method is not suitable for high
      speed userspace like virtualization which usually:
      
      - ignore sk_sndbuf (INT_MAX) and expect to receive the packet as fast as
        possible
      - don't want to be block at sendmsg()
      
      To eliminate the above overheads, this patch tries to use build_skb()
      for small packet. We will do this only when the following conditions
      are all met:
      
      - TAP instead of TUN
      - sk_sndbuf is INT_MAX
      - caller don't want to be blocked
      - zerocopy is not used
      - packet size is smaller enough to use build_skb()
      
      Pktgen from guest to host shows ~11% improvement for rx pps of tap:
      
      Before: ~1.70Mpps
      After : ~1.88Mpps
      
      What's more important, this makes it possible to implement XDP for tap
      before creating skbs.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66ccbc9c
  5. 04 8月, 2017 1 次提交
    • W
      sock: enable MSG_ZEROCOPY · 1f8b977a
      Willem de Bruijn 提交于
      Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with
      skb_zerocopy_clone() wherever needed due to skb split, merge, resize
      or clone.
      
      Split skb_orphan_frags into two variants. The split, merge, .. paths
      support reference counted zerocopy buffers, so do not do a deep copy.
      Add skb_orphan_frags_rx for paths that may loop packets to receive
      sockets. That is not allowed, as it may cause unbounded latency.
      Deep copy all zerocopy copy buffers, ref-counted or not, in this path.
      
      The exact locations to modify were chosen by exhaustively searching
      through all code that might modify skb_frag references and/or the
      the SKBTX_DEV_ZEROCOPY tx_flags bit.
      
      The changes err on the safe side, in two ways.
      
      (1) legacy ubuf_info paths virtio and tap are not modified. They keep
          a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags
          still call skb_copy_ubufs and thus copy frags in this case.
      
      (2) not all copies deep in the stack are addressed yet. skb_shift,
          skb_split and skb_try_coalesce can be refined to avoid copying.
          These are not in the hot path and this patch is hairy enough as
          is, so that is left for future refinement.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f8b977a
  6. 25 7月, 2017 1 次提交
  7. 18 7月, 2017 1 次提交
  8. 27 6月, 2017 1 次提交
  9. 08 6月, 2017 1 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
  10. 07 6月, 2017 1 次提交
  11. 18 5月, 2017 2 次提交
  12. 22 3月, 2017 1 次提交
  13. 14 3月, 2017 2 次提交
  14. 10 3月, 2017 1 次提交
  15. 02 3月, 2017 1 次提交
  16. 07 2月, 2017 1 次提交
  17. 21 1月, 2017 1 次提交
  18. 19 1月, 2017 1 次提交
    • J
      tun: rx batching · 5503fcec
      Jason Wang 提交于
      We can only process 1 packet at one time during sendmsg(). This often
      lead bad cache utilization under heavy load. So this patch tries to do
      some batching during rx before submitting them to host network
      stack. This is done through accepting MSG_MORE as a hint from
      sendmsg() caller, if it was set, batch the packet temporarily in a
      linked list and submit them all once MSG_MORE were cleared.
      
      Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:
      
                                       Mpps  -+%
          rx-frames = 0                0.91  +0%
          rx-frames = 4                1.00  +9.8%
          rx-frames = 8                1.00  +9.8%
          rx-frames = 16               1.01  +10.9%
          rx-frames = 32               1.07  +17.5%
          rx-frames = 48               1.07  +17.5%
          rx-frames = 64               1.08  +18.6%
          rx-frames = 64 (no MSG_MORE) 0.91  +0%
      
      User were allowed to change per device batched packets through
      ethtool -C rx-frames. NAPI_POLL_WEIGHT were used as upper limitation
      to prevent bh from being disabled too long.
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5503fcec
  19. 09 1月, 2017 1 次提交
  20. 25 12月, 2016 1 次提交
  21. 07 12月, 2016 1 次提交
  22. 06 12月, 2016 1 次提交
    • A
      [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Al Viro 提交于
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et.al., advancing iterator only in case of successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing iov_iter in failure case is the right thing in
      this case.  Anything that does short read/short write kind of
      stuff (or is in a loop, etc.) is unlikely to be a good one.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
  23. 01 12月, 2016 1 次提交
  24. 25 11月, 2016 1 次提交
  25. 19 11月, 2016 2 次提交
  26. 31 10月, 2016 1 次提交
  27. 30 10月, 2016 1 次提交
  28. 21 10月, 2016 1 次提交
    • J
      net: use core MTU range checking in core net infra · 91572088
      Jarod Wilson 提交于
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straight-forward as others, could use some
        closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straight-forward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously, it's been set to 65535, which
        is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91572088
  29. 24 8月, 2016 1 次提交
    • S
      tun: fix transmit timestamp support · 7b996243
      Soheil Hassas Yeganeh 提交于
      Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
      software transmit timestamp of a packet.
      
      sock_tx_timestamp resets and overrides the tx_flags of the skb.
      The function is intended to be called from within the protocol
      layer when creating the skb, not from a device driver. This is
      inconsistent with other drivers and will cause issues for TCP.
      
      In TCP, we intend to sample the timestamps for the last byte
      for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
      tcp_tx_timestamp only with the last skb that it generates.
      For example, if a 128KB message is split into two 64KB packets
      we want to sample the SND timestamp of the last packet. The current
      code in the tun driver, however, will result in sampling the SND
      timestamp for both packets.
      
      Also, when the last packet is split into smaller packets for
      retranmission (see tcp_fragment), the tun driver will record
      timestamps for all of the retransmitted packets and not only the
      last packet.
      
      Fixes: eda29772 (tun: Support software transmit time stamping.)
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NFrancis Yan <francisyyan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b996243
  30. 21 8月, 2016 2 次提交
  31. 09 7月, 2016 1 次提交
    • C
      tun: Don't assume type tun in tun_device_event · 86dfb4ac
      Craig Gallek 提交于
      The referenced change added a netlink notifier for processing
      device queue size events.  These events are fired for all devices
      but the registered callback assumed they only occurred for tun
      devices.  This fix adds a check (borrowed from macvtap.c) to discard
      non-tun device events.
      
      For reference, this fixes the following splat:
      [   71.505935] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      [   71.513870] IP: [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
      [   71.519906] PGD 3f41f56067 PUD 3f264b7067 PMD 0
      [   71.524497] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
      [   71.529374] gsmi: Log Shutdown Reason 0x03
      [   71.533417] Modules linked in:[   71.533826] mlx4_en: eth1: Link Up
      
      [   71.539616]  bonding w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core
      [   71.549282] CPU: 12 PID: 7915 Comm: set.ixion-haswe Not tainted 4.7.0-dbx-DEV #8
      [   71.556586] Hardware name: Intel Grantley,Wellsburg/Ixion_IT_15, BIOS 2.58.0 05/03/2016
      [   71.564495] task: ffff887f00bb20c0 ti: ffff887f00798000 task.ti: ffff887f00798000
      [   71.571894] RIP: 0010:[<ffffffff8153c1a0>]  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
      [   71.580327] RSP: 0018:ffff887f0079bbd8  EFLAGS: 00010202
      [   71.585576] RAX: fffffffffffffae8 RBX: ffff887ef6d03378 RCX: 0000000000000000
      [   71.592624] RDX: 0000000000000000 RSI: 0000000000000028 RDI: 0000000000000000
      [   71.599675] RBP: ffff887f0079bc48 R08: 0000000000000000 R09: 0000000000000001
      [   71.606730] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000010
      [   71.613780] R13: 0000000000000000 R14: 0000000000000001 R15: ffff887f0079bd00
      [   71.620832] FS:  00007f5cdc581700(0000) GS:ffff883f7f700000(0000) knlGS:0000000000000000
      [   71.628826] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   71.634500] CR2: 0000000000000010 CR3: 0000003f3eb62000 CR4: 00000000001406e0
      [   71.641549] Stack:
      [   71.643533]  ffff887f0079bc08 0000000000000246 000000000000001e ffff887ef6d00000
      [   71.650871]  ffff887f0079bd00 0000000000000000 0000000000000000 ffffffff00000000
      [   71.658210]  ffff887f0079bc48 ffffffff81d24070 00000000fffffff9 ffffffff81cec7a0
      [   71.665549] Call Trace:
      [   71.667975]  [<ffffffff810eeb0d>] notifier_call_chain+0x5d/0x80
      [   71.673823]  [<ffffffff816365d0>] ? show_tx_maxrate+0x30/0x30
      [   71.679502]  [<ffffffff810eeb3e>] __raw_notifier_call_chain+0xe/0x10
      [   71.685778]  [<ffffffff810eeb56>] raw_notifier_call_chain+0x16/0x20
      [   71.691976]  [<ffffffff8160eb30>] call_netdevice_notifiers_info+0x40/0x70
      [   71.698681]  [<ffffffff8160ec36>] call_netdevice_notifiers+0x16/0x20
      [   71.704956]  [<ffffffff81636636>] change_tx_queue_len+0x66/0x90
      [   71.710807]  [<ffffffff816381ef>] netdev_store.isra.5+0xbf/0xd0
      [   71.716658]  [<ffffffff81638350>] tx_queue_len_store+0x50/0x60
      [   71.722431]  [<ffffffff814a6798>] dev_attr_store+0x18/0x30
      [   71.727857]  [<ffffffff812ea3ff>] sysfs_kf_write+0x4f/0x70
      [   71.733274]  [<ffffffff812e9507>] kernfs_fop_write+0x147/0x1d0
      [   71.739045]  [<ffffffff81134a4f>] ? rcu_read_lock_sched_held+0x8f/0xa0
      [   71.745499]  [<ffffffff8125a108>] __vfs_write+0x28/0x120
      [   71.750748]  [<ffffffff8111b137>] ? percpu_down_read+0x57/0x90
      [   71.756516]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
      [   71.762278]  [<ffffffff8125d7d8>] ? __sb_start_write+0xc8/0xe0
      [   71.768038]  [<ffffffff8125bd5e>] vfs_write+0xbe/0x1b0
      [   71.773113]  [<ffffffff8125c092>] SyS_write+0x52/0xa0
      [   71.778110]  [<ffffffff817528e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [   71.784472] Code: 45 31 f6 48 8b 93 78 33 00 00 48 81 c3 78 33 00 00 48 39 d3 48 8d 82 e8 fa ff ff 74 25 48 8d b0 40 05 00 00 49 63 d6 41 83 c6 01 <49> 89 34 d4 48 8b 90 18 05 00 00 48 39 d3 48 8d 82 e8 fa ff ff
      [   71.803655] RIP  [<ffffffff8153c1a0>] tun_device_event+0x110/0x340
      [   71.809769]  RSP <ffff887f0079bbd8>
      [   71.813213] CR2: 0000000000000010
      [   71.816512] ---[ end trace 4db6449606319f73 ]---
      
      Fixes: 1576d986 ("tun: switch to use skb array for tx")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86dfb4ac
  32. 05 7月, 2016 1 次提交
  33. 01 7月, 2016 1 次提交
    • J
      tun: switch to use skb array for tx · 1576d986
      Jason Wang 提交于
      We used to queue tx packets in sk_receive_queue, this is less
      efficient since it requires spinlocks to synchronize between producer
      and consumer.
      
      This patch tries to address this by:
      
      - switch from sk_receive_queue to a skb_array, and resize it when
        tx_queue_len was changed.
      - introduce a new proto_ops peek_len which was used for peeking the
        skb length.
      - implement a tun version of peek_len for vhost_net to use and convert
        vhost_net to use peek_len if possible.
      
      Pktgen test shows about 15.3% improvement on guest receiving pps for small
      buffers:
      
      Before: ~1300000pps
      After : ~1500000pps
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1576d986