1. 02 Jun, 2020 1 commit
    • tun: correct header offsets in napi frags mode · 96aa1b22
      Committed by Willem de Bruijn
      Tun in IFF_NAPI_FRAGS mode calls napi_gro_frags. Unlike netif_rx and
      netif_gro_receive, this expects skb->data to point to the mac layer.
      
      But skb_probe_transport_header, __skb_get_hash_symmetric, and
      xdp_do_generic in tun_get_user need skb->data to point to the network
      header. Flow dissection also needs skb->protocol set, so
      eth_type_trans has to be called.
      
      Ensure the link layer header lies in linear as eth_type_trans pulls
      ETH_HLEN. Then take the same code paths for frags as for not frags.
      Push the link layer header back just before calling napi_gro_frags.
      
      By pulling up to ETH_HLEN from frag0 into linear, this disables the
      frag0 optimization in the special case when IFF_NAPI_FRAGS is used
      with zero length iov[0] (and thus empty skb->linear).
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
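The pull-then-push sequence described in this commit can be modelled in user space. This is only a sketch under stated assumptions: `struct toy_skb`, `toy_pull()` and `toy_push()` are hypothetical stand-ins for the kernel's skb and its pull/push helpers, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14   /* Ethernet header length, same value the kernel uses */

/* Hypothetical stand-in for the linear area of an skb. */
struct toy_skb {
    unsigned char buf[64];
    unsigned char *data;   /* current start of packet data */
    size_t len;            /* bytes from data to the end of the packet */
};

/* Advance data past n bytes (models pulling the link layer header). */
static unsigned char *toy_pull(struct toy_skb *skb, size_t n)
{
    assert(n <= skb->len);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}

/* Move data back by n bytes (models pushing the header back). */
static unsigned char *toy_push(struct toy_skb *skb, size_t n)
{
    assert((size_t)(skb->data - skb->buf) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Returns 1 if data points at the mac header again after the round trip. */
static int toy_frags_round_trip(void)
{
    struct toy_skb skb = { .len = 60 };

    skb.data = skb.buf;          /* data at the mac header */
    toy_pull(&skb, ETH_HLEN);    /* data now at the network header */
    /* header probing, hashing and generic XDP would run at this point */
    toy_push(&skb, ETH_HLEN);    /* restore before handing over to GRO */
    return skb.data == skb.buf && skb.len == 60;
}
```

Calling `toy_frags_round_trip()` checks that the data pointer and length are back where they started once the header has been pushed back.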
  2. 15 May, 2020 1 commit
  3. 13 Apr, 2020 1 commit
  4. 07 Apr, 2020 1 commit
  5. 07 Mar, 2020 1 commit
  6. 06 Mar, 2020 5 commits
  7. 24 Feb, 2020 1 commit
  8. 23 Jan, 2020 1 commit
    • tun: add mutex_unlock() call and napi.skb clearing in tun_get_user() · 1efba987
      Committed by Eric Dumazet
      If both IFF_NAPI_FRAGS mode and XDP are enabled, and the XDP program
      consumes the skb, we need to clear the napi.skb (or risk
      a use-after-free) and release the mutex (or risk a deadlock)
      
      WARNING: lock held when returning to user space!
      5.5.0-rc6-syzkaller #0 Not tainted
      ------------------------------------------------
      syz-executor.0/455 is leaving the kernel with locks still held!
      1 lock held by syz-executor.0/455:
       #0: ffff888098f6e748 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x1604/0x3fc0 drivers/net/tun.c:1835
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Petar Penkov <ppenkov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
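The two hazards named in this commit (a stale cached pointer and a lock still held on return) can be illustrated with a small user-space model. This is a sketch, not the driver's code: `struct toy_napi`, the spinning `atomic_flag` used in place of the mutex, and `toy_get_user()` are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-file NAPI state (lock plus cached skb). */
struct toy_napi {
    atomic_flag lock;   /* spinning stand-in for the mutex */
    void *skb;          /* cached packet pointer */
};

static void toy_lock(struct toy_napi *n)
{
    while (atomic_flag_test_and_set(&n->lock))
        ;   /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_napi *n)
{
    atomic_flag_clear(&n->lock);
}

/* Returns 0 if the packet was consumed early, 1 on the normal path,
 * -1 on allocation failure. Every exit clears the pointer and unlocks. */
static int toy_get_user(struct toy_napi *n, int consumed_early)
{
    assert(n != NULL);
    toy_lock(n);
    n->skb = malloc(32);
    if (!n->skb) {
        toy_unlock(n);
        return -1;
    }
    if (consumed_early) {
        free(n->skb);     /* models the consumer freeing the packet */
        n->skb = NULL;    /* no dangling pointer left behind */
        toy_unlock(n);    /* no lock held when returning */
        return 0;
    }
    free(n->skb);         /* normal path: packet handed off and released */
    n->skb = NULL;
    toy_unlock(n);
    return 1;
}

/* Returns 1 if the early-exit path left no stale pointer and no held lock. */
static int toy_consumed_path_is_clean(void)
{
    struct toy_napi n = { ATOMIC_FLAG_INIT, NULL };
    int ret = toy_get_user(&n, 1);
    int lock_was_free = !atomic_flag_test_and_set(&n.lock);

    return ret == 0 && n.skb == NULL && lock_was_free;
}
```

The point of the model is that the early-return branch needs the same cleanup as the normal path; omitting either line in that branch reproduces one of the two failure modes.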
  9. 17 Jan, 2020 1 commit
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Committed by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to have the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
  10. 16 Nov, 2019 1 commit
    • tun: fix data-race in gro_normal_list() · c39e342a
      Committed by Petar Penkov
      There is a race in the TUN driver between napi_busy_loop and
      napi_gro_frags. This commit resolves the race by adding the NAPI struct
      via netif_tx_napi_add instead of netif_napi_add; the former disables
      polling for the NAPI struct.
      
      KCSAN reported:
      BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop
      
      write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0:
       gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682
       gro_normal_list net/core/dev.c:5678 [inline]
       gro_normal_one net/core/dev.c:5692 [inline]
       napi_frags_finish net/core/dev.c:5705 [inline]
       napi_gro_frags+0x625/0x770 net/core/dev.c:5778
       tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976
       tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
       call_write_iter include/linux/fs.h:1895 [inline]
       do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x13b/0x3c0 fs/read_write.c:951
       vfs_writev+0x118/0x1c0 fs/read_write.c:1015
       do_writev+0xe3/0x250 fs/read_write.c:1058
       __do_sys_writev fs/read_write.c:1131 [inline]
       __se_sys_writev fs/read_write.c:1128 [inline]
       __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1:
       gro_normal_list net/core/dev.c:5678 [inline]
       napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126
       sk_busy_loop include/net/busy_poll.h:108 [inline]
       __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689
       udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 94317099 ("tun: enable NAPI for TUN/TAP driver")
      Signed-off-by: Petar Penkov <ppenkov@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 08 Nov, 2019 1 commit
    • tun: switch to u64_stats_t · 5260dd3e
      Committed by Eric Dumazet
      In order to fix this data-race found by KCSAN [1],
      switch to u64_stats_t helpers. They provide all
      the needed annotations, without adding extra cost.
      
      [1]
      BUG: KCSAN: data-race in tun_get_user / tun_net_get_stats64
      
      read to 0xffffe8ffffd8aca8 of 8 bytes by task 4882 on cpu 0:
       tun_net_get_stats64+0x9b/0x230 drivers/net/tun.c:1171
       dev_get_stats+0x89/0x1e0 net/core/dev.c:9103
       rtnl_fill_stats+0x56/0x370 net/core/rtnetlink.c:1177
       rtnl_fill_ifinfo+0xd3b/0x2100 net/core/rtnetlink.c:1667
       rtmsg_ifinfo_build_skb+0xb0/0x150 net/core/rtnetlink.c:3472
       rtmsg_ifinfo_event.part.0+0x4e/0xb0 net/core/rtnetlink.c:3504
       rtmsg_ifinfo_event net/core/rtnetlink.c:3515 [inline]
       rtmsg_ifinfo+0x85/0x90 net/core/rtnetlink.c:3513
       __dev_notify_flags+0x18b/0x200 net/core/dev.c:7649
       dev_change_flags+0xb8/0xe0 net/core/dev.c:7691
       dev_ifsioc+0x201/0x6a0 net/core/dev_ioctl.c:237
       dev_ioctl+0x149/0x660 net/core/dev_ioctl.c:489
       sock_do_ioctl+0xdb/0x230 net/socket.c:1061
       sock_ioctl+0x3a3/0x5e0 net/socket.c:1189
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0x991/0xc60 fs/ioctl.c:696
      
      write to 0xffffe8ffffd8aca8 of 8 bytes by task 4883 on cpu 1:
       tun_get_user+0x1d94/0x2ba0 drivers/net/tun.c:2002
       tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
       call_write_iter include/linux/fs.h:1895 [inline]
       new_sync_write+0x388/0x4a0 fs/read_write.c:483
       __vfs_write+0xb1/0xc0 fs/read_write.c:496
       __kernel_write+0xb8/0x240 fs/read_write.c:515
       write_pipe_buf+0xb6/0xf0 fs/splice.c:794
       splice_from_pipe_feed fs/splice.c:500 [inline]
       __splice_from_pipe+0x248/0x480 fs/splice.c:624
       splice_from_pipe+0xbb/0x100 fs/splice.c:659
       default_file_splice_write+0x45/0x90 fs/splice.c:806
       do_splice_from fs/splice.c:848 [inline]
       direct_splice_actor+0xa0/0xc0 fs/splice.c:1020
       splice_direct_to_actor+0x215/0x510 fs/splice.c:975
       do_splice_direct+0x161/0x1e0 fs/splice.c:1063
       do_sendfile+0x384/0x7f0 fs/read_write.c:1464
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 4883 Comm: syz-executor.1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 10 Oct, 2019 1 commit
  13. 09 Oct, 2019 2 commits
    • Revert "tun: call dev_get_valid_name() before register_netdevice()" · bacb7e18
      Committed by Eric Dumazet
      This reverts commit 0ad646c8.
      
      As noticed by Jakub, this is no longer needed after
      commit 11fc7d5a ("tun: fix memory leak in error path")
      
      This no longer exports dev_get_valid_name() for the exclusive
      use of tun driver.
      Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    • tun: fix memory leak in error path · 11fc7d5a
      Committed by Eric Dumazet
      syzbot reported a warning [1] that triggered after a recent patch from Jiri.
      
      This exposes a bug that we have already hit in the past (see commit
      ff244c6b ("tun: handle register_netdevice() failures properly")
      for details).
      
      tun uses priv->destructor without an ndo_init() method.
      
      register_netdevice() can return an error, but will
      not call priv->destructor() in some cases. Jiri's recent
      patch added one more.
      
      A long-term fix would be to move the initialization
      of what we destroy in ->destructor() into ndo_init().
      
      This looks a bit risky given the complexity of the tun driver.
      
      A simpler fix is to detect after the failed register_netdevice()
      if the tun_free_netdev() function was called already.
      
      [1]
      ODEBUG: free active (active state 0) object type: timer_list hint: tun_flow_cleanup+0x0/0x280 drivers/net/tun.c:457
      WARNING: CPU: 0 PID: 8653 at lib/debugobjects.c:481 debug_print_object+0x168/0x250 lib/debugobjects.c:481
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 8653 Comm: syz-executor976 Not tainted 5.4.0-rc1-next-20191004 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       panic+0x2dc/0x755 kernel/panic.c:220
       __warn.cold+0x2f/0x3c kernel/panic.c:581
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
      RIP: 0010:debug_print_object+0x168/0x250 lib/debugobjects.c:481
      Code: dd 80 b9 e6 87 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 b5 00 00 00 48 8b 14 dd 80 b9 e6 87 48 c7 c7 e0 ae e6 87 e8 80 84 ff fd <0f> 0b 83 05 e3 ee 80 06 01 48 83 c4 20 5b 41 5c 41 5d 41 5e 5d c3
      RSP: 0018:ffff888095997a28 EFLAGS: 00010082
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815cb526 RDI: ffffed1012b32f37
      RBP: ffff888095997a68 R08: ffff8880a92ac580 R09: ffffed1015d04101
      R10: ffffed1015d04100 R11: ffff8880ae820807 R12: 0000000000000001
      R13: ffffffff88fb5340 R14: ffffffff81627110 R15: ffff8880aa41eab8
       __debug_check_no_obj_freed lib/debugobjects.c:963 [inline]
       debug_check_no_obj_freed+0x2d4/0x43f lib/debugobjects.c:994
       kfree+0xf8/0x2c0 mm/slab.c:3755
       kvfree+0x61/0x70 mm/util.c:593
       netdev_freemem net/core/dev.c:9384 [inline]
       free_netdev+0x39d/0x450 net/core/dev.c:9533
       tun_set_iff drivers/net/tun.c:2871 [inline]
       __tun_chr_ioctl+0x317b/0x3f30 drivers/net/tun.c:3075
       tun_chr_ioctl+0x2b/0x40 drivers/net/tun.c:3355
       vfs_ioctl fs/ioctl.c:47 [inline]
       file_ioctl fs/ioctl.c:539 [inline]
       do_vfs_ioctl+0xdb6/0x13e0 fs/ioctl.c:726
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:743
       __do_sys_ioctl fs/ioctl.c:750 [inline]
       __se_sys_ioctl fs/ioctl.c:748 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:748
       do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441439
      Code: e8 9c ae 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff61c37438 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441439
      RDX: 0000000020000400 RSI: 00000000400454ca RDI: 0000000000000004
      RBP: 00007fff61c37470 R08: 0000000000000001 R09: 0000000100000000
      R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffffffffff
      R13: 0000000000000005 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
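The "simpler fix" idea, checking after a failed registration whether the destructor has already run, can be sketched in user space as follows. All names (`struct toy_dev`, `toy_destructor()`, `toy_register_fail()`) are hypothetical, and the check used here (a pointer the destructor resets to NULL) is an assumption chosen for illustration, not necessarily what the driver tests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device with one resource owned by its destructor. */
struct toy_dev {
    int *flow_table;        /* freed by the destructor */
    int destructor_calls;   /* how many times the destructor ran */
};

static void toy_destructor(struct toy_dev *d)
{
    free(d->flow_table);
    d->flow_table = NULL;   /* doubles as the "already destroyed" marker */
    d->destructor_calls++;
}

/* Models a failing registration. In some failure modes the core has
 * already run the destructor before returning the error; in others not. */
static int toy_register_fail(struct toy_dev *d, int core_runs_destructor)
{
    if (core_runs_destructor)
        toy_destructor(d);
    return -1;
}

/* Runs the error path and returns how often the destructor ran.
 * The expected answer is exactly 1 for both failure modes. */
static int toy_setup_error_path(int core_runs_destructor)
{
    struct toy_dev d = { NULL, 0 };

    d.flow_table = calloc(16, sizeof(*d.flow_table));
    assert(d.flow_table != NULL);

    if (toy_register_fail(&d, core_runs_destructor) < 0) {
        if (d.flow_table != NULL)   /* destructor has not run yet */
            toy_destructor(&d);
    }
    return d.destructor_calls;
}
```

Without the check, one failure mode leaks the resource and the other frees it twice; with it, both modes clean up exactly once.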
  14. 02 Oct, 2019 1 commit
    • netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Committed by Florian Westphal
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  15. 12 Sep, 2019 1 commit
    • tun: fix use-after-free when register netdev failed · 77f22f92
      Committed by Yang Yingliang
      I got a UAF report in the tun driver when doing fuzz testing:
      
      [  466.269490] ==================================================================
      [  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
      [  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
      [  466.271810]
      [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
      [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [  466.271838] Call Trace:
      [  466.271858]  dump_stack+0xca/0x13e
      [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271890]  print_address_description+0x79/0x440
      [  466.271906]  ? vprintk_func+0x5e/0xf0
      [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271935]  __kasan_report+0x15c/0x1df
      [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271976]  kasan_report+0xe/0x20
      [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
      [  466.272013]  do_iter_readv_writev+0x4b7/0x740
      [  466.272032]  ? default_llseek+0x2d0/0x2d0
      [  466.272072]  do_iter_read+0x1c5/0x5e0
      [  466.272110]  vfs_readv+0x108/0x180
      [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
      [  466.299020]  ? fsnotify+0x888/0xd50
      [  466.299040]  ? __fsnotify_parent+0xd0/0x350
      [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
      [  466.304548]  ? vfs_write+0x264/0x510
      [  466.304569]  ? ksys_write+0x101/0x210
      [  466.304591]  ? do_preadv+0x116/0x1a0
      [  466.304609]  do_preadv+0x116/0x1a0
      [  466.309829]  do_syscall_64+0xc8/0x600
      [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.309861] RIP: 0033:0x4560f9
      [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
      [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
      [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
      [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
      [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
      [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
      [  466.323057]
      [  466.323064] Allocated by task 2605:
      [  466.335165]  save_stack+0x19/0x80
      [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
      [  466.337755]  kmem_cache_alloc+0xe8/0x320
      [  466.339050]  getname_flags+0xca/0x560
      [  466.340229]  user_path_at_empty+0x2c/0x50
      [  466.341508]  vfs_statx+0xe6/0x190
      [  466.342619]  __do_sys_newstat+0x81/0x100
      [  466.343908]  do_syscall_64+0xc8/0x600
      [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.347034]
      [  466.347517] Freed by task 2605:
      [  466.348471]  save_stack+0x19/0x80
      [  466.349476]  __kasan_slab_free+0x12e/0x180
      [  466.350726]  kmem_cache_free+0xc8/0x430
      [  466.351874]  putname+0xe2/0x120
      [  466.352921]  filename_lookup+0x257/0x3e0
      [  466.354319]  vfs_statx+0xe6/0x190
      [  466.355498]  __do_sys_newstat+0x81/0x100
      [  466.356889]  do_syscall_64+0xc8/0x600
      [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.359567]
      [  466.360050] The buggy address belongs to the object at ffff888372139100
      [  466.360050]  which belongs to the cache names_cache of size 4096
      [  466.363735] The buggy address is located 336 bytes inside of
      [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
      [  466.367179] The buggy address belongs to the page:
      [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
      [  466.371582] flags: 0x2fffff80010200(slab|head)
      [  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
      [  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  466.377778] page dumped because: kasan: bad access detected
      [  466.379730]
      [  466.380288] Memory state around the buggy address:
      [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.388257]                                                  ^
      [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.394667] ==================================================================
      
      tun_chr_read_iter() accessed memory that had been freed by free_netdev(),
      called from tun_set_iff():
      
              CPUA                                           CPUB
        tun_set_iff()
          alloc_netdev_mqs()
          tun_attach()
                                                        tun_chr_read_iter()
                                                          tun_get()
                                                          tun_do_read()
                                                            tun_ring_recv()
          register_netdevice() <-- inject error
          goto err_detach
          tun_detach_all() <-- set RCV_SHUTDOWN
          free_netdev() <-- called from
                           err_free_dev path
            netdev_freemem() <-- free the memory
                              without check refcount
            (In this path, the refcount cannot prevent
             freeing the memory of dev, and the memory
             will be used by dev_put() called by
             tun_chr_read_iter() on CPUB.)
                                                           (Break from tun_ring_recv(),
                                                           because RCV_SHUTDOWN is set)
                                                         tun_put()
                                                           dev_put() <-- use the memory
                                                                         freed by netdev_freemem()
      
      Publish tfile->tun only after register_netdevice() succeeds,
      so that tun_get() cannot obtain a tun pointer that is freed by
      the err_detach path if register_netdevice() fails.
      
      Fixes: eb0fb363 ("tuntap: attach queue 0 before registering netdevice")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Suggested-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
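The ordering rule behind this fix (make the pointer visible to readers only once setup can no longer fail) can be shown with a user-space sketch. `struct toy_file`, `toy_get()` and `toy_set_iff()` are hypothetical names, and C11 release/acquire atomics stand in for the kernel's pointer publication primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_tun { int id; };

/* Hypothetical file object; readers find the device through f->tun. */
struct toy_file { _Atomic(struct toy_tun *) tun; };

/* Reader side: returns NULL until a device has been published. */
static struct toy_tun *toy_get(struct toy_file *f)
{
    assert(f != NULL);
    return atomic_load_explicit(&f->tun, memory_order_acquire);
}

/* Writer side: allocate, "register", and only then publish. */
static int toy_set_iff(struct toy_file *f, int register_fails)
{
    struct toy_tun *tun = malloc(sizeof(*tun));

    if (!tun)
        return -1;
    tun->id = 1;
    /* attach work happens here without making tun visible to readers */
    if (register_fails) {
        free(tun);          /* safe: no reader could have seen the pointer */
        return -1;
    }
    atomic_store_explicit(&f->tun, tun, memory_order_release);
    return 0;
}

/* Returns 1 if a failed setup stays invisible and a good one is visible. */
static int toy_publish_demo(void)
{
    struct toy_file f;
    int failed_hidden, ok_visible;

    atomic_init(&f.tun, NULL);
    failed_hidden = toy_set_iff(&f, 1) < 0 && toy_get(&f) == NULL;
    ok_visible = toy_set_iff(&f, 0) == 0 && toy_get(&f) != NULL;
    free(toy_get(&f));
    return failed_hidden && ok_visible;
}
```

Because the store happens after the last failure point, the error path can free the object without coordinating with concurrent readers.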
  16. 26 Jul, 2019 1 commit
    • tun: mark small packets as owned by the tap sock · 4b663366
      Committed by Alexis Bauvin
      - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
      
      Small packets going out of a tap device go through an optimized code
      path that uses build_skb() rather than sock_alloc_send_pskb(). The
      latter calls skb_set_owner_w(), but the small packet code path does not.
      
      The net effect is that small packets are not owned by the userland
      application's socket (e.g. QEMU), while large packets are.
      This can be seen with a TCP session, where packets are not owned when
      the window size is small enough (around PAGE_SIZE), while they are once
      the window grows (note that this requires the host to support virtio
      tso for the guest to offload segmentation).
      All this leads to inconsistent behaviour in the kernel, especially in
      netfilter modules that use sk->socket (e.g. xt_owner).
      
      Fixes: 66ccbc9c ("tap: use build_skb() for small packet")
      Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 09 Jul, 2019 1 commit
    • coallocate socket_wq with socket itself · 333f7909
      Committed by Al Viro
      socket->wq is assign-once, set when we are initializing both
      struct socket it's in and struct socket_wq it points to.  As a
      matter of fact, the only reason for separate allocation was the
      ability to RCU-delay freeing of socket_wq.  RCU-delaying the
      freeing of socket itself gets rid of that need, so we can just
      fold struct socket_wq into the end of struct socket and simplify
      the life both for sock_alloc_inode() (one allocation instead of
      two) and for tun/tap oddballs, where we used to embed struct socket
      and struct socket_wq into the same structure (now - embedding just
      the struct socket).
      
      Note that reference to struct socket_wq in struct sock does remain
      a reference - that's unchanged.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 19 Jun, 2019 1 commit
    • tun: wake up waitqueues after IFF_UP is set · 72b319dc
      Committed by Fei Li
      Currently, after the tap0 link is set up, the tun code wakes up the
      tx/rx wait queues in tun_net_open() when .ndo_open() is called, but the
      IFF_UP flag has not been set yet. If a waiter is already queued, it
      fails to transmit when tun_sendmsg() checks the IFF_UP flag.
      The subsequent vhost_poll_start() then adds the wq into the wqh until it
      is woken up again. This works if the IFF_UP flag has already been set
      by the time tun_chr_poll() checks it, but not if the flag has not
      been set at that time. Sadly, the latter case is a fatal error, as
      the wq will never be woken up again unless the link is later
      set up manually on purpose.
      
      Fix this by moving the wakeup into the NETDEV_UP event
      notifier, which makes sure IFF_UP has been set before the
      wait queues are woken up.
      Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 31 May, 2019 1 commit
    • treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 · c942fddf
      Committed by Thomas Gleixner
      Based on 3 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version this program is distributed in the
        hope that it will be useful but without any warranty without even
        the implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [kishon] [vijay] [abraham]
        [i] [kishon]@[ti] [com] this program is distributed in the hope that
        it will be useful but without any warranty without even the implied
        warranty of merchantability or fitness for a particular purpose see
        the gnu general public license for more details
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version [author] [graeme] [gregory]
        [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
        [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
        [hk] [hemahk]@[ti] [com] this program is distributed in the hope
        that it will be useful but without any warranty without even the
        implied warranty of merchantability or fitness for a particular
        purpose see the gnu general public license for more details
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 1105 file(s).
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Allison Randal <allison@lohutok.net>
      Reviewed-by: Richard Fontana <rfontana@redhat.com>
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  20. 10 May, 2019 2 commits
  21. 24 Apr, 2019 1 commit
    • net: pass net_device argument to the eth_get_headlen · c43f1255
      Committed by Stanislav Fomichev
      Update all users of eth_get_headlen to pass network device, fetch
      network namespace from it and pass it down to the flow dissector.
      This commit is a noop until administrator inserts BPF flow dissector
      program.
      
      Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: intel-wired-lan@lists.osuosl.org
      Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
      Cc: Salil Mehta <salil.mehta@huawei.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Cc: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  22. 24 Mar, 2019 1 commit
  23. 22 Mar, 2019 2 commits
  24. 21 Mar, 2019 1 commit
    • net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Committed by Paolo Abeni
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also cleans up the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads   vanilla (kpps)   patched (kpps)
      1         2334             2428
      2         4166             4278
      4         7895             8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a350ecce
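      The change described above can be sketched in userspace C. This is only a
      model under stated assumptions: every `model_` name is hypothetical, the
      structs are stand-ins rather than the kernel definitions, and the queue
      pick is a simple hash modulo, not the real netdev_pick_tx() logic. It
      shows the old shape, where the driver hook receives a 'fallback' function
      pointer, next to the new shape, where the hook calls the helper directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct model_net_device { uint16_t real_num_tx_queues; };
struct model_sk_buff { uint32_t hash; };

/* Simplified stand-in for the core helper that picks a TX queue. */
static uint16_t model_netdev_pick_tx(struct model_net_device *dev,
                                     struct model_sk_buff *skb)
{
    assert(dev->real_num_tx_queues > 0);
    return (uint16_t)(skb->hash % dev->real_num_tx_queues);
}

/* Old shape: the hook is handed a 'fallback' pointer by its caller. */
typedef uint16_t (*model_fallback_t)(struct model_net_device *,
                                     struct model_sk_buff *);

static uint16_t model_select_queue_old(struct model_net_device *dev,
                                       struct model_sk_buff *skb,
                                       model_fallback_t fallback)
{
    return fallback(dev, skb); /* indirect call through a pointer */
}

/* New shape: no 'fallback' argument, the helper is called directly. */
static uint16_t model_select_queue_new(struct model_net_device *dev,
                                       struct model_sk_buff *skb)
{
    return model_netdev_pick_tx(dev, skb); /* direct call */
}
```

      Since every caller passed the same helper, both shapes return the same
      queue; the new one simply drops the function-pointer indirection.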
  25. 17 Mar, 2019 1 commit
  26. 16 Mar, 2019 1 commit
    • E
      tun: properly test for IFF_UP · 4477138f
      Eric Dumazet authored
      Same reasons as the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro layer.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4477138f
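      The contract described above, where a virtual driver checks IFF_UP itself
      before handing a packet to the stack, can be modelled in a few lines of
      userspace C. This is a sketch only: the `model_` names, the counters and
      the choice of -EIO as the error code are assumptions for illustration,
      and the RCU read-side section that the real check depends on is reduced
      to a comment.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_IFF_UP 0x1u /* stand-in for the kernel's IFF_UP flag */

/* Hypothetical stand-in for a virtual network device. */
struct model_dev {
    unsigned int flags;
    unsigned long rx_packets;
    unsigned long rx_dropped;
};

/*
 * Model of the receive path of a virtual driver. In the kernel the flag
 * test and the hand-off would sit inside one RCU read-side section, so
 * that device dismantle (clear IFF_UP, wait a grace period, flush
 * backlogs) cannot slip in between them.
 */
static int model_virtual_rx(struct model_dev *dev)
{
    if (!(dev->flags & MODEL_IFF_UP)) {
        dev->rx_dropped++; /* device is down: drop, do not queue */
        return -EIO;
    }
    dev->rx_packets++; /* stands in for netif_rx_ni()/napi_gro_frags() */
    return 0;
}
```

      A hardware driver gets the same guarantee for free because its interrupt
      is disabled at dismantle; the explicit test is what replaces it here.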
  27. 26 Feb, 2019 1 commit
  28. 25 Feb, 2019 1 commit
  29. 23 Feb, 2019 1 commit
    • M
      net: Don't set transport offset to invalid value · d2aa125d
      Maxim Mikityanskiy authored
      If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
      skb->protocol will be unset, __skb_flow_dissect() will fail, and
      skb_probe_transport_header() will fall back to the offset_hint, making
      the resulting skb_transport_offset incorrect.
      
      If, however, there is no transport header in the packet,
      transport_header shouldn't be set to an arbitrary value.
      
      Fix it by leaving the transport offset unset if it couldn't be found, to
      be explicit rather than to fill it with some wrong value. It changes the
      behavior, but if some code relied on the old behavior, it would be
      broken anyway, as the old one is incorrect.
      Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2aa125d
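      The behaviour change above, leaving the transport offset unset when
      dissection fails instead of falling back to a hint, can be illustrated
      with a toy model. Everything here is an assumption for illustration: the
      `model_` names are hypothetical, the "dissector" only recognises a bare
      IPv4 header, and the sentinel value is chosen for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_OFFSET_UNSET UINT16_MAX /* sentinel: offset not known */

struct model_skb { uint16_t transport_header; };

/* Toy dissector: succeeds only for a plausible IPv4 header. */
static bool model_flow_dissect(const uint8_t *data, size_t len,
                               uint16_t *thoff)
{
    if (len < 20 || (data[0] >> 4) != 4)
        return false;
    uint16_t ihl = (uint16_t)((data[0] & 0x0f) * 4);
    if (ihl < 20 || ihl > len)
        return false;
    *thoff = ihl;
    return true;
}

/* New behaviour: set the offset only when dissection found one. */
static void model_probe_transport_header(struct model_skb *skb,
                                         const uint8_t *data, size_t len)
{
    uint16_t thoff;

    if (model_flow_dissect(data, len, &thoff))
        skb->transport_header = thoff;
    /* otherwise leave the field untouched, i.e. still unset */
}

static bool model_transport_header_was_set(const struct model_skb *skb)
{
    return skb->transport_header != MODEL_OFFSET_UNSET;
}
```

      Callers can then test whether the offset was set before using it, rather
      than trusting a value that may have come from a guess.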
  30. 31 Jan, 2019 1 commit
  31. 10 Jan, 2019 1 commit
    • S
      tun: publish tfile after it's fully initialized · 0b7959b6
      Stanislav Fomichev authored
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
      Call Trace:
       ? napi_gro_frags+0xa7/0x2c0
       tun_get_user+0xb50/0xf20
       tun_chr_write_iter+0x53/0x70
       new_sync_write+0xff/0x160
       vfs_write+0x191/0x1e0
       __x64_sys_write+0x5e/0xd0
       do_syscall_64+0x47/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I think there is a subtle race between sending a packet via tap and
      attaching it:
      
      CPU0:                    CPU1:
      tun_chr_ioctl(TUNSETIFF)
        tun_set_iff
          tun_attach
            rcu_assign_pointer(tfile->tun, tun);
                               tun_fops->write_iter()
                                 tun_chr_write_iter
                                   tun_napi_alloc_frags
                                     napi_get_frags
                                       napi->skb = napi_alloc_skb
            tun_napi_init
              netif_napi_add
                napi->skb = NULL
                                    napi->skb is NULL here
                                    napi_gro_frags
                                      napi_frags_skb
                                        skb = napi->skb
                                        skb_reset_mac_header(skb)
                                        panic()
      
      Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
      be the last thing we do in tun_attach(); this should guarantee that when we
      call tun_get() we always get an initialized object.
      
      v2 changes:
      * remove extra napi_mutex locks/unlocks for napi operations
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0b7959b6
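      The fix above follows the usual "initialize first, publish last" pattern.
      A minimal userspace sketch with C11 atomics is given here, under clear
      assumptions: the `model_` names are hypothetical, a release store stands
      in for rcu_assign_pointer(), an acquire load stands in for the RCU
      dereference on the reader side, and NAPI setup is reduced to one flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_tun { int id; };

/* Hypothetical stand-in for the per-file state. */
struct model_tfile {
    int napi_ready;                  /* stands in for NAPI setup */
    _Atomic(struct model_tun *) tun; /* published pointer */
};

/* Attach: finish all initialization, then publish as the last step. */
static void model_tun_attach(struct model_tfile *tfile,
                             struct model_tun *tun)
{
    tfile->napi_ready = 1; /* stands in for tun_napi_init() */
    /* Release store: earlier writes are visible to whoever sees 'tun'. */
    atomic_store_explicit(&tfile->tun, tun, memory_order_release);
}

/* Write path: returns -1 if not attached, 0 if the packet may proceed. */
static int model_tun_write(struct model_tfile *tfile)
{
    struct model_tun *tun =
        atomic_load_explicit(&tfile->tun, memory_order_acquire);

    if (tun == NULL)
        return -1;
    /* Holds because publication happened after initialization. */
    assert(tfile->napi_ready);
    return 0;
}
```

      With the store placed before the setup, as in the race diagram, a writer
      could observe a non-NULL pointer while the NAPI state was still being
      reset; moving the publication to the end removes that window.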
  32. 15 Dec, 2018 1 commit
  33. 14 Dec, 2018 1 commit