1. 14 8月, 2023 1 次提交
  2. 06 3月, 2023 1 次提交
  3. 11 1月, 2023 1 次提交
    • S
      net: tun: Fix use-after-free in tun_detach() · 5a82612f
      Shigeru Yoshida 提交于
      stable inclusion
      from stable-v4.19.268
      commit 1f23f1890d91812c35d32eab1b49621b6d32dc7b
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I69KBO
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 5daadc86 ]
      
      syzbot reported use-after-free in tun_detach() [1].  This causes call
      trace like below:
      
      ==================================================================
      BUG: KASAN: use-after-free in notifier_call_chain+0x1ee/0x200 kernel/notifier.c:75
      Read of size 8 at addr ffff88807324e2a8 by task syz-executor.0/3673
      
      CPU: 0 PID: 3673 Comm: syz-executor.0 Not tainted 6.1.0-rc5-syzkaller-00044-gcc675d22 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:284 [inline]
       print_report+0x15e/0x461 mm/kasan/report.c:395
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:495
       notifier_call_chain+0x1ee/0x200 kernel/notifier.c:75
       call_netdevice_notifiers_info+0x86/0x130 net/core/dev.c:1942
       call_netdevice_notifiers_extack net/core/dev.c:1983 [inline]
       call_netdevice_notifiers net/core/dev.c:1997 [inline]
       netdev_wait_allrefs_any net/core/dev.c:10237 [inline]
       netdev_run_todo+0xbc6/0x1100 net/core/dev.c:10351
       tun_detach drivers/net/tun.c:704 [inline]
       tun_chr_close+0xe4/0x190 drivers/net/tun.c:3467
       __fput+0x27c/0xa90 fs/file_table.c:320
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xb3d/0x2a30 kernel/exit.c:820
       do_group_exit+0xd4/0x2a0 kernel/exit.c:950
       get_signal+0x21b1/0x2440 kernel/signal.c:2858
       arch_do_signal_or_restart+0x86/0x2300 arch/x86/kernel/signal.c:869
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
       do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The cause of the issue is that sock_put() from __tun_detach() drops
      last reference count for struct net, and then notifier_call_chain()
      from netdev_state_change() accesses that struct net.
      
      This patch fixes the issue by calling sock_put() from tun_detach()
      after all necessary accesses for the struct net has done.
      
      Fixes: 83c1f36f ("tun: send netlink notification when the device is modified")
      Reported-by: syzbot+106f9b687cd64ee70cd1@syzkaller.appspotmail.com
      Link: https://syzkaller.appspot.com/bug?id=96eb7f1ce75ef933697f24eeab928c4a716edefe [1]
      Signed-off-by: NShigeru Yoshida <syoshida@redhat.com>
      Link: https://lore.kernel.org/r/20221124175134.1589053-1-syoshida@redhat.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      5a82612f
  4. 07 11月, 2022 1 次提交
    • Z
      net: tun: fix bugs for oversize packet when napi frags enabled · 60a7c8b5
      Ziyang Xuan 提交于
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5Z86E
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=363a5328f4b0
      
      --------------------------------
      
      Recently, we got two syzkaller problems because of oversize packet
      when napi frags enabled.
      
      One of the problems is because the first seg size of the iov_iter
      from user space is very big, it is 2147479538 which is bigger than
      the threshold value for bail out early in __alloc_pages(). And
      skb->pfmemalloc is true, __kmalloc_reserve() would use pfmemalloc
      reserves without __GFP_NOWARN flag. Thus we got a warning as following:
      
      ========================================================
      WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
      ...
      Call trace:
       __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
       __alloc_pages_node include/linux/gfp.h:550 [inline]
       alloc_pages_node include/linux/gfp.h:564 [inline]
       kmalloc_large_node+0x94/0x350 mm/slub.c:4038
       __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
       __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
       pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
       __skb_grow include/linux/skbuff.h:2779 [inline]
       tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
       tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
       tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
      
      The other problem is because odd IPv6 packets without NEXTHDR_NONE
      extension header and have big packet length, it is 2127925 which is
      bigger than ETH_MAX_MTU(65535). After ipv6_gso_pull_exthdrs() in
      ipv6_gro_receive(), network_header offset and transport_header offset
      are all bigger than U16_MAX. That would trigger skb->network_header
      and skb->transport_header overflow error, because they are all '__u16'
      type. Eventually, it would affect the value for __skb_push(skb, value),
      and make it be a big value. After __skb_push() in ipv6_gro_receive(),
      skb->data would less than skb->head, an out of bounds memory bug occurred.
      That would trigger the problem as following:
      
      ==================================================================
      BUG: KASAN: use-after-free in eth_type_trans+0x100/0x260
      ...
      Call trace:
       dump_backtrace+0xd8/0x130
       show_stack+0x1c/0x50
       dump_stack_lvl+0x64/0x7c
       print_address_description.constprop.0+0xbc/0x2e8
       print_report+0x100/0x1e4
       kasan_report+0x80/0x120
       __asan_load8+0x78/0xa0
       eth_type_trans+0x100/0x260
       napi_gro_frags+0x164/0x550
       tun_get_user+0xda4/0x1270
       tun_chr_write_iter+0x74/0x130
       do_iter_readv_writev+0x130/0x1ec
       do_iter_write+0xbc/0x1e0
       vfs_writev+0x13c/0x26c
      
      To fix the problems, restrict the packet size less than
      (ETH_MAX_MTU - NET_SKB_PAD - NET_IP_ALIGN) which has considered reserved
      skb space in napi_alloc_skb() because transport_header is an offset from
      skb->head. Add len check in tun_napi_alloc_frags() simply.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221029094101.1653855-1-william.xuanziyang@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Conflicts:
      	drivers/net/tun.c
      Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      60a7c8b5
  5. 08 8月, 2022 3 次提交
  6. 28 12月, 2021 1 次提交
  7. 22 9月, 2020 2 次提交
  8. 05 3月, 2020 1 次提交
    • P
      tun: fix data-race in gro_normal_list() · 6aed5cc5
      Petar Penkov 提交于
      [ Upstream commit c39e342a ]
      
      There is a race in the TUN driver between napi_busy_loop and
      napi_gro_frags. This commit resolves the race by adding the NAPI struct
      via netif_tx_napi_add, instead of netif_napi_add, which disables polling
      for the NAPI struct.
      
      KCSAN reported:
      BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop
      
      write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0:
       gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682
       gro_normal_list net/core/dev.c:5678 [inline]
       gro_normal_one net/core/dev.c:5692 [inline]
       napi_frags_finish net/core/dev.c:5705 [inline]
       napi_gro_frags+0x625/0x770 net/core/dev.c:5778
       tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976
       tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022
       call_write_iter include/linux/fs.h:1895 [inline]
       do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x13b/0x3c0 fs/read_write.c:951
       vfs_writev+0x118/0x1c0 fs/read_write.c:1015
       do_writev+0xe3/0x250 fs/read_write.c:1058
       __do_sys_writev fs/read_write.c:1131 [inline]
       __se_sys_writev fs/read_write.c:1128 [inline]
       __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1:
       gro_normal_list net/core/dev.c:5678 [inline]
       napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126
       sk_busy_loop include/net/busy_poll.h:108 [inline]
       __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689
       udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       sock_read_iter+0x15f/0x1e0 net/socket.c:967
       call_read_iter include/linux/fs.h:1889 [inline]
       new_sync_read+0x389/0x4f0 fs/read_write.c:414
       __vfs_read+0xb1/0xc0 fs/read_write.c:427
       vfs_read fs/read_write.c:461 [inline]
       vfs_read+0x143/0x2c0 fs/read_write.c:446
       ksys_read+0xd5/0x1b0 fs/read_write.c:587
       __do_sys_read fs/read_write.c:597 [inline]
       __se_sys_read fs/read_write.c:595 [inline]
       __x64_sys_read+0x4c/0x60 fs/read_write.c:595
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 94317099 ("tun: enable NAPI for TUN/TAP driver")
      Signed-off-by: NPetar Penkov <ppenkov@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6aed5cc5
  9. 27 12月, 2019 14 次提交
    • Y
      tun: fix use-after-free when register netdev failed · fc268ebd
      Yang Yingliang 提交于
      [ Upstream commit 77f22f92 ]
      
      I got a UAF repport in tun driver when doing fuzzy test:
      
      [  466.269490] ==================================================================
      [  466.271792] BUG: KASAN: use-after-free in tun_chr_read_iter+0x2ca/0x2d0
      [  466.271806] Read of size 8 at addr ffff888372139250 by task tun-test/2699
      [  466.271810]
      [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 5.3.0-rc1-00001-g5a9433db2614-dirty #427
      [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
      [  466.271838] Call Trace:
      [  466.271858]  dump_stack+0xca/0x13e
      [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271890]  print_address_description+0x79/0x440
      [  466.271906]  ? vprintk_func+0x5e/0xf0
      [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271935]  __kasan_report+0x15c/0x1df
      [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
      [  466.271976]  kasan_report+0xe/0x20
      [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
      [  466.272013]  do_iter_readv_writev+0x4b7/0x740
      [  466.272032]  ? default_llseek+0x2d0/0x2d0
      [  466.272072]  do_iter_read+0x1c5/0x5e0
      [  466.272110]  vfs_readv+0x108/0x180
      [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
      [  466.299020]  ? fsnotify+0x888/0xd50
      [  466.299040]  ? __fsnotify_parent+0xd0/0x350
      [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
      [  466.304548]  ? vfs_write+0x264/0x510
      [  466.304569]  ? ksys_write+0x101/0x210
      [  466.304591]  ? do_preadv+0x116/0x1a0
      [  466.304609]  do_preadv+0x116/0x1a0
      [  466.309829]  do_syscall_64+0xc8/0x600
      [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.309861] RIP: 0033:0x4560f9
      [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 0000000000000127
      [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 00000000004560f9
      [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 0000000000000003
      [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 0000000000000000
      [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 000000000040cb10
      [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 0000000000000000
      [  466.323057]
      [  466.323064] Allocated by task 2605:
      [  466.335165]  save_stack+0x19/0x80
      [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
      [  466.337755]  kmem_cache_alloc+0xe8/0x320
      [  466.339050]  getname_flags+0xca/0x560
      [  466.340229]  user_path_at_empty+0x2c/0x50
      [  466.341508]  vfs_statx+0xe6/0x190
      [  466.342619]  __do_sys_newstat+0x81/0x100
      [  466.343908]  do_syscall_64+0xc8/0x600
      [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.347034]
      [  466.347517] Freed by task 2605:
      [  466.348471]  save_stack+0x19/0x80
      [  466.349476]  __kasan_slab_free+0x12e/0x180
      [  466.350726]  kmem_cache_free+0xc8/0x430
      [  466.351874]  putname+0xe2/0x120
      [  466.352921]  filename_lookup+0x257/0x3e0
      [  466.354319]  vfs_statx+0xe6/0x190
      [  466.355498]  __do_sys_newstat+0x81/0x100
      [  466.356889]  do_syscall_64+0xc8/0x600
      [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  466.359567]
      [  466.360050] The buggy address belongs to the object at ffff888372139100
      [  466.360050]  which belongs to the cache names_cache of size 4096
      [  466.363735] The buggy address is located 336 bytes inside of
      [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
      [  466.367179] The buggy address belongs to the page:
      [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
      [  466.371582] flags: 0x2fffff80010200(slab|head)
      [  466.372910] raw: 002fffff80010200 dead000000000100 dead000000000122 ffff8883df1b4f00
      [  466.375209] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  466.377778] page dumped because: kasan: bad access detected
      [  466.379730]
      [  466.380288] Memory state around the buggy address:
      [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.388257]                                                  ^
      [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  466.394667] ==================================================================
      
      tun_chr_read_iter() accessed the memory which freed by free_netdev()
      called by tun_set_iff():
      
              CPUA                                           CPUB
        tun_set_iff()
          alloc_netdev_mqs()
          tun_attach()
                                                        tun_chr_read_iter()
                                                          tun_get()
                                                          tun_do_read()
                                                            tun_ring_recv()
          register_netdevice() <-- inject error
          goto err_detach
          tun_detach_all() <-- set RCV_SHUTDOWN
          free_netdev() <-- called from
                           err_free_dev path
            netdev_freemem() <-- free the memory
                              without check refcount
            (In this path, the refcount cannot prevent
             freeing the memory of dev, and the memory
             will be used by dev_put() called by
             tun_chr_read_iter() on CPUB.)
                                                           (Break from tun_ring_recv(),
                                                           because RCV_SHUTDOWN is set)
                                                         tun_put()
                                                           dev_put() <-- use the memory
                                                                         freed by netdev_freemem()
      
      Put the publishing of tfile->tun after register_netdevice(),
      so tun_get() won't get the tun pointer that freed by
      err_detach path if register_netdevice() failed.
      
      Fixes: eb0fb363 ("tuntap: attach queue 0 before registering netdevice")
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Suggested-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      fc268ebd
    • A
      tun: mark small packets as owned by the tap sock · ea79eedb
      Alexis Bauvin 提交于
      [ Upstream commit 4b663366 ]
      
      - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
      
      Small packets going out of a tap device go through an optimized code
      path that uses build_skb() rather than sock_alloc_send_pskb(). The
      latter calls skb_set_owner_w(), but the small packet code path does not.
      
      The net effect is that small packets are not owned by the userland
      application's socket (e.g. QEMU), while large packets are.
      This can be seen with a TCP session, where packets are not owned when
      the window size is small enough (around PAGE_SIZE), while they are once
      the window grows (note that this requires the host to support virtio
      tso for the guest to offload segmentation).
      All this leads to inconsistent behaviour in the kernel, especially on
      netfilter modules that uses sk->socket (e.g. xt_owner).
      
      Fixes: 66ccbc9c ("tap: use build_skb() for small packet")
      Signed-off-by: NAlexis Bauvin <abauvin@scaleway.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ea79eedb
    • F
      tun: wake up waitqueues after IFF_UP is set · 06d0f0d5
      Fei Li 提交于
      [ Upstream commit 72b319dc ]
      
      Currently after setting tap0 link up, the tun code wakes tx/rx waited
      queues up in tun_net_open() when .ndo_open() is called, however the
      IFF_UP flag has not been set yet. If there's already a wait queue, it
      would fail to transmit when checking the IFF_UP flag in tun_sendmsg().
      Then the saving vhost_poll_start() will add the wq into wqh until it
      is waken up again. Although this works when IFF_UP flag has been set
      when tun_chr_poll detects; this is not true if IFF_UP flag has not
      been set at that time. Sadly the latter case is a fatal error, as
      the wq will never be waken up in future unless later manually
      setting link up on purpose.
      
      Fix this by moving the wakeup process into the NETDEV_UP event
      notifying process, this makes sure IFF_UP has been set before all
      waited queues been waken up.
      Signed-off-by: NFei Li <lifei.shirley@bytedance.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      06d0f0d5
    • J
      tuntap: synchronize through tfiles array instead of tun->numqueues · 9404e394
      Jason Wang 提交于
      mainline inclusion
      from mainline-5.1
      commit 9871a9e47a2646fe30ae7fd2e67668a8d30912f6
      category: bugfix
      bugzilla: 13267
      CVE: NA
      
      -------------------------------------------------
      
      When a queue(tfile) is detached through __tun_detach(), we move the
      last enabled tfile to the position where detached one sit but don't
      NULL out last position. We expect to synchronize the datapath through
      tun->numqueues. Unfortunately, this won't work since we're lacking
      sufficient mechanism to order or synchronize the access to
      tun->numqueues.
      
      To fix this, NULL out the last position during detaching and check
      RCU protected tfile against NULL instead of checking tun->numqueues in
      datapath.
      
      Cc: YueHaibing <yuehaibing@huawei.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: weiyongjun (A) <weiyongjun1@huawei.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c8d68e6b ("tuntap: multiqueue support")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9404e394
    • J
      tuntap: fix dividing by zero in ebpf queue selection · 2623734d
      Jason Wang 提交于
      mainline inclusion
      from mainline-5.1
      commit a35d310f03a692bf4798eb309a1950a06a150620
      category: bugfix
      bugzilla: 13267
      CVE: NA
      
      -------------------------------------------------
      
      We need check if tun->numqueues is zero (e.g for the persist device)
      before trying to use it for modular arithmetic.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Fixes: 96f84061("tun: add eBPF based queue selection method")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2623734d
    • Y
      Revert "tuntap: synchronize through tfiles instead of numqueues" · e4f04ac5
      YueHaibing 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 13267
      CVE: NA
      
      -------------------------------------------------
      
      This reverts commit f21e16fe09f32e389c66b9a83c7e47dd1e3e6ce6.
      
      Sync code with mainline
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e4f04ac5
    • J
      tuntap: synchronize through tfiles instead of numqueues · 6aaf0c01
      Jason Wang 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 13267
      CVE: NA
      
      from: https://lkml.org/lkml/2019/4/28/294
      
      -------------------------------------------------
      
      KASAN report this:
      
      BUG: KASAN: use-after-free in tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
      Read of size 8 at addr ffff88836cc26a70 by task swapper/3/0
      
      CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.32 #6
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xca/0x13e lib/dump_stack.c:113
       print_address_description+0x79/0x330 mm/kasan/report.c:253
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x18a/0x2d0 mm/kasan/report.c:409
       tun_net_xmit+0x1670/0x1750 drivers/net/tun.c:1104
       __netdev_start_xmit include/linux/netdevice.h:4300 [inline]
       netdev_start_xmit include/linux/netdevice.h:4309 [inline]
       xmit_one net/core/dev.c:3243 [inline]
       dev_hard_start_xmit+0x17c/0x780 net/core/dev.c:3259
       sch_direct_xmit+0x24f/0x8a0 net/sched/sch_generic.c:327
       qdisc_restart net/sched/sch_generic.c:390 [inline]
       __qdisc_run+0x45b/0x1590 net/sched/sch_generic.c:398
       qdisc_run include/net/pkt_sched.h:120 [inline]
       __dev_xmit_skb net/core/dev.c:3438 [inline]
       __dev_queue_xmit+0xa6b/0x2500 net/core/dev.c:3797
       neigh_output include/net/neighbour.h:501 [inline]
       ip6_finish_output2+0xa36/0x2290 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x3e7/0xa20 net/ipv6/ip6_output.c:154
       NF_HOOK_COND include/linux/netfilter.h:278 [inline]
       ip6_output+0x1e2/0x720 net/ipv6/ip6_output.c:171
       dst_output include/net/dst.h:444 [inline]
       NF_HOOK include/linux/netfilter.h:289 [inline]
       mld_sendpack+0x740/0xf20 net/ipv6/mcast.c:1683
       mld_send_cr net/ipv6/mcast.c:1979 [inline]
       mld_ifc_timer_expire+0x3cc/0x7f0 net/ipv6/mcast.c:2478
       call_timer_fn+0x1ea/0x6a0 kernel/time/timer.c:1326
       expire_timers kernel/time/timer.c:1363 [inline]
       __run_timers kernel/time/timer.c:1682 [inline]
       run_timer_softirq+0x637/0x1070 kernel/time/timer.c:1695
       __do_softirq+0x26d/0xabd kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:372 [inline]
       irq_exit+0x209/0x290 kernel/softirq.c:412
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0xf6/0x480 arch/x86/kernel/apic/apic.c:1056
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:864
       </IRQ>
      RIP: 0010:native_safe_halt+0x2/0x10 arch/x86/include/asm/irqflags.h:58
      Code: 01 f0 0f 82 bc fd ff ff 48 c7 c7 40 25 b1 83 e8 a1 72 03 ff e9 ab fd ff ff 4c 89 e7 e8 37 f5 a7 fe e9 6a ff ff ff 90 90 fb f4 <c3> 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
      RSP: 0018:ffff8883e0f2fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
      RAX: 0000000000000007 RBX: dffffc0000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8883e0f04f1c
      RBP: 0000000000000003 R08: ffffed107c5dc77b R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff848b96e0
      R13: 0000000000000003 R14: 1ffff1107c1e5fae R15: 0000000000000000
       arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline]
       default_idle+0x24/0x2b0 arch/x86/kernel/process.c:561
       cpuidle_idle_call kernel/sched/idle.c:153 [inline]
       do_idle+0x2ca/0x420 kernel/sched/idle.c:262
       cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:368
       start_secondary+0x421/0x570 arch/x86/kernel/smpboot.c:271
       secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
      
      Allocated by task 19764:
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
       __kmalloc+0x11b/0x2d0 mm/slub.c:3750
       kmalloc include/linux/slab.h:518 [inline]
       sk_prot_alloc+0xf6/0x290 net/core/sock.c:1469
       sk_alloc+0x3d/0xc00 net/core/sock.c:1523
       tun_chr_open+0x80/0x560 drivers/net/tun.c:3204
       misc_open+0x367/0x4e0 drivers/char/misc.c:141
       chrdev_open+0x212/0x580 fs/char_dev.c:417
       do_dentry_open+0x704/0x1050 fs/open.c:777
       do_last fs/namei.c:3418 [inline]
       path_openat+0x7ed/0x2ae0 fs/namei.c:3533
       do_filp_open+0x1aa/0x2b0 fs/namei.c:3564
       do_sys_open+0x307/0x430 fs/open.c:1069
       do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 19764:
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x12e/0x180 mm/kasan/kasan.c:521
       slab_free_hook mm/slub.c:1370 [inline]
       slab_free_freelist_hook mm/slub.c:1397 [inline]
       slab_free mm/slub.c:2952 [inline]
       kfree+0xeb/0x2f0 mm/slub.c:3905
       sk_prot_free net/core/sock.c:1506 [inline]
       __sk_destruct+0x4e6/0x6a0 net/core/sock.c:1588
       sk_destruct+0x48/0x70 net/core/sock.c:1596
       __sk_free+0xa9/0x270 net/core/sock.c:1607
       sk_free+0x2a/0x30 net/core/sock.c:1618
       sock_put include/net/sock.h:1696 [inline]
       __tun_detach+0x464/0xf70 drivers/net/tun.c:735
       tun_detach drivers/net/tun.c:747 [inline]
       tun_chr_close+0xd8/0x190 drivers/net/tun.c:3241
       __fput+0x27f/0x7f0 fs/file_table.c:278
       task_work_run+0x136/0x1b0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       exit_to_usermode_loop+0x1a7/0x1d0 arch/x86/entry/common.c:166
       prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
       do_syscall_64+0x461/0x580 arch/x86/entry/common.c:293
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff88836cc26600
       which belongs to the cache kmalloc-4096 of size 4096
      The buggy address is located 1136 bytes inside of
       4096-byte region [ffff88836cc26600, ffff88836cc27600)
      The buggy address belongs to the page:
      page:ffffea000db30800 count:1 mapcount:0 mapping:ffff8883e280e600 index:0x0 compound_mapcount: 0
      flags: 0x2fffff80008100(slab|head)
      raw: 002fffff80008100 dead000000000100 dead000000000200 ffff8883e280e600
      raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88836cc26900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88836cc26980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      > ffff88836cc26a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                   ^
       ffff88836cc26a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88836cc26b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      If tun driver have multiqueues, user close the last queue by
      tun_detach, then tun->tfiles[index] is not cleared. Then a new
      queue may add to the tun, which using rcu_assign_pointer
      tun->tfiles[index] to the new tfile and increase the numqueues.
      However if there send a packet during this time, which picking the last
      queue, it may uses the old tun->tfiles[index], beacause there no
      RCU grace period.
      
      synchronize everything explicitly and make sure tfile is valid
      before use it.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: c8d68e6b ("tuntap: multiqueue support")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6aaf0c01
    • E
      tun: add a missing rcu_read_unlock() in error path · 11ed48d9
      Eric Dumazet 提交于
      commit 9180bb4f upstream.
      
      In my latest patch I missed one rcu_read_unlock(), in case
      device is down.
      
      Fixes: 4477138f ("tun: properly test for IFF_UP")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      11ed48d9
    • E
      tun: properly test for IFF_UP · 3205bc2b
      Eric Dumazet 提交于
      [ Upstream commit 4477138f ]
      
      Same reasons than the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro layer.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3205bc2b
    • T
      tun: remove unnecessary memory barrier · 3809d708
      Timur Celik 提交于
      [ Upstream commit ecef67cb ]
      
      Replace set_current_state with __set_current_state since no memory
      barrier is needed at this point.
      Signed-off-by: NTimur Celik <mail@timurcelik.de>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3809d708
    • T
      tun: fix blocking read · ea31e4ec
      Timur Celik 提交于
      [ Upstream commit 71828b22 ]
      
      This patch moves setting of the current state into the loop. Otherwise
      the task may end up in a busy wait loop if none of the break conditions
      are met.
      Signed-off-by: NTimur Celik <mail@timurcelik.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ea31e4ec
    • E
      tun: use netdev_alloc_frag() in tun_napi_alloc_frags() · 80aeb531
      Eric Dumazet 提交于
      mainline inclusion
      from mainline-5.0
      commit aa6daaca
      category: bugfix
      bugzilla: 6169
      CVE: NA
      
      -------------------------------------------------
      
      In order to cook skbs in the same way than Ethernet drivers,
      it is probably better to not use GFP_KERNEL, but rather
      use the GFP_ATOMIC and PFMEMALLOC mechanisms provided by
      netdev_alloc_frag().
      
      This would allow to use tun driver even in memory stress
      situations, especially if swap is used over this tun channel.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Petar Penkov <peterpenkov96@gmail.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NLin Miaohe <linmiaohe@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      80aeb531
    • G
      tun: move the call to tun_set_real_num_queues · e227e353
      George Amanakis 提交于
      [ Upstream commit 3a03cb84 ]
      
      Call tun_set_real_num_queues() after the increment of tun->numqueues
      since the former depends on it. Otherwise, the number of queues is not
      correctly accounted for, which results to warnings similar to:
      "vnet0 selects TX queue 11, but real number of TX queues is 11".
      
      Fixes: 0b7959b6 ("tun: publish tfile after it's fully initialized")
      Reported-and-tested-by: NGeorge Amanakis <gamanakis@gmail.com>
      Signed-off-by: NGeorge Amanakis <gamanakis@gmail.com>
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e227e353
    • S
      tun: publish tfile after it's fully initialized · 76ad321b
      Stanislav Fomichev 提交于
      [ Upstream commit 0b7959b6 ]
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
      Call Trace:
       ? napi_gro_frags+0xa7/0x2c0
       tun_get_user+0xb50/0xf20
       tun_chr_write_iter+0x53/0x70
       new_sync_write+0xff/0x160
       vfs_write+0x191/0x1e0
       __x64_sys_write+0x5e/0xd0
       do_syscall_64+0x47/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I think there is a subtle race between sending a packet via tap and
      attaching it:
      
      CPU0:                    CPU1:
      tun_chr_ioctl(TUNSETIFF)
        tun_set_iff
          tun_attach
            rcu_assign_pointer(tfile->tun, tun);
                               tun_fops->write_iter()
                                 tun_chr_write_iter
                                   tun_napi_alloc_frags
                                     napi_get_frags
                                       napi->skb = napi_alloc_skb
            tun_napi_init
              netif_napi_add
                napi->skb = NULL
                                    napi->skb is NULL here
                                    napi_gro_frags
                                      napi_frags_skb
      				  skb = napi->skb
      				  skb_reset_mac_header(skb)
      				  panic()
      
      Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
      be the last thing we do in tun_attach(); this should guarantee that when we
      call tun_get() we always get an initialized object.
      
      v2 changes:
      * remove extra napi_mutex locks/unlocks for napi operations
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      76ad321b
  10. 17 12月, 2018 1 次提交
  11. 23 11月, 2018 1 次提交
    • M
      tuntap: fix multiqueue rx · 3e8f5d55
      Matthew Cover 提交于
      [ Upstream commit 8ebebcba559a1bfbaec7bbda64feb9870b9c58da ]
      
      When writing packets to a descriptor associated with a combined queue, the
      packets should end up on that queue.
      
      Before this change all packets written to any descriptor associated with a
      tap interface end up on rx-0, even when the descriptor is associated with a
      different queue.
      
      The rx traffic can be generated by either of the following.
        1. a simple tap program which spins up multiple queues and writes packets
           to each of the file descriptors
        2. tx from a qemu vm with a tap multiqueue netdev
      
      The queue for rx traffic can be observed by either of the following (done
      on the hypervisor in the qemu case).
        1. a simple netmap program which opens and reads from per-queue
           descriptors
        2. configuring RPS and doing per-cpu captures with rxtxcpu
      
      Alternatively, if you printk() the return value of skb_get_rx_queue() just
      before each instance of netif_receive_skb() in tun.c, you will get 65535
      for every skb.
      
      Calling skb_record_rx_queue() to set the rx queue to the queue_index fixes
      the association between descriptor and rx queue.
      Signed-off-by: NMatthew Cover <matthew.cover@stackpath.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e8f5d55
  12. 14 11月, 2018 1 次提交
  13. 02 10月, 2018 3 次提交
    • E
      tun: napi flags belong to tfile · af3fb24e
      Eric Dumazet 提交于
      Since tun->flags might be shared by multiple tfile structures,
      it is better to make sure tun_get_user() is using the flags
      for the current tfile.
      
      Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint
      of what could happen, but we need something stronger to please
      syzbot.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715
       tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967
       call_write_iter include/linux/fs.h:1808 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
       vfs_write+0x1fc/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457579
      Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579
      RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a
      RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4
      R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff
      Modules linked in:
      
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af3fb24e
    • E
      tun: initialize napi_mutex unconditionally · c7256f57
      Eric Dumazet 提交于
      This is the first part to fix following syzbot report :
      
      console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=443816db871edd66
      dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80
      
      Following patch is fixing the race condition, but it seems safer
      to initialize this mutex at tfile creation anyway.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7256f57
    • E
      tun: remove unused parameters · 06e55add
      Eric Dumazet 提交于
      tun_napi_disable() and tun_napi_del() do not need
      a pointer to the tun_struct
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06e55add
  14. 24 9月, 2018 1 次提交
    • E
      tun: remove ndo_poll_controller · 765cdc20
      Eric Dumazet 提交于
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.
      
      tun uses NAPI for TX completions, so we better let core
      networking stack call the napi->poll() to avoid the capture.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      765cdc20
  15. 05 8月, 2018 1 次提交
  16. 21 7月, 2018 1 次提交
    • E
      signal: Use PIDTYPE_TGID to clearly store where file signals will be sent · 01919134
      Eric W. Biederman 提交于
      When f_setown is called a pid and a pid type are stored.  Replace the use
      of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
      group.  Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
      is only for a thread.
      
      Update the users of __f_setown to use PIDTYPE_TGID instead of
      PIDTYPE_PID.
      
      For now the code continues to capture task_pid (when task_tgid would
      really be appropriate), and iterate on PIDTYPE_PID (even when type ==
      PIDTYPE_TGID) out of an abundance of caution to preserve existing
      behavior.
      
      Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID
      for tgid lookup also be used to avoid taking the tasklist lock.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      01919134
  17. 17 7月, 2018 1 次提交
  18. 14 7月, 2018 1 次提交
  19. 10 7月, 2018 1 次提交
  20. 08 6月, 2018 1 次提交
    • W
      net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · fd3a8862
      Willem de Bruijn 提交于
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd3a8862
  21. 05 6月, 2018 2 次提交