1. 22 3月, 2019 1 次提交
  2. 21 3月, 2019 1 次提交
    • P
      net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Paolo Abeni 提交于
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also clean the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads		vanilla 	patched
      		(kpps)		(kpps)
      1		2334		2428
      2		4166		4278
      4		7895		8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a350ecce
  3. 17 3月, 2019 1 次提交
  4. 16 3月, 2019 1 次提交
    • E
      tun: properly test for IFF_UP · 4477138f
      Eric Dumazet 提交于
      Same reasons than the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx_ni() or napi_gro_frags() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro layer.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Fixes: 1bd4978a ("tun: honor IFF_UP in tun_get_user()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4477138f
  5. 26 2月, 2019 1 次提交
  6. 25 2月, 2019 1 次提交
  7. 23 2月, 2019 1 次提交
    • M
      net: Don't set transport offset to invalid value · d2aa125d
      Maxim Mikityanskiy 提交于
      If the socket was created with socket(AF_PACKET, SOCK_RAW, 0),
      skb->protocol will be unset, __skb_flow_dissect() will fail, and
      skb_probe_transport_header() will fall back to the offset_hint, making
      the resulting skb_transport_offset incorrect.
      
      If, however, there is no transport header in the packet,
      transport_header shouldn't be set to an arbitrary value.
      
      Fix it by leaving the transport offset unset if it couldn't be found, to
      be explicit rather than to fill it with some wrong value. It changes the
      behavior, but if some code relied on the old behavior, it would be
      broken anyway, as the old one is incorrect.
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2aa125d
  8. 31 1月, 2019 1 次提交
  9. 10 1月, 2019 1 次提交
    • S
      tun: publish tfile after it's fully initialized · 0b7959b6
      Stanislav Fomichev 提交于
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
      Call Trace:
       ? napi_gro_frags+0xa7/0x2c0
       tun_get_user+0xb50/0xf20
       tun_chr_write_iter+0x53/0x70
       new_sync_write+0xff/0x160
       vfs_write+0x191/0x1e0
       __x64_sys_write+0x5e/0xd0
       do_syscall_64+0x47/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I think there is a subtle race between sending a packet via tap and
      attaching it:
      
      CPU0:                    CPU1:
      tun_chr_ioctl(TUNSETIFF)
        tun_set_iff
          tun_attach
            rcu_assign_pointer(tfile->tun, tun);
                               tun_fops->write_iter()
                                 tun_chr_write_iter
                                   tun_napi_alloc_frags
                                     napi_get_frags
                                       napi->skb = napi_alloc_skb
            tun_napi_init
              netif_napi_add
                napi->skb = NULL
                                    napi->skb is NULL here
                                    napi_gro_frags
                                      napi_frags_skb
      				  skb = napi->skb
      				  skb_reset_mac_header(skb)
      				  panic()
      
      Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
      be the last thing we do in tun_attach(); this should guarantee that when we
      call tun_get() we always get an initialized object.
      
      v2 changes:
      * remove extra napi_mutex locks/unlocks for napi operations
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b7959b6
  10. 15 12月, 2018 1 次提交
  11. 14 12月, 2018 1 次提交
  12. 07 12月, 2018 2 次提交
  13. 04 12月, 2018 1 次提交
    • P
      tun: remove skb access after netif_receive_skb · 4e4b08e5
      Prashant Bhole 提交于
      In tun.c skb->len was accessed while doing stats accounting after a
      call to netif_receive_skb. We can not access skb after this call
      because buffers may be dropped.
      
      The fix for this bug would be to store skb->len in local variable and
      then use it after netif_receive_skb(). IMO using xdp data size for
      accounting bytes will be better because input for tun_xdp_one() is
      xdp_buff.
      
      Hence this patch:
      - fixes a bug by removing skb access after netif_receive_skb()
      - uses xdp data size for accounting bytes
      
      [613.019057] BUG: KASAN: use-after-free in tun_sendmsg+0x77c/0xc50 [tun]
      [613.021062] Read of size 4 at addr ffff8881da9ab7c0 by task vhost-1115/1155
      [613.023073]
      [613.024003] CPU: 0 PID: 1155 Comm: vhost-1115 Not tainted 4.20.0-rc3-vm+ #232
      [613.026029] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [613.029116] Call Trace:
      [613.031145]  dump_stack+0x5b/0x90
      [613.032219]  print_address_description+0x6c/0x23c
      [613.034156]  ? tun_sendmsg+0x77c/0xc50 [tun]
      [613.036141]  kasan_report.cold.5+0x241/0x308
      [613.038125]  tun_sendmsg+0x77c/0xc50 [tun]
      [613.040109]  ? tun_get_user+0x1960/0x1960 [tun]
      [613.042094]  ? __isolate_free_page+0x270/0x270
      [613.045173]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.047127]  ? peek_head_len.part.13+0x90/0x90 [vhost_net]
      [613.049096]  ? get_tx_bufs+0x5a/0x2c0 [vhost_net]
      [613.051106]  ? vhost_enable_notify+0x2d8/0x420 [vhost]
      [613.053139]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.053139]  ? vhost_net_buf_peek+0x340/0x340 [vhost_net]
      [613.053139]  ? __mutex_lock+0x8d9/0xb30
      [613.053139]  ? finish_task_switch+0x8f/0x3f0
      [613.053139]  ? handle_tx+0x32/0x120 [vhost_net]
      [613.053139]  ? mutex_trylock+0x110/0x110
      [613.053139]  ? finish_task_switch+0xcf/0x3f0
      [613.053139]  ? finish_task_switch+0x240/0x3f0
      [613.053139]  ? __switch_to_asm+0x34/0x70
      [613.053139]  ? __switch_to_asm+0x40/0x70
      [613.053139]  ? __schedule+0x506/0xf10
      [613.053139]  handle_tx+0xc7/0x120 [vhost_net]
      [613.053139]  vhost_worker+0x166/0x200 [vhost]
      [613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
      [613.053139]  ? __kthread_parkme+0x77/0x90
      [613.053139]  ? vhost_dev_init+0x580/0x580 [vhost]
      [613.053139]  kthread+0x1b1/0x1d0
      [613.053139]  ? kthread_park+0xb0/0xb0
      [613.053139]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] Allocated by task 1155:
      [613.088705]  kasan_kmalloc+0xbf/0xe0
      [613.088705]  kmem_cache_alloc+0xdc/0x220
      [613.088705]  __build_skb+0x2a/0x160
      [613.088705]  build_skb+0x14/0xc0
      [613.088705]  tun_sendmsg+0x4f0/0xc50 [tun]
      [613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.088705]  handle_tx+0xc7/0x120 [vhost_net]
      [613.088705]  vhost_worker+0x166/0x200 [vhost]
      [613.088705]  kthread+0x1b1/0x1d0
      [613.088705]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] Freed by task 1155:
      [613.088705]  __kasan_slab_free+0x12e/0x180
      [613.088705]  kmem_cache_free+0xa0/0x230
      [613.088705]  ip6_mc_input+0x40f/0x5a0
      [613.088705]  ipv6_rcv+0xc9/0x1e0
      [613.088705]  __netif_receive_skb_one_core+0xc1/0x100
      [613.088705]  netif_receive_skb_internal+0xc4/0x270
      [613.088705]  br_pass_frame_up+0x2b9/0x2e0
      [613.088705]  br_handle_frame_finish+0x2fb/0x7a0
      [613.088705]  br_handle_frame+0x30f/0x6c0
      [613.088705]  __netif_receive_skb_core+0x61a/0x15b0
      [613.088705]  __netif_receive_skb_one_core+0x8e/0x100
      [613.088705]  netif_receive_skb_internal+0xc4/0x270
      [613.088705]  tun_sendmsg+0x738/0xc50 [tun]
      [613.088705]  vhost_tx_batch.isra.14+0xeb/0x1f0 [vhost_net]
      [613.088705]  handle_tx_copy+0x2d0/0x8f0 [vhost_net]
      [613.088705]  handle_tx+0xc7/0x120 [vhost_net]
      [613.088705]  vhost_worker+0x166/0x200 [vhost]
      [613.088705]  kthread+0x1b1/0x1d0
      [613.088705]  ret_from_fork+0x35/0x40
      [613.088705]
      [613.088705] The buggy address belongs to the object at ffff8881da9ab740
      [613.088705]  which belongs to the cache skbuff_head_cache of size 232
      
      Fixes: 043d222f ("tuntap: accept an array of XDP buffs through sendmsg()")
      Reviewed-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e4b08e5
  14. 01 12月, 2018 2 次提交
  15. 19 11月, 2018 2 次提交
    • M
      tuntap: fix multiqueue rx · 8ebebcba
      Matthew Cover 提交于
      When writing packets to a descriptor associated with a combined queue, the
      packets should end up on that queue.
      
      Before this change all packets written to any descriptor associated with a
      tap interface end up on rx-0, even when the descriptor is associated with a
      different queue.
      
      The rx traffic can be generated by either of the following.
        1. a simple tap program which spins up multiple queues and writes packets
           to each of the file descriptors
        2. tx from a qemu vm with a tap multiqueue netdev
      
      The queue for rx traffic can be observed by either of the following (done
      on the hypervisor in the qemu case).
        1. a simple netmap program which opens and reads from per-queue
           descriptors
        2. configuring RPS and doing per-cpu captures with rxtxcpu
      
      Alternatively, if you printk() the return value of skb_get_rx_queue() just
      before each instance of netif_receive_skb() in tun.c, you will get 65535
      for every skb.
      
      Calling skb_record_rx_queue() to set the rx queue to the queue_index fixes
      the association between descriptor and rx queue.
      Signed-off-by: NMatthew Cover <matthew.cover@stackpath.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ebebcba
    • E
      tun: use netdev_alloc_frag() in tun_napi_alloc_frags() · aa6daaca
      Eric Dumazet 提交于
      In order to cook skbs in the same way than Ethernet drivers,
      it is probably better to not use GFP_KERNEL, but rather
      use the GFP_ATOMIC and PFMEMALLOC mechanisms provided by
      netdev_alloc_frag().
      
      This would allow to use tun driver even in memory stress
      situations, especially if swap is used over this tun channel.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Petar Penkov <peterpenkov96@gmail.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa6daaca
  16. 18 11月, 2018 2 次提交
  17. 08 11月, 2018 1 次提交
  18. 16 10月, 2018 1 次提交
    • S
      tun: Consistently configure generic netdev params via rtnetlink · df52eab2
      Serhey Popovych 提交于
      Configuring generic network device parameters on tun will fail in
      presence of IFLA_INFO_KIND attribute in IFLA_LINKINFO nested attribute
      since tun_validate() always return failure.
      
      This can be visualized with following ip-link(8) command sequences:
      
        # ip link set dev tun0 group 100
        # ip link set dev tun0 group 100 type tun
        RTNETLINK answers: Invalid argument
      
      with contrast to dummy and veth drivers:
      
        # ip link set dev dummy0 group 100
        # ip link set dev dummy0 type dummy
      
        # ip link set dev veth0 group 100
        # ip link set dev veth0 group 100 type veth
      
      Fix by returning zero in tun_validate() when @data is NULL that is
      always in case since rtnl_link_ops->maxtype is zero in tun driver.
      
      Fixes: f019a7a5 ("tun: Implement ip link del tunXXX")
      Signed-off-by: NSerhey Popovych <serhe.popovych@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df52eab2
  19. 11 10月, 2018 1 次提交
  20. 02 10月, 2018 3 次提交
    • E
      tun: napi flags belong to tfile · af3fb24e
      Eric Dumazet 提交于
      Since tun->flags might be shared by multiple tfile structures,
      it is better to make sure tun_get_user() is using the flags
      for the current tfile.
      
      Presence of the READ_ONCE() in tun_napi_frags_enabled() gave a hint
      of what could happen, but we need something stronger to please
      syzbot.
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 13647 Comm: syz-executor5 Not tainted 4.19.0-rc5+ #59
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       napi_gro_frags+0x3f4/0xc90 net/core/dev.c:5715
       tun_get_user+0x31d5/0x42a0 drivers/net/tun.c:1922
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1967
       call_write_iter include/linux/fs.h:1808 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
       vfs_write+0x1fc/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457579
      Code: 1d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fe003614c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457579
      RDX: 0000000000000012 RSI: 0000000020000000 RDI: 000000000000000a
      RBP: 000000000072c040 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0036156d4
      R13: 00000000004c5574 R14: 00000000004d8e98 R15: 00000000ffffffff
      Modules linked in:
      
      RIP: 0010:dev_gro_receive+0x132/0x2720 net/core/dev.c:5427
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 6e 20 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8b 6e 10 49 8d bd d0 00 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 59 20 00 00 4d 8b a5 d0 00 00 00 31 ff 41 81 e4
      RSP: 0018:ffff8801c400f410 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8618d325
      RDX: 000000000000001a RSI: ffffffff86189f97 RDI: 00000000000000d0
      RBP: ffff8801c400f608 R08: ffff8801c8fb4300 R09: 0000000000000000
      R10: ffffed0038801ed7 R11: 0000000000000003 R12: ffff8801d327d358
      R13: 0000000000000000 R14: ffff8801c16dd8c0 R15: 0000000000000004
      FS:  00007fe003615700(0000) GS:ffff8801dac00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fe1f3c43db8 CR3: 00000001bebb2000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af3fb24e
    • E
      tun: initialize napi_mutex unconditionally · c7256f57
      Eric Dumazet 提交于
      This is the first part to fix following syzbot report :
      
      console output: https://syzkaller.appspot.com/x/log.txt?x=145378e6400000
      kernel config:  https://syzkaller.appspot.com/x/.config?x=443816db871edd66
      dashboard link: https://syzkaller.appspot.com/bug?extid=e662df0ac1d753b57e80
      
      Following patch is fixing the race condition, but it seems safer
      to initialize this mutex at tfile creation anyway.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: syzbot+e662df0ac1d753b57e80@syzkaller.appspotmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7256f57
    • E
      tun: remove unused parameters · 06e55add
      Eric Dumazet 提交于
      tun_napi_disable() and tun_napi_del() do not need
      a pointer to the tun_struct
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06e55add
  21. 24 9月, 2018 1 次提交
    • E
      tun: remove ndo_poll_controller · 765cdc20
      Eric Dumazet 提交于
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.
      
      tun uses NAPI for TX completions, so we better let core
      networking stack call the napi->poll() to avoid the capture.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      765cdc20
  22. 14 9月, 2018 9 次提交
  23. 05 8月, 2018 1 次提交
  24. 21 7月, 2018 1 次提交
    • E
      signal: Use PIDTYPE_TGID to clearly store where file signals will be sent · 01919134
      Eric W. Biederman 提交于
      When f_setown is called a pid and a pid type are stored.  Replace the use
      of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
      group.  Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
      is only for a thread.
      
      Update the users of __f_setown to use PIDTYPE_TGID instead of
      PIDTYPE_PID.
      
      For now the code continues to capture task_pid (when task_tgid would
      really be appropriate), and iterate on PIDTYPE_PID (even when type ==
      PIDTYPE_TGID) out of an abundance of caution to preserve existing
      behavior.
      
      Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID
      for tgid lookup also be used to avoid taking the tasklist lock.
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      01919134
  25. 17 7月, 2018 1 次提交
  26. 14 7月, 2018 1 次提交