1. 23 Jun, 2020 1 commit
    • hsr: avoid to create proc file after unregister · de0083c7
      Taehee Yoo committed
      When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>"
      is deleted.
      This is done by addrconf_ifdown(), which addrconf_notify() calls on the
      NETDEV_UNREGISTER notification.
      But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER,
      this proc file will be created again.
      This recreated proc file will be deleted by netdev_wait_allrefs().
      Before netdev_wait_allrefs() is called, the routine that creates a new
      HSR interface can run; it tries to create a proc file but finds the
      not-yet-deleted one.
      At this point, it warns about it.
      
      To avoid this situation, it can use ->dellink() instead of
      ->ndo_uninit() to release resources because ->dellink() is called
      before NETDEV_UNREGISTER.
      So, a proc file will not be recreated.
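
      The ordering argument above can be sketched with a toy model. This is
      not kernel code: a Python set stands in for /proc/net/dev_snmp6, the
      replay() helper is hypothetical, and the only behaviour modelled is what
      the message states (NETDEV_UNREGISTER removes the entry, a later
      NETDEV_CHANGEMTU recreates it).

```python
# Toy model, not kernel code: a set stands in for /proc/net/dev_snmp6.
def replay(events, name="hsr0"):
    """Apply notifier events in order and return the proc entries left behind."""
    proc_entries = {name}  # the entry exists while the interface is registered
    for event in events:
        if event == "NETDEV_UNREGISTER":
            proc_entries.discard(name)  # addrconf_ifdown() removes the entry
        elif event == "NETDEV_CHANGEMTU":
            proc_entries.add(name)      # the MTU notification (re)creates it
    return proc_entries


# Resources released in ->ndo_uninit(): the MTU change follows the unregister.
old_order = replay(["NETDEV_UNREGISTER", "NETDEV_CHANGEMTU"])
# Resources released in ->dellink(): the MTU change precedes the unregister.
new_order = replay(["NETDEV_CHANGEMTU", "NETDEV_UNREGISTER"])

print("stale entries with ->ndo_uninit():", old_order)
print("stale entries with ->dellink():", new_order)
```

      In the first ordering a stale entry survives until it is cleaned up
      later, which is the window in which a new hsr0 can collide with it.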
      
      Test commands
          ip link add dummy0 type dummy
          ip link add dummy1 type dummy
          ip link set dummy0 mtu 1300
      
          #SHELL1
          while :
          do
              ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
          done
      
          #SHELL2
          while :
          do
              ip link del hsr0
          done
      
      Splat looks like:
      [ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered
      [ 9888.981797][    C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430
      [ 9888.981798][    C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x
      [ 9888.981814][    C2] CPU: 2 PID: 2752 Comm: ip Tainted: G        W         5.8.0-rc1+ #616
      [ 9888.981815][    C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [ 9888.981816][    C2] RIP: 0010:proc_register+0x2d5/0x430
      [ 9888.981818][    C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40
      [ 9888.981819][    C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286
      [ 9888.981821][    C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62
      [ 9888.981822][    C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac
      [ 9888.981823][    C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd
      [ 9888.981824][    C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308
      [ 9888.981825][    C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e
      [ 9888.981827][    C2] FS:  00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
      [ 9888.981828][    C2] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9888.981829][    C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0
      [ 9888.981830][    C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 9888.981831][    C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 9888.981832][    C2] Call Trace:
      [ 9888.981833][    C2]  ? snmp6_seq_show+0x180/0x180
      [ 9888.981834][    C2]  proc_create_single_data+0x7c/0xa0
      [ 9888.981835][    C2]  snmp6_register_dev+0xb0/0x130
      [ 9888.981836][    C2]  ipv6_add_dev+0x4b7/0xf60
      [ 9888.981837][    C2]  addrconf_notify+0x684/0x1ca0
      [ 9888.981838][    C2]  ? __mutex_unlock_slowpath+0xd0/0x670
      [ 9888.981839][    C2]  ? kasan_unpoison_shadow+0x30/0x40
      [ 9888.981840][    C2]  ? wait_for_completion+0x250/0x250
      [ 9888.981841][    C2]  ? inet6_ifinfo_notify+0x100/0x100
      [ 9888.981842][    C2]  ? dropmon_net_event+0x227/0x410
      [ 9888.981843][    C2]  ? notifier_call_chain+0x90/0x160
      [ 9888.981844][    C2]  ? inet6_ifinfo_notify+0x100/0x100
      [ 9888.981845][    C2]  notifier_call_chain+0x90/0x160
      [ 9888.981846][    C2]  register_netdevice+0xbe5/0x1070
      [ ... ]
      
      Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com
      Fixes: e0a4b997 ("hsr: use upper/lower device infrastructure")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Jun, 2020 2 commits
    • net: Add MODULE_DESCRIPTION entries to network modules · 67c20de3
      Rob Gill committed
      The user tool modinfo is used to get information on kernel modules, including a
      description where it is available.
      
      This patch adds a brief MODULE_DESCRIPTION to the following modules:
      
      9p
      drop_monitor
      esp4_offload
      esp6_offload
      fou
      fou6
      ila
      sch_fq
      sch_fq_codel
      sch_hhf
      Signed-off-by: Rob Gill <rrobgill@protonmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix notification call on completion of discarded calls · 0041cd5a
      David Howells committed
      When preallocated service calls are being discarded, they're passed to
      ->discard_new_call() to have the caller clean up any attached higher-layer
      preallocated pieces before being marked completed.  However, the act of
      marking them completed now invokes the call's notification function - which
      causes a problem because that function might assume that the previously
      freed pieces of memory are still there.
      
      Fix this by setting a dummy notification function on the socket after
      calling ->discard_new_call().
      
      This results in the following kasan message when the kafs module is
      removed.
      
      ==================================================================
      BUG: KASAN: use-after-free in afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
      Write of size 1 at addr ffff8880946c39e4 by task kworker/u4:1/21
      
      CPU: 0 PID: 21 Comm: kworker/u4:1 Not tainted 5.8.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: netns cleanup_net
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x18f/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:383
       __kasan_report mm/kasan/report.c:513 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
       afs_wake_up_async_call+0x6aa/0x770 fs/afs/rxrpc.c:707
       rxrpc_notify_socket+0x1db/0x5d0 net/rxrpc/recvmsg.c:40
       __rxrpc_set_call_completion.part.0+0x172/0x410 net/rxrpc/recvmsg.c:76
       __rxrpc_call_completed net/rxrpc/recvmsg.c:112 [inline]
       rxrpc_call_completed+0xca/0xf0 net/rxrpc/recvmsg.c:111
       rxrpc_discard_prealloc+0x781/0xab0 net/rxrpc/call_accept.c:233
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      Allocated by task 6820:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc mm/kasan/common.c:494 [inline]
       __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:467
       kmem_cache_alloc_trace+0x153/0x7d0 mm/slab.c:3551
       kmalloc include/linux/slab.h:555 [inline]
       kzalloc include/linux/slab.h:669 [inline]
       afs_alloc_call+0x55/0x630 fs/afs/rxrpc.c:141
       afs_charge_preallocation+0xe9/0x2d0 fs/afs/rxrpc.c:757
       afs_open_socket+0x292/0x360 fs/afs/rxrpc.c:92
       afs_net_init+0xa6c/0xe30 fs/afs/main.c:125
       ops_init+0xaf/0x420 net/core/net_namespace.c:151
       setup_net+0x2de/0x860 net/core/net_namespace.c:341
       copy_net_ns+0x293/0x590 net/core/net_namespace.c:482
       create_new_namespaces+0x3fb/0xb30 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0xbd/0x1f0 kernel/nsproxy.c:231
       ksys_unshare+0x43d/0x8e0 kernel/fork.c:2983
       __do_sys_unshare kernel/fork.c:3051 [inline]
       __se_sys_unshare kernel/fork.c:3049 [inline]
       __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3049
       do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:359
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 21:
       save_stack+0x1b/0x40 mm/kasan/common.c:48
       set_track mm/kasan/common.c:56 [inline]
       kasan_set_free_info mm/kasan/common.c:316 [inline]
       __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:455
       __cache_free mm/slab.c:3426 [inline]
       kfree+0x109/0x2b0 mm/slab.c:3757
       afs_put_call+0x585/0xa40 fs/afs/rxrpc.c:190
       rxrpc_discard_prealloc+0x764/0xab0 net/rxrpc/call_accept.c:230
       rxrpc_listen+0x147/0x360 net/rxrpc/af_rxrpc.c:245
       afs_close_socket+0x95/0x320 fs/afs/rxrpc.c:110
       afs_net_exit+0x1bc/0x310 fs/afs/main.c:155
       ops_exit_list.isra.0+0xa8/0x150 net/core/net_namespace.c:186
       cleanup_net+0x511/0xa50 net/core/net_namespace.c:603
       process_one_work+0x965/0x1690 kernel/workqueue.c:2269
       worker_thread+0x96/0xe10 kernel/workqueue.c:2415
       kthread+0x3b5/0x4a0 kernel/kthread.c:291
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:293
      
      The buggy address belongs to the object at ffff8880946c3800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 484 bytes inside of
       1024-byte region [ffff8880946c3800, ffff8880946c3c00)
      The buggy address belongs to the page:
      page:ffffea000251b0c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0
      flags: 0xfffe0000000200(slab)
      raw: 00fffe0000000200 ffffea0002546508 ffffea00024fa248 ffff8880aa000c40
      raw: 0000000000000000 ffff8880946c3000 0000000100000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880946c3880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8880946c3980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff8880946c3a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880946c3a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
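
      The fix described in this message (swap in a dummy notification
      function once ->discard_new_call() has run) can be sketched as a toy
      model. This is not the rxrpc code: the Call class, dummy_notify() and
      discard() are hypothetical names, and for brevity the hook hangs off
      the call object itself.

```python
# Toy model, not kernel code: every name here is hypothetical.
class Call:
    def __init__(self, owner_state):
        self.owner_state = owner_state   # higher-layer preallocated piece
        self.notify = self.notify_owner  # notification hook
        self.completed = False

    def notify_owner(self):
        # Touches higher-layer state; unsafe once that state is released.
        if self.owner_state is None:
            raise RuntimeError("notification touched released state")
        self.owner_state["woken"] = True

    def complete(self):
        self.completed = True
        self.notify()                    # completion invokes the hook


def dummy_notify():
    """No-op hook installed once the higher layer has cleaned up."""


def discard(call, use_dummy):
    call.owner_state = None              # stands in for ->discard_new_call()
    if use_dummy:
        call.notify = dummy_notify       # the fix: swap in a no-op hook
    call.complete()


fixed = Call({"woken": False})
discard(fixed, use_dummy=True)           # completes without touching old state

try:
    discard(Call({"woken": False}), use_dummy=False)
    buggy_failed = False
except RuntimeError:
    buggy_failed = True                  # models the use-after-free report
```

      The RuntimeError plays the role of the KASAN report: completion still
      notifies, but the notified code assumes memory that is already gone.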
      
      Reported-by: syzbot+d3eccef36ddbd02713e9@syzkaller.appspotmail.com
      Fixes: 5ac0d622 ("rxrpc: Fix missing notification")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 20 Jun, 2020 4 commits
    • net/sched: cls_api: fix nooffloaddevcnt warning dmesg log · 3c005110
      wenxu committed
      block->nooffloaddevcnt should always be counted for an indirect (indr)
      block, even when the indr block offload succeeds, because the
      representor may go away and the ingress qdisc can then work in
      software mode.
      
      The block->nooffloaddevcnt warning shows up with the following dmesg log:
      
      [  760.667058] #####################################################
      [  760.668186] ## TEST test-ecmp-add-vxlan-encap-disable-sriov.sh ##
      [  760.669179] #####################################################
      [  761.780655] :test: Fedora 30 (Thirty)
      [  761.783794] :test: Linux reg-r-vrt-018-180 5.7.0+
      [  761.822890] :test: NIC ens1f0 FW 16.26.6000 PCI 0000:81:00.0 DEVICE 0x1019 ConnectX-5 Ex
      [  761.860244] mlx5_core 0000:81:00.0 ens1f0: Link up
      [  761.880693] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
      [  762.059732] mlx5_core 0000:81:00.1 ens1f1: Link up
      [  762.234341] :test: unbind vfs of ens1f0
      [  762.257825] :test: Change ens1f0 eswitch (0000:81:00.0) mode to switchdev
      [  762.291363] :test: unbind vfs of ens1f1
      [  762.306914] :test: Change ens1f1 eswitch (0000:81:00.1) mode to switchdev
      [  762.309237] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(LEGACY), nvfs(2), active vports(3)
      [  763.282598] mlx5_core 0000:81:00.1: E-Switch: Supported tc offload range - chains: 4294967294, prios: 4294967295
      [  763.362825] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.444465] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
      [  763.460088] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.502586] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  763.552429] ens1f1_0: renamed from eth0
      [  763.569569] mlx5_core 0000:81:00.1: E-Switch: Enable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  763.629694] ens1f1_1: renamed from eth1
      [  764.631552] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_0: link becomes ready
      [  764.670841] :test: unbind vfs of ens1f0
      [  764.681966] :test: unbind vfs of ens1f1
      [  764.726762] mlx5_core 0000:81:00.0 ens1f0: Link up
      [  764.766511] mlx5_core 0000:81:00.1 ens1f1: Link up
      [  764.797325] :test: Add multipath vxlan encap rule and disable sriov
      [  764.798544] :test: config multipath route
      [  764.812732] mlx5_core 0000:81:00.0: lag map port 1:2 port 2:2
      [  764.874556] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:2
      [  765.603681] :test: OK
      [  765.659048] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1_1: link becomes ready
      [  765.675085] :test: verify rule in hw
      [  765.694237] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0: link becomes ready
      [  765.711892] IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1: link becomes ready
      [  766.979230] :test: OK
      [  768.125419] :test: OK
      [  768.127519] :test: - disable sriov ens1f1
      [  768.131160] pci 0000:81:02.2: Removing from iommu group 75
      [  768.132646] pci 0000:81:02.3: Removing from iommu group 76
      [  769.179749] mlx5_core 0000:81:00.1: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  769.455627] mlx5_core 0000:81:00.0: modify lag map port 1:1 port 2:1
      [  769.703990] mlx5_core 0000:81:00.1: MLX5E: StrdRq(1) RqSz(8) StrdSz(2048) RxCqeCmprss(0)
      [  769.988637] mlx5_core 0000:81:00.1 ens1f1: renamed from eth0
      [  769.990022] :test: - disable sriov ens1f0
      [  769.994922] pci 0000:81:00.2: Removing from iommu group 73
      [  769.997048] pci 0000:81:00.3: Removing from iommu group 74
      [  771.035813] mlx5_core 0000:81:00.0: E-Switch: Disable: mode(OFFLOADS), nvfs(2), active vports(3)
      [  771.339091] ------------[ cut here ]------------
      [  771.340812] WARNING: CPU: 6 PID: 3448 at net/sched/cls_api.c:749 tcf_block_offload_unbind.isra.0+0x5c/0x60
      [  771.341728] Modules linked in: act_mirred act_tunnel_key cls_flower dummy vxlan ip6_udp_tunnel udp_tunnel sch_ingress nfsv3 nfs_acl nfs lockd grace fscache tun bridge stp llc sunrpc rdma_ucm rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mlxfw act_ct nf_flow_table kvm_intel nf_nat kvm nf_conntrack irqbypass crct10dif_pclmul igb crc32_pclmul nf_defrag_ipv6 libcrc32c nf_defrag_ipv4 crc32c_intel ghash_clmulni_intel ptp ipmi_ssif intel_cstate pps_core ses intel_uncore mei_me iTCO_wdt joydev ipmi_si iTCO_vendor_support i2c_i801 enclosure mei ioatdma dca lpc_ich wmi ipmi_devintf pcspkr acpi_power_meter ipmi_msghandler acpi_pad ast i2c_algo_bit drm_vram_helper drm_kms_helper drm_ttm_helper ttm drm mpt3sas raid_class scsi_transport_sas
      [  771.347818] CPU: 6 PID: 3448 Comm: test-ecmp-add-v Not tainted 5.7.0+ #1146
      [  771.348727] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
      [  771.349646] RIP: 0010:tcf_block_offload_unbind.isra.0+0x5c/0x60
      [  771.350553] Code: 4a fd ff ff 83 f8 a1 74 0e 5b 4c 89 e7 5d 41 5c 41 5d e9 07 93 89 ff 8b 83 a0 00 00 00 8d 50 ff 89 93 a0 00 00 00 85 c0 75 df <0f> 0b eb db 0f 1f 44 00 00 41 57 41 56 41 55 41 89 cd 41 54 49 89
      [  771.352420] RSP: 0018:ffffb33144cd3b00 EFLAGS: 00010246
      [  771.353353] RAX: 0000000000000000 RBX: ffff8b37cf4b2800 RCX: 0000000000000000
      [  771.354294] RDX: 00000000ffffffff RSI: ffff8b3b9aad0000 RDI: ffffffff8d5c6e20
      [  771.355245] RBP: ffff8b37eb546948 R08: ffffffffc0b7a348 R09: ffff8b3b9aad0000
      [  771.356189] R10: 0000000000000001 R11: ffff8b3ba7a0a1c0 R12: ffff8b37cf4b2850
      [  771.357123] R13: ffff8b3b9aad0000 R14: ffff8b37cf4b2820 R15: ffff8b37cf4b2820
      [  771.358039] FS:  00007f8a19b6e740(0000) GS:ffff8b3befa00000(0000) knlGS:0000000000000000
      [  771.358965] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  771.359885] CR2: 00007f3afb91c1a0 CR3: 000000045133c004 CR4: 00000000001606e0
      [  771.360825] Call Trace:
      [  771.361764]  __tcf_block_put+0x84/0x150
      [  771.362712]  ingress_destroy+0x1b/0x20 [sch_ingress]
      [  771.363658]  qdisc_destroy+0x3e/0xc0
      [  771.364594]  dev_shutdown+0x7a/0xa5
      [  771.365522]  rollback_registered_many+0x20d/0x530
      [  771.366458]  ? netdev_upper_dev_unlink+0x15d/0x1c0
      [  771.367387]  unregister_netdevice_many.part.0+0xf/0x70
      [  771.368310]  vxlan_netdevice_event+0xa4/0x110 [vxlan]
      [  771.369454]  notifier_call_chain+0x4c/0x70
      [  771.370579]  rollback_registered_many+0x2f5/0x530
      [  771.371719]  rollback_registered+0x56/0x90
      [  771.372843]  unregister_netdevice_queue+0x73/0xb0
      [  771.373982]  unregister_netdev+0x18/0x20
      [  771.375168]  mlx5e_vport_rep_unload+0x56/0xc0 [mlx5_core]
      [  771.376327]  esw_offloads_disable+0x81/0x90 [mlx5_core]
      [  771.377512]  mlx5_eswitch_disable_locked.cold+0xcb/0x1af [mlx5_core]
      [  771.378679]  mlx5_eswitch_disable+0x44/0x60 [mlx5_core]
      [  771.379822]  mlx5_device_disable_sriov+0xad/0xb0 [mlx5_core]
      [  771.380968]  mlx5_core_sriov_configure+0xc1/0xe0 [mlx5_core]
      [  771.382087]  sriov_numvfs_store+0xfc/0x130
      [  771.383195]  kernfs_fop_write+0xce/0x1b0
      [  771.384302]  vfs_write+0xb6/0x1a0
      [  771.385410]  ksys_write+0x5f/0xe0
      [  771.386500]  do_syscall_64+0x5b/0x1d0
      [  771.387569]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: 0fdcf78d ("net: use flow_indr_dev_setup_offload()")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: flow_offload: fix flow_indr_dev_unregister path · a1db2178
      wenxu committed
      If the representor is removed, then identify the indirect flow_blocks
      that need to be removed by the release callback and the port representor
      structure. To identify the port representor structure, a new
      indr.cb_priv field needs to be introduced. The flow_block also needs to
      be removed from the driver list from the cleanup path.
      
      Fixes: 1fac52da ("net: flow_offload: consolidate indirect flow_block infrastructure")
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_offload: use flow_indr_block_cb_alloc/remove function · 66f1939a
      wenxu committed
      Prepare to fix the bug in the next patch: use the
      flow_indr_block_cb_alloc/remove functions and remove
      __flow_block_indr_binding.
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_offload: add flow_indr_block_cb_alloc/remove function · 26f2eb27
      wenxu committed
      Add the flow_indr_block_cb_alloc/remove functions for the next fix patch.
      Signed-off-by: wenxu <wenxu@ucloud.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 19 Jun, 2020 10 commits
    • net: increment xmit_recursion level in dev_direct_xmit() · 0ad6f6e7
      Eric Dumazet committed
      Back in commit f60e5990 ("ipv6: protect skb->sk accesses
      from recursive dereference inside the stack") Hannes added code
      so that IPv6 stack would not trust skb->sk for typical cases
      where packet goes through 'standard' xmit path (__dev_queue_xmit())
      
      Alas af_packet had a dev_direct_xmit() path that was not
      dealing yet with xmit_recursion level.
      
      Also change sk_mc_loop() to dump a stack once only.
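
      The idea can be sketched with a toy model. This is not kernel code: a
      single module-level counter stands in for the per-CPU xmit_recursion
      level, and xmit_recursion(), may_trust_socket(), direct_xmit() and
      warn_once() are hypothetical names chosen for illustration.

```python
import contextlib
import warnings

# Toy model, not kernel code: one counter stands in for the per-CPU level.
recursion_level = 0


@contextlib.contextmanager
def xmit_recursion():
    """Raise the recursion level for the duration of a device transmit."""
    global recursion_level
    recursion_level += 1
    try:
        yield
    finally:
        recursion_level -= 1


def may_trust_socket():
    """Upper layers consult the level before dereferencing skb->sk."""
    return recursion_level == 0


def direct_xmit(start_xmit, guarded):
    """Model of a direct transmit path, with or without the guard."""
    if guarded:
        with xmit_recursion():
            return start_xmit()
    return start_xmit()


_warned = False


def warn_once(message):
    """Report a problem only the first time it is hit."""
    global _warned
    if _warned:
        return False
    _warned = True
    warnings.warn(message)
    return True


# Without the guard, code reached from the driver still trusts the socket.
trusted_without_guard = direct_xmit(may_trust_socket, guarded=False)
# With the guard, the same code sees a non-zero level and stops trusting it.
trusted_with_guard = direct_xmit(may_trust_socket, guarded=True)
```

      The warn_once() helper mirrors the second change in the message: the
      stack is dumped on the first hit only, not on every packet.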
      
      Without this patch, syzbot was able to trigger :
      
      [1]
      [  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
      [  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
      [  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
      [  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
      [  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
      [  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
      [  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
      [  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
      [  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
      [  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
      [  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
      [  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
      [  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
      [  153.567391] Call Trace:
      [  153.567391]  ip6_finish_output2+0x34e/0x550
      [  153.567391]  __ip6_finish_output+0xe7/0x110
      [  153.567391]  ip6_finish_output+0x2d/0xb0
      [  153.567392]  ip6_output+0x77/0x120
      [  153.567392]  ? __ip6_finish_output+0x110/0x110
      [  153.567392]  ip6_local_out+0x3d/0x50
      [  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
      [  153.567393]  ? ksize+0x19/0x30
      [  153.567393]  ipvlan_start_xmit+0x18/0x50
      [  153.567393]  dev_direct_xmit+0xf3/0x1c0
      [  153.567393]  packet_direct_xmit+0x69/0xa0
      [  153.567394]  packet_sendmsg+0xbf0/0x19b0
      [  153.567394]  ? plist_del+0x62/0xb0
      [  153.567394]  sock_sendmsg+0x65/0x70
      [  153.567394]  sock_write_iter+0x93/0xf0
      [  153.567394]  new_sync_write+0x18e/0x1a0
      [  153.567395]  __vfs_write+0x29/0x40
      [  153.567395]  vfs_write+0xb9/0x1b0
      [  153.567395]  ksys_write+0xb1/0xe0
      [  153.567395]  __x64_sys_write+0x1a/0x20
      [  153.567395]  do_syscall_64+0x43/0x70
      [  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  153.567396] RIP: 0033:0x453549
      [  153.567396] Code: Bad RIP value.
      [  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
      [  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
      [  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
      [  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
      [  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
      [  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---
      
      f60e5990 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethtool: add missing NETIF_F_GSO_FRAGLIST feature string · eddbf5d0
      Alexander Lobakin committed
      Commit 3b335832 ("net: Add fraglist GRO/GSO feature flags") missed
      an entry for NETIF_F_GSO_FRAGLIST in netdev_features_strings array. As
      a result, fraglist GSO feature is not shown in 'ethtool -k' output and
      can't be toggled on/off.
      The fix is trivial.
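
      Why a missing array entry hides the feature can be sketched with a toy
      model. This is not the ethtool code: the bit numbers, the two-entry
      table and the name strings are illustrative assumptions, and
      build_table() and find_bit() are hypothetical helpers.

```python
# Toy model, not kernel code: bit numbers and strings are illustrative.
GSO_UDP_L4_BIT, GSO_FRAGLIST_BIT = 0, 1


def build_table(entries, size=2):
    """Build a bit-indexed name table; bits without an entry get ''."""
    table = [""] * size
    for bit, name in entries.items():
        table[bit] = name
    return table


def find_bit(table, name):
    """Look a feature up by name, as a by-name toggle would have to."""
    return table.index(name) if name in table else None


# Table with the entry missing, then with the entry added.
before = build_table({GSO_UDP_L4_BIT: "tx-udp-segmentation"})
after = build_table({GSO_UDP_L4_BIT: "tx-udp-segmentation",
                     GSO_FRAGLIST_BIT: "tx-gso-list"})

print(find_bit(before, "tx-gso-list"))
print(find_bit(after, "tx-gso-list"))
```

      With the entry missing, the bit has an empty name, so nothing keyed on
      the name can list it or toggle it.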
      
      Fixes: 3b335832 ("net: Add fraglist GRO/GSO feature flags")
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: drop MP_JOIN request sock on syn cookies · 9e365ff5
      Paolo Abeni committed
      Currently any MPTCP socket using syn cookies will fall back to
      TCP at 3rd ack time. In case of MP_JOIN requests, the RFC mandates
      closing the child and sockets, but the existing error paths
      do not handle the syncookie scenario correctly.
      
      Address the issue by always forcing the child shutdown in case of
      MP_JOIN fallback.
      
      Fixes: ae2dd716 ("mptcp: handle tcp fallback when using syn cookies")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: cache msk on MP_JOIN init_req · 8fd4de12
      Paolo Abeni committed
      The msk ownership is transferred to the child socket at
      3rd ack time, so that we avoid more lookups later. If the
      request does not reach the 3rd ack, the MSK reference is
      dropped at request sock release time.
      
      As a side effect, fallback is now tracked by a NULL msk
      reference instead of zeroed 'mp_join' field. This will
      simplify the next patch.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix the arp error in some cases · 5eea3a63
      guodeqing committed
      For example:
      $ ifconfig eth0 6.6.6.6 netmask 255.255.255.0
      
      $ ip rule add from 6.6.6.6 table 6666
      
      $ ip route add 9.9.9.9 via 6.6.6.6
      
      $ ping -I 6.6.6.6 9.9.9.9
      PING 9.9.9.9 (9.9.9.9) from 6.6.6.6 : 56(84) bytes of data.
      
      3 packets transmitted, 0 received, 100% packet loss, time 2079ms
      
      $ arp
      Address     HWtype  HWaddress           Flags Mask            Iface
      6.6.6.6             (incomplete)                              eth0
      
      The ARP request address is wrong. This is because fib_table_lookup() in
      fib_check_nh() looks up the nexthop for destination 9.9.9.9 and the
      scope of the FIB result is RT_SCOPE_LINK, whereas the correct scope is
      RT_SCOPE_HOST. This patch adds a check of whether the table is
      RT_TABLE_MAIN to solve the problem.
      
      Fixes: 3bfd8472 ("net: Use passed in table for nexthop lookups")
      Signed-off-by: guodeqing <geffrey.guo@huawei.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_gate: fix configuration of the periodic timer · c362a06e
      Davide Caratti committed
      assigning a dummy value of 'clock_id' to avoid cancellation of the cycle
      timer before its initialization was a temporary solution, and we still
      need to handle the case where act_gate timer parameters are changed by
      commands like the following one:
      
       # tc action replace action gate <parameters>
      
      the fix consists in the following items:
      
      1) remove the workaround assignment of 'clock_id', and init the list of
         entries before the first error path after IDR atomic check/allocation
      2) validate 'clock_id' earlier: there is no need to do IDR atomic
         check/allocation if we know that 'clock_id' is a bad value
      3) use a dedicated function, 'gate_setup_timer()', to ensure that the
         timer is cancelled and re-initialized on action overwrite, and also
         ensure we initialize the timer in the error path of tcf_gate_init()
      
      v3: improve comment in the error path of tcf_gate_init() (thanks to
          Vladimir Oltean)
      v2: avoid 'goto' in gate_setup_timer (thanks to Cong Wang)
      
      CC: Ivan Vecera <ivecera@redhat.com>
      Fixes: a01c2454 ("net/sched: fix a couple of splats in the error path of tfc_gate_init()")
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_gate: fix NULL dereference in tcf_gate_init() · 7024339a
      Davide Caratti committed
      it is possible to see a KASAN use-after-free, immediately followed by a
      NULL dereference crash, with the following command:
      
       # tc action add action gate index 3 cycle-time 100000000ns \
       > cycle-time-ext 100000000ns clockid CLOCK_TAI
      
       BUG: KASAN: use-after-free in tcf_action_init_1+0x8eb/0x960
       Write of size 1 at addr ffff88810a5908bc by task tc/883
      
       CPU: 0 PID: 883 Comm: tc Not tainted 5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       Call Trace:
        dump_stack+0x75/0xa0
        print_address_description.constprop.6+0x1a/0x220
        kasan_report.cold.9+0x37/0x7c
        tcf_action_init_1+0x8eb/0x960
        tcf_action_init+0x157/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [...]
      
       KASAN: null-ptr-deref in range [0x0000000000000070-0x0000000000000077]
       CPU: 0 PID: 883 Comm: tc Tainted: G    B             5.7.0+ #188
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:tcf_action_fill_size+0xa3/0xf0
       [....]
       RSP: 0018:ffff88813a48f250 EFLAGS: 00010212
       RAX: dffffc0000000000 RBX: 0000000000000094 RCX: ffffffffa47c3eb6
       RDX: 000000000000000e RSI: 0000000000000008 RDI: 0000000000000070
       RBP: ffff88810a590800 R08: 0000000000000004 R09: ffffed1027491e03
       R10: 0000000000000003 R11: ffffed1027491e03 R12: 0000000000000000
       R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88810a590800
       FS:  00007f62cae8ce40(0000) GS:ffff888147c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f62c9d20a10 CR3: 000000013a52a000 CR4: 0000000000340ef0
       Call Trace:
        tcf_action_init+0x172/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x120/0x380
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This is caused by the test on 'cycletime_ext', which is still unassigned
      when the action is newly created. This makes the action .init() return 0
      without calling tcf_idr_insert(), hence the UAF + crash.
      
      Rework the logic that prevents zero values of cycle-time, as follows:
      
      1) 'tcfg_cycletime_ext' seems to be unused in the action software path,
         and it was already possible by other means to obtain non-zero
         cycletime and zero cycletime-ext. So, removing that test should not
         cause any damage.
      2) while at it, we must prevent overwriting configuration data with wrong
         values: use a temporary variable for 'tcfg_cycletime', and validate it
         preserving the original semantic (that allowed computing the cycle
         time as the sum of all intervals, when not specified by
         TCA_GATE_CYCLE_TIME).
      3) remove the test on 'tcfg_cycletime', no longer useful, and avoid
         returning -EFAULT, which did not seem an appropriate return value for
         a wrong netlink attribute.
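
The reworked validation described in points 1) to 3) can be pictured with a small user-space model. This is only a sketch, not the kernel code: `GateParams`, `update_cycletime` and the use of `ValueError` as the rejection are hypothetical names chosen for illustration, and `requested=None` stands for a request that carries no TCA_GATE_CYCLE_TIME attribute.

```python
class GateParams:
    """Hypothetical stand-in for the action's stored configuration."""

    def __init__(self, cycletime=0, intervals=None):
        self.cycletime = cycletime
        self.intervals = list(intervals or [])


def update_cycletime(params, requested, intervals):
    """Validate a new cycle time before touching the stored configuration.

    `requested` models an optional cycle-time attribute; None means the
    attribute was not supplied by user space.
    """
    # Work on a temporary so a bad request cannot clobber good data.
    cycletime = requested if requested is not None else 0
    if cycletime == 0:
        # Original semantic: fall back to the sum of all intervals.
        cycletime = sum(intervals)
    if cycletime == 0:
        # Assumption: reject with a generic "invalid value" error here.
        raise ValueError("cycle time must be non-zero")
    # Commit only after validation has passed.
    params.cycletime = cycletime
    params.intervals = list(intervals)
```

Because the stored value is written only on the last two lines, a rejected request leaves the previous configuration untouched.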
      
      v3: fix uninitialized 'cycletime' (thanks to Vladimir Oltean)
      v2: remove useless 'return;' at the end of void gate_get_start_time()
      
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      CC: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7024339a
    • T
      ip_tunnel: fix use-after-free in ip_tunnel_lookup() · ba61539c
      Taehee Yoo committed
      In the datapath, ip_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback interface is deleted,
      but there is no routine that sets fb_tunnel_dev to NULL.
      So the pointer is still used after the interface is deleted, which
      eventually results in a use-after-free.
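
The idea of clearing the fallback pointer on deletion, and testing it before use, can be sketched in a user-space model. Everything here (`Device`, `TunnelNet`, the `freed` flag, the address-pair keys) is hypothetical and only illustrates the pattern; it is not the kernel data structure or the actual patch.

```python
class Device:
    """Hypothetical tunnel device; `freed` marks memory that must not be used."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.freed = False


class TunnelNet:
    """Hypothetical per-namespace state holding a fallback device pointer."""

    def __init__(self, fb_dev):
        self.tunnels = {}            # (local, remote) -> Device
        self.fb_tunnel_dev = fb_dev  # used when no tunnel matches

    def delete(self, dev):
        # Drop every reference to the device before it is freed.
        self.tunnels = {k: d for k, d in self.tunnels.items() if d is not dev}
        if self.fb_tunnel_dev is dev:
            self.fb_tunnel_dev = None  # the step described as missing
        dev.freed = True

    def lookup(self, local, remote):
        dev = self.tunnels.get((local, remote))
        if dev is not None:
            return dev
        fb = self.fb_tunnel_dev  # read once, then test before use
        if fb is not None and fb.up:
            return fb
        return None
```

After `delete()` runs on the fallback device, `lookup()` returns None for unmatched traffic instead of handing back a freed object.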
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add gre1 type gre local 10.0.0.1 \
      	    remote 10.0.0.2
          ip netns exec A ip link set gre1 up
          ip netns exec A ip a a 10.0.100.1/24 dev gre1
          ip netns exec A ip a a 10.0.0.1/24 dev eth0
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add gre1 type gre local 10.0.0.2 \
      	    remote 10.0.0.1
          ip netns exec B ip link set gre1 up
          ip netns exec B ip a a 10.0.100.2/24 dev gre1
          ip netns exec B ip a a 10.0.0.2/24 dev eth1
          ip netns exec A hping3 10.0.100.2 -2 --flood -d 60000 &
          ip netns del B
      
      Splat looks like:
      [   77.793450][    C3] ==================================================================
      [   77.794702][    C3] BUG: KASAN: use-after-free in ip_tunnel_lookup+0xcc4/0xf30
      [   77.795573][    C3] Read of size 4 at addr ffff888060bd9c84 by task hping3/2905
      [   77.796398][    C3]
      [   77.796664][    C3] CPU: 3 PID: 2905 Comm: hping3 Not tainted 5.8.0-rc1+ #616
      [   77.797474][    C3] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   77.798453][    C3] Call Trace:
      [   77.798815][    C3]  <IRQ>
      [   77.799142][    C3]  dump_stack+0x9d/0xdb
      [   77.799605][    C3]  print_address_description.constprop.7+0x2cc/0x450
      [   77.800365][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.800908][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.801517][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.802145][    C3]  kasan_report+0x154/0x190
      [   77.802821][    C3]  ? ip_tunnel_lookup+0xcc4/0xf30
      [   77.803503][    C3]  ip_tunnel_lookup+0xcc4/0xf30
      [   77.804165][    C3]  __ipgre_rcv+0x1ab/0xaa0 [ip_gre]
      [   77.804862][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.805621][    C3]  gre_rcv+0x304/0x1910 [ip_gre]
      [   77.806293][    C3]  ? lock_acquire+0x1a9/0x870
      [   77.806925][    C3]  ? gre_rcv+0xfe/0x354 [gre]
      [   77.807559][    C3]  ? erspan_xmit+0x2e60/0x2e60 [ip_gre]
      [   77.808305][    C3]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   77.809032][    C3]  ? rcu_read_lock_held+0x90/0xa0
      [   77.809713][    C3]  gre_rcv+0x1b8/0x354 [gre]
      [ ... ]
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba61539c
    • T
      ip6_gre: fix use-after-free in ip6gre_tunnel_lookup() · dafabb65
      Taehee Yoo committed
      In the datapath, ip6gre_tunnel_lookup() is used, and it internally uses the
      fallback tunnel device pointer, fb_tunnel_dev.
      This pointer should be set to NULL when the fallback interface is deleted,
      but there is no routine that sets fb_tunnel_dev to NULL.
      So the pointer is still used after the interface is deleted, which
      eventually results in a use-after-free.
      
      Test commands:
          ip netns add A
          ip netns add B
          ip link add eth0 type veth peer name eth1
          ip link set eth0 netns A
          ip link set eth1 netns B
      
          ip netns exec A ip link set lo up
          ip netns exec A ip link set eth0 up
          ip netns exec A ip link add ip6gre1 type ip6gre local fc:0::1 \
      	    remote fc:0::2
          ip netns exec A ip -6 a a fc:100::1/64 dev ip6gre1
          ip netns exec A ip link set ip6gre1 up
          ip netns exec A ip -6 a a fc:0::1/64 dev eth0
          ip netns exec A ip link set ip6gre0 up
      
          ip netns exec B ip link set lo up
          ip netns exec B ip link set eth1 up
          ip netns exec B ip link add ip6gre1 type ip6gre local fc:0::2 \
      	    remote fc:0::1
          ip netns exec B ip -6 a a fc:100::2/64 dev ip6gre1
          ip netns exec B ip link set ip6gre1 up
          ip netns exec B ip -6 a a fc:0::2/64 dev eth1
          ip netns exec B ip link set ip6gre0 up
          ip netns exec A ping fc:100::2 -s 60000 &
          ip netns del B
      
      Splat looks like:
      [   73.087285][    C1] BUG: KASAN: use-after-free in ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.088361][    C1] Read of size 4 at addr ffff888040559218 by task ping/1429
      [   73.089317][    C1]
      [   73.089638][    C1] CPU: 1 PID: 1429 Comm: ping Not tainted 5.7.0+ #602
      [   73.090531][    C1] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   73.091725][    C1] Call Trace:
      [   73.092160][    C1]  <IRQ>
      [   73.092556][    C1]  dump_stack+0x96/0xdb
      [   73.093122][    C1]  print_address_description.constprop.6+0x2cc/0x450
      [   73.094016][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.094894][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.095767][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.096619][    C1]  kasan_report+0x154/0x190
      [   73.097209][    C1]  ? ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.097989][    C1]  ip6gre_tunnel_lookup+0x1064/0x13f0 [ip6_gre]
      [   73.098750][    C1]  ? gre_del_protocol+0x60/0x60 [gre]
      [   73.099500][    C1]  gre_rcv+0x1c5/0x1450 [ip6_gre]
      [   73.100199][    C1]  ? ip6gre_header+0xf00/0xf00 [ip6_gre]
      [   73.100985][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.101830][    C1]  ? ip6_input_finish+0x5/0xf0
      [   73.102483][    C1]  ip6_protocol_deliver_rcu+0xcbb/0x1510
      [   73.103296][    C1]  ip6_input_finish+0x5b/0xf0
      [   73.103920][    C1]  ip6_input+0xcd/0x2c0
      [   73.104473][    C1]  ? ip6_input_finish+0xf0/0xf0
      [   73.105115][    C1]  ? rcu_read_lock_held+0x90/0xa0
      [   73.105783][    C1]  ? rcu_read_lock_sched_held+0xc0/0xc0
      [   73.106548][    C1]  ipv6_rcv+0x1f1/0x300
      [ ... ]
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dafabb65
    • Y
      net: fix memleak in register_netdevice() · 814152a8
      Yang Yingliang committed
      I got a memleak report while fuzz testing:
      
      unreferenced object 0xffff888112584000 (size 13599):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
          00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000002f60ba65>] __kmalloc_node+0x309/0x3a0
          [<0000000075b211ec>] kvmalloc_node+0x7f/0xc0
          [<00000000d3a97396>] alloc_netdev_mqs+0x76/0xfc0
          [<00000000609c3655>] __tun_chr_ioctl+0x1456/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff888111845cc0 (size 8):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 8 bytes):
          74 61 70 30 00 88 ff ff                          tap0....
        backtrace:
          [<000000004c159777>] kstrdup+0x35/0x70
          [<00000000d8b496ad>] kstrdup_const+0x3d/0x50
          [<00000000494e884a>] kvasprintf_const+0xf1/0x180
          [<0000000097880a2b>] kobject_set_name_vargs+0x56/0x140
          [<000000008fbdfc7b>] dev_set_name+0xab/0xe0
          [<000000005b99e3b4>] netdev_register_kobject+0xc0/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      unreferenced object 0xffff88811886d800 (size 512):
        comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
        hex dump (first 32 bytes):
          00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
          ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
        backtrace:
          [<0000000050315800>] device_add+0x61e/0x1950
          [<0000000021008dfb>] netdev_register_kobject+0x17e/0x390
          [<00000000602704fe>] register_netdevice+0xb61/0x1250
          [<000000002b7ca244>] __tun_chr_ioctl+0x1cd1/0x3d70
          [<000000001127ca24>] ksys_ioctl+0xe5/0x130
          [<00000000b7d5e66a>] __x64_sys_ioctl+0x6f/0xb0
          [<00000000e1023498>] do_syscall_64+0x56/0xa0
          [<000000009ec0eb12>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      If call_netdevice_notifiers() fails, rollback_registered() calls
      netdev_unregister_kobject(), which holds a reference on the kobject.
      That reference is never put because the netdev is not added to the
      todo list, so the memory leaks. Put the reference explicitly to
      avoid the memleak.
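
The leak pattern can be illustrated with a toy reference counter. This is a deliberately simplified sketch: `KObject`, `register_device` and the `put_on_failure` switch are hypothetical, and the model collapses the kernel's registration and rollback steps into a single function.

```python
class KObject:
    """Hypothetical reference-counted object, released when the count hits zero."""

    def __init__(self):
        self.refs = 1
        self.released = False

    def put(self):
        self.refs -= 1
        if self.refs == 0:
            self.released = True


def register_device(kobj, notifiers_ok, put_on_failure=True):
    """Return True on success; on notifier failure roll back and return False."""
    if notifiers_ok:
        return True
    # Rollback path: the device never reaches the todo list in this model,
    # so nothing else will drop the reference held on the kobject.
    if put_on_failure:
        kobj.put()
    return False
```

Calling `register_device(kobj, False, put_on_failure=False)` models the leaking behaviour: the function fails but the object is never released.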
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      814152a8
  5. 18 Jun, 2020 3 commits
  6. 17 Jun, 2020 1 commit
    • E
      tcp: grow window for OOO packets only for SACK flows · 66205121
      Eric Dumazet committed
      Back in 2013, we made a change that broke fast retransmit
      for non-SACK flows.
      
      Indeed, for these flows, a sender needs to receive three duplicate
      ACKs before starting fast retransmit. ACKs sent with a different
      receive window do not count.
      
      Even if enabling SACK is strongly recommended these days,
      there still are some cases where it has to be disabled.
      
      Not increasing the window seems better than having to
      rely on RTO.
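
The sender-side rule above can be sketched as a small counter. This is a simplified model rather than the kernel's logic: `fast_retransmit_triggered` and its `(ack_number, window)` input format are hypothetical, and other conditions a real stack checks (outstanding data, no payload in the ACK) are left out.

```python
DUPACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit


def fast_retransmit_triggered(acks):
    """Scan (ack_number, window) pairs as seen by a non-SACK sender.

    An ACK counts as a duplicate only if it repeats both the ACK number
    and the advertised window of the previous ACK.
    """
    dupacks = 0
    last = None
    for ack_no, win in acks:
        if last is None or ack_no != last[0]:
            dupacks = 0  # first ACK seen, or new data acknowledged
        elif win == last[1]:
            dupacks += 1
            if dupacks >= DUPACK_THRESHOLD:
                return True
        # Same ACK number with a different window is a window update
        # and is not counted.
        last = (ack_no, win)
    return False
```

With a constant window, four ACKs for 1001 (one original plus three duplicates) reach the threshold. If each of those ACKs advertised a larger window than the last, none would count and the sender would fall back to RTO.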
      
      After the fix, the following packetdrill test gives:
      
      // Initialize connection
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1000,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,wscale 8>
         +0 < . 1:1(0) ack 1 win 514
      
         +0 accept(3, ..., ...) = 4
      
         +0 < . 1:1001(1000) ack 1 win 514
      // Quick ack
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 2001:3001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 3001:4001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 4001:5001(1000) ack 1 win 514
      // DUPACK : Normally we should not change the window
         +0 > . 1:1(0) ack 1001 win 264
      
         +0 < . 1001:2001(1000) ack 1 win 514
      // Hole is repaired.
         +0 > . 1:1(0) ack 5001 win 272
      
      Fixes: 4e4f1fc2 ("tcp: properly increase rcv_ssthresh for ofo packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      66205121
  7. 16 Jun, 2020 7 commits
  8. 15 Jun, 2020 3 commits
  9. 14 Jun, 2020 2 commits
  10. 13 Jun, 2020 4 commits
  11. 12 Jun, 2020 3 commits
    • P
      netfilter: nf_tables: hook list memleak in flowtable deletion · 3003055f
      Pablo Neira Ayuso committed
      After looking up the flowtable hooks that need to be removed,
      release the hook objects in the deletion list. The error path needs to
      release these hook objects too.
      
      Fixes: abadb2f8 ("netfilter: nf_tables: delete devices from flowtable")
      Reported-by: syzbot+eb9d5924c51d6d59e094@syzkaller.appspotmail.com
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      3003055f
    • D
      rxrpc: Fix race between incoming ACK parser and retransmitter · 2ad6691d
      David Howells committed
      There's a race between the retransmission code and the received ACK parser.
      The problem is that the retransmission loop has to drop the lock under
      which it is iterating through the transmission buffer in order to transmit
      a packet, but whilst the lock is dropped, the ACK parser can crank the Tx
      window round and discard the packets from the buffer.
      
      The retransmission code then updated the annotations for the wrong packet
      and a later retransmission thought it had to retransmit a packet that
      wasn't there, leading to a NULL pointer dereference.
      
      Fix this by:
      
       (1) Moving the annotation change to before we drop the lock prior to
           transmission.  This means we can't vary the annotation depending on
           the outcome of the transmission, but that's fine - we'll retransmit
           again later if it failed now.
      
       (2) Skipping the packet if the skb pointer is NULL.
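
The two-part fix can be pictured with a toy retransmission loop. This is only a sketch under stated assumptions: `resend`, the slot dictionaries and the "RETRANS"/"UNACK" annotation strings are hypothetical, locking is represented by comments only, and the `transmit` callback stands for the window in which the ACK parser may modify the buffer.

```python
def resend(buffer, transmit):
    """Retransmit every slot marked for retransmission.

    `buffer` is a list of hypothetical slots: {"skb": ..., "annotation": ...}.
    """
    sent = []
    for slot in buffer:
        if slot["annotation"] != "RETRANS":
            continue
        skb = slot["skb"]
        if skb is None:
            continue  # (2) the packet was already discarded, so skip it
        # (1) Update the annotation while the lock would still be held.
        slot["annotation"] = "UNACK"
        # The lock would be dropped around this call, so other code may
        # discard packets from the buffer while it runs.
        transmit(skb)
        sent.append(skb)
    return sent
```

Since the annotation is written before the transmission, nothing in the loop touches a slot after the point where the buffer may have changed underneath it.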
      
      The following oops was seen:
      
      	BUG: kernel NULL pointer dereference, address: 000000000000002d
      	Workqueue: krxrpcd rxrpc_process_call
      	RIP: 0010:rxrpc_get_skb+0x14/0x8a
      	...
      	Call Trace:
      	 rxrpc_resend+0x331/0x41e
      	 ? get_vtime_delta+0x13/0x20
      	 rxrpc_process_call+0x3c0/0x4ac
      	 process_one_work+0x18f/0x27f
      	 worker_thread+0x1a3/0x247
      	 ? create_worker+0x17d/0x17d
      	 kthread+0xe6/0xeb
      	 ? kthread_delayed_work_timer_fn+0x83/0x83
      	 ret_from_fork+0x1f/0x30
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2ad6691d
    • L
      xdp: Fix xsk_generic_xmit errno · aa2cad06
      Li RongQing committed
      Propagate the sock_alloc_send_skb() error code instead of
      unconditionally setting it to EAGAIN when skb allocation fails,
      which might cause user space to loop unnecessarily.
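
The difference between masking and propagating the allocation error can be shown with a short model. The functions `alloc_skb`, `xmit_always_eagain` and `xmit_propagate` are hypothetical stand-ins written for this illustration; they do not mirror the kernel's signatures.

```python
import errno


def alloc_skb(fail_with=None):
    """Hypothetical allocator returning (buffer, err); err is a negative errno."""
    if fail_with is not None:
        return None, -fail_with
    return bytearray(64), 0


def xmit_always_eagain(fail_with=None):
    """Old behaviour in this model: every allocation failure becomes -EAGAIN."""
    skb, err = alloc_skb(fail_with)
    if skb is None:
        return -errno.EAGAIN  # the real cause is hidden from the caller
    return 0


def xmit_propagate(fail_with=None):
    """New behaviour in this model: the allocator's error is passed through."""
    skb, err = alloc_skb(fail_with)
    if skb is None:
        return err  # the caller sees why the allocation failed
    return 0
```

A caller that retries on EAGAIN would keep looping on the first variant even when the underlying failure is not transient, which is the unnecessary looping the commit message refers to.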
      
      Fixes: 35fcde7f ("xsk: support for Tx")
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/1591852266-24017-1-git-send-email-lirongqing@baidu.com
      aa2cad06