1. 14 7月, 2022 1 次提交
  2. 10 5月, 2022 1 次提交
    • E
      ipv6: annotate accesses to fn->fn_sernum · 90b4738b
      Eric Dumazet 提交于
      stable inclusion
      from stable-v5.10.96
      commit 4cd0ef621509950b30503a4d2fd7047cb7eaf0de
      bugzilla: https://gitee.com/openeuler/kernel/issues/I55NWB
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4cd0ef621509950b30503a4d2fd7047cb7eaf0de
      
      --------------------------------
      
      commit aafc2e32 upstream.
      
      struct fib6_node's fn_sernum field can be
      read while other threads change it.
      
      Add READ_ONCE()/WRITE_ONCE() annotations.
      
      Do not change existing smp barriers in fib6_get_cookie_safe()
      and __fib6_update_sernum_upto_root()
      
      syzbot reported:
      
      BUG: KCSAN: data-race in fib6_clean_node / inet6_csk_route_socket
      
      write to 0xffff88813df62e2c of 4 bytes by task 1920 on cpu 1:
       fib6_clean_node+0xc2/0x260 net/ipv6/ip6_fib.c:2178
       fib6_walk_continue+0x38e/0x430 net/ipv6/ip6_fib.c:2112
       fib6_walk net/ipv6/ip6_fib.c:2160 [inline]
       fib6_clean_tree net/ipv6/ip6_fib.c:2240 [inline]
       __fib6_clean_all+0x1a9/0x2e0 net/ipv6/ip6_fib.c:2256
       fib6_flush_trees+0x6c/0x80 net/ipv6/ip6_fib.c:2281
       rt_genid_bump_ipv6 include/net/net_namespace.h:488 [inline]
       addrconf_dad_completed+0x57f/0x870 net/ipv6/addrconf.c:4230
       addrconf_dad_work+0x908/0x1170
       process_one_work+0x3f6/0x960 kernel/workqueue.c:2307
       worker_thread+0x616/0xa70 kernel/workqueue.c:2454
       kthread+0x1bf/0x1e0 kernel/kthread.c:359
       ret_from_fork+0x1f/0x30
      
      read to 0xffff88813df62e2c of 4 bytes by task 15701 on cpu 0:
       fib6_get_cookie_safe include/net/ip6_fib.h:285 [inline]
       rt6_get_cookie include/net/ip6_fib.h:306 [inline]
       ip6_dst_store include/net/ip6_route.h:234 [inline]
       inet6_csk_route_socket+0x352/0x3c0 net/ipv6/inet6_connection_sock.c:109
       inet6_csk_xmit+0x91/0x1e0 net/ipv6/inet6_connection_sock.c:121
       __tcp_transmit_skb+0x1323/0x1840 net/ipv4/tcp_output.c:1402
       tcp_transmit_skb net/ipv4/tcp_output.c:1420 [inline]
       tcp_write_xmit+0x1450/0x4460 net/ipv4/tcp_output.c:2680
       __tcp_push_pending_frames+0x68/0x1c0 net/ipv4/tcp_output.c:2864
       tcp_push+0x2d9/0x2f0 net/ipv4/tcp.c:725
       mptcp_push_release net/mptcp/protocol.c:1491 [inline]
       __mptcp_push_pending+0x46c/0x490 net/mptcp/protocol.c:1578
       mptcp_sendmsg+0x9ec/0xa50 net/mptcp/protocol.c:1764
       inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg net/socket.c:725 [inline]
       kernel_sendmsg+0x97/0xd0 net/socket.c:745
       sock_no_sendpage+0x84/0xb0 net/core/sock.c:3086
       inet_sendpage+0x9d/0xc0 net/ipv4/af_inet.c:834
       kernel_sendpage+0x187/0x200 net/socket.c:3492
       sock_sendpage+0x5a/0x70 net/socket.c:1007
       pipe_to_sendpage+0x128/0x160 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x207/0x500 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0x94/0xd0 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x80/0xa0 fs/splice.c:936
       splice_direct_to_actor+0x345/0x650 fs/splice.c:891
       do_splice_direct+0x106/0x190 fs/splice.c:979
       do_sendfile+0x675/0xc40 fs/read_write.c:1245
       __do_sys_sendfile64 fs/read_write.c:1310 [inline]
       __se_sys_sendfile64 fs/read_write.c:1296 [inline]
       __x64_sys_sendfile64+0x102/0x140 fs/read_write.c:1296
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x0000026f -> 0x00000271
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 15701 Comm: syz-executor.2 Not tainted 5.16.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      The Fixes tag I chose is probably arbitrary, I do not think
      we need to backport this patch to older kernels.
      
      Fixes: c5cff856 ("ipv6: add rcu grace period before freeing fib6_node")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20220120174112.1126644-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYu Liao <liaoyu15@huawei.com>
      Reviewed-by: NWei Li <liwei391@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      90b4738b
  3. 08 3月, 2022 1 次提交
  4. 28 1月, 2022 5 次提交
  5. 14 1月, 2022 1 次提交
  6. 15 11月, 2021 1 次提交
  7. 19 10月, 2021 2 次提交
  8. 15 10月, 2021 1 次提交
  9. 15 6月, 2021 1 次提交
    • C
      ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions · 6b967910
      Coco Li 提交于
      stable inclusion
      from stable-5.10.43
      commit 09870235827451409ff546b073d754a19fd17e2e
      bugzilla: 109284
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 821bbf79 ]
      
      Reported by syzbot:
      HEAD commit:    90c911ad Merge tag 'fixes' of git://git.kernel.org/pub/scm..
      git tree:       git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
      dashboard link: https://syzkaller.appspot.com/bug?extid=123aa35098fd3c000eb7
      compiler:       Debian clang version 11.0.1-2
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
      BUG: KASAN: slab-out-of-bounds in fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
      Read of size 8 at addr ffff8880145c78f8 by task syz-executor.4/17760
      
      CPU: 0 PID: 17760 Comm: syz-executor.4 Not tainted 5.12.0-rc8-syzkaller #0
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x202/0x31e lib/dump_stack.c:120
       print_address_description+0x5f/0x3b0 mm/kasan/report.c:232
       __kasan_report mm/kasan/report.c:399 [inline]
       kasan_report+0x15c/0x200 mm/kasan/report.c:416
       fib6_nh_get_excptn_bucket net/ipv6/route.c:1604 [inline]
       fib6_nh_flush_exceptions+0xbd/0x360 net/ipv6/route.c:1732
       fib6_nh_release+0x9a/0x430 net/ipv6/route.c:3536
       fib6_info_destroy_rcu+0xcb/0x1c0 net/ipv6/ip6_fib.c:174
       rcu_do_batch kernel/rcu/tree.c:2559 [inline]
       rcu_core+0x8f6/0x1450 kernel/rcu/tree.c:2794
       __do_softirq+0x372/0x7a6 kernel/softirq.c:345
       invoke_softirq kernel/softirq.c:221 [inline]
       __irq_exit_rcu+0x22c/0x260 kernel/softirq.c:422
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:434
       sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
       </IRQ>
       asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
      RIP: 0010:lock_acquire+0x1f6/0x720 kernel/locking/lockdep.c:5515
      Code: f6 84 24 a1 00 00 00 02 0f 85 8d 02 00 00 f7 c3 00 02 00 00 49 bd 00 00 00 00 00 fc ff df 74 01 fb 48 c7 44 24 40 0e 36 e0 45 <4b> c7 44 3d 00 00 00 00 00 4b c7 44 3d 09 00 00 00 00 43 c7 44 3d
      RSP: 0018:ffffc90009e06560 EFLAGS: 00000206
      RAX: 1ffff920013c0cc0 RBX: 0000000000000246 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffffc90009e066e0 R08: dffffc0000000000 R09: fffffbfff1f992b1
      R10: fffffbfff1f992b1 R11: 0000000000000000 R12: 0000000000000000
      R13: dffffc0000000000 R14: 0000000000000000 R15: 1ffff920013c0cb4
       rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:267
       rcu_read_lock include/linux/rcupdate.h:656 [inline]
       ext4_get_group_info+0xea/0x340 fs/ext4/ext4.h:3231
       ext4_mb_prefetch+0x123/0x5d0 fs/ext4/mballoc.c:2212
       ext4_mb_regular_allocator+0x8a5/0x28f0 fs/ext4/mballoc.c:2379
       ext4_mb_new_blocks+0xc6e/0x24f0 fs/ext4/mballoc.c:4982
       ext4_ext_map_blocks+0x2be3/0x7210 fs/ext4/extents.c:4238
       ext4_map_blocks+0xab3/0x1cb0 fs/ext4/inode.c:638
       ext4_getblk+0x187/0x6c0 fs/ext4/inode.c:848
       ext4_bread+0x2a/0x1c0 fs/ext4/inode.c:900
       ext4_append+0x1a4/0x360 fs/ext4/namei.c:67
       ext4_init_new_dir+0x337/0xa10 fs/ext4/namei.c:2768
       ext4_mkdir+0x4b8/0xc00 fs/ext4/namei.c:2814
       vfs_mkdir+0x45b/0x640 fs/namei.c:3819
       ovl_do_mkdir fs/overlayfs/overlayfs.h:161 [inline]
       ovl_mkdir_real+0x53/0x1a0 fs/overlayfs/dir.c:146
       ovl_create_real+0x280/0x490 fs/overlayfs/dir.c:193
       ovl_workdir_create+0x425/0x600 fs/overlayfs/super.c:788
       ovl_make_workdir+0xed/0x1140 fs/overlayfs/super.c:1355
       ovl_get_workdir fs/overlayfs/super.c:1492 [inline]
       ovl_fill_super+0x39ee/0x5370 fs/overlayfs/super.c:2035
       mount_nodev+0x52/0xe0 fs/super.c:1413
       legacy_get_tree+0xea/0x180 fs/fs_context.c:592
       vfs_get_tree+0x86/0x270 fs/super.c:1497
       do_new_mount fs/namespace.c:2903 [inline]
       path_mount+0x196f/0x2be0 fs/namespace.c:3233
       do_mount fs/namespace.c:3246 [inline]
       __do_sys_mount fs/namespace.c:3454 [inline]
       __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3431
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f68f2b87188 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 000000000056bf60 RCX: 00000000004665f9
      RDX: 00000000200000c0 RSI: 0000000020000000 RDI: 000000000040000a
      RBP: 00000000004bfbb9 R08: 0000000020000100 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf60
      R13: 00007ffe19002dff R14: 00007f68f2b87300 R15: 0000000000022000
      
      Allocated by task 17768:
       kasan_save_stack mm/kasan/common.c:38 [inline]
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:427 [inline]
       ____kasan_kmalloc+0xc2/0xf0 mm/kasan/common.c:506
       kasan_kmalloc include/linux/kasan.h:233 [inline]
       __kmalloc+0xb4/0x380 mm/slub.c:4055
       kmalloc include/linux/slab.h:559 [inline]
       kzalloc include/linux/slab.h:684 [inline]
       fib6_info_alloc+0x2c/0xd0 net/ipv6/ip6_fib.c:154
       ip6_route_info_create+0x55d/0x1a10 net/ipv6/route.c:3638
       ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
       inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
       rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2433
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Last potentially related work creation:
       kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
       kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
       __call_rcu kernel/rcu/tree.c:3039 [inline]
       call_rcu+0x1b1/0xa30 kernel/rcu/tree.c:3114
       fib6_info_release include/net/ip6_fib.h:337 [inline]
       ip6_route_info_create+0x10c4/0x1a10 net/ipv6/route.c:3718
       ip6_route_add+0x22/0x120 net/ipv6/route.c:3728
       inet6_rtm_newroute+0x2cd/0x2260 net/ipv6/route.c:5352
       rtnetlink_rcv_msg+0xb34/0xe70 net/core/rtnetlink.c:5553
       netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2502
       netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
       netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1338
       netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1927
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg net/socket.c:674 [inline]
       ____sys_sendmsg+0x5a2/0x900 net/socket.c:2350
       ___sys_sendmsg net/socket.c:2404 [inline]
       __sys_sendmsg+0x319/0x400 net/socket.c:2433
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x27/0x50 mm/kasan/common.c:38
       kasan_record_aux_stack+0xee/0x120 mm/kasan/generic.c:345
       insert_work+0x54/0x400 kernel/workqueue.c:1331
       __queue_work+0x981/0xcc0 kernel/workqueue.c:1497
       queue_work_on+0x111/0x200 kernel/workqueue.c:1524
       queue_work include/linux/workqueue.h:507 [inline]
       call_usermodehelper_exec+0x283/0x470 kernel/umh.c:433
       kobject_uevent_env+0x1349/0x1730 lib/kobject_uevent.c:617
       kvm_uevent_notify_change+0x309/0x3b0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4809
       kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:877 [inline]
       kvm_put_kvm+0x9c/0xd10 arch/x86/kvm/../../../virt/kvm/kvm_main.c:920
       kvm_vcpu_release+0x53/0x60 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3120
       __fput+0x352/0x7b0 fs/file_table.c:280
       task_work_run+0x146/0x1c0 kernel/task_work.c:140
       tracehook_notify_resume include/linux/tracehook.h:189 [inline]
       exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
       exit_to_user_mode_prepare+0x10b/0x1e0 kernel/entry/common.c:208
       __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
       syscall_exit_to_user_mode+0x26/0x70 kernel/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff8880145c7800
       which belongs to the cache kmalloc-192 of size 192
      The buggy address is located 56 bytes to the right of
       192-byte region [ffff8880145c7800, ffff8880145c78c0)
      The buggy address belongs to the page:
      page:ffffea00005171c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x145c7
      flags: 0xfff00000000200(slab)
      raw: 00fff00000000200 ffffea00006474c0 0000000200000002 ffff888010c41a00
      raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880145c7780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8880145c7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880145c7880: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
                                                                      ^
       ffff8880145c7900: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8880145c7980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      ==================================================================
      
      In the ip6_route_info_create function, in the case that the nh pointer
      is not NULL, the fib6_nh in fib6_info has not been allocated.
      Therefore, when trying to free fib6_info in this error case using
      fib6_info_release, the function will call fib6_info_destroy_rcu,
      which it will access fib6_nh_release(f6i->fib6_nh);
      However, f6i->fib6_nh doesn't have any refcount yet given the lack of allocation
      causing the reported memory issue above.
      Therefore, releasing the empty pointer directly instead would be the solution.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Fixes: 706ec919 ("ipv6: Fix nexthop refcnt leak when creating ipv6 route info")
      Signed-off-by: NCoco Li <lixiaoyan@google.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      6b967910
  10. 26 4月, 2021 1 次提交
  11. 19 4月, 2021 1 次提交
  12. 10 10月, 2020 1 次提交
    • G
      net: ipv6: Discard next-hop MTU less than minimum link MTU · 4a65dff8
      Georg Kohmann 提交于
      When a ICMPV6_PKT_TOOBIG report a next-hop MTU that is less than the IPv6
      minimum link MTU, the estimated path MTU is reduced to the minimum link
      MTU. This behaviour breaks TAHI IPv6 Core Conformance Test v6LC4.1.6:
      Packet Too Big Less than IPv6 MTU.
      
      Referring to RFC 8201 section 4: "If a node receives a Packet Too Big
      message reporting a next-hop MTU that is less than the IPv6 minimum link
      MTU, it must discard it. A node must not reduce its estimate of the Path
      MTU below the IPv6 minimum link MTU on receipt of a Packet Too Big
      message."
      
      Drop the path MTU update if reported MTU is less than the minimum link MTU.
      Signed-off-by: NGeorg Kohmann <geokohma@cisco.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      4a65dff8
  13. 22 9月, 2020 1 次提交
  14. 12 9月, 2020 1 次提交
  15. 29 7月, 2020 1 次提交
  16. 26 7月, 2020 1 次提交
  17. 22 7月, 2020 1 次提交
  18. 08 7月, 2020 1 次提交
    • D
      ipv6: Fix use of anycast address with loopback · aea23c32
      David Ahern 提交于
      Thomas reported a regression with IPv6 and anycast using the following
      reproducer:
      
          echo 1 >  /proc/sys/net/ipv6/conf/all/forwarding
          ip -6 a add fc12::1/16 dev lo
          sleep 2
          echo "pinging lo"
          ping6 -c 2 fc12::
      
      The conversion of addrconf_f6i_alloc to use ip6_route_info_create missed
      the use of fib6_is_reject which checks addresses added to the loopback
      interface and sets the REJECT flag as needed. Update fib6_is_reject for
      loopback checks to handle RTF_ANYCAST addresses.
      
      Fixes: c7a1ce39 ("ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create")
      Reported-by: thomas.gambier@nexedi.com
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aea23c32
  19. 07 7月, 2020 1 次提交
    • D
      ipv6: fib6_select_path can not use out path for nexthop objects · 34fe5a1c
      David Ahern 提交于
      Brian reported a crash in IPv6 code when using rpfilter with a setup
      running FRR and external nexthop objects. The root cause of the crash
      is fib6_select_path setting fib6_nh in the result to NULL because of
      an improper check for nexthop objects.
      
      More specifically, rpfilter invokes ip6_route_lookup with flowi6_oif
      set causing fib6_select_path to be called with have_oif_match set.
      fib6_select_path has early check on have_oif_match and jumps to the
      out label which presumes a builtin fib6_nh. This path is invalid for
      nexthop objects; for external nexthops fib6_select_path needs to just
      return if the fib6_nh has already been set in the result otherwise it
      returns after the call to nexthop_path_fib6_result. Update the check
      on have_oif_match to not bail on external nexthops.
      
      Update selftests for this problem.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Reported-by: NBrian Rak <brak@choopa.com>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34fe5a1c
  20. 24 6月, 2020 1 次提交
    • B
      ipv6: fib6: avoid indirect calls from fib6_rule_lookup · 55cced4f
      Brian Vazquez 提交于
      It was reported that a considerable amount of cycles were spent on the
      expensive indirect calls on fib6_rule_lookup. This patch introduces an
      inline helper called pol_route_func that uses the indirect_call_wrappers
      to avoid the indirect calls.
      
      This patch saves around 50ns per call.
      
      Performance was measured on the receiver by checking the amount of
      syncookies that server was able to generate under a synflood load.
      
      Traffic was generated using trafgen[1] which was pushing around 1Mpps on
      a single queue. Receiver was using only one rx queue which help to
      create a bottle neck and make the experiment rx-bounded.
      
      These are the syncookies generated over 10s from the different runs:
      
      Whithout the patch:
      TcpExtSyncookiesSent            3553749            0.0
      TcpExtSyncookiesSent            3550895            0.0
      TcpExtSyncookiesSent            3553845            0.0
      TcpExtSyncookiesSent            3541050            0.0
      TcpExtSyncookiesSent            3539921            0.0
      TcpExtSyncookiesSent            3557659            0.0
      TcpExtSyncookiesSent            3526812            0.0
      TcpExtSyncookiesSent            3536121            0.0
      TcpExtSyncookiesSent            3529963            0.0
      TcpExtSyncookiesSent            3536319            0.0
      
      With the patch:
      TcpExtSyncookiesSent            3611786            0.0
      TcpExtSyncookiesSent            3596682            0.0
      TcpExtSyncookiesSent            3606878            0.0
      TcpExtSyncookiesSent            3599564            0.0
      TcpExtSyncookiesSent            3601304            0.0
      TcpExtSyncookiesSent            3609249            0.0
      TcpExtSyncookiesSent            3617437            0.0
      TcpExtSyncookiesSent            3608765            0.0
      TcpExtSyncookiesSent            3620205            0.0
      TcpExtSyncookiesSent            3601895            0.0
      
      Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
      the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
      estimate of 50 ns per packet.
      
      [1] http://netsniff-ng.org/
      
      Changelog since v1:
       - Change ordering in the ICW (Paolo Abeni)
      
      Cc: Luigi Rizzo <lrizzo@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NBrian Vazquez <brianvv@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55cced4f
  21. 23 5月, 2020 1 次提交
  22. 19 5月, 2020 1 次提交
  23. 14 5月, 2020 3 次提交
  24. 10 5月, 2020 1 次提交
  25. 09 5月, 2020 2 次提交
  26. 08 5月, 2020 1 次提交
    • M
      Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu" · 09454fd0
      Maciej Żenczykowski 提交于
      This reverts commit 19bda36c:
      
      | ipv6: add mtu lock check in __ip6_rt_update_pmtu
      |
      | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
      | It leaded to that mtu lock doesn't really work when receiving the pkt
      | of ICMPV6_PKT_TOOBIG.
      |
      | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
      | did in __ip_rt_update_pmtu.
      
      The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
      There's already a comment to this effect elsewhere in the kernel:
      
        $ git grep -p -B1 -A3 'RTAX_MTU lock'
        net/ipv6/route.c=4813=
      
        static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
        ...
          /* In IPv6 pmtu discovery is not optional,
             so that RTAX_MTU lock cannot disable it.
             We still use this lock to block changes
             caused by addrconf/ndisc.
          */
      
      This reverts to the pre-4.9 behaviour.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Fixes: 19bda36c ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09454fd0
  27. 02 5月, 2020 1 次提交
    • D
      ipv6: Use global sernum for dst validation with nexthop objects · 8f34e53b
      David Ahern 提交于
      Nik reported a bug with pcpu dst cache when nexthop objects are
      used illustrated by the following:
          $ ip netns add foo
          $ ip -netns foo li set lo up
          $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
          $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
          $ ip li add veth1 type veth peer name veth2
          $ ip li set veth1 up
          $ ip addr add 2001:db8:10::1/64 dev veth1
          $ ip li set dev veth2 netns foo
          $ ip -netns foo li set veth2 up
          $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
          $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Create a pcpu entry on cpu 0:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
      
          Re-add the route entry:
          $ ip -6 ro del 2001:db8:11::1
          $ ip -6 route add 2001:db8:11::1/128 nhid 100
      
          Route get on cpu 0 returns the stale pcpu:
          $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
          RTNETLINK answers: Network is unreachable
      
          While cpu 1 works:
          $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
          2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium
      
      Conversion of FIB entries to work with external nexthop objects
      missed an important difference between IPv4 and IPv6 - how dst
      entries are invalidated when the FIB changes. IPv4 has a per-network
      namespace generation id (rt_genid) that is bumped on changes to the FIB.
      Checking if a dst_entry is still valid means comparing rt_genid in the
      rtable to the current value of rt_genid for the namespace.
      
      IPv6 also has a per network namespace counter, fib6_sernum, but the
      count is saved per fib6_node. With the per-node counter only dst_entries
      based on fib entries under the node are invalidated when changes are
      made to the routes - limiting the scope of invalidations. IPv6 uses a
      reference in the rt6_info, 'from', to track the corresponding fib entry
      used to create the dst_entry. When validating a dst_entry, the 'from'
      is used to backtrack to the fib6_node and check the sernum of it to the
      cookie passed to the dst_check operation.
      
      With the inline format (nexthop definition inline with the fib6_info),
      dst_entries cached in the fib6_nh have a 1:1 correlation between fib
      entries, nexthop data and dst_entries. With external nexthops, IPv6
      looks more like IPv4 which means multiple fib entries across disparate
      fib6_nodes can all reference the same fib6_nh. That means validation
      of dst_entries based on external nexthops needs to use the IPv4 format
      - the per-network namespace counter.
      
      Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
      rt6_get_cookie to return sernum if it is set and update dst_check for
      IPv6 to look for sernum set and based the check on it if so. Finally,
      rt6_get_pcpu_route needs to validate the cached entry before returning
      a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
      __mkroute_output for IPv4).
      
      This problem only affects routes using the new, external nexthops.
      
      Thanks to the kbuild test robot for catching the IS_ENABLED needed
      around rt_genid_ipv6 before I sent this out.
      
      Fixes: 5b98324e ("ipv6: Allow routes to use nexthop objects")
      Reported-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsahern@kernel.org>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f34e53b
  28. 29 4月, 2020 2 次提交
  29. 27 4月, 2020 1 次提交
  30. 30 3月, 2020 1 次提交
  31. 24 3月, 2020 1 次提交