1. 28 Jun, 2019 3 commits
  2. 25 Jun, 2019 1 commit
  3. 24 Jun, 2019 4 commits
    • ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF · 7d9e5f42
      Committed by Wei Wang
      For the tx path, in most cases, we still have to take refcnt on the dst
      because the caller is caching the dst somewhere. But it is still
      beneficial to make use of the RT6_LOOKUP_F_DST_NOREF flag while doing the
      route lookup, because this flag prevents manipulating refcnt on
      net->ipv6.ip6_null_entry when doing fib6_rule_lookup() to traverse each
      routing table. The null_entry is a shared object and constant updates on
      it cause false sharing.
      
      We converted the current major lookup function ip6_route_output_flags()
      to make use of RT6_LOOKUP_F_DST_NOREF.
      
      Together with the change in the rx path, we see a noticeable performance
      boost:
      I ran synflood tests between 2 hosts under the same switch. Both hosts
      have 20G mlx NIC, and 8 tx/rx queues.
      Sender sends pure SYN flood with random src IPs and ports using trafgen.
      Receiver has a simple TCP listener on the target port.
      Both hosts have multiple custom rules:
      - For incoming packets, only local table is traversed.
      - For outgoing packets, 3 tables are traversed to find the route.
      The packet processing rate on the receiver is as follows:
      - Before the fix: 3.78Mpps
      - After the fix:  5.50Mpps
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
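
As a quick check of the receiver packet rates quoted in this commit message, the relative gain works out to roughly 45%:

```latex
\frac{5.50 - 3.78}{3.78} = \frac{1.72}{3.78} \approx 0.455
```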
    • ipv6: honor RT6_LOOKUP_F_DST_NOREF in rule lookup logic · d64a1f57
      Committed by Wei Wang
      This patch specifically converts the rule lookup logic to honor this
      flag and not release refcnt when traversing each rule and calling
      lookup() on each routing table.
      Similar to previous patch, we also need some special handling of dst
      entries in uncached list because there is always 1 refcnt taken for them
      even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: introduce RT6_LOOKUP_F_DST_NOREF flag in ip6_pol_route() · 0e09edcc
      Committed by Wei Wang
      This new flag instructs the route lookup function not to take a
      refcnt on the dst entry. A caller that does a route lookup with this
      flag must use RCU protection properly.
      ip6_pol_route() is the major route lookup function for both tx and rx
      path.
      In this function:
      Do not take refcnt on dst if RT6_LOOKUP_F_DST_NOREF flag is set, and
      directly return the route entry. The caller should be holding rcu lock
      when using this flag, and decide whether to take refcnt or not.
      
      One note on the dst cache in the uncached_list:
      As uncached_list does not consume refcnt, one refcnt is always returned
      back to the caller even if RT6_LOOKUP_F_DST_NOREF flag is set.
      Uncached dst is only possible in the output path. So in such call path,
      caller MUST check if the dst is in the uncached_list before assuming
      that there is no refcnt taken on the returned dst.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
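
The refcount rule described in this commit message can be modelled in userspace. The sketch below is a toy under stated assumptions, not kernel code: `struct toy_dst`, `TOY_LOOKUP_F_DST_NOREF`, `toy_route_lookup()` and `toy_caller_holds_ref()` are hypothetical names, and the RCU protection the real caller needs is not modelled at all. It only shows why the caller must check for an uncached entry before assuming no reference was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value for this toy model only. */
#define TOY_LOOKUP_F_DST_NOREF 0x1

/* Toy stand-in for a dst entry. */
struct toy_dst {
	int refcnt;    /* references currently held */
	bool uncached; /* true if the entry is on the (toy) uncached list */
};

/*
 * Toy lookup. Without the NOREF flag a reference is always taken.
 * With the flag, no reference is taken, except for uncached entries,
 * which always come back carrying one reference.
 */
static struct toy_dst *toy_route_lookup(struct toy_dst *entry, int flags)
{
	assert(entry != NULL);
	if (!(flags & TOY_LOOKUP_F_DST_NOREF) || entry->uncached)
		entry->refcnt++;
	return entry;
}

/*
 * Caller-side check: even after a NOREF lookup, an uncached entry means
 * the caller owns a reference and must release it later.
 */
static bool toy_caller_holds_ref(const struct toy_dst *entry, int flags)
{
	assert(entry != NULL);
	return !(flags & TOY_LOOKUP_F_DST_NOREF) || entry->uncached;
}
```

A caller would invoke `toy_route_lookup()` and then consult `toy_caller_holds_ref()` to decide whether a matching release is owed.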
    • inet: fix compilation warnings in fqdir_pre_exit() · 08003d0b
      Committed by Qian Cai
      The linux-next commit "inet: fix various use-after-free in defrags
      units" [1] introduced compilation warnings,
      
      ./include/net/inet_frag.h:117:1: warning: 'inline' is not at beginning
      of declaration [-Wold-style-declaration]
       static void inline fqdir_pre_exit(struct fqdir *fqdir)
       ^~~~~~
      In file included from ./include/net/netns/ipv4.h:10,
                       from ./include/net/net_namespace.h:20,
                       from ./include/linux/netdevice.h:38,
                       from ./include/linux/icmpv6.h:13,
                       from ./include/linux/ipv6.h:86,
                       from ./include/net/ipv6.h:12,
                       from ./include/rdma/ib_verbs.h:51,
                       from ./include/linux/mlx5/device.h:37,
                       from ./include/linux/mlx5/driver.h:51,
                       from drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:37:
      
      [1] https://lore.kernel.org/netdev/20190618180900.88939-3-edumazet@google.com/
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: David S. Miller <davem@davemloft.net>
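
The warning quoted in this commit message is about specifier order: `inline` belongs before the return type. A minimal sketch of the accepted ordering follows, using a hypothetical `struct toy_fqdir` and `toy_fqdir_pre_exit()` rather than the kernel's definitions; the function body is illustrative only.

```c
#include <assert.h>

/* Hypothetical stand-in structure; only the declaration order matters here. */
struct toy_fqdir {
	int high_thresh;
	int dead;
};

/*
 * Storage-class and function specifiers come first, then the return type.
 * Writing "static void inline ..." instead is what triggers
 * -Wold-style-declaration.
 */
static inline void toy_fqdir_pre_exit(struct toy_fqdir *fqdir)
{
	assert(fqdir);
	fqdir->high_thresh = 0;
	fqdir->dead = 1;
}
```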
  4. 23 Jun, 2019 1 commit
    • net: fastopen: robustness and endianness fixes for SipHash · 438ac880
      Committed by Ard Biesheuvel
      Some changes to the TCP fastopen code to make it more robust
      against future changes in the choice of key/cookie size, etc.
      
      - Instead of keeping the SipHash key in an untyped u8[] buffer
        and casting it to the right type upon use, use the correct
        type directly. This ensures that the key will appear at the
        correct alignment if we ever change the way these data
        structures are allocated. (Currently, they are only allocated
        via kmalloc so they always appear at the correct alignment)
      
      - Use DIV_ROUND_UP when sizing the u64[] array to hold the
        cookie, so it is always of sufficient size, even if
        TCP_FASTOPEN_COOKIE_MAX is no longer a multiple of 8.
      
      - Drop the 'len' parameter from the tcp_fastopen_reset_cipher()
        function, which is no longer used.
      
      - Add endian swabbing when setting the keys and calculating the hash,
        to ensure that cookie values are the same for a given key and
        source/destination address pair regardless of the endianness of
        the server.
      
      Note that none of these are functional changes wrt the current
      state of the code, with the exception of the swabbing, which only
      affects big endian systems.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
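
Two of the points in this commit message, rounding the array size up and making values independent of host byte order, can be illustrated in userspace. This is a sketch under assumptions: `TOY_COOKIE_MAX`, `struct toy_cookie`, `toy_load_le64()` and `toy_store_le64()` are hypothetical, and the kernel uses its own byte-order helpers rather than these loops. The `DIV_ROUND_UP` definition follows the usual round-up division idiom.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-up integer division. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical maximum cookie length in bytes; deliberately not a multiple of 8. */
#define TOY_COOKIE_MAX 12

struct toy_cookie {
	/* Rounded up so the array always covers TOY_COOKIE_MAX bytes. */
	uint64_t val[DIV_ROUND_UP(TOY_COOKIE_MAX, sizeof(uint64_t))];
	uint8_t len;
};

/* Read 8 bytes as a little-endian 64-bit value on any host. */
static uint64_t toy_load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	assert(p != NULL);
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Write a 64-bit value as 8 little-endian bytes on any host. */
static void toy_store_le64(uint8_t *p, uint64_t v)
{
	assert(p != NULL);
	for (int i = 0; i < 8; i++) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}
```

With plain truncating division the array above would hold only one `uint64_t` (8 bytes) for a 12-byte cookie; rounding up yields two.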
  5. 22 Jun, 2019 1 commit
  6. 20 Jun, 2019 3 commits
    • page_pool: fix compile warning when CONFIG_PAGE_POOL is disabled · 497ad9f5
      Committed by Jesper Dangaard Brouer
      Kbuild test robot reported compile warning:
       warning: no return statement in function returning non-void
      in function page_pool_request_shutdown, when CONFIG_PAGE_POOL is disabled.
      
      The fix makes the code a little more verbose, with a descriptive variable.
      
      Fixes: 99c07c43 ("xdp: tracking page_pool resources and safe removal")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
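
The class of warning in this commit message arises when a function's only `return` sits inside a conditionally compiled region. One way to avoid it, sketched here with hypothetical names (`TOY_CONFIG_POOL`, `struct toy_pool`, `toy_pool_request_shutdown()`), is to initialise a descriptive result variable and return it unconditionally. This illustrates the pattern and is not a copy of the kernel change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool type, used only for this sketch. */
struct toy_pool {
	int pages_in_flight;
};

#ifdef TOY_CONFIG_POOL
/* The real work lives here when the (hypothetical) feature is compiled in. */
static bool toy_pool_do_shutdown(struct toy_pool *pool)
{
	return pool->pages_in_flight == 0;
}
#endif

/*
 * Every path returns a value, whether or not the feature is compiled in.
 * The named variable documents what the boolean means.
 */
static inline bool toy_pool_request_shutdown(struct toy_pool *pool)
{
	bool safe_to_remove = false;

	assert(pool);
#ifdef TOY_CONFIG_POOL
	safe_to_remove = toy_pool_do_shutdown(pool);
#endif
	return safe_to_remove;
}
```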
    • inet: clear num_timeout reqsk_alloc() · 85f9aa75
      Committed by Eric Dumazet
      KMSAN caught uninit-value in tcp_create_openreq_child() [1]
      This is caused by a recent change, combined with the fact
      that TCP cleared num_timeout, num_retrans and sk fields only
      when a request socket was about to be queued.
      
      Under syncookie mode, a temporary request socket is used,
      and req->num_timeout could contain garbage.
      
      Let's clear these three fields sooner; there is really no
      point trying to defer this and risk other bugs.
      
      [1]
      
      BUG: KMSAN: uninit-value in tcp_create_openreq_child+0x157f/0x1cc0 net/ipv4/tcp_minisocks.c:526
      CPU: 1 PID: 13357 Comm: syz-executor591 Not tainted 5.2.0-rc4+ #3
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x191/0x1f0 lib/dump_stack.c:113
       kmsan_report+0x162/0x2d0 mm/kmsan/kmsan.c:611
       __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:304
       tcp_create_openreq_child+0x157f/0x1cc0 net/ipv4/tcp_minisocks.c:526
       tcp_v6_syn_recv_sock+0x761/0x2d80 net/ipv6/tcp_ipv6.c:1152
       tcp_get_cookie_sock+0x16e/0x6b0 net/ipv4/syncookies.c:209
       cookie_v6_check+0x27e0/0x29a0 net/ipv6/syncookies.c:252
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1039 [inline]
       tcp_v6_do_rcv+0xf1c/0x1ce0 net/ipv6/tcp_ipv6.c:1344
       tcp_v6_rcv+0x60b7/0x6a30 net/ipv6/tcp_ipv6.c:1554
       ip6_protocol_deliver_rcu+0x1433/0x22f0 net/ipv6/ip6_input.c:397
       ip6_input_finish net/ipv6/ip6_input.c:438 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_input+0x2af/0x340 net/ipv6/ip6_input.c:447
       dst_input include/net/dst.h:439 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:272
       __netif_receive_skb_one_core net/core/dev.c:4981 [inline]
       __netif_receive_skb net/core/dev.c:5095 [inline]
       process_backlog+0x721/0x1410 net/core/dev.c:5906
       napi_poll net/core/dev.c:6329 [inline]
       net_rx_action+0x738/0x1940 net/core/dev.c:6395
       __do_softirq+0x4ad/0x858 kernel/softirq.c:293
       do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1052
       </IRQ>
       do_softirq kernel/softirq.c:338 [inline]
       __local_bh_enable_ip+0x199/0x1e0 kernel/softirq.c:190
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:682 [inline]
       ip6_finish_output2+0x213f/0x2670 net/ipv6/ip6_output.c:117
       ip6_finish_output+0xae4/0xbc0 net/ipv6/ip6_output.c:150
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:167
       dst_output include/net/dst.h:433 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0x1f53/0x2650 net/ipv6/ip6_output.c:271
       inet6_csk_xmit+0x3df/0x4f0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x4076/0x5b40 net/ipv4/tcp_output.c:1156
       tcp_transmit_skb net/ipv4/tcp_output.c:1172 [inline]
       tcp_write_xmit+0x39a9/0xa730 net/ipv4/tcp_output.c:2397
       __tcp_push_pending_frames+0x124/0x4e0 net/ipv4/tcp_output.c:2573
       tcp_send_fin+0xd43/0x1540 net/ipv4/tcp_output.c:3118
       tcp_close+0x16ba/0x1860 net/ipv4/tcp.c:2403
       inet_release+0x1f7/0x270 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470
       __sock_release net/socket.c:601 [inline]
       sock_close+0x156/0x490 net/socket.c:1273
       __fput+0x4c9/0xba0 fs/file_table.c:280
       ____fput+0x37/0x40 fs/file_table.c:313
       task_work_run+0x22e/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:168 [inline]
       prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:199
       syscall_return_slowpath+0x90/0x5c0 arch/x86/entry/common.c:279
       do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:305
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x401d50
      Code: 01 f0 ff ff 0f 83 40 0d 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d dd 8d 2d 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 14 0d 00 00 c3 48 83 ec 08 e8 7a 02 00 00
      RSP: 002b:00007fff1cf58cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000401d50
      RDX: 000000000000001c RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00000000004a9050 R08: 0000000020000040 R09: 000000000000001c
      R10: 0000000020004004 R11: 0000000000000246 R12: 0000000000402ef0
      R13: 0000000000402f80 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:201 [inline]
       kmsan_internal_poison_shadow+0x53/0xa0 mm/kmsan/kmsan.c:160
       kmsan_kmalloc+0xa4/0x130 mm/kmsan/kmsan_hooks.c:177
       kmem_cache_alloc+0x534/0xb00 mm/slub.c:2781
       reqsk_alloc include/net/request_sock.h:84 [inline]
       inet_reqsk_alloc+0xa8/0x600 net/ipv4/tcp_input.c:6384
       cookie_v6_check+0xadb/0x29a0 net/ipv6/syncookies.c:173
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1039 [inline]
       tcp_v6_do_rcv+0xf1c/0x1ce0 net/ipv6/tcp_ipv6.c:1344
       tcp_v6_rcv+0x60b7/0x6a30 net/ipv6/tcp_ipv6.c:1554
       ip6_protocol_deliver_rcu+0x1433/0x22f0 net/ipv6/ip6_input.c:397
       ip6_input_finish net/ipv6/ip6_input.c:438 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_input+0x2af/0x340 net/ipv6/ip6_input.c:447
       dst_input include/net/dst.h:439 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:272
       __netif_receive_skb_one_core net/core/dev.c:4981 [inline]
       __netif_receive_skb net/core/dev.c:5095 [inline]
       process_backlog+0x721/0x1410 net/core/dev.c:5906
       napi_poll net/core/dev.c:6329 [inline]
       net_rx_action+0x738/0x1940 net/core/dev.c:6395
       __do_softirq+0x4ad/0x858 kernel/softirq.c:293
       do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1052
       do_softirq kernel/softirq.c:338 [inline]
       __local_bh_enable_ip+0x199/0x1e0 kernel/softirq.c:190
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:682 [inline]
       ip6_finish_output2+0x213f/0x2670 net/ipv6/ip6_output.c:117
       ip6_finish_output+0xae4/0xbc0 net/ipv6/ip6_output.c:150
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:167
       dst_output include/net/dst.h:433 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0x1f53/0x2650 net/ipv6/ip6_output.c:271
       inet6_csk_xmit+0x3df/0x4f0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x4076/0x5b40 net/ipv4/tcp_output.c:1156
       tcp_transmit_skb net/ipv4/tcp_output.c:1172 [inline]
       tcp_write_xmit+0x39a9/0xa730 net/ipv4/tcp_output.c:2397
       __tcp_push_pending_frames+0x124/0x4e0 net/ipv4/tcp_output.c:2573
       tcp_send_fin+0xd43/0x1540 net/ipv4/tcp_output.c:3118
       tcp_close+0x16ba/0x1860 net/ipv4/tcp.c:2403
       inet_release+0x1f7/0x270 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470
       __sock_release net/socket.c:601 [inline]
       sock_close+0x156/0x490 net/socket.c:1273
       __fput+0x4c9/0xba0 fs/file_table.c:280
       ____fput+0x37/0x40 fs/file_table.c:313
       task_work_run+0x22e/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:168 [inline]
       prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:199
       syscall_return_slowpath+0x90/0x5c0 arch/x86/entry/common.c:279
       do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:305
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Fixes: 336c39a0 ("tcp: undo init congestion window on false SYNACK timeout")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
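
The idea of clearing fields at allocation time, rather than just before queueing, can be shown with a small userspace analogue. Everything here is an assumption made for illustration: `struct toy_request_sock`, `toy_reqsk_alloc()` and `toy_reqsk_free()` are hypothetical, and `malloc()` stands in for the kernel's slab allocator, which likewise hands back uninitialised memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy request object; the three fields mirror those named in the commit message. */
struct toy_request_sock {
	void *sk;
	unsigned char num_timeout;
	unsigned char num_retrans;
	unsigned int other_state; /* filled in later by the caller */
};

/*
 * Allocate with malloc(), which leaves memory uninitialised, and clear the
 * three fields right away, so that a temporary object that is never queued
 * still has defined values in them.
 */
static struct toy_request_sock *toy_reqsk_alloc(void)
{
	struct toy_request_sock *req = malloc(sizeof(*req));

	if (!req)
		return NULL;
	req->sk = NULL;
	req->num_timeout = 0;
	req->num_retrans = 0;
	return req;
}

/* Release an object obtained from toy_reqsk_alloc(). */
static void toy_reqsk_free(struct toy_request_sock *req)
{
	assert(req != NULL);
	free(req);
}
```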
    • net: sched: act_ctinfo: tidy UAPI definition · 16e5a266
      Committed by Kevin Darbyshire-Bryant
      Remove some enums from the UAPI definition that were only used
      internally and are NOT part of the UAPI.
      Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 19 Jun, 2019 27 commits