1. 15 7月, 2022 9 次提交
  2. 14 7月, 2022 1 次提交
    • N
      ip: fix dflt addr selection for connected nexthop · 747c1430
      Nicolas Dichtel 提交于
      When a nexthop is added, without a gw address, the default scope was set
      to 'host'. Thus, when a source address is selected, 127.0.0.1 may be chosen
      but rejected when the route is used.
      
      When using a route without a nexthop id, the scope can be configured in the
      route, thus the problem doesn't exist.
      
      To explain more deeply: when a user creates a nexthop, it cannot specify
      the scope. To create it, the function nh_create_ipv4() calls fib_check_nh()
      with scope set to 0. fib_check_nh() calls fib_check_nh_nongw() wich was
      setting scope to 'host'. Then, nh_create_ipv4() calls
      fib_info_update_nhc_saddr() with scope set to 'host'. The src addr is
      chosen before the route is inserted.
      
      When a 'standard' route (ie without a reference to a nexthop) is added,
      fib_create_info() calls fib_info_update_nhc_saddr() with the scope set by
      the user. iproute2 set the scope to 'link' by default.
      
      Here is a way to reproduce the problem:
      ip netns add foo
      ip -n foo link set lo up
      ip netns add bar
      ip -n bar link set lo up
      sleep 1
      
      ip -n foo link add name eth0 type dummy
      ip -n foo link set eth0 up
      ip -n foo address add 192.168.0.1/24 dev eth0
      
      ip -n foo link add name veth0 type veth peer name veth1 netns bar
      ip -n foo link set veth0 up
      ip -n bar link set veth1 up
      
      ip -n bar address add 192.168.1.1/32 dev veth1
      ip -n bar route add default dev veth1
      
      ip -n foo nexthop add id 1 dev veth0
      ip -n foo route add 192.168.1.1 nhid 1
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > RTNETLINK answers: Invalid argument
      > $ ip netns exec foo ping -c1 192.168.1.1
      > ping: connect: Invalid argument
      
      Try without nexthop group (iproute2 sets scope to 'link' by dflt):
      ip -n foo route del 192.168.1.1
      ip -n foo route add 192.168.1.1 dev veth0
      
      Try to get/use the route:
      > $ ip -n foo route get 192.168.1.1
      > 192.168.1.1 dev veth0 src 192.168.0.1 uid 0
      >     cache
      > $ ip netns exec foo ping -c1 192.168.1.1
      > PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
      > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.039 ms
      >
      > --- 192.168.1.1 ping statistics ---
      > 1 packets transmitted, 1 received, 0% packet loss, time 0ms
      > rtt min/avg/max/mdev = 0.039/0.039/0.039/0.000 ms
      
      CC: stable@vger.kernel.org
      Fixes: 597cfe4f ("nexthop: Add support for IPv4 nexthops")
      Reported-by: NEdwin Brossette <edwin.brossette@6wind.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Link: https://lore.kernel.org/r/20220713114853.29406-1-nicolas.dichtel@6wind.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      747c1430
  3. 13 7月, 2022 12 次提交
  4. 08 7月, 2022 5 次提交
  5. 27 6月, 2022 1 次提交
    • E
      tunnels: do not assume mac header is set in skb_tunnel_check_pmtu() · 853a7614
      Eric Dumazet 提交于
      Recently added debug in commit f9aefd6b ("net: warn if mac header
      was not set") caught a bug in skb_tunnel_check_pmtu(), as shown
      in this syzbot report [1].
      
      In ndo_start_xmit() paths, there is really no need to use skb->mac_header,
      because skb->data is supposed to point at it.
      
      [1] WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_mac_header_len include/linux/skbuff.h:2784 [inline]
      WARNING: CPU: 1 PID: 8604 at include/linux/skbuff.h:2784 skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
      Modules linked in:
      CPU: 1 PID: 8604 Comm: syz-executor.3 Not tainted 5.19.0-rc2-syzkaller-00443-g8720bd95 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_mac_header_len include/linux/skbuff.h:2784 [inline]
      RIP: 0010:skb_tunnel_check_pmtu+0x5de/0x2f90 net/ipv4/ip_tunnel_core.c:413
      Code: 00 00 00 00 fc ff df 4c 89 fa 48 c1 ea 03 80 3c 02 00 0f 84 b9 fe ff ff 4c 89 ff e8 7c 0f d7 f9 e9 ac fe ff ff e8 c2 13 8a f9 <0f> 0b e9 28 fc ff ff e8 b6 13 8a f9 48 8b 54 24 70 48 b8 00 00 00
      RSP: 0018:ffffc90002e4f520 EFLAGS: 00010212
      RAX: 0000000000000324 RBX: ffff88804d5fd500 RCX: ffffc90005b52000
      RDX: 0000000000040000 RSI: ffffffff87f05e3e RDI: 0000000000000003
      RBP: ffffc90002e4f650 R08: 0000000000000003 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000000 R12: 000000000000ffff
      R13: 0000000000000000 R14: 000000000000ffcd R15: 000000000000001f
      FS: 00007f3babba9700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000080 CR3: 0000000075319000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
      <TASK>
      geneve_xmit_skb drivers/net/geneve.c:927 [inline]
      geneve_xmit+0xcf8/0x35d0 drivers/net/geneve.c:1107
      __netdev_start_xmit include/linux/netdevice.h:4805 [inline]
      netdev_start_xmit include/linux/netdevice.h:4819 [inline]
      __dev_direct_xmit+0x500/0x730 net/core/dev.c:4309
      dev_direct_xmit include/linux/netdevice.h:3007 [inline]
      packet_direct_xmit+0x1b8/0x2c0 net/packet/af_packet.c:282
      packet_snd net/packet/af_packet.c:3073 [inline]
      packet_sendmsg+0x21f4/0x55d0 net/packet/af_packet.c:3104
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg+0xcf/0x120 net/socket.c:734
      ____sys_sendmsg+0x6eb/0x810 net/socket.c:2489
      ___sys_sendmsg+0xf3/0x170 net/socket.c:2543
      __sys_sendmsg net/socket.c:2572 [inline]
      __do_sys_sendmsg net/socket.c:2581 [inline]
      __se_sys_sendmsg net/socket.c:2579 [inline]
      __x64_sys_sendmsg+0x132/0x220 net/socket.c:2579
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f3baaa89109
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f3babba9168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f3baab9bf60 RCX: 00007f3baaa89109
      RDX: 0000000000000000 RSI: 0000000020000a00 RDI: 0000000000000003
      RBP: 00007f3baaae305d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffe74f2543f R14: 00007f3babba9300 R15: 0000000000022000
      </TASK>
      
      Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      853a7614
  6. 25 6月, 2022 1 次提交
  7. 23 6月, 2022 1 次提交
  8. 20 6月, 2022 1 次提交
    • E
      erspan: do not assume transport header is always set · 301bd140
      Eric Dumazet 提交于
      Rewrite tests in ip6erspan_tunnel_xmit() and
      erspan_fb_xmit() to not assume transport header is set.
      
      syzbot reported:
      
      WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 skb_transport_header include/linux/skbuff.h:2911 [inline]
      WARNING: CPU: 0 PID: 1350 at include/linux/skbuff.h:2911 ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963
      Modules linked in:
      CPU: 0 PID: 1350 Comm: aoe_tx0 Not tainted 5.19.0-rc2-syzkaller-00160-g274295c6 #0
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
      RIP: 0010:skb_transport_header include/linux/skbuff.h:2911 [inline]
      RIP: 0010:ip6erspan_tunnel_xmit+0x15af/0x2eb0 net/ipv6/ip6_gre.c:963
      Code: 0f 47 f0 40 88 b5 7f fe ff ff e8 8c 16 4b f9 89 de bf ff ff ff ff e8 a0 12 4b f9 66 83 fb ff 0f 85 1d f1 ff ff e8 71 16 4b f9 <0f> 0b e9 43 f0 ff ff e8 65 16 4b f9 48 8d 85 30 ff ff ff ba 60 00
      RSP: 0018:ffffc90005daf910 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 000000000000ffff RCX: 0000000000000000
      RDX: ffff88801f032100 RSI: ffffffff882e8d3f RDI: 0000000000000003
      RBP: ffffc90005dafab8 R08: 0000000000000003 R09: 000000000000ffff
      R10: 000000000000ffff R11: 0000000000000000 R12: ffff888024f21d40
      R13: 000000000000a288 R14: 00000000000000b0 R15: ffff888025a2e000
      FS: 0000000000000000(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2e425000 CR3: 000000006d099000 CR4: 0000000000152ef0
      Call Trace:
      <TASK>
      __netdev_start_xmit include/linux/netdevice.h:4805 [inline]
      netdev_start_xmit include/linux/netdevice.h:4819 [inline]
      xmit_one net/core/dev.c:3588 [inline]
      dev_hard_start_xmit+0x188/0x880 net/core/dev.c:3604
      sch_direct_xmit+0x19f/0xbe0 net/sched/sch_generic.c:342
      __dev_xmit_skb net/core/dev.c:3815 [inline]
      __dev_queue_xmit+0x14a1/0x3900 net/core/dev.c:4219
      dev_queue_xmit include/linux/netdevice.h:2994 [inline]
      tx+0x6a/0xc0 drivers/block/aoe/aoenet.c:63
      kthread+0x1e7/0x3b0 drivers/block/aoe/aoecmd.c:1229
      kthread+0x2e9/0x3a0 kernel/kthread.c:376
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
      </TASK>
      
      Fixes: d5db21a3 ("erspan: auto detect truncated ipv6 packets.")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: William Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      301bd140
  9. 17 6月, 2022 2 次提交
  10. 09 6月, 2022 3 次提交
  11. 01 6月, 2022 1 次提交
    • E
      tcp: tcp_rtx_synack() can be called from process context · 0a375c82
      Eric Dumazet 提交于
      Laurent reported the enclosed report [1]
      
      This bug triggers with following coditions:
      
      0) Kernel built with CONFIG_DEBUG_PREEMPT=y
      
      1) A new passive FastOpen TCP socket is created.
         This FO socket waits for an ACK coming from client to be a complete
         ESTABLISHED one.
      2) A socket operation on this socket goes through lock_sock()
         release_sock() dance.
      3) While the socket is owned by the user in step 2),
         a retransmit of the SYN is received and stored in socket backlog.
      4) At release_sock() time, the socket backlog is processed while
         in process context.
      5) A SYNACK packet is cooked in response of the SYN retransmit.
      6) -> tcp_rtx_synack() is called in process context.
      
      Before blamed commit, tcp_rtx_synack() was always called from BH handler,
      from a timer handler.
      
      Fix this by using TCP_INC_STATS() & NET_INC_STATS()
      which do not assume caller is in non preemptible context.
      
      [1]
      BUG: using __this_cpu_add() in preemptible [00000000] code: epollpep/2180
      caller is tcp_rtx_synack.part.0+0x36/0xc0
      CPU: 10 PID: 2180 Comm: epollpep Tainted: G           OE     5.16.0-0.bpo.4-amd64 #1  Debian 5.16.12-1~bpo11+1
      Hardware name: Supermicro SYS-5039MC-H8TRF/X11SCD-F, BIOS 1.7 11/23/2021
      Call Trace:
       <TASK>
       dump_stack_lvl+0x48/0x5e
       check_preemption_disabled+0xde/0xe0
       tcp_rtx_synack.part.0+0x36/0xc0
       tcp_rtx_synack+0x8d/0xa0
       ? kmem_cache_alloc+0x2e0/0x3e0
       ? apparmor_file_alloc_security+0x3b/0x1f0
       inet_rtx_syn_ack+0x16/0x30
       tcp_check_req+0x367/0x610
       tcp_rcv_state_process+0x91/0xf60
       ? get_nohz_timer_target+0x18/0x1a0
       ? lock_timer_base+0x61/0x80
       ? preempt_count_add+0x68/0xa0
       tcp_v4_do_rcv+0xbd/0x270
       __release_sock+0x6d/0xb0
       release_sock+0x2b/0x90
       sock_setsockopt+0x138/0x1140
       ? __sys_getsockname+0x7e/0xc0
       ? aa_sk_perm+0x3e/0x1a0
       __sys_setsockopt+0x198/0x1e0
       __x64_sys_setsockopt+0x21/0x30
       do_syscall_64+0x38/0xc0
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: 168a8f58 ("tcp: TCP Fast Open Server - main code path")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NLaurent Fasnacht <laurent.fasnacht@proton.ch>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20220530213713.601888-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      0a375c82
  12. 31 5月, 2022 1 次提交
  13. 28 5月, 2022 1 次提交
    • E
      tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd · 11825765
      Eric Dumazet 提交于
      syzbot got a new report [1] finally pointing to a very old bug,
      added in initial support for MTU probing.
      
      tcp_mtu_probe() has checks about starting an MTU probe if
      tcp_snd_cwnd(tp) >= 11.
      
      But nothing prevents tcp_snd_cwnd(tp) to be reduced later
      and before the MTU probe succeeds.
      
      This bug would lead to potential zero-divides.
      
      Debugging added in commit 40570375 ("tcp: add accessors
      to read/set tp->snd_cwnd") has paid off :)
      
      While we are at it, address potential overflows in this code.
      
      [1]
      WARNING: CPU: 1 PID: 14132 at include/net/tcp.h:1219 tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
      Modules linked in:
      CPU: 1 PID: 14132 Comm: syz-executor.2 Not tainted 5.18.0-syzkaller-07857-gbabf0bb9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_snd_cwnd_set include/net/tcp.h:1219 [inline]
      RIP: 0010:tcp_mtup_probe_success+0x366/0x570 net/ipv4/tcp_input.c:2712
      Code: 74 08 48 89 ef e8 da 80 17 f9 48 8b 45 00 65 48 ff 80 80 03 00 00 48 83 c4 30 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 aa b0 c5 f8 <0f> 0b e9 16 fe ff ff 48 8b 4c 24 08 80 e1 07 38 c1 0f 8c c7 fc ff
      RSP: 0018:ffffc900079e70f8 EFLAGS: 00010287
      RAX: ffffffff88c0f7f6 RBX: ffff8880756e7a80 RCX: 0000000000040000
      RDX: ffffc9000c6c4000 RSI: 0000000000031f9e RDI: 0000000000031f9f
      RBP: 0000000000000000 R08: ffffffff88c0f606 R09: ffffc900079e7520
      R10: ffffed101011226d R11: 1ffff1101011226c R12: 1ffff1100eadcf50
      R13: ffff8880756e72c0 R14: 1ffff1100eadcf89 R15: dffffc0000000000
      FS:  00007f643236e700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f1ab3f1e2a0 CR3: 0000000064fe7000 CR4: 00000000003506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_clean_rtx_queue+0x223a/0x2da0 net/ipv4/tcp_input.c:3356
       tcp_ack+0x1962/0x3c90 net/ipv4/tcp_input.c:3861
       tcp_rcv_established+0x7c8/0x1ac0 net/ipv4/tcp_input.c:5973
       tcp_v6_do_rcv+0x57b/0x1210 net/ipv6/tcp_ipv6.c:1476
       sk_backlog_rcv include/net/sock.h:1061 [inline]
       __release_sock+0x1d8/0x4c0 net/core/sock.c:2849
       release_sock+0x5d/0x1c0 net/core/sock.c:3404
       sk_stream_wait_memory+0x700/0xdc0 net/core/stream.c:145
       tcp_sendmsg_locked+0x111d/0x3fc0 net/ipv4/tcp.c:1410
       tcp_sendmsg+0x2c/0x40 net/ipv4/tcp.c:1448
       sock_sendmsg_nosec net/socket.c:714 [inline]
       sock_sendmsg net/socket.c:734 [inline]
       __sys_sendto+0x439/0x5c0 net/socket.c:2119
       __do_sys_sendto net/socket.c:2131 [inline]
       __se_sys_sendto net/socket.c:2127 [inline]
       __x64_sys_sendto+0xda/0xf0 net/socket.c:2127
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f6431289109
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f643236e168 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f643139c100 RCX: 00007f6431289109
      RDX: 00000000d0d0c2ac RSI: 0000000020000080 RDI: 000000000000000a
      RBP: 00007f64312e308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff372533af R14: 00007f643236e300 R15: 0000000000022000
      
      Fixes: 5d424d5a ("[TCP]: MTU probing")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11825765
  14. 21 5月, 2022 1 次提交
    • J
      net: Add a second bind table hashed by port and address · d5a42de8
      Joanne Koong 提交于
      We currently have one tcp bind table (bhash) which hashes by port
      number only. In the socket bind path, we check for bind conflicts by
      traversing the specified port's inet_bind2_bucket while holding the
      bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).
      
      In instances where there are tons of sockets hashed to the same port
      at different addresses, checking for a bind conflict is time-intensive
      and can cause softirq cpu lockups, as well as stops new tcp connections
      since __inet_inherit_port() also contests for the spinlock.
      
      This patch proposes adding a second bind table, bhash2, that hashes by
      port and ip address. Searching the bhash2 table leads to significantly
      faster conflict resolution and less time holding the spinlock.
      Signed-off-by: NJoanne Koong <joannelkoong@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      d5a42de8