1. 21 12月, 2019 2 次提交
    • E
      tcp: md5: fix potential overestimation of TCP option space · a148815a
      Eric Dumazet 提交于
      [ Upstream commit 9424e2e7ad93ffffa88f882c9bc5023570904b55 ]
      
      Back in 2008, Adam Langley fixed the corner case of packets for flows
      having all of the following options : MD5 TS SACK
      
      Since MD5 needs 20 bytes, and TS needs 12 bytes, no sack block
      can be cooked from the remaining 8 bytes.
      
      tcp_established_options() correctly sets opts->num_sack_blocks
      to zero, but returns 36 instead of 32.
      
      This means TCP cooks packets with 4 extra bytes at the end
      of options, containing unitialized bytes.
      
      Fixes: 33ad798c ("tcp: options clean up")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a148815a
    • E
      inet: protect against too small mtu values. · d80d67cd
      Eric Dumazet 提交于
      [ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ]
      
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d80d67cd
  2. 18 12月, 2019 1 次提交
  3. 13 12月, 2019 3 次提交
  4. 05 12月, 2019 3 次提交
  5. 01 12月, 2019 2 次提交
    • T
      net: bpfilter: fix iptables failure if bpfilter_umh is disabled · e6c6c043
      Taehee Yoo 提交于
      [ Upstream commit 97adaddaa6db7a8af81b9b11e30cbe3628cd6700 ]
      
      When iptables command is executed, ip_{set/get}sockopt() try to upload
      bpfilter.ko if bpfilter is enabled. if it couldn't find bpfilter.ko,
      command is failed.
      bpfilter.ko is generated if CONFIG_BPFILTER_UMH is enabled.
      ip_{set/get}sockopt() only checks CONFIG_BPFILTER.
      So that if CONFIG_BPFILTER is enabled and CONFIG_BPFILTER_UMH is disabled,
      iptables command is always failed.
      
      test config:
         CONFIG_BPFILTER=y
         # CONFIG_BPFILTER_UMH is not set
      
      test command:
         %iptables -L
         iptables: No chain/target/match by that name.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e6c6c043
    • H
      ipv4/igmp: fix v1/v2 switchback timeout based on rfc3376, 8.12 · 32d7474b
      Hangbin Liu 提交于
      [ Upstream commit 966c37f2d77eb44d47af8e919267b1ba675b2eca ]
      
      Similiar with ipv6 mcast commit 89225d1c ("net: ipv6: mld: fix v1/v2
      switchback timeout to rfc3810, 9.12.")
      
      i) RFC3376 8.12. Older Version Querier Present Timeout says:
      
         The Older Version Querier Interval is the time-out for transitioning
         a host back to IGMPv3 mode once an older version query is heard.
         When an older version query is received, hosts set their Older
         Version Querier Present Timer to Older Version Querier Interval.
      
         This value MUST be ((the Robustness Variable) times (the Query
         Interval in the last Query received)) plus (one Query Response
         Interval).
      
      Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT.
      Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response
      Interval) in struct in_device.
      
      Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri.
      We need update these values when receive IGMPv3 queries.
      Reported-by: NYing Xu <yinxu@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      32d7474b
  6. 24 11月, 2019 2 次提交
  7. 21 11月, 2019 3 次提交
    • T
      netfilter: masquerade: don't flush all conntracks if only one address deleted on device · 8ddec6aa
      Tan Hu 提交于
      [ Upstream commit 097f95d319f817e651bd51f8846aced92a55a6a1 ]
      
      We configured iptables as below, which only allowed incoming data on
      established connections:
      
      iptables -t mangle -A PREROUTING -m state --state ESTABLISHED -j ACCEPT
      iptables -t mangle -P PREROUTING DROP
      
      When deleting a secondary address, current masquerade implements would
      flush all conntracks on this device. All the established connections on
      primary address also be deleted, then subsequent incoming data on the
      connections would be dropped wrongly because it was identified as NEW
      connection.
      
      So when an address was delete, it should only flush connections related
      with the address.
      Signed-off-by: NTan Hu <tan.hu@zte.com.cn>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8ddec6aa
    • H
      ip_gre: fix parsing gre header in ipgre_err · 6745591c
      Haishuang Yan 提交于
      [ Upstream commit b0350d51f001e6edc13ee4f253b98b50b05dd401 ]
      
      gre_parse_header stops parsing when csum_err is encountered, which means
      tpi->key is undefined and ip_tunnel_lookup will return NULL improperly.
      
      This patch introduce a NULL pointer as csum_err parameter. Even when
      csum_err is encountered, it won't return error and continue parsing gre
      header as expected.
      
      Fixes: 9f57c67c ("gre: Remove support for sharing GRE protocol hook.")
      Reported-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6745591c
    • G
      ipmr: Fix skb headroom in ipmr_get_route(). · 66daa057
      Guillaume Nault 提交于
      [ Upstream commit 7901cd97963d6cbde88fa25a4a446db3554c16c6 ]
      
      In route.c, inet_rtm_getroute_build_skb() creates an skb with no
      headroom. This skb is then used by inet_rtm_getroute() which may pass
      it to rt_fill_info() and, from there, to ipmr_get_route(). The later
      might try to reuse this skb by cloning it and prepending an IPv4
      header. But since the original skb has no headroom, skb_push() triggers
      skb_under_panic():
      
      skbuff: skb_under_panic: text:00000000ca46ad8a len:80 put:20 head:00000000cd28494e data:000000009366fd6b tail:0x3c end:0xec0 dev:veth0
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:108!
      invalid opcode: 0000 [#1] SMP KASAN PTI
      CPU: 6 PID: 587 Comm: ip Not tainted 5.4.0-rc6+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      RIP: 0010:skb_panic+0xbf/0xd0
      Code: 41 a2 ff 8b 4b 70 4c 8b 4d d0 48 c7 c7 20 76 f5 8b 44 8b 45 bc 48 8b 55 c0 48 8b 75 c8 41 54 41 57 41 56 41 55 e8 75 dc 7a ff <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
      RSP: 0018:ffff888059ddf0b0 EFLAGS: 00010286
      RAX: 0000000000000086 RBX: ffff888060a315c0 RCX: ffffffff8abe4822
      RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88806c9a79cc
      RBP: ffff888059ddf118 R08: ffffed100d9361b1 R09: ffffed100d9361b0
      R10: ffff88805c68aee3 R11: ffffed100d9361b1 R12: ffff88805d218000
      R13: ffff88805c689fec R14: 000000000000003c R15: 0000000000000ec0
      FS:  00007f6af184b700(0000) GS:ffff88806c980000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffc8204a000 CR3: 0000000057b40006 CR4: 0000000000360ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_push+0x7e/0x80
       ipmr_get_route+0x459/0x6fa
       rt_fill_info+0x692/0x9f0
       inet_rtm_getroute+0xd26/0xf20
       rtnetlink_rcv_msg+0x45d/0x630
       netlink_rcv_skb+0x1a5/0x220
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x305/0x3a0
       netlink_sendmsg+0x575/0x730
       sock_sendmsg+0xb5/0xc0
       ___sys_sendmsg+0x497/0x4f0
       __sys_sendmsg+0xcb/0x150
       __x64_sys_sendmsg+0x48/0x50
       do_syscall_64+0xd2/0xac0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Actually the original skb used to have enough headroom, but the
      reserve_skb() call was lost with the introduction of
      inet_rtm_getroute_build_skb() by commit 404eb77e ("ipv4: support
      sport, dport and ip_proto in RTM_GETROUTE").
      
      We could reserve some headroom again in inet_rtm_getroute_build_skb(),
      but this function shouldn't be responsible for handling the special
      case of ipmr_get_route(). Let's handle that directly in
      ipmr_get_route() by calling skb_realloc_headroom() instead of
      skb_clone().
      
      Fixes: 404eb77e ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66daa057
  8. 13 11月, 2019 1 次提交
  9. 10 11月, 2019 8 次提交
    • P
      ipv4: fix route update on metric change. · b166e883
      Paolo Abeni 提交于
      [ Upstream commit 0b834ba00ab5337e938c727e216e1f5249794717 ]
      
      Since commit af4d768a ("net/ipv4: Add support for specifying metric
      of connected routes"), when updating an IP address with a different metric,
      the associated connected route is updated, too.
      
      Still, the mentioned commit doesn't handle properly some corner cases:
      
      $ ip addr add dev eth0 192.168.1.0/24
      $ ip addr add dev eth0 192.168.2.1/32 peer 192.168.2.2
      $ ip addr add dev eth0 192.168.3.1/24
      $ ip addr change dev eth0 192.168.1.0/24 metric 10
      $ ip addr change dev eth0 192.168.2.1/32 peer 192.168.2.2 metric 10
      $ ip addr change dev eth0 192.168.3.1/24 metric 10
      $ ip -4 route
      192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.0
      192.168.2.2 dev eth0 proto kernel scope link src 192.168.2.1
      192.168.3.0/24 dev eth0 proto kernel scope link src 192.168.2.1 metric 10
      
      Only the last route is correctly updated.
      
      The problem is the current test in fib_modify_prefix_metric():
      
      	if (!(dev->flags & IFF_UP) ||
      	    ifa->ifa_flags & (IFA_F_SECONDARY | IFA_F_NOPREFIXROUTE) ||
      	    ipv4_is_zeronet(prefix) ||
      	    prefix == ifa->ifa_local || ifa->ifa_prefixlen == 32)
      
      Which should be the logical 'not' of the pre-existing test in
      fib_add_ifaddr():
      
      	if (!ipv4_is_zeronet(prefix) && !(ifa->ifa_flags & IFA_F_SECONDARY) &&
      	    (prefix != addr || ifa->ifa_prefixlen < 32))
      
      To properly negate the original expression, we need to change the last
      logical 'or' to a logical 'and'.
      
      Fixes: af4d768a ("net/ipv4: Add support for specifying metric of connected routes")
      Reported-and-suggested-by: NBeniamino Galvani <bgalvani@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b166e883
    • E
      net: use skb_queue_empty_lockless() in busy poll contexts · 4f3df7f1
      Eric Dumazet 提交于
      [ Upstream commit 3f926af3f4d688e2e11e7f8ed04e277a14d4d4a4 ]
      
      Busy polling usually runs without locks.
      Let's use skb_queue_empty_lockless() instead of skb_queue_empty()
      
      Also uses READ_ONCE() in __skb_try_recv_datagram() to address
      a similar potential problem.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4f3df7f1
    • E
      net: use skb_queue_empty_lockless() in poll() handlers · eaf548fe
      Eric Dumazet 提交于
      [ Upstream commit 3ef7cf57c72f32f61e97f8fa401bc39ea1f1a5d4 ]
      
      Many poll() handlers are lockless. Using skb_queue_empty_lockless()
      instead of skb_queue_empty() is more appropriate.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      eaf548fe
    • E
      udp: use skb_queue_empty_lockless() · afa1f5e9
      Eric Dumazet 提交于
      [ Upstream commit 137a0dbe3426fd7bcfe3f8117b36a87b3590e4eb ]
      
      syzbot reported a data-race [1].
      
      We should use skb_queue_empty_lockless() to document that we are
      not ensuring a mutual exclusion and silence KCSAN.
      
      [1]
      BUG: KCSAN: data-race in __skb_recv_udp / __udp_enqueue_schedule_skb
      
      write to 0xffff888122474b50 of 8 bytes by interrupt on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       __udp_enqueue_schedule_skb+0x2c1/0x410 net/ipv4/udp.c:1470
       __udp_queue_rcv_skb net/ipv4/udp.c:1940 [inline]
       udp_queue_rcv_one_skb+0x7bd/0xc70 net/ipv4/udp.c:2057
       udp_queue_rcv_skb+0xb5/0x400 net/ipv4/udp.c:2074
       udp_unicast_rcv_skb.isra.0+0x7e/0x1c0 net/ipv4/udp.c:2233
       __udp4_lib_rcv+0xa44/0x17c0 net/ipv4/udp.c:2300
       udp_rcv+0x2b/0x40 net/ipv4/udp.c:2470
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
      
      read to 0xffff888122474b50 of 8 bytes by task 8921 on cpu 1:
       skb_queue_empty include/linux/skbuff.h:1494 [inline]
       __skb_recv_udp+0x18d/0x500 net/ipv4/udp.c:1653
       udp_recvmsg+0xe1/0xb10 net/ipv4/udp.c:1712
       inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 8921 Comm: syz-executor.4 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      afa1f5e9
    • E
      udp: fix data-race in udp_set_dev_scratch() · a8a5adbb
      Eric Dumazet 提交于
      [ Upstream commit a793183caa9afae907a0d7ddd2ffd57329369bf5 ]
      
      KCSAN reported a data-race in udp_set_dev_scratch() [1]
      
      The issue here is that we must not write over skb fields
      if skb is shared. A similar issue has been fixed in commit
      89c22d8c ("net: Fix skb csum races when peeking")
      
      While we are at it, use a helper only dealing with
      udp_skb_scratch(skb)->csum_unnecessary, as this allows
      udp_set_dev_scratch() to be called once and thus inlined.
      
      [1]
      BUG: KCSAN: data-race in udp_set_dev_scratch / udpv6_recvmsg
      
      write to 0xffff888120278317 of 1 bytes by task 10411 on cpu 1:
       udp_set_dev_scratch+0xea/0x200 net/ipv4/udp.c:1308
       __first_packet_length+0x147/0x420 net/ipv4/udp.c:1556
       first_packet_length+0x68/0x2a0 net/ipv4/udp.c:1579
       udp_poll+0xea/0x110 net/ipv4/udp.c:2720
       sock_poll+0xed/0x250 net/socket.c:1256
       vfs_poll include/linux/poll.h:90 [inline]
       do_select+0x7d0/0x1020 fs/select.c:534
       core_sys_select+0x381/0x550 fs/select.c:677
       do_pselect.constprop.0+0x11d/0x160 fs/select.c:759
       __do_sys_pselect6 fs/select.c:784 [inline]
       __se_sys_pselect6 fs/select.c:769 [inline]
       __x64_sys_pselect6+0x12e/0x170 fs/select.c:769
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      read to 0xffff888120278317 of 1 bytes by task 10413 on cpu 0:
       udp_skb_csum_unnecessary include/net/udp.h:358 [inline]
       udpv6_recvmsg+0x43e/0xe90 net/ipv6/udp.c:310
       inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592
       sock_recvmsg_nosec+0x5c/0x70 net/socket.c:871
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 10413 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8a5adbb
    • E
      net: annotate accesses to sk->sk_incoming_cpu · 0cfaf03c
      Eric Dumazet 提交于
      [ Upstream commit 7170a977743b72cf3eb46ef6ef89885dc7ad3621 ]
      
      This socket field can be read and written by concurrent cpus.
      
      Use READ_ONCE() and WRITE_ONCE() annotations to document this,
      and avoid some compiler 'optimizations'.
      
      KCSAN reported :
      
      BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv
      
      write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0:
       sk_incoming_cpu_update include/net/sock.h:953 [inline]
       tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
       do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
       do_softirq kernel/softirq.c:329 [inline]
       __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
      
      read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1:
       sk_incoming_cpu_update include/net/sock.h:952 [inline]
       tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934
       ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204
       ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:442 [inline]
       ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:305 [inline]
       NF_HOOK include/linux/netfilter.h:299 [inline]
       ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010
       __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124
       process_backlog+0x1d3/0x420 net/core/dev.c:5955
       napi_poll net/core/dev.c:6392 [inline]
       net_rx_action+0x3ae/0xa90 net/core/dev.c:6460
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       run_ksoftirqd+0x46/0x60 kernel/softirq.c:603
       smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cfaf03c
    • E
      inet: stop leaking jiffies on the wire · 07de7389
      Eric Dumazet 提交于
      [ Upstream commit a904a0693c189691eeee64f6c6b188bd7dc244e9 ]
      
      Historically linux tried to stick to RFC 791, 1122, 2003
      for IPv4 ID field generation.
      
      RFC 6864 made clear that no matter how hard we try,
      we can not ensure unicity of IP ID within maximum
      lifetime for all datagrams with a given source
      address/destination address/protocol tuple.
      
      Linux uses a per socket inet generator (inet_id), initialized
      at connection startup with a XOR of 'jiffies' and other
      fields that appear clear on the wire.
      
      Thiemo Nagel pointed that this strategy is a privacy
      concern as this provides 16 bits of entropy to fingerprint
      devices.
      
      Let's switch to a random starting point, this is just as
      good as far as RFC 6864 is concerned and does not leak
      anything critical.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NThiemo Nagel <tnagel@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      07de7389
    • X
      erspan: fix the tun_info options_len check for erspan · 163901dc
      Xin Long 提交于
      [ Upstream commit 2eb8d6d2910cfe3dc67dc056f26f3dd9c63d47cd ]
      
      The check for !md doens't really work for ip_tunnel_info_opts(info) which
      only does info + 1. Also to avoid out-of-bounds access on info, it should
      ensure options_len is not less than erspan_metadata in both erspan_xmit()
      and ip6erspan_tunnel_xmit().
      
      Fixes: 1a66a836 ("gre: add collect_md mode to ERSPAN tunnel")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      163901dc
  10. 29 10月, 2019 2 次提交
    • S
      ipv4: Return -ENETUNREACH if we can't create route but saddr is valid · 2fa80e64
      Stefano Brivio 提交于
      [ Upstream commit 595e0651d0296bad2491a4a29a7a43eae6328b02 ]
      
      ...instead of -EINVAL. An issue was found with older kernel versions
      while unplugging a NFS client with pending RPCs, and the wrong error
      code here prevented it from recovering once link is back up with a
      configured address.
      
      Incidentally, this is not an issue anymore since commit 4f8943f80883
      ("SUNRPC: Replace direct task wakeups from softirq context"), included
      in 5.2-rc7, had the effect of decoupling the forwarding of this error
      by using SO_ERROR in xs_wake_error(), as pointed out by Benjamin
      Coddington.
      
      To the best of my knowledge, this isn't currently causing any further
      issue, but the error code doesn't look appropriate anyway, and we
      might hit this in other paths as well.
      
      In detail, as analysed by Gonzalo Siero, once the route is deleted
      because the interface is down, and can't be resolved and we return
      -EINVAL here, this ends up, courtesy of inet_sk_rebuild_header(),
      as the socket error seen by tcp_write_err(), called by
      tcp_retransmit_timer().
      
      In turn, tcp_write_err() indirectly calls xs_error_report(), which
      wakes up the RPC pending tasks with a status of -EINVAL. This is then
      seen by call_status() in the SUN RPC implementation, which aborts the
      RPC call calling rpc_exit(), instead of handling this as a
      potentially temporary condition, i.e. as a timeout.
      
      Return -EINVAL only if the input parameters passed to
      ip_route_output_key_hash_rcu() are actually invalid (this is the case
      if the specified source address is multicast, limited broadcast or
      all zeroes), but return -ENETUNREACH in all cases where, at the given
      moment, the given source address doesn't allow resolving the route.
      
      While at it, drop the initialisation of err to -ENETUNREACH, which
      was added to __ip_route_output_key() back then by commit
      0315e382 ("net: Fix behaviour of unreachable, blackhole and
      prohibit routes"), but actually had no effect, as it was, and is,
      overwritten by the fib_lookup() return code assignment, and anyway
      ignored in all other branches, including the if (fl4->saddr) one:
      I find this rather confusing, as it would look like -ENETUNREACH is
      the "default" error, while that statement has no effect.
      
      Also note that after commit fc75fc83 ("ipv4: dont create routes
      on down devices"), we would get -ENETUNREACH if the device is down,
      but -EINVAL if the source address is specified and we can't resolve
      the route, and this appears to be rather inconsistent.
      Reported-by: NStefan Walter <walteste@inf.ethz.ch>
      Analysed-by: NBenjamin Coddington <bcodding@redhat.com>
      Analysed-by: NGonzalo Siero <gsierohu@redhat.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2fa80e64
    • W
      ipv4: fix race condition between route lookup and invalidation · 2ec0df4e
      Wei Wang 提交于
      [ Upstream commit 5018c59607a511cdee743b629c76206d9c9e6d7b ]
      
      Jesse and Ido reported the following race condition:
      <CPU A, t0> - Received packet A is forwarded and cached dst entry is
      taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()
      
      <t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
      from multiple ISPs"), route is added / deleted and rt_cache_flush() is
      called
      
      <CPU B, t2> - Received packet B tries to use the same cached dst entry
      from t0, but rt_cache_valid() is no longer true and it is replaced in
      rt_cache_route() by the newer one. This calls dst_dev_put() on the
      original dst entry which assigns the blackhole netdev to 'dst->dev'
      
      <CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
      to 'dst->dev' being the blackhole netdev
      
      There are 2 issues in the v4 routing code:
      1. A per-netns counter is used to do the validation of the route. That
      means whenever a route is changed in the netns, users of all routes in
      the netns needs to redo lookup. v6 has an implementation of only
      updating fn_sernum for routes that are affected.
      2. When rt_cache_valid() returns false, rt_cache_route() is called to
      throw away the current cache, and create a new one. This seems
      unnecessary because as long as this route does not change, the route
      cache does not need to be recreated.
      
      To fully solve the above 2 issues, it probably needs quite some code
      changes and requires careful testing, and does not suite for net branch.
      
      So this patch only tries to add the deleted cached rt into the uncached
      list, so user could still be able to use it to receive packets until
      it's done.
      
      Fixes: 95c47f9c ("ipv4: call dst_dev_put() properly")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Reported-by: NJesse Hathaway <jesse@mbuki-mvuki.org>
      Tested-by: NJesse Hathaway <jesse@mbuki-mvuki.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ec0df4e
  11. 08 10月, 2019 4 次提交
  12. 05 10月, 2019 1 次提交
  13. 01 10月, 2019 1 次提交
  14. 21 9月, 2019 1 次提交
  15. 19 9月, 2019 1 次提交
    • N
      tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR · 67fe3b94
      Neal Cardwell 提交于
      [ Upstream commit af38d07ed391b21f7405fa1f936ca9686787d6d2 ]
      
      Fix tcp_ecn_withdraw_cwr() to clear the correct bit:
      TCP_ECN_QUEUE_CWR.
      
      Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about
      the behavior of data receivers, and deciding whether to reflect
      incoming IP ECN CE marks as outgoing TCP th->ece marks. The
      TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders,
      and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function
      is only called from tcp_undo_cwnd_reduction() by data senders during
      an undo, so it should zero the sender-side state,
      TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of
      incoming CE bits on incoming data packets just because outgoing
      packets were spuriously retransmitted.
      
      The bug has been reproduced with packetdrill to manifest in a scenario
      with RFC3168 ECN, with an incoming data packet with CE bit set and
      carrying a TCP timestamp value that causes cwnd undo. Before this fix,
      the IP CE bit was ignored and not reflected in the TCP ECE header bit,
      and sender sent a TCP CWR ('W') bit on the next outgoing data packet,
      even though the cwnd reduction had been undone.  After this fix, the
      sender properly reflects the CE bit and does not set the W bit.
      
      Note: the bug actually predates 2005 git history; this Fixes footer is
      chosen to be the oldest SHA1 I have tested (from Sep 2007) for which
      the patch applies cleanly (since before this commit the code was in a
      .h file).
      
      Fixes: bdf1ee5d ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67fe3b94
  16. 10 9月, 2019 2 次提交
  17. 06 9月, 2019 1 次提交
    • H
      ipv4/icmp: fix rt dst dev null pointer dereference · 9febfd30
      Hangbin Liu 提交于
      [ Upstream commit e2c693934194fd3b4e795635934883354c06ebc9 ]
      
      In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the reason we need rt->dst.dev is to get the net.
      So we can just try get it from skb->dev when rt->dst.dev is NULL.
      
      v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
      still use dst.dev and do a check to avoid crash.
      
      v3: No changes.
      
      v2: fix the issue in __icmp_send() instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9febfd30
  18. 16 8月, 2019 1 次提交
  19. 09 8月, 2019 1 次提交