1. 29 Apr, 2021 1 commit
  2. 27 Apr, 2021 1 commit
  3. 26 Apr, 2021 8 commits
  4. 20 Apr, 2021 1 commit
  5. 19 Apr, 2021 1 commit
    • I
      netfilter: Dissect flow after packet mangling · 812fa71f
      Ido Schimmel committed
      Netfilter tries to reroute mangled packets as a different route might
      need to be used following the mangling. When this happens, netfilter
      does not populate the IP protocol, the source port and the destination
      port in the flow key. Therefore, FIB rules that match on these fields
      are ignored and packets can be misrouted.
      
      Solve this by dissecting the outer flow and populating the flow key
      before rerouting the packet. Note that flow dissection only happens when
      FIB rules that match on these fields are installed, so in the common
      case there should not be a penalty.
      Reported-by: Michal Soltys <msoltyspl@yandex.pl>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
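The misrouting described in this commit can be illustrated with a small user-space model. This is only a sketch: `FlowKey`, `FibRule` and `lookup_table` are hypothetical names invented here, not kernel structures, and the matching logic is heavily simplified. It shows why a rule that matches on protocol and port is skipped when the flow key was never populated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowKey:
    """Hypothetical, simplified stand-in for a flow key."""
    daddr: str
    ip_proto: int = 0  # 0 models "not populated"
    sport: int = 0
    dport: int = 0


@dataclass
class FibRule:
    """Hypothetical rule; None means "do not match on this field"."""
    table: str
    ip_proto: Optional[int] = None
    dport: Optional[int] = None

    def matches(self, key: FlowKey) -> bool:
        if self.ip_proto is not None and key.ip_proto != self.ip_proto:
            return False
        if self.dport is not None and key.dport != self.dport:
            return False
        return True


def lookup_table(rules, key, default="main"):
    """Return the table of the first matching rule, else the default table."""
    for rule in rules:
        if rule.matches(key):
            return rule.table
    return default


# One rule that sends UDP port 53 traffic to a dedicated table.
rules = [FibRule(table="dns", ip_proto=17, dport=53)]

# Key as built before the fix: protocol and ports left empty.
undissected = FlowKey(daddr="192.0.2.1")
# Key after dissecting the packet: protocol and ports filled in.
dissected = FlowKey(daddr="192.0.2.1", ip_proto=17, sport=40000, dport=53)
```

With the empty key the lookup falls through to the default table, while the dissected key selects the rule's table.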
  6. 14 Apr, 2021 2 commits
    • J
      net: Make tcp_allowed_congestion_control readonly in non-init netns · 97684f09
      Jonathon Reinhart committed
      Currently, tcp_allowed_congestion_control is global and writable;
      writing to it in any net namespace will leak into all other net
      namespaces.
      
      tcp_available_congestion_control and tcp_allowed_congestion_control are
      the only sysctls in ipv4_net_table (the per-netns sysctl table) with a
      NULL data pointer; their handlers (proc_tcp_available_congestion_control
      and proc_allowed_congestion_control) have no other way of referencing a
      struct net. Thus, they operate globally.
      
      Because ipv4_net_table does not use designated initializers, there is no
      easy way to fix up this one "bad" table entry. However, the data pointer
      updating logic shouldn't be applied to NULL pointers anyway, so we
      instead force these entries to be read-only.
      
      These sysctls used to exist in ipv4_table (init-net only), but they were
      moved to the per-net ipv4_net_table, presumably without realizing that
      tcp_allowed_congestion_control was writable and thus introduced a leak.
      
      Because the intent of that commit was only to know (i.e. read) "which
      congestion algorithms are available or allowed", this read-only solution
      should be sufficient.
      
      The logic added in recent commit
      31c4d2f1: ("net: Ensure net namespace isolation of sysctls")
      does not and cannot check for NULL data pointers, because
      other table entries (e.g. /proc/sys/net/netfilter/nf_log/) have
      .data=NULL but use other methods (.extra2) to access the struct net.
      
      Fixes: 9cb8e048 ("net/ipv4/sysctl: show tcp_{allowed, available}_congestion_control in non-initial netns")
      Signed-off-by: Jonathon Reinhart <jonathon.reinhart@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
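The approach this commit describes (adjust the data pointer when there is one, otherwise force the entry read-only outside the initial namespace) can be sketched as a user-space model. `CtlEntry`, `clone_for_netns` and the offsets below are assumptions made for illustration; they are not the kernel's types or values.

```python
from dataclasses import dataclass, replace
from typing import Optional

WRITE_BITS = 0o222  # write permission for user, group and other


@dataclass(frozen=True)
class CtlEntry:
    """Hypothetical model of one sysctl table entry."""
    procname: str
    data: Optional[int]  # offset into per-netns state; None models a NULL pointer
    mode: int


def clone_for_netns(table, is_init_net, netns_offset):
    """Build the per-netns copy of a sysctl table (simplified model)."""
    if is_init_net:
        return list(table)
    cloned = []
    for entry in table:
        if entry.data is not None:
            # Entry has per-netns storage: point it at this namespace's copy.
            cloned.append(replace(entry, data=entry.data + netns_offset))
        else:
            # No data pointer means the handler acts globally, so the entry
            # is made read-only in every non-initial namespace.
            cloned.append(replace(entry, mode=entry.mode & ~WRITE_BITS))
    return cloned


table = [
    CtlEntry("tcp_congestion_control", data=0x10, mode=0o644),
    CtlEntry("tcp_allowed_congestion_control", data=None, mode=0o644),
    CtlEntry("tcp_available_congestion_control", data=None, mode=0o444),
]
child = clone_for_netns(table, is_init_net=False, netns_offset=0x1000)
```

In the child namespace the entry without a data pointer drops from mode 0644 to 0444, while the entry with per-netns data stays writable.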
    • A
      icmp: ICMPV6: pass RFC 8335 reply messages to ping_rcv · 31433202
      Andreas Roeseler committed
      The current icmp_rcv function drops all unknown ICMP types, including
      ICMP_EXT_ECHOREPLY (type 43). In order to parse Extended Echo Reply messages, we have
      to pass these packets to the ping_rcv function, which does not do any
      other filtering and passes the packet to the designated socket.
      
      Pass incoming RFC 8335 ICMP Extended Echo Reply packets to the ping_rcv
      handler instead of discarding the packet.
      Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
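The dispatch change can be pictured with a toy receive function. This is a sketch under assumptions: `icmp_dispatch`, the `HANDLERS` table and the returned handler names are illustrative only. The type numbers 0, 3, 8 and 43 are the standard ICMPv4 values.

```python
import struct

ICMP_ECHOREPLY = 0
ICMP_DEST_UNREACH = 3
ICMP_ECHO = 8
ICMP_EXT_ECHOREPLY = 43  # RFC 8335 Extended Echo Reply

# Hypothetical handler table: ICMP type -> name of the handler that receives it.
HANDLERS = {
    ICMP_ECHOREPLY: "ping_rcv",
    ICMP_DEST_UNREACH: "icmp_unreach",
    ICMP_ECHO: "icmp_echo",
}


def icmp_dispatch(packet: bytes, pass_ext_reply: bool = True) -> str:
    """Return the handler name for an ICMP message, or "drop"."""
    icmp_type, _code, _checksum = struct.unpack_from("!BBH", packet)
    if icmp_type == ICMP_EXT_ECHOREPLY:
        # With the change, extended replies reach the ping socket handler.
        return "ping_rcv" if pass_ext_reply else "drop"
    return HANDLERS.get(icmp_type, "drop")


# Four-byte ICMP header only: type, code, checksum (checksum left at zero).
ext_reply = struct.pack("!BBH", ICMP_EXT_ECHOREPLY, 0, 0)
unknown = struct.pack("!BBH", 200, 0, 0)
```

Calling `icmp_dispatch(ext_reply, pass_ext_reply=False)` models the earlier behaviour, where type 43 was treated like any other unknown type.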
  7. 13 Apr, 2021 1 commit
  8. 12 Apr, 2021 1 commit
  9. 11 Apr, 2021 1 commit
  10. 10 Apr, 2021 1 commit
    • E
      Revert "tcp: Reset tcp connections in SYN-SENT state" · a7150e38
      Eric Dumazet committed
      This reverts commit e880f8b3.
      
      1) Patch has not been properly tested, and is wrong [1]
      2) Patch submission did not include TCP maintainer (this is me)
      
      [1]
      divide error: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8426 Comm: syz-executor478 Not tainted 5.12.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__tcp_select_window+0x56d/0xad0 net/ipv4/tcp_output.c:3015
      Code: 44 89 ff e8 d5 cd f0 f9 45 39 e7 0f 8d 20 ff ff ff e8 f7 c7 f0 f9 44 89 e3 e9 13 ff ff ff e8 ea c7 f0 f9 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 d3 c7 f0 f9 41 f7 dc bf 1f 00
      RSP: 0018:ffffc9000184fac0 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff87832e76 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff87832e14 R11: 0000000000000000 R12: 0000000000000000
      R13: 1ffff92000309f5c R14: 0000000000000000 R15: 0000000000000000
      FS:  00000000023eb300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fc2b5f426c0 CR3: 000000001c5cf000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcp_select_window net/ipv4/tcp_output.c:264 [inline]
       __tcp_transmit_skb+0xa82/0x38f0 net/ipv4/tcp_output.c:1351
       tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
       tcp_send_active_reset+0x475/0x8e0 net/ipv4/tcp_output.c:3449
       tcp_disconnect+0x15a9/0x1e60 net/ipv4/tcp.c:2955
       inet_shutdown+0x260/0x430 net/ipv4/af_inet.c:905
       __sys_shutdown_sock net/socket.c:2189 [inline]
       __sys_shutdown_sock net/socket.c:2183 [inline]
       __sys_shutdown+0xf1/0x1b0 net/socket.c:2201
       __do_sys_shutdown net/socket.c:2209 [inline]
       __se_sys_shutdown net/socket.c:2207 [inline]
       __x64_sys_shutdown+0x50/0x70 net/socket.c:2207
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: e880f8b3 ("tcp: Reset tcp connections in SYN-SENT state")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Manoj Basapathi <manojbm@codeaurora.org>
      Cc: Sauvik Saha <ssaha@codeaurora.org>
      Link: https://lore.kernel.org/r/20210409170237.274904-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  11. 09 Apr, 2021 1 commit
  12. 07 Apr, 2021 2 commits
  13. 06 Apr, 2021 1 commit
  14. 03 Apr, 2021 1 commit
    • F
      mptcp: add mptcp reset option support · dc87efdb
      Florian Westphal committed
      The MPTCP reset option carries an MPTCP-specific error code that
      provides more information on the nature of a connection reset.
      
      Reset option data received gets stored in the subflow context so it can
      be sent to userspace via the 'subflow closed' netlink event.
      
      When a subflow is closed, the desired error code that should be sent to
      the peer is also placed in the subflow context structure.
      
      If a reset is sent before subflow establishment could complete, e.g. on
      HMAC failure during an MP_JOIN operation, the mptcp skb extension is
      used to store the reset information.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
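For orientation, the reset option on the wire can be sketched as follows. The layout assumed here follows the MP_TCPRST description in RFC 8684 as understood by the editor (TCP option kind 30, length 4, subtype 0x8 in the upper nibble, a transient flag in the lowest flag bit, then a one-byte reason code). The commit message does not spell this out, and the helper names are hypothetical.

```python
import struct
from typing import NamedTuple

TCPOPT_MPTCP = 30         # TCP option kind used by MPTCP
MPTCP_SUBTYPE_RST = 0x8   # MP_TCPRST subtype (assumed from RFC 8684)
MP_TCPRST_LEN = 4
FLAG_TRANSIENT = 0x1      # "T" flag, lowest bit of the flags nibble


class MpTcpRst(NamedTuple):
    transient: bool
    reason: int


def build_mp_tcprst(reason: int, transient: bool = False) -> bytes:
    """Encode a reset option carrying the given reason code."""
    sub_flags = (MPTCP_SUBTYPE_RST << 4) | (FLAG_TRANSIENT if transient else 0)
    return struct.pack("!BBBB", TCPOPT_MPTCP, MP_TCPRST_LEN, sub_flags, reason)


def parse_mp_tcprst(opt: bytes) -> MpTcpRst:
    """Decode a reset option, raising ValueError on anything unexpected."""
    if len(opt) != MP_TCPRST_LEN:
        raise ValueError("unexpected option length")
    kind, length, sub_flags, reason = struct.unpack("!BBBB", opt)
    if kind != TCPOPT_MPTCP or length != MP_TCPRST_LEN:
        raise ValueError("not an MPTCP option of the expected size")
    if sub_flags >> 4 != MPTCP_SUBTYPE_RST:
        raise ValueError("not an MP_TCPRST option")
    return MpTcpRst(bool(sub_flags & FLAG_TRANSIENT), reason)


opt = build_mp_tcprst(reason=1, transient=True)
parsed = parse_mp_tcprst(opt)
```

The parsed reason code and transient flag correspond to the information the commit stores in the subflow context for later reporting.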
  15. 02 Apr, 2021 6 commits
  16. 01 Apr, 2021 5 commits
  17. 31 Mar, 2021 6 commits
    • E
      net: fix icmp_echo_enable_probe sysctl · b8128656
      Eric Dumazet committed
      sysctl_icmp_echo_enable_probe is a u8.
      
      The ipv4_net_table entry should use
       .maxlen       = sizeof(u8),
       .proc_handler = proc_dou8vec_minmax,
      
      Fixes: f1b8fa9f ("net: add sysctl for enabling RFC 8335 PROBE messages")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Andreas Roeseler <andreas.a.roeseler@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
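Why the handler width matters for a one-byte field can be shown with a byte buffer. This is a sketch under an assumption: the commit message only states that the field is a u8 and which settings to use, so the int-sized write below is the editor's illustration of what a mismatched width could do, not a description of the earlier code. The layout and helper names are hypothetical.

```python
import struct


def make_state() -> bytearray:
    """Hypothetical layout: a one-byte setting followed by three neighbours."""
    return bytearray([0, 7, 7, 7])


def write_as_int(buf: bytearray, offset: int, value: int) -> None:
    """Store the value as a 4-byte little-endian int (mismatched width)."""
    struct.pack_into("<i", buf, offset, value)


def write_as_u8(buf: bytearray, offset: int, value: int) -> None:
    """Store the value as a single byte after a range check."""
    if not 0 <= value <= 255:
        raise ValueError("value does not fit in a u8")
    struct.pack_into("<B", buf, offset, value)


wide = make_state()
write_as_int(wide, 0, 1)   # also overwrites the three neighbouring bytes

narrow = make_state()
write_as_u8(narrow, 0, 1)  # touches only the intended byte
```

The int-sized store zeroes the neighbouring bytes, while the u8-sized store leaves them untouched.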
    • P
      udp: never accept GSO_FRAGLIST packets · 78352f73
      Paolo Abeni committed
      Currently the UDP protocol delivers GSO_FRAGLIST packets to
      the sockets without the expected segmentation.
      
      This change addresses the issue by introducing and maintaining
      a couple of new fields to explicitly accept SKB_GSO_UDP_L4
      or GSO_FRAGLIST packets. It also updates udp_unexpected_gso()
      accordingly.
      
      UDP sockets enabling UDP_GRO still keep accept_udp_fraglist
      zeroed.
      
      v1 -> v2:
       - use 2 bits instead of a whole GSO bitmask (Willem)
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
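The two-bit acceptance check can be modelled in a few lines. This is a sketch, not the kernel code: the flag values are illustrative rather than the kernel's, `accept_udp_l4` is assumed as the name of the second field (the commit message names only `accept_udp_fraglist`), and `UdpSock` and `enable_udp_gro` are hypothetical helpers.

```python
from dataclasses import dataclass

# Illustrative bit values only; the kernel's SKB_GSO_* constants differ.
SKB_GSO_UDP_L4 = 1 << 0
SKB_GSO_FRAGLIST = 1 << 1


@dataclass
class UdpSock:
    """Hypothetical socket state holding the two acceptance bits."""
    accept_udp_l4: bool = False
    accept_udp_fraglist: bool = False


def enable_udp_gro(sk: UdpSock) -> None:
    """Model of enabling UDP_GRO: L4 GSO packets become acceptable,
    while the fraglist bit stays zeroed, as the commit message states."""
    sk.accept_udp_l4 = True


def udp_unexpected_gso(sk: UdpSock, gso_type: int) -> bool:
    """True when a GSO packet must be segmented before delivery."""
    if gso_type == 0:  # not a GSO packet
        return False
    if gso_type & SKB_GSO_UDP_L4 and not sk.accept_udp_l4:
        return True
    if gso_type & SKB_GSO_FRAGLIST and not sk.accept_udp_fraglist:
        return True
    return False


plain = UdpSock()
gro = UdpSock()
enable_udp_gro(gro)
```

In this model a socket with UDP_GRO enabled accepts L4 aggregated packets as they are, while fraglist packets are always reported as unexpected and therefore segmented first.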
    • P
      udp: properly complete L4 GRO over UDP tunnel packet · e0e3070a
      Paolo Abeni committed
      After the previous patch, the stack can do L4 UDP aggregation
      on top of a UDP tunnel.
      
      In such a scenario, udp{4,6}_gro_complete will be called twice. This function
      will enter its is_flist branch immediately, even though that is only
      correct on the second call, as GSO_FRAGLIST is only relevant for the
      inner packet.
      
      Instead, we need to try UDP tunnel-based aggregation first, if the GRO
      packet requires that.
      
      This patch changes udp{4,6}_gro_complete to skip the frag list processing
      while encap_mark == 1, which identifies processing of the outer tunnel
      header.
      Additionally, it clears the field in udp_gro_complete() so that we can enter
      the frag list path on the next round, for the inner header.
      
      v1 -> v2:
       - hopefully clarified the commit message
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      udp: skip L4 aggregation for UDP tunnel packets · 18f25dc3
      Paolo Abeni committed
      If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there
      are UDP tunnels available in the system, udp_gro_receive() could end up
      doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at
      the outer UDP tunnel level for packets effectively carrying a UDP
      tunnel header.
      
      That could cause inner protocol corruption. If e.g. the relevant
      packets carry a vxlan header, different vxlan ids will be ignored/
      aggregated to the same GSO packet. Inner headers will be ignored, too,
      so that e.g. TCP over vxlan push packets will be held in the GRO
      engine till the next flush, etc.
      
      Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the
      current packet could land in a UDP tunnel, and let udp_gro_receive()
      do GRO via udp_sk(sk)->gro_receive.
      
      The check implemented in this patch is broader than what is strictly
      needed, as the existing UDP tunnel could be e.g. configured on top of
      a different device: we could end up skipping GRO altogether for some packets.
      
      Anyhow, that is a very rare corner case and covering it would add quite
      a bit of complexity.
      
      v1 -> v2:
       - hopefully clarify the commit message
      
      Fixes: 9fd1ff5d ("udp: Support UDP fraglist GRO/GSO.")
      Fixes: 36707061 ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      udp: fixup csum for GSO receive slow path · 000ac44d
      Paolo Abeni committed
      When UDP packets generated locally by a socket with UDP_SEGMENT
      traverse the following path:
      
      UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
      	UDP tunnel (rx) -> UDP socket (no UDP_GRO)
      
      ip_summed will be set to CHECKSUM_PARTIAL at creation time and
      such checksum mode will be preserved in the above path up to the
      UDP tunnel receive code where we have:
      
       __iptunnel_pull_header() -> skb_pull_rcsum() ->
      skb_postpull_rcsum() -> __skb_postpull_rcsum()
      
      The latter will convert the skb to CHECKSUM_NONE.
      
      The UDP GSO packet will be later segmented as part of the rx socket
      receive operation, and will present a CHECKSUM_NONE after segmentation.
      
      Additionally, the segmented packets' UDP CB still refers to the original
      GSO packet len. Overall that causes unexpected/wrong csum validation
      errors later in the UDP receive path.
      
      We could possibly address the issue with some additional checks and
      csum mangling in the UDP tunnel code. Since the issue affects only
      this UDP receive slow path, let's set a suitable csum status there.
      
      Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking a UDP
      encapsulation present a valid checksum when landing in udp_queue_rcv_skb(),
      as the UDP checksum has been validated by the GRO engine.
      
      v2 -> v3:
       - even more verbose commit message and comments
      
      v1 -> v2:
       - restrict the csum update to the packets strictly needing them
       - hopefully clarify the commit message and code comments
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • F
      netfilter: nf_log_arp: merge with nf_log_syslog · f11d61e7
      Florian Westphal committed
      Similar to the previous change: nf_log_syslog now covers ARP logging
      as well, and the old nf_log_arp module is removed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>