1. 03 Feb, 2021 3 commits
  2. 02 Feb, 2021 2 commits
  3. 30 Jan, 2021 2 commits
    • E
      net: proc: speedup /proc/net/netstat · 0d6cd689
      Eric Dumazet committed
      Use cache friendly helpers to better use cpu caches
      while reading /proc/net/netstat
      
      Tested on a platform with 256 threads (AMD Rome)
      
      Before: 305 usec spent in netstat_seq_show()
      After: 130 usec spent in netstat_seq_show()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210128162145.1703601-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      0d6cd689
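The commit message does not name the helpers it uses. One common way to make this kind of read friendlier to CPU caches, which may or may not match what the patch does, is to walk each CPU's counter block once instead of jumping across all CPUs for every counter. A minimal user-space sketch, assuming a hypothetical flat layout of one counter block per CPU (`NR_CPUS`, `NR_FIELDS`, `fold_by_field` and `fold_by_cpu` are illustrative names, not kernel APIs):

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS   4  /* hypothetical; the platform in the commit had 256 threads */
#define NR_FIELDS 8  /* hypothetical number of counters per CPU */

/* mib holds NR_CPUS consecutive blocks of NR_FIELDS counters each. */

/* Field-major walk: every counter read touches a different CPU's block. */
void fold_by_field(const uint64_t *mib, uint64_t out[NR_FIELDS])
{
    for (size_t f = 0; f < NR_FIELDS; f++) {
        out[f] = 0;
        for (size_t c = 0; c < NR_CPUS; c++)
            out[f] += mib[c * NR_FIELDS + f];
    }
}

/* CPU-major walk: each CPU's block is read sequentially, once. */
void fold_by_cpu(const uint64_t *mib, uint64_t out[NR_FIELDS])
{
    for (size_t f = 0; f < NR_FIELDS; f++)
        out[f] = 0;
    for (size_t c = 0; c < NR_CPUS; c++)
        for (size_t f = 0; f < NR_FIELDS; f++)
            out[f] += mib[c * NR_FIELDS + f];
}
```

Both functions produce the same totals; only the memory access order differs, which is where any cache benefit would come from.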
    • X
      ip_gre: add csum offload support for gre header · efa1a65c
      Xin Long committed
      This patch is to add csum offload support for gre header:
      
      On the TX path in gre_build_header(), when CHECKSUM_PARTIAL's set
      for inner proto, it will calculate the csum for outer proto, and
      inner csum will be offloaded later. Otherwise, CHECKSUM_PARTIAL
      and csum_start/offset will be set for outer proto, and the outer
      csum will be offloaded later.
      
      On the GSO path in gre_gso_segment(), when CHECKSUM_PARTIAL is
      not set for inner proto and the hardware supports csum offload,
      CHECKSUM_PARTIAL and csum_start/offset will be set for outer
      proto, and outer csum will be offloaded later. Otherwise, it
      will do csum for outer proto by calling gso_make_checksum().
      
      Note that SCTP has to do the csum by itself for non GSO path in
      sctp_packet_pack(), as gre_build_header() can't handle the csum
      with CHECKSUM_PARTIAL set for SCTP CRC csum offload.
      
      v1->v2:
        - remove the SCTP part, as GRE dev doesn't support SCTP CRC CSUM
          and it will always do checksum for SCTP in sctp_packet_pack()
          when it's not a GSO packet.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      efa1a65c
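The TX-path rule described in this commit can be pictured with a toy model. This is only a sketch under stated assumptions: `struct pkt`, `enum csum_state`, `gre_tx_csum` and the `outer_csum_in_sw` flag are hypothetical stand-ins for the kernel's skb fields, and no real checksum is computed. The only fact taken from the GRE header layout is that the checksum field follows the 4-byte base header.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel's ip_summed states (illustrative only). */
enum csum_state { CSUM_NONE, CSUM_PARTIAL };

struct pkt {
    enum csum_state ip_summed; /* CSUM_PARTIAL: a checksum is left to the device */
    size_t csum_start;         /* offset where the device starts summing */
    size_t csum_offset;        /* checksum field offset relative to csum_start */
    bool outer_csum_in_sw;     /* model-only flag: GRE csum computed in software */
};

/* The GRE checksum field sits right after the 4-byte base header. */
#define GRE_CSUM_FIELD_OFFSET 4

/* Decide who computes the outer (GRE) checksum on the TX path. */
void gre_tx_csum(struct pkt *p, size_t gre_hdr_offset)
{
    if (p->ip_summed == CSUM_PARTIAL) {
        /* Inner checksum is already pending offload, so the outer
         * one is computed in software now; offload fields stay as-is. */
        p->outer_csum_in_sw = true;
    } else {
        /* Nothing pending: hand the outer checksum to the device. */
        p->ip_summed = CSUM_PARTIAL;
        p->csum_start = gre_hdr_offset;
        p->csum_offset = GRE_CSUM_FIELD_OFFSET;
        p->outer_csum_in_sw = false;
    }
}
```

The point of the model is that only one checksum per packet can be described by `csum_start`/`csum_offset`, so the two branches are mutually exclusive.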
  4. 29 Jan, 2021 12 commits
  5. 24 Jan, 2021 3 commits
  6. 23 Jan, 2021 3 commits
  7. 21 Jan, 2021 6 commits
  8. 20 Jan, 2021 4 commits
    • Y
      tcp: fix TCP socket rehash stats mis-accounting · 9c30ae83
      Yuchung Cheng committed
      The previous commit 32efcc06 ("tcp: export count for rehash attempts")
      would mis-account rehashing SNMP and socket stats:
      
        a. During handshake of an active open, only counts the first
           SYN timeout
      
        b. After handshake of passive and active open, stop updating
           after (roughly) TCP_RETRIES1 recurring RTOs
      
        c. After the socket aborts, over count timeout_rehash by 1
      
      This patch fixes this by checking the rehash result from sk_rethink_txhash.
      
      Fixes: 32efcc06 ("tcp: export count for rehash attempts")
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Link: https://lore.kernel.org/r/20210119192619.1848270-1-ycheng@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9c30ae83
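The idea of counting a rehash only when one actually happened can be sketched as follows. This is a simplified user-space model, not the kernel code: `struct sock_model`, `rethink_txhash` and `on_rto` are hypothetical names, and the way a new hash is picked here is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

struct sock_model {
    uint32_t txhash;          /* 0 means "no tx hash set" in this model */
    unsigned timeout_rehash;  /* per-socket rehash counter */
};

/* Pick a new tx hash if one is in use; report whether anything changed. */
bool rethink_txhash(struct sock_model *sk)
{
    if (sk->txhash == 0)
        return false;
    sk->txhash ^= 0x9e3779b9u;   /* arbitrary perturbation for the model */
    if (sk->txhash == 0)
        sk->txhash = 1;          /* keep "in use" hashes non-zero */
    return true;
}

/* On a retransmission timeout, account the rehash only if it happened. */
void on_rto(struct sock_model *sk)
{
    if (rethink_txhash(sk))
        sk->timeout_rehash++;
}
```

Tying the counter to the return value avoids both under-counting and over-counting, because the accounting can no longer drift from the actual rehash decision.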
    • E
      tcp: do not mess with cloned skbs in tcp_add_backlog() · b160c285
      Eric Dumazet committed
      Heiner Kallweit reported that some skbs were sent with
      the following invalid GSO properties :
      - gso_size > 0
      - gso_type == 0
      
      This was triggering a WARN_ON_ONCE() in rtl8169_tso_csum_v2.
      
      Juerg Haefliger was able to reproduce a similar issue using
      a lan78xx NIC and a workload mixing TCP incoming traffic
      and forwarded packets.
      
      The problem is that tcp_add_backlog() is writing
      over gso_segs and gso_size even if the incoming packet will not
      be coalesced to the backlog tail packet.
      
      While skb_try_coalesce() would bail out if tail packet is cloned,
      this overwriting would lead to corruptions of other packets
      cooked by lan78xx, sharing a common super-packet.
      
      The strategy used by lan78xx is to use a big skb, and split
      it into all received packets using skb_clone() to avoid copies.
      The drawback of this strategy is that all the small skb share a common
      struct skb_shared_info.
      
      This patch rewrites TCP gso_size/gso_segs handling to only
      happen on the tail skb, since skb_try_coalesce() made sure
      it was not cloned.
      
      Fixes: 4f693b55 ("tcp: implement coalescing on backlog queue")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Bisected-by: Juerg Haefliger <juergh@canonical.com>
      Tested-by: Juerg Haefliger <juergh@canonical.com>
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=209423
      Link: https://lore.kernel.org/r/20210119164900.766957-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      b160c285
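Why writing GSO fields on a cloned skb is dangerous, and why restricting the write to a successfully coalesced tail is safe, can be shown with a small model. The sketch below is an assumption-laden simplification: `struct shared_info`, `struct skb_model` and `backlog_coalesce` are hypothetical, header lengths are ignored, and a reference count above one stands in for "cloned".

```c
#include <stdbool.h>

/* Clones point at the same shared_info, like skbs split from one big buffer. */
struct shared_info {
    int refcnt;
    unsigned gso_size;
    unsigned gso_segs;
};

struct skb_model {
    struct shared_info *shinfo;
    unsigned len;
};

static bool is_cloned(const struct skb_model *skb)
{
    return skb->shinfo->refcnt > 1;
}

/* Try to merge skb into tail. The incoming skb's shared info is only read;
 * GSO fields are written on the tail alone, and only after we know the
 * tail is not shared with any other packet. */
bool backlog_coalesce(struct skb_model *tail, const struct skb_model *skb)
{
    unsigned size = skb->shinfo->gso_size ? skb->shinfo->gso_size : skb->len;
    unsigned segs = skb->shinfo->gso_segs ? skb->shinfo->gso_segs : 1;
    unsigned tsize = tail->shinfo->gso_size ? tail->shinfo->gso_size : tail->len;
    unsigned tsegs = tail->shinfo->gso_segs ? tail->shinfo->gso_segs : 1;

    if (is_cloned(tail))
        return false;            /* bail out without touching anything */

    tail->len += skb->len;
    tail->shinfo->gso_size = size > tsize ? size : tsize;
    tail->shinfo->gso_segs = segs + tsegs > 0xFFFF ? 0xFFFF : segs + tsegs;
    return true;
}
```

In this model a failed coalesce leaves every shared structure untouched, so sibling packets that share the same `shared_info` never see a stray `gso_size`.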
    • G
      netfilter: rpfilter: mask ecn bits before fib lookup · 2e5a6266
      Guillaume Nault committed
      RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
      treats Not-ECT or ECT(1) packets in a different way than those with
      ECT(0) or CE.
      
      Reproducer:
      
        Create two netns, connected with a veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11/32 dev veth10
      
        Add a route to ns1 in ns0:
        $ ip -netns ns0 route add 192.0.2.11/32 dev veth01
      
        In ns1, only packets with TOS 4 can be routed to ns0:
        $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
      
        Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
        is 4:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 0% packet loss ...
      
        Now use iptables' rpfilter module in ns1:
        $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
      
        Not-ECT and ECT(1) packets still pass:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
      
        But ECT(0) and CE packets are dropped:
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 100% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 100% packet loss ...
      
      After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
      
      Fixes: 8f97339d ("netfilter: add ipv4 reverse path filter match")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2e5a6266
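The difference between the two masks can be checked with plain arithmetic. The sketch below redefines the mask values locally, with the values they have in the Linux headers as far as I know (`IPTOS_TOS_MASK` is 0x1E); the two helper functions `tos_with_rt_tos` and `tos_with_rt_mask` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Local copies of the kernel definitions, for illustration. */
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_RT_MASK  (IPTOS_TOS_MASK & ~3)       /* 0x1C: both ECN bits cleared */
#define RT_TOS(tos)    ((tos) & IPTOS_TOS_MASK)    /* clears only the lowest bit */

/* TOS value used for the lookup when only RT_TOS() is applied. */
uint8_t tos_with_rt_tos(uint8_t tos)
{
    return RT_TOS(tos);
}

/* TOS value used for the lookup when IPTOS_RT_MASK is applied. */
uint8_t tos_with_rt_mask(uint8_t tos)
{
    return tos & IPTOS_RT_MASK;
}
```

For the four values used in the reproducer (4, 5, 6, 7), `RT_TOS()` yields 4, 4, 6, 6, so only the first two match a route with `tos 4`, while `IPTOS_RT_MASK` yields 4 for all of them.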
    • G
      udp: mask TOS bits in udp_v4_early_demux() · 8d2b51b0
      Guillaume Nault committed
      udp_v4_early_demux() is the only function that calls
      ip_mc_validate_source() with a TOS that hasn't been masked with
      IPTOS_RT_MASK.
      
      This results in different behaviours for incoming multicast UDPv4
      packets, depending on if ip_mc_validate_source() is called from the
      early-demux path (udp_v4_early_demux) or from the regular input path
      (ip_route_input_noref).
      
      ECN would normally not be used with UDP multicast packets, so the
      practical consequences should be limited on that side. However,
      IPTOS_RT_MASK also masks the TOS' high order bits, so using it
      aligns with the non-early-demux path behaviour.
      
      Reproducer:
      
        Setup two netns, connected with veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip -netns ns0 link set dev lo up
        $ ip -netns ns1 link set dev lo up
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
      
        In ns0, add route to multicast address 224.0.2.0/24 using source
        address 198.51.100.10:
        $ ip -netns ns0 address add 198.51.100.10/32 dev lo
        $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
      
        In ns1, define route to 198.51.100.10, only for packets with TOS 4:
        $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
      
        Also activate rp_filter in ns1, so that incoming packets not matching
        the above route get dropped:
        $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
      
        Now try to receive packets on 224.0.2.11:
        $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
      
        In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
        tos 6 for socat):
        $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test0" message is properly received by socat in ns1, because
        early-demux has no cached dst to use, so source address validation
        is done by ip_route_input_mc(), which receives a TOS that has the
        ECN bits masked.
      
        Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
        $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test1" message isn't received by socat in ns1, because, now,
        early-demux has a cached dst to use and calls ip_mc_validate_source()
        immediately, without masking the ECN bits.
      
      Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8d2b51b0
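The "test0 passes, test1 is dropped" behaviour in the reproducer comes down to which TOS value reaches source validation. A rough model, assuming an exact-match comparison between the route's TOS and the lookup TOS (`source_is_valid`, `early_demux_tos_unmasked` and `input_path_tos` are hypothetical helpers; the mask value is a local copy of the kernel's definition):

```c
#include <stdbool.h>
#include <stdint.h>

#define IPTOS_TOS_MASK 0x1E
#define IPTOS_RT_MASK  (IPTOS_TOS_MASK & ~3)   /* 0x1C */

/* Model of source validation: the route only matches its exact TOS. */
bool source_is_valid(uint8_t route_tos, uint8_t lookup_tos)
{
    return route_tos == lookup_tos;
}

/* Early-demux path before the fix: the raw TOS byte is passed through. */
uint8_t early_demux_tos_unmasked(uint8_t iph_tos)
{
    return iph_tos;
}

/* Regular input path, and early demux after the fix: TOS is masked first. */
uint8_t input_path_tos(uint8_t iph_tos)
{
    return iph_tos & IPTOS_RT_MASK;
}
```

With the reproducer's `tos=6` and a route restricted to `tos 4`, the masked value is 4 and validation succeeds, while the unmasked value 6 fails it. The mask also drops the three high-order precedence bits, for example 0xE4 becomes 0x04.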
  9. 19 Jan, 2021 1 commit
  10. 16 Jan, 2021 1 commit
  11. 14 Jan, 2021 1 commit
  12. 12 Jan, 2021 1 commit
    • W
      esp: avoid unneeded kmap_atomic call · 9bd6b629
      Willem de Bruijn committed
      esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
      the esp trailer.
      
      It accesses the page with kmap_atomic to handle highmem. But
      skb_page_frag_refill can return compound pages, of which
      kmap_atomic only maps the first underlying page.
      
      skb_page_frag_refill does not return highmem, because flag
      __GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP.
      That also does not call kmap_atomic, but directly uses page_address,
      in skb_copy_to_page_nocache. Do the same for ESP.
      
      This issue has become easier to trigger with recent kmap local
      debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.
      
      Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
      Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9bd6b629
  13. 09 Jan, 2021 1 commit
    • J
      net: ip_tunnel: clean up endianness conversions · fda4fde2
      Julian Wiedmann committed
      sparse complains about some harmless endianness issues:
      
      > net/ipv4/ip_tunnel_core.c:225:43: warning: cast to restricted __be16
      > net/ipv4/ip_tunnel_core.c:225:43: warning: incorrect type in initializer (different base types)
      > net/ipv4/ip_tunnel_core.c:225:43:    expected restricted __be16 [usertype] mtu
      > net/ipv4/ip_tunnel_core.c:225:43:    got unsigned short [usertype]
      
      iptunnel_pmtud_build_icmp() uses the wrong flavour of byte-order conversion
      when storing the MTU into the ICMPv4 packet. Use htons(), just like
      iptunnel_pmtud_build_icmpv6() does.
      
      > net/ipv4/ip_tunnel_core.c:248:35: warning: cast from restricted __be16
      > net/ipv4/ip_tunnel_core.c:248:35: warning: incorrect type in argument 3 (different base types)
      > net/ipv4/ip_tunnel_core.c:248:35:    expected unsigned short type
      > net/ipv4/ip_tunnel_core.c:248:35:    got restricted __be16 [usertype]
      > net/ipv4/ip_tunnel_core.c:341:35: warning: cast from restricted __be16
      > net/ipv4/ip_tunnel_core.c:341:35: warning: incorrect type in argument 3 (different base types)
      > net/ipv4/ip_tunnel_core.c:341:35:    expected unsigned short type
      > net/ipv4/ip_tunnel_core.c:341:35:    got restricted __be16 [usertype]
      
      eth_header() wants the Ethertype in host-order, use the correct flavour of
      byte-order conversion.
      
      > net/ipv4/ip_tunnel_core.c:600:45: warning: restricted __be16 degrades to integer
      > net/ipv4/ip_tunnel_core.c:609:30: warning: incorrect type in assignment (different base types)
      > net/ipv4/ip_tunnel_core.c:609:30:    expected int type
      > net/ipv4/ip_tunnel_core.c:609:30:    got restricted __be16 [usertype]
      > net/ipv4/ip_tunnel_core.c:619:30: warning: incorrect type in assignment (different base types)
      > net/ipv4/ip_tunnel_core.c:619:30:    expected int type
      > net/ipv4/ip_tunnel_core.c:619:30:    got restricted __be16 [usertype]
      > net/ipv4/ip_tunnel_core.c:629:30: warning: incorrect type in assignment (different base types)
      > net/ipv4/ip_tunnel_core.c:629:30:    expected int type
      > net/ipv4/ip_tunnel_core.c:629:30:    got restricted __be16 [usertype]
      
      The TUNNEL_* types are big-endian, so adjust the type of the local
      variable in ip_tun_parse_opts().
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Link: https://lore.kernel.org/r/20210107144008.25777-1-jwi@linux.ibm.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fda4fde2
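The two conversion directions mentioned in this commit can be illustrated in user space with the standard `htons()`/`ntohs()` pair. This is a sketch, not the kernel code: `put_mtu` and `get_ethertype` are hypothetical helpers, and they operate on raw byte buffers rather than on packet headers.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Host value -> wire field: store an MTU in network (big-endian) order. */
void put_mtu(uint8_t field[2], uint16_t mtu_host)
{
    uint16_t wire = htons(mtu_host);
    memcpy(field, &wire, sizeof(wire));
}

/* Wire field -> host value: read a big-endian Ethertype for a caller
 * that expects host byte order. */
uint16_t get_ethertype(const uint8_t field[2])
{
    uint16_t wire;
    memcpy(&wire, field, sizeof(wire));
    return ntohs(wire);
}
```

On both little-endian and big-endian hosts, `put_mtu(buf, 1500)` leaves the bytes 0x05 0xDC in the buffer, and `get_ethertype()` on the bytes 0x08 0x00 returns 0x0800. Both functions swap bytes the same way on a given machine, which is why mixing them up is harmless at run time and only shows up as a sparse type warning.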