1. 29 4月, 2021 1 次提交
  2. 14 4月, 2021 1 次提交
  3. 31 3月, 2021 1 次提交
    • A
      icmp: add response to RFC 8335 PROBE messages · d329ea5b
      Andreas Roeseler 提交于
      Modify the icmp_rcv function to check PROBE messages and call icmp_echo
      if a PROBE request is detected.
      
      Modify the existing icmp_echo function to respond ot both ping and PROBE
      requests.
      
      This was tested using a custom modification to the iputils package and
      wireshark. It supports IPV4 probing by name, ifindex, and probing by
      both IPV4 and IPV6 addresses. It currently does not support responding
      to probes off the proxy node (see RFC 8335 Section 2).
      
      The modification to the iputils package is still in development and can
      be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It
      supports full sending functionality of PROBE requests, but currently
      does not parse the response messages, which is why Wireshark is required
      to verify the sent and recieved PROBE messages. The modification adds
      the ``-e'' flag to the command which allows the user to specify the
      interface identifier to query the probed host. An example usage would be
      <./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the
      destination node.
      Signed-off-by: NAndreas Roeseler <andreas.a.roeseler@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d329ea5b
  4. 24 2月, 2021 1 次提交
    • J
      net: icmp: pass zeroed opts from icmp{,v6}_ndo_send before sending · ee576c47
      Jason A. Donenfeld 提交于
      The icmp{,v6}_send functions make all sorts of use of skb->cb, casting
      it with IPCB or IP6CB, assuming the skb to have come directly from the
      inet layer. But when the packet comes from the ndo layer, especially
      when forwarded, there's no telling what might be in skb->cb at that
      point. As a result, the icmp sending code risks reading bogus memory
      contents, which can result in nasty stack overflows such as this one
      reported by a user:
      
          panic+0x108/0x2ea
          __stack_chk_fail+0x14/0x20
          __icmp_send+0x5bd/0x5c0
          icmp_ndo_send+0x148/0x160
      
      In icmp_send, skb->cb is cast with IPCB and an ip_options struct is read
      from it. The optlen parameter there is of particular note, as it can
      induce writes beyond bounds. There are quite a few ways that can happen
      in __ip_options_echo. For example:
      
          // sptr/skb are attacker-controlled skb bytes
          sptr = skb_network_header(skb);
          // dptr/dopt points to stack memory allocated by __icmp_send
          dptr = dopt->__data;
          // sopt is the corrupt skb->cb in question
          if (sopt->rr) {
              optlen  = sptr[sopt->rr+1]; // corrupt skb->cb + skb->data
              soffset = sptr[sopt->rr+2]; // corrupt skb->cb + skb->data
      	// this now writes potentially attacker-controlled data, over
      	// flowing the stack:
              memcpy(dptr, sptr+sopt->rr, optlen);
          }
      
      In the icmpv6_send case, the story is similar, but not as dire, as only
      IP6CB(skb)->iif and IP6CB(skb)->dsthao are used. The dsthao case is
      worse than the iif case, but it is passed to ipv6_find_tlv, which does
      a bit of bounds checking on the value.
      
      This is easy to simulate by doing a `memset(skb->cb, 0x41,
      sizeof(skb->cb));` before calling icmp{,v6}_ndo_send, and it's only by
      good fortune and the rarity of icmp sending from that context that we've
      avoided reports like this until now. For example, in KASAN:
      
          BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xa0e/0x12b0
          Write of size 38 at addr ffff888006f1f80e by task ping/89
          CPU: 2 PID: 89 Comm: ping Not tainted 5.10.0-rc7-debug+ #5
          Call Trace:
           dump_stack+0x9a/0xcc
           print_address_description.constprop.0+0x1a/0x160
           __kasan_report.cold+0x20/0x38
           kasan_report+0x32/0x40
           check_memory_region+0x145/0x1a0
           memcpy+0x39/0x60
           __ip_options_echo+0xa0e/0x12b0
           __icmp_send+0x744/0x1700
      
      Actually, out of the 4 drivers that do this, only gtp zeroed the cb for
      the v4 case, while the rest did not. So this commit actually removes the
      gtp-specific zeroing, while putting the code where it belongs in the
      shared infrastructure of icmp{,v6}_ndo_send.
      
      This commit fixes the issue by passing an empty IPCB or IP6CB along to
      the functions that actually do the work. For the icmp_send, this was
      already trivial, thanks to __icmp_send providing the plumbing function.
      For icmpv6_send, this required a tiny bit of refactoring to make it
      behave like the v4 case, after which it was straight forward.
      
      Fixes: a2b78e9b ("sunvnet: generate ICMP PTMUD messages for smaller port MTUs")
      Reported-by: NSinYu <liuxyon@gmail.com>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/netdev/CAF=yD-LOF116aHub6RMe8vB8ZpnrrnoTdqhobEx+bvoA8AsP0w@mail.gmail.com/T/Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Link: https://lore.kernel.org/r/20210223131858.72082-1-Jason@zx2c4.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      ee576c47
  5. 24 11月, 2020 1 次提交
  6. 17 10月, 2020 1 次提交
  7. 15 10月, 2020 1 次提交
    • M
      ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2) · e1e84eb5
      Mathieu Desnoyers 提交于
      As per RFC792, ICMP errors should be sent to the source host.
      
      However, in configurations with Virtual Routing and Forwarding tables,
      looking up which routing table to use is currently done by using the
      destination net_device.
      
      commit 9d1a6c4e ("net: icmp_route_lookup should use rt dev to
      determine L3 domain") changes the interface passed to
      l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
      to skb_dst(skb_in)->dev. This effectively uses the destination device
      rather than the source device for choosing which routing table should be
      used to lookup where to send the ICMP error.
      
      Therefore, if the source and destination interfaces are within separate
      VRFs, or one in the global routing table and the other in a VRF, looking
      up the source host in the destination interface's routing table will
      fail if the destination interface's routing table contains no route to
      the source host.
      
      One observable effect of this issue is that traceroute does not work in
      the following cases:
      
      - Route leaking between global routing table and VRF
      - Route leaking between VRFs
      
      Preferably use the source device routing table when sending ICMP error
      messages. If no source device is set, fall-back on the destination
      device routing table. Else, use the main routing table (index 0).
      
      [ It has been pointed out that a similar issue may exist with ICMP
        errors triggered when forwarding between network namespaces. It would
        be worthwhile to investigate, but is outside of the scope of this
        investigation. ]
      
      [ It has also been pointed out that a similar issue exists with
        unreachable / fragmentation needed messages, which can be triggered by
        changing the MTU of eth1 in r1 to 1400 and running:
      
        ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
      
        Some investigation points to raw_icmp_error() and raw_err() as being
        involved in this last scenario. The focus of this patch is TTL expired
        ICMP messages, which go through icmp_route_lookup.
        Investigation of failure modes related to raw_icmp_error() is beyond
        this investigation's scope. ]
      
      Fixes: 9d1a6c4e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
      Link: https://tools.ietf.org/html/rfc792Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e1e84eb5
  8. 01 9月, 2020 1 次提交
  9. 25 8月, 2020 1 次提交
  10. 21 8月, 2020 3 次提交
  11. 25 7月, 2020 3 次提交
    • W
      icmp6: support rfc 4884 · 01370434
      Willem de Bruijn 提交于
      Extend the rfc 4884 read interface introduced for ipv4 in
      commit eba75c58 ("icmp: support rfc 4884") to ipv6.
      
      Add socket option SOL_IPV6/IPV6_RECVERR_RFC4884.
      
      Changes v1->v2:
        - make ipv6_icmp_error_rfc4884 static (file scope)
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01370434
    • W
      icmp: prepare rfc 4884 for ipv6 · 178c49d9
      Willem de Bruijn 提交于
      The RFC 4884 spec is largely the same between IPv4 and IPv6.
      Factor out the IPv4 specific parts in preparation for IPv6 support:
      
      - icmp types supported
      
      - icmp header size, and thus offset to original datagram start
      
      - datagram length field offset in icmp(6)hdr.
      
      - datagram length field word size: 4B for IPv4, 8B for IPv6.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      178c49d9
    • W
      icmp: revise rfc4884 tests · c4e9e09f
      Willem de Bruijn 提交于
      1) Only accept packets with original datagram len field >= header len.
      
      The extension header must start after the original datagram headers.
      The embedded datagram len field is compared against the 128B minimum
      stipulated by RFC 4884. It is unlikely that headers extend beyond
      this. But as we know the exact header length, check explicitly.
      
      2) Remove the check that datagram length must be <= 576B.
      
      This is a send constraint. There is no value in testing this on rx.
      Within private networks it may be known safe to send larger packets.
      Process these packets.
      
      This test was also too lax. It compared original datagram length
      rather than entire icmp packet length. The stand-alone fix would be:
      
        -       if (hlen + skb->len > 576)
        +       if (-skb_network_offset(skb) + skb->len > 576)
      
      Fixes: eba75c58 ("icmp: support rfc 4884")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4e9e09f
  12. 20 7月, 2020 1 次提交
    • W
      icmp: support rfc 4884 · eba75c58
      Willem de Bruijn 提交于
      Add setsockopt SOL_IP/IP_RECVERR_4884 to return the offset to an
      extension struct if present.
      
      ICMP messages may include an extension structure after the original
      datagram. RFC 4884 standardized this behavior. It stores the offset
      in words to the extension header in u8 icmphdr.un.reserved[1].
      
      The field is valid only for ICMP types destination unreachable, time
      exceeded and parameter problem, if length is at least 128 bytes and
      entire packet does not exceed 576 bytes.
      
      Return the offset to the start of the extension struct when reading an
      ICMP error from the error queue, if it matches the above constraints.
      
      Do not return the raw u8 field. Return the offset from the start of
      the user buffer, in bytes. The kernel does not return the network and
      transport headers, so subtract those.
      
      Also validate the headers. Return the offset regardless of validation,
      as an invalid extension must still not be misinterpreted as part of
      the original datagram. Note that !invalid does not imply valid. If
      the extension version does not match, no validation can take place,
      for instance.
      
      For backward compatibility, make this optional, set by setsockopt
      SOL_IP/IP_RECVERR_RFC4884. For API example and feature test, see
      github.com/wdebruij/kerneltools/blob/master/tests/recv_icmp_v2.c
      
      For forward compatibility, reserve only setsockopt value 1, leaving
      other bits for additional icmp extensions.
      
      Changes
        v1->v2:
        - convert word offset to byte offset from start of user buffer
          - return in ee_data as u8 may be insufficient
        - define extension struct and object header structs
        - return len only if constraints met
        - if returning len, also validate
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eba75c58
  13. 02 7月, 2020 1 次提交
    • W
      ip: Fix SO_MARK in RST, ACK and ICMP packets · 0da7536f
      Willem de Bruijn 提交于
      When no full socket is available, skbs are sent over a per-netns
      control socket. Its sk_mark is temporarily adjusted to match that
      of the real (request or timewait) socket or to reflect an incoming
      skb, so that the outgoing skb inherits this in __ip_make_skb.
      
      Introduction of the socket cookie mark field broke this. Now the
      skb is set through the cookie and cork:
      
      <caller>		# init sockc.mark from sk_mark or cmsg
      ip_append_data
        ip_setup_cork		# convert sockc.mark to cork mark
      ip_push_pending_frames
        ip_finish_skb
          __ip_make_skb	# set skb->mark to cork mark
      
      But I missed these special control sockets. Update all callers of
      __ip(6)_make_skb that were originally missed.
      
      For IPv6, the same two icmp(v6) paths are affected. The third
      case is not, as commit 92e55f41 ("tcp: don't annotate
      mark on control socket from tcp_v6_send_response()") replaced
      the ctl_sk->sk_mark with passing the mark field directly as a
      function argument. That commit predates the commit that
      introduced the bug.
      
      Fixes: c6af0c22 ("ip: support SO_MARK cmsg")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reported-by: NMartin KaFai Lau <kafai@fb.com>
      Reviewed-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0da7536f
  14. 29 4月, 2020 1 次提交
  15. 13 3月, 2020 1 次提交
  16. 14 2月, 2020 1 次提交
  17. 09 11月, 2019 1 次提交
    • E
      net: icmp: fix data-race in cmp_global_allow() · bbab7ef2
      Eric Dumazet 提交于
      This code reads two global variables without protection
      of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
      avoid load/store-tearing and better document the intent.
      
      KCSAN reported :
      BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow
      
      read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
       icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
      
      write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
       icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 4cdf507d ("icmp: add a global rate limitation")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbab7ef2
  18. 04 11月, 2019 1 次提交
    • F
      net: icmp: use input address in traceroute · 2adf81c0
      Francesco Ruggeri 提交于
      Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
      primary address of the interface the packet was received on, even if
      the path goes through a secondary address. In the example:
      
                          1.0.3.1/24
       ---- 1.0.1.3/24    1.0.1.1/24 ---- 1.0.2.1/24    1.0.2.4/24 ----
       |H1|--------------------------|R1|--------------------------|H2|
       ----            N1            ----            N2            ----
      
      where 1.0.3.1/24 is R1's primary address on N1, traceroute from
      H1 to H2 returns:
      
      traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
       1  1.0.3.1 (1.0.3.1)  0.018 ms  0.006 ms  0.006 ms
       2  1.0.2.4 (1.0.2.4)  0.021 ms  0.007 ms  0.007 ms
      
      After applying this patch, it returns:
      
      traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
       1  1.0.1.1 (1.0.1.1)  0.033 ms  0.007 ms  0.006 ms
       2  1.0.2.4 (1.0.2.4)  0.011 ms  0.007 ms  0.007 ms
      Original-patch-by: NBill Fenner <fenner@arista.com>
      Signed-off-by: NFrancesco Ruggeri <fruggeri@arista.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2adf81c0
  19. 25 8月, 2019 1 次提交
    • H
      ipv4/icmp: fix rt dst dev null pointer dereference · e2c69393
      Hangbin Liu 提交于
      In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
      e,g, with tunnel collect_md mode, which will cause kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
      
      The reason is __metadata_dst_init() init dst->dev to NULL by default.
      We could not fix it in __metadata_dst_init() as there is no dev supplied.
      On the other hand, the reason we need rt->dst.dev is to get the net.
      So we can just try get it from skb->dev when rt->dst.dev is NULL.
      
      v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
      still use dst.dev and do a check to avoid crash.
      
      v3: No changes.
      
      v2: fix the issue in __icmp_send() instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: c8b34e68 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2c69393
  20. 22 8月, 2019 1 次提交
  21. 04 6月, 2019 1 次提交
  22. 31 5月, 2019 1 次提交
  23. 26 2月, 2019 1 次提交
  24. 25 2月, 2019 1 次提交
  25. 09 11月, 2018 1 次提交
    • S
      net: Convert protocol error handlers from void to int · 32bbd879
      Stefano Brivio 提交于
      We'll need this to handle ICMP errors for tunnels without a sending socket
      (i.e. FoU and GUE). There, we might have to look up different types of IP
      tunnels, registered as network protocols, before we get a match, so we
      want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
      inet_protos and inet6_protos. These error codes will be used in the next
      patch.
      
      For consistency, return sensible error codes in protocol error handlers
      whenever handlers can't handle errors because, even if valid, they don't
      match a protocol or any of its states.
      
      This has no effect on existing error handling paths.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32bbd879
  26. 27 9月, 2018 2 次提交
  27. 07 7月, 2018 1 次提交
  28. 04 7月, 2018 1 次提交
  29. 28 3月, 2018 1 次提交
  30. 13 2月, 2018 1 次提交
    • K
      net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai 提交于
      arp_net_ops just addr/removes /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreigh pernet_operations
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocated/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and socket,
      noone else interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happen from inet_twsk_deschedule_put(), but this
      function is safe without any extern locks, as it's synchronized
      inside itself. There are many examples, it's used in different
      context. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seem to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f84c6821
  31. 24 10月, 2017 1 次提交
  32. 15 10月, 2017 1 次提交
    • M
      icmp: don't fail on fragment reassembly time exceeded · 258bbb1b
      Matteo Croce 提交于
      The ICMP implementation currently replies to an ICMP time exceeded message
      (type 11) with an ICMP host unreachable message (type 3, code 1).
      
      However, time exceeded messages can either represent "time to live exceeded
      in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
      
      Unconditionally replying to "fragment reassembly time exceeded" with
      host unreachable messages might cause unjustified connection resets
      which are now easily triggered as UFO has been removed, because, in turn,
      sending large buffers triggers IP fragmentation.
      
      The issue can be easily reproduced by running a lot of UDP streams
      which is likely to trigger IP fragmentation:
      
        # start netserver in the test namespace
        ip netns add test
        ip netns exec test netserver
      
        # create a VETH pair
        ip link add name veth0 type veth peer name veth0 netns test
        ip link set veth0 up
        ip -n test link set veth0 up
      
        for i in $(seq 20 29); do
            # assign addresses to both ends
            ip addr add dev veth0 192.168.$i.1/24
            ip -n test addr add dev veth0 192.168.$i.2/24
      
            # start the traffic
            netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
        done
      
        # wait
        send_data: data send error: No route to host (errno 113)
        netperf: send_omni: send_data failed: No route to host
      
      We need to differentiate instead: if fragment reassembly time exceeded
      is reported, we need to silently drop the packet,
      if time to live exceeded is reported, maintain the current behaviour.
      In both cases increment the related error count "icmpInTimeExcds".
      
      While at it, fix a typo in a comment, and convert the if statement
      into a switch to mate it more readable.
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      258bbb1b
  33. 07 8月, 2017 1 次提交
  34. 15 6月, 2017 1 次提交
    • J
      net: don't global ICMP rate limit packets originating from loopback · 849a44de
      Jesper Dangaard Brouer 提交于
      Florian Weimer seems to have a glibc test-case which requires that
      loopback interfaces does not get ICMP ratelimited.  This was broken by
      commit c0303efe ("net: reduce cycles spend on ICMP replies that
      gets rate limited").
      
      An ICMP response will usually be routed back-out the same incoming
      interface.  Thus, take advantage of this and skip global ICMP
      ratelimit when the incoming device is loopback.  In the unlikely event
      that the outgoing it not loopback, due to strange routing policy
      rules, ICMP rate limiting still works via peer ratelimiting via
      icmpv4_xrlim_allow().  Thus, we should still comply with RFC1812
      (section 4.3.2.8 "Rate Limiting").
      
      This seems to fix the reproducer given by Florian.  While still
      avoiding to perform expensive and unneeded outgoing route lookup for
      rate limited packets (in the non-loopback case).
      
      Fixes: c0303efe ("net: reduce cycles spend on ICMP replies that gets rate limited")
      Reported-by: NFlorian Weimer <fweimer@redhat.com>
      Reported-by: N"H.J. Lu" <hjl.tools@gmail.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      849a44de
  35. 27 5月, 2017 1 次提交