1. 27 4月, 2017 9 次提交
  2. 25 4月, 2017 4 次提交
    • W
      net/tcp_fastopen: Remove mss check in tcp_write_timeout() · 59450f8d
      Wei Wang 提交于
      Christoph Paasch from Apple found another firewall issue for TFO:
      After successful 3WHS using TFO, server and client starts to exchange
      data. Afterwards, a 10s idle time occurs on this connection. After that,
      firewall starts to drop every packet on this connection.
      
      The fix for this issue is to extend existing firewall blackhole detection
      logic in tcp_write_timeout() by removing the mss check.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59450f8d
    • W
      net/tcp_fastopen: Add snmp counter for blackhole detection · 46c2fa39
      Wei Wang 提交于
      This counter records the number of times the firewall blackhole issue is
      detected and active TFO is disabled.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46c2fa39
    • W
      net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Wei Wang 提交于
      Middlebox firewall issues can potentially cause server's data being
      blackholed after a successful 3WHS using TFO. Following are the related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation that the server's data gets dropped
      after 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure b/c the client can not detect such behavior to
      mitigate the situation (such as disabling TFO). Failing to proceed, the
      application (e.g., SSL library) may simply timeout and retry with TFO
      again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. client side TFO socket detects out of order FIN
      2. client side TFO socket receives out of order RST
      
      We disable active side TFO globally for 1hr at first. Then if it
      happens again, we disable it for 2h, then 4h, 8h, ...
      And we reset the timeout to 1hr if a client side TFO sockets not opened
      on loopback has successfully received data segs from server.
      And we examine this condition during close().
      
      The rational behind it is that when such firewall issue happens,
      application running on the client should eventually close the socket as
      it is not able to get the data it is expecting. Or application running
      on the server should close the socket as it is not able to receive any
      response from client.
      In both cases, out of order FIN or RST will get received on the client
      given that the firewall will not block them as no data are in those
      frames.
      And we want to disable active TFO globally as it helps if the middle box
      is very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          the initial timeout to use when firewall blackhole issue happens.
          This can be set and read.
          When setting it to 0, it means to disable the active disable logic.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf1ef3f0
    • D
      net: add rcu locking when changing early demux · 58c4c6a3
      David Ahern 提交于
      systemd-sysctl is triggering a suspicious RCU usage message when
      net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
      a sysctl config file:
      
      [   33.896184] ===============================
      [   33.899558] [ ERR: suspicious RCU usage.  ]
      [   33.900624] 4.11.0-rc7+ #104 Not tainted
      [   33.901698] -------------------------------
      [   33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
      [   33.905724]
      other info that might help us debug this:
      
      [   33.907656]
      rcu_scheduler_active = 2, debug_locks = 0
      [   33.909288] 1 lock held by systemd-sysctl/143:
      [   33.910373]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48
      [   33.912407]
      stack backtrace:
      [   33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
      [   33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   33.917870] Call Trace:
      [   33.918431]  dump_stack+0x81/0xb6
      [   33.919241]  lockdep_rcu_suspicious+0x10f/0x118
      [   33.920263]  proc_configure_early_demux+0x65/0x10a
      [   33.921391]  proc_udp_early_demux+0x3a/0x41
      
      add rcu locking to proc_configure_early_demux.
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58c4c6a3
  3. 22 4月, 2017 1 次提交
    • C
      ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Craig Gallek 提交于
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share datastructures
      (which can not be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9830ad4c
  4. 21 4月, 2017 3 次提交
  5. 19 4月, 2017 1 次提交
    • I
      esp4/6: Fix GSO path for non-GSO SW-crypto packets · 8f92e03e
      Ilan Tayari 提交于
      If esp*_offload module is loaded, outbound packets take the
      GSO code path, being encapsulated at layer 3, but encrypted
      in layer 2. validate_xmit_xfrm calls esp*_xmit for that.
      
      esp*_xmit was wrongfully detecting these packets as going
      through hardware crypto offload, while in fact they should
      be encrypted in software, causing plaintext leakage to
      the network, and also dropping at the receiver side.
      
      Perform the encryption in esp*_xmit, if the SA doesn't have
      a hardware offload_handle.
      
      Also, align esp6 code to esp4 logic.
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Fixes: 383d0350 ("esp6: Reorganize esp_output")
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      8f92e03e
  6. 18 4月, 2017 3 次提交
  7. 14 4月, 2017 10 次提交
  8. 10 4月, 2017 1 次提交
    • E
      tcp: clear saved_syn in tcp_disconnect() · 17c3060b
      Eric Dumazet 提交于
      In the (very unlikely) case a passive socket becomes a listener,
      we do not want to duplicate its saved SYN headers.
      
      This would lead to double frees, use after free, and please hackers and
      various fuzzers
      
      Tested:
          0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, IPPROTO_TCP, TCP_SAVE_SYN, [1], 4) = 0
         +0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
      
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 5) = 0
      
         +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <...>
        +.1 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 connect(4, AF_UNSPEC, ...) = 0
         +0 close(3) = 0
         +0 bind(4, ..., ...) = 0
         +0 listen(4, 5) = 0
      
         +0 < S 0:0(0) win 32972 <mss 1460,nop,wscale 7>
         +0 > S. 0:0(0) ack 1 <...>
        +.1 < . 1:1(0) ack 1 win 257
      
      Fixes: cd8ae852 ("tcp: provide SYN headers for passive connections")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17c3060b
  9. 08 4月, 2017 2 次提交
  10. 07 4月, 2017 2 次提交
    • F
      net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · bbadb9a2
      Florian Larysch 提交于
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, fib_multipath_hash ends up being called with
      that skb. However, as that skb contains no information beyond the
      protocol type, the calculated hash does not match the one we would see
      for a real packet.
      
      There is currently no way to fix this for layer 4 hashing, as
      RTM_GETROUTE doesn't have the necessary information to create layer 4
      headers. To fix this for layer 3 hashing, set appropriate saddr/daddrs
      in the skb and also change the protocol to UDP to avoid special
      treatment for ICMP.
      Signed-off-by: NFlorian Larysch <fl@n621.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbadb9a2
    • F
      net: ipv4: fix multipath RTM_GETROUTE behavior when iif is given · a8801799
      Florian Larysch 提交于
      inet_rtm_getroute synthesizes a skeletal ICMP skb, which is passed to
      ip_route_input when iif is given. If a multipath route is present for
      the designated destination, ip_multipath_icmp_hash ends up being called,
      which uses the source/destination addresses within the skb to calculate
      a hash. However, those are not set in the synthetic skb, causing it to
      return an arbitrary and incorrect result.
      
      Instead, use UDP, which gets no such special treatment.
      Signed-off-by: NFlorian Larysch <fl@n621.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8801799
  11. 06 4月, 2017 2 次提交
  12. 05 4月, 2017 1 次提交
  13. 04 4月, 2017 1 次提交
    • M
      tcp: minimize false-positives on TCP/GRO check · 0b9aefea
      Marcelo Ricardo Leitner 提交于
      Markus Trippelsdorf reported that after commit dcb17d22 ("tcp: warn
      on bogus MSS and try to amend it") the kernel started logging the
      warning for a NIC driver that doesn't even support GRO.
      
      It was diagnosed that it was possibly caused on connections that were
      using TCP Timestamps but some packets lacked the Timestamps option. As
      we reduce rcv_mss when timestamps are used, the lack of them would cause
      the packets to be bigger than expected, although this is a valid case.
      
      As this warning is more as a hint, getting a clean-cut on the
      threshold is probably not worth the execution time spent on it. This
      patch thus alleviates the false-positives with 2 quick checks: by
      accounting for the entire TCP option space and also checking against the
      interface MTU if it's available.
      
      These changes, specially the MTU one, might mask some real positives,
      though if they are really happening, it's possible that sooner or later
      it will be triggered anyway.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b9aefea