1. 28 Oct, 2017 4 commits
  2. 27 Oct, 2017 14 commits
  3. 20 Oct, 2017 1 commit
  4. 02 Oct, 2017 4 commits
  5. 01 Aug, 2017 1 commit
  6. 16 Jun, 2017 1 commit
    • tcp: ULP infrastructure · 734942cc
      Dave Watson committed
      Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
      sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
      ULP can add its own logic by changing the TCP proto_ops structure to its own
      methods.
      
      Example usage:
      
      setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
      
      modules will call:
      tcp_register_ulp(&tcp_tls_ulp_ops);
      
      to register/unregister their ulp, with an init function and name.
      
      A list of registered ulps will be returned by tcp_get_available_ulp, which is
      hooked up to /proc.  Example:
      
      $ cat /proc/sys/net/ipv4/tcp_available_ulp
      tls
      
      There is currently no functionality to remove or chain ULPs, but
      it should be possible to add these in the future if needed.
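
A minimal userspace sketch of the setsockopt() call shown above, assuming a Linux system. The helper name `attach_ulp` is hypothetical, and `IPPROTO_TCP` is used in place of `SOL_TCP` because both have the value 6 on Linux. The fallback definition of `TCP_ULP` is only for older libc headers that lack it.

```c
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* value from the Linux UAPI header linux/tcp.h */
#endif

/*
 * Hypothetical helper: ask the kernel to attach the named ULP to a TCP
 * socket. Returns 0 on success or a negative errno value on failure.
 */
static int attach_ulp(int fd, const char *name)
{
    /* Pass the terminating NUL too, as sizeof("tls") does in the example. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, name, strlen(name) + 1) < 0)
        return -errno;
    return 0;
}
```

A caller would typically invoke `attach_ulp(sock, "tls")` on a TCP socket and treat any negative return as "ULP not available", for example when the module providing it is not loaded.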
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 08 Jun, 2017 3 commits
  8. 25 Apr, 2017 2 commits
    • net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Wei Wang committed
      Middlebox firewall issues can potentially cause the server's data to be
      blackholed after a successful 3WHS using TFO. The following are the related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation in which the server's data gets dropped
      after the 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure because the client cannot detect such behavior
      and so cannot mitigate the situation (such as by disabling TFO). Failing to
      proceed, the application (e.g., SSL library) may simply time out and retry
      with TFO again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. client side TFO socket detects out of order FIN
      2. client side TFO socket receives out of order RST
      
      We disable active side TFO globally for 1hr at first. If it
      happens again, we disable it for 2h, then 4h, 8h, ...
      We reset the timeout to 1hr once a client side TFO socket not opened
      on loopback has successfully received data segments from the server.
      We examine this condition during close().
      
      The rationale behind it is that when such a firewall issue happens,
      the application running on the client should eventually close the socket as
      it is not able to get the data it is expecting. Or the application running
      on the server should close the socket as it is not able to receive any
      response from the client.
      In both cases, an out of order FIN or RST will be received on the client,
      given that the firewall will not block them as those frames carry no
      data.
      We want to disable active TFO globally because it helps when the middlebox
      is very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          the initial timeout to use when firewall blackhole issue happens.
          This can be set and read.
          Setting it to 0 disables the active disable logic.
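
The doubling schedule described above (1h, 2h, 4h, 8h, ...) can be sketched as a small function. This is an illustration only, not the kernel's code: the function name and the `MAX_DOUBLINGS` cap are assumptions, since the commit message does not say whether the growth is bounded.

```c
#include <stdint.h>

/* Hypothetical cap on doublings; the commit message does not state one. */
#define MAX_DOUBLINGS 6

/*
 * Return how long active TFO stays disabled, in seconds, after the
 * blackhole condition has been detected `times_detected` times in a row.
 * An initial timeout of 0 means the disable logic is switched off.
 */
static uint64_t tfo_disable_secs(uint32_t initial_timeout_sec,
                                 unsigned int times_detected)
{
    unsigned int shift;

    if (initial_timeout_sec == 0 || times_detected == 0)
        return 0;

    shift = times_detected - 1;
    if (shift > MAX_DOUBLINGS)
        shift = MAX_DOUBLINGS;

    /* 1st detection: 1x, 2nd: 2x, 3rd: 4x, ... */
    return (uint64_t)initial_timeout_sec << shift;
}
```

With an initial timeout of 3600 seconds, the first four detections give 3600, 7200, 14400 and 28800 seconds. Resetting after a successful data exchange corresponds to setting `times_detected` back to 0.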
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add rcu locking when changing early demux · 58c4c6a3
      David Ahern committed
      systemd-sysctl is triggering a suspicious RCU usage message when
      net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
      a sysctl config file:
      
      [   33.896184] ===============================
      [   33.899558] [ ERR: suspicious RCU usage.  ]
      [   33.900624] 4.11.0-rc7+ #104 Not tainted
      [   33.901698] -------------------------------
      [   33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
      [   33.905724]
      other info that might help us debug this:
      
      [   33.907656]
      rcu_scheduler_active = 2, debug_locks = 0
      [   33.909288] 1 lock held by systemd-sysctl/143:
      [   33.910373]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48
      [   33.912407]
      stack backtrace:
      [   33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
      [   33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   33.917870] Call Trace:
      [   33.918431]  dump_stack+0x81/0xb6
      [   33.919241]  lockdep_rcu_suspicious+0x10f/0x118
      [   33.920263]  proc_configure_early_demux+0x65/0x10a
      [   33.921391]  proc_udp_early_demux+0x3a/0x41
      
      Add RCU locking to proc_configure_early_demux.
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 25 Mar, 2017 1 commit
    • net: Add sysctl to toggle early demux for tcp and udp · dddb64bc
      subashab@codeaurora.org committed
      Certain systems process a significant unconnected UDP workload.
      It would be preferable to disable UDP early demux for those systems
      and enable it for TCP only.
      
      By disabling UDP demux, we see these slight gains on an ARM64 system:
      782 -> 788 Mbps unconnected single stream UDPv4
      633 -> 654 Mbps unconnected UDPv4 different sources
      
      The performance impact can change based on CPU architecture and cache
      sizes. Not much difference will be seen if the entire UDP hash table
      is in cache.
      
      Both sysctls are enabled by default to preserve existing behavior.
      
      v1->v2: Change function pointer instead of adding conditional as
      suggested by Stephen.
      
      v2->v3: Read once in callers to avoid issues due to compiler
      optimizations. Also update commit message with the tests.
      
      v3->v4: Store and use read once result instead of querying pointer
      again incorrectly.
      
      v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 22 Mar, 2017 1 commit
    • net: ipv4: add support for ECMP hash policy choice · bf4e0a3d
      Nikolay Aleksandrov committed
      This patch adds support for ECMP hash policy choice via a new sysctl
      called fib_multipath_hash_policy and also adds support for L4 hashes.
      The current values for fib_multipath_hash_policy are:
       0 - layer 3 (default)
       1 - layer 4
      If there's an skb hash already set and it matches the chosen policy then it
      will be used instead of being calculated (currently only for L4).
      In L3 mode we always calculate the hash because of the ICMP error special
      case. The flow dissector's field consistentification should handle the
      address order, so we can remove the address reversals.
      If the skb is provided we always use it for the hash calculation;
      otherwise we fall back to fl4, which means fl4 has to be set when skb is NULL.
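
To illustrate the difference between the two policy values, here is a toy sketch. It is not the kernel's hash: the `flow` struct, the `mix` function and its constant are all assumptions chosen for brevity. Policy 0 hashes only the addresses, while policy 1 also folds in the ports and protocol.

```c
#include <stdint.h>

/* Hypothetical flow key; addresses and ports are in host byte order. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
};

/* Toy mixing step, used only for illustration. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9E3779B1u;
    return h ^ (h >> 15);
}

/* policy 0: layer 3 (addresses only); policy 1: layer 4 (adds ports, proto) */
static uint32_t multipath_hash(const struct flow *f, int policy)
{
    uint32_t h = mix(0, f->saddr);

    h = mix(h, f->daddr);
    if (policy == 1) {
        h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
        h = mix(h, f->proto);
    }
    return h;
}
```

Under policy 0, two connections between the same pair of hosts always hash identically and so follow the same path. Under policy 1, connections that differ only in source port get different hashes and can be spread across next hops.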
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 17 Mar, 2017 1 commit
    • tcp: remove tcp_tw_recycle · 4396e461
      Soheil Hassas Yeganeh committed
      The tcp_tw_recycle was already broken for connections
      behind NAT, since the per-destination timestamp is not
      monotonically increasing for multiple machines behind
      a single destination address.
      
      After the randomization of TCP timestamp offsets
      in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
      for each connection), tcp_tw_recycle is broken for all
      types of connections for the same reason: the timestamps
      received from a single machine are no longer monotonically
      increasing.
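
The failure mode can be shown with a simplified model of a per-destination timestamp check. This is a sketch, not the removed kernel code: the `peer_ts` struct and `peer_ts_accept` function are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-destination state: the last timestamp value seen. */
struct peer_ts {
    bool valid;
    uint32_t last_tsval;
};

/*
 * Accept a new connection attempt only if its timestamp is not older than
 * the last one recorded for this destination address.
 */
static bool peer_ts_accept(struct peer_ts *peer, uint32_t tsval)
{
    /* The signed difference tolerates 32-bit wraparound. */
    if (peer->valid && (int32_t)(tsval - peer->last_tsval) < 0)
        return false;

    peer->valid = true;
    peer->last_tsval = tsval;
    return true;
}
```

If host A behind a NAT sends timestamp 5000 and host B behind the same NAT then sends 1200, B is rejected even though it is a different, well-behaved machine. With randomized per-connection offsets the same thing happens between two connections from one host.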
      
      Remove tcp_tw_recycle, since it is not functional. Also, remove
      the PAWSPassive SNMP counter since it is only used for
      tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
      since the strict argument is only set when tcp_tw_recycle is
      enabled.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Cc: Lutz Vieweg <lvml@5t9.de>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 31 Jan, 2017 1 commit
    • net: Avoid receiving packets with an l3mdev on unbound UDP sockets · 63a6fff3
      Robert Shearman committed
      Packets arriving in a VRF currently are delivered to UDP sockets that
      aren't bound to any interface. TCP defaults to not delivering packets
      arriving in a VRF to unbound sockets. IP route lookup and socket
      transmit both assume that unbound means using the default table and
      UDP applications that haven't been changed to be aware of VRFs may not
      function correctly in this case since they may not be able to handle
      overlapping IP address ranges, or be able to send packets back to the
      original sender if required.
      
      So add a sysctl, udp_l3mdev_accept, to control this behaviour. It is
      analogous to the existing tcp_l3mdev_accept, namely to allow a
      process to have a VRF-global listen socket. Have this default to off
      as this is the behaviour that users will expect, given that there is
      no explicit mechanism to place an unmodified, VRF-unaware application
      into a default VRF.
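
The resulting delivery rule can be sketched as a predicate. This is a simplified model under stated assumptions, not the kernel's socket lookup: the function name and parameters are hypothetical, and an ifindex of 0 stands for "unbound" or "did not arrive in a VRF".

```c
#include <stdbool.h>

/*
 * Decide whether a UDP socket may receive a packet.
 *   bound_ifindex      - device the socket is bound to, 0 if unbound
 *   pkt_ifindex        - device the packet arrived on
 *   pkt_l3mdev_ifindex - VRF (l3mdev) device of the packet, 0 if none
 *   l3mdev_accept      - value of the udp_l3mdev_accept sysctl
 */
static bool udp_socket_accepts(int bound_ifindex, int pkt_ifindex,
                               int pkt_l3mdev_ifindex, bool l3mdev_accept)
{
    if (bound_ifindex != 0)
        return bound_ifindex == pkt_ifindex ||
               bound_ifindex == pkt_l3mdev_ifindex;

    /* Unbound socket: VRF traffic is delivered only if the sysctl is on. */
    return pkt_l3mdev_ifindex == 0 || l3mdev_accept;
}
```

With the sysctl off (the default), an unbound socket keeps receiving packets from the default table but no longer sees packets that arrived in a VRF.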
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 25 Jan, 2017 1 commit
    • Introduce a sysctl that modifies the value of PROT_SOCK. · 4548b683
      Krister Johansen committed
      Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl
      that denotes the first unprivileged inet port in the namespace.  To
      disable all privileged ports set this to zero.  It also checks for
      overlap with the local port range.  The privileged and local range may
      not overlap.
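
The two checks described above can be sketched as follows. The function names are hypothetical and this is an illustration of the stated rules, not the kernel's implementation.

```c
#include <stdbool.h>

/* A port is privileged if it lies below the first unprivileged port. */
static bool port_needs_privilege(unsigned int port, unsigned int unpriv_start)
{
    return port < unpriv_start;
}

/*
 * The local (ephemeral) port range [low, high] is acceptable only if it
 * does not overlap the privileged range [0, unpriv_start).
 */
static bool local_range_allowed(unsigned int low, unsigned int high,
                                unsigned int unpriv_start)
{
    return low <= high && low >= unpriv_start;
}
```

With the sysctl set to 0 no port is privileged, so `port_needs_privilege(80, 0)` is false, while with the traditional value of 1024 it is true.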
      
      The use case for this change is to allow containerized processes to bind
      to privileged ports, but prevent them from ever being allowed to modify
      their container's network configuration.  The latter is accomplished by
      ensuring that the network namespace is not a child of the user
      namespace.  This modification was needed to allow the container manager
      to disable a namespace's privileged port restrictions without exposing
      control of the network namespace to processes in the user namespace.
      Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 14 Jan, 2017 1 commit
    • tcp: remove thin_dupack feature · 4a7f6009
      Yuchung Cheng committed
      Thin stream DUPACK starts fast recovery on only one DUPACK
      provided the connection is a thin stream (i.e., low inflight).  But
      this older feature is now subsumed by RACK. If a connection
      receives only a single DUPACK, RACK arms a reordering timer
      and soon starts fast recovery instead of a timeout if no further
      ACKs are received.
      
      The socket option (THIN_DUPACK) is kept as a nop for compatibility.
      Note that this patch does not change another thin-stream feature
      which enables linear RTO, although it might be good to generalize
      that in the future (i.e., linear RTO for the first, say, 3 retries).
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 10 Jan, 2017 1 commit
  16. 30 Dec, 2016 2 commits
  17. 28 Dec, 2016 1 commit