1. 12 Jul, 2017 1 commit
  2. 08 Jul, 2017 1 commit
  3. 03 Jul, 2017 1 commit
  4. 28 Jun, 2017 1 commit
  5. 16 Jun, 2017 1 commit
  6. 08 Jun, 2017 3 commits
  7. 07 Jun, 2017 1 commit
  8. 05 Jun, 2017 4 commits
    • net: Update TCP congestion control documentation · 1e0ce2a1
      Committed by Anmol Sarma
      Update tcp.txt to fix mandatory congestion control ops and default
      CCA selection. Also, fix comment in tcp.h for undo_cwnd.
      Signed-off-by: Anmol Sarma <me@anmolsarma.in>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Add service upgrade support for client connections · 4e255721
      Committed by David Howells
      Make it possible for a client to use AuriStor's service upgrade facility.
      
      The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
      the first sendmsg() of a call.  This takes no parameters.
      
      When recvmsg() starts returning data from the call, the service ID field in
      the returned msg_name will reflect the result of the upgrade attempt.  If
      the upgrade was ignored, srx_service will match what was set in the
      sendmsg(); if the upgrade happened the srx_service will be altered to
      indicate the service the server upgraded to.
      
      Note that:
      
       (1) The choice of upgrade service is up to the server.
      
       (2) Further client calls to the same server that would share a connection
           are blocked if an upgrade probe is in progress.
      
       (3) This should only be used to probe the service.  Clients should then
           use the returned service ID in all subsequent communications with that
           server (and not set the upgrade).  Note that the kernel will not
           retain this information should the connection expire from its cache.
      
       (4) If a server that supports upgrading is replaced by one that doesn't,
           whilst a connection is live, and if the replacement is running, say,
           OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
           server will not respond to packets sent to the upgraded connection.
      
           At this point, calls will time out and the server must be reprobed.
      Signed-off-by: David Howells <dhowells@redhat.com>
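The client-side rule in note (3) above (probe once, then reuse the returned service ID without setting the upgrade again) can be sketched as a small piece of bookkeeping. This is a minimal model, not code from the patch: `struct server_state`, `want_upgrade_cmsg()` and `probe_done()` are hypothetical names, and the service value stands for the `srx_service` field that recvmsg() returns in msg_name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-server client state; not part of the AF_RXRPC API. */
struct server_state {
	bool probed;		/* has an upgrade probe completed? */
	uint16_t service;	/* service ID to use for calls to this server */
};

/* True if the next call should carry RXRPC_UPGRADE_SERVICE (probe only). */
static bool want_upgrade_cmsg(const struct server_state *s)
{
	return !s->probed;
}

/* Record the srx_service that recvmsg() reported for the probe call. */
static void probe_done(struct server_state *s, uint16_t returned_service)
{
	s->service = returned_service;
	s->probed = true;
}
```

Because the kernel does not retain the result once the connection expires from its cache, an application following this pattern would probably keep such state itself and clear `probed` when calls start timing out.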
    • rxrpc: Implement service upgrade · 4722974d
      Committed by David Howells
      Implement AuriStor's service upgrade facility.  There are three problems
      that this is meant to deal with:
      
       (1) Various of the standard AFS RPC calls have IPv4 addresses in their
           requests and/or replies - but there's no room for including IPv6
           addresses.
      
       (2) Definition of IPv6-specific RPC operations in the standard operation
           sets has not yet been achieved.
      
       (3) One could envision the creation of a new service on the same port as
           the original service.  The new service could implement improved
           operations - and the client could try this first, falling back to the
           original service if it's not there.
      
           Unfortunately, certain servers ignore packets addressed to a service
           they don't implement and don't respond in any way - not even with an
           ABORT.  This means that the client must then wait for the call timeout
           to occur.
      
      What service upgrade does is to see if the connection is marked as being
      'upgradeable' and if so, change the service ID in the server and thus the
      request and reply formats.  Note that the upgrade isn't mandatory - a
      server that supports only the original call set will ignore the upgrade
      request.
      
      In the protocol, the procedure is then as follows:
      
       (1) To request an upgrade, the first DATA packet in a new connection must
           have the userStatus set to 1 (this is normally 0).  The userStatus
           value is normally ignored by the server.
      
       (2) If the server doesn't support upgrading, the reply packets will
           contain the same service ID as for the first request packet.
      
       (3) If the server does support upgrading, all future reply packets on that
           connection will contain the new service ID and the new service ID will
           be applied to *all* further calls on that connection as well.
      
       (4) The RPC op used to probe the upgrade must take the same request data
           as the shadow call in the upgrade set (but may return a different
           reply).  GetCapability RPC ops were added to all standard sets for
           just this purpose.  Ops where the request formats differ cannot be
           used for probing.
      
       (5) The client must wait for completion of the probe before sending any
           further RPC ops to the same destination.  It should then use the
           service ID that recvmsg() reported back in all future calls.
      
       (6) The shadow service must have call definitions for all the operation
           IDs defined by the original service.
      
      
      To support service upgrading, a server should:
      
       (1) Call bind() twice on its AF_RXRPC socket before calling listen().
           Each bind() should supply a different service ID, but the transport
           addresses must be the same.  This allows the server to receive
           requests with either service ID.
      
       (2) Enable automatic upgrading by calling setsockopt(), specifying
           RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
           unsigned shorts as the argument:
      
      	unsigned short optval[2];
      
           This specifies a pair of service IDs.  They must be different and must
           match the service IDs bound to the socket.  Member 0 is the service ID
           to upgrade from and member 1 is the service ID to upgrade to.
      Signed-off-by: David Howells <dhowells@redhat.com>
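The constraints on the RXRPC_UPGRADEABLE_SERVICE argument (the two IDs must differ and both must already be bound to the socket) can be illustrated with a plain validation helper. This is only a sketch of the rule as stated in the commit message: `upgrade_pair_valid()` is a hypothetical name, and the kernel's own checks may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Check a two-member service ID pair against the IDs bound to the socket.
 * optval[0] is the service to upgrade from, optval[1] the service to
 * upgrade to.
 */
static bool upgrade_pair_valid(const unsigned short optval[2],
			       const unsigned short *bound, size_t nbound)
{
	bool from_bound = false, to_bound = false;
	size_t i;

	if (optval[0] == optval[1])
		return false;	/* the two IDs must be different */

	for (i = 0; i < nbound; i++) {
		if (bound[i] == optval[0])
			from_bound = true;
		if (bound[i] == optval[1])
			to_bound = true;
	}
	/* both IDs must match services bound to the socket */
	return from_bound && to_bound;
}
```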
    • rxrpc: Permit multiple service binding · 28036f44
      Committed by David Howells
      Permit bind() to be called on an AF_RXRPC socket more than once (currently
      at most twice) to bind multiple listening services to it.  There are some
      restrictions:
      
       (1) All bind() calls involved must have a non-zero service ID.
      
       (2) The service IDs must all be different.
      
       (3) The rest of the address (notably the transport part) must be the same
           in all (a single UDP socket is shared).
      
       (4) This must be done before listen() or sendmsg() is called.
      
      This allows someone to connect to the service socket with different service
      IDs and lays the foundation for service upgrading.
      
      The service ID used by an incoming call can be extracted from the msg_name
      returned by recvmsg().
      Signed-off-by: David Howells <dhowells@redhat.com>
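The four binding restrictions listed above can be modelled in user space as follows. This is a hedged sketch, not the kernel implementation: `struct svc_sock`, `svc_bind()` and the string used as a stand-in for the transport address are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SERVICES 2	/* the commit allows at most two binds for now */

/* Hypothetical model of a service socket; not the kernel's structure. */
struct svc_sock {
	int nbound;
	uint16_t service[MAX_SERVICES];
	char transport[32];	/* stand-in for the transport address */
	bool active;		/* listen() or sendmsg() already called */
};

/* Returns true if the bind would be permitted under the listed rules. */
static bool svc_bind(struct svc_sock *s, uint16_t service,
		     const char *transport)
{
	int i;

	if (s->active)
		return false;	/* (4) must precede listen()/sendmsg() */
	if (service == 0 || s->nbound >= MAX_SERVICES)
		return false;	/* (1) non-zero service ID, at most two */
	if (strlen(transport) >= sizeof(s->transport))
		return false;	/* does not fit in this sketch's buffer */
	for (i = 0; i < s->nbound; i++)
		if (s->service[i] == service)
			return false;	/* (2) service IDs must differ */
	if (s->nbound > 0 && strcmp(s->transport, transport) != 0)
		return false;	/* (3) transport part must be the same */

	if (s->nbound == 0)
		strcpy(s->transport, transport);
	s->service[s->nbound++] = service;
	return true;
}
```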
  9. 02 Jun, 2017 1 commit
  10. 31 May, 2017 1 commit
  11. 22 May, 2017 3 commits
    • net: allow simultaneous SW and HW transmit timestamping · b50a5c70
      Committed by Miroslav Lichvar
      Add SOF_TIMESTAMPING_OPT_TX_SWHW option to allow an outgoing packet to
      be looped to the socket's error queue with a software timestamp even
      when a hardware transmit timestamp is expected to be provided by the
      driver.
      
      Applications using this option will receive two separate messages from
      the error queue, one with a software timestamp and the other with a
      hardware timestamp. As the hardware timestamp is saved to the shared skb
      info, which may happen before the first message with software timestamp
      is received by the application, the hardware timestamp is copied to the
      SCM_TIMESTAMPING control message only when the skb has no software
      timestamp or it is an incoming packet.
      
      While changing sw_tx_timestamp(), inline it in skb_tx_timestamp() as
      there are no other users.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
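The rule for when the hardware timestamp is copied into the SCM_TIMESTAMPING message can be written as a single predicate. This is a simplified model of the sentence in the commit message only; `report_hw_tstamp()` and its boolean parameters are hypothetical and do not correspond to kernel function names.

```c
#include <stdbool.h>

/*
 * The hardware timestamp is reported only if the skb has no software
 * timestamp or the packet is an incoming one. This keeps the first
 * (software) error-queue message from also carrying a hardware timestamp
 * that the driver has already saved to the shared skb info.
 */
static bool report_hw_tstamp(bool has_hw, bool has_sw, bool incoming)
{
	return has_hw && (!has_sw || incoming);
}
```

With SOF_TIMESTAMPING_OPT_TX_SWHW set, the first looped message would correspond to `report_hw_tstamp(true, true, false)` and the second to `report_hw_tstamp(true, false, false)`.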
    • net: fix documentation of struct scm_timestamping · 67953d47
      Committed by Miroslav Lichvar
      The scm_timestamping struct may return multiple non-zero fields, e.g.
      when both software and hardware RX timestamping are enabled, or when the
      SO_TIMESTAMP(NS) option is combined with SCM_TIMESTAMPING and a false
      software timestamp is generated in the recvmsg() call in order to always
      return a SCM_TIMESTAMP(NS) message.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      CC: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add new control message for incoming HW-timestamped packets · aad9c8c4
      Committed by Miroslav Lichvar
      Add SOF_TIMESTAMPING_OPT_PKTINFO option to request a new control message
      for incoming packets with hardware timestamps. It contains the index of
      the real interface which received the packet and the length of the
      packet at layer 2.
      
      The index is useful with bonding, bridges and other interfaces, where
      IP_PKTINFO doesn't allow applications to determine which PHC made the
      timestamp. With the L2 length (and link speed) it is possible to
      transpose preamble timestamps to trailer timestamps, which are used in
      the NTP protocol.
      
      While this information could be provided by two new socket options
      independently from timestamping, it doesn't look like they would be very
      useful. With this option any performance impact is limited to hardware
      timestamping.
      
      Use dev_get_by_napi_id() to get the device and its index. On kernels
      with disabled CONFIG_NET_RX_BUSY_POLL or drivers not using NAPI, a zero
      index will be returned in the control message.
      
      CC: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
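The transposition of preamble timestamps to trailer timestamps mentioned above amounts to adding the time the frame occupies on the wire. The following sketch shows the arithmetic under stated assumptions: the function names are hypothetical, the link speed is given in Mbit/s, and whether the L2 length should also include the frame check sequence is left to the caller.

```c
#include <stdint.h>

/*
 * Time needed to transmit l2_len bytes at speed_mbps Mbit/s, in
 * nanoseconds: bits / (Mbit/s) gives microseconds, times 1000 gives ns.
 */
static uint64_t frame_duration_ns(uint32_t l2_len, uint32_t speed_mbps)
{
	if (speed_mbps == 0)
		return 0;	/* unknown link speed: apply no correction */
	return (uint64_t)l2_len * 8 * 1000 / speed_mbps;
}

/* Shift a start-of-frame timestamp to the end of the frame. */
static uint64_t preamble_to_trailer_ns(uint64_t preamble_ns, uint32_t l2_len,
				       uint32_t speed_mbps)
{
	return preamble_ns + frame_duration_ns(l2_len, speed_mbps);
}
```

For example, a 1000-byte frame on a 1000 Mbit/s link occupies 8000 ns, so the trailer timestamp lies 8 microseconds after the preamble timestamp.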
  12. 20 May, 2017 1 commit
    • net: more accurate checksumming in validate_xmit_skb() · 43c26a1a
      Committed by Davide Caratti
      skb_csum_hwoffload_help() uses netdev features and skb->csum_not_inet to
      determine if skb needs software computation of Internet Checksum or crc32c
      (or nothing, if this computation can be done by the hardware). Use it in
      place of skb_checksum_help() in validate_xmit_skb() to avoid corruption
      of non-GSO SCTP packets having skb->ip_summed equal to CHECKSUM_PARTIAL.
      
      While at it, remove references to skb_csum_off_chk* functions, since they
      are no longer present in Linux (see commit cf53b1da ("Revert
      "net: Add driver helper functions to determine checksum offloadability"")).
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
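The decision described for skb_csum_hwoffload_help() can be modelled as a three-way choice. This is a user-space sketch of the logic in the commit message, not the kernel function: the `FEAT_*` bits and `enum csum_action` are hypothetical stand-ins for the netdev feature flags and the helpers the kernel actually calls.

```c
#include <stdbool.h>

/* Hypothetical feature bits standing in for the netdev feature flags. */
#define FEAT_INET_CSUM	0x1u	/* device can compute the Internet checksum */
#define FEAT_SCTP_CRC	0x2u	/* device can compute crc32c for SCTP */

enum csum_action {
	CSUM_BY_HW,		/* nothing to do in software */
	CSUM_SW_INET,		/* compute the Internet checksum in software */
	CSUM_SW_CRC32C,		/* compute crc32c in software */
};

/* Pick the checksum handling from device features and csum_not_inet. */
static enum csum_action csum_decide(unsigned int features, bool csum_not_inet)
{
	if (csum_not_inet)
		return (features & FEAT_SCTP_CRC) ? CSUM_BY_HW : CSUM_SW_CRC32C;
	return (features & FEAT_INET_CSUM) ? CSUM_BY_HW : CSUM_SW_INET;
}
```

The case that motivated the fix is `csum_decide(FEAT_INET_CSUM, true)`: the device offloads Internet checksums but not crc32c, so an SCTP packet must get a software crc32c rather than an Internet checksum.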
  13. 19 May, 2017 1 commit
  14. 16 May, 2017 3 commits
  15. 02 May, 2017 1 commit
  16. 25 Apr, 2017 1 commit
    • net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Committed by Wei Wang
      Middlebox firewall issues can potentially cause the server's data to be
      blackholed after a successful 3WHS using TFO. The following are the related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation in which the server's data gets dropped
      after 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure because the client cannot detect such behavior
      and so cannot mitigate the situation (e.g., by disabling TFO). Failing to
      proceed, the application (e.g., an SSL library) may simply time out and
      retry with TFO again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. a client-side TFO socket detects an out-of-order FIN
      2. a client-side TFO socket receives an out-of-order RST
      
      We disable active-side TFO globally for 1 hour at first. If it
      happens again, we disable it for 2 hours, then 4 hours, 8 hours, and so on.
      We reset the timeout to 1 hour if a client-side TFO socket not opened
      on loopback has successfully received data segments from the server.
      We examine this condition during close().
      
      The rationale behind this is that when such a firewall issue happens,
      the application running on the client should eventually close the socket
      because it cannot get the data it is expecting. Alternatively, the
      application running on the server should close the socket because it
      does not receive any response from the client.
      In both cases, an out-of-order FIN or RST will be received on the client,
      given that the firewall will not block them as those frames carry no
      data.
      We want to disable active TFO globally because that helps if the middlebox
      is very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          the initial timeout to use when a firewall blackhole issue happens.
          This can be set and read.
          Setting it to 0 disables the active disable logic.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
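The doubling disable period described above (1 hour, then 2, 4, 8, ...) can be sketched as a small function. This is an illustration of the arithmetic in the commit message only: `tfo_disable_secs()` is a hypothetical name, `times` is the number of consecutive blackhole detections, and the clamp on `times` is a guard added for this sketch rather than something stated in the commit.

```c
#include <stdint.h>

/*
 * Seconds for which active TFO stays disabled after the given number of
 * consecutive detections. An initial timeout of 0 turns the logic off.
 */
static uint64_t tfo_disable_secs(uint32_t initial_secs, unsigned int times)
{
	if (initial_secs == 0 || times == 0)
		return 0;
	if (times > 32)
		times = 32;	/* sketch-only guard against shift overflow */
	return (uint64_t)initial_secs << (times - 1);
}
```

With an initial timeout of 3600 seconds, the first detection disables active TFO for 3600 seconds and the fourth for 28800 seconds; a successful data exchange would reset the detection count to zero.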
  17. 24 Apr, 2017 1 commit
  18. 21 Apr, 2017 1 commit
  19. 30 Mar, 2017 1 commit
  20. 25 Mar, 2017 1 commit
    • net: Add sysctl to toggle early demux for tcp and udp · dddb64bc
      Committed by subashab@codeaurora.org
      Certain systems process a significant unconnected UDP workload.
      It would be preferable to disable UDP early demux for those systems
      and enable it for TCP only.
      
      By disabling UDP demux, we see these slight gains on an ARM64 system:
      782 -> 788Mbps unconnected single stream UDPv4
      633 -> 654Mbps unconnected UDPv4 different sources
      
      The performance impact can change based on CPU architecture and cache
      sizes. Not much difference will be seen if the entire UDP hash table
      is in cache.
      
      Both sysctls are enabled by default to preserve existing behavior.
      
      v1->v2: Change function pointer instead of adding conditional as
      suggested by Stephen.
      
      v2->v3: Read once in callers to avoid issues due to compiler
      optimizations. Also update commit message with the tests.
      
      v3->v4: Store and use read once result instead of querying pointer
      again incorrectly.
      
      v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
      Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 24 Mar, 2017 1 commit
  22. 23 Mar, 2017 1 commit
  23. 22 Mar, 2017 1 commit
    • net: ipv4: add support for ECMP hash policy choice · bf4e0a3d
      Committed by Nikolay Aleksandrov
      This patch adds support for ECMP hash policy choice via a new sysctl
      called fib_multipath_hash_policy and also adds support for L4 hashes.
      The current values for fib_multipath_hash_policy are:
       0 - layer 3 (default)
       1 - layer 4
      If an skb hash is already set and it matches the chosen policy, it
      is used instead of being calculated (currently only for L4).
      In L3 mode we always calculate the hash because of the ICMP error special
      case. The flow dissector's field consistentification should handle the
      address order, so we can remove the address reversals.
      If the skb is provided we always use it for the hash calculation;
      otherwise we fall back to fl4, that is, if skb is NULL fl4 has to be set.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
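The difference between the two policy values can be illustrated with a toy hash. This sketch assumes that policy 0 hashes only the source and destination addresses and that policy 1 additionally covers the ports and protocol; `struct flow`, `multipath_hash()` and the FNV-1a mixing are stand-ins chosen for illustration and are not the kernel's flow hash.

```c
#include <stdint.h>

/* Hypothetical flow description; not the kernel's flow key layout. */
struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* FNV-1a over the four bytes of one 32-bit word. */
static uint32_t mix(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* policy 0: layer 3 fields only; policy 1: layer 4 fields as well. */
static uint32_t multipath_hash(const struct flow *f, int policy)
{
	uint32_t h = 2166136261u;

	h = mix(h, f->saddr);
	h = mix(h, f->daddr);
	if (policy == 1) {
		h = mix(h, ((uint32_t)f->sport << 16) | f->dport);
		h = mix(h, f->proto);
	}
	return h;
}
```

Under policy 0, two connections between the same pair of hosts always hash identically and so take the same path; under policy 1 they can be spread across next hops, for instance by taking the hash modulo the number of paths.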
  24. 17 Mar, 2017 1 commit
    • tcp: remove tcp_tw_recycle · 4396e461
      Committed by Soheil Hassas Yeganeh
      tcp_tw_recycle was already broken for connections
      behind NAT, since the per-destination timestamp is not
      monotonically increasing for multiple machines behind
      a single destination address.
      
      After the randomization of TCP timestamp offsets
      in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
      for each connection), tcp_tw_recycle is broken for all
      types of connections for the same reason: the timestamps
      received from a single machine are no longer monotonically
      increasing.
      
      Remove tcp_tw_recycle, since it is not functional. Also, remove
      the PAWSPassive SNMP counter since it is only used for
      tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
      since the strict argument is only set when tcp_tw_recycle is
      enabled.
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Cc: Lutz Vieweg <lvml@5t9.de>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
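Why a per-destination timestamp check breaks when timestamps are not monotonic can be shown with a deliberately simplified model. `syn_accepted()` is a hypothetical helper: it compares a new SYN's TCP timestamp with the last value cached for the same source address, and it omits the window and age limits that the removed kernel code applied.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept a SYN only if its timestamp is not older than the value cached
 * for the same source address. The subtraction is done modulo 2^32 so
 * that timestamp wraparound is handled.
 */
static bool syn_accepted(uint32_t cached_ts, uint32_t syn_ts)
{
	return (int32_t)(cached_ts - syn_ts) <= 0;
}
```

If host A behind a NAT last sent timestamp 500000 and host B behind the same address then sends a SYN with timestamp 1000, `syn_accepted(500000, 1000)` is false and B's connection attempt is dropped. With randomized per-connection offsets, the same can happen between two connections from a single host.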
  25. 16 Mar, 2017 4 commits
  26. 14 Mar, 2017 2 commits
    • mpls: allow TTL propagation from IP packets to be configured · a59166e4
      Committed by Robert Shearman
      Allow TTL propagation from IP packets to MPLS packets to be
      configured. Add a new optional LWT attribute, MPLS_IPTUNNEL_TTL, which
      allows the TTL to be set in the resulting MPLS packet, with the value
      of 0 having the semantics of enabling propagation of the TTL from the
      IP header (i.e. non-zero values disable propagation).
      
      Also allow the configuration to be overridden globally by reusing the
      same sysctl to control whether the TTL is propagated from IP packets
      into the MPLS header. If the per-LWT attribute is set then it
      overrides the global configuration. If the TTL isn't propagated then a
      default TTL value is used which can be configured via a new sysctl,
      "net.mpls.default_ttl". This is kept separate from the configuration
      of whether IP TTL propagation is enabled as it can be used in the
      future when non-IP payloads are supported (i.e. where there is no
      payload TTL that can be propagated).
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
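The precedence between the per-LWT attribute, the global propagation sysctl and net.mpls.default_ttl can be summarised in one function. This is a sketch of the selection logic as the commit message describes it, with hypothetical names; it ignores any TTL decrement applied while forwarding.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Choose the TTL for the MPLS header being pushed.
 * attr_set/attr_ttl model the optional MPLS_IPTUNNEL_TTL attribute,
 * global_propagate the propagation sysctl, default_ttl the
 * net.mpls.default_ttl sysctl, and ip_ttl the TTL of the IP header.
 */
static uint8_t mpls_encap_ttl(bool attr_set, uint8_t attr_ttl,
			      bool global_propagate, uint8_t default_ttl,
			      uint8_t ip_ttl)
{
	if (attr_set)		/* per-LWT attribute overrides the global knob */
		return attr_ttl ? attr_ttl : ip_ttl;	/* 0 means propagate */
	return global_propagate ? ip_ttl : default_ttl;
}
```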
    • mpls: allow TTL propagation to IP packets to be configured · 5b441ac8
      Committed by Robert Shearman
      Provide the ability to control on a per-route basis whether the TTL
      value from an MPLS packet is propagated to an IPv4/IPv6 packet when
      the last label is popped as per the theoretical model in RFC 3443
      through a new route attribute, RTA_TTL_PROPAGATE which can be 0 to
      mean disable propagation and 1 to mean enable propagation.
      
      In order to provide the ability to change the behaviour for packets
      arriving with IPv4/IPv6 Explicit Null labels and to provide an easy
      way for a user to change the behaviour for all existing routes without
      having to reprogram them, a global knob is provided. This is done
      through the addition of a new per-namespace sysctl,
      "net.mpls.ip_ttl_propagate", which defaults to enabled. If the
      per-route attribute is set (either enabled or disabled) then it
      overrides the global configuration.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
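The override of the global "net.mpls.ip_ttl_propagate" setting by the per-route attribute is a three-state decision, sketched below. The enum and function names are hypothetical; the only behaviour modelled is that an explicitly set route attribute (enabled or disabled) wins and that an unset attribute defers to the sysctl.

```c
#include <stdbool.h>

/* Hypothetical per-route state for the RTA_TTL_PROPAGATE attribute. */
enum ttl_prop {
	TTL_PROP_UNSET,		/* attribute absent: follow the global sysctl */
	TTL_PROP_DISABLED,	/* attribute present with value 0 */
	TTL_PROP_ENABLED,	/* attribute present with value 1 */
};

/* Should the MPLS TTL be copied to the IP header when popping the label? */
static bool propagate_to_ip(enum ttl_prop route_attr, bool global_enabled)
{
	switch (route_attr) {
	case TTL_PROP_DISABLED:
		return false;
	case TTL_PROP_ENABLED:
		return true;
	default:
		return global_enabled;
	}
}
```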
  27. 13 Mar, 2017 1 commit