1. 20 7月, 2012 3 次提交
    • Y
      net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) · cf60af03
      Yuchung Cheng 提交于
      sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
      and write(2). The application should replace connect() with it to
      send data in the opening SYN packet.
      
      For blocking socket, sendmsg() blocks until all the data are buffered
      locally and the handshake is completed like connect() call. It
      returns similar errno like connect() if the TCP handshake fails.
      
      For non-blocking socket, it returns the number of bytes queued (and
      transmitted in the SYN-data packet) if cookie is available. If cookie
      is not available, it transmits a data-less SYN packet with Fast Open
      cookie request option and returns -EINPROGRESS like connect().
      
      Using MSG_FASTOPEN on connecting or connected socket will result in
      simlar errno like repeating connect() calls. Therefore the application
      should only use this flag on new sockets.
      
      The buffer size of sendmsg() is independent of the MSS of the connection.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf60af03
    • Y
      net-tcp: Fast Open client - sending SYN-data · 783237e8
      Yuchung Cheng 提交于
      This patch implements sending SYN-data in tcp_connect(). The data is
      from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).
      
      The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
      type of the SYN. If the cookie is not cached (len==0), the host sends
      data-less SYN with Fast Open cookie request option to solicit a cookie
      from the remote. If cookie is not available (len > 0), the host sends
      a SYN-data with Fast Open cookie option. If cookie length is negative,
        the SYN will not include any Fast Open option (for fall back operations).
      
      To deal with middleboxes that may drop SYN with data or experimental TCP
      option, the SYN-data is only sent once. SYN retransmits do not include
      data or Fast Open options. The connection will fall back to regular TCP
      handshake.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      783237e8
    • Y
      net-tcp: Fast Open base · 2100c8d2
      Yuchung Cheng 提交于
      This patch impelements the common code for both the client and server.
      
      1. TCP Fast Open option processing. Since Fast Open does not have an
         option number assigned by IANA yet, it shares the experiment option
         code 254 by implementing draft-ietf-tcpm-experimental-options
         with a 16 bits magic number 0xF989. This enables global experiments
         without clashing the scarce(2) experimental options available for TCP.
      
         When the draft status becomes standard (maybe), the client should
         switch to the new option number assigned while the server supports
         both numbers for transistion.
      
      2. The new sysctl tcp_fastopen
      
      3. A place holder init function
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2100c8d2
  2. 19 7月, 2012 4 次提交
  3. 18 7月, 2012 3 次提交
  4. 17 7月, 2012 5 次提交
    • E
      tcp: implement RFC 5961 4.2 · 0c24604b
      Eric Dumazet 提交于
      Implement the RFC 5691 mitigation against Blind
      Reset attack using SYN bit.
      
      Section 4.2 of RFC 5961 advises to send a Challenge ACK and drop
      incoming packet, instead of resetting the session.
      
      Add a new SNMP counter to count number of challenge acks sent
      in response to SYN packets.
      (netstat -s | grep TCPSYNChallenge)
      
      Remove obsolete TCPAbortOnSyn, since we no longer abort a TCP session
      because of a SYN flag.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Kiran Kumar Kella <kkiran@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c24604b
    • E
      tcp: implement RFC 5961 3.2 · 282f23c6
      Eric Dumazet 提交于
      Implement the RFC 5691 mitigation against Blind
      Reset attack using RST bit.
      
      Idea is to validate incoming RST sequence,
      to match RCV.NXT value, instead of previouly accepted
      window : (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)
      
      If sequence is in window but not an exact match, send
      a "challenge ACK", so that the other part can resend an
      RST with the appropriate sequence.
      
      Add a new sysctl, tcp_challenge_ack_limit, to limit
      number of challenge ACK sent per second.
      
      Add a new SNMP counter to count number of challenge acks sent.
      (netstat -s | grep TCPChallengeACK)
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Kiran Kumar Kella <kkiran@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      282f23c6
    • J
      etherdevice: Rename random_ether_addr to eth_random_addr · 0a4dd594
      Joe Perches 提交于
      Add some API symmetry to eth_broadcast_addr and
      add a #define to the old name for backward compatibility.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a4dd594
    • A
      net: make sock diag per-namespace · 51d7cccf
      Andrey Vagin 提交于
      Before this patch sock_diag works for init_net only and dumps
      information about sockets from all namespaces.
      
      This patch expands sock_diag for all name-spaces.
      It creates a netlink kernel socket for each netns and filters
      data during dumping.
      
      v2: filter accoding with netns in all places
          remove an unused variable.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51d7cccf
    • E
      tcp: add OFO snmp counters · a6df1ae9
      Eric Dumazet 提交于
      Add three SNMP TCP counters, to better track TCP behavior
      at global stage (netstat -s), when packets are received
      Out Of Order (OFO)
      
      TCPOFOQueue : Number of packets queued in OFO queue
      
      TCPOFODrop  : Number of packets meant to be queued in OFO
                    but dropped because socket rcvbuf limit hit.
      
      TCPOFOMerge : Number of packets in OFO that were merged with
                    other packets.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6df1ae9
  5. 14 7月, 2012 1 次提交
  6. 12 7月, 2012 5 次提交
    • J
    • J
    • F
      net: sched: add ipset ematch · 6d4fa852
      Florian Westphal 提交于
      Can be used to match packets against netfilter ip sets created via ipset(8).
      skb->sk_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.
      
      Since ipset is usually called from netfilter, the ematch
      initializes a fake xt_action_param, pulls the ip header into the
      linear area and also sets skb->data to the IP header (otherwise
      matching Layer 4 set types doesn't work).
      Tested-by: NMr Dash Four <mr.dash.four@googlemail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d4fa852
    • E
      tcp: TCP Small Queues · 46d3ceab
      Eric Dumazet 提交于
      This introduce TSQ (TCP Small Queues)
      
      TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
      device queues), to reduce RTT and cwnd bias, part of the bufferbloat
      problem.
      
      sk->sk_wmem_alloc not allowed to grow above a given limit,
      allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
      given time.
      
      TSO packets are sized/capped to half the limit, so that we have two
      TSO packets in flight, allowing better bandwidth use.
      
      As a side effect, setting the limit to 40000 automatically reduces the
      standard gso max limit (65536) to 40000/2 : It can help to reduce
      latencies of high prio packets, having smaller TSO packets.
      
      This means we divert sock_wfree() to a tcp_wfree() handler, to
      queue/send following frames when skb_orphan() [2] is called for the
      already queued skbs.
      
      Results on my dev machines (tg3/ixgbe nics) are really impressive,
      using standard pfifo_fast, and with or without TSO/GSO.
      
      Without reduction of nominal bandwidth, we have reduction of buffering
      per bulk sender :
      < 1ms on Gbit (instead of 50ms with TSO)
      < 8ms on 100Mbit (instead of 132 ms)
      
      I no longer have 4 MBytes backlogged in qdisc by a single netperf
      session, and both side socket autotuning no longer use 4 Mbytes.
      
      As skb destructor cannot restart xmit itself ( as qdisc lock might be
      taken at this point ), we delegate the work to a tasklet. We use one
      tasklest per cpu for performance reasons.
      
      If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
      This flag is tested in a new protocol method called from release_sock(),
      to eventually send new segments.
      
      [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
      [2] skb_orphan() is usually called at TX completion time,
        but some drivers call it in their start_xmit() handler.
        These drivers should at least use BQL, or else a single TCP
        session can still fill the whole NIC TX ring, since TSQ will
        have no effect.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Dave Taht <dave.taht@bufferbloat.net>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Matt Mathis <mattmathis@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46d3ceab
    • H
      650cef38
  7. 11 7月, 2012 4 次提交
  8. 10 7月, 2012 5 次提交
  9. 09 7月, 2012 1 次提交
  10. 08 7月, 2012 4 次提交
  11. 05 7月, 2012 3 次提交
  12. 04 7月, 2012 1 次提交
  13. 02 7月, 2012 1 次提交