1. 30 9月, 2015 9 次提交
  2. 29 9月, 2015 1 次提交
    • E
      tcp: avoid reorders for TFO passive connections · 7c85af88
      Eric Dumazet 提交于
      We found that a TCP Fast Open passive connection was vulnerable
      to reorders, as the exchange might look like
      
      [1] C -> S S <FO ...> <request>
      [2] S -> C S. ack request <options>
      [3] S -> C . <answer>
      
      packets [2] and [3] can be generated at almost the same time.
      
      If C receives the 3rd packet before the 2nd, it will drop it as
      the socket is in SYN_SENT state and expects a SYNACK.
      
      S will have to retransmit the answer.
      
      Current OOO avoidance in linux is defeated because SYNACK
      packets are attached to the LISTEN socket, while DATA packets
      are attached to the children. They might be sent by different cpus,
      and different TX queues might be selected.
      
      It turns out that for TFO, we created a child, which is a
      full blown socket in TCP_SYN_RECV state, and we simply can attach
      the SYNACK packet to this socket.
      
      This means that at the time tcp_sendmsg() pushes DATA packet,
      skb->ooo_okay will be set iff the SYNACK packet had been sent
      and TX completed.
      
      This removes the reorder source at the host level.
      
      We also removed the export of tcp_try_fastopen(), as it is no
      longer called from IPv6.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c85af88
  3. 27 9月, 2015 1 次提交
  4. 26 9月, 2015 14 次提交
  5. 25 9月, 2015 10 次提交
  6. 24 9月, 2015 1 次提交
  7. 22 9月, 2015 3 次提交
    • E
      tcp/dccp: fix timewait races in timer handling · ed2e9239
      Eric Dumazet 提交于
      When creating a timewait socket, we need to arm the timer before
      allowing other cpus to find it. The signal allowing cpus to find
      the socket is setting tw_refcnt to non zero value.
      
      As we set tw_refcnt in __inet_twsk_hashdance(), we therefore need to
      call inet_twsk_schedule() first.
      
      This also means we need to remove tw_refcnt changes from
      inet_twsk_schedule() and let the caller handle it.
      
      Note that because we use mod_timer_pinned(), we have the guarantee
      the timer wont expire before we set tw_refcnt as we run in BH context.
      
      To make things more readable I introduced inet_twsk_reschedule() helper.
      
      When rearming the timer, we can use mod_timer_pending() to make sure
      we do not rearm a canceled timer.
      
      Note: This bug can possibly trigger if packets of a flow can hit
      multiple cpus. This does not normally happen, unless flow steering
      is broken somehow. This explains this bug was spotted ~5 months after
      its introduction.
      
      A similar fix is needed for SYN_RECV sockets in reqsk_queue_hash_req(),
      but will be provided in a separate patch for proper tracking.
      
      Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NYing Cai <ycai@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed2e9239
    • Y
      tcp: usec resolution SYN/ACK RTT · 0f1c28ae
      Yuchung Cheng 提交于
      Currently SYN/ACK RTT is measured in jiffies. For LAN the SYN/ACK
      RTT is often measured as 0ms or sometimes 1ms, which would affect
      RTT estimation and min RTT samping used by some congestion control.
      
      This patch improves SYN/ACK RTT to be usec resolution if platform
      supports it. While the timestamping of SYN/ACK is done in request
      sock, the RTT measurement is carefully arranged to avoid storing
      another u64 timestamp in tcp_sock.
      
      For regular handshake w/o SYNACK retransmission, the RTT is sampled
      right after the child socket is created and right before the request
      sock is released (tcp_check_req() in tcp_minisocks.c)
      
      For Fast Open the child socket is already created when SYN/ACK was
      sent, the RTT is sampled in tcp_rcv_state_process() after processing
      the final ACK an right before the request socket is released.
      
      If the SYN/ACK was retransmistted or SYN-cookie was used, we rely
      on TCP timestamps to measure the RTT. The sample is taken at the
      same place in tcp_rcv_state_process() after the timestamp values
      are validated in tcp_validate_incoming(). Note that we do not store
      TS echo value in request_sock for SYN-cookies, because the value
      is already stored in tp->rx_opt used by tcp_ack_update_rtt().
      
      One side benefit is that the RTT measurement now happens before
      initializing congestion control (of the passive side). Therefore
      the congestion control can use the SYN/ACK RTT.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f1c28ae
    • U
      s390/iucv: do not use arrays as argument · 91e60eb6
      Ursula Braun 提交于
      The iucv code uses arrays as arguments. Even though this does not
      really cause a problem, it could be misleading, since the compiler
      turns array arguments into just a pointer argument. To be more
      precise this patch changes the array arguments into pointers.
      Signed-off-by: NUrsula Braun <ursula.braun@de.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91e60eb6
  8. 21 9月, 2015 1 次提交
    • N
      ip6tunnel: make rx/tx bytes counters consistent · 83cf9a25
      Nicolas Dichtel 提交于
      Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.
      
      Before the patch, the external ipv6 header + gre header were included on
      tx.
      
      After the patch:
      $ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
      PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
      64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms
      
      --- 192.168.6.121 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
      7: ip6gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
          RX: bytes  packets  errors  dropped overrun mcast
          84         1        0       0       0       0
          TX: bytes  packets  errors  dropped carrier collsns
          84         1        0       0       0       0
      $ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
      PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
      64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms
      
      --- 192.168.1.121 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
      8: ip6tnl1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
          RX: bytes  packets  errors  dropped overrun mcast
          84         1        0       0       0       0
          TX: bytes  packets  errors  dropped carrier collsns
          84         1        0       0       0       0
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83cf9a25