1. 03 11月, 2008 1 次提交
  2. 27 10月, 2008 1 次提交
  3. 24 10月, 2008 1 次提交
    • I
      tcp: Restore ordering of TCP options for the sake of inter-operability · fd6149d3
      Ilpo Järvinen 提交于
      This is not our bug! Sadly some devices cannot cope with the change
      of TCP option ordering which was a result of the recent rewrite of
      the option code (not that there was some particular reason steming
      from the rewrite for the reordering) though any ordering of TCP
      options is perfectly legal. Thus we restore the original ordering
      to allow interoperability with/through such broken devices and add
      some warning about this trap. Since the reordering just happened
      without any particular reason, this change shouldn't cost us
      anything.
      
      There are already couple of known failure reports (within close
      proximity of the last release), so the problem might be more
      wide-spread than a single device. And other reports which may
      be due to the same problem though the symptoms were less obvious.
      Analysis of one of the case revealed (with very high probability)
      that sack capability cannot be negotiated as the first option
      (SYN never got a response).
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: NAldo Maggi <sentiniate@tiscali.it>
      Tested-by: NAldo Maggi <sentiniate@tiscali.it>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd6149d3
  4. 22 10月, 2008 1 次提交
    • I
      tcp: should use number of sack blocks instead of -1 · 75e3d8db
      Ilpo Järvinen 提交于
      While looking for the recent "sack issue" I also read all eff_sacks
      usage that was played around by some relevant commit. I found
      out that there's another thing that is asking for a fix (unrelated
      to the "sack issue" though).
      
      This feature has probably very little significance in practice.
      Opposite direction timeout with bidirectional tcp comes to me as
      the most likely scenario though there might be other cases as
      well related to non-data segments we send (e.g., response to the
      opposite direction segment). Also some ACK losses or option space
      wasted for other purposes is necessary to prevent the earlier
      SACK feedback getting to the sender.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75e3d8db
  5. 08 10月, 2008 1 次提交
    • I
      tcp: kill pointless urg_mode · 33f5f57e
      Ilpo Järvinen 提交于
      It all started from me noticing that this urgent check in
      tcp_clean_rtx_queue is unnecessarily inside the loop. Then
      I took a longer look to it and found out that the users of
      urg_mode can trivially do without, well almost, there was
      one gotcha.
      
      Bonus: those funny people who use urg with >= 2^31 write_seq -
      snd_una could now rejoice too (that's the only purpose for the
      between being there, otherwise a simple compare would have done
      the thing). Not that I assume that the rest of the tcp code
      happily lives with such mind-boggling numbers :-). Alas, it
      turned out to be impossible to set wmem to such numbers anyway,
      yes I really tried a big sendfile after setting some wmem but
      nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
      So I hacked a bit variable to long and found out that it seems
      to work... :-)
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33f5f57e
  6. 01 10月, 2008 1 次提交
    • K
      tcp: Port redirection support for TCP · a3116ac5
      KOVACS Krisztian 提交于
      Current TCP code relies on the local port of the listening socket
      being the same as the destination address of the incoming
      connection. Port redirection used by many transparent proxying
      techniques obviously breaks this, so we have to store the original
      destination port address.
      
      This patch extends struct inet_request_sock and stores the incoming
      destination port value there. It also modifies the handshake code to
      use that value as the source port when sending reply packets.
      Signed-off-by: NKOVACS Krisztian <hidden@sch.bme.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3116ac5
  7. 23 9月, 2008 1 次提交
  8. 21 9月, 2008 11 次提交
  9. 27 8月, 2008 1 次提交
  10. 22 7月, 2008 1 次提交
  11. 19 7月, 2008 2 次提交
  12. 17 7月, 2008 3 次提交
  13. 03 7月, 2008 1 次提交
    • P
      tcp: de-bloat a bit with factoring NET_INC_STATS_BH out · 40b215e5
      Pavel Emelyanov 提交于
      There are some places in TCP that select one MIB index to
      bump snmp statistics like this:
      
      	if (<something>)
      		NET_INC_STATS_BH(<some_id>);
      	else if (<something_else>)
      		NET_INC_STATS_BH(<some_other_id>);
      	...
      	else
      		NET_INC_STATS_BH(<default_id>);
      
      or in a more tricky but still similar way.
      
      On the other hand, this NET_INC_STATS_BH is a camouflaged
      increment of percpu variable, which is not that small.
      
      Factoring those cases out de-bloats 235 bytes on non-preemptible
      i386 config and drives parts of the code into 80 columns.
      
      add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-235 (-235)
      function                                     old     new   delta
      tcp_fastretrans_alert                       1437    1424     -13
      tcp_dsack_set                                137     124     -13
      tcp_xmit_retransmit_queue                    690     676     -14
      tcp_try_undo_recovery                        283     265     -18
      tcp_sacktag_write_queue                     1550    1515     -35
      tcp_update_reordering                        162     106     -56
      tcp_retransmit_timer                         990     904     -86
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40b215e5
  14. 12 6月, 2008 2 次提交
  15. 05 6月, 2008 1 次提交
  16. 22 5月, 2008 1 次提交
    • S
      tcp: TCP connection times out if ICMP frag needed is delayed · 7d227cd2
      Sridhar Samudrala 提交于
      We are seeing an issue with TCP in handling an ICMP frag needed
      message that is received after net.ipv4.tcp_retries1 retransmits.
      The default value of retries1 is 3. So if the path mtu changes
      and ICMP frag needed is lost for the first 3 retransmits or if
      it gets delayed until 3 retransmits are done, TCP doesn't update
      MSS correctly and continues to retransmit the orginal message
      until it timesout after tcp_retries2 retransmits.
      
      I am seeing this issue even with the latest 2.6.25.4 kernel.
      
      In tcp_retransmit_timer(), when retransmits counter exceeds 
      tcp_retries1 value, the dst cache entry of the socket is reset.
      At this time, if we receive an ICMP frag needed message, the 
      dst entry gets updated with the new MTU, but the TCP sockets
      dst_cache entry remains NULL.
      
      So the next time when we try to retransmit after the ICMP frag
      needed is received, tcp_retransmit_skb() gets called. Here the
      cur_mss value is calculated at the start of the routine with
      a NULL sk_dst_cache. Instead we should call tcp_current_mss after
      the rebuild_header that caches the dst entry with the updated mtu.
      Also the rebuild_header should be called before tcp_fragment
      so that skb is fragmented if the mss goes down.
      Signed-off-by: NSridhar Samudrala <sri@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d227cd2
  17. 16 4月, 2008 1 次提交
  18. 10 4月, 2008 1 次提交
    • F
      [Syncookies]: Add support for TCP options via timestamps. · 4dfc2817
      Florian Westphal 提交于
      Allow the use of SACK and window scaling when syncookies are used
      and the client supports tcp timestamps. Options are encoded into
      the timestamp sent in the syn-ack and restored from the timestamp
      echo when the ack is received.
      
      Based on earlier work by Glenn Griffin.
      This patch avoids increasing the size of structs by encoding TCP
      options into the least significant bits of the timestamp and
      by not using any 'timestamp offset'.
      
      The downside is that the timestamp sent in the packet after the synack
      will increase by several seconds.
      
      changes since v1:
       don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c
       and have ipv6/syncookies.c use it.
       Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if ()
      Reviewed-by: NHagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dfc2817
  19. 08 4月, 2008 1 次提交
    • I
      [TCP]: tcp_simple_retransmit can cause S+L · 882bebaa
      Ilpo Järvinen 提交于
      This fixes Bugzilla #10384
      
      tcp_simple_retransmit does L increment without any checking
      whatsoever for overflowing S+L when Reno is in use.
      
      The simplest scenario I can currently think of is rather
      complex in practice (there might be some more straightforward
      cases though). Ie., if mss is reduced during mtu probing, it
      may end up marking everything lost and if some duplicate ACKs
      arrived prior to that sacked_out will be non-zero as well,
      leading to S+L > packets_out, tcp_clean_rtx_queue on the next
      cumulative ACK or tcp_fastretrans_alert on the next duplicate
      ACK will fix the S counter.
      
      More straightforward (but questionable) solution would be to
      just call tcp_reset_reno_sack() in tcp_simple_retransmit but
      it would negatively impact the probe's retransmission, ie.,
      the retransmissions would not occur if some duplicate ACKs
      had arrived.
      
      So I had to add reno sacked_out reseting to CA_Loss state
      when the first cumulative ACK arrives (this stale sacked_out
      might actually be the explanation for the reports of left_out
      overflows in kernel prior to 2.6.23 and S+L overflow reports
      of 2.6.24). However, this alone won't be enough to fix kernel
      before 2.6.24 because it is building on top of the commit
      1b6d427b ([TCP]: Reduce sacked_out with reno when purging
      write_queue) to keep the sacked_out from overflowing.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: NAlessandro Suardi <alessandro.suardi@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      882bebaa
  20. 21 3月, 2008 2 次提交
    • P
      [NET]: Add per-connection option to set max TSO frame size · 82cc1a7a
      Peter P Waskiewicz Jr 提交于
      Update: My mailer ate one of Jarek's feedback mails...  Fixed the
      parameter in netif_set_gso_max_size() to be u32, not u16.  Fixed the
      whitespace issue due to a patch import botch.  Changed the types from
      u32 to unsigned int to be more consistent with other variables in the
      area.  Also brought the patch up to the latest net-2.6.26 tree.
      
      Update: Made gso_max_size container 32 bits, not 16.  Moved the
      location of gso_max_size within netdev to be less hotpath.  Made more
      consistent names between the sock and netdev layers, and added a
      define for the max GSO size.
      
      Update: Respun for net-2.6.26 tree.
      
      Update: changed max_gso_frame_size and sk_gso_max_size from signed to
      unsigned - thanks Stephen!
      
      This patch adds the ability for device drivers to control the size of
      the TSO frames being sent to them, per TCP connection.  By setting the
      netdevice's gso_max_size value, the socket layer will set the GSO
      frame size based on that value.  This will propogate into the TCP
      layer, and send TSO's of that size to the hardware.
      
      This can be desirable to help tune the bursty nature of TSO on a
      per-adapter basis, where one may have 1 GbE and 10 GbE devices
      coexisting in a system, one running multiqueue and the other not, etc.
      
      This can also be desirable for devices that cannot support full 64 KB
      TSO's, but still want to benefit from some level of segmentation
      offloading.
      Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82cc1a7a
    • P
      [TCP]: Fix shrinking windows with window scaling · 607bfbf2
      Patrick McHardy 提交于
      When selecting a new window, tcp_select_window() tries not to shrink
      the offered window by using the maximum of the remaining offered window
      size and the newly calculated window size. The newly calculated window
      size is always a multiple of the window scaling factor, the remaining
      window size however might not be since it depends on rcv_wup/rcv_nxt.
      This means we're effectively shrinking the window when scaling it down.
      
      
      The dump below shows the problem (scaling factor 2^7):
      
      - Window size of 557 (71296) is advertised, up to 3111907257:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
      
      - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
        below the last end:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
      
      The number 40 results from downscaling the remaining window:
      
      3111907257 - 3111841425 = 65832
      65832 / 2^7 = 514
      65832 % 2^7 = 40
      
      If the sender uses up the entire window before it is shrunk, this can have
      chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
      will notice that the window has been shrunk since tcp_wnd_end() is before
      tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
      This will fail the receivers checks in tcp_sequence() however since it
      is before it's tp->rcv_wup, making it respond with a dupack.
      
      If both sides are in this condition, this leads to a constant flood of
      ACKs until the connection times out.
      
      Make sure the window is never shrunk by aligning the remaining window to
      the window scaling factor.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      607bfbf2
  21. 12 3月, 2008 1 次提交
  22. 04 3月, 2008 1 次提交
  23. 01 2月, 2008 1 次提交
  24. 29 1月, 2008 2 次提交