1. 02 3月, 2009 3 次提交
  2. 22 2月, 2009 1 次提交
    • H
      tcp: Always set urgent pointer if it's beyond snd_nxt · 7691367d
      Herbert Xu 提交于
      Our TCP stack does not set the urgent flag if the urgent pointer
      does not fit in 16 bits, i.e., if it is more than 64K from the
      sequence number of a packet.
      
      This behaviour is different from the BSDs, and clearly contradicts
      the purpose of urgent mode, which is to send the notification
      (though not necessarily the associated data) as soon as possible.
      Our current behaviour may in fact delay the urgent notification
      indefinitely if the receiver window does not open up.
      
      Simply matching BSD however may break legacy applications which
      incorrectly rely on the out-of-band delivery of urgent data, and
      conversely the in-band delivery of non-urgent data.
      
      Alexey Kuznetsov suggested a safe solution of following BSD only
      if the urgent pointer itself has not yet been transmitted.  This
      way we guarantee that when the remote end sees the packet with
      non-urgent data marked as urgent due to wrap-around we would have
      advanced the urgent pointer beyond, either to the actual urgent
      data or to an as-yet untransmitted packet.
      
      The only potential downside is that applications on the remote
      end may see multiple SIGURG notifications.  However, this would
      occur anyway with other TCP stacks.  More importantly, the outcome
      of such a duplicate notification is likely to be harmless since
      the signal itself does not carry any information other than the
      fact that we're in urgent mode.
      
      Thanks to Ilpo Järvinen for fixing a critical bug in this and
      Jeff Chua for reporting that bug.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7691367d
  3. 19 2月, 2009 1 次提交
  4. 06 2月, 2009 1 次提交
  5. 26 12月, 2008 1 次提交
    • H
      tcp: Always set urgent pointer if it's beyond snd_nxt · 64ff3b93
      Herbert Xu 提交于
      Our TCP stack does not set the urgent flag if the urgent pointer
      does not fit in 16 bits, i.e., if it is more than 64K from the
      sequence number of a packet.
      
      This behaviour is different from the BSDs, and clearly contradicts
      the purpose of urgent mode, which is to send the notification
      (though not necessarily the associated data) as soon as possible.
      Our current behaviour may in fact delay the urgent notification
      indefinitely if the receiver window does not open up.
      
      Simply matching BSD however may break legacy applications which
      incorrectly rely on the out-of-band delivery of urgent data, and
      conversely the in-band delivery of non-urgent data.
      
      Alexey Kuznetsov suggested a safe solution of following BSD only
      if the urgent pointer itself has not yet been transmitted.  This
      way we guarantee that when the remote end sees the packet with
      non-urgent data marked as urgent due to wrap-around we would have
      advanced the urgent pointer beyond, either to the actual urgent
      data or to an as-yet untransmitted packet.
      
      The only potential downside is that applications on the remote
      end may see multiple SIGURG notifications.  However, this would
      occur anyway with other TCP stacks.  More importantly, the outcome
      of such a duplicate notification is likely to be harmless since
      the signal itself does not carry any information other than the
      fact that we're in urgent mode.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64ff3b93
  6. 06 12月, 2008 3 次提交
  7. 04 12月, 2008 1 次提交
    • I
      tcp: make urg+gso work for real this time · f8269a49
      Ilpo Järvinen 提交于
      I should have noticed this earlier... :-) The previous solution
      to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
      patterns, or even worse, a steep downward slope into packet
      counting because each skb pcount would be truncated to pcount
      of 2 and then the following fragments of the later portion would
      restore the window again.
      
      Basically this reverts "tcp: Do not use TSO/GSO when there is
      urgent data" (33cf71ce). It also removes some unnecessary code
      from tcp_current_mss that didn't work as intented either (could
      be that something was changed down the road, or it might have
      been broken since the dawn of time) because it only works once
      urg is already written while this bug shows up starting from
      ~64k before the urg point.
      
      The retransmissions already are split to mss sized chunks, so
      only new data sending paths need splitting in case they have
      a segment otherwise suitable for gso/tso. The actually check
      can be improved to be more narrow but since this is late -rc
      already, I'll postpone thinking the more fine-grained things.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8269a49
  8. 25 11月, 2008 2 次提交
    • I
      e1aa680f
    • I
      tcp: collapse more than two on retransmission · 4a17fc3a
      Ilpo Järvinen 提交于
      I always had thought that collapsing up to two at a time was
      intentional decision to avoid excessive processing if 1 byte
      sized skbs are to be combined for a full mtu, and consecutive
      retransmissions would make the size of the retransmittee
      double each round anyway, but some recent discussion made me
      to understand that was not the case. Thus make collapse work
      more and wait less.
      
      It would be possible to take advantage of the shifting
      machinery (added in the later patch) in the case of paged
      data but that can be implemented on top of this change.
      
      tcp_skb_is_last check is now provided by the loop.
      
      I tested a bit (ss-after-idle-off, fill 4096x4096B xfer,
      10s sleep + 4096 x 1byte writes while dropping them for
      some a while with netem):
      
      . 16774097:16775545(1448) ack 1 win 46
      . 16775545:16776993(1448) ack 1 win 46
      . ack 16759617 win 2399
      P 16776993:16777217(224) ack 1 win 46
      . ack 16762513 win 2399
      . ack 16765409 win 2399
      . ack 16768305 win 2399
      . ack 16771201 win 2399
      . ack 16774097 win 2399
      . ack 16776993 win 2399
      . ack 16777217 win 2399
      P 16777217:16777257(40) ack 1 win 46
      . ack 16777257 win 2399
      P 16777257:16778705(1448) ack 1 win 46
      P 16778705:16780153(1448) ack 1 win 46
      FP 16780153:16781313(1160) ack 1 win 46
      . ack 16778705 win 2399
      . ack 16780153 win 2399
      F 1:1(0) ack 16781314 win 2399
      
      While without drop-all period I get this:
      
      . 16773585:16775033(1448) ack 1 win 46
      . ack 16764897 win 9367
      . ack 16767793 win 9367
      . ack 16770689 win 9367
      . ack 16773585 win 9367
      . 16775033:16776481(1448) ack 1 win 46
      P 16776481:16777217(736) ack 1 win 46
      . ack 16776481 win 9367
      . ack 16777217 win 9367
      P 16777217:16777218(1) ack 1 win 46
      P 16777218:16777219(1) ack 1 win 46
      P 16777219:16777220(1) ack 1 win 46
        ...
      P 16777247:16777248(1) ack 1 win 46
      . ack 16777218 win 9367
      . ack 16777219 win 9367
        ...
      . ack 16777233 win 9367
      . ack 16777248 win 9367
      P 16777248:16778696(1448) ack 1 win 46
      P 16778696:16780144(1448) ack 1 win 46
      FP 16780144:16781313(1169) ack 1 win 46
      . ack 16780144 win 9367
      F 1:1(0) ack 16781314 win 9367
      
      The window seems to be 30-40 segments, which were successfully
      combined into: P 16777217:16777257(40) ack 1 win 46
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a17fc3a
  9. 22 11月, 2008 1 次提交
    • P
      tcp: Do not use TSO/GSO when there is urgent data · 33cf71ce
      Petr Tesarik 提交于
      This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
      
      Since most (if not all) implementations of TSO and even the in-kernel
      software GSO do not update the urgent pointer when splitting a large
      segment, it is necessary to turn off TSO/GSO for all outgoing traffic
      with the URG pointer set.
      
      Looking at tcp_current_mss (and the preceding comment) I even think
      this was the original intention. However, this approach is insufficient,
      because TSO/GSO is turned off only for newly created frames, not for
      frames which were already pending at the arrival of a message with
      MSG_OOB set. These frames were created when TSO/GSO was enabled,
      so they may be large, and they will have the urgent pointer set
      in tcp_transmit_skb().
      
      With this patch, such large packets will be fragmented again before
      going to the transmit routine.
      
      As a side note, at least the following NICs are known to screw up
      the urgent pointer in the TCP header when doing TSO:
      
      	Intel 82566MM (PCI ID 8086:1049)
      	Intel 82566DC (PCI ID 8086:104b)
      	Intel 82541GI (PCI ID 8086:1076)
      	Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
      Signed-off-by: NPetr Tesarik <ptesarik@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33cf71ce
  10. 03 11月, 2008 1 次提交
  11. 27 10月, 2008 1 次提交
  12. 24 10月, 2008 1 次提交
    • I
      tcp: Restore ordering of TCP options for the sake of inter-operability · fd6149d3
      Ilpo Järvinen 提交于
      This is not our bug! Sadly some devices cannot cope with the change
      of TCP option ordering which was a result of the recent rewrite of
      the option code (not that there was some particular reason steming
      from the rewrite for the reordering) though any ordering of TCP
      options is perfectly legal. Thus we restore the original ordering
      to allow interoperability with/through such broken devices and add
      some warning about this trap. Since the reordering just happened
      without any particular reason, this change shouldn't cost us
      anything.
      
      There are already couple of known failure reports (within close
      proximity of the last release), so the problem might be more
      wide-spread than a single device. And other reports which may
      be due to the same problem though the symptoms were less obvious.
      Analysis of one of the case revealed (with very high probability)
      that sack capability cannot be negotiated as the first option
      (SYN never got a response).
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Reported-by: NAldo Maggi <sentiniate@tiscali.it>
      Tested-by: NAldo Maggi <sentiniate@tiscali.it>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd6149d3
  13. 22 10月, 2008 1 次提交
    • I
      tcp: should use number of sack blocks instead of -1 · 75e3d8db
      Ilpo Järvinen 提交于
      While looking for the recent "sack issue" I also read all eff_sacks
      usage that was played around by some relevant commit. I found
      out that there's another thing that is asking for a fix (unrelated
      to the "sack issue" though).
      
      This feature has probably very little significance in practice.
      Opposite direction timeout with bidirectional tcp comes to me as
      the most likely scenario though there might be other cases as
      well related to non-data segments we send (e.g., response to the
      opposite direction segment). Also some ACK losses or option space
      wasted for other purposes is necessary to prevent the earlier
      SACK feedback getting to the sender.
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      75e3d8db
  14. 08 10月, 2008 1 次提交
    • I
      tcp: kill pointless urg_mode · 33f5f57e
      Ilpo Järvinen 提交于
      It all started from me noticing that this urgent check in
      tcp_clean_rtx_queue is unnecessarily inside the loop. Then
      I took a longer look to it and found out that the users of
      urg_mode can trivially do without, well almost, there was
      one gotcha.
      
      Bonus: those funny people who use urg with >= 2^31 write_seq -
      snd_una could now rejoice too (that's the only purpose for the
      between being there, otherwise a simple compare would have done
      the thing). Not that I assume that the rest of the tcp code
      happily lives with such mind-boggling numbers :-). Alas, it
      turned out to be impossible to set wmem to such numbers anyway,
      yes I really tried a big sendfile after setting some wmem but
      nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
      So I hacked a bit variable to long and found out that it seems
      to work... :-)
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33f5f57e
  15. 01 10月, 2008 1 次提交
    • K
      tcp: Port redirection support for TCP · a3116ac5
      KOVACS Krisztian 提交于
      Current TCP code relies on the local port of the listening socket
      being the same as the destination address of the incoming
      connection. Port redirection used by many transparent proxying
      techniques obviously breaks this, so we have to store the original
      destination port address.
      
      This patch extends struct inet_request_sock and stores the incoming
      destination port value there. It also modifies the handshake code to
      use that value as the source port when sending reply packets.
      Signed-off-by: NKOVACS Krisztian <hidden@sch.bme.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3116ac5
  16. 23 9月, 2008 1 次提交
  17. 21 9月, 2008 11 次提交
  18. 27 8月, 2008 1 次提交
  19. 22 7月, 2008 1 次提交
  20. 19 7月, 2008 2 次提交
  21. 17 7月, 2008 3 次提交
  22. 03 7月, 2008 1 次提交
    • P
      tcp: de-bloat a bit with factoring NET_INC_STATS_BH out · 40b215e5
      Pavel Emelyanov 提交于
      There are some places in TCP that select one MIB index to
      bump snmp statistics like this:
      
      	if (<something>)
      		NET_INC_STATS_BH(<some_id>);
      	else if (<something_else>)
      		NET_INC_STATS_BH(<some_other_id>);
      	...
      	else
      		NET_INC_STATS_BH(<default_id>);
      
      or in a more tricky but still similar way.
      
      On the other hand, this NET_INC_STATS_BH is a camouflaged
      increment of percpu variable, which is not that small.
      
      Factoring those cases out de-bloats 235 bytes on non-preemptible
      i386 config and drives parts of the code into 80 columns.
      
      add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-235 (-235)
      function                                     old     new   delta
      tcp_fastretrans_alert                       1437    1424     -13
      tcp_dsack_set                                137     124     -13
      tcp_xmit_retransmit_queue                    690     676     -14
      tcp_try_undo_recovery                        283     265     -18
      tcp_sacktag_write_queue                     1550    1515     -35
      tcp_update_reordering                        162     106     -56
      tcp_retransmit_timer                         990     904     -86
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40b215e5