1. 03 December 2009, 3 commits
  2. 24 November 2009, 1 commit
  3. 29 October 2009, 3 commits
  4. 19 October 2009, 1 commit
    • inet: rename some inet_sock fields · c720c7e8
      Committed by Eric Dumazet
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      The goal is to transfer the fields used at lookup time into the first
      read-mostly cache line (inside struct sock_common) and to move sk_refcnt
      to a separate cache line (only written by the rx path).
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c720c7e8
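      A minimal illustrative sketch of what the rename amounts to is shown below.
      This is not the actual kernel definition; the field list follows the commit
      text, and the macro placement in the final comment is hypothetical:

        #include <linux/types.h>

        /* Sketch only: the lookup-time fields gain an inet_ prefix so that a
         * later patch can turn them into macros aliasing members of a shared
         * read-mostly area without the bare names (daddr, sport, ...)
         * clashing with local variables. */
        struct inet_sock_sketch {
                __be32 inet_daddr;      /* was: daddr     - peer IPv4 address        */
                __be32 inet_rcv_saddr;  /* was: rcv_saddr - bound local IPv4 address */
                __be16 inet_dport;      /* was: dport     - peer port                */
                __u16  inet_num;        /* was: num       - local port, host order   */
                __be32 inet_saddr;      /* was: saddr     - source address           */
                __be16 inet_sport;      /* was: sport     - source port              */
                __u16  inet_id;         /* was: id        - IP ID counter            */
        };

        /* A later patch could then alias a field into the read-mostly part of
         * struct sock_common, e.g. (hypothetical placement):
         * #define inet_daddr sk.__sk_common.skc_daddr */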
  5. 02 October 2009, 1 commit
    • IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Committed by Ori Finkelman
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with the TCP window scaling option and a non-zero window
      scale value to a Linux box.
      2. Linux box notes the large receive window from the client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to a check against the requested window scale value, Linux does not
      send the window scale TCP option in the SYN/ACK at all.
      
      With the following result:
      
      The client box thinks TCP window scaling is not supported, since the SYN/ACK
      carried no window scale option, while the Linux box thinks window scaling is
      supported (and the scale might be non-zero), since the SYN did carry the
      option. The client and server therefore have mismatched ideas of each
      other's window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens a TCP connection to some server.
      2. Linux decides on a zero value of window scale.
      3. Due to a check against the computed window scale value, Linux does not set
      the window scale TCP option in the SYN.
      
      With the expected result that the server OS does not use the window scale
      option, due to not receiving such an option in the SYN headers, leading to
      suboptimal performance.
      Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: Ori Finkelman <ori@comsleep.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89e95a61
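      The fix amounts to keying the option on "window scaling was negotiated"
      rather than on "our scale is non-zero". A small user-space sketch of the
      option emission follows; the helper name and layout are illustrative, not
      the kernel's tcp_options_write():

        #include <stddef.h>
        #include <stdint.h>

        #define TCPOPT_NOP     1   /* padding */
        #define TCPOPT_WINDOW  3   /* window scale option kind (RFC 1323) */
        #define TCPOLEN_WINDOW 3   /* kind + length + shift count */

        /* Hypothetical helper: emit the window scale option whenever scaling
         * was negotiated (wscale_ok), even if our own shift count is 0.
         * Returns the number of option bytes written. */
        static size_t write_wscale_option(uint8_t *ptr, int wscale_ok,
                                          uint8_t snd_wscale)
        {
                if (!wscale_ok)         /* peer's SYN did not offer the option */
                        return 0;

                ptr[0] = TCPOPT_NOP;    /* pad the 3-byte option to 4 bytes */
                ptr[1] = TCPOPT_WINDOW;
                ptr[2] = TCPOLEN_WINDOW;
                ptr[3] = snd_wscale;    /* may legitimately be 0 */
                return 4;
        }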
  6. 03 September 2009, 1 commit
    • tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Committed by Wu Fengguang
      This fixes a lockdep warning that appeared when doing stress memory tests
      over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin() and it is
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa133076
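      A kernel-context sketch of the pattern (not the literal tcp_send_fin() body;
      the retry loop mirrors the one discussed above):

        #include <net/tcp.h>
        #include <linux/sched.h>

        /* Sketch: use the socket's own allocation mask instead of a hard-coded
         * GFP_KERNEL, so sockets driven from reclaim context (e.g. NFS
         * writeback) never re-enter the page allocator with __GFP_FS set. */
        static struct sk_buff *example_alloc_fin_skb(struct sock *sk)
        {
                struct sk_buff *skb;

                for (;;) {
                        /* was: alloc_skb_fclone(MAX_TCP_HEADER, GFP_KERNEL) */
                        skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation);
                        if (skb)
                                return skb;
                        yield();        /* the retry behaviour discussed above */
                }
        }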
  7. 24 July 2009, 1 commit
  8. 20 July 2009, 1 commit
  9. 30 June 2009, 1 commit
  10. 03 June 2009, 1 commit
  11. 05 May 2009, 1 commit
    • tcp: extend ECN sysctl to allow server-side only ECN · 255cac91
      Committed by Ilpo Järvinen
      This should be very safe compared with fully enabled, so I see no reason
      why it shouldn't be done right away. As ECN can only be negotiated if the
      SYN-sending party also supports it, somebody in the loop probably knows
      what he/she is doing. If the SYN does not ask for ECN, the server-side
      SYN-ACK is identical to what it would be without ECN. Thus it's quite safe.
      
      The chosen value is safe w.r.t. existing configs that currently set either
      0 or 1 manually, but it silently upgrades those who have not explicitly
      requested ECN off.
      
      Whether to just enable both sides comes up from time to time, but unless
      that gets done now we can at least make the servers aware of ECN already.
      As some known problems occur when ECN is enabled, it's currently
      questionable whether there's any real gain from enabling clients, since
      servers mostly won't support it anyway (so we'd hit just the negative
      sides). After enabling the servers and getting that deployed, enabling the
      client end really has some potential gain too.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      255cac91
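      A sketch of how the three sysctl values behave, as described above (the
      helper names are illustrative, not the kernel's own functions):

        /* tcp_ecn = 0: ECN off
         * tcp_ecn = 1: request ECN on outgoing connections and accept it on
         *              incoming ones
         * tcp_ecn = 2: accept ECN when the incoming SYN asks for it, but
         *              never request it ourselves (the new server-side-only
         *              behaviour described above) */

        /* passive side: may we acknowledge ECN in the SYN-ACK? */
        static int sketch_ecn_ok_for_synack(int sysctl_tcp_ecn,
                                            int syn_requested_ecn)
        {
                return sysctl_tcp_ecn != 0 && syn_requested_ecn;
        }

        /* active side: do we ask for ECN in our own SYN? */
        static int sketch_request_ecn_in_syn(int sysctl_tcp_ecn)
        {
                return sysctl_tcp_ecn == 1;
        }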
  12. 20 April 2009, 1 commit
  13. 03 April 2009, 2 commits
  14. 16 March 2009, 2 commits
    • tcp: simplify tcp_current_mss · 0c54b85f
      Committed by Ilpo Järvinen
      There's very little need for most of the call sites to get
      tp->xmit_goal_size updated. That costs us a divide as is, so slice the
      function in two. Also, the only users of tp->xmit_goal_size sit directly
      behind tcp_current_mss(), so there's no need to store that variable in
      tcp_sock at all! Dropping xmit_goal_size currently leaves a 16-bit hole,
      and some reorganization would again be necessary to change that (but I'm
      aiming to fill that hole with a u16 xmit_goal_size_segs that caches the
      result of the remaining divide, to address the TSO regression).
      
      Bring xmit_goal_size parts into tcp.c
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Cc: Evgeniy Polyakov <zbr@ioremap.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c54b85f
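      A rough sketch of the resulting shape (simplified signatures, reconstructed
      from the description rather than copied from the patch): the MSS computation
      no longer also derives the transmit size goal, and callers that want both
      ask for them explicitly:

        #include <net/sock.h>
        #include <linux/socket.h>

        /* cheap: no divide, just the current MSS */
        unsigned int sketch_current_mss(struct sock *sk);

        /* the divide lives here; its result never needs to be stored in
         * tcp_sock because all users sit right behind the MSS computation */
        unsigned int sketch_xmit_size_goal(struct sock *sk, unsigned int mss_now,
                                           int large_allowed);

        /* convenience wrapper for the send paths that want both values */
        static unsigned int sketch_send_mss(struct sock *sk,
                                            unsigned int *size_goal, int flags)
        {
                unsigned int mss_now = sketch_current_mss(sk);

                *size_goal = sketch_xmit_size_goal(sk, mss_now,
                                                   !(flags & MSG_OOB));
                return mss_now;
        }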
    • tcp: remove pointless .dsack/.num_sacks code · 5861f8e5
      Committed by Ilpo Järvinen
      In the pure assignment case, the earlier zeroing is
      still in effect.
      
      David S. Miller raised the concern that the ifs might be there to avoid
      dirtying cache lines. I came to these conclusions:
      
      > We'll be dirtying it anyway (now that I check), the first "real" statement
      > in tcp_rcv_established is:
      >
      >       tp->rx_opt.saw_tstamp = 0;
      >
      > ...that'll land on the same dword. :-/
      >
      > I suppose the blocks are there just because they had more complexity
      > inside when they had to calculate eff_sacks too (maybe it would
      > have been better to just remove them in that drop-patch so you would
      > have had less headache :-)).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5861f8e5
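      The shape of the cleanup, reconstructed from the discussion above (not the
      literal diff): the conditional guards around these resets are dropped, since
      the cache line is dirtied anyway by the saw_tstamp store and, in the pure
      assignment case, the earlier zeroing is still in effect:

        /* before */
        if (tp->rx_opt.dsack)
                tp->rx_opt.dsack = 0;
        if (tp->rx_opt.num_sacks)
                tp->rx_opt.num_sacks = 0;

        /* after: plain assignments, no guard */
        tp->rx_opt.dsack = 0;
        tp->rx_opt.num_sacks = 0;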
  15. 03 March 2009, 1 commit
  16. 02 March 2009, 7 commits
  17. 22 February 2009, 1 commit
    • tcp: Always set urgent pointer if it's beyond snd_nxt · 7691367d
      Committed by Herbert Xu
      Our TCP stack does not set the urgent flag if the urgent pointer
      does not fit in 16 bits, i.e., if it is more than 64K from the
      sequence number of a packet.
      
      This behaviour is different from the BSDs, and clearly contradicts
      the purpose of urgent mode, which is to send the notification
      (though not necessarily the associated data) as soon as possible.
      Our current behaviour may in fact delay the urgent notification
      indefinitely if the receiver window does not open up.
      
      Simply matching BSD however may break legacy applications which
      incorrectly rely on the out-of-band delivery of urgent data, and
      conversely the in-band delivery of non-urgent data.
      
      Alexey Kuznetsov suggested a safe solution of following BSD only
      if the urgent pointer itself has not yet been transmitted.  This
      way we guarantee that when the remote end sees the packet with
      non-urgent data marked as urgent due to wrap-around we would have
      advanced the urgent pointer beyond, either to the actual urgent
      data or to an as-yet untransmitted packet.
      
      The only potential downside is that applications on the remote
      end may see multiple SIGURG notifications.  However, this would
      occur anyway with other TCP stacks.  More importantly, the outcome
      of such a duplicate notification is likely to be harmless since
      the signal itself does not carry any information other than the
      fact that we're in urgent mode.
      
      Thanks to Ilpo Järvinen for fixing a critical bug in this and
      Jeff Chua for reporting that bug.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7691367d
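      A sketch of the resulting header-build logic (reconstructed from the
      description above, assuming the usual tcp_output.c locals tp, tcb and th):
      when the urgent pointer is within 64K of this segment it is encoded exactly;
      when it points past data we have not transmitted yet, it is clamped to the
      maximum offset so the URG flag still goes out immediately:

        if (unlikely(tcp_urg_mode(tp) && before(tcb->seq, tp->snd_up))) {
                if (before(tp->snd_up, tcb->seq + 0x10000)) {
                        /* urgent pointer fits in 16 bits: encode it exactly */
                        th->urg_ptr = htons(tp->snd_up - tcb->seq);
                        th->urg = 1;
                } else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
                        /* urgent pointer not yet transmitted: clamp it */
                        th->urg_ptr = htons(0xFFFF);
                        th->urg = 1;
                }
        }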
  18. 19 February 2009, 1 commit
  19. 06 February 2009, 1 commit
  20. 26 December 2008, 1 commit
    • tcp: Always set urgent pointer if it's beyond snd_nxt · 64ff3b93
      Committed by Herbert Xu
      Our TCP stack does not set the urgent flag if the urgent pointer
      does not fit in 16 bits, i.e., if it is more than 64K from the
      sequence number of a packet.
      
      This behaviour is different from the BSDs, and clearly contradicts
      the purpose of urgent mode, which is to send the notification
      (though not necessarily the associated data) as soon as possible.
      Our current behaviour may in fact delay the urgent notification
      indefinitely if the receiver window does not open up.
      
      Simply matching BSD however may break legacy applications which
      incorrectly rely on the out-of-band delivery of urgent data, and
      conversely the in-band delivery of non-urgent data.
      
      Alexey Kuznetsov suggested a safe solution of following BSD only
      if the urgent pointer itself has not yet been transmitted.  This
      way we guarantee that when the remote end sees the packet with
      non-urgent data marked as urgent due to wrap-around we would have
      advanced the urgent pointer beyond, either to the actual urgent
      data or to an as-yet untransmitted packet.
      
      The only potential downside is that applications on the remote
      end may see multiple SIGURG notifications.  However, this would
      occur anyway with other TCP stacks.  More importantly, the outcome
      of such a duplicate notification is likely to be harmless since
      the signal itself does not carry any information other than the
      fact that we're in urgent mode.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      64ff3b93
  21. 06 December 2008, 3 commits
  22. 04 December 2008, 1 commit
    • tcp: make urg+gso work for real this time · f8269a49
      Committed by Ilpo Järvinen
      I should have noticed this earlier... :-) The previous solution
      to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zag
      patterns, or even worse, a steep downward slope into packet
      counting, because each skb pcount would be truncated to a pcount
      of 2 and then the following fragments of the later portion would
      restore the window again.
      
      Basically this reverts "tcp: Do not use TSO/GSO when there is
      urgent data" (33cf71ce). It also removes some unnecessary code
      from tcp_current_mss that didn't work as intended either (it could
      be that something was changed down the road, or it might have
      been broken since the dawn of time), because it only works once
      urg is already written, while this bug shows up starting from
      ~64k before the urg point.
      
      The retransmissions are already split into mss-sized chunks, so
      only the new-data sending paths need splitting in case they have
      a segment otherwise suitable for gso/tso. The actual check
      could be made narrower, but since this is late -rc already,
      I'll postpone the more fine-grained work.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8269a49
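      A sketch of the new-data path after this change (a fragment in the
      tcp_write_xmit() context, reconstructed rather than quoted): while urgent
      mode is active, new segments are capped at one MSS instead of being handed
      a TSO-sized split point, so no freshly built skb can straddle the urgent
      pointer with a large pcount:

        limit = mss_now;
        if (tso_segs > 1 && !tcp_urg_mode(tp))
                limit = tcp_mss_split_point(sk, skb, mss_now, cwnd_quota);

        if (skb->len > limit &&
            unlikely(tso_fragment(sk, skb, limit, mss_now)))
                break;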
  23. 25 November 2008, 2 commits
    • e1aa680f
    • tcp: collapse more than two on retransmission · 4a17fc3a
      Committed by Ilpo Järvinen
      I had always thought that collapsing at most two at a time was an
      intentional decision to avoid excessive processing when 1-byte skbs
      are to be combined into a full mtu, and that consecutive
      retransmissions would double the size of the retransmitted skb each
      round anyway, but some recent discussion made me understand that was
      not the case. Thus make collapse work more and wait less.
      
      It would be possible to take advantage of the shifting
      machinery (added in the later patch) in the case of paged
      data but that can be implemented on top of this change.
      
      tcp_skb_is_last check is now provided by the loop.
      
      I tested a bit (ss-after-idle-off, fill 4096x4096B xfer,
      10s sleep + 4096 x 1-byte writes while dropping them for
      a while with netem):
      
      . 16774097:16775545(1448) ack 1 win 46
      . 16775545:16776993(1448) ack 1 win 46
      . ack 16759617 win 2399
      P 16776993:16777217(224) ack 1 win 46
      . ack 16762513 win 2399
      . ack 16765409 win 2399
      . ack 16768305 win 2399
      . ack 16771201 win 2399
      . ack 16774097 win 2399
      . ack 16776993 win 2399
      . ack 16777217 win 2399
      P 16777217:16777257(40) ack 1 win 46
      . ack 16777257 win 2399
      P 16777257:16778705(1448) ack 1 win 46
      P 16778705:16780153(1448) ack 1 win 46
      FP 16780153:16781313(1160) ack 1 win 46
      . ack 16778705 win 2399
      . ack 16780153 win 2399
      F 1:1(0) ack 16781314 win 2399
      
      While without drop-all period I get this:
      
      . 16773585:16775033(1448) ack 1 win 46
      . ack 16764897 win 9367
      . ack 16767793 win 9367
      . ack 16770689 win 9367
      . ack 16773585 win 9367
      . 16775033:16776481(1448) ack 1 win 46
      P 16776481:16777217(736) ack 1 win 46
      . ack 16776481 win 9367
      . ack 16777217 win 9367
      P 16777217:16777218(1) ack 1 win 46
      P 16777218:16777219(1) ack 1 win 46
      P 16777219:16777220(1) ack 1 win 46
        ...
      P 16777247:16777248(1) ack 1 win 46
      . ack 16777218 win 9367
      . ack 16777219 win 9367
        ...
      . ack 16777233 win 9367
      . ack 16777248 win 9367
      P 16777248:16778696(1448) ack 1 win 46
      P 16778696:16780144(1448) ack 1 win 46
      FP 16780144:16781313(1169) ack 1 win 46
      . ack 16780144 win 9367
      F 1:1(0) ack 16781314 win 9367
      
      The window seems to be 30-40 segments, which were successfully
      combined into: P 16777217:16777257(40) ack 1 win 46
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a17fc3a
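      A simplified sketch of the collapsing loop this change introduces (helper
      names are illustrative and the body is reconstructed, not quoted): instead
      of stopping after merging one follower, keep folding subsequent skbs into
      'to' while they remain eligible and fit in the remaining space:

        static void sketch_retrans_try_collapse(struct sock *sk,
                                                struct sk_buff *to, int space)
        {
                struct sk_buff *skb = to, *tmp;
                int first = 1;

                tcp_for_write_queue_from_safe(skb, tmp, sk) {
                        /* linear, not SACKed, no SYN/FIN, ... */
                        if (!sketch_can_collapse(sk, skb))
                                break;

                        space -= skb->len;

                        if (first) {    /* 'to' itself only consumes space */
                                first = 0;
                                continue;
                        }

                        if (space < 0)  /* next skb would overflow the MSS */
                                break;

                        sketch_collapse_one(sk, to);    /* merge skb into 'to' */
                }
        }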
  24. 22 November 2008, 1 commit
    • tcp: Do not use TSO/GSO when there is urgent data · 33cf71ce
      Committed by Petr Tesarik
      This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
      
      Since most (if not all) implementations of TSO and even the in-kernel
      software GSO do not update the urgent pointer when splitting a large
      segment, it is necessary to turn off TSO/GSO for all outgoing traffic
      with the URG pointer set.
      
      Looking at tcp_current_mss (and the preceding comment) I even think
      this was the original intention. However, this approach is insufficient,
      because TSO/GSO is turned off only for newly created frames, not for
      frames which were already pending at the arrival of a message with
      MSG_OOB set. These frames were created when TSO/GSO was enabled,
      so they may be large, and they will have the urgent pointer set
      in tcp_transmit_skb().
      
      With this patch, such large packets will be fragmented again before
      going to the transmit routine.
      
      As a side note, at least the following NICs are known to screw up
      the urgent pointer in the TCP header when doing TSO:
      
      	Intel 82566MM (PCI ID 8086:1049)
      	Intel 82566DC (PCI ID 8086:104b)
      	Intel 82541GI (PCI ID 8086:1076)
      	Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33cf71ce
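      A sketch of the core idea, reconstructed from the commit message (not the
      literal diff): while the connection is in urgent mode, tcp_current_mss()
      stops offering a large TSO/GSO send size, so new frames, and pending large
      frames re-fragmented against this MSS, go out as ordinary single-MSS
      segments carrying a correct urgent pointer:

        /* inside the MSS computation, sketch: */
        int doing_tso = 0;

        if (large_allowed && sk_can_gso(sk) && !tp->urg_mode)
                doing_tso = 1;  /* TSO-sized segments only outside urgent mode */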
  25. 03 November 2008, 1 commit