1. 30 May, 2009 1 commit
    • I
      tcp: fix loop in ofo handling code and reduce its complexity · 2df9001e
      Ilpo Järvinen committed
      Somewhat luckily, I was going through these parts with a very
      fine comb because I've made somewhat similar changes in the same
      area (the conflicts that arose weren't that lucky, though). The
      loop was very much overengineered recently in commit 91521944
      (tcp: Use SKB queue and list helpers instead of doing it
      by-hand), while it basically just wants to know if there are
      skbs after 'skb'.
      
      It also got broken because skb1 = skb->next was improperly
      translated into skb1 = skb1->next (though abstracted). Note that
      'skb1' points to the sk_buff preceding 'skb', or is NULL if
      'skb' is at the head. Two things went wrong:
      - We'll kfree 'skb' on the first iteration instead of the
        skbuff following 'skb' (recovering from that would require
        SACK reneging, I think).
      - The list head case where 'skb1' is NULL is checked too early
        and the loop won't execute whereas it previously did.
      
      Conclusion: mostly revert the recent changes, which makes the
      cset very messy looking, but use the proper accessor in the
      previous-like version.
      
      The effective changes against the original can be viewed with:
        git-diff 91521944^ \
            net/ipv4/tcp_input.c | sed -n -e '57,70 p'
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
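The pointer mix-up described in this commit message can be illustrated with a small sketch. It assumes a hypothetical singly linked `struct seg` in place of the kernel's `sk_buff` queue, ignores sequence-number wraparound, and only counts the entries that the real code would unlink and free. All names here are illustrative, not kernel API.

```c
#include <stddef.h>

/* Hypothetical stand-in for an out-of-order queue entry (not struct sk_buff). */
struct seg {
    unsigned int seq, end_seq;
    struct seg *next;            /* NULL at the tail */
};

/* Intended starting point: the entry after the one just inserted. */
struct seg *first_candidate(struct seg *inserted)
{
    return inserted->next;
}

/* Broken starting point: 'prev' precedes 'inserted' (or is NULL at the head),
 * so prev->next is 'inserted' itself, and a NULL prev skips the walk. */
struct seg *first_candidate_broken(struct seg *prev)
{
    return prev ? prev->next : NULL;
}

/* Count the following entries that 'inserted' covers completely.
 * The real code would unlink and free them; wraparound is ignored here. */
int count_covered(struct seg *inserted)
{
    int n = 0;

    for (struct seg *s = first_candidate(inserted); s; s = s->next) {
        if (s->end_seq > inserted->end_seq)
            break;
        n++;
    }
    return n;
}
```

With the broken starting point, the first entry examined is the newly inserted one, which matches the first failure mode listed in the message; the NULL case matches the second.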
  2. 29 May, 2009 1 commit
  3. 05 May, 2009 2 commits
    • S
      tcp: Fix tcp_prequeue() to get correct rto_min value · 0c266898
      Satoru SATOH committed
      tcp_prequeue() uses the constant value (TCP_RTO_MIN) even though the
      actual value may have been tuned. This patch fixes that and makes
      tcp_prequeue() use the actual value returned by tcp_rto_min().
      Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
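The difference between reading a constant and reading the tuned value can be sketched as follows. This is a simplified model under stated assumptions: `struct route_cfg`, `DEFAULT_RTO_MIN_MS`, and `prequeue_delay_ms()` are hypothetical names, the 200 ms default and the three-quarters factor are assumptions for illustration, and the kernel's real lookup goes through route metrics rather than a plain struct.

```c
/* Hypothetical per-route settings (assumption for this sketch). */
struct route_cfg {
    int rto_min_locked;          /* non-zero if the admin tuned rto_min */
    unsigned int rto_min_ms;     /* tuned value in milliseconds */
};

enum { DEFAULT_RTO_MIN_MS = 200 };   /* assumed default */

/* Effective minimum RTO: the tuned value if one is set, else the default. */
unsigned int rto_min_ms(const struct route_cfg *rt)
{
    if (rt && rt->rto_min_locked)
        return rt->rto_min_ms;
    return DEFAULT_RTO_MIN_MS;
}

/* Delay derived from the effective minimum rather than from the constant. */
unsigned int prequeue_delay_ms(const struct route_cfg *rt)
{
    return (3 * rto_min_ms(rt)) / 4;
}
```

A caller that hard-codes `DEFAULT_RTO_MIN_MS` would compute 150 ms even on a route tuned to 40 ms, whereas the accessor-based version yields 30 ms there.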
    • I
      tcp: extend ECN sysctl to allow server-side only ECN · 255cac91
      Ilpo Järvinen committed
      This should be very safe compared with fully enabled ECN, so I
      see no reason why it shouldn't be done right away. As ECN can
      only be negotiated if the SYN-sending party also supports it,
      somebody in the loop probably knows what he/she is doing. If
      the SYN does not ask for ECN, the server-side SYN-ACK is
      identical to what it is without ECN. Thus it's quite safe.
      
      The chosen value is safe w.r.t. existing configs that
      currently set either 0 or 1 manually, but it
      silently upgrades those who have not explicitly requested
      ECN off.
      
      Whether to just enable both sides comes up from time to time,
      but unless that gets done now we can at least make the servers
      aware of ECN already. As there are some known problems that
      occur if ECN is enabled, it's currently questionable whether
      there's any real gain from enabling clients, as servers mostly
      won't support it anyway (so we'd hit just the negative sides).
      After enabling the servers and getting that deployed, enabling
      the client end really has some potential gain too.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
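The three-way behaviour described in this message can be sketched as a pair of predicates. The message only says the new value is distinct from the existing 0 and 1, so the value 2 for the server-side-only mode, the enum names, and both function names are assumptions made for this sketch.

```c
#include <stdbool.h>

/* Assumed encoding of the sysctl for this sketch. */
enum ecn_mode {
    ECN_OFF = 0,            /* never negotiate ECN */
    ECN_ON = 1,             /* request on outgoing SYNs, accept on incoming */
    ECN_SERVER_ONLY = 2     /* accept if the peer's SYN asks, never request */
};

/* Client side: should an outgoing SYN ask for ECN? */
bool ecn_request_on_syn(enum ecn_mode mode)
{
    return mode == ECN_ON;
}

/* Server side: should the SYN-ACK agree to ECN? */
bool ecn_accept_on_synack(enum ecn_mode mode, bool syn_asks_ecn)
{
    return mode != ECN_OFF && syn_asks_ecn;
}
```

In the server-side-only mode the SYN-ACK changes only when the incoming SYN already asked for ECN, which is the safety argument the message makes.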
  4. 14 April, 2009 1 commit
  5. 23 March, 2009 1 commit
  6. 22 March, 2009 1 commit
  7. 16 March, 2009 5 commits
  8. 03 March, 2009 1 commit
  9. 02 March, 2009 4 commits
  10. 01 March, 2009 1 commit
  11. 07 January, 2009 1 commit
  12. 06 December, 2008 7 commits
  13. 26 November, 2008 1 commit
  14. 25 November, 2008 7 commits
    • I
      111cc8b9
    • I
      tcp: Make shifting not clear the hints · 92ee76b6
      Ilpo Järvinen committed
      The earlier version was just a very basic one which is "playing
      safe" by always clearing the hints. However, clearing a hint
      is an extremely costly operation with large windows, so it must
      be avoided whenever possible. With shifting, too, there is a
      way to avoid clearing.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: Try to restore large SKBs while SACK processing · 832d11c5
      Ilpo Järvinen committed
      During SACK processing, most of the benefits of TSO are eaten by
      the SACK blocks that one-by-one fragment SKBs into MSS-sized
      chunks. Then we're in trouble when the cleanup work for them has
      to be done when a large cumulative ACK comes. Try to return to
      the pre-split state already while more and more SACK info gets
      discovered, by combining newly discovered SACK areas with the
      previous skb if that's SACKed as well.
      
      This approach has a number of benefits:
      
      1) The processing overhead is spread more equally over the RTT
      2) Write queue has fewer skbs to process (affects everything
         which has to walk in the queue past the sacked areas)
      3) Write queue is consistent the whole time, so no other parts
         of TCP have to be aware of this (this was not the case with
         some other approach that was, well, quite intrusive all
         around).
      4) Clean_rtx_queue can release most of the pages using a single
         put_page instead of the previous PAGE_SIZE/mss+1 calls
      
      In case a hole is fully filled by the new SACK block, we attempt
      to combine the next skb too, which allows construction of skbs
      that are even larger than what tso split them to, and it handles
      the hole-on-every-nth patterns that often occur during slow start
      overshoot pretty nicely. Though for this to be really useful, a
      retransmission would also have to get lost, since cumulative ACKs
      advance one hole at a time in the most typical case.
      
      TODO: handle upwards-only merging. That should be rather easy
      when the segment is fully sacked, but I'm leaving that as a
      future work item (it won't make a very large difference anyway
      since the current approach already covers quite a lot of normal
      cases).
      
      I was earlier thinking of some sophisticated way of tracking
      timestamps of the first and the last segment but later on
      realized that it won't be that necessary at all to store the
      timestamp of the last segment. The cases that can occur are
      basically either:
        1) ambiguous => no sensible measurement can be taken anyway
        2) non-ambiguous is due to reordering => having the timestamp
           of the last segment there just skews things further off
           rather than doing any good, since the ack got triggered by
           one of the holes (besides some subtle issues that would make
           determining the right hole/skb an even harder problem).
           Anyway, it has nothing to do with this change then.
      
      I chose to route some abnormal-looking cases with goto noop;
      some could be handled differently (e.g., by stopping the
      walk at that skb, but again). In general, they either
      shouldn't happen at all or are rare enough to make no difference
      in practice.
      
      In theory this change (as a whole) could cause some macroscale
      (global) regression because of cache misses that are spread over
      the round-trip time, but it very likely gets better because of
      far fewer (local) cache misses for other write queue walkers and
      for the big recovery-clearing cumulative ACK.
      
      It's worth noting that these benefits would be very easy to get
      also without TSO/GSO being on, as long as the data is in pages so
      that we can merge them. Currently I won't let that happen because
      DSACK splitting at a fragment would mess up pcounts due to
      sk_can_gso in tcp_set_skb_tso_segs. Once DSACK fragments are
      avoided, we have some conditions that can be made less strict.
      
      TODO: I will probably have to convert the excessive pointer
      passing to struct sacktag_state... :-)
      
      My testing revealed that a considerable number of skbs couldn't
      be shifted because they were cloned (most likely still awaiting
      tx reclaim)...
      
      [The rest concerns future work instead, since I repeatedly got
      EFAULT from tcpdump's recvfrom when I added
      pskb_expand_head to deal with clones, so I separated that
      into another, later patch]
      
      ...To counter that, I gave up on the fifth advantage:
      
      5) When growing previous SACK block, less allocs for new skbs
         are done, basically a new alloc is needed only when new hole
         is detected and when the previous skb runs out of frags space
      
      ...which now only happens if reclaim is fast enough to dispose
      of the clone before the SACK block comes in (the window is one
      RTT long); otherwise we'll have to alloc some.
      
      With clones being handled I got these numbers (will be somewhat
      worse without that), taken with fine-grained mibs:
      
                        TCPSackShifted 398
                         TCPSackMerged 877
                  TCPSackShiftFallback 320
            TCPSACKCOLLAPSEFALLBACKGSO 0
        TCPSACKCOLLAPSEFALLBACKSKBBITS 0
        TCPSACKCOLLAPSEFALLBACKSKBDATA 0
          TCPSACKCOLLAPSEFALLBACKBELOW 0
          TCPSACKCOLLAPSEFALLBACKFIRST 1
       TCPSACKCOLLAPSEFALLBACKPREVBITS 318
            TCPSACKCOLLAPSEFALLBACKMSS 1
         TCPSACKCOLLAPSEFALLBACKNOHEAD 0
          TCPSACKCOLLAPSEFALLBACKSHIFT 0
                TCPSACKCOLLAPSENOOPSEQ 0
        TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
           TCPSACKCOLLAPSENOOPSMALLLEN 0
                   TCPSACKCOLLAPSEHOLE 12
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
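The core idea of this commit, combining a newly SACKed area with the previous skb when that one is SACKed as well, can be sketched on a flat array. This is a simplified model only: `struct wseg` and `merge_sacked()` are hypothetical, the real code shifts page fragments between `sk_buff`s and has many fallback conditions (see the counters above), and none of that is modelled here.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical write-queue entry; not the kernel's struct sk_buff. */
struct wseg {
    unsigned int seq, end_seq;   /* covers [seq, end_seq) */
    bool sacked;
};

/* Merge each SACKed entry into a directly preceding, adjacent SACKed entry.
 * Compacts the array in place and returns the new entry count. */
size_t merge_sacked(struct wseg *q, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && q[i].sacked && q[out - 1].sacked &&
            q[out - 1].end_seq == q[i].seq) {
            q[out - 1].end_seq = q[i].end_seq;   /* grow the previous entry */
            continue;
        }
        q[out++] = q[i];
    }
    return out;
}
```

Fewer, larger entries mean that later walkers of the queue, and the cleanup triggered by a large cumulative ACK, have less to visit, which is the effect benefits 2) and 4) describe.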
    • I
      tcp: make tcp_sacktag_one able to handle partial skb too · f58b22fd
      Ilpo Järvinen committed
      This is preparatory work for the SACK combiner patch, which may
      have to count TCP state changes for only a part of the skb
      because it will intentionally avoid splitting the skb into
      SACKed and non-SACKed parts.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: Make SACK code to split only at mss boundaries · adb92db8
      Ilpo Järvinen committed
      Sadly enough, this adds a possible divide, though we try to
      avoid it by checking for one mss as the common case.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
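The trade-off mentioned here, a divide that is skipped in the common one-mss case, can be sketched as below. `round_up_to_mss()` is a hypothetical helper; rounding up (rather than down) and the requirement that both arguments are non-zero are assumptions of this sketch, not a statement of what the patch does.

```c
/* Round a split length up to a whole number of MSS-sized segments.
 * Assumes len > 0 and mss > 0. */
unsigned int round_up_to_mss(unsigned int len, unsigned int mss)
{
    if (len <= mss)              /* common case: one segment, no divide */
        return mss;
    return ((len + mss - 1) / mss) * mss;
}
```

Only lengths above one mss pay for the division, so the fast path stays a single compare.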
    • I
      tcp: more aggressive skipping · e8bae275
      Ilpo Järvinen committed
      I knew already when rewriting the sacktag that this condition
      was too conservative. Change it now, since it prevents a lot of
      useless work (especially in the sack shifter decision code
      that is being added by a later patch). This shouldn't change
      anything really, just save some processing regardless of the
      shifter.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      e1aa680f
  15. 31 October, 2008 1 commit
  16. 30 October, 2008 1 commit
  17. 29 October, 2008 1 commit
  18. 08 October, 2008 3 commits
    • A
      tcp: Fix possible double-ack w/ user dma · 53240c20
      Ali Saidi committed
      From: Ali Saidi <saidi@engin.umich.edu>
      
      When TCP receive copy offload is enabled it's possible that
      tcp_rcv_established() will cause two acks to be sent for a single
      packet. In the case that a tcp_dma_early_copy() is successful,
      copied_early is set to true which causes tcp_cleanup_rbuf() to be
      called early which can send an ack. Further along in
      tcp_rcv_established(), __tcp_ack_snd_check() is called and will
      schedule a delayed ACK. If no packets are processed before the delayed
      ack timer expires the packet will be acked twice.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: cleanup messy initializer · 4a7e5609
      Ilpo Järvinen committed
      I'm quite sure that if I gave you this function in its old
      format to inspect, you would start to wonder what the type of
      demanded is, or whether it's a global variable.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      tcp: kill pointless urg_mode · 33f5f57e
      Ilpo Järvinen committed
      It all started from me noticing that this urgent check in
      tcp_clean_rtx_queue is unnecessarily inside the loop. Then
      I took a longer look at it and found out that the users of
      urg_mode can trivially do without it. Well, almost: there was
      one gotcha.
      
      Bonus: those funny people who use urg with >= 2^31 write_seq -
      snd_una could now rejoice too (that's the only purpose for the
      between being there, otherwise a simple compare would have done
      the thing). Not that I assume that the rest of the tcp code
      happily lives with such mind-boggling numbers :-). Alas, it
      turned out to be impossible to set wmem to such numbers anyway,
      yes I really tried a big sendfile after setting some wmem but
      nothing happened :-). ...Tcp_wmem is int and so is sk_sndbuf...
      So I hacked a bit variable to long and found out that it seems
      to work... :-)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
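The remark about a range check versus "a simple compare" can be made concrete with 32-bit sequence arithmetic. The sketch below uses hypothetical names `seq_before()` and `seq_between()`; it follows the usual modulo-2^32 idiom and is not copied from the patch. The cast to `int32_t` assumes two's-complement conversion, as on common compilers.

```c
#include <stdint.h>
#include <stdbool.h>

/* Signed-difference compare: meaningful only while the two values are
 * less than 2^31 apart. */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Is low <= x <= high in modulo-2^32 space? Works for any window size. */
bool seq_between(uint32_t x, uint32_t low, uint32_t high)
{
    return high - low >= x - low;
}
```

For a window from 0 to 0x90000000, `seq_between(0x85000000, 0, 0x90000000)` holds, while `seq_before(0, 0x85000000)` does not, because the distance exceeds 2^31. That is the situation the "funny people" remark refers to.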