  1. 18 Feb 2009, 1 commit
    • net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce
      David S. Miller authored
      A long time ago we had bugs, primarily in TCP, where we would modify
      skb->truesize (for TSO queue collapsing) in ways which would corrupt
      the socket memory accounting.
      
      skb_truesize_check() was added in order to try and catch this error
      more systematically.
      
      However this debugging check has morphed into a Frankenstein of sorts
      and these days it does nothing other than catch false-positives.
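      
      For context, a minimal kernel-style sketch (not part of this patch, and
      assuming <net/sock.h>) of the kind of charge the check was guarding:
      truesize is what gets charged against the socket's receive budget when
      an skb is attributed to a socket, roughly as skb_set_owner_r() does.
      
              /* Sketch only: the charge uses skb->truesize; if TCP later
               * rewrites truesize (e.g. while collapsing TSO queue entries),
               * the matching uncharge in the destructor no longer balances,
               * which is the corruption the removed check tried to detect.
               */
              static void charge_skb_to_socket(struct sk_buff *skb, struct sock *sk)
              {
                      skb->sk = sk;
                      skb->destructor = sock_rfree;  /* uncharges skb->truesize on free */
                      atomic_add(skb->truesize, &sk->sk_rmem_alloc);
              }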
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 16 Dec 2008, 1 commit
    • net: Add skb_gro_receive · 71d93b39
      Herbert Xu authored
      This patch adds the helper skb_gro_receive to merge packets for
      GRO.  The current method is to allocate a new header skb and then
      chain the original packets to its frag_list.  This is done to
      make it easier to integrate into the existing GSO framework.
      
      In future as GSO is moved into the drivers, we can undo this and
      simply chain the original packets together.
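      
      For illustration, a minimal kernel-style sketch (not part of this patch)
      of the resulting representation: the merged packets hang off the header
      skb's frag_list, which is the same layout the existing GSO segmentation
      code already walks. The helper below is hypothetical.
      
              /* Sketch: after a GRO merge of this era, the original packets
               * are chained via skb_shinfo(head)->frag_list and their ->next
               * links behind the newly allocated header skb "head".
               */
              static unsigned int count_merged_segments(struct sk_buff *head)
              {
                      struct sk_buff *p;
                      unsigned int n = 1;     /* the header skb itself */
      
                      for (p = skb_shinfo(head)->frag_list; p; p = p->next)
                              n++;            /* each chained original packet */
                      return n;
              }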
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 25 Nov 2008, 1 commit
    • tcp: Try to restore large SKBs while SACK processing · 832d11c5
      Ilpo Järvinen authored
      During SACK processing, most of the benefits of TSO are eaten by
      the SACK blocks that one-by-one fragment SKBs to MSS-sized chunks.
      We are then in trouble when the cleanup work for them has to be done
      as a large cumulative ACK arrives. Instead, try to return to the
      pre-split state already while more and more SACK info is discovered,
      by combining each newly discovered SACK area with the previous skb
      if that one is SACKed as well (a minimal sketch of the idea follows
      the benefit list below).
      
      This approach has a number of benefits:
      
      1) The processing overhead is spread more equally over the RTT
      2) The write queue has fewer skbs to process (this affects everything
         that has to walk the queue past the SACKed areas)
      3) The write queue stays consistent the whole time, so no other part
         of TCP has to be aware of this (this was not the case with some
         other approaches that were, well, quite intrusive all around).
      4) clean_rtx_queue can release most of the pages with a single
         put_page instead of the previous PAGE_SIZE/mss+1 calls
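      
      A minimal kernel-style sketch of the core idea follows (not the actual
      patch code: the real shifting logic handles partial shifts, pcount and
      sequence bookkeeping, and all of the fallback cases counted in the MIBs
      further down). It leans on skb_shift(), the helper this patch adds for
      moving paged data from one skb into another, and on the
      tcp_write_queue_*() accessors.
      
              /* Sketch: skb has just become SACKed; if its predecessor in the
               * write queue is SACKed too, move skb's paged data into it so
               * the queue drifts back toward the large pre-TSO-split skbs.
               */
              static void try_collapse_into_prev(struct sock *sk, struct sk_buff *skb)
              {
                      struct sk_buff *prev;
      
                      if (skb == tcp_write_queue_head(sk))
                              return;         /* nothing before us to merge into */
      
                      prev = tcp_write_queue_prev(sk, skb);
                      if (!(TCP_SKB_CB(prev)->sacked & TCPCB_SACKED_ACKED))
                              return;         /* predecessor not SACKed */
      
                      if (!skb_shift(prev, skb, skb->len))
                              return;         /* could not shift, leave skb alone */
      
                      /* Real code: fix up pcount, seq/end_seq, unlink the emptied skb. */
              }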
      
      In case a hole is fully filled by the new SACK block, we attempt
      to combine the next skb too, which allows construction of skbs
      that are even larger than what TSO originally split them to, and it
      handles the hole-on-every-nth-segment patterns that often occur
      during slow start overshoot pretty nicely. Though for this to be
      really useful, a retransmission would also have to get lost, since
      cumulative ACKs advance one hole at a time in the most typical case.
      
      TODO: handle upwards-only merging. That should be rather easy when
      the segment is fully SACKed, but I'm leaving that as a future work
      item (it won't make a very large difference anyway since the current
      approach already covers quite a lot of the normal cases).
      
      I was earlier thinking of some sophisticated way of tracking
      timestamps of the first and the last segment, but later on
      realized that it won't be necessary at all to store the
      timestamp of the last segment. The cases that can occur are
      basically either:
        1) ambiguous => no sensible measurement can be taken anyway
        2) non-ambiguous due to reordering => having the timestamp
           of the last segment there just skews things further off
           rather than doing any good, since the ACK got triggered by
           one of the holes (besides some subtle issues that would make
           determining the right hole/skb an even harder problem). Anyway,
           it has nothing to do with this change then.
      
      I chose to route some abnormal-looking cases through goto noop;
      some of them could be handled differently (e.g., by stopping the
      walk at that skb), but in general they either shouldn't happen
      at all or are rare enough to make no difference in practice.
      
      In theory this change (as a whole) could cause some macro-scale
      (global) regression because of cache misses that are spread over
      the round-trip time, but it very likely comes out ahead because of
      far fewer (local) cache misses for the other write queue walkers
      and for the big recovery-clearing cumulative ACK.
      
      Worth noting that these benefits would be very easy to get even
      without TSO/GSO being on, as long as the data is in pages so that
      we can merge them. Currently I don't allow that, because DSACK
      splitting at a fragment boundary would mess up the pcounts due to
      sk_can_gso in tcp_set_skb_tso_segs. Once DSACK fragmentation is
      avoided, some of these conditions can be made less strict.
      
      TODO: I will probably have to convert the excessive pointer
      passing to struct sacktag_state... :-)
      
      My testing revealed that a considerable number of skbs couldn't
      be shifted because they were cloned (most likely still awaiting
      tx reclaim)...
      
      [The rest is left as future work, since I repeatedly got EFAULT
      from tcpdump's recvfrom when I added pskb_expand_head to deal
      with clones, so I separated that into another, later patch.]
      
      ...To counter that, I gave up on the fifth advantage:
      
      5) When growing a previous SACK block, fewer allocs for new skbs
         are done; basically a new alloc is needed only when a new hole
         is detected and when the previous skb runs out of frag space
      
      ...which now only happens if reclaim is fast enough to dispose of
      the clone before the SACK block comes in (the window is one RTT
      long); otherwise we'll have to alloc some.
      
      With clones being handled, I got these numbers (they will be
      somewhat worse without that), taken with fine-grained MIBs:
      
                        TCPSackShifted 398
                         TCPSackMerged 877
                  TCPSackShiftFallback 320
            TCPSACKCOLLAPSEFALLBACKGSO 0
        TCPSACKCOLLAPSEFALLBACKSKBBITS 0
        TCPSACKCOLLAPSEFALLBACKSKBDATA 0
          TCPSACKCOLLAPSEFALLBACKBELOW 0
          TCPSACKCOLLAPSEFALLBACKFIRST 1
       TCPSACKCOLLAPSEFALLBACKPREVBITS 318
            TCPSACKCOLLAPSEFALLBACKMSS 1
         TCPSACKCOLLAPSEFALLBACKNOHEAD 0
          TCPSACKCOLLAPSEFALLBACKSHIFT 0
                TCPSACKCOLLAPSENOOPSEQ 0
        TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
           TCPSACKCOLLAPSENOOPSMALLLEN 0
                   TCPSACKCOLLAPSEHOLE 12
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 01 Nov 2008, 1 commit
  5. 29 Oct 2008, 1 commit
  6. 08 Oct 2008, 1 commit
  7. 01 Oct 2008, 1 commit
  8. 23 Sep 2008, 4 commits
  9. 22 Sep 2008, 1 commit
  10. 11 Sep 2008, 2 commits
  11. 16 Aug 2008, 1 commit
  12. 12 Aug 2008, 1 commit
  13. 01 Aug 2008, 1 commit
  14. 30 Jul 2008, 1 commit
  15. 15 Jul 2008, 1 commit
    • vlan: Don't store VLAN tag in cb · 6aa895b0
      Patrick McHardy authored
      Use a real skb member to store the VLAN tag, to avoid clashes with
      qdiscs, which are allowed to use the cb area themselves. As currently
      only real devices that consume the skb set the NETIF_F_HW_VLAN_TX flag,
      no explicit invalidation is necessary. (A usage sketch follows the
      layout comparison below.)
      
      The new member fills a hole on 64 bit; the skb layout changes from:
      
              __u32                      mark;                 /*   172     4 */
              sk_buff_data_t             transport_header;     /*   176     4 */
              sk_buff_data_t             network_header;       /*   180     4 */
              sk_buff_data_t             mac_header;           /*   184     4 */
              sk_buff_data_t             tail;                 /*   188     4 */
              /* --- cacheline 3 boundary (192 bytes) --- */
              sk_buff_data_t             end;                  /*   192     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
      to
      
              __u32                      mark;                 /*   172     4 */
              __u16                      vlan_tci;             /*   176     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              sk_buff_data_t             transport_header;     /*   180     4 */
              sk_buff_data_t             network_header;       /*   184     4 */
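      
      For illustration, a hedged sketch (not from the patch) of how a TX path
      of this era reads the tag from its new home: vlan_tx_tag_present() and
      vlan_tx_tag_get() are the accessors built on vlan_tci (much later
      kernels renamed them to skb_vlan_tag_*()), and tx_desc is a made-up
      hardware descriptor used only for the example.
      
              /* Sketch: a NETIF_F_HW_VLAN_TX driver consumes the tag from
               * skb->vlan_tci via the accessors instead of digging it out of
               * skb->cb, so qdiscs are free to use cb for their own state.
               */
              if (vlan_tx_tag_present(skb))
                      tx_desc->vlan_tag = cpu_to_le16(vlan_tx_tag_get(skb));
              else
                      tx_desc->vlan_tag = 0;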
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 09 Jul 2008, 1 commit
  17. 20 Jun 2008, 1 commit
  18. 22 Apr 2008, 1 commit
  19. 21 Apr 2008, 1 commit
  20. 14 Apr 2008, 4 commits
  21. 03 Apr 2008, 2 commits
  22. 28 Mar 2008, 5 commits
  23. 06 Mar 2008, 1 commit
  24. 19 Feb 2008, 1 commit
  25. 04 Feb 2008, 1 commit
  26. 01 Feb 2008, 1 commit
  27. 29 Jan 2008, 2 commits
    • [UDP]: Only increment counter on first peek/recv · a59322be
      Herbert Xu authored
      The previous move of the UDP inDatagrams counter caused each
      peek of the same packet to be counted separately.  This may be
      undesirable.
      
      This patch fixes this by adding a bit to sk_buff to record whether
      this packet has already been seen through skb_recv_datagram.  We
      then only increment the counter when the packet is seen for the
      first time.
      
      The only dodgy part is the fact that skb_recv_datagram doesn't have
      a good way of returning this new bit of information.  So I've added
      a new function __skb_recv_datagram that does return this and made
      skb_recv_datagram a wrapper around it.
      
      The plan is to eventually replace all uses of skb_recv_datagram with
      this new function, at which time it can be renamed to its proper name.
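      
      A hedged sketch of the caller side this enables (not the patch's exact
      code; the argument order of __skb_recv_datagram and the
      UDP_INC_STATS_USER macro are recalled from mainline of that era, not
      quoted from this page):
      
              /* Sketch: UDP's recvmsg path asks the new helper whether this
               * skb had already been seen via MSG_PEEK, and bumps the
               * InDatagrams counter only on the first, non-peeked delivery.
               */
              int peeked, err;
              struct sk_buff *skb;
      
              skb = __skb_recv_datagram(sk, flags, &peeked, &err);
              if (skb && !peeked)
                      UDP_INC_STATS_USER(UDP_MIB_INDATAGRAMS, is_udplite);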
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [UDP]: Avoid repeated counting of checksum errors due to peeking · 27ab2568
      Herbert Xu authored
      Currently it is possible for two processes to peek on the same socket
      and end up incrementing the error counter twice for the same packet.
      
      This patch fixes it by making skb_kill_datagram return whether it
      succeeded in unlinking the packet and only incrementing the counter
      if it did.
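      
      A hedged sketch of the resulting caller pattern (not quoted from the
      patch; it assumes the 0-on-success return convention and the
      UDP_INC_STATS_USER macro of that era):
      
              /* Sketch: skb_kill_datagram() now returns 0 only when it
               * actually unlinked (and freed) the datagram, so a peeker that
               * lost the race gets an error back and skips the counter.
               */
              if (!skb_kill_datagram(sk, skb, flags))
                      UDP_INC_STATS_USER(UDP_MIB_INERRORS, is_udplite);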
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>