1. 14 Mar, 2009 1 commit
  2. 02 Mar, 2009 1 commit
  3. 18 Feb, 2009 1 commit
    • net: Kill skb_truesize_check(), it only catches false-positives. · 92a0acce
      Committed by David S. Miller
      A long time ago we had bugs, primarily in TCP, where we would modify
      skb->truesize (for TSO queue collapsing) in ways which would corrupt
      the socket memory accounting.
      
      skb_truesize_check() was added in order to try and catch this error
      more systematically.
      
      However this debugging check has morphed into a Frankenstein of sorts
      and these days it does nothing other than catch false-positives.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Feb, 2009 1 commit
    • net: infrastructure for hardware time stamping · ac45f602
      Committed by Patrick Ohly
      The additional per-packet information (16 bytes for time stamps, 1
      byte for flags) is stored for all packets in the skb_shared_info
      struct. This implementation detail is hidden from users of that
      information via skb_* accessor functions. A separate struct resp.
      union is used for the additional information so that it can be
      stored/copied easily outside of skb_shared_info.
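As a rough illustration of the layout described above, the following userspace sketch keeps 16 bytes of time stamps and one flags byte in a shared-info structure and exposes them only through accessor functions. All `demo_*` names are hypothetical stand-ins, not the kernel's definitions, and the 8-bit bit-fields assume a compiler such as GCC or Clang that accepts `uint8_t` bit-fields.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: two 8-byte stamps, 16 bytes in total. */
struct demo_hwtstamps {
    int64_t hwtstamp;   /* raw hardware time stamp */
    int64_t syststamp;  /* the same stamp converted to system time */
};

/* Hypothetical stand-in: one byte of TX flags, usable bitwise or as a whole. */
union demo_shared_tx {
    struct {
        uint8_t hardware:1;     /* hardware time stamp requested */
        uint8_t software:1;     /* software fallback requested */
        uint8_t in_progress:1;  /* a driver is producing the stamp */
    } bits;
    uint8_t flags;              /* all bits viewed as a single byte */
};

/* Models the shared-info area that holds the data for every packet. */
struct demo_shared_info {
    unsigned int nr_frags;
    struct demo_hwtstamps hwtstamps;
    union demo_shared_tx tx_flags;
};

struct demo_skb {
    struct demo_shared_info shinfo;
};

static void demo_skb_init(struct demo_skb *skb)
{
    assert(skb != NULL);
    memset(skb, 0, sizeof(*skb));
}

/* Accessors: callers never touch demo_shared_info directly. */
static struct demo_hwtstamps *demo_hwtstamps(struct demo_skb *skb)
{
    return &skb->shinfo.hwtstamps;
}

static union demo_shared_tx *demo_tx(struct demo_skb *skb)
{
    return &skb->shinfo.tx_flags;
}
```

Because the stamps and flags are separate types, a caller can copy them out by plain assignment, for example `struct demo_hwtstamps saved = *demo_hwtstamps(&skb);`, without knowing where they are stored.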
      
      Compared to previous implementations (reusing the tstamp field
      depending on the context, optional additional structures) this
      is the simplest solution. It does not extend sk_buff itself.
      
      TX time stamping is implemented in software if the device driver
      doesn't support hardware time stamping.
      
      The new semantic for hardware/software time stamping around
      ndo_start_xmit() is based on two assumptions about existing
      network device drivers which don't support hardware time
      stamping and know nothing about it:
       - they leave the new skb_shared_tx unmodified
       - they keep the connection to the originating socket in skb->sk
         alive, i.e., don't call skb_orphan()
      
      Given that skb_shared_tx is new, the first assumption is safe.
      The second is only true for some drivers. As a result, software
      TX time stamping currently works with the bnx2 driver, but not
      with the unmodified igb driver (the two drivers this patch series
      was tested with).
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 10 Feb, 2009 1 commit
  6. 09 Feb, 2009 1 commit
    • net: Increase default NET_SKB_PAD to 32. · d6301d3d
      Committed by David S. Miller
      Several devices need to insert some "pre headers" in front of the
      main packet data when they transmit a packet.
      
      Currently we allocate only 16 bytes of pad room and this ends up not
      being enough for some types of hardware (NIU, usb-net, s390 qeth,
      etc.)
      
      So increase this to 32.
      
      Note that drivers still need to check in their transmit routine
      whether enough headroom exists, and if not use skb_realloc_headroom().
      Tunneling, IPSEC, and other encapsulation methods can cause the
      padding area to be used up.
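The headroom check mentioned above can be sketched in userspace as follows. This is a simplified model, not driver code: `demo_buf`, `DEMO_SKB_PAD` and the helper functions are hypothetical, and the reallocation happens in place, whereas the kernel's skb_realloc_headroom() hands back a new skb.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SKB_PAD 32  /* hypothetical stand-in for the new default pad */

struct demo_buf {
    unsigned char *head;  /* start of the allocation */
    unsigned char *data;  /* start of the packet data */
    size_t len;           /* length of the packet data */
};

/* Bytes available in front of the packet data. */
static size_t demo_headroom(const struct demo_buf *b)
{
    assert(b != NULL && b->head != NULL);
    return (size_t)(b->data - b->head);
}

/* Allocate a buffer with the default pad reserved in front of the data. */
static int demo_alloc(struct demo_buf *b, size_t len)
{
    b->head = malloc(DEMO_SKB_PAD + len);
    if (!b->head)
        return -1;
    b->data = b->head + DEMO_SKB_PAD;
    b->len = len;
    memset(b->data, 0, len);
    return 0;
}

/* Make sure at least `need` bytes of headroom exist, reallocating if not. */
static int demo_ensure_headroom(struct demo_buf *b, size_t need)
{
    unsigned char *nhead;

    if (demo_headroom(b) >= need)
        return 0;  /* the common case once the pad is large enough */
    nhead = malloc(need + b->len);
    if (!nhead)
        return -1;
    memcpy(nhead + need, b->data, b->len);
    free(b->head);
    b->head = nhead;
    b->data = nhead + need;
    return 0;
}

/* Prepend a header, as a transmit routine would do for a "pre header". */
static int demo_push_header(struct demo_buf *b, const void *hdr, size_t hlen)
{
    if (demo_ensure_headroom(b, hlen))
        return -1;
    b->data -= hlen;
    b->len += hlen;
    memcpy(b->data, hdr, hlen);
    return 0;
}
```

In this model, a 24-byte encapsulation header consumes most of the 32-byte pad, so a later 16-byte device header no longer fits and takes the reallocation path. That is why the check stays necessary even with the larger default.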
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 30 Jan, 2009 1 commit
    • gro: Avoid copying headers of unmerged packets · 86911732
      Committed by Herbert Xu
      Unfortunately simplicity isn't always the best.  The fraginfo
      interface turned out to be suboptimal.  The problem was quite
      obvious.  For every packet, we have to copy the headers from
      the frags structure into skb->head, even though for 99% of the
      packets this part is immediately thrown away after the merge.
      
      LRO didn't have this problem because it directly read the headers
      from the frags structure.
      
      This patch attempts to address this by creating an interface
      that allows GRO to access the headers in the first frag without
      having to copy it.  Because all drivers that use frags place the
      headers in the first frag this optimisation should be enough.
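The idea of reading headers straight from the first fragment can be modelled in userspace like this. The sketch is an assumption-laden illustration, not the interface the patch adds: `demo_pkt`, `demo_header()` and `demo_pull_header()` are hypothetical names, and a fixed array stands in for the linear area at skb->head.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_LINEAR_MAX 64

struct demo_pkt {
    unsigned char linear[DEMO_LINEAR_MAX]; /* stand-in for the linear area */
    size_t linear_len;                     /* bytes valid in linear[] */
    const unsigned char *frag0;            /* first page fragment */
    size_t frag0_len;                      /* bytes available in frag0 */
    unsigned int copies;                   /* header copies performed */
};

/*
 * Return a pointer to the first hlen header bytes without copying when
 * the linear area is empty and the first fragment is long enough.
 */
static const unsigned char *demo_header(struct demo_pkt *p, size_t hlen)
{
    assert(p != NULL);
    if (p->linear_len == 0 && p->frag0 && p->frag0_len >= hlen)
        return p->frag0;   /* fast path: read in place */
    if (p->linear_len >= hlen)
        return p->linear;  /* headers were already pulled */
    return NULL;
}

/*
 * Slow path, needed only for a packet that is kept unmerged: copy the
 * headers into the linear area and advance the fragment past them.
 */
static int demo_pull_header(struct demo_pkt *p, size_t hlen)
{
    assert(p != NULL);
    if (p->linear_len != 0 || !p->frag0 ||
        hlen > p->frag0_len || hlen > DEMO_LINEAR_MAX)
        return -1;
    memcpy(p->linear, p->frag0, hlen);
    p->linear_len = hlen;
    p->frag0 += hlen;
    p->frag0_len -= hlen;
    p->copies++;
    return 0;
}
```

A packet that gets merged only ever goes through `demo_header()`, so its `copies` counter stays at zero; the copy is paid only by packets that are not merged.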
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 28 Jan, 2009 1 commit
  9. 16 Dec, 2008 1 commit
    • net: Add skb_gro_receive · 71d93b39
      Committed by Herbert Xu
      This patch adds the helper skb_gro_receive to merge packets for
      GRO.  The current method is to allocate a new header skb and then
      chain the original packets to its frag_list.  This is done to
      make it easier to integrate into the existing GSO framework.
      
      In future as GSO is moved into the drivers, we can undo this and
      simply chain the original packets together.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 25 Nov, 2008 1 commit
    • tcp: Try to restore large SKBs while SACK processing · 832d11c5
      Committed by Ilpo Järvinen
      During SACK processing, most of the benefits of TSO are eaten by
      the SACK blocks, which fragment SKBs one by one into MSS-sized chunks.
      We then run into trouble when the cleanup work for them has to be
      done once a large cumulative ACK arrives. Try to return to the
      pre-split state while more and more SACK info is being discovered,
      by combining each newly discovered SACK area with the previous skb
      if that is SACKed as well.
      
      This approach has a number of benefits:
      
      1) The processing overhead is spread more equally over the RTT
      2) Write queue has fewer skbs to process (affects everything
         which has to walk the queue past the SACKed areas)
      3) Write queue is consistent the whole time, so no other part
         of TCP has to be aware of this (this was not the case with
         some other approach that was, well, quite intrusive all
         around).
      4) Clean_rtx_queue can release most of the pages using a single
         put_page instead of the previous PAGE_SIZE/mss+1 calls
      
      In case a hole is fully filled by the new SACK block, we attempt
      to combine the next skb too, which allows construction of skbs
      that are even larger than what TSO split them to, and it handles
      the hole-on-every-nth-packet patterns that often occur during slow
      start overshoot pretty nicely. For this to be really useful, though,
      a retransmission would also have to get lost, since cumulative ACKs
      advance one hole at a time in the most typical case.
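The merging behaviour described so far can be sketched on a toy write queue. This is a simplified model under stated assumptions, not the TCP code: `demo_seg`, `demo_queue` and `demo_sack()` are hypothetical, segments live in a small array, and sequence-number wraparound, pcounts and cloned skbs are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_seg {
    uint32_t start;  /* first sequence number covered */
    uint32_t end;    /* one past the last sequence number covered */
    int sacked;      /* nonzero once the range has been SACKed */
};

struct demo_queue {
    struct demo_seg seg[16];
    size_t n;        /* number of segments in use */
};

/* Merge seg[i + 1] into seg[i] if both are SACKed and contiguous. */
static int demo_merge_next(struct demo_queue *q, size_t i)
{
    size_t j;

    if (i + 1 >= q->n || !q->seg[i].sacked || !q->seg[i + 1].sacked ||
        q->seg[i].end != q->seg[i + 1].start)
        return 0;
    q->seg[i].end = q->seg[i + 1].end;
    for (j = i + 1; j + 1 < q->n; j++)
        q->seg[j] = q->seg[j + 1];
    q->n--;
    return 1;
}

/*
 * Mark seg[i] as SACKed. If the previous segment is SACKed too, merge
 * into it; if that filled a hole completely, also pull in the next
 * SACKed segment. Returns the index that now holds the data.
 */
static size_t demo_sack(struct demo_queue *q, size_t i)
{
    assert(i < q->n);
    q->seg[i].sacked = 1;
    if (i == 0 || !demo_merge_next(q, i - 1))
        return i;              /* nothing SACKed before us: no merge */
    demo_merge_next(q, i - 1); /* the hole is filled: try the next one too */
    return i - 1;
}
```

With four 1000-byte segments, SACKing the first and third leaves the queue unchanged, and SACKing the second then collapses all three into one segment covering 0 to 3000. Merging only into a previous segment mirrors the upward-merging limitation noted in the commit message.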
      
      TODO: handle upwards only merging. That should be rather easy
      when segment is fully sacked but I'm leaving that as future
      work item (it won't make very large difference anyway since
      this current approach already covers quite a lot of normal
      cases).
      
      I was earlier thinking of some sophisticated way of tracking
      timestamps of the first and the last segment but later on
      realized that it won't be that necessary at all to store the
      timestamp of the last segment. The cases that can occur are
      basically either:
        1) ambiguous => no sensible measurement can be taken anyway
        2) non-ambiguous is due to reordering => having the timestamp
           of the last segment there is just skewing things more off
           than does some good since the ack got triggered by one of
           the holes (besides some subtle issues that would make
           determining right hole/skb even harder problem). Anyway,
           it has nothing to do with this change then.
      
      I chose to route some abnormal-looking cases with goto noop;
      some could be handled differently (e.g., by stopping the
      walking at that skb but again). In general, they either
      shouldn't happen at all or are rare enough to make no difference
      in practice.
      
      In theory this change (as a whole) could cause some macroscale
      (global) regression because of cache misses that are taken over
      the round-trip time, but it very likely gets better because of far
      fewer (local) cache misses for other write queue walkers and for
      the big recovery-clearing cumulative ACK.
      
      It is worth noting that these benefits would also be very easy to
      get without TSO/GSO being on, as long as the data is in pages so
      that we can merge them. Currently I won't let that happen because
      DSACK splitting at a fragment would mess up pcounts due to
      sk_can_gso in tcp_set_skb_tso_segs. Once DSACK fragments are
      avoided, some conditions can be made less strict.
      
      TODO: I will probably have to convert the excessive pointer
      passing to struct sacktag_state... :-)
      
      My testing revealed that considerable amount of skbs couldn't
      be shifted because they were cloned (most likely still awaiting
      tx reclaim)...
      
      [The rest is considering future work instead since I got
      repeatably EFAULT to tcpdump's recvfrom when I added
      pskb_expand_head to deal with clones, so I separated that
      into another, later patch]
      
      ...To counter that, I gave up on the fifth advantage:
      
      5) When growing previous SACK block, less allocs for new skbs
         are done, basically a new alloc is needed only when new hole
         is detected and when the previous skb runs out of frags space
      
      ...which now only happens if reclaim is fast enough to dispose of
      the clone before the SACK block comes in (the window is one RTT
      long); otherwise we'll have to alloc some.
      
      With clones being handled I got these numbers (will be somewhat
      worse without that), taken with fine-grained mibs:
      
                        TCPSackShifted 398
                         TCPSackMerged 877
                  TCPSackShiftFallback 320
            TCPSACKCOLLAPSEFALLBACKGSO 0
        TCPSACKCOLLAPSEFALLBACKSKBBITS 0
        TCPSACKCOLLAPSEFALLBACKSKBDATA 0
          TCPSACKCOLLAPSEFALLBACKBELOW 0
          TCPSACKCOLLAPSEFALLBACKFIRST 1
       TCPSACKCOLLAPSEFALLBACKPREVBITS 318
            TCPSACKCOLLAPSEFALLBACKMSS 1
         TCPSACKCOLLAPSEFALLBACKNOHEAD 0
          TCPSACKCOLLAPSEFALLBACKSHIFT 0
                TCPSACKCOLLAPSENOOPSEQ 0
        TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
           TCPSACKCOLLAPSENOOPSMALLLEN 0
                   TCPSACKCOLLAPSEHOLE 12
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 01 Nov, 2008 1 commit
  12. 29 Oct, 2008 1 commit
  13. 08 Oct, 2008 1 commit
  14. 01 Oct, 2008 1 commit
  15. 23 Sep, 2008 4 commits
  16. 22 Sep, 2008 1 commit
  17. 11 Sep, 2008 2 commits
  18. 16 Aug, 2008 1 commit
  19. 12 Aug, 2008 1 commit
  20. 01 Aug, 2008 1 commit
  21. 30 Jul, 2008 1 commit
  22. 15 Jul, 2008 1 commit
    • vlan: Don't store VLAN tag in cb · 6aa895b0
      Committed by Patrick McHardy
      Use a real skb member to store the VLAN tag to avoid clashes with qdiscs,
      which are allowed to use the cb area themselves. As currently only real
      devices that consume the skb set the NETIF_F_HW_VLAN_TX flag, no explicit
      invalidation is necessary.
      
      The new member fills a hole on 64 bit, the skb layout changes from:
      
              __u32                      mark;                 /*   172     4 */
              sk_buff_data_t             transport_header;     /*   176     4 */
              sk_buff_data_t             network_header;       /*   180     4 */
              sk_buff_data_t             mac_header;           /*   184     4 */
              sk_buff_data_t             tail;                 /*   188     4 */
              /* --- cacheline 3 boundary (192 bytes) --- */
              sk_buff_data_t             end;                  /*   192     4 */
      
              /* XXX 4 bytes hole, try to pack */
      
      to
      
              __u32                      mark;                 /*   172     4 */
              __u16                      vlan_tci;             /*   176     2 */
      
              /* XXX 2 bytes hole, try to pack */
      
              sk_buff_data_t             transport_header;     /*   180     4 */
              sk_buff_data_t             network_header;       /*   184     4 */
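The effect of placing a 2-byte member where alignment padding already existed can be reproduced with a small userspace model. The structs below are hypothetical and much smaller than sk_buff; the leading `priority` field is only an assumption used to recreate a 4-byte hole in front of an 8-byte-aligned pointer, and the size comparison assumes an LP64 target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: seven 4-byte fields leave a 4-byte hole in front of `head`. */
struct demo_before {
    uint32_t priority;          /* hypothetical filler field */
    uint32_t mark;
    uint32_t transport_header;
    uint32_t network_header;
    uint32_t mac_header;
    uint32_t tail;
    uint32_t end;
    /* 4 bytes of padding here on LP64 */
    unsigned char *head;
};

/* After: vlan_tci sits behind mark, and the old hole absorbs the shift. */
struct demo_after {
    uint32_t priority;          /* hypothetical filler field */
    uint32_t mark;
    uint16_t vlan_tci;
    /* 2 bytes of padding here */
    uint32_t transport_header;
    uint32_t network_header;
    uint32_t mac_header;
    uint32_t tail;
    uint32_t end;
    unsigned char *head;
};

/* Sanity check: the new member directly follows `mark`. */
static void demo_check_layout(void)
{
    assert(offsetof(struct demo_after, vlan_tci) ==
           offsetof(struct demo_after, mark) + sizeof(uint32_t));
}

/* Nonzero when adding vlan_tci did not grow the structure. */
static int demo_same_size(void)
{
    return sizeof(struct demo_after) == sizeof(struct demo_before);
}
```

On a 64-bit build both structs are expected to have the same size, since the four bytes taken by `vlan_tci` and its padding come out of the former hole. On a 32-bit build the pointer needs only 4-byte alignment, so there is no hole to reuse and the struct grows.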
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 09 Jul, 2008 1 commit
  24. 20 Jun, 2008 1 commit
  25. 22 Apr, 2008 1 commit
  26. 21 Apr, 2008 1 commit
  27. 14 Apr, 2008 4 commits
  28. 03 Apr, 2008 2 commits
  29. 28 Mar, 2008 4 commits