1. 30 Jun, 2006 3 commits
    • [NET]: make skb_release_data() static · 5bba1712
      Adrian Bunk committed
      skb_release_data() no longer has any users in other files.
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add ECN support for TSO · b0da8537
      Michael Chan committed
      In the current TSO implementation, NETIF_F_TSO and ECN cannot be
      turned on together in a TCP connection.  The problem is that most
      hardware that supports TSO does not handle CWR correctly if it is set
      in the TSO packet.  Correct handling requires CWR to be set in the
      first packet only if it is set in the TSO header.
      
      This patch adds the ability to turn on NETIF_F_TSO and ECN using
      GSO if necessary to handle TSO packets with CWR set.  Hardware
      that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
      features flag.
      
      All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
      the output device does not have the NETIF_F_TSO_ECN feature set, GSO
      will split the packet up correctly with CWR only set in the first
      segment.
      
      With help from Herbert Xu <herbert@gondor.apana.org.au>.
      
      Since ECN can always be enabled with TSO, the SOCK_NO_LARGESEND sock
      flag is completely removed.
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
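The CWR rule described in this commit (CWR set in the first segment only) can be pictured with a small user-space model. This is a hedged sketch, not kernel code: `struct seg_model` and `split_with_cwr()` are hypothetical names introduced here for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one TCP segment; not a kernel structure. */
struct seg_model {
    size_t payload_len; /* payload bytes carried by this segment */
    bool cwr;           /* whether the CWR flag is set on this segment */
};

/*
 * Split total_len payload bytes into segments of at most mss bytes.
 * The CWR flag of the large packet is copied into the first segment
 * only; every later segment has it cleared.
 * Returns the number of segments written, or 0 if mss is 0 or the
 * output array is too small.
 */
size_t split_with_cwr(size_t total_len, size_t mss, bool cwr,
                      struct seg_model *out, size_t out_cap)
{
    size_t n = 0;

    if (mss == 0)
        return 0;
    if ((total_len + mss - 1) / mss > out_cap)
        return 0;

    while (total_len > 0) {
        size_t chunk = total_len < mss ? total_len : mss;

        out[n].payload_len = chunk;
        out[n].cwr = cwr && n == 0; /* first segment only */
        total_len -= chunk;
        n++;
    }
    return n;
}
```

Splitting 3000 bytes with an MSS of 1460 yields three segments, and only the first carries CWR.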
    • [NET]: Added GSO header verification · 576a30eb
      Herbert Xu committed
      When GSO packets come from an untrusted source (e.g., a Xen guest domain),
      we need to verify the header integrity before passing it to the hardware.
      
      Since the first step in GSO is to verify the header, we can reuse that
      code by adding a new bit to gso_type: SKB_GSO_DODGY.  Packets with this
      bit set can only be fed directly to devices with the corresponding bit
      NETIF_F_GSO_ROBUST.  If the device doesn't have that bit, then the skb
      is fed to the GSO engine which will allow the packet to be sent to the
      hardware if it passes the header check.
      
      This patch changes the sg flag to a full features flag.  The same method
      can be used to implement TSO ECN support.  We simply have to mark packets
      with CWR set with SKB_GSO_ECN so that only hardware with a corresponding
      NETIF_F_TSO_ECN can accept them.  The GSO engine can either fully segment
      the packet, or segment the first MTU and pass the rest to the hardware for
      further segmentation.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 Jun, 2006 4 commits
    • [NET]: fix net-core kernel-doc · f4b8ea78
      Randy Dunlap committed
      Warning(/var/linsrc/linux-2617-g4//include/linux/skbuff.h:304): No description found for parameter 'dma_cookie'
      Warning(/var/linsrc/linux-2617-g4//include/net/sock.h:1274): No description found for parameter 'copied_early'
      Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'chan'
      Warning(/var/linsrc/linux-2617-g4//net/core/dev.c:3309): No description found for parameter 'event'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Add software TSOv4 · f4c50d99
      Herbert Xu committed
      This patch adds the GSO implementation for IPv4 TCP.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Merge TSO/UFO fields in sk_buff · 7967168c
      Herbert Xu committed
      Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is not
      going to scale if we add any more segmentation methods (e.g., DCCP).  So
      let's merge them.
      
      They were used to tell the protocol of a packet.  This function has been
      subsumed by the new gso_type field.  This is essentially a set of netdev
      feature bits (shifted by 16 bits) that are required to process a specific
      skb.  As such it's easy to tell whether a given device can process a GSO
      skb: you just have to AND the gso_type field with the netdev's features
      field.
      
      I've made gso_type a conjunction.  The idea is that you have a base type
      (e.g., SKB_GSO_TCPV4) that can be modified further to support new features.
      For example, if we add a hardware TSO type that supports ECN, they would
      declare NETIF_F_TSO | NETIF_F_TSO_ECN.  All TSO packets with CWR set would
      have a gso_type of SKB_GSO_TCPV4 | SKB_GSO_TCPV4_ECN while all other TSO
      packets would be SKB_GSO_TCPV4.  This means that only the CWR packets need
      to be emulated in software.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
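The feature check described in this commit (gso_type as feature bits shifted by 16, ANDed against the device features) can be sketched as below. The `MODEL_*` constants and `dev_can_segment()` are hypothetical stand-ins chosen for illustration; the bit positions are assumptions and are not taken from the kernel headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: feature bit = gso_type bit shifted left by 16. */
#define MODEL_GSO_SHIFT      16
#define MODEL_GSO_TCPV4      (1u << 0)  /* base type */
#define MODEL_GSO_TCPV4_ECN  (1u << 1)  /* modifier: packet has CWR set */
#define MODEL_F_TSO          (MODEL_GSO_TCPV4 << MODEL_GSO_SHIFT)
#define MODEL_F_TSO_ECN      (MODEL_GSO_TCPV4_ECN << MODEL_GSO_SHIFT)

/*
 * A device can take the packet directly only if every bit required by
 * gso_type is present in its feature mask. Otherwise the packet has to
 * be segmented in software first.
 */
bool dev_can_segment(uint32_t features, uint32_t gso_type)
{
    uint32_t required = gso_type << MODEL_GSO_SHIFT;

    return (features & required) == required;
}
```

With this model, a device advertising only `MODEL_F_TSO` accepts plain `MODEL_GSO_TCPV4` packets but not ones that also carry `MODEL_GSO_TCPV4_ECN`, which matches the commit's point that only CWR packets need software emulation.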
    • [NET]: Avoid allocating skb in skb_pad · 5b057c6b
      Herbert Xu committed
      First of all it is unnecessary to allocate a new skb in skb_pad since
      the existing one is not shared.  More importantly, our hard_start_xmit
      interface does not allow a new skb to be allocated since that breaks
      requeueing.
      
      This patch uses pskb_expand_head to expand the existing skb and linearize
      it if needed.  Actually, someone should sift through every instance of
      skb_pad on a non-linear skb as they do not fit the reasons why this was
      originally created.
      
      Incidentally, this fixes a minor bug when the skb is cloned (tcpdump,
      TCP, etc.).  As it is skb_pad will simply write over a cloned skb.  Because
      of the position of the write it is unlikely to cause problems but still
      it's best if we don't do it.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Jun, 2006 4 commits
  4. 26 Apr, 2006 1 commit
  5. 20 Apr, 2006 1 commit
    • [NET]: Add skb->truesize assertion checking. · dc6de336
      David S. Miller committed
      Add some sanity checking.  truesize should be at least sizeof(struct
      sk_buff) plus the current packet length.  If not, then truesize is
      seriously mangled and deserves a kernel log message.
      
      Currently we'll do the check for release of stream socket buffers.
      
      But we can add checks to more spots over time.
      
      Incorporating ideas from Herbert Xu.
      Signed-off-by: David S. Miller <davem@davemloft.net>
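The invariant this commit checks (truesize at least the size of the buffer structure plus the current packet length) can be sketched in user space as follows. `struct buf_model` and `truesize_is_sane()` are hypothetical names; the real check operates on `struct sk_buff` and logs a kernel message rather than returning a boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket buffer; not the kernel layout. */
struct buf_model {
    size_t truesize; /* total memory accounted to this buffer */
    size_t len;      /* current packet length in bytes */
    char meta[64];   /* placeholder for the remaining metadata */
};

/* True when truesize covers the structure itself plus the packet data. */
bool truesize_is_sane(const struct buf_model *b)
{
    return b->truesize >= sizeof(struct buf_model) + b->len;
}
```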
  6. 31 Mar, 2006 1 commit
  7. 21 Mar, 2006 4 commits
    • [NET]: Replace skb_pull/skb_postpull_rcsum with skb_pull_rcsum · cbb042f9
      Herbert Xu committed
      We're now starting to have quite a number of places that do skb_pull
      followed immediately by an skb_postpull_rcsum.  We can merge these two
      operations into one function with skb_pull_rcsum.  This makes sense
      since most pull operations on receive skb's need to update the
      checksum.
      
      I've decided to make this out-of-line since it is fairly big and the
      fast path where hardware checksums are enabled needs to call
      csum_partial anyway.
      
      Since this is a brand new function we get to add an extra check on the
      len argument.  As it is most callers of skb_pull ignore its return
      value which essentially means that there is no check on the len
      argument.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
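The combined "pull and fix up the receive checksum" operation, including the extra length check mentioned above, might look roughly like this in a user-space model. All names here (`struct rx_buf_model`, `csum_partial_model()`, `pull_rcsum()`) are hypothetical, the checksum is a plain 16-bit ones'-complement sum over big-endian words, and the sketch assumes even pull lengths and always updates the checksum, which is simpler than what the kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical receive buffer: data pointer, length, running checksum. */
struct rx_buf_model {
    const uint8_t *data;
    size_t len;
    uint32_t csum; /* unfolded ones'-complement sum of data[0..len) */
};

/* Ones'-complement sum of len bytes, added to an initial sum. */
uint32_t csum_partial_model(const uint8_t *p, size_t len, uint32_t sum)
{
    uint64_t acc = sum;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)
        acc += (uint32_t)p[i] << 8; /* pad a trailing odd byte with zero */
    while (acc >> 32)
        acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Fold a 32-bit sum down to 16 bits with end-around carry. */
uint16_t csum_fold_model(uint32_t s)
{
    while (s >> 16)
        s = (s & 0xffffu) + (s >> 16);
    return (uint16_t)s;
}

/* Ones'-complement subtraction: a - b == a + ~b with end-around carry. */
uint32_t csum_sub_model(uint32_t a, uint32_t b)
{
    uint64_t acc = (uint64_t)a + (uint32_t)~b;

    while (acc >> 32)
        acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/*
 * Remove n bytes from the front of the buffer and subtract their
 * contribution from the checksum. Returns the new data pointer, or
 * NULL (leaving the buffer untouched) if n exceeds the buffer length.
 */
const uint8_t *pull_rcsum(struct rx_buf_model *b, size_t n)
{
    if (n > b->len)
        return NULL;
    b->csum = csum_sub_model(b->csum, csum_partial_model(b->data, n, 0));
    b->data += n;
    b->len -= n;
    return b->data;
}
```

Returning NULL on an oversized pull gives callers a result they cannot silently ignore in the way the commit says `skb_pull` return values usually are.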
    • [NET]: Uninline kfree_skb and allow NULL argument · 231d06ae
      Jörn Engel committed
      o Uninline kfree_skb, which saves some 15k of object code on my notebook.
      
      o Allow kfree_skb to be called with a NULL argument.
      
        Subsequent patches can remove conditional from drivers and further
        reduce source and object size.
      Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
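The NULL-tolerant release described here can be sketched with a small reference-counted buffer. This is an illustrative model only: `struct refbuf_model`, `refbuf_alloc()` and `refbuf_put()` are hypothetical names, and the counter is a plain `int` rather than the atomic counter a kernel implementation would need.

```c
#include <stdlib.h>

/* Hypothetical reference-counted buffer; not the kernel's sk_buff. */
struct refbuf_model {
    int users; /* reference count (non-atomic in this sketch) */
};

/* Allocate a buffer holding one reference, or return NULL on failure. */
struct refbuf_model *refbuf_alloc(void)
{
    struct refbuf_model *b = calloc(1, sizeof(*b));

    if (b != NULL)
        b->users = 1;
    return b;
}

/*
 * Drop one reference. A NULL argument is accepted and ignored, so
 * callers do not need their own "if (b)" guard.
 * Returns 1 if the buffer was freed, 0 otherwise.
 */
int refbuf_put(struct refbuf_model *b)
{
    if (b == NULL)
        return 0;
    if (--b->users > 0)
        return 0;
    free(b);
    return 1;
}
```

Because the NULL check lives inside the release function, an error path can call `refbuf_put(b)` unconditionally, which is the driver-side cleanup the commit anticipates.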
    • [NETFILTER]: Fix skb->nf_bridge lifetime issues · a193a4ab
      Patrick McHardy committed
      The bridge netfilter code simulates the NF_IP_PRE_ROUTING hook and skips
      the real hook by registering with high priority and returning NF_STOP if
      skb->nf_bridge is present and the BRNF_NF_BRIDGE_PREROUTING flag is not
      set. The flag is only set during the simulated hook.
      
      Because skb->nf_bridge is only freed when the packet is destroyed, the
      packet will not only skip the first invocation of NF_IP_PRE_ROUTING, but
      in the case of tunnel devices on top of the bridge also all further ones.
      Forwarded packets from a bridge encapsulated by a tunnel device and sent
      as locally outgoing packet will also still have the incorrect bridge
      information from the input path attached.
      
      We already have nf_reset calls on all RX/TX paths of tunnel devices,
      so simply reset the nf_bridge field there too. As an added bonus,
      the bridge information for locally delivered packets is now also freed
      when the packet is queued to a socket.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Reduce size of struct sk_buff on 64 bit architectures · 77d2ca35
      Patrick McHardy committed
      Move skb->nfmark next to skb->tc_index to remove a 4 byte hole between
      skb->nfmark and skb->nfct and another one between skb->users and skb->head
      when CONFIG_NETFILTER, CONFIG_NET_SCHED and CONFIG_NET_CLS_ACT are enabled.
      For all other combinations the size stays the same.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 17 Jan, 2006 1 commit
  9. 08 Jan, 2006 1 commit
    • [NETFILTER]: Fix xfrm lookup in ip_route_me_harder/ip6_route_me_harder · 3e3850e9
      Patrick McHardy committed
      ip_route_me_harder doesn't use the port numbers of the xfrm lookup and
      uses ip_route_input for non-local addresses which doesn't do a xfrm
      lookup, ip6_route_me_harder doesn't do a xfrm lookup at all.
      
      Use xfrm_decode_session and do the lookup manually, make sure both
      only do the lookup if the packet hasn't been transformed already.
      
      Making sure the lookup only happens once needs a new field in the
      IP6CB, which exceeds the size of skb->cb. The size of skb->cb is
      increased to 48b. Apparently the IPv6 mobile extensions need some
      more room anyway.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 04 Jan, 2006 3 commits
    • [NET]: Speed up __alloc_skb() · 4947d3ef
      Benjamin LaHaise committed
      From: Benjamin LaHaise <bcrl@kvack.org>
      
      In __alloc_skb(), the use of skb_shinfo() which casts a u8 * to the 
      shared info structure results in gcc being forced to do a reload of the 
      pointer since it has no information on possible aliasing.  Fix this by 
      using a pointer to refer to skb_shared_info.
      
      By initializing skb_shared_info sequentially, the write combining buffers 
      can reduce the number of memory transactions to a single write.  Reorder 
      the initialization in __alloc_skb() to match the structure definition.  
      There is also an alignment issue on 64 bit systems with skb_shared_info;
      by converting nr_frags to a short, everything packs up nicely.
      
      Also, pass the slab cache pointer according to the fclone flag instead 
      of using two almost identical function calls.
      
      This raises bw_unix performance up to a peak of 707KB/s when combined 
      with the spinlock patch.  It should help other networking protocols, too.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Small cleanup to socket initialization · 77d76ea3
      Andi Kleen committed
      sock_init can be done as a core_initcall instead of calling
      it directly in init/main.c
      
      Also I removed an out of date #ifdef.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IP]: Simplify and consolidate MSG_PEEK error handling · 3305b80c
      Herbert Xu committed
      When a packet is obtained from skb_recv_datagram with MSG_PEEK enabled
      it is left on the socket receive queue.  This means that when we detect
      a checksum error we have to be careful when trying to free the packet
      as someone could have dequeued it in the time being.
      
      Currently this delicate logic is duplicated three times between UDPv4,
      UDPv6 and RAWv6.  This patch moves them into one place and simplifies
      the code somewhat.
      
      This is based on a suggestion by Eric Dumazet.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 21 Nov, 2005 2 commits
  12. 11 Nov, 2005 1 commit
    • [NET]: Detect hardware rx checksum faults correctly · fb286bb2
      Herbert Xu committed
      Here is the patch that introduces the generic skb_checksum_complete
      which also checks for hardware RX checksum faults.  If that happens,
      it'll call netdev_rx_csum_fault which currently prints out a stack
      trace with the device name.  In future it can turn off RX checksum.
      
      I've converted every spot under net/ that does RX checksum checks to
      use skb_checksum_complete or __skb_checksum_complete with the
      exceptions of:
      
      * Those places where checksums are done bit by bit.  These will call
      netdev_rx_csum_fault directly.
      
      * The following have not been completely checked/converted:
      
      ipmr
      ip_vs
      netfilter
      dccp
      
      This patch is based on patches and suggestions from Stephen Hemminger
      and David S. Miller.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 10 Nov, 2005 1 commit
    • [NETFILTER]: Add nf_conntrack subsystem. · 9fb9cbb1
      Yasuyuki Kozakai committed
      The existing connection tracking subsystem in netfilter can only
      handle ipv4.  There were basically two choices present to add
      connection tracking support for ipv6.  We could either duplicate all
      of the ipv4 connection tracking code into an ipv6 counterpart, or (the
      choice taken by these patches) we could design a generic layer that
      could handle both ipv4 and ipv6 and thus requiring only one sub-protocol
      (TCP, UDP, etc.) connection tracking helper module to be written.
      
      In fact nf_conntrack is capable of working with any layer 3
      protocol.
      
      The existing ipv4 specific conntrack code could also not deal
      with the peculiarities of doing connection tracking on ipv6,
      which is also cured here.  For example, these issues include:
      
      1) ICMPv6 handling, which is used for neighbour discovery in
         ipv6 thus some messages such as these should not participate
         in connection tracking since effectively they are like ARP
         messages
      
      2) fragmentation must be handled differently in ipv6, because
         the simplistic "defrag, connection track and NAT, refrag"
         (which the existing ipv4 connection tracking does) approach simply
         isn't feasible in ipv6
      
      3) ipv6 extension header parsing must occur at the correct spots
         before and after connection tracking decisions, and there were
         no provisions for this in the existing connection tracking
         design
      
      4) ipv6 has no need for stateful NAT
      
      The ipv4 specific conntrack layer is kept around, until all of
      the ipv4 specific conntrack helpers are ported over to nf_conntrack
      and it is feature complete.  Once that occurs, the old conntrack
      stuff will get placed into the feature-removal-schedule and we will
      fully kill it off 6 months later.
      Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
      Signed-off-by: Harald Welte <laforge@netfilter.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  14. 06 Nov, 2005 1 commit
  15. 29 Oct, 2005 1 commit
    • [IPv4/IPv6]: UFO Scatter-gather approach · e89e9cf5
      Ananda Raju committed
      Attached is kernel patch for UDP Fragmentation Offload (UFO) feature.
      
      1. This patch incorporates the review comments by Jeff Garzik.
      2. Renamed USO as UFO (UDP Fragmentation Offload)
      3. udp sendfile support with UFO
      
      This patch uses the scatter-gather feature of skb to generate large UDP
      datagrams. Below is a "how-to" on changes required in network device
      driver to use the UFO interface.
      
      UDP Fragmentation Offload (UFO) Interface:
      -------------------------------------------
      UFO is a feature wherein the Linux kernel network stack will offload the
      IP fragmentation functionality of large UDP datagram to hardware. This
      will reduce the overhead of stack in fragmenting the large UDP datagram to
      MTU sized packets
      
      1) Drivers indicate their capability of UFO using
      dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG
      
      NETIF_F_HW_CSUM is required for UFO over ipv6.
      
      2) UFO packet will be submitted for transmission using driver xmit routine.
      UFO packet will have a non-zero value for
      
      "skb_shinfo(skb)->ufo_size"
      
      skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
      fragment going out of the adapter after IP fragmentation by hardware.
      
      skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
      contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
      indicating that hardware has to do checksum calculation. Hardware should
      compute the UDP checksum of complete datagram and also ip header checksum of
      each fragmented IP packet.
      
      For IPV6 the UFO provides the fragment identification-id in
      skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
      IPv6 fragments.
      Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
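The driver-side contract in this how-to (advertise three feature bits, treat a non-zero ufo_size as a request for hardware fragmentation) can be sketched as a user-space model. The `MODEL_F_*` bit values, `struct ufo_pkt_model` and both functions are hypothetical; the fragment count ignores header placement and only shows the ceiling division implied by "length of data part in each IP fragment".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, chosen for this sketch only. */
#define MODEL_F_SG       (1u << 0)
#define MODEL_F_HW_CSUM  (1u << 1)
#define MODEL_F_UFO      (1u << 2)

/* Hypothetical view of a UFO transmit request. */
struct ufo_pkt_model {
    size_t data_len; /* bytes of data to be fragmented */
    size_t ufo_size; /* data bytes per fragment; 0 means "not a UFO packet" */
};

/* A device qualifies only if it sets all three bits named in the how-to. */
bool dev_advertises_ufo(uint32_t features)
{
    uint32_t need = MODEL_F_UFO | MODEL_F_HW_CSUM | MODEL_F_SG;

    return (features & need) == need;
}

/* Number of IP fragments the hardware is expected to emit. */
size_t ufo_fragment_count(const struct ufo_pkt_model *p)
{
    if (p->ufo_size == 0)
        return 1; /* ordinary packet, sent as a single unit */
    return (p->data_len + p->ufo_size - 1) / p->ufo_size;
}
```

For example, 8000 bytes of data with a ufo_size of 1480 would leave the adapter as six fragments under this model.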
  16. 26 Oct, 2005 1 commit
  17. 09 Oct, 2005 1 commit
  18. 04 Oct, 2005 1 commit
    • [NET]: Fix packet timestamping. · 325ed823
      Herbert Xu committed
      I've found the problem in general.  It affects any 64-bit
      architecture.  The problem occurs when you change the system time.
      
      Suppose that when you boot your system clock is forward by a day.
      This gets recorded down in skb_tv_base.  You then wind the clock back
      by a day.  From that point onwards the offset will be negative which
      essentially overflows the 32-bit variables they're stored in.
      
      In fact, why don't we just store the real time stamp in those 32-bit
      variables? After all, we're not going to overflow for quite a while
      yet.
      
      When we do overflow, we'll need a better solution of course.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
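One way to picture the failure described above is a model where a timestamp is kept as a 32-bit offset from a recorded base and later rebuilt with 64-bit arithmetic. The function names below are hypothetical and the real kernel code differs in detail; the sketch only shows why a negative offset breaks on 64-bit systems and why storing the absolute value avoids it.

```c
#include <stdint.h>

/* Old scheme (modelled): keep only the offset from a base, in 32 bits. */
uint32_t store_offset(int64_t now_sec, int64_t base_sec)
{
    /* A negative difference wraps around to a huge unsigned value. */
    return (uint32_t)(now_sec - base_sec);
}

/* Rebuilding with 64-bit arithmetic does not undo that wrap-around. */
int64_t restore_offset_64(int64_t base_sec, uint32_t stored)
{
    return base_sec + (int64_t)stored;
}

/* New scheme (modelled): store the real timestamp in the 32-bit field. */
uint32_t store_absolute(int64_t now_sec)
{
    return (uint32_t)now_sec;
}

int64_t restore_absolute(uint32_t stored)
{
    return (int64_t)stored;
}
```

With a base one day ahead of the current time, `store_offset()` yields 4294880896 (2^32 minus 86400), so the restored value is off by 2^32 seconds, while the absolute scheme round-trips correctly for any time that still fits in 32 bits.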
  19. 09 Sep, 2005 1 commit
  20. 07 Sep, 2005 1 commit
  21. 30 Aug, 2005 6 commits