1. 19 2月, 2009 1 次提交
  2. 16 2月, 2009 2 次提交
    • P
      d24fff22
    • P
      net: infrastructure for hardware time stamping · ac45f602
      Patrick Ohly 提交于
      The additional per-packet information (16 bytes for time stamps, 1
      byte for flags) is stored for all packets in the skb_shared_info
      struct. This implementation detail is hidden from users of that
      information via skb_* accessor functions. A separate struct resp.
      union is used for the additional information so that it can be
      stored/copied easily outside of skb_shared_info.
      
      Compared to previous implementations (reusing the tstamp field
      depending on the context, optional additional structures) this
      is the simplest solution. It does not extend sk_buff itself.
      
      TX time stamping is implemented in software if the device driver
      doesn't support hardware time stamping.
      
      The new semantic for hardware/software time stamping around
      ndo_start_xmit() is based on two assumptions about existing
      network device drivers which don't support hardware time
      stamping and know nothing about it:
       - they leave the new skb_shared_tx unmodified
       - the keep the connection to the originating socket in skb->sk
         alive, i.e., don't call skb_orphan()
      
      Given that skb_shared_tx is new, the first assumption is safe.
      The second is only true for some drivers. As a result, software
      TX time stamping currently works with the bnx2 driver, but not
      with the unmodified igb driver (the two drivers this patch series
      was tested with).
      Signed-off-by: NPatrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac45f602
  3. 09 2月, 2009 2 次提交
  4. 07 2月, 2009 1 次提交
  5. 06 2月, 2009 1 次提交
    • H
      gro: Fix frag_list merging on imprecisely split packets · 56035022
      Herbert Xu 提交于
      The previous fix ad0f9904 (gro:
      Fix handling of imprecisely split packets) only fixed the case
      of frags merging, frag_list merging in the same circumstances
      were still broken.
      
      In particular, the packet headers end up in the data stream.
      
      This patch fixes this plus another issue where an imprecisely
      split packet header may be read incorrectly (this is mostly
      harmless since it'll simply cause the packet to not match and
      be rejected for GRO).
      
      Thanks to Emil Tantilov and Jeff Kirsher for helping to track
      this down.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56035022
  6. 05 2月, 2009 1 次提交
    • H
      net: Partially allow skb destructors to be used on receive path · 9a279bcb
      Herbert Xu 提交于
      As it currently stands, skb destructors are forbidden on the
      receive path because the protocol end-points will overwrite
      any existing destructor with their own.
      
      This is the reason why we have to call skb_orphan in the loopback
      driver before we reinject the packet back into the stack, thus
      creating a period during which loopback traffic isn't charged
      to any socket.
      
      With virtualisation, we have a similar problem in that traffic
      is reinjected into the stack without being associated with any
      socket entity, thus providing no natural congestion push-back
      for those poor folks still stuck with UDP.
      
      Now had we been consistent in telling them that UDP simply has
      no congestion feedback, I could just fob them off.  Unfortunately,
      we appear to have gone to some length in catering for this on
      the standard UDP path, with skb/socket accounting so that has
      created a very unhealthy dependency.
      
      Alas habits are difficult to break out of, so we may just have
      to allow skb destructors on the receive path.
      
      It turns out that making skb destructors useable on the receive path
      isn't as easy as it seems.  For instance, simply adding skb_orphan
      to skb_set_owner_r isn't enough.  This is because we assume all
      over the IP stack that skb->sk is an IP socket if present.
      
      The new transparent proxy code goes one step further and assumes
      that skb->sk is the receiving socket if present.
      
      Now all of this can be dealt with by adding simple checks such
      as only treating skb->sk as an IP socket if skb->sk->sk_family
      matches.  However, it turns out that for bridging at least we
      don't need to do all of this work.
      
      This is of interest because most virtualisation setups use bridging
      so we don't actually go through the IP stack on the host (with
      the exception of our old nemesis the bridge netfilter, but that's
      easily taken care of).
      
      So this patch simply adds skb_orphan to the point just before we
      enter the IP stack, but after we've gone through the bridge on the
      receive path.  It also adds an skb_orphan to the one place in
      netfilter that touches skb->sk/skb->destructor, that is, tproxy.
      
      One word of caution, because of the internal code structure, anyone
      wishing to deploy this must use skb_set_owner_w as opposed to
      skb_set_owner_r since many functions that create a new skb from
      an existing one will invoke skb_set_owner_w on the new skb.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a279bcb
  7. 01 2月, 2009 1 次提交
    • H
      gro: Fix handling of imprecisely split packets · ad0f9904
      Herbert Xu 提交于
      The commit 89a1b249edcf9be884e71f92df84d48355c576aa (gro: Avoid
      copying headers of unmerged packets) only worked for packets
      which are either completely linear, completely non-linear, or
      packets which exactly split at the boundary between headers and
      payload.
      
      Anything else would cause bits in the header to go missing if
      the packet is held by GRO.
      
      This may have broken drivers such as ixgbe.
      
      This patch fixes the places that assumed or only worked with
      the above cases.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad0f9904
  8. 30 1月, 2009 3 次提交
  9. 28 1月, 2009 3 次提交
  10. 21 1月, 2009 1 次提交
  11. 20 1月, 2009 1 次提交
  12. 15 1月, 2009 3 次提交
  13. 11 1月, 2009 1 次提交
  14. 07 1月, 2009 5 次提交
  15. 05 1月, 2009 2 次提交
    • H
      gro: Add page frag support · 5d38a079
      Herbert Xu 提交于
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d38a079
    • H
      gro: Use gso_size to store MSS · b530256d
      Herbert Xu 提交于
      In order to allow GRO packets without frag_list at all, we need to
      store the MSS in the packet itself.  The obvious place is gso_size.
      The only thing to watch out for is if the packet ends up not being
      GRO then we need to clear gso_size before pushing the packet into
      the stack.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b530256d
  16. 30 12月, 2008 1 次提交
    • E
      netns: foreach_netdev_safe is insufficient in default_device_exit · 8eb79863
      Eric W. Biederman 提交于
      During network namespace teardown we either move or delete
      all of the network devices associated with a network namespace.
      In the case of veth devices deleting one will also delete it's
      pair device.  If both devices are in the same network namespace
      then for_each_netdev_safe is insufficient as next may point
      to the second veth device we have deleted.
      
      To avoid problems I do what we do in __rtnl_kill_links and
      restart the scan of the device list, after we have deleted
      a device.
      
      Currently dev_change_netnamespace does not appear to suffer from
      this problem, but wireless devices are also paired and likely
      should be moved between network namespaces together.  So I have
      errored on the side of caution and restart the scan of the network
      devices in that case as well.
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8eb79863
  17. 27 12月, 2008 1 次提交
  18. 26 12月, 2008 1 次提交
  19. 23 12月, 2008 1 次提交
  20. 18 12月, 2008 2 次提交
  21. 16 12月, 2008 2 次提交
    • H
      net: Add Generic Receive Offload infrastructure · d565b0a1
      Herbert Xu 提交于
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      For drivers that intend to use this, they can set the NETIF_F_GRO bit and
      call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
      The latter will call napi_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush in softirq context if the driver uses the primitives
      __napi_complete/__napi_rx_complete.
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  For each packet that may
      match, they also have a flush field which is non-zero if the held packet
      must not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., FIN packet) where
      the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly liked list just like LRO.
      The list is limited to a maximum of 8 entries.  In future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d565b0a1
    • H
      net: Add frag_list support to GSO · 1a881f27
      Herbert Xu 提交于
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and aren't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a881f27
  22. 08 12月, 2008 1 次提交
    • W
      netdevice: Kill netdev->priv · b74ca3a8
      Wang Chen 提交于
      This is the last shoot of this series.
      After I removing all directly reference of netdev->priv, I am killing
      "priv" of "struct net_device" and fixing relative comments/docs.
      
      Anyone will not be allowed to reference netdev->priv directly.
      If you want to reference the memory of private data, use netdev_priv()
      instead.
      If the private data is not allocted when alloc_netdev(), use
      netdev->ml_priv to point that memory after you creating that private
      data.
      Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b74ca3a8
  23. 21 11月, 2008 1 次提交
  24. 20 11月, 2008 2 次提交
    • S
      netdev: introduce dev_get_stats() · eeda3fd6
      Stephen Hemminger 提交于
      In order for the network device ops get_stats call to be immutable, the handling
      of the default internal network device stats block has to be changed. Add a new
      helper function which replaces the old use of internal_get_stats.
      
      Note: change return code to make it clear that the caller should not
      go changing the returned statistics.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeda3fd6
    • S
      netdev: network device operations infrastructure · d314774c
      Stephen Hemminger 提交于
      This patch changes the network device internal API to move adminstrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatablity between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virt function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transistion is completed the nag message can be changed to
      an WARN_ON, and the compatiablity code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d314774c