  1. 05 Mar 2009, 1 commit
  2. 03 Mar 2009, 1 commit
    • netns: Remove net_alive · 17edde52
      Committed by Eric W. Biederman
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Mar 2009, 1 commit
    • netpoll: Add drop checks to all entry points · 4ead4431
      Committed by Herbert Xu
      The netpoll entry checks are required to ensure that we don't
      receive normal packets when invoked via netpoll.  Unfortunately
      it only ever worked for the netif_receive_skb/netif_rx entry
      points.  The VLAN (and subsequently GRO) entry points didn't
      have the check and can therefore trigger all sorts of weird
      problems.
      
      This patch adds the netpoll check to all entry points.
      
      I'm still uneasy with receiving at all under netpoll (which
      apparently is only used by the out-of-tree kdump code).  The
      reason is that it is perfectly legal to receive all data, including
      headers, into highmem if netpoll is off, but if you try to do
      that with netpoll on and someone gets a printk in an IRQ handler,
      you're going to get a nice BUG_ON.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
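The fix routes every receive entry point through one shared check. The userspace sketch below models only that idea. `netpoll_rx_active`, `netpoll_should_drop()` and the `model_*` entry points are hypothetical stand-ins, not the kernel's `netpoll_rx()` code or its real entry-point signatures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag: true while the stack is being driven via netpoll. */
static bool netpoll_rx_active;

/* Count of packets handed to the normal receive path. */
static int delivered;

enum rx_result { RX_DROP, RX_SUCCESS };

/* The one check that every entry point must perform first. */
static bool netpoll_should_drop(void)
{
	return netpoll_rx_active;
}

static enum rx_result deliver(void)
{
	delivered++;
	return RX_SUCCESS;
}

/* Each entry point starts with the same check.  Before the fix, only the
 * netif_rx/netif_receive_skb paths had it; the VLAN and GRO paths went
 * straight to delivery. */
enum rx_result model_netif_rx(void)
{
	if (netpoll_should_drop())
		return RX_DROP;
	return deliver();
}

enum rx_result model_netif_receive_skb(void)
{
	if (netpoll_should_drop())
		return RX_DROP;
	return deliver();
}

enum rx_result model_vlan_hwaccel_rx(void)
{
	if (netpoll_should_drop())
		return RX_DROP;
	return deliver();
}

enum rx_result model_napi_gro_receive(void)
{
	if (netpoll_should_drop())
		return RX_DROP;
	return deliver();
}
```

Keeping the check in one helper means a future entry point only has to call it once to get the same behaviour.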
  4. 07 Feb 2009, 1 commit
  5. 21 Jan 2009, 1 commit
  6. 20 Jan 2009, 1 commit
  7. 15 Jan 2009, 3 commits
  8. 11 Jan 2009, 1 commit
  9. 07 Jan 2009, 5 commits
  10. 05 Jan 2009, 2 commits
    • gro: Add page frag support · 5d38a079
      Committed by Herbert Xu
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags, to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
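The following is a minimal model of the merge rule described above. It assumes hypothetical `struct model_skb`/`struct frag` types and a deliberately small `MODEL_MAX_FRAGS` capacity in place of the kernel's `MAX_SKB_FRAGS`. New frags are appended after the last existing entry with no coalescing. When the capacity would be exceeded, the whole skb is chained on `frag_list` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, deliberately small capacity (the kernel's is larger). */
#define MODEL_MAX_FRAGS 4

struct frag { int page; int offset; int size; };

struct model_skb {
	struct frag frags[MODEL_MAX_FRAGS];
	int nr_frags;
	struct model_skb *frag_list;   /* chained skbs: the fallback path */
	struct model_skb *next;        /* link within a frag_list chain */
	int len;
};

/* Merge @skb into @head.  Returns 1 if its page frags were appended to
 * head->frags, 0 if capacity forced the frag_list fallback. */
int model_gro_merge_frags(struct model_skb *head, struct model_skb *skb)
{
	if (head->nr_frags + skb->nr_frags > MODEL_MAX_FRAGS) {
		/* Too many entries: chain the whole skb on frag_list. */
		struct model_skb **pp = &head->frag_list;

		while (*pp)
			pp = &(*pp)->next;
		skb->next = NULL;
		*pp = skb;
		head->len += skb->len;
		return 0;
	}
	/* Store new entries after the last existing one; no attempt is made
	 * to merge the first new frag with the last existing frag. */
	for (int i = 0; i < skb->nr_frags; i++)
		head->frags[head->nr_frags + i] = skb->frags[i];
	head->nr_frags += skb->nr_frags;
	head->len += skb->len;
	skb->nr_frags = 0;   /* the frags are now owned by @head */
	return 1;
}
```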
    • gro: Use gso_size to store MSS · b530256d
      Committed by Herbert Xu
      In order to allow GRO packets without frag_list at all, we need to
      store the MSS in the packet itself.  The obvious place is gso_size.
      The only thing to watch out for is if the packet ends up not being
      GRO then we need to clear gso_size before pushing the packet into
      the stack.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
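Here is a sketch of the gso_size rule under simplified assumptions. `struct model_pkt` and the `model_gro_*` helpers are hypothetical, and `segs` stands in for whatever per-packet merge count GRO keeps. The MSS is stored when a packet is first held and cleared again if nothing was ever merged into it.

```c
#include <assert.h>

struct model_pkt {
	unsigned int len;         /* payload bytes */
	unsigned short gso_size;  /* MSS, valid only for a real GRO packet */
	int segs;                 /* packets merged into this one */
};

/* First packet of a flow: remember its MSS in gso_size. */
void model_gro_hold(struct model_pkt *p, unsigned short mss)
{
	p->gso_size = mss;
	p->segs = 1;
}

void model_gro_merge(struct model_pkt *head, const struct model_pkt *p)
{
	head->len += p->len;
	head->segs++;
}

/* Before pushing the packet into the stack: if nothing was merged, it
 * is not a GRO packet after all, so gso_size must be cleared. */
void model_gro_complete(struct model_pkt *p)
{
	if (p->segs == 1)
		p->gso_size = 0;
}
```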
  11. 30 Dec 2008, 1 commit
    • netns: foreach_netdev_safe is insufficient in default_device_exit · 8eb79863
      Committed by Eric W. Biederman
      During network namespace teardown we either move or delete
      all of the network devices associated with a network namespace.
      In the case of veth devices, deleting one will also delete its
      pair device.  If both devices are in the same network namespace
      then for_each_netdev_safe is insufficient as next may point
      to the second veth device we have deleted.
      
      To avoid problems I do what we do in __rtnl_kill_links and
      restart the scan of the device list, after we have deleted
      a device.
      
      Currently dev_change_netnamespace does not appear to suffer from
      this problem, but wireless devices are also paired and likely
      should be moved between network namespaces together.  So I have
      erred on the side of caution and restart the scan of the network
      devices in that case as well.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
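The restart-the-scan approach can be modelled on a plain linked list. This is a hypothetical userspace sketch: `struct model_dev`, `delete_dev()`, `move_dev()` and `model_default_device_exit()` are not kernel APIs. Deleting a virtual device also frees its peer, so the loop re-reads the list head after every removal instead of trusting a saved next pointer. Physical devices are moved rather than deleted, as in the namespace cleanup path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_dev {
	int id;
	bool is_virtual;          /* e.g. one end of a veth pair */
	struct model_dev *peer;   /* paired device, deleted together */
	struct model_dev *next;
};

/* Remove @d from the list and free it. */
static void unlink_and_free(struct model_dev **head, struct model_dev *d)
{
	for (struct model_dev **pp = head; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;
			free(d);
			return;
		}
	}
}

/* Deleting a virtual device also deletes its peer, which may be exactly
 * the node a saved "next" pointer refers to. */
static void delete_dev(struct model_dev **head, struct model_dev *d)
{
	struct model_dev *peer = d->peer;

	unlink_and_free(head, d);
	if (peer)
		unlink_and_free(head, peer);
}

/* Move @d (the current list head) onto the initial namespace's list. */
static void move_dev(struct model_dev **head, struct model_dev **moved,
		     struct model_dev *d)
{
	*head = d->next;
	d->next = *moved;
	*moved = d;
}

void model_default_device_exit(struct model_dev **head,
			       struct model_dev **moved)
{
	/* Equivalent to restarting the scan after every change: always take
	 * the current head rather than a next pointer saved before the
	 * deletion, because that saved pointer may now be freed memory. */
	while (*head) {
		struct model_dev *d = *head;

		if (d->is_virtual)
			delete_dev(head, d);
		else
			move_dev(head, moved, d);
	}
}
```

With a list `veth0 -> veth1 -> eth0`, a cached-next iterator would step from `veth0` to the already-freed `veth1`. The loop above never does.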
  12. 27 Dec 2008, 1 commit
  13. 26 Dec 2008, 1 commit
  14. 23 Dec 2008, 1 commit
  15. 18 Dec 2008, 2 commits
  16. 16 Dec 2008, 2 commits
    • net: Add Generic Receive Offload infrastructure · d565b0a1
      Committed by Herbert Xu
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      Drivers that intend to use this can set the NETIF_F_GRO bit and
      call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
      The latter will call napi_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush in softirq context if the driver uses the primitives
      __napi_complete/__napi_rx_complete.
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  Each packet that may
      match also has a flush field which is non-zero if the held packet
      must not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., a FIN packet), in
      which case the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly linked list just like LRO.
      The list is limited to a maximum of 8 entries.  In future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
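The contract above (hold, match, merge, flush, at most 8 held packets) can be sketched as a small userspace model. Everything here is hypothetical:

- Packets match on a single `flow` integer in place of a protocol's `gro_receive` logic.
- The returned flag plays the role of `NAPI_GRO_CB(skb)->same_flow`.
- The per-held-packet `flush` check for matches that cannot be merged is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_GRO_SKBS 8   /* the list limit mentioned above */

struct model_skb {
	int flow;    /* stand-in for the protocol's flow comparison */
	int len;     /* payload bytes */
	int count;   /* original packets merged into this one */
	bool fin;    /* e.g. TCP FIN: no further merges possible */
};

struct model_napi {
	struct model_skb held[MODEL_MAX_GRO_SKBS];
	int gro_count;
	int delivered;       /* skbs passed up the stack */
	int delivered_segs;  /* original packets those skbs represent */
};

static void deliver(struct model_napi *n, const struct model_skb *skb)
{
	n->delivered++;
	n->delivered_segs += skb->count;
}

static void remove_held(struct model_napi *n, int i)
{
	for (; i < n->gro_count - 1; i++)
		n->held[i] = n->held[i + 1];
	n->gro_count--;
}

/* Returns true ("same_flow") if @skb was merged into a held packet. */
bool model_napi_gro_receive(struct model_napi *n, struct model_skb skb)
{
	/* Pointless to hold a FIN for future merging ("flush"). */
	bool flush = skb.fin;

	skb.count = 1;
	for (int i = 0; i < n->gro_count; i++) {
		struct model_skb *h = &n->held[i];

		if (h->flow != skb.flow)
			continue;
		h->len += skb.len;   /* merge into the held packet */
		h->count++;
		if (skb.fin) {       /* no further merges: push it up now */
			deliver(n, h);
			remove_held(n, i);
		}
		return true;
	}

	if (flush || n->gro_count >= MODEL_MAX_GRO_SKBS) {
		deliver(n, &skb);    /* normal, unmerged path */
		return false;
	}
	n->held[n->gro_count++] = skb;   /* hold for future merges */
	return false;
}

/* Push up everything still held (the napi_complete() step). */
void model_napi_gro_flush(struct model_napi *n)
{
	for (int i = 0; i < n->gro_count; i++)
		deliver(n, &n->held[i]);
	n->gro_count = 0;
}
```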
    • net: Add frag_list support to GSO · 1a881f27
      Committed by Herbert Xu
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and isn't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
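When the device cannot accept a frag_list packet, GSO has to cut the merged payload back into MSS-sized segments in software. Below is a minimal sketch of that arithmetic. It uses hypothetical names (`model_software_gso()`, `model_xmit()`, `dev_handles_frag_list`) and plain lengths in place of real skbs.

```c
#include <assert.h>
#include <stdbool.h>

/* Cut a merged payload of @len bytes into segments of at most @gso_size
 * bytes (the stored MSS), writing segment lengths into @out.  Returns the
 * number of segments, or -1 on bad input or if @out is too small. */
int model_software_gso(unsigned int len, unsigned int gso_size,
		       unsigned int *out, int max_out)
{
	int n = 0;

	if (gso_size == 0)
		return -1;
	while (len > 0) {
		unsigned int seg = len < gso_size ? len : gso_size;

		if (n == max_out)
			return -1;
		out[n++] = seg;
		len -= seg;
	}
	return n;
}

/* Devices that can take frag_list packets directly (e.g. some virtual
 * NICs) get the merged packet untouched; all others get software GSO. */
int model_xmit(bool dev_handles_frag_list, unsigned int len,
	       unsigned int gso_size, unsigned int *out, int max_out)
{
	if (dev_handles_frag_list) {
		if (max_out < 1)
			return -1;
		out[0] = len;
		return 1;
	}
	return model_software_gso(len, gso_size, out, max_out);
}
```

For example, a 4000-byte merged payload with a 1448-byte MSS becomes segments of 1448, 1448 and 1104 bytes on the software path. It stays a single 4000-byte packet for a device that handles frag_list itself.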
  17. 08 Dec 2008, 1 commit
    • netdevice: Kill netdev->priv · b74ca3a8
      Committed by Wang Chen
      This is the last patch of this series.
      Having removed all direct references to netdev->priv, I am now killing
      the "priv" member of "struct net_device" and fixing the related
      comments/docs.

      No one is allowed to reference netdev->priv directly any more.
      If you want to reference the memory of the private data, use
      netdev_priv() instead.
      If the private data is not allocated by alloc_netdev(), use
      netdev->ml_priv to point to that memory after you have created the
      private data.
      Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
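The rule boils down to this: the private area lives in the same allocation as the device and is reached only through an accessor. Below is a userspace sketch, assuming a hypothetical `struct model_net_device` and a 32-byte alignment constant. The kernel's own `alloc_netdev()`/`netdev_priv()` differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical alignment for the start of the private area. */
#define MODEL_NETDEV_ALIGN 32

struct model_net_device {
	char name[16];
	void *ml_priv;   /* points at private data allocated separately */
};

/* Offset of the private area: the struct size rounded up to the alignment. */
static size_t priv_offset(void)
{
	return (sizeof(struct model_net_device) + MODEL_NETDEV_ALIGN - 1) &
	       ~(size_t)(MODEL_NETDEV_ALIGN - 1);
}

/* Allocate the device and its private area in one zeroed block. */
struct model_net_device *model_alloc_netdev(size_t sizeof_priv)
{
	return calloc(1, priv_offset() + sizeof_priv);
}

/* The only sanctioned way to reach the co-allocated private area. */
void *model_netdev_priv(const struct model_net_device *dev)
{
	return (char *)dev + priv_offset();
}
```

Private data created later, outside the device allocation, is reached through `ml_priv` instead of being squeezed into the co-allocated area.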
  18. 21 Nov 2008, 1 commit
  19. 20 Nov 2008, 2 commits
    • netdev: introduce dev_get_stats() · eeda3fd6
      Committed by Stephen Hemminger
      In order for the network device ops get_stats call to be immutable, the handling
      of the default internal network device stats block has to be changed. Add a new
      helper function which replaces the old use of internal_get_stats.
      
      Note: the return type is changed to make it clear that the caller
      should not modify the returned statistics.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
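The helper's fallback logic can be sketched as follows. `struct model_dev`, `struct model_dev_ops` and the sample driver are hypothetical. The point is the dispatch: use the driver's `get_stats` hook when present, otherwise return the device's internal stats block. In this sketch the return type is const-qualified so that callers cannot modify the result.

```c
#include <assert.h>
#include <stddef.h>

struct model_stats { unsigned long rx_packets, tx_packets; };

struct model_dev;

struct model_dev_ops {
	/* Optional hook for drivers that keep their own counters. */
	struct model_stats *(*get_stats)(struct model_dev *dev);
};

struct model_dev {
	const struct model_dev_ops *ops;   /* immutable, shareable table */
	struct model_stats stats;          /* default internal stats block */
};

const struct model_stats *model_dev_get_stats(struct model_dev *dev)
{
	if (dev->ops && dev->ops->get_stats)
		return dev->ops->get_stats(dev);
	return &dev->stats;
}

/* Hypothetical sample driver that keeps its counters elsewhere. */
static struct model_stats driver_private_stats = { 10, 20 };

static struct model_stats *driver_get_stats(struct model_dev *dev)
{
	(void)dev;
	return &driver_private_stats;
}

static const struct model_dev_ops driver_ops = { .get_stats = driver_get_stats };
static const struct model_dev_ops no_hook_ops = { .get_stats = NULL };
```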
    • netdev: network device operations infrastructure · d314774c
      Committed by Stephen Hemminger
      This patch changes the network device internal API to move administrative
      operations out of the network device structure and into a separate structure.

      This patch involves some hackery to maintain compatibility between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virtual function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.

      After the transition is completed, the nag message can be changed to
      a WARN_ON, and the compatibility code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
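The compatibility scheme can be illustrated with a toy model. The names (`struct model_net_device_ops`, `internal_ops`, `model_register_netdev()`) are hypothetical, and only two operations are shown.

- **Converted drivers** supply a shared, immutable ops table. That table is mirrored into the legacy per-device pointers for old callers.
- **Unconverted drivers** get an ops table synthesized from their legacy pointers at registration. This half is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct model_dev;

struct model_net_device_ops {
	int (*ndo_open)(struct model_dev *dev);
	int (*ndo_stop)(struct model_dev *dev);
};

struct model_dev {
	const struct model_net_device_ops *netdev_ops;
	/* Legacy per-device pointers, kept for compatibility. */
	int (*open)(struct model_dev *dev);
	int (*stop)(struct model_dev *dev);
	/* Storage for an ops table built from an unconverted driver. */
	struct model_net_device_ops internal_ops;
	int is_up;
};

void model_register_netdev(struct model_dev *dev)
{
	if (!dev->netdev_ops) {
		/* Unconverted driver: build an ops table from legacy pointers. */
		dev->internal_ops.ndo_open = dev->open;
		dev->internal_ops.ndo_stop = dev->stop;
		dev->netdev_ops = &dev->internal_ops;
	} else {
		/* Converted driver: copy ops out for old callers. */
		dev->open = dev->netdev_ops->ndo_open;
		dev->stop = dev->netdev_ops->ndo_stop;
	}
}

/* New-style callers always dispatch through netdev_ops. */
int model_dev_open(struct model_dev *dev)
{
	const struct model_net_device_ops *ops = dev->netdev_ops;

	return ops->ndo_open ? ops->ndo_open(dev) : 0;
}

/* Hypothetical sample driver. */
static int sample_open(struct model_dev *dev) { dev->is_up = 1; return 0; }
static int sample_stop(struct model_dev *dev) { dev->is_up = 0; return 0; }

static const struct model_net_device_ops sample_ops = {
	.ndo_open = sample_open,
	.ndo_stop = sample_stop,
};
```

A single const table can be shared by every device a driver creates. That is one practical benefit of moving the pointers out of `net_device`.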
  20. 17 Nov 2008, 1 commit
  21. 14 Nov 2008, 1 commit
  22. 08 Nov 2008, 2 commits
  23. 06 Nov 2008, 3 commits
    • net: Don't leak packets when a netns is going down · 0a36b345
      Committed by Eric W. Biederman
      For a while I have been tracking a case where, when a network
      namespace exits, the cleanup gets stuck in an endless stream of:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      
      It turns out that if you listen on a multicast address, an unsubscribe
      packet is sent when the network device goes down.  If you shut down
      the network namespace without carefully cleaning up, this can cause
      the unsubscribe packet to be sent over the loopback interface while
      the network namespace is going down.

      All of which is fine, except that when we drop the packet we forget to
      free it, leaking the skb and the dst entry attached to it.  As it
      turns out, the dst entry holds a reference to the idev, which holds
      the dev and keeps everything from being cleaned up.  Yuck!

      Fixing my earlier thinko by adding the needed kfree_skb makes
      everything clean up beautifully.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
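The leak is a reference-count chain. The skb pins its dst, the dst pins the idev, and the idev pins the device, so one forgotten free keeps the device's usage count above its baseline forever. The sketch below uses a generic, hypothetical `struct model_obj`, not the kernel's dst/idev refcounting API.

```c
#include <assert.h>
#include <stddef.h>

/* Each object pins the one it points to: skb -> dst -> idev -> dev. */
struct model_obj {
	int refcnt;
	struct model_obj *holds;
};

void model_hold(struct model_obj *o)
{
	o->refcnt++;
}

/* Drop one reference; when an object's count reaches zero it releases
 * the reference it held on the next object in the chain.  Objects are
 * not freed in this sketch; only the counts are tracked. */
void model_put(struct model_obj *o)
{
	while (o && --o->refcnt == 0)
		o = o->holds;
}

/* Drop path while the namespace is going down. */
void model_drop_packet(struct model_obj *skb, int free_it)
{
	if (free_it)
		model_put(skb);   /* the fix: free the skb on the drop path */
	/* Otherwise the skb, its dst, the idev and the dev all stay pinned. */
}
```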
    • net: Guaranetee the proper ordering of the loopback device. · ae33bc40
      Committed by Eric W. Biederman
      I was recently hunting a bug that occurred in network namespace
      cleanup.  In looking at the code it became apparent that we have,
      and will continue to have, cases where, if anything is going on in
      a network namespace, code assumes that the loopback device is
      present.  Things like sending igmp unsubscribe messages when we
      bring down network devices invoke the routing code, which assumes
      that at least the loopback driver is present.

      Therefore, to avoid magic initcall-ordering hackery that is hard
      to follow and hard to get right, insert a call to register the
      loopback device directly from net_dev_init().  This guarantees
      that the loopback device is the first device registered and
      the last network device to go away.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netns: Delete virtual interfaces during namespace cleanup · d0c082ce
      Committed by Eric W. Biederman
      When physical devices are inside a network namespace and that
      network namespace terminates, we cannot make them go away.  We
      have to keep them, and moving them to the initial network namespace
      is the best we can do.

      For virtual devices left in a network namespace that is exiting,
      we have no need to preserve them, and we now have the infrastructure
      that allows us to delete them.  So delete virtual devices when we
      exit a network namespace.  This makes the necessary user-space
      cleanup after a network namespace exits much more tractable.
      Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 05 Nov 2008, 1 commit
    • net: fix packet socket delivery in rx irq handler · 9b22ea56
      Committed by Patrick McHardy
      The changes to deliver hardware accelerated VLAN packets to packet
      sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
      The __vlan_hwaccel_rx() function is called directly from the driver's
      RX function; for non-NAPI drivers that means it's still in RX IRQ
      context:
      
      [   27.779463] ------------[ cut here ]------------
      [   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
      ...
      [   27.782520]  [<c0264755>] netif_nit_deliver+0x5b/0x75
      [   27.782590]  [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
      [   27.782664]  [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
      [   27.782738]  [<c0155b17>] handle_IRQ_event+0x23/0x51
      [   27.782808]  [<c015692e>] handle_edge_irq+0xc2/0x102
      [   27.782878]  [<c0105fd5>] do_IRQ+0x4d/0x64
      
      Split hardware accelerated VLAN reception into two parts to fix this:
      
      - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
        device lookup, then calls netif_receive_skb()/netif_rx()
      
      - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
        in softirq context, performs the real reception and delivery to
        packet sockets.
      Reported-and-tested-by: Ramon Casellas <ramon.casellas@cttc.es>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
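The split can be modelled as a producer/consumer pair. In this hypothetical sketch, `struct model_backlog` and both functions are illustrative, not the kernel's backlog queue. The IRQ-context half only records the tag and the VLAN ID and queues the packet. Delivery to packet sockets happens later, in the softirq-context half.

```c
#include <assert.h>

#define MODEL_BACKLOG 16

struct model_skb {
	unsigned short vlan_tci;
	int vlan_id;   /* stands in for the VLAN device lookup result */
};

struct model_backlog {
	struct model_skb q[MODEL_BACKLOG];
	int head, tail;              /* ring buffer indices */
	int delivered_to_sockets;
};

/* IRQ part: store the TCI, do the VLAN lookup, queue the packet (as
 * netif_rx() would for a non-NAPI driver).  Nothing is delivered to
 * packet sockets here.  Returns 0, or -1 if the queue is full. */
int model_vlan_hwaccel_rx(struct model_backlog *b, unsigned short tci)
{
	int next = (b->tail + 1) % MODEL_BACKLOG;

	if (next == b->head)
		return -1;                          /* backlog full: drop */
	b->q[b->tail].vlan_tci = tci;
	b->q[b->tail].vlan_id = tci & 0x0fff;   /* VLAN ID: low 12 bits */
	b->tail = next;
	return 0;
}

/* Softirq part: the real reception, including packet-socket delivery. */
void model_softirq_receive(struct model_backlog *b)
{
	while (b->head != b->tail) {
		b->delivered_to_sockets++;
		b->head = (b->head + 1) % MODEL_BACKLOG;
	}
}
```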
  25. 04 Nov 2008, 1 commit
  26. 28 Oct 2008, 1 commit
  27. 23 Oct 2008, 1 commit
    • net: Fix disjunct computation of netdev features · b63365a2
      Committed by Herbert Xu
      My change
      
          commit e2a6b852
          net: Enable TSO if supported by at least one device
      
      didn't do what was intended because the netdev_compute_features
      function was designed for conjunctions.  So what happened was that
      it would simply take the TSO status of the last constituent device.
      
      This patch extends it to support both conjunctions and disjunctions
      under the new name of netdev_increment_features.
      
      It also adds a new function netdev_fix_features which does the
      sanity checking that usually occurs upon registration.  This ensures
      that the computation doesn't result in an illegal combination
      since this checking is absent when the change is initiated via
      ethtool.
      
      The two users of netdev_compute_features have been converted.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
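Below is a simplified model of the conjunction/disjunction rule and the sanity fix-up. It assumes three hypothetical feature bits (not the kernel's `NETIF_F_*` values) and treats only TSO as a "one for all" feature. Because the OR-ed bits accumulate across all devices, the result no longer depends on which device happens to be last. The fix-up then drops TSO when SG did not survive the conjunction, following the idea that TSO requires SG.

```c
#include <assert.h>

/* Hypothetical feature bits. */
#define F_CSUM 0x1u
#define F_SG   0x2u
#define F_TSO  0x4u

/* Features worth enabling if at least one device has them. */
#define F_ONE_FOR_ALL F_TSO

/* Fold device @one into the running result @all: ONE_FOR_ALL bits are
 * ORed (limited by @mask), all other bits are ANDed. */
unsigned int model_increment_features(unsigned int all, unsigned int one,
				      unsigned int mask)
{
	unsigned int any = (all | one) & mask & F_ONE_FOR_ALL;

	return (all & one & ~F_ONE_FOR_ALL) | any;
}

/* Remove illegal combinations that registration would normally reject. */
unsigned int model_fix_features(unsigned int f)
{
	if ((f & F_SG) && !(f & F_CSUM))
		f &= ~F_SG;    /* SG needs checksum offload */
	if ((f & F_TSO) && !(f & F_SG))
		f &= ~F_TSO;   /* TSO needs SG */
	return f;
}

unsigned int model_compute_features(const unsigned int *devs, int n,
				    unsigned int mask)
{
	/* AND-ed bits start set, OR-ed bits start clear. */
	unsigned int all = mask & ~F_ONE_FOR_ALL;

	for (int i = 0; i < n; i++)
		all = model_increment_features(all, devs[i], mask);
	return model_fix_features(all);
}
```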