1. 26 May, 2009 1 commit
    • net: txq_trans_update() helper · 08baf561
      Committed by Eric Dumazet
      We would like to get rid of the netdev->trans_start = jiffies; assignment
      that nearly all net drivers have to use in their start_xmit() function, and
      use txq->trans_start instead.
      
      This can be done generically in the core network code, as suggested by David.
      
      Some devices (particularly loopback) don't need a trans_start update, because
      they don't have a transmit watchdog. We could add a new device flag, or rely
      on the fact that txq->trans_start can be updated if txq->xmit_lock_owner is
      different from -1. Use a helper function to hide our choice.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
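The helper described in this commit can be sketched in userspace as follows. This is a minimal model, not the kernel source: `jiffies` is a plain variable here, and `struct netdev_queue` is reduced to the two fields the message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel tick counter (assumption). */
static unsigned long jiffies;

/* Reduced model of the per-queue state named in the commit message. */
struct netdev_queue {
	int xmit_lock_owner;        /* CPU holding the tx lock, or -1 if unlocked */
	unsigned long trans_start;  /* time of the last transmit, in jiffies */
};

/*
 * Record the transmit time only when the queue's xmit lock is held.
 * Devices that never take the lock (the message cites loopback) leave
 * xmit_lock_owner at -1, so their trans_start is never written.
 */
static inline void txq_trans_update(struct netdev_queue *txq)
{
	assert(txq != NULL);
	if (txq->xmit_lock_owner != -1)
		txq->trans_start = jiffies;
}
```

Hiding the test behind one helper means the condition can later be swapped for a device flag without touching every caller.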
  2. 25 May, 2009 1 commit
  3. 19 May, 2009 1 commit
    • net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25
      Committed by Eric Dumazet
      offsetof(struct net_device, features)=0x44
      offsetof(struct net_device, stats.tx_packets)=0x54
      offsetof(struct net_device, stats.tx_bytes)=0x5c
      offsetof(struct net_device, stats.tx_dropped)=0x6c
      
      Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
      tx path can slow down SMP operations, since they dirty a cache line
      that should stay shared (dev->features is needed in both rx and tx paths).
      
      We could move the stats field elsewhere in net_device, but that won't help much
      (two cache lines dirtied in the tx path, when we can dirty only one).
      
      A better solution is to add tx_packets/tx_bytes/tx_dropped to struct
      netdev_queue, because this structure is already touched in the tx path and
      counter updates will then be free (no increase in size).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
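To see why the offsets quoted in this commit matter, the short sketch below maps a byte offset to its cache line. The 64-byte line size is an assumption (the message does not state one), and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size; the commit message does not state one. */
#define CACHE_LINE_BYTES 64u

static_assert((CACHE_LINE_BYTES & (CACHE_LINE_BYTES - 1)) == 0,
	      "line size must be a power of two");

/* Index of the cache line that contains a given structure offset. */
static inline size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_BYTES;
}

/* Non-zero when two offsets land in the same cache line. */
static inline int same_cache_line(size_t a, size_t b)
{
	return cache_line_of(a) == cache_line_of(b);
}
```

Under that assumption, `same_cache_line(0x44, 0x54)` is true: features and stats.tx_packets share a line, so every tx counter update invalidates the line that rx paths read features from.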
  4. 18 May, 2009 1 commit
    • net: tx scalability works : trans_start · 9d21493b
      Committed by Eric Dumazet
      The struct net_device trans_start field is a hot spot on SMP and high-performance
      devices, particularly multiqueue ones, because every transmitter dirties
      it. Its main uses are the tx watchdog and bonding alive checks.
      
      But as most devices don't use NETIF_F_LLTX, we have to lock
      a netdev_queue before calling their ndo_start_xmit(). So it makes
      sense to move trans_start from net_device to netdev_queue. Its update
      will then occur on an already present (and exclusively held) cache line, for
      free.
      
      We can do this transition smoothly. An old driver continues to
      update dev->trans_start, while an updated one updates txq->trans_start.
      
      Further patches could also put tx_bytes/tx_packets counters in
      netdev_queue to avoid dirtying dev->stats (the vlan device comes to mind).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 07 May, 2009 1 commit
  6. 06 May, 2009 1 commit
    • net: introduce a list of device addresses dev_addr_list (v6) · f001fde5
      Committed by Jiri Pirko
      v5 -> v6 (current):
      -removed so far unused static functions
      -corrected dev_addr_del_multiple to call del instead of add
      
      v4 -> v5:
      -added device address type (suggested by davem)
      -removed refcounting (better to have simpler code than to save potentially a
       few bytes)
      
      v3 -> v4:
      -changed kzalloc to kmalloc in __hw_addr_add_ii()
      -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()
      
      v2 -> v3:
      -removed unnecessary rcu read locking
      -moved dev_addr_flush() calling to ensure no null dereference of dev_addr
      
      v1 -> v2:
      -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
      -removed unnecessary rcu_read locking in dev_addr_init
      -use compare_ether_addr_64bits instead of compare_ether_addr
      -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
      -use call_rcu instead of synchronize_rcu
      -moved is_etherdev_addr into __KERNEL__ ifdef
      
      This patch introduces a new list in struct net_device and brings a set of
      functions to work with the device address list. The list is a replacement
      for the original dev_addr field, because in some situations a net device needs
      to carry several device addresses. To be backward compatible, dev_addr is made
      to point to the first member of the list, so existing drivers see no
      difference.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
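The backward-compatibility trick described in this commit (a legacy dev_addr pointer aimed at the first list member) can be illustrated with a small userspace model. All names below (`struct hw_addr_model`, `struct dev_model`, `model_addr_add`, `model_addr_flush`) are hypothetical and do not reproduce the kernel's signatures; the list is a plain singly linked list without RCU.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ADDR_LEN 6  /* Ethernet-sized address, for illustration */

struct hw_addr_model {
	unsigned char addr[ADDR_LEN];
	struct hw_addr_model *next;
};

struct dev_model {
	struct hw_addr_model *addr_list;  /* every address of the device */
	unsigned char *dev_addr;          /* legacy view: first address only */
};

/* Append an address; returns 0 on success, -1 if allocation fails. */
static int model_addr_add(struct dev_model *dev, const unsigned char *addr)
{
	struct hw_addr_model *ha;
	struct hw_addr_model **pp;

	assert(dev != NULL && addr != NULL);
	ha = malloc(sizeof(*ha));
	if (!ha)
		return -1;
	memcpy(ha->addr, addr, ADDR_LEN);
	ha->next = NULL;
	for (pp = &dev->addr_list; *pp; pp = &(*pp)->next)
		;
	*pp = ha;
	/* Keep the legacy pointer aimed at the first list member. */
	dev->dev_addr = dev->addr_list->addr;
	return 0;
}

/* Free every entry and clear the legacy pointer so it cannot dangle. */
static void model_addr_flush(struct dev_model *dev)
{
	struct hw_addr_model *ha = dev->addr_list;

	while (ha) {
		struct hw_addr_model *next = ha->next;
		free(ha);
		ha = next;
	}
	dev->addr_list = NULL;
	dev->dev_addr = NULL;
}
```

Code that only reads `dev_addr` keeps working unchanged, while newer code can walk `addr_list` to reach the additional addresses.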
  7. 28 April, 2009 2 commits
  8. 27 April, 2009 4 commits
  9. 21 April, 2009 1 commit
  10. 16 April, 2009 1 commit
  11. 28 March, 2009 1 commit
    • net: Add missing include into include/linux/netdevice.h · cc0be322
      Committed by Dmitri Vorobiev
      The inline function skb_gro_mac_header defined in include/linux/netdevice.h
      makes use of page_address(). Depending on configuration options, the latter
      is either defined as a macro or is declared as a function in another header
      file, namely include/linux/mm.h. However, include/linux/netdevice.h does not
      include include/linux/mm.h.
      
      On MIPS, this has produced the following build error:
      
        CC      kernel/sysctl_check.o
      In file included from include/linux/icmpv6.h:173,
                       from include/linux/ipv6.h:208,
                       from include/net/ip_vs.h:26,
                       from kernel/sysctl_check.c:6:
      include/linux/netdevice.h: In function 'skb_gro_mac_header':
      include/linux/netdevice.h:1132: error: implicit declaration of function
      'page_address'
      include/linux/netdevice.h:1133: warning: pointer/integer type mismatch
      in conditional expression
      make[1]: *** [kernel/sysctl_check.o] Error 1
      make: *** [kernel] Error 2
      
      The patch adds the missing include and fixes the build error.
      Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 17 March, 2009 1 commit
    • GRO: Move netpoll checks to correct location · d1c76af9
      Committed by Herbert Xu
      As my netpoll fix for net doesn't really work for net-next, we
      need this update to move the checks into the right place.  As it
      stands we may pass freed skbs to netpoll_receive_skb.
      
      This patch also introduces a netpoll_rx_on function to avoid GRO
      completely if we're invoked through netpoll.  This might seem
      paranoid but as netpoll may have an external receive hook it's
      better to be safe than sorry.  I don't think we need this for
      2.6.29 though since there's nothing immediately broken by it.
      
      This patch also moves the GRO_* return values to netdevice.h since
      VLAN needs them too (I tried to avoid this originally but alas
      this seems to be the easiest way out).  This fixes a bug in VLAN
      where it continued to use the old return value 2 instead of the
      correct GRO_DROP.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 14 March, 2009 3 commits
  14. 05 March, 2009 1 commit
    • vlan: Fix vlan-in-vlan crashes. · 9d40bbda
      Committed by David S. Miller
      As analyzed by Patrick McHardy, vlan needs to reset its
      netdev_ops pointer in its ->init() function, but this
      leaves the compat method pointers stale.
      
      Add a netdev_resync_ops() and call it from the vlan code.
      
      Any other driver which changes ->netdev_ops after register_netdevice()
      will need to call this new function after doing so too.
      
      With help from Patrick McHardy.
      Tested-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
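The stale-pointer problem described in this commit can be shown with a toy model: a device caches a copy of a method pointer from its ops table, so replacing the table leaves the cached copy pointing at the old method until it is resynced. Every name here (`struct ops_model`, `struct dev_model`, `model_resync_ops`, `open_a`, `open_b`) is hypothetical; the sketch only mirrors the idea, not the kernel's netdev_resync_ops().

```c
#include <assert.h>
#include <stddef.h>

/* Toy ops table with a single method. */
struct ops_model {
	int (*open)(void);
};

struct dev_model {
	const struct ops_model *ops;  /* plays the role of ->netdev_ops */
	int (*open)(void);            /* compat copy cached from ops */
};

static int open_a(void) { return 1; }
static int open_b(void) { return 2; }

static const struct ops_model ops_a = { .open = open_a };
static const struct ops_model ops_b = { .open = open_b };

/* Re-copy the cached method pointers from the current ops table. */
static void model_resync_ops(struct dev_model *dev)
{
	assert(dev != NULL && dev->ops != NULL);
	dev->open = dev->ops->open;
}
```

After switching `dev->ops` from `&ops_a` to `&ops_b`, the cached `dev->open` still calls `open_a` until `model_resync_ops()` runs, which is the situation the commit says any driver changing its ops after registration must handle.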
  15. 15 February, 2009 1 commit
  16. 09 February, 2009 2 commits
  17. 06 February, 2009 1 commit
  18. 30 January, 2009 2 commits
  19. 22 January, 2009 1 commit
  20. 15 January, 2009 1 commit
  21. 13 January, 2009 1 commit
  22. 07 January, 2009 2 commits
  23. 05 January, 2009 1 commit
    • gro: Add page frag support · 5d38a079
      Committed by Herbert Xu
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
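The merge rule in this commit (append new frag entries after the last existing one, and fall back to frag_list when the array would overflow) can be modelled roughly as below. The types, the `MAX_FRAGS` value and the function name are assumptions for illustration; a real skb carries far more state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed per-packet frag capacity, chosen only for illustration. */
#define MAX_FRAGS 18

struct frag_model {
	unsigned int offset;
	unsigned int size;
};

struct pkt_model {
	struct frag_model frags[MAX_FRAGS];
	unsigned int nr_frags;
	unsigned int len;
};

enum merge_result { MERGED_INTO_FRAGS, FALL_BACK_TO_FRAG_LIST };

/*
 * Append the new packet's frags after the held packet's last entry.
 * No attempt is made to coalesce adjacent entries, matching the
 * message's note that this check is skipped.
 */
static enum merge_result merge_frags(struct pkt_model *held,
				     const struct pkt_model *new_pkt)
{
	assert(held != NULL && new_pkt != NULL);
	if (held->nr_frags + new_pkt->nr_frags > MAX_FRAGS)
		return FALL_BACK_TO_FRAG_LIST;
	memcpy(&held->frags[held->nr_frags], new_pkt->frags,
	       new_pkt->nr_frags * sizeof(struct frag_model));
	held->nr_frags += new_pkt->nr_frags;
	held->len += new_pkt->len;
	return MERGED_INTO_FRAGS;
}
```

When the function reports `FALL_BACK_TO_FRAG_LIST` the held packet is left untouched, so the caller can chain the new packet through frag_list instead.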
  24. 23 December, 2008 1 commit
  25. 16 December, 2008 2 commits
    • net: Add Generic Receive Offload infrastructure · d565b0a1
      Committed by Herbert Xu
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      Drivers that intend to use this can set the NETIF_F_GRO bit and
      call napi_gro_receive instead of netif_receive_skb, or just call netif_rx.
      The latter will call napi_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush in softirq context if the driver uses the primitives
      __napi_complete/__napi_rx_complete.
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  Each packet that may
      match also has a flush field which is non-zero if the held packet
      must not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., FIN packet) where
      the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly linked list just like LRO.
      The list is limited to a maximum of 8 entries.  In future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
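As a rough illustration of the held-packet list described in this commit, the model below flags held packets with same_flow, merges into a match, and otherwise holds the new packet while the list has fewer than 8 entries. It is a deliberately simplified sketch: the flow key, the result enum and every function name are hypothetical, the flush field is left out, and "merging" is reduced to adding lengths.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 8  /* the commit message limits the list to 8 entries */

struct held_pkt {
	unsigned int flow_id;  /* hypothetical flow key */
	unsigned int len;
	int same_flow;         /* non-zero: matches the packet being received */
	struct held_pkt *next;
};

struct gro_model {
	struct held_pkt *list;  /* singly linked list of held packets */
	unsigned int count;
};

enum gro_model_result { GRO_MODEL_MERGED, GRO_MODEL_HELD, GRO_MODEL_NORMAL };

static enum gro_model_result model_gro_receive(struct gro_model *g,
					       struct held_pkt *pkt)
{
	struct held_pkt *p;

	assert(g != NULL && pkt != NULL);

	/* Pass 1: flag every held packet that belongs to the new packet's flow. */
	for (p = g->list; p; p = p->next)
		p->same_flow = (p->flow_id == pkt->flow_id);

	/* Pass 2: merge into the first flagged entry, if any. */
	for (p = g->list; p; p = p->next) {
		if (p->same_flow) {
			p->len += pkt->len;
			return GRO_MODEL_MERGED;
		}
	}

	/* No match: hold the packet if there is room, else deliver it unmerged. */
	if (g->count >= MAX_HELD)
		return GRO_MODEL_NORMAL;
	pkt->next = g->list;
	g->list = pkt;
	g->count++;
	return GRO_MODEL_HELD;
}
```

The fixed bound keeps the two linear passes cheap, which is the trade-off the message hints at when it mentions a possible hash table for holding more flows.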
    • net: Add frag_list support to GSO · 1a881f27
      Committed by Herbert Xu
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and isn't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 10 December, 2008 1 commit
    • netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
      Committed by Neil Horman
      A few months back a race was discussed between the netpoll napi service
      path and the fast path through net_rx_action:
      http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
      
      A patch was submitted for that bug, but I think we missed a case.
      
      Consider the following scenario:
      
      INITIAL STATE
      CPU0 has one napi_struct A on its poll_list
      CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
      napi_struct A that CPU0 has on its list
      
      
      
      CPU0						CPU1
      net_rx_action					poll_napi
      !list_empty (returns true)			locks poll_lock for A
      						 poll_one_napi
      						  napi->poll
      						   netif_rx_complete
      						    __napi_complete
      						    (removes A from poll_list)
      list_entry(list->next)
      
      
      In the above scenario, net_rx_action assumes that the per-cpu poll_list is
      exclusive to that cpu.  netpoll of course violates that, and because the netpoll
      path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
      list at the top of the while loop in net_rx_action, but have it become empty by
      the time it calls list_entry.  Since the poll_list isn't surrounded by any other
      structure, the returned data from that list_entry call in this situation is
      garbage, and any number of crashes can result based on what exactly that garbage
      is.
      
      Given that it's not feasible for performance reasons to place exclusive locks
      around each cpu's poll list to provide that mutual exclusion, I think the best
      solution is to modify the netpoll path in such a way that we continue to guarantee
      that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
      implemented the patch below.  It adds an additional bit to the state field in
      the napi_struct.  When executing napi->poll from the netpoll path, this bit will
      be set. When a driver calls netif_rx_complete, if that bit is set, it will not
      remove the napi_struct from the poll_list.  That work will be saved for the next
      iteration of net_rx_action.
      
      I've tested this and it seems to work well.  About the biggest drawback I can
      see to it is the fact that it might result in an extra loop through
      net_rx_action in the event that the device is actually contended for (i.e. the
      netpoll path actually performs all the needed work on the device, and the call
      to net_rx_action winds up doing nothing except removing the napi_struct from
      the poll_list).  However I think this is probably a small price to pay, given
      that the alternative is a crash.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
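The fix described in this commit can be sketched as a single-threaded model: the netpoll path sets a state bit around the poll callback, and the completion path refuses to dequeue while that bit is set. The bit values, structure and function names below are hypothetical stand-ins, and the model ignores the atomic bit operations the real code would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for the model. */
#define STATE_SCHED 0x1u  /* instance is scheduled for polling */
#define STATE_NPSVC 0x2u  /* netpoll is currently servicing this instance */

struct napi_model {
	unsigned int state;
	bool on_poll_list;  /* stands in for membership in the per-cpu list */
};

/* Completion path: skip the dequeue while netpoll is servicing. */
static void model_complete(struct napi_model *n)
{
	assert(n != NULL);
	if (n->state & STATE_NPSVC)
		return;  /* removal is left to the next net_rx_action pass */
	n->on_poll_list = false;
	n->state &= ~STATE_SCHED;
}

/* A driver poll routine that finishes its work and completes. */
static void model_driver_poll(struct napi_model *n)
{
	model_complete(n);
}

/* Netpoll path: set the bit around the poll callback. */
static void model_netpoll_poll(struct napi_model *n,
			       void (*poll)(struct napi_model *))
{
	assert(n != NULL && poll != NULL);
	n->state |= STATE_NPSVC;
	poll(n);
	n->state &= ~STATE_NPSVC;
}
```

Polling through `model_netpoll_poll()` leaves the instance on the list, so only the owning CPU's later pass removes it; that extra pass is the cost the message accepts in exchange for keeping the list exclusive to one CPU.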
  27. 08 December, 2008 1 commit
    • netdevice: Kill netdev->priv · b74ca3a8
      Committed by Wang Chen
      This is the last patch of this series.
      After removing all direct references to netdev->priv, I am killing
      "priv" of "struct net_device" and fixing the related comments/docs.
      
      No one is allowed to reference netdev->priv directly any more.
      If you want to reference the private data memory, use netdev_priv()
      instead.
      If the private data is not allocated by alloc_netdev(), use
      netdev->ml_priv to point to that memory after you create the private
      data.
      Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
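The accessor pattern this commit mandates can be illustrated with a userspace model in which the private area is allocated in the same block as the device structure and reached only through a helper. The structure, the 32-byte alignment and the names `model_alloc` and `model_priv` are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment of the private area within the allocation. */
#define PRIV_ALIGN 32u
#define ALIGN_UP(x, a) (((size_t)(x) + (a) - 1) & ~((size_t)(a) - 1))

struct dev_model {
	char name[16];
	void *ml_priv;  /* for private data allocated separately, later on */
};

/* Allocate the device plus sizeof_priv bytes of zeroed private data. */
static struct dev_model *model_alloc(size_t sizeof_priv)
{
	return calloc(1, ALIGN_UP(sizeof(struct dev_model), PRIV_ALIGN) +
			 sizeof_priv);
}

/* Return the private area that follows the aligned device structure. */
static void *model_priv(struct dev_model *dev)
{
	assert(dev != NULL);
	return (char *)dev + ALIGN_UP(sizeof(struct dev_model), PRIV_ALIGN);
}
```

Because callers never store the address in a field of their own, the layout behind `model_priv()` can change without touching them; data created after allocation would hang off `ml_priv` instead.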
  28. 25 November, 2008 2 commits
  29. 21 November, 2008 1 commit