1. 16 12月, 2008 2 次提交
    • H
      net: Add Generic Receive Offload infrastructure · d565b0a1
      Herbert Xu 提交于
      This patch adds the top-level GRO (Generic Receive Offload) infrastructure.
      This is pretty similar to LRO except that this is protocol-independent.
      Instead of holding packets in an lro_mgr structure, they're now held in
      napi_struct.
      
      For drivers that intend to use this, they can set the NETIF_F_GRO bit and
      call napi_gro_receive instead of netif_receive_skb or just call netif_rx.
      The latter will call napi_receive_skb automatically.  When napi_gro_receive
      is used, the driver must either call napi_complete/napi_rx_complete, or
      call napi_gro_flush in softirq context if the driver uses the primitives
      __napi_complete/__napi_rx_complete.
      
      Protocols will set the gro_receive and gro_complete function pointers in
      order to participate in this scheme.
      
      In addition to the packet, gro_receive will get a list of currently held
      packets.  Each packet in the list has a same_flow field which is non-zero
      if it is a potential match for the new packet.  For each packet that may
      match, they also have a flush field which is non-zero if the held packet
      must not be merged with the new packet.
      
      Once gro_receive has determined that the new skb matches a held packet,
      the held packet may be processed immediately if the new skb cannot be
      merged with it.  In this case gro_receive should return the pointer to
      the existing skb in gro_list.  Otherwise the new skb should be merged into
      the existing packet and NULL should be returned, unless the new skb makes
      it impossible for any further merges to be made (e.g., FIN packet) where
      the merged skb should be returned.
      
      Whenever the skb is merged into an existing entry, the gro_receive
      function should set NAPI_GRO_CB(skb)->same_flow.  Note that if an skb
      merely matches an existing entry but can't be merged with it, then
      this shouldn't be set.
      
      If gro_receive finds it pointless to hold the new skb for future merging,
      it should set NAPI_GRO_CB(skb)->flush.
      
      Held packets will be flushed by napi_gro_flush which is called by
      napi_complete and napi_rx_complete.
      
      Currently held packets are stored in a singly liked list just like LRO.
      The list is limited to a maximum of 8 entries.  In future, this may be
      expanded to use a hash table to allow more flows to be held for merging.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d565b0a1
    • H
      net: Add frag_list support to GSO · 1a881f27
      Herbert Xu 提交于
      This patch allows GSO to handle frag_list in a limited way for the
      purposes of allowing packets merged by GRO to be refragmented on
      output.
      
      Most hardware won't (and aren't expected to) support handling GRO
      frag_list packets directly.  Therefore we will perform GSO in
      software for those cases.
      
      However, for drivers that can support it (such as virtual NICs) we
      may not have to segment the packets at all.
      
      Whether the added overhead of GRO/GSO is worthwhile for bridges
      and routers when weighed against the benefit of potentially
      increasing the MTU within the host is still an open question.
      However, for the case of host nodes this is undoubtedly a win.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a881f27
  2. 10 12月, 2008 1 次提交
    • N
      netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
      Neil Horman 提交于
      	A few months back a race was discused between the netpoll napi service
      path, and the fast path through net_rx_action:
      http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
      
      A patch was submitted for that bug, but I think we missed a case.
      
      Consider the following scenario:
      
      INITIAL STATE
      CPU0 has one napi_struct A on its poll_list
      CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
      napi_struct A that CPU0 has on its list
      
      
      
      CPU0						CPU1
      net_rx_action					poll_napi
      !list_empty (returns true)			locks poll_lock for A
      						 poll_one_napi
      						  napi->poll
      						   netif_rx_complete
      						    __napi_complete
      						    (removes A from poll_list)
      list_entry(list->next)
      
      
      In the above scenario, net_rx_action assumes that the per-cpu poll_list is
      exclusive to that cpu.  netpoll of course violates that, and because the netpoll
      path can dequeue from the poll list, its possible for CPU0 to detect a non-empty
      list at the top of the while loop in net_rx_action, but have it become empty by
      the time it calls list_entry.  Since the poll_list isn't surrounded by any other
      structure, the returned data from that list_entry call in this situation is
      garbage, and any number of crashes can result based on what exactly that garbage
      is.
      
      Given that its not fasible for performance reasons to place exclusive locks
      arround each cpus poll list to provide that mutal exclusion, I think the best
      solution is modify the netpoll path in such a way that we continue to guarantee
      that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
      implemented the patch below.  It adds an additional bit to the state field in
      the napi_struct.  When executing napi->poll from the netpoll_path, this bit will
      be set. When a driver calls netif_rx_complete, if that bit is set, it will not
      remove the napi_struct from the poll_list.  That work will be saved for the next
      iteration of net_rx_action.
      
      I've tested this and it seems to work well.  About the biggest drawback I can
      see to it is the fact that it might result in an extra loop through
      net_rx_action in the event that the device is actually contended for (i.e. the
      netpoll path actually preforms all the needed work no the device, and the call
      to net_rx_action winds up doing nothing, except removing the napi_struct from
      the poll_list.  However I think this is probably a small price to pay, given
      that the alternative is a crash.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b363e44
  3. 08 12月, 2008 1 次提交
    • W
      netdevice: Kill netdev->priv · b74ca3a8
      Wang Chen 提交于
      This is the last shoot of this series.
      After I removing all directly reference of netdev->priv, I am killing
      "priv" of "struct net_device" and fixing relative comments/docs.
      
      Anyone will not be allowed to reference netdev->priv directly.
      If you want to reference the memory of private data, use netdev_priv()
      instead.
      If the private data is not allocted when alloc_netdev(), use
      netdev->ml_priv to point that memory after you creating that private
      data.
      Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b74ca3a8
  4. 25 11月, 2008 2 次提交
  5. 21 11月, 2008 2 次提交
  6. 20 11月, 2008 2 次提交
    • S
      netdev: introduce dev_get_stats() · eeda3fd6
      Stephen Hemminger 提交于
      In order for the network device ops get_stats call to be immutable, the handling
      of the default internal network device stats block has to be changed. Add a new
      helper function which replaces the old use of internal_get_stats.
      
      Note: change return code to make it clear that the caller should not
      go changing the returned statistics.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeda3fd6
    • S
      netdev: network device operations infrastructure · d314774c
      Stephen Hemminger 提交于
      This patch changes the network device internal API to move adminstrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatablity between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virt function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transistion is completed the nag message can be changed to
      an WARN_ON, and the compatiablity code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d314774c
  7. 08 11月, 2008 2 次提交
  8. 06 11月, 2008 1 次提交
    • E
      net: Guaranetee the proper ordering of the loopback device. · ae33bc40
      Eric W. Biederman 提交于
      I was recently hunting a bug that occurred in network namespace
      cleanup.  In looking at the code it became apparrent that we have
      and will continue to have cases where if we have anything going
      on in a network namespace there will be assumptions that the
      loopback device is present.   Things like sending igmp unsubscribe
      messages when we bring down network devices invokes the routing
      code which assumes that at least the loopback driver is present.
      
      Therefore to avoid magic initcall ordering hackery that is hard
      to follow and hard to get right insert a call to register the
      loopback device directly from net_dev_init().    This guarantes
      that the loopback device is the first device registered and
      the last network device to go away.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae33bc40
  9. 04 11月, 2008 1 次提交
  10. 31 10月, 2008 1 次提交
    • R
      net: delete excess kernel-doc notation · ad1d967c
      Randy Dunlap 提交于
      Remove excess kernel-doc function parameters from networking header
      & driver files:
      
      Warning(include/net/sock.h:946): Excess function parameter or struct member 'sk' description in 'sk_filter_release'
      Warning(include/linux/netdevice.h:1545): Excess function parameter or struct member 'cpu' description in 'netif_tx_lock'
      Warning(drivers/net/wan/z85230.c:712): Excess function parameter or struct member 'regs' description in 'z8530_interrupt'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad1d967c
  11. 23 10月, 2008 1 次提交
    • H
      net: Fix disjunct computation of netdev features · b63365a2
      Herbert Xu 提交于
      My change
      
          commit e2a6b852
          net: Enable TSO if supported by at least one device
      
      didn't do what was intended because the netdev_compute_features
      function was designed for conjunctions.  So what happened was that
      it would simply take the TSO status of the last constituent device.
      
      This patch extends it to support both conjunctions and disjunctions
      under the new name of netdev_increment_features.
      
      It also adds a new function netdev_fix_features which does the
      sanity checking that usually occurs upon registration.  This ensures
      that the computation doesn't result in an illegal combination
      since this checking is absent when the change is initiated via
      ethtool.
      
      The two users of netdev_compute_features have been converted.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b63365a2
  12. 14 10月, 2008 1 次提交
  13. 09 10月, 2008 3 次提交
    • L
      dsa: add support for Trailer tagging format · 396138f0
      Lennert Buytenhek 提交于
      This adds support for the Trailer switch tagging format.  This is
      another tagging that doesn't explicitly mark tagged packets with a
      distinct ethertype, so that we need to add a similar hack in the
      receive path as for the Original DSA tagging format.
      Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
      Tested-by: NByron Bradley <byron.bbradley@gmail.com>
      Tested-by: NTim Ellis <tim.ellis@mac.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      396138f0
    • L
      dsa: add support for original DSA tagging format · cf85d08f
      Lennert Buytenhek 提交于
      Most of the DSA switches currently in the field do not support the
      Ethertype DSA tagging format that one of the previous patches added
      support for, but only the original DSA tagging format.
      
      The original DSA tagging format carries the same information as the
      Ethertype DSA tagging format, but with the difference that it does not
      have an ethertype field.  In other words, when receiving a packet that
      is tagged with an original DSA tag, there is no way of telling in
      eth_type_trans() that this packet is in fact a DSA-tagged packet.
      
      This patch adds a hook into eth_type_trans() which is only compiled in
      if support for a switch chip that doesn't support Ethertype DSA is
      selected, and which checks whether there is a DSA switch driver
      instance attached to this network device which uses the old tag format.
      If so, it sets the protocol field to ETH_P_DSA without looking at the
      packet, so that the packet ends up in the right place.
      Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
      Tested-by: NNicolas Pitre <nico@marvell.com>
      Tested-by: NPeter van Valderen <linux@ddcrew.com>
      Tested-by: NDirk Teurlings <dirk@upexia.nl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf85d08f
    • L
      net: Distributed Switch Architecture protocol support · 91da11f8
      Lennert Buytenhek 提交于
      Distributed Switch Architecture is a protocol for managing hardware
      switch chips.  It consists of a set of MII management registers and
      commands to configure the switch, and an ethernet header format to
      signal which of the ports of the switch a packet was received from
      or is intended to be sent to.
      
      The switches that this driver supports are typically embedded in
      access points and routers, and a typical setup with a DSA switch
      looks something like this:
      
      	+-----------+       +-----------+
      	|           | RGMII |           |
      	|           +-------+           +------ 1000baseT MDI ("WAN")
      	|           |       |  6-port   +------ 1000baseT MDI ("LAN1")
      	|    CPU    |       |  ethernet +------ 1000baseT MDI ("LAN2")
      	|           |MIImgmt|  switch   +------ 1000baseT MDI ("LAN3")
      	|           +-------+  w/5 PHYs +------ 1000baseT MDI ("LAN4")
      	|           |       |           |
      	+-----------+       +-----------+
      
      The switch driver presents each port on the switch as a separate
      network interface to Linux, polls the switch to maintain software
      link state of those ports, forwards MII management interface
      accesses to those network interfaces (e.g. as done by ethtool) to
      the switch, and exposes the switch's hardware statistics counters
      via the appropriate Linux kernel interfaces.
      
      This initial patch supports the MII management interface register
      layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
      supports the "Ethertype DSA" packet tagging format.
      
      (There is no officially registered ethertype for the Ethertype DSA
      packet format, so we just grab a random one.  The ethertype to use
      is programmed into the switch, and the switch driver uses the value
      of ETH_P_EDSA for this, so this define can be changed at any time in
      the future if the one we chose is allocated to another protocol or
      if Ethertype DSA gets its own officially registered ethertype, and
      everything will continue to work.)
      Signed-off-by: NLennert Buytenhek <buytenh@marvell.com>
      Tested-by: NNicolas Pitre <nico@marvell.com>
      Tested-by: NByron Bradley <byron.bbradley@gmail.com>
      Tested-by: NTim Ellis <tim.ellis@mac.com>
      Tested-by: NPeter van Valderen <linux@ddcrew.com>
      Tested-by: NDirk Teurlings <dirk@upexia.nl>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91da11f8
  14. 30 9月, 2008 1 次提交
  15. 23 9月, 2008 1 次提交
    • S
      net: network device name ifalias support · 0b815a1a
      Stephen Hemminger 提交于
      This patch add support for keeping an additional character alias
      associated with an network interface. This is useful for maintaining
      the SNMP ifAlias value which is a user defined value. Routers use this
      to hold information like which circuit or line it is connected to. It
      is just an arbitrary text label on the network device.
      
      There are two exposed interfaces with this patch, the value can be
      read/written either via netlink or sysfs.
      
      This could be maintained just by the snmp daemon, but it is more
      generally useful for other management tools, and the kernel is good
      place to act as an agreed upon interface to store it.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b815a1a
  16. 05 8月, 2008 2 次提交
    • D
      net: Kill plain NET_XMIT_BYPASS. · cc6533e9
      David S. Miller 提交于
      dst_input() was doing something completely absurd, looping
      on skb->dst->input() if NET_XMIT_BYPASS was seen, but these
      functions never return such an error.
      
      And as a result plain ole' NET_XMIT_BYPASS has no more
      references and can be completely killed off.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc6533e9
    • J
      net_sched: Add qdisc __NET_XMIT_STOLEN flag · 378a2f09
      Jarek Poplawski 提交于
      Patrick McHardy <kaber@trash.net> noticed:
      "The other problem that affects all qdiscs supporting actions is
      TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
      even though the packet is not queued, corrupting upper qdiscs'
      qlen counters."
      
      and later explained:
      "The reason why it translates it at all seems to be to not increase
      the drops counter. Within a single qdisc this could be avoided by
      other means easily, upper qdiscs would still increase the counter
      when we return anything besides NET_XMIT_SUCCESS though.
      
      This means we need a new NET_XMIT return value to indicate this to
      the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
      return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
      in dev_queue_xmit, similar to NET_XMIT_BYPASS."
      
      David Miller <davem@davemloft.net> noticed:
      "Maybe these NET_XMIT_* values being passed around should be a set of
      bits. They could be composed of base meanings, combined with specific
      attributes.
      
      So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"
      
      The attributes get masked out by the top-level ->enqueue() caller,
      such that the base meanings are the only thing that make their
      way up into the stack. If it's only about communication within the
      qdisc tree, let's simply code it that way."
      
      This patch is trying to realize these ideas.
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      378a2f09
  17. 01 8月, 2008 1 次提交
  18. 23 7月, 2008 1 次提交
    • D
      net: Fix build failure with 'make mandocs'. · d29f749e
      Dave Jones 提交于
      The function header comments have to go with the functions
      they are documenting, or things go horribly wrong when we
      try to process them with the docbook tools.
      
      Warning(include/linux/netdevice.h:1006): No description found for parameter 'dev_queue'
      Warning(include/linux/netdevice.h:1033): No description found for parameter 'dev_queue'
      Warning(include/linux/netdevice.h:1067): No description found for parameter 'dev_queue'
      Warning(include/linux/netdevice.h:1093): No description found for parameter 'dev_queue'
      Warning(include/linux/netdevice.h:1474): No description found for parameter 'txq'
      Error(net/core/dev.c:1674): cannot understand prototype: 'u32 simple_tx_hashrnd; '
      Signed-off-by: NDave Jones <davej@redhat.com>
      Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d29f749e
  19. 22 7月, 2008 1 次提交
  20. 19 7月, 2008 1 次提交
    • D
      pkt_sched: Manage qdisc list inside of root qdisc. · 30723673
      David S. Miller 提交于
      Idea is from Patrick McHardy.
      
      Instead of managing the list of qdiscs on the device level, manage it
      in the root qdisc of a netdev_queue.  This solves all kinds of
      visibility issues during qdisc destruction.
      
      The way to iterate over all qdiscs of a netdev_queue is to visit
      the netdev_queue->qdisc, and then traverse it's list.
      
      The only special case is to ignore builting qdiscs at the root when
      dumping or doing a qdisc_lookup().  That was not needed previously
      because builtin qdiscs were not added to the device's qdisc_list.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30723673
  21. 18 7月, 2008 12 次提交