1. 21 11月, 2008 1 次提交
  2. 20 11月, 2008 2 次提交
    • S
      netdev: introduce dev_get_stats() · eeda3fd6
      Stephen Hemminger 提交于
      In order for the network device ops get_stats call to be immutable, the handling
      of the default internal network device stats block has to be changed. Add a new
      helper function which replaces the old use of internal_get_stats.
      
      Note: change return code to make it clear that the caller should not
      go changing the returned statistics.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeda3fd6
    • S
      netdev: network device operations infrastructure · d314774c
      Stephen Hemminger 提交于
      This patch changes the network device internal API to move adminstrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatablity between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virt function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transistion is completed the nag message can be changed to
      an WARN_ON, and the compatiablity code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d314774c
  3. 17 11月, 2008 1 次提交
  4. 08 11月, 2008 2 次提交
  5. 06 11月, 2008 3 次提交
    • E
      net: Don't leak packets when a netns is going down · 0a36b345
      Eric W. Biederman 提交于
      I have been tracking for a while a case where when the
      network namespace exits the cleanup gets stck in an
      endless precessess of:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      unregister_netdevice: waiting for lo to become free. Usage count = 3
      
      It turns out that if you listen on a multicast address an unsubscribe
      packet is sent when the network device goes down.   If you shutdown
      the network namespace without carefully cleaning up this can trigger
      the unsubscribe packet to be sent over the loopback interface while
      the network namespace is going down.
      
      All of which is fine except when we drop the packet and forget to
      free it leaking the skb and the dst entry attached to.  As it
      turns out the dst entry hold a reference to the idev which holds
      the dev and keeps everything from being cleaned up.  Yuck!
      
      By fixing my earlier thinko and add the needed kfree_skb and everything
      cleans up beautifully. 
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a36b345
    • E
      net: Guaranetee the proper ordering of the loopback device. · ae33bc40
      Eric W. Biederman 提交于
      I was recently hunting a bug that occurred in network namespace
      cleanup.  In looking at the code it became apparrent that we have
      and will continue to have cases where if we have anything going
      on in a network namespace there will be assumptions that the
      loopback device is present.   Things like sending igmp unsubscribe
      messages when we bring down network devices invokes the routing
      code which assumes that at least the loopback driver is present.
      
      Therefore to avoid magic initcall ordering hackery that is hard
      to follow and hard to get right insert a call to register the
      loopback device directly from net_dev_init().    This guarantes
      that the loopback device is the first device registered and
      the last network device to go away.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae33bc40
    • E
      netns: Delete virtual interfaces during namespace cleanup · d0c082ce
      Eric W. Biederman 提交于
      When physical devices are inside of network namespace and that
      network namespace terminates we can not make them go away.  We
      have to keep them and moving them to the initial network namespace
      is the best we can do.
      
      For virtual devices left in a network namespace that is exiting
      we have no need to preserve them and we now have the infrastructure
      that allows us to delete them.  So delete virtual devices when we
      exit a network namespace.  Keeping the necessary user space clean up
      after a network namespace exits much more tractable.
      Acked-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Acked-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0c082ce
  6. 05 11月, 2008 1 次提交
    • P
      net: fix packet socket delivery in rx irq handler · 9b22ea56
      Patrick McHardy 提交于
      The changes to deliver hardware accelerated VLAN packets to packet
      sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
      The __vlan_hwaccel_rx() function is called directly from the drivers
      RX function, for non-NAPI drivers that means its still in RX IRQ
      context:
      
      [   27.779463] ------------[ cut here ]------------
      [   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
      ...
      [   27.782520]  [<c0264755>] netif_nit_deliver+0x5b/0x75
      [   27.782590]  [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
      [   27.782664]  [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
      [   27.782738]  [<c0155b17>] handle_IRQ_event+0x23/0x51
      [   27.782808]  [<c015692e>] handle_edge_irq+0xc2/0x102
      [   27.782878]  [<c0105fd5>] do_IRQ+0x4d/0x64
      
      Split hardware accelerated VLAN reception into two parts to fix this:
      
      - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
        device lookup, then calls netif_receive_skb()/netif_rx()
      
      - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
        in softirq context, performs the real reception and delivery to
        packet sockets.
      Reported-and-tested-by: NRamon Casellas <ramon.casellas@cttc.es>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b22ea56
  7. 04 11月, 2008 1 次提交
  8. 28 10月, 2008 1 次提交
  9. 23 10月, 2008 1 次提交
    • H
      net: Fix disjunct computation of netdev features · b63365a2
      Herbert Xu 提交于
      My change
      
          commit e2a6b852
          net: Enable TSO if supported by at least one device
      
      didn't do what was intended because the netdev_compute_features
      function was designed for conjunctions.  So what happened was that
      it would simply take the TSO status of the last constituent device.
      
      This patch extends it to support both conjunctions and disjunctions
      under the new name of netdev_increment_features.
      
      It also adds a new function netdev_fix_features which does the
      sanity checking that usually occurs upon registration.  This ensures
      that the computation doesn't result in an illegal combination
      since this checking is absent when the change is initiated via
      ethtool.
      
      The two users of netdev_compute_features have been converted.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b63365a2
  10. 20 10月, 2008 1 次提交
  11. 17 10月, 2008 1 次提交
  12. 08 10月, 2008 2 次提交
    • H
      net: Fix netdev_run_todo dead-lock · 58ec3b4d
      Herbert Xu 提交于
      Benjamin Thery tracked down a bug that explains many instances
      of the error
      
      unregister_netdevice: waiting for %s to become free. Usage count = %d
      
      It turns out that netdev_run_todo can dead-lock with itself if
      a second instance of it is run in a thread that will then free
      a reference to the device waited on by the first instance.
      
      The problem is really quite silly.  We were trying to create
      parallelism where none was required.  As netdev_run_todo always
      follows a RTNL section, and that todo tasks can only be added
      with the RTNL held, by definition you should only need to wait
      for the very ones that you've added and be done with it.
      
      There is no need for a second mutex or spinlock.
      
      This is exactly what the following patch does.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58ec3b4d
    • P
      net: only invoke dev->change_rx_flags when device is UP · b6c40d68
      Patrick McHardy 提交于
      Jesper Dangaard Brouer <hawk@comx.dk> reported a bug when setting a VLAN
      device down that is in promiscous mode:
      
      When the VLAN device is set down, the promiscous count on the real
      device is decremented by one by vlan_dev_stop(). When removing the
      promiscous flag from the VLAN device afterwards, the promiscous
      count on the real device is decremented a second time by the
      vlan_change_rx_flags() callback.
      
      The root cause for this is that the ->change_rx_flags() callback is
      invoked while the device is down. The synchronization is meant to mirror
      the behaviour of the ->set_rx_mode callbacks, meaning the ->open function
      is responsible for doing a full sync on open, the ->close() function is
      responsible for doing full cleanup on ->stop() and ->change_rx_flags()
      is meant to do incremental changes while the device is UP.
      
      Only invoke ->change_rx_flags() while the device is UP to provide the
      intended behaviour.
      Tested-by: NJesper Dangaard Brouer <jdb@comx.dk>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6c40d68
  13. 30 9月, 2008 2 次提交
  14. 24 9月, 2008 1 次提交
  15. 23 9月, 2008 1 次提交
    • S
      net: network device name ifalias support · 0b815a1a
      Stephen Hemminger 提交于
      This patch add support for keeping an additional character alias
      associated with an network interface. This is useful for maintaining
      the SNMP ifAlias value which is a user defined value. Routers use this
      to hold information like which circuit or line it is connected to. It
      is just an arbitrary text label on the network device.
      
      There are two exposed interfaces with this patch, the value can be
      read/written either via netlink or sysfs.
      
      This could be maintained just by the snmp daemon, but it is more
      generally useful for other management tools, and the kernel is good
      place to act as an agreed upon interface to store it.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b815a1a
  16. 21 9月, 2008 2 次提交
  17. 09 9月, 2008 1 次提交
    • H
      net: Enable TSO if supported by at least one device · e2a6b852
      Herbert Xu 提交于
      As it stands users of netdev_compute_features (e.g., bridges/bonding)
      will only enable TSO if all consituent devices support it.  This
      is unnecessarily pessimistic since even on devices that do not
      support hardware TSO and SG, emulated TSO still performs to a par
      with TSO off.
      
      This patch enables TSO if at least on constituent device supports
      it in hardware.
      
      The direct beneficiaries will be virtualisation that uses bridging
      since this means that TSO will always be enabled for communication
      from the host to the guests.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2a6b852
  18. 08 9月, 2008 1 次提交
  19. 19 8月, 2008 1 次提交
    • D
      pkt_sched: Prevent livelock in TX queue running. · 195648bb
      David S. Miller 提交于
      If dev_deactivate() is trying to quiesce the queue, it
      is theoretically possible for another cpu to livelock
      trying to process that queue.  This happens because
      dev_deactivate() grabs the queue spinlock as it checks
      the queue state, whereas net_tx_action() does a trylock
      and reschedules the qdisc if it hits the lock.
      
      This breaks the livelock by adding a check on
      __QDISC_STATE_DEACTIVATED to net_tx_action() when
      the trylock fails.
      
      Based upon feedback from Herbert Xu and Jarek Poplawski.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      195648bb
  20. 18 8月, 2008 3 次提交
  21. 07 8月, 2008 3 次提交
  22. 05 8月, 2008 1 次提交
  23. 04 8月, 2008 1 次提交
  24. 03 8月, 2008 2 次提交
  25. 01 8月, 2008 1 次提交
  26. 30 7月, 2008 1 次提交
    • D
      pkt_sched: Fix OOPS on ingress qdisc add. · 8d50b53d
      David S. Miller 提交于
      Bug report from Steven Jan Springl:
      
      	Issuing the following command causes a kernel oops:
      		tc qdisc add dev eth0 handle ffff: ingress
      
      The problem mostly stems from all of the special case handling of
      ingress qdiscs.
      
      So, to fix this, do the grafting operation the same way we do for TX
      qdiscs.  Which means that dev_activate() and dev_deactivate() now do
      the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too.
      
      Future simplifications are possible now, mainly because it is
      impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL.  There
      are NULL checks all over to handle the ingress qdisc special case
      that used to exist before this commit.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d50b53d
  27. 26 7月, 2008 1 次提交
  28. 24 7月, 2008 1 次提交
    • D
      netdev: Remove warning from __netif_schedule(). · 5b3ab1db
      David S. Miller 提交于
      It isn't helping anything and we aren't going to be able to change all
      the drivers that do queue wakeups in strange situations.
      
      Just letting a noop_qdisc get scheduled will work because when
      qdisc_run() executes via net_tx_work() it will simply find no packets
      pending when it makes the ->dequeue() call in qdisc_restart.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b3ab1db