1. 29 1月, 2008 1 次提交
  2. 14 11月, 2007 1 次提交
    • P
      [PKT_SCHED]: Check subqueue status before calling hard_start_xmit · 5f1a485d
      Peter P Waskiewicz Jr 提交于
      The only qdiscs that check subqueue state before dequeue'ing are PRIO
      and RR.  The other qdiscs, including the default pfifo_fast qdisc,
      will allow traffic bound for subqueue 0 through to hard_start_xmit.
      The check for netif_queue_stopped() is done above in pkt_sched.h, so
      it is unnecessary for qdisc_restart().  However, if the underlying
      driver is multiqueue capable, and only sets queue states on subqueues,
      this will allow packets to enter the driver when it's currently unable
      to process packets, resulting in expensive requeues and driver
      entries.  This patch re-adds the check for the subqueue status before
      calling hard_start_xmit, so we can try and avoid the driver entry when
      the queues are stopped.
      Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f1a485d
  3. 19 10月, 2007 1 次提交
    • H
      [NET]: Fix possible dev_deactivate race condition · ce0e32e6
      Herbert Xu 提交于
      The function dev_deactivate is supposed to only return when
      all outstanding transmissions have completed.  Unfortunately
      it is possible for store operations in the driver's transmit
      function to only become visible after dev_deactivate returns.
      
      This patch fixes this by taking the queue lock after we see
      the end of the queue run.  This ensures that all effects of
      any previous transmit calls are visible.
      
      If however we detect that there is another queue run occuring,
      then we'll warn about it because this should never happen as
      we have pointed dev->qdisc to noop_qdisc within the same queue
      lock earlier in the functino.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce0e32e6
  4. 18 10月, 2007 1 次提交
    • J
      [NET]: fix carrier-on bug? · bfaae0f0
      Jeff Garzik 提交于
      While looking at a net driver with the following construct,
      
      	if (!netif_carrier_ok(dev))
      		netif_carrier_on(dev);
      
      it stuck me that the netif_carrier_ok() check was redundant, since
      netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway.  This is
      the same reason why netif_queue_stopped() need not be called prior to
      netif_wake_queue().
      
      This is true, but there is however an unwanted side effect from assuming
      that netif_carrier_on() can be called multiple times:  it touches the
      watchdog, regardless of pre-existing carrier state.
      
      The fix:  move watchdog-up inside the bit-cleared code path.
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfaae0f0
  5. 11 10月, 2007 2 次提交
    • J
      [NET_SCHED]: explict hold dev tx lock · 8236632f
      Jamal Hadi Salim 提交于
      For N cpus, with full throttle traffic on all N CPUs, funneling traffic
      to the same ethernet device, the devices queue lock is contended by all
      N CPUs constantly. The TX lock is only contended by a max of 2 CPUS.
      In the current mode of operation, after all the work of entering the
      dequeue region, we may endup aborting the path if we are unable to get
      the tx lock and go back to contend for the queue lock. As N goes up,
      this gets worse.
      
      The changes in this patch result in a small increase in performance
      with a 4CPU (2xdual-core) with no irq binding. Both e1000 and tg3
      showed similar behavior;
      Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8236632f
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger 提交于
      Several devices have multiple independant RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independant from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in it's ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea3348e
  6. 11 7月, 2007 5 次提交
    • P
      [NET_SCHED]: Remove unnecessary includes · 0ba48053
      Patrick McHardy 提交于
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ba48053
    • P
      [NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option · 876d48aa
      Patrick McHardy 提交于
      The generic estimator is always built in anways and all the config options
      does is prevent including a minimal amount of code for setting it up.
      Additionally the option is already automatically selected for most cases.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      876d48aa
    • K
      [NET]: qdisc_restart - couple of optimizations. · e50c41b5
      Krishna Kumar 提交于
      Changes :
      
      - netif_queue_stopped need not be called inside qdisc_restart as
        it has been called already in qdisc_run() before the first skb
        is sent, and in __qdisc_run() after each intermediate skb is
        sent (note : we are the only sender, so the queue cannot get
        stopped while the tx lock was got in the ~LLTX case).
      
      - BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
        meant more packets are available, and __qdisc_run used to loop
        when qdisc_restart() returned -1. During those days, it was
        necessary to make sure that qlen is never less than zero, since
        __qdisc_run would get into an infinite loop if no packets are on
        the queue and this bug in qdisc was there (and worse - no more
        skbs could ever get queue'd as we hold the queue lock too). With
        Herbert's recent change to return values, this check is not
        required.  Hopefully Herbert can validate this change. If at all
        this is required, it should be added to skb_dequeue (in failure
        case), and not to qdisc_qlen.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e50c41b5
    • K
      [NET]: qdisc_restart - readability changes plus one bug fix. · 6c1361a6
      Krishna Kumar 提交于
      New changes :
      
      - Incorporated Peter Waskiewicz's comments.
      - Re-added back one warning message (on driver returning wrong value).
      
      Previous changes :
      
      - Converted to use switch/case code which looks neater.
      
      - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
        check should be removed, since driver will return NETDEV_TX_LOCKED only
        if lockless is true and driver has to do the locking. In the original
        code as well as the latest code, this code can result in a bug where
        if LLTX is not set for a driver (lockless == 0) but the driver is written
        wrongly to do a trylock (despite LLTX being set), the driver returns
        LOCKED. But since lockless is zero, the packet is requeue'd instead of
        calling collision code which will issue warning and free up the skb.
        Instead this skb will be retried with this driver next time, and the same
        result will ensue. Removing this check will catch these driver bugs instead
        of hiding the problem. I am keeping this change to readability section
        since :
        	a. it is confusing to check two things as it is; and
        	b. it is difficult to keep this check in the changed 'switch' code.
      
      - Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
        the work being done and easier to understand) and do_dev_requeue to
        dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
        dev_handle_collision (handle_dev_cpu_collision is a small routine with only
        one caller, so there is no need to have two separate routines which also
        results in getting rid of two macros, etc.
      
      - Removed an XXX comment as it should never fail (I suspect this was related
        to batch skb WIP, Jamal ?). Converted some functions to original coding
        style of having the return values and the function name on same line, eg
        prio2list.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c1361a6
    • J
      [NET_SCHED]: Cleanup readability of qdisc restart · c716a81a
      Jamal Hadi Salim 提交于
      Over the years this code has gotten hairier. Resulting in many long
      discussions over long summer days and patches that get it wrong.
      This patch helps tame that code so normal people will understand it.
      
      Thanks to Thomas Graf, Peter J. waskiewicz Jr, and Patrick McHardy
      for their valuable reviews.
      Signed-off-by: NJamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c716a81a
  7. 04 6月, 2007 1 次提交
  8. 25 5月, 2007 1 次提交
  9. 11 5月, 2007 4 次提交
    • H
      [NET_SCHED]: Avoid requeue warning on dev_deactivate · 41a23b07
      Herbert Xu 提交于
      When we relinquish queue_lock in qdisc_restart and then retake it for
      requeueing, we might race against dev_deactivate and end up requeueing
      onto noop_qdisc.  This causes a warning to be printed.
      
      This patch fixes this by checking this before we requeue.  As an added
      bonus, we can remove the same check in __qdisc_run which was added to
      prevent dev->gso_skb from being requeued when we're shutting down.
      
      Even though we've had to add a new conditional in its place, it's better
      because it only happens on requeues rather than every single time that
      qdisc_run is called.
      
      For this to work we also need to move the clearing of gso_skb up in
      dev_deactivate as now qdisc_restart can occur even after we wait for
      __LINK_STATE_QDISC_RUNNING to clear (but it won't do anything as long
      as the queue and gso_skb is already clear).
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41a23b07
    • H
      [NET_SCHED]: Reread dev->qdisc for NETDEV_TX_OK · cce1fa36
      Herbert Xu 提交于
      Now that we return the queue length after NETDEV_TX_OK we better
      make sure that we have the right queue.  Otherwise we can cause a
      stall after a really quick dev_deactive/dev_activate.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cce1fa36
    • H
      [NET_SCHED]: Rationalise return value of qdisc_restart · d90df3ad
      Herbert Xu 提交于
      The current return value scheme and associated comment was invented
      back in the 20th century when we still had that tbusy flag.  Things
      have changed quite a bit since then (even Tony Blair is moving on
      now, not to mention the new French president).
      
      All we need to indicate now is whether the caller should continue
      processing the queue.  Therefore it's sufficient if we return 0 if
      we want to stop and non-zero otherwise.
      
      This is based on a patch by Krishna Kumar.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d90df3ad
    • T
      [NET]: Fix dev->qdisc race for NETDEV_TX_LOCKED case · 5830725f
      Thomas Graf 提交于
      When transmit fails with NETDEV_TX_LOCKED the skb is requeued
      to dev->qdisc again. The dev->qdisc pointer is protected by
      the queue lock which needs to be dropped when attempting to
      transmit and acquired again before requeing. The problem is
      that qdisc_restart() fetches the dev->qdisc pointer once and
      stores it in the `q' variable which is invalidated when
      dropping the queue_lock, therefore the variable needs to be
      refreshed before requeueing.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5830725f
  10. 26 4月, 2007 2 次提交
  11. 11 2月, 2007 1 次提交
  12. 09 2月, 2007 1 次提交
    • A
      [NET]: user of the jiffies rounding code: Networking · f5a6e01c
      Arjan van de Ven 提交于
      This patch introduces users of the round_jiffies() function in the
      networking code.
      
      These timers all were of the "about once a second" or "about once
      every X seconds" variety and several showed up in the "what wakes the
      cpu up" profiles that the tickless patches provide.  Some timers are
      highly dynamic based on network load; but even on low activity systems
      they still show up so the rounding is done only in cases of low
      activity, allowing higher frequency timers in the high activity case.
      
      The various hardware watchdogs are an obvious case; they run every 2
      seconds but aren't otherwise specific of exactly when they need to
      run.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5a6e01c
  13. 03 12月, 2006 2 次提交
  14. 29 9月, 2006 1 次提交
    • P
      [NET_SCHED]: Fix fallout from dev->qdisc RCU change · 85670cc1
      Patrick McHardy 提交于
      The move of qdisc destruction to a rcu callback broke locking in the
      entire qdisc layer by invalidating previously valid assumptions about
      the context in which changes to the qdisc tree occur.
      
      The two assumptions were:
      
      - since changes only happen in process context, read_lock doesn't need
        bottem half protection. Now invalid since destruction of inner qdiscs,
        classifiers, actions and estimators happens in the RCU callback unless
        they're manually deleted, resulting in dead-locks when read_lock in
        process context is interrupted by write_lock_bh in bottem half context.
      
      - since changes only happen under the RTNL, no additional locking is
        necessary for data not used during packet processing (f.e. u32_list).
        Again, since destruction now happens in the RCU callback, this assumption
        is not valid anymore, causing races while using this data, which can
        result in corruption or use-after-free.
      
      Instead of "fixing" this by disabling bottem halfs everywhere and adding
      new locks/refcounting, this patch makes these assumptions valid again by
      moving destruction back to process context. Since only the dev->qdisc
      pointer is protected by RCU, but ->enqueue and the qdisc tree are still
      protected by dev->qdisc_lock, destruction of the tree can be performed
      immediately and only the final free needs to happen in the rcu callback
      to make sure dev_queue_xmit doesn't access already freed memory.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85670cc1
  15. 18 9月, 2006 1 次提交
    • H
      [NET]: Drop tx lock in dev_watchdog_up · d7811e62
      Herbert Xu 提交于
      Fix lockdep warning with GRE, iptables and Speedtouch ADSL, PPP over ATM.
      
      On Sat, Sep 02, 2006 at 08:39:28PM +0000, Krzysztof Halasa wrote:
      > 
      > =======================================================
      > [ INFO: possible circular locking dependency detected ]
      > -------------------------------------------------------
      > swapper/0 is trying to acquire lock:
      >  (&dev->queue_lock){-+..}, at: [<c02c8c46>] dev_queue_xmit+0x56/0x290
      > 
      > but task is already holding lock:
      >  (&dev->_xmit_lock){-+..}, at: [<c02c8e14>] dev_queue_xmit+0x224/0x290
      > 
      > which lock already depends on the new lock.
      
      This turns out to be a genuine bug.  The queue lock and xmit lock are
      intentionally taken out of order.  Two things are supposed to prevent
      dead-locks from occuring:
      
      1) When we hold the queue_lock we're supposed to only do try_lock on the
      tx_lock.
      
      2) We always drop the queue_lock after taking the tx_lock and before doing
      anything else.
      
      > 
      > the existing dependency chain (in reverse order) is:
      > 
      > -> #1 (&dev->_xmit_lock){-+..}:
      >        [<c012e7b6>] lock_acquire+0x76/0xa0
      >        [<c0336241>] _spin_lock_bh+0x31/0x40
      >        [<c02d25a9>] dev_activate+0x69/0x120
      
      This path obviously breaks assumption 1) and therefore can lead to ABBA
      dead-locks.
      
      I've looked at the history and there seems to be no reason for the lock
      to be held at all in dev_watchdog_up.  The lock appeared in day one and
      even there it was unnecessary.  In fact, people added __dev_watchdog_up
      precisely in order to get around the tx lock there.
      
      The function dev_watchdog_up is already serialised by rtnl_lock since
      its only caller dev_activate is always called under it.
      
      So here is a simple patch to remove the tx lock from dev_watchdog_up.
      In 2.6.19 we can eliminate the unnecessary __dev_watchdog_up and
      replace it with dev_watchdog_up.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7811e62
  16. 22 7月, 2006 1 次提交
  17. 01 7月, 2006 1 次提交
  18. 23 6月, 2006 2 次提交
    • H
      [NET]: Add generic segmentation offload · f6a78bfc
      Herbert Xu 提交于
      This patch adds the infrastructure for generic segmentation offload.
      The idea is to tap into the potential savings of TSO without hardware
      support by postponing the allocation of segmented skb's until just
      before the entry point into the NIC driver.
      
      The same structure can be used to support software IPv6 TSO, as well as
      UFO and segmentation offload for other relevant protocols, e.g., DCCP.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6a78bfc
    • H
      [NET]: Prevent transmission after dev_deactivate · d4828d85
      Herbert Xu 提交于
      The dev_deactivate function has bit-rotted since the introduction of
      lockless drivers.  In particular, the spin_unlock_wait call at the end
      has no effect on the xmit routine of lockless drivers.
      
      With a little bit of work, we can make it much more useful by providing
      the guarantee that when it returns, no more calls to the xmit routine
      of the underlying driver will be made.
      
      The idea is simple.  There are two entry points in to the xmit routine.
      The first comes from dev_queue_xmit.  That one is easily stopped by
      using synchronize_rcu.  This works because we set the qdisc to noop_qdisc
      before the synchronize_rcu call.  That in turn causes all subsequent
      packets sent to dev_queue_xmit to be dropped.  The synchronize_rcu call
      also ensures all outstanding calls leave their critical section.
      
      The other entry point is from qdisc_run.  Since we now have a bit that
      indicates whether it's running, all we have to do is to wait until the
      bit is off.
      
      I've removed the loop to wait for __LINK_STATE_SCHED to clear.  This is
      useless because netif_wake_queue can cause it to be set again.  It is
      also harmless because we've disarmed qdisc_run.
      
      I've also removed the spin_unlock_wait on xmit_lock because its only
      purpose of making sure that all outstanding xmit_lock holders have
      exited is also given by dev_watchdog_down.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4828d85
  19. 20 6月, 2006 1 次提交
    • H
      [NET]: Prevent multiple qdisc runs · 48d83325
      Herbert Xu 提交于
      Having two or more qdisc_run's contend against each other is bad because
      it can induce packet reordering if the packets have to be requeued.  It
      appears that this is an unintended consequence of relinquinshing the queue
      lock while transmitting.  That in turn is needed for devices that spend a
      lot of time in their transmit routine.
      
      There are no advantages to be had as devices with queues are inherently
      single-threaded (the loopback device is not but then it doesn't have a
      queue).
      
      Even if you were to add a queue to a parallel virtual device (e.g., bolt
      a tbf filter in front of an ipip tunnel device), you would still want to
      process the queue in sequence to ensure that the packets are ordered
      correctly.
      
      The solution here is to steal a bit from net_device to prevent this.
      
      BTW, as qdisc_restart is no longer used by anyone as a module inside the
      kernel (IIRC it used to with netif_wake_queue), I have not exported the
      new __qdisc_run function.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48d83325
  20. 18 6月, 2006 1 次提交
    • H
      [NET]: Add netif_tx_lock · 932ff279
      Herbert Xu 提交于
      Various drivers use xmit_lock internally to synchronise with their
      transmission routines.  They do so without setting xmit_lock_owner.
      This is fine as long as netpoll is not in use.
      
      With netpoll it is possible for deadlocks to occur if xmit_lock_owner
      isn't set.  This is because if a printk occurs while xmit_lock is held
      and xmit_lock_owner is not set can cause netpoll to attempt to take
      xmit_lock recursively.
      
      While it is possible to resolve this by getting netpoll to use
      trylock, it is suboptimal because netpoll's sole objective is to
      maximise the chance of getting the printk out on the wire.  So
      delaying or dropping the message is to be avoided as much as possible.
      
      So the only alternative is to always set xmit_lock_owner.  The
      following patch does this by introducing the netif_tx_lock family of
      functions that take care of setting/unsetting xmit_lock_owner.
      
      I renamed xmit_lock to _xmit_lock to indicate that it should not be
      used directly.  I didn't provide irq versions of the netif_tx_lock
      functions since xmit_lock is meant to be a BH-disabling lock.
      
      This is pretty much a straight text substitution except for a small
      bug fix in winbond.  It currently uses
      netif_stop_queue/spin_unlock_wait to stop transmission.  This is
      unsafe as an IRQ can potentially wake up the queue.  So it is safer to
      use netif_tx_disable.
      
      The hamradio bits used spin_lock_irq but it is unnecessary as
      xmit_lock must never be taken in an IRQ handler.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      932ff279
  21. 17 5月, 2006 1 次提交
  22. 21 3月, 2006 1 次提交
  23. 30 8月, 2005 1 次提交
  24. 24 8月, 2005 1 次提交
    • T
      [PKT_SCHED]: Fix missing qdisc_destroy() in qdisc_create_dflt() · 0fbbeb1b
      Thomas Graf 提交于
      qdisc_create_dflt() is missing to destroy the newly allocated
      default qdisc if the initialization fails resulting in leaks
      of all kinds. The only caller in mainline which may trigger
      this bug is sch_tbf.c in tbf_create_dflt_qdisc().
      
      Note: qdisc_create_dflt() doesn't fulfill the official locking
            requirements of qdisc_destroy() but since the qdisc could
            never be seen by the outside world this doesn't matter
            and it can stay as-is until the locking of pkt_sched
            is cleaned up.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fbbeb1b
  25. 19 7月, 2005 1 次提交
  26. 06 7月, 2005 1 次提交
  27. 19 6月, 2005 3 次提交