1. 18 Jul, 2008 10 commits
    • D
      netdevice: Move qdisc_list back into net_device proper. · ead81cc5
      David S. Miller committed
      And give it its own lock.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ead81cc5
    • D
      pkt_sched: Use per-queue locking in shutdown_scheduler_queue. · 17715e62
      David S. Miller committed
      This eliminates another qdisc_lock_tree user.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17715e62
    • D
      pkt_sched: Perform bulk of qdisc destruction in RCU. · 8a34c5dc
      David S. Miller committed
      This allows less strict control of access to the qdisc attached to a
      netdev_queue.  It is even allowed to enqueue into a qdisc which is
      in the process of being destroyed.  The RCU handler will toss out
      those packets.
      
      We will need this to handle sharing of a qdisc amongst multiple
      TX queues.  In such a setup the lock has to be shared, so will
      be inside of the qdisc itself.  At which point the netdev_queue
      lock cannot be used to hard synchronize access to the ->qdisc
      pointer.
      
      One operation we have to keep inside of qdisc_destroy() is the list
      deletion.  It is the only piece of state visible after the RCU quiesce
      period, so we have to undo it early and under the appropriate locking.
      
      The operations in the RCU handler do not need any locking because the
      qdisc tree is no longer visible to anything at that point.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a34c5dc
    • D
      pkt_sched: dev_init_scheduler() does not need to lock qdisc tree. · 16361127
      David S. Miller committed
      We are registering the device, so there is no way anyone can get
      at this object's qdiscs yet in any meaningful way.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      16361127
    • D
      pkt_sched: Schedule qdiscs instead of netdev_queue. · 37437bb2
      David S. Miller committed
      When we have shared qdiscs, packets come out of the qdiscs
      for multiple transmit queues.
      
      Therefore it doesn't make any sense to schedule the transmit
      queue when logically we cannot know ahead of time the TX
      queue of the SKB that the qdisc->dequeue() will give us.
      
      Just for sanity I added a BUG check to make sure we never
      get into a state where the noop_qdisc is scheduled.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      37437bb2
    • D
      pkt_sched: Add and use qdisc_root() and qdisc_root_lock(). · 7698b4fc
      David S. Miller committed
      When code wants to lock the qdisc tree state, the logical
      operation it's doing is locking the top-level qdisc that
      sits at the root of the netdev_queue.
      
      Add qdisc_root_lock() to represent this and convert the
      easiest cases.
      
      In order for this to work out in all cases, we have to
      hook up the noop_qdisc to a dummy netdev_queue.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7698b4fc
    • D
      pkt_sched: Make QDISC_RUNNING a qdisc state. · e2627c8c
      David S. Miller committed
      Currently it is associated with a netdev_queue, but when we have
      qdisc sharing that no longer makes any sense.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2627c8c
    • D
      pkt_sched: Move gso_skb into Qdisc. · d3b753db
      David S. Miller committed
      We liberate any dangling gso_skb during qdisc destruction.
      
      It really only matters for the root qdisc.  But when qdiscs
      can be shared by multiple netdev_queue objects, we can't
      have the gso_skb in the netdev_queue any more.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3b753db
    • D
      net: Use queue aware tests throughout. · fd2ea0a7
      David S. Miller committed
      This effectively "flips the switch" by making the core networking
      and multiqueue-aware drivers use the new TX multiqueue structures.
      
      Non-multiqueue drivers need no changes.  The interfaces they use such
      as netif_stop_queue() degenerate into an operation on TX queue zero.
      So everything "just works" for them.
      
      Code that really wants to do "X" to all TX queues now invokes a
      routine that does so, such as netif_tx_wake_all_queues(),
      netif_tx_stop_all_queues(), etc.
      
      pktgen and netpoll required a little bit more surgery than the others.
      
      In particular the pktgen changes, whilst functional, could be largely
      improved.  The initial check in pktgen_xmit() will sometimes check the
      wrong queue, which is mostly harmless.  The thing to do is probably to
      invoke fill_packet() earlier.
      
      The bulk of the netpoll changes is to make the code operate solely on
      the TX queue indicated by the SKB queue mapping.
      
      Setting of the SKB queue mapping is entirely confined inside of
      net/core/dev.c:dev_pick_tx().  If we end up needing any kind of
      special semantics (drops, for example) it will be implemented here.
      
      Finally, we now have a "real_num_tx_queues" which is where the driver
      indicates how many TX queues are actually active.
      
      With IGB changes from Jeff Kirsher.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd2ea0a7
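The way single-queue helpers collapse onto TX queue zero, while the "all queues" helpers walk every queue, can be illustrated with a small user-space sketch. Everything here is a simplified stand-in: the `sketch_` names and the fixed-size array are assumptions made for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_TXQ 8	/* hypothetical upper bound for this sketch */

/* Simplified stand-in for a per-TX-queue object. */
struct sketch_txq {
	bool stopped;
};

/* Simplified stand-in for a device that owns several TX queues. */
struct sketch_mq_dev {
	unsigned int num_tx_queues;
	struct sketch_txq tx[SKETCH_MAX_TXQ];
};

/* Per-queue primitive: everything else is built on top of this. */
void sketch_tx_stop_queue(struct sketch_txq *q)
{
	q->stopped = true;
}

/* Legacy-style helper: a non-multiqueue driver only ever touches queue 0. */
void sketch_stop_queue(struct sketch_mq_dev *dev)
{
	sketch_tx_stop_queue(&dev->tx[0]);
}

/* "Do X to all TX queues" helper, in the spirit of the *_all_queues() routines. */
void sketch_tx_stop_all_queues(struct sketch_mq_dev *dev)
{
	unsigned int i;

	assert(dev->num_tx_queues <= SKETCH_MAX_TXQ);
	for (i = 0; i < dev->num_tx_queues; i++)
		sketch_tx_stop_queue(&dev->tx[i]);
}
```

A driver with one queue sees no difference between the two helpers, which is why, under this model, unconverted drivers keep working unchanged.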
    • D
      netdev: Allocate multiple queues for TX. · e8a0464c
      David S. Miller committed
      alloc_netdev_mq() now allocates an array of netdev_queue
      structures for TX, based upon the queue_count argument.
      
      Furthermore, all accesses to the TX queues are now vectored
      through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
      interfaces.  This makes it easy to grep the tree for all
      things that want to get to a TX queue of a net device.
      
      Problem spots which are not really multiqueue aware yet, and
      only work with one queue, can easily be spotted by grepping
      for all netdev_get_tx_queue() calls that pass in a zero index.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8a0464c
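The pattern described in this commit, an array of queue structures sized at allocation time and reached only through one accessor and one iterator, might look roughly like the following user-space sketch. The struct layouts and all `sketch_` function names are assumptions for illustration; they only mirror the roles of alloc_netdev_mq(), netdev_get_tx_queue() and netdev_for_each_tx_queue().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins; not the real kernel structures. */
struct sketch_netdev_queue {
	int stopped;	/* hypothetical per-queue state */
};

struct sketch_net_device {
	unsigned int num_tx_queues;
	struct sketch_netdev_queue *tx;	/* array sized at allocation time */
};

/* Allocate a device with queue_count TX queues. */
struct sketch_net_device *sketch_alloc_netdev_mq(unsigned int queue_count)
{
	struct sketch_net_device *dev;

	assert(queue_count > 0);
	dev = calloc(1, sizeof(*dev));
	if (!dev)
		return NULL;
	dev->tx = calloc(queue_count, sizeof(*dev->tx));
	if (!dev->tx) {
		free(dev);
		return NULL;
	}
	dev->num_tx_queues = queue_count;
	return dev;
}

void sketch_free_netdev(struct sketch_net_device *dev)
{
	free(dev->tx);
	free(dev);
}

/* The single accessor: every TX queue lookup goes through here. */
struct sketch_netdev_queue *
sketch_get_tx_queue(struct sketch_net_device *dev, unsigned int index)
{
	assert(index < dev->num_tx_queues);
	return &dev->tx[index];
}

/* The single iterator: call f once per TX queue. */
void sketch_for_each_tx_queue(struct sketch_net_device *dev,
			      void (*f)(struct sketch_net_device *,
					struct sketch_netdev_queue *, void *),
			      void *arg)
{
	unsigned int i;

	for (i = 0; i < dev->num_tx_queues; i++)
		f(dev, sketch_get_tx_queue(dev, i), arg);
}

/* Example callback: mark one queue as stopped. */
void sketch_stop_one(struct sketch_net_device *dev,
		     struct sketch_netdev_queue *txq, void *arg)
{
	(void)dev;
	(void)arg;
	txq->stopped = 1;
}
```

Funnelling access through one accessor is what makes the "grep for a zero index" audit in the commit message practical: a call such as `sketch_get_tx_queue(dev, 0)` is easy to find mechanically.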
  2. 09 Jul, 2008 11 commits
  3. 28 Jun, 2008 1 commit
  4. 03 May, 2008 1 commit
  5. 29 Mar, 2008 1 commit
  6. 29 Jan, 2008 5 commits
  7. 14 Nov, 2007 1 commit
    • P
      [PKT_SCHED]: Check subqueue status before calling hard_start_xmit · 5f1a485d
      Peter P Waskiewicz Jr committed
      The only qdiscs that check subqueue state before dequeue'ing are PRIO
      and RR.  The other qdiscs, including the default pfifo_fast qdisc,
      will allow traffic bound for subqueue 0 through to hard_start_xmit.
      The check for netif_queue_stopped() is done above in pkt_sched.h, so
      it is unnecessary for qdisc_restart().  However, if the underlying
      driver is multiqueue capable, and only sets queue states on subqueues,
      this will allow packets to enter the driver when it's currently unable
      to process packets, resulting in expensive requeues and driver
      entries.  This patch re-adds the check for the subqueue status before
      calling hard_start_xmit, so we can try and avoid the driver entry when
      the queues are stopped.
      Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5f1a485d
  8. 19 Oct, 2007 1 commit
    • H
      [NET]: Fix possible dev_deactivate race condition · ce0e32e6
      Herbert Xu committed
      The function dev_deactivate is supposed to only return when
      all outstanding transmissions have completed.  Unfortunately
      it is possible for store operations in the driver's transmit
      function to only become visible after dev_deactivate returns.
      
      This patch fixes this by taking the queue lock after we see
      the end of the queue run.  This ensures that all effects of
      any previous transmit calls are visible.
      
      If, however, we detect that there is another queue run occurring,
      then we'll warn about it because this should never happen, as
      we have pointed dev->qdisc to noop_qdisc within the same queue
      lock earlier in the function.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce0e32e6
  9. 18 Oct, 2007 1 commit
    • J
      [NET]: fix carrier-on bug? · bfaae0f0
      Jeff Garzik committed
      While looking at a net driver with the following construct,
      
      	if (!netif_carrier_ok(dev))
      		netif_carrier_on(dev);
      
      it struck me that the netif_carrier_ok() check was redundant, since
      netif_carrier_on() checks bit __LINK_STATE_NOCARRIER anyway.  This is
      the same reason why netif_queue_stopped() need not be called prior to
      netif_wake_queue().
      
      This is true, but there is however an unwanted side effect from assuming
      that netif_carrier_on() can be called multiple times:  it touches the
      watchdog, regardless of pre-existing carrier state.
      
      The fix:  move watchdog-up inside the bit-cleared code path.
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfaae0f0
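The fix described above, touching the watchdog only when the no-carrier bit actually transitions from set to clear, can be modelled in user space as follows. This is a sketch under stated assumptions: the struct, the counter standing in for the watchdog, and the non-atomic `sketch_test_and_clear()` helper are all hypothetical simplifications, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant device state. */
struct sketch_link_dev {
	bool no_carrier;		/* models the __LINK_STATE_NOCARRIER bit */
	unsigned int watchdog_ups;	/* counts how often the watchdog is touched */
};

/*
 * Non-atomic model of a test-and-clear operation: returns the old
 * value of the flag and leaves it cleared.
 */
static bool sketch_test_and_clear(bool *flag)
{
	bool old = *flag;

	*flag = false;
	return old;
}

/* Carrier-on with the watchdog touch inside the bit-cleared path. */
void sketch_carrier_on(struct sketch_link_dev *dev)
{
	assert(dev != 0);
	if (sketch_test_and_clear(&dev->no_carrier))
		dev->watchdog_ups++;	/* only on a real off -> on transition */
}
```

With this shape, calling `sketch_carrier_on()` repeatedly on a link that is already up has no side effects, so the caller-side `netif_carrier_ok()` guard quoted in the commit message becomes unnecessary.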
  10. 11 Oct, 2007 2 commits
    • J
      [NET_SCHED]: explict hold dev tx lock · 8236632f
      Jamal Hadi Salim committed
      For N cpus, with full throttle traffic on all N CPUs, funneling traffic
      to the same ethernet device, the device's queue lock is contended by all
      N CPUs constantly. The TX lock is only contended by a max of 2 CPUs.
      In the current mode of operation, after all the work of entering the
      dequeue region, we may end up aborting the path if we are unable to get
      the tx lock and go back to contend for the queue lock. As N goes up,
      this gets worse.
      
      The changes in this patch result in a small increase in performance
      on a 4-CPU (2x dual-core) system with no IRQ binding. Both e1000 and tg3
      showed similar behavior.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8236632f
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger committed
      Several devices have multiple independent RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independent from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in its ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bea3348e
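The new-style ->poll() contract described above (take a budget by value, return the number of packets processed, and find the driver's private data from the embedded poll structure) can be sketched in user space like this. The struct definitions, the `rx_pending` counter and the `sketch_container_of` macro are assumptions for illustration; only the shape of the `foo_poll()` signature follows the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the poll structure; not the kernel's napi_struct. */
struct sketch_napi {
	int weight;	/* hypothetical field, unused in this sketch */
};

/* Hypothetical driver-private data with the poll structure embedded in it. */
struct foo_priv {
	int rx_pending;			/* packets waiting in the RX ring */
	struct sketch_napi napi;	/* embedded, as the commit recommends */
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * New-style poll: process at most `budget` packets and report how many
 * were handled. The callee does no quota bookkeeping of its own; the
 * caller accounts for the returned count.
 */
int foo_poll(struct sketch_napi *napi, int budget)
{
	struct foo_priv *priv = sketch_container_of(napi, struct foo_priv, napi);
	int done = 0;

	assert(budget >= 0);
	while (done < budget && priv->rx_pending > 0) {
		priv->rx_pending--;	/* stands in for receiving one packet */
		done++;
	}
	return done;
}
```

Because the poll structure lives inside the driver's own data, a device with several RX queues could embed one instance per queue and poll each independently, which is the layout the commit is aiming to support.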
  11. 11 Jul, 2007 5 commits
    • P
      [NET_SCHED]: Remove unnecessary includes · 0ba48053
      Patrick McHardy committed
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ba48053
    • P
      [NET_SCHED]: Remove CONFIG_NET_ESTIMATOR option · 876d48aa
      Patrick McHardy committed
      The generic estimator is always built in anyway, and all the config option
      does is prevent including a minimal amount of code for setting it up.
      Additionally, the option is already automatically selected for most cases.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      876d48aa
    • K
      [NET]: qdisc_restart - couple of optimizations. · e50c41b5
      Krishna Kumar committed
      Changes :
      
      - netif_queue_stopped need not be called inside qdisc_restart as
        it has been called already in qdisc_run() before the first skb
        is sent, and in __qdisc_run() after each intermediate skb is
        sent (note : we are the only sender, so the queue cannot get
        stopped while the tx lock was got in the ~LLTX case).
      
      - BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
        meant more packets are available, and __qdisc_run used to loop
        when qdisc_restart() returned -1. During those days, it was
        necessary to make sure that qlen is never less than zero, since
        __qdisc_run would get into an infinite loop if no packets are on
        the queue and this bug in qdisc was there (and worse - no more
        skbs could ever get queue'd as we hold the queue lock too). With
        Herbert's recent change to return values, this check is not
        required.  Hopefully Herbert can validate this change. If at all
        this is required, it should be added to skb_dequeue (in failure
        case), and not to qdisc_qlen.
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e50c41b5
    • K
      [NET]: qdisc_restart - readability changes plus one bug fix. · 6c1361a6
      Krishna Kumar committed
      New changes :
      
      - Incorporated Peter Waskiewicz's comments.
      - Re-added back one warning message (on driver returning wrong value).
      
      Previous changes :
      
      - Converted to use switch/case code which looks neater.
      
      - "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
        check should be removed, since driver will return NETDEV_TX_LOCKED only
        if lockless is true and driver has to do the locking. In the original
        code as well as the latest code, this code can result in a bug where
        if LLTX is not set for a driver (lockless == 0) but the driver is written
        wrongly to do a trylock (despite LLTX not being set), the driver returns
        LOCKED. But since lockless is zero, the packet is requeue'd instead of
        calling collision code which will issue warning and free up the skb.
        Instead this skb will be retried with this driver next time, and the same
        result will ensue. Removing this check will catch these driver bugs instead
        of hiding the problem. I am keeping this change to readability section
        since :
        	a. it is confusing to check two things as it is; and
        	b. it is difficult to keep this check in the changed 'switch' code.
      
      - Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
        the work being done and easier to understand) and do_dev_requeue to
        dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
        dev_handle_collision (handle_dev_cpu_collision is a small routine with only
        one caller, so there is no need to have two separate routines; merging
        them also gets rid of two macros, etc.).
      
      - Removed an XXX comment as it should never fail (I suspect this was related
        to batch skb WIP, Jamal ?). Converted some functions to original coding
        style of having the return values and the function name on same line, eg
        prio2list.
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c1361a6
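The switch/case dispatch on the driver's transmit return value, with the "locked" case handled unconditionally rather than only when `lockless` is true, might be modelled as below. This is a hedged sketch: the enum names and values, the action codes and the warning counter are all hypothetical and do not reproduce the kernel's constants or the actual qdisc_restart() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes standing in for the driver's transmit result. */
enum sketch_tx_ret {
	SKETCH_TX_OK,
	SKETCH_TX_BUSY,
	SKETCH_TX_LOCKED
};

/* Hypothetical follow-up actions chosen by the dispatcher. */
enum sketch_action {
	SKETCH_CONTINUE,	/* packet sent, carry on with the queue */
	SKETCH_REQUEUE,		/* put the packet back and retry later */
	SKETCH_COLLISION	/* run the collision handling path */
};

/*
 * Decide what to do with a transmit result. LOCKED always goes to the
 * collision path, so a driver that returns it without being lockless
 * is caught there instead of being silently requeued forever.
 */
enum sketch_action sketch_handle_xmit_ret(int ret, unsigned int *warnings)
{
	assert(warnings != NULL);

	switch (ret) {
	case SKETCH_TX_OK:
		return SKETCH_CONTINUE;
	case SKETCH_TX_LOCKED:
		return SKETCH_COLLISION;
	default:
		/* Anything other than BUSY is a driver bug worth a warning. */
		if (ret != SKETCH_TX_BUSY)
			(*warnings)++;
		return SKETCH_REQUEUE;
	}
}
```

The `default` branch reflects the re-added warning mentioned under "New changes": unexpected values are still requeued, but they are counted instead of passing unnoticed.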
    • J
      [NET_SCHED]: Cleanup readability of qdisc restart · c716a81a
      Jamal Hadi Salim committed
      Over the years this code has gotten hairier, resulting in many long
      discussions over long summer days and patches that get it wrong.
      This patch helps tame that code so normal people will understand it.
      
      Thanks to Thomas Graf, Peter J. Waskiewicz Jr, and Patrick McHardy
      for their valuable reviews.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c716a81a
  12. 04 Jun, 2007 1 commit