1. 06 Sep, 2009: 3 commits
    • net_sched: add classful multiqueue dummy scheduler · 6ec1c69a
      Committed by David S. Miller
      This patch adds a classful dummy scheduler which can be used as root qdisc
      for multiqueue devices and exposes each device queue as a child class.
      
      This allows queues to be addressed individually and grafted like regular
      classes. Additionally, it presents an accumulated view of the statistics of
      all real root qdiscs in the dummy root.
      
      Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
      
      - cl_ops->select_queue selects the tx queue number for new child classes.
      
      - qdisc_ops->attach() overrides root qdisc device grafting to attach
        non-shared qdiscs to the queues.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
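      A minimal C sketch of the idea in this commit: a dummy root owns one child per tx queue, maps class IDs to queues, and sums the children's statistics for dumps. Every name here (struct mq_root, mq_select_queue(), mq_graft(), mq_dump_stats(), MAX_TXQ) is hypothetical and only loosely modelled on the description above. This is not the kernel's sch_mq code, and the mapping "class minor n selects queue n - 1" is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TXQ 8  /* hypothetical upper bound for this sketch */

/* Hypothetical per-queue child qdisc: only its statistics matter here. */
struct child_qdisc {
    unsigned long bytes;
    unsigned long packets;
};

/* Hypothetical dummy root holding one child per device tx queue. */
struct mq_root {
    unsigned int num_tx_queues;
    struct child_qdisc *children[MAX_TXQ];
};

/* Assumed mapping: class 1..num_tx_queues selects queue 0..num_tx_queues-1.
 * Returns -1 for an out-of-range class. */
static int mq_select_queue(const struct mq_root *root, unsigned int classid)
{
    assert(root->num_tx_queues <= MAX_TXQ);
    if (classid == 0 || classid > root->num_tx_queues)
        return -1;
    return (int)(classid - 1);
}

/* Graft a child onto one queue only (nothing is shared between queues).
 * Returns the previous child, or NULL if the slot was empty or the class
 * was invalid. */
static struct child_qdisc *mq_graft(struct mq_root *root, unsigned int classid,
                                    struct child_qdisc *new_child)
{
    int q = mq_select_queue(root, classid);
    struct child_qdisc *old;

    if (q < 0)
        return NULL;
    old = root->children[q];
    root->children[q] = new_child;
    return old;
}

/* Accumulated view: sum the statistics of every per-queue child. */
static void mq_dump_stats(const struct mq_root *root,
                          unsigned long *bytes, unsigned long *packets)
{
    *bytes = 0;
    *packets = 0;
    for (unsigned int i = 0; i < root->num_tx_queues; i++) {
        if (root->children[i]) {
            *bytes += root->children[i]->bytes;
            *packets += root->children[i]->packets;
        }
    }
}
```

In this model, grafting class 1 replaces only queue 0's child. The other queues keep their own non-shared qdiscs, and a dump of the root still reports the totals.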
    • net_sched: move dev_graft_qdisc() to sch_generic.c · 589983cd
      Committed by Patrick McHardy
      It will be used in a following patch by the multiqueue qdisc.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: reintroduce dev->qdisc for use by sch_api · af356afa
      Committed by Patrick McHardy
      Currently the multiqueue integration with the qdisc API suffers from
      a few problems:
      
      - with multiple queues, all root qdiscs use the same handle. This means
        they can't be exposed to userspace in a backwards compatible fashion.
      
      - all API operations always refer to queue number 0. Newly created
        qdiscs are automatically shared between all queues; it's not possible
        to address individual queues or restore multiqueue behaviour once a
        shared qdisc has been attached.
      
      - Dumps only contain the root qdisc of queue 0, in case of non-shared
        qdiscs this means the statistics are incomplete.
      
      This patch reintroduces dev->qdisc, which points to the (single) root qdisc
      from userspace's point of view. Currently it either points to the first
      (non-shared) default qdisc, or a qdisc shared between all queues. The
      following patches will introduce a classful dummy qdisc, which will be used
      as root qdisc and contain the per-queue qdiscs as children.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 31 Aug, 2009: 1 commit
    • pkt_sched: Fix resource limiting in pfifo_fast · a453e068
      Committed by Krishna Kumar
      pfifo_fast_enqueue has this check:
              if (skb_queue_len(list) < qdisc_dev(qdisc)->tx_queue_len) {
      
      which allows each band to enqueue up to tx_queue_len skbs, for a
      total of 3*tx_queue_len skbs. I am not sure this was the intended
      limit for the qdisc.
      
      The patch compiled, and a test with 32 simultaneous netperf sessions ran fine. Also:
      # tc -s qdisc show dev eth2
      qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 16835026752 bytes 373116 pkt (dropped 0, overlimits 0 requeues 25) 
       rate 0bit 0pps backlog 0b 0p requeues 25 
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
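      Here is a hypothetical C model of the three bands that shows the arithmetic. struct fast_qdisc and both enqueue functions are invented for this sketch. With a per-band check, each band can hold tx_queue_len packets, for 3 * tx_queue_len in total. Checking the sum over all bands caps the whole qdisc at tx_queue_len. Treating the total as the intended limit is an assumption based on the message above.

```c
#include <assert.h>
#include <stdbool.h>

#define PFIFO_FAST_BANDS 3

/* Hypothetical stand-in for pfifo_fast: per-band queue lengths only. */
struct fast_qdisc {
    unsigned int band_len[PFIFO_FAST_BANDS];
    unsigned int tx_queue_len;   /* device limit, as in dev->tx_queue_len */
};

/* Old check: each band is compared with the limit on its own. */
static bool enqueue_per_band_limit(struct fast_qdisc *q, int band)
{
    if (q->band_len[band] < q->tx_queue_len) {
        q->band_len[band]++;
        return true;
    }
    return false;   /* dropped */
}

/* Total number of packets held across all bands. */
static unsigned int total_len(const struct fast_qdisc *q)
{
    unsigned int sum = 0;
    for (int b = 0; b < PFIFO_FAST_BANDS; b++)
        sum += q->band_len[b];
    return sum;
}

/* Fixed check: the limit applies to the sum over all bands. */
static bool enqueue_total_limit(struct fast_qdisc *q, int band)
{
    if (total_len(q) < q->tx_queue_len) {
        q->band_len[band]++;
        return true;
    }
    return false;   /* dropped */
}
```

Suppose tx_queue_len is 2 and five packets are offered to each band. The per-band version accepts 6 packets, and the total version accepts 2.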
  3. 29 Aug, 2009: 1 commit
  4. 07 Aug, 2009: 1 commit
    • net: Avoid enqueuing skb for default qdiscs · bbd8a0d3
      Committed by Krishna Kumar
      dev_queue_xmit enqueues an skb and calls qdisc_run, which
      dequeues the skb and xmits it. In most cases, the skb that
      is enqueued is the same one that is dequeued (unless the
      queue gets stopped, or multiple CPUs write to the same queue
      and end up racing with qdisc_run). For default qdiscs, we
      can remove the redundant enqueue/dequeue and simply xmit the
      skb, since the default qdisc is work-conserving.
      
      The patch uses a new flag, TCQ_F_CAN_BYPASS, to identify the
      default fast queue. The controversial part of the patch is
      incrementing qlen when an skb is requeued; this avoids
      checks like the second line below:
      
      +  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
      >>         !q->gso_skb &&
      +          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
      
      Results of 2 hours of testing with multiple netperf sessions (1,
      2, 4, 8 and 12 sessions on a 4-CPU System-X). The BW numbers are
      aggregate Mb/s across iterations, tested with this version on
      System-X boxes with Chelsio 10 Gbps cards:
      
      ---------------------------------------
      Size |  Orig BW (Mb/s)  New BW (Mb/s) |
      ---------------------------------------
      128K |  156964          159381        |
      256K |  158650          162042        |
      ---------------------------------------
      
      Changes from ver1:
      
      1. Move sch_direct_xmit declaration from sch_generic.h to
         pkt_sched.h
      2. Update qdisc basic statistics for direct xmit path.
      3. Set qlen to zero in qdisc_reset.
      4. Changed some function names to more meaningful ones.
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
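      Below is a simplified, single-threaded C sketch of the bypass decision. The qdisc is skipped only when it allows bypass, holds nothing, and is not already running. Because requeued skbs are counted in qlen, the test does not need a separate gso_skb check. struct sketch_qdisc, SKETCH_CAN_BYPASS, requeue() and choose_path() are hypothetical names. The kernel uses an atomic test_and_set_bit() on __QDISC_STATE_RUNNING rather than the plain flag used here.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CAN_BYPASS 0x1u  /* hypothetical flag value */

/* Hypothetical, heavily simplified qdisc state. */
struct sketch_qdisc {
    unsigned int flags;
    unsigned int qlen;   /* queued packets, requeued ones included */
    bool running;        /* stands in for __QDISC_STATE_RUNNING */
};

enum xmit_path { PATH_DIRECT_XMIT, PATH_ENQUEUE };

/* Requeue: the skb is held back and counted in qlen, so the bypass
 * test below sees a non-empty qdisc without checking gso_skb. */
static void requeue(struct sketch_qdisc *q)
{
    q->qlen++;
}

/* Direct xmit only if bypass is allowed, nothing is queued, and we
 * become the runner; otherwise enqueue for qdisc_run() to dequeue.
 * Not thread-safe: the plain bool replaces an atomic bit operation. */
static enum xmit_path choose_path(struct sketch_qdisc *q)
{
    if ((q->flags & SKETCH_CAN_BYPASS) && q->qlen == 0 && !q->running) {
        q->running = true;
        return PATH_DIRECT_XMIT;
    }
    return PATH_ENQUEUE;
}
```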
  5. 18 May, 2009: 1 commit
    • net: tx scalability works : trans_start · 9d21493b
      Committed by Eric Dumazet
      The trans_start field of struct net_device is a hot spot on SMP and
      high-performance devices, particularly multiqueue ones, because every
      transmitter dirties it. Its main users are the tx watchdog and bonding
      liveness checks.
      
      But as most devices don't use NETIF_F_LLTX, we have to lock
      a netdev_queue before calling their ndo_start_xmit(). So it makes
      sense to move trans_start from net_device to netdev_queue. Its update
      will then occur on an already present (and exclusively owned) cache
      line, for free.
      
      We can do this transition smoothly. An old driver continues to
      update dev->trans_start, while an updated one updates txq->trans_start.
      
      Further patches could also put tx_bytes/tx_packets counters in
      netdev_queue to avoid dirtying dev->stats (the vlan device comes to mind).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
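      A hypothetical C model of the transition follows. Converted drivers stamp only the queue they transmitted on, old drivers keep stamping the device, and readers such as a watchdog take the most recent of all timestamps. struct sketch_dev, struct sketch_txq, note_xmit() and last_trans_start() are illustrative names, not kernel APIs. The plain > comparison also ignores jiffies wraparound, which kernel code would handle with time_after().

```c
#include <assert.h>

#define SKETCH_MAX_TXQ 4  /* hypothetical bound for this sketch */

/* Hypothetical stand-in for struct netdev_queue. */
struct sketch_txq {
    unsigned long trans_start;   /* written under this queue's tx lock */
};

/* Hypothetical stand-in for struct net_device. */
struct sketch_dev {
    unsigned long trans_start;   /* still written by unconverted drivers */
    unsigned int num_tx_queues;
    struct sketch_txq txq[SKETCH_MAX_TXQ];
};

/* A converted driver dirties only the queue it transmitted on. */
static void note_xmit(struct sketch_dev *dev, unsigned int q, unsigned long now)
{
    dev->txq[q].trans_start = now;
}

/* Readers take the newest timestamp over the device field and all
 * queues, so old and new drivers are both handled.
 * Note: plain '>' ignores wraparound (kernel code uses time_after()). */
static unsigned long last_trans_start(const struct sketch_dev *dev)
{
    unsigned long res = dev->trans_start;

    for (unsigned int i = 0; i < dev->num_tx_queues; i++)
        if (dev->txq[i].trans_start > res)
            res = dev->txq[i].trans_start;
    return res;
}
```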
  6. 06 Jan, 2009: 1 commit
  7. 05 Jan, 2009: 1 commit
    • net: Fix for initial link state in 2.6.28 · 22604c86
      Committed by Michael Marineau
      From: Michael Marineau <mike@marineau.org>
      
      Commit b4730016 "Do not fire linkwatch
      events until the device is registered." was made as a workaround for
      drivers that call netif_carrier_off before registering the device.
      Unfortunately this causes these drivers to incorrectly report their
      link status as IF_OPER_UNKNOWN which can falsely set the IFF_RUNNING
      flag when the interface is first brought up. This issue was
      previously pointed out[1] but was dismissed on the grounds that IFF_RUNNING is
      not related to the link status. From my digging, IFF_RUNNING, as
      reported to userspace, is based on the link state. It is set based on
      __LINK_STATE_START and IF_OPER_UP or IF_OPER_UNKNOWN. See [2], [3],
      and [4]. (Whether or not the kernel has IFF_RUNNING set in flags is
      not reported to user space, so it may well be independent of the link;
      I don't know if and when it gets set.)
      
      The end result depends slightly on the driver. The two I
      tested were e1000e and b44. With e1000e, if the system is booted
      without a network cable attached the interface will falsely report
      RUNNING when it is brought up causing NetworkManager to attempt to
      start it and eventually time out. With b44 when the system is booted
      with a network cable attached and brought up with dhcpcd it will time
      out the first time.
      
      The attached patch still sets the operstate variable
      correctly to IF_OPER_UP/DOWN/etc. when linkwatch_fire_event is called,
      but then returns, rather than skipping the linkwatch_fire_event call
      entirely as the previous fix did. (Sorry it isn't inline; I don't have
      a patch-friendly email client at the moment.)
      Signed-off-by: David S. Miller <davem@davemloft.net>
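      A minimal C sketch of the ordering this fix describes. It uses hypothetical types (struct sketch_netdev, enum sketch_operstate) and helpers instead of the real linkwatch code. Operstate is always recomputed from the carrier, and only the event scheduling is skipped while the device is unregistered. The three-value enum is a simplification of the kernel's IF_OPER_* states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced operstate values (IF_OPER_* has more). */
enum sketch_operstate { SK_OPER_UNKNOWN, SK_OPER_DOWN, SK_OPER_UP };

struct sketch_netdev {
    bool registered;
    bool carrier;
    enum sketch_operstate operstate;
    unsigned int events_queued;   /* linkwatch events actually scheduled */
};

/* Recompute operstate first, then return early for an unregistered
 * device instead of skipping the whole call. */
static void sketch_fire_event(struct sketch_netdev *dev)
{
    dev->operstate = dev->carrier ? SK_OPER_UP : SK_OPER_DOWN;
    if (!dev->registered)
        return;
    dev->events_queued++;
}

/* What a driver's carrier-off call boils down to in this model. */
static void sketch_carrier_off(struct sketch_netdev *dev)
{
    dev->carrier = false;
    sketch_fire_event(dev);
}
```

In this model, a driver that turns the carrier off before registration ends up with operstate DOWN rather than UNKNOWN, and no event is scheduled for it yet.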
  8. 20 Nov, 2008: 2 commits
    • netdev: network device operations infrastructure · d314774c
      Committed by Stephen Hemminger
      This patch changes the network device internal API to move administrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatibility between the
      new and old models, so that all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virtual function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transition is completed, the nag message can be changed to
      a WARN_ON, and the compatibility code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
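      The split between an ops table and the old per-device pointers can be sketched in C as follows. struct sketch_dev_ops, struct sketch_dev and sketch_compat_ops() are hypothetical and cover only open/stop, while the real struct net_device_ops has many more members. The shim copies the new ops out to the old fields, as the message describes for callers that have not been converted yet.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_dev;

/* Hypothetical ops table: administrative operations live here now. */
struct sketch_dev_ops {
    int (*open)(struct sketch_dev *dev);
    int (*stop)(struct sketch_dev *dev);
};

struct sketch_dev {
    const struct sketch_dev_ops *ops;      /* new model */
    int (*open)(struct sketch_dev *dev);   /* old per-device pointers, */
    int (*stop)(struct sketch_dev *dev);   /* kept for compatibility */
    int is_up;
};

/* Compatibility shim: copy the new ops out to the old fields so that
 * code still calling dev->open / dev->stop keeps working. */
static void sketch_compat_ops(struct sketch_dev *dev)
{
    if (dev->ops != NULL) {
        dev->open = dev->ops->open;
        dev->stop = dev->ops->stop;
    }
}

/* A converted driver only has to provide a const ops table. */
static int demo_open(struct sketch_dev *dev) { dev->is_up = 1; return 0; }
static int demo_stop(struct sketch_dev *dev) { dev->is_up = 0; return 0; }

static const struct sketch_dev_ops demo_ops = {
    .open = demo_open,
    .stop = demo_stop,
};
```

Keeping the table const and shared between all devices of a driver is one reason such a split is attractive. The per-device structure then only carries a single pointer.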
    • net: Do not fire linkwatch events until the device is registered. · b4730016
      Committed by David S. Miller
      Several device drivers try to do things like netif_carrier_off()
      before register_netdev() is invoked.  This is bogus, but too many
      drivers do this to fix them all up in one go.
      Reported-by: Folkert van Heusden <folkert@vanheusden.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 14 Nov, 2008: 1 commit
  10. 03 Nov, 2008: 1 commit
  11. 31 Oct, 2008: 1 commit
  12. 20 Oct, 2008: 1 commit
    • pkt_sched: sch_generic: Fix oops in sch_teql · 9f3ffae0
      Committed by Jarek Poplawski
      After these commands:
      # modprobe sch_teql
      # tc qdisc add dev eth0 root teql0
      # tc qdisc del dev eth0 root
      we get an oops in teql_destroy() when spin_lock is taken on a NULL
      qdisc_sleeping pointer. This is because the teql0 device hasn't been
      activated yet at that point, and qdisc_root_sleeping() points to the noop
      qdisc's netdev_queue, whose qdisc_sleeping is uninitialized. This patch
      fixes this for both the noop and noqueue netdev_queues to avoid similar
      problems in the future.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 09 Oct, 2008: 1 commit
  14. 07 Oct, 2008: 2 commits
  15. 23 Sep, 2008: 3 commits
  16. 17 Sep, 2008: 1 commit
  17. 09 Sep, 2008: 1 commit
  18. 27 Aug, 2008: 1 commit
  19. 22 Aug, 2008: 1 commit
  20. 19 Aug, 2008: 1 commit
    • pkt_sched: Don't hold qdisc lock over qdisc_destroy(). · 4d8863a2
      Committed by David S. Miller
      Based upon reports by Denys Fedoryshchenko, and feedback
      and help from Jarek Poplawski and Herbert Xu.
      
      We always either:
      
      1) Never made an external reference to this qdisc.
      
      or
      
      2) Did a dev_deactivate() which purged all asynchronous
         references.
      
      So do not lock the qdisc when we call qdisc_destroy();
      doing so is illegal anyway, because by the time we drop
      the lock the memory has been freed.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 18 Aug, 2008: 3 commits
    • pkt_sched: No longer destroy qdiscs from RCU. · 1e0d5a57
      Committed by David S. Miller
      We can now kill them synchronously with all of the
      previous dev_deactivate() cures.
      
      This makes netdev destruction and shutdown saner as
      the qdiscs hold references to the device.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: Simplify dev_deactivate() polling loop. · 4335cd2d
      Committed by David S. Miller
      The condition under which the previous qdisc has no more references
      after we've attached &noop_qdisc is that RUNNING and SCHED
      are both seen clear while holding the root lock.
      
      So just make specifically that check in the polling loop, instead
      of this overly complex "check without then check with lock held"
      sequence.
      Signed-off-by: David S. Miller <davem@davemloft.net>
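      The simplified polling condition can be modelled in C11 as below. This is a sketch under stated assumptions. An atomic_flag spinlock stands in for the root lock, SK_STATE_RUNNING and SK_STATE_SCHED stand in for the kernel's state bits, and sketch_qdisc_busy() and wait_for_quiescence() are hypothetical names. Both bits are read together while the lock is held, and the loop polls the previously attached qdisc rather than &noop_qdisc.

```c
#define _POSIX_C_SOURCE 200809L   /* for sched_yield() */
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bits standing in for __QDISC_STATE_RUNNING/SCHED. */
#define SK_STATE_RUNNING 0x1u
#define SK_STATE_SCHED   0x2u

struct sketch_qdisc {
    atomic_flag root_lock;   /* stands in for the qdisc root lock */
    atomic_uint state;
};

static void root_lock(struct sketch_qdisc *q)
{
    while (atomic_flag_test_and_set_explicit(&q->root_lock,
                                             memory_order_acquire))
        sched_yield();
}

static void root_unlock(struct sketch_qdisc *q)
{
    atomic_flag_clear_explicit(&q->root_lock, memory_order_release);
}

/* The single check: RUNNING and SCHED observed together under the lock. */
static bool sketch_qdisc_busy(struct sketch_qdisc *q)
{
    bool busy;

    root_lock(q);
    busy = (atomic_load(&q->state) &
            (SK_STATE_RUNNING | SK_STATE_SCHED)) != 0;
    root_unlock(q);
    return busy;
}

/* Poll the old qdisc (the one that was attached before &noop_qdisc)
 * until no transmitter can still be using it. */
static void wait_for_quiescence(struct sketch_qdisc *old)
{
    while (sketch_qdisc_busy(old))
        sched_yield();
}
```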
    • pkt_sched: Add 'deactivated' state. · a9312ae8
      Committed by David S. Miller
      This new state lets dev_deactivate() mark a qdisc as having been
      deactivated.
      
      dev_queue_xmit() and ing_filter() check for this bit and do not
      try to process the qdisc if the bit is set.
      
      dev_deactivate() polls the qdisc after setting the bit, waiting
      for both __QDISC_STATE_RUNNING and __QDISC_STATE_SCHED to clear.
      
      This isn't perfect yet, but subsequent changesets will make it so.
      This part is just one piece of the puzzle.
      Signed-off-by: David S. Miller <davem@davemloft.net>
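      Here is a hypothetical C11 sketch of the new bit. The deactivating side sets it, and the transmit path checks it before touching the qdisc. SK_STATE_DEACTIVATED, struct sketch_qdisc, sketch_deactivate() and sketch_xmit() are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SK_STATE_DEACTIVATED 0x4u   /* hypothetical bit value */

struct sketch_qdisc {
    atomic_uint state;
    unsigned int enqueued;
};

/* dev_deactivate() side: mark the qdisc so new transmitters stay away. */
static void sketch_deactivate(struct sketch_qdisc *q)
{
    atomic_fetch_or(&q->state, SK_STATE_DEACTIVATED);
}

/* dev_queue_xmit() side: do not process a deactivated qdisc.
 * Returns false when the packet is dropped instead of enqueued. */
static bool sketch_xmit(struct sketch_qdisc *q)
{
    if (atomic_load(&q->state) & SK_STATE_DEACTIVATED)
        return false;
    q->enqueued++;
    return true;
}
```

A transmitter can still pass the check just before the bit is set. That window is why dev_deactivate() also polls for RUNNING and SCHED to clear, and why the message calls this only one piece of the puzzle.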
  22. 14 Aug, 2008: 1 commit
    • pkt_sched: Fix queue quiescence testing in dev_deactivate(). · b9a3b110
      Committed by David S. Miller
      Based upon discussions with Jarek P. and Herbert Xu.
      
      First, we're testing the wrong qdisc.  We just reset the device
      queue qdiscs to &noop_qdisc, and checking its state is completely
      pointless here.
      
      We want to wait until the previous qdisc that was sitting at
      the ->qdisc pointer is not busy any more.  And that would be
      ->qdisc_sleeping.
      
      Because of how we propagate the sampled qdisc pointer down into
      qdisc_run and friends via the per-cpu ->output_queue and netif_schedule,
      we also have to wait for the __QDISC_STATE_SCHED bit to clear.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 03 Aug, 2008: 1 commit
  24. 01 Aug, 2008: 1 commit
  25. 30 Jul, 2008: 1 commit
    • pkt_sched: Fix OOPS on ingress qdisc add. · 8d50b53d
      Committed by David S. Miller
      Bug report from Steven Jan Springl:
      
      	Issuing the following command causes a kernel oops:
      		tc qdisc add dev eth0 handle ffff: ingress
      
      The problem mostly stems from all of the special case handling of
      ingress qdiscs.
      
      So, to fix this, do the grafting operation the same way we do for TX
      qdiscs. This means that dev_activate() and dev_deactivate() now do
      the "qdisc_sleeping <--> qdisc" transitions on dev->rx_queue too.
      
      Future simplifications are possible now, mainly because it is
      impossible for dev_queue->{qdisc,qdisc_sleeping} to be NULL.  There
      are NULL checks all over to handle the ingress qdisc special case
      that used to exist before this commit.
      Signed-off-by: David S. Miller <davem@davemloft.net>
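      A minimal C sketch of the symmetric handling, using hypothetical types and helper names. Each queue parks its configured qdisc in qdisc_sleeping. Activation copies that into qdisc, and deactivation points qdisc at a shared noop placeholder. Applying the same two helpers to the rx queue is the change described above. As long as qdisc_sleeping is initialized, neither pointer is ever NULL.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_qdisc { const char *name; };

/* Shared placeholder attached while a queue is deactivated. */
static struct sketch_qdisc sketch_noop = { "noop" };

/* Hypothetical netdev_queue: qdisc_sleeping holds the configured qdisc,
 * qdisc is what the data path currently uses. */
struct sketch_queue {
    struct sketch_qdisc *qdisc;
    struct sketch_qdisc *qdisc_sleeping;
};

struct sketch_dev {
    struct sketch_queue tx_queue;   /* a single tx queue for brevity */
    struct sketch_queue rx_queue;   /* ingress, handled the same way */
};

static void queue_activate(struct sketch_queue *q)
{
    q->qdisc = q->qdisc_sleeping;
}

static void queue_deactivate(struct sketch_queue *q)
{
    q->qdisc = &sketch_noop;
}

/* tx and rx go through identical sleeping <--> active transitions,
 * so no ingress special case is needed. */
static void sketch_dev_activate(struct sketch_dev *dev)
{
    queue_activate(&dev->tx_queue);
    queue_activate(&dev->rx_queue);
}

static void sketch_dev_deactivate(struct sketch_dev *dev)
{
    queue_deactivate(&dev->tx_queue);
    queue_deactivate(&dev->rx_queue);
}
```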
  26. 26 Jul, 2008: 1 commit
  27. 25 Jul, 2008: 1 commit
  28. 22 Jul, 2008: 3 commits
  29. 21 Jul, 2008: 1 commit
  30. 20 Jul, 2008: 1 commit