1. 16 5月, 2012 1 次提交
  2. 02 4月, 2012 1 次提交
  3. 04 1月, 2012 1 次提交
    • E
      net_sched: qdisc_alloc_handle() can be too slow · fa0f5aa7
      Eric Dumazet 提交于
      When trying to allocate ~32768 qdiscs using autohandle mechanism, we can
      fill the space managed by kernel (handles in [8000-FFFF]:0000 range)
      
      But O(N^2) qdisc_alloc_handle() loops 0x10000 times instead of 0x8000
      
      time tc add qdisc add dev eth0 parent 10:7fff pfifo limit 10
      RTNETLINK answers: Cannot allocate memory
      real    1m54.826s
      user    0m0.000s
      sys     0m0.004s
      
      INFO: rcu_sched_state detected stall on CPU 0 (t=60000 jiffies)
      
      Half number of loops, and add a cond_resched() call.
      We hold rtnl at this point.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Dave Taht <dave.taht@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa0f5aa7
  4. 06 7月, 2011 1 次提交
  5. 10 6月, 2011 1 次提交
    • G
      rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose 提交于
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c7ac8679
  6. 20 5月, 2011 1 次提交
  7. 26 2月, 2011 1 次提交
  8. 21 1月, 2011 2 次提交
    • E
      net_sched: RCU conversion of stab · a2da570d
      Eric Dumazet 提交于
      This patch converts stab qdisc management to RCU, so that we can perform
      the qdisc_calculate_pkt_len() call before getting qdisc lock.
      
      This shortens the lock's held time in __dev_xmit_skb().
      
      This permits more qdiscs to get TCQ_F_CAN_BYPASS status, avoiding lot of
      cache misses and so reducing latencies.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2da570d
    • E
      net_sched: move TCQ_F_THROTTLED flag · fd245a4a
      Eric Dumazet 提交于
      In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
      ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
      the cache line containing qdisc lock and often dirtied fields.
      
      I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
      mostly, and shared by all cpus. This should speedup HTB/CBQ for example.
      
      Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
      "unsigned int" for __state container, reducing by 8 bytes Qdisc size.
      
      Introduce helpers to hide implementation details.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd245a4a
  9. 20 1月, 2011 1 次提交
  10. 05 10月, 2010 1 次提交
  11. 30 9月, 2010 1 次提交
  12. 19 8月, 2010 1 次提交
  13. 11 8月, 2010 1 次提交
  14. 10 8月, 2010 1 次提交
  15. 24 5月, 2010 1 次提交
  16. 18 5月, 2010 1 次提交
  17. 31 3月, 2010 1 次提交
  18. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  19. 23 3月, 2010 1 次提交
  20. 29 1月, 2010 1 次提交
  21. 26 11月, 2009 1 次提交
  22. 11 11月, 2009 1 次提交
  23. 07 10月, 2009 1 次提交
    • E
      pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9
      Eric Dumazet 提交于
      Jarek Poplawski a écrit :
      >
      >
      > Hmm... So you made me to do some "real" work here, and guess what?:
      > there is one serious checkpatch warning! ;-) Plus, this new parameter
      > should be added to the function description. Otherwise:
      > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      >
      > Thanks,
      > Jarek P.
      >
      > PS: I guess full "Don't" would show we really mean it...
      
      Okay :) Here is the last round, before the night !
      
      Thanks again
      
      [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
      
      We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
      is running.
      
      # tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
      
      User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
      one (because no estimator is active)
      
      After this patch, tc command output is :
      $ tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      
      We add a parameter to gnet_stats_copy_rate_est() function so that
      it can use gen_estimator_active(bstats, r), as suggested by Jarek.
      
      This parameter can be NULL if check is not necessary, (htb for
      example has a mandatory rate estimator)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d250a5f9
  24. 16 9月, 2009 1 次提交
  25. 15 9月, 2009 2 次提交
  26. 10 9月, 2009 1 次提交
    • P
      net_sched: fix estimator lock selection for mq child qdiscs · 23bcf634
      Patrick McHardy 提交于
      When new child qdiscs are attached to the mq qdisc, they are actually
      attached as root qdiscs to the device queues. The lock selection for
      new estimators incorrectly picks the root lock of the existing and
      to be replaced qdisc, which results in a use-after-free once the old
      qdisc has been destroyed.
      
      Mark mq qdisc instances with a new flag and treat qdiscs attached to
      mq as children similar to regular root qdiscs.
      
      Additionally prevent estimators from being attached to the mq qdisc
      itself since it only updates its byte and packet counters during dumps.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23bcf634
  27. 06 9月, 2009 4 次提交
    • D
      net_sched: add classful multiqueue dummy scheduler · 6ec1c69a
      David S. Miller 提交于
      This patch adds a classful dummy scheduler which can be used as root qdisc
      for multiqueue devices and exposes each device queue as a child class.
      
      This allows to address queues individually and graft them similar to regular
      classes. Additionally it presents an accumulated view of the statistics of
      all real root qdiscs in the dummy root.
      
      Two new callbacks are added to the qdisc_ops and qdisc_class_ops:
      
      - cl_ops->select_queue selects the tx queue number for new child classes.
      
      - qdisc_ops->attach() overrides root qdisc device grafting to attach
        non-shared qdiscs to the queues.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ec1c69a
    • P
      net_sched: move dev_graft_qdisc() to sch_generic.c · 589983cd
      Patrick McHardy 提交于
      It will be used in a following patch by the multiqueue qdisc.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      589983cd
    • P
      net_sched: reintroduce dev->qdisc for use by sch_api · af356afa
      Patrick McHardy 提交于
      Currently the multiqueue integration with the qdisc API suffers from
      a few problems:
      
      - with multiple queues, all root qdiscs use the same handle. This means
        they can't be exposed to userspace in a backwards compatible fashion.
      
      - all API operations always refer to queue number 0. Newly created
        qdiscs are automatically shared between all queues, its not possible
        to address individual queues or restore multiqueue behaviour once a
        shared qdisc has been attached.
      
      - Dumps only contain the root qdisc of queue 0, in case of non-shared
        qdiscs this means the statistics are incomplete.
      
      This patch reintroduces dev->qdisc, which points to the (single) root qdisc
      from userspace's point of view. Currently it either points to the first
      (non-shared) default qdisc, or a qdisc shared between all queues. The
      following patches will introduce a classful dummy qdisc, which will be used
      as root qdisc and contain the per-queue qdiscs as children.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af356afa
    • P
      net_sched: make cls_ops->change and cls_ops->delete optional · de6d5cdf
      Patrick McHardy 提交于
      Some schedulers don't support creating, changing or deleting classes.
      Make the respective callbacks optionally and consistently return
      -EOPNOTSUPP for unsupported operations, instead of currently either
      -EOPNOTSUPP, -ENOSYS or no error.
      
      In case of sch_prio and sch_multiq, the removed operations additionally
      checked for an invalid class. This is not necessary since the class
      argument can only orginate from ->get() or in case of ->change is 0
      for creation of new classes, in which case ->change() incorrectly
      returned -ENOENT.
      
      As a side-effect, this patch fixes a possible (root-only) NULL pointer
      function call in sch_ingress, which didn't implement a so far mandatory
      ->delete() operation.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de6d5cdf
  28. 05 9月, 2009 1 次提交
  29. 03 9月, 2009 1 次提交
  30. 02 9月, 2009 1 次提交
    • D
      pkt_sched: Revert tasklet_hrtimer changes. · 2fbd3da3
      David S. Miller 提交于
      These are full of unresolved problems, mainly that conversions don't
      work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
      tasklets can't be killed from softirq context.
      
      And when a qdisc gets reset, that's exactly what we need to do here.
      
      We'll work this out in the net-next-2.6 tree and if warranted we'll
      backport that work to -stable.
      
      This reverts the following 3 changesets:
      
      a2cb6a4d
      ("pkt_sched: Fix bogon in tasklet_hrtimer changes.")
      
      38acce2d
      ("pkt_sched: Convert CBQ to tasklet_hrtimer.")
      
      ee5f9757
      ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fbd3da3
  31. 25 8月, 2009 1 次提交
  32. 23 8月, 2009 1 次提交
  33. 15 6月, 2009 1 次提交
  34. 01 2月, 2009 1 次提交
  35. 23 12月, 2008 1 次提交