1. 24 May 2011 (1 commit)
    • sch_sfq: avoid giving spurious NET_XMIT_CN signals · 8efa8854
      Committed by Eric Dumazet
      While chasing a possible net_sched bug, I found that IP fragments have
      little chance to get through a congested SFQ qdisc:
      
      - Say the SFQ qdisc is full because one flow is non-responsive.
      - ip_fragment() wants to send two fragments belonging to an idle flow.
      - sfq_enqueue() queues the first fragment, but sees that the queue limit
      is reached.
      - sfq_enqueue() drops one packet from the 'big consumer' flow and returns
      NET_XMIT_CN.
      - ip_fragment() cancels the remaining fragments.
      
      This patch restores fairness, making sure we return NET_XMIT_CN only if
      we dropped a packet from the same flow.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 April 2011 (1 commit)
  3. 03 February 2011 (1 commit)
  4. 22 January 2011 (1 commit)
    • net_sched: TCQ_F_CAN_BYPASS generalization · 23624935
      Committed by Eric Dumazet
      Now that qdisc stab is handled before the TCQ_F_CAN_BYPASS test in
      __dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to qdiscs other
      than pfifo_fast: pfifo, bfifo, pfifo_head_drop and sfq.
      
      SFQ is special because it can have external classifiers, and in these
      cases we cannot bypass the queue discipline (the packet could be dropped
      by a classifier) unless the admin asks for it, or without further changes.
      
      It's worth doing this, especially for SFQ, to avoid dirtying memory when
      no packets are already waiting in the queue.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 21 January 2011 (2 commits)
    • net_sched: accurate bytes/packets stats/rates · 9190b3b3
      Committed by Eric Dumazet
      In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
      a problem with pfifo_head drops that incorrectly decreased
      sch->bstats.bytes and sch->bstats.packets.
      
      Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
      previously enqueued packet, and bstats cannot be changed afterwards, so
      bstats/rates are not accurate (they are overestimated).
      
      This patch changes the qdisc_bstats updates to be done at dequeue() time
      instead of enqueue() time. bstats counters no longer account for dropped
      frames, and rates are more correct, since enqueue() bursts don't affect
      the dequeue() rate.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sfq: allow divisor to be a parameter · 817fb15d
      Committed by Eric Dumazet
      SFQ currently uses a 1024-slot hash table, and allocating its internal
      structure (sfq_sched_data) needs an order-1 page on x86_64.
      
      Allow the tc command to specify a divisor value (hash table size)
      between 1 and 65536.
      If no value is provided, assume the default size of 1024.
      
      This allows admins to set up smaller (or bigger) SFQ qdiscs for specific needs.
      
      This also brings sfq_sched_data allocations back to order-0, saving
      3KB per SFQ qdisc.
      
      Jesper uses ~55,000 SFQ qdiscs on one machine; this patch should free
      165 MB of memory.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 20 January 2011 (1 commit)
  7. 11 January 2011 (1 commit)
  8. 01 January 2011 (2 commits)
  9. 23 December 2010 (1 commit)
    • sfq: fix sfq class stats handling · ee09b3c1
      Committed by Eric Dumazet
      sfq_walk() runs without the qdisc lock. By the time it selects a non-empty
      hash slot and sfq_dump_class_stats() runs (with the lock held), the slot
      might have been freed: we then access q->slots[SFQ_EMPTY_SLOT], out of
      bounds, and crash in slot_queue_walk().
      
      On previous kernels the bug is also present, but the out-of-bounds
      qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located inside struct
      sfq_sched_data, so no illegal memory access happens; only possibly
      wrong data is reported to the user.
      
      Also, slot_dequeue_tail() should make sure the slot's skb chain is
      correctly terminated, or sfq_dump_class_stats() can access freed skbs.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 21 December 2010 (3 commits)
    • net_sched: sch_sfq: better struct layouts · eda83e3b
      Committed by Eric Dumazet
      Here is a respin of the patch.
      
      I'll send a short patch to make SFQ fairer in the presence of large
      packets as well.
      
      Thanks
      
      [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts
      
      This patch shrinks sizeof(struct sfq_sched_data)
      from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and
      reduces text size as well.
      
         text    data     bss     dec     hex filename
         4821     152       0    4973    136d old/net/sched/sch_sfq.o
         4627     136       0    4763    129b new/net/sched/sch_sfq.o
      
      All data for a slot/flow is now grouped in a compact and cache-friendly
      structure, instead of being spread across many different places.
      
      struct sfq_slot {
              struct sk_buff  *skblist_next;
              struct sk_buff  *skblist_prev;
              sfq_index       qlen; /* number of skbs in skblist */
              sfq_index       next; /* next slot in sfq chain */
              struct sfq_head dep; /* anchor in dep[] chains */
              unsigned short  hash; /* hash value (index in ht[]) */
              short           allot; /* credit for this slot */
      };
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jarek Poplawski <jarkao2@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_sfq: fix allot handling · aa3e2199
      Committed by Eric Dumazet
      When deploying SFQ/IFB here at work, I found that allot management in
      sfq was pretty wrong, even after changing allot from short to int...
      
      We should initialize allot for each new flow instead of reusing a
      previous value found in the slot.
      
      Before the patch, I saw bursts of several packets per flow, apparently
      ignoring the default "quantum 1514" limit I had on my SFQ class.
      
      class sfq 11:1 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 7p requeues 0 
       allot 11546 
      
      class sfq 11:46 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 1p requeues 0 
       allot -23873 
      
      class sfq 11:78 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 5p requeues 0 
       allot 11393 
      
      After the patch, fairness among flows is better, the allot limit is
      respected, and allot is positive:
      
      class sfq 11:e parent 11: 
       (dropped 0, overlimits 0 requeues 86) 
       backlog 0b 3p requeues 86 
       allot 596 
      
      class sfq 11:94 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 3p requeues 0 
       allot 1468 
      
      class sfq 11:a4 parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 4p requeues 0 
       allot 650 
      
      class sfq 11:bb parent 11: 
       (dropped 0, overlimits 0 requeues 0) 
       backlog 0b 3p requeues 0 
       allot 596 
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() · c4266263
      Committed by Eric Dumazet
      For each active SFQ slot we currently return the number of packets in
      the queue. We can also report the number of bytes accounted to these packets.
      
      tc -s class show dev ifb0
      
      Before the patch:
      
      class sfq 11:3d9 parent 11:
       (dropped 0, overlimits 0 requeues 0)
       backlog 0b 3p requeues 0
       allot 1266
      
      After the patch:
      
      class sfq 11:3e4 parent 11:
       (dropped 0, overlimits 0 requeues 0)
       backlog 4380b 3p requeues 0
       allot 1212
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 20 August 2010 (1 commit)
  12. 11 August 2010 (1 commit)
  13. 10 August 2010 (2 commits)
  14. 08 August 2010 (1 commit)
  15. 05 August 2010 (1 commit)
  16. 21 April 2010 (1 commit)
  17. 30 March 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  18. 06 September 2009 (1 commit)
    • net_sched: make cls_ops->change and cls_ops->delete optional · de6d5cdf
      Committed by Patrick McHardy
      Some schedulers don't support creating, changing or deleting classes.
      Make the respective callbacks optional and consistently return
      -EOPNOTSUPP for unsupported operations, instead of currently either
      -EOPNOTSUPP, -ENOSYS or no error.
      
      In the case of sch_prio and sch_multiq, the removed operations additionally
      checked for an invalid class. This is not necessary since the class
      argument can only originate from ->get() or, in the case of ->change, is 0
      for creation of new classes, in which case ->change() incorrectly
      returned -ENOENT.
      
      As a side-effect, this patch fixes a possible (root-only) NULL pointer
      function call in sch_ingress, which didn't implement a so far mandatory
      ->delete() operation.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 03 June 2009 (1 commit)
  20. 09 January 2009 (1 commit)
  21. 22 December 2008 (1 commit)
  22. 14 November 2008 (1 commit)
  23. 31 October 2008 (1 commit)
  24. 21 September 2008 (1 commit)
  25. 05 August 2008 (2 commits)
    • net_sched: Add qdisc __NET_XMIT_BYPASS flag · c27f339a
      Committed by Jarek Poplawski
      Patrick McHardy <kaber@trash.net> noticed that it would be nice to
      handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag
      __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit().
      
      David Miller <davem@davemloft.net> spotted a serious bug in the first
      version of this patch.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: Add qdisc __NET_XMIT_STOLEN flag · 378a2f09
      Committed by Jarek Poplawski
      Patrick McHardy <kaber@trash.net> noticed:
      "The other problem that affects all qdiscs supporting actions is
      TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
      even though the packet is not queued, corrupting upper qdiscs'
      qlen counters."
      
      and later explained:
      "The reason why it translates it at all seems to be to not increase
      the drops counter. Within a single qdisc this could be avoided by
      other means easily, upper qdiscs would still increase the counter
      when we return anything besides NET_XMIT_SUCCESS though.
      
      This means we need a new NET_XMIT return value to indicate this to
      the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
      return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
      in dev_queue_xmit, similar to NET_XMIT_BYPASS."
      
      David Miller <davem@davemloft.net> noticed:
      "Maybe these NET_XMIT_* values being passed around should be a set of
      bits. They could be composed of base meanings, combined with specific
      attributes.
      
      So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"
      
      The attributes get masked out by the top-level ->enqueue() caller,
      such that the base meanings are the only thing that make their
      way up into the stack. If it's only about communication within the
      qdisc tree, let's simply code it that way."
      
      This patch is trying to realize these ideas.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 26 July 2008 (1 commit)
  27. 24 July 2008 (1 commit)
  28. 20 July 2008 (1 commit)
  29. 09 July 2008 (1 commit)
  30. 02 July 2008 (1 commit)
  31. 29 April 2008 (1 commit)
  32. 01 February 2008 (2 commits)
  33. 29 January 2008 (1 commit)