1. 11 6月, 2016 1 次提交
    • E
      net_sched: remove generic throttled management · 45f50bed
      Eric Dumazet 提交于
      __QDISC_STATE_THROTTLED bit manipulation is rather expensive
      for HTB and few others.
      
      I already removed it for sch_fq in commit f2600cf0
      ("net: sched: avoid costly atomic operation in fq_dequeue()")
      and so far nobody complained.
      
      When one ore more packets are stuck in one or more throttled
      HTB class, a htb dequeue() performs two atomic operations
      to clear/set __QDISC_STATE_THROTTLED bit, while root qdisc
      lock is held.
      
      Removing this pair of atomic operations bring me a 8 % performance
      increase on 200 TCP_RR tests, in presence of throttled classes.
      
      This patch has no side effect, since nothing actually uses
      disc_is_throttled() anymore.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45f50bed
  2. 09 6月, 2016 1 次提交
  3. 08 6月, 2016 1 次提交
    • E
      net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1
      Eric Dumazet 提交于
      Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
      agent [1] are problematic at scale :
      
      For each qdisc/class found in the dump, we currently lock the root qdisc
      spinlock in order to get stats. Sampling stats every 5 seconds from
      thousands of HTB classes is a challenge when the root qdisc spinlock is
      under high pressure. Not only the dumps take time, they also slow
      down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.
      
      An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
      that might need the qdisc lock in fq_codel_dump_stats() and
      fq_codel_dump_class_stats()
      
      In v2 of this patch, I now use the Qdisc running seqcount to provide
      consistent reads of packets/bytes counters, regardless of 32/64 bit arches.
      
      I also changed rate estimators to use the same infrastructure
      so that they no longer need to lock root qdisc lock.
      
      [1]
      http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdfSigned-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Kevin Athey <kda@google.com>
      Cc: Xiaotian Pei <xiaotian@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      edb09eb1
  4. 04 6月, 2016 1 次提交
  5. 01 3月, 2016 2 次提交
  6. 28 8月, 2015 1 次提交
    • D
      net: sched: consolidate tc_classify{,_compat} · 3b3ae880
      Daniel Borkmann 提交于
      For classifiers getting invoked via tc_classify(), we always need an
      extra function call into tc_classify_compat(), as both are being
      exported as symbols and tc_classify() itself doesn't do much except
      handling of reclassifications when tp->classify() returned with
      TC_ACT_RECLASSIFY.
      
      CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
      all others use tc_classify(). When tc actions are being configured
      out in the kernel, tc_classify() effectively does nothing besides
      delegating.
      
      We could spare this layer and consolidate both functions. pktgen on
      single CPU constantly pushing skbs directly into the netif_receive_skb()
      path with a dummy classifier on ingress qdisc attached, improves
      slightly from 22.3Mpps to 23.1Mpps.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b3ae880
  7. 30 9月, 2014 4 次提交
  8. 14 9月, 2014 2 次提交
  9. 14 3月, 2014 1 次提交
  10. 11 6月, 2013 1 次提交
    • E
      net_sched: add 64bit rate estimators · 45203a3b
      Eric Dumazet 提交于
      struct gnet_stats_rate_est contains u32 fields, so the bytes per second
      field can wrap at 34360Mbit.
      
      Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
      and switch the kernel to use this structure natively.
      
      This structure is dumped to user space as a new attribute :
      
      TCA_STATS_RATE_EST64
      
      Old tc command will now display the capped bps (to 34360Mbit), instead
      of wrapped values, and updated tc command will display correct
      information.
      
      Old tc command output, after patch :
      
      eric:~# tc -s -d qd sh dev lo
      qdisc pfifo 8001: root refcnt 2 limit 1000p
       Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
       rate 34360Mbit 189696pps backlog 0b 0p requeues 0
      
      This patch carefully reorganizes "struct Qdisc" layout to get optimal
      performance on SMP.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45203a3b
  11. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  12. 11 5月, 2012 1 次提交
  13. 02 4月, 2012 1 次提交
  14. 24 12月, 2011 1 次提交
    • E
      sch_hfsc: report backlog information · f5a59b73
      Eric Dumazet 提交于
      Add backlog (byte count) information in hfsc classes and qdisc, so that
      "tc -s" can report it to user, instead of 0 values :
      
      qdisc hfsc 1: root refcnt 6 default 20
       Sent 45141660 bytes 30545 pkt (dropped 0, overlimits 91751 requeues 0)
       rate 1492Kbit 126pps backlog 103226b 74p requeues 0
      ...
      class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 400000bit ls m1 0bit d 0us m2 200000bit
       Sent 49534912 bytes 33519 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 81822b 56p requeues 0
       period 23 work 49451576 bytes rtwork 13277552 bytes level 0
      ...
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: John A. Sullivan III <jsullivan@opensourcedevel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5a59b73
  15. 21 1月, 2011 2 次提交
    • E
      net_sched: accurate bytes/packets stats/rates · 9190b3b3
      Eric Dumazet 提交于
      In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
      a problem with pfifo_head drops that incorrectly decreased
      sch->bstats.bytes and sch->bstats.packets
      
      Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
      previously enqueued packet, and bstats cannot be changed, so
      bstats/rates are not accurate (over estimated)
      
      This patch changes the qdisc_bstats updates to be done at dequeue() time
      instead of enqueue() time. bstats counters no longer account for dropped
      frames, and rates are more correct, since enqueue() bursts dont have
      effect on dequeue() rate.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9190b3b3
    • E
      net_sched: move TCQ_F_THROTTLED flag · fd245a4a
      Eric Dumazet 提交于
      In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
      ops) I moved QDISC_STATE_RUNNING flag to __state container, located in
      the cache line containing qdisc lock and often dirtied fields.
      
      I now move TCQ_F_THROTTLED bit too, so that we let first cache line read
      mostly, and shared by all cpus. This should speedup HTB/CBQ for example.
      
      Not using test_bit()/__clear_bit()/__test_and_set_bit allows to use an
      "unsigned int" for __state container, reducing by 8 bytes Qdisc size.
      
      Introduce helpers to hide implementation details.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd245a4a
  16. 20 1月, 2011 1 次提交
  17. 11 1月, 2011 1 次提交
  18. 21 10月, 2010 1 次提交
  19. 02 9月, 2010 1 次提交
  20. 18 5月, 2010 2 次提交
  21. 07 10月, 2009 1 次提交
    • E
      pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9
      Eric Dumazet 提交于
      Jarek Poplawski a écrit :
      >
      >
      > Hmm... So you made me to do some "real" work here, and guess what?:
      > there is one serious checkpatch warning! ;-) Plus, this new parameter
      > should be added to the function description. Otherwise:
      > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      >
      > Thanks,
      > Jarek P.
      >
      > PS: I guess full "Don't" would show we really mean it...
      
      Okay :) Here is the last round, before the night !
      
      Thanks again
      
      [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
      
      We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
      is running.
      
      # tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
      
      User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake
      one (because no estimator is active)
      
      After this patch, tc command output is :
      $ tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      
      We add a parameter to gnet_stats_copy_rate_est() function so that
      it can use gen_estimator_active(bstats, r), as suggested by Jarek.
      
      This parameter can be NULL if check is not necessary, (htb for
      example has a mandatory rate estimator)
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d250a5f9
  22. 21 9月, 2009 1 次提交
  23. 06 9月, 2009 1 次提交
  24. 18 8月, 2009 1 次提交
  25. 09 6月, 2009 1 次提交
  26. 16 3月, 2009 1 次提交
    • J
      pkt_sched: Change misleading code in class delete. · 7cd0a638
      Jarek Poplawski 提交于
      While looking for a possible reason of bugzilla report on HTB oops:
      http://bugzilla.kernel.org/show_bug.cgi?id=12858
      I found the code in htb_delete calling htb_destroy_class on zero
      refcount is very misleading: it can suggest this is a common path, and
      destroy is called under sch_tree_lock. Actually, this can never happen
      like this because before deletion cops->get() is done, and after
      delete a class is still used by tclass_notify. The class destroy is
      always called from cops->put(), so without sch_tree_lock.
      
      This doesn't mean much now (since 2.6.27) because all vulnerable calls
      were moved from htb_destroy_class to htb_delete, but there was a bug
      in older kernels. The same change is done for other classful scheds,
      which, it seems, didn't have similar locking problems here.
      Reported-by: Nm0sia <m0sia@m0sia.ru>
      Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cd0a638
  27. 01 2月, 2009 1 次提交
  28. 26 11月, 2008 2 次提交
  29. 20 11月, 2008 1 次提交
  30. 14 11月, 2008 1 次提交
  31. 31 10月, 2008 2 次提交