1. 30 Sep, 2014 1 commit
    • net: sched: make bstats per cpu and estimator RCU safe · 22e0f8b9
      Committed by John Fastabend
      To run qdiscs without locking, statistics and estimators need to be
      handled correctly.
      
      To resolve bstats, make the statistics per cpu. Because this is only
      needed for qdiscs that run without locks, which will not be the case
      for most qdiscs in the near future, only create percpu stats when a
      qdisc sets the TCQ_F_CPUSTATS flag.
      
      Next, because estimators use the bstats to calculate packets per
      second and bytes per second, the estimator code paths are updated
      to use the per cpu statistics.
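
The per-cpu idea described above can be sketched in user-space C as follows. This is only an illustration under stated assumptions: `NR_CPUS_SKETCH`, `struct bstats_percpu`, `bstats_cpu_update()` and `bstats_sum()` are hypothetical names, not the kernel's API, and a plain array stands in for real per-cpu allocation.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4 /* assumption: fixed CPU count for the sketch */

struct bstats {
    uint64_t bytes;
    uint32_t packets;
};

/* Hypothetical container: one counter slot per CPU. */
struct bstats_percpu {
    struct bstats cpu[NR_CPUS_SKETCH];
};

/* Writer: each CPU touches only its own slot, so no shared lock is needed. */
static void bstats_cpu_update(struct bstats_percpu *s, int cpu, uint32_t len)
{
    assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
    s->cpu[cpu].bytes += len;
    s->cpu[cpu].packets += 1;
}

/* Reader (for example a rate estimator): fold all slots into one total. */
static struct bstats bstats_sum(const struct bstats_percpu *s)
{
    struct bstats total = { 0, 0 };

    for (int i = 0; i < NR_CPUS_SKETCH; i++) {
        total.bytes += s->cpu[i].bytes;
        total.packets += s->cpu[i].packets;
    }
    return total;
}
```

The reader pays the cost of summing every slot, which is acceptable when reads (estimator ticks, stats dumps) are rare compared with per-packet updates.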
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Sep, 2014 2 commits
  3. 14 Mar, 2014 1 commit
  4. 11 Jun, 2013 1 commit
    • net_sched: add 64bit rate estimators · 45203a3b
      Committed by Eric Dumazet
      struct gnet_stats_rate_est contains u32 fields, so the bytes per second
      field can wrap at 34360Mbit.
      
      Add a new gnet_stats_rate_est64 structure to get 64bit bps/pps fields,
      and switch the kernel to use this structure natively.
      
      This structure is dumped to user space as a new attribute:
      
      TCA_STATS_RATE_EST64
      
      An old tc command will now display the bps capped at 34360Mbit instead
      of a wrapped value, and an updated tc command will display the correct
      rate.
      
      Old tc command output, after the patch:
      
      eric:~# tc -s -d qd sh dev lo
      qdisc pfifo 8001: root refcnt 2 limit 1000p
       Sent 80868245400 bytes 1978837 pkt (dropped 0, overlimits 0 requeues 0)
       rate 34360Mbit 189696pps backlog 0b 0p requeues 0
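
The 34360Mbit figure follows from the largest u32 bytes-per-second value multiplied by 8 bits. A small sketch of the difference between wrapping and capping, assuming hypothetical helper names (`bps_wrapped`, `bps_capped`, `bps_to_mbit`) that do not appear in the patch:

```c
#include <stdint.h>

/* What a 32-bit field would hold if the 64-bit value simply wrapped. */
static uint32_t bps_wrapped(uint64_t bps)
{
    return (uint32_t)bps; /* modulo 2^32 */
}

/* Saturate to the largest value a u32 can hold instead of wrapping. */
static uint32_t bps_capped(uint64_t bps)
{
    return bps > UINT32_MAX ? UINT32_MAX : (uint32_t)bps;
}

/* Bytes per second to Mbit per second, rounded to nearest. */
static uint64_t bps_to_mbit(uint64_t bps)
{
    return (bps * 8 + 500000) / 1000000;
}
```

For a 5,000,000,000 bytes/s flow (40 Gbit/s), the capped value converts to 34360 Mbit, while the wrapped value would convert to a misleading 5640 Mbit.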
      
      This patch carefully reorganizes "struct Qdisc" layout to get optimal
      performance on SMP.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 28 Feb, 2013 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Committed by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. The list iterators take three arguments:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
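
A three-argument hlist iterator can be sketched in user-space C as below. This is a simplified illustration, not the contents of linux/list.h: the node here has no `pprev` field, `struct item` and `sum_items()` are hypothetical, and `__typeof__` is a GNU C extension (GCC and Clang).

```c
#include <stddef.h>

/* Simplified singly linked hash-list node (no pprev in this sketch). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* NULL-safe conversion from a node pointer to its containing entry. */
#define hlist_entry_safe(ptr, type, member) \
    ((ptr) ? container_of(ptr, type, member) : NULL)

/* The cursor is the entry itself, so no separate node variable is needed. */
#define hlist_for_each_entry(pos, head, member)                              \
    for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member);  \
         pos;                                                                \
         pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical entry type used only to exercise the macro. */
struct item {
    int value;
    struct hlist_node link;
};

static int sum_items(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    hlist_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

Because the entry type can be derived from the cursor with `__typeof__`, the loop can step from entry to entry directly, which is why the extra node parameter is not required.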
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 11 May, 2012 1 commit
  7. 02 Apr, 2012 1 commit
  8. 24 Dec, 2011 1 commit
    • sch_hfsc: report backlog information · f5a59b73
      Committed by Eric Dumazet
      Add backlog (byte count) information to hfsc classes and the qdisc, so
      that "tc -s" can report it to the user instead of zero values:
      
      qdisc hfsc 1: root refcnt 6 default 20
       Sent 45141660 bytes 30545 pkt (dropped 0, overlimits 91751 requeues 0)
       rate 1492Kbit 126pps backlog 103226b 74p requeues 0
      ...
      class hfsc 1:20 parent 1:1 leaf 1201: rt m1 0bit d 0us m2 400000bit ls m1 0bit d 0us m2 200000bit
       Sent 49534912 bytes 33519 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 81822b 56p requeues 0
       period 23 work 49451576 bytes rtwork 13277552 bytes level 0
      ...
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: John A. Sullivan III <jsullivan@opensourcedevel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 21 Jan, 2011 2 commits
    • net_sched: accurate bytes/packets stats/rates · 9190b3b3
      Committed by Eric Dumazet
      In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
      a problem with pfifo_head drops that incorrectly decreased
      sch->bstats.bytes and sch->bstats.packets.
      
      Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
      previously enqueued packet. Because bstats cannot be changed at that
      point, bstats/rates are not accurate (overestimated).
      
      This patch changes the qdisc_bstats updates to be done at dequeue() time
      instead of enqueue() time. bstats counters no longer account for dropped
      frames, and rates are more correct, since enqueue() bursts don't affect
      the dequeue() rate.
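
The accounting change can be illustrated with a toy head-drop FIFO in user-space C. Everything here is an assumption made for the sketch (`QLIMIT`, `struct pkt_queue`, `q_enqueue()`, `q_dequeue()`); it only shows that counting at dequeue time keeps dropped packets out of the byte and packet totals.

```c
#include <assert.h>
#include <stdint.h>

#define QLIMIT 2 /* assumption: tiny limit so that drops are easy to trigger */

struct pkt_queue {
    uint32_t len[QLIMIT];  /* lengths of queued packets, oldest first */
    unsigned int qlen;
    uint64_t bytes;        /* bstats: bytes that actually left the queue */
    uint32_t packets;      /* bstats: packets that actually left the queue */
    uint32_t drops;
};

/* Head-drop enqueue: when full, discard the oldest packet. No bstats here. */
static void q_enqueue(struct pkt_queue *q, uint32_t len)
{
    if (q->qlen == QLIMIT) {
        for (unsigned int i = 1; i < QLIMIT; i++)
            q->len[i - 1] = q->len[i];
        q->qlen--;
        q->drops++;
    }
    assert(q->qlen < QLIMIT);
    q->len[q->qlen++] = len;
}

/* Dequeue: the packet really leaves the queue, so account for it now. */
static int q_dequeue(struct pkt_queue *q, uint32_t *len)
{
    if (q->qlen == 0)
        return 0;
    *len = q->len[0];
    for (unsigned int i = 1; i < q->qlen; i++)
        q->len[i - 1] = q->len[i];
    q->qlen--;
    q->bytes += *len;
    q->packets++;
    return 1;
}
```

Enqueueing 100, 200 and 300 bytes into this two-slot queue drops the first packet; after draining, the counters report 500 bytes and 2 packets instead of 600 bytes and 3 packets.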
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: move TCQ_F_THROTTLED flag · fd245a4a
      Committed by Eric Dumazet
      In commit 37112105 (net: QDISC_STATE_RUNNING dont need atomic bit
      ops) I moved the QDISC_STATE_RUNNING flag to the __state container,
      located in the cache line containing the qdisc lock and often dirtied
      fields.
      
      I now move the TCQ_F_THROTTLED bit too, so that the first cache line
      stays read-mostly and shared by all cpus. This should speed up HTB/CBQ,
      for example.
      
      Not using test_bit()/__clear_bit()/__test_and_set_bit() allows an
      "unsigned int" to be used for the __state container, reducing the Qdisc
      size by 8 bytes.
      
      Introduce helpers to hide implementation details.
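
Helpers of this kind might look like the sketch below. The names (`struct qdisc_sketch`, `STATE_THROTTLED`, `sketch_*`) and the bit value are hypothetical and chosen for illustration; the kernel's own definitions may differ. The point is that plain mask operations work on an `unsigned int`, whereas the generic bitops operate on `unsigned long` words.

```c
#include <stdbool.h>

/* Illustrative bit value only; not taken from the kernel headers. */
#define STATE_THROTTLED 2u

struct qdisc_sketch {
    unsigned int state; /* plain word instead of an unsigned long bitmap */
};

/* Callers use these helpers and never touch the bit layout directly. */
static bool sketch_is_throttled(const struct qdisc_sketch *q)
{
    return (q->state & STATE_THROTTLED) != 0;
}

static void sketch_set_throttled(struct qdisc_sketch *q)
{
    q->state |= STATE_THROTTLED;
}

static void sketch_clear_throttled(struct qdisc_sketch *q)
{
    q->state &= ~STATE_THROTTLED;
}
```

These non-atomic updates are only safe when the caller already serializes access to the state word, which is the assumption this sketch makes.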
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Jesper Dangaard Brouer <hawk@diku.dk>
      CC: Jarek Poplawski <jarkao2@gmail.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 20 Jan, 2011 1 commit
  11. 11 Jan, 2011 1 commit
  12. 21 Oct, 2010 1 commit
  13. 02 Sep, 2010 1 commit
  14. 18 May, 2010 2 commits
  15. 07 Oct, 2009 1 commit
    • pkt_sched: gen_estimator: Dont report fake rate estimators · d250a5f9
      Committed by Eric Dumazet
      Jarek Poplawski wrote:
      >
      >
      > Hmm... So you made me to do some "real" work here, and guess what?:
      > there is one serious checkpatch warning! ;-) Plus, this new parameter
      > should be added to the function description. Otherwise:
      > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      >
      > Thanks,
      > Jarek P.
      >
      > PS: I guess full "Don't" would show we really mean it...
      
      Okay :) Here is the last round, before the night !
      
      Thanks again
      
      [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators
      
      We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator
      is running.
      
      # tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0)
       rate 0bit 0pps backlog 0b 0p requeues 0
      
      The user has no way to tell whether "rate 0bit 0pps" is a real
      estimation or a fake one (because no estimator is active).
      
      After this patch, tc command output is:
      $ tc -s -d qdisc
      qdisc pfifo_fast 0: dev eth0 root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
       Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0)
       backlog 0b 0p requeues 0
      
      We add a parameter to gnet_stats_copy_rate_est() function so that
      it can use gen_estimator_active(bstats, r), as suggested by Jarek.
      
      This parameter can be NULL if the check is not necessary (htb, for
      example, has a mandatory rate estimator).
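
The optional check can be sketched as follows. This is not the kernel's gnet_stats_copy_rate_est(); the registry type, `est_active()` and `copy_rate_est()` are hypothetical stand-ins that only show the control flow: skip the rate attribute when no estimator is attached, and bypass the check when the caller passes NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rate_est { uint32_t bps, pps; };
struct basic_stats { uint64_t bytes; uint32_t packets; };

/* Hypothetical stand-in for the estimator list: one active entry at most. */
struct est_registry { const struct basic_stats *active_bstats; };

static bool est_active(const struct est_registry *reg,
                       const struct basic_stats *b)
{
    return reg->active_bstats == b;
}

/* Returns true if a rate attribute was emitted into *out. */
static bool copy_rate_est(const struct est_registry *reg,
                          const struct basic_stats *b, /* NULL: skip check */
                          const struct rate_est *r, struct rate_est *out)
{
    if (b != NULL && !est_active(reg, b))
        return false; /* no estimator running: report nothing */
    *out = *r;
    return true;
}
```

With this shape, a reader of the dump can distinguish "no estimator" (attribute absent) from a measured rate of zero (attribute present with zero values).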
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 21 Sep, 2009 1 commit
  17. 06 Sep, 2009 1 commit
  18. 18 Aug, 2009 1 commit
  19. 09 Jun, 2009 1 commit
  20. 16 Mar, 2009 1 commit
    • pkt_sched: Change misleading code in class delete. · 7cd0a638
      Committed by Jarek Poplawski
      While looking for a possible cause of the bugzilla report on an HTB oops:
      http://bugzilla.kernel.org/show_bug.cgi?id=12858
      I found that the code in htb_delete calling htb_destroy_class on zero
      refcount is very misleading: it can suggest this is a common path, and
      that destroy is called under sch_tree_lock. Actually, this can never
      happen like this, because cops->get() is done before deletion, and after
      delete a class is still used by tclass_notify. The class destroy is
      always called from cops->put(), so without sch_tree_lock.
      
      This doesn't mean much now (since 2.6.27) because all vulnerable calls
      were moved from htb_destroy_class to htb_delete, but there was a bug
      in older kernels. The same change is done for other classful scheds,
      which, it seems, didn't have similar locking problems here.
      Reported-by: m0sia <m0sia@m0sia.ru>
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 01 Feb, 2009 1 commit
  22. 26 Nov, 2008 2 commits
  23. 20 Nov, 2008 1 commit
  24. 14 Nov, 2008 1 commit
  25. 31 Oct, 2008 2 commits
  26. 27 Aug, 2008 1 commit
  27. 05 Aug, 2008 2 commits
    • net_sched: Add qdisc __NET_XMIT_BYPASS flag · c27f339a
      Committed by Jarek Poplawski
      Patrick McHardy <kaber@trash.net> noticed that it would be nice to
      handle NET_XMIT_BYPASS by NET_XMIT_SUCCESS with an internal qdisc flag
      __NET_XMIT_BYPASS and to remove the mapping from dev_queue_xmit().
      
      David Miller <davem@davemloft.net> spotted a serious bug in the first
      version of this patch.
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: Add qdisc __NET_XMIT_STOLEN flag · 378a2f09
      Committed by Jarek Poplawski
      Patrick McHardy <kaber@trash.net> noticed:
      "The other problem that affects all qdiscs supporting actions is
      TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
      even though the packet is not queued, corrupting upper qdiscs'
      qlen counters."
      
      and later explained:
      "The reason why it translates it at all seems to be to not increase
      the drops counter. Within a single qdisc this could be avoided by
      other means easily, upper qdiscs would still increase the counter
      when we return anything besides NET_XMIT_SUCCESS though.
      
      This means we need a new NET_XMIT return value to indicate this to
      the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
      return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
      in dev_queue_xmit, similar to NET_XMIT_BYPASS."
      
      David Miller <davem@davemloft.net> noticed:
      "Maybe these NET_XMIT_* values being passed around should be a set of
      bits. They could be composed of base meanings, combined with specific
      attributes.
      
      So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"
      
      The attributes get masked out by the top-level ->enqueue() caller,
      such that the base meanings are the only thing that make their
      way up into the stack. If it's only about communication within the
      qdisc tree, let's simply code it that way."
      
      This patch is trying to realize these ideas.
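
The "base meaning plus attribute bits" scheme discussed above can be sketched like this. The constants and function names are illustrative assumptions (`XMIT_*`, `xmit_drop_count()`, `xmit_base()`, `parent_account()`), not the kernel's definitions; the sketch shows a parent qdisc that neither counts a stolen packet as queued nor as dropped, and a top-level caller that masks the internal bits away.

```c
/* Base meanings occupy the low bits (illustrative values). */
#define XMIT_SUCCESS  0x00
#define XMIT_DROP     0x01
#define XMIT_MASK     0x0f

/* Internal attributes, meaningful only inside the qdisc tree. */
#define XMIT_F_STOLEN 0x00010000
#define XMIT_F_BYPASS 0x00020000

struct xmit_counters {
    unsigned int qlen;
    unsigned int drops;
};

/* Should a non-success result from a child bump the drop counter? */
static int xmit_drop_count(int ret)
{
    return (ret & XMIT_F_STOLEN) ? 0 : 1;
}

/* Top-level caller: strip internal attributes before returning upward. */
static int xmit_base(int ret)
{
    return ret & XMIT_MASK;
}

/* Parent qdisc accounting for the result of a child's enqueue. */
static int parent_account(struct xmit_counters *c, int child_ret)
{
    if (child_ret != XMIT_SUCCESS) {
        if (xmit_drop_count(child_ret))
            c->drops++;
        return child_ret; /* not queued: leave qlen untouched */
    }
    c->qlen++;
    return XMIT_SUCCESS;
}
```

A child that returns `XMIT_SUCCESS | XMIT_F_STOLEN` therefore leaves both `qlen` and `drops` unchanged in the parent, while the caller at the top of the tree still sees plain success after masking.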
      Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 20 Jul, 2008 2 commits
  29. 18 Jul, 2008 1 commit
  30. 09 Jul, 2008 3 commits
  31. 06 Jul, 2008 1 commit