Commits · a73ed26bbae7327370c5bd298f07de78df9e3466 · openanolis / cloud-kernel

10 Dec, 2011 1 commit

sch_red: generalize accurate MAX_P support to RED/GRED/CHOKE · a73ed26b

Authored by Eric Dumazet on Dec 09, 2011

Now RED uses a Q0.32 number to store max_p (max probability), allow
RED/GRED/CHOKE to use/report full resolution at config/dump time.

Old tc binaries are not aware of the new attributes, and still set/get Plog.

New tc binaries set/get both Plog and max_p for backward compatibility;
they display a "probability" value when they get max_p from a new kernel.

# tc -d  qdisc show dev ...
...
qdisc red 10: parent 1:1 limit 360Kb min 30Kb max 90Kb ecn ewma 5
probability 0.09 Scell_log 15

Make sure we avoid potential divides by 0 in reciprocal_value(), if
(max_th - min_th) is big.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a73ed26b
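
The relation between the legacy Plog encoding and a Q0.32 max_p can be sketched as follows. This is an illustrative sketch only, not the kernel or tc code: the helper names (plog_to_q32, prob_to_q32, q32_to_prob) are hypothetical, and it assumes max_p = 1/(2^Plog) with Plog between 1 and 31.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy encoding: max_p = 1 / 2^Plog, expressed as a Q0.32 fraction.
 * Hypothetical helper; Plog = 0 (p = 1.0) does not fit in Q0.32. */
static uint32_t plog_to_q32(unsigned int plog)
{
	assert(plog >= 1 && plog <= 31);
	return (uint32_t)1 << (32 - plog);
}

/* Convert a probability in [0, 1) to Q0.32 (stored value = p * 2^32). */
static uint32_t prob_to_q32(double p)
{
	assert(p >= 0.0 && p < 1.0);
	return (uint32_t)(p * 4294967296.0);
}

/* Convert a Q0.32 value back to a floating-point probability for display. */
static double q32_to_prob(uint32_t q)
{
	return (double)q / 4294967296.0;
}
```

With these helpers, a Plog of 2 maps to the same stored value as a probability of 0.25, while a value such as 0.09 has no exact Plog equivalent, which is why the full-resolution attribute is useful.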

09 Dec, 2011 1 commit

sch_red: Adaptative RED AQM · 8af2a218

Authored by Eric Dumazet on Dec 08, 2011

Adaptive RED AQM for Linux, based on the paper by Sally Floyd,
Ramakrishna Gummadi, and Scott Shenker, August 2001:

http://icir.org/floyd/papers/adaptiveRed.pdf

The goal of Adaptive RED is to make max_p a dynamic value between 1% and
50% so that the average queue settles around the target, midway between
min_th and max_th: min_th + (max_th - min_th) / 2

Every 500 ms:
 if (avg > target and max_p <= 0.5)
  increase max_p : max_p += alpha;
 else if (avg < target and max_p >= 0.01)
  decrease max_p : max_p *= beta;

target : [min_th + 0.4*(max_th - min_th),
          min_th + 0.6*(max_th - min_th)].
alpha : min(0.01, max_p / 4)
beta : 0.9
max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)

Changes against our RED implementation are :

max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
fixed point number, to allow the full range described in the Adaptive RED paper.

To deliver a random number, we now use a reciprocal divide (that's really
a multiply), but this operation is done once per marked/dropped packet
when in the RED_BETWEEN_TRESH window, so the added cost (compared to the previous
AND operation) is near zero.

dump operation gives current max_p value in a new TCA_RED_MAX_P
attribute.

Example on a 10Mbit link :

tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
   limit 400000 min 30000 max 90000 avpkt 1000 \
   burst 55 ecn adaptative bandwidth 10Mbit

# tc -s -d qdisc show dev eth3
...
qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
adaptative ewma 5 max_p=0.113335 Scell_log 15
 Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
 rate 9749Kbit 831pps backlog 72056b 16p requeues 0
  marked 1357 early 35 pdrop 0 other 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8af2a218
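
The periodic max_p adjustment described in this commit message can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the kernel implementation: the struct and function names (ared_state, ared_adapt) are hypothetical, the target band is taken as 40% to 60% of the way from min_th to max_th, and the caller is assumed to invoke the function once every 500 ms.

```c
#include <assert.h>
#include <stdint.h>

#define Q32_ONE  4294967296.0                   /* 2^32 */
#define MAX_P_LO ((uint32_t)(0.01 * Q32_ONE))   /* 1% in Q0.32 */
#define MAX_P_HI ((uint32_t)(0.50 * Q32_ONE))   /* 50% in Q0.32 */

/* Hypothetical state for the sketch; thresholds are in bytes. */
struct ared_state {
	uint32_t min_th;
	uint32_t max_th;
	uint32_t max_p;   /* Q0.32 fixed point */
};

/* One adaptation step, meant to be called every 500 ms with the
 * current average queue size. */
static void ared_adapt(struct ared_state *s, uint32_t avg)
{
	assert(s->max_th > s->min_th);

	uint32_t delta = (s->max_th - s->min_th) / 5;
	uint32_t target_min = s->min_th + 2 * delta;   /* min_th + 0.4 * range */
	uint32_t target_max = s->min_th + 3 * delta;   /* min_th + 0.6 * range */

	if (avg > target_max && s->max_p <= MAX_P_HI) {
		/* alpha = min(0.01, max_p / 4) */
		uint32_t alpha = s->max_p / 4;
		if (alpha > MAX_P_LO)
			alpha = MAX_P_LO;
		s->max_p += alpha;
	} else if (avg < target_min && s->max_p >= MAX_P_LO) {
		/* beta = 0.9, computed with integer arithmetic */
		s->max_p = (s->max_p / 10) * 9;
	}
}
```

Working in Q0.32 keeps the update in integer arithmetic; the additive increase and multiplicative decrease follow the alpha and beta values listed in the message.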

02 Dec, 2011 1 commit

sch_red: fix red_change · 1ee5fa1e

Authored by Eric Dumazet on Dec 01, 2011

On Wednesday 30 November 2011 at 14:36 -0800, Stephen Hemminger wrote:

> (Almost) nobody uses RED because they can't figure it out.
> According to Wikipedia, VJ says that:
>  "there are not one, but two bugs in classic RED."

RED is useful for high throughput routers, I doubt many linux machines
act as such devices.

I was considering adding Adaptive RED (Sally Floyd, Ramakrishna
Gummadi, Scott Shenker, August 2001).

In this version, maxp is dynamic (from 1% to 50%), and the user only has to
set min_th (target average queue size);
max_th and wq (burst in Linux RED) are set automatically.

By the way it seems we have a small bug in red_change()

if (skb_queue_empty(&sch->q))
	red_end_of_idle_period(&q->parms);

First, if queue is empty, we should call
red_start_of_idle_period(&q->parms);

Second, since we no longer use sch->q but q->qdisc, the test is
meaningless.

Oh well...

[PATCH] sch_red: fix red_change()

Now that RED is classful, we must check q->qdisc->q.qlen, and if the queue is empty,
we start an idle period, not end it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1ee5fa1e

21 Jan, 2011 1 commit

net_sched: accurate bytes/packets stats/rates · 9190b3b3

Authored by Eric Dumazet on Jan 20, 2011

In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (overestimated).

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts don't have
an effect on the dequeue() rate.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9190b3b3
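
The effect of counting at dequeue() time can be illustrated with a toy head-drop queue. This is a self-contained sketch, not kernel code: toy_qdisc, toy_enqueue and toy_dequeue are hypothetical names, and the fixed capacity of 4 packets is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QCAP 4   /* toy capacity, arbitrary */

/* Hypothetical stand-in for a qdisc with bstats-style counters. */
struct toy_qdisc {
	uint32_t len[QCAP];        /* packet lengths, used as a ring buffer */
	size_t head, count;
	uint64_t bytes, packets;   /* updated only at dequeue time */
	uint64_t drops;
};

/* Enqueue a packet; when full, drop the oldest one (head drop).
 * The byte/packet counters are deliberately not touched here. */
static void toy_enqueue(struct toy_qdisc *q, uint32_t pkt_len)
{
	assert(pkt_len > 0);   /* 0 is reserved to signal "queue empty" */
	if (q->count == QCAP) {
		q->head = (q->head + 1) % QCAP;
		q->count--;
		q->drops++;
	}
	q->len[(q->head + q->count) % QCAP] = pkt_len;
	q->count++;
}

/* Dequeue a packet and account for it; returns 0 if the queue is empty. */
static uint32_t toy_dequeue(struct toy_qdisc *q)
{
	uint32_t pkt_len;

	if (q->count == 0)
		return 0;
	pkt_len = q->len[q->head];
	q->head = (q->head + 1) % QCAP;
	q->count--;
	q->bytes += pkt_len;
	q->packets++;
	return pkt_len;
}
```

Because a packet dropped after being enqueued never reaches toy_dequeue(), it is never counted, so the counters reflect only traffic that actually left the queue.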

20 Jan, 2011 1 commit

net_sched: cleanups · cc7ec456

Authored by Eric Dumazet on Jan 19, 2011

Cleanup net/sched code to current CodingStyle and practices.

Reduce inline abuse
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc7ec456

11 Jan, 2011 1 commit

net_sched: factorize qdisc stats handling · bfe0d029

Authored by Eric Dumazet on Jan 09, 2011

HTB takes into account that an skb may be segmented when updating stats.
Generalize this to all schedulers.

They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets

Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.

Note : Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
stab is set up on the qdisc.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bfe0d029
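
A segmentation-aware stats helper of the kind described here might look like the sketch below. It is an illustration with mock types, not the kernel's qdisc_bstats_update(): toy_skb, toy_bstats and toy_bstats_update are hypothetical, and a gso_segs value of 0 is assumed to mean "not segmented".

```c
#include <assert.h>
#include <stdint.h>

/* Mock packet: gso_segs == 0 is assumed to mean "not a segmented skb". */
struct toy_skb {
	uint32_t len;
	uint16_t gso_segs;
};

/* Mock byte/packet counters. */
struct toy_bstats {
	uint64_t bytes;
	uint32_t packets;
};

/* Account one skb: bytes always grow by its length, packets grow by the
 * number of segments it will be split into (or 1 if not segmented). */
static void toy_bstats_update(struct toy_bstats *b, const struct toy_skb *skb)
{
	assert(b != 0 && skb != 0);
	b->bytes += skb->len;
	b->packets += skb->gso_segs ? skb->gso_segs : 1;
}
```

Centralizing this in one helper means every scheduler counts a large segmented skb as the number of wire packets it represents, instead of each scheduler updating the two fields by hand.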

04 Jan, 2011 1 commit

sch_red: report backlog information · 0dfb33a0

Authored by Eric Dumazet on Jan 03, 2011

Provide child qdisc backlog (byte count) information so that "tc -s
qdisc" can report it to user.

packet count is already correctly provided.

qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn
 Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0)
 rate 242385Kbit 13630pps backlog 13560b 8p requeues 0
  marked 7865 early 1 pdrop 7 other 0
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0dfb33a0

18 May, 2010 1 commit

net: Remove unnecessary returns from void function()s · 3fa21e07

Authored by Joe Perches on May 17, 2010

This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3fa21e07

06 Sep, 2009 3 commits

net_sched: remove some unnecessary checks in classful schedulers · 5b9a9ccf

Authored by Patrick McHardy on Sep 04, 2009

The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all
originate from either ->get() or ->walk() and are always valid.

Remove unnecessary checks.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b9a9ccf

net_sched: make cls_ops->change and cls_ops->delete optional · de6d5cdf

Authored by Patrick McHardy on Sep 04, 2009

Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optional and consistently return
-EOPNOTSUPP for unsupported operations, instead of currently either
-EOPNOTSUPP, -ENOSYS or no error.

In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only originate from ->get() or, in case of ->change(), is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.

As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

de6d5cdf

net_sched: make cls_ops->tcf_chain() optional · 71ebe5e9

Authored by Patrick McHardy on Sep 04, 2009

Some qdiscs don't support attaching filters. Handle this centrally in
cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

71ebe5e9

20 Nov, 2008 1 commit

pkt_sched: remove unnecessary xchg() in packet schedulers · b94c8afc

Authored by Patrick McHardy on Nov 20, 2008

The use of xchg() hasn't been necessary since 2.2.something when proper
locking was added to packet schedulers.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b94c8afc

14 Nov, 2008 1 commit

pkt_sched: Remove qdisc->ops->requeue() etc. · f30ab418

Authored by Jarek Poplawski on Nov 13, 2008

After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.

The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f30ab418

31 Oct, 2008 1 commit

pkt_sched: Add qdisc->ops->peek() implementation. · 8e3af978

Authored by Jarek Poplawski on Oct 31, 2008

Add qdisc->ops->peek() implementation for work-conserving qdiscs.
With feedback from Patrick McHardy.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8e3af978

05 Aug, 2008 1 commit

net_sched: Add qdisc __NET_XMIT_STOLEN flag · 378a2f09

Authored by Jarek Poplawski on Aug 04, 2008

Patrick McHardy <kaber@trash.net> noticed:
"The other problem that affects all qdiscs supporting actions is
TC_ACT_QUEUED/TC_ACT_STOLEN getting mapped to NET_XMIT_SUCCESS
even though the packet is not queued, corrupting upper qdiscs'
qlen counters."

and later explained:
"The reason why it translates it at all seems to be to not increase
the drops counter. Within a single qdisc this could be avoided by
other means easily, upper qdiscs would still increase the counter
when we return anything besides NET_XMIT_SUCCESS though.

This means we need a new NET_XMIT return value to indicate this to
the upper qdiscs. So I'd suggest to introduce NET_XMIT_STOLEN,
return that to upper qdiscs and translate it to NET_XMIT_SUCCESS
in dev_queue_xmit, similar to NET_XMIT_BYPASS."

David Miller <davem@davemloft.net> noticed:
"Maybe these NET_XMIT_* values being passed around should be a set of
bits. They could be composed of base meanings, combined with specific
attributes.

So you could say "NET_XMIT_DROP | __NET_XMIT_NO_DROP_COUNT"

The attributes get masked out by the top-level ->enqueue() caller,
such that the base meanings are the only thing that make their
way up into the stack. If it's only about communication within the
qdisc tree, let's simply code it that way."

This patch is trying to realize these ideas.
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

378a2f09
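
The "base meaning plus attribute bits" idea quoted in this message can be sketched as follows. The constants and names are hypothetical (TOY_ prefixed) and the numeric values are assumptions chosen for illustration; they are not taken from the kernel headers.

```c
#include <assert.h>

/* Illustrative values only; the real kernel constants may differ. */
#define TOY_XMIT_SUCCESS 0x00
#define TOY_XMIT_DROP    0x01
#define TOY_XMIT_MASK    0xFFFF       /* base meanings live in the low bits */
#define TOY_XMIT_STOLEN  0x00010000   /* attribute bit used inside the qdisc tree */

/* Mock parent qdisc state. */
struct toy_parent {
	unsigned int qlen;
	unsigned int drops;
};

/* Parent qdisc handling the result of a child's enqueue():
 * - plain success: the packet is queued below, so qlen grows;
 * - stolen: not queued, but must not be counted as a drop;
 * - anything else: counted as a drop. */
static int toy_parent_enqueue(struct toy_parent *p, int child_ret)
{
	assert(p != 0);
	if (child_ret != TOY_XMIT_SUCCESS) {
		if (!(child_ret & TOY_XMIT_STOLEN))
			p->drops++;
		return child_ret;
	}
	p->qlen++;
	return TOY_XMIT_SUCCESS;
}

/* Top-level caller: strip attribute bits so only base meanings escape. */
static int toy_strip(int ret)
{
	return ret & TOY_XMIT_MASK;
}
```

A child that consumed the packet returns TOY_XMIT_SUCCESS | TOY_XMIT_STOLEN; the parent neither bumps qlen nor counts a drop, and the top-level caller still sees plain success after masking.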

20 Jul, 2008 2 commits

net_sched: Add accessor function for packet length for qdiscs · 0abf77e5

Authored by Jussi Kivilinna on Jul 20, 2008

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

0abf77e5

net_sched: Add qdisc_enqueue wrapper · 5f86173b

Authored by Jussi Kivilinna on Jul 20, 2008

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

5f86173b

06 Jul, 2008 1 commit

net-sched: consolidate default fifo qdisc setup · fb0305ce

Authored by Patrick McHardy on Jul 05, 2008

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb0305ce

04 Jun, 2008 1 commit

netlink: Improve returned error codes · bc3ed28c

Authored by Thomas Graf on Jun 03, 2008

Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc3ed28c

29 Jan, 2008 4 commits

[NET_SCHED]: Use nla_policy for attribute validation in packet schedulers · 27a3421e

Authored by Patrick McHardy on Jan 23, 2008

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

27a3421e

[NET_SCHED]: Propagate nla_parse return value · cee63723

Authored by Patrick McHardy on Jan 23, 2008

nla_parse() returns more detailed errno codes; propagate them back on
error.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cee63723

[NET_SCHED]: Convert packet schedulers from rtnetlink to new netlink API · 1e90474c

Authored by Patrick McHardy on Jan 22, 2008

Convert packet schedulers to use the netlink API. Unfortunately a gradual
conversion is not possible without breaking compilation in the middle or
adding lots of casts, so this patch converts them all in one step. The
patch has been mostly generated automatically with some minor edits to
at least allow separate conversion of classifiers and actions.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e90474c

[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. · 20fea08b

Authored by Eric Dumazet on Nov 14, 2007

Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20fea08b

11 Jul, 2007 1 commit

[NET_SCHED]: Remove unnecessary includes · 0ba48053

Authored by Patrick McHardy on Jul 02, 2007

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ba48053

11 Feb, 2007 1 commit

[NET] SCHED: Fix whitespace errors. · 10297b99

Authored by YOSHIFUJI Hideaki on Feb 09, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

10297b99

03 Dec, 2006 2 commits

[NET_SCHED]: Fix endless loops (part 2): "simple" qdiscs · 5e50da01

Authored by Patrick McHardy on Nov 29, 2006

Convert the "simple" qdiscs to use qdisc_tree_decrease_qlen() where
necessary:

- all graft operations
- destruction of old child qdiscs in prio, red and tbf change operation
- purging of queue in sfq change operation
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e50da01

[NET_SCHED]: Set parent classid in default qdiscs · 9f9afec4

Authored by Patrick McHardy on Nov 29, 2006

Set parent classids in default qdiscs to allow walking up the tree
from outside the qdiscs. This is needed by the next patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f9afec4

01 Jul, 2006 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Authored by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

6ab3d562

21 Mar, 2006 1 commit

[PKT_SCHED]: Convert sch_red to a classful qdisc · f38c39d6

Authored by Patrick McHardy on Mar 20, 2006

Convert sch_red to a classful qdisc. All qdiscs that maintain accurate
backlog counters are eligible as child qdiscs. When a queue limit larger
than zero is given, a bfifo qdisc is used for backwards compatibility.
Current versions of tc enforce a limit larger than zero, other users
can avoid creating the default qdisc by using zero.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

f38c39d6

06 Nov, 2005 5 commits

[PKT_SCHED]: (G)RED: Introduce hard dropping · bdc450a0

Authored by Thomas Graf on Nov 05, 2005

Introduces a new flag TC_RED_HARDDROP which specifies that if ECN
marking is enabled packets should still be dropped once the
average queue length exceeds the maximum threshold.

This _may_ help to avoid global synchronisation during small
bursts of peers advertising but not caring about ECN. Use this
option very carefully, it does more harm than good if
(qth_max - qth_min) does not cover at least two average burst
cycles.

The difference to the current behaviour, in which we'd run into
the hard queue limit, is that due to the low pass filter of RED
short bursts are less likely to cause a global synchronisation.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

bdc450a0
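
The hard-drop behaviour can be pictured with a small decision function. This is a simplified sketch with hypothetical names (toy_red_decide, toy_action); the probabilistic choice between the thresholds is reduced to a boolean input, and the logic is an assumption based on the description above rather than a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_action { TOY_ENQUEUE, TOY_MARK, TOY_DROP };

/* Decide what to do with a packet given the average queue length.
 * prob_hit: result of the random early-detection draw (assumed input).
 * use_ecn:  ECN marking is enabled on the qdisc.
 * harddrop: the hard-drop flag is set.
 * ect:      the packet advertises ECN capability. */
static enum toy_action toy_red_decide(unsigned int avg,
				      unsigned int min_th, unsigned int max_th,
				      bool prob_hit, bool use_ecn,
				      bool harddrop, bool ect)
{
	assert(min_th < max_th);

	if (avg < min_th)
		return TOY_ENQUEUE;

	if (avg < max_th) {
		/* Between the thresholds: act only when the draw hits. */
		if (!prob_hit)
			return TOY_ENQUEUE;
		return (use_ecn && ect) ? TOY_MARK : TOY_DROP;
	}

	/* Above max_th: with harddrop set, drop even if ECN could mark. */
	if (harddrop || !use_ecn || !ect)
		return TOY_DROP;
	return TOY_MARK;
}
```

Without the flag, an ECN-capable packet above max_th is still only marked, so senders that ignore marks can push the queue to its hard limit; with the flag, they are dropped at max_th.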

[PKT_SCHED]: RED: Cleanup and remove unnecessary code · dba051f3

Authored by Thomas Graf on Nov 05, 2005

Removes the skb trimming code which is not needed since we never
touch the skb upon failure. Removes unnecessary includes,
initializers, and simplifies the code a bit. Removes Jamal's
obsolete email addresses upon his own request.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

dba051f3

[PKT_SCHED]: RED: Dont start idle periods while already idling · 6a1b63d4

Authored by Thomas Graf on Nov 05, 2005

We should not interrupt and restart an idle period while idling already.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

6a1b63d4

[PKT_SCHED]: RED: Use generic queue management interface · 9e178ff2

Authored by Thomas Graf on Nov 05, 2005

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

9e178ff2

[PKT_SCHED]: RED: Use new generic red interface · 6b31b28a

Authored by Thomas Graf on Nov 05, 2005

Simplifies code a lot by separating the red algorithm and the
queueing logic. We now differentiate between probability marks
and forced marks but sum them together again to not break
backwards compatibility.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

6b31b28a

09 Jul, 2005 1 commit

[NET]: Transform skb_queue_len() binary tests into skb_queue_empty() · b03efcfb

Authored by David S. Miller on Jul 08, 2005

This is part of the grand scheme to eliminate the qlen
member of skb_queue_head, and subsequently remove the
'list' member of sk_buff.

Most users of skb_queue_len() want to know if the queue is
empty or not, and that's trivially done with skb_queue_empty()
which doesn't use the skb_queue_head->qlen member and instead
uses the queue list emptiness as the test.
Signed-off-by: David S. Miller <davem@davemloft.net>

b03efcfb
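
Testing emptiness through the list links rather than a length counter can be sketched with a generic circular doubly linked list. The names here (toy_node, toy_list_*) are hypothetical; this is not the sk_buff queue code, only the same idea in miniature.

```c
#include <assert.h>
#include <stdbool.h>

/* Node of a circular doubly linked list; the head is a bare node. */
struct toy_node {
	struct toy_node *next;
	struct toy_node *prev;
};

/* An empty list is a head that points to itself in both directions. */
static void toy_list_init(struct toy_node *head)
{
	assert(head != 0);
	head->next = head;
	head->prev = head;
}

/* Emptiness test that needs no length field. */
static bool toy_list_empty(const struct toy_node *head)
{
	return head->next == head;
}

/* Append a node just before the head, i.e. at the tail. */
static void toy_list_add_tail(struct toy_node *head, struct toy_node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink a node from whatever list it is on. */
static void toy_list_del(struct toy_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n;
	n->prev = n;
}
```

Since callers that only ask "is it empty?" never read a counter, the counter itself becomes removable, which is the direction the message describes.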

17 Apr, 2005 1 commit

Linux-2.6.12-rc2 · 1da177e4

Authored by Linus Torvalds on Apr 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4

openanolis / cloud-kernel · synced successfully more than 1 year ago