Commits · d5b8aa1d246fddfe4042be6f6eb169efa5cfbb94 · openeuler / raspberrypi-kernel

27 Jun, 2011 1 commit

net_sched: fix dequeuer fairness · d5b8aa1d

Authored by jamal on Jun 26, 2011

Results on the dummy device can be seen in my netconf 2011
slides. These results are for a 10GigE IXGBE Intel
NIC on another i5 machine, with very similar specs to
the one used in the netconf 2011 results.
It turns out this is a hell of a lot worse than dummy,
and so this patch is even more beneficial for 10G.

Test setup:
----------

System under test sending packets out.
Additional box connected directly dropping packets.
Installed a prio qdisc on the eth device; the default
netdev queue length of 1000 was used as is.
The 3 prio bands were each set to 100 (this didn't factor into
the results).

5 packet runs were made and the middle 3 picked.

results
-------

The "cpu" column indicates which cpu the sample
was taken on.
The "Pkt runx" columns carry the number of packets a cpu
dequeued when forced to be in the "dequeuer" role.
The "avg" for each run is the number of packets each
cpu would dequeue if the system were fair.

3.0-rc4      (plain)
cpu         Pkt run1        Pkt run2        Pkt run3
================================================
cpu0        21853354        21598183        22199900
cpu1          431058          473476          393159
cpu2          481975          477529          458466
cpu3        23261406        23412299        22894315
avg         11506948        11490372        11486460

3.0-rc4 with patch and default weight 64
cpu         Pkt run1        Pkt run2        Pkt run3
================================================
cpu0        13205312        13109359        13132333
cpu1        10189914        10159127        10122270
cpu2        10213871        10124367        10168722
cpu3        13165760        13164767        13096705
avg         11693714        11639405        11630008

As you can see the system is still not perfect but
is a lot better than what it was before...
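
As a rough check on the tables above, the following sketch recomputes the "avg" row for run 1 and adds a simple imbalance ratio. The per-CPU counts are copied from the tables; the `imbalance` metric (busiest CPU divided by least busy CPU) is an assumption introduced here for illustration and is not part of the original measurements.

```python
# Per-CPU dequeue counts for run 1, copied from the two tables above.
plain = {"cpu0": 21853354, "cpu1": 431058, "cpu2": 481975, "cpu3": 23261406}
patched = {"cpu0": 13205312, "cpu1": 10189914, "cpu2": 10213871, "cpu3": 13165760}


def fair_share(counts):
    """Packets each CPU would dequeue if the work were split evenly."""
    return sum(counts.values()) / len(counts)


def imbalance(counts):
    """Busiest CPU divided by least busy CPU; 1.0 would be perfectly fair."""
    return max(counts.values()) / min(counts.values())


for name, counts in (("plain", plain), ("patched", patched)):
    print(f"{name}: fair share {fair_share(counts):.0f}, "
          f"imbalance {imbalance(counts):.2f}x")
```

The fair-share values match the "avg" row, and the ratio makes the improvement easy to see at a glance.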

At the moment we use the old backlog weight, weight_p
which is 64 packets. It seems to be reasonably fine
with that value.
The system could be made more fair if we reduce the
weight_p (as per my presentation), but we are going
to affect the shared backlog weight. Unless deemed
necessary, I think the default value is fine. If not
we could add yet another knob.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5b8aa1d

22 Jun, 2011 2 commits

ip: introduce ip_is_fragment helper inline function · 56f8a75c

Authored by Paul Gortmaker on Jun 21, 2011

There are enough instances of this:

    iph->frag_off & htons(IP_MF | IP_OFFSET)

that a helper function is probably warranted.
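
To show what that expression tests, here is a minimal Python sketch of the same check, not the kernel helper itself. It assumes the usual IPv4 layout (the flags/fragment-offset field at byte offset 6, in network byte order) and the mask values `IP_MF = 0x2000` and `IP_OFFSET = 0x1FFF`; `make_header` is a hypothetical helper used only to build test input.

```python
import struct

IP_DF = 0x4000      # "don't fragment" flag
IP_MF = 0x2000      # "more fragments" flag
IP_OFFSET = 0x1FFF  # mask for the 13-bit fragment offset


def ip_is_fragment(ipv4_header: bytes) -> bool:
    """True if the packet is any fragment: MF set or a non-zero offset."""
    # "!H" reads the 16-bit field in network byte order, which plays the
    # role of the htons() in the C expression.
    (frag_off,) = struct.unpack_from("!H", ipv4_header, 6)
    return (frag_off & (IP_MF | IP_OFFSET)) != 0


def make_header(frag_off: int) -> bytes:
    """Hypothetical helper: a 20-byte IPv4 header with only frag_off set."""
    header = bytearray(20)
    header[0] = 0x45  # version 4, header length 5 words
    struct.pack_into("!H", header, 6, frag_off)
    return bytes(header)
```

A first fragment has MF set and offset 0, a last fragment has MF clear and a non-zero offset, so both bits of the mask are needed.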
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56f8a75c

net: remove mm.h inclusion from netdevice.h · b7f080cf

Authored by Alexey Dobriyan on Jun 16, 2011

Remove linux/mm.h inclusion from netdevice.h -- it's unused (I've checked manually).

To prevent mm.h inclusion via other channels, also extract the "enum dma_data_direction"
definition into a separate header. This tiny piece is what glues netdevice.h to mm.h
via "netdevice.h => dmaengine.h => dma-mapping.h => scatterlist.h => mm.h".
Removal of mm.h from scatterlist.h was tried and was found not feasible
on most archs, so the link was cut off earlier.

Hope people are OK with a tiny include file.

Note that mm_types.h is still dragged in, but that is a separate story.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b7f080cf

10 Jun, 2011 1 commit

rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679

Authored by Greg Rose on Jun 10, 2011

The message size allocated for rtnl ifinfo dumps was limited to
a single page.  This is not enough for additional interface info
available with devices that support SR-IOV and caused a bug in
which VF info would not be displayed if more than approximately
40 VFs were created per interface.

Implement a new function pointer for the rtnl_register service that will
calculate the amount of data required for the ifinfo dump and allocate
enough data to satisfy the request.
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

c7ac8679

07 Jun, 2011 2 commits

net: remove interrupt.h inclusion from netdevice.h · a6b7a407

Authored by Alexey Dobriyan on Jun 06, 2011

* remove interrupt.h inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6b7a407

net: Rework netdev_drivername() to avoid warning. · 3019de12

Authored by David S. Miller on Jun 06, 2011

This interface uses a temporary buffer, but for no real reason.
And it can now generate warnings like:

net/sched/sch_generic.c: In function dev_watchdog
net/sched/sch_generic.c:254:10: warning: unused variable drivername

Just return driver->name directly or "".
Reported-by: Connor Hansen <cmdkhh@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3019de12

26 May, 2011 1 commit

sch_sfq: fix peek() implementation · 07bd8df5

Authored by Eric Dumazet on May 25, 2011

Since commit eeaeb068 (sch_sfq: allow big packets and be fair),
sfq_peek() can return a different skb than the one that would normally be
dequeued by sfq_dequeue() [if the current slot->allot is negative].

Use the generic qdisc_peek_dequeued() instead of a custom implementation, to
get a consistent result.
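
The idea of building peek() on top of dequeue() can be sketched as follows. This is a toy model under stated assumptions, not the kernel code: `StashingQueue` and `_real_dequeue` are hypothetical names, and the plain FIFO stands in for whatever scheduling logic the real dequeue performs.

```python
from collections import deque


class StashingQueue:
    """Toy queue whose peek() is implemented by dequeuing and stashing."""

    def __init__(self, packets):
        self._packets = deque(packets)
        self._stashed = None  # packet already pulled out by peek()

    def _real_dequeue(self):
        # Stand-in for the scheduler's own dequeue logic (plain FIFO here).
        return self._packets.popleft() if self._packets else None

    def peek(self):
        # Run the real dequeue once and remember the result, so peek()
        # cannot disagree with what dequeue() will hand out next.
        if self._stashed is None:
            self._stashed = self._real_dequeue()
        return self._stashed

    def dequeue(self):
        # Hand out the stashed packet first, if peek() took one.
        if self._stashed is not None:
            packet, self._stashed = self._stashed, None
            return packet
        return self._real_dequeue()
```

Because peek() goes through the same code path as dequeue(), any state the scheduler consults (such as a per-slot allotment) is evaluated only once per packet.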
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jesper Dangaard Brouer <hawk@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

07bd8df5

24 May, 2011 1 commit

sch_sfq: avoid giving spurious NET_XMIT_CN signals · 8efa8854

Authored by Eric Dumazet on May 23, 2011

While chasing a possible net_sched bug, I found that IP fragments have
little chance to pass a congested SFQ qdisc:

- Say the SFQ qdisc is full because one flow is non-responsive.
- ip_fragment() wants to send two fragments belonging to an idle flow.
- sfq_enqueue() queues the first packet, but sees the queue limit reached:
- sfq_enqueue() drops one packet from the 'big consumer', and returns
NET_XMIT_CN.
- ip_fragment() cancels the remaining fragments.

This patch restores fairness, making sure we return NET_XMIT_CN only if
we dropped a packet from the same flow.
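
A minimal sketch of that rule, assuming a toy model where each flow is a Python list and the victim is the flow with the longest backlog. The `enqueue` function and its drop policy are illustrative assumptions; only the return-code values `NET_XMIT_SUCCESS` (0x00) and `NET_XMIT_CN` (0x02) mirror the kernel's constants.

```python
NET_XMIT_SUCCESS = 0x00
NET_XMIT_CN = 0x02


def enqueue(flows, flow_id, packet, limit):
    """Queue `packet` on `flow_id`; `flows` maps flow id -> list of packets."""
    flows.setdefault(flow_id, []).append(packet)
    if sum(len(queue) for queue in flows.values()) <= limit:
        return NET_XMIT_SUCCESS
    # Over the limit: drop one packet from the flow with the longest backlog
    # (an assumed policy for this toy model).
    victim = max(flows, key=lambda fid: len(flows[fid]))
    flows[victim].pop()
    # Report congestion only to the flow that actually lost a packet.
    return NET_XMIT_CN if victim == flow_id else NET_XMIT_SUCCESS
```

With this rule, a fragment from an idle flow that pushes the queue over its limit still gets a success code, because the dropped packet belonged to the hog.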
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Jamal Hadi Salim <hadi@cyberus.ca>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8efa8854

23 May, 2011 1 commit

net: avoid synchronize_rcu() in dev_deactivate_many · 3137663d

Authored by Eric Dumazet on May 19, 2011

dev_deactivate_many() issues one synchronize_rcu() call after qdiscs are set
to noop_qdisc.

This call is here to make sure there are no outstanding qdisc-less
dev_queue_xmit calls before returning to the caller.

But in the dismantle phase, we don't have to wait, because we won't activate
the device again, and we are going to wait one rcu grace period later in
rollback_registered_many().

After this patch, device dismantle uses one synchronize_net() and one
rcu_barrier() call only, so we have a ~30% speedup and a smaller RTNL
latency.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3137663d

20 May, 2011 2 commits

networking: NET_CLS_ROUTE4 depends on INET · 034cfe48

Authored by Randy Dunlap on May 19, 2011

IP_ROUTE_CLASSID depends on INET and NET_CLS_ROUTE4 selects
IP_ROUTE_CLASSID, but when INET is not enabled, this kconfig warning
is produced, so fix it by making NET_CLS_ROUTE4 depend on INET.

warning: (NET_CLS_ROUTE4) selects IP_ROUTE_CLASSID which has unmet direct dependencies (NET && INET)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

034cfe48

pkt_sched: Kill set but unused variable 'protocol' in tc_classify() · f06cd54f

Authored by David S. Miller on May 19, 2011

I checked the history and this has been like this since the
beginning of time.
Signed-off-by: David S. Miller <davem@davemloft.net>

f06cd54f

08 May, 2011 3 commits

net,act_police,rcu: remove rcu_barrier() · 75ef0368

Authored by Lai Jiangshan on Mar 15, 2011

No callback from this module can still be queued
now that we use kfree_rcu(), so we can safely remove the rcu_barrier().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

75ef0368

net,rcu: convert call_rcu(tcf_police_free_rcu) to kfree_rcu() · 5957b1ac

Authored by Lai Jiangshan on Mar 15, 2011

The rcu callback tcf_police_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(tcf_police_free_rcu).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

5957b1ac

net,rcu: convert call_rcu(tcf_common_free_rcu) to kfree_rcu() · f5c8593c

Authored by Lai Jiangshan on Mar 15, 2011

The rcu callback tcf_common_free_rcu() just calls a kfree(),
so we use kfree_rcu() instead of the call_rcu(tcf_common_free_rcu).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

f5c8593c

23 Apr, 2011 1 commit

inet: constify ip headers and in6_addr · b71d1d42

Authored by Eric Dumazet on Apr 22, 2011

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b71d1d42

05 Apr, 2011 1 commit

pkt_sched: QFQ - quick fair queue scheduler · 0545a303

Authored by stephen hemminger on Apr 04, 2011

This is an implementation of the Quick Fair Queue scheduler developed
by Fabio Checconi. The same algorithm is already implemented in ipfw
in FreeBSD. Fabio had an earlier version developed on Linux, I just
cleaned it up. Thanks to Eric Dumazet for testing this under load.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0545a303

31 Mar, 2011 1 commit

Fix common misspellings · 25985edc

Authored by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

25985edc

05 Mar, 2011 1 commit

ipv4: Remove flowi from struct rtable. · 5e2b61f7

Authored by David S. Miller on Mar 04, 2011

The only necessary parts are the src/dst addresses, the
interface indexes, the TOS, and the mark.

The rest is unnecessary bloat, which amounts to nearly
50 bytes on 64-bit.
Signed-off-by: David S. Miller <davem@davemloft.net>

5e2b61f7

04 Mar, 2011 1 commit

net_sched: reduce fifo qdisc size · d276055c

Authored by Eric Dumazet on Mar 03, 2011

Because of various alignments [SLUB / qdisc], we use 512 bytes of
memory for one {p|b}fifo qdisc, instead of 256 bytes on 64bit arches and
192 bytes on 32bit ones.

Move the "u32 limit" inside "struct Qdisc" (no impact on other qdiscs)

Change qdisc_alloc(), first trying a regular allocation before an
oversized one.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d276055c

26 Feb, 2011 1 commit

sched: protocol only needed when CONFIG_NET_CLS_ACT is enabled · 52bc9747

Authored by Hagen Paul Pfeifer on Feb 25, 2011

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

52bc9747

25 Feb, 2011 8 commits

sch_netem: Need to include vmalloc.h · 78776d3f

Authored by David S. Miller on Feb 24, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

78776d3f

sch_choke: add choke_skb_cb · 26f70e12

Authored by Eric Dumazet on Feb 24, 2011

Better document choke skb->cb[] use, like we did in netem and sfb.

This adds a compile-time check to make sure we don't exhaust skb->cb[]
space.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

26f70e12

netem: update version and cleanup · 250a65f7

Authored by stephen hemminger on Feb 23, 2011

Get rid of debug messages that are not useful, and enable
the log messages in case of error.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

250a65f7

netem: revised correlated loss generator · 661b7972

Authored by stephen hemminger on Feb 23, 2011

This patch originated with Stefano Salsano and Fabio Ludovici.
It provides several alternative loss models for use with netem.
This patch adds two state-machine-based loss models.

See: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

661b7972

Revert "sch_netem: Remove classful functionality" · 10f6dfcf

Authored by stephen hemminger on Feb 23, 2011

Many users have wanted the old functionality that was lost
to be able to use pfifo as inner qdisc for netem. The reason that
netem could not be classful with the older API was because of the
limitations of the old dequeue/requeue interface; now that qdisc API has
a peek function, there is no longer a problem with using any
inner qdisc.

This reverts commit 02201464.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10f6dfcf

netem: define NETEM_DIST_MAX · df173bda

Authored by stephen hemminger on Feb 23, 2011

Rather than magic constant in code, expose the maximum size of
packet distribution table in API. In iproute2, q_netem defines
MAX_DIST as 16K already.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

df173bda

netem: use vmalloc for distribution table · 6373a9a2

Authored by stephen hemminger on Feb 23, 2011

The netem probability table can be large (up to 64K bytes)
which may be too large to allocate in one contiguous chunk.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6373a9a2

netem: cleanup dump code · 861d7f74

Authored by stephen hemminger on Feb 23, 2011

Use nla_put_nested to update netlink attribute value.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

861d7f74

24 Feb, 2011 3 commits

em_meta: fix sparse warning · e0c56310

Authored by stephen hemminger on Feb 23, 2011

gfp_t needs to be cast to integer.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0c56310

mqprio: cleanups · ea18fd95

Authored by stephen hemminger on Feb 23, 2011

* make qdisc_ops local
* add sparse annotation about expected unlock/unlock in dump_class_stats
* fix indentation
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea18fd95

net_sched: SFB flow scheduler · e13e02a3

Authored by Eric Dumazet on Feb 23, 2011

This is the Stochastic Fair Blue scheduler, based on work from:

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek.

The general SFB algorithm can be found in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
    Calculate hash function values h{0}, h{1}, .. h{L-1}
    Update bins at each level
    for i = 0 to L - 1
        if (B[i][h{i}].qlen > bin_size)
            B[i][h{i}].p_mark += p_increment;
        else if (B[i][h{i}].qlen == 0)
            B[i][h{i}].p_mark -= p_decrement;
    p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
    if (p_min == 1.0)
        ratelimit();
    else
        mark/drop with probability p_min;
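
The pseudo-code above can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the kernel implementation: the sizes `L`, `N`, `BIN_SIZE`, the step values, the CRC-based per-level hash, the clamping of `p_mark` to [0, 1] and the `qlen` bookkeeping are all choices made here for the example.

```python
import zlib

L, N = 2, 16                 # levels and bins per level (assumed sizes)
BIN_SIZE = 4                 # assumed per-bin queue length threshold
P_INCREMENT = 0.25           # assumed step sizes
P_DECREMENT = 0.25


class Bin:
    def __init__(self):
        self.qlen = 0
        self.p_mark = 0.0


bins = [[Bin() for _ in range(N)] for _ in range(L)]


def bins_for(flow):
    """Pick one bin per level using a per-level hash of the flow key."""
    return [bins[i][zlib.crc32(flow.encode(), i + 1) % N] for i in range(L)]


def enqueue(flow):
    """Update the flow's bins and return its marking probability p_min."""
    mine = bins_for(flow)
    for b in mine:
        if b.qlen > BIN_SIZE:
            b.p_mark = min(1.0, b.p_mark + P_INCREMENT)
        elif b.qlen == 0:
            b.p_mark = max(0.0, b.p_mark - P_DECREMENT)
    p_min = min(b.p_mark for b in mine)
    # A real scheduler would rate-limit when p_min == 1.0 and otherwise
    # mark/drop with probability p_min. This toy never drops, so every
    # arrival is counted as queued.
    for b in mine:
        b.qlen += 1
    return p_min
```

Taking the minimum across levels is what protects a well-behaved flow: it is penalized only if it shares a congested bin with a hog at every level.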

I adapted Juliusz's code to meet current kernel standards,
and made various changes to address previous comments:

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.

tc qdisc add dev $DEV parent 1:11 handle 11:  \
        est 0.5sec 2sec sfb limit 128

tc filter add dev $DEV protocol ip parent 11: handle 3 \
        flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.

2) ECN is enabled by default, unlike RED/CHOKe/GRED

With help from Patrick McHardy & Andi Kleen
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e13e02a3

23 Feb, 2011 1 commit

cls_u32: fix sparse warnings · 86fce3ba

Authored by stephen hemminger on Feb 20, 2011

The variable _data is used in asm-generic to define sections
which causes sparse warnings, so just rename the variable.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86fce3ba

21 Feb, 2011 1 commit

net: Fix more stale on-stack list_head objects. · 5f04d506

Authored by Eric W. Biederman on Feb 20, 2011

From: Eric W. Biederman <ebiederm@xmission.com>

In the beginning with batching unreg_list was a list that was used only
once in the lifetime of a network device (I think). Now we have calls
using the unreg_list that can happen multiple times in the life of a
network device like dev_deactivate and dev_close that are also using the
unreg_list. In addition in unregister_netdevice_queue we also do a
list_move because for devices like veth pairs it is possible that
unregister_netdevice_queue will be called multiple times.

So I think the change below to fix dev_deactivate which Eric D. missed
will fix this problem. Now to go test that.
Signed-off-by: David S. Miller <davem@davemloft.net>

5f04d506

15 Feb, 2011 1 commit

sch_mqprio: Always set num_tc to 0 in mqprio_destroy() · ac7100ba

Authored by Ben Hutchings on Feb 14, 2011

All the cleanup code in mqprio_destroy() is currently conditional on
priv->qdiscs being non-null, but that condition should only apply to
the per-queue qdisc cleanup.  We should always set the number of
traffic classes back to 0 here.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>

ac7100ba

03 Feb, 2011 3 commits

sch_choke: Need linux/vmalloc.h · cdfb74d4

Authored by David S. Miller on Feb 02, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

cdfb74d4

sched: CHOKe flow scheduler · 45e14433

Authored by stephen hemminger on Feb 02, 2011

CHOKe ("CHOose and Kill" or "CHOose and Keep") is an alternative
packet scheduler based on the Random Early Detection (RED) algorithm.

The core idea is:
  For every packet arrival:
        Calculate Qave
        if (Qave < minth)
             Queue the new packet
        else
             Select randomly a packet from the queue
             if (both packets from same flow)
             then Drop both the packets
             else if (Qave > maxth)
                  Drop packet
             else
                  Admit packet with probability p (same as RED)
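
The core idea above can be sketched as a small Python function. This is a simplified illustration, not the kernel's sch_choke: the queue is a list of flow ids, `qave` and `p` are passed in rather than computed the way RED computes them, and skipping the comparison when the queue is empty is an assumption made here.

```python
import random


def choke_enqueue(queue, flow, qave, minth, maxth, p, rng=random):
    """Handle one arrival; `queue` is a list of flow ids. True if admitted."""
    if qave < minth:
        queue.append(flow)
        return True
    if queue:
        # Compare the arrival against one randomly chosen queued packet.
        idx = rng.randrange(len(queue))
        if queue[idx] == flow:
            del queue[idx]   # drop the queued packet and the arrival
            return False
    if qave > maxth:
        return False         # drop the arrival
    # Between the thresholds: admit with probability p, as the text states.
    if rng.random() < p:
        queue.append(flow)
        return True
    return False
```

A flow that occupies most of the queue is the most likely to match the random pick, so it loses two packets at a time while light flows are rarely hit.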

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation",
  Proceedings of INFOCOM'2000, March 2000.

Help from:
     Eric Dumazet <eric.dumazet@gmail.com>
     Patrick McHardy <kaber@trash.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

45e14433

sfq: deadlock in error path · 119b3d38

Authored by stephen hemminger on Feb 02, 2011

The change to allow divisor to be a parameter (in 2.6.38-rc1),
commit 817fb15d, introduced a possible deadlock caught by sparse.

The scheduler tree lock was left locked in the case of an incorrect
divisor value. The simplest fix is to move the test outside of the lock,
which also solves the problem of partial update.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

119b3d38

27 Jan, 2011 1 commit

net_sched: sch_mqprio: dont leak kernel memory · 144ce879

Authored by Eric Dumazet on Jan 26, 2011

mqprio_dump() should make sure all fields of struct tc_mqprio_qopt are
initialized.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

144ce879

22 Jan, 2011 1 commit

net_sched: TCQ_F_CAN_BYPASS generalization · 23624935

Authored by Eric Dumazet on Jan 21, 2011

Now that qdisc stab is handled before the TCQ_F_CAN_BYPASS test in
__dev_xmit_skb(), we can generalize TCQ_F_CAN_BYPASS to qdiscs other
than pfifo_fast: pfifo, bfifo, pfifo_head_drop and sfq.

SFQ is special because it can have external classifiers, and in these
cases we cannot bypass the queue discipline (the packet could be dropped by
a classifier) without the admin asking for it, or further changes.

It's worth doing this, especially for SFQ, to avoid dirtying memory when
no packets are already waiting in the queue.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23624935

21 Jan, 2011 1 commit

net_sched: accurate bytes/packets stats/rates · 9190b3b3

Authored by Eric Dumazet on Jan 20, 2011

In commit 44b82883 (net_sched: pfifo_head_drop problem), we fixed
a problem with pfifo_head drops that incorrectly decreased
sch->bstats.bytes and sch->bstats.packets.

Several qdiscs (CHOKe, SFQ, pfifo_head, ...) are able to drop a
previously enqueued packet, and bstats cannot be changed, so
bstats/rates are not accurate (overestimated).

This patch changes the qdisc_bstats updates to be done at dequeue() time
instead of enqueue() time. bstats counters no longer account for dropped
frames, and rates are more correct, since enqueue() bursts don't have
an effect on the dequeue() rate.
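
The effect of counting at dequeue() time can be seen in a small toy model. `CountingQueue` is a hypothetical head-drop queue written for this illustration; its `bytes` and `count` fields only play the role of bstats.bytes and bstats.packets.

```python
from collections import deque


class CountingQueue:
    """Toy head-drop queue that updates its stats only at dequeue() time."""

    def __init__(self, limit):
        self.limit = limit
        self.packets = deque()  # each entry is a packet length in bytes
        self.bytes = 0          # stands in for bstats.bytes
        self.count = 0          # stands in for bstats.packets
        self.drops = 0

    def enqueue(self, length):
        if len(self.packets) >= self.limit:
            self.packets.popleft()  # head drop: discard the oldest packet
            self.drops += 1
        self.packets.append(length)  # no stats update on enqueue

    def dequeue(self):
        if not self.packets:
            return None
        length = self.packets.popleft()
        self.bytes += length  # count only what is actually sent
        self.count += 1
        return length
```

Because a dropped packet never reaches dequeue(), it never shows up in the byte or packet counters, and no correction is needed after a drop.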
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9190b3b3