Commits · 928a759593d21ec184536bde0b4816d21bcd5a86 · openeuler / Kernel

18 May, 2017 · 2 commits

net: sched: introduce tcf block infractructure · 6529eaba

Committed by Jiri Pirko on May 17, 2017

Currently, the filter chains are directly put into the private structures
of qdiscs. In order to be able to have multiple chains per qdisc and to
allow sharing filter chains among qdiscs, there is a need for a common
object that would hold the chains. This introduces such an object and calls
it "tcf_block".

Helpers to get and put the blocks are provided to be called from
individual qdisc code. Also, the original filter_list pointers are left
in qdisc privs to allow the entry into tcf_block processing without any
added overhead of possible multiple pointer dereferences on the fast path.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: move tc_classify function to cls_api.c · 87d83093

Committed by Jiri Pirko on May 17, 2017

Move tc_classify function to cls_api.c where it belongs, rename it to
fit the namespace.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 May, 2017 · 1 commit

treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68

Committed by Michal Hocko on May 08, 2017

There are many code paths open-coding kvmalloc.  Let's use the helper
instead.  The main difference to kvmalloc is that those users usually
do not consider all the aspects of the memory allocator.  E.g.
allocation requests <= 32kB (with 4kB pages) basically never fail
and invoke the OOM killer to satisfy the allocation.  This sounds too
disruptive for something that has a reasonable fallback - the vmalloc.
On the other hand those requests might fall back to vmalloc even when the
memory allocator would succeed after several more reclaim/compaction
attempts previously.  There is no guarantee something like that happens
though.

This patch converts many of those places to kv[mz]alloc* helpers because
they are more conservative.

Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 April, 2017 · 1 commit

netlink: pass extended ACK struct to parsing functions · fceb6435

Committed by Johannes Berg on April 12, 2017

Pass the new extended ACK reporting struct to all of the generic
netlink parsing functions. For now, pass NULL in almost all callers
(except for some in the core.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 March, 2017 · 1 commit

net/sched: fq_codel: Avoid set-but-unused variable · a5e6a3b0

Committed by Or Gerlitz on March 16, 2017

The code introduced by commit 2ccccf5f ("net_sched: update
hierarchical backlog too") only sets prev_backlog in fq_codel_dequeue()
but never uses it anywhere; remove that assignment.

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 February, 2017 · 1 commit

sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_api · cf1facda

Committed by Jiri Pirko on February 09, 2017

Creation is done in this file; move destruction to the same place.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 January, 2017 · 1 commit

fq_codel: Avoid regenerating skb flow hash unless necessary · 264b87fa

Committed by Andrew Collins on January 18, 2017

The fq_codel qdisc currently always regenerates the skb flow hash.
This wastes some cycles and prevents flow separation in cases where
the traffic has been encrypted and can no longer be understood by the
flow dissector.

Change it to use the preexisting flow hash if one exists, and only
regenerate if necessary.

Signed-off-by: Andrew Collins <acollins@cradlepoint.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

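The hash-reuse idea in 264b87fa can be modeled in a few lines of userspace Python. This is only an illustrative sketch, not the kernel code: `Packet`, `dissect_and_hash()` and `flow_index()` are hypothetical names, CRC32 stands in for the flow dissector plus hash, and the modulo bucket mapping and the bucket count of 1024 are simplifying assumptions.

```python
import zlib
from typing import Optional

NUM_FLOWS = 1024  # assumed bucket count for this sketch


class Packet:
    """Hypothetical stand-in for an skb: visible flow key plus an optional cached hash."""

    def __init__(self, flow_key: bytes, cached_hash: Optional[int] = None):
        self.flow_key = flow_key
        self.cached_hash = cached_hash


def dissect_and_hash(pkt: Packet) -> int:
    """Placeholder for the flow dissector: hash whatever headers are still visible."""
    return zlib.crc32(pkt.flow_key) & 0xFFFFFFFF


def flow_index(pkt: Packet) -> int:
    """Reuse the cached hash when present; compute one only if it is missing."""
    if pkt.cached_hash is None:
        pkt.cached_hash = dissect_and_hash(pkt)
    return pkt.cached_hash % NUM_FLOWS


# Two encrypted packets look identical to the dissector (same outer key),
# but hashes cached before encryption still keep them in separate flows.
a = Packet(b"tunnel-outer-header", cached_hash=0x1234)
b = Packet(b"tunnel-outer-header", cached_hash=0x5678)
```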
26 June, 2016 · 2 commits

net_sched: fq_codel: cache skb->truesize into skb->cb · 008830bc

Committed by Eric Dumazet on June 21, 2016

Now that we defer skb drops, it makes sense to keep a copy
of skb->truesize in struct codel_skb_cb to avoid one
cache line miss per dropped skb in fq_codel_drop(),
to reduce latencies a bit further.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: drop packets after root qdisc lock is released · 520ac30f

Committed by Eric Dumazet on June 21, 2016

Qdisc performance suffers when packets are dropped at enqueue()
time because drops (kfree_skb()) are done while qdisc lock is held,
delaying a dequeue() draining the queue.

Nominal throughput can be reduced by 50 % when this happens,
at a time we would like the dequeue() to proceed as fast as possible.

Even FQ is vulnerable to this problem, even though one of FQ's goals was
to provide some flow isolation.

This patch adds a 'struct sk_buff **to_free' parameter to all
qdisc->enqueue() implementations, and to the qdisc_drop() helper.

I measured a performance increase of up to 12 %, but this patch
is a prereq so that future batches in enqueue() can fly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

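The deferral pattern from 520ac30f can be illustrated with a small userspace model. It is a sketch under stated assumptions, not the kernel implementation: `Qdisc`, `xmit()` and `free_packet()` are hypothetical names, a Python list stands in for the `to_free` skb chain, and `threading.Lock` stands in for the root qdisc spinlock. The point is only that enqueue() chains the dropped packet while the lock is held, and the comparatively expensive free happens after the lock is released.

```python
import threading

freed = []  # records packets handed to the stand-in for kfree_skb()


def free_packet(pkt):
    """Hypothetical stand-in for kfree_skb(); called without the qdisc lock."""
    freed.append(pkt)


class Qdisc:
    def __init__(self, limit):
        self.lock = threading.Lock()
        self.limit = limit
        self.queue = []

    def enqueue(self, pkt, to_free):
        """Must be called with self.lock held. Drops are only chained, not freed."""
        if len(self.queue) >= self.limit:
            to_free.append(pkt)
            return False
        self.queue.append(pkt)
        return True


def xmit(qdisc, pkt):
    to_free = []
    with qdisc.lock:
        accepted = qdisc.enqueue(pkt, to_free)
    # The lock is released here, so a concurrent dequeue() is not delayed
    # while the dropped packets are actually freed.
    for dropped in to_free:
        free_packet(dropped)
    return accepted
```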
16 June, 2016 · 1 commit

net_sched: fq_codel: defer skb freeing · ece5d4c7

Committed by Eric Dumazet on June 13, 2016

Both fq_codel_change() and fq_codel_reset() can use rtnl_kfree_skbs().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 June, 2016 · 1 commit

sched: remove qdisc->drop · a09ceb0e

Committed by Florian Westphal on June 09, 2016

After removal of TCA_CBQ_OVL_STRATEGY from the cbq scheduler, there are no
more callers of ->drop() outside of other ->drop functions, i.e.
nothing calls them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 June, 2016 · 3 commits

net: sched: do not acquire qdisc spinlock in qdisc/class stats dump · edb09eb1

Committed by Eric Dumazet on June 06, 2016

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale:

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure. Not only do the dumps take time, they also slow
down the fast path (queue/dequeue packets) by 10 % to 20 % in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc
that might need the qdisc lock in fq_codel_dump_stats() and
fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide
consistent reads of packets/bytes counters, regardless of 32/64 bit arches.

I also changed rate estimators to use the same infrastructure
so that they no longer need to lock root qdisc lock.

[1]
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

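The seqcount-based read that edb09eb1 describes follows a retry protocol that can be sketched in userspace. This is a single-threaded model for illustration only: `SeqCount` and `QueueStats` are hypothetical names, and Python offers none of the memory-ordering guarantees the kernel primitives rely on. The writer makes the sequence odd while updating; a reader retries if it started on an odd value or the sequence changed underneath it, so bytes and packets are always read as a consistent pair without taking a lock.

```python
class SeqCount:
    """Minimal single-writer sequence counter (userspace model)."""

    def __init__(self):
        self.sequence = 0

    def write_begin(self):
        self.sequence += 1  # odd while a write is in progress

    def write_end(self):
        self.sequence += 1  # even again once the write is complete

    def read_begin(self):
        return self.sequence

    def read_retry(self, start):
        # Retry if a write was in progress or completed since read_begin().
        return bool(start & 1) or self.sequence != start


class QueueStats:
    def __init__(self):
        self.seq = SeqCount()
        self.bytes = 0
        self.packets = 0

    def account(self, nbytes):
        self.seq.write_begin()
        self.bytes += nbytes
        self.packets += 1
        self.seq.write_end()

    def snapshot(self):
        """Lock-free read: loop until both counters come from the same epoch."""
        while True:
            start = self.seq.read_begin()
            pair = (self.bytes, self.packets)
            if not self.seq.read_retry(start):
                return pair
```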
fq_codel: return non zero qlen in class dumps · aafddbf0

Committed by Eric Dumazet on June 06, 2016

We properly scan the flow list to count the number of packets,
but John passed 0 to gnet_stats_copy_queue() so we report
a zero value to user space instead of the result.

Fixes: 64015853 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fq_codel: fix NET_XMIT_CN behavior · 80e509db

Committed by Eric Dumazet on June 04, 2016

My prior attempt to fix the backlogs of parents failed.

If we return NET_XMIT_CN, our parents won't increase their backlog,
so our qdisc_tree_reduce_backlog() should take this into account.

v2: Florian Westphal pointed out that we could drop the packet,
so we need to save qdisc_pkt_len(skb) in a temp variable before
calling fq_codel_drop().

Fixes: 9d18562a ("fq_codel: add batch ability to fq_codel_drop()")
Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2016 · 1 commit

fq_codel: fix memory limitation drift · 77f57761

Committed by Eric Dumazet on May 15, 2016

memory_usage must be decreased in dequeue_func(), not in
fq_codel_dequeue(), otherwise packets dropped by the Codel algorithm
are missing this decrease.

Also we need to clear memory_usage in fq_codel_reset().

Fixes: 95b58430 ("fq_codel: add memory limitation per queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 May, 2016 · 1 commit

fq_codel: add memory limitation per queue · 95b58430

Committed by Eric Dumazet on May 06, 2016

On small embedded routers, one wants to control the maximal amount of
memory used by fq_codel, instead of controlling the number of packets or
bytes, since GRO/TSO make these impractical.

Assuming skb->truesize is accurate, we have to keep track of
the sum of skb->truesize for skbs in the queue.

This patch adds a new TCA_FQ_CODEL_MEMORY_LIMIT attribute.

I chose a default value of 32 MBytes, which looks reasonable even
for heavy duty usages. (Prior fq_codel users should not be hurt
when they upgrade their kernels)

Two fields are added to tc_fq_codel_qd_stats to report:
 - Current memory usage
 - Number of drops caused by memory limits

# tc qd replace dev eth1 root est 1sec 4sec fq_codel memory_limit 4M
..
# tc -s -d qd sh dev eth1
qdisc fq_codel 8008: root refcnt 257 limit 10240p flows 1024
 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 2083566791363 bytes 1376214889 pkt (dropped 4994406, overlimits 0 requeues 21705223)
 rate 9841Mbit 812549pps backlog 3906120b 376p requeues 21705223
  maxpacket 68130 drop_overlimit 4994406 new_flow_count 28855414
  ecn_mark 0 memory_used 4190048 drop_overmemory 4994406
  new_flows_len 1 old_flows_len 177

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Dave Täht <dave.taht@gmail.com>
Cc: Sebastian Möller <moeller0@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

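The truesize accounting described in 95b58430, together with the follow-up fix 77f57761, can be sketched as a userspace model. Everything here is an assumption made for illustration: `MemLimitedQueue` is a hypothetical class, packets are represented only by their truesize, and the drop policy (drop from the head of a single queue until usage fits) is a simplification rather than the per-flow policy fq_codel applies. The sketch keeps the one property the fix is about: memory_usage is decremented at the single point where a packet leaves the queue, so normal dequeues and drops both account for it, and reset() clears it.

```python
from collections import deque


class MemLimitedQueue:
    def __init__(self, memory_limit):
        self.memory_limit = memory_limit
        self.memory_usage = 0      # sum of truesize for queued packets
        self.drop_overmemory = 0   # drops caused by the memory limit
        self.packets = deque()

    def _pop_head(self):
        """Single exit point: every packet leaving the queue is un-accounted here."""
        truesize = self.packets.popleft()
        self.memory_usage -= truesize
        return truesize

    def enqueue(self, truesize):
        self.packets.append(truesize)
        self.memory_usage += truesize
        # Simplified policy (assumption): drop from the head until we fit again.
        while self.memory_usage > self.memory_limit:
            self._pop_head()
            self.drop_overmemory += 1

    def dequeue(self):
        return self._pop_head() if self.packets else None

    def reset(self):
        self.packets.clear()
        self.memory_usage = 0  # must be cleared too, or the counter drifts
```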
04 May, 2016 · 1 commit

fq_codel: add batch ability to fq_codel_drop() · 9d18562a

Committed by Eric Dumazet on May 01, 2016

In the presence of inelastic flows and stress, we can call
fq_codel_drop() for every packet entering the fq_codel qdisc.

fq_codel_drop() is quite expensive, as it does a linear scan
of 4 KB of memory to find a fat flow.
Once found, it drops the oldest packet of this flow.

Instead of dropping a single packet, try to drop 50% of the backlog
of this fat flow, with a configurable limit of 64 packets per round.

TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this
limit configurable.

With this strategy the 4 KB search is amortized to a single cache line
per drop [1], so fq_codel_drop() no longer appears at the top of kernel
profiles in the presence of a few inelastic flows.

[1] Assuming a 64-byte cache line, and 1024 buckets

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dave Taht <dave.taht@gmail.com>
Cc: Jonathan Morton <chromatix99@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Dave Taht
Signed-off-by: David S. Miller <davem@davemloft.net>

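The batch-drop strategy from 9d18562a can be sketched as follows. This is a userspace model under stated assumptions, not the kernel function: `batch_drop()` is a hypothetical name, each flow is a deque of packet lengths in bytes, and per-flow backlogs are recomputed with sum() instead of being read from a backlog array. It finds the fattest flow, then drops from its head until roughly half of that flow's backlog is gone or `max_packets` (64 in the commit message) is reached. With 1024 four-byte backlog counters and 64-byte cache lines, the scan touches 64 cache lines, so a full batch of 64 drops averages out to about one cache line per drop.

```python
from collections import deque


def batch_drop(flows, max_packets=64):
    """Drop up to half the backlog of the fattest flow.

    flows: list of deques holding packet lengths in bytes.
    Returns (flow_index, dropped_packets, dropped_bytes).
    """
    backlogs = [sum(flow) for flow in flows]          # linear scan over all flows
    idx = max(range(len(flows)), key=backlogs.__getitem__)
    threshold = backlogs[idx] // 2                    # aim for 50% of the fat flow

    dropped_pkts = 0
    dropped_bytes = 0
    flow = flows[idx]
    while flow:
        dropped_bytes += flow.popleft()               # oldest packet first
        dropped_pkts += 1
        if dropped_pkts >= max_packets or dropped_bytes >= threshold:
            break
    return idx, dropped_pkts, dropped_bytes
```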
26 April, 2016 · 2 commits

codel: split into multiple files · d068ca2a

Committed by Michal Kazior on April 22, 2016

It was impossible to include codel.h for the
purpose of having access to codel_params or
codel_vars structure definitions and using them
for embedding in other more complex structures.

This split allows codel.h itself to be treated
like any other header file while codel_qdisc.h and
codel_impl.h contain function definitions with
logic that was previously in codel.h.

This copies over copyrights and doesn't involve
code changes other than adding a few additional
include directives to net/sched/sch*codel.c.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

codel: generalize the implementation · 79bdc4c8

Committed by Michal Kazior on April 22, 2016

This strips out qdisc specific bits from the code
and makes it slightly more reusable. Codel will be
used by wireless/mac80211 in the future.

Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 March, 2016 · 1 commit

net_sched: update hierarchical backlog too · 2ccccf5f

Committed by WANG Cong on February 25, 2016

When the bottom qdisc decides to, for example, drop some packet,
it calls qdisc_tree_decrease_qlen() to update the queue length
for all its ancestors. We need to update the backlog too, to
keep the stats on the root qdisc accurate.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 August, 2015 · 1 commit

net: sched: consolidate tc_classify{,_compat} · 3b3ae880

Committed by Daniel Borkmann on August 26, 2015

For classifiers getting invoked via tc_classify(), we always need an
extra function call into tc_classify_compat(), as both are being
exported as symbols and tc_classify() itself doesn't do much except
handling of reclassifications when tp->classify() returned with
TC_ACT_RECLASSIFY.

CBQ and ATM are the only qdiscs that directly call into tc_classify_compat(),
all others use tc_classify(). When tc actions are being configured
out in the kernel, tc_classify() effectively does nothing besides
delegating.

We could spare this layer and consolidate both functions. pktgen on
single CPU constantly pushing skbs directly into the netif_receive_skb()
path with a dummy classifier on ingress qdisc attached, improves
slightly from 22.3Mpps to 23.1Mpps.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 August, 2015 · 1 commit

fq_codel: explicitly reset flows in ->reset() · 3d0e0af4

Committed by Eric Dumazet on July 31, 2015

Alex reported the following crash when using fq_codel
with htb:

  crash> bt
  PID: 630839  TASK: ffff8823c990d280  CPU: 14  COMMAND: "tc"
   [... snip ...]
   #8 [ffff8820ceec17a0] page_fault at ffffffff8160a8c2
      [exception RIP: htb_qlen_notify+24]
      RIP: ffffffffa0841718  RSP: ffff8820ceec1858  RFLAGS: 00010282
      RAX: 0000000000000000  RBX: 0000000000000000  RCX: ffff88241747b400
      RDX: ffff88241747b408  RSI: 0000000000000000  RDI: ffff8811fb27d000
      RBP: ffff8820ceec1868   R8: ffff88120cdeff24   R9: ffff88120cdeff30
      R10: 0000000000000bd4  R11: ffffffffa0840919  R12: ffffffffa0843340
      R13: 0000000000000000  R14: 0000000000000001  R15: ffff8808dae5c2e8
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
   #9 [...] qdisc_tree_decrease_qlen at ffffffff81565375
  #10 [...] fq_codel_dequeue at ffffffffa084e0a0 [sch_fq_codel]
  #11 [...] fq_codel_reset at ffffffffa084e2f8 [sch_fq_codel]
  #12 [...] qdisc_destroy at ffffffff81560d2d
  #13 [...] htb_destroy_class at ffffffffa08408f8 [sch_htb]
  #14 [...] htb_put at ffffffffa084095c [sch_htb]
  #15 [...] tc_ctl_tclass at ffffffff815645a3
  #16 [...] rtnetlink_rcv_msg at ffffffff81552cb0
  [... snip ...]

As Jamal pointed out, there is actually no need to call dequeue
to purge the queued skbs in reset; data structures can just be
reset explicitly. Therefore, we reset everything except configs
and stats, so that we would have a fresh start after device flipping.

Fixes: 4b549a2e ("fq_codel: Fair Queue Codel AQM")
Reported-by: Alex Gartrell <agartrell@fb.com>
Cc: Alex Gartrell <agartrell@fb.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
[xiyou.wangcong@gmail.com: added codel_vars_init() and qdisc_qstats_backlog_dec()]
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 July, 2015 · 2 commits

fq_codel: fix return value of fq_codel_drop() · c0afd9ce

Committed by WANG Cong on July 14, 2015

The ->drop() is supposed to return the number of bytes it dropped,
however fq_codel_drop() returns the index of the flow where it drops
a packet from.

Fix this by introducing a helper to wrap fq_codel_drop().

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fq_codel: fix a use-after-free · 052cbda4

Committed by WANG Cong on July 13, 2015

Fixes: 25331d6c ("net: sched: implement qstat helper routines")
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 May, 2015 · 1 commit

codel: add ce_threshold attribute · 80ba92fa

Committed by Eric Dumazet on May 08, 2015

For DCTCP or similar ECN based deployments on fabrics with shallow
buffers, hosts are responsible for a good part of the buffering.

This patch adds an optional ce_threshold to codel & fq_codel qdiscs,
so that DCTCP can have feedback from queuing in the host.

A DCTCP enabled egress port simply has a queue occupancy threshold
above which ECT packets get a CE mark.

In codel language this translates to a sojourn time, so that one doesn't
have to worry about bytes or bandwidth, only delays.

This makes the host an active participant in the health of the whole
network.

This also helps experimenting with DCTCP in a setup without a DCTCP
compliant fabric.

In the following example, ce_threshold is set to 1ms, and we can see from
'ldelay xxx us' that TCP is not trying to go around the 5ms codel
target.

Queue has more capacity to absorb inelastic bursts (say from UDP
traffic), as queues are maintained to an optimal level.

lpaa23:~# ./tc -s -d qd sh dev eth1
qdisc mq 1: dev eth1 root
 Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961)
 backlog 3108242b 364p requeues 42961
qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503)
 rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503
  count 0 lastcount 0 ldelay 1.0ms drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384
qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186)
 rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186
  count 0 lastcount 0 ldelay 694us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873
qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms
 Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554)
 rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554
  count 0 lastcount 0 ldelay 889us drop_next 0us
  maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780
...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

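The ce_threshold rule from 80ba92fa reduces to a comparison on sojourn time, which the following userspace sketch illustrates. The names are hypothetical (`CeMarker`, `on_dequeue()`), times are in microseconds by assumption, and `ect_capable` stands in for whatever check decides that a packet is ECN-capable. A packet gets a CE mark when its time in the queue exceeds the threshold, independently of the usual codel target.

```python
class CeMarker:
    """Counts CE marks applied when sojourn time exceeds ce_threshold."""

    def __init__(self, ce_threshold_us=None):
        self.ce_threshold_us = ce_threshold_us  # None: feature disabled
        self.ce_mark = 0

    def on_dequeue(self, enqueue_time_us, now_us, ect_capable):
        """Return True if the packet gets a CE mark."""
        if self.ce_threshold_us is None:
            return False
        sojourn_us = now_us - enqueue_time_us
        # Only ECN-capable (ECT) packets can carry a CE mark.
        if sojourn_us > self.ce_threshold_us and ect_capable:
            self.ce_mark += 1
            return True
        return False
```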
04 May, 2015 · 2 commits

sched: Call skb_get_hash_perturb in sch_fq_codel · 342db221

Committed by Tom Herbert on May 01, 2015

Call skb_get_hash_perturb instead of doing skb_flow_dissect and then
jhash by hand.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

codel: fix maxpacket/mtu confusion · a5d28090

Committed by Eric Dumazet on April 30, 2015

In the presence of TSO/GSO/GRO packets, codel at low rates can be quite
useless. In the following example, not a single packet was ever dropped,
while the average delay in the codel queue is ~100 ms!

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 134376498 bytes 88797 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 13626b 3p requeues 0
  count 0 lastcount 0 ldelay 96.9ms drop_next 0us
  maxpacket 9084 ecn_mark 0 drop_overlimit 0

This comes from a confusion about what the minimal backlog should be. It is
pretty clear it is not 64KB or whatever max GSO packet ever reached the
qdisc.

The intent of codel was to use the MTU of the device.

After the fix, we finally drop some packets, and the rtt/cwnd of my single
TCP flow meet our expectations.

qdisc codel 0: parent 1:12 limit 16000p target 5.0ms interval 100.0ms
 Sent 102798497 bytes 67912 pkt (dropped 1365, overlimits 0 requeues 0)
 backlog 6056b 3p requeues 0
  count 1 lastcount 1 ldelay 36.3ms drop_next 0us
  maxpacket 10598 ecn_mark 0 drop_overlimit 0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Dave Taht <dave.taht@gmail.com>
Cc: Van Jacobson <vanj@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

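Why the backlog threshold matters can be seen in a small model of the drop decision. This is a hedged userspace sketch, not the kernel function: `CodelState` and its fields are hypothetical names, times are in microseconds, and only the "is it OK to drop" part of CoDel is modeled (no control law). A packet is never a drop candidate while the sojourn time is below target or the backlog is at or below the threshold; with a threshold as large as the biggest GSO packet ever seen, a slow link can stay under it forever.

```python
class CodelState:
    def __init__(self, target_us=5_000, interval_us=100_000, mtu=1514):
        self.target_us = target_us
        self.interval_us = interval_us
        self.mtu = mtu                 # backlog threshold in bytes
        self.first_above_time = 0      # 0 means "not currently above target"

    def should_drop(self, sojourn_us, backlog_bytes, now_us):
        # Below target, or too little queued to matter: reset and keep the packet.
        if sojourn_us < self.target_us or backlog_bytes <= self.mtu:
            self.first_above_time = 0
            return False
        # First time above target: arm the timer, do not drop yet.
        if self.first_above_time == 0:
            self.first_above_time = now_us + self.interval_us
            return False
        # Above target for a whole interval: dropping is allowed.
        return now_us > self.first_above_time
```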
10 December, 2014 · 1 commit

net: fix suspicious rcu_dereference_check in net/sched/sch_fq_codel.c · 69204cf7

Committed by Valdis.Kletnieks@vt.edu on December 09, 2014

commit 46e5da40 (net: qdisc: use rcu prefix and silence
sparse warnings) triggers a spurious warning:

net/sched/sch_fq_codel.c:97 suspicious rcu_dereference_check() usage!

The code should be using the _bh variant of rcu_dereference.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 September, 2014 · 3 commits

net: sched: enable per cpu qstats · b0ab6f92

Committed by John Fastabend on September 28, 2014

After previous patches to simplify qstats, the qstats can be
made per cpu with a packed union in the Qdisc struct.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: restrict use of qstats qlen · 64015853

Committed by John Fastabend on September 28, 2014

This removes the use of the qstats->qlen variable from the classifiers
and makes it an explicit argument to gnet_stats_copy_queue().

The qlen represents the qdisc queue length and is packed into
the qstats at the last moment before passing to user space. By
handling it explicitly we avoid, in the percpu stats case, having
to figure out which per_cpu variable to put it in.

It would probably be best to remove it from qstats completely
but qstats is a user space ABI and can't be broken. A future
patch could make an internal only qstats structure that would
avoid having to allocate an additional u32 variable on the
Qdisc struct. This would make the qstats struct 128bits instead
of 128+32.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: implement qstat helper routines · 25331d6c

Committed by John Fastabend on September 28, 2014

This adds helpers to manipulate qstats logic and replaces locations
that touch the counters directly. This simplifies future patches
to push qstats onto per cpu counters.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 September, 2014 · 2 commits

net: rcu-ify tcf_proto · 25d8c0d5

Committed by John Fastabend on September 12, 2014

rcu'ify tcf_proto; this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastructure for running
the classifier/action chains without holding the qdisc lock; however,
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fallout.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rcu-ify tcf_proto · 80a735f7

Committed by John Fastabend on September 12, 2014

rcu'ify tcf_proto; this allows calling tc_classify() without holding
any locks. Updaters are protected by RTNL.

This patch prepares the core net_sched infrastructure for running
the classifier/action chains without holding the qdisc lock; however,
it does nothing to ensure cls_xxx and act_xxx types also work without
locking. Additional patches are required to address the fallout.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 August, 2014 · 1 commit

net: use reciprocal_scale() helper · 8fc54f68

Committed by Daniel Borkmann on August 23, 2014

Replace open codings of (((u64) <x> * <y>) >> 32) with reciprocal_scale().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

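The expression quoted in 8fc54f68 maps a 32-bit value onto the range [0, y) with one multiplication and one shift instead of a modulo. A Python rendering of that arithmetic, offered as a sketch, is below; the explicit masking emulates 32-bit operands, which Python integers do not enforce on their own, and the function name simply mirrors the helper named in the commit.

```python
U32_MASK = 0xFFFFFFFF


def reciprocal_scale(val, ep_ro):
    """Scale a 32-bit val into [0, ep_ro) using (val * ep_ro) >> 32."""
    return ((val & U32_MASK) * (ep_ro & U32_MASK)) >> 32


# Example: spreading 32-bit hashes over 1024 buckets.
lowest = reciprocal_scale(0x00000000, 1024)
middle = reciprocal_scale(0x80000000, 1024)
highest = reciprocal_scale(0xFFFFFFFF, 1024)
```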
05 June, 2014 · 1 commit

net: use the new API kvfree() · 4cb28970

Committed by WANG Cong on June 02, 2014

It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 March, 2014 · 1 commit

net_sched: return nla_nest_end() instead of skb->len · d59b7d80

Committed by Yang Yingliang on March 12, 2014

nla_nest_end() already returns skb->len, so replace
return skb->len with return nla_nest_end() instead.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 January, 2014 · 1 commit

net: replace macros net_random and net_srandom with direct calls to prandom · 63862b5b

Committed by Aruna-Hewapathirane on January 11, 2014

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32, there is no point in keeping them around.
This change makes it easier to grep for users of prandom_u32.

Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 March, 2013 · 1 commit

net: fq_codel: Fix off-by-one error · cd68ddd4

Committed by Vijay Subramanian on March 28, 2013

Currently, we hold a max of sch->limit - 1 packets instead of
sch->limit packets. Fix this off-by-one error.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

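The off-by-one in cd68ddd4 is the classic strict-versus-inclusive comparison. The page does not show the diff, so the sketch below assumes the check compares the already-incremented queue length against the limit; `packets_held()`, `accepts_old()` and `accepts_fixed()` are hypothetical helpers that only demonstrate how many packets each comparison lets the queue keep.

```python
def accepts_old(new_qlen, limit):
    """Assumed pre-fix check: strict comparison after incrementing qlen."""
    return new_qlen < limit


def accepts_fixed(new_qlen, limit):
    """Assumed post-fix check: inclusive comparison."""
    return new_qlen <= limit


def packets_held(limit, accepts):
    """Offer more packets than the limit and count how many the queue keeps."""
    qlen = 0
    for _ in range(limit * 2):
        if accepts(qlen + 1, limit):
            qlen += 1
    return qlen
```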
04 September, 2012 · 1 commit

fq_codel: dont reinit flow state · b379135c

Committed by Eric Dumazet on September 01, 2012

When fq_codel builds a new flow, it should not reset codel state.

Codel algo needs to get previous values (lastcount, drop_next) to get
proper behavior.

Signed-off-by: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Dave Taht <dave.taht@bufferbloat.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 May, 2012 · 1 commit

fq_codel: should use qdisc backlog as threshold · 865ec552

Committed by Eric Dumazet on May 16, 2012

codel_should_drop() logic allows a packet not to be dropped if the queue
size is under the max packet size.

In fq_codel, we have two possible backlogs: the qdisc global one, and
the flow local one.

The meaningful one for codel_should_drop() should be the global backlog,
not the per flow one, so that thin flows can have a non zero drop/mark
probability.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
