- 05 7月, 2017 1 次提交
-
-
由 Reshetova, Elena 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 4月, 2017 1 次提交
-
-
由 WANG Cong 提交于
Dmitry reported a crash when injecting faults in attach_one_default_qdisc() and dev->qdisc is still a noop_disc, the check before qdisc_hash_add() fails to catch it because it tests NULL. We should test against noop_qdisc since it is the default qdisc at this point. Fixes: 59cc1f61 ("net: sched: convert qdisc linked list to hashtable") Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 3月, 2017 1 次提交
-
-
由 Jiri Kosina 提交于
The original reason [1] for having hidden qdiscs (potential scalability issues in qdisc_match_from_root() with single linked list in case of large amount of qdiscs) has been invalidated by 59cc1f61 ("net: sched: convert qdisc linked list to hashtable"). This allows us for bringing more clarity and determinism into the dump by making default pfifo qdiscs visible. We're not turning this on by default though, at it was deemed [2] too intrusive / unnecessary change of default behavior towards userspace. Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows applications to request complete qdisc hierarchy dump, including the ones that have always been implicit/invisible. Singleton noop_qdisc stays invisible, as teaching the whole infrastructure about singletons would require quite some surgery with very little gain (seeing no qdisc or seeing noop qdisc in the dump is probably setting the same user expectation). [1] http://lkml.kernel.org/r/1460732328.10638.74.camel@edumazet-glaptop3.roam.corp.google.com [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.davem@davemloft.netSigned-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 12月, 2016 1 次提交
-
-
由 Matthias Tafelmeier 提交于
Oftenly, introducing side effects on packet processing on the other half of the stack by adjusting one of TX/RX via sysctl is not desirable. There are cases of demand for asymmetric, orthogonal configurability. This holds true especially for nodes where RPS for RFS usage on top is configured and therefore use the 'old dev_weight'. This is quite a common base configuration setup nowadays, even with NICs of superior processing support (e.g. aRFS). A good example use case are nodes acting as noSQL data bases with a large number of tiny requests and rather fewer but large packets as responses. It's affordable to have large budget and rx dev_weights for the requests. But as a side effect having this large a number on TX processed in one run can overwhelm drivers. This patch therefore introduces an independent configurability via sysctl to userland. Signed-off-by: NMatthias Tafelmeier <matthias.tafelmeier@gmx.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
1) Old code was hard to maintain, due to complex lock chains. (We probably will be able to remove some kfree_rcu() in callers) 2) Using a single timer to update all estimators does not scale. 3) Code was buggy on 32bit kernel (WRITE_ONCE() on 64bit quantity is not supposed to work well) In this rewrite : - I removed the RB tree that had to be scanned in gen_estimator_active(). qdisc dumps should be much faster. - Each estimator has its own timer. - Estimations are maintained in net_rate_estimator structure, instead of dirtying the qdisc. Minor, but part of the simplification. - Reading the estimator uses RCU and a seqcount to provide proper support for 32bit kernels. - We reduce memory need when estimators are not used, since we store a pointer, instead of the bytes/packets counters. - xt_rateest_mt() no longer has to grab a spinlock. (In the future, xt_rateest_tg() could be switched to per cpu counters) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 9月, 2016 3 次提交
-
-
由 Florian Westphal 提交于
This change replaces sk_buff_head struct in Qdiscs with new qdisc_skb_head. Its similar to the skb_buff_head api, but does not use skb->prev pointers. Qdiscs will commonly enqueue at the tail of a list and dequeue at head. While skb_buff_head works fine for this, enqueue/dequeue needs to also adjust the prev pointer of next element. The ->prev pointer is not required for qdiscs so we can just leave it undefined and avoid one cacheline write access for en/dequeue. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Moves qdisc stat accouting to qdisc_dequeue_head. The only direct caller of the __qdisc_dequeue_head version open-codes this now. This allows us to later use __qdisc_dequeue_head as a replacement of __skb_dequeue() (which operates on sk_buff_head list). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
A followup change will replace the sk_buff_head in the qdisc struct with a slightly different list. Use of the sk_buff_head helpers will thus cause compiler warnings. Open-code these accesses in an extra change to ease review. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 8月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Should qdisc_alloc() fail, we must release the module refcount we got right before. Fixes: 6da7c8fc ("qdisc: allow setting default queuing discipline") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 8月, 2016 1 次提交
-
-
由 Jiri Kosina 提交于
Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2016 2 次提交
-
-
由 Eric Dumazet 提交于
When qdisc bulk dequeue was added in linux-3.18 (commit 5772e9a3 "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"), it was constrained to some specific qdiscs. With some extra care, we can extend this to all qdiscs, so that typical traffic shaping solutions can benefit from small batches (8 packets in this patch). For example, HTB is often used on some multi queue device. And bonding/team are multi queue devices... Idea is to bulk-dequeue packets mapping to the same transmit queue. This brings between 35 and 80 % performance increase in HTB setup under pressure on a bonding setup : 1) NUMA node contention : 610,000 pps -> 1,110,000 pps 2) No node contention : 1,380,000 pps -> 1,930,000 pps Now we should work to add batches on the enqueue() side ;) Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Qdisc performance suffers when packets are dropped at enqueue() time because drops (kfree_skb()) are done while qdisc lock is held, delaying a dequeue() draining the queue. Nominal throughput can be reduced by 50 % when this happens, at a time we would like the dequeue() to proceed as fast as possible. Even FQ is vulnerable to this problem, while one of FQ goals was to provide some flow isolation. This patch adds a 'struct sk_buff **to_free' parameter to all qdisc->enqueue(), and in qdisc_drop() helper. I measured a performance increase of up to 12 %, but this patch is a prereq so that future batches in enqueue() can fly. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 6月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
qdisc are changed under RTNL protection and often while blocking BH and root qdisc spinlock. When lots of skbs need to be dropped, we free them under these locks causing TX/RX freezes, and more generally latency spikes. This commit adds rtnl_kfree_skbs(), used to queue skbs for deferred freeing. Actual freeing happens right after RTNL is released, with appropriate scheduling points. rtnl_qdisc_drop() can also be used in place of disc_drop() when RTNL is held. qdisc_reset_queue() and __qdisc_reset_queue() get the new behavior, so standard qdiscs like pfifo, pfifo_fast... have their ->reset() method automatically handled. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 6月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
1) qdisc_run_begin() is really using the equivalent of a trylock. Instead of using write_seqcount_begin(), use a combination of raw_write_seqcount_begin() and correct lockdep annotation. 2) sch_direct_xmit() should use regular spin_lock(root_lock) Fixes: f9eb8aea ("net_sched: transform qdisc running bit into a seqcount") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2016 1 次提交
-
-
由 WANG Cong 提交于
For gso_skb we only update qlen, backlog should be updated too. Note, it is correct to just update these stats at one layer, because the gso_skb is cached there. Reported-by: NStas Nichiporovich <stasn77@gmail.com> Fixes: 2ccccf5f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 5月, 2016 2 次提交
-
-
由 Florian Westphal 提交于
previous patches removed all direct accesses to dev->trans_start, so change the netif_trans_update helper to update trans_start of netdev queue 0 instead and then remove trans_start from struct net_device. AFAICS a lot of the netif_trans_update() invocations are now useless because they occur in ndo_start_xmit and driver doesn't set LLTX (i.e. stack already took care of the update). As I can't test any of them it seems better to just leave them alone. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
Replace all trans_start updates with netif_trans_update helper. change was done via spatch: struct net_device *d; @@ - d->trans_start = jiffies + netif_trans_update(d) Compile tested only. Cc: user-mode-linux-devel@lists.sourceforge.net Cc: linux-xtensa@linux-xtensa.org Cc: linux1394-devel@lists.sourceforge.net Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: MPT-FusionLinux.pdl@broadcom.com Cc: linux-scsi@vger.kernel.org Cc: linux-can@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-omap@vger.kernel.org Cc: linux-hams@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: devel@driverdev.osuosl.org Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: linux-bluetooth@vger.kernel.org Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NFelipe Balbi <felipe.balbi@linux.intel.com> Acked-by: NMugunthan V N <mugunthanvnm@ti.com> Acked-by: NAntonio Quartulli <a@unstable.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 4月, 2016 1 次提交
-
-
由 Florian Westphal 提交于
No more users in the tree, remove NETDEV_TX_LOCKED support. Adds another hole in softnet_stats struct, but better than keeping the unused collision counter around. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2016 1 次提交
-
-
由 Lars Persson 提交于
A failure in validate_xmit_skb_list() triggered an unconditional call to dev_requeue_skb with skb=NULL. This slowly grows the queue discipline's qlen count until all traffic through the queue stops. We take the optimistic approach and continue running the queue after a failure since it is unknown if later packets also will fail in the validate path. Fixes: 55a93b3e ("qdisc: validate skb without holding lock") Signed-off-by: NLars Persson <larper@axis.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 3月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
Some devices declare a high number of TX queues, then set a much lower real_num_tx_queues This cause setups using fq_codel, sfq or fq as the default qdisc to consume more memory than really needed. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 1月, 2016 1 次提交
-
-
由 John Fastabend 提交于
When a qdisc is using per cpu stats (currently just the ingress qdisc) only the bstats are being freed. This also free's the qstats. Fixes: b0ab6f92 ("net: sched: enable per cpu qstats") Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 12月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue devices. One problem is that it updates sch->q.qlen and sch->qstats.drops on the mq/mqprio root qdisc, while it should not : Daniele reported underflows errors : [ 681.774821] PAX: sch->q.qlen: 0 n: 1 [ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head; [ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1 [ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015 [ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c [ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b [ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001 [ 681.774962] Call Trace: [ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f [ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50 [ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160 [ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel] [ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel] [ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0 [ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160 [ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200 [ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30 [ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260 [ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40 [ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100 [ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170 [ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70 mq/mqprio have their own ways to report qlen/drops by folding stats on all their queues, with appropriate locking. A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup() without proper locking : concurrent qdisc updates could corrupt the list that qdisc_match_from_root() parses to find a qdisc given its handle. Fix first problem adding a TCQ_F_NOPARENT qdisc flag that qdisc_tree_decrease_qlen() can use to abort its tree traversal, as soon as it meets a mq/mqprio qdisc children. Second problem can be fixed by RCU protection. Qdisc are already freed after RCU grace period, so qdisc_list_add() and qdisc_list_del() simply have to use appropriate rcu list variants. A future patch will add a per struct netdev_queue list anchor, so that qdisc_tree_decrease_qlen() can have more efficient lookups. Reported-by: NDaniele Fucini <dfucini@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 8月, 2015 3 次提交
-
-
由 Phil Sutter 提交于
Now that noqueue qdisc can be attached just like any other qdisc, no special treatment is necessary anymore when attaching it as default qdisc. This change has the added benefit that 'tc qdisc show' prints noqueue instead of nothing for devices defaulting to noqueue. Signed-off-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Phil Sutter 提交于
This way users can attach noqueue just like any other qdisc using tc without having to mess with tx_queue_len first. Signed-off-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Phil Sutter 提交于
Since alloc_netdev_mqs() sets IFF_NO_QUEUE for drivers not initializing tx_queue_len, it is safe to assume that if tx_queue_len is zero, dev->priv flags always contains IFF_NO_QUEUE. Signed-off-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 8月, 2015 1 次提交
-
-
由 Phil Sutter 提交于
Handle IFF_NO_QUEUE as alternative to tx_queue_len being zero. Signed-off-by: NPhil Sutter <phil@nwl.cc> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 10月, 2014 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Restore the quota fairness between qdisc's, that we broke with commit 5772e9a3 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"). Before that commit, the quota in __qdisc_run() were in packets as dequeue_skb() would only dequeue a single packet, that assumption broke with bulk dequeue. We choose not to account for the number of packets inside the TSO/GSO packets (accessable via "skb_gso_segs"). As the previous fairness also had this "defect". Thus, GSO/TSO packets counts as a single packet. Further more, we choose to slack on accuracy, by allowing a bulk dequeue try_bulk_dequeue_skb() to exceed the "packets" limit, only limited by the BQL bytelimit. This is done because BQL prefers to get its full budget for appropriate feedback from TX completion. In future, we might consider reworking this further and, if it allows, switch to a time-based model, as suggested by Eric. Right now, we only restore old semantics. Joint work with Eric, Hannes, Daniel and Jesper. Hannes wrote the first patch in cooperation with Daniel and Jesper. Eric rewrote the patch. Fixes: 5772e9a3 ("qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 10月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Testing xmit_more support with netperf and connected UDP sockets, I found strange dst refcount false sharing. Current handling of IFF_XMIT_DST_RELEASE is not optimal. Dropping dst in validate_xmit_skb() is certainly too late in case packet was queued by cpu X but dequeued by cpu Y The logical point to take care of drop/force is in __dev_queue_xmit() before even taking qdisc lock. As Julian Anastasov pointed out, need for skb_dst() might come from some packet schedulers or classifiers. This patch adds new helper to cleanly express needs of various drivers or qdiscs/classifiers. Drivers that need skb_dst() in their ndo_start_xmit() should call following helper in their setup instead of the prior : dev->priv_flags &= ~IFF_XMIT_DST_RELEASE; -> netif_keep_dst(dev); Instead of using a single bit, we use two bits, one being eventually rebuilt in bonding/team drivers. The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being rebuilt in bonding/team. Eventually, we could add something smarter later. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 10月, 2014 3 次提交
-
-
由 Eric Dumazet 提交于
Validation of skb can be pretty expensive : GSO segmentation and/or checksum computations. We can do this without holding qdisc lock, so that other cpus can queue additional packets. Trick is that requeued packets were already validated, so we carry a boolean so that sch_direct_xmit() can validate a fresh skb list, or directly use an old one. Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads host. Turning TSO on or off had no effect on throughput, only few more cpu cycles. Lock contention on qdisc lock disappeared. Same if disabling TX checksum offload. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
The TSO and GSO segmented packets already benefit from bulking on their own. The TSO packets have always taken advantage of the only updating the tailptr once for a large packet. The GSO segmented packets have recently taken advantage of bulking xmit_more API, via merge commit 53fda7f7 ("Merge branch 'xmit_list'"), specifically via commit 7f2e870f ("net: Move main gso loop out of dev_hard_start_xmit() into helper.") allowing qdisc requeue of remaining list. And via commit ce93718f ("net: Don't keep around original SKB when we software segment GSO frames."). This patch allow further bulking of TSO/GSO packets together, when dequeueing from the qdisc. Testing: Measuring HoL (Head-of-Line) blocking for TSO and GSO, with netperf-wrapper. Bulking several TSO show no performance regressions (requeues were in the area 32 requeues/sec). Bulking several GSOs does show small regression or very small improvement (requeues were in the area 8000 requeues/sec). Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional latency. Base-case, which is "normal" GSO bulking, sees varying high-prio queue delay between 0.38ms to 0.47ms. Bulking several GSOs together, result in a stable high-prio queue delay of 0.50ms. Using igb at 100Mbit/s with GSO bulking, shows an improvement. Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows sending/processing an entire skb list. This patch implements qdisc bulk dequeue, by allowing multiple packets to be dequeued in dequeue_skb(). The optimization principle for this is two fold, (1) to amortize locking cost and (2) avoid expensive tailptr update for notifying HW. (1) Several packets are dequeued while holding the qdisc root_lock, amortizing locking cost over several packet. The dequeued SKB list is processed under the TXQ lock in dev_hard_start_xmit(), thus also amortizing the cost of the TXQ lock. (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more API to delay HW tailptr update, which also reduces the cost per packet. One restriction of the new API is that every SKB must belong to the same TXQ. This patch takes the easy way out, by restricting bulk dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the qdisc only have attached a single TXQ. Some detail about the flow; dev_hard_start_xmit() will process the skb list, and transmit packets individually towards the driver (see xmit_one()). In case the driver stops midway in the list, the remaining skb list is returned by dev_hard_start_xmit(). In sch_direct_xmit() this returned list is requeued by dev_requeue_skb(). To avoid overshooting the HW limits, which results in requeuing, the patch limits the amount of bytes dequeued, based on the drivers BQL limits. In-effect bulking will only happen for BQL enabled drivers. Small amounts for extra HoL blocking (2x MTU/0.24ms) were measured at 100Mbit/s, with bulking 8 packets, but the oscillating nature of the measurement indicate something, like sched latency might be causing this effect. More comparisons show, that this oscillation goes away occationally. Thus, we disregard this artifact completely and remove any "magic" bulking limit. For now, as a conservative approach, stop bulking when seeing TSO and segmented GSO packets. They already benefit from bulking on their own. A followup patch add this, to allow easier bisect-ability for finding regressions. Jointed work with Hannes, Daniel and Florian. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 9月, 2014 1 次提交
-
-
由 John Fastabend 提交于
In order to run qdisc's without locking statistics and estimators need to be handled correctly. To resolve bstats make the statistics per cpu. And because this is only needed for qdiscs that are running without locks which is not the case for most qdiscs in the near future only create percpu stats when qdiscs set the TCQ_F_CPUSTATS flag. Next because estimators use the bstats to calculate packets per second and bytes per second the estimator code paths are updated to use the per cpu statistics. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 9月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
pfifo_fast and htb use skb lists, without needing their spinlocks. (They instead use the standard qdisc lock) We can use __skb_queue_head_init() instead of skb_queue_head_init() to be consistent. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 9月, 2014 2 次提交
-
-
由 John Fastabend 提交于
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
Add __rcu notation to qdisc handling by doing this we can make smatch output more legible. And anyways some of the cases should be using rcu_dereference() see qdisc_all_tx_empty(), qdisc_tx_chainging(), and so on. Also *wake_queue() API is commonly called from driver timer routines without rcu lock or rtnl lock. So I added rcu_read_lock() blocks around netif_wake_subqueue and netif_tx_wake_queue. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 9月, 2014 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
More minor fixes to merge commit 53fda7f7 (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Fixing exit cases in qdisc_reset() and qdisc_destroy(), where a leftover requeued SKB (qdisc->gso_skb) can have the potential of being a skb list, thus use kfree_skb_list(). This is a followup to commit 10770bc2 ("qdisc: adjustments for API allowing skb list xmits"). Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 9月, 2014 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Minor adjustments for merge commit 53fda7f7 (Merge branch 'xmit_list') that allows us to work with a list of SKBs. Update code doc to function sch_direct_xmit(). In handle_dev_cpu_collision() use kfree_skb_list() in error handling. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 9月, 2014 2 次提交
-
-
由 David S. Miller 提交于
Just maintain the list properly by returning the head of the remaining SKB list from dev_hard_start_xmit(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-