- 12 3月, 2014 2 次提交
-
-
由 Eric Dumazet 提交于
We have seen delays of more than 50ms in class or qdisc dumps, in case device is under high TX stress, even with the prior 4KB per skb limit. Add cond_resched() to give a chance to higher prio tasks to get cpu. Signed-off-by; Eric Dumazet <edumazet@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Like all rtnetlink dump operations, we hold RTNL in tc_dump_qdisc(), so we do not need to use rcu protection to protect list of netdevices. This will allow preemption to occur, thus reducing latencies. Following patch adds explicit cond_resched() calls. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 3月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
Resizing fq hash table allocates memory while holding qdisc spinlock, with BH disabled. This is definitely not good, as allocation might sleep. We can drop the lock and get it when needed, we hold RTNL so no other changes can happen at the same time. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2014 1 次提交
-
-
由 Eric Dumazet 提交于
htb_dump() and htb_dump_class() do not strictly need to acquire qdisc lock to fetch qdisc and/or class parameters. We hold RTNL and no changes can occur. This reduces by 50% qdisc lock pressure while doing tc qdisc|class dump operations. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 3月, 2014 1 次提交
-
-
由 Hiroaki SHIMODA 提交于
On x86_64 we have 3 holes in struct tbf_sched_data. The member peak_present can be replaced with peak.rate_bytes_ps, because peak.rate_bytes_ps is set only when peak is specified in tbf_change(). tbf_peak_present() is introduced to test peak.rate_bytes_ps. The member max_size is moved to fill 32bit hole. Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 2月, 2014 1 次提交
-
-
由 Hiroaki SHIMODA 提交于
The allocated child qdisc is not freed in error conditions. Defer the allocation after user configuration turns out to be valid and acceptable. Fixes: cc106e44 ("net: sched: tbf: fix the calculation of max_size") Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Yang Yingliang <yangyingliang@huawei.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 2月, 2014 1 次提交
-
-
由 Yang Yingliang 提交于
Replace two magic numbers which intialize clgstate::state. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 2月, 2014 4 次提交
-
-
由 Yang Yingliang 提交于
Replace some magic numbers which describe states of GE model loss generator with enumerate. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yang Yingliang 提交于
In netem_change(), we have already get "struct netem_sched_data *q". Replace params of get_correlation() and other similar functions with "struct netem_sched_data *q". Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yang Yingliang 提交于
get_dist_table() and get_loss_clg() may be failed. These two functions should be called after setting the members of qdisc_priv(sch), or it will break the old settings while either of them is failed. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vijay Subramanian 提交于
Fix incorrect comment reported by Norbert Kiesel. Edit another comment to add more details. Also add references to algorithm (IETF draft and paper) to top of file. Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> CC: Mythili Prabhu <mysuryan@cisco.com> CC: Norbert Kiesel <nkiesel@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 2月, 2014 5 次提交
-
-
由 WANG Cong 提交于
We could allocate tc_action on stack in tca_action_flush(), since it is not large. Also, we could use create_a() in tcf_action_get_1(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
When an action is bonnd to a filter, there is no point to remove it outside. Currently we just silently decrease the refcnt, we should reject this explicitly with EPERM. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
For bindcnt and refcnt etc., they are common for all actions, not need to repeat such operations for their own, they can be unified now. Actions just need to do its specific cleanup if needed. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Now we can totally hide it from modules. tcf_hash_*() API's will operate on struct tc_action, modules don't need to care about the details. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 1月, 2014 1 次提交
-
-
由 Florian Westphal 提交于
This moves part of Eric Dumazets skb_gso_seglen helper from tbf sched to skbuff core so it may be reused by upcoming ip forwarding path patch. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 1月, 2014 1 次提交
-
-
由 Harry Mason 提交于
If the class in skb->priority is not a leaf, apply filters from the selected class, not the qdisc. This lets netfilter or user space partially classify the packet. Signed-off-by: NHarry Mason <harry.mason@smoothwall.net> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 1月, 2014 4 次提交
-
-
由 Hannes Frederic Sowa 提交于
Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit aee636c4 ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit 6a2d7a95 resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)*((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (n*m')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zipReported-by: NJakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Many functions have open coded a function that returns a random number in range [0,N-1]. Under the assumption that we have a PRNG such as taus113 with being well distributed in [0, ~0U] space, we can implement such a function as uword t = (n*m')>>32, where m' is a random number obtained from PRNG, n the right open interval border and t our resulting random number, with n,m',t in u32 universe. Lets go with Joe and simply call it prandom_u32_max(), although technically we have an right open interval endpoint, but that we have documented. Other users can further be migrated to the new prandom_u32_max() function later on; for now, we need to make sure to migrate reciprocal_divide() users for the reciprocal_divide() follow-up fixup since their function signatures are going to change. Joint work with Hannes Frederic Sowa. Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
So that we will not expose struct tcf_common to modules. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Every action ops has a pointer to hash info, so we don't need to hard-code it in each module. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 1月, 2014 2 次提交
-
-
由 WANG Cong 提交于
It is not actually implemented. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yang Yingliang 提交于
Replace some magic numbers which describe states of 4-state model loss generator with enumerate. Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 1月, 2014 3 次提交
-
-
由 Wei Yongjun 提交于
The error code was not set if change indev fail, so the error condition wasn't reflected in the return value. Fix to return a negative error code from this error handling case instead of 0. Fixes: 2519a602 ('net_sched: optimize tcf_match_indev()') Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
In tcf_register_action() we check either ->type or ->kind to see if there is an existing action registered, but ipt action registers two actions with same type but different kinds. They should have different types too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 1月, 2014 1 次提交
-
-
由 Aruna-Hewapathirane 提交于
This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: NAruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 1月, 2014 7 次提交
-
-
由 WANG Cong 提交于
It is not necessary at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
tp->root is a void* pointer, no need to cast it. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
tcf_match_indev() is called in fast path, it is not wise to search for a netdev by ifindex and then compare by its name, just compare the ifindex. Also, dev->name could be changed by user-space, therefore the match would be always fail, but dev->ifindex could be consistent. BTW, this will also save some bytes from the core struct of u32. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
It will be needed by the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Refactor tcf_add_notify() and factor out tcf_del_notify(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
There is no need to store the index separatedly since tcf_hashinfo is allocated statically too. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Terry Lam 提交于
This is to be compatible with the use of "get_time" (i.e. default time unit in us) in iproute2 patch for HHF as requested by Stephen. Signed-off-by: NTerry Lam <vtlam@google.com> Acked-by: NNandita Dukkipati <nanditad@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 1月, 2014 1 次提交
-
-
由 Jason Wang 提交于
Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: NJason Wang <jasowang@redhat.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 1月, 2014 3 次提交
-
-
由 Jamal Hadi Salim 提交于
action flushing missaccounting Account only for deleted actions Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jamal Hadi Salim 提交于
Remove unnecessary checks for act->ops (suggested by Eric Dumazet). Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vijay Subramanian 提交于
Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: NVijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: NMythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 1月, 2014 1 次提交
-
-
由 Daniel Borkmann 提交于
Zefan Li requested [1] to perform the following cleanup/refactoring: - Split cgroupfs classid handling into net core to better express a possible more generic use. - Disable module support for cgroupfs bits as the majority of other cgroupfs subsystems do not have that, and seems to be not wished from cgroup side. Zefan probably might want to follow-up for netprio later on. - By this, code can be further reduced which previously took care of functionality built when compiled as module. cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so that we are consistent with {netclassid,netprio}_cgroup naming that is under net/core/ as suggested by Zefan. No change in functionality, but only code refactoring that is being done here. [1] http://patchwork.ozlabs.org/patch/304825/Suggested-by: NLi Zefan <lizefan@huawei.com> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Cc: Zefan Li <lizefan@huawei.com> Cc: Thomas Graf <tgraf@suug.ch> Cc: cgroups@vger.kernel.org Acked-by: NLi Zefan <lizefan@huawei.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-