- 16 8月, 2017 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
Traffic filters could keep direct pointers to classes in classful qdisc, thus qdisc destruction first removes all filters before freeing classes. Class destruction methods also tries to free attached filters but now this isn't safe because tcf_block_put() unlike to tcf_destroy_chain() cannot be called second time. This patch set class->block to NULL after first tcf_block_put() and turn second call into no-op. Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2017 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
Without this filters cannot be attached. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 8月, 2017 1 次提交
-
-
由 Xin Long 提交于
Commit 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat") introduced a member nft_compat to xt_tgchk_param structure. But it didn't set it's value for ipt_init_target. With unexpected value in par.nft_compat, it may return unexpected result in some target's checkentry. This patch is to set all it's fields as 0 and only initialize the non-zero fields in ipt_init_target. v1->v2: As Wang Cong's suggestion, fix it by setting all it's fields as 0 and only initializing the non-zero fields. Fixes: 55917a21 ("netfilter: x_tables: add context to know if extension runs from nft_compat") Suggested-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 8月, 2017 1 次提交
-
-
由 Xin Long 提交于
Now xt_tgchk_param par in ipt_init_target is a local varibale, par.net is not initialized there. Later when xt_check_target calls target's checkentry in which it may access par.net, it would cause kernel panic. Jaroslav found this panic when running: # ip link add TestIface type dummy # tc qd add dev TestIface ingress handle ffff: # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \ action xt -j CONNMARK --set-mark 4 This patch is to pass net param into ipt_init_target and set par.net with it properly in there. v1->v2: As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix it by also passing net_id to __tcf_ipt_init. v2->v3: Missed the fixes tag, so add it. Fixes: ecb2421b ("netfilter: add and use nf_ct_netns_get/put") Reported-by: NJaroslav Aster <jaster@redhat.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 7月, 2017 1 次提交
-
-
由 Roman Mashak 提交于
Make name consistent with other TC event notification routines, such as tcf_add_notify() and tcf_del_notify() Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2017 1 次提交
-
-
由 Michal Hocko 提交于
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to the page allocator. This has been true but only for allocations requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for smaller sizes. This is a bit unfortunate because there is no way to express the same semantic for those requests and they are considered too important to fail so they might end up looping in the page allocator for ever, similarly to GFP_NOFAIL requests. Now that the whole tree has been cleaned up and accidental or misled usage of __GFP_REPEAT flag has been removed for !costly requests we can give the original flag a better name and more importantly a more useful semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that the allocator would try really hard but there is no promise of a success. This will work independent of the order and overrides the default allocator behavior. Page allocator users have several levels of guarantee vs. cost options (take GFP_KERNEL as an example) - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ attempt to free memory at all. The most light weight mode which even doesn't kick the background reclaim. Should be used carefully because it might deplete the memory and the next user might hit the more aggressive reclaim - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic allocation without any attempt to free memory from the current context but can wake kswapd to reclaim memory if the zone is below the low watermark. Can be used from either atomic contexts or when the request is a performance optimization and there is another fallback for a slow path. - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non sleeping allocation with an expensive fallback so it can access some portion of memory reserves. Usually used from interrupt/bh context with an expensive slow path fallback. - GFP_KERNEL - both background and direct reclaim are allowed and the _default_ page allocator behavior is used. That means that !costly allocation requests are basically nofail but there is no guarantee of that behavior so failures have to be checked properly by callers (e.g. OOM killer victim is allowed to fail currently). - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and all allocation requests fail early rather than cause disruptive reclaim (one round of reclaim in this implementation). The OOM killer is not invoked. - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior and all allocation requests try really hard. The request will fail if the reclaim cannot make any progress. The OOM killer won't be triggered. - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. This might be really dangerous especially for larger orders. Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because they already had their semantic. No new users are added. __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if there is no progress and we have already passed the OOM point. This means that all the reclaim opportunities have been exhausted except the most disruptive one (the OOM killer) and a user defined fallback behavior is more sensible than keep retrying in the page allocator. [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c] [mhocko@suse.com: semantic fix] Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz [mhocko@kernel.org: address other thing spotted by Vlastimil] Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Alex Belits <alex.belits@cavium.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David Daney <david.daney@cavium.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 7月, 2017 1 次提交
-
-
由 Reshetova, Elena 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2017 2 次提交
-
-
由 Reshetova, Elena 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. This patch uses refcount_inc_not_zero() instead of atomic_inc_not_zero_hint() due to absense of a _hint() version of refcount API. If the hint() version must be used, we might need to revisit API. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Reshetova, Elena 提交于
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: NElena Reshetova <elena.reshetova@intel.com> Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NDavid Windsor <dwindsor@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 6月, 2017 1 次提交
-
-
由 Gao Feng 提交于
When qdisc fail to init, qdisc_create would invoke the destroy callback to cleanup. But there is no check if the callback exists really. So it would cause the panic if there is no real destroy callback like the qdisc codel, fq, and so on. Take codel as an example following: When a malicious user constructs one invalid netlink msg, it would cause codel_init->codel_change->nla_parse_nested failed. Then kernel would invoke the destroy callback directly but qdisc codel doesn't define one. It causes one panic as a result. Now add one the check for destroy to avoid the possible panic. Fixes: 87b60cfa ("net_sched: fix error recovery at qdisc creation") Signed-off-by: NGao Feng <gfree.wind@vip.163.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 6月, 2017 1 次提交
-
-
由 Daniel Borkmann 提交于
In order to be able to retrieve the attached programs from cls_bpf and act_bpf, we need to expose the prog ids via netlink so that an application can later on get an fd based on the id through the BPF_PROG_GET_FD_BY_ID command, and dump related prog info via BPF_OBJ_GET_INFO_BY_FD command for bpf(2). Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 6月, 2017 2 次提交
-
-
由 Jiri Benc 提交于
Allow requesting of zero UDP checksum for encapsulated packets. The name and meaning of the attribute is "NO_CSUM" in order to have the same meaning of the attribute missing and being 0. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
There's currently no way to request (outer) UDP checksum with act_tunnel_key. This is problem especially for IPv6. Right now, tunnel_key action with IPv6 does not work without going through hassles: both sides have to have udp6zerocsumrx configured on the tunnel interface. This is obviously not a good solution universally. It makes more sense to compute the UDP checksum by default even for IPv4. Just set the default to request the checksum when using act_tunnel_key. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2017 2 次提交
-
-
由 Dan Carpenter 提交于
I'm reviewing static checker warnings where we do ERR_PTR(0), which is the same as NULL. I'm pretty sure we intended to return ERR_PTR(-EINVAL) here. Sometimes these bugs lead to a NULL dereference but I don't immediately see that problem here. Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
Laura reported a sleep-in-atomic kernel warning inside tcf_act_police_init() which calls gen_replace_estimator() with spinlock protection. It is not necessary in this case, we already have RTNL lock here so it is enough to protect concurrent writers. For the reader, i.e. tcf_act_police(), it needs to make decision based on this rate estimator, in the worst case we drop more/less packets than necessary while changing the rate in parallel, it is still acceptable. Reported-by: NLaura Abbott <labbott@redhat.com> Reported-by: NNick Huber <nicholashuber@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 6月, 2017 1 次提交
-
-
由 Jiri Pirko 提交于
We need to push the chain index down to the drivers, so they have the information to which chain the rule belongs. For now, no driver supports multichain offload, so only chain 0 is supported. This is needed to prevent chain squashes during offload for now. Later this will be used to implement multichain offload. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2017 1 次提交
-
-
由 Jiri Pirko 提交于
There is need to instruct the HW offloaded path to push certain matched packets to cpu/kernel for further analysis. So this patch introduces a new TRAP control action to TC. For kernel datapath, this action does not make much sense. So with the same logic as in HW, new TRAP behaves similar to STOLEN. The skb is just dropped in the datapath (and virtually ejected to an upper level, which does not exist in case of kernel). Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NYotam Gigi <yotamg@mellanox.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 6月, 2017 2 次提交
-
-
由 Jiri Pirko 提交于
It really makes no sense to have cls_act enabled without cls. In that case, the cls_act code is dead. So select it. This also fixes an issue recently reported by kbuild robot: [linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get' Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Fixes: db50514f ("net: sched: add termination action to allow goto chain") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
Benefit from the support of ip header fields dissection and allow users to set rules matching on ipv4 tos and ttl or ipv6 traffic-class and hoplimit. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Reviewed-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 5月, 2017 2 次提交
-
-
由 WANG Cong 提交于
tcf_chain_get() always creates a new filter chain if not found in existing ones. This is totally unnecessary when we get or delete filters, new chain should be only created for new filters (or new actions). Fixes: 5bc17018 ("net: sched: introduce multichain support for filters") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
With the introduction of chain goto action, the reclassification would cause the re-iteration of the actual chain. It makes more sense to restart the whole thing and re-iterate starting from the original tp - start of chain 0. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 5月, 2017 1 次提交
-
-
由 Jiri Pirko 提交于
Benefit from the support of tcp flags dissection and allow user to insert rules matching on tcp flags. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 5月, 2017 3 次提交
-
-
由 Jiri Pirko 提交于
When user instructs to remove all filters from chain, we cannot destroy the chain as other actions may hold a reference. Also the put in errout would try to destroy it again. So instead, just walk the chain and remove all existing filters. Fixes: 5bc17018 ("net: sched: introduce multichain support for filters") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
*p_filter_chain is rcu-dereferenced on reader path. So here in writer, property assign the pointer. Fixes: 2190d1d0 ("net: sched: introduce helpers to work with filter chains") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Since the head is guaranteed by the check above to be null, the call_rcu would explode. Remove the previously logically dead code that was made logically very much alive and kicking. Fixes: 985538ee ("net/sched: remove redundant null check on head") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2017 1 次提交
-
-
由 Davide Caratti 提交于
skb->csum_not_inet carries the indication on which algorithm is needed to compute checksum on skb in the transmit path, when skb->ip_summed is equal to CHECKSUM_PARTIAL. If skb carries a SCTP packet and crc32c hasn't been yet written in L4 header, skb->csum_not_inet is assigned to 1; otherwise, assume Internet Checksum is needed and thus set skb->csum_not_inet to 0. Suggested-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 5月, 2017 11 次提交
-
-
由 David S. Miller 提交于
We still need to initialize err to -EINVAL for the case where 'opt' is NULL in dsmark_init(). Fixes: 6529eaba ("net: sched: introduce tcf block infractructure") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Introduce new type of termination action called "goto_chain". This allows user to specify a chain to be processed. This action type is then processed as a return value in tcf_classify loop in similar way as "reclassify" is, only it does not reset to the first filter in chain but rather reset to the first filter of the desired chain. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Tp pointer will be needed by the next patch in order to get the chain. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Instead of having only one filter per block, introduce a list of chains for every block. Create chain 0 by default. UAPI is extended so the user can specify which chain he wants to change. If the new attribute is not specified, chain 0 is used. That allows to maintain backward compatibility. If chain does not exist and user wants to manipulate with it, new chain is created with specified index. Also, when last filter is removed from the chain, the chain is destroyed. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Since there will be multiple chains to dump, push chain dumping code to a separate function. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Introduce struct tcf_chain object and set of helpers around it. Wraps up insertion, deletion and search in the filter chain. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Call the helper from the function rather than to always adjust the return value of the function. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
The use of "nprio" variable in tc_ctl_tfilter is a bit cryptic and makes a reader wonder what is going on for a while. So help him to understand this priority allocation dance a litte bit better. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Make the name consistent with the rest of the helpers around. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Currently, the filter chains are direcly put into the private structures of qdiscs. In order to be able to have multiple chains per qdisc and to allow filter chains sharing among qdiscs, there is a need for common object that would hold the chains. This introduces such object and calls it "tcf_block". Helpers to get and put the blocks are provided to be called from individual qdisc code. Also, the original filter_list pointers are left in qdisc privs to allow the entry into tcf_block processing without any added overhead of possible multiple pointer dereference on fast path. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Move tc_classify function to cls_api.c where it belongs, rename it to fit the namespace. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 5月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
BBR congestion control depends on pacing, and pacing is currently handled by sch_fq packet scheduler for performance reasons, and also because implemening pacing with FQ was convenient to truly avoid bursts. However there are many cases where this packet scheduler constraint is not practical. - Many linux hosts are not focusing on handling thousands of TCP flows in the most efficient way. - Some routers use fq_codel or other AQM, but still would like to use BBR for the few TCP flows they initiate/terminate. This patch implements an automatic fallback to internal pacing. Pacing is requested either by BBR or use of SO_MAX_PACING_RATE option. If sch_fq happens to be in the egress path, pacing is delegated to the qdisc, otherwise pacing is done by TCP itself. One advantage of pacing from TCP stack is to get more precise rtt estimations, and less work done from TX completion, since TCP Small queue limits are not generally hit. Setups with single TX queue but many cpus might even benefit from this. Note that unlike sch_fq, we do not take into account header sizes. Taking care of these headers would add additional complexity for no practical differences in behavior. Some performance numbers using 800 TCP_STREAM flows rate limited to ~48 Mbit per second on 40Gbit NIC. If MQ+pfifo_fast is used on the NIC : $ sar -n DEV 1 5 | grep eth 14:48:44 eth0 725743.00 2932134.00 46776.76 4335184.68 0.00 0.00 1.00 14:48:45 eth0 725349.00 2932112.00 46751.86 4335158.90 0.00 0.00 0.00 14:48:46 eth0 725101.00 2931153.00 46735.07 4333748.63 0.00 0.00 0.00 14:48:47 eth0 725099.00 2931161.00 46735.11 4333760.44 0.00 0.00 1.00 14:48:48 eth0 725160.00 2931731.00 46738.88 4334606.07 0.00 0.00 0.00 Average: eth0 725290.40 2931658.20 46747.54 4334491.74 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 4 0 0 259825920 45644 2708324 0 0 21 2 247 98 0 0 100 0 0 4 0 0 259823744 45644 2708356 0 0 0 0 2400825 159843 0 19 81 0 0 0 0 0 259824208 45644 2708072 0 0 0 0 2407351 159929 0 19 81 0 0 1 0 0 259824592 45644 2708128 0 0 0 0 2405183 160386 0 19 80 0 0 1 0 0 259824272 45644 2707868 0 0 0 32 2396361 158037 0 19 81 0 0 Now use MQ+FQ : lpaa23:~# echo fq >/proc/sys/net/core/default_qdisc lpaa23:~# tc qdisc replace dev eth0 root mq $ sar -n DEV 1 5 | grep eth 14:49:57 eth0 678614.00 2727930.00 43739.13 4033279.14 0.00 0.00 0.00 14:49:58 eth0 677620.00 2723971.00 43674.69 4027429.62 0.00 0.00 1.00 14:49:59 eth0 676396.00 2719050.00 43596.83 4020125.02 0.00 0.00 0.00 14:50:00 eth0 675197.00 2714173.00 43518.62 4012938.90 0.00 0.00 1.00 14:50:01 eth0 676388.00 2719063.00 43595.47 4020171.64 0.00 0.00 0.00 Average: eth0 676843.00 2720837.40 43624.95 4022788.86 0.00 0.00 0.40 $ vmstat 1 5 procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 0 259832240 46008 2710912 0 0 21 2 223 192 0 1 99 0 0 1 0 0 259832896 46008 2710744 0 0 0 0 1702206 198078 0 17 82 0 0 0 0 0 259830272 46008 2710596 0 0 0 0 1696340 197756 1 17 83 0 0 4 0 0 259829168 46024 2710584 0 0 16 0 1688472 197158 1 17 82 0 0 3 0 0 259830224 46024 2710408 0 0 0 0 1692450 197212 0 18 82 0 0 As expected, number of interrupts per second is very different. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 5月, 2017 1 次提交
-
-
由 Eric Dumazet 提交于
In commit 59cc1f61 ("net: sched: convert qdisc linked list to hashtable") we missed the opportunity to considerably speed up tc_dump_tclass_root() if a qdisc handle is provided by user. Instead of iterating all the qdiscs, use qdisc_match_from_root() to directly get the one we look for. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 5月, 2017 1 次提交
-
-
由 Michal Hocko 提交于
fq_alloc_node, alloc_netdev_mqs and netif_alloc* open code kmalloc with vmalloc fallback. Use the kvmalloc variant instead. Keep the __GFP_REPEAT flag based on explanation from Eric: "At the time, tests on the hardware I had in my labs showed that vmalloc() could deliver pages spread all over the memory and that was a small penalty (once memory is fragmented enough, not at boot time)" The way how the code is constructed means, however, that we prefer to go and hit the OOM killer before we fall back to the vmalloc for requests <=32kB (with 4kB pages) in the current code. This is rather disruptive for something that can be achived with the fallback. On the other hand __GFP_REPEAT doesn't have any useful semantic for these requests. So the effect of this patch is that requests which fit into 32kB will fall back to vmalloc easier now. Link: http://lkml.kernel.org/r/20170306103327.2766-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Eric Dumazet <edumazet@google.com> Cc: David Miller <davem@davemloft.net> Cc: Shakeel Butt <shakeelb@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-