- 08 7月, 2018 9 次提交
-
-
由 Vlad Buslov 提交于
Change action API to assume that action init function always takes reference to action, even when overwriting existing action. This is necessary because action API continues to use action pointer after init function is done. At this point action becomes accessible for concurrent modifications, so user must always hold reference to it. Implement helper put list function to atomically release list of actions after action API init code is done using them. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Return from action init function with reference to action taken, even when overwriting existing action. Action init API initializes its fourth argument (pointer to pointer to tc action) to either existing action with same index or newly created action. In case of existing index(and bind argument is zero), init function returns without incrementing action reference counter. Caller of action init then proceeds working with action, without actually holding reference to it. This means that action could be deleted concurrently. Change action init behavior to always take reference to action before returning successfully, in order to protect from concurrent deletion. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Implement helper delete function that uses new action ops 'delete', instead of destroying action directly. This is required so act API could delete actions by index, without holding any references to action that is being deleted. Implement function __tcf_action_put() that releases reference to action and frees it, if necessary. Refactor action deletion code to use new put function and not to rely on rtnl lock. Remove rtnl lock assertions that are no longer needed. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Extend action ops with 'delete' function. Each action type to implements its own delete function that doesn't depend on rtnl lock. Implement delete function that is required to delete actions without holding rtnl lock. Use action API function that atomically deletes action only if it is still in action idr. This implementation prevents concurrent threads from deleting same action twice. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Implement new action API function that atomically finds and deletes action from idr by index. Intended to be used by lockless actions that do not rely on rtnl lock. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Without rtnl lock protection it is no longer safe to use pointer to tc action without holding reference to it. (it can be destroyed concurrently) Remove unsafe action idr lookup function. Instead of it, implement safe tcf idr check function that atomically looks up action in idr and increments its reference and bind counters. Implement both action search and check using new safe function Reference taken by idr check is temporal and should not be accounted by userspace clients (both logically and to preserver current API behavior). Subtract temporal reference when dumping action to userspace using existing tca_get_fill function arguments. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Add additional 'rtnl_held' argument to act API init functions. It is required to implement actions that need to release rtnl lock before loading kernel module and reacquire if afterwards. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Change type of action reference counter to refcount_t. Change type of action bind counter to atomic_t. This type is used to allow decrementing bind counter without testing for 0 result. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Implement functions to atomically update and free action cookie using rcu mechanism. Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 7月, 2018 3 次提交
-
-
由 Jianbo Liu 提交于
As support dissecting of QinQ inner and outer vlan headers, user can add rules to match on QinQ vlan headers. Signed-off-by: NJianbo Liu <jianbol@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jianbo Liu 提交于
Currently the encapsulated ethertype is not dumped as it's the same as TCA_FLOWER_KEY_ETH_TYPE keyvalue. But the dumping result is inconsistent with input, we add dumping it with TCA_FLOWER_KEY_VLAN_ETH_TYPE. Signed-off-by: NJianbo Liu <jianbol@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jianbo Liu 提交于
As flow dissector stores vlan ethertype, tc flower now can match on that. It is to make preparation for supporting QinQ. Signed-off-by: NJianbo Liu <jianbol@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 7月, 2018 6 次提交
-
-
由 Jesus Sanchez-Palencia 提交于
Use the socket error queue for reporting dropped packets if the socket has enabled that feature through the SO_TXTIME API. Packets are dropped either on enqueue() if they aren't accepted by the qdisc or on dequeue() if the system misses their deadline. Those are reported as different errors so applications can react accordingly. Userspace can retrieve the errors through the socket error queue and the corresponding cmsg interfaces. A struct sock_extended_err* is used for returning the error data, and the packet's timestamp can be retrieved by adding both ee_data and ee_info fields as e.g.: ((__u64) serr->ee_data << 32) + serr->ee_info This feature is disabled by default and must be explicitly enabled by applications. Enabling it can bring some overhead for the Tx cycles of the application. Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesus Sanchez-Palencia 提交于
Add infra so etf qdisc supports HW offload of time-based transmission. For hw offload, the time sorted list is still used, so packets are dequeued always in order of txtime. Example: $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \ clockid CLOCK_REALTIME In this example, the Qdisc will use HW offload for the control of the transmission time through the network adapter. The hrtimer used for packets scheduling inside the qdisc will use the clockid CLOCK_REALTIME as reference and packets leave the Qdisc "delta" (100000) nanoseconds before their transmission time. Because this will be using HW offload and since dynamic clocks are not supported by the hrtimer, the system clock and the PHC clock must be synchronized for this mode to behave as expected. Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vinicius Costa Gomes 提交于
The ETF (Earliest TxTime First) qdisc uses the information added earlier in this series (the socket option SO_TXTIME and the new role of sk_buff->tstamp) to schedule packets transmission based on absolute time. For some workloads, just bandwidth enforcement is not enough, and precise control of the transmission of packets is necessary. Example: $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \ clockid CLOCK_TAI In this example, the Qdisc will provide SW best-effort for the control of the transmission time to the network adapter, the time stamp in the socket will be in reference to the clockid CLOCK_TAI and packets will leave the qdisc "delta" (100000) nanoseconds before its transmission time. The ETF qdisc will buffer packets sorted by their txtime. It will drop packets on enqueue() if their skbuff clockid does not match the clock reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped if it expires while being enqueued. The qdisc also supports the SO_TXTIME deadline mode. For this mode, it will dequeue a packet as soon as possible and change the skb timestamp to 'now' during etf_dequeue(). Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid to be configured, but it's been decided that usage of CLOCK_TAI should be enforced until we decide to allow for other clockids to be used. The rationale here is that PTP times are usually in the TAI scale, thus no other clocks should be necessary. For now, the qdisc will return EINVAL if any clocks other than CLOCK_TAI are used. Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vinicius Costa Gomes 提交于
This adds 'qdisc_watchdog_init_clockid()' that allows a clockid to be passed, this allows other time references to be used when scheduling the Qdisc to run. Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Yongjun 提交于
'keys_ex' is malloced by tcf_pedit_keys_ex_parse() in tcf_pedit_init() but not all of the error handle path free it, this may cause memory leak. This patch fix it. Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers") Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Qiaobin Fu 提交于
The new action inheritdsfield copies the field DS of IPv4 and IPv6 packets into skb->priority. This enables later classification of packets based on the DS field. v5: *Update the drop counter for TC_ACT_SHOT v4: *Not allow setting flags other than the expected ones. *Allow dumping the pure flags. v3: *Use optional flags, so that it won't break old versions of tc. *Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags. v2: *Fix the style issue *Move the code from skbmod to skbedit Original idea by Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NQiaobin Fu <qiaobinf@bu.edu> Reviewed-by: NMichel Machado <michel@digirati.com.br> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 7月, 2018 1 次提交
-
-
由 Roman Mashak 提交于
Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2018 3 次提交
-
-
由 Simon Horman 提交于
Allow setting tunnel options using the act_tunnel_key action. Options are expressed as class:type:data and multiple options may be listed using a comma delimiter. # ip link add name geneve0 type geneve dstport 0 external # tc qdisc add dev eth0 ingress # tc filter add dev eth0 protocol ip parent ffff: \ flower indev eth0 \ ip_proto udp \ action tunnel_key \ set src_ip 10.0.99.192 \ dst_ip 10.0.99.193 \ dst_port 6081 \ id 11 \ geneve_opts 0102:80:00800022,0102:80:00800022 \ action mirred egress redirect dev geneve0 Signed-off-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NPieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG during validation of user input. Cc: Alexander Aring <aring@mojatatu.com> Signed-off-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NPieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Simon Horman 提交于
Metadata may be NULL for one of two reasons: * Missing user input * Failure to allocate the metadata dst Disambiguate these case by returning -EINVAL for the former and -ENOMEM for the latter rather than -EINVAL for both cases. This is in preparation for using extended ack to provide more information to users when parsing their input. Signed-off-by: NSimon Horman <simon.horman@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 6月, 2018 6 次提交
-
-
由 Roman Mashak 提交于
Since char can be unsigned or signed, and bitwise operators may have implementation-dependent results when performed on signed operands, declare 'u8 *' operand instead. Suggested-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Mashak 提交于
Change "tc filter pedit .." to "tc actions pedit .." in error messages to clearly refer to pedit action. Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Mashak 提交于
Replace constant integer with sizeof() to clearly indicate the destination buffer length in skb_header_pointer() calls. Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Mashak 提交于
The variable _data in include/asm-generic/sections.h defines sections, this causes sparse warning in pedit: net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one ./include/asm-generic/sections.h:36:13: originally declared here Therefore rename the variable. Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Roman Mashak 提交于
Fix coding style issues in tc pedit action detected by the checkpatch script. Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NRoman Mashak <mrv@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yousuk Seung 提交于
Extend slotting with support for non-uniform distributions. This is similar to netem's non-uniform distribution delay feature. Commit f043efeae2f1 ("netem: support delivering packets in delayed time slots") added the slotting feature to approximate the behaviors of media with packet aggregation but only supported a uniform distribution for delays between transmission attempts. Tests with TCP BBR with emulated wifi links with non-uniform distributions produced more useful results. Syntax: slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \ [bytes MAX_BYTES] The syntax and use of the distribution table is the same as in the non-uniform distribution delay feature. A file DISTRIBUTION must be present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by NETEM_DIST_SCALE. A random value x is selected from the table and it takes DELAY + ( x * JITTER ) as delay. Correlation between values is not supported. Examples: Normal distribution delay with mean = 800us and stdev = 100us. > tc qdisc add dev eth0 root netem slot dist normal 800us 100us Optionally set the max slot size in bytes and/or packets. > tc qdisc add dev eth0 root netem slot dist normal 800us 100us \ bytes 64k packets 42 Signed-off-by: NYousuk Seung <ysseung@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 6月, 2018 6 次提交
-
-
由 John Hurley 提交于
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a block when a callback tries to register to a block that already has offloaded rules. If all existing rules cannot be offloaded then the registration is rejected. This replaces the previous policy of rejecting such callback registration outright. On unregistration of a callback, the rules are flushed for that given cb. The implementation of block sharing in the NFP driver, for example, duplicates shared rules to all devs bound to a block. This meant that rules could still exist in hw even after a device is unbound from a block (assuming the block still remains active). Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Add the offload tcf_proto_op in cls_bpf to generate an offload message for each bpf prog in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' prog. A prog contains a flag to indicate if it is in hardware or not. To ensure the offload function properly maintains this flag, keep a reference counter for the number of instances of the prog that are in hardware. Only update the flag when this counter changes from or to 0. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Add the offload tcf_proto_op in cls_u32 to generate an offload message for each filter and the hashtable in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. A filter contains a flag to indicate if it is in hardware or not. To ensure the offload function properly maintains this flag, keep a reference counter for the number of instances of the filter that are in hardware. Only update the flag when this counter changes from or to 0. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Add the reoffload tcf_proto_op in matchall to generate an offload message for each filter in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. Ensure matchall flags correctly report if the rule is in hw by keeping a reference counter for the number of instances of the rule offloaded. Only update the flag when this counter changes from or to 0. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Add the reoffload tcf_proto_op in flower to generate an offload message for each filter in the given tcf_proto. Call the specified callback with this new offload message. The function only returns an error if the callback rejects adding a 'hardware only' rule. A filter contains a flag to indicate if it is in hardware or not. To ensure the reoffload function properly maintains this flag, keep a reference counter for the number of instances of the filter that are in hardware. Only update the flag when this counter changes from or to 0. Add a generic helper function to implement this behaviour. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Pass the extact struct from a tc qdisc add to the block bind function and, in turn, to the setup_tc ndo of binding device via the tc_block_offload struct. Pass this back to any block callback registrations to allow netlink logging of fails in the bind process. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 6月, 2018 1 次提交
-
-
由 Cong Wang 提交于
After commit a09ceb0e ("sched: remove qdisc->drop"), it is no longer used. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 6月, 2018 1 次提交
-
-
由 Cong Wang 提交于
In update_vf(): cftree_remove(cl); update_cfmin(cl->cl_parent); the cl_cfmin of cl->cl_parent is intentionally updated to 0 when that parent only has one child. And if this parent is root qdisc, we could end up, in hfsc_schedule_watchdog(), that we can't decide the next schedule time for qdisc watchdog. But it seems safe that we can just skip it, as this watchdog is not always scheduled anyway. Thanks to Marco for testing all the cases, nothing is broken. Reported-by: NMarco Berizzi <pupilla@libero.it> Tested-by: NMarco Berizzi <pupilla@libero.it> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 6月, 2018 1 次提交
-
-
由 Paolo Abeni 提交于
If flower filter is created without the skip_sw flag, fl_mask_put() can race with fl_classify() and we can destroy the mask rhashtable while a lookup operation is accessing it. BUG: unable to handle kernel paging request at 00000000000911d1 PGD 0 P4D 0 SMP PTI CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016 RIP: 0010:rht_bucket_nested+0x20/0x60 Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0 RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206 RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12 RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001 RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2 R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000 R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68 FS: 0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0 Call Trace: fl_lookup+0x134/0x140 [cls_flower] fl_classify+0xf3/0x180 [cls_flower] tcf_classify+0x78/0x150 __netif_receive_skb_core+0x69e/0xa50 netif_receive_skb_internal+0x42/0xf0 tun_get_user+0xdd5/0xfd0 [tun] tun_sendmsg+0x52/0x70 [tun] handle_tx+0x2b3/0x5f0 [vhost_net] vhost_worker+0xab/0x100 [vhost] kthread+0xf8/0x130 ret_from_fork+0x35/0x40 Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress CR2: 00000000000911d1 Fix the above waiting for a RCU grace period before destroying the rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy() must run in process context, as pointed out by Cong Wang. v1 -> v2: use tcf_queue_work to run rhashtable_destroy(). Fixes: 05cd271f ("cls_flower: Support multiple masks per priority") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 6月, 2018 2 次提交
-
-
由 Davide Caratti 提交于
in the following script # tc actions add action ife encode allow prio pass index 42 # tc actions replace action ife encode allow tcindex drop index 42 the action control should remain equal to 'pass', if the kernel failed to replace the TC action. Pospone the assignment of the action control, to ensure it is not overwritten in the error path of tcf_ife_init(). Fixes: ef6980b6 ("introduce IFE action") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
a recursive lock warning [1] can be observed with the following script, # $TC actions add action ife encode allow prio pass index 42 IFE type 0xED3E # $TC actions replace action ife encode allow tcindex pass index 42 in case the kernel was unable to run the last command (e.g. because of the impossibility to load 'act_meta_skbtcindex'). For a similar reason, the kernel can leak idr in the error path of tcf_ife_init(), because tcf_idr_release() is not called after successful idr reservation: # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No such file or directory We have an error talking to the kernel # $TC actions add action ife encode allow tcindex index 47 IFE type 0xED3E RTNETLINK answers: No space left on device We have an error talking to the kernel # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47 IFE type 0xFEFE RTNETLINK answers: No space left on device We have an error talking to the kernel Since tcfa_lock is already taken when the action is being edited, a call to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock again. On the other hand, tcf_idr_release() needs to be called in the error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation. Fix both problems in tcf_ife_init(). Since the cleanup() routine can now be called when ife->params is NULL, also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu). [1] ============================================ WARNING: possible recursive locking detected 4.17.0-rc4.kasan+ #417 Tainted: G E -------------------------------------------- tc/3932 is trying to acquire lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife] but task is already holding lock: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&p->tcfa_lock)->rlock); lock(&(&p->tcfa_lock)->rlock); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by tc/3932: #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife] #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife] stack backtrace: CPU: 3 PID: 3932 Comm: tc Tainted: G E 4.17.0-rc4.kasan+ #417 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: dump_stack+0x9a/0xeb __lock_acquire+0xf43/0x34a0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? debug_check_no_locks_freed+0x2b0/0x2b0 ? __mutex_lock+0x62f/0x1240 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_acquire+0x10b/0x330 lock_acquire+0x10b/0x330 ? tcf_ife_cleanup+0x19/0x80 [act_ife] _raw_spin_lock_bh+0x38/0x70 ? tcf_ife_cleanup+0x19/0x80 [act_ife] tcf_ife_cleanup+0x19/0x80 [act_ife] __tcf_idr_release+0xff/0x350 tcf_ife_init+0xdde/0x13c0 [act_ife] ? ife_exit_net+0x290/0x290 [act_ife] ? __lock_is_held+0xb4/0x140 tcf_action_init_1+0x67b/0xad0 ? tcf_action_dump_old+0xa0/0xa0 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? memset+0x1f/0x40 tcf_action_init+0x30f/0x590 ? tcf_action_init_1+0xad0/0xad0 ? memset+0x1f/0x40 tc_ctl_action+0x48e/0x5e0 ? mutex_lock_io_nested+0x1160/0x1160 ? tca_action_gd+0x990/0x990 ? sched_clock+0x5/0x10 ? find_held_lock+0x39/0x1d0 rtnetlink_rcv_msg+0x4da/0x990 ? validate_linkmsg+0x680/0x680 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 netlink_rcv_skb+0x127/0x350 ? validate_linkmsg+0x680/0x680 ? netlink_ack+0x970/0x970 ? __kmalloc_node_track_caller+0x304/0x3a0 netlink_unicast+0x40f/0x5d0 ? netlink_attachskb+0x580/0x580 ? _copy_from_iter_full+0x187/0x760 ? import_iovec+0x90/0x390 netlink_sendmsg+0x67f/0xb50 ? netlink_unicast+0x5d0/0x5d0 ? copy_msghdr_from_user+0x206/0x340 ? netlink_unicast+0x5d0/0x5d0 sock_sendmsg+0xb3/0xf0 ___sys_sendmsg+0x60a/0x8b0 ? copy_msghdr_from_user+0x340/0x340 ? lock_downgrade+0x5e0/0x5e0 ? tty_write_lock+0x18/0x50 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock+0x5/0x10 ? sched_clock_cpu+0x18/0x170 ? find_held_lock+0x39/0x1d0 ? lock_downgrade+0x5e0/0x5e0 ? lock_acquire+0x10b/0x330 ? __audit_syscall_entry+0x316/0x690 ? current_kernel_time64+0x6b/0xd0 ? __fget_light+0x55/0x1f0 ? __sys_sendmsg+0xd2/0x170 __sys_sendmsg+0xd2/0x170 ? __ia32_sys_shutdown+0x70/0x70 ? syscall_trace_enter+0x57a/0xd60 ? rcu_read_lock_sched_held+0xdc/0x110 ? __bpf_trace_sys_enter+0x10/0x10 ? do_syscall_64+0x22/0x480 do_syscall_64+0xa5/0x480 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fd646988ba0 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100 Fixes: 4e8c8615 ("net sched: net sched: ife action fix late binding") Fixes: ef6980b6 ("introduce IFE action") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 6月, 2018 1 次提交
-
-
由 Konstantin Khlebnikov 提交于
When blackhole is used on top of classful qdisc like hfsc it breaks qlen and backlog counters because packets are disappear without notice. In HFSC non-zero qlen while all classes are inactive triggers warning: WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc] and schedules watchdog work endlessly. This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS, this flag tells upper layer: this packet is gone and isn't queued. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-