- 27 May 2020, 2 commits
-
-
Submitted by Guillaume Nault

With struct flow_dissector_key_mpls now recording the first FLOW_DIS_MPLS_MAX labels, we can extend Flower to filter on any of these LSEs independently.

In order to avoid creating new netlink attributes for every possible depth, let's define a new TCA_FLOWER_KEY_MPLS_OPTS nested attribute that contains the list of LSEs to match. Each LSE is represented by another attribute, TCA_FLOWER_KEY_MPLS_OPTS_LSE, which then contains the attributes representing the depth and the MPLS fields to match at this depth (label, TTL, etc.).

For each MPLS field, the mask is always set to all-ones, as this is what the original API did. We could allow user-configurable masks in the future if there is demand for more flexibility.

The new API also allows specifying only an LSE depth. In that case, Flower only verifies that the MPLS label stack depth is greater than or equal to the provided depth (that is, an LSE exists at this depth).

Filters that only match on one (or more) fields of the first LSE are dumped using the old netlink attributes, to avoid confusing user space programs that don't understand the new API.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
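For illustration, a filter matching fields of the first two LSEs could look like the following (a sketch only: the "mpls lse ..." keywords assume an iproute2 build with support for the new TCA_FLOWER_KEY_MPLS_OPTS attributes):

  # tc filter add dev eth0 ingress protocol mpls_uc flower \
      mpls lse depth 1 label 100 ttl 64 \
           lse depth 2 label 200 bos 1 \
      action drop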
-
Submitted by Guillaume Nault

The current MPLS dissector only parses the first MPLS Label Stack Entry (the second LSE can be parsed too, but only to set a key_id).

This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries.

FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space.

To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack.

TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch.

The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't.

Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to be any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably be better off hashing all parsed MPLS labels (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a possible future improvement.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
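The resulting key layout is roughly the following, a sketch reconstructed from the description above (the field packing mirrors the 20-bit label, 3-bit TC, 1-bit BOS and 8-bit TTL of an MPLS LSE; exact details may differ from the tree):

  #define FLOW_DIS_MPLS_MAX 7

  struct flow_dissector_mpls_lse {
          u32     mpls_ttl:8,
                  mpls_bos:1,
                  mpls_tc:3,
                  mpls_label:20;
  };

  struct flow_dissector_key_mpls {
          /* One entry per parsed Label Stack Entry */
          struct flow_dissector_mpls_lse ls[FLOW_DIS_MPLS_MAX];
          /* One bit set for each LSE actually dissected */
          u8 used_lses;
  };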
-
- 16 May 2020, 1 commit
-
-
Submitted by Vlad Buslov

Implement the tcf_proto_ops->terse_dump() callback for the flower classifier. In terse mode, only dump the handle, flags and action data.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 March 2020, 1 commit
-
-
Submitted by Jiri Pirko

It may be up to the driver (in case "any" HW stats type is passed) to select which type of HW stats it is going to use. Add infrastructure to expose this information to the user.

  $ tc filter add dev enp3s0np1 ingress proto ip handle 1 pref 1 \
      flower dst_ip 192.168.1.1 action drop
  $ tc -s filter show dev enp3s0np1 ingress
  filter protocol ip pref 1 flower chain 0
  filter protocol ip pref 1 flower chain 0 handle 0x1
    eth_type ipv4
    dst_ip 192.168.1.1
    in_hw in_hw_count 2
          action order 1: gact action drop
           random type none pass val 0
           index 1 ref 1 bind 1 installed 10 sec used 10 sec
          Action statistics:
          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
          used_hw_stats immediate   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 March 2020, 3 commits
-
-
Submitted by Guillaume Nault

Pass extack down to fl_set_key_flags() and set a message on error.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Guillaume Nault

Pass extack down to fl_set_key_port_range() and set a message on error. Both the min and max ports would qualify as invalid attributes here. Report the min one as invalid, as that is probably what makes the most sense from a user point of view.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Guillaume Nault

Pass extack down to fl_set_key_mpls() and set a message on error.

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
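The pattern in these three patches is the same: attach a human-readable message to the offending attribute with NL_SET_ERR_MSG_ATTR(). A minimal sketch (the validation shown is illustrative, not the exact kernel diff):

  static int fl_set_key_mpls(struct nlattr **tb,
                             struct flow_dissector_key_mpls *key_val,
                             struct flow_dissector_key_mpls *key_mask,
                             struct netlink_ext_ack *extack)
  {
          if (tb[TCA_FLOWER_KEY_MPLS_BOS]) {
                  u8 bos = nla_get_u8(tb[TCA_FLOWER_KEY_MPLS_BOS]);

                  if (bos & ~1) { /* Bottom Of Stack is a single bit */
                          NL_SET_ERR_MSG_ATTR(extack,
                                              tb[TCA_FLOWER_KEY_MPLS_BOS],
                                              "Bottom Of Stack (BOS) must be 0 or 1");
                          return -EINVAL;
                  }
                  key_val->mpls_bos = bos;
                  key_mask->mpls_bos = 1;
          }
          return 0;
  }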
-
- 18 February 2020, 2 commits
-
-
Submitted by Jason Baron

tc flower rules that are based on src or dst port blocking are sometimes ineffective due to uninitialized stack data. __skb_flow_dissect() extracts ports from the skb for tc flower to match against. However, the port dissection is not done when the FLOW_DIS_IS_FRAGMENT bit is set in key_control->flags. All callers of __skb_flow_dissect() zero out the key_control field, except for fl_classify() as used by the flower classifier. Thus, FLOW_DIS_IS_FRAGMENT may be set on entry to __skb_flow_dissect(), since key_control is allocated on the stack and may not be initialized.

Since key_basic and key_control are present for all flow keys, let's make sure they are initialized.

Fixes: 62230715 ("flow_dissector: do not dissect l4 ports for fragments")
Co-developed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
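The shape of the fix is to zero the always-present keys inside the dissector itself rather than trusting every caller. An illustrative fragment in the context of __skb_flow_dissect() (a sketch, not the exact diff):

  struct flow_dissector_key_control *key_control;
  struct flow_dissector_key_basic *key_basic;

  /* key_control and key_basic exist for all flow keys, so zero them
   * up front instead of trusting the caller's (possibly stack-allocated,
   * uninitialized) container.
   */
  key_control = skb_flow_dissector_target(flow_dissector,
                                          FLOW_DISSECTOR_KEY_CONTROL,
                                          target_container);
  memset(key_control, 0, sizeof(*key_control));

  key_basic = skb_flow_dissector_target(flow_dissector,
                                        FLOW_DISSECTOR_KEY_BASIC,
                                        target_container);
  memset(key_basic, 0, sizeof(*key_basic));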
-
Submitted by Vlad Buslov

Refactor the tc_setup_flow_action() function not to use the rtnl lock, and remove the 'rtnl_held' argument that is no longer needed.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 14 February 2020, 1 commit
-
-
Submitted by Davide Caratti

Unlike other classifiers that can be offloaded (i.e. users can set flags like 'skip_hw' and 'skip_sw'), 'cls_flower' doesn't validate the size of the netlink attribute 'TCA_FLOWER_FLAGS' provided by the user: add a proper entry to fl_policy.

Fixes: 5b33f488 ("net/flower: Introduce hardware offload support")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
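The fix amounts to a single policy entry, roughly as follows (a sketch, with a couple of neighboring fl_policy entries shown for context):

  static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
          [TCA_FLOWER_CLASSID]    = { .type = NLA_U32 },
          [TCA_FLOWER_INDEV]      = { .type = NLA_STRING,
                                      .len = IFNAMSIZ },
          /* the fix: let netlink parsing validate the attribute size */
          [TCA_FLOWER_FLAGS]      = { .type = NLA_U32 },
  };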
-
- 27 January 2020, 1 commit
-
-
Submitted by Cong Wang

The current implementations of ops->bind_class() merely search for the classid and update the class in struct tcf_result, without invoking either cl_ops->bind_tcf() or cl_ops->unbind_tcf(). This breaks their design, as qdiscs like cbq use them to count filters too. This is why syzbot triggered the warning in cbq_destroy_class().

In order to fix this, we have to call cl_ops->bind_tcf() and cl_ops->unbind_tcf() like the filter binding path. This patch does so by refactoring out two helper functions, __tcf_bind_filter() and __tcf_unbind_filter(), which are lockless and accept a Qdisc pointer, then teaching each implementation to call them correctly.

Note, we merely pass the Qdisc pointer as an opaque pointer to each filter; they only need to pass it down to the helper functions without understanding it at all.

Fixes: 07d79fc7 ("net_sched: add reverse binding for tc class")
Reported-and-tested-by: syzbot+0a0596220218fcb603a8@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+63bdb6006961d8c917c6@syzkaller.appspotmail.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
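A rough sketch of the two helpers (names from the commit; bodies reconstructed from the description, using the existing xchg-based __cls_set_class() setter):

  static inline void
  __tcf_bind_filter(struct Qdisc *q, struct tcf_result *r, unsigned long base)
  {
          unsigned long cl;

          cl = q->ops->cl_ops->bind_tcf(q, base, r->classid);
          cl = __cls_set_class(&r->class, cl);
          if (cl)
                  q->ops->cl_ops->unbind_tcf(q, cl);
  }

  static inline void
  __tcf_unbind_filter(struct Qdisc *q, struct tcf_result *r)
  {
          unsigned long cl;

          cl = __cls_set_class(&r->class, 0);
          if (cl)
                  q->ops->cl_ops->unbind_tcf(q, cl);
  }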
-
- 31 December 2019, 1 commit
-
-
Submitted by Davide Caratti

Revert "net/sched: cls_u32: fix refcount leak in the error path of u32_change()", and fix the u32 refcount leak in a more generic way that preserves the semantic of rule dumping.

On tc filters that don't support lockless insertion/removal, there is no need to guard against concurrent insertion when a removal is in progress. Therefore, for most of them we can avoid a full walk() when deleting, and just decrease the refcount, like it was done on older Linux kernels.

This fixes situations where walk() was wrongly detecting a non-empty filter, like it happened with cls_u32 in the error path of change(), thus leading to failures in the following tdc selftests:

  6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev
  6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle
  74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id

On cls_flower, and on (future) lockless filters, this check is necessary: move all the check_empty() logic into a callback so that each filter can have its own implementation. For cls_flower, it's sufficient to check if no IDRs have been allocated.

This reverts commit 275c44aa.

Changes since v1:
 - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED is used, thanks to Vlad Buslov
 - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov
 - squash revert and new fix in a single patch, to be nice with bisect tests that run tdc on the u32 filter, thanks to Dave Miller

Fixes: 275c44aa ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()")
Fixes: 6676d5e4 ("net: sched: set dedicated tcf_walker flag when tp is empty")
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Suggested-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
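For flower, the new callback can boil down to an IDR emptiness check, roughly (a sketch consistent with the description; locking details may differ from the tree):

  static bool fl_delete_empty(struct tcf_proto *tp)
  {
          struct cls_fl_head *head = fl_head_dereference(tp);

          spin_lock(&tp->lock);
          tp->deleting = idr_is_empty(&head->handle_idr);
          spin_unlock(&tp->lock);

          return tp->deleting;
  }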
-
- 10 December 2019, 1 commit
-
-
Submitted by Pankaj Bharadiya

Replace all the occurrences of FIELD_SIZEOF() with sizeof_field(), except at the places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF().

This patch was generated using the following script:

  EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
  git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
  do
      if [[ "$file" =~ $EXCLUDE_FILES ]]; then
          continue
      fi
      sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
  done

Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
-
- 04 December 2019, 1 commit
-
-
Submitted by Yoshiki Komachi

The recent commit 5c72299f ("net: sched: cls_flower: Classify packets using port ranges") added filtering based on port ranges to tc flower. However, the commit missed necessary changes in the hw-offload code, so the feature gave rise to generating incorrect offloaded flow keys in the NIC.

A more detailed example is below:

  $ tc qdisc add dev eth0 ingress
  $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
      dst_port 100-200 action drop

With the setup above, an exact match filter with dst_port == 0 will be installed in the NIC by hw-offload. IOW, the NIC will have a rule which is equivalent to the following one:

  $ tc qdisc add dev eth0 ingress
  $ tc filter add dev eth0 ingress protocol ip flower ip_proto tcp \
      dst_port 0 action drop

The behavior was caused by the flow dissector which extracts packet data into the flow key in tc flower. More specifically, regardless of an exact match or specified port ranges, fl_init_dissector() set the FLOW_DISSECTOR_KEY_PORTS flag in struct flow_dissector to extract port numbers from the skb in skb_flow_dissect() called by fl_classify(). Note that device drivers received the same struct flow_dissector object as used in skb_flow_dissect(). Thus, offloaded drivers could not identify which of these is used, because the FLOW_DISSECTOR_KEY_PORTS flag was set to struct flow_dissector in either case.

This patch adds the new FLOW_DISSECTOR_KEY_PORTS_RANGE flag and the new tp_range field in struct fl_flow_key to recognize which filters are applied to offloaded drivers. At this point, when filters based on port ranges are passed to drivers, the drivers return the EOPNOTSUPP error because they do not support the feature (the newly created FLOW_DISSECTOR_KEY_PORTS_RANGE flag).

Fixes: 5c72299f ("net: sched: cls_flower: Classify packets using port ranges")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 November 2019, 2 commits
-
-
Submitted by Xin Long

This patch is to allow matching options in erspan.

The options can be described in the form VER:INDEX:DIR:HWID/VER:INDEX_MASK:DIR_MASK:HWID_MASK. When ver is set to 1, index will be applied while dir and hwid will be ignored; when ver is set to 2, dir and hwid will be used while index will be ignored.

Different from geneve, only one option can be set. Also, geneve options, vxlan options and erspan options can't be set at the same time.

  # ip link add name erspan1 type erspan external
  # tc qdisc add dev erspan1 ingress
  # tc filter add dev erspan1 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        erspan_opts 1:12:0:0/1:ffff:0:0 \
        ip_proto udp \
        action mirred egress redirect dev eth0

v1->v2:
 - improve some err msgs of extack.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Xin Long

This patch is to allow matching the gbp option in vxlan. The option can be described in the form GBP/GBP_MASK, where GBP is represented as a 32-bit hexadecimal value.

Different from geneve, only one option can be set. Also, geneve options and vxlan options can't be set at the same time.

  # ip link add name vxlan0 type vxlan dstport 0 external
  # tc qdisc add dev vxlan0 ingress
  # tc filter add dev vxlan0 protocol ip parent ffff: \
      flower \
        enc_src_ip 10.0.99.192 \
        enc_dst_ip 10.0.99.193 \
        enc_key_id 11 \
        vxlan_opts 01020304/ffffffff \
        ip_proto udp \
        action mirred egress redirect dev eth0

v1->v2:
 - add .strict_start_type for enc_opts_policy, as Jakub noticed.
 - use Duplicate instead of Wrong in the err msg for extack, as Jakub suggested.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 August 2019, 5 commits
-
-
Submitted by Vlad Buslov

Don't manually take the rtnl lock in the flower classifier before calling the cls hardware offloads API. Instead, pass the rtnl lock status via the 'rtnl_held' parameter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Buslov

In order to remove the dependency on the rtnl lock when calling the hardware offload API, take a reference to the action mirred dev when initializing the flow_action structure in tc_setup_flow_action(). Implement the function tc_cleanup_flow_action() and use it to release the device after the hardware offload API is done using it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
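The pattern is roughly the following (an illustrative sketch reconstructed from the description, not the exact diff): dev_hold() the mirred device when filling the flow_action entry, then drop the reference in the cleanup helper once the offload call has returned:

  void tc_cleanup_flow_action(struct flow_action *flow_action)
  {
          struct flow_action_entry *entry;
          int i;

          flow_action_for_each(i, entry, flow_action) {
                  switch (entry->id) {
                  case FLOW_ACTION_REDIRECT:
                  case FLOW_ACTION_MIRRED:
                          /* release the reference taken via dev_hold()
                           * in tc_setup_flow_action()
                           */
                          if (entry->dev)
                                  dev_put(entry->dev);
                          break;
                  default:
                          break;
                  }
          }
  }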
-
Submitted by Vlad Buslov

In order to allow using the new flow_action infrastructure from unlocked classifiers, modify tc_setup_flow_action() to accept a new 'rtnl_held' argument. Take the rtnl lock before accessing tc_action data. This is necessary to protect from a concurrent action replace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Buslov

To remove the dependency on the rtnl lock, extend the classifier ops with new ops->hw_add() and ops->hw_del() callbacks. Call them from the cls API while holding cb_lock every time a filter is successfully added to or deleted from hardware.

Implement the new API in the flower classifier. Use it to manage the hw_filters list under cb_lock protection, instead of relying on the rtnl lock to synchronize with a concurrent fl_reoffload() call.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Buslov

Without rtnl lock protection, filters can no longer safely manage the block offloads counter themselves. Refactor the cls API to protect the block offloadcnt with tcf_block->cb_lock, which is already used to protect the driver callback list and the nooffloaddevcnt counter. The counter can be modified by concurrent tasks through the new functions that execute block callbacks (which is safe with the previous patch that changed its type to atomic_t); however, the block bind/unbind code that checks the counter value takes cb_lock in write mode to exclude any concurrent modifications. This approach prevents race conditions between bind/unbind and callback execution code, but allows for concurrency on the tc rule update path.

Move the block offload counter, filter-in-hardware counter and filter flags management from the classifiers into the cls hardware offloads API. Make the functions tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() cls API private. Implement the following new cls API to be used instead:

  tc_setup_cb_add() - non-destructive filter add. If a filter that wasn't
  already in hardware is successfully offloaded, increment the block
  offloads counter and set the filter's in-hardware counter and flag. On
  failure, the previously offloaded filter is considered to be intact and
  the offloads counter is not decremented.

  tc_setup_cb_replace() - destructive filter replace. Release the existing
  filter's block offload counter and reset its in-hardware counter and
  flag. Set the new filter's in-hardware counter and flag. On failure, the
  previously offloaded filter is considered to be destroyed and the
  offload counter is decremented.

  tc_setup_cb_destroy() - filter destroy. Unconditionally decrement the
  block offloads counter.

  tc_setup_cb_reoffload() - reoffload a filter to a single cb. Execute
  cb() and call tc_cls_offload_cnt_update() if cb() didn't return an
  error.

Refactor all offload-capable classifiers to atomically offload filters to hardware, change the block offload counter, and set the filter's in-hardware counter and flag by means of the new cls API functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 July 2019, 1 commit
-
-
Submitted by Pablo Neira Ayuso

Rename this type definition and adapt users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 July 2019, 2 commits
-
-
Submitted by Pablo Neira Ayuso

And any other existing fields in this structure that refer to tc. Specifically:

 * tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
 * TC_CLSFLOWER_* to FLOW_CLS_*.
 * tc_cls_common_offload to flow_cls_common_offload.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Paul Blakey

New matches for conntrack mark, label, zone, and state.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
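For example, with matching iproute2 support, such a rule could look like this (a hedged sketch of the ct_* flower options):

  # tc filter add dev eth0 ingress protocol ip flower \
      ct_state +trk+est ct_zone 1 ct_mark 0xbb \
      action pass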
-
- 02 July 2019, 1 commit
-
-
Submitted by Cong Wang

Similarly, other callers of idr_get_next_ul() suffer the same overflow bug, as they don't handle it properly either. Introduce idr_for_each_entry_continue_ul() to help these callers iterate from a given ID.

cls_flower needs more care here because it still overflows when doing arg->cookie++; we have to fold its nested loops into one and remove the arg->cookie++.

Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
Fixes: 12d6066c ("net/mlx5: Add flow counters idr")
Reported-by: Li Shuang <shuali@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 June 2019, 1 commit
-
-
Submitted by Jiri Pirko

Use the previously introduced infra to obtain and store the ingress ifindex, instead of doing it locally.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 June 2019, 1 commit
-
-
Submitted by Jiri Pirko

This config option makes only a couple of lines optional: two small helpers and an int in a couple of cls structs. Remove the config option and always compile this in. This saves the user from unexpected surprises when they add a filter with an ingress device match which is silently ignored in case the config option is not set.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 June 2019, 1 commit
-
-
Submitted by Vlad Buslov

The current flower mask-creating code assumes that the temporary mask used when inserting a new filter is stack-allocated. To prevent a race condition with the data path, synchronize_rcu() is called every time fl_create_new_mask() replaces the temporary stack-allocated mask. As reported by Jiri, this increases the runtime of creating 20000 flower classifiers from 4 seconds to 163 seconds. However, this design is no longer necessary since the temporary mask was converted to be dynamically allocated by commit 2cddd201 ("net/sched: cls_flower: allocate mask dynamically in fl_change()").

Remove the synchronize_rcu() calls from the mask creation code. Instead, refactor fl_change() to always deallocate the temporary mask with an rcu grace period.

Fixes: 195c234d ("net: sched: flower: handle concurrent mask insertion")
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Tested-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 May 2019, 1 commit
-
-
Submitted by Thomas Gleixner

Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license as published by
  the free software foundation either version 2 of the license or at
  your option any later version

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 3029 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 08 May 2019, 1 commit
-
-
Submitted by Pieter Jansen van Vuuren

Based on feedback from Jiri, avoid carrying a pointer to the tcf_block structure in the tc_cls_common_offload structure. Instead, store a flag in driver private data which indicates whether offloads apply to a shared block at block binding time.

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 May 2019, 1 commit
-
-
Submitted by Pieter Jansen van Vuuren

Some actions, like the police action, are stateful and could share state between devices. This is incompatible with offloading to multiple devices, and drivers might want to test for shared blocks when offloading. Store a pointer to the tcf_block structure in the tc_cls_common_offload structure to allow drivers to determine when offloads apply to a shared block.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 April 2019, 2 commits
-
-
Submitted by Johannes Berg

We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:

 * TRAILING     - check that there's no trailing data after parsing attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated().

Additionally, this allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag.

We end up with the following renames:

 * nla_parse -> nla_parse_deprecated
 * nla_parse_strict -> nla_parse_deprecated_strict
 * nlmsg_parse -> nlmsg_parse_deprecated
 * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
 * nla_parse_nested -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:

  @@
  expression TB, MAX, HEAD, LEN, POL, EXT;
  @@
  -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
  +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

  @@
  expression NLH, HDRLEN, TB, MAX, POL, EXT;
  @@
  -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
  +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

  @@
  expression NLH, HDRLEN, TB, MAX, POL, EXT;
  @@
  -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
  +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

  @@
  expression TB, MAX, NLA, POL, EXT;
  @@
  -nla_parse_nested(TB, MAX, NLA, POL, EXT)
  +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

  @@
  expression START, MAX, POL, EXT;
  @@
  -nla_validate_nested(START, MAX, POL, EXT)
  +nla_validate_nested_deprecated(START, MAX, POL, EXT)

  @@
  expression NLH, HDRLEN, MAX, POL, EXT;
  @@
  -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
  +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions yet, so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc. as re-introduced in the next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Michal Kubecek

Even though the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. the wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents.

Unfortunately, we cannot just add the flag everywhere, as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using this semantic patch:

  @@ expression E1, E2; @@
  -nla_nest_start(E1, E2)
  +nla_nest_start_noflag(E1, E2)

  @@ expression E1, E2; @@
  -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
  +nla_nest_start(E1, E2)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
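After the rename, nla_nest_start() is a thin wrapper that always sets NLA_F_NESTED (a sketch of the include/net/netlink.h change):

  static inline struct nlattr *nla_nest_start(struct sk_buff *skb, int attrtype)
  {
          return nla_nest_start_noflag(skb, attrtype | NLA_F_NESTED);
  }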
-
- 25 April 2019, 1 commit
-
-
Submitted by Vlad Buslov

Recent changes that introduced unlocked flower did not properly account for the case when reoffload is initiated concurrently with filter updates. To fix the issue, extend flower with a 'hw_filters' list that is used to store filters that don't have the 'skip_hw' flag set. A filter is added to the list when it is inserted into hardware, and only removed from it after being unoffloaded from all the drivers that the parent block is attached to. This ensures that a concurrent reoffload can still access a filter that is being deleted, and prevents the race condition where a driver callback could be removed while the filter is no longer accessible through the idr but is still present in hardware.

Refactor fl_change() to respect the new filter reference counter and to release the filter reference with __fl_put() in case of error, instead of directly deallocating the filter memory. This allows concurrent access to the filter from fl_reoffload() and protects it with reference counting.

Refactor fl_reoffload() to iterate over the hw_filters list instead of the idr. Implement the helper function fl_get_next_hw_filter() that iterates over the hw_filters list with reference counting and skips filters that are being concurrently deleted.

Fixes: 92149190 ("net: sched: flower: set unlocked flag for flower proto ops")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 April 2019, 2 commits
-
-
Submitted by Vlad Buslov

Fix net reference counting in fl_change() and remove the redundant call to tcf_exts_get_net() from __fl_delete(). __fl_put() already tries to get net before releasing exts and deallocating a filter, so this code caused the flower classifier to obtain net twice per filter being deleted.

The implementation of __fl_delete() called tcf_exts_get_net() to pass its result as the 'async' flag to fl_mask_put(). However, the 'async' flag is redundant and only complicates the fl_mask_put() implementation. This functionality seems to have been copied from the filter cleanup code, where it was added by Cong with the following explanation:

  This patchset tries to fix the race between call_rcu() and
  cleanup_net() again. Without holding the netns refcnt the
  tc_action_net_exit() in netns workqueue could be called before
  filter destroy works in tc filter workqueue. This patchset
  moves the netns refcnt from tc actions to tcf_exts, without
  breaking per-netns tc actions.

This doesn't apply to the flower mask, which doesn't call any tc action code during cleanup. Simplify fl_mask_put() by removing the flag parameter and always using tcf_queue_work() to free mask objects.

Fixes: 06177558 ("net: sched: flower: introduce reference counting for filters")
Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Buslov

The implementation of rhashtable_insert_fast() checks whether its internal helper function __rhashtable_insert_fast() returns a non-NULL pointer and seemingly returns -EEXIST in such a case. However, since __rhashtable_insert_fast() is called with a NULL key pointer, it never actually checks for duplicates, which means that -EEXIST is never returned to the user. Use the rhashtable_lookup_insert_fast() hash table API instead. In order to verify that it works as expected, and to prevent the problem from happening in the future, extend tc-tests with a new test verifying that no new filters with an existing key can be inserted into the flower classifier.

Fixes: 1f17f774 ("net: sched: flower: insert filter to ht before offloading it to hw")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
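The core of the fix, roughly (a sketch; the field names follow cls_flower's existing per-mask hashtable):

  /* rhashtable_lookup_insert_fast() looks the key up before linking the
   * node, so inserting a filter whose key already exists now genuinely
   * fails with -EEXIST.
   */
  err = rhashtable_lookup_insert_fast(&fnew->mask->ht, &fnew->ht_node,
                                      fnew->mask->filter_ht_params);
  if (err)
          goto errout_hw;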
-
- 08 April 2019, 1 commit
-
-
Submitted by Vlad Buslov

John reports:

  Recent refactoring of fl_change aims to use the classifier spinlock to
  avoid the need for rtnl lock. In doing so, the fl_hw_replace_filter()
  function was moved to before the lock is taken. This can create
  problems for drivers if duplicate filters are created (common in ovs
  tc offload due to filters being triggered by user-space matches).
  Drivers registered for such filters will now receive multiple copies
  of the same rule, each with a different cookie value. This means that
  the drivers would need to do a full match field lookup to determine
  duplicates, repeating work that will happen in flower __fl_lookup().

Currently, drivers do not expect to receive duplicate filters. To fix this, verify that a filter with the same key is not present in the flower classifier hash table, and insert the new filter into the flower hash table before offloading it to hardware. Implement the helper function fl_ht_insert_unique() to atomically verify/insert a filter.

This change makes the filter visible to the fast path at the beginning of the fl_change() function, which means it can no longer be freed directly in case of error. Refactor the fl_change() error handling code to deallocate the filter with an rcu timeout.

Fixes: 620da486 ("net: sched: flower: refactor fl_change")
Reported-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 April 2019, 1 commit
-
-
Submitted by John Hurley

Recent changes to TC flower remove the requirement for the rtnl lock when accessing and modifying filters. Refcounts now ensure that access and deletion do not happen concurrently. However, the reoffload function, which cycles through all filters and replays them to registered hw drivers, is not protected.

Use the fl_get_next_filter() function to cycle the filters for reoffload, and ensure the reference taken by this function is put when done with each filter.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 March 2019, 2 commits
-
-
Submitted by Vlad Buslov

Set TCF_PROTO_OPS_DOIT_UNLOCKED for the flower classifier to indicate that its ops callbacks don't require the caller to hold the rtnl lock. Don't take the rtnl lock in fl_destroy_filter_work(), which is executed on a workqueue instead of being called by the cls API and is therefore not affected by setting TCF_PROTO_OPS_DOIT_UNLOCKED. The rtnl mutex is still manually taken by the flower classifier before calling the hardware offloads API, which has not been updated for unlocked execution.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Vlad Buslov

Use the 'rtnl_held' flag to track whether the caller holds the rtnl lock. Propagate the flag to internal functions that need to know the rtnl lock state. Take the rtnl lock before calling tcf APIs that require it (hw offload, bind filter, etc.).

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-