- 25 March 2017, 2 commits
-
-
By Florian Fainelli
Introduce a new configuration symbol, MDIO_DEVICE, which allows building the MDIO devices and bus code without pulling in the entire Ethernet PHY library and device code. PHYLIB now selects MDIO_DEVICE and the relevant Makefiles are updated to reflect that. When MDIO_DEVICE (MDIO bus/device only) is selected, but not PHYLIB, we have mdio-bus.ko as a loadable module, and it does not have a module_exit() function because the safety of removing a bus class is unclear. When both MDIO_DEVICE and PHYLIB are enabled, we need to assemble everything into a common loadable module, libphy.ko, because of nasty circular dependencies between phy.c, phy_device.c and mdio_bus.c which are really tough to untangle. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Florian Fainelli
CONFIG_OF_MDIO is what actually triggers the build of drivers/of/of_mdio.c, so providing inline stubs when CONFIG_OF_MDIO is not enabled should be based on that symbol as well. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 March 2017, 2 commits
-
-
By Chenbo Feng
Returns the owner uid of the socket inside a sk_buff. This is useful to perform per-UID accounting of network traffic or per-UID packet filtering. The socket needs to be a fullsock, otherwise overflowuid is returned. Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Chenbo Feng
Retrieve the socket cookie generated by sock_gen_cookie() from a sk_buff with a known socket. Generates a new cookie if one was not yet set. If the socket pointer inside the sk_buff is NULL, 0 is returned. The helper function could be useful for monitoring per-socket networking traffic statistics and provides a unique socket identifier per namespace. Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
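A minimal sketch of how the two helpers above (bpf_get_socket_cookie() and bpf_get_socket_uid()) might be used together from a cgroup skb program, assuming a libbpf-style build; the map name, sizes and the UID-based policy are illustrative assumptions, not part of the patches:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>   /* assumes a libbpf-based build */

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 4096);
	__type(key, __u64);        /* socket cookie */
	__type(value, __u64);      /* packet count */
} pkts_per_socket SEC(".maps");

SEC("cgroup_skb/egress")
int egress_filter(struct __sk_buff *skb)
{
	__u64 cookie = bpf_get_socket_cookie(skb); /* 0 if no full socket is attached */
	__u32 uid = bpf_get_socket_uid(skb);       /* overflowuid if not a fullsock */
	__u64 one = 1, *pkts;

	if (!cookie)
		return 1;                          /* no socket: let the packet pass */

	/* per-socket accounting keyed by the namespace-unique cookie */
	pkts = bpf_map_lookup_elem(&pkts_per_socket, &cookie);
	if (pkts)
		__sync_fetch_and_add(pkts, 1);
	else
		bpf_map_update_elem(&pkts_per_socket, &cookie, &one, BPF_ANY);

	/* illustrative per-UID policy: drop traffic owned by UID 60000 */
	return uid == 60000 ? 0 : 1;
}

char LICENSE[] SEC("license") = "GPL";
```

Using the cookie as the map key gives a stable per-socket identifier, while the UID can drive the per-UID accounting or filtering described above.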
-
- 23 March 2017, 12 commits
-
-
By Martin KaFai Lau
This patch adds hash-of-maps support (hashmap->bpf_map). BPF_MAP_TYPE_HASH_OF_MAPS is added. A map-in-map contains a pointer to another map; let's call this pointer 'inner_map_ptr'. Notes on deleting inner_map_ptr from a hash map: 1. For a BPF_F_NO_PREALLOC map-in-map, when deleting an inner_map_ptr, the htab_elem itself will go through an RCU grace period and the inner_map_ptr resides in the htab_elem. 2. For a pre-allocated htab_elem (!BPF_F_NO_PREALLOC), when deleting an inner_map_ptr, the htab_elem may get reused immediately. This situation is similar to the existing preallocated use cases. However, bpf_map_fd_put_ptr() calls bpf_map_put(), which calls inner_map->ops->map_free(inner_map), which will go through an RCU grace period (i.e. all bpf_map map_free implementations currently go through an RCU grace period). Hence, the inner_map_ptr is still safe for the RCU reader side. This patch also adds BPF_MAP_TYPE_HASH_OF_MAPS to check_map_prealloc() in the verifier. Preallocation is a must for BPF_PROG_TYPE_PERF_EVENT. Hence, even though we don't expect heavy updates to map-in-map, enforcing BPF_F_NO_PREALLOC for map-in-map is impossible without first disallowing BPF_PROG_TYPE_PERF_EVENT from using map-in-map. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Martin KaFai Lau
This patch adds a few helper funcs to enable map-in-map support (i.e. outer_map->inner_map). The first outer_map type, BPF_MAP_TYPE_ARRAY_OF_MAPS, is also added in this patch. The next patch will introduce a hash-of-maps type. Any bpf map type can act as an inner_map. The exception is BPF_MAP_TYPE_PROG_ARRAY, because the extra level of indirection makes it harder to verify the owner_prog_type and owner_jited. Multi-level map-in-map is not supported (i.e. map->map is ok but not map->map->map). When adding an inner_map to an outer_map, it currently checks the map_type, key_size, value_size, map_flags, max_entries and ops. The verifier also uses those map properties to do static analysis. map_flags is needed because we need to ensure BPF_PROG_TYPE_PERF_EVENT is using a preallocated hashtab for the inner hash as well. ops and max_entries are needed to generate inlined map-lookup instructions. For simplicity, a simple '==' test is used for both map_flags and max_entries. The equality of ops is implied by the equality of map_type. At outer_map creation time, an inner_map_fd is needed to create the outer_map. However, the inner_map_fd's lifetime does not depend on the outer_map. The inner_map_fd is merely used to initialize the inner_map_meta of the outer_map. Also, for the outer_map: * It allows element update and delete from syscall * It allows element lookup from bpf_prog The above is similar to the current fd_array pattern. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
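The program-side pattern enabled by these two patches is a double lookup: the first bpf_map_lookup_elem() on the outer map returns a pointer to an inner map, and the second is a normal lookup inside that inner map. A hedged sketch using today's BTF-style libbpf map definitions (names and sizes are illustrative; the loader-side creation of the outer map from an inner_map_fd is not shown):

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct inner_map {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 16);
	__type(key, __u32);
	__type(value, __u64);
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
	__uint(max_entries, 8);
	__type(key, __u32);
	__array(values, struct inner_map);
} outer SEC(".maps");

SEC("xdp")
int count_in_inner(struct xdp_md *ctx)
{
	__u32 outer_key = 0, inner_key = 0;
	void *inner;
	__u64 *val;

	/* first lookup: returns a pointer to the inner map, or NULL if the slot is empty */
	inner = bpf_map_lookup_elem(&outer, &outer_key);
	if (!inner)
		return XDP_PASS;

	/* second lookup: an ordinary lookup inside the chosen inner map */
	val = bpf_map_lookup_elem(inner, &inner_key);
	if (val)
		__sync_fetch_and_add(val, 1);

	return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```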
-
By Joel Scherpelz
This commit adds a new sysctl, accept_ra_rt_info_min_plen, that defines the minimum acceptable prefix length of Route Information Options. The new sysctl is intended to be used together with accept_ra_rt_info_max_plen to configure a range of acceptable prefix lengths. It is useful to prevent misconfigurations from unintentionally blackholing too much of the IPv6 address space (e.g., home routers announcing RIOs for fc00::/7, which is incorrect). Signed-off-by: Joel Scherpelz <jscherpelz@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
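As a usage sketch, the new knob would be set per interface next to the existing accept_ra_rt_info_max_plen file; the exact path and the value 48 below are assumptions for illustration:

```c
#include <stdio.h>

int main(void)
{
	/* only accept RIOs for prefixes of length 48 or longer on eth0 */
	FILE *f = fopen("/proc/sys/net/ipv6/conf/eth0/accept_ra_rt_info_min_plen", "w");

	if (!f) {
		perror("accept_ra_rt_info_min_plen");
		return 1;
	}
	fprintf(f, "48\n");
	fclose(f);
	return 0;
}
```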
-
By Russell King
Remove the indirect MMD read/write methods, which are now no longer necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Russell King
Make phy_(read|write)_mmd() generic 802.3 Clause 45 register accessors for both Clause 22 and Clause 45 PHYs, using direct register access for Clause 45 PHYs and the indirect method for Clause 22 PHYs. Allow this behaviour to be overridden by PHY drivers where necessary. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
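A hedged example of what a PHY driver can do once phy_read_mmd()/phy_write_mmd() are the unified accessors; the register chosen (the PCS CTRL1 low-power bit) is purely illustrative:

```c
#include <linux/phy.h>
#include <linux/mdio.h>

/* Toggle the low-power bit in the PCS control register via the unified
 * accessors: direct C45 access on Clause 45 PHYs, the indirect MMD access
 * registers on Clause 22 PHYs. */
static int example_pcs_low_power(struct phy_device *phydev, bool enable)
{
	int val = phy_read_mmd(phydev, MDIO_MMD_PCS, MDIO_CTRL1);

	if (val < 0)
		return val;

	if (enable)
		val |= MDIO_CTRL1_LPOWER;
	else
		val &= ~MDIO_CTRL1_LPOWER;

	return phy_write_mmd(phydev, MDIO_MMD_PCS, MDIO_CTRL1, val);
}
```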
-
By Russell King
Move the phy_(read|write)_mmd() helpers out of line; they will become our main MMD accessor functions and so will be a little more complex. This complexity doesn't belong in an inline function. Also move the _indirect variants to keep like functionality together. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Thierry Reding
Prior to the recent multi-queue changes the driver would configure the queues to use AVB mode, but the mode then got switched to DCB. The hardware still works fine in DCB mode, but my testing capabilities are limited, so it's safer to revert to the prior setting anyway. Signed-off-by: Thierry Reding <treding@nvidia.com> Acked-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Reshetova, Elena
The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows us to avoid accidental refcount overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
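For readers unfamiliar with these conversions, the pattern is mechanical; a generic sketch (struct and function names are illustrative, not the ones touched by this patch):

```c
#include <linux/refcount.h>
#include <linux/slab.h>

struct example_obj {
	refcount_t refcnt;                      /* was: atomic_t refcnt; */
	/* ... payload ... */
};

static struct example_obj *example_alloc(void)
{
	struct example_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

	if (obj)
		refcount_set(&obj->refcnt, 1);          /* was: atomic_set(..., 1) */
	return obj;
}

static void example_get(struct example_obj *obj)
{
	refcount_inc(&obj->refcnt);                     /* saturates instead of overflowing */
}

static void example_put(struct example_obj *obj)
{
	if (refcount_dec_and_test(&obj->refcnt))        /* was: atomic_dec_and_test() */
		kfree(obj);
}
```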
-
By Andy Zhou
With the introduction of the OpenFlow 'clone' action, OVS user space can now translate the 'clone' action into the kernel datapath 'sample' action with 100% probability, to ensure that the clone semantics (the packet seen by the clone action is the same as the packet seen by the action after clone) are faithfully carried out in the datapath. While the sample action in the datapath has the matching semantics, its implementation is only optimized for its original use. Specifically, there are two limitations: first, there is a three-level nesting restriction, enforced at flow download time; this limit turns out to be too restrictive for the 'clone' use case. Second, the implementation avoids a recursive call only if the sample action list has a single userspace action. The main optimization implemented in this series removes the static nesting limit check and instead implements a run-time recursion limit check and recursion avoidance similar to that of the 'recirc' action. This optimization solves both issues above. One related optimization avoids copying the flow key as long as the enclosed actions do not change the flow key; the detection is performed only once, at flow download time. Another related optimization rewrites the action list at flow download time, to save the fast path from repeatedly parsing the sample action list in its original form. Signed-off-by: Andy Zhou <azhou@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
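The run-time recursion limit described above can be pictured as a per-CPU depth counter wrapped around the nested execution; a generic sketch, not the actual OVS datapath code (names and the limit value are assumptions):

```c
#include <linux/percpu.h>
#include <linux/errno.h>
#include <linux/skbuff.h>

#define SAMPLE_DEPTH_MAX 4                      /* illustrative limit */

static DEFINE_PER_CPU(int, sample_exec_depth);

static int execute_sample_actions(struct sk_buff *skb /* , enclosed actions */)
{
	int *depth = this_cpu_ptr(&sample_exec_depth);
	int err = 0;

	if (*depth >= SAMPLE_DEPTH_MAX)
		return -ELOOP;          /* too deeply nested: refuse instead of recursing further */

	(*depth)++;
	/* ... run the enclosed action list; it may re-enter this function ... */
	(*depth)--;

	return err;
}
```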
-
By Josh Hunt
Allows reading of SK_MEMINFO_VARS via a socket option. This way an application can get all meminfo-related information in a single socket option call instead of multiple calls. Adds a helper function, sk_get_meminfo(), and uses it for both getsockopt and sock_diag_put_meminfo(). Suggested by Eric Dumazet. Signed-off-by: Josh Hunt <johunt@akamai.com> Reviewed-by: Jason Baron <jbaron@akamai.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
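A userspace sketch of the new interface, assuming the SO_MEMINFO socket option and the SK_MEMINFO_* indices exported in the UAPI headers (the fallback value 55 for SO_MEMINFO is an assumption for older headers):

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/sock_diag.h>    /* SK_MEMINFO_VARS and the SK_MEMINFO_* indices */

#ifndef SO_MEMINFO
#define SO_MEMINFO 55           /* assumption: value used on most architectures */
#endif

int main(void)
{
	unsigned int mem[SK_MEMINFO_VARS];
	socklen_t len = sizeof(mem);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(mem, 0, sizeof(mem));
	if (fd < 0 || getsockopt(fd, SOL_SOCKET, SO_MEMINFO, mem, &len) < 0) {
		perror("SO_MEMINFO");
		return 1;
	}
	printf("rmem_alloc=%u rcvbuf=%u wmem_alloc=%u sndbuf=%u\n",
	       mem[SK_MEMINFO_RMEM_ALLOC], mem[SK_MEMINFO_RCVBUF],
	       mem[SK_MEMINFO_WMEM_ALLOC], mem[SK_MEMINFO_SNDBUF]);
	return 0;
}
```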
-
By Xin Long
sctp_stream_free uses struct sctp_stream as a parameter, but struct sctp_stream is defined after that declaration. This patch adds a declaration of struct sctp_stream before sctp_stream_free. Fixes: a8386317 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Roopa Prabhu
neigh notifications today carry pid 0 for nlmsg_pid in all cases. This patch fixes them to carry the calling process pid when available. Applications (e.g. quagga) rely on nlmsg_pid to ignore notifications generated by their own netlink operations. This patch follows the routing subsystem, which already sets this correctly. Reported-by: Vivek Venkatraman <vivek@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 March 2017, 10 commits
-
-
By Soheil Hassas Yeganeh
SOF_TIMESTAMPING_OPT_STATS can be enabled and disabled while packets are collected on the error queue, so checking SOF_TIMESTAMPING_OPT_STATS in sk->sk_tsflags is not enough to safely assume that the skb contains OPT_STATS data. Add a bit in sock_exterr_skb to indicate whether the skb contains opt_stats data. Fixes: 1c885808 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: JongHwan Kim <zzoru007@gmail.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Xin Long
As tp->dst_pending_confirm's value can only be 0 or 1, this patch defines it as a single bit instead of a __u32. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Andreas Gruenbacher
Add rhashtable_lookup_get_insert_fast for fixed keys, similar to rhashtable_lookup_get_insert_key for explicit keys. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
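A hedged usage sketch of the new helper: insert a new object, or get back the one that already owns the key (the helper returns NULL on successful insertion, the existing object on a duplicate key, or an ERR_PTR on error); struct and parameter names are illustrative:

```c
#include <linux/rhashtable.h>
#include <linux/slab.h>
#include <linux/err.h>

struct example_entry {
	u32 key;
	struct rhash_head node;
	/* ... */
};

static const struct rhashtable_params example_params = {
	.key_len	= sizeof(u32),
	.key_offset	= offsetof(struct example_entry, key),
	.head_offset	= offsetof(struct example_entry, node),
	.automatic_shrinking = true,
};

/* Returns the entry that now owns @key: either the freshly inserted one or
 * the pre-existing one; may also return an ERR_PTR. */
static struct example_entry *example_get_or_insert(struct rhashtable *ht, u32 key)
{
	struct example_entry *new, *old;

	new = kzalloc(sizeof(*new), GFP_KERNEL);
	if (!new)
		return ERR_PTR(-ENOMEM);
	new->key = key;

	old = rhashtable_lookup_get_insert_fast(ht, &new->node, example_params);
	if (old) {              /* duplicate key or error: the new object was not inserted */
		kfree(new);
		return old;
	}
	return new;             /* NULL from the helper means the new object went in */
}
```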
-
By Joao Pinto
This patch adds the configuration of RX queues' routing. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Joao Pinto
This patch adds the configuration of RX and TX queues' priority. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Philippe Reynes
The functions usbnet_{get|set}_settings are no longer used, so we remove them. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Philippe Reynes
The ethtool API {get|set}_settings is deprecated. We add the new API {get|set}_link_ksettings to this driver. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Nikolay Aleksandrov
This patch adds support for ECMP hash policy choice via a new sysctl called fib_multipath_hash_policy, and also adds support for L4 hashes. The current values for fib_multipath_hash_policy are: 0 - layer 3 (default), 1 - layer 4. If there's an skb hash already set and it matches the chosen policy, then it will be used instead of being calculated (currently only for L4). In L3 mode we always calculate the hash due to the ICMP error special case; the flow dissector's field consistency handling should take care of the address order, thus we can remove the address reversals. If the skb is provided we always use it for the hash calculation, otherwise we fall back to fl4; that is, if skb is NULL, fl4 has to be set. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Peng Tao
To allow canceling all packets of a connection. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Peng Tao
So that we can cancel a queued packet later if necessary. Signed-off-by: Peng Tao <bergwolf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 March 2017, 2 commits
-
-
By Mike Christie
The implicit transition time tells initiators the minimum time to wait before timing out a transition. We currently schedule the transition to occur in tg_pt_gp_implicit_trans_secs seconds, so there is no room for delays. If core_alua_do_transition_tg_pt_work->core_alua_update_tpg_primary_metadata needs to write out info to a remote file, then the initiator can easily time out the operation. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
-
By Mike Christie
This patch allows passthrough backends to use the core/base LIO ALUA setup and state checks, but still handle the execution of commands themselves. This will allow the target_core_user module to execute STPG and RTPG in userspace, and not have to duplicate the ALUA state checks, path information (needed so we can check if a command is executable on specific paths) and setup (rtslib sets/updates the configfs ALUA interface like it does for iblock or file). For STPG, the target_core_user userspace daemon, tcmu-runner, will still execute the STPG, and to update the core/base LIO state it will use the existing configfs interface. For RTPG, tcmu-runner will loop over configfs and/or cache the state. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
-
- 17 March 2017, 9 commits
-
-
By Reshetova, Elena
The refcount_t type and corresponding API (see include/linux/refcount.h) should be used instead of atomic_t when the variable is used as a reference counter. This allows us to avoid accidental refcount overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
By Alexei Starovoitov
Optimize bpf_call -> bpf_map_lookup_elem() -> array_map_lookup_elem() into a sequence of bpf instructions. When the JIT is on, that sequence of bpf instructions becomes a sequence of native CPU instructions with significantly better performance than an indirect call plus two functions' prologue/epilogue. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Soheil Hassas Yeganeh
tcp_tw_recycle was already broken for connections behind NAT, since the per-destination timestamp is not monotonically increasing for multiple machines behind a single destination address. After the randomization of TCP timestamp offsets in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection), tcp_tw_recycle is broken for all types of connections for the same reason: the timestamps received from a single machine are not monotonically increasing anymore. Remove tcp_tw_recycle, since it is not functional. Also remove the PAWSPassive SNMP counter, since it is only used by tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req, since the strict argument is only set when tcp_tw_recycle is enabled. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Lutz Vieweg <lvml@5t9.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Soheil Hassas Yeganeh
Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection) randomizes TCP timestamps per connection. After this commit, there is no guarantee that the timestamps received from the same destination are monotonically increasing. As a result, the per-destination timestamp cache in TCP metrics (i.e., tcpm_ts in struct tcp_metrics_block) is broken and cannot be relied upon. Remove the per-destination timestamp cache and all related code paths. Note that this cache was already broken when caching timestamps of multiple machines behind a NAT sharing the same address. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Lutz Vieweg <lvml@5t9.de> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jack Morgenstein
Some hypervisors detach VFs from VMs by instantly causing an FLR event to be generated for a VF. In the mlx4 case, this will cause that VF's comm channel to be disabled before the VM has an opportunity to invoke the VF device's "shutdown" method. For such hypervisors, there is a race condition between the VF's shutdown method and its internal-error detection/reset thread. The internal-error detection/reset thread (which runs every 5 seconds) also detects a disabled comm channel. If the internal-error detection/reset flow wins the race, we still get delays (while that flow tries repeatedly to detect comm-channel recovery). The cited commit fixed the command timeout problem when the internal-error detection/reset flow loses the race. This commit avoids the unneeded delays when the internal-error detection/reset flow wins. Fixes: d585df1c ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reported-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Heiko Carstens
The last caller of assert_held_device_hotplug() is gone, so remove it again. Link: http://lkml.kernel.org/r/20170314125226.16779-3-heiko.carstens@de.ibm.com Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Masami Hiramatsu
Add a prototype of task_struct to fix the warning below on arm64. In file included from arch/arm64/kernel/probes/kprobes.c:19:0: include/linux/kasan.h:81:132: error: 'struct task_struct' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] static inline void kasan_unpoison_task_stack(struct task_struct *task) {} As with the other types (kmem_cache, page, and vm_struct), this adds a prototype of the task_struct data structure at the top of kasan.h. [arnd] A related warning was fixed before, but now appears in a different line in the same file in v4.11-rc2. The patch from Masami Hiramatsu still seems appropriate, so let's take his version. Fixes: 71af2ed5 ("kasan, sched/headers: Remove <linux/sched.h> from <linux/kasan.h>") Link: https://patchwork.kernel.org/patch/9569839/ Link: http://lkml.kernel.org/r/20170313141517.3397802-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Alexander Potapenko <glider@google.com> Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
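The fix amounts to a forward declaration, so the inline stubs can take a pointer to an incomplete type without including <linux/sched.h>; in miniature:

```c
struct task_struct;    /* an incomplete type is enough for a pointer parameter */

static inline void kasan_unpoison_task_stack(struct task_struct *task) { }
```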
-
By Ido Schimmel
Whenever a FIB rule is added or removed, a notification is sent on the FIB notification chain. However, listeners don't have a way to tell which rule was added or removed. This is problematic, as we would like to give listeners the ability to decide which action to execute based on the notified rule. Specifically, offloading drivers should be able to determine whether they support reflection of the notified FIB rule and flush their LPM tables if they don't. Do that by adding a notifier info to these notifications and embedding the common FIB rule struct in it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Ido Schimmel
Currently, when non-default (custom) FIB rules are used, devices capable of layer 3 offloading flush their tables and let the kernel do the forwarding instead. When these devices' drivers are loaded, they register to the FIB notification chain, which lets them know about the existence of any custom FIB rules. This is done by sending a RULE_ADD notification based on the value of 'net->ipv4.fib_has_custom_rules'. This approach is problematic when VRF offload is taken into account, as upon the creation of the first VRF netdev, an l3mdev rule is programmed to direct skbs to the VRF's table. Instead of merely reading the above value and sending a single RULE_ADD notification, we should iterate over all the FIB rules and send a detailed notification for each, thereby allowing offloading drivers to sanitize the rules they don't support and potentially flush their tables. While l3mdev rules are uniquely marked, the default rules are not. Therefore, when they are notified they might cause offloading drivers to unnecessarily flush their tables. Solve this by adding a helper to check whether a FIB rule is a default rule; namely, its selector should match all packets and its action should point to the local, main or default table. As noted by David Ahern, uniquely marking the default rules is insufficient. When using VRFs, it's common to avoid false hits by moving the rule for the local table to just before the main table.
Default configuration:
$ ip rule show
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
Common configuration with VRFs:
$ ip rule show
1000: from all lookup [l3mdev-table]
32765: from all lookup local
32766: from all lookup main
32767: from all lookup default
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 March 2017, 3 commits
-
-
By Vivien Didelot
If a DSA switch driver cannot program an ageing time value because it is out of range, switchdev will raise a stack trace before failing. To fix this, add ageing_time_min and ageing_time_max members to dsa_switch so switch drivers can optionally specify their supported ageing time limits. The DSA core will now check the provided ageing time limits and return -ERANGE from the switchdev prepare phase if the value is out of range. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
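A hedged sketch of how a driver would advertise its limits from its setup callback; the numbers (a 15 s hardware step, up to 255 steps) and the use of milliseconds are illustrative assumptions:

```c
#include <net/dsa.h>

static int example_switch_setup(struct dsa_switch *ds)
{
	/* hardware ages entries in 15 s steps, with an 8-bit multiplier */
	ds->ageing_time_min = 15000;            /* 15 s, in ms (assumed unit) */
	ds->ageing_time_max = 15000 * 255;

	/* ... remaining switch setup ... */
	return 0;
}
```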
-
By Amritha Nambiar
The configurable priority-to-traffic-class mapping and the user-specified queue ranges are used to configure the traffic classes, overriding the hardware defaults, when the 'hw' option is set to 0. However, when the 'hw' option is non-zero, the hardware QoS defaults are used. This patch makes it so that we can pass the data the user provided to ndo_setup_tc. This allows us to pull in the queue configuration if the user requested it, as well as any additional hardware offload type requested by using a value other than 1 for the hw value. Finally, it also provides a means for the device driver to return the level supported for the offload type via the qopt->hw value. Previously we were always assuming the value to be 1; in the future, values beyond just 1 may be supported. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Alexander Duyck
This patch is meant to allow support of multiple hardware offload types for a single device. There is currently no bounds checking for the hw member of the mqprio_qopt structure. This results in us being able to pass values from 1 to 255, with all being treated the same. On retrieving the value, it is returned as 1 for anything 1 or greater being set. With this change we add limited bounds checking by defining an enum and using those values to limit the reported hardware offloads. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
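The shape of the bounds check being described, sketched with hypothetical names rather than the actual mqprio UAPI identifiers:

```c
#include <linux/errno.h>

/* Hypothetical offload levels; the real patch defines an equivalent enum in
 * the mqprio UAPI. */
enum example_hw_offload {
	EXAMPLE_HW_OFFLOAD_NONE,        /* software only */
	EXAMPLE_HW_OFFLOAD_TCS,         /* offload the traffic-class mapping */
	__EXAMPLE_HW_OFFLOAD_MAX
};
#define EXAMPLE_HW_OFFLOAD_MAX (__EXAMPLE_HW_OFFLOAD_MAX - 1)

static int example_validate_hw(unsigned int hw)
{
	if (hw > EXAMPLE_HW_OFFLOAD_MAX)
		return -EINVAL;         /* previously any value 1..255 was silently accepted */
	return 0;
}
```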
-