- 02 11月, 2019 1 次提交
-
-
由 Björn Töpel 提交于
In this commit the XSKMAP entry lookup function used by the XDP redirect code is moved from the xskmap.c file to the xdp_sock.h header, so the lookup can be inlined from, e.g., the bpf_xdp_redirect_map() function. Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush() functions. Finally, all the XDP socket functions were moved from linux/bpf.h to net/xdp_sock.h, where most of the XDP sockets functions are anyway. This yields a ~2% performance boost for the xdpsock "rx_drop" scenario. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
-
- 01 11月, 2019 3 次提交
-
-
由 Vivien Didelot 提交于
Now that the DSA ports are listed in the switch fabric, there is no need to store the dsa_switch structures from the drivers in the fabric anymore. So get rid of the dst->ds static array. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Drivers do not use the ds->rtable static arrays anymore, get rid of it. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Implement a new list of DSA links in the switch fabric itself, to provide an alterative to the ds->rtable static arrays. At the same time, provide a new dsa_routing_port() helper to abstract the usage of ds->rtable in drivers. If there's no port to reach a given device, return the first invalid port, ds->num_ports. This avoids potential signedness errors or the need to define special values. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2019 11 次提交
-
-
由 Vlad Buslov 提交于
Extend struct tc_action with new "tcfa_flags" field. Set the field in tcf_idr_create() function and provide new helper tcf_idr_create_from_flags() that derives 'cpustats' boolean from flags value. Update individual hardware-offloaded actions init() to pass their "flags" argument to new helper in order to skip percpu stats allocation when user requested it through flags. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Extend TCA_ACT space with nla_bitfield32 flags. Add TCA_ACT_FLAGS_NO_PERCPU_STATS as the only allowed flag. Parse the flags in tcf_action_init_1() and pass resulting value as additional argument to a_o->init(). Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Modify stats update helper functions introduced in previous patches in this series to fallback to regular tc_action->tcfa_{b|q}stats if cpu stats are not allocated for the action argument. If regular non-percpu allocated counters are in use, then obtain action tcfa_lock while modifying them. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Previous commit introduced helper function for updating qstats and refactored set of actions to use the helpers, instead of modifying qstats directly. However, one of the affected action exposes its qstats to skb_tc_reinsert(), which then modifies it. Refactor skb_tc_reinsert() to return integer error code and don't increment overlimit qstats in case of error, and use the returned error code in tcf_mirred_act() to manually increment the overlimit counter with new helper function. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Extract common code that increments cpu_qstats counters into standalone act API functions. Change hardware offloaded actions that use percpu counter allocation to use the new functions instead of accessing cpu_qstats directly. This commit doesn't change functionality. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Extract common code that increments cpu_bstats counter into standalone act API function. Change hardware offloaded actions that use percpu counter allocation to use the new function instead of incrementing cpu_bstats directly. This commit doesn't change functionality. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vlad Buslov 提交于
Currently, all implementations of tc_action_ops->stats_update() callback have almost exactly the same implementation of counters update code (besides gact which also updates drop counter). In order to simplify support for using both percpu-allocated and regular action counters depending on run-time flag in following patches, extract action counters update code into standalone function in act API. This commit doesn't change functionality. Signed-off-by: NVlad Buslov <vladbu@mellanox.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We already annotated most accesses to sk->sk_napi_id We missed sk_mark_napi_id() and sk_mark_napi_id_once() which might be called without socket lock held in UDP stack. KCSAN reported : BUG: KCSAN: data-race in udpv6_queue_rcv_one_skb / udpv6_queue_rcv_one_skb write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 0: sk_mark_napi_id include/net/busy_poll.h:125 [inline] __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline] udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459 dst_input include/net/dst.h:442 [inline] ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 write to 0xffff888121c6d108 of 4 bytes by interrupt on cpu 1: sk_mark_napi_id include/net/busy_poll.h:125 [inline] __udpv6_queue_rcv_skb net/ipv6/udp.c:571 [inline] udpv6_queue_rcv_one_skb+0x70c/0xb40 net/ipv6/udp.c:672 udpv6_queue_rcv_skb+0xb5/0x400 net/ipv6/udp.c:689 udp6_unicast_rcv_skb.isra.0+0xd7/0x180 net/ipv6/udp.c:832 __udp6_lib_rcv+0x69c/0x1770 net/ipv6/udp.c:913 udpv6_rcv+0x2b/0x40 net/ipv6/udp.c:1015 ip6_protocol_deliver_rcu+0x22a/0xbe0 net/ipv6/ip6_input.c:409 ip6_input_finish+0x30/0x50 net/ipv6/ip6_input.c:450 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip6_input+0x177/0x190 net/ipv6/ip6_input.c:459 dst_input include/net/dst.h:442 [inline] ip6_rcv_finish+0x110/0x140 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ipv6_rcv+0x1a1/0x1b0 net/ipv6/ip6_input.c:284 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 10890 Comm: syz-executor.0 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: e68b6e50 ("udp: enable busy polling for all sockets") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matteo Croce 提交于
The ICMP flow dissector currently parses only the Type and Code fields. Some ICMP packets (echo, timestamp) have a 16 bit Identifier field which is used to correlate packets. Add such field in flow_dissector_key_icmp and replace skb_flow_get_be16() with a more complex function which populate this field. Signed-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Matteo Croce 提交于
Documents two piece of code which can't be understood at a glance. Signed-off-by: NMatteo Croce <mcroce@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This socket field can be read and written by concurrent cpus. Use READ_ONCE() and WRITE_ONCE() annotations to document this, and avoid some compiler 'optimizations'. KCSAN reported : BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0: sk_incoming_cpu_update include/net/sock.h:953 [inline] tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082 do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337 do_softirq kernel/softirq.c:329 [inline] __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189 read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1: sk_incoming_cpu_update include/net/sock.h:952 [inline] tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 10月, 2019 2 次提交
-
-
由 Ursula Braun 提交于
The only smc-related reference in net/sock.h is struct smc_hashinfo. But just its address is refered to. Thus there is no need for the include of net/smc.h. Remove it. Suggested-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Return directly from within the loop as soon as the port is found, otherwise we won't return NULL if the end of the list is reached. Fixes: b96ddf25 ("net: dsa: use ports list in dsa_to_port") Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 10月, 2019 3 次提交
-
-
由 Andrew Lunn 提交于
Add plumbing to allow DSA drivers to register parameters with devlink. To keep with the abstraction, the DSA drivers pass the ds structure to these helpers, and the DSA core then translates that to the devlink structure associated to the device. Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
sk_page_frag() optimizes skb_frag allocations by using per-task skb_frag cache when it knows it's the only user. The condition is determined by seeing whether the socket allocation mask allows blocking - if the allocation may block, it obviously owns the task's context and ergo exclusively owns current->task_frag. Unfortunately, this misses recursion through memory reclaim path. Please take a look at the following backtrace. [2] RIP: 0010:tcp_sendmsg_locked+0xccf/0xe10 ... tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 sock_xmit.isra.24+0xa1/0x170 [nbd] nbd_send_cmd+0x1d2/0x690 [nbd] nbd_queue_rq+0x1b5/0x3b0 [nbd] __blk_mq_try_issue_directly+0x108/0x1b0 blk_mq_request_issue_directly+0xbd/0xe0 blk_mq_try_issue_list_directly+0x41/0xb0 blk_mq_sched_insert_requests+0xa2/0xe0 blk_mq_flush_plug_list+0x205/0x2a0 blk_flush_plug_list+0xc3/0xf0 [1] blk_finish_plug+0x21/0x2e _xfs_buf_ioapply+0x313/0x460 __xfs_buf_submit+0x67/0x220 xfs_buf_read_map+0x113/0x1a0 xfs_trans_read_buf_map+0xbf/0x330 xfs_btree_read_buf_block.constprop.42+0x95/0xd0 xfs_btree_lookup_get_block+0x95/0x170 xfs_btree_lookup+0xcc/0x470 xfs_bmap_del_extent_real+0x254/0x9a0 __xfs_bunmapi+0x45c/0xab0 xfs_bunmapi+0x15/0x30 xfs_itruncate_extents_flags+0xca/0x250 xfs_free_eofblocks+0x181/0x1e0 xfs_fs_destroy_inode+0xa8/0x1b0 destroy_inode+0x38/0x70 dispose_list+0x35/0x50 prune_icache_sb+0x52/0x70 super_cache_scan+0x120/0x1a0 do_shrink_slab+0x120/0x290 shrink_slab+0x216/0x2b0 shrink_node+0x1b6/0x4a0 do_try_to_free_pages+0xc6/0x370 try_to_free_mem_cgroup_pages+0xe3/0x1e0 try_charge+0x29e/0x790 mem_cgroup_charge_skmem+0x6a/0x100 __sk_mem_raise_allocated+0x18e/0x390 __sk_mem_schedule+0x2a/0x40 [0] tcp_sendmsg_locked+0x8eb/0xe10 tcp_sendmsg+0x27/0x40 sock_sendmsg+0x30/0x40 ___sys_sendmsg+0x26d/0x2b0 __sys_sendmsg+0x57/0xa0 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In [0], tcp_send_msg_locked() was using current->page_frag when it called sk_wmem_schedule(). It already calculated how many bytes can be fit into current->page_frag. Due to memory pressure, sk_wmem_schedule() called into memory reclaim path which called into xfs and then IO issue path. Because the filesystem in question is backed by nbd, the control goes back into the tcp layer - back into tcp_sendmsg_locked(). nbd sets sk_allocation to (GFP_NOIO | __GFP_MEMALLOC) which makes sense - it's in the process of freeing memory and wants to be able to, e.g., drop clean pages to make forward progress. However, this confused sk_page_frag() called from [2]. Because it only tests whether the allocation allows blocking which it does, it now thinks current->page_frag can be used again although it already was being used in [0]. After [2] used current->page_frag, the offset would be increased by the used amount. When the control returns to [0], current->page_frag's offset is increased and the previously calculated number of bytes now may overrun the end of allocated memory leading to silent memory corruptions. Fix it by adding gfpflags_normal_context() which tests sleepable && !reclaim and use it to determine whether to use current->task_frag. v2: Eric didn't like gfp flags being tested twice. Introduce a new helper gfpflags_normal_context() and combine the two tests. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geert Uytterhoeven 提交于
Fix misspellings of "disconnect", "disconnecting", "connections", and "disconnected". Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be> Acked-by: NKalle Valo <kvalo@codeaurora.org> Acked-by: NSimon Horman <horms@verge.net.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 10月, 2019 2 次提交
-
-
由 Guillaume Nault 提交于
In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances, but there are a few paths calling rtnl_net_notifyid() from atomic context or from RCU critical sections. The later also precludes the use of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new() call is wrong too, as it uses GFP_KERNEL unconditionally. Therefore, we need to pass the GFP flags as parameter and propagate it through function calls until the proper flags can be determined. In most cases, GFP_KERNEL is fine. The exceptions are: * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump() indirectly call rtnl_net_notifyid() from RCU critical section, * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as parameter. Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used by nlmsg_new(). The function is allowed to sleep, so better make the flags consistent with the ones used in the following ovs_vport_cmd_fill_info() call. Found by code inspection. Fixes: 9a963454 ("netns: notify netns id events") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NPravin B Shelar <pshelar@ovn.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ben Dooks (Codethink) 提交于
If CONFIG_NET_HWBM is not set, then these stub functions in <net/hwbm.h> should be declared static to avoid trying to export them from any driver that includes this. Fixes the following sparse warnings: ./include/net/hwbm.h:24:6: warning: symbol 'hwbm_buf_free' was not declared. Should it be static? ./include/net/hwbm.h:25:5: warning: symbol 'hwbm_pool_refill' was not declared. Should it be static? ./include/net/hwbm.h:26:5: warning: symbol 'hwbm_pool_add' was not declared. Should it be static? Signed-off-by: NBen Dooks (Codethink) <ben.dooks@codethink.co.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 10月, 2019 3 次提交
-
-
由 Taehee Yoo 提交于
This patch removes variables and callback these are related to the nested device structure. devices that can be nested have their own nest_level variable that represents the depth of nested devices. In the previous patch, new {lower/upper}_level variables are added and they replace old private nest_level variable. So, this patch removes all 'nest_level' variables. In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added to get lockdep subclass value, which is actually lower nested depth value. But now, they use the dynamic lockdep key to avoid lockdep warning instead of the subclass. So, this patch removes ->ndo_get_lock_subclass() callback. Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
Current vxlan code doesn't limit the number of nested devices. Nested devices would be handled recursively and this routine needs huge stack memory. So, unlimited nested devices could make stack overflow. In order to fix this issue, this patch adds adjacent links. The adjacent link APIs internally check the depth level. Test commands: ip link add dummy0 type dummy ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \ dstport 4789 for i in {1..100} do let A=$i-1 ip link add vxlan$i type vxlan id $i group 239.1.1.1 \ dev vxlan$A dstport 4789 done ip link del dummy0 The top upper link is vxlan100 and the lowest link is vxlan0. When vxlan0 is deleting, the upper devices will be deleted recursively. It needs huge stack memory so it makes stack overflow. Splat looks like: [ 229.628477] ============================================================================= [ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2 [ 229.629785] ----------------------------------------------------------------------------- [ 229.629785] [ 229.655439] ================================================================== [ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200 [ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0 [ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334 [ 229.655688] [ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h....... [ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h....... [ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`".... [ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`.............. [ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!...... [ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................ [ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............ [ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`.............. [ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A.... [ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a [ ... ] Fixes: acaf4e70 ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taehee Yoo 提交于
All bonding device has same lockdep key and subclass is initialized with nest_level. But actual nest_level value can be changed when a lower device is attached. And at this moment, the subclass should be updated but it seems to be unsafe. So this patch makes bonding use dynamic lockdep key instead of the subclass. Test commands: ip link add bond0 type bond for i in {1..5} do let A=$i-1 ip link add bond$i type bond ip link set bond$i master bond$A done ip link set bond5 master bond0 Splat looks like: [ 307.992912] WARNING: possible recursive locking detected [ 307.993656] 5.4.0-rc3+ #96 Tainted: G W [ 307.994367] -------------------------------------------- [ 307.995092] ip/761 is trying to acquire lock: [ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.997045] but task is already holding lock: [ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 307.999215] other info that might help us debug this: [ 308.000251] Possible unsafe locking scenario: [ 308.001137] CPU0 [ 308.001533] ---- [ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2); [ 308.003302] *** DEADLOCK *** [ 308.004310] May be due to missing lock nesting notation [ 308.005319] 3 locks held by ip/761: [ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0 [ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding] [ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding] [ 308.009422] stack backtrace: [ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96 [ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 308.012179] Call Trace: [ 308.012601] dump_stack+0x7c/0xbb [ 308.013089] __lock_acquire+0x269d/0x3de0 [ 308.013669] ? register_lock_class+0x14d0/0x14d0 [ 308.014318] lock_acquire+0x164/0x3b0 [ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.015520] _raw_spin_lock_nested+0x2e/0x60 [ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding] [ 308.017215] bond_get_stats+0xb8/0x500 [bonding] [ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding] [ 308.019710] ? rcu_read_lock_held+0x90/0xa0 [ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding] [ 308.021953] dev_get_stats+0x1ec/0x270 [ 308.022508] bond_get_stats+0x1d1/0x500 [bonding] Fixes: d3fff6c4 ("net: add netdev_lockdep_set_classes() helper") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2019 2 次提交
-
-
由 Eric Dumazet 提交于
syzbot reported the following issue : BUG: KCSAN: data-race in update_defense_level / update_defense_level read to 0xffffffff861a6260 of 4 bytes by task 3006 on cpu 1: update_defense_level+0x621/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:177 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 write to 0xffffffff861a6260 of 4 bytes by task 7333 on cpu 0: update_defense_level+0xa62/0xb30 net/netfilter/ipvs/ip_vs_ctl.c:205 defense_work_handler+0x3d/0xd0 net/netfilter/ipvs/ip_vs_ctl.c:225 process_one_work+0x3d4/0x890 kernel/workqueue.c:2269 worker_thread+0xa0/0x800 kernel/workqueue.c:2415 kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7333 Comm: kworker/0:5 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events defense_work_handler Indeed, old_secure_tcp is currently a static variable, while it needs to be a per netns variable. Fixes: a0840e2e ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NSimon Horman <horms@verge.net.au>
-
由 Eric Dumazet 提交于
UDP IPv6 packets auto flowlabels are using a 32bit secret (static u32 hashrnd in net/core/flow_dissector.c) and apply jhash() over fields known by the receivers. Attackers can easily infer the 32bit secret and use this information to identify a device and/or user, since this 32bit secret is only set at boot time. Really, using jhash() to generate cookies sent on the wire is a serious security concern. Trying to change the rol32(hash, 16) in ip6_make_flowlabel() would be a dead end. Trying to periodically change the secret (like in sch_sfq.c) could change paths taken in the network for long lived flows. Let's switch to siphash, as we did in commit df453700 ("inet: switch IP ID generator to siphash") Using a cryptographically strong pseudo random function will solve this privacy issue and more generally remove other weak points in the stack. Packet schedulers using skb_get_hash_perturb() benefit from this change. Fixes: b5677416 ("ipv6: Enable auto flow labels by default") Fixes: 42240901 ("ipv6: Implement different admin modes for automatic flow labels") Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel") Fixes: cb1ce2ef ("ipv6: Implement automatic flow label generation on transmit") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NJonathan Berger <jonathann1@walla.com> Reported-by: NAmit Klein <aksecurity@gmail.com> Reported-by: NBenny Pinkas <benny@pinkas.net> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 10月, 2019 11 次提交
-
-
由 Pablo Neira Ayuso 提交于
This patch allows you to register one netdev basechain to multiple devices. This adds a new NFTA_HOOK_DEVS netlink attribute to specify the list of netdevices. Basechains store a list of hooks. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Rise the maximum limit of devices per flowtable up to 256. Rename NFT_FLOWTABLE_DEVICE_MAX to NFT_NETDEVICE_MAX in preparation to reuse the netdev hook parser for ingress basechain. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Use a list of hooks per device instead an array. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Hardware offload needs access to the priority field, store this field in the nf_flowtable object. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Vivien Didelot 提交于
Now that ports are dynamically listed in the fabric, there is no need to provide a special helper to allocate the dsa_switch structure. This will give more flexibility to drivers to embed this structure as they wish in their private structure. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Allocate the struct dsa_port the first time it is accessed with dsa_port_touch, and remove the static dsa_port array from the dsa_switch structure. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Use the new ports list instead of iterating over switches and their ports when setting up the default CPU port. Unassign it on teardown. Now that we can iterate over multiple CPU ports, remove dst->cpu_dp. At the same time, provide a better error message for CPU-less tree. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Use the new ports list instead of iterating over switches and their ports when setting up the switches and their ports. At the same time, provide setup states and messages for ports and switches as it is done for the trees. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Use the new ports list instead of accessing the dsa_switch array of ports in the dsa_to_port helper. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Add a list of switch ports within the switch fabric. This will help the lookup of a port inside the whole fabric, and it is the first step towards supporting multiple CPU ports, before deprecating the usage of the unique dst->cpu_dp pointer. In preparation for a future allocation of the dsa_port structures, return -ENOMEM in case no structure is returned, even though this error cannot be reached yet. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
由 Vivien Didelot 提交于
Do not let the drivers access the ds->ports static array directly while there is a dsa_to_port helper for this purpose. At the same time, un-const this helper since the SJA1105 driver assigns the priv member of the returned dsa_port structure. Signed-off-by: NVivien Didelot <vivien.didelot@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
-
- 22 10月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
This patch removes the iph field from the state structure, which is not properly initialized. Instead, add a new field to make the "do we want to set DF" be the state bit and move the code to set the DF flag from ip_frag_next(). Joint work with Pablo and Linus. Fixes: 19c3401a ("net: ipv4: place control buffer handling away from fragmentation iterators") Reported-by: NPatrick Schönthaler <patrick@notvads.ovh> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 10月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
Intel test robot reported a ~7% regression on TCP_CRR tests that they bisected to the cited commit. Indeed, every time a new TCP socket is created or deleted, the atomic counter net->count is touched (via get_net(net) and put_net(net) calls) So cpus might have to reload a contended cache line in net_hash_mix(net) calls. We need to reorder 'struct net' fields to move @hash_mix in a read mostly cache line. We move in the first cache line fields that can be dirtied often. We probably will have to address in a followup patch the __randomize_layout that was added in linux-4.13, since this might break our placement choices. Fixes: 355b9855 ("netns: provide pure entropy for net_hash_mix()") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nkernel test robot <oliver.sang@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-