- 02 4月, 2021 5 次提交
-
-
由 Cong Wang 提交于
The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-6-xiyou.wangcong@gmail.com
-
由 Cong Wang 提交于
We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. As noticed by John, we still have to lock the psock->work, because the same work item could be running concurrently on different CPU's. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-5-xiyou.wangcong@gmail.com
-
由 Cong Wang 提交于
We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-4-xiyou.wangcong@gmail.com
-
由 Cong Wang 提交于
Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-3-xiyou.wangcong@gmail.com
-
由 Cong Wang 提交于
Currently we purge the ingress_skb queue only when psock refcnt goes down to 0, so locking the queue is not necessary, but in order to be called during ->close, we have to lock it here. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210331023237.41094-2-xiyou.wangcong@gmail.com
-
- 27 3月, 2021 1 次提交
-
-
由 Martin KaFai Lau 提交于
This patch adds a few kernel function bpf_kfunc_call_test*() for the selftest's test_run purpose. They will be allowed for tc_cls prog. The selftest calling the kernel function bpf_kfunc_call_test*() is also added in this patch. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210325015252.1551395-1-kafai@fb.com
-
- 25 3月, 2021 1 次提交
-
-
由 Pablo Neira Ayuso 提交于
This patch adds dev_fill_forward_path() which resolves the path to reach the real netdevice from the IP forwarding side. This function takes as input the netdevice and the destination hardware address and it walks down the devices calling .ndo_fill_forward_path() for each device until the real device is found. For instance, assuming the following topology: IP forwarding / \ br0 eth0 / \ eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef where eth1 and eth2 are bridge ports and eth0 provides WAN connectivity. ethX is the interface in another box which is connected to the eth1 bridge port. For packets going through IP forwarding to br0 whose destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the following path: br0 -> eth1 .ndo_fill_forward_path for br0 looks up at the FDB for the bridge port from the destination MAC address to get the bridge port eth1. This information allows to create a fast path that bypasses the classic bridge and IP forwarding paths, so packets go directly from the bridge port eth1 to eth0 (wan interface) and vice versa. fast path .------------------------. / \ | IP forwarding | | / \ \/ | br0 eth0 . / \ -> eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 3月, 2021 1 次提交
-
-
由 Dmitry Vyukov 提交于
netdev_wait_allrefs() issues a warning if refcount does not drop to 0 after 10 seconds. While 10 second wait generally should not happen under normal workload in normal environment, it seems to fire falsely very often during fuzzing and/or in qemu emulation (~10x slower). At least it's not possible to understand if it's really a false positive or not. Automated testing generally bumps all timeouts to very high values to avoid flake failures. Add net.core.netdev_unregister_timeout_secs sysctl to make the timeout configurable for automated testing systems. Lowering the timeout may also be useful for e.g. manual bisection. The default value matches the current behavior. Signed-off-by: NDmitry Vyukov <dvyukov@google.com> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877 Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 3月, 2021 4 次提交
-
-
由 Eric Dumazet 提交于
When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the initial net device refcount was 0. When CONFIG_PCPU_DEV_REFCNT is not set, this means the first dev_hold() triggers an illegal refcount operation (addition on 0) refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x128/0x1a4 Fix is to change initial (and final) refcount to be 1. Also add a missing kerneldoc piece, as reported by Stephen Rothwell. Fixes: 919067cc ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NGuenter Roeck <groeck@google.com> Tested-by: NGuenter Roeck <groeck@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
xps_queue_show is mostly made of an RCU read-side critical section and calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not allowed as this call may sleep and such behaviours aren't allowed in RCU read-side critical sections. Fix this by using GFP_NOWAIT instead. Fixes: 5478fcd0 ("net: embed nr_ids in the xps maps") Reported-by: Nkernel test robot <oliver.sang@intel.com> Suggested-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
ptype_all and ptype_base are declared in net/core/dev.c as non-static, because they are used by net-procfs.c too. However, a "make W=1" build complains that there was no previous declaration of ptype_all and ptype_base in a header file, so this way of declaring things constitutes a violation of coding style. Let's move the extern declarations of ptype_all and ptype_base to the linux/netdevice.h file, which is included by net-procfs.c too. Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vladimir Oltean 提交于
Since their introduction in commit 04157469 ("net: Use static_key for XPS maps"), xps_needed and xps_rxqs_needed were never used outside net/core/dev.c, so I don't really understand why they were exported as symbols in the first place. This is needed in order to silence a "make W=1" warning about these static keys not being declared as static variables, but not having a previous declaration in a header file nonetheless. Cc: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2021 1 次提交
-
-
由 Eric Dumazet 提交于
I was working on a syzbot issue, claiming one device could not be dismantled because its refcount was -1 unregister_netdevice: waiting for sit0 to become free. Usage count = -1 It would be nice if syzbot could trigger a warning at the time this reference count became negative. This patch adds CONFIG_PCPU_DEV_REFCNT options which defaults to per cpu variables (as before this patch) on SMP builds. v2: free_dev label in alloc_netdev_mqs() is moved to avoid a compiler warning (-Wunused-label), as reported by kernel test robot <lkp@intel.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2021 15 次提交
-
-
由 Xiong Zhenwu 提交于
A typo is found out by codespell tool in 1734th line of drop_monitor.c: $ codespell ./net/core/ ./net/core/drop_monitor.c:1734: guarnateed ==> guaranteed Fix a typo found by codespell. Signed-off-by: NXiong Zhenwu <xiong.zhenwu@zte.com.cn> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
In __netif_set_xps_queue, old map entries from the old dev_maps are freed but their corresponding entry in the old dev_maps aren't NULLed. Fix this. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
When setting up an new dev_maps in __netif_set_xps_queue, we remove and free maps from unused CPUs/rx-queues near the end of the function; by calling remove_xps_queue. However it's possible those maps are also part of the old not-freed-yet dev_maps, which might be used concurrently. When that happens, a map can be freed while its corresponding entry in the old dev_maps table isn't NULLed, leading to: "BUG: KASAN: use-after-free" in different places. This fixes the map freeing logic for unused CPUs/rx-queues, to also NULL the map entries from the old dev_maps table. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Most of the xps_cpus_show and xps_rxqs_show functions share the same logic. Having it in two different functions does not help maintenance. This patch moves their common logic into a new function, xps_queue_show, to improve this. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Now that nr_ids and num_tc are stored in the xps dev_maps, which are RCU protected, we do not have the need to protect the maps in the rtnl lock. Move the rtnl unlock up so we reduce the rtnl locking section. We also increase the reference count on the subordinate device if any, as we don't want this device to be freed while we use it (now that the rtnl lock isn't protecting it in the whole function). Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Improve the readability of the loop removing tx-queue from unused CPUs/rx-queues in __netif_set_xps_queue. The change should only be cosmetic. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
This patch adds an helper, xps_copy_dev_maps, to copy maps from dev_maps to new_dev_maps at a given index. The logic should be the same, with an improved code readability and maintenance. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in net_device. That will simplify a lot the code removing the need for lots of if/else conditionals as the correct map will be available using its offset in the array. This should not modify the xps maps behaviour in any way. Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Remove the xps possible_mask. It was an optimization but we can just loop from 0 to nr_ids now that it is embedded in the xps dev_maps. That simplifies the code a bit. Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Embed nr_ids (the number of cpu for the xps cpus map, and the number of rxqs for the xps cpus map) in dev_maps. That will help not accessing out of bound memory if those values change after dev_maps was allocated. Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
The xps cpus/rxqs map is accessed using dev->num_tc, which is used when allocating the map. But later updates of dev->num_tc can lead to having a mismatch between the maps and how they're accessed. In such cases the map values do not make any sense and out of bound accesses can occur (that can be easily seen using KASAN). This patch aims at fixing this by embedding num_tc into the maps, using the value at the time the map is created. This brings two improvements: - The maps can be accessed using the embedded num_tc, so we know for sure we won't have out of bound accesses. - Checks can be made before accessing the maps so we know the values retrieved will make sense. We also update __netif_set_xps_queue to conditionally copy old maps from dev_maps in the new one only if the number of traffic classes from both maps match. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Make the implementations of xps_cpus_show and xps_rxqs_show to converge, as the two share the same logic but diverted over time. This should not modify their behaviour but will help future changes and improve maintenance. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
In net-sysfs, get_netdev_queue_index returns an unsigned int. Some of its callers use an unsigned long to store the returned value. Update the code to be consistent, this should only be cosmetic. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Use bitmap_zalloc instead of zalloc_cpumask_var in xps_cpus_show to align with xps_rxqs_show. This will improve maintenance and allow us to factorize the two functions. The function should behave the same. Signed-off-by: NAntoine Tenart <atenart@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Bohac 提交于
__dev_alloc_name(), when supplied with a name containing '%d', will search for the first available device number to generate a unique device name. Since commit ff927412 ("net: introduce name_node struct to be used in hashlist") network devices may have alternate names. __dev_alloc_name() does take these alternate names into account, possibly generating a name that is already taken and failing with -ENFILE as a result. This demonstrates the bug: # rmmod dummy 2>/dev/null # ip link property add dev lo altname dummy0 # modprobe dummy numdummies=1 modprobe: ERROR: could not insert 'dummy': Too many open files in system Instead of creating a device named dummy1, modprobe fails. Fix this by checking all the names in the d->name_node list, not just d->name. Signed-off-by: NJiri Bohac <jbohac@suse.cz> Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist") Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2021 1 次提交
-
-
由 Wei Wang 提交于
Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to determine if the kthread owns this napi and could call napi->poll() on it. However, if socket busy poll is enabled, it is possible that the busy poll thread grabs this SCHED bit (after the previous napi->poll() invokes napi_complete_done() and clears SCHED bit) and tries to poll on the same napi. napi_disable() could grab the SCHED bit as well. This patch tries to fix this race by adding a new bit NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in ____napi_schedule() if the threaded mode is enabled, and gets cleared in napi_complete_done(), and we only poll the napi in kthread if this bit is set. This helps distinguish the ownership of the napi between kthread and other scenarios and fixes the race issue. Fixes: 29863d41 ("net: implement threaded-able napi poll loop support") Reported-by: NMartin Zaharinov <micron10@gmail.com> Suggested-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NWei Wang <weiwan@google.com> Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 3月, 2021 3 次提交
-
-
由 Martin Willi 提交于
When a non-initial netns is destroyed, the usual policy is to delete all virtual network interfaces contained, but move physical interfaces back to the initial netns. This keeps the physical interface visible on the system. CAN devices are somewhat special, as they define rtnl_link_ops even if they are physical devices. If a CAN interface is moved into a non-initial netns, destroying that netns lets the interface vanish instead of moving it back to the initial netns. default_device_exit() skips CAN interfaces due to having rtnl_link_ops set. Reproducer: ip netns add foo ip link set can0 netns foo ip netns delete foo WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60 CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1 Workqueue: netns cleanup_net [<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14) [<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8) [<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114) [<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac) [<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60) [<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380) [<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438) [<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8) [<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c) [<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c) To properly restore physical CAN devices to the initial netns on owning netns exit, introduce a flag on rtnl_link_ops that can be set by drivers. For CAN devices setting this flag, default_device_exit() considers them non-virtual, applying the usual namespace move. The issue was introduced in the commit mentioned below, as at that time CAN devices did not have a dellink() operation. Fixes: e008b5fc ("net: Simplfy default_device_exit and improve batching.") Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.orgSigned-off-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Lorenzo Bianconi 提交于
For wireless devices (e.g. mt76 driver) multiple net_devices belongs to the same wireless phy and the napi object is registered in a dummy netdevice related to the wireless phy. Export dev_set_threaded in order to be reused in device drivers enabling threaded NAPI. Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Manu Bretelle 提交于
Augment the current set of options that are accessible via bpf_{g,s}etsockopt to also support SO_REUSEPORT. Signed-off-by: NManu Bretelle <chantra@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210310182305.1910312-1-chantra@fb.com
-
- 15 3月, 2021 5 次提交
-
-
由 Alexander Lobakin 提交于
Flow Dissector code never modifies the input buffer, neither skb nor raw data. Make 'data' argument const for all of the Flow dissector's functions. Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Lobakin 提交于
'hash' stores not the flow hash, but the index of the GRO bucket corresponding to it. Change its name to 'bucket' to avoid confusion while reading lines like '__set_bit(hash, &napi->gro_bitmask)'. Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Lobakin 提交于
GRO bucket index doesn't change through the entire function. Store a pointer to the corresponding bucket instead of its member and use it consistently through the function. It is performance-safe since &gro_list->list == gro_list. Misc: remove superfluous braces around single-line branches. Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Lobakin 提交于
gro_list_prepare() always returns &napi->gro_hash[bucket].list, without any variations. Moreover, it uses 'napi' argument only to have access to this list, and calculates the bucket index for the second time (firstly it happens at the beginning of dev_gro_receive()) to do that. Given that dev_gro_receive() already has an index to the needed list, just pass it as the first argument to eliminate redundant calculations, and make gro_list_prepare() return void. Also, both arguments of gro_list_prepare() can be constified since this function can only modify the skbs from the bucket list. Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Lobakin 提交于
flow_dissector_key_icmp::id is of type u16 (CPU byteorder), ICMP header has its ID field in network byteorder obviously. Sparse says: net/core/flow_dissector.c:178:43: warning: restricted __be16 degrades to integer Convert ID value to CPU byteorder when storing it into flow_dissector_key_icmp. Fixes: 5dec597e ("flow_dissector: extract more ICMP information") Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 3月, 2021 1 次提交
-
-
由 Tonghao Zhang 提交于
Introduce the new function tw_prot_init (inspired by req_prot_init) to simplify "proto_register" function. tw_prot_cleanup will take care of a partially initialized timewait_sock_ops. Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2021 2 次提交
-
-
由 Ido Schimmel 提交于
In the rare case that drop_monitor fails to register its probe on the 'napi_poll' tracepoint, it will not deactivate its hysteresis timer as part of the error path. If the hysteresis timer was armed by the shortly lived 'kfree_skb' probe and user space retries to initiate tracing, a warning will be emitted for trying to initialize an active object [1]. Fix this by properly undoing all the operations that were done prior to probe registration, in both software and hardware code paths. Note that syzkaller managed to fail probe registration by injecting a slab allocation failure [2]. [1] ODEBUG: init active (active state 0) object type: timer_list hint: sched_send_work+0x0/0x60 include/linux/list.h:135 WARNING: CPU: 1 PID: 8649 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505 Modules linked in: CPU: 1 PID: 8649 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505 [...] Call Trace: __debug_object_init+0x524/0xd10 lib/debugobjects.c:588 debug_timer_init kernel/time/timer.c:722 [inline] debug_init kernel/time/timer.c:770 [inline] init_timer_key+0x2d/0x340 kernel/time/timer.c:814 net_dm_trace_on_set net/core/drop_monitor.c:1111 [inline] set_all_monitor_traces net/core/drop_monitor.c:1188 [inline] net_dm_monitor_start net/core/drop_monitor.c:1295 [inline] net_dm_cmd_trace+0x720/0x1220 net/core/drop_monitor.c:1339 genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:739 genl_family_rcv_msg net/netlink/genetlink.c:783 [inline] genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:800 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 genl_rcv+0x24/0x40 net/netlink/genetlink.c:811 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2348 ___sys_sendmsg+0xf3/0x170 net/socket.c:2402 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2435 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae [2] FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 1 CPU: 1 PID: 8645 Comm: syz-executor.0 Not tainted 5.11.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: dump_stack+0xfa/0x151 should_fail.cold+0x5/0xa should_failslab+0x5/0x10 __kmalloc+0x72/0x3f0 tracepoint_add_func+0x378/0x990 tracepoint_probe_register+0x9c/0xe0 net_dm_cmd_trace+0x7fc/0x1220 genl_family_rcv_msg_doit+0x228/0x320 genl_rcv_msg+0x328/0x580 netlink_rcv_skb+0x153/0x420 genl_rcv+0x24/0x40 netlink_unicast+0x533/0x7d0 netlink_sendmsg+0x856/0xd90 sock_sendmsg+0xcf/0x120 ____sys_sendmsg+0x6e8/0x810 ___sys_sendmsg+0xf3/0x170 __sys_sendmsg+0xe5/0x1b0 do_syscall_64+0x2d/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 70c69274 ("drop_monitor: Initialize timer and work item upon tracing enable") Fixes: 8ee2267a ("drop_monitor: Convert to using devlink tracepoint") Reported-by: syzbot+779559d6503f3a56213d@syzkaller.appspotmail.com Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yunsheng Lin 提交于
gro list uses skb_shinfo(skb)->frag_list to link two skb together, and NAPI_GRO_CB(p)->last->next is used when there are more skb, see skb_gro_receive_list(). gso expects that each segmented skb is linked together using skb->next, so only the first skb->next need to set to skb_shinfo(skb)-> frag_list when doing gso list segment. It is the same reason that nskb->next does not need to be set to list_skb before goto the error handling, because nskb->next already pointers to list_skb. And nskb is also the last skb at the end of loop, so remove tail variable and use nskb instead. Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-