- 21 1月, 2021 2 次提交
-
-
由 Jakub Kicinski 提交于
rollback_registered() is a local helper, it's common for driver code to call unregister_netdevice_queue(dev, NULL) when they want to unregister netdevices under rtnl_lock. Inline rollback_registered() and adjust the only remaining caller. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.") moved net_set_todo() outside of rollback_registered() so that rollback_registered() can be used in the failure path of register_netdevice() but without risking a double free. Since commit cf124db5 ("net: Fix inconsistent teardown and release of private netdev state."), however, we have a better way of handling that condition, since destructors don't call free_netdev() directly. After the change in commit c269a24c ("net: make free_netdev() more lenient with unregistering devices") we can now move net_set_todo() back. Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 20 1月, 2021 4 次提交
-
-
由 Paolo Abeni 提交于
The commit dbd50f23 ("net: move the hsize check to the else block in skb_segment") introduced a data corruption for devices supporting scatter-gather. The problem boils down to signed/unsigned comparison given unexpected results: if signed 'hsize' is negative, it will be considered greater than a positive 'len', which is unsigned. This commit addresses resorting to the old checks order, so that 'hsize' never has a negative value when compared with 'len'. v1 -> v2: - reorder hsize checks instead of explicit cast (Alex) Bisected-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Fixes: dbd50f23 ("net: move the hsize check to the else block in skb_segment") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NXin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/861947c2d2d087db82af93c21920ce8147d15490.1611074818.git.pabeni@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Tariq Toukan 提交于
With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be logically done when RXCSUM offload is off. Fixes: 14136564 ("net: Add TLS RX offload feature") Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Reviewed-by: NBoris Pismenny <borisp@nvidia.com> Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Xin Long 提交于
This patch is to define a inline function skb_csum_is_sctp(), and also replace all places where it checks if it's a SCTP CSUM skb. This function would be used later in many networking drivers in the following patches. Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Oleksandr Mazur 提交于
Fix incorrect user_ptr dereferencing when handling port param get/set: idx [0] stores the 'struct devlink' pointer; idx [1] stores the 'struct devlink_port' pointer; Fixes: 637989b5 ("devlink: Always use user_ptr[0] for devlink and simplify post_doit") CC: Parav Pandit <parav@mellanox.com> Signed-off-by: NOleksandr Mazur <oleksandr.mazur@plvision.eu> Signed-off-by: NVadym Kochan <vadym.kochan@plvision.eu> Link: https://lore.kernel.org/r/20210119085333.16833-1-vadym.kochan@plvision.euSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 19 1月, 2021 1 次提交
-
-
由 Tariq Toukan 提交于
ndo_sk_get_lower_dev returns the lower netdev that corresponds to a given socket. Additionally, we implement a helper netdev_sk_get_lowest_dev() to get the lowest one in chain. Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Reviewed-by: NBoris Pismenny <borisp@nvidia.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 1月, 2021 2 次提交
-
-
由 Xin Long 提交于
After commit 89319d38 ("net: Add frag_list support to skb_segment"), it goes to process frag_list when !hsize in skb_segment(). However, when using skb frag_list, sg normally should not be set. In this case, hsize will be set with len right before !hsize check, then it won't go to frag_list processing code. So the right thing to do is move the hsize check to the else block, so that it won't affect the !hsize check for frag_list processing. v1->v2: - change to do "hsize <= 0" check instead of "!hsize", and also move "hsize < 0" into else block, to save some cycles, as Alex suggested. Signed-off-by: NXin Long <lucien.xin@gmail.com> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Alexander Lobakin 提交于
Commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") ensured that skbs with data size lower than 1025 bytes will be kmalloc'ed to avoid excessive page cache fragmentation and memory consumption. However, the fix adressed only __napi_alloc_skb() (primarily for virtio_net and napi_get_frags()), but the issue can still be achieved through __netdev_alloc_skb(), which is still used by several drivers. Drivers often allocate a tiny skb for headers and place the rest of the frame to frags (so-called copybreak). Mirror the condition to __netdev_alloc_skb() to handle this case too. Since v1 [0]: - fix "Fixes:" tag; - refine commit message (mention copybreak usecase). [0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me Fixes: a1c7fff7 ("net: netdev_alloc_skb() use build_skb()") Signed-off-by: NAlexander Lobakin <alobakin@pm.me> Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.meSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 16 1月, 2021 2 次提交
-
-
由 Eric Dumazet 提交于
syzbot report reminded us that very big ewma_log were supported in the past, even if they made litle sense. tc qdisc replace dev xxx root est 1sec 131072sec ... While fixing the bug, also add boundary checks for ewma_log, in line with range supported by iproute2. UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38 shift exponent -1 is negative CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83 call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744 __do_softirq+0x2bc/0xa77 kernel/softirq.c:343 asm_call_irq_on_stack+0xf/0x20 </IRQ> __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:226 [inline] __irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420 irq_exit_rcu+0x5/0x20 kernel/softirq.c:432 sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline] RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline] RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline] RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516 Fixes: 1c0d32fd ("net_sched: gen_estimator: complete rewrite of rate estimators") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20210114181929.1717985-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Tom Rix 提交于
Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: NTom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20210114212917.48174-1-trix@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 1月, 2021 3 次提交
-
-
由 Eran Ben Elisha 提交于
Add support for parsing PTP L2 packet header. Such packet consists of an L2 header (with ethertype of ETH_P_1588), PTP header, body and an optional suffix. Signed-off-by: NEran Ben Elisha <eranbe@nvidia.com> Reviewed-by: NTariq Toukan <tariqt@nvidia.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Tariq Toukan 提交于
Cited patch below blocked the TLS TX device offload unless HW_CSUM is set. This broke devices that use IP_CSUM && IP6_CSUM. Here we fix it. Note that the single HW_TLS_TX feature flag indicates support for both IPv4/6, hence it should still be disabled in case only one of (IP_CSUM | IPV6_CSUM) is set. Fixes: ae0b04b2 ("net: Disable NETIF_F_HW_TLS_TX when HW_CSUM is disabled") Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Reported-by: NRohit Maheshwari <rohitm@chelsio.com> Reviewed-by: NMaxim Mikityanskiy <maximmi@mellanox.com> Link: https://lore.kernel.org/r/20210114151215.7061-1-tariqt@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Eric Dumazet 提交于
Both virtio net and napi_get_frags() allocate skbs with a very small skb->head While using page fragments instead of a kmalloc backed skb->head might give a small performance improvement in some cases, there is a huge risk of under estimating memory usage. For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations per page (order-3 page in x86), or even 64 on PowerPC We have been tracking OOM issues on GKE hosts hitting tcp_mem limits but consuming far more memory for TCP buffers than instructed in tcp_mem[2] Even if we force napi_alloc_skb() to only use order-0 pages, the issue would still be there on arches with PAGE_SIZE >= 32768 This patch makes sure that small skb head are kmalloc backed, so that other objects in the slab page can be reused instead of being held as long as skbs are sitting in socket queues. Note that we might in the future use the sk_buff napi cache, instead of going through a more expensive __alloc_skb() Another idea would be to use separate page sizes depending on the allocated length (to never have more than 4 frags per page) I would like to thank Greg Thelen for his precious help on this matter, analysing crash dumps is always a time consuming task. Fixes: fd11a83d ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Greg Thelen <gthelen@google.com> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Acked-by: NMichael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 14 1月, 2021 1 次提交
-
-
由 Menglong Dong 提交于
Replace the check for ETH_P_8021Q and ETH_P_8021AD in __netif_receive_skb_core with eth_type_vlan. Signed-off-by: NMenglong Dong <dong.menglong@zte.com.cn> Link: https://lore.kernel.org/r/20210111104221.3451-1-dong.menglong@zte.com.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 13 1月, 2021 1 次提交
-
-
由 Daniel Borkmann 提交于
The _bpf_setsockopt() is able to set some of the SOL_SOCKET level options, however, _bpf_getsockopt() has little support to actually retrieve them. This small patch adds few misc options such as SO_MARK, SO_PRIORITY and SO_BINDTOIFINDEX. For the latter getter and setter are added. The mark and priority in particular allow to retrieve the options from BPF cgroup hooks to then implement custom behavior / settings on the syscall hooks compared to other sockets that stick to the defaults, for example. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/cba44439b801e5ddc1170e5be787f4dc93a2d7f9.1610406333.git.daniel@iogearbox.net
-
- 12 1月, 2021 1 次提交
-
-
由 Willem de Bruijn 提交于
skb_seq_read iterates over an skb, returning pointer and length of the next data range with each call. It relies on kmap_atomic to access highmem pages when needed. An skb frag may be backed by a compound page, but kmap_atomic maps only a single page. There are not enough kmap slots to always map all pages concurrently. Instead, if kmap_atomic is needed, iterate over each page. As this increases the number of calls, avoid this unless needed. The necessary condition is captured in skb_frag_must_loop. I tried to make the change as obvious as possible. It should be easy to verify that nothing changes if skb_frag_must_loop returns false. Tested: On an x86 platform with CONFIG_HIGHMEM=y CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y CONFIG_NETFILTER_XT_MATCH_STRING=y Run ip link set dev lo mtu 1500 iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT dd if=/dev/urandom of=in bs=1M count=20 nc -l -p 8000 > /dev/null & nc -w 1 -q 0 localhost 8000 < in Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 10 1月, 2021 1 次提交
-
-
由 Eric Dumazet 提交于
GRO_DROP can only be returned from napi_gro_frags() if the skb has not been allocated by a prior napi_get_frags() Since drivers must use napi_get_frags() and test its result before populating the skb with metadata, we can safely remove GRO_DROP since it offers no practical use. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: NEdward Cree <ecree.xilinx@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 09 1月, 2021 8 次提交
-
-
由 Jakub Kicinski 提交于
If register_netdevice() fails at the very last stage - the notifier call - some subsystems may have already seen it and grabbed a reference. struct net_device can't be freed right away without calling netdev_wait_all_refs(). Now that we have a clean interface in form of dev->needs_free_netdev and lenient free_netdev() we can undo what commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev failure.") has done and complete the unregistration path by bringing the net_set_todo() call back. After registration fails user is still expected to explicitly free the net_device, so make sure ->needs_free_netdev is cleared, otherwise rolling back the registration will cause the old double free for callers who release rtnl_lock before the free. This also solves the problem of priv_destructor not being called on notifier error. net_set_todo() will be moved back into unregister_netdevice_queue() in a follow up. Reported-by: NHulk Robot <hulkci@huawei.com> Reported-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
There are two flavors of handling netdev registration: - ones called without holding rtnl_lock: register_netdev() and unregister_netdev(); and - those called with rtnl_lock held: register_netdevice() and unregister_netdevice(). While the semantics of the former are pretty clear, the same can't be said about the latter. The netdev_todo mechanism is utilized to perform some of the device unregistering tasks and it hooks into rtnl_unlock() so the locked variants can't actually finish the work. In general free_netdev() does not mix well with locked calls. Most drivers operating under rtnl_lock set dev->needs_free_netdev to true and expect core to make the free_netdev() call some time later. The part where this becomes most problematic is error paths. There is no way to unwind the state cleanly after a call to register_netdevice(), since unreg can't be performed fully without dropping locks. Make free_netdev() more lenient, and defer the freeing if device is being unregistered. This allows error paths to simply call free_netdev() both after register_netdevice() failed, and after a call to unregister_netdevice() but before dropping rtnl_lock. Simplify the error paths which are currently doing gymnastics around free_netdev() handling. Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Explain the two basic flows of struct net_device's operation. Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Baptiste Lepers 提交于
reuse->socks[] is modified concurrently by reuseport_add_sock. To prevent reading values that have not been fully initialized, only read the array up until the last known safe index instead of incorrectly re-reading the last index of the array. Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets") Signed-off-by: NBaptiste Lepers <baptiste.lepers@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dongseok Yi 提交于
skbs in fraglist could be shared by a BPF filter loaded at TC. If TC writes, it will call skb_ensure_writable -> pskb_expand_head to create a private linear section for the head_skb. And then call skb_clone_fraglist -> skb_get on each skb in the fraglist. skb_segment_list overwrites part of the skb linear section of each fragment itself. Even after skb_clone, the frag_skbs share their linear section with their clone in PF_PACKET. Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have a link for the same frag_skbs chain. If a new skb (not frags) is queued to one of the sk_receive_queue, multiple ptypes can see and release this. It causes use-after-free. [ 4443.426215] ------------[ cut here ]------------ [ 4443.426222] refcount_t: underflow; use-after-free. [ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO) [ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8 [ 4443.426808] Call trace: [ 4443.426813] refcount_dec_and_test_checked+0xa4/0xc8 [ 4443.426823] skb_release_data+0x144/0x264 [ 4443.426828] kfree_skb+0x58/0xc4 [ 4443.426832] skb_queue_purge+0x64/0x9c [ 4443.426844] packet_set_ring+0x5f0/0x820 [ 4443.426849] packet_setsockopt+0x5a4/0xcd0 [ 4443.426853] __sys_setsockopt+0x188/0x278 [ 4443.426858] __arm64_sys_setsockopt+0x28/0x38 [ 4443.426869] el0_svc_common+0xf0/0x1d0 [ 4443.426873] el0_svc_handler+0x74/0x98 [ 4443.426880] el0_svc+0x8/0xc Fixes: 3a1296a3 (net: Support GRO/GSO fraglist chaining.) Signed-off-by: NDongseok Yi <dseok.yi@samsung.com> Acked-by: NWillem de Bruijn <willemb@google.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Lorenzo Bianconi 提交于
Introduce xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers. Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NShay Agroskin <shayagr@amazon.com> Acked-by: NMartin Habets <habetsm.xilinx@gmail.com> Acked-by: NCamelia Groza <camelia.groza@nxp.com> Acked-by: NMarcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Lorenzo Bianconi 提交于
Introduce xdp_init_buff utility routine to initialize xdp_buff fields const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers. Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NShay Agroskin <shayagr@amazon.com> Acked-by: NMartin Habets <habetsm.xilinx@gmail.com> Acked-by: NCamelia Groza <camelia.groza@nxp.com> Acked-by: NMarcin Wojtas <mw@semihalf.com> Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Zheng Yongjun 提交于
The function sockfd_lookup uses fget on the value that is stored in the file field of the returned structure, so fput should ultimately be applied to this value. This can be done directly, but it seems better to use the specific macro sockfd_put, which does the same thing. The cleanup was done using the following semantic patch: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ expression s; @@ s = sockfd_lookup(...) ... + sockfd_put(s); ?- fput(s->file); // </smpl> Signed-off-by: NZheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20201229134834.22962-1-zhengyongjun3@huawei.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 08 1月, 2021 12 次提交
-
-
由 Jonathan Lemon 提交于
Unlike the rest of the skb_zcopy_ functions, these routines operate on a 'struct ubuf', not a skb. Remove the 'skb_' prefix from the naming to make things clearer. Suggested-by: NWillem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
Currently, when an ubuf is attached to a new skb, the shared flags word is initialized to a fixed value. Instead of doing this, set the default flags in the ubuf, and have new skbs inherit from this default. This is needed when setting up different zerocopy types. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
In preparation for expanded zerocopy (TX and RX), move the zerocopy related bits out of tx_flags into their own flag word. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
At Willem's suggestion, rename the sock_zerocopy_* functions so that they match the MSG_ZEROCOPY flag, which makes it clear they are specific to this zerocopy implementation. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
RX zerocopy fragment pages which are not allocated from the system page pool require special handling. Give the callback in skb_zcopy_clear() a chance to process them first. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
The sock_zerocopy_put_abort function contains logic which is specific to the current zerocopy implementation. Add a wrapper which checks the callback and dispatches apppropriately. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
Add an optional skb parameter to the zerocopy callback parameter, which is passed down from skb_zcopy_clear(). This gives access to the original skb, which is needed for upcoming RX zero-copy error handling. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
Rename the get routines for consistency. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
Replace sock_zerocopy_put with the generic skb_zcopy_put() function. Pass 'true' as the success argument, as this is identical to no change. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
Before this change, the caller of sock_zerocopy_callback would need to save the zerocopy status, decrement and check the refcount, and then call the callback function - the callback was only invoked when the refcount reached zero. Now, the caller just passes the status into the callback function, which saves the status and handles its own refcounts. This makes the behavior of the sock_zerocopy_callback identical to the tpacket and vhost callbacks. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jonathan Lemon 提交于
All 'struct ubuf_info' users should have a callback defined as of commit 0a4a060b ("sock: fix zerocopy_success regression with msg_zerocopy"). Remove the dead code path to consume_skb(), which makes assumptions about how the structure was allocated. Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
All drivers use udp_tunnel_nic_*_port() helpers, prepare for NDO removal by invoking those helpers directly. The helpers are safe to call on all devices, they check if device has the UDP tunnel state initialized. Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Reviewed-by: NJacob Keller <jacob.e.keller@intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 29 12月, 2020 2 次提交
-
-
由 weichenchen 提交于
pneigh_enqueue() tries to obtain a random delay by mod NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY) migth be zero at that point because someone could write zero to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the callers check it. This patch uses prandom_u32_max() to get a random delay instead which avoids potential division by zero. Signed-off-by: Nweichenchen <weichen.chen@linux.alibaba.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Antoine Tenart 提交于
Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't see an actual bug being triggered, but let's be safe here and take the rtnl lock while accessing the map in sysfs. Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue") Signed-off-by: NAntoine Tenart <atenart@kernel.org> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-