- 08 February 2023, 1 commit

Submitted by Marco Elver
stable inclusion from stable-v5.10.163
commit 67349025f00d0749e36386cfcfc32c2887f47fdb
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6D0MR
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=67349025f00d0749e36386cfcfc32c2887f47fdb

--------------------------------

[ Upstream commit fa69ee5a ]

It turns out that usage of skb extensions can cause memory leaks. Ido Schimmel reported: "[...] there are instances that blindly overwrite 'skb->extensions' by invoking skb_copy_header() after __alloc_skb()."

Therefore, give up on using skb extensions for KCOV handle, and instead directly store kcov_handle in sk_buff.

Fixes: 6370cc3b ("net: add kcov handle to skb extensions")
Fixes: 85ce50d3 ("net: kcov: don't select SKB_EXTENSIONS when there is no NET")
Fixes: 97f53a08 ("net: linux/skbuff.h: combine SKB_EXTENSIONS + KCOV handling")
Link: https://lore.kernel.org/linux-wireless/20201121160941.GA485907@shredder.lan/
Reported-by: Ido Schimmel <idosch@idosch.org>
Signed-off-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20201125224840.2014773-1-elver@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stable-dep-of: db0b124f ("igc: Enhance Qbv scheduling by using first flag bit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Conflicts: include/linux/skbuff.h
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 13 December 2022, 1 commit

Submitted by Kuniyuki Iwashima
stable inclusion from stable-v5.10.140
commit 8db070463e3ebedf81dfcc1d8bc51f46e751872f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8db070463e3ebedf81dfcc1d8bc51f46e751872f

--------------------------------

[ Upstream commit d2154b0a ]

While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.

Fixes: b245be1f ("net-timestamp: no-payload only sysctl")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
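For reference, this is the kind of change the fix makes; a minimal sketch of the READ_ONCE() pattern with the reader simplified, not the literal upstream diff:

	/* Assumed sketch: a sysctl that can be rewritten at runtime must be
	 * loaded with READ_ONCE() so a concurrent writer cannot cause a torn
	 * or repeated read in the lockless reader. */
	extern int sysctl_tstamp_allow_data;

	static bool tstamp_allow_data(void)
	{
		return READ_ONCE(sysctl_tstamp_allow_data) != 0;
	}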
-
- 14 June 2022, 1 commit

Submitted by Jean-Philippe Brucker
mainline inclusion from mainline-v5.18-rc2
commit 1effe8ca
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I56XHY
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1effe8ca4e34

----------------------------------------------------------------------

Fix a use-after-free when using page_pool with page fragments. We encountered this problem during normal RX in the hns3 driver:

(1) Initially we have three descriptors in the RX queue. The first one allocates PAGE1 through page_pool, and the other two allocate one half of PAGE2 each. Page references look like this:

		RX_BD1 _______ PAGE1
		RX_BD2 _______ PAGE2
		RX_BD3 _________/

(2) Handle RX on the first descriptor. Allocate SKB1, eventually added to the receive queue by tcp_queue_rcv().

(3) Handle RX on the second descriptor. Allocate SKB2 and pass it to netif_receive_skb():

	netif_receive_skb(SKB2)
	  ip_rcv(SKB2)
	    SKB3 = skb_clone(SKB2)

    SKB2 and SKB3 share a reference to PAGE2 through skb_shinfo()->dataref. The other ref to PAGE2 is still held by RX_BD3:

		SKB2 ---+- PAGE2
		SKB3 __/    /
		RX_BD3 ____/

(3b) Now while handling TCP, coalesce SKB3 with SKB1:

	tcp_v4_rcv(SKB3)
	  tcp_try_coalesce(to=SKB1, from=SKB3)    // succeeds
	  kfree_skb_partial(SKB3)
	    skb_release_data(SKB3)                // drops one dataref

		SKB1 _____ PAGE1
		     \____
		SKB2 _____ PAGE2
			      /
		RX_BD3 ______/

    In skb_try_coalesce(), __skb_frag_ref() takes a page reference to PAGE2, where it should instead have increased the page_pool frag reference, pp_frag_count. Without coalescing, when releasing both SKB2 and SKB3, a single reference to PAGE2 would be dropped. Now when releasing SKB1 and SKB2, two references to PAGE2 will be dropped, resulting in underflow.

(3c) Drop SKB2:

	af_packet_rcv(SKB2)
	  consume_skb(SKB2)
	    skb_release_data(SKB2)                // drops second dataref
	      page_pool_return_skb_page(PAGE2)    // drops one pp_frag_count

		SKB1 _____ PAGE1
		     \____ PAGE2
			      /
		RX_BD3 ______/

(4) Userspace calls recvmsg(). Copies SKB1 and releases it. Since SKB3 was coalesced with SKB1, we release the SKB3 page as well:

	tcp_eat_recv_skb(SKB1)
	  skb_release_data(SKB1)
	    page_pool_return_skb_page(PAGE1)
	    page_pool_return_skb_page(PAGE2)      // drops second pp_frag_count

(5) PAGE2 is freed, but the third RX descriptor was still using it! In our case this causes IOMMU faults, but it would silently corrupt memory if the IOMMU was disabled.

Change the logic that checks whether pp_recycle SKBs can be coalesced. We still reject differing pp_recycle between 'from' and 'to' SKBs, but in order to avoid the situation described above, we also reject coalescing when both 'from' and 'to' are pp_recycled and 'from' is cloned.

The new logic allows coalescing a cloned pp_recycle SKB into a page refcounted one, because in this case the release (4) will drop the right reference, the one taken by skb_try_coalesce().

Fixes: 53e0961d ("page_pool: add frag page recycling support in page pool")
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiantao Xiao <xiaojiantao1@h-partners.com>
Reviewed-by: Jian Shen <shenjian15@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
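The rule described in the last two paragraphs can be read as a small predicate; the following is a hedged paraphrase for illustration, not the literal upstream diff:

	/* Paraphrased coalescing check (illustrative only). */
	static bool skb_pp_coalesce_ok(const struct sk_buff *to,
				       const struct sk_buff *from)
	{
		/* still refuse to mix pp_recycle and non-pp_recycle skbs */
		if (to->pp_recycle != from->pp_recycle)
			return false;
		/* if both are pp_recycle and 'from' is cloned, its frags are
		 * still shared with the clone, so stealing them would
		 * unbalance pp_frag_count */
		if (from->pp_recycle && skb_cloned(from))
			return false;
		return true;
	}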
-
- 28 May 2022, 2 commits

Submitted by lena wang
stable inclusion from stable-v5.10.104
commit 4e178ed14bda47942c1ccad3f60b774870b45db9
bugzilla: https://gitee.com/openeuler/kernel/issues/I56XAC
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4e178ed14bda47942c1ccad3f60b774870b45db9

--------------------------------

commit 224102de upstream.

The truesize for a UDP GRO packet is added by the main skb and the skbs in the main skb's frag_list:

	skb_gro_receive_list
		p->truesize += skb->truesize;

Commit 53475c5d ("net: fix use-after-free when UDP GRO with shared fraglist") introduced a truesize increase for frag_list skbs. When uncloning an skb, it calls pskb_expand_head, and the truesize of frag_list skbs may increase. This can occur when an allocator uses __netdev_alloc_skb and does not jump into __alloc_skb. That flow does not use ksize(len) to calculate truesize, while pskb_expand_head does:

	skb_segment_list
		err = skb_unclone(nskb, GFP_ATOMIC);
			pskb_expand_head
				if (!skb->sk || skb->destructor == sock_edemux)
					skb->truesize += size - osize;

If we use the increased truesize as delta_truesize, it will be larger than before, and even larger than the previous total truesize value if the skbs in the frag_list are abundant. The main skb truesize then becomes smaller, possibly a negative value or a huge value for an unsigned int parameter. Then the following memory check will drop this abnormal skb.

To avoid this error we should use the original truesize to segment the main skb.

Fixes: 53475c5d ("net: fix use-after-free when UDP GRO with shared fraglist")
Signed-off-by: lena wang <lena.wang@mediatek.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/1646133431-8948-1-git-send-email-lena.wang@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
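In other words, the frag skb's truesize has to be accounted before skb_unclone() can inflate it. A rough sketch of the intended loop shape, paraphrased from the description above and not the exact diff:

	while (list_skb) {
		nskb = list_skb;
		list_skb = list_skb->next;

		/* account the original truesize before skb_unclone() may
		 * grow it via pskb_expand_head() */
		delta_truesize += nskb->truesize;

		err = skb_unclone(nskb, GFP_ATOMIC);
		if (err)
			goto err_linearize;
		/* ... */
	}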
-
Submitted by Eric Dumazet
stable inclusion from stable-v5.10.103
commit c5722243d0e5428f3f62682fb38f03a1698578ba
bugzilla: https://gitee.com/openeuler/kernel/issues/I56NE7
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c5722243d0e5428f3f62682fb38f03a1698578ba

--------------------------------

commit ef527f96 upstream.

Whenever one of these functions pulls all data from an skb in a frag_list, use consume_skb() instead of kfree_skb() to avoid polluting drop monitoring.

Fixes: 6fa01ccd ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220220154052.1308469-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
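The distinction matters only for drop monitoring; a generic illustration of the pattern, not the patch itself:

	/* Generic illustration: a fully pulled skb is "consumed", not dropped. */
	if (pulled_all_data)
		consume_skb(skb);	/* invisible to drop monitors such as dropwatch */
	else
		kfree_skb(skb);		/* accounted as a packet drop */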
-
- 27 January 2022, 1 commit

Submitted by Eric Dumazet
mainline inclusion from mainline-v5.16-rc1
commit c4777efa
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RYXI
CVE: NA

--------------------------------

While commit 097b9146 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()") fixed immediate issues found when KFENCE was enabled/tested, there are still similar issues, when tcp_trim_head() hits KFENCE while the master skb is cloned.

This happens under heavy networking TX workloads, when the TX completion might be delayed after incoming ACK.

This patch fixes the WARNING in sk_stream_kill_queues when sk->sk_mem_queued/sk->sk_forward_alloc are not zero.

Fixes: d3fb45f3 ("mm, kfence: insert KFENCE hooks for SLAB")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Marco Elver <elver@google.com>
Link: https://lore.kernel.org/r/20211102004555.1359210-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
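Upstream resolves this with a helper that unclones while preserving truesize; roughly the following shape (hedged sketch, see the referenced mainline commit for the exact form):

	static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri)
	{
		if (skb_cloned(skb)) {
			unsigned int save = skb->truesize;
			int res;

			res = pskb_expand_head(skb, 0, 0, pri);
			/* pskb_expand_head() may change truesize when the new
			 * head comes from a different allocator bucket (e.g.
			 * KFENCE); callers like tcp_trim_head() must not
			 * observe that. */
			skb->truesize = save;
			return res;
		}
		return 0;
	}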
-
- 14 January 2022, 1 commit

Submitted by Gal Pressman
stable inclusion from stable-v5.10.88
commit 337bb7bf7c31e7a4a883054775db169e20e3723b
bugzilla: 186058 https://gitee.com/openeuler/kernel/issues/I4QW6A
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=337bb7bf7c31e7a4a883054775db169e20e3723b

--------------------------------

[ Upstream commit 8a03ef67 ]

When printing netdev features, %pNF already takes care of the 0x prefix; remove the explicit one.

Fixes: 6413139d ("skbuff: increase verbosity when dumping skb data")
Signed-off-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
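Illustration of the doubled prefix (format strings shortened and hypothetical):

	/* before: the 0x is printed twice, e.g. "feat=0x0x..." */
	printk("... feat=0x%pNF ...", &dev->features);
	/* after: %pNF already emits the 0x prefix */
	printk("... feat=%pNF ...", &dev->features);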
-
- 11 November 2021, 4 commits

Submitted by Ilias Apalodimas
mainline inclusion from mainline-v5.14-rc3
commit 2cc3aeb5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2cc3aeb5eccc

----------------------------------------------------------------------

As Alexander points out, when we are trying to recycle a cloned/expanded SKB we might trigger a race. The recycling code relies on the pp_recycle bit to trigger, which we carry over to cloned SKBs. If that cloned SKB gets expanded or if we get references to the frags, call skb_release_data() and overwrite skb->head, we are creating separate instances accessing the same page frags. Since skb_release_data() will first try to recycle the frags, there's a potential race between the original and cloned SKB, since both will have the pp_recycle bit set.

Fix this by explicitly marking those SKBs as not recyclable. The atomic_sub_return effectively limits us to a single release case, and when we are calling skb_release_data we are also releasing the option to perform the recycling, or releasing the pages from the page pool.

Fixes: 6a5bcd84 ("page_pool: Allow drivers to hint on SKB recycling")
Reported-by: Alexander Duyck <alexanderduyck@fb.com>
Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
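Conceptually, the fix makes only the final release of the shared data eligible for recycling; a hedged paraphrase of the effect in skb_release_data(), not the literal diff:

	/* Illustrative paraphrase only; dataref_still_shared() is a stand-in
	 * for the real shinfo->dataref accounting.  If other clones still
	 * hold a reference, clear pp_recycle so this skb never races to
	 * recycle frags it no longer exclusively owns; only the skb that
	 * drops the last reference keeps the recycling path. */
	if (skb->cloned && dataref_still_shared(skb))
		skb->pp_recycle = 0;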
-
Submitted by Ilias Apalodimas
mainline inclusion from mainline-v5.14-rc1
commit 6a5bcd84
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6a5bcd84e886

----------------------------------------------------------------------

Up to now several high speed NICs have custom mechanisms of recycling the allocated memory they use for their payloads. Our page_pool API already has recycling capabilities that are always used when we are running in 'XDP mode'. So let's tweak the API and the kernel network stack slightly and allow the recycling to happen even during the standard operation. The API doesn't take into account 'split page' policies used by those drivers currently, but can be extended once we have users for that.

The idea is to be able to intercept the packet on skb_release_data(). If it's a buffer coming from our page_pool API, recycle it back to the pool for further usage or just release the packet entirely.

To achieve that we introduce a bit in struct sk_buff (pp_recycle:1) and a field in struct page (page->pp) to store the page_pool pointer. Storing the information in page->pp allows us to recycle both SKBs and their fragments. We could have skipped the skb bit entirely, since identical information can be derived from struct page. However, in an effort to affect the free path as little as possible, reading a single bit in the skb which is already in cache is better than trying to derive identical information from the page stored data. The driver or page_pool has to take care of the sync operations on its own during the buffer recycling, since the buffer is, after opting in to the recycling, never unmapped.

Since the gain on the drivers depends on the architecture, we are not enabling recycling by default if the page_pool API is used on a driver. In order to enable recycling the driver must call skb_mark_for_recycle() to store the information we need for recycling in page->pp and enable the recycling bit, or page_pool_store_mem_info() for a fragment.

Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
(fix conflicts: replace skb_get_kcov_handle by skb_csum_is_sctp)
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Matteo Croce
mainline inclusion from mainline-v5.14-rc1
commit c420c989
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c420c98982fa

----------------------------------------------------------------------

This is a prerequisite patch, the next one is enabling recycling of skbs and fragments. Add an extra argument on __skb_frag_unref() to handle recycling, and update the current users of the function with that.

Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Jonathan Lemon
mainline inclusion from mainline-v5.12-rc1-dontuse
commit 70c43167
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4CVS3
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=70c4316749f6

----------------------------------------------------------------------

RX zerocopy fragment pages which are not allocated from the system page pool require special handling. Give the callback in skb_zcopy_clear() a chance to process them first.

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Yongxin Li <liyongxin1@huawei.com>
Signed-off-by: Junxin Chen <chenjunxin1@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 15 October 2021, 3 commits

Submitted by Pravin B Shelar
stable inclusion from stable-5.10.57
commit e8b287e783811959ea7e5b6ac7841ff7e8180a55
bugzilla: 176179 https://gitee.com/openeuler/kernel/issues/I4DZ64
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8b287e783811959ea7e5b6ac7841ff7e8180a55

--------------------------------

[ Upstream commit a17ad096 ]

In some cases the skb head can be locked and the entire header data is pulled from the skb. When skb_zerocopy() is called in such cases, the following BUG is triggered. This patch fixes it by copying the entire skb in such cases. This could be optimized in case this becomes a performance bottleneck.

---8<---
kernel BUG at net/core/skbuff.c:2961!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G OE 5.4.0-77-generic #86-Ubuntu
Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:skb_zerocopy+0x37a/0x3a0
RSP: 0018:ffffbcc70013ca38 EFLAGS: 00010246
Call Trace:
 <IRQ>
 queue_userspace_packet+0x2af/0x5e0 [openvswitch]
 ovs_dp_upcall+0x3d/0x60 [openvswitch]
 ovs_dp_process_packet+0x125/0x150 [openvswitch]
 ovs_vport_receive+0x77/0xd0 [openvswitch]
 netdev_port_receive+0x87/0x130 [openvswitch]
 netdev_frame_hook+0x4b/0x60 [openvswitch]
 __netif_receive_skb_core+0x2b4/0xc90
 __netif_receive_skb_one_core+0x3f/0xa0
 __netif_receive_skb+0x18/0x60
 process_backlog+0xa9/0x160
 net_rx_action+0x142/0x390
 __do_softirq+0xe1/0x2d6
 irq_exit+0xae/0xb0
 do_IRQ+0x5a/0xf0
 common_interrupt+0xf/0xf

Code that triggered the BUG:

int skb_zerocopy(struct sk_buff *to, struct sk_buff *from, int len, int hlen)
{
	int i, j = 0;
	int plen = 0; /* length of skb->head fragment */
	int ret;
	struct page *page;
	unsigned int offset;

	BUG_ON(!from->head_frag && !hlen);

Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
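The workaround amounts to falling back to a full copy when the head has been completely pulled; a hedged sketch of the headlen computation, paraphrased rather than quoting the upstream diff:

	unsigned int hlen = 0;

	if (!from->head_frag ||
	    skb_headlen(from) < L1_CACHE_BYTES ||
	    skb_shinfo(from)->nr_frags >= MAX_SKB_FRAGS) {
		hlen = skb_headlen(from);
		/* whole header was pulled out of a locked head: copy the
		 * entire skb instead of zero-copying it */
		if (!hlen)
			hlen = from->len;
	}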
-
Submitted by Paul Blakey
stable inclusion from stable-5.10.54
commit 570341f10eccfe5c339363544a0339a8550944b7
bugzilla: 175586 https://gitee.com/openeuler/kernel/issues/I4DVDU
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=570341f10eccfe5c339363544a0339a8550944b7

--------------------------------

commit 8550ff8d upstream.

When multiple SKBs are merged to a new skb under napi GRO, or an SKB is re-used by napi, if nfct was set for them in the driver, it will not be released while freeing their stolen head state or on re-use.

Release nfct on napi's stolen or re-used SKBs, and in gro_list_prepare, check conntrack metadata diff.

Fixes: 5c6b9460 ("net/mlx5e: CT: Handle misses after executing CT action")
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Aleksandr Nogikh
stable inclusion from stable-5.10.54
commit 4a31baf55f6af1cae76db61e33df345d3fbee900
bugzilla: 175586 https://gitee.com/openeuler/kernel/issues/I4DVDU
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4a31baf55f6af1cae76db61e33df345d3fbee900

--------------------------------

[ Upstream commit 6370cc3b ]

Remote KCOV coverage collection enables coverage-guided fuzzing of the code that is not reachable during normal system call execution. It is especially helpful for fuzzing networking subsystems, where it is common to perform packet handling in separate work queues even for the packets that originated directly from the user space.

Enable coverage-guided frame injection by adding kcov remote handle to skb extensions. Default initialization in __alloc_skb and __build_skb_around ensures that no socket buffer that was generated during a system call will be missed.

Code that is of interest and that performs packet processing should be annotated with kcov_remote_start()/kcov_remote_stop().

An alternative approach is to determine kcov_handle solely on the basis of the device/interface that received the specific socket buffer. However, in this case it would be impossible to distinguish between packets that originated during normal background network processes or were intentionally injected from the user space.

Signed-off-by: Aleksandr Nogikh <nogikh@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 09 April 2021, 1 commit

Submitted by Marco Elver
stable inclusion from stable-5.10.21
commit 97ff09a7ed484fef2b1bbc103857444b7332fca8
bugzilla: 50609

--------------------------------

commit 097b9146 upstream.

Avoid the assumption that ksize(kmalloc(S)) == ksize(kmalloc(S)): when cloning an skb, save and restore truesize after pskb_expand_head(). This can occur if the allocator decides to service an allocation of the same size differently (e.g. use a different size class, or pass the allocation on to KFENCE).

Because truesize is used for bookkeeping (such as sk_wmem_queued), a modified truesize of a cloned skb may result in corrupt bookkeeping and relevant warnings (such as in sk_stream_kill_queues()).

Link: https://lkml.kernel.org/r/X9JR/J6dMMOy1obu@elver.google.com
Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20210201160420.2826895-1-elver@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
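The change is essentially a save/restore of truesize around pskb_expand_head(); a sketch of skb_prepare_for_shift() after the fix (close to, but not guaranteed to be, the exact upstream code):

	static int skb_prepare_for_shift(struct sk_buff *skb)
	{
		int ret = 0;

		if (skb_cloned(skb)) {
			/* Save and restore truesize: pskb_expand_head() may
			 * reallocate memory where
			 * ksize(kmalloc(S)) != ksize(kmalloc(S)), but we
			 * cannot change truesize at this point. */
			unsigned int save_truesize = skb->truesize;

			ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
			skb->truesize = save_truesize;
		}
		return ret;
	}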
-
- 08 February 2021, 1 commit

Submitted by Alexander Lobakin
stable inclusion from stable-5.10.11
commit b65578cec113b357228997d872d73d51e698f2d3
bugzilla: 47621

--------------------------------

commit 66c55602 upstream.

Commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") ensured that skbs with data size lower than 1025 bytes will be kmalloc'ed to avoid excessive page cache fragmentation and memory consumption. However, the fix addressed only __napi_alloc_skb() (primarily for virtio_net and napi_get_frags()), but the issue can still be triggered through __netdev_alloc_skb(), which is still used by several drivers.

Drivers often allocate a tiny skb for headers and place the rest of the frame to frags (so-called copybreak). Mirror the condition to __netdev_alloc_skb() to handle this case too.

Since v1 [0]:
 - fix "Fixes:" tag;
 - refine commit message (mention copybreak usecase).

[0] https://lore.kernel.org/netdev/20210114235423.232737-1-alobakin@pm.me

Fixes: a1c7fff7 ("net: netdev_alloc_skb() use build_skb()")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210115150354.85967-1-alobakin@pm.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 29 January 2021, 2 commits

Submitted by Eric Dumazet
stable inclusion from stable-5.10.10
commit 024158d3b5715e830bf4b51c4452132937c8f1e8
bugzilla: 47610

--------------------------------

[ Upstream commit 3226b158 ]

Both virtio net and napi_get_frags() allocate skbs with a very small skb->head.

While using page fragments instead of a kmalloc backed skb->head might give a small performance improvement in some cases, there is a huge risk of under-estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations per page (order-3 page in x86), or even 64 on PowerPC.

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Even if we force napi_alloc_skb() to only use order-0 pages, the issue would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc backed, so that other objects in the slab page can be reused instead of being held as long as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache, instead of going through a more expensive __alloc_skb().

Another idea would be to use separate page sizes depending on the allocated length (to never have more than 4 frags per page).

I would like to thank Greg Thelen for his precious help on this matter; analysing crash dumps is always a time consuming task.

Fixes: fd11a83d ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
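The policy boils down to one allocation-size check in the napi/netdev skb allocators; a hedged sketch of the condition (illustrative, not the exact hunk):

	/* Small (or DMA/reclaim-constrained) requests get a kmalloc'ed head
	 * instead of a page fragment, so slab objects can be reused while
	 * the skb sits in a socket queue. */
	if (len <= SKB_WITH_OVERHEAD(1024) ||
	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
		if (!skb)
			goto skb_fail;
		goto skb_success;
	}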
-
Submitted by Dongseok Yi
stable inclusion from stable-5.10.10
commit 24cd3317418955fd9489e4e7e9c2b643691b270b
bugzilla: 47610

--------------------------------

[ Upstream commit 53475c5d ]

skbs in the fraglist could be shared by a BPF filter loaded at TC. If TC writes, it will call skb_ensure_writable -> pskb_expand_head to create a private linear section for the head_skb, and then call skb_clone_fraglist -> skb_get on each skb in the fraglist.

skb_segment_list overwrites part of the skb linear section of each fragment itself. Even after skb_clone, the frag_skbs share their linear section with their clone in PF_PACKET.

Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have a link for the same frag_skbs chain. If a new skb (not frags) is queued to one of the sk_receive_queue, multiple ptypes can see and release this. It causes use-after-free.

[ 4443.426215] ------------[ cut here ]------------
[ 4443.426222] refcount_t: underflow; use-after-free.
[ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190 refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
[ 4443.426808] Call trace:
[ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426823]  skb_release_data+0x144/0x264
[ 4443.426828]  kfree_skb+0x58/0xc4
[ 4443.426832]  skb_queue_purge+0x64/0x9c
[ 4443.426844]  packet_set_ring+0x5f0/0x820
[ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
[ 4443.426853]  __sys_setsockopt+0x188/0x278
[ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
[ 4443.426869]  el0_svc_common+0xf0/0x1d0
[ 4443.426873]  el0_svc_handler+0x74/0x98
[ 4443.426880]  el0_svc+0x8/0xc

Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 27 January 2021, 1 commit

Submitted by Vasily Averin
stable inclusion from stable-5.10.8
commit 43f6ea41408b651180e49bef2f7a2f5f5d40a9a4
bugzilla: 47450

--------------------------------

commit 54970a2f upstream.

syzbot reproduces a BUG_ON in skb_checksum_help(): tun creates a (bogus) skb with a huge partial-checksummed area and a small ip packet inside. Then ip_rcv trims the skb based on the size of the internal ip packet, and after that the csum offset points beyond the trimmed skb. Then checksum_tg(), called via a netfilter hook, triggers the BUG_ON:

	offset = skb_checksum_start_offset(skb);
	BUG_ON(offset >= skb_headlen(skb));

To work around the problem this patch forces pskb_trim_rcsum_slow() to return -EINVAL in the described scenario. It allows its callers to drop such kind of packets.

Link: https://syzkaller.appspot.com/bug?id=b419a5ca95062664fe1a60b764621eb4526e2cd0
Reported-by: syzbot+7010af67ced6105e5ab6@syzkaller.appspotmail.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/1b2494af-2c56-8ee2-7bc0-923fcad1cdf8@virtuozzo.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
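The workaround can be pictured as an extra validity check while trimming a CHECKSUM_PARTIAL skb; a hedged paraphrase using skbuff field names, not the literal diff:

	/* Paraphrase: refuse the trim if the checksum field would end up
	 * beyond what remains of the linear header after trimming. */
	if (skb->ip_summed == CHECKSUM_PARTIAL) {
		int offset = skb_checksum_start_offset(skb) + skb->csum_offset;
		int hdlen = (len > skb_headlen(skb)) ? skb_headlen(skb) : len;

		if (offset + sizeof(__sum16) > hdlen)
			return -EINVAL;
	}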
-
- 04 December 2020, 1 commit

Submitted by Davide Caratti
skb_mpls_dec_ttl() reads the LSE without ensuring that it is contained in the skb "linear" area. Fix this by calling pskb_may_pull() before reading the current ttl.

Found by code inspection.

Fixes: 2a2ea508 ("net: sched: add mpls manipulation actions to TC")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/53659f28be8bc336c113b5254dc637cc76bbae91.1606987074.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
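The shape of the fix: make sure the MPLS label stack entry is in the linear area before dereferencing it. A hedged sketch of the start of skb_mpls_dec_ttl(), paraphrased:

	u32 lse;

	if (unlikely(!eth_p_mpls(skb->protocol)))
		return -EINVAL;

	/* the LSE may still sit in a page fragment: pull it into the
	 * linear area before mpls_hdr() dereferences it */
	if (!pskb_may_pull(skb, skb_network_offset(skb) + MPLS_HLEN))
		return -ENOMEM;

	lse = be32_to_cpu(mpls_hdr(skb)->label_stack_entry);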
-
- 28 November 2020, 1 commit

Submitted by Willem de Bruijn
When setting sk_err, set it to ee_errno, not ee_origin.

Commit f5f99309 ("sock: do not set sk_err in sock_dequeue_err_skb") disabled updating sk_err on errq dequeue, which is correct for most error types (origins):

	-		sk->sk_err = err;

Commit 38b25793 ("sock: reset sk_err when the error queue is empty") reenabled the behavior for ICMP origins, which do require it:

	+		if (icmp_next)
	+			sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin;

But read from ee_errno.

Fixes: 38b25793 ("sock: reset sk_err when the error queue is empty")
Reported-by: Ayush Ranjan <ayushranjan@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Link: https://lore.kernel.org/r/20201126151220.2819322-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
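So the one-line correction, paraphrased:

	/* before: reports the origin constant instead of the error */
	sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_origin;
	/* after: report the actual error code */
	sk->sk_err = SKB_EXT_ERR(skb_next)->ee.ee_errno;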
-
- 06 October 2020, 1 commit

Submitted by Vladimir Oltean
Currently skb_dump has a restriction to only dump the full packet for the first 5 socket buffers; after that only headers will be printed. Remove this arbitrary and confusing restriction, which is only documented vaguely ("up to") in the comments above the prototype.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 05 October 2020, 1 commit

Submitted by Guillaume Nault
Openvswitch allows dropping a packet's Ethernet header, therefore skb_mpls_push() and skb_mpls_pop() might be called with ethernet=true and mac_len=0. In that case the pointer passed to skb_mod_eth_type() doesn't point to an Ethernet header, and the new Ethertype is written at an unexpected location.

Fix this by verifying that mac_len is big enough to contain an Ethernet header.

Fixes: fa4e0f88 ("net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions")
Signed-off-by: Guillaume Nault <gnault@redhat.com>
Acked-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 October 2020, 1 commit

Submitted by Guillaume Nault
Implement TCA_VLAN_ACT_POP_ETH and TCA_VLAN_ACT_PUSH_ETH, to respectively pop and push a base Ethernet header at the beginning of a frame.

POP_ETH is just a matter of pulling ETH_HLEN bytes. VLAN tags, if any, must be stripped before calling POP_ETH.

PUSH_ETH is restricted to skbs with no mac_header, and only the MAC addresses can be configured. The Ethertype is automatically set from skb->protocol. These restrictions ensure that all skb's fields remain consistent, so that this action can't confuse other parts of the networking stack (like GSO).

Since openvswitch already had these actions, consolidate the code in skbuff.c (like for vlan and mpls push/pop).

Signed-off-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 September 2020, 1 commit

Submitted by Yunsheng Lin
When budget is non-zero, skb_unref() has already handled the NULL checking. When budget is zero, dev_consume_skb_any() handles the NULL checking in __dev_kfree_skb_irq(), or in dev_kfree_skb(), which also ultimately calls skb_unref(). So remove the unnecessary checking in napi_consume_skb().

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 25 August 2020, 1 commit

Submitted by Herbert Xu
The function consume_skb is only meaningful when tracing is enabled. This patch makes it conditional on CONFIG_TRACEPOINTS.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 August 2020, 2 commits

Submitted by Al Viro
It's always 0. Note that we theoretically could use ~0U as well - result will be the same modulo 0xffff, _if_ the damn thing did the right thing for any value of initial sum; later we'll make use of that when convenient.

However, unlike csum_and_copy_..._user(), there are instances that did not work for arbitrary initial sums; c6x is one such.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

-
Submitted by Al Viro
It's always 0.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 19 August 2020, 2 commits

Submitted by Miaohe Lin
pskb_carve_frag_list() may return -ENOMEM in pskb_carve_inside_nonlinear(). We should handle this correctly, or we would get a wrong sk_buff.

Fixes: 6fa01ccd ("skbuff: Add pskb_extract() helper function")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Miaohe Lin
The frags of the skb_shared_info of the data are assigned in the following loop. It is meaningless to do a memcpy of frags here.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 August 2020, 1 commit

Submitted by Miaohe Lin
We may access the two bytes after vlan_hdr in vlan_set_encap_proto(). So we should pull VLAN_HLEN + sizeof(unsigned short) in skb_vlan_untag() or we may access the wrong data.

Fixes: 0d5501c1 ("net: Always untag vlan-tagged traffic on input.")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
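The resulting pull in skb_vlan_untag() covers the VLAN header plus the 2-byte encapsulated protocol field that vlan_set_encap_proto() reads right after it; sketched (assumed shape, not the literal hunk):

	if (unlikely(!pskb_may_pull(skb, VLAN_HLEN + sizeof(unsigned short))))
		goto err_free;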
-
- 09 August 2020, 1 commit

Submitted by Miaohe Lin
Use the helper function ip_is_fragment() to check whether an ip packet is a fragment.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 August 2020, 2 commits

Submitted by Miaohe Lin
When we don't care about the vlan depth, we can pass NULL instead of the address of an unused local variable to skb_network_protocol() as a param.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Submitted by Miaohe Lin
In fact, skb_pagelen() - skb_headlen() is equal to __skb_pagelen(), so use it directly to avoid an unnecessary skb_headlen() call.

Also fix the CHECK note of checkpatch.pl:
	Comparison to NULL could be written "!__pskb_pull_tail"

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 August 2020, 1 commit

Submitted by Yousuk Seung
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS, which reports the earliest departure time (EDT) of the timestamped skb. By tracking EDT values of the skb from different timestamps, we can observe when and how much the value changed. This allows measuring the precise delay injected on the sender host, e.g. by a bpf-based throttler.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 May 2020, 1 commit

Submitted by Eric Dumazet
skb_gro_receive() used to be used by SCTP; it is no longer the case. skb_gro_receive_list() is in the same category: never used from modules.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 May 2020, 1 commit

Submitted by Florian Westphal
mptcp calls this from the transmit side, from process context. Allow a sleeping allocation instead of unconditional GFP_ATOMIC.

Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 May 2020, 1 commit

Submitted by Jesper Dangaard Brouer
In skb_panic() the real pointer values are really needed to diagnose issues, e.g. data and head are related (to calculate headroom). The hashed versions of the addresses don't make much sense here.

The patch uses the printk specifier %px to print the actual address. The printk documentation on %px: https://www.kernel.org/doc/html/latest/core-api/printk-formats.html#unmodified-addresses

Fixes: ad67b74d ("printk: hash addresses printed with %p")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 March 2020, 1 commit

Submitted by Florian Westphal
Xin Long says:

  On the udp rx path udp_rcv_segment() may do a segment where the frag skbs will get the header copied from the head skb in skb_segment_list() by calling __copy_skb_header(), which could overwrite the frag skbs' extensions by __skb_ext_copy() and cause a leak.

  This issue was found after loading esp_offload, where a sec path ext is set in the skb.

Fix this by discarding the head state of the fraglist skb before replacing its contents.

Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Reported-by: Xiumei Mu <xmu@redhat.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
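"Discarding head state" maps to releasing the frag skb's existing header state (including its extensions) right before it is overwritten; a hedged sketch of the relevant spot in skb_segment_list(), paraphrased:

	/* drop whatever head state (including extensions such as the sec
	 * path) nskb already carries, so __copy_skb_header() below does not
	 * leak it */
	skb_release_head_state(nskb);
	__copy_skb_header(nskb, skb);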
-
- 24 March 2020, 1 commit

Submitted by Yadu Kishore
Problem:
TCP checksum in the output path is not being offloaded during GSO in the following case: the network driver does not support scatter-gather but supports checksum offload with NETIF_F_HW_CSUM.

Cause:
skb_segment calls skb_copy_and_csum_bits if the network driver does not announce NETIF_F_SG. It does not check if the driver supports NETIF_F_HW_CSUM. So for devices which might want to offload checksum but do not support SG there is currently no way to do so if GSO is enabled.

Solution:
In skb_segment, check if the network controller does checksum and, if so, call skb_copy_bits instead of skb_copy_and_csum_bits.

Testing:
Without the patch, ran iperf TCP traffic with NETIF_F_HW_CSUM enabled in the network driver. Observed that TCP checksum offload is not happening since the skbs received by the driver in the output path have skb->ip_summed set to CHECKSUM_NONE.

With the patch, ran iperf TCP traffic and observed that TCP checksum is being offloaded with skb->ip_summed set to CHECKSUM_PARTIAL. Also tested with the patch by disabling NETIF_F_HW_CSUM in the driver to cover the newly introduced if-else code path in skb_segment.

Link: https://lore.kernel.org/netdev/CA+FuTSeYGYr3Umij+Mezk9CUcaxYwqEe5sPSuXF8jPE2yMFJAw@mail.gmail.com
Signed-off-by: Yadu Kishore <kyk.segfault@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-