- 19 8月, 2022 3 次提交
-
-
由 Cong Wang 提交于
tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it assumes the skb is not yet dequeued. This is no longer true for tcp_read_skb() case where we dequeue the skb first. Fix this by introducing a helper __tcp_cleanup_rbuf() which does not require any skb and calling it in tcp_read_skb(). Fixes: 04919bed ("tcp: Introduce tcp_read_skb()") Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Cong Wang 提交于
Before commit 965b57b4 ("net: Introduce a new proto_ops ->read_skb()"), skb was not dequeued from receive queue hence when we close TCP socket skb can be just flushed synchronously. After this commit, we have to uncharge skb immediately after being dequeued, otherwise it is still charged in the original sock. And we still need to retain skb->sk, as eBPF programs may extract sock information from skb->sk. Therefore, we have to call skb_set_owner_sk_safe() here. Fixes: 965b57b4 ("net: Introduce a new proto_ops ->read_skb()") Reported-and-tested-by: syzbot+a0e6f8738b58f7654417@syzkaller.appspotmail.com Tested-by: NStanislav Fomichev <sdf@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
If construction of the array of policies fails when recording non-first policy we need to unwind. netlink_policy_dump_add_policy() itself also needs fixing as it currently gives up on error without recording the allocated pointer in the pstate pointer. Reported-by: syzbot+dc54d9ba8153b216cae0@syzkaller.appspotmail.com Fixes: 50a896cf ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/20220816161939.577583-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 18 8月, 2022 1 次提交
-
-
由 Vladimir Oltean 提交于
ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: fd364541 ("net: dsa: change scope of STP state setter") Reported-by: NSergei Antonov <saproj@gmail.com> Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: NSergei Antonov <saproj@gmail.com> Link: https://lore.kernel.org/r/20220816201445.1809483-1-vladimir.oltean@nxp.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 8月, 2022 3 次提交
-
-
由 Jakub Kicinski 提交于
Even though the normal strparser's init function has a return value we got away with ignoring errors until now, as it only validates the parameters and we were passing correct parameters. tls_strp can fail to init on memory allocation errors, which syzbot duly induced and reported. Reported-by: syzbot+abd45eb849b05194b1b6@syzkaller.appspotmail.com Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser") Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geert Uytterhoeven 提交于
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca ("netfilter: provide config option to disable ancient procfs parts") in v3.3. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NFlorian Westphal <fw@strlen.de>
-
由 Zhengchao Shao 提交于
In the gnet_stats_add_queue_cpu function, the qstats->qlen statistics are incorrectly set to qcpu->backlog. Fixes: 448e163f ("gen_stats: Add gnet_stats_add_queue()") Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20220815030848.276746-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 16 8月, 2022 2 次提交
-
-
由 Zhengchao Shao 提交于
When bulk delete command is received in the rtnetlink_rcv_msg function, if bulk delete is not supported, module_put is not called to release the reference counting. As a result, module reference count is leaked. Fixes: a6cec0bc ("net: rtnetlink: add bulk delete support flag") Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com> Acked-by: NNikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20220815024629.240367-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Pablo Neira Ayuso 提交于
Since f3a2181e ("netfilter: nf_tables: Support for sets with multiple ranged fields"), it possible to combine intervals and concatenations. Later on, ef516e86 ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag for userspace to report that the set stores a concatenation. Make sure NFT_SET_CONCAT is set on if field_count is specified for consistency. Otherwise, if NFT_SET_CONCAT is specified with no field_count, bail out with EINVAL. Fixes: ef516e86 ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 15 8月, 2022 7 次提交
-
-
由 Pablo Neira Ayuso 提交于
These flags are mutually exclusive, report EINVAL in this case. Fixes: aaa31047 ("netfilter: nftables: add catch-all set element support") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise, NFTA_SET_ELEM_KEY_END should not be present. For catch-all element, NFTA_SET_ELEM_KEY_END should not be present. The NFT_SET_ELEM_INTERVAL_END is never used with this set flags combination. Fixes: 7b225d0b ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jamal Hadi Salim 提交于
Follows up on: https://lore.kernel.org/all/20220809170518.164662-1-cascardo@canonical.com/ handle of 0 implies from/to of universe realm which is not very sensible. Lets see what this patch will do: $sudo tc qdisc add dev $DEV root handle 1:0 prio //lets manufacture a way to insert handle of 0 $sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 \ route to 0 from 0 classid 1:10 action ok //gets rejected... Error: handle of 0 is not valid. We have an error talking to the kernel, -1 //lets create a legit entry.. sudo tc filter add dev $DEV parent 1:0 protocol ip prio 100 route from 10 \ classid 1:10 action ok //what did the kernel insert? $sudo tc filter ls dev $DEV parent 1:0 filter protocol ip pref 100 route chain 0 filter protocol ip pref 100 route chain 0 fh 0x000a8000 flowid 1:10 from 10 action order 1: gact action pass random type none pass val 0 index 1 ref 1 bind 1 //Lets try to replace that legit entry with a handle of 0 $ sudo tc filter replace dev $DEV parent 1:0 protocol ip prio 100 \ handle 0x000a8000 route to 0 from 0 classid 1:10 action drop Error: Replacing with handle of 0 is invalid. We have an error talking to the kernel, -1 And last, lets run Cascardo's POC: $ ./poc 0 0 -22 -22 -22 Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com> Acked-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Xiong 提交于
The issue happens on specific paths in the function. After both the object `rt` and `neigh` are grabbed successfully, when `lifetime` is nonzero but the metric needs change, the function just deletes the route and set `rt` to NULL. Then, it may try grabbing `rt` and `neigh` again if above conditions hold. The function simply overwrite `neigh` if succeeds or returns if fails, without decreasing the reference count of previous `neigh`. This may result in memory leaks. Fix it by decrementing the reference count of `neigh` in place. Fixes: 6b2e04bc ("net: allow user to set metric on default route learned via Router Advertisement") Signed-off-by: NXin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: NXin Tan <tanxin.ctf@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Mikhalitsyn 提交于
Right now we have a neigh_param PROXY_QLEN which specifies maximum length of neigh_table->proxy_queue. But in fact, this limitation doesn't work well because check condition looks like: tbl->proxy_queue.qlen > NEIGH_VAR(p, PROXY_QLEN) The problem is that p (struct neigh_parms) is a per-device thing, but tbl (struct neigh_table) is a system-wide global thing. It seems reasonable to make proxy_queue limit per-device based. v2: - nothing changed in this patch v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Suggested-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Reviewed-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Denis V. Lunev 提交于
Normal processing of ARP request (usually this is Ethernet broadcast packet) coming to the host is looking like the following: * the packet comes to arp_process() call and is passed through routing procedure * the request is put into the queue using pneigh_enqueue() if corresponding ARP record is not local (common case for container records on the host) * the request is processed by timer (within 80 jiffies by default) and ARP reply is sent from the same arp_process() using NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside pneigh_enqueue()) And here the problem comes. Linux kernel calls pneigh_queue_purge() which destroys the whole queue of ARP requests on ANY network interface start/stop event through __neigh_ifdown(). This is actually not a problem within the original world as network interface start/stop was accessible to the host 'root' only, which could do more destructive things. But the world is changed and there are Linux containers available. Here container 'root' has an access to this API and could be considered as untrusted user in the hosting (container's) world. Thus there is an attack vector to other containers on node when container's root will endlessly start/stop interfaces. We have observed similar situation on a real production node when docker container was doing such activity and thus other containers on the node become not accessible. The patch proposed doing very simple thing. It drops only packets from the same namespace in the pneigh_queue_purge() where network interface state change is detected. This is enough to prevent the problem for the whole node preserving original semantics of the code. v2: - do del_timer_sync() if queue is empty after pneigh_queue_purge() v3: - rebase to net tree Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@kernel.org> Cc: Yajun Deng <yajun.deng@linux.dev> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Christian Brauner <brauner@kernel.org> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Cc: Konstantin Khorenko <khorenko@virtuozzo.com> Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: NAlexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com> Signed-off-by: NDenis V. Lunev <den@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maxim Kochetkov 提交于
MHI channel may generates event/interrupt right after enabling. It may leads to 2 race conditions issues. 1) Such event may be dropped by qcom_mhi_qrtr_dl_callback() at check: if (!qdev || mhi_res->transaction_status) return; Because dev_set_drvdata(&mhi_dev->dev, qdev) may be not performed at this moment. In this situation qrtr-ns will be unable to enumerate services in device. --------------------------------------------------------------- 2) Such event may come at the moment after dev_set_drvdata() and before qrtr_endpoint_register(). In this case kernel will panic with accessing wrong pointer at qcom_mhi_qrtr_dl_callback(): rc = qrtr_endpoint_post(&qdev->ep, mhi_res->buf_addr, mhi_res->bytes_xferd); Because endpoint is not created yet. -------------------------------------------------------------- So move mhi_prepare_for_transfer_autoqueue after endpoint creation to fix it. Fixes: a2e2cc0d ("net: qrtr: Start MHI channels during init") Signed-off-by: NMaxim Kochetkov <fido_max@inbox.ru> Reviewed-by: NHemant Kumar <quic_hemantk@quicinc.com> Reviewed-by: NManivannan Sadhasivam <mani@kernel.org> Reviewed-by: NLoic Poulain <loic.poulain@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 8月, 2022 1 次提交
-
-
由 Hongbin Wang 提交于
Functions ip6_tnl_change, ip6_tnl_update and ip6_tnl0_update do always return 0, change the type of functions to void. Signed-off-by: NHongbin Wang <wh_bin@126.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2022 5 次提交
-
-
由 Pablo Neira Ayuso 提交于
If the NFTA_SET_ELEM_OBJREF netlink attribute is present and NFT_SET_OBJECT flag is set on, report EINVAL. Move existing sanity check earlier to validate that NFT_SET_OBJECT requires NFTA_SET_ELEM_OBJREF. Fixes: 8aeff920 ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Xin Xiong 提交于
The issue happens on some error handling paths. When the function fails to grab the object `xprt`, it simply returns 0, forgetting to decrease the reference count of another object `xps`, which is increased by rpc_sysfs_xprt_kobj_get_xprt_switch(), causing refcount leaks. Also, the function forgets to check whether `xps` is valid before using it, which may result in NULL-dereferencing issues. Fix it by adding proper error handling code when either `xprt` or `xps` is NULL. Fixes: 5b7eb784 ("SUNRPC: take a xprt offline using sysfs") Signed-off-by: NXin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: NXin Tan <tanxin.ctf@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Mikulas Patocka 提交于
The functions clear_bit and set_bit do not imply a memory barrier, thus it may be possible that the waitqueue_active function (which does not take any locks) is moved before clear_bit and it could miss a wakeup event. Fix this bug by adding a memory barrier after clear_bit. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pablo Neira Ayuso 提交于
While looping to build the bitmap of used anonymous set names, check the current set in the iteration, instead of the one that is being created. Fixes: 37a9cc52 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
To avoid allocation of the conntrack extension area when possible, the default behaviour was changed to only allocate the event extension if a userspace program is subscribed to a notification group. Problem is that while 'conntrack -E' does enable the event allocation behind the scenes, 'conntrack -E expect' does not: no expectation events are delivered unless user sets "net.netfilter.nf_conntrack_events" back to 1 (always on). Fix the autodetection to also consider EXP type group. We need to track the 6 event groups (3+3, new/update/destroy for events and for expectations each) independently, else we'd disable events again if an expectation group becomes empty while there is still an active event group. Fixes: 2794cdb0 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Reported-by: NYi Chen <yiche@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de>
-
- 11 8月, 2022 13 次提交
-
-
由 Florian Westphal 提交于
nf_tables_check_loops() can be called from rhashtable list walk so cond_resched() cannot be used here. Fixes: 81ea0106 ("netfilter: nf_tables: add rescheduling points during loop detection walks") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. In practice, IRC commands are not expected to exceed 512 bytes, plus this is interactive protocol, so we should not see large packets in practice. Given most IRC connections nowadays use TLS so this helper could also be removed in the near future. Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536") Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. Use skb_linearize. It should be possible to rewrite this to properly deal with segmented skbs (i.e., only do small chunk-wise accesses), but this is going to be a lot more intrusive than this because every helper function needs to get the sk_buff instead of a pointer to a raw data buffer. In practice, provided we're really looking at FTP control channel packets, there should never be a case where we deal with huge packets. Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536") Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
With BIG TCP, packets generated by tcp stack may exceed 64kb. Cap datalen at 64kb. The internal message format uses 16bit fields, so no embedded message can exceed 64k size. Multiple h323 messages in a single superpacket may now result in a message to get treated as incomplete/truncated, but thats better than scribbling past h323_buffer. Another alternative suitable for net tree would be a switch to skb_linearize(). Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536") Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
For historical reason this code performs pseudo linearization of skbs via skb_header_pointer and a global 64k buffer. With arrival of BIG TCP, packets generated by TCP stack can exceed 64kb. Rewrite this to only extract the needed header data. This also allows to get rid of the locking. Fixes: 7c4e983c ("net: allow gso_max_size to exceed 65536") Fixes: 0fe79f28 ("net: allow gro_max_size to exceed 65536") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Maxim Mikityanskiy 提交于
Currently, tls_device_down synchronizes with tls_device_resync_rx using RCU, however, the pointer to netdev is stored using WRITE_ONCE and loaded using READ_ONCE. Although such approach is technically correct (rcu_dereference is essentially a READ_ONCE, and rcu_assign_pointer uses WRITE_ONCE to store NULL), using special RCU helpers for pointers is more valid, as it includes additional checks and might change the implementation transparently to the callers. Mark the netdev pointer as __rcu and use the correct RCU helpers to access it. For non-concurrent access pass the right conditions that guarantee safe access (locks taken, refcount value). Also use the correct helper in mlx5e, where even READ_ONCE was missing. The transition to RCU exposes existing issues, fixed by this commit: 1. bond_tls_device_xmit could read netdev twice, and it could become NULL the second time, after the NULL check passed. 2. Drivers shouldn't stop processing the last packet if tls_device_down just set netdev to NULL, before tls_dev_del was called. This prevents a possible packet drop when transitioning to the fallback software mode. Fixes: 89df6a81 ("net/bonding: Implement TLS TX device offload") Fixes: c55dcdd4 ("net/tls: Fix use-after-free after the TLS device goes down and up") Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com> Link: https://lore.kernel.org/r/20220810081602.1435800-1-maximmi@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Another device offload bug, we use the length of the output skb as an indication of how much data to copy. But that skb is sized to offset + record length, and we start from offset. So we end up double-counting the offset which leads to skb_copy_bits() returning -EFAULT. Reported-by: NTariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser") Tested-by: NRan Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-2-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
We can't do skb_walk_frags() on the input skbs, because the input skbs is really just a pointer to the tcp read queue. We need to bound the "is decrypted" check by the amount of data in the message. Note that the walk in tls_device_reencrypt() is after a CoW so the skb there is safe to walk. Actually in the current implementation it can't have frags at all, but whatever, maybe one day it will. Reported-by: NTariq Toukan <tariqt@nvidia.com> Fixes: 84c61fe1 ("tls: rx: do not use the standard strparser") Tested-by: NRan Rozenstein <ranro@nvidia.com> Link: https://lore.kernel.org/r/20220809175544.354343-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
When a route filter is replaced and the old filter has a 0 handle, the old one won't be removed from the hashtable, while it will still be freed. The test was there since before commit 1109c005 ("net: sched: RCU cls_route"), when a new filter was not allocated when there was an old one. The old filter was reused and the reinserting would only be necessary if an old filter was replaced. That was still wrong for the same case where the old handle was 0. Remove the old filter from the list independently from its handle value. This fixes CVE-2022-2588, also reported as ZDI-CAN-17440. Reported-by: NZhenpeng Lin <zplin@u.northwestern.edu> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: NKamal Mostafa <kamal@canonical.com> Cc: <stable@vger.kernel.org> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Link: https://lore.kernel.org/r/20220809170518.164662-1-cascardo@canonical.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Hawkins Jiawei 提交于
Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda #0 <TASK> __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: NJakub Kicinski <kuba@kernel.org> Acked-by: NWen Gu <guwen@linux.alibaba.com> Signed-off-by: NHawkins Jiawei <yin31149@gmail.com> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Hou Tao 提交于
The value of sock local storage map is writable in map iterator, so check max_rdwr_access instead of max_rdonly_access. Fixes: 5ce6e77c ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: NHou Tao <houtao1@huawei.com> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-6-houtao@huaweicloud.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Hou Tao 提交于
sock_map_iter_attach_target() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in sock_map_iter_detach_target() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). Fixing it by acquiring an extra map uref in .init_seq_private and releasing it in .fini_seq_private. Fixes: 03653515 ("net: Allow iterating sockmap and sockhash") Signed-off-by: NHou Tao <houtao1@huawei.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-5-houtao@huaweicloud.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Hou Tao 提交于
bpf_iter_attach_map() acquires a map uref, and the uref may be released before or in the middle of iterating map elements. For example, the uref could be released in bpf_iter_detach_map() as part of bpf_link_release(), or could be released in bpf_map_put_with_uref() as part of bpf_map_release(). So acquiring an extra map uref in bpf_iter_init_sk_storage_map() and releasing it in bpf_iter_fini_sk_storage_map(). Fixes: 5ce6e77c ("bpf: Implement bpf iterator for sock local storage map") Signed-off-by: NHou Tao <houtao1@huawei.com> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/r/20220810080538.1845898-4-houtao@huaweicloud.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 10 8月, 2022 5 次提交
-
-
由 Pablo Neira Ayuso 提交于
dst->ops is set on when nft_expr_clone() fails, but module refcount has not been bumped yet, therefore nft_expr_destroy() leads to module reference underflow. Fixes: 8cfd9b0f ("netfilter: nftables: generalize set expressions support") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces the flag notation. Fixes: 7b225d0b ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
The generation ID is bumped from the commit path while holding the mutex, however, netlink dump operations rely on RCU. This patch also adds missing cb->base_eq initialization in nf_tables_dump_set(). Fixes: 38e029f1 ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Ido Schimmel 提交于
After a failed devlink reload, devlink parameters are still registered, which means user space can set and get their values. In the case of the mlxsw "acl_region_rehash_interval" parameter, these operations will trigger a use-after-free [1]. Fix this by rejecting set and get operations while in the failed state. Return the "-EOPNOTSUPP" error code which does not abort the parameters dump, but instead causes it to skip over the problematic parameter. Another possible fix is to perform these checks in the mlxsw parameter callbacks, but other drivers might be affected by the same problem and I am not aware of scenarios where these stricter checks will cause a regression. [1] mlxsw_spectrum3 0000:00:10.0: Port 125: Failed to register netdev mlxsw_spectrum3 0000:00:10.0: Failed to create ports ================================================================== BUG: KASAN: use-after-free in mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 Read of size 4 at addr ffff8880099dcfd8 by task kworker/u4:4/777 CPU: 1 PID: 777 Comm: kworker/u4:4 Not tainted 5.19.0-rc7-custom-126601-gfe26f28c586d #1 Hardware name: QEMU MSN4700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: netns cleanup_net Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x92/0xbd lib/dump_stack.c:106 print_address_description mm/kasan/report.c:313 [inline] print_report.cold+0x5e/0x5cf mm/kasan/report.c:429 kasan_report+0xb9/0xf0 mm/kasan/report.c:491 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report_generic.c:306 mlxsw_sp_acl_tcam_vregion_rehash_intrvl_get+0xbd/0xd0 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c:904 mlxsw_sp_acl_region_rehash_intrvl_get+0x49/0x60 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl.c:1106 mlxsw_sp_params_acl_region_rehash_intrvl_get+0x33/0x80 drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3854 devlink_param_get net/core/devlink.c:4981 [inline] devlink_nl_param_fill+0x238/0x12d0 net/core/devlink.c:5089 devlink_param_notify+0xe5/0x230 net/core/devlink.c:5168 devlink_ns_change_notify net/core/devlink.c:4417 [inline] devlink_ns_change_notify net/core/devlink.c:4396 [inline] devlink_reload+0x15f/0x700 net/core/devlink.c:4507 devlink_pernet_pre_exit+0x112/0x1d0 net/core/devlink.c:12272 ops_pre_exit_list net/core/net_namespace.c:152 [inline] cleanup_net+0x494/0xc00 net/core/net_namespace.c:582 process_one_work+0x9fc/0x1710 kernel/workqueue.c:2289 worker_thread+0x675/0x10b0 kernel/workqueue.c:2436 kthread+0x30c/0x3d0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306 </TASK> The buggy address belongs to the physical page: page:ffffea0000267700 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99dc flags: 0x100000000000000(node=0|zone=1) raw: 0100000000000000 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880099dce80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dcf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff8880099dcf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8880099dd000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8880099dd080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Fixes: 98bbf70c ("mlxsw: spectrum: add "acl_region_rehash_interval" devlink param") Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NJiri Pirko <jiri@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Peilin Ye 提交于
Imagine two non-blocking vsock_connect() requests on the same socket. The first request schedules @connect_work, and after it times out, vsock_connect_timeout() sets *sock* state back to TCP_CLOSE, but keeps *socket* state as SS_CONNECTING. Later, the second request returns -EALREADY, meaning the socket "already has a pending connection in progress", even though the first request has already timed out. As suggested by Stefano, fix it by setting *socket* state back to SS_UNCONNECTED, so that the second request will return -ETIMEDOUT. Suggested-by: NStefano Garzarella <sgarzare@redhat.com> Fixes: d021c344 ("VSOCK: Introduce VM Sockets") Reviewed-by: NStefano Garzarella <sgarzare@redhat.com> Signed-off-by: NPeilin Ye <peilin.ye@bytedance.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-